Extended automated NGS pipeline for reproducible analysis of mosquito genomes
Abstract
Over the past decade, decreasing sequencing costs have facilitated the agnostic interrogation of vector genomes, giving access to an ever-expanding volume of high-quality genomic and transcriptomic data. The real challenge today is converting the vast amounts of raw sequence data, now generated at lower cost and in shorter time frames, into useful biological insights into insecticide resistance, mutations, and genetic relationships. We present an open-source, validated, automated, and scalable NGS pipeline built on the Snakemake framework. It is designed for ease of use, reproducibility, and portability, and follows current state-of-the-art bioinformatics data processing protocols and best coding practices. The workflow allows the user to perform quality control, map reads to a reference genome of choice for variant detection, and interactively visualize single nucleotide polymorphisms (SNPs) across the genomes in the data set. We demonstrate the utility of the workflow by genotyping SNPs in selected Ag1000G samples of the major malaria vector, Anopheles gambiae.