Open Access Highly Accessed Database

Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection

Andrew Stubbs1*, Elizabeth A McClellan1, Sebastiaan Horsman1, Saskia D Hiltemann1,2, Ivo Palli1, Stephan Nouwens1, Anton HJ Koning1, Frits Hoogland3, Joke Reumers4, Daphne Heijsman1, Sigrid Swagemakers1, Andreas Kremer1, Jules Meijerink5, Diether Lambrechts4 and Peter J van der Spek1

Author Affiliations

1 Department of Bioinformatics, Erasmus University Medical Center, Molewaterplein 50, Rotterdam, The Netherlands

2 Department of Urology, Erasmus University Medical Center, Molewaterplein 50, Rotterdam, The Netherlands

3 VX Company IT Services, Baarnsche dijk 8, 3741, LR, Baarn, The Netherlands

4 Vesalius Research Center, VIB and University of Leuven, Gasthuisberg Herestraat 49, 3000, Leuven, Belgium

5 Department of Pediatric Oncology, Sophia Children’s Hospital, Erasmus University Medical Center, Molewaterplein 50, Rotterdam, The Netherlands

Journal of Clinical Bioinformatics 2012, 2:19  doi:10.1186/2043-9113-2-19

Published: 19 November 2012

Abstract

Background

Next-generation sequencing provides clinical research scientists with a direct readout of innumerable variants, including personal, pathological and common benign variants. The aim of resequencing studies is to determine the candidate pathogenic variants from individual genomes, or from family-based or tumor/normal genome comparisons. Whilst the use of appropriate controls within the experimental design will minimize the number of false positive variations selected, this number can be reduced further by using high-quality whole genome reference data to filter out false positive variants prior to candidate gene selection. In addition, the use of platform-related sequencing error models can help in the recovery of ambiguous genotypes from lower coverage data.

Description

We have developed a whole genome database of human genetic variations, Huvariome, determined from whole genome deep sequencing data with high coverage and low error rates. The database was designed to be sequencing technology independent but is currently populated with 165 individual whole genomes consisting of small pedigrees and matched tumor/normal samples sequenced with the Complete Genomics sequencing platform. Common variants have been determined for a Benelux population cohort and are represented as genotypes alongside the results of two sets of control data (73 of the 165 genomes): Huvariome Core, which comprises 31 healthy individuals from the Benelux region, and the Diversity Panel, which consists of 46 healthy individuals representing 10 different populations and 21 samples in three pedigrees. Users can query the database by gene or position via a web interface, and the results are displayed as the frequency of the variations as detected in the datasets. We demonstrate that Huvariome can provide accurate reference allele frequencies to disambiguate sequencing inconsistencies produced in resequencing experiments. Huvariome has been used to support the selection of candidate cardiomyopathy-related genes which have a homozygous genotype in the reference cohorts. The database allows users to see which selected variants are common variants (> 5% minor allele frequency) in the Huvariome Core samples, thus aiding in the selection of potentially pathogenic variants by filtering out common variants that are not listed in one of the other public genomic variation databases. The no-call rate and the accuracy of allele calling in Huvariome provide the user with the possibility of identifying platform-dependent errors associated with specific regions of the human genome.
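The common-variant filter described above (> 5% minor allele frequency in a reference cohort) can be illustrated with a minimal Python sketch. This is not the Huvariome implementation: the function names, the tuple-based genotype representation and the use of `None` for a no-call allele are assumptions made purely for illustration, and the example genotypes are invented.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one site.

    `genotypes` is a list of diploid genotypes such as ("A", "G").
    A no-call allele is represented as None (an assumption of this
    sketch) and is excluded from the allele counts.
    """
    counts = Counter(a for gt in genotypes for a in gt if a is not None)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no called alleles, or the site is monomorphic
    # The minor allele is taken as the second most frequent allele.
    return counts.most_common()[1][1] / total

def is_common(genotypes, threshold=0.05):
    """True if the site exceeds the common-variant threshold (> 5% MAF)."""
    return minor_allele_frequency(genotypes) > threshold

def filter_rare(candidates, threshold=0.05):
    """Keep candidate variants that are NOT common in the reference cohort.

    `candidates` maps a hypothetical variant identifier to the list of
    reference-cohort genotypes observed at that position.
    """
    return [vid for vid, gts in candidates.items()
            if not is_common(gts, threshold)]

# Invented example: a reference cohort of 31 individuals (62 alleles).
rare_site = [("A", "A")] * 28 + [("A", "G")] * 3      # 3/62 minor alleles
common_site = [("C", "C")] * 27 + [("C", "T")] * 4    # 4/62 minor alleles
kept = filter_rare({"var1": rare_site, "var2": common_site})
```

With 31 reference individuals, a single additional heterozygous carrier moves a site across the 5% threshold (3/62 versus 4/62), which is why the no-call rate and calling accuracy of the reference genomes matter for this kind of filter.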

Conclusion

Huvariome is a simple-to-use resource for validation of resequencing results obtained by NGS experiments. The high sequence coverage and low error rates provide scientists with the ability to remove false positive results from pedigree studies. Results are returned via a web interface that displays location-based genetic variation frequency, impact on protein function, association with known genetic variations and a quality score of the variation base derived from Huvariome Core and the Diversity Panel data. These results may be used to identify and prioritize rare variants that, for example, might be disease relevant. In testing the accuracy of the Huvariome database, alleles of a selection of ambiguously called coding single nucleotide variants were successfully predicted in all cases. Data protection of individuals is ensured by restricting access to patient-derived genomes from the host institution, which is relevant for future molecular diagnostics.

Keywords:
Medical genetics; Medical genomics; Whole genome sequencing; Allele frequency; Cardiomyopathy