Human Gene Mutation Database: towards a comprehensive central mutation database
P D Stenson, E Ball, K Howells, A Phillips, M Mort, D N Cooper

Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, UK

Correspondence to: P Stenson, Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF15 8DW, UK; StensonPD@cardiff.ac.uk

We write in response to a number of very specific criticisms of the Human Gene Mutation Database (HGMD; www.hgmd.org) made in the recently published article by George et al (see review on p 65).1 All seven claims made were amenable to empirical testing. Having tested these claims, we find all of them to be either false or highly misleading. In the text that follows, we refute or rebut each claim in turn.

HGMD represents an attempt to collate known (published) gene lesions responsible for human inherited disease. HGMD comprises various types of germ-line mutation within the coding, splicing and regulatory regions of human nuclear genes. HGMD currently (1 December 2007) contains 76 011 different mutations in 2876 human genes.

Claim 1: 143 genes are present in OMIM but have no corresponding HGMD entry

Response: This claim is wholly misleading. OMIM (Online Mendelian Inheritance in Man) records many types of gene mutation which HGMD does not, such as somatic lesions, neutral polymorphisms and mitochondrial mutations. It is therefore to be expected that there will be some entries in OMIM that do not have a corresponding HGMD entry. We received the list of 143 genes from George et al and performed our own analysis. After careful comparison with OMIM, we found the following:

  • 33 genes (23.1%) contain only somatic allelic variants (for example, LEF1)

  • 29 genes (20.3%) contain variants exclusively from the mitochondrial genome (for example, MTCO2)

  • 8 genes (5.6%) contain exclusively normal polymorphic protein variants with no known disease association (for example, GPT)

  • 13 genes (9.1%) were actually present in HGMD at the time of the study, but, for whatever reason, had not been found by George et al (for example, CD2AP)

  • 12 genes (8.4%) were misidentified by George et al as having allelic variants in OMIM when they did not (or no longer do) (for example, FUZ)

  • 5 genes (3.5%) contained “disease-associated variants” whose accompanying information was deemed to be of insufficient quality for entry into HGMD (for example, GABRA2)

  • 43 genes (30.1%) were entered into HGMD after the George et al study was performed (for example, SAMD9).

George et al could legitimately have claimed that HGMD was missing 43 (not 143) genes on the basis of their study data. However, it should be noted that all but three of these 43 genes were entered into HGMD at the latest by the end of February 2007 (within 4 months of the stated date of the George et al study). The remaining three genes were entered more recently. Thus, far from being an indictment of HGMD content, the analysis of George et al would appear to confirm what we believe to be HGMD’s very high degree of efficiency in incorporating pathological mutations responsible for causing human genetic disease.
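The arithmetic behind this breakdown can be rechecked with a short script. This is only an illustrative sketch: the counts are copied from the list above, while the category labels and variable names are our own shorthand and not part of the original analysis.

```python
# Gene counts per category, copied from the breakdown of the 143 genes above.
# The labels are shorthand chosen for this sketch.
claim1 = {
    "somatic variants only": 33,
    "mitochondrial variants only": 29,
    "neutral polymorphisms only": 8,
    "already in HGMD at study time": 13,
    "no allelic variants in OMIM": 12,
    "insufficient data quality": 5,
    "entered into HGMD after the study": 43,
}

total = sum(claim1.values())

# Percentage of the full list accounted for by each category (one decimal place).
shares = {label: round(100 * count / total, 1) for label, count in claim1.items()}

# Only the last category was absent from HGMD when the study was run
# and also fell within HGMD's remit.
missing_at_study_time = claim1["entered into HGMD after the study"]

for label, count in claim1.items():
    print(f"{label:36s} {count:4d}  {shares[label]:5.1f}%")
print(f"total genes: {total}; genuinely missing at study time: {missing_at_study_time}")
```

The seven categories account for every gene on the list, and only one of them corresponds to genes that HGMD could reasonably have been expected to contain at the time.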

Claim 2: 226 genes in OMIM contain more mutation entries than HGMD

Response: This claim is false. George et al provided HGMD with a list of the 226 genes, which we have carefully reviewed. As 143 of these genes were present in this list simply due to their initial inclusion in the previous list (reviewed in claim 1), we discounted these for the purposes of this analysis. We were therefore left with 83 genes to check. With respect to these 83 genes, we found the following:

  • 23 (27.7%) entries contained additional somatic allelic variants (for example, AXIN2)

  • 10 (12%) entries contained additional polymorphic variants or haplotypes (for example, CCR5)

  • 5 (6%) entries (all globin genes) contained additional non-disease associated variants (for example, HBA1)

  • 17 (20.5%) entries actually had an equal number or fewer mutations than HGMD (for example, HAL)

  • 18 (21.7%) entries were added to HGMD after the George et al study was completed (for example, OTOF)

  • 10 (12%) entries contained data which were indeed missing from HGMD (for example, CFD).

Consequently, George et al could, on the basis of their study data, legitimately have claimed that, at the time of writing, there were 28 (not 226) genes present in OMIM with more allelic variants than HGMD. The mutations from 18 of these genes were, however, added very shortly after the study of George et al was concluded. The remaining 10 genes contained around 18 mutations which were inadvertently omitted from HGMD. Thanks, however, to George et al and the prior efforts of the OMIM curators, the missing mutation data for these 10 genes have now been included in HGMD.
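The same kind of check applies to the second list. Again, this is a sketch under stated assumptions: the counts come from the text above, and the labels and variable names are hypothetical shorthand.

```python
# Start from the 226 genes supplied, then discount the 143 carried over from claim 1.
listed = 226
carried_over_from_claim1 = 143
to_check = listed - carried_over_from_claim1

# Counts per category for the remaining genes, copied from the list above.
claim2 = {
    "additional somatic variants": 23,
    "additional polymorphisms or haplotypes": 10,
    "additional non-disease variants (globin genes)": 5,
    "equal or fewer variants than HGMD": 17,
    "added to HGMD after the study": 18,
    "data missing from HGMD": 10,
}

checked = sum(claim2.values())

# Genes for which OMIM really did hold more disease-relevant variants at study time.
more_in_omim = claim2["added to HGMD after the study"] + claim2["data missing from HGMD"]

print(f"genes to check: {to_check}; categorised: {checked}")
print(f"genes with more variants in OMIM at study time: {more_in_omim}")
```

The categories sum to the 83 genes left to check, and the last two categories together give the 28 genes cited above.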

In their analysis, George et al ignored several categories of mutation present in HGMD (small and gross deletions, insertions and indels, complex rearrangements and repeat variations). These categories contain a significant number of mutations (24 461 in HGMD Professional release 7.4). To ignore them in the published analysis was highly misleading and would inevitably have led to erroneous conclusions being drawn (due to an apparent failure to compare like with like). A good example of the type of error made is provided by the C6 gene. This gene currently has four allelic variants listed in OMIM (plus one neutral polymorphism). The HGMD entry for C6 has nine mutations listed (six at the time of the published study), yet according to George et al, HGMD had fewer mutation entries than OMIM for this gene.
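As a rough indication of scale, the excluded categories can be set against the overall HGMD total quoted earlier. This sketch assumes that the 24 461 figure (HGMD Professional release 7.4) and the 76 011 total (1 December 2007) are directly comparable, which the text does not state explicitly; the variable names are our own.

```python
# Figures quoted in the text; treating them as comparable is an assumption.
total_mutations = 76_011     # HGMD total as of 1 December 2007
excluded_mutations = 24_461  # non-substitution categories, HGMD Professional release 7.4

# Fraction of HGMD entries left out of a substitutions-only comparison.
excluded_share = excluded_mutations / total_mutations
print(f"Excluded from the comparison: {excluded_share:.1%}")
```

Under that assumption the excluded categories make up a little under one third of the database.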

Claim 3: Many mutations published in the journal Human Mutation are missing from HGMD

Response: This claim is highly misleading. These “missing” mutations are listed in supplementary table 2 of the George et al paper. We have carefully reviewed the data in this table and have concluded that HGMD is not missing any of the mutations that the authors claim. All but four of the disease-causing inherited lesions listed with either “no”, “not in website” or “unable to determine” had been entered into HGMD within 1–2 months of their publication in the Human Mutation issue cited in the table. The remaining four were entered later, but would certainly have been present at the time of the study. George et al had access to HGMD Professional and could easily have obtained data entry dates from this version of HGMD had they been able to locate the listed mutations. Several mutations had already been described in the literature before their publication as “novel” in Human Mutation (for example, PAX6 1410delC). In accordance with HGMD policy, the earlier paper was given priority as the reference to be cited rather than the subsequent report in Human Mutation. George et al also listed many neutral and somatic variants in their table as if they had expected to find these data in HGMD (for example, ATM c.185+78A>G and ANP32C g.4870T>C). It is quite apparent to us that George et al have displayed a complete lack of understanding of the nature of the data that HGMD seeks to collate, a serious inability to interpret published mutation data and/or an inability to undertake basic data searching and retrieval from HGMD.

Claim 4: R158Q in PAH is in error

Response: This claim is incorrect. The reference cited by both HGMD and OMIM (Dworniczak et al2) contained an error, in that the G>A base change reported would have given rise to R158Q and not R158E as described. As part of our curation process, we corrected this error (and the curators of OMIM have done likewise). It is noteworthy that the reference given by George et al for this mutation (Hennermann et al3) does not actually claim that this lesion was novel to their study.

Claim 5: HGMD is missing two specific genes (COL9A1 and PTCH2)

Response: This claim is incorrect. COL9A1 has been present in HGMD since 2001 with one small insertion mutation logged at that time. Since George et al elected to utilise only single base-pair substitutions in their analyses (thereby ignoring approximately one third of the HGMD dataset), this entry was missed. A nonsense mutation (R272X) was added to this entry shortly after the George et al study was concluded. The question should again be raised as to why the authors chose to ignore HGMD microinsertions, microdeletions, indels, gross lesions and repeat variations in their analyses, thereby excluding some 24 461 different human gene lesions and 271 genes logged in HGMD with only these categories of mutation. The second gene that was claimed to be “missing” from HGMD was PTCH2. However, the two allelic variants listed in OMIM for this gene are both somatic and so HGMD did not “miss” this gene at all, since HGMD only includes heritable lesions.

Claim 6: Patchy coverage of gene and mutation data in HGMD

Response: This assertion was made on the basis of a study that appears to be deeply flawed, methodologically and statistically. The authors seem to have little appreciation or understanding of the types of mutation data recorded by either OMIM or HGMD. Once again, and for whatever reason, the authors excluded 24 461 mutations (almost one third of HGMD data) from their analysis. They then have the temerity to criticise HGMD for patchy coverage!

Claim 7: The authors claim no competing interests

Response: In our opinion, this claim is hard to justify. Several of the authors of the George et al paper are currently seeking substantial funding to set up from scratch a new and all-embracing human variation/mutation database. Since HGMD is in practice the only comprehensive central repository for human gene mutations in existence, their comparative “analysis” of HGMD data should at the very least, in our view, have been accompanied by a clear statement of the potential conflict of interest inherent in their critical conclusions. It is quite disingenuous for the authors to claim otherwise.

In summary, in a deeply flawed study, George et al have drawn numerous incorrect or misleading conclusions with respect to HGMD, its remit, content and coverage. Their study represents a graphic example of how over-reliance on automated text mining, a reluctance to attempt any independent verification of their initial findings and an apparent lack of knowledge of the mutation databases they were analysing can combine to yield wholly erroneous conclusions. We are not in any way resistant to the idea of data quality assessment, but any such assessment should at the very least adhere to certain basic analytical standards and ought to be carried out in a proper scientific manner. We were not contacted prior to this article being accepted for publication. Had we been asked to comment, we could have easily cleared up the many inaccuracies and misinterpretations that litter the George et al paper. Having said this, however, it is unclear whether the authors would then have been able to draw any meaningful conclusions other than that HGMD has succeeded in providing fairly comprehensive coverage of its target data, viz mutations in human nuclear genes causing inherited human disease. Thus, it would appear that far from providing evidence for the shortcomings of a central mutation database, George et al have inadvertently succeeded in demonstrating that HGMD fulfils this role exceptionally well.

REFERENCES

Footnotes

  • Competing interests: While 75% of HGMD data are freely available to registered users via the HGMD website (www.hgmd.org), HGMD Professional, which includes a suite of advanced analytical/search tools, is only available via subscription through our commercial partner, BIOBASE GmbH (www.biobase.de).

Linked Articles

  • Review
    R A George, T D Smith, S Callaghan, L Hardman, C Pierides, O Horaitis, M A Wouters, R G H Cotton