This page lists the SNPs in SNPedia that are present on the Illumina 1M array used by deCODEme.
This video explores the results website.
Example Promethease reports for this company
Customers can now download their raw data. The raw data is delivered as a compressed ZIP file (about 9 MB).
unzip -t MyName.zip
Archive: MyName.zip
    testing: deCODEme_sign.txt OK
    testing: deCODEme_info.txt OK
    testing: deCODEme_scan.csv OK
No errors detected in compressed data of MyName.zip.
unzip -c MyName.zip deCODEme_scan.csv | head -10
Archive: MyName.zip
  inflating: deCODEme_scan.csv
Name,Variation,Chromosome,Position,Strand,YourCode
rs4477212,A/G,1,72017,+,AA
rs2185539,C/T,1,556738,+,CC
rs6681105,C/T,1,581938,+,TT
rs11240767,C/T,1,718814,+,CC
rs3094315,C/T,1,742429,-,TT
rs3131972,C/T,1,742584,-,CC
rs3131969,C/T,1,744045,-,CC
rs17162846,A/G,1,1673425,+,--
rs34686476,C/T,Y,22308921,+,CC
rs9786018,A/G,Y,22309830,+,AA
rs4144073,G/T,Y,22368737,+,TT
rs3923607,A/G,X,2197927,+,AG
rs34605807,G/T,X,2204282,+,GT
rs35178888,A/G,X,2220071,+,AG
rs5951636,C/T,X,21751554,+,TT
rs5951469,C/T,X,21758529,+,CC
rs6528054,C/T,X,21761263,+,CC
MitoC16329T,C/T,M,16327,+,CC
MitoG16392A,A/G,M,16390,+,GG
MitoG16393A,A/G,M,16391,+,GG
...
Where "Name" is the official symbol of the SNP, taken from dbSNP (www.ncbi.nlm.nih.gov; Public SNP source: dbSNP 128) and the Cambridge reference sequence;
Where "Variation" lists the nucleotides (A, C, G, T, or --) observed at the particular SNP location. There will generally be only two alternatives;
Where "Chromosome" and "Position" are the physical location of the SNP. There are 22 autosomes (1 through 22), the X chromosome, the Y chromosome, and the mitochondrial chromosome (coded "M" in the sample above, with SNP names prefixed "Mito"). All position values are taken from NCBI Build 36. Note that the human genome is periodically "reassembled", so the precise locations of SNPs will shift between builds over the next few years. However, the names/symbols of the SNPs should not change;
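Where "Strand" is either plus ("+") or minus ("-"), depending on the reading orientation used to define the SNP. This matters when comparing genotypes with a source that reports alleles on the opposite strand, where A pairs with T and C pairs with G;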
Where "YourCode" provides the two alleles called for the sample that was submitted to deCODEme. Males have only a single X chromosome genotype and only a single Y chromosome genotype, but deCODEme still lists two alleles. This is also the case for the mitochondrial SNPs. Female samples should typically have few if any successful genotypes on the Y chromosome. They will usually be scored "--".
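The column layout described above can be read with a few lines of Python. This is a minimal sketch, not a deCODEme tool: the function names are invented for illustration, the sample rows are copied from the excerpt above, and it assumes that "YourCode" is reported on the strand given in the "Strand" column, so that minus-strand calls need to be complemented before comparison with plus-strand data.

```python
import csv
import io

# Watson-Crick complements; "-" is kept so that a no-call "--" passes through.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "-": "-"}

def read_scan(text):
    """Parse the contents of deCODEme_scan.csv into a list of dicts,
    keyed by the header names (Name, Variation, Chromosome, ...)."""
    return list(csv.DictReader(io.StringIO(text)))

def plus_strand_code(row):
    """Return YourCode expressed on the + strand.
    Assumption: the call is reported on the strand listed in the row."""
    code = row["YourCode"]
    if row["Strand"] == "-":
        code = "".join(COMPLEMENT[base] for base in code)
    return code

# Three rows taken from the excerpt shown above.
sample = """Name,Variation,Chromosome,Position,Strand,YourCode
rs4477212,A/G,1,72017,+,AA
rs3094315,C/T,1,742429,-,TT
rs17162846,A/G,1,1673425,+,--
"""

rows = read_scan(sample)
for row in rows:
    print(row["Name"], row["Chromosome"], row["Position"], plus_strand_code(row))
```

For a real file, the text could be obtained with the standard zipfile module, for example zipfile.ZipFile("MyName.zip").read("deCODEme_scan.csv").decode(); the text encoding of the file is not documented here, so that step may need adjusting.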
Correctness of data and error rates: deCODEme does not provide any summary data on the quality of the sample or the quality of the individual genotypes. There is no way to distinguish between a deletion and an unreadable genotype. Both are apparently scored "--".
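Because no quality summary is supplied, one rough substitute is to count how many SNPs on each chromosome received a call at all. The sketch below is only an illustration (call_rates is a made-up helper, and the rows are copied from the excerpt above). Since "--" covers both deletions and failed assays, the resulting fraction is a crude indicator rather than a true error rate.

```python
import csv
import io
from collections import Counter

def call_rates(text):
    """Return {chromosome: fraction of SNPs whose YourCode is not '--'}."""
    total, called = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(text)):
        chrom = row["Chromosome"]
        total[chrom] += 1
        if row["YourCode"] != "--":
            called[chrom] += 1
    return {chrom: called[chrom] / total[chrom] for chrom in total}

# Rows taken from the excerpt shown earlier on this page.
sample = """Name,Variation,Chromosome,Position,Strand,YourCode
rs4477212,A/G,1,72017,+,AA
rs2185539,C/T,1,556738,+,CC
rs17162846,A/G,1,1673425,+,--
rs34686476,C/T,Y,22308921,+,CC
rs9786018,A/G,Y,22309830,+,AA
"""

rates = call_rates(sample)
for chrom, rate in rates.items():
    print(f"chromosome {chrom}: {rate:.1%} of SNPs called")
```

On a full file, a Y chromosome call rate near zero would be consistent with a female sample, as described above.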
The files from deCODEme also include a short INFO file that has this format:
Format version,1
User name,DCMEXXXXXX
Date,Feb 25 2008
SNP count,1013349
Reference sequence,NCBI Build 36 and Cambridge reference sequence
Public SNP source,dbSNP 128
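The INFO file is a simple list of key,value pairs, so it can be read into a dictionary. The following is a small sketch under that assumption; read_info is a hypothetical helper, and the sample text is the INFO content shown above.

```python
def read_info(text):
    """Parse 'key,value' lines from deCODEme_info.txt into a dict.
    Only the first comma splits the line, so values may contain commas."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(",")
        info[key.strip()] = value.strip()
    return info

sample_info = """Format version,1
User name,DCMEXXXXXX
Date,Feb 25 2008
SNP count,1013349
Reference sequence,NCBI Build 36 and Cambridge reference sequence
Public SNP source,dbSNP 128
"""

info = read_info(sample_info)
print(info["Reference sequence"])
print(int(info["SNP count"]))
```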
GENERAL EXPLANATION and COMMENTS on these DeCODEme files:
Each data set consists of 1,013,349 SNP genotypes that you can download as a comma-separated-value (csv) text file (as shown above). These files can be opened by most common word processor programs (Microsoft Word, for example). The specific number of SNPs listed in the deCODEme_info.txt file is actually the maximum number of SNP assays provided by the Illumina SNP platform. It is not the number of genotypes obtained from a specific sample. Deletions are marked -- in the file (for example, the Y chromosome for a female subject). There do not appear to be any hemizygous genotypes (male X chromosome, or single chromosomal deletions). The data are output in chromosomal order, starting from Chr 1 through Chr 22, followed by X, Y, and the mitochondrion. The output file format is structured such that all genotypes are given as diploid, even the male X, the male Y, and the mitochondrion. (RWW)
The Illumina web site emphasizes that the SNP array used by deCODEme generates extensive data on copy number variants (CNVs) and contains 52,167 new markers designed specifically to interrogate nearly 9,000 CNV regions not currently available in public databases. It is not clear whether these CNV probes are provided in the deCODEme file (probably not). The Illumina site indicates that there are a total of just over 1,070,000 usable markers on the array. This number is about 56,650 higher than the SNP count provided by deCODEme in their ZIP file. Thus the deficit may include all of the new CNV probes, as well as a few thousand control probes.
You should probably download data sets that belong to you and store them safely. Burn a CD. If you are concerned about privacy, investigate ways to store your data with encryption. Note that for archival family purposes, the longevity of certain types of media (CDs, DVDs, disk drives, memory sticks) is often surprisingly short (<10 years). The issue of how to pass down digital records of family genotypes for the ages (similar to records of dates and places of births, marriages, and deaths) has not yet been effectively addressed, let alone solved. Activities of large organizations with an interest in archiving data (e.g., The Library of Congress, the Church of the Latter Day Saints, and the British Museum) may be helpful (see Strategies for long-term data retention, 2007).
The million SNP question: What to do with these data. For genealogical analyses these SNP data can definitely be put to great use. If checking and extending your family history and background was one factor that motivated you to get the test, then you should soon be able to exploit several on-line analysis tools (the deCODEme Genome Browser is a great starting place). For example, we are looking for a Native American pattern of SNPs (a haplotype) in deCODEme files. We soon expect to have reference data that will make it possible to compare with deCODEme files.
[Update, June 6, 2008: The deCODEme Genome Browser is an impressive SNP and genome visualization tool. This web-Java application downloads your genome file to your machine and then displays SNPs along with other data in an attractive graphical genome browser interface. Use it in combination with SNPedia.]
COMMENTS ABOUT YOU, YOUR SNPs, and HEALTH CARE
For health care planning, the deCODEme and other SNP data sets from 23andMe and Navigenics can currently be used for relatively simple genotype-phenotype comparisons. The Promethease program that is now offered by SNPedia is often a better solution for the analysis of the SNP data than that provided by deCODEme or 23andMe, and the price is right.
Keep in mind that the science that relates your particular set of sequence variants (SNPs, CNVs, etc.) to your individual risk of disease is immature. But prospects are improving rapidly thanks to myriad so-called genome-wide association studies (GWAS). There is also a great deal of progress being made using animal models (mice, rats, flies, monkeys, and worms) to understand gene and SNP function. The experimental biology that would give you a good idea of the best course of action given your particular genome and given your particular environment is undeveloped. Some geneticists would argue that we don't even know what we don't know.
More effective individualized health care will require a better understanding of our own genomes and our interactions with the environment. This next stage of medical care--personalized, predictive, preventative, and participatory--will also require an improved social context designed with proactive health care in mind. We will need well-informed care providers and consultants who understand the details and subtleties of genetic analysis. Even more important, each of us "patients" will need to become more responsible for our own health and welfare (participatory). It does us no good to know about risk of disease if this information is not linked to effective and persistent actions--actions that respect our rights and beliefs and that also improve quality of life given the reality of our physical, economic, and social environment. Genetic data of the type we are now being given is a potentially powerful preventive medicine, but only if self-prescribed correctly. Doing so will require more than the usual will power and thought. It's an obvious truism that each of us needs to be motivated to take more responsibility for our health. Genetic testing provides yet one more impetus to do what we probably already know are the right things to do.
A CAUTION ON INTERPRETATION: Here is one thing that you should keep in mind when interpreting SNP data in SNPedia: You are not a SNP, and the SNP statistics that you encounter in web sites at deCODEme, 23andMe, SNPedia, and elsewhere do not apply to single humans; they apply to large populations of humans. All of the studies are done by determining if the two alleles of a SNP are associated with differences in disease risk or other phenotypes across large populations of humans. In essence, the SNP is the "individual." But you are actually far more interested in a different question. What you want to know is this: given your genome, your environment, and your particular collection of SNPs, what is the probability that you will develop a disease or have a particular trait? For most important diseases, we do not yet know how to answer this question. It is still hard to answer this question even using mice, a mammal in which we can control and change the environment precisely, and with which we can study many genetically identical animals (essentially clones of animals).
To answer this question about you and your collection of SNPs with compelling and relevant statistics we would ideally have to study a few hundred clones of you. Ideally we would actually have two populations of your clones--a few hundred with each SNP (or combination of SNPs) in one state, and another few hundred "almost clones" with the SNPs in the other state. Then we could actually tell you what the influence of a particular SNP (or combination of SNPs) would be, assuming all environmental factors were equal. In the near future, we may be able to undertake variants of these kinds of studies with clones of cells from your skin or blood, but this will only be useful for a limited number of traits, most of which probably will not interest you too much.
Does that mean that the SNP data are useless? No, not at all. Certain SNPs have exceptionally strong effects on traits and disease risk, and the distinction drawn above becomes almost moot. A few examples include SNPs within genes such as CCR5 for AIDS resistance and FUT2 for resistance to the Norwalk virus (norovirus, the "cruise ship disease") [PMID 12692541]. If you inherit a SNP (or a set of SNPs) that is almost always associated with disease risk or resistance then that SNP can have real predictive utility. It does not tell you that you are "fated" but it can tell you that you may want to seriously consider taking some actions to modify aspects of your environment or exposure. For example, if you have the right FUT2 SNPs (rs601338 A;A) it can mean that you are particularly resistant to the gastrointestinal disease associated with norovirus infection.
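Checking a single SNP such as rs601338 in your own file amounts to a lookup by name. The sketch below is illustrative only: the helper names are invented, the sample rows come from the excerpt earlier on this page (which does not include rs601338, so that lookup returns None here), and whether rs601338 is on the array is not verified. The allele pair is printed in the "A;A" style used on this page; the strand on which each source reports alleles should be checked before comparing.

```python
import csv
import io

def index_by_name(text):
    """Map each SNP name to its row from deCODEme_scan.csv content."""
    return {row["Name"]: row for row in csv.DictReader(io.StringIO(text))}

def lookup(table, rsid):
    """Return the genotype as 'X;Y', '--' for a no-call,
    or None if the SNP is not in the file."""
    row = table.get(rsid)
    if row is None:
        return None
    code = row["YourCode"]
    if code == "--":
        return code
    return ";".join(code)

# Rows taken from the excerpt shown earlier on this page.
sample = """Name,Variation,Chromosome,Position,Strand,YourCode
rs4477212,A/G,1,72017,+,AA
rs17162846,A/G,1,1673425,+,--
rs3923607,A/G,X,2197927,+,AG
"""

table = index_by_name(sample)
for rsid in ("rs3923607", "rs17162846", "rs601338"):
    print(rsid, lookup(table, rsid))
```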
ON FALSE POSITIVE RESULTS. Of course it is nice to know what diseases you are unlikely to come down with, but most of us who get tested are much more concerned about the diseases for which we have unusually high risk--particularly heart disease and Alzheimer's. If you find that a SNPedia Promethease analysis of your SNPs suggests you have unusually high risk for disease X, do not get too worked up about it. There is a pretty high chance that your high relative risk is a false positive result. For example, I just ran a Promethease scan of a first-degree relative. He had a 10-fold relative risk for Alzheimer's based on a collection of SNPs, yet made it to 82 years of age, sharp as a tack. He also had high risk for heart disease, yet had absolutely no cardiovascular symptoms. The final disease for which he had high relative risk, prostate cancer, was unfortunately not a false positive.
If you find yourself in this situation with a high risk for a disease, by all means consult an expert (a real expert). And please be aware that "excessive testing" is itself risky, since a false positive finding can result in more tests, more treatment, more costs, and surgery, all of which are themselves obviously risky and stressful. In simple terms, the treatment may be far worse than the supposed disease. Doctors are not rewarded for NOT treating disease (they tend to get sued for malpractice if they do not treat), so they are often the worst individuals to ask about whether treatment is statistically appropriate given a high false discovery rate. You probably will need advice from a statistical geneticist, although the final decision (to test or not to test; to treat or not to treat) is yours. The risks associated with increased testing are covered in an interesting review: "The incidentalome: a threat to genomic medicine" by Isaac Kohane, Dan Masys, and Russ Altman (2006, [PMID 16835427]).
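The general reason false positives pile up when many tests are run can be shown with a little arithmetic. The sketch below uses the standard formula 1 - (1 - a)^n for the chance of at least one false positive among n independent tests that each have false positive rate a. The numbers (100 tests, a 1% rate) are hypothetical and chosen only for illustration, and real SNP tests are not independent, so treat this as a rough picture rather than a figure from the review cited above.

```python
def prob_any_false_positive(per_test_rate, n_tests):
    """Chance that at least one of n independent tests is a false positive."""
    return 1.0 - (1.0 - per_test_rate) ** n_tests

# Hypothetical values: 100 independent tests, each with a 1% false positive rate.
p = prob_any_false_positive(0.01, 100)
print(f"{p:.3f}")  # prints 0.634
```

Under these assumed numbers, a person with no true findings at all would still see at least one alarming result about 63% of the time.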
The truth is that there are actually few SNPs with strong predictive utility. Most SNPs only tell you that in a large human sample, those groups with one of the alleles have a 1.5X higher burden of some disease. That may sound like a lot, but it is actually a comparatively small effect. Even in the "risk" group there will be enormous variation in traits and disease onset and severity. In other words, the SNP may account for less than 1-2% of the risk of a disease. SNPs are even pretty poor at predicting hair and eye color.
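To see why a 1.5X relative risk is a modest effect for an individual, it helps to convert it into absolute terms. The sketch below assumes a hypothetical baseline risk of 10% in the group without the risk allele; that figure is invented for illustration and is not taken from any study.

```python
def absolute_risk(baseline_risk, relative_risk):
    """Absolute risk in the risk-allele group, given the baseline group's risk."""
    return baseline_risk * relative_risk

baseline = 0.10   # hypothetical risk without the risk allele
rr = 1.5          # relative risk quoted in the text

risk = absolute_risk(baseline, rr)
print(f"risk with allele:    {risk:.2f}")
print(f"risk without allele: {baseline:.2f}")
print(f"unaffected carriers: {1 - risk:.2f}")
```

Under this assumption, 85% of carriers never develop the disease, compared with 90% of non-carriers, which is why the allele alone says little about any one person.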
However, having these data can still help put you on the right path and encourage you to take a generically good course of action (yes, you could have done this even without the SNPs). If the relative risk of a large population of humans with a SNP associated with heart disease is 1.5X then maybe this will motivate you even more to drop some weight and exercise more. Usually we make these decisions without benefit of a genetic context. So for $500 - $1000 you now have an empirical basis to nudge your behavior in a direction that you probably already suspected would be in your best interest.
Additional deCODEme tests and SNPs
Note that the use and interpretation of these genetic tests are controversial. There is considerable difficulty in diagnosing early stages of breast cancer. Over-treatment can be just as serious a problem as failure to treat. Any relative risk calculation is subject to significant error when applied to an individual case.
In March 2009, deCODEme reported the following extra SNPs for a customer:
- rs3087243 Type-1 diabetes
- rs864745 Type-2 diabetes
- rs10757278 Abdominal aortic aneurysm
- rs7961581 Type-2 diabetes
- rs429358 Alzheimer's disease
Available using the deCODE Cancer Scan
Available via the deCODE Cardio Scan