User talk:J1

From SNPedia

A magazine reporter working on an upcoming article would like to speak with one or more dedicated SNPedia contributors. We appreciate your involvement and would like to personally invite you to consider speaking to this writer. If you might be willing, or have questions about this, let us know by email (info@snpedia.com) – thanks. Greg (talk) 03:31, 21 August 2014 (UTC)

Intelligence SNPs

Recent research has shown that a substantial amount of intelligence can be estimated by considering all SNPs at the same time. This technique would be a great addition to Promethease.

http://xkcd.com/285/ --- cariaso 10:33, 23 February 2014 (UTC)

PLoS One. 2013 Dec 12;8(12):e81189. doi: 10.1371/journal.pone.0081189. eCollection 2013.

Complex variation in measures of general intelligence and cognitive change.

"When the top ten regions (Table 2) from each trait were fitted together in a LMM they explained 13% (Pperm = 0.58), 15% (Pperm = 0.11) and 18% (Pperm = 0.43) of the phenotypic variation for crystallised intelligence, fluid intelligence, and cognitive change respectively."


Mol Psychiatry. 2011 October; 16(10): 996–1005. Published online 2011 August 9. doi: 10.1038/mp.2011.85

Genome-wide association studies establish that human intelligence is highly heritable and polygenic

"Finally, using just SNP data we predicted approximately 1% of the variance of crystallized and fluid cognitive phenotypes in an independent sample (P = 0.009 and 0.028, respectively)."

Doesn't this mean 99% of the variance is *not* explained by SNPs? As for the PLoS One paper, its main point appears to be the usual "more research is needed" conclusion, whereas what we'd need would be a model that is actually computationally reasonable as well as SNP-specific (not region-specific), including taking into account the SNPs that are on the commonly used microarray platforms. Greg (talk) 19:44, 23 February 2014 (UTC)

-64% of the variance in gf is explained by SNPs.

-It is true that "more research is needed" likely applies here. However, the quoted 2011 article above noted that 1% of crystallized and fluid cognitive phenotypes could already be predicted (P = 0.009 and 0.028, respectively) using only SNPs. It is likely that with more research this approach could explain much more than 1% of the variation in intelligence. If a genomics company were to include some online option that would measure intelligence, then a better prediction could be made.

Adding a GCTA intelligence measurement on Promethease (even at 1% predictive power) would be great. The SNPs currently listed on Promethease have had difficulty replicating.

-GCTA software appears to get around being SNP-specific.

The Genome-wide complex trait analysis (GCTA) software has been greeted with a large amount of enthusiasm within the psychometric community. This software is helping to resolve the longstanding question of nature-nurture in intelligence research.

Also from the article [Complex Variation in Measures of General Intelligence and Cognitive Change]: "SNPs within genes explained 48, 64 and 38% of the total genetic variation for gc, gf and cognitive change respectively."

SNPs explain 64% in gf!

Also from the article [Genome-wide association studies establish that human intelligence is highly heritable and polygenic]: "We estimate that 40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants. These estimates provide lower bounds for the narrow-sense heritability of the traits."

The second quoted study (2011) was done several years ago and only used 550,000 SNPs (with the Illumina610-Quadv1 chip). What if they used the current SNP chips with 1 million SNPs?

Leveraging the SNPedia database

The article below describes how a GWAS could be reanalyzed using subjects from other sources to extract more of the genetic signal. It would be great if SNPedia allowed access to its database for such an effort (for example, in combination with The International Genomics of Alzheimer's Disease Project (IGAP)). Such a study would likely uncover important genetic information about Alzheimer's disease.

Exploiting Population Samples to Enhance Genome-Wide Association Studies of Disease. Genetics, Early Online March 10, 2014. doi: 10.1534/genetics.114.162511

Why don't tables maintain their formatting on snpedia? When I enter tables on my Talk page everything is lined up. Yet when I submit them to snpedia, all the columns get scrambled.

please see https://www.mediawiki.org/wiki/Help:Tables and provide a specific example --- cariaso 05:15, 3 April 2014 (UTC)

When creating a new rs# page, it provides a default template similar to {{Rsnum |rsid=123456 }}

I don't know quite how, but you seem to be creating pages without that. If you're removing it, please don't. If you're doing something else, please help me to understand. That template is important to make the page behave correctly.

I wasn't sure if the rs# information should remain after I finished editing a snpedia page. However, I have checked and the rs# is there on the edit page for rs3747742. Is this why the population frequencies, clinvar etc. on the right side of the page have not been added? (I thought a bot would do this).

It is there because I put it back. You can see that in the page history. bots will eventually add a lot of information, but putting the Rsnum template helps the bots to find the page. --- cariaso 18:23, 5 April 2014 (UTC)

(Does anyone know how to take Population attributable fractions (PAFs) and create a disease risk score? I would like to create an AD genoset that computes an AD genorisk score with the SNPs on this page that have a known PAF. Should I just use a simple linear weighting?)

Even if this is the method used by one or more DTC genomic testing companies, it's not credible (i.e. it's bad science). Please only create genosets of this type (combined risk scores) that are based on peer-reviewed publications in which the combined effects of multiple SNPs have been specifically analyzed. Greg (talk) 20:57, 25 May 2014 (UTC)

Might Snpedia or Promethease add a feature that would allow pasting a text document full of SNP rs#s and receive a table back with the rs#s and the genotypes (perhaps the code could be embedded within a Promethease report)? This would be a very helpful feature. One of the genetics companies does not provide this service.

Do you mean all 3 genotypes (normally) for each SNP? What do you see as the use(s) for this? Greg (talk) 20:57, 25 May 2014 (UTC)
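One possible reading of the request above is a lookup of a person's own genotypes for a pasted list of rs#s. A minimal sketch of that idea follows, assuming a 23andMe-style raw data file (tab-separated rsid, chromosome, position, genotype, with `#` comment lines). The function names and the example rows are hypothetical, and this is not how Promethease itself works.

```python
import io

def load_genotypes(raw_file):
    """Map rsid -> genotype from a 23andMe-style raw data file object."""
    genotypes = {}
    for line in raw_file:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        rsid, _chrom, _pos, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes

def lookup(rsids_text, genotypes):
    """Return (rsid, genotype) rows for every rs# found in pasted text."""
    rows = []
    for token in rsids_text.replace(",", " ").split():
        rsid = token.strip().lower()
        if rsid.startswith("rs"):
            rows.append((rsid, genotypes.get(rsid, "not in file")))
    return rows

# Made-up example rows (positions and genotypes are placeholders).
example_raw = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1049296\t3\t100\tCT\n"
    "rs1800562\t6\t200\tAG\n"
)
table = lookup("rs1049296, rs1800562 rs429358", load_genotypes(example_raw))
for rsid, genotype in table:
    print(f"{rsid}\t{genotype}")
```

SNPs missing from the file are reported as "not in file" rather than dropped, so the returned table always has one row per pasted rs#.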

Snpedia cites an article reporting the epistatic interaction between rs1049296 (P589S) in the transferrin gene (TF) and rs1800562 (C282Y) in the hemochromatosis gene (HFE) [PMID 20029940]. This genoset does not appear to be included on Promethease, though the result has been replicated [PMID 20817350]. Tri-carriers of P589S, C282Y and APOE epsilon 4 have a 37.5-fold increased risk of Alzheimer's disease! However, this genoset might be too terrible to report. It would be worse than epsilon 4/4.

At the same time, if someone knew that they were a tri-carrier they might be able to do something to prevent the onset of AD (e.g. treatment with chelation).

Are some genosets too terrible to report?

We strive to include all statistically valid (and preferably replicated!) results, good or bad. The C282Y/P589S genoset does seem to meet the criteria, and by the way, it seems as if HFE 63D is equivalent to C282Y in this context [PMID 15060098]. But the only article discussing the tri-carrier situation that we can readily find is [PMID 15060098], a 2004 publication with a total of only 14 such folks (tri-carriers). Are you aware of any publication in the last decade that has replicated that finding, presumably with a greater number of tri-carriers? Greg (talk) 20:57, 25 May 2014 (UTC)

We still haven't addressed the C282Y and H63D from HFE, and TF C2 tri-carriers. Not sure about replication for this triple combo. In the [PMID 15060098] study all 5 of these tri-carriers had AD or MCI (OR = 12.9, p = 0.03). There are quite a few different combinations involved. Interestingly, this combination does not involve APOE epsilon 4. A genoset that included the above tri-carrier genoset and APOE epsilon 4 would likely increase AD risk massively.

rs1049296 (P589S) in the transferrin gene (TF) and rs1800562 (C282Y)
  • homozygosity is now known as gs291
  • heterozygosity is now known as gs292
I've done only an extremely minimal text, and the magnitudes are probably too low. I hope the two of you will alter them to something more suitable. --- cariaso 21:14, 25 May 2014 (UTC)

The tricarrier genoset that I noted carried the minor alleles of rs1049296, and rs1800562, with APOE epsilon 4 genotype. I was more focused on the tricarriers than the bicarrier combinations as in genosets gs291 and gs292. The tricarrier genoset should probably be graded BAD with a coefficient of 10!

Yes, the HFE H63D (rs1799945) and TF AA (rs1130459) genotypes are also mentioned in relation to Alzheimer's risk, though there was some confusion about how they related to age of onset. The article discussed how H63D might actually be related to delayed onset of AD. HFE H63D might not be exactly the same as C282Y, though it also appears to confer risk for AD.

      • The article notes that tricarriers were only exposed to increased risk for AD if they were Northern Europeans!

The 2004 article did only have 14 tricarriers: 12 of them had AD and 2 had MCI! With such a huge odds ratio (OR (tricarriers versus all others)= 37.5 ) even only 14 tricarriers reached statistical significance (p<0.0001).

The 2004 findings were replicated in:

Kauwe et al., 2010: J.S. Kauwe, S. Bertelsen, K. Mayo, C. Cruchaga, R. Abraham, P. Hollingworth, D. Harold, M.J. Owen, J. Williams, S. Lovestone, J.C. Morris, A.M. Goate. Suggestive synergy between genetic variants in TF and HFE as risk factors for Alzheimer's disease. Am. J. Med. Genet. B Neuropsychiatr. Genet. 153B (2010), pp. 955–959.

(It is not entirely clear whether this study is really studying the tricarriers directly. They seemed to analyze the C282Y and H63D carriers and then adjusted for APOE epsilon 4 as a covariate. This article notes that 4% of AD patients in the study were bi-carriers. It would be difficult to get a large number of tri-carriers for a study because the minor alleles in the combination only have frequencies of approximately 15%, 2% and 14%.)


Transferrin and HFE genes interact in Alzheimer's disease risk: the Epistasis Project Neurobiol Aging. 2012 Jan;33(1):202.e1-13. doi: 10.1016/j.neurobiolaging.2010.07.018. Epub 2010 Sep 2.

Not sure whether the Gs293 criteria coding was correct. It is important that it is!

Check out [PMID 24081379]; it's a cohort study, rather than a population study, but it complicates matters a bit more. The other issue is age of onset (as you indicated earlier). Greg (talk) 23:25, 25 May 2014 (UTC)

Wow, those odds ratios are huge! The study had a small number of participants, so the ratios might contract during the replication round. It appears that any combination of APOE epsilon 4, HFE and/or TF mutations and another Alzheimer risk factor will cause very large risk of cognitive impairment. The brain can only withstand so many insults.

OK, a bit more reading and I'm concerned again about the tri-carrier conclusion. First, though: the synergy factor SF is defined as the odds ratio of the combined case over the multiplied odds ratios of each factor on its own. For SNPedia (and Promethease users), it's important to know the odds ratio for these bi- or tri-carrier cases (and at what age this is calculated). The data in Kauwe et al 2010 supports an interaction between HFE and TF, but I don't see any direct statement about tri-carriers there at all (as you indicated as well). And they never state the odds ratios either, only the SF. So are we back to only the Robson 2004 paper with its 14 tri-carriers?

This is frustrating! They clearly seem to indicate the triple combo has extreme risk, though it is being described in terms of synergy factors and APOE epsilon stratification / covariates.

From the article, Transferrin and HFE genes interact in Alzheimer's disease risk: the Epistasis Project Neurobiology of Aging Volume 33, Issue 1, January 2012, Pages 202.e1–202.e13

From the Abstract, "...We replicated the reported interaction between HFE 282Y and TF C2 in the risk of AD: synergy factor, 1.75 (95% confidence interval, 1.1–2.8, p = 0.02) in Northern Europeans. The synergy factor was 3.1 (1.4–6.9; 0.007) in subjects with the APOEε4 allele."

I do not fully understand what they mean by "synergy factor", though the word synergy gives me the sense that this is large (as opposed to the word additive).

We might have to do the math ourselves.

Synergy factor = SF = 3.1 = X / {OR(HFE mutation) x OR(TF mutation) x OR(APOE epsilon 4 genotype)}

OR(APOE epsilon 4 genotype) = 2.5 (roughly)
OR(HFE mutation) = 1
OR(TF mutation) = 1

OR(tricarrier) = X = 7.7 roughly. Somehow in the original article the OR was 37.5. In any event the risk for tricarriers is large.
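The back-of-the-envelope calculation above can be written out as a minimal sketch. It assumes, as this thread does, that the synergy factor is the combined odds ratio divided by the product of the single-factor odds ratios, and that the same relation extends to three factors. The function name is hypothetical and the inputs are the rough values quoted above, not figures taken directly from the papers.

```python
def combined_odds_ratio(synergy_factor, *single_factor_ors):
    """Combined OR implied by SF = OR_combined / product(single-factor ORs)."""
    product = 1.0
    for odds_ratio in single_factor_ors:
        product *= odds_ratio
    return synergy_factor * product

# Rough values from the discussion: SF = 3.1, OR(APOE epsilon 4) ~ 2.5,
# OR(HFE mutation) = OR(TF mutation) = 1 for the main effects.
tricarrier_or = combined_odds_ratio(3.1, 2.5, 1.0, 1.0)
print(round(tricarrier_or, 2))
```

With these inputs the implied odds ratio is about 7.75, which is the "7.7 roughly" figure above.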


"To have even 50% power to replicate the interaction between HFE 282Y and TF C2 at p = 0.05 in a Northern European sample, i.e., with control allelic frequencies similar to those in Table 1, would require 2400 cases and 2400 controls. It would require an even larger dataset in other populations, which have still lower frequencies of HFE 282Y. The interaction between HFE 63HH and TF –2AA would require 1025 cases and 1025 controls to have 50% power. However, the former interaction has now been replicated twice independently, in samples totaling 2313 cases and 7065 controls, i.e., in Kauwe et al. (2010) and in the Northern Europeans of this study. This interaction is so far the only example of epistasis in AD to have been consistently replicated in such numbers."

'Former interaction' as in HFE282Y and TF C2 ? ... "has now been replicated twice independently."

Why did the recent International Genomics of Alzheimer's Project not consider this? Why do they only consider individual SNPs in GWASs? The articles mentioned that the HFE and TF mutations were not significant for AD when considered individually. IGAP had a sample of 75,000 people! Anyone know how to access the IGAP database?

   "Supplementary Table 4.
   Interaction between HFE 282Y and TF C2 in Northern Europeans, stratified by APOE ε4

   APOE ε4 status   Controls (n)   AD (n)   Adjusted* synergy factor (95% CI, p)
   ε4-positive      1427           579      3.0 (1.3–6.9, 0.008)
   ε4-negative      4066           480      1.06 (0.55–2.0, 0.87)

   HFE = the haemochromatosis gene; TF = the transferrin gene; CI = confidence interval
   * All analyses controlled for center, age, gender and genotype of apolipoprotein E ε4."

I agree - I came to the same conclusion, i.e. that I can't see how they got their large (>30) SF since the math seems to support a number between 5 - 10. I'm also not sure what these authors are using when they compare the tri/combined odds ratio; are they comparing (only) to those carrying zero of the minor/affected alleles, or to the pool of all non-tri-carriers? One other thing - I take back the comment about the equivalence of H63D and C282Y in terms of TF interaction, after reading the Robson paper more carefully. Greg (talk) 01:03, 26 May 2014 (UTC)

The numbers would work out if a Bicarrier OR were used. This would probably make more sense in this instance, as the OR (HFE mutation) and OR ( TF mutation ) = 1 for the main effects.

For OR (HFE mutation and TF mutation)= 5.

X = 3 x 2.5 x 5 = 37.5, which is exactly the result from the original article!

The synergy factor of 5 was taken from the original 2004 article abstract (synergy factor for bicarriers = 5.1). Snpedia Gs291 uses 2.71. The current article has 1.75 in Table 3 for North Europe.

This at least gets the number into range. The article is not entirely clear about which approach was used for calculating SFs.
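Since three different bicarrier values are in play, here is a small sketch of how much the implied tricarrier odds ratio moves with that choice. It assumes the thread's working formula X = SF x OR(APOE epsilon 4) x OR(bicarrier), with SF rounded to 3 and OR(APOE epsilon 4) taken as roughly 2.5. Treating each quoted bicarrier synergy factor as a bicarrier OR is also the thread's assumption (it holds only if the main-effect ORs are 1).

```python
APOE4_OR = 2.5        # rough single-factor OR used earlier in this thread
SYNERGY_FACTOR = 3.0  # rounded SF reported for APOE epsilon 4 carriers

# Candidate bicarrier values mentioned above and where each comes from.
bicarrier_values = {
    "rounded value used above": 5.0,
    "2004 abstract": 5.1,
    "gs291 text": 2.71,
    "Table 3, Northern Europeans": 1.75,
}

# Implied tricarrier OR for each candidate, under the assumed formula.
implied = {
    label: SYNERGY_FACTOR * APOE4_OR * value
    for label, value in bicarrier_values.items()
}
for label, tricarrier_or in implied.items():
    print(f"{label}: {tricarrier_or:.1f}")
```

Only the values near 5 reproduce a figure close to 37.5. The smaller bicarrier values give implied odds ratios of roughly 20 and 13.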

The criteria for gs293 now uses the epsilon 4 homozygous genoset gs216. It should be only a single carrier of the epsilon 4 genotype e.g. epsilon 2/4 or epsilon 3/4 in gs293. Epsilon 3/4 is gs141.

Is the claim made on gs141 that APOE epsilon 3/4 only increases risk for Alzheimer's by 2 times accurate?

Alzgene has the risk of 4 versus 3 as 3.68 (3.30,4.11). The first table of Alzheimer's disease, late-onset (IGAP) lists the OR as 4.89 (4.45-5.39). However, 23andme lists the relative risk around 2. Further, recent research suggests that epsilon 4 might not be as detrimental to men as it is for women.

See the text for gs188, which includes the reference cited for both a single ApoE4 and the gender-based risk difference. Note that the risk cited is the estimated "remaining lifetime risk at age 65", which for late-onset AD, is probably a reasonable risk statistic to use. Can the Alzgene risks be put in the same context, in other words, use the same method of estimating risk, or are they already? Greg (talk) 23:26, 26 May 2014 (UTC)

Wow!! apoe epsilon 4 is phase dependent? Most people do not know that. A whole lot of people might be worrying about an epsilon 4 genotype when they do not even have it. 23andme did not say anything about phasing.

Could we define this new disease as Iron Overload with cognitive impairment? Calling all forms of cognitive impairment in seniors "Alzheimer Disease" without reference to genetics, treatment and prevention options, etiology, etc. does not seem reasonable. Perhaps this is one of the reasons that Alzheimer trials have not been successful. If people have unaddressed iron issues, then simply treating amyloid might not be effective. Some of the 30% of AD patients in the antibody trial without amyloid might have had iron overload.

There is a substantial group of people with iron overload cognitive impairment genotypes who are at extreme risk for cognitive impairment. Why is there not a political movement to address this issue?

Iron Overload with cognitive impairment is a curable illness!

SNPs with high LD reporting on a single disease should include LD estimates for these SNPs. This could avoid confusion with multiple reporting.

See http://www.broadinstitute.org/mpg/snap/ldsearch.php

Does Promethease offer a service that would allow potential parents to determine possible genotypes for their children?

This would be especially important for gs293 etc. as many people with tri-carrier status would not have an overly prominent family history of Alzheimer's.

Desktop Promethease has a feature documented at Promethease/Features#Possible_Offspring. However, it does not recognize compound heterozygosity at 2 different positions in the same gene, only cases where both parents are heterozygous carriers of the same SNP. It is not yet supported in the web-based version.
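For a single SNP, the possible child genotypes follow from simple Mendelian inheritance. A minimal sketch, assuming one autosomal SNP with unphased two-letter genotypes; the function name is hypothetical and this is not the Promethease implementation.

```python
from collections import Counter
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Probabilities of child genotypes at one autosomal SNP.

    Each parent genotype is a two-letter string such as "AG"; the child
    receives one allele from each parent with equal probability.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}

# Two heterozygous carriers of the same SNP.
print(offspring_genotypes("AG", "AG"))
```

Like the desktop feature described above, this handles one SNP at a time. Combining several SNPs (for example the three loci in gs293) would need the per-SNP probabilities multiplied together, which is valid only if the loci are inherited independently.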

I am not sure whether gs291 is correct. It does not appear in Suggestive synergy between genetic variants in TF and HFE as risk factors for Alzheimer's disease Am J Med Genet Part B 153B: 955–959.

The sample size to demonstrate such a genoset would be huge. MAF of C282Y is 2%. 2% times 2% times 15% times 15% would be a very small number. What is the reference for gs291? The 2004 study had 0 C282Y homozygotes!
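The "very small number" can be made explicit. A rough sketch, assuming Hardy-Weinberg proportions, independence between the two loci, and the approximate allele frequencies quoted in this thread; the names are hypothetical.

```python
C282Y_FREQ = 0.02  # HFE C282Y minor allele frequency quoted above
TF_C2_FREQ = 0.15  # TF C2 minor allele frequency quoted above

# Under Hardy-Weinberg, the homozygote frequency is the allele frequency
# squared; independent loci multiply.
double_homozygote = (C282Y_FREQ ** 2) * (TF_C2_FREQ ** 2)

def expected_carriers(sample_size):
    """Expected number of double homozygotes in a sample of this size."""
    return double_homozygote * sample_size

print(f"{double_homozygote:.1e}")
print(f"{expected_carriers(1_000_000):.1f} per million people")
```

Under these assumptions the combination occurs in roughly 9 people per million, so a few hundred cases and controls would be expected to contain none.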

Though the 2004 study found that tricarriers of H63D, C282Y, and TF C2 had extremely high risk of AD versus all others OR=12.9 p=0.03 (CI=0.7 -242)!

Are you suggesting that gs291 is wrong because there is no literature to support a 'diagnosis' for homozygotes? If so, perhaps the wording should be softened, and changed to an 'info@snpedia.com would like to hear from you' note. But I think it's worth calling out that genotype as notable, even if the rareness means that there isn't any high confidence literature on the phenotype. This is similar to gs267 and gs189.

gs291 needs a reference. I am not sure what the source is of the "2.71 increased risk of Alzheimer's disease" statement.

I generated it based on some of the discussions above, but cannot find the citation I was apparently (mis-)using. I've deleted the page for the time being. I can easily restore it, if you will indicate what text is more appropriate for that criteria.

rs1800562 already has a page. The homozygous form GG has a magnitude 4 Bad rating. This hemochromatosis SNP is important and has been well studied. Alzheimer disease would be only one among many health related concerns for a GG carrier. The SNP page also has a magnitude 3 Bad for the AG genotype when combined with H63D, though it is not clear whether there is a genoset for this or whether anyone with the AG genotype is given the 3 Bad warning. It should not be reported in that way (if it is). 23andme carefully notes that simple carriage of C282Y has no clinical significance. rs1800562 AG notes this as true, though it seems that all AG carriers receive the 3 Bad warning.

Genoset 294 needs some work. I wanted some input on this because I gave it a bad rating of 2. I am not sure whether or not the initial result has been replicated, though if it has then perhaps a higher Bad rating should apply.

A new page should be created listing all the combinations that increase AD risk in the HFE, TF and APOE genes. All these combinations are starting to become confusing.

Snpedia should probably consider including phenotype data in its analysis. In an above reference [PMID 24081379], it was noted that the combination of APOE epsilon 4, H63D, and diabetes resulted in an Alzheimer's odds ratio (for females) of 52.0!! When submitting data for Promethease, a simple question ("Do you have diabetes? Yes or No") could be combined with the genotype files and the above tricarrier genoset would be detected. This would be consistent with the current direction in genetic studies to include phenotype data.

For the C2 variant of TF a recent meta analysis did find that it increased AD risk though not very much. Meta-analysis on the association between the TF gene rs1049296 and AD Can J Neurol Sci. 2013 Sep;40(5):691-7.

I am not sure what the homozygous risk was for the C2 allele.

Snpedia has [Rs638405]. It reports a doubling of the risk of AD among carriers of the GG genotype and APOE epsilon 4.

Should it not be higher? OR(APOE epsilon 4) = 2 and SF = 2.5 (from the article Epistasis in sporadic Alzheimer's disease, see below), so OR = 5 (at least).

This article calculated the risk as 4.7 times. The Odds ratio increased to 7 times when adjustment was made for age and gender. Specific BACE1 genotypes provide additional risk for late-onset Alzheimer disease in APOE epsilon 4 carriers.

Meta-Analysis was conducted in this article. There are 27 significant gene - gene interactions in AD noted in this article. However, most had not been replicated. Epistasis in sporadic Alzheimer’s disease Neurobiology of Aging 30 (2009) 1333–1349 Am J Med Genet B Neuropsychiatr Genet. 2003 May 15;119B(1):44-7.

It is not clear whether the 2004 report for tricarriers 282Y, C2 and APOE epsilon 4 (with a 37.5 times increased risk of AD) was in fact replicated in subsequent reports.

The 2010 report reads: "In the Optima report (2004) we had suggested that there might be a further interaction between HFE 282Y, TF C2 and APOEε4. Here we found that the interaction between these 2 iron-related SNPs only occurred in subjects with APOEε4, where the synergy factor in Northern Europeans was 3.0 (1.3–6.9; 0.008), but not in APOEε4 negatives, where the synergy factor was 1.06 (0.55–2.0; 0.87) (Supplementary Table 4). However, there were no significant interactions between APOEε4 and either SNP or both together."

Suggestive synergy between genetic variants in TF and HFE as risk factors for Alzheimer's disease Am J Med Genet B Neuropsychiatr Genet. 2010 June 5; 153B(4): 955–959. doi:10.1002/ajmg.b.31053.

What does the last sentence mean? They confirmed the interaction earlier in the paragraph, though seem not to be confirming it in the last sentence. It is important that this genoset gs293 is correct as it is rated Bad magnitude 8.

Might someone check this?

I am still not sure whether the OR for Gs293 is correct.

"A 2012 study also replicated the synergy in bicarriers of rs1049296 and rs1800562 for risk of AD (" synergy factor, 1.75 (95% confidence interval, 1.1–2.8,p =0.02) in Northern Europeans. The synergy factor was 3.1 (1.4–6.9; 0.007) in subjects with the APOE4 allele.") [PMID 20817350]"

Should the 3.1 synergy factor noted above be used as the OR? If so, why did they not simply state it as a usual OR?

In the above discussion we agreed on the 7.7 OR. If they intended it to be a true synergy factor, then the OR for gs293 should be 2.5 x 3.1 = 7.7, as originally suggested. As it is, the tricarriers are reported to have a relative increased risk of 1.8 versus bicarriers (i.e. 3.1/1.75). (This might be more plausible than the 4.4 times higher risk implied by the 7.7, i.e. 7.7/1.75.)

If so, how could the odds ratio have possibly decreased from a 37.5 times increased risk in tricarriers (p < 0.0001) in the original study to 3.1 in the latest one? What explains this? A special risk factor (genetic, lifestyle etc.)? Astonishingly, tricarriers made up 6.3% (12 of 191) of AD cases, 2.9% (2 of 69) of MCI cases, and 0% (0 of 269) of controls, even though tricarriers only have a frequency of about 0.15% (2% HFE C282Y, 15% TF C2, and 14% APOE epsilon 4).

We have not even included the HFE H63D, C282Y and TF C2 tricarriers from the original study. Of the 5 people with this tricarrier combo, 4 had AD and 1 had MCI (OR = 12.9, p = 0.03). [Not sure if this has been replicated].

These 2 tricarriers combos made up 8.4% (16 of 191) of the AD cases in the original study.
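The percentages quoted above, and a rough population frequency for the tricarrier combination, can be checked with a short sketch. The frequency estimates assume Hardy-Weinberg proportions and independent loci, and use the approximate minor allele frequencies quoted in this thread, so they are order-of-magnitude figures only. All names are hypothetical.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

# Counts quoted from the original (2004) study in the discussion above.
observed = {
    "tricarriers among AD cases": percent(12, 191),
    "tricarriers among MCI cases": percent(2, 69),
    "tricarriers among controls": percent(0, 269),
    "both tricarrier combos among AD cases": percent(16, 191),
}

# Approximate minor allele frequencies quoted in this thread.
maf = {"HFE C282Y": 0.02, "TF C2": 0.15, "APOE epsilon 4": 0.14}

allele_product = 1.0   # plain product of the allele frequencies
carrier_product = 1.0  # product of carrier frequencies, 1 - (1 - q)^2
for freq in maf.values():
    allele_product *= freq
    carrier_product *= 1.0 - (1.0 - freq) ** 2

for label, value in observed.items():
    print(f"{label}: {value:.1f}%")
print(f"product of allele frequencies: {100 * allele_product:.3f}%")
print(f"product of carrier frequencies: {100 * carrier_product:.2f}%")
```

The two simple estimates (about 0.04% and about 0.3%) bracket the "about 0.15%" figure quoted above. Either way, the expected share of tricarriers is far below the 6.3% seen among AD cases in the original study.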

What could explain such overwhelming results in the original study and such mediocre results in the followup? (n might be involved)

Is the software for SNPedia freely available so that it could be used for other similar projects?

  • Yes, see http://snpedia.com/index.php/Special:Version . If your interest was in hosting extremely similar content (ie, Chicken SNPedia) we might be happy to include the content directly in SNPedia, and info@snpedia.com would be happy to discuss. --- cariaso 14:50, 31 May 2014 (UTC)

The rs pages do not seem to directly link to the associated genosets (gs) pages. This would be a helpful feature.

Anyone know of a commercial provider that offers whole exome sequencing, whole genome sequencing, or gene chip genotyping ( e.g. exome chips) at a reasonable price on a direct to consumer basis?

Assuming you are interested in a US provider ... How about seeing if Omega Bioservices is for real? Greg (talk) 05:01, 13 July 2014 (UTC)
Full Genomes Corp is doing a Whole Genome Sequencing pilot for $1800. -- Jlick (talk) 05:58, 13 July 2014 (UTC)

Thank you very much for your suggestions! What would be a smart buy in genetic services? I have been genotyped with the 23andme V3 gene chip.

I am inclined to start with a low end exome chip and then ask for imputation services. It seems that the low end of the genetics market is where the smart money is buying. A company that makes gene chips has done extremely well serving this demand.

Full exome and genome sequencing also look appealing to me. (I might gradually work up through the products).

Would a company be able to perform haplotyping? I have downloaded Haploview. It would be interesting to have my genome at that level of organization.

I am interested in comments on what an informed customer would want in genetic services. There are many people who are amazed with the technology and just hand over their credit card without thinking. I want to be a more sophisticated consumer.

(What format does Promethease recognize? If a gene chip file included SNPs not on the 23andme platforms, though the file was formatted in 23andme style, would Promethease accept such a file?)

Promethease understands VCF files with rs#s. In an ideal world, the VCF file would contain all rs#s, not just the ones which were different from the reference.
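To illustrate what "a VCF file with rs#s" provides, here is a minimal sketch that pulls rs#s and called genotypes for the first sample out of VCF text. It assumes the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, sample) and a GT field. The function name and the example records are made up, and this is not Promethease's own parser.

```python
import io

def vcf_genotypes(vcf_file):
    """Map rs# -> called alleles for the first sample in a VCF file object."""
    calls = {}
    for line in vcf_file:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column present
        _chrom, _pos, ids, ref, alt = fields[:5]
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt = fields[9].split(":")[format_keys.index("GT")]
        alleles = [ref] + alt.split(",")
        indices = gt.replace("|", "/").split("/")
        if "." in indices:
            continue  # missing call
        genotype = "".join(alleles[int(i)] for i in indices)
        for variant_id in ids.split(";"):
            if variant_id.startswith("rs"):
                calls[variant_id] = genotype
    return calls

# Made-up records: one heterozygous call and one homozygous-reference call.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n"
    "1\t100\trs0000001\tC\tT\t50\tPASS\t.\tGT\t0/1\n"
    "1\t200\trs0000002\tG\t.\t50\tPASS\t.\tGT\t0/0\n"
)
calls = vcf_genotypes(example_vcf)
print(calls)
```

The second record is the kind of row that matters in the "ideal world" case above: a site that matches the reference still yields a genotype, but only if the file actually lists it.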

Rs1799852 Variants in TF and HFE explain approximately 40% of genetic variation in serum-transferrin levels [PMID 19084217]

This article showed that endophenotypes can be easier to determine than diseases in a GWAS. Is it acceptable in snpedia to link the endophenotype to the disease?

For example, high transferrin saturation levels when combined with high cholesterol levels have been related to Alzheimer's disease. Could the markers for high transferrin saturation be included as risk factors for AD?

sure, we'll try it out. I don't see a problem.

How to get one's exome sequenced

Hi J1,

"I am in the final stages of arranging for my exome to be sequenced. (This is overwhelmingly exciting!) Would Promethease be able to run my exome sequence to find any variants as listed on the Exome Variant Service (see above)?"

If you wouldn't mind sharing, could you let us know which company you went with?

Thanks --User:Epsilon4 2014-07-29 12:07 (UTC)

It is still quite hush hush. It will be a great relief when (if?) this is accomplished.

I know how desperate you can feel when you want your exome or genome sequenced and it seems that everyone is trying to prevent you from doing so. Even so, I do not know whether I should disclose the company's name to you because I fear that any American genomics company that develops a profile will be shut down.

I understand. Whenever you're ready (if you're ever ready), I'm ready to hear more. Email is fine, too. (See my user page, and click on "Email user" -- or whatever it says in your SNPedia's language.) --User:Epsilon4 2014-07-30 10:25 (UTC)
Promethease will only be able to give information about variants which are listed in SNPedia. --- cariaso 12:58, 29 July 2014 (UTC)
Quite right! (But SNPedia can be expanded -- indeed, is being expanded all the time!) --User:Epsilon4 2014-07-29 14:07 (UTC)
The problem with the Exome Variant Server (EVS) SNPs is that they are rare. It could take years to work them out. However, when you look on the EVS under the APOE gene, so many of the missense SNPs (in red) are hits: rs7412, rs429358 = epsilon 4, rs199768005 = epsilon 3b [extreme low AD risk allele, possibly lower than epsilon 2], etc. Exome SNPs are crucial mutations involved in disease. When you start throwing junk into your protein machines, there is quite likely going to be trouble. Sequencing one's exome makes tremendous sense.

It would be great if someone out there could download the EVS SNPs and write the code to help people find the 100-1,000? SNPs that they have likely inherited that meaningfully increase disease risk.

Yet, it probably would be more fiscally prudent to go with the exome chips genotyping. This is the cheaper approach. Unfortunately, I was unable to locate a service provider.

Does Promethease offer any imputation service? For example, when (if) my exome is sequenced, would Promethease be able to do the imputation of my son's exome from his 23andme file, my 23andme file and my exome sequence?

Anyone know what quality could be expected from a 30x, 50x or 100x exome sequence (e.g. error rates, coverage characteristics)?

Another thing that is holding things up a little is the issue of targeting outside the exome region. Apparently, exome sequencing can actually generate quite a bit of off-target sequencing (up to 50%). If I could get another 50 Mbp of high quality SNPs outside of the exome for free, then so much the better. I am just not clear how much more off-target sequence I would receive with the different coverage options (30x, 50x, 100x).

Any comments to the above or related matters would be appreciated.

Anyone know where I might be able to purchase an Oragene saliva kit from DNA Genotek online or otherwise? Should I buy the research or diagnostic version?

I have as of yet no answers to any of your questions, but I've put out some calls to people I know in the biohacker community who may know.
By the way, your page here might be more readable (to at least the people who are used to Wikipedia discussion pages) if you signed your comments, which one does by adding four tildes in a row.
--User:Epsilon4 2014-07-31 09:31 (UTC)

Exome Scan has arrived!!![edit]

I am thrilled to have just received the exome scan. Anyone with knowledge of the Basespace platform, please help me.

The company that did the exome sequencing shared the results in a Basespace account. However, the company is still listed as the owner of the project. When I try to launch the VariantStudio App, nothing happens. The VCF file that is available from the sequencing does not appear to be accessible to the app.

Any advice would be greatly appreciated.

no particular experience, but plenty of interest in assisting. I'm eager to learn how to integrate promethease and basespace. --- cariaso 14:39, 13 September 2014 (UTC)
Integrating Promethease and basespace would be easy. basespace has an apps section (for example, the app VariantStudio is in this section). Promethease could be easily included there. basespace from Illumina (the current leader in genome sequencing) is a complete genomics analysis tool.

Many of the apps on basespace are currently free, though probably not for long. To be competitive, an upgraded version of Promethease could be included. It would be helpful in such an upgrade to include variants from the Exome Variant Server (which I have suggested before). Currently, the main limitation of exome/genome analysis is the interpretation of large numbers of rare variants. I am struggling with this problem now in trying to understand the exome results I recently received. Finding the disease causing variants from among the over 60,000 mutations in the scan is going to require a real effort. Perhaps Promethease could partner with one of the other apps, in order to provide a complete genome analysis package.

Exome Results: How do you Resequence?[edit]

After using the filtering software, some very interesting results have emerged from the recent exome scan. A few of the interesting variants only had a few alternative reads and so they were not judged to be of high quality. Other interesting variants were of high quality, though it might be worthwhile to double check them on another run.

If the reported results for these variants were to be confirmed, then it would be helpful to have other family members tested at these loci.

How does one go about running a custom SNP panel of perhaps 100-1000 SNPs on 5-10 people? The minor alleles of the SNPs of interest are very rare and they are not likely to be on a genechip. Many of the SNPs do not even have rs numbers! Should one genotype or sequence such SNPs?

What genetic technology would be most suitable? J1 (talk) 20:12, 14 September 2014 (UTC)

Understanding VCF files[edit]

Here is some output for the off-target exome VCF file.

"chr1 14600 . C . . LowGQX;LowMQ END=14606;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:266:18:4

chr1 14607 . A . . LowGQX;LowMQ END=14609;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:261:24:5

chr1 14610 . T . . LowGQX;LowMQ . GT:DP:GQX:MQ 0/0:254:2:5

chr1 14653 . C T 282.10 LowMQ;LowQD;SB BaseQRankSum=0.500;DP=177;Dels=0.00;FS=9.863;HRun=0;HaplotypeScore=0.0000;MQ=10;MQ0=138;MQRankSum=-1.328;QD=1.59;ReadPosRankSum=-1.613;SB=-0.01;CSQT=WASH7P|NR_024540.1|non_coding_exon_variant:nc_transcript_variant,DDX11L1|NR_046018.2|downstream_gene_variant GT:AD:DP:GQ:PL:MQ:GQX:VF 0/1:142,35:177:99:312,0,241:10:99:0.198"

I am having some trouble figuring this out. The last locus at 1:14653 seemed to have read out at C/T.

It appears that the first three loci were discarded, or they all read homozygous reference (i.e. 0/0). The first three SNPs all had over 250 reads (DP>250).

I think GQX is a quality score. I do not understand how the quality is so low for the first three SNPs (18, 24, 2). What does the GQX score mean? Help interpreting these results would be appreciated.

The VCF file with these off target results is over 300 MB in size. How might I determine how many of these SNPs actually were high quality results? (Perhaps upload to Promethease and see how many it counts?)
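
One way to approach the "how many are high quality" question without uploading anything is to tally the FILTER column (the seventh field), since a record that passed every filter carries PASS there. The following is only a minimal sketch under stated assumptions: it expects a whitespace-delimited, single-sample VCF like the excerpt above, and the function names parse_sample and summarize_vcf are made up for illustration. As far as I know, GQX in Illumina gVCF output is a conservative genotype quality (the lower of the variant and non-variant genotype qualities), and MQ values as low as 4 to 10 suggest the reads do not map uniquely at these positions, which would explain a low score despite high depth. Both readings are assumptions to check against the sequencing provider's documentation.

```python
from collections import Counter


def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (e.g. GT:DP:GQX:MQ) with the sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def summarize_vcf(lines):
    """Count records, PASS records, and how often each filter flag appears."""
    total = 0
    passed = 0
    flags = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        if len(fields) < 10:
            continue  # not a complete single-sample record
        total += 1
        filter_field = fields[6]
        if filter_field == "PASS":
            passed += 1
        elif filter_field != ".":
            flags.update(filter_field.split(";"))
    return total, passed, flags


# The four records quoted above (INFO shortened on the last one).
excerpt = [
    "chr1 14600 . C . . LowGQX;LowMQ END=14606;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:266:18:4",
    "chr1 14607 . A . . LowGQX;LowMQ END=14609;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:261:24:5",
    "chr1 14610 . T . . LowGQX;LowMQ . GT:DP:GQX:MQ 0/0:254:2:5",
    "chr1 14653 . C T 282.10 LowMQ;LowQD;SB DP=177;MQ=10;QD=1.59 GT:AD:DP:GQ 0/1:142,35:177:99",
]

total, passed, flags = summarize_vcf(excerpt)
print(total, passed, dict(flags))
print(parse_sample("GT:DP:GQX:MQ", "0/0:266:18:4"))
```

For the real 300 MB file, an open file handle can be passed in place of the list, so the file is read line by line rather than loaded whole.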

Bioinformatics resources on Promethease would be helpful. For example, I now have a 23andme file, 1 exome on target VCF file, 1 exome off-target VCF file, and a 23andme file for my son. It would be useful if Promethease could merge my files and check for concordances between my 23andme file and my exome VCF file etc. Does Promethease allow more than 1 file to be uploaded for a report?

Desktop Promethease does. put in all 3 files during the first screen. --- cariaso 23:27, 15 September 2014 (UTC)

I tried to run Promethease with my 3 genotype files. I first entered my 23andme file. After the file loaded Promethease requested payment. I wanted to input my three files: 23andme, exome on target VCF file, and exome off target VCF file.

What should I do so that all 3 of these read in for my Promethease? J1 (talk) 04:05, 23 September 2014 (UTC)

On the first screen, when you normally put in just your 23andMe data, instead put in all 3 files. --- cariaso 14:38, 23 September 2014 (UTC)

Promethease only allowed 1 upload file before requesting payment information.
No that is wrong. See [File:Promethease_Desktop_with_2_files.png]

Should I just copy and paste all three files into one file and then submit this to Promethease?


Would Promethease understand a file that was composed of 2 vcf files and 1 23andme file?


(I tried to copy and paste the off target VCF. This file is almost 400 MB and it crashed my computer.) The on target VCF file and the 23andme file did merge into a VCF file. Would Promethease understand this file?

The formats are too different. I don't know how you merged them. It's possible it could understand the result if you were consistent, but there is no need to do it this way, and it's error-prone.

J1 (talk) 03:12, 24 September 2014 (UTC)

Concerned about 3'UTR Variant[edit]

One of the variants from the recent exome scan is especially worrisome. This variant is in a gene of concern and is very rare. There is no rs number for it.

A web site called MutationTaster ran an analysis and determined that the variant was pathogenic. (However, the site notes that its analysis for this particular type of mutation [a splice site mutation] is only 70% accurate.)

MutationTaster reported that the mutation occurred in the 3'UTR region, the regulatory features were H3K36me3, Histone, Histone 3 Lysine 36 Tri-Methylation, at the splice site Donor increased, Model: without_aae {this is explained "as 'silent' (non-synonymous or intronic) alterations (without_aae model)"}. The above description is very unclear to me. If anyone could clarify the meaning it would be appreciated. Suggestions for visualizing genes with the exons, introns, 3'UTR regions etc. would also be welcome. Most of the tools online appear to be very cluttered and do not display these gene features clearly.

http://macarthurlab.org/lof/ http://genomesunzipped.org/2012/02/all-genomes-are-dysfunctional-broken-genes-in-healthy-individuals.php --- cariaso 18:06, 19 September 2014 (UTC)

rs4129148 not in Psychiatric Genomics Consortium[edit]

In July a massive schizophrenia GWAS was reported in Nature noting over 100 loci.



{see page 23}

The largest risk effect of the SNPs was 1.3. rs4129148 was not reported. No SNP on chromosome Y was included.

However, rs4129148 has been previously replicated.

Should rs4129148 still be considered a valid SNP for schizophrenia?

That is not for me to decide, but I would strongly encourage you to mention this on rs4129148. --- cariaso 17:11, 4 October 2014 (UTC)

Promethease Compound Uploads[edit]

I have asked about this before (I cannot find it on my pages): uploading more than one file to Promethease. I still cannot figure it out.

1. I go to the Promethease site.

2. It asks me if I want to upload from 23andme or from my computer.

  [I choose my computer.]

3. My file uploads. Promethease then asks for a $5 payment.

  I want to upload a 23andme file, and 2 VCF files from my exome!

What am I doing wrong?

It isn't yet supported. I'm working towards it http://authenticjobs.com/jobs/22793/flask , but I have a very long list of things to do. --- cariaso 03:26, 29 November 2014 (UTC)

Here's my idea. Why not add all files together before uploading to Promethease?

There are free file conversion programs available online. For example, 23andme files could be converted to a VCF file and this could be merged with another VCF file. The values needed for the 23andme --> VCF conversion could be derived from the 23andme platform. (Or perhaps a program could simply read in the two quality scores for the SNPs and decide which is more likely if there is a discrepancy between the two genotypes.)
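
The comparison step of this idea can be sketched in a few lines. This is a rough illustration under several assumptions, not a tested converter: both files are taken to use the same genome build and strand, only single-base calls are compared, gVCF block records are read at their start position only, and the function names and demo rsids are invented. Since the 23andme raw download lists only rsid, chromosome, position and genotype, there is no 23andme quality score to weigh, so the sketch simply reports where the two sources agree or disagree.

```python
def strip_chr(chrom):
    """Normalise 'chr1' and '1' to the same chromosome name."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def parse_23andme(lines):
    """Map (chrom, pos) -> sorted genotype string, skipping comments and no-calls."""
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split()[:4]
        if "-" in genotype:
            continue  # no-call
        calls[(strip_chr(chrom), int(pos))] = "".join(sorted(genotype))
    return calls


def parse_vcf(lines):
    """Map (chrom, pos) -> sorted genotype string for single-base calls."""
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        if len(f) < 10:
            continue
        ref, alt = f[3], f[4]
        alleles = [ref] + ([] if alt == "." else alt.split(","))
        gt = dict(zip(f[8].split(":"), f[9].split(":"))).get("GT", ".")
        gt = gt.replace("|", "/")
        if "." in gt:
            continue  # missing genotype
        called = [alleles[int(i)] for i in gt.split("/")]
        if any(len(a) != 1 for a in called):
            continue  # compare single-base calls only
        calls[(strip_chr(f[0]), int(f[1]))] = "".join(sorted(called))
    return calls


def compare(a, b):
    """Return positions present in both sources, split into agreeing and differing."""
    shared = a.keys() & b.keys()
    agree = sorted(k for k in shared if a[k] == b[k])
    differ = sorted(k for k in shared if a[k] != b[k])
    return agree, differ


# Invented demo data; the rsids are placeholders, not real identifiers.
demo_23andme = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs_demo1\t1\t14653\tCT",
    "rs_demo2\t1\t14610\tCT",
    "rs_demo3\t1\t20000\t--",
]
demo_vcf = [
    "chr1 14610 . T . . LowGQX;LowMQ . GT:DP:GQX:MQ 0/0:254:2:5",
    "chr1 14653 . C T 282.10 LowMQ;LowQD;SB DP=177 GT:AD:DP 0/1:142,35:177",
]

agree, differ = compare(parse_23andme(demo_23andme), parse_vcf(demo_vcf))
print("agree:", agree)
print("differ:", differ)
```

Listing the disagreements for manual review seems safer than letting a program pick a winner, at least until the VCF filter flags are taken into account.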

This could save considerable effort trying to rework the software on the Promethease server.

I want a snpedia T-shirt and coffee mug for this idea!

Promethease and Imprinting[edit]

Does snpedia or Promethease include information on imprinting? Is it possible to provide this information?

Regions of chromosomes that are imprinted from mother or father could be highlighted. Knowing which of the two alleles in a gene was active could be very useful.

Imprinting in mammals is supposed to be fairly rare, but on a quick search I found this table http://igc.otago.ac.nz/1101Summary-table.pdf from http://igc.otago.ac.nz/home.html - it's possible there's some more recent data too, but don't really have time to research that just now. Due to the very small amount of imprinted genes, it seems like the most reasonable course of action would be to just make a note of the imprinting on the gene and/or SNP page. Though if codified via a template, Promethease might make use of that information some day. This will run into the generic issue of SNPedia only cataloging variations of which there is something known though; Promethease would thus not annotate imprinted variations of which nothing (else) is known. This is the current philosophy though, and I can't imagine it making sense to change that, with dbSNP having 113 million registered SNP's at present, there would be no point to listing them all even when nothing is known about their effect and they aren't included on genotyping chips so there's little potential for gaining it. There are other annotation tools - most notably Ensembl - which can do overall annotation of huge number of variants of which only genomic data is known. --Donwulff (talk) 21:20, 14 December 2014 (UTC)


Does Promethease or 23andme report copy number variations? Having a deletion on a chromosome often leads to severe illness, so it would be helpful to have this information.

No, since it's not in those data files. Greg (talk) 00:31, 9 December 2014 (UTC)

Are there online programs that would take an exome or 23andme file and report any deletions (or additions)?

If a customer from 23andme with a large deletion (for example, one associated with schizophrenia) were genotyped, would their genotype in the deletion region read (for example) C-, G-, G-, etc.? [Assuming the deletion is only on one chromosome.] Or might the genotype read back as CC, GG, GG, etc., that is, as a run of homozygosity?
This is from my own 23andMe file:
grep -v "[#XYM]" genome_N_N_Full_20140314155714.txt | cut -f4 | sort | uniq -c
 20702 --
147296 AA
 24656 AC
108892 AG
   614 AT
172924 CC
   997 CG
108940 CT
   147 DD
    25 DI
172106 GG
 24859 GT
   675 II
147548 TT
So on 23andMe No Call is always on both sides, which makes sense given the genotyping technology relies on clustering the results into one of three bins - zero, one or two of the variant. The sides aren't read individually, and if the result is too far from any of the clusters, it reads as No Call. But what's that on the list? Yes, that's right, 23andMe has already included in the raw data 847 (For me, some may read as NC) indels based on them clustering to such a cluster. This gives reason to assume most No Calls probably aren't common indels, because if they were, 23andMe would already report them as such.
Note that technically these usually aren't Copy Number Variations - common beadchip technology such as 23andMe/Ancestry/FTDNA Family Finder doesn't suit for short repeating sequences, because the repeating sequence could bind at any location on the probe. One method to try to get to deletions they don't detect would be to impute the microarray genotypes into 1000 Genomes data; the SNP variants might reveal a deletion that's always inherited with said variant. Although in that case the association testing would likely have revealed the tag SNP instead of the deletion, and it's useless for novel deletions.
Some other folks have been suggesting looking at pattern of non-mendelian inheritance of SNP's between parents and child, on the theory that copy number variants are likely to be called as homozygous for the remaining allele. See http://genomesunzipped.org/2010/08/dude-where-are-my-copy-number-variants.php - although I don't know if there's any real research supporting that conclusion, it does seem credible. This would of course require the deletion is specifically over the probed variant(s). --Donwulff (talk) 06:07, 9 December 2014 (UTC)
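
The parent/child idea can be sketched roughly as follows. It rests on the assumption described above: a child with a deletion on one chromosome appears homozygous for the remaining allele, so a SNP where parent and child share no allele at all is either a genotyping error or a candidate for a deletion. The function names and rsids are invented for the example, and the sketch assumes autosomal SNPs keyed by rsid in both files.

```python
def shares_allele(parent_gt, child_gt):
    """True if the two genotypes have at least one allele in common."""
    return bool(set(parent_gt) & set(child_gt))


def mendelian_conflicts(parent_calls, child_calls):
    """List rsids where the child could not have inherited an allele from this parent."""
    conflicts = []
    for rsid, child_gt in child_calls.items():
        parent_gt = parent_calls.get(rsid)
        if parent_gt is None:
            continue  # SNP not present in the parent's file
        if "-" in parent_gt or "-" in child_gt:
            continue  # no-call in either file
        if not shares_allele(parent_gt, child_gt):
            conflicts.append(rsid)
    return conflicts


# Invented genotypes; the rsids are placeholders.
parent = {"rsA": "AA", "rsB": "CT", "rsC": "GG", "rsD": "--"}
child = {"rsA": "GG", "rsB": "CC", "rsC": "GG", "rsD": "TT"}

print(mendelian_conflicts(parent, child))
```

A single conflict is more likely an error than a deletion; a run of conflicts at neighbouring SNPs would be the more interesting signal.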

Thank you very much for your reply. There are so many layers of genetic information in the genome. I was worried that CNV was a whole new perspective that had been missed. However, it seems that CNVs are often benign.

Another variety of genome variant I am interested in is short sequence repeat (SSR), in particular rs10524523.

see http://www.alzheimersanddementia.com/article/S1552-5260%2814%2902470-4/pdf

snpedia reports that rs10524523 is on the 23andme genechip. It is not on my version. How does one find the SNPs on different versions of their genechip? Can one switch the version showing for the Mendel family example in Browse mode?

I have downloaded the NCBI Genome Workbench software. I can download the reference sequence from their website. It would be very cool if I could add a track with my 23andme file or exome file and ask for the variants.

If this software feature is not available I might just have to write the code myself, though it should not be much trouble.

I don't see SNPedia's page having or having had rs10524523 on 23andMe chip, but SNPedia's microarray information seems often out of date. It could also be under one of the Illumina i* - codes. SNPedia has it listed as "Status: Deleted" though, which surely can't be correct? I don't find anything to that effect on dbSNP or other sources. When in doubt I frequently use the Broad Institute SNAP tool at https://www.broadinstitute.org/mpg/snap/doc.php - in this case it does give rs34992067 and rs71685227 as aliases of rs10524523 (Ensembl agrees), but unfortunately none of those three exist in the databases available with the tool. 1000 Genomes Phase 3 does have it, at nearly 1 percent: "19 45403048 rs10524523 CTTTTTTTTTTTTTTTTTTT C 100 PASS AC=4990;AF=0.996406;AN=5008;NS=2504;DP=18112;EAS_AF=0.999;AMR_AF=1;AFR_AF=0.9924;EUR_AF=0.997;SAS_AF=0.9959". Could look at it via HaploView, but not sure that makes sense. --Donwulff (talk) 22:02, 14 December 2014 (UTC)
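
For reading a line like the 1000 Genomes record quoted above, it helps to remember that in VCF the AF value is the frequency of the ALT allele (here the short C), so the long REF allele is the rare one. A small sketch, with parse_info being a made-up helper name:

```python
def parse_info(info):
    """Turn 'KEY=value;FLAG;...' into a dict; bare flags map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


info = parse_info(
    "AC=4990;AF=0.996406;AN=5008;NS=2504;DP=18112;"
    "EAS_AF=0.999;AMR_AF=1;AFR_AF=0.9924;EUR_AF=0.997;SAS_AF=0.9959"
)
alt_freq = float(info["AF"])
ref_freq = 1.0 - alt_freq
print(f"ALT allele frequency: {alt_freq:.4f}; REF allele frequency: {ref_freq:.4f}")
```

The same number falls out of AC/AN (4990 of 5008 chromosomes carry the ALT allele), which puts the REF allele at roughly 0.4 percent.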

LIPC rs121912502 and heterozygous recessive disease risk[edit]

The Lipase deficiency caused by this SNP is recessive. Does that mean that the CT genotype has no deleterious effect? The snpedia entry for CT has not been completed.

If all such recessive SNPs have no deleterious effect, then it would be enormously helpful if snpedia could find a database containing all such recessive SNPs and automatically fill in all the heterozygous genotypes as having a magnitude 0.

I have been poring over an exome file for months. There are tens of thousands of variants. Many of them are extremely rare and look very scary. It would help enormously if Promethease could help eliminate from consideration all those heterozygous recessive SNPs that confer no disease risk.
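
A filter along these lines could be sketched as below. It is only an illustration: the list of recessive variants would have to come from a curated database, which is assumed here rather than named, and the function name and the demo entries other than rs121912502 are invented. One caveat is built in: two different heterozygous recessive variants in the same gene can act together (compound heterozygosity), so those are kept for review rather than dismissed.

```python
from collections import defaultdict


def triage_recessive(variants, recessive_ids):
    """Split variants into likely carrier-only calls and calls needing review.

    variants: iterable of (variant_id, gene, zygosity), zygosity 'het' or 'hom'.
    recessive_ids: set of variant ids believed to act recessively (assumed input).
    """
    het_by_gene = defaultdict(list)
    review = []
    for variant_id, gene, zygosity in variants:
        if variant_id not in recessive_ids or zygosity == "hom":
            review.append(variant_id)  # not known recessive, or homozygous
        else:
            het_by_gene[gene].append(variant_id)
    carrier = []
    for gene, ids in het_by_gene.items():
        if len(ids) >= 2:
            review.extend(ids)  # possible compound heterozygote
        else:
            carrier.extend(ids)
    return carrier, review


# Demo input: only rs121912502 comes from this page; the rest are placeholders.
demo_variants = [
    ("rs121912502", "LIPC", "het"),
    ("var_1", "GENE_A", "het"),
    ("var_2", "GENE_A", "het"),
    ("var_3", "GENE_B", "hom"),
    ("var_4", "GENE_C", "het"),
]
demo_recessive = {"rs121912502", "var_1", "var_2", "var_3"}

carrier, review = triage_recessive(demo_variants, demo_recessive)
print("carrier only:", carrier)
print("needs review:", review)
```

Whether a carrier-only call can really be scored as having no effect depends on the condition, so the carrier list is a starting point for lowering priority, not for deleting entries.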

Anyone know what Invalid or virtual namespace -1 given. means?[edit]

I tried to add the AG genotype for rs1129844 and I got the error "Invalid or virtual namespace -1 given." What does that mean?

A known issue with the ConfirmEdit extension which will be fixed whenever I rebuild SNPedia with latest release code. Which I might as well do now. --- cariaso 20:04, 19 July 2015 (UTC)

Genetics of Intelligence (Part II)[edit]

We might be approaching a time when Promethease could give users meaningful insights into cognitive ability through genetic analysis.

From the Nature article below: "However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model."

The SNP discussed in this article was rs28379706 (C allele raised IQ and explained 0.16% of variance in IQ)

http://www.ncbi.nlm.nih.gov/pubmed/25644384?dopt=Abstract&holding=npg http://www.nature.com/mp/journal/vaop/ncurrent/full/mp2015108a.html

I wonder whether snpedia could help out with solving the genetics of intelligence.

If we could find a way for people on snpedia to be genotyped on the genechip in the above Nature article perhaps we could increase the sample size. The study only had 3200 normal IQ individuals and 1400 with IQs over 170. And, of course this being snpedia there might be a few in the mega intelligent group, though presumably those with IQs over 170 would already be aware of this fact and could disclose it on Promethease.

The exonic gene chip used for this study was probably not expensive: some only cost $50. This would be a very interesting project to work on. Anyone interested? Anyone know where we could be genotyped with the Illumina HumanExome-12v1_A array ? This should not be expensive!

We could make a meaningful contribution to unraveling this very important question.

Alzheimer's Link to Copper[edit]

fyi http://www.healthcanal.com/cancers/prostate-cancer/69277-common-treatment-for-prostate-cancer-appears-to-double-alzheimer%E2%80%99s-risk.html

Thank you very much for the reference. I am also now very interested in the copper connection to Alzheimer's.

I wonder whether it could be so obvious? Dementia is now a universal aspect of aging, though in the not too distant past and still in some undeveloped or other nations dementia is not very common. Could it simply be the inorganic copper in our plumbing that has caused this pandemic?


What chip am I on?[edit]

Having a genoset that returned which genechip was present would be very helpful. Many people are not sure whether they are on the v3 or v4 chip.

your logic for gs320 fails for people who've had full genome sequencing. please add some common i# snps to fix that. --- cariaso 04:56, 26 May 2016 (UTC)

Not sure about the syntax of Gs320. Brackets and commas might be out.

Why is the T allele of Rs121964976 not showing on the rs page?[edit]

dbSNP shows the minor allele of Rs121964976. This seems to be the pathogenic form of the SNP. Why did the snpedia bot give the alleles as C or G?

All caught up and corrected now. The (cDNA) G>C change is pathogenic; the G>A change is benign (so there's no particular reason to mention it in SNPedia). Greg (talk) 20:21, 19 April 2017 (UTC)

Additional variants associated with HIV progression[edit]

While CCR5 variants are of importance in the progression of AIDS,

"The homozygous CCR5 Δ32/Δ32 genotype and complex heterozygotes with other rare amino acid mutations confers near complete resistance to HIV infection"

they only explain 20% of the genetic variance.

(GWAS) performed in HIV-1 cohorts have shown that the HLA region and the chemokine receptor CCR5 gene have major roles in control of HIV-1 replication and disease progression—together they explain approximately 20% of genetic variability...

Other cytokine genes appear to explain more of this variance.

"These findings from GWAS highlighted the leading role of chemokine receptors among non-HLA genes in HIV-1 pathogenesis"

"CCR3, CCR8 and CCRL2 may contribute additional genetic regulation of HIV-1 disease in addition to that conferred by the major HIV-1 coreceptor gene CCR5."

Table 1

Characteristics and allele frequencies of the chemokine receptor variants.

Gene    Variant           dbSNP ID   Chromosome position  Domain         EA freq.  AA freq.
CX3CR1  CX3CR1-V249I      rs3732379  39282260             TM6            0.261     0.140
CX3CR1  CX3CR1-T280M      rs3732378  39282166             TM7            0.159     0.040
CCR8    CCR8-A27G         rs2853699  39348906             N-terminus     0.306     0.144
CXCR6   CXCR6-E3K         rs2234355  45962984             N-terminus     0.006     0.438
CCR3    CCR3-255C (Y17Y)  rs4987053  46281704             N-terminus     0.083     0.103
CCR3    CCR3-P39L         rs5742906  46281769             TM 1           0.004     0.013
CCR2    CCR2-V64I         rs1799864  46374212             TM 1           0.1       0.15
CCR5    CCR5-2459A        rs1799987  46386939             promoter       0.56      0.42
CCR5    CCR5-Δ32          rs333      46389951             Intracellular  0.1       0.01
CCRL2   CCRL2-Y167F       rs3204849  46425074             3rd ECL        0.394     0.134
CCRL2   CCRL2-I243V       rs3204850  46425301             TM 6           0.085     0.016

PLoS Genet. 2011 Oct; 7(10): e1002328.

Any way to use an exome file to determine HLA type?[edit]

HLA types could be of value to people, for example some illnesses respond to certain treatments based on HLA type. Anyone know what the SNPs of interest are to do this?