
User:J1

From SNPedia

Exome Variant Server

Including the SNPs from the Exome Variant Server (located at http://evs.gs.washington.edu/EVS/) would be a great addition to Promethease. This site lists rare mutations for genes along with a PolyPhen-2 score, which attempts to predict how damaging each mutation is. GWAS have yet to be done on many of these mutations, so their deleterious (or salutary) nature has yet to be confirmed. However, it would be useful to know whether one carries any of these rare missense mutations in key disease genes.

Consider the journal article below.

ApoE variant p.V236E is associated with markedly reduced risk of Alzheimer's disease Molecular Neurodegeneration 2014,9:11 doi:10.1186/1750-1326-9-11


The method the study used to select SNPs can be replicated on the EVS site: search for the APOE gene, choose GVS Function in the "Sorts by Variant" pull-down box, then choose Show 50 entries. The SNPs highlighted in reddish orange are missense mutations. The study investigated rs769452 (on 23andme), rs199768005, and rs769455, which are the most frequently occurring missense mutations in APOE (look down the EA Allele # column with the reddish orange background) aside from rs7412 and rs429358 [these two define the APOE epsilon2/epsilon3/epsilon4 alleles (strangely, EVS lists rs429358 as benign)]. The study found that the rare SNP rs199768005 forms an APOE epsilon3b haplotype that markedly reduces the risk of Alzheimer's disease (OR = 0.10).
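The selection step described above (keep missense variants, sort by European American allele count, skip the epsilon SNPs, take the top 3) can be sketched in Python. This is a hypothetical helper, not an EVS feature: the record fields `rsid`, `function` and `ea_alt_count` are assumptions standing in for columns copied out of the EVS gene view.

```python
from typing import NamedTuple


class EvsVariant(NamedTuple):
    """One row copied from the EVS gene view (field names are assumptions)."""
    rsid: str
    function: str       # GVS function class, e.g. "missense"
    ea_alt_count: int   # alternate-allele count in European Americans


# rs7412 and rs429358 define the APOE epsilon alleles, so they are skipped.
EPSILON_SNPS = frozenset({"rs7412", "rs429358"})


def most_common_missense(variants, n=3, exclude=EPSILON_SNPS):
    """Return the n missense variants with the highest EA allele counts."""
    missense = [v for v in variants
                if v.function == "missense" and v.rsid not in exclude]
    return sorted(missense, key=lambda v: v.ea_alt_count, reverse=True)[:n]
```

The same function works for any other gene; only the exclusion set is APOE-specific.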

These mutations are very rare. However, a similar search could be done for any other gene! Collectively, these rare mutations could explain a substantial amount of disease risk (especially those with large effect sizes). Many of these SNPs are not on the 23andme gene chip. However, rs9331936 and rs9331938 (considered benign) from CLU are, as is rs3752233 from ABCA7. With a full genome scan in hand, the entire EVS database would be callable. Many of these mutations are rare enough that they are not in AlzGene either.


I don't think the study found that rs62256378 AA reduced AD risk; the SNP was shown to slow cognitive decline in AD patients.

Promethease could include these rare mutations in reports without waiting for GWAS confirmation.

EVS Response


http://evs.gs.washington.edu/evs_bulk_data/ESP6500SI-V2-SSA137.protein-hgvs-update.snps_indels.vcf.tar.gz

shows me ~1,889,348 unique SNPs with rs numbers. 272,781 of these have what appears to be the most severe classification, 'probably-damaging'. Adding all of those would quadruple the size of SNPedia. I'm not completely opposed, but I would hope to find a way to filter them further so that the signal-to-noise ratio improves. I've run a Promethease report based on the ESP SNPs that are already in SNPedia, and it is now available (View or download).
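Counts like the ones quoted above could in principle be reproduced with a short script. This is a sketch, not the method actually used: it assumes the archive holds plain-text `.vcf` files, takes rs numbers from the VCF ID column (column 3), and flags a SNP as 'probably-damaging' whenever that text appears anywhere in its INFO column (column 8), rather than relying on a particular INFO field name.

```python
import tarfile


def iter_vcf_lines(archive_path):
    """Yield text lines from every .vcf file inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".vcf"):
                with tar.extractfile(member) as handle:
                    for raw in handle:
                        yield raw.decode("utf-8", errors="replace").rstrip("\n")


def count_rsids(lines, label="probably-damaging"):
    """Return (unique rs IDs, unique rs IDs whose INFO mentions `label`)."""
    all_ids, flagged = set(), set()
    for line in lines:
        if line.startswith("#"):
            continue  # VCF meta-information and header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # malformed or truncated record
        # The ID column may hold several IDs separated by ';' or just '.'.
        rsids = [i for i in fields[2].split(";") if i.startswith("rs")]
        all_ids.update(rsids)
        if label in fields[7]:
            flagged.update(rsids)
    return len(all_ids), len(flagged)
```

Usage would look like `count_rsids(iter_vcf_lines("ESP6500SI-V2-SSA137.protein-hgvs-update.snps_indels.vcf.tar.gz"))`. Counting unique IDs with sets means a SNP listed on several lines (for example, once per alternate allele) is counted only once.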



The SNPs in the Exome Variant Server (EVS) are likely to be very informative about possible disease risk. The missense SNPs listed in EVS are all in the exome! (It would be best to include all the missense SNPs, because quite a few of them seem to be miscategorized.) All of these SNPs cause amino acid substitutions in proteins. That sounds dangerous! Further, these SNPs are probably the causal mutations that everyone has been searching for. The article notes that these mutations come close to a definitive list of all exonic mutations that occur at a frequency of approximately 0.1% or greater.

Precisely because these SNPs are so potentially dangerous, there might not be much noise; it might be all signal. The article referenced above simply searched for the 3 most common missense SNPs in APOE (aside from the epsilon SNPs) and made 1 discovery, with 1 more needing further research. Not bad: 1+ out of 3. Rare missense SNPs might commonly lead to such discoveries. Now think of all the hundreds of thousands of SNPs on most gene chips that have no discernible significance. Gene chips are almost all noise and no signal. For many diseases, geneticists do not even bother including common SNPs in their models because they add little (if any) information.

If you really wanted to filter, though, you could include only the missense SNPs in disease genes. For Alzheimer's disease, for example, you could search for all the Alzheimer SNPs in Promethease, determine which genes they are in, and then add all the missense exome SNPs from those genes to the Alzheimer tab. If you included missense SNPs from genes without a disease association, it would not be clear what illness might result (though it would still be best to include all the missense SNPs).
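The gene-based filter proposed above amounts to a join between two tables. A minimal sketch, assuming two hypothetical inputs: a mapping from each disease to the genes of its known SNPs, and EVS variants as `(rsid, gene, function)` tuples. Neither format comes from Promethease or EVS.

```python
def missense_in_disease_genes(disease_snp_genes, evs_variants):
    """Group EVS missense variants under every disease whose genes they hit.

    disease_snp_genes: disease name -> iterable of gene symbols (hypothetical).
    evs_variants: iterable of (rsid, gene, function) tuples (hypothetical).
    Returns disease name -> sorted list of rs IDs.
    """
    # Invert the disease -> genes table so each variant needs one lookup.
    gene_to_diseases = {}
    for disease, genes in disease_snp_genes.items():
        for gene in genes:
            gene_to_diseases.setdefault(gene, set()).add(disease)

    result = {disease: set() for disease in disease_snp_genes}
    for rsid, gene, function in evs_variants:
        if function != "missense":
            continue
        for disease in gene_to_diseases.get(gene, ()):
            result[disease].add(rsid)
    return {disease: sorted(ids) for disease, ids in result.items()}
```

A gene linked to several diseases puts its missense variants under each of them, which matches the idea of one tab per disease.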


These exonic missense SNPs are what Promethease customers are most interested in. However, most of these SNPs are rare (likely because they are not adaptive). Even if 1 million of these missense SNPs, with an average frequency of 0.1%, were added to Promethease, a typical Promethease report would only have on the order of 1,000 of them (1,000,000 × 0.1%). (Many of these SNPs are not on 23andme.)
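The back-of-the-envelope estimate above can be made explicit. This sketch assumes every variant has the same frequency and that genotypes follow Hardy-Weinberg proportions; both are simplifications. It also shows that the answer depends on whether "frequency" means the fraction of people who carry a variant or the allele frequency.

```python
def expected_carried(n_variants, freq, freq_is_allele=True):
    """Expected number of rare variants one person carries (at least one copy)."""
    if freq_is_allele:
        # Probability that at least one of a person's two alleles is the variant.
        carrier_prob = 1 - (1 - freq) ** 2
    else:
        # freq is already the fraction of people who carry the variant.
        carrier_prob = freq
    return n_variants * carrier_prob
```

With 1,000,000 variants, a 0.1% carrier frequency gives about 1,000 variants per report, and a 0.1% allele frequency gives roughly twice that (about 1,999).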

It would be amazing if 23andme offered an Exome SNP chip service that included 1-1.89 million of these SNPs. (Illumina has an Exome chip.) It would be worth $100 to have this information.

It would also be great if Promethease included these SNPs and combined them with family genotyping. If a family with a genetic illness ran a Promethease scan that included these rare EVS SNPs and found a correspondence between the illness and the rare SNPs, this would be very helpful information for them. It would be especially helpful if full genome scans were done so that all of the rare EVS SNPs could be called.
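The family comparison described above is essentially a co-segregation filter. A minimal sketch, assuming a single, fully penetrant causal variant, so that every affected relative carries it and no unaffected relative does. Real pedigrees with reduced penetrance would need a softer rule. The input format is hypothetical.

```python
def cosegregating_variants(carriers, affected, unaffected):
    """Rare variants carried by every affected and no unaffected relative.

    carriers: person -> set of rare variant IDs found in that person.
    affected, unaffected: lists of person names present in `carriers`.
    """
    if not affected:
        return set()
    # Start with variants shared by all affected relatives (a new set).
    shared = set.intersection(*(carriers[p] for p in affected))
    # Drop anything an unaffected relative also carries.
    for person in unaffected:
        shared -= carriers[person]
    return shared
```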

The Promethease report (noted above) that included the EVS dataset has 132 SNPs with 0 frequency which are likely from the EVS. This is not an excessive addition to the size of the report, though it would be very interesting to investigate each one of these SNPs for possible disease significance.


There is certainly a lot of discussion in the exome and WGS world about VUS (variants of unknown significance). Currently, though, the vast (or maybe I should say VAAST) majority of Promethease users get their data from DNA chips. Very, very few people at this time have exome or, especially, WGS data about themselves and their family. That is surely going to change, but it hasn't yet, and getting even an exome done costs about $1000 at the moment, an order of magnitude more than SNP/chip data. We are actually in discussions about ways to help folks get exome data, and in parallel will keep up with the world of predicted (rather than published) significance of DNA variants. Greg (talk) 03:02, 17 March 2014 (UTC)


Genome Imputing

For anyone interested in imputing their genome, this should be helpful.

-Download Plink 1.9 from https://www.cog-genomics.org/plink2/, make a folder on the desktop, and put Plink 1.9 into it.

-Open the DOS command prompt and change to the folder with Plink in it using the commands cd and cd.. (cd.. moves you up one directory). For example, if the current directory is C:\Users\A\Desktop> and the folder is named plink, type cd plink. [In Windows 8, the command prompt can be opened by swiping in from the top right corner, typing "command prompt" in the search box, pressing Enter, and clicking the command prompt icon that is displayed.]


-Read in your 23andme files with the commands below, one per person. (Impute2 requires at least 2 people in the g file to function.)


plink --23file genome1.txt AA --out plink_genome1
plink --23file genome2.txt BB --out plink_genome2


-Merge the 23andme files with the command below:

plink --bfile plink_genome1 --bmerge plink_genome2 --make-bed --out combogenome1_trial


-Recode the Plink file to Impute2 (Oxford) format with the command below:

plink --bfile combogenome1_trial --recode oxford --out combogenome1ox


-Download the Impute2 software from http://mathgen.stats.ox.ac.uk/impute/impute_v2.html, make a folder on the desktop, and put the Impute2 download into it.

-Change to the folder where you put the Impute2 software using the command prompt commands cd and cd..

-Run the examples using the commands given on the Impute2 website.

-Download the 1000 Genomes reference files from the Impute2 site. (These files are huge, about 3 GB.)

-Unzip them into the folder with Impute2.

-Run the command below to impute your 23andme data. To move along a chromosome, change the two -int values (the start and end of the region, in base pairs). To change chromosome, replace "21" with the desired chromosome number in the -m, -h and -l file names and in the -o argument, and change the number in "Impute1" so that you keep a record of each run. (You might also need to rename the "ALL" files, as their names might be too long.)

impute2 -m genetic_map_chr21_combined_b37.txt -h ALL.chr21.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nomono.haplotypes -l ALL.chr21.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nomono.legend -g combogenome1ox.gen -int 40.0e6 42.0e6 -Ne 20000 -o Impute1.chr21
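Stepping the -int window along a chromosome by hand is tedious. A small Python helper (hypothetical, not part of Impute2) can print one command per window, reusing the file names from the example above. It assumes the -g input is the combogenome1ox.gen file that plink --recode oxford writes, 2 Mb windows like the 40-42 Mb example, and the same reference file naming pattern for every autosome.

```python
def impute2_commands(chrom, start, end, chunk=2_000_000, run_label="Impute1",
                     gen_file="combogenome1ox.gen", ne=20000):
    """Build one Impute2 command line per window of `chunk` base pairs."""
    # Reference file names follow the chromosome 21 example,
    # with the chromosome number substituted (an assumption).
    ref = (f"ALL.chr{chrom}.integrated_phase1_v3.20101123"
           ".snps_indels_svs.genotypes.nomono")
    commands = []
    for lo in range(start, end, chunk):
        hi = min(lo + chunk, end)  # the last window may be shorter
        commands.append(
            f"impute2 -m genetic_map_chr{chrom}_combined_b37.txt "
            f"-h {ref}.haplotypes -l {ref}.legend -g {gen_file} "
            f"-int {lo} {hi} -Ne {ne} -o {run_label}.chr{chrom}.{lo}-{hi}"
        )
    return commands


if __name__ == "__main__":
    # Print commands for 40-46 Mb on chromosome 21, to paste into the prompt.
    for cmd in impute2_commands(21, 40_000_000, 46_000_000):
        print(cmd)
```

Putting the window bounds into each -o name keeps the output of every run separate, which serves the same purpose as renumbering "Impute1" by hand.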

-Check the concordance table at the end of each run; the values should be above 90%. It can sometimes be difficult to get high concordance. One thing that can help is changing the composition of the g file.
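As we understand it, Impute2 builds its concordance table by masking typed SNPs, imputing them, and comparing the best-guess calls with the masked genotypes at several call thresholds. The sketch below is not Impute2's code. It only illustrates that idea, assuming you have the true chip genotypes (0/1/2 allele counts) and the imputed genotype probabilities for the same SNPs.

```python
def concordance_by_threshold(true_genotypes, imputed_probs, thresholds=(0.5, 0.9)):
    """Percent of best-guess calls matching the truth, for each call threshold.

    true_genotypes: list of 0/1/2 allele counts from the chip.
    imputed_probs: list of (p0, p1, p2) genotype probabilities.
    Returns threshold -> (percent concordant or None, number of calls made).
    """
    table = {}
    for t in thresholds:
        called = matched = 0
        for truth, probs in zip(true_genotypes, imputed_probs):
            best = max(range(3), key=lambda g: probs[g])
            if probs[best] >= t:  # only confident calls count at this threshold
                called += 1
                matched += best == truth
        table[t] = (100.0 * matched / called if called else None, called)
    return table
```

Higher thresholds usually raise concordance but leave fewer SNPs called, which is why the table reports both numbers.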


-If this all works out, you will be able to impute millions of SNPs from your 23andme file! For SNPs with non-reference allele frequencies above 10%, the aggregate r² values are in the mid-0.90s ( http://mathgen.stats.ox.ac.uk/impute/data_download_1000G_phase1_integrated_SHAPEIT2_9-12-13.html )

This means that for most common SNPs you should get an accurately imputed genotype. This would be very useful if you wanted to estimate your risk of a disease influenced by many common variants.
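The r² figures mentioned above summarize how closely imputed dosages track true genotypes. One common way to compute such a figure, assuming you have true 0/1/2 genotypes for a set of SNPs (for example from a second chip), is the squared Pearson correlation between the truth and the imputed dosage. This is a sketch, not the exact procedure behind the published numbers.

```python
def expected_dosage(probs):
    """Expected allele count from genotype probabilities (p0, p1, p2)."""
    _, p1, p2 = probs
    return p1 + 2 * p2


def dosage_r2(truth, dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(truth)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    mean_t = sum(truth) / n
    mean_d = sum(dosages) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosages))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    if var_t == 0 or var_d == 0:
        raise ValueError("r^2 is undefined when either list is constant")
    return cov * cov / (var_t * var_d)
```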


Double Signal

The recent International Genomics of Alzheimer's Project (IGAP) study cites rs6733839 (MAF = 0.37 (T); see the first table in [Alzheimer's Disease, late onset (IGAP)]), located at 2:127135234 in the gene BIN1, as a risk SNP for Alzheimer's disease with OR = 1.22. In fact, BIN1 has the second-largest population attributable fraction for AD, behind only APOE (see the first table in [Alzheimer's Disease, late onset (IGAP)]).

SNPedia has recently added rs744373 (dbSNP lists MAF = 0.37 (G), while SNPedia lists the minor allele as C), located at 2:127137039 in the gene BIN1, as a risk allele for AD with OR = 1.17.

rs6733839 and rs744373 are less than 2,000 base pairs apart, and they have similar ORs and MAFs. Therefore, it seems reasonable to suspect that these 2 SNPs are actually reporting the same signal. It is important that Promethease report only one of these risk variants. (Admittedly, rs744373 might be the better choice, as it is on the 23andme gene chip, whereas rs6733839 is not.)

It seems reasonable to suspect that these 2 SNPs are actually reporting the same signal.
Agreed
It is important that Promethease report only one of these risk variants. (Admittedly, rs744373 might be the better choice as it is on the 23andme gene chip, whereas rs6733839 is not on the gene chip).
Disagreed.
Promethease is not just a support tool for 23andMe data, and certainly not just for the current version of their platform. They are currently on their 4th configuration, and that doesn't even count the exome sequencing they've offered. Promethease is used by MANY other audiences, including many people with full genome sequences. SNPedia text can be used to highlight the LD, but assuming that there is no LD between SNPs in a Promethease report is dangerously wrong.
--- cariaso 00:53, 16 May 2014 (UTC)
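Proximity, similar MAFs and similar ORs are suggestive, but the direct measure of whether two SNPs report the same signal is their linkage disequilibrium. A minimal sketch of the standard D, D' and r² formulas, assuming you already have the haplotype frequency p_ab (allele A at the first SNP together with allele B at the second) and the two allele frequencies, for example estimated from a reference panel such as 1000 Genomes. The inputs are hypothetical; no frequencies for this SNP pair are given here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs."""
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    # D' scales D by its largest possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

Only r² near 1 means one SNP's genotype effectively stands in for the other's. D' can equal 1 while r² is low when the allele frequencies differ (for example, p_a = 0.1, p_b = 0.5, p_ab = 0.1 gives D' = 1 but r² of about 0.11).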

Large Schizophrenia GWAS

The Psychiatric Genomics Consortium (PGC) is claiming 128 genome-wide significant associations for schizophrenia. With their current sampling effort, this number should approach 1,000 within the next year. Slide 18 of the presentation below shows that the 128 SNPs can distinguish cases from controls at p = 4 × 10^-175! Look at the upper decile of risk in the figure: the odds ratio is around 7. I am not aware of a publication that reports these results, though it will be very useful to have this information when it becomes available (especially since these SNPs are reported to be common).

Does anyone know how to get genotyped on the Psych chip? PGC has professed their commitment to open science. Does open science mean that people can opt in and help move the science forward?

http://www.med.unc.edu/cpg/news/banbury-presentation


There are several reports of combinations of HFE, TF and APOE epsilon etc. that might be made into genosets.

For example, bi-carriers of HFE C282Y and HFE H63D, who make up about 3% of the population, appear to be at increased risk for Alzheimer's disease.

see Iron genes, iron load and risk of Alzheimer’s disease. J Med Genet 2006;43:e52 (http://www.jmedgenet.com/cgi/content/full/43/10/e52). doi: 10.1136/jmg.2006.040519

This result might not have been confirmed, though the article notes that these people would be at risk for a condition that is treatable. Perhaps SNPedia could provide this as supplemental information.

See also: Involvement of ApoE E4 and H63D in sporadic Alzheimer's disease in a folate-supplemented Ontario population. J Alzheimers Dis. 2008 May;14(1):69-84
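A genoset like the HFE bi-carrier one suggested above is a rule over several genotypes. The sketch below is hypothetical, not SNPedia's genoset syntax. It uses rs1800562 (C282Y) and rs1799945 (H63D) and assumes the risk alleles are reported as A and G respectively on the forward strand, as in typical 23andme files. Check the strand of your own data before relying on it.

```python
# Risk alleles assume forward-strand reporting; verify against your file.
HFE_C282Y = ("rs1800562", "A")
HFE_H63D = ("rs1799945", "G")


def carries(genotypes, rsid, allele):
    """True if the two-letter genotype for rsid contains the allele."""
    return allele in genotypes.get(rsid, "")


def is_hfe_bicarrier(genotypes):
    """At least one C282Y copy and at least one H63D copy (hypothetical genoset)."""
    return all(carries(genotypes, rsid, allele) for rsid, allele in (HFE_C282Y, HFE_H63D))
```

A missing SNP counts as "not carried", so a file lacking either rsID never matches the genoset.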


Huge news!

The Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) has released the results of a massive GWAS of schizophrenia. Open access at Nature: http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13595.html

They found 108 loci.

Schizophrenia has, until now, been considered undecipherable.

We might need a team on SNPedia to upload all these results.

So on a quick first read of the data, there are several issues, which perhaps dedicated SNPedians can help work out. First and foremost, the list of 108 loci [Supplementary Table 3] contains no SNPs; they are just loci, meaning regions spanning at least 20 kb each, without named SNPs. 128 SNPs are listed in Supp. Table 2, which could presumably be paired off with the loci listed in Table 3. Second, at first glance, there doesn't appear to be any data indicating whether the effect (positive or negative) for each SNP is seen only for the homozygous minor allele or also for the heterozygous case. And third, the odds ratios are so small (i.e. so close to 1) that the only application seems to be the "just add'm up" approach, binning in this case into deciles of sums ranging from (presumably) 0 to 216 (i.e. 108 × 2), and as the authors stress, neither the sensitivity nor the specificity of even this approach has predictive value. Greg (talk) 18:56, 23 July 2014 (UTC)
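The "just add'm up" approach can be sketched as a risk-allele count followed by binning against a reference population. This is an illustration under stated assumptions, not the PGC's scoring method: the input formats are hypothetical, missing calls simply contribute nothing, and the optional weights (for example math.log of each OR) turn the plain count into a weighted score.

```python
def risk_score(genotypes, risk_alleles, weights=None):
    """Sum risk-allele copies (0, 1 or 2 per SNP), optionally weighted.

    genotypes: rsid -> two-letter genotype such as "AG" ("--" = no call).
    risk_alleles: rsid -> risk allele letter.
    weights: optional rsid -> weight; 1.0 for every SNP if omitted, which gives
        the unweighted count (0 to 2 x the number of SNPs).
    """
    score = 0.0
    for rsid, allele in risk_alleles.items():
        gt = genotypes.get(rsid, "")
        if len(gt) != 2 or "-" in gt:
            continue  # missing or no-call genotypes contribute nothing
        w = 1.0 if weights is None else weights[rsid]
        score += w * gt.count(allele)
    return score


def quantile_bin(score, reference_scores, n_bins=10):
    """1-based bin of `score` among reference scores (10 = deciles, 100 = percentiles)."""
    below = sum(s < score for s in reference_scores)
    return min(n_bins, below * n_bins // len(reference_scores) + 1)
```

The reference scores would have to come from a population scored the same way; setting n_bins=100 gives percentiles instead of deciles.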


Yes, the real issue is how to present these results on SNPedia in an accessible way. The authors of the present study intend to continue with larger GWAS that include hundreds of thousands of subjects. An estimated 10,000 genetic loci are involved in schizophrenia risk. The 108 loci reported in the current study likely have the strongest effects among the common variants. Yet the highest OR was only 1.3. We are now starting to see GWAS with large numbers of SNPs that each convey minimal useful information. The SNP with OR 1.3 had a frequency of 0.0225 in cases and 0.0191 in controls. The difference is minuscule!
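The crude allelic odds ratio implied by two allele frequencies is a one-line calculation. For the case and control frequencies quoted above it comes out near 1.18. Published ORs are usually estimated with covariate-adjusted regression models, so they need not match this crude figure exactly.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Crude allelic odds ratio from risk-allele frequencies in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls
```

For example, `allelic_odds_ratio(0.0225, 0.0191)` evaluates to about 1.182. This shows how a difference of a few thousandths in allele frequency still registers as an OR noticeably above 1.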

Clearly SNPedia is going to have to move to the add'm up approach. However, it is very unclear what meaningful biological units are being measured in such a summation (apples and oranges). Telling Promethease users that they have 1,000 markers predisposing them to schizophrenia will scare them without properly conveying the truth: having only 1,000 markers might mean that they are in the low-risk decile. A single number that conveys meaningful information about risk for schizophrenia, intelligence, etc. would make Promethease a much more valuable service.

Even with the warnings given in the article, I would still be interested in knowing what decile I am in. The top decile had an OR of 25!

On slide 18 of the presentation (see URL below), the area under the curve is quoted as 0.7. That is not good enough for a diagnosis, though it is getting there.

http://www.med.unc.edu/cpg/news/banbury-presentation
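The AUC figures discussed here (0.7 now, 0.90 as a possible threshold) have a simple interpretation: the probability that a randomly chosen case has a higher risk score than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch, assuming you have risk scores for a set of cases and a set of controls:

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 means the score is no better than chance, and 1.0 means every case outscores every control. The double loop is fine for illustration but slow for very large cohorts, where a rank-based formula is normally used.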

Schizophrenia occurs in only about 1% of the population. Why did they divide the risk scores into deciles (percentiles might be more informative)? It would be very interesting to know the OR for the top 1% and how predictive it is for schizophrenia.

None of the 23andme SNPs appear to be included in the new study.


Well, how about for the moment we agree that when the AUC is 0.90 or greater, it's worth entering, even with all the caveats? Greg (talk) 00:14, 26 July 2014 (UTC)

Yes, though it will be a tough wait. It could take years to get to an AUC of 0.90.


SNP Linkage Tool

Does Promethease routinely use the SNP linkage tool from http://www.broadinstitute.org/mpg/snap/ldsearch.php ?

I was reading an article on PubMed (http://www.ncbi.nlm.nih.gov/pubmed/25038421) that included a SNP (rs998382) that is not in my 23andme file. (The study found a large increase in AD risk in carriers of a haplotype.) However, when I entered rs998382 into the SNP linkage tool above, four SNPs were listed with an LD value of 1.000, and several more were near 0.900. Perhaps SNPedia could list these SNPs on the rs pages for the convenience of its readers?


Would Promethease have picked up on this and given the results to 23andme users? If this feature is not included in Promethease, it should be. Perhaps whatever linkage statistic corresponds to a 95% probability of determining the correct genotype could be used as the threshold. This would be a fast and easy way for Promethease to do some imputation.
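The proxy idea above can be sketched as follows. This is a hypothetical illustration, not a Promethease feature. SNAP reports r² between SNPs, but translating a proxy's alleles into the target's alleles also requires knowing which alleles travel together on the same haplotype, so the `allele_map` here is an assumed input that would have to come from haplotype data. The 0.95 cutoff is likewise an assumption standing in for "whatever statistic corresponds to 95%".

```python
def genotype_from_proxy(target, proxies, genotypes, min_r2=0.95):
    """Infer a missing SNP's genotype from the best proxy in the user's file.

    proxies: target rsid -> list of (proxy rsid, r2, allele_map), where
        allele_map maps each proxy allele to the target allele on the same
        haplotype, e.g. {"A": "G", "C": "T"} (hypothetical input).
    genotypes: rsid -> two-letter genotype from the user's file.
    Returns (genotype, source rsid, r2), or None if no proxy qualifies.
    """
    if target in genotypes:
        return genotypes[target], target, 1.0
    candidates = sorted(proxies.get(target, []), key=lambda p: p[1], reverse=True)
    for proxy, r2, allele_map in candidates:
        if r2 < min_r2:
            break  # candidates are sorted, so the rest are weaker still
        gt = genotypes.get(proxy)
        if gt and all(a in allele_map for a in gt):  # skips no-calls like "--"
            return "".join(sorted(allele_map[a] for a in gt)), proxy, r2
    return None
```

Even at r² = 1 in a reference panel, a proxy can occasionally disagree in a particular person, so such calls would be best labeled as inferred.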






Exome scan update

I am in the final stages of arranging for my exome to be sequenced. (This is overwhelmingly exciting!) Would Promethease be able to run my exome sequence to find any variants listed on the Exome Variant Server (see above)?

As was noted above, these rare SNPs are likely to have biological relevance. A function in Promethease that merely reported these rare variants would be very useful. Knowing that a variant was a missense or frameshift mutation in a key disease gene would be highly suggestive of pathological significance. Many of the mutations listed on the Exome Variant Server are very rare. It might take many years for GWAS to unravel the relationship of these SNPs to diseases.


My exome scan results came through, and I ran the exome file through Promethease with no problem. I am anxiously waiting for when I will be able to upload a combined exome scan and 23andme file to Promethease (perhaps I could add in an offspring's 23andme file to check for concordance between our two 23andme files).

Does anyone know of a company that offers exome gene chip services on a direct to consumer basis?


Opportunity for SNPedia

BaseSpace is putting out a call for software developers to post apps on their platform.

This seems like an obvious time for SNPedia to step up to the opportunity. Illumina has a dominant position in the next-generation genomics space.

http://blog.basespace.illumina.com/2015/03/31/basespace-2015-wwdc-the-broad/

I actually attended the BaseSpace training at the Illumina offices in SF about 6 months ago, and have used BaseSpace while analyzing some cancer data more recently. BaseSpace is very cool, but it's not a high priority for Promethease at the moment. --- cariaso 02:12, 2 April 2015 (UTC)

I have been trying to run their VariantStudio app in my BaseSpace account, but it just will not launch. Very frustrating! I finally realized that MutationTaster allows users to upload their exome file to its server. They ran the full analysis on all the exome variants. (I thought Promethease could do something similar. I am very glad that I can have a peek at all the variants; most programs stick with variants that have been vetted through OMIM etc.) User:J1



What happened to AG rs1129844?

I just updated the write-up on the rs1129844 AG genotype to reflect the new published research on delaying AD by 10 years! There is a lot of buzz on the dementia sites about this SNP now.

For some strange reason, my latest edit to the AG genotype has disappeared from the rs1129844 page. Before my edit it was showing at the top right, and now it isn't. Why?


The eotaxin-1 story is exciting. There is a range of interventions and conditions that might possibly be affected by eotaxin-1 levels.

For example,

bipolar illness http://www.ncbi.nlm.nih.gov/pubmed/25973785

allergy / asthma / GI tract / cardiovascular http://www.ncbi.nlm.nih.gov/pubmed/25759694

smoking http://www.ncbi.nlm.nih.gov/pubmed/25274579

antioxidants?? http://www.ncbi.nlm.nih.gov/pubmed/25254081

Synephrine (used as a diet aid, similar to adrenaline, and found in green oranges; the article also noted the IL4/eotaxin-1 synergy) http://www.ncbi.nlm.nih.gov/pubmed/25111027

An antibody is already in clinical trials and a diagnostic test is in the works.

rs1129844 AG does not delay AD by 10 years!!

The write-up on the AG genotype seems very misleading. The AG genotype does not appear to delay AD by 10 years. Supplementary Figure 5 of the article showed only a modest delay of 2.5 years (from 61.5 to 64), with almost total overlap of the error bars. I think the write-up should be changed to reflect this fairly small difference. I do not think it would be appropriate to suggest to the upwards of 30% of people who carry AG that they could expect a 10-year delay in dementia. This is not what the article indicated.

I think (and hope) that we are in agreement on this. The article is of course promising, but thanks to wishful thinking and the popular press, amplified by various internet echo chambers, this very preliminary research is being blown way out of proportion. If it does replicate in large enough cohorts, it won't be too long before such studies get published. But if it doesn't replicate, the lack of results may not get published for quite a while. Greg (talk) 04:12, 8 September 2015 (UTC)


Using Promethease to help confirm such findings would be great. The published article had only 5 AA carriers, and the error bars almost totally enclosed the results. It would be so helpful if Promethease could ask people for phenotype information to help fill in the missing pieces. Perhaps they could include some sort of cognitive test for users to complete, or let informants report on cognitive status. I am not sure how valid such an assessment of age of onset would be, though it might be worth a try. It would be better than having to wait months or years for replications through regular channels.

We might be able to move toward real-time genetic confirmation.

10x Genomics

I'm really excited about this new technology, which is now shipping. This simple toaster will allow genomes to be phased. I wonder how such information could be included in SNPedia?



Could Promethease consider Runs of homozygosity?

"significant associations between summed runs of homozygosity and four complex traits: height, forced expiratory lung volume in one second, general cognitive ability and educational attainment (P < 1 × 10^-300, 2.1 × 10^-6, 2.5 × 10^-10 and 1.8 × 10^-10, respectively). In each case, increased homozygosity was associated with decreased trait value, equivalent to the offspring of first cousins being 1.2 cm shorter and having 10 months' less education."

This would be a great way to get around having to find the SNPs!


Nature. 2015 Jul 23;523(7561):459-62. doi: 10.1038/nature14618. Epub 2015 Jul 1. Directional dominance on stature and cognition in diverse human populations

This is great! Below is a tool that calculates ROH. It would be fantastic to include something like this in Promethease. Does anyone know how to convert the output of the tool at the URL below, run on a 23andme file, into some sort of educational attainment or cognitive ability estimate? I would be very interested!!! http://www.math.mun.ca/~dapike/FF23utils/roh.php

I'm not sure if this is what you're looking for, but there is a PMC article called Intellectual Disability Is Associated with Increased Runs of Homozygosity in Simplex Autism, which is all I could find immediately. This article, which is behind a paywall, may also be of interest: Excess of runs of homozygosity is associated with severe cognitive impairment in intellectual disability. Another article, Genome-wide estimates of inbreeding in unrelated individuals and their association with cognitive ability, may also be of interest to you. -- Lilstar

Thank you very much for your suggestions! I was particularly interested in the recently published Nature article that I quoted. I find it remarkable that they found p values less than 10^-10 for cognitive ability and educational attainment using this ROH approach. The genetics of IQ always generates great interest. However, up to this point no large-effect SNPs have been found for intellectual traits. The Nature article now appears to report a substantial result that many Promethease users would likely be very interested in. A difference of 10 months in educational attainment would substantially differentiate people.

There would be substantial interest in being able to interpret the results of an ROH analysis of a 23andme file uploaded to the www.math.mun.ca site above. This looks very exciting. Can anyone help us out? This would be BIG!! J1
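For anyone wanting to experiment locally, here is a minimal Python sketch of the ROH idea. It is not the FF23utils tool's algorithm or the Nature study's pipeline. It assumes a 23andMe-style tab-separated raw file (rsid, chromosome, position, genotype, sorted by position), and the thresholds (50 SNPs, 1 Mb), the autosomal length passed to `f_roh`, and the linear rescaling in `education_shift_months` are illustrative assumptions. The last function only scales the quoted first-cousin equivalence (10 months' less education at an expected inbreeding coefficient of 1/16) by F_ROH, so it describes a population-average association, not a prediction for an individual.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class Run:
    chrom: str
    start: int
    end: int
    n_snps: int

    @property
    def length(self) -> int:
        return self.end - self.start


def read_raw_genotypes(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chromosome, position, genotype) from 23andMe-style raw data lines.

    Assumes tab-separated rsid, chromosome, position, genotype columns, rows sorted
    by chromosome and position, and '#' comment lines.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        _rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
        yield chrom, int(pos), genotype


def find_roh(records, min_snps: int = 50, min_length_bp: int = 1_000_000) -> List[Run]:
    """Call runs of consecutive homozygous autosomal SNPs (illustrative thresholds)."""
    runs: List[Run] = []

    def close(run: Optional[list]) -> None:
        # Keep a run only if it is long enough in both SNP count and base pairs.
        if run and run[3] >= min_snps and run[2] - run[1] >= min_length_bp:
            runs.append(Run(*run))

    current: Optional[list] = None  # [chrom, start, end, n_snps] of the open run
    for chrom, pos, genotype in records:
        if not chrom.isdigit() or len(genotype) != 2 or "-" in genotype:
            continue  # skip X/Y/MT, single-letter calls and no-calls ("--")
        if current and current[0] != chrom:
            close(current)
            current = None
        if genotype[0] == genotype[1]:
            if current is None:
                current = [chrom, pos, pos, 1]
            else:
                current[2], current[3] = pos, current[3] + 1
        else:
            close(current)  # a heterozygous call ends the run
            current = None
    close(current)
    return runs


def f_roh(runs: List[Run], autosomal_length_bp: int) -> float:
    """Summed ROH length as a fraction of the autosomal length supplied by the caller."""
    return sum(r.length for r in runs) / autosomal_length_bp


def education_shift_months(froh: float, first_cousin_months: float = -10.0) -> float:
    """Linear rescaling of the quoted first-cousin equivalence (an assumption).

    Offspring of first cousins have an expected inbreeding coefficient of 1/16.
    """
    return first_cousin_months * froh / (1 / 16)
```

Chip-based ROH calls depend heavily on marker density and thresholds, so F_ROH values from different tools are not directly comparable.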


Finding a Variant

A family member has a trait whose genetic cause we are interested in finding. Could someone help me double-check my thinking on how to narrow down the genomic space?

My idea is to have a phased exome scan done for our family member. (The above 10x Genomics technology might be very helpful for this purpose.)

In order to determine which chromosomes were maternal and which were paternal, I thought we could then do some fairly inexpensive genechipping with cousins and other relatives on both sides of the family. Any shared rare variants or haplotypes, etc., should identify the source (maternal or paternal). There might have been some crossing over that could obscure this, though hopefully this will not be too much of a problem. (Does anyone know the typical number of crossovers over a one- or two-generation span?)

If we could assign whole chromosomes or portions of chromosomes to a maternal or paternal source, then we could exclude from consideration the half of the chromosomes that did not carry the trait of interest.

We could then genechip our family member's siblings to further define the region of genetic interest. Finally we could genechip the offspring of our family member to see whether they have inherited any regions of interest.
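Narrowing the region this way amounts to interval intersection: start from the candidate segments and keep only the parts also shared with each additional relative who has the trait. The Python sketch below assumes that shared segments (for example, IBD segments from a relative-matching tool) are already available as sorted, non-overlapping base-pair intervals on one chromosome, and that a single variant shared by every affected relative causes the trait; `narrow` and the example coordinates are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs on one chromosome


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals (two-pointer sweep)."""
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first; it cannot overlap anything further.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def narrow(candidate: List[Interval], *shared_with_affected: List[Interval]) -> List[Interval]:
    """Keep only the parts of the candidate region shared with every affected relative."""
    return reduce(intersect, shared_with_affected, candidate)


print(narrow([(0, 100), (200, 300)], [(50, 250)], [(60, 80), (210, 220)]))
# [(60, 80), (210, 220)]
```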

This seems like it might achieve its goal. Investigating in this manner would only cost a modest amount of money and could yield a fairly large payback. The phased exome scan might cost $500 and 10 genechippings another $1000. Narrowing down the genetics of this trait would be worth it.

We could also leverage relatives on 23andme for further insight at no additional expense.

Anyone have any comments or criticisms of this plan? I suspect many other people might be interested in pursuing a similar plan of action if it were to be as affordable as outlined above.

-- J1
see the link at phased. 1 or 2 crossings per chromosome per generation --- cariaso 16:21, 10 September 2015 (UTC)


Thank you very much for the link.

Phasing our family member's exome appears to now be simple with the new 10x Genomics technology.

The big problem is that we do not have a genetic sample from our family member's mother or father. We thought that genechipping the cousins and other distant relatives might help us determine which chromosomes in the family member's phased exome were paternal and which were maternal. If there has been no inbreeding in the family during the last few generations (which would be consistent with a recent ROH analysis), then relatives on the paternal and maternal lines should be genetically distinct. Even though crossovers would have occurred over the generations separating these relatives from our family member's mother and father, it should still be possible to trace the origin of each chromosome back to the mother or the father.
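The voting logic behind this idea can be sketched in Python. It assumes the phased exome is split into phase blocks, each heterozygous site records which allele sits on haplotype 0 and which on haplotype 1, and the relatives' data have been reduced to the rare alleles seen on each side of the family; the input format and the `min_margin` threshold are hypothetical. Within a phase block the parent of origin stays fixed, because each haplotype a child carries was transmitted by one parent; crossovers in that parent's meiosis only change which grandparent a stretch came from.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (phase_block_id, site_id, allele on haplotype 0, allele on haplotype 1)
Site = Tuple[str, str, str, str]


def assign_parent_of_origin(sites: Iterable[Site],
                            maternal_side: Dict[str, Set[str]],
                            paternal_side: Dict[str, Set[str]],
                            min_margin: int = 2) -> Dict[str, str]:
    """Vote per phase block on which haplotype came from the mother.

    maternal_side / paternal_side map a site to the rare alleles seen in
    relatives on that side of the family (hypothetical input format).
    """
    votes: Dict[str, int] = defaultdict(int)  # > 0 favours "haplotype 0 is maternal"
    for block, site, a0, a1 in sites:
        votes[block] += 0  # register blocks that end up with no evidence
        mat = maternal_side.get(site, set())
        pat = paternal_side.get(site, set())
        # A side's rare allele that matches exactly one haplotype votes for that origin.
        if a0 in mat and a1 not in mat:
            votes[block] += 1
        elif a1 in mat and a0 not in mat:
            votes[block] -= 1
        if a1 in pat and a0 not in pat:
            votes[block] += 1
        elif a0 in pat and a1 not in pat:
            votes[block] -= 1
    labels = {}
    for block, score in votes.items():
        if score >= min_margin:
            labels[block] = "hap0 maternal"
        elif score <= -min_margin:
            labels[block] = "hap1 maternal"
        else:
            labels[block] = "undetermined"
    return labels
```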

No matter how many generations we need to span with the cousins, only the last generation of crossovers will be the main hurdle. Consider, for example, a relative 5 generations distant from our loved one's parents. Even though 5 to 10 crossovers (for both cousins and parents) might separate them, the shared rare SNPs would identify whether it was a maternal or a paternal chromosome. There would then be only one generation, between our loved one's parents and our loved one, in which to work out the crossovers. If there are only 1 or 2 crossovers per chromosome to deduce, then piecing things together might not require much effort.
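The crossover counts mentioned here (1 or 2 per chromosome per generation) can be modelled roughly with the Haldane map function, which treats the number of crossovers on a chromosome as Poisson with mean equal to its genetic length in Morgans. This ignores crossover interference, so it is only an approximation, and the 150 cM length below is an illustrative value rather than a measured map length.

```python
import math


def expected_crossovers(length_cM: float, meioses: int = 1) -> float:
    """Mean crossover count: genetic length in Morgans times number of meioses."""
    return meioses * length_cM / 100.0


def prob_crossovers(k: int, length_cM: float, meioses: int = 1) -> float:
    """Poisson probability of exactly k crossovers (no interference assumed)."""
    lam = expected_crossovers(length_cM, meioses)
    return math.exp(-lam) * lam**k / math.factorial(k)


if __name__ == "__main__":
    # Illustrative 150 cM chromosome, one meiosis versus five meioses.
    for m in (1, 5):
        mean = expected_crossovers(150, m)
        p0 = prob_crossovers(0, 150, m)
        print(f"{m} meiosis(es): mean {mean:.1f} crossovers, P(none) = {p0:.3f}")
```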

Would such a plan be feasible? It does not seem overly complicated or expensive. Many others might be interested in doing something similar. This could greatly narrow down the target zone for a trait of interest. -- J1


Great News for Mismatch Repair Cancers!

There is a set of cancers that result from so-called mismatch repair deficiency, such as Lynch syndrome (colorectal cancer/polyps; PMS2 is one of the genes involved). Unrepaired mismatches can accumulate, and a wide range of cancers can result.

Apparently a new drug can help treat this problem. The idea seems to be that while mismatch repair deficiency can often lead to cancer, these cancers are more easily recognized by the immune system because they are particularly abnormal (they carry unusually many mutations).

"Among 50 patients with colorectal cancer, 62% of the 25 patients with mismatch repair–deficient tumors responded to pembrolizumab, but no responses were seen among the 25 mismatch repair–proficient patients. The difference in disease control rates (responses plus stable disease) was even greater: 92% in the mismatch repair–deficient group and only 16% in the mismatch repair–proficient group.

An overall response rate of 60% was observed in patients with mismatch repair–deficient advanced endometrial cancer and several types of advanced gastrointestinal cancers including ampullary, duodenal, cholangiocarcinoma, and gastric cancers.

Mismatch repair deficiency is found in 15% to 20% of sporadic (noninherited) colorectal cancers and in nearly all colorectal cancers associated with Lynch syndrome, which constitute up to 5% of all colorectal cancers. Mismatch repair deficiency is also found in other tumor types including stomach, small bowel, endometrial, prostate, and ovarian cancer."


This could be a great win for all those with mismatch repair cancers! Perhaps we could determine the entire set of SNPs that fall within the category of mismatch repair and create a gene set that tells everyone carrying any of these SNPs the good news about Pembrolizumab.

http://www.ascopost.com/News/27670


wiki has info on this.

MMR deficiency in humans

"In humans, seven DNA mismatch repair (MMR) proteins (MLH1, MLH3, MSH2, MSH3, MSH6, PMS1 and PMS2) work coordinately in sequential steps to initiate repair of DNA mismatches.[22] In addition, there are Exo1-dependent and Exo1-independent MMR subpathways.[23]

Other gene products involved in mismatch repair (subsequent to initiation by MMR genes) in humans include DNA polymerase delta, PCNA, RPA, HMGB1, RFC and DNA ligase I, plus histone and chromatin modifying factors.[24][25]"


Mismatch repair errors were highest in melanoma and second highest in colon cancer.

Among the 27 DNA repair genes evaluated, 13 DNA repair genes, MLH1, MLH3, MGMT, NTHL1, OGG1, SMUG1, ERCC1, ERCC2, ERCC3, ERCC4, RAD50, XRCC4 and XRCC5 were all significantly down-regulated in all three grades

MutS, MutH and MutL were mentioned, though not mutYH. https://en.wikipedia.org/wiki/DNA_mismatch_repair#MutS_homologs

mutYH seems to be in this list. https://en.wikipedia.org/wiki/DNA_glycosylase

Reporting the full set of SNPs whose carriers might benefit from Pembrolizumab could be useful to a possibly substantial group of snpedia users.
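A minimal sketch of the proposed gene set, assuming variants have already been annotated with a gene symbol, the flagged allele, and a clinical-significance label; the `AnnotatedVariant` fields and the label strings are hypothetical. The gene list is the seven MMR genes from the Wikipedia excerpt above. Note that the quoted trial results grouped patients by the mismatch repair status of their tumors, not by germline genotype.

```python
from dataclasses import dataclass
from typing import Iterable, List

# The seven MMR genes named in the Wikipedia excerpt on this page.
MMR_GENES = frozenset({"MLH1", "MLH3", "MSH2", "MSH3", "MSH6", "PMS1", "PMS2"})


@dataclass(frozen=True)
class AnnotatedVariant:
    rsid: str
    gene: str
    genotype: str       # e.g. "AG"
    risk_allele: str    # allele considered damaging (hypothetical annotation)
    significance: str   # e.g. "pathogenic" (hypothetical label)


def mmr_gene_set(variants: Iterable[AnnotatedVariant],
                 keep=("pathogenic", "likely pathogenic")) -> List[AnnotatedVariant]:
    """Variants in MMR genes where the person carries the flagged allele."""
    return [
        v for v in variants
        if v.gene in MMR_GENES
        and v.risk_allele in v.genotype
        and v.significance in keep
    ]
```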


Yes, the figure from ASCO was quite impressive! It seems that most of the mismatch repair-deficient CRCs responded, and almost none of them grew. The ones that did grow were probably the more stable tumors. It is no great surprise that mismatch repair would be such a problem in CRC; doesn't the colon lining regrow every few days? Any tissue with that much growth and regrowth (and toxin exposure) would be expected to have these problems, as in melanoma, which is the most mutated of all cancers.


http://meetinglibrary.asco.org/content/109904?media=vm


Having Trouble believing rs2024513

The OR of 1.7 reported on snpedia for rs2024513 seems to be a misinterpretation. "The size of our sample was sufficient to detect a significant difference with a power of more than 90% assuming an odds ratio (OR) values of AA as 1.7 with a minor allele frequency of 0.1." The way I read this, they are describing a power calculation that assumes an OR of 1.7 for AA, not reporting an observed OR of 1.7.

Table 2

Comparison of genotype and allele frequencies of six SNPs at the NRXN1 gene between schizophrenic patients and healthy control subjects. Counts are N (frequency); the genotype chi-square has df = 2 and the allele chi-square has df = 1; HWEP is the Hardy-Weinberg equilibrium p-value.

rs10490168
Cases: AA 10 (0.013), AG 173 (0.225), GG 585 (0.762); genotype chi-square 6.049, p-value 0.048; HWEP 0.485; allele A 193 (0.126), G 1343 (0.874); allele chi-square 5.677, p-value 0.017; OR (95% CI) 0.78 (0.63-0.96)
Controls: AA 13 (0.018), AG 204 (0.276), GG 521 (0.706); HWEP 0.168; allele A 230 (0.156), G 1246 (0.844)

rs2024513
Cases: AA 544 (0.708), AG 206 (0.268), GG 18 (0.023); genotype chi-square 7.521, p-value 0.023; HWEP 0.772; allele A 1294 (0.842), G 242 (0.158); allele chi-square 7.327, p-value 0.006; OR (95% CI) 1.30 (1.07-1.56)
Controls: AA 480 (0.650), AG 228 (0.309), GG 30 (0.041); HWEP 0.655; allele A 1188 (0.805), G 288 (0.195)

In fact the AA genotype is only an OR of 1.089 (0.708/0.650).

NRXNs do have biological plausibility for involvement in psychiatric diseases, though it is very unclear to me how the huge recent schizophrenia GWAS with 100,000 people would have missed this common SNP. The largest OR found in that GWAS was 1.30. It would seem that there might not be any really large-effect common SNP in schizophrenia; perhaps this is only true in white populations.

I think this is roughly accurately summarized. The odds ratio cited in Table 2 is 1.3 per (A) allele, so at least theoretically, the odds ratio for the (AA) homozygote relative to (GG) is 1.3 x 1.3 = 1.69, or ~1.7. But sure, this is a first report that apparently hasn't been extended or independently replicated since its original publication. So it will probably "fade" over time into lower and lower significance. Greg (talk) 02:28, 26 May 2016 (UTC)

Just wanted to check. Still don't understand how Table 2 shows a 1.7 OR for AA. I am reading the table as saying 544 (70.8%) of cases and 480 (65.0%) of controls had AA genotype.

Why isn't the AA OR simply 70.8/65.0 =1.089? The numbers are nowhere near 70% excess of cases over controls. It isn't even 10%!

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3080281/table/T2/
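The difference between a ratio of genotype frequencies and an odds ratio can be checked directly from the Table 2 counts for rs2024513. This Python sketch computes the per-allele OR with a Woolf (log-based) 95% confidence interval, the allelic chi-square without continuity correction, and two genotype ORs for AA (against GG, and against AG+GG). The function names are illustrative, and the paper may have used different software or options.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR for a 2x2 table: cases have a (exposed) / b (unexposed), controls c / d."""
    return (a / b) / (c / d)


def woolf_ci(a, b, c, d, z=1.96):
    """95% CI for the OR from the standard error of log(OR)."""
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio(a, b, c, d))
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# rs2024513 genotype counts from Table 2
cases = {"AA": 544, "AG": 206, "GG": 18}
controls = {"AA": 480, "AG": 228, "GG": 30}

# Allele counts follow from the genotypes: 2 per homozygote, 1 per heterozygote.
case_A, case_G = 2 * cases["AA"] + cases["AG"], 2 * cases["GG"] + cases["AG"]
ctrl_A, ctrl_G = 2 * controls["AA"] + controls["AG"], 2 * controls["GG"] + controls["AG"]

allelic_or = odds_ratio(case_A, case_G, ctrl_A, ctrl_G)
ci_low, ci_high = woolf_ci(case_A, case_G, ctrl_A, ctrl_G)
chi2 = chi_square_2x2(case_A, case_G, ctrl_A, ctrl_G)
aa_vs_gg = odds_ratio(cases["AA"], cases["GG"], controls["AA"], controls["GG"])
aa_vs_rest = odds_ratio(cases["AA"], cases["AG"] + cases["GG"],
                        controls["AA"], controls["AG"] + controls["GG"])
freq_ratio = (cases["AA"] / sum(cases.values())) / (controls["AA"] / sum(controls.values()))

print(f"per-allele OR {allelic_or:.2f} ({ci_low:.2f}-{ci_high:.2f}), chi-square {chi2:.3f}")
print(f"AA vs GG OR {aa_vs_gg:.2f}; AA vs AG+GG OR {aa_vs_rest:.2f}")
print(f"ratio of AA frequencies {freq_ratio:.3f} (not an odds ratio)")
```

With these counts the script reproduces the table's per-allele OR of 1.30 (1.07-1.56) and chi-square of 7.327; the AA-versus-GG odds ratio comes out near 1.89 and AA versus all other genotypes near 1.31, while 0.708/0.650 ≈ 1.089 is a ratio of genotype frequencies rather than an odds ratio.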


Uploading the CFTR2 dataset

Does snpedia have a feature that would allow for the bulk uploading of a set of SNPs directly into Promethease?

Specifically, I was wondering whether the cystic fibrosis dataset at https://www.cftr2.org/mutations_history might simply be dropped into a table without having to make a SNP page for each of the hundreds of these SNPs.

What I find especially interesting about cystic fibrosis is that it now appears that a nearly complete genetic description of the illness has been achieved. This likely applies to few other genetically heterogeneous illnesses.

I have already curated and added each one. In general, we don't do bulk uploads into genotypes, because we do curate them as much as possible. Greg (talk) 19:55, 14 March 2017 (UTC)
P.S. Go here, scroll down until you find the expandable window.Greg (talk) 19:57, 14 March 2017 (UTC)


Everyone has a pathogenic PSEN1 mutation?

I just created a report on Promethease and one of the reported findings was:

"rs63750487, also known as c.676C>T, p.Leu226Phe and L226F, represents a rare mutation in the PSEN1 gene. The rs63750487(T) allele has been reported to be a dominant mutation leading with high penetrance to early-onset Alzheimer's disease.10.2147/CIA.S111821"

Yet our family member's genotype is CC, which is the common genotype. Shouldn't only the disease-increasing genotypes of rs63750487 be reported?
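The reporting rule being asked about can be sketched as a filter that keeps a finding only when the genotype contains the disease-increasing allele. This assumes genotypes and risk alleles are given on the same strand, and the function names are hypothetical; it makes no claim about how Promethease actually builds its reports.

```python
from typing import Dict


def carries(genotype: str, allele: str) -> bool:
    """True if either allele of a two-letter genotype (e.g. "CT") matches."""
    return allele in genotype


def findings_to_report(calls: Dict[str, str], risk_alleles: Dict[str, str]) -> Dict[str, str]:
    """Keep only rsids where the person carries the disease-increasing allele."""
    return {
        rsid: genotype
        for rsid, genotype in calls.items()
        if rsid in risk_alleles and carries(genotype, risk_alleles[rsid])
    }


# The rs63750487(T) allele is the one described as pathogenic in the quoted report.
print(findings_to_report({"rs63750487": "CC"}, {"rs63750487": "T"}))  # {}
print(findings_to_report({"rs63750487": "CT"}, {"rs63750487": "T"}))  # {'rs63750487': 'CT'}
```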


Polling Feature for Promethease

Might a polling feature be added to Promethease reports? Some variants are quite rare and it might require some time for the entire human genome to be unlocked. It might be very helpful for users of Promethease if questions could be asked of those generating reports who carry rare variants with some (though not conclusive) evidence of relationship to traits/illnesses.

This would only require a fairly simple change in the software. For instance, if a rare variant were found in rs123, then an exit question from Promethease might be: Do you have a family history of illness X, have you ever been treated for illness X, or do you have trait Y? This would allow more of the signal in users' genomes to be converted into usable information.
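A rough Python sketch of the two pieces such a feature would need: choosing which questions to ask from the uploaded genotypes, and tallying answers from carriers versus non-carriers. The rule format, the question text, and the function names are hypothetical, and the answers would be self-reported and unverified.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PollRules = Dict[str, Tuple[str, str]]  # rsid -> (rare allele, question text)


def questions_for(genotypes: Dict[str, str], rules: PollRules) -> List[str]:
    """Questions whose trigger allele appears in the user's genotype."""
    return [
        question
        for rsid, (allele, question) in rules.items()
        if allele in genotypes.get(rsid, "")
    ]


def yes_rates(responses: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """(is_carrier, answered_yes) pairs -> share answering yes among carriers, non-carriers."""
    counts = Counter(responses)

    def rate(carrier: bool) -> float:
        yes, no = counts[(carrier, True)], counts[(carrier, False)]
        return yes / (yes + no) if yes + no else float("nan")

    return rate(True), rate(False)


rules = {"rs123": ("G", "Do you have a family history of illness X?")}
print(questions_for({"rs123": "AG"}, rules))
```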

This is one of the reasons we have been collecting user feedback on blood type, through almost exactly that mechanism, for over a year. Deploying it on more consequential variants, and collecting data that can't be verified, can be problematic. We are encouraging one-on-one contact for selected variants where we explicitly solicit feedback, which is helping shape our thoughts about how to do this correctly. Perhaps the biggest issue right now is platform-specific false positives and false negatives, especially from the direct-to-consumer genotyping platforms, which confound any attempt to address the biological question of how penetrant a given variant may be for a given condition.


I am glad that you are doing a trial of this idea. As you said, a variety of issues could emerge. Allowing the information to flow outside the snpedia pages would be a great benefit. I was thinking more of super rare variants that crop up in exome/genome files and are listed in dbSNP as being of "uncertain clinical significance".

One implementation could be to run all existing variants in dbSNP through Mutation Taster and find those that Mutation Taster considers disease causing, that are not listed in snpedia, and that are rare. Any new variants that show up in an exome/genome file uploaded to Promethease could then also be run through Mutation Taster. You could then ask questions about variants flagged by Mutation Taster or those that others on snpedia had posted for comment. People might be more willing to help if they realized that members of their extended family carried these suspicious variants and were concerned about them.
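The filtering step described here could look like the following Python sketch. It assumes each variant already carries a population minor-allele frequency and a predictor label; the `ScoredVariant` fields, the label string "disease causing", and the 0.001 MAF cutoff are assumptions for illustration, not Mutation Taster's actual output format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ScoredVariant:
    variant_id: str
    maf: float        # population minor-allele frequency
    prediction: str   # predictor label (assumed text)


def polling_candidates(variants: Iterable[ScoredVariant], snpedia_ids: Set[str],
                       max_maf: float = 0.001,
                       damaging_label: str = "disease causing") -> List[ScoredVariant]:
    """Rare, predicted-damaging variants that snpedia does not cover yet."""
    return [
        v for v in variants
        if v.prediction == damaging_label
        and v.maf < max_maf
        and v.variant_id not in snpedia_ids
    ]
```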

The RAB10 variant related to Alzheimer's might also be a good one to include in a trial. rs142787485 has been reported to have a large effect on reducing AD risk in carriers of APOE4. 1-2 million APOE4 carriers in America also carry the RAB10 variant and so would be at lowish risk of dementia. These 1-2 million Americans, and many millions elsewhere in the world, would greatly appreciate knowing whether the initial research can actually be replicated. A simple but well-formulated polling question on Promethease might quickly resolve any uncertainty. Those who carry the double combo could then be asked whether others in their family had also been genotyped. Finding other family members who carry an APOE4 allele but not the RAB10 variant, and who might have a different dementia experience than family members with the double combo, could help validate the RAB10 variant as protective. It would only require a few hundred APOE4 carriers with the RAB10 variant to establish a link more definitively. As it is, this story has been on the shelf since July 2016, and only in the last week or two has the stale research from 18 months ago been formally published. Millions of people should not have to wait years to know whether they are likely clear of dementia.

For common SNPs on gene chips this might not be as helpful. Common SNPs typically have small effects, and in this era of the mega-GWAS there should no longer be any common large-effect variants left undiscovered.

There are tens of thousands of variants in our exome files, many of which are extremely rare. It might be years, if not decades, before the meaning of these variants is unlocked by the official research community. However, a simple polling feature could help unlock the information much sooner. How many matches would be required to associate a SNP with an MAF of 0.0001 with some rare illness? There are probably still a fair number of traits/illnesses that have unreported, very rare, large-effect variants.
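One way to put numbers on that question is a binomial tail test: among carriers who answer, how many affected people would be too many to explain by the illness's background prevalence? The Python sketch assumes Hardy-Weinberg proportions, a known prevalence, truthful answers, and a strict significance level to allow for the many variants being tested; the 100,000 reports, 0.1% prevalence and 1e-5 threshold are illustrative values, not figures from this page.

```python
import math
from typing import Optional


def carrier_frequency(maf: float) -> float:
    """Chance of carrying at least one copy under Hardy-Weinberg equilibrium."""
    return 1.0 - (1.0 - maf) ** 2


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_affected_carriers(n_carriers: int, prevalence: float, alpha: float) -> Optional[int]:
    """Smallest number of affected carriers that would be surprising at level alpha."""
    for k in range(n_carriers + 1):
        if binomial_tail(k, n_carriers, prevalence) <= alpha:
            return k
    return None


reports = 100_000
carriers = round(reports * carrier_frequency(0.0001))  # about 20 expected carriers
print(carriers, min_affected_carriers(carriers, prevalence=0.001, alpha=1e-5))
```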


What is going on with all of these new accounts?

Is snpedia being spammed with all these new accounts? Perhaps no new accounts should be allowed until this can be sorted out. snpedia offers a valuable service to the community. Why might some individual or individuals be motivated to prevent others from adding to this database? J1 (talk)

yes, brutally. but you'll see none of them manage to make any page edits. The protection mechanism is working, it just doesn't care about signup. So they're completely toothless, and easier to ignore than to worry about. --- cariaso 20:29, 22 June 2018 (UTC)


Perhaps there could be a way of keeping the new accounts from being reported in the recent changes. It seems that all the spam is displacing legitimate changes. For example, I notice that my SORL1 post has gone unanswered. SORL1 could be the 6th autosomal dominant AD gene! J1 (talk)


Hmm, I noticed that I can sign up for an account without a robot check. Perhaps adding one could slow down the account sign-up bots. J1 (talk)


An Alzheimer's GWAS for the People

Might snpedia/Promethease be interested in hosting a genetic databank for people with Alzheimer's dementia? You have developed a knowledge base for handling large amounts of genetic data in the form of gene chip results, exome files, whole genome files, etc. This knowledge could be applied to giving your customers (and perhaps others) the opportunity to upload genetic data along with some phenotype information related to Alzheimer's (and perhaps a range of other genetic illnesses). Basically, Promethease (or a related online platform) could help assemble a GWAS. Typically, people with illnesses such as Alzheimer's do not contribute to the genetic research effort because of the various obstacles they must surmount. Tens of millions of people are currently coping with dementing illness, and many would quite likely be willing to upload their genetic and phenotype data if given a convenient option to do so. If 1 million AD gene files could be accumulated, then much of the genetic architecture of Alzheimer's could be discovered.

This idea could unlock the genetic risk of AD and other illnesses and have widespread benefits.
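At its core, the analysis such a databank would enable is a per-SNP case-control test. The Python sketch below runs a 1-df allelic chi-square on allele counts and flags SNPs below the conventional genome-wide threshold of 5 × 10^-8. Real GWAS pipelines add quality control, ancestry adjustment and usually logistic regression with covariates; the input format here is hypothetical.

```python
import math
from typing import Dict, List, Tuple

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

Counts = Tuple[int, int, int, int]  # (case_alt, case_ref, control_alt, control_ref) allele counts
Result = Tuple[str, float, float]   # (snp, chi-square, p-value)


def allelic_test(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df) on the 2x2 allele table and its p-value."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    denom = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
             * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    if denom == 0:
        return 0.0, 1.0  # monomorphic or empty table: no information
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


def scan(allele_counts: Dict[str, Counts]) -> Tuple[List[Result], List[Result]]:
    """Test every SNP; return all results sorted by p-value and the genome-wide hits."""
    results = [(snp, *allelic_test(*counts)) for snp, counts in allele_counts.items()]
    results.sort(key=lambda r: r[2])
    hits = [r for r in results if r[2] < GENOME_WIDE_ALPHA]
    return results, hits
```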


Kegg pathways for snpedia?

One feature that I (and probably many others) would find helpful would be displaying Promethease results with variants organized by major biological pathway. For example, I would find it highly informative if Promethease displayed the glycolysis pathway along with the variants that occur along it. The path might start with the glucose transporters such as the GLUTs, continue with the hexokinases, etc., and then show the lactate exporters (MCT-1, etc.). This comprehensive way of viewing personal genetics might considerably help people see a broader picture of their unique biology. While this information might not be completely comprehensible at this time, in time it will be more fully understood. In the meantime, people would have glimpses of this more complete view of their genome.

http://mpmp.huji.ac.il/maps/glycolysispath.html


For example, here are some genes involved in glycolysis: Organizing the variants for these genes by pathway along with MAF etc. in comma separated files would be of considerable value to Promethease customers.

Original member | NCBI (Entrez) Gene ID | Gene symbol | Gene description
10327 | 10327 | AKR1A1 | aldo-keto reductase family 1 member A1
124 | 124 | ADH1A | alcohol dehydrogenase 1A (class I), alpha ...
125 | 125 | ADH1B | alcohol dehydrogenase 1B (class I), beta p...
126 | 126 | ADH1C | alcohol dehydrogenase 1C (class I), gamma ...
127 | 127 | ADH4 | alcohol dehydrogenase 4 (class II), pi pol...
128 | 128 | ADH5 | alcohol dehydrogenase 5 (class III), chi p...
130 | 130 | ADH6 | alcohol dehydrogenase 6 (class V)
130589 | 130589 | GALM | galactose mutarotase
131 | 131 | ADH7 | alcohol dehydrogenase 7 (class IV), mu or ...
160287 | 160287 | LDHAL6A | lactate dehydrogenase A like 6A
1737 | 1737 | DLAT | dihydrolipoamide S-acetyltransferase
1738 | 1738 | DLD | dihydrolipoamide dehydrogenase
2023 | 2023 | ENO1 | enolase 1
2026 | 2026 | ENO2 | enolase 2
2027 | 2027 | ENO3 | enolase 3
217 | 217 | ALDH2 | aldehyde dehydrogenase 2 family member
218 | 218 | ALDH3A1 | aldehyde dehydrogenase 3 family member A1
219 | 219 | ALDH1B1 | aldehyde dehydrogenase 1 family member B1
220 | 220 | ALDH1A3 | aldehyde dehydrogenase 1 family member A3
2203 | 2203 | FBP1 | fructose-bisphosphatase 1
221 | 221 | ALDH3B1 | aldehyde dehydrogenase 3 family member B1
222 | 222 | ALDH3B2 | aldehyde dehydrogenase 3 family member B2
223 | 223 | ALDH9A1 | aldehyde dehydrogenase 9 family member A1
224 | 224 | ALDH3A2 | aldehyde dehydrogenase 3 family member A2
226 | 226 | ALDOA | aldolase, fructose-bisphosphate A
229 | 229 | ALDOB | aldolase, fructose-bisphosphate B
230 | 230 | ALDOC | aldolase, fructose-bisphosphate C
2538 | 2538 | G6PC | glucose-6-phosphatase catalytic subunit
2597 | 2597 | GAPDH | glyceraldehyde-3-phosphate dehydrogenase
2645 | 2645 | GCK | glucokinase
2821 | 2821 | GPI | glucose-6-phosphate isomerase
3098 | 3098 | HK1 | hexokinase 1
3099 | 3099 | HK2 | hexokinase 2
3101 | 3101 | HK3 | hexokinase 3
3939 | 3939 | LDHA | lactate dehydrogenase A
3945 | 3945 | LDHB | lactate dehydrogenase B
3948 | 3948 | LDHC | lactate dehydrogenase C
441531 | 441531 | PGAM4 | phosphoglycerate mutase family member 4
501 | 501 | ALDH7A1 | aldehyde dehydrogenase 7 family member A1
5105 | 5105 | PCK1 | phosphoenolpyruvate carboxykinase 1
5106 | 5106 | PCK2 | phosphoenolpyruvate carboxykinase 2, mitoc...
5160 | 5160 | PDHA1 | pyruvate dehydrogenase E1 alpha 1 subunit
5161 | 5161 | PDHA2 | pyruvate dehydrogenase E1 alpha 2 subunit
5162 | 5162 | PDHB | pyruvate dehydrogenase E1 beta subunit
5211 | 5211 | PFKL | phosphofructokinase, liver type
5213 | 5213 | PFKM | phosphofructokinase, muscle
5214 | 5214 | PFKP | phosphofructokinase, platelet
5223 | 5223 | PGAM1 | phosphoglycerate mutase 1
5224 | 5224 | PGAM2 | phosphoglycerate mutase 2
5230 | 5230 | PGK1 | phosphoglycerate kinase 1
5232 | 5232 | PGK2 | phosphoglycerate kinase 2
5236 | 5236 | PGM1 | phosphoglucomutase 1
5313 | 5313 | PKLR | pyruvate kinase L/R
5315 | 5315 | PKM | pyruvate kinase M1/2
55276 | 55276 | PGM2 | phosphoglucomutase 2
55902 | 55902 | ACSS2 | acyl-CoA synthetase short chain family memb...
57818 | 57818 | G6PC2 | glucose-6-phosphatase catalytic subunit 2
669 | 669 | BPGM | bisphosphoglycerate mutase
7167 | 7167 | TPI1 | triosephosphate isomerase 1
84532 | 84532 | ACSS1 | acyl-CoA synthetase short chain family memb...
8789 | 8789 | FBP2 | fructose-bisphosphatase 2
92483 | 92483 | LDHAL6B | lactate dehydrogenase A like 6B
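Grouping variants by pathway step and writing them out as comma-separated values could be sketched as follows. The glycolysis steps are a hand-built subset of the genes listed above, in the usual order of the pathway (hexokinase through pyruvate kinase, then lactate dehydrogenase). The variant dictionary keys are hypothetical, and this does not use KEGG data or any KEGG interface.

```python
import csv
import io
from typing import Dict, List

# Hand-built glycolysis steps using genes from the list above (illustrative subset).
PATHWAYS: Dict[str, List[List[str]]] = {
    "glycolysis": [
        ["HK1", "HK2", "HK3", "GCK"],  # hexokinases
        ["GPI"],
        ["PFKL", "PFKM", "PFKP"],
        ["ALDOA", "ALDOB", "ALDOC"],
        ["TPI1"],
        ["GAPDH"],
        ["PGK1", "PGK2"],
        ["PGAM1", "PGAM2"],
        ["ENO1", "ENO2", "ENO3"],
        ["PKLR", "PKM"],
        ["LDHA", "LDHB", "LDHC"],  # pyruvate -> lactate
    ],
}


def pathway_csv(variants: List[dict], pathways: Dict[str, List[List[str]]] = PATHWAYS) -> str:
    """Write variants in pathway genes as CSV, ordered by pathway and step.

    Each variant is a dict with 'rsid', 'gene', 'genotype' and 'maf' keys
    (hypothetical schema).
    """
    gene_steps: Dict[str, List[tuple]] = {}
    for pathway, steps in pathways.items():
        for step, genes in enumerate(steps, start=1):
            for gene in genes:
                gene_steps.setdefault(gene, []).append((pathway, step))

    rows = [
        (pathway, step, v)
        for v in variants
        for pathway, step in gene_steps.get(v["gene"], [])
    ]
    rows.sort(key=lambda r: (r[0], r[1], r[2]["rsid"]))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["pathway", "step", "gene", "rsid", "genotype", "maf"])
    for pathway, step, v in rows:
        writer.writerow([pathway, step, v["gene"], v["rsid"], v["genotype"], v["maf"]])
    return out.getvalue()
```

Variants in genes outside the listed steps are simply left out, so the same function works unchanged if more pathways are added to the dictionary.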