User:J1

Exome Variant Server

Including the SNPs from the Exome Variant Server (located at http://evs.gs.washington.edu/EVS/) would be a great addition to Promethease. This site lists rare variants in each gene along with a PolyPhen-2 score, which attempts to predict how damaging each mutation is. GWAS have yet to examine many of these mutations, so the deleterious (or salutary) nature of these SNPs is yet to be confirmed. However, it would be useful to know whether one carries these rare missense mutations in key disease genes.

Consider the journal article below.

ApoE variant p.V236E is associated with markedly reduced risk of Alzheimer's disease. Molecular Neurodegeneration 2014, 9:11. doi:10.1186/1750-1326-9-11

The method of selecting the SNPs for the study can be replicated on the EVS site: search for the APOE gene, set the "Sorts by Variant" drop-down box to "GVS Function", and choose "Show 50 entries". The SNPs highlighted in reddish orange are missense mutations. The study above investigated rs769452 (on 23andme), rs199768005 and rs769455, which are the most frequent missense mutations in APOE (look down the EA Allele # column for rows with a reddish-orange background) aside from rs7412 and rs429358. [These two SNPs define the APOE epsilon2/epsilon3/epsilon4 alleles; strangely, EVS lists rs429358 as benign.] The study found that the rare SNP rs199768005 forms an APOE epsilon3b haplotype that markedly reduces the risk of Alzheimer's disease (OR = 0.10).

These mutations are very rare. However, a similar search could be done for any other gene! Collectively, these rare mutations could explain a substantial amount of disease risk (especially the mutations with large effect sizes). Many of these SNPs are not on the 23andme chip. Still, rs9331936 and rs9331938 (considered benign) from CLU are on it, as is rs3752233 from ABCA7. With a full genome sequence in hand, every variant in the EVS database could be called. Many of these mutations are rare enough that they are not in AlzGene either.

I don't think that the study found that rs62256378 AA reduced AD risk. The SNP was shown to reduce cognitive decline in AD patients.

Promethease could include these rare mutations in reports without waiting for GWAS confirmation.

EVS Response

http://evs.gs.washington.edu/evs_bulk_data/ESP6500SI-V2-SSA137.protein-hgvs-update.snps_indels.vcf.tar.gz

This file contains about 1.89 million unique SNPs with rs numbers (1,889,348). 272,781 of these carry what appears to be the most severe classification, 'probably-damaging'. Adding all of those would quadruple the size of SNPedia. I'm not completely opposed, but I would hope to find a way to filter further so that the signal-to-noise ratio improves. I've run a Promethease report based on the ESP SNPs that are already in SNPedia, and it is available to view or download.
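A minimal Python sketch of the kind of count described here: unique rs numbers, and those with a 'probably-damaging' PolyPhen call. It assumes the archive has already been unpacked into per-chromosome VCF files, that rs numbers sit in the VCF ID column, and that the PolyPhen prediction is stored under an INFO key named PH. Those field names are assumptions to check against the ##INFO header lines of the actual file.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ('K1=V1;K2=V2;FLAG') into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def count_damaging(lines: Iterable[str], ph_key: str = "PH") -> Tuple[int, int]:
    """Return (unique rsIDs, unique rsIDs with a 'probably-damaging' call).

    ASSUMPTION: the PolyPhen prediction is stored under the INFO key `ph_key`,
    e.g. 'PH=probably-damaging:0.99'. Check the file's ##INFO header.
    """
    all_ids, damaging = set(), set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        cols = line.rstrip("\n").split("\t")
        # the ID column may hold several ';'-separated identifiers, or '.'
        rs_ids = [i for i in cols[2].split(";") if i.startswith("rs")]
        if not rs_ids:
            continue
        all_ids.update(rs_ids)
        if "probably-damaging" in parse_info(cols[7]).get(ph_key, ""):
            damaging.update(rs_ids)
    return len(all_ids), len(damaging)


def open_vcf(path: str) -> Iterator[str]:
    """Yield text lines from a plain or gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle
```

Usage would be something like count_damaging(open_vcf("some_chromosome.vcf")) per file (the file name is hypothetical), merging the ID sets if you want totals across chromosomes.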

The SNPs in the Exome Variant Server (EVS) are likely very informative about possible disease risk. The missense SNPs listed in EVS are all in the exome (it would be best to include all of the missense SNPs, because quite a few of them seem to be miscategorized)! All of these SNPs change an amino acid in a protein. That sounds dangerous! Further, these SNPs are probably the causal mutations that everyone has been searching for. The article notes that these mutations are close to a definitive list of all exonic mutations that occur at a frequency of approximately 0.1% or greater.

Exactly because these SNPs are so potentially dangerous, there might not be much noise. It might be all signal. The article referenced above simply searched for the 3 most common missense SNPs in APOE (aside from the epsilon genotype SNPs) and made one discovery plus one finding that needs more research. Not bad: 1+ out of 3. Rare missense SNPs might commonly lead to such discoveries. Now think of the hundreds of thousands of SNPs on most gene chips that have no discernible significance at all. Gene chips are almost all noise and no signal. For many diseases, geneticists do not even bother including common SNPs in the model because they add little (if any) information.

If you really wanted to filter, though, you could include only the missense SNPs in disease genes. For example, for Alzheimer's disease you could search for all the Alzheimer's SNPs in Promethease, determine which genes they are in, and then add all the missense exome SNPs from those genes to the Alzheimer's tab. If you included missense SNPs from genes without a known disease association, it would not be clear what illness might result (though it would still be best to include all the missense SNPs).
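The gene-based filter just described can be sketched in Python. The input layout (rsID, gene symbol, functional class such as the EVS "GVS Function" value) is a hypothetical one chosen for illustration; only the filtering logic is the point.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (rsID, gene symbol, functional class).
Variant = Tuple[str, str, str]


def disease_genes(snp_to_gene: Dict[str, str], disease_snps: Iterable[str]) -> Set[str]:
    """Genes that host at least one SNP already linked to the disease."""
    return {snp_to_gene[s] for s in disease_snps if s in snp_to_gene}


def missense_in_genes(variants: Iterable[Variant], genes: Set[str]) -> List[str]:
    """rsIDs of missense variants that fall in the given genes, in input order."""
    return [rsid for rsid, gene, func in variants
            if gene in genes and func == "missense"]
```

For Alzheimer's, disease_snps would be the SNPs shown in the Alzheimer's section of a report and variants would come from the EVS table; both sources and the exact "missense" label are assumptions.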

These exonic missense SNPs are what Promethease customers are most interested in. However, most of them are rare (likely because they are not adaptive). Even if 1 million of these missense SNPs, with an average frequency of 0.1%, were added to Promethease, a typical Promethease report would contain only about 1,000 to 2,000 of them, depending on whether 0.1% is read as a carrier frequency or an allele frequency. (Many of these SNPs are not on the 23andme chip.)
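A quick way to check this kind of estimate, assuming every added SNP has the same allele frequency and that genotypes follow Hardy-Weinberg proportions (both simplifications):

```python
def expected_carried(n_variants: int, allele_freq: float) -> float:
    """Expected number of sites where a diploid person carries at least one
    copy of the rare allele, assuming Hardy-Weinberg proportions and one
    shared allele frequency for all sites."""
    carrier_prob = 1.0 - (1.0 - allele_freq) ** 2
    return n_variants * carrier_prob


print(round(expected_carried(1_000_000, 0.001)))  # prints 1999
```

Reading 0.1% as a carrier frequency instead gives simply 1,000,000 × 0.001 = 1,000.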

It would be amazing if 23andme offered an exome SNP chip service that included 1 to 1.89 million of these SNPs. (Illumina has an exome chip.) It would be worth $100 to have this information.

It would also be great if Promethease included these SNPs and combined them with family genotyping. If a family with a genetic illness ran Promethease with these rare EVS SNPs included, and the illness tracked with particular rare SNPs, that would be very helpful for them to know. It would be especially helpful if whole genomes were sequenced so that all of the rare EVS SNPs could be called.
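A sketch of the simplest family check, assuming a dominant pattern in which every affected relative carries the variant and no unaffected relative does. The family layout is hypothetical, and a real analysis would have to allow for incomplete penetrance, phenocopies and recessive inheritance.

```python
from typing import Dict, Set, Tuple

# member name -> (is_affected, set of rare variant IDs carried); hypothetical layout
Family = Dict[str, Tuple[bool, Set[str]]]


def cosegregating(family: Family) -> Set[str]:
    """Variants carried by every affected member and by no unaffected member."""
    affected = [v for is_aff, v in family.values() if is_aff]
    unaffected = [v for is_aff, v in family.values() if not is_aff]
    if not affected:
        return set()
    shared = set.intersection(*affected)  # returns a new set
    for carried in unaffected:
        shared -= carried
    return shared
```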

The Promethease report noted above, which included the EVS dataset, has 132 SNPs with 0 frequency, and these are likely from the EVS. This is not an excessive addition to the size of the report, and it would be very interesting to investigate each of these SNPs for possible disease significance.

There is certainly a lot of discussion in the exome and WGS world about VUS (variants of unknown significance). Currently, though, the vast (or maybe I should say VAAST) majority of Promethease users get their data from DNA chips. Very few people at this time have exome or, especially, WGS data for themselves and their families. That is surely going to change, but it hasn't yet, and getting even an exome done costs about $1000 at the moment, an order of magnitude more than SNP chip data. We are actually in discussions about ways to help folks get exome data, and in parallel will keep up with the world of predicted (rather than published) significance of DNA variants. Greg (talk) 03:02, 17 March 2014 (UTC)

Genome Imputation

For anyone interested in imputing their genome, this should be helpful.

-Download Plink 1.9 from https://www.cog-genomics.org/plink2/, make a folder on the desktop, and put Plink 1.9 into it.

-Change the current directory to the folder containing Plink, using the commands cd and cd.. (cd.. moves you up one directory). For example, if the current directory is C:\Users\A\Desktop and the folder is named plink, type cd plink. [You need to open the Windows command prompt to do this. In Windows 8, it can be opened by swiping in from the top-right corner, typing "command prompt" in the search box, pressing Enter, and clicking the Command Prompt icon that is displayed.]

-Read in your 23andme files with the commands below (Impute2 requires at least two people in the -g file to function):

plink --23file genome.txt AA --out plink_genome1
plink --23file genome.txt BB --out plink_genome2

-Merge the 23andme files with the command below:

plink --bfile plink_genome1 --bmerge plink_genome2 --make-bed --out combogenome1_trial

-Recode the merged Plink file to Impute2 format (this writes combogenome1ox.gen and combogenome1ox.sample) with the command below:

plink --bfile combogenome1_trial --recode oxford --out combogenome1ox

-Download the Impute2 software from http://mathgen.stats.ox.ac.uk/impute/impute_v2.html, make a folder on the desktop, and put the Impute2 download into it.

-Change to the folder where you put the Impute2 software, using the command prompt commands cd.. and cd as before.

-Run the examples using the commands given on the Impute2 website.

-Download the 1000 Genomes reference files from the Impute2 site and put them into the folder with Impute2. (These files are huge, about 3 GB.)

-Unzip them inside the Impute2 folder.

-Run the command below to impute your 23andme data. To move along a chromosome, change the start and end positions after -int. To change chromosome, replace "21" with the desired chromosome number in the -m, -h and -l file names and in the -o argument, and change the number in "Impute1" so that you keep a record of each run's results. (You might also need to rename the "ALL" files, as their names might be too long.)

impute2 -m genetic_map_chr21_combined_b37.txt -h ALL.chr21.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nomono.haplotypes -l ALL.chr21.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nomono.legend -g combogenome1ox.gen -int 40.0e6 42.0e6 -Ne 20000 -o Impute1.chr21

-Check the concordance table at the end of each run; the values should be above 90%. It can sometimes be difficult to get high concordance. One thing that can help is changing the composition of the -g file.

-If this all works, you will be able to impute millions of SNPs from your 23andme file! For SNPs with non-reference allele frequencies above 10%, the aggregate r² values are in the mid-0.90s (http://mathgen.stats.ox.ac.uk/impute/data_download_1000G_phase1_integrated_SHAPEIT2_9-12-13.html).

This means that for most common SNPs you should get an accurately imputed genotype. This would be very useful if you wanted to estimate your risk of a disease influenced by many common variants.
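Running the impute2 step by hand for every window of every chromosome is tedious, so here is a small Python sketch that only builds the command lines (it does not run anything). The file-name patterns are copied from the impute2 command above with the chromosome number made a parameter; the default 2 Mb window, the per-window -o naming scheme and the combogenome1ox.gen input name are assumptions to adjust to your own files.

```python
import shlex
from typing import List

# Patterns taken from the impute2 command in the steps above; edit them if you
# renamed the reference files (for example to shorten the "ALL" names).
MAP = "genetic_map_chr{c}_combined_b37.txt"
REF = "ALL.chr{c}.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nomono"


def impute2_commands(chrom: int, start: int, end: int, chunk: int = 2_000_000,
                     gen_file: str = "combogenome1ox.gen", run: int = 1) -> List[str]:
    """Build one impute2 command line per `chunk`-sized window of [start, end)."""
    cmds = []
    for lo in range(start, end, chunk):
        hi = min(lo + chunk, end)
        args = [
            "impute2",
            "-m", MAP.format(c=chrom),
            "-h", REF.format(c=chrom) + ".haplotypes",
            "-l", REF.format(c=chrom) + ".legend",
            "-g", gen_file,
            "-int", str(lo), str(hi),
            "-Ne", "20000",
            # hypothetical naming: run number, chromosome and window in the name
            "-o", f"Impute{run}.chr{chrom}.{lo}-{hi}",
        ]
        cmds.append(shlex.join(args))  # shlex.join needs Python 3.8+
    return cmds


for cmd in impute2_commands(21, 40_000_000, 44_000_000):
    print(cmd)
```

The printed lines can be pasted into the command prompt one at a time, or saved to a batch file.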

Double Signal

The recent International Genomics of Alzheimer's Project (IGAP) study cites rs6733839 (MAF = 0.37 for the T allele; see the first table in [Alzheimer's Disease, late onset (IGAP)]), located at 2:127135234 in the gene BIN1, as a risk SNP for Alzheimer's disease with OR = 1.22. In fact, BIN1 has the second-largest population attributable fraction for AD, behind only APOE (see the same table).

SNPedia has recently added rs744373 (dbSNP lists MAF = 0.37 for the G allele, while SNPedia lists the minor allele as C), located at 2:127137039 in the gene BIN1, as a risk allele for AD with OR = 1.17.

rs6733839 and rs744373 are less than 2,000 base pairs apart (1,805 bp), and they have similar ORs and MAFs. Therefore, it seems reasonable to suspect that these 2 SNPs are actually reporting the same signal. It is important that Promethease report only one of these risk variants. (Admittedly, rs744373 might be the better choice, as it is on the 23andme gene chip, whereas rs6733839 is not.)
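Whether two nearby SNPs report the same signal is a question of linkage disequilibrium (LD), which requires haplotype frequencies (for example from 1000 Genomes), not just matching MAFs and ORs. A minimal Python sketch of the standard LD measures D, D' and r²; the frequencies in the check below are made up, not measured values for these two SNPs.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

Two SNPs with identical allele frequencies whose minor alleles always sit on the same haplotype give r² = 1; the same MAFs with independent haplotypes give r² = 0, which is why matching MAFs alone cannot settle the question.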

It seems reasonable to suspect that these 2 SNPs are actually reporting the same signal.

Agreed

It is important that Promethease report only one of these risk variants. (Admittedly, rs744373 might be the better choice as it is on the 23andme gene chip, whereas rs6733839 is not on the gene chip).

Disagreed.

Promethease is not just a support tool for 23andMe data, and certainly not just for the current version of their platform. They're currently on their 4th configuration, and that doesn't even count the exome sequencing they've offered. Promethease is used by MANY other audiences, including a lot of full genome sequencing. SNPedia text can be used to highlight the LD, but assuming that there is no LD between SNPs in a Promethease report is dangerously wrong.

--- cariaso 00:53, 16 May 2014 (UTC)

Large Schizophrenia GWAS

The Psychiatric Genomics Consortium (PGC) is claiming 128 genome-wide significant associations for schizophrenia. With their current sampling effort, this number should approach 1,000 within the next year. Slide 18 of the presentation linked below shows that the 128 SNPs can distinguish cases from controls at p = 4x10^-175! Look at the upper decile of risk in the figure: the odds ratio is around 7. I am not aware of a publication that reports these results, though it will be very useful to have this information when it becomes available (especially since these SNPs are reported to be common).

Does anyone know how to get genotyped on the Psych chip? PGC has professed its commitment to open science. Does open science mean that people can opt in and help move the science forward?

http://www.med.unc.edu/cpg/news/banbury-presentation

There are several reports of combinations of HFE, TF and APOE epsilon variants, among others, that might be made into genosets.

For example, bicarriers of HFE C282Y and HFE H63D, who make up 3% of the population, appear to be at increased risk for Alzheimer's disease.

See: Iron genes, iron load and risk of Alzheimer’s disease. J Med Genet 2006;43:e52 (http://www.jmedgenet.com/cgi/content/full/43/10/e52). doi:10.1136/jmg.2006.040519

This result may not have been replicated, but the article notes that these people would be at risk for a condition that is treatable. Perhaps SNPedia could provide this as supplemental information.

Also see: Involvement of ApoE E4 and H63D in sporadic Alzheimer's disease in a folate-supplemented Ontario population. J Alzheimers Dis. 2008 May;14(1):69-84
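As a sketch of the genoset idea, the bicarrier rule can be written as a simple check on raw genotypes. The rs numbers and risk alleles below (rs1800562 A for C282Y, rs1799945 G for H63D, plus strand as in a 23andme file) are assumptions that should be verified before use, and this is plain Python, not SNPedia's genoset syntax.

```python
from typing import Dict, Tuple

# ASSUMPTION: plus-strand risk alleles as reported in a 23andme raw-data file.
C282Y: Tuple[str, str] = ("rs1800562", "A")
H63D: Tuple[str, str] = ("rs1799945", "G")


def risk_allele_count(genotypes: Dict[str, str], snp: Tuple[str, str]) -> int:
    """Copies of the risk allele in a two-letter call; 0 if the SNP is missing."""
    rsid, allele = snp
    return genotypes.get(rsid, "").count(allele)


def hfe_bicarrier(genotypes: Dict[str, str]) -> bool:
    """True if at least one copy of both C282Y and H63D is present.

    Whether this matches the exact 'bicarrier' definition used in the
    2006 report is an assumption.
    """
    return (risk_allele_count(genotypes, C282Y) >= 1
            and risk_allele_count(genotypes, H63D) >= 1)
```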

Huge news!

The Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) has released the results of a massive GWAS of schizophrenia. Open access at Nature: http://www.nature.com/nature/journal/vaop/ncurrent/full/nature13595.html

They found 108 loci.

Schizophrenia has, until now, been considered undecipherable.

We might need a team on snpedia to upload all these results.

So on a quick first read of the data, there are several issues, which perhaps dedicated SNPedians can help work out. First and foremost, the list of 108 loci [Supplementary Table 3] contains no SNPs; they are just loci, meaning regions spanning at least 20 kb each, without named SNPs. 128 SNPs are listed in Supplementary Table 2, and these could presumably be paired off with the loci listed in Table 3. Second, at first glance there doesn't appear to be any data indicating whether the effect (positive or negative) of each SNP is seen only for the homozygous minor allele or also for heterozygotes. And third, the odds ratios are so small (i.e. so close to 1) that the only application seems to be the "just add'm up" approach, binning in this case into deciles of sums ranging from (presumably) 0 to 216 (i.e. 108x2), and as the authors stress, neither the sensitivity nor the specificity of even this approach has predictive value. Greg (talk) 18:56, 23 July 2014 (UTC)
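The "just add'm up" approach can be sketched in a few lines of Python: an unweighted risk-allele count (0 to 2N), a version weighted by log(OR), and a helper that places a score into deciles (or percentiles, with n_bins=100) of a reference distribution. It assumes an additive model (a heterozygote counts half as much as a homozygote), which is exactly the point the data may not settle, and the weights are placeholders, not PGC results.

```python
from bisect import bisect_right
from math import log
from typing import Dict, Sequence, Tuple

# rsID -> (risk allele, odds ratio). Placeholder values only, NOT PGC results.
Weights = Dict[str, Tuple[str, float]]
Genotypes = Dict[str, str]  # rsID -> two-letter call, e.g. "AG"


def allele_count_score(genotypes: Genotypes, weights: Weights) -> int:
    """Unweighted score: total risk alleles carried (0 to 2 * len(weights)).
    Missing SNPs count as zero risk alleles."""
    return sum(genotypes.get(rsid, "").count(allele)
               for rsid, (allele, _) in weights.items())


def log_or_score(genotypes: Genotypes, weights: Weights) -> float:
    """Weighted score: each risk allele adds log(OR) (additive model)."""
    return sum(genotypes.get(rsid, "").count(allele) * log(odds)
               for rsid, (allele, odds) in weights.items())


def quantile_bin(score: float, reference: Sequence[float], n_bins: int = 10) -> int:
    """Return the bin (1 = lowest, n_bins = highest) that `score` falls into,
    using cut points taken from a reference distribution of scores."""
    ref = sorted(reference)
    cutoffs = [ref[len(ref) * k // n_bins] for k in range(1, n_bins)]
    return bisect_right(cutoffs, score) + 1


weights = {"rs_a": ("A", 1.10), "rs_b": ("T", 1.20)}  # hypothetical SNPs
me = {"rs_a": "AA", "rs_b": "CT"}
print(allele_count_score(me, weights))  # 3
```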

Yes, the real issue is how to present these results on SNPedia in an accessible way. The authors of the present study intend to continue with larger GWAS including hundreds of thousands of subjects. An estimated 10,000 genetic loci are involved in schizophrenia risk. The 108 loci reported in the current study likely have the strongest effects among the common variants. Yet the highest OR was only 1.3. We are now starting to see GWAS that report large numbers of SNPs conveying minimal amounts of useful information. The SNP with OR 1.3 had a frequency of 0.0225 in cases and 0.0191 in controls. The difference is minuscule!
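For a sense of scale, a crude allelic odds ratio can be computed directly from case and control allele frequencies. A minimal sketch:

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Crude allelic odds ratio from risk-allele frequencies in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


print(round(allelic_odds_ratio(0.0225, 0.0191), 2))  # 1.18
```

With the frequencies quoted above this gives roughly 1.18. Published GWAS odds ratios usually come from covariate-adjusted regression and meta-analysis, so a crude figure need not match the reported one exactly, but it is a quick sanity check.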

Clearly SNPedia is going to have to move to the add'm up approach. However, it is very unclear what meaningful biological units are being measured in such a sum (apples and oranges). Telling Promethease users that they have 1,000 markers predisposing them to schizophrenia would scare them without conveying the truth: having only 1,000 markers might mean that they are in the low-risk decile. Having a single number that conveys meaningful information about risk for schizophrenia, intelligence, etc. would make Promethease a much more valuable service.

Even with the warnings given in the article, I would still be interested in knowing what decile I am in. The top decile had an OR of 25!

On slide 18 of the presentation (see URL below), the area under the curve is quoted as 0.7. That is not good enough for a diagnosis, though it is getting there.

http://www.med.unc.edu/cpg/news/banbury-presentation
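AUC can be read as the probability that a randomly chosen case has a higher risk score than a randomly chosen control, so 0.5 is a coin flip and 1.0 is perfect separation. A direct (if slow) Python sketch of that definition, counting ties as one half:

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls.

    O(n*m), fine for a sketch; a rank-based formula is faster for large samples.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```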

Schizophrenia occurs in only 1% of the population. Why did they divide the risk scores into deciles (percentiles might be more informative)? It would be very interesting to know what the OR for the top percentile was and how predictive it was for schizophrenia.

None of the 23andme SNPs appear to be included in the new study.

Well, how about for the moment we agree that when the AUC is 0.90 or greater, it's worth entering, even with all the caveats? Greg (talk) 00:14, 26 July 2014 (UTC)

Yes, though it will be a tough wait. It could take years to get to an AUC of 0.90.

SNP Linkage Tool

Does Promethease routinely use the SNP linkage tool from http://www.broadinstitute.org/mpg/snap/ldsearch.php ?

I was reading an article on PubMed (http://www.ncbi.nlm.nih.gov/pubmed/25038421) that included a SNP (rs998382) not listed in my 23andme file. (The study found a greatly increased risk of AD in carriers of a haplotype.) However, when I entered rs998382 into the SNP linkage tool above, four SNPs were listed with a linkage value of 1.000, and several more were near 0.900. Perhaps SNPedia could include these SNPs on the rs pages for the convenience of its readers?

Would Promethease have picked up on this and given the results to 23andme users? If this feature is not included in Promethease, it should be. Perhaps whatever linkage threshold corresponds to a 95% probability of determining the correct genotype could be used. This would be a fast and easy way for Promethease to do some imputation.
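A sketch of proxy-based lookup of the sort suggested above: use the best typed proxy whose r² meets a threshold and translate its alleles to the target SNP. The proxy table layout, the allele map (which must come from phased haplotype data, since r² alone does not say which alleles travel together) and the 0.95 cutoff are assumptions; an r² threshold is not the same thing as a 95% probability of a correct genotype.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical proxy entry: (proxy rsID, r^2, map of proxy allele -> target allele).
Proxy = Tuple[str, float, Dict[str, str]]


def impute_from_proxy(target: str, proxies: Dict[str, List[Proxy]],
                      genotypes: Dict[str, str],
                      min_r2: float = 0.95) -> Optional[Tuple[str, str]]:
    """Return (inferred genotype, SNP used) or None if no usable proxy exists."""
    if target in genotypes:
        return genotypes[target], target  # directly typed, no proxy needed
    candidates = sorted(proxies.get(target, []), key=lambda p: p[1], reverse=True)
    for rsid, r2, allele_map in candidates:
        if r2 < min_r2 or rsid not in genotypes:
            continue
        called = genotypes[rsid]
        # skip no-calls such as "--" or alleles missing from the map
        if all(a in allele_map for a in called):
            return "".join(sorted(allele_map[a] for a in called)), rsid
    return None
```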

The text below (in quotation marks) was copied from the top of this page and provides background information on the Exome Variant Server.

"Including the SNPs from the Exome Variant Server (located at http://evs.gs.washington.edu/EVS/) would be a great addition to Promethease. This site lists rare variants in each gene along with a PolyPhen-2 score, which attempts to predict how damaging each mutation is. GWAS have yet to examine many of these mutations, so the deleterious (or salutary) nature of these SNPs is yet to be confirmed. However, it would be useful to know whether one carries these rare missense mutations in key disease genes.

Consider the journal article below.

APOE epsilon 3b markedly reduces risk of AD

ApoE variant p.V236E is associated with markedly reduced risk of Alzheimer's disease. Molecular Neurodegeneration 2014, 9:11. doi:10.1186/1750-1326-9-11

The method of selecting the SNPs for the study can be replicated on the EVS site: search for the APOE gene, set the "Sorts by Variant" drop-down box to "GVS Function", and choose "Show 50 entries". The SNPs highlighted in reddish orange are missense mutations. The study above investigated rs769452 (on 23andme), rs199768005 and rs769455, which are the most frequent missense mutations in APOE (look down the EA Allele # column for rows with a reddish-orange background) aside from rs7412 and rs429358. [These two SNPs define the APOE epsilon2/epsilon3/epsilon4 alleles; strangely, EVS lists rs429358 as benign.] The study found that the rare SNP rs199768005 forms an APOE epsilon3b haplotype that markedly reduces the risk of Alzheimer's disease (OR = 0.10)."