
Genes for Good

From SNPedia


After answering 15 health history and 20 health tracking surveys on Facebook, you can get free genetic testing, which includes your raw data. This data is compatible with Promethease, and according to Genes for Good, the raw data consists of ~550,000 genotypes, assayed by microarray.

Some Genes for Good files contain imputed data. These files will yield the largest Promethease report, but the 'imputed' genotypes are statistically inferred rather than measured, and may not be true. Individuals from ethnic groups which were not part of the Genes for Good training sets can expect more errors than people from groups similar to those in the training sets. It's unclear exactly which groups were used for training, but it's safe to assume Western Europeans are well represented.

Files that don't mention 'imputed' should include only genotypes which were actually observed. This results in a smaller Promethease report, but one with higher confidence.

As of Feb 2017 Promethease reports for the unphased (non-imputed) Genes for Good raw data have ~16,700 genotypes reported, with ~2,700 of them being from ClinVar.

Some of their files are also labeled 'noY_noMT', indicating that SNPs from these haploid chromosomes (Y and mitochondrial) are not included. Without them it is impossible to see any data related to your haplogroups.

For a small ($2) additional fee, you can combine (pool) additional files together, which might let you have the best of both worlds.

Some additional questions are addressed at https://www.reddit.com/r/freebies/comments/67v9c5/free_dna_test_from_the_university_of_michigan/

Which Files Can Be Uploaded to Promethease?

Genes for Good appears to offer several download options. If you download a zipped file containing all your raw data, you will need to unzip that file first. There are usually 9 unzipped files, as shown here:

[Image: Genes for Good unzipped files to use with Promethease]

While Promethease can use the files in the VCF and 23andMe .txt formats, as shown in the image, we recommend using the files that are in .gz format, since they are compressed and will upload more quickly.

Interpreting a Pooled Report

If you choose to add your imputed data to your original data to get a combined Promethease report, notice that many of the genotypes in your Promethease report say 'count 2'. Those were the ones found in both files, original and imputed. The ones that were in only one file don't say that. Since we expect everything from the original file to also be in the imputed file, any genotype without 'count 2' can be taken as imputed.
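The 'count 2' reasoning above is essentially set arithmetic on the rs numbers found in each file. A minimal sketch, assuming you have already extracted the rs numbers from both files; the function name `classify_pooled` and the rs numbers in the usage example are hypothetical, and this is not how Promethease itself computes the count:

```python
def classify_pooled(observed_ids, imputed_ids):
    """Split rs numbers by which of the two pooled files contain them."""
    observed = set(observed_ids)
    imputed = set(imputed_ids)
    return {
        # Present in both files: these would show 'count 2' and were actually observed.
        "count_2": observed & imputed,
        # Present only in the imputed file: these were imputed, not observed.
        "imputed_only": imputed - observed,
        # Present only in the original file: not expected, but possible.
        "observed_only": observed - imputed,
    }


# Made-up rs numbers, for illustration only.
result = classify_pooled(
    observed_ids=["rs1", "rs2"],
    imputed_ids=["rs1", "rs2", "rs3"],
)
print(sorted(result["imputed_only"]))
```

Anything that lands in `observed_only` contradicts the expectation that the imputed file is a superset of the original, so it is worth a closer look.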

In practice there are a few genotypes that differ between the files, especially if you combine data from different companies. This relates mostly to different representations of the same information; for example, 23andMe uses II, DD or DI to indicate indels (insertions and deletions). Genes for Good will usually use the actual genotype, so you'll probably see (for example) rs1234(G;G) or rs1234(;) or rs1234(-;G). By clicking the checkbox for conflicts, you can find these.

Imputed Multiple Calls

The imputed file sometimes contains multiple lines with the same rs#, but different genotypes. A specific example looks like this:

 1 20227723 rs4654925 G T . PASS . GT 0|0
 1 20227723 rs4654925 G C . PASS . GT 1|1

Promethease interprets this to mean that both rs4654925(G;G) and rs4654925(C;C) are reported. Genes for Good says it intends a different interpretation:

The example you describe is a multi-allelic SNP. Most SNPs are biallelic, meaning that there are two possible alleles, for instance A or T. In your example, the possible alleles are G, T, and C. Since the imputed genotypes are not directly measured and are instead our best "guess" - using statistics and a reference genome of course - this results in a slightly tricky situation when dealing with multi-allelic SNPs. The first line is the imputed value if estimating between G and T. The second line is the imputed value if estimating between G and C. We discussed removing multi-allelic SNPs from the imputed files, but decided against it.
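To see which rs numbers in an imputed file are affected, you can decode each line's GT field against its own REF and ALT columns and look for rs numbers that end up with more than one distinct genotype. The following is a rough sketch, assuming whitespace-separated VCF data lines with a single sample whose only FORMAT field is GT, as in the example above; the function names are hypothetical and this is not Promethease's or Genes for Good's own code:

```python
from collections import defaultdict

# The two lines from the example above.
VCF_LINES = [
    "1 20227723 rs4654925 G T . PASS . GT 0|0",
    "1 20227723 rs4654925 G C . PASS . GT 1|1",
]


def decode_genotype(line):
    """Return (rsid, alleles) for one VCF data line, read on its own."""
    fields = line.split()
    rsid, ref, alt = fields[2], fields[3], fields[4]
    gt = fields[9]
    # Index 0 is REF; indexes 1.. are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    indexes = gt.replace("/", "|").split("|")
    called = tuple("." if i == "." else alleles[int(i)] for i in indexes)
    return rsid, called


def multiple_calls(lines):
    """Map each rs number that has differing genotypes to its list of calls."""
    calls = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        rsid, called = decode_genotype(line)
        calls[rsid].append(called)
    return {rsid: c for rsid, c in calls.items() if len(set(c)) > 1}


print(multiple_calls(VCF_LINES))
# {'rs4654925': [('G', 'G'), ('C', 'C')]}
```

Reading each line independently like this reproduces the Promethease interpretation. Under the Genes for Good explanation, the two calls are separate estimates (G versus T, and G versus C) for one multi-allelic SNP, so any rs number flagged here deserves extra caution.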

See also VCF.