From SNPedia

DNA.land is a service that accepts a person's genomic DNA data files from companies such as 23andMe, Ancestry.com, and FamilyTreeDNA for ancestry and possibly other research purposes.

DNA.land will take a person's data as produced by such companies and impute additional variants based on population frequency statistics. To put this in concrete terms, a person uploading a typical 23andMe file of ~700,000 variants to DNA.land will get back an (imputed) file of ~39 million variants, all predicted to be present in the person. Promethease reports from such imputed files typically contain about 50% more information (i.e. 50% more genotypes) than the corresponding reports from raw (non-imputed) data.

The process of imputation works best in relatively genetically homogeneous individuals from well-studied populations (such as Caucasians), and it works best at predicting common variants co-inherited near other common variants. Conversely, it is less accurate in individuals of mixed ancestry or from minority populations, and the rarer the variant a person actually carries, the less accurate imputation will be at predicting its existence. Many variants of potential high interest are (unfortunately for imputation) the rarest.

Be aware that, at least at the moment and from what we can see, the DNA.land file of imputed SNPs does not appear to distinguish genotypes that were imputed from those that were in the original raw data file. So, if a certain genotype in your Promethease report is of particular interest and you want to know whether it was imputed, you will have to go back to your original raw data (for example, the original 23andMe raw data file) and check whether it is there.
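The check described above can be automated. The following is only a minimal sketch, not a DNA.land or Promethease tool. It assumes that the raw file uses the 23andMe-style tab-separated layout (rsid, chromosome, position, genotype, with '#' comment lines and '--' for no-calls) and that the imputed VCF carries rsIDs in its ID column. The function names and labels are made up for illustration.

```python
from typing import Iterable, Iterator, Set, Tuple


def raw_rsids(raw_lines: Iterable[str]) -> Set[str]:
    """Collect IDs of variants that have a called genotype in the raw data.

    Assumed layout (23andMe-style): rsid <TAB> chromosome <TAB> position
    <TAB> genotype, with '#' starting comment lines and '--' marking a
    no-call. No-calls are skipped, since any genotype reported for them
    later cannot have come from the raw data.
    """
    ids = set()
    for line in raw_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4 or fields[3] == "--":
            continue
        ids.add(fields[0])
    return ids


def label_vcf_ids(vcf_lines: Iterable[str],
                  genotyped: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (variant_id, label) for each VCF record that has an ID.

    The label is 'raw' if the ID had a called genotype in the raw data,
    otherwise 'imputed'. VCF data lines are tab-separated and start with
    the columns CHROM, POS, ID; header lines start with '#'.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[2] == ".":
            continue  # malformed line or record without an ID
        variant_id = fields[2]
        yield variant_id, ("raw" if variant_id in genotyped else "imputed")
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example `raw_rsids(open("genome.txt"))` followed by `label_vcf_ids(open("imputed.vcf"), ids)`, where both filenames are placeholders. A compressed VCF would first need to be opened with `gzip.open(path, "rt")`.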

Promethease will produce reports on files with imputed SNPs (all 39 million or however many are in the file). This results in a Promethease report with substantially more reported genotypes; due to the additional processing expense, the fee is currently $10 (instead of the standard $5).

DNA.land users often also ask which files to use for creating a Promethease report:

  • the 'imputed vcf' file will give you the most results, but some of them will be unreliable due to imputation;
  • the 'raw data' file will produce a smaller result, but a somewhat more trustworthy one; and
  • the 'imported vcf' file will produce the same result as 'raw data', but it's better to use the original 'raw data' file instead.