Ambiguous flip
While at this early stage *all data is suspect*, some data is even more so. Genotypes which cannot easily be distinguished from their flipped form are very prone to confusion, including by scientists when they publish their results [PMID 18154681].
Since DNA is composed of two antiparallel strands, there is ambiguity over which strand to look at. dbSNP uses the assembled chromosome to establish a plus and a minus strand. Other sources sometimes rely on the orientation of an individual read, the encoded gene, or other information. Nearly every SNP in SNPedia has an Orientation field in the right-hand infobox, which shows either 'plus' or 'minus'.
Elsewhere you may see the terms orientation and strand used interchangeably.
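If a microarray claims that you are an rs1234(A;A) for a SNP in which the other allele is G, but dbSNP claims that this is a C;T SNP, then logically we flip your results over and call you an rs1234(T;T). The complement of A is T and the complement of G is C, so the microarray's A/G alleles are simply dbSNP's T/C alleles read from the opposite strand. This is safe and reasonable.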
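The flip described above can be sketched in a few lines of Python. This is only an illustration, not SNPedia's actual code: the names `flip` and `normalize` are hypothetical, and genotypes are assumed to be pairs of single-letter bases. It covers only the case where the reported alleles and the dbSNP alleles differ.

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(genotype):
    """Return the genotype as read from the opposite strand."""
    return tuple(COMPLEMENT[base] for base in genotype)


def normalize(genotype, reference_alleles):
    """Express a genotype call using reference_alleles (hypothetical helper).

    Returns the call unchanged if it already uses the reference alleles,
    the flipped call if only the flipped form uses them, and raises
    ValueError if neither form fits.
    """
    allowed = set(reference_alleles)
    if set(genotype) <= allowed:
        return tuple(genotype)
    flipped = flip(genotype)
    if set(flipped) <= allowed:
        return flipped
    raise ValueError(f"{genotype} matches neither strand of {reference_alleles}")


# The rs1234 example: the array reports (A;A), dbSNP lists C and T.
print(normalize(("A", "A"), ("C", "T")))  # ('T', 'T')
```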
Unfortunately, if this were instead a SNP where the two alleles are A and T, the same flipping logic breaks down. We don't (yet) have a way to know for sure whether you should be flipped, since both forms, rs1234(A;A) and rs1234(T;T), are possible. This problem occurs for the homozygous forms of A/T and C/G SNPs. It remains hard: there are several possible avenues of attack, but no clear solution.
be extra skeptical of these cases
- an (A;A) genotype call when the alleles are A and T
- a (T;T) genotype call when the alleles are A and T
- a (C;C) genotype call when the alleles are C and G
- a (G;G) genotype call when the alleles are C and G
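The four cases listed here can be detected mechanically. The following is a minimal sketch, assuming genotypes and allele lists are given as tuples of single-letter bases; the function names are hypothetical. It only flags suspect calls and does not resolve which strand is correct.

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_self_complementary(alleles):
    """True for A/T and C/G SNPs, whose allele set equals its own complement."""
    allele_set = set(alleles)
    return {COMPLEMENT[a] for a in allele_set} == allele_set


def is_ambiguous_call(genotype, alleles):
    """True for a homozygous call at an A/T or C/G SNP.

    A heterozygous call such as (A;T) flips to (T;A), which is the same
    genotype, so only homozygous calls are flagged.
    """
    homozygous = len(set(genotype)) == 1
    return homozygous and is_self_complementary(alleles)


print(is_ambiguous_call(("A", "A"), ("A", "T")))  # True
print(is_ambiguous_call(("A", "A"), ("A", "G")))  # False
```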
One possible solution is to use Illumina's TOP/BOT strand designation.