Help (population diversity)
What do the population codes mean?
Each three-letter code identifies one of the populations studied in the International HapMap Project. Each population is a group of people with a common ethnic background.
In Phase I and II there were four populations studied:
- CEU - European - 180 samples of Utah residents with Northern and Western European ancestry from the CEPH collection (originally 30 mother-father-child trios)
- CHB - Han Chinese - 90 samples of Han Chinese in Beijing, China (previously called HCB, originally 45 unrelated samples)
- JPT - Japanese Tokyo - 91 samples of Japanese in Tokyo, Japan (originally 44 unrelated samples)
- YRI - Yoruba African - 180 samples of Yoruba in Ibadan, Nigeria (originally 30 Yoruba mother-father-child trios)
In Phase III of the study, seven more populations were added:
- ASW - 90 samples of African ancestry in Southwest USA
- CHD - 100 samples of Chinese in Metropolitan Denver, Colorado
- GIH - 100 samples of Gujarati Indians in Houston, Texas
- LWK - 100 samples of Luhya in Webuye, Kenya
- MEX - 90 samples of Mexican ancestry in Los Angeles, California
- MKK - 180 samples of Maasai in Kinyawa, Kenya
- TSI - 100 samples of Toscani in Italia
The following computed result is also available in SNPedia:
- AVG - Mathematical average of all samples from the above groups
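One way to read "average of all samples" is a sample-weighted average, in which each population counts in proportion to its number of samples. The sketch below illustrates that reading only. How SNPedia actually computes AVG is not specified here, and the `populations` data and the `sample_weighted_average` function are hypothetical names with illustrative numbers.

```python
# Hypothetical input: population code -> (sample count, genotype percentages).
# The percentages are illustrative, not real HapMap data.
populations = {
    "CEU": (180, (50.0, 35.0, 15.0)),
    "CHB": (90, (80.0, 10.0, 10.0)),
}


def sample_weighted_average(pops):
    """Average genotype percentages, weighting each population by its sample count."""
    total_samples = sum(count for count, _ in pops.values())
    n_genotypes = len(next(iter(pops.values()))[1])
    return tuple(
        sum(count * freqs[i] for count, freqs in pops.values()) / total_samples
        for i in range(n_genotypes)
    )


avg = sample_weighted_average(populations)
print(avg)
```

With these numbers the first genotype averages to 60%, because CEU contributes twice as many samples as CHB. An unweighted mean of the two rows would give 65% instead.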
How do I interpret the population diversity box?
The first line indicates that:
- 50% of Europeans have the (G;G) genotype
- 35% of Europeans have the (G;T) genotype
- 15% of Europeans have the (T;T) genotype
The second line indicates that:
- 80% of Chinese have the (G;G) genotype
- 10% of Chinese have the (G;T) genotype
- 10% of Chinese have the (T;T) genotype
The fifth line indicates that the 'ASW' (African ancestry in Southwest USA) population did not report data. The same is true for the MEX population.
The number in the upper right corner means that this data is from HapMap Release 27. From time to time the HapMap data is updated to reflect new interpretations of the data, new testing groups, or new SNPs tested.
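The reading described above can be sketched in code. This is a minimal illustration, assuming each row of the box is stored as three percentages in the order (G;G), (G;T), (T;T), with `None` standing for a population that did not report data. The `box` layout and the `describe` function are hypothetical and are not the format SNPedia uses internally. Row order in the box is not modelled.

```python
GENOTYPES = ("(G;G)", "(G;T)", "(T;T)")

# Hypothetical representation of the box; None marks a population without data.
box = {
    "CEU": (50.0, 35.0, 15.0),
    "CHB": (80.0, 10.0, 10.0),
    "ASW": None,
    "MEX": None,
}


def describe(code, freqs, genotypes=GENOTYPES):
    """Turn one row of the box into plain-language statements."""
    if freqs is None:
        return [f"{code}: no data reported"]
    return [
        f"{pct:g}% of {code} samples have the {genotype} genotype"
        for genotype, pct in zip(genotypes, freqs)
    ]


for code, freqs in box.items():
    for line in describe(code, freqs):
        print(line)
```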
Why do I see CHB some places and HCB in other places?
The Han Chinese in Beijing group originally went by the code HCB. In more recent HapMap releases the code was changed to CHB. Older data that has not yet been updated will show the acronym HCB. Newer entries list both codes so that tools such as Promethease can make a smooth transition to the new name, though SNPedia may hide the older name from view. Eventually everything will use the new code CHB, but the transition will take some time. Other sites may also still show the old code. Just keep in mind that HCB and CHB are the same thing.
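A tool that reads this data can treat the two codes as one by mapping the old code to the new one before use. The sketch below is one possible approach, not how Promethease or SNPedia handles it. The `ALIASES` table and both functions are hypothetical names introduced for illustration.

```python
# Old population code -> current code. Only the rename described above is listed.
ALIASES = {"HCB": "CHB"}


def normalize_code(code):
    """Return the current code for a population, mapping HCB to CHB."""
    code = code.strip().upper()
    return ALIASES.get(code, code)


def merge_aliases(record):
    """Return a copy of record keyed by current codes.

    If an entry lists both the old and the new code, the value stored
    under the new code wins.
    """
    merged = {}
    for code, value in record.items():
        current = normalize_code(code)
        is_current_name = code.strip().upper() == current
        if current not in merged or is_current_name:
            merged[current] = value
    return merged


print(merge_aliases({"HCB": (80.0, 10.0, 10.0), "CEU": (50.0, 35.0, 15.0)}))
```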
Why is there no data shown for some populations on some SNPs?
Not all SNPs have data for all populations. This could mean that the study group for that population group was not tested for that SNP, or it could mean that the results did not pass quality control checks. It could also mean that the entry for that SNP has not been updated.
Where does this information come from? I see an error
31,094 SNPs in SNPedia use this template. It holds a lot of data, is used heavily by Promethease and other external tools, and needs to be updated periodically as HapMap releases newer information. For this reason most edits to this template should be made only by bots; human edits are likely to be lost during future updates. Please see the Population Frequency Edit Policy for more on this topic.
As for rare genotypes see
Other three-letter population codes
The thread at https://www.23andme.com/you/community/thread/18629/ seems to have a very comprehensive list.