Talk:Help (population diversity)

From SNPedia

*MOST OF THESE ARE MORMONS/LDS-CEU - European

- 180 samples of Utah residents with Northern and Western European ancestry from the CEPH collection (originally 30 mother-father-child trios)


I expect ALL of these are Mormons/LDS. They are the CEPH HapMap trios as prepared by the Center for the Study of Human Polymorphisms.

The CEU are better described as being of Western European ancestry than of Northern European ancestry as often reported. Both the CHB and CEU show subtle but detectable signs of admixture. Thus the YRI and JPT samples are well-suited to standard population-genetic studies, but the CHB and CEU less so.

source: doi:10.1371/journal.pone.0004684

In 1984, Professor Dausset created the Centre d'Etude du Polymorphisme Humain (CEPH), a laboratory, later an internationally renowned genome center, which coordinated the first international genome mapping collaboration by making available DNA from 40 large reference families (later 61) to researchers throughout the world. By their working on DNA from the same set of families, it was possible to map the human genome by linkage, which placed a set of DNA markers along all human chromosomes (including the X-Y pseudoautosomal region). Knowledge of these chromosome maps, which was made available to the scientific community worldwide, permitted researchers, some at CEPH, to localize major genes for various genetic disorders to regions of the human genome. Localization of such genes was the first, important step in cloning and identifying them, a breakthrough for medical genetics. Furthermore, the linkage maps provided the foundation for the International Genome Project, the physical mapping of the human genome (largely initiated at CEPH), which in turn led to determination of the DNA sequence. The CEPH reference families continue to be used for genomics research. More recently, Professor Dausset collaborated with Professor L. L. Cavalli-Sforza in developing, at the Foundation, a widely used DNA resource from world populations for research in human population genetics, the HGDP-CEPH Diversity Panel.

source: Center for the Study of Human Polymorphisms

you can download their raw data --- cariaso 07:27, 20 September 2011 (UTC)
There are about 21 families there, which may be the original LDS families. There is likely a pronounced founder effect in all that. John Lloyd Scharf 18:56, 22 September 2011 (UTC)

Help on help

Sorry if this is an absurd question. (I'm new here.)

I'm wondering what a line such as the following (from http://www.snpedia.com/index.php/) means, and about the help offered for it:

? (A;A) (A;C) (C;C) 28

I click on the "?", come to

http://www.snpedia.com/index.php/Help_%28population_diversity%29

and read

//// The first line indicates that

  • 50% of Europeans have the (G;G) genotype
  • 35% of Europeans have the (G;T) genotype
  • 15% of Europeans have the (T;T) genotype

////

I don't see how those numbers are indicated (unless it's via the color-coding, but that would mean that the color-coding means something different each time, and that's not indicated).

Then I read:

//// The second line indicates that

  • 80% of Chinese have the (G;G) genotype
  • 10% of Chinese have the (G;T) genotype
  • 10% of Chinese have the (T;T) genotype

////

I don't see a second line.

Is the question mark (the link to help) in the wrong place?

Sorry, again, if I'm missing something obvious.

Seven7 (talk) 20:17, 17 October 2012 (UTC)

My question was, it turns out, at least partly (probably mostly) irrelevant! What I refer to as a "second line" is the actual population distribution by color, which for some reason doesn't show up in Firefox, at least not as I have it configured (lots of add-ons -- though I green-listed snpedia.com with NoScript, and reinstalled Flash.... Well, I'll keep working on it; or just use IE).
Seven7 (talk) 11:38, 20 October 2012 (UTC)

Why are we still using this old HapMap instead of the 1000 Genomes Project?

The old HapMap data is hopelessly outdated, and often based on a sample size of TWO PEOPLE! That produces completely misleading frequencies, like 0% instead of 43% (see rs12416605 for example). Clicking on the automatic link to the 1000 Genomes Project that's on every page shows that the real frequencies for different races are available for these SNPs. They even have the same old categories, so we can still get the frequencies for Mormons in Utah etc. But ideally I'd like to see the new categories too. So, what's our policy on this? Do we need a bot to go through and pull all the frequencies from the 1000 Genomes Project for SNPs that currently have a tiny sample size or no samples? Should we just switch to the 1000 Genomes Project for all SNPs? Can Promethease handle other race codes like EUR, GBR, FIN, etc.? CarlKenner (talk) 17:36, 8 January 2014 (UTC)

I don't believe any of our current data is based on 2 people. I think in some cases it may be ~80 people. We have that horribly outdated data because User:Jlick made a similar complaint about why we were still using HapMap phase 1 when HapMap 3 was available. There is little reason to remove the HapMap 3 data that we have, but there is plenty of reason to add the 1000 Genomes data. To do so, the usual process is more or less as follows:
  1. Find a specific URL you would like to use for the 1000 Genomes population frequency data
  2. Hand code ~5 snps with examples of how you think it should be represented in SNPedia. This might be a few more fields in the current population frequency template, or perhaps a new template.
  3. We discuss and tweak a bit.
  4. Someone writes a bot to migrate the legacy data and newly added SNPs. I can probably adapt User:SNPediaBot to do this, perhaps the same is true for User:JlickBot, and perhaps there is a need better suited to a new bot. If so, when writing a bot, see Bulk for some examples and guidelines.
  5. With the data finally available in SNPedia, I can ensure that Promethease is able to make use of it.

For steps 1 and 2, the ball is in your court CarlKenner, unless someone else beats you to it --- cariaso 06:56, 9 January 2014 (UTC)
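
As a starting point for step 2, hand-coded examples could be generated from a small script instead of typed by hand. This is only a sketch: the template name `hypothetical population diversity`, its field names, and the placeholder rsid are invented for illustration and are not an existing SNPedia template. The percentages are the ones quoted from the help page earlier on this talk page, with CEU and CHB assumed as the labels for "Europeans" and "Chinese".

```python
def render_template(rsid, genotypes, freqs_by_pop,
                    template="hypothetical population diversity"):
    """Render genotype percentages as MediaWiki template text.

    The template name and the field names (rsid, geno1.., population codes)
    are illustrative assumptions, not a real SNPedia template.
    """
    lines = ["{{" + template, f"| rsid={rsid}"]
    for i, genotype in enumerate(genotypes, start=1):
        lines.append(f"| geno{i}={genotype}")
    for pop, freqs in freqs_by_pop.items():
        if len(freqs) != len(genotypes):
            raise ValueError(f"{pop}: expected {len(genotypes)} percentages")
        if abs(sum(freqs) - 100) > 0.5:
            raise ValueError(f"{pop}: percentages do not sum to 100")
        # One field per population, percentages in genotype order.
        lines.append(f"| {pop}=" + ";".join(f"{x:g}" for x in freqs))
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    text = render_template(
        "rsEXAMPLE",                      # placeholder, not a real rsid
        ["(G;G)", "(G;T)", "(T;T)"],
        {"CEU": [50, 35, 15], "CHB": [80, 10, 10]},
    )
    print(text)
```

Checking that each population's percentages sum to 100 is a cheap way to catch copy errors before a bot writes anything to the wiki.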

Oops, you're right. I read dbSNP wrong [1]. There was one sample of two people (or two chromosomes) and another HapMap sample of 110. The data doesn't make sense though. It looked like a problem caused by small sample size, so that's what I assumed it was. But HapMap isn't registering any variation at all, even though everyone else is seeing quite a lot of variation there. So somebody is wrong. I think I've seen that for a few SNPs on SNPedia. So now I'm not sure who to trust for population frequencies, or if people are even measuring the same SNP. CarlKenner (talk) 15:41, 9 January 2014 (UTC)
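
The sample-size reasoning above can be checked with a little arithmetic. If an allele's true frequency is p, the chance that n independently sampled chromosomes all miss it is (1 - p)^n. The sketch below rests on several assumptions: that the 43% figure mentioned earlier in this thread is an allele frequency, that chromosomes are sampled independently, and that the "110" sample could mean either 110 chromosomes or 110 people (220 chromosomes).

```python
def prob_allele_unobserved(p, n_chromosomes):
    """Probability that an allele with true frequency p appears zero times
    among n independently sampled chromosomes: (1 - p) ** n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_chromosomes < 0:
        raise ValueError("n_chromosomes must be non-negative")
    return (1.0 - p) ** n_chromosomes


if __name__ == "__main__":
    p = 0.43  # assumed allele frequency, taken from the 43% quoted above
    for n in (2, 4, 110, 220):
        prob = prob_allele_unobserved(p, n)
        print(f"n={n:>3} chromosomes: P(zero copies observed) = {prob:.3g}")
```

With two chromosomes, a reported frequency of 0% would happen by chance about one time in three, so a tiny sample would explain it. With 110 or more chromosomes the probability is vanishingly small, which is consistent with the suspicion that something other than sample size is wrong, for example a different SNP or strand being measured.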