
Bulk


Reminder

The content in SNPedia is available under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Introduction

Depending on the format, frequency, and complexity of your particular needs, you may wish to consider these sources:

The European Bioinformatics Institute hosts a DAS (Distributed Annotation System) server; an example query is http://www.ebi.ac.uk/das-srv/easydas/bernat/das/SNPedia/features?segment=10:1,51319502

http://www.oppi.uef.fi/bioinformatics/varietas/ provides a web interface that includes SNPedia content. [PMID 20671203]

The file at http://www.snpedia.com/files/gbrowse/SNPedia.gff is updated semi-regularly and can be parsed to provide a reasonable list of SNPs; a parsing sketch follows.
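
As an illustration, here is a minimal sketch that collects the rs numbers from that GFF file (this assumes the standard tab-separated GFF layout with the rs name appearing somewhere in the attributes column; written for Python 2.7 to match the samples below):

import re
import urllib2

gff = urllib2.urlopen('http://www.snpedia.com/files/gbrowse/SNPedia.gff')
rs_names = set()
for line in gff:
    if line.startswith('#'):   # skip GFF comment/header lines
        continue
    # pull the first rs identifier out of the line, wherever it sits
    match = re.search(r'\b([Rr]s\d+)\b', line)
    if match:
        rs_names.add(match.group(1).lower())
print(len(rs_names))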

Forbidden

Bots which try to pull every version of every page crush the server, and will be banned long before you complete the full scrape.

Bots which try to pull every possible rs# (even the ones not in SNPedia) crush the server, and will be banned long before you complete the full scrape. You must first ask which SNPs are in SNPedia with a query such as

http://bots.snpedia.com/api.php?action=query&list=categorymembers&cmtitle=Category:Is_a_snp&cmlimit=max&format=json

This is easier to do with the client libraries listed below. See the MediaWiki documentation at https://www.mediawiki.org/wiki/API:Categorymembers

Programmers

Please aim your bots at bots.snpedia.com, not www.snpedia.com

In 2016 MediaWiki updated two parts of its software that affect the use of SNPedia.

1. An updated login mechanism. While the old method still works, the new OAuth-based method is preferred. Extensive information from MediaWiki is at https://www.mediawiki.org/wiki/OAuth/Owner-only_consumers

You can immediately create the necessary tokens for SNPedia by visiting http://bots.snpedia.com/index.php/Special:OAuthConsumerRegistration

2. The mechanism to request more than 500 members of a category has changed. MediaWiki documents this at https://www.mediawiki.org/wiki/API:Query#Generators_and_continuation

However, not all languages and libraries have yet been updated to use these new mechanisms. Check yours, or look for new ones, at https://www.mediawiki.org/wiki/API:Client_code
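
For reference, here is a minimal sketch of the continuation protocol against the raw API (this assumes the third-party requests library; the continue handling follows the MediaWiki documentation linked above):

import requests

API = 'http://bots.snpedia.com/api.php'
params = {'action': 'query', 'list': 'categorymembers',
          'cmtitle': 'Category:Is_a_snp', 'cmlimit': 'max',
          'format': 'json'}
headers = {'User-Agent': 'MySNPBot. Run by User:Xyz. xyz@foo.com'}

while True:
    data = requests.get(API, params=params, headers=headers).json()
    for member in data['query']['categorymembers']:
        print(member['title'])
    if 'continue' not in data:
        break                          # no more batches to fetch
    params.update(data['continue'])    # carry the continue tokens into the next request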


Here is some sample Python 2.7 code which uses both of these correctly via https://github.com/mwclient/mwclient:

#!/usr/bin/env python
import mwclient

# identify your bot per the MediaWiki User-Agent policy
agent = 'MySNPBot. Run by User:Xyz. xyz@foo.com Using mwclient/' + mwclient.__ver__
# tokens and secrets are only necessary if your bot will write into SNPedia;
# get your own tokens at http://bots.snpedia.com/index.php/Special:OAuthConsumerRegistration
site = mwclient.Site(('https', 'bots.snpedia.com'), path='/',
                     clients_useragent=agent,
                     consumer_token='secret1',
                     consumer_secret='secret2',
                     access_token='secret3',
                     access_secret='secret4')

# iterating over a Category yields every member; mwclient follows
# the API continuation automatically
for i, page in enumerate(site.Categories['Is_a_snp']):
    print i, page.name


Your edits to this page to document the mechanism for your favorite language or library are encouraged.


Everything below is true, but somewhat out of date.

Perl

Please note and use the line

$bot->{api}->{use_http_get} = 1;

which is necessary to force GET instead of POST with some older versions of the library.

Get all SNP names

use strict;
use warnings;
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
   protocol => 'http',
   host => 'bots.snpedia.com',
   path => '/',
   });
$bot->{api}->{use_http_get} = 1;
# max => 0 lifts the default cap, returning every page in the category
my @rsnums = $bot->get_pages_in_category('Category:Is_a_snp', {max=>0});
print join("\n",@rsnums),"\n";

How can I grab the text from pages?

#!/usr/bin/env perl
use strict;
use warnings;
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
   protocol => 'http',
   host => 'bots.snpedia.com',
   path => '/',
   });
$bot->{api}->{use_http_get} = 1;

# fetch and print the raw wikitext for a few example SNPs
foreach my $rs ('rs1815739',
                'rs4420638',
                'rs6152') {
   my $text = $bot->get_text($rs);
   print '=' x 20,"$rs\n";
   print $text;
}


I need Genotypes and their Magnitude

#!/usr/bin/env perl
use strict;
use warnings;
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
   protocol => 'http',
   host => 'bots.snpedia.com',
   path => '/',
   });
$bot->{api}->{use_http_get} = 1;
my $text = $bot->get_text('rs1234');
print '=' x 20,"$text\n";
print "\n\nThe above text should prove that we can read from SNPedia\n";
print "Getting some more info from SNPedia\n";

my @genotype = $bot->get_pages_in_category('Category:Is a genotype', {max=>0});

foreach my $geno (@genotype) {
   my $genotext        = $bot->get_text($geno);
   # pull the magnitude parameter out of the genotype template
   my ($magnitude)     = $genotext =~ m/magnitude\s*=\s*([-+.\d]+)/;
   # grab a short snippet of text following the closing }} of the template
   my ($beginningtext) = $genotext =~ m/\}\}(.{3,30})/s;
   $beginningtext = $genotext unless $beginningtext;
   $beginningtext =~ tr/\n/ /;
   $magnitude = '' unless defined $magnitude;
   print "Magnitude\t${magnitude}\tfor\t${geno}\t${beginningtext}\n";
}

Python

These examples use wikitools (a Python 2 library).

Get all SNP names

from wikitools import wiki, category

site = wiki.Wiki("http://bots.snpedia.com/api.php")   # open SNPedia
snps = category.Category(site, "Is_a_snp")
snpedia = []

# get all SNP names as a list and print them
for article in snps.getAllMembersGen(namespaces=[0]):
    snpedia.append(article.title.lower())
    print article.title

Grab a single SNP-page in full text

You get back a string that contains the unformatted wiki code:

from wikitools import wiki, category, page
site = wiki.Wiki("http://bots.snpedia.com/api.php")
snp = "rs7412"
pagehandle = page.Page(site,snp)
snp_page = pagehandle.getWikiText()

To parse MediaWiki templates, try https://github.com/earwig/mwparserfromhell
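
For instance, a minimal sketch that pulls the magnitude out of a genotype page (this assumes the page carries a template with a magnitude parameter, as genotype pages such as Rs7412(C;C) do):

import mwparserfromhell
from wikitools import wiki, page

site = wiki.Wiki("http://bots.snpedia.com/api.php")
genotext = page.Page(site, "Rs7412(C;C)").getWikiText()

wikicode = mwparserfromhell.parse(genotext)
for template in wikicode.filter_templates():
    if template.has('magnitude'):
        # the parameter value is a Wikicode node; str() yields its raw text
        print str(template.get('magnitude').value).strip()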

Ruby

These examples use the mediawiki-gateway gem.

Please use version 0.5.0 or later due to http://github.com/jpatokal/mediawiki-gateway/issues/24

Grab all SNP-pages that contain a specific text and iterate over the content

This example grabs all genotype pages of a specific SNP:

require 'media_wiki'

@snp = "Rs7412"
mw = MediaWiki::Gateway.new("http://bots.snpedia.com/api.php")
pages = mw.list(@snp + "(") # array of page titles beginning "Rs7412(", i.e. the genotype pages
if pages != nil
  pages.each do |p| # iterate over the results and grab the full text for each page
    single_page = mw.get(p)
    puts single_page
  end
end

R / Bioconductor

An R package to query data from SNPedia is available on the Bioconductor web site:
https://bioconductor.org/packages/SNPediaR
See the package vignette for usage.

The development version of the package and some extra documentation can be found on GitHub:
https://github.com/genometra/SNPediaR

Limited to 500 entries?

Please see https://www.mediawiki.org/wiki/API:Query#Generators_and_continuation