Bulk


Reminder

The content in SNPedia is available under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. Commercial licenses are available from info@snpedia.com.

Introduction

Depending on the format, frequency, and complexity of your particular needs, you may wish to consider these sources:

The European Bioinformatics Institute hosts a DAS (Distributed Annotation System) server that serves SNPedia content; for example: http://www.ebi.ac.uk/das-srv/easydas/bernat/das/SNPedia/features?segment=10:1,51319502

http://www.oppi.uef.fi/bioinformatics/varietas/ provides a web interface which includes SNPedia content. [PMID 20671203]

The file at http://www.snpedia.com/files/gbrowse/SNPedia.gff is updated semi-regularly and can be parsed to provide a reasonable list of the SNPs in SNPedia (see the sketch below).
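
As an illustration only, here is a minimal sketch of parsing that file in Python. It assumes standard tab-separated GFF columns and uses the requests package (any HTTP client would do); the exact columns and attribute format of SNPedia.gff may differ.

import requests   # assumption: any HTTP client works; requests is just an example

gff = requests.get("http://www.snpedia.com/files/gbrowse/SNPedia.gff")
for line in gff.text.splitlines():
    if not line or line.startswith("#"):   # skip blank lines and GFF comment lines
        continue
    fields = line.split("\t")              # standard GFF: nine tab-separated columns
    print(fields[-1])                      # the attributes column usually names the feature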

Forbidden

Bots which try to pull every version of every page crush the server, and will be banned long before you complete the full scrape.

Bots which try to pull every possible rs# (even the ones not in SNPedia) also crush the server, and will be banned long before you complete the full scrape. You must first ask which SNPs are actually in SNPedia, with a query such as the one sketched below.
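
For illustration, such a query might look like this in Python; this is a minimal sketch using the standard MediaWiki categorymembers API and the requests package, neither of which is specific to SNPedia.

import requests   # assumption: any HTTP client works; requests is just an example

reply = requests.get("http://bots.snpedia.com/api.php", params={
    "action": "query",
    "list": "categorymembers",
    "cmtitle": "Category:Is_a_snp",
    "cmlimit": "500",
    "format": "json",
}).json()

for member in reply["query"]["categorymembers"]:
    print(member["title"])
# Follow the API's continuation parameters (see the MediaWiki docs) to page
# through the full category rather than re-requesting everything.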

This is easier to do with the client libraries listed below; see the MediaWiki API documentation for details.

Programmers

Please aim your bots at http://bots.snpedia.com/api.php; the sections below show example client code in Perl, Python, and Ruby.

Perl

Please notice and use the line

$bot->{api}->{use_http_get} = 1;

which is necessary to ensure GET instead of POST for some older versions of the library.

Semantic-MediaWiki-Bot is a new bot library that is aware of Semantic MediaWiki.

Get all SNP names

use strict;
use warnings;
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
   protocol => 'http',
   host => 'bots.snpedia.com',
   path => '/',
   });
$bot->{api}->{use_http_get} = 1;   # force GET (see the note above)

# List every page in Category:Is_a_snp (max => 0 means no limit on the results).
my @rsnums = $bot->get_pages_in_category('Category:Is_a_snp', {max=>0});
print join("\n",@rsnums),"\n";

How can I grab the text from pages?

#!/usr/bin/env perl
use strict;
use warnings;
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
   protocol => 'http',
   host => 'bots.snpedia.com',
   path => '/',
   });
$bot->{api}->{use_http_get} = 1;   # force GET (see the note above)

# Fetch the raw wikitext of a few example SNP pages.
foreach my $rs ('rs1815739',
                'rs4420638',
                'rs6152') {
   my $text = $bot->get_text($rs);
   print '=' x 20,"$rs\n";
   print $text;
}


I need Genotypes and their Magnitude

#!/usr/bin/env perl
use strict;
use warnings;
use MediaWiki::Bot;

my $bot = MediaWiki::Bot->new({
   protocol => 'http',
   host => 'bots.snpedia.com',
   path => '/',
   });
$bot->{api}->{use_http_get} = 1;   # force GET (see the note above)

my $text = $bot->get_text('rs1234');
print '=' x 20,"$text\n";
print "\n\nThe above text should prove that we can read from SNPedia\n";
print "Getting some more info from SNPedia\n";

# Every genotype page is a member of Category:Is a genotype.
my @genotype = $bot->get_pages_in_category('Category:Is a genotype', {max=>0});

foreach my $geno (@genotype) {
   my $genotext        = $bot->get_text($geno);
   # Pull the magnitude parameter out of the genotype template ...
   my ($magnitude)     = $genotext =~ m/magnitude\s*=\s*([+-\.\d]+)/;
   # ... and a short snippet of the text that follows the closing }} of the template.
   my ($beginningtext) = $genotext =~ m/\}\}(.{3,30})/s;
   $beginningtext = $genotext unless $beginningtext;
   $beginningtext =~ tr/\n/ /;
   $magnitude = '' unless defined $magnitude;
   print "Magnitude\t${magnitude}\tfor\t${geno}\t${beginningtext}\n";
}

Python

These examples use the wikitools library.

Get all SNP names

from wikitools import wiki, category

site = wiki.Wiki("http://bots.snpedia.com/api.php")     # open snpedia
snps = category.Category(site, "Is_a_snp")
snpedia = []

for article in snps.getAllMembersGen(namespaces=[0]):   # get all snp-names as a list and print them
    snpedia.append(article.title.lower())
    print article.title

Grab a single SNP-page in full text

You get back a string that contains the unformatted wikitext:

from wikitools import wiki, page

site = wiki.Wiki("http://bots.snpedia.com/api.php")
snp = "rs7412"
pagehandle = page.Page(site, snp)
snp_page = pagehandle.getWikiText()

To parse MediaWiki templates, try https://github.com/earwig/mwparserfromhell (see the sketch below).
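
For example, a minimal sketch that lists every template and its parameters, assuming the snp_page string fetched above (template names and parameters vary from page to page):

import mwparserfromhell

wikicode = mwparserfromhell.parse(snp_page)          # snp_page is the wikitext string from above
for template in wikicode.filter_templates():
    print(template.name.strip())
    for param in template.params:                    # each template parameter, e.g. rsid, position, ...
        print("  %s = %s" % (param.name.strip(), param.value.strip()))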

Ruby

These examples use the mediawiki-gateway gem.

Please use version 0.5.0 or later due to http://github.com/jpatokal/mediawiki-gateway/issues/24.

Grab all SNP-pages whose titles start with a specific text and iterate over the content

This example grabs all genotype pages of a specific SNP:

require 'media_wiki'   # the mediawiki-gateway gem

@snp = "Rs7412"
mw = MediaWiki::Gateway.new("http://bots.snpedia.com/api.php")
pages = mw.list(@snp + "(")   # returns an array of page titles starting with "Rs7412("
unless pages.nil?
  pages.each do |p|           # iterate over the results and grab the full text for each page
    single_page = mw.get(p)
    puts single_page
  end
end