Bulk
Reminder
The content in SNPedia is available under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. Commercial licenses are available from info@snpedia.com.
Introduction
Based on the format, frequency and complexity of your particular needs, you may wish to consider these sources:
- The European Bioinformatics Institute hosts a DAS (Distributed Annotation System) server: http://www.ebi.ac.uk/das-srv/easydas/bernat/das/SNPedia/features?segment=10:1,51319502
- http://kokki.uku.fi/bioinformatics/varietas/ provides a web interface which includes SNPedia content
- The file at http://www.snpedia.com/files/gbrowse/SNPedia.gff is updated semi-regularly and can be parsed to provide a reasonable list of SNPs.
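As a rough illustration of parsing the GFF file, here is a minimal Python sketch. It assumes the file follows the usual nine-column, tab-separated GFF layout; the sample rows, the `Name=` attribute and the rs numbers are hypothetical, and the real attribute column may look different.

```python
def parse_gff_line(line):
    """Split one GFF record into a dict, or return None for comments/blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 9:
        return None  # not a complete 9-column record
    seqid, _source, _ftype, start, end, _score, _strand, _phase, attributes = fields[:9]
    return {
        "seqid": seqid,
        "start": int(start),
        "end": int(end),
        "attributes": attributes,
    }


def parse_gff(lines):
    """Yield parsed records from any iterable of GFF lines (e.g. an open file)."""
    for line in lines:
        record = parse_gff_line(line)
        if record is not None:
            yield record


# Hypothetical sample rows standing in for the downloaded file.
sample = [
    "##gff-version 3",
    "chr19\tSNPedia\tsnp\t100\t100\t.\t+\t.\tName=rs0000001",
    "chr1\tSNPedia\tsnp\t200\t200\t.\t+\t.\tName=rs0000002",
]
records = list(parse_gff(sample))
for record in records:
    print(record["seqid"], record["start"], record["attributes"])
```

Passing an open file handle to `parse_gff` instead of `sample` should work the same way, since the function only iterates over lines.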
Page History
Bots which try to pull every version of every page crush the server, and will be banned long before they complete the full scrape.
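One way to avoid pulling page history is to ask only for the current revision. The sketch below builds such a request, assuming the standard MediaWiki API at http://bots.snpedia.com/api.php; `latest_revision_url` is a hypothetical helper name. When no `rvlimit` is given, `prop=revisions` returns just the latest revision of each requested page.

```python
from urllib.parse import urlencode

API_URL = "http://bots.snpedia.com/api.php"  # bot endpoint named on this page


def latest_revision_url(title):
    """Build a query URL for the current wikitext of one page (no history)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",  # only the text, and only of the latest revision
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


url = latest_revision_url("Rs1815739")
print(url)
```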
Programmers
Please aim your bots at http://bots.snpedia.com/api.php and see these two projects:
- https://github.com/cariaso/smwcon2012bots provides examples in several languages
- https://github.com/cariaso/Semantic-MediaWiki-Bot is a work in progress, intended to become the suggested best-practice interface
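For readers who prefer to talk to the API directly, the following sketch shows how a category listing could be paged through. It is not taken from either project above. It assumes the server returns the newer MediaWiki `continue` block (older versions use a different continuation format), and `fetch` is a hypothetical callable that performs the HTTP request and returns decoded JSON. Here it is replaced by a fake so the example runs offline.

```python
def iter_category_members(fetch, category, limit=500):
    """Yield page titles in `category`, following API continuation.

    `fetch` is any callable that takes a dict of query parameters and
    returns the decoded JSON response as a dict.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmlimit": limit,
        "format": "json",
    }
    while True:
        response = fetch(dict(params))
        for member in response.get("query", {}).get("categorymembers", []):
            yield member["title"]
        cont = response.get("continue")
        if not cont:
            break
        params.update(cont)  # carries cmcontinue into the next request


def fake_fetch(params):
    # Hypothetical two-batch response standing in for real HTTP calls.
    if "cmcontinue" not in params:
        return {
            "query": {"categorymembers": [{"title": "Rs0000001"}]},
            "continue": {"cmcontinue": "page|2", "continue": "-||"},
        }
    return {"query": {"categorymembers": [{"title": "Rs0000002"}]}}


titles = list(iter_category_members(fake_fetch, "Is_a_snp"))
print(titles)
```

Injecting `fetch` keeps the paging logic separate from the HTTP client, so a real implementation can add delays between requests without touching the loop.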
Perl
Please notice and use the line
$bot->{api}->{use_http_get} = 1;
which is necessary to ensure GET instead of POST for some older versions of the library.
Semantic-MediaWiki-Bot is a new Semantic MediaWiki aware bot library.
Get all SNP names
use MediaWiki::Bot;
my $bot = MediaWiki::Bot->new({
    protocol => 'http',
    host     => 'bots.snpedia.com',
    path     => '/',
});
$bot->{api}->{use_http_get} = 1;
my @rsnums = $bot->get_pages_in_category('Category:Is_a_snp', {max=>0});
print join("\n",@rsnums),"\n";
How can I grab the text from pages?
#!/usr/bin/env perl
use MediaWiki::Bot;
my $bot = MediaWiki::Bot->new({
    protocol => 'http',
    host     => 'bots.snpedia.com',
    path     => '/',
});
$bot->{api}->{use_http_get} = 1;
foreach my $rs ('rs1815739',
                'rs4420638',
                'rs6152') {
    my $text = $bot->get_text($rs);
    print '=' x 20,"$rs\n";
    print $text;
}
I need Genotypes and their Magnitude
#!/usr/bin/env perl
use strict;
use warnings;
use MediaWiki::Bot;
my $bot = MediaWiki::Bot->new({
    protocol => 'http',
    host     => 'bots.snpedia.com',
    path     => '/',
});
$bot->{api}->{use_http_get} = 1;
my $text = $bot->get_text('rs1234');
print '=' x 20,"$text\n";
print "\n\nThe above text should prove that we can read from SNPedia\n";
print "Getting some more info from SNPedia\n";
my @genotype = $bot->get_pages_in_category('Category:Is a genotype', {max=>0}) ;
foreach my $geno (@genotype) {
    my $genotext = $bot->get_text($geno);
    # capture the numeric value of the magnitude field
    my ($magnitude) = $genotext =~ m/magnitude\s*=\s*([-+.\d]+)/;
    # take up to 30 characters following the closing braces of the template
    my ($beginningtext) = $genotext =~ m/\}\}(.{3,30})/s;
    $beginningtext = $genotext unless $beginningtext;
    $beginningtext =~ tr/\n/ /;
    $magnitude = '' unless defined $magnitude;
    print "Magnitude\t${magnitude}\tfor\t${geno}\t${beginningtext}\n";
}
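For comparison, the magnitude extraction can be sketched in Python with the standard `re` module. This is only an illustration of the pattern matching step: the sample wikitext and its field names are hypothetical, and `extract_magnitude` is an invented helper name rather than part of any library.

```python
import re

# Matches e.g. "magnitude=2.5", "magnitude = 0", "magnitude=-1"
MAGNITUDE_RE = re.compile(r"magnitude\s*=\s*([-+]?\d*\.?\d+)")


def extract_magnitude(wikitext):
    """Return the magnitude as a float, or None if the page has none."""
    match = MAGNITUDE_RE.search(wikitext)
    return float(match.group(1)) if match else None


# Hypothetical genotype page text; real pages may carry more fields.
sample = (
    "{{Genotype\n"
    "|rsid=0000001\n"
    "|allele1=A\n"
    "|allele2=G\n"
    "|magnitude=2.5\n"
    "}}\n"
    "Hypothetical summary text."
)
print(extract_magnitude(sample))
```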
Python
These examples use the python-wikitools library.
Get all SNP names
from wikitools import wiki, category

site = wiki.Wiki("http://bots.snpedia.com/api.php")  # open SNPedia
snps = category.Category(site, "Is_a_snp")
snpedia = []
# collect all SNP names in a list and print them
for article in snps.getAllMembersGen(namespaces=[0]):
    snpedia.append(article.title.lower())
    print(article.title)
Grab a single SNP-page in full text
You get back a string that contains the unformatted wiki-code:
from wikitools import wiki, page

site = wiki.Wiki("http://bots.snpedia.com/api.php")
snp = "rs7412"
pagehandle = page.Page(site, snp)  # look the page up by its title
snp_page = pagehandle.getWikiText()
Ruby
These examples use the Mediawiki-gateway-gem.
Please use version 0.5.0 or later due to http://github.com/jpatokal/mediawiki-gateway/issues/24
Grab all SNP-pages that contain a specific text and iterate over the content
This example grabs all genotype-pages of a specific SNP by listing the pages whose titles start with the SNP name followed by "("
require 'media_wiki'

@snp = "Rs7412"
mw = MediaWiki::Gateway.new("http://bots.snpedia.com/api.php")
pages = mw.list(@snp + "(") # return an array of page-titles
unless pages.nil?
  pages.each do |p| # iterate over the results and grab the full text for each page
    single_page = mw.get(p)
    puts single_page
  end
end
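Once the genotype page titles have been listed, it can be handy to split each title into its SNP name and alleles. The Python sketch below assumes titles of the form `Rs7412(C;T)`, which is what the `"Rs7412("` prefix search above suggests. `split_genotype_title` is a hypothetical helper, not part of any of the libraries mentioned here.

```python
import re

# SNP name, then the alleles in parentheses, e.g. "Rs7412(C;T)"
GENOTYPE_TITLE_RE = re.compile(r"^(?P<snp>[^()]+)\((?P<alleles>[^()]*)\)$")


def split_genotype_title(title):
    """Return (snp_name, alleles_tuple), or None if the title has no allele part."""
    match = GENOTYPE_TITLE_RE.match(title)
    if match is None:
        return None
    alleles = tuple(match.group("alleles").split(";"))
    return match.group("snp"), alleles


print(split_genotype_title("Rs7412(C;T)"))
```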