
VCF

From SNPedia

Promethease can read VCF, but there is a lot of flexibility in the VCF format.

https://en.wikipedia.org/wiki/Variant_Call_Format

Ideally you'll be able to produce a VCF 4.2-compliant file with END= set in the INFO field of reference blocks. This is sometimes known as a gVCF. It allows us to distinguish positions that match the reference from positions that were not callable due to insufficient sequencing depth, and it will give the best possible Promethease report. The links below explain the two types of gVCF file.
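To illustrate why END= matters, here is a minimal Python sketch (not Promethease's actual code) that classifies a single position in a single-sample VCF. It assumes tab-separated records with a GT key in FORMAT; a record covers POS..END when END= is present, otherwise just POS, so an all-sites VCF with one record per position is handled the same way. The function names and the example records are hypothetical.

```python
def parse_info(info):
    """Turn an INFO string like 'END=199;DP=25' into a dict."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag keys (no '=') map to ''
    return fields


def classify_position(vcf_lines, chrom, pos):
    """Return 'reference', 'variant', 'no-call' or 'missing' for chrom:pos.

    Assumes a single-sample VCF/gVCF given as text lines. A position not
    covered by any record is reported as 'missing'.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        rec_chrom, rec_pos = cols[0], int(cols[1])
        end = int(parse_info(cols[7]).get("END", rec_pos))
        if rec_chrom != chrom or not rec_pos <= pos <= end:
            continue
        # Pair FORMAT keys with the first sample's values to find GT.
        if len(cols) > 9:
            gt = dict(zip(cols[8].split(":"), cols[9].split(":"))).get("GT", "")
        else:
            gt = ""
        alleles = gt.replace("|", "/").split("/") if gt else []
        if not alleles or "." in alleles:
            return "no-call"
        if all(a == "0" for a in alleles):
            return "reference"
        return "variant"
    return "missing"


# Hypothetical gVCF: a reference block, a SNP, then a low-depth no-call block.
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "1\t100\t.\tA\t<*>\t.\t.\tEND=199\tGT:DP\t0/0:25",
    "1\t200\t.\tG\tT\t50\tPASS\tDP=30\tGT:DP\t0/1:30",
    "1\t201\t.\tC\t<*>\t.\t.\tEND=210\tGT:DP\t./.:2",
]

for p in (150, 200, 205, 300):
    print(p, classify_position(EXAMPLE, "1", p))
# 150 reference
# 200 variant
# 205 no-call
# 300 missing
```

With a standard VCF the same lookup would report most reference-matching positions as "missing", because only variant sites have records; that ambiguity is what gVCF blocks or all-sites output remove.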

Alternatively, you could run GATK in EMIT_ALL_SITES output mode (--output_mode EMIT_ALL_SITES in the GATK 3 UnifiedGenotyper). This produces a much larger VCF file, but it also lets us tell reference positions from missing ones.

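Finally, a standard VCF will work, but it has no information about positions that match the reference, so a position absent from the file could be either a reference match or an uncalled position. These positions are not essential, but they can be helpful for a Promethease report.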


https://gatkforums.broadinstitute.org/gatk/discussion/4017/what-is-a-gvcf-and-how-is-it-different-from-a-regular-vcf

https://sites.google.com/site/gvcftools/home/about-gvcf

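It seems this command may produce useful gVCFs; please confirm if you try it. Note that samtools mpileup -g takes no depth value (it only switches on BCF output); the gVCF depth threshold is the -g/--gvcf option of bcftools, which has taken over variant calling from samtools.

bcftools mpileup -g 10 -Ou -f /path/to/refgrch37.fa /path/to/a.sorted.bam | bcftools call -m -g 10 -Oz -o a.g.vcf.gz

(-g sets the minimum per-sample depth for grouping homozygous-reference sites into gVCF blocks; the actual value will depend on the nature of your data)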
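If you try such a command and want to confirm the output really is a gVCF, a quick Python sketch like the one below can count records carrying END= and records with a symbolic reference-block allele (<*> from bcftools, <NON_REF> from GATK). The function names are hypothetical, and treating "any END= record" as the signature of a gVCF is a heuristic assumption, not a Promethease rule.

```python
import gzip


def read_vcf_lines(path):
    """Yield text lines from a plain or bgzip-compressed (.gz) VCF.

    bgzip output is valid multi-member gzip, so the stdlib gzip module reads it.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle


def gvcf_summary(lines):
    """Count data records, records with an END= INFO key, and records
    whose ALT lists a symbolic <*> (bcftools) or <NON_REF> (GATK) allele."""
    counts = {"records": 0, "with_END": 0, "symbolic_alt": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        counts["records"] += 1
        info_keys = {item.split("=", 1)[0] for item in cols[7].split(";")}
        if "END" in info_keys:
            counts["with_END"] += 1
        if set(cols[4].split(",")) & {"<*>", "<NON_REF>"}:
            counts["symbolic_alt"] += 1
    return counts


def looks_like_gvcf(lines):
    """Heuristic: treat the file as a gVCF if any record uses END= blocks."""
    return gvcf_summary(lines)["with_END"] > 0
```

For example, looks_like_gvcf(read_vcf_lines("your_output.g.vcf.gz")) should return True for a gVCF (the file name is a placeholder). An EMIT_ALL_SITES VCF has one record per position rather than END= blocks, so this heuristic returns False for it even though it also separates reference from missing.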


Lastly, https://sequencing.com/genome-vcf may be able to produce a Promethease-compatible file from a .bam for a fee (perhaps $15).