A utility for splitting mixed origin NGS reads with secondary or alt mappings
Xenomapper2 is a utility for post-processing mapped reads that have been aligned to a primary genome and a secondary
genome, binning the reads into species-specific, multimapping in each species, unmapped and unassigned bins.
It can be used on single end or paired end sequencing data.
In paired end data evidence of sequence specificity for either read can be used to assign both reads.
Use cases include xenografts of human cancers and host pathogen interactions.
Xenomapper2 is a complete rewrite of xenomapper with fundamental changes in how the reads are handled internally. Support for SAM files has been removed, and BAM files are read and written using pylazybam, a pure python BAM file parser with basic write support.
Xenomapper2 can be used with most common aligners including Bowtie2, HISAT2, and BWA-MEM.
Running Xenomapper2 generally results in the same calls for data containing only primary alignments, with the simplification that paired end status is automatically detected.
Xenomapper2 improves handling of cases where the second read of a template is unmapped, and the mapped read is reported first. In these cases Xenomapper 1.0 would treat the first entry in the SAM file as the forward read, creating a mismatch in the comparison of forward and reverse reads between primary and secondary. Because reads are now identified by their flag status, this potential source of error is eliminated.
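As a minimal sketch of identifying reads by flag rather than by file order, the snippet below uses the flag bits defined in the SAM specification. The function name read_end and the example records are hypothetical illustrations, not part of Xenomapper2 or pylazybam.

```python
# Flag bit values from the SAM specification.
PAIRED = 0x1
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def read_end(flag: int) -> str:
    """Classify a SAM/BAM flag as 'single', 'forward', 'reverse' or 'unknown'.

    Hypothetical helper for illustration only.
    """
    if not flag & PAIRED:
        return "single"
    if flag & FIRST_IN_PAIR and not flag & SECOND_IN_PAIR:
        return "forward"
    if flag & SECOND_IN_PAIR and not flag & FIRST_IN_PAIR:
        return "reverse"
    return "unknown"


# Hypothetical template in which the mapped read (second in pair, mate
# unmapped) is listed before its unmapped mate (first in pair).
records_in_file_order = [("readA", 137), ("readA", 69)]

# Assigning by flag is independent of the order of entries in the file.
by_end = {read_end(flag): (name, flag) for name, flag in records_in_file_order}
```

Here by_end["forward"] is the second entry in the file, so an approach that assumed the first entry was the forward read would have compared the wrong ends.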
The major change in Xenomapper2 is treating all alignments from the same read as a group. This allows the new mode --max that selects the maximum AS score from all alignments, and the maximum XS score that occurs with this AS score.
This allows assignment based on high scoring secondary mappings.
These features greatly enhance support for BWA-MEM based pipelines, including mapping to GRCh38 with alt contigs. This will be especially useful in the analysis of complex immune regions such as HLA in humanised mice and xenografts.
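The score selection described for --max can be sketched as follows. This is an illustration under stated assumptions, not the Xenomapper2 implementation: the function name max_scores is hypothetical, each alignment is represented as an (AS, XS) tuple, and a missing XS tag is represented as None.

```python
from typing import Iterable, Optional, Tuple


def max_scores(
    alignments: Iterable[Tuple[int, Optional[int]]]
) -> Tuple[int, Optional[int]]:
    """Return the highest AS and the highest XS seen alongside that AS.

    Hypothetical helper: `alignments` holds one (AS, XS) tuple for every
    alignment of a single read, with XS set to None when the tag is absent.
    Raises ValueError if `alignments` is empty.
    """
    alignments = list(alignments)
    best_as = max(as_score for as_score, _ in alignments)
    # Only XS values reported by alignments with the best AS are considered.
    xs_candidates = [
        xs for as_score, xs in alignments
        if as_score == best_as and xs is not None
    ]
    best_xs = max(xs_candidates) if xs_candidates else None
    return best_as, best_xs


# One primary alignment plus secondary and alt alignments of the same read.
example = [(90, 85), (100, 60), (100, 95), (70, None)]
best = max_scores(example)
```

In this example the best AS is 100, and of the two alignments with that score the larger XS is 95, so best is (100, 95) even though the first alignment listed scores lower.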
Xenomapper2 requires python 3.6 or higher and is tested on Linux and macOS with CPython and pypy3.
If you would like to install from the github repository this can be done by cloning the repository
git clone https://github.com/genomematt/xenomapper2
pip3 install --upgrade xenomapper2
or directly with pip
pip3 install git+https://github.com/genomematt/xenomapper2.git
Although the repository is tested by continuous integration with TravisCI, it's good practice to run the tests locally and check that your install works correctly. The tests are run with the following command:
python3 -m xenomapper.tests.test_all
Usage:
xenomapper2 --primary=<file> --secondary=<file>
[ --primary-specific=<file> --primary-multi=<file>
. --secondary-specific=<file> --secondary-multi=<file>
. --unassigned=<file> --unresolved=<file>
| --basename=<str> ]
[ --min-score=<int> ]
[ --zs | --cigar]
[ --max ]
[ --conservative ]
xenomapper2 --version
xenomapper2 [ -h | --help ]
Options:
-h --help Show this screen.
--version Show version.
Input files
--primary=<file> A BAM format file of primary species alignments
--secondary=<file> A BAM format file of secondary species alignments
Output options
--primary-specific=<file> filename for primary specific unique alignments
--primary-multi=<file> filename for primary specific multimap alignments
--secondary-specific=<file> filename for secondary specific unique alignments
--secondary-multi=<file> filename for secondary specific multimap alignments
--unassigned=<file> filename for unassigned alignments
--unresolved=<file> filename for unresolved alignments
--basename=<str> prefix for creating all other output files
only valid if no other output options provided
Processing options
--min-score=<int> minimum AS score required. Lower scores unassigned.
[ Default : None (implemented as -2^31) ]
--zs use ZS scores for spliced aligner (HISAT2)
--cigar use cigar scores to calculate AS score
--max use the maximum score for any alignment
[ Default : Use score of primary alignment ]
--conservative require both ends of paired reads to support the
assignment
Note that unlike prior xenomapper versions there is no --pair option as forward and reverse reads are automatically extracted based on their flag. Files of mixed paired and single end reads are now fully supported.
Most of the time you will want to invoke:
xenomapper2 --primary <primary.bam> --secondary <secondary.bam> --basename <prefix>
This will produce six output files (e.g. prefix_primary_specific.bam) and print a summary to standard error.
Process substitution will allow the use of input SAM files
xenomapper2 --primary <(samtools view -bS input.sam)
A worked example of using xenomapper can be found in example_usage.ipynb
Xenomapper2 is licensed under the BSD three clause license. You are free to fork this repository under the terms of that license. If you have suggested changes please start by raising an issue in the issue tracker. Pull requests are welcome and will be included at the discretion of the author, but must have 100% test coverage.
Bug reports should be made to the issue tracker. Difficulty in understanding how to use the software is a documentation bug, and should also be raised on the issue tracker, where it will be tagged question so your question and my response are easily found by others.
Xenomapper2 uses numpy style docstrings, python type annotations, Travis CI, coverage and coveralls. All code should be compatible with python versions >= 3.6 and contain only pure python code.
To ensure the least encumbrance to all users, including those in commercial environments, Xenomapper2 is now licensed under the BSD three clause license (rather than the GPL that was used for xenomapper 1.0).
Justin Bedo, Alan Rubin and Tony Papenfuss provided helpful suggestions, early testing and code review. This work was supported by the Stafford Fox Medical Research Foundation.