Search for tandem repeats in a folder of FASTA files:
python parallel_trf.py input_folder output_folder mask threads
Example:
python parallel_trf.py ~/human_genome/fasta ~/human_genome/trf fa 20
Compute and plot the distribution of paired-end (PE) fragment lengths:
python fragments_length_from_sam.py -o image_file -i sam_file
Count unmapped reads:
from PyBioSnippets.sam.sam_functions import count_unmapped
(mapped, unmapped) = count_unmapped(sam_file)
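For illustration only, a minimal sketch of how such a count can be derived from the FLAG column, where bit 0x4 marks an unmapped segment. The name count_unmapped_sketch is hypothetical and this is not necessarily how count_unmapped is implemented; the sketch takes an iterable of SAM lines rather than a file name and counts records, not distinct reads.

```python
def count_unmapped_sketch(sam_lines):
    """Return (mapped, unmapped) record counts from an iterable of SAM lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # column 2 is FLAG
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

An open file handle works as the input: count_unmapped_sketch(open(sam_file)).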
Save unmapped reads from a SAM file to a FASTA file:
from PyBioSnippets.sam.sam_functions import save_unmapped_to_fasta
save_unmapped_to_fasta(sam_file, fasta_file)
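As a rough sketch of the idea (not the actual save_unmapped_to_fasta code), unmapped records can be selected by FLAG bit 0x4 and emitted as FASTA using the QNAME and SEQ columns. The name unmapped_as_fasta_sketch is hypothetical, and it yields FASTA text instead of writing a file.

```python
def unmapped_as_fasta_sketch(sam_lines):
    """Yield one FASTA record (as text) per unmapped SAM record."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & 0x4:  # 0x4 = segment unmapped
            yield ">%s\n%s\n" % (name, seq)
```

The records can then be written out with, for example, out_handle.writelines(unmapped_as_fasta_sketch(in_handle)).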
Compute fragment length statistics for the first l lines:
python fragments_length_from_sam.py -o stat.png -i data.sam -l 100000
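A hedged sketch of where fragment lengths can come from: the TLEN column (column 9) of a SAM record holds the signed template length, so keeping only positive values counts each pair once. The function name and the limit parameter are hypothetical; here limit counts alignment records, which may differ from how the script's -l option counts lines.

```python
from itertools import islice


def fragment_lengths_sketch(sam_lines, limit=None):
    """Collect positive TLEN values from the first `limit` alignment records."""
    records = (l for l in sam_lines if l.strip() and not l.startswith("@"))
    lengths = []
    for line in islice(records, limit):  # limit=None means all records
        tlen = int(line.split("\t")[8])  # column 9 is TLEN
        if tlen > 0:  # the mate carries the negative value; 0 means unavailable
            lengths.append(tlen)
    return lengths
```

The resulting list can be passed to statistics.mean or statistics.median, or to a histogram plot.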
Count FLAG values for given SAM file:
python hiseq/sam_stats.py -i data.sam
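A minimal sketch of a FLAG tally, assuming the goal is a count per distinct FLAG value. The function name is hypothetical and the output format of sam_stats.py is not reproduced here.

```python
from collections import Counter


def count_flags_sketch(sam_lines):
    """Return a Counter mapping each FLAG value to its number of records."""
    return Counter(
        int(line.split("\t")[1])  # column 2 is FLAG
        for line in sam_lines
        if line.strip() and not line.startswith("@")  # skip blanks and headers
    )
```

Calling most_common() on the result lists the FLAG values from most to least frequent.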
Join split HiSeq files:
python hiseq/join_fastq.py --remove False --input some_folder --mask read_L001_R1
Fix overlong quality strings in corrupted HiSeq files:
fix_uncorrect_long_quality(fastq_file, corrected_fastq_output)
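How fix_uncorrect_long_quality repairs a record is not described here. One plausible repair, shown purely as an assumption, is to trim each quality string to the length of its sequence. The sketch below uses a hypothetical name and assumes strict four-line FASTQ records.

```python
def truncate_long_quality_sketch(fastq_lines):
    """Yield FASTQ lines, trimming quality strings longer than their sequence."""
    lines = [l.rstrip("\n") for l in fastq_lines]
    # Walk the input in complete four-line records: header, sequence, '+', quality.
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        yield header
        yield seq
        yield plus
        yield qual[:len(seq)]  # assumption: the extra trailing scores are discarded
```

Records whose quality string already matches the sequence length pass through unchanged.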
Iterate over paired-end files:
for read_obj1, read_obj2 in iter_pe_data(fastq_file1, fastq_file2):
    do_something()
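To show the general pattern behind such an iterator, here is a small sketch that walks two FASTQ inputs in lockstep. It is not the iter_pe_data implementation: the names are hypothetical, it yields plain (name, sequence, quality) tuples rather than read objects, and it assumes strict four-line records in the same order in both inputs.

```python
def iter_fastq_records_sketch(lines):
    """Yield (name, sequence, quality) for each four-line FASTQ record."""
    it = (l.rstrip("\n") for l in lines)
    # zip over the same iterator four times groups consecutive lines by four.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header[1:], seq, qual  # drop the leading '@'


def iter_pairs_sketch(lines1, lines2):
    """Yield (read1, read2) tuples from two FASTQ inputs read in parallel."""
    yield from zip(iter_fastq_records_sketch(lines1),
                   iter_fastq_records_sketch(lines2))
```

Because zip stops at the shorter input, a length mismatch between the two files goes unnoticed in this sketch; a stricter version would raise an error.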
Convert fastq to fasta:
python hiseq/fastq_to_fasta.py -i data.fastq -o data.fasta
Compute k-mer frequency percentages for a coverage plot:
python compute_kmer_coverage.py input_file output_file
Convert bax.h5 files into FASTA and FASTQ files:
ls | grep bax.h5 | xargs -n 1 --max-procs 64 python baxh5_to_fastq.py
cat *fasta > pacbio.fasta
cat *fastq > pacbio.fastq
Get a dictionary of chromosome lengths:
chr2length = get_chromosome_lengths(reference_multifasta)
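A minimal sketch of computing such a dictionary from a multi-FASTA file, assuming the key is the first whitespace-delimited token of each header line. The function name is hypothetical and the real get_chromosome_lengths may key or parse differently.

```python
def chromosome_lengths_sketch(fasta_lines):
    """Return {sequence name: total length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # assumption: name is the first header token
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence may span several lines
    return lengths
```

An open file handle works as the input: chromosome_lengths_sketch(open(reference_multifasta)).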