Python and Tcl/Tk scripts and tools to process and analyze DNA sequences and related data
GenBank to FASTA file converter; besides extracting sequences, this parser extracts additional useful information from the GenBank file and places it in the FASTA header.
GenBank2Fasta_UniExtractor_126.tcl - current version, minor bug fixes.
DNA sequence processor and translator; it does translation in 6 frames in batch mode.
Brief description is here
seqs_processor_and_translator_bin_V126_AGCT.py - current version; it has a new function - splitting sequences into multiple FASTA files.
seqs_processor_and_translator_bin_V128_AGCT.py - the same as above, with an additional option to create fake quality scores for a FASTA file.
seqs_processor_and_translator_bin_V136_AGCT.py - the same as above, with an option to convert FASTA alignments into CAP3 style.
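
For readers unfamiliar with 6-frame translation, the idea can be sketched as follows. This is a minimal illustration, not the code of the scripts listed above: the function names (reverse_complement, translate, six_frame_translation) are hypothetical, the standard genetic code is assumed, and any codon containing a non-ACGT character is written as "X".

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
        print(frame, protein)
```

Batch mode would then amount to calling six_frame_translation on every record of a FASTA file; stop codons are shown as "*".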
NCBI BLAST parser.
Detailed description is here
tcl_blast_parser_123_V039.tcl - current version
tcl_blast_parser_123_V041.tcl - current version - to find common query overlap
tcl_blast_parser_123_V043_SS_beta03.tcl - beta version to extract subgroups in FASTA format for downstream assembly programs (CAP3, DiAlign, etc.)
tcl_blast_parser_123_V047.tcl - February 19, 2009 version - fixed a bug for long query lengths (10,000 or longer); derived from V041
Extraction of ORF (open reading frame) from BLAST-X report.
BLAST EST sequences against a protein reference database and extract the EST fragments that correspond to the BLAST-X alignment.
SeqsExtractorFromBlastX_V126.py - current version (with no_hits counter).
extraction of a sub-region from a BLAST report (BLAST-X) if the hit ID matches the query ID.
seqs_subgroup_extr_001.py - sequence subgroup extractor (1)
seqs_subgroup_extr_003.py - sequence subgroup extractor (3)
to extract a subset of sequences from a FASTA file based on a gene ID list:
version (1) - full size sequence extraction
version (3) - extraction of defined fragment
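
The two modes above (full-size extraction and extraction of a defined fragment) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of seqs_subgroup_extr_001.py or seqs_subgroup_extr_003.py: the names read_fasta and extract_subset are hypothetical, the sequence ID is assumed to be the first whitespace-delimited word of the FASTA header, and the fragment is given as 0-based Python slice bounds applied to every selected sequence.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_subset(fasta_lines, wanted_ids, start=None, end=None):
    """Yield (id, sequence) for records whose ID is in wanted_ids.

    With start and end left as None the full sequence is returned;
    otherwise the slice seq[start:end] is returned (assumed convention).
    """
    wanted = set(wanted_ids)
    for header, seq in read_fasta(fasta_lines):
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id in wanted:
            yield seq_id, seq[start:end]


if __name__ == "__main__":
    fasta = [">g1 example gene\n", "ACGT\n", "AC\n", ">g2\n", "TTTT\n"]
    print(list(extract_subset(fasta, ["g1"])))        # full-size extraction
    print(list(extract_subset(fasta, ["g1"], 1, 4)))  # defined fragment
```

In practice the ID list would be read from a text file with one gene ID per line, and fasta_lines would be an open file handle.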
seqs_drobilka_003_mod.py - sequence splitter into overlapping fragments.
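
Splitting into overlapping fragments can be illustrated with a short sketch. It is not taken from seqs_drobilka_003_mod.py: the function name split_overlapping and its size and overlap parameters are hypothetical, and the last fragment is allowed to be shorter than the others.

```python
def split_overlapping(seq, size, overlap):
    """Return a list of (start, fragment) tuples covering seq.

    Consecutive fragments share `overlap` characters; start is 0-based.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not seq:
        return []
    step = size - overlap
    fragments = []
    start = 0
    while True:
        fragments.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break  # this fragment already reaches the end of the sequence
        start += step
    return fragments


if __name__ == "__main__":
    for start, fragment in split_overlapping("ABCDEFGHIJ", size=4, overlap=2):
        print(start, fragment)
```

Each fragment could then be written out as its own FASTA record, with the start offset appended to the original sequence ID.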
py_stat_graph_012.py - displays a summary of statistics per column for tab-delimited files; useful for downstream analysis in MS Excel
EST sequence trimmer. It's weird, use it at your own risk.
seqs_trimmer_2007_05_18.py - current version
sequence masking based on BLAST-N search against
Vector_M_PolyAAA.fasta vector database.
Vector_DB_NCBI_2007_10_29_CGP.polyA.fasta - recent vector db
It's weird too (masking), use it at your own risk.
seqs_processor_ultra_polyA_V009_m.py - to work with tcl_blast_parser version 041
redundancy elimination for sequences in FASTA file by Travis Kleeburg.
read more here
qsep_002M.py - quality scores extractor from Phred output and trimmed sequences
Scripts to process CAP3 alignments:
Python_CAP3_MM_Finder_Uni_2007_03_24f.py - current experimental version
Python_CAP3_MM_Finder_Uni_2007_03_24h.py - current experimental version
Detailed description is here
recent versions of MM finder in CAP3 assembly:
Python_CAP3_MM_Finder_Uni_2007_08_14c.py - generic version;
more details here
Python_CAP3_MM_Finder_Uni_2007_09_01a.py - for a particular CGP project; you are unlikely to need it...
Python_CAP3_MM_Finder_Uni_2008_01_26c.py - state of the art display of sequence coverage per nucleotide and related SNP/InDel info
Scripts to run CAP3 in batch mode with pre-defined groups
Python_CAP3_ContigExtractor_Oct_25_2005.py - works only with the two scripts above; it is buggy...
Manipulation of CAP3 derivative files:
post-processing of so-called CAP3 Info file after Python_CAP3_ContigExtractor_Uni_2007_03_19.py script
estimation of CAP3 contig complexity based on CAP3 Info file after Python_CAP3_ContigExtractor_Uni_2007_03_19.py script
read more here
SequenceTrimmer.py - to trim low-quality region from CAP3 alignment
detailed description is here
cap3_alignment2tab_03.py - to generate 'sequence_gap' file from CAP3 alignment
Scripts for Genetic Maps
addDuplMarker.py - add duplicated markers to non-redundant map
Instructions are here
MadMapper - current versions:
Python_MadMapper_V248_RECBIT_012NR.py - clustering
Python_MadMapper_V248_RECBIT_016NR.py - clustering, February 28 2008 update (reduced memory usage, *.all_pairs file is optional)
Python_MadMapper_V248_XDELTA_117.py - map construction
Python_MadMapper_V248_XDELTA_119.py - map construction (current version; variable column ID with pairwise data)
py_matrix_2D_V248_RECBIT.py - map visualization
MadMapper details here
MadMapper clustering based on numerical data
Python_UniCluster_V011.py - really 'beta' ...
Python_UniCluster_V014.py - the latest 'beta'; it generates a pairwise matrix whose values can be used for fine ordering/sorting with the Python_MadMapper_V248_XDELTA_119.py script
Scripts to manipulate tab-delimited tables
Read more here
Pixelirator - graphical data display for tab delimited tables
Scripts for Affymetrix Chip design
seqs_processor_and_translator_bin_V027_AGCT_Affy_V05.py - to generate Affy submission
seqs_processor_and_translator_bin_V027_AGCT_N2A.py - to convert 'N' to 'A' in fasta file
to find multiple perfect matches of Affy probes within a reference set; the reference set should be provided as a tab-delimited file with forward and reverse sequences
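
A search for multiple perfect matches of this kind can be sketched as below. This is a hedged illustration only: the name of the actual script is not given here, so the function names (count_occurrences, find_multi_matches) are hypothetical, and the reference file is assumed to have three tab-separated columns (ID, forward sequence, reverse sequence). Overlapping occurrences are counted separately.

```python
import csv


def count_occurrences(text, pattern):
    """Count exact occurrences of pattern in text, overlaps included."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def find_multi_matches(probes, reference_lines):
    """Return {probe_id: [(ref_id, strand), ...]} for probes with > 1 hit.

    probes is a dict of probe_id -> sequence; reference_lines is any
    iterable of tab-delimited lines (assumed: id, forward, reverse).
    """
    rows = [r for r in csv.reader(reference_lines, delimiter="\t") if len(r) >= 3]
    multi = {}
    for probe_id, probe_seq in probes.items():
        probe_seq = probe_seq.upper()
        if not probe_seq:
            continue
        hits = []
        for row in rows:
            ref_id, forward, reverse = row[0], row[1], row[2]
            hits += [(ref_id, "forward")] * count_occurrences(forward.upper(), probe_seq)
            hits += [(ref_id, "reverse")] * count_occurrences(reverse.upper(), probe_seq)
        if len(hits) > 1:
            multi[probe_id] = hits
    return multi


if __name__ == "__main__":
    reference = ["r1\tACGTACGT\tACGTACGT\n", "r2\tTTTT\tAAAA\n"]
    probes = {"p1": "ACGT", "p2": "AAAA", "p3": "GGGG"}
    for probe_id, hits in find_multi_matches(probes, reference).items():
        print(probe_id, len(hits), hits)
```

Probes with exactly one hit or none are left out of the result, since only multiple matches are of interest here.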
last modified: September 24 2007