Python and Tcl/Tk scripts and tools to process and analyze DNA sequences and related data

GenBank2Fasta_UniExtractor_124.tcl - GenBank to FASTA file converter; besides extracting sequences, this parser extracts additional useful information from the GenBank file and places it into the FASTA header.
GenBank2Fasta_UniExtractor_126.tcl - current version, minor bug fixes.
- DNA sequence processor and translator; it translates in 6 frames in batch mode. Brief description is here
- current version; it has a new function: splitting sequences into multiple FASTA files.
- the same as above, with an additional option to create fake quality scores for a FASTA file.
- the same as above, with an option to convert FASTA alignments into CAP3 style.
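
The six-frame translation mentioned above can be sketched in Python roughly as follows. This is only an illustration, not the code of the script listed here: the function names are made up for the example, the standard genetic code is assumed, and any codon containing a character other than A, C, G or T is reported as 'X'.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumption:
# the listed script may support other tables; this sketch does not).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames
```

For batch mode, the same function would simply be called once per record of a multi-sequence FASTA file.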

tcl_blast_parser_123_V038.tcl - NCBI BLAST parser. Detailed description is here
tcl_blast_parser_123_V039.tcl - current version
tcl_blast_parser_123_V041.tcl - current version - to find common query overlap
tcl_blast_parser_123_V043_SS_beta03.tcl - beta version to extract subgroups in FASTA format for downstream assembly programs (CAP3, DiAlign, etc.)
tcl_blast_parser_123_V047.tcl - February 19, 2009 version; fixed a bug for long query lengths (10,000 or longer); derived from V041
- extraction of ORFs (open reading frames) from a BLAST-X report: BLAST EST sequences against a protein reference database and extract the EST fragments that correspond to the BLAST-X alignment.
- current version (with no_hits counter).
- extraction of a sub-region from a BLAST report (BLAST-X) if the hit ID matches the query ID.
- sequence subgroup extractor (1)
- sequence subgroup extractor (3)
to extract a sequence subset from a FASTA file based on a gene ID list: version (1) - full-size sequence extraction
version (3) - extraction of a defined fragment
- sequence splitter into overlapping fragments.
- summary statistics per column in tab-delimited files; useful for downstream analysis in MS Excel
- EST sequence trimmer. It's weird; use it at your own risk.
- current version - sequence masking based on a BLAST-N search against the Vector_M_PolyAAA.fasta vector database.
Vector_DB_NCBI_2007_10_29_CGP.polyA.fasta - recent vector db
It's weird too (masking); use it at your own risk.
- to work with tcl_blast_parser version 041
- redundancy elimination for sequences in a FASTA file, by Travis Kleeburg. Read more here
- quality score extractor from Phred output and trimmed sequences
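
Two of the operations listed in this group, extracting a subset of a FASTA file by ID list and splitting a sequence into overlapping fragments, might look like the following in Python. This is a minimal sketch under stated assumptions, not the logic of the scripts above: the function names are hypothetical, the sequence ID is assumed to be the first whitespace-delimited word of the header, and fragment size and overlap are plain character counts.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_subset(records, wanted_ids):
    """Keep records whose ID (first word of the header) is in wanted_ids."""
    wanted = set(wanted_ids)
    return [(h, s) for h, s in records if (h.split() or [""])[0] in wanted]


def split_overlapping(seq, size, overlap):
    """Split seq into (start, fragment) windows of `size` sharing `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    fragments = []
    for start in range(0, len(seq), step):
        fragments.append((start, seq[start:start + size]))
        if start + size >= len(seq):
            break  # the last window already reaches the end of the sequence
    return fragments
```

With a real file, `read_fasta(open(path))` would supply the records; the last fragment may be shorter than `size` when the sequence length is not a multiple of the step.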

Scripts to process CAP3 alignments:
- current experimental version
- current experimental version
Detailed description is here
Recent versions of MM finder in CAP3 assembly:
- generic version; more details here
- for a particular CGP project; you are unlikely to need it...
- state-of-the-art display of sequence coverage per nucleotide and related SNP/InDel info

Scripts to run CAP3 in batch mode with pre-defined groups
- works only with the two scripts above; it is buggy...

Manipulation of CAP3 derivative files:
- post-processing of the so-called CAP3 Info file after script
- estimation of CAP3 contig complexity based on the CAP3 Info file after script
read more here
- to trim low-quality regions from a CAP3 alignment
detailed description is here
- to generate a 'sequence_gap' file from a CAP3 alignment

Scripts for Genetic Maps
- add duplicated markers to a non-redundant map
Instructions are here

MadMapper - current versions:
- clustering
- clustering, February 28, 2008 update (reduced memory usage; *.all_pairs file is optional)
- map construction
- map construction (current version; variable column ID with pairwise data)
- map visualization
MadMapper details here

MadMapper clustering based on numerical data
- really 'beta' ...
- the latest 'beta'; it generates a pairwise matrix with values that can be used for fine ordering/sorting using script

Scripts to manipulate tab-delimited tables
Read more here

Pixelirator - graphical data display for tab delimited tables

Scripts for Affymetrix Chip design
- to generate Affy submission
- to convert 'N' to 'A' in a FASTA file
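
The 'N' to 'A' conversion could be sketched in Python as below. The function name is hypothetical, and it is an assumption of this example (not a documented behaviour of the script) that header lines are left untouched and that lowercase 'n' is converted to lowercase 'a'.

```python
def convert_n_to_a(fasta_lines):
    """Replace N/n with A/a in sequence lines; header lines are left as-is."""
    converted = []
    for line in fasta_lines:
        if line.startswith(">"):
            converted.append(line)  # header: may legitimately contain 'N'
        else:
            converted.append(line.replace("N", "A").replace("n", "a"))
    return converted
```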

TkLife_Search_07M_PepperAffy_04_off1_100L_ContigViewerTest.tcl - to find multiple perfect matches of Affy probes within a reference set; the reference set should be provided as a tab-delimited file with forward and reverse sequences
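
A perfect-match search of this kind could be sketched in Python as follows. This is not the Tcl/Tk script itself: the function names are made up, and the column layout of the reference file (ID, forward sequence, reverse sequence, in that order) is an assumption made for the example.

```python
def find_all(haystack, needle):
    """Return the 0-based start positions of all (overlapping) exact matches."""
    positions = []
    if not needle:
        return positions
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def probe_matches(probe, reference_lines):
    """Return (ref_id, strand, position) for every perfect match of probe.

    Each reference line is assumed to be: ID <TAB> forward <TAB> reverse.
    """
    probe = probe.upper()
    hits = []
    for line in reference_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        ref_id, forward, reverse = fields[:3]
        for strand, seq in (("forward", forward), ("reverse", reverse)):
            for pos in find_all(seq.upper(), probe):
                hits.append((ref_id, strand, pos))
    return hits
```

A probe that returns more than one hit across the reference set has multiple perfect matches.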

last modified: September 24 2007