Step 1: Understanding the problem
The basic idea of the approach is to combine sequences from two or more different genotypes in a single assembly.
If a CAP3 assembly represents two or more different genotypes, then this assembly contains
all the information needed to find polymorphisms between those genotypes.
A simple algorithm is used: a polymorphic site is considered a candidate SNP/INDEL if the variant is shared by
all members of one genotype in the given contig and is found only in that genotype.
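The rule above can be illustrated with a minimal Python sketch. This is not the pipeline's actual code. The function name `candidate_sites`, the input format (a list of `(genotype, aligned_sequence)` pairs) and the use of `-` for alignment gaps are assumptions made for illustration only. The sketch also assumes that every read spans the whole contig alignment, which is a simplification.

```python
from collections import defaultdict


def candidate_sites(reads):
    """Find candidate SNP/INDEL columns in one contig alignment.

    reads: list of (genotype, aligned_sequence) pairs (hypothetical format);
    all sequences must have the same length, with '-' marking a gap.
    Returns a list of (column, genotype, allele, kind) tuples.
    """
    if not reads:
        return []
    length = len(reads[0][1])
    if any(len(seq) != length for _, seq in reads):
        raise ValueError("all aligned sequences must have the same length")

    candidates = []
    for col in range(length):
        # Collect the alleles observed in this column, per genotype.
        alleles = defaultdict(set)
        for genotype, seq in reads:
            alleles[genotype].add(seq[col].upper())
        if len(alleles) < 2:
            continue  # one genotype alone cannot show polymorphism between genotypes
        # A column containing a gap is labeled INDEL, otherwise SNP.
        kind = "INDEL" if any("-" in a for a in alleles.values()) else "SNP"
        for genotype in sorted(alleles):
            observed = alleles[genotype]
            if len(observed) != 1:
                continue  # members of this genotype disagree at this site
            allele = next(iter(observed))
            # Alleles seen in every other genotype at this column.
            others = set().union(*(a for g, a in alleles.items() if g != genotype))
            if allele not in others:
                candidates.append((col, genotype, allele, kind))
    return candidates


if __name__ == "__main__":
    example = [("A", "ACGT"), ("A", "ACGT"), ("B", "ATGT"), ("B", "ATGT")]
    for site in candidate_sites(example):
        print(site)
```

In this sketch a site is reported once for each genotype whose members all carry an allele that no other genotype carries. A site where members of the same genotype disagree is not reported for that genotype.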
All programs and scripts were run in a UNIX/Linux environment.
It is assumed that the user has basic skills in
No knowledge of programming is required. To get proper results, the user should
follow the "step by step" instructions. Final results can be viewed on any desktop computer.
should be installed on a computer where scripts will be run as well as
NCBI BLAST and
The computer should have a CPU of at least 750 MHz and 1 GB of memory.
All results for the tomato dataset were obtained in this environment in less than 24 hours.
Scripts developed in our lab are freely available
for download and use.
If you use our pipeline successfully, please cite it as:
Python DIS pipeline developed by A.Kozik, B.Chan and R.Michelmore at UCD.
(DIS stands for Deletions - Insertions - Substitutions)
Note: this pipeline was designed in 2003. Since then, a slightly different approach
and improved scripts have been developed. You can check the current protocol of EST selection and SNP discovery