Suite of Python MadMapper scripts for quality control of genetic markers,
group analysis and inference of linear order of markers on linkage groups.

Visualization and validation of genetic maps
using two-dimensional CheckMatrix heat-plots.

Example of usage on Arabidopsis Genome Dataset.

Alexander Kozik, UC Davis Genome Center, R. Michelmore lab



Scripts:
Python_MadMapper_V248_RECBIT_012.py - to generate pairwise distance matrix and for group analysis
Python_MadMapper_V248_XDELTA_024.py - for inference of linear order of markers
py_matrix_2D_V248_RECBIT.py - for visualization and validation of genetic maps
www.atgc.org/XLinkage/MadMapper
cgpdb.ucdavis.edu/XLinkage/MadMapper


Original Data Source:
www.arabidopsis.info
www.arabidopsis.info/new_ri_map.html

Locus file with raw marker scores:
DL_RIL_Data.may2001.MM.loc
(Note: all marker IDs were converted to uppercase. Only molecular markers with available DNA sequences were selected [563 markers total].)

FrameWork Markers:
Dean_Lister_frame.MM.map
(Note: this file is optional when running MadMapper_RECBIT and CheckMatrix.)


STEP 1: Group analysis and generation of pairwise distance matrix file

bash-2.03$ python   Python_MadMapper_V248_RECBIT_012.py \
           DL_RIL_Data.may2001.MM.loc   DL_RIL_Data.may2001.MM.loc.mmout \
           0.2   100   25   Dean_Lister_frame.MM.map   0.33   50   TRIO   3
The script generates 82 files on completion; the ones of interest here are:
DL_RIL_Data.may2001.MM.loc.mmout.pairs_all
DL_RIL_Data.may2001.MM.loc.mmout.x_tree_clust [ DL_RIL_Data.may2001.MM.loc.mmout.x_tree_clust.xls - MS Excel version ]
DL_RIL_Data.may2001.MM.loc.mmout.z_marker_sum [ DL_RIL_Data.may2001.MM.loc.mmout.z_marker_sum.xls - MS Excel version ]


STEP 2: Analysis of the DL_RIL_Data.may2001.MM.loc.mmout.pairs_all and DL_RIL_Data.may2001.MM.loc.mmout.z_marker_sum files and extraction of lists of 'good' markers to map and of framework markers, one pair of lists per linkage group:
DL_RIL_Data.may2001.MM.xlg1.IDs.good
DL_RIL_Data.may2001.MM.xlg1.Frame1

DL_RIL_Data.may2001.MM.xlg2.IDs.good
DL_RIL_Data.may2001.MM.xlg2.Frame1

DL_RIL_Data.may2001.MM.xlg3.IDs.good
DL_RIL_Data.may2001.MM.xlg3.Frame1

DL_RIL_Data.may2001.MM.xlg4.IDs.good
DL_RIL_Data.may2001.MM.xlg4.Frame1

DL_RIL_Data.may2001.MM.xlg5.IDs.good
DL_RIL_Data.may2001.MM.xlg5.Frame1


STEP 3: Mapping - execution of the Python_MadMapper_V248_XDELTA_024.py script with three input files:

DL_RIL_Data.may2001.MM.loc.mmout.pairs_all
DL_RIL_Data.may2001.MM.xlg1.IDs.good
DL_RIL_Data.may2001.MM.xlg1.Frame1

bash-2.03$ python   Python_MadMapper_V248_XDELTA_024.py \
           DL_RIL_Data.may2001.MM.loc.mmout.pairs_all   DL_RIL_Data.may2001.MM.xlg1.IDs.good \
           DL_RIL_Data.may2001.MM.xlg1.Frame1   DL_RIL_Data.may2001.MM.xlg1.mmout.shuffle.v23 \
           1   FLEX   SHUFFLE   6   3
Repeat this five times, once per linkage group (xlg1 through xlg5), using the IDs.good and Frame1 files of the corresponding group.

Each mapping run generates a "*.mad_map_final" file:
DL_RIL_Data.may2001.MM.xlg1.mmout.shuffle.v23.mad_map_final
DL_RIL_Data.may2001.MM.xlg2.mmout.shuffle.v23.mad_map_final
DL_RIL_Data.may2001.MM.xlg3.mmout.shuffle.v23.mad_map_final
DL_RIL_Data.may2001.MM.xlg4.mmout.shuffle.v23.mad_map_final
DL_RIL_Data.may2001.MM.xlg5.mmout.shuffle.v23.mad_map_final
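
The five mapping runs differ only in the linkage group number, so the command lines can be generated in a loop. The following is a minimal sketch, not part of the MadMapper distribution: the helper names (xdelta_command, expected_map, run_all) are hypothetical, the interpreter name "python" is an assumption about the local setup, and the trailing arguments are copied verbatim from the xlg1 example without interpreting them.

```python
import subprocess

SCRIPT = "Python_MadMapper_V248_XDELTA_024.py"
PAIRS = "DL_RIL_Data.may2001.MM.loc.mmout.pairs_all"
# Copied verbatim from the xlg1 example command; meaning not restated here.
TRAILING_ARGS = ["1", "FLEX", "SHUFFLE", "6", "3"]


def xdelta_command(group, python="python"):
    """Build the argument list for one linkage group (1..5)."""
    prefix = "DL_RIL_Data.may2001.MM.xlg%d" % group
    return [
        python,
        SCRIPT,
        PAIRS,                          # global pairwise matrix from STEP 1
        prefix + ".IDs.good",           # 'good' markers of this group
        prefix + ".Frame1",             # framework markers of this group
        prefix + ".mmout.shuffle.v23",  # output prefix
    ] + TRAILING_ARGS


def expected_map(group):
    """Name of the final map file listed for this linkage group."""
    return "DL_RIL_Data.may2001.MM.xlg%d.mmout.shuffle.v23.mad_map_final" % group


def run_all(dry_run=True):
    """Print (dry_run=True) or execute the command for each of the 5 groups."""
    for group in range(1, 6):
        cmd = xdelta_command(group)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.check_call(cmd)
```

Calling run_all() only prints the five command lines for inspection; run_all(dry_run=False) would execute them one after another, which assumes the scripts and input files are in the current directory.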


STEP 4: Visualization using CheckMatrix:
Concatenate the five individual maps (one per linkage group) into a single file and run the py_matrix_2D_V248_RECBIT.py script on it.

ath-all5-xdelta.map - global map for all 5 linkage groups
DL_RIL_Data.may2001.MM.loc.mmout.pairs_all - global matrix file
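
The concatenation step can be sketched as below. This is an illustration under stated assumptions, not the author's procedure: it assumes the five "*.mad_map_final" files can be joined as-is, in linkage-group order, with no reformatting of columns. The function name concatenate_maps is hypothetical.

```python
def concatenate_maps(map_paths, out_path):
    """Write the contents of map_paths, in order, into out_path."""
    with open(out_path, "w") as out:
        for path in map_paths:
            with open(path) as src:
                text = src.read()
            out.write(text)
            # Assumption: keep one map entry per line, so make sure the
            # last line of each file is terminated before the next file.
            if text and not text.endswith("\n"):
                out.write("\n")


# Final map files from STEP 3, in linkage-group order (xlg1 .. xlg5).
MAPS = [
    "DL_RIL_Data.may2001.MM.xlg%d.mmout.shuffle.v23.mad_map_final" % group
    for group in range(1, 6)
]
```

With the five map files in the current directory, concatenate_maps(MAPS, "ath-all5-xdelta.map") would produce the global map file named above.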

Final output: Arabidopsis Genetic Map




Arabidopsis Markers - Raw Data and Physical Positions on Genome




email to: akozik@atgc.org Alexander Kozik

last modified January 14 2006