Suite of Python MadMapper scripts for quality control of genetic markers,
group analysis and inference of linear order of markers on linkage groups.
Visualization and validation of genetic maps
using two-dimensional CheckMatrix heat-plots.
Alexander Kozik, UC Davis Genome Center, R.Michelmore lab
This web page gives a brief description of how to use
CheckMatrix, a genetic map visualization and validation program
(Part 1);
a detailed description of how to use MadMapper_RECBIT to group genetic markers into distinct
linkage groups (Part 2);
and a description of how to infer the linear order of markers on linkage groups
using the MadMapper_XDELTA program (Part 3).
MadMapper_XDELTA works in conjunction with MadMapper_RECBIT.
CheckMatrix is used to visualize the output of MadMapper scripts.
Part 1
CheckMatrix (py_matrix_2D.py) -
visualization and validation of genetic maps using 2-dimensional heat plots
CheckMatrix is used for visualization and validation of genetic maps.
It can also be used to visualize the clustering/grouping results of MadMapper.
Here is a brief description and explanation of CheckMatrix input and output files.
Detailed description of CheckMatrix can be found at
http://cgpdb.ucdavis.edu/XLinkage/
and
http://cgpdb.ucdavis.edu/XLinkage/Genetic_Map_Matrix_Plot_Art.html
CheckMatrix
py_matrix_2D_V248_RECBIT.py
takes as input three files:
1. Pairwise Distance Matrix File
madmapper_test_small.out.pairs_all
The pairwise distance matrix file can be generated by Python MadMapper from the locus file (file 3 below):
.....................
GM01 GM07 0.36
GM01 GM08 0.40
GM01 GM09 0.48
GM01 GM10 0.52
GM01 GM11 0.60
GM01 GM12 0.68
GM02 GM01 0.04
GM02 GM02 0.00
GM02 GM03 0.08
GM02 GM04 0.16
GM02 GM05 0.20
GM02 GM06 0.24
.....................
2. Genetic Map File
madmapper_test_small.map.right
(in this example, the last column gives the order of the markers)
LG GM01 0
LG GM02 1
LG GM03 2
LG GM04 3
LG GM05 4
LG GM06 5
LG GM07 6
LG GM08 7
LG GM09 8
LG GM10 9
LG GM11 10
LG GM12 11
3. Locus File (Raw Marker Scores) madmapper_test_small.loc
(scores for individuals 1 to 25, from left to right)
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
GM12 B B B A A A A A A A B B B B B B B A A A A A A A A
CheckMatrix is run by executing the script with several arguments/options (a detailed explanation of the options is
here):
$python py_matrix_2D_V248_RECBIT.py madmapper_test_small.out.pairs_all madmapper_test_small.map.right madmapper_test_small.map.right.xout X Y madmapper_test_small.loc REC NOGRAPH 0.9 LARGE RIL
The following graphical output will be generated:
Note that a good (or 'right') map forms a red diagonal on the 2D plot, running from the upper-left corner to the lower-right corner of the image.
All colors (and the corresponding pairwise scores) change smoothly from any cell of the 2D plot to its adjacent cells.
A good map has no 'jumps' between adjacent scores in the two-dimensional matrix.
The visualization of the 'right' map above is based on assigning different color values
to the pairwise distances, shown here in numerical format:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| #### | GM01 | GM02 | GM03 | GM04 | GM05 | GM06 | GM07 | GM08 | GM09 | GM10 | GM11 | GM12 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM01 | 0.00 | 0.04 | 0.12 | 0.20 | 0.24 | 0.28 | 0.36 | 0.40 | 0.48 | 0.52 | 0.60 | 0.68 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM02 | 0.04 | 0.00 | 0.08 | 0.16 | 0.20 | 0.24 | 0.32 | 0.36 | 0.44 | 0.48 | 0.56 | 0.64 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM03 | 0.12 | 0.08 | 0.00 | 0.08 | 0.12 | 0.16 | 0.24 | 0.28 | 0.36 | 0.40 | 0.48 | 0.56 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM04 | 0.20 | 0.16 | 0.08 | 0.00 | 0.04 | 0.08 | 0.16 | 0.20 | 0.28 | 0.32 | 0.40 | 0.48 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM05 | 0.24 | 0.20 | 0.12 | 0.04 | 0.00 | 0.04 | 0.12 | 0.16 | 0.24 | 0.28 | 0.36 | 0.44 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM06 | 0.28 | 0.24 | 0.16 | 0.08 | 0.04 | 0.00 | 0.08 | 0.12 | 0.20 | 0.32 | 0.40 | 0.48 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM07 | 0.36 | 0.32 | 0.24 | 0.16 | 0.12 | 0.08 | 0.00 | 0.04 | 0.12 | 0.24 | 0.32 | 0.40 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM08 | 0.40 | 0.36 | 0.28 | 0.20 | 0.16 | 0.12 | 0.04 | 0.00 | 0.08 | 0.20 | 0.28 | 0.36 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM09 | 0.48 | 0.44 | 0.36 | 0.28 | 0.24 | 0.20 | 0.12 | 0.08 | 0.00 | 0.12 | 0.20 | 0.28 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM10 | 0.52 | 0.48 | 0.40 | 0.32 | 0.28 | 0.32 | 0.24 | 0.20 | 0.12 | 0.00 | 0.08 | 0.16 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM11 | 0.60 | 0.56 | 0.48 | 0.40 | 0.36 | 0.40 | 0.32 | 0.28 | 0.20 | 0.08 | 0.00 | 0.08 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM12 | 0.68 | 0.64 | 0.56 | 0.48 | 0.44 | 0.48 | 0.40 | 0.36 | 0.28 | 0.16 | 0.08 | 0.00 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
This was an example of the visualization of a 'good' or 'right' map.
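The numbers in this matrix can be reproduced from the scores in madmapper_test_small.loc. The sketch below assumes that, for this RIL example, the pairwise distance is simply the fraction of individuals whose scores differ, skipping missing scores (written here as '-', an assumed symbol). The scores are copied from the locus file above in run-length form. Python MadMapper's own distance calculation (data_cut, BIT scores, handling of missing data) is not reproduced here.

```python
"""Sketch: pairwise distances from RIL locus scores, arranged in map order.

Assumption (not taken from the MadMapper source): distance = fraction of
jointly scored individuals with different scores; '-' marks missing data.
"""

# Scores of madmapper_test_small.loc, written in run-length form.
SCORES = {
    "GM01": "A" * 16 + "B" * 9,
    "GM02": "A" * 15 + "B" * 10,
    "GM03": "A" * 13 + "B" * 12,
    "GM04": "A" * 11 + "B" * 14,
    "GM05": "A" * 10 + "B" * 15,
    "GM06": "A" * 9 + "B" * 16,
    "GM07": "A" * 9 + "B" * 14 + "A" * 2,
    "GM08": "A" * 9 + "B" * 13 + "A" * 3,
    "GM09": "A" * 9 + "B" * 11 + "A" * 5,
    "GM10": "B" + "A" * 9 + "B" * 9 + "A" * 6,
    "GM11": "B" * 2 + "A" * 8 + "B" * 8 + "A" * 7,
    "GM12": "B" * 3 + "A" * 7 + "B" * 7 + "A" * 8,
}
MISSING = "-"  # assumed missing-data symbol


def distance(a, b):
    """Fraction of jointly scored individuals whose scores differ."""
    pairs = [(x, y) for x, y in zip(a, b) if MISSING not in (x, y)]
    if not pairs:
        raise ValueError("no overlapping scores")
    return round(sum(x != y for x, y in pairs) / len(pairs), 2)


def ordered_matrix(order, scores):
    """2D distance matrix with rows and columns in map order."""
    return [[distance(scores[r], scores[c]) for c in order] for r in order]


if __name__ == "__main__":
    order = sorted(SCORES)  # GM01..GM12, the 'right' map order
    for name, row in zip(order, ordered_matrix(order, SCORES)):
        print(name, " ".join(f"{value:.2f}" for value in row))
```

Under these assumptions the GM06 row comes out as 0.28 0.24 0.16 0.08 0.04 0.00 0.08 0.12 0.20 0.32 0.40 0.48, the same as in the table.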
...............
Now we will run CheckMatrix with a 'bad' genetic map, in which the positions of markers GM03 and GM10 are swapped:
LG GM01 0
LG GM02 1
LG GM10 2 ***
LG GM04 3
LG GM05 4
LG GM06 5
LG GM07 6
LG GM08 7
LG GM09 8
LG GM03 9 ***
LG GM11 10
LG GM12 11
This map produces the following heat plot:
The color distortion in this image is easy to notice. It indicates that the map is wrong and has to be fixed.
There are large 'jumps' between adjacent distance matrix values for markers GM03 and GM10.
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| #### | GM01 | GM02 | GM10 | GM04 | GM05 | GM06 | GM07 | GM08 | GM09 | GM03 | GM11 | GM12 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM01 | 0.00 | 0.04 | 0.52 | 0.20 | 0.24 | 0.28 | 0.36 | 0.40 | 0.48 | 0.12 | 0.60 | 0.68 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM02 | 0.04 | 0.00 | 0.48 | 0.16 | 0.20 | 0.24 | 0.32 | 0.36 | 0.44 | 0.08 | 0.56 | 0.64 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM10 | 0.52 | 0.48 | 0.00 | 0.32 | 0.28 | 0.32 | 0.24 | 0.20 | 0.12 | 0.40 | 0.08 | 0.16 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM04 | 0.20 | 0.16 | 0.32 | 0.00 | 0.04 | 0.08 | 0.16 | 0.20 | 0.28 | 0.08 | 0.40 | 0.48 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM05 | 0.24 | 0.20 | 0.28 | 0.04 | 0.00 | 0.04 | 0.12 | 0.16 | 0.24 | 0.12 | 0.36 | 0.44 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM06 | 0.28 | 0.24 | 0.32 | 0.08 | 0.04 | 0.00 | 0.08 | 0.12 | 0.20 | 0.16 | 0.40 | 0.48 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM07 | 0.36 | 0.32 | 0.24 | 0.16 | 0.12 | 0.08 | 0.00 | 0.04 | 0.12 | 0.24 | 0.32 | 0.40 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM08 | 0.40 | 0.36 | 0.20 | 0.20 | 0.16 | 0.12 | 0.04 | 0.00 | 0.08 | 0.28 | 0.28 | 0.36 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM09 | 0.48 | 0.44 | 0.12 | 0.28 | 0.24 | 0.20 | 0.12 | 0.08 | 0.00 | 0.36 | 0.20 | 0.28 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM03 | 0.12 | 0.08 | 0.40 | 0.08 | 0.12 | 0.16 | 0.24 | 0.28 | 0.36 | 0.00 | 0.48 | 0.56 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM11 | 0.60 | 0.56 | 0.08 | 0.40 | 0.36 | 0.40 | 0.32 | 0.28 | 0.20 | 0.48 | 0.00 | 0.08 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| GM12 | 0.68 | 0.64 | 0.16 | 0.48 | 0.44 | 0.48 | 0.40 | 0.36 | 0.28 | 0.56 | 0.08 | 0.00 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Compare the values in the GM03 column with the values adjacent to them.
The differences between these adjacent values are larger than in the 'good' matrix.
The best map should have the smallest differences between adjacent values on the 2D matrix overall.
In other words, for the best map the total sum of these 'deltas'
(differences between adjacent values) should be the smallest.
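The 'delta' comparison can be made concrete with a short Python sketch. It assumes that 'adjacent values' means horizontally neighbouring cells within each row of the matrix (the matrix is symmetric, so adding vertical neighbours would only double every total); with this reading the numbers match the MadMapper_XDELTA log shown in Part 3. The 'right' order of the 12 test markers then sums to 8.72, and the order with GM03 and GM10 swapped gives a clearly larger sum.

```python
"""Sketch: total 'delta' of an ordered pairwise-distance matrix.

Assumption: 'adjacent cells' are horizontal neighbours within each row.
"""

MARKERS = [f"GM{i:02d}" for i in range(1, 13)]

# Rows of the 'right' matrix above, in the order GM01..GM12.
ROWS = [
    [0.00, 0.04, 0.12, 0.20, 0.24, 0.28, 0.36, 0.40, 0.48, 0.52, 0.60, 0.68],
    [0.04, 0.00, 0.08, 0.16, 0.20, 0.24, 0.32, 0.36, 0.44, 0.48, 0.56, 0.64],
    [0.12, 0.08, 0.00, 0.08, 0.12, 0.16, 0.24, 0.28, 0.36, 0.40, 0.48, 0.56],
    [0.20, 0.16, 0.08, 0.00, 0.04, 0.08, 0.16, 0.20, 0.28, 0.32, 0.40, 0.48],
    [0.24, 0.20, 0.12, 0.04, 0.00, 0.04, 0.12, 0.16, 0.24, 0.28, 0.36, 0.44],
    [0.28, 0.24, 0.16, 0.08, 0.04, 0.00, 0.08, 0.12, 0.20, 0.32, 0.40, 0.48],
    [0.36, 0.32, 0.24, 0.16, 0.12, 0.08, 0.00, 0.04, 0.12, 0.24, 0.32, 0.40],
    [0.40, 0.36, 0.28, 0.20, 0.16, 0.12, 0.04, 0.00, 0.08, 0.20, 0.28, 0.36],
    [0.48, 0.44, 0.36, 0.28, 0.24, 0.20, 0.12, 0.08, 0.00, 0.12, 0.20, 0.28],
    [0.52, 0.48, 0.40, 0.32, 0.28, 0.32, 0.24, 0.20, 0.12, 0.00, 0.08, 0.16],
    [0.60, 0.56, 0.48, 0.40, 0.36, 0.40, 0.32, 0.28, 0.20, 0.08, 0.00, 0.08],
    [0.68, 0.64, 0.56, 0.48, 0.44, 0.48, 0.40, 0.36, 0.28, 0.16, 0.08, 0.00],
]
DIST = {
    (a, b): ROWS[i][j]
    for i, a in enumerate(MARKERS)
    for j, b in enumerate(MARKERS)
}


def total_delta(order, dist):
    """Sum of |difference| between horizontally adjacent cells, all rows."""
    total = 0.0
    for row_marker in order:
        row = [dist[(row_marker, col_marker)] for col_marker in order]
        total += sum(abs(right - left) for left, right in zip(row, row[1:]))
    return round(total, 4)  # rounding hides floating-point noise


good = list(MARKERS)
bad = list(MARKERS)
i3, i10 = bad.index("GM03"), bad.index("GM10")
bad[i3], bad[i10] = bad[i10], bad[i3]  # the 'wrong' map: GM03 and GM10 swapped

if __name__ == "__main__":
    print("right order:", total_delta(good, DIST))  # 8.72
    print("wrong order:", total_delta(bad, DIST))
```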
Example of a CheckMatrix visualization of a random distribution of genetic markers
Part 2
MadMapper_RECBIT - group analysis and quality control of genetic markers.
We will use DL_RIL_Data.may2001.loc, Dean and Lister's
Arabidopsis genotyping data set (locus file)
[ it has raw scores for 1357 markers and was downloaded from the
arabidopsis.info web site ],
as an example to demonstrate how to run the
Python_MadMapper_V248_RECBIT_012.py
script to solve the following problems:
1. Group (cluster) all markers from the locus file and assign them to distinct linkage groups.
2. Select markers for linkage group 4 (as an example) and extract reliable markers
with high quality scores from that group.
3. Then, in Part 3, try to infer the linear order of the
selected markers on Arabidopsis linkage group 4.
The Python_MadMapper_V248_RECBIT_012.py
script takes two input files:
1. DL_RIL_Data.may2001.loc - locus file with raw marker scores.
2. Dean_Lister_frame.IDs - frame work map (optional).
Details about Python_MadMapper_V248_RECBIT_012.py usage can be found by running the script without arguments and
in the README_MADMAPPER_248.txt file;
a useful description of previous versions can also be found
here.
For example, running the script without arguments prints:
bash-2.03$ python Python_MadMapper_V248_RECBIT_012.py
PROGRAM USAGE:
MAD MAPPER TAKES 10 ARGUMENTS/OPTIONS IN THE FOLLOWING ORDER:
(1)input_file[LOC_DATA/MARKER SCORES] (2)output_file[NAME]
(3)rec_cut[0.2] (4)bit_cut[100] (5)data_cut[25]
(6)group_file[OPTIONAL] (7)allele_dist[0.33] (8)missing_data[50]
(9)trio_analysis[TRIO/NOTRIO] (10)double_cross[3]
if group_file does not exist just enter [X]
DEFAULT VALUES: IN OUT 0.2 100 25 X 0.33 50 TRIO 3
TYPE "HELP" FOR HELP [ "EXIT" TO EXIT ] : HELP
MAD MAPPER ARGUMENTS/OPTIONS - BRIEF EXPLANATION:
INPUT/OUTPUT FILES:
[1] - Input File Name (locus file with raw marker scores)
[2] - Output File Name (master name [prefix] for 80 or so output files)
CLUSTERING PARAMETERS (WILL AFFECT CLUSTERING/GROUPING ONLY):
[3] - Recombination Value (Haplotype Distance) cutoff: 0.20 - 0.25
(NOTE: TRIO analysis (see below) works with 0.2 rec_cut value only)
[4] - BIT Score cutoff: 60-1000 [100 is default and highly recommended]
(Check README_MADMAPPER for BIT Scoring Matrix system and values)
[5] - Overlap Data cutoff (data_cut): minimum number of scores between
two markers to be compared to assign pairwise distance
[6] - Optional Frame Work Marker Map (very useful for clustering analysis
to assign new markers to known linkage groups)
FILTERING PARAMETERS (WILL AFFECT MARKER FILTERING, CREATION OF GOOD
NON-REDUNDANT SET OF MARKERS AND TRIO ANALYSIS):
[7] - Allele Distortion: to filter markers with high allele distortion
[8] - Missing Data: how many missing scores are allowed per marker
TRIO (TRIPLET) ANALYSIS -
FINDING OF TIGHTLY LINKED MARKERS AND THEIR RELATIVE ORDER:
[9] - TRIO/NOTRIO (If TRIO option is chosen then TRIPLET analysis will
take place. Is not recommended to use for large set
markers, 1000 or greater)
[10] - Number of Double Crossovers cutoff value for TRIPLET analysis:
3 - default for noisy data; 0 is recommended for perfect scores
CHECK README_MADMAPPER FOR DETAILED DESCRIPTION OF OPTIONS
AND OUTPUT FILES FORMATS/STRUCTURE
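Because all ten arguments are positional, it is easy to put them in the wrong order. A small hypothetical helper (recbit_argv is not part of MadMapper) can assemble them in the order printed by the usage message, filling in the printed defaults. It also applies the usage note that TRIO analysis works with a rec_cut value of 0.2 only.

```python
"""Hypothetical helper, not part of MadMapper: build the argument list of
Python_MadMapper_V248_RECBIT_012.py in the order shown by its usage message."""

SCRIPT = "Python_MadMapper_V248_RECBIT_012.py"

# Options (3)..(10) with the defaults printed by the script:
# IN OUT 0.2 100 25 X 0.33 50 TRIO 3
DEFAULTS = {
    "rec_cut": 0.2,
    "bit_cut": 100,
    "data_cut": 25,
    "group_file": "X",  # 'X' = no frame work map file
    "allele_dist": 0.33,
    "missing_data": 50,
    "trio_analysis": "TRIO",
    "double_cross": 3,
}


def recbit_argv(input_file, output_name, **options):
    """Return the command as a list of strings, options in documented order."""
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    # Overriding a key keeps its original position, so the order stays (3)..(10).
    opts = {**DEFAULTS, **options}
    if opts["trio_analysis"] not in ("TRIO", "NOTRIO"):
        raise ValueError("trio_analysis must be TRIO or NOTRIO")
    if opts["trio_analysis"] == "TRIO" and float(opts["rec_cut"]) != 0.2:
        raise ValueError("the usage message says TRIO works with rec_cut 0.2 only")
    return ["python", SCRIPT, input_file, output_name] + [str(v) for v in opts.values()]


if __name__ == "__main__":
    argv = recbit_argv(
        "DL_RIL_Data.may2001.loc",
        "DL_RIL_Data.may2001.xout248",
        group_file="Dean_Lister_frame.IDs",
        trio_analysis="NOTRIO",
    )
    print(" ".join(argv))
```

The example in the main block builds the same command line as the group analysis run below; the list could then be passed to subprocess.run(argv, check=True).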
First Step - Group Analysis - We will run the script with the following options/arguments:
$python Python_MadMapper_V248_RECBIT_012.py DL_RIL_Data.may2001.loc DL_RIL_Data.may2001.xout248 0.2 100 25 Dean_Lister_frame.IDs 0.33 50 NOTRIO 3
The output consists of 76 files (see the
README_MADMAPPER_248.txt file for more about the output).
[ Note that at this step the script was run with the "NOTRIO" option ]
At this step we are interested only in the
DL_RIL_Data.may2001.xout248.x_tree_clust
file, which contains the information about marker clustering/grouping. From this file we will select
markers belonging to linkage group 4.
We will analyze the DL_RIL_Data.may2001.xout248.x_tree_clust
file in MS Excel to find and highlight (select) the set of markers belonging to linkage group 4.
DL_RIL_Data.may2001.xout248.x_tree_clust.xls -
tree clustering file in MS Excel format.
Screenshot of the region of interest:
From this group info file we will extract the IDs of markers belonging to linkage group 4
(DL_RIL_LG4A.tab) and create a new locus
file, DL_RIL_LG4A.loc, which contains marker scores for
linkage group 4 only. It has raw scores for 247 markers.
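The extraction step can also be scripted instead of done by hand. A minimal sketch, assuming that in both DL_RIL_LG4A.tab and the locus file the marker ID is the first whitespace-separated field of a line (as in madmapper_test_small.loc); any header lines in the locus file are not kept by this sketch.

```python
"""Sketch: keep only the locus-file lines of markers listed in an ID file.

Assumption: the marker ID is the first whitespace-separated field of a line
in both files; lines whose first field is not a listed ID are dropped.
"""


def read_ids(lines):
    """Marker IDs: the first field of every non-blank line."""
    return {line.split()[0] for line in lines if line.strip()}


def filter_locus(loc_lines, keep_ids):
    """Keep locus lines whose first field is one of keep_ids."""
    keep = set(keep_ids)
    kept = []
    for line in loc_lines:
        fields = line.split()
        if fields and fields[0] in keep:
            kept.append(line)
    return kept


def filter_locus_file(ids_path, loc_path, out_path):
    """File wrapper, e.g. ('DL_RIL_LG4A.tab', 'DL_RIL_Data.may2001.loc',
    'DL_RIL_LG4A.loc'). Returns the number of marker lines written."""
    with open(ids_path) as ids_file:
        ids = read_ids(ids_file)
    with open(loc_path) as loc_in:
        kept = filter_locus(loc_in, ids)
    with open(out_path, "w") as loc_out:
        loc_out.writelines(kept)
    return len(kept)
```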
Now we will run Python_MadMapper_V248_RECBIT_012.py again on the DL_RIL_LG4A.loc file with the "TRIO" option:
$python Python_MadMapper_V248_RECBIT_012.py DL_RIL_LG4A.loc DL_RIL_LG4A.xout248 0.2 100 25 Dean_Lister_frame.IDs 0.33 50 TRIO 3
83 files will be generated. We are interested in the Marker Summary file
DL_RIL_LG4A.xout248.z_marker_sum,
which contains information
about data loss and allele distortion for each marker, as well as useful information derived from the
"TRIO" analysis. From this file we will extract only the IDs of markers
that have the "GOOD__MARKER" label (grade), so that only high-quality markers are used for further analysis.
These markers have a low fraction of missing data and a balanced ratio of "A" to "B" scores (low allele distortion).
There are 171 'good' markers in the dataset.
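Selecting the 'good' markers can also be scripted. The layout of the z_marker_sum file is not shown on this page, so the sketch below assumes that the marker ID is the first field of a line and that the grade appears as a separate whitespace-separated field on the same line. It also assumes that a marker list for Part 3 has one ID per line.

```python
"""Sketch: collect IDs of markers graded GOOD__MARKER in a marker summary.

Assumptions: ID = first field; the grade is a separate field on the line;
marker list files hold one ID per line.
"""


def good_marker_ids(summary_lines, grade="GOOD__MARKER"):
    """IDs (first field) of lines that carry `grade` as a separate field."""
    ids = []
    for line in summary_lines:
        fields = line.split()
        if len(fields) > 1 and grade in fields[1:]:
            ids.append(fields[0])
    return ids


def write_id_list(ids, path):
    """Write one marker ID per line (assumed format of the *.list files)."""
    with open(path, "w") as out:
        out.writelines(f"{marker_id}\n" for marker_id in ids)
```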
We will also need DL_RIL_LG4A.xout248.pairs_all for
further analysis. This file contains pairwise distances for all markers of linkage group 4.
Now we are ready to move on to Part 3 to infer the linear order of markers on linkage group 4.
Part 3
MadMapper_XDELTA - inference of linear order of markers on linkage groups using
Minimum Entropy Approach and Best-Fit Extension.
The Python_MadMapper_V248_XDELTA_016.py
script infers the linear order of markers on a linkage group by analyzing two-dimensional matrices of pairwise distances.
MadMapper_XDELTA tries to find the 2D matrix with the minimal total sum of differences between adjacent cells.
The script calculates a so-called 'delta' for each pair of adjacent cells by subtracting one pairwise score from the
adjacent one. Then it calculates the sum of the absolute values of all deltas and chooses the matrix with the lowest
sum. In other words, MadMapper_XDELTA searches for the matrix with the lowest entropy among a set of
available matrices. Check the examples of 'right' and 'wrong' maps in Part 1 of this web page:
the 'right' map has lower entropy than the 'wrong' map.
All values in the two-dimensional matrix of pairwise distances are taken into account, even for pairs of unlinked markers.
In other words, the pairwise distances between all pairs of markers contribute equally to finding the map with the
lowest entropy. This is the major difference between the MadMapper_XDELTA approach and 'classical' genetic mapping
programs, in which only scores for linked markers are considered when constructing maps.
Finding the linear order of N markers by checking every possible order has (N!)/2 complexity. For ten markers this is (10!)/2 =
1,814,400 [ almost two million different orders are possible for 10 markers ]. So, we cannot check all
possible matrices for a set of 12 or more markers in real time [ using a single CPU of 1 to 5 GHz ].
To minimize the number of matrices to analyze, the MadMapper_XDELTA script uses a frame work marker order and tries to
insert all other markers one by one into the frame map, calculating 'delta' for each iteration. The new map with the
lowest delta is selected after each iteration. The run time of this approach does not exceed the order of N x N x N.
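The frame-insertion step can be sketched in a few lines of Python. The sketch assumes that delta is the sum of absolute differences between horizontally adjacent cells and that ties keep the earliest position; it recomputes the whole delta for every candidate position, so it is simpler and slower than an optimized implementation. With distances taken from the Part 1 table, inserting GM03 into the frame GM02 GM06 GM10 prints the same four lines (2.16, 2.0, 2.64, 3.2 and their per-marker values) as the second block of the MadMapper_XDELTA log shown further down this page.

```python
"""Sketch of best-fit insertion into a frame map, scored by total delta.

Assumptions: delta = sum of |difference| between horizontally adjacent
cells of the ordered distance matrix; ties keep the earliest position.
"""


def total_delta(order, dist):
    total = 0.0
    for r in order:
        row = [0.0 if r == c else dist[frozenset((r, c))] for c in order]
        total += sum(abs(b - a) for a, b in zip(row, row[1:]))
    return round(total, 4)


def insert_best(frame, marker, dist):
    """Try every position for marker; return (best_order, candidates)."""
    candidates = []
    for pos in range(len(frame) + 1):
        order = frame[:pos] + [marker] + frame[pos:]
        candidates.append((total_delta(order, dist), order))
    best = min(candidates, key=lambda item: item[0])  # first minimum wins ties
    return best[1], candidates


def build_map(frame, markers_to_add, dist):
    """Insert markers one by one, keeping the lowest-delta map each time."""
    order = list(frame)
    for marker in markers_to_add:
        if marker not in order:
            order, _ = insert_best(order, marker, dist)
    return order


# Distances among four test markers, from the Part 1 table.
DIST = {
    frozenset(("GM02", "GM03")): 0.08,
    frozenset(("GM02", "GM06")): 0.24,
    frozenset(("GM02", "GM10")): 0.48,
    frozenset(("GM03", "GM06")): 0.16,
    frozenset(("GM03", "GM10")): 0.40,
    frozenset(("GM06", "GM10")): 0.32,
}

if __name__ == "__main__":
    _, candidates = insert_best(["GM02", "GM06", "GM10"], "GM03", DIST)
    for n, (delta, order) in enumerate(candidates, start=1):
        print(" ".join(order), "***", delta, "***", round(delta / len(order), 4), "***", n)
```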
If a frame work map is not available, MadMapper_XDELTA can check ALL POSSIBLE COMBINATIONS of orders for up to 10
chosen markers and then add markers one by one from the 'markers to map' list. In this case the run time of the script
is (N!)/2 + (M x M x M), where N is the number of frame work markers and M is the number of all markers to map.
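The (N!)/2 count reflects the fact that a map and its reverse have the same total delta, so only one of each pair of mirror-image orders needs to be scored. The sketch below enumerates orders that way (keeping a permutation only if its first marker comes before its last marker in the input list) and picks the order with the lowest total delta, again assuming the horizontal-neighbour reading of delta. For the three frame markers GM02, GM06 and GM10 it scores exactly three orders and selects GM02 GM06 GM10 with a delta of 1.52, matching the first block of the log below.

```python
"""Sketch: exhaustive search over all (N!)/2 orders of a small frame.

Assumption: an order and its reverse have the same total delta, so each
mirror pair is scored once.
"""
import math
from itertools import permutations

# Distances between the three frame markers, from the Part 1 table.
DIST = {
    frozenset(("GM02", "GM06")): 0.24,
    frozenset(("GM02", "GM10")): 0.48,
    frozenset(("GM06", "GM10")): 0.32,
}


def total_delta(order, dist):
    """Sum of |difference| between horizontally adjacent matrix cells."""
    total = 0.0
    for r in order:
        row = [0.0 if r == c else dist[frozenset((r, c))] for c in order]
        total += sum(abs(b - a) for a, b in zip(row, row[1:]))
    return round(total, 4)


def distinct_orders(markers):
    """Yield each order once, skipping its reverse: (N!)/2 orders for N >= 2."""
    index = {m: i for i, m in enumerate(markers)}
    for perm in permutations(markers):
        if len(perm) < 2 or index[perm[0]] < index[perm[-1]]:
            yield list(perm)


def best_frame_order(markers, dist):
    """Score every distinct order; return (best_delta, best_order, n_checked)."""
    scored = [(total_delta(o, dist), o) for o in distinct_orders(markers)]
    best_delta, best_order = min(scored, key=lambda item: item[0])
    return best_delta, best_order, len(scored)


if __name__ == "__main__":
    print(math.factorial(10) // 2)  # 1814400 orders for 10 markers
    print(math.factorial(12) // 2)  # 239500800 orders for 12 markers
    print(best_frame_order(["GM02", "GM06", "GM10"], DIST))
```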
Before we run MadMapper_XDELTA with a real set of markers (Arabidopsis linkage group 4 from Part 2 of this web page),
we will show how to use it and how it works with the smaller example set from Part 1.
Three input files are required:
1. madmapper_test_small.out.pairs_all [ pairwise distances ]
2. madmapper_test_small.list [ list of markers to map ]
3. madmapper_test_small.frame [ list of frame work markers ]
We will execute the script with the following options:
$python Python_MadMapper_V248_XDELTA_016DD.py madmapper_test_small.out.pairs_all madmapper_test_small.list madmapper_test_small.frame madmapper_test_small.xdelta LG FLEX
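As a first step, MadMapper_XDELTA will find the best order of the frame work markers [ three markers in the madmapper_test_small.frame file ]
and then add markers from the madmapper_test_small.list file one by one. The delta calculations for the first 18 iterations
look like this (each line shows a candidate order, its total delta, the total delta divided by the number of markers, and the candidate number):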
=============================================
MATRIX (ALL PAIRS) : madmapper_test_small.out.pairs_all
MARKERS TO MAP : madmapper_test_small.list
FRAME MARKERS LIST : madmapper_test_small.frame
OUTPUT MAP FILE : madmapper_test_small.xdelta
MAX FRAME LENGTH : 12
FIXED FRAME ORDER : FALSE
LINKAGE GROUP ID : LG
DUMMY DEBUG : TRUE
=============================================
=======
GM02 GM06 GM10 *** 1.52 *** 0.5067 *** 1
GM02 GM10 GM06 *** 1.92 *** 0.64 *** 2
GM06 GM02 GM10 *** 1.68 *** 0.56 *** 3
=======
=======
GM03 GM02 GM06 GM10 *** 2.16 *** 0.54 *** 1
GM02 GM03 GM06 GM10 *** 2.0 *** 0.5 *** 2
GM02 GM06 GM03 GM10 *** 2.64 *** 0.66 *** 3
GM02 GM06 GM10 GM03 *** 3.2 *** 0.8 *** 4
=======
=======
GM08 GM02 GM03 GM06 GM10 *** 3.64 *** 0.728 *** 1
GM02 GM08 GM03 GM06 GM10 *** 4.32 *** 0.864 *** 2
GM02 GM03 GM08 GM06 GM10 *** 3.28 *** 0.656 *** 3
GM02 GM03 GM06 GM08 GM10 *** 2.56 *** 0.512 *** 4
GM02 GM03 GM06 GM10 GM08 *** 3.16 *** 0.632 *** 5
=======
=======
GM09 GM02 GM03 GM06 GM08 GM10 *** 4.8 *** 0.8 *** 1
GM02 GM09 GM03 GM06 GM08 GM10 *** 5.92 *** 0.9867 *** 2
GM02 GM03 GM09 GM06 GM08 GM10 *** 4.72 *** 0.7867 *** 3
GM02 GM03 GM06 GM09 GM08 GM10 *** 3.76 *** 0.6267 *** 4
GM02 GM03 GM06 GM08 GM09 GM10 *** 3.12 *** 0.52 *** 5
GM02 GM03 GM06 GM08 GM10 GM09 *** 3.52 *** 0.5867 *** 6
=======
Note that in each set of iterations the lowest delta value corresponds to the best ('right') map.
Finally, the best map for all 12 markers will be written to
madmapper_test_small.xdelta.mad_map_final:
LG MARKER POS #1# DST1 #2# DST2 #3# DST3 #S# SUMM #D# DIFF STATUS CLASS
LG GM01 0 #1# 0 #2# NNNNNN #3# NNNNNN #S# NNNNNN #D# NNNNNN NNNNNN NNNNN
LG GM02 1 #1# 0.04 #2# 0.08 #3# 0.12 #S# 0.12 #D# 0.0 GOOD __0__
LG GM03 2 #1# 0.08 #2# 0.08 #3# 0.16 #S# 0.16 #D# 0.0 GOOD __0__
LG GM04 3 #1# 0.08 #2# 0.04 #3# 0.12 #S# 0.12 #D# 0.0 GOOD __0__
LG GM05 4 #1# 0.04 #2# 0.04 #3# 0.08 #S# 0.08 #D# 0.0 GOOD __0__
LG GM06 5 #1# 0.04 #2# 0.08 #3# 0.12 #S# 0.12 #D# 0.0 GOOD __0__
LG GM07 6 #1# 0.08 #2# 0.04 #3# 0.12 #S# 0.12 #D# 0.0 GOOD __0__
LG GM08 7 #1# 0.04 #2# 0.08 #3# 0.12 #S# 0.12 #D# 0.0 GOOD __0__
LG GM09 8 #1# 0.08 #2# 0.12 #3# 0.2 #S# 0.2 #D# 0.0 GOOD __0__
LG GM10 9 #1# 0.12 #2# 0.08 #3# 0.2 #S# 0.2 #D# 0.0 GOOD __0__
LG GM11 10 #1# 0.08 #2# 0.08 #3# 0.16 #S# 0.16 #D# 0.0 GOOD __0__
LG GM12 11 #1# 0.08 #2# NNNNNN #3# NNNNNN #S# NNNNNN #D# NNNNNN NNNNNN NNNNN
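The column headers of this file are not explained on this page. Judging from the values, DST1 and DST2 appear to be the distances from a marker to its previous and next neighbours, DST3 the distance between those two neighbours, SUMM = DST1 + DST2 and DIFF = SUMM - DST3; this reading is our assumption, not documented behaviour. The sketch computes these quantities for interior markers only (the file fills the missing columns of the end markers with NNNNNN). In this noise-free example DIFF is 0.0 for every interior marker, as in the table.

```python
"""Sketch: neighbour-distance columns as we read them in mad_map_final.

Assumed meaning: DST1 = d(prev, cur), DST2 = d(cur, next),
DST3 = d(prev, next), SUMM = DST1 + DST2, DIFF = SUMM - DST3.
"""


def neighbour_columns(order, dist):
    """Rows of (marker, DST1, DST2, DST3, SUMM, DIFF) for interior markers."""
    rows = []
    for k in range(1, len(order) - 1):
        prev, cur, nxt = order[k - 1], order[k], order[k + 1]
        d1 = dist[frozenset((prev, cur))]
        d2 = dist[frozenset((cur, nxt))]
        d3 = dist[frozenset((prev, nxt))]
        summ = round(d1 + d2, 4)
        rows.append((cur, d1, d2, d3, summ, round(summ - d3, 4)))
    return rows


if __name__ == "__main__":
    # Distances for the first four test markers, from the Part 1 table.
    dist = {
        frozenset(("GM01", "GM02")): 0.04,
        frozenset(("GM02", "GM03")): 0.08,
        frozenset(("GM01", "GM03")): 0.12,
        frozenset(("GM03", "GM04")): 0.08,
        frozenset(("GM02", "GM04")): 0.16,
    }
    for row in neighbour_columns(["GM01", "GM02", "GM03", "GM04"], dist):
        print(*row)
```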
Now it is time for the real data set:
Input files [ 'good' markers for Arabidopsis linkage group 4 ]:
1. DL_RIL_LG4A.xout248.pairs_all - pairwise distances
2. DL_RIL_LG4A.list.good - list of markers to map
3. DL_RIL_LG4A.frame - list of frame work markers
$python Python_MadMapper_V248_XDELTA_016.py DL_RIL_LG4A.xout248.pairs_all DL_RIL_LG4A.list.good DL_RIL_LG4A.frame DL_RIL_LG4A.list.xdelta 4 FLEX
Output files:
1. DL_RIL_LG4A.list.xdelta.mad_map_final - final best map
2. DL_RIL_LG4A.list.xdelta.mad_map_log - log file
3. DL_RIL_LG4A.list.xdelta.mad_map_temp -
file with all best maps after each set of iterations
4. DL_RIL_LG4A.list.xdelta.mad_map_xjump - 'deltas' for each set of iterations
CheckMatrix visualization of the constructed map [ 171 'good' markers ]
(inferred linear order of markers):
Now, using the 'good' map with 171 markers as a frame work map with a fixed order, we will try to add (insert) the remaining markers
[ those markers that were not labeled "GOOD__MARKER" in Part 2 ]:
$python Python_MadMapper_V248_XDELTA_016.py DL_RIL_LG4A.xout248.pairs_all DL_RIL_LG4A.list DL_RIL_LG4A.list.good.map.order DL_RIL_LG4A.final.xdelta 4 FIXED
Output files:
1. DL_RIL_LG4A.final.xdelta.mad_map_final - final best map
2. DL_RIL_LG4A.final.xdelta.mad_map_log - log file
3. DL_RIL_LG4A.final.xdelta.mad_map_temp -
file with all best maps after each set of iterations
4. DL_RIL_LG4A.final.xdelta.mad_map_xjump - 'deltas' for each set of iterations
CheckMatrix visualization of the constructed map [ all 247 markers ]
(inferred linear order of markers):
We can generate 2D dot plots to compare the linear order of markers inferred by MadMapper
with the order of markers on the 'classical' genetic map constructed by Lister and Dean using MapMaker:
Dot plot of the 'classical' map (X axis) versus the 'good' map [ 171 markers ] generated by MadMapper (Y axis)
Dot plot of the 'classical' map (X axis) versus the 'all' map [ 247 markers ] generated by MadMapper (Y axis)
Images were generated using GenomePixelizer 2D Plotter and the TwoMaps2GenoPix_002.py script.
Note that the 'classical' map displays marker positions according to their map distances;
the 'madmap' displays just the relative order of markers [ without real map distances ].
The diagonal pattern of the 2D dot plots indicates that the linear order of markers derived by MadMapper
correlates well with the order of markers derived by MapMaker.
The high-quality markers [ 'good' 171 set ] produce a better map than the 'all' [ 247 ] marker set.
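A dot plot can be complemented by a single number. The sketch below is not part of the original analysis: it builds the dot-plot coordinates (classical map position against MadMapper rank) for markers present on both maps and computes Spearman's rank correlation between the two marker orders. Ranks are taken from each map's order, so there are no ties; markers found on only one map are ignored. A value near 1 (or near -1 if one map is oriented the other way) indicates that the two orders agree.

```python
"""Sketch: dot-plot coordinates and Spearman's rank correlation for two maps.

Not part of MadMapper; markers present on only one map are ignored.
"""


def dot_plot_points(classical, madmap_order):
    """(classical position, MadMapper rank) for markers present on both maps.

    classical: list of (marker_id, map_position) pairs;
    madmap_order: list of marker IDs in MadMapper order.
    """
    rank = {m: i for i, m in enumerate(madmap_order)}
    return [(pos, rank[m]) for m, pos in classical if m in rank]


def spearman_rho(order_a, order_b):
    """Spearman's rank correlation between two marker orders (no ties)."""
    in_b = set(order_b)
    shared_a = [m for m in order_a if m in in_b]
    in_a = set(shared_a)
    shared_b = [m for m in order_b if m in in_a]
    n = len(shared_a)
    if n < 2:
        raise ValueError("need at least two shared markers")
    rank_b = {m: i for i, m in enumerate(shared_b)}
    d_squared = sum((i - rank_b[m]) ** 2 for i, m in enumerate(shared_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```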
Optimizing the MadMapper approach and improving the algorithms behind the program are
the next steps of this work.
NEW! (January 14 2006)
In the latest version, MadMapper_V248_XDELTA_024 (#24), a shuffle [ or ripple ]
procedure was implemented that allows re-arrangement of markers within a sliding window:
Python_MadMapper_V248_XDELTA_024.py
The Genetic_Map_MadMapper_Arabidopsis.html
web page describes the usage of
Python_MadMapper_V248_XDELTA_024.py
with the shuffle option, using the Arabidopsis genetic map as an example.
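The idea of a sliding-window shuffle can be illustrated with a small sketch. This is one possible reading, not the algorithm of Python_MadMapper_V248_XDELTA_024.py: every permutation of each window of consecutive markers is scored by total delta (horizontal-neighbour reading), and it replaces the current window only if it strictly lowers the total; passes repeat until nothing changes. The demo uses hypothetical markers M0 to M4 placed on a line, with distance equal to the difference in position.

```python
"""Sketch of a sliding-window 'ripple' that minimizes total matrix delta.

Assumptions: delta = sum of |difference| between horizontally adjacent
cells; a window permutation is accepted only if it strictly lowers delta.
"""
from itertools import permutations


def total_delta(order, dist):
    """dist is a callable dist(a, b) returning a pairwise distance."""
    total = 0
    for r in order:
        row = [dist(r, c) for c in order]
        total += sum(abs(b - a) for a, b in zip(row, row[1:]))
    return round(total, 10)  # suppress floating-point noise in comparisons


def ripple(order, dist, window=3, max_passes=10):
    order = list(order)
    for _ in range(max_passes):
        changed = False
        for start in range(len(order) - window + 1):
            best, best_order = total_delta(order, dist), order
            for perm in permutations(order[start:start + window]):
                cand = order[:start] + list(perm) + order[start + window:]
                score = total_delta(cand, dist)
                if score < best:
                    best, best_order = score, cand
            if best_order is not order:
                order, changed = best_order, True
        if not changed:
            break
    return order


if __name__ == "__main__":
    # Hypothetical markers on a line: distance = |position difference|.
    position = {f"M{i}": i for i in range(5)}

    def line_dist(a, b):
        return abs(position[a] - position[b])

    print(ripple(["M0", "M2", "M1", "M3", "M4"], line_dist))
```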
CURRENT WORKING VERSIONS OF MadMapper_RECBIT and MadMapper_XDELTA:
Python_MadMapper_V248_RECBIT_012NR.py
Python_MadMapper_V248_RECBIT_016NR.py -
(February 28 2008 update:
generation of *.all_pairs file is optional and FALSE by default;
reduced memory usage; this version may work with up to ~10,000 markers on a machine with 16 GB of RAM)
Python_MadMapper_V248_XDELTA_115.py
Python_MadMapper_V248_XDELTA_116.py - minor bug fixes
Python_MadMapper_V248_XDELTA_117.py - 'check_map' new function
Python_MadMapper_V248_XDELTA_119.py - the column with pairwise data in the matrix file can be chosen; minor bug fixes
Example of using the MadMapper/CheckMatrix suite to construct a
high-density genetic map of Arabidopsis thaliana
using Affymetrix microarray SFP genotyping data
POSTERS AND PRESENTATIONS:
AKozik_SanDiego_PAG_14_Poster.ppt -
download the poster with the MadMapper presentation at the PAG-14 meeting
AKozik_SanDiego_PAG_14_Presentation.ppt -
download the MadMapper presentation given at the PAG-14 meeting
AKozik_Poster_MadMapper_023_L.ppt - Details about MadMapper Usage
email to: akozik@atgc.org Alexander Kozik
last modified July 04 2006