Construction of high-density genetic map of Arabidopsis thaliana
using Affymetrix microarray SFP genotyping data:
Marker grouping and inference of linear order using MadMapper
by Alexander Kozik, UC Davis Genome Center, R.Michelmore group


In the paper:
High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis.
West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA, Michelmore RW.
Genome Res. 2006 Jun;16(6):787-95. Epub 2006 May 15.
PubMed:16702412 | Full text article at Genome Research
a high-density SFP genetic map of the Arabidopsis cross Bay-0 x Sha, containing 637 markers, was constructed using the JoinMap program.

On this web page we demonstrate an alternative approach: constructing the genetic map using the MadMapper suite only.

Scripts:
Python_MadMapper_V248_RECBIT_012NR.py
Python_MadMapper_V248_XDELTA_115.py
py_matrix_2D_V248_RECBIT.py - CheckMatrix

Original data source (genotyping data): elp.ucdavis.edu
The file with raw marker scores, SFPmap_1258_markers.txt, was slightly modified by adding random numbers in front of the marker IDs to eliminate possible bias due to sorting by the mapping program.

The original file can be accessed locally here: Ath_SFP_Scores_1258.loc
and the modified version with random prefixes here: Ath_SFP_Scores_1258_Rand.loc
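
The random-prefix step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to prepare Ath_SFP_Scores_1258_Rand.loc: it assumes a tab-delimited file with the marker ID in the first column, and the function name, the prefix width and the underscore separator are all hypothetical.

```python
import random

def add_random_prefixes(lines, seed=None, width=4):
    """Prefix the marker ID (first tab-separated field) with a unique random number.

    Assumption: each non-empty line is 'marker_id<TAB>score<TAB>score...'.
    """
    rng = random.Random(seed)
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # Draw the numbers without replacement so that no two markers share a prefix.
    prefixes = rng.sample(range(10 ** width), len(rows))
    out = []
    for prefix, row in zip(prefixes, rows):
        marker_id, sep, scores = row.partition("\t")
        out.append(f"{prefix:0{width}d}_{marker_id}{sep}{scores}")
    return out

# Made-up marker IDs and scores, for illustration only.
example = ["markerA\tA\tB\tA", "markerB\tA\tA\tB"]
prefixed = add_random_prefixes(example, seed=1)
for line in prefixed:
    print(line)
```

Sorting the prefixed lines by ID then gives an order that is unrelated to the original marker names.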


STEP 1: GENERATION OF NON-REDUNDANT SET OF SCORES

python Python_MadMapper_V248_RECBIT_012NR.py Ath_SFP_Scores_1258_Rand.loc Ath_SFP_Scores_1258_Rand.recbit01 0.2  100  25  X  0.33  50  NR_SET  3

A file with non-redundant marker scores is generated upon script execution:
Ath_SFP_Scores_1258_Rand.recbit01.z_nr_scores.loc
Ath_SFP_Scores_1258_Rand.recbit01.z_nr_scores.loc was renamed to Ath_SFP_Scores_0846_Rand.loc since it contains 846 markers with non-redundant scores.
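
To illustrate what a non-redundant set means (1258 markers reduced to 846 here), the sketch below keeps one representative marker per distinct score pattern. It is a simplified assumption, not the NR_SET logic of the MadMapper script, which may treat missing data or near-identical markers differently; the function name and the example data are hypothetical.

```python
def non_redundant(markers):
    """Collapse markers with exactly identical scores.

    markers: list of (marker_id, scores) pairs, where scores is a sequence of calls.
    Returns (representatives, groups): the first marker ID seen for each distinct
    score pattern, and a dict mapping each pattern to all marker IDs sharing it.
    """
    groups = {}
    for marker_id, scores in markers:
        groups.setdefault(tuple(scores), []).append(marker_id)
    representatives = [ids[0] for ids in groups.values()]
    return representatives, groups

# Made-up scores: m1 and m3 are identical, so only m1 is kept for that pattern.
data = [("m1", "AABB"), ("m2", "ABBB"), ("m3", "AABB")]
reps, groups = non_redundant(data)
print(reps)
```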


STEP 2: MARKER CLUSTERING/GROUPING AND FINDING OF DISTINCT LINKAGE GROUPS

python Python_MadMapper_V248_RECBIT_012NR.py Ath_SFP_Scores_0846_Rand.loc Ath_SFP_Scores_0846_Rand.recbit02 0.2  100  120  X  0.33  14  TRIO  3

Eighty-two (82) files are generated; we are interested in two of them:
Ath_SFP_Scores_0846_Rand.recbit02.pairs_all - pairwise distances for all pairs of markers
Ath_SFP_Scores_0846_Rand.recbit02.x_tree_clust is renamed to Ath_SFP_Scores_0846_Rand.matrix because it will be used by the CheckMatrix and MadMapper_XDELTA programs as the matrix file

Ath_SFP_Scores_0846_Rand.recbit02.x_tree_clust - clustering/grouping of markers into distinct linkage groups
Ath_SFP_Scores_0846_Rand.recbit02.x_tree_clust.xls - MS Excel version of the file above
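
As a rough illustration of a pairwise distance between two markers, the sketch below computes the fraction of RILs whose calls differ, skipping lines with missing data. This is an assumption made for illustration: the exact values written to the *.pairs_all file by the MadMapper script are not reproduced here, and the missing-data symbol '-' is hypothetical.

```python
def recombination_fraction(scores_a, scores_b, missing="-"):
    """Fraction of informative lines whose calls differ between two markers.

    Lines where either marker has a missing call are ignored.
    Returns None when no line is informative.
    """
    informative = 0
    recombinant = 0
    for a, b in zip(scores_a, scores_b):
        if a == missing or b == missing:
            continue
        informative += 1
        if a != b:
            recombinant += 1
    if informative == 0:
        return None
    return recombinant / informative

# Made-up scores for two markers across four RILs.
print(recombination_fraction("AABB", "AABA"))
```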


STEP 3: ANALYSIS OF GROUP INFO FILE, SELECTION OF DISTINCT LINKAGE GROUPS AND FRAMEWORK MARKERS

Ath_SFP_Scores_0846_Rand.recbit02.x_tree_clust.xls was analyzed and five group files were generated for visualization with CheckMatrix:

Linkage Group I: Ath_SFP_Scores_0846_Rand.lg1.clust
Linkage Group II: Ath_SFP_Scores_0846_Rand.lg2.clust
Linkage Group III: Ath_SFP_Scores_0846_Rand.lg3.clust
Linkage Group IV: Ath_SFP_Scores_0846_Rand.lg4.clust
Linkage Group V: Ath_SFP_Scores_0846_Rand.lg5.clust
(note that markers are ordered in the same order as in the clustering/grouping info file)
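
The idea of splitting markers into distinct linkage groups from pairwise distances can be sketched as single-linkage grouping: two markers end up in the same group if a chain of pairs, each below a distance cutoff, connects them. This is a generic technique shown under stated assumptions, not the clustering algorithm of the MadMapper script; the cutoff and the data below are hypothetical.

```python
def linkage_groups(markers, distances, cutoff):
    """Single-linkage grouping with a union-find structure.

    markers:   list of marker IDs
    distances: dict mapping (marker1, marker2) tuples to a pairwise distance
    cutoff:    pairs with distance <= cutoff are joined into one group
    """
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), dist in distances.items():
        if dist <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values(), key=lambda g: g[0])

# Made-up distances: m1, m2, m3 are chained together, m4 stays alone.
ids = ["m1", "m2", "m3", "m4"]
dist = {("m1", "m2"): 0.05, ("m2", "m3"): 0.10, ("m3", "m4"): 0.45, ("m1", "m4"): 0.50}
groups_found = linkage_groups(ids, dist, cutoff=0.2)
print(groups_found)
```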

After visualization of the grouping with CheckMatrix, ten framework markers per linkage group were selected
(framework marker IDs are highlighted in red on the right side of the two-dimensional heat plots):
Linkage Group I
Linkage Group II
Linkage Group III
Linkage Group IV
Linkage Group V


STEP 4: MAP CONSTRUCTION

for each list of markers to map:
Linkage Group I: Ath_SFP_Scores_0846_Rand.lg1.list.sorted
Linkage Group II: Ath_SFP_Scores_0846_Rand.lg2.list.sorted
Linkage Group III: Ath_SFP_Scores_0846_Rand.lg3.list.sorted
Linkage Group IV: Ath_SFP_Scores_0846_Rand.lg4.list.sorted
Linkage Group V: Ath_SFP_Scores_0846_Rand.lg5.list.sorted
(note that markers are randomly ordered according to their prefixes)

and list of framework markers:
Linkage Group I: Ath_SFP_Scores_0846_Rand.lg1.frame
Linkage Group II: Ath_SFP_Scores_0846_Rand.lg2.frame
Linkage Group III: Ath_SFP_Scores_0846_Rand.lg3.frame
Linkage Group IV: Ath_SFP_Scores_0846_Rand.lg4.frame
Linkage Group V: Ath_SFP_Scores_0846_Rand.lg5.frame

run Python_MadMapper_V248_XDELTA_115.py with the following options:

python Python_MadMapper_V248_XDELTA_115.py Ath_SFP_Scores_0846_Rand.recbit02.pairs_all Ath_SFP_Scores_0846_Rand.lg1.list.sorted Ath_SFP_Scores_0846_Rand.lg1.frame Ath_SFP_Scores_0846_Rand.lg1.xdeltaV115_6_3_MS.out 1 FLEX SHUFFLE 6 3 
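
Only the command for linkage group I is shown above. Assuming the other four linkage groups use the same options and differ only in the 'lgN' part of the file names (which is consistent with the output file names listed on this page), the five command lines can be generated as follows. The helper name is hypothetical, and the commands are only printed, not executed.

```python
def xdelta_command(lg,
                   script="Python_MadMapper_V248_XDELTA_115.py",
                   base="Ath_SFP_Scores_0846_Rand"):
    """Build the XDELTA command line for linkage group number lg (1..5)."""
    return [
        "python", script,
        f"{base}.recbit02.pairs_all",            # pairwise data from STEP 2
        f"{base}.lg{lg}.list.sorted",            # list of markers to map
        f"{base}.lg{lg}.frame",                  # framework markers
        f"{base}.lg{lg}.xdeltaV115_6_3_MS.out",  # output prefix
        "1", "FLEX", "SHUFFLE", "6", "3",        # options copied from the command above
    ]

commands = [xdelta_command(lg) for lg in range(1, 6)]
for cmd in commands:
    print(" ".join(cmd))
    # To actually run a command: import subprocess; subprocess.run(cmd, check=True)
```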

five output files (maps) are listed here:
Linkage Group I: Ath_SFP_Scores_0846_Rand.lg1.xdeltaV115_6_3_MS.out.mad_map_final
Linkage Group II: Ath_SFP_Scores_0846_Rand.lg2.xdeltaV115_6_3_MS.out.mad_map_final
Linkage Group III: Ath_SFP_Scores_0846_Rand.lg3.xdeltaV115_6_3_MS.out.mad_map_final
Linkage Group IV: Ath_SFP_Scores_0846_Rand.lg4.xdeltaV115_6_3_MS.out.mad_map_final
Linkage Group V: Ath_SFP_Scores_0846_Rand.lg5.xdeltaV115_6_3_MS.out.mad_map_final

all temporary maps were recorded into '*.mad_map_temp' files:
Linkage Group I: Ath_SFP_Scores_0846_Rand.lg1.xdeltaV115_6_3_MS.out.mad_map_temp
Linkage Group II: Ath_SFP_Scores_0846_Rand.lg2.xdeltaV115_6_3_MS.out.mad_map_temp
Linkage Group III: Ath_SFP_Scores_0846_Rand.lg3.xdeltaV115_6_3_MS.out.mad_map_temp
Linkage Group IV: Ath_SFP_Scores_0846_Rand.lg4.xdeltaV115_6_3_MS.out.mad_map_temp
Linkage Group V: Ath_SFP_Scores_0846_Rand.lg5.xdeltaV115_6_3_MS.out.mad_map_temp


Visualization of constructed genetic maps using CheckMatrix:
(note: remove the first header line from any *.mad_map_final file before using it with CheckMatrix)

Linkage Group I
Linkage Group II
Linkage Group III
Linkage Group IV
Linkage Group V
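
Removing the first header line from a *.mad_map_final file, as required for CheckMatrix, can be done in any text editor; a minimal Python sketch is given below. The function name, the output file name and the demo content are hypothetical, and the demo works on a throwaway temporary file rather than on real map files.

```python
import os
import tempfile

def strip_header(src_path, dst_path):
    """Copy src_path to dst_path without its first line; return the number of lines kept."""
    with open(src_path) as src:
        lines = src.readlines()
    with open(dst_path, "w") as dst:
        dst.writelines(lines[1:])
    return len(lines[1:])

# Demo with made-up content in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "example.mad_map_final")
    dst_file = os.path.join(tmp, "example.mad_map_final.noheader")
    with open(src_file, "w") as fh:
        fh.write("header line\nmarker_1\nmarker_2\n")
    kept = strip_header(src_file, dst_file)
    with open(dst_file) as fh:
        result = fh.read()
print(kept)
```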

Comparison with physical coordinates of genes in the Columbia genotype
Ath Chromosome I
Ath Chromosome II
Ath Chromosome III
Ath Chromosome IV
Ath Chromosome V
diagonal dot plots were generated using GenoPix_2D_Plotter



Comparison with other methods/software to construct genetic map (example for linkage group 1):
visualization of JoinMap 3.0 output
JoinMap3 log file
JoinMap3 Map file

visualization of JoinMap 2.0 output
JoinMap2 log file
JoinMap2 Map file

visualization of RECORD output
RECORD log file
RECORD Map file




RIL CLUSTERING (example for linkage group I):

The transposed locus file Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.tab was derived from the Ath_SFP_Scores_0846_Rand.lg1.map.loc locus file for linkage group I by transposition ('rotation') of the data in MS Excel, which converts all columns into rows and all rows into columns.
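
The same transposition can be done without MS Excel. The sketch below assumes a rectangular tab-delimited table (every row has the same number of fields, including a header row with RIL names); that layout and the function names are assumptions, not a description of the actual locus file.

```python
def transpose_table(rows):
    """Turn rows into columns and columns into rows; all rows must have equal length."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same number of columns")
    return [list(col) for col in zip(*rows)]

def transpose_tab_text(text):
    """Transpose tab-delimited text and return it as tab-delimited text."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    return "\n".join("\t".join(r) for r in transpose_table(rows))

# Made-up table: two markers scored in two RILs.
table = "ID\tRIL1\tRIL2\nm1\tA\tB\nm2\tB\tB"
transposed = transpose_tab_text(table)
print(transposed)
```

Applying the function twice returns the original table, which corresponds to transposing the RIL-sorted locus file back at the end of this section.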

RILs were clustered using Python_MadMapper_V248_RECBIT_012NR.py script:
Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.out020.x_tree_clust
Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.out020.pairs_all
Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.out020.x_log_file

Results of RIL clustering/grouping were visualized using CheckMatrix:



Ten 'frame' RILs were selected: Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.xclust.frame and a list of RILs 'to map' was compiled: Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.xclust.list.sorted

Then RILs were 'mapped' using Python_MadMapper_V248_XDELTA_115.py script:

python Python_MadMapper_V248_XDELTA_115.py Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.out020.pairs_all Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.xclust.list.sorted Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.xclust.frame Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.xclust.xdelta_S 1 FLEX SHUFFLE 6 3 &

Ath_SFP_Scores_0846_Rand.lg1.loc.transposed.xclust.xdelta_S.mad_map_final was generated, renamed to Ath_SFP_Scores_0846_RIL.lg1.map, and visualized with CheckMatrix:



A new locus file was compiled in which RILs are sorted according to their similarity (order derived by Python_MadMapper_V248_XDELTA_115.py): Ath_SFP_Scores_0846_RIL.lg1.map.loc

Then Ath_SFP_Scores_0846_RIL.lg1.map.loc was transposed back into the Ath_SFP_Scores_0846_RIL.lg1.map.transposed.loc file and visualized with CheckMatrix:





GLOBAL MAP:


GLOBAL_MAP_1

GLOBAL_MAP_2

GLOBAL_MAP_3

The Minimum Entropy and Best-Fit Extension approach makes it possible to infer the linear order of markers in a selected linkage group without using linkage data from within that group. Linkage data between markers of different linkage groups are sufficient to find an approximate order of markers. For example, it is possible to find the order of markers of Arabidopsis linkage group 3 based only on their interactions with the other four linkage groups. See the illustration of this approach above (GLOBAL_MAP_1, GLOBAL_MAP_2 and GLOBAL_MAP_3).

All pairwise data for markers within linkage group 3 were removed from the pairwise matrix file; only their values against markers of the other linkage groups were left. This so-called 'minus three' matrix is visualized in figure GLOBAL_MAP_2, where framework markers with fixed order are highlighted in red. Sixteen markers from linkage group 3 were kept to serve as initial positions for placing the remaining 140 markers. MadMapper_XDELTA then found approximate positions for these 140 markers based only on their interactions with the other linkage groups. The resulting map is displayed in figure GLOBAL_MAP_3, where the markers of linkage group 3 that were mapped based on their relationships with markers on linkage groups 1, 2, 4 and 5 are shown in black.

python Python_MadMapper_V248_XDELTA_115.py Ath_MadMap_Shuffle115_AllLG_M3.matrix Ath_MadMap_Shuffle115_AllLG_M3.list Ath_MadMap_Shuffle115_AllLG_M3.frame Ath_MadMap_Shuffle115_AllLG_M3.out 0 FIXED NOSHUFFLE 6 3 

Input files:
Ath_MadMap_Shuffle115_AllLG_M3.matrix - matrix file with all pairwise data within linkage group 3 removed (see visualization of this matrix on GLOBAL_MAP_2 figure)
Ath_MadMap_Shuffle115_AllLG_M3.list - list of markers to map
Ath_MadMap_Shuffle115_AllLG_M3.frame - framework map with fixed order

Output files:
Ath_MadMap_Shuffle115_AllLG_M3.out.mad_map_final - map output file
Ath_MadMap_Shuffle115_AllLG_M3.out.mad_map_log - log file with run parameters
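
The construction of the 'minus three' matrix described above can be sketched as a simple filter: every pairwise record in which both markers belong to linkage group 3 is dropped, and all other records are kept. The sketch assumes that each line of the pairwise file starts with two tab-separated marker IDs; this layout, the function name and the example IDs are assumptions for illustration only.

```python
def remove_within_group_pairs(pair_lines, group_markers):
    """Drop pairwise records where BOTH markers belong to the given group.

    pair_lines:    lines of the form 'marker1<TAB>marker2<TAB>values...'
    group_markers: IDs of the markers in the group whose internal data are removed
    """
    group = set(group_markers)
    kept = []
    for line in pair_lines:
        row = line.rstrip("\n")
        fields = row.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] in group and fields[1] in group:
            continue  # within-group pair: removed
        kept.append(row)
    return kept

# Made-up records: the first pair lies entirely within the group and is removed.
pairs = ["lg3_a\tlg3_b\t0.05", "lg3_a\tlg1_x\t0.48", "lg1_x\tlg3_b\t0.51"]
minus_three = remove_within_group_pairs(pairs, ["lg3_a", "lg3_b"])
print(minus_three)
```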



email to: akozik@atgc.org Alexander Kozik
last modified July 04 2006