This web page describes MS Excel table formats to submit marker and
genotyping data to CGPDB curator. There are four different type of tables:
LGM1_Passport_Table_V00X_Template.xls -
Marker Passport Table
LGM2_Primer_Table_V00X_Template.xls -
Primer (oligo) Table
LGM3_Marker_Info_V00X_Template.xls -
Marker Info Table
LGM4_Marker_Scores_V00X_Template.xls -
Marker Scores Table
TEMPLATE VERSION:
Table field names and structure unfortunately are not static. They can be changed
because of possible changes in database setup. Every time you want to submit new data
check the version of template you are going to use. Version is indicated as a suffix
of corresponding file name, like: _V00X_Template.
Use the latest version for data submission.
HOW TO NAME FILES YOU WANT TO SUBMIT:
File name should reflect the name of submitter as well as the date of file modification/version.
Use your personal three letter code. If your three letter code is SSM
then replace Template on the end of file name with your three letter code and
add the date of file modification/version [like: SSM_2004_07_24 Year_Month_Date].
For example:
LGM4_Marker_Scores_V001_Template.xls rename to
LGM4_Marker_Scores_V001_SSM_2004_07_24.xls
In this case curator has no problem to recognize the type of submitted data, who is submitter
and version of file.
DIFFERENCE BETWEEN NEW DATA SUBMISSION AND UPDATE:
Suggested update procedure:
each table (with exception of oligo table) has a field "status". This field may have
two values new or update. If row has status update and some fields in
this row are filled with X then values in database for
entries with X-s will be intact (not overwritten). Check
example for Table 4.
!!! NO BLANK ENTRIES !!!:
Excel spreadsheet may not contain blank entries. There is an assumption that all
fields should be filled. If no data for particular cell just fill it with dash "-"
___________________________________________________________________
Template 1 Marker Passport Table
Data from this table will go into corresponding mySQL tabel which contains information about
all genetic markers involved into mapping at CGPDB. Table is "cross" independent,
e.g. marker descriptions are valid for any population
(Salinas x Serriola, Calmar x Kordaat, etc.)
Download: LGM1_Passport_Table_V003_Template.xls
Marker Passport Table fields description:
marker_id - required
- marker ID (unique ID in CGPDP)
est_id - required
if it was designed based on EST from CGPDB;
est_id - optional
if it was derived from other source
marker_type - required
- vocabulary: [ EST, SSR, AFLP, genomic, phenotype ]
(Note: if SSR within EST sequence it is suggested to use EST type)
marker_origin - required
- vocabulary: [ CGPDB, keygene, seminis ]
(This field referred to place/organization where marker was developed; in
most cases it is CGPDB.
To distinguish different labs within CGPDB we are using "sender" field)
cos - optional
if marker is COS (conserved ortholog to Arabidopsis)
then enter "COS", otherwise keep "-" (dash)
class - optional (but highly desirable) -
if marker was derived from EST sequence this field should contain short characteristic
(three words max) of that EST.
vocabulary: [ "Pathogen Recognition", "Resistance Pathway", "Defense Response" ]
(Note: feel free to discuss, suggest new type of entries and
modify existing in this field. We need to set up (define) vocabulary for this field as soon as possible)
annotation - optional
- if marker was derived from EST sequence this field
may contain short description of that EST (putative function, for example).
(free style, no more than 12 words, be polite, we do not have a
filter for rudeness yet)
status - required -
if you believe that this is new submission then
enter "new"; if you like to update information then enter "update".
(Note: regardless of your belief (new or update) database will be checked
for this subject. This field simplifies troubleshooting of some cases dramatically)
sender - required -
three letter code of sender (submitter)
vocabulary: [ LKM, SSM, DLM, AKM ] (do you recognize you?) Third letter should reflect
different labs within CGPDB group [ M stands for RWM Lab ]
(Tips: before choosing your personal three letter code do
Google text and image search
to avoid any weird associations)
date_t1 - required -
date of submission (Month/Day/Year)
(Why date_t1 instead of date? Because date is
a reserved word in mySQL, we can not use it as a field name)
public_private - optional -
public or private (this field designed mostly for keygene and seminis markers; if you leave it
unfilled "-" then it will be automatically public)
___________________________________________________________________
Template 2 Primer (oligo) Table
This table contains DNA sequences of primers (oligos) designed for mapping.
Table does not contain information about primers performance (does it work
or does not). Table is "cross" independent. Information about how primers work
for particular cross can be found on Table 3 (see below).
Download: LGM2_Primer_Table_V001_Template.xls
Primer (oligo) Table fields description:
marker_id - required
- marker ID (unique ID in CGPDP)
primer_forward - required
- DNA sequence of forward primer (5'-end)
primer_reverse - required
- DNA sequence of reverse primer (3'-end)
annealing_t - required
- annealing temperature (Celsius)
designer - required
- three letter code of the person who designed this set of primers
date_t2 - required
- date of design (or ordering?)
(DNA sequences for many sets of primers are temporary unavailable.
It is not a problem if primers were ordered by Dean Lavelle.
In this case information about primer sequences can be recovered from ordering tables
[companies provide us with excel spreadsheets for primers we ordered].
Primer (oligo) Table must be submitted if you are using primers which
were designed outside of CGPDB group)
___________________________________________________________________
Template 3 Marker Info Table
This table is specific for particular cross (Salinas x Serriola in this example).
Table contains comprehensive information about method(s) used for scoring/mapping and
names of image files.
Download: LGM3_Marker_Info_V002_Template.xls
Marker Info Table fields description:
marker_id - required
- marker ID (unique ID in CGPDP)
cross - required
- plants (lines, genotypes) which were
used to find polymorphism.
vocabulary: [ "Sal_Ser" - Salinas x Serriola; "Cal_Kor" - Calmar x Kordaat ]
amplification - required -
"yes" if PCR was successful,
"none" if PCR was done but you did not get amplified fragment for this combination of primers
band_n -
optional (but highly desirable) -
number of bands visible on an agarose gel
vocabulary: [ "1" - single band; "2" - two bands; "M" - multiple bands ]
size_b1 -
size of single band
size_b2 -
size of second band
band_range -
band range in the case of multiple bands
next five fields describe the method which was used to score particular marker:
agarose -
marker was scored on a regular agarose gel
sscp -
SSCP: Single Stranded Conformation Polymorphism
tgce -
TGCE: Temperature Gradient Capillary Electrophoresis
dcap -
dCAP: derived Cleaved Amplified Product
other -
if method is other than four above
if amplification is "none" you should leave all five entries above with dashes "-"
if amplification is "yes" (successful) at least one of five method entries should be
designated as good, fair, poor or none
vocabulary: [ "none" - polymorphism was not detected; "good" "fair" "poor"
- different grades how easily polymorphism is scored ]
if entry has dash "-" it means that method was not used (checked)
comments - optional -
meaningful description what do you think about this marker/method
image_file -
optional (but highly desirable) -
name of image file (gel, graph, etc.) for method which was used for actual scoring
(Note: image name should reflect particular cross; it should contain suffix just
before *.jpg / *.tiff file extension which is identical to "cross"
type [ Sal_Ser in this example ] )
status - required
sender - required
date_t3 - required
Last three fields are the same as for Table 1
___________________________________________________________________
Template 4 Marker Scores Table
Probably, most important table. It contains actual marker scores data for particular cross.
Download: LGM4_Marker_Scores_V001_Template.xls
Marker Scores Table fields description:
marker_id - required
status - required
sender - required
date_t4 - required
First four fields are the same as for Table 1 and 3
cross - required
- cross which was used for mapping
vocabulary: [ same as for Table 3 ]
population - required
- population which was used for mapping
vocabulary: [ "F2", "RIL_7" (seventh generation), etc. for RILs ]
method - required
- method which was used for mapping
vocabulary: [ agarose, SSCP, TGCE, dCAP or other ]
other_method - required
if method field above is other; otherwise leave dash "-"
A - required -
parent which was scored as "A"
B - required -
parent which was scored as "B"
P - optional -
introduced because of AFLP markers
vocabulary: [ "P1" or "P2" ]
RIL1 ... RIL130 - required -
marker scores
values: [ A, B, C, D, H, -
or X (if you place X then
values for those individuals (RILs) will not be overwritten in database; this is suggested procedure
if user wants to update data for particular set of individuals only) ]
Python scripts to process templates:
lgm1_verification_script_005.py
lgm2_verification_script_005.py
lgm3_verification_script_005.py
lgm3_verification_script_005_imgx.py
lgm4_verification_script_005.py
lgm4_verification_script_005_non_compl_set.py
lgm_dictionary_005.py
mysql_table_updater_005.py
External links:
European Spreadsheet Risks Interest Group
Excel ate my DNA
Feedback:
email: akozik@atgc.org Alexander Kozik
email: michelmore@vegmail.ucdavis.edu Richard Michelmore
last modified April 03 2005