CGPDB Excel Data Submission Templates

This web page describes MS Excel table formats to submit marker and genotyping data to CGPDB curator. There are four different type of tables:

LGM1_Passport_Table_V00X_Template.xls - Marker Passport Table

LGM2_Primer_Table_V00X_Template.xls - Primer (oligo) Table

LGM3_Marker_Info_V00X_Template.xls - Marker Info Table

LGM4_Marker_Scores_V00X_Template.xls - Marker Scores Table


TEMPLATE VERSION:
Table field names and structure unfortunately are not static. They can be changed because of possible changes in database setup. Every time you want to submit new data check the version of template you are going to use. Version is indicated as a suffix of corresponding file name, like: _V00X_Template. Use the latest version for data submission.

HOW TO NAME FILES YOU WANT TO SUBMIT:
File name should reflect the name of submitter as well as the date of file modification/version. Use your personal three letter code. If your three letter code is SSM then replace Template on the end of file name with your three letter code and add the date of file modification/version [like: SSM_2004_07_24 Year_Month_Date]. For example:

LGM4_Marker_Scores_V001_Template.xls rename to LGM4_Marker_Scores_V001_SSM_2004_07_24.xls

In this case curator has no problem to recognize the type of submitted data, who is submitter and version of file.

DIFFERENCE BETWEEN NEW DATA SUBMISSION AND UPDATE:
Suggested update procedure: each table (with exception of oligo table) has a field "status". This field may have two values new or update. If row has status update and some fields in this row are filled with X then values in database for entries with X-s will be intact (not overwritten). Check example for Table 4.

!!! NO BLANK ENTRIES !!!:
Excel spreadsheet may not contain blank entries. There is an assumption that all fields should be filled. If no data for particular cell just fill it with dash "-"

___________________________________________________________________


Template 1 Marker Passport Table
Data from this table will go into corresponding mySQL tabel which contains information about all genetic markers involved into mapping at CGPDB. Table is "cross" independent, e.g. marker descriptions are valid for any population (Salinas x Serriola, Calmar x Kordaat, etc.)
Download: LGM1_Passport_Table_V003_Template.xls



Marker Passport Table fields description:

  marker_id - required - marker ID (unique ID in CGPDP)

  est_id - required if it was designed based on EST from CGPDB;
  est_id - optional if it was derived from other source

  marker_type - required - vocabulary: [ EST, SSR, AFLP, genomic, phenotype ]
(Note: if SSR within EST sequence it is suggested to use EST type)

  marker_origin - required - vocabulary: [ CGPDB, keygene, seminis ]
(This field referred to place/organization where marker was developed; in most cases it is CGPDB.
To distinguish different labs within CGPDB we are using "sender" field)


  cos - optional if marker is COS (conserved ortholog to Arabidopsis) then enter "COS", otherwise keep "-" (dash)

  class - optional (but highly desirable) - if marker was derived from EST sequence this field should contain short characteristic (three words max) of that EST.
vocabulary: [ "Pathogen Recognition", "Resistance Pathway", "Defense Response" ]
(Note: feel free to discuss, suggest new type of entries and modify existing in this field. We need to set up (define) vocabulary for this field as soon as possible)

  annotation - optional - if marker was derived from EST sequence this field may contain short description of that EST (putative function, for example).
(free style, no more than 12 words, be polite, we do not have a filter for rudeness yet)

  status - required - if you believe that this is new submission then enter "new"; if you like to update information then enter "update".
(Note: regardless of your belief (new or update) database will be checked for this subject. This field simplifies troubleshooting of some cases dramatically)

  sender - required - three letter code of sender (submitter)
vocabulary: [ LKM, SSM, DLM, AKM ] (do you recognize you?) Third letter should reflect different labs within CGPDB group [ M stands for RWM Lab ]
(Tips: before choosing your personal three letter code do Google text and image search to avoid any weird associations)

  date_t1 - required - date of submission (Month/Day/Year)
(Why date_t1 instead of date? Because date is a reserved word in mySQL, we can not use it as a field name)

  public_private - optional - public or private (this field designed mostly for keygene and seminis markers; if you leave it unfilled "-" then it will be automatically public)
___________________________________________________________________



Template 2 Primer (oligo) Table
This table contains DNA sequences of primers (oligos) designed for mapping. Table does not contain information about primers performance (does it work or does not). Table is "cross" independent. Information about how primers work for particular cross can be found on Table 3 (see below).
Download: LGM2_Primer_Table_V001_Template.xls



Primer (oligo) Table fields description:

  marker_id - required - marker ID (unique ID in CGPDP)

  primer_forward - required - DNA sequence of forward primer (5'-end)

  primer_reverse - required - DNA sequence of reverse primer (3'-end)

  annealing_t - required - annealing temperature (Celsius)

  designer - required - three letter code of the person who designed this set of primers

  date_t2 - required - date of design (or ordering?)

(DNA sequences for many sets of primers are temporary unavailable. It is not a problem if primers were ordered by Dean Lavelle. In this case information about primer sequences can be recovered from ordering tables [companies provide us with excel spreadsheets for primers we ordered]. Primer (oligo) Table must be submitted if you are using primers which were designed outside of CGPDB group)
___________________________________________________________________



Template 3 Marker Info Table
This table is specific for particular cross (Salinas x Serriola in this example). Table contains comprehensive information about method(s) used for scoring/mapping and names of image files.
Download: LGM3_Marker_Info_V002_Template.xls



Marker Info Table fields description:

  marker_id - required - marker ID (unique ID in CGPDP)

  cross - required - plants (lines, genotypes) which were used to find polymorphism.
vocabulary: [ "Sal_Ser" - Salinas x Serriola; "Cal_Kor" - Calmar x Kordaat ]

  amplification - required - "yes" if PCR was successful,
"none" if PCR was done but you did not get amplified fragment for this combination of primers

  band_n - optional (but highly desirable) - number of bands visible on an agarose gel
vocabulary: [ "1" - single band; "2" - two bands; "M" - multiple bands ]
      size_b1 - size of single band
      size_b2 - size of second band
      band_range - band range in the case of multiple bands

next five fields describe the method which was used to score particular marker:

      agarose - marker was scored on a regular agarose gel
      sscp - SSCP: Single Stranded Conformation Polymorphism
      tgce - TGCE: Temperature Gradient Capillary Electrophoresis
      dcap - dCAP: derived Cleaved Amplified Product
      other - if method is other than four above

if amplification is "none" you should leave all five entries above with dashes "-"
if amplification is "yes" (successful) at least one of five method entries should be designated as good, fair, poor or none
vocabulary: [ "none" - polymorphism was not detected; "good" "fair" "poor" - different grades how easily polymorphism is scored ]
if entry has dash "-" it means that method was not used (checked)

  comments - optional - meaningful description what do you think about this marker/method

  image_file - optional (but highly desirable) - name of image file (gel, graph, etc.) for method which was used for actual scoring
(Note: image name should reflect particular cross; it should contain suffix just before *.jpg / *.tiff file extension which is identical to "cross" type [ Sal_Ser in this example ] )

  status - required
  sender - required
  date_t3 - required
   Last three fields are the same as for Table 1
___________________________________________________________________



Template 4 Marker Scores Table
Probably, most important table. It contains actual marker scores data for particular cross.
Download: LGM4_Marker_Scores_V001_Template.xls



Marker Scores Table fields description:

  marker_id - required
  status - required
  sender - required
  date_t4 - required
   First four fields are the same as for Table 1 and 3

  cross - required - cross which was used for mapping
vocabulary: [ same as for Table 3 ]

  population - required - population which was used for mapping
vocabulary: [ "F2", "RIL_7" (seventh generation), etc. for RILs ]

  method - required - method which was used for mapping
vocabulary: [ agarose, SSCP, TGCE, dCAP or other ]

  other_method - required if method field above is other; otherwise leave dash "-"

  A - required - parent which was scored as "A"

  B - required - parent which was scored as "B"

  P - optional - introduced because of AFLP markers
vocabulary: [ "P1" or "P2" ]

  RIL1 ... RIL130 - required - marker scores
values: [ A, B, C, D, H, - or X (if you place X then values for those individuals (RILs) will not be overwritten in database; this is suggested procedure if user wants to update data for particular set of individuals only) ]


Python scripts to process templates:

lgm1_verification_script_005.py

lgm2_verification_script_005.py

lgm3_verification_script_005.py

lgm3_verification_script_005_imgx.py

lgm4_verification_script_005.py

lgm4_verification_script_005_non_compl_set.py

lgm_dictionary_005.py

mysql_table_updater_005.py


External links:
European Spreadsheet Risks Interest Group
Excel ate my DNA


Feedback:

email: akozik@atgc.org Alexander Kozik
email: michelmore@vegmail.ucdavis.edu Richard Michelmore

last modified April 03 2005