Brief Description
The Compositae Genome Project Database (CGPDB) is a source of comprehensive information about 135,000
lettuce and sunflower ESTs. All data displayed in this database are the result of the joint efforts of
the Compositae Genome Initiative.
The database contains detailed information about individual EST reads
as well as EST assemblies. Individual EST sequences can be retrieved by ID from the
"Sequence retrieval from CGPDB"
web page. The Contig Viewer
provides details about EST assemblies as well as polymorphic sites between
different genotypes. To search for a particular gene by keyword, use the
"Sequence Retrieval by Keywords"
web page. You can also run a
BLAST search against the lettuce or sunflower assemblies.
phpMyAdmin
is a generic interface for accessing and searching all tables in the database (see the Database Structure section below).
Particular clones can be ordered via the
Arizona Genome Center.
Note that you need to provide the correct AGI (Arizona Genome Institute) library ID and GenBank accession number.
This information can be obtained from the
"Sequence retrieval from CGPDB" web page.
The "Clone ordering information" subsection on that page converts a CGPDB ID
(such as QGB3a10.yg.ab1 or QHB20A05.yg.ab1) into the AGI library ID, plate number, well address and GenBank accession number.
Details about library construction can be found
here.
Database Structure
The link "Database with phpMyAdmin interface"
leads to the phpMyAdmin interface to the MySQL database.
Tables are listed in the right frame when you click on "Display Tables".
To browse a table and see the type of data it holds, click on "Browse".
To run SQL queries, click on "Select".
Currently the database contains the following tables:
-
Lettuce_Contigs_ID, Sunflower_Contigs_ID -
list of contigs produced by CAP3 assembly.
The tables record which ESTs were used to assemble
each contig and the size of the fragment
overlaps. Clicking on Contig_ID displays the actual
sequence alignment.
-
Lettuce_Info, Sunflower_Info - tables containing TAG information
for each EST read. If an EST read belongs to a contig, the contig
is given in the "Cluster_ID" column. The "Seqs_N" column shows
how many other EST reads belong to that contig. Clicking on
"EST_ID" displays the actual chromatogram. Note that the chromatogram viewer
displays untrimmed, unmasked raw reads. The chromatogram viewer is
written in Java and works with Netscape 7 and IE 6.
-
lettuce_all_reads_trimmed, sunflower_all_reads_trimmed -
The actual set of sequences used for contig assembly, after
trimming, visual inspection and additional semi-manual trimming.
Clicking on "Sequence_ID" displays the actual chromatogram.
-
lettuce_assembly, sunflower_assembly -
CAP3 assembly of trimmed sequences. The tables contain
information about all contigs and singletons (the unigene set).
In other words, these are non-redundant datasets.
-
lettuce_assembly_blastX, sunflower_assembly_blastX -
Results of a blastX search of the unigene set against the
nr protein database at NCBI.
The 24 best hits (if found) are shown for each contig or singleton.
The normalized expectation is (-log(exp))/100, where exp is the BLAST expectation value.
-
lettuce_vs_ath_TIGR, sunflower_vs_ath_TIGR -
Results of a blastx search of the unigene set against the
Arabidopsis TIGR database (predicted ORFs).
The 48 best hits (if found) are shown for each contig or singleton.
The normalized expectation is (-log(exp))/100, where exp is the BLAST expectation value.
-
lettuce_pfam, sunflower_pfam - results of hmmsearch for Pfam domains in the
translated regions of the assemblies.
-
lettuce_clustering, sunflower_clustering -
Results of a tblastx search of the EST assembly against itself. Two sequences are considered
linked if they share at least 40% identity over a sequence overlap of at least 100 amino acids (based on the BLAST translation).
Clustering was done using the
tcl_blast_parser_123_V007.tcl and
Graph9 programs.
An example of the clustering visualization can be found
here.
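The linking rule used for the clustering tables can be illustrated with a short Python sketch. This is not the code of tcl_blast_parser_123_V007.tcl or Graph9, whose internals are not described here. The hit tuple layout, the function names `is_linked` and `cluster`, and the choice to form clusters as connected components of the link graph are all assumptions made for illustration; only the two thresholds come from the description above.

```python
from collections import defaultdict

# Thresholds taken from the table description.
MIN_IDENTITY = 40.0   # percent identity
MIN_OVERLAP = 100     # aligned amino acids (BLAST translation)


def is_linked(identity, overlap):
    """Return True when a tblastx hit satisfies the linking rule."""
    return identity >= MIN_IDENTITY and overlap >= MIN_OVERLAP


def cluster(hits):
    """Group sequences into clusters of linked sequences.

    hits: iterable of (query_id, subject_id, identity, overlap) tuples.
    This tuple layout is hypothetical, not the real parser output.
    Clusters are connected components of the link graph (an assumption).
    """
    graph = defaultdict(set)
    for query, subject, identity, overlap in hits:
        graph[query]    # register the node even if it ends up unlinked
        graph[subject]
        if query != subject and is_linked(identity, overlap):
            graph[query].add(subject)
            graph[subject].add(query)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first walk over one connected component.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters


# Made-up hits for illustration; the contig numbers are not real data.
example_hits = [
    ("QG_CA_Contig1", "QG_CA_Contig2", 55.0, 120),  # linked
    ("QG_CA_Contig2", "QG_CA_Contig3", 42.0, 100),  # linked
    ("QG_CA_Contig3", "QG_CA_Contig4", 80.0, 60),   # overlap too short
]
print(cluster(example_hits))
```

With these made-up hits, contigs 1 to 3 end up in one cluster through the chain of links, while contig 4 stays on its own because its only hit overlaps by fewer than 100 amino acids.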
Contig Viewer navigation:
Meaning of EST designation
Contigs:
QG_CA_Contig#### or QH_CA_Contig#### - "QG" stands for the lettuce dataset,
"QH" stands for the sunflower dataset.
Lettuce EST reads:
QGA QGB QGC QGD QGI = QG_(A,B,C,D,I) libraries - cultivated lettuce, Salinas (multiple tissues and growth conditions identified by TAG IDs)
QGE QGF QGG QGH QGJ = QG_(E,F,G,H,J) libraries - wild lettuce, Lactuca serriola (multiple tissues and growth conditions identified by TAG IDs)
Sunflower EST reads:
QHA QHB QHC QHD QHI = QH_(A,B,C,D,I) libraries - sunflower RHA801 (multiple tissues and growth conditions identified by TAG IDs)
QHE QHF QHG QHH QHJ = QH_(E,F,G,H,J) libraries - sunflower RHA280 (multiple tissues and growth conditions identified by TAG IDs)
QHK QHL - Helianthus paradoxus (seedling, root, leaf and flower tissues) (no TAG IDs are available)
QHK - normal growth conditions
QHL - salt stress
QHM QHN - Helianthus argophyllus (seedlings, root and leaf tissues) (no TAG IDs are available)
QHM - normal growth conditions
QHN - drought stress
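The naming scheme above can be turned into a small lookup, sketched here in Python. The names `LIBRARIES`, `CONTIG_RE` and `describe` are hypothetical and not part of CGPDB. The sketch assumes that the library code is the first three characters of a read ID (as in QGB3a10.yg.ab1) and that "####" in contig names stands for a run of digits.

```python
import re

# Three-letter library prefixes, transcribed from the list above.
LIBRARIES = {}
for letter in "ABCDI":
    LIBRARIES["QG" + letter] = "cultivated lettuce, Salinas"
    LIBRARIES["QH" + letter] = "sunflower RHA801"
for letter in "EFGHJ":
    LIBRARIES["QG" + letter] = "wild lettuce, Lactuca serriola"
    LIBRARIES["QH" + letter] = "sunflower RHA280"
LIBRARIES.update({
    "QHK": "Helianthus paradoxus, normal growth conditions",
    "QHL": "Helianthus paradoxus, salt stress",
    "QHM": "Helianthus argophyllus, normal growth conditions",
    "QHN": "Helianthus argophyllus, drought stress",
})

# Assumption: "####" in contig names is one or more digits.
CONTIG_RE = re.compile(r"^(QG|QH)_CA_Contig\d+$")


def describe(identifier):
    """Return a short description of a CGPDB contig or EST read ID."""
    match = CONTIG_RE.match(identifier)
    if match:
        dataset = "Lettuce" if match.group(1) == "QG" else "Sunflower"
        return dataset + " contig"
    # Assumption: the first three characters of a read ID name the library.
    return LIBRARIES.get(identifier[:3].upper(), "unknown library")


print(describe("QGB3a10.yg.ab1"))   # cultivated lettuce, Salinas
print(describe("QHB20A05.yg.ab1"))  # sunflower RHA801
```

Unrecognized prefixes fall through to "unknown library" rather than raising an error, which seems the safer default when scanning mixed lists of IDs.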
TAG IDs description:
Lettuce:
TAG0 - callus
TAG1 - roots
TAG2 - none
TAG3 - flowers pre-fertilized
TAG4 - flowers post-fertilized
TAG5 - chemical induction
TAG6 - none
TAG7 - roots environmental stress
TAG8 - shoots environmental stress
TAG9 - germinating seeds
TAG10 - flowers environmental stress
TAG11 - leaves, dark grown
Sunflower:
TAG0 - callus
TAG1 - roots
TAG2 - disk and ray flowers
TAG3 - flowers pre-fertilized
TAG4 - developing kernel
TAG5 - chemical induction
TAG6 - none
TAG7 - roots environmental stress
TAG8 - shoots environmental stress
TAG9 - germinating seeds
TAG10 - flowers environmental stress
TAG11 - hulls
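Because the same TAG number can mean different tissues in lettuce and sunflower, it may help to keep the two lists side by side. The Python sketch below transcribes the TAG descriptions above into a lookup; the names `TAGS`, `tag_description` and `differing_tags` are hypothetical, and unused TAGs ("none") are represented as `None`.

```python
# TAG descriptions transcribed from the lists above; index = TAG number.
TAGS = {
    "lettuce": [
        "callus", "roots", None, "flowers pre-fertilized",
        "flowers post-fertilized", "chemical induction", None,
        "roots environmental stress", "shoots environmental stress",
        "germinating seeds", "flowers environmental stress",
        "leaves, dark grown",
    ],
    "sunflower": [
        "callus", "roots", "disk and ray flowers", "flowers pre-fertilized",
        "developing kernel", "chemical induction", None,
        "roots environmental stress", "shoots environmental stress",
        "germinating seeds", "flowers environmental stress",
        "hulls",
    ],
}


def tag_description(species, tag_id):
    """Return the description of a TAG for a species (None if unused)."""
    return TAGS[species][tag_id]


def differing_tags():
    """Return the TAG numbers whose meaning differs between the species."""
    pairs = zip(TAGS["lettuce"], TAGS["sunflower"])
    return [i for i, (a, b) in enumerate(pairs) if a != b]


print(differing_tags())  # [2, 4, 11]
```

Only TAG2, TAG4 and TAG11 differ between the two species, so comparisons across lettuce and sunflower libraries by TAG number are direct for the remaining TAGs.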
email: Alexander Kozik
email: Richard Michelmore
Last modified, December 11 2003