Brief Description
The Compositae Genome Project Database (CGPDB) is a comprehensive source of information about 135,000
lettuce and sunflower ESTs. All data displayed in this database are the result of the joint efforts of
the Compositae Genome Initiative.
The database contains detailed information about individual EST reads
as well as about EST assemblies. The Contig Viewer
provides details about EST assemblies and about polymorphic sites between
different genotypes.
Particular clones can be ordered via the
Arizona Genome Center.
Note that you need to provide the correct AGI (Arizona Genome Institute) library ID and GenBank accession number.
Details about library construction can be found
here.
Database Structure
The database currently contains the following tables:
-
Lettuce_Contigs_ID, Sunflower_Contigs_ID -
lists of contigs produced by CAP3 assembly.
The tables show
which ESTs were used to assemble each contig and the size of the fragment
overlaps. Click on a Contig_ID to view the actual
sequence alignment.
-
Lettuce_Info, Sunflower_Info - tables containing TAG information
for each EST read. If an EST read belongs to a contig, the contig
is given in the "Cluster_ID" column. The "Seqs_N" column shows
how many other EST reads belong to that contig. Click on an
"EST_ID" to view the actual chromatogram. Note that the chromatogram viewer
displays untrimmed, unmasked raw reads. The chromatogram viewer is
written in Java and works with Netscape 7 and IE 6.
-
lettuce_all_reads_trimmed, sunflower_all_reads_trimmed -
The actual set of sequences used for contig assembly, after trimming,
visual inspection, and additional semi-manual trimming.
Click on a "Sequence_ID" to view the actual chromatogram.
-
lettuce_assembly, sunflower_assembly -
CAP3 assembly of the trimmed sequences. The tables contain
information about all contigs and singletons (the unigene set).
In other words, these are non-redundant datasets.
-
lettuce_assembly_blastX, sunflower_assembly_blastX -
Results of a blastx search of the unigene set against the nr protein
database at NCBI.
The 24 best hits (if found) are shown for each contig or singleton.
Normalized expectation is: (-log(exp))/100
-
lettuce_vs_ath_TIGR, sunflower_vs_ath_TIGR -
Results of a blastx search of the unigene set against the
Arabidopsis TIGR database (predicted ORFs).
The 48 best hits (if found) are shown for each contig or singleton.
Normalized expectation is: (-log(exp))/100
-
lettuce_pfam, sunflower_pfam - results of a HMMER search for Pfam domains in the
translated regions of the assemblies.
-
lettuce_clustering, sunflower_clustering -
Results of a tblastx search of the EST assembly against itself. Two sequences are considered
linked if they share at least 40% identity over a sequence overlap of at least 100 amino acids (based on the BLAST translation).
Clustering was done using the
tcl_blast_parser_123_V007.tcl and
Graph9 programs.
An example of the clustering visualization can be found
here.
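Two rules in the table descriptions above are easy to misread, so here is a minimal Python sketch of them: the normalized expectation, (-log(exp))/100, and the linking criterion used for clustering. It is illustrative only and is not the code behind the database. All function names are hypothetical. The logarithm base is not stated above, so base 10 is assumed; how an E-value of 0 is handled is also an assumption (the sketch returns infinity). Grouping linked sequences into connected components is a stand-in for whatever tcl_blast_parser_123_V007.tcl and Graph9 actually do.

```python
import math
from collections import defaultdict


def normalized_expectation(e_value):
    """Return (-log(E)) / 100. Base-10 logarithm is an assumption."""
    if e_value <= 0.0:
        # Assumption: an E-value of 0 has no finite logarithm.
        return math.inf
    return -math.log10(e_value) / 100.0


def is_linked(identity_pct, overlap_aa, min_identity=40.0, min_overlap=100):
    """Linking rule from the text: >= 40% identity over >= 100 amino acids."""
    return identity_pct >= min_identity and overlap_aa >= min_overlap


def cluster(hits):
    """Group sequence IDs into connected components of linked pairs.

    hits: iterable of (query_id, subject_id, identity_pct, overlap_aa).
    Connected components are an assumed clustering method.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for query, subject, identity, overlap in hits:
        find(query)
        find(subject)
        if query != subject and is_linked(identity, overlap):
            parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(members) for members in groups.values())
```

Under these assumptions an E-value of 1e-50 maps to a normalized expectation of 0.5, and a hit with 80% identity over only 50 amino acids does not link its two sequences.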
Contig Viewer navigation:
Meaning of EST designation
Contigs:
QG_CA_Contig#### or QH_CA_Contig#### - "QG" stands for Lettuce dataset,
"QH" stands for Sunflower dataset.
Lettuce EST reads:
QGA QGB QGC QGD QGI = QG_(A,B,C,D,I) libraries - cultivated lettuce, Salinas (multiple tissues and growth conditions identified by TAG IDs)
QGE QGF QGG QGH QGJ = QG_(E,F,G,H,J) libraries - wild lettuce, Lactuca serriola (multiple tissues and growth conditions identified by TAG IDs)
Sunflower EST reads:
QHA QHB QHC QHD QHI = QH_(A,B,C,D,I) libraries - sunflower RHA801 (multiple tissues and growth conditions identified by TAG IDs)
QHE QHF QHG QHH QHJ = QH_(E,F,G,H,J) libraries - sunflower RHA280 (multiple tissues and growth conditions identified by TAG IDs)
QHK QHL - Helianthus paradoxus (seedling, root, leaf and flower tissues) (no TAG IDs are available)
QHK - normal growth conditions
QHL - salt stress
QHM QHN - Helianthus argophyllus (seedlings, root and leaf tissues) (no TAG IDs are available)
QHM - normal growth conditions
QHN - drought stress
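The naming scheme above can be decoded mechanically. The following Python sketch maps an identifier to its dataset and source material using only the two-letter dataset code, the library letter, and the contig prefix listed above. The function name describe_id and the returned field names are hypothetical, and nothing is assumed about the characters that follow the three-letter library prefix in a real read ID.

```python
# Dataset codes and library-letter groups as listed in the text above.
DATASETS = {"QG": "lettuce", "QH": "sunflower"}
LIBRARIES = {
    "QG": [
        ("ABCDI", "cultivated lettuce, Salinas"),
        ("EFGHJ", "wild lettuce, Lactuca serriola"),
    ],
    "QH": [
        ("ABCDI", "sunflower RHA801"),
        ("EFGHJ", "sunflower RHA280"),
        ("KL", "Helianthus paradoxus"),
        ("MN", "Helianthus argophyllus"),
    ],
}
# Growth conditions are only listed for the wild Helianthus libraries.
CONDITIONS = {
    "QHK": "normal growth conditions",
    "QHL": "salt stress",
    "QHM": "normal growth conditions",
    "QHN": "drought stress",
}


def describe_id(identifier):
    """Describe a contig or EST read ID by its prefix (hypothetical helper)."""
    ident = identifier.strip()
    code = ident[:2].upper()
    if code not in DATASETS:
        raise ValueError(f"unknown dataset code in {identifier!r}")
    if ident[2:].startswith("_CA_Contig"):
        return {"dataset": DATASETS[code], "kind": "contig"}
    prefix = ident[:3].upper()
    if len(prefix) == 3:
        for letters, source in LIBRARIES[code]:
            if prefix[2] in letters:
                return {
                    "dataset": DATASETS[code],
                    "kind": "EST read",
                    "library": prefix,
                    "source": source,
                    "condition": CONDITIONS.get(prefix),
                    # Libraries K, L, M and N have no TAG IDs.
                    "has_tag_ids": prefix[2] not in "KLMN",
                }
    raise ValueError(f"unknown library prefix in {identifier!r}")
```

For instance, any ID beginning with QHL would be reported as a Helianthus paradoxus salt-stress read without TAG IDs, while an ID beginning with QG_CA_Contig would be reported as a lettuce contig.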
TAG IDs description:
Lettuce:
TAG0 - callus
TAG1 - roots
TAG2 - none
TAG3 - flowers pre-fertilized
TAG4 - flowers post-fertilized
TAG5 - chemical induction
TAG6 - none
TAG7 - roots environmental stress
TAG8 - shoots environmental stress
TAG9 - germinating seeds
TAG10 - flowers environmental stress
TAG11 - leaves, dark-grown
Sunflower:
TAG0 - callus
TAG1 - roots
TAG2 - disk and ray flowers
TAG3 - flowers pre-fertilized
TAG4 - developing kernel
TAG5 - chemical induction
TAG6 - none
TAG7 - roots environmental stress
TAG8 - shoots environmental stress
TAG9 - germinating seeds
TAG10 - flowers environmental stress
TAG11 - hulls
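Because the same TAG number can mean different tissues in the two datasets (TAG4 and TAG11, for example), a lookup has to be keyed by species as well as by TAG. A minimal Python sketch of such a lookup follows; the names TAG_DESCRIPTIONS and tag_description are hypothetical, and the entries simply restate the lists above.

```python
# TAG descriptions per dataset, restating the lists above.
TAG_DESCRIPTIONS = {
    "lettuce": {
        0: "callus", 1: "roots", 2: "none",
        3: "flowers pre-fertilized", 4: "flowers post-fertilized",
        5: "chemical induction", 6: "none",
        7: "roots environmental stress", 8: "shoots environmental stress",
        9: "germinating seeds", 10: "flowers environmental stress",
        11: "leaves, dark-grown",
    },
    "sunflower": {
        0: "callus", 1: "roots", 2: "disk and ray flowers",
        3: "flowers pre-fertilized", 4: "developing kernel",
        5: "chemical induction", 6: "none",
        7: "roots environmental stress", 8: "shoots environmental stress",
        9: "germinating seeds", 10: "flowers environmental stress",
        11: "hulls",
    },
}


def tag_description(species, tag):
    """Look up a TAG by species; accepts 'TAG7', 'tag7' or 7."""
    text = str(tag).strip().upper()
    if text.startswith("TAG"):
        text = text[3:]
    # Raises KeyError for an unknown species or TAG number.
    return TAG_DESCRIPTIONS[species.strip().lower()][int(text)]
```

With this table, TAG4 resolves to "flowers post-fertilized" for lettuce and to "developing kernel" for sunflower.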
email: Alexander Kozik
email: Richard Michelmore
Last modified, December 11 2003