Bioinformatic platform

Supervisor: Prof. Giorgio Valle – CRIBI-PADOVA

Goals

The bioinformatic platform is a pivotal resource in the whole project. We aim to develop a robust platform to achieve three major goals:

  1. To analyze raw genome sequencing data in order to identify genes and produce their structural and functional annotation. This goal will build on the annotation platform developed for the Grape Genome Project, which is composed of a gene prediction module and a functional annotation workflow.
  2. To integrate data from modules 1, 2 and 3, covering structural genomics, functional genomics and genetic breeding respectively. In practical terms, we aim to develop an integrated data management system built on existing tools (such as GMOD/GBrowse) and, where necessary, on novel tools developed for the project.
  3. To develop and implement an advanced query system to support the scientific activity of project modules 5, 6, 7 and 8. This platform will be used to store, manage and organize the data produced by the project, and will also act as an advanced data retrieval system.

Activity details

4.1. Gene Prediction Platform (UNIPD-CRIBI)

Once the genome sequence has been assembled (see Module 1), we will face the problem of identifying the functional elements it contains. First we will need to identify the genes and resolve their structure in terms of exons, introns, UTRs and protein coding sequence. Our research group worked in the Grape Genome Project, where it set up a gene prediction platform that performed well. This platform integrates several types of evidence:

  1. Protein sequence alignments
  2. mRNA sequence alignments (cDNA from the same organism or from related ones)
  3. De novo prediction exploiting signals encoded in the sequence
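
As an illustration of how the three evidence types listed above might be combined, the following minimal sketch reports the regions supported by at least a given number of independent evidence types. It is not the platform's actual method: the function name `supported_regions`, the evidence labels and the coordinates are all hypothetical, and real gene predictors weigh evidence in far more elaborate ways.

```python
from collections import Counter


def supported_regions(evidence, min_sources=2):
    """Return (start, end) regions covered by at least `min_sources` evidence types.

    `evidence` maps an evidence type to a list of (start, end) intervals,
    0-based and end-exclusive. All names here are hypothetical.
    """
    support = Counter()
    for intervals in evidence.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end))
        # Each evidence type contributes at most one vote per base.
        support.update(covered)

    regions = []
    for pos in sorted(p for p, votes in support.items() if votes >= min_sources):
        if regions and regions[-1][1] == pos:
            regions[-1][1] = pos + 1        # extend the current region
        else:
            regions.append([pos, pos + 1])  # start a new region
    return [tuple(region) for region in regions]


# Toy coordinates, invented for illustration only.
evidence = {
    "protein_alignment": [(100, 200)],
    "mrna_alignment": [(150, 300)],
    "de_novo": [(120, 180), (400, 450)],
}
print(supported_regions(evidence, min_sources=2))
```

Counting each evidence type once per base keeps a deeply covered transcript from outvoting the other sources, which is one simple way to favour agreement between independent lines of evidence.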

During the Grape Genome Project we used both Sanger sequences from cDNA clones and transcriptome sequencing on the then emerging Next Generation Sequencing platforms (RNA-Seq). The RNA-Seq approach rapidly emerged as a powerful method to discover genes and annotate their structure. We plan to perform whole transcriptome sequencing (RNA-Seq) on a number of different tissues/organs of the plant (at least 5), in order to describe the transcriptional landscape of as many genes as possible.

4.2. Functional annotation platform (UNIPD-CRIBI)

Besides annotating the presence of a gene in a sequence and the details of its structure, we need to understand the function of that gene. During the Grape Genome Project we developed a pipeline to produce a functional annotation of the genome. Using sequence similarity with known genes as input, and searching for specific patterns such as intracellular sorting signals, we aim to assign as much information as possible to each gene.

In particular, gene functions will be described using the standard vocabulary provided by the “Gene Ontology”, which makes it possible to define a molecular function, to classify biological processes and to define the cellular component (localization).
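
The three Gene Ontology aspects mentioned above can be sketched as a simple per-gene record. This is only an illustration: the gene identifiers and the function `group_by_aspect` are hypothetical, and the one-letter aspect codes (F, P, C) are assumed to follow the convention used in GO annotation files.

```python
from collections import defaultdict

# One-letter aspect codes, assumed to follow the GO annotation file convention.
ASPECTS = {
    "F": "molecular_function",
    "P": "biological_process",
    "C": "cellular_component",
}


def group_by_aspect(annotations):
    """Group (gene_id, go_id, aspect_code) tuples into one record per gene."""
    grouped = defaultdict(lambda: {name: [] for name in ASPECTS.values()})
    for gene_id, go_id, aspect in annotations:
        grouped[gene_id][ASPECTS[aspect]].append(go_id)
    return dict(grouped)


# Hypothetical gene identifiers; the GO terms are DNA binding,
# regulation of DNA-templated transcription, and nucleus.
annotations = [
    ("gene_0001", "GO:0003677", "F"),
    ("gene_0001", "GO:0006355", "P"),
    ("gene_0001", "GO:0005634", "C"),
    ("gene_0002", "GO:0005634", "C"),
]
records = group_by_aspect(annotations)
print(records["gene_0001"])
```

Every gene record always carries all three aspects, so a gene with no annotated molecular function shows an empty list rather than a missing key.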

4.3. Data management and retrieval (UNIPD-CRIBI)

A specialized database will be implemented to support the whole project. A similar database was built for the Grape Genome Project, where it allowed the integration of sequence data and functional data.

Particular care will be taken to design and implement a robust database structure that allows precise data retrieval through a complex query system.
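
To give a concrete idea of what integrating sequence data and functional data in one query could look like, here is a minimal sketch using an in-memory SQLite database. The schema, the table and column names, the function names and the sample rows are all assumptions made for illustration; they do not describe the database actually used in the Grape Genome Project or planned for this project.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id   TEXT PRIMARY KEY,
            seq_id    TEXT NOT NULL,
            start_pos INTEGER NOT NULL,  -- 0-based, inclusive
            end_pos   INTEGER NOT NULL   -- exclusive
        );
        CREATE TABLE go_annotation (
            gene_id TEXT NOT NULL REFERENCES gene(gene_id),
            go_id   TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        [
            ("gene_A", "chr1", 1000, 2000),
            ("gene_B", "chr1", 5000, 6000),
            ("gene_C", "chr2", 1000, 2000),
        ],
    )
    conn.executemany(
        "INSERT INTO go_annotation VALUES (?, ?)",
        [
            ("gene_A", "GO:0003677"),
            ("gene_A", "GO:0005634"),
            ("gene_B", "GO:0003677"),
            ("gene_C", "GO:0003677"),
        ],
    )
    return conn


def genes_with_term(conn, seq_id, region_start, region_end, go_id):
    """Return ids of genes overlapping a region that carry a given GO term."""
    rows = conn.execute(
        """
        SELECT g.gene_id
        FROM gene AS g
        JOIN go_annotation AS a ON a.gene_id = g.gene_id
        WHERE g.seq_id = ?
          AND g.start_pos < ?   -- gene starts before the region ends
          AND g.end_pos > ?     -- gene ends after the region starts
          AND a.go_id = ?
        ORDER BY g.start_pos
        """,
        (seq_id, region_end, region_start, go_id),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(genes_with_term(conn, "chr1", 0, 3000, "GO:0003677"))
```

Keeping positional data and functional terms in separate tables joined on the gene identifier lets a single query combine a structural criterion (a genomic region) with a functional one (a GO term).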

4.4. Expert annotation support (UNIPD-CRIBI)

The automatic annotation of a genome, produced with our bioinformatic methods, is a strong starting point for producing a high quality annotated genome that can be useful to the scientific community.

Functional annotation is a complex step, and needs to be manually refined by “experts” working on a specific group of genes or proteins.

We will develop and implement an informatic platform that allows experts (i.e. registered annotators of the project) to share their knowledge about proteins, thus improving the final annotation.
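
One possible way to combine expert knowledge with the automatic annotation is sketched below, under the assumption that an expert revision simply replaces the automatic description of a gene while the origin of each entry is recorded. The function `merge_annotations`, the record layout, the identifiers and the descriptions are all hypothetical; the actual platform may resolve revisions differently.

```python
def merge_annotations(automatic, expert_revisions):
    """Overlay expert revisions on automatic annotations, keeping provenance.

    `automatic` maps gene_id -> description.
    `expert_revisions` is a list of (gene_id, annotator, description) tuples
    in chronological order; a later revision replaces an earlier one.
    """
    final = {
        gene_id: {"description": description, "source": "automatic"}
        for gene_id, description in automatic.items()
    }
    for gene_id, annotator, description in expert_revisions:
        final[gene_id] = {
            "description": description,
            "source": f"expert:{annotator}",
        }
    return final


# Invented example data.
automatic = {
    "gene_0001": "putative transcription factor",
    "gene_0002": "unknown protein",
}
expert_revisions = [
    ("gene_0002", "annotator_1", "hypothetical membrane transporter"),
]
final = merge_annotations(automatic, expert_revisions)
print(final["gene_0002"])
```

Recording the source next to each description makes it possible to tell curated entries apart from purely automatic ones in the final annotation.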