SEVENS

Seven-transmembrane-helix receptors (7-TMRs), also known as G-protein-coupled receptors [1], form an important gene family whose products act as gateways for signal transduction triggered by ligand binding. Recent progress in determining the human draft sequence [2,3] has made a comprehensive analysis of 7-TMRs across the whole human genome feasible. We have developed an automated, three-stage system for discovering 7-TMR genes in the whole human genome.

(I) Gene prediction stage: Genomic sequences were obtained from the human genome resources of NCBI. To maximize the number of gene candidates, we generated three kinds of sequence sets: (a) "6f-sequences", all possible combinations of initiation (start) and stop codons in the six reading frames; (b) "ALN-sequences", obtained with ALN [4], a dynamic programming algorithm that aligns genomic sequence to known protein sequences; and (c) "GD-sequences", generated by GeneDecoder [5], which is based on hidden Markov models (HMMs).

(II) Screening stage: The predicted genes were passed through an analysis filter composed of BLASTP [6] for similarity search, HMMER [7] and an in-house program for assigning 7-TMR-specific HMMs (Pfam domains [7]), PROSITE patterns [8], and transmembrane helix (TMH) prediction tools [9]. After careful assessment of each component, two threshold settings were determined: one giving the best specificity and one giving the best sensitivity. Combining these two settings yielded datasets at four confidence levels.

(III) Quality improvement stage: Sequence redundancy was removed as follows. (1) Pairwise alignment was applied to the candidate sequences in an all-against-all fashion. (2) Two sequences were linked only when they aligned over more than 50 amino-acid residues with more than 95% identity, lay on the same chromosome, and had overlapping genomic positions. (3) Each group produced by the transitive closure of these links was regarded as one cluster, and one representative gene was selected from each cluster.
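To make the "6f-sequences" idea concrete, here is a minimal Python sketch of six-frame ORF enumeration. It assumes one plausible reading of "all possible combinations of start and stop codons": every ATG on either strand is paired with the first in-frame stop codon downstream of it, so nested ATGs yield nested candidates. The names `six_frame_orfs`, `reverse_complement`, and `min_codons` are hypothetical and not part of the SEVENS pipeline, and the sketch ignores introns, which real gene prediction on genomic DNA must handle.

```python
# Minimal sketch of six-frame ORF enumeration ("6f-sequences").
# Assumption: each ATG is paired with the first in-frame stop codon downstream.
# Introns are ignored; all names here are hypothetical, not from SEVENS.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_orfs(genome: str, min_codons: int = 1) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf_dna) for every ATG...stop segment in six frames.

    min_codons is the minimum number of sense codons (ATG included) to keep.
    """
    genome = genome.upper()
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            # Split this reading frame into complete codons.
            codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
            for start, codon in enumerate(codons):
                if codon != "ATG":
                    continue
                # Scan forward to the first in-frame stop codon.
                for end in range(start + 1, len(codons)):
                    if codons[end] in STOP_CODONS:
                        if end - start >= min_codons:
                            orfs.append((strand, frame, "".join(codons[start:end + 1])))
                        break
    return orfs


print(six_frame_orfs("CCATGAAATAGCC"))  # [('+', 2, 'ATGAAATAG')]
```

Pairing each ATG with its first stop, rather than with every downstream stop, avoids candidates that contain internal stop codons.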
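The screening stage relies on TMH prediction tools [9], which the text does not name. As a crude and clearly swapped-in stand-in, the sketch below counts hydrophobic segments with a Kyte-Doolittle sliding-window hydropathy profile (window 19, cutoff 1.6) and flags a sequence that has exactly seven of them. The function names `hydrophobic_segments` and `looks_like_7tm` are hypothetical; dedicated TMH predictors are considerably more accurate than this heuristic.

```python
# Crude stand-in for TMH prediction: Kyte-Doolittle hydropathy with a sliding
# window. This is NOT the tool set used by SEVENS; names here are hypothetical.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19,
                         cutoff: float = 1.6) -> list[tuple[int, int]]:
    """Return half-open (start, end) spans covered by consecutive windows
    whose mean hydropathy is at or above the cutoff."""
    protein = protein.upper()
    segments = []
    run_start = None
    for i in range(len(protein) - window + 1):
        # Unknown residue codes contribute 0 to the window mean.
        score = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein[i:i + window]) / window
        if score >= cutoff and run_start is None:
            run_start = i
        elif score < cutoff and run_start is not None:
            # The last passing window started at i - 1, so it ends at i - 1 + window.
            segments.append((run_start, i - 1 + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start, len(protein)))
    return segments


def looks_like_7tm(protein: str) -> bool:
    """Flag sequences with exactly seven hydrophobic segments."""
    return len(hydrophobic_segments(protein)) == 7
```

A sequence shorter than the window yields no segments, since no full window fits.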
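The redundancy-removal rule in stage (III) amounts to finding connected components: links are edges, and their transitive closure groups linked sequences into clusters. A minimal union-find sketch follows. It assumes the all-against-all alignment results are already available as hypothetical `Hit` records and that each hypothetical `Candidate` carries a chromosome and a genomic interval. The source does not say how the representative is chosen; keeping the longest protein here is an assumption.

```python
# Minimal sketch of stage (III) redundancy removal with union-find.
# Record types and field names are hypothetical; the representative rule
# (longest protein) is an assumption, not stated in the source.
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    chromosome: str
    start: int  # genomic start, inclusive
    end: int    # genomic end, inclusive
    protein_length: int


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    aligned_residues: int
    identity: float  # percent identity over the aligned region


def overlapping(a: Candidate, b: Candidate) -> bool:
    """Same chromosome and overlapping genomic intervals."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def cluster_representatives(candidates: list[Candidate], hits: list[Hit],
                            min_residues: int = 50,
                            min_identity: float = 95.0) -> list[Candidate]:
    by_name = {c.name: c for c in candidates}
    parent = {c.name: c.name for c in candidates}

    def find(x: str) -> str:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link two sequences only if every criterion holds (strict inequalities).
    for h in hits:
        a, b = by_name[h.query], by_name[h.subject]
        if (h.aligned_residues > min_residues and h.identity > min_identity
                and overlapping(a, b)):
            parent[find(a.name)] = find(b.name)

    # Each union-find root identifies one connected component (cluster).
    clusters: dict[str, list[Candidate]] = {}
    for name in by_name:
        clusters.setdefault(find(name), []).append(by_name[name])

    # Assumed rule: keep the longest protein; break ties by name.
    return [max(members, key=lambda c: (c.protein_length, c.name))
            for members in clusters.values()]
```

Union-find makes the transitive closure implicit: if A links to B and B links to C, all three share a root even though A and C were never compared directly.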
Applying this system to the human genome sequence (April 2003), we collected 7-TMR genes at four confidence levels, ranging from 1,114 candidates at the highest specificity to 2,235 at the highest sensitivity. These results are compiled in SEVENS (http://sevens.cbrc.jp/1.20/). The database aims to cover the entire "7-TMR universe", including not only known sequences but also new sequences discovered by computational gene-finding programs. This coverage clearly distinguishes it from previous databases [10-12].

The content search button leads to a page where candidates are retrieved by an AND combination of (a) keyword in the nr.aa database search results, (b) chromosome number, (c) data level, (d) predicted number of exons, (e) gene length, (f) protein length, (g) E-value of the sequence search against SWISS-PROT or nr.aa, (h) PROSITE motifs, and (i) Pfam domains. The search lists the matching 7-TMR candidates in a chromosomal viewer and a table, and each chromosome or sequence links to its sequence analysis page. There, the chromosomal viewer shows the mapping of the selected genes (in purple), linked to their protein sequence analyses; the Similarity Search section shows BLASTP alignments of the query against the SWISS-PROT and nr.aa databases; and the Structure section shows TMH predictions, PROSITE motif patterns, and Pfam domains along the amino acid sequence.

We plan to keep SEVENS up to date with each new release of the human genome sequence, and additional information (such as expression data and tertiary structure data) will be added at each update. We hope these datasets will be of value to researchers engaged in 7-TMR studies.
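The content search combines its criteria with AND, so a candidate is listed only if it satisfies every criterion the user fills in. The sketch below models that behavior over in-memory records. The `Entry` fields and the `search` function are hypothetical, cover only a subset of criteria (a) to (i) (gene length is omitted), and do not describe the actual SEVENS web interface or its backend.

```python
# Minimal sketch of an AND-combined content search over candidate records.
# Entry fields and search() are hypothetical; only some criteria are modeled.
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entry:
    gene_id: str
    chromosome: str
    data_level: int
    exon_count: int
    protein_length: int
    evalue: float        # best hit against SWISS-PROT or nr.aa
    description: str     # text of the nr.aa search result
    prosite: frozenset = field(default_factory=frozenset)
    pfam: frozenset = field(default_factory=frozenset)


def search(entries: list[Entry], keyword: Optional[str] = None,
           chromosome: Optional[str] = None, data_level: Optional[int] = None,
           exon_range: Optional[tuple[int, int]] = None,
           length_range: Optional[tuple[int, int]] = None,
           max_evalue: Optional[float] = None,
           prosite: Optional[str] = None, pfam: Optional[str] = None) -> list[Entry]:
    """Return entries that satisfy every criterion that was supplied."""
    preds: list[Callable[[Entry], bool]] = []
    if keyword is not None:
        preds.append(lambda e: keyword.lower() in e.description.lower())
    if chromosome is not None:
        preds.append(lambda e: e.chromosome == chromosome)
    if data_level is not None:
        preds.append(lambda e: e.data_level == data_level)
    if exon_range is not None:
        preds.append(lambda e: exon_range[0] <= e.exon_count <= exon_range[1])
    if length_range is not None:
        preds.append(lambda e: length_range[0] <= e.protein_length <= length_range[1])
    if max_evalue is not None:
        preds.append(lambda e: e.evalue <= max_evalue)
    if prosite is not None:
        preds.append(lambda e: prosite in e.prosite)
    if pfam is not None:
        preds.append(lambda e: pfam in e.pfam)
    # AND semantics: an entry is kept only if all predicates hold.
    return [e for e in entries if all(p(e) for p in preds)]
```

Each unset argument adds no predicate, so calling `search(entries)` with no criteria returns every record, and each added criterion can only narrow the result.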

Webpage:
http://sevens.cbrc.jp/

Tags:

protein sequence, protein family
