A major challenge following the sequencing of the human genome is to determine the biological functions of the estimated 30,000 genes. This requires the biochemical characterization of expressed proteins in various hosts. To facilitate protein expression in a high-throughput fashion, Invitrogen initiated an ORFeome program about one and a half years ago. The program is intended to provide the Open Reading Frames (ORFs) of all human proteins to researchers in academia, government, and the biotech and pharmaceutical industries. Each ORF was PCR-amplified from a cDNA clone or library using a gene-specific primer set, and the amplified DNA fragment was cloned into a Gateway entry vector by recombination-based cloning in a high-throughput format. Each positive ORF clone was then subjected to full-insert sequence verification. Qualified ORF clones are guaranteed to match the public protein sequence 100%.

Last December, we launched a web-based ORFBrowser that allows users to query our ORFeome database. The ORFBrowser provides not only standard query tools such as BLAST and keyword search, but also advanced browsing by Gene Ontology (GO) and a fast, flexible ID search in batch mode. Using the GO Browser, users can browse the ORF collections by biological process, cellular component, and molecular function. Using the ID search, users can query the ORFeome database by UniGene ID, LocusLink ID, gene symbol, gene accession number, protein accession number, or any combination of these, in batch mode with up to 10,000 IDs in a single query.

In addition, each ORF has been annotated extensively, and the annotation is captured in the ORFCard, which is organized into six categories: Gene, ORF, Clone, Protein, SNP and Genomic Links.
1. Gene information contains the gene definition, function annotation, related accessions, gene symbol, GO classification, and links to the CGAP gene expression profile and PubMed references.
2. ORF information contains the ORF size, the nucleotide and protein sequences, and the Phred quality value for each base of the ORF.
3. Clone information contains the vector type, host cell and clone collection.
4. Protein annotation includes the basic features of the protein, function annotation, related accessions, protease digestion profile, disease links with OMIM IDs, and secondary structure prediction, as well as links that map the protein to well-known protein domain sites such as Pfam, PROSITE and SMART, and a link that maps the protein to SWISS-MODEL.
5. SNP information contains links to the NCBI SNP database.
6. Genomic Links include links to UniGene, LocusLink, Ensembl and MGI, as well as a link that maps the protein to the human genomic backbone using the online UCSC BLAT tool.

With the ORFCard, users can reach the most current information on an ORF of interest in each of these categories with a single click. Access to this information is free of charge to anyone, and users have no obligation to buy the ORF clones. Currently, the ORFeome database has 4500 human ORFs and 2800 mouse ORFs. We will be releasing 2000 clones every three months. To our knowledge, the ORFeome database provides the largest collection of human and mouse ORFs, and these ORF clones will facilitate the functional dissection of individual proteins.
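
For users with more identifiers than one batch ID search accepts, the list has to be split before submission. The following is a minimal sketch of that preparation step only, assuming the IDs are available as plain strings; the function name `batch_ids` and the constant `MAX_IDS_PER_QUERY` are hypothetical and not part of the ORFBrowser, and the sketch does not contact the ORFBrowser at all.

```python
from typing import Iterable, Iterator, List

# Limit taken from the abstract: up to 10,000 IDs in a single query.
MAX_IDS_PER_QUERY = 10_000


def batch_ids(ids: Iterable[str], limit: int = MAX_IDS_PER_QUERY) -> Iterator[List[str]]:
    """Yield cleaned, de-duplicated IDs in batches of at most `limit`.

    Hypothetical helper: it only prepares ID lists and performs no query.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    seen = set()
    batch: List[str] = []
    for raw in ids:
        ident = raw.strip()
        # Skip blank entries and IDs that were already emitted.
        if not ident or ident in seen:
            continue
        seen.add(ident)
        batch.append(ident)
        if len(batch) == limit:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch
```

Each yielded list could then be pasted into, or submitted as, one batch query. Mixed ID types (for example gene symbols alongside accession numbers) can share a batch, since the abstract states that combinations of ID types are accepted.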



