Whole-genome haplotyping using long reads and statistical methods
2014; 32 (3): 261-266
The rapid growth of sequencing technologies has greatly contributed to our understanding of human genetics. Yet, despite this growth, mainstream technologies have not been fully able to resolve the diploid nature of the human genome. Here we describe statistically aided, long-read haplotyping (SLRH), a rapid, accurate method that uses a statistical algorithm to take advantage of the partially phased information contained in long genomic fragments analyzed by short-read sequencing. For a human sample, as little as 30 Gbp of additional sequencing data are needed to phase genotypes identified by 50× coverage whole-genome sequencing. Using SLRH, we phase 99% of single-nucleotide variants in three human genomes into long haplotype blocks 0.2-1 Mbp in length. We apply our method to determine allele-specific methylation patterns in a human genome and identify hundreds of differentially methylated regions that were previously unknown. SLRH should facilitate population-scale haplotyping of human genomes.
View details for DOI 10.1038/nbt.2833
View details for Web of Science ID 000332819800024
Whole-exome sequencing identifies tetratricopeptide repeat domain 7A (TTC7A) mutations for combined immunodeficiency with intestinal atresias.
journal of allergy and clinical immunology
2013; 132 (3): 656-664 e17
Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects.We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families.We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA.Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA.We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.
View details for DOI 10.1016/j.jaci.2013.06.013
View details for PubMedID 23830146
Promise of personalized omics to precision medicine
WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE
2013; 5 (1): 73-82
The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.
View details for DOI 10.1002/wsbm.1198
View details for Web of Science ID 000312736200005
View details for PubMedID 23184638
Specific plasma autoantibody reactivity in myelodysplastic syndromes.
2013; 3: 3311-?
Increased autoantibody reactivity in plasma from Myelodysplastic Syndromes (MDS) patients may provide novel disease signatures, and possible early detection. In a two-stage study we investigated Immunoglobulin G reactivity in plasma from MDS, Acute Myeloid Leukemia post MDS patients, and a healthy cohort. In exploratory Stage I we utilized high-throughput protein arrays to identify 35 high-interest proteins showing increased reactivity in patient subgroups compared to healthy controls. In validation Stage II we designed new arrays focusing on 25 of the proteins identified in Stage I and expanded the initial cohort. We validated increased antibody reactivity against AKT3, FCGR3A and ARL8B in patients, which enabled sample classification into stable MDS and healthy individuals. We also detected elevated AKT3 protein levels in MDS patient plasma. The discovery of increased specific autoantibody reactivity in MDS patients, provides molecular signatures for classification, supplementing existing risk categorizations, and may enhance diagnostic and prognostic capabilities for MDS.
View details for DOI 10.1038/srep03311
View details for PubMedID 24264604
Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes
2012; 148 (6): 1293-1307
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
View details for DOI 10.1016/j.cell.2012.02.009
View details for Web of Science ID 000301889500023
View details for PubMedID 22424236
- Performance comparison of whole-genome sequencing platforms NATURE BIOTECHNOLOGY 2012; 30 (1): 78-U118
Performance comparison of exome DNA sequencing technologies
2011; 29 (10): 908-U206
Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.
View details for DOI 10.1038/nbt.1975
View details for Web of Science ID 000296273000017
View details for PubMedID 21947028
Whole-Exome Enrichment with the Illumina TruSeq Exome Enrichment Platform.
Cold Spring Harbor protocols
Multiple platforms are available for whole-exome enrichment and sequencing (WES). This protocol is based on the Illumina TruSeq Exome Enrichment platform, which captures ∼62 Mb of the human exonic regions using 95-base DNA probes. In addition to covering the RefSeq and Ensembl coding sequences, the enriched sequences also include ∼28 Mb of RefSeq untranslated regions (UTR). The protocol can be performed at the benchside without the need for automation, and the resulting library can be used for targeted next-generation sequencing on an Illumina HiSeq 2000 sequencer.
View details for DOI 10.1101/pdb.prot084863
View details for PubMedID 25762418
Whole-Exome Enrichment with the Roche NimbleGen SeqCap EZ Exome Library SR Platform.
Cold Spring Harbor protocols
Multiple platforms are available for whole-exome enrichment and sequencing (WES). This protocol is based on the Roche NimbleGen SeqCap EZ Exome Library SR platform, which enriches for ∼44 Mb of the human exonic regions. The SeqCap system uses 55- to 105-base DNA probes to capture known coding DNA sequences (CDS) from the NCBI Consensus CDS Database, RefSeq, and Sanger miRBase. The protocol can be performed at the benchside without the need for automation, and the resulting library can be used for targeted next-generation sequencing on an Illumina HiSeq 2000 sequencer.
View details for DOI 10.1101/pdb.prot084855
View details for PubMedID 25762419
Whole-Exome Enrichment with the Agilent SureSelect Human All Exon Platform.
Cold Spring Harbor protocols
There are multiple platforms available for whole-exome enrichment and sequencing (WES). This protocol is based on the Agilent SureSelect Human All Exon platform, which targets ∼50 Mb of the human exonic regions. The SureSelect system uses ∼120-base RNA probes to capture known coding DNA sequences (CDS) from the NCBI Consensus CDS Database as well as other major RNA coding sequence databases, such as Sanger miRBase. The protocol can be performed at the benchside without the need for automation, and the resulting library can be used for targeted next-generation sequencing on an Illumina HiSeq 2000 sequencer.
View details for DOI 10.1101/pdb.prot083659
View details for PubMedID 25762417
Distinct Splice Variants and Pathway Enrichment in the Cell-Line Models of Aggressive Human Breast Cancer Subtypes
JOURNAL OF PROTEOME RESEARCH
2014; 13 (1): 212-227
This study was conducted as a part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes in chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes; it contains many cancer-associated genes, including BRCA1, ERBB2, (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2 expressed breast cancer cell-line models of hormone-receptor-negative breast cancers by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic variations subtype: SKBR3 (ERBB2+ (overexpression)/ER-/PR-; adenocarcinoma), SUM190 (ERBB2+ (overexpression)/ER-/PR-; inflammatory breast cancer), and SUM149 (ERBB2 (low expression) ER-/PR-; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes that are in the signaling pathways downstream of ERBB2 along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated more similarities between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell-line specific variants pointed to distinct key mechanisms including: amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids in SUM190; cell-to-cell adhesion, integrin, and ERK1/ERK2 signaling; and translational control in SUM149. The analyses indicated an enrichment in the electron transport chain processes in the ERBB2 overexpressed cell line models and an association of nucleotide binding, RNA splicing, and translation processes with the IBC models, SUM190 and SUM149. Detailed experimental studies on the distinct variants identified from each of these three breast cancer cell line models that may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.
View details for DOI 10.1021/pr400773v
View details for Web of Science ID 000329472700022
View details for PubMedID 24111759
Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications
OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY
2014; 18 (1): 10-14
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
View details for DOI 10.1089/omi.2013.0149
View details for Web of Science ID 000331085100002
View details for PubMedID 24456465
A Chromosome-centric Human Proteome Project (C-HPP) to Characterize the Sets of Proteins Encoded in Chromosome 17
JOURNAL OF PROTEOME RESEARCH
2013; 12 (1): 45-57
We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins that will guide the selection of appropriate samples for discovery studies as well as antibody reagents. Also we have illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.
View details for DOI 10.1021/pr300985j
View details for Web of Science ID 000313156300007
Exome sequencing by targeted enrichment.
Current protocols in molecular biology / edited by Frederick M. Ausubel ... [et al.]
2013; Chapter 7: Unit7 12-?
This unit describes methods for targeted enrichment of the exon-coding portions of the genome using Agilent SureSelect Human All Exon 50 Mb and Roche Nimblegen SeqCap EZ Exome platforms. Each platform targets and enriches a large overlapping portion of the greater human exome. The protocols here describe the biochemical procedures used to enrich exomic DNA with each platform, including recommended modifications to the manufacturers' protocols. In addition, a brief description of the sequencing protocol and estimation of the needed amount of sequencing for each platform is included. Finally, a detailed analytical pipeline for processing the subsequent data is described. These protocols focus specifically on human exome sequencing platforms, but can be applied with some modification to other organisms and targeted enrichment approaches.
View details for DOI 10.1002/0471142727.mb0712s102
View details for PubMedID 23547016
Systems biology: personalized medicine for the future?
CURRENT OPINION IN PHARMACOLOGY
2012; 12 (5): 623-628
Systems biology is actively transforming the field of modern health care from symptom-based disease diagnosis and treatment to precision medicine in which patients are treated based on their individual characteristics. Development of high-throughput technologies such as high-throughout sequencing and mass spectrometry has enabled scientists and clinicians to examine genomes, transcriptomes, proteomes, metabolomes, and other omics information in unprecedented detail. The combined 'omics' information leads to a global profiling of health and disease, and provides new approaches for personalized health monitoring and preventative medicine. In this article, we review the efforts of systems biology in personalized medicine in the past 2 years, and discuss in detail achievements and concerns, as well as highlights and hurdles for future personalized health care.
View details for DOI 10.1016/j.coph.2012.07.011
View details for Web of Science ID 000310478800017
View details for PubMedID 22858243
Patient-Specific Induced Pluripotent Stem Cells as a Model for Familial Dilated Cardiomyopathy
SCIENCE TRANSLATIONAL MEDICINE
2012; 4 (130)
Characterized by ventricular dilatation, systolic dysfunction, and progressive heart failure, dilated cardiomyopathy (DCM) is the most common form of cardiomyopathy in patients. DCM is the most common diagnosis leading to heart transplantation and places a significant burden on healthcare worldwide. The advent of induced pluripotent stem cells (iPSCs) offers an exceptional opportunity for creating disease-specific cellular models, investigating underlying mechanisms, and optimizing therapy. Here, we generated cardiomyocytes from iPSCs derived from patients in a DCM family carrying a point mutation (R173W) in the gene encoding sarcomeric protein cardiac troponin T. Compared to control healthy individuals in the same family cohort, cardiomyocytes derived from iPSCs from DCM patients exhibited altered regulation of calcium ion (Ca(2+)), decreased contractility, and abnormal distribution of sarcomeric ?-actinin. When stimulated with a ?-adrenergic agonist, DCM iPSC-derived cardiomyocytes showed characteristics of cellular stress such as reduced beating rates, compromised contraction, and a greater number of cells with abnormal sarcomeric ?-actinin distribution. Treatment with ?-adrenergic blockers or overexpression of sarcoplasmic reticulum Ca(2+) adenosine triphosphatase (Serca2a) improved the function of iPSC-derived cardiomyocytes from DCM patients. Thus, iPSC-derived cardiomyocytes from DCM patients recapitulate to some extent the morphological and functional phenotypes of DCM and may serve as a useful platform for exploring disease mechanisms and for drug screening.
View details for DOI 10.1126/scitranslmed.3003552
View details for Web of Science ID 000303045900004
View details for PubMedID 22517884
- The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome NATURE BIOTECHNOLOGY 2012; 30 (3): 221-223
- Detecting and annotating genetic variations using the HugeSeq pipeline NATURE BIOTECHNOLOGY 2012; 30 (3): 226-229
Diverse protein kinase interactions identified by protein microarrays reveal novel connections between cellular processes
GENES & DEVELOPMENT
2011; 25 (7): 767-778
Protein kinases are key regulators of cellular processes. In spite of considerable effort, a full understanding of the pathways they participate in remains elusive. We globally investigated the proteins that interact with the majority of yeast protein kinases using protein microarrays. Eighty-five kinases were purified and used to probe yeast proteome microarrays. One-thousand-twenty-three interactions were identified, and the vast majority were novel. Coimmunoprecipitation experiments indicate that many of these interactions occurred in vivo. Many novel links of kinases to previously distinct cellular pathways were discovered. For example, the well-studied Kss1 filamentous pathway was found to bind components of diverse cellular pathways, such as those of the stress response pathway and the Ccr4-Not transcriptional/translational regulatory complex; genetic tests revealed that these different components operate in the filamentation pathway in vivo. Overall, our results indicate that kinases operate in a highly interconnected network that coordinates many activities of the proteome. Our results further demonstrate that protein microarrays uncover a diverse set of interactions not observed previously.
View details for DOI 10.1101/gad.1998811
View details for Web of Science ID 000289062700010
View details for PubMedID 21460040
Yeast proteomics and protein microarrays
JOURNAL OF PROTEOMICS
2010; 73 (11): 2147-2157
Our understanding of biological processes as well as human diseases has improved greatly thanks to studies on model organisms such as yeast. The power of scientific approaches with yeast lies in its relatively simple genome, its facile classical and molecular genetics, as well as the evolutionary conservation of many basic biological mechanisms. However, even in this simple model organism, systems biology studies, especially proteomic studies had been an intimidating task. During the past decade, powerful high-throughput technologies in proteomic research have been developed for yeast including protein microarray technology. The protein microarray technology allows the interrogation of protein-protein, protein-DNA, protein-small molecule interaction networks as well as post-translational modification networks in a large-scale, high-throughput manner. With this technology, many groundbreaking findings have been established in studies with the budding yeast Saccharomyces cerevisiae, most of which could have been unachievable with traditional approaches. Discovery of these networks has profound impact on explicating biological processes with a proteomic point of view, which may lead to a better understanding of normal biological phenomena as well as various human diseases.
View details for DOI 10.1016/j.jprot.2010.08.003
View details for Web of Science ID 000283903000008
View details for PubMedID 20728591
Systems biology approaches to disease marker discovery
2010; 28 (4): 209-224
Our understanding of human disease and potential therapeutics is improving rapidly. In order to take advantage of these developments it is important to be able to identify disease markers. Many new high-throughput genomics and proteomics technologies are being implemented to identify candidate disease markers. These technologies include protein microarrays, next-generation DNA sequencing and mass spectrometry platforms. Such methods are particularly important for elucidating the repertoire of molecular markers in the genome, transcriptome, proteome and metabolome of patients with diseases such as cancer, autoimmune diseases, and viral infections, resulting from the disruption of many biological pathways. These new technologies have identified many potential disease markers. These markers are expected to be valuable to achieve the promise of truly personalized medicine.
View details for DOI 10.3233/DMA-2010-0707
View details for Web of Science ID 000279321200003
View details for PubMedID 20534906