Education & Certifications
Bachelor of Science, Santa Clara University, Biology (2009)
Bachelor of Science, Santa Clara University, Bioengineering (2009)
RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10^7 RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.
View details for DOI 10.1038/nbt.2880
View details for PubMedID 24727714
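The RNA-MaP abstract above relates affinity to association and dissociation rates. As a minimal sketch of that relationship, the standard textbook identities K_d = k_off / k_on and ΔΔG = RT ln(K_d,variant / K_d,reference) can be written as follows. The rate constants below are hypothetical values chosen for illustration; they are not measurements from the paper, and this is not the authors' analysis code.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(k_on, k_off):
    """Return K_d in molar from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on


def ddg(kd_variant, kd_reference, temp_k=298.15):
    """Change in binding free energy (kcal/mol) of a variant vs. a reference.

    Positive values mean the variant binds more weakly.
    """
    return R_KCAL * temp_k * math.log(kd_variant / kd_reference)


# Hypothetical rate constants (assumptions, not data from the study).
kd_ref = dissociation_constant(k_on=1.0e6, k_off=1.0e-3)  # 1 nM
kd_var = dissociation_constant(k_on=5.0e5, k_off=5.0e-3)  # 10 nM

# A tenfold loss of affinity costs about 1.36 kcal/mol at 25 degrees C.
print(round(ddg(kd_var, kd_ref), 2))
```

In this hypothetical case the tenfold change in K_d is split between a twofold slower association and a fivefold faster dissociation, which is the kind of decomposition the abstract describes.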
We describe an assay for transposase-accessible chromatin using sequencing (ATAC-seq), based on direct in vitro transposition of sequencing adaptors into native chromatin, as a rapid and sensitive method for integrative epigenomic analysis. ATAC-seq captures open chromatin sites using a simple two-step protocol with 500-50,000 cells and reveals the interplay between genomic locations of open chromatin, DNA-binding proteins, individual nucleosomes and chromatin compaction at nucleotide resolution. We discovered classes of DNA-binding factors that strictly avoided, could tolerate or tended to overlap with nucleosomes. Using ATAC-seq maps of human CD4+ T cells from a proband obtained on consecutive days, we demonstrated the feasibility of analyzing an individual's epigenome on a timescale compatible with clinical decision-making.
View details for DOI 10.1038/nmeth.2688
View details for PubMedID 24097267
We describe an approach for targeted genome resequencing, called oligonucleotide-selective sequencing (OS-Seq), in which we modify the immobilized lawn of oligonucleotide primers of a next-generation DNA sequencer to function as both a capture and sequencing substrate. We apply OS-Seq to resequence the exons of either 10 or 344 cancer genes from human DNA samples. In our assessment of capture performance, >87% of the captured sequence originated from the intended target region with sequencing coverage falling within a tenfold range for a majority of all targets. Single nucleotide variants (SNVs) called from OS-Seq data agreed with >95% of variants obtained from whole-genome sequencing of the same individual. We also demonstrate mutation discovery from a colorectal cancer tumor sample matched with normal tissue. Overall, we show the robust performance and utility of OS-Seq for the resequencing analysis of human germline and cancer genomes.
View details for DOI 10.1038/nbt.1996
View details for Web of Science ID 000296801300024
View details for PubMedID 22020387
Somatic cells can be transdifferentiated to other cell types without passing through a pluripotent state by ectopic expression of appropriate transcription factors. Recent reports have proposed an alternative transdifferentiation method in which fibroblasts are directly converted to various mature somatic cell types by brief expression of the induced pluripotent stem cell (iPSC) reprogramming factors Oct4, Sox2, Klf4 and c-Myc (OSKM) followed by cell expansion in media that promote lineage differentiation. Here we test this method using genetic lineage tracing for expression of endogenous Nanog and Oct4 and for X chromosome reactivation, as these events mark acquisition of pluripotency. We show that the vast majority of reprogrammed cardiomyocytes or neural stem cells obtained from mouse fibroblasts by OSKM-induced 'transdifferentiation' pass through a transient pluripotent state, and that their derivation is molecularly coupled to iPSC formation mechanisms. Our findings underscore the importance of defining trajectories during cell reprogramming by various methods.
View details for DOI 10.1038/nbt.3270
View details for PubMedID 26098448
This unit describes Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq), a method for mapping chromatin accessibility genome-wide. This method probes DNA accessibility with hyperactive Tn5 transposase, which inserts sequencing adapters into accessible regions of chromatin. Sequencing reads can then be used to infer regions of increased accessibility, as well as to map regions of transcription-factor binding and nucleosome position. The method is a fast and sensitive alternative to DNase-seq for assaying chromatin accessibility genome-wide, or to MNase-seq for assaying nucleosome positions in accessible regions of the genome. © 2015 by John Wiley & Sons, Inc.
View details for DOI 10.1002/0471142727.mb2129s109
View details for PubMedID 25559105
Limb-girdle muscular dystrophy primarily affects the muscles of the hips and shoulders (the "limb-girdle" muscles), although it is a heterogeneous disorder that can present with varying symptoms. There is currently no cure. We sought to identify the genetic basis of limb-girdle muscular dystrophy type 1 in an American family of Northern European descent using exome sequencing. Exome sequencing was performed on DNA samples from two affected siblings and one unaffected sibling and resulted in the identification of eleven candidate mutations that co-segregated with the disease. Notably, this list included a previously reported mutation in DNAJB6, p.Phe89Ile, which was recently identified as a cause of limb-girdle muscular dystrophy type 1D. Additional family members were Sanger sequenced and the mutation in DNAJB6 was only found in affected individuals. Subsequent haplotype analysis indicated that this DNAJB6 p.Phe89Ile mutation likely arose independently of the previously reported mutation. Since other published mutations are located close by in the G/F domain of DNAJB6, this suggests that the area may represent a mutational hotspot. Exome sequencing provided an unbiased and effective method for identifying the genetic etiology of limb-girdle muscular dystrophy type 1 in a previously genetically uncharacterized family. This work further confirms the causative role of DNAJB6 mutations in limb-girdle muscular dystrophy type 1D.
View details for DOI 10.1016/j.nmd.2014.01.014
View details for Web of Science ID 000336349400011
With next-generation DNA sequencing technologies, one can interrogate a specific genomic region of interest at very high depth of coverage and identify less prevalent, rare mutations in heterogeneous clinical samples. However, the mutation detection levels are limited by the error rate of the sequencing technology as well as by the availability of variant-calling algorithms with high statistical power and low false positive rates. We demonstrate that we can robustly detect mutations at 0.1% fractional representation. This represents accurate detection of one mutant allele per 1000 wild-type alleles. To achieve this sensitive level of mutation detection, we integrate a high accuracy indexing strategy and reference replication for estimating sequencing error variance. We employ a statistical model to estimate the error rate at each position of the reference and to quantify the fraction of variant base in the sample. Our method is highly specific (99%) and sensitive (100%) when validated on a known 0.1% sample fraction admixture of two synthetic DNA samples. As a clinical application of this method, we analyzed nine clinical samples of H1N1 influenza A and detected an oseltamivir (antiviral therapy) resistance mutation in the H1N1 neuraminidase gene at a sample fraction of 0.18%.
View details for DOI 10.1093/nar/gkr861
View details for Web of Science ID 000298733500002
View details for PubMedID 22013163
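The rare-mutation abstract above depends on separating true variants from sequencing error at a given position. The sketch below illustrates that idea with a plain binomial tail test. It is a simplified stand-in, not the statistical model used in the paper. The read depth and per-position error rate are hypothetical values chosen for illustration.

```python
import math


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def variant_fraction(alt_reads, depth):
    """Observed fraction of reads carrying the variant base."""
    return alt_reads / depth


# Hypothetical inputs (assumptions, not values from the study).
depth = 100_000       # reads covering the position
error_rate = 2.0e-4   # per-position error rate, e.g. estimated from a reference

# 100 variant reads is a 0.1% fraction, five times the expected 20 error reads.
p_signal = binom_sf(100, depth, error_rate)
# 20 variant reads is what error alone would produce on average.
p_noise = binom_sf(20, depth, error_rate)

print(variant_fraction(100, depth), p_signal < 1e-6, p_noise > 0.05)
```

The design point is that the detection limit is set by how well the per-position error rate is known: the same 0.1% fraction would be indistinguishable from noise if the assumed error rate were close to 1e-3.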
We have developed an integrated strategy for targeted resequencing and analysis of gene subsets from the human exome for variants. Our capture technology is geared towards resequencing gene subsets substantially larger than can be done efficiently with simplex or multiplex PCR but smaller in scale than exome sequencing. We describe all the steps from the initial capture assay to single nucleotide variant (SNV) discovery. The capture methodology uses in-solution 80-mer oligonucleotides. To provide optimal flexibility in choosing human gene targets, we designed an in silico set of oligonucleotides, the Human OligoExome, that covers the gene exons annotated by the Consensus Coding Sequencing Project (CCDS). This resource is openly available as an Internet-accessible database where one can download capture oligonucleotide sequences for any CCDS gene and design custom capture assays. Using this resource, we demonstrated the flexibility of this assay by custom designing capture assays ranging from 10 to over 100 gene targets with total capture sizes from over 100 kilobases to nearly one megabase. We established a method to reduce capture variability and incorporated indexing schemes to increase sample throughput. Our approach has multiple applications that include but are not limited to population targeted resequencing studies of specific gene subsets, validation of variants discovered in whole genome sequencing surveys and possible diagnostic analysis of disease gene subsets. We also present a cost analysis demonstrating its cost-effectiveness for large population studies.
View details for DOI 10.1371/journal.pone.0021088
View details for Web of Science ID 000292291800008
View details for PubMedID 21738606
Intra- and interspecific variation in flower color is a hallmark of angiosperm diversity. The evolutionary forces underlying the variety of flower colors can be nearly as diverse as the colors themselves. In addition to pollinator preferences, non-pollinator agents of selection can have a major influence on the evolution of flower color polymorphisms, especially when the pigments in question are also expressed in vegetative tissues. In such cases, identifying the target(s) of selection starts with determining the biochemical and molecular basis for the flower color variation and examining any pleiotropic effects manifested in vegetative tissues. Herein, we describe a widespread purple-white flower color polymorphism in the mustard Parrya nudicaulis spanning Alaska. The frequency of white-flowered individuals increases with increasing growing-season temperature, consistent with the role of anthocyanin pigments in stress tolerance. White petals fail to produce the stress responsive flavonoid intermediates in the anthocyanin biosynthetic pathway (ABP), suggesting an early pathway blockage. Petal cDNA sequences did not reveal blockages in any of the eight enzyme-coding genes in white-flowered individuals, nor any color differentiating SNPs. A qRT-PCR analysis of white petals identified a 24-fold reduction in chalcone synthase (CHS) at the threshold of the ABP, but no change in CHS expression in leaves and sepals. This arctic species has avoided the deleterious effects associated with the loss of flavonoid intermediates in vegetative tissues by decoupling CHS expression in petals and leaves, yet the correlation of flower color and climate suggests that the loss of flavonoids in the petals alone may affect the tolerance of white-flowered individuals to colder environments.
View details for DOI 10.1371/journal.pone.0018230
View details for Web of Science ID 000289238700005
View details for PubMedID 21490971
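The flower-color abstract above reports a 24-fold reduction in CHS expression measured by qRT-PCR. As a rough illustration of what that magnitude means in cycle-threshold terms, the sketch below uses the common 2^(-ΔΔCt) relationship. It assumes ideal amplification efficiency (a doubling per cycle). The abstract does not state which quantification method was used, so treat this as an assumption rather than a description of the authors' analysis.

```python
import math


def fold_change_from_ddct(ddct):
    """Relative expression under the 2^(-ddCt) model (assumes 100% efficiency)."""
    return 2.0 ** (-ddct)


def ddct_from_fold_reduction(fold_reduction):
    """Ct shift (in cycles) that corresponds to a given fold reduction."""
    return math.log2(fold_reduction)


# A 24-fold reduction corresponds to a shift of roughly 4.58 cycles.
shift = ddct_from_fold_reduction(24)
print(round(shift, 2), round(fold_change_from_ddct(shift), 4))
```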
Critical to conservation efforts and other investigations at low taxonomic levels, DNA sequence data offer important insights into the distinctiveness, biogeographic partitioning and evolutionary histories of species. The resolving power of DNA sequences is often limited by insufficient variability at the intraspecific level. This is particularly true of studies involving plant organelles, as the conservative mutation rate of chloroplasts and mitochondria makes it difficult to detect polymorphisms necessary to track genealogical relationships among individuals, populations and closely related taxa, through space and time. Massively parallel sequencing (MPS) makes it possible to acquire entire organelle genome sequences to identify cryptic variation that would be difficult to detect otherwise. We are using MPS to evaluate intraspecific chloroplast-level divergence across biogeographic boundaries in narrowly endemic and widespread species of Pinus. We focus on one of the world's rarest pines - Torrey pine (Pinus torreyana) - due to its conservation interest and because it provides a marked contrast to more widespread pine species. Detailed analysis of nearly 90% (approximately 105,000 bp each) of these chloroplast genomes shows that mainland and island populations of Torrey pine differ at five sites in their plastome, with the differences fixed between populations. This is an exceptionally low level of divergence (1 polymorphism per approximately 21 kb), yet it is comparable to intraspecific divergence present in widespread pine species and species complexes. Population-level organelle genome sequencing offers new vistas into the timing and magnitude of divergence within species, and is certain to provide greater insight into pollen dispersal, migration patterns and evolutionary dynamics in plants.
View details for DOI 10.1111/j.1365-294X.2009.04474.x
View details for Web of Science ID 000275645700010
View details for PubMedID 20331774
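The divergence figure in the Torrey pine abstract above follows directly from the two numbers it reports: five fixed differences across approximately 105,000 bp of compared sequence. A minimal check of that arithmetic, with function names that are purely illustrative:

```python
def bp_per_polymorphism(compared_bp, variable_sites):
    """Average number of base pairs per polymorphic site."""
    return compared_bp / variable_sites


def per_site_divergence(compared_bp, variable_sites):
    """Fraction of compared sites that differ."""
    return variable_sites / compared_bp


# Values taken from the abstract: ~105,000 bp compared, 5 fixed differences.
print(bp_per_polymorphism(105_000, 5))   # 21000.0, i.e. 1 per ~21 kb
print(per_site_divergence(105_000, 5))   # about 4.8e-05
```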