Doctor of Philosophy, University of California Santa Cruz (2011)
Scott Boyd, Postdoctoral Faculty Sponsor
The human and mouse antibody repertoires are formed by identical processes, but like all small animals, mice only have sufficient lymphocytes to express a small part of the potential antibody repertoire. In this study, we determined how the heavy chain repertoires of two mouse strains are generated. Analysis of IgM- and IgG-associated VDJ rearrangements generated by high-throughput sequencing confirmed the presence of 99 functional immunoglobulin heavy chain variable (IGHV) genes in the C57BL/6 genome, and inferred the presence of 164 IGHV genes in the BALB/c genome. Remarkably, only five IGHV sequences were common to both strains. Compared with humans, little N nucleotide addition was seen in the junctions of mouse VDJ genes. Germline human IgG-associated IGHV genes are rare, but many murine IgG-associated IGHV genes were unmutated. Together these results suggest that the expressed mouse repertoire is more germline-focused than the human repertoire. The apparently divergent germline repertoires of the mouse strains are discussed with reference to reports that inbred mouse strains carry blocks of genes derived from each of the three subspecies of the house mouse. We hypothesize that the germline genes of BALB/c and C57BL/6 mice may originally have evolved to generate distinct germline-focused antibody repertoires in the different mouse subspecies.
View details for DOI 10.1098/rstb.2014.0236
View details for PubMedID 26194750
Common variable immune deficiency (CVID) is the most common symptomatic primary immune deficiency, affecting ~1 in 25,000 persons. These patients suffer from impaired antibody responses, autoimmunity, and susceptibility to lymphoid cancers. To explore the cellular basis for these clinical phenotypes, we conducted high-throughput DNA sequencing of immunoglobulin heavy chain gene rearrangements from 93 CVID patients and 105 control subjects and sorted naïve and memory B cells from 13 of the CVID patients and 10 of the control subjects. The CVID patients showed abnormal VDJ rearrangement and abnormal formation of complementarity-determining region 3 (CDR3). We observed a decreased selection against antibodies with long CDR3s in memory repertoires and decreased variable gene replacement, offering possible mechanisms for increased patient autoreactivity. Our data indicate that patient immunodeficiency might derive from both decreased diversity of the naïve B cell pool and decreased somatic hypermutation in memory repertoires. The CVID patients also exhibited an abnormal clonal expansion of unmutated B cells relative to the controls. Although impaired B cell germinal center activation is commonly viewed as causative in CVID, these data indicate that CVID B cells diverge from controls as early as the pro-B cell stage, and suggest possible explanations for the increased incidence of autoimmunity, immunodeficiency, and lymphoma in CVID patients.
View details for DOI 10.1126/scitranslmed.aab1216
View details for Web of Science ID 000360942300006
View details for PubMedID 26311730
Adaptive immune responses in humans rely on somatic genetic rearrangements of Ig and T-cell receptor loci to generate diverse antigen receptors. It is unclear to what extent an individual's genetic background affects the characteristics of the antibody repertoire used in responding to vaccination or infection. We studied the B-cell repertoires and clonal expansions in response to attenuated varicella-zoster vaccination in four pairs of adult identical twins and found that the global antibody repertoires of twin pair members showed high similarity in antibody heavy chain V, D, and J gene segment use, and in the length and features of the complementarity-determining region 3, a major determinant of antigen binding. These twin similarities were most pronounced in the IgM-expressing B-cell pools, but were seen to a lesser extent in IgG-expressing B cells. In addition, the degree of antibody somatic mutation accumulated in the B-cell repertoire was highly correlated within twin pair members. Twin pair members had greater numbers of shared convergent antibody sequences, including mutated sequences, suggesting similarity among memory B-cell clonal lineages. Despite these similarities in the memory repertoire, the B-cell clones used in acute responses to ZOSTAVAX vaccination were largely unique to each individual. Taken together, these results suggest that the overall B-cell repertoire is significantly shaped by the underlying germ-line genome, but that stochastic or individual-specific effects dominate the selection of clones in response to an acute antigenic stimulus.
View details for DOI 10.1073/pnas.1415875112
View details for Web of Science ID 000347732300060
High-throughput DNA sequencing techniques have greatly accelerated the pace of research into the repertoires of antibody and T cell receptor gene rearrangements that confer antigen specificity to adaptive immune responses. Studies of aging-related changes in human B cell repertoires have benefited from the ability to detect and quantify thousands to millions of B cell clones in human samples, and study the mutational lineages and isotype switching relationships within each clonal lineage. Correlation of repertoire analysis with antibody gene data from antigen-specific B cells is poised to give much greater insight into clinically relevant B cell responses and memory storage. Here, we describe strategies for preparing and analyzing human antibody gene libraries for studying B cell repertoires.
View details for DOI 10.1007/978-1-4939-2963-4_17
View details for PubMedID 26420720
The frequencies, cellular phenotypes, epitope specificity, and clonal diversity of allergen-specific B cells in patients with food allergy are not fully understood but are of major pathogenic and therapeutic significance. We sought to characterize peanut allergen-specific B-cell populations and the sequences and binding activities of their antibodies before and during immunotherapy. B cells binding fluorescently labeled Ara h 1 or Ara h 2 were phenotyped and isolated by means of flow cytometric sorting from 18 patients at baseline and 13 patients during therapy. Fifty-seven mAbs derived from allergen-binding single B cells were evaluated by using ELISA, Western blotting, and peptide epitope mapping. Deep sequencing of the B-cell repertoires identified additional members of the allergen-specific B-cell clones. Median allergen-binding B-cell frequencies were 0.0097% (Ara h 1) or 0.029% (Ara h 2) of B cells in baseline blood from allergic patients and approximately 3-fold higher during immunotherapy. Five of 57 allergen-specific cells belonged to clones containing IgE-expressing members. Almost all allergen-specific antibodies were mutated, and binding to both conformational and linear allergen epitopes was detected. Increasing somatic mutation of IgG4 members of a clone was seen in immunotherapy, whereas IgE mutation levels in the clone did not increase. Most peanut allergen-binding B cells isolated by means of antigen-specific flow sorting express mutated and isotype-switched antibodies. Immunotherapy increases their frequency in the blood, and even narrowly defined allergen epitopes are recognized by numerous distinct B-cell clones in a patient. The results also suggest that oral immunotherapy can stimulate somatic mutation of allergen-specific IgG4.
View details for DOI 10.1016/j.jaci.2015.05.029
View details for PubMedID 26152318
B cells expressing IgE contribute to immunity against parasites and venoms and are the source of antigen specificity in allergic patients, yet the developmental pathways producing these B cells in human subjects remain a subject of debate. Much of our knowledge of IgE lineage development derives from model studies in mice rather than from human subjects. We evaluate models for isotype switching to IgE in human subjects using immunoglobulin heavy chain (IGH) mutational lineage data. We analyzed IGH repertoires in 9 allergic and 24 healthy adults using high-throughput DNA sequencing of 15,843,270 IGH rearrangements to identify clonal lineages of B cells containing members expressing IgE. Somatic mutations in IGH inherited from common ancestors within the clonal lineage are used to infer the relationships between B cells. Data from 613,641 multi-isotype B-cell clonal lineages, of which 592 include an IgE member, are consistent with indirect switching to IgE from IgG- or IgA-expressing lineage members in human subjects. We also find that these inferred isotype switching frequencies are similar in healthy and allergic subjects. We found evidence that secondary isotype switching of mutated IgG1-expressing B cells is the primary source of IgE in human subjects, with lesser contributions from precursors expressing other switched isotypes and rarely IgM or IgD, suggesting that IgE is derived from previously antigen-experienced B cells rather than naive B cells that typically express low-affinity unmutated antibodies. These data provide a basis from which to evaluate allergen-specific human antibody repertoires in healthy and diseased subjects.
View details for DOI 10.1016/j.jaci.2015.07.014
View details for PubMedID 26309181
Monoclonal antibodies derived from blood plasma cells of acute HIV-1-infected individuals are predominantly targeted to the HIV Env gp41 and cross-reactive with commensal bacteria. To understand this phenomenon, we examined anti-HIV responses in ileum B cells using recombinant antibody technology and probed their relationship to commensal bacteria. The dominant ileum B cell response was to Env gp41. Remarkably, a majority (82%) of the ileum anti-gp41 antibodies cross-reacted with commensal bacteria, and of those, 43% showed non-HIV-1 antigen polyreactivity. Pyrosequencing revealed shared HIV-1 antibody clonal lineages between ileum and blood. Mutated immunoglobulin G antibodies cross-reactive with both Env gp41 and microbiota could also be isolated from the ileum of HIV-1 uninfected individuals. Thus, the gp41 commensal bacterial antigen cross-reactive antibodies originate in the intestine, and the gp41 Env response in HIV-1 infection can be derived from a preinfection memory B cell pool triggered by commensal bacteria that cross-react with Env.
View details for DOI 10.1016/j.chom.2014.07.003
View details for Web of Science ID 000341144100011
B cells produce a diverse antibody repertoire by undergoing gene rearrangements. Pathogen exposure induces the clonal expansion of B cells expressing antibodies that can bind the infectious agent. To assess human B cell responses to trivalent seasonal influenza and monovalent pandemic H1N1 vaccination, we sequenced gene rearrangements encoding the immunoglobulin heavy chain, a major determinant of epitope recognition. The magnitude of B cell clonal expansions correlates with an individual's secreted antibody response to the vaccine, and the expanded clones are enriched with those expressing influenza-specific monoclonal antibodies. Additionally, B cell responses to pandemic influenza H1N1 vaccination and infection in different people show a prominent family of convergent antibody heavy chain gene rearrangements specific to influenza antigens. These results indicate that microbes can induce specific signatures of immunoglobulin gene rearrangements and that pathogen exposure can potentially be assessed from B cell repertoires.
View details for DOI 10.1016/j.chom.2014.05.013
View details for Web of Science ID 000341142600013
View details for PubMedID 24981332
Elderly humans show decreased humoral immunity to pathogens and vaccines, yet the effects of aging on B cells are not fully known. Chronic viral infection by CMV is implicated as a driver of clonal T cell proliferations in some aging humans, but whether CMV or EBV infection contributes to alterations in the B cell repertoire with age is unclear. We have used high-throughput DNA sequencing of IGH gene rearrangements to study the BCR repertoires over two successive years in 27 individuals ranging in age from 20 to 89 y. Some features of the B cell repertoire remain stable with age, but elderly subjects show increased numbers of B cells with long CDR3 regions, a trend toward accumulation of more highly mutated IgM and IgG Ig genes, and persistent clonal B cell populations in the blood. Seropositivity for CMV or EBV infection alters B cell repertoires, regardless of the individual's age: EBV infection correlates with the presence of persistent clonal B cell expansions, whereas CMV infection correlates with the proportion of highly mutated Ab genes. These findings isolate effects of aging from those of chronic viral infection on B cell repertoires and provide a baseline for understanding human B cell responses to vaccination or infectious stimuli.
View details for DOI 10.4049/jimmunol.1301384
View details for Web of Science ID 000329224000006
In order to identify novel somatic mutations associated with classic BCR/ABL1-negative myeloproliferative neoplasms, we performed high-coverage genome sequencing of DNA from peripheral blood granulocytes and cultured skin fibroblasts from a patient with MPL W515K-positive primary myelofibrosis. The primary myelofibrosis genome had a low somatic mutation rate, consistent with that observed in similar hematopoietic tumor genomes. Interfacing of whole-genome DNA sequence data with RNA expression data identified three somatic mutations of potential functional significance: a nonsense mutation in CARD6, implicated in modulation of NF-kappaB activation; a 19-base pair deletion involving a potential regulatory region in the 5'-untranslated region of BRD2, implicated in transcriptional regulation and cell cycle control; and a non-synonymous point mutation in KIAA0355, an uncharacterized protein. Additional mutations in three genes (CAP2, SOX30, and MFRP) were also evident, albeit with no support for expression at the RNA level. Re-sequencing of these six genes in 178 patients with polycythemia vera, essential thrombocythemia, and myelofibrosis did not identify recurrent somatic mutations in these genes. Finally, we describe methods for reducing false-positive variant calls in the analysis of hematologic malignancies with a low somatic mutation rate. This trial is registered with ClinicalTrials.gov (NCT01108159).
View details for DOI 10.3324/haematol.2013.092379
View details for PubMedID 23872309
Dengue is the most prevalent mosquito-borne viral disease in humans, and the lack of early prognostics, vaccines, and therapeutics contributes to immense disease burden. To identify patterns that could be used for sequence-based monitoring of the antibody response to dengue, we examined antibody heavy-chain gene rearrangements in longitudinal peripheral blood samples from 60 dengue patients. Comparing signatures between acute dengue, postrecovery, and healthy samples, we found increased expansion of B cell clones in acute dengue patients, with higher overall clonality in secondary infection. Additionally, we observed consistent antibody sequence features in acute dengue in the highly variable major antigen-binding determinant, complementarity-determining region 3 (CDR3), with specific CDR3 sequences highly enriched in acute samples compared to postrecovery, healthy, or non-dengue samples. Dengue thus provides a striking example of a human viral infection where convergent immune signatures can be identified in multiple individuals. Such signatures could facilitate surveillance of immunological memory in communities.
View details for DOI 10.1016/j.chom.2013.05.008
View details for Web of Science ID 000330851000009
Current human immunodeficiency virus-1 (HIV-1) vaccines elicit strain-specific neutralizing antibodies. However, cross-reactive neutralizing antibodies arise in approximately 20% of HIV-1-infected individuals, and details of their generation could provide a blueprint for effective vaccination. Here we report the isolation, evolution and structure of a broadly neutralizing antibody from an African donor followed from the time of infection. The mature antibody, CH103, neutralized approximately 55% of HIV-1 isolates, and its co-crystal structure with the HIV-1 envelope protein gp120 revealed a new loop-based mechanism of CD4-binding-site recognition. Virus and antibody gene sequencing revealed concomitant virus evolution and antibody maturation. Notably, the unmutated common ancestor of the CH103 lineage avidly bound the transmitted/founder HIV-1 envelope glycoprotein, and evolution of antibody neutralization breadth was preceded by extensive viral diversification in and near the CH103 epitope. These data determine the viral and antibody evolution leading to induction of a lineage of HIV-1 broadly neutralizing antibodies, and provide insights into strategies to elicit similar antibodies by vaccination.
View details for DOI 10.1038/nature12053
View details for PubMedID 23552890
Continuing research into the global multiple sequence alignment problem has resulted in more sophisticated and principled alignment methods. Unfortunately these new algorithms often require large amounts of time and memory to run, making it nearly impossible to run these algorithms on large datasets. As a solution, we present two general methods, Crumble and Prune, for breaking a phylogenetic alignment problem into smaller, more tractable sub-problems. We call Crumble and Prune meta-alignment methods because they use existing alignment algorithms and can be used with many current alignment programs. Crumble breaks long alignment problems into shorter sub-problems. Prune divides the phylogenetic tree into a collection of smaller trees to reduce the number of sequences in each alignment problem. These methods are orthogonal: they can be applied together to provide better scaling in terms of sequence length and in sequence depth. Both methods partition the problem such that many of the sub-problems can be solved independently. The results are then combined to form a solution to the full alignment problem. Crumble and Prune each provide a significant performance improvement with little loss of accuracy. In some cases, a gain in accuracy was observed. Crumble and Prune were tested on real and simulated data. Furthermore, we have implemented a system called Job-tree that allows hierarchical sub-problems to be solved in parallel on a compute cluster, significantly shortening the run-time. These methods enabled us to solve gigabase alignment problems. These methods could enable a new generation of biologically realistic alignment algorithms to be applied to real world, large scale alignment problems.
View details for DOI 10.1186/1471-2105-12-144
View details for Web of Science ID 000291658600002
View details for PubMedID 21569267
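The length-wise splitting that the abstract above attributes to Crumble can be illustrated with a minimal Python sketch. It is not the published implementation: the abstract does not say how cut points are chosen, so here they are simply supplied by the caller, and `pad_aligner` is a hypothetical stand-in for whatever existing alignment program would be wrapped. All function and parameter names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# An "aligner" takes unaligned sequences and returns equal-length gapped rows.
Aligner = Callable[[List[str]], List[str]]


def pad_aligner(seqs: List[str]) -> List[str]:
    """Hypothetical stand-in aligner: right-pads each sequence with gaps."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]


def crumble(seqs: List[str],
            cuts: Sequence[Tuple[int, ...]],
            align: Aligner) -> List[str]:
    """Split every sequence at the given cut points, align each set of
    pieces independently, and concatenate the aligned pieces row by row.

    Each entry of `cuts` holds one cut position per sequence; entries are
    assumed to be sorted and supplied by the caller.
    """
    bounds = [tuple(0 for _ in seqs), *cuts, tuple(len(s) for s in seqs)]
    rows = ["" for _ in seqs]
    for start, end in zip(bounds, bounds[1:]):
        pieces = [s[a:b] for s, a, b in zip(seqs, start, end)]
        aligned = align(pieces)  # each sub-problem is independent
        rows = [row + piece for row, piece in zip(rows, aligned)]
    return rows


result = crumble(["ACGTAC", "ACTAC"], [(3, 2)], pad_aligner)
```

Because each sub-problem in the loop depends only on its own pieces, the calls to `align` could in principle be dispatched in parallel, which is the property the abstract highlights.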
The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.
View details for DOI 10.1093/nar/gkq1017
View details for Web of Science ID 000285831700136
View details for PubMedID 21037257
Levels of recombination vary among species, among chromosomes within species, and among regions within chromosomes in mammals. This heterogeneity may affect levels of diversity, efficiency of selection, and genome composition, as well as have practical consequences for the genetic mapping of traits. We compared the genetic maps to the genome sequence assemblies of rat, mouse, and human to estimate local recombination rates across these genomes. Humans have greater overall levels of recombination, as well as greater variance. In rat and mouse, the size of the chromosome and proximity to telomere have less effect on local recombination rate than in human. At the chromosome level, rat and mouse X chromosomes have the lowest recombination rates, whereas human chromosome X does not show the same pattern. In all species, local recombination rate is significantly correlated with several sequence variables, including GC%, CpG density, repetitive elements, and the neutral mutation rate, with some pronounced differences between species. Recombination rate in one species is not strongly correlated with the rate in another, when comparing homologous syntenic blocks of the genome. This comparative approach provides additional insight into the causes and consequences of genomic heterogeneity in recombination.
View details for DOI 10.1101/gr.1970304
View details for Web of Science ID 000220629900004
View details for PubMedID 15059993
The rates at which human genomic DNA changes by neutral substitution and insertion of certain families of transposable elements covary in large, megabase-sized segments. We used the rat, mouse, and human genomic DNA sequences to examine these processes in more detail in comparisons over both shorter (rat-mouse) and longer (rodent-primate) times, and demonstrated the generality of the covariation. Different families of transposable elements show distinctive insertion preferences and patterns of variation with substitution rates. SINEs are more abundant in GC-rich DNA, but the regional GC preference for insertion (monitored in young SINEs) differs between rodents and humans. In contrast, insertions in the rodent genomes are predominantly LINEs, which prefer to insert into AT-rich DNA in all three mammals. The insertion frequency of repeats other than SINEs correlates strongly positively with the frequency of substitutions in all species. However, correlations with SINEs show the opposite effects. The correlations are explained only in part by the GC content, indicating that other factors also contribute to the inherent tendency of DNA segments to change over evolutionary time.
View details for DOI 10.1101/gr.1984404
View details for Web of Science ID 000220629900003
View details for PubMedID 15059992
We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
View details for Web of Science ID 000220629900025
View details for PubMedID 15060014
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
View details for DOI 10.1038/nature02426
View details for Web of Science ID 000220540100032
View details for PubMedID 15057822
We construct several score functions for use in locating unusually conserved regions in a genomewide search of aligned DNA from two species. We test these functions on regions of the human genome aligned to the mouse genome. These score functions are derived from properties of neutrally evolving sites on the mouse and human genome and can be adjusted to the local background rate of conservation. The aim of these functions is to try to identify regions of the human genome that are conserved by evolutionary selection because they have an important function, rather than by chance. We use them to get a very rough estimate of the amount of DNA in the human genome that is under selection.
View details for Web of Science ID 000222588300011
View details for PubMedID 15285898
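As a rough illustration of the kind of score the abstract above describes, the following Python sketch scores sliding windows of a pairwise alignment against a background identity rate. It is one simple possibility, not the authors' score functions: it assumes a binomial model in which each aligned column matches independently with probability `neutral_identity`, and that parameter, the window size, and the function name are all hypothetical.

```python
import math
from typing import List


def window_scores(human: str, mouse: str, window: int,
                  neutral_identity: float) -> List[float]:
    """Return a z-score per sliding window: how far the number of identical
    columns lies above or below the count expected under the assumed
    binomial neutral model."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    if not 0.0 < neutral_identity < 1.0:
        raise ValueError("neutral_identity must lie strictly between 0 and 1")
    # A column counts as a match only if both bases agree and are not gaps.
    matches = [int(h == m and h != "-") for h, m in zip(human, mouse)]
    expected = window * neutral_identity
    sd = math.sqrt(window * neutral_identity * (1.0 - neutral_identity))
    scores = []
    for start in range(len(matches) - window + 1):
        observed = sum(matches[start:start + window])
        scores.append((observed - expected) / sd)
    return scores


scores = window_scores("AAAA", "AAAT", window=2, neutral_identity=0.5)
```

Adjusting to the local background rate, as the abstract mentions, would correspond here to passing a different `neutral_identity` for each genomic region rather than a single genome-wide value.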
The University of California Santa Cruz (UCSC) Table Browser (http://genome.ucsc.edu/cgi-bin/hgText) provides text-based access to a large collection of genome assemblies and annotation data stored in the Genome Browser Database. A flexible alternative to the graphical-based Genome Browser, this tool offers an enhanced level of query support that includes restrictions based on field values, free-form SQL queries and combined queries on multiple tables. Output can be filtered to restrict the fields and lines returned, and may be organized into one of several formats, including a simple tab-delimited file that can be loaded into a spreadsheet or database as well as advanced formats that may be uploaded into the Genome Browser as custom annotation tracks. The Table Browser User's Guide located on the UCSC website provides instructions and detailed examples for constructing queries and configuring output.
View details for DOI 10.1093/nar/gkh103
View details for Web of Science ID 000188079000117
View details for PubMedID 14681465
The University of California Santa Cruz (UCSC) Genome Browser Database is an up-to-date source for genome sequence data integrated with a large collection of related annotations. The database is optimized to support fast interactive performance with the web-based UCSC Genome Browser, a tool built on top of the database for rapid visualization and querying of the data at many levels. The annotations for a given genome are displayed in the browser as a series of tracks aligned with the genomic sequence. Sequence data and annotations may also be viewed in a text-based tabular format or downloaded as tab-delimited flat files. The Genome Browser Database, browsing tools and downloadable data files can all be found on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu), which also contains links to documentation and related technical information.
View details for DOI 10.1093/nar/gkg129
View details for Web of Science ID 000181079700009
View details for PubMedID 12519945
Six measures of evolutionary change in the human genome were studied, three derived from the aligned human and mouse genomes in conjunction with the Mouse Genome Sequencing Consortium, consisting of (1) nucleotide substitution per fourfold degenerate site in coding regions, (2) nucleotide substitution per site in relics of transposable elements active only before the human-mouse speciation, and (3) the nonaligning fraction of human DNA that is nonrepetitive or in ancestral repeats; and three derived from human genome data alone, consisting of (4) SNP density, (5) frequency of insertion of transposable elements, and (6) rate of recombination. Features 1 and 2 are measures of nucleotide substitutions at two classes of "neutral" sites, whereas 4 is a measure of recent mutations. Feature 3 is a measure dominated by deletions in mouse, whereas 5 represents insertions in human. It was found that all six vary significantly in megabase-sized regions genome-wide, and many vary together. This indicates that some regions of a genome change slowly by all processes that alter DNA, and others change faster. Regional variation in all processes is correlated with, but not completely accounted for by, GC content in human and the difference between GC content in human and mouse.
View details for DOI 10.1101/gr.844103
View details for Web of Science ID 000180550800002
View details for PubMedID 12529302
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
View details for DOI 10.1038/nature01262
View details for Web of Science ID 000179611600053
View details for PubMedID 12466850
As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.
View details for DOI 10.1101/gr.229102
View details for Web of Science ID 000176433700017
View details for PubMedID 12045153