I am a hybrid computer scientist, statistician and bioinformatician generally interested in genome sciences and medicine. I have worked extensively on metagenomics, cancer genomics and structural variants. My goal is to push forward the field of genomic sciences, in particular personalized genomics and medicine, by integrating data, technology, computation and statistical modeling. My publications have addressed methodological developments in microbial community analysis and, more recently, in structural variant and cancer genome analysis.

Honors & Awards

  • Travel Fellowship, Alzheimer’s Association International Conference (2016)
  • Reviewer's Choice Best Abstract, The American Society of Human Genetics Annual Meeting (2015)
  • Travel Fellowship, Bayer International Computational Biology Workshop (2014)
  • Dissertation Year Fellowship, University of Southern California (2012)
  • Merit Fellowship, University of Southern California (2006-2007)

Boards, Advisory Committees, Professional Organizations

  • Program Committee Co-chair, COMMAND workshop of the IEEE Bioinformatics and Biomedicine Conference 2015 (2015)

Professional Education

  • Doctor of Philosophy, University of Southern California (2013)
  • Master of Science, University of Southern California, Los Angeles, US, Statistics (2012)
  • Master of Science, University of Southern California, Los Angeles, US, Computer Science (2008)
  • Master of Science, Fudan University, Shanghai, China, Physics (Theoretical Physics) (2006)
  • Bachelor of Science, Fudan University, Shanghai, China, Electronics Engineering (2003)

All Publications

  • Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains BMC BIOINFORMATICS Xia, L. C., Ai, D., Cram, J. A., Liang, X., Fuhrman, J. A., Sun, F. 2015; 16
  • Efficient statistical significance approximation for local similarity analysis of high-throughput time series data BIOINFORMATICS Xia, L. C., Ai, D., Cram, J., Fuhrman, J. A., Sun, F. 2013; 29 (2): 230-237


    Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies otherwise prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples. The new approach is implemented in our eLSA package, which now provides pipelines for faster local similarity analysis of time series data. The tool is freely available from eLSA's website. Data are available at Bioinformatics.

    View details for DOI 10.1093/bioinformatics/bts668

    View details for Web of Science ID 000313722800011

    View details for PubMedID 23178636

  • Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates BMC SYSTEMS BIOLOGY Xia, L. C., Steele, J. A., Cram, J. A., Cardon, Z. G., Simmons, S. L., Vallino, J. J., Fuhrman, J. A., Sun, F. 2011; 5


    The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of the local similarity (LS) score and to obtain its confidence interval. We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed the capability of eLSA to capture subinterval and time-delayed associations. We implemented the eLSA technique in an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction steps. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified. The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond that of ordinary correlation analysis. These statistically significant associations can provide insights into the real dynamics of biological systems. The newly designed eLSA software efficiently streamlines the analysis and is freely available from the eLSA homepage.

    View details for DOI 10.1186/1752-0509-5-S2-S15

    View details for Web of Science ID 000301987000015

    View details for PubMedID 22784572
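The local similarity (LS) score described in the abstract above can be illustrated with a small dynamic program. This is a minimal sketch and not the eLSA package's code: the function name `local_similarity` and the parameter `max_delay` are hypothetical, the two series are assumed to be already normalized and of equal length, and replicates, confidence intervals and significance evaluation are left out.

```python
def local_similarity(x, y, max_delay=0):
    """Signed local similarity score of two equal-length, pre-normalized series.

    Illustrative sketch only; names and defaults are assumptions.
    """
    n = len(x)
    if n == 0 or len(y) != n:
        raise ValueError("series must be non-empty and of equal length")

    # pos[i][j] / neg[i][j]: best partial sum of products (positive or
    # negative association) over aligned subintervals ending at x[i-1], y[j-1].
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best_pos = 0.0
    best_neg = 0.0

    for i in range(1, n + 1):
        # Only pairs whose time shift |i - j| is within max_delay are compared.
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            # Restart the subinterval whenever the running sum drops below 0.
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])

    # Report the stronger of the two associations, scaled by series length.
    if best_pos >= best_neg:
        return best_pos / n
    return -best_neg / n
```

For example, `local_similarity([0, 1, 2, 0], [1, 2, 0, 0], max_delay=1)` picks up the one-step shift between the two series, which the same call with `max_delay=0` scores lower.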

  • Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads PLOS ONE Xia, L. C., Cram, J. A., Chen, T., Fuhrman, J. A., Sun, F. 2011; 6 (12)


    Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or variants thereof, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.

    View details for DOI 10.1371/journal.pone.0027992

    View details for Web of Science ID 000298173500008

    View details for PubMedID 22162995
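The mixture-model idea in the abstract above (ambiguous read assignments resolved by maximum likelihood, then a correction for genome size) can be sketched with a plain expectation-maximization loop. This is a simplified illustration under stated assumptions, not GRAMMy's implementation: `estimate_abundance`, `read_probs` and `genome_lengths` are hypothetical names, and `read_probs[r][j]` stands in for whatever per-read likelihood an upstream mapping or alignment tool would supply (here it can simply be 1 for a hit and 0 otherwise).

```python
def estimate_abundance(read_probs, genome_lengths, n_iter=200, tol=1e-10):
    """EM estimate of genome relative abundance from ambiguous read hits.

    Illustrative sketch only; names and inputs are assumptions.
    read_probs[r][j]: likelihood of read r under reference genome j.
    genome_lengths[j]: length of reference genome j in bases.
    """
    n_genomes = len(genome_lengths)
    mix = [1.0 / n_genomes] * n_genomes  # initial mixing proportions

    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        for probs in read_probs:
            # E-step: share each read among genomes by posterior weight.
            weights = [m * p for m, p in zip(mix, probs)]
            total = sum(weights)
            if total == 0.0:
                continue  # read matches no reference genome; skip it
            for j, w in enumerate(weights):
                counts[j] += w / total
        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no read could be assigned to any genome")
        # M-step: mixing proportions are the expected read fractions.
        new_mix = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(mix, new_mix)) < tol
        mix = new_mix
        if converged:
            break

    # Longer genomes yield more reads per cell, so divide by genome length
    # before renormalizing to obtain relative abundances.
    per_base = [m / length for m, length in zip(mix, genome_lengths)]
    scale = sum(per_base)
    return [v / scale for v in per_base]
```

With two genomes of 1,000 and 2,000 bases, two reads unique to the first and four unique to the second, the read fractions are 1/3 and 2/3, but the length-corrected abundances come out equal.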

  • Phase transition in sequence unique reconstruction JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY Xia, L., Zhou, C. 2007; 20 (1): 18-29
  • Pan-cancer analysis of the extent and consequences of intratumor heterogeneity NATURE MEDICINE Andor, N., Graham, T. A., Jansen, M., Xia, L. C., Aktipis, C. A., Petritsch, C., Ji, H. P., Maley, C. C. 2016; 22 (1): 105-?

    View details for DOI 10.1038/nm.3984

    View details for Web of Science ID 000367590700022

  • Cross-depth analysis of marine bacterial networks suggests downward propagation of temporal changes ISME JOURNAL Cram, J. A., Xia, L. C., Needham, D. M., Sachdeva, R., Sun, F., Fuhrman, J. A. 2015; 9 (12): 2573-2586


    Interactions among microbes and stratification across depths are both believed to be important drivers of microbial communities, though little is known about how microbial associations differ between and across depths. We have monitored the free-living microbial community at the San Pedro Ocean Time-series station, monthly, for a decade, at five different depths: 5 m, the deep chlorophyll maximum layer, 150 m, 500 m and 890 m (just above the sea floor). Here, we introduce microbial association networks that combine data from multiple ocean depths to investigate both within- and between-depth relationships, sometimes time-lagged, among microbes and environmental parameters. The euphotic zone, deep chlorophyll maximum and 890 m depth each contain two negatively correlated 'modules' (groups of many inter-correlated bacteria and environmental conditions) suggesting regular transitions between two contrasting environmental states. Two-thirds of pairwise correlations of bacterial taxa between depths lagged such that changes in the abundance of deeper organisms followed changes in shallower organisms. Taken in conjunction with previous observations of seasonality at 890 m, these trends suggest that planktonic microbial communities throughout the water column are linked to environmental conditions and/or microbial communities in overlying waters. Poorly understood groups including Marine Group A, Nitrospina and AEGEAN-169 clades contained taxa that showed diverse association patterns, suggesting these groups contain multiple ecological species, each shaped by different factors, which we have started to delineate. These observations build upon previous work at this location, lending further credence to the hypothesis that sinking particles and vertically migrating animals transport materials that significantly shape the time-varying patterns of microbial community composition.

    View details for DOI 10.1038/ismej.2015.76

    View details for Web of Science ID 000365094400004

    View details for PubMedID 25989373

  • Emergence of Hemagglutinin Mutations During the Course of Influenza Infection. Scientific reports Cushing, A., Kamali, A., Winters, M., Hopmans, E. S., Bell, J. M., Grimes, S. M., Xia, L. C., Zhang, N. R., Moss, R. B., Holodniy, M., Ji, H. P. 2015; 5: 16178-?


    Influenza remains a significant cause of disease mortality. The ongoing threat of influenza infection is partly attributable to the emergence of new mutations in the influenza genome. Among the influenza viral gene products, the hemagglutinin (HA) glycoprotein plays a critical role in influenza pathogenesis, is the target for vaccines and accumulates new mutations that may alter the efficacy of immunization. To study the emergence of HA mutations during the course of infection, we employed a deep-targeted sequencing method. We used samples from 17 patients with active H1N1 or H3N2 influenza infections. These patients were not treated with antivirals. In addition, we had samples from five patients who were analyzed longitudinally. Thus, we determined the quantitative changes in the fractional representation of HA mutations during the course of infection. Across individuals in the study, a series of novel HA mutations that directly altered the HA coding sequence were identified. Serial viral sampling revealed HA mutations that either were stable, expanded or were reduced in representation during the course of the infection. Overall, we demonstrated the emergence of unique mutations specific to an infected individual and temporal genetic variation during infection.

    View details for DOI 10.1038/srep16178

    View details for PubMedID 26538451

  • Extended Local Similarity Analysis (eLSA) of Biological Data Encyclopedia of Metagenomics: Genes, Genomes and Metagenomes. Basics, Methods, Databases and Tools Sun, F., Xia, L. edited by Nelson, K. Springer. 2014
  • Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads Encyclopedia of Metagenomics: Genes, Genomes and Metagenomes. Basics, Methods, Databases and Tools Sun, F., Xia, L. C. edited by Nelson, K. Springer. 2014
  • A Quantitative Evaluation of Health Care System in US, China, and Sweden Health Med Wang, Q., Li, M., Zu, H., Gao, M., Cao, C., Xia, L. C. 2013; 7 (4)
  • Genetic analysis of differentiation of T-helper lymphocytes GENETICS AND MOLECULAR RESEARCH Wang, Q., Li, M., Xia, L. C., Wen, G., Zu, H., Gao, M. 2013; 12 (2): 972-987


    In the human immune system, T-helper cells are able to differentiate into two lymphocyte subsets: Th1 and Th2. The intracellular signaling pathways of differentiation form a dynamic regulation network by secreting distinctive types of cytokines, while differentiation is regulated by two major gene loci: T-bet and GATA-3. We developed a system dynamics model to simulate the differentiation and re-differentiation process of T-helper cells, based on gene expression levels of T-bet and GATA-3 during differentiation of these cells. We arrived at three ultimate states of the model and came to the conclusion that cell differentiation potential exists as long as the system dynamics is at an unstable equilibrium point; the T-helper cells will no longer have the potential of differentiation when the model reaches a stable equilibrium point. In addition, the time lag caused by expression of transcription factors can lead to oscillations in the secretion of cytokines during differentiation.

    View details for DOI 10.4238/2013.April.2.13

    View details for Web of Science ID 000320030100011

    View details for PubMedID 23613243

  • Marine bacterial, archaeal and protistan association networks reveal ecological linkages ISME JOURNAL Steele, J. A., Countway, P. D., Xia, L., Vigil, P. D., Beman, J. M., Kim, D. Y., Chow, C. T., Sachdeva, R., Jones, A. C., Schwalbach, M. S., Rose, J. M., Hewson, I., Patel, A., Sun, F., Caron, D. A., Fuhrman, J. A. 2011; 5 (9): 1414-1425


    Microbes have central roles in ocean food webs and global biogeochemical processes, yet specific ecological relationships among these taxa are largely unknown. This is in part due to the dilute, microscopic nature of the planktonic microbial community, which prevents direct observation of their interactions. Here, we use a holistic (that is, microbial system-wide) approach to investigate time-dependent variations among taxa from all three domains of life in a marine microbial community. We investigated the community composition of bacteria, archaea and protists through cultivation-independent methods, along with total bacterial and viral abundance, and physico-chemical observations. Samples and observations were collected monthly over 3 years at a well-described ocean time-series site of southern California. To find associations among these organisms, we calculated time-dependent rank correlations (that is, local similarity correlations) among relative abundances of bacteria, archaea, protists, total abundance of bacteria and viruses and physico-chemical parameters. We used a network generated from these statistical correlations to visualize and identify time-dependent associations among ecologically important taxa, for example, the SAR11 cluster, stramenopiles, alveolates, cyanobacteria and ammonia-oxidizing archaea. Negative correlations, perhaps suggesting competition or predation, were also common. The analysis revealed a progression of microbial communities through time, and also a group of unknown eukaryotes that were highly correlated with dinoflagellates, indicating possible symbioses or parasitism. Possible 'keystone' species were evident. The network has statistical features similar to previously described ecological networks, and in network parlance has non-random, small world properties (that is, highly interconnected nodes). This approach provides new insights into the natural history of microbes.

    View details for DOI 10.1038/ismej.2011.24

    View details for Web of Science ID 000295782900003

    View details for PubMedID 21430787

  • PPLook: an automated data mining tool for protein-protein interaction BMC BIOINFORMATICS Zhang, S., Li, Y., Xia, L., Pan, Q. 2010; 11


    Extracting and visualizing protein-protein interactions (PPI) from the text literature is a meaningful topic in protein science. It assists the identification of interactions among proteins. There is a lack of tools to extract PPI and to visualize and classify the results. We developed a PPI search system, termed PPLook, which automatically extracts and visualizes protein-protein interaction (PPI) from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keywords dictionary pattern-matching algorithm, and display the topological parameters, such as the number of nodes, edges, and connected components. The visualization component of PPLook enables us to view the interaction relationship among the proteins in a three-dimensional space based on the OpenGL graphics interface technology. PPLook can also select a protein semantic class, count the proteins of that semantic class which interact with the query protein, and count the articles in which an interaction involving the query protein appears. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface. PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available for non-commercial users.

    View details for DOI 10.1186/1471-2105-11-326

    View details for Web of Science ID 000280331700002

    View details for PubMedID 20550717

  • Oligonucleotide profiling for discriminating bacteria in bacterial communities COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING He, P., Xia, L. 2007; 10 (4): 247-255


    Based on the relative ratios of di- and tri-nucleotides in the DNA sequences, the profiles of 164 genome sequences from 152 representative microbial organisms were computed. By comparing the profiles of the genomes and their substrings with length 500 bps, the fluctuations of the relative abundances of di- and tri-nucleotides of these genomic sequences were analyzed. A new method to discriminate the origins of orphan DNA sequences was proposed, and the origins of 17 uncultured bacterium sequences from a bacterial community in the human gut were postulated and discussed.

    View details for Web of Science ID 000247022400002

    View details for PubMedID 17506707
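The oligonucleotide profiling abstract above does not spell out how the relative ratios are defined. As a rough illustration, the sketch below assumes the commonly used observed-over-expected ratio for dinucleotides (the frequency of a dinucleotide divided by the product of its two base frequencies); the paper's exact definition may differ, trinucleotides are omitted, and the function names `dinucleotide_profile` and `profile_distance` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_profile(seq):
    """Observed/expected ratio for each of the 16 dinucleotides in seq.

    Illustrative sketch only; the ratio definition is an assumption.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES  # skip N and other codes
    )
    n_mono = sum(mono.values())
    n_pairs = sum(pairs.values())

    profile = {}
    for a, b in product(BASES, repeat=2):
        expected = (mono[a] / n_mono) * (mono[b] / n_mono) if n_mono else 0.0
        observed = pairs[a + b] / n_pairs if n_pairs else 0.0
        profile[a + b] = observed / expected if expected > 0 else 0.0
    return profile


def profile_distance(p, q):
    """Mean absolute difference between two profiles over the same keys."""
    return sum(abs(p[k] - q[k]) for k in p) / len(p)
```

Under these assumptions, an orphan fragment could be compared against each candidate genome's profile with `profile_distance`, taking the smallest distance as the postulated origin.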