I am a Bioinformatics Research Scientist in the Department of Genomic Data Analysis at AB Life Inc., Wuhan, China.
My current goal is to combine cutting-edge computational and statistical learning methods with sequencing and biochemical techniques to advance basic science and therapeutics.
My daily work covers four major focus areas:
• Build or contribute to several projects that have substantially sped up our computational biology workflows, including whole-exome sequencing (quality control, mapping, re-alignment, variant calling, prioritization), RNA-seq (quantification and de novo transcriptome analysis), genome assembly (reference-guided and de novo), and epigenomics.
• Develop new methods for microbial community analysis to study individual microbes and pathogens, and how host diet and environment influence them.
• Devise and execute translational bioinformatics and clinical research projects that integrate high-throughput genomic technologies with medical records.
• Create SOPs by standardizing and maintaining summary reports, and supervise data analysis and customized graphical presentations across the company.
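As one concrete illustration of the kind of calculation that sits inside microbial community analysis (this is a standard textbook metric, not the new method mentioned above, which is not described here), the sketch below computes the Shannon diversity index H' = -Σ p_i ln p_i from per-taxon read counts. The function name `shannon_diversity` and the example taxa are hypothetical.

```python
import math
from collections.abc import Mapping


def shannon_diversity(counts: Mapping[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts.

    Hypothetical helper for illustration. Zero counts are skipped because
    p * ln(p) tends to 0 as p tends to 0.
    """
    if any(n < 0 for n in counts.values()):
        raise ValueError("counts must be non-negative")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one positive value")

    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total          # relative abundance of this taxon
            h -= p * math.log(p)   # natural log, so H' is in nats
    return h


if __name__ == "__main__":
    # Hypothetical sample: two equally abundant genera give H' = ln 2.
    print(shannon_diversity({"Bacteroides": 10, "Prevotella": 10}))
```

With two equally abundant taxa the result is ln 2 (about 0.693), and a sample dominated by a single taxon gives 0. Natural logarithms are used here; some tools report H' in bits (log base 2) instead, so values are only comparable when the base matches.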
PhD in Biomedical Sciences, 2014
Wuhan University
Visiting graduate in Bioinformatics, 2011
Department of Cellular and Molecular Medicine, UC San Diego
BSc in Software Engineering, 2009
Wuhan University
This is a home for various data science projects, where I analyse interesting collaborative datasets and build attractive, informative graphics.
Example: Document your workflow. In this example, the coder doesn't need to present every line of code, but rather the overall process of loading, crunching, and reporting the data, so that another scientist can understand the whole process and, if necessary, replicate it. References, links, and provenance of data files matter more here, so the reader can see where the datasets come from.
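One lightweight way to capture data provenance while documenting a workflow is to record, for each input file, where it came from and a checksum of its contents, so a reader can confirm they are working with the same data. The sketch below assumes that approach; the helper names `sha256_of` and `provenance_record`, and the example URL, are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large sequencing files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(path: Path, source_url: str) -> dict:
    """Hypothetical helper: describe one input file (name, origin, checksum, size, time)."""
    return {
        "file": path.name,
        "source_url": source_url,
        "sha256": sha256_of(path),
        "size_bytes": path.stat().st_size,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Hypothetical input file; write the record next to the analysis report.
    data = Path("counts_table.tsv")
    if data.exists():
        record = provenance_record(data, "https://example.org/counts_table.tsv")
        Path("provenance.json").write_text(json.dumps(record, indent=2))
```

Storing the checksum alongside the source link means that if the upstream file is later replaced, the mismatch is detectable rather than silently changing downstream results.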
I picked up the R programming language as a visiting graduate student at the University of California San Diego, and use it constantly in my day job, along with some Python. For fun, I sometimes apply these tools to interesting-looking datasets lying around the web and try to tell their stories through well-designed data visualisations. Some blog posts are mirrored on R-bloggers, a blogging community for the R language.
Useful bash one-liners for bioinformatics (and some more generally useful ones).
General splicing analysis materials.
General BS-seq analysis materials.
General CLIP-seq analysis materials.
General ChIP-seq analysis materials.
General Exome-seq analysis materials.
General Hi-C analysis materials.
General Metagenomics analysis materials.
General RNA-seq analysis materials.
General sRNA-seq analysis materials.
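A typical bash one-liner for quick FASTQ checks is something like `awk 'NR % 4 == 2 { n++; len += length($0) } END { print n, len / n }' reads.fastq`, which counts reads and their mean length by reading every second line of each 4-line record. A Python equivalent, which also reports GC fraction and handles gzip-compressed input, might look like the sketch below. It assumes standard 4-line FASTQ records (no wrapped sequence lines); the function names and file name are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable, TextIO


def open_fastq(path: Path) -> TextIO:
    """Open plain or gzip-compressed (*.gz) FASTQ in text mode."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return path.open("r")


def fastq_stats(lines: Iterable[str]) -> tuple[int, float, float]:
    """Return (read count, mean read length, GC fraction).

    Assumes 4-line records: header, sequence, '+', qualities.
    """
    reads = bases = gc = 0
    for line_no, line in enumerate(lines):
        if line_no % 4 == 1:  # the sequence is the 2nd line of each record
            seq = line.strip().upper()
            reads += 1
            bases += len(seq)
            gc += seq.count("G") + seq.count("C")
    if reads == 0:
        return 0, 0.0, 0.0
    gc_fraction = gc / bases if bases else 0.0
    return reads, bases / reads, gc_fraction


if __name__ == "__main__":
    sample = Path("sample_R1.fastq.gz")  # hypothetical input file
    if sample.exists():
        with open_fastq(sample) as fh:
            print(fastq_stats(fh))
```

The awk version is faster to type for a one-off check; the Python version is easier to extend (for example, adding a quality-score summary) and to reuse inside a larger QC script.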
I am a senior training instructor at the company for the following subjects:
Provides an intensive introduction to applied statistics and data visualization. Trains people to become data analysts capable of both applied data analysis and critical evaluation of statistical methods. Since both data analysis and methods development require substantial hands-on experience, the course focuses on hands-on data analysis.
Together with Chao Chen and Joseph Wei, I created a course specialization in genomic data visualization. Some cool things about the program are:
Unix & High-Performance Computing
NGS data analysis (RNA-Seq, ChIP-Seq, Variant calling)
Statistical Visualization using R/ggplot2
Each training course runs every month. The program is designed from the ground up to cover modern genomics, including Python, R, ggplot2, Bioconductor, statistics, computing, and genomic technologies. All of my lecture materials for these courses are open source and available from the website.