Recent Publications

  • Mechanisms for U2AF to define 3′ splice sites and regulate alternative splicing in the human genome

    Details PDF Dataset Project

  • MicroRNA Directly Enhances Mitochondrial Translation during Muscle Differentiation

    Details PDF Project

  • Multiplex analysis of polyA-linked sequences (MAPS): an RNA-seq strategy to profile poly(A+) RNA

    Details PDF Project

  • SR proteins collaborate with 7SK and promoter-associated nascent RNA to release paused polymerase

    Details PDF Dataset Project

  • Genome-wide analysis reveals SR protein cooperation and competition in regulated splicing

    Details PDF Project

  • Transcriptome and Proteome Exploration to Provide a Resource for the Study of Agrocybe aegerita

    Details PDF Project

  • Direct conversion of fibroblasts to neurons by reprogramming PTB-regulated microRNA circuits

    Details PDF Dataset Project

  • Deep insight into the Ganoderma lucidum by comprehensive analysis of its transcriptome

    Details PDF Project

  • Nuclear matrix factor hnRNP U/SAF-A exerts a global control of alternative splicing by regulating U2 snRNP maturation

    Details PDF Project

Recent Posts

This is a home for various data science projects, where I try to analyse interesting collaborative datasets and build pretty, informative graphics. Example: Document your workflow. In this example, the coder doesn’t need to present every line of code, but rather the overall process of loading, crunching, and reporting the data, so that another scientist can understand the whole process and, if necessary, replicate it. References, links, and the provenance of data files matter more here, so the reader can understand where the datasets come from.

Read more

I picked up the R programming language during my MSc at the University of California, San Diego, and use it constantly in my day job, along with some Python. For fun I sometimes apply these tools to interesting-looking datasets that are lying around the web, and try to tell their stories through well-designed data visualisations. Some blog posts are mirrored on R-bloggers, a blogging community for the R language. Useful bash one-liners for bioinformatics (and some that are more generally useful).

Read more

Projects

Teaching

I am a senior training instructor at the company for the following subjects:

Advanced Data Science

Provides an intensive introduction to applied statistics and data visualization. Trains people to become data analysts capable of both applied data analysis and critical evaluation of statistical methods. Since both data analysis and methods development require substantial hands-on experience, the course focuses on hands-on data analysis.

Genomic Data Visualization

With Chao Chen and Joseph Wei, we created a course specialization in genomic data visualization. Some cool things about the program are:

  • Unix & High-Performance Computing

  • NGS data analysis (RNA-Seq, ChIP-Seq, Variant calling)

  • Statistical Visualization using R/ggplot2

Each training runs monthly. The program is designed from the ground up to cover modern genomics, including Python, R, ggplot2, Bioconductor, statistics, computing, and genomic technologies. All of my lecture materials for these courses are open source and available from the website.

Contact