Key Projects
- Toxic Silicosis Study: I led the RNA-Seq transcriptomics analysis for a study of crystalline silica exposure in smokers, aimed at strengthening the data underpinning public health guidelines. Using limma-voom, I identified differentially expressed genes (DEGs) and assessed their biological significance, and I generated visualizations in R to aid interpretation of the results. I also applied Gene Set Enrichment Analysis (GSEA) and KEGG pathway analysis to identify biological pathways involved in the toxic effects of silica exposure. The work has resulted in a manuscript currently under submission, which contributes evidence-driven insights into the molecular mechanisms underlying silicosis and supports the refinement of public health guidelines.
- Legionella Nextflow Workflow Development: Developed a custom Nextflow pipeline for genomic analysis of Legionella isolates, designed for scalability and ease of use by public health teams. The pipeline accepts FASTQ and FASTA inputs and produces a QC report, a phylogenetic tree, and a SNP distance matrix. It integrates species identification with Mash, sequence-based typing (SBT) with el_gato, and a reference-free phylogenetic method for assessing isolate relatedness. By automating data processing and analysis, the workflow supports epidemiological surveillance of Legionella.
- Sourmash Nextflow Workflow: Developed and implemented a sourmash-based Nextflow pipeline for microbial genome comparison. The pipeline accepts both FASTA and FASTQ input and includes modules for quality control (FastQC, fastp), genomic sketching (sourmash), and multisearch analysis. It automates pairwise similarity comparisons and Average Nucleotide Identity (ANI) estimation, with results summarized in a final MultiQC report. Built with Nextflow, the pipeline is portable across compute infrastructures and reproducible, using Docker/Singularity containers for straightforward installation and deployment.
- Crankshaft Contribution: Contributed to the Crankshaft project, a Rust-based headless workflow engine designed for scalable bioinformatics analysis.
- Aquascope & C-WAP Pipeline Development: Contributed enhancements to the CDCgov/aquascope pipeline, a best-practice bioinformatics pipeline for early detection of SARS-CoV-2 variants of concern through shotgun metagenomic sequencing of wastewater. Built with Nextflow, the pipeline is portable and reproducible across compute infrastructures and uses Docker/Singularity containers for easy installation. My work included modifying the pipeline to accept both short- and long-read input, upgrading software versions, and debugging issues to improve functionality. The pipeline includes modules for read QC (FastQC), trimming (fastp), read alignment (minimap2), variant classification (Freyja), and quality reporting (MultiQC).
- AI/ML for Cervical Cancer: Applied machine learning methods (k-nearest neighbors (KNN), support vector machines (SVM), and random forest) to build predictive models aimed at improving early detection of cervical cancer.
- Single-Cell RNA-Seq (scRNA-Seq) Analysis Using Seurat: Conducted in-depth analysis of scRNA-Seq datasets, using Seurat for quality control (QC), normalization, clustering, and differential expression analysis.
- Integration of Multi-Modal Data: Integrated scRNA-Seq data with other modalities such as scATAC-Seq, using Seurat and other R packages, to better characterize cellular regulation and gene expression dynamics.