A list of projects I've led or contributed to. All are open source and available on GitHub:
TypeScript, PL/pgSQL, PostGraphile, Rust, Python, Next.js, TailwindCSS, Docker
RummaGEO
Automatically generated signatures from GEO

The Gene Expression Omnibus (GEO) is a major open biomedical research repository for transcriptomics and other omics datasets. It currently contains millions of gene expression samples from tens of thousands of studies collected by many biomedical research laboratories from around the world. While users of the GEO repository can search the metadata describing studies and samples to locate relevant studies, there is currently no method or resource that facilitates global search of GEO at the data level. To address this shortcoming, we developed RummaGEO, a webserver application that enables gene expression signature search against all human and mouse RNA-seq studies deposited into GEO. To enable such a search engine, we performed offline automatic identification of conditions from uniformly aligned GEO studies available from ARCHS4, and then computed differential expression signatures to extract gene sets from these signatures. In total, RummaGEO currently contains 178,975 human and 203,427 mouse gene sets from 30,576 GEO studies. Overall, RummaGEO provides an unprecedented resource for the biomedical research community, enabling hypothesis generation for many future studies.
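To make the idea of searching a gene set library concrete, here is a minimal sketch. The description above does not say how RummaGEO scores matches, so this assumes a right-tailed hypergeometric overlap test, a common choice for gene set comparison. The function names, gene symbols, and library contents are all hypothetical.

```python
from math import comb


def overlap_pvalue(query, gene_set, background_size):
    """Right-tailed hypergeometric p-value for the overlap of two gene sets.

    Assumption: every gene is drawn from a shared background of
    `background_size` genes. This is an illustrative scoring choice,
    not necessarily the one RummaGEO uses.
    """
    k = len(query & gene_set)  # observed overlap
    n = len(query)             # size of the query set
    K = len(gene_set)          # size of the library set
    N = background_size
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return tail / comb(N, n)


def search(query, library, background_size):
    """Return (name, overlap size, p-value) tuples, most significant first."""
    hits = [
        (name, len(query & genes), overlap_pvalue(query, genes, background_size))
        for name, genes in library.items()
    ]
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical library of two signatures and a hypothetical query.
library = {
    "study_1_up": {"GENE_A", "GENE_B", "GENE_C", "GENE_X"},
    "study_2_up": {"GENE_D", "GENE_Y", "GENE_Z", "GENE_W"},
}
query = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
results = search(query, library, background_size=20)
```

In this toy example the set sharing three genes with the query ranks ahead of the set sharing one. A real search over hundreds of thousands of gene sets would also need multiple-testing correction and a faster overlap computation.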

React, PL/pgSQL, Python, Next.js, MaterialUI, Docker
TargetRanger
Immunotherapy target discovery

TargetRanger is a web-server application that identifies targets from user-inputted RNA-seq samples collected from the cells we wish to target. By comparing the inputted samples with processed RNA-seq and proteomics data from several atlases, TargetRanger identifies genes that are highly expressed in the target cells while lowly expressed across normal human cell types, tissues, and cell lines.
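The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the scoring rule (mean expression in the target samples minus the highest mean across normal tissues) is a simplification chosen here, not the statistic TargetRanger uses, and the function name, gene symbols, and values are hypothetical.

```python
from statistics import mean


def rank_targets(target, normal):
    """Rank genes by how far target expression exceeds normal expression.

    target: gene -> list of expression values in the target samples
    normal: gene -> {tissue name: list of expression values}

    Assumption: values are on a comparable (e.g. log-transformed) scale.
    """
    scores = {}
    for gene, values in target.items():
        tissues = normal.get(gene)
        if not tissues:
            continue  # no normal reference for this gene, so skip it
        # The most highly expressing normal tissue is the safety-limiting case.
        highest_normal = max(mean(v) for v in tissues.values())
        scores[gene] = mean(values) - highest_normal
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: GENE_A is quiet in normal tissue, GENE_B is not.
target = {"GENE_A": [9.0, 10.0], "GENE_B": [9.0, 9.5]}
normal = {
    "GENE_A": {"liver": [1.0, 2.0], "brain": [0.5, 1.5]},
    "GENE_B": {"liver": [8.0, 9.0], "brain": [1.0, 1.0]},
}
ranked = rank_targets(target, normal)
```

Scoring against the highest normal tissue rather than the average reflects the idea that a single tissue with high expression is enough to make a candidate unsafe.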

Python, Flask, Docker
D2H2
Diabetes Data and Hypothesis Hub (D2H2)

There is a rapid growth in the production of omics datasets collected by the diabetes research community. However, such published data are underutilized for knowledge discovery. To make bioinformatics tools and published omics datasets from the diabetes field more accessible to biomedical researchers, we developed the Diabetes Data and Hypothesis Hub (D2H2).

Python
MadHappy
Applying real-time video filters based on emotional state

We use a deep learning model to detect the user's emotional state in real time and, based on that state, apply a video filter to the user's face. Emotion detection uses TensorFlow, and the filter is applied with the OpenCV computer vision library.

Python, TensorFlow
Plant Cell Segmentor
Deep learning framework for plant cell segmentation

A deep learning framework for 2D plant cell segmentation. This model is based on the paper by Wolny et al., “Accurate and versatile 3D segmentation of plant tissues at cellular resolution.” The model architecture is in segmentor.py, and the training, testing, and visualization functions are in assignment.py. Data is available at https://osf.io/uzq3w; the sets used in preprocessing and testing were the LateralRootPrimordia images in the test and train folders. With only a small set of these images, the model achieved 97% accuracy.

Python, Appyter, Jinja2
Tumor Gene Target Screener (Appyter)
Gene expression across human cell types and tissues

This Appyter is inspired by the work of Bosse et al., which compared neuroblastomas with normal tissue in GTEx to identify a promising candidate immunotherapeutic target. The goal is to allow rapid screening of targets with the help of normal tissue data from GTEx and GEO data through ARCHS4, as well as single-cell data from Tabula Sapiens and the Human Cell Atlas. The Appyter takes tumor expression data and attempts to rank significantly differentially expressed genes when compared with either bulk RNA-seq data from GTEx or ARCHS4, or single-cell RNA-seq data from Tabula Sapiens or the Human Cell Atlas, across all tissues. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. GTEx Version 8 gene counts were processed to produce gene summary statistics. ARCHS4 provides access to gene counts from HiSeq 2000, HiSeq 2500 and NextSeq 500 platforms for human and mouse experiments from GEO and SRA. We processed ARCHS4 Version 11 to produce gene summary statistics. The Tabula Sapiens dataset was created by The Tabula Sapiens Consortium. We processed the Tabula Sapiens dataset to produce gene summary statistics. The Human Cell Atlas provides access to single-cell data contributed by the scientific community. We combined and processed 15 datasets from the Human Cell Atlas to produce gene summary statistics. Immunotherapeutic candidates must have limited expression in normal tissues to be considered safe targets, so proteomic visualizations of the highly expressed genes in normal tissues may be useful in assessing gene candidacy. Proteomics data were obtained from the Human Protein Atlas with IHC-based expression profiling, the Human Proteome Map with MS-based expression quantification, and a GTEx proteome project using TMT MS.

Python, Jinja2, Svelte, Appyter, Docker
Multiomics2Targets
Identification of Cell Surface Targets and Driver Kinases from Multiomics Data

The availability of data from the profiling of cancer patients with multiomics technologies is rapidly increasing. However, integrative analysis of such data for knowledge extraction and practical hypothesis generation for clinical applications is not trivial. Here we present Multiomics2Targets, a bioinformatics workflow that enables users to upload three data matrices collected from the same cohorts of cancer patients. After uploading transcriptomics, proteomics, and phosphoproteomics data matrices as well as accompanying metadata, Multiomics2Targets produces a report that resembles a research publication. The uploaded data matrices are processed, analyzed, and visualized using the tools Enrichr, KEA3, ChEA3, Expression2Kinases, and TargetRanger to produce ~80 figures and ~30 tables. Figure and table legends, as well as descriptions of the methods and results, are provided. The reports include abstract, introduction, methods, results, discussion, conclusions, and references sections. Multiomics2Targets reports can be exported as PDF or Jupyter Notebooks, and can be cited. Additionally, since the pipeline is implemented as a Jupyter Notebook, the source code used to perform the analysis and produce the report is embedded within the report and can be easily viewed, modified, and run locally. Multiomics2Targets can also be used to perform alternative analyses when only one or two omics datasets are uploaded.

Python, Flask, Docker
lncHUB2
Functional predictions of human long non-coding RNAs

A long non-coding RNA (lncRNA) is a transcript with more than 200 nucleotides that is not translated into protein. Based on gene-gene co-expression correlations created from ARCHS4's processed RNA-seq samples, we present 18,705 human and 11,274 mouse landing pages for long non-coding RNAs that include expression statistics across tissues and cell lines, predicted biological functions, pathway membership, subcellular localization, and predicted small molecules and CRISPR KO genes that may regulate their expression.

MATLAB
Hippocampal Replay
Simulating Hippocampal Replay with Reinforcement Learning

The hippocampus is a brain region that plays a key role in memory formation and recall. The hippocampus replays memories, which is thought to be important for memory consolidation. However, the mechanisms underlying hippocampal replay are not well understood. In this project, we use reinforcement learning to simulate hippocampal replay. We train an agent to navigate a maze and then replay the agent's trajectory.
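The train-then-replay loop can be illustrated with a short sketch. The project itself is written in MATLAB and the description does not name a learning rule, so this Python version assumes tabular Q-learning on a grid maze and replays the stored trajectory in reverse order. The maze layout, function names, and parameter values are all hypothetical.

```python
import random

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(maze, state, action):
    """Move one cell; walls ('#') and borders leave the agent in place."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(maze) and 0 <= new_col < len(maze[0])
    if inside and maze[new_row][new_col] != "#":
        state = (new_row, new_col)
    reward = 1.0 if maze[state[0]][state[1]] == "G" else 0.0
    return state, reward


def q_update(q, s, a, r, s2, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update; unseen entries default to 0."""
    best_next = max(q.get((s2, b), 0.0) for b in ACTIONS)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def run_episode(maze, q, start, rng, epsilon=0.2, max_steps=200):
    """Epsilon-greedy episode that learns online and records its trajectory."""
    state, trajectory = start, []
    for _ in range(max_steps):
        if rng.random() < epsilon:
            action = rng.choice(list(ACTIONS))
        else:
            values = {a: q.get((state, a), 0.0) for a in ACTIONS}
            top = max(values.values())
            action = rng.choice([a for a, v in values.items() if v == top])
        next_state, reward = step(maze, state, action)
        q_update(q, state, action, reward, next_state)
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if reward > 0:  # goal reached, episode ends
            break
    return trajectory


def replay(q, trajectory, reverse=True):
    """Re-apply stored transitions offline, by default from goal to start."""
    ordered = reversed(trajectory) if reverse else trajectory
    for s, a, r, s2 in ordered:
        q_update(q, s, a, r, s2)


maze = ["S.#", "..G"]  # S = start, G = goal, # = wall
q_table = {}
path = run_episode(maze, q_table, start=(0, 0), rng=random.Random(0))
replay(q_table, path)
```

Replaying in reverse lets the reward at the goal propagate back along the whole path in a single pass, whereas a forward pass moves value back by only one step at a time.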

React, PL/pgSQL, Python, Next.js, MaterialUI, Docker
GeneRanger
Gene and transcript expression across human tissue and cell atlases

GeneRanger is a web-server application that provides access to processed data about the expression of human genes and proteins across human cell types, tissues, and cell lines from several atlases. It is a sister site to TargetRanger.

Python, Appyter, Jinja2
Gene Expression across Cells and Tissues (Appyter)
Gene expression across human cell types and tissues

The Gene Expression across Cells and Tissues Appyter takes a human gene symbol as input and produces box plots that display its expression across human cell types and tissues at the mRNA and protein levels. This Appyter uses normal tissue gene and protein expression from GTEx, ARCHS4, Tabula Sapiens, the Human Protein Atlas, the Human Proteome Map, the GTEx proteome project, and the CCLE. GTEx Version 8 and ARCHS4 Version 11 gene counts were processed to produce gene summary statistics for cell types and tissues. The Tabula Sapiens dataset was processed to produce expression values for all human genes in 469 cell types from 456,101 single cells collected from 14 donors. Proteomics data were obtained from the Human Protein Atlas with IHC-based expression profiling, the Human Proteome Map with MS-based expression quantification, and a GTEx proteome project using TMT MS.

TypeScript, Next.js, TailwindCSS, MaterialUI
My Personal Site
The site you're on right now

This site was built using Next.js and TailwindCSS and is hosted on Vercel.
