Software, Algorithms, and Computational Methods
The LSP approach combines computation and data collection, which requires developing new computational methods, algorithms, and software. Generally, we release our code on GitHub under permissive open-source licenses (see Research Reproducibility).
Our software methods range from machine-learning-assisted visualization to the construction and interrogation of large-scale biological networks. Some projects yield software that is primarily useful in a specific setting, while others lead to software that we continue to develop and maintain. Visit each tool's dedicated website to learn more.
Multiple-choice microscopy pipeline (MCMICRO)
MCMICRO is the end-to-end processing pipeline for multiplexed whole tissue imaging and tissue microarrays. It comprises stitching and registration, segmentation, and single-cell feature extraction. Each step of the pipeline is containerized to enable portable deployment across an array of compute environments, including local machines, job-scheduling clusters and cloud environments like AWS. The pipeline execution is implemented in Nextflow, a workflow language that facilitates caching of partial results, dynamic restarts, extensive logging and resource usage reports.
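The caching and restart behavior described above is provided by Nextflow itself. The following is only a conceptual Python sketch of the idea, not MCMICRO's implementation: the stage functions, file names, and the `run_pipeline` helper are hypothetical stand-ins that pass labels along rather than process images.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-ins for the three steps named in the text.
# They only pass labels along; no real image processing happens here.
def stitch_and_register(tiles):
    return {"mosaic": sorted(tiles)}

def segment(mosaic):
    return {"cells": [f"cell_{i}" for i, _ in enumerate(mosaic["mosaic"])]}

def extract_features(segmentation):
    return {"table": [{"cell": c, "area": 0} for c in segmentation["cells"]]}

STAGES = [
    ("registration", stitch_and_register),
    ("segmentation", segment),
    ("quantification", extract_features),
]

def run_pipeline(tiles, workdir):
    """Run each stage in order, reusing a cached result when one exists.

    Simplification: the cache is keyed on the stage name only, so it
    assumes the same input is used on every run in a given workdir.
    """
    workdir = Path(workdir)
    data, executed = tiles, []
    for name, func in STAGES:
        cache = workdir / f"{name}.json"
        if cache.exists():
            data = json.loads(cache.read_text())   # resume from partial result
        else:
            data = func(data)
            cache.write_text(json.dumps(data))
            executed.append(name)
    return data, executed

with tempfile.TemporaryDirectory() as tmp:
    result, first_run = run_pipeline(["tile_b", "tile_a"], tmp)
    _, second_run = run_pipeline(["tile_b", "tile_a"], tmp)
```

On the first call every stage executes; on the second call each stage finds its cached output, so nothing is recomputed.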
Publication: Schapiro D, Sokolov A, Yapp C, Chen Y-A, Muhlich JL, Hess J, Creason AL, Nirmal AJ, Baker GJ, Nariya MK, Lin J-R, Maliga Z, Jacobson CA, Hodgman MW, Ruokonen J, Farhi SL, Abbondanza D, McKinley ET, Persson D, Betts C, Sivagnanam S, Regev A, Goecks J, Coffey RJ, Coussens LM, Santagata S, Sorger PK. MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods. 2022 Mar;19(3):311–315. Available from: https://doi.org/10.1038/s41592-021-01308-y
Integrated Network and Dynamical Reasoning Assembler (INDRA)
INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system, originally developed for molecular systems biology and currently being generalized to other domains. INDRA draws on natural language processing systems and structured databases to collect mechanistic and causal assertions, represents them in a standardized form (INDRA Statements), and assembles them into various modeling formalisms including causal graphs and dynamical models.
At the core of INDRA are its knowledge-level assembly procedures, which combine information from multiple sources into coherent models. This process involves correcting systematic input errors, finding and resolving redundancies, inferring missing information, filtering to a relevant scope, and assessing the reliability of causal information.
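As a rough illustration of two of the assembly steps described above (resolving redundancies and producing a causal graph), here is a minimal sketch using only the Python standard library. It does not use the INDRA package: the `Statement` class, the helper functions, and the source labels `reader_A` and `database_B` are hypothetical stand-ins chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for a mechanistic assertion (not INDRA's class)."""
    subject: str
    relation: str
    target: str

def merge_duplicates(extractions):
    """Group identical assertions and pool the sources that support them."""
    evidence = defaultdict(list)
    for statement, source in extractions:
        evidence[statement].append(source)
    return dict(evidence)

def to_causal_graph(statements):
    """Assemble statements into a mapping: subject -> {(relation, target)}."""
    graph = defaultdict(set)
    for st in statements:
        graph[st.subject].add((st.relation, st.target))
    return dict(graph)

# The same assertion arrives from two sources and is merged into one entry.
extractions = [
    (Statement("BRAF", "phosphorylates", "MAP2K1"), "reader_A"),
    (Statement("BRAF", "phosphorylates", "MAP2K1"), "database_B"),
    (Statement("MAP2K1", "phosphorylates", "MAPK1"), "reader_A"),
]
unique = merge_duplicates(extractions)
graph = to_causal_graph(unique)
```

Keeping the list of supporting sources per assertion is one simple basis for later reliability estimates: an assertion found by several independent sources can be weighted differently from one found only once.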
Publication: Gyori BM, Bachman JA, Subramanian K, Muhlich JL, Galescu L, Sorger PK. From word models to executable models of signaling networks using automated assembly. Molecular Systems Biology. 2017 Nov 24;13(11):954. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731347/ PMCID: PMC5731347.
Alignment by Simultaneous Harmonization of Layer/Adjacency Registration (ASHLAR)
ASHLAR (Alignment by Simultaneous Harmonization of Layer/Adjacency Registration) is an open-source Python package that stitches together successive microscopy image tiles to generate a single, seamless image. ASHLAR also registers images from different fluorescent channels at a high level of accuracy.
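To make the stitching problem concrete, the sketch below estimates the offset between two overlapping tiles by brute-force search over integer shifts. This is a toy illustration under stated assumptions and is not ASHLAR's algorithm or API: the `best_shift` function, the synthetic `scene`, and the tile sizes are all invented for this example.

```python
def best_shift(reference, moving, max_shift):
    """Return the (dy, dx) minimizing mean squared difference in the overlap.

    Pixel (y, x) of `moving` is compared with pixel (y + dy, x + dx)
    of `reference`; pixels falling outside `reference` are ignored.
    """
    rows, cols = len(reference), len(reference[0])
    best, best_err = None, float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y in range(len(moving)):
                for x in range(len(moving[0])):
                    ry, rx = y + dy, x + dx
                    if 0 <= ry < rows and 0 <= rx < cols:
                        diff = reference[ry][rx] - moving[y][x]
                        total += diff * diff
                        count += 1
            if count and total / count < best_err:
                best, best_err = (dy, dx), total / count
    return best

# Synthetic 8x8 "scene" with a non-repeating intensity pattern.
scene = [[3 * y * y + 5 * x * x + x * y for x in range(8)] for y in range(8)]

# Two overlapping 6x6 tiles; tile_b starts 1 row down and 2 columns right.
tile_a = [row[0:6] for row in scene[0:6]]
tile_b = [row[2:8] for row in scene[1:7]]

offset = best_shift(tile_a, tile_b, max_shift=3)
```

Exhaustive search is only practical for tiny images and small shifts. It is used here because it makes the objective explicit: find the placement at which the overlapping pixels agree best.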
Publication: Muhlich J, Chen Y-A, Russell D, Sorger PK. Stitching and registering highly multiplexed whole slide images of tissues and tumors using ASHLAR software. bioRxiv. 2021 Apr 21. Available from: http://biorxiv.org/lookup/doi/10.1101/2021.04.20.440625
shinyDepMap
shinyDepMap is a web tool for exploring data from the Broad Institute's Cancer Dependency Map (DepMap) project (version 19q3). It helps users identify and characterize essential genes among all protein-encoding genes. DepMap is a powerful tool for drug discovery, and its portal website reports the dependency of individual genes based on both CRISPR and RNAi data. We combined the CRISPR and RNAi dependency data into a unified score and built shinyDepMap as a tool that biologists can use without programming. It allows users to 1) compare the efficacy and selectivity of gene loss across 15,847 protein-encoding genes, and 2) identify genes that work together in pathways or complexes.
This web tool can be used to predict the efficacy and selectivity of future drugs with a known target gene; identify targets of highly selective drugs; identify maximally sensitive cell lines for testing a drug; “target hop”, i.e., navigate from an undruggable protein with the desired selectivity profile to more druggable targets with similar profiles; and identify novel pathways needed for cancer cell growth and survival.
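The notions of efficacy and selectivity can be illustrated with a small sketch. The definitions used here are assumptions made for illustration and are not taken from shinyDepMap: efficacy is read off the most sensitive cell lines (a low percentile of the dependency score, where more negative means more dependent), and selectivity is the spread between resistant and sensitive lines. The gene names, scores, and helper functions are hypothetical.

```python
def percentile(values, fraction):
    """Nearest-rank percentile of `values` (fraction between 0 and 1)."""
    ordered = sorted(values)
    index = round(fraction * (len(ordered) - 1))
    return ordered[index]

def summarize(scores, low=0.1, high=0.9):
    """Assumed definitions: efficacy = score in sensitive lines,
    selectivity = gap between resistant and sensitive lines."""
    sensitive = percentile(scores, low)
    resistant = percentile(scores, high)
    return {"efficacy": sensitive, "selectivity": resistant - sensitive}

# Hypothetical dependency scores across five cell lines per gene.
dependency = {
    "GENE_A": [-2.0, -1.9, -2.1, -1.8, -2.0],  # required by every line
    "GENE_B": [-2.0, 0.0, -0.1, 0.1, 0.0],     # required by one line only
}
summary = {gene: summarize(scores) for gene, scores in dependency.items()}
```

Under these assumed definitions, both genes look similarly efficacious in their most sensitive line, but only GENE_B is selective, which is the kind of profile one would look for in a target whose loss should spare most cells.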
Publication: Shimada K, Bachman JA, Muhlich JL, Mitchison TJ. shinyDepMap, a tool to identify targetable cancer genes and their functional connections from Cancer Dependency Map data. eLife. 2021 Feb 8;10. PMCID: PMC7924953.
The Small Molecule Suite
The Small Molecule Suite (SMS) is a free, open-access tool developed by the Harvard Program in Therapeutic Sciences (HiTS) and funded by the NIH. The goal of the SMS is to help scientists understand and work with the targets of molecular probes, approved drugs and other drug-like molecules, while acknowledging the complexity of polypharmacology, the phenomenon that virtually all drug-like molecules bind multiple target proteins. The SMS combines data from the ChEMBL database with pre-published data from the Laboratory of Systems Pharmacology. The methodology for calculating selectivities and similarities is explained in Moret et al., Cell Chem Biol 2019 (which can also be used to cite the Small Molecule Suite).
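One simple way to think about similarity under polypharmacology is to compare the sets of proteins that two compounds bind. The sketch below uses the Jaccard index for this. It is an illustrative assumption, not the method described in Moret et al., and the compound and target names are placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two target sets: shared targets / all targets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical compounds, each binding more than one target in most cases.
targets = {
    "compound_1": {"KINASE_A", "KINASE_B", "KINASE_C"},
    "compound_2": {"KINASE_A", "KINASE_B"},
    "compound_3": {"KINASE_D"},
}

sim_12 = jaccard(targets["compound_1"], targets["compound_2"])
sim_13 = jaccard(targets["compound_1"], targets["compound_3"])
```

A set-based measure like this ignores binding affinity; a more realistic comparison would weight each target by how strongly the compound binds it.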
Publication: Moret N, Clark NA, Hafner M, Wang Y, Lounkine E, Medvedovic M, Wang J, Gray N, Jenkins J, Sorger PK. Cheminformatics Tools for Analyzing and Designing Optimized Small-Molecule Collections and Libraries. Cell Chem Biol. 2019 May 16;26(5):765-777.e3. PMCID: PMC6526536.
Systemic Lymphoid Architecture Response Assessment (SYLARAS)
SYLARAS (SYstemic Lymphoid Architecture Response ASsessment) is a preclinical research platform for the interrogation of systemic immune response to disease and therapy. The approach combines multiplex immunophenotyping with biological computation to transform complex single-cell datasets into a visual compendium of the time- and tissue-dependent changes occurring in immune cell frequency and/or function in response to an arbitrary immune stimulus (e.g. tumor model, infectious or autoimmune disease, vaccine, immunotherapy, etc.).
SYLARAS is deployed in three stages. In the first stage, longitudinal immunophenotyping data are collected from mouse lymphoid organs of test and control subjects in a high-throughput manner by multiplex flow cytometry. In the second stage, raw FCS files are spectrally compensated and filtered for viable cells before undergoing a systematic immune cell subset identification procedure. In the final stage, the pre-processed data are computationally analyzed using an open-source data analysis tool written in Python and run from the command line of a personal computer. This generates a set of data-rich graphical dashboards (one per immune cell type) that together portray systemic immune response to a given experimental perturbation.
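The kind of quantity summarized in the final stage can be sketched as follows: given cells that have already been assigned to immune subsets, compute the frequency of each subset per sample and compare test against control. This is a minimal illustration, not SYLARAS code; the sample keys, cell type labels, and counts are hypothetical.

```python
from collections import Counter

def frequencies(cells):
    """Fraction of each immune cell type within one sample."""
    counts = Counter(cells)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}

# Hypothetical pre-processed data: one list of cell type labels per
# (tissue, time point, condition) sample.
samples = {
    ("spleen", "day7", "control"): ["B", "B", "CD4T", "CD8T"],
    ("spleen", "day7", "test"): ["B", "CD4T", "CD8T", "CD8T"],
}
freq = {key: frequencies(cells) for key, cells in samples.items()}

# Change in CD8 T cell frequency in the test condition relative to control.
delta_cd8 = (
    freq[("spleen", "day7", "test")]["CD8T"]
    - freq[("spleen", "day7", "control")]["CD8T"]
)
```

Repeating this comparison across tissues and time points for one cell type yields the sort of per-cell-type summary that a dashboard could display.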
Publication: Baker GJ, Muhlich JL, Palaniappan SK, Moore JK, Davis SH, Santagata S, Sorger PK. SYLARAS: A Platform for the Statistical Analysis and Visual Display of Systemic Immunoprofiling Data and Its Application to Glioblastoma. Cell Syst. 2020 Sep 23;11(3):272-285.e9. PMCID: PMC7565356.
Ras Executable Model
The Ras Executable Model (REM) is a model of the signaling pathway of the Ras family proteins (K-RAS, N-RAS, H-RAS), including their regulators and effectors, at a biochemical level of detail. It is a rule-based model written in PySB. REM consists of three interlinked levels. Model components are PySB modules that implement pathway mechanisms. A model scenario corresponds to a specific use case and imports one or more model components in a fit-for-purpose manner to instantiate an executable model. Each model scenario can have a set of corresponding model analysis scripts. REM is built through a combination of automated assembly using INDRA (Integrated Network and Dynamical Reasoning Assembler) and manual development by a team of modelers.
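The three-level organization described above can be sketched in plain Python. Plain dictionaries stand in for PySB objects so that the example runs without PySB installed; the function names, rule descriptions, and scenario name are hypothetical and are not taken from REM.

```python
def new_model(name):
    """Empty container standing in for a PySB model."""
    return {"name": name, "monomers": set(), "rules": []}

# Level 1: model components, each implementing one pathway mechanism.
def ras_nucleotide_cycle(model):
    model["monomers"].update({"RAS", "SOS1"})
    model["rules"].append("SOS1 promotes GDP-to-GTP exchange on RAS")

def raf_recruitment(model):
    model["monomers"].update({"RAS", "RAF1"})
    model["rules"].append("GTP-bound RAS binds RAF1")

# Level 2: a scenario imports only the components its use case needs.
def build_scenario(name, components):
    model = new_model(name)
    for component in components:
        component(model)
    return model

# Level 3: an analysis script operates on the instantiated scenario.
def count_rules(model):
    return len(model["rules"])

scenario = build_scenario("ras_to_raf", [ras_nucleotide_cycle, raf_recruitment])
n_rules = count_rules(scenario)
```

Separating components from scenarios means a mechanism is written once and reused wherever a use case calls for it, while each scenario stays as small as its question allows.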