Using Knowledge Assembly to Enable Network Biology
The LSP is developing new knowledge assembly systems to dramatically improve how we extract causal and mechanistic information from published literature.
Systems pharmacology approaches to studying biological regulation, disease mechanisms, and drug targets require computable knowledge about biological networks. Many databases have been developed to curate information on gene functions and protein-protein interactions, but much of the information in the literature is not yet recorded in databases in a useful way. We are therefore developing new approaches to knowledge assembly that combine natural language processing (NLP) with domain-specific programming languages, such as PySB, that can capture complex biological concepts in computable form. Much of this work focuses on advancing the Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA is an automated model assembly system that uses NLP to rapidly scan the published literature and also connects to existing databases, collecting knowledge from a wide range of sources. It then assembles this knowledge into a self-consistent form, enabling the generation of causal graphs and dynamical computational models of disease mechanisms and drug action.
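The assembly step described above can be illustrated with a toy sketch. It shows the general idea in three parts:

- entity names from different sources are normalized to one identifier;
- statements describing the same mechanism are merged and their evidence pooled;
- a directed causal graph is read off the result.

Everything here is hypothetical: the `Statement` class, the `normalize`, `assemble` and `causal_graph` functions, the synonym table and the source labels. They are far simpler than INDRA's actual data model and API and should not be read as how INDRA implements assembly.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical, simplified stand-in for a mechanistic statement.
# This is NOT INDRA's Statement class; it only illustrates the idea.
@dataclass
class Statement:
    stmt_type: str                             # e.g. "Phosphorylation", "Activation"
    subj: str                                  # upstream agent (e.g. a kinase)
    obj: str                                   # downstream agent (e.g. its substrate)
    sources: set = field(default_factory=set)  # where the statement was found

    def key(self):
        # Statements describing the same mechanism share a key,
        # regardless of which source reported them.
        return (self.stmt_type, self.subj, self.obj)


def normalize(name, synonyms):
    """Map a name to a canonical gene symbol; unknown names pass through upper-cased."""
    upper = name.upper()
    return synonyms.get(upper, upper)


def assemble(raw_statements, synonyms):
    """Normalize agent names, then merge duplicate statements and pool their sources."""
    merged = {}
    for st in raw_statements:
        grounded = Statement(st.stmt_type,
                             normalize(st.subj, synonyms),
                             normalize(st.obj, synonyms),
                             set(st.sources))
        k = grounded.key()
        if k in merged:
            merged[k].sources |= grounded.sources  # same mechanism: pool evidence
        else:
            merged[k] = grounded
    return list(merged.values())


def causal_graph(statements):
    """Directed graph as a dict: subject -> set of (object, statement type)."""
    graph = defaultdict(set)
    for st in statements:
        graph[st.subj].add((st.obj, st.stmt_type))
    return dict(graph)


# Usage: one mechanism reported twice under different names, plus one more.
# The synonym table and source labels are made up for illustration.
synonyms = {"MEK1": "MAP2K1", "ERK2": "MAPK1"}
raw = [
    Statement("Phosphorylation", "MEK1", "ERK2", {"text:paper_A"}),
    Statement("Phosphorylation", "map2k1", "mapk1", {"database:X"}),
    Statement("Activation", "BRAF", "MEK1", {"text:paper_B"}),
]
stmts = assemble(raw, synonyms)
graph = causal_graph(stmts)

for st in stmts:
    print(st.stmt_type, st.subj, "->", st.obj, sorted(st.sources))
```

In the example, the text-mined "MEK1 phosphorylates ERK2" and the database entry for MAP2K1/MAPK1 collapse into a single statement backed by both sources. The merged statements then form the edges of a small causal graph.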
Bachman JA, Gyori BM, Sorger PK. Assembling a phosphoproteomic knowledge base using ProtMapper to normalize phosphosite information from databases and text mining. bioRxiv. 2019 Nov 6;822668. DOI: 10.1101/822668.
Gyori BM, Bachman JA. From knowledge to models: Automated modeling in systems and synthetic biology. Current Opinion in Systems Biology. 2021 Dec;28:100362. DOI: 10.1016/j.coisb.2021.100362.
Gyori BM, Bachman JA, Subramanian K, Muhlich JL, Galescu L, Sorger PK. From word models to executable models of signaling networks using automated assembly. Mol Syst Biol. 2017 Nov 24;13(11):954. PMCID: PMC5731347.
Bachman JA, Gyori BM, Sorger PK. Automated assembly of molecular mechanisms at scale from text mining and curated databases. Mol Syst Biol. 2023 May 9;19(5):e11325. PMCID: PMC10167483.