Editors: Anton Yuryev, Nikolai Daraselia

From Knowledge Networks to Biological Models

ISBN: 978-1-60805-436-7 (Print)
ISBN: 978-1-60805-437-4 (Online)
Year of Publication: 2012
DOI: 10.2174/97816080543741120101

Introduction

Most findings about molecular interactions and cellular regulatory events are published in the peer-reviewed scientific literature in the form of scientific jargon. To make use of this wealth of information, computerized text-mining algorithms convert the free grammar of human language into a set of formalized relationships between biological concepts. The compendium of such interactions extracted from the entire biomedical literature is called a knowledge network. Knowledge networks are the first step in digitizing molecular biological knowledge. The next step is building molecular models that depict the principal molecular events governing various biological processes. Data mining in knowledge networks is the essence of building new biological models: the purpose is to elucidate the major pathways of information flow through the network of physical molecular interactions during a disease, a cellular process, or an experiment. Such models contain the key proteins involved in the process and can be used to prioritize disease targets, to understand drug action and prevent drug-induced toxicities, to analyze patient predispositions and design personalized therapies, and to design diagnostic biomarkers and analyze patient molecular data. This e-book contains detailed examples illustrating the path to digital biology and computerized drug development for personalized medicine. It provides conceptual principles for building biological models and for applying these models to make predictions relevant to drug development and translational medicine.
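
To make the idea of converting free text into formalized relationships concrete, here is a deliberately naive sketch. It is not MedScan, which relies on full natural-language processing; the verb list, the relation labels, and the function `extract_relations` are hypothetical, and entity names are assumed to be single tokens.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    """A formalized relationship between two biological concepts."""
    subject: str
    relation: str
    obj: str


# Hypothetical verb-to-relation mapping chosen for illustration only.
VERB_TO_RELATION = {
    "activates": "Regulation(+)",
    "inhibits": "Regulation(-)",
    "binds": "Binding",
    "phosphorylates": "ProtModification",
}

# Toy pattern "<token> <verb> <token>", where a token is letters, digits or hyphens.
PATTERN = re.compile(
    r"\b([A-Za-z0-9-]+)\s+(" + "|".join(VERB_TO_RELATION) + r")\s+([A-Za-z0-9-]+)"
)


def extract_relations(sentence: str) -> list[Relation]:
    """Return every subject-verb-object triplet matched by the toy pattern."""
    return [
        Relation(subj, VERB_TO_RELATION[verb], obj)
        for subj, verb, obj in PATTERN.findall(sentence)
    ]


print(extract_relations("We show that TNF activates NFKB1 in macrophages."))
# [Relation(subject='TNF', relation='Regulation(+)', obj='NFKB1')]
```

A pattern this naive cannot resolve pronouns, multi-word entity names, or negated and hedged statements, which is why production extraction systems parse whole sentences and map names to entity dictionaries.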

The e-book will also be useful for researchers who use high-throughput technologies for molecular profiling of disease and drug action. It provides examples of analyzing gene expression microarrays to infer biological models and to find biomarkers of drug response, as well as examples of applying high-throughput molecular profiling technologies to personalized medicine. Scientists in academia and in the pharmaceutical industry, as well as graduate students, will benefit from reading this book. The illustrations in the book can also be readily used in courses on molecular biology and pharmacology.

Indexed in: Book Citation Index, Science Edition, BIOSIS Previews, EBSCO.

Preface

Chapters in this book describe building mechanistic models for various human diseases and conditions. While each chapter provides novel insights into a disease mechanism and should interest any expert on that disease, we note that the chapter authors have never published articles about the disease described in their chapter and have never performed experiments to study it. All authors learned about, and advanced the understanding of, the disease mechanism either through analysis of knowledge networks or through analysis of publicly available gene expression datasets using knowledge networks. All chapters also have in common the use of Pathway Studio software from Ariadne Genomics. Pathway Studio provides access to biological knowledge networks and tools for navigating and analyzing them. Most knowledge in the Pathway Studio database is extracted automatically from the scientific literature using MedScan information extraction technology. While MedScan is thoroughly described in publications from Ariadne Genomics, the goal of this book is to show how to use the extracted information for knowledge inference, for building mechanistic models, and for using the models and knowledge networks to make more informed predictions about disease targets and biomarkers. We emphasize that while every model in this book required MedScan-extracted knowledge networks, Pathway Studio also allows importing and navigating additional knowledge from other sources and databases. Some examples of such additional knowledge (a protein homology network, and a network of physical interactions imported from the public PINA database) are described in the chapters on the cholestasis and gastric cancer models.

So what are “knowledge networks”? There are a couple of ways to answer this question. The analogy with the computer science term “Semantic Web” is the first that comes to mind. For readers with a biological background, “molecular biological knowledge networks” can be defined as a compressed, formalized representation of the knowledge about biological molecular interactions described in the scientific literature. Statements about molecular interactions, molecular functions, and the roles of molecules in disease and other phenotypes are scattered among millions of articles published by the scientific community over the last 60 years. MedScan converts such statements into semantic triplets, e.g., “A regulates B” or “C binds B”, so that they can be imported into a relational database. The Pathway Studio database generated by MedScan 5.0 technology contains more than 2.5 million unique relationships described in more than 18 million molecular biological articles. Knowledge networks stored in the Pathway Studio relational database provide instantaneous access to the knowledge generated by the entire body of molecular biological research, which has been supported by trillions of dollars of investment.
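
As a minimal sketch of how extracted triplets might be stored so that one relationship reported in many articles becomes a single unique relationship with several supporting references, the code below uses Python's standard-library `sqlite3` module. The table layout, column names, and reference identifiers are assumptions for illustration, not the Pathway Studio schema.

```python
import sqlite3


def build_knowledge_db(statements):
    """Load (subject, relation, object, reference_id) rows into an in-memory database.

    The composite primary key turns a repeated extraction of the same triplet
    from the same article into a no-op (INSERT OR IGNORE).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE statement (
               subject      TEXT NOT NULL,
               relation     TEXT NOT NULL,
               object       TEXT NOT NULL,
               reference_id TEXT NOT NULL,
               PRIMARY KEY (subject, relation, object, reference_id))"""
    )
    conn.executemany("INSERT OR IGNORE INTO statement VALUES (?, ?, ?, ?)", statements)
    return conn


def unique_relationships(conn):
    """Collapse statements into unique triplets with their number of supporting references."""
    return conn.execute(
        """SELECT subject, relation, object, COUNT(*) AS n_refs
           FROM statement
           GROUP BY subject, relation, object
           ORDER BY n_refs DESC, subject"""
    ).fetchall()


# Hypothetical extractions: two articles report "A regulates B", one reports "C binds B".
conn = build_knowledge_db([
    ("A", "regulates", "B", "ref-1"),
    ("A", "regulates", "B", "ref-2"),
    ("A", "regulates", "B", "ref-1"),  # same statement extracted twice from one article
    ("C", "binds", "B", "ref-2"),
])
print(unique_relationships(conn))  # [('A', 'regulates', 'B', 2), ('C', 'binds', 'B', 1)]
```

Keeping one row per (triplet, reference) pair preserves the evidence trail, while the grouped query gives the compressed view of unique relationships.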

The compression of quintessential molecular biological knowledge into semantic triplets allows both a quick overview for users and rapid traversal by network navigation algorithms. By bringing together in one database information extracted from disparate knowledge domains, Pathway Studio enables individual domain experts to make analytical connections that have previously gone unnoticed. It also supports statistically sounder conclusions, based on all published observations rather than on the limited set of papers familiar to a single expert. There are three major domains of biomedical knowledge: physical and regulatory molecular interactions measured in basic academic molecular biological research; pharmacological effects and drug interactions published by medicinal chemists in the pharmaceutical industry and by pharmacology and translational medicine departments in academia; and disease-related molecular changes published by clinicians and medical doctors. Medical doctors rarely know molecular biology, and basic scientists usually know little about pharmaceutical research. Bringing together molecular interactions and clinical observations is nevertheless essential for building a molecular mechanism of a disease, and knowledge about drug mechanisms is necessary for finding new drugs based on mechanistic disease models.
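
Because every triplet is a directed edge, connecting facts across knowledge domains reduces to graph search. The sketch below runs a breadth-first search over an adjacency list built from triplets to find a shortest chain linking a drug to a disease through statements that could come from three different literatures. All entity names are placeholders, and breadth-first search is a generic choice made here for illustration, not a description of Pathway Studio's navigation algorithms.

```python
from collections import deque

# Placeholder triplets, one or two per knowledge domain.
TRIPLETS = [
    ("DrugX", "inhibits", "KinaseA"),            # pharmacology literature
    ("KinaseA", "activates", "FactorB"),         # basic molecular biology
    ("FactorB", "regulates", "GeneC"),           # basic molecular biology
    ("GeneC", "associated_with", "DiseaseY"),    # clinical literature
]


def build_graph(triplets):
    """Adjacency list mapping each subject to its outgoing (relation, object) edges."""
    graph = {}
    for subj, rel, obj in triplets:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def shortest_path(graph, source, target):
    """Return the triplets on a shortest directed path from source to target, or None."""
    previous = {source: None}  # node -> (parent, relation) used to reach it first
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while previous[node] is not None:
                parent, rel = previous[node]
                path.append((parent, rel, node))
                node = parent
            return path[::-1]  # walk was target-to-source, so reverse it
        for rel, nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, rel)
                queue.append(nxt)
    return None


for step in shortest_path(build_graph(TRIPLETS), "DrugX", "DiseaseY"):
    print(*step)
```

Breadth-first search returns a path with the fewest hops, which is a reasonable default when every edge is treated as equally reliable; weighting edges by the number of supporting references would call for a weighted shortest-path algorithm instead.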

Any given drug or disease may affect the activity of dozens, and often hundreds, of biological molecules. While contemporary high-throughput molecular profiling technologies, such as gene expression microarrays, can measure the global molecular response, interpreting the observed profile requires an overview of thousands of publications describing individual interactions between the genes and proteins in the profile.

Such intermolecular dependencies are often measured in individual academic labs independently of clinical or drug research. Another example of separation in biomedical knowledge is the context specificity of observed molecular interactions and functions. Because molecular biological experiments are expensive, individual molecular interactions are usually measured in the context of only one tissue, organism, or condition. Most of these context-specific interactions can nevertheless be used to build a model for another disease or to explain a molecular profile measured in a different tissue or organism. While borrowing interactions from another organism or tissue is common practice in building biological models, searching the biomedical literature for such interactions would be a daunting task without Pathway Studio and its knowledge networks.

This book is written by Pathway Studio experts to show how one can leverage the information integrated into knowledge networks for building mechanistic models. While knowledge networks constitute a global compendium of the molecular interactions observed across all of molecular biological research, the mechanistic model of a disease, phenotype, or trait contains only a subset of these interactions. This subset must be sufficient to explain all, or a majority, of the molecular observations about the condition. The first step in building a model is collecting all observations from various scientific publications and enriching them with the results of a global molecular profiling experiment. For many complex diseases, such as cancer, this effort yields a collection of several thousand proteins affected by the disease state. Model building can therefore be described as complexity reduction of the observed molecular profile for a given disease or condition. You will learn from the book chapters that even changes in thousands of genes and metabolites affected by a disease can be explained by activity changes in only a few biological pathways.
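
One simple way to picture this complexity reduction is as a covering problem: choose the fewest pathways whose member genes account for most of the observed changes. The sketch below uses a greedy set-cover heuristic, a generic technique swapped in for illustration rather than the method used in the book's chapters; the function `greedy_explain`, the pathway names, and the gene sets are hypothetical.

```python
def greedy_explain(observed, pathways, max_pathways=3):
    """Greedily choose pathways that explain the most still-unexplained observations.

    observed: set of genes changed in the disease state.
    pathways: dict mapping a pathway name to the set of its member genes.
    Returns (chosen pathway names, observations left unexplained).
    """
    unexplained = set(observed)
    chosen = []
    while unexplained and len(chosen) < max_pathways:
        # Pathway covering the most remaining observations (the first one wins ties).
        best = max(pathways, key=lambda name: len(pathways[name] & unexplained), default=None)
        if best is None or not pathways[best] & unexplained:
            break  # no available pathway explains any remaining observation
        chosen.append(best)
        unexplained -= pathways[best]
    return chosen, unexplained


# Hypothetical profile of six changed genes and four candidate pathways.
observed = {"g1", "g2", "g3", "g4", "g5", "g6"}
pathways = {
    "P1": {"g1", "g2", "g3", "g9"},
    "P2": {"g4", "g5"},
    "P3": {"g3", "g4"},
    "P4": {"g6"},
}
print(greedy_explain(observed, pathways, max_pathways=2))  # (['P1', 'P2'], {'g6'})
```

The `max_pathways` cap makes the trade-off explicit: a smaller model is easier to interpret but leaves more observations unexplained.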

Three chapters in the book use public gene expression datasets that profile the disease state and compare it with the healthy “normal” control state. The principal technique for reducing the complexity of a molecular profile is called sub-network enrichment analysis (SNEA). In the case of gene expression, SNEA uses the expression regulatory knowledge network to find the transcription factors and other regulators responsible for the largest changes observed in the experiment. You will see throughout the book that SNEA regulators can often be mapped onto one or several canonical pathways, indicating that these pathways change their activity in the disease state. Because relatively few pathways are known for the human organism, it is not always possible to map the significant expression regulators identified by SNEA onto a pathway. Therefore, the last two chapters suggest other techniques, regulator clustering and pathway reconstruction, to classify expression regulators into a smaller number of functional communities and thus further reduce the complexity of the molecular profile.
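
The core question SNEA asks is whether the expression targets of a regulator changed more than the rest of the measured genes. A minimal sketch follows, assuming a Mann-Whitney U rank test on absolute log fold changes with a normal approximation (no tie correction of the variance); the function names, the `min_targets` threshold, and the toy data are hypothetical, and the code is not Pathway Studio's implementation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_z(sample, background):
    """Normal-approximation z-score of the Mann-Whitney U statistic for `sample`."""
    n1, n2 = len(sample), len(background)
    ranks = average_ranks(list(sample) + list(background))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


def snea_scores(log_fold_changes, targets_by_regulator, min_targets=3):
    """Rank regulators by how much their measured targets' |log fold change| exceeds the rest."""
    magnitude = {gene: abs(fc) for gene, fc in log_fold_changes.items()}
    scores = {}
    for regulator, targets in targets_by_regulator.items():
        members = {gene for gene in targets if gene in magnitude}
        outside = [m for gene, m in magnitude.items() if gene not in members]
        if len(members) < min_targets or not outside:
            continue  # too few measured targets, or no background to compare against
        inside = [magnitude[gene] for gene in members]
        scores[regulator] = mann_whitney_z(inside, outside)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log2 fold changes and regulator -> target sub-networks.
log_fc = {"t1": 3.0, "t2": -2.5, "t3": 2.8, "b1": 0.1, "b2": -0.2,
          "b3": 0.3, "b4": 0.05, "u1": 0.15, "u2": -0.1, "u3": 0.25}
network = {"RegA": {"t1", "t2", "t3"}, "RegB": {"u1", "u2", "u3"}, "RegC": {"t1", "x9"}}
for regulator, z in snea_scores(log_fc, network):
    print(regulator, round(z, 2))
```

Using absolute values lets a regulator score highly whether its targets go up or down; a directional variant would rank signed fold changes instead. A real analysis would also convert each z-score to a p-value and correct for testing many regulators at once.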

We hope that the examples in this book will allow readers to start building models for their own disease or phenotype of interest. The book starts with simpler chapters that use knowledge networks to review the state of the art in a disease field. The last chapters describe more complicated applications of knowledge networks for building disease models by analyzing public gene expression datasets. Some chapters go beyond model building: once a disease model is built, it can be used, together with the same knowledge networks available in Pathway Studio, for more accurate prediction of biomarkers, repositioning of existing drugs, selection of targets for future drugs, and design of personalized therapies.

Anton Yuryev
Senior Director of Application Science at Ariadne Genomics
Ariadne Genomics
Rockville, Maryland
USA

&

Nikolai Daraselia
Chief Scientific Officer at Ariadne Genomics
Ariadne Genomics
Rockville, Maryland
USA