Editors: Anton Yuryev, Nikolai Daraselia

From Knowledge Networks to Biological Models

Personal Book: US $34
Special Offer (PDF + Printed Copy): US $136
Printed Copy: US $119
Library Book: US $136
ISBN: 978-1-60805-436-7
eISBN: 978-1-60805-437-4 (Online)
Year of Publication: 2012
DOI: 10.2174/97816080543741120101


Most findings about molecular interactions and cellular regulatory events are published in the peer-reviewed scientific literature as free text written in scientific jargon. To make use of this wealth of information, computerized text-mining algorithms convert the free grammar of human language into a set of formalized relationships between biological concepts. The compendium of such interactions extracted from the entire body of biomedical literature is called a knowledge network. Knowledge networks are the first step in digitizing molecular biological knowledge. The next step is building molecular models that depict the principal molecular events governing various biological processes. Data mining in knowledge networks is the essence of building new biological models. Its purpose is to elucidate the major pathways of information flow through the molecular physical interaction network during a disease, a cell process or an experiment. Such models contain the key proteins involved in the process and can be used to prioritize disease targets, to understand drug action and prevent drug-induced toxicities, to analyze patient predispositions and design personalized therapies, and to design diagnostic biomarkers and analyze patient molecular data. This e-book contains detailed examples illustrating the path to digital biology and computerized drug development for personalized medicine. It provides conceptual principles for building biological models and for applying the models to make predictions relevant to drug development and translational medicine.

The e-book will also be useful for researchers who use high-throughput technologies for molecular profiling of disease and drug action. It provides examples of analyzing gene expression microarrays to infer biological models and to find biomarkers for drug response, and of applying high-throughput molecular profiling technologies to personalized medicine. Scientists in academia and in the pharmaceutical industry, as well as graduate students, will benefit from reading this book. The illustrations from the book can also be readily used in taught courses in molecular biology and pharmacology.

Indexed in: Book Citation Index, Science Edition, BIOSIS Previews, EBSCO.


Molecular biology is rapidly evolving from an experimental science into a computational discipline. This transformation is fueled by simultaneous advances in modern computing and an explosion of global molecular profiling methods. A typical molecular profile contains tens of thousands of data points, and its interpretation relies on a relational database storing formalized knowledge about molecular interactions. The development of computerized knowledgebases for pathway and network analysis started at the beginning of this century in response to advances in DNA hybridization microarray technology, which allowed simultaneous measurement of mRNA expression for all genes in a biological sample. In 2003 Ariadne Genomics pioneered the MedScan information extraction technology to find statements about molecular interactions in the scientific literature and to automatically populate the knowledgebase with the extracted information. MedScan is a highly accurate technology that reliably converts the enormous amount of literature accumulated during more than 60 years of research into a knowledgebase suitable for computational analysis. MedScan makes Pathway Studio a unique software product that provides tools for navigating the most comprehensive knowledgebase in molecular biology.

The scientific literature has proved to be an extremely rich source of molecular interaction data, but one that suffers from a large number of errors and omissions. Biological data is intrinsically ambiguous not only because of technical noise from the experimental setup but also because of natural genetic variability and genetic linkage in biological samples. Genetic variability makes the response to the same environmental change unique in every biological sample. Genetic linkage causes every response to include non-specific components that are functionally irrelevant to the response. The high level of noise in the knowledgebase is exacerbated by the noise in the high-throughput molecular profiling data analyzed with the knowledgebase. Hence, the need to sift through the knowledgebase led to the development of statistical algorithms capable of finding the key regulatory events relevant to the biological response or cell process in focus. I worked at Ariadne Genomics on developing sub-network enrichment analysis (SNEA), which has become the major tool for interpreting raw molecular profiling data. I am happy to see how extensively SNEA is used throughout this book for making inferences from gene expression microarray data, providing the foundation for building mechanistic models.

Building predictive mechanistic models in biology requires multiple expert skills: a thorough understanding of the context and experimental approaches used for measuring the interactions in the knowledgebase, a thorough understanding of the limitations of high-throughput molecular profiling technologies, an expert understanding of the cellular processes involved in a disease or biological response, and a thorough understanding of the statistical algorithms that enable knowledge inference. Very few people in the world possess this combination of skills. Therefore I am not surprised that the book is written almost entirely by the Ariadne team, who also took advantage of the powerful graphical interface for pathway visualization and construction available in Pathway Studio. This book provides readers with deep insights into how raw biological data can be converted into predictive in silico models.

Andrey Sivachenko, Ph.D.
Broad Institute
Boston, MA