Workshop on OR in Computational Biology, Bioinformatics, and Medicine

 

    Contributed Talks

    Title:          Path finding approaches and metabolic pathways

    Authors:   F.J. Planes1,2 and J.E. Beasley1

    1Mathematical Sciences, Brunel University, Uxbridge, UB8 3PH, UK; 2CEIT and TECNUN, University of Navarra, Manuel de Lardizabal 15, 20018 San Sebastian, Spain

    Abstract:

    The complex world of cellular metabolism has been commonly organised into metabolic pathways. A number of metabolic pathways, which are usually defined as a set of enzyme catalysed biochemical reactions by which a living organism transforms an initial source compound into a final target compound, have been elucidated via experiments on different model organisms. The problem of determining in a computational (algorithmic) fashion these metabolic pathways, given a database of biochemical reactions and compounds associated with a particular organism or cell, has attracted increased interest in recent years.

    Path finding approaches to metabolic pathways adopt a graph theory approach to the problem of determining the reactions an organism might use to transform a source compound into a target compound. In this paper the effectiveness of using compound node connectivities in a path finding approach is examined. An approach to path finding based upon integer programming is also presented.

    Title:         Comparing Structural Information in the Life Sciences: From RNA to Metabolic Networks

    Author:    Gunnar W. Klau

                       Free University Berlin

    Abstract:

    Graphs and networks are powerful means of abstraction in the life sciences, and their comparison is an important task. In structural biology, for instance, we can model the space of possible secondary structures of a given RNA sequence as a graph and formulate the problem of aligning multiple RNA molecules with respect to an additive sequence-structure scoring function as a graph problem. Similar techniques are used in the area of structural proteomics. Here, contact map graphs are used to describe protein structures in order to classify them according to their structural similarities. In systems biology, the comparison of metabolic or protein-protein interaction networks helps to understand cellular functions, evolutionary relationships, and disease mechanisms.

    Unlike for classical sequence alignment, most of these problems are NP-hard---even in the pairwise case---as they involve, for instance, the computation of a maximum common subgraph. Nevertheless, techniques from mathematical programming may be used to compute optimal solutions for large instances quite efficiently.

    Starting with the problem of aligning a set of RNA molecules, I will present a novel and unifying approach to the above comparison and alignment problems, which is based on an integer linear programming (ILP) formulation. Inspired by the work of Lancia and Caprara for the contact map overlap problem, we employ Lagrangian relaxation to find provably near-optimal solutions of the ILP in reasonable computation time. I will present results from an extensive computational study on benchmark alignments for the RNA case, which show that our approach outperforms alternative methods in terms of quality. Finally, I will show first results on the non-sequential contact map overlap problem and on the comparison of complex biological networks and stoichiometric matrices.

    Title:          A New Mathematical Approach in Environmental Protection: Gene-Environment Networks and Their Dynamics

    Author(s):  Gerhard-Wilhelm Weber, Pakize Taylan, Zeynep Alparslan Gök, Başak Akteke Öztürk, Süreyya Özöğür, Ömür Uğur and Aysun Tezel

                       Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey

    Abstract:

    A research area of central importance in computational biology, biotechnology - and life sciences at all -  is devoted to modeling, prediction and dynamics of gene-expression patterns. However, as clearly understood in these days, this enterprise cannot be investigated in a satisfying way without taking into account the role of the environment in its widest sense. To a representation of past, present and most likely future states, the authors also acknowledge the presence of measurement errors and further uncertainties. This paper surveys and improves recent advances in understanding the mathematical foundations and interdisciplinary implications of the newly introduced gene-environment networks. Moreover, it integrates the important theme carbon dioxide emission reduction into the context of our networks and their dynamics. Given the data from DNA microarray experiments and environmental records, we extract nonlinear ordinary differential equations which contain parameters that have to be determined. This is done by some modern kinds of (so-called generalized Chebychev) approximation and (so-called generalized semi-infinite) optimization. After this is provided, time-discretized dynamical systems are studied. Here, a combinatorial algorithm constructing and following polyhedra sequences, allows to detect the region of parametric (in)stability. Finally, we analyze the topological landscape of gene-environment networks in its structural stability. With the example of CO2-emission control and some further perspectives we conclude.

    This pioneering work is practically motivated and theoretically elaborated; it tries to support improvements in health care, medicine, education, more healthy living conditions and environmental protection. The authors discuss structural frontiers and invite the interested readers to future research.

    Title:          Mixed-Integer Programming based Hyper-Box Enclosure Method for Predicting Folding Type of Proteins

    Authors:    Fadime Uney Yuksektepe and Metin Turkay

                        Department of Industrial Engineering, Koc University, Istanbul, Turkey

    Abstract:

    A novel integer programming based hyper-box enclosure method is presented in order to predict the folding types of proteins. Proteins are mainly categorized in four different classes depending on their secondary structure content. In addition, it is shown that the folding types of proteins are correlated to their amino acid compositions. Traditional approaches using hyperplanes, such as support vector machines, that are very effective in classifying data sets into two groups perform poorly in multi-class problems. A novel hyper-box enclosure method is developed to overcome difficulties and inefficiencies of these approaches. The method uses Boolean variables to define the boundaries of the hyper-boxes that include all or some of the points in that class.

    The efficiency of the proposed approach is illustrated on a benchmark problem that includes a training set of 120 proteins (30 from each group). The self-consistency and cross-validation test results are also performed. Moreover, the same data set is studied by different classifiers of the software WEKA. The results show that the proposed method has a better classification accuracy on this data set compared other methods.

    Title:            Logic Mining Methods: introduction and applications to bioinformatics

    Author(s):   Paola Bertolazzi, Giovanni Felici

                          Istituto di Analisi dei Sistemi e Informatica “Antonio Ruberti” Consiglio Nazionale delle Ricerche

    Abstract:

    We present a particular class of data mining and machine learning techniques that can be applied to solve Classification, Clustering and Feature Selection problems. Such techniques are based on the use of logic variables and logic models that represent information and explanatory models.

    We show how these methods can be fruitfully adopted in bioinformatics applications. We describe effective methods to extract hidden information from large data sets, with particular focus on the integer programming models and algorithms on which they are based.

    Finally, several bioinformatics applications of the described data mining and machine learning techniques concerning automatic classification of microarray data, DNA sequence classification, feature selection from protein sequences, and Barcode analysis, will be presented.

    Title:          Exploring the specificity of the relationship between cortical network function and biological simulation parameters with a Particle Swarm Optimization algorithm.

    Authors:   David Gomez-Cabrero (1), Salva Ardid (2), Albert Compte (2)

    (1) david.gomez@uv.es (speaker)

    Departamento de Estadística e Investigación Operativa, Universitat de Valencia, Spain.

    (2) jsardid@umh.es, acompte@clinic.ub.es Institut d'Investigacions Biomčdiques August Pi i Sunyer (IDIBAPS) Carrer Villarroel 170, 08036 Barcelona

    Abstract:

    Mechanistic aspects of brain function can be studied with the use of biologically detailed computational models. Often, a critical question that arises from these computational models is how critically the conclusions depend on a particular choice of the simulation parameters. It is difficult to establish that the presented network model is unique in producing the relevant phenomenology. This issue has been approached before for the case of single neurons or small networks of neurons.

    Here, we design a computational strategy to explore this for the case of large-scale biological neural network simulations. We focus on a specific neural function: visuo-spatial working memory, and we construct a biological neural network that mimics the cortical network.

    We search for sets of parameters such that the network sustains persistent activity. We design several evaluation functions that quantify this ability and weigh them in a proper way. To guide the search we rely on the Particle Swarm Optimization. The first objective is to find if there exists a unique solution or a set of significant different solutions. In the second case, we explore and typify the different areas of solutions; for this second objective we rely on different neighbor policies among the particles.

    Title:          Polynomial time approximation algorithms for the Simplified Partial Digest Problem

    Author:     Alexandr Kovalev

                       Poznan University of Technology

    Abstract:

    The Simplified Partial Digest Problem (SPDP) is a mathematical model for a recently proposed simplified partial digest method of genome mapping. This method is easy for laboratory implementation and robust with respect to the experimental errors. SPDP is NP-hard in the strong sense for the case of measurement errors and for the error-free case. One way to tackle this problem is to develop efficient (polynomial time) approximation algorithms. I will present two optimization versions of the original problem, which are called SPDP-Min and SPDP-Max, and give results on the worst-case efficiency of approximation algorithms for these problems. Then I will describe a graph-theoretic model for SPDP-Min and SPDP-Max, which can be used to reduce the search space for an optimal solution in either of these problems. I will also outline three heuristic polynomial time algorithms based on the graph-theoretic model. The results of computer experiments with these algorithms on randomly generated data and real data from GenBank will be given. 

    Title:          A DNA Computing algorithm for calculating the maxflow in networks

    Author:    Andrea Sackmann

                       Poznan University of Technology

    Abstract:

    The main idea of DNA Computing is to encode information as DNA sequences and use known bimolecular techniques (based on the chemical properties of DNA) for manipulating the molecules to solve computational problems.  The approach's inherent value is its massive computational parallelism and its power of large database searching. The concept of DNA Computing was introduced in 1994 to solve the Hamiltonian path problem in a directed graph.  Since then various approaches have been developed dealing with mainly graph theoretical but also other mathematical problems (including NP hard ones).  An effort of the current research optimizes the procedure on the one hand for example by reducing the external interference by utilizing biochemical reactions which autonomously process computations. And on the other hand to optimize the designing of the sequences used.

    To the best of my knowledge an application of DNA Computing for solving net work problems has not been proposed yet. In the presented work I introduce an algorithm based on DNA Computing to solve the maxflow problem of a given network N = (G; c; s; t). Here, G is a directed graph G = (V; E) with weighted edges and these weights c denote the capacity of each edge. The vertices s and t are the source, and the sink of N, respectively. According to the Maxflow min-cut theorem the maximum value of a flow on a network is equal to the minimal capacity of a network's cut. Thus, the structure of the algorithm is as follows:

    1. Building templates for the vertices,
    2. Generating random partitions (S; T) of V (as the combinatorial library),
    3. Finding the capacities of the cuts of N,
    4. Detecting a minimal cut and read out its capacity.

    Finding suitable DNA sequences and carrying out the corresponding biological experiments are subject of my future research. 

    Title:          SNPs analysis in common diseases susceptibility

    Author:    Linda Fiaschi

                       University of Nottingham

    Abstract:

    Many common human diseases are caused by genetic variations within a single gene or influenced by complex interactions among multiple genes as well as environmental and lifestyle factors. It is currently difficult to measure and evaluate environmental and lifestyle effects on a disease process, therefore my study is focused on the analysis of individual predisposition to develop a disease based on genes and hereditary factors.

    In this talk I will give an overview of my specific task within the BIOPTRAIN project which consists in the SNPs analysis applied to Alzheimer and Pre-eclampsia diseases.

    Title:         TBA        

    Author:    Marc Vincent

                      Dipartimento di Sistemi e Informatica, Universitŕ degli Studi di Firenze

    Abstract:    TBA

     

    Title:         TBA

    Author:    Matt Labbé

                       Dipartimento di Sistemi e Informatica, Universitŕ degli Studi di Firenze

    Abstract:    TBA

     

    Title:         Gene prioritization through genomic data fusion

    Author:    Tranchevent LC, Aerts S, Van Loo P, Shi Y, Barriot R, Coessens B, Hassan B and Moreau Y.

                       Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven

    Abstract:      TBA

    Genome-wide experimental methods to identify disease / pathway genes (such as linkage analysis and association studies) often generate large list of candidate genes from which only a few are interesting. ENDEAVOUR (http://www.esat.kuleuven.be/endeavour) is able to prioritize such a list, i.e., give a rank to each gene according to its similarity with the disease / pathway of interest. To calculate the ranking, ENDEAVOUR use informations from multiple heterogeneous data sources such like microarray expression, abstract from literature, protein sequence, Gene Ontology annotations and protein-protein interactions for instance. The global similarity is then obtained by fusing the rankings into one global ranking using the order statistics. This concept has been successfully validated by performing a leave-one-out cross-validation on a set of 29 diseases, 3 pathways and 5 random sets.

    Here, we present an extension to the ENDEAVOUR general framework: the screening of the full genome in order to find gene of interest. Starting from a gene set of interest, and using exactly the same methods than in the general framework, all protein coding genes (around 23000 for human) are then used as candidates and ranked. Oppositely to the general framework, the prioritization is done off-line which considerably speed up the process. This service was validated with the same leave-one-out cross-validation, results show no major difference with the general framework

    We have also extended ENDEAVOUR to make it a multiple-species tool. Nowadays, the tool supports Homo sapiens, Drosophila melanogaster, Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Some organism specific sources of information (e.g., Drosophila In-Situ hybridization) have been added to the system while some generic ones were extended (e.g., Gene Ontology, InterPro). Again, the leave-one-out cross-validation has been conducted to prove the principle. As an example, results of the validation on 9 Drosophila pathways are presented.

    In conclusion, we present an extension of the ENDEAVOUR framework that can prioritize selected candidate genes or whole genomes, in five major organisms.

    Title:           Heuristics for gene prioritisation

    Author:      Francisco Bonachela, Patrick De Causmaecker

                         Katholieke Universiteit Leuven

    Abstract:

    Polygenic disorders cause the major contribution to human mortality and morbidity in our modern society. Some approaches are being used in order to tackle them. Endeavour is a software tool that uses the candidate gene approach, which states that all of the disease genes share common characteristics. We propose a modification in Endeavour consisting of applying specific heuristics to classify disease genes in groups according to different characteristics.

               

Home