Demonstrations & Posters Abstracts

[P1]
Hiroshi IZUMI1, Akihiro WAKISAKA1, Laurence A. NAFIE2,3, Rina K. DUKOR3

1 AIST, 2 Syracuse University, 3 BioTools Inc.
"Data Mining of Supersecondary Structure in Protein"

In the solution state, several conformations are in equilibrium with one another, and conformational information is necessary to determine the absolute configuration of chiral pharmaceutical compounds. We have proposed a conformational code that describes the conformations of all kinds of chemical compounds, and we have developed techniques for automatically converting the conformational information needed to assess structural homology between proteins, whose sequences can be represented by the 20 amino acid residue symbols. Fuzzy search and data-mining techniques for supersecondary structure, which use the conformational code's “h”, “s”, and “o” fragment patterns to convert 3D data into 1D data, reflect the shape of the main chain well. The characteristic fragment pattern “hhshshh” was found in the armadillo repeat region of beta-catenin. In this presentation, we also discuss a comparison of the new method with DSSP, the standard method for secondary structure assignment.
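As a rough illustration of fuzzy searching over 1D conformational strings, the sketch below slides a fragment pattern such as “hhshshh” along a code string and reports every window within a given number of substitutions (Hamming distance). It assumes the 3D-to-1D conversion into “h”/“s”/“o” symbols has already been done; `fuzzy_find` and the example string are hypothetical, and the authors' fuzzy search may use a different similarity measure.

```python
def fuzzy_find(code, pattern, max_mismatches=1):
    """Return (start, mismatches) for every window of the 1D conformational
    string `code` within `max_mismatches` substitutions of `pattern`."""
    hits = []
    m = len(pattern)
    for start in range(len(code) - m + 1):
        window = code[start:start + m]
        d = sum(a != b for a, b in zip(window, pattern))  # Hamming distance
        if d <= max_mismatches:
            hits.append((start, d))
    return hits

# Usage on a hypothetical code string: an exact hit at 2, a one-mismatch hit at 11.
print(fuzzy_find("oohhshshhoohhshhhhoo", "hhshshh"))
```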



[P2]
Tamotsu Noguchi1,2, Kana Shimizu1, Satoru Kanai3, Shuichi Hirose1

1 CBRC, AIST, 2 Meiji Pharmaceutical University, 3 PharmaDesign, Inc.
"Development of software to support protein experiments"

We developed two web systems: POODLE (Prediction Of Order and Disorder by machine Learning) and ESPRESSO (EStimation of PRotein ExpreSsion and SOlubility).
POODLE predicts disordered regions of proteins, that is, regions that do not form stable structures. Disordered regions are involved in processes such as signaling, cell-cycle control, and molecular recognition, so they are important for understanding protein function. The POODLE system consists of four predictors: short disordered region prediction, long disordered region prediction, unfolded protein prediction, and an integrated system that combines the POODLE series with structural information obtained from several other prediction tools.
The POODLE series is available at http://mbs.cbrc.jp/poodle/, and its source code is freely available.
ESPRESSO estimates protein expression and solubility in three protein expression systems (Escherichia coli, wheat germ cell-free, and Brevibacillus choshinensis). ESPRESSO provides two types of prediction results: property-based prediction and motif-based prediction. ESPRESSO is available at http://mbs.cbrc.jp/ESPRESSO/.



[P3]
Tsukasa Fukunaga1,2, Wataru Iwasaki1,2

1 Department of Computational Biology, the University of Tokyo, 2 Atmosphere and Ocean Research Institute, the University of Tokyo
"Gaussian mixture model-based multiple object tracking for analysis of animal social behaviors."

Animals display a wide range of interesting and fascinating behaviors. Behaviors involving inter-individual interactions are of particular interest because they are the most complex outputs of the nervous system and may reflect the emergence of societies and intelligence. A severe bottleneck is the simultaneous and precise quantification of the behaviors of multiple animals, a laborious task that requires extensive human effort. In the present work, we developed an accurate multiple-object tracking system for video sequences of animal social behaviors. The system extracts the position of each individual using a dynamic binarization thresholding method, models a group of animals as a Gaussian mixture distribution, and accurately tracks each individual using the expectation-maximization (EM) algorithm. With this tracking system, we aim to reveal unexplored structures and rules behind animal interactions, which would provide insights into the evolution of nervous systems and societies.
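The tracking step can be sketched as follows: foreground pixels from a thresholded frame are treated as samples from a K-component 2D Gaussian mixture (one component per animal), and EM is warm-started from the previous frame's positions so that component identities carry over between frames. This is a minimal NumPy sketch under those assumptions; a fixed threshold stands in for the dynamic binarization, and `foreground_points` and `em_track` are hypothetical names.

```python
import numpy as np

def foreground_points(frame, threshold):
    """Pixel coordinates (row, col) brighter than `threshold`.
    A fixed threshold stands in for dynamic binarization here."""
    return np.argwhere(np.asarray(frame) > threshold).astype(float)

def em_track(points, prev_means, n_iter=20, reg=1e-6):
    """Refit a K-component 2D Gaussian mixture to `points`, starting from the
    previous frame's means so that component j keeps following animal j."""
    points = np.asarray(points, float)          # (N, 2)
    means = np.array(prev_means, float)         # (K, 2)
    n, k = len(points), len(means)
    covs = np.array([np.eye(2) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = np.empty((n, k))
        for j in range(k):
            diff = points - means[j]
            inv = np.linalg.inv(covs[j])
            mahal = np.einsum("ni,ij,nj->n", diff, inv, diff)
            norm = 2 * np.pi * np.sqrt(np.linalg.det(covs[j]))
            resp[:, j] = weights[j] * np.exp(-0.5 * mahal) / norm
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step: update mixing weights, means, and covariances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ points) / nk[:, None]
        for j in range(k):
            diff = points - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(2)
    return means
```

Warm-starting from the previous frame is what links detections over time; a complete tracker would also need to handle animals that touch or leave the field of view.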



[P4]
Kazunori YAMADA, Kentaro TOMII

CBRC, AIST
"Developing a novel amino acid substitution matrix suitable for detecting distantly related proteins"

Protein sequence comparison is a fundamental tool for a wide spectrum of biological studies, including protein structure and function prediction. To improve its performance, optimizing the amino acid substitution matrix is indispensable. In this study, we set out to develop a novel matrix capable of detecting remote homology. First, we performed principal component analysis (PCA) on nine typical existing matrices and obtained three PC axes with high contribution ratios. In the subspace spanned by these three axes, we searched for the best-performing matrix using SSEARCH, with the SCOP ASTRAL 20 subset as the training dataset. Finally, we identified the matrix with the best detection performance and evaluated it on a test dataset. Our matrix improved detection performance compared with existing matrices and other sequence comparison methods. Further improvement could be achieved by using this matrix in combination with more sophisticated comparison algorithms, such as profile-profile comparison methods.
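The PCA step can be sketched by flattening the upper triangle (including the diagonal) of each symmetric 20×20 matrix into a 210-dimensional vector, extracting the leading principal axes, and mapping any point in that subspace back to a symmetric candidate matrix. The sketch assumes NumPy; benchmarking candidates with SSEARCH on ASTRAL is not reproduced, so `score_fn` in the grid search is a hypothetical stand-in for that evaluation.

```python
import itertools
import numpy as np

def upper_tri(m):
    """Upper triangle (with diagonal) of a square matrix as a flat vector."""
    return m[np.triu_indices(m.shape[0])]

def pca_subspace(matrices, n_components=3):
    """Mean vector, leading principal axes, and their contribution ratios."""
    X = np.array([upper_tri(np.asarray(m, float)) for m in matrices])
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    return mean, vt[:n_components], ratio[:n_components]

def matrix_from_coords(mean, axes, coords, size=20):
    """Map PC coordinates back to a symmetric substitution matrix."""
    m = np.zeros((size, size))
    m[np.triu_indices(size)] = mean + np.asarray(coords, float) @ axes
    return m + np.triu(m, 1).T          # mirror the off-diagonal part

def grid_search(mean, axes, score_fn, grid):
    """Best-scoring (coords, matrix) over a grid of PC coordinates."""
    best = None
    for coords in itertools.product(grid, repeat=len(axes)):
        cand = matrix_from_coords(mean, axes, coords)
        score = score_fn(cand)
        if best is None or score > best[0]:
            best = (score, coords, cand)
    return best[1], best[2]
```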



[P5]
Yutaka Saito1, Junko Tsuji1,2, Toutai Mituyama1

1 CBRC, AIST, 2 Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo
"Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions"

We present Bisulfighter, a new software package for detecting methylated cytosines (mCs) and differentially methylated regions (DMRs) from bisulfite sequencing data. Bisulfighter combines the LAST alignment tool for mC calling with a novel framework for DMR detection based on hidden Markov models (HMMs). We conducted extensive experiments evaluating the accuracy of mC calling and DMR detection on simulated data with various mC contexts, read qualities, sequencing depths, and DMR lengths, as well as on real data from a wide range of biological processes, including pathogenesis and normal development. We demonstrate that Bisulfighter consistently achieves better accuracy than other published tools, providing greater sensitivity for mCs with fewer false positives, more precise estimates of mC levels, more exact DMR locations, and better agreement of DMRs with gene expression and DNase I hypersensitivity.
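To illustrate how an HMM can segment a genome into DMRs, the sketch below runs Viterbi decoding on a two-state HMM (0 = non-DMR, 1 = DMR) over per-cytosine methylation-level differences between two samples. The Gaussian emission parameters, the transition probability, and `viterbi_dmr` are illustrative assumptions; Bisulfighter's actual model is not specified here.

```python
import math

def viterbi_dmr(diffs, p_stay=0.95, mu=(0.0, 0.5), sd=(0.1, 0.2)):
    """Most likely state path (0 = non-DMR, 1 = DMR) for a series of
    per-cytosine methylation differences, under a 2-state Gaussian HMM."""
    if not diffs:
        return []

    def log_emit(x, s):
        z = (x - mu[s]) / sd[s]
        return -0.5 * z * z - math.log(sd[s] * math.sqrt(2 * math.pi))

    log_t = [[math.log(p_stay), math.log(1 - p_stay)],
             [math.log(1 - p_stay), math.log(p_stay)]]
    score = [[math.log(0.5) + log_emit(diffs[0], s) for s in (0, 1)]]
    back = [[0, 0]]
    for x in diffs[1:]:
        prev = score[-1]
        row, ptr = [], []
        for s in (0, 1):
            best = max((0, 1), key=lambda p: prev[p] + log_t[p][s])
            row.append(prev[best] + log_t[best][s] + log_emit(x, s))
            ptr.append(best)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state
    state = 0 if score[-1][0] >= score[-1][1] else 1
    path = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

The high self-transition probability is what makes the decoder prefer contiguous regions over isolated differentially methylated sites.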



[P6]
Goro Terai1,2, Satoshi Kamegai1,2, Kiyoshi Asai1,3

1 CBRC, AIST, 2 INTEC Inc., 3 Graduate School of Frontier Sciences, the University of Tokyo
"An algorithm for designing DNA sequences in protein-coding regions"

The production of useful materials by microorganisms, which has advantages in terms of energy consumption and environmental impact, is based on gene recombination technology. When a gene is introduced by recombination, a key problem is how to design the DNA sequence that carries the gene into the microorganism, because the gene's expression level varies with its DNA sequence.
In this study, we developed a method for designing DNA sequences with high translation efficiency. The translation of a gene is affected by the secondary structure and codon usage of its protein-coding region. We devised an algorithm that optimizes codon usage under given secondary structure constraints. The algorithm is based on dynamic programming and can therefore find the optimal solution. The secondary structure constraints can be set flexibly: for example, the user can eliminate secondary structure around the start codon or, conversely, insert a stable secondary structure at an arbitrary location.
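The dynamic-programming idea can be illustrated with a simplified constraint. In the sketch below, the DP state is the codon chosen at the previous position, the objective is the sum of log codon-usage weights, and the constraint forbids short motifs (at most 4 nt, so any occurrence spans at most two adjacent codons) instead of secondary-structure constraints, which would require an RNA folding model. The codon table is a hypothetical toy and `design` is not the authors' algorithm.

```python
import math

# Hypothetical toy codon-usage table: amino acid -> {codon: relative usage}.
CODONS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "E": {"GAA": 0.6, "GAG": 0.4},
    "L": {"CTG": 0.5, "CTT": 0.4, "TTA": 0.1},
}

def design(protein, forbidden=(), table=CODONS):
    """Codon sequence maximizing the sum of log usage weights such that no
    forbidden motif appears. DP state = codon chosen at the previous position."""
    if not protein:
        return ""
    if any(len(m) > 4 for m in forbidden):
        raise ValueError("motifs longer than 4 nt can span three codons")

    def ok(s):
        return not any(m in s for m in forbidden)

    # best[c] = (score, codon path) of the best prefix ending in codon c
    best = {c: (math.log(w), [c]) for c, w in table[protein[0]].items() if ok(c)}
    for aa in protein[1:]:
        nxt = {}
        for c, w in table[aa].items():
            options = [(s + math.log(w), path + [c])
                       for prev, (s, path) in best.items() if ok(prev + c)]
            if options:
                nxt[c] = max(options, key=lambda t: t[0])
        best = nxt
    if not best:
        return None            # no sequence satisfies the constraints
    return "".join(max(best.values(), key=lambda t: t[0])[1])
```

For example, `design("MKE")` picks the most frequent codons, while `design("MKE", forbidden=("AAAG",))` is forced to switch lysine to AAG because AAA followed by any glutamate codon creates the forbidden motif.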



[P7]
Yutaka Saito, Toutai Mituyama

CBRC, AIST
"Finding 3D co-localization of genomic elements from HiC data"

Among the various types of epigenomic data, high-throughput chromosome conformation capture (HiC) provides unique information about how genomic elements co-localize in 3D nuclear space.
Several methods have recently been proposed for analyzing HiC data in a hypothesis-driven manner, i.e., testing a user-specified set of genomic elements for statistically significant 3D co-localization. However, to take full advantage of genome-wide HiC data, it is more informative to search for previously unspecified sets of genomic elements that show the most significant 3D co-localization. Here, we employ a simple greedy algorithm to find a set of genomic elements that maximizes the z-score of a 3D co-localization statistic. We detect 3D co-localization clusters of both protein-coding and non-coding elements and discuss their biological implications.
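The greedy search can be sketched under an assumed definition of the statistic: the z-score of a set is the mean pairwise contact frequency among its members, compared with the mean and standard deviation of all off-diagonal contacts and scaled by the square root of the number of pairs. The search starts from the strongest pair and adds whichever element most increases the z-score, stopping when no addition helps. Both the statistic and `greedy_colocalization` are assumptions for illustration; like any greedy method, this finds a local rather than a guaranteed global optimum.

```python
import itertools
import numpy as np

def set_zscore(contacts, members, mu, sigma):
    """z-score of the mean pairwise contact among `members` (assumed statistic)."""
    vals = [contacts[i, j] for i, j in itertools.combinations(members, 2)]
    return (np.mean(vals) - mu) / (sigma / np.sqrt(len(vals)))

def greedy_colocalization(contacts):
    """Greedily grow a set of elements, starting from the strongest pair,
    while adding an element increases the set's z-score."""
    contacts = np.asarray(contacts, float)
    n = contacts.shape[0]
    background = contacts[np.triu_indices(n, 1)]
    mu, sigma = background.mean(), background.std()
    i, j = max(itertools.combinations(range(n), 2), key=lambda p: contacts[p])
    members = [i, j]
    best = set_zscore(contacts, members, mu, sigma)
    while len(members) < n:
        z, k = max((set_zscore(contacts, members + [k], mu, sigma), k)
                   for k in range(n) if k not in members)
        if z <= best:
            break
        members.append(k)
        best = z
    return sorted(members), best
```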



[P8]
Kenichiro Imai1, Yoshinori Fukasawa1,2, Kentaro Tomii1, Paul Horton1,2

1 CBRC, AIST, 2 Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo
"Improved prediction of mitochondrial presequence and cleavage site"

Approximately 1,500 different proteins are imported into mitochondria, and about half of the known mitochondrial proteins possess an N-terminal targeting signal (presequence). In most cases, the presequence is cleaved off by the mitochondrial processing peptidase (MPP) in the matrix. An amphiphilic helical motif recognized by the Tom20 import receptor has been proposed; however, this motif matches only about 50% of known presequences. Similarly, the R-2 motif for MPP has been reported, but it covers only 40% of cleavage sites. Moreover, presequences are poorly conserved between orthologs. Thus, the features of presequences remain unclear. To improve the prediction of presequences and their cleavage sites, we search for novel motifs and develop novel cleavage-site profiles based on recent yeast and plant proteomic presequence data, also taking into account cleavage by intermediate proteases. Our predictor, which integrates novel motif matches and cleavage-site profiles, performs better than existing predictors.
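A cleavage-site profile of the kind mentioned above can be sketched as a position-specific log-odds matrix built from aligned windows around known cleavage sites, which is then used to score every candidate site in a sequence. The window layout (two residues upstream of the cut, matching the R-2 convention), the uniform background, the pseudocount, and the function names are assumptions for illustration, not the authors' profiles.

```python
import math
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(windows, pseudo=1.0, background=None):
    """Log-odds position-specific profile from aligned cleavage-site windows."""
    bg = background or {a: 1 / 20 for a in AA}
    total = len(windows) + pseudo * 20
    profile = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        profile.append({a: math.log((counts[a] + pseudo) / total / bg[a]) for a in AA})
    return profile

def scan(seq, profile, upstream):
    """Score every cut position k (between seq[k-1] and seq[k]); the window
    starts `upstream` residues before k. Returns (score, k), best first."""
    length = len(profile)
    scores = []
    for k in range(upstream, len(seq) - (length - upstream) + 1):
        window = seq[k - upstream:k - upstream + length]
        scores.append((sum(profile[i][a] for i, a in enumerate(window)), k))
    return sorted(scores, reverse=True)
```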



[P9]
Mariko Morita, Osamu Gotoh

CBRC, AIST
"Kingdom-wide prediction of cytochrome P450 genes by Famaln"

We are developing an automated computational pipeline named “Famaln”. Famaln is a protein sequence similarity-based gene prediction tool applicable to sets of high-quality genomic sequences derived from one or more kingdoms of life. We individually applied the fully automated version of Famaln to 28 plant and 204 fungal genomes to comprehensively identify cytochrome P450 genes. As a result, we found 9,154 plant and 15,035 fungal P450 gene candidates. We assessed the accuracy of the predicted gene structures by referring to genome mappings of expressed sequence tags (ESTs) derived from the corresponding species. This comparison indicated that more than 95% of the testable introns were correctly predicted at both boundaries.



[P10]
Junko YAMANE, Michihiro TANAKA, Wataru FUJIBUCHI

Center for iPS Cell Research and Application (CiRA), Kyoto University
"Single-cell RNA-seq analysis of human iPS and ES cells"

Single-cell analysis offers an opportunity to uncover hidden mechanisms underlying gene expression patterns. Recent analyses revealed that individual cells within the same cell line differ dramatically in their expression patterns in mouse feeder and ES cells. However, these properties are not yet understood in human iPS and ES cells. To elucidate these differences, we performed a single-cell whole-transcriptome study using highly multiplexed RNA-seq and examined the transcriptomes of human iPS and ES cells at the single-cell level. We examined three human iPS cell lines, two ES cell lines, and a mouse feeder cell line. We identified six candidate new pluripotency marker genes, which showed cell-specific expression patterns. In contrast, the known pluripotency markers POU5F1, SOX2, and NANOG showed heterogeneous expression within the same cell line. These findings indicate that some well-known markers do not work well at the single-cell level because their expression is highly variable.
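One simple way to quantify the within-line heterogeneity described above is the coefficient of variation (standard deviation divided by mean) of each gene's expression across single cells. The sketch assumes an already normalized genes × cells matrix; it is an illustrative measure, not necessarily the statistic used in this study, and the gene names in the test are placeholders.

```python
import numpy as np

def coefficient_of_variation(expr, genes):
    """expr: genes x cells array of normalized expression.
    Returns {gene: CV}, with NaN for genes that are never expressed."""
    expr = np.asarray(expr, float)
    mean = expr.mean(axis=1)
    std = expr.std(axis=1)                       # population standard deviation
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = np.where(mean > 0, std / mean, np.nan)
    return dict(zip(genes, cv.tolist()))

def most_variable(cv, top=10):
    """Genes ranked from most to least variable across cells."""
    ranked = sorted((g for g in cv if not np.isnan(cv[g])), key=cv.get, reverse=True)
    return ranked[:top]
```

A marker with a high CV within a single line is, in this sense, an unreliable single-cell indicator even if its mean expression is high.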



[P11]
Michiaki Hamada1,2

1 The University of Tokyo, 2 CBRC, AIST
"Recent progress of RNA secondary structure predictions"

Recent research has revealed that many RNAs that are not translated into proteins play important roles in cells. These RNAs, called non-coding RNAs (ncRNAs), have attracted considerable attention. The functions of ncRNAs are often related to their structures, and we have developed several algorithms for RNA secondary structure prediction, including the CentroidFold software (http://www.ncrna.org/centroidfold). In this poster, I will introduce recent progress in RNA secondary structure prediction, including (i) benchmark results from Janusz M. Bujnicki's group (Nucl. Acids Res. (2013) 41 (7): 4307-4323), (ii) a novel method for incorporating experimental information (e.g., SHAPE) into RNA secondary structure prediction (Hamada, Journal of Computational Biology, 19(12): 1265-1276, 2012), and (iii) a semi-supervised learning approach to RNA secondary structure prediction (manuscript in preparation with Haruka Yonemoto (NTT) and Kiyoshi Asai (Univ. Tokyo)), in which we use not only RNA sequences with known secondary structures but also RNA sequences without known structures to learn the internal parameters of a probabilistic model of RNA secondary structure.
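The γ-centroid estimator used by CentroidFold can be sketched as follows: given base-pairing probabilities p_ij (for example from McCaskill's algorithm, assumed precomputed here), a Nussinov-style dynamic program picks the pseudoknot-free structure that maximizes the sum of (γ + 1)p_ij − 1 over its base pairs, so a pair is worth including only when p_ij > 1/(γ + 1). This is a minimal illustration of the estimator, not CentroidFold's code; `gamma_centroid` and the minimum hairpin loop of 3 are assumptions.

```python
def gamma_centroid(bpp, gamma=1.0, min_loop=3):
    """Pseudoknot-free structure maximizing sum((gamma + 1) * p_ij - 1) over
    its base pairs; bpp[i][j] (i < j) are base-pairing probabilities."""
    n = len(bpp)

    def gain(i, j):
        return (gamma + 1) * bpp[i][j] - 1

    # M[i][j] = best total gain on the subsequence i..j
    M = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(M[i + 1][j], M[i][j - 1])          # i or j unpaired
            if gain(i, j) > 0:
                best = max(best, M[i + 1][j - 1] + gain(i, j))  # i pairs with j
            for k in range(i + 1, j):                      # bifurcation
                best = max(best, M[i][k] + M[k + 1][j])
            M[i][j] = best

    # Traceback into dot-bracket notation
    struct = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if M[i][j] == M[i + 1][j]:
            stack.append((i + 1, j))
        elif M[i][j] == M[i][j - 1]:
            stack.append((i, j - 1))
        elif gain(i, j) > 0 and M[i][j] == M[i + 1][j - 1] + gain(i, j):
            struct[i], struct[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if M[i][j] == M[i][k] + M[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(struct)
```

Larger γ admits lower-probability pairs (more sensitive predictions), while smaller γ keeps only confident pairs.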



[P12]
Michihiro Tanaka1, Pui Shan Wong1, Sachiyo Aburatani1, Yoshihiko Sunaga2, Masaki Muto2, Masayoshi Tanaka2, Michiko Nemoto2, Takeaki Taniguchi3, Mitsufumi Matsumoto4, Tomoko Yoshino2, Tsuyoshi Tanaka2, Wataru Fujibuchi1,5

1 CBRC, AIST, 2 Tokyo University of Agriculture and Technology, 3 Mitsubishi Research Institute, 4 Electric Power Development Center, 5 Center for iPS Cell Research and Application (CiRA), Kyoto University
"Draft assembly of the marine pennate diatom, Fistulifera sp. strain JPCC DA0580 genome reveals the chromosomal evidence for allopolyploidy in genome structure"

The marine pennate diatom Fistulifera sp. strain JPCC DA0580 is an autotrophic organism that accumulates neutral lipids (triacylglycerol) depending on environmental nitrogen levels, which makes it a promising industrial source of biodiesel; however, the underlying regulatory mechanism remains far from clear. To address this issue, we performed draft genome sequencing by pyrosequencing. The 124-Mb draft genome contains 19,859 protein-coding genes, about 1.9 and 1.7 times as many as the closely related diatoms Phaeodactylum tricornutum (10,402 genes) and Thalassiosira pseudonana (11,776 genes), respectively. Furthermore, we used scaffold sequence similarity searches to detect scaffold pairs corresponding to an allopolyploid-like genome structure. These findings provide a fundamental resource for identifying factors that affect oil accumulation.



[P13]
Kentaro HAMADA1, Michiaki HAMADA1,2, Kiyoshi ASAI1,2

1 Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, 2 CBRC, AIST
"Inferring constraints on amino acids from protein sequence alignment"

Sequence alignment is one of the central problems in computational biology. Many alignment programs have been developed recently, and some return not only an alignment but also complementary conservation information, shown as ‘*’, ‘:’, and ‘.’ below each alignment column. However, other information on conservation, such as the constraints acting on amino acids, is not provided. Here we introduce a new method for immediately and intuitively understanding the constraints on each alignment column by measuring the bias of each constraint. This approach explicitly shows which constraints keep the residues conserved. We evaluated the method on both simulated and real sequence alignment data. The results show that the proposed method is flexible and provides more detailed information about conserved amino acids than conventional methods can. In particular, our approach makes it possible to directly infer constraints on residues such as polarity, charge, and chemical structure, or a reduced-alphabet structure as in Murphy et al. (2000).
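Measuring the bias of a constraint in one alignment column can be sketched as a log2 enrichment of each residue property group relative to a uniform amino acid background. The property groups below, the pseudocount, and `column_bias` are hypothetical choices for illustration; the authors' constraint set and bias measure may differ.

```python
import math

# Hypothetical residue property groups, for illustration only.
GROUPS = {
    "positive": "KRH",
    "negative": "DE",
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQ",
}

def column_bias(column, groups=GROUPS, pseudo=0.5):
    """log2(observed / expected) frequency of each property group in one
    alignment column; gaps ('-') are ignored, background is uniform (1/20)."""
    residues = [c for c in column.upper() if c != "-"]
    n = len(residues)
    bias = {}
    for name, members in groups.items():
        observed = (sum(r in members for r in residues) + pseudo) / (n + 2 * pseudo)
        expected = len(members) / 20
        bias[name] = math.log2(observed / expected)
    return bias
```

A column such as "KRKRK" would be marked only ':' (strongly similar) by a conventional symbol line, whereas its bias profile shows directly that the constraint is positive charge.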



[P14]
Ryota Mori1, Michiaki Hamada1,2, Kiyoshi Asai1,2

1 Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa City, Chiba, Japan, 2 CBRC, AIST
"A fast and exact calculation for various score distributions of RNA secondary structure"

RNA secondary structure is widely studied because of its close relationship to biological function and its relative ease of analysis.
Many methods have now been proposed that not merely estimate the secondary structure itself but also capture more detailed structural properties.
Examples of such properties include the mean distance between the 5′ and 3′ ends, the entropy of the structural ensemble, the approximate folding size, and local stability.
In general, the distribution of an integer score assigned to each RNA secondary structure can be calculated exactly.
In this poster, we present several applications as concrete examples; these examples illustrate procedures for constructing fast and exact calculations of integer score distributions over RNA secondary structures.
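As a concrete instance of an exact integer-score distribution, the sketch below computes, for every k, how many secondary structures of a sequence contain exactly k base pairs, by carrying a polynomial (a list of counts indexed by k) through a Nussinov-style recursion. It counts structures uniformly (no energy model) and assumes Watson-Crick plus G-U pairs and a minimum hairpin loop of 3; this score and these rules are illustrative assumptions, not necessarily one of the poster's applications.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def poly_add(p, q):
    out = [0] * max(len(p), len(q))
    for i, c in enumerate(p):
        out[i] += c
    for i, c in enumerate(q):
        out[i] += c
    return out

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pair_count_distribution(seq, min_loop=3):
    """dist[k] = number of pseudoknot-free secondary structures of `seq`
    with exactly k base pairs."""
    seq = seq.upper().replace("T", "U")

    @lru_cache(maxsize=None)
    def Z(i, j):
        if i >= j:
            return (1,)                              # only the empty structure
        total = list(Z(i + 1, j))                    # case: i is unpaired
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:            # case: i pairs with k
                inner = [0] + list(Z(i + 1, k - 1))  # shift by one for the new pair
                total = poly_add(total, poly_mul(inner, Z(k + 1, j)))
        return tuple(total)

    return list(Z(0, len(seq) - 1)) if seq else [1]
```

The same polynomial bookkeeping works for any score that is additive over the recursion's decomposition; only the weights attached to each term change.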



[P15]
Chen-Hua LU1, Che-Rung LEE1

1 Department of Computer Science, National Tsing Hua University
"A Memory Efficient and Fast-Constructed Algorithm for Massive Short Sequence Indexing"

Indexing is a fundamental yet still-advancing topic in bioinformatics that now provides a strong foundation for NGS applications. Most modern tools use either a table-lookup method or a suffix-array-based full-text index, depending on the trade-off between CPU time for index construction and memory efficiency. In this research, we present an algorithm that combines a keyword tree with an array structure to index massive numbers of short sequences with a small memory footprint and high building and querying performance. For a given set of short texts S, the algorithm sorts S in lexicographic order and builds a table that counts all characters present in S. The algorithm processes a query by traversing this table. In addition, a backtracking mechanism handles mismatches efficiently, as in FM-index-based alignment tools. The efficiency of this algorithm also suggests that it could fit current GPGPU memory configurations.
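The general idea of querying a lexicographically sorted set of short texts, with backtracking for mismatches, can be sketched as follows. Each query character narrows a contiguous range of the sorted array, which plays the role of a keyword-tree node; two binary searches per character stand in for the authors' character-count table, so this is a simplified sketch rather than their data structure, and `SortedShortIndex` is a hypothetical name.

```python
class SortedShortIndex:
    """Toy index over short texts kept in lexicographic order; each query
    character narrows a [lo, hi) range of the sorted array."""

    def __init__(self, texts):
        self.texts = sorted(texts)
        self.alphabet = sorted(set("".join(texts)))

    def _char_at(self, i, depth):
        t = self.texts[i]
        return t[depth] if len(t) > depth else ""    # "" sorts before any character

    def _range(self, lo, hi, depth, ch):
        """Sub-range of texts[lo:hi] (sharing a prefix of length `depth`)
        whose character at `depth` equals `ch`."""
        a, b = lo, hi
        while a < b:                                  # first index with char >= ch
            m = (a + b) // 2
            if self._char_at(m, depth) < ch:
                a = m + 1
            else:
                b = m
        start, b = a, hi
        while a < b:                                  # first index with char > ch
            m = (a + b) // 2
            if self._char_at(m, depth) <= ch:
                a = m + 1
            else:
                b = m
        return start, a

    def search(self, query, max_mismatches=0):
        """Texts starting with a string within `max_mismatches` substitutions
        of `query`, found by backtracking over alternative characters."""
        hits = []

        def walk(lo, hi, depth, budget):
            if lo >= hi:
                return
            if depth == len(query):
                hits.extend(self.texts[lo:hi])
                return
            for ch in self.alphabet:
                mismatch = ch != query[depth]
                if mismatch and budget == 0:
                    continue
                s, e = self._range(lo, hi, depth, ch)
                walk(s, e, depth + 1, budget - mismatch)

        walk(0, len(self.texts), 0, max_mismatches)
        return hits
```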



[P16]
Anish Man Singh Shrestha

Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo
"Reducing the memory required for constructing suffix arrays for spaced and subset seed queries."

Alignment algorithms employing the seed-and-extend technique first identify potentially similar regions using “seeds”, short strings that occur in both input sequences. It has been shown that the sensitivity of these algorithms increases significantly when “spaced/subset seeds”, which allow mismatches at certain predefined positions, are used. Suffix arrays, which can be constructed in linear time and space, are widely used for efficient pattern (seed) searching in large texts. It has also been shown that standard suffix array construction algorithms can easily be modified to build a suffix-array-like index structure capable of efficiently searching for spaced/subset seeds. However, the worst-case working memory of the current method is about 4n bytes for a text of length n (on 32-bit systems). We show that a simple technique reduces this figure by one-third.
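To make the spaced-seed index concrete, the sketch below sorts suffixes by their characters at the seed's '1' positions only (the seed pattern is repeated cyclically, '0' marks a don't-care position) and answers queries by binary search over that order. The construction here is deliberately naive and says nothing about the memory-saving technique of the poster; `spaced_key`, `spaced_suffix_array`, and `find` are hypothetical names.

```python
def spaced_key(text, i, seed):
    """Suffix text[i:] keeping only positions where the cyclically repeated
    seed has '1' (a '0' marks a don't-care position)."""
    return "".join(text[j] for j in range(i, len(text))
                   if seed[(j - i) % len(seed)] == "1")

def spaced_suffix_array(text, seed):
    """Naive O(n^2 log n) construction, for illustration only."""
    return sorted(range(len(text)), key=lambda i: spaced_key(text, i, seed))

def find(text, sa, seed, pattern):
    """Start positions where `pattern` matches `text` at the seed's '1' positions."""
    pk = "".join(c for k, c in enumerate(pattern) if seed[k % len(seed)] == "1")

    def key(rank):
        return spaced_key(text, sa[rank], seed)[:len(pk)]

    lo, hi = 0, len(sa)
    while lo < hi:                       # first rank whose key >= pk
        mid = (lo + hi) // 2
        if key(mid) < pk:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                       # first rank whose key > pk
        mid = (lo + hi) // 2
        if key(mid) <= pk:
            lo = mid + 1
        else:
            hi = mid
    return sorted(i for i in sa[start:lo] if i + len(pattern) <= len(text))
```

With seed "101", the query "ATG" matches any position with A at offset 0 and G at offset 2, whatever lies at offset 1.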



[P17]
Thomas M. Poulsen, Martin C. Frith, and Paul Horton

CBRC, AIST
"Detection of homologous regions using higher-order hidden Markov models"

The ability to find and analyze homologous regions between nucleotide sequences is fundamental to biological research. One difficulty in detecting homologs is that sequences may contain tandem repeats and low-complexity regions that produce non-homologous alignments. Standard methods attempt to address this issue by masking such regions, but masking does not explicitly model the difference between alignments of low-complexity and 'standard' regions, and thus risks discarding important information. In addition, such methods typically score single aligned nucleotides without considering the preceding sequence, which may significantly reduce alignment accuracy. In this study, we employ higher-order pair hidden Markov models (HMMs) with dependent emission probabilities, so that repetitive regions and preceding nucleotides are scored probabilistically. We find that our model achieves higher homology detection rates than standard HMMs and heuristic models in a comparison across a range of animal species.



[P18]
Sascha Meiers1, David Weese1, Martin Frith2, Knut Reinert1

1 Algorithmic Bioinformatics, Freie Universität Berlin, Germany, 2 CBRC, AIST
"SeqAn - The C++ library for sequence analysis"

SeqAn is a large open-source C++ library of efficient algorithms and data structures for sequence analysis, with a focus on biological data. Its design, based on template metaprogramming, provides high performance, generality, and extensibility. SeqAn is easy to use, platform independent, and freely available. Tutorials, consistent documentation, and an annual workshop help users take their first steps with SeqAn.
The library offers a wide range of sequence algorithms, index structures such as enhanced suffix arrays and the FM-index, file I/O for popular file formats, and much more. Recent enhancements include multi-sequence compression (journaling) and parallelization of core functionality on CPUs and GPUs.
In a current project, we are reimplementing the local alignment search tool LAST (Frith et al., 2011) using multiple gapped enhanced suffix arrays. This ongoing collaboration extends the library's functionality with the new gapped index and is expected to result in an efficient SeqAn version of LAST.



[P19]
David A. duVerle1, Ichiro Takeuchi2 and Koji Tsuda1

1 CBRC, AIST, 2 Department of Computer Science, Nagoya Institute of Technology, Nagoya
"Discovering Combinatorial Interactions in Survival Data"

While several methods exist for relating high-dimensional gene expression data to clinical phenotypes, finding combinations of features in such data remains a challenge, particularly when fitting complex statistical models such as those used in survival studies.
Our method builds on existing ‘regularisation path-following’ techniques to produce regression models that can extract arbitrarily complex patterns of input features (such as gene combinations) related to a known clinical outcome from large-scale data. By exploiting the structure of the data together with itemset mining techniques, we avoid the combinatorial explosion typically encountered with such methods, and our algorithm runs in time of the same order as single-variable versions.
Applied to data from several clinical studies of cancer patient survival time, our method produced a number of promising gene-interaction candidates whose tumour-related roles appear to be supported by the literature.



[P20]
Chun Fang1,2, Tamotsu Noguchi2,3 , Hayato Yamana1

1 Department of Computer Science and Engineering of Waseda University, 2 CBRC, AIST, 3 Meiji Pharmaceutical University
"Identifying Molecular Recognition Features in Disordered Proteins"

We propose a novel method that uses a modified PSSM encoding scheme to identify molecular recognition features (MoRFs) in disordered proteins. By masking, filtering, and smoothing the position-specific scoring matrix (PSSM), the modified PSSMs combine predictive features that can effectively distinguish MoRF from non-MoRF residues. Our method uses no predicted results as input; all features are extracted from the sequence's PSSM alone. Experimental results show that, compared with other methods tested on the same datasets, our method achieves the best performance. In addition, when tested on an independent membrane protein-related dataset, our method significantly outperformed the existing predictor MoRFpred. This study shows that (1) MoRFs contain a mixture of highly conserved and highly variable residues and, on the whole, are highly locally conserved; and (2) combining contextual information with the local conservation information of residues is predictive for identifying MoRFs.
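The masking, filtering, and smoothing pipeline can be sketched on an L × 20 PSSM as follows. The specific operations are assumptions, since the abstract does not define them: masking zeroes scores below a threshold, filtering applies a logistic squashing, and smoothing takes a moving average over a window of neighboring residues (the contextual information). `modified_pssm` is a hypothetical name.

```python
import numpy as np

def modified_pssm(pssm, window=7, mask_below=0.0):
    """Illustrative masking -> filtering -> smoothing of an L x 20 PSSM.
    mask: scores below `mask_below` set to 0; filter: logistic squashing;
    smooth: moving average over `window` residues (edges padded)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    p = np.asarray(pssm, float)
    masked = np.where(p < mask_below, 0.0, p)
    filtered = 1.0 / (1.0 + np.exp(-masked))
    half = window // 2
    padded = np.pad(filtered, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    smoothed = np.column_stack([np.convolve(padded[:, j], kernel, mode="valid")
                                for j in range(p.shape[1])])
    return smoothed
```

Each output row then summarizes a residue together with its neighbors, which is one simple way to combine local conservation with context.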



[P21]
Junichi Iwakiri1, Michiaki Hamada1, Kiyoshi Asai1,2, Tomoshi Kameda2

1 Grad. Sch. Frontier Sci., the Univ of Tokyo, 2 CBRC, AIST
"3D structure prediction of protein-RNA complexes"

Interactions between biomolecules play important roles in biological processes. To understand the mechanism of interaction, it is essential to obtain the three-dimensional (3D) structures of biomolecular complexes.
However, the structures of biomolecular complexes are more difficult to solve than those of monomeric proteins.
Therefore, computational 3D structure prediction of complexes (often called the “docking problem”) has been studied. Although 3D structure prediction of protein-protein and protein-compound complexes has been investigated by many researchers for decades, there have been few studies of protein-nucleic acid complexes.
Here, we introduce a method for predicting the 3D structures of protein-RNA complexes. Applied to 72 complex structures, our method achieves a success rate of ~29%, which may be the highest reported in this research area. Moreover, it usually requires less than an hour to obtain a result on a general desktop computer.



[P22]
Masanori Yamanaka

CST, Nihon University
"Random matrix refinement of principal component analysis of MD simulations of proteins"

We apply random matrix theory to analyze time-series data of atomic motions in proteins generated by all-atom MD simulations with solvent. Cross-correlation matrices are constructed from the time-series data, and we calculate the fundamental statistical quantities that characterize the universality class in random matrix theory. Across the results for different time intervals, we find that the eigenvalue spacings agree well with the Gaussian orthogonal ensemble (GOE). Following random matrix theory, we analyze the inverse participation ratio, identify the correlated atoms, and classify the atoms into correlation sectors, which enables us to refine the principal component analysis.
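The basic quantities can be sketched with NumPy: a cross-correlation matrix from standardized time series, its eigenvalues and normalized nearest-neighbor spacings (to compare with the GOE Wigner surmise P(s) = (πs/2) exp(−πs²/4)), and the inverse participation ratio IPR_k = Σ_i v_ki⁴ of each eigenvector, which ranges from 1/N (spread over all atoms) to 1 (localized on one atom). The sketch assumes one scalar time series per atom (for example, one displacement component) and normalizes spacings by their mean instead of performing proper spectral unfolding.

```python
import numpy as np

def rmt_analysis(traj):
    """traj: (T, N) array, one scalar time series per atom over T frames.
    Returns eigenvalues, mean-normalized spacings, and per-eigenvector IPR."""
    traj = np.asarray(traj, float)
    x = (traj - traj.mean(axis=0)) / traj.std(axis=0)
    C = x.T @ x / x.shape[0]               # N x N cross-correlation matrix
    evals, evecs = np.linalg.eigh(C)       # ascending; columns are eigenvectors
    spacings = np.diff(evals)
    spacings = spacings / spacings.mean()  # crude normalization, no unfolding
    ipr = np.sum(evecs ** 4, axis=0)       # inverse participation ratio
    return evals, spacings, ipr

def wigner_goe(s):
    """GOE Wigner surmise for the nearest-neighbor spacing distribution."""
    return np.pi * s / 2 * np.exp(-np.pi * s ** 2 / 4)
```

Eigenvectors whose IPR is well above 1/N involve only a few atoms and point to correlated groups, which is the kind of information used to refine a plain PCA.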



[P23]
A. Taneda

Graduate School of Science and Technology, Hirosaki University
"RNA inverse folding for multiple target secondary structures"

RNA inverse folding is the inverse problem of RNA structure prediction: we search for RNA sequences that fold into a given secondary structure, called the target structure. Because RNA inverse folding is useful for synthetic RNA sequence design, various inverse folding algorithms, such as RNAinverse in the Vienna RNA package, have been proposed. We have developed an RNA inverse folding algorithm based on a multi-objective genetic algorithm in which multiple target structures are taken into account in the objective functions. Using inverse folding for multiple target structures, we can design RNA sequences that undergo conformational changes, such as riboswitches. In this poster presentation, we describe our inverse folding algorithm in detail and discuss how it differs from other algorithms for multiple target structures.
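The core of a multi-objective genetic algorithm is Pareto dominance. With one objective per target structure (for example, a distance between a candidate's predicted structure and that target, to be minimized), a candidate stays in the first front if no other candidate is at least as good on every objective and strictly better on at least one. The sketch shows only this selection step; folding candidates and computing the distances are assumed to happen elsewhere, and the objective values in the test are hypothetical.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores):
    """Indices of candidates not dominated by any other candidate."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]
```

Keeping the whole front, rather than a single weighted sum of distances, preserves sequences that trade off fit to one target against fit to another, which is what a switching design such as a riboswitch needs.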



[P24]
Szu-Chin FU1, Kenichiro IMAI1, Kentaro TOMII1

1 CBRC, AIST
"Improving prediction of Caspase-3 substrates by using experimentally verified non-cleavage sites"

Caspases are a family of cysteine proteases critical to the initiation and progression of apoptosis. Among all family members, Caspase-3 is thought to be the main executioner caspase because of its wide range of substrates. Experimental identification of new Caspase-3 substrates has been an active field of research. However, only one computational method, CAT3, has been designed specifically for predicting Caspase-3 substrates. We propose ScreenCap3, a new predictor of Caspase-3 substrates and their cleavage sites. The main difference from CAT3 is that ScreenCap3 is trained with more reliable information on non-cleavage sites. Instead of unconditionally labeling spurious consensus matches as negative data, we undertook a careful literature search and retrieved a total of 1,291 experimentally verified non-cleavage sites. In addition, our dataset includes 473 Caspase-3 cleavage sites, 200 more than were used in the previous method. The improved performance demonstrates the advantage of using experimentally verified non-cleavage sites.
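The labeling strategy can be sketched as follows: candidate sites are taken from matches to the DxxD consensus (cleavage after the second Asp), but a candidate is labeled negative only if it is among experimentally verified non-cleavage sites, and unverified matches are left out instead of being treated as negatives. The regular expression and function names are illustrative assumptions, not ScreenCap3's actual candidate definition.

```python
import re

def candidate_sites(seq):
    """Cut positions just after each DxxD match (between the P1 Asp and the
    next residue). The lookahead lets overlapping matches be found."""
    return [m.start() + 4 for m in re.finditer(r"(?=D..D)", seq)]

def label_sites(seq, verified_cleaved, verified_uncleaved):
    """1 = verified cleavage site, 0 = verified non-cleavage site;
    unverified consensus matches are omitted rather than labeled negative."""
    labels = {}
    for pos in candidate_sites(seq):
        if pos in verified_cleaved:
            labels[pos] = 1
        elif pos in verified_uncleaved:
            labels[pos] = 0
    return labels
```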



[P25]
Yutaka Ueno1,2, Shuntaro Ito 2

1 Health Research Institute, AIST, 2 Graduate School of Information Science, Nara Institute of Science and Technology
"Software Development for the Structure Based Modeling of Protein Interactions and Movements"

When examining models of protein interactions built from structural models together with the results of molecular dynamics simulations, smooth manipulation of the molecular models should be supported by a software tool. Taking advantage of recent advances in scripting languages and 3D graphics methods, we have started developing software for modeling protein interactions that allows the construction of large molecular complex systems. The framework for describing the movements and interactions of molecules through geometric modifications is provided by the scripting language Lua. Collision detection between molecular objects is computed by the Open Dynamics Engine (ODE), as implemented in Luxinia, a software development toolkit. We discuss the fundamental design of our software tool with shared resources.



[P26]
Minoru Sugihara1, Makiko Suwa1,2, Ana-Nicoleta Bondar3

1 CBRC, AIST, 2 Aoyama Gakuin University, 3 Free University of Berlin
"Signal propagation pathway and interaction with G protein in bovine opsin"

In this work, we discuss the signal propagation pathway and the interaction with the G protein in bovine opsin, the prototype of G protein-coupled receptors (GPCRs). The initial model was prepared from the crystal structure of opsin bound to the carboxy terminus of the G protein alpha subunit (GtCT) [1], and we performed molecular dynamics (MD) simulations using the NAMD package. The extended hydrogen bond network from the retinal (ligand) binding site to Arg135 on the cytoplasmic side is highly stable during MD runs at room temperature, and this network is functionally important for signal propagation. Arg135 is highly conserved among GPCRs (98%), and this residue interacts with GtCT through the backbone carbonyl oxygen of Cys347. The MD simulations confirm that Arg135 plays a significant role in the interaction of opsin with the G protein alpha subunit.
[1] P. Scheerer et al. Nature 455 (2008) 497.



[P27]
Sachiyo Aburatani1, Myco Umemura2, Nozomi Nagano1, Kentaro Tomii1, Kiyoshi Asai1, Masayuki Machida2

1 CBRC, AIST, 2 BRI, AIST
"Inference of Gene Regulatory Network for Kojic Acid Biosynthesis in A. oryzae"

Kojic acid is a natural compound produced by fungi, especially Aspergillus oryzae. During fermentation by A. oryzae, kojic acid is produced as a by-product, and it is widely used as a food additive to prevent enzymatic browning, among other uses. While the mechanism of action of kojic acid is well characterized, the regulatory mechanism of its biosynthesis has not been elucidated. To produce kojic acid efficiently, it is important to clarify the mechanism of kojic acid biosynthesis. In this study, we combined partial correlation coefficients and factor analysis to detect 24 koj-related genes, and we applied our structural equation modeling (SEM) approach to infer a network model among these 24 genes. The inferred network model indicated that the transcription factor kojR regulates not only the other koj genes but also precursor-related genes. Furthermore, the expression of koj-related genes was affected by cellobiose dehydrogenase, which is frequently found in various fungi.
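Partial correlation, one ingredient of the gene-selection step, measures the association between two genes after removing the linear effect of all other genes; it can be read off the inverse covariance (precision) matrix P as −P_ij / sqrt(P_ii P_jj). The NumPy sketch assumes more samples than genes so that the covariance matrix is invertible; the factor analysis and SEM steps are not shown.

```python
import numpy as np

def partial_correlations(X):
    """X: samples x genes. Partial correlation of each gene pair given all others."""
    P = np.linalg.inv(np.cov(X, rowvar=False))    # precision matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```

In a chain x → y → z, x and z are correlated, but their partial correlation given y is close to zero, which is why partial correlations help separate direct from indirect regulation.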



[P28]
Pui Shan Wong1, Michihiro Tanaka2, Yoshihiko Sunaga3,4, Masayoshi Tanaka3, Takeaki Taniguchi5, Tomoko Yoshino3,4, Tsuyoshi Tanaka3,4, Sachiyo Aburatani1, Wataru Fujibuchi1

1 CBRC, AIST, 2 Center for iPS Cell Research and Application, Kyoto University, 3 Institute of Engineering, Tokyo University of Agriculture & Technology, 4 JST, CREST, 5 Mitsubishi Research Institute, Inc.
"Pathway analysis on Fistulifera diatom to explore triacylglycerol metabolism"

Fistulifera sp. strain JPCC DA0580 is a diatom that produces triacylglycerol (TAG) under low-nutrient (nitrogen-limited) conditions. Its TAG production system is of great interest for biofuel production. To learn about the regulatory system of TAG production, we applied statistical methods and pathway analyses to gene expression data measured under control and TAG-producing conditions at 0, 24, 48 and 60 hours. Because increased TAG production is thought to be associated with up-regulation of carbohydrate, energy and lipid metabolism, we used these pathways as the gene sets for parametric analysis of gene set enrichment (PAGE). We adapted PAGE to include expression data from different time points and to no longer assume normality, and then used it to analyze 37 gene sets created from KEGG pathways. We found 20 significantly enriched pathways containing 219 genes; 7 of these pathways were up-regulated. We also used PAGE in a new way to examine how the enriched pathways are connected, using the genes shared between pathways to represent a form of pathway interaction. At the gene level, the 219 genes from the significantly enriched pathways were clustered by expression and time to group similarly expressed genes. Fourteen genes showed a positive fold change at all time points, and these genes shared clades with 7 genes belonging to different pathways. Although most genes in our data were down-regulated under low-nutrient conditions, some specific pathways were up-regulated. Since these enriched pathways are related to low-nutrient conditions, the difference in pathway activity can be considered a switch for TAG production.
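For reference, the original PAGE statistic compares the mean fold change S_m of the m genes in a set with the mean μ and standard deviation σ of all genes: Z = (S_m − μ)·√m / σ, with a p-value from the standard normal distribution. The sketch below shows only this baseline single-comparison form; the time-point extension and the removal of the normality assumption described above are not reproduced, and the use of log fold changes and a population standard deviation are assumptions.

```python
import math
from statistics import NormalDist, fmean, pstdev


def page_z(all_fc: dict[str, float], gene_set: set[str]) -> tuple[float, float]:
    """Baseline PAGE Z-score and two-sided normal p-value for one gene set.

    all_fc maps every measured gene to its (log) fold change.
    """
    members = [all_fc[g] for g in gene_set if g in all_fc]
    if not members:
        raise ValueError("gene set contains no measured genes")
    values = list(all_fc.values())
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("fold changes have zero variance")
    z = (fmean(members) - mu) * math.sqrt(len(members)) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical fold changes for four genes and a two-gene pathway.
fc = {"g1": 1.0, "g2": 1.0, "g3": -1.0, "g4": -1.0}
z, p = page_z(fc, {"g1", "g2"})
print(z, p)  # positive Z: the set is shifted toward up-regulation
```

Because Z scales with √m, a modest but consistent shift across a large pathway can reach significance even when no single gene changes strongly.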



[P29]
Daisuke Tominaga

CBRC, AIST
"Dynamic Changes of Regulatory Intensities in the Mouse Circadian Rhythm Gene Network"
"Dynamic structural changes in the mouse circadian gene regulatory network"

In the gene regulatory networks (GRNs) of the mouse circadian clock, gene expression levels oscillate with a 24-hour period. This oscillation is driven by regulatory schemes such as feedback loops. Although knowledge about these schemes in naturally oscillating GRNs is growing rapidly, it remains unclear whether the kinetics of these schemes, i.e., the strength of gene-to-gene regulation, can be represented by fixed parameter values.
The S-system is one of the most established canonical ordinary differential equation models; however, applying the S-system to GRNs in the conventional way is quite difficult because of its high-dimensional, nonlinear, real-valued parameter space. Furthermore, this approach ignores the possibility that the strength of gene-to-gene regulation in GRNs changes dynamically.
We introduce decoupling and log-transformation into the S-system to depict dynamic changes in gene-to-gene regulation strength.
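For reference, the standard S-system and one common way of decoupling and log-linearizing it are written out below. The first line is the S-system for n genes with rate constants α_i, β_i and kinetic orders g_ij, h_ij. The second line is the decoupled form, in which each derivative is replaced by a slope S_i(t_k) estimated from the measured time course at N sampling times, so that each gene's equation can be fitted separately. The third line follows from the second by moving the degradation term to the left and taking logarithms, which makes the production term linear in ln α_i and g_ij. This is the textbook formulation; the abstract does not state whether the poster uses exactly this form or how it tracks changing regulation strength over time.

```latex
\begin{align*}
\frac{dX_i}{dt} &= \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, && i = 1, \dots, n \\
S_i(t_k) &= \alpha_i \prod_{j=1}^{n} X_j(t_k)^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j(t_k)^{h_{ij}}, && k = 1, \dots, N \\
\ln\!\Big( S_i(t_k) + \beta_i \prod_{j=1}^{n} X_j(t_k)^{h_{ij}} \Big) &= \ln \alpha_i + \sum_{j=1}^{n} g_{ij} \ln X_j(t_k)
\end{align*}
```

With β_i and h_ij held fixed, the last line is an ordinary linear regression of the left-hand side on ln X_j(t_k), which avoids searching the full nonlinear parameter space at once.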



[P30]
Shohei MARUYAMA1,2, Sachiyo ABURATANI2, Yasuo MATSUYAMA1

1 Department of Computer Science and Engineering, Waseda University, 2 CBRC, AIST
"Automatic Estimation of Transcriptional Regulation of Budding Yeast by a Machine Learning Method"
"Construction of a system for automatic extraction of transcriptional regulatory relationships in budding yeast"

Revealing the gene regulatory system between DNA and protein is an important task for understanding a living cell as a controlled system. As a foundation of the intracellular system, gene expression is mainly controlled by transcription factors (TFs). We therefore applied a machine learning technique to detect regulatory relationships between genes and TF proteins. In this study, we applied the Relevance Vector Machine (RVM) to expression profiles of S. cerevisiae. We compiled 93 public data sets covering 704 gene expression profiles measured under 1,411 different conditions. The expression profiles of empirically confirmed regulatory pairs of TFs and genes were arranged as vectors to form the positive data. Negative data were defined as follows: i) TFs and genes without a binding site in the upstream region, ii) pairs of enzyme-coding genes, and iii) reversed arrangements of the positive data. Trained on these data, the RVM recognized specific patterns of transcriptional regulation and estimated the probability of transcriptional regulation based on those patterns. The accuracy of our method was 84%, higher than that of a support vector machine (SVM; 79%). To find new transcriptional regulations, we applied our method to all gene pairs. As a result, we detected 313,500 new regulations by TFs and found some conserved sequences among the regulated genes as new binding sites. Our method can be useful for estimating previously unexplained transcriptional regulations.
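The training data described above can be pictured as labeled pair vectors. The sketch below assumes that "arranged as vectors" means concatenating the regulator's expression profile with the target's; the gene names are hypothetical, negatives of types i) and ii) are supplied by the caller, and the RVM classifier itself is not shown.

```python
# Hypothetical expression profiles: gene name -> values across conditions.
Profiles = dict[str, list[float]]


def pair_vector(profiles: Profiles, regulator: str, target: str) -> list[float]:
    """Concatenate the regulator's profile with the target's (assumed encoding)."""
    return profiles[regulator] + profiles[target]


def build_training_set(
    profiles: Profiles,
    positive_pairs: list[tuple[str, str]],
    other_negative_pairs: list[tuple[str, str]],
) -> tuple[list[list[float]], list[int]]:
    """Return feature vectors X and labels y (1 = regulation, 0 = no regulation)."""
    X: list[list[float]] = []
    y: list[int] = []
    for tf, gene in positive_pairs:
        X.append(pair_vector(profiles, tf, gene))
        y.append(1)
        # Negative type iii): the reversed arrangement of each positive pair.
        X.append(pair_vector(profiles, gene, tf))
        y.append(0)
    # Negative types i) and ii), e.g. TF-gene pairs lacking an upstream binding
    # site or pairs of enzyme-coding genes, are assumed to be listed by the caller.
    for a, b in other_negative_pairs:
        X.append(pair_vector(profiles, a, b))
        y.append(0)
    return X, y


profiles = {"TF1": [1.0, 2.0], "G1": [3.0, 4.0], "E1": [0.5, 0.5], "E2": [0.1, 0.2]}
X, y = build_training_set(profiles, [("TF1", "G1")], [("E1", "E2")])
print(y)  # [1, 0, 0]
```

Because the reversed pair contains the same numbers in a different order, type iii) negatives force the classifier to learn the direction of regulation rather than mere co-expression.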



[P31]
Kana Shimizu1, Koji Nuida2, Hiromi Arai3, Shigeo Mitsunari4, Michiaki Hamada1,5, Koji Tsuda1, Jun Sakuma6, Takatsugu Hirokawa1,7, Goichiro Hanaoka2, Kiyoshi Asai1,5

1 CBRC, AIST, 2 RISEC, AIST, 3 RIKEN, 4 Cybozu Labs, 5 University of Tokyo, 6 Tsukuba University, 7 Molprof, AIST
"An efficient privacy-preserving similarity search protocol for chemical compound databases"
"An efficient protocol for privacy-preserving search of similar compounds"

Searching a database for similar compounds is the most important process in in-silico drug screening. Since a query compound is an important starting point for a new drug, a query holder who fears that the query could be monitored by the database server usually downloads all the records in the database and uses them on a closed network. A serious dilemma arises, however, when the database holder also wants to reveal no information other than the search results, and this dilemma prevents efficient use of many important databases. There is therefore an emerging demand for new technology that overcomes this dilemma. In this study, we propose a novel protocol that enables database searches while preserving the privacy of both the query holder and the database holder. Such privacy-preserving protocols generally entail highly time-consuming cryptographic techniques such as general-purpose multi-party computation, but our protocol avoids these techniques and is built only from an additive-homomorphic cryptosystem. Hence it is highly efficient in both CPU time and communication size and scales easily to large databases. In an experiment searching ChEMBL, which contains more than 1,200,000 compounds, the proposed method was 50,000 times faster in CPU time and 12,000 times more efficient in communication size than general-purpose multi-party computation. Technology related to privacy has so far scarcely been discussed in bioinformatics; we therefore believe our study serves as an important model for the practical application of privacy-preserving data mining.
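To illustrate what "additive-homomorphic" buys, the toy sketch below uses Paillier encryption, a well-known additively homomorphic scheme chosen here only as a familiar example (the abstract does not name the cryptosystem used). A client encrypts the bits of its query fingerprint; the server, without decrypting, combines them with its own fingerprint to obtain an encryption of the number of shared on-bits, from which similarity scores such as Tanimoto can be derived. The primes are toy values far too small for real use, and a real protocol would also blind the server's result so that the client learns nothing beyond the search output.

```python
import math
import random

# Toy Paillier key (hypothetical demo primes; real keys use >= 2048-bit moduli).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # Carmichael function of N
MU = pow(LAM, -1, N)           # valid because the generator is g = N + 1


def encrypt(m: int, rng: random.Random) -> int:
    """Enc(m) = (N+1)^m * r^N mod N^2 with random r coprime to N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Dec(c) = L(c^lambda mod N^2) * mu mod N, where L(u) = (u - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add_ciphertexts(c1: int, c2: int) -> int:
    """Enc(a) * Enc(b) = Enc(a + b)."""
    return c1 * c2 % N2


def mul_plain(c: int, k: int) -> int:
    """Enc(a)^k = Enc(k * a)."""
    return pow(c, k, N2)


def encrypted_overlap(enc_query: list[int], db_bits: list[int]) -> int:
    """Server side: Enc(sum_i q_i * d_i) computed without seeing the query bits."""
    acc = 1  # a trivial encryption of 0
    for c, d in zip(enc_query, db_bits):
        acc = add_ciphertexts(acc, mul_plain(c, d))
    return acc


rng = random.Random(0)
query = [1, 0, 1, 1, 0]        # client's fingerprint (hypothetical)
database_fp = [1, 1, 0, 1, 0]  # one server-side fingerprint (hypothetical)
enc_query = [encrypt(b, rng) for b in query]
print(decrypt(encrypted_overlap(enc_query, database_fp)))  # 2
```

Only modular multiplications and exponentiations are needed on the server, which is consistent with the abstract's point that avoiding general-purpose multi-party computation keeps CPU time and communication small.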



[P32]
Hiroaki Iwata1, Minako Yoshihara2, Yoshihiro Yamanishi1

1 Division of System Cohort, Medical Institute of Bioregulation, Kyushu University, 2 Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University
"Predicting drug-disease association network toward drug repositioning"

Drug repositioning, the detection of new indications for known drugs, is an important issue in recent pharmaceutical research. In this study, we develop a new computational method to predict unknown drug-disease associations on a large scale for drug repositioning. We define a descriptor for each drug-disease pair based on various data sets of drugs and diseases (e.g., drug side effects, target proteins, biological pathways, disease symptoms, and environmental factors), and construct a predictive model from these descriptors in a supervised machine learning framework. We demonstrate the usefulness of the proposed method through cross-validation experiments on known drug-disease associations. Finally, we make a comprehensive prediction of a drug-disease association network consisting of about 4,500 drugs and 1,000 diseases, and show some successful examples of newly found drug indications.
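The abstract does not say how a drug's features and a disease's features are combined into one pair descriptor. Two common choices are sketched below as assumptions: simple concatenation, and the flattened outer (tensor) product, which gives one feature per combination of a drug feature and a disease feature. The binary features shown are hypothetical; the resulting vectors would then be fed to whatever supervised classifier is used.

```python
def concat_descriptor(drug: list[float], disease: list[float]) -> list[float]:
    """Pair descriptor as the concatenation of drug and disease features."""
    return drug + disease


def tensor_descriptor(drug: list[float], disease: list[float]) -> list[float]:
    """Pair descriptor as the flattened outer product of the two feature vectors."""
    return [a * b for a in drug for b in disease]


# Hypothetical features, e.g. drug side-effect bits and disease symptom bits.
drug = [1.0, 0.0, 1.0]
disease = [1.0, 1.0]
print(concat_descriptor(drug, disease))  # [1.0, 0.0, 1.0, 1.0, 1.0]
print(tensor_descriptor(drug, disease))  # [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
```

Concatenation keeps the descriptor short, while the tensor form lets a linear model weight specific drug-feature/disease-feature combinations at the cost of a dimension equal to the product of the two feature counts.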



[P33]
Nozomi NAGANO1, Tsuyoshi KATO2, Naoko NAKAYAMA3

1 CBRC, AIST, 2 Faculty of Science and Engineering, Gunma University, 3 Technology Research Association of Highly Efficient Gene Design (TRAHED)
"Analyses of enzyme structures and functions"

No existing enzyme classification has considered 3D structures and catalytic mechanisms. In our project, therefore, enzyme structures and functions, including enzyme reaction mechanisms, have been analyzed and classified in order to develop the enzyme reaction database EzCatDB, as well as enzyme-related software. Our novel classification can complement the conventional Enzyme Commission (EC) numbers with information on enzyme reaction mechanisms and active sites. As EzCatDB-related software, EzMetAct has been developed based on a metric learning algorithm in order to detect catalytic sites effectively. In addition to enzymes, enzyme-related biomolecules, such as ligand molecules including reaction intermediates, have also been analyzed.
Furthermore, we are now focusing on the enzymes involved in secondary metabolism for the TRAHED project.

[1] Nagano (2005) Nucleic Acids Research, 33, D407-D412.
[2] Nagano et al. (2007) PROTEINS: Structure, Function, and Bioinformatics, 66, 147-159.
[3] Kato & Nagano (2010) Bioinformatics, 26, 2698-2704.
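Metric learning, the basis of EzMetAct mentioned above, replaces the plain Euclidean distance with a learned distance d_M(x, y) = √((x − y)ᵀ M (x − y)) for a positive semidefinite matrix M. The sketch below only shows how such a distance could rank candidate sites against a known catalytic-site template once M has been learned; the features, the matrix M, and the site names are hypothetical, and EzMetAct's actual features and learning procedure are not described here.

```python
import math


def metric_distance(x: list[float], y: list[float], M: list[list[float]]) -> float:
    """Learned (Mahalanobis-type) distance sqrt((x - y)^T M (x - y)); M assumed PSD."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


def rank_candidates(template: list[float], candidates: dict[str, list[float]],
                    M: list[list[float]]) -> list[str]:
    """Order candidate sites from most to least similar to the template."""
    return sorted(candidates, key=lambda name: metric_distance(template, candidates[name], M))


template = [0.0, 0.0]
candidates = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
learned = [[9.0, 0.0], [0.0, 0.01]]  # hypothetical: first feature matters far more
print(rank_candidates(template, candidates, identity))  # ['b', 'c', 'a']
print(rank_candidates(template, candidates, learned))   # ['c', 'b', 'a']
```

The same candidates are ranked differently once M emphasizes the features that best discriminate true catalytic sites, which is the effect a learned metric is meant to achieve.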



[P34]
Kunie Sakurai1, Uma D. Vempati2, Stephan C. Schurer2, and Vance P. Lemmon1,2

1 The Miami Project to Cure Paralysis, Department of Neurological Surgery, University of Miami, Miami, Florida, United States of America, 2 Center for Computational Science, University of Miami, Miami, Florida, United States of America
"BioAssay Ontology: a tool for enabling robust annotation and description of diverse biological screening assays"

High Content Screening (HCS) allows quantification of complex biological events using automated imaging. A single HCS assay produces a very large number of images together with associated metadata. Image analysis typically provides many measurements that are used to calculate endpoints for the research target. HCS assay data are found in the literature and in public databases (e.g., PubChem). However, a major hindrance to the interrogation and analysis of public HCS data is the lack of an accepted standard terminology for describing HCS assays and image analysis results. Our group previously developed the BioAssay Ontology (BAO) to address this challenge, which must be overcome to make full use of the available data in drug and probe discovery. In the current study, we extend BAO to provide standardized terminology and formal descriptions of HCS assays, cellular phenotypes, and screening results.



[P35]
Hiroto SASAKI1, Kazuhiro MUKAIYAMA1, Kei KANIE1,2, Wakana YAMAMOTO1, Yasujiro KIYOTA3, Hiroyuki HONDA1, Ryuji KATO1,2

1 Department of Biotechnology, Graduate School of Engineering, Nagoya University, 2 Department of Basic Medicinal Sciences, Graduate School of Pharmaceutical Sciences, Nagoya University, 3 Nikon Instruments Company
"Morphology-based non-invasive evaluation of siRNA effects"
"Non-invasive evaluation of siRNA functionality based on cell morphology analysis"

Nucleic acid medicines (e.g., siRNA and miRNA) are expected to provide new drug strategies. In the screening of these medicines, endpoint assays are used for cell-based evaluation in human cells. However, real-time evaluation of their functionality is more informative than endpoint evaluation.
We constructed an image-based system that evaluates cell quality from cell morphological information. We continuously collected phase contrast (non-stained) images and extracted time-course cell morphological information from them using our original image processing techniques. We then analyzed the relationship between the morphological features and biological experimental results using multi-parametric machine learning algorithms.
In our poster presentation, we propose applying this morphology-based, non-invasive, and quantitative evaluation system to the screening of siRNA effects.
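The image processing used in this work is described only as original techniques. As a generic, hypothetical example of the kind of morphological features such a system extracts, the sketch below computes area, perimeter and circularity (4π·area/perimeter²) from a cell that has already been segmented into a binary mask; segmentation of phase contrast images is assumed to have been done elsewhere. Repeating it for every frame would give a time course of features.

```python
import math


def region_features(mask: list[list[int]]) -> dict[str, float]:
    """Area, pixel-edge perimeter, and circularity of the foreground in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Count each pixel edge that borders background or the image border.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                    perimeter += 1
    circularity = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area": area, "perimeter": perimeter, "circularity": circularity}


# Hypothetical 5x5 mask containing one 3x3 "cell".
mask = [[0] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        mask[r][c] = 1
print(region_features(mask))
```

Counting pixel edges overestimates the perimeter of curved or diagonal outlines, so circularity computed this way is a relative measure for comparing cells and time points rather than an exact shape descriptor.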



[P36]
Bahtiyor Nosirov1, Shunsuke Teraguchi2, Alexis Vandenbon1, Edward Wijaya1, Yoriko Suenari1, Daron Standley1

1 Systems Immunology Laboratory, Immunology Frontier Research Center (iFReC), Osaka University, 2 Quantitative Immunology Research Unit, Immunology Frontier Research Center (iFReC), Osaka University
"Identification of MicroRNA Biomarkers for the Prediction of Adverse Reactions to Immunization"

In this study, we present a large-scale analysis of serum miRNA levels in children before vaccination with an influenza vaccine, in relation to the subsequent development of febrile episodes and the efficacy of vaccination (measured as the increase in antibody titer). Our goal is to find serum miRNAs that can serve as biomarkers whose levels predict susceptibility to adverse reactions (e.g., fever) to vaccination. We employed regression analysis combined with bootstrapping and performed computational validation of the results. We found several miRNAs that can predict adverse reactions (e.g., fever) to vaccination.
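The abstract combines regression with bootstrapping but does not give the model. As a stand-in, the sketch below computes a percentile bootstrap confidence interval for the slope of a simple linear regression of an outcome on one miRNA level; the variable names and data are hypothetical, and a binary outcome such as fever would more typically be modeled with logistic regression.

```python
import random
from statistics import fmean


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def bootstrap_slope_ci(x: list[float], y: list[float], n_boot: int = 1000,
                       alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the regression slope."""
    ols_slope(x, y)  # fail early if the full data cannot be fitted
    rng = random.Random(seed)
    n = len(x)
    slopes: list[float] = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        try:
            slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
        except ValueError:
            continue  # degenerate resample (all x identical); draw again
    slopes.sort()
    lo = slopes[int((alpha / 2) * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical pre-vaccination miRNA levels and outcome values.
mirna = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.0, 2.5]
outcome = [0.1, 0.4, 0.5, 0.9, 1.0, 1.6, 1.5, 2.2]
print(bootstrap_slope_ci(mirna, outcome))
```

An interval that excludes zero suggests the association is not an artifact of a few subjects, which is one way bootstrapping can support the selection of candidate biomarkers.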

