Archive for July, 2011
Integrative Network Biology: Graph Prototyping for Co-Expression Cancer Networks
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Karl G. Kugler, Laurin A. J. Mueller, Armin Graber, Matthias Dehmer
Network-based analysis has proven useful in biologically oriented areas, e.g., to explore the dynamics and complexity of biological networks. Investigating a set of networks allows deriving general knowledge about the underlying topological and functional properties. The integrative analysis of networks typically combines networks from different studies that investigate the same or similar research questions. In order to perform an integrative analysis it is often necessary to compare the properties of matching edges across the data set. This identification of common edges is often burdensome and computationally intensive. Here, we present an approach that is different from inferring a new network based on common features. Instead, we select one network as a graph prototype, which then represents a set of comparable network objects, as it has the least average distance to all other networks in the same set. We demonstrate the usefulness of the graph prototyping approach on a set of prostate cancer networks and a set of corresponding benign networks. We further show that the distances within the cancer group and the benign group are statistically different depending on the utilized distance measure.
Tags: computer, news, science
Posted in Computer Science | Comments Off
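As a rough illustration of the graph prototype idea in the abstract above (pick the network with the least average distance to all others in the set), here is a minimal Python sketch. The Jaccard distance on edge sets, the helper names, and the toy networks are assumptions made for this example only; the abstract does not say which distance measures the authors used.

```python
from itertools import combinations


def edges(pairs):
    """Build an undirected edge set from (node, node) pairs."""
    return {frozenset(pair) for pair in pairs}


def edge_set_distance(a, b):
    """Jaccard distance between two edge sets (an illustrative choice only)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def graph_prototype(networks, distance=edge_set_distance):
    """Return the name of the network with the least average distance to the rest."""
    names = list(networks)
    if len(names) < 2:
        raise ValueError("need at least two networks to compare")
    totals = {name: 0.0 for name in names}
    # Each pairwise distance contributes to the totals of both networks.
    for u, v in combinations(names, 2):
        d = distance(networks[u], networks[v])
        totals[u] += d
        totals[v] += d
    return min(names, key=lambda name: totals[name] / (len(names) - 1))


# Hypothetical toy co-expression networks (gene names are placeholders).
toy_networks = {
    "study_a": edges([("g1", "g2"), ("g2", "g3"), ("g3", "g4")]),
    "study_b": edges([("g1", "g2"), ("g2", "g3")]),
    "study_c": edges([("g2", "g3"), ("g4", "g5")]),
}
prototype = graph_prototype(toy_networks)
```

Any other graph distance can be passed in through the `distance` argument; the selection step itself does not change.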
Modeling Disordered Regions in Proteins Using Rosetta
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Ray Yu-Ruei Wang, Yan Han, Kristina Krassovsky, William Sheffler, Michael Tyka, David Baker
Protein structure prediction methods such as Rosetta search for the lowest energy conformation of the polypeptide chain. However, the experimentally observed native state is at a minimum of the free energy, rather than the energy. The neglect of the missing configurational entropy contribution to the free energy can be partially justified by the assumption that the entropies of alternative folded states, while very much less than unfolded states, are not too different from one another, and hence can be to a first approximation neglected when searching for the lowest free energy state. The shortcomings of current structure prediction methods may be due in part to the breakdown of this assumption. Particularly problematic are proteins with significant disordered regions which do not populate single low energy conformations even in the native state. We describe two approaches within the Rosetta structure modeling methodology for treating such regions. The first does not require advance knowledge of the regions likely to be disordered; instead these are identified by minimizing a simple free energy function used previously to model protein folding landscapes and transition states. In this model, residues can be either completely ordered or completely disordered; they are considered disordered if the gain in entropy outweighs the loss of favorable energetic interactions with the rest of the protein chain. The second approach requires identification in advance of the disordered regions either from sequence alone using for example the DISOPRED server or from experimental data such as NMR chemical shifts. During Rosetta structure prediction calculations the disordered regions make only unfavorable repulsive contributions to the total energy. We find that the second approach has greater practical utility and illustrate this with examples from de novo structure prediction, NMR structure calculation, and comparative modeling.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
Condition-Dependent Cell Volume and Concentration of Escherichia coli to Facilitate Data Conversion for Systems Biology Modeling
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Benjamin Volkmer, Matthias Heinemann
Systems biology modeling typically requires quantitative experimental data such as intracellular concentrations or copy numbers per cell. In order to convert population-averaging omics measurement data to intracellular concentrations or cellular copy numbers, the total cell volume and number of cells in a sample need to be known. Unfortunately, even for the often-studied model bacterium Escherichia coli this information is hardly available; furthermore, certain measures (e.g. cell volume) also depend on the growth condition. In this work, we have determined these basic data for E. coli cells when grown in 22 different conditions so that respective data conversions can be done correctly. First, we determine growth-rate dependent cell volumes. Second, we show that in a 1 ml E. coli sample at an optical density (600 nm) of 1 the total cell volume is around 3.6 µl for all conditions tested. Third, we demonstrate that the cell number in a sample can be determined on the basis of the sample's optical density and the cells' growth rate. The data presented will allow for conversion of E. coli measurement data normalized to optical density into volumetric cellular concentrations and copy numbers per cell - two important parameters for systems biology model development.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
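The conversion described in the E. coli abstract above can be sketched in a few lines of Python. The only number taken from the abstract is the total cell volume of about 3.6 µl per 1 ml of culture at an optical density (600 nm) of 1. The function names and the example amounts are hypothetical, and the cell count is treated as a given input because the abstract does not state the relationship between optical density, growth rate and cell number.

```python
# From the abstract: ~3.6 µl total cell volume per ml of culture per OD600 unit.
CELL_VOLUME_UL_PER_ML_OD = 3.6
AVOGADRO = 6.02214076e23  # molecules per mole


def total_cell_volume_ul(sample_volume_ml, od600):
    """Total cell volume (µl) in a sample, scaled linearly with volume and OD."""
    return CELL_VOLUME_UL_PER_ML_OD * sample_volume_ml * od600


def intracellular_concentration_mM(amount_nmol, sample_volume_ml, od600):
    """Convert a measured amount (nmol) to a concentration; nmol/µl equals mM."""
    return amount_nmol / total_cell_volume_ul(sample_volume_ml, od600)


def copies_per_cell(amount_nmol, n_cells):
    """Average molecules per cell; n_cells must be supplied by the caller."""
    return amount_nmol * 1e-9 * AVOGADRO / n_cells


# Hypothetical measurement: 7.2 nmol of a metabolite from 1 ml at OD600 = 2.
example_mM = intracellular_concentration_mM(7.2, sample_volume_ml=1.0, od600=2.0)
```

In this made-up example the sample holds about 7.2 µl of cells, so 7.2 nmol corresponds to roughly 1 mM inside the cells.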
Prediction of Body Fluids where Proteins are Secreted into Based on Protein Interaction Network
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Le-Le Hu, Tao Huang, Yu-Dong Cai, Kuo-Chen Chou
Determining the body fluids into which proteins can be secreted is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict which kinds of body fluids human proteins can be secreted into. For a newly constructed benchmark dataset that consists of 529 human-secreted proteins, the prediction accuracy for the most likely body fluid location predicted by our method via the jackknife test was 79.02%, significantly higher than the success rate of a random guess (29.36%). The likelihood that the predicted body fluids of the first four orders contain all the true body fluids where the proteins can be secreted into is 62.94%. Our method was further demonstrated with two independent datasets: one contains 57 proteins that can be secreted into blood, while the other contains 61 proteins that can be secreted into plasma/serum and were possible biomarkers associated with various cancers. For the 57 proteins in the first dataset, 55 were correctly predicted as blood-secreted proteins. For the 61 proteins in the second dataset, 58 were predicted to be most likely in plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas for both basic research and drug development.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Stefan Siebert, Mark D. Robinson, Sophia C. Tintori, Freya Goetz, Rebecca R. Helm, Stephen A. Smith, Nathan Shaner, Steven H. D. Haddock, Casey W. Dunn
We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
Prediction of Ubiquitination Sites by Using the Composition of k-Spaced Amino Acid Pairs
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Zhen Chen, Yong-Zi Chen, Xiao-Feng Wang, Chuan Wang, Ren-Xiang Yan, Ziding Zhang
As one of the most important reversible protein post-translational modifications, ubiquitination has been reported to be involved in many biological processes and closely implicated in various diseases. To fully decipher the molecular mechanisms of ubiquitination-related biological processes, an initial but crucial step is the recognition of ubiquitylated substrates and the corresponding ubiquitination sites. Here, a new bioinformatics tool named CKSAAP_UbSite was developed to predict ubiquitination sites from protein sequences. With the assistance of a Support Vector Machine (SVM), the highlight of CKSAAP_UbSite is to employ the composition of k-spaced amino acid pairs surrounding a query site (i.e. any lysine in a query sequence) as input. When trained and tested in the dataset of yeast ubiquitination sites (Radivojac et al., Proteins, 2010, 78: 365–380), a 100-fold cross-validation on a 1∶1 ratio of positive and negative samples revealed that the accuracy and MCC of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. The proposed CKSAAP_UbSite has also been intensively benchmarked and exhibits better performance than some existing predictors, suggesting that it can serve as a useful tool for the community. Currently, CKSAAP_UbSite is freely accessible at http://protein.cau.edu.cn/cksaap_ubsite/. Moreover, we also found that the sequence patterns around ubiquitination sites are not conserved across different species. To ensure a reasonable prediction performance, the application of the current CKSAAP_UbSite should be limited to the proteome of yeast.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
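To make the "composition of k-spaced amino acid pairs" in the ubiquitination abstract above more concrete, here is a minimal Python sketch of that kind of encoding. It is not the CKSAAP_UbSite implementation: the window size (`flank`), the largest gap (`k_max`), the padding character, and the choice to normalise by the number of counted pairs are all assumptions for illustration, and the SVM training step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def lysine_windows(sequence, flank=3, pad="X"):
    """Yield (position, window) for every lysine; ends are padded with `pad`."""
    padded = pad * flank + sequence + pad * flank
    for pos, residue in enumerate(sequence):
        if residue == "K":
            # The lysine sits at the centre of a window of length 2*flank + 1.
            yield pos, padded[pos:pos + 2 * flank + 1]


def cksaap_features(window, k_max=2):
    """Frequencies of residue pairs separated by k = 0..k_max other residues."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skips pairs that involve the padding character
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


# Tiny made-up example: one lysine, window "AKA", gaps k = 0 and k = 1.
toy_features = cksaap_features("AKA", k_max=1)
toy_windows = list(lysine_windows("MKT", flank=1))
```

Each gap size contributes 400 values, so the feature vector has 400 × (`k_max` + 1) entries per candidate lysine.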
Quantitative Analysis of Protein Phosphorylations and Interactions by Multi-Colour IP-FCM as an Input for Kinetic Modelling of Signalling Networks
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Sumit Deswal, Anna K. Schulze, Thomas Höfer, Wolfgang W. A. Schamel
Background: To understand complex biological signalling mechanisms, mathematical modelling of signal transduction pathways has been applied successfully in the last few years. However, precise quantitative measurements of signal transduction events, such as activation-dependent phosphorylation of proteins, remain one bottleneck to this success.
Methodology/Principal Findings: We use multi-colour immunoprecipitation measured by flow cytometry (IP-FCM) for studying signal transduction events with unrivalled precision. In this method, antibody-coupled latex beads capture the protein of interest from cellular lysates and are then stained with differently fluorescent-labelled antibodies to quantify the amount of the immunoprecipitated protein, of an interaction partner and of phosphorylation sites. The fluorescence signals are measured by FCM. Combining this procedure with beads containing defined amounts of a fluorophore allows retrieving absolute numbers of stained proteins, and not only relative values. Using IP-FCM we derived multidimensional data on the membrane-proximal T-cell antigen receptor (TCR-CD3) signalling network, including the recruitment of the kinase ZAP70 to the TCR-CD3 and subsequent ZAP70 activation by phosphorylation in the murine T-cell hybridoma and primary murine T cells. Counter-intuitively, these data showed that cell stimulation by pervanadate led to a transient decrease of the phospho-ZAP70/ZAP70 ratio at the TCR. A mechanistic mathematical model of the underlying processes demonstrated that an initial massive recruitment of non-phosphorylated ZAP70 was responsible for this behaviour. Further, the model predicted a temporal order of multisite phosphorylation of ZAP70 (with Y319 phosphorylation preceding phosphorylation at Y493) that we subsequently verified experimentally.
Conclusions/Significance: The quantitative data sets generated by IP-FCM are one order of magnitude more precise than Western blot data. This accuracy allowed us to gain unequalled insight into the dynamics of the TCR-CD3-ZAP70 signalling network.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
Toward an Understanding of the Molecular Mechanisms of Barnacle Larval Settlement: A Comparative Transcriptomic Approach
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Zhang-Fan Chen, Kiyotaka Matsumura, Hao Wang, Shawn M. Arellano, Xingcheng Yan, Intikhab Alam, John A. C. Archer, Vladimir B. Bajic, Pei-Yuan Qian
Background: The barnacle Balanus amphitrite is a globally distributed biofouler and a model species in intertidal ecology and larval settlement studies. However, a lack of genomic information has hindered the comprehensive elucidation of the molecular mechanisms coordinating its larval settlement. The pyrosequencing-based transcriptomic approach is thought to be useful to identify key molecular changes during larval settlement.
Methodology and Principal Findings: Using 454 pyrosequencing, we collected a total of 630,845 reads, including 215,308 from the larval stages and 415,537 from the adults; 23,451 contigs were generated while 77,785 remained as singletons. We annotated 31,720 of the 92,322 predicted open reading frames, which matched hits in the NCBI NR database, and identified 7,954 putative genes that were differentially expressed between the larval and adult stages. Of these, several genes were further characterized with quantitative real-time PCR and in situ hybridization, revealing some key findings: 1) vitellogenin was uniquely expressed in the late nauplius stage, suggesting it may be an energy source for the subsequent non-feeding cyprid stage; 2) the locations of mannose receptors suggested they may be involved in the sensory system of cyprids; 3) 20 kDa-cement protein homologues were expressed in the cyprid cement gland and probably function during attachment; and 4) receptor tyrosine kinases were more highly expressed in the cyprid stage and may be involved in signal perception during larval settlement.
Conclusions: Our results not only provide the basis for several new hypotheses about gene functions during larval settlement, but also make this large B. amphitrite transcriptome dataset available for further exploration of larval settlement and developmental pathways in this important marine species.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
How Predation and Landscape Fragmentation Affect Vole Population Dynamics
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Trine Dalkvist, Richard M. Sibly, Chris J. Topping
Background: Microtine species in Fennoscandia display a distinct north-south gradient from regular cycles to stable populations. The gradient has often been attributed to changes in the interactions between microtines and their predators. Although the spatial structure of the environment is known to influence predator-prey dynamics of a wide range of species, it has scarcely been considered in relation to the Fennoscandian gradient. Furthermore, the length of the microtine breeding season also displays a north-south gradient. However, little consideration has been given to its role in shaping or generating population cycles. Because these factors covary along the gradient, it is difficult to distinguish their effects experimentally in the field. The distinction is here attempted using realistic agent-based modelling.
Methodology/Principal Findings: By using a spatially explicit computer simulation model based on behavioural and ecological data from the field vole (Microtus agrestis), we generated a number of repeated time series of vole densities whose mean population size and amplitude were measured. Subsequently, these time series were subjected to statistical autoregressive modelling, to investigate the effects on vole population dynamics of making predators more specialised, of altering the breeding season, and of increasing the level of habitat fragmentation. We found that fragmentation as well as the presence of specialist predators are necessary for the occurrence of population cycles. Habitat fragmentation and predator assembly jointly determined cycle length and amplitude. Length of vole breeding season had little impact on the oscillations.
Significance: There is good agreement between our results and the experimental work from Fennoscandia, but our results allow distinction of causation that is hard to unravel in field experiments. We hope our results will help understand the reasons for cycle gradients observed in other areas. Our results clearly demonstrate the importance of landscape fragmentation for population cycling and we recommend that the degree of fragmentation be more fully considered in future analyses of vole dynamics.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
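The vole abstract above mentions that simulated density time series were "subjected to statistical autoregressive modelling" without giving details. The Python sketch below shows one possible form such an analysis could take: a second-order autoregressive fit by least squares, plus the cycle period implied by the fitted coefficients. The model order, the absence of an intercept (the series is assumed to be mean-centred already, e.g. log density minus its mean), the function names, and the toy series are all assumptions for illustration, not the authors' procedure.

```python
import math


def fit_ar2(series):
    """Least-squares fit of x[t] = b1*x[t-1] + b2*x[t-2] for a centred series."""
    if len(series) < 4:
        raise ValueError("need at least four observations")
    target, lag1, lag2 = series[2:], series[1:-1], series[:-2]
    # Entries of the 2x2 normal equations.
    s11 = sum(a * a for a in lag1)
    s22 = sum(b * b for b in lag2)
    s12 = sum(a * b for a, b in zip(lag1, lag2))
    r1 = sum(a * y for a, y in zip(lag1, target))
    r2 = sum(b * y for b, y in zip(lag2, target))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("lagged series are collinear; cannot fit")
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


def cycle_period(b1, b2):
    """Period (in time steps) of the oscillation, or None if there is none."""
    if b1 * b1 + 4 * b2 >= 0:
        return None  # real characteristic roots: no cyclic component
    return 2 * math.pi / math.acos(b1 / (2 * math.sqrt(-b2)))


# Noise-free toy series generated from known coefficients b1 = 1.0, b2 = -0.5.
toy_series = [1.0, 0.5]
for _ in range(10):
    toy_series.append(1.0 * toy_series[-1] - 0.5 * toy_series[-2])

b1_hat, b2_hat = fit_ar2(toy_series)
period = cycle_period(b1_hat, b2_hat)
```

On this noise-free toy series the fit recovers the generating coefficients, which imply a cycle of about eight time steps; real simulation output would add noise and would normally call for an established statistics package rather than hand-written normal equations.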
Evolution of Trefoil Factor(s): Genetic and Spatio-Temporal Expression of Trefoil Factor 2 in the Chicken (Gallus Gallus Domesticus)
Written by Scott Christley et al. on July 29, 2011 – 9:00 pm -by Zhengyu Jiang, Amy C. Lossie, Todd J. Applegate
Trefoil factors are essential healing initiators participating in mucosal reconstitution and tissue morphogenesis, especially on the surfaces of the gastrointestinal tract. This family has been cloned and characterized predominantly from mammals and amphibians. Avian species ingest stone and grit to help digest food, which may expose their gut to severe physical conditions. To further the understanding of the function of the TFF gene family across species, we undertook this research to clone, sequence, and characterize the spatio-temporal expression patterns of chicken TFF2 (ChTFF2) cDNA. Bioinformatics analysis of the promoter region and deduced amino acid sequence demonstrated that ChTFF2 contained unique characteristics; specifically the chicken promoter has multiple start sites and the protein contains a series of Lys-Lys-Val repeats. Unlike mammals, where TFF2 is detected primarily in the stomach, and occasionally in the proximal duodenum, chicken TFF2 transcripts are found throughout the gastrointestinal tract, with major expression sites in the glandular and muscular stomach as well as evident expression in the colon, small intestine, cecal tonsil and crop. Temporal analysis of intestinal ChTFF2 transcripts by quantitative RT-PCR showed high levels in embryos and a trend of constant expression during embryonic and post-hatch development, with a reduction occurring around hatch. Phylogenetic analysis highlighted the conservation of TFF proteins and functional divergence of trefoil domains, which suggest a transitional role in the bird during evolution.
Tags: biology, computing, news
Posted in Computational biology | Comments Off
