Prediction and Analysis of Protein Hydroxyproline and Hydroxylysine

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

Background

Hydroxylation is an important post-translational modification that is closely related to various diseases. Besides experimental approaches, in silico prediction methods offer an alternative way to identify potential hydroxylation sites.

Methodology/Principal Findings

In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites: hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features computed for the amino acids in a sliding window of 13 residues: amino acid indices (AAindex), which encode various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent the evolutionary information of amino acids; and the structural disorder of amino acids. The prediction models were then built using the incremental feature selection method. As a result, the prediction accuracies were 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline and hydroxylysine datasets, respectively. Feature analysis suggested that the physicochemical and biochemical properties and the evolutionary information of amino acids contribute substantially to identifying protein hydroxylation sites, whereas structural disorder bears little relation to protein hydroxylation. We also found that the amino acids adjacent to the hydroxylation site tend to exert more influence on hydroxylation determination than those at other positions.
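
A minimal sketch of the windowing and incremental feature selection (IFS) steps, in Python. The 13-residue window and the jackknife (leave-one-out) evaluation come from the abstract. Everything else is an assumption made for illustration: the padding character `X`, the hypothetical helper names, the use of a 1-nearest-neighbour classifier, and the premise that the features have already been ranked (the abstract does not say how). Converting each window into AAindex, PSSM and disorder features is omitted.

```python
from typing import List, Sequence, Tuple

WINDOW = 13          # window length given in the abstract
HALF = WINDOW // 2   # 6 residues on each side of the candidate site
PAD = "X"            # assumption: placeholder for positions beyond the sequence ends


def windows(seq: str, target: str) -> List[Tuple[int, str]]:
    """Return (position, 13-residue window) for every `target` residue ('P' or 'K')."""
    padded = PAD * HALF + seq + PAD * HALF
    # Residue i of `seq` sits at padded[i + HALF], so padded[i:i + WINDOW] is centred on it.
    return [(i, padded[i:i + WINDOW]) for i, aa in enumerate(seq) if aa == target]


def jackknife_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                       cols: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = -1, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # hold sample i out
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def incremental_feature_selection(X: Sequence[Sequence[float]], y: Sequence[int],
                                  ranked: Sequence[int]) -> Tuple[int, float]:
    """Try the top-1, top-2, ... ranked features; return (best feature count, accuracy)."""
    best_n, best_acc = 0, -1.0
    for n in range(1, len(ranked) + 1):
        acc = jackknife_accuracy(X, y, ranked[:n])
        if acc > best_acc:  # strict '>' keeps the smallest feature set on ties
            best_n, best_acc = n, acc
    return best_n, best_acc
```

IFS grows the feature set one ranked feature at a time and keeps the size that gives the best jackknife accuracy. With real data, the ranking and encoding steps would account for most of the work.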

Conclusions/Significance

These findings may provide useful insights for exploring the mechanisms of hydroxylation.


Posted in Computational biology

Mammalian Stem Cells Reprogramming in Response to Terahertz Radiation

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

We report that extended exposure to broad-spectrum terahertz radiation results in specific changes in cellular functions that are closely related to DNA-directed gene transcription. Our gene chip survey of gene expression shows that whereas 89% of the protein-coding genes in mouse stem cells do not respond to the applied terahertz radiation, certain genes are activated while others are repressed. RT-PCR experiments with selected gene probes corresponding to transcripts in these three groups of genes (unresponsive, activated and repressed) detail the gene-specific effect. The response was not only gene-specific but also dependent on irradiation conditions. Our findings suggest that the applied terahertz irradiation accelerates cell differentiation toward an adipose phenotype by activating the transcription factor peroxisome proliferator-activated receptor gamma (PPARG). Finally, our molecular dynamics computer simulations indicate that the local breathing dynamics of the PPARG promoter DNA coincide with the gene-specific response to THz radiation. We propose that THz radiation is a potential tool for cellular reprogramming.


Posted in Computational biology

Evolutionary Distances in the Twilight Zone—A Rational Kernel Approach

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and depends heavily on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, they are generally not biologically motivated and ignore our knowledge about sequence evolution. Thus, how to define an evolutionary distance metric between divergent sequences that uses indel information and known substitution models, without the need for a multiple alignment, remains a major open question. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score that models substitutions and indels and does not depend on a multiple sequence alignment. The sequence similarity score is defined by analogy with pairwise alignments and is, in addition, positive semi-definite. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
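
Positive semi-definiteness matters because a PSD similarity score K behaves like an inner product, so d(x, y) = sqrt(K(x, x) + K(y, y) - 2K(x, y)) is a valid Euclidean (pseudo)distance in the kernel's feature space. The sketch below illustrates only that conversion. It swaps in a simple k-mer spectrum kernel, which is PSD by construction, in place of the authors' transducer-based score. The abstract does not say whether the authors use exactly this formula, and the function names and k = 2 are hypothetical choices.

```python
import math
from collections import Counter


def spectrum(seq: str, k: int = 2) -> Counter:
    """Count all overlapping k-mers in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 2) -> float:
    """Inner product of k-mer count vectors (a stand-in PSD similarity score)."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Counter returns 0 for k-mers absent from `b`, so only shared k-mers contribute.
    return float(sum(count * sb[kmer] for kmer, count in sa.items()))


def kernel_distance(a: str, b: str, k: int = 2) -> float:
    """Distance induced by a PSD kernel: sqrt(K(a,a) + K(b,b) - 2K(a,b))."""
    sq = spectrum_kernel(a, a, k) + spectrum_kernel(b, b, k) - 2 * spectrum_kernel(a, b, k)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors
```

Because two different sequences can share an identical k-mer spectrum, this particular kernel yields a pseudometric (distinct sequences can be at distance 0). A richer PSD score would plug into the same construction.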


Posted in Computational biology

The Molecular Evolution of the p120-Catenin Subfamily and Its Functional Associations

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

Background

p120-catenin (p120) is the prototypical member of a subclass of armadillo-related proteins that includes δ-catenin/NPRAP, ARVCF, p0071, and the more distantly related plakophilins 1–3. In vertebrates, p120 is essential in regulating surface expression and stability of all classical cadherins, and directly interacts with Kaiso, a BTB/ZF family transcription factor.

Methodology/Principal Findings

To clarify functional relationships between these proteins and how they relate to the classical cadherins, we have examined the proteomes of 14 diverse vertebrate and metazoan species. The data reveal a single ancient δ-catenin-like p120 family member present in the earliest metazoans and conserved throughout metazoan evolution. This single p120 family protein is present in all protostomes, and in certain early-branching chordate lineages. Phylogenetic analyses suggest that gene duplication and functional diversification into “p120-like” and “δ-catenin-like” proteins occurred in the urochordate-vertebrate ancestor. Additional gene duplications during early vertebrate evolution gave rise to the seven vertebrate p120 family members. Kaiso family members (i.e., Kaiso, ZBTB38 and ZBTB4) are found only in vertebrates, their origin following that of the p120-like gene lineage and coinciding with the evolution of vertebrate-specific mechanisms of epigenetic gene regulation by CpG island methylation.

Conclusions/Significance

The p120 protein family evolved from a common δ-catenin-like ancestor present in all metazoans. Through several rounds of gene duplication and diversification, however, p120 evolved in vertebrates into an essential, ubiquitously expressed protein, whereas loss of the more selectively expressed δ-catenin, p0071 and ARVCF is tolerated in most species. Together with phylogenetic studies of the vertebrate cadherins, our data suggest that the p120-like and δ-catenin-like genes co-evolved separately with the non-neural (E- and P-cadherin) and neural (N- and R-cadherin) cadherin lineages, respectively. The expansion of p120 relative to δ-catenin during vertebrate evolution may reflect the pivotal and largely disproportionate role of the non-neural cadherins in the evolution of the wide range of somatic morphology present in vertebrates today.


Posted in Computational biology

Mutation Detection with Next-Generation Resequencing through a Mediator Genome

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

The affordability of next-generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for a phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation of this approach is the need for an a priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de novo genome assembly. To overcome this limitation, we propose a general framework that uses the genomes of related organisms as mediators for comparing WT and mutant bacteria. Under this framework, both the mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach, we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligate bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled us to pinpoint the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
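
The core subtraction step of the mediator framework can be sketched as a set difference over variant calls. This assumes that reads from both strains have already been mapped to the mediator and that variants have been called as hypothetical `(position, mediator_base, observed_base)` tuples. The WT-coverage filter and its `min_depth` default are illustrative additions, not details from the abstract: without coverage in the WT, the absence of a variant there says nothing.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of a variant called against the mediator genome:
# (position on the mediator, mediator base, base observed in the sequenced strain).
Variant = Tuple[int, str, str]


def candidate_mutations(mutant: Set[Variant], wild_type: Set[Variant],
                        wt_depth: Dict[int, int], min_depth: int = 5) -> Set[Variant]:
    """Mutant-vs-mediator variants that do not recur in the WT.

    Variants shared with the WT reflect mediator/WT divergence and are discarded.
    A remaining variant is kept only if the WT had at least `min_depth` reads at
    that position (assumed filter); otherwise its absence in the WT is uninformative.
    """
    return {
        v for v in mutant
        if v not in wild_type and wt_depth.get(v[0], 0) >= min_depth
    }
```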


Posted in Computational biology

Modeling Single Nucleotide Polymorphisms in the Human AKR1C1 and AKR1C2 Genes: Implications for Functional and Genotyping Analyses

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

Enzymes encoded by the AKR1C1 and AKR1C2 genes are responsible for the metabolism of progesterone and 5α-dihydrotestosterone (DHT), respectively. The effect on AKR1C2 enzyme kinetics of amino acid substitutions resulting from single nucleotide polymorphisms (SNPs) in the AKR1C2 gene was determined experimentally by Takahashi et al. In this paper, we used homology modeling to predict and analyze the structure of AKR1C1 and AKR1C2 genetic variants. The experimental reduction in enzyme activity in the AKR1C2 variants F46Y and L172Q, as determined by Takahashi et al., is predicted to be due to increased instability in cofactor binding. This instability is caused by disruption of the hydrogen bonds between NADP and AKR1C2, which results from the insertion of polar residues into largely non-polar environments near the cofactor-binding site. Other AKR1C2 variants were shown to involve either conservative substitutions or changes on the surface of the molecule, distant from the active site, confirming the experimental finding of Takahashi et al. that these variants do not result in any statistically significant reduction in enzyme activity. The AKR1C1 R258C variant is predicted to have no effect on enzyme activity for similar reasons. Thus, we provide further insight into the molecular mechanism of the enzyme kinetics of these proteins. Our data also highlight previously reported difficulties with online databases.
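
As a rough, sequence-only companion to the argument about polar residues entering non-polar environments, the sketch below scores each variant by its change on the Kyte-Doolittle hydropathy scale. This is not the homology-modeling analysis described above. The choice of scale and the helper name are assumptions, and a hydropathy change ignores the 3D context that the modeling captures.

```python
import re

# Kyte-Doolittle hydropathy index: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(variant: str) -> float:
    """Hydropathy(new) - hydropathy(old) for one-letter notation such as 'F46Y'.

    Negative values mean the substitution makes the residue more hydrophilic.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognised variant notation: {variant!r}")
    old, _position, new = match.groups()
    return KYTE_DOOLITTLE[new] - KYTE_DOOLITTLE[old]


for v in ("F46Y", "L172Q", "R258C"):
    print(v, round(hydropathy_change(v), 1))
```

F46Y and L172Q both move sharply toward the hydrophilic end of the scale (about -4.1 and -7.3). R258C moves toward the hydrophobic end (+7.0). Whether such a change matters depends on where the residue sits in the structure, and that question is what the modeling addresses.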


Posted in Computational biology

Robust Computational Analysis of rRNA Hypervariable Tag Datasets

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

Next-generation DNA sequencing is increasingly being used to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, as reflected in incorrect OTU assignment, corrupted phylogenies, and inaccurate species diversity estimators and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.
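
To make the clustering comparison concrete, here is a toy complete-linkage clustering of aligned reads into OTUs. Complete linkage merges two clusters only if every pair of their members lies within the distance threshold, which prevents the chaining that single linkage allows. Assumptions: the reads are already aligned to equal length, distance is the fraction of mismatched positions, and the 0.03 default reflects the commonly used 97% identity OTU cutoff. The loop takes quadratic memory and roughly cubic time, so it is for illustration only and would not scale to large pyrosequencing datasets.

```python
from itertools import combinations
from typing import List


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length aligned reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(reads: List[str], threshold: float = 0.03) -> List[List[int]]:
    """Agglomerative complete-linkage clustering; returns clusters of read indices.

    Two clusters may merge only if *every* pair of their members is within
    `threshold`; the pair with the smallest such maximum distance merges first.
    """
    dist = {(i, j): hamming_fraction(reads[i], reads[j])
            for i, j in combinations(range(len(reads)), 2)}
    clusters: List[List[int]] = [[i] for i in range(len(reads))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the distance between clusters is their farthest pair.
            link = max(dist[min(i, j), max(i, j)] for i in clusters[a] for j in clusters[b])
            if link <= threshold and (best is None or link < best[0]):
                best = (link, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
```

With the reads `AAAAAAAAAA`, `AAAAAAAAAT`, `AAAAAAAATT` and `CCCCCCCCCC` at a threshold of 0.1, the first two reads merge. The third stays separate because it is 0.2 away from the first read. Single linkage would have chained all three together.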


Posted in Computational biology

A Knowledge-Based Weighting Framework to Boost the Power of Genome-Wide Association Studies

Written by Scott Christley et al. on December 31, 2010 – 8:00 am

Background

We are moving into a second wave of analysis of genome-wide association studies (GWAS), characterized by comprehensive bioinformatic and statistical evaluation of genetic associations. Existing biological knowledge is valuable for GWAS and may help improve their detection power, particularly for disease susceptibility loci of moderate effect size. However, a challenging question is how to use the available, highly heterogeneous resources to quantitatively evaluate statistical significance.

Methodology/Principal Findings

We present a novel knowledge-based weighting framework to boost the power of GWAS and strengthen their exploratory performance for follow-up replication and deep sequencing. Built upon diverse integrated biological knowledge, this framework directly models both the prior functional information and the association significance emerging from GWAS to optimally highlight single nucleotide polymorphisms (SNPs) for subsequent replication. In theoretical calculations and computer simulations, it showed the potential to gain more than 15% additional power to identify an association signal of moderate strength, or to approach similar power with hundreds fewer whole-genome subjects. In a proof-of-principle case study on late-onset Alzheimer disease (LOAD), it highlighted several genes that had shown positive association with LOAD in previous independent studies, as well as two important LOAD-related pathways. These genes and pathways might otherwise have been overlooked because the SNPs involved had only moderate association significance.
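
The abstract does not give the weighting formula, so the following sketch uses a well-known stand-in, the weighted Bonferroni procedure, to show how prior knowledge can raise power. SNP i is declared significant if p_i <= alpha * w_i / m, where the non-negative weights w_i are rescaled to average 1 so that the family-wise error rate stays at alpha. The weights, SNP identifiers and function name are hypothetical; this is not the authors' framework.

```python
from typing import Dict, List


def weighted_bonferroni(pvalues: Dict[str, float], weights: Dict[str, float],
                        alpha: float = 0.05) -> List[str]:
    """SNPs with p_i <= alpha * w_i / m, after rescaling the weights to mean 1.

    Rescaling makes the per-SNP thresholds sum to alpha, so the union bound
    keeps the family-wise error rate at or below alpha.
    """
    m = len(pvalues)
    total = sum(weights[s] for s in pvalues)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    scale = m / total  # after scaling, the weights average exactly 1
    return sorted(s for s, p in pvalues.items()
                  if p <= alpha * weights[s] * scale / m)
```

Up-weighting a SNP with prior functional support loosens its threshold, at the cost of tightening the thresholds of down-weighted SNPs. With equal weights the procedure reduces to ordinary Bonferroni.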

Conclusions/Significance

Implemented in a user-friendly, open-source Java package, this powerful framework will provide an important complementary approach for identifying more true susceptibility loci with modest or even small effect sizes in current GWAS of complex diseases.


Posted in Computational biology

Integrated Profiling of MicroRNAs and mRNAs: MicroRNAs Located on Xq27.3 Associate with Clear Cell Renal Cell Carcinoma

Written by Scott Christley et al. on December 30, 2010 – 8:00 am

Background

With the advent of second-generation sequencing, the expression of gene transcripts can be digitally measured with high accuracy. The purpose of this study was to systematically profile the expression of both mRNA and miRNA genes in clear cell renal cell carcinoma (ccRCC) using massively parallel sequencing technology.

Methodology

The expression of mRNAs and miRNAs was analyzed in tumor tissues and matched normal adjacent tissues obtained from 10 ccRCC patients without distant metastases. In a prevalence screen, some of the most interesting results were then validated in a large cohort of ccRCC patients.
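
Because sequencing measures expression as digital read counts, a first-pass comparison of tumor and matched normal tissue can normalize each library to counts per million (CPM) and average the per-patient log2 fold changes. This is a minimal sketch only: the abstract does not state the normalization or statistical test used, the pseudocount is an assumption, and no significance test is performed.

```python
import math
from typing import Dict, List


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: scale raw read counts by the library's total count."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def paired_log2_fold_changes(tumor: List[Dict[str, int]], normal: List[Dict[str, int]],
                             pseudocount: float = 1.0) -> Dict[str, float]:
    """Mean over patients of log2((tumor CPM + pc) / (normal CPM + pc)).

    tumor[k] and normal[k] must come from the same patient (matched samples).
    Only genes present in every library are reported.
    """
    if not tumor or len(tumor) != len(normal):
        raise ValueError("need one normal library per tumor library")
    t_norm = [cpm(t) for t in tumor]
    n_norm = [cpm(n) for n in normal]
    shared = set.intersection(*(set(lib) for lib in t_norm + n_norm))
    return {
        gene: sum(math.log2((t[gene] + pseudocount) / (n[gene] + pseudocount))
                  for t, n in zip(t_norm, n_norm)) / len(t_norm)
        for gene in sorted(shared)
    }
```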

Principal Findings

A total of 404 miRNAs and 9,799 mRNAs were found to be differentially expressed in the 10 ccRCC patients. We also identified 56 novel miRNA candidates present in at least two samples. In addition to confirming that canonical cancer genes and miRNAs (including VEGFA, DUSP9 and ERBB4; miR-210, miR-184 and miR-206) play pivotal roles in ccRCC development, we discovered promising novel candidates (such as PNCK and miR-122) with no previous annotation in ccRCC carcinogenesis. Pathways controlling cell fate (e.g., cell cycle and apoptosis pathways) and cell communication (e.g., focal adhesion and ECM-receptor interaction) were found to be significantly more likely to be disrupted in ccRCC. Additionally, the prevalence screen revealed that the expression of a miRNA gene cluster located on Xq27.3 was consistently downregulated in at least 76.7% of ~50 ccRCC patients.

Conclusions

Our study provides a two-dimensional map of the mRNA and miRNA expression profiles of ccRCC obtained with deep sequencing technology. Our results indicate that the phenotypic status of ccRCC is characterized by a loss of normal renal function, downregulation of metabolic genes, and upregulation of many signal transduction genes in key pathways. Furthermore, we conclude that downregulation of the miRNA genes clustered on Xq27.3 is associated with ccRCC.


Posted in Computational biology

A Large Web-Based Observer Reliability Study of Early Ischaemic Signs on Computed Tomography. The Acute Cerebral CT Evaluation of Stroke Study (ACCESS)

Written by Scott Christley et al. on December 30, 2010 – 8:00 am

Background

Early signs of ischaemic stroke on computerised tomography (CT) scanning are subtle, but CT is the most widely available diagnostic test for stroke. Scoring methods that code for the extent of brain ischaemia may improve stroke diagnosis and the quantification of the impact of ischaemia.

Methodology and Principal Findings

We showed CT scans from patients with acute ischaemic stroke (n = 32, with different patient characteristics and ischaemia signs) to doctors in stroke-related specialties world-wide over the web. Each scan was shown twice, in random order and blinded. Observers entered their scan readings, including early ischaemic signs scored by three scoring methods, into the web database. We compared observers' scores with those of a reference-standard neuroradiologist, using area under the receiver operating characteristic curve (AUC) analysis, Cronbach's alpha and logistic regression to determine the effect of scales and of patient, scan and observer variables on detection of early ischaemic changes. Amongst 258 readers representing 33 nationalities and six specialties, the AUCs comparing readers with the reference standard for detection of ischaemic signs were similar for all scales and on both occasions. Being a neuroradiologist, slower scan reading, more pronounced ischaemic signs and later time to CT all improved detection of early ischaemic signs and agreement on the rating scales. Scan quality, stroke severity and number of years of training did not affect agreement.
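
The two agreement statistics named above can be computed as follows. AUC is obtained here through its Mann-Whitney interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half. The labels are taken from the reference-standard reading. For Cronbach's alpha, treating each reader as an item and each scan as a subject is an assumption about the set-up. This is a sketch, not the study's analysis code, and the logistic regression is not shown.

```python
from statistics import pvariance
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cronbach_alpha(ratings: List[Sequence[float]]) -> float:
    """Cronbach's alpha with readers as items: ratings[r][s] is reader r's score for scan s."""
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two readers")
    totals = [sum(scan) for scan in zip(*ratings)]  # summed score per scan across readers
    item_variance = sum(pvariance(reader) for reader in ratings)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))
```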

Conclusions

Large-scale observer reliability studies are possible using web-based tools and inform routine practice. Slower scan reading and use of CT infarct rating scales improve detection of acute ischaemic signs and should be encouraged to improve stroke diagnosis.


Posted in Computer Science