Wikipedia Information Flow Analysis Reveals the Scale-Free Architecture of the Semantic Space

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

In this paper we extract the topology of the semantic space in its encyclopedic sense by measuring the semantic flow between the entries of the largest modern encyclopedia, Wikipedia, thereby creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, which relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the sizes of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in its connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, based on a copy-and-mutation algorithm and on Heaps' law, that captures the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution.
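Claims of scale-free behaviour rest on estimating a power-law exponent from data such as cluster sizes or node degrees. The abstract does not say how the exponents were fitted, so the sketch below is only an illustration under that assumption: it uses the standard continuous maximum-likelihood estimator α̂ = 1 + n / Σ ln(x_i / x_min) over the tail x ≥ x_min, with `x_min` treated as a known, hypothetical cutoff, and checks it against synthetic power-law samples.

```python
import math
import random


def powerlaw_alpha_mle(samples, x_min):
    """Continuous MLE of alpha for p(x) ~ x^(-alpha), using only x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail samples equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, rng):
    """Draw n continuous power-law samples by inverse-transform sampling.

    CDF: F(x) = 1 - (x / x_min)^(-(alpha - 1)), so x = x_min * (1 - u)^(-1 / (alpha - 1)).
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = sample_powerlaw(alpha=2.5, x_min=1.0, n=100_000, rng=rng)
    print(round(powerlaw_alpha_mle(data, x_min=1.0), 3))
```

The estimator assumes continuous data and a known cutoff; degree distributions are discrete, and in practice `x_min` is usually estimated as well (for example by minimising a Kolmogorov-Smirnov distance), so this is a first approximation rather than a full fitting procedure.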


Posted in Computer Science

Heart Rate Variability Dynamics for the Prognosis of Cardiovascular Risk

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

Statistical, spectral, multi-resolution and non-linear methods were applied to heart rate variability (HRV) series and linked with classification schemes for the prognosis of cardiovascular risk. A total of 90 HRV records were analyzed: 45 from healthy subjects and 45 from cardiovascular risk patients. In all, 52 features from the analysis methods were evaluated using the standard two-sample Kolmogorov-Smirnov test (KS-test). The results of this statistical procedure provided the input to multi-layer perceptron (MLP) neural networks, radial basis function (RBF) neural networks and support vector machines (SVM) for data classification. These schemes performed well on both the training and test sets and for many combinations of features (with a maximum accuracy of 96.67%). Additionally, breathing frequency emerged as a highly relevant feature in the HRV analysis.
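Feature screening with the two-sample KS test compares the empirical distributions of one feature in the two groups (here, 45 healthy versus 45 at-risk records). A dependency-free sketch of the KS statistic D = sup_x |F_a(x) − F_b(x)| is shown below; the function name is illustrative, and the abstract does not state which threshold was used to pass features on to the classifiers.

```python
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """Return D = sup_x |F_a(x) - F_b(x)| using the empirical CDFs of a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both ECDFs are right-continuous step functions that only change at observed
    # values, so evaluating at every pooled data point finds the supremum.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / n
        f_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(f_a - f_b))
    return d
```

For p-values, SciPy's `scipy.stats.ks_2samp(a, b)` returns both the statistic and a p-value. Ranking the 52 features by D (or by p-value) is one plausible way to choose the subset fed to the MLP, RBF and SVM classifiers, but the abstract does not specify the selection rule.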


Posted in Computer Science

Mechanical Influences on Morphogenesis of the Knee Joint Revealed through Morphological, Molecular and Computational Analysis of Immobilised Embryos

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

Very little is known about the regulation of morphogenesis in synovial joints. Mechanical forces generated by muscle contractions are required for several aspects of normal skeletogenesis. Here we show that biophysical stimuli generated by muscle contractions impact multiple events during chick knee joint morphogenesis, influencing the differential growth of the skeletal rudiment epiphyses and the patterning of the emerging tissues in the joint interzone. Immobilisation of chick embryos was achieved through treatment with the neuromuscular blocking agent decamethonium bromide. The effects on knee joint development were examined using a combination of computational modelling to predict alterations in biophysical stimuli, detailed morphometric analysis of 3D digital representations, cell proliferation assays, and in situ hybridisation to examine the expression of a selected panel of genes known to regulate joint development. This work revealed the precise changes in shape, particularly in the distal femur, that occur in an altered mechanical environment, corresponding to predicted changes in the spatial and dynamic patterns of mechanical stimuli and to region-specific changes in cell proliferation rates. In addition, we show altered patterning of the emerging tissues of the joint interzone, with the loss of clearly defined and organised cell territories revealed by the loss of characteristic interzone gene expression and by abnormal expression of cartilage markers. This work shows that local dynamic patterns of biophysical stimuli generated by muscle contractions in the embryo act as a source of positional information guiding the patterning and morphogenesis of the developing knee joint.


Posted in Computational biology

New Implications on Genomic Adaptation Derived from the Helicobacter pylori Genome Comparison

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

Background

Helicobacter pylori has a reduced genome and persists long-term in a harsh environment, having evolved particular characteristics for biological adaptation. Because several H. pylori genome sequences are available, comparative analysis could help us better understand the genomic adaptation of this bacterium.

Principal Findings

We analyzed nine H. pylori genomes from a different perspective, with an emphasis on microevolution. Inversion was an important factor in shaping the genome structure. Illegitimate recombination led not only to genomic inversion but also to inverted fragment duplication, both of which contributed to the creation of new genes and gene families; homologous recombination further contributed to inversion events. Based on the information on genomic rearrangement, we produced the first genome scaffold structure of the last common ancestor of H. pylori. The core genome consists of 1186 genes, of which 22 may be particularly adapted to the human stomach niche. H. pylori contains a high proportion of pseudogenes, whose genesis was principally caused by homopolynucleotide (HPN) mutations. Such mutations are reversible and facilitate the control of gene expression through changes in DNA structure. These reversible mutations, together with a quasi-panmictic population structure, could allow such genes or gene fragments to be frequently transferred within or between populations. Hence, pseudogenes could be a reservoir of adaptive material and HPN mutations could be favorable to H. pylori adaptation, leading to HPN accumulation in the genomes; this corresponds to a distinctive feature of Helicobacter species: an extremely high HPN composition of the genome.

Conclusion

Our research demonstrated that both genome content and structure of H. pylori have been highly adapted to its particular life style.
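The HPN argument hinges on locating runs of a single repeated nucleotide and measuring how much of a genome they cover. As an illustrative sketch only, assuming a hypothetical minimum run length of 5 nt (the abstract does not give the cutoff used), HPN tracts can be found with a back-referencing regular expression:

```python
import re


def find_hpn_runs(seq, min_len=5):
    """Return (start, end, base) for each homopolynucleotide run of length >= min_len.

    min_len=5 is a hypothetical cutoff chosen for illustration.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([ACGT]) captures one base; \1{k,} requires at least k further copies of it.
    # The leftmost-first, greedy match always spans a whole qualifying run.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def hpn_composition(seq, min_len=5):
    """Fraction of the sequence covered by HPN runs of length >= min_len."""
    if not seq:
        return 0.0
    covered = sum(end - start for start, end, _ in find_hpn_runs(seq, min_len))
    return covered / len(seq)


if __name__ == "__main__":
    toy = "AAAAAGCTTTTTTGA"  # made-up sequence
    print(find_hpn_runs(toy))
    print(round(hpn_composition(toy), 3))
```

Comparing `hpn_composition` across genomes (with a matched cutoff and expectations from base composition) is one simple way to quantify the "extremely high HPN composition" the abstract describes.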


Posted in Computational biology

Identification of Anchor Genes during Kidney Development Defines Ontological Relationships, Molecular Subcompartments and Regulatory Pathways

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

The development of the mammalian kidney is well conserved from mouse to man. Despite considerable temporal and spatial data on gene expression in mammalian kidney development, primarily in rodent species, there is a paucity of genes whose expression is absolutely specific to a given anatomical compartment and/or developmental stage, defined here as ‘anchor’ genes. We previously generated an atlas of gene expression in the developing mouse kidney using microarray analysis of anatomical compartments collected via laser capture microdissection. Here, these data are further analysed to identify anchor genes via stringent bioinformatic filtering followed by high-resolution section in situ hybridisation performed on 200 transcripts selected as specific to one of 11 anatomical compartments within the midgestation mouse kidney. A total of 37 anchor genes were identified across 6 compartments, with the early proximal tubule being the compartment richest in anchor genes. Analysis of minimal and evolutionarily conserved promoter regions of this set of 25 anchor genes identified enrichment of transcription factor binding sites for Hnf4a and Hnf1b, RbpJ (Notch signalling), PPARγ:RxRA and COUP-TF family transcription factors. This was reinforced by GO analyses, which also implicated these anchor genes in processes including epithelial proliferation and proximal tubular function. As well as defining anchor genes, this large-scale validation of gene expression identified a further 92 compartment-enriched genes able to subcompartmentalise key processes during murine renal organogenesis spatially or ontologically. These included a cohort of 13 ureteric epithelial genes revealing previously unappreciated compartmentalisation of the collecting duct system, and a series of early tubule genes suggesting that segmentation into proximal tubule, loop of Henle and distal tubule does not occur until the onset of glomerular vascularisation.
Overall, this study serves to illuminate previously ill-defined stages of patterning and will enable further refinement of the lineage relationships within mammalian kidney development.
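The "stringent bioinformatic filtering" step is not specified in the abstract. A minimal sketch of one plausible compartment-specificity filter follows, assuming a hypothetical fold-change cutoff (`min_fold=4.0`) and made-up gene and compartment names: a gene is nominated for a compartment only if its expression there is at least `min_fold` times its expression in every other compartment.

```python
def specific_candidates(expression, min_fold=4.0):
    """Map each compartment to genes expressed there >= min_fold times more
    than in any other compartment.

    expression: {gene: {compartment: non-negative expression value}}
    min_fold:   hypothetical cutoff, not the study's threshold.
    """
    candidates = {}
    for gene, profile in expression.items():
        if len(profile) < 2:
            continue  # specificity is undefined with a single compartment
        ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
        (top_comp, top_val), (_, runner_up) = ranked[0], ranked[1]
        # Comparing with the runner-up suffices: if it passes, every lower
        # compartment passes as well.
        if top_val > 0 and top_val >= min_fold * runner_up:
            candidates.setdefault(top_comp, []).append(gene)
    return candidates


if __name__ == "__main__":
    # Hypothetical genes and compartments (early proximal tubule, ureteric epithelium, cap mesenchyme).
    toy = {
        "gene_a": {"early_PT": 120.0, "UE": 10.0, "CM": 5.0},
        "gene_b": {"early_PT": 50.0, "UE": 40.0, "CM": 1.0},
        "gene_c": {"early_PT": 0.0, "UE": 30.0, "CM": 0.0},
    }
    print(specific_candidates(toy))
```

A filter like this only nominates candidates; in the study, specificity was established by section in situ hybridisation of the selected transcripts.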


Posted in Computational biology

Genomic Profiling of Advanced-Stage Oral Cancers Reveals Chromosome 11q Alterations as Markers of Poor Clinical Outcome

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

Identifying oral cancer lesions associated with a high risk of relapse and predicting clinical outcome remain challenging questions in clinical practice. Genomic alterations may add prognostic information and indicate biological aggressiveness, thereby emphasizing the need for genome-wide profiling of oral cancers. High-resolution array comparative genomic hybridization was performed to delineate the genomic alterations in clinically annotated primary gingivo-buccal complex and tongue cancers (n = 60). The specific genomic alterations so identified were evaluated for their potential clinical relevance. Copy-number changes were observed on chromosomal arms, with the most frequent gains on 3q (60%), 5p (50%), 7p (50%), 8q (73%), 11q13 (47%), 14q11.2 (47%), and 19p13.3 (58%) and losses on 3p14.2 (55%) and 8p (83%). Univariate statistical analysis with correction for multiple testing revealed that gain of region 11q22.1–q22.2 and losses of 17p13.3 and 11q23–q25 were associated with loco-regional recurrence (P = 0.004, P = 0.003, and P = 0.0003, respectively) and shorter survival (P = 0.009, P = 0.003, and P < 0.0001, respectively). The gain of 11q22 and loss of 11q23–q25 were validated by interphase fluorescence in situ hybridization (I-FISH). This study identifies a tractable number of genomic alterations with few underlying genes that may potentially be used as biological markers for prognosis and treatment decisions in oral cancers.
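The abstract reports P values "with correction for multiple testing" but does not name the procedure. Purely as an assumed example of one common choice, the Benjamini-Hochberg step-up adjustment can be sketched as follows; the input values in any usage are hypothetical raw P values, not the study's reported (already corrected) ones.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values, one per region
    print([round(p, 4) for p in benjamini_hochberg(raw)])
```

If statsmodels is available, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` provides the same adjustment and is a convenient cross-check; a family-wise method such as Bonferroni would be the more conservative alternative.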


Posted in Computational biology

Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

The expression microarray is a frequently used approach for studying gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by “batch effects,” the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, because a sizeable probe effect in microarray data can inflate the correlation among replicates and unrelated samples.
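The final point, that a probe effect inflates between-sample correlation unless each probe is standardized first, can be illustrated with a small pure-Python sketch on a made-up matrix (rows are probes, columns are samples). It shows only probe-level z-scoring before correlation; it is not ComBat or any of the six evaluated programs.

```python
import math
from statistics import mean, pstdev


def standardize_probes(matrix):
    """Z-score each probe (row) across samples; constant rows become all zeros."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_correlation(matrix, i, j):
    """Correlation between sample columns i and j of a probes x samples matrix."""
    return pearson([row[i] for row in matrix], [row[j] for row in matrix])


if __name__ == "__main__":
    # Made-up data: probe baselines 2, 6, 10, 14 plus opposite deviations in samples 0 and 1.
    data = [[3, 1, 2], [5, 7, 6], [11, 9, 10], [13, 15, 14]]
    print(round(sample_correlation(data, 0, 1), 2))
    print(round(sample_correlation(standardize_probes(data), 0, 1), 2))
```

In this toy matrix the raw correlation between samples 0 and 1 is about 0.92, driven entirely by the shared probe baselines, whereas after probe-level standardization it is -1, reflecting their genuinely opposite deviations.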


Posted in Computational biology

Phylogenetic Diversity and Genotypical Complexity of H9N2 Influenza A Viruses Revealed by Genomic Sequence Analysis

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

H9N2 influenza A viruses have become established worldwide in terrestrial poultry and wild birds, and are occasionally transmitted to mammals, including humans and pigs. To comprehensively elucidate the genetic and evolutionary characteristics of H9N2 influenza viruses, we performed a large-scale sequence analysis of 571 viral genomes from the NCBI Influenza Virus Resource Database, representing the spectrum of H9N2 influenza viruses isolated from 1966 to 2009. Our study provides a panoramic framework for better understanding the genesis and evolution of H9N2 influenza viruses and for describing the history of H9N2 viruses circulating in diverse hosts. Panorama phylogenetic analysis of the eight viral gene segments revealed the complexity and diversity of H9N2 influenza viruses. The 571 H9N2 viral genomes were classified into 74 separate lineages, which showed marked host and geographical differences in phylogeny. Panorama genotypical analysis also revealed that H9N2 viruses include at least 98 genotypes, which were further divided according to their HA lineages into seven series (A–G). Phylogenetic analysis of the internal genes showed that H9N2 viruses are closely related to H3, H4, H5, H7, H10, and H14 subtype influenza viruses. Our results indicate that H9N2 viruses have undergone extensive reassortment to generate multiple reassortants and genotypes, suggesting that the continued worldwide circulation of H9N2 viruses of multiple genotypes in diverse hosts has the potential to cause future influenza outbreaks in poultry and epidemics in humans. We propose a nomenclature system for identifying and unifying all lineages and genotypes of H9N2 influenza viruses in order to facilitate international communication on their evolution, ecology and epidemiology.
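The genotype and series counts follow from a simple bookkeeping idea: once each of a genome's eight segments has been assigned to a phylogenetic lineage, the ordered combination of lineages defines its genotype, and genotypes can then be grouped by their HA lineage. A sketch under that assumption follows; the lineage labels are hypothetical and the study's own nomenclature is not reproduced here.

```python
from collections import defaultdict

# The eight influenza A segments; lineage labels assigned to them below are hypothetical.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def genotype_of(genome):
    """Return a genome's genotype: the tuple of its per-segment lineage labels."""
    missing = [s for s in SEGMENTS if s not in genome]
    if missing:
        raise ValueError(f"missing segment lineages: {missing}")
    return tuple(genome[s] for s in SEGMENTS)


def count_genotypes(genomes):
    """Number of distinct segment-lineage constellations among the genomes."""
    return len({genotype_of(g) for g in genomes})


def group_genotypes_by_ha(genomes):
    """Map each HA lineage to the set of distinct genotypes carrying it."""
    series = defaultdict(set)
    for genome in genomes:
        series[genome["HA"]].add(genotype_of(genome))
    return dict(series)


if __name__ == "__main__":
    g1 = dict.fromkeys(SEGMENTS, "L1")  # toy genome, every segment in lineage "L1"
    g2 = dict(g1, NS="L2")              # reassortant differing only in NS
    g3 = dict(g1, HA="L3")              # different HA lineage
    print(count_genotypes([g1, g1, g2, g3]))
    print({ha: len(gts) for ha, gts in group_genotypes_by_ha([g1, g2, g3]).items()})
```

Treating genotypes as exact lineage tuples makes every reassortment event visible as a new genotype, which is why extensive reassortment translates directly into a large genotype count.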


Posted in Computational biology

Frequent and Simultaneous Epigenetic Inactivation of TP53 Pathway Genes in Acute Lymphoblastic Leukemia

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

Aberrant DNA methylation is one of the most frequent alterations in patients with acute lymphoblastic leukemia (ALL). Using methylation bead arrays, we analyzed the methylation status of 807 genes implicated in cancer in a group of ALL samples at diagnosis (n = 48). We found that 154 genes were methylated in more than 10% of ALL samples. Interestingly, the expression of 13 genes implicated in the TP53 pathway was downregulated by hypermethylation. Direct or indirect activation of the TP53 pathway with 5-aza-2′-deoxycytidine, curcumin or Nutlin-3 induced an increase in apoptosis of ALL cells. The results obtained with the initial group of 48 patients were validated retrospectively in a second cohort of 200 newly diagnosed ALL patients. Methylation of at least 1 of the 13 genes implicated in the TP53 pathway was observed in 78% of the patients; it correlated significantly with higher relapse (p = 0.001) and mortality (p < 0.001) rates and was an independent prognostic factor for disease-free survival (DFS) (p = 0.006) and overall survival (OS) (p = 0.005) in the multivariate analysis. All these findings indicate that the TP53 pathway is altered by epigenetic mechanisms in the majority of ALL patients and that this alteration correlates with prognosis. Treatments with compounds that may reverse the epigenetic abnormalities or directly activate the p53 pathway represent a new therapeutic alternative for patients with ALL.
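Two of the summaries in this abstract, genes methylated in more than 10% of samples and patients with at least one methylated TP53-pathway gene, reduce to simple counts once each gene/sample pair has been called methylated or not. A sketch assuming bead-array beta values and a hypothetical calling cutoff of 0.5 (the abstract does not state the study's calling rule); gene names in any usage are placeholders.

```python
def methylation_calls(betas, cutoff=0.5):
    """Call a gene methylated in a sample when its beta value exceeds cutoff.

    betas:  {gene: [beta value per sample]}, same sample order for every gene.
    cutoff: hypothetical threshold chosen for illustration.
    """
    return {gene: [b > cutoff for b in values] for gene, values in betas.items()}


def frequently_methylated(calls, min_fraction=0.10):
    """Genes methylated in more than min_fraction of samples, sorted by name."""
    return sorted(g for g, flags in calls.items() if sum(flags) / len(flags) > min_fraction)


def fraction_with_any(calls, gene_set):
    """Fraction of samples with at least one methylated gene from gene_set."""
    n_samples = len(next(iter(calls.values())))
    hit = [any(calls[g][s] for g in gene_set) for s in range(n_samples)]
    return sum(hit) / n_samples


if __name__ == "__main__":
    toy_betas = {  # placeholder genes, four samples
        "gene_1": [0.9, 0.1, 0.2, 0.1],
        "gene_2": [0.1, 0.1, 0.8, 0.1],
        "gene_3": [0.1, 0.1, 0.1, 0.1],
    }
    calls = methylation_calls(toy_betas)
    print(frequently_methylated(calls))
    print(fraction_with_any(calls, ["gene_1", "gene_2", "gene_3"]))
```

The "at least one of the 13 genes" statistic corresponds to `fraction_with_any` over the pathway gene set, which is why it can be much higher than the methylation frequency of any single gene.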


Posted in Computational biology

Practical and Theoretical Considerations in Study Design for Detecting Gene-Gene Interactions Using MDR and GMDR Approaches

Written by Scott Christley et al. on February 28, 2011 – 8:00 am

Detection of interacting risk factors for complex traits is challenging. The choice of an appropriate method, the sample size, and the allocation of cases and controls are serious concerns. To provide empirical guidelines for planning such studies and data analyses, we investigated the performance of the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) methods under various experimental scenarios. We derived the mathematical expectation of accuracy and used it as an indicator parameter to perform a gene-gene interaction study. We then examined the statistical power of GMDR and MDR within the plausible range of accuracy (0.50–0.65) reported in the literature. GMDR with covariate adjustment had a power of >80% in a case-control design with a sample size of ≥2000, with theoretical accuracy ranging from 0.56 to 0.62. However, when the accuracy was <0.56, a sample size of ≥4000 was required to have sufficient power. In our simulations, GMDR outperformed MDR under all models with accuracy ranging from 0.56 to 0.62 for sample sizes of 1000–2000. However, the two methods performed similarly when the accuracy was outside this range or the sample was considerably larger. We conclude that, with adjustment for a covariate, GMDR performs better than MDR and that a sample size of 1000–2000 is reasonably large for detecting gene-gene interactions in the range of effect sizes reported in the current literature, whereas a larger sample size is required for more subtle interactions with accuracy <0.56.
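To make the "accuracy" under discussion concrete, here is a minimal sketch of the core MDR step in its unadjusted form (no covariates, so not GMDR, and no cross-validation): each multilocus genotype cell is labelled high-risk when its case:control ratio exceeds the overall ratio in the data, and the resulting classifier is scored by balanced accuracy, the mean of sensitivity and specificity. The 0/1/2 genotype coding and the toy data are assumptions for illustration.

```python
from collections import Counter


def mdr_balanced_accuracy(genotypes, labels):
    """Balanced accuracy of the MDR high/low-risk rule on the training data.

    genotypes: list of hashable multilocus genotypes, e.g. (0, 2) for two SNPs
               coded 0/1/2 (the coding is an assumption for illustration).
    labels:    list of 1 (case) / 0 (control), same length as genotypes.
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need both cases and controls")
    cases = Counter(g for g, y in zip(genotypes, labels) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, labels) if y == 0)
    # High-risk cell: cases/controls in the cell > n_cases/n_controls overall.
    # Cross-multiplying avoids dividing by a zero control count.
    high_risk = {g for g in set(genotypes) if cases[g] * n_controls > controls[g] * n_cases}
    true_pos = sum(cases[g] for g in high_risk)
    true_neg = sum(count for g, count in controls.items() if g not in high_risk)
    return (true_pos / n_cases + true_neg / n_controls) / 2


if __name__ == "__main__":
    toy_genotypes = [(0, 0)] * 4 + [(1, 1)] * 4
    toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print(mdr_balanced_accuracy(toy_genotypes, toy_labels))
```

A balanced accuracy near 0.5 means the genotype cells carry no information about case status; the 0.56–0.62 range discussed above is therefore a modest departure from chance, which is why thousands of subjects are needed for adequate power.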


Posted in Computational biology