lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre.

ORACLE.

2018-10-17 00:17:17 | Science News

□ ODESZA / "Meridian"



"the universe is a (gigantic) joint probabilistic model, and some marginal distributions can be described by standard model..."



□ Architectural Principles for Characterizing the Performance of Sequestration Feedback Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/27/428300.full.pdf

The primary focus here is a circuit architecture that uses a sequestration mechanism to implement feedback control in a biomolecular circuit. This circuit immediately had a broad impact on the study of biological feedback systems, as sequestration is both abundant in natural biological contexts and appears to be feasible to implement in synthetic networks. For example, sequestration feedback can be implemented using sense-antisense mRNA pairs, sigma-antisigma factor pairs, or scaffold-antiscaffold pairs.




□ NanoSatellite: Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION.:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/09/439026.full.pdf

“NanoSatellite”, a novel pattern recognition algorithm, which bypasses base calling and alignment, and performs direct Tandem Repeats analysis on raw PromethION squiggles. achieved more than 90% accuracy and high precision (5.6% relative standard deviation). NanoSatellite is based on consecutive rounds of Dynamic Time Warping (DTW), a dynamic programming algorithm to find the optimal alignment between two (unevenly spaced) time series.






□ INSTRAL-ASTRAL: Discordance-aware Phylogenetic Placement using Quartet Scores:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/02/432906.full.pdf

INSTRAL finds the optimal solution to the quartet placement problem. Unlike ASTRAL, the number of possible solutions to the placement problem is small (grows linearly with n), and thus, INSTRAL can solve the problem exactly even for large trees. In principle, it is possible to develop algorithms that compute the quartet score for all possible branches, one at a time, and to select the optimal solution at the end. However, the ASTRAL dynamic programming allows for a more straight-forward solution.






□ AEGIS: Exploratory Gene Ontology Analysis with Interactive Visualization:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/436741.full.pdf

AEGIS (Augmented Exploration of the GO with Interactive Simulations) is an interactive information-retrieval framework that enables an investigator to navigate through the entire Gene Ontology graph (tens of thousands of nodes) and focus on fine-grained details without losing the context. AEGIS features interpretable visualization of GO terms, flexible exploratory analysis of the GO DAG (directed acyclic graph) by adopting the focus-and-context framework, reminiscent of classical principles in visual information system design that is biologically grounded.






□ Contour Monte Carlo: Inverse sensitivity analysis of mathematical models avoiding the curse of dimensionality:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/01/432393.full.pdf

The computational complexity of the methods used to conduct inverse sensitivity analyses for deterministic systems has limited their application to models with relatively few parameters. a novel Markov Chain Monte Carlo method we call “Contour Monte Carlo”, which can be used to invert systems with a large number of parameters.

the utility of this method by inverting a range of frequently-used deterministic models of biological systems, including the logistic growth equation, the Michaelis-Menten equation, and an SIR model of disease transmission with nine input parameters. argue that the simplicity of this approach means it is amenable to a large class of problems of practical significance and, more generally, provides a probabilistic framework for understanding the inversion of deterministic models.




□ An information thermodynamic approach quantifying MAPK-related signaling cascades by average entropy production rate:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/01/431676.full.pdf

Signal transduction can be computed by entropy production amount from the fluctuation in the phosphorylation reaction of signaling molecules. By Bayesian analysis of the entropy production rates of individual steps, they are consistent through the signal cascade.






□ New architecture trains a nano-oscillator classifier with standard machine learning algorithms:

>> https://aip.scitation.org/doi/10.1063/1.5042359

they only used the average stable state of the oscillator network, the offline learning algorithm can be applied to temporal signals as well, by inputting a different F at every time-step, and reading a sliding time window average of f(t). The new architecture correctly categorized a larger percentage of the standard data set known as Iris than the reference classifier did. Comparison of results on the Iris data set further highlights the power of the nonmonotonic and interunit interactions.








□ Systematic Prediction of Regulatory Motifs from Human ChIP-Sequencing Data Based on a Deep Learning Framework:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/16/417378.full.pdf

DESSO utilizes deep neural network and binomial distribution to optimize the motif prediction, and the results showed that DESSO outperformed existing tools in predicting distinct motifs from the 690 in vivo ENCODE ChIP- Seq datasets for 161 human TFs in 91 cell lines. designed a first-of-its-kind binomial-based model in DESSO to identify all the significant motif instances, under the statistical hypothesis that the number of random sequence segments which contain the motif of interest in the human genome is binomially distributed.




□ rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data:

>> https://www.biorxiv.org/content/early/2018/09/18/420208

rnaSPAdes shows decent and stable results across multiple RNA-Seq datasets, the choice of the de novo transcriptome assembler remains a non-trivial problem, even with the aid of specially developed tools, such as Transrate, DETONATE, BUSCO and rnaQUAST.




□ Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1406-4

a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.




□ VarTrix: a software tool for extracting single cell variant information from 10x Genomics single cell data

>> https://github.com/10XGenomics/vartrix

VarTrix does not perform variant calling. VarTrix uses Smith-Waterman alignment to evaluate reads that map to each known input variant locus and assign single cells to these variants. This process works on both 10x single cell gene expression datasets as well as 10x single cell DNA datasets.




□ Predictive Collective Variable Discovery with Deep Bayesian Models:

>> https://arxiv.org/pdf/1809.06913.pdf

formulating the discovery of collective variables (CVs) as a Bayesian inference problem and consider the CVs as hidden generators of the full-atomistic trajectory. Subtracting it from the atomistic potential as long as the approximation of the generative model is adequate could potentially accelerate the simulation by ”filling-in” the deep free-energy wells.






□ GenEpi: Gene-based Epistasis Discovery Using Machine Learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/20/421719.full.pdf

GenEpi takes the Genotype File Format (.GEN) used by Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST as the input format for genotype data. Since the phenotype may also be affected by environmental factors, after determining the final set of genotype features, included the environmental factors such as clinical assessments for constructing the final model. To obtain the final model, they used random forests with 1,000 decision trees as the ensemble algorithm.






□ Parliament2: Fast Structural Variant Calling Using Optimized Combinations of Callers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/23/424267.full.pdf

Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and presents users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta to run. Parliament2 uses SURVIVOR to overlap these calls into consensus candidates; and validates these calls using SVTyper. Parliament2 is also a publicly available app on DNAnexus.







□ MetaCell: analysis of single cell RNA-seq data using k-NN graph partitions:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/08/437665.full.pdf

Metacells constitute local building blocks for clustering and quantitative analysis of gene expression, while not enforcing any global structure on the data, thereby maintaining statistical control and minimizing biases. In theory, a set of scRNA-seq profiles that are sampled from precisely replicated cellular RNA pools will be distributed multinomially with predictable variance and zero gene-gene covariance.







□ SIGDA: Scale-Invariant Geometric Data Analysis provides robust, detailed visualizations of human ancestry specific to individuals and populations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/431585.full.pdf

SIGDA is intended to generalize two widely-used methods which apply to different kinds of data: Principal Components Analysis (PCA) ​, which applies z-score normalization to each of a set of random variables (columns) measured on a set of objects (rows), and Correspondence Analysis (CA),​ which applies a chi-squared model to cross-tabulated counts of observed events.

SIGDA interprets each matrix entry as a weight of similarity (or proximity or association) between the containing row and the containing column, or equivalently whatever (hidden) annotation may be associated with each row and column. SIGDA therefore generalizes both PCA and CA by discarding the assumptions which determine their respective approaches to data normalization, and it is SIGDA’s unique approach to data normalization which distinguishes it most from existing methods. SIGDA’s normalization, which they call ​projective decomposition​.

SIGDA determines the “relative orientation” between these two k -dimensional sub​spaces by singular value decomposition (SVD),​ obtaining k ​pairs of corresponding singular vectors.

SIGDA interprets matrix A twice: as 3D points defined by the eight rows, and unconventionally as an 8-dimensional point for each axis. Conceptually, projective decomposition simultaneously “focuses” these row and column points onto spheres; procedurally, it rescales each row and column of A to form a scale-free matrix W.

in general SIGDA will be used on data with many more than 3 dimensions, and this interpretation as a perspective drawing is therefore of limited utility. This connection with projective geometry is, however, at the heart of our “data camera” analogy.






□ GOAE and GONN: Combining Gene Ontology with Deep Neural Networks to Enhance the Clustering of Single Cell RNA-Seq Data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/07/437020.full.pdf

By integrating Gene Ontology with both unsupervised and supervised models, two novel methods are proposed, named GOAE (Gene Ontology AutoEncoder) and GONN (Gene Ontology Neural Network) respectively, for clustering of scRNA-seq data. In the GONN model, another hidden layer with 100 fully-connected neurons are added. After the training phase, the hidden layer with 100 fully-connected neurons is con- sidered as the low dimensional representation of the input. The diversity of a GO terms could be measured by gene expression values. z-score-based method is used for normalization on gene dimension.






□ dphmix: Variational Infinite Heterogeneous Mixture Model for Semi-supervised Clustering of Heart Enhancers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/442392.full.pdf

implements a Dirichlet Process Infinite Heterogeneous Mixture model that infers Gaussian, Bernoulli and Poisson distributions over continuous. derived a variational inference algorithm to handle semi-supervised learning where certain observations are forced to cluster together. Cluster assignments, stick-breaking variables and distribution parameters form the latent variable space, while α and parameters of the NGBG prior form the hyperparameter space of the DPHM model.




□ XTalkiiS: a tool for finding data-driven cross-talks between intra-/inter-species pathways:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/437541.full.pdf

XTalkiiS loads a data-driven pathway network and applies a novel cross-talk modelling approach to determine interactions among known KEGG pathways in selected organisms. The potentials of XTalkiiS are huge as it paves the way of finding novel insights into mechanisms how pathways from two species (ideally host-parasite) may interact that may contribute to the various phenotype.




□ Reactive SINDy: Discovering governing reactions from concentration data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/442095.full.pdf

extend the sparse identification of nonlinear dynamics (SINDy) method to vector-valued ansatz functions, each describing a particular reaction process. The resulting sparse tensor regression method “reactive SINDy” is able to estimate a parsimonious reaction network. One apparent limitation is that the method can only be applied if the data stems from the equilibration phase, as the concentration-based approach has derivatives equal zero in the equilibrium, which precludes the reaction dynamics to be recovered.




□ BitMapperBS: a fast and accurate read aligner for whole-genome bisulfite sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/442798.full.pdf

BitMapperBS is an ultra-fast and memory-efficient aligner that is designed for WGBS reads from directional protocol. BitMapperBS is at most more than 70 times faster than popular WGBS aligners BSMAP and Bismark, and presents similar or greater sensitivity and precision. The vectorized bit-vector algorithm used in BitMapperBS extends multiple candidate locations simultaneously, while existing aligners extend their candidate locations one-by-one. As a result, the time-consuming extension step of BitMapperBS can be significantly accelerated.






□ A robust nonlinear low-dimensional manifold for single cell RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/443044.full.pdf

the t-Distributed Gaussian Process Latent Variable Model (tGPLVM) for learning a low dimensional embedding of unfiltered count data. tGPLVM is a Bayesian nonparametric model for robust nonlinear manifold estimation in scRNA-seq settings. The sparse kernel structure allows us to effectively reduce the number latent dimensions based on the actual complexity of the data. The implementation of tGPLVM accepts sparse inputs produced from high-throughput experimental cell by gene count matrices.




□ A direct comparison of genome alignment and transcriptome pseudoalignment:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/444620.full.pdf

To enable the feature with transcriptome pseudoalignment, developed a tool, kallisto quant - genomebam, that converts genome alignments in the format of a BAM or SAM file to transcript compatibility counts, the primary output of transcriptome pseudoalignment. using bam2tcc to convert HISAT2, STAR, transcriptome pseudoalignment programs kallisto and Salmon into transcript compatibility counts, which were then quantified using the expectation maximization (EM) algorithm for a uniform coverage model.






最新の画像もっと見る

コメントを投稿