lens, align.

Long is the time, but what is true comes to pass.

ORACLE.

2018-10-17 00:17:17 | Science News

□ ODESZA / "Meridian"



"the universe is a (gigantic) joint probabilistic model, and some marginal distributions can be described by standard model..."



□ Architectural Principles for Characterizing the Performance of Sequestration Feedback Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/27/428300.full.pdf

The primary focus here is a circuit architecture that uses a sequestration mechanism to implement feedback control in a biomolecular circuit. This circuit immediately had a broad impact on the study of biological feedback systems, as sequestration is both abundant in natural biological contexts and appears to be feasible to implement in synthetic networks. For example, sequestration feedback can be implemented using sense-antisense mRNA pairs, sigma-antisigma factor pairs, or scaffold-antiscaffold pairs.




□ NanoSatellite: Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/09/439026.full.pdf

“NanoSatellite” is a novel pattern recognition algorithm that bypasses base calling and alignment and performs tandem repeat analysis directly on raw PromethION squiggles. It achieved more than 90% accuracy and high precision (5.6% relative standard deviation). NanoSatellite is based on consecutive rounds of Dynamic Time Warping (DTW), a dynamic programming algorithm that finds the optimal alignment between two (unevenly spaced) time series.
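The DTW recurrence underlying NanoSatellite can be sketched as follows (a generic quadratic-time implementation on made-up "squiggle" arrays, not NanoSatellite's optimized code):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-programming DTW between two 1-D series."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of match / insertion / deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# toy "squiggles": same shape, different sampling rate
a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0, 0.0])
print(dtw_distance(a, b))  # 0.0: warping absorbs the uneven spacing
```

This is why DTW suits raw nanopore signal: the same repeat unit can be stretched or compressed in time, and the warping path matches it anyway.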






□ INSTRAL-ASTRAL: Discordance-aware Phylogenetic Placement using Quartet Scores:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/02/432906.full.pdf

INSTRAL finds the optimal solution to the quartet placement problem. Unlike the full ASTRAL problem, the placement problem has a small solution space (it grows linearly with n), and thus INSTRAL can solve it exactly even for large trees. In principle, it is possible to develop algorithms that compute the quartet score for all possible branches, one at a time, and select the optimal solution at the end; however, the ASTRAL dynamic programming allows for a more straightforward solution.






□ AEGIS: Exploratory Gene Ontology Analysis with Interactive Visualization:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/436741.full.pdf

AEGIS (Augmented Exploration of the GO with Interactive Simulations) is an interactive information-retrieval framework that enables an investigator to navigate the entire Gene Ontology graph (tens of thousands of nodes) and focus on fine-grained details without losing context. AEGIS features interpretable visualization of GO terms and flexible exploratory analysis of the GO DAG (directed acyclic graph) through a biologically grounded focus-and-context framework, reminiscent of classical principles in visual information system design.






□ Contour Monte Carlo: Inverse sensitivity analysis of mathematical models avoiding the curse of dimensionality:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/01/432393.full.pdf

The computational complexity of the methods used to conduct inverse sensitivity analyses for deterministic systems has limited their application to models with relatively few parameters. The authors introduce a novel Markov chain Monte Carlo method, “Contour Monte Carlo”, which can be used to invert systems with a large number of parameters.

They demonstrate the utility of this method by inverting a range of frequently used deterministic models of biological systems, including the logistic growth equation, the Michaelis-Menten equation, and an SIR model of disease transmission with nine input parameters. They argue that the simplicity of this approach makes it amenable to a large class of problems of practical significance and, more generally, that it provides a probabilistic framework for understanding the inversion of deterministic models.
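As a minimal illustration of model inversion by MCMC (a generic Metropolis sampler, not the authors' Contour Monte Carlo algorithm; the observed value, tolerance, and priors below are made up):

```python
import math, random

def logistic(t, r, K, y0=0.1):
    """Closed-form solution of the logistic growth equation."""
    return K / (1 + (K / y0 - 1) * math.exp(-r * t))

def log_post(theta, y_obs, t=5.0, sigma=0.05):
    """Flat box prior on (r, K) plus a Gaussian tolerance around y_obs."""
    r, K = theta
    if not (0 < r < 5 and 0 < K < 5):
        return -math.inf
    resid = logistic(t, r, K) - y_obs
    return -0.5 * (resid / sigma) ** 2

def metropolis(y_obs, n=20000, step=0.1, seed=0):
    rng = random.Random(seed)
    theta = [1.0, 1.0]
    lp = log_post(theta, y_obs)
    samples = []
    for _ in range(n):
        prop = [theta[0] + rng.gauss(0, step), theta[1] + rng.gauss(0, step)]
        lp_prop = log_post(prop, y_obs)
        if math.log(rng.random() + 1e-300) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(tuple(theta))
    return samples

samples = metropolis(y_obs=0.9)
# the sampler concentrates on the (r, K) contour where logistic(5, r, K) ≈ 0.9
```

The point of the sketch: many (r, K) pairs produce the same output, so the "inverse" is a distribution over a contour, not a single point.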




□ An information thermodynamic approach quantifying MAPK-related signaling cascades by average entropy production rate:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/01/431676.full.pdf

Signal transduction can be quantified by the amount of entropy production computed from fluctuations in the phosphorylation reactions of signaling molecules. Bayesian analysis of the entropy production rates of the individual steps shows that they are consistent throughout the signaling cascade.






□ New architecture trains a nano-oscillator classifier with standard machine learning algorithms:

>> https://aip.scitation.org/doi/10.1063/1.5042359

Although they only used the average stable state of the oscillator network, the offline learning algorithm can be applied to temporal signals as well, by inputting a different F at every time step and reading a sliding time-window average of f(t). The new architecture correctly categorized a larger percentage of the standard Iris data set than the reference classifier did. Comparison of results on the Iris data set further highlights the power of the nonmonotonic and interunit interactions.








□ Systematic Prediction of Regulatory Motifs from Human ChIP-Sequencing Data Based on a Deep Learning Framework:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/16/417378.full.pdf

DESSO utilizes a deep neural network and the binomial distribution to optimize motif prediction, and the results showed that DESSO outperformed existing tools in predicting distinct motifs from the 690 in vivo ENCODE ChIP-Seq datasets for 161 human TFs in 91 cell lines. The authors designed a first-of-its-kind binomial-based model in DESSO to identify all significant motif instances, under the statistical hypothesis that the number of random sequence segments containing the motif of interest in the human genome is binomially distributed.
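The binomial hypothesis can be sketched directly (toy numbers, not from the paper): given a background probability that a random segment contains a motif match, the significance of k observed motif-containing segments out of N is the binomial upper tail.

```python
from math import comb

def binomial_pvalue(n_segments, n_hits, p_bg):
    """P(X >= n_hits) for X ~ Binomial(n_segments, p_bg):
    the chance that background alone yields at least this many
    motif-containing segments."""
    return sum(
        comb(n_segments, k) * p_bg**k * (1 - p_bg) ** (n_segments - k)
        for k in range(n_hits, n_segments + 1)
    )

# made-up numbers: 1,000 scanned segments, 1% background match
# probability, 25 observed motif instances (expected: ~10)
pval = binomial_pvalue(1000, 25, 0.01)
print(pval < 1e-4)  # True: far more hits than background predicts
```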




□ rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data:

>> https://www.biorxiv.org/content/early/2018/09/18/420208

While rnaSPAdes shows decent and stable results across multiple RNA-Seq datasets, the choice of a de novo transcriptome assembler remains a non-trivial problem, even with the aid of specially developed tools such as Transrate, DETONATE, BUSCO, and rnaQUAST.




□ Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1406-4

The authors propose a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
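The weighting idea: the posterior probability that an observed zero came from the negative binomial (count) component rather than the zero-inflation component can serve as an observation weight. A minimal sketch, assuming a ZINB parameterization with mean mu, dispersion theta, and dropout probability pi (not the paper's estimation procedure, which fits these quantities from data):

```python
def zinb_weight(y, mu, theta, pi):
    """Posterior probability that observed count y was generated by the
    negative binomial component of a ZINB(mu, theta, pi) model.
    pi is the zero-inflation (dropout) probability."""
    # NB probability of a zero count: (theta / (theta + mu))^theta
    nb_p0 = (theta / (theta + mu)) ** theta
    if y > 0:
        return 1.0  # positive counts cannot come from the zero component
    return (1 - pi) * nb_p0 / (pi + (1 - pi) * nb_p0)

# a zero at a highly expressed gene is likely a dropout -> low weight
print(zinb_weight(0, mu=50.0, theta=2.0, pi=0.2))
# a zero at a lowly expressed gene is plausibly biological -> high weight
print(zinb_weight(0, mu=0.1, theta=2.0, pi=0.2))
```

Downweighting likely dropouts is what lets an unmodified bulk DE test behave sensibly on zero-inflated single-cell counts.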




□ VarTrix: a software tool for extracting single cell variant information from 10x Genomics single cell data

>> https://github.com/10XGenomics/vartrix

VarTrix does not perform variant calling. Instead, it uses Smith-Waterman alignment to evaluate reads that map to each known input variant locus and assigns single cells to these variants. This process works on both 10x single cell gene expression datasets and 10x single cell DNA datasets.
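The read-to-allele scoring step can be sketched with a plain Smith-Waterman implementation (generic scoring parameters made up here; VarTrix's actual parameters and Rust implementation differ):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score (Smith-Waterman, linear gap penalty)."""
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = rows[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, rows[i - 1][j] + gap, rows[i][j - 1] + gap)
            rows[i][j] = score
            best = max(best, score)
    return best

# a read supporting the ALT allele scores higher against the ALT haplotype
read = "ACGTTAGC"
ref_hap = "ACGTCAGC"   # REF base C at the variant site
alt_hap = "ACGTTAGC"   # ALT base T at the variant site
print(smith_waterman(read, alt_hap) > smith_waterman(read, ref_hap))  # True
```

Scoring each read against both haplotypes and keeping the winner is the basic mechanism for assigning a cell's reads to REF or ALT.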




□ Predictive Collective Variable Discovery with Deep Bayesian Models:

>> https://arxiv.org/pdf/1809.06913.pdf

The authors formulate the discovery of collective variables (CVs) as a Bayesian inference problem and consider the CVs as hidden generators of the full atomistic trajectory. As long as the approximation of the generative model is adequate, subtracting it from the atomistic potential could potentially accelerate the simulation by “filling in” the deep free-energy wells.






□ GenEpi: Gene-based Epistasis Discovery Using Machine Learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/20/421719.full.pdf

GenEpi takes the Genotype File Format (.GEN) used by Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST, as the input format for genotype data. Since the phenotype may also be affected by environmental factors, after determining the final set of genotype features the authors included environmental factors such as clinical assessments when constructing the final model. To obtain the final model, they used random forests with 1,000 decision trees as the ensemble algorithm.






□ Parliament2: Fast Structural Variant Calling Using Optimized Combinations of Callers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/23/424267.full.pdf

Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and gives users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 uses SURVIVOR to overlap these calls into consensus candidates and validates them with SVTyper. Parliament2 is also available as a public app on DNAnexus.







□ MetaCell: analysis of single cell RNA-seq data using k-NN graph partitions:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/08/437665.full.pdf

Metacells constitute local building blocks for clustering and quantitative analysis of gene expression, while not enforcing any global structure on the data, thereby maintaining statistical control and minimizing biases. In theory, a set of scRNA-seq profiles that are sampled from precisely replicated cellular RNA pools will be distributed multinomially with predictable variance and zero gene-gene covariance.







□ SIGDA: Scale-Invariant Geometric Data Analysis provides robust, detailed visualizations of human ancestry specific to individuals and populations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/431585.full.pdf

SIGDA is intended to generalize two widely used methods which apply to different kinds of data: Principal Components Analysis (PCA), which applies z-score normalization to each of a set of random variables (columns) measured on a set of objects (rows), and Correspondence Analysis (CA), which applies a chi-squared model to cross-tabulated counts of observed events.

SIGDA interprets each matrix entry as a weight of similarity (or proximity, or association) between the containing row and the containing column, or equivalently between whatever (hidden) annotations may be associated with each row and column. SIGDA therefore generalizes both PCA and CA by discarding the assumptions that determine their respective approaches to data normalization, and it is SIGDA’s unique approach to normalization, which the authors call projective decomposition, that distinguishes it most from existing methods.

SIGDA determines the “relative orientation” between these two k-dimensional subspaces by singular value decomposition (SVD), obtaining k pairs of corresponding singular vectors.

SIGDA interprets matrix A twice: as 3D points defined by the eight rows, and unconventionally as an 8-dimensional point for each axis. Conceptually, projective decomposition simultaneously “focuses” these row and column points onto spheres; procedurally, it rescales each row and column of A to form a scale-free matrix W.
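The "focusing onto spheres" step can be illustrated with a naive alternating rescaling, in the spirit of Sinkhorn balancing; this is an illustrative stand-in under the assumption that W should have unit root-mean-square rows and columns, not SIGDA's actual projective decomposition algorithm:

```python
import numpy as np

def scale_free(A, iters=200):
    """Rescale rows and columns of A so every row and column of the
    resulting W has (approximately) unit root-mean-square norm,
    i.e. row and column points land on spheres."""
    W = A.astype(float).copy()
    for _ in range(iters):
        row_rms = np.sqrt((W**2).mean(axis=1, keepdims=True))
        W /= row_rms
        col_rms = np.sqrt((W**2).mean(axis=0, keepdims=True))
        W /= col_rms
    return W

rng = np.random.default_rng(0)
A = rng.lognormal(size=(8, 3))   # 8 rows ("points"), 3 columns ("axes")
W = scale_free(A)
# after convergence, rows and columns are both (approximately) unit-RMS
print(np.sqrt((W**2).mean(axis=1)))
```

Because only row and column scales are changed, ratios within each row and column (the "shape" of the data) are preserved while overall scale is factored out.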

In general, SIGDA will be used on data with many more than three dimensions, and this interpretation as a perspective drawing is therefore of limited utility. This connection with projective geometry is, however, at the heart of the authors’ “data camera” analogy.






□ GOAE and GONN: Combining Gene Ontology with Deep Neural Networks to Enhance the Clustering of Single Cell RNA-Seq Data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/07/437020.full.pdf

By integrating Gene Ontology with unsupervised and supervised models, two novel methods are proposed, named GOAE (Gene Ontology AutoEncoder) and GONN (Gene Ontology Neural Network) respectively, for clustering of scRNA-seq data. In the GONN model, another hidden layer with 100 fully connected neurons is added; after the training phase, this hidden layer is taken as the low-dimensional representation of the input. The diversity of GO terms can be measured by gene expression values, and a z-score-based method is used for normalization along the gene dimension.






□ dphmix: Variational Infinite Heterogeneous Mixture Model for Semi-supervised Clustering of Heart Enhancers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/442392.full.pdf

dphmix implements a Dirichlet process infinite heterogeneous mixture model that infers Gaussian, Bernoulli, and Poisson distributions over continuous, binary, and count features, respectively. The authors derived a variational inference algorithm to handle semi-supervised learning, where certain observations are forced to cluster together. Cluster assignments, stick-breaking variables, and distribution parameters form the latent variable space, while α and the parameters of the NGBG prior form the hyperparameter space of the DPHM model.




□ XTalkiiS: a tool for finding data-driven cross-talks between intra-/inter-species pathways:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/437541.full.pdf

XTalkiiS loads a data-driven pathway network and applies a novel cross-talk modelling approach to determine interactions among known KEGG pathways in selected organisms. XTalkiiS has great potential: it paves the way to novel insights into the mechanisms by which pathways from two species (ideally host and parasite) may interact and contribute to various phenotypes.




□ Reactive SINDy: Discovering governing reactions from concentration data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/442095.full.pdf

The authors extend the sparse identification of nonlinear dynamics (SINDy) method to vector-valued ansatz functions, each describing a particular reaction process. The resulting sparse tensor regression method, “reactive SINDy”, is able to estimate a parsimonious reaction network. One apparent limitation is that the method can only be applied to data from the equilibration phase: at equilibrium the concentration derivatives are zero, which precludes recovery of the reaction dynamics.




□ BitMapperBS: a fast and accurate read aligner for whole-genome bisulfite sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/442798.full.pdf

BitMapperBS is an ultra-fast and memory-efficient aligner designed for WGBS reads from the directional protocol. It is up to 70 times faster than the popular WGBS aligners BSMAP and Bismark, with similar or greater sensitivity and precision. The vectorized bit-vector algorithm used in BitMapperBS extends multiple candidate locations simultaneously, while existing aligners extend their candidate locations one by one; as a result, the time-consuming extension step of BitMapperBS is significantly accelerated.






□ A robust nonlinear low-dimensional manifold for single cell RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/443044.full.pdf

The authors propose the t-Distributed Gaussian Process Latent Variable Model (tGPLVM) for learning a low-dimensional embedding of unfiltered count data. tGPLVM is a Bayesian nonparametric model for robust nonlinear manifold estimation in scRNA-seq settings. The sparse kernel structure effectively reduces the number of latent dimensions based on the actual complexity of the data. The implementation of tGPLVM accepts sparse inputs produced from high-throughput cell-by-gene count matrices.




□ A direct comparison of genome alignment and transcriptome pseudoalignment:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/444620.full.pdf

To enable the comparison with transcriptome pseudoalignment, the authors developed kallisto quant --genomebam, which projects pseudoalignments to genome coordinates, and a tool, bam2tcc, that converts genome alignments in BAM or SAM format into transcript compatibility counts, the primary output of transcriptome pseudoalignment. They used bam2tcc to convert the genome alignments of HISAT2 and STAR, and the output of the transcriptome pseudoalignment programs kallisto and Salmon, into transcript compatibility counts, which were then quantified using the expectation-maximization (EM) algorithm under a uniform coverage model.






Omega Point.

2018-10-17 00:13:17 | Science News


The means of solving a problem itself becomes the next problem to be solved. Simplifying a segment entails subdividing the fragments that point to the object, and so preserves its complexity. Entropy is a measure of irreversibility, whereas complexity secures plasticity with respect to time. That is, the aftermath of an action carries a dynamical quantity symmetric to the aftermath of not having acted.





□ The SIRAH force field 2.0: Altius, Fortius, Citius:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/436774.full.pdf

SIRAH 2.0 can be considered a significant upgrade that comes at no increase in computational cost, as the functional form of the Hamiltonian, the number of beads in each moiety, and their topologies remain the same. Simulation of the holo form starting from an experimental structure sampled near-native conformations, retrieving quasi-atomistic precision.




□ A starless bias in the maximum likelihood phylogenetic methods:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/435412.full.pdf

If the aligned sequences are equidistant from each other, with the true tree being a star tree, then the likelihood method is incapable of recovering the star tree unless the sequences are either identical or extremely diverged. The authors analyze this “starless” bias and identify its source. In contrast, distance-based methods (with the least-squares method for branch evaluation and either the minimum evolution or the least-squares criterion for choosing the best tree) do not have this bias. The finding sheds light on the star-tree paradox in Bayesian phylogenetic inference.






□ Prioritising candidate genes causing QTL using hierarchical orthologous groups:

>> https://academic.oup.com/bioinformatics/article/34/17/i612/5093215

Gene families, in the form of hierarchical orthologous groups from the Orthologous MAtrix project (OMA), enable reasoning over complex nested homologies in a consistent framework. By integrating functional inference with homology mapping, it is possible to differentiate the confidence in orthologous and paralogous relationships when propagating functional knowledge.




□ Evaluating stochastic seeding strategies in networks

>> https://arxiv.org/abs/1809.09561

The authors show how stochastic seeding strategies can be evaluated using existing data arising from randomized experiments in networks designed for other purposes, and how to design much more efficient experiments for this specific evaluation. The proposed estimators and designs can dramatically increase precision while yielding valid inference.




□ CaSTLe – Classification of single cells by transfer learning: Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments:

>> https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205499

“CaSTLe – classification of single cells by transfer learning” is based on a robust feature engineering workflow and an XGBoost classification model built on these features. The feature engineering steps include: selecting genes with the top mean expression and mutual information gain, removing correlated genes, and binning the data according to pre-defined ranges.
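Part of that workflow can be roughly sketched as follows (top-mean gene selection, correlation filtering, and binning; thresholds and bin edges are made up, and the mutual-information step is omitted):

```python
import numpy as np

def engineer_features(X, n_top=100, corr_cut=0.9, bins=(0.0, 0.1, 1.0, 10.0)):
    """CaSTLe-style feature engineering sketch (parameters are made up):
    keep the genes with the highest mean expression, drop one of each
    pair of highly correlated genes, then bin expression into coarse
    ranges so source and target datasets share a common representation."""
    means = X.mean(axis=0)
    top = np.argsort(means)[::-1][:n_top]        # top mean-expression genes
    Xt = X[:, top]
    corr = np.corrcoef(Xt, rowvar=False)
    keep = []
    for j in range(Xt.shape[1]):                 # greedy correlation filter
        if all(abs(corr[j, k]) < corr_cut for k in keep):
            keep.append(j)
    Xk = Xt[:, keep]
    return np.digitize(Xk, bins)                 # discretize into bins

rng = np.random.default_rng(1)
X = rng.gamma(2.0, 1.0, size=(200, 500))         # fake cells x genes matrix
F = engineer_features(X)
print(F.shape)
```

Binning into pre-defined ranges is what makes features transferable between experiments with different depth and scale, which is the point of the transfer-learning setup.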




□ Demonstration of End-to-End Automation of DNA Data Storage:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/10/439521.full.pdf

The device enables the encoding of data, which is then written to a DNA oligonucleotide using a custom DNA synthesizer, pooled for liquid storage, and read using a nanopore sequencer and a novel, minimal preparation protocol. The extension segment is then T/A ligated to the standard Oxford Nanopore Technology (ONT) LSK-108 kit sequencing adapter, creating the “extended ONT adapter,” which ensures that sufficient bases are read for successful base calling.






□ Selene: a PyTorch-based deep learning library for sequence-level data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/10/438291.full.pdf

"Sequence-level data" refers to any type of biological sequence such as DNA, RNA, or protein sequences and their measured properties (e.g. TF binding, DNase sensitivity, RBP binding). Training is automatically completed by Selene; afterwards, the researcher can easily use Selene to compare the performance of their new model to the original DeepSEA model on the same chromosomal holdout dataset.




□ DIAlign provides precise retention time alignment across distant runs in DIA and targeted proteomics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/10/438309.full.pdf

DIAlign is a novel algorithm based on direct alignment of raw MS2 chromatograms using a hybrid dynamic programming approach. The algorithm does not impose a chronological order of elution and allows alignment of peaks whose elution order is swapped.






□ SETD8 wild-type apo and cofactor-bound, and mutant apo Folding@home simulations

>> https://osf.io/2h6p4/




□ VOMM: A framework for space-efficient variable-order Markov models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/443101.full.pdf

The authors present practical, versatile representations of variable-order Markov models and of interpolated Markov models that support a large number of context-selection criteria, scoring functions, probability smoothing methods, and interpolations. These take up to 4 times less space than previous implementations based on the suffix array, regardless of the number and length of contexts, and up to 10 times less space (or more) than previous trie-based representations, while matching the size of related, state-of-the-art data structures from natural language processing.






□ D-NAscent: Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/442814.full.pdf

Under conditions of limiting BrdU concentration, D-NAscent detects differences in BrdU incorporation frequency across individual molecules to reveal the location of active replication origins, fork direction, termination sites, and fork pausing/stalling events. The authors trained a BrdU pore model to account for the presence of BrdU in the sequence while circumventing the high space and time complexity of full dynamic programming alignment: events are aligned to the Albacore basecall, which is in turn aligned to the reference with minimap2.




□ Comparative Pathway Integrator: a framework of meta-analytic integration of multiple transcriptomic studies for consensual and differential pathway analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/444604.full.pdf

Given pathway enrichment results, CPI performs the adaptively weighted Fisher’s (AW Fisher) method as a meta-analysis to identify pathways significant in one or more studies/conditions.
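The AW Fisher idea can be sketched as follows: search over 0/1 study weights and keep the weighting whose Fisher statistic is most significant. (The real method also derives a proper meta-analytic p-value for the best weighting via its null distribution; that step is omitted here, and the example p-values are made up.)

```python
from itertools import product
from math import exp, log, factorial

def chi2_sf_even(x, df):
    """Survival function of the chi-square distribution for even df
    (closed form: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!)."""
    k = df // 2
    return exp(-x / 2) * sum((x / 2) ** i / factorial(i) for i in range(k))

def aw_fisher(pvalues):
    """Adaptively weighted Fisher sketch: over all 0/1 study weights w,
    compute Fisher's statistic sum(-2 * w_i * log p_i), which is
    chi-square with 2*sum(w) df, and keep the weights giving the
    smallest combined p-value."""
    best = (None, 1.1)
    for w in product([0, 1], repeat=len(pvalues)):
        if not any(w):
            continue
        stat = sum(-2 * wi * log(p) for wi, p in zip(w, pvalues))
        p = chi2_sf_even(stat, 2 * sum(w))
        if p < best[1]:
            best = (w, p)
    return best

# study 3 is pure noise; the adaptive weights drop it
weights, p = aw_fisher([1e-4, 1e-3, 0.8])
print(weights)  # (1, 1, 0)
```

The selected 0/1 weights are themselves informative: they say in which studies/conditions the pathway is significant, which is exactly how CPI separates consensual from differential pathways.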






□ GoT: High throughput droplet single-cell Genotyping of Transcriptomes reveals the cell identity dependency of the impact of somatic mutations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/444687.full.pdf

GoT capitalizes on high-throughput scRNA-seq (the 10x Genomics Chromium single cell 3’ platform), by which thousands of cells can be jointly profiled for genotyping information as well as full single-cell transcriptomes. The ability of GoT to genotype multiple target genes in parallel is critical. While described here for 3’ droplet-based scRNA-seq, GoT can be integrated into any scRNA-seq method that generates full-length cDNA as an intermediate product (Microwell-seq, 10x Single Cell V(D)J + 5′GE). The high-throughput linking of single-cell genotyping of expressed genes to transcriptomic data may provide the means to gain insight into questions such as the integration of clonal diversification with lineage plasticity or differentiation topologies.






□ Using genetic data to strengthen causal inference in observational research:

>> https://www.nature.com/articles/s41576-018-0020-3

Recent progress in genetic epidemiology — including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining — has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference. Assessing credibility requires in-depth knowledge of the question, which is unlikely in massive hypothesis-free causal inference exercises, such as phenome-wide approaches.

Triangulation — when conclusions from several study designs converge — will play an increasingly important role in strengthening evidence for causality. One should not expect that a single existing or future method for causal inference in observational settings will provide a definitive answer to a causal question. Rather, such methods can substantially improve the strength of evidence on a continuum from mere association to established causality.






□ Using long-read sequencing to detect imprinted DNA methylation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/445924.full.pdf

Determining allele-specific methylation patterns in diploid or polyploid cells with short-read sequencing is hampered by the dependence on a high SNP density and the reduction in sequence complexity inherent to bisulfite treatment. Using long-read nanopore sequencing, with an average genomic coverage of approximately ten, it is possible to determine both the level of methylation of CpG sites and the haplotype from which each read arises.




□ BiG-SCAPE and CORASON: A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/445270.full.pdf

BiG-SCAPE facilitates rapid calculation and interactive exploration of BGC sequence similarity networks (SSNs); it accounts for differences in modes of evolution between BGC classes, groups gene clusters at multiple hierarchical levels, introduces a ‘glocal’ alignment mode that supports complete as well as fragmented BGCs, and democratizes the analysis through a dramatically accelerated implementation.




□ Reverse GWAS: Using Genetics to Identify and Model Phenotypic Subtypes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/446492.full.pdf

Reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. RGWAS uses a bespoke decomposition, MFMR, to model covariates, binary traits, and population structure. A random-effect version of MFMR could improve power to detect polygenic subtypes, though computational issues are non-trivial. MFMR could also be adapted to count data, zero-inflation, higher-order arrays, or missing data.




□ OSCA: a tool for omic-data-based complex trait analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/445163.full.pdf

MOMENT is a mixed-linear-model-based method that tests for association between a DNAm probe and a trait, with all other distal probes fitted in multiple random-effect components to account for the effects of unobserved confounders as well as the correlations between distal probes. MOMENT has been implemented in a versatile software package (OSCA), together with a number of other implementations for omic-data-based analyses, including the estimation of the variance in a trait captured by all measures of multiple omic profiles, xQTL analysis, and meta-analysis of xQTL data.






□ LuxRep: a technical replicate-aware method for bisulfite sequencing data analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/19/444711.full.pdf

LuxGLM is a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs. LuxRep improves the accuracy of differential methylation analysis and lowers the running time of model-based DNA methylation analysis. LuxRep features model-based integration of biological and technical replicates, and full Bayesian inference via variational inference implemented in Stan. Count data are generated from BS-seq reads aligned with tools such as Bismark.




□ RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/18/447110.full.pdf

RAxML-NG is a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML; whereas RAxML and ExaML are large monolithic codes, RAxML-NG employs a two-step L-BFGS-B method to optimize the parameters of the LG4X model. RAxML-NG is a phylogenetic tree inference tool that uses the maximum likelihood (ML) optimality criterion. It can also compute a novel branch support metric called the transfer bootstrap expectation (TBE), which is less sensitive to individual misplaced taxa in replicate trees and thus better suited to revealing well-supported deep splits in large trees with thousands of taxa.
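Bounded ML parameter optimization of this kind can be illustrated on a toy case. RAxML-NG uses L-BFGS-B on the LG4X parameters; here a golden-section search on a single Jukes-Cantor branch length stands in for the bounded optimizer (a deliberately simplified sketch, not RAxML-NG's procedure):

```python
from math import exp, log

def jc69_loglik(t, n_same, n_diff):
    """Log-likelihood of branch length t under Jukes-Cantor for a pair
    of sequences with n_same identical and n_diff differing sites."""
    p_diff = 0.75 * (1.0 - exp(-4.0 * t / 3.0))
    return n_same * log(1.0 - p_diff) + n_diff * log(p_diff)

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function on [lo, hi] by golden-section
    search (a simple stand-in for a bounded quasi-Newton step)."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# 10 mismatches over 100 sites; analytic JC69 MLE: -3/4 * ln(1 - 4*0.1/3)
t_hat = golden_section_max(lambda t: jc69_loglik(t, 90, 10), 1e-6, 5.0)
print(round(t_hat, 4))  # 0.1073
```

The numeric optimum matches the closed-form JC69 distance, which is the sanity check one would want before trusting the optimizer on models (like LG4X) that have no closed form.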




□ On Parameter Interpretability of Phenomenological-Based Semiphysical Models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/18/446583.full.pdf

The phenomenological modeling approach offers the great advantage of having a structure with variables and parameters that carry physical meaning, which enhances the interpretability of the model and its further use in decision making. This property has not been deeply discussed, perhaps because of the implicit assumption that interpretability is inherent to phenomenological-based models.






□ SeqOthello: querying RNA-seq experiments at scale:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1535-9

SeqOthello is an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. It takes only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer datasets. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.






□ Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures:

>> https://www.nature.com/articles/nbt.4278

The obtained sparse fluorescent sequence of each molecule was then assigned to its parent protein in a reference database. The authors tested the method on synthetic and naturally derived peptide molecules in zeptomole-scale quantities. They also fluorescently labeled phosphoserines and achieved single-molecule positional readout of the phosphorylated sites.




□ Deciphering epigenomic code for cell differentiation using deep learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/22/449371.full.pdf

Increasing lines of evidence suggest that the epigenome of a cell type is established step-wise through the interplay of genomic sequence, chromatin remodeling systems, and environmental cues along the developmental lineage. As the latter two factors are themselves products of the genomic sequence and its interactions, the epigenome of a cell type is ultimately determined by the genomic sequence.




□ A relative comparison between Hidden Markov- and Log-Linear- based models for differential expression analysis in a real time course RNA sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/22/448886.full.pdf

The authors evaluate the relative performance of Hidden Markov- and log-linear-based statistical models in the detection of DE genes in real time course RNA-seq data. The Hidden Markov-based model, EBSeq-HMM, was developed specifically for time course experiments, while the log-linear-based model, multiDE, was proposed for multiple treatment conditions.




□ Efficient Proximal Gradient Algorithm for Inference of Differential Gene Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/22/450130.full.pdf

The differential gene-gene interactions identified by the ProGAdNet algorithm yield a list of genes alternative to the list of differentially expressed genes, which may provide additional insight into the molecular mechanisms behind the phenotypic difference of a tissue under different conditions. Alternatively, the two gene networks inferred by ProGAdNet can be used for further differential network analysis (DiNA).




□ Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/23/449801.full.pdf

Look4TRs adapts itself to the input genome automatically, balancing high sensitivity against a low false positive rate. It generates a random chromosome based on a real chromosome of the input genome, inserts semi-synthetic microsatellites (MS) into the random chromosome, and finally trains and calibrates the HMM on these semi-synthetic MS.