lens, align.

Long is the time, but the true comes to pass.

Clairvoyant.

2018-05-05 05:05:05 | Science News


□ Goldmund / "Occasus"

Terrarium




□ New Top Expressed Genes visualization shows how top genes in one tissue are expressed in other tissues.

>> https://bit.ly/2q4pxuV






□ Clairvoyante: a multi-task convolutional deep neural network for variant calling in Single Molecule Sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/28/310458.full.pdf

Clairvoyante is the first Single Molecule Sequencing method to finish whole-genome variant calling in two hours on a 28-CPU-core machine, with top-tier accuracy and sensitivity. Future improvements to the basecalling algorithm or sequencing chemistry will yield raw reads with higher accuracy in these troublesome genome regions and hence further decrease the number of known variants undetected by Clairvoyante.
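
A rough illustration of the multi-task idea, as a minimal PyTorch sketch: shared convolutions over an alignment pileup feed several task-specific output heads. The tensor shapes, layer sizes, and the two heads shown here are hypothetical stand-ins, not Clairvoyante's actual architecture.

  import torch
  import torch.nn as nn

  class MultiTaskVariantNet(nn.Module):
      """Toy multi-task CNN: shared feature extractor, one head per subtask."""
      def __init__(self, in_channels=4, n_zygosity=2, n_var_types=4):
          super().__init__()
          self.shared = nn.Sequential(
              nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
              nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
              nn.AdaptiveAvgPool1d(1), nn.Flatten())
          self.zygosity = nn.Linear(64, n_zygosity)   # e.g. het / hom
          self.var_type = nn.Linear(64, n_var_types)  # e.g. SNP / ins / del / ref

      def forward(self, pileup):                      # (batch, channels, window)
          h = self.shared(pileup)
          return self.zygosity(h), self.var_type(h)

Training would sum one cross-entropy loss per head, which is what lets the shared layers learn features useful for all subtasks at once.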



□ Skyhawk: An Artificial Neural Network-based discriminator for reviewing clinically significant genomic variants:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/01/311985.full.pdf

The multi-task ANN was designed for variant calling in Single Molecule Sequencing, and the method is orthogonal to traditional variant callers built on Bayesian or local-assembly algorithms. Skyhawk mimics how a human visually inspects the read alignments, identifying the genomic features comprising a variant and deciding whether the evidence supports or contradicts the call, using the network architecture the authors developed in their previous study, "Clairvoyante".






□ Black-boxing and cause-effect power: causal properties of physical systems across different spatiotemporal scales.

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006114

While local maxima reveal cause-effect properties to an investigator studying the system, the global maximum specifies the set of elements and the spatiotemporal grain at which the system has the most cause-effect power upon itself, from its own intrinsic perspective. According to integrated information theory, a set of elements at the spatiotemporal grain that defines the global maximum of intrinsic cause-effect power corresponds to a physical substrate of consciousness.




□ Machine learning quantum phases of matter beyond the fermion sign problem:
  Using Deep Learning to Navigate Chaos in Many-Body Problems:

>> https://www.nature.com/articles/s41598-017-09098-0

In auxiliary-field quantum Monte Carlo (QMC) simulations of the many-fermion system, the Green's function holds sufficient information to distinguish different fermionic phases. The QMC + machine learning approach works even for systems exhibiting a severe fermion sign problem, where conventional approaches to extracting information from the Green's function, e.g. in the form of equal-time correlation functions, fail.




□ Haplotype-aware genotyping from noisy long reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/03/293944.full.pdf

They formalize the computational problem in terms of a Hidden Markov Model and compute posterior genotype probabilities using the forward-backward algorithm; genotype predictions can then be made by picking the most likely genotype at each site. Computing genotypes based on bipartitions of reads that represent the individual's possible haplotypes helps reduce the number of genotyping errors, since it makes sequencing errors in the given reads easier to detect.
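
A minimal numpy sketch of the forward-backward step for a generic discrete HMM (the paper's haplotype-aware state and emission model is more elaborate; the matrices here are placeholders):

  import numpy as np

  def forward_backward(init, trans, emit, obs):
      """Posterior P(state_t | all observations), normalized at each step.
      init: (K,), trans: (K, K), emit: (K, n_symbols), obs: int sequence."""
      T, K = len(obs), len(init)
      alpha = np.zeros((T, K)); beta = np.ones((T, K))
      alpha[0] = init * emit[:, obs[0]]; alpha[0] /= alpha[0].sum()
      for t in range(1, T):                      # forward pass
          alpha[t] = (alpha[t-1] @ trans) * emit[:, obs[t]]
          alpha[t] /= alpha[t].sum()
      for t in range(T - 2, -1, -1):             # backward pass
          beta[t] = trans @ (emit[:, obs[t+1]] * beta[t+1])
          beta[t] /= beta[t].sum()
      post = alpha * beta
      return post / post.sum(axis=1, keepdims=True)

Calling post.argmax(axis=1) then picks the most likely state (genotype) at each site, exactly the decision rule described above.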




□ bioLQM: a java library for the manipulation and conversion of Logical Qualitative Models of biological networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/05/287011.full.pdf

A multi-valued component with a maximal activity level m is replaced by m Boolean components, each denoting an increasing activity level. All possible states of the original model can then be associated with states of the Boolean model, and the logical rules of the new model ensure that the same transitions are obtained between these states. Ongoing discussions should lead to extensions of the SED-ML format and the Kinetic Simulation Algorithm Ontology to describe model modifications and simulation parameters.
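
The mapping itself is tiny; a sketch assuming the standard encoding where the i-th Boolean component is on iff activity reaches level i:

  def booleanize(level, max_level):
      """Replace one multi-valued component (0..max_level) with max_level
      Boolean components b_1..b_m, where b_i = (level >= i)."""
      return [level >= i for i in range(1, max_level + 1)]

  # a component with maximal level 3:
  # booleanize(0, 3) -> [False, False, False]
  # booleanize(2, 3) -> [True, True, False]

Note the encoding only produces "staircase" states (b_i implies b_{i-1}), which is why every state of the original model has a well-defined Boolean image.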




□ Multi-configuration time-dependent Hartree method for Efficient grid-based quantum dynamics without pre-computed potential energy surfaces:

>> https://aip.scitation.org/doi/10.1063/1.5024869

The simulations of proton transfer in salicylaldimine and non-adiabatic dynamics in pyrazine show that the method can efficiently model quantum dynamics in many-dimensional systems. A database of energies was created on the fly during propagation, and an SVD procedure was used to generate the potential energy operator for the dynamics from the kernel ridge regression fit using the additive kernel.
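
A minimal scikit-learn sketch of the fit-on-the-fly idea: a kernel ridge model is refit as the energy database grows and then queried as the potential. The RBF kernel here stands in for the paper's additive kernel, and the coordinates and energies are synthetic.

  import numpy as np
  from sklearn.kernel_ridge import KernelRidge

  rng = np.random.default_rng(0)
  X = rng.uniform(-1, 1, size=(200, 3))     # stand-in nuclear coordinates
  y = np.sin(3 * X).sum(axis=1)             # stand-in electronic energies

  # the database grows during propagation; refit whenever points are added
  pes = KernelRidge(kernel="rbf", alpha=1e-8, gamma=2.0).fit(X, y)
  energies = pes.predict(X[:5])             # potential evaluated at grid points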






□ DeFine: Deep learning based Functional impact of non-coding variants evaluator, for accurately quantifying intensities of transcription factor-DNA binding and facilitating evaluation of functional non-coding variants:

>> https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gky215/4958204

DeFine provides a framework for constructing cell type-specific deep learning models to assess the functional impact of abundant non-coding variants across the whole human genome. DeFine integrates cell type specific three-dimensional genome contact maps.




□ Learning and Forgetting Using Reinforced Bayesian Change Detection:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/06/294959.full.pdf

This rich model encompasses many aspects of Reinforcement Learning (RL), such as Temporal Difference RL and counterfactual learning, and accounts for the reduced computational cost of automatic behaviour. The Hierarchical Adaptive Forgetting Variational Filter (HAFVF) achieves adaptive change detection through Stabilized Forgetting, updating its current belief based on a mixture of fixed initial priors and previous posterior beliefs.




□ Phase Transitions of the Moran Process and Algorithmic Consequences:

>> https://arxiv.org/pdf/1804.02293.pdf

The Moran process is a randomised algorithm that models the spread of genetic mutations through graphs. If the graph is connected, the process eventually reaches "fixation", where every vertex is a mutant, or "extinction", where no vertex is a mutant. For all epsilon > 0, the expected running time on an n-vertex graph is o(n^(3+epsilon)); in fact, it is at most n^3 * exp(O((log log n)^3)), and there is a family of graphs where it is Omega(n^3).
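
A minimal simulation sketch of the process (a hypothetical adjacency-list graph, with mutant fitness r > 1):

  import random

  def moran_process(neighbors, r=1.1, seed=None):
      """Run the Moran process until fixation (True) or extinction (False).
      neighbors: dict mapping each vertex to its adjacency list."""
      rng = random.Random(seed)
      mutant = {v: False for v in neighbors}
      mutant[rng.choice(list(neighbors))] = True      # one initial mutant
      while 0 < sum(mutant.values()) < len(mutant):
          nodes = list(neighbors)
          weights = [r if mutant[v] else 1.0 for v in nodes]
          parent = rng.choices(nodes, weights=weights, k=1)[0]
          child = rng.choice(neighbors[parent])       # offspring replaces a neighbor
          mutant[child] = mutant[parent]
      return all(mutant.values())

  # e.g. estimate fixation probability on the complete graph K4:
  # k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
  # sum(moran_process(k4, r=2.0, seed=s) for s in range(1000)) / 1000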






□ Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning:

>> https://academic.oup.com/gigascience/advance-article/doi/10.1093/gigascience/giy037/4966989

Chiron has a novel architecture which couples a convolutional neural network with an RNN and a Connectionist Temporal Classification decoder. This enables it to model the raw signal data directly, without use of an event segmentation step. The convolutional layers discriminate local patterns in the raw input signal, whereas the recurrent layers integrate these patterns into basecall probabilities. At the top of the NN is a CTC decoder to provide the final DNA sequence according to the base probabilities.
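
A minimal PyTorch sketch of the CNN-to-RNN-to-CTC pattern described above; the layer counts and widths here are hypothetical (Chiron's actual network is deeper):

  import torch
  import torch.nn as nn

  class RawSignalBasecaller(nn.Module):
      def __init__(self, n_bases=4):
          super().__init__()
          self.conv = nn.Sequential(                  # local patterns in raw signal
              nn.Conv1d(1, 64, kernel_size=11, padding=5), nn.ReLU(),
              nn.Conv1d(64, 64, kernel_size=11, padding=5), nn.ReLU())
          self.rnn = nn.LSTM(64, 128, num_layers=2,
                             bidirectional=True, batch_first=True)
          self.out = nn.Linear(256, n_bases + 1)      # +1 for the CTC blank label

      def forward(self, signal):                      # (batch, 1, time)
          h = self.conv(signal).transpose(1, 2)       # (batch, time, 64)
          h, _ = self.rnn(h)                          # integrate across time
          return self.out(h).log_softmax(dim=-1)      # per-step log-probs for CTC

Training pairs these outputs with nn.CTCLoss; decoding collapses repeats and blanks to produce the final DNA sequence, which is what removes the need for event segmentation.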




□ Pushing the limits of de novo genome assembly for complex prokaryotic genomes harboring very long, near identical repeats:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/12/300186.full.pdf

Only by using very long reads from the ONT platform (mean read length 44 kb) was it possible to fully resolve this genome, indicating that the "dark matter" of genome assembly is present in prokaryotes as well. Basecalled ONT reads were filtered (30 kb or longer) and assembled with Flye v2.3 using standard parameters, with three iterations of Flye's polishing stage (-i 3 --nano-raw [ont_reads.fastq] -g 7.7m).




□ Analysis on the Nonlinear Dynamics of Deep Neural Networks: Topological Entropy and Chaos:

>> https://arxiv.org/pdf/1804.03987.pdf

A DNN is considered as a discrete-time dynamical system due to its layered structure. The complexity provided by the nonlinearity in the dynamics is analyzed in terms of topological entropy, and chaos is characterized by Lyapunov exponents. If the nonlinear dynamics provides too much complexity, the Vapnik-Chervonenkis (VC) dimension could be high and thus hurt the generalization capability. Chaotic behavior may arise from the nonlinearity, which explains why a DNN can form complicated classification surfaces. The authors calculate the topological entropy analytically in the high-dimensional regime, or numerically using the generating function of symbolic dynamics.






□ GNE: A deep learning framework for gene network inference by aggregating biological information:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/13/300996.full.pdf

The low-dimensional vector representations of genes' topological characteristics, which give deeper insights into the structure of rapidly accumulating and diverse gene interaction networks, greatly simplify downstream modeling and ontology reconstruction.




□ Single cell RNA-seq denoising using a deep count autoencoder:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/13/300681.full.pdf

The deep count autoencoder network (DCA) takes the count distribution, overdispersion and sparsity of the data into account using a zero-inflated negative binomial noise model, capturing nonlinear gene-gene and gene-dispersion interactions.
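
For concreteness, the zero-inflated negative binomial likelihood being optimized looks roughly like the numpy sketch below, with mean mu, dispersion theta, and dropout probability pi (in DCA these are per-gene outputs of the autoencoder; this is not DCA's actual code):

  import numpy as np
  from scipy.special import gammaln

  def zinb_nll(x, mu, theta, pi, eps=1e-10):
      """Negative log-likelihood of counts x under a ZINB model."""
      nb_log = (gammaln(x + theta) - gammaln(theta) - gammaln(x + 1)
                + theta * np.log(theta / (theta + mu))
                + x * np.log(mu / (theta + mu) + eps))
      zero = np.log(pi + (1 - pi) * np.exp(nb_log) + eps)  # x == 0: dropout or NB zero
      pos = np.log(1 - pi + eps) + nb_log                  # x > 0: NB only
      return -np.where(x == 0, zero, pos).sum()

The zero case mixes technical dropouts (pi) with genuine biological zeros from the NB component, which is exactly the distinction the denoising exploits.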




□ Tensorial Blind Source Separation for Improved Analysis of Multi-Omic Data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/13/300277.full.pdf

For instance, one solution may be to first perform dimension reduction using supervised feature selection on the separate data types, and subsequently apply the tensorial methods on the reduced feature space. "For Canonical Correlation Analysis (CCA) and sCCA, the maximum number of canonical vectors to search for was set to be equal to the RMT estimate of the concatenated matrix, i.e. equal to the dimension of joint variation."






□ DragoNN/DeepLIFT: Decoding non-coding regulatory elements and disease-associated variants with deep learning:

>> https://drive.google.com/file/d/1Uk9i2TykKOrQUO0HCy-_MrdVswQkKuky/view

The dragonn package implements Deep RegulAtory GenOmic Neural Networks (DragoNNs) for predictive modeling of regulatory genomics, nucleotide-resolution feature discovery, and simulations for systematic development and benchmarking.




□ emeraLD: Rapid Linkage Disequilibrium Estimation with Massive Data Sets:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/15/301366.full.pdf

In data sets with tens of thousands of samples, emeraLD is 5-20x faster than PLINK 1.9 and hundreds to thousands of times faster than existing tools for VCF files. For genotype formats, it calculates the dot product using sparse-by-dense and sparse-by-sparse vector products; for the haplotype block format (M3VCF), it calculates the dot product using within-block and between-block haplotype intersections.
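
The core trick is that the LD statistic r^2 between two variants reduces to a few precomputed sums plus one dot product, and the dot product only needs to visit nonzero genotypes. A sketch with a hypothetical sparse representation ({sample: dosage} dicts), not emeraLD's actual data structures:

  def ld_r2(dosage_x, dosage_y, n):
      """r^2 between two variants stored sparsely; n = number of samples."""
      sx = sum(dosage_x.values()); sy = sum(dosage_y.values())
      sxx = sum(d * d for d in dosage_x.values())
      syy = sum(d * d for d in dosage_y.values())
      small, big = sorted((dosage_x, dosage_y), key=len)
      sxy = sum(d * big[s] for s, d in small.items() if s in big)  # sparse-by-sparse
      num = n * sxy - sx * sy
      den = (n * sxx - sx * sx) * (n * syy - sy * sy)
      return num * num / den if den else 0.0

Because rare variants have mostly zero dosage vectors, the cost of the dot product scales with the number of carriers rather than the sample size.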




□ Droplet scRNA-seq is not zero inflated

>> http://www.nxn.se/valent/2017/11/16/droplet-scrna-seq-is-not-zero-inflated

In high-throughput variants of scRNA-seq assays, cells are isolated in (reverse) droplets, within which several molecular reactions occur to eventually give rise to labeled cDNA from each cell's expressed genes. Part of what makes this possible is limiting the sequenced fragments to single tags from the 3' or 5' end of each transcript. It has recently been observed in statistical analyses that RNA tag counting versions of scRNA-seq data are explained without additional zero inflation.






□ Distinguishing genetic correlation from causation across 52 diseases and complex traits:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/17/205435.full.pdf

A latent causal variable (LCV) model, under which the genetic correlation between two traits is mediated by a latent variable having a causal effect on each trait. The genetic causality proportion is estimated using the mixed fourth moments E(α1^2 · α1α2) and E(α2^2 · α1α2) of marginal effect sizes, exploiting the fact that if trait 1 is causal for trait 2, then SNPs with large effects on trait 1 (large E(α1^2)) will have correlated effects on trait 2 (large E(α1α2)).




□ STREAM: Single-cell Trajectories Reconstruction, Exploration And Mapping of omics data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/18/302554.full.pdf

ElPiGraph (Elastic Principal Graph) is a completely redesigned algorithm for elastic principal graph optimization introducing the elastic matrix Laplacian, trimmed mean square error, explicit control for topological complexity and scalability to millions of points.






□ ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/25/306902.full.pdf

They demonstrate how the method further improves a megabase-scale Supernova human genome assembly, which itself exclusively uses linked read data, with an execution speed six to nine times faster than competing linked read scaffolders. Following ARKS scaffolding of a human 10x Genomics Supernova assembly, fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n=13).




□ Samplix: the Xdrop technology for single molecule sample preparation:

>> https://www.samplix.com

Samplix enables single DNA molecule sample preparation where today only multi-molecule sample preparation is possible. Among many applications, the Xdrop™ technology allows for targeted enrichment of very large DNA fragments.






□ A Monte Carlo method for in silico modeling and visualization of Waddington’s epigenetic landscape with intermediate details:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/29/310771.full.pdf

The method estimates the probability distribution of cellular states by collecting a large number of time-course simulations with random initial conditions. By projecting all the trajectories onto the two-dimensional plane of dimensions i and j, one can approximately calculate the quasi-potential U(xi, xj) = −ln P(xi, xj), where P(xi, xj) is the estimated probability of an equilibrium steady state or a non-equilibrium state.
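
A minimal numpy sketch of that estimate, with a toy two-dimensional bistable system standing in for the paper's gene-network dynamics:

  import numpy as np

  rng = np.random.default_rng(0)

  def simulate(x0, steps=500, dt=0.01, noise=0.3):
      """One noisy time-course from a random initial condition (toy dynamics)."""
      x = x0.copy()
      for _ in range(steps):
          x += (x - x**3) * dt + noise * np.sqrt(dt) * rng.standard_normal(2)
      return x

  # many trajectories -> empirical state distribution -> quasi-potential
  ends = np.array([simulate(rng.uniform(-2, 2, 2)) for _ in range(2000)])
  P, xedges, yedges = np.histogram2d(ends[:, 0], ends[:, 1],
                                     bins=40, density=True)
  U = -np.log(P + 1e-12)          # U(xi, xj) = -ln P(xi, xj)

Wells in U then correspond to attractor (cell-type-like) states, and the barrier heights between wells indicate how hard transitions are.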






□ DeepPVP: Deep PhenomeNET Variant Predictor: phenotype-based prioritization of causative variants using deep learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/02/311621.full.pdf

DeepPVP, an extension of the PVP system which uses deep learning and achieves significantly better performance in predicting disease-associated variants than the previous PVP system, as well as competing algorithms that combine pathogenicity and phenotype similarity. DeepPVP not only uses a deep artificial neural network to classify variants into causative and non-causative but also corrects for a common bias in variant prioritization methods in which gene-based features are repeated and potentially lead to overfitting.






□ Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/03/312256.full.pdf

BulkVis can annotate signal features based on the metadata within the bulk fast5 file and provides a simple method to relate a basecalled read back to the channel and time in the data stream from which it originated.

Long reads can be incorrectly divided by MinKNOW, resulting in single DNA molecules being split into two or more reads. The longest seen to date is 2,272,580 bases in length, reported in eleven consecutive reads.






□ Causal gene regulatory network inference using enhancer activity as a causal anchor

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/03/311167.full.pdf

a data-driven approach to classify enhancer expression as binary or continuous was sufficient to automatically select the best target prediction method, allowing parameter-free application of the method to organisms and cell types where validation data is not currently available.




□ SATurn: A modular bioinformatics framework for the design of robust maintainable web-based and standalone applications

>> https://ddamerell53.github.io/SATurn/





3 Stars.

2018-05-04 01:36:59 | Science News


"I remember lying out in my bed and looking at the vast, quiet sky. Up above my head, there were three stars in a row, & I remember thinking, 'Well, I'll have those three stars all my life, & wherever I am, they will be. They are my stars, and they belong to me" - Spike Milligan






□ HiCAGE: an R package for large-scale annotation and visualization of 3C-based genomic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/05/315234.full.pdf

HiCAGE provides 3C-based data integrated with gene expression analysis, as well as graphical summaries of integrations and gene-ontology enrichment of candidate genes based on proximity. Additionally, HiCAGE will increase our understanding of the functional consequences of changes to the nuclear architecture by linking gene expression with chromatin state interactions.






□ Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses

>> http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188299

The second-generation p-value is the proportion of data-supported hypotheses that are also null hypotheses; it indicates when the data are compatible with the null hypotheses (pδ = 1), with alternative hypotheses (pδ = 0), or when the data are inconclusive (0 < pδ < 1).
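
In interval form the computation is a one-liner; a sketch for an interval estimate I = [lo, hi] and an indifference zone H0 = [null_lo, null_hi], assuming the published correction factor that caps pδ for very wide intervals at 1/2:

  def sgpv(lo, hi, null_lo, null_hi):
      """Second-generation p-value: fraction of the interval estimate
      overlapping the null interval, corrected for very wide intervals."""
      I, H = hi - lo, null_hi - null_lo
      overlap = max(0.0, min(hi, null_hi) - max(lo, null_lo))
      return (overlap / I) * max(I / (2 * H), 1.0)

  # sgpv(0.10, 0.50, -0.05, 0.05) -> 0.0  (data support the alternative)
  # sgpv(-0.02, 0.03, -0.05, 0.05) -> 1.0 (data compatible with the null)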




□ On the networked architecture of genotype spaces and its critical effects on molecular evolution:

>> https://arxiv.org/pdf/1804.06835.pdf

When the complex architecture of genotype spaces is taken into account, the evolutionary dynamics of molecular populations becomes intrinsically non-uniform, sharing deep qualitative and quantitative similarities with slowly driven physical systems. Furthermore, the phenotypic plasticity inherent to genotypes transforms classical fitness landscapes into multiscapes where adaptation in response to an environmental change may be very fast. The authors build a mesoscopic description in which phenotypes, rather than genotypes, are the basic elements of the dynamical framework, and in which microscopic details are subsumed into an effective, possibly non-Markovian stochastic dynamics.




□ Machine Learning’s ‘Amazing’ Ability to Predict Chaos

>> https://www.quantamagazine.org/machine-learnings-amazing-ability-to-predict-chaos-20180418/

The flamelike system would continue to evolve out to eight "Lyapunov times". The Lyapunov time represents how long it takes for two almost-identical states of a chaotic system to exponentially diverge, and it typically sets the horizon of predictability. "In order to have this exponential divergence of trajectories you need this stretching, and in order not to run away to infinity you need some folding." The stretching and compressing in the different dimensions correspond to a system's positive and negative "Lyapunov exponents," respectively.



□ Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data:

>> https://aip.scitation.org/doi/pdf/10.1063/1.5010300

A limited time series of measurements is used as input to a high-dimensional dynamical system called a "reservoir." After the reservoir's response to the data is recorded, linear regression is used to learn a large set of parameters, called the "output weights." The learned output weights are then used to form a modified autonomous reservoir designed to be capable of producing an arbitrarily long time series whose ergodic properties approximate those of the input signal. Since the reservoir equations and output weights are known, the derivatives needed to determine the Lyapunov exponents of the autonomous reservoir can be computed, and these serve as estimates of the Lyapunov exponents of the original input-generating system.
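
A minimal numpy sketch of the reservoir + ridge-regression recipe, with a toy sine wave standing in for the measured time series and arbitrary (hypothetical) reservoir hyperparameters:

  import numpy as np

  rng = np.random.default_rng(1)
  N, T = 300, 2000
  u = np.sin(np.linspace(0, 40 * np.pi, T))          # stand-in measurements

  W_in = rng.uniform(-0.5, 0.5, N)                   # fixed input weights
  W = rng.normal(0, 1.0, (N, N))
  W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()      # spectral radius 0.9

  r = np.zeros(N)
  states = np.empty((T, N))
  for t in range(T):                                 # record reservoir response
      r = np.tanh(W @ r + W_in * u[t])
      states[t] = r

  # ridge regression for output weights mapping state -> next input value
  X, y = states[:-1], u[1:]
  W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)

Closing the loop (feeding W_out @ r back in as the next input) yields the autonomous reservoir whose Lyapunov exponents can then be computed from the known update equations.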





□ Hybrid Forecasting of Chaotic Processes: Using Machine Learning in Conjunction with a Knowledge-Based Model:

>> https://arxiv.org/pdf/1803.04779.pdf

Both the hybrid scheme and the reservoir-only model have the property of "training reusability": they can make any number of subsequent predictions by preceding each prediction with a short run in the synchronization configuration, to resynchronize the reservoir dynamics with the dynamics to be predicted. A particularly dramatic example illustrates the effectiveness of the hybrid approach: when acting alone, both the knowledge-based predictor and the reservoir machine learning predictor give fairly worthless results (prediction times of only a fraction of a Lyapunov time), yet when the same two systems are combined in the hybrid scheme, good predictions are obtained for a substantial duration of about 4 Lyapunov times.






□ Hidden state models improve the adequacy of state-dependent diversification approaches using empirical trees, including biogeographical models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/17/302729.full.pdf

HMM models combined with a model-averaging approach naturally account for hidden traits when examining the meaningful impact of a suspected "driver" of diversification. They demonstrate the role of hidden state models as a general framework by expanding the original geographic state speciation and extinction model (GeoSSE).

For instance, when an asteroid impact throws up a dust cloud, or causes a catastrophic fire, every lineage alive at that time is affected simultaneously. Their ability to survive may come from heritable factors, but the sudden shift in diversification caused by an exogenous event like this appears suddenly across the tree, in a manner not yet incorporated in these models. Similarly, a secular trend affecting all species is not part of this model, though it is in others; nor do the models incorporate factors like a species' "memory" of the time since its last speciation, or a global carrying capacity for a clade that affects the diversification rate. Most of these caveats are not limited to HMM models of diversification, but the reminder may serve to reduce overconfidence in results.






□ Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006076

In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox proportional hazards regression (with LASSO, ridge, and minimax concave penalties), Random Survival Forests and CoxBoost. The outputs from the hidden layer nodes provide an alternative approach for survival-sensitive dimension reduction, and it is possible to embed a priori biological pathway information into the architecture by connecting the genes in a pathway to a common node in the next hidden layer.
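
The piece that makes a neural network "Cox" is the training objective; a minimal PyTorch sketch of the negative Cox partial log-likelihood (assuming no tied event times, and not Cox-nnet's actual implementation):

  import torch

  def cox_ph_loss(risk, time, event):
      """risk: network outputs (log hazard ratios); event: 1 = event observed.
      Sorting by descending time makes each cumulative sum a risk set."""
      order = torch.argsort(time, descending=True)
      risk, event = risk[order], event[order]
      log_risk_set = torch.logcumsumexp(risk, dim=0)
      return -((risk - log_risk_set) * event).sum() / event.sum()

  # usage sketch: risk = net(x).squeeze(); loss = cox_ph_loss(risk, time, event)

Swapping this loss in for cross-entropy is essentially what turns an ordinary feed-forward regression network into a proportional-hazards model.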






□ Tigmint: Correcting Assembly Errors Using Linked Reads From Large Molecules:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/20/304253.full.pdf

Tigmint identifies and corrects misassemblies using linked reads from 10x Genomics Chromium. The physical coverage of the large molecules is more consistent and less prone to coverage dropouts than that of the short read sequencing data. Each scaffold is scanned with a fixed window to identify areas with few spanning molecules, revealing possible misassemblies. Correcting the ABySS assembly of the human data set HG004 with Tigmint reduces the number of misassemblies identified by QUAST by 216, a reduction of 27%. The last assembly on the Pareto frontier is DISCOVARdenovo + BESST + Tigmint + ARCS, which strikes a good balance between good contiguity and few misassemblies.




□ fast5seek

>> https://github.com/mbhall88/fast5seek

This program takes one or more directories of fast5 files along with any number of fastq, SAM, or BAM files. The output is the full paths of all fast5 files that appear in the fastq, BAM, or SAM files and are also present in the provided fast5 directory(s). Sometimes albacore or MinKNOW append additional characters of variable length to the fast5 names, so the program attempts to clean each name to match what is stored in the fast5 files.




Clive_G_Brown:
MinKNOW 2.0 is coming ...




morgantaschuk:
Today's task is deleting 1TB of PhiX sequencing data from @illumina NovaSeq validation runs.

That's 3.479886103×10¹² bases

PhiX genome is 5386 bases long

So we theoretically sequenced 646098422 viral particles

The viral capsid is 250 Å wide

we sequenced 16 metres of PhiX




□ LCA robustly reveals subtle diversity in large-scale single-cell RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/20/305581.full.pdf

Latent Cellular Analysis (LCA), a machine learning-based analytical pipeline that features a dual-space model search with inference of latent cellular states, control of technical variations, cosine similarity measurement, and spectral clustering. LCA provides mathematical formulae with which to project the remaining cells (testing cells) directly to the inferred low-dimensional LC space, after which individual cells are assigned to the subpopulation with the best similarity.




□ gmxapi: a high-level interface for advanced control and extension of molecular dynamics simulations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/22/306043.full.pdf

This approach, originally published and implemented using CHARMM, is a common workflow in our group using GROMACS that requires custom code in three places: user-specified biasing forces in the core MD engine, analysis code to process predicted ensemble data and update the biasing forces, and parallelization scripts to manage execution, analysis, and data exchange between many ensemble members simultaneously.




□ Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/22/304188.full.pdf

Genetic correlations have been estimated by applying the multivariate polygenic mixture model (MVPMM) to GWAS summary statistics for more than one million pairs of traits and can be visualized using the app. Users can filter the phenotypes and results that are displayed by the app.






□ Genomic SEM Provides Insights into the Multivariate Genetic Architecture of Complex Traits:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/21/305029.full.pdf

genomic structural equation modeling (Genomic SEM) can be used to identify variants with effects on general dimensions of cross-trait liability, boost power for discovery, and calculate more predictive polygenic scores. Genomic SEM is a Two-Stage Structural Equation Modeling approach. In Stage 1, the empirical genetic covariance matrix and its associated sampling covariance matrix are estimated. The diagonal elements of the sampling covariance matrix are squared standard errors.




□ Rethomics: an R framework to analyse high-throughput behavioural data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/21/305664.full.pdf

At the core of rethomics lies the behavr table, a structure used to store large amounts of data (e.g. position and activity) and metadata (e.g. treatment and genotype) in a unique data.table-derived object. The metadata holds a single row for each of the n individuals. Its columns, the p metavariables, are one of two kinds: either required – and defined by the acquisition platform – or user-defined.




□ NanoDJ: a Jupyter notebook integration of tools for simplified manipulation and assembly of DNA sequences produced by ONT devices.

>> https://github.com/genomicsITER/NanoDJ

NanoDJ integrates basecalling, read trimming and quality control, simulation and plotting routines with a variety of widely used aligners and assemblers, including procedures for hybrid assembly.






□ Balancing Non-Equilibrium Driving with Nucleotide Selectivity at Kinetic Checkpoints in Polymerase Fidelity Control:

>> http://www.mdpi.com/1099-4300/20/4/306

The individual transitions serving as selection checkpoints need to proceed at moderate rates in order to sustain the necessary non-equilibrium drives as well as to allow nucleotide selection for optimal error control. Accelerating the backward transitions shows similar but opposite trends: acceleration leads to a close-to-equilibrium regime with low speeds and high error rates, whereas slowing down the transitions promotes a far-from-equilibrium regime with high speeds and low error rates, except at the selection checkpoint, where the error rate rises when the backward transition is too slow.






□ Generalised free energy and active inference: can the future cause the past?

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/23/304782.full.pdf

Formally, the ensuing generalised free energy is a Hamiltonian Action, because it is a path or time integral of free energy at each time point. In other words, active inference is just a statement of Hamilton's Principle of Stationary Action. Generalised free energy minimisation replicates the epistemic and reward seeking behaviours induced in earlier active inference schemes, but prior preferences now induce an optimistic distortion of belief trajectories into the future. This allows beliefs about outcomes in the distal future to influence beliefs about states in the proximal future and present. That these beliefs then drive policy selection suggests that, under the generalised free energy formulation, the future can indeed cause the past. A prior belief about an outcome at a particular time point thus distorts the trajectory of hidden states at each time point reaching back to the present.




□ NiDelta: De novo protein structure prediction using ultra-fast molecular dynamics simulation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/23/262188.full.pdf

NiDelta is built on a deep convolutional neural network and a statistical potential enabling molecular dynamics simulation for modeling protein tertiary structure. Statistically determined residue contacts from the MSAs and torsion angles (φ, ψ) predicted by the deep learning method provide valuable structural constraints for the ultra-fast MD simulation (Upside).




□ GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts:


>> https://www.biorxiv.org/content/biorxiv/early/2018/04/24/306936.full.pdf

Discrepancies in the semantic granularity of gene annotations in knowledgebases represent a significant hurdle for researchers interested in mining genes based on a set of annotations used in experimental data. The topological distance between two terms in the ontology graph is not necessarily proportional to the semantic closeness in meaning between those terms, and semantic similarity reconciles potential inconsistencies between semantic closeness and graph distance.




□ HMMRATAC, The Hidden Markov ModeleR for ATAC-seq:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/24/306621.full.pdf

HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome.






□ Discovery of Large Disjoint Motif in Biological Network using Dynamic Expansion Tree:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/25/308254.full.pdf

The dynamic expansion tree used in this algorithm is truncated when the frequency of the subgraph fails to cross the predefined threshold. This pruning criterion in DET reduces the space complexity significantly.






□ Identifying high-priority proteins across the human diseasome using semantic similarity:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/29/309203.full.pdf

a systematic collection of popular proteins across 10,129 human diseases as defined by the Disease Ontology, 10,642 disease phenotypes defined by Human Phenotype Ontology, and 2,370 cellular pathways defined by Pathway Ontology. This strategy allows instant retrieval of popular proteins across the human "diseasome", and further allows reverse queries from protein to disease, enabling functional analysis of experimental protein lists using bibliometric annotations.




□ Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/04/30/311449.full.pdf

SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.