lens, align.

Long is the time, yet what is true comes to pass.

XXII.

2017-03-03 23:33:33 | Science News

(APOD: 2017 February 13 - Cloud Swirls around Southern Jupiter.)



lpachter:

"The circle is the most perfect shape. I know this is true because my reasoning is circular."




"The new Oncocircos tool allows visualization of segment data derived from the Titan and Sequenza-based workflows"

>> http://biorxiv.org/content/early/2016/11/26/089631

GigaScience:
Morin now onto his integrative analyses & visualization work. New Oncocircos tool for visualization of segment data.





□ Genome Modeling System: A Knowledge Management Platform for Genomics:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004274

A current adopter has access to well-vetted pipelines and tools for cancer genome analysis, including BWA, Strelka, VarScan2, SomaticSniper, Pindel, GATK, BreakDancer, CREST, TIGRA_SV, ChimeraScan, the Tuxedo suite, the HTSeq and edgeR combination, CopyCat, and many more.




StevenSalzberg1:
http://genome.cshlp.org/content/early/2017/01/27/gr.213405.116.abstract … Our new "mega-reads" assembly algorithm for PacBio+Illumina hybrid datasets




satijalab:
combinatorial indexing for scRNA-seq: http://biorxiv.org/content/early/2017/02/02/104844




□ Using computational theory to constrain statistical models of neural data:

>> http://biorxiv.org/content/biorxiv/early/2017/01/31/104737.full.pdf

Automatic model composition methods construct models iteratively, adding increasingly sophisticated structure to capture nuances of the data and comparing candidate models on the basis of marginal likelihood. Although these advances have still not taken the human “out of the loop,” these approaches do mimic the process by which humans learn the complex structure of data.
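A minimal sketch of comparing models by marginal likelihood, as described above. It assumes a deliberately simple stand-in for automatic model composition: polynomial regression with a Gaussian prior on the weights (precision `alpha`) and a known noise level (`noise_sd`), where "adding structure" means raising the polynomial degree. The names `log_evidence` and `grow_model` are hypothetical, not from the paper.

```python
import numpy as np

def log_evidence(x, y, degree, alpha=1.0, noise_sd=0.1):
    """Log marginal likelihood of Bayesian polynomial regression.

    With weights ~ N(0, alpha^-1 I) and noise ~ N(0, noise_sd^2),
    y ~ N(0, C) where C = noise_sd^2 I + alpha^-1 Phi Phi^T.
    """
    phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    cov = noise_sd**2 * np.eye(len(x)) + phi @ phi.T / alpha
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (quad + logdet + len(x) * np.log(2 * np.pi))

def grow_model(x, y, max_degree=6):
    """Greedily add one polynomial term at a time while the evidence improves."""
    degree, best = 0, log_evidence(x, y, 0)
    while degree < max_degree:
        candidate = log_evidence(x, y, degree + 1)
        if candidate <= best:
            break  # extra structure no longer pays for itself
        degree, best = degree + 1, candidate
    return degree, best

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.1, x.size)  # simulated quadratic data
print(grow_model(x, y))
```

The loop stops as soon as an extra term lowers the evidence; that built-in Occam penalty is what lets marginal likelihood arbitrate between simpler and richer structure.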




□ Process reveals structure: How a network is traversed mediates expectations about its architecture:

>> https://arxiv.org/abs/1702.00101

A combination of computational modeling and measures of semantic fluency in humans indicates that retrieval processes operating on densely clustered semantic networks are optimized by algorithms that sample densely from one cluster before moving to another. At least in a dynamic sensory environment, the fixed graph topology underlying incoming stimuli does not by itself appear sufficient for learning: the order in which that topology is revealed to the learner facilitates (or impedes) the extraction of higher-order architectural properties.
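To illustrate why traversal order matters, here is a toy sketch (our own construction, not the paper's model): a random walk on two densely connected clusters joined by a single bridge edge almost always stays inside one cluster between consecutive steps, whereas visiting nodes in random order gives no sequential cue about the clustering. The graph size and helper names are assumptions.

```python
import random

def two_cluster_graph(size=5):
    """Two fully connected clusters of `size` nodes joined by one bridge edge."""
    adj = {v: set() for v in range(2 * size)}
    for offset in (0, size):
        for i in range(offset, offset + size):
            for j in range(i + 1, offset + size):
                adj[i].add(j)
                adj[j].add(i)
    adj[0].add(size)  # the only edge between clusters
    adj[size].add(0)
    return adj

def within_cluster_fraction(sequence, size=5):
    """Share of consecutive items that fall in the same cluster."""
    same = sum((a < size) == (b < size) for a, b in zip(sequence, sequence[1:]))
    return same / (len(sequence) - 1)

rng = random.Random(42)
adj = two_cluster_graph()
walk = [0]
for _ in range(10_000):
    walk.append(rng.choice(sorted(adj[walk[-1]])))  # step to a random neighbor
shuffled = [rng.randrange(len(adj)) for _ in range(10_000)]  # topology-blind order

print(within_cluster_fraction(walk), within_cluster_fraction(shuffled))
```

For the walk the fraction sits near 0.95, for the random order near 0.5, so only the walk's ordering exposes the modular structure to a learner tracking transitions.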




□ Maximum Entropy Methods for Extracting the Learned Features of Deep Neural Networks:

>> http://biorxiv.org/content/biorxiv/early/2017/02/03/105957.full.pdf

The networks are trained with the categorical cross-entropy loss function and the AdaDelta optimizer from the Python package Keras. The 5 lowest-variance PC vectors of each exemplar are compared with 5 mutually orthogonal vectors randomly sampled from the 201-dimensional binary space. A central goal of statistical mechanics is to understand constrained maximum entropy distributions of many degrees of freedom.
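Constrained maximum entropy distributions have a Gibbs form, p_i ∝ exp(λ x_i), with the multiplier λ chosen so that the constraint holds. A minimal sketch for a single mean constraint over a finite set, using Jaynes' loaded-die example rather than anything from the paper (which applies the idea to network inputs); the function names are hypothetical.

```python
import math

def gibbs(values, lam):
    """Normalized weights p_i proportional to exp(lam * x_i), computed stably."""
    exponents = [lam * x for x in values]
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    z = sum(weights)
    return [w / z for w in weights]

def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over `values` subject to a fixed mean.

    The mean of the Gibbs distribution increases monotonically with lam
    (its derivative is the variance), so bisection finds the multiplier.
    """
    def mean_at(lam):
        return sum(p * x for p, x in zip(gibbs(values, lam), values))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_at(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs(values, (lo + hi) / 2)

faces = [1, 2, 3, 4, 5, 6]
probs = maxent_with_mean(faces, 4.5)  # die constrained to average 4.5
print([round(p, 3) for p in probs])
```

With the constraint set to 3.5 the procedure returns the uniform distribution, as it should when the constraint adds no information.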




□ rxncon 2.0: a language for executable molecular systems biology:

>> http://biorxiv.org/content/biorxiv/early/2017/02/08/107136.full.pdf

The syntax and semantics of rxncon, the reaction-contingency language for describing cellular signalling processes. The structure indices of a Boolean contingency live in a namespace labelled by the name of that particular Boolean contingency.




□ Model-free reinforcement learning operates over information stored in working-memory to drive human choices:

>> http://biorxiv.org/content/biorxiv/early/2017/02/11/107698.full.pdf

In the human participants, behavior was influenced by both model-based and model-free processes, regardless of whether the states were defined by external sensory cues or by internal working-memory representations. Habits are not exclusively learned in a model-free way; habituation may also involve additional mechanisms beyond the model-free caching of state-action-reward contingencies.
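For reference, "model-free caching of state-action-reward contingencies" can be sketched as a delta-rule update of cached values Q(s, a). The two-state task, reward probabilities, ε-greedy choice rule, and parameters below are illustrative assumptions, not the study's task or fitted model.

```python
import random

def run_model_free(trials=5000, alpha=0.1, epsilon=0.1, seed=1):
    """Cache Q(s, a) with the delta rule Q <- Q + alpha * (r - Q)."""
    # Hypothetical task: the better action depends on the current state.
    reward_prob = {("A", 0): 0.8, ("A", 1): 0.2,
                   ("B", 0): 0.2, ("B", 1): 0.8}
    q = {key: 0.0 for key in reward_prob}
    rng = random.Random(seed)
    for _ in range(trials):
        state = rng.choice("AB")  # the state could come from a cue or from working memory
        if rng.random() < epsilon:
            action = rng.choice((0, 1))                        # explore
        else:
            action = max((0, 1), key=lambda a: q[(state, a)])  # exploit cached values
        reward = 1.0 if rng.random() < reward_prob[(state, action)] else 0.0
        q[(state, action)] += alpha * (reward - q[(state, action)])  # model-free update
    return q

q = run_model_free()
print({k: round(v, 2) for k, v in q.items()})
```

The learner never represents how states or rewards are generated; it only stores running averages per state-action pair, which is what distinguishes it from a model-based learner.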




□ Statistical inference for moving-average Lévy-driven processes: Fourier-based approach:

>> https://arxiv.org/pdf/1702.02794.pdf

A new method for the semiparametric statistical estimation of continuous-time moving-average Lévy processes. The Lévy triplet (γ, σ, ν) can be estimated from ψ. In fact, since ν is absolutely continuous with an absolutely integrable density, the Riemann-Lebesgue lemma gives F[ν](u) → 0 as |u| → ∞; consequently ψ(u) can be viewed, at least for large |u|, as a second-order polynomial in u with coefficients (−λ, iγ, −σ²/2).
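Under the assumption (not stated in the excerpt) that ψ is the Lévy-Khintchine exponent with a Lévy measure ν of finite total mass λ = ν(ℝ), written without a truncation term, the polynomial claim follows from this decomposition:

```latex
\psi(u) = i\gamma u - \frac{\sigma^2}{2}u^2 + \int_{\mathbb{R}} \left(e^{iux} - 1\right)\nu(dx)
        = -\lambda + i\gamma u - \frac{\sigma^2}{2}u^2 + \mathcal{F}[\nu](u),
\qquad
\lambda = \nu(\mathbb{R}), \quad
\mathcal{F}[\nu](u) = \int_{\mathbb{R}} e^{iux}\,\nu(dx).
```

Since F[ν](u) → 0 as |u| → ∞, the last term vanishes and ψ(u) ≈ −λ + iγu − (σ²/2)u² for large |u|, which gives the coefficients (−λ, iγ, −σ²/2).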






□ Semantics-based plausible reasoning to extend the knowledge coverage of medical knowledge bases for clinical decision support:

>> http://biodatamining.biomedcentral.com/articles/10.1186/s13040-017-0123-y

The SWI-Prolog engine performs deductive reasoning and the unification steps of analogical reasoning; Aleph, an Inductive Logic Programming (ILP) system, conducts inductive reasoning; and ontology-based inference is performed by the Apache Jena OWL reasoner.
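Unification is the operation SWI-Prolog performs natively. A minimal first-order unification sketch in Python (with an occurs check), using a toy term encoding chosen for illustration: strings starting with an uppercase letter are variables and tuples are compound terms. This is not the paper's code, and the example facts are hypothetical.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter (Prolog convention)."""
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    """Follow bindings until reaching an unbound variable or a non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    """Occurs check: does `var` appear inside `term` under the current bindings?"""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, t, subst) for t in term)
    return False

def unify(a, b, subst=None):
    """Return a substitution dict making a and b equal, or None if they clash."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # distinct constants or mismatched structures

print(unify(("treats", "X", "hypertension"), ("treats", "lisinopril", "Y")))
```

The returned bindings are what an analogical step would carry over from a known case to a new one; a `None` result means the two patterns cannot be matched.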




□ AGBT 2017 - Advances in Genome Biology and Technology #agbt17

>> http://www.agbt.org/meetings/agbt-general-meeting/



□ Bionano launches Saphyr, our most advanced system for genome mapping & structural variation analysis

>> http://bit.ly/2l7E7Bw
>> http://bionanogenomics.com/products/saphyr/

Bionano’s Saphyr System features enhanced optics with adaptive loading of DNA using machine learning. Saphyr detects structural variations ranging from 1,000 bp to megabase pairs in length. The Saphyr Chip’s dual-flowcell design makes it possible to generate two independent maps from one sample with two enzymes and then combine the data.




□ 10x Genomics Launches Comprehensive Software Suite for Single-Cell RNA-seq Data Analysis and Visualization:

>> http://www.businesswire.com/news/home/20170213005309/en/10x-Genomics-Launches-Comprehensive-Software-Suite-Single-Cell

Cell Ranger features a powerful new graph-based clustering method with the ability to identify and resolve more cellular sub-populations. Loupe Cell Browser features an array of techniques for dimensionality reduction and clustering for single cell analysis.




jgreid:
1st up in informatics concurrent session Mark DePristo on "Mastering variant calling of SNPs & small indels w/deep neural networks" #AGBT17

MS: 60x PE ILMN, 35x linked-reads 10X, 55x long read PacBio, assemblies w/MegaHit, SuperNova, & FALCON-unzip


DaleYuzuki:
Simpson: Uses speech recognition algorithms following similar path, from HMM to neural networks. R9 metrichor, nanonet, deepnano #AGBT17


coregenomics:
JS: canu+racon+nanopolish (60X) +pilon = 99.6% accuracy! #agbt17


DrT1973:
Just finishing up amplicon run using MinKNOW 1.4.1, ~8.2 Gb sequence in 1.65M reads. Big jump from our previous high of 5.6 Gb :0) @nanopore




□ Nanopore sequencing in microgravity:

>> http://www.nature.com/articles/npjmgrav201635

MinKNOW produced three signals (977, 3,710, and 63,362 events in length) from the flight run, but base calling reduced these signals to 170, 1,752, and 5,193 bases, respectively. “Skips” in the signal (predicted bases that do not correspond to events) occurred at a much lower rate than stays in both data sets, but were still more frequent in the flight data: a mean of 0.24 skips per base called for the flight data versus 0.11 for the ground data.




□ DNA sequencing, genome assembly, and epigenetics on the International Space Station:

>> https://www.dropbox.com/s/s8rbk50vs2jwk10/Mason_AGBT_2017.pdf

You only need 1% of your cells’ DNA length to get to Mars: 225 million km to Mars / 0.0018288 km of DNA per cell = 123 billion cells. m6A and other base modifications can be discerned with ONT data on Earth and beyond.
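The slide's arithmetic can be checked directly. The Mars distance and per-cell DNA length are the slide's figures; the ~3.7 × 10¹³ cell count for a human body is an outside assumption, used only to check the "1%" claim.

```python
MARS_DISTANCE_KM = 225e6     # Earth-Mars distance used on the slide
DNA_PER_CELL_KM = 0.0018288  # 1.8288 m (6 ft) of DNA per cell, as on the slide
CELLS_IN_BODY = 3.7e13       # assumption: commonly cited ~37 trillion human cells

cells_needed = MARS_DISTANCE_KM / DNA_PER_CELL_KM
share_of_body = cells_needed / CELLS_IN_BODY

print(f"{cells_needed:.3g} cells, {share_of_body:.2%} of the body")
# prints: 1.23e+11 cells, 0.33% of the body
```

Under that cell-count assumption the requirement is about a third of a percent, comfortably inside the slide's "1%".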






□ Deep Forest: Towards An Alternative to Deep Neural Networks:

>> https://arxiv.org/pdf/1702.08835v1.pdf

gcForest has far fewer hyper-parameters and is less sensitive to parameter settings. The method generates a deep forest ensemble with a cascade structure, which enables gcForest to do representation learning.






□ mixOmics: an R package for 'omics feature selection and multiple data integration:

>> http://biorxiv.org/content/early/2017/02/14/108597

The main challenge is data heterogeneity, arising either from inherent platform-specific artefacts (N-integration) or from systematic differences between experiments assayed at different geographical sites or at different times (P-integration).




□ A Supernova at 50 parsec: Effects on the Earth’s Atmosphere and Biota:

>> http://biorxiv.org/content/biorxiv/early/2017/02/15/108936.full.pdf

Tropospheric ionization will increase proportionately, and the overall muon radiation load on terrestrial organisms will increase by a factor of 150. In the case of a largely coherent field aligned along the line of sight to the supernova, TeV-PeV cosmic-ray flux increases are ~10,000-fold; in the case of a transverse field they are below current levels.




□ Quantum dynamics of incoherently driven V-type systems: Analytic solutions beyond the secular approximation:

>> http://aip.scitation.org/doi/10.1063/1.4954243

The Bloch-Redfield method employed here predicts rich dynamics in systems with more complicated ground-state manifolds, where the non-secular terms produce additional phenomena due to interference effects between different ground states.




□ Sequence-specific bias correction for RNA-seq data using recurrent neural networks:

>> https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-3262-5

RNNs make for an attractive alternative because information about past history is automatically encoded in hidden units so that no explicit structures need be determined in advance.
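To make "information about past history is encoded in hidden units" concrete, here is a minimal Elman RNN forward pass over a nucleotide sequence. The weights are random and untrained, the hidden size is arbitrary, and this is not the architecture from the paper; it only shows that two sequences ending in the same bases still yield different hidden states because their prefixes differ.

```python
import math
import random

BASES = "ACGT"

def one_hot(base):
    return [1.0 if b == base else 0.0 for b in BASES]

class TinyRNN:
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), random untrained weights."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w_x = [[rng.gauss(0, 0.5) for _ in BASES] for _ in range(hidden)]
        self.w_h = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(hidden)]
        self.b = [0.0] * hidden

    def run(self, seq):
        h = [0.0] * len(self.b)  # no history yet
        for base in seq:
            x = one_hot(base)
            # the new state mixes the current base with everything seen before
            h = [math.tanh(sum(w * xi for w, xi in zip(self.w_x[i], x))
                           + sum(w * hj for w, hj in zip(self.w_h[i], h))
                           + self.b[i])
                 for i in range(len(h))]
        return h

rnn = TinyRNN()
print(rnn.run("ACG"))
print(rnn.run("TCG"))  # same last two bases, different first base
```

No context window had to be chosen in advance: the recurrence alone carries the upstream base into the final state.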






□ Deep learning and the Schrödinger equation:

>> https://arxiv.org/abs/1702.01361

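@yutakashino: An astonishing attempt to solve the Schrödinger equation of a many-electron system by treating it as a deep-learning image-recognition problem. The input is a two-dimensional potential image and the output is the wave function…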




□ PathNet: Evolution Channels Gradient Descent in Super Neural Networks

>> https://arxiv.org/pdf/1701.08734.pdf

PathNet is composed of layers of modules. Each module is a neural network of any type: convolutional, recurrent, feedforward, and so on. PathNet also significantly improves the robustness of a parallel asynchronous reinforcement learning algorithm to hyperparameter choices.
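As we understand the paper, PathNet evolves which modules each path uses with a tournament-style genetic algorithm: two paths are evaluated, and the loser is overwritten by a mutated copy of the winner. The sketch below illustrates only that selection loop. The layer and module counts are toy values, and `fitness` is a hypothetical stand-in for the reward a path earns after training, which is the expensive part omitted here.

```python
import random

LAYERS, MODULES, ACTIVE = 3, 10, 3  # toy sizes, not the paper's settings

def random_path(rng):
    """A path picks ACTIVE module indices in each layer."""
    return [rng.sample(range(MODULES), ACTIVE) for _ in range(LAYERS)]

def mutate(path, rng, rate=1 / (LAYERS * ACTIVE)):
    """Shift each module index by a small random offset with probability `rate`."""
    return [[(m + rng.randint(-2, 2)) % MODULES if rng.random() < rate else m
             for m in layer] for layer in path]

def fitness(path, target):
    """Hypothetical stand-in for task reward: overlap with a hidden target path."""
    return sum(len(set(layer) & set(goal)) for layer, goal in zip(path, target))

def evolve(generations=2000, pop_size=20, seed=0):
    rng = random.Random(seed)
    target = random_path(rng)
    population = [random_path(rng) for _ in range(pop_size)]
    initial_best = max(fitness(p, target) for p in population)
    for _ in range(generations):
        i, j = rng.sample(range(pop_size), 2)  # binary tournament
        if fitness(population[i], target) < fitness(population[j], target):
            i, j = j, i                        # make i the winner
        population[j] = mutate(population[i], rng)  # loser <- mutated winner
    final_best = max(fitness(p, target) for p in population)
    return initial_best, final_best

print(evolve())
```

Because the winner is never overwritten, the best fitness in the population can only stay the same or improve from one tournament to the next.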




□ An Empirical Bayes Approach for High Dimensional Classification:

>> https://arxiv.org/pdf/1702.05056v1.pdf

An empirical Bayes estimator based on a Dirichlet process mixture model for estimating the sparse normalized mean difference, which can be applied directly to high-dimensional linear classification. The MAP (maximum a posteriori) estimate of the cluster weights, including the zero cluster, is $\tilde{w}_t = \#\{k : \hat{\eta}_{Z_k} = \hat{m}_t\}/p$ and $\tilde{w}_0 = \#\{k : \hat{\eta}_{Z_k} = 0\}/p$. A variational Bayes algorithm is developed to compute the posterior efficiently and is parallelized to handle the ultra-high-dimensional case.







□ Waves as the Symmetry Principle Underlying Cosmic, Cell, and Human Languages:

>> http://www.mdpi.com/2078-2489/8/1/24

The meanings of the terms “entropy” and “information” are controversial, perhaps because they lack precise mathematical definitions. If “entropy” and “information” can be shown to be related to “quanta” mathematically, such a triadic relation may help clarify the true meanings of “entropy” and “information”.






□ A new hidden Markov modeling method that satisfies detailed balance (HMM-DB):

>> http://www.cell.com/biophysj/fulltext/S0006-3495(16)30879-7

A hidden Markov model with detailed balance is crucial to modeling transitions at equilibrium. HMM-DB more accurately modeled single-molecule trajectories with limited data points or degenerate states than HMM-EM.
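Detailed balance means π_i P_ij = π_j P_ji for the stationary distribution π of the transition matrix P. The sketch below only checks that condition for a given matrix; it does not reproduce HMM-DB's estimation procedure, and the two example matrices are made up.

```python
def stationary(P, iters=2000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def satisfies_detailed_balance(P, tol=1e-9):
    """Check pi_i * P_ij == pi_j * P_ji for every pair of states."""
    pi = stationary(P)
    n = len(P)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < tol
               for i in range(n) for j in range(n))

# A birth-death chain is reversible; a chain biased around a cycle is not.
birth_death = [[0.9, 0.1, 0.0],
               [0.2, 0.7, 0.1],
               [0.0, 0.3, 0.7]]
cyclic = [[0.1, 0.9, 0.0],
          [0.0, 0.1, 0.9],
          [0.9, 0.0, 0.1]]
print(satisfies_detailed_balance(birth_death), satisfies_detailed_balance(cyclic))
```

The cyclic chain has a perfectly good stationary distribution (uniform) yet carries a net probability current around the loop, which is exactly what an equilibrium model should rule out.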




□ Data-driven reverse engineering of signaling pathways using ensembles of dynamic models:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005379

SELDOM can be adapted to any signaling or gene-regulation dataset obtained upon perturbation, even when no prior knowledge is available. It combines elements from information theory, ensemble modeling, parametric dynamic identification, logic-based modeling, and model reduction.






□ Mapping DNA methylation with high-throughput nanopore sequencing

>> http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4189.html






□ Modern machine learning far outperforms GLMs at predicting spikes:

>> http://biorxiv.org/content/biorxiv/early/2017/02/24/111450.full.pdf

The consistently best method was the ensemble, an instance of XGBoost stacked on the predictions of the GLM, NN, XGBoost, and a random forest. The ensemble was significantly better than XGBoost (pseudo-R² of 0.08 [0.055-0.103], 95% bootstrapped CI) on the 10-dimensional set.
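Pseudo-R² for spike-count models is typically a ratio of log-likelihood differences. A minimal sketch assuming Poisson observations and the form 1 − (LL_sat − LL_model)/(LL_sat − LL_null), where the null model predicts the mean rate; the paper's exact definition may differ, and the counts and predictions below are invented for illustration.

```python
import math

def poisson_ll(y, lam):
    """Poisson log-likelihood up to the log(y!) constant, with 0 * log(0) = 0."""
    total = 0.0
    for yi, li in zip(y, lam):
        total += (yi * math.log(li) if yi > 0 else 0.0) - li
    return total

def pseudo_r2(y, y_hat):
    """1 at a perfect (saturated) fit, 0 when no better than the mean rate."""
    mean = sum(y) / len(y)
    ll_sat = poisson_ll(y, y)                   # predictions equal to the counts
    ll_null = poisson_ll(y, [mean] * len(y))    # constant mean-rate model
    ll_model = poisson_ll(y, y_hat)
    return 1 - (ll_sat - ll_model) / (ll_sat - ll_null)

spikes = [0, 2, 1, 4, 0, 3]              # hypothetical binned spike counts
pred = [0.5, 1.8, 1.2, 3.5, 0.3, 2.7]    # hypothetical model predictions
print(round(pseudo_r2(spikes, pred), 3))
```

The log(y!) terms cancel in every difference, so they are dropped; a bootstrapped CI like the one quoted above would come from resampling time bins and recomputing this score.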






□ Testing the limits of gradient sensing:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005386

The authors derive two methods for generating a linear particle gradient in 3-dimensional particle-based stochastic diffusion simulations. These methods specify how many particles to inject and at what distance, and they simulate tens of thousands of molecules over billions of time steps.






□ The genome Aggregation Database (gnomAD):

>> https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/

A spherized plot of the first 3 principal components from a principal component analysis of 70k bi-allelic single nucleotide polymorphisms; data were plotted using the TensorFlow Projector.
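A rough sketch of the pipeline described above, on simulated data: PCA on a 0/1/2 genotype matrix, keeping the first 3 components, then "spherizing". We assume spherizing means centering the points and scaling each one to unit length, which is our understanding of the Projector's option; the two simulated populations and the 500 SNPs (instead of 70k) are assumptions.

```python
import numpy as np

def top_pcs(genotypes, k=3):
    """Project samples onto the top-k principal components of a genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def spherize(points):
    """Center the points, then scale each one to unit length (assumed Projector behavior)."""
    centered = points - points.mean(axis=0)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

rng = np.random.default_rng(0)
freqs = rng.uniform(0.05, 0.95, size=(2, 500))         # allele frequencies, 2 populations
pop = np.repeat([0, 1], 50)                              # 50 samples per population
genotypes = rng.binomial(2, freqs[pop]).astype(float)    # shape (100, 500), values 0/1/2
coords = spherize(top_pcs(genotypes))
print(coords.shape)
```

Spherizing throws away each sample's distance from the center and keeps only its direction, which is why such plots show clusters as patches on a sphere.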




□ PHESANT: a tool for performing automated phenome scans in UK Biobank:

>> http://biorxiv.org/content/biorxiv/early/2017/02/26/111500.full.pdf

PHESANT's data-coding information file specifies whether the data code of a categorical field defines an ordered or an unordered category structure; PHESANT uses this information to assign each non-binary categorical (single) field to either the ordered or the unordered categorical data type.




□ Demographic inference through approximate-Bayesian-computation skyline plots:

>> http://biorxiv.org/content/biorxiv/early/2017/02/27/112060.full.pdf




□ Bayesian Analysis of Evolutionary Divergence with Genomic Data Under Diverse Demographic Models:

>> https://academic.oup.com/mbe/article-abstract/doi/10.1093/molbev/msx070/3053364/Bayesian-Analysis-of-Evolutionary-Divergence-with

The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model.






□ Coarse-graining time series data: Recurrence plot of recurrence plots and its application for music:

>> http://aip.scitation.org/doi/full/10.1063/1.4941371

Conventional recurrence analysis cannot calculate the global determinism of music, whose elements influence one another intricately. This method enables the calculation of global determinism by applying recurrence plots hierarchically.
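The building blocks here are the recurrence plot, R_ij = 1 when |x_i − x_j| ≤ ε, and the determinism measure DET, the share of recurrence points lying on diagonal lines of length at least l_min. A minimal sketch of those two pieces only; the paper's hierarchical "recurrence plot of recurrence plots" step is not reproduced, and ε, l_min, and the test series are illustrative choices.

```python
import random

def recurrence_matrix(series, eps):
    """R[i][j] = 1 when states i and j are within eps of each other."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]

def determinism(rp, l_min=2):
    """DET: share of off-diagonal recurrence points on diagonal lines of length >= l_min."""
    n = len(rp)
    total = sum(rp[i][j] for i in range(n) for j in range(n) if i != j)
    if total == 0:
        return 0.0
    on_lines = 0
    for offset in range(1, n):  # upper-triangle diagonals; the matrix is symmetric
        run = 0
        for i in range(n - offset + 1):
            if i < n - offset and rp[i][i + offset] == 1:
                run += 1
            else:  # a line ended (or the diagonal did)
                if run >= l_min:
                    on_lines += run
                run = 0
    return 2 * on_lines / total  # double to account for the lower triangle

periodic = [0, 1, 2, 3] * 10
rng = random.Random(3)
noise = [rng.random() for _ in range(200)]
print(determinism(recurrence_matrix(periodic, 0.5)))  # 1.0
print(determinism(recurrence_matrix(noise, 0.1)))
```

A perfectly periodic series puts every recurrence on a diagonal line, while independent noise scatters them, so DET separates the two; the hierarchical method applies this idea again at a coarser time scale.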




□ ConsenSys explains self-sovereign identity on Ethereum at the United Nations

>> http://www.ibtimes.co.uk/consensys-explains-self-sovereign-identity-ethereum-united-nations-1607068

According to the World Bank, United Nations, and ID2020 project, there are currently 2-2.5 billion people among the "unbanked", who could benefit significantly from blockchain-based identity.