lens, align.

Long is the time, but what is true comes to pass.

Ripples.

2016-05-05 05:05:05 | Science News

(iPhone 6s, Camera.)

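From the perspective of Kolmogorov complexity, human cognitive biases themselves float like unevenly scattered islands across the whole domain of describable languages. Data analysis rests, in essence, on the ability of computers to emulate one another, but the attempt to have an artificial intelligence do the interpreting also carries the sense of rowing out from the island into the open sea.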






□ Music Language Modeling with Recurrent Neural Networks:

>> http://gitxiv.com/posts/gquaBY76n92HLRvnH/music-language-modeling-with-recurrent-neural-networks

$$ l(x, m, h; f) = \alpha \log\frac{e^{f(x)_m}}{\sum_{n}^{M-1} e^{f(x)_n}} + (1-\alpha)\log\frac{e^{f(x)_{M+h}}}{\sum_{n}^{M+H} e^{f(x)_n}} $$

α is the melody coefficient. If α is 1.0, the model ignores the harmony; if α is set to zero, it learns exclusively over the harmony targets. This hyperparameter gives the model a flexibility that the plain cross-entropy formulation lacks.
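As a rough illustration of how α trades off the two terms, the sketch below evaluates the objective for a single output vector. It assumes (the note above does not say so) that the network emits M melody scores followed by H harmony scores, and that each term is normalized within its own slice. The function and variable names are hypothetical.

```python
import math


def log_softmax_at(scores, index):
    """Log of the softmax probability of `index` within `scores`."""
    peak = max(scores)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return scores[index] - log_norm


def melody_harmony_objective(outputs, m, h, num_melody, alpha):
    """alpha * log p(melody m) + (1 - alpha) * log p(harmony h).

    Assumption: outputs[:num_melody] are melody scores and the remaining
    entries are harmony scores, each normalized separately.
    """
    melody_scores = outputs[:num_melody]
    harmony_scores = outputs[num_melody:]
    melody_term = log_softmax_at(melody_scores, m)
    harmony_term = log_softmax_at(harmony_scores, h)
    return alpha * melody_term + (1.0 - alpha) * harmony_term


# Toy example: 4 melody classes and 3 harmony classes, all scores equal.
outputs = [0.0] * 7
print(melody_harmony_objective(outputs, m=1, h=2, num_melody=4, alpha=1.0))
print(melody_harmony_objective(outputs, m=1, h=2, num_melody=4, alpha=0.0))
```

With α = 1.0 only the melody slice contributes, and with α = 0.0 only the harmony slice does, which is the behavior described above.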








□ Network structure, metadata and the prediction of missing nodes:

>> http://arxiv.org/pdf/1604.00255v1.pdf

Constructs a joint generative model for the data and metadata: a non-parametric Bayesian framework to infer parameters from annotated datasets. Maximizing the joint Bayesian likelihood is identical to the minimum description length criterion, which is a formalization of Occam’s razor. Employing the fast MCMC algorithm developed for the hierarchical stochastic block model (SBM), the inference procedure scales linearly as O(N), or log-linearly as O(N ln² N) when obtaining the full hierarchy. The approach separates groups of annotations by their predictiveness, and hence can be used to prune “metadata noise” from such datasets.
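The link between the joint likelihood and description length comes from taking a negative logarithm. The notation below (A for the annotated network, θ for the model parameters, Σ for the description length in bits) is chosen here for illustration and may differ from the paper's.

```latex
\Sigma \;=\; -\log_2 P(A, \theta)
       \;=\; \underbrace{-\log_2 P(A \mid \theta)}_{\text{bits for the data given the model}}
       \;+\; \underbrace{-\log_2 P(\theta)}_{\text{bits for the model itself}}
```

Maximizing P(A, θ) therefore minimizes Σ: a more complex model has to pay for its own description with a shorter description of the data.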




□ BAGEL: A computational framework for identifying essential genes from pooled library screens.

>> http://biorxiv.org/content/early/2015/11/27/033068

BAGEL (Bayesian Analysis of Gene EssentiaLity), a supervised learning method for analyzing gene knockout screens. BAGEL accurately models the wide variability in phenotype shown by reagents targeting known essential genes, enabling the sensitive and precise identification of fitness genes, even under conditions of suboptimal data quality.




□ BAM2fastx tools: conversion of PacBio BAM files into gzipped fasta & fastq including demultiplexing of barcoded data

>> https://github.com/PacificBiosciences/bam2fastx




□ Cytosine Variant Calling with Highthroughput Nanopore Sequencing:

>> http://biorxiv.org/content/biorxiv/early/2016/04/04/047134.full.pdf

This method is based on a generative model of the MinION’s ionic current signal. It augments the HMM by modeling the ionic current distributions with a hierarchical Dirichlet process (HDP) mixture model, a Bayesian nonparametric method that shares statistical strength to robustly estimate a set of potentially complex distributions.




□ Promoting platform interoperability with portable bcbio workflows:

>> https://github.com/chapmanb/bcbb/blob/master/abstracts/bosc_2016_bcbio_cwl/chapman_bosc.md

bcbio's internal workflow representation has been re-engineered to use the Common Workflow Language (CWL). A clinical lab requiring full data provenance could run the generated bcbio CWL using Arvados.




pmelsted:
Kallisto is published. I've had some straight-to-video papers, this is not one of them

>> http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3519.html




amcrisan:
STAN for your Bayesian statistics needs (http://mc-stan.org/ ). #CANSSI






□ A Genealogical Look at Shared Ancestry on the X Chromosome:

>> http://biorxiv.org/content/biorxiv/early/2016/04/03/046912.full.pdf

A triangle of binomial coefficients, similar to an offset version of Pascal’s triangle; the number of X-chromosome ancestors grows as the Fibonacci numbers.
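The Fibonacci growth can be checked by counting directly: a male inherits his X chromosome only from his mother, while a female inherits one from each parent. The sketch below applies that rule generation by generation. The function name is hypothetical, and it counts genealogical positions only (no inbreeding or pedigree collapse is modeled).

```python
def x_ancestor_counts(sex, generations):
    """Count possible X-chromosome ancestors in each generation back.

    sex: "male" or "female" for the present-day individual.
    Returns a list whose first entry is the count one generation back,
    the second entry two generations back, and so on.
    """
    males, females = (1, 0) if sex == "male" else (0, 1)
    counts = []
    for _ in range(generations):
        # Everyone has an X-contributing mother; only females also have
        # an X-contributing father.
        males, females = females, males + females
        counts.append(males + females)
    return counts


print(x_ancestor_counts("male", 6))
print(x_ancestor_counts("female", 5))
```

Starting from a female simply shifts the same sequence forward by one generation.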




□ NVIDIA's Pascal Is Shipping: Company Announces Tesla P100 Compute GPU & DGX-1 Deep Learning Server:

>> http://techgage.com/news/nvidias-pascal-is-shipping-company-announces-tesla-p100-compute-gpu-dgx-1-deep-learning-server/






□ SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories:

>> http://biorxiv.org/content/biorxiv/early/2016/04/05/046755.full.pdf

SATORI aims at an improved information foraging process by enriching the search process and provides means of semantic top-down exploration.




□ Artemis: Rapid and Reproducible RNAseq Analysis for End Users using Kallisto within BaseSpace cloud platform:

>> http://biorxiv.org/content/biorxiv/early/2016/04/06/031435.full.pdf

Artemis reduces a computational bottleneck, removing inefficiencies by using ultra-fast transcript abundance calculations while connecting accelerated quantification software to the Sequence Read Archive. Artemis encapsulates expensive preparatory routines for transcript quantification and uniformly prepares Kallisto execution commands within a versioned environment, encouraging reproducible protocols.






□ Hypercycle:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004853

Genome integration of the hypercycle can proceed by linking its members into a single chain, which forms a precursor of a genome. Numerical simulations showed that when stochastic effects are taken into account, compartmentalization is sufficient to integrate information dispersed in competitive replicators without the need for hypercycle organization. Another model based on cellular automata, taking into account a simpler replicating network of continuously mutating parasites and their interactions with one replicase species, exhibited an emergent travelling wave pattern.




□ Towards Bayesian Deep Learning: A Survey

>> http://arxiv.org/pdf/1604.01662v2.pdf

Bayesian deep learning strives to combine the merits of probabilistic graphical models (PGM) and neural networks (NN) by integrating them in a single principled probabilistic framework.




□ Building a Kraken database with new FTP structure and no GI numbers:

>> http://www.opiniomics.org/building-a-kraken-database-with-new-ftp-structure-and-no-gi-numbers/




□ Public medical research funding stimulates private R&D investment

>> http://blogs.biomedcentral.com/on-medicine/2016/02/24/public-medical-research-funding-stimulates-private-rd-investment/




□ SLICER: Inferring Branched, Nonlinear Cellular Trajectories from Single Cell RNA-seq Data:

>> http://biorxiv.org/content/biorxiv/early/2016/04/09/047845.full.pdf

SLICER can select genes without prior knowledge of the process, and automatically determine the location and number of branches and loops. Experiments with Isomap, Hessian LLE, Laplacian eigenmaps and diffusion maps found that Locally Linear Embedding (LLE) seemed to give the best results.




□ PALADIN: Protein Alignment for Functional Profiling Whole Metagenome Shotgun Data:

>> http://biorxiv.org/content/biorxiv/early/2016/04/07/047712.full.pdf




□ Detecting gene-gene interactions using a permutation-based random forest method:

>> http://biodatamining.biomedcentral.com/articles/10.1186/s13040-016-0093-5

Most machine learning methods, including Random Forest (RF) and Multifactor Dimensionality Reduction, are not well suited to unbalanced numbers of cases. The permutation-based strategy is suitable not only for RF but could also be used with other machine learning algorithms, such as deep learning, which models high-level abstractions from genetic data using complex architectures. This approach could be used on both categorical and continuous data, because a Random Forest can be grown on either.







□ Technical aspects of the "Panama Papers" analysis

>> https://medium.com/@c_z/パナマ文書-解析の技術的側面-d10201bbe195#.qokisrxct

"The democratization of technologies to make sense of data at scale is an important part of a free and open society, and I’m proud of the role we play in that evolving landscape ― not only in the case of Swiss Leaks and the Panama Papers, but in solving future problems we can’t even yet imagine." - Emil Eifrem, CEO, Neo Technology




□ CISCA: A Class-Information-Based Sparse Component Analysis Method to Identify Differentially Expressed Genes:

>> http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=7126959

The total scatter matrix is obtained by combining the between-class and within-class scatter matrices. CISCA decomposes the constructed data matrix by solving an optimization problem with sparse constraints on the loading vectors.
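The way the scatter matrices combine can be sketched with the standard definitions used in discriminant analysis, where the total scatter equals the between-class plus the within-class scatter. The paper's exact construction is not given in this note, so the definitions, function names and toy data below are assumptions for illustration only.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def scatter(rows, center):
    """Sum of outer products (x - center)(x - center)^T over all rows."""
    dim = len(center)
    total = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        diff = [x - c for x, c in zip(row, center)]
        for i in range(dim):
            for j in range(dim):
                total[i][j] += diff[i] * diff[j]
    return total


def scatter_matrices(classes):
    """Return (S_t, S_b, S_w) for a dict mapping class label -> samples."""
    all_rows = [row for rows in classes.values() for row in rows]
    overall_mean = mean_vector(all_rows)
    dim = len(overall_mean)
    s_within = [[0.0] * dim for _ in range(dim)]
    s_between = [[0.0] * dim for _ in range(dim)]
    for rows in classes.values():
        class_mean = mean_vector(rows)
        class_scatter = scatter(rows, class_mean)
        # Between-class term: class size times the outer product of the
        # shift between the class mean and the overall mean.
        mean_scatter = scatter([class_mean], overall_mean)
        for i in range(dim):
            for j in range(dim):
                s_within[i][j] += class_scatter[i][j]
                s_between[i][j] += len(rows) * mean_scatter[i][j]
    s_total = scatter(all_rows, overall_mean)
    return s_total, s_between, s_within


# Toy expression-like data: two classes, two features.
classes = {
    "case": [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]],
    "control": [[6.0, 5.0], [7.0, 8.0]],
}
s_t, s_b, s_w = scatter_matrices(classes)
print(s_t)
print([[s_b[i][j] + s_w[i][j] for j in range(2)] for i in range(2)])
```

The two printed matrices agree up to floating-point rounding, which is the decomposition the note refers to.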






□ Third-generation sequencing and the future of genomics:

>> http://www.biorxiv.org/content/biorxiv/early/2016/04/13/048603.full.pdf






□ RapClust: Lightweight Clustering of de novo Transcriptomes using Fragment Equivalence Classes

>> http://arxiv.org/pdf/1604.03250.pdf

Sequence-level analysis of the resulting clusters of contigs could reveal important information about the nature of the transcripts. One could imagine “reverse-engineering” the splicing patterns present in the transcripts occurring in the same cluster. This would allow one to build a virtual gene model, even in a completely de novo context, for use in downstream differential splicing analyses.




□ Valid parameter space of a bivariate Gaussian Markov random field with a generalized block-Toeplitz precision matrix

>> http://arxiv.org/abs/1604.05478v1

The bivariate GMRF is part of a hierarchical model used in spatial statistics to analyze data coming from projections of regional climate change. The goal is to efficiently evaluate quadratic forms and log-determinants involving such matrices without the need for computing the Cholesky decomposition of large sparse matrices.




□ Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

>> http://www.pnas.org/content/early/2016/04/14/1601782113.long

Sequencing is realized on an electronic chip containing an array of independently addressable electrodes, each with a single polymerase–nanopore complex, potentially offering the high throughput required for precision medicine.




□ Variant callers return different call sets when re-run on the same data

>> http://bioinformatics.oxfordjournals.org/content/early/2016/03/11/bioinformatics.btw139.abstract

HaplotypeCaller showed the most discrepancies; discordant calls were less pronounced in the Freebayes and Platypus results. Given the same alignments twice, the callers themselves are deterministic, but they return different call sets when the same data is remapped.




□ Network-Based Interpretation of Diverse High-Throughput Datasets through the Omics Integrator Software Package:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004879

The software is comprised of two individual tools, Garnet and Forest, that can be run together or independently to allow a user to perform advanced integration of multiple types of high-throughput data as well as create condition-specific subnetworks of protein interactions that best connect the observed changes in various datasets.




□ GENEWIZ Expands High Throughput Sequencing Capabilities with Investment in Multiple Platforms:

>> http://www.prnewswire.com/news-releases/genewiz-expands-high-throughput-sequencing-capabilities-with-investment-in-multiple-platforms-300252330.html

GENEWIZ has acquired multiple high-throughput sequencing platforms incl. the Illumina HiSeq X Ten, PacBio Sequel, and 10x Genomics Chromium.




□ DNAnexus Partners With SolveBio to Make Genomic Data More Actionable for Pharmaceutical and Diagnostics Industry:

>> http://www.businesswire.com/news/home/20160421005311/en/Leading-Cloud-Genomics-Company-DNAnexus-Partners-SolveBio




□ Beta-Poisson model for single-cell RNA-seq data analyses:

>> http://bioinformatics.oxfordjournals.org/content/early/2016/04/18/bioinformatics.btw202.abstract




□ Rescue lost code from a Jupyter/IPython notebook:

>> http://blog.rtwilson.com/how-to-rescue-lost-code-from-a-jupyteripython-notebook/




□ Pushed beta support for R9 RNN base caller to poretools:

>> https://github.com/arq5x/poretools/commit/07657c794e0c1ef8303d866f6cfc74fbe54d9762

'r9rnn' : { 'template' : '/Analyses/Basecall_RNN_1D_%03d/BaseCalled_template'}
self.hdf5file["/Analyses/Basecall_RNN_1D_%03d/BaseCalled_template" % (self.group)]
return 'r9rnn'
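The `%03d` in these paths is a zero-padded group number filled in with Python string formatting. A minimal sketch of that lookup, independent of poretools itself: the dictionary and function names are hypothetical, and only the path template comes from the commit excerpt above.

```python
# Path template taken from the commit excerpt; everything else is illustrative.
BASECALL_PATHS = {
    "r9rnn": {"template": "/Analyses/Basecall_RNN_1D_%03d/BaseCalled_template"},
}


def basecall_path(version, read_type, group=0):
    """Fill the zero-padded basecall group number into the HDF5 path."""
    return BASECALL_PATHS[version][read_type] % group


print(basecall_path("r9rnn", "template"))
print(basecall_path("r9rnn", "template", group=12))
```

Group 0 expands to `Basecall_RNN_1D_000` and group 12 to `Basecall_RNN_1D_012`, so paths from different basecalling runs stay the same length.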




□ DeepBlue epigenomic data server: programmatic data retrieval & analysis of epigenome region sets.

>> http://nar.oxfordjournals.org/content/early/2016/04/15/nar.gkw211.long

DeepBlue names the source of each sample using terms from the Cell Type Ontology, Experimental Factor Ontology and Uber Anatomy Ontology. The DeepBlue API is implemented over the XML-RPC protocol to maximize compatibility with various programming languages.




□ Ultrahigh Dimensional Feature Selection via Kernel Canonical Correlation Analysis:

>> http://arxiv.org/abs/1604.07354v1

KCCA-SIS was applied to a spatiotemporal gene expression dataset for human brain development and obtained better results, based on gene ontology enrichment analysis, than other existing methods.




□ FALCON release v 0.5.0 incl LSF Support:

FALCON-integrate: https://github.com/PacificBiosciences/FALCON-integrate

Falcon Genome Assembly Tool Kit: https://github.com/PacificBiosciences/FALCON/wiki/Manual




□ DNAnexus Cloud Genomics Platform Now Supports Data Management & Genomic Analysis for Leading Genome Research Center

>> http://www.businesswire.com/news/home/20160428005202/en/DNAnexus-Cloud-Genomics-Platform-Supports-Data-Management

This large network of laboratories that need access to the Illumina-powered sequencing center run by SCGPM has resulted in the organization. Researchers can contract SCGPM to sequence samples on an Illumina HiSeq 2000 for a minimum price of $1,400 per lane.




□ Operator Calculus for Information Field Theory:

>> http://arxiv.org/abs/1605.00660v1

A new way of translating expectation values into a language of operators, similar to that of quantum mechanics. Finding analogies to the Baker-Campbell-Hausdorff (BCH) formula for function classes other than the exponential function would allow the operator formalism to be applied to them.




□ Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks:

>> http://genome.cshlp.org/content/early/2016/05/03/gr.200535.115.full.pdf

Basset predictions for the change in accessibility between variant alleles were far greater for GWAS SNPs that are likely to be causal than for nearby SNPs in linkage disequilibrium with them. Basset assigns low SAD profiles to many nucleotides overlapped by these regions, calling into question their consideration for causal roles.




□ SCell: integrated analysis of single-cell RNA-seq data:

>> http://bioinformatics.oxfordjournals.org/content/early/2016/05/05/bioinformatics.btw201.abstract

SCell integrates quality filtering, normalization, feature selection, iterative dimensionality reduction, clustering and estimation of gene expression (GE) gradients. SCell implements PCA for dimensionality reduction, optionally followed by Varimax rotation. Applying Varimax rotation to post-process the PCA decreases the number of genes that load on both PCA axes to a comparable degree and can make the PCA axes more interpretable.




□ Multi Level Monte Carlo methods for a class of ergodic stochastic differential equations:

>> http://arxiv.org/pdf/1605.01384v1.pdf

A multilevel version of the recently introduced Stochastic Gradient Langevin Dynamics (SGLD) method. This is the first stochastic gradient MCMC method with complexity O(ε^{-2}|log ε|^3), which is asymptotically an order of ε lower than the O(ε^{-3}) complexity of all stochastic gradient MCMC methods currently available.
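For orientation, the plain single-level SGLD update that this work builds on takes a minibatch gradient step and adds Gaussian noise at each iteration. The sketch below is not the multilevel method from the paper. It is ordinary SGLD on a toy problem (Gaussian likelihood with unit variance, flat prior on the mean), with the step size, batch size and data chosen purely for illustration.

```python
import math
import random


def sgld_mean_samples(data, step=1e-3, batch_size=10, num_steps=2000, seed=0):
    """Draw approximate posterior samples of a Gaussian mean using SGLD.

    Toy model (an assumption of this example): x_i ~ Normal(theta, 1)
    with a flat prior on theta.
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(num_steps):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Unbiased minibatch estimate of the gradient of the log posterior.
        grad = (n / batch_size) * sum(x - theta for x in batch)
        # Langevin step: half a step along the gradient plus N(0, step) noise.
        theta += 0.5 * step * grad + math.sqrt(step) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


data = [1.0, 1.5, 2.0, 2.5, 3.0] * 20  # 100 points with mean 2.0
samples = sgld_mean_samples(data)
burned_in = samples[500:]  # discard the initial transient
print(sum(burned_in) / len(burned_in))  # should be close to the data mean
```

Each iteration touches only `batch_size` data points, which is what makes the method attractive for large datasets. The complexity figures quoted above concern how many such steps are needed to reach accuracy ε.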