lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre. (Long is the time, yet what is true comes to pass.)

Opus_XIV.

2015-12-13 00:28:40 | Science News
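□ "In the beginning was the Word." Because the word is a mirror that reflects reason, however endlessly we rearrange its symbols, the frame that consciousness can perceive is only ever mirror facing mirror; the aspect may change, but the deep layer never rises to the surface. Often it is depicted only through the contradictions glimpsed through the cracks running across the mirror's surface. "Meaning" imposes an excessive weight, and we were born to break the mirror.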



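□ We seem to subdue chaos in order to meet the demands of an unreasonable society. Yet peace, that is, remaining in equilibrium, is itself inherently unreasonable. We, who merely take the place of people in seeking universality in meaning and value, go on weaving at the loom in frayed, borrowed bodies.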






□ Doubly Bayesian Analysis of Confidence in Perceptual Decision-Making:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004519

This difference between one-dimensional and multi-dimensional sensory data is one of the key differences. Previous models based on signal detection theory have typically assumed that the sensory data is one-dimensional, leaving them susceptible to the problem described above.

There is also a variety of “dynamic” signal detection theory models in which sensory data is assumed to accumulate over time. Such models are able to explain the interplay between accuracy, confidence, and reaction time. In these models the sensory data is also summarised by a single scalar value, making it impossible to determine whether subjects’ confidence reports reflect heuristic or Bayes optimal computations.




□ Data-driven hypothesis weighting increases detection power in genomic analytics:

>> http://biorxiv.org/content/early/2015/12/13/034330

The covariate can be any continuous-valued or categorical variable that is thought to be informative on the statistical properties of each hypothesis test, while it is independent of the p-value under the null hypothesis. In expression-QTL or ChIP-QTL analysis, eligible covariates are the distance between the genetic variant and the genomic location of the phenotype, or measures of their comembership in a topologically associated domain. The method computes the optimal weight vector under a convex relaxation of the weight optimization task, which in statistical terms corresponds to replacing the empirical cumulative distribution function of the p-values with the Grenander estimator. The resulting problem is convex and can be efficiently solved even for large numbers of hypotheses.
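
To make the role of the weights concrete, here is a minimal sketch of a weighted Benjamini-Hochberg step in Python. It is not the paper's method: the paper learns the weights from the covariate by solving an optimization problem, whereas here the weights are simply supplied by hand. The function name `weighted_bh` and the default `alpha` are assumptions made for illustration.

```python
def weighted_bh(pvalues, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg sketch: returns one boolean per hypothesis.

    Assumption: weights are non-negative and hand-picked here; a data-driven
    method would instead learn them from a covariate.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    mean_w = sum(weights) / m
    if mean_w <= 0 or min(weights) < 0:
        raise ValueError("weights must be non-negative with a positive mean")

    # Normalise the weights so that they average to 1.
    norm = [w / mean_w for w in weights]
    # Weighted p-values; a zero weight means the hypothesis is never rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, norm)]

    # Standard BH step-up rule applied to the weighted p-values.
    order = sorted(range(m), key=lambda i: q[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= alpha * rank / m:
            k = rank

    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    pvals = [0.03, 0.03]
    print(weighted_bh(pvals, [1.0, 1.0], alpha=0.05))
    print(weighted_bh(pvals, [1.5, 0.5], alpha=0.05))
```

With equal weights the procedure reduces to ordinary Benjamini-Hochberg; shifting weight towards hypotheses that the covariate marks as promising is what can increase power.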




rikija:
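If saddle-point-free optimization is refined, it will be a major advance for both deep learning and high-dimensional nonparametric Bayes. In principle, a full-Bayes treatment could always provide the optimal features and kernels, but such a brute-force approach was shunned because of the high dimensionality and the multimodality of the posterior. Stochastic optimization techniques such as SGD are now changing this situation dramatically.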






□ A Functional Cartography of Cognitive Systems:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004533

The authors define the network role that a cognitive system plays in dynamics along two dimensions: stability vs. flexibility and connected vs. isolated. They propose a data-driven clustering of regions into putative cognitive systems that is more statistically robust than examining community structure at individual times or in individual task windows, and that provides sensitivity to a wide range of temporal scales of importance.




□ ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis:

>> http://www.genomebiology.com/2015/16/1/241

The fundamental empirical observation that underlies the zero-inflation model in ZIFA is that the dropout rate for a gene depends on the expected expression level of that gene in the population. Genes with lower expression magnitude are more likely to be affected by dropout than genes that are expressed with greater magnitude.
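
The relationship can be illustrated with a small simulation. This is a sketch under stated assumptions, not the ZIFA implementation: it assumes the dropout probability decays as exp(-decay * mu^2) with the gene's mean log-expression mu, and the names `dropout_probability`, `simulate_gene`, `decay` and `noise_sd` are hypothetical.

```python
import math
import random


def dropout_probability(mean_log_expression, decay=0.1):
    """Assumed dropout model: probability falls off as exp(-decay * mu^2)."""
    return math.exp(-decay * mean_log_expression ** 2)


def simulate_gene(mean_log_expression, n_cells, decay=0.1, noise_sd=0.5, seed=0):
    """Simulate one gene across cells with expression-dependent dropout."""
    rng = random.Random(seed)
    p_drop = dropout_probability(mean_log_expression, decay)
    values = []
    for _ in range(n_cells):
        if rng.random() < p_drop:
            # Dropout event: the gene is observed as exactly zero.
            values.append(0.0)
        else:
            # Otherwise a noisy measurement, clipped at zero.
            values.append(max(0.0, rng.gauss(mean_log_expression, noise_sd)))
    return values


if __name__ == "__main__":
    for mu in (1.0, 3.0, 5.0):
        cells = simulate_gene(mu, n_cells=1000)
        zero_fraction = cells.count(0.0) / len(cells)
        print(f"mean log-expression {mu}: zero fraction {zero_fraction:.2f}")
```

Lowly expressed genes end up with far more zeros than highly expressed ones, which is the pattern a zero-inflation model is meant to capture.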




□ Accelerating Asymptotically Exact MCMC for Computationally Intensive Models via Local Approximations

>> http://arxiv.org/abs/1402.1694




□ Deep Genomics raises $3.7M in seed financing in a round led by True Ventures: https://www.genomeweb.com/informatics/deep-genomics-raises-37m






□ Discover hidden splicing variations by mapping personal transcriptomes to personal genomes:

>> http://nar.oxfordjournals.org/content/early/2015/11/16/nar.gkv1099.long

The authors investigated a distinct issue in RNA-seq alignment, namely the identification of novel, person-specific splice junctions from personal data.

If a genetic polymorphism creates a novel splice site dinucleotide motif, the resulting splice junction reads utilizing the novel splice site will likely be unmappable to the reference genome by a standard RNA-seq aligner.




□ “Broadband” Bioinformatics Skills Transfer with the Knowledge Transfer Programme (KTP):

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004512

The multidisciplinary nature of the bioinformatics field, coupled with rare and depleting expertise, is a critical problem for the advancement of bioinformatics in Africa.






□ BRANE Cut: Biologically-Related A priori Network Enhancement with Graph cuts for GRN Inference

>> http://biorxiv.org/content/early/2015/11/20/032383

Using biologically sound penalties and data-driven parameters, it improves three state-of-the-art GRN inference methods. Using this algorithm, the computational complexity of BRANE Cut is O(mn^2|C|), where m (respectively n) is the number of edges (respectively nodes) in the flow network G_f and |C| is the cost of the minimal cut.






□ MEGENA: Multiscale Embedded Gene Co-expression Network Analysis

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004574

PMFG is not suitable for large-scale genomic data due to several drawbacks, such as the high computational complexity O(|V|^3), the presence of false positives due to the maximal planarity constraint, and the inadequacy of the clustering framework.

MEGENA infers gene co-expression networks by implementing a parallelized algorithm for embedding co-expression networks on a topological sphere and a new clustering analysis algorithm to detect coherent clusters at various compactness scales. MEGENA revealed not only biologically meaningful multi-scale clustering structures of gene co-expression, but also novel key regulators of important cancer biological processes such as lineage-specific differentiation in LUAD.





(Fin plot for all SNPs. ‘Fin plot’ for analysis ‘All’. Hardy-Weinberg disequilibrium is plotted against MAF.)


□ Construction of relatedness matrices using genotyping-by-sequencing data:

>> http://www.biomedcentral.com/1471-2164/16/1047

□ Kinship (genetic relatedness) using GBS (genotyping-by-sequencing) with Depth adjustment

>> https://github.com/AgResearch/KGD

Genomic selection (GS) can be applied using an explicit estimate of the genomic relatedness matrix (GRM) with genomic best linear unbiased prediction (GBLUP). The paper presents a method which gives unbiased estimates of relatedness, based on SNPs assayed by GBS, and which accounts for the depth (including zero depth) of the genotype calls. A simple graphical method is given to illustrate this issue and to suggest an appropriate filter, so that the affected SNPs can be excluded from the GRM calculations.
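
For orientation, the sketch below computes a plain genomic relatedness matrix from fully observed genotypes using the widely used VanRaden-style estimator. It deliberately does not implement the depth adjustment described above, and it assumes no missing calls; the function name `vanraden_grm` and the 0/1/2 allele-count coding are choices made for this illustration.

```python
def vanraden_grm(genotypes):
    """Plain GRM sketch: G = Z Z^T / (2 * sum_j p_j * (1 - p_j)).

    genotypes: list of individuals, each a list of allele counts (0, 1 or 2).
    Assumption: every genotype is observed; no depth or missing-data handling.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all SNPs are monomorphic; the GRM is undefined")
    # Centre each genotype by twice the allele frequency.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(ri, rk)) / denom for rk in centered]
        for ri in centered
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
    for row in vanraden_grm(toy):
        print([round(x, 3) for x in row])
```

In the toy data the first two individuals carry identical genotypes and therefore get identical rows, while the third is negatively related to both.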






□ AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Drug Design:

>> http://www.atomwise.com/introducing-atomnet/

The one behind Atomwise is, once again, the top incubator Y Combinator. As with the Cofactor Genomics case, its investments in sequence-based drug discovery are also conspicuous.

>> https://www.crunchbase.com/organization/atomwise#/entity

Atomwise has raised $6.57M in 4 rounds from 10 investors, including Y Combinator, Data Collective, Khosla Ventures, DFJ, AME Cloud and OS Fund.


Atomwise uses Deep Learning Neural Networks to discover new medicines. Atomwise achieves the world’s best results for new drug hit discovery. The locally-constrained deep convolutional architecture allows the system to model the complex, non-linear phenomenon of molecular binding by hierarchically composing proximate basic chemical features into more intricate ones.

By incorporating structural target information AtomNet can predict new active molecules even for targets with no previously known modulators. AtomNet shows outstanding results on a widely used structure-based benchmark achieving an AUC greater than 0.9 on 57.8% of the targets in the DUDE benchmark, far surpassing previous docking methods.






□ World’s first genomic medicine clinic, located in Huntsville

>> http://whnt.com/2015/11/20/hudsonalpha-opens-doors-to-worlds-first-genomic-medicine-clinic/

The Smith Family Clinic for Genomic Medicine has been established to use whole genome sequencing to diagnose rare, undiagnosed disease.


□ HudsonAlpha in DDN SC15: End to End Infrastructure Design for Large Scale Genomics: https://www.youtube.com/watch?v=xp04N5Hezkg

HudsonAlpha's compute capacity is about 20 TFLOPS, expected to double in the next year; it produces 3 TB/day of data from sequencers and 7.5 TB/day from compute.




□ CHARMM-GUI HMMM Builder for Membrane Simulations with the Highly Mobile Membrane-Mimetic Model:

>> http://www.cell.com/biophysj/abstract/S0006-3495%2815%2901047-4

The HMMM Builder is designed to provide bilayer simulation systems (and inputs) with the Highly Mobile Membrane-Mimetic (HMMM) model to study binding and insertion of molecules into the membrane. Based on the system size determined in the previous step, this step builds individual pieces such as the lipid bilayer around the protein, additional water molecules to fully solvate the protein, and ions (Monte Carlo sampling or distance-based algorithm) for a given concentration.






□ gammaMAXT: a fast multiple-testing correction algorithm:

>> http://www.biodatamining.org/content/pdf/s13040-015-0069-x.pdf

The algorithm was implemented in software MBMDR-4.2.2, as part of the MB-MDR framework to screen for SNP-SNP, SNP-environment or SNP-SNP-environment interactions at a genome-wide level. In the absence of interaction effects, test statistics produced by the MB-MDR methodology follow a mixture distribution with a point mass at zero and a shifted gamma distribution for the top 10% of the strictly positive values.




□ HYBRIDSPADES: an algorithm for hybrid assembly of short and long reads:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/11/20/bioinformatics.btv688.short




□ SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004575

The paper includes a comparative evaluation of SINCERA with three recently available single-cell RNA-seq analysis tools: SNN-Cliq, scLVM and SINGuLAR. The analytic pipeline consists of three main components: pre-processing, cell type identification, and cell type specific gene signature and driving force identification.




□ Memory and Combinatorial Logic Based on DNA Inversions: Dynamics and Evolutionary Stability:

>> http://pubs.acs.org/doi/10.1021/acssynbio.5b00170

The authors build a NOT gate where the input promoter drives FimE and, in the absence of signal, the reverse state is maintained by the constitutive expression of HbiF. The evolutionary stabilities of these circuits are measured by passaging cells while cycling function.




□ ASAP: A Machine-Learning Framework for Local Protein Properties:

>> http://biorxiv.org/content/biorxiv/early/2015/11/21/032532.full.pdf

CleavePred is an ASAP-based model trained to solve the following RLBP task: for each residue along the precursor, predict whether it is a cleavage site or not.




□ Kibana and Kibi for Big (Relational) Lifes Sciences data: a ChEMBL test:

>> http://siren.solutions/kibana-and-kibi-for-big-relational-lifes-sciences-data-a-chembl-test/

ChEMBL is distributed as compressed Postgres, Oracle or MySQL dumps, which turn into well over 20 million records interconnected across 63 tables. The test loads them into an Elasticsearch cluster using the Logstash JDBC connector, creating 5 indexes in Elasticsearch that are relationally interconnected via ID properties.




□ Graphical pan-genome analysis with compressed suffix trees and the Burrows–Wheeler transform:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/11/19/bioinformatics.btv603.short

The first algorithm implements a novel linear-time suffix tree algorithm by means of a compressed suffix tree. The second algorithm uses the Burrows–Wheeler transform to build the compressed de Bruijn graph in O(n log σ) time.
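
As a reminder of what the Burrows–Wheeler transform itself does, here is a deliberately naive Python sketch based on sorting all rotations. It is not the paper's O(n log σ) construction and it does not build a de Bruijn graph; it only shows the transform and its inverse on a short string. The sentinel character `$` is an assumption and must not occur in the input.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations (quadratic memory)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # All cyclic rotations of the sentinel-terminated string, in sorted order.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Naive inverse: rebuild the sorted rotation matrix column by column."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transform as a new first column, then re-sort the rows.
        table = sorted(c + row for c, row in zip(transformed, table))
    for row in table:
        if row.endswith(sentinel):
            return row[:-1]
    raise ValueError("sentinel not found in transformed text")


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)
    print(inverse_bwt(encoded))
```

The transform groups characters that share the same following context, which is what makes it useful both for compression and for index structures over genomes.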




□ BAGEL: A computational framework for identifying essential genes from pooled library screens:

>> http://biorxiv.org/content/early/2015/11/27/033068.full-text.pdf+html

BAGEL (Bayesian Analysis of Gene EssentiaLity) is a supervised learning method for analyzing gene knockout screens.




□ Learning structure in gene expression data using deep architectures, with an application to gene clustering:

>> http://biorxiv.org/content/early/2015/11/16/031906

Deep architectures are pre-trained in an unsupervised manner using denoising autoencoders, as a preprocessing step for an unsupervised learning task. The authors empirically demonstrate the advantage of using gene expression samples regenerated from the low-dimensional codes for the task of clustering.




□ AGOUTI: improving genome assembly and annotation using transcriptome data:

>> http://biorxiv.org/content/early/2015/11/26/033019.full-text.pdf+html

AGOUTI is a tool that uses RNA-seq data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. AGOUTI carries out scaffolding by first constructing an edge-weighted adjacency graph made up of contigs and the supporting joining-pairs. It then denoises the graph by removing erroneous joining-pairs based on the presence of intervening genes and read orientation.




□ Manta: Rapid detection of structural variants and indels:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/12/07/bioinformatics.btv710.abstract

Manta combines paired and split-read evidence during SV discovery and scoring to improve accuracy, but does not require split-reads or successful breakpoint assemblies to report a variant in cases where there is strong evidence otherwise.






□ Why are we growing decision trees via entropy instead of the classification error?:

>> http://sebastianraschka.com/faq/docs/decisiontree-error-vs-entropy.html

The entropy of a parent node is never smaller than the weighted average entropy of its child nodes, because the entropy curve is concave (its "bell shape"); this is why we keep splitting nodes, whereas the classification error can show no improvement for the same split.
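
The point can be checked numerically. The sketch below compares the impurity reduction of one split under both criteria; the class counts are a hypothetical example chosen for illustration, and the helper names are not taken from the linked page.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a node with the given class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def classification_error(counts):
    """Misclassification rate when predicting the majority class."""
    return 1.0 - max(counts) / sum(counts)


def impurity_gain(impurity, parent, children):
    """Parent impurity minus the size-weighted average child impurity."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


if __name__ == "__main__":
    # Hypothetical node with 40 samples of class A and 80 of class B.
    parent = (40, 80)
    children = [(28, 42), (12, 38)]
    print("error gain:  ", impurity_gain(classification_error, parent, children))
    print("entropy gain:", impurity_gain(entropy, parent, children))
```

For this split the classification error does not improve at all, because the majority class is the same in both children, while the entropy still registers a small positive gain and so the tree would continue to grow.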




□ Data-driven medicine: Sophia Genetics becomes largest clinical genomics network:

>> http://www.telegraph.co.uk/technology/news/12023634/Data-driven-medicine-Sophia-Genetics-becomes-largest-clinical-genomics-

Sophia Genetics applies machine learning algorithms to analyse the samples for disease-related genes within two hours.




□ STARR-seq: bioinformatics & molecular biology methods - systematic understanding of transcriptional regulatory elements

>> http://starr-seq.starklab.org/data/

Activators and repressors can have diverse regulatory functions that typically depend on the enhancer context. The systematic functional characterization of TFs & cofactors should further our understanding of combinatorial enhancer control & gene regulation.




□ metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data:

>> http://genome.cshlp.org/content/early/2015/12/02/gr.196394.115.abstract

A binary segmentation algorithm combined with a 2-dimensional statistical test allows the detection of DMRs in large methylation experiments in minutes. DMRs can be filtered by q-value, number of CpGs, length in nucleotides and mean methylation difference.




□ Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers & Recurrent Neural World

>> http://arxiv.org/abs/1511.09249

The algorithmic information content, or Kolmogorov complexity, of some computable object is the length of the shortest program that computes it. One can assume a bell-shaped, zero-centered probability distribution P_e on the finite number of possible real-valued prediction errors e_{i,τ} = (pred_i(τ) - sense_i(τ))^2 and encode each e_{i,τ} by -log P_e(e_{i,τ}) bits.
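
The encoding cost can be sketched in a few lines of Python. This is only an illustration under assumptions: the excerpt says the distribution is bell-shaped and zero-centered but does not fix its form, so a discretised Gaussian shape with a hypothetical width `sigma` is used here, and the finite `grid` of representable error values is made up.

```python
import math


def code_lengths(errors, grid, sigma=1.0):
    """Bits needed to encode each error as -log2 P_e(error).

    Assumptions: P_e is a Gaussian-shaped weight on a finite grid of error
    values, normalised to sum to 1; every error must lie on the grid.
    """
    weights = [math.exp(-(g * g) / (2.0 * sigma * sigma)) for g in grid]
    total = sum(weights)
    prob = {g: w / total for g, w in zip(grid, weights)}
    return [-math.log2(prob[e]) for e in errors]


if __name__ == "__main__":
    grid = [0.0, 0.25, 1.0, 4.0]   # hypothetical representable errors
    observed = [0.0, 0.25, 4.0]    # errors made by some predictor
    for e, bits in zip(observed, code_lengths(observed, grid)):
        print(f"error {e}: {bits:.2f} bits")
```

Small errors are cheap to encode and large ones are expensive, so a better predictor yields a shorter total description of the data.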




□ Phenotypic robustness determines genetic regulation of complex traits:

>> http://biorxiv.org/content/early/2015/12/04/033621

The environment-dependent effect of the release of variance, as well as the high consistency in buffering, indicates that a locus that buffers the phenotypic capacitance will have an antagonistic effect on the population mean in an environment-dependent manner. Under the alternative hypothesis, the phenotypes of the two alleles reveal a difference in the variance. As a result, the corresponding LOD scores indicate markers responsible for genetic canalization, defined as variance-QTL (vQTL).