lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre. (Long is the time, yet what is true comes to pass.)

MaGuS.

2016-03-31 09:08:20 | Science News

(iPhone 6s: camera)

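□ If there were no boundary concept such as "biological species", then the difference between us and other beings, or reeds, would be nothing more than a difference in the patterns of sand grains scattered evenly through space. There would be no rhetoric of fences or aquariums; it would be, in the literal sense, the principle that separates the sand grains from us.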




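□ The particular and the shared are not incompatible. True, nothing guarantees that the thoughts and feelings I experience are already known to others. Yet it seems self-evident that memory and speech do more than merely rewrite the expressive references we share with the person in front of us. We seek to assimilate through difference.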




□ Private algebras in quantum information and infinite-dimensional complementarity:

>> http://scitation.aip.org/docserver/fulltext/aip/journal/jmp/57/1/1.4935399.pdf

This new framework is particularly amenable to the important class of linear bosonic channels, and preliminary investigations suggest a deeper connection between complementarity and symplectic duality.






□ MaGuS: quality assessment and scaffolding of genome assemblies with Whole Genome Profiling™

>> http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0969-x

Based on the QUAST NA50 and NA75 metrics, the assemblies were ranked from highest to lowest quality as BESST, OPERA-LG, SSPACE, SOAPdenovo2, and SGA. Existing scaffolding tools run into problems in repeat-rich regions. A map overcomes this problem if a contig or scaffold can be anchored onto it. For large genomes, the sequencing depth of a mate-pair (MP) library may leave some regions with low coverage. Users of scaffolding programs often set a minimum number of read pairs required to validate a link between contigs, to avoid assembly errors. MaGuS is not restricted to WGP maps; other genome map types can be integrated after formatting.
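NA50 and NA75 are Nx statistics. As we understand QUAST's definition, they are computed over aligned blocks, meaning contigs are broken at misassemblies and unaligned parts are dropped. A minimal sketch of the underlying Nx computation follows. It assumes the denominator defaults to the summed lengths. Pass `total` explicitly if your convention uses the full assembly length instead.

```python
def nx(lengths, fraction=0.5, total=None):
    """Return the Nx statistic (N50 for fraction=0.5, N75 for 0.75).

    Lengths are sorted in decreasing order; the result is the first length at
    which the cumulative sum reaches `fraction` of `total` (default: the sum of
    `lengths`). Feeding aligned-block lengths instead of contig lengths gives
    an NA50/NA75-style value (assumption about the denominator noted above).
    """
    if not lengths:
        return 0
    if total is None:
        total = sum(lengths)
    target = fraction * total
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # the blocks cover less than `fraction` of `total`


blocks = [100, 80, 60, 40, 20]
print(nx(blocks, 0.5), nx(blocks, 0.75))
```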







□ Assortative mating can impede or facilitate fixation of underdominant alleles

>> http://biorxiv.org/content/biorxiv/early/2016/03/03/042192.1.full.pdf

The n-choice model is a simple, mechanistic implementation of positive assortative mating that accounts for the fact that in reality rare genotypes are less likely to find preferred mates. The n-choice model fulfills the realistic condition that individuals can survey only a limited number of prospective mates.
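The key intuition of the n-choice model is that rare genotypes struggle to find preferred mates when only n males can be surveyed. The sketch below illustrates this under stated assumptions: males are encountered independently, and each is of the preferred type with probability equal to its population frequency. What a female does when none of the n males is preferred is a model detail not reproduced here.

```python
def prob_preferred_mate(freq_preferred: float, n: int) -> float:
    """Probability that at least one of n surveyed males is of the preferred type.

    Assumes independent encounters in a large, well-mixed population where a
    male is preferred with probability `freq_preferred`.
    """
    if not 0.0 <= freq_preferred <= 1.0 or n < 1:
        raise ValueError("need 0 <= freq_preferred <= 1 and n >= 1")
    return 1.0 - (1.0 - freq_preferred) ** n


for freq in (0.05, 0.5):
    print(freq, prob_preferred_mate(freq, n=3))
```

With n = 3, a genotype at 5% frequency finds a preferred mate with probability 1 - 0.95^3 ≈ 0.14, compared with 0.875 for a genotype at 50%.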




□ Demographic inference under the coalescent in a spatial continuum

>> http://www.biorxiv.org/content/biorxiv/early/2016/03/02/042135.full.pdf

The spatial Λ-Fleming-Viot (ΛV) model is amenable to parameter inference under biologically realistic conditions. A REX (reproduction-extinction) event corresponds either to a single reproduction event accompanied by extinction of the parent, with the offspring dispersing over long distances, or to a sum of multiple reproduction and extinction events, each reproduction accompanied by dispersal of the offspring over short distances. The average time between two successive REX events in a given lineage is proportional to the generation time of the species under scrutiny.




□ HPG pore: an efficient and scalable framework for nanopore sequencing data

>> http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0966-0

HPG Pore is the first scalable bioinformatic tool for exploring and analyzing nanopore sequencing data using the Hadoop framework. It offers virtually unlimited scalability in sequencing data and efficient management of huge amounts of data. When run in Hadoop mode, HPG Pore is faster than Poretools and poRe, despite an initial delay for preparing the Hadoop nodes; as expected, it gets faster still as more nodes become available. It also outperforms the other two programs when running in local mode. The latency of the Hadoop framework causes the paradox that the stand-alone version is slightly slower than its counterpart running on one node.




□ The Ensembl Variant Effect Predictor: analysis, annotation & prioritization of genomic variants in non-coding region

>> http://biorxiv.org/content/biorxiv/early/2016/03/04/042374.full.pdf

The VEP includes PubMed identifiers for variants that have been cited, and also annotates variants associated with a phenotype, disease or trait, using data from OMIM, Orphanet, the GWAS Catalog and other data sources.




□ Comparative evaluation of isoform-level gene expression estimation algorithms for RNA-seq and exon-array platforms

>> http://bib.oxfordjournals.org/content/early/2016/02/26/bib.bbw016.long

Multi-Mapping Bayesian Gene eXpression (MMBGX) programs achieved the best performance for RNA-seq and exon-array platforms. While eXpress achieved the highest correlation with the RT-qPCR and exon-array (MMBGX) results overall, RSEM was more highly correlated with MMBGX for changes in highly variable transcripts.




□ XGBoost: A Scalable Tree Boosting System: a sparsity-aware algorithm and weighted quantile sketch for tree learning.

>> http://arxiv.org/abs/1603.02754v1

XGBoost handles all sparsity patterns in a unified way. The method exploits sparsity to make computational complexity linear in the number of non-missing entries in the input. The cache-aware implementation of the exact greedy algorithm runs twice as fast as the naive version when the dataset is large. Scaling experiments with different numbers of machines on the full Criteo dataset show XGBoost handling all 1.7 billion rows with only four machines. Using more machines provides more file cache and makes the system run faster, so the trend is slightly super-linear.
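Sparsity awareness in XGBoost rests on giving each split a learned default direction for missing values. The sketch below simplifies the paper's split-finding idea for a single feature. It scans only the non-missing entries twice, once with missing values sent right and once with them sent left. The gain drops the paper's constant factor 1/2 and the penalty γ, which does not change which split ranks best. The function name and the None-for-missing convention are our own choices, not the library's API.

```python
def best_split(x, g, h, lam=1.0):
    """Sparsity-aware exact split search for one feature (sketch).

    x: feature values, with None marking a missing entry.
    g, h: first- and second-order gradient statistics per instance.
    Returns (gain, threshold, default_direction): instances with x < threshold
    go left, and missing values follow default_direction.
    """
    G, H = sum(g), sum(h)

    def score(gs, hs):
        return gs * gs / (hs + lam)

    base = score(G, H)
    # Only the non-missing entries are sorted and scanned.
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, g, h) if xi is not None)
    best = (0.0, None, None)

    # Pass 1: missing values default to the right branch.
    gl = hl = 0.0
    for i in range(len(present) - 1):
        xi, gi, hi = present[i]
        gl += gi
        hl += hi
        nxt = present[i + 1][0]
        if nxt == xi:
            continue  # cannot split between equal values
        gain = score(gl, hl) + score(G - gl, H - hl) - base
        if gain > best[0]:
            best = (gain, (xi + nxt) / 2, "right")

    # Pass 2: missing values default to the left branch.
    gr = hr = 0.0
    for i in range(len(present) - 1, 0, -1):
        xi, gi, hi = present[i]
        gr += gi
        hr += hi
        prv = present[i - 1][0]
        if prv == xi:
            continue
        gain = score(G - gr, H - hr) + score(gr, hr) - base
        if gain > best[0]:
            best = (gain, (prv + xi) / 2, "left")
    return best


print(best_split([1, 2, None, 10, 11], [-1, -1, -1, 1, 1], [1, 1, 1, 1, 1]))
```

Missing entries are never sorted or scanned. Once values are sorted, each pass therefore costs time linear in the number of non-missing entries; the paper keeps columns presorted in blocks.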




□ Topological phase transition of a fractal spin system: The relevance of the network complexity:

>> http://scitation.aip.org/content/aip/journal/adva/6/5/10.1063/1.4942826

The random links remove infrared divergences by breaking the inversion symmetry of a complex network. The dimensionality restrictions imposed by the Mermin-Wagner theorem, which assumes translational invariance, are completely removed. Particular attention is given to complex networks of small-world type. The phase a system adopts in thermal equilibrium is characterized by spontaneous symmetry breaking and by the ratio of the order parameter to the characteristic length of the system. Assuming time-reversal, inversion and discrete symmetries, and neglecting higher-order terms, the free energy Φ0(x, T) can be written as

$$\Phi_0(x, T) = A(T)\,|\nabla\psi|^2 + B(T)\,|\psi|^2$$




□ A modified PATH algorithm rapidly generates transition states comparable to those found by other algorithms:

>> http://scitation.aip.org/docserver/fulltext/aca/journal/sdy/3/1/1.4941599.pdf

PATH defines the structures of equilibrium states using a linearized ANM potential. This approximation of the complex potential energy landscape works because most protein conformational changes are small displacements.
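The anisotropic network model (ANM) potential that PATH builds on can be illustrated briefly. The sketch below uses the standard ANM pairwise harmonic energy. Each pair of nodes closer than a cutoff in the reference structure acts as a spring of stiffness gamma. A linearized (Hessian) treatment, as used by PATH, expands this energy to second order in coordinates; that step and PATH's transition-state search are not reproduced. The cutoff and gamma values are illustrative assumptions.

```python
import math


def anm_energy(coords, ref_coords, cutoff=15.0, gamma=1.0):
    """Harmonic ANM-style energy of `coords` relative to `ref_coords`.

    Contacts are pairs of nodes (e.g. C-alpha atoms) closer than `cutoff`
    in the reference structure; each contributes gamma/2 * (d_ij - d0_ij)^2.
    """
    n = len(ref_coords)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d0 = math.dist(ref_coords[i], ref_coords[j])
            if d0 > cutoff:
                continue  # not in contact in the reference structure
            d = math.dist(coords[i], coords[j])
            energy += 0.5 * gamma * (d - d0) ** 2
    return energy


ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
moved = [(0.0, 0.0, 0.0), (4.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(anm_energy(ref, ref), anm_energy(moved, ref))
```

Because the energy is zero at the reference structure and grows quadratically with small displacements, it is only a good approximation when conformational changes are small.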




□ NanoSim: nanopore sequence read simulator based on statistical characterization:

>> http://biorxiv.org/content/biorxiv/early/2016/03/18/044545.full.pdf

With the perfect flag, NanoSim generates perfect reads with no errors, drawing on the full-length distribution of aligned reads.




□ DeepNano: Deep Recurrent Neural Networks for Base Calling in MinION Nanopore Reads:

>> http://arxiv.org/pdf/1603.09195v1.pdf

DeepNano is a more accurate and efficient alternative to the HMM-based methods used in the Metrichor base caller provided by the device manufacturer. DeepNano does not require read overlaps; it processes reads individually and provides more precise base calls for downstream analysis. SGD is better at avoiding bad local optima in the initial phases of training, while L-BFGS seems to be faster during the final fine-tuning.


mattloose:
Very excited - rnn base caller running on R9....


Clive_G_Brown:
@mattloose for example, at 150-180mv. 200mv adds a 1-2%. Looking at 250 and 300mv. Note 2D tidied up since talk.




□ deepTarget: End-to-end Learning Framework for microRNA Target Prediction using Deep Recurrent Neural Networks

>> http://arxiv.org/pdf/1603.09123v1.pdf

The performance gap between the proposed method and existing alternatives is substantial (over 25% increase in F-measure), and deepTarget delivers a quantum leap in the long-standing challenge of robust miRNA target prediction.




□ ACE: adaptive cluster expansion for maximum entropy graphical model inference:

>> http://biorxiv.org/content/biorxiv/early/2016/03/18/044677.full.pdf

ACE avoids overfitting by constructing a sparse network of interactions that is sufficient to reproduce the observed correlation data within the statistical error expected from finite sampling. The paper extends the adaptive cluster expansion (ACE) method, originally devised for binary (Ising) variables, to more general (Potts) variables that take multiple categorical values. When convergence of the ACE algorithm is slow, it is combined with a Boltzmann Machine Learning (BML) algorithm. For this refinement step, the RPROP algorithm from neural network learning was adapted to the case of Potts models.
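RPROP is the piece of the ACE refinement step that is easiest to show in isolation. It adapts a separate step size for each parameter using only the sign of the gradient: the step grows while the sign is stable and shrinks after a sign flip. The sketch below implements the generic iRprop- variant with commonly used constants (1.2 and 0.5). It is not the authors' Potts-model adaptation, and the quadratic test function is purely illustrative.

```python
def rprop_minimize(grad, x0, n_iter=200, step0=0.1, eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=1.0):
    """Minimise a function from its gradient using iRprop- sign-based steps."""
    x = list(x0)
    step = [step0] * len(x)
    prev_g = [0.0] * len(x)
    for _ in range(n_iter):
        g = list(grad(x))
        for i in range(len(x)):
            s = prev_g[i] * g[i]
            if s > 0:    # same sign as last time: accelerate
                step[i] = min(step[i] * eta_plus, step_max)
            elif s < 0:  # sign flipped, so we overshot: shrink the step
                step[i] = max(step[i] * eta_minus, step_min)
                g[i] = 0.0  # iRprop-: skip the update after a sign change
            if g[i] > 0:
                x[i] -= step[i]
            elif g[i] < 0:
                x[i] += step[i]
            prev_g[i] = g[i]
    return x


# Gradient of f(v) = (v0 - 3)^2 + (v1 + 1)^2
print(rprop_minimize(lambda v: [2 * (v[0] - 3), 2 * (v[1] + 1)], [0.0, 0.0]))
```

Only gradient signs are used, so the method is insensitive to the badly scaled gradients that can arise when fitting many couplings at once.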




□ EMPIAR: Supporting the bioimaging revolution: Commentary in @naturemethods

>> http://www.ebi.ac.uk/about/news/press-releases/supporting-bioimaging-revolution

EMPIAR is part of a larger project at PDBe called “MOL2CELL”, that aims to integrate 3D structural information on different length scales.




□ Robust normalization protocols for multiplexed fluorescence bioimage analysis:

>> http://biodatamining.biomedcentral.com/articles/10.1186/s13040-016-0088-2

The proposed method produces higher between-class and lower within-class Kullback-Leibler (KL) divergence on a distribution of phenotypes. The combination of no clipping + bilateral filtering + linear scaling produces the best results.







□ Modeling multi-particle complexes in equilibrium/non-equilibrium stochastic chemical system

>> http://biorxiv.org/content/biorxiv/early/2016/03/23/045435.full.pdf

The concept of occlusion, which is essential for any rule-based description, is realized using a Fock space constructed from hard-core bosons. Using this composite Fock space dramatically simplifies the Hamiltonian of the zero-dimensional polymer system. One readily verifies that the gallery G and the Hamiltonian properly define zero-dimensional monomeric particles in the grand canonical ensemble.




□ An integral formula adapted to different boundary conditions for arbitrarily high-dimensional nonlinear KG equations

>> http://scitation.aip.org/docserver/fulltext/aip/journal/jmp/57/2/1.4940050.pdf

The operators have a complete system of orthogonal eigenfunctions in the complex Hilbert space L²(Ω). Because of the isomorphism between L² and l², the operator Δ on L²(Ω) induces a corresponding operator on l². The Laplacian-valued functions defined on D(Δ), which depend on the boundary conditions, are bounded operators with respect to the norm.




□ An algebraic approach to parameter optimization in biomolecular bistable systems:

>> http://biorxiv.org/content/biorxiv/early/2016/03/24/045518.full.pdf

Algebraic equilibrium conditions are also bistability conditions in a class of bistable networks. Theorem 3: Consider a linear positive system $\dot{x} = Ax$ whose characteristic polynomial is an F-polynomial. The index of a regular equilibrium point $\bar{x}$ is the sign of the determinant of $-J(\bar{x})$:

$$\operatorname{ind}(\bar{x}) = \operatorname{sign}\left[\det\left(-J(\bar{x})\right)\right]$$
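Computing the index ind(x̄) = sign[det(-J(x̄))] from a Jacobian takes only a few lines. The sketch below uses a cofactor-expansion determinant, which is adequate for the small Jacobians of toy networks. The example matrices are illustrative and not taken from the paper. In two dimensions a stable node has index +1 and a saddle has index -1.

```python
def det(m):
    """Determinant by cofactor (Laplace) expansion; fine for small matrices."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total


def equilibrium_index(jacobian):
    """ind(x_bar) = sign[det(-J(x_bar))]; 0 signals a non-regular point."""
    neg = [[-v for v in row] for row in jacobian]
    d = det(neg)
    return (d > 0) - (d < 0)


print(equilibrium_index([[-1, 0], [0, -1]]))  # stable node
print(equilibrium_index([[1, 0], [0, -1]]))   # saddle
```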




□ Div-Seq: A single nucleus RNA-Seq method reveals dynamics of rare adult newborn neurons in the CNS:

>> http://biorxiv.org/content/biorxiv/early/2016/03/27/045989.full.pdf

Div-Seq combines Nuc-Seq, a scalable single-nucleus RNA-Seq method, with EdU-mediated labeling of proliferating cells. It provides a unique opportunity to profile the transcriptional program underlying neuronal maturation. Div-Seq's ability to clearly identify and characterize rare cells in the spinal cord demonstrates its significantly improved sensitivity.




□ RQBoost: Developing Quantum Annealer Driven Data Discovery to Quantum Machine Learning:

>> http://arxiv.org/pdf/1603.07980v1.pdf

The authors explore the performance of quantum annealer (QA)-enabled ML using QBoost and RQBoost in natural language processing (NLP) and seizure prediction. RQBoost is sensitive to the initial partition of the feature space when the number of features is larger than the number of available QUBO variables. RQBoost could also be used as an ensembler whose base learners are traditional machine learning algorithms.
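QBoost selects a subset of weak classifiers by minimizing a QUBO (quadratic unconstrained binary optimization) objective, which a quantum annealer is then used to minimize. The sketch below assumes the commonly cited form sum_s((1/N) sum_i w_i h_i(x_s) - y_s)^2 + lam * sum_i w_i. It replaces the annealer with exhaustive search over all 2^N weight vectors, which is feasible only for a handful of weak classifiers. RQBoost's own modifications are not reproduced.

```python
from itertools import product


def qboost_brute_force(h, y, lam):
    """Select weak classifiers by minimising the QBoost objective exhaustively.

    h[i][s] in {-1, +1}: prediction of weak classifier i on sample s.
    y[s] in {-1, +1}: labels.
    Objective: sum_s((1/N) * sum_i w_i * h_i(x_s) - y_s)^2 + lam * sum_i w_i,
    with binary w. A quantum annealer would minimise this QUBO; here we
    simply enumerate every w.
    """
    n = len(h)
    best_w, best_obj = None, float("inf")
    for w in product((0, 1), repeat=n):
        loss = 0.0
        for s, ys in enumerate(y):
            pred = sum(wi * hi[s] for wi, hi in zip(w, h)) / n
            loss += (pred - ys) ** 2
        obj = loss + lam * sum(w)
        if obj < best_obj:
            best_w, best_obj = w, obj
    return best_w, best_obj


weak = [[1, 1, -1, -1],   # agrees with the labels
        [1, -1, 1, -1],   # uninformative
        [-1, -1, 1, 1]]   # inverted
print(qboost_brute_force(weak, [1, 1, -1, -1], lam=0.1))
```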




□ Markov substitute processes : a new model for linguistics and beyond:

>> http://arxiv.org/abs/1603.07850v1

The authors define one-dimensional Markov substitute processes and show that they extend one-dimensional Markov random fields. They then generalize the notion of a multi-dimensional Markov random field by proposing a definition of multi-dimensional Markov substitute processes. The state space of the model is $D^{+} = \bigcup_{j=1}^{\infty} D^{j}$, the set of all sentences of finite non-zero length. Adding the empty string ε of zero length gives $D^{*} = \{\varepsilon\} \cup D^{+}$.




□ Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance:

>> http://arxiv.org/pdf/1603.07879v1.pdf

K-Means is formally equivalent to EM in a limiting case: fitting data with a mixture of k Gaussians with identical, isotropic covariance matrices (Σ = σ²I), where the soft assignments of data points to mixture components are hardened so that each point is allocated solely to the most likely component. A random field is isotropic if its covariance function depends on distance alone.

M-Step: Compute the means μj and covariance matrices Σj for j = 1, ..., k, based on the results of the K-Means step, together with the mixture weights

Wj = nj/N for j = 1, ..., k.

E-Step: For each data vector Xi (i = 1, 2, ..., N), compute the cluster probabilities P(Cj|Xi) for j = 1, ..., k, and assign Xi to the cluster with

max{P(Cj | Xi); j = 1, ..., k}.

K-Means Step: Assign each data vector Xi to the cluster with the closest mean μj, using Euclidean distance.
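A one-dimensional sketch of the hybrid loop described by the steps above: a K-Means step, then an M-step from the K-Means partition, then an E-step with hard assignment by maximum posterior. The order of steps and the choice to recompute the means from the E-step partition before the next K-Means step are our reading of the outline, not the authors' code. The variance floor and the handling of empty clusters are also our own assumptions.

```python
import math


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def hybrid_em_kmeans(xs, means, max_iter=20, var_floor=1e-6):
    """1-D sketch: K-Means step -> M-step -> E-step with hard (max) assignment."""
    k = len(means)
    means = list(means)
    labels = None
    for _ in range(max_iter):
        # K-Means step: nearest mean (Euclidean distance in 1-D).
        km_labels = [min(range(k), key=lambda j: abs(x - means[j])) for x in xs]
        # M-step: mu_j, sigma_j^2 and W_j = n_j / N from the K-Means partition.
        params = []
        for j in range(k):
            pts = [x for x, lab in zip(xs, km_labels) if lab == j]
            if not pts:
                params.append(None)  # empty cluster gets zero probability
                continue
            mu = sum(pts) / len(pts)
            var = max(sum((p - mu) ** 2 for p in pts) / len(pts), var_floor)
            params.append((mu, var, len(pts) / len(xs)))
        # E-step: P(C_j | x_i), then assign x_i to the most probable cluster.
        new_labels = []
        for x in xs:
            joint = [0.0 if p is None else p[2] * gaussian_pdf(x, p[0], p[1])
                     for p in params]
            total = sum(joint)
            post = [v / total for v in joint] if total > 0 else joint
            new_labels.append(max(range(k), key=lambda j: post[j]))
        # Means for the next K-Means step come from the E-step partition.
        for j in range(k):
            pts = [x for x, lab in zip(xs, new_labels) if lab == j]
            if pts:
                means[j] = sum(pts) / len(pts)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, means


print(hybrid_em_kmeans([0.0, 0.5, 1.0, 9.0, 9.5, 10.0], [2.0, 8.0]))
```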




□ PhredEM: A Phred-Score-Informed Genotype-Calling Approach for Next-Generation Sequencing Studies:

>> http://biorxiv.org/content/biorxiv/early/2016/03/29/046136.full.pdf

PhredEM uses the Expectation-Maximization algorithm to obtain consistent estimates of genotype frequencies and logistic regression parameters. It also includes a simple, computationally efficient screening algorithm to identify monomorphic loci. Nominally, the phred score is defined as

$$Q = -10 \log_{10} \Pr(\text{observed allele} \neq \text{true allele})$$
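The phred definition above converts directly in both directions, which makes it easy to translate between quality scores and error probabilities. The function names below are illustrative.

```python
import math


def phred_from_error(p_error):
    """Q = -10 * log10(p) for an error probability 0 < p <= 1."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q):
    """Inverse mapping: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


print(phred_from_error(0.001), error_from_phred(20))
```

For example, Q20 corresponds to a 1% chance that the observed allele is wrong, and Q30 to 0.1%.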




□ Nanocall: An Open Source Basecaller for Oxford Nanopore Sequencing Data:

>> http://biorxiv.org/content/early/2016/03/28/046086

Nanocall uses some simple heuristics to split the sequence of events (current levels) into strands. It models the events with a hidden Markov model whose states are the k-mers being sequenced. It optionally scales the pore model emissions using several rounds of Expectation Maximization, based on posteriors computed with Forward-Backward. Finally, it produces basecalls by running Viterbi.
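The final Viterbi step that Nanocall runs can be shown on a toy model. The sketch below is a generic log-space Viterbi decoder with Gaussian emissions over current levels. It uses two hypothetical states with a shared standard deviation. Real k-mer state spaces, per-state pore-model parameters, and Nanocall's EM scaling are not modeled, and all transition probabilities must be positive here.

```python
import math


def viterbi_gaussian(events, means, sd, trans, init):
    """Most likely state path for a sequence of current-level events.

    events: observed mean current levels.
    means, sd: Gaussian emission parameters per state (a k-mer in Nanocall).
    trans[a][b]: transition probability a -> b; init[a]: initial probability.
    """
    n_states = len(means)

    def log_emit(state, x):
        z = (x - means[state]) / sd
        return -0.5 * z * z - math.log(sd * math.sqrt(2 * math.pi))

    score = [math.log(init[s]) + log_emit(s, events[0]) for s in range(n_states)]
    back = []
    for x in events[1:]:
        ptr, new_score = [], []
        for b in range(n_states):
            a_best = max(range(n_states), key=lambda a: score[a] + math.log(trans[a][b]))
            ptr.append(a_best)
            new_score.append(score[a_best] + math.log(trans[a_best][b]) + log_emit(b, x))
        back.append(ptr)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(viterbi_gaussian([49, 51, 79, 81, 50], means=[50, 80], sd=5,
                       trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5]))
```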




□ Classification-based Financial Markets Prediction using Deep Neural Networks:

>> http://arxiv.org/pdf/1603.08604v1.pdf

Gaussian random numbers are generated by transforming uniform random numbers with an inverse Gaussian cumulative distribution function with zero mean and a standard deviation of 0.01. The subset of the training set used for each epoch is defined as

$$D_e := \{x_{n_k} \in D_{\text{train}} \mid n_k \sim \mathcal{U}(1, N_{\text{train}}),\ k = 1, \dots, N_{\text{epoch}}\}$$

If cross entropy(e) ≤ cross entropy(e−1), then γ ← γ/2.
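The two sampling steps above can be reproduced with the Python standard library. `statistics.NormalDist.inv_cdf` performs the inverse-CDF transform. The epoch subset draws indices uniformly from 1..N_train, and sampling with replacement is our assumption. The function names and seeds are illustrative.

```python
import random
from statistics import NormalDist


def gaussian_init(n, sd=0.01, seed=0):
    """Gaussian numbers via the inverse-CDF transform of uniforms (mean 0, sd 0.01)."""
    rng = random.Random(seed)
    dist = NormalDist(mu=0.0, sigma=sd)
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # inv_cdf needs 0 < u < 1; random() never returns 1.0
            u = rng.random()
        out.append(dist.inv_cdf(u))
    return out


def epoch_subset(train, n_epoch, seed=0):
    """D_e: n_epoch samples whose indices n_k ~ U(1, N_train), drawn with replacement."""
    rng = random.Random(seed)
    return [train[rng.randint(1, len(train)) - 1] for _ in range(n_epoch)]


weights = gaussian_init(1000)
batch = epoch_subset(list(range(100)), n_epoch=10)
print(sum(weights) / len(weights), batch)
```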




□ Feather: fast, interoperable binary data frame storage:

>> https://github.com/wesm/feather

Language agnostic: Feather files are the same whether written by Python or R code. Other languages can read and write Feather files, too.

import numpy as np
import pandas as pd
import feather

arr = np.random.randn(10000000)
arr[::10] = np.nan  # every 10th value is NaN: 10% nulls
df = pd.DataFrame({'column_{0}'.format(i): arr for i in range(10)})
feather.write_dataframe(df, 'test.feather')

In [9]: %time df = feather.read_dataframe('test.feather')
CPU times: user 316 ms, sys: 944 ms, total: 1.26 s
Wall time: 1.26 s






□ DiagrammeR: Generate graph diagrams using text in a Markdown-like syntax.

>> http://rich-iannone.github.io/DiagrammeR/

DiagrammeR lets you directly edit mermaid (.mmd) or Graphviz (.gv) files inside RStudio with syntax coloring, and preview them easily. Output is rendered as Scalable Vector Graphics (SVG), so diagram elements such as nodes and edges don't lose visual clarity.




□ pomegranate: a package for graphical models and Bayesian statistics for Python, implemented in Python

>> http://pomegranate.readthedocs.org/en/latest/index.html

pomegranate is flexible enough to allow nesting of components to form models such as general mixture model hidden Markov models (GMM-HMMs), or Naive Bayes classifiers comparing a hidden Markov model to a Markov chain. Its hidden Markov model module, derived from the Yet Another Hidden Markov Model (YAHMM) library, implements a graph-based interface with the forward-backward, Baum-Welch and Viterbi algorithms.




□ BayesDB: Data science is a communication problem: Strata + Hadoop World conf in San Jose, March 28-31, 2016.

>> https://www.oreilly.com/ideas/bayesdb-data-science-is-a-communication-problem

No graphical environment can overcome the fundamental information asymmetry and complexity inherent in modeling data. The Bayesian Query Language (BQL) provides three key verbs that encompass the full range of inference questions: SIMULATE, INFER, and ESTIMATE. BQL enables users to generate answers to a broad class of "what-if?" scenarios, contingencies and hypotheticals. The Meta-modeling Language (MML) enables machine-assisted modeling of populations based on samples and domain insight.


AGBT 16.

2016-03-03 01:03:33 | Science News


□ NGS Battle: Illumina is now suing Oxford Nanopore (and it looks suspicious):

>> http://labiotech.eu/ngs-battle-illumina-sues-oxford-nanopores-looks-suspicious/






□ Advances in Genome Biology and Technology (AGBT) 10-13th February, at JW Marriott Orlando, Grande Lakes Orlando, Florida.

>> http://www.agbt.org #AGBT16

Joel Malek, Weill Cornell Medical College in Qatar “AVA-Seq: a method for all-versus-all protein interaction mapping using next generation sequencing”
James Hadfield, University of Cambridge “Progress in developing a nanopore rapid cancer MDX test”
GiWon Shin, Stanford University “STR-Seq: a massively parallel microsatellite sequencing and genotyping technology”
Mohan Bolisetty, The Jackson Laboratory “Determining exon connectivity in complex mRNAs using the MinION sequencer”
Jason Bielas, Fred Hutchinson Cancer Research Center “Deep profiling of complex cell populations using scalable single cell gene expression analysis”
Paolo Piazza, University of Oxford “Linking epigenetics and gene expression at single cell levels using SMART-ATAC-seq”
Maria Nattestad, Cold Spring Harbor Laboratory “SplitThreader: A graphical algorithm for the historical reconstruction of highly rearranged and amplified cancer genomes“
Han Fang, Cold Spring Harbor Laboratory “Scikit-ribo: Accurate A-site prediction and robust modeling of translation control from Riboseq and RNAseq data”
Nick Loman, University of Birmingham “Real-time genome sequencing in the field”
Jonas Korlach, Pacific Biosciences “Addressing complex diseases and hidden heritability with the sequel system”

etc.




□ The exceptional genomic word symmetry along DNA sequences:

>> http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0905-0

A sliding-window analysis in terms of exceptional symmetry (VR), with results for windows of $10^{l}$, $2\times10^{l}$ and $5\times10^{l}$ base pairs, $l \in \{3,4,5,6,7,8\}$. To evaluate the effect of chromosome type, window size and word length on local exceptional symmetry, the authors consider the median window VR value of each ACGT sequence (chromosomes or corresponding random chromosomes). Local exceptional symmetry in the human genome is clearly higher than in random scenarios produced without exceptional symmetry, but globally the effect is similar to that of random sequences generated with first-order Markov models.






□ Shannon: An Information-Optimal de Novo RNA-Seq Assembler:

>> http://biorxiv.org/content/early/2016/02/09/039230

This algorithm provably solves the information-theoretically reconstructable instances in linear time, even though the general sparsest-flow problem is NP-hard. The heart of Shannon is a novel iterative flow-decomposition algorithm.

Information-theoretic condition:

$$L > \max\{\ell_{\langle Z \rangle},\ \ell_{\text{Intra-Transcript Interleaved Repeat}},\ \ell_{\text{Intra-Transcript Triple Repeat}}\}(T)$$

The proposed iterative algorithm reconstructs the transcriptome uniquely if

$$L > \max\{\ell_{\langle Z \rangle},\ \ell_{\text{Intra-Transcript Repeat}},\ \ell_{\text{Circle}}\}(T)$$



□ 10x Genomics Announces Collaboration with QIAGEN, N.V. for Co-Development of Sequencing and Single Cell Analysis:

>> http://www.businesswire.com/news/home/20160209006492/en/10x-Genomics-Announces-Collaboration-QIAGEN-N.V.-Co-Marketing

Optimizing QIAGEN’s sample technologies for use with 10x Genomics GemCode and Chromium systems. Developing solutions to process and analyze 10X Genomics’ “Linked-Reads” with QIAGEN’s suite of leading bioinformatics solutions. 10x’s Chromium™ for single cell analysis allows rapid analysis of dynamic transcription events across large numbers of individual cells.

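There have long been skeptical voices about 10X Genomics' Linked-Reads technology, but perhaps its usefulness in QIAGEN's workflows has now been recognized. Since the co-development is only starting "now", it is worth keeping an eye on.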




□ At AGBT, Illumina Provides Additional Details on Project Firefly:

>> https://www.genomeweb.com/sequencing-technology/agbt-illumina-provides-additional-details-project-firefly

The platform itself will consist of two modules totaling one cubic foot of volume. One module will be for library prep and will be able to prepare up to eight libraries in parallel in 3.5 hours, unattended.




□ 10xgenomics is on the roll: Chromium for 3′ single-cell RNA-seq on 48K cells, haplotyping & assembly #AGBT16

>> http://nextgenseek.com/2016/02/10x-genomics-is-on-the-roll-chromium-system-for-3-single-cell-rna-seq-on-48000-cells-haplotyping-and-assembly/


iGenomics:
Theoretically, ~100kb reads could make ~14Mb phase block. Linked reads can reach that level. #AGBT16


SeqComplete:
#agbt16 SL: 860-884M reads per lane, mean depth 32-34X on HiSeq X10/10X Chromium (from 1 ng input). Phasing all good. #AGBT16 #genomics


infoecho:
CH, initial FALCON N50 ~ 12Mb, after correcting the misassembles N50 ~ 9Mbp #AGBT16 (Great to correct errors, Thanks CH)




□ Hyb & Seq Sequencing from Nanostring: New optical 3D biology & sequencing-simultaneous DNA, RNA, & protein analysis.



Initial experiments on BRAF V600E sequencing had raw single-pass error rates of about 2%. Less than 5x coverage of a single molecule would be required to reach a consensus sequence accuracy of 99.99% (Q40).
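The Q40 claim can be checked with a back-of-the-envelope binomial model. The sketch below assumes independent reads with a 2% error rate. It takes the pessimistic case in which all erroneous reads agree on the same wrong base, so a simple majority vote fails once errors reach half the reads (ties count as failures). The actual Hyb & Seq consensus procedure is not described here.

```python
import math
from math import comb


def majority_error(coverage, p_error):
    """Probability that a majority vote over `coverage` reads is wrong.

    Pessimistic model: every erroneous read reports the same wrong base.
    """
    k_min = (coverage + 1) // 2  # errors needed to win or tie the vote
    return sum(comb(coverage, k) * p_error ** k * (1 - p_error) ** (coverage - k)
               for k in range(k_min, coverage + 1))


for cov in (1, 3, 5):
    err = majority_error(cov, 0.02)
    print(cov, err, -10 * math.log10(err))
```

Under these assumptions, 3x coverage gives roughly Q29 and 5x already exceeds Q40. A less pessimistic error model, with errors spread over the three alternative bases, would need even fewer reads.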




□ IGNITE-SPARK Toolbox: Supporting Practice via Apps/ Resources/Knowledge http://bit.ly/1Wgm1WB #AGBT16







□ IMP: a pipeline for reproducible metagenomic and metatranscriptomic analyses:

>> http://biorxiv.org/content/early/2016/02/10/039263

The IMP iterative co-assemblies were generated from fewer reads than MetAMOS-IDBA_UD because of IMP's more stringent preprocessing. This in turn yielded better-quality assemblies, which are a prerequisite for population-level genome reconstruction and multi-omic data interpretation.






□ phASER: Long range phasing and haplotypic expression from RNA sequencing:

>> http://biorxiv.org/content/early/2016/02/12/039529

phASER is a fast and accurate approach for phasing variants that are overlapped by sequencing reads, including reads from RNA sequencing (RNA-seq), which often span multiple exons due to splicing. phASER performs haplotype phasing using read alignments in BAM format from both DNA- and RNA-based assays. Its output can then be used to produce gene-level haplotype counts for allelic expression studies. This is done by summing, for each gene, the reads from both single variants and phASER haplotype blocks according to their phase.






□ SMITE: Significance-based Modules Integrating the Transcriptome and Epigenome:

>> http://bioconductor.org/packages/devel/bioc/html/SMITE.html

SMITE uses a Monte Carlo method, randomly sampling the combined scores, to determine an FDR-like p-value that is then used in the p-value/score analysis. Whatever combination method is used, p-values are combined over gene promoters, gene bodies and any other datasets provided. A plotting function creates a hexbin plot of any two p-value vectors stored within a p-value object; after p-values have been combined, it can be used to define relationships between direction and significance in different genomic contexts.




□ The infinitesimal model:

>> http://biorxiv.org/content/biorxiv/early/2016/02/15/039768.full.pdf

A mathematical justification of the model is given as the limit, as the number M of underlying loci tends to infinity, of a model with Mendelian inheritance, mutation and environmental noise, when the genetic component of the trait is purely additive. The infinitesimal model holds up to an error of at most order $M^{-1/2}$; the error could be as small as $O(1/M)$.




□ GenomeScope: Fast genome analysis from unassembled short reads:

>> http://schatzlab.cshl.edu/publications/posters/2016/2016.AGBT.GenomeScope.pdf

GenomeScope can quickly infer the heterozygosity rate and other genome characteristics from the k-mer distribution using a mixture model composed of 4 negative binomial (NB) terms scaled by the genome size G.

The model parameters are determined using non-linear least squares regression (NLS) implemented in R.

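$$f(x) = G\left[\alpha\,\mathrm{NB}(x, \lambda, \lambda/\rho) + \beta\,\mathrm{NB}(x, 2\lambda, 2\lambda/\rho) + \gamma\,\mathrm{NB}(x, 3\lambda, 3\lambda/\rho) + \delta\,\mathrm{NB}(x, 4\lambda, 4\lambda/\rho)\right]$$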
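To make the mixture formula concrete, the sketch below evaluates it at a single k-mer coverage x. It assumes NB(x, mean, size) is the negative binomial parameterized by mean and dispersion "size", as in R's `dnbinom(x, size=, mu=)`. The i-th component then has mean iλ and size iλ/ρ. Only the model is evaluated here; fitting α, β, γ, δ, λ and ρ by non-linear least squares is not shown.

```python
import math


def nb_pmf(x, mu, size):
    """Negative binomial pmf with mean `mu` and dispersion `size` (R dnbinom style)."""
    log_p = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
             + size * math.log(size / (size + mu)) + x * math.log(mu / (size + mu)))
    return math.exp(log_p)


def genomescope_like(x, G, weights, lam, rho):
    """f(x) = G * sum_i w_i * NB(x, i*lam, i*lam/rho) for i = 1..4."""
    return G * sum(w * nb_pmf(x, i * lam, i * lam / rho)
                   for i, w in enumerate(weights, start=1))


# Hypothetical parameters: 25x k-mer coverage, mostly homozygous k-mers (2*lam peak).
print([round(genomescope_like(x, 1.0, [0.1, 0.9, 0.0, 0.0], 25.0, 0.5), 5)
       for x in (25, 50)])
```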




□ MetaPalette: K-mer painting approach for metagenomic taxonomic profiling & quantification of novel strain variation:

>> http://biorxiv.org/content/early/2016/02/17/039909

MetaPalette indicates how related the organisms in a given sample are to the closest matching organisms in the training database, whether they are within the same species or distantly related organisms from the same phylum. A sparsity-promoting optimization procedure infers the most parsimonious $x$ consistent with the equations $A^{(k)}x = y^{(k)}$ for $k = 30, 50$.




□ UROBORUS: a tool for detecting circRNAs with low expression levels in total RNA-seq without RNase R treatment

>> https://nar.oxfordjournals.org/content/early/2016/02/11/nar.gkw075.full

The sample comprised ∼1.27 million reads. Assuming that all reported circRNAs are false positives, the false positive rate would be ∼0.79 per million reads, from which the authors estimated that UROBORUS will have an FDR < 0.013 (62.4 million reads, 3875 circRNAs).
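The FDR estimate above is a short piece of arithmetic: the false-positive rate times the sequencing depth gives the expected number of false positives, which is then divided by the number of reported circRNAs. The ~0.79 figure equals 1/1.27. That would correspond to a single reported circRNA in the 1.27-million-read sample, which is our reading rather than a stated fact.

```python
def estimated_fdr(fp_per_million_reads, library_million_reads, n_reported):
    """Expected false positives (rate x depth) divided by the number of reported calls."""
    expected_fp = fp_per_million_reads * library_million_reads
    return expected_fp / n_reported


rate = 1 / 1.27  # ~0.79 false positives per million reads (assumed: one call in 1.27M reads)
fdr = estimated_fdr(rate, 62.4, 3875)
print(f"false positives per million reads: {rate:.2f}, FDR: {fdr:.4f}")
```

This gives about 49 expected false positives among 3875 calls, an FDR of about 0.0127, consistent with the quoted bound of < 0.013.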



□ Kronos: a workflow assembler for genome analytics and informatics:

>> http://biorxiv.org/content/biorxiv/early/2016/02/19/040352.full.pdf

Kronos minimizes the cumbersome process of writing code for a workflow by transforming a YAML configuration file into a Python script. Kronos’ agnosticism towards the compute grid scheduler allows it to seamlessly be used in combination with “cloud cluster” management tools.




□ A Central Limit Theorem for Punctuated Equilibrium: “the phenotype can jump”

>> http://biorxiv.org/content/biorxiv/early/2016/02/18/039867.full.pdf

Combining jumps with an Ornstein–Uhlenbeck process is attractive from a biological point of view. The relation between the adaptation rate α and the branching rate λ = 1 determines which regime the process is in. If 0 < α < 1/2, the process has “long memory” [“local correlations dominate over the OU’s ergodic properties”].



□ Genotype Specification Language

>> http://dlvr.it/KXh9Hv

Genotype Specification Language (GSL) allows facile incorporation of parts from a library of cloned DNA constructs and from the “natural” library. GSL was designed to engage genetic engineers in their native language while providing a framework for higher-level abstract tooling. It defines four language levels, from Level 0 (literal DNA sequence) through Level 3, with increasing abstraction of part selection and construction paths.






□ Inferring causal molecular networks: empirical assessment through a community-based effort

>> http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3773.html

The HPN-DREAM network inference challenge focused on data-driven learning of causal networks and prediction of molecular time-course data. Given the complexity of causal learning and the wide range of application-specific factors, the authors recommend that, at present, network inference efforts should whenever possible include some interventional data, and that suitable scores be used for empirical assessment in the setting of interest.






□ TOPSIS: Comparative assessment of methods for the fusion transcripts detection from RNA-Seq

>> http://www.nature.com/articles/srep21597

TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) was applied to the results on the mixed dataset to rank the fusion detection tools. TOPSIS scores were calculated with two types of weights for all four criteria, i.e. sensitivity, time consumption, RAM, and PPV.
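TOPSIS ranks alternatives by their relative closeness to an ideal solution. The sketch below implements the textbook procedure: vector-normalize each criterion, apply weights, find the ideal and anti-ideal values, and score each tool by d- / (d+ + d-). Sensitivity and PPV are treated as benefit criteria, and time and RAM as cost criteria. The tool values and the equal weights are hypothetical and do not reproduce the paper's two weighting schemes.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness coefficients for alternatives (rows) over criteria (columns).

    benefit[j] is True when larger values are better (e.g. sensitivity, PPV)
    and False for cost criteria (e.g. time consumption, RAM).
    Assumes no criterion column is all zeros.
    """
    n_crit = len(weights)
    # 1. Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal and anti-ideal values per criterion.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Relative closeness to the ideal solution.
    scores = []
    for row in v:
        d_plus, d_minus = math.dist(row, ideal), math.dist(row, anti)
        scores.append(d_minus / (d_plus + d_minus) if d_plus + d_minus else 0.0)
    return scores


# Columns: sensitivity, time (min), RAM (GB), PPV; hypothetical tools A, B, C.
tools = [[0.9, 10, 4, 0.9], [0.5, 100, 16, 0.5], [0.7, 50, 8, 0.8]]
scores = topsis(tools, [0.25] * 4, [True, False, False, True])
ranking = sorted(range(len(tools)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```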




□ Which Genetics Variants in DNase-Seq Footprints Are More Likely to Alter Binding?

>> http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005875

SVMs are better sequence models than PWMs, but are not as specific without footprint information.




□ Ovation® SoLo RNA-Seq System:

>> http://www.nugen.com/content/ovation-solo-rna-seq-system

The Ovation SoLo RNA-Seq System uses NuGEN’s Insert-Dependent Adaptor Cleavage (InDA-C) technology. It is an end-to-end solution for strand-specific RNA-Seq library construction from 1–1000 cells or 10 pg to 100 ng of total RNA.






□ Modeling cumulative biological phenomena with Suppes-Bayes causal networks:

>> http://biorxiv.org/content/biorxiv/early/2016/02/25/041343.full.pdf

SBCN is particularly well suited to modeling the dynamics of systems driven by the monotonic accumulation of events, thanks to encoded priors based on Suppes’ theory of probabilistic causation. Three types of MPNs are defined from the canonical Boolean formula: a conjunctive MPN, a disjunctive (semi-monotonic) MPN and an exclusive disjunction MPN.

Algorithm 1: CAPRI

Input: a dataset D of n Bernoulli variables, e.g., genomic alterations or patterns, and m samples.
Result: a graphical model G = (V, E) representing all the relations of “probabilistic causation”.
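The core of Suppes' probabilistic causation, which CAPRI builds on, can be sketched on binary data. A candidate edge a -> b requires temporal priority, approximated here by marginal frequency (P(a) > P(b)), and probability raising (P(b | a) > P(b | not a)). CAPRI then prunes these candidate (prima facie) relations further. The sketch stops at the candidate step, and the toy dataset is hypothetical.

```python
def prima_facie_edges(data):
    """Pairs (a, b) meeting Suppes' conditions on binary samples.

    data: list of samples, each a list of 0/1 calls (one per variable).
    Temporal priority is approximated by marginal frequency: P(a) > P(b).
    Probability raising: P(b | a) > P(b | not a).
    """
    m, n = len(data), len(data[0])
    freq = [sum(s[j] for s in data) / m for j in range(n)]
    edges = []
    for a in range(n):
        with_a = [s for s in data if s[a]]
        without_a = [s for s in data if not s[a]]
        if not with_a or not without_a:
            continue  # conditional probabilities undefined
        for b in range(n):
            if a == b or freq[a] <= freq[b]:
                continue
            p_b_a = sum(s[b] for s in with_a) / len(with_a)
            p_b_not_a = sum(s[b] for s in without_a) / len(without_a)
            if p_b_a > p_b_not_a:
                edges.append((a, b))
    return edges


samples = [[1, 1], [1, 1], [1, 0], [0, 0], [1, 0], [0, 0]]
print(prima_facie_edges(samples))
```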






□ Alpha-CENTAURI: Assessing novel centromeric repeat sequence variation with long read

>> http://bioinformatics.oxfordjournals.org/content/early/2016/02/24/bioinformatics.btw101.short

Alpha-CENTAURI is a Python-based workflow for mining alpha satellites and their higher-order structures in sequence data. It takes two input files: a FASTA file containing long reads, and an HMM database built from known alpha-satellite monomers.

Potential HOR (higher-order repeat) reads are often unlocalized in (Falcon) assemblies. Run Falcon -> Falcon-unzip read tracker -> Alpha-CENTAURI and save time.



□ ARRIS: Learning In Spike Trains: Estimating Within-Session Changes In Firing Rate Using Weighted Interpolation:

>> http://biorxiv.org/content/biorxiv/early/2016/02/26/041301.full.pdf

ARRIS provides reliable estimates of firing rates based on small samples using the reversible-jump Markov chain Monte Carlo algorithm.






□ Approximate Bayesian bisulphite sequencing analysis:

>> http://biorxiv.org/content/early/2016/02/29/041715

Integrated Nested Laplace Approximation (INLA) allows fast and accurate fitting of the parameters, in terms of both convergence and computational time, compared with sampling-based methods such as Markov chain Monte Carlo (MCMC) or Sequential Monte Carlo (SMC).




□ DECRES: Genome-Wide Prediction of cis-Regulatory Regions Using Supervised Deep Learning Methods:

>> http://biorxiv.org/content/biorxiv/early/2016/02/28/041616.full.pdf

DECRES gives higher sensitivity and precision on FANTOM annotated regions compared with ChromHMM and ChromHMM-Segway Combined methods.

param alpha_reg: parameter in the interval [0, 1] that controls the smoothness of the weights via the squared l_2 norm. The regularization term is lambda_reg * ((1 - alpha_reg)/2 * ||W||_2^2 + alpha_reg * ||W||_1).
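The regularization term above is an elastic-net penalty: alpha_reg = 0 gives pure squared-l_2 (ridge) smoothing, and alpha_reg = 1 gives pure l_1 (lasso) sparsity. The sketch below evaluates it for a flat list of weights. The function name is illustrative and not taken from the DECRES code.

```python
def elastic_net_penalty(weights, lambda_reg, alpha_reg):
    """lambda_reg * ((1 - alpha_reg)/2 * ||W||_2^2 + alpha_reg * ||W||_1)."""
    if not 0.0 <= alpha_reg <= 1.0:
        raise ValueError("alpha_reg must lie in [0, 1]")
    l2_sq = sum(w * w for w in weights)
    l1 = sum(abs(w) for w in weights)
    return lambda_reg * ((1.0 - alpha_reg) / 2.0 * l2_sq + alpha_reg * l1)


print(elastic_net_penalty([1.0, -2.0], lambda_reg=1.0, alpha_reg=0.5))
```

For W = (1, -2) and lambda_reg = 1, the penalty moves from 2.5 (alpha_reg = 0) through 2.75 (alpha_reg = 0.5) to 3.0 (alpha_reg = 1).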




□ ADEPT, a dynamic next generation sequencing data error-detection program with trimming

>> http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0967-z

ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed.




□ End Point of Black Ring Instabilities and the Weak Cosmic Censorship Conjecture:

>> http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.071102

A robust and simple new method, based on localized diffusion, for handling singularities in numerical general relativity. The authors use the CCZ4 formulation of the five-dimensional Einstein vacuum equations in Cartesian coordinates, with the damping parameter redefined as κ1 → κ1/α, where α is the lapse.




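□ Nothing in this world stays put. The earth that seems immovable, and the moon and sun that keep drawing arcs across the sky, all vibrate; they reduce the lessons learned to ash, and all things everywhere turn to dust. Yet we can feel the ground we stand on because each thing passes at a different speed. Staying in one place is harder than passing one another by.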

□ It is fine if memories turn to ash. What matters is that they were burned.

□ “what was, has always been. what is, has always been. and what will be, has always been.” To borrow Louis I. Kahn's words, what can happen always carries the same certainty as what has already happened.