lens, align.

Long is the time, but the true comes to pass.

Shai Maestro / "The Dream Thief"

2018-10-30 00:22:03 | music18


□ Shai Maestro / "The Dream Thief"

>> https://www.ecmrecords.com/catalogue/1530796791
>> https://www.shaimaestro.com

Release date: 28.09.2018
ECM 2616

Shai Maestro Piano
Jorge Roeder Double Bass
Ofri Nehemya Drums

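The ECM debut of the Israeli jazz pianist.
The melodies brim with a dynamic freshness, and the ever-shifting phrasing etches a kind of poetry.
An album suffused with a serene, dimly lit, aesthetic lyricism.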


“Expressions of joy, introspective thoughts and heightened intensity all come to the fore.” Maestro’s differentiated touch is special; he can convey a range of fleeting emotions in a single phrase.





ORACLE.

2018-10-17 00:17:17 | Science News

□ ODESZA / "Meridian"



"the universe is a (gigantic) joint probabilistic model, and some marginal distributions can be described by standard model..."



□ Architectural Principles for Characterizing the Performance of Sequestration Feedback Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/27/428300.full.pdf

The primary focus here is a circuit architecture that uses a sequestration mechanism to implement feedback control in a biomolecular circuit. This circuit immediately had a broad impact on the study of biological feedback systems, as sequestration is both abundant in natural biological contexts and appears to be feasible to implement in synthetic networks. For example, sequestration feedback can be implemented using sense-antisense mRNA pairs, sigma-antisigma factor pairs, or scaffold-antiscaffold pairs.




□ NanoSatellite: Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/09/439026.full.pdf

“NanoSatellite” is a novel pattern recognition algorithm that bypasses base calling and alignment and performs direct tandem repeat analysis on raw PromethION squiggles. It achieved more than 90% accuracy and high precision (5.6% relative standard deviation). NanoSatellite is based on consecutive rounds of Dynamic Time Warping (DTW), a dynamic programming algorithm that finds the optimal alignment between two (unevenly spaced) time series.
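
The DTW recurrence mentioned above can be sketched as follows. This is a minimal, generic illustration rather than NanoSatellite's implementation: the function name `dtw_distance`, the absolute difference as local cost, and the three-way step pattern are all assumptions made for the example.

```python
def dtw_distance(a, b):
    """Return the cumulative DTW cost between two numeric sequences.

    Assumptions (not from the paper): local cost is |x - y| and the
    allowed steps are insertion, deletion and match.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (stretch b)
                cost[i][j - 1],      # b[j-1] is matched again (stretch a)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]
```

Because one sample may be matched to several samples of the other series, a signal and a time-stretched copy of it align at zero cost, which is the property that makes DTW attractive for raw current traces of varying speed.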






□ INSTRAL-ASTRAL: Discordance-aware Phylogenetic Placement using Quartet Scores:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/02/432906.full.pdf

INSTRAL finds the optimal solution to the quartet placement problem. Unlike the problem ASTRAL solves, the placement problem has only a small number of possible solutions (growing linearly with n), and thus INSTRAL can solve it exactly even for large trees. In principle, it is possible to develop algorithms that compute the quartet score for all possible branches, one at a time, and to select the optimal solution at the end. However, the ASTRAL dynamic programming allows for a more straightforward solution.






□ AEGIS: Exploratory Gene Ontology Analysis with Interactive Visualization:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/436741.full.pdf

AEGIS (Augmented Exploration of the GO with Interactive Simulations) is an interactive information-retrieval framework that enables an investigator to navigate through the entire Gene Ontology graph (tens of thousands of nodes) and focus on fine-grained details without losing the context. AEGIS features interpretable visualization of GO terms and flexible exploratory analysis of the GO DAG (directed acyclic graph). It adopts a biologically grounded focus-and-context framework, reminiscent of classical principles in visual information system design.






□ Contour Monte Carlo: Inverse sensitivity analysis of mathematical models avoiding the curse of dimensionality:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/01/432393.full.pdf

The computational complexity of the methods used to conduct inverse sensitivity analyses for deterministic systems has limited their application to models with relatively few parameters. The authors introduce a novel Markov chain Monte Carlo method, which they call “Contour Monte Carlo”, that can be used to invert systems with a large number of parameters.

They demonstrate the utility of this method by inverting a range of frequently used deterministic models of biological systems, including the logistic growth equation, the Michaelis-Menten equation, and an SIR model of disease transmission with nine input parameters. They argue that the simplicity of this approach means it is amenable to a large class of problems of practical significance and, more generally, provides a probabilistic framework for understanding the inversion of deterministic models.




□ An information thermodynamic approach quantifying MAPK-related signaling cascades by average entropy production rate:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/01/431676.full.pdf

Signal transduction can be quantified by the amount of entropy production computed from fluctuations in the phosphorylation reactions of signaling molecules. Bayesian analysis of the entropy production rates of the individual steps indicates that they are consistent throughout the signal cascade.






□ New architecture trains a nano-oscillator classifier with standard machine learning algorithms:

>> https://aip.scitation.org/doi/10.1063/1.5042359

Although the authors only used the average stable state of the oscillator network, the offline learning algorithm can be applied to temporal signals as well, by inputting a different F at every time-step and reading a sliding time window average of f(t). The new architecture correctly categorized a larger percentage of the standard data set known as Iris than the reference classifier did. Comparison of results on the Iris data set further highlights the power of the nonmonotonic and interunit interactions.








□ Systematic Prediction of Regulatory Motifs from Human ChIP-Sequencing Data Based on a Deep Learning Framework:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/16/417378.full.pdf

DESSO uses a deep neural network and a binomial distribution to optimize motif prediction, and the results showed that DESSO outperformed existing tools in predicting distinct motifs from the 690 in vivo ENCODE ChIP-Seq datasets for 161 human TFs in 91 cell lines. The authors designed a first-of-its-kind binomial-based model in DESSO to identify all the significant motif instances, under the statistical hypothesis that the number of random sequence segments in the human genome that contain the motif of interest is binomially distributed.
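
The binomial hypothesis above suggests an upper-tail test: if each of n sequence segments contains the motif with background probability p, how surprising is it to observe k or more hits? The sketch below shows only that generic calculation. The function name `binomial_sf` and the parameters n, k, p are illustrative assumptions, not DESSO's actual interface or scoring procedure.

```python
from math import comb


def binomial_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    Hypothetical usage: n = number of sequence segments scanned,
    k = segments containing the motif, p = background probability that
    a random segment contains it.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For genome-scale n, a direct sum like this can underflow, so a log-space or library routine would be the more robust choice; the direct form is kept here for readability.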




□ rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data:

>> https://www.biorxiv.org/content/early/2018/09/18/420208

Although rnaSPAdes shows decent and stable results across multiple RNA-Seq datasets, the choice of a de novo transcriptome assembler remains a non-trivial problem, even with the aid of specially developed tools such as Transrate, DETONATE, BUSCO and rnaQUAST.




□ Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1406-4

The paper proposes a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
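
One way to read the weighting idea is as a posterior probability: given a zero-inflated negative binomial (ZINB) fit, how likely is it that an observed count came from the negative binomial component rather than from the excess-zero component? The sketch below assumes the parameterization with mean `mu`, dispersion (size) `theta` and zero-inflation probability `pi`. These names and the function `zinb_weight` are illustrative, and fitting the parameters is out of scope here.

```python
def zinb_weight(y, mu, theta, pi):
    """Posterior probability that count y comes from the NB component.

    Assumed ZINB model: with probability pi the count is an excess zero,
    otherwise it is NB with mean mu and size theta (variance
    mu + mu**2 / theta).
    """
    if y > 0:
        # A positive count cannot be an excess zero.
        return 1.0
    nb_zero = (theta / (theta + mu)) ** theta  # NB probability of a zero
    return (1.0 - pi) * nb_zero / (pi + (1.0 - pi) * nb_zero)
```

Weights close to 0 mark zeros that are probably excess zeros, so a bulk DE method that accepts observation weights can effectively ignore them.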




□ VarTrix: a software tool for extracting single cell variant information from 10x Genomics single cell data

>> https://github.com/10XGenomics/vartrix

VarTrix does not perform variant calling. VarTrix uses Smith-Waterman alignment to evaluate reads that map to each known input variant locus and assign single cells to these variants. This process works on both 10x single cell gene expression datasets as well as 10x single cell DNA datasets.
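
As a rough illustration of the Smith-Waterman step, the sketch below computes a local alignment score with a linear gap penalty. The scoring values and the function name `smith_waterman_score` are assumptions for the example; VarTrix's actual scoring scheme and read-handling logic are not described in the excerpt.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Scoring parameters are illustrative defaults, not VarTrix's.
    Only two rows of the DP matrix are kept in memory.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment restart anywhere (local alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A caller could score a read against both the reference and the alternate haplotype around a variant locus and assign the read to whichever scores higher; that usage is a plausible reading of the description above, not a confirmed detail.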




□ Predictive Collective Variable Discovery with Deep Bayesian Models:

>> https://arxiv.org/pdf/1809.06913.pdf

The authors formulate the discovery of collective variables (CVs) as a Bayesian inference problem and consider the CVs as hidden generators of the full-atomistic trajectory. Subtracting it from the atomistic potential, as long as the approximation of the generative model is adequate, could potentially accelerate the simulation by “filling in” the deep free-energy wells.






□ GenEpi: Gene-based Epistasis Discovery Using Machine Learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/20/421719.full.pdf

GenEpi takes the Genotype File Format (.GEN) used by Oxford statistical genetics tools, such as IMPUTE2 and SNPTEST, as the input format for genotype data. Since the phenotype may also be affected by environmental factors, after determining the final set of genotype features the authors included environmental factors such as clinical assessments when constructing the final model. To obtain the final model, they used random forests with 1,000 decision trees as the ensemble algorithm.






□ Parliament2: Fast Structural Variant Calling Using Optimized Combinations of Callers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/23/424267.full.pdf

Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and gives users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 uses SURVIVOR to overlap these calls into consensus candidates and validates these calls using SVTyper. Parliament2 is also a publicly available app on DNAnexus.







□ MetaCell: analysis of single cell RNA-seq data using k-NN graph partitions:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/08/437665.full.pdf

Metacells constitute local building blocks for clustering and quantitative analysis of gene expression, while not enforcing any global structure on the data, thereby maintaining statistical control and minimizing biases. In theory, a set of scRNA-seq profiles that are sampled from precisely replicated cellular RNA pools will be distributed multinomially with predictable variance and zero gene-gene covariance.







□ SIGDA: Scale-Invariant Geometric Data Analysis provides robust, detailed visualizations of human ancestry specific to individuals and populations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/431585.full.pdf

SIGDA is intended to generalize two widely used methods that apply to different kinds of data: Principal Components Analysis (PCA), which applies z-score normalization to each of a set of random variables (columns) measured on a set of objects (rows), and Correspondence Analysis (CA), which applies a chi-squared model to cross-tabulated counts of observed events.

SIGDA interprets each matrix entry as a weight of similarity (or proximity or association) between the containing row and the containing column, or equivalently whatever (hidden) annotation may be associated with each row and column. SIGDA therefore generalizes both PCA and CA by discarding the assumptions that determine their respective approaches to data normalization, and it is SIGDA’s unique approach to data normalization, which the authors call projective decomposition, that distinguishes it most from existing methods.

SIGDA determines the “relative orientation” between these two k-dimensional subspaces by singular value decomposition (SVD), obtaining k pairs of corresponding singular vectors.

SIGDA interprets matrix A twice: as 3D points defined by the eight rows, and unconventionally as an 8-dimensional point for each axis. Conceptually, projective decomposition simultaneously “focuses” these row and column points onto spheres; procedurally, it rescales each row and column of A to form a scale-free matrix W.

In general, SIGDA will be used on data with many more than 3 dimensions, and this interpretation as a perspective drawing is therefore of limited utility. This connection with projective geometry is, however, at the heart of the authors’ “data camera” analogy.
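
The row-and-column rescaling described above for projective decomposition can be approximated by alternating normalization. The sketch below assumes the target is unit root-mean-square for every row and every column of W. That target is my reading rather than something the excerpt states, and the function name `projective_decomposition`, the fixed iteration count, and the requirement of no all-zero row or column are likewise assumptions.

```python
from math import sqrt


def projective_decomposition(a, iters=200):
    """Rescale rows and columns of matrix a (list of lists) alternately.

    Assumed goal: every row and column of the result has RMS close to 1.
    Requires that no row or column is entirely zero.
    """
    rows, cols = len(a), len(a[0])
    w = [[float(x) for x in row] for row in a]
    for _ in range(iters):
        # Normalize each row to unit RMS.
        for i in range(rows):
            rms = sqrt(sum(x * x for x in w[i]) / cols)
            w[i] = [x / rms for x in w[i]]
        # Normalize each column to unit RMS.
        for j in range(cols):
            rms = sqrt(sum(w[i][j] ** 2 for i in range(rows)) / rows)
            for i in range(rows):
                w[i][j] /= rms
    return w
```

The row and column scale factors are discarded here for brevity; a fuller version would accumulate them so that the original matrix can be reconstructed from W.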






□ GOAE and GONN: Combining Gene Ontology with Deep Neural Networks to Enhance the Clustering of Single Cell RNA-Seq Data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/07/437020.full.pdf

By integrating Gene Ontology with both unsupervised and supervised models, two novel methods are proposed for clustering of scRNA-seq data, named GOAE (Gene Ontology AutoEncoder) and GONN (Gene Ontology Neural Network) respectively. In the GONN model, another hidden layer with 100 fully-connected neurons is added. After the training phase, the hidden layer with 100 fully-connected neurons is considered as the low-dimensional representation of the input. The diversity of a GO term can be measured by gene expression values. A z-score-based method is used for normalization along the gene dimension.






□ dphmix: Variational Infinite Heterogeneous Mixture Model for Semi-supervised Clustering of Heart Enhancers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/442392.full.pdf

dphmix implements a Dirichlet Process Infinite Heterogeneous Mixture model that infers Gaussian, Bernoulli and Poisson distributions over continuous, binary and count data, respectively. The authors derived a variational inference algorithm to handle semi-supervised learning where certain observations are forced to cluster together. Cluster assignments, stick-breaking variables and distribution parameters form the latent variable space, while α and the parameters of the NGBG prior form the hyperparameter space of the DPHM model.
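
The stick-breaking variables mentioned above map to mixture weights through the standard construction: component k takes a fraction v_k of whatever stick length the earlier components left over. The sketch below shows only that deterministic mapping. In a Dirichlet process the v_k would be Beta(1, α) draws, and `stick_breaking_weights` is a name chosen for this example.

```python
def stick_breaking_weights(v):
    """Convert stick-breaking fractions v_k in [0, 1] to mixture weights.

    Returns (weights, remaining), where remaining is the stick length
    left for components beyond the truncation.
    """
    weights = []
    remaining = 1.0
    for frac in v:
        weights.append(remaining * frac)  # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - frac
    return weights, remaining
```

In a truncated variational approximation, the last fraction is commonly fixed to 1 so that the weights sum exactly to one.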




□ XTalkiiS: a tool for finding data-driven cross-talks between intra-/inter-species pathways:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/437541.full.pdf

XTalkiiS loads a data-driven pathway network and applies a novel cross-talk modelling approach to determine interactions among known KEGG pathways in selected organisms. XTalkiiS has considerable potential, as it paves the way to novel insights into the mechanisms by which pathways from two species (ideally host and parasite) may interact and contribute to various phenotypes.




□ Reactive SINDy: Discovering governing reactions from concentration data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/13/442095.full.pdf

The authors extend the sparse identification of nonlinear dynamics (SINDy) method to vector-valued ansatz functions, each describing a particular reaction process. The resulting sparse tensor regression method, “reactive SINDy”, is able to estimate a parsimonious reaction network. One apparent limitation is that the method can only be applied if the data stem from the equilibration phase, as the concentration-based approach has derivatives equal to zero at equilibrium, which precludes recovery of the reaction dynamics.




□ BitMapperBS: a fast and accurate read aligner for whole-genome bisulfite sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/442798.full.pdf

BitMapperBS is an ultra-fast and memory-efficient aligner designed for WGBS reads from the directional protocol. BitMapperBS is up to more than 70 times faster than the popular WGBS aligners BSMAP and Bismark, and presents similar or greater sensitivity and precision. The vectorized bit-vector algorithm used in BitMapperBS extends multiple candidate locations simultaneously, while existing aligners extend their candidate locations one by one. As a result, the time-consuming extension step of BitMapperBS can be significantly accelerated.






□ A robust nonlinear low-dimensional manifold for single cell RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/443044.full.pdf

The paper presents the t-Distributed Gaussian Process Latent Variable Model (tGPLVM) for learning a low-dimensional embedding of unfiltered count data. tGPLVM is a Bayesian nonparametric model for robust nonlinear manifold estimation in scRNA-seq settings. The sparse kernel structure makes it possible to effectively reduce the number of latent dimensions based on the actual complexity of the data. The implementation of tGPLVM accepts sparse inputs produced from high-throughput experimental cell-by-gene count matrices.




□ A direct comparison of genome alignment and transcriptome pseudoalignment:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/444620.full.pdf

To enable the feature with transcriptome pseudoalignment (kallisto quant --genomebam), the authors developed a tool that converts genome alignments in BAM or SAM format to transcript compatibility counts (TCCs), the primary output of transcriptome pseudoalignment. Using bam2tcc, they converted the output of HISAT2, STAR, and the transcriptome pseudoalignment programs kallisto and Salmon into TCCs, which were then quantified using the expectation-maximization (EM) algorithm for a uniform coverage model.
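
The EM quantification of transcript compatibility counts can be sketched as follows, assuming a uniform coverage model in which a read is equally likely to start anywhere along a transcript's effective length. The function name `em_tcc`, the data layout and the fixed iteration count are illustrative assumptions; this is not the code of kallisto or bam2tcc.

```python
def em_tcc(classes, lengths, iters=100):
    """Estimate transcript abundances from transcript compatibility counts.

    classes: list of (tuple_of_transcript_ids, read_count), one entry per
             equivalence class.
    lengths: dict mapping transcript id -> effective length.
    Returns a dict of abundances alpha that sum to 1 (fraction of reads).
    """
    ids = sorted(lengths)
    total = sum(count for _, count in classes)
    alpha = {t: 1.0 / len(ids) for t in ids}
    for _ in range(iters):
        new = {t: 0.0 for t in ids}
        for members, count in classes:
            # Responsibility of each member is proportional to alpha / length.
            denom = sum(alpha[t] / lengths[t] for t in members)
            if denom == 0.0:
                continue
            for t in members:
                new[t] += count * (alpha[t] / lengths[t]) / denom
        alpha = {t: new[t] / total for t in ids}
    return alpha
```

For example, with ten reads compatible only with `t0` and ten reads compatible with both `t0` and `t1` (equal lengths), the estimate for `t1` halves at every iteration, so nearly all reads end up assigned to `t0`.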






Omega Point.

2018-10-17 00:13:17 | Science News


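The means of solving a problem itself becomes a problem to be solved. Simplifying segments entails subdividing the fragments that point to the object in question, and so conserves its complexity. Entropy is a measure of irreversibility, whereas complexity guarantees plasticity with respect to time. That is, the aftermath of an action carries a dynamical quantity symmetric to the aftermath of not having acted.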





□ The SIRAH force field 2.0: Altius, Fortius, Citius:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/436774.full.pdf

SIRAH 2.0 can be considered a significant upgrade that comes at no increase of computational cost, as the functional form of the Hamiltonian, the number of beads in each moiety, and their topologies remained the same. The simulation of the holo form starting from an experimental structure sampled near-native conformations, retrieving quasi-atomistic precision.




□ A starless bias in the maximum likelihood phylogenetic methods:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/435412.full.pdf

If the aligned sequences are equidistant from each other, with the true tree being a star tree, then the likelihood method is incapable of recovering it unless the sequences are either identical or extremely diverged. The authors analyze this “starless” bias and identify its source. In contrast, distance-based methods (with the least-squares method for branch evaluation and either the minimum evolution or the least-squares criterion for choosing the best tree) do not have this bias. The finding sheds light on the star-tree paradox in Bayesian phylogenetic inference.






□ Prioritising candidate genes causing QTL using hierarchical orthologous groups:

>> https://academic.oup.com/bioinformatics/article/34/17/i612/5093215

Gene families, in the form of hierarchical orthologous groups from the Orthologous MAtrix project (OMA), enable reasoning over complex nested homologies in a consistent framework. By integrating functional inference with homology mapping, it is possible to differentiate the confidence in orthologous and paralogous relationships when propagating functional knowledge.




□ Evaluating stochastic seeding strategies in networks

>> https://arxiv.org/abs/1809.09561

The paper shows how stochastic seeding strategies can be evaluated using existing data arising from randomized experiments in networks designed for other purposes, and how to design much more efficient experiments for this specific evaluation. The proposed estimators and designs can dramatically increase precision while yielding valid inference.




□ CaSTLe – Classification of single cells by transfer learning: Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments:

>> https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0205499

“CaSTLe – classification of single cells by transfer learning” is based on a robust feature engineering workflow and an XGBoost classification model built on these features. The feature engineering steps include: selecting genes with the top mean expression and mutual information gain, removing correlated genes, and binning the data according to pre-defined ranges.




□ Demonstration of End-to-End Automation of DNA Data Storage:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/10/439521.full.pdf

The device enables the encoding of data, which is then written to a DNA oligonucleotide using a custom DNA synthesizer, pooled for liquid storage, and read using a nanopore sequencer and a novel, minimal preparation protocol. The extension segment is then T/A ligated to the standard Oxford Nanopore Technology (ONT) LSK-108 kit sequencing adapter, creating the “extended ONT adapter,” which ensures that sufficient bases are read for successful base calling.






□ Selene: a PyTorch-based deep learning library for sequence-level data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/10/438291.full.pdf

"Sequence-level data" refers to any type of biological sequence such as DNA, RNA, or protein sequences and their measured properties (e.g. TF binding, DNase sensitivity, RBP binding). Training is automatically completed by Selene; afterwards, the researcher can easily use Selene to compare the performance of their new model to the original DeepSEA model on the same chromosomal holdout dataset.




□ DIAlign provides precise retention time alignment across distant runs in DIA and targeted proteomics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/10/438309.full.pdf

DIAlign is a novel algorithm based on direct alignment of raw MS2 chromatograms using a hybrid dynamic programming approach. The algorithm does not impose a chronological order of elution and allows for aligning of elution-order swapped peaks.






□ SETD8 wild-type apo and cofactor-bound, and mutant apo Folding@home simulations

>> https://osf.io/2h6p4/




□ VOMM: A framework for space-efficient variable-order Markov models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/14/443101.full.pdf

The paper describes practical, versatile representations of variable-order Markov models and of interpolated Markov models that support a large number of context-selection criteria, scoring functions, probability smoothing methods, and interpolations. They take up to 4 times less space than previous implementations based on the suffix array, regardless of the number and length of contexts, and up to 10 times less space (or more) than previous trie-based representations, while matching the size of related, state-of-the-art data structures from Natural Language Processing.






□ D-NAscent: Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/442814.full.pdf

Under conditions of limiting BrdU concentration, D-NAscent detects the differences in BrdU incorporation frequency across individual molecules to reveal the location of active replication origins, fork direction, termination sites, and fork pausing/stalling events. The trained BrdU pore model accounts for the presence of BrdU in the sequence while also circumventing the high space and time complexities that can result from dynamic programming alignment. Events are aligned to the Albacore basecall, which is then aligned with minimap2.




□ Comparative Pathway Integrator: a framework of meta-analytic integration of multiple transcriptomic studies for consensual and differential pathway analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/444604.full.pdf

Given pathway enrichment results, the framework performs the adaptively weighted Fisher’s (AW Fisher) method as a meta-analysis to identify pathways that are significant in one or more studies/conditions.
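
The core of the AW Fisher statistic can be sketched for a single pathway: over all 0/1 weightings of the studies, find the subset whose Fisher combined p-value is smallest. The code below is a simplified illustration under that reading, with hypothetical function names. The minimum it returns is the test statistic, not a calibrated p-value, and calibrating it (for example by permutation) is not shown.

```python
from math import exp, log


def chi2_sf_even(x, m):
    """Survival function of a chi-square variable with 2*m degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= half / i
        total += term
    return exp(-half) * total


def aw_fisher(pvalues):
    """Return (min combined p, 0/1 weights) for p-values in (0, 1].

    For a fixed subset size m, the best subset is the m smallest
    p-values, so only sorted prefixes need to be examined.
    """
    order = sorted(range(len(pvalues)), key=lambda k: pvalues[k])
    best_p, best_m, stat = float("inf"), 0, 0.0
    for m, k in enumerate(order, start=1):
        stat += -2.0 * log(pvalues[k])  # Fisher statistic of the m smallest
        p = chi2_sf_even(stat, m)
        if p < best_p:
            best_p, best_m = p, m
    weights = [0] * len(pvalues)
    for k in order[:best_m]:
        weights[k] = 1
    return best_p, weights
```

The weight vector is what makes the method useful for the consensual-versus-differential distinction: it indicates which studies contribute to a pathway's significance.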






□ GoT: High throughput droplet single-cell Genotyping of Transcriptomes reveals the cell identity dependency of the impact of somatic mutations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/16/444687.full.pdf

GoT capitalizes on high-throughput scRNA-seq (the 10x Genomics Chromium single cell 3’ platform), by which thousands of cells can be jointly profiled for genotyping information as well as single-cell full transcriptomes. The ability of GoT to genotype multiple target genes in parallel is critical. While described here for 3’ droplet-based scRNA-seq, GoT can be integrated into any scRNA-seq method that generates full-length cDNA as an intermediate product (Microwell-seq, 10x SingleCell V(D)J +5′GE). The high-throughput linking of single-cell genotyping of expressed genes to transcriptomic data may provide the means to gain insight into questions such as the integration of clonal diversification with lineage plasticity or differentiation topologies.






□ Using genetic data to strengthen causal inference in observational research:

>> https://www.nature.com/articles/s41576-018-0020-3

Recent progress in genetic epidemiology — including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining — has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference. Assessing credibility requires in-depth knowledge of the question, which is unlikely in massive hypothesis-free causal inference exercises, such as phenome-wide approaches.

Triangulation — when conclusions from several study designs converge — will play an increasingly important role in strengthening evidence for causality. One should not expect that a single existing or future method for causal inference in observational settings will provide a definitive answer to a causal question. Rather, such methods can substantially improve the strength of evidence on a continuum from mere association to established causality.






□ Using long-read sequencing to detect imprinted DNA methylation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/445924.full.pdf

Determining allele-specific methylation patterns in diploid or polyploid cells with short-read sequencing is hampered by the dependence on a high SNP density and the reduction in sequence complexity inherent to bisulfite treatment. Using long-read nanopore sequencing, with an average genomic coverage of approximately ten, it is possible to determine both the level of methylation of CpG sites and the haplotype from which each read arises.




□ BiG-SCAPE and CORASON: A computational framework for systematic exploration of biosynthetic diversity from large-scale genomic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/445270.full.pdf

BiG-SCAPE facilitates rapid calculation and interactive exploration of BGC sequence similarity networks (SSNs); it accounts for differences in modes of evolution between BGC classes, groups gene clusters at multiple hierarchical levels, introduces a ‘glocal’ alignment mode that supports complete as well as fragmented BGCs, and democratizes the analysis through a dramatically accelerated.




□ Reverse GWAS: Using Genetics to Identify and Model Phenotypic Subtypes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/446492.full.pdf

The authors propose Reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. RGWAS uses a bespoke decomposition, MFMR, to model covariates, binary traits, and population structure. A random-effect version of MFMR could improve power to detect polygenic subtypes, though computational issues are non-trivial. MFMR could also be adapted to count data, zero-inflation, higher-order arrays, or missing data.




□ OSCA: a tool for omic-data-based complex trait analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/17/445163.full.pdf

MOMENT is a mixed-linear-model-based method that tests for association between a DNAm probe and a trait, with all other distal probes fitted in multiple random-effect components to account for the effects of unobserved confounders as well as the correlations between distal probes. MOMENT has been implemented in a versatile software package (OSCA) together with a number of other implementations for omic-data-based analysis, including the estimation of variance in a trait captured by all measures of multiple omic profiles, xQTL analysis, and meta-analysis of xQTL.






□ LuxRep: a technical replicate-aware method for bisulfite sequencing data analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/19/444711.full.pdf

LuxGLM is a probabilistic covariate model for quantification of DNA methylation modifications with complex experimental designs. LuxRep improves the accuracy of differential methylation analysis and lowers the running time of model-based DNA methylation analysis. LuxRep features model-based integration of biological / technical replicates, and full Bayesian inference by variational inference implemented in Stan. LuxRep also aligns BS-seq data and generates count data from sequencing reads using, e.g., Bismark.




□ RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/18/447110.full.pdf

RAxML-NG is a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML and ExaML are large monolithic codes. RAxML-NG employs a two-step L-BFGS-B method to optimize the parameters of the LG4X model. RAxML-NG is a phylogenetic tree inference tool which uses the maximum-likelihood (ML) optimality criterion. RAxML-NG can compute the novel branch support metric called transfer bootstrap expectation (TBE). TBE is less sensitive to individual misplaced taxa in replicate trees, and thus better suited to reveal well-supported deep splits in large trees with thousands of taxa.




□ On Parameter Interpretability of Phenomenological-Based Semiphysical Models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/18/446583.full.pdf

The phenomenological modeling approach offers the great advantage of having a structure with variables and parameters with physical meaning, which enhances the interpretability of the model and its further use for decision making. This property has not been deeply discussed, perhaps because of the implicit assumption that interpretability is inherent to phenomenological-based models.






□ SeqOthello: querying RNA-seq experiments at scale:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1535-9

SeqOthello is an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. It takes only 5 min and 19.1 GB memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer datasets. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.






□ Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures:

>> https://www.nature.com/articles/nbt.4278

The obtained sparse fluorescent sequence of each molecule was then assigned to its parent protein in a reference database. The authors tested the method on synthetic and naturally derived peptide molecules in zeptomole-scale quantities. They also fluorescently labeled phosphoserines and achieved single-molecule positional readout of the phosphorylated sites.




□ Deciphering epigenomic code for cell differentiation using deep learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/22/449371.full.pdf

Increasing lines of evidence have suggested that the epigenome in a cell type is established stepwise through the interplay of genomic sequence, chromatin remodeling systems and environmental cues along the developmental lineage. As the latter two factors are the results of interactions of the products of genomic sequences, the epigenome of a cell type is ultimately determined by the genomic sequence.




□ A relative comparison between Hidden Markov- and Log-Linear- based models for differential expression analysis in a real time course RNA sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/22/448886.full.pdf

The authors evaluate the relative performance of two statistical models, one Hidden Markov-based and one log-linear-based, in the detection of DE genes in real time-course RNA-seq data. The Hidden Markov-based model, EBSeq-HMM, was developed specifically for time-course experiments, while the log-linear-based model, multiDE, was proposed for multiple treatment conditions.




□ Efficient Proximal Gradient Algorithm for Inference of Differential Gene Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/22/450130.full.pdf

The differential gene-gene interactions identified by the ProGAdNet algorithm yield a list of genes that is an alternative to the list of differentially expressed genes. This may provide additional insight into the molecular mechanism behind the phenotypic difference of the tissue under different conditions. Alternatively, the two gene networks inferred by the ProGAdNet algorithm can be used for further differential network analysis (DiNA).




□ Look4TRs: A de-novo tool for detecting simple tandem repeats using self-supervised hidden Markov models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/23/449801.full.pdf

Look4TRs adapts itself to the input genome automatically, balancing high sensitivity against a low false positive rate. Look4TRs generates a random chromosome based on a real chromosome of the input genome. It then inserts semi-synthetic microsatellites (MS) into the random chromosome. Finally, the HMM is trained and calibrated using these semi-synthetic MS.
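
The training-data step described above can be sketched roughly as follows. This is a minimal illustration and not Look4TRs' own code: the function names, the use of a plain shuffle to build the random chromosome, and the motif and copy-number ranges are all assumptions, and the repeats here are fully synthetic rather than semi-synthetic.

```python
import random

def random_chromosome(real_seq, rng):
    """Random sequence with the same length and base composition as real_seq.
    Assumption: a simple shuffle stands in for whatever model the tool uses."""
    bases = list(real_seq)
    rng.shuffle(bases)
    return "".join(bases)

def insert_repeats(seq, n_repeats, rng, motif_len=(1, 6), copies=(5, 20)):
    """Insert synthetic tandem repeats at random positions.
    Returns the new sequence and a per-base label string ('1' = inside a repeat)."""
    parts, labels = [], []
    cuts = sorted(rng.randrange(len(seq) + 1) for _ in range(n_repeats))
    prev = 0
    for cut in cuts:
        # Copy the background up to the insertion point, labeled '0'.
        parts.append(seq[prev:cut])
        labels.append("0" * (cut - prev))
        # Build one tandem repeat: a random motif repeated several times.
        motif = "".join(rng.choice("ACGT") for _ in range(rng.randint(*motif_len)))
        repeat = motif * rng.randint(*copies)
        parts.append(repeat)
        labels.append("1" * len(repeat))
        prev = cut
    parts.append(seq[prev:])
    labels.append("0" * (len(seq) - prev))
    return "".join(parts), "".join(labels)

rng = random.Random(0)
real = "ACGTTGCAAGCTTAGGCTAACGT" * 10   # hypothetical stand-in for a real chromosome
background = random_chromosome(real, rng)
train_seq, train_labels = insert_repeats(background, 3, rng)
```

The labeled pair `(train_seq, train_labels)` is the kind of input on which a two-state HMM (repeat vs. background) could then be trained.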





Vortex.

2018-10-13 10:13:57 | Science News


"If you want to go fast, go alone. If you want to go far, go together."

"Genomic data are inherently and insidiously NOT IID." (Independent and identically distributed random variables.)
- Katherine S. Pollard (Genome Informatics 2018)




□ DeepSequence: Deep generative models of genetic variation capture the effects of mutations:

>> https://www.nature.com/articles/s41592-018-0138-4

DeepSequence, a probabilistic model for sequence families, predicted the effects of mutations across a variety of deep mutational scanning experiments substantially better than existing methods based on the same evolutionary data. The model, learned in an unsupervised manner solely on the basis of sequence information, is grounded with biologically motivated priors, reveals the latent organization of sequence families, and can be used to explore new parts of sequence space.




□ DeepVariant: A universal SNP and small-indel variant caller using deep neural networks:

>> https://www.nature.com/articles/nbt.4235

DeepVariant is a deep convolutional neural network that can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variants and true genotype calls. DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data.






□ BiT-STARR-seq: High throughput characterization of genetic effects on DNA:protein binding and gene transcription:

>> http://genome.cshlp.org/content/early/2018/09/25/gr.237354.118.full.pdf

BiT-STARR-seq (Biallelic Targeted STARR-seq) is a streamlined protocol for a high-throughput reporter assay that identifies allele-specific expression (ASE) while accounting for PCR duplicates through unique molecular identifiers. Unlike STARR-seq, this method does not require preparation of DNA regions for use in the assay, such as whole-genome fragmentation or region targeting; like STARR-seq, it requires only a single cloning and transformation step.






□ Trans effects on gene expression can drive omnigenic inheritance:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/24/425108.full.pdf

The authors present a formal model in which genetic contributions to complex traits can be partitioned into direct effects from core genes and indirect effects from peripheral genes acting as trans-regulators. If the core genes for a trait are co-regulated, as seems likely, then the effects of peripheral variation can be amplified by these co-regulated networks such that nearly all of the genetic variance is driven by peripheral genes.






□ Identify tissue structure with Automatic Expression Histology using STARmap data:

>> http://www.nxn.se/valent/2018/9/25/identify-tissue-structure-with-automatic-expression-histology

The SpatialDE package implements a Bayesian inference algorithm for learning posterior probabilities of the assignments z between genes and hidden spatial patterns that, together, give rise to spatial co-expression. STARmap (spatially-resolved transcript amplicon readout mapping) begins with labeling of cellular RNAs by pairs of DNA probes, followed by enzymatic amplification to produce a DNA nanoball (amplicon), which eliminates background caused by mislabeling of single probes.






□ Mapping DNA replication with nanopore sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/26/426858.full.pdf

The authors harness nanopore sequencing to study DNA replication genome-wide at the single-molecule level. Using in vitro prepared DNA substrates, they characterize the effect of bromodeoxyuridine (BrdU) substitution for thymidine on the MinION nanopore electrical signal. They implement RepNano, a recurrent neural network with an architecture similar to DeepNano, to convert the raw current from nanopore experiments into a DNA sequence.




□ CaSpER: Identification, visualization and integrative analysis of CNV events in multiscale resolution using single-cell or bulk RNA sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/26/426122.full.pdf

CaSpER models the allele-based frequencies as a mixture of Gaussian distributions for identification and classification of genotype clusters. The shift in allelic signal is used to quantify the loss-of-heterozygosity (LOH) which is valuable for CNV identification. CaSpER uses Hidden Markov Models (HMM) to assign copy number states to regions. The multiscale nature of CaSpER enables comprehensive analysis of focal and large-scale CNVs and LOH segments.
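
To illustrate how an HMM can assign copy number states to regions, here is a small Viterbi decoder with Gaussian emissions. It is only a sketch under stated assumptions: the three states (loss, neutral, gain), their means, the shared standard deviation, the transition probability `stay`, and the example signal are all hypothetical and are not CaSpER's actual model or parameters.

```python
import math

def viterbi_gaussian(obs, means, sigma, stay=0.9):
    """Most likely state path for a Gaussian-emission HMM with a uniform start."""
    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)

    def log_emit(x, mu):
        # Log Gaussian density up to a constant shared by all states.
        return -0.5 * ((x - mu) / sigma) ** 2

    def log_trans(i, j):
        return math.log(stay if i == j else switch)

    score = [log_emit(obs[0], mu) for mu in means]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for j, mu in enumerate(means):
            # Best predecessor state for landing in state j at this position.
            best = max(range(n_states), key=lambda i: prev[i] + log_trans(i, j))
            ptr.append(best)
            score.append(prev[best] + log_trans(best, j) + log_emit(x, mu))
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical smoothed expression signal along a chromosome arm.
signal = [0.0, 0.1, -0.1, 0.9, 1.1, 1.0, 0.95, 0.05, -0.05]
states = viterbi_gaussian(signal, means=[-1.0, 0.0, 1.0], sigma=0.3)
# States are indices into `means`: 0 = loss, 1 = neutral, 2 = gain.
```

Runs of identical states in the decoded path correspond to candidate segments; the transition penalty is what keeps single noisy positions from being called as separate events.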




□ Atlas-CNV: a validated approach to call Single-Exon CNVs in the eMERGESeq gene panel:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/27/422337.full.pdf

Atlas-CNV is validated as a method to identify exonic CNVs in targeted sequencing data generated in the clinical laboratory. The ExonQC and C-score assignment can reduce FDR (by identifying targets with high variance) and improve the calling accuracy of single-exon CNVs, respectively. The authors also propose guidelines and criteria to identify high-confidence single-exon CNVs.




□ Detection and Mitigation of Spurious Antisense RNA-seq Reads with RoSA:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/26/425900.full.pdf

RoSA (Removal of Spurious Antisense) detects the presence of high levels of spurious antisense read alignments in strand-specific RNA-Seq datasets. RoSA uses incorrectly spliced reads on the antisense strand and/or ERCC spike-ins (if present in the data) to calculate both global and gene-specific antisense correction factors.




□ Artificial Dilution Series: Novel Comparison of Evaluation Metrics for Gene Ontology Classifiers Reveals Drastic Performance Differences:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/27/427096.full.pdf

Artificial Dilution Series is the first method for testing various classifier evaluation metrics using real datasets as a platform for controlled amounts of embedded signal. It allows simple testing of evaluation metrics to see how easily they can separate different signal levels. This work proposes improved versions of some well-known evaluation metrics. The presented methods are also applicable to other areas of science where evaluation of prediction results is non-trivial.




□ plyranges: A grammar of genomic data transformation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/26/327841.full.pdf

Genome Query Language and its distributed implementation GenAp use a SQL-like syntax for fast retrieval of information from unprocessed sequencing data. Similarly, the Genometric Query Language (GMQL) implements a DSL for combining genomic datasets. The authors introduce a genomic DSL called plyranges that reformulates notions from existing genomic algebras and embeds them in R as a genomic extension of dplyr. By analogy, plyranges is to the genomic algebra as dplyr is to the relational algebra.




□ GRID-seq assisted computational prediction of transcription factor binding motifs using multivariate mahalanobis distance analysis reveals that RNA-chromosomal interaction may act as a proxy indicator of true positive transcriptional activity.:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/27/429332.full.pdf

The global RNA interactions with DNA by deep sequencing (GRID-seq) approach allows a user to quantify genome-wide binding of RNA to chromatin in a manner that yields data similar to what might be obtained from model-based analysis of ChIP-Seq (MACS). The aim of combining DeepBind and GRID-seq is to derive results that may assist researchers in selecting study targets. The desired outcome is achieved by treating the scores from GRID-seq and DeepBind jointly through a single distance function based on their distribution, and then ranking sequences that contain predicted binding motifs according to this new combined score.
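
As a rough illustration of folding two scores into a single distance, the sketch below computes the Mahalanobis distance of each two-dimensional score pair from the sample mean. The function name, the example score pairs, and the choice to rank by descending distance are assumptions made for illustration; the paper's actual scoring pipeline is not reproduced here.

```python
import math

def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances

# Hypothetical (GRID-seq score, DeepBind score) pairs for five sequences.
scores = [(1.0, 0.2), (2.0, 0.4), (3.0, 0.5), (4.0, 0.9), (10.0, 3.0)]
dist = mahalanobis_2d(scores)
ranked = sorted(range(len(scores)), key=lambda i: dist[i], reverse=True)
```

Unlike a plain Euclidean distance, this accounts for the different scales of the two scores and for their correlation, which is why a single combined ranking becomes meaningful.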




□ SMARTer single cell total RNA sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/29/430090.full.pdf

The total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, this method is also able to detect circular RNAs.






□ GEViT: A systematic method for surveying data visualizations and a resulting genomic epidemiology visualization typology:

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty832/5107018

GEViT consists of an initial literature analysis phase followed by a visualization analysis phase, resulting in a visualization design space in which images are classified according to their why and their how. The literature analysis phase automatically analyzes text from a corpus of research articles to identify the topic of a data visualization (why it was created), on the assumption that different topics are likely to yield different visualization designs.




□ Enspara: Modeling molecular ensembles with scalable data structures and parallel computing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/29/431072.full.pdf

enspara is a library for building Markov State Models at scale. It provides a numpy-compatible implementation of the ragged array, which dramatically improves the memory footprint of data associated with Markov State Models. enspara also has turn-key sparse matrix usage. For users who wish to prototype entirely new Markov State Model estimation methods, any function or callable object is accepted as a builder, as long as it accepts a transition counts matrix C as input and returns a 2-tuple of transition probabilities and equilibrium probabilities.
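
A builder of the kind described above might look like the following. This is a pure-Python sketch and not enspara's API: the name `normalize_builder`, the use of nested lists instead of arrays, and the choice of row normalization plus power iteration are all assumptions. Power iteration also assumes the chain is ergodic.

```python
def normalize_builder(C, n_iter=1000, tol=1e-12):
    """Hypothetical builder: row-normalize a transition counts matrix C and
    estimate equilibrium probabilities by power iteration.
    Returns the 2-tuple (transition_probabilities, equilibrium_probabilities)."""
    n = len(C)
    T = []
    for row in C:
        total = sum(row)
        if total == 0:
            raise ValueError("state with no observed transitions")
        T.append([c / total for c in row])

    pi = [1.0 / n] * n
    for _ in range(n_iter):
        # One step of pi <- pi T (left multiplication by the row vector pi).
        new_pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        delta = max(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if delta < tol:
            break
    return T, pi

counts = [[90, 10], [20, 80]]   # hypothetical two-state transition counts
T, pi = normalize_builder(counts)
```

Any callable with this input and output shape fits the contract quoted above, so a different estimator can be swapped in without touching the rest of a pipeline.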

Information from a Markov State Model cannot be trivially substituted for frame-by-frame calculations. The authors also implement a function using Cython and OpenMP that takes a trajectory of n features and returns a four-dimensional joint counts array with dimension n × n × sn × sn. The value of returning this four-dimensional joint counts matrix is that it renders the problem embarrassingly parallel in the number of trajectories.

enspara includes a reference implementation of the Correlation of All Rotameric and Dynamical States (CARDS) framework. In brief, this method takes a series of molecular dynamics trajectories and computes the mutual information (MI) between all pairs of dihedral angle rotameric states, and between all pairs of dihedral angle order/disorder states. A dihedral angle is considered disordered if it frequently hops between rotameric states. This implementation parallelizes across cores on a single machine using thread parallelization.
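
The mutual information between two discrete state sequences, the core quantity mentioned above, can be computed with a simple plug-in estimator. The sketch below is a generic illustration rather than the CARDS implementation; the function name and the example state assignments are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two equally long discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) p(y)) ), with counts converted to frequencies.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical rotameric state assignments per frame for three dihedral angles.
chi_a = [0, 0, 1, 1, 2, 2, 0, 1]
chi_b = [0, 0, 1, 1, 2, 2, 0, 1]   # moves in lockstep with chi_a
chi_c = [0, 1, 0, 1, 0, 1, 0, 1]

mi_coupled = mutual_information(chi_a, chi_b)
mi_other = mutual_information(chi_a, chi_c)
```

Because each pair of dihedrals is scored independently of every other pair, the all-pairs computation splits naturally across threads or trajectories.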




□ MetaMap, an interactive webtool for the exploration of metatranscriptomic reads in human disease-related RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/30/425439.full.pdf

MetaMap applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from >17,000 samples from >400 studies relevant to human disease using state-of-the-art high performance computing systems from the Leibniz Supercomputing Centre.




□ Vertical and horizontal integration of multi-omics data with miodin:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/30/431429.full.pdf

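# If the assay metadata carries an "omega" matrix, bind it to the design matrix
if (is(assayObj, "GenomicRatioSet")) {
  assayMetaData <- get(dataset, "assayMetaData", name = testAssay)
  if ("omega" %in% names(assayMetaData)) {
    # Select the omega rows that match the samples in the sample table
    omega <- assayMetaData$omega[sampleTable$SampleName, ]
    designMatrixCor <- cbind(designMatrix, omega)
  }
}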




□ The Chromium Single Cell ATAC (Assay for Transposase Accessible Chromatin) Solution

>> https://www.10xgenomics.com/solutions/single-cell-atac/#ATACseq




□ Challenges and guidelines toward 4D nucleome data and model standards

>> https://www.nature.com/articles/s41588-018-0236-3

It is now possible to build 3D models of how the genome folds within the nucleus and changes over time (4D). Because genome folding influences its function, this opens exciting new possibilities to broaden our understanding of the mechanisms that determine cell fate.




□ Everlasting Iatric Reader (Eir): Automatic Human-like Mining and Constructing Reliable Genetic Association Database with Deep Reinforcement Learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/434803.full.pdf

The aim is to improve the reliability of text mining by building a system that can directly simulate the behavior of a researcher, and to develop corresponding methods, such as a bi-directional LSTM for text mining and a Deep Q-Network for organizing behaviors. The immediate next step is to train Eir for SNP-phenotype association with the GWAS Catalog and to integrate these databases into GenAMap, a visual machine learning tool for GWAS, in order to validate GWAS results.




□ ewanbirney:
H3Africa refunded to $170 Million; 35 african countries, over 50 languages, many different legal systems, >100K people recruited, data deposition (collaboration with @EGAarchive) 380 trainees, 193 PhDs - Wow! #GA4GH2018




□ TransQST in a decade of public-private innovative medicines initiative (IMI) worth 3 billion €. Find out what 21 institutions from 9 countries do for developing safer drugs:

>> http://sbi.imim.es/data/imi-transqst.pdf




□ Nebula Genomics: Genomics startup bets on blockchain for data sharing platform:

>> https://www.mobihealthnews.com/content/genomics-startup-bets-blockchain-data-sharing-platform






□ AIDE: annotation-assisted isoform discovery and abundance estimation from RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/07/437350.full.pdf

Solving the isoform discovery problem in a stepwise and conservative manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. AIDE learns gene and exon boundaries from annotations and also selectively borrows information from the annotated isoform structures using a stepwise likelihood-based selection approach.






□ scDesign: A statistical simulator for rational scRNA-seq experimental design:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/07/437095.full.pdf

scDesign is protocol- and data-adaptive. It learns scRNA-seq data characteristics from rapidly accumulating public scRNA-seq data generated under diverse settings. scDesign generates synthetic data that closely mimic real scRNA-seq data under the same experimental settings, providing a basis for using its synthetic data to guide practical scRNA-seq experimental design.






□ Building gene regulatory networks from scATAC-seq and scRNA-seq using Linked Self-Organizing Maps:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/09/438937.full.pdf

Self-organizing maps (SOMs) are a type of artificial neural network, also referred to as Kohonen networks. Combining the metaclusters from multiple SOMs as a pair-wise set generates a data space that combines the properties of both without any assumptions about how the data relate to each other. The ability of linked SOMs to detect emergent properties from multiple types of high-dimensional genomic data with very different signal properties opens new avenues for integrative analysis of single cells.






□ D-GPM: a deep learning method for gene promoter methylation inference:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/09/438218.full.pdf

D-GPM is a fully connected multi-layer perceptron with one output layer. All the hidden layers have the same number of hidden units. D-GPM contains 902 units in the input layer corresponding to the 902 landmark genes, and we also configure D-GPM with 21,645 units in the output layer analogous to the 21,645 target genes.





Revsic.

2018-10-12 02:44:28 | music18



□ Richard Devine - Revsic

>> https://richarddevine.bandcamp.com/album/sort-lave

Music written and produced by Richard Devine
Available on the new album 'Sort\Lave' (TIMESIG009) on November 2nd.

Video produced and directed by Craig Ritchie Allan
https://numbercult.design/


Richard Devine:
“I really wanted to break free from timeline-based music creation and do things with my hands on the fly,”
“So the tracks are more like captured snapshot performances where I could experiment and play around with the idea of probability-based sequencing for every patch, string multiple sequencers together that would feed other sequencers to come up with interesting rhythms and melodies.”


After a break of six years, Atlanta-based electronic musician, producer and sound designer Richard Devine returns with a new album, ‘Sort\Lave’, on Venetian Snares’ Timesig imprint. Recorded between 2016 and 2017 using Richard’s custom-built Eurorack modular system and two Nord G2 Modular units, Sort\Lave features 12 tracks of intricate electronica that range from abrasive percussive experiments such as ‘Revsic’ to the dazzling juxtaposition of sounds in ‘Astra’ and on to the radiant ambience of the album’s closer, ‘Takara’.






calc.

2018-10-10 22:10:10 | Science News


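Every event is observed as a computational process within a dissipative structure. Yet the correlations among many bodies amount to simulacra carved out inside a monistic structure, and time means nothing more than a variable measure.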





□ Rev D: Oxford Nanopore has released a new version of MinION and GridION flow cells that include the new ‘Rev D’ ASIC.

>> https://nanoporetech.com/about-us/news/oxford-nanopore-releases-rev-d-flow-cells-enabling-increase-data-yields

Rev D extends the amount of time that flow cells can be used for DNA sequencing or RNA sequencing, increasing the overall yields of DNA sequence data to as much as 30 Gb per flow cell (at this performance, the equivalent of ~10X human genome for $500*). Rev D increases the rate at which DNA fragments pass through a nanopore from 35 bases per second at launch to 450 bases per second now.
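
The figures quoted above can be checked with a little arithmetic. This is a back-of-the-envelope sketch; the haploid human genome size of about 3.1 Gb is an assumption not stated in the announcement.

```python
HUMAN_GENOME_GB = 3.1   # assumption: approximate haploid human genome size in gigabases

yield_gb = 30.0                          # quoted maximum yield per flow cell
coverage = yield_gb / HUMAN_GENOME_GB    # approximate fold coverage of one human genome

speed_at_launch = 35    # bases per second, as quoted
speed_now = 450         # bases per second, as quoted
speed_ratio = speed_now / speed_at_launch

print(f"coverage ~{coverage:.1f}X, speed-up ~{speed_ratio:.1f}x")
```

Under that assumption the yield works out to a little under 10X, consistent with the "~10X human genome" figure.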




□ Do Cells use Passwords? Do they Encrypt Information?:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/432120.full.pdf

Encryption could benefit cells by making it more difficult for pathogens to hijack cell networks. Because the 'language' of cell signaling is unknown, i.e., similar to an alien language detected by SETI, the authors use information theory to consider the general case of how non-randomness filters can be used (1) to recognize that a data stream encodes a language rather than noise, and (2) to provide quantitative criteria for whether an unknown language is encrypted.

This leads to the result that an unknown language is encrypted if efforts at decryption produce sharp decreases in entropy and increases in mutual information, the magnitude of which should scale with language complexity. The authors demonstrate this with a simple numerical experiment on English-language text encrypted with a basic polyalphabetic cipher.
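
A toy version of such an experiment can be set up as follows. This is a sketch under assumptions, not the authors' code: it uses a Vigenère cipher as the polyalphabetic cipher, single-letter Shannon entropy as the only statistic, and a hypothetical sample sentence and key.

```python
import math
from collections import Counter
from string import ascii_lowercase as ALPHA

def vigenere(text, key, decrypt=False):
    """Shift each lowercase letter by the matching key letter; other characters pass through."""
    out, k = [], 0
    for ch in text:
        if ch in ALPHA:
            shift = ALPHA.index(key[k % len(key)])
            if decrypt:
                shift = -shift
            out.append(ALPHA[(ALPHA.index(ch) + shift) % 26])
            k += 1   # advance the key only on letters
        else:
            out.append(ch)
    return "".join(out)

def letter_entropy(text):
    """Shannon entropy (bits per letter) of the single-letter distribution."""
    letters = [ch for ch in text if ch in ALPHA]
    n = len(letters)
    return -sum(c / n * math.log2(c / n) for c in Counter(letters).values())

plain = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
cipher = vigenere(plain, "lemon")
recovered = vigenere(cipher, "lemon", decrypt=True)

h_cipher = letter_entropy(cipher)
h_recovered = letter_entropy(recovered)
```

Comparing `h_cipher` with `h_recovered` over candidate keys, on text much longer than this sample, is the kind of entropy-drop test the passage describes; mutual information between neighboring letters could be added as a second statistic.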




□ Caring without sharing: Meta-analysis 2.0 for massive genome-wide association studies:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/436766.full.pdf

An extension of this method and paradigm would be to adopt an honest-but-curious threat model for all parties and to aim to prevent leakage of individual-level information from the data-owning silos to any other party involved. Since the central hub simply sums (or averages) the results of all silos at every step of the pipeline, multi-party secure sum protocols can be used when more than 2 silos contribute, decreasing the overall chance of information leakage.
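
One common way to realize a secure sum is additive secret sharing, sketched below. This is an illustrative simulation in a single process, not the protocol from the paper: the modulus, the function names, and the example per-silo values are assumptions, and inputs are taken to be non-negative integers smaller than the modulus.

```python
import secrets

MODULUS = 2 ** 61 - 1   # assumption: any modulus larger than the true total works

def make_shares(value, n_parties):
    """Split value into n_parties additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Each silo splits its value among all silos; only partial sums are revealed."""
    n = len(private_values)
    all_shares = [make_shares(v, n) for v in private_values]
    # Silo j adds up the j-th share it received from every silo.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The hub sums the partial sums; no individual input ever reaches it.
    return sum(partial_sums) % MODULUS

silo_counts = [120, 45, 310]   # hypothetical per-silo summary statistics
total = secure_sum(silo_counts)
```

Each share on its own is uniformly random, so neither the hub nor any single silo learns another silo's input; only the total is revealed.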




□ Improved DNA based storage capacity and fidelity using composite DNA letters:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/02/433524.full.pdf

The Shannon information capacity of DNA was recently demonstrated, using fountain codes, to be ~1.57 bit per synthesized position. However, synthesis and sequencing technologies process multiple nominally identical molecules in parallel, leading to significant information redundancies. The authors introduce composite DNA alphabets, which use mixed DNA base types to leverage this redundancy and enable higher density. They develop encoding and decoding for composite DNA-based storage, including error correction. Using current DNA synthesis technologies, they encode 6.4 MB of data into composite DNA, achieving a ~25% increase in capacity compared to the literature.






□ MiniScrub: de novo long read scrubbing using approximate alignment and deep learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/02/433573.full.pdf

MiniScrub is a novel Convolutional Neural Network (CNN)-based method for de novo identification and subsequent “scrubbing” (removal) of low-quality Nanopore read segments. MiniScrub first generates read-to-read alignments with MiniMap, then encodes the alignments into images, and finally builds CNN models to predict low-quality segments that could be scrubbed based on a customized quality cutoff. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors.






□ poreTally: run and publish de novo Nanopore assembler benchmarks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/23/424184.full.pdf

poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report. To run the assembly pipelines, poreTally relies on the Snakemake workflow management system and its excellent integration with conda environments.




□ Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/434118.full.pdf

The authors determine precision and recall, present high-confidence and high-sensitivity call sets of variants, and discuss optimal parameters. The aligner Minimap2 and the structural variant caller Sniffles are both the most accurate and the most computationally efficient tools. While LAST is the aligner recommended by the authors of NanoSV, the number of variants identified was excessive, with a high number of false positives. NanoSV, too, obtained the best results after Minimap2 alignment.

In this comparison with SVs called from short read sequencing data using Manta and Lumpy a clear advantage for long reads was demonstrated, with substantially higher recall values.




□ rCASC: reproducible Classification Analysis of Single Cell sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/430967.full.pdf

rCASC is a modular RNAseq analysis workflow that covers data analysis from counts generation to identification of cell sub-population signatures, granting both functional and computational reproducibility. CASC uses “kernel based similarity learning” as its core application to detect cell clusters. It covers identification of the optimal number of clusters for cell partitioning and evaluation of cluster stability, which measures the permanence of a cell in a cluster upon random removal of subsets of cells.




□ Time and space dimensions of gene dosage imbalance of aneuploidies revealed by single cell transcriptomes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/23/424887.full.pdf

Gene dosage imbalance is of a bidimensional nature: over time (simultaneous expression of all alleles resulting in increased accumulation of RNA of copy-altered genes), as previously stated, and over space (an increased fraction of cells simultaneously expressing copy-altered genes).






□ fastp: an ultra-fast all-in-one FASTQ preprocessor:

>> https://academic.oup.com/bioinformatics/article/34/17/i884/5093234

fastp provides functions including quality profiling, adapter trimming, read filtering and base correction. It supports both single-end and paired-end short read data and provides basic support for long-read data. fastp includes most features of FASTQC + Cutadapt + Trimmomatic + AfterQC while running 2–5 times faster than any of them alone.




□ SemGen: a tool for semantics-based annotation and composition of biosimulation models:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty829/5107020

A key SemGen capability is to decompose and then integrate models across existing model exchange formats including SBML and CellML. To support this capability, SemGen uses semantic annotations to explicitly capture the underlying biological and physical meanings of the entities and processes that are modeled. SemGen leverages annotations to expose a model’s biological and computational architecture and to help automate model composition.




□ Grouper: graph-based clustering and annotation for improved de novo transcriptome analysis:

There are two main modules in Grouper: the clustering module and the labeling module. The former is based on the tool, RapClust, and is designed to be run downstream of the Sailfish or Salmon tools for rapid transcript-level quantification. It relies on the fragment equivalence classes, orphaned read mappings and quantification information computed by these tools in order to determine how contigs in the assembly are potentially related and cluster them accordingly.







□ Energy performance optimization in buildings: A review on semantic interoperability, fault detection, and predictive control:

>> https://aip.scitation.org/doi/full/10.1063/1.5053110

The traditional architecture of BAS is often represented as a three-layer architecture: the field layer includes sensors, actuators, and controllers interconnected via field buses like KNX, LON, or wireless networks like ZigBee or Z-Wave. The automation layer consists of PLCs covering measurement processing, control, and alarm tasks for the devices of the field layer and uses protocols of both the field and the management layer. The management layer forms the upper tier of the architecture and is constituted of supervisory control systems (SCS), human-machine interfaces (HMI) with configuration and monitoring features, as well as databases for time series data archival (DBs). Typical protocols of the management layer are BACnet or OPC.




□ rMETL: sensitive mobile element insertion detection with long read realignment:

>> https://www.biorxiv.org/content/biorxiv/early/2018/09/19/421560.full.pdf

rMETL takes advantage of its novel chimeric read re-alignment approach to handle complex MEI signals well. Benchmarking results demonstrate that rMETL can produce high-quality callsets to improve long read-based MEI calling.






□ scClustViz: Single-cell RNAseq cluster assessment and visualization:

>> https://f1000research.com/articles/7-1522/v1

Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. scClustViz provides interactive visualisation of cluster-specific distributions of technical factors, predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation and identification of specific marker genes; and gene expression distributions.






□ Genomic prediction of autotetraploids; influence of relationship matrices, allele dosage, and continuous genotyping calls in phenotype prediction:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/432179.full.pdf

Continuous genotypic based models performed as well as the current best models and presented a significantly better goodness-of-fit for all traits analyzed. This approach also reduces the computational time required for marker calling and avoids problems associated with misclassification of genotypic classes when assigning dosage in polyploid species.






□ scRNA-seq mixology: towards better benchmarking of single cell RNA-seq protocols and analysis methods:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/433102.full.pdf

The study is built on a realistic benchmark experiment that included mixtures of single cells or ‘pseudo-cells’ created by sampling admixtures of cells or RNA from 3 distinct cancer cell lines. The comparison shows that the 10X Chromium platform produces the highest quality data, while Drop-seq and CEL-seq2 are very flexible protocols with various parameters that can be optimized and tuned.

The study also makes systematic method comparisons for 4 key tasks: normalization and imputation, clustering, trajectory analysis and data integration. The performance of methods varied across datasets, with no clear winner in all situations; however, consistently satisfactory results were observed for scran, Linnorm, DrImpute and SAVER for normalization and imputation; Seurat and SC3 for clustering; Monocle2 and Slingshot for trajectory analysis; and MNN for data integration.




□ mirtronDB: a mirtron knowledge base:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/429522.full.pdf

Mirtrons originate from short introns through atypical cleavage, departing from the canonical miRNA pathway by using the splicing mechanism. mirtronDB, the first knowledge base dedicated to mirtrons, contains a total of 1,407 mirtron precursors and 2,426 mature mirtron sequences in 18 species.




□ dream: Powerful differential expression analysis for repeated measures designs:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/432567.full.pdf

The dream model provides:
• multiple random effects
• variance terms that vary across genes
• estimation of residual degrees of freedom for each model from the data in order to reduce false positives
• hypothesis testing with moderated t-statistics using an empirical Bayes approach
• fast hypothesis testing for fixed effects in linear mixed models
• a small-sample-size hypothesis test to increase power
• precision weights to model measurement error in RNA-seq counts
• seamless integration with the widely used workflow of limma




□ Novel Data Transformations for RNA-seq Data Analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/432690.full.pdf

Simulation studies showed that limma applied to data transformed with the rv transformation performed best among the transformation methods compared, in terms of high accuracy and low FNR, while keeping FDR at the nominal level. For large sample sizes, limma with the r2 transformation performed better than limma with the voom transformation. In real data analysis, several of the proposed transformations (l2, l, r2, r, rv, and rv2) performed better than voom.






□ Bazam : A rapid method for read extraction and realignment of high throughput sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/03/433003.full.pdf

Bazam is a tool that efficiently extracts the original paired FASTQ from reads stored in aligned form (BAM or CRAM format). Bazam extracts reads in a format that directly allows realignment with popular aligners with high concurrency. Bazam increases parallelism by splitting the output streams into multiple paths for separate realignment. A single source alignment can be realigned using an unlimited number of parallel aligners, significantly accelerating the process when a computational cluster is available. By eliminating steps and increasing the accessible concurrency, Bazam facilitates up to a 90% reduction in the time required for realignment.





□ GPyTorch beta: Scalable Gaussian processes in PyTorch, with strong GPU acceleration.

>> https://gpytorch.ai/

GPyTorch provides significant GPU acceleration (through inference based on matrix-vector multiplication, MVM) and implementations of the latest algorithmic advances for scalability and flexibility (SKI/KISS-GP, stochastic Lanczos expansions, LOVE, SKIP, stochastic variational deep kernel learning, ...).
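
To make "MVM-based inference" concrete: the linear solve at the heart of Gaussian process regression can be done using only products of the kernel matrix with vectors, for example with conjugate gradients. The sketch below is a toy, pure-Python version of that idea and does not use the GPyTorch API; the kernel, noise level, and function names are assumptions for illustration, and the real library works on batched GPU tensors with additional refinements.

```python
import math

def rbf_kernel_matrix(xs, lengthscale=1.0, noise=0.1):
    """Dense RBF kernel matrix with noise added on the diagonal (toy setup)."""
    n = len(xs)
    return [[math.exp(-0.5 * ((xs[i] - xs[j]) / lengthscale) ** 2)
             + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]

def matvec(K, v):
    """Plain matrix-vector product."""
    return [sum(k_ij * v_j for k_ij, v_j in zip(row, v)) for row in K]

def conjugate_gradient(mvm, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A using only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = mvm(p)
        step = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
K = rbf_kernel_matrix(xs)
weights = conjugate_gradient(lambda v: matvec(K, v), ys)  # solves K w = y
residual = max(abs(a - b) for a, b in zip(matvec(K, weights), ys))
print(residual)
```

The solver never inspects K directly, only the function that multiplies by it, so that function can be replaced by a structured or GPU-accelerated product without changing the solver.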






□ An information theoretic treatment of sequence-to-expression modeling:

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006459

Methodologically, an important feature of this approach is the generation of an ensemble by uniform sampling in the multi-dimensional space, followed by optimization, as in earlier work. Because the entropy of a probability distribution captures its intrinsic uncertainty, the difference in entropy between the original ensemble and the filtered ensemble (the information gain) can serve as an objective measure of how informative the experimental results are.
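
A minimal sketch of the information-gain idea, assuming each model in the ensemble is represented by a discrete label (for example a binned parameter value). The paper's actual entropy estimator over a continuous space is not given in this excerpt, so the function names and the toy ensemble are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Shannon entropy (in bits) of the empirical distribution over samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(original, filtered):
    """Entropy lost when the ensemble is narrowed by fitting to data."""
    return entropy_bits(original) - entropy_bits(filtered)

# Hypothetical ensemble: 8 equally likely models, of which 2 survive filtering.
original = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7"]
filtered = ["m2", "m5"]
gain = information_gain(original, filtered)
print(gain)
```

An experiment that leaves the ensemble unchanged yields a gain of zero, while one that rules out most models yields a large gain.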




□ Improving on hash-based probabilistic sequence classification using multiple spaced seeds and multi-index Bloom filters:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/434795.full.pdf

The authors present the multi-index Bloom filter (miBF), a novel probabilistic data structure based on Bloom filters that implicitly stores hashed data (to reduce memory usage) yet handles sequence polymorphisms and errors better by using multiple spaced seeds, increasing the sensitivity of hash-based sequence classification. miBF shows higher sensitivity and specificity for read binning than BWA MEM in an order of magnitude less time. For taxonomic classification, miBF shows higher sensitivity than CLARK-S in an order of magnitude less time while using half the memory.
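
To show why spaced seeds tolerate mismatches, here is a small sketch that hashes seed-masked k-mers into an ordinary Bloom filter. It is not the miBF data structure itself (which also associates entries with IDs); the seed patterns, filter size, and function names are hypothetical choices for illustration.

```python
import hashlib

class BloomFilter:
    """A plain Bloom filter: no false negatives, rare false positives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting the item with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def apply_seed(kmer, seed):
    """Keep bases at '1' positions of the seed and mask '0' positions."""
    return "".join(b if s == "1" else "*" for b, s in zip(kmer, seed))

SEEDS = ["11011011", "10111101"]  # hypothetical spaced seeds of length 8

def index_sequence(bf, seq, seeds=SEEDS):
    """Insert every k-mer of seq once per seed, after masking."""
    k = len(seeds[0])
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for j, seed in enumerate(seeds):
            bf.add(f"{j}|{apply_seed(kmer, seed)}")

def count_seed_hits(bf, kmer, seeds=SEEDS):
    """Number of seeds under which the masked k-mer is found in the filter."""
    return sum(f"{j}|{apply_seed(kmer, seed)}" in bf
               for j, seed in enumerate(seeds))

bf = BloomFilter()
index_sequence(bf, "ACGTTGCAAGGCT")
print(count_seed_hits(bf, "ACGTTGCA"))  # exact k-mer from the reference
print(count_seed_hits(bf, "ACATTGCA"))  # same k-mer with one substitution
```

The substituted base falls on a masked position of the first seed, so that seed still matches even though the exact k-mer is absent.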




□ Quasi-universality in single-cell sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/10/05/426239.full.pdf

Applying this direct approach across technologies shows that roughly 95% of the eigenvalues derived from each single-cell data set can be described by universal distributions predicted by Random Matrix Theory. Interestingly, the remaining 5% of the spectrum deviates from these distributions and exhibits a phenomenon known as eigenvector localization, where information concentrates tightly in groups of cells.
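
Two standard Random Matrix Theory quantities help read this summary: the Marchenko-Pastur edges, which bound the eigenvalue bulk expected from pure noise, and the inverse participation ratio (IPR), a common measure of how localized an eigenvector is. The sketch below assumes these are the relevant tools; the excerpt does not say which statistics the paper actually uses, and the matrix dimensions are hypothetical.

```python
import math

def marchenko_pastur_edges(n_samples, n_features, sigma2=1.0):
    """Bulk edges for eigenvalues of a sample covariance matrix of pure noise.

    Assumes an n_samples x n_features data matrix with i.i.d. entries of
    variance sigma2 and covariance X^T X / n_samples.
    """
    gamma = n_features / n_samples
    lower = sigma2 * (1.0 - math.sqrt(gamma)) ** 2
    upper = sigma2 * (1.0 + math.sqrt(gamma)) ** 2
    return lower, upper

def inverse_participation_ratio(vec):
    """IPR of a vector: 1/N when fully spread out, 1 when on a single entry."""
    norm2 = sum(v * v for v in vec)
    return sum(v ** 4 for v in vec) / norm2 ** 2

lower, upper = marchenko_pastur_edges(n_samples=400, n_features=100)
print(lower, upper)

spread = [1.0, 1.0, 1.0, 1.0]      # delocalized: weight shared by all entries
peaked = [0.0, 0.0, 1.0, 0.0]      # localized: weight on one entry
print(inverse_participation_ratio(spread), inverse_participation_ratio(peaked))
```

Eigenvalues outside the noise bulk, paired with eigenvectors of high IPR, are the kind of deviation the summary describes as information concentrating in groups of cells.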





Víkingur Ólafsson / "JOHANN SEBASTIAN BACH"

2018-10-03 22:15:02 | art music


□ Víkingur Ólafsson / "JOHANN SEBASTIAN BACH"

>> https://vikingurolafsson.com
>> https://itunes.apple.com/jp/album/johann-sebastian-bach/1410551592


Release date: 06/Sep/2018
Label: Deutsche Grammophon

Víkingur Ólafsson - J.S. Bach: Concerto in D Minor, BWV 974 - 2. Adagio


“Rare combination of sheer
technical brilliance, expressive
control and interpretative depth”


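An up-and-coming pianist known as "Iceland's Glenn Gould." This is the latest in Deutsche Grammophon's signature series of contemporary takes on classical music. This time the album centers on Bach's keyboard works, skillfully interweaving arrangements by later composers and by the pianist himself into a kaleidoscopic whole.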




□ Víkingur Ólafsson - Bach Reworks (Pt. 1)

>> https://itunes.apple.com/jp/album/bach-reworks-pt-1-ep/1434050758

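This one is an EP of variations on, and remixes of, the "Prelude and Fugue in E minor, BWV". The title track, arranged in a more post-classical style, is a highlight, but there is also plenty to hear in Valgeir Sigurðsson's avant-garde remix, which exploits the "saturation" of timbre and noise, and in the sharp, cold, metaphysical sound of Ben Frost, the "philosopher of noise," which gathers dazzling light into focus while stretching endlessly along the horizontal axis.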