lens, align.

Lang ist die Zeit, es ereignet sich aber das Wahre. (Long is the time, but what is true comes to pass.)

We Are More Than We Are.

2018-12-31 23:57:50 | Science News
(Floaters 1: Andreas Levers)






□ Long-read sequence and assembly of segmental duplications:

>> https://www.nature.com/articles/s41592-018-0236-3

a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication Assembler (SDA) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs.
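
The attraction/repulsion idea is easy to illustrate outside SDA itself: treat paralogous sequence variants (PSVs) as nodes, let reads that carry PSVs together contribute positive (attraction) weights and conflicting reads contribute negative (repulsion) weights, then partition greedily. A minimal sketch with hypothetical PSV names and weights, not SDA's actual implementation:

# Toy correlation clustering over a signed PSV graph (attraction > 0, repulsion < 0).
# Illustrative sketch only, not the SDA algorithm.
from collections import defaultdict

edges = {  # hypothetical signed edge weights between PSV ids
    ("psv1", "psv2"): 5,   # many reads carry both -> attraction
    ("psv1", "psv3"): -4,  # reads carrying psv1 conflict with psv3 -> repulsion
    ("psv2", "psv3"): -3,
    ("psv3", "psv4"): 6,
}

adj = defaultdict(dict)
for (u, v), w in edges.items():
    adj[u][v] = w
    adj[v][u] = w

clusters = []
for node in adj:                          # greedy assignment in insertion order
    best, best_score = None, 0.0
    for cluster in clusters:
        score = sum(adj[node].get(m, 0) for m in cluster)
        if score > best_score:            # join the cluster with strongest net attraction
            best, best_score = cluster, score
    if best is None:
        clusters.append({node})           # start a new paralog group
    else:
        best.add(node)

print(clusters)   # e.g. [{'psv1', 'psv2'}, {'psv3', 'psv4'}]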






□ scAlign: a tool for alignment, integration and rare cell identification from scRNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/22/504944.full.pdf

scAlign, a deep learning method for aligning and integrating scRNA-seq data collected across multiple conditions into a common, low-dimensional cell state space for downstream analyses such as clustering and trajectory inference across conditions. scAlign simultaneously aligns scRNA-seq from multiple conditions and performs a non-linear dimensionality reduction on the transcriptomes, and is largely robust to the size of the architecture (network depth and width) and to the choice of hyperparameters.






□ High-dimensional Bayesian network inference from systems genetics data using genetic node ordering:

>> https://www.biorxiv.org/content/early/2018/12/24/501460

Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks from high-dimensional systems genetics data; it outperforms MCMC methods by assembling pairwise causal inference results into a global causal network.






□ nanoNOMe: Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/22/504993.full.pdf

nanoNOMe combines the ability of NOMe-seq to simultaneously evaluate CpG methylation and chromatin accessibility with long-read nanopore sequencing technology. Using the bisulfite mode in IGV, methylation can be viewed over the length of long reads at single-read resolution.






□ SeQuiLa-cov: A fast and scalable library for depth of coverage calculations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/13/494468.full.pdf

SeQuiLa-cov, an extension to the recently released SeQuiLa platform, runs a redesigned event-based algorithm in a distributed environment, providing efficient depth of coverage calculations and reaching more than a 100x speedup over state-of-the-art tools. The performance and scalability of SeQuiLa-cov allow exome- and genome-wide calculations to run locally or on a cluster while hiding the complexity of distributed computing behind a Structured Query Language (SQL) Application Programming Interface.
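
The event-based idea is simple to sketch outside any distributed framework: each alignment adds a +1 event at its start and a -1 event just past its end, and a running sum over sorted event positions yields per-base depth. A single-node toy illustration (intervals are hypothetical):

# Event-based depth of coverage: +1 at read start, -1 one past read end,
# then a prefix sum over sorted positions. Single-node sketch only.
from collections import defaultdict

reads = [(100, 150), (120, 180), (160, 200)]  # hypothetical 0-based, end-exclusive alignments

events = defaultdict(int)
for start, end in reads:
    events[start] += 1
    events[end] -= 1

depth, coverage = 0, []
positions = sorted(events)
for pos, nxt in zip(positions, positions[1:] + [None]):
    depth += events[pos]
    if nxt is not None and depth > 0:
        coverage.append((pos, nxt, depth))   # (block start, block end, depth)

print(coverage)  # [(100, 120, 1), (120, 150, 2), (150, 160, 1), (160, 180, 2), (180, 200, 1)]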






□ SVIM: Structural Variant Identification using Mapped Long Reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/13/494096.full.pdf

SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes, including similar types such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and on real datasets from PacBio and Nanopore sequencing machines.




□ EXtrACtor, a tool for multiple queries and data extractions from the EXAC and gnomAD database:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/10/483909.full.pdf

EXtrACtor queries the ExAC or gnomAD website for the data that would normally be returned in the browser (variant data, coverage information, etc.). Once the data is retrieved, filtering steps can be selected and executed interactively, updating the data in EXtrACtor in real time without the need to generate new queries to the ExAC database.






□ scCapsNet: a deep learning classifier with the capability of interpretable feature extraction, applicable for single cell RNA data analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/27/506642.full.pdf

The parallel fully connected neural networks function as feature extractors, analogous to the convolutional neural networks in the original CapsNet model. scCapsNet provides the precise contribution of each extracted feature to cell type recognition, and could be used in classification scenarios where multiple information sources are available, such as multi-omic datasets generated across different biological layers.




□ elPrep 4: A multithreaded framework for sequence analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/10/492249.full.pdf

elPrep 4, a reimplementation from scratch of the elPrep framework for processing sequence alignment map files in the Go programming language. elPrep executes the preparation steps of the GATK Best Practices up to 13x faster on WES data, and up to 7.4x faster for WGS data compared to running the same pipeline with GATK 4, while utilizing fewer compute resources.




□ Scavager: a versatile postsearch validation algorithm for shotgun proteomics based on gradient boosting:

>> https://onlinelibrary.wiley.com/doi/abs/10.1002/pmic.201800280

Scavager employs CatBoost, an open-source gradient boosting library, and shows improved efficiency compared with other machine-learning-based post-search tools such as Percolator, PeptideProphet, and Q-ranker. Scavager is a proteomics post-search validation tool; currently supported search engines are IdentiPy, X!Tandem, Comet, MSFragger, MSGF+, and Morpheus.
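
The rescoring step can be sketched directly with the CatBoost library; the feature set and target/decoy labels below are hypothetical placeholders, not Scavager's actual features or pipeline:

# Sketch: rescoring PSMs with a gradient-boosted classifier (target = 1, decoy = 0).
# Hypothetical features; not Scavager's actual pipeline.
import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))          # e.g. search score, mass error, missed cleavages, charge
y = rng.integers(0, 2, size=1000)       # 1 = target PSM, 0 = decoy PSM

model = CatBoostClassifier(iterations=200, depth=4, learning_rate=0.1, verbose=False)
model.fit(X, y)
psm_scores = model.predict_proba(X)[:, 1]   # higher = more target-like PSM
print(psm_scores[:5])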




□ npGraph - Resolve assembly graph in real-time using nanopore data:

>> https://github.com/hsnguyen/assembly

npGraph is another real-time scaffolder besides npScarf. Instead of using contig sequences as pre-assemblies, this tool is able to work on an assembly graph (e.g., from SPAdes). If the sequences are given, then it is mandatory to have either BWA-MEM or minimap2 installed on your system to do the alignment between the long reads and the pre-assemblies.




□ m-pCMF / ZINBayes: Scalable probabilistic matrix factorization for single-cell RNA-seq analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/14/496810.full.pdf

Two novel generative models for dimensionality reduction: modified probabilistic count matrix factorization (m-pCMF) and Bayesian zero-inflated negative binomial factorization (ZINBayes). In terms of cell type separability in the reduced spaces, m-pCMF and ZINBayes yield higher ASW scores and moderately lower ARI and NMI scores than all competing methods, except pCMF, in the ZEISEL data set.
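
For reference, the three reported metrics (ASW, ARI, NMI) can be computed with scikit-learn on any low-dimensional embedding and clustering; the arrays below are simulated stand-ins:

# Computing ASW, ARI and NMI for a reduced-dimension embedding of cells.
import numpy as np
from sklearn.metrics import silhouette_score, adjusted_rand_score, normalized_mutual_info_score

rng = np.random.default_rng(1)
embedding = rng.normal(size=(300, 10))        # cells x latent factors (hypothetical)
true_labels = rng.integers(0, 3, size=300)    # known cell types
pred_labels = rng.integers(0, 3, size=300)    # clusters found in the latent space

asw = silhouette_score(embedding, true_labels)            # average silhouette width
ari = adjusted_rand_score(true_labels, pred_labels)       # adjusted Rand index
nmi = normalized_mutual_info_score(true_labels, pred_labels)
print(f"ASW={asw:.3f} ARI={ari:.3f} NMI={nmi:.3f}")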




□ CDSeq: A novel computational complete deconvolution method using RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/14/496596.full.pdf

CDSeq is a complete deconvolution algorithm that takes RNA-seq data (raw read counts) from a collection of possibly heterogeneous samples as input and returns estimates of GEPs for each constituent cell type as well as the proportional representation of those cell types.




□ Hera-EM: A revisit of RSEM generative model and its EM algorithm for quantifying transcript abundances.

>> https://www.biorxiv.org/content/early/2018/12/21/503672

100x faster than RSEM with better accuracy. Hera-EM identifies and removes early-converged parameters to significantly reduce the model complexity in further iterations, and uses the SQUAREM method to speed up convergence. On a data set with 60 million reads, RSEM takes about an hour (3432 seconds) for the EM step alone, while Hera-EM needs half a minute (24 seconds); on another data set with 75 million reads, RSEM takes about 1.5 hours (5044 seconds), while Hera-EM takes 39 seconds.
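
The plain EM iteration that both tools accelerate is compact enough to write out; the read-compatibility classes below are hypothetical, transcript lengths are ignored, and the SQUAREM acceleration and parameter pruning used by Hera-EM are omitted:

# Plain (unaccelerated) EM for transcript abundances from read-compatibility classes.
# Hypothetical toy data; effective transcript lengths are ignored for brevity.
import numpy as np

# equivalence classes: (tuple of compatible transcript ids, read count)
eq_classes = [((0,), 120), ((0, 1), 300), ((1, 2), 180), ((2,), 60)]
n_tx = 3
theta = np.full(n_tx, 1.0 / n_tx)            # transcript abundance estimates

for _ in range(200):                          # E-step + M-step
    counts = np.zeros(n_tx)
    for tx_ids, n_reads in eq_classes:
        w = theta[list(tx_ids)]
        counts[list(tx_ids)] += n_reads * w / w.sum()   # fractional read assignment
    new_theta = counts / counts.sum()
    if np.abs(new_theta - theta).max() < 1e-8:
        break
    theta = new_theta

print(theta)    # estimated relative abundances of the 3 transcripts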




□ Real-Time Point Process Filter for Multidimensional Decoding Problems Using Mixture Models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/23/505289.full.pdf

The posterior distribution at each filtering time-step can be approximated using a Gaussian Mixture Model. The algorithm provides a real-time solution for the multi-dimensional point-process filtering problem and attains accuracy comparable to the exact solution.






□ SNIPER: Revealing Hi-C subcompartments by imputing high-resolution inter-chromosomal chromatin interactions

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/23/505503.full.pdf

a new computational approach, named SNIPER, based on an autoencoder and multilayer perceptron classifier to infer subcompartments using typical Hi-C datasets with moderate coverage.






□ Stochastic diffusion framework determines the free-energy landscape and rate from single-molecule trajectory:

>> https://www.ncbi.nlm.nih.gov/pubmed/30579309

This manuscript reports a general theoretical/computational methodology that characterises D(Q) [and, by consequence, F(Q) and τf] given only one single-molecule time-dependent trajectory [Q(t)] as input. The stochastic approach recovered v and F, to which the lattice-model simulations were subjected, by simply imposing Gaussian distributions diffusing along a one-dimensional reaction coordinate.




□ Cryfa: a secure encryption tool for genomic data:

>> https://academic.oup.com/bioinformatics/article/35/1/146/5055587

Cryfa, a fast secure encryption tool for genomic data, namely in Fasta, Fastq, VCF, SAM and BAM formats, which is also capable of reducing the storage size of Fasta and Fastq files. Cryfa uses advanced encryption standard (AES) encryption combined with a shuffling mechanism, which leads to a substantial enhancement of the security against low data complexity attacks.




□ RGAAT: A Reference-based Genome Assembly and Annotation Tool for New Genomes and Upgrade of Known Genomes:

>> https://www.sciencedirect.com/science/article/pii/S1672022918304376

RGAAT can detect sequence variants with precision, specificity, and sensitivity comparable to GATK, and with higher precision and specificity than FreeBayes and SAMtools, on four DNA-seq datasets, and can identify sequence variants based on cross-cultivar or cross-version genomic alignments.






□ sn-m3C-seq: Single-cell multi-omic profiling of chromatin conformation and DNA methylome:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/26/503235.full.pdf

The sn-m3C-seq method allows unequivocal clustering of cell types using two orthogonal types of epigenomic information and the reconstruction of cell-type specific chromatin conformation maps.






□ ORGaNICs: A Canonical Neural Circuit Computation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/26/506337.full.pdf

The ORGaNICs (Oscillatory Recurrent GAted Neural Integrator Circuits) theory provides a means for reading out information from the dynamically varying responses at any point in time, in spite of the complex dynamics.






□ A Darwinian Uncertainty Principle:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/26/506535.full.pdf

The simulation results have a similar flavor to a fundamental principle in quantum physics – Heisenberg's uncertainty principle – which provides an absolute lower bound on the precision of simultaneously estimating both the position and the momentum of a particle. The phylogenetic analogue of 'position' is 'ancestral state', and 'momentum' (closely related to velocity) thus corresponds to the rates at which ancestral states change into different alternative states.




□ HiCluster: A Robust Single-Cell Hi-C Clustering Method Based on Convolution and Random Walk:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/27/506717.full.pdf

HiCluster is a single-cell clustering algorithm for Hi-C contact matrices that is based on imputation using linear convolution and random walk, benchmarked on both simulated and real data.
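
A rough sketch of the two imputation steps on a single-cell contact matrix, linear convolution as a smoothing filter followed by a random walk with restart; the toy matrix and parameters are hypothetical and this is not the authors' implementation:

# Sketch: smooth a sparse single-cell Hi-C contact matrix by 2D convolution,
# then propagate signal with a random walk with restart. Toy data only.
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(2)
contacts = (rng.random((200, 200)) < 0.01).astype(float)
contacts = np.triu(contacts) + np.triu(contacts, 1).T        # symmetric toy contact map

smoothed = uniform_filter(contacts, size=3)                   # linear convolution step

# random walk with restart on the row-normalized matrix
P = smoothed / np.maximum(smoothed.sum(axis=1, keepdims=True), 1e-12)
restart, Q = 0.5, np.eye(len(P))
for _ in range(30):
    Q = (1 - restart) * P @ Q + restart * np.eye(len(P))

imputed = (Q + Q.T) / 2                                       # symmetrized imputed matrix
print(imputed.shape)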






□ S3norm: simultaneous normalization of sequencing depth and signal-to-noise ratio in epigenomic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/26/506634.full.pdf

S3norm is a robust way to normalize both sequencing depths (SDs) and signal-to-noise ratios (SNRs) across multiple data sets. S3norm can become increasingly useful for normalizing signals across diverse and heterogeneous epigenomic data sets and for better highlighting true epigenetic changes against technical bias.






□ Time-resolved mapping of genetic interactions to model rewiring of signaling pathways:

>> https://elifesciences.org/articles/40174

Genetic interactions form along different trajectories; the authors developed an algorithm, termed MODIFI, to analyze how genetic interactions rewire over time.




□ EMEP: Single-cell RNA-seq Interpretations using Evolutionary Multiobjective Ensemble Pruning:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty1056/5265329

EMEP is designed to dynamically select suitable clustering results from the ensembles. The algorithm first applies unsupervised dimensionality reduction to project data from the original high-dimensional space into low-dimensional subspaces.




□ FINDOR: Leveraging Polygenic Functional Enrichment to Improve GWAS Power

>> https://www.cell.com/ajhg/fulltext/S0002-9297(18)30411-7




□ TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/28/507525.full.pdf

TIGAR (Transcriptome-Integrated Genetic Association Resource) integrates both data-driven nonparametric Bayesian and Elastic-Net models for transcriptomic data imputation, along with TWAS and summary-level GWAS data for univariate and multi-variate phenotypes.






□ LATE / TRANSLATE: Imputation of single-cell gene expression with an autoencoder neural network:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/29/504977.full.pdf

TRANSLATE builds on LATE and further incorporates a reference gene expression data set (bulk gene expression or a larger scRNA-seq data set) through transfer learning.




□ R/qtl2: Software for Mapping Quantitative Trait Loci with High-Dimensional Data and Multi-parent Populations

>> http://www.genetics.org/content/early/2018/12/26/genetics.118.301595

R/qtl2 is designed to handle modern high-density genotyping data and high-dimensional molecular phenotypes including gene expression and proteomics.






□ Janggu - Deep learning for Genomics:

>> https://github.com/BIMSBbioinfo/janggu

Janggu provides special Genomics datasets that allow you to access raw data in FASTA, BAM, BIGWIG, BED and GFF file format.






□ Time-lagged Ordered Lasso for network inference

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2558-7

A regularized regression method with temporal monotonicity constraints for de novo network reconstruction.




□ Libra: scalable k-mer based tool for massive all-vs-all metagenome comparisons:

>> https://academic.oup.com/gigascience/advance-article/doi/10.1093/gigascience/giy165/5266304

Libra uses a scalable Hadoop framework for massive metagenome comparisons and cosine similarity for calculating distances from sequence composition and abundance while normalizing for sequencing depth.
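
The core distance is easy to reproduce on a single machine: build depth-normalized k-mer abundance vectors per sample and take their cosine similarity (the reads below are hypothetical; Libra distributes this computation over Hadoop):

# Cosine similarity between k-mer abundance profiles of two samples (single-node sketch).
from collections import Counter
from math import sqrt

def kmer_profile(reads, k=4):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())                       # depth normalization
    return {kmer: c / total for kmer, c in counts.items()}

def cosine(p, q):
    dot = sum(p[k] * q.get(k, 0.0) for k in p)
    return dot / (sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values())))

sample_a = ["ACGTACGTGGCC", "TTGGCCAACGTA"]            # hypothetical reads
sample_b = ["ACGTACGTGGCA", "TTGGCCAACGAA"]
print(cosine(kmer_profile(sample_a), kmer_profile(sample_b)))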





Thomas Bergersen / "In Orbit"

2018-12-30 01:21:06 | music18


□ Thomas Bergersen - In Orbit (feat. Cinda M.)

>> https://www.facebook.com/Thomas-Bergersen-147900228587129/


A majestic, space-drifting sci-fi vocal track by Thomas Bergersen. He is remarkably skilled at weaving sentimental pop into the sci-fi-movie-soundtrack sound he is known for. The influence of early-2000s trip hop runs deep.





Farout.

2018-12-27 23:33:16 | Science News





□ Science condemns to obsolescence the very instruments that allow it to progress. This heritage, often under threat, is protected by enthusiasts and institutions.

>> https://www.lemonde.fr/sciences/article/2018/12/18/patrimoine-scientifique-ces-instruments-sauves-de-l-oubli_5399475_1650684.html

A small sample of forgotten machines whose form and function intrigue us today.




□ SBOL-OWL: An ontological approach for formal and semantic representation of synthetic genetic circuits:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/499970.full.pdf

SBOL-OWL, an ontology for a machine-understandable definition of SBOL. This ontology acts as a semantic layer for genetic circuit designs. As a result, computational tools can understand the meaning of design entities in addition to parsing structured SBOL data. Semantic reasoning has huge potential to verify genetic circuit structures: constraints between any two DNA parts can be captured using SBOL-OWL and can easily be integrated with the set of new terms in order to validate genetic circuits based on the order of DNA components.




□ SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/499863.full.pdf

They find that the non-zero part of scRNA-seq data fits the negative binomial distribution well, similar to bulk RNA-seq data, but that there can be a high probability of a gene being dropped out in single-cell data. SCeQTL uses zero-inflated negative binomial regression to do eQTL analysis on single-cell data. It can distinguish two types of gene expression differences among different genotype groups, and can also be used to find gene expression variation associated with other grouping factors such as cell lineages.
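
As a rough single-gene sketch of the model class (not the SCeQTL package itself), a zero-inflated negative binomial regression of expression counts on genotype can be fit with statsmodels; the exact class and arguments used below are my assumption about the statsmodels API, and the data are simulated:

# Sketch: zero-inflated negative binomial regression of one gene's counts on genotype.
# Assumes statsmodels exposes ZeroInflatedNegativeBinomialP as below; data are simulated.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(3)
genotype = rng.integers(0, 3, size=500)              # 0/1/2 alternate allele dosage per cell
mu = np.exp(0.5 + 0.4 * genotype)
counts = rng.negative_binomial(n=2, p=2 / (2 + mu))  # NB-distributed expression
counts[rng.random(500) < 0.3] = 0                    # extra dropout zeros

X = sm.add_constant(genotype.astype(float))
model = ZeroInflatedNegativeBinomialP(counts, X, exog_infl=np.ones((500, 1)), p=2)
result = model.fit(method="bfgs", maxiter=500, disp=False)
print(result.params)       # the genotype coefficient indicates an eQTL-like effect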




□ gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/501502.full.pdf

This tool clusters the mapped sequencing reads and merges each cluster to generate one consensus read. If the data has unique molecular identifiers (UMIs), gencore uses them to identify reads derived from the same original DNA fragment. This error-suppressing feature makes gencore very suitable for detecting ultra-low-frequency mutations in deep sequencing data.






□ Dashing: Fast and Accurate Genomic Distances with HyperLogLog:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/20/501726.full.pdf

Dashing uses the HyperLogLog sketch together with cardinality estimation methods that specialize in set unions and intersections. Dashing sketches genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in under 6 minutes.

Dashing also uses Single Instruction Multiple Data (SIMD or “Vector”) instructions on modern general-purpose computer processors to exploit the finer-grained parallelism inherent in calculating the HyperLogLog estimate.
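
A tiny illustration of the underlying idea, here using the datasketch library as a convenient Python HyperLogLog stand-in (Dashing itself is a separate tool with more refined estimators): sketch the k-mer sets of two sequences, merge for the union, and estimate the intersection by inclusion-exclusion.

# HyperLogLog sketches of k-mer sets; union via merge, intersection by inclusion-exclusion.
# Uses the datasketch library as a stand-in for Dashing's own estimators.
import random
from datasketch import HyperLogLog

def kmer_hll(seq, k=11, p=12):
    hll = HyperLogLog(p=p)
    for i in range(len(seq) - k + 1):
        hll.update(seq[i:i + k].encode("utf8"))
    return hll

random.seed(0)
seq_a = "".join(random.choice("ACGT") for _ in range(5000))            # hypothetical genomes
seq_b = seq_a[:3000] + "".join(random.choice("ACGT") for _ in range(2000))

a, b = kmer_hll(seq_a), kmer_hll(seq_b)
union = HyperLogLog(p=12)
union.merge(a)
union.merge(b)
inter = a.count() + b.count() - union.count()        # inclusion-exclusion estimate
jaccard = max(inter, 0.0) / union.count()
print(a.count(), b.count(), union.count(), jaccard)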






□ Cloud-BS: A MapReduce-based bisulfite sequencing aligner on cloud:

>> https://www.worldscientific.com/doi/abs/10.1142/S0219720018400280

Cloud-BS is an efficient bisulfite sequencing aligner designed for parallel execution in a distributed environment. Utilizing the Apache Hadoop framework, Cloud-BS splits sequencing reads into multiple blocks and transfers them to distributed nodes. By designing each alignment procedure as separate map and reduce tasks, with the internal key-value structure optimized for the MapReduce programming model, the algorithm significantly improves alignment performance without sacrificing mapping accuracy.






□ BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data:

>> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6288881/

BiSpark is a highly parallelized aligner for bisulfite-treated reads that utilizes a distributed environment to significantly improve alignment performance and scalability. BiSpark is designed on top of the Apache Spark distributed framework and shows highly efficient scalability; a highly optimized load-balancing algorithm implemented in BiSpark redistributes data almost evenly across the cluster nodes, achieving better scalability on large-scale clusters.






□ Coherent chaos in a recurrent neural network with structured connectivity:

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006309

Applying a perturbative approach to solve the dynamic mean-field equations shows that in this regime coherent fluctuations are driven passively by the chaos of local residual fluctuations. In this regime the dynamics depend qualitatively on the particular realization of the connectivity matrix: a complex leading eigenvalue can yield coherent oscillatory chaos, while a real leading eigenvalue can yield chaos with broken symmetry. The level of coherence grows with increasing strength of structured connectivity until the dynamics are almost entirely constrained to a single spatial mode.






□ A Deep Learning Genome-Mining Strategy Improves Biosynthetic Gene Cluster Prediction:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/18/500694.full.pdf

DeepBGC is a novel application of deep learning and natural language processing (NLP) that employs a Bidirectional Long Short-Term Memory (BiLSTM) RNN and a word2vec-like skip-gram word embedding the authors call pfam2vec. It addresses the algorithmic limitations of HMM-based approaches by implementing a deep learning approach using an RNN over vector representations of Pfam domains, which together, unlike HMMs, are capable of intrinsically sensing short- and long-term dependency effects between adjacent and distant genomic entities.




□ CRIP: Predicting circRNA-RBP interaction sites using a codon-based encoding and hybrid deep neural networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/18/499012.full.pdf

To fully exploit the sequence information, CRIP proposes a stacked codon-based encoding scheme and a hybrid deep learning architecture, in which a convolutional NN learns high-level abstract features and a recurrent NN learns long-range dependencies in the sequences. The CNN and BiLSTM hybrid components further learn high-level abstract features and contextual information from the encoding vectors, respectively.
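
A hybrid CNN + BiLSTM classifier of the kind described is straightforward to sketch in Keras; the layer sizes and the input shape standing in for the stacked codon encoding are placeholders, not CRIP's published architecture:

# Sketch of a hybrid CNN + BiLSTM binary classifier over encoded RNA fragments.
# Hypothetical input shape (length 101, 8 channels for a stacked codon-style encoding).
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(101, 8)),
    layers.Conv1D(64, kernel_size=7, activation="relu"),    # local abstract features
    layers.MaxPooling1D(pool_size=2),
    layers.Dropout(0.3),
    layers.Bidirectional(layers.LSTM(32)),                   # long-range context
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                   # binding site vs. background
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()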




□ FORGe: prioritizing variants for graph genomes:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1595-x

FORGe works in cooperation with a variant-aware read aligner (graph aligner) such as HISAT2. FORGe then uses a mathematical model to score each variant according to its expected positive and negative impacts on alignment accuracy and computational overhead. FORGe could consider factors such as the variant’s frequency in a population, its proximity to other variants, and how its inclusion affects the repetitiveness of the graph genome.






□ HiDRA: High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human:

>> https://www.nature.com/articles/s41467-018-07746-1

HiDRA overcomes the construct-length and region-count limitations of synthesis-based technologies at substantially lower cost, and its ATAC-based selection of open chromatin regions concentrates the signal on likely regulatory regions and enables high-resolution inferences. The HiDRA selection approach resulted in highly overlapping fragments (~32,000 regions covered by 10+ unique fragments, ~12,500 by 20+ fragments), enabling "driver" regulatory nucleotides that are critical for transcriptional enhancer activity to be pinpointed.




□ Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1603-1

By sequencing these tags alongside the cellular transcriptome, we can assign each cell to its original sample, robustly identify cross-sample multiplets, and “super-load” commercial droplet-based systems for significant cost reduction.




□ Fast and accurate differential transcript usage by testing equivalence class counts

>> https://www.biorxiv.org/content/early/2018/12/19/501106

Equivalence class (EC) counts have similar sensitivity and false discovery rates to exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. The transcript abundance estimates can be used as an alternative starting measure for DTU testing: the estimated transcript abundances perform well in detecting differential transcript usage, and pseudo-alignment is significantly faster than methods that map to a genome. Count-based DTU testing procedures such as DEXSeq are applied directly to quantifications generated by fast lightweight aligners such as Salmon and kallisto.




□ De-Novo-Designed Translational Repressors for Multi-Input Cellular Logic:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/501783.full.pdf

Automated in silico optimization of thermodynamic parameters yields improved toehold repressors with up to 300-fold repression, while in-cell SHAPE-Seq measurements of 3WJ repressors confirm their designed switching mechanism in living cells. The modularity, wide dynamic range, and low crosstalk of the repressors enable their direct integration into ribocomputing devices that provide universal NAND and NOR logic capabilities and can perform multi-input RNA-based logic.




□ G-Dash: A Genome Dashboard Integrating Modeling and Informatics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/501874.full.pdf

G-Dash unites the Interactive Chromatin Modeling (ICM) tools with the Biodalliance genome browser and the JSMol molecular viewer to rapidly fold any DNA sequence into atomic or coarse-grained models of DNA, nucleosomes or chromatin. G-Dash demonstrates that such an inventory of Masks can be maintained and converted to 3D structures, from single base pairs to entire chromosomes, in real time. In this manner, genome dashboards enable users to both define and navigate chromatin folding energy landscapes.






□ OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/500132.full.pdf

OncodriveCLUSTL, a new linear clustering algorithm to detect genomic regions and elements with significant clustering signals, based on a local background model derived from a cohort's observed tri- or penta-nucleotide substitution frequencies. OncodriveCLUSTL is an unsupervised clustering algorithm: it analyzes somatic mutations observed in genomic elements (GEs) across a cohort of samples.




□ Improved Representation of Sequence Bloom Trees:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/501452.full.pdf

Building on the Sequence Bloom Tree (SBT) framework, HowDe-SBT is a data structure that uses a novel partitioning of information to reduce construction and query time as well as the size of the index. The authors prove theoretical bounds on the performance of HowDe-SBT and also demonstrate its performance advantages on real data by comparing it to previous SBT methods and to mantis, a representative of the second category of indexing methods.




□ Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/19/501130.full.pdf

Nucleotide Archival Format (NAF) - a new file format for lossless reference-free compression of FASTA- and FASTQ-formatted nucleotide sequences. NAF's compression ratio is comparable to the best DNA compressors, while it provides 30 to 80 times faster decompression.




□ Expression reflects population structure

>> https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007841

Lior Pachter:
Expression reflects population structure: while PCA does not reveal population structure in RNAseq (e.g. @tuuliel et al.'s GEUVADIS), it is revealed via another projection. Interesting implications for eQTL discovery.

The method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. They identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results.




□ RAISS: Robust and Accurate imputation from Summary Statistics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/21/502880.full.pdf

RAISS is a Python package enabling the imputation of SNP summary statistics from neighboring SNPs by taking advantage of linkage disequilibrium. Neighboring SNPs are highly correlated variables, which makes inverting their correlation matrix prone to numerical instabilities, so RAISS inverts it with the Moore-Penrose pseudoinverse. To ensure numerical stability, eigenvalues below a given threshold are set to zero in the computation of the pseudoinverse.
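
The numerical core, a thresholded Moore-Penrose pseudoinverse of the local LD matrix used to impute a missing z-score, looks roughly like this (toy LD matrix and z-scores; not the RAISS code itself):

# Thresholded Moore-Penrose pseudoinverse of an LD (correlation) matrix:
# eigenvalues below a cutoff are zeroed before inverting. Toy sketch only.
import numpy as np

rng = np.random.default_rng(4)
G = rng.normal(size=(50, 200))             # 50 haplotypes x 200 SNPs (hypothetical)
ld = np.corrcoef(G, rowvar=False)          # LD matrix is rank-deficient here

def thresholded_pinv(mat, eig_cutoff=1e-2):
    vals, vecs = np.linalg.eigh(mat)
    inv_vals = np.where(vals > eig_cutoff, 1.0 / vals, 0.0)   # drop small/negative eigenvalues
    return (vecs * inv_vals) @ vecs.T

ld_nn = ld[1:, 1:]                          # LD among the observed neighbor SNPs
r_vec = ld[0, 1:]                           # correlation of the untyped SNP with neighbors
z_neighbors = rng.normal(size=199)          # observed neighbor z-scores (toy)
z_imputed = r_vec @ thresholded_pinv(ld_nn) @ z_neighbors
print(z_imputed)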




□ doepipeline: a systematic approach for optimizing multi-level and multi-step data processing workflows:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/21/504050.full.pdf

A DoE (design of experiments)-based strategy for systematically optimizing multi-level and multi-step data processing workflows; the application of doepipeline is exemplified on de novo assembly, scaffolding of contiguous sequence, and k-mer classification of long noisy reads generated by MinION. Optimal parameter settings are first approximated in a screening phase using a subset design that efficiently spans the entire search space, and are subsequently optimized in the following phase using response surface designs and OLS modeling.




□ qgg: an R package for large-scale quantitative genetic analyses:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/21/503631.full.pdf

qgg handles large-scale data by taking advantage of:

multi-core processing using openMP
multithreaded matrix operations implemented in BLAS libraries (OpenBLAS, ATLAS or MKL)
fast and memory-efficient batch processing of genotype data stored in binary files (PLINK bedfiles)






□ NCUA: A novel structure-based control method for analyzing nonlinear dynamics in biological networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/21/503565.full.pdf

NCUA is a novel and general graph-theoretic algorithm, built from the perspective of the feedback vertex set, to discover possible minimum sets of input nodes for controlling the network state. NCUA is based on the assumption that the edges of the undirected networks are modeled as bi-directed edges. NCUA determines the MDS of top-side nodes that covers the bottom-side nodes in the bipartite graph using Integer Linear Programming (ILP), and uses random Markov chain sampling to obtain different input node sets.




□ The Epistasis Boundary: Linear vs. Nonlinear Genotype-Phenotype Relationships:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/21/503466.full.pdf

Separability theory determines the conditions, corresponding to three biological criteria (directional consistency, environmental compensability, and pathway redundancy), that together make up an epistatic boundary between systems suitable and unsuitable for linear modeling, along with a classification of types of nonlinearity from a systems perspective.




□ Predicting complex genetic phenotypes using error propagation in weighted networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/21/487348.full.pdf

The authors investigate whether biological networks can be approximated as overlapping feed-forward networks in which the nodes have non-linear activation functions. Mathematical formalization of this model, followed by numerical simulations based on genomic data, allowed them to accurately predict the statistics of gene essentiality.




□ SeqCrispr: Identifying Context-specific Network Features for CRISPR-Cas9 Targeting Efficiency Using Accurate and Interpretable Deep Neural Network:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/24/505602.full.pdf

seqCrispr involves a sequence feature engineering layer. It utilizes unsupervised representation learning to find vector representations of 3-mers instead of a one-hot encoding. The hybrid model takes advantage of both RNN and CNN for feature engineering of the sgRNA, and makes the model more resistant to data noise. A word2vec embedding with Hilbert-curve filling may have an advantage over vertical stacking.






Catalyst.

2018-12-24 23:33:18 | Science News


"We Are What We Are."




□ The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/13/495754.full.pdf

Bellerophon first uses the quality-assessment tool TransRate to indicate quality, after which it uses a Transcripts Per Million (TPM) filter to remove lowly expressed contigs and CD-HIT-EST to remove highly identical contigs. The Bellerophon pipeline was able to remove between 40% and 91.9% of the chimeras in transcriptome assemblies and removed more chimeric than non-chimeric contigs.
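
The TPM filtering step is straightforward to reproduce on its own; the contig counts, lengths and threshold below are hypothetical, and the TransRate and CD-HIT-EST steps are external tools not shown:

# TPM computation and a simple low-expression filter (hypothetical contig counts/lengths).
counts = {"contig1": 1500, "contig2": 30, "contig3": 800}      # mapped read counts
lengths = {"contig1": 2000, "contig2": 900, "contig3": 1200}    # contig lengths in bp

rates = {c: counts[c] / (lengths[c] / 1000) for c in counts}    # reads per kilobase
scale = sum(rates.values())
tpm = {c: rates[c] / scale * 1e6 for c in rates}

min_tpm = 1.0
kept = [c for c, v in tpm.items() if v >= min_tpm]              # contigs passing the TPM filter
print(tpm, kept)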






□ NASC-seq monitors RNA synthesis in single cells:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/498667.full.pdf

NASC-seq reveals rapidly up- and down-regulated genes during T-cell activation, and the RNA sequenced for induced genes was essentially only newly synthesized. NASC-seq is based on a combination of RNA labeling with 4sU, RNA modification by alkylation as in SLAM-seq, RNA sequencing library preparation as in Smart-seq2, and data analysis that includes a computational model from GRAND-SLAM.




□ Consensify: a method for generating pseudohaploid genome sequences from palaeogenomic datasets with reduced error rates:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/18/498915.full.pdf

The error correction is derived directly from the data itself, without the requirement for additional genomic resources or simplifying assumptions such as contemporaneous sampling. The extended D statistic, rather than standard pseudohaploidisation, makes use of the complete read stack and can further apply a correction to error rates estimated by comparison to data from a high quality “error free” individual.




□ RACER: A data visualization strategy for exploring multiple genetic associations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/14/495366.full.pdf

Statistical methods have been developed to assess the likelihood that two associations (e.g. a disease locus and an eQTL) share a common causal variant; however, visualization of the two loci is often a crucial step in determining whether a locus is pleiotropic. The Regional Association ComparER (RACER) package creates mirror plots, in which the two associations are plotted on a shared x-axis. Mirror plots provide an effective tool for the visual exploration and presentation of the relationship between two genetic associations.




□ MapOptics: A light-weight, cross-platform visualisation tool for optical mapping alignment:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty1013/5232997

MapOptics is a lightweight cross-platform tool that enables the user to visualise and interact with the alignment of Bionano optical mapping data and can be used for in depth exploration of hybrid scaffolding alignments.





□ Unified Cox model based multifactor dimensionality reduction method for gene-gene interaction analysis of the survival phenotype:

>> https://biodatamining.biomedcentral.com/articles/10.1186/s13040-018-0189-1

Cox UM-MDR is easily implemented by combining Cox-MDR with UM-MDR to detect significant gene-gene interactions associated with survival time without cross-validation and permutation testing. Cox UM-MDR has similar power to Cox-MDR, but outperforms it in the presence of marginal effects and is more robust to heavy censoring when SNPs having only marginal effects might mask the detection of the causal epistasis.




□ Optimal Gene Filtering for Single-Cell data (OGFSC) – a gene filtering algorithm for single-cell RNA-seq data:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty1016/5237553

Precise identification of differentially expressed genes and cell populations is heavily dependent on the effective reduction of technical noise, for example, by gene filtering. However, there is still no well-established standard among current approaches to gene filtering. OGFSC (Optimal Gene Filtering for Single-Cell data) is a novel algorithm that constructs a thresholding curve based on gene expression levels and the corresponding variances.






□ susieR: "sum of single effects" (SuSiE) sparse multiple regression for fine-mapping in human genetic association studies:

>> https://github.com/stephenslab/susieR

The methods implemented here are particularly well-suited to settings where some of the X variables are highly correlated and the true effects are highly sparse (e.g., only a few variables have non-zero effects).


□ Nanopore sequencing and rapid fusion testing – a ‘killer app’ in molecular pathology

>> https://nanoporetech.com/resource-centre/william-jeck-nanopore-sequencing-and-rapid-fusion-testing-killer-app-molecular






□ VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database:

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2532-4

VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes).

VarGenius has been tested on a parallel computing cluster with 52 machines with 120 GB of RAM: a 50M whole-exome sequencing analysis was executed in about 7 h (trio or quartet), a joint analysis of 30 WES samples in about 24 h, and the parallel analysis of 34 single samples from a M panel in 2 h.

VarGenius also implements the GATK VQSR and the GRW pipelines for the joint analysis of hundreds of samples. The VarGenius database can be queried through an SQL programming interface with a very intuitive syntax, and a script (query_2_db.pl), described in the online user guide, allows basic automated queries to be performed.




□ artic.network: LHFV Current Outbreak Simulation

>> https://artic-network.github.io/artic-workshop/

#RealTimeGhana18






□ The Future of Science and Science of the Future: Vision and Strategy for the African Open Science Platform (v02)

>> https://zenodo.org/record/2222418#.XBYWFy3AORs

The future of nanopore for viral genome sequencing: participants will return home and train others, so Africa will be epidemic ready.






@pathogenomenick
12-plex viral genomes- 100s of x coverage per genome from participant-prepared libraries generated in a few minutes, basecalled with guppy and visualised both in real-time by @hamesjadfield RAMPART. And the negative is clean! #RealTimeGhana18


AineToole
RAMPART successfully detecting contaminants and depth of coverage direct from MinION sequencer in real time. #RealTimeGhana18




arambaut:
A nanopore MinIT running a Flongle, wireless controlled by a laptop running @NetworkArtic RAMPART to do real-time viral analysis as the reads are base-called. #RealTimeGhana18


kirstynbrunker:
Team EARRT @george_l present their idea for a measles surveillance system in East Africa to help deal with challenges of widespread movement and understand real-time pathogen spread #RealTimeGhana18 @NetworkArtic






K_G_Andersen:

>> https://www.nature.com/articles/s41564-018-0296-2

genomic epidemiology and how infectious disease genomics can be really helpful in tracking and understanding outbreaks? We tried to cram it all into a single review @NatureMicrobiol:




□ Real-time measurement of protein–protein interactions at single-molecule resolution using a biological nanopore:

>> https://www.nature.com/articles/nbt.4316

engineering a genetically encoded sensor for real-time sampling of transient PPIs at single-molecule resolution. This sensor comprises a truncated outer membrane protein pore, a flexible tether, a protein receptor and a peptide adaptor. This selective nanopore sensor could be applied for single-molecule protein detection, could form the basis for a nanoproteomics platform or might be adapted to build tools for protein profiling and biomarker discovery.






gaurav_bio:

Computational efficiency gains from kallisto paired with high concordance to traditional methods = no brainer: use pseudo-alignment/quasi-mapping. #bioinformatics #genomics

>> https://link.springer.com/protocol/10.1007/978-1-4939-8868-6_2

@lpachter A comparison of RSEM and kallisto for reproducible cancer RNA-seq analysis with workflow design.




□ Israel to Sequence 100K People, Create Genomic Database to Support 'Digital Health' and establish itself as a "world leader in precision medicine".

>> https://www.genomeweb.com/sequencing/israel-sequence-100k-people-create-genomic-database-support-digital-health






ReindertN:
Great @nanopore sequencing run on #microbiome of bleaching #coral to start the weekend: first ~2H yielded >1.1 million reads and >1.1 GBases (60 pooled barcoded amplicons) Using a fresh RevD flowcell and new MinION release 18.12.4. Also superbly nice pore occupancy of >90%




□ Qtlizer: comprehensive QTL annotation of GWAS results:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/495903.full.pdf

After applying the ETL process described in the last section, 40,883,209 QTLs (37,014,094 study-wide significant) from 3,856,968 variants to 32,987 genes were finally added to the Genehopper DB.




□ Benchmarker: an unbiased, association-data-driven strategy to evaluate gene prioritization algorithms:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/497602.full.pdf

They compared two gene prioritization methods, DEPICT and MAGMA; genes prioritized by both methods strongly outperformed genes prioritized by only one. This strategy is highly generalizable because it can be applied to any method that prioritizes genes or variants based on their similarity to each other with respect to some feature(s) of interest (e.g. similar patterns of gene set membership, similar epigenetic marks).






□ ELIGOS: Decoding the Epitranscriptional Landscape from Native RNA Sequences:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/487819.full.pdf

the ONT sequencing signals obtained from cDNA and those derived from the same RNA molecules by dRNA-seq could be used to filter out systematic noise from data to detect locations of possible RNA modifications. The ELIGOS software is publicly available and can be used to detect possible RNA modification sites and secondary structures quickly, on a global transcriptomic scale. Moreover, ELIGOS can be used as a diagnostics tool to improve the base calling algorithm of nanopore sequencing.




□ High Dimensional Mediation Analysis with Applications to Causal Gene Identification:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/497826.full.pdf

MedFix is a new application of adaptive lasso with one additional tuning parameter. MedMix is a novel mediation model based on a high-dimensional linear mixed model, for which the authors also develop a new variable selection algorithm. Both are motivated by the causal gene identification problem, where causal genes are defined as the genes that mediate the genetic effect: the genetic variants are the high-dimensional exposure, the gene expressions the high-dimensional mediator, and the phenotype of interest the outcome.






□ Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments:

>> https://almob.biomedcentral.com/articles/10.1186/s13015-018-0135-2

Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.




□ Beyond SNP Heritability: Polygenicity and Discoverability of Phenotypes Estimated with a Univariate Gaussian Mixture Model:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/498550.full.pdf

using detailed linkage disequilibrium structure from an extensive reference panel, to estimate these quantities from genome-wide association studies (GWAS) summary statistics for SNPs with minor allele frequency >1%. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability, and also allows for estimating residual inflation.




□ Drop-seq-compatible DART-seq captures multiplexed RNA targets--including non-polyA transcripts, viral transcripts & Ig sequences--simultaneously with 5’-end sequenced transcriptomes in single cells at high throughput.

>> http://bit.ly/2A6NK9U




□ Simphony: simulating large-scale, rhythmic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/497859.full.pdf

Simphony has adjustable parameters for specifying experimental design and modeling rhythms, including the ability to sample from Gaussian and negative binomial distributions. rhythm detection improved as rhythm amplitude increased or the interval between time points decreased. Rhythm detection also improved as baseline expression increased (and thus as the standard deviation of log-transformed counts of non-rhythmic genes decreased).






□ STRING2GO: Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/499244.full.pdf

It adopts deep maxout neural networks to learn a novel type of functional biological network feature representation that simultaneously encapsulates both node neighborhood and function co-occurrence information. These higher-level representations are learnt in a supervised way by training deep maxout neural networks to output all the biological process terms associated with an input protein – an approach that has led to higher predictive accuracy in the past.




□ Tensor Decomposition of Stimulated Monocyte and Macrophage Gene Expression Profiles Identifies Neurodegenerative Disease-specific Trans-eQTLs

>> http://biorxiv.org/cgi/content/short/499509v1

robust evidence that some disease-associated genetic variants affect the expression of multiple genes in trans.




□ Variation in proviral content among human genomes mediated by LTR recombination

>> https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-018-0142-3

The viral content of human genomes is more variable than we thought. Because LTR-LTR recombination events may occur long after proviral insertion but are challenging to detect in resequencing data, we hypothesize that this mechanism is a source of genomic variation in the human population that remains vastly underestimated.






Xmas Night Music Review.

2018-12-23 23:55:35 | music18


□ Zayn / "Icarus Falls"

>> https://itunes.apple.com/jp/album/icarus-falls/1444229858

ZAYN - There You Are (Lyric Video)


A new ZAYN track with a pleasingly primal chorus. With mainstream Western pop looking back to the '90s and the tropical house current blending in, today's music scene feels wonderfully nostalgic to the Enigma generation.






□ The Chainsmokers / "Sick Boy"

>> https://itunes.apple.com/jp/album/sick-boy/1445725433

The Chainsmokers - This Feeling (Official Video) ft. Kelsea Ballerini


An EDM duo based in NY. An endlessly mellow sound, and a pop sensibility with sharp fangs hidden beneath the lyricism.






□ Glen Campbell: "Sings for the King"

>> https://itunes.apple.com/us/album/sings-for-the-king/1437210358

Christmas means rockabilly! On this previously unreleased material, country singer Glen Campbell sings demo versions of songs written for Elvis Presley. A duet between these two of the greatest singers in American music history is also included.






□ John Legend / "A Legendary Christmas"

>> https://www.johnlegend.com/legendaryxmas/

John Legend - By Christmas Eve (Official Audio)


Just perfect.💖👩‍🚀 Perfectly sung christmas moods!🎅🌙🌈




WIND RIVER

2018-12-23 00:54:34 | Film


□ 『WIND RIVER (ウインド・リバー)』

>> http://wind-river.jp

Directed by Taylor Sheridan
Produced by Matthew George / Basil Iwanyk / Peter Berg / Wayne L. Rogers / Elizabeth A. Bell
Written by Taylor Sheridan
Starring Jeremy Renner / Elizabeth Olsen
Music by Nick Cave / Warren Ellis


"There's no such thing as luck here. You survive or you give up; that's all."

"Wind River" is a thriller directed by Taylor Sheridan, the screenwriter acclaimed for his modern Westerns, this time taking the director's chair himself. The restrained storytelling in the snowbound setting is richly rewarding, but what stands out most are the sharp cinematography and the solid, tension-filled action direction. The late conversation in which long-suppressed emotion finally breaks through is impossible to watch without tears.


The will to survive guides the hunter,
before the wind of "Wind River" sweeps away the traces of life left upon the snow.








Asterism.

2018-12-13 23:23:23 | Science News





□ High-resolution genetic mapping of putative causal interactions between regions of open chromatin:

>> https://www.nature.com/articles/s41588-018-0278-6

A Bayesian hierarchical approach that uses two-stage least squares, applied to an ATAC-seq (assay for transposase-accessible chromatin using sequencing) data set from 100 individuals to identify over 15,000 high-confidence causal interactions. Assignment of the direction of effect between different peaks allowed smaller sets of plausible candidate variants to be identified by locating "master regulatory" regions, and also revealed the genomic architecture of causal interactions between regulatory elements.
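
Two-stage least squares itself is short enough to write out with numpy; the simulated genotype instrument and peak accessibilities below illustrate only the generic estimator, not the paper's Bayesian hierarchical model:

# Generic two-stage least squares: regress the exposure peak on the instrument (genotype),
# then regress the outcome peak on the fitted exposure. Simulated toy data.
import numpy as np

rng = np.random.default_rng(5)
n = 1000
genotype = rng.integers(0, 3, size=n).astype(float)      # instrument
peak_a = 0.8 * genotype + rng.normal(size=n)             # exposure: accessibility of peak A
peak_b = 0.5 * peak_a + rng.normal(size=n)               # outcome: accessibility of peak B

def ols(y, X):
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# stage 1: fitted values of the exposure from the instrument
a_hat = np.column_stack([np.ones(n), genotype]) @ ols(peak_a, genotype)
# stage 2: causal effect of peak A on peak B
beta = ols(peak_b, a_hat)[1]
print(beta)      # should land close to the simulated effect of 0.5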






□ Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies:

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2485-7

Purge Haplotigs was developed specifically for third-gen sequencing-based assemblies to automate the reassignment of allelic contigs, and to assist in the manual curation of genome assemblies. The pipeline uses a draft haplotype-fused assembly or a diploid assembly, Minimap2 read alignments, and repeat annotations to identify allelic variants in the primary assembly. Purge Haplotigs will run on either a haploid assembly (i.e. Canu, FALCON or FALCON-Unzip primary contigs) or on a phased-diploid assembly (i.e. FALCON-Unzip primary contigs + haplotigs).






□ Fast and accurate large multiple sequence alignments using root-to-leave regressive computation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/07/490235.full.pdf

The authors developed, and validated on protein sequences, a regressive algorithm that works the other way around, aligning the most dissimilar sequences first. This algorithm produces more accurate alignments than non-regressive methods, especially on datasets larger than 10,000 sequences. By design, it can run any existing alignment method in linear time. In the case of Clustal Omega (ClustalO) using mBed trees, the regressive combination was about twice as fast as the progressive alignment and appeared to have linear complexity.




□ isONclust: De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/06/463463.full.pdf

isONclust is a tool for clustering either PacBio Iso-Seq reads or Oxford Nanopore reads into clusters, where each cluster represents all reads that came from a gene. Output is a tsv file with each read assigned to a cluster ID. isONclust was evaluated on 3 simulated and 5 biological datasets, across a breadth of organisms, technologies, and read depths. The results demonstrate that isONclust is a substantial improvement over previous approaches, both in terms of overall accuracy and scalability to large datasets.






□ Hidden patterns of codon usage bias across kingdoms:

>> https://www.biorxiv.org/content/early/2018/11/24/478016

derive from first principles a mathematical model describing the statistics of codon usage bias and apply it to extensive genomic data. A new model-based measure of codon usage bias that extends existing measures by taking into account both codon frequency and codon distribution reveals distinct, amino acid specific patterns of selection in distinct branches of the tree of life.






□ Naught all zeros in sequence count data are the same:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/26/477794.full.pdf

A systematic description of the different processes that can give rise to zero values, as well as of the types of methods for addressing zeros in sequence count studies. The results demonstrate that zero-inflated models can have substantial biases in both simulated and real data settings. Additionally, the authors find that zeros due to biological absences can, for many applications, be approximated as originating from undersampling. The zero-inflated models tend to inflate parameter estimates in both simulated and real data settings due to inherent identifiability issues; this parameter inflation can be so severe as to dominate the results of a DE analysis on a previously published single-cell RNA-seq study.




□ Devil in details: Beware the Jaccard: the choice of metric is important and non-trivial in genomic colocalisation analysis.

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/27/479253.full.pdf




□ Efficient computation of spaced seed hashing with block indexing:

>> https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-018-2415-8

The FISH algorithm can be further exploited to improve the speedup with respect to computing the Q-gram hashing of each spaced seed separately. FISH can compute the hashing values of spaced seeds with a speedup, relative to the traditional approach, of between 1.9x and 6.03x, depending on the structure of the spaced seeds. Going from "contiguous k-mers" to "spaced k-mers" usually brings an overhead (e.g., 1.5-2.5x in Seed-Kraken).
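
For concreteness, a spaced seed is just a 0/1 mask applied to each window before hashing; a naive per-window version looks like the sketch below, whereas FISH reuses blocks of the computation shared between overlapping windows:

# Naive spaced-seed hashing: apply a 0/1 mask to every window and hash the kept positions.
# FISH avoids recomputing shared blocks between overlapping windows; this sketch does not.
seed = "1101011"                      # hypothetical spaced seed (1 = match position, 0 = don't care)
keep = [i for i, c in enumerate(seed) if c == "1"]

def spaced_hashes(sequence, seed_len=len(seed)):
    for i in range(len(sequence) - seed_len + 1):
        window = sequence[i:i + seed_len]
        yield hash("".join(window[j] for j in keep))

seq = "ACGTACGGTTACGTA"
print(list(spaced_hashes(seq))[:5])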






□ RISC: robust integration of single-cell RNA-seq datasets with different extents of cell cluster overlap:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/29/483297.full.pdf

In RISC, instead of estimating the lambda, the PCR model selects the PCs based on dimension reduction; this process regularizes the matrices and generates the unique singular vectors at the first step of scRNA-seq data analysis. Because of the natural compatibility of eigenvectors between the PCR model and dimension reduction, RISC can accurately integrate scRNA-seq datasets and avoid over-integration.






□ DeeReCT-PolyA: a robust and generic deep learning method for PAS identification.

>> https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty991/5221014

DeeReCT-PolyA is a robust, PAS motif agnostic, and highly interpretable and transferrable deep learning model for accurate PAS recognition, which requires no prior knowledge or human-designed features.




□ OCHROdb: a comprehensive, quality checked database of open chromatin regions from sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/03/484840.full.pdf

Multiple large-scale consortium-based projects, including ENCODE, REMC, Blueprint and GGR, have generated thousands of sequencing samples that capture DNase-I hypersensitive sites (DHS) genome-wide in hundreds of cell types. An analysis pipeline takes hundreds of pre-processed DHS samples as input, aligns regions of open chromatin across samples, checks the quality of each region using a replication-based test, and outputs a well-curated database of open chromatin accessibility across the whole genome.




□ Ozymandias: A biodiversity knowledge graph:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/04/485854.full.pdf

it is worth noting that the biodiversity informatics community has been aware of knowledge graphs and semantic web technologies for a decade or more, and several taxonomic databases have been serving data in RDF since the mid-2000’s. Ozymandias is a biodiversity knowledge graph. This mapping can then be used to construct a knowledge graph, where entities such as taxa, publications, people, places, specimens, sequences, and institutions are all part of a single, shared knowledge space.




□ ClinGen Receives Recognition Through New @US_FDA Human Variant Database Program. ClinGen expert curated variants are available for unrestricted use in the community via @NCBI_Clinical ClinVar

>> http://bit.ly/2ScRzRi




□ Ultra-deep, long-read nanopore sequencing of mock microbial community standards

>> http://biorxiv.org/cgi/content/short/487033v1




□ CellTagging: Single-cell mapping of lineage and identity in direct reprogramming

>> https://www.nature.com/articles/s41586-018-0744-4

CellTagging is a combinatorial cell-indexing methodology that enables parallel capture of clonal history and cell identity, in which sequential rounds of cell labelling enable the construction of multi-level lineage trees. the results demonstrate the utility of our lineage-tracing method for revealing the dynamics of direct reprogramming.






□ Cell growth is an omniphenotype:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/05/487157.full.pdf

The study provides evidence that cell growth is a generalizable phenotype because it is an aggregation of phenotypes. To the extent that it is an aggregation of all possible phenotypes, an omniphenotype, it has potential as a pan-disease model for biological discovery and drug development.




□ New methods to calculate concordance factors for phylogenomic datasets:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/05/487801.full.pdf

The gene concordance factor (gCF) of a branch is defined as the percentage of "decisive" gene trees containing that branch, and the accompanying package calculates it while accounting for variable taxon coverage among gene trees. The site concordance factor (sCF) is a new measure defined as the percentage of decisive sites supporting a branch in the reference tree. gCF and sCF complement classical measures of branch support in phylogenetics by providing a full description of the underlying disagreement among loci and sites.
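
A toy sketch of the gCF computation (my own simplification, not the released implementation; it uses an ad hoc decisiveness rule requiring at least two shared taxa on each side of the reference split, and assumes every gene-tree taxon occurs in the reference tree):

def gene_concordance_factor(ref_split, gene_trees):
    """ref_split: (setA, setB) bipartition induced by a reference-tree branch.
    gene_trees: list of (taxon_set, set_of_splits); each split is a frozenset of two frozensets."""
    A, B = ref_split
    decisive = concordant = 0
    for taxa, splits in gene_trees:
        a, b = A & taxa, B & taxa
        if len(a) < 2 or len(b) < 2:               # simplified decisiveness rule
            continue
        decisive += 1
        if frozenset([frozenset(a), frozenset(b)]) in splits:
            concordant += 1
    return 100.0 * concordant / decisive if decisive else float("nan")

ref = ({"t1", "t2"}, {"t3", "t4", "t5"})
gene_trees = [
    ({"t1", "t2", "t3", "t4"},
     {frozenset([frozenset({"t1", "t2"}), frozenset({"t3", "t4"})])}),
    ({"t1", "t2", "t3", "t4", "t5"},
     {frozenset([frozenset({"t1", "t3"}), frozenset({"t2", "t4", "t5"})])}),
]
print(gene_concordance_factor(ref, gene_trees))    # 50.0: one of two decisive trees agrees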




□ Accelerated Bayesian inference of gene expression models from snapshots of single-cell transcripts:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/07/489880.full.pdf

The time-dependent mRNA distributions of discrete-state models of gene expression are dynamic Poisson mixtures whose mixing kernels are characterized by a piecewise-deterministic Markov process. The authors combined this analytical result with a kinetic Monte Carlo algorithm to create a hybrid numerical method that accelerates the calculation of time-dependent mRNA distributions by 1000-fold compared to current methods, and then integrated the hybrid algorithm into an existing Monte Carlo sampler to estimate the Bayesian posterior distribution of many different, competing models in a reasonable amount of time.
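
As a heavily simplified illustration of that Poisson-mixture structure, the sketch below draws from the two-state telegraph model at time T, assuming zero initial mRNA: conditional on the promoter path, the count is Poisson with mean equal to the integral of k(s)*exp(-gamma*(T-s)), so simulating paths and then Poisson draws samples the mixture. All rate constants are arbitrary placeholders, not values from the paper.

import math, random

def sample_mrna_at_T(T, k_on=0.5, k_off=1.0, k_tx=20.0, gamma=1.0):
    """One draw from the telegraph model's mRNA distribution at time T (zero initial mRNA)."""
    t, state, lam = 0.0, 0, 0.0                    # state 0 = off, 1 = on
    while t < T:
        dwell = random.expovariate(k_on if state == 0 else k_off)
        seg_end = min(t + dwell, T)
        if state == 1:                             # transcription active on [t, seg_end)
            lam += (k_tx / gamma) * (math.exp(-gamma * (T - seg_end))
                                     - math.exp(-gamma * (T - t)))
        t, state = seg_end, 1 - state
    # Poisson(lam) via Knuth's inversion (fine for modest means)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

samples = [sample_mrna_at_T(T=5.0) for _ in range(10000)]
print(sum(samples) / len(samples))                 # Monte Carlo mean of the mixture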




□ Galaxy-Kubernetes integration: scaling bioinformatics workflows in the cloud:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/07/488643.full.pdf

Kubernetes acts as an abstraction layer between Galaxy and the different cloud providers, allowing Galaxy to run on every cloud provider that supports Kubernetes (>10 cloud providers currently).




□ Sparse Dynamic Programming on DAGs with Small Width just accepted to ACM TALG. (ACM Transactions on Algorithms.)

>> https://link.springer.com/chapter/10.1007%2F978-3-319-89929-9_7

"Using Minimum Path Cover to Boost Dynamic Programming on DAGs: Co-linear Chaining Extended." - an algorithm for finding a minimum path cover of a DAG (V, E) in 𝑂(𝑘|𝐸|log|𝑉|) time, improving all known time-bounds when k is small and the DAG is not too dense. a general technique for extending dynamic programming (DP) algorithms from sequences to DAGs. This is enabled by our minimum path cover algorithm, and works by mimicking the DP algorithm for sequences on each path of the minimum path cover.






□ Statistical Dynamics of Spatial-Order Formation by Communicating Cells:

>> https://www.cell.com/iscience/fulltext/S2589-0042(18)30022-1?sf203801563=1

The authors use cellular automata and approaches that mimic statistical mechanics to understand how secrete-and-sense cells with bistable gene expression can, from disordered beginnings, become spatially ordered by communicating through rapidly diffusing molecules.

Classifying lattices of cells by two "macrostate" variables, a "spatial index" measuring the degree of order and the average gene-expression level, they find that a group of cells behaves as a single particle in an abstract space that rolls down an adhesive "pseudo-energy landscape" whose shape is determined by cell-cell communication and an intracellular gene-regulatory circuit.

The gradient of the pseudo-energy and a "trapping probability," which quantifies the adhesiveness of the pseudo-energy landscape, together determine the particle's trajectories in phase space: the particle rolls down along the negative gradient of the pseudo-energy.




□ ReMIX: Genome-wide recombination map construction from single individuals using linked-read sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/08/489989.full.pdf

ReMIX makes use of linked-read sequencing technology developed by 10X Genomics to acquire long-range haplotype information from gametes of a single individual. Using the recombinant molecules, crossover locations are defined as genomic intervals based on the location of the last variant of the first haplotype and first variant of the second. The linked-read information is exploited by ReMIX during three steps: identifying high-quality heterozygous variants, reconstructing molecules, and haplotype phasing each molecule.




□ Robust and Structural Ergodicity Analysis and Antithetic Integral Control of a Class of Stochastic Reaction Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/08/481051.full.pdf

The paper addresses the problem of verifying these conditions for large sets of reaction networks with time-invariant topologies, either from a robust or a structural viewpoint, using three different approaches. By exploiting the Metzler structure of the matrix, the authors obtain simplified conditions for the robust and structural ergodicity of stochastic reaction networks with uncertain reaction rates.




□ Smart computational exploration of stochastic gene regulatory network models using human-in-the-loop semi-supervised learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/08/490623.full.pdf

Information about a modeler's preferences can be used to train classifiers, which then guide sampling of the parameter space so that exploration of "interesting" regions is accelerated. Training classifiers on modeler input in this way can be seen as engineering objective functions for systematic downstream sampling algorithms that require prior information.
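
A minimal sketch of the general idea (hypothetical labels and model, not the paper's pipeline): fit a classifier on parameter points the modeler has already labeled as interesting, then use its predicted probabilities to decide which new candidate points are worth simulating.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Parameter points already simulated and labeled by the modeler (1 = "interesting").
X_labeled = rng.uniform(0, 1, size=(200, 3))       # three hypothetical rate constants
y_labeled = (X_labeled[:, 0] > 0.6).astype(int)    # stand-in for the human judgments

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_labeled, y_labeled)

# Propose new candidates and keep only those the classifier considers promising.
candidates = rng.uniform(0, 1, size=(5000, 3))
p_interesting = clf.predict_proba(candidates)[:, 1]
keep = candidates[p_interesting > 0.8]             # these go to the expensive simulator next
print(len(keep), "of 5000 candidates selected for simulation")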






□ GeTallele: a mathematical model and a toolbox for integrative analysis and visualization of DNA and RNA allele frequencies:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/09/491209.full.pdf

Based on the results, the variant probability vPR can serve as a dependable indicator for assessing gene and chromosomal allele asymmetries and for aiding calls of genomic events. GeTallele visualizes the observed patterns and can magnify regions of interest to the desired resolution, whether a chromosome, a gene, or a custom genome region, together with statistical measures of all the modes in the examined segment.




□ Searching and mapping genomic subsequences in nanopore raw signals through novel dynamic time warping algorithms:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/10/491456.full.pdf

The paper introduces Direct Subsequence Dynamic Time Warping for nanopore raw-signal search (DSDTWnano) and continuous-wavelet Subsequence Dynamic Time Warping for nanopore raw-signal search (cwSDTWnano), enabling direct subsequence searching and exact mapping in nanopore raw signals. DSDTWnano ensures highly accurate query results, while cwSDTWnano is an accelerated version that uses seeding and multi-scale coarsening of signals based on the continuous wavelet transform (CWT).
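
A generic subsequence-DTW sketch (the textbook formulation, not DSDTWnano's seeded and coarsened algorithm): the query may start and end anywhere inside the long raw signal, so the first row of the cost matrix is free and the answer is the minimum of the last row.

import numpy as np

def subsequence_dtw(query, signal):
    """Return (best cost, end index) of the best match of `query` inside `signal`."""
    n, m = len(query), len(signal)
    D = np.full((n, m), np.inf)
    D[0, :] = np.abs(query[0] - signal)            # free start anywhere in the signal
    for i in range(1, n):
        D[i, 0] = D[i - 1, 0] + abs(query[i] - signal[0])
        for j in range(1, m):
            cost = abs(query[i] - signal[j])
            D[i, j] = cost + min(D[i - 1, j - 1],  # match
                                 D[i - 1, j],      # query element repeated
                                 D[i, j - 1])      # signal element repeated
    end = int(np.argmin(D[-1]))                    # free end anywhere in the signal
    return float(D[-1, end]), end

sig = np.concatenate([np.random.randn(200), np.sin(np.linspace(0, 6, 80)), np.random.randn(200)])
qry = np.sin(np.linspace(0, 6, 60))                # a time-warped version of the embedded pattern
print(subsequence_dtw(qry, sig))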




□ Expansion, Exploitation and Extinction: Niche Construction in Ephemeral Landscapes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/09/489096.full.pdf

The authors developed an Interacting Particle System (IPS) to study the effect of niche construction on metapopulation dynamics in ephemeral landscapes. Using finite-size scaling theory, they find a divergence in the qualitative behavior at the extinction threshold between analytic (mean-field) and numerical (IPS) results when niche construction is confined to a small area in the spatial model.




□ simuG: a general-purpose genome simulator:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/09/491498.full.pdf

simuG is a lightweight tool for simulating the full spectrum of genomic variants. Its simplicity and versatility make simuG a unique general-purpose genome simulator for a wide range of simulation-based applications.





ephemeral.

2018-12-07 19:19:19 | Science News


In the process by which the homogeneity of internal and external boundaries is unevenly distributed and expressed, will is observed as will, as a shadow of the dynamics that bridge the gap between them.




□ High-resolution mapping of regulatory element interactions and genome architecture using ARC-C:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/11/467506.full.pdf

ARC-C (accessible region chromosome conformation capture) works well for profiling chromatin interactions: sequencing 200 million fragments per duplicate library produces enough cis-informative read pairs for profiling architecture and regulatory element interactions. At the domain level, chromatin domains defined by either active or repressive modifications form topologically associating domains (TADs), and these domains interact to form A/B (active/inactive) compartment structure.




□ Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/13/469130.full.pdf

NJMerge is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and "concatenation" using RAxML) without sacrificing accuracy.






□ Scaling computational genomics to millions of individuals with GPUs:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/14/470138.full.pdf

The authors re-implemented two commonly performed computational genomics tasks in a GPU environment: (i) QTL mapping (tensorQTL) and (ii) Bayesian non-negative matrix factorization (SignatureAnalyzer-GPU). Applying Bayesian NMF to a million single cells takes approximately 6 hours with the GPU-based approach, compared to approximately 50 days with CPUs. Similarly, the GPU implementation enables computation of empirical p-values for trans-QTLs, which is intractable on CPUs.
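
The speedups largely come from phrasing association scans as dense matrix products, which map naturally onto GPUs. A simplified sketch of correlation-based QTL scoring on random placeholder data (numpy here; swapping in a GPU array library such as CuPy or PyTorch is what buys the acceleration, and this is not tensorQTL's actual code):

import numpy as np

rng = np.random.default_rng(1)
n_samples, n_variants, n_genes = 500, 2000, 300

G = rng.integers(0, 3, size=(n_variants, n_samples)).astype(float)   # genotypes (0/1/2)
E = rng.normal(size=(n_genes, n_samples))                            # expression levels

def standardize(M):
    return (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)

Gz, Ez = standardize(G), standardize(E)
R = Gz @ Ez.T / n_samples          # all variant-gene Pearson correlations in one matrix multiply
print(R.shape, float(np.abs(R).max()))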




□ Bayesian Shrinkage Estimation of High Dimensional Causal Mediation Effects in Omics Studies:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/14/467399.full.pdf

Bayesian Sparse Linear Mixed Model (BSLMM), a hybrid between LMM and BVSR that imposes continuous shrinkage on the effects, for high-dimensional mediation analysis.




□ SVXplorer: Identification of structural variants through overlap of discordant clusters:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/14/469981.full.pdf

SVXplorer uses a streamlined sequential approach to integrate discordant paired-end alignments with split reads and read-depth information.




□ BVS: Sparse variable and covariance selection for high-dimensional seemingly unrelated Bayesian regression:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/11/467019.full.pdf

A Bayesian Variable Selection (BVS) model includes a matrix of binary variable-selection indicators for multivariate regression, thus allowing different phenotype responses to be associated with different genetic predictors. The covariance structure may be dense (unrestricted) or sparse, with a graphical modelling prior, and the graphical structure among the multivariate responses can be estimated as part of the model. Since the model loses the conjugacy of the earlier models, the authors exploit a factorisation of the covariance matrix parameter to enable faster computation with Markov chain Monte Carlo methods.




□ methplotlib: a genome browser for nanopore methylation data

>> https://gigabaseorgigabyte.wordpress.com/2018/11/12/announcing-methplotlib-a-genome-browser-for-nanopore-methylation-data/




□ Testing for Hardy-Weinberg Equilibrium in Structured Populations using NGS Data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/12/468611.full.pdf

The bias of calling genotypes for low-depth sequencing data has also been demonstrated using sHWE and PLINK, but PCAngsd is able to overcome this bias by working directly on genotype likelihoods.




□ Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/12/468355.full.pdf

This solution maintains a "sketch" (or summary) of the potentially important dimensions and allows the user to identify, with high probability, the top-k dimensions with respect to the χ2 values.




□ ADAMH: Bayesian estimation for stochastic gene expression using multifidelity models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/11/468090.full.pdf

An Adaptive Delayed Acceptance Metropolis-Hastings (ADAMH) algorithm used together with reduced Krylov-basis projections of the Finite State Projection (FSP).




□ Multi-BRWT: Sparse Binary Relation Representations for Genome Graph Annotation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/12/468512.full.pdf

The BRWT is implemented as a tree in memory, with the index and leaf vectors compressed as RRR vectors. Multi-BRWT led to an 80% decrease in compressed size compared to the baseline method and a 50% decrease compared to the closest competitor, Rainbowfish.




□ ManiNetCluster: A Manifold Learning Approach to Reveal the Functional Linkages Across Multiple Gene Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/14/470195.full.pdf

Manifold learning has been successfully used to find aligned, local, and non-linear structures among non-biological networks, e.g., via manifold alignment and warping.






□ Offering free DNA sequencing, Nebula Genomics opens for business. But there’s an itsy-bitsy catch

>> https://www.statnews.com/2018/11/15/nebula-genomics-offers-free-dna-sequencing/

Nebula's core privacy protection will come from its use of blockchain technology, a distributed ledger that underlies bitcoin and other cryptocurrencies. Nebula will offer "credits" for tests or other perks, and LunaDNA will offer stock or shares; only Encrypgen lets you be paid now, in $DNA.






□ Xeva: Integrative Pharmacogenomics Analysis of Patient Derived Xenografts:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/16/471227.full.pdf

The Xeva package follows the PDX minimum information (PDX-MI) standards and can handle both replicate-based and 1x1x1 experimental designs. The key strength of the Xeva platform is its ability to store all metadata from a PDX experiment, link genomic data to corresponding PDX models, and provide user-friendly functions for analysis.




□ Reconstructing the History of Polygenic Scores Using Coalescent Trees:

>> http://www.genetics.org/content/early/2018/11/02/genetics.118.301687

a set of methods for estimating the historical time course of a population-mean polygenic score using local coalescent trees at GWAS loci. These time courses are estimated by using coalescent theory to relate the branch lengths of trees to allele-frequency change. Because of its grounding in coalescent theory, this framework can be extended to a variety of demographic scenarios, and its usefulness will increase as both GWAS and ancestral recombination graph (ARG) inference continue to progress.






□ Distributed retrieval engine for the development of cloud-deployed biological databases:

>> https://biodatamining.biomedcentral.com/articles/10.1186/s13040-018-0185-5

Integrating cloud resources and a federated data retrieval engine in the development of specialized databases has the potential to enhance ongoing database development in the biomedical field. This framework distributes a query among several strategic web-based biological databases, such as NCBI's datasets and Malacards, stores the retrieved results on the MongoDB cloud service, and annotates them with the query keywords for future retrieval.




□ KrakenUniq: confident and fast metagenomics classification using unique k-mer counts

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1568-0

KrakenUniq is based on the Kraken metagenomics classifier, to which it adds a method for counting the number of unique k-mers identified for each taxon using the efficient probabilistic cardinality estimation algorithm HyperLogLog.
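
A minimal, textbook HyperLogLog sketch (not KrakenUniq's implementation, and without the small- and large-range corrections) showing how unique k-mers per taxon can be counted in near-constant memory:

import hashlib

class HyperLogLog:
    def __init__(self, b=12):                       # m = 2^b registers
        self.b, self.m = b, 1 << b
        self.reg = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        idx = h & (self.m - 1)                      # low b bits pick a register
        w = h >> self.b                             # remaining 64-b bits
        rank = (64 - self.b) - w.bit_length() + 1   # position of the leftmost 1-bit
        self.reg[idx] = max(self.reg[idx], rank)

    def estimate(self):
        return self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.reg)

hll = HyperLogLog()
for i in range(100000):
    hll.add(f"kmer_{i % 25000}")                    # 25,000 distinct items, added repeatedly
print(round(hll.estimate()))                        # close to 25,000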




□ Boost-HiC : Computational enhancement of long-range contacts in chromosomal contact maps:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/18/471607.full.pdf

Boost-HiC enables the detection of Hi-C patterns such as chromosomal compartments at a resolution that would otherwise only be attainable by sequencing the experimental Hi-C library 100 times deeper.






□ Assembly of a pan-genome from deep sequencing of 910 humans of African descent:

>> https://www.nature.com/articles/s41588-018-0273-y

The authors aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled them into contiguous sequences (contigs), yielding nearly 300 Mb of DNA missing from the reference genome.






□ RA (Rapid Assembler): Overlap-layout-consensus based DNA assembler of long uncorrected reads:

>> https://github.com/lbcb-sci/ra

Ra is a fast and easy-to-use de novo assembler for long uncorrected reads; it consists of Minimap2, Rala, and Racon.




□ VariantKey - A Reversible Numerical Representation of Human Genetic Variants:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/19/473744.1.full.pdf

VariantKey, a novel reversible numerical encoding schema for human genetic variants, overcomes these limitations by allowing variants to be processed as single 64-bit numeric entities while preserving the ability to search and sort them by chromosome and position.
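
A toy sketch of the general idea (my own simplified layout of 5 bits chromosome, 28 bits position and a 31-bit ref/alt hash; the actual VariantKey encoding additionally keeps short ref/alt pairs reversible rather than always hashing them):

import hashlib

CHROM_CODES = {**{str(i): i for i in range(1, 23)}, "X": 23, "Y": 24, "MT": 25}

def toy_variant_key(chrom, pos, ref, alt):
    """Pack a variant into one 64-bit integer (toy layout, not the official encoding)."""
    c = CHROM_CODES[chrom.replace("chr", "")] & 0x1F            # 5 bits
    p = pos & 0x0FFFFFFF                                        # 28 bits
    ra = int.from_bytes(hashlib.blake2b(f"{ref}_{alt}".encode(),
                                        digest_size=4).digest(), "big") >> 1   # 31 bits
    return (c << 59) | (p << 31) | ra

keys = [toy_variant_key("1", 12345, "A", "G"),
        toy_variant_key("2", 500, "C", "T"),
        toy_variant_key("1", 99999, "AT", "A")]
print(sorted(keys))        # a plain numeric sort groups by chromosome, then position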






□ TSD: A computational tool to study the complex structural variants using PacBio targeted sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/20/474445.full.pdf

The genomic organization structure of the targeted sequences is recovered by assembling the mapped PacBio fragments. Evaluation suggests that TSD performs as well as or better than existing tools in discovering the structure of SVs, especially when the targeted sequences have a complex structure in the genome.




□ Multi-Scale Structural Analysis of Proteins by Deep Semantic Segmentation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/20/474627.1.full.pdf

A characteristic of this model that distinguishes it from most conventional structure classification algorithms is that it is sequence-agnostic. On the single-residue level, Shannon entropies of the CNN-predicted probability distributions can be used as indicators of local structure quality, allowing for identification of low-confidence regions that require further refinement or reconstruction.

The model was implemented in the PyTorch deep learning framework. Training was performed for a total of 160 epochs with a mini-batch size of 64 using the Adam optimization algorithm. RosettaRemodel, which was used in the original study, generated 7,000 backbone designs for each of the topologies, producing a total of 84,000 structures. Dropout regularization was applied throughout the convolutional layers in the encoding phase with a zeroing probability of 0.1, and all weights were initialized using Xavier initialization.
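
A hedged PyTorch sketch of that training setup on toy data (random tensors and an arbitrary small 1D CNN, not the paper's architecture; only the hyper-parameters named above, Adam, mini-batch size 64, dropout 0.1, Xavier initialization and 160 epochs, are taken from the text):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(1024, 4, 128)                       # toy inputs: 4 channels, length 128
y = torch.randint(0, 12, (1024,))                   # toy labels: 12 classes
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=5, padding=2), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.Dropout(p=0.1),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 12),
)

def xavier_init(m):                                 # Xavier initialization, as described
    if isinstance(m, (nn.Conv1d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(xavier_init)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

for epoch in range(160):                            # 160 epochs, mini-batch size 64
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()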




□ The entropic force generated by intrinsically disordered segments tunes protein function

>> https://www.nature.com/articles/s41586-018-0699-5

The data show that the unfolded state of the ID-tail rectifies the dynamics and structure of UGDH to favour inhibitor binding. Because this entropic rectifier does not have any sequence or structural constraints, it is an easily acquired adaptation. This model implies that evolution selects for disordered segments to tune the energy landscape of proteins, which may explain the persistence of intrinsic disorder in the proteome.




□ A deep learning approach to automate refinement of somatic variant calling

>> https://www.nature.com/articles/s41588-018-0257-y




□ FactorialHMM: Fast and exact inference in factorial hidden Markov models:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty944/5184283

FactorialHMM allows simulating either directly from the model or from the posterior distribution of states given the observations. Additionally, it allows the inference of all key quantities related to HMMs: the (Viterbi) sequence of states with the highest posterior probability, the likelihood of the data, and the marginal and pairwise posterior state probabilities given all observations. The running time and space requirements of all procedures are linearithmic in the number of possible states.






□ ExtendAlign: a computational algorithm for delivering multiple global alignment results originated from local alignments:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/23/475707.full.pdf

ExtendAlign extends the alignment achieved by a local MSAT and provides an end-to-end report of true m/mm for every hit in each query sequence, reducing the aforementioned alignment bias. ExtendAlign significantly increases the number of m/mm originally missed by an MSAT in all alignments tested. Remarkably, ExtendAlign corrects the alignment hits of dissimilar sequences in the range of ~35-50% similarity, also known as the twilight zone.




□ LC_EC_analyser: Comparative assessment of long-read error-correction software applied to RNA-sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/23/476622.full.pdf

LC_EC_analyser, a software that enables automatic benchmarking of long-read RNA-sequencing error-correction software, in the hope that future error-correction methods will take advantage of it to avoid biases.




□ Noise-Cancelling Repeat Finder: Uncovering tandem repeats in error-prone long-read sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/22/475194.full.pdf

NCRF utilizes a modified Smith-Waterman algorithm to align a read to any number of tandem copies of a specified motif sequence. The modified dynamic programming (DP) uses only one copy of the motif but allows loops from the end back to the beginning, so that one column of the DP matrix represents any number of tandem copies of the motif.
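
To make the objective concrete, a naive sketch that scores a read against enough concatenated copies of the motif with plain Smith-Waterman local alignment: it targets the same kind of score but lacks NCRF's trick of keeping a single motif copy with a loop edge (the scoring values are arbitrary placeholders, not NCRF's parameters).

def tandem_score(read, motif, match=1, mismatch=-1, gap=-2):
    """Naive local alignment of `read` against tandem copies of `motif`."""
    ref = motif * (2 * len(read) // len(motif) + 2)    # enough copies to cover the read
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            cur[j] = max(0,                            # local alignment: never below zero
                         prev[j - 1] + s,              # (mis)match
                         prev[j] + gap,                # gap in the reference
                         cur[j - 1] + gap)             # gap in the read
            best = max(best, cur[j])
        prev = cur
    return best

print(tandem_score("GGCATGGCATGGCTT", "GGCAT"))        # high score: the read is ~3 motif copies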




□ NGOMICS-WF, a Bioinformatic Workflow Tool for Batch Omics Data Analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/23/475699.full.pdf

NG-Omics-WF is a workflow tool that automatically runs a bioinformatics pipeline for multiple datasets in a generic Linux computer or Linux cluster environment. It has been tested with Open Grid Engine (OGE).




□ APPLES: Fast Distance-based Phylogenetic Placement:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/23/475566.full.pdf

APPLES has better accuracy than ML for placement on trees with thousands of species, can place on trees with a hundred thousand species, and can handle samples without assembled sequences for the reference or the query by using k-mer-based distances, a scenario that ML cannot handle. For simulated datasets, the authors estimate the backbone tree topology by running RAxML on the true alignment under the GTRGAMMA model and use this tree as the backbone for pplacer.




□ BioKEEN: A library for learning and evaluating biological knowledge graph embeddings:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/23/475202.full.pdf

Determining the appropriate values for the hyper-parameters of a KGE model requires both machine learning and domain-specific knowledge. If the user specifies hyper-parameters, BioKEEN can be run directly in training mode.




□ Multi-level Approximate Bayesian Computation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/23/475640.full.pdf

This method focuses on discrete state-space stochastic models governed by the chemical master equation (CME) and demonstrates a multi-resolution inference approach in which a machine-learning-led step selects sample paths for further refinement without introducing additional bias.





ODESZA / "A Moment Apart (Deluxe Edition)"

2018-12-04 21:31:04 | music18


□ ODESZA / "A Moment Apart (Deluxe Edition)"

>> https://itunes.apple.com/jp/album/a-moment-apart-deluxe-edition/1443989226
>> https://odesza.com

Release Date; 30/11/2018
Label: Counter/Ninja Tune/Foreign Family Collective


>> additional tracklisting.

1. Loyal
2. Memories That You Call (feat. Monsoonsiren) [ODESZA & Golden Features VIP Remix]
3. It’s Only (feat. Zyra) [ODESZA VIP Remix]
4. Falls (Reprise) (feat. Sasha Sloan)
5. Line Of Sight (Reprise) (feat. WYNNE & Mansionair)
6. Higher Ground (Reprise) (feat. Naomi Wild)
7. Falls (Reprise) [Instrumental]
8. Line Of Sight (Reprise) [Instrumental]
9. Higher Ground (Reprise) [Instrumental]


Harrison Mills and Clayton Knight's ethno-flavored EDM duo, positioned as a "next-generation ENIGMA." A reissue of last year's hit album, adding self-remixes and slow, choir- and orchestra-driven arrangements.


Built around cutting-edge pop sensibilities such as future bass and making liberal use of sampled folk music, their approach has ridden the tropical-house boom well and is rapidly gaining traction, especially in the US.



□ ODESZA - Memories That You Call (feat. Monsoonsiren) [ODESZA & Golden Features VIP Remix]

They are often credited with pioneering tropical house, though as part of Ninja Tune's label identity the approach has existed for twenty years. The blend of folk-style choirs and EDM can be traced back to 1970s prog; still, this track makes a convincing case that 1990s new age lies at the source of the current tropical-house boom.




□ ODESZA - It’s Only (feat. Zyra) [ODESZA VIP Remix]

What ODESZA and ENIGMA share is the use of global, timeless musical material such as folk music and sacred chant, while each remains fluent in the cutting-edge pop culture and sound-making of its own era. ODESZA's music, though, is more dramatic.




elongate.

2018-12-03 03:03:03 | Science News


Not only in genetics: what does it mean for an option open to engineering to be "ethically problematic"? Whether an individual code of conduct is the outcome of debate and consensus rests solely on hypotheses about predictions and outcomes. What is really needed to govern this problem is scrutiny of the social norms that stand in a dynamic relationship with technology. Unless we keep demonstrating what is possible, what awaits us is extinction. Whether genetic engineering heads toward catastrophe is unpredictable, even if every actor goes through the proper transactions.

What must not be mistaken is our motivation as living beings. Emotion shapes thought, and thought determines action. All that exists there is the difference in behavior between self and non-self. A living species can be recast as levels of resonance and perturbation relations among interacting oscillators. If something interferes with behavior, it does not matter whether it is another symbol, an intelligence, or an artificial intelligence. The flaws of this algorithm express themselves deterministically, in advance, toward the outside of the system's membrane. We make mistakes, but without making them we cannot correct our errors.



□ PEAS: A neural network based model effectively predicts enhancers from clinical ATAC-seq samples:

>> https://www.nature.com/articles/s41598-018-34420-9

Among the tools developed by the ENCODE consortium, the Hidden Markov Model (HMM)-based ChromHMM algorithm has become an important tool for assessing the global epigenomic landscape in human cells by segmenting genome-wide chromatin into a finite number of chromatin states. Although ChromHMM has been very powerful for finding regulatory elements in diverse human cell types, it cannot be applied to clinical samples, since the datasets it stems from (i.e., multiple ChIP-seq profiles) cannot easily be generated for such samples.




□ Clustering-based optimization method of reference set selection for improved CNV callers performance:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/25/478313.1.full.pdf

The CODEX algorithm is based on a multi-sample normalization model, which is fitted to remove various biases, including noise introduced by differing GC content in the analyzed targets, and CNVs are called by a Poisson likelihood-based segmentation algorithm. ExomeCopy implements a hidden Markov model which uses positional covariates, including background read depth and GC content, to simultaneously normalize and segment the samples into regions of constant copy count.




□ "on the definition of sequence identity":

>> http://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity

estimate error rate or identity:

minimap2 -c ref.fa query.fa \
| perl -ane 'if(/tp:A:P/&&/NM:i:(\d+)/){$n+=$1;$m+=$1 while/(\d+)M/g;$g+=$1,++$o while/(\d+)[ID]/g}END{print(($n-$g+$o)/($m+$o),"\n")}'
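# My reading of the one-liner above: it prints (NM - gap bases + gap opens) / (aligned M columns + gap opens),
# which corresponds to a gap-compressed divergence.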

"The estimate of sequence identity varies with definitions and alignment scoring. When you see someone talking about “sequencing error rate” next time, ask about the definition and scoring in use to make sure that is the error rate you intend to compare."




□ StructLMM: A linear mixed-model approach to study multivariate gene–environment interactions:

>> https://www.nature.com/articles/s41588-018-0271-0

Although high-dimensional environmental data are increasingly available and multiple exposures have been implicated in G×E at the same loci, multi-environment tests for G×E are not established. While StructLMM can in principle be used in conjunction with any environmental covariance, the authors have here limited the application to linear covariances. The model could be extended to account for non-linear interactions, for example using polynomial covariance functions.




□ Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions:

>> https://www.nature.com/articles/s41588-018-0268-8

This flexible approach increases power, improves effect estimates, and allows more quantitative assessments of effect-size heterogeneity compared to simple shared or condition-specific assessments. Although genetic effects on expression are extensively shared among tissues, effect sizes can still vary greatly among tissues; some shared eQTLs show stronger effects in subsets of biologically related tissues, or in only one tissue (for example, testis).






□ Inferring putative transmission clusters with Phydelity:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/26/477653.full.pdf

Phydelity identifies groups of sequences that are more closely-related than the ensemble distribution of the phylogeny under a statistically-principled and phylogeny-informed framework, without the introduction of arbitrary distance thresholds. Phydelity infers the within-cluster divergence of putative transmission clusters by first determining the pairwise patristic distance distribution of closely-related tips. In simulated phylogenies, Phydelity achieves higher rates of correspondence to ground-truth clusters than current model-based methods, and comparable results to parametric methods without the need for parameter calibration.




□ Algorithm identifies multiple gene–environment relationships:

>> https://www.ebi.ac.uk/about/news/press-releases/gene-environment-algorithm

Comprehensive analysis of hundreds of environmental factors could enhance understanding of genotype–phenotype relationships






□ Using classification algorithms, such as support vector machines and neural networks, to automatically find efficient linear and non-linear collective variables for accelerated molecular simulations:

>> https://aip.scitation.org/doi/10.1063/1.5029972

The authors solve the "initial" collective variable (CV) problem using a data-driven approach inspired by supervised machine learning (SML). In particular, they show how the decision functions of SML algorithms can be used as initial CVs (SMLcv) for accelerated sampling, and illustrate how the distance to a support vector machine's decision hyperplane, the output probability estimates from logistic regression, and shallow or deep neural network classifiers, among others, may be used to reversibly sample slow structural transitions.
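
A hedged scikit-learn sketch of that idea on toy data (two Gaussian blobs standing in for frames from two metastable states; not the authors' code): fit a linear SVM on labeled frames, then use the signed distance to the decision hyperplane as the collective variable.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
state_A = rng.normal([-2.0, 0.0], 0.4, size=(300, 2))     # frames from basin A (toy features)
state_B = rng.normal([+2.0, 0.0], 0.4, size=(300, 2))     # frames from basin B
X = np.vstack([state_A, state_B])
y = np.array([0] * 300 + [1] * 300)

clf = SVC(kernel="linear").fit(X, y)

# The signed distance to the separating hyperplane acts as the collective variable.
new_frames = rng.normal(0.0, 1.5, size=(5, 2))
cv_values = clf.decision_function(new_frames)
print(cv_values)      # values to bias along in the enhanced-sampling run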




□ SLEDGE Hammer: Swift Large-scale Examination of Directed Genome Editing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/27/479261.full.pdf

The robust isolation and detection of multiple alleles at various abundances in a mosaic genetic background allows phenotype-genotype correlation already in the injected generation, demonstrating the reliability and sensitivity of the filter-in tips. The SLEDGE Hammer protocol, with the adapted filter-in pipette tips, makes it possible to bypass the otherwise tedious and time-consuming genomic purification step that has hitherto limited high-throughput genotyping approaches.




□ OPTIMIR: a novel algorithm for integrating available genome- wide genotype data into miRNA sequence alignment analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/27/479097.full.pdf

OPTIMIR (pOlymorPhism inTegratIon for MIRna data alignment) is based on a scoring strategy that incorporates biological knowledge on miRNA editing to identify the most likely alignment in the presence of cross-mapping reads. OPTIMIR integrates genetic information from genotyping arrays or DNA sequencing into the miRSeq alignment process with the aim of improving the accuracy of polymiR alignment, while accommodating isomiR detection and ambiguously aligned reads.






□ The MinION Mk1C will combine a MinION, a MinIT for rapid data analysis, and a screen into a one-stop, palm-sized, fully portable and fully connected sequencing system.

The Mk1C has mobile/cellular data and a built-in NVMe SSD.






□ PromethION is now giving us more than a Tb of @nanopore sequencing data per week. This is a 'world-changing phenomenon' #nanoporeconf






□ Algorithms for SV detection by whole-genome alignment: RaGOO beats SALSA #nanoporeconf




□ Hidden Markov Models (HMMs) are not only for calling signal: DNA storage can use paired-HMM state machines that take binary data, convert it to ternary data, and then encode it as DNA. As a further extension, @nanopore DNA sequencing can be used for protein alignments #nanoporeconf






□ The cost calculation of RAGE sequencing: SCISOR-seq / RAGE-seq



□ The initial strategy was a mix of short, long, and linked reads. Encouraging results on PromethION motivated a switch to using PromethION exclusively. Currently running 6-8 flow cells in parallel twice a week, for more than 1 terabase per week. "This is a world-changing phenomenon" #NanoporeConf






□ The @nanopore PromethION (48 flow cells), with a theoretical maximum throughput of 15 Tb per run, or 5,769 Gb per day, is now on par with the announced but not yet released @MGI_BGI T7 machine (not available until Q2/Q3 2019)




□ CoCo: RNA-seq Read Assignment Correction for Nested Genes and Multimapped Reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/29/477869.full.pdf

CoCo corrects the gene annotation used by read-assignment software such as featureCounts or HTSeq so that read counts are properly evaluated for embedded genes such as snoRNAs, which overlap features of their host gene's transcripts such as retained introns and exons. The second part of the correction distributes multi-mapped reads in proportion to the read counts obtained from uniquely mapped reads. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates of both coding and non-coding RNAs, as validated by PCR and bedgraph comparisons.




□ A Bayesian mixture modelling approach for spatial proteomics:

>> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006516

A Bayesian generative classifier based on Gaussian mixture models assigns proteins probabilistically to sub-cellular niches, so that each protein has a probability distribution over sub-cellular locations; inference uses the expectation-maximisation algorithm as well as Markov chain Monte Carlo. Outliers are often dispersed, and this additional component is therefore described by a heavy-tailed distribution, the multivariate Student's t-distribution, leading to a T-Augmented Gaussian Mixture model (TAGM).




□ Limits to a classic paradigm: Most transcription factors regulate genes in multiple biological processes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/28/479857.full.pdf

In this scenario, general regulons show the regulatory potential of TFs, but the specific subset of genes in the regulon that is expressed at a certain time is defined by the combinatorial logic of the TFs bound to each gene's promoter. Dissecting the molecular decision-making processes associated with changes in growth conditions at a genomic level is feasible with current technologies.




□ GRAM: A generalized model to predict the molecular effect of a non-coding variant in a cell-type specific manner:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/29/482992.full.pdf

Using a LASSO-regularized linear model, transcription factor binding is the most predictive feature, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other functional-impact predictors, contributes almost nothing. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. GRAM was therefore implemented by integrating SELEX features with expression profiles. The GRAM model will be a useful tool for elucidating the underlying patterns of variants that modulate expression in a cell-type-specific context, and future studies can use it for in-depth investigation by leveraging the accumulating data from multiple cell lines.




□ FastqPuri: high-performance preprocessing of RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/29/480707.full.pdf

FastqPuri provides sequence quality reports at the sample and dataset level, with new plots that facilitate decisions on subsequent quality filtering. When using Bloom filters to screen out potential contamination against larger references (e.g., genomes), FastqPuri was faster than BioBloom Tools at building the Bloom filter but slightly slower at classifying sequences.
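
A minimal Bloom filter sketch of the screening idea only (not FastqPuri's implementation; the k-mer length and filter parameters are arbitrary): index k-mers of a contaminant reference, then count how many k-mers of a read hit the filter.

import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 20, n_hashes=4):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, item):
        for i in range(self.n_hashes):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):       # may give false positives, never false negatives
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

contaminant = "ACGTACGTTTGCAAGGCTAGCTAGGATCCAAGT"
bf, k = BloomFilter(), 15
for i in range(len(contaminant) - k + 1):
    bf.add(contaminant[i:i + k])

read = "TTGCAAGGCTAGCTAGG"
hits = sum(read[i:i + k] in bf for i in range(len(read) - k + 1))
print(hits, "of", len(read) - k + 1, "read k-mers hit the contaminant filter")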




□ scGen: Generative modeling and latent space arithmetics predict single-cell perturbation response across cell types, studies and species:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/29/478503.full.pdf

scGen is a model combining variational autoencoders and latent-space vector arithmetic for high-dimensional single-cell gene expression data. scGen learns cell-type- and species-specific responses, implying that it captures features that distinguish responding from non-responding genes and cells. By adequately encoding the original expression space into a latent space, it achieves simple, near-linear mappings for highly non-linear sources of variation in the original data, which explain a large portion of the variability.
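
A hedged sketch of the latent-space arithmetic on random placeholder matrices, using PCA as a crude stand-in for scGen's VAE encoder/decoder: the perturbation is captured as a difference of mean latent vectors and then added to unperturbed cells of a held-out cell type.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_genes = 2000
ctrl_trainA   = rng.normal(0.0, 1, (400, n_genes))     # cell type A, control
stim_trainA   = rng.normal(0.5, 1, (400, n_genes))     # cell type A, stimulated
ctrl_heldoutB = rng.normal(0.0, 1, (200, n_genes))     # cell type B, control only

pca = PCA(n_components=50).fit(np.vstack([ctrl_trainA, stim_trainA, ctrl_heldoutB]))
encode, decode = pca.transform, pca.inverse_transform

# Latent-space arithmetic: delta = mean(stimulated) - mean(control).
delta = encode(stim_trainA).mean(axis=0) - encode(ctrl_trainA).mean(axis=0)

# Predict the unseen stimulated state of cell type B.
pred_stim_B = decode(encode(ctrl_heldoutB) + delta)
print(pred_stim_B.shape)       # (200, 2000): predicted perturbed expression profiles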




□ CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise:

>> https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1590-2

The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. The study also detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells.

The novel genes and transcripts in CHESS were assembled using a genome-guided pipeline including HISAT2 and StringTie. All samples were subjected to deep RNA sequencing, with tens of millions of sequences ("reads") captured from each sample.




□ NucBreak: Location of structural errors in a genome assembly by using paired-end Illumina reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/29/393488.full.pdf

NucBreak is aimed at detecting structural errors in assemblies, including insertions, deletions, duplications, inversions, and different inter- and intra-chromosomal rearrangements. NucBreak analyses the alignments of reads properly mapped to an assembly and exploits information about alternative read alignments.

The authors compared NucBreak with existing assembly accuracy assessment tools, namely Pilon, REAPR, and FRCbam, as well as with several structural variant detection tools, including BreakDancer, Lumpy, and Wham, using both simulated and real datasets. The benchmarking results show that NucBreak generally predicts assembly errors of different types and sizes with relatively high sensitivity and with higher precision than the other tools.






□ NucMerge: Genome assembly quality improvement assisted by alternative assemblies and paired-end Illumina reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/30/483701.full.pdf

The tool corrects insertion, deletion, substitution, and inversion errors and locates different inter- and intra-chromosomal rearrangement errors. NucMerge was compared to two existing alternatives, namely Metassembler and GAM-NGS. The results have shown that the error detection approach used in NucMerge is more effective than the CE-statistics and depth-of-coverage analysis.




□ Random Tanglegram Partitions (Random TaPas): An Alexandrian Approach to the Cophylogenetic Gordian Knot:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/29/481846.full.pdf

Random Tanglegram Partitions (Random TaPas) that applies a given global-fit method to random partial tanglegrams of a fixed size to identify the associations, terminals and nodes that maximize phylogenetic congruence. In addition, with time-calibrated trees, Random TaPas is also efficient at distinguishing cospeciation from pseudocospeciation. Random TaPas can handle large tanglegrams in affordable computational time and incorporates phylogenetic uncertainty in the analyses.




□ Linked-read sequencing of gametes allows efficient genome-wide analysis of meiotic recombination:

>> https://www.biorxiv.org/content/biorxiv/early/2018/11/30/484022.full.pdf

a highly efficient method for genome-wide identification of COs at kilobase resolution in pooled recombinants. The simplicity of this approach now enables the simultaneous generation and analysis of multiple CO landscapes and thereby allows for efficient comparison of genotypic and environmental effects on recombination, accelerating the pace at which the mechanisms for the regulation of recombination can be elucidated.






□ TorchCraftAI and CherryPi: a machine learning model for high-level strategy selection

>> https://torchcraft.github.io/TorchCraftAI/blog/2018/11/28/build-order-switch-retraining-has-arrived.html

TorchCraftAI: distributed RL environment.
CherryPI: modular StarCraft bot with a hybrid architecture combining rules/search and deep learning.




□ PoreOver: Nanopore basecalling in TensorFlow:

>> https://github.com/jordisr/poreover

PoreOver is a neural network basecaller for the Oxford Nanopore sequencing platform and is under active development. It is intended as a platform on which to explore new algorithms and architectures for basecalling. The current version uses a bidirectional RNN with LSTM cells and CTC loss to call bases from raw signal, and has been inspired by other community basecallers such as DeepNano and Chiron.





Hammock / "Universalis"

2018-12-01 12:54:42 | music18


□ Hammock / "Universalis"

>> https://www.hammockmusic.com/universalis

Release Date; 07/Dec/2018
Label; Blue Raft Music (BMI) / Celestial Sphere (ASCAP)

>> tracklisting.

01. Mouth to Dust... Waiting
02. Scattering Light
03. Universalis
04. Cliffside
05. Always Before Your Eyes
06. We Are More Than We Are
07. Tether of Yearning
08. Clothed with Sky
09. Thirst
10. We Watched You Disappear
11. Tremendum


Produced by Marc Byrd & Andrew Thompson
All songs written by Marc Byrd & Andrew Thompson

Art Direction by The Fuel And Lumber Company
Drawings by Pete Schulte
Layout by Ben Walker
Artwork photographed by Jonathan Purvis

Angelic vocals by Christine Glass Byrd
Additional keys on tracks 4 & 8 by Matt Kidd
Additional editing on tracks 2, 4 & 6 by Matt Kidd
Drums and percussion on tracks 2, 4, 6 and 9 by Ken Lewis
Engineered by Billy Whittington in Nashville, TN

Nashville’s Hammock follows up 2017’s critically-acclaimed Mysterium with Universalis, the second installment of a planned three-album series. While Mysterium took listeners down a horizontal path that explored themes of death and grief, Universalis begins a vertical, upward movement back toward the light.

At times, Universalis calls back to Hammock’s 2006 album Raising Your Voice… Trying to Stop an Echo, while also retaining Mysterium’s deep ambient, neoclassical style. The band also finds itself rediscovering some of its earliest influences, including Low and Red House Painters. With Universalis, Hammock invites listeners to lose themselves within its layers of sound, while also embracing the beauty in its raw openness and silence.


The second installment in the Nashville shoegaze/ambient duo's planned album trilogy.
A forward-looking, transparent record that returns to the band's early sound.

Their music strips away every barrier in front of you
and sets you on a horizon from which you can see endlessly far, across past and future alike.





MacBook Air 2018.

2018-12-01 12:07:59 | デジタル・インターネット


□ MacBook Air 2018 13.3-inch Retina. Space Gray/1.6GHz/8GB/256GB

>> https://www.apple.com/jp/macbook-air/



A Space Gray Air! ✨ Do I need any other reason? 😎
A sculpture-like slab, with functionality and a design philosophy that are more than enough for private use. The typing feel is pleasant.




Compared with the previous model it looks like this: the screen is the same 13.3 inches, but the body is this much more compact 💻✨
And the Space Gray color really does look good ✨ it suits the reflections of the curved surfaces 😎