lens, align.

Long is the time, but the true comes to pass.

Tesseract.

2018-06-06 06:06:06 | Science News


□ Using a zeta value to transform a shape can connect it to a seemingly unrelated geometric space. Now we have the math to explain the connection.

>> https://www.quantamagazine.org/three-decades-later-mystery-numbers-explained-20180503/




□ Alevin: An integrated method for dscRNA-seq quantification:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/01/335000.full.pdf

Alevin is an end-to-end quantification pipeline that starts from sample-demultiplexed FASTQ files and generates gene-level counts for two popular droplet-based sequencing protocols (Drop-seq and 10x Chromium). Alevin enables full, end-to-end analysis of a single-cell human experiment consisting of ∼4500 cells and 335 million reads, using 13G of RAM and 8 threads (of an Intel Xeon E5-2699 v4 CPU), in 27 minutes.




□ Detecting differential transcription factor activity from ATAC-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/06/315622.full.pdf

Given most eRNAs originate from areas of open chromatin and many transcription factors can alter chromatin accessibility, it is perhaps unsurprising that differential chromatin accessibility can be used to infer changes in TF activity. However, it remains unclear whether the observed alterations of chromatin reflect a distinct functional activity of transcription factors or are simply a side effect of DNA binding and/or altering transcription.




□ Massive single-cell RNA-seq analysis and imputation via deep learning:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/06/315556.full.pdf

scScope is a scalable deep-learning based approach that can accurately and rapidly identify cell-type composition from millions of noisy single-cell gene-expression profiles. A major innovation of scScope is the design of a self-correcting layer. This layer exploits a recurrent network structure to iteratively perform imputations on zero-valued entries of input scRNA-seq data.




□ Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading:

>> https://doi.org/10.1093/bioinformatics/bty380

The authors unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting the SIMD instructions of modern processors. They add two layers of thread-level parallelization: a) distributing many independent alignments across multiple threads and b) inherently parallelizing a single alignment computation using a work-stealing approach that produces a dynamic wavefront progressing along the minor diagonal.






□ AlbertVilella:

Added 40 PromethION 24fcells to the http://tinyurl.com/ngsspecs worksheet of high-throughput NGS sequencers install base.

>> http://omicsomics.blogspot.co.uk/2018/05/promethion-racing-call-to-post.html

With 48 runs delivering between 50 and over 80 gigabases per flowcell, a 30X human genome can be reliably generated with just 2 flowcells. In the first few hours, data races out at a 2.5Gb/hour clip. PromethION sequencing is fast, at 450 bases per second, and a 30X genome can be generated even faster by throwing more machines at the problem. If libraries generate 2.5Gb per hour, then one could generate the requisite data in about three quarters of an hour.
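
The arithmetic behind these figures can be sketched as follows. The genome size of roughly 3.1 Gb and the choice of 48 flowcells running in parallel are assumptions made for illustration, not numbers taken from the post; with them, the "three quarters of an hour" estimate comes out about right.

```python
import math

# Assumed values (not stated in the post): haploid human genome size and
# the number of flowcells running in parallel.
GENOME_GB = 3.1
PARALLEL_FLOWCELLS = 48


def flowcells_needed(coverage, yield_gb_per_flowcell, genome_gb=GENOME_GB):
    """Flowcells required to reach a coverage at a given total yield per flowcell."""
    return math.ceil(coverage * genome_gb / yield_gb_per_flowcell)


def hours_needed(coverage, gb_per_hour_per_flowcell, n_flowcells, genome_gb=GENOME_GB):
    """Wall-clock hours to reach a coverage when n_flowcells run at the same rate."""
    return coverage * genome_gb / (gb_per_hour_per_flowcell * n_flowcells)


# 30X at the low end of the quoted yield (50 Gb per flowcell).
n_cells = flowcells_needed(30, 50)
# 30X at 2.5 Gb/hour per flowcell across the assumed 48 flowcells.
hours = hours_needed(30, 2.5, PARALLEL_FLOWCELLS)
print(n_cells, round(hours, 3))
```

Under these assumptions a 30X genome is about 93 Gb of data, so two flowcells suffice at 50 Gb each, and 48 flowcells at 2.5 Gb/hour need a bit under 0.8 hours.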







□ MetaXcan: Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics:

>> https://www.nature.com/articles/s41467-018-03621-1

Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. The authors derive a mathematical expression that allows the results of PrediXcan to be computed without individual-level data, greatly expanding its applicability. They define a general framework (MetaXcan) to integrate eQTL information with GWAS results and map disease-associated genes.






□ ReGAN: RE[LAX|BAR|INFORCE] based Sequence Generation using GANs:

>> https://arxiv.org/pdf/1805.02788.pdf

A comparative study of recent unbiased, low-variance gradient estimation techniques such as REBAR, RELAX and REINFORCE. The discrete space issue is side-stepped without using gradient estimators by letting the discriminator see a sequence of probabilities over every token in the vocabulary from the generator and a sequence of one-hot vectors from the true data distribution.




□ The variations of human miRNAs and Ising like base pairing models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/10/319301.full.pdf

The Ising model with nearest-neighbour interactions and chemical potential serves as a better descriptor of the miRNA data than the binomial model. There is hardly any difference between the finite and large-N Ising models in describing the miRNA data, so the finite number of elements in the string does not play an important role in the approximate description.






□ Enzymatic evolution driven by entropy production:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/13/319814.full.pdf

By diminishing the number of metastable intermediate states, the total entropy produced decreases and consequently the enzyme kinetics and the thermodynamic efficiency are enhanced. Minimizing locally the total entropy produced for an enzymatic process with metastable intermediate states, the kinetics and the thermodynamic efficiency are raised. In contrast, in the absence of metastable intermediate states, a maximum of the entropy produced results in an improvement of the kinetic performance although the thermodynamic efficiency diminishes.




□ GraphSeq: Accelerating String Graph Construction for De Novo Assembly on Spark:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/14/321729.full.pdf

ADAM transforms FASTQ data into alignment records in Parquet format. GraphSeq loads all of the reads in parallel, generates all suffixes, groups suffixes with the same initial string into the same partition, and applies the string graph construction algorithm to the partitions in parallel. GraphSeq is >13X faster than the SGA overlap implementation and computes the string graph of the 38X WGS PE data (NA12878 provided by 10X Genomics) in ~2 hours.




□ Beyond SNPs: how to detect selection on transposable element insertions

>> https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12781




□ Bit-parallel sequence-to-graph alignment:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/15/323063.full.pdf

The authors generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers’ bitvector algorithm for semi-global alignment. Their bitvector-based graph alignment algorithm reaches a worst-case runtime of O(V + ⌈m/w⌉ E log w) for acyclic graphs and O(V + mE log w) for arbitrary cyclic graphs.
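
For orientation, here is a minimal sketch of the classical Shift-And algorithm on a plain linear text, which is the starting point the paper generalizes. It is not the graph version from the paper; the function name and the use of Python integers as unbounded bitvectors are choices made for this illustration.

```python
def shift_and(pattern, text):
    """Return start positions of exact occurrences of pattern in text (Shift-And)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # Bit i of masks[ch] is set when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (len(pattern) - 1)
    state = 0  # bit i set: pattern[: i + 1] matches the text ending at the current position
    hits = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one at bit 0.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - len(pattern) + 1)
    return hits


print(shift_and("ACA", "GACACAT"))
```

Each text character costs one shift, one OR and one AND per machine word of pattern, which is where the ⌈m/w⌉ factor in bit-parallel bounds comes from.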




□ Fast Nonnegative Matrix Factorization and Applications to Pattern Extraction, Deconvolution and Imputation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/15/321802.full.pdf

The authors adapt sequential coordinate-wise descent (SCD) to the KL divergence and apply SCD to NMF via the alternating scheme. Both SCD and Lee’s multiplicative algorithms with square error loss have a complexity of O(((m+n)k^2 N_i + 2nmk) N_o), while their KL counterparts have a complexity of O(nmk^2 N_i N_o).




□ Ularcirc: Visualisation and enhanced analysis of circular RNAs via back and canonical forward splicing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/15/318436.full.pdf

Ularcirc is presented as the first software tool that provides a complete circRNA workflow from detection, integrated visualization, and quality filtering of back-splice junctions (BSJ) and forward splicing junctions (FSJ), through to sequence retrieval and downstream functional analysis. Ularcirc uses an innovative method to filter out false-positive circRNAs, called the read alignment distribution (RAD) score, which allows detection of circRNAs independent of gene annotations.




□ Annotation of phenotypes using ontologies: a Gold Standard for the training and evaluation of natural language processing systems:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/15/322156.full.pdf



□ A probabilistic and multi-objective analysis of lexicase selection and ε-lexicase selection:

>> https://www.mitpressjournals.org/doi/abs/10.1162/evco_a_00224?journalCode=evco

ε-lexicase selection outperforms several diversity-maintenance strategies on a number of real-world and synthetic regression problems. The authors expand upon the relation of lexicase selection to many-objective optimization methods to describe the behavior of lexicase selection, which is to select individuals on the boundaries of Pareto fronts in high-dimensional space.




□ GRIMM: GRaph IMputation and Matching for HLA Genotypes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/16/323493.full.pdf

Using graph traversal, our algorithm runtime grows slowly with registry size. This implementation generates results that agree with consensus output on a publicly available match algorithm cross-validation dataset.




□ scVAE: Variational auto-encoders for single-cell gene expression data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/16/318295.full.pdf

For the full PBMC data set, a high-dimensional latent space of 100 dimensions resulted in the highest test marginal log-likelihood lower bound, whereas for the two smaller subsets, a lower-dimensional latent space of 25 dimensions gave the best lower bound. scVAE supports several count likelihood functions, and a variant of the variational auto-encoder has a priori clustering in the latent space.






□ LUNA DNA: "Blockchain Genomics" series, an interview with luna_dna co-founder Dawn Barry.

>> http://sandiegomics.com/?p=241
>> https://www.lunadna.com






□ LncADeep: An ab initio lncRNA identification and functional annotation tool based on deep learning:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty428/5021677

LncADeep integrates intrinsic and homology features into a deep belief network and constructs models targeting both full- and partial-length transcripts. For functional annotation, LncADeep predicts a lncRNA’s interacting proteins based on deep neural networks, using both sequence and structure information.






□ From genome-wide associations to candidate causal variants by statistical fine-mapping:

>> https://www.nature.com/articles/s41576-018-0016-z

This review covers interpreting results from genome-wide association studies (GWAS), the role of linkage disequilibrium, statistical fine-mapping approaches, trans-ethnic studies, genomic annotation and data integration, and other analysis and design issues.




□ Sequanix: A Dynamic Graphical Interface for Snakemake Workflows:

>> https://academic.oup.com/bioinformatics/article/34/11/1934/4817647

Sequanix is a PyQt graphical user interface aimed at democratizing the use of Snakemake pipelines in the NGS space and beyond. By default, Sequanix includes the Sequana NGS pipelines (Snakemake format), and is also capable of loading any external Snakemake pipeline. The Snakemake framework scales without modification, from single and multi-core workstations to cluster engines.






□ The evolutionary game theory of interspecific mutualism in multi-species communities:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/30/335133.full.pdf

Mutualistic interspecific interactions, including Müllerian mimicry and division of labor, are common in nature. In contrast to antagonistic interactions, where faster evolution is favored, mutualism can favor slower evolution under some conditions. This is called the Red King effect. A correlation does not always exist between the evolutionary rates and the probability that the evolutionary dynamics converge to a favorable equilibrium for each species.




□ ClusterMap: Compare analysis across multiple Single Cell RNA-Seq profiling:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/30/331330.full.pdf




□ ProximID: Mapping the physical network of cellular interactions:

>> https://www.nature.com/articles/s41592-018-0009-z

ProximID is an approach for building a cellular network based on physical cell interaction and single-cell mRNA sequencing. The authors show that it can be used to discover new preferential cellular interactions without prior knowledge of component cell types.






□ SCHiRM: Single Cell Hierarchical Regression Model to detect dependencies in read count data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/31/335695.full.pdf

SCHiRM uses the Poisson-log normal distribution and, by means of a hierarchical formulation, detects the dependencies between genes using a linear regression model for the latent, cell-specific gene expression rate parameters. The hierarchical formulation makes it possible to model count data without artificial data transformations and to incorporate normalization information directly into the latent layer of the model.




□ cytometree: a binary tree algorithm for automatic gating in cytometry analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/31/335554.full.pdf

Gating in cytometree is basically done through recursive thresholding of marginal densities based on the assumption that cells express or do not express certain markers, leading to bimodality.




□ Principal Component Analysis applied directly to Sequence Matrix:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/31/336115.full.pdf

The differences among samples and the bases that contribute to those differences should be observed together. To achieve this, the sequence matrix is transformed into a boolean vector and analyzed directly using PCA.
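
A minimal sketch of the idea, assuming a one-hot (boolean) encoding per position and base followed by an SVD-based PCA. The encoding layout, function names and toy sequences are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def one_hot_sequences(seqs, alphabet="ACGT"):
    """Encode equal-length sequences as a boolean matrix with one column per (position, base)."""
    index = {base: i for i, base in enumerate(alphabet)}
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(alphabet)))
    for row, seq in enumerate(seqs):
        for pos, base in enumerate(seq):
            if base in index:  # unknown symbols stay all-zero
                X[row, pos * len(alphabet) + index[base]] = 1.0
    return X


def pca(X, n_components=2):
    """PCA via SVD of the column-centered matrix; returns sample scores and loadings."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]
    return scores, loadings


# Toy alignment (hypothetical data): samples differ at positions 0 and 4.
seqs = ["ACGTAC", "ACGTAC", "ACGTTC", "TCGTAC", "TCGTTC"]
X = one_hot_sequences(seqs)
scores, loadings = pca(X)
print(scores.round(3))
```

The scores separate the samples, while the loadings show which (position, base) columns drive each component, so both views come from the same decomposition.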




□ Identification of biological mechanisms by semantic classifier systems:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/31/335737.full.pdf

a semantic approach to the issue: a multi-classifier system which incorporates existing biological knowledge and returns interpretable models based on these high-level semantic terms. The individual predictions of the semantic base classifiers are merged on a symbolic level by a late-aggregation strategy.




□ Lightweight bioinformatics: evaluating the utility of Single Board Computer (SBC) clusters for portable, scalable Real-Time Bioinformatics in fieldwork environments via benchmarking:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/02/337212.full.pdf

Briefly, DNA was extracted from fresh plant tissue using commercial kits (Qiagen DNEasy Plant Miniprep) and whole genome shotgun libraries were prepared for MinION R9 and R9.5 chemistry using rapid (SQK-RAD001/RAD003) protocols and kits.






□ R2C2: Improving nanopore read accuracy enables the sequencing of highly-multiplexed full-length single-cell cDNA:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/04/338020.full.pdf

The Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. The resulting raw reads are split into subreads containing full-length or partial cDNA sequences, which are combined into accurate consensus sequences using the C3POa workflow, which relies on a custom algorithm to detect DNA splints as well as on poaV2 and racon.




□ An Integrative Boosting Approach for Predicting Survival Time With Multiple Genomics Platforms:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/04/338145.full.pdf

To assess whether the genomic variables provide extra predictive power in the presence of the clinical variables, the authors compared the values of the C-index obtained by I-Boost-CV or I-Boost-Permutation under the model with clinical variables only and under the model with both clinical and genomic variables.





Planet 9.

2018-06-06 06:05:06 | Science News



□ zetadiv: an R package for computing compositional change across multiple sites, assemblages or cases:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/324897.full.pdf

Using orders of zeta beyond pairwise comparisons makes it possible to further refine the uncertainty level of the remaining clusters by distinguishing between clusters with low (i.e. superficial) and high similarity for higher orders of zeta.




□ DNAnexus validated deep learning methods using DeepVariant and Clairvoyante on BGI-SEQ data in a http://SV.AI hackathon

>> http://ow.ly/KmzI30khbQ1






□ Full Bayesian comparative phylogeography from genomic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/324525.full.pdf

If we have N pairs of populations, we would like to assign them to an unknown number of divergence events, k, which can range from one to N. For a given number of divergence events, the Stirling number of the second kind tells us the number of ways of assigning the taxa to the divergence times (i.e., the number of models with k divergence-time parameters).
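
The counting argument can be made concrete with the standard recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1) for Stirling numbers of the second kind. The function names below are illustrative and not taken from the paper's software.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n labelled items into k non-empty unlabelled groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing groups or forms a new group alone.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def divergence_models(n_pairs):
    """Map each number of divergence events k to the number of ways to assign the pairs."""
    return {k: stirling2(n_pairs, k) for k in range(1, n_pairs + 1)}


models = divergence_models(5)
print(models, sum(models.values()))
```

Summing over k gives the Bell number, the total number of divergence models, which grows quickly with the number of population pairs.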




□ Repression of divergent noncoding transcription by a sequence-specific transcription factor:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/314310.full.pdf

A sequence-specific transcription factor limits access of basal transcription machinery to regulatory elements and adjacent sequences that act as divergent cryptic promoters, thereby providing directionality towards productive transcription.




□ Synthetic STARR-seq reveals how DNA shape and sequence modulate transcriptional output and noise:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/18/325910.full.pdf

The synSTARR-seq approach is used to systematically analyze how variations in the recognition sequence of the glucocorticoid receptor (GR) affect transcriptional output. This resulted in the identification of a novel highly active GR binding sequence and revealed that sequence variation both within and flanking GR's core binding site modulates its activity without apparent changes in DNA binding affinity.




□ starmap: Immersive visualisation of single cell data using smartphone-enabled virtual reality:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/324855.full.pdf
>> https://vccri.github.io/starmap/

starmap is a web-based VR-enabled tool which combines a 3D scatter plot with star plots (radar charts) to visualise hundreds of thousands of multivariate data points, such as single-cell expression data. Its scalable visual design combines the benefit of a three-dimensional scatter plot for exploring clustering structure with the benefit of star plots for multivariate visualisation of an individual cell, and is designed to utilise low-cost VR headsets.




□ FALCON-Phase: Integrating PacBio and Hi-C data for phased diploid genomes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/327064.full.pdf

FALCON-Phase is a new method that resolves phase switches by reconstructing contig-length phase blocks using Hi-C short reads mapped to both homozygous regions and phase blocks. Such Hi-C data contain ultra-long-range phasing information. FALCON-Phase is 96% accurate, suggesting that Hi-C proximity information can be used to correct nearly all haplotype switches along FALCON-Unzip primary contigs and replace the requirement of parental genotype information. The FALCON-Phase pipeline can also be applied to scaffolds to produce chromosome-scale phased diploid genome assemblies.






□ A population genetic interpretation of GWAS findings for human quantitative traits:

>> http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2002985

The model assumes stabilizing selection in a multidimensional phenotype space, akin to Fisher’s geometric model. An individual’s phenotype is a vector in an n-dimensional Euclidean space, in which each dimension corresponds to a continuous quantitative trait. The directions of mutations are assumed to be isotropic, i.e., uniformly distributed on the hypersphere in n dimensions defined by their size; the results are robust to relaxing this assumption as well.
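
One common way to draw isotropic directions is to normalize standard Gaussian vectors, which yields points uniformly distributed on the unit hypersphere. The sketch below uses that construction; the function name, the fixed mutation size and the number of traits are assumptions for illustration, not parameters from the paper.

```python
import numpy as np


def isotropic_mutations(n_traits, sizes, rng=None):
    """Draw mutation effect vectors with given sizes and uniformly random directions."""
    rng = np.random.default_rng() if rng is None else rng
    sizes = np.asarray(sizes, dtype=float)
    # Normalized Gaussian vectors are uniform on the unit hypersphere.
    directions = rng.standard_normal((sizes.size, n_traits))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return directions * sizes[:, None]


# Hypothetical example: 5000 mutations of size 0.3 in a 10-trait phenotype space.
muts = isotropic_mutations(10, np.full(5000, 0.3), rng=np.random.default_rng(1))
print(np.linalg.norm(muts, axis=1)[:3], muts.mean(axis=0).round(3))
```

Every vector has exactly the requested size, and the average effect on each trait is close to zero because no direction is favoured.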




□ Reinforced Adversarial Neural Computer for De Novo Molecular Design:

>> https://pubs.acs.org/doi/10.1021/acs.jcim.7b00690

RANC (Reinforced Adversarial Neural Computer) is a deep neural network architecture for the de novo design of novel small-molecule organic structures based on generative adversarial network (GAN) paradigm and reinforcement learning. As a generator RANC uses a Differentiable neural computer (DNC), a category of neural networks, with increased generation capabilities due to the addition of an explicit memory bank, which can mitigate common problems found in adversarial settings.




□ EternaBrain: Automated RNA design through move sets from an Internet-scale RNA videogame:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/326736.full.pdf

When pipelined with hand-coded move combinations developed by the Eterna community, the resulting EternaBrain method solves 61 out of 100 independent RNA design puzzles in the Eterna100 benchmark. EternaBrain surpasses all six other prior algorithms that were not informed by Eterna strategies and suggests a path for automated RNA design to achieve human-competitive performance.




□ NASA Sends New Research on Orbital ATK Mission to Space Station

>> https://www.nasa.gov/press-release/nasa-sends-new-research-on-orbital-atk-mission-to-space-station

Astronauts soon will have new experiments to conduct related to emergency navigation, DNA sequencing and ultra-cold atom research when the research arrives at the International Space Station following the 4:44 a.m. EDT Monday launch of an Orbital ATK Cygnus spacecraft.






□ DISSEQT - DIStribution based modeling of SEQuence space Time dynamics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/327338.full.pdf

DISSEQT is centered around robust dimension and model reduction algorithms for analysis of genotypic data with additional capabilities for including phenotypic features to explore dynamic genotype-phenotype maps. Nonmetric multidimensional scaling using Kruskal’s stress criterion was used rather than classical multidimensional scaling in the final step of the Isomap algorithm.




□ The EVcouplings Python framework for coevolutionary sequence analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/326918.full.pdf

The key prerequisite to infer evolutionary couplings is a high-quality sequence alignment of sufficient evolutionary depth. EVcouplings supports the generation of alignments using jackHMMER and HMMsearch, as well as the import of externally computed alignments. Since the identification of an optimal evolutionary depth can be challenging, the application supports the parallel exploration of different sequence inclusion thresholds, provides flexible alignment filtering parameters, and calculates summary statistics to assess alignment quality.




□ A data mining paradigm for identifying key factors in biological processes using gene expression data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/327478.full.pdf

A data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of data, and evaluation of consistent signals. The authors selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, demonstrating the ability of this paradigm to identify new factors in biological processes.






□ A Generative Bayesian Approach for Incorporating Biosurveillance Sources into Epidemiological Models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/22/328518.full.pdf

Most emerging data sources come without a large corpus of historical data to be used to train them, and they do not allow the incorporation of prior or system knowledge to capture the interrelationships between data sources. To remedy this, the current work built a generative model for information fusion of diverse data types. This uses a multilevel Bayesian “information model” that enables easy expansion to include new biosurveillance data sources. This information model is used to estimate the inputs of a standard compartmental disease model, and this combination of a theory-based SEIR model and a statistical Markov-chain Monte Carlo model has advantages for a variety of applications.




□ An analysis and comparison of the statistical sensitivity of semantic similarity metrics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/22/327833.full.pdf

A statistical sensitivity comparison of 5 semantic similarity metrics (Jaccard, Resnik, Lin, Jiang & Conrath, Hybrid Relative Specificity Similarity) representing 3 different kinds of metrics (edge based, node based, hybrid), exploring how key parameter choices can impact sensitivity. To evaluate sensitivity in a controlled fashion, the authors explore two different models for simulating data with varying levels of similarity and compare to the noise distribution using resampling. This increases the confidence in the generality of the results, although their evaluation is limited to one context: comparison among profiles sampled from the Entity-Quality phenotype annotations in the Phenoscape KB.




□ Pore-C: long-read chromatin conformation capture, collaboration with Imielinski lab at NYGC:

>> https://nanoporetech.com/resource-centre/posters/pore-c-using-nanopore-reads-delineate-long-range-interactions-between #nanoporeconf




□ plyranges: A grammar of genomic data transformation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/23/327841.full.pdf

One existing example is the Genome Query Language (GQL) and its distributed implementation GenAp, which use an SQL-like syntax for fast retrieval of information from unprocessed sequencing data. Similarly, the Genometric Query Language (GMQL) implements a relational algebra for combining genomic datasets. The authors have created a genomic DSL (domain-specific language) called plyranges that reformulates notions from existing genomic algebras and embeds them in R as a genomic extension of dplyr.




□ MXP: Modular eXpandable framework for building bioinformatics Pipelines:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/23/329110.full.pdf

The authors developed MXP and tested it on various tasks in their organization, primarily for building pipelines for GWAS (Genome-Wide Association Studies) and post-GWAS analysis. At least two languages are involved in the construction of a tool for building pipelines: first, an implementation language (which may be a combination of languages), and second, a domain-specific language (DSL), which is used to specify a pipeline.






□ nanopore:
CB: We are planning to integrate MinION and MinIT for a total sequencing and analysis solution – MinION Mk1c #nanoporeconf


□ metrichor:
CB: Lots of changes afoot for Metrichor’s EPI2ME platform including running locally on MinIT/GridION/PromethION #nanoporeconf




□ crossword: A data-driven simulation language for the design of genetic-mapping experiments and breeding strategies:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/25/330563.full.pdf

crossword is a domain-specific language that allows complex and unique simulations to be performed, and the language is supported by a graphical interface that guides users through functions and options. The authors demonstrate crossword’s utility in QTL-seq design, where its output accurately reflects empirical data. The “QTN_random” phenotyping method was used to sample 60 QTNs for each of 10 iterations, and the QTN effects were assigned using a gamma distribution.




□ OUTRIDER: A statistical method for detecting aberrantly expressed genes in RNA sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/24/322149.full.pdf

The algorithm uses an autoencoder to model read count expectations according to the co-variation among genes resulting from technical, environmental, and common genetic variations, together with a statistical test based on a negative binomial (NB) distribution. The dimension of the autoencoder was fitted using a scheme in which artificially corrupted read counts were injected and presented as the input to the autoencoder, maximizing the likelihood of the original, uncorrupted data. The optimal encoding dimension is obtained by assessing how well the autoencoder corrects the corrupted data: corrupted read counts are introduced at random with a probability of 10^-2 by shifting the true read counts 2 standard deviations on a log scale, randomly up or down.
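
The corruption step can be sketched roughly as below. This is an illustration of the described scheme, not OUTRIDER's implementation: the log(count + 1) transform, the per-gene standard deviation, the rounding back to counts and the function name are all assumptions.

```python
import numpy as np


def inject_corrupted_counts(counts, prob=1e-2, n_sd=2.0, rng=None):
    """Shift a random subset of counts by n_sd standard deviations on a log scale.

    counts is a samples x genes array. Returns the corrupted matrix and the boolean
    mask of corrupted entries. log(count + 1) and per-gene SDs are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    log_counts = np.log(counts + 1.0)
    gene_sd = log_counts.std(axis=0, keepdims=True)
    mask = rng.random(counts.shape) < prob            # entries to corrupt
    signs = rng.choice([-1.0, 1.0], size=counts.shape)  # shift up or down
    shifted = log_counts + signs * n_sd * gene_sd
    corrupted = np.where(mask, np.round(np.exp(shifted) - 1.0), counts)
    return np.clip(corrupted, 0.0, None), mask


# Hypothetical toy data: 200 samples x 50 genes of Poisson counts.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(100, size=(200, 50))
corrupted, mask = inject_corrupted_counts(toy_counts, rng=rng)
print(int(mask.sum()), "entries corrupted")
```

An encoding dimension could then be scored by how well the autoencoder's output recovers the original values at the masked positions.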




□ HaploVectors: an integrative analytical tool for phylogeography:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/25/330761.full.pdf

HaploVectors is presented as an R package for computing haplotypic eigenvectors and performing null model-based tests. Investigation of HaploVectors using empirical datasets showed that the method is useful for uncovering hidden patterns of haplotypic distribution. It is a flexible tool for exploring phylogeographical patterns and discriminating the biogeographic, neutral and environmental factors that shape genetic distribution across space.




□ Solving scaffolding problem with repeats:

>> http://biorxiv.org/cgi/content/short/330472v1




□ REVA: a rank-based multi-dimensional measure of correlation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/25/330498.full.pdf

REVA is a nonparametric statistic inspired by the Kendall rank correlation coefficient. The authors use U-statistic theory to derive the asymptotic distribution of the new correlation coefficient, developing additional large and finite sample properties along the way. Using a very different, generalized approach to the same problem, earlier work introduced the Hilbert-Schmidt Independence Criterion, a kernel dependence test in multidimensional Euclidean spaces, building on earlier kernel methods like N-distances.




□ Temporal alignment and latent Gaussian process factor inference in population spike trains

>> http://biorxiv.org/cgi/content/short/331751v1

Scalable GPFA extension that operates directly on unbinned spike times + time-warping using nested Gaussian Processes.




□ MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms:

>> https://academic.oup.com/mbe/article-abstract/35/6/1547/4990887




□ The structure of the genetic code as an optimal graph clustering problem:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/28/332478.full.pdf

The authors model the genetic code as a partition of an undirected and unweighted graph, which makes the model general and universal. In this view, the structure of the genetic code is a solution to the graph clustering problem. Despite the fact that the standard genetic code is far from being optimal according to the conductance, its structure is characterised by many codon groups reaching the minimum conductance for their size.
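
As a rough illustration, the sketch below computes the conductance of a codon group. It assumes that two codons are connected when they differ by a single nucleotide and uses the textbook conductance definition (cut edges divided by the smaller of the two volumes); both are assumptions of this sketch, and the paper's exact definitions may differ.

```python
from itertools import product

BASES = "ACGU"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]


def neighbors(codon):
    """Codons reachable by a single point mutation (9 per codon)."""
    return [
        codon[:pos] + base + codon[pos + 1:]
        for pos in range(3)
        for base in BASES
        if base != codon[pos]
    ]


def conductance(group):
    """Edges leaving the group divided by the smaller of the two side volumes."""
    group = set(group)
    if not group or len(group) == len(CODONS):
        raise ValueError("group must be a proper, non-empty subset of the codons")
    cut = sum(1 for c in group for nb in neighbors(c) if nb not in group)
    vol_in = 9 * len(group)                    # the graph is 9-regular
    vol_out = 9 * (len(CODONS) - len(group))
    return cut / min(vol_in, vol_out)


alanine = ["GCU", "GCC", "GCA", "GCG"]  # a four-codon family
scattered = ["AAA", "CCC", "GGG", "UUU"]  # no two codons adjacent
print(conductance(alanine), conductance(scattered))
```

In this toy comparison the four-codon family keeps a third of its edges inside the group, while the scattered set of the same size keeps none, which is the sense in which compact codon blocks have low conductance.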




□ Towards a Dynamic Interaction Network of Life to unify and expand the evolutionary theory:

>> https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-018-0531-6

Nothing on Earth evolves and makes sense in isolation, which challenges the key assumption of the Modern Synthesis framework that targeting the individual gene or organism (while knowing in principle that it is part of a set of complex interactions) allows one to capture evolution in all its dimensions. Considering transient collectives as stable entities at a given time scale, when the collectives change much more slowly than the processes in which they take part, amounts to focusing on interactions occurring at that time scale by treating the slower dynamics as stable edges/nodes.




□ Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/28/332825.full.pdf

The new human gene database contains 43,162 genes, of which 21,306 are protein-coding and 21,856 are noncoding, and a total of 323,824 transcripts, for an average of 7.5 transcripts per gene. The expanded gene list includes 4,998 novel genes (1,178 coding and 3,819 noncoding) and 97,511 novel splice variants of protein-coding genes as compared to the most recent human gene catalogs. In total, the 9,795 RNA-seq samples used for alignment and assembly contain 899,960,113,026 reads (449,980,056,513 pairs), an average of 91.9 million reads (46M pairs) per sample.




□ Continuous visualization of differences between biological conditions in single-cell data:

>> http://biorxiv.org/cgi/content/short/337485v1






Quadrilateral.

2018-06-03 03:03:03 | Science News


□ Edward Witten / "Notes on Some Entanglement Properties of Quantum Field Theory":

>> https://arxiv.org/pdf/1803.04993.pdf

The main goal is to explain how to deal with entanglement when -- as in quantum field theory -- it is a property of the algebra of observables and not just of the states.

The infinite-dimensional case becomes essentially different from a finite-dimensional matrix algebra when one considers the behavior of Δ_Ψ^{is} when s is no longer real. For a matrix algebra, there is no problem; Δ_Ψ^{iz} = exp(iz log Δ_Ψ) is an entire matrix-valued function of z. In quantum field theory, Δ_Ψ is unbounded and the analytic properties of Δ_Ψ^{iz} χ for a state χ depend very much on χ.




□ Edward Witten / "A Mini-Introduction To [Quantum] Information Theory":

>> https://arxiv.org/pdf/1805.11965.pdf

Basic properties of the classical Shannon entropy and the quantum von Neumann entropy are described, along with related concepts such as classical and quantum relative entropy, conditional entropy, and mutual information.



How many bits of information can Alice send to Bob by sending a quantum system X with a k-dimensional Hilbert space H? Alice cannot encode more than log k bits of classical information in a k-dimensional quantum state, though it takes strong subadditivity (or equivalents) to prove this.
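
The classical quantities named above can be sketched in a few lines. This covers only the classical side (Shannon entropy and mutual information), and the example distributions are made up for illustration. The uniform distribution over k symbols reaches log2 k bits, the classical counterpart of the log k bound quoted above.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    pxy = [p for row in joint for p in row]
    return entropy(px) + entropy(py) - entropy(pxy)

k = 8
uniform = [1 / k] * k
print(entropy(uniform))  # equals log2(k)

perfectly_correlated = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print(mutual_information(perfectly_correlated))
print(mutual_information(independent))
```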




□ Stochastic Zeroth-order Optimization via Variance Reduction method:

>> https://arxiv.org/pdf/1805.11811v1.pdf

The paper proposes a novel Stochastic Zeroth-order method with Variance Reduction under Gaussian smoothing (SZVR-G) and establishes its complexity for optimizing non-convex problems. With variance reduction on both sample space and search space, the complexity of the algorithm is sublinear in d and is strictly better than current approaches, in both smooth and non-smooth cases. The SZVR-G algorithm is more efficient than both RGF and RSG on a canonical logistic regression problem, and the authors successfully apply it to a real black-box adversarial attack problem that involves high-dimensional zeroth-order optimization.
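
For orientation, here is a minimal sketch of the plain Gaussian-smoothing gradient estimator that zeroth-order methods of this kind build on. It is not SZVR-G itself (there is no variance reduction here), and the test function, step size `mu` and sample count are assumptions chosen for illustration.

```python
import random

def zo_gradient(f, x, mu=1e-3, samples=20000, seed=0):
    """Estimate grad f(x) from function values only.

    Averages (f(x + mu*u) - f(x)) / mu * u over Gaussian directions u.
    """
    rng = random.Random(seed)
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        scale = (f(shifted) - fx) / mu
        for i in range(d):
            grad[i] += scale * u[i] / samples
    return grad

def quadratic(x):
    """Hypothetical black-box objective with known gradient 2*x."""
    return sum(xi * xi for xi in x)

point = [1.0, -2.0, 3.0]
estimate = zo_gradient(quadratic, point)
true_gradient = [2.0 * xi for xi in point]
print(estimate, true_gradient)
```

The estimate is noisy for any finite number of directions, which is the variance that methods such as SZVR-G aim to reduce.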




□ Root-cause Analysis for Time-series Anomalies via Spatiotemporal Graphical Modeling in Distributed Complex Systems:

>> https://arxiv.org/pdf/1805.12296v1.pdf

The authors formulate sequential state switching (S3, based on the free energy concept of a restricted Boltzmann machine, RBM) and artificial anomaly association (A3, a classification framework using deep neural networks, DNN). The S3 and A3 approaches can obtain high accuracy in root-cause analysis under both pattern-based and node-based fault scenarios, in addition to successfully handling multiple nominal operating modes.
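
The free energy that S3 is said to build on can be written down directly for a binary RBM. The sketch below uses the standard textbook form F(v) = -a·v - Σ_j log(1 + exp(b_j + Σ_i v_i W_ij)); the parameter values are placeholders, and how S3 uses this quantity is not shown here.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def free_energy(v, a, b, W):
    """Free energy of visible vector v for a binary RBM.

    a: visible biases, b: hidden biases, W[i][j]: weight between
    visible unit i and hidden unit j.
    """
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = 0.0
    for j, bj in enumerate(b):
        activation = bj + sum(vi * W[i][j] for i, vi in enumerate(v))
        hidden_term += softplus(activation)
    return -visible_term - hidden_term

# Placeholder parameters: 2 visible units, 3 hidden units, zero weights
v = [1, 0]
a = [0.5, -1.0]
b = [0.0, 0.0, 0.0]
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(free_energy(v, a, b, W))
```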






□ Dimension Reduction and Visualization for Single-copy Alignments via Generalized PCA:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/04/338442.full.pdf

The paper applies multiple correspondence analysis (MCA) directly to the sequence characters. p-dimensional single-copy DNA can be transformed into coordinates in genetic space, analogous to the way in which diploid DNA is transformed via PCA. The new vectors are ordered by the amount of variability explained by each ‘principal dimension’. Often the first few dimensions are used to visualise points in the new transformed space.




□ GenotypeTensors: Efficient Neural Network Genotype Callers:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/05/338780.full.pdf

Clairvoyante is described as a convolutional network and therefore must also convert alignment data to three-dimensional tensors. The authors hypothesize that so-far unnoticed software implementation problems in available code bases, and/or insufficient hyper-parameter tuning for Clairvoyante, could be responsible for the differences observed in model performance, rather than these being driven only by differences in model architecture and in the representation of aligned reads in each genomic context.




□ A fast mrMLM algorithm for multi-locus genome-wide association studies:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/07/341784.full.pdf

The authors accelerate the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scan are transformed into simple forms that are easy and efficient to evaluate during the optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes.




□ SVCollector: Optimized sample selection for validating and long-read resequencing of structural variants:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/08/342386.full.pdf

For the topN mode, it picks the samples with the largest number of SVs, irrespective of whether the SVs are shared with other samples. For the greedy mode, it finds a set of samples that collectively contain the largest number of distinct variants. They assessed the performance of SVCollector based on 4,424 human genomes from the Center for Common Disease Genetics (CCDG) freeze 1 dataset, composed of 425,500 SVs identified with SURVIVOR.
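
The difference between the two modes can be sketched with plain sets. This is not SVCollector's implementation: the sample names and SV identifiers are invented, and the greedy step shown is the usual approximation for maximum coverage.

```python
def top_n(samples, n):
    """Pick the n samples with the most SVs, ignoring overlap between them."""
    ranked = sorted(samples, key=lambda s: (-len(samples[s]), s))
    return ranked[:n]

def greedy(samples, n):
    """Repeatedly pick the sample that adds the most not-yet-covered SVs."""
    remaining = dict(samples)
    covered = set()
    chosen = []
    for _ in range(min(n, len(remaining))):
        # Iterating in sorted order makes ties resolve alphabetically
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen

def distinct_svs(samples, chosen):
    """Number of distinct SVs carried by the chosen samples."""
    return len(set().union(*(samples[s] for s in chosen)))

# Hypothetical cohort: sample name -> set of SV identifiers
cohort = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {6, 7, 8},
    "D": {9},
}

print(top_n(cohort, 2), distinct_svs(cohort, top_n(cohort, 2)))
print(greedy(cohort, 2), distinct_svs(cohort, greedy(cohort, 2)))
```

In this toy cohort, topN picks two samples that share most of their SVs, while greedy swaps the second one for a sample with fewer but entirely new variants.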




□ An Implementation of Empirical Bayesian Inference and Non-Null Bootstrapping for Threshold Selection and Power Estimation in Multiple and Single Statistical Testing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/08/342964.full.pdf

The new implementation eliminates the need for parameter tuning (especially by using AIK for GMD fitting) and allows the method to be used in a broader range of conditions. Importantly, the statistical power is explicitly estimated and made available for inference. The authors use an implementation of EBI based on non-parametric test statistics, Gaussian Mixture Models and null bootstrapping. This implementation readily handles one-sample, two-sample and correlation problems in multi-dimensional data with arbitrary distributions.




□ Co-fuse: a new class discovery analysis tool to identify and prioritize recurrent fusion genes from RNA-sequencing data:

>> https://link.springer.com/article/10.1007%2Fs00438-018-1454-1

Co-fuse can perform a comparison analysis of two or more groups to identify significantly over-represented recurrent fusion genes that are associated with a particular group, using a combination of pattern mining and statistical analysis. The Recursive Partitioning and Regression Trees (rpart) algorithm was used to further prioritise the recurrent fusion genes obtained from the two-group comparison.




□ DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/08/342907.full.pdf

The principle behind the tool is that it remains a lightweight and non-intrusive framework that easily plugs into most R-based data analytic workflows. It places few restrictions on the user code, so most existing scripts can be ported to use the package. It also builds boilerplate roxygen documentation of the R objects specified in the .yml, computes checksums of stored R objects and version-tags the entire data set collection.

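# Locate the datapackager.yml config of the package that contains `path`.
# `is_r_package` is presumably the root criterion from the rprojroot package.
yml_find <- function(path) {
  path <- normalizePath(path)
  config_yml <- is_r_package$find_file("datapackager.yml", path = path)
  if (!file.exists(config_yml)) {
    stop("Can't find a datapackager.yml config at ",
         dirname(config_yml),
         call. = FALSE)
  }
  # (the excerpt ends here; the rest of the function is not shown)
}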






□ GLnexus: joint variant calling for large cohort sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/11/343970.full.pdf

Joint-calling with the cohort sharded across compute nodes will enable GLnexus to scale up to any N foreseeable with the gVCF/pVCF data model for short-read sequencing. The authors provide a standalone open-source version of GLnexus and a DNAnexus cloud-native deployment supporting very large projects, which has been employed for cohorts of >240,000 exomes and >22,000 whole genomes.






□ Reconciling Multiple Genes Trees via Segmental Duplications and Losses:

>> https://arxiv.org/pdf/1806.03988v1.pdf

The problem is polynomial-time solvable when δ ≤ λ (via LCA-mapping), while if δ > λ the problem is NP-hard, even when λ = 0 and a single gene tree is given, solving a long-standing open problem on the complexity of the reconciliation problem. On the positive side, the authors give a fixed-parameter algorithm for the problem, where the parameters are δ/λ and the number d of segmental duplications, of time complexity O(⌈δ/λ⌉^d ⋅ n ⋅ δ/λ).






□ φ-evo: A program to evolve phenotypic models of biological networks:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006244

The authors illustrate the predictive power of φ-evo by first recovering the asymmetrical structure of the lac operon regulation from an objective function with symmetrical constraints. Simulations are run in a deterministic mode, and both Euler and Runge-Kutta integrators are available in the program. An option to run equations in a stochastic mode using the τ-leaping algorithm (a biochemical numerical generalization of the Langevin equation) is also included.
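
To make the difference between the two deterministic integrators concrete, here is a minimal sketch on a toy decay equation dx/dt = -x. It is not φ-evo code; the equation, step size and the choice of the classical fourth-order Runge-Kutta scheme are assumptions for illustration.

```python
import math

def euler_step(f, x, t, dt):
    """One explicit Euler step."""
    return x + dt * f(x, t)

def rk4_step(f, x, t, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x, t)
    k2 = f(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(x + dt * k3, t + dt)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, f, x0, dt, n_steps):
    """Advance x from t = 0 with the given single-step scheme."""
    x, t = x0, 0.0
    for _ in range(n_steps):
        x = step(f, x, t, dt)
        t += dt
    return x

def decay(x, t):
    """Toy right-hand side: first-order degradation."""
    return -x

exact = math.exp(-1.0)
euler_result = integrate(euler_step, decay, 1.0, 0.1, 10)
rk4_result = integrate(rk4_step, decay, 1.0, 0.1, 10)
print(abs(euler_result - exact), abs(rk4_result - exact))
```

With the same step size, the Runge-Kutta result is several orders of magnitude closer to the exact value, at the cost of four evaluations of the right-hand side per step.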




□ SSCC: a computational framework for rapid and accurate clustering of large-scale single cell RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/11/344242.full.pdf

Simpler single cell RNAseq data clustering (sscClust) is a package implementing multiple functionalities that are basic procedures in single cell RNAseq data analysis, including variable gene identification, dimension reduction and clustering on reduced data. The few positive ΔNMI values were attributed to poor clustering accuracy with total cells, most of which were related to the k-medoids algorithm. To completely eliminate the influence of the choice of clustering algorithm, they added an oracle-clustering algorithm.






□ BALLI: Bartlett-Adjusted Likelihood-based LInear Model Approach for Identifying Differentially Expressed Gene with RNA-seq Data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/12/344929.full.pdf




□ DIvERGE precisely mutates long genomic segments up to 1,000,000 times faster than non-targeted regions:

>> http://www.pnas.org/content/early/2018/05/30/1801646115

Directed evolution of multiple genomic loci allows the prediction of antibiotic resistance




□ Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/13/345876.full.pdf

The recovered reads contain more genetic variants than previously aligned reads, indicating that divergence between personal and reference genomes plays a role in the false-negative non-alignment problem. The majority of genes with >1 fold change in expression after recovery belong to the pseudogene category, indicating that pseudogene expression can be substantially affected by the false-negative non-alignment problem.






□ Matchmaker Exchange now connects seven genomic matchmakers and two knowledge sources.

>> http://www.matchmakerexchange.org/i_am_a_clinician_laboratory.html

Have a candidate gene? Enter your case into one of the connected databases which allows you to query the Matchmaker Exchange network for a match.







□ AQ-seq: Accurate quantification of microRNAs and their variants:

>> https://www.biorxiv.org/content/biorxiv/early/2018/06/05/339606.full.pdf

AQ-seq diminishes the ligation bias of sRNA-seq. AQ-seq detects miRNAs of low abundance and reliably defines the terminal sequences of miRNAs undetected when using the conventional sRNA-seq method. AQ-seq incorporates RNA spike-in controls that consist of 30 exogenous RNA molecules. Use of the spike-ins allows us to monitor ligation bias and detection sensitivity.




□ Common Disease Is More Complex Than Implied by the Core Gene Omnigenic Model:

>> https://www.cell.com/cell/fulltext/S0092-8674(18)30714-1






□ RAMODO: Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection:

>> https://arxiv.org/pdf/1806.04808.pdf

RAMODO learns a representation function f(·) to map D-dimensional input objects into an M-dimensional space, with M ≪ D. RAMODO unifies representation learning and outlier detection to learn a small set of features that are tailored for the random distance-based detectors. Experiments on eight real-world ultrahigh-dimensional data sets show that REPEN, the paper's instance of RAMODO, enables a random distance-based detector to obtain significantly better AUC performance and a speedup of two orders of magnitude, and that it leverages less than 1% labeled data to achieve up to 32% AUC improvement.
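
For readers unfamiliar with the detector side, here is a minimal sketch of one random distance-based outlier score: the distance from each point to its nearest neighbour in a small random subsample. This is an assumption about the kind of detector meant, it works in the raw input space, and it leaves out the learned representation f(·) entirely; the data are synthetic.

```python
import math
import random

def random_distance_scores(points, sample_size=8, seed=0):
    """Score each point by its distance to the nearest point in a random subsample."""
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), sample_size)
    scores = []
    for i, p in enumerate(points):
        # Skip the point itself if it happens to be in the subsample
        dists = [math.dist(p, points[j]) for j in sample_idx if j != i]
        scores.append(min(dists))
    return scores

# Synthetic data: 50 points in the unit square plus one distant point
rng = random.Random(1)
cluster = [(rng.random(), rng.random()) for _ in range(50)]
data = cluster + [(100.0, 100.0)]

scores = random_distance_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous, scores[most_anomalous])
```

Only a handful of distance computations per point are needed, which is why such detectors are fast; their accuracy depends on distances being meaningful in the chosen feature space.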





Mu-so Qb.

2018-06-02 11:22:55 | music16


□ Naim Audio 『Mu-so Qb』

>> https://www.naimaudio.com/ja/mu-so-qb

Mu-so Qb is the compact wireless music system from the engineers behind the award-winning Mu-so. Controlled by a powerful audio brain, Mu-so Qb is alive with custom-made features. From the contours of the glass-filled polymer casing to the bass radiators that help create huge low frequencies – every millimetre of space has been used to great effect. Mu-so Qb delivers a staggering 300 watts of power that unmasks your music with a sound that defies size.



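I bought the "Mu-so Qb" from Naim Audio, the British high-end audio brand. The sound is astonishing: can a cube really produce this? Wherever you are in the room, it feels like being wrapped in a cube of sound. It supports remote control from iOS, UPnP, Hi-Res audio, and streaming services such as TIDAL.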

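I bought it mainly for the design, but I was moved by how well this form factor reproduces the low end. The ring-shaped touch panel on top of the enclosure is intuitive and futuristic. For hi-res sources, I tried Vasks's "The Fruit of Silence" (Voces8) and the Chano Domínguez Trio's "Con Alma"; there is indeed a difference, as if the air were being caught up in the gaps between the sounds. It is of course also well suited to playing analog vinyl.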






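I keep finding myself absorbed in naim radio on the Mu-so Qb. It streams recordings by the independent artists on the naim label, and although it ranges widely from classical to country to jazz, naim audio's own aesthetic runs through all of it, so it can be heard as a kind of concept album.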


□ naim radio

>> https://www.naimaudio.com/radio/player




□ Phantom Limb / "The Hard Way" (naim records)





□ Antonio Forcione / "Touch Wood" (Naim Records)







Enigma / "The Colours of Enigma - The Vinyl Series"

2018-06-02 10:44:40 | Enigma


□ Enigma / "The Colours of Enigma - The Vinyl Series" (Limited Edition)



>> https://www.universal-music.de/enigma/videos/the-colours-of-enigma-vinyl-trailer-464773

Release Date: 4 May 2018
Label: Universal
Format: Vinyl

Produced by Michael Cretu
Remastered for vinyl: MM Sound Digital Mastering Studio GmbH, 2018
Artwork: Dirk Rudolph, 2018



A stunning re-edition of all iconic Enigma albums on 180gr coloured deluxe vinyl.

Michael Cretu: "Everybody wants to know: What is behind the curtain? If you are on one side, you can see what happens. If you are on the other side, you will be seen."


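I have completed the full nine-LP vinyl series (limited production), newly cut from the master tape sources. It is also a piece of Enigma = Michael Cretu memorabilia. The analog sound has rich gradation, and every sound seems to push forward to the front. It suits the organic style from the sixth album onward especially well.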



