lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre. (Long is the time, but what is true comes to pass.)

Co=Factor

2017-12-17 22:40:02 | Science News



□ Riemannian Stein Variational Gradient Descent for Bayesian Inference

>> https://arxiv.org/abs/1711.11216

The authors propose Riemannian Stein Variational Gradient Descent (RSVGD), which has many non-trivial considerations beyond SVGD and requires novel treatments. They derive the Riemannian counterpart of the directional derivative, then conceive a novel technique to find the functional gradient in a setting where the SVGD technique fails. Finally, they express RSVGD in the embedded space of the manifold and give an instance for hyperspheres, which is directly used in its application.






□ Aging in a relativistic biological space-time:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/05/229161.full.pdf

The authors extend the concepts of physical space and time to an abstract, mathematically defined space, the Cartesian product of manifolds and sub-manifolds, which they associate with a concept of “biological space-time” in which biological clocks operate.






□ Three-way clustering of multi-tissue multi-individual gene expression data using constrained tensor decomposition:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/05/229245.full.pdf

applied MultiCluster to identify 3-way blocks in noisy expression tensors simulated from three different cluster models: additive-, multiplicative-, and combinatorial-mean models. Through simulation and application to the GTEx RNA-seq data, this tensor decomposition identifies three-way clusters with higher accuracy, while being 11x faster, than the competing Bayesian method.

Both MultiCluster and SDA are built upon the Canonical Polyadic (CP) decomposition, which decomposes a tensor into a sum of rank-1 tensors, whereas HOSVD decomposes a tensor into a core tensor multiplied by an orthogonal matrix in each mode.
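To make the CP form concrete, here is a minimal pure-Python sketch that rebuilds a three-way tensor as a sum of rank-1 terms, each the outer product of one vector per mode. It is not MultiCluster's or SDA's code; the function names and the toy factor vectors are illustrative assumptions.

```python
def outer3(a, b, c):
    """Rank-1 three-way tensor: T[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


def cp_reconstruct(factors):
    """Sum the rank-1 tensors given as (a, b, c) triples of matching lengths."""
    a0, b0, c0 = factors[0]
    total = [[[0.0 for _ in c0] for _ in b0] for _ in a0]
    for a, b, c in factors:
        term = outer3(a, b, c)
        for i in range(len(a0)):
            for j in range(len(b0)):
                for k in range(len(c0)):
                    total[i][j][k] += term[i][j][k]
    return total


# Toy rank-2 example with modes (genes, individuals, tissues); values are made up.
factors = [
    ([1.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0]),
    ([0.0, 1.0], [0.0, 1.0, 3.0], [2.0, 0.0]),
]
tensor = cp_reconstruct(factors)
```

Each triple plays the role of one three-way block: the nonzero entries of its vectors mark which genes, individuals and tissues take part in that component.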






□ DeepVariant: Highly Accurate Genomes With Deep Neural Networks:

>> https://research.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html

The cost-optimized version of DeepVariant calls an aligned 30x genome in 3-4 hours for approximately $8-9 in cloud costs, and an aligned exome in a little over an hour for approximately $0.70.

PROJECT_ID=[your alphanumeric project ID]
OUTPUT_BUCKET=gs://OUTPUT_BUCKET
STAGING_FOLDER_NAME=[a unique alphanumeric name for each run]
OUTPUT_FILE_NAME=output.vcf
MODEL=gs://deepvariant/models/DeepVariant/0.4.0/DeepVariant-inception_v3-0.4.0+cl-174375304.data-wgs_standard






□ Dynamical compensation and structural identifiability of biological models: Analysis, implications, and reconciliation:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005878

DC-Id captures the biological meaning of the dynamical compensation phenomenon, which is the invariance of the dynamics of certain state variables of interest with respect to changes in the values of certain parameters. STRIKE-GOLDD (STRuctural Identifiability taKen as Extended-Generalized Observability with Lie Derivatives and Decomposition) is a methodology and a tool for structural identifiability analysis which can handle nonlinear systems of a very general class, incl. non-rational ones.




□ BioModels performance boost, new model formats

>> https://www.ebi.ac.uk/about/news/announcements/biomodels-performance-boost-new-model-formats

BioModels now supports models built using a wider variety of modelling software and formats and standards, including Python, Mathematica, Matlab SimBiology, etc., as well as SBML and CellML.




apwiita:
Fascinating, shedding light on the Dark Proteome: existing genes may harbor additional translation start sites leading to thousands of short "alternative proteins"

□ Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins.

>> https://elifesciences.org/articles/27860

alternative proteins with signal peptides (SP) and/or transmembrane domains (TM) predicted by at least two of the three SignalP, PHOBIUS, TMHMM tools and alternative proteins with other signatures. The GO terms assigned to alternative proteins with InterPro entries were grouped and categorized into 13 classes within the three ontologies (cellular component, biological process, molecular function) using the CateGOrizer tool.




□ SpISO-seq: Microfluidic isoform sequencing shows widespread splicing coordination in the human transcriptome.

>> http://genome.cshlp.org/content/early/2017/12/01/gr.230516.117.abstract

Sparse isoform sequencing (spISO-seq) sequences 100k-200k partitions of 10-200 molecules at a time, enabling analysis of 10-100 million RNA molecules. SpISO-seq requires less than 1ng of input cDNA, limiting or removing the need for prior amplification w/ its associated biases.


hagentilgner:
Our linked read isoform sequencing approach is out: Lots of coordination of distant splicing events (defined by separate chemical reactions) - you can only see this with long-reads.
#CornellRNAbiology2017 @10xgenomics #transcriptomics #genomics




□ Reevaluation of SNP heritability in complex human traits

>> https://www.nature.com/articles/ng.3865?

empirically derive a model that more accurately describes how heritability varies with minor allele frequency (MAF), linkage disequilibrium (LD) and genotype certainty.




□ HOME: A histogram based machine learning approach for effective identification of differentially methylated regions:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/02/228221.full.pdf

a Histogram Of MEthylation (HOME) based method that exploits the inherent difference in distribution of methylation levels between DMRs and non-DMRs to robustly discriminate between the two via a linear Support Vector Machine. HOME is a highly effective and robust DMR finder that accounts for uneven cytosine coverage in WGBS data, predicts DMRs in various genomic contexts, and accurately identifies DMRs among any number of treatment groups in experiments with or without replicates.




□ GrandPrix: Scaling up the Bayesian GPLVM for single-cell data:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/03/227843.full.pdf

This model is motivated by the DeLorean approach and uses cell capture time to specify a prior over the pseudotime. The model is extended to higher-dimensional latent spaces that can be used to simultaneously infer pseudotime and other structure such as branching.




□ Qvella raises US$20M in Series B Financing - new strategic investor bioMérieux will explore collaboration around FAST technology

>> https://docs.wixstatic.com/ugd/45397b_87222b92a3244d1dae3c70bb8d3695ec.pdf




□ Clusterdv, a simple density-based clustering method that is robust, general and automatic:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/25/224840.full.pdf

All density-based clustering methods suffer from the problem that density estimation for data with finite sample size produces “sporadic” local maxima that are not related to the “real” structure present in data. As cluster centres with decreasing separability index are added to the dendrogram, they are connected to the cluster centre with which they co-partition at the next higher level.




□ DEAP: Distributed Evolutionary Algorithms in Python

>> https://github.com/DEAP/deap

DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelisation mechanisms such as multiprocessing and SCOOP.

An ephemeral constant is a terminal encapsulating a value that is generated from a given function at run time. Ephemeral constants allow terminals that do not all share the same value.

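# "rand_uniform" is an illustrative name; DEAP 1.x expects a name as the first argument
pset.addEphemeralConstant("rand_uniform", lambda: random.uniform(-1, 1))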





□ A deep learning method for lincRNA detection using auto-encoder algorithm:

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1922-3

a knowledge-based discovery method using the emerging deep learning technology for lincRNA detection is proposed and developed on DNA genome analysis. It takes advantage of the latest findings of lincRNA data set and aims to utilize the cutting-edge knowledge-based method, namely auto-encoder algorithm, in order to extract the features of lincRNA transcription sites in a more accurate way than conventional methods.




□ pysster: Learning Sequence and Structure Motifs in DNA and RNA Sequences using Convolutional Neural Networks:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/06/230086.full.pdf

The package can be applied to both DNA & RNA to classify sets of sequences by learning sequence & secondary structure motifs. It offers an automated hyper-parameter optimization & options to visualize learned motifs along w/ information about their positional & class enrichment. The Grid_Search class provides a simple way to execute a hyperparameter tuning for the convolutional neural network model. The tuning returns the best model (highest ROC-AUC on the validation data) and an overview of all trained models.






□ Rapid Nanopore Sequencing of Plasmids and Resistance Gene Detection in Clinical Isolates. "...full annotation of plasmid resistance gene content could be obtained in under 6 h from a subcultured isolate"

>> http://jcm.asm.org/content/55/12/3530.abstract




□ Long-Read Sequencing 2017 December 6-7, 2017; BMC Uppsala #LRUA2017

>> https://ngiseminars.wixsite.com/longread2017

Long-read sequencing has revolutionized the field of de novo sequencing for biodiversity studies, making this application less time-consuming and more productive.




□ Identify causal variants and estimate their effects on splicing in a Massively Parallel Splicing Assay (MaPSy)

>> https://genomeinterpretation.org/content/MaPSy




□ Gene Conversion Facilitates Adaptive Evolution on Rugged Fitness Landscapes

>> http://www.genetics.org/content/207/4/1577

"Our results reveal the potential for duplicate genes to act as a 'scratch paper' that frees evolution from being limited to strictly beneficial mutations in strongly selective environments"






□ VARIMERGE: Succinct De Bruijn Graph Construction for Massive Populations Through Space-Efficient Merging:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/06/229641.full.pdf

The algorithm reconstructs all edge label positions ephemerally (one base position at a time) in decreasing precedence order, and incrementally refines a data structure that captures the inferable rank and equivalence of full labels based on the portion of the labels it has seen. Given two de Bruijn graphs G1 = (V1, E1) and G2 = (V2, E2) constructed with integral value k, where, without loss of generality, |E1| ≥ |E2|, VariMerge constructs the merged de Bruijn graph GM in O(m·max(k,t)) time, where t is the number of colors (columns) in CM and m = |E1|.






□ Linked-Read sequencing resolves complex structural variants:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/08/231662.full.pdf

performed 30x Linked-Read genome sequencing on a set of 23 samples with known balanced or unbalanced SVs. Twenty-seven of the 29 known events were detected and another event was called as a candidate. Copy-number variants can be called with as little as 1-2x sequencing depth (5-10Gb) while balanced events require on the order of 10x coverage for variant calls to be made, although specific signal is clearly present at 1-2x sequencing depth.






□ Evolutionary stability of topologically associating domains is associated with conserved gene regulation:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/09/231431.full.pdf

Disruptions of TADs by large-scale rearrangements change expression patterns of orthologs across tissues, and these changes might be explained by the altered regulatory environment to which genes are exposed after rearrangement. The authors report a significant association of conserved gene expression (GE) in TADs and divergent expression patterns in rearranged TADs, explaining both why there could be selective pressure on the integrity of TADs over large evolutionary time scales and how TAD rearrangement can explain evolutionary leaps.






□ Comparative Annotation Toolkit (CAT) - simultaneous clade and personal genome annotation:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/08/231118.full.pdf

The CAT pipeline takes as input a HAL alignment file, an existing annotation set and aligned RNA-seq reads. CAT uses the Cactus alignment to project annotations to other genomes using transMap. AugustusTMR treats each transcript projection separately and fixes errors in projection. AugustusPB uses long-read RNA-seq to look for novel isoforms. AugustusCGP uses the Cactus alignment to simultaneously predict protein-coding genes in all aligned genomes.






□ Prometheus: omics portals for interkingdom comparative genomic analyses:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/11/232298.full.pdf
>> http://prometheus.kobic.re.kr

Prometheus supports interkingdom comparative analyses via a domain architecture-based gene identification system, Gene Search, and users can easily and rapidly identify single or entire gene sets in specific pathways. Bioinformatics tools for further analyses are provided in Prometheus or through BioExpress, a cloud-based bioinformatics analysis platform. Prometheus suggests a new paradigm for comparative analyses with large amounts of genomic information.






□ Cell-specific prediction and application of drug-induced gene expression profiles:

>> http://www.worldscientific.com/doi/pdf/10.1142/9789813235533_0004

Expression profiles are compiled into a tensor of 978 genes x 2,130 drugs x 71 cell types. The FaLRTC algorithm is sometimes referred to herein as ‘Tensor’. A three-dimensional tensor can be reshaped or unfolded into matrices in three mathematically distinct ways. The second category of extensions are methodological, including: 1) nonlinear modeling; 2) use of auxiliary similarity information; 3) addition of a time dimension to the tensor; 4) modeling measurement reliability; and 5) adopting a probabilistic framework.
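The three unfoldings mentioned above can be sketched in a few lines of pure Python. This is only an illustration of the reshaping idea, not the paper's FaLRTC code; the function name, the toy tensor and the row-major column order are assumptions (other column-ordering conventions exist).

```python
def unfold(tensor, mode):
    """Unfold a 3-way nested-list tensor along `mode` (0, 1 or 2).

    Row r of the result collects every entry whose index along `mode` is r.
    Columns run row-major over the two remaining indices.
    """
    dims = (len(tensor), len(tensor[0]), len(tensor[0][0]))
    others = [m for m in range(3) if m != mode]
    rows = []
    for r in range(dims[mode]):
        row = []
        for p in range(dims[others[0]]):
            for q in range(dims[others[1]]):
                idx = [0, 0, 0]
                idx[mode], idx[others[0]], idx[others[1]] = r, p, q
                i, j, k = idx
                row.append(tensor[i][j][k])
        rows.append(row)
    return rows


# Toy 2 x 3 x 2 tensor; each entry encodes its own index as 100*i + 10*j + k.
toy = [[[100 * i + 10 * j + k for k in range(2)] for j in range(3)] for i in range(2)]
genes_by_rest = unfold(toy, 0)  # 2 rows, 6 columns
drugs_by_rest = unfold(toy, 1)  # 3 rows, 4 columns
cells_by_rest = unfold(toy, 2)  # 2 rows, 6 columns
```

For the 978 x 2,130 x 71 tensor described above, the same three calls would give matrices with 978, 2,130 and 71 rows respectively.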


LINCSProgram:
A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles:

>> https://www.ncbi.nlm.nih.gov/pubmed/29195078






□ Hercules: a profile HMM-based hybrid error correction algorithm for long reads:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/13/233080.full.pdf

Hercules, the first machine learning-based long read error correction algorithm, learns a posterior transition/emission probability distribution for each long read and uses this to correct errors. Hercules decodes the most probable sequence for the profile HMM using the Viterbi algorithm.

int backtraceWithViterbi(SequencingNode* graph, HMMParameters parameters, double** transitionProbs,
std::pair* emissionProbs, int numberOfStates, int longReadLength,
std::vector >& backtrace){
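For readers unfamiliar with Viterbi decoding, the following is a minimal sketch on a generic two-state HMM, written in log space. It is not Hercules's profile-HMM implementation; the state names ("M", "I"), the probabilities and the function name are all made-up assumptions for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best log-probability, most probable state path)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log-probability of any path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Backtrace from the best final state to recover the path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy model: "M" mostly emits A, "I" mostly emits C (illustrative numbers only).
states = ["M", "I"]
start_p = {"M": 0.9, "I": 0.1}
trans_p = {"M": {"M": 0.8, "I": 0.2}, "I": {"M": 0.6, "I": 0.4}}
emit_p = {"M": {"A": 0.9, "C": 0.1}, "I": {"A": 0.1, "C": 0.9}}
log_prob, path = viterbi("AACA", states, start_p, trans_p, emit_p)
# path == ["M", "M", "I", "M"]
```

The backtrace table plays the same role as the backtrace argument in the signature above: it records, for each state and position, which predecessor gave the best score.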






□ A TOOLBOX TO IMPROVE GENOME ANNOTATION

>> https://www.sib.swiss/about-us/news/1113-a-toolbox-to-improve-genome-annotation

A strategy toward accurate & complete genome annotation consolidates CDSs from multiple reference annotations, ab initio gene prediction algorithms & in silico ORFs (a six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB).




□ CLAN: the CrossLinked reads ANalysis tool:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/14/233841.full.pdf

The existing free energy model and sequence-based interaction model remain insufficient to characterize the complex molecular dynamics, and computationally predicted RNA secondary structure and RNA-RNA interactions (RRI) remain imprecise. CLAN adopts a dynamic programming-based chaining algorithm to select the two non-overlapping mappings whose total length is maximized.





Hibernaculum.

2017-12-07 22:52:14 | Science News

Vulnerant omnes, ultima necat. (All of them wound; the last one kills.)





bgreene:
What's shocking about quantum mechanics? A particle traveling from here to there explores every possible path, and what we observe is a melding of them all.






□ Mikado: Leveraging multiple transcriptome assembly methods for improved gene structure annotation:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/09/216994.full.pdf

Mikado integrates multiple RNA-Seq assemblies into a coherent transcript annotation. Mikado can also utilize BLAST data to score transcripts based on protein similarity and to identify and split chimeric transcripts.




□ Minerva: An Alignment and Reference Free Approach to Deconvolve Linked-Reads for Metagenomics:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/10/217869.full.pdf

LDA with hyper-parameter optimization, using the LDA implementation in Mallet, is applied to obtain a topic cluster for each barcode. X-Means is used to cluster the barcode vectors into discrete groups.




□ Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/10/217372.full.pdf

Mantis is 4.4× faster and yields a 20% smaller index than the state-of-the-art split sequence Bloom tree (SSBT). For queries, Mantis is 6×–108× faster than SSBT and has no false positives or false negatives.




□ Generalizations of Gödel’s incompleteness theorems for Σn-definable theories of arithmetic:

>> https://www.cambridge.org/core/journals/review-of-symbolic-logic/article/generalizations-of-godels-incompleteness-theorems-for-n-definable-theories-of-arithmetic/9058F853416979FCBE78CF7A2FB40C4C

Every Σ_{n+1}-definable Σ_n-sound theory is incomplete. Every consistent theory with a Π_{n+1} set of theorems has a true but unprovable Π_n sentence. No Σ_{n+1}-definable Σ_n-sound theory can prove its own Σ_n-soundness.




□ The new numba based version of UMAP is out.

>> https://github.com/lmcinnes/umap

Now faster than ever, it takes only 2.5 minutes to embed the full 70000 points of the 784-dimensional "Fashion MNIST" dataset.




□ FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/11/218115.full.pdf

The FateID algorithm performs an iterative calculation in order to infer fate bias in multipotent progenitors, starting from cells within committed states of the distinct lineages arising from a common progenitor. FateID performs a topological ordering of pseudo-temporal expression profiles by self-organizing maps (SOM).




□ GPflowOpt: A Bayesian Optimization Library using TensorFlow. (arXiv:1711.03845v1 [stat.ML])

>> https://arxiv.org/abs/1711.03845




□ Robustness of early warning signals for catastrophic and non-catastrophic transitions:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/12/218297.full.pdf

Most of the EWS commonly applied in ecology have been studied in the context of one specific type of regime shift: the type brought on by a saddle-node bifurcation, at which a stable equilibrium point collides w/ an unstable equilibrium and disappears.
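A saddle-node collision is easiest to see in the textbook normal form dx/dt = r + x^2. That equation is an assumption chosen for illustration, not a model from the preprint, and the function name below is likewise hypothetical.

```python
import math


def saddle_node_equilibria(r):
    """Equilibria of dx/dt = r + x**2 as (x, stability) pairs."""
    if r > 0:
        return []                      # both equilibria have disappeared
    if r == 0:
        return [(0.0, "half-stable")]  # the collision point
    root = math.sqrt(-r)
    # f'(x) = 2x: a negative slope means stable, a positive slope unstable.
    return [(-root, "stable"), (root, "unstable")]


for r in (-1.0, -0.25, -0.01, 0.0, 0.5):
    print(r, saddle_node_equilibria(r))
```

As r rises toward 0 from below, the two equilibria approach each other and the restoring slope at the stable one, 2·sqrt(-r) in magnitude, shrinks to zero. That slowing recovery is the kind of behaviour early warning signals try to pick up.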




□ A fragment based method for modeling of protein segments into cryo-EM density maps:

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1904-5

FragFit uses a hierarchical strategy to select fragments from a pre-calculated set of billions of fragments derived from structures based on sequence similarity, fit of stem atoms and fit to a cryo-EM density map.






□ Beyond pseudotime: Following T-cell maturation in single-cell RNA-seq time series:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/14/219188.full.pdf

The pseudodynamics model approximates a developmental potential function (Waddington’s landscape) and suggests that thymic T-cell development is biphasic and not strictly deterministic before beta-selection. A trajectory in development can be defined probabilistically as the change in the density of cells across cell states over time.




□ Singleton Variants Dominate the Genetic Architecture of Human Gene Expression

>> http://biorxiv.org/cgi/content/short/219238v1




□ Illumina Launches NextSeq 550Dx, Expands Use of MiSeqDx to include formalin-fixed paraffin-embedded tissues.

>> https://www.genomeweb.com/sequencing/illumina-launches-nextseq-550dx-expands-use-miseqdx

The NextSeq 550Dx is Illumina's second FDA-regulated and CE-IVD marked sequencer, and has a diagnostic mode and a research mode, enabling its use in clinical research and for developing in vitro diagnostic assays.




□ xCell: digitally portraying the tissue cellular heterogeneity landscape

>> https://link.springer.com/epdf/10.1186/s13059-017-1349-1?author_access_token=DVtns3PR3raQv61Z6RD4Ym_BpE1tBhCbnbw3BuzI2RO7SD-w75iAhrQ7gjSGzw_zJO6jHEpDqZzy8CsttNZVysUprXQi0WGX-FRCguKfv2d96DcweRBG2ni-01x6T6bpU3-cfZ5nkfzCQeNeg-mMUQ%3D%3D

xCell - the most accurate cell type enrichment analysis method from bulk transcriptomes




□ Semantic Graph Analysis for Federated Linked Open Data (LOD) Surfing in Life Sciences:

>> https://link.springer.com/chapter/10.1007/978-3-319-70682-5_18

A vertex v of a connected graph G=(V,E) is a cut vertex if the vertex-induced subgraph of G on V−{v} is disconnected. If vertex v is a cut vertex with two or more SPARQL endpoints, the vertex corresponds to a class connecting two datasets from different SPARQL endpoints. The VIVO-ISF Ontology (VIVO) and the eagle-i Research Resource Ontology (ERO) are commonly used for datasets from 32 SPARQL endpoints, and these two ontologies connect classes from different SPARQL endpoints.
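The cut-vertex definition can be checked directly by removing each vertex in turn and testing connectivity. The sketch below does exactly that with a breadth-first search; it is a brute-force illustration rather than the chapter's method, and the class names and "ep1:"/"ep2:" endpoint prefixes are hypothetical.

```python
from collections import deque


def is_connected(adj, removed=None):
    """Breadth-first check that every vertex except `removed` is reachable."""
    nodes = [v for v in adj if v != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w != removed and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def cut_vertices(adj):
    """Vertices whose removal disconnects a connected undirected graph."""
    return sorted(v for v in adj if not is_connected(adj, removed=v))


# Hypothetical class graph: "Person" links classes from two made-up endpoints.
graph = {
    "ep1:Gene": {"ep1:Protein", "Person"},
    "ep1:Protein": {"ep1:Gene", "Person"},
    "Person": {"ep1:Gene", "ep1:Protein", "ep2:Grant"},
    "ep2:Grant": {"Person", "ep2:Org"},
    "ep2:Org": {"ep2:Grant"},
}
cuts = cut_vertices(graph)  # ["Person", "ep2:Grant"]
```

Here "Person" is the interesting case: it is a cut vertex whose neighbours come from two different endpoints, so it is the class that ties the two datasets together. For large graphs a linear-time articulation-point algorithm would be the usual choice.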




□ LSX: Automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/16/220053.full.pdf

In LSX, the best model of sequence evolution can be given for each gene (in the original implementation only a single model could be selected for all genes), thus improving the accuracy of the branch length estimations and likelihood calculations under PAML.




□ MMseqs2.0: ultra fast and sensitive protein search and clustering suite

>> https://www.nature.com/articles/nbt.3988
>> https://github.com/soedinglab/mmseqs2

MMseqs2 can run 10000 times faster than BLAST and searches very large proteome sets with greater sensitivity than PSI-BLAST at 400x PSI-BLAST's speed. At the core of MMseqs2 are two modules for the comparison of two sequence sets with each other: the prefiltering and the alignment modules. The first, the prefiltering module, computes the similarities between all sequences in a query database and all sequences in a target database based on a very fast and sensitive k-mer matching stage followed by an ungapped alignment. The alignment module implements a vectorized Smith-Waterman alignment of all sequences that pass a cut-off for the ungapped alignment score in the first module.
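The two-stage idea (cheap k-mer filter first, expensive local alignment second) can be sketched as follows. This is a plain, non-vectorized stand-in and not MMseqs2's code: exact shared k-mers replace its sensitive k-mer matching, and the match/mismatch/gap scores, function names and example sequences are assumptions.

```python
def shared_kmers(query, target, k=3):
    """Count distinct k-mers present in both sequences (a crude prefilter)."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    t = {target[i:i + k] for i in range(len(target) - k + 1)}
    return len(q & t)


def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def search(query, targets, min_shared=2):
    """Align only the targets that pass the k-mer prefilter."""
    return {
        name: smith_waterman_score(query, seq)
        for name, seq in targets.items()
        if shared_kmers(query, seq) >= min_shared
    }


hits = search("TTACGTTT", {"t1": "GGACGTGG", "t2": "CCCCCCCC"})
```

Only "t1" shares enough k-mers to reach the alignment stage, which is the point of prefiltering: most target sequences never incur the quadratic alignment cost.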






□ The fractured landscape of RNA-seq alignment: The default in our STARs:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/16/220681.full.pdf

A controlled assessment using STAR on a single dataset, GEUVADIS, permits a more exhaustive evaluation of performance dependency than is typically feasible. This view is consistent with the observation both of the variation in performance across algorithms in many individual assessments, where the most notable variation in performance was in occasional outliers, and also with our in-depth testing-to-destruction of STAR.




□ Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/17/221309.full.pdf

On average across the 100 test datasets, clust assigned 50% of the input genes to clusters that have significantly lower dispersion than Markov clustering (p-value 2.5×10^-31), k-means (p-value 4.6×10^-7), HC (p-value 2.4×10^-13), WGCNA (p-value 3.1×10^-64), and SOMs (p-value 4.8×10^-24). All seed clusters are evaluated by the M-N distance (MND) metric (Abu-Jamous, et al., 2015), which considers both within-cluster dispersion and cluster size, and the set of non-overlapping clusters that minimise MND and maximise cluster size are selected as elite seed clusters.




□ The origin of a primordial genome through spontaneous symmetry breaking:

>> https://www.nature.com/articles/s41467-017-00243-x

Thanks to their small copy-numbers, these genome-like molecules experience increased intracellular genetic drift, which neutralises their evolutionary tendency to minimise the catalytic activities of their complements. Thereby, the genome-like molecules provide long-term stability to the genetic information of protocells.

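"Symmetry between the complementary strands of a molecule breaks because conflicting evolutionary tendencies operate simultaneously at the level of cells and at the level of molecules. This asymmetry increases the fitness of cells at evolutionary equilibrium by lowering mutation pressure and increasing intracellular genetic drift."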




□ DDRTree: Learning Principal Graphs with DDRTree

>> https://cran.rstudio.com/web/packages/DDRTree/

An implementation of the reversed graph embedding framework, which projects data into a reduced-dimensional space while simultaneously constructing a principal tree that passes through the middle of the data.




□ cycleX: multi-dimensional pseudotime reveals cell cycle and differentiation relationship of dendritic cell progenitors using t-SNE and GPLVM:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/20/222372.full.pdf




□ Chaos and the (un)predictability of evolution in a changing environment:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/20/222471.full.pdf

Transience is expected to cause the proportion of chaos-like trajectories to decrease exponentially with time, even for high-dimensional systems. For each dimensionality, the authors estimated the proportion f(t) of trajectories behaving chaotically at each time step, and used this to estimate the asymptotic proportion of trajectories that remain chaotic over infinite time.




□ Modelling the Dynamics of Biological Systems with the Geometric Hidden Markov Model:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/22/224063.full.pdf

Beyond the application of the GHMM to the analysis of data originating from biological processes, it can also be applied to domains where the dynamics of high-dimensional systems are described using the “landscape” paradigm (evolutionary processes, protein folding, molecular processes). The proposed methodology integrates a graph-theoretical algorithm for manifold learning with a latent variable model for sequential data.






□ Convergence of topological domain boundaries, insulators, and polytene interbands revealed by high-resolution mapping of chromatin contacts:

>> https://elifesciences.org/articles/29550

The implication that these elements are decompacted, extended chromatin regions provides an attractive model in which simple physical separation explains multiple activities associated with insulators, including the ability to block enhancer-promoter interactions, prevent the spread of silenced chromatin, and organize chromatin structure.




□ DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/24/224527.full.pdf




□ Fragmentation modes and the evolution of life cycles:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005860

the optimal life cycle is always a deterministic fragmentation mode involving the regular schedule of group development and fragmentation.






□ INDRA: the Integrated Network and Dynamical Reasoning Assembler: From word models to executable models of signaling networks using automated assembly

>> http://msb.embopress.org/content/13/11/954
>> http://www.indra.bio/

The architecture of INDRA consists of three layers of modules. In layer (1), interfaces collect mechanisms from natural language processing systems (e.g., TRIPS Interface) and pathway databases, and Processors (e.g., TRIPS Processor, BioPAX Processor) extract INDRA Statements. Statements, the internal representation in INDRA, constitute layer (2). In layer (3), INDRA Statements are assembled into various model formats by Assembler modules (e.g., PySB Assembler, Graph Assembler). The conceptual description can be expressed in natural language, which can be formalized as an INDRA Statement between an enzyme and a substrate Agent. The PySB description and a corresponding BioNetGen description describe a particular implementation of this mechanism in terms of a single rule, which corresponds to an instance of 3 differential equations describing the temporal behavior of the enzyme, substrate, and their complex.






□ Nonequilibrium entropic bounds for Darwinian replicators:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/25/225011.full.pdf

Two other types of replicators are observed in nature. One is related to a template-based replication mechanism that we can identify in living systems as the standard mechanism of nucleic acid replication. This mechanism has been shown to lead to the “survival of everyone”. Under well-mixed and unlimited resource conditions, the hyperbolic replicator kinetics reduces to a second-order equation. Autocatalytic growth is characterized by a finite-time singularity at t_c = 1/(h·x0).
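The finite-time singularity follows from a short calculation. Assuming the second-order equation has the form dx/dt = h·x^2 with x(0) = x0 (an assumption, chosen because it reproduces the stated blow-up time), separation of variables gives x(t) = x0 / (1 − h·x0·t), which diverges at t = 1/(h·x0). The function names and the values of h and x0 below are illustrative.

```python
def blow_up_time(h, x0):
    """Time at which x(t) = x0 / (1 - h*x0*t) diverges."""
    return 1.0 / (h * x0)


def hyperbolic_x(t, h, x0):
    """Closed-form solution of dx/dt = h*x**2 with x(0) = x0, for t < t_c."""
    if t >= blow_up_time(h, x0):
        raise ValueError("solution diverges at or before this time")
    return x0 / (1.0 - h * x0 * t)


h, x0 = 0.5, 2.0            # made-up values, giving a blow-up time of 1.0
for t in (0.0, 0.5, 0.9, 0.99):
    print(t, hyperbolic_x(t, h, x0))
```

Exponential growth (dx/dt proportional to x) never diverges in finite time; the extra factor of x in the growth law is what produces the singularity.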




□ Vibrational spectra of halide-water dimers: Insights on ion hydration from full-dimensional quantum calculations on many-body potential energy surfaces:

>> http://aip.scitation.org/doi/full/10.1063/1.5005540






□ Infinitely productive language can arise from chance under communicative pressure:

>> https://academic.oup.com/jole/article/2/2/141/3194092

The probability of generating an infinite language, P[crd(L)=∞], is a sum of this probability over all infinite languages:
P[crd(L)=∞] ∝ ∑_{p : crd(L_p)=∞} 2^{−l(p)}.

If at least one infinite language has nonzero probability (P[crd(L)=∞] > 0), then
P[crd(L_p)=∞ ∣ crd(L_p)>B] → 1 as B → ∞.

Under maximum entropy assumptions, increasing the complexity of a language will not strongly pressure it to be finite or infinite. In contrast, increasing the number of signals in a language increases the probability of languages that in fact have infinite cardinality.




□ HEAVEN: The head anastomosis venture Project outline for the first human head transplantation with spinal linkage (GEMINI)

>> http://surgicalneurologyint.com/surgicalint-articles/heaven-the-head-anastomosis-venture-project-outline-for-the-first-human-head-transplantation-with-spinal-linkage-gemini/






□ Envisagenics closes $2.35M seed round and launches SpliceCore Platform for identifying RNA targets from splicing errors using artificial intelligence

>> http://www.rna-seqblog.com/envisagenics-closes-2-35m-seed-round-and-launches-splicecore-platform-for-identifying-rna-targets-from-splicing-errors/

Envisagenics is one of the first life science companies selected to receive investment from the New York State Innovation Venture Capital Fund administered by Empire State Development. The investors include Third Kind Venture Capital (3KVC), Cosine, LLC (NYC biotech investors), Dolby Family Ventures, Dynamk Capital, NY Empire State Development (ESD), and SV Angels.






□ DESCEND: Gene Expression Distribution Deconvolution in Single Cell RNA Sequencing:

>> https://www.biorxiv.org/content/biorxiv/early/2017/11/30/227033.full.pdf

When covariates are specified, DESCEND uses a log-linear model for the covariate effect on the nonzero mean & a logit model for the covariate effect on the nonzero fraction. In that case, the deconvolution result is the covariate-adjusted distribution of gene expression.

library(descend)
data(zeisel)

result <- runDescend(zeisel$count.matrix.small,
                     scaling.consts = zeisel$library.size, n.cores = 3)

hvg <- findHVG(result)

hvg$HVG.genes




Sequentia.

2017-12-07 22:51:50 | Science News


□ New Shapes Solve Infinite Pool-Table Problem

>> https://www.quantamagazine.org/new-shapes-solve-infinite-pool-table-problem-20170808/

“There is a surprising interaction between algebraic geometry, the pure mathematics of moduli spaces, and the dynamics of billiards, that goes both ways,”



□ Secret Link Uncovered Between Pure Math and Physics

>> https://www.quantamagazine.org/secret-link-uncovered-between-pure-math-and-physics-20171201/

The British mathematician Minhyong Kim is making strides in number theory by finding connections to physics, as has happened before with topology and geometry.




fchollet:
If you assume energy consumption at current human levels, and a growth rate of ~3% per year (current), it would take ~2,500 years for us to consume all of the stars in the galaxy

kdzeja:
I just tried to do this math and it works out. ~2,500 years. ~100b solar masses. Compounding rates are too easy to underestimate.
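
The compounding arithmetic can be checked with a rough sketch. The 3% growth rate and the ~100 billion solar masses come from the posts above; the current power use (~18 TW), the physical constants, and the idea of converting all stellar mass to energy via E = mc^2 are assumptions made here for illustration.

```python
import math

WORLD_POWER_W = 1.8e13        # assumption: ~18 TW of current human energy use
SECONDS_PER_YEAR = 3.156e7
GROWTH_RATE = 0.03            # from the post: ~3% per year
SOLAR_MASS_KG = 1.989e30
GALAXY_SOLAR_MASSES = 1e11    # from the reply: ~100b solar masses
C = 2.998e8                   # speed of light in m/s

def years_to_consume(total_energy_j, annual_use_j, growth_rate):
    """Years until exponentially growing cumulative use reaches total_energy_j."""
    k = math.log(1.0 + growth_rate)  # continuous-time growth constant
    # Cumulative use after T years is annual_use_j * (exp(k*T) - 1) / k; solve for T.
    return math.log(1.0 + total_energy_j * k / annual_use_j) / k

annual_use_j = WORLD_POWER_W * SECONDS_PER_YEAR
# Assumption: every kilogram of stellar mass is fully converted to energy.
galaxy_energy_j = GALAXY_SOLAR_MASSES * SOLAR_MASS_KG * C ** 2
years = years_to_consume(galaxy_energy_j, annual_use_j, GROWTH_RATE)
print(f"about {years:.0f} years")
```

Under these assumptions the answer comes out somewhat under 3,000 years, the same order as the quoted ~2,500. Because the result depends on the logarithm of the energy budget, even a 100-fold change in the inputs moves it by only about 150 years.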




Quantum_Zen:

>> http://jun-makino.sakura.ne.jp/talks/kobe20171203.pdf

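“Numerical calculations put the solar system's Lyapunov time (the timescale over which its motion stays stable) at around 20 million years, yet in reality it has remained stable for 4.5 billion years.”
“Presumably the KAM tori are not completely destroyed, so even if chaos arises within them, the overall structure of periodic motion survives robustly.”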

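I have long felt that Lyapunov time and e-folding time could be used to assess the systemic risk propagated by high-frequency trading in financial markets. But perhaps we have not yet observed enough variables to compute the stability of a complex system in the first place, or we are misjudging the boundaries of transactions.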




□ Nanopore Community Meeting 2017 The Metropolitan Pavilion, New York

>> https://nanoporetech.com/sites/default/files/s3/ncm17/NCM2017_conference_agenda_final.pdf #nanoporeconf




AaronPomerantz:
Sequencing in spaaaace! Since then, astronauts have prepared DNA libraries while in space, accurately identifying bacterial communities aboard the @Space_Station. Incredible #nanoporeconf




ewanbirney:
Welcome to the future - DNA sequencing on your mobile phone - imagine where and how you can use it. Hats off to the @nanopore team for getting this to work at this form factor, voltage and watts. https://twitter.com/martinalexsmith/status/936240547014070273




□ Open data release of NA12878 transcriptome on nanopore.
14.9M (13M called) direct RNA reads from 30 flowcells.
24M cDNA reads from 12 flow cells.
Data (raw and basecalled) available from

>> https://github.com/nanopore-wgs-consortium/NA12878/blob/master/RNA.md






vangurp:
Single cell sequencing: demultiplexing 1000s of cells using vector quantization and dynamic time warping #bioinformatics #nanoporeconf #realRNAseq






□ read-until with basecall and reference-informed criteria (RUBRIC) #nanoporeconf

RUBRIC software basecalls @nanopore reads and rejects non-target reads after just 1.5 seconds #nanoporeconf




□ Bjarni V. Halldórsson: deCODE genetics links disease phenotypes to genetic mutations: now using #nanopore sequencing for finding structural variants and methylation status #nanoporeconf

□ Bjarni Halldórsson from @decodegenetics: long reads for population-scale genomics. WGS of 36k Icelandic and 7k non-Icelandic people. Also 100 individuals at 10x, and 5 at 30-50x with nanopore, also looking at methylation. #nanoporeconf






marimiya_clc:
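This slide was a shock. Apparently it is partly because the gene has more than 50 exons, but 9 of the 10 isoforms found were novel, which made me appreciate anew why this needs to be done with long reads.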




AWS re:invent 2017: AWS Batch: Easy and Efficient Batch Computing on AWS (CMP323)


Base2G:
Our presentation describing our use of AWS Batch for scalable human genome analysis is now available. Tweet us if you have any questions. #reInvent2017




□ READING DNA IN REAL TIME: GARVAN’S NEW LONG-READ SEQUENCING CAPABILITY

>> https://www.garvan.org.au/news-events/news/reading-dna-in-real-time-garvan2019s-new-long-read-sequencing-capability

One of only two worldwide with certification on the GridIONx5 platform.




□ Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/03/228106.full.pdf

the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. The TrSLMM-based methods show improved performance. For example, when G = 2, where the SLMM methods are comparable, SLMM-MCP and SLMM-SCAD behave better than TrSLMM-Lasso, but even they remain slightly inferior to TrSLMM-MCP and TrSLMM-SCAD.




□ Multiplex Confounding Factor Correction for Genomic Association Mapping with Squared Sparse Linear Mixed Model:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/03/228114.full.pdf

the kinship matrix can bring several advantages to the model: it allows the model to consider hidden confounding structures; it distinguishes the statistical representation power of the fixed-effect variables from that of the kinship matrix when the kinship matrix is calculated as the covariance matrix of the fixed-effect variables; and the model can be reformulated into another model in which the random effects are not independent but instead follow a covariance matrix.
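
As a reminder of what "kinship matrix calculated as the covariance matrix of fixed effect variables" can look like in practice, here is a minimal sketch. It uses the common form K = Xc Xc^T / p on a column-centred genotype matrix, which is an assumption here and not necessarily the exact construction used in the paper; the function name and the toy data are hypothetical.

```python
def kinship_matrix(genotypes):
    """Return K = Xc Xc^T / p, where Xc is the column-centred n x p matrix."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    centred = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / p for k in range(n)]
        for i in range(n)
    ]

# Hypothetical toy data: 4 individuals x 3 SNPs, coded as minor-allele counts.
X = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
    [1, 0, 1],
]
K = kinship_matrix(X)
for row in K:
    print([round(v, 3) for v in row])
```

Individuals with identical genotypes (the first two rows) get the same kinship with each other as with themselves, which is the kind of sample structure the random effect is meant to absorb.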




□ Terrestrial effects of moderately nearby supernovae:

>> https://www.biorxiv.org/content/biorxiv/early/2017/12/08/230847.full.pdf

While the bulk of other kinds of space weather and even gamma-ray bursts ionize the upper stratosphere, these high-energy cosmic rays penetrate much deeper. The peak ionization rate is at 10 km, deep in the troposphere, where weather takes place. Even down at the surface, ionization rates are up by close to a factor of 100. With a huge increase in the number of cosmic rays traversing the lower atmosphere to the ground, we can expect a similarly large increase in the amount of lightning, especially cloud-to-ground lightning.




□ Explain XGBoost models and justify their predictions with decision paths (ELI5).

Code: https://github.com/MLWave/black-boxxy#xgboost-decision-paths

Explaining model: XGBRegressor(base_score=0.5, colsample_bylevel=1, colsample_bytree=1, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=3,
min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
objective='reg:linear', reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=0, silent=True, subsample=1)
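
The decision-path idea can be illustrated on a single toy tree without any library: each node carries a score, and a feature's contribution is the change in score from parent to child along the path the sample takes. This is a hedged sketch of the general technique, not the ELI5 or XGBoost API and not code from the linked repository; the tree, feature names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy regression tree. Each node stores the mean target of the
# training rows that reach it ("value"); leaves have no "feature" key.
TOY_TREE = {
    "value": 10.0, "feature": "rooms", "threshold": 3,
    "left": {"value": 6.0},
    "right": {
        "value": 14.0, "feature": "age", "threshold": 20,
        "left": {"value": 17.0},
        "right": {"value": 11.0},
    },
}

def explain_path(tree, sample):
    """Return (bias, contributions) for one sample along its decision path."""
    bias = tree["value"]  # root score, the prediction before any split is applied
    contributions = defaultdict(float)
    node = tree
    while "feature" in node:
        branch = "left" if sample[node["feature"]] < node["threshold"] else "right"
        child = node[branch]
        # Credit the change in score to the feature that was split on.
        contributions[node["feature"]] += child["value"] - node["value"]
        node = child
    return bias, dict(contributions)

bias, contributions = explain_path(TOY_TREE, {"rooms": 4, "age": 5})
prediction = bias + sum(contributions.values())
print(bias, contributions, prediction)
```

The bias plus the per-feature contributions always adds up to the leaf value, so each prediction comes with an exact additive breakdown. For an ensemble, the same walk would be repeated per tree and the contributions summed.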

"we not only need to deliver a highly predictive model, but a model that is also explainable."