lens, align.

Long is the time, but the true comes to pass.

OpuS_XVII.

2016-07-30 14:06:19 | Science News

(iPhone 6s, Camera.)

The proof that a person lived is like the light of a star. Even after the feeling behind it has perished, it keeps shining as long as someone remembers it. If so, stars are memories. And the heart is made of gathered light that has already burned out.



□ The same holds for approving or rejecting an internal proposal: clashes of analysis or evidence rarely become obstacles in themselves. In a real approval process, amid the back-and-forth of alternative after alternative aimed at securing practical benefit, conceptual-level dichotomies blur.






□ RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language:

>> http://sysbio.oxfordjournals.org/content/65/4/726.full

A species-tree model depicted in graphical-model notation and the corresponding specification in the Rev language. Rev is inspired by the R language and the BUGS model-specification language; their popularity should reduce the Rev learning curve.
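
The Rev code itself is not quoted above; as a loose, hypothetical illustration of graphical-model-style model specification (plain Python, not RevBayes' actual syntax or API), here is a minimal sketch in which stochastic nodes declare their parents and the joint model is simulated in topological order:

import random

# Minimal sketch of a BUGS/Rev-style graphical model: nodes declare their
# parents, and the joint model is simulated by visiting nodes in order.
# Names and distributions are illustrative, not taken from RevBayes.
class Stochastic:
    def __init__(self, name, sampler, parents=()):
        self.name, self.sampler, self.parents = name, sampler, parents
        self.value = None

    def sample(self, env):
        self.value = self.sampler(*(env[p] for p in self.parents))
        return self.value

nodes = [
    Stochastic("lambda", lambda: random.expovariate(10.0)),                      # a rate parameter
    Stochastic("root_age", lambda lam: random.expovariate(lam), parents=("lambda",)),  # depends on the rate
]

env = {}
for node in nodes:            # topological order: parents appear before children
    env[node.name] = node.sample(env)
print(env)

In Rev and BUGS the same idea is expressed declaratively, and the inference machinery is attached to the declared graph rather than written by hand.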




□ Ouija: Incorporating prior knowledge in single-cell trajectory learning using Bayesian nonlinear factor analysis:

>> http://biorxiv.org/content/biorxiv/early/2016/06/23/060442.full.pdf




□ SCOUP: the Ornstein-Uhlenbeck process to analyze single-cell expression data during differentiation:

>> http://www.ncbi.nlm.nih.gov/pubmed/27277014

These two papers were also published at the same time; methods for pseudotime estimation in single-cell expression analysis look set to develop substantially.
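
SCOUP's model is not excerpted above; as a minimal sketch of the ingredient named in its title, here is an Euler-Maruyama simulation of an Ornstein-Uhlenbeck process relaxing toward an attractor, with made-up parameters standing in for one gene's expression drifting toward a destination cell type:

import numpy as np

# Ornstein-Uhlenbeck process: dX = alpha * (theta - X) dt + sigma dW.
# In a SCOUP-like setting X would be a gene's expression along differentiation
# and theta the attractor of the destination cell type. Values here are made up.
rng = np.random.default_rng(0)
alpha, theta, sigma = 0.5, 3.0, 0.4     # reversion speed, attractor, noise scale
dt, n_steps, x0 = 0.01, 2000, 0.0

x = np.empty(n_steps)
x[0] = x0
for t in range(1, n_steps):
    x[t] = x[t-1] + alpha * (theta - x[t-1]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()

print(f"final value {x[-1]:.2f}; stationary sd approx {sigma / np.sqrt(2 * alpha):.2f}")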


Under the hood, Ouija uses Bayesian hierarchical nonlinear factor analysis implemented in a probabilistic programming language. It is an orthogonal and complementary approach to unsupervised, whole-transcriptome methods, which do not explicitly model any gene-specific behaviours and do not readily permit the inclusion of prior knowledge.

The function f maps the one-dimensional pseudotime to the p-dimensional observation space in which the data lie, within a Bayesian hierarchical model whose likelihood is given by a bimodal distribution.
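
As a toy picture of that mapping f (my own sketch, not Ouija's implementation), each gene's mean expression can be taken as a sigmoid of pseudotime with a gene-specific switch time and strength, and the observed p-dimensional expression as noise around those means:

import numpy as np

# Toy nonlinear factor-analysis mean function:
# mu_g(t) = 2*mu0_g / (1 + exp(-k_g * (t - t0_g))), with t the 1-D pseudotime.
# Parameter values are illustrative.
rng = np.random.default_rng(1)
n_cells, n_genes = 200, 5
pseudotime = rng.uniform(0, 1, n_cells)     # latent t for each cell
mu0 = rng.uniform(1, 5, n_genes)            # half-peak expression per gene
k = rng.normal(0, 10, n_genes)              # switch strength (sign = up or down)
t0 = rng.uniform(0, 1, n_genes)             # switch time per gene

mean = 2 * mu0 / (1 + np.exp(-k * (pseudotime[:, None] - t0)))   # cells x genes
expression = rng.normal(mean, 0.3)          # noisy observations in p dimensions
print(expression.shape)                     # (200, 5): the p-dimensional observation space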



□ Pseudotime Estimation: Deconfounding Single Cell Time Series:

>> http://bioinformatics.oxfordjournals.org/content/early/2016/06/16/bioinformatics.btw372.long

A principled probabilistic model that accounts for uncertainty in the capture times of repeated cross-sectional time series. The latent space in all of the methods above is unstructured: there is no direct physical or biological interpretation of the space, and the methods do not directly relate experimental covariates such as cell type or capture time to the space.

By accounting for uncertainty in the time dimension of pseudotime estimation, it analyses repeated cross-sectional time-series data better than state-of-the-art methods such as Monocle, Waterfall, Embeddr and Wanderlust.
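
A minimal sketch of the deconfounding idea, using an invented linear mean function rather than the paper's model: give each cell's pseudotime a prior centred on its capture time, let the expression likelihood pull it away, and take the per-cell MAP estimate.

import numpy as np
from scipy.optimize import minimize_scalar

# Toy MAP pseudotime for one cell: prior t ~ Normal(capture_time, tau^2),
# likelihood y_g ~ Normal(a_g * t + b_g, sigma^2) for a few genes.
# All numbers and the linear mean function are illustrative stand-ins.
capture_time, tau, sigma = 2.0, 0.5, 0.3
a = np.array([1.0, -0.5, 0.8])       # gene-specific slopes
b = np.array([0.0, 2.0, 1.0])        # gene-specific offsets
y = np.array([2.6, 0.9, 3.0])        # observed expression for this cell

def neg_log_posterior(t):
    prior = 0.5 * ((t - capture_time) / tau) ** 2
    lik = 0.5 * np.sum(((y - (a * t + b)) / sigma) ** 2)
    return prior + lik

t_map = minimize_scalar(neg_log_posterior, bounds=(0, 5), method="bounded").x
print(f"capture time {capture_time}, MAP pseudotime {t_map:.2f}")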




□ Oxford Statistics Phasing Server:

>> https://phasingserver.stats.ox.ac.uk

using large reference panels of haplotypes from the Haplotype Reference Consortium together with novel statistical methods implemented in the SHAPEIT2 program to carry out highly accurate phasing.




□ CSBB (Computational Suite for Bioinformaticians and Biologists) is a command line based bioinformatics suite:

>> https://github.com/skygenomics/CSBB-v1.0




□ fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets:

>> http://biorxiv.org/content/biorxiv/early/2016/06/27/060780.full.pdf




□ Dimension-Free Iteration Complexity of Finite Sum Optimization Problems:

>> http://arxiv.org/abs/1606.09333v1

This framework subsumes the vast majority of optimization methods for machine-learning problems: it applies to SDCA, accelerated proximal SDCA, SDCA without duality, SAG, SAGA, SVRG and acceleration schemes, as well as to a large number of methods for smooth convex optimization (i.e., FSM with n = 1), e.g., stochastic gradient descent, accelerated gradient descent, the heavy-ball method and stochastic coordinate descent. The analysis bounds the iteration complexity of oblivious (possibly stochastic) CLI algorithms equipped with dual RLM oracles.
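
For concreteness, here is a small numpy sketch of one member of that family, SVRG, on a least-squares finite sum (step size, epoch length and the synthetic problem are illustrative choices, not the paper's setting):

import numpy as np

# SVRG on f(w) = (1/n) * sum_i 0.5 * (x_i . w - y_i)^2: periodically compute a
# full gradient at a snapshot, then take variance-reduced stochastic steps.
rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def grad_i(w, i):                      # gradient of the i-th summand
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
eta, epochs, m = 0.01, 30, n           # step size, outer loops, inner steps
for _ in range(epochs):
    w_snap = w.copy()
    full_grad = (X.T @ (X @ w_snap - y)) / n
    for _ in range(m):
        i = rng.integers(n)
        w -= eta * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)

print("distance to optimum:", np.linalg.norm(w - w_true))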




□ Does NP-completeness have a role to play in Bioinformatics?:

>> https://pmelsted.wordpress.com/2016/06/22/does-np-completeness-have-a-role-to-play-in-bioinformatics/

In genome assembly, several formulations have been proposed as "capturing the essence" of the problem. The reduction to Hamiltonian path also relies on having very long reads, whereas BGREAT mostly focuses on short reads.
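
The contrast drawn in the post, Hamiltonian paths on an overlap graph of long reads versus Eulerian-style walks on a de Bruijn graph of short-read k-mers, can be made concrete with a toy de Bruijn graph builder (an illustration, not BGREAT's implementation):

from collections import defaultdict

# Toy de Bruijn graph: nodes are (k-1)-mers, edges are k-mers observed in reads.
# Assembly in this formulation looks for Eulerian-style walks (visit every edge),
# as opposed to Hamiltonian paths on an overlap graph of whole reads.
def de_bruijn(reads, k):
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTACG", "GTACGTT"]          # illustrative short reads
for node, succs in de_bruijn(reads, k=4).items():
    print(node, "->", succs)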




□ Tempus et Locus: a tool for extracting precisely dated viral sequences from GenBank:

>> http://biorxiv.org/content/biorxiv/early/2016/07/01/061697.full.pdf

A coalescent constant-size tree prior was fixed, and strict-clock, relaxed lognormal-clock and relaxed exponential-clock models were run for 10 million iterations in BEAST. The best of the three clock models was then determined using the Akaike information criterion through MCMC (AICM) tool in Tracer, and further runs were initiated using that best clock model with expansion growth, logistic growth, exponential growth and Bayesian skyline tree priors.




□ Deep Sequencing of 10,000 Human Genomes:

>> http://biorxiv.org/content/biorxiv/early/2016/07/01/061663.full.pdf

An in-depth survey of patterns of variation in the human genome. SNV metaprofiles of protein-coding genes used GENCODE-annotated TSSs (n=88,046), start codons (n=21,147), splice donor and acceptor sites (n=137,079 and 133,072), stop codons (n=37,742) and polyadenylation sites (n=88,103). Each subsequently sequenced genome contributes on average 8,579 novel variants, ranging from 7,214 in Europeans and 10,978 in admixed individuals to 13,530 in individuals of African ancestry.




□ Neuronal Representation of Numerosity Zero in the Primate Parieto-Frontal Number Network:

>> http://www.cell.com/current-biology/fulltext/S0960-9822(16)30262-7?sf30160322=1

These findings elucidate how the brain transforms the absence of countable items, nothing, into an abstract quantitative category, zero.




□ A pyethereum revamp-in-progress: "Purification", Consensus Abstraction, State Snapshots (i.e. rapid sync time and an end to hard-fork technical debt), and more:

>> https://www.reddit.com/r/ethereum/comments/4r1k19/a_pyethereum_revampinprogress_purification?st=iq86kd6c&sh=dcf5ed78






□ Can DE methods developed originally for conventional bulk RNA-seq data also be applied to single-cell RNA-seq?:

>> http://www.rna-seqblog.com/can-differential-expression-methods-developed-originally-for-conventional-bulk-rna-seq-data-also-be-applied-to-single-cell-rna-seq/

Single-cell-specific methods did not have systematically better performance than other methods. Methods developed originally for bulk RNA-seq, DESeq and Limma, were not suitable for analyzing scRNA-seq data. ROTS performed well in all the comparisons and MAST had good statistical properties (false positives, precision and recall) as well.




□ SVAPLSseq: to correct for hidden sources of variability in differential gene expression studies based on RNAseq data

>> http://biorxiv.org/content/early/2016/07/05/062125

SVAPLSseq aims to capture the traces of hidden variability in the data and incorporate them in a regression framework to re-estimate the primary signals and find the truly positive genes.
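
A minimal sketch of the generic recipe (estimate hidden factors from the residuals of the primary fit, then refit with them in the design matrix); SVAPLSseq itself extracts the hidden signatures via partial least squares, which is not reproduced here:

import numpy as np

# Toy hidden-variability correction: fit the primary model, take an SVD of the
# residual matrix to estimate surrogate variables, then refit including them.
rng = np.random.default_rng(0)
n_samples, n_genes = 12, 300
group = np.repeat([0.0, 1.0], n_samples // 2)            # primary signal (two conditions)
batch = rng.normal(size=n_samples)                       # hidden source of variability
expr = (np.outer(group, rng.normal(2, 1, n_genes) * (rng.random(n_genes) < 0.1))
        + np.outer(batch, rng.normal(0, 1, n_genes))
        + rng.normal(0, 0.5, (n_samples, n_genes)))

X = np.column_stack([np.ones(n_samples), group])
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
residuals = expr - X @ beta
surrogate = np.linalg.svd(residuals, full_matrices=False)[0][:, :1]   # top left singular vector

X_adj = np.column_stack([X, surrogate])                  # re-estimate the primary signal
beta_adj, *_ = np.linalg.lstsq(X_adj, expr, rcond=None)
print("group effect estimates with surrogate variable:", beta_adj[1, :5])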






□ Chaotic provinces in the kingdom of the Red Queen:

>> http://biorxiv.org/content/biorxiv/early/2016/07/06/062349.full.pdf

The nature of the dynamics depends strongly on the initial configuration of the system; the usual regular Red Queen oscillations are observed only in some parts of the parameter region. A neutrally stable fixed point, and the consequent concentric circles, spheres or higher-dimensional circulations around it, mean that the system is constantly changing and yet stationary in this change.
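
A toy version of that neutrally stable cycling, using matching-alleles host-parasite replicator dynamics with Euler integration (model and parameters are illustrative, not the paper's):

# Matching-alleles host-parasite replicator dynamics (toy parameters):
# h = frequency of host type 1, p = frequency of parasite type 1.
# The fixed point (0.5, 0.5) is neutrally stable, so trajectories circle it:
# constantly changing, and yet stationary in this change.
s_h, s_p = 1.0, 1.0           # selection strengths on host and parasite
dt, n_steps = 0.01, 20000
h, p = 0.7, 0.5               # the initial configuration picks the orbit

traj = []
for _ in range(n_steps):
    dh = h * (1 - h) * s_h * (1 - 2 * p)
    dp = p * (1 - p) * s_p * (2 * h - 1)
    h, p = h + dt * dh, p + dt * dp
    traj.append((h, p))

print("after many cycles:", traj[-1])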




□ radix trees for both rstats and Rcpp: key-value structures with superfast matching.

>> https://github.com/Ironholds/triebeard
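
Not triebeard's Rcpp implementation, but the core operation a radix tree accelerates, longest-prefix matching of a query against stored keys, fits in a few lines (a plain one-character-per-node trie rather than a compressed radix tree):

# Minimal prefix trie with longest-prefix matching, the operation a radix tree
# (as in triebeard) performs with compressed edges; this toy stores one
# character per node instead.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.value = None             # value stored at a key's terminal node

def insert(root, key, value):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.value = value

def longest_prefix(root, query):
    node, best = root, None
    for ch in query:
        if ch not in node.children:
            break
        node = node.children[ch]
        if node.value is not None:
            best = node.value
    return best

root = TrieNode()
for key, val in [("http://", "web"), ("https://", "secure web"), ("ftp://", "file")]:
    insert(root, key, val)
print(longest_prefix(root, "https://example.org"))   # -> "secure web"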




□ ISMB 2016 Proceedings JULY 8 - JULY 12, 2016, ORLANDO, FLORIDA: #ISMB2016 #BOSC2016

>> http://bioinformatics.oxfordjournals.org/content/32/12.toc




□ Medusa: Jumping across biomedical contexts using compressive data fusion:

>> http://bioinformatics.oxfordjournals.org/content/32/12/i90.full

Medusa is an automatic module-detection algorithm that finds a size-k module of candidate objects that are jointly relevant to the pivots. Medusa explicitly takes different semantics into consideration during module detection by letting the user choose a particular semantics or combine several.




□ DeepMeSH: deep semantic representation for improving large-scale MeSH indexing

>> http://bioinformatics.oxfordjournals.org/content/32/12/i70.full

The deep semantic representation, D2V-TFIDF, integrates the power of both a dense representation (D2V) and a sparse representation (TF-IDF), and is combined with 'learning to rank', which integrates diverse evidence smoothly and effectively.
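
A rough sketch of the D2V-TFIDF combination, concatenating a dense document vector with sparse TF-IDF features; the dense vectors below are random stand-ins for doc2vec output, and scikit-learn's TfidfVectorizer stands in for the paper's pipeline:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Combine a dense representation (stand-in for doc2vec) with a sparse TF-IDF
# representation by concatenation, the basic D2V-TFIDF recipe. A ranking model
# ("learning to rank") would then be trained on these combined features.
docs = ["protein folding dynamics", "deep learning for text", "mesh indexing of abstracts"]

tfidf = TfidfVectorizer().fit_transform(docs).toarray()   # sparse part, densified for the demo
rng = np.random.default_rng(0)
d2v = rng.normal(size=(len(docs), 50))                    # dense part: random stand-in vectors

features = np.hstack([d2v, tfidf])
print(features.shape)     # (3, 50 + vocabulary size)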




□ SnoVault and encodeD: A novel object-based storage system & applications to ENCODE metadata:

>> https://github.com/ENCODE-DCC/snovault
>> https://github.com/ENCODE-DCC/encoded

encodeD provides a flexible metadata model for representing samples, and SnoVault is the general-purpose object database it uses to store them. It uses JSON-SCHEMA and JSON-LD to validate and normalize the data. The core database engine, SnoVault, is completely independent of ENCODE, genomic data, or bioinformatic data.
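
The JSON-SCHEMA part is easy to illustrate; a minimal sketch of validating a metadata object with the Python jsonschema package (the schema below is invented for the example, not an actual ENCODE schema):

from jsonschema import validate, ValidationError

# Toy metadata schema in the spirit of SnoVault/encodeD's JSON-SCHEMA usage.
# The fields are invented for illustration; real ENCODE schemas live in the repos above.
sample_schema = {
    "type": "object",
    "required": ["accession", "organism"],
    "properties": {
        "accession": {"type": "string", "pattern": "^ENC[A-Z0-9]+$"},
        "organism": {"type": "string"},
        "read_length": {"type": "integer", "minimum": 1},
    },
}

record = {"accession": "ENCSR000AAA", "organism": "Homo sapiens", "read_length": 101}
try:
    validate(instance=record, schema=sample_schema)
    print("record validates")
except ValidationError as err:
    print("invalid record:", err.message)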




□ Transcending the prediction paradigm: novel applications of SHAPE to RNA function and evolution:

>> http://onlinelibrary.wiley.com/doi/10.1002/wrna.1374/full

SHAPE is a model-free approach to probing RNA structure that provides information at multiple scales and can be combined with other measures such as Shannon entropy. The use of Fourier transforms to detect periodicity in (aggregated) SHAPE data is the first example of applying signal processing to SHAPE.
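
As a toy version of that signal-processing step, here is a planted ~10-nt periodicity in a synthetic reactivity profile recovered with numpy's FFT (simulated data, not real SHAPE reactivities):

import numpy as np

# Synthetic "reactivity" profile with a planted period of ~10 nt plus noise,
# and a discrete Fourier transform to recover that periodicity.
rng = np.random.default_rng(0)
n = 300                                    # nucleotide positions
positions = np.arange(n)
signal = 1.0 + 0.5 * np.sin(2 * np.pi * positions / 10) + rng.normal(0, 0.3, n)

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)          # cycles per nucleotide
peak = freqs[np.argmax(spectrum)]
print(f"dominant period approx {1 / peak:.1f} nt")   # should be close to 10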




□ Bitcoin-powered genomics: a Bitcoin-payable API for the Exome Aggregation Consortium (ExAC) database:

>> https://github.com/joepickrell/exac21/blob/master/README.md
>> https://joepickrell.wordpress.com/2016/07/11/bitcoin-powered-genomics/

As with the Genaris bankruptcy: following this idea of Joe Pickrell's, if cryptocurrencies such as Ethereum and payable APIs were used to support investment in the shares of unlisted biotech startups, couldn't the returns society gains from genomics be amplified?




□ Compacting de Bruijn graphs in parallel from sequencing data quickly and in low memory:

>> http://cristal.univ-lille.fr/~chikhi/pdf/2016-july-10-ismb.pdf




□ Coordinates and Intervals in Graph-based Reference Genomes:

>> http://biorxiv.org/content/biorxiv/early/2016/07/11/063206.full.pdf

They define a genomic interval in a graph-based reference genome as a path between two vertices: a simple way to represent genomic intervals unambiguously is to include information about all region paths covered by the interval, as well as the start and end coordinates, using an offset-based coordinate system.
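
A minimal data-structure sketch of such an interval (field names are mine, not the paper's notation): the ordered region paths it covers plus offset-based start and end positions.

from dataclasses import dataclass
from typing import List, Tuple

# An offset-based coordinate: (region path ID, offset within that region path).
Position = Tuple[str, int]

@dataclass
class GraphInterval:
    """A genomic interval on a graph-based reference: a path between two vertices,
    represented unambiguously by every region path it covers plus start/end offsets."""
    region_paths: List[str]      # ordered region paths covered by the interval
    start: Position              # (first region path, start offset)
    end: Position                # (last region path, end offset)

    def __post_init__(self):
        assert self.start[0] == self.region_paths[0]
        assert self.end[0] == self.region_paths[-1]

# e.g. an interval that enters an alternative path and returns to the main path
iv = GraphInterval(region_paths=["region1", "alt_path", "region2"],
                   start=("region1", 1200), end=("region2", 250))
print(iv)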




□ GRNmap & GRNsight: open source for dynamical systems modeling/visualization of medium-scale gene regulatory networks

>> http://f1000research.com/posters/5-1618

New features include an option to use a Michaelis-Menten production function as well as the sigmoidal production function, the ability to input replicate expression data instead of the means for each time point, and the option to include data for experiments in which a transcription factor was deleted from the network, among others.
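
The two production-function options can be written down directly; a toy one-gene Euler integration comparing the sigmoidal and Michaelis-Menten forms (constants are illustrative, not GRNmap's parameterization):

import numpy as np

# One target gene regulated by activator level x, under two production functions
# commonly used in GRN modeling: a sigmoid and a Michaelis-Menten form.
# dG/dt = production(x) - degradation * G, integrated with simple Euler steps.
def sigmoid_production(x, v_max=2.0, w=4.0, b=-2.0):
    return v_max / (1 + np.exp(-(w * x + b)))

def michaelis_menten_production(x, v_max=2.0, K=0.5):
    return v_max * x / (K + x)

deg, dt = 0.5, 0.01
g_sig = g_mm = 0.0
x = 1.0                                    # constant activator level for the demo
for _ in range(2000):
    g_sig += dt * (sigmoid_production(x) - deg * g_sig)
    g_mm += dt * (michaelis_menten_production(x) - deg * g_mm)

print(f"steady state: sigmoidal {g_sig:.2f}, Michaelis-Menten {g_mm:.2f}")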




□ Compacting and correcting Trinity and Oases RNA-Seq de novo assemblies:

>> https://peerj.com/preprints/2284/

Self-chimeras appear in Oases and Trinity contig sets at rates ranging from 0.11 to 1.39% and from 0.09 to 0.56%, respectively. In DRAP, the corresponding figures drop to 0.01 to 0.16% and 0.00 to 0.01%.

DRAP includes the runMeta workflow. Differences in compaction and correction are larger between Trinity and Oases than between pooled versus meta-assembly. Pooled assemblies give significantly worse results for the number of reference proteins and the number of read pairs aligned on the contigs. This comes from the filtering strategy, which eliminates lowly expressed contigs of a given condition when merging all the samples but keeps these contigs in per-sample assembly and meta-assembly strategies. runAssessment processes different contig sets built from the same read sets to generate assembly and alignment metrics, which are collected in a report.




□ Learning in Quantum Control: High-Dimensional Global Optimization for Noisy Quantum Dynamics:

>> http://arxiv.org/pdf/1607.03428v1.pdf

Applies machine-learning algorithms to quantum control, namely adaptive phase estimation and quantum gate design. The supervised-learning technique using SuSSADE makes it possible to perform single-shot, high-fidelity three-qubit gates that are as fast as an entangling two-qubit gate under the same experimental constraints.
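
SuSSADE is a subspace-selective self-adaptive variant of differential evolution; as a rough stand-in, here is plain differential evolution from SciPy minimizing a made-up gate-infidelity surrogate over a control-pulse parameterization (no actual quantum simulation involved):

import numpy as np
from scipy.optimize import differential_evolution

# Stand-in objective: a smooth surrogate "infidelity" over 8 control-pulse
# amplitudes, NOT a real quantum dynamics simulation. SuSSADE is a more
# elaborate, subspace-selective self-adaptive DE; SciPy's plain DE shows the loop.
target = np.linspace(-0.5, 0.5, 8)           # pretend these amplitudes give fidelity 1

def infidelity(pulse):
    return 1 - np.exp(-np.sum((pulse - target) ** 2))

bounds = [(-1.0, 1.0)] * 8
result = differential_evolution(infidelity, bounds, seed=0, tol=1e-8)
print("best surrogate fidelity:", 1 - result.fun)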




□ Meta-dimensional data integration identifies critical pathways for susceptibility, tumorigenesis and progression:

>> http://www.impactjournals.com/oncotarget/index.php?journal=oncotarget&page=article&op=view&path%5B%5D=10509

This framework differs from other methods such as CONEXIC and PARADIGM, which are restricted to data generated from the same cohort. It avoids the bias of any single pathway-level GWAS analysis method by combining the results from different methods using Monte Carlo simulations.




□ A Vector Space for Distributional Semantics for Entailment:

>> http://arxiv.org/pdf/1607.03780v1.pdf

Within this new vector space, the entailment operators and inference equations apply, thereby generalising naturally from these lexical representations to the compositional semantics.




□ Revealing neuro-computational mechanisms of reinforcement learning and decision-making with the hBayesDM:

>> http://biorxiv.org/content/biorxiv/early/2016/07/16/064287.full.pdf

With the hBayesDM package, anyone with minimal knowledge of programming can take advantage of cutting-edge computational modeling approaches and investigate the underlying processes of interactions between multiple decision-making (e.g. goal-directed, habitual, Pavlovian) systems.

model {
  ...
  for (i in 1:N) {
    for (t in 1:T) {
      Choice[i, t] ~ categorical_logit(ev);
      ...
    }
  }
}

generated quantities {
  for (i in 1:N) {
    log_lik[i] = 0;
    for (t in 1:T) {
      log_lik[i] = log_lik[i] + categorical_logit_lpmf(Choice[i, t] | ev);
    }
  }
}




□ pRSEM: Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq:

>> http://genome.cshlp.org/content/early/2016/07/11/gr.199174.115.full.pdf

pRSEM has a lower false-positive rate than alternative methods in data-driven simulations. The Dirichlet-multinomial model used by pRSEM provides a flexible framework for the incorporation of prior information from a variety of sources.
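
A toy sketch of the Dirichlet-multinomial building block (numbers invented): prior pseudo-counts, which is where external evidence such as ChIP-seq could enter, shape the transcript proportions, and read counts are multinomial given those proportions.

import numpy as np

# Dirichlet-multinomial sketch: draw isoform proportions from a Dirichlet prior,
# then draw read counts from a multinomial given those proportions. Toy values.
rng = np.random.default_rng(0)
prior_pseudocounts = np.array([5.0, 1.0, 1.0])     # e.g. prior evidence favouring the first isoform
theta = rng.dirichlet(prior_pseudocounts)          # transcript proportions
reads = rng.multinomial(1000, theta)               # reads assigned to isoforms
print("proportions:", np.round(theta, 3), "read counts:", reads)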




□ TthPrimPol: a new Whole genome amplification technology from Expedeon/Sygnis: TruePrime

>> http://core-genomics.blogspot.jp/2016/07/whole-genome-amplification-improved.html

TthPrimPol is a DNA and RNA primase with DNA-dependent DNA and RNA polymerase activity. Human PrimPol is a unique enzyme capable of de novo DNA synthesis solely with dNTPs and is found primarily in the nucleus; PrimPol -/- cells show inefficient mtDNA replication, but it is not an essential protein.




□ Multi-dimensional biomaterials for theragnosis:

>> http://biomaterialsres.biomedcentral.com/cfp-theragnosis




□ DeepLNC, a long non-coding RNA prediction tool using deep neural network:

>> http://link.springer.com/article/10.1007%2Fs13721-016-0129-2
>> http://bioserver.iiita.ac.in/deeplnc/index.php

The information content stored in k-mer patterns is used as the sole feature for the DNN classifier, trained on the LNCipedia and RefSeq databases, obtaining an accuracy of 98.07%, sensitivity of 98.98%, and specificity of 97.19% on the test dataset. Computing the k-mer information content with a Shannon entropy function improved classifier accuracy.
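
The feature itself is straightforward to compute; a sketch of k-mer frequencies and their Shannon entropy for a nucleotide sequence (DeepLNC's exact feature encoding may differ):

import math
from collections import Counter

# k-mer information content of a nucleotide sequence: count k-mers, convert to
# frequencies, and compute the Shannon entropy H = -sum(p * log2 p).
def kmer_entropy(seq, k=3):
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

seq = "ATGGCGTACGTTAGCATCGATCGGCTAGCTAGGCTA"       # toy sequence
print(f"3-mer Shannon entropy: {kmer_entropy(seq, k=3):.3f} bits")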




□ Fully Dynamic de Bruijn Graphs: a space- and time-efficient fully dynamic implementation & jumbled pattern matching.

>> https://arxiv.org/pdf/1607.04909v2.pdf

The data structure is based on a combination of Karp-Rabin hashing and minimal perfect hashing; it maintains a σ-ary kth-order de Bruijn graph G with n nodes, fully dynamic, in O(n(log log n + σ)) bits with high probability.
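
A toy version of the Karp-Rabin ingredient, rolling-hashing every k-mer of a sequence in O(1) per step; the real structure pairs such fingerprints with minimal perfect hashing, which is not shown:

# Karp-Rabin rolling hash over k-mers: update the hash in O(1) when sliding the
# window by one base. A real dynamic de Bruijn graph combines such fingerprints
# with a minimal perfect hash; this sketch only shows the rolling part.
BASE, MOD = 4, (1 << 61) - 1
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_hashes(seq, k):
    h, top = 0, pow(BASE, k - 1, MOD)
    hashes = []
    for i, ch in enumerate(seq):
        h = (h * BASE + ENCODE[ch]) % MOD
        if i >= k - 1:
            hashes.append(h)
            h = (h - ENCODE[seq[i - k + 1]] * top) % MOD   # drop the leftmost base
    return hashes

print(kmer_hashes("ACGTACGT", k=4))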




□ An integrated, structure- and energy-based view of the genetic code:

>> http://nar.oxfordjournals.org/content/early/2016/07/22/nar.gkw608.full

The new organization of the genetic code clearly segregates 4-codon boxes from 2:2- and 3:1-codon boxes. In the evolution of the triplet code, a major role was played by thermostability of codon–anticodon interactions. the evolution of the genetic code is primarily based on genomic GC-content w/ progressive introduction of U/A together w/ tRNA modifications.