lens, align.

Long is the time, but what is true comes to pass.

opus_XII.

2015-07-31 07:31:00 | Science News


□ Ted Pim: Street Artist Revamps Abandoned Buildings With Creepy Baroque Imagery:

>> http://www.huffingtonpost.com/entry/street-artist-revamps-abandoned-buildings-with-baroque-imagery_55a68bb3e4b0c5f0322bff12


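Learn, notice, and die. The wisdom and insight I gained by diving into the depths of my psyche while alive are mostly lost just short of their expression; they reach no one and are of no value to anyone. Only the consoling ritual called empathy remains.

The mind is a process of erosion. It is like the eddy of a current that surges and recedes moment by moment, striking a wall and turning back. In the next instant it vanishes into foam and merely pushes on, shore by shore, toward the worn-away bank.

We have no time left. Only ashes are a place of rest. And the seat of thought and the seat of causality are as far apart as the earth and the stars.

I feel that people do not learn through their connections with others. The light in another's eyes shares its source with one's own, and resembles a pulse projected from the inside out to the periphery. What we see is not someone but a shadow standing in for someone, which only appears to drift in the gap between self and other. What can be known is not transmitted; it is reached by insight.

To conceptualize something is to leave something else unconceptualized. Because the computable segments are limited, turning over one card flips another card somewhere. It may not be possible to reveal them all. The event of our understanding something is equivalent to acquiring a stance on it.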






□ Deep Genomics: a start-up bringing the power of deep learning to genomics:

>> http://www.deepgenomics.com

Deep Genomics's SPIDEX is produced by a state-of-the-art splicing simulator assembled by our proprietary deep learning algorithms. It captures 65% of the variance of splicing levels across exon triplets observed to undergo alternative cassette splicing. It is a Bayesian ensemble of DNNs trained w/ RNA-seq data from a diverse set of healthy human tissues & thousands of engineered RNA features.






□ DeepBind: Predicting the sequence specificities of DNA/RNA binding proteins by Deep Learning:

>> http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3300.html

DeepBind computes a binding score f(s) using four stages:

f(s) = net_W(pool(rect_b(conv_M(s))))

The convolve, rectify, pool and neural network stages predict a separate score for each sequence using the current model parameters.
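A minimal NumPy sketch of the four stages, to make the composition concrete. Everything here is an illustrative assumption rather than the DeepBind implementation: the function names, the one-hot encoding, max pooling only, a single linear layer for the network stage, and no sequence padding.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (n, 4) array of 0/1 values."""
    arr = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        arr[i, BASES.index(base)] = 1.0
    return arr

def conv(S, M):
    """Slide d motif detectors M (shape (d, m, 4)) along S; returns (n - m + 1, d)."""
    d, m, _ = M.shape
    n = S.shape[0]
    out = np.empty((n - m + 1, d))
    for i in range(n - m + 1):
        # Dot product of every detector with the current window.
        out[i] = np.tensordot(M, S[i:i + m], axes=([1, 2], [0, 1]))
    return out

def rect(X, b):
    """Shift each detector's response by its threshold b and clamp at zero."""
    return np.maximum(0.0, X - b)

def pool(Y):
    """Keep the maximum response of each detector over all positions."""
    return Y.max(axis=0)

def net(z, W):
    """Hypothetical one-layer network: bias W[0] plus weights W[1:]."""
    return float(W[0] + z @ W[1:])

def binding_score(seq, M, b, W):
    """f(s) = net_W(pool(rect_b(conv_M(s))))"""
    return net(pool(rect(conv(one_hot(seq), M), b)), W)
```

With a single detector that matches "ACG" exactly, a sequence containing that motif scores 3 before thresholding, and the threshold b simply shifts that down.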





(Cramér-von Mises test sensitivity to changes in parameters of the Poisson-Beta distribution.)

□ Discrete Distributional Differential Expression (D3E) A Tool for Gene Expression Analysis of Single-cell RNA-seq Data

>> http://biorxiv.org/content/early/2015/07/25/020735

D3E is based on an analytically tractable stochastic model, and thus it provides additional biological insights by quantifying biologically meaningful properties, such as the average burst size and frequency.




□ BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations:

>> http://nar.oxfordjournals.org/content/early/2015/07/21/nar.gkv733.full

The chemical potential is protein-specific in the biophysical modeling of TF–DNA interaction, and it affects the in silico computation of TF binding affinity changes (i.e. δdbA).






□ The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease:

>> http://www.cell.com/ajhg/abstract/S0002-9297(15)00234-7

HPO applied a phenotype-aware CR system (the Bio-LarK Concept Recognizer) to 5,136,645 of the 22,376,811 articles listed in PubMed. CR & bioinformatic analysis: validation w/ OMIM, Orphanet & DO, assessing phenotype sharing for diseases linked to the same locus.




□ GAML: genome assembly by maximum likelihood w/ systematic combination of diverse seq-data into a single assembly:

>> http://www.almob.org/content/10/1/18

GAML can use any combination of insert sizes w/ Illumina, 454 & PacBio reads, encompassing their different error rates, with results comparable to ALLPATHS-LG or Cerulean.

The final score is computed as a weighted combination of log average probabilities:

LAP(A | R_1, …, R_k) = w_1 LAP(A | R_1) + … + w_k LAP(A | R_k).
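The weighted combination is easy to write down. A small sketch, assuming (my reading, not stated in the note) that LAP for one read set is the mean of the per-read log probabilities; the function names and the example weights are hypothetical.

```python
import math

def lap(read_probabilities):
    """Assumed definition: mean log probability of the reads given assembly A."""
    return sum(math.log(p) for p in read_probabilities) / len(read_probabilities)

def combined_lap(lap_per_dataset, weights):
    """Weighted sum w_1 * LAP(A|R_1) + ... + w_k * LAP(A|R_k)."""
    if len(lap_per_dataset) != len(weights):
        raise ValueError("need exactly one weight per read set")
    return sum(w * score for w, score in zip(weights, lap_per_dataset))
```

For example, two read sets with LAP scores of -10 and -20 and equal weights of 0.5 combine to -15.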




□ ASTRID: Accurate Species TRees from Internode Distances: statistically consistent under the MSC, faster than ASTRAL-2

>> http://biorxiv.org/content/biorxiv/early/2015/07/22/023036.full.pdf

ASTRID will be most useful as a starting tree for use within more computationally intensive analyses, including Bayesian MCMC analyses (e.g., *BEAST) or maximum likelihood analyses.

The input to ASTRID is a set of unrooted gene trees T1,...,Tk. We let Si = L(Ti) denote the leaf set of Ti, and S = ∪i L(Ti). Let |S| = n.

Under the assumption that k > n and that ASTRID uses a distance-based method that runs in O(n^3) time, ASTRID’s running time is O(kn^2).






□ Knowledge transfer via classification rules using functional mapping for integrative modeling of gene expression data

>> http://www.biomedcentral.com/1471-2105/16/226

Ganchev and colleagues proposed a novel framework, transfer rule learning (TRL), which leverages the concept of transfer learning to build an integrative model of classification rules from two datasets.

Using semantic similarity as a distance measure, construct a similarity matrix among the GO terms. With the similarity matrix as input, apply the spectral clustering algorithm to group the GO terms into functionally similar clusters. TRL-FM can facilitate knowledge transfer among MAGE datasets that have different variable symbols, as long as the variables can be mapped to common biological functions.






□ Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale

>> http://www.biomedcentral.com/1471-2105/16/227

Red is designed using Signal Processing & Machine Learning as well as a novel data structure to handle long DNA sequences efficiently. Red is the first repeat-detection tool that can label its own training data and train itself automatically on each genome. A state in the Hidden Markov Model is designed to generate a specific range of scores that have the same logarithmic value.




□ RUVSeq: How data analysis affects power, reproducibility & biological insight of RNA-seq studies in complex datasets

>> http://nar.oxfordjournals.org/content/early/2015/07/21/nar.gkv736.full

RUV considerably increases the number of genes discovered as differentially expressed. This is indeed an improvement in the detection of true biological signal: it increases the discovery of positive controls, of known pathways involved in learning and memory, and cross-platform concordance.






□ TEtranscripts: A package for including transposable elements in differential expression analysis of RNA-seq datasets:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/07/22/bioinformatics.btv422.short

TEtranscripts quantifies both gene and transposable element (TE) transcript abundances from RNA-Seq, utilizing both uniquely and ambiguously mapped short read sequences. It processes the short-read alignments (BAM files) and proportionally assigns read counts to the corresponding gene or TE based on the user-provided annotation files (GTF files).




□ MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data:

>> http://mccmb.belozersky.msu.ru/2015/proceedings/abstracts/37.pdf

Calculating a characteristic vector for every metagenome with a length equal to the number of connected components. Each vector element is the number of k-mers from a connected component that are present in reads of the metagenome. Cross-comparing metagenomes by calculating the Bray-Curtis dissimilarity matrix based on characteristic vectors.
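The last step is a plain Bray-Curtis computation over the characteristic vectors. A short sketch in NumPy; the function names and the toy vectors in the usage note are mine, and the k-mer counting that produces the vectors is not shown.

```python
import numpy as np

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    total = (u + v).sum()
    if total == 0:
        return 0.0  # two empty vectors are treated as identical
    return float(np.abs(u - v).sum() / total)

def dissimilarity_matrix(vectors):
    """Pairwise Bray-Curtis matrix for a list of characteristic vectors."""
    n = len(vectors)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = bray_curtis(vectors[i], vectors[j])
    return D
```

Two metagenomes with no connected components in common get a dissimilarity of 1, and identical vectors get 0.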




□ Genomic Contextual Data JSON:

>> http://gensc.org/projects/gcdj/

The GCDJ core component supersedes Genomic Contextual Data Markup Language (GCDML) and adds a framework of tools. The GSC has groomed the MIxS standard into a stable, productive state and new standards are being constantly developed (e.g. MIBiG, MSB3). The INSDC and large analysis pipelines (e.g. MG-RAST) accept and validate MIxS-compliant metadata.




□ For Reproducible bioinformatics research.

・For analyses that include randomness, note the underlying random seeds.
・Generate hierarchical analysis output, allowing layers of increasing detail to be inspected.
・Connect textual statements to underlying results, and provide public access to scripts, runs, and results.
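The first two points can be illustrated in a few lines of Python. This is only a sketch: the function name, the toy "analysis" (a mean of random draws), and the layout of the output dictionary are all made up for illustration.

```python
import json
import random

def run_analysis(seed, n=5):
    """Toy analysis whose randomness is fully determined by the recorded seed."""
    rng = random.Random(seed)          # private generator, no global state
    sample = [rng.random() for _ in range(n)]
    return {
        "seed": seed,                                   # noted alongside the results
        "summary": {"mean": sum(sample) / n},           # top layer: headline number
        "detail": {"sample": sample},                   # deeper layer: raw draws
    }

# Serialising the whole structure keeps seed, summary, and detail together.
report = json.dumps(run_analysis(seed=42), indent=2)
```

Rerunning with the same seed reproduces the same report, and a reader can stop at the summary or drill into the detail layer.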




□ Renormalized spacetime is two-dimensional at the Planck scale:

>> http://arxiv.org/pdf/1507.05669.pdf

A physical ansatz relates the renormalized metric tensor to the bare metric tensor such that the spacetime acquires a zero-point length l0 of the order of the Planck length LP.

The Euclidean volume V_D(l, l0) in a D-dimensional spacetime of a region of size l scales as V_D(l, l0) ∝ l0^(D-2) l^2 when l ∼ l0, while it reduces to the standard result V_D(l, l0) ∝ l^D at large scales (l ≫ l0). The appropriately defined effective dimension, D_eff, decreases continuously from D_eff = D (at l ≫ l0) to D_eff = 2 (at l ∼ l0).

The ansatz incorporates the existence of the zero-point length in the spacetime and leads to well-defined computational rules which can incorporate the effects of quantum gravity at mesoscopic scales w/o leaving the comfort of a continuum differential geometry.





(The non-backtracking (NB) matrix and weak nodes. The optimal strategy for immunization and spreading minimizes λ by removing the minimum number of nodes that destroys all the loops.)

□ Influence maximization in complex networks through optimal percolation:

>> http://www.nature.com/nature/journal/vaop/ncurrent/abs/nature14604.html

map the problem onto optimal percolation in random networks to identify the minimal set of influencers, which arises by minimizing the energy of a many-body system, where the form of the interactions is fixed by the non-backtracking matrix.




□ CN-Summ: Complex Networks and Extractive Summarization

>> https://www.inf.pucrs.br/~propor2010/proceedings/phdmsc/AntiqueiraNunes.pdf

There is so little written about the k-core strategy that I went looking through other papers. Is it simply an aggregation procedure that uses the k-core?

A subgraph g of a graph G is a k-core if every node i of g has degree at least equal to k (counted within g). This subgraph must also be the largest subgraph of G that has this property. Notice that a non-empty k-core w/ the maximum possible k, called the innermost k-core, is a subgraph that consists of densely connected nodes.
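The definition translates directly into the usual peeling procedure: keep deleting nodes whose remaining degree is below k. A sketch in plain Python, assuming an undirected graph given as a dict of neighbour sets (symmetric, no self-loops); the function names are mine, and this is not the summarizer's code.

```python
def k_core(adj, k):
    """Return the node set of the k-core of the graph adj (node -> set of neighbours)."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue                    # already peeled off earlier
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1          # u lost a neighbour
                if degree[u] < k:
                    stack.append(u)
    return alive

def innermost_core(adj):
    """Return (k, nodes) for the non-empty k-core with the largest k."""
    k, core = 0, set(adj)
    while True:
        nxt = k_core(adj, k + 1)
        if not nxt:
            return k, core
        k, core = k + 1, nxt
```

For a triangle with one pendant node attached, the 2-core is the triangle and the 3-core is empty, so the triangle is the innermost core.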




□ Testbeds and Research Infrastructure: Development of Networks and Communities:

>> https://books.google.co.jp/books?id=JGC7BQAAQBAJ

In other words, k-core strategy selects a group of nodes that have minimum degree among themselves, each selected node has a degree higher than or equal to k within the wormhole. In order to find the Core wormhole the algorithm described above in the k-Core strategy is executed on a binary search for the largest k that returns a non-empty set of nodes.




□ Ibis: Scaling the Python Data Experience: a new data analysis framework launched today:

>> http://www.ibis-project.org

Ibis enables Python to become a true first-class language for Apache Hadoop, without compromises in functionality, usability, or performance. Ibis expands the useful set of Python that can be translated to LLVM IR to achieve true native performance at scale on complex data w/ Impala, and exposes machine learning functionality already available in MADLib.




morgantaschuk: Biologists and bioinformaticians have different software needs

>> https://modernmodelorganism.wordpress.com/2015/07/19/biologists-and-bioinformaticians-have-different-software-needs

Computational biologists are interested in results and Bioinformaticians are interested in methods.

As many bioinformatics tools can easily be statically linked, it's not too hard to keep binaries working even if you update the basic system. At the moment it feels like there are about a dozen workflow management systems (WMS), all having a different focus, different priorities and different ways to define your pipeline. The "common workflow language" was started at BOSC 2014 to solve this.



□ Alignment of time course gene expression data and the classification of developmentally driven genes with HMM:

>> http://www.biomedcentral.com/1471-2105/16/196

Interpolation of the expression values between observed time points is not readily justified as significant non-linear variations in expression could conceivably occur between adjacent time points.




□ do_x3dna: a tool to analyze structural fluctuations of dsDNA or dsRNA from molecular dynamics simulations:

>> http://bioinformatics.oxfordjournals.org/content/31/15/2583.short

a Python module dnaMD to perform and visualize statistical analyses of complex data obtained from the trajectories.




□ HAlign: Fast multiple similar DNA/RNA sequence alignment based on the centre star strategy:

>> http://bioinformatics.oxfordjournals.org/content/31/15/2475.short




□ Randomized embeddings for Extreme Learning: SVD of a particular matrix: randomized approximations to kernel machines

>> https://github.com/pmineiro/randembed

Given features X and labels Y, where the SVD of X is given by
X = U_X Σ_X V_X^T

and the SVD of (U_X^T Y) is
U_X^T Y = U_E Σ_E V_E^T,

the k-dimensional embedding is defined as the first k columns of VE. This definition is motivated by the optimal rank-constrained least-squares approximation of Y given X. Randomized methods provide a fast way of approximating these SVDs when the dimensionalities are large.
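The definition can be checked with exact SVDs on a small problem. This sketch uses `numpy.linalg.svd` rather than the randomized approximations the project is actually about, and the function name is hypothetical.

```python
import numpy as np

def label_embedding(X, Y, k):
    """First k columns of V_E, where U_X^T Y = U_E S_E V_E^T and X = U_X S_X V_X^T."""
    U_X, _, _ = np.linalg.svd(X, full_matrices=False)
    # numpy returns V^T (called Vh), so V_E is its transpose.
    _, _, V_E_T = np.linalg.svd(U_X.T @ Y, full_matrices=False)
    return V_E_T.T[:, :k]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))   # 50 examples, 10 features
Y = rng.normal(size=(50, 6))    # 50 examples, 6 label columns
E = label_embedding(X, Y, k=3)  # one 3-dimensional vector per label column
```

The result has one row per label column and k orthonormal columns; for large X and Y the two exact SVDs are the part one would swap for a randomized range finder.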




□ Fast Label Embeddings via Randomized Linear Algebra:

>> http://arxiv.org/abs/1412.6547

The result is a randomized algorithm whose running time is exponentially faster than naive algorithms. It is demonstrated on two large-scale public datasets, from the Large Scale Hierarchical Text Challenge and the Open Directory Project. Embeddings can be part of a strategy for zero-shot learning, i.e., designing a classifier which is extensible in the output space.






□ Python Machine Learning: Unlock deeper insights into cutting-edge predictive analytics

>> https://www.packtpub.com/big-data-and-business-intelligence/python-machine-learning




□ Sealer: a scalable gap-closing application for finishing draft genomes:

>> http://www.biomedcentral.com/1471-2105/16/230

Sealer uses the succinct Bloom filter representation of a de Bruijn graph to close gaps in draft assemblies, incl. very large genomes. It scales to successfully close 50.8 % and 13.8 % of gaps in human (3 Gbp) and white spruce (20 Gbp) draft assemblies in under 30 and 27 h, respectively.




□ Illumina Inc (ILMN) Discloses Files Form 4 Insider Selling : Christian O Henry Sells 5,390 Shares:

>> http://www.insidertradingreport.org/illumina-inc-ilmn-files-form-4-insider-selling-christian-o-henry-sells-5390-shares/642423/

Currently, company insiders own 1.4% of Illumina Inc. In the past 6 months, total insider ownership has changed by -26.38%. Christian O Henry sold 5,390 shares. The insider selling transaction was disclosed on Jul 17, 2015 to the Securities and Exchange Commission. The shares were sold at $229.73 per share for a total value of $1,237,613.00.




□ Alexandria Real Estate Equities PT Lowered to $97.00 (ARE):

>> http://www.dakotafinancialnews.com/alexandria-real-estate-equities-pt-lowered-to-97-00-are/264805/

In other Alexandria Real Estate Equities news, CIO Peter M. Moglia sold 2,666 shares of the company’s stock in a transaction dated June 29. The stock was sold at an average price of $88.25, for a total transaction of $235,274.50.







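A concise simulation showing just how thoroughly recent global-warming "denialism" ignores the observational data. Greenhouse gases alone account for the elements of the warming model. Solar activity does not contribute either. It also speaks to the validity of collecting data from extreme environments.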

omgirlsvt
Climate deniers blame #globalwarming on nature. This NASA data begs to differ http://www.bloomberg.com/graphics/2015-whats-warming-the-world/ … #climatechange

Climate scientists tend not to report climate results in whole temperatures. Instead, they talk about how the annual temperature departs from an average, or baseline. They call these departures "anomalies."
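Computing an anomaly series is just a subtraction against the baseline mean. A tiny sketch; the function name is mine and the temperatures in the usage note are invented round numbers, not NASA data.

```python
def anomalies(temps_by_year, baseline_years):
    """Departure of each year's temperature from the mean over the baseline years."""
    baseline = sum(temps_by_year[y] for y in baseline_years) / len(baseline_years)
    return {year: temp - baseline for year, temp in temps_by_year.items()}
```

With made-up annual means of 14.0, 14.2 and 14.4 for three consecutive years and the first two as the baseline, the third year shows up as an anomaly of about +0.3 rather than as "14.4 degrees".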




□ The Brain vs Deep Learning: Computational Complexity or Why the Singularity Is Nowhere Near

>> https://timdettmers.wordpress.com/2015/07/27/brain-vs-deep-learning-singularity/

an extended linear-nonlinear-Poisson cascade model as groundwork and related it to convolutional architectures.

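If analogy is the projection of a similar structure onto a complex system, then all self-evidence should be measurable. Is it only that we have not been given the resources?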

The brain represents object categories w/in a continuous semantic space which is organized into broad gradients across the cortical surface. This semantic space is shared across different individuals, and the underlying category representation in the brain probably has many dimensions.





the pass of excess leads to the tower of wisdom.

2015-07-20 03:48:06 | Science News

(Pluto. - NASA's New Horizons spacecraft: July 13, 2015, from a distance of 476,000 miles)


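When a predictive model is used in practice, even if fine-grained time-course data are available, there are many cases where the decision is handed over to an invisible hand at the very last moment. How far should numbers be assigned mechanically and consistently? This is an eternal theme, in that the distribution of the stochastic events is unknown, but it is about time it became common knowledge that trimming risk does not always amount to avoiding risk.

For biological data, what can be predicted with deterministic models from known structure and homology, and the influence of unknown functions and complex many-body interactions, are handled in entirely different dimensions, so the discussion of how far current AI-style approaches are effective could stand to be held more constructively.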






□ Astronomical or Genomical?: genomics is a “4-headed beast” the most demanding needs for data

>> http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002195

The current worldwide sequencing capacity is estimated to exceed 35 petabases per year, including the sixteen Illumina X-Ten systems that have been sold so far, each with a capacity of ~2 petabases per year. By 2025, between 100 million and as many as 2 billion human genomes could be sequenced. This would require some 2 exabytes to 40 exabytes of storage space, just for the human genomes.
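A quick arithmetic check of the figures quoted above, assuming decimal units (1 exabyte = 10^18 bytes); the variable and function names are mine.

```python
EXABYTE = 10 ** 18

def bytes_per_genome(total_bytes, n_genomes):
    """Implied storage per genome for a given total and genome count."""
    return total_bytes / n_genomes

low = bytes_per_genome(2 * EXABYTE, 100_000_000)       # 2 EB for 100 million genomes
high = bytes_per_genome(40 * EXABYTE, 2_000_000_000)   # 40 EB for 2 billion genomes

# Sixteen X-Ten systems at ~2 petabases per year each.
x_ten_petabases_per_year = 16 * 2
```

Both ends of the range imply the same 20 GB per genome, and the X-Ten systems alone would account for about 32 of the 35 petabases per year.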




□ Total eclipse of the heart: the AM Canum Venaticorum Gaia14aae/ASSASN-14cn:

>> http://mnras.oxfordjournals.org/content/452/1/1060.full

Gaia14aae is a deeply eclipsing system, with the accreting WD being totally eclipsed on a period of 0.034519 d (49.71 min). Assuming an orbital inclination of 90° for the binary system, the contact phases of the WD lead to lower limits of 0.78 and 0.015 M⊙ on the masses of the accretor and donor, respectively, and a lower limit on the mass ratio of 0.019.

In order to constrain the scaled WD radius, r1=R1/a, we determined the phase of the WD eclipse to be ΔΦ = 0.0373 ± 0.0005 from our model fit. The ingress and egress phases were deduced from the parametrized model of the binary fitted to the WHT+ACAM light curve. This gives us r1 as a function of the mass ratio q and the inclination i. If we then assume a WD mass–radius relation, we can solve for M1 and M2 using q, r1 and the orbital period using Kepler's laws.
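The numbers in the abstract hang together, which a few lines of Python can confirm. The separation below plugs the quoted lower-limit masses into Kepler's third law, so it is only an illustrative value, not a result from the paper; the constants are standard values and the function name is mine.

```python
import math

G = 6.67430e-11           # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.98847e30        # solar mass, kg
SECONDS_PER_DAY = 86400.0

def separation(m1_solar, m2_solar, period_days):
    """Orbital separation a from Kepler's third law: a^3 = G (M1 + M2) P^2 / (4 pi^2)."""
    period_s = period_days * SECONDS_PER_DAY
    total_mass = (m1_solar + m2_solar) * M_SUN
    return (G * total_mass * period_s ** 2 / (4 * math.pi ** 2)) ** (1 / 3)

period_min = 0.034519 * 24 * 60           # orbital period in minutes
q_min = 0.015 / 0.78                      # ratio of the two lower-limit masses
a_m = separation(0.78, 0.015, 0.034519)   # metres, using the lower-limit masses
```

The period converts to the quoted 49.71 min, and the ratio of the two mass limits rounds to the quoted 0.019.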

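A while back there was talk of the data volume and numerical scale handled in genomics surpassing astronomy's, but the observation and mining methods of the two have a great deal in common. Astronomics.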






□ metaCCA: Summary statistics-based multivariate meta-analysis of GWAS using canonical correlation analysis:

>> http://biorxiv.org/content/early/2015/07/16/022665

metaCCA works w/ 3 pieces of the full data covariance matrix & a covariance shrinkage algorithm to achieve robustness. metaCCA is the first summary statistic-based framework that allows multivariate representation of both genotypic and phenotypic variables. In large meta-analytic efforts, the ability to work w/ summary statistics is beneficial, even when there is access to the individual-level data.






□ Genome Modeling System: A Knowledge Management Platform for Genomics:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004274

The genome modeling system (GMS) is implemented to use a federated disk SAN, with meta-data stored in a PostgreSQL relational database. GMS incl. MedSeq which attempts to converge all single-subject data into a form suitable for identification of clinically actionable events




□ Hyperscape: visualization for complex biological networks:

>> http://www.compsysbio.org/hyperscape/

a novel hypergraph implementation that better captures hierarchical structures, using components of elastic fibers & chromatin modification. demonstrates the unique capacity of hypergraphs to resolve overlaps, uncover new insights into the subfunctionalization of variant complexes.




□ On licensing bioinformatics software: use the BSD, Luke.

>> http://ivory.idyll.org/blog/2015-on-licensing-in-bioinformatics.html

there's at least four arguments to be made in favor of continuing to use Illumina while avoiding the use of Kallisto. There's no danger of Illumina claiming dibs on any of my results or extensions.




□ groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from GRO-seq data

>> http://www.biomedcentral.com/1471-2105/16/222

Global run-on coupled with deep sequencing provides extensive information on the location and function of coding and non-coding transcripts. groHMM is parameterized by probability distributions representing the number of GRO-seq reads each hidden state emits across the genome and by a 2x2 matrix of transition probabilities between the hidden states. A gamma distribution is used to model GRO-seq read counts due to its flexibility for representing a variety of probability distributions depending on the values of its parameters, shape (k) and scale (θ).




□ Assembly and diploid architecture of an individual human genome via single-molecule technologies:

>> http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3454.html

GAGE statistics (for scaffolds) begin by fragmenting the sequences at “N” blocks prior to nucmer alignments. Using the currently available C4-P6 SMRT sequencing, one could achieve the 44X SMRT sequencing coverage used in as little as ~200 SMRT cells. The raw SMRT cell and reagent cost for this would be roughly $30,000. This does not include sample prep costs.




□ A Bayesian approach for structure learning in oscillating regulatory networks:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/07/14/bioinformatics.btv414.full.pdf

DSS: DFT-based Spike and Slab model: projecting the signal onto a set of oscillatory basis functions using a Discrete Fourier Transform.




□ Why FPKM makes sense:

>> https://biomickwatson.wordpress.com/2015/06/18/why-fpkm-makes-sense/




□ CUA: a Flexible and Comprehensive Codon Usage Analyzer:

>> http://biorxiv.org/content/early/2015/07/19/022814

gawk '$2 > 0.40' codon_tAI.dmel_r5 >optimal_codons.dmel_r5




□ The Model Complexity Myth: You Can Fit Models With More Parameters Than Data Points (an under-determined system)

>> https://jakevdp.github.io/blog/2015/07/06/model-complexity-myth/

Successfully fit an (N+2)-parameter model to N data points, and the best-fit parameters are actually meaningful in a deep way: the N extra parameters give us individual estimates of whether each of the N data points has misreported errors.

logL_out = -0.5 * (np.log(2 * np.pi * sigma_y ** 2) + ((y - y_model) / sigma_y) ** 2)
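To show where a term like `logL_out` could sit, here is a self-contained mixture likelihood with one outlier weight per data point. This is my sketch, not the blog post's code: the parameter layout (intercept, slope, then N weights), treating `sigma_y` as a fixed broad outlier width, and the clipping constant are all assumptions.

```python
import numpy as np

def log_likelihood(theta, x, y, dy, sigma_y):
    """Straight-line fit with N nuisance weights g_i (assumed: g_i near 1 flags an outlier)."""
    theta = np.asarray(theta, dtype=float)
    y_model = theta[0] + theta[1] * x
    g = np.clip(theta[2:], 1e-12, 1 - 1e-12)   # keep the logs finite
    # Inlier term: Gaussian with the reported error bar dy.
    logL_in = -0.5 * (np.log(2 * np.pi * dy ** 2) + ((y - y_model) / dy) ** 2)
    # Outlier term: same form, but with the broad width sigma_y.
    logL_out = -0.5 * (np.log(2 * np.pi * sigma_y ** 2) + ((y - y_model) / sigma_y) ** 2)
    # Mix the two per point and sum the logs over all points.
    return float(np.sum(np.logaddexp(np.log1p(-g) + logL_in, np.log(g) + logL_out)))
```

With N points, `theta` has N + 2 entries, and raising the weight of a point that sits far off the line raises the likelihood, which is what lets the fit flag it.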




□ Implications of diurnal and seasonal variations in renewable energy generation for large scale energy storage:

>> http://scitation.aip.org/content/aip/journal/jrse/6/3/10.1063/1.4874845




□ 23rd Annual International Conference on Intelligent Systems for Molecular Biology and the 14th European Conference on Computational Biology at the Convention Center Dublin, Ireland July 10 – 14, 2015

>> http://www.iscb.org/ismbeccb2015

>> #ISMB2015
>> #bosc2015
>> #HitSeq


□ HiTSeq 2015

>> http://hitseq.org

HiTSeq is an ISMB/ECCB 2015 special interest group satellite conference devoted to the latest advances in computational techniques for the analysis of high-throughput sequencing (HTS) data. It provides a forum for in depth presentations of novel algorithms, analysis methods, and applications in multiple areas of biology that HTS is transforming.




□ NOTES: BIOINFORMATICS OPEN SOURCE CONFERENCE 2015 DAY 1 MORNING ― HOLLY BIK AND DATA SCIENCE

>> https://smallchangebio.wordpress.com/2015/07/10/bosc2015day1a/


□ Notes: Bioinformatics Open Source Conference #BOSC2015 day 2 morning: @ewanbirney , Open Science and Reproducibility

>> https://smallchangebio.wordpress.com/2015/07/11/notes-bioinformatics-open-source-conference-2015-day-2-morning-ewan-birney-open-science-and-reproducibility/

□ Nextflow provides a declarative syntax to write parallel and scalable workflows, a nice domain specific language (DSL) based on Dataflow.


□ Parallel Recipes : massively parallel and distributed workflows made easy: https://github.com/yvdriess/precipes




morgantaschuk:
MC #CommonWL Uses EDAM ontology for metadata - missed opportunity for most workflow developers http://edamontology.org/page #bosc2015

A function or process performed by a tool; what is done, but not (typically) how or in what context.

CWL builds on technologies such as JSON-LD and Avro for data modeling and Docker for portable runtime environments. CWL is designed to express workflows for data-intensive science, such as Bioinformatics, Chemistry, Physics, and Astronomy.




iGenomics:
Volodymyr Kuleshov: Prism Statistical Phaser combines using pre-phased blocks & local phasing to increase statistical phasing accuracy #ismb2015


□ Quartz algorithm: QUAlity score Reduction at Terabyte scale: Compressive quality scores applied Misra-Gries Algorithm http://groups.csail.mit.edu/cb/quartz/




□ scLVM: a modelling framework for single-cell RNA-seq: dissect the observed heterogeneity into different source

>> https://github.com/PMBio/scLVM

Observed heterogeneity in single-cell profiling data is multi-factorial. scLVM provides a framework for unravelling this heterogeneity, correcting for confounding factors and facilitating unbiased downstream analyses. scLVM builds on Gaussian process latent variable models & linear mixed models. The underlying models are based on inference schemes in LIMIX.

Args:
Y: gene expression matrix [N, G]
geneID: G vector of geneIDs
tech_noise: G vector of tech_noise






□ Similarity network fusion for aggregating data types on a genomic scale

>> http://www.nature.com/nmeth/journal/v11/n3/abs/nmeth.2810.html






□ A multiscale statistical mechanical framework integrates biophysical and genomic data:

>> http://www.nature.com/ng/journal/v46/n12/abs/ng.3138.html






□ A hierarchical Bayesian model for flexible module discovery in three-way time-series data:

>> http://bioinformatics.oxfordjournals.org/content/31/12/i17.full

The algorithm is called TWIGS (three-way module inference via Gibbs sampling). TWIGS outperforms standard algorithms even when the core modules have no additional subject-specific signal. When subject-specific signals exist, the ability of extant algorithms to detect the core modules declines markedly, whereas the performance of TWIGS remains high.




□ eTRIKS: Enabling disease stratification and biomarker discovery by multi-study data harmonisation

>> http://www.etriks.org






kousikbioinfo:
An amazing place to visit !! Book of Kells -- Trinity college Dublin !! #ISMB2015 @PLOSCompBiol




□ Society for Molecular Biology and Evolution from July 12th to the 16th, 2015. Hofburg Palace, a former royal residence in the heart of Vienna.

>> http://smbe2015.at #SMBE15




LucyvanDorp:
We've arrived. Amazing venue! #smbe15


ScientistSoph:
Still can't get over this venue...! #smbe15


hmtme:
The plenary talk is here! #smbe15


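It is moving to think that a cutting-edge molecular biology and evolution conference is being held in the Hofburg Palace, once the center of the Habsburg Empire.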




MBE_press:
For every million years, in any given region, switches between transcribed and non-transcribed -DT #smbe15






□ Speeding up tree likelihood computation using state aggregation:

>> http://f1000research.com/posters/1098132

State aggregation speeds up the computation by reducing the number of states in a continuous-time Markov chain without losing the dimensionality of the models. The aggregation optimization is applied in FastCodeML, which uses the Branch-Site model to infer positive selection along positions of a protein-coding gene.






□ dynamic graphlets: Exploring the structure and function of temporal networks

>> http://www3.nd.edu/~cone/DG/

A common way to analyze a temporal network is to completely discard its time dimension by aggregating all nodes and edges into a single static network.

Recursive formulas for D(n, k): D(3, k) = 3 D(3, k-1) + D(2, k-1) for n = 3, and D(n, k) = (2n-3) D(n, k-1) + 2 D(n-1, k-1) for n > 3.

The four different graphlet methods differ not only quantitatively but also qualitatively: static, static-temporal and dynamic graphlets identify different nodes as topologically similar.






□ Modeling coding-sequence evolution: The intersection between computational frameworks

>> http://figshare.com/articles/SMBE_2015_Presentation/1481093




□ Late Pleistocene climate change and the global expansion of anatomically modern humans:

>> http://m.pnas.org/content/109/40/16089.abstract

Including fluctuations in climate and sea level improves computational models of human demography. Using its spatial framework, simulated haplotypes for the HGDP-CEPH populations & traced the lineages for haplotypes from the same continent




mwilsonsayres:
Komarova: Track stem cells w/ asymmetric division (i), proliferation (i+1), & differentiation (i-1). If P(diff)>P(prol) =>disappear #smbe15




jplotkin:
Oli Tenaillon - Discreteness and continuity in adaptive landscapes #smbe15
Fits FGM dimensionality (n) to lattice-model of protein stability, based on time-series of fitness. Diff folds->Diff n #smbe15
Stability itself is a multidimensional phenotype [ed: based on lattice models of proteins] #smbe15




ContinuumIO:
Didn't make it to Austin last week? Find all our #SciPy2015 talks/tutorials on YouTube here: http://ow.ly/Pydqx #Python @SciPyConf




□ Causal-Bayesian-NetworkX:

>> https://github.com/michaelpacer/Causal-Bayesian-NetworkX
>> https://www.youtube.com/watch?v=qWAQgWOD_nA




□ UDL: Unified Interface for Deep Learning:

>> https://www.youtube.com/watch?v=4Sk4R8mZIIo

UDL is a Python object-oriented library that provides a unified interface for deep learning libraries. UDL can make it easy to integrate different components implemented in Pylearn2, Caffe, and Scikit-learn in the same pipeline.






DadiCharles:
How convert a predictive maintenance system into ROI? #rail #BigData #predictive #DataScience http://bit.ly/1fwt0fl






□ Controllability and observability of Boolean networks arising from biology

>> http://scitation.aip.org/content/aip/journal/chaos/25/2/10.1063/1.4907708

Consequently, simple necessary and sufficient conditions for reachability, controllability, and observability are obtained, and algorithmic tests for controllability and observability which are based on the Gröbner basis method are presented.




□ PROBABILISTIC MODELING IN GENOMICS: probabilistic models, algorithms, and statistical methods in genomics.

>> http://meetings.cshl.edu/meetings.aspx?meet=probgen&year=15




tri_iro:
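Speaking of classifying the real numbers, that reminds me: did Tent end up coming to CCA (the international conference on Computability and Complexity in Analysis, http://cca-net.de/cca2015/ ) after all? Judging from the title, I figured the talk would be about periods https://en.wikipedia.org/wiki/Ring_of_periods … and I have been curious about it ever since.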




□ On the Complexity of k-Piecewise Testability and the Depth of Automata:

>> https://ddll.inf.tu-dresden.de/web/Inproceedings3022

For a non-negative integer k, a language is k-piecewise testable (k-PT) if it is a finite boolean combination of languages of the form \Sigma^*a1\Sigma^*...\Sigma^*an for ai in \Sigma and 0

M.E.S.H. / "Piteous Gate"

2015-07-19 18:19:03 | music15



□ M.E.S.H. / "Piteous Gate"





>> tracklisting.

01. Piteous Gate
02. Optimate
03. Thorium
04. The Black Pill
05. Kritikal & X
06. Epithet
07. Jester’s Visage
08. Methy Imbiß
09. Azov Seepage



“tightly gridded and sculpted sound” that “juts up against loose-wristed improvisation, automated processes, and collage”

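From the debut album by a leading composer of Janus, an up-and-coming DJ crew in Berlin's underground scene. It is closer to industrial than to club music, yet a faint air of stylishness drifts through it. Still, "The Black Pill" and "Jester’s Visage" convey a properly underground sense of madness. The scales of the distorted strings sound Japanese at times and baroque at others.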




□ Deep Forest - Sweet Lullaby (Blank & Jones RELAX Mix)

>> https://pro.beatport.com/track/sweet-lullaby-relax-mix/6851091

A new remix by B&J. Ambient music that is perfect for cooling off with sound.





Come down to us.

2015-07-07 19:37:37 | Science News



Not the intense moment
Isolated, with no before and after,
But a lifetime burning in every moment.

- T.S. Eliot.


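In this world there is no person who ought to be a certain way, no meaning that ought to be so, no fixed value that stands on its own; everything lies within the pattern traced by a dynamic mapping.
The black ridgeline of a mountain cutting out the moonlight, human figures joined up from scattered stars, the rhythm of signals each blinking on its own, the wind-tunnel sound that strikes dark rock and turns back. The faint resonance that echoes inside that empty shell and blows through it is me.

The mind arrives slightly later than the voice. People always sound their voice like a reflex, ahead of their own emotion, and the mind is synthesized an instant after the silence has been struck.

Being with someone seems deeply meaningful. A person is by nature a solitary shadow, shut inside their own mind and never crawling out. That today I was near someone and passed close by someone does not mean that I shared the same thing with the one I was with or the one I passed.

Just as independent planes intersect, being bound by some relationship to someone who was there is not synonymous with sharing the same place and the same space; it rests on the continuity of rewriting and transcription, the transition state, that arises from dynamically facing each other.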




… and a record of who made them, up to the minute, a permanent record …

… and every change could be reviewed by anyone, in a totally transparent way, and you can bundle changes and turn them into branches,

… and anyone can make as many branches as needed, without violating the integrity of the other branches …

… and everyone can have the history of every change ever made to the code, even if the codebase is decades old …


"philosophy might elucidate the `true meaning' of axioms and of definitions by examining their ontology in a wider context."






□ BASiCS: Bayesian Analysis of Single-Cell Sequencing Data:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004333

BASiCS treats cell-specific normalising constants (ϕj’s, sj’s) as model parameters, and estimates them by combining information across all genes. MCMC_Output


□ A powerful HMMER for data mining:

>> http://www.ebi.ac.uk/about/news/press-releases/HMMER-website-launch

The repertoire of profile hidden Markov model libraries, which are used for annotation of query sequences with protein families and domains, has been expanded to include the libraries from CATH-Gene3D, PIRSF, Superfamily and TIGRFAMs.




□ Measuring Fisher Information Accurately in Correlated Neural Populations:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004218

an analytical expression for the variance of the direct, bias-corrected estimator allows one to draw exact error bars without relying on bootstrapping methods.




□ A semi-supervised approach uncovers thousands of intragenic enhancers differentially activated in human cells:

>> http://biorxiv.org/content/early/2015/06/03/020362.full-text.pdf

The cassette exon shows increased inclusion in K562 w/ delta-PSI = 0.72 (PSI = 0.86 in K562 and PSI = 0.14 in GM12878)

Enriched Gene Ontology processes in the genes with active or silent enhancers and with regulated events, compared to genes with intragenic enhancers but no regulated events.




□ FinisherSC : A repeat-aware tool for upgrading de-novo assembly using long reads:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/06/03/bioinformatics.btv280.short




wef:
How to help farmers prevent #hunger in #Ebola-hit countries http://wef.ch/1AvaOeT






□ What My Deep Model Doesn't Know… uncertainty in regression and deep reinforcement learning.

>> http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html

minimise the KL divergence from the full posterior, which would result in the approximating model fitting not just the first moment of the posterior (our predictive mean) but also the second moment (resulting in a sensible variance estimate).

minimise the Kullback–Leibler (KL) divergence:

$\operatorname{argmin}_{\theta}\, \mathrm{KL}(q_\theta(\omega) \,\|\, p(\omega \mid X, Y))$.

maximising the log evidence lower bound:

$\mathcal{L}_{\mathrm{VI}} := \int q_\theta(\omega) \log p(Y \mid X, \omega)\, d\omega - \mathrm{KL}(q_\theta(\omega) \,\|\, p(\omega))$

Compute the value of each action and return the argmax action and its value, taking uncertainty into account by sampling from the dropout network:

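// Wrap the state vector s in a 1 x 1 x net_inputs volume for the value network.
var svol = new convnetjs.Vol(1, 1, this.net_inputs);
svol.w = s;
// is_sample asks for a sampled (dropout) forward pass, as described above.
var is_sample = true;
var action_values = this.value_net.forward(svol, is_sample);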
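
The idea of sampling action values from a dropout network can be sketched in plain Python. This is only an illustration under stated assumptions, not the blog's convnetjs code: the tiny two-layer network, its random weights, the dropout rate, and the helper names `stochastic_forward`, `mc_dropout_values` and `thompson_action` are all made up for the example.

```python
import random
import statistics

def stochastic_forward(x, W1, W2, p_drop, rng):
    """One forward pass with dropout kept on (hypothetical two-layer net)."""
    # Hidden layer: ReLU(x . W1), with W1 stored as a list of columns.
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, col))) for col in W1]
    # Inverted dropout: zero each unit with probability p_drop, rescale the rest.
    hidden = [h / (1.0 - p_drop) if rng.random() >= p_drop else 0.0 for h in hidden]
    # Output layer: one value per action, W2 stored as a list of columns.
    return [sum(h * w for h, w in zip(hidden, col)) for col in W2]

def mc_dropout_values(x, W1, W2, p_drop=0.5, n_samples=100, seed=0):
    """Mean and variance of each action value over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, W1, W2, p_drop, rng) for _ in range(n_samples)]
    per_action = list(zip(*samples))
    means = [statistics.fmean(a) for a in per_action]
    variances = [statistics.pvariance(a) for a in per_action]
    return means, variances

def thompson_action(x, W1, W2, p_drop=0.5, seed=0):
    """Pick the argmax action of a single sampled pass (Thompson-style choice)."""
    rng = random.Random(seed)
    values = stochastic_forward(x, W1, W2, p_drop, rng)
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]

# Illustrative weights only: 4 inputs, 16 hidden units, 3 actions.
_w = random.Random(42)
W1 = [[_w.gauss(0, 1) for _ in range(4)] for _ in range(16)]
W2 = [[_w.gauss(0, 1) for _ in range(16)] for _ in range(3)]
state = [1.0, 1.0, 1.0, 1.0]
means, variances = mc_dropout_values(state, W1, W2)
action, value = thompson_action(state, W1, W2)
```

The per-action variance plays the role of the second moment mentioned above, while `thompson_action` mirrors the "sample once, take the argmax" step of the JavaScript fragment.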






PLOSCompBiol:
Convex Clustering and Synaptic Restructuring: the @PLOSCompBiol May Issue:

>> http://blogs.plos.org/biologue/2015/06/08/convex-clustering-and-synaptic-restructuring-the-plos-cb-may-issue/






□ KaBOB: ontology-based semantic integration of biomedical databases:

>> http://www.biomedcentral.com/content/pdf/s12859-015-0559-3.pdf

An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples.






□ Monte Carlo Planning Method Estimates Planning Horizons during Interactive Social Exchange:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004254




□ DNNGraph - A deep neural network model generation DSL in Haskell:

>> https://github.com/ajtulloch/dnngraph

The Torch backend generates Lua code. Any network that can be expressed as a nested combination of computational layers, combined with nn.Sequential, nn.Concat, nn.ModelParallel, nn.DataParallel etc., can be generated under this framework.

As for whether a DNN can be implemented on top of a highly abstracted lens library: the two turn out to fit together surprisingly well.




□ Chainer: Bridge the gap between algorithms and implementations of Deep Learning

>> http://chainer.org

Chainer stores the history of computation. This strategy makes it possible to fully leverage the power of programming logic in Python.
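
What "storing the history of computation" buys can be shown with a toy reverse-mode autodiff in plain Python. This is a sketch of the general define-by-run idea only and does not use Chainer's API; the class `Var` and the function `poly` are hypothetical names introduced for the example.

```python
class Var:
    """Scalar value that remembers how it was computed (hypothetical class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent Var, local gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Replay the recorded history in reverse topological order."""
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += local * v.grad


def poly(x, n):
    """Ordinary Python control flow decides the graph: computes x**n + x."""
    y = x
    for _ in range(n - 1):  # the loop length is only known at run time
        y = y * x
    return y + x


x = Var(3.0)
y = poly(x, 3)   # value 3**3 + 3
y.backward()     # gradient 3 * 3**2 + 1
```

Because the graph is recorded while the Python loop runs, no separate graph-definition language is needed; changing `n` simply records a different history.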






□ BreakSeek: a breakpoint-based algorithm for full spectral range INDEL detection:

>> http://nar.oxfordjournals.org/content/early/2015/06/27/nar.gkv605.full

By incorporating sophisticated statistical models, the authors for the first time investigate and demonstrate the importance of handling false and conflicting signals in multi-signal integrated methods. With a novel Bayesian classification system and SW alignment-based filtration for deletions, BreakSeek outperforms existing INDEL discovery methods in sensitivity and specificity, particularly for detecting the full size range of INDELs.




□ PredSTP: a highly accurate SVM based model to predict sequential cystine stabilized peptides:

>> http://www.biomedcentral.com/1471-2105/16/210

a species-agnostic machine learning method, designed to nominate undefined STPs having low sequence identity with currently described STPs. Sensitivity [TP/(TP + FN)], specificity [TN/(TN + FP)], precision [TP/(TP + FP)], accuracy [(TP + TN)/(TP + FN + TN + FP)] and the Matthews Correlation Coefficient [(TP×TN − FP×FN)/sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}] were calculated to assess the performance of the algorithm.
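
The five measures listed above are easy to compute from a confusion matrix. The following is a small sketch; the function name `classification_metrics` and the example counts are illustrative assumptions, not values from the paper.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute the measures named above from confusion-matrix counts.

    Assumes every denominator except the MCC one is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Made-up counts, for illustration only.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```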




□ Salk recruits human geneticist Graham McVicker - Salk Institute - News Release

>> http://www.salk.edu/news/pressrelease_details.php?press_id=2086#.VXTw5K2hKB0.twitter




□ Providing a bioinformatics analysis environment with a virtual machine: easily building a bioinformatics analysis environment with Bayes Linux:

>> http://qiita.com/dritoshi/items/707d3dd1fe9ed4f3b5b6




□ PROSTA-inter: Finding optimal interaction interface alignments between biological complexes:

>> http://bioinformatics.oxfordjournals.org/content/31/12/i133.full

the PROSTA-inter method to determine and align the interaction interfaces between two arbitrary types of complex structures. Since the IS-scores are independent from the number of interface residues/nucleotides, a higher IS-score implies a better alignment.




tri_iro:
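I have written up an exposition of Vaught's conjecture, which I had been mumbling about for a while, so anyone interested in mathematical logic, model theory, descriptive set theory and the like is welcome to read it:

>> http://recursion-theory.blogspot.com/2015/06/blog-post.html

"The topological Vaught conjecture and the Vaught conjecture for continuous $\mathcal{L}_{\omega_1\omega}$-logic are equivalent."
"$\omega_1^{CK,M} := \min\{\omega_1^{CK,x} : \text{a copy of } M \text{ can be presented by a Turing machine using } x \text{ as an oracle}\}$"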

What is the Elliott program...

□ Elliott’s program and descriptive set theory III

>> http://www.fields.utoronto.ca/programs/scientific/13-14/summer-research13/notes/manchester-lc2012-3.pdf

Elliott’s program: Classify separable, unital, simple, nuclear C*-algebras by K-theoretic invariants.

Logic of C*-algebras: Semantics

If $\varphi_P(x)$ is $\|x^2 - x\| + \|x - x^*\|$ then the zero-set of $\varphi_P$,
$\{a \in A \mid \varphi_P(a)^A = 0\}$,
is the set of projections in $A$.
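
A quick numerical check of this zero-set, sketched for small square matrices. As an assumption made to keep the example dependency-free, the Frobenius norm stands in for the C*-norm; since every norm vanishes only at zero, the zero-set is unchanged. The helper names are hypothetical.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def sub(a, b):
    n = len(a)
    return [[a[i][j] - b[i][j] for j in range(n)] for i in range(n)]

def frobenius(a):
    return math.sqrt(sum(abs(v) ** 2 for row in a for v in row))

def phi_P(x):
    """||x^2 - x|| + ||x - x*||, with the Frobenius norm as a stand-in."""
    return frobenius(sub(matmul(x, x), x)) + frobenius(sub(x, adjoint(x)))

projection = [[0.5, 0.5], [0.5, 0.5]]     # idempotent and self-adjoint
idempotent_only = [[1, 1], [0, 0]]        # idempotent but not self-adjoint
self_adjoint_only = [[2, 0], [0, 0]]      # self-adjoint but not idempotent
```

Here `phi_P(projection)` is zero, while the other two matrices each violate one of the two terms and give a strictly positive value.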






infoecho:
Genome Assembly Tutorial in 140 chrs: Repeats and Assembly String Graph Topology 1/3,



□ Falcon String Graph Assembler Internal:

>> http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/38943405/Falcon_Internal/Falcon_Internal_Sec1.ipynb







□ Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false):

>> http://arxiv.org/pdf/1412.0348v2.pdf

The latter result would violate the Strong Exponential Time Hypothesis, which postulates that such algorithms do not exist.
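
For reference, the quadratic baseline that the hardness result is measured against is the classical dynamic program for edit distance. A minimal sketch follows (the function name `edit_distance` is ours); it runs in O(|a|·|b|) time and keeps only two rows of the table.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```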

Reducing Orthogonal Vectors Problem to PATTERN

vector gadget sequences as
$VG_1(a) = Z_1 L V_0 R Z_2$ and $VG_2(b) = V_1 D V_2$
$Z_1 = Z_2 = 0^{l_2}$, $V_1 = V_2 = V_0 = 1^{l_2}$

The orthogonal vectors problem has an easy $O(N^2 d)$-time solution. Any algorithm for this problem with strongly sub-quadratic running time would also yield a more efficient algorithm for SAT, breaking SETH.
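
The easy quadratic solution simply tries every pair. A minimal sketch, assuming the inputs are two lists of N binary vectors of dimension d (the function name `find_orthogonal_pair` is ours):

```python
def find_orthogonal_pair(A, B):
    """Return indices (i, j) with A[i] . B[j] == 0, or None if no pair exists.

    Brute force over all pairs: O(N^2 * d) for N vectors of dimension d.
    """
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            if all(x * y == 0 for x, y in zip(a, b)):
                return (i, j)
    return None

A = [(1, 0, 1), (0, 1, 1)]
B = [(1, 1, 0), (0, 1, 0)]
pair = find_orthogonal_pair(A, B)
```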




□ SLICEMBLER: De novo meta-assembly of ultra-deep sequencing data:

>> http://www.ncbi.nlm.nih.gov/pubmed/26072514?dopt=Abstract

SLICEMBLER partitions the input data into optimal-sized “slices” and uses a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray)

for n in st.preOrderNodes:
    p = n.parent
    if p is None:  # the root
        n._pathLabel = ''
    else:
        n._pathLabel = p._pathLabel + n.edgeLabel




□ Tools and pipelines for BioNano data: molecule assembly pipeline and FASTA super scaffolding tool

>> http://biorxiv.org/content/biorxiv/early/2015/06/15/020966.full.pdf

The most current assembly workflow, AssembleIrysXeonPhi maintains all the functionality of AssembleIrysCluster (e.g. adjusting stretch by scan and writing assembly scripts with all combinations of three p-ValueThresholds and three Minlen parameters) but runs on our new machine with the latest release of the BioNano Assembler and RefAligner.




□ Seattle-Based Genomics Company Develops Disruptive Gene Sequencing Technology

>> http://www.forbes.com/sites/robertglatter/2015/06/16/seattle-based-genomics-company-develops-disruptive-gene-sequencing-technology/

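Stratos Genomics has announced nanopore sequencing (X-NTP) based on its proprietary Xpandomer method. It raised $15 million from its partner Roche and from Fisk Ventures. That the goal was reached within half a year of being set brings home how differently risk-taking works overseas.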

Stratos’ Sequencing by SBX is an efficient, low-cost DNA preparation method that rescales a DNA target into a longer surrogate polymer. This surrogate, called an Xpandomer™, encodes the sequence information in high signal-to-noise reporters.




□ Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies:

>> http://biorxiv.org/content/biorxiv/early/2015/06/16/020214.full.pdf

Pyvolve includes several novel sequence simulation features, including a new rate matrix scaling algorithm and branch-length perturbations.

import pyvolve
pyvolve.read_tree  # tree-reading call; its arguments are not given in this note
model1 = pyvolve.Model("nucleotide", alpha = 0.7, num_categories = 4)
part1 = pyvolve.Partition(models = model1, size = 50)




□ Massively parallel quantification of the regulatory effects of non-coding genetic variation in a human cohort:

>> http://genome.cshlp.org/content/early/2015/06/17/gr.190090.115.abstract

most genetic variants have weak effects on distal regulatory element activity. Because haplotypes are typically maintained within but not between assayed regulatory elements, the approach can be used to identify causal regulatory haplotypes that likely contribute to human phenotypes.




□ ATAC-Seq: Single-cell chromatin accessibility reveals principles of regulatory variation:

>> http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14590.html

Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. The pattern of accessibility variation in cis across the genome recapitulates chromosome compartments de novo, linking single-cell accessibility variation to three-dimensional genome organization.






□ Compact graphical representation of phylogenetic data and metadata with GraPhlAn:

>> https://peerj.com/articles/1029/

GraPhlAn, a new method for generating high-quality circular phylogenies potentially integrated with diverse, high-dimensional metadata. GraPhlAn has been developed with command-driven automation in mind, as well as flexibility in the input “annotation file” so as to be easily generated by automated scripts.




□ DNAnexus Cloud Genomics Platform to Support Data Management and Genomic Analysis for a Global Research Consortium:

>> http://www.businesswire.com/news/home/20150630005478/en/DNAnexus-Cloud-Genomics-Platform-Support-Data-Management#.VZOAn2D2BE4

The DNAnexus global network provides hundreds of researchers at institutions worldwide secure and immediate access & use of ENCODE’s results. The platform supports the DCC bioinformatics analysis of ENCODE data, making the consortium’s methods and sharing for its Phase 3 project. It’s expected this analysis will require 10 million core-hours of compute & generate nearly 1 petabyte of raw data over the next 18 months.




□ Distributed evolutionary algorithms: Classify the models into population / dimension-distributed groups semantically

>> http://www.sciencedirect.com/science/article/pii/S1568494615002987

Population-distributed models are presented with master-slave, island, cellular, hierarchical, and pool architectures, which parallelize an evolution task at population, individual, or operation levels. Dimension-distributed models include coevolution and multi-agent models, which focus on dimension reduction. Insights into the models, such as synchronization, homogeneity, communication, topology, speedup,






□ Physicists demonstrate new violations of local realism.

>> http://bit.ly/1JKGtut




□ On integrability of some bi-Hamiltonian two field systems of partial differential equations:

>> http://scitation.aip.org/content/aip/journal/jmp/56/5/10.1063/1.4919542

This is applied to the construction of some new two field integrable systems of PDE by taking the pair $(H_0, H_1)$ in the family of compatible Poisson structures that arose in the study of cohomology of moduli spaces of curves.




□ Infinitely many solutions to a linearly coupled Schrödinger system with non-symmetric potentials:

>> http://scitation.aip.org/content/aip/journal/jmp/56/5/10.1063/1.4921637

Using the Liapunov-Schmidt reduction method twice, combined with the localized energy method, the authors prove that the problem has infinitely many positive synchronized solutions, which extends a result about nonlinearly coupled Schrödinger equations.




□ The Machine as Data: A Computational View of Emergence and Definability:

>> http://arxiv.org/ftp/arxiv/papers/1506/1506.06270.pdf

In our everyday world, we may retain distinctions between physical observables and their semantical content. informational ‘phase transitions’ which certainly cross semantic/computational barriers in ways which do not depart our observational domain.

digital ontology is not a satisfactory approach to the description of the environment in which informational organisms like us are embedded.