Blog posts from February 2015 - lens, align.

what we were, and what we are.

2015-02-15 14:31:16 | Science News

□ Kiwi: a tool for integration and visualization of network topology and gene-set analysis:

>> http://www.biomedcentral.com/1471-2105/15/408 …

The shortest path length (SPL) measures the shortest distance between two gene-sets and is a property of the network that indicates whether the two gene-sets are interacting directly or indirectly via a certain number of intermediates.
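As a rough illustration of the SPL idea, the sketch below runs a breadth-first search over a toy network whose nodes are gene-sets. The network, its node names, and the function name are hypothetical and are not taken from Kiwi itself; an SPL of 1 means a direct interaction, and larger values mean that many steps via intermediates.

```python
from collections import deque

def shortest_path_length(adjacency, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical gene-set network: an edge means two gene-sets interact directly.
network = {
    "glycolysis": ["TCA cycle"],
    "TCA cycle": ["glycolysis", "oxidative phosphorylation"],
    "oxidative phosphorylation": ["TCA cycle"],
    "isolated set": [],
}

direct = shortest_path_length(network, "glycolysis", "TCA cycle")
indirect = shortest_path_length(network, "glycolysis", "oxidative phosphorylation")
print(direct, indirect)
```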

□ Computation in Dynamically Bounded Asymmetric Systems: computation in dynamically bounded asymmetric networks; modifying the entropy of constrained living systems

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004039 …

very simple organizational constraints that combine these motifs can lead to spontaneous computation and so to the spontaneous modification of entropy that is characteristic of living systems.

the underlying computational elements of the network are not themselves stable. Instead, the overall boundedness of the system is provided by the asymmetrical coupling between excitatory and inhibitory elements commonly observed in neuronal and molecular networks.

Essentially, a nonlinear time-varying dynamic system is called contracting if arbitrary initial conditions or temporary disturbances are forgotten exponentially fast, i.e., trajectories of the perturbed system return to their unperturbed behavior with an exponential convergence rate.

A nonlinear contracting system has the following properties

・global exponential convergence and stability are guaranteed
・convergence rates can be explicitly computed as eigenvalues of well-defined Hermitian matrices
・combinations and aggregations of contracting systems are also contracting
・robustness to variations in dynamics can be easily quantified

□ Complexity Measurement Based on Information Theory and Kolmogorov Complexity:

>> http://www.mitpressjournals.org/doi/abs/10.1162/ARTL_a_00157 …

Integrates Shannon's information theory and Kolmogorov complexity, applied to elementary cellular automata and simulations of the self-organization of porphyrin molecules.

□ Biology: How Does Information/Entropy/Complexity Fit In?

>> https://t.co/0HRIFXcq1P

The amount of compression is a good way to approximate K(s)
–Compression of Human Genome ~ 12%

Conditional Kolmogorov Complexity:
K(x|y): the length of the shortest program which spits out x given y
Not symmetric, so we still need to find a good distance metric between two sequences

□ n0rr:
To approximate the Kolmogorov complexity K(x), Cilibrasi and Vitanyi proposed using compression, since K(x) can be regarded as the best possible compression of the string x. file:///Users/nor/Downloads/IPSJ-SES2011021.pdf
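A minimal sketch of the compression idea, assuming zlib as a stand-in for the "best" compressor (any real compressor only gives an upper bound on K). It computes the normalized compression distance (NCD) of Cilibrasi and Vitanyi, which is one way to get a symmetric distance between two sequences. The helper names and test sequences are illustrative.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a crude proxy for K(data)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# A highly repetitive sequence versus a pseudo-random one (fixed seed for repeatability).
repetitive = b"ACGT" * 200
rng = random.Random(0)
shuffled = bytes(rng.choice(b"ACGT") for _ in range(800))

print(ncd(repetitive, repetitive), ncd(repetitive, shuffled))
```

Swapping in a stronger compressor changes the numbers but not the approach.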

□ GenoMetric Query Language: A novel approach to large-scale genomic data management:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/02/02/bioinformatics.btv048.short …

GMQL leverages a simple model that provides abstractions of genomic region data and associated experimental, biological & clinical metadata. GenoMetric Query Language can be used independently or within GenData 2020, a server-based architecture built on Hadoop and the Apache Pig platform.

□ flowCL: ontology-based cell population labelling in flow cytometry

>> http://bioinformatics.oxfordjournals.org/content/early/2015/01/20/bioinformatics.btu807.short …

The flowCL paper is out. Semantic labelling of cell populations using R together with SPARQL.

flowCL is a software package that performs semantic labelling of cell populations based on their surface markers, and the authors applied it to labelling. flowCL executes queries against the Cell Ontology, hosted on a triplestore, a database for storage and retrieval of RDF triples.

□ GRANITE – an integrative genomic tool for large complex data analysis:

An integrative genomic tool for time-course evaluation of static expression data and generation of interaction networks

>> http://www.rna-seqblog.com/granite-an-integrative-genomic-tool-for-complex-data-analysis/ …

responder AND non-responder yields the ‘common’ or ‘intersection’ network made up of the relationships that are common to both responder and non-responder groups. GRANITE supports six different methods to partition a graph using these logical operators:

GRANITE drops nodes of zero degree (spurious/no connections) in the induced subgraph. Network models are induced for both the responder group and the non-responder group, and then analysis is performed on the partitions defined above through graph visualization and graph measures.
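The partitioning by logical operators can be pictured with plain set operations on edge lists. This is only a sketch with made-up networks: it shows an intersection and two differences, and does not claim to reproduce GRANITE's six methods or its API. Building the node set from the surviving edges is what drops zero-degree nodes.

```python
def edge_set(pairs):
    """Represent an undirected network as a set of frozenset edges."""
    return {frozenset(pair) for pair in pairs}

def nodes_with_edges(edges):
    """Nodes that still have at least one edge; zero-degree nodes vanish."""
    return set().union(*edges)

# Hypothetical relationship networks induced for each group.
responder = edge_set([("A", "B"), ("B", "C"), ("C", "D")])
non_responder = edge_set([("A", "B"), ("C", "D"), ("D", "E")])

common = responder & non_responder               # responder AND non-responder
responder_only = responder - non_responder       # responder AND NOT non-responder
non_responder_only = non_responder - responder   # non-responder AND NOT responder

print(sorted(nodes_with_edges(common)))
print(sorted(nodes_with_edges(responder_only)))
```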

□ Integrating Large-Scale RNA-Seq and CLIP-Seq Datasets Enables Study of lncRNA:

>> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4294205/ …

□ Multi-dimensional genome-wide analysis for producing gene regulatory networks underlying retinal development:

>> http://www.sciencedirect.com/science/article/pii/S1350946215000063 …

□ Introduction to Biodiversity Informatics:

>> http://figshare.com/articles/Introduction_to_Biodiversity_Informatics/1295382 …

Current taxonomic data

• 15-20k new spp. described annually (2M total)
• 30k nomenclatural acts (12M total)
• 20k phylogenies (750k total)
• 31k taxa sequenced (360k taxa total)
• 800k BioMed papers (40M total pp. of taxonomy)

□ ontology usage:

□ SynBioLGDB: A Gateway for Logical Biology: a resource for experimentally validated logic gates in synthetic biology:

>> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4308699/ …

The concept of biological logic gates. The database holds 189 logic gates, including AND, OR, NOR, NOT, NAND, and XOR; because they are not binary, it can reduce the complexity of the analogy, and it is also a useful resource for the ribozyme-based NOR gate problem.

SynBioLGDB has 80 AND gates, 8 Buffer gates, 7 Combinatorial gates, 10 NAND, 16 NOR, 28 NOT, 17 OR gates, 7 XOR gates and 16 other gates. Diverse genetic logic gates capable of generating a Boolean function play critically important roles in synthetic biology. Basic genetic logic gates have been designed to combine biological science with digital logic.

□ GBIF, biodiversity informatics and the "platform rant":

>> http://iphylo.blogspot.jp/2015/01/gbif-biodiversity-informatics-and-rant.html …

"the goal of the platform is NOT to "help" users - that simply reinforces the distinction between you and the "users""

□ LFQC: a lossless compression algorithm for FASTQ files: better than gzip, bzip2, fastqz, fqzcomp, G-SQZ, SCALCE, DSRC

>> http://bioinformatics.oxfordjournals.org/content/early/2015/01/20/bioinformatics.btu701.short …

The improvement obtained is up to 225% on the datasets (SRR065390_1), the average improvement (over all the algorithms compared) is 74.62%

□ Machine Learning for Bioinformatics: MATLAB:

>> http://au.mathworks.com/examples/bioinfo/category/machine-learning-for-

Identifying biomolecular subgroups using the attractor metagenes algorithm. Attractor metagenes are defined as the attracting fixed points of an iterative process. The algorithm belongs to the broad family of unsupervised machine learning. Related algorithms include principal component analysis, various clustering algorithms (especially fuzzy c-means), non-negative matrix factorization, and others.

□ Modeling and enhanced sampling of molecular systems with smooth and nonlinear data-driven collective variables:

>> http://scitation.aip.org/content/aip/journal/jcp/139/21/10.1063/1.4830403 …

The machine learning method SandCV provides a description of the system that closely mimics one based on the conventional dihedral angles. This system is a benchmark for free energy calculations and has well-known and highly nonlinear collective variables.

□ New Computational Framework Provides Pipelines for Reproducible Multi-Omics Data Analysis:

>> https://www.genomeweb.com/informatics/new-computational-framework-provides-pipelines-reproducible-multi-omics-data-analysis

□ BioGPS Featured Article- Multi-omic landscape of rheumatoid arthritis: re-evaluation of drug adverse effects:

>> http://sulab.org/2015/02/biogps-featured-article-multi-omic-landscape-of-rheumatoid-arthritis-re-evaluation-of-drug-adverse-effects/

□ A Memory Efficient Short Read De Novo Assembly Algorithm:

>> https://www.jstage.jst.go.jp/article/ipsjtbio/8/0/8_2/_pdf …

The average maximum memory consumption of the proposed method for human chromosome 14 was approximately 54% of SOAPdenovo2 and approximately 63% of Velvet. The vertices that are put together in a path are assigned the same label. A path from the start vertex to the end vertex represents a subgraph. Multiple subgraphs are created in this process.

□ PLNseq: a multivariate Poisson lognormal distribution for high-throughput matched RNA-sequencing read count data

>> http://onlinelibrary.wiley.com/doi/10.1002/sim.6449/abstract

The correlation is directly modeled through Gaussian random effects, and inferences are made by likelihood methods. A three-stage numerical algorithm is developed to estimate unknown parameters and conduct differential expression analysis.

Results using simulated data demonstrate that the method performs reasonably well in terms of parameter estimation, DE analysis power, and robustness. PLNseq also has better control of FDRs than the benchmarks edgeR and DESeq2.

□ Coevolutionary analyses require phylogenetically deep alignments and better null models to accurately detect inter-protein contacts within and between species:

>> http://biorxiv.org/content/biorxiv/early/2015/02/06/014902.full.pdf

□ EMASE: Expectation-Maximization algorithm for Allele Specific Expression:

>> https://pypi.python.org/pypi/emase/0.9.0 …

The EM algorithm employed in EMASE models multi-reads at the level of gene, isoform, and allele and apportions them probabilistically.
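To make "apportions them probabilistically" concrete, here is a toy EM loop for reads that are compatible with one or more targets (for example two alleles). It is a generic sketch of the idea, not EMASE's implementation or API; the function name and the read data are hypothetical.

```python
def em_apportion(reads, targets, n_iter=100):
    """Estimate relative abundances when some reads map to several targets.

    reads: list of tuples, each listing the targets a read is compatible with.
    """
    theta = {t: 1.0 / len(targets) for t in targets}  # start from uniform abundances
    for _ in range(n_iter):
        counts = {t: 0.0 for t in targets}
        # E-step: split each read across its compatible targets in proportion to theta.
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                counts[t] += theta[t] / total
        # M-step: new abundances are the normalized expected counts.
        theta = {t: counts[t] / len(reads) for t in targets}
    return theta

# 3 reads unique to allele A, 1 unique to allele B, 4 multi-reads hitting both.
reads = [("A",)] * 3 + [("B",)] * 1 + [("A", "B")] * 4
theta = em_apportion(reads, ["A", "B"])
print(theta)
```

In this toy case the multi-reads end up shared 3:1, mirroring the evidence from the unique reads.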

emase.Sparse3DMatrix

class emase.Sparse3DMatrix.Sparse3DMatrix(other=None, h5file=None, datanode='/', shape=None, dtype=<type 'float'>)

□ Empirical GO: Measure Enrichment using an Empirical Sampling Approach

>> http://sgjlab.org/empirical-go/

The Empirical GO generates the empirical distribution of the number of mRNA target genes in GO terms and returns p-values for enrichment. The code accompanies a manuscript submitted for publication in Bioinformatics.

□ Parallel de Bruijn Graph Construction and Traversal for de novo Genome Assembly:

>> http://www.homolog.us/blogs/blog/2015/01/27/parallel-de-bruijn-graph-construction-and-traversal-for-de-novo-genome-assembly/ …

a novel algorithm that leverages 1-sided communication capabilities of Unified Parallel C to facilitate the requisite fine-grained parallelism and avoidance of data hazards, while analytically proving its scalability properties.

□ Parallel Bayesian Network Structure Learning for Genome-Scale Gene Networks:

>> http://ieeexplore.ieee.org/xpl/login.jsp

□ OneCodex:
generate + share public links for your NGS data analyses on One Codex http://blog.onecodex.com/2015/02/03/better-data-sharing/ …

□ Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization:

>> http://ieeexplore.ieee.org/xpl/login.jsp

a fine-grained parallelism technique called Orion, that divides the input query into an adaptive number of fragments and shards the database.

Orion achieves higher parallelism (and hence speedup) and better load balancing than database sharding alone, while maintaining 100% accuracy. It is 12.3X faster than mpiBLAST for solving a relevant comparative genomics problem.

□ Natera Adopts the DNAnexus Cloud Genomics Platform to Support a Portfolio of Next-Generation Genetic Tests:

>> http://www.businesswire.com/news/home/20150127005167/en/Natera-Adopts-DNAnexus-Cloud-Genomics-Platform-Support …

□ RSEM-EVAL – for evaluating assemblies when the ground truth is unknown:

>> http://www.rna-seqblog.com/rsem-eval-for-evaluating-assemblies-when-the-ground-truth-is-unknown/ …

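The paper on DETONATE, posted on bioRxiv last June, has now been released.

"In terms of the relative accuracy of assemblies suggested by REF-EVAL and RSEM-EVAL, Trinity produces the most accurate assemblies with respect to contig-level and nucleotide-level F1 and the KC score."

□ Lauded New Orleans biotech firm Renaissance Rx facing financial trouble: the backlash from the surge in genetic-testing stocks already seems to be surfacing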

>> http://www.nola.com/business/index.ssf/2015/02/lauded_new_orleans_biotech_fir.html …

□ RainDance, U Chicago Sue 10X Genomics for Patent Infringement:

>> https://www.genomeweb.com/business-news/raindance-u-chicago-sue-10x-genomics-patent-infringement …

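A patent-infringement suit against 10x, whose advanced long-read technology had been drawing attention. And right before the detailed announcement at AGBT...

10x leverages existing short-read NGS but fills in knowledge gaps by taking DNA and partitioning the molecules in a massively parallel manner. Each partition has its own barcode, and once the partitioning is completed, the fragments are then pieced together into a long read.

If the complaint is about "reactions occurring during transport on a substrate", I feel it would catch most of NGS, but it may depend on how "microfabricated" is specified. 10x's innovation is the large-scale serial partitioning of the molecules in a DNA sample, and it had raised $55.5 million by the time of its presentation at the JP Morgan conference.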

□ Orion Genomics LLC - Product Pipeline Analysis, 2014 Update, Has Been Published: New Market Study:

>> http://www.releasewire.com/press-releases/new-market-study-orion-genomics-llc-product-pipeline-analysis-2014-update-has-been-published-577018.htm

□ GrammR: Graphical Representation & Modeling of Count Data Application in Metagenomics: metric multidimension scaling

>> http://bioinformatics.oxfordjournals.org/content/early/2015/01/19/bioinformatics.btv032.abstract …

a novel procedure for determining the number of clusters in conjunction with PAM (mPAM), using metric multidimensional scaling (MDS) as an alternative to PCoA for graphical representation.

□ BioGraphServ: Bioinformatics Graph Server

>> http://biographserv.com/bgs/view_project/bf757a08ccf54eae9d3ba37990bc8d61/ …

BGS is built on the Django framework; graphs and analysis use Pandas/Matplotlib. Work is dispatched to asynchronous workers (Celery), which call a reference server.

A graph server for bioinformatics that works by drag & drop. Automatic analysis of BED, VCF, Expression, and CuffDiff files.

□ c_z:
“Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data” http://www.wiley.com/WileyCDA/WileyTitle/productCd-1118845846.html …

□ Advantages of distributed & parallel algorithms: leverage Cloud Computing platforms for large-scale genome assembly:

>> http://f1000research.com/articles/4-20/v1 …

In the Hadoop implementation of the Contrail algorithm, the Map phase scans each read and emits the key-value pairs (u, v) corresponding to overlapping k-mer pairs that form an edge. This is followed by aggregation of identical k-mers in the Reduce phase, where linear paths of the de Bruijn graph are also calculated and continuously overlapping k-mers are simplified into single graph nodes representing longer stretches of sequence.
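The Map and Reduce phases described above can be mimicked in a few lines of single-process Python. This is an illustrative sketch, not Contrail or Hadoop code: the function names and reads are made up, and the compression of linear paths into longer nodes is left out.

```python
from collections import defaultdict

def map_read(read, k):
    """Map phase: emit (u, v) for each pair of consecutive k-mers overlapping by k-1 bases."""
    for i in range(len(read) - k):
        yield read[i:i + k], read[i + 1:i + 1 + k]

def reduce_edges(pairs):
    """Reduce phase: aggregate identical k-mers (keys) and collect their successors."""
    graph = defaultdict(set)
    for u, v in pairs:
        graph[u].add(v)
    return dict(graph)

reads = ["ACGTA", "CGTAC"]
pairs = [pair for read in reads for pair in map_read(read, 3)]
graph = reduce_edges(pairs)
print(graph)
```

The edge (CGT, GTA) is emitted by both reads and collapses to a single entry in the reduce step.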

□ Graphical Fragment Assembly format

>> http://lh3.github.io/2014/07/19/a-proposal-of-the-grapical-fragment-assembly-format/ …

That new genome assembly format that was under development now appears to be natively supported by ABySS.

□ The bioboxes RFC: Request for comments on interchangeable bioinformatics containers

>> https://github.com/bioboxes/rfc

□ GenomicScape: a free online data-mining platform to quickly identify molecular changes during any biological process:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004077 …

□ BASIL and ANISE: Methods for the Detection and Assembly of Novel Sequence in High-Throughput Sequencing Data

>> http://bioinformatics.oxfordjournals.org/content/early/2015/02/01/bioinformatics.btv051.short …

approaches for detecting insertion breakpoints and targeted assembly of large insertions from non-mapping HTS paired data.

□ sgwrhdk:
At the advanced seminar of the platform for drug discovery and related fields (創薬等PF), unveiled the structural life science data cloud VaProS (VAriation effect on PROtein Structure and function) http://p4d-info.nig.ac.jp/vapros/

□ Ramsey theory for infinite words: extensions of the infinite Ramsey theorem/Hindman's finite sums theorem/Milliken-Taylor theorem

>> http://www.liafa.jussieu.fr/web9/manifsem/description_en.php?idcongres=1798

□ AIP_Publishing:
Global solar irradiation prediction using a multi-gene genetic programming approach http://ow.ly/IBKT1

We are all we need.

2015-02-05 03:33:11 | Science News

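It is not that "trials" are taking place. If the rise of higher intelligence in living phenomena is an attempt to reproduce "someone", then it follows as a necessary consequence.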

□ Wanderers - a vision of humanity's expansion into the Solar System by Erik Wernquist

>> http://vimeo.com/108650530

□ Fibonacci Zoetrope Sculptures

>> http://web.stanford.edu/~edmark/

These 3-D printed sculptures, called aniforms, are designed to animate when spun under a strobe light. The placement of the appendages is determined by the same method nature uses in pinecones and sunflowers. The rotation speed is synchronized to the strobe so that one flash occurs every time the sculpture turns 137.5°, the golden angle. If you count the number of spirals on any of these sculptures you will find that they are always Fibonacci numbers.
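The golden angle follows from the golden ratio, and successive elements placed one golden angle apart give the familiar sunflower layout. The sketch below assumes a flat disc with radius growing as the square root of the index (Vogel's model); the sculptures themselves are three-dimensional, so this only illustrates the placement rule.

```python
import math

PHI = (1 + math.sqrt(5)) / 2            # golden ratio
GOLDEN_ANGLE_DEG = 360 * (1 - 1 / PHI)  # roughly 137.5 degrees

def phyllotaxis(n, scale=1.0):
    """Place n points, each rotated one golden angle from the previous one."""
    points = []
    for k in range(n):
        angle = math.radians(k * GOLDEN_ANGLE_DEG)
        radius = scale * math.sqrt(k)   # assumed radial growth (Vogel's model)
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points

points = phyllotaxis(200)
print(round(GOLDEN_ANGLE_DEG, 3), len(points))
```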

□ Path Tracing 3D Fractals:

>> http://blog.hvidtfeldts.net/index.php/2015/01/path-tracing-3d-fractals/

□ blackmika:
Canvas+JS random L-system generator http://codepen.io/mikkamikka/full/sHrzL … still giving surprises #lsystem #generative

□ Visualizing Representations: Deep Learning and Human Beings:

>> http://colah.github.io/posts/2015-01-Visualizing-Representations/

□ Building & deploying large-scale machine learning pipelines: ml-matrix/scikit-learn/GraphLab

>> http://radar.oreilly.com/2015/01/building-and-deploying-large-scale-machine-learning-pipelines.html

□ Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints:

>> http://arxiv.org/pdf/1408.3595v3.pdf …

Integral Quadratic Constraints (IQC) can automatically generate verification certificates for machine learning pipelines.

(Schematic of the four main MSA benchmarking strategies)

□ Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment:

>> http://arxiv.org/abs/1211.2160

Structure-based Advantages: Independence: empirical data is used as input

Risks: Relevance: limited to structurally conserved regions;

□ A Probabilistic Palimpsest Model of Visual Short-term Memory:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004003 …

The Cramer-Rao lower bound transforms the Fisher information into an estimate of performance in the task.

posterior distribution p(θ | y)
saturating the Cramer-Rao bound

E [ t(y) | θ ] = θ

Var [ t(y) | θ ] = 1/FI(θ)
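A quick numerical check of what "saturating the bound" means, using an example that is not from the paper: for n independent Gaussian observations with known σ, the Fisher information about the mean θ is n/σ², and the sample mean is an unbiased estimator whose variance equals 1/FI(θ). The parameter values below are arbitrary.

```python
import random
import statistics

def simulate(theta=2.0, sigma=1.5, n=20, trials=20000, seed=0):
    """Return (mean of estimates, variance of estimates, Cramer-Rao bound)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(statistics.fmean(sample))  # t(y): the sample mean
    fisher_information = n / sigma ** 2             # FI(theta) for a Gaussian mean
    bound = 1.0 / fisher_information
    return statistics.fmean(estimates), statistics.variance(estimates), bound

mean_est, var_est, bound = simulate()
print(mean_est, var_est, bound)
```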

□ MICA: A fast short-read aligner that takes full advantage of Intel® Many Integrated Core Architecture (MIC):

>> http://arxiv.org/pdf/1402.4876v1.pdf …

Experiments on aligning 150bp paired-end reads show that MICA using one MIC board is 4.9 times faster than BWA-MEM (using 6 cores of a top-end CPU), and slightly faster than SOAP3-dp (using a GPU). MICA was run on Tianhe-2 with 90 WGS samples (17.47 terabases), which can be aligned in an hour using fewer than 400 nodes. The MIC controller feeds the MIC with a million reads at a time and spawns 224 threads on 56 cores to align them in parallel.

□ The Simulated Annealing Algorithm:

>> http://katrinaeg.com/simulated-annealing.html …

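from random import random

# cost(), neighbor() and acceptance_probability() are problem-specific
# helpers that the caller must define.
def anneal(sol):
    old_cost = cost(sol)
    T = 1.0          # starting temperature
    T_min = 0.00001  # stop once the temperature falls below this
    alpha = 0.9      # cooling factor applied after each batch of moves
    while T > T_min:
        i = 1
        while i <= 100:  # try 100 candidate moves per temperature
            new_sol = neighbor(sol)
            new_cost = cost(new_sol)
            ap = acceptance_probability(old_cost, new_cost, T)
            if ap > random():  # accept the move with probability ap
                sol = new_sol
                old_cost = new_cost
            i += 1
        T = T*alpha
    return sol, old_cost  # current solution together with its cost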
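The listing leaves cost, neighbor, and acceptance_probability undefined. A self-contained sketch with toy versions of those helpers might look like the following; the quadratic cost, the uniform step, and the exponential acceptance rule are assumptions chosen for illustration, and the helpers are passed in as arguments so the example runs on its own.

```python
import math
import random

def acceptance_probability(old_cost, new_cost, T):
    """Always accept improvements; accept worse moves with probability exp(-delta / T)."""
    if new_cost < old_cost:
        return 1.0
    return math.exp((old_cost - new_cost) / T)

def anneal(sol, cost, neighbor, rng, T=1.0, T_min=1e-5, alpha=0.9, steps=100):
    old_cost = cost(sol)
    while T > T_min:
        for _ in range(steps):
            new_sol = neighbor(sol, rng)
            new_cost = cost(new_sol)
            if acceptance_probability(old_cost, new_cost, T) > rng.random():
                sol, old_cost = new_sol, new_cost
        T *= alpha  # cool down
    return sol, old_cost

def toy_cost(x):
    return (x - 3.0) ** 2  # hypothetical objective with its minimum at x = 3

def toy_neighbor(x, rng):
    return x + rng.uniform(-0.5, 0.5)  # small random step

best, best_cost = anneal(10.0, toy_cost, toy_neighbor, random.Random(42))
print(best, best_cost)
```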

□ DR-Seq: New Method Allows for Genome, Transcriptome Sequencing from Single Cell: gDNA-mRNA sequencing

>> https://www.genomeweb.com/sequencing-technology/new-method-allows-genome-transcriptome-sequencing-single-cell …

DR-Seq is a quasilinear amplification strategy to quantify genomic DNA & mRNA from the same cell w/o physically separating the nucleic acids

(Alternative hypothesis of complex-trait aetiology: Hypothesis A is the theory that variation is hierarchical, such that variation in DNA leads to variation in RNA and so on in a linear manner.)

□ methods of integrating data to uncover genotype–phenotype interactions

>> http://bit.ly/1z5lVtm

the emerging approaches for data integration incl. meta-dimensional and multi-staged analyses, which aim to deepen understanding of the role of genetics and genomics in complex outcomes.

□ WiringTheBrain:
"The genotype to phenotype link is stochastic, i.e. a single genotype actually makes a range of phenotypes even in a single environment"

□ Motif mining based on network space compression: Random graph structure & sub-graph searching w/ Back Tracking Method

>> http://www.biodatamining.org/content/pdf/s13040-014-0029-x.pdf …

standardize the associated matrix of each sub-graph, as this can reduce the complexity of sub-graph isomorphism

(The principle and pipeline of circRNA identification in CIRI)

□ CIRI: an efficient and unbiased algorithm for de novo circular RNA identification:

>> http://genomebiology.com/content/pdf/s13059-014-0571-3.pdf …

CIRI requires two types of files: a FASTA-formatted reference sequence and a SAM alignment generated by the BWA-MEM algorithm. When a short segment (<19 bp using the default parameters of BWA-MEM) is ignored by the aligner to prevent multiple or erroneous mapping, such junction reads lack one of the necessary clipping signals in the SAM alignment.

□ Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes:

>> http://f1000research.com/articles/4-17/v1 …

The MinION nanopore sequencer is capable of producing very long reads to resolve both variants and haplotypes of HLA-A, HLA-B and CYP2D6 genes important in determining patient drug response in sample NA12878 of CEPH/UTAH pedigree 1463, without the need for statistical phasing.

Long read data from a single 24-hour nanopore sequencing run was used to reconstruct haplotypes, which were confirmed by HapMap data and statistically phased Complete Genomics and Sequenom genotypes.

□ ChemAxon's Biomolecule Toolkit: bridge the gap between biology and chemistry for complex biomolecular entities

>> http://www.chemaxon.com/products/biomolecule-toolkit/ …

The Biomolecule Toolkit provides JChem-like functionality within a Web Service API framework, with SOAP and RESTful APIs, for complex biomolecules, which are notoriously difficult to handle using classical chemoinformatic and bioinformatic tools.

□ dgmacarthur: 10/01/2015
Clinical sequencing company @Invitae filing for IPO: http://wp.me/p5hvhT-6Sq1 Seems like a bizarre move - why not more VC?

Invitae expects to raise $86.3 million from the sale of stock, according to the filing. Last October the company took a sizable $120 million funding round from The Broe Group, Decheng Capital, and others. In total, it has raised $207 million. For the first nine months of 2014, Invitae reported a loss of $32.2 million. For that same period, the company reported revenues of $700,000. As of September 30, 2014, the company has an accumulated loss of $69.9 million.

A puzzling fundraising move by Invitae. It goes without saying that the original investors would push back against dilution from newly issued shares, so the inside story can probably be worked out.

□ Illumina Launches Four New Systems; Provides Financial, Dx Update at JP Morgan:

>> https://www.genomeweb.com/business-news/illumina-launches-four-new-systems-provides-financial-dx-update-jp-morgan …

Illumina is enabling more "capital-constrained" customers to adopt high-throughput whole-genome sequencing with a lower capital investment. The X Five provides more than 9,000 genome sequences a year and costs $6Mn per system. Customers will be able to produce a genome for around $1,400.

□ 10X Genomics Closes $55.5 Million Series B Round: New genomics platform company change the definition of sequencing

>> http://10xgenomics.com

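10X Genomics, which debuted at the JP Morgan conference, raised $55.5Mn on expectations of an innovative short-read sequencing platform that could break Illumina's dominance. It apparently plans to reveal the full picture at AGBT in February. It is designed to be compatible with many sequencing systems, Illumina's included, and to slot easily into existing workflows.

With a combination of proprietary microfluidic hardware, chemical reagents, and software, 10X Genomics aims to index the genome before it gets run through a conventional sequencing machine.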

The 10X Genomics platform has numerous powerful characteristics, including generation of long range information (10s to 100s of kilo bases) and creation of high-quality sequencing libraries from 1 ng of DNA.

The 10X Genomics platform is a molecular barcoding and analysis suite that delivers structural variants, haplotypes, and other valuable long range contextual information for a broad range of applications including targeted, exome & whole genome sequencing.

visualizes the multi-megabase phase blocks and structural rearrangements revealed by the 10X Genomics platform. Researchers can open our output files in a haplotype-aware genome browser to investigate phased SNPs and indels as well as large-scale insertions, deletions, duplications, translocations, and inversions.

□ Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/01/09/bioinformatics.btu745.full …

Algorithm for taxonomic labeling of query segments (realignment placement algorithm/RPA).

□ A method for calculating probabilities of fitness consequences for point mutations across the human genome:

>> http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3196.html …

partitioned the data into small numbers of classes along each of these three axes by a simple scheme, and considered all possible combinations (the Cartesian product) of these DNase-seq, RNA-seq, and ChIP-seq class assignments. HMM-based or sliding-window methods can only be effectively applied on the scale of large genomic regions rather than individual elements.
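Taking "all possible combinations (the Cartesian product)" of per-axis classes is a one-liner with itertools. The class labels below are hypothetical placeholders, not the categories used in the paper.

```python
from itertools import product

# Hypothetical class labels along each of the three axes.
dnase_classes = ["no signal", "broad peak", "narrow peak"]
rna_classes = ["none", "low", "medium", "high"]
chip_classes = ["no binding", "binding"]

# Each combined class is one (DNase-seq, RNA-seq, ChIP-seq) triple.
combined = list(product(dnase_classes, rna_classes, chip_classes))
print(len(combined), combined[0])
```

The number of combined classes is the product of the per-axis counts, which is why each axis is kept to a small number of classes.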

a parallel method FitConsD and Evolutionary Turnover

a maximum-likelihood neutral scaling factor s^neut_w for T

ρ_div(C_i) = 1 - s_i / s_i^neut

□ VarElect - NGS Phenotyper: more than 100 genomic and biomedical data sources integrated in LifeMap Knowledgebase

>> http://varelect.genecards.org

GeneCards & MalaCards rely on more than 100 sources, everything from the big variant databases like ClinVar & OMIM to proprietary data sources

□ Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/01/10/bioinformatics.btv017.full.pdf …

The computation times of RPMCMC and DREME were about a ten-thousandth those of Hegma. To identify the cooperative cofactors of the primary TF, each predicted motif is matched to JASPAR CORE motifs by using the TOMTOM program.

□ tonets:
Our APBC paper is out. It is MEGADOCK on Tesla K20x vs Phi 5110P.

BMC Syst Biol | Docking on Accelerators: Comparison of GPU and MIC http://www.biomedcentral.com/1752-0509/9/S1/S6 …

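In short, FFT-based docking was about 2 to 5 times faster on the Tesla K20x than on the Xeon Phi 5110P. We implemented both native and offload modes on the Phi, but in native execution the number of parallel threads cannot be increased because memory runs out.

In fact, two papers that similarly compared bioinformatics applications on Xeon Phi and K20x came out in 2014; for GWAS (determining SNP-SNP interactions) it is this one: http://www.biomedcentral.com/1471-2105/15/216 … . The results were much the same.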

In the Katchalski-Katzir algorithm, the pseudo-interaction energy score (docking score S) between a receptor protein and a ligand protein is calculated as the convolution of two discrete functions using an N^3-point forward FFT and inverse FFT (IFFT):

S(t) = ∑_{v∈V} R(v) L(v+t)

= IFFT[ FFT[R(v)]* FFT[L(v)] ],

where R and L are the discrete score functions of the receptor and ligand, v is a coordinate in the 3D grid space V, and * denotes complex conjugation.
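The identity between the direct sum and the Fourier form can be checked numerically. The sketch below is a deliberately small stand-in: it uses a 1-D periodic grid and a naive O(N²) DFT from the standard library instead of a 3-D FFT, and the score values are arbitrary. Real docking grids are 3-D and padded so that the periodic wrap-around does not matter.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for FFT/IFFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out

def score_direct(R, L):
    """S(t) = sum over v of R(v) * L(v + t), with periodic indexing."""
    n = len(R)
    return [sum(R[v] * L[(v + t) % n] for v in range(n)) for t in range(n)]

def score_fourier(R, L):
    """S = IFFT[ conj(FFT[R]) * FFT[L] ]."""
    FR, FL = dft(R), dft(L)
    spectrum = [a.conjugate() * b for a, b in zip(FR, FL)]
    return [z.real for z in dft(spectrum, inverse=True)]

R = [1.0, 2.0, 0.0, -1.0]   # arbitrary receptor scores on a 4-point grid
L = [0.0, 1.0, 3.0, 2.0]    # arbitrary ligand scores
direct = score_direct(R, L)
fourier = score_fourier(R, L)
print(direct, [round(s, 6) for s in fourier])
```

The FFT route pays off because it scores every translation t at once instead of looping over them.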

□ 23andMe:
Machine Intelligence Cracks Genetic Controls | via @WIRED - http://wrd.cm/1Ax9LZf

□ Extracting reaction networks from databases–opening Pandora’s box

>> http://bib.oxfordjournals.org/content/15/6/973.full

analysis based on the exported exchange files (BioPAX and SBML formatted data), which are the most common and in the case of Panther and PID, the only method for accessing the databases’ content.

large scale goes huge.

□ sinhrks:
Posted to Hatena Blog
Deep Learning with Theano <6>: Restricted Boltzmann Machines <Part 1> - StatsFragments
http://sinhrks.hatenablog.com/entry/2015/01/12/225149 …

□ single threaded main-memory implementation of graph algorithms often faster than cluster-computing framework:

>> http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html …

□ Illumina Launches HiSeq X Five System and HiSeq 3000/4000 Sequencing Systems:

>> http://www.rna-seqblog.com/illumina-launches-hiseq-x-five-system-and-hiseq-30004000-sequencing-systems/ …

□ mwilsonsayres:
Geneticists should not use “races” to describe sub-populations within a species. They are either sub-populations or sub-species. Thoughts?

□ KristaTernus:
Titus: Recommends a new assembler called MEGAHIT http://hgpu.org/?p=12860 #PAGXXIII

□ infoecho:
@LAHug_
mentioned graph-based reconstruction of genome architecture #PAGXXIII
(Bioinformatists, time to learn more about graph th.)

□ AlanArchibald51:
Graph based assemblies may address need for multiple reference genomes but need new search tools for such assemblies #PAGXXIII

□ Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns:

>> http://www.jbiomedsem.com/content/6/1/4

In Ontorat, the ontology axioms are formatted using the Manchester OWL Syntax, a logical syntax designed for writing OWL class expressions. OntoFox and Ontorat have been combined in use for development of new ontologies, such as the Cell Line Ontology (CLO), Vaccine Ontology, Ontology of Biological and Clinical Statistics (OBCS) & Beta Cell Genomics Ontology

□ Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers

>> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3314989/ …

□ n0rr:
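RT (private account): Near criticality, the dimensionality drops. So even if there are countless parameters, I think the number of effectively dominant parameters near criticality is drastically reduced.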

□ copypasteusa:
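Developed a homomorphic encryption scheme that allows both security-level updates and computation while the data stays encrypted http://prw.kyodonews.jp/opn/release/201501166936/ … Going forward, performing the computations used in fields such as insurance and bioinformatics on encrypted data will make it possible to build large-scale privacy-preserving data-mining systems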

□ ym_duality:
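When I analysed the kernel method from Netaro's machine learning book in category-theoretic terms, it seems there is a categorical duality between kernel functions and reproducing kernel Hilbert spaces. The so-called kernel trick is merely one application of duality, and behind today's much-hyped big data analysis there is, in the end, the universal principle of duality, which does not depend on trends. The present as a contingent fragment of eternity.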

□ kensukeShimoda:
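As for animals that mimic, I wonder whether an insect-level brain can really be aware of another creature's vision in the first place, and even if it could, whether it could evolve in the direction it is aware of.

razoralign: Work on elucidating the process of mimicry is also progressing. A chemical effect of visual perception on heredity is conceivable, but it may also be built surprisingly simply. Not only in mimicry, the interaction between environmental factors and gene expression is necessary and deterministic; the image is that, by surviving, the pattern of a mass game gets completed unwittingly.

In that sense, one could also rephrase it this way: the environment, including the prey, knows the shape of the hornet, and the result comes from the environment, including the hornet, exerting evolutionary selection pressure on the prey.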

□ Gradual and contingent evolutionary emergence of leaf mimicry in butterfly wing patterns

>> http://www.biomedcentral.com/1471-2148/14/229 …

the phylogenetic evolution of leaf mimicry patterns, for which a key principle is the ‘body plan’ or ‘ground plan’, referring to the structural composition of organisms by homologous elements shared across species.

Findings on butterfly mimicry. Time-series Bayesian statistics of spatial arrangement patterns.

□ KuboBook:
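I'm thinking of giving a talk along the lines of "you shouldn't fit a straight-line regression to time-series data" at the data analysis session of the Ecological Society meeting in March... Well, the simplest "null-hypothesis-like" example of time-series data is a random walk built by summing normal random numbers, and the statistical model that generates it is a different thing from the linear regression model.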
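A small simulation makes the point. Random walks with no trend at all are generated by summing normal random numbers, a straight line is fitted to each by ordinary least squares, and the fraction of fits whose slope looks "significant" at the nominal 5% level is counted. The sample sizes, the seed, and the critical value 1.984 (approximately the two-sided 5% point of a t distribution with 98 degrees of freedom) are choices made for this sketch.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of standard normal steps; there is no deterministic trend."""
    y, walk = 0.0, []
    for _ in range(n):
        y += rng.gauss(0.0, 1.0)
        walk.append(y)
    return walk

def slope_t_statistic(y):
    """t statistic of the OLS slope when regressing y on time 0..n-1."""
    n = len(y)
    x_bar = (n - 1) / 2
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * x) ** 2 for x, yi in enumerate(y))
    std_err = (rss / (n - 2) / sxx) ** 0.5
    return slope / std_err

def false_positive_rate(trials=500, n=100, seed=1):
    rng = random.Random(seed)
    hits = sum(abs(slope_t_statistic(random_walk(n, rng))) > 1.984 for _ in range(trials))
    return hits / trials

rate = false_positive_rate()
print(rate)
```

If linear regression were an appropriate model here, the printed rate would be close to 0.05; for random walks it comes out far higher.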

Lang ist Die Zeit, es ereignet sich aber Das Wahre.