2015-02-05 03:33:11 | Science News

□ Wanderers - a vision of humanity's expansion into the Solar System by Erik Wernquist

>> http://vimeo.com/108650530

□ Fibonacci Zoetrope Sculptures

>> http://web.stanford.edu/~edmark/

These 3-D printed sculptures, called aniforms, are designed to animate when spun under a strobe light. The placement of the appendages is determined by the same method nature uses in pinecones and sunflowers. The rotation speed is synchronized to the strobe so that one flash occurs every time the sculpture turns 137.5°, the golden angle. If you count the number of spirals on any of these sculptures you will find that they are always Fibonacci numbers.
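The golden-angle placement described above can be sketched in a few lines of Python. This is only an illustration of golden-angle phyllotaxis on a flat disc, not the sculptor's actual construction: the function name `phyllotaxis_points` and the square-root radius law (Vogel's sunflower model) are assumptions made for the sketch.

```python
import math

# Golden angle in degrees: 360 * (1 - 1/phi), roughly 137.5.
PHI = (1 + math.sqrt(5)) / 2
GOLDEN_ANGLE_DEG = 360.0 * (1 - 1 / PHI)


def phyllotaxis_points(n, scale=1.0):
    """Return n (x, y) points; point k is rotated k golden angles from point 0.

    The radius law r = scale * sqrt(k) is an assumption (Vogel's sunflower
    model); it only serves to spread the points evenly over a disc.
    """
    points = []
    for k in range(n):
        theta = math.radians(k * GOLDEN_ANGLE_DEG)
        r = scale * math.sqrt(k)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


if __name__ == "__main__":
    print(round(GOLDEN_ANGLE_DEG, 1))
    for x, y in phyllotaxis_points(5):
        print(f"{x:+.3f} {y:+.3f}")
```

Plotting a few hundred of these points is enough to see the interleaved spiral families whose counts are Fibonacci numbers.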

□ Path Tracing 3D Fractals:

>> http://blog.hvidtfeldts.net/index.php/2015/01/path-tracing-3d-fractals/

Canvas+JS random L-system generator http://codepen.io/mikkamikka/full/sHrzL … still giving surprises #lsystem #generative

□ Visualizing Representations: Deep Learning and Human Beings:

>> http://colah.github.io/posts/2015-01-Visualizing-Representations/

□ Building & deploying large-scale machine learning pipelines: ml-matrix/scikit-learn/GraphLab

>> http://radar.oreilly.com/2015/01/building-and-deploying-large-scale-machine-learning-pipelines.html

□ Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints:

>> http://arxiv.org/pdf/1408.3595v3.pdf

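Integral Quadratic Constraints (IQCs), borrowed from robust control theory, can automatically generate convergence certificates for optimization algorithms: numerical bounds on convergence rates are obtained by solving small semidefinite programs.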

(Schematic of the four main MSA benchmarking strategies)

□ Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment:

>> http://arxiv.org/abs/1211.2160

Structure-based benchmarks:

Advantages: Independence (empirical data is used as input).

Risks: Relevance (limited to structurally conserved regions).

□ A Probabilistic Palimpsest Model of Visual Short-term Memory:

>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004003

The Cramer-Rao lower bound transforms the Fisher information into an estimate of performance in the task.

posterior distribution p(θ | y)
saturating the Cramer-Rao bound

E [ t(y) | θ ] = θ

Var [ t(y) | θ ] = 1/FI(θ)
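A small numerical check of the two conditions above, using a toy model that is not the one in the paper: n i.i.d. Gaussian observations with known sigma, for which FI(θ) = n/σ² and the sample mean is an unbiased estimator that saturates the bound. The function names and parameter values are assumptions for this sketch.

```python
import random
import statistics


def fisher_information_gaussian_mean(n, sigma):
    """Fisher information about the mean theta in n i.i.d. N(theta, sigma^2) draws."""
    return n / sigma ** 2


def simulate_sample_mean(theta, sigma, n, trials, seed=0):
    """Return (mean, variance) of the estimator t(y) = mean(y) over many trials."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.fmean(estimates), statistics.pvariance(estimates)


if __name__ == "__main__":
    theta, sigma, n = 2.0, 1.5, 10
    mean_t, var_t = simulate_sample_mean(theta, sigma, n, trials=20000)
    bound = 1.0 / fisher_information_gaussian_mean(n, sigma)
    print("E[t(y)]   ~", mean_t, "(theta =", theta, ")")
    print("Var[t(y)] ~", var_t, "(1/FI =", bound, ")")
```

The empirical mean of the estimates should sit close to θ and their variance close to 1/FI(θ), which is what saturating the bound means for an unbiased estimator.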

□ MICA: A fast short-read aligner that takes full advantage of Intel® Many Integrated Core Architecture (MIC):

>> http://arxiv.org/pdf/1402.4876v1.pdf

Experiments on aligning 150 bp paired-end reads show that MICA using one MIC board is 4.9 times faster than BWA-MEM (using 6 cores of a top-end CPU) and slightly faster than SOAP3-dp (using a GPU). On Tianhe-2, 90 WGS samples (17.47 tera-bases) can be aligned in an hour using fewer than 400 nodes. The MIC controller feeds the MIC with a million reads at a time and spawns 224 threads on 56 cores to align them in parallel.

□ The Simulated Annealing Algorithm:

>> http://katrinaeg.com/simulated-annealing.html

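from random import random

# cost(), neighbor() and acceptance_probability() are problem-specific and must be supplied.
def anneal(sol):
    old_cost = cost(sol)
    T = 1.0            # starting temperature
    T_min = 0.00001    # stop once the temperature falls below this
    alpha = 0.9        # cooling factor applied after each batch of moves
    while T > T_min:
        i = 1
        while i <= 100:    # 100 candidate moves per temperature
            new_sol = neighbor(sol)
            new_cost = cost(new_sol)
            ap = acceptance_probability(old_cost, new_cost, T)
            if ap > random():
                sol = new_sol
                old_cost = new_cost
            i += 1
        T = T*alpha
    # return the cost value of the final solution, not the cost function itself
    return sol, old_cost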
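To try the annealing loop end to end, here is a self-contained sketch on a toy problem. Everything problem-specific is an assumption made for illustration: the quadratic `cost`, the uniform-step `neighbor`, and the Metropolis-style `acceptance_probability`. It also tracks the best solution seen, which the loop above does not do.

```python
import math
import random


def cost(x):
    """Toy objective (an assumption for this sketch): minimum at x = 3."""
    return (x - 3.0) ** 2


def neighbor(x, rng):
    """Propose a nearby candidate by taking a small uniform step."""
    return x + rng.uniform(-0.5, 0.5)


def acceptance_probability(old_cost, new_cost, T):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if new_cost <= old_cost:
        return 1.0
    return math.exp((old_cost - new_cost) / T)


def anneal(sol, seed=0):
    rng = random.Random(seed)          # seeded so runs are repeatable
    old_cost = cost(sol)
    best, best_cost = sol, old_cost    # best solution seen so far
    T, T_min, alpha = 1.0, 0.00001, 0.9
    while T > T_min:
        for _ in range(100):           # 100 candidate moves per temperature
            new_sol = neighbor(sol, rng)
            new_cost = cost(new_sol)
            if acceptance_probability(old_cost, new_cost, T) > rng.random():
                sol, old_cost = new_sol, new_cost
                if old_cost < best_cost:
                    best, best_cost = sol, old_cost
        T *= alpha                     # cool down
    return best, best_cost


if __name__ == "__main__":
    print(anneal(-10.0))
```

At high temperature the walk accepts many uphill moves and explores; as T shrinks, the acceptance probability for worse moves drops toward zero and the search settles near the minimum.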

□ DR-Seq: New Method Allows for Genome, Transcriptome Sequencing from Single Cell: gDNA-mRNA sequencing

>> https://www.genomeweb.com/sequencing-technology/new-method-allows-genome-transcriptome-sequencing-single-cell

DR-Seq is a quasilinear amplification strategy to quantify genomic DNA & mRNA from the same cell w/o physically separating the nucleic acids

(Alternative hypothesis of complex-trait aetiology: Hypothesis A is the theory that variation is hierarchical, such that variation in DNA leads to variation in RNA and so on in a linear manner.)

□ methods of integrating data to uncover genotype–phenotype interactions

>> http://bit.ly/1z5lVtm

the emerging approaches for data integration incl. meta-dimensional and multi-staged analyses, which aim to deepen understanding of the role of genetics and genomics in complex outcomes.

"The genotype to phenotype link is stochastic, i.e. a single genotype actually makes a range of phenotypes even in a single environment"

□ Motif mining based on network space compression: Random graph structure & sub-graph searching w/ Back Tracking Method

>> http://www.biodatamining.org/content/pdf/s13040-014-0029-x.pdf

The associated matrix of each sub-graph is standardized, because this reduces the complexity of the sub-graph isomorphism check.

(The principle and pipeline of circRNA identification in CIRI)

□ CIRI: an efficient and unbiased algorithm for de novo circular RNA identification:

>> http://genomebiology.com/content/pdf/s13059-014-0571-3.pdf

CIRI requires two types of files: a FASTA-formatted reference sequence and a SAM alignment generated by the BWA-MEM algorithm. When a short segment (<19 bp with the default BWA-MEM parameters) is ignored by the aligner to prevent multiple or erroneous mapping, such junction reads lack one of the necessary clipping signals in the SAM alignment.

□ Long read nanopore sequencing for detection of HLA and CYP2D6 variants and haplotypes:

>> http://f1000research.com/articles/4-17/v1

The MinION nanopore sequencer is capable of producing very long reads to resolve both variants and haplotypes of the HLA-A, HLA-B and CYP2D6 genes, which are important in determining patient drug response, in sample NA12878 of CEPH/UTAH pedigree 1463, without the need for statistical phasing.

Long read data from a single 24-hour nanopore sequencing run was used to reconstruct haplotypes, which were confirmed by HapMap data and statistically phased Complete Genomics and Sequenom genotypes.

□ ChemAxon's Biomolecule Toolkit: bridge the gap between biology and chemistry for complex biomolecular entities

>> http://www.chemaxon.com/products/biomolecule-toolkit/

The Biomolecule Toolkit provides JChem-like functionality within a web service API framework (SOAP and RESTful APIs) for complex biomolecules, which are notoriously difficult to handle using classical chemoinformatic and bioinformatic tools.

dgmacarthur: 10/01/2015
Clinical sequencing company @Invitae filing for IPO: http://wp.me/p5hvhT-6Sq1 Seems like a bizarre move - why not more VC?

Invitae expects to raise $86.3 million from the sale of stock, according to the filing. Last October the company took a sizable $120 million funding round from The Broe Group, Decheng Capital and others. In total, it has raised $207 million. For the first nine months of 2014, Invitae reported a loss of $32.2 million. For that same period, the company reported revenues of $700,000. As of September 30, 2014, the company had an accumulated loss of $69.9 million.

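A puzzling fundraising move by Invitae. It goes without saying that the original investors would want to keep dilution from newly issued shares in check, so the inside story can probably be guessed, though.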

□ Illumina Launches Four New Systems; Provides Financial, Dx Update at JP Morgan:

>> https://www.genomeweb.com/business-news/illumina-launches-four-new-systems-provides-financial-dx-update-jp-morgan

Illumina is enabling more "capital-constrained" customers to adopt high-throughput whole-genome sequencing with a lower capital investment. The X Five delivers more than 9,000 genome sequences a year and costs $6 million per system. Customers will be able to produce a genome for around $1,400.

□ 10X Genomics Closes $55.5 Million Series B Round: New genomics platform company to change the definition of sequencing

>> http://10xgenomics.com

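10X Genomics, which debuted at the JP Morgan conference, raised $55.5 million on expectations of an innovative short-read sequencing platform that could break Illumina's dominance. It reportedly plans to reveal the full picture at AGBT in February. The platform is designed to be compatible with many sequencing systems, Illumina's included, and to slot easily into existing workflows.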

With a combination of proprietary microfluidic hardware, chemical reagents, and software, 10X Genomics aims to index the genome before it is run through a conventional sequencing machine.

The 10X Genomics platform has numerous powerful characteristics, including generation of long range information (10s to 100s of kilo bases) and creation of high-quality sequencing libraries from 1 ng of DNA.

The 10X Genomics platform is a molecular barcoding and analysis suite that delivers structural variants, haplotypes, and other valuable long range contextual information for a broad range of applications including targeted, exome & whole genome sequencing.

visualizes the multi-megabase phase blocks and structural rearrangements revealed by the 10X Genomics platform. Researchers can open our output files in a haplotype-aware genome browser to investigate phased SNPs and indels as well as large-scale insertions, deletions, duplications, translocations, and inversions.

□ Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/01/09/bioinformatics.btu745.full

Algorithm for taxonomic labeling of query segments (realignment placement algorithm/RPA).

□ A method for calculating probabilities of fitness consequences for point mutations across the human genome:

>> http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3196.html

partitioned the data into small numbers of classes along each of these three axes by a simple scheme, and considered all possible combinations (the Cartesian product) of these DNase-seq, RNA-seq, and ChIP-seq class assignments. HMM-based or sliding-window methods can only be effectively applied on the scale of large genomic regions rather than individual elements.

a parallel method FitConsD and Evolutionary Turnover

a maximum-likelihood neutral scaling factor s_w^neut for T

ρ_div(C_i) = 1 - s_i / s_i^neut

□ VarElect - NGS Phenotyper: more than 100 genomic and biomedical data sources integrated in LifeMap Knowledgebase

>> http://varelect.genecards.org

GeneCards & MalaCards rely on more than 100 sources, everything from the big variant databases like ClinVar & OMIM to proprietary data sources.

□ Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets:

>> http://bioinformatics.oxfordjournals.org/content/early/2015/01/10/bioinformatics.btv017.full.pdf

The computation times of RPMCMC and DREME were about a ten-thousandth of those of Hegma. To identify the cooperative cofactors of the primary TF, each predicted motif is matched to JASPAR CORE motifs by using the TOMTOM program.

The APBC paper is out. It is MEGADOCK on Tesla K20x vs Phi 5110P.

BMC Syst Biol | Docking on Accelerators: Comparison of GPU and MIC http://www.biomedcentral.com/1752-0509/9/S1/S6

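The story is that for FFT-based docking, comparing the Tesla K20x with the Xeon Phi 5110P, the Tesla was about 2 to 5 times faster. The Phi version was implemented in both native and offload modes, but in native execution the number of parallel threads cannot be increased because memory runs out.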

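In fact, two papers that compare bioinformatics applications on Xeon Phi and K20x in the same way came out in 2014; for GWAS (determining SNP-SNP interactions) it is this one: http://www.biomedcentral.com/1471-2105/15/216 … . The results were much the same.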

In the Katchalski-Katzir algorithm, the pseudo-interaction energy score (docking score S) between a receptor protein and a ligand protein is calculated as the convolution of two discrete functions using an N³-point forward FFT and inverse FFT (IFFT):

S(t) = ∑_{v∈V} R(v) L(v+t)
     = IFFT[ FFT[R(v)]* FFT[L(v)] ],

where R and L are the discrete score functions of the receptor and the ligand, v is a coordinate in the 3D grid space V, t is the translation of the ligand, and * denotes the complex conjugate.
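The identity behind the FFT-based score can be checked numerically. The sketch below is a simplified one-dimensional analogue with real-valued scores and wrap-around (circular) translations, reading the * in the formula as complex conjugation. It is not MEGADOCK code; the naive `dft` helper stands in for a real FFT, and all names are assumptions for illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out


def score_direct(R, L):
    """S(t) = sum_v R(v) * L(v + t), with v + t wrapping around the grid."""
    n = len(R)
    return [sum(R[v] * L[(v + t) % n] for v in range(n)) for t in range(n)]


def score_fft(R, L):
    """S = IFFT[conj(FFT[R]) * FFT[L]], valid here because R is real-valued."""
    FR, FL = dft(R), dft(L)
    product = [a.conjugate() * b for a, b in zip(FR, FL)]
    return [z.real for z in dft(product, inverse=True)]


if __name__ == "__main__":
    R = [1.0, 2.0, 0.0, -1.0]
    L = [0.5, 0.0, 3.0, 1.0]
    print(score_direct(R, L))
    print(score_fft(R, L))
```

The direct sum costs O(N²) per axis over all translations, while the transform route costs O(N log N) with a true FFT, which is why the score is evaluated in Fourier space on the full 3D grid.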

Machine Intelligence Cracks Genetic Controls | via @WIRED - http://wrd.cm/1Ax9LZf

□ Extracting reaction networks from databases–opening Pandora’s box

>> http://bib.oxfordjournals.org/content/15/6/973.full

analysis based on the exported exchange files (BioPAX and SBML formatted data), which are the most common and in the case of Panther and PID, the only method for accessing the databases’ content.

large scale goes huge.

Deep Learning with Theano <6>: Restricted Boltzmann Machines <Part 1> - StatsFragments

□ single threaded main-memory implementation of graph algorithms often faster than cluster-computing framework:

>> http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html

□ Illumina Launches HiSeq X Five System and HiSeq 3000/4000 Sequencing Systems:

>> http://www.rna-seqblog.com/illumina-launches-hiseq-x-five-system-and-hiseq-30004000-sequencing-systems/

Geneticists should not use “races” to describe sub-populations within a species. They are either sub-populations or sub-species. Thoughts?

Titus: Recommends a new assembler called MEGAHIT http://hgpu.org/?p=12860 #PAGXXIII

mentioned graph-based reconstruction of genome architecture #PAGXXIII
(Bioinformatists, time to learn more about graph th.)

Graph based assemblies may address need for multiple reference genomes but need new search tools for such assemblies #PAGXXIII

□ Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns:

>> http://www.jbiomedsem.com/content/6/1/4

In Ontorat, the ontology axioms are formatted using the Manchester OWL Syntax, a logical syntax designed for writing OWL class expressions. OntoFox and Ontorat have been combined in use for development of new ontologies, such as the Cell Line Ontology (CLO), Vaccine Ontology, Ontology of Biological and Clinical Statistics (OBCS) & Beta Cell Genomics Ontology

□ Detecting early-warning signals for sudden deterioration of complex diseases by dynamical network biomarkers

>> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3314989/

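RT (locked account): Near a critical point, the dimensionality drops. So even if there are countless parameters, I think the number of parameters that effectively dominate near criticality is drastically reduced.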

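A homomorphic encryption scheme has been developed that allows both security-level updates and computation while the data stays encrypted http://prw.kyodonews.jp/opn/release/201501166936/ … Going forward, performing the computations used in fields such as insurance and bioinformatics on encrypted data makes it possible to build large-scale privacy-preserving data mining systems.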



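razoralign: Work on elucidating the process of mimicry is also moving forward. A relationship in which visual perception exerts a chemical effect on heredity is conceivable, but the whole thing may also be put together in a surprisingly simple way. Not only in mimicry, the interaction between environmental factors and gene expression is inevitable and deterministic; the image is of a mass-games pattern that completes itself unwittingly as individuals survive.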


□ Gradual and contingent evolutionary emergence of leaf mimicry in butterfly wing patterns

>> http://www.biomedcentral.com/1471-2148/14/229

the phylogenetic evolution of leaf mimicry patterns, for which a key principle is the ‘body plan’ or ‘ground plan’, referring to the structural composition of organisms by homologous elements shared across species.

