Article_Dec.2 - Open Data, Bioinformatics and Others...

2011-12-10 09:18:48 | Science
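I have long thought that data-management technologies such as Linked Data might be what is needed to resolve conflicts between nations and ethnic groups. Such a tool would map which organizations are the problem, who the allies are, and the latent risks, down to how they shift over time and how they can be assessed quantitatively, and it would also visualize the flow of funds and budgets. The aim is a tool built specifically to dispel mutual distrust.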


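Ultimately, we also have to consider a future in which private-sector initiatives and partnerships, reaching beyond the scope of Open Government, take over part, or even a sizable share, of the functions and decision-making of nations, governments, and industries. Even in Japan's vertically siloed society, the problems that top-down systems must handle have become so wide-ranging and fragmented that those systems are breaking down.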


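□ U.S.-India Strategic Dialogue to produce "Data.gov-in-a-Box": White House open source http://Data.gov open government data platform http://ow.ly/1g1stj
Open Government goes global through the U.S.-India Strategic Dialogue: the two countries will design an open data platform to make government more transparent, tackle problems through new multilateral initiatives, and give citizens the means to resist corruption.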
Data.gov Goes Global
In September, the United States was one of eight founding governments of the Open Government Partnership, a new multilateral initiative that secures concrete commitments from governments to promote transparency, empower citizens, fight corruption, and harness new technologies to strengthen governance. The President also unveiled the U.S. National Action Plan on Open Government, which detailed steps the United States will take to help meet the initiative’s goals.

Open Data Driven Scholarly Communication in 2020

Phil Bourne presenting his perspectives on data publication and open data. His slides are here: slidesha.re/veR9CW #idcc11


mashable: Has the "God particle", the Higgs boson, been found? CERN to hold an emergency press conference on the 13th
Scientists May Be Closing in on the Higgs Boson Particle - on.mash.to/vXc62y
Europe’s Large Hadron Collider (LHC), which has been consistently operating since 2009 after a botched grand opening the previous year, is theoretically capable of seeing hints of the Higgs boson, and now the rumor among the physics community is that it’s done just that. A two-part lecture scheduled for Dec. 13 at the European Organization for Nuclear Research (CERN) has the tantalizing title, “Update on the Standard Model Higgs searches.”



What if there is no Higgs boson? - physics-math - 09 December 2011 - New Scientist goo.gl/4xlvE

@drg1985 Even a non-rumored 3.5 sigma signal isn't enough for proof of anything. No point speculating this early. We're still a ways off.

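The Higgs boson suspense~☆ Wait, so a 3.5 sigma signal is still in the realm of "rumor"? I have a hunch the announcement will settle on something like: the particle's existence is likely to be confirmed, but whether it can actually be detected remains work for the future.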

(Three-dimensional Isomap embeddings of trans,trans-1,2,4-trifluorocyclooctane. Each ball represents one of 8375 conformations of the molecule.)

□ Metadynamics in the conformational space nonlinearly dimensionally reduced by Isomap

>> ow.ly/7U6fx

Here we present a simulation with a bias potential acting in the directions of collective motions determined by a nonlinear dimensionality reduction method. Ad hoc generated conformations of trans,trans-1,2,4-trifluorocyclooctane were analyzed by Isomap method to map these 72-dimensional coordinates to three dimensions, as described by Brown and co-workers. Metadynamics employing the three-dimensional embeddings as collective variables was applied to explore all relevant conformations of the studied system and to calculate its conformational free energy surface.
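Metadynamics builds a history-dependent bias by depositing Gaussians at the collective-variable values the system has already visited, gradually filling free-energy wells so the walker escapes them. A minimal sketch, assuming a single 1D collective variable and an overdamped Langevin walker on a toy double-well surface, not the paper's three Isomap coordinates or its molecular force field; `double_well`, the Gaussian height and width, and the deposition stride are all hypothetical choices for illustration.

```python
import math
import random

def double_well(s):
    """Toy free-energy surface (hypothetical) with minima at s = -1 and s = +1."""
    return 5.0 * (s * s - 1.0) ** 2

def double_well_grad(s):
    return 20.0 * s * (s * s - 1.0)

def bias(s, centers, height, width):
    """Metadynamics bias: sum of Gaussians deposited at previously visited CV values."""
    return sum(height * math.exp(-((s - c) ** 2) / (2 * width ** 2)) for c in centers)

def bias_grad(s, centers, height, width):
    return sum(-height * (s - c) / width ** 2 * math.exp(-((s - c) ** 2) / (2 * width ** 2))
               for c in centers)

def metadynamics(steps=10000, dt=1e-3, kT=0.5, height=0.1, width=0.2, stride=50, seed=0):
    rng = random.Random(seed)
    s = -1.0                      # start in the left well
    centers = []                  # CV values where Gaussians have been deposited
    for step in range(steps):
        force = -double_well_grad(s) - bias_grad(s, centers, height, width)
        # Overdamped Langevin step with unit mobility.
        s += force * dt + math.sqrt(2 * kT * dt) * rng.gauss(0.0, 1.0)
        if step % stride == 0:
            centers.append(s)     # deposit a new Gaussian at the current position
    return centers

centers = metadynamics()
# Over long runs, -bias approximates the free energy up to an additive constant.
for s in (-1.0, 0.0, 1.0):
    print(f"s={s:+.1f}  F={double_well(s):.2f}  -V_bias={-bias(s, centers, 0.1, 0.2):.2f}")
```

In the paper the same bias acts on the 3D Isomap embedding instead of a single coordinate, so each Gaussian would be three-dimensional.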

□ Generating 3-D models of RNA from limited experimental data.

The nucleic acid simulation tool (NAST) has been written for this purpose and can be downloaded for free from https://simtk.org/home/nast. Read more about the validation of this tool in the article "Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters" in the Feb. 2009 issue of RNA (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648710/?tool=pubmed).

Learn more about Simbios research on physics-based simulations of biological structures at http://simbios.stanford.edu.

(Kepler 22-b which Nasa say is the most Earth-like yet discovered. Photograph: AFP/Getty Images)

□ Exoplanet Kepler 22-b offers best hope yet for a new Earth

>> http://www.guardian.co.uk/science/2011/dec/05/exoplanet-kepler-22-b-nasa-earth

Planet where water could exist as a liquid confirmed: temperature 22°C, 2.4 times the size of Earth
Kepler 22-b, which is about 2.4 times the size of Earth and lies in the so-called "Goldilocks zone", has a relatively comfortable surface temperature of about 22C (72F) and orbits a star not unlike Earth's sun.

But while astronomers believe that it "probably" also possesses water and land, earthlings secretly harbouring hopes that such a planet could potentially host new colonies from our own increasingly overpopulated home may be in for a disappointment.

About 600 light-years from Earth, Kepler 22-b is a considerable trek away while experts are not yet sure if it is made mostly of rock, gas or liquid.
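To put "a considerable trek" into numbers, here is a back-of-the-envelope calculation. It assumes a probe cruising at a constant 17 km/s, roughly Voyager 1's speed; the article itself gives no speed, and `travel_time_years` is a hypothetical helper.

```python
LIGHT_YEAR_KM = 9.4607e12      # kilometres in one light-year
SECONDS_PER_YEAR = 3.156e7     # Julian year, rounded

def travel_time_years(distance_ly, speed_km_s):
    """Years needed to cover distance_ly light-years at a constant speed in km/s."""
    return distance_ly * LIGHT_YEAR_KM / speed_km_s / SECONDS_PER_YEAR

years = travel_time_years(600, 17.0)   # 17 km/s: roughly Voyager 1's speed (assumption)
print(f"{years:,.0f} years")
```

At that speed the trip takes on the order of ten million years; even at light speed it would take about 600 years.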




What seven billion people looks like j.mp/vUKKUm
Dencity maps population density using circles of various size and hue. Larger, darker circles show areas with fewer people, while smaller, brighter circles highlight crowded cities. Representing denser areas with smaller circles results in additional geographic detail where there are more people, while sparsely populated areas are more vaguely defined.
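The encoding described above inverts the usual convention: denser areas get smaller, brighter circles. A minimal sketch of such a mapping, assuming a logarithmic density scale and arbitrary radius and brightness ranges; `circle_style` and its parameters are hypothetical, not Dencity's actual implementation.

```python
import math

def circle_style(density, min_density=1.0, max_density=10000.0,
                 max_radius=10.0, min_radius=1.0):
    """Map people per km² to (radius, brightness): denser -> smaller and brighter."""
    d = min(max(density, min_density), max_density)   # clamp to the scale's range
    # Position on a log scale: 0.0 = sparsest, 1.0 = densest.
    t = (math.log10(d) - math.log10(min_density)) / (
        math.log10(max_density) - math.log10(min_density))
    radius = max_radius - t * (max_radius - min_radius)
    brightness = t    # 0.0 = dark, 1.0 = bright
    return radius, brightness

for density in (1, 100, 10000):
    print(density, circle_style(density))
```

Because dense areas get small circles, more of them fit on the map there, which is where the extra geographic detail comes from.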

(Linked Data Sets: Distribution by Links)

(LOD Data Set statistics as of 09/2011)

□ Linked Data: Evolving the Web into a Global Data Space [PDF] (SMWCon Fall 2011, Berlin) http://www4.wiwiss.fu-berlin.de/jentzsch/2011…
• Next step: Linked Data within Enterprises
 • alternative to data warehouses and EAI middleware
 • advantages: schema-less data model, pay-as-you-go data integration
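The "schema-less data model, pay-as-you-go integration" point can be illustrated with plain subject-predicate-object triples. Two sources are integrated by taking the union of their triples, and a query joins them on a shared identifier. A minimal sketch in plain Python, not an RDF library; the `ex:` identifiers, predicates and values are hypothetical.

```python
# Each source is just a set of (subject, predicate, object) triples; no shared schema needed.
hr = {
    ("ex:alice", "ex:worksIn", "ex:dept42"),
    ("ex:alice", "ex:name", "Alice"),
}
finance = {
    ("ex:dept42", "ex:budget", 120000),
    ("ex:dept42", "ex:label", "Research"),
}

# Pay-as-you-go integration: union the graphs now, add mappings later only if needed.
graph = hr | finance

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Join across sources via the shared identifier: what budget does Alice's department have?
for _, _, dept in match(graph, s="ex:alice", p="ex:worksIn"):
    for _, _, budget in match(graph, s=dept, p="ex:budget"):
        print(dept, budget)
```

In real Linked Data the identifiers would be dereferenceable HTTP URIs and the query would be written in SPARQL, but the join on shared identifiers works the same way.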

There was a remarkable number of talks on connecting SMW with the rest of the Semantic Web, through RDF, SPARQL etc. Cool, SMW is seemingly becoming a natural choice of platform for SemWeb publishing!

The proportion of bio-people was also a bit remarkable. Apart from SNPedia founder Mike Cariaso, there were a whole bunch of others, including Salvadore from the GeneWiki project, Toni .... (and me) .... I guess it reflects how well SMW handles the need to give structure to very heterogeneous datasets, so typical for the Life Sciences.

□ Data on the federal Supplemental Nutrition Assistance Program newly published on Data.gov MT @nutsci: never looked at data.gov lots of nutrition related data available: http://t.co/j2cyANKb

Maven Semantic: Synthetic Biology Database bit.ly/sXvLRE
The new database is now available to marketing, business development, competitor intelligence, KOL, medical affairs and related departments in the life sciences sector.
The database currently tags 45,000 individuals working in Synthetic Biology. http://bit.ly/tn1jLa.

>> http://www.mavensemantic.com/

Top 10 Countries for Synthetic Biology Research (ranked by number of senior researchers)
United States Of America (23,207)
Japan (2,800)
United Kingdom (2,239)
Germany (1,765)
Italy (1,543)
Canada (1,305)
France (1,189)
China (900)
Australia (711)
Spain (588)

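Synthetic biology: the number of senior researchers can't simply be equated with the scale of research, and the leading country, the U.S., is in a league of its own. Still, the fact that Japan, with its closed environment, ranks second in the world suggests, if anything, that the market hasn't yet figured out what to do with this talent.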



An overview of linked data & the semantic web from David Reynolds from Epimorphics (making platforms for this epimorphics.com) #idcc11
Epimorphics Ltd. - Linked Data Solutions.
Linked data is a powerful approach to sharing information across the web. It is enabling governments to publish their data to better inform and empower citizens, enabling companies to connect information across silos to aid decision making, and enabling developers to combine data from across organizations to deliver better services.

As experts in all aspects of linked data, Epimorphics can enable you to get the most out of your data assets: help you model, publish and exploit them.

NatureJapan: Predictability in genetics nature.asia/tVXjbI
Predicting mutation outcome from early stochastic variation in genetic interaction partners
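The limits of genetic determinism have long been evident, both clinically (the outcomes of genetic lesions in identical twins are not the same) and experimentally (mutations produce varying effects among inbred animals even in a uniform environment). Even so, B. Lehner and colleagues reasoned that an individual's phenotype might still be predictable, and devised a way to predict the outcome of a mutation directly in developing animals.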

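Using a noninvasive, fluorescence-based method, they monitored fluctuations in gene expression during embryogenesis of the nematode Caenorhabditis elegans, then retrospectively compared each embryo's molecular noise with the phenotype of the adult it grew into. This revealed regulatory compensation between closely related genes, as well as compensation involving general regulators such as chaperones. The strength of this compensation predicts each animal's adult phenotype.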

DailyNewsGW: Scientific publisher Elsevier acquires Ariadne Genomics; the purchase price was not disclosed.
Elsevier Buys Ariadne Genomics: read more bit.ly/uvyGxt
Ariadne's flagship product is Pathway Studio for the analyses of molecular pathways and disease progression. The software suite comprises an integrated data mining and visualization product "that organizes relevant facts and relationships from large document collections of genes, proteins, cell processes, and diseases," Elsevier said in a statement.

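Elsevier is acquiring a developer of molecular pathway analysis software, presumably to integrate the technology behind "Pathway Studio", its product for visualizing genes and cellular processes.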

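Earlier this summer, Elsevier also launched Genome Viewer on SciVerse in partnership with NCBI, a tool that displays the genome sequences cited in scholarly articles. Bringing in Pathway Studio appears aimed at strengthening its data mining and ontologies.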


Common Data Model for Natural Language Processing Based on Two Existing Standard Information Models: CDA+GrAF: P... http://bit.ly/ttUrL4
…the HL7 Clinical Document Architecture (CDA), and the ISO Graph Annotation Format (GrAF; in development), to develop such a data model entitled “CDA+GrAF”.

Two use cases, clinical document sections, and the 2010 i2b2/VA NLP Challenge (i.e., problems, tests, and treatments, with their assertions and relations), were used to create examples of such standoff annotation documents, and were successfully validated with the XML schemata provided with both standards. We developed a tool to automatically translate annotation documents from the 2010 i2b2/VA NLP Challenge format to GrAF, and automatically generated 50 annotation documents using this tool, all successfully validated.
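Standoff annotation keeps annotations outside the text and points into it by character offsets, so the clinical document itself is never modified. A minimal sketch of that idea, assuming a simple (start, end, label) record with i2b2-style concept types; `Region` and `annotate` are hypothetical names, this is not the GrAF XML schema, and the sample sentence is made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    start: int   # inclusive character offset into the source text
    end: int     # exclusive character offset
    label: str   # e.g. "problem", "test", "treatment"

def annotate(text, phrase, label):
    """Create a standoff region for the first occurrence of phrase (hypothetical helper)."""
    start = text.index(phrase)
    return Region(start, start + len(phrase), label)

text = "Chest X-ray showed pneumonia; started azithromycin."
regions = [
    annotate(text, "Chest X-ray", "test"),
    annotate(text, "pneumonia", "problem"),
    annotate(text, "azithromycin", "treatment"),
]

# The source text is untouched; each span is recovered from its offsets alone.
for r in regions:
    print(r.label, r.start, r.end, text[r.start:r.end])
```

Because annotations live in separate records, several annotation layers (sections, concepts, assertions) can point at the same text without conflicting.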

RNA-SeQC is a Java program which computes a series of quality control metrics for RNA-seq data confluence.broadinstitute.org/display/CGAToo…
RNA-SeQC: a Java program for computing quality-control metrics for RNA-Seq data
RNA-SeQC is a Java program which computes a series of quality control metrics for RNA-seq data. The input can be one or more BAM files. The output consists of HTML reports and tab delimited files of metrics data. This program can be valuable for comparing sequencing quality across different samples or experiments to evaluate different experimental parameters. It can also be run on individual samples as a means of quality control before continuing with downstream analysis.

RNA-SeQC is built on the GATK as well as the Picard API.
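Two of the simplest read-level QC metrics, the mapping rate and the duplicate rate, can be read straight from the SAM FLAG field (0x4 = unmapped, 0x100 = secondary, 0x400 = duplicate, 0x800 = supplementary). A minimal standard-library sketch on SAM text lines, not RNA-SeQC's implementation (which reads BAM through GATK and Picard); `qc_metrics`, the metric names and the sample records are hypothetical. It ends by writing a tab-delimited metrics table in the spirit of the output described above.

```python
import csv
import sys

UNMAPPED, SECONDARY, DUPLICATE, SUPPLEMENTARY = 0x4, 0x100, 0x400, 0x800

def qc_metrics(sam_lines):
    """Compute simple read-level QC metrics from SAM-format text lines."""
    total = mapped = duplicates = 0
    for line in sam_lines:
        if line.startswith("@"):            # skip header lines
            continue
        flag = int(line.split("\t")[1])     # FLAG is the second mandatory SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue                        # count each read once (primary records only)
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
            if flag & DUPLICATE:
                duplicates += 1
    return {
        "total_reads": total,
        "mapping_rate": mapped / total if total else 0.0,
        "duplicate_rate_of_mapped": duplicates / mapped if mapped else 0.0,
    }

sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t1024\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",   # marked duplicate
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",              # unmapped
    "r4\t16\tchr2\t500\t60\t50M\t*\t0\t0\t*\t*",
    "r4\t272\tchr3\t800\t0\t50M\t*\t0\t0\t*\t*",     # secondary alignment of r4
]

metrics = qc_metrics(sam)
writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
writer.writerow(metrics.keys())
writer.writerow(metrics.values())
```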

□ notSoJunkDNA: SLIDE: sparse linear modeling of RNA-Seq data for isoform discovery and abundance estimation
Sparse linear modeling of next-gen RNA-Seq data for isoform discovery and abundance est. pnas.org/cgi/content/lo… (can't get it to work yet)
Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation
SLIDE is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with deterministic isofor
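The core idea, read counts modeled as a design matrix times isoform abundances with a sparsity penalty to cope with unidentifiability, can be sketched with a plain non-negative Lasso solved by coordinate descent. This is only an illustration under stated assumptions: SLIDE's design matrix models sampling probabilities and its Lasso is modified, whereas here `nonneg_lasso` is a hypothetical helper with 0/1 exon-membership indicators and made-up counts.

```python
def nonneg_lasso(X, y, lam, iters=500):
    """Minimise 0.5*||y - X b||^2 + lam*sum(b) subject to b >= 0 by coordinate descent.

    X is a list of rows (one row per exon bin, one column per isoform);
    y holds the observed read counts per bin.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes isoform j.
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                      for i in range(n))
            b[j] = max(0.0, (rho - lam) / col_sq[j])   # soft-threshold, clipped at zero
    return b

# Isoform A uses exons 1-3, B uses exons 1 and 3, C uses exon 2 only.
# B + C covers exactly the same bins as A, so least squares alone cannot tell them apart.
X = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 0]]
y = [10, 10, 10]
b = nonneg_lasso(X, y, lam=0.3)
print([round(v, 3) for v in b])   # → [9.9, 0.0, 0.0]
```

The L1 penalty charges B + C twice as much as A for the same fit, so the sparsest explanation (a single isoform) wins; the small shrinkage of A from 10 to 9.9 is the usual Lasso bias.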

