lens, align.

Long is the time, but what is true comes to pass.

Planet 9.

2018-06-06 06:05:06 | Science News



□ zetadiv: an R package for computing compositional change across multiple sites, assemblages or cases:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/324897.full.pdf

Using orders of zeta beyond pairwise comparisons makes it possible to further refine the uncertainty level of the remaining clusters by distinguishing between clusters with low (i.e. superficial) and high similarity at higher orders of zeta.




□ DNAnexus validated deep learning methods using DeepVariant and Clairvoyante on BGI-SEQ data in a http://SV.AI hackathon

>> http://ow.ly/KmzI30khbQ1






□ Full Bayesian comparative phylogeography from genomic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/324525.full.pdf

If we have N pairs of populations, we would like to assign them to an unknown number of divergence events, k, which can range from one to N. For a given number of divergence events, the Stirling number of the second kind tells us the number of ways of assigning the taxa to the divergence times (i.e., the number of models with k divergence-time parameters).
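The counting argument above can be sketched in a few lines. This is an illustrative sketch, not code from the paper: the function names `stirling2` and `count_divergence_models` are hypothetical, and the recurrence used is the standard one for Stirling numbers of the second kind.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n labelled items into k non-empty groups."""
    if n == k:
        return 1  # also covers S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing groups or forms a new group.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def count_divergence_models(n_pairs):
    """Map each number of divergence events k to the number of models with k events."""
    return {k: stirling2(n_pairs, k) for k in range(1, n_pairs + 1)}


if __name__ == "__main__":
    models = count_divergence_models(4)
    print(models)                # {1: 1, 2: 7, 3: 6, 4: 1}
    print(sum(models.values()))  # 15 models in total for N = 4
```

Summing over k gives the total size of the model space (the Bell number), which grows quickly with N.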




□ Repression of divergent noncoding transcription by a sequence-specific transcription factor:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/314310.full.pdf

A sequence-specific transcription factor limits access of basal transcription machinery to regulatory elements and adjacent sequences that act as divergent cryptic promoters, thereby providing directionality towards productive transcription.




□ Synthetic STARR-seq reveals how DNA shape and sequence modulate transcriptional output and noise:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/18/325910.full.pdf

The synSTARR-seq approach systematically analyzes how variations in the recognition sequence of the glucocorticoid receptor (GR) affect transcriptional output. This resulted in the identification of a novel, highly active GR binding sequence and revealed that sequence variation both within and flanking GR's core binding site modulates its activity without apparent changes in DNA binding affinity.




□ starmap: Immersive visualisation of single cell data using smartphone-enabled virtual reality:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/17/324855.full.pdf
>> https://vccri.github.io/starmap/

starmap is a web-based, VR-enabled tool that combines a 3D scatter plot with star plots (radar charts) to visualise hundreds of thousands of multivariate data points, such as single-cell expression data. Its scalable visual design combines the benefit of a three-dimensional scatter plot for exploring clustering structure with the benefit of star plots for multivariate visualisation of an individual cell, and it is designed to utilise low-cost VR headsets.




□ FALCON-Phase: Integrating PacBio and Hi-C data for phased diploid genomes:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/327064.full.pdf

FALCON-Phase is a new method that resolves phase switches by reconstructing contig-length phase blocks using Hi-C short reads mapped to both homozygous regions and phase blocks. Such Hi-C data contain ultra-long-range phasing information. FALCON-Phase is 96% accurate, suggesting that Hi-C proximity information can be used to correct nearly all haplotype switches along FALCON-Unzip primary contigs and replace the requirement for parental genotype information. The FALCON-Phase pipeline can also be applied to scaffolds to produce chromosome-scale phased diploid genome assemblies.






□ A population genetic interpretation of GWAS findings for human quantitative traits:

>> http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2002985

The authors model stabilizing selection in a multidimensional phenotype space, akin to Fisher’s geometric model. An individual’s phenotype is a vector in an n-dimensional Euclidean space, in which each dimension corresponds to a continuous quantitative trait. The directions of mutations are assumed to be isotropic, i.e., uniformly distributed on the hypersphere in n dimensions defined by their size; the results are robust to relaxing this assumption as well.
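The isotropy assumption can be made concrete with a small sampling sketch. It is not taken from the paper: the function name `isotropic_mutation` and its parameters are hypothetical, and it relies on the standard fact that a normalised vector of independent standard normal draws points in a uniformly distributed direction.

```python
import math
import random


def isotropic_mutation(n_traits, size, rng):
    """Draw a mutation effect of length `size` with a uniformly random direction."""
    while True:
        # Independent standard normals are rotationally symmetric, so the
        # normalised vector is uniform on the unit hypersphere.
        draws = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
        norm = math.sqrt(sum(x * x for x in draws))
        if norm > 0.0:
            break
    return [size * x / norm for x in draws]


if __name__ == "__main__":
    rng = random.Random(1)
    effect = isotropic_mutation(n_traits=5, size=0.3, rng=rng)
    print(effect)
    print(math.sqrt(sum(x * x for x in effect)))  # length equals the requested size
```

Each returned vector lies on the hypersphere whose radius is the mutation size, matching the description above.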




□ Reinforced Adversarial Neural Computer for De Novo Molecular Design:

>> https://pubs.acs.org/doi/10.1021/acs.jcim.7b00690

RANC (Reinforced Adversarial Neural Computer) is a deep neural network architecture for the de novo design of novel small-molecule organic structures, based on the generative adversarial network (GAN) paradigm and reinforcement learning. As a generator, RANC uses a differentiable neural computer (DNC), a category of neural networks with increased generation capabilities due to the addition of an explicit memory bank, which can mitigate common problems found in adversarial settings.




□ EternaBrain: Automated RNA design through move sets from an Internet-scale RNA videogame:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/326736.full.pdf

When pipelined with hand-coded move combinations developed by the Eterna community, the resulting EternaBrain method solves 61 out of 100 independent RNA design puzzles in the Eterna100 benchmark. EternaBrain surpasses all six other prior algorithms that were not informed by Eterna strategies and suggests a path for automated RNA design to achieve human-competitive performance.




□ NASA Sends New Research on Orbital ATK Mission to Space Station

>> https://www.nasa.gov/press-release/nasa-sends-new-research-on-orbital-atk-mission-to-space-station

Astronauts soon will have new experiments to conduct related to emergency navigation, DNA sequencing and ultra-cold atom research when the research arrives at the International Space Station following the 4:44 a.m. EDT Monday launch of an Orbital ATK Cygnus spacecraft.






□ DISSEQT - DIStribution based modeling of SEQuence space Time dynamics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/327338.full.pdf

DISSEQT is centered around robust dimension and model reduction algorithms for analysis of genotypic data with additional capabilities for including phenotypic features to explore dynamic genotype-phenotype maps. Nonmetric multidimensional scaling using Kruskal’s stress criterion was used rather than classical multidimensional scaling in the final step of the Isomap algorithm.




□ The EVcouplings Python framework for coevolutionary sequence analysis:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/326918.full.pdf

The key prerequisite to infer evolutionary couplings is a high-quality sequence alignment of sufficient evolutionary depth. EVcouplings supports the generation of alignments using jackHMMER and HMMsearch, as well as the import of externally computed alignments. Since the identification of an optimal evolutionary depth can be challenging, the application supports the parallel exploration of different sequence inclusion thresholds, provides flexible alignment filtering parameters, and calculates summary statistics to assess alignment quality.




□ A data mining paradigm for identifying key factors in biological processes using gene expression data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/21/327478.full.pdf

The authors present a data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of data, and evaluation of consistent signals. They selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, demonstrating the ability of this paradigm to identify new factors in biological processes.






□ A Generative Bayesian Approach for Incorporating Biosurveillance Sources into Epidemiological Models:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/22/328518.full.pdf

Most emerging data sources come without a large corpus of historical data to be used to train them, and they do not allow the incorporation of prior or system knowledge to capture the interrelationships between data sources. To remedy this, the current work built a generative model for information fusion of diverse data types. This uses a multilevel Bayesian “information model” that enables easy expansion to include new biosurveillance data sources. This information model is used to estimate the inputs of a standard compartmental disease model, and this combination of a theory-based SEIR model and a statistical Markov-chain Monte Carlo model has advantages for a variety of applications.
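For readers unfamiliar with the compartmental side of this combination, here is a minimal sketch of a deterministic SEIR model integrated with a simple Euler scheme. It is not the paper's implementation: the function name `simulate_seir` and all parameter values are hypothetical placeholders. In the approach described above, inputs such as these would be estimated by the Bayesian information model rather than fixed by hand.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Return daily (S, E, I, R) tuples from a simple Euler integration.

    beta: transmission rate, sigma: rate of leaving the exposed class,
    gamma: recovery rate (all per day; illustrative values only).
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r
    steps_per_day = round(1.0 / dt)
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory


if __name__ == "__main__":
    run = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=9990, e0=0, i0=10, r0=0, days=100)
    peak_day = max(range(len(run)), key=lambda d: run[d][2])
    print("peak infectious day:", peak_day)
```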




□ An analysis and comparison of the statistical sensitivity of semantic similarity metrics:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/22/327833.full.pdf

The authors present a statistical sensitivity comparison of 5 semantic similarity metrics (Jaccard, Resnik, Lin, Jiang&Conrath, Hybrid Relative Specificity Similarity) representing 3 different kinds of metrics (Edge based, Node based, Hybrid) and explore how key parameter choices can impact sensitivity. To evaluate sensitivity in a controlled fashion, they explore two different models for simulating data with varying levels of similarity and compare to the noise distribution using resampling. This increases the confidence in the generality of the results, although their evaluation is limited to one context: comparison among profiles sampled from the Entity-Quality phenotype annotations in the Phenoscape KB.
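Of the five metrics, Jaccard is the simplest to state, so a short sketch may help. It treats each profile as a plain set of annotation terms; the term identifiers in the usage example are made up, and returning 0.0 for two empty profiles is a convention chosen here, not something specified in the paper.

```python
def jaccard_similarity(profile_a, profile_b):
    """Size of the intersection divided by size of the union of two term sets."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty profiles
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical annotation profiles, for illustration only.
    profile_1 = {"term:1", "term:2", "term:3"}
    profile_2 = {"term:2", "term:3", "term:4"}
    print(jaccard_similarity(profile_1, profile_2))  # 0.5
```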




□ Pore-C: long-read chromatin conformation capture, collaboration with Imielinski lab at NYGC:

>> https://nanoporetech.com/resource-centre/posters/pore-c-using-nanopore-reads-delineate-long-range-interactions-between #nanoporeconf




□ plyranges: A grammar of genomic data transformation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/23/327841.full.pdf

An example of the former is the Genome Query Language (GQL) and its distributed implementation GenAp, which use an SQL-like syntax for fast retrieval of information from unprocessed sequencing data. Similarly, the Genometric Query Language (GMQL) implements a relational algebra for combining genomic datasets. The authors have created a genomic DSL (domain-specific language) called plyranges that reformulates notions from existing genomic algebras and embeds them in R as a genomic extension of dplyr.




□ MXP: Modular eXpandable framework for building bioinformatics Pipelines:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/23/329110.full.pdf

The authors developed MXP and tested it on various tasks in their organization, primarily for building pipelines for GWAS (Genome-Wide Association Studies) and post-GWAS analysis. At least two languages are involved in the construction of a tool for building pipelines: first, an implementation language (which may be a combination of languages), and second, a domain-specific language (DSL), which is used to specify a pipeline.






□ nanopore:
CB: We are planning to integrate MinION and MinIT for a total sequencing and analysis solution – MinION Mk1c #nanoporeconf


□ metrichor:
CB: Lots of changes afoot for Metrichor’s EPI2ME platform including running locally on MinIT/GridION/PromethION #nanoporeconf




□ crossword: A data-driven simulation language for the design of genetic-mapping experiments and breeding strategies:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/25/330563.full.pdf

crossword is a domain-specific language that allows complex and unique simulations to be performed, and the language is supported by a graphical interface that guides users through functions and options. The authors demonstrate crossword’s utility in QTL-seq design, where its output accurately reflects empirical data. The “QTN_random” phenotyping method was used to sample 60 QTNs in each of 10 iterations, and the QTN effects were assigned using a gamma distribution.




□ OUTRIDER: A statistical method for detecting aberrantly expressed genes in RNA sequencing data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/24/322149.full.pdf

The algorithm uses an autoencoder to model read count expectations according to the co-variation among genes resulting from technical, environmental, and common genetic variation, followed by a statistical test based on a negative binomial (NB) distribution. The dimension of the autoencoder was fitted using a scheme in which artificially corrupted read counts were injected and presented as the input to the autoencoder, maximizing the likelihood of the original, uncorrupted data. The optimal encoding dimension is obtained by assessing the autoencoder's performance in correcting corrupted data: corrupted read counts are introduced at random with a probability of 10^-2 by shifting the true read counts by 2 standard deviations on a log scale, randomly up or down.
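The corruption step can be illustrated with a rough sketch. This is not OUTRIDER's code: the function name `inject_corruption` is hypothetical, and two details are assumptions made here for illustration, namely that the standard deviation is computed per gene across samples and that the log scale is log(count + 1).

```python
import math
import random
import statistics


def inject_corruption(counts, prob=1e-2, n_sd=2.0, rng=None):
    """Corrupt a genes x samples count matrix given as a list of lists.

    Returns the corrupted matrix and the (gene, sample) positions changed.
    """
    rng = rng or random.Random()
    corrupted, injected = [], []
    for gene, row in enumerate(counts):
        logs = [math.log(c + 1) for c in row]   # assumed log(count + 1) scale
        sd = statistics.pstdev(logs)            # assumed per-gene spread
        new_row = list(row)
        for sample, log_value in enumerate(logs):
            if rng.random() < prob:
                direction = rng.choice((-1, 1))  # shift up or down at random
                shifted = math.exp(log_value + direction * n_sd * sd) - 1
                new_row[sample] = max(0, round(shifted))
                injected.append((gene, sample))
        corrupted.append(new_row)
    return corrupted, injected


if __name__ == "__main__":
    matrix = [[10, 100, 1000], [5, 7, 9]]
    noisy, positions = inject_corruption(matrix, prob=0.5, rng=random.Random(0))
    print(noisy, positions)
```

A candidate encoding dimension could then be scored by how well the autoencoder recovers the original values at the returned positions.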




□ HaploVectors: an integrative analytical tool for phylogeography:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/25/330761.full.pdf

HaploVectors is presented as an R package for computing haplotypic eigenvectors and performing null model-based tests. Investigation of HaploVectors using empirical datasets showed that the method is useful for uncovering hidden patterns of haplotypic distribution. HaploVectors is a flexible tool for exploring phylogeographical patterns and discriminating among the biogeographic, neutral and environmental factors that shape genetic distribution across space.




□ Solving scaffolding problem with repeats:

>> http://biorxiv.org/cgi/content/short/330472v1




□ REVA: a rank-based multi-dimensional measure of correlation:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/25/330498.full.pdf

REVA is a nonparametric statistic inspired by the Kendall rank correlation coefficient. The authors use U-statistic theory to derive the asymptotic distribution of the new correlation coefficient, developing additional large- and finite-sample properties along the way. Using a very different, generalized approach to the same problem, earlier work introduced the Hilbert-Schmidt Independence Criterion, a kernel dependence test in multidimensional Euclidean spaces that builds on earlier kernel methods like N-distances.
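As background, the one-dimensional Kendall coefficient that inspired REVA can be written as an average over all pairs of observations, which is what makes it a U-statistic. The sketch below computes the tau-a variant (ties contribute zero); it is not REVA itself, and the function name `kendall_tau_a` is chosen here for illustration.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
    print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```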




□ Temporal alignment and latent Gaussian process factor inference in population spike trains

>> http://biorxiv.org/cgi/content/short/331751v1

Scalable GPFA extension that operates directly on unbinned spike times + time-warping using nested Gaussian Processes.




□ MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms:

>> https://academic.oup.com/mbe/article-abstract/35/6/1547/4990887




□ The structure of the genetic code as an optimal graph clustering problem:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/28/332478.full.pdf

The authors model the genetic code as a partition of an undirected and unweighted graph, which makes the model general and universal. In this view, the structure of the genetic code is a solution to the graph clustering problem. Although the standard genetic code is far from optimal in terms of conductance, its structure is characterised by many codon groups reaching the minimum conductance for their size.
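To make the conductance measure concrete, here is a small sketch. Two things are assumptions made for illustration and should be checked against the paper: that vertices are the 64 codons with edges joining codons that differ at exactly one position, and that conductance is the number of edges leaving a group divided by the smaller of the two volumes (sums of degrees). The function names are hypothetical.

```python
from itertools import product

BASES = "ACGU"


def codon_graph():
    """Adjacency sets for codons linked by a single-nucleotide change (assumed model)."""
    codons = ["".join(p) for p in product(BASES, repeat=3)]
    neighbours = {codon: set() for codon in codons}
    for codon in codons:
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    neighbours[codon].add(codon[:pos] + base + codon[pos + 1:])
    return neighbours


def conductance(graph, group):
    """Edges crossing the group boundary divided by the smaller side's volume."""
    group = set(group)
    cut = sum(1 for v in group for w in graph[v] if w not in group)
    vol_in = sum(len(graph[v]) for v in group)
    vol_out = sum(len(graph[v]) for v in graph if v not in group)
    smaller = min(vol_in, vol_out)
    if smaller == 0:
        raise ValueError("conductance is undefined for an empty side")
    return cut / smaller


if __name__ == "__main__":
    graph = codon_graph()
    glycine_block = {"GGA", "GGC", "GGG", "GGU"}
    print(conductance(graph, glycine_block))  # 24 / 36
```

In this model every codon has 9 neighbours, so a four-codon block that differs only at the third position keeps 12 of its 36 edge endpoints inside the block.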




□ Towards a Dynamic Interaction Network of Life to unify and expand the evolutionary theory:

>> https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-018-0531-6

Nothing on Earth evolves and makes sense in isolation, which challenges the key assumption of the Modern Synthesis framework that targeting the individual gene or organism (in principle knowing that it is part of a set of complex interactions) allows one to capture evolution in all its dimensions. Considering transient collectives as stable entities at a given time scale, when the collectives change much more slowly than the processes in which they take part, amounts to a focus on interactions occurring at a given time scale by treating the slower dynamics as stable edges/nodes.




□ Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise:

>> https://www.biorxiv.org/content/biorxiv/early/2018/05/28/332825.full.pdf

The new human gene database contains 43,162 genes, of which 21,306 are protein-coding and 21,856 are noncoding, and a total of 323,824 transcripts, for an average of 7.5 transcripts per gene. The expanded gene list includes 4,998 novel genes (1,178 coding and 3,819 noncoding) and 97,511 novel splice variants of protein-coding genes as compared to the most recent human gene catalogs. In total, the 9,795 RNA-seq samples contain 899,960,113,026 reads (449,980,056,513 pairs), an average of 91.9 million reads (46M pairs) per sample.




□ Continuous visualization of differences between biological conditions in single-cell data:

>> http://biorxiv.org/cgi/content/short/337485v1





