lens, align.

Lang ist Die Zeit, es ereignet sich aber Das Wahre.

Catalyst.

2018-12-24 23:33:18 | Science News


"We Are What We Are."




□ The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/13/495754.full.pdf

Bellerophon first uses the quality-assessment tool TransRate to indicate the quality, after which it uses a Transcripts Per Million (TPM) filter to remove lowly expressed contigs and CD-HIT-EST to remove highly identical contigs. Bellerophon pipeline was able to remove between 40 to 91.9% of the chimeras in transcriptome assemblies and removed more chimeric than non-chimeric contigs.






□ NASC-seq monitors RNA synthesis in single cells:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/498667.full.pdf

NASC-seq reveals rapidly up- and down-regulated genes during the T-cell activation, and RNA sequenced for induced genes were essentially only newly synthesized. NASC-seq is based on a combination of RNA labeling with 4sU, RNA modification by alkylation as in SLAM-seq, RNA sequencing library preparation as in Smart-seq2, and data analysis that includes a computational model from GRAND-SLAM.




□ Consensify: a method for generating pseudohaploid genome sequences from palaeogenomic datasets with reduced error rates:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/18/498915.full.pdf

The error correction is derived directly from the data itself, without the requirement for additional genomic resources or simplifying assumptions such as contemporaneous sampling. The extended D statistic, rather than standard pseudohaploidisation, makes use of the complete read stack and can further apply a correction to error rates estimated by comparison to data from a high quality “error free” individual.




□ RACER: A data visualization strategy for exploring multiple genetic associations:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/14/495366.full.pdf

Statistical methods have been developed to assess the likelihood that two associations (e.g. disease locus and eQTL) share a common causal variant, however, visualization of the two loci is often a crucial step in determining if a locus is pleiotropic. the Regional Association ComparER (RACER) package, which creates mirror plots, in which the two associations are plotted on a shared x-axis. Mirror plots provide an effective tool for the visual exploration and presentation of the relationship between two genetic associations.




□ MapOptics: A light-weight, cross-platform visualisation tool for optical mapping alignment:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty1013/5232997

MapOptics is a lightweight cross-platform tool that enables the user to visualise and interact with the alignment of Bionano optical mapping data and can be used for in depth exploration of hybrid scaffolding alignments.

public static LinkedHashMap calculateRefOverlap(String refId, RefContig ref, LinkedHashMap queries) {
LinkedHashMap overlapRegions = new LinkedHashMap();




□ Unified Cox model based multifactor dimensionality reduction method for gene-gene interaction analysis of the survival phenotype:

>> https://biodatamining.biomedcentral.com/articles/10.1186/s13040-018-0189-1

Cox UM-MDR is easily implemented by combining Cox-MDR with UM-MDR to detect the significant gene-gene interactions associated with the survival time without cross-validation and permutation testing. Cox UM-MDR has similar power to Cox-MDR, whereas it outperforms Cox-MDR with marginal effects and more robust to heavy censoring when some SNPs having only marginal effects might mask the detection of the causal epistasis.




□ Optimal Gene Filtering for Single-Cell data (OGFSC) – a gene filtering algorithm for single-cell RNA-seq data:

>> https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty1016/5237553

Precise identification of differentially expressed genes and cell populations are heavily dependent on the effective reduction of technical noise, for example, by gene filtering. However, there is still no well-established standard in the current approaches of gene filtering. a novel algorithm, termed as OGFSC (Optimal Gene Filtering for Single-Cell data), to construct a thresholding curve based on gene expression levels and the corresponding variances.






□ susieR: "sum of single effects" (SuSiE) sparse multiple regression,: fine-mapping in human genetic association studies:

>> https://github.com/stephenslab/susieR

The methods implemented here are particularly well-suited to settings where some of the X variables are highly correlated, and the true effects are highly sparse (e.g.


□ Nanopore sequencing and rapid fusion testing – a ‘killer app’ in molecular pathology

>> https://nanoporetech.com/resource-centre/william-jeck-nanopore-sequencing-and-rapid-fusion-testing-killer-app-molecular






□ VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database:

>> https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2532-4

VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes).

VarGenius has been tested on a parallel computing cluster with 52 machines with 120GB of RAM, a 50M whole exome sequencing was executed in about 7h (trio or quartet); a joint analysis of 30 WES in about 24 h and the parallel analysis of 34 single samples from a  M panel in 2h.

VarGenius also implements the GATK VQSR and the GRW pipelines for the joint analysis of hundreds of samples. The VarGenius database can be queried through SQL programming interface which has a very intuitive syntax, and provides a script (query_2_db.pl), described by the online user guide, which allows to perform basic automated queries.




□ artic.network: LHFV Current Outbreak Simulation

>> https://artic-network.github.io/artic-workshop/

#RealTimeGhana18






□ The Future of Science and Science of the Future: Vision and Strategy for the African Open Science Platform (v02)

>> https://zenodo.org/record/2222418#.XBYWFy3AORs

the future nanopore for viral genome sequencing. Participants will return home and train others so Africa will be epidemic ready






@pathogenomenick
12-plex viral genomes- 100s of x coverage per genome from participant-prepared libraries generated in a few minutes, basecalled with guppy and visualised both in real-time by @hamesjadfield RAMPART. And the negative is clean! #RealTimeGhana18


AineToole
RAMPART successfully detecting contaminants and depth of coverage direct from MinION sequencer in real time. #RealTimeGhana18




arambaut:
A nanopore MinIT running a Flongle, wireless controlled by a laptop running @NetworkArtic RAMPART to do real-time viral analysis as the reads are base-called. #RealTimeGhana18


kirstynbrunker:
Team EARRT @george_l present their idea for a measles surveillance system in East Africa to help deal with challenges of widespread movement and understand real-time pathogen spread #RealTimeGhana18 @NetworkArtic






K_G_Andersen:

>> https://www.nature.com/articles/s41564-018-0296-2

genomic epidemiology and how infectious disease genomics can be really helpful in tracking and understanding outbreaks? We tried to cram it all into a single review @NatureMicrobiol:




□ Real-time measurement of protein–protein interactions at single-molecule resolution using a biological nanopore:

>> https://www.nature.com/articles/nbt.4316

engineering a genetically encoded sensor for real-time sampling of transient PPIs at single-molecule resolution. This sensor comprises a truncated outer membrane protein pore, a flexible tether, a protein receptor and a peptide adaptor. This selective nanopore sensor could be applied for single-molecule protein detection, could form the basis for a nanoproteomics platform or might be adapted to build tools for protein profiling and biomarker discovery.






gaurav_bio:

Computational efficiency gains from kallisto paired with high concordance to traditional methods = no brainer: use pseudo-alignment/quasi-mapping. #bioinformatics #genomics

>> https://link.springer.com/protocol/10.1007/978-1-4939-8868-6_2

@lpachter A comparison of RSEM and kallisto for reproducible cancer RNA-seq analysis with workflow design.




□ Israel to Sequence 100K People, Create Genomic Database to Support 'Digital Health' and establish itself as a "world leader in precision medicine".

>> https://www.genomeweb.com/sequencing/israel-sequence-100k-people-create-genomic-database-support-digital-health






ReindertN:
Great @nanopore sequencing run on #microbiome of bleaching #coral to start the weekend: first ~2H yielded >1.1 million reads and >1.1 GBases (60 pooled barcoded amplicons) Using a fresh RevD flowcell and new MinION release 18.12.4. Also superbly nice pore occupancy of >90%




□ Qtlizer: comprehensive QTL annotation of GWAS results:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/495903.full.pdf

After applying the ETL process described in the last section, 40,883,209 QTLs (37,014,094 study-wide significant) from 3,856,968 variants to 32,987 genes were finally added to the Genehopper DB.




□ Benchmarker: an unbiased, association-data-driven strategy to evaluate gene prioritization algorithms:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/497602.full.pdf

They compared two gene prioritization methods, DEPICT and MAGMA; genes prioritized by both methods strongly outperformed genes prioritized by only one. This strategy is highly generalizable because it can be applied to any method that prioritizes genes or variants based on their similarity to each other with respect to some feature(s) of interest (e.g. similar patterns of gene set membership, similar epigenetic marks).






□ ELIGOS: Decoding the Epitranscriptional Landscape from Native RNA Sequences:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/487819.full.pdf

the ONT sequencing signals obtained from cDNA and those derived from the same RNA molecules by dRNA-seq could be used to filter out systematic noise from data to detect locations of possible RNA modifications. The ELIGOS software is publicly available and can be used to detect possible RNA modification sites and secondary structures quickly, on a global transcriptomic scale. Moreover, ELIGOS can be used as a diagnostics tool to improve the base calling algorithm of nanopore sequencing.




□ High Dimensional Mediation Analysis with Applications to Causal Gene Identification:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/497826.full.pdf

MedFix is a new application of adaptive lasso with one additional tuning parameter. MedMix is a novel mediation model based on high dimensional linear mixed model, for which we also develop a new variable selection algorithm. motivated by the causal gene identification problem, where causal genes are defined as the genes that mediate the genetic effect. the genetic variants are the high dimensional exposure, the gene expressions the high dimensional mediator, and the phenotype of interest the outcome.






□ Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments:

>> https://almob.biomedcentral.com/articles/10.1186/s13015-018-0135-2

Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems.




□ Beyond SNP Heritability: Polygenicity and Discoverability of Phenotypes Estimated with a Univariate Gaussian Mixture Model:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/498550.full.pdf

using detailed linkage disequilibrium structure from an extensive reference panel, to estimate these quantities from genome-wide association studies (GWAS) summary statistics for SNPs with minor allele frequency >1%. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability, and also allows for estimating residual inflation.




□ Drop-seq-compatible DART-seq captures multiplexed RNA targets--including non-polyA transcripts, viral transcripts & Ig sequences--simultaneously with 5’-end sequenced transcriptomes in single cells at high throughput.

>> http://bit.ly/2A6NK9U




□ Simphony: simulating large-scale, rhythmic data:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/497859.full.pdf

Simphony has adjustable parameters for specifying experimental design and modeling rhythms, including the ability to sample from Gaussian and negative binomial distributions. rhythm detection improved as rhythm amplitude increased or the interval between time points decreased. Rhythm detection also improved as baseline expression increased (and thus as the standard deviation of log-transformed counts of non-rhythmic genes decreased).






□ STRING2GO: Using deep maxout neural networks to improve the accuracy of function prediction from protein interaction networks:

>> https://www.biorxiv.org/content/biorxiv/early/2018/12/17/499244.full.pdf

It adopts deep maxout neural networks to learn a novel type of functional biological network feature representations simultaneously encapsulating both node neighborhoods and co-occurrence functions information. hese higher-level representations are learnt in a supervised way by training deep maxout neural networks to output all the terms in biological process domain associated with an input protein – an approach that has led to higher predictive accuracy in the past.




□ Tensor Decomposition of Stimulated Monocyte and Macrophage Gene Expression Profiles Identifies Neurodegenerative Disease-specific Trans-eQTLs

>> http://biorxiv.org/cgi/content/short/499509v1

robust evidence that some disease-associated genetic variants affect the expression of multiple genes in trans.




□ Variation in proviral content among human genomes mediated by LTR recombination

>> https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-018-0142-3

The viral content of human genomes is more variable than we thought. Because LTR-LTR recombination events may occur long after proviral insertion but are challenging to detect in resequencing data, we hypothesize that this mechanism is a source of genomic variation in the human population that remains vastly underestimated.