leedsomics - Tumblr blog

leedsomics · 9 months ago

From Genes to Pathways: A Curated Gene Approach to Accurate Pathway Reconstruction in Teleost Fish Transcriptomics

Interpreting the vast amounts of data generated by high-throughput sequencing technologies can present a significant challenge, particularly for non-model organisms. While automated approaches such as GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analyses are widely used, they often lack specificity for non-model organisms. To bridge this gap, we present a manually curated gene list tailored to teleost fish transcriptomics. This resource focuses on key biological processes crucial for understanding teleost fish physiology, development, and adaptation, including hormone signaling, various metabolic pathways, appetite regulation, digestion, gastrointestinal function, vision, ossification, osmoregulation, and pigmentation. Developed through the collaborative efforts of specialists in diverse fields, the list prioritizes genes with established roles in teleost physiology, supporting experimental evidence, and conservation across species. This curated list aims to give researchers a reliable starting point for transcriptomic analyses, offering a carefully evaluated set of genes relevant to current research priorities. By streamlining gene selection and interpretation, this resource supports the broader teleost fish research community in designing and analyzing studies of molecular responses to developmental and environmental changes. We encourage the scientific community to collaboratively expand and refine this list, ensuring its continued relevance and utility for teleost fish research. http://dlvr.it/TDdfbz
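
One common way to use a curated list like this is a one-sided over-representation test per category. The Python sketch below is our illustration, not the authors' method: it assumes you have the set of genes measured in your transcriptome, one curated category (for example, a hypothetical "pigmentation" subset), and a set of responsive genes from your own differential expression analysis, and it computes the hypergeometric upper-tail p-value using only the standard library.

```python
from math import comb


def hypergeom_upper_tail(universe_size: int, set_size: int,
                         n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from universe_size genes, set_size of which belong to the curated set."""
    total = comb(universe_size, n_selected)
    # math.comb returns 0 when k > n, so impossible terms contribute nothing
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_selected - k)
        for k in range(n_overlap, min(set_size, n_selected) + 1)
    )
    return tail / total


def curated_set_enrichment(universe, curated_genes, responsive_genes):
    """Return (overlap, one-sided p-value) for one curated gene category.

    Both gene lists are first restricted to genes actually measured, so that
    curated genes absent from the assembly do not distort the test.
    """
    universe = set(universe)
    curated = set(curated_genes) & universe
    responsive = set(responsive_genes) & universe
    overlap = len(curated & responsive)
    p = hypergeom_upper_tail(len(universe), len(curated), len(responsive), overlap)
    return overlap, p
```

For a 20-gene universe in which all 5 curated genes are among 5 responsive genes, the p-value is 1/C(20,5) = 1/15504. When testing several categories, the p-values would still need multiple-testing correction.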

leedsomics · 9 months ago

Human untargeted metabolomics in the gut microbiome era: ethanol vs methanol

Untargeted metabolomics is frequently performed on human fecal samples in conjunction with sequencing to unravel gut microbiome functionality. As sample collection efforts rapidly expand, with individuals often collecting specimens at home, metabolomics experiments should adapt to the safety requirements and practical needs of bulk off-site collection. Here, we show that an extraction pipeline based on 95% ethanol, which is safe to ship and handle, recovers amounts of metabolites comparable to a validated 50% methanol extraction and preserves differences in metabolic profiles between subjects. Additionally, the fecal metabolome remains relatively stable when stored in 95% ethanol for up to a week at room temperature. Finally, we suggest a metabolomics data analysis workflow using the robust centered log-ratio transformation, which removes variance introduced by differing sample weights and enables reliable, integration-ready untargeted metabolomics experiments in gut microbiome studies. http://dlvr.it/TDcgz5
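
The robust centered log-ratio (rclr) transform mentioned above can be sketched as follows, assuming a samples x features matrix of non-negative intensities. Each sample is log-transformed on its non-zero entries only and centred on the mean of those logs, so multiplying a whole sample by a constant (for example, because a different stool weight was extracted) cancels out. This is a minimal NumPy illustration; returning zeros as NaN is our own choice, and published implementations may handle missing values differently.

```python
import numpy as np


def rclr(counts: np.ndarray) -> np.ndarray:
    """Robust centered log-ratio transform of a samples x features matrix.

    Only observed (non-zero) values are log-transformed; each sample is
    centred by the mean log of its own observed values. Zeros are returned
    as NaN, i.e. treated as missing rather than imputed.
    """
    x = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        logx = np.where(x > 0, np.log(x), np.nan)
    # A per-sample multiplicative factor becomes an additive log offset,
    # which this centring step removes.
    centre = np.nanmean(logx, axis=1, keepdims=True)
    return logx - centre
```

Because scaling a sample by a constant only shifts its logs, rclr(a) and rclr(3 * a) are identical, and the observed values in each row sum to zero after centring.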

leedsomics · 9 months ago

MTALTCO1: a 259 amino acid long mtDNA-encoded alternative protein that challenges conventional understandings of mitochondrial genomics.

Mitochondria-derived peptides and proteins significantly expand the coding potential of the human mitogenome. Here, we report the discovery of MTALTCO1, a 259 amino acid protein encoded by a mitochondrial alternative open reading frame (mtaltORF) in the +3 reading frame of the cytochrome oxidase 1 (CO1) gene. Using custom antibodies, we confirmed the mitochondrial expression of MTALTCO1 in human cell lines. Sequence analysis revealed a high arginine content and an elevated isoelectric point that were not contingent on CO1's amino acid sequence, suggesting that selective pressures act on this protein. MTALTCO1 displays extensive fusion-fission dynamics at the interspecies level, yet it is produced as a full-length protein across human haplogroups. Our findings highlight the importance of identifying novel mtaltORFs to expand our understanding of the mitochondrial proteome. http://dlvr.it/TDcSFq
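
To make the idea of an alternative reading frame concrete, the sketch below translates a DNA sequence from a chosen offset with the vertebrate mitochondrial genetic code (NCBI translation table 2, in which TGA encodes Trp, ATA encodes Met, and AGA/AGG are stops) and reports arginine content. How the paper's "+3" frame maps onto a 0-based offset is an assumption left to the caller, and any sequences passed in here are toy inputs, not CO1.

```python
# NCBI translation table 2 (vertebrate mitochondrial); codons are enumerated
# with bases in T, C, A, G order for the first, second and third positions.
_BASES = "TCAG"
_TABLE2 = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
           "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: _TABLE2[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(seq: str, offset: int) -> str:
    """Translate seq from a 0-based offset; '*' marks stops, 'X' unknown codons."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def arginine_fraction(protein: str) -> float:
    """Fraction of residues that are arginine, ignoring a trailing stop."""
    protein = protein.rstrip("*")
    return protein.count("R") / len(protein) if protein else 0.0
```

For example, translate("AATGCGT", 1) reads the codons ATG and CGT and returns "MR", while translate("ATGAGA", 0) returns "M*" because AGA is a stop codon in this table.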

leedsomics · 9 months ago

Spatial multi-omics reveal intratumoral humoral immunity niches associated with tertiary lymphoid structures in pancreatic cancer immunotherapy pathologic responders

Pancreatic ductal adenocarcinoma (PDAC) is a rapidly progressing cancer that responds poorly to immunotherapies. Intratumoral tertiary lymphoid structures (TLS) have been associated with rare long-term PDAC survivors, but the role of TLS in PDAC and their spatial relationships within the broader tumor microenvironment remain unknown. We generated a spatial multi-omics atlas encompassing 26 PDAC tumors from patients treated with combination immunotherapies. Using machine learning-enabled H&E image classification models and unsupervised gene expression matrix factorization methods for spatial transcriptomics, we characterized cellular states within TLS niches spanning distinct morphologies and immunotherapies. Unsupervised learning generated a TLS-specific spatial gene expression signature that is significantly associated with improved survival in PDAC patients. These analyses demonstrate TLS-associated intratumoral B cell maturation in pathologic responders, confirmed with spatial proteomics and BCR profiling. Our study also identifies spatial features of pathologic immune responses, revealing TLS maturation colocalizing with IgG/IgA distribution and extracellular matrix remodeling. http://dlvr.it/TDcG3n
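
The abstract does not name its matrix factorization method, so as an illustration only, here is non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates, a common unsupervised way to decompose a spots x genes expression matrix into per-spot factor loadings (W) and gene programs (H). The rank, iteration count, and random initialisation are arbitrary choices for the sketch.

```python
import numpy as np


def nmf(X, rank, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative spots x genes matrix X ~ W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius loss; eps guards
    against division by zero. With n_iter=0 the random initialisation is
    returned unchanged.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))  # spot loadings on each factor
    H = rng.random((rank, X.shape[1]))  # gene weights of each factor
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each column of W can be mapped back onto spot coordinates to see where a gene program is active, which is how factors are typically related to morphological niches such as TLS.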

leedsomics · 9 months ago

Comparative pan-genomics reveals divergent adaptations in clinically-relevant members of the Fusarium solani species complex

The Fusarium solani species complex (FSSC) is a group of dual-kingdom fungal pathogens capable of causing devastating disease on a wide range of host plants as well as life-threatening, difficult-to-treat infections in humans. In this study, we generate highly contiguous genomes for three clinical isolates of Fusarium keratoplasticum and three clinical isolates of Fusarium petroliphilum and compare them with other FSSC genomes from plant and animal sources. We find that human pathogenicity is polyphyletic within the FSSC, including within F. keratoplasticum. Pan-genome analysis revealed a high degree of gene presence-absence variation in the complex, with only 41% of genes (11,079/27,068) found in all samples, as well as accessory chromosomes encoding isolate- and species-specific genes. We also defined conserved long non-coding RNAs (lncRNAs) between F. keratoplasticum and F. petroliphilum and found that they show a similarly low degree of presence-absence variation. Secondary metabolite analysis revealed a conserved core set of biosynthetic gene clusters across the FSSC, as well as a unique cluster potentially linked to keratitis. Transcriptomic analysis under stress conditions showed minimal differential gene expression, indicating that both F. keratoplasticum and F. petroliphilum are well adapted to conditions relevant to human infection. This study provides valuable insights into the evolutionary dynamics, genomic architecture, and potential pathogenicity mechanisms of the FSSC, with implications for understanding multi-kingdom virulence, which is of increasing relevance as climate change potentially increases the number of fungal species that can grow at human body temperature. http://dlvr.it/TDbBH3
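
Pan-genome partitions like the 41% core figure (11,079 / 27,068 ≈ 0.409) come from a gene (or orthogroup) presence-absence matrix. A minimal sketch, assuming such a matrix has already been computed: core genes are present in every genome, singletons in exactly one, and the rest are accessory. These category names and cut-offs are common conventions, not necessarily the study's exact definitions.

```python
import numpy as np


def partition_pangenome(presence):
    """Count core, accessory and singleton genes.

    presence: genes x genomes matrix of 0/1 (or booleans) giving whether each
    gene/orthogroup is present in each genome. Rows absent from every genome
    are ignored. Assumes at least two genomes, otherwise core == singleton.
    """
    p = np.asarray(presence, dtype=bool)
    n_genomes = p.shape[1]
    counts = p.sum(axis=1)
    core = counts == n_genomes
    singleton = counts == 1
    accessory = (counts > 0) & ~core & ~singleton
    return {
        "core": int(core.sum()),
        "accessory": int(accessory.sum()),
        "singleton": int(singleton.sum()),
        "total": int((counts > 0).sum()),
    }
```

The core fraction is then result["core"] / result["total"].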

leedsomics · 9 months ago

PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis

arXiv:2409.12728v2 Announce Type: replace Abstract: Spatial multi-modal omics technology, highlighted by Nature Methods as an advanced biological technique in 2023, plays a critical role in resolving biological regulatory processes with spatial context. Recently, graph neural networks based on K-nearest neighbor (KNN) graphs have gained prominence in spatial multi-modal omics methods because they can model semantic relations between sequencing spots. However, a fixed KNN graph fails to capture latent semantic relations obscured by the inevitable data perturbations of the biological sequencing process, resulting in a loss of semantic information. In addition, the common lack of spot annotations and of priors on the number of classes further hinders the optimization of spatial multi-modal omics models in practice. Here, we propose a novel framework for spatial multi-modal omics analysis, termed PRototype-Aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis (PRAGA). PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrates spatial information and feature semantics. The learnable graph structure can also denoise perturbations by learning cross-modal knowledge. Moreover, we propose a dynamic prototype contrastive learning scheme, based on the dynamic adaptability of Bayesian Gaussian Mixture Models, to optimize the multi-modal omics representations when biological priors are unknown. Quantitative and qualitative experiments on simulated and real datasets against 7 competing methods demonstrate the superior performance of PRAGA. http://dlvr.it/TDZ9qC
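
For context, the fixed KNN graph that the abstract argues against can be built from spot coordinates as below. This NumPy sketch covers only that baseline construction, not PRAGA's learnable dynamic graph; it uses brute-force pairwise distances, which is fine for a few thousand spots but would need a spatial index for larger slides.

```python
import numpy as np


def knn_adjacency(coords, k):
    """Symmetric boolean k-nearest-neighbour adjacency over spot coordinates.

    coords: n_spots x 2 (or x d) array. Each spot is linked to its k nearest
    other spots; the graph is then symmetrised (i~j if either lists the other).
    """
    xy = np.asarray(coords, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-loops
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k closest spots
    A = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(len(xy)), k)
    A[rows, nn.ravel()] = True
    return A | A.T
```

Once built, this adjacency stays fixed during training, which is exactly the property the abstract identifies as a limitation when spot measurements are noisy.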

leedsomics · 9 months ago

Data-Independent Acquisition Mass Spectrometry as a Tool for Metaproteomics: Interlaboratory Comparison Using a Model Microbiome

Mass spectrometry (MS)-based metaproteomics is used to identify and quantify proteins in microbiome samples, most frequently with Data-Dependent Acquisition mass spectrometry (DDA-MS). However, DDA-MS is limited in its ability to reproducibly identify and quantify lower-abundance peptides and proteins. To address these deficiencies, proteomics researchers have started using Data-Independent Acquisition mass spectrometry (DIA-MS) for reproducible detection and quantification of peptides and proteins. We sought to evaluate the reproducibility and accuracy of DIA-MS metaproteomic measurements relative to DDA-MS using a mock community of known taxonomic composition. Artificial microbial communities of known composition were analyzed independently in three laboratories using both DDA-MS and DIA-MS acquisition methods. DIA-MS yielded more protein and peptide identifications than DDA-MS in each laboratory. In addition, its protein and peptide identifications were more reproducible in all laboratories, and it provided accurate quantification of proteins and taxonomic groups in the samples. We also identified some limitations of current DIA tools when applied to metaproteomic data, highlighting specific improvements needed for DIA tools to analyze metaproteomic datasets from complex microbiomes. Ultimately, DIA-MS represents a promising strategy for MS-based metaproteomics owing to the large number of proteins and peptides it detects, its reproducibility, its deep sequencing capabilities, and its accurate quantitation. http://dlvr.it/TDYzpz
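
One simple way to quantify how reproducible identifications are across laboratories is the Jaccard overlap of identified peptide (or protein) sets for each pair of labs, plus the set shared by all. This is an illustrative metric chosen for the sketch; the abstract does not say which reproducibility measures the study used, and lab names and peptide IDs are placeholders.

```python
from itertools import combinations


def pairwise_jaccard(id_sets):
    """Jaccard index |A & B| / |A | B| for every pair of labs.

    id_sets: dict mapping lab name -> set of identified peptide/protein IDs.
    Returns a dict keyed by (lab_a, lab_b) with lab names in sorted order.
    """
    out = {}
    for (lab_a, ids_a), (lab_b, ids_b) in combinations(sorted(id_sets.items()), 2):
        union = ids_a | ids_b
        out[(lab_a, lab_b)] = len(ids_a & ids_b) / len(union) if union else 1.0
    return out


def shared_by_all(id_sets):
    """IDs identified in every lab."""
    sets = list(id_sets.values())
    return set.intersection(*sets) if sets else set()
```

Computing these values separately for DDA-MS and DIA-MS results would give a direct, if coarse, comparison of between-lab consistency.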

leedsomics · 9 months ago

Characterizing human CMV-specific CD8+ T cells using multi-layer single-cell omics

In this study, we established a comprehensive workflow for collecting multi-omics single-cell data on a commercially available microwell-based platform. The workflow captured whole transcriptomes, cell surface markers (targeted sequencing-based cell surface proteomics), T cell specificities, and adaptive immune receptor repertoire (AIRR) profiles, and it supported sample multiplexing. With this technique, we identified novel paired T cell receptor sequences for three prominent human CMV epitopes. In addition, we assess the ability of dCODE dextramers to detect antigen-specific T cells at low frequencies by estimating their sensitivity and specificity when used as reagents for single-cell multi-omics. http://dlvr.it/TDXW4g
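
Sensitivity and specificity of a dextramer used as a detection reagent reduce to a confusion-matrix calculation once each cell has a dextramer call and a reference label. The sketch below assumes both are available as per-cell booleans; how the reference label is defined (for example, from a known TCR sequence) is an assumption, not something the abstract specifies.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for per-cell boolean calls.

    predicted: dextramer-positive call for each cell.
    truth: reference antigen specificity for each cell.
    Undefined ratios (no positives or no negatives in truth) return NaN.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

At low antigen-specific frequencies, true positives are rare, so the sensitivity estimate rests on few cells and its uncertainty should be reported alongside it.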

leedsomics · 9 months ago

Quantitative proteomics reveals differential extracellular vesicle cargo from M1 and M2 monocyte-derived human macrophages

Extracellular vesicles (EVs) mediate intercellular communication by carrying molecular cargo that facilitates diverse physiological processes. Macrophages, which play central roles in immune responses, release EVs that modulate various cellular functions. Given the distinct roles of the M1 and M2 macrophage states, understanding the proteomic profiles of their EVs is important for elucidating EV-mediated signalling and identifying potential biomarkers for diseases involving macrophage polarisation. We employed quantitative proteomics combined with bioinformatics to characterise the proteomic profile of EVs released by M1 and M2 monocyte-derived macrophages. We identified 1,731 proteins in M1/M2 EVs, 132 of which were significantly differentially abundant between M1 and M2. Proteomic data, together with pathway analysis, showed that M1/M2 macrophage EV cargo reflects its cellular source and may play roles in shaping immune responses: M1 EV cargo was associated with promotion of pro-inflammatory and antiviral functions, whereas M2 EV cargo was associated with immune regulation and tissue repair. M1 EV cargo was associated with cytokine/chemokine signalling pathways, DNA damage, methylation, and oxidative stress. M2 EV cargo was associated with macrophage alternative-activation signalling pathways, antigen presentation, and lipid metabolism. We also report that macrophage EVs carry metallothioneins and other related proteins involved in the response to metals and oxidative stress. http://dlvr.it/TDXNKy
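
Calling proteins "significantly differentially abundant" typically combines a per-protein statistical test with multiple-testing correction and an effect-size cut-off. As a hedged sketch, assuming you already have per-protein p-values from whatever test you use and mean intensities per condition, the function below applies Benjamini-Hochberg FDR adjustment and a log2 fold-change threshold; alpha = 0.05 and |log2FC| >= 1 are illustrative defaults, not values from the study.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(mean_m1, mean_m2, pvalues, alpha=0.05, min_abs_log2fc=1.0):
    """Indices of proteins with BH-adjusted p <= alpha and |log2(M1/M2)| >= cut-off.

    Mean intensities must be strictly positive for the log ratio.
    """
    adjusted = benjamini_hochberg(pvalues)
    hits = []
    for i, (a, b, q) in enumerate(zip(mean_m1, mean_m2, adjusted)):
        log2fc = math.log2(a / b)
        if q <= alpha and abs(log2fc) >= min_abs_log2fc:
            hits.append(i)
    return hits
```

Positive log2 fold changes then mark proteins enriched in M1 EVs and negative ones those enriched in M2 EVs, which is the split that downstream pathway analysis would use.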

leedsomics · 9 months ago

SoyOD: An Integrated Soybean Multi-omics Database for Mining Genes and Biological Research

Soybean is a globally important crop for food, feed, oil, and nitrogen fixation. A variety of multi-omics studies have generated datasets ranging from genotype to phenotype. To make use of these data, a soybean multi-omics database with broad data coverage and comprehensive analysis tools would be valuable for basic and applied research. We present the soybean omics database (SoyOD), which integrates significant new datasets with existing public datasets to form the most comprehensive collection of soybean multi-omics information. Compared with existing soybean databases, SoyOD incorporates an extensive collection of novel data derived from deep sequencing of 984 germplasms, 162 novel transcriptome datasets from seeds at different developmental stages, 53 phenotypic datasets, and over 2,500 phenotypic images. In addition, SoyOD integrates existing data resources, including 59 assembled genomes, genetic variation data from 3,904 soybean accessions, 225 sets of phenotypic data, and 1,097 transcriptomic sequences covering 507 different tissues and treatment conditions. SoyOD is novel in that it can be used to mine and analyze candidate genes for important agronomic traits, as shown in a case study on plant height. Additionally, powerful, easy-to-use analytical toolkits let users access the available multi-omics datasets and rapidly search genotypic and phenotypic data for a particular germplasm. The novelty, comprehensiveness, and user-friendly features of SoyOD make it a valuable resource for soybean molecular breeding and biological research. SoyOD is publicly accessible at https://bis.zju.edu.cn/soyod. http://dlvr.it/TDTzh4

leedsomics · 9 months ago

Assessing Reusability of Deep Learning-Based Monotherapy Drug Response Prediction Models Trained with Omics Data

arXiv:2409.12215v1 Announce Type: new Abstract: Cancer drug response prediction (DRP) models present a promising approach towards precision oncology, tailoring treatments to individual patient profiles. While deep learning (DL) methods have shown great potential in this area, models that can be successfully translated into clinical practice and that shed light on the molecular mechanisms underlying treatment response will likely emerge from collaborative research efforts. This highlights the need for reusable and adaptable models that can be improved and tested by the wider scientific community. In this study, we present a scoring system for assessing the reusability of DRP models and apply it to 17 peer-reviewed DL-based DRP models. As part of the IMPROVE (Innovative Methodologies and New Data for Predictive Oncology Model Evaluation) project, which aims to develop methods for the systematic evaluation and comparison of DL models across scientific domains, we analyzed these 17 models with a focus on three key categories: software environment, code modularity, and data availability and preprocessing. Although it was not the primary focus, we also attempted to reproduce key performance metrics to verify model behavior and adaptability. Our assessment reveals both strengths and shortcomings in model reusability. To promote rigorous practices and open-source sharing, we offer recommendations for developing and sharing prediction models. Following these recommendations can address many of the issues identified in this study, improving model reusability without adding significant burdens on researchers. This work offers the first comprehensive assessment of reusability and reproducibility across diverse DRP models, providing insights into current model-sharing practices and promoting standards within the DRP and broader AI-enabled scientific research community. http://dlvr.it/TDTmvN

leedsomics · 9 months ago

Untargeted and Semi-Targeted Metabolomics Approach for Profiling Small Intestinal and Fecal Metabolome Using High-Resolution Mass Spectrometry

The gut microbiome is a complex ecosystem that varies along different gut sections, and its metabolite pool includes compounds derived from food, the host, and microbes. Microbially derived metabolites such as bile acids and short-chain fatty acids interact with host physiology. Current studies often use fecal samples, which do not fully represent the upper gut because of stratification along the gastrointestinal tract. To sample the proximal gut microbiome, endoscopic methods or new non-invasive devices are used. We developed an approach combining untargeted and semi-targeted metabolomics using a Q Exactive Plus Orbitrap mass spectrometer to profile gut metabolites. We initially selected forty-nine key metabolites based on specific criteria, validated them through repeatability tests, and created a compound database with TraceFinder software. Our workflow enables molecule annotation in untargeted studies while validating thirty-seven metabolites in semi-targeted analyses. Applied to clinical trial samples, this method shows promise for discovering new gut metabolites. http://dlvr.it/TDTHSH
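
The repeatability step can be illustrated with a coefficient-of-variation (CV) filter over replicate measurements of each candidate metabolite. The 15% cut-off and the metabolite names used with it are hypothetical; the abstract does not state the criteria behind the forty-nine initial or thirty-seven validated metabolites.

```python
import statistics


def cv_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")
    return 100.0 * statistics.stdev(values) / mean


def repeatable_metabolites(replicates, max_cv=15.0):
    """Sorted names of metabolites whose replicate intensities have CV <= max_cv.

    replicates: dict mapping a metabolite name to its replicate intensities;
    entries with fewer than two replicates are skipped because SD is undefined.
    """
    return sorted(
        name
        for name, values in replicates.items()
        if len(values) >= 2 and cv_percent(values) <= max_cv
    )
```

Metabolites that pass such a filter would be the natural candidates for entries in a targeted compound database.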

leedsomics · 9 months ago

Muscle fiber proteomics reveals sex- and fiber type-specific adaptations to resistance training

Skeletal muscle hypertrophy is a hallmark of resistance training that positively impacts health and longevity. However, despite physiological differences between sexes and fiber types, the underlying proteome changes with resistance training have not been studied in a sex- and fiber type-specific manner. Herein, we show sex differences in the fiber type-specific proteome, predominantly in type II fibers. Following 8 weeks of resistance training, substantial remodeling of the human skeletal muscle proteome occurred in a sex- and fiber type-specific manner. Notably, type II fibers exhibited much greater adaptations in both sexes, whereas the main sex difference was greater remodeling of intermediate filaments in females. In addition, the baseline abundance of proteins involved in translation was highly correlated with fiber hypertrophy and differed between sexes and fiber types. Thus, translational capacity may partially explain differences in resistance training-induced hypertrophy. Our findings demonstrate key aspects of sex and fiber type differences in muscle physiology and their contributions to resistance training-induced adaptations. http://dlvr.it/TDT2fG

leedsomics · 9 months ago

Multi-task benchmarking of single-cell multimodal omics integration methods

Single-cell multimodal omics technologies have enabled the profiling of complex biological systems at a resolution and scale that were previously unattainable. These technologies have propelled the fast-paced development of data integration methods, creating a critical need for their systematic categorisation, evaluation, and benchmarking. Selecting the most pertinent integration approach poses a significant challenge, as the choice depends on the tasks relevant to the study goals and on the combination of modalities and batches present in the data at hand. Understanding how well each method performs multiple tasks, including dimension reduction, batch correction, cell type classification and clustering, imputation, feature selection, and spatial registration, and with which combinations of modalities and batches, will help guide this decision. This study aims to develop a much-needed guideline for choosing the most appropriate method for single-cell multimodal omics data analysis through a systematic categorisation and comprehensive benchmarking of current methods. http://dlvr.it/TDSpvb
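
For the clustering task, agreement between predicted clusters and annotated cell types is often scored with the adjusted Rand index (ARI). This standard-library sketch implements the usual contingency-table formula; it is one example metric, and the benchmark's actual metric set is not stated in the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to label names); values near 0 mean
    chance-level agreement. Degenerate inputs where the index is undefined
    (fewer than two cells, or no room above chance) return 1.0.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must have the same length")
    if n < 2:
        return 1.0
    pair_index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (pair_index - expected) / (max_index - expected)
```

Because ARI ignores label names, relabelled but identical clusterings score 1.0, while splitting one true cluster in two lowers the score (to 4/7 for the 4-cell example in the test).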

leedsomics · 9 months ago

DuReS: An R package for denoising experimental tandem mass spectrometry-based metabolomics data

Mass spectrometry-based untargeted metabolomics is a powerful technique for profiling small molecules in biological samples, yet accurate metabolite identification remains challenging. One of the primary obstacles in processing tandem mass spectrometry data is the prevalence of random noise peaks, which can result in false annotations and necessitate labor-intensive verification. A common method for removing noise from MS/MS spectra is intensity thresholding, where low-intensity peaks are discarded based on a user-defined cutoff or only the top "N" most intense peaks are kept. However, the optimal threshold is often dataset-specific, and thresholding may retain many noisy peaks. In this study, we hypothesize that, unlike random noise, true signal peaks consistently recur across replicate MS/MS spectra generated from the same precursor ion. An optimal recurrence frequency of 0.12 (95% CI: 0.087-0.15) was derived using an open-source metabolomics dataset; denoising at this frequency improved the dot product score between experimental and library spectra by 66% and resulted in median signal and noise reductions of 5.83% and 99.07%, respectively. Validated across multiple metabolomics datasets, our denoising workflow significantly improved spectral matching metrics, leading to more accurate annotations and fewer false positives. Freely available as an R package, Denoising Using Replicate Spectra (DuReS) (https://github.com/BiosystemEngineeringLab-IITB/dures) is designed to efficiently remove noise while retaining diagnostically significant peaks. It accepts mzML files and feature lists from standard global untargeted metabolomics analysis software as input, enabling users to integrate the denoising pipeline seamlessly into their workflow without additional data manipulation. http://dlvr.it/TDScDh
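
The replicate-recurrence idea can be sketched in a few lines. This Python illustration is not the DuReS R implementation: it assumes each replicate spectrum is a list of (m/z, intensity) pairs from the same precursor, defines a peak's recurrence frequency as the fraction of replicates containing a peak within a fixed m/z tolerance (0.01 Da here, a hypothetical value), and keeps peaks at or above the 0.12 frequency reported above.

```python
def denoise_by_recurrence(replicates, min_frequency=0.12, tol_da=0.01):
    """Filter replicate MS/MS spectra of one precursor by peak recurrence.

    replicates: list of spectra; each spectrum is a list of (mz, intensity).
    A peak is kept if a peak within tol_da of its m/z occurs in at least
    min_frequency of all replicate spectra (its own spectrum included).
    Returns the filtered spectra in the same order and structure.
    """
    n = len(replicates)
    if n == 0:
        return []
    cleaned = []
    for spectrum in replicates:
        kept = []
        for mz, intensity in spectrum:
            # Number of replicate spectra containing a matching peak
            support = sum(
                any(abs(mz - other_mz) <= tol_da for other_mz, _ in other)
                for other in replicates
            )
            if support / n >= min_frequency:
                kept.append((mz, intensity))
        cleaned.append(kept)
    return cleaned
```

With 10 replicates, a noise peak seen in only one spectrum has frequency 0.1 and is dropped at the 0.12 threshold, while a fragment present in every replicate is kept regardless of its intensity. The all-pairs comparison is quadratic and meant only to show the logic.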

leedsomics · 9 months ago

k-mer-based approaches to bridging pangenomics and population genetics

arXiv:2409.11683v1 Announce Type: new Abstract: Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes, and correctly aligning references to build pangenomes is challenging, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that $k$-mers are a crucial stepping stone to bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of $k$-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different $k$-mer-based measures of genetic variation behave in population genetic simulations according to the choice of $k$, depth of sequencing coverage, and degree of data compression. Overall, we find that $k$-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity ($\pi$) up to values of about $\pi = 0.025$ ($R^2 = 0.97$) for neutrally evolving populations. For populations with even more variation, using shorter $k$-mers maintains this scalability up to at least $\pi = 0.1$. Furthermore, in our simulated populations, $k$-mer dissimilarity values can be reliably approximated from counting Bloom filters, highlighting a potential avenue for decreasing the memory burden of $k$-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using $k$-mers. http://dlvr.it/TDQG72
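
The sketch below shows one way to compare k-mer profiles exactly and approximately. It uses Bray-Curtis dissimilarity on k-mer counts (our choice of measure, not necessarily one of those evaluated in the study) and a simple counting Bloom filter whose counter array stands in for the full k-mer table. The hashing scheme and filter size are hypothetical defaults. Because colliding k-mers share counters, the filter-based dissimilarity can only be less than or equal to the exact one, and per-k-mer count estimates can only be inflated, never deflated.

```python
import hashlib
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bray_curtis(u, v) -> float:
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    num = sum(abs(x - y) for x, y in zip(u, v))
    den = sum(x + y for x, y in zip(u, v))
    return num / den if den else 0.0


def exact_kmer_dissimilarity(seq_a: str, seq_b: str, k: int) -> float:
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    keys = sorted(set(ca) | set(cb))
    return bray_curtis([ca[x] for x in keys], [cb[x] for x in keys])


class CountingBloomFilter:
    """Fixed-size counter array; each item increments n_hashes counters."""

    def __init__(self, n_counters: int = 1 << 16, n_hashes: int = 4):
        self.counters = [0] * n_counters
        self.n_hashes = n_hashes

    def _positions(self, item: str):
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % len(self.counters)

    def add(self, item: str, count: int = 1) -> None:
        for pos in self._positions(item):
            self.counters[pos] += count

    def estimate(self, item: str) -> int:
        """Upper bound on the item's true count (collisions only inflate it)."""
        return min(self.counters[pos] for pos in self._positions(item))


def approx_kmer_dissimilarity(seq_a, seq_b, k, n_counters=1 << 16, n_hashes=4):
    """Bray-Curtis computed on the two filters' counter arrays."""
    fa = CountingBloomFilter(n_counters, n_hashes)
    fb = CountingBloomFilter(n_counters, n_hashes)
    for kmer, c in kmer_counts(seq_a, k).items():
        fa.add(kmer, c)
    for kmer, c in kmer_counts(seq_b, k).items():
        fb.add(kmer, c)
    return bray_curtis(fa.counters, fb.counters)
```

Memory for the filter is fixed by n_counters rather than by the number of distinct k-mers, which is the trade-off the abstract points to; a larger filter means fewer collisions and a closer approximation.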
