#Cell2Sentence
govindhtech · 2 months ago
Cell2Sentence: Understanding Single-Cell Biology With LLMs
C2S-Scale
Imagine asking a cell about its status, activities, or drug responses and getting a plain-English answer. Today, Yale University and Google announce Cell2Sentence-Scale (C2S-Scale), a set of open-source large language models trained to understand biology at the single-cell level.
C2S-Scale bridges biology and AI by distilling complex biological data into "cell sentences". Researchers can then ask questions about specific cells, such as "Is this cell cancerous?" or "How will this cell respond to Drug X?", and receive short, biologically grounded natural-language answers.
C2S-Scale could:
Accelerate drug discovery and development
Customise therapies for better outcomes
Democratise research through open-sourcing
Help researchers improve disease prevention, treatment, and understanding
Because C2S-Scale builds on extensive research into textual representations of cells and biological metadata, language-driven single-cell analysis with large language models opens up exciting new applications.
Single-cell RNA sequencing
Each of us has trillions of cells that build organs, fight infections, and deliver oxygen. Even within the same tissue, cells vary. Single-cell RNA sequencing (scRNA-seq) measures gene expression to reveal what each cell is doing.
However, single-cell data are massive, complicated, and hard to interpret. Each cell's gene expression is represented by thousands of values and analysed with specialised tools and models. As a result, single-cell analysis is slow, hard to scale, and largely restricted to experts.
Imagine translating those thousands of numbers into words that both language models and humans can understand. What if we could simply ask a cell how it is doing, what it is up to, and how it responds to treatment or disease? Understanding biological systems from cells to tissues could transform disease research, diagnosis, and treatment.
In "Scaling Large Language Models for Next-Generation Single-Cell Analysis", Google is excited to introduce Cell2Sentence-Scale (C2S-Scale), a family of powerful, open-source LLMs that can "read" and "write" biological data at the single-cell level. Below, we discuss single-cell biology, how cells become word sequences, and how C2S-Scale opens new pathways for biological research.
From cells to sentences
C2S-Scale converts each cell's gene expression profile into a "cell sentence": a list of the cell's most active genes, ranked by expression level. This allows natural language models such as Google's Gemini or Gemma to be applied to scRNA-seq data.
For Google, language-based interfaces make single-cell data more accessible, interpretable, and flexible. Much of biology is already described in text, such as gene names, cell types, and experimental metadata, so LLMs are well placed to analyse and reason about this data.
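The ranking step behind a cell sentence is simple enough to sketch. The toy function below is a minimal illustration, not the actual C2S-Scale preprocessing pipeline; the gene names and expression values are made up. It ranks genes by expression level and keeps the names of the top-k expressed genes:

```python
def to_cell_sentence(expression: dict[str, float], top_k: int = 5) -> str:
    """Convert a gene -> expression mapping into a rank-ordered cell sentence."""
    # Sort genes from most to least expressed, drop unexpressed genes,
    # and join the top-k gene names into a space-separated "sentence".
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    return " ".join(gene for gene, value in ranked[:top_k] if value > 0)

# Illustrative T-cell-like expression profile (values are invented).
profile = {"MALAT1": 15.8, "CD3D": 12.0, "CD8A": 9.5,
           "GZMB": 7.2, "IL7R": 3.1, "HBB": 0.0}
print(to_cell_sentence(profile))  # MALAT1 CD3D CD8A GZMB IL7R
```

The resulting string is ordinary text, which is exactly what lets a general-purpose LLM consume it.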
The C2S-Scale model family
C2S-Scale extends Google's Gemma family of open models for biological reasoning, using careful data engineering and prompts that combine cell sentences, metadata, and other biological context. Because the core LLM architecture is unchanged, C2S-Scale can leverage the large ecosystem, scalability, and infrastructure of general-purpose language models. The result is a family of LLMs trained on over 1 billion tokens drawn from scientific literature, biological metadata, and real-world transcriptomic datasets.
The C2S-Scale series, spanning 410 million to 27 billion parameters, was built to meet a range of scientific needs. Smaller models are easier to deploy and adapt with limited computational resources, making them ideal for exploratory studies or low-resource environments. Larger models perform better on many biological tasks but require more compute. This range of sizes lets users choose the model that best fits their use case, balancing computation, speed, and performance. Every model will be open source for development and use.
What can C2S-Scale do?
Biology chat: Single-cell data questions answered
Imagine someone asking, “How will this T cell react to anti-PD-1 therapy, a common cancer treatment?”
C2S-Scale models can answer in plain language, drawing on the cellular data and on biological knowledge acquired during pre-training, as seen on the left. The right image shows how this kind of conversational analysis lets researchers interact with their data in natural language in a way that was previously impossible.
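A query like this is ultimately just a text prompt that combines a cell sentence, metadata, and the question. The exact prompt format C2S-Scale uses is not given in the post; the layout below is a hypothetical sketch to illustrate the idea:

```python
def build_query(cell_sentence: str, cell_type: str, question: str) -> str:
    """Assemble a hypothetical chat prompt about a single cell."""
    # The field names and ordering here are assumptions for illustration,
    # not the documented C2S-Scale prompt template.
    return (
        f"Cell type: {cell_type}\n"
        f"Cell sentence: {cell_sentence}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_query("CD3D CD8A GZMB PDCD1", "T cell",
                     "How will this T cell react to anti-PD-1 therapy?")
print(prompt)
```

Any instruction-tuned causal LLM could then complete the `Answer:` field, which is what makes the conversational interface possible.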
Use natural language to interpret data
Cell2Sentence-Scale can automatically summarise scRNA-seq data at various levels of complexity, from classifying cell types to summarising whole tissues or experiments. This helps researchers quickly and reliably make sense of new datasets without writing complex analysis code.
Biological scaling laws
Google's key finding is that biological language models scale well and behave predictably with model size. Larger C2S-Scale models perform better on biological tasks, including generating cells and tissues and classifying cell types.
In the parameter-efficient fine-tuning regime, semantic similarity scores for dataset interpretation increased consistently with model size. With full fine-tuning, tissue gene overlap improved substantially as the models scaled to 27 billion parameters. This pattern mirrors general-purpose LLMs and suggests that biological LLMs will keep improving with more data and compute, yielding ever more powerful and broadly applicable tools for biological discovery.
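As a rough illustration of what a metric like tissue gene overlap measures, the toy function below computes the fraction of genes in a generated cell sentence that also appear in a reference sentence. The actual evaluation pipeline is not described in the post, and the gene names are invented:

```python
def gene_overlap(generated: str, reference: str) -> float:
    """Fraction of generated genes that also occur in the reference sentence."""
    # Cell sentences are space-separated gene symbols, so set intersection
    # gives a simple overlap score in [0, 1].
    gen, ref = set(generated.split()), set(reference.split())
    return len(gen & ref) / len(gen) if gen else 0.0

print(gene_overlap("CD3D CD8A GZMB IL7R", "CD3D CD8A CCL5 NKG7"))  # 0.5
```

A larger model producing cell sentences closer to real tissue profiles would score higher on a metric of this shape.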
Predicting cell fate
One noteworthy use of Cell2Sentence-Scale is predicting a cell's response to a perturbation such as a drug, a gene deletion, or cytokine exposure. Given a baseline cell sentence and a description of the treatment, the model can generate the expected post-perturbation gene expression sentence.
This ability to reproduce biological activity in silico accelerates drug discovery and personalised therapy, and helps prioritise experiments before they are run in the lab. C2S-Scale advances the construction of realistic "virtual cells", which may replace conventional cell lines and animal models with something faster, cheaper, and more ethical.
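Once a post-perturbation cell sentence has been generated, comparing it with the baseline gives a rough readout of the predicted response. The sketch below, using made-up gene names rather than real model output, diffs the two top-gene sets:

```python
def diff_sentences(baseline: str, perturbed: str) -> tuple[set[str], set[str]]:
    """Report which genes entered or left the top-expressed set."""
    base, pert = set(baseline.split()), set(perturbed.split())
    gained = pert - base  # genes newly among the most expressed
    lost = base - pert    # genes that dropped out of the top ranks
    return gained, lost

# Illustrative naive-T-cell baseline vs. a hypothetical activated state.
gained, lost = diff_sentences("CD3D IL7R CCR7 SELL", "CD3D GZMB IFNG CCR7")
print(sorted(gained), sorted(lost))  # ['GZMB', 'IFNG'] ['IL7R', 'SELL']
```

In a real workflow this kind of diff would be a starting point for ranking candidate perturbations before wet-lab validation.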
Reinforcement learning optimisation
Just as reinforcement learning refines large language models like Gemini to follow instructions and respond in human-aligned ways, Google uses it to improve Cell2Sentence-Scale models for biological reasoning. Training with semantic text-evaluation reward functions such as BERTScore teaches C2S-Scale to deliver informative, biologically accurate answers that match the dataset, guiding the model towards scientifically useful responses in complicated tasks such as simulating therapeutic treatments.
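BERTScore itself needs a pretrained encoder, but the shape of such a semantic reward is easy to illustrate. The sketch below substitutes a simple token-level F1 between a model response and a reference answer as a lightweight stand-in; it is not the reward actually used for C2S-Scale:

```python
def token_f1_reward(response: str, reference: str) -> float:
    """Toy reward: token-set F1 between a response and a reference answer."""
    resp, ref = set(response.lower().split()), set(reference.lower().split())
    common = len(resp & ref)
    if common == 0:
        return 0.0
    precision = common / len(resp)  # how much of the response is relevant
    recall = common / len(ref)      # how much of the reference is covered
    return 2 * precision * recall / (precision + recall)

print(token_f1_reward("the cell activates GZMB",
                      "the cell upregulates GZMB"))  # 0.75
```

An RL loop would call a function of this shape on each generated answer and use the score as the policy-gradient reward; swapping in BERTScore would replace the set intersection with embedding-based token matching.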
Try it
Cell2Sentence resources and models are now available on GitHub and Hugging Face. We encourage you to explore these tools, experiment with your own single-cell data, and discover what is possible when machines are taught the language of life, one cell at a time.
thejestglobe · 1 month ago
Google launches soporific AI: science becomes the new sleeping pill
Google launches Cell2Snooze, the AI that turns every cell into a soporific novel

An unexpected scientific breakthrough

Google recently announced Cell2Snooze, a new artificial intelligence derived from its Cell2Sentence technology, originally designed to describe the finest details of human cells from biological samples. According to early accounts, this revolutionary model is so effective that its interminable explanations plunge readers instantly into a drowsiness so deep that no coffee can counter it. Google's engineers insist this is not a bug but a key feature: "We wanted to make biology accessible to everyone, even people who don't like complete textbooks," they explain with their legendary seriousness. Officially, the goal is to reconcile science with the general public, even if most users admit to giving up before reaching the third sentence.

An unsuspected remedy for insomnia

Faced with the proliferation of meditation and evening-yoga apps, Google seems to have found the royal road to sleep: drowning the user in cellular details more minute than any medical encyclopedia. "In just two minutes I knew everything about the plasma membrane and, honestly, I had to fight to keep my eyes open," confides Dr Bâillement, a specialist in neurobiology. He adds that the tool's therapeutic potential already looks promising, since a handful of volunteers suffering from chronic insomnia managed to fall asleep in the middle of a sentence describing the oxidation of mitochondria.

In this unusual quest blending cutting-edge technology and digital sleep aid, Google seems to have achieved the unthinkable: turning scientific boredom into an unbeatable relaxation solution.