AI that sheds light on living organisms: at CBIO, deciphering biology to improve healthcare

Published on 19 February 2026
Understanding how cells work, accelerating the discovery of new treatments, and tailoring medicine to each patient are among the major challenges facing the life sciences today. At the Center for Bioinformatics (CBIO) at Mines Paris – PSL, artificial intelligence (AI) is a key scientific lever for addressing these issues. This work takes place in an international context that promotes AI for the benefit of people, the planet, and progress, as exemplified by the India AI Impact Summit 2026, held in New Delhi from February 16 to 20.
Led by Professor Thomas Walter, the CBIO develops advanced machine learning methods capable of analyzing massive and heterogeneous biological data. Presented at the AI Workshop held at Mines Paris – PSL on December 10, 2025, this work illustrates how the center is mobilizing AI in the service of fundamental research, drug discovery, and precision medicine.

AI at the heart of life sciences

The research conducted at CBIO lies at the intersection of mathematics, computer science, biology, and medicine. Its mission is clear: to develop machine learning and AI methods capable of making sense of contemporary biological data, which is growing in both quantity and complexity.

This work addresses three major scientific objectives:

  • Advancing fundamental research by understanding how cells function, how genes interact, and how genomes evolve.
  • Contributing to drug discovery by identifying relevant biological targets and molecules capable of interacting with them.
  • Developing precision medicine by offering diagnostics and treatments tailored to each patient’s specific biological characteristics.

To achieve these objectives, the CBIO relies on a team of permanent researchers with complementary expertise: Chloé-Agathe Azencott, Éloïse Berson, Florian Massip, Vincent Mallet, Véronique Stoven, and Thomas Walter. The CBIO has a strategic partnership with the Institut Curie, a research center and hospital dedicated to cancer, and is affiliated with the “Computational Oncology” unit, which enables it to tackle complex problems and respond to specific clinical needs in oncology.

Biological data

Very large-scale data

Being able to predict the progression of a disease or the effectiveness of a treatment is now a major challenge in oncology. AI offers powerful methods for integrating multiple data sources and producing such predictions. However, a key obstacle to these approaches lies in the very nature of the data being analyzed. We refer to data as very high-dimensional when a very large number of variables are measured for a limited number of patients or samples.

For example, in genomic studies, the objective is to identify statistical links between mutations, which often number in the millions, and a clinical phenotype. In transcriptomics, the expression of approximately 20,000 genes is analyzed. In both cases, the number of patients from which AI models can learn is typically several orders of magnitude smaller than the number of variables measured, making traditional analysis methods fragile and sometimes misleading.
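To make this fragility concrete, here is a minimal simulation with illustrative sizes and pure-noise data (not a CBIO dataset or method). When there are more variables than patients, ordinary least squares can fit the training data perfectly even though there is no signal at all, yet it predicts new patients no better than chance.

```python
import numpy as np

rng = np.random.default_rng(0)

n_patients, n_genes = 50, 2000            # illustrative: far more variables than samples
X_train = rng.normal(size=(n_patients, n_genes))
y_train = rng.normal(size=n_patients)     # pure noise: no real link to X
X_test = rng.normal(size=(n_patients, n_genes))
y_test = rng.normal(size=n_patients)

# For an underdetermined system, lstsq returns the minimum-norm exact fit
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def r2(y, y_hat):
    """Coefficient of determination (1.0 = perfect fit)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

train_r2 = r2(y_train, X_train @ coef)
test_r2 = r2(y_test, X_test @ coef)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The near-perfect training score is entirely spurious, which is why the strategies below constrain or regularize the learning problem.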

The CBIO therefore adopts several complementary strategies to address this problem:

  • Integration of prior biological knowledge: genes and proteins rarely function alone; they interact within complex networks. These networks can be used to guide the learning process and make it more robust.
  • Multitask learning: rather than studying one problem at a time, models are trained on several related tasks, such as different types of disease. This exploits synergies between the tasks by pooling information shared across multiple datasets.
  • Nonlinear models: these are capable of capturing complex relationships. In biology, the effect of a gene often depends on the activity of other genes. These interactions, which are difficult to detect with simple tools, can be revealed using more sophisticated AI models.
  • Rigorous statistical guarantees: dedicated methods make it possible to rigorously estimate the risk of false discoveries. In practice, this means that researchers can more reliably distinguish real biological signals from mere coincidences due to chance.

These approaches do not merely enable prediction; above all, they help identify the biological mechanisms actually involved, which is a key issue for biomedical research.
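As an illustration of what controlling false discoveries means in practice, the sketch below implements the Benjamini-Hochberg procedure, a standard way of controlling the false discovery rate across many tests. The article does not say which methods CBIO uses; this is a generic textbook example, and the p-values are made up for illustration.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of discoveries under the Benjamini-Hochberg procedure.

    Controls the expected false discovery rate at level `alpha`
    (valid for independent or positively dependent tests).
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                    # indices from smallest to largest p-value
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m   # (k / m) * alpha for rank k
    passing = np.nonzero(ranked <= thresholds)[0]
    discoveries = np.zeros(m, dtype=bool)
    if passing.size > 0:
        k = passing[-1]                      # largest rank satisfying the condition
        discoveries[order[: k + 1]] = True   # reject all hypotheses up to that rank
    return discoveries

# Illustrative p-values for 10 hypothetical genes
p_values = [0.042, 0.001, 0.205, 0.008, 0.039, 0.060, 0.216, 0.041, 0.074, 0.212]
mask = benjamini_hochberg(p_values, alpha=0.05)
print("kept by BH:", np.nonzero(mask)[0])                   # kept by BH: [1 3]
print("naive p < 0.05:", sum(p < 0.05 for p in p_values))   # naive p < 0.05: 5
```

A naive 5% cutoff would flag five genes; the FDR-controlled procedure keeps only the two with the strongest evidence. With millions of tested variables, this difference determines whether a list of "hits" is mostly signal or mostly chance.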

AI in the service of digital pathology

One of CBIO’s most visible areas of research is computational pathology, which uses AI to analyze tissue images from histopathology, the discipline that examines tissue under the microscope for diagnostic purposes. These images, obtained from stained tissue sections, are now digitized as whole-slide images that can reach several gigabytes in size.

The challenge is to automatically extract molecular or clinical information from tissue morphology alone.

CBIO researchers have developed methods capable of:

  • Predicting genetic mutations from tumor images.
  • Identifying subtypes of cancers associated with different prognoses or responses to treatment.
  • Highlighting intra-tumor heterogeneity, i.e., the coexistence of biologically distinct regions within the same tumor.
  • Producing diagnostic support tools that incorporate the confidence estimates essential for clinical use.

In concrete terms, each image is divided into thousands of small regions (tiles), which are analyzed by AI models pre-trained on millions of medical images; the results are then aggregated to produce a patient-level prediction. This work paves the way for faster, less expensive, and more accessible oncology diagnostics, without the need for systematic, labor-intensive molecular analyses.

In all of this work, the CBIO collaborates closely with hospital pathology departments, notably at the Institut Curie: with Anne Vincent-Salomon, Director of the IHU Cancers des Femmes (Institut Curie – Université PSL – Inserm) and chief pathologist at the Pôle de Médecine diagnostique et théranostique (PMDT), and with Yves Allory, head of the “Molecular Oncology” research team within the Cell Biology and Cancer Unit (UMR144) and head of the Pathology Department.
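The tile-then-aggregate pipeline described above can be sketched as follows. This is a toy illustration under loud assumptions: `embed_tiles` is a hypothetical placeholder for a pre-trained image encoder (it just computes per-channel color statistics), the attention and output weights are random rather than learned, and the "slide" is random pixels. Real pipelines use deep encoders and train the aggregation step.

```python
import numpy as np

def tile_image(slide, tile=224):
    """Split an (H, W, 3) image into non-overlapping tile x tile patches (edges dropped)."""
    h, w, c = slide.shape
    n_rows, n_cols = h // tile, w // tile
    cropped = slide[: n_rows * tile, : n_cols * tile]
    tiles = cropped.reshape(n_rows, tile, n_cols, tile, c).swapaxes(1, 2)
    return tiles.reshape(-1, tile, tile, c)      # row-major order of tiles

def embed_tiles(tiles):
    """Hypothetical stand-in for a pre-trained encoder: per-channel mean and std."""
    flat = tiles.reshape(tiles.shape[0], -1, tiles.shape[-1]).astype(float)
    return np.concatenate([flat.mean(axis=1), flat.std(axis=1)], axis=1)

def attention_pool(features, w_att):
    """Attention-style pooling: softmax-weighted average of tile features."""
    scores = features @ w_att
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ features, weights

def predict_patient(slide, w_att, w_out, b_out=0.0):
    """Tile, embed, pool, then apply a logistic output to get one patient-level score."""
    features = embed_tiles(tile_image(slide))
    slide_vector, weights = attention_pool(features, w_att)
    logit = slide_vector @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit)), weights

rng = np.random.default_rng(0)
slide = rng.integers(0, 256, size=(1000, 1500, 3), dtype=np.uint8)  # synthetic "slide"
w_att = rng.normal(scale=0.01, size=6)   # random, untrained weights (illustration only)
w_out = rng.normal(scale=0.01, size=6)
prob, weights = predict_patient(slide, w_att, w_out)
print(f"{len(weights)} tiles, patient-level score = {prob:.3f}")
```

Attention weights are one common way to aggregate tiles because they also indicate which regions drove the prediction, which relates to the goal of highlighting intra-tumor heterogeneity; simple mean pooling is the baseline alternative.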

Linking form and function: the rise of spatial transcriptomics

Another major advance concerns spatial transcriptomics, a recent technology that makes it possible to observe not only which genes are expressed in a tissue, but also exactly where they are expressed. In concrete terms, this approach associates each location in a tissue with measurements of the activity of thousands of genes (sometimes up to 20,000 at a time) while preserving the exact spatial position of each measurement. It thus offers a detailed view of tissue architecture, linking the structure of a tissue to its molecular functioning.

These data are particularly rich, but they are also complex and costly to produce and analyze. They raise new computational challenges, such as cellular deconvolution, which involves determining which cell types, and in what proportions, contribute to the signal measured in a given area, or the integration of several types of data, such as tissue images and molecular measurements. Another challenge is to establish predictive links between modalities, so that one type of measurement can be used to infer another.
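Cellular deconvolution can be illustrated with one classic formulation: model the expression measured at a spot as a non-negative mixture of reference cell-type signatures, and solve for the mixture weights with non-negative least squares. This is a minimal sketch, not CBIO's method; the cell types and signature matrix are hypothetical, and the spot is noise-free for clarity.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
n_genes = 200
cell_types = ["tumor", "immune", "stromal"]   # hypothetical cell types

# Hypothetical reference signatures: mean expression of each gene in each cell type
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, len(cell_types)))

# Simulated spot: a mixture of the three cell types (noise-free, arbitrary total scale)
true_props = np.array([0.6, 0.3, 0.1])
spot = signatures @ true_props * 50.0

coef, residual = nnls(signatures, spot)       # mixture weights constrained to be >= 0
estimated_props = coef / coef.sum()           # rescale so proportions sum to 1
for name, p in zip(cell_types, estimated_props):
    print(f"{name}: {p:.3f}")
```

The non-negativity constraint encodes the fact that a cell type cannot contribute a negative amount of signal. Real data add measurement noise, platform effects, and imperfect reference signatures, which is what makes the problem computationally challenging.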

To address these challenges, researchers at CBIO and the Systems Biology team at the Institut Curie, led by Emmanuel Barillot, have developed AI models capable of predicting gene expression at the cellular level based on the visual appearance of tissue observed under a microscope. These models can estimate highly detailed molecular information from routine clinical examinations, without systematically resorting to heavy and costly technologies.
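The models described here are considerably more sophisticated, but the underlying framing (image-derived features in, per-cell gene expression out) can be shown with a deliberately simple stand-in: a multi-output ridge regression on synthetic data. The per-cell "morphology features" and the linear link to expression are assumptions made purely for illustration.

```python
import numpy as np

def fit_ridge(X, Y, lam=1.0):
    """Closed-form multi-output ridge regression: W = (X^T X + lam I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(1)
n_cells, n_features, n_genes = 600, 32, 10
W_true = rng.normal(size=(n_features, n_genes))
X = rng.normal(size=(n_cells, n_features))                  # hypothetical per-cell image features
Y = X @ W_true + 0.1 * rng.normal(size=(n_cells, n_genes))  # synthetic expression values

train, test = slice(0, 500), slice(500, None)
W_hat = fit_ridge(X[train], Y[train], lam=1.0)
Y_pred = X[test] @ W_hat

# Evaluate each gene separately on held-out cells
per_gene_r = [np.corrcoef(Y[test][:, g], Y_pred[:, g])[0, 1] for g in range(n_genes)]
print(f"mean per-gene correlation on held-out cells: {np.mean(per_gene_r):.3f}")
```

Reporting a per-gene correlation on held-out cells mirrors how such predictors are usually judged: some genes are far more predictable from morphology than others, so a single global score can hide that variation.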

These approaches open up prospects for integrative spatial biology combining images, genetic data, and clinical information. They are particularly promising for better understanding complex diseases such as cancer, where the spatial organization of cells plays a key role in disease progression and response to treatment.

AI to connect all scales of life, from proteins to patients

CBIO research also covers structural biology, which focuses on the three-dimensional shape of proteins and RNAs. Thanks to AI, it is now possible to:

  • Learn detailed geometric representations of these biomolecules.
  • Predict their interactions with other proteins or small molecules, a key issue in drug discovery.
  • Anticipate host-pathogen interactions, which is particularly useful in the face of emerging viruses.
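A basic property of the geometric representations mentioned above is that they should not change when a molecule is rotated or moved. The sketch below shows the simplest such representation, a pairwise distance matrix and the derived contact map, on a hypothetical backbone trace (a random walk with 3.8 Å steps, the typical spacing between consecutive C-alpha atoms). The 8 Å contact cutoff is a commonly used convention, assumed here; this is not a CBIO model.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances between points given as an (N, 3) array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def contact_map(coords, cutoff=8.0):
    """Boolean (N, N) map of residue pairs closer than `cutoff` (self-pairs excluded)."""
    contacts = distance_matrix(coords) < cutoff
    np.fill_diagonal(contacts, False)
    return contacts

def random_rotation(rng):
    """Random 3x3 rotation matrix built from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs for a well-defined result
    if np.linalg.det(q) < 0:         # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(3)
# Hypothetical backbone trace: random walk with 3.8 Å steps
steps = rng.normal(size=(50, 3))
steps = 3.8 * steps / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)

# Rotate and translate the whole molecule
moved = coords @ random_rotation(rng).T + np.array([10.0, -5.0, 2.0])

print(np.allclose(distance_matrix(coords), distance_matrix(moved)))   # True
print(contact_map(coords).sum() // 2, "contacts")
```

Learned geometric models go well beyond raw distances, but they are typically built to respect the same invariance, so that a prediction about a protein or RNA does not depend on how its coordinates happen to be oriented.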

At the same time, the center is developing methods to reconcile statistical AI models with mechanistic models of biology, in order to move from correlation to an understanding of the underlying biological mechanisms.

The AI Workshop: fostering dialogue within a research ecosystem

This work was highlighted during the AI Workshop held in December 2025 at Mines Paris – PSL. Designed as an opportunity for internal exchange, the event allowed faculty, doctoral students, and engineers to present their projects, tools, and platforms through oral presentations and posters. Beyond the diversity of topics, the workshop highlighted a common dynamic: building AI rooted in reality, integrating it into complex systems, and tackling major societal challenges.

Towards more precise and explainable medicine

By developing robust, explainable AI rooted in biological reality, CBIO is helping to transform the way life sciences are studied and applied. From fundamental research to the discovery of treatments and precision medicine, its work illustrates how AI can become a tool for understanding living organisms, in the service of concrete and responsible medical innovation.

To go further

  • Loïc Chadoutaud, Marvin Lerousseau, Daniel Herrero-Saboya, Julian Ostermaier, Jacqueline Fontugne, et al. sCellST predicts single-cell gene expression from H&E images. Nature Communications, 2026, 17 (1), pp. 1194. ⟨10.1038/s41467-025-67965-1⟩. ⟨hal-05502878⟩
  • Tristan Lazard, Guillaume Bataillon, Peter Naylor, Tatiana Popova, François-Clément Bidard, et al. Deep Learning identifies new morphological patterns of Homologous Recombination Deficiency in luminal breast cancers from whole slide images. 2021. ⟨hal-03533688⟩
  • Asma Nouira, Chloé-Agathe Azencott. Sparse multitask group lasso for genome-wide association studies. PLoS Computational Biology, 2025, 21 (9), pp. e1012734. ⟨10.1371/journal.pcbi.1012734⟩. ⟨hal-04871066⟩
  • Juan G Carvajal-Patiño, Vincent Mallet, David Becerra, Luis Fernando Niño Vasquez, Carlos Oliver, et al. RNAmigos2: accelerated structure-based RNA virtual screening with deep graph learning. Nature Communications, 2025, 16 (1), pp. 2799. ⟨10.1038/s41467-025-57852-0⟩. ⟨hal-05418823⟩
