Special Sessions

SS1: Multi-modal Agents for Visual Analysis and Generation
Abstract: As artificial intelligence advances toward Artificial General Intelligence (AGI), the complexity of task environments grows. Relying on a single large-scale model for decision-making can lead to suboptimal or irrational outcomes, especially in high-stakes fields such as medicine, security, robotics, and autonomous driving. Multi-modal agents, which integrate diverse modalities, build a richer, more nuanced representation of the world, leading to more robust and adaptable systems. These agents can perceive, reason, and act in complex environments, offering more comprehensive and accurate reasoning and content generation, which is critical for high-reliability systems across diverse sectors. They are especially useful in complex tasks that require understanding and interaction across different types of data.

Significance: A multi-modal agent is an intelligent system capable of simultaneously processing and integrating information from multiple modalities to make better decisions and generate richer outputs. By interpreting and generating content across modalities, these systems enable richer, more personalized interactions. In the context of visual analysis and generation, multi-modal agents can combine visual inputs with textual data to enhance interpretation and create richer content. This approach is particularly valuable for tasks that require cross-modal integration, such as diagnosing plant diseases from images and text. With their ability to handle complex environments, multi-modal agents are poised to drive innovation in industries such as healthcare, agriculture, and entertainment, ensuring more reliable and comprehensive decision-making.

Solutions: This session will address key challenges in multi-modal agents for visual understanding and generation, such as cross-modal fusion, real-time processing, and maintaining contextual coherence. We will explore effective integration of visual and textual data for more accurate and meaningful outputs, and how to optimize multi-modal learning for enhanced understanding and content generation. By tackling these issues, we aim to advance the development of more robust and capable multi-modal agents.

Organizers: Qi Wang, Guizhou University; Wu Liu, University of Science and Technology of China; Xinchen Liu, JD Explore Academy; Ran Yi, Shanghai Jiao Tong University; Jiebo Luo, University of Rochester

SS2: Green Visual Coding (GVC)
Abstract: In a context characterized by the increasing diversity of high-definition formats for visual data representation, the development of advanced compression solutions has become a critical necessity. This special session aims to bridge technological innovation with ecological responsibility, tackling a key challenge in an era where sustainability and energy efficiency are of utmost importance. To this end, the session will delve into recent advancements in visual data compression, with a particular focus on strategies that optimize the computational complexity of encoding and decoding processes, while minimizing energy consumption and reducing environmental impact. As this domain rapidly evolves, it is attracting significant interest from both the academic research community and industrial stakeholders. Accordingly, the session will serve as a platform to present cutting-edge contributions, fostering progress in this strategically vital and swiftly expanding field. Topics will include, but are not limited to, algorithmic optimization, hardware acceleration, and energy-efficient encoding/decoding techniques.

Organizers: Anissa Mokraqui, L2TI, Université Sorbonne Paris Nord; Alexandre Mercat, Ultra Video Group, Tampere University, Finland; Christian Herglotz, BTU, Germany

SS3: Emerging Trends in Visual Food Analysis
Abstract: Analysis of food-related visual data has emerged as a rapidly evolving research domain with significant implications for food quality assessment, composition analysis, and safety inspection. Despite recent advances in image processing and computer vision, food-related visual data presents unique technical challenges that demand specialized solutions, including (i) complex variations in food appearance, texture, and fine-grained categories, (ii) multi-modal image fusion needs in food quality inspection, combining RGB, hyperspectral, X-ray, and thermal imaging data, (iii) image enhancement requirements for food production environments with varying lighting, motion blur, and occlusions, (iv) specialized approaches for analyzing complex food surfaces with multiple layers or reflective properties, and (v) quantitative analysis requirements for extracting food nutritional, structural, and safety-related properties from image data. These challenges are further compounded by varying imaging protocols, complex food compositions, and diverse presentation styles across different cultural contexts.

With recent advances in artificial intelligence (AI) and computer vision, particularly the emergence of generative AI techniques and advanced sensing modalities, new trends in addressing these complex challenges have emerged, introducing new possibilities to advance food image analysis. This special session aims to explore emerging trends in visual data analysis for food-related applications in the real world. By bringing together researchers across image processing, computer vision, food science, and industry applications, we expect this convergence of traditional image processing and emerging AI technologies to enable breakthrough solutions in food quality assessment, safety inspection, nutrition analysis, and broader food-related applications.

Topics of interest include, but are not limited to:

  • Food image and video generation using generative models
  • Multi-modal food imaging systems
  • Image restoration and enhancement in food production
  • Fusion of RGB, spectral, and thermal food images
  • X-ray and CT imaging for internal quality inspection
  • Texture and color analysis for food quality assessment
  • Food nutritional composition analysis
  • Large foundation models for food understanding

Organizers: Jiangpeng He, Massachusetts Institute of Technology; Yuhao Chen, University of Waterloo; Fengqing Zhu, Purdue University

SS4: Computational Imaging for Materials and Microscopy
Abstract: Materials Science is considered one of the top growth areas for Machine Learning. Being highly dependent on microscope observations, Materials Science is also a natural fit with Imaging and Signal Processing: the major user centers for bright source tomography (Brookhaven, Oak Ridge, Argonne, for example) are now investing in Imaging Science in order to analyze the historically large volumes of image data produced in these centers. This session brings together cross-disciplinary researchers who work between the data intensive fields of Computational Imaging and Machine Learning with the scientific fields of Materials Science and Microscopy. Being naturally based in physics, the Materials Science/Computational Imaging collaboration represents a significant growth area in the foreseeable future.

Organizers: Jeff Simmons, Air Force Research Laboratory; S.V. Venkatakrishnan, Oak Ridge National Laboratory; Benjamin Berkels, RWTH Aachen University; Yuejie Chi, Carnegie Mellon University

SS5: Emerging Immersive Video Coding
Abstract: Immersive video has rapidly attracted attention from academia and industry for its capability to create realistic, multi-sensory user experiences. Immersive video technology redefines how visual data can be perceived, introducing a new level of engagement between users, or between a user and the environment rendered around them. Several immersive representation formats, e.g., point clouds, meshes, volumetric video, and 3D Gaussian Splatting, provide users with an unparalleled sense of presence and interactivity. They are transforming applications in virtual reality, augmented reality, gaming, telemedicine, and remote collaboration. In contrast to traditional 2D video, immersive video allows a user to navigate, explore, and interact within a virtual or augmented world. With the shift from traditional to immersive video, an entirely new mode of perception and interaction is being realized.

Alongside these new user experiences, the exponential growth of data generated by immersive video poses substantial challenges for efficient storage, transmission, and processing. Emerging immersive video often requires modeling 3D objects or 3D scenes by conveying finely detailed 3D spatial structure and texture information to deliver realistic pictures. The resulting data volume, which far exceeds that of traditional images and video, poses formidable challenges. In addition, low latency is essential for interactive immersive applications, and optimizing the trade-off between performance and latency is nontrivial. For example, efficient compression and processing must comply with hardware constraints to enable successful deployment of the next wave of immersive video innovations.

This special session seeks original research work that investigates the key problems in immersive video. The topics intersect various disciplines, including computer vision, machine learning, and compression. The session will foster collaboration among researchers from diverse backgrounds, leading to innovative research directions.

Organizers: Li Zhang, Bytedance Inc.; Xinfeng Zhang, University of Chinese Academy of Sciences; Dong Tian, InterDigital

SS6: Generative Image Models: Methods, Datasets, and Applications
Abstract: This special session is conceived as a cutting-edge academic forum focusing on innovative methods and datasets for generative image models across various applications. Attendees will gain insights into the latest advancements and challenges in these rapidly evolving applications.

The special session will showcase the latest breakthroughs in generative image modeling, offering participants a chance to learn about state-of-the-art techniques and methodologies directly from leading experts in the field. The topics intersect various disciplines, including computer vision, machine learning, and image processing. The session will foster collaboration among researchers and practitioners from diverse backgrounds, leading to innovative research directions. Highlighting practical applications of generative image models in industry will demonstrate their transformative potential and encourage the development of real-world applications with significant impact. Hosting a session on this hot topic will elevate the profile of ICIP and attract a more diverse audience.

This special session will contain six papers invited from world-renowned authors, covering the following topics:

  • High-quality vehicle image synthesis with precise license plate customization
  • Generative model-based human video compression
  • Semantic-aware X-ray image generation for printed circuit board layout extraction
  • Garment De-warping for virtual try-on in the wild
  • Cyclic epipolar geometry for faithful video synthesis
  • High-resolution paired real rainy image dataset and VLM-based data refinement for image deraining

Organizers: Chia-Wen Lin, National Tsing Hua University; Dong Tian, InterDigital

SS7: Application of Generative AI in Healthcare
Abstract: During the last decade, the rapid growth of Artificial Intelligence (AI) has revolutionized different fields of science. Medical imaging constitutes a large portion of the data analyzed within the healthcare domain and is one of the most prolific interdisciplinary areas within AI research. Several models, techniques, and algorithms have been devised tailored to the needs of medical imaging, many of which have been carried over to other tasks in computer vision. The key goal is to bring more objectivity and accuracy to diagnosis, as well as to expedite the routine tasks of physicians. Classification models are used for disease diagnosis and survival prediction, helping to prepare appropriate treatment planning. On the other hand, Generative AI (GenAI) models enable new functionalities, such as semantic image segmentation, medical image synthesis, modality co-registration, and image enhancement. Admittedly, unlike other image processing tasks, medical image processing entails high-stakes decisions. Therefore, research in this area should also emphasize model transparency and interpretability. In this special session, we will discuss AI-based models and frameworks with clinical applications in various domains of healthcare. A large part of the session will focus on emerging applications of GenAI in healthcare and medical imaging, demonstrating how it can enhance certain tasks of physicians and enable new capabilities in their diagnosis/prognosis pipeline. This special session aims to bridge advances in AI technology with medicine by showcasing applied research with high impact on physicians’ practices and patients’ lives.

Organizers: Vasileios Magoulianitis, Keck School of Medicine, University of Southern California; Jonghye Woo, Department of Radiology, Harvard Medical School / Massachusetts General Hospital; Jing Zhang, Department of Computer Science, University of California, Irvine

SS8: Pathological Image Processing with Foundation Models
Abstract: Histopathological images are the gold standard for cancer diagnosis, characterized by their vast size, massive amount of information, and multi-level features. These images encode the mechanisms of tumor initiation and development, as well as future trends in cancer progression. With the rapid advancement of artificial intelligence technology, the application of AI to the analysis of histopathological images is steadily increasing. This analysis involves various complex tasks such as diagnosis, prognosis, and the discovery of tumor markers, encompassing typical computer vision tasks like classification, segmentation, detection, and regression. Histopathological image data are usually accompanied by other modalities, such as pathology reports, genetic testing, and liquid biopsies. The emergence of multimodal foundation models has brought new solutions for the intelligent analysis of histopathological images. In the past two years, Nature and its sub-journals have published over ten articles on the analysis of histopathological images using large models. With the recent release of multimodal datasets containing millions of histopathological images, a large number of scholars in the field have begun conducting research on histopathological image analysis using large models. Such research can accelerate the clinical application of precise diagnostic analysis techniques, as well as the discovery of tumor markers, thereby helping to uncover the mechanisms of tumor initiation and development.

Topics:

  • Advancing Cancer Diagnosis through Histopathological Image Analysis
  • Multimodal Analysis of Histopathological Images to Uncover Tumor Mechanisms
  • Segmentation of Cells and Tissues in Histopathological Images
  • Prognostic Prediction using Histopathological Images
  • Decoupling and Visualization of Histopathological Images
  • Histopathological Image Representation Learning
  • Transferring Pathology Foundation Models to Specific Types of Cancer
  • AI-Driven Discovery of Tumor Markers in Histopathological Images

Organizers: Mingli Song, Zhejiang University; Lijuan Wang, Microsoft Research

SS9: Advances in Volumetric Media Compression
Abstract: This special session aims to convene researchers and practitioners to examine and discuss recent advancements in the field of volumetric media compression, with a particular emphasis on dynamic meshes.

The session will serve as a platform for the comprehensive exploration of the scientific and technological aspects of dynamic meshes, offering an in-depth analysis of their methodologies, applications, and foundational principles. While standardization bodies have played a pivotal role in consolidating and advancing innovations in this domain, for example through the ongoing standardization of MPEG Video-based Dynamic Mesh Compression (V-DMC), opportunities remain to investigate advanced approaches and novel perspectives that may lie outside the scope of standardization efforts.

Accordingly, this session aims to foster contributions from organizations and researchers engaged in MPEG standardization activities, as well as from those not directly involved in standardization but actively contributing to the development of mesh compression technologies. By promoting this exchange, the session seeks to complement the achievements of the standardization community with fresh insights and groundbreaking ideas.

Topics of interest for this session include, but are not limited to, challenges such as improving compression efficiency, preserving visual quality, optimizing motion estimation, and innovating geometry and attribute representation techniques in volumetric media. Broader contributions are also encouraged, encompassing general volumetric media coding, alternative coding frameworks, novel algorithms, and experimental methods.

By bringing together diverse perspectives, this special session aims to advance understanding and foster innovation in this rapidly evolving field. It aspires to inspire collaborations that address current challenges and explore new directions for the compression of immersive media content, which is a critical enabler in the development of future volumetric media services and applications.

Organizers: Marius Preda, Institut Mines Telecom

SS10: Advances in Multimedia Compression and Quality Metrics for Resource-Constrained Systems
Abstract: The explosive growth of multimedia content, including audio, images, video, and 3D data, poses significant challenges in achieving efficient compression, perceptual quality assessment, and real-time model deployment, particularly in resource-constrained environments. This special session will spotlight groundbreaking innovations that address these challenges, paving the way for transformative applications in real-time streaming, immersive AR/VR, autonomous systems, and more.

The session will feature high-quality contributions in areas including, but not limited to:

  1. Multimedia Compression: Advancements in traditional and AI-driven compression techniques, perceptually optimized adaptive coding, and integration of machine learning to enhance compression efficiency.
  2. Perceptual Quality Metrics: Development of novel objective and subjective metrics for multimedia content, including audio, video, images, and 3D data, and their impact on user experience and application design.
  3. Optimized Model Deployment: Strategies for real-time neural network deployment, including pruning, quantization, mixed-precision techniques, and hardware-software co-design for low-latency applications.
  4. Emerging Applications and Deployments: Case studies and innovations in edge-device optimization for multimedia processing in healthcare, AR/VR, and intelligent transportation systems, showcasing real-world impacts of cutting-edge research.

This session emphasizes interdisciplinary collaboration, bridging theory and application to foster impactful innovations. By addressing challenges at the intersection of compression, quality assessment, and model optimization, it aligns with ICIP’s mission to highlight emerging trends like Generative AI, delivering insights into the future of multimedia technologies.

With a focus on real-world relevance, this session offers a unique platform for researchers and practitioners to present novel methodologies, practical deployments, and interdisciplinary solutions that advance the state of the art in multimedia engineering.

Organizers: Tushar Shinde, IIT Madras Zanzibar; Patrick Le Callet, Polytech Nantes, Université de Nantes, France; Kumar Rahul, Amazon Prime Video, USA

SS11: Medical Imaging in the Era of Large Vision-Language Models
Abstract: Medical imaging is indispensable in clinical diagnostics and health monitoring. The integration of Artificial Intelligence (AI) with medical imaging has revolutionized diagnostic methodologies, enabling high precision in tasks such as lesion detection and segmentation.

The challenge lies in developing a robust medical imaging system that not only provides high visual fidelity but also enhances clinical utility. AI presents a versatile solution, expanding its applications to image enhancement and direct disease diagnosis. This session explores the integration of AI with medical imaging, discussing innovative approaches that leverage large foundation models for improved diagnostic accuracy and reliability.

The main topics of interest include, but are not limited to:

  • Advancements in AI-driven methods for lesion detection and segmentation.
  • Enhancing the precision of disease diagnosis through AI technologies.
  • Applications of AI in direct disease diagnosis processes.
  • Development and utilization of advanced AI techniques in medical imaging.
  • Investigation of AI’s impact on improving diagnostic outcomes.
  • Interdisciplinary collaboration between AI and medical fields to enhance diagnostic practices.

Organizers: Weide Liu, Harvard Medical School; Wei Zhou, Cardiff University

Inquiries

Inquiries should be sent to: specialsessions@2025.ieeeicip.org