Accepted Tutorials
Morning (tentative)
Fairness in Large Language Models: A Tutorial (Zichong Wang, Avash Palikhe, Zhipeng Yin and Wenbin Zhang)
Neural Shifts in Collaborative Team Recommendation (Mahdis Saeedi and Hossein Fani)
Continual Recommender Systems (Hyunsik Yoo, Seongku Kang and Hanghang Tong)
Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era (Dawei Li, Yue Huang, Ming Li, Tianyi Zhou, Xiangliang Zhang and Huan Liu)
Neural Differential Equations for Continuous-Time Analysis (Yongkyung Oh, Dongyoung Lim and Sungil Kim)
Afternoon (tentative)
Socially Responsible and Trustworthy Generative Foundation Models: Principles, Challenges, and Practices (Yue Huang, Canyu Chen, Lu Cheng, Bhavya Kailkhura, Nitesh Chawla and Xiangliang Zhang)
Retrieval of Graph Structured Objects: Theory and Applications (Indradyumna Roy, Soumen Chakrabarti and Abir De)
Towards Large Generative Recommendation: A Tokenization Perspective (Yupeng Hou, An Zhang, Leheng Sheng, Jiancan Wu, Xiang Wang, Tat-Seng Chua and Julian McAuley)
Uncertain Boundaries: A Tutorial on Copyright Challenges and Cross-Disciplinary Solutions for Generative AI (Zhipeng Yin, Zichong Wang, Avash Palikhe and Wenbin Zhang)
A Tutorial on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide (Sunwoo Kim, Soo Yong Lee, Yue Gao, Alessia Antelmi, Mirko Polato and Kijung Shin)
Tutorial Abstracts and Webpages
Fairness in Large Language Models: A Tutorial.
Zichong Wang (Florida International University), Avash Palikhe (Florida International University), Zhipeng Yin (Florida International University) and Wenbin Zhang (Florida International University).
Abstract. Large Language Models (LLMs) have achieved remarkable performance across a wide range of applications, but their outputs often exhibit systematic biases that pose challenges for trustworthy deployment.
While fairness has been extensively studied in traditional machine learning models, most existing tutorials and frameworks focus on settings where model internals or training data are accessible, assumptions that often do not hold for LLMs.
As LLMs become increasingly influential, there is a growing need to understand fairness in this new context, including how bias manifests, how it can be measured, and what mitigation strategies are most effective.
To address this gap, this tutorial surveys recent research advancements in fairness for LLMs. We begin with an introduction to LLMs and real-world examples of biased behavior, followed by an analysis of the underlying sources of bias.
The tutorial then defines key fairness concepts tailored to LLMs and reviews various bias evaluation methods and fairness-enhancing algorithms. We also present a multi-dimensional taxonomy of benchmark datasets for fairness evaluation and conclude with a discussion of open research challenges.
https://fairness-llms-tutorial.github.io
Neural Shifts in Collaborative Team Recommendation.
Mahdis Saeedi (University of Windsor) and Hossein Fani (University of Windsor).
Abstract. Team recommendation involves selecting skilled experts to form an almost surely successful collaborative team, or refining the team composition to maintain or excel at performance.
To eschew this tedious and error-prone manual process, various computational and social science theoretical approaches have been proposed, wherein the problem definition remains essentially the same while going by other names such as team allocation, selection, composition, and formation.
In this tutorial, we study the advancement of computational approaches from greedy search in pioneering works to the recent learning-based approaches, with a particular in-depth exploration of graph neural network-based methods as the cutting-edge class, via unifying definitions, formulations, and evaluation schema.
More importantly, we then discuss team refinement, a subproblem in team recommendation that involves structural adjustments or expert replacements to enhance team performance in dynamic environments.
Finally, we introduce training strategies, benchmarking datasets, and open-source tools, along with future research directions and real-world applications.
https://fani-lab.github.io/OpeNTF/tutorial/cikm25
Continual Recommender Systems.
Hyunsik Yoo (University of Illinois Urbana-Champaign), Seongku Kang (Korea University) and Hanghang Tong (University of Illinois Urbana-Champaign).
Abstract. Modern recommender systems operate in uniquely dynamic settings: user interests, item pools, and popularity trends shift continuously, and models must adapt in real time without forgetting past preferences.
While existing tutorials on continual or lifelong learning cover broad machine learning domains (e.g., vision and graphs), they do not address recommendation-specific demands—such as balancing stability and plasticity per user, handling cold-start items, and optimizing recommendation metrics under streaming feedback.
This tutorial aims to make a timely contribution by filling that gap. We begin by reviewing the background and problem settings, followed by a comprehensive overview of existing approaches, including replay-based and regularization-based methods.
We then highlight recent efforts to apply continual learning to practical deployment environments, such as resource-constrained systems and sequential interaction settings.
Finally, we discuss open challenges and future research directions.
We believe this tutorial will be valuable to researchers and practitioners in recommender systems, data mining, and artificial intelligence, and will benefit a wide range of real-world application domains related to information retrieval.
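To give a concrete flavor of the replay-based methods surveyed above, here is a minimal sketch (our illustration, not a method from the tutorial) of a reservoir-sampling replay buffer: it keeps a uniform sample of the interaction stream in fixed memory so that past preferences can be rehearsed alongside new data. The class and parameter names are hypothetical.

```python
import random

class ReplayBuffer:
    """Reservoir-sampling replay buffer: maintains a uniform random
    sample of all interactions seen so far in O(capacity) memory,
    so old user preferences can be replayed during streaming updates."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)  # seeded for reproducibility

    def add(self, interaction):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(interaction)
        else:
            # Replace a random slot with probability capacity / seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = interaction

buf = ReplayBuffer(capacity=3)
for t in range(100):                  # a stream of 100 interactions
    buf.add(("user", "item", t))
print(len(buf.buffer))                # always 3, regardless of stream length
```

Regularization-based methods would instead penalize deviation of new model parameters from old ones; the replay approach above trades a small memory footprint for direct rehearsal of past data.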
The tutorial website is available at:
https://www.idea.korea.ac.kr/research/tutorial-continual-recommender-systems
Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era.
Dawei Li (Arizona State University), Yue Huang (University of Notre Dame), Ming Li (University of Maryland), Tianyi Zhou (University of Maryland), Xiangliang Zhang (University of Notre Dame) and Huan Liu (Arizona State University).
Abstract. In the era of data-driven artificial intelligence (AI), access to large-scale, high-quality datasets has become a fundamental requirement for breakthroughs in data mining and machine learning.
However, real-world data is often scarce, expensive to annotate, or restricted due to privacy and proprietary concerns.
Synthetic data, i.e., algorithmically generated data that mimics the statistical properties and underlying patterns of real-world data, has emerged as a powerful solution to these challenges.
https://syndata4dm.github.io
Neural Differential Equations for Continuous-Time Analysis.
Yongkyung Oh (University of California, Los Angeles), Dongyoung Lim (Ulsan National Institute of Science & Technology) and Sungil Kim (Ulsan National Institute of Science & Technology).
Abstract. Modeling complex, irregular time series is a critical challenge in knowledge discovery and data mining.
This tutorial introduces Neural Differential Equations (NDEs)—a powerful paradigm for continuous-time deep learning that intrinsically handles the non-uniform sampling and missing values where traditional models falter.
We provide a comprehensive review of the theory and practical application of the entire NDE family: Neural Ordinary Differential Equations (NODEs), Neural Controlled Differential Equations (NCDEs), and Neural Stochastic Differential Equations (NSDEs).
The tutorial emphasizes robustness and stability and culminates in a hands-on session where participants will use key open-source libraries to solve real-world tasks like interpolation and classification.
Designed for AI researchers and practitioners, this tutorial equips attendees with essential tools for time series analysis.
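To illustrate the continuous-time idea behind NDEs, the sketch below (our illustration, not material from the tutorial) uses a plain explicit-Euler solver with a hand-fixed dynamics function standing in for a learned network. Because the state is defined by a differential equation, it can be evaluated directly at irregularly spaced observation times, with no resampling or imputation.

```python
import math

def euler_odeint(f, y0, ts):
    """Integrate dy/dt = f(t, y) with explicit Euler, returning the
    state at each (possibly irregularly spaced) time in ts."""
    y, out = y0, [y0]
    for t0, t1 in zip(ts, ts[1:]):
        y = y + (t1 - t0) * f(t0, y)   # one Euler step of size t1 - t0
        out.append(y)
    return out

# Toy "dynamics": fixed linear decay dy/dt = -y, standing in for a
# learned neural network f_theta(t, y).
f = lambda t, y: -y

# Irregular observation times, handled without any preprocessing.
ts = [0.0, 0.1, 0.35, 0.4, 0.7, 1.0]
states = euler_odeint(f, 1.0, ts)
print(states[-1])   # crude Euler estimate of exp(-1) ~ 0.368
```

In practice one would use an adaptive solver (and, for NCDEs/NSDEs, a control path or diffusion term), but the evaluate-anywhere-in-time property shown here is the core of the paradigm.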
https://nde-for-ts.github.io
Socially Responsible and Trustworthy Generative Foundation Models: Principles, Challenges, and Practices.
Yue Huang (University of Notre Dame), Canyu Chen (Northwestern University), Lu Cheng (University of Illinois Chicago), Bhavya Kailkhura (Lawrence Livermore National Laboratory), Nitesh Chawla (University of Notre Dame) and Xiangliang Zhang (University of Notre Dame).
Abstract. Generative foundation models (GenFMs), including large language and multimodal models, are transforming information retrieval and knowledge management.
However, their rapid adoption raises urgent concerns about social responsibility, trustworthiness, and governance.
This tutorial offers a comprehensive, hands-on overview of recent advances in responsible GenFMs, covering foundational concepts, multi-dimensional risk taxonomies (including safety, privacy, robustness, truthfulness, fairness, and machine ethics), state-of-the-art evaluation benchmarks, and effective mitigation strategies.
We integrate real-world case studies and practical exercises using open-source tools, and present key perspectives from both policy and industry, including recent regulatory developments and enterprise practices.
The session concludes with a discussion of open challenges, providing actionable guidance for the CIKM community.
https://tutorial-trustgenfm.github.io
Retrieval of Graph Structured Objects: Theory and Applications.
Indradyumna Roy (IIT Bombay), Soumen Chakrabarti (IIT Bombay) and Abir De (IIT Bombay).
Abstract. Graph-structured data is ubiquitous across diverse domains like social networks, search, question answering, and drug discovery.
Effective retrieval of (sub-)graphs with relevant substructures has become critical to the success of these applications.
This tutorial will introduce attendees to state-of-the-art neural methods for graph retrieval, highlighting architectures that effectively model relevance through innovative combinations of early and late interaction mechanisms.
Participants will explore relevance models that represent graphs as sets of embeddings, enabling alignment-driven similarity scoring between query and corpus graphs and supporting diverse cost functions, both symmetric and asymmetric. We will also discuss compatibility with Approximate Nearest Neighbor (ANN) methods, covering recent advances in locality-sensitive hashing (LSH) and other indexing techniques that significantly enhance scalability in graph retrieval.
The tutorial includes hands-on experience with an accessible, PyTorch-integrated toolkit that provides downloadable graph retrieval datasets and baseline implementations of recent methods. Participants will learn to adapt these methods for multi-modal applications, such as molecule, text, and image retrieval, where graph-based retrieval proves particularly effective.
Designed for researchers and practitioners, this session delivers both foundational concepts and practical tools for implementing and scaling neural graph retrieval solutions across interdisciplinary applications.
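As a minimal sketch of the alignment-driven scoring idea (our illustration, not the tutorial's toolkit), the snippet below scores a query graph against a corpus graph by matching each query-node embedding to its best corpus-node embedding and summing the matches. The embeddings are hand-made toy vectors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def alignment_score(query_vecs, corpus_vecs):
    """Late-interaction relevance: align each query-node embedding with
    its best-matching corpus-node embedding and sum the matches.
    Asymmetric by design: score(Q, C) != score(C, Q) in general,
    which suits subgraph-containment style relevance."""
    return sum(max(dot(q, c) for c in corpus_vecs) for q in query_vecs)

# Toy 2-d node embeddings for a query graph and two corpus graphs.
query = [[1.0, 0.0], [0.0, 1.0]]
good  = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # covers both query nodes
bad   = [[0.9, 0.1], [0.8, 0.2]]               # covers only one
print(alignment_score(query, good) > alignment_score(query, bad))  # True
```

Early-interaction methods would instead cross-attend between the two graphs before pooling; the set-of-embeddings form above is what makes ANN-style indexing tractable.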
https://sites.google.com/view/graph-match-tutorial/home
Towards Large Generative Recommendation: A Tokenization Perspective.
Yupeng Hou (University of California, San Diego), An Zhang (University of Science and Technology of China), Leheng Sheng (National University of Singapore), Jiancan Wu (University of Science and Technology of China), Xiang Wang (University of Science and Technology of China), Tat-Seng Chua (National University of Singapore) and Julian McAuley (University of California, San Diego).
Abstract. The emergence of large generative models is transforming the landscape of recommender systems. One of the most fundamental components in building these models is action tokenization, the process of converting human-readable data (e.g., user-item interactions) into machine-readable formats (e.g., discrete token sequences).
In this half-day tutorial, we present a comprehensive overview of existing action tokenization techniques, which convert actions into (1) item IDs, (2) textual descriptions, and (3) semantic IDs, and explore how they relate to the development of large generative recommendation models.
We then provide an in-depth discussion of the challenges, open questions, and potential future directions from the perspective of action tokenization, aiming to inspire the design of next-generation recommender systems.
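To make the tokenization contrast concrete, here is a hypothetical illustration (not the tutorial's methods) of the same interaction history tokenized as opaque item IDs versus semantic IDs; textual-description tokenization would instead emit the item's title text. The codebook here is hand-made, whereas semantic IDs are learned in practice, e.g., by residual quantization of item embeddings.

```python
# One user's interaction history (item identifiers are made up).
history = ["item_42", "item_7", "item_42"]

# (1) Item IDs: one opaque token per item; no structure is shared
# between items, so every item needs its own token.
id_tokens = [f"<id_{h.split('_')[1]}>" for h in history]

# (3) Semantic IDs: each item maps to a short tuple of discrete codes
# drawn from shared codebooks, so similar items share prefix tokens.
semantic_codebook = {"item_42": (3, 1), "item_7": (3, 5)}
sem_tokens = [f"<c{level}_{code}>"
              for item in history
              for level, code in enumerate(semantic_codebook[item])]

print(id_tokens)    # ['<id_42>', '<id_7>', '<id_42>']
print(sem_tokens)   # ['<c0_3>', '<c1_1>', '<c0_3>', '<c1_5>', '<c0_3>', '<c1_1>']
```

Note how the two items share the first-level code `<c0_3>`: this shared structure is what lets a generative model transfer knowledge across items and handle unseen ones.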
https://large-genrec.github.io/cikm2025.html
Uncertain Boundaries: A Tutorial on Copyright Challenges and Cross-Disciplinary Solutions for Generative AI.
Zhipeng Yin (Florida International University), Zichong Wang (Florida International University), Avash Palikhe (Florida International University) and Wenbin Zhang (Florida International University).
Abstract. As generative artificial intelligence (AI) becomes increasingly prevalent in creative industries, intellectual property issues have come to the forefront, especially regarding AI-generated content that closely resembles human-created works.
Recent high-profile incidents involving AI-generated outputs reproducing copyrighted materials underscore the urgent need to reassess current copyright frameworks and establish effective safeguards against infringement.
This tutorial systematically examines copyright-related challenges throughout various AI development stages, providing practical recommendations for developers.
It first outlines fundamental copyright principles and considerations specific to generative AI, followed by methods for detecting and assessing potential infringements in AI-generated content.
It also introduces protective strategies to prevent unauthorized replication of creative materials and training datasets.
Additionally, the tutorial details specialized training methods designed to minimize the likelihood of infringement.
Finally, it reviews current AI copyright regulatory frameworks, identifies open research questions, and proposes directions for future research.
https://aicopyright-tutorial.github.io
A Tutorial on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide.
Sunwoo Kim (KAIST), Soo Yong Lee (KAIST), Yue Gao (Tsinghua University), Alessia Antelmi (University of Turin), Mirko Polato (University of Turin) and Kijung Shin (KAIST).
Abstract. Higher-order interactions (HOIs) are ubiquitous in real-world networks, such as group discussions on online Q&A platforms, co-purchases of items in e-commerce, and collaborations of researchers.
Investigation of deep learning for networks of HOIs, expressed as hypergraphs, has become an important agenda for the data mining and machine learning communities.
As a result, hypergraph neural networks (HNNs) have emerged as a powerful tool for representation learning on hypergraphs. Given this emerging trend, we provide a timely tutorial dedicated to HNNs.
We cover the following topics: (1) inputs, (2) message passing schemes, (3) training strategies, (4) applications (e.g., recommender systems and time series analysis), and (5) open problems of HNNs.
This tutorial is intended for researchers and practitioners who are interested in hypergraph representation learning and its applications.
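As a flavor of the message passing schemes covered, the sketch below (our illustration, not the tutorial's material) shows the two-stage node-to-hyperedge-to-node scheme that most HNNs build on, with plain averaging standing in for learned aggregation layers and scalar features standing in for embedding vectors.

```python
# Hyperedges are groups of nodes (a group discussion, a co-purchase, ...).
hyperedges = {"e1": ["a", "b", "c"], "e2": ["c", "d"]}
x = {"a": 1.0, "b": 3.0, "c": 5.0, "d": 7.0}   # scalar node features

# Stage 1: each hyperedge aggregates the features of its member nodes.
edge_msg = {e: sum(x[v] for v in mem) / len(mem)
            for e, mem in hyperedges.items()}

# Stage 2: each node aggregates messages from the hyperedges containing it.
new_x = {}
for v in x:
    incident = [edge_msg[e] for e, mem in hyperedges.items() if v in mem]
    new_x[v] = sum(incident) / len(incident)

print(new_x)   # node "c" mixes both groups: (3.0 + 6.0) / 2 = 4.5
```

Real HNNs replace both averages with learned, permutation-invariant functions and stack several such rounds; the key point is that information flows through hyperedges, so an entire group interacts in one step rather than pairwise.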
https://sites.google.com/view/hnn-tutorial