Abstract: This tutorial aims to provide participants with a comprehensive understanding of transformer
models and their evolution into Large Language Models (LLMs), highlighting their profound impact on NLP and other AI domains. Starting with an
overview of the foundational principles of transformers as outlined in "Attention is All You Need" and "BERT," the tutorial will delve into their
architectural design, operational mechanisms, and their diverse applications in fields such as computer vision, medical image analysis, natural language
understanding, and beyond. Furthermore, the tutorial will explore recent advancements and extensions of transformer models, including variants like
GPT (Generative Pre-trained Transformer). Participants will gain insights into how these models have revolutionized tasks such as text generation, language
translation, question answering, and summarization. Additionally, the tutorial will discuss challenges and future directions in transformer research, such as
improving model interpretability, handling multimodal data, and scaling up to even larger datasets and models. A brief plan is given below:
- Introduction to Transformers
- Transformer Architectural Insights and Training Techniques
- Delving into ViT and ViViT Models
- Application of Transformers for LLMs
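The operation the whole tutorial builds on, scaled dot-product attention from "Attention is All You Need", computes softmax(QKᵀ/√d_k)V. The NumPy sketch below uses illustrative shapes and random values; a real transformer layer adds learned Q/K/V projections, multiple heads, and masking.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # 2 query positions, model width 4
K = rng.standard_normal((3, 4))   # 3 key/value positions
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Subtracting the row maximum before exponentiating is the standard numerically stable softmax; it changes nothing mathematically.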
IIT Delhi
IIT Mandi
IIT Mandi
TIH IIT Mandi
Abstract: Generative models generate new samples of a specific type of data.
These models have had a lasting impact on industry and, most significantly,
on daily life. To better understand this domain, we present a tutorial on
conditional generative models. Generative models need to be controllable;
to that end, they are made "conditional". We provide enough examples to
cover the essential techniques of the present and, potentially, of the
future as well. We walk through Autoencoders (AE), Variational Autoencoders
(VAE), Generative Adversarial Networks (GAN), and Diffusion Models used in
image generation tasks. We then focus on a relatively newer generative
technique inspired by reinforcement learning: Generative Flow Networks
(GFlowNets), used in molecular generation.
Below is a brief plan:
- Generative Models
- Autoencoders / Variational Autoencoders
- Generative Adversarial Networks
- Diffusion Models
- Reinforcement Learning
- GFlowNets
- Applications of GFlowNets
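To make the "conditional" idea concrete, here is a minimal NumPy sketch of the VAE reparameterization trick with the usual conditioning recipe: a one-hot class label concatenated to the latent code before decoding. All names, shapes, and values are illustrative, not taken from the tutorial itself.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    Sampling this way keeps z differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(42)
mu, log_var = np.zeros(8), np.zeros(8)   # stand-ins for encoder outputs
z = reparameterize(mu, log_var, rng)

# Conditioning: append a one-hot class label so the decoder can be
# steered to generate a sample of the chosen class.
label = np.eye(10)[3]                    # condition on class 3
decoder_input = np.concatenate([z, label])
print(decoder_input.shape)  # (18,)
```

The same concatenation trick (condition the generator/denoiser on a label or text embedding) is what makes GANs and diffusion models conditional as well.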
Digital University Kerala
IIST Thiruvananthapuram
Abstract: Many of the documents that businesses and individuals deal with daily are document images (digital images of physical documents). Born-digital PDFs (also called native PDFs) are, in practice, no better than document images in terms of machine-readability. These documents contain multimodal content, and understanding natural language alone is not sufficient to comprehend them. Research in multimodal LLMs explores how text, visual, and layout information can be combined to understand visually rich documents holistically. In this tutorial, we will cover significant works on information extraction from document images and demonstrate popular approaches using both text-only and multimodal LLMs.
Below is a brief plan:
- LLMs and multimodal LLMs for document images
- Retrieval Augmented Generation
- Transform your visually rich PDFs or document images into a RAG-ready format
- Combining transformed documents with RAG and text-only LLMs for information extraction
- Multimodal LLMs (with text, layout, and vision capabilities) for information extraction
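The retrieval step of the plan above can be sketched end to end with a toy retriever. Here a bag-of-words counter stands in for a real embedding model, and the chunk strings are invented examples of what parsing a PDF or OCR'ing a document image might yield; in practice one would use a dense embedder and a vector index.

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding'; a real RAG system uses a dense embedder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative chunks from a RAG-ready version of a document image.
chunks = [
    "Invoice number INV-1042 issued to Acme Corp",
    "Total amount due is 1,250 USD by 30 June",
    "Shipping address: 5 Park Lane, Mumbai",
]
query = "what is the total amount due"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
print(best)  # → "Total amount due is 1,250 USD by 30 June"
# The retrieved chunk plus the query would then be sent to an LLM
# as grounded context for information extraction.
```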
Wadhwani AI
Wadhwani AI
Abstract: Land cover mapping using remote-sensing imagery has attracted significant attention in recent years. Remote sensing provides rich information about the land surface, making it well suited to land use/land cover (LULC) classification. Over the past decade, numerous studies have investigated land cover classification using a broad array of sensors, resolutions, feature-selection methods, classifiers, and other techniques.
Both pixel-based and image-based classification techniques are used for land cover classification from remote sensing images. Accurate, real-time LULC maps are essential for dynamic monitoring, planning, and management of the Earth's surface. With the advent of cloud computing platforms and machine learning classifiers, new opportunities are emerging for more accurate, large-scale LULC mapping from high-resolution remote sensing images.
Deep learning-based segmentation of high-resolution satellite images provides valuable information for various geospatial applications, particularly LULC mapping. The segmentation task becomes more challenging as the number and complexity of LULC classes grow. This tutorial session presents a detailed introduction to, and implementation of, deep learning algorithms for LULC mapping from high-resolution remote sensing images.
Below is a brief plan:
- Introduction and Applications of Land Cover Mapping
- Deep Learning Models for Land Cover Mapping from High-Resolution Satellite Images
- Understanding 2D U-Net and Attention U-Net CNN Models for LULC Mapping
- Introduction to Deep Learning with TensorFlow
- Implementation of 2D U-Net and Attention U-Net CNN Models for LULC Mapping Using Python
- Challenges in LULC Mapping from High-Resolution Satellite Images
- Advanced Deep Learning Algorithms for LULC Mapping
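The ingredient that distinguishes Attention U-Net from plain 2D U-Net is the additive attention gate on the skip connections, α = σ(ψ(ReLU(W_x·x + W_g·g))), which down-weights irrelevant skip features before they reach the decoder. Below is a minimal NumPy sketch for a single pixel's feature vector; the weights and shapes are illustrative, and the real model applies these as 1×1 convolutions over whole feature maps with resampling.

```python
import numpy as np

def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate (Attention U-Net):
    alpha = sigmoid(psi(ReLU(W_x x + W_g g))); the skip feature x
    is scaled by alpha in (0, 1) before concatenation in the decoder."""
    q = np.maximum(W_x @ x + W_g @ g, 0.0)     # ReLU over combined features
    alpha = 1.0 / (1.0 + np.exp(-(psi @ q)))   # scalar attention coefficient
    return alpha * x                           # gated skip connection

rng = np.random.default_rng(1)
x = rng.standard_normal(16)        # skip features for one pixel (16 channels)
g = rng.standard_normal(16)        # gating signal from the coarser decoder level
W_x = rng.standard_normal((8, 16))
W_g = rng.standard_normal((8, 16))
psi = rng.standard_normal((1, 8))
gated = attention_gate(x, g, W_x, W_g, psi)
print(gated.shape)  # (16,)
```

Because α is bounded in (0, 1), the gate can only attenuate skip features, never amplify them, which is what lets the decoder focus on the LULC classes relevant at each spatial location.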
NITK Surathkal