Abstract: This tutorial aims to provide participants with a comprehensive understanding of transformer
models and their evolution into Large Language Models (LLMs), highlighting their profound impact on NLP and other AI domains. Starting with an
overview of the foundational principles of transformers as outlined in "Attention is All You Need" and "BERT," the tutorial will delve into their
architectural design, operational mechanisms, and their diverse applications in fields such as computer vision, medical image analysis, natural language
understanding, and beyond. Furthermore, the tutorial will explore recent advancements and extensions of transformer models, including variants like
GPT (Generative Pre-trained Transformer). Participants will gain insights into how these models have revolutionized tasks such as text generation, language
translation, question answering, and summarization. Additionally, the tutorial will discuss challenges and future directions in transformer research, such as
improving model interpretability, handling multimodal data, and scaling up to even larger datasets and models. A brief plan is given below:
- Introduction to Transformers
- Transformer Architectural Insights and Training Techniques
- Delving into ViT and ViViT Models
- Application of Transformers for LLMs
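The operation the whole tutorial builds on, scaled dot-product attention from "Attention is All You Need", computes softmax(QKᵀ/√d_k)V. The NumPy sketch below uses illustrative shapes and random values; a real transformer layer adds learned Q/K/V projections, multiple heads, and masking.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # 2 query positions, model width 4
K = rng.standard_normal((3, 4))   # 3 key/value positions
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

Subtracting the row maximum before exponentiating is the standard numerically stable softmax; it changes nothing mathematically.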
IIT Delhi
IIT Mandi
IIT Mandi
TIH IIT Mandi
Abstract: Generative models generate new samples of a specific type of data.
These models have had a lasting impact on industry and, most significantly,
on daily life. To better understand this domain, we present a tutorial on
conditional generative models. Generative models need to be controllable;
to that end, they are made "conditional". We provide enough examples to
cover the essential techniques of the present and, potentially, of the
future as well. We walk through Autoencoders (AE), Variational Autoencoders
(VAE), Generative Adversarial Networks (GAN), and Diffusion Models used in
image generation tasks. We then focus on a relatively newer generative
technique inspired by reinforcement learning: Generative Flow Networks
(GFlowNets), used in molecular generation.
Below is a brief plan:
- Generative Models
- Autoencoders / Variational Autoencoders
- Generative Adversarial Networks
- Diffusion Models
- Reinforcement Learning
- GFlowNets
- Applications of GFlowNets
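To make the "conditional" idea concrete, here is a minimal NumPy sketch of the VAE reparameterization trick with the usual conditioning recipe: a one-hot class label concatenated to the latent code before decoding. All names, shapes, and values are illustrative, not taken from the tutorial itself.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    Sampling this way keeps z differentiable w.r.t. mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(42)
mu, log_var = np.zeros(8), np.zeros(8)   # stand-ins for encoder outputs
z = reparameterize(mu, log_var, rng)

# Conditioning: append a one-hot class label so the decoder can be
# steered to generate a sample of the chosen class.
label = np.eye(10)[3]                    # condition on class 3
decoder_input = np.concatenate([z, label])
print(decoder_input.shape)  # (18,)
```

The same concatenation trick (condition the generator/denoiser on a label or text embedding) is what makes GANs and diffusion models conditional as well.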
Digital University Kerala
IIST Thiruvananthapuram
Abstract: Many of the documents that businesses and individuals deal with daily are document images (digital images of physical documents). Born-digital PDFs (also called native PDFs) are, in practice, no better than document images in terms of machine-readability. These documents contain multimodal content, and understanding natural language alone is not sufficient to comprehend them. Research in multimodal LLMs explores how text, visual, and layout information can be combined to understand visually rich documents holistically. In this tutorial, we will cover significant works on information extraction from document images and demonstrate popular approaches using both text-only and multimodal LLMs.
Below is a brief plan:
- LLMs and multimodal LLMs for document images
- Retrieval Augmented Generation
- Transform your visually rich PDFs or document images into a RAG-ready format
- Combining transformed documents with RAG and text-only LLMs for information extraction
- Multimodal LLMs (with text, layout, and vision capabilities) for information extraction
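The retrieval step of the plan above can be sketched end to end with a toy retriever. Here a bag-of-words counter stands in for a real embedding model, and the chunk strings are invented examples of what parsing a PDF or OCR'ing a document image might yield; in practice one would use a dense embedder and a vector index.

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words 'embedding'; a real RAG system uses a dense embedder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative chunks from a RAG-ready version of a document image.
chunks = [
    "Invoice number INV-1042 issued to Acme Corp",
    "Total amount due is 1,250 USD by 30 June",
    "Shipping address: 5 Park Lane, Mumbai",
]
query = "what is the total amount due"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
print(best)  # → "Total amount due is 1,250 USD by 30 June"
# The retrieved chunk plus the query would then be sent to an LLM
# as grounded context for information extraction.
```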
Wadhwani AI
Wadhwani AI
Abstract: Land cover mapping using remote-sensing imagery has attracted significant attention in recent years. Remote sensing provides rich information about the land surface, making it well suited to land use/land cover (LULC) classification. Over the past decade, numerous studies have investigated land cover classification using a broad array of sensors, resolutions, feature-selection methods, classifiers, and other techniques.
Both pixel-based and image-based classification techniques are used for land cover classification from remote sensing images. Accurate, real-time LULC maps are essential for dynamic monitoring, planning, and management of the Earth's surface. With the advent of cloud computing platforms and machine learning classifiers, new opportunities are emerging for more accurate, large-scale LULC mapping from high-resolution remote sensing images.
Deep learning-based segmentation of high-resolution satellite images provides valuable information for various geospatial applications, particularly LULC mapping. The segmentation task becomes more challenging as the number and complexity of LULC classes grow. This tutorial session presents a detailed introduction to, and implementation of, deep learning algorithms for LULC mapping from high-resolution remote sensing images.
Below is a brief plan:
- Introduction and Applications of Land Cover Mapping
- Deep Learning Models for Land Cover Mapping from High-Resolution Satellite Images
- Understanding 2D U-Net and Attention U-Net CNN Models for LULC Mapping
- Introduction to Deep Learning with TensorFlow
- Implementation of 2D U-Net and Attention U-Net CNN Models for LULC Mapping Using Python
- Challenges in LULC Mapping from High-Resolution Satellite Images
- Advanced Deep Learning Algorithms for LULC Mapping
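The ingredient that distinguishes Attention U-Net from plain 2D U-Net is the additive attention gate on the skip connections, α = σ(ψ(ReLU(W_x·x + W_g·g))), which down-weights irrelevant skip features before they reach the decoder. Below is a minimal NumPy sketch for a single pixel's feature vector; the weights and shapes are illustrative, and the real model applies these as 1×1 convolutions over whole feature maps with resampling.

```python
import numpy as np

def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate (Attention U-Net):
    alpha = sigmoid(psi(ReLU(W_x x + W_g g))); the skip feature x
    is scaled by alpha in (0, 1) before concatenation in the decoder."""
    q = np.maximum(W_x @ x + W_g @ g, 0.0)     # ReLU over combined features
    alpha = 1.0 / (1.0 + np.exp(-(psi @ q)))   # scalar attention coefficient
    return alpha * x                           # gated skip connection

rng = np.random.default_rng(1)
x = rng.standard_normal(16)        # skip features for one pixel (16 channels)
g = rng.standard_normal(16)        # gating signal from the coarser decoder level
W_x = rng.standard_normal((8, 16))
W_g = rng.standard_normal((8, 16))
psi = rng.standard_normal((1, 8))
gated = attention_gate(x, g, W_x, W_g, psi)
print(gated.shape)  # (16,)
```

Because α is bounded in (0, 1), the gate can only attenuate skip features, never amplify them, which is what lets the decoder focus on the LULC classes relevant at each spatial location.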
NITK Surathkal