TL;DR. I reimplemented softmax attention and a Performer-style linear attention layer from first principles. My first attempt gave terrible perplexity and no speedup. After aligning the implementations (multi-head Wq/Wk/Wv projections, 1/√d_head scaling, causal prefix sums, unbiased random features), the linear layer's perplexity came close to softmax attention's on TinyStories. Speed still didn't beat softmax at seq_len=80, because the linear-time benefits only show up when the context length N is large relative to the random-feature rank m and when the kernels are well fused. This post walks through the kernel view of softmax, the fixes that mattered, and a rigorous way to measure the crossover.
Aug 30, 2025
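
As a preview of the kernel view, here is a minimal single-head NumPy sketch, not the post's actual code: every name in it (`phi`, `linear_attention_causal`, the feature count `m`) is illustrative. Softmax attention materializes an N×N score matrix, while the Performer-style variant approximates exp(q·k/√d) with φ(q)·φ(k) using positive random features and accumulates causal prefix sums, so its cost scales with N·m·d rather than N².

```python
import numpy as np

def softmax_attention_causal(Q, K, V):
    # Quadratic in N: builds the full N x N score matrix, masks the future,
    # then applies a row-wise softmax.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (N, N)
    future = np.triu(np.ones_like(scores), k=1) == 1     # positions j > i
    scores[future] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def phi(X, W):
    # Positive random features whose inner product approximates
    # exp(q.k / sqrt(d)) in expectation (FAVOR+-style); W is (m, d) Gaussian.
    d = X.shape[-1]
    Xs = X / d ** 0.25                                    # fold in 1/sqrt(d) scaling
    proj = Xs @ W.T                                       # (N, m)
    return np.exp(proj - 0.5 * (Xs ** 2).sum(-1, keepdims=True)) / np.sqrt(W.shape[0])

def linear_attention_causal(Q, K, V, W):
    # Linear in N: causal prefix sums over phi(k) outer products, so no
    # N x N matrix is ever formed.
    Qf, Kf = phi(Q, W), phi(K, W)                         # (N, m)
    S = np.zeros((Kf.shape[1], V.shape[1]))               # running sum of phi(k) v^T
    z = np.zeros(Kf.shape[1])                             # running sum of phi(k)
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z + 1e-6)
    return out

# Quick check of the approximation at the post's sequence length.
rng = np.random.default_rng(0)
N, d, m = 80, 64, 256
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
W = rng.standard_normal((m, d))
err = np.abs(softmax_attention_causal(Q, K, V) - linear_attention_causal(Q, K, V, W)).mean()
print(f"mean abs difference vs softmax attention: {err:.4f}")
```

At N=80 with m=256 random features, the per-step loop clearly does more work per token than a single fused matmul, which is the crossover issue the rest of the post measures.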