Tags | experiments

Contents

attention
backpropagation
cnn
convolution
deeplearning
fast.ai
fastai
gan
gpu
llm
math
multimodal
nlp
optimization
self-supervised-learning
sequencemodelling
transformers
triton
tsai
unsloth

attention

Performer vs Softmax — From Kernel View to Fair Speed Tests • Aug 30, 2025

Attention Paths and Rank Collapse Part 1 • Mar 20, 2025

backpropagation

Making FSDP2 and QLoRA Work Together: Training Llama 8B on 2 GPUs • Dec 15, 2025

Memory-Efficient Backprop: Solving Unsloth's Challenge E • Dec 5, 2025

cnn

Do we need downsampling? • May 2, 2020

convolution

im2col • Dec 23, 2023

deeplearning

From Zero to 1.55x: Writing My First Triton Kernel for NF4 Dequantization • Dec 21, 2025

Making FSDP2 and QLoRA Work Together: Training Llama 8B on 2 GPUs • Dec 15, 2025

Memory-Efficient Backprop: Solving Unsloth's Challenge E • Dec 5, 2025

Performer vs Softmax — From Kernel View to Fair Speed Tests • Aug 30, 2025

Indic CLIP: Multimodal Understanding for Indic Languages • Apr 26, 2025

Creating a Maze Solver using Pix2Pix • Mar 29, 2025

Why Batch Size Matters - The Surprising Difference Between Batch Training and Averaging • Mar 21, 2025

Attention Paths and Rank Collapse Part 1 • Mar 20, 2025

Unveiling Position Encoding in Transformers - From Absolute to Relative with RoPE • Mar 1, 2025

Understanding Transformer Positional Encodings - A Mathematical Deep Dive • Feb 22, 2025

Understanding and Preventing Collapse in Self-Supervised Learning: A Deep Dive into BYOL • Feb 4, 2025

Gradient Clipping and Adaptive Learning Rates • Jan 12, 2025

Deep-Contextualized Embeddings (ELMO) • Dec 17, 2024

Tight-fisted Optimizer (Tiger) • Nov 19, 2024

im2col • Dec 23, 2023

Residual Learning • Sep 19, 2022

AEDA (An Easier Data Augmentation Technique for Text Classification) • Jan 31, 2022

Temporal Convolution Networks • Dec 22, 2021

Do we need downsampling? • May 2, 2020

fast.ai

Understanding and Preventing Collapse in Self-Supervised Learning: A Deep Dive into BYOL • Feb 4, 2025

Gradient Clipping and Adaptive Learning Rates • Jan 12, 2025

Deep-Contextualized Embeddings (ELMO) • Dec 17, 2024

fastai

Performer vs Softmax — From Kernel View to Fair Speed Tests • Aug 30, 2025

Indic CLIP: Multimodal Understanding for Indic Languages • Apr 26, 2025

AEDA (An Easier Data Augmentation Technique for Text Classification) • Jan 31, 2022

Temporal Convolution Networks • Dec 22, 2021

Do we need downsampling? • May 2, 2020

gan

Creating a Maze Solver using Pix2Pix • Mar 29, 2025

gpu

From Zero to 1.55x: Writing My First Triton Kernel for NF4 Dequantization • Dec 21, 2025

llm

Performer vs Softmax — From Kernel View to Fair Speed Tests • Aug 30, 2025

Attention Paths and Rank Collapse Part 1 • Mar 20, 2025

Unveiling Position Encoding in Transformers - From Absolute to Relative with RoPE • Mar 1, 2025

Understanding Transformer Positional Encodings - A Mathematical Deep Dive • Feb 22, 2025

math

Creating a Maze Solver using Pix2Pix • Mar 29, 2025

Why Batch Size Matters - The Surprising Difference Between Batch Training and Averaging • Mar 21, 2025

Attention Paths and Rank Collapse Part 1 • Mar 20, 2025

Unveiling Position Encoding in Transformers - From Absolute to Relative with RoPE • Mar 1, 2025

Understanding Transformer Positional Encodings - A Mathematical Deep Dive • Feb 22, 2025

Understanding and Preventing Collapse in Self-Supervised Learning: A Deep Dive into BYOL • Feb 4, 2025

Gradient Clipping and Adaptive Learning Rates • Jan 12, 2025

Deep-Contextualized Embeddings (ELMO) • Dec 17, 2024

Tight-fisted Optimizer (Tiger) • Nov 19, 2024

im2col • Dec 23, 2023

Residual Learning • Sep 19, 2022

AEDA (An Easier Data Augmentation Technique for Text Classification) • Jan 31, 2022

Temporal Convolution Networks • Dec 22, 2021

multimodal

Indic CLIP: Multimodal Understanding for Indic Languages • Apr 26, 2025

nlp

AEDA (An Easier Data Augmentation Technique for Text Classification) • Jan 31, 2022

optimization

Gradient Clipping and Adaptive Learning Rates • Jan 12, 2025

Deep-Contextualized Embeddings (ELMO) • Dec 17, 2024

Tight-fisted Optimizer (Tiger) • Nov 19, 2024

self-supervised-learning

Understanding and Preventing Collapse in Self-Supervised Learning: A Deep Dive into BYOL • Feb 4, 2025

sequencemodelling

Temporal Convolution Networks • Dec 22, 2021

transformers

Attention Paths and Rank Collapse Part 1 • Mar 20, 2025

Unveiling Position Encoding in Transformers - From Absolute to Relative with RoPE • Mar 1, 2025

Understanding Transformer Positional Encodings - A Mathematical Deep Dive • Feb 22, 2025

triton

From Zero to 1.55x: Writing My First Triton Kernel for NF4 Dequantization • Dec 21, 2025

tsai

Temporal Convolution Networks • Dec 22, 2021

unsloth

From Zero to 1.55x: Writing My First Triton Kernel for NF4 Dequantization • Dec 21, 2025

Making FSDP2 and QLoRA Work Together: Training Llama 8B on 2 GPUs • Dec 15, 2025

Memory-Efficient Backprop: Solving Unsloth's Challenge E • Dec 5, 2025