Multi-label text classification
Learn to build a multi-label text classifier with DistilBERT on data with imbalanced classes. Covers binary cross-entropy loss, multi-hot encoding, and practical implementation strategies for handling multiple labels.
t-distributed Stochastic Neighbor Embedding says "what"
Understand t-SNE dimensionality reduction for visualizing high-dimensional data. Covers perplexity parameter tuning, implementation with TF-IDF vectors, and interactive visualization best practices.
Library version mismatches declared not safe
Critical lessons on matching Python package versions between model development and inference. Learn about safetensors format advantages and why version mismatches cause production failures.
Mining word collocations
Extract common bigrams and trigrams from text using Gensim and NPMI scoring. Learn to mine jargon, phrases, and collocations from customer reviews, feedback, and text corpora.
Notebook Automation in the Cloud
Automate Jupyter notebooks on AWS with custom schedules and compute. Set up EventBridge, Lambda, and SageMaker for reliable, cost-efficient headless notebook execution. Complete guide from environment setup to scheduling.
Science Talk: Generative LLMs
Comprehensive introduction to generative LLMs covering basics, training processes, and real-world applications. Slides from talk delivered to 70+ attendees.
Kernel Density Estimation
Create effective KDE plots with Seaborn. Learn optimal bin settings, histogram layering, and lesser-known parameters for better distribution visualization.
Favorite Jupyter Notebook Settings
Essential Jupyter Notebook customizations to improve your data science workflow. Configuration tips for enhanced productivity and better user experience.