Multi-label text classification

Learn to build a multi-label text classifier with DistilBERT on data with imbalanced classes. Covers binary cross-entropy loss, multi-hot encoding, and practical implementation strategies for handling multiple labels.

t-distributed Stochastic Neighbor Embedding says "what"

Understand t-SNE dimensionality reduction for visualizing high-dimensional data. Covers perplexity parameter tuning, implementation with TF-IDF vectors, and interactive visualization best practices.

Library version mismatches declared not safe

Critical lessons on matching Python package versions between model development and inference. Learn about safetensors format advantages and why version mismatches cause production failures.

Mining word collocations

Extract common bigrams and trigrams from text using Gensim and NPMI scoring. Learn to mine jargon, phrases, and collocations from customer reviews, feedback, and text corpora.

Notebook Automation in the Cloud

Automate Jupyter notebooks on AWS with custom schedules and compute. Set up EventBridge, Lambda, and SageMaker for reliable, cost-efficient headless notebook execution. Complete guide from environment setup to scheduling.

Science Talk: Generative LLMs

Comprehensive introduction to generative LLMs covering basics, training processes, and real-world applications. Slides from talk delivered to 70+ attendees.

Kernel Density Estimation

Create effective KDE plots with Seaborn. Learn optimal bin settings, histogram layering, and lesser-known parameters for better distribution visualization.

Favorite Jupyter Notebook Settings

Essential Jupyter Notebook customizations to improve your data science workflow. Configuration tips for enhanced productivity and better user experience.
