Database
Browse 4,557 skills across 394 packs and 37 categories
Data Lake Storage
Triggers when users need help with data lake storage design, object storage…
Data Lakehouse
Triggers when users need help with lakehouse architecture, Delta Lake, Apache…
Data Migration
Triggers when users need help with data migration, large-scale migration…
Data Modeling
Triggers when users need help with data modeling, dimensional modeling, Kimball…
Data Orchestration
Triggers when users need help with data orchestration, Apache Airflow, DAGs…
Data Pipeline Architecture
Triggers when users need help with data pipeline design, ETL vs ELT patterns…
Data Quality
Triggers when users need help with data quality, data testing, data validation…
Data Warehousing
Triggers when users need help with cloud data warehouse design, Snowflake…
Real-Time Analytics
Triggers when users need help with real-time analytics, real-time dashboards…
Stream Processing
Triggers when users need help with stream processing, Apache Kafka architecture…
Adversarial ML
Triggers when users need help with adversarial machine learning, model robustness, or ML security. Activate for questions about adversarial attacks (FGSM, PGD, C&W, AutoAttack), adversarial training, certified robustness, model robustness evaluation, distribution shift, out-of-distribution detection, backdoor attacks, data poisoning, privacy attacks (membership inference, model extraction), and differential privacy in ML.
Convolutional Networks
Triggers when users need help with convolutional neural network architectures, CNN design patterns, or vision model selection. Activate for questions about ResNet, EfficientNet, ConvNeXt, depthwise separable convolutions, feature pyramid networks, receptive field analysis, normalization layers, Vision Transformers vs CNNs tradeoffs, and transfer learning from pretrained CNNs.
Generative Models
Triggers when users need help with generative deep learning models, image synthesis, or density estimation. Activate for questions about GANs, diffusion models, VAEs, flow-based models, DDPM, StyleGAN, mode collapse, classifier-free guidance, latent diffusion, ELBO, autoregressive generation, and evaluation metrics like FID, IS, and CLIP score.
Graph Neural Networks
Triggers when users need help with graph neural networks, graph representation learning, or applying deep learning to graph-structured data. Activate for questions about GCN, GAT, GraphSAGE, message passing, over-smoothing, graph pooling, heterogeneous graphs, temporal graphs, knowledge graphs with GNNs, molecular property prediction, social network analysis, recommendation systems on graphs, and GNN scalability.
Multimodal Learning
Triggers when users need help with multimodal deep learning, vision-language models, or cross-modal representation learning. Activate for questions about CLIP, LLaVA, Flamingo, image captioning, visual question answering, text-to-image alignment, contrastive learning across modalities, audio-visual learning, multimodal fusion strategies (early, late, cross-attention), and multimodal benchmarks.
Neural Architecture Search
Triggers when users need help with neural architecture search, automated model design, or model compression. Activate for questions about NAS methods (reinforcement learning, evolutionary, differentiable/DARTS), search spaces, one-shot NAS, hardware-aware NAS, AutoML pipelines, efficient architecture design principles, scaling strategies (width, depth, resolution), and model compression (pruning, quantization, distillation).
Recommender Systems
Triggers when users need help with recommendation systems, collaborative filtering, or ranking models. Activate for questions about matrix factorization, ALS, content-based filtering, deep recommender models (NCF, Wide&Deep, DeepFM, two-tower), sequential recommendation, cold start problem, implicit vs explicit feedback, multi-objective ranking, exploration vs exploitation, and real-time recommendation serving.
Recurrent Architectures
Triggers when users need help with recurrent neural networks, sequence modeling with LSTMs or GRUs, or modern state-space models. Activate for questions about vanishing gradients, sequence-to-sequence models, attention mechanisms in RNNs (Bahdanau, Luong), bidirectional RNNs, Mamba, S4, and when RNNs still outperform transformers for sequential data.
Regularization & Generalization
Triggers when users need help with preventing overfitting, improving model generalization, or applying regularization techniques. Activate for questions about dropout, weight decay, data augmentation (CutMix, MixUp, RandAugment, AugMax), label smoothing, early stopping, knowledge distillation, ensemble methods, bias-variance tradeoff in deep learning, and double descent phenomenon.
Self-Supervised Learning
Triggers when users need help with self-supervised learning, representation learning without labels, or pretext task design. Activate for questions about contrastive learning (SimCLR, MoCo, BYOL), masked modeling (MAE, BEiT, data2vec), pretext tasks, representation evaluation (linear probing, fine-tuning), self-supervised methods for vision vs NLP vs audio, DINO and DINOv2, and curriculum learning.
Speech & Audio ML
Triggers when users need help with speech processing, audio machine learning, or sound generation. Activate for questions about ASR architectures (CTC, attention-based, Whisper), text-to-speech (Tacotron, VITS, neural codec models), speaker verification, speaker diarization, audio classification, music generation, speech enhancement, speech separation, mel spectrograms, and audio tokenization (SoundStream, EnCodec).
Training Optimization
Triggers when users need help with deep learning training procedures, optimizer selection, or training efficiency. Activate for questions about SGD, Adam, AdamW, LAMB, Lion, learning rate schedules, gradient clipping, mixed precision training, FP16, BF16, gradient accumulation, weight initialization, loss landscape analysis, and hyperparameter tuning including Bayesian optimization and population-based training.
Transfer Learning
Triggers when users need help with transfer learning, fine-tuning pretrained models, or parameter-efficient adaptation. Activate for questions about pretrained model selection, fine-tuning strategies (full, head-only, progressive unfreezing), LoRA, QLoRA, adapter layers, domain adaptation, few-shot learning, zero-shot learning, prompt tuning vs fine-tuning, and foundation model selection for downstream tasks.
Transformer Architectures
Triggers when users need help with transformer model architectures, self-attention mechanisms, or positional encoding strategies. Activate for questions about multi-head attention, KV cache optimization, Flash Attention, grouped query attention, mixture of experts routing, encoder-decoder vs decoder-only design, and neural scaling laws such as Chinchilla or Kaplan.
Distributed Training
Triggers when users need help with distributed ML training, including data parallelism (DDP, FSDP), model parallelism (tensor, pipeline), DeepSpeed ZeRO stages 1-3, Megatron-LM, 3D parallelism, communication backends (NCCL, Gloo), gradient compression, checkpoint strategies, fault tolerance, and elastic training.
Feature Stores
Triggers when users need help with feature store architecture and implementation, including Feast, Tecton, and Hopsworks. Activate for questions about online vs offline feature serving, feature computation pipelines, point-in-time correctness, feature reuse, feature freshness, streaming features, and feature monitoring and drift detection.
GPU Infrastructure
Triggers when users need help with GPU infrastructure for ML workloads, including GPU cluster architecture (A100, H100, H200, B200), NVIDIA CUDA ecosystem, multi-GPU training setup, InfiniBand networking, NVLink, GPU memory management, spot instances for training, cloud GPU comparison across AWS, GCP, Azure, Lambda, and CoreWeave, and on-prem vs cloud cost analysis.
Inference Optimization
Triggers when users need help with ML inference optimization, including model quantization (INT8, INT4, GPTQ, AWQ, GGUF), pruning strategies, knowledge distillation, ONNX Runtime, TensorRT, operator fusion, batching strategies, speculative decoding, and KV cache optimization. Activate for questions about reducing model latency, improving throughput, or lowering inference costs.
ML CI/CD
Triggers when users need help with CI/CD for ML systems, including training pipelines, model validation, and deployment automation. Activate for questions about GitHub Actions or GitLab CI for ML, automated retraining triggers, model validation gates, deployment strategies (blue-green, canary, shadow), infrastructure as code for ML, and environment reproducibility with Docker, conda, and pip-tools.
ML Cost Optimization
Triggers when users need help with ML cost optimization, including compute cost management for training and inference, spot instance strategies, model size vs accuracy tradeoffs, right-sizing GPU instances, caching strategies, batch inference optimization, managed vs self-hosted infrastructure decisions, FinOps for ML teams, and cost attribution and chargeback models.
ML Experiment Tracking
Triggers when users need help with ML experiment tracking, including Weights & Biases, MLflow, Neptune, or ClearML setup and configuration. Activate for questions about experiment organization, metric logging, artifact management, hyperparameter sweeps, team collaboration in experiment platforms, and cost tracking across training runs.
ML Monitoring
Triggers when users need help with ML model monitoring in production, including data drift detection (PSI, KL divergence, KS test), concept drift, model performance monitoring, prediction monitoring, alerting strategies, shadow mode deployment, ground truth collection, monitoring dashboards, and SLA management for ML systems.
ML Platform Design
Triggers when users need help with internal ML platform architecture and design, including self-serve ML infrastructure, platform team responsibilities, abstraction layers for data scientists, notebook-to-production workflows, multi-tenant ML platforms, platform metrics and adoption, and build vs buy decisions for ML tools.
ML Testing
Triggers when users need help with testing ML systems, including unit testing ML code, integration testing ML pipelines, data validation testing, model quality testing with regression tests and performance thresholds, training pipeline testing, serving endpoint testing, load testing for ML systems, test data management, and property-based testing for data transforms.
Model Registry
Triggers when users need help with model versioning and registry systems, including MLflow Model Registry, Weights & Biases, and SageMaker Model Registry. Activate for questions about model lifecycle management, staging and production transitions, approval workflows, model metadata and lineage, packaging formats, CI/CD integration, and model governance and compliance.
Model Serving
Triggers when users need help with model serving and deployment, including serving frameworks like TorchServe, Triton Inference Server, TensorFlow Serving, BentoML, or vLLM. Activate for questions about online vs batch vs streaming inference, REST and gRPC APIs, model warm-up, autoscaling, multi-model serving, A/B testing for models, and canary deployments.
Algorithms & Data Structures
Triggers when users need help with algorithm design, data structure selection, or complexity analysis.
Compiler Design
Triggers when users need help with compiler design, language implementation, or code generation.
Computational Complexity
Triggers when users need help with computational complexity theory or its practical implications.
Computer Architecture
Triggers when users need help with computer architecture, hardware performance, or low-level optimization.
Computer Networking
Triggers when users need help with computer networking concepts, protocols, or architecture.
Concurrent & Parallel Programming
Triggers when users need help with concurrent or parallel programming. Activate for questions about…
Cryptography
Triggers when users need help with cryptography concepts, protocols, or implementation decisions.
Database Internals
Triggers when users need help with database internals, storage engines, or query optimization.
Distributed Systems
Triggers when users need help with distributed systems design or debugging. Activate for questions…
Formal Methods
Triggers when users need help with formal methods, formal verification, or rigorous specification.
Information Retrieval
Triggers when users need help with information retrieval, search systems, or ranking algorithms.
Operating Systems
Triggers when users need help with operating system concepts, internals, or system-level programming.
Programming Language Theory
Triggers when users need help with programming language theory, type systems, or language design.
Systems Design
Triggers when users need help with large-scale system design, architecture, or capacity planning.
LLM Agents
Triggers when users need help with LLM agent design, tool use, or multi-agent systems.
LLM Application Patterns
Triggers when users need help with LLM application design patterns and architectures.
LLM Cost Management
Triggers when users need help with LLM cost optimization, budgeting, or economic analysis.
LLM Evaluation
Triggers when users need help with LLM evaluation, benchmarking, or assessment methodology.
LLM Fine-Tuning
Triggers when users need help with LLM fine-tuning, adaptation, or specialization.
LLM Inference Optimization
Triggers when users need help with LLM inference optimization, serving, or deployment performance.
LLM Pretraining
Triggers when users need help with LLM pretraining, data curation, or training infrastructure.
LLM Safety Guardrails
Triggers when users need help with LLM safety, guardrails, or content moderation systems.
Prompt Engineering Advanced
Triggers when users need help with advanced prompt engineering techniques for LLMs.
RAG Architecture
Triggers when users need help with RAG systems, retrieval-augmented generation, or knowledge-grounded LLM applications.