A look under the hood: How (and why) we built Question Assistant

What is a good question?

Stack Overflow recently launched Question Assistant, a tool that gives askers automated feedback while they draft a question, with the aim of improving question quality. Early attempts to have Large Language Models (LLMs) rate question quality directly proved unreliable: the feedback was repetitive and did not correlate with the quality categories the team had defined.

Why This Matters

The goal is for AI to assess question quality and suggest improvements on its own, reducing the burden on human moderators. But quality is subjective: a model needs consistently labeled data to learn from, and an LLM without that grounding tends to produce generic feedback. Poor questions waste moderator time, go unanswered, and degrade the user experience, which costs the platform engagement and may contribute to knowledge silos.

Key Insights

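Low Krippendorff’s alpha score for human ratings: A survey of 1,000 Stack Overflow reviewers yielded a low Krippendorff’s alpha, a measure of inter-rater agreement. The reviewers disagreed with one another too often for their ratings to serve as reliable training labels for ML models (2025).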
Indicator models over direct scoring: Instead of predicting a quality score, the team built individual logistic regression models to identify specific areas needing improvement.
Azure Databricks and Kubernetes: The models were trained and stored in Azure Databricks and deployed via Azure Kubernetes for scalable prediction generation.
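
The first insight above rests on Krippendorff's alpha, which compares observed disagreement among raters with the disagreement expected by chance: alpha = 1 - D_o / D_e. The sketch below computes it for nominal labels using only the standard library. It is an illustration, not the team's actual computation: the source does not say which level of measurement was used, so the nominal variant is an assumption, and the function name and sample labels are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels that
    different raters assigned to one item (missing ratings omitted).
    """
    # Coincidence matrix: every ordered pair of ratings within an item
    # contributes 1 / (m - 1), where m is the number of ratings on it.
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue  # an item rated once cannot be paired
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and the overall number of pairable ratings.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Disagreement counts only the off-diagonal cells (a != b).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    )
    if n < 2 or expected == 0:
        raise ValueError("alpha is undefined: no variation or no pairable ratings")
    return 1 - observed * (n - 1) / expected


# Hypothetical ratings: two reviewers label four questions as "good" or "poor".
perfect = [["good", "good"], ["poor", "poor"]]
mixed = [["good", "good"], ["good", "poor"], ["poor", "poor"], ["poor", "good"]]

alpha_perfect = krippendorff_alpha_nominal(perfect)
alpha_mixed = krippendorff_alpha_nominal(mixed)
print(alpha_perfect, alpha_mixed)
```

An alpha of 1 means perfect agreement and values near 0 mean agreement no better than chance, which is why a low score signals that the ratings make poor training labels.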

Working Example

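# Example of TF-IDF vectorization (conceptual)
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus: each string is treated as one document
corpus = [
    "This question lacks context.",
    "Please provide more details about your problem.",
    "Include a minimal reproducible example."
]

# Learn the vocabulary and IDF weights, then build the document-term matrix
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

# Columns are vocabulary terms; rows are the TF-IDF weights per document
print(vectorizer.get_feature_names_out())
print(X.toarray())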
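
To connect the vectorization above to the indicator models described under Key Insights, the following sketch chains TF-IDF features into a single binary logistic regression classifier. It assumes the classifiers consume TF-IDF features, which this summary only implies. The training questions, the labels, and the "needs more detail" indicator are all hypothetical, and the toy data is far too small to produce a meaningful model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training set for ONE indicator: 1 = "needs more detail".
questions = [
    "It does not work, please help",
    "Why is my code broken",
    "Error when I run the program",
    "How do I fix this problem",
    "Calling json.loads on an empty string raises JSONDecodeError; here is the traceback and a minimal example",
    "Pandas merge returns duplicate rows; sample input, expected output, and code included",
    "Segfault in C when freeing a linked list; full source and valgrind output attached",
    "Regex fails to match dates in ISO format; test cases and current pattern shown",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# One pipeline per indicator: vectorize the text, then fit a binary classifier.
indicator = make_pipeline(TfidfVectorizer(), LogisticRegression())
indicator.fit(questions, labels)

# Probability that a new draft triggers this indicator (column 1 = class 1).
draft = "My script gives an error, what is wrong"
probability = indicator.predict_proba([draft])[0, 1]
print(f"needs more detail: {probability:.2f}")
```

Training a separate pipeline like this for each improvement area yields targeted yes/no signals rather than a single quality score, which matches the design choice described in the insights.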

Practical Applications

Stack Overflow: Question Assistant provides automated feedback to all users asking questions, improving question quality and success rates.
Pitfall: Relying solely on LLMs for subjective quality assessment can lead to generic and unhelpful feedback, hindering usability.

References:

https://stackoverflow.blog/2025/12/31/a-look-under-the-hood-how-and-why-we-built-question-assistant/
