Understanding AI Risks in Talk to the City

Talk to the City (T3C) uses Large Language Models (LLMs) to strengthen collective decision-making by transforming large-scale public input into actionable insights. Unlike traditional polling or commercial survey tools, T3C preserves the nuance of individual perspectives and captures authentic voices, while surfacing the broader themes and differences that matter most. Like all AI systems, LLMs have limitations and risks that users should understand.

LLMs excel at recognizing language patterns, identifying themes, and summarizing complex information. However, they are prediction machines, not truth machines. They generate text based on patterns they have learned, not from verified facts. When an LLM produces text that sounds plausible but is not grounded in source materials, it is called a "hallucination."

In the context of summarizing large opinion datasets, this might mean:

  • Invented claims: Generating statements no participants actually made.
  • Overgeneralization: Expanding a specific statement ("I love my German Shepherd") into a broad conclusion ("People prefer large dog breeds").
  • Unwarranted assumptions: Inferring motivations or beliefs that were not expressed.
  • Misattributed quotes: Pairing quotes with claims they do not support.

How Talk to the City Mitigates Hallucinations

We have built multiple safeguards and validation steps into Talk to the City's processing pipeline to mitigate these risks:

Claim Extraction with Constrained Output

For each comment, the LLM extracts explicit claims and must link each claim to verbatim quotes from real people that support it. The LLM can only use topic and subtopic names generated during the initial extraction phase; no variations or new names are allowed.
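As a rough illustration of what this constrained output can look like (a sketch only, not Talk to the City's actual code; the function, field, and taxonomy names here are hypothetical), the pipeline can reject any claim whose topic, subtopic, or quote falls outside the allowed bounds:

```python
import json

# Hypothetical fixed taxonomy produced earlier in the pipeline.
ALLOWED_TAXONOMY = {
    "Pet Ownership": {"Dogs", "Cats", "Birds"},
    "Public Spaces": {"Parks", "Leash Rules"},
}

def validate_extraction(raw_llm_output: str, source_comment: str) -> list[dict]:
    """Keep only claims that use an allowed topic/subtopic and quote the comment verbatim."""
    # Expect a JSON list of objects like {"claim", "quote", "topic", "subtopic"}.
    claims = json.loads(raw_llm_output)
    valid = []
    for claim in claims:
        topic = claim.get("topic")
        subtopic = claim.get("subtopic")
        quote = claim.get("quote", "")
        if topic not in ALLOWED_TAXONOMY:
            continue  # reject topic names the LLM invented
        if subtopic not in ALLOWED_TAXONOMY[topic]:
            continue  # reject subtopic names outside the fixed taxonomy
        if quote not in source_comment:
            continue  # reject quotes that are not verbatim from the source comment
        valid.append(claim)
    return valid
```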

Short comments (fewer than three words) are filtered out because they can cause the LLM to hallucinate by inventing missing details.
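A minimal sketch of that filtering rule (the three-word threshold comes from the text above; the function name is ours, not the pipeline's):

```python
def is_too_short(comment: str, min_words: int = 3) -> bool:
    """Comments with fewer than three words are excluded before claim extraction."""
    return len(comment.split()) < min_words

comments = ["ok", "I love my German Shepherd", "agreed"]
substantive = [c for c in comments if not is_too_short(c)]  # keeps only the second comment
```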

Each extracted claim in a Talk to the City report is directly traceable to real people's opinions. Clicking a claim reveals the exact supporting quotes and participants behind it, so you can verify whether the summary is fair and accurate.

We also use automated tests to flag low-quality extractions:

  • Overly generic claims ("communication is important")
  • Personal preferences that are too vague to contribute to deliberation on the subject (e.g. "I like cats" in a report on the ethics of pet ownership)
  • Mismatched or incomplete quotes
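A simplified sketch of what such checks could look like (the heuristics and patterns below are illustrative only; checks like the vague-preference test may rely on LLM-based grading rather than simple rules):

```python
import re

# Illustrative patterns for claims that say almost nothing on their own.
GENERIC_PATTERNS = [r"\bis important\b", r"\bmatters\b"]

def flag_low_quality(claim: str, quote: str, source_comment: str) -> list[str]:
    """Return quality flags for one extracted claim."""
    flags = []
    if len(claim.split()) <= 4 and any(re.search(p, claim.lower()) for p in GENERIC_PATTERNS):
        flags.append("overly_generic_claim")  # e.g. "communication is important"
    if not quote:
        flags.append("missing_quote")
    elif quote not in source_comment:
        flags.append("quote_not_verbatim")  # mismatched or paraphrased quote
    return flags
```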

Transparency and Auditability

Every report also includes a detailed audit log of all processing decisions, showing:

  • Filtering: Which comments were excluded (too short, etc.) and why
  • Deduplication: Which claims were merged and why
  • Extraction results: Success, errors, or flagged issues
  • Metadata: Timestamps, model versions, and configuration details

This enables full traceability. If you discover a possible hallucination or misclassification, please report it.
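To make this concrete, an individual audit-log entry might resemble the structure below (field names and values are hypothetical, not the actual log schema):

```python
# Hypothetical audit-log entries; structure shown for illustration only.
audit_log = [
    {
        "step": "filtering",
        "comment_id": "c-0042",
        "action": "excluded",
        "reason": "fewer than three words",
        "timestamp": "2025-01-15T12:00:00Z",
    },
    {
        "step": "deduplication",
        "merged_claim_ids": ["cl-17", "cl-23"],
        "reason": "near-identical wording",
        "model_version": "example-model-v1",
        "timestamp": "2025-01-15T12:03:10Z",
    },
]
```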

Remaining Risks

Despite our safeguards, some risks remain:

1. Subtle Overgeneralization

The AI might slightly exaggerate or reframe a sentiment. For example:

  • Comment: "I'm not sure about birds"
  • Claim: "Birds are not ideal pets for everyone"

The claim adds certainty not present in the original comment.

2. Topic Sorting Ambiguity

LLM topic sorting may differ from human intuition, occasionally merging or fragmenting categories in unexpected ways.

3. Missing Edge Cases

Niche or minority perspectives can be underrepresented or absorbed into broader themes.

4. Bias Inheritance

Because LLMs reflect patterns in their training data, they may reproduce cultural or societal biases in phrasing or emphasis.

To make the most of your report, stay alert to two types of potential issues:

AI Interpretation Limits: These relate to how language models interpret or summarize text.

  • Quotes that don't clearly support the claims they are attached to.
  • Vague or ambiguous comments turned into confident or detailed statements.

Underlying Data Gaps: These arise from limitations in the input itself—such as low participation, uneven representation, or missing perspectives. AI analysis cannot correct for these gaps.

  • Topics supported by very few quotes or contributors.
  • Overly uniform consensus with no visible disagreement.
  • Missing or underrepresented viewpoints you expected to see.
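If you export a report's claims, a quick check like the one below can surface thinly supported topics. This is a sketch that assumes each claim record carries hypothetical "topic" and "participant_id" fields; the threshold is arbitrary and should be adjusted to your dataset:

```python
def thin_topics(claims: list[dict], min_contributors: int = 3) -> list[str]:
    """List topics supported by fewer distinct contributors than the threshold."""
    contributors_per_topic: dict[str, set[str]] = {}
    for claim in claims:
        contributors_per_topic.setdefault(claim["topic"], set()).add(claim["participant_id"])
    return [
        topic
        for topic, people in contributors_per_topic.items()
        if len(people) < min_contributors
    ]
```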

Human Oversight Matters

Think of Talk to the City as a highly capable research assistant: fast, consistent, and insightful, but still requiring human oversight and judgment. Use reports as a starting point for conversation, not the final conclusion. Verify key claims, examine original quotes, and apply your contextual knowledge. Our goal is to help make your community's voice easier to hear—clearly, honestly, and with full awareness of technological limits.
