🗓️ Week 10
LLMs in legal and sensitive contexts: Review & Commentary

DS101 – Fundamentals of Data Science

01 Dec 2025

Overview

Purpose of this deck:

This deck provides commentary, context, and deeper analysis of the cases we discussed in class. It is intended as:

  • A review resource
  • Additional context on technical and legal issues
  • Connections between themes we identified
  • Further reading for those interested

What this covers:

  1. Technical background on LLM hallucinations
  2. Detailed case analysis (UK focus, with US comparison)
  3. Regulatory responses and frameworks
  4. Themes from our class discussion
  5. Implications for data science practice
  6. Resources for further exploration

Note: This is not a replacement for the readings, but a supplement to aid your understanding.

Part 1: Understanding LLM Hallucinations

The Technical Foundation

What are LLMs?

Large Language Models (like GPT-4, Claude, Gemini) are:

  • Neural networks trained on massive text datasets
  • Designed to predict the next most likely token (word/subword)
  • Optimized for coherence and fluency, not truth or accuracy
  • Statistical pattern-matching systems, not knowledge bases

How they generate text:

  1. Given input text (a prompt), the model calculates a probability distribution over all possible next tokens
  2. It selects a token based on this distribution (with some randomness for variety)
  3. It repeats the process, token by token, until completion
  4. Each choice is driven by patterns learned during training and the preceding context, not by any check against facts

Critical limitation:

LLMs have no mechanism to verify truth. They generate what sounds plausible based on patterns, not what is factually correct.
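
The loop below is a minimal, toy sketch of the four-step generation process described above. The `toy_next_token_probs` function and the tiny `VOCAB` are made-up stand-ins (a real LLM computes the distribution with a neural network over tens of thousands of tokens); the point is structural, not realistic.

```python
# Toy sketch of autoregressive generation (illustrative only, not a real LLM).
# `toy_next_token_probs` and VOCAB are made-up stand-ins: a real model would compute
# the distribution with a trained neural network over its full vocabulary.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "court", "held", "that", "[2010]", "EWCA", "Civ", "116", "."]

def toy_next_token_probs(context):
    """Stand-in for the model: returns a probability distribution over VOCAB."""
    logits = rng.normal(size=len(VOCAB))      # pretend these came from a trained network
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                    # softmax -> probabilities

def generate(prompt, max_new_tokens=10, temperature=1.0):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = toy_next_token_probs(tokens)
        probs = probs ** (1.0 / temperature)  # temperature reshapes the distribution
        probs = probs / probs.sum()
        next_token = rng.choice(VOCAB, p=probs)   # sample: the "randomness for variety"
        tokens.append(next_token)             # note: no step here ever checks truth
        if next_token == ".":
            break
    return " ".join(tokens)

print(generate(["the", "court"]))
```

Running this produces fluent-looking token sequences regardless of whether anything in them is true, which is exactly the property that makes unverified citations dangerous.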

What Are Hallucinations?

Definition:

Hallucinations occur when LLMs generate outputs that are not grounded in their training data or factual reality - essentially, making things up while sounding confident.

Why they happen:

  • LLMs learn statistical associations between words/concepts
  • When asked to produce something (e.g., a legal case to support an argument), they generate text that:
    • Matches the expected format (case citations look like real citations)
    • Fits the context (supports the argument being made)
    • Sounds plausible (uses appropriate legal language)
    • But may be entirely fabricated

Why this is fundamental, not fixable:

Recent research demonstrated that hallucinations are mathematically inevitable in current LLM architectures. You cannot train a model to never hallucinate without:

  1. Explicit examples of every possible hallucination to avoid (an infinite set); see “(Im)possibility of Automated Hallucination Detection in Large Language Models”, Karbasi et al. (2025)
  2. External verification system (not part of standard LLMs)
  3. Fundamental redesign of how models work

Video explanation: Why large language models hallucinate

Part 2: Case Studies - Detailed Analysis

UK Case 1: Muhammad Mujeebur Rahman

See a narration of the case here and here

What happened:

  • Rahman filed grounds of appeal in an immigration case
  • The document cited a Court of Appeal case: Y (China) [2010] EWCA Civ 116
  • The Tribunal attempted to locate the case and discovered it does not exist
  • When questioned, Rahman initially insisted the authority was genuine
  • After further challenge, he eventually admitted that the citation had been produced using a generative AI tool
  • He stated he had not realised AI could fabricate case law
  • The Tribunal concluded he had given multiple inconsistent explanations before admitting the true source

UK Case 1: Muhammad Mujeebur Rahman (continued)

Court’s response:

  • The Tribunal found Rahman had misled the court by repeatedly asserting the case was real
  • It accepted that he did not initially understand AI hallucination risks, mitigating the original mistake
  • No contempt referral or criminal action was pursued
  • However: the Tribunal considered his explanations inadequate and referred him to the Bar Standards Board
  • The decision stressed the seriousness of failing to verify authorities before submission

UK Case 1: Muhammad Mujeebur Rahman (continued)

Mitigating factors:

  • He eventually admitted the truth (after several explanations)
  • The Tribunal accepted he was unaware that AI could invent citations
  • No evidence of deliberate fabrication
  • Early-stage case in the profession’s learning curve regarding AI tools

Key quote from Tribunal’s reasoning:

The panel stated that Rahman had:

“directly attempted to mislead the Tribunal through reliance on Y (China)… and only made a full admission in his third explanation,” concluding that he had not “acted with integrity and honesty.”

Lesson: AI may cause the initial error, but the advocate remains responsible for checking authorities and for giving clear, truthful explanations to the court. Candour is essential — and its absence, rather than the AI itself, is what leads to regulatory consequences.

UK Case 2: Ayinde (May 2025)

Background:

Judicial review in the High Court (Administrative Court). The issue arose when counsel for the claimant (Ms Sarah Forey) and the claimant’s solicitors submitted five entirely fictitious cases in the claimant’s Statement of Facts and Grounds. These authorities were later shown to be AI-generated.

What happened:

  • Counsel for the claimant drafted a Statement of Facts and Grounds containing five fake cases
  • When the defendant (London Borough of Haringey) attempted to look up the cases, none existed
  • The defendant requested copies and explanations multiple times
  • The claimant’s solicitors responded with a dismissive letter calling them “cosmetic” and “minor citation errors”
  • No explanation was provided until the wasted-costs hearing
  • Counsel attempted to claim she had a “box of photocopied cases” and “lists of ratios”, but this was rejected by the judge as implausible
  • The judgment strongly suggests the cases were generated using AI (though the court could not make a formal finding because counsel was not cross-examined)

UK Case 2: Ayinde (continued)

The fabricated cases included:

(from paras 18, 20, 55–63)

  • R (El Gendi) v Camden LBC [2020] EWHC 2435 (Admin)
  • R (Ibrahim) v Waltham Forest LBC [2019] EWHC 1873 (Admin)
  • R (H) v Ealing LBC [2021] EWHC 939 (Admin)
  • R (KN) v Barnet LBC [2020] EWHC 1066 (Admin)
  • R (Balogun) v Lambeth LBC [2020] EWCA Civ 1442

All were entirely invented, complete with fabricated facts and fabricated legal principles.

What made this worse:

  • Multiple fake cases across four grounds
  • Repeated failure to explain when challenged
  • Dismissive and unprofessional communications, calling the fakes “cosmetic errors”
  • Misstatement of statutory duties (portraying a discretionary duty as mandatory)
  • Attempt to introduce new authorities orally at the wasted-costs hearing without evidence or copies
  • Potentially AI-generated submissions, unverified

The judge described the conduct as:

“wholly improper… unreasonable… and negligent” and “a substantial difficulty with members of the Bar who put fake cases in statements of facts and grounds.” (paras 63–64)

UK Case 2: Ayinde (continued)

Court’s findings:

  • The court held that counsel and the solicitors had acted improperly, unreasonably, and negligently (para 64)
  • The inclusion of fake cases was “professional misconduct” (para 64)
  • The court rejected counsel’s explanations for how the fake cases were created (paras 52–58)
  • The court held the behaviour “undermines the integrity of the legal profession and the Bar” (para 66)

Sanctions:

1. Wasted Costs Order

  • The High Court imposed a wasted-costs order of £4,000 (para 71)

  • Liability shared equally:

    • £2,000 payable by counsel (Ms Forey)
    • £2,000 payable by the solicitors (Haringey Law Centre) (para 72)

2. Reduction of recoverable costs

  • The claimant’s own recoverable costs were further reduced by £7,000 due to the fake cases (paras 73–77)

3. Regulatory referral

The judge ordered (para 78):

  • The transcript must be sent to the Bar Standards Board
  • And to the Solicitors Regulation Authority
  • Counsel should have self-reported, and the solicitors “should have reported themselves” (para 64)

This is close to the strongest judicial condemnation possible short of contempt.

UK Case 2: Ayinde (continued)

Mitigating / contextual factors:

  • There was no finding that the fake cases had been maliciously fabricated
  • Counsel denied dishonesty
  • The judge could not formally find AI-use because counsel was not cross-examined
  • The claimant’s substantive case (housing for a medically vulnerable homeless man) was strong, and counsel’s underlying legal points might have succeeded even without the fake authorities (para 57)

However, the mitigating factors did not prevent sanctions.

Key quote from judgment:

“It is wholly improper to put fake cases in a pleading… providing a fake description of five fake cases… qualifies quite clearly as professional misconduct.” (para 64)

Lesson:

Submitting AI-generated authorities without verification is serious misconduct, but the real professional failure was:

  • The lack of candour,
  • The dismissive responses, and
  • The attempt to minimise what were actually grave errors.

This case stands as the clearest and most serious UK example of AI-fabricated case law leading to real financial sanctions and regulatory referral.

You can read the full judgment in this case here

David Allen Green wrote an extensive analysis of the case in his blog.

Khaleed Moyed (Partner at Gunnercooke LLP) also wrote a blog post analysing this case (which is pretty much jurisprudence at this point!)

UK High Court Warning (June 2025)

Context:

After multiple cases of AI-generated fake citations, the High Court issued a formal warning to the legal profession.

The warning:

Dame Victoria Sharp (President of the King’s Bench Division) and Justice Jeremy Johnson issued a statement:

“Freely available generative artificial intelligence tools, trained on a large language model such as ChatGPT, are not capable of conducting reliable legal research. Such tools can produce apparently coherent and plausible responses to prompts, but those coherent and plausible responses may turn out to be entirely incorrect. The responses may make confident assertions that are simply untrue. They may cite sources that do not exist. They may purport to quote passages from a genuine source that do not appear in that source.”

UK High Court Warning (continued)

Key points:

  1. General-purpose LLMs unsuitable for legal research
    • ChatGPT, Claude, Gemini, etc. are not reliable for finding cases
    • Specialized legal research AI may be different (with caveats)
  2. Personal responsibility emphasized
    • Lawyers cannot delegate their duty to verify.
    • They have a duty to check the authenticity of every authority
    • The same standards apply as with work from pupils or junior staff
    • “I relied on AI” is not an excuse
    • Professional obligations remain paramount
  3. Consequences outlined
    • Wasted costs orders
    • Referral to professional regulators
    • Potential contempt of court charges
    • Could lead to being struck off (in serious/repeated offences)

Significance:

  • Most senior judges addressing the issue directly
  • Clear institutional position: this is serious
  • Warning gives profession “one last chance”
  • Future cases likely to face harsher penalties

Link: The Guardian coverage

You can find the original text of the warning here

US Case 1: California – $10,000 Sanction (Sept 2025)

Background:

In Noland v. Land of the Free, L.P. (Cal. Ct. App., 2nd Dist., Sept. 12, 2025, B331918), California attorney Amir Mostafavi filed appellate briefs containing numerous AI-generated fabricated quotations and citations.

What happened:

  • Mostafavi drafted an opening brief for a civil appeal
  • He then used ChatGPT to “enhance” the brief, and ran it through other AI tools to “check” it
  • The filed briefs contained quotations attributed to case law that looked plausible
  • The court later found that 21 of 23 quoted passages in the opening brief were fabricated, with more fabrications in the reply brief
  • Some cited cases did not say what was claimed, and some did not exist at all
  • The Court of Appeal issued an order to show cause and then a published opinion “as a warning” to the profession

US Case 1: California – $10,000 Sanction (continued)

Difference from UK cases:

  • Mixed pattern of error:

    • Some authorities were real but misquoted or mischaracterised
    • Some authorities were non-existent
  • So the problem was broader than just “fake quotations from real cases”

  • Like the UK cases, the real issue was complete failure to verify AI output

California 2nd District Court of Appeal’s ruling:

  • Imposed a $10,000 sanction payable to the court for:

    • Filing a frivolous appeal
    • Violating court rules
    • Submitting fabricated quotations and citations
  • Issued a published opinion as a warning, including the line (widely quoted in commentary):

    “No brief, pleading, motion, or any other paper filed in any court should contain any citations — whether provided by generative AI or any other source — that the attorney responsible for submitting the pleading has not personally read and verified.”

  • Referred Mostafavi to the State Bar for potential discipline

Mostafavi’s response (publicly reported):

“In the meantime we’re going to have some victims, we’re going to have some damages, we’re going to have some wreckages. I hope this example will help others not fall into the hole. I’m paying the price.”

(Reported in CalMatters.)

US Case 1: California – $10,000 Sanction (continued)

Key difference from UK:

  • Monetary sanction: $10,000 directly to the court (significantly higher than typical wasted-costs orders we’ve seen so far in the UK)
  • More overtly punitive framing: labelled a frivolous appeal and a waste of judicial resources
  • Very explicit ‘technology-neutral’ rule: the duty is to personally read and verify, regardless of whether the source is AI, a research database, or a human assistant

Link: CalMatters coverage

LawNext coverage

Case judgment

US Case 2: Morgan & Morgan (Feb 2025)

Background:

Attorneys from Morgan & Morgan (one of America’s largest personal injury firms) filed motions in Wadsworth v. Walmart Inc. citing nine cases they could not verify existed.

Significance:

  • Major law firm: Over 900 attorneys nationwide
  • Well-resourced: Full access to Westlaw, LexisNexis, support staff
  • Shows this is not just a problem for solo practitioners or under-resourced lawyers

What happened:

  • Attorneys filed motions with nine case citations
  • Opposing counsel challenged the citations
  • Judge Kelly Rankin ordered attorneys to show cause
  • Attorneys could not verify whether cases existed
  • Appeared to be AI hallucinations

Outcome:

  • Rudwin Ayala: Fined $3,000
  • T. Michael Morgan: Fined $1,000
  • Taly Goody: Fined $1,000
  • Firm threatened to fire responsible attorneys
  • Public embarrassment for major firm

Why this matters:

  • Eliminates “lack of resources” excuse
  • Morgan & Morgan has everything needed to verify citations
  • Suggests cultural problem: over-reliance on technology without verification
  • Pressure for efficiency may trump careful review

Analysis: David Lat’s detailed coverage

Lesson: Technology without professional culture of verification fails regardless of resources.

US Case 3: Self-Representation Trend (Oct 2025)

The growing phenomenon:

NBC News documented a trend: Americans increasingly using ChatGPT instead of lawyers to represent themselves in court.

Example: Lynn White

  • Facing eviction from Long Beach trailer park
  • Lost initial jury trial with a lawyer
  • Used ChatGPT and Perplexity to prepare her appeal
  • Successfully represented herself using AI assistance
  • Her reflection: “It was like having God up there responding to my questions”

The double-edged sword:

Positive aspects:

  • Democratizes legal knowledge
  • Reduces cost barrier to justice
  • Empowers individuals who can’t afford representation
  • Can produce competent (if basic) legal work

US Case 3: Self-Representation Trend (continued)

Serious risks:

  • No error detection: Non-lawyers can’t spot hallucinations
  • Overconfidence: AI sounds authoritative even when wrong
  • Procedural mistakes: May miss critical filing requirements
  • Rights waived unknowingly: May not understand what they’re giving up
  • Opposing counsel advantage: Lawyers can exploit mistakes

Scale of the problem:

  • 348 cases in Charlotin’s database involved pro se (self-represented) litigants
  • Rate accelerating as more people discover ChatGPT
  • Mix of successes and disasters

Ethical dilemma:

Should we restrict AI use by self-represented litigants even though they can’t afford lawyers? Or accept higher error rates as price of access?

Link: NBC News article

Child Protection: Victoria, Australia (Sept 2024)

Background:

Child protection worker in Victoria used ChatGPT to help write court documents, uploading sensitive information about at-risk children.

What happened:

  • Worker uploaded identifying details of children in protection proceedings
  • Used ChatGPT to draft court documents and case notes
  • Data sent to OpenAI servers (located in US)
  • Other potential instances discovered during investigation

OVIC Investigation findings:

The Office of the Victorian Information Commissioner (OVIC) found:

  • Breach of Information Privacy Principles 3.1 (data quality) and 4.1 (data security)
  • Sensitive data about vulnerable children potentially:
    • Stored indefinitely on OpenAI servers
    • Used to train future models
    • Accessible to OpenAI staff
    • Subject to US data laws, not Australian privacy protections

Child Protection: Victoria, Australia (continued)

Why this is uniquely serious:

  1. Vulnerable population: Children at risk of significant harm
  2. Strict confidentiality: Child protection proceedings are closed to public
  3. Identifying information: Names, family circumstances, abuse allegations
  4. International data transfer: Australian child data to US company
  5. No control: Once uploaded, department cannot ensure deletion
  6. Legal requirement: Child protection workers have statutory duty to protect privacy

OVIC’s order:

  • Complete ban on child protection staff using ChatGPT and similar tools until November 2026
  • Block access to: ChatGPT, Jasper, Claude, Gemini, Copilot, etc.
  • Quarterly compliance monitoring
  • Implement technical controls to prevent access
  • Review and update policies

Link: The Guardian coverage

Healthcare: AMA Concerns (July 2023)

Background:

Australian Medical Association raised alarm after doctors began using ChatGPT to write medical notes and patient communications.

What happened:

  • Doctors using ChatGPT to:
    • Draft consultation notes
    • Write referral letters
    • Compose patient communication
    • Summarize medical histories
  • Some uploading patient information to ChatGPT
  • No clear guidance on appropriate use

AMA concerns:

  1. Privacy breaches: Uploading patient data violates medical confidentiality
  2. Hallucinated medical information: Risk of incorrect diagnoses or treatment advice
  3. Lack of clinical judgment: AI cannot replace doctor’s assessment
  4. Liability unclear: Who’s responsible if AI gives wrong advice?
  5. No regulatory framework: Medical AI use largely unregulated

Healthcare: AMA Concerns (continued)

Why healthcare is different from legal:

Legal contexts:

  • Fake cases can be checked against databases
  • Errors often caught before harm (opposing counsel reviews)
  • Reversible in many cases (appeals, corrections)

Healthcare contexts:

  • Hallucinated medical information harder to verify
  • Errors may not be obvious until patient harmed
  • Harm can be irreversible (injury, death)
  • Patients often unaware AI was involved

Common thread:

Both involve professionals delegating critical judgment to systems that:

  • Cannot verify their own output
  • Have no understanding of stakes
  • Cannot be held accountable
  • May hallucinate confidently

AMA call to action:

Stronger AI regulations needed before widespread adoption in healthcare settings.

Link: The Guardian article

Part 3: Comparing Jurisdictional Approaches

UK Approach: Regulatory and Professional

Characteristics:

  1. Guidance-first approach
    • High Court issuing clear warnings
    • Professional bodies (Bar Council, Law Society, SRA) publishing detailed guidance
    • Emphasis on education before punishment
  2. Professional regulation emphasis
    • Referral to Bar Standards Board or SRA
    • Professional discipline rather than solely monetary penalties
    • Can lead to suspension or striking off
  3. Graduated response (so far)
    • Early cases: stern warnings, relatively lenient
    • Clear signal: future cases will be treated more harshly
    • “One last chance” approach
  4. Institutional coordination
    • High Court, Court of Appeal, professional bodies aligned
    • Consistent messaging across institutions
    • Clear standards emerging

UK Approach: Regulatory and Professional (continued)

Bar Council guidance (2024):

  • AI-generated content that misleads court = professional misconduct
  • Barristers must verify all AI output
  • Cannot delegate duty to supervise AI
  • Bringing profession into disrepute = serious matter

Advantages:

  • Gives profession time to adapt
  • Educational opportunity
  • Professional standards maintained

Risks:

  • Leniency might not deter enough
  • Slow to impose serious consequences

US Approach: Punitive and Deterrent

Characteristics:

  1. Heavy monetary penalties
    • $1,000 - $10,000 individual fines common
    • Paid personally by attorney, not by firm
    • Immediate financial consequences
  2. Public shaming
    • Detailed published opinions naming attorneys
    • Often covered extensively in legal press
    • Reputational damage significant
  3. Decentralized responses
    • Each court/judge setting own standards
    • Inconsistent approaches across jurisdictions
    • Creates uncertainty
  4. Contempt and sanctions
    • Some cases referred for contempt of court
    • Rule 11 sanctions (frivolous/false filings)
    • Can lead to suspension from practice

US Approach: Punitive and Deterrent (continued)

Federal courts’ position:

Many courts now require:

  • Certification: Attorney certifies no AI was used for legal research, OR
  • Disclosure + verification: If AI used, must disclose and verify all output
  • Personal responsibility: Signature on document = full accountability

Advantages:

  • Strong deterrent effect
  • Clear message: courts take this seriously
  • Immediate consequences

Risks:

  • Inconsistent standards across jurisdictions
  • May not address root causes (lack of understanding)
  • Punishes individuals but doesn’t fix systemic issues

Common Elements Across Jurisdictions

Despite different approaches, these principles are universal:

1. Personal responsibility

  • Lawyer’s signature on document = full accountability
  • Cannot delegate duty to verify to AI or anyone else
  • “I didn’t know” is not a defense
  • Technology doesn’t diminish professional obligations

2. Verification requirement

  • Must check all AI-generated citations against original sources
  • Must read cases cited, not just summaries
  • Must verify legal principles stated
  • Due diligence cannot be automated

3. Seriousness of offense

  • Courts view fake citations as direct threat to justice
  • Undermines court’s ability to rely on submissions
  • Wastes judicial resources
  • Erodes public trust

4. Professional ethics

  • Officers of the court have special duties
  • Honesty and integrity paramount
  • Profession’s reputation at stake
  • Higher standards than laypeople

5. Insufficient excuse

  • Time pressure doesn’t justify cutting corners
  • Lack of access to legal databases doesn’t excuse fake citations
  • Trust in technology doesn’t eliminate duty
  • Good intentions don’t matter if output misleads court

Key insight: Regardless of cultural differences, all legal systems agree professionals cannot delegate judgment to unaccountable systems.

Part 4: Key Themes From Our Discussion

Theme 1: The Verification Crisis

The problem:

A surprisingly large number of competent, experienced lawyers failed to verify AI-generated citations. Why?

Factors identified:

  1. Overconfidence in technology
    • Assumption that AI “knows” legal information
    • Trust in confident, authoritative tone
    • Expectation that technology is reliable
  2. Time and cost pressure
    • Legal research is time-consuming
    • Clients demand efficiency
    • AI promises quick answers
    • Verification takes as long as original research
  3. Lack of understanding
    • Many lawyers don’t understand how LLMs work
    • Don’t realize hallucinations are inevitable
    • Treat AI like a search engine or database (it’s not)
  4. Database access inequality
    • Westlaw/LexisNexis expensive (£1000s/year)
    • Junior barristers and solo practitioners may lack access
    • ChatGPT appears as “free legal research”
  5. Confirmation bias
    • AI generates cases that support your argument
    • Psychologically satisfying
    • Less likely to question helpful information

Solutions discussed:

  • Mandatory AI training for legal professionals
  • Law school curriculum updates
  • Technical tools to flag potential hallucinations (a minimal sketch follows after this list)
  • Culture change: verification as professional norm
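
As a concrete illustration of the “technical tools” idea, here is a minimal sketch of a citation-flagging check. The regex covers only a narrow slice of UK neutral-citation formats, and `KNOWN_CITATIONS` is a hypothetical stand-in for a lookup against an authoritative source such as BAILII or a commercial database; a flag here means “go and read the original”, not “this is fake”.

```python
# Minimal sketch of a citation-flagging check (illustrative only).
# It extracts strings that look like UK neutral citations and flags any that are not
# found in a trusted local index. KNOWN_CITATIONS is a hypothetical stand-in for a
# query against an authoritative database such as BAILII or Westlaw.
import re

NEUTRAL_CITATION = re.compile(
    r"\[\d{4}\]\s+EW(?:CA|HC)\s+(?:Civ|Crim|Admin|\d+)\s*\d*(?:\s*\((?:Admin|Ch|QB|KB)\))?"
)

KNOWN_CITATIONS = {
    "[2010] EWCA Civ 117",   # example entry; a real index would hold millions of authorities
}

def flag_unverified_citations(text: str) -> list:
    """Return citation-like strings that could not be matched in the trusted index."""
    found = NEUTRAL_CITATION.findall(text)
    return [c.strip() for c in found if c.strip() not in KNOWN_CITATIONS]

draft = "The appellant relies on Y (China) [2010] EWCA Civ 116 in support of ground two."
for citation in flag_unverified_citations(draft):
    print(f"UNVERIFIED: {citation} - read the original source before filing")
```

A tool like this cannot prove a citation is genuine; it only narrows down which authorities a human still has to read, which is the point of verification as a professional norm.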

Theme 2: Access to Justice vs. Quality of Justice

The dilemma:

Self-represented litigants using ChatGPT because they can’t afford lawyers. Is this:

Democratization of legal access (positive view):

  • Legal knowledge becomes accessible
  • Reduces cost barrier to justice
  • Empowers individuals
  • Better than complete lack of representation
  • Can produce adequate results in simple cases
  • Lynn White’s successful appeal example

Dangerous pseudo-representation (critical view):

  • False confidence in flawed advice
  • Cannot detect hallucinations
  • May miss critical procedural requirements
  • Opposing counsel can exploit mistakes
  • Rights unknowingly waived
  • Risk of worse outcomes than self-representation without AI

Class discussion points:

  • Is access to flawed AI better than no access at all?
  • Should we restrict self-represented litigants from using AI?
  • How do we balance democratization with safety?
  • Role of legal aid and pro bono services?
  • Could simplified legal procedures reduce need for representation?

Unresolved tension:

We want to expand access to justice, but not at the cost of creating new forms of injustice. No easy answer.

Theme 3: Professional Responsibility in the AI Age

Core question:

What does professional responsibility mean when AI can do much of the work?

Traditional professional duties:

  • Competence: Adequate knowledge and skill
  • Diligence: Thorough preparation
  • Honesty: Truthful to court and clients
  • Confidentiality: Protecting client information
  • Independence: Professional judgment not compromised

How AI challenges these:

  1. Competence redefined
    • Need to understand AI capabilities/limitations
    • Knowing when to use vs. avoid AI
    • Ability to verify AI output
    • Technical literacy now part of competence
  2. Diligence expanded
    • Can’t just accept AI output
    • Must verify independently
    • Due diligence includes checking AI work
    • More work, not less
  3. Honesty implications
    • Duty to disclose AI use (in some jurisdictions)
    • Cannot knowingly submit AI hallucinations
    • Must correct errors when discovered
  4. Confidentiality at risk
    • Uploading client data to ChatGPT = breach
    • Must understand data flow and storage
    • Privacy implications of AI use
  5. Independent judgment threatened
    • Cannot delegate decision-making to AI
    • Professional judgment remains human responsibility
    • AI as tool, not replacement

Emerging consensus:

Professional responsibility increases with AI; it does not decrease. Technology adds new obligations rather than reducing existing ones.

Theme 4: The Opacity Problem

The challenge:

LLMs are “black boxes” - we can’t see how they reach conclusions. Why does this matter?

Implications for legal contexts:

  1. Cannot audit decision-making
    • Don’t know why AI generated particular case
    • Can’t trace reasoning process
    • Impossible to identify specific error source
  2. No basis for trust
    • Trust requires understanding
    • Legal system depends on traceable reasoning
    • “The AI said so” is not reasoning
  3. Accountability requires transparency
    • Can’t fix what we can’t see
    • Can’t improve what we don’t understand
    • Can’t assign responsibility without causal chain
  4. Professional judgment needs comprehension
    • Lawyers must understand arguments they make
    • Cannot advocate for reasoning they don’t grasp
    • Duty to court requires understanding

This connects to broader AI ethics concerns:

  • Algorithmic decision-making in criminal justice
  • Automated systems affecting rights
  • Need for “explainable AI”
  • Right to explanation under GDPR

Why legal profession struggles:

  • Training emphasizes reasoning and precedent
  • Legal culture values transparent logic
  • LLMs fundamentally incompatible with this culture
  • Creates cognitive dissonance

Possible solutions:

  • Require explainable AI for legal applications
  • Develop specialized legal AI with transparency
  • Maintain human decision-making for critical determinations

Theme 5: Sensitive Contexts Require Different Standards

The recognition:

Not all AI uses carry equal risk. Some contexts demand higher standards or outright prohibition.

What makes a context “sensitive”:

  • High stakes: Affects fundamental rights (liberty, safety, family)
  • Vulnerable populations: Children, mentally ill, incarcerated individuals
  • Irreversible consequences: Cannot undo harm
  • Strict confidentiality: Legal/medical/welfare information
  • Power imbalances: State vs. individual, professional vs. layperson

Graduated approach discussed:

Level 1: Complete prohibition

  • Child protection case information
  • Patient medical data
  • Privileged attorney-client communications
  • Rationale: Privacy is absolute, risk too high

Level 2: Highly restricted use

  • Criminal defense/prosecution legal research
  • Medical diagnosis and treatment
  • Judicial decision-making
  • Rationale: Can use with extensive safeguards and verification

Level 3: Permitted with caution

  • Civil legal research (with verification)
  • Administrative tasks (without sensitive data)
  • General legal information
  • Rationale: Lower stakes, easier to verify

Level 4: Generally appropriate

  • Legal education
  • Drafting templates
  • General information
  • Rationale: Low risk, easy to correct errors

The Victorian child protection ban is an example of Level 1: absolute prohibition until the technology is demonstrably safe.

Key principle: Higher stakes = higher standards = more restrictions.
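
One way to operationalise this graduated principle is to encode it as an explicit, machine-checkable policy rather than leaving it to individual judgment in the moment. The sketch below mirrors the four levels above; the context labels and the `llm_use_allowed` helper are illustrative assumptions, not a real governance framework.

```python
# Illustrative sketch: encoding the graduated approach above as a machine-checkable policy.
# The level definitions mirror the slide; the context labels and helper are hypothetical.
from enum import Enum

class Level(Enum):
    PROHIBITED = 1              # e.g. child protection records, patient data
    HIGHLY_RESTRICTED = 2       # e.g. criminal legal research, diagnosis support
    PERMITTED_WITH_CAUTION = 3  # e.g. civil legal research with verification
    GENERALLY_APPROPRIATE = 4   # e.g. education, drafting templates

POLICY = {
    "child_protection_casework": Level.PROHIBITED,
    "criminal_legal_research": Level.HIGHLY_RESTRICTED,
    "civil_legal_research": Level.PERMITTED_WITH_CAUTION,
    "legal_education": Level.GENERALLY_APPROPRIATE,
}

def llm_use_allowed(context: str, human_verification: bool) -> bool:
    """Very coarse gate: higher-risk contexts demand more than a boolean flag can express."""
    level = POLICY.get(context, Level.HIGHLY_RESTRICTED)  # unknown contexts default to restrictive
    if level in (Level.PROHIBITED, Level.HIGHLY_RESTRICTED):
        return False  # highly restricted use would need safeguards beyond this sketch
    if level is Level.PERMITTED_WITH_CAUTION:
        return human_verification
    return True

print(llm_use_allowed("child_protection_casework", human_verification=True))  # False
print(llm_use_allowed("civil_legal_research", human_verification=True))       # True
```

The design choice worth noting is the default: a context the policy does not recognise falls into the restrictive bucket, rather than being quietly permitted.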

Part 5: Implications for Data Science Practice

What This Means for You as Potential Future Data Scientists

As (potential) data scientists, you will:

  • Build AI systems others rely on
  • Deploy models in sensitive contexts
  • Advise organizations on AI adoption
  • Make decisions about when AI is appropriate
  • Face pressure to automate high-stakes decisions

Lessons from legal cases:

  1. Understand your tools’ limitations
    • Know how LLMs work (and don’t work)
    • Recognize inherent constraints
    • Don’t oversell capabilities
    • Be honest about risks
  2. Context matters enormously
    • Same technology, different risk profiles
    • Assess stakes before deployment
    • Some uses should be prohibited
    • Convenience doesn’t justify risk
  3. Verification is not optional
    • Build verification into systems
    • Human oversight for high-stakes decisions
    • Automated checks where possible
    • Cannot assume AI output is correct
  4. Privacy and confidentiality are paramount
    • Understand data flow
    • Know where data is stored
    • Comply with regulations (GDPR, etc.)
    • Protect vulnerable populations
  5. Accountability requires transparency
    • Black box systems problematic for sensitive contexts
    • Explainability matters
    • Audit trails necessary
    • Stakeholders deserve understanding

Professional Ethics in Data Science

Parallels with legal ethics:

Lawyers have centuries-old professional duties; data science is still defining its equivalent. But the pressures, risks, and responsibilities are increasingly similar.

Emerging duties for data scientists:

  1. Competence

    • Understand the models you deploy
    • Know their limitations and failure modes
    • Keep up with rapidly shifting technical landscapes
  2. Honesty

    • Communicate limitations clearly
    • Avoid overstating accuracy or reliability
    • Do not let organisational incentives distort reporting
  3. Diligence

    • Perform proper validation
    • Document experiments thoroughly
    • Stress-test models under edge cases
  4. Confidentiality

    • Treat user data with the same care lawyers treat privileged information
    • Avoid sending sensitive data to unmanaged services
    • Ensure data minimisation and purpose limitation
  5. Public interest

    • Consider downstream societal impacts
    • Evaluate risks to vulnerable groups
    • Resist harmful deployments, even when pressured

Key idea: Professional ethics evolve as the power of our tools increases. The introduction of LLMs raises the stakes of poor judgment, poor documentation, and poor verification.

Building Responsible AI Systems

Lessons from the case studies applied to system design:

  1. Human-in-the-Loop (HITL) is essential

    • AI can assist, but cannot replace professional judgment
    • Always design workflows where humans verify critical steps
    • “Human oversight” is not a checkbox — it is a process
  2. Verification pipelines matter

    • Automatic detection of hallucinations is unsolved

    • However, workflow-based controls can help (see the workflow sketch after this list):

      • Link to source documents
      • Require citations only from trusted databases
      • Prevent free-text generation in sensitive contexts
  3. Context-specific restrictions

    • Use different models and rules depending on use case
    • E.g., no free-form LLM output in regulated contexts
    • Narrower, domain-specific models reduce risk
  4. Auditability and traceability

    • Record model versions, prompts, and parameters
    • Log what was generated and who approved it
    • Make systems reproducible for later investigation
  5. Design for failure

    • Assume hallucinations will happen
    • Build processes that catch them early
    • Treat verification as a socio-technical problem
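
To make items 2 and 4 above concrete, here is a minimal sketch of a workflow gate plus audit log, assuming a separate citation check (such as the one sketched earlier) supplies the list of unverified citations. The function and field names are hypothetical; a real deployment would integrate with case-management systems and an authoritative citation index.

```python
# Sketch of a socio-technical verification workflow (illustrative only, not a product design).
# Every AI-assisted draft must pass a citation check and an explicit human sign-off, and the
# exchange is logged for later audit. All function and field names here are hypothetical.
import hashlib
import json
import time
from typing import Optional

def audit_record(model_version: str, prompt: str, output: str, approved_by: Optional[str]) -> dict:
    """Record enough metadata to reconstruct what was generated, by which model, and who approved it."""
    return {
        "timestamp": time.time(),
        "model_version": model_version,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "approved_by": approved_by,   # None means no human has signed off yet
    }

def release_draft(unverified_citations: list, reviewer: Optional[str]) -> bool:
    """Human-in-the-loop gate: block release if citations are unverified or nobody has signed off."""
    if unverified_citations:
        print("Blocked: unverified citations:", unverified_citations)
        return False
    if reviewer is None:
        print("Blocked: no human reviewer has approved this draft")
        return False
    return True

draft = "Skeleton argument citing [2019] EWHC 1873 (Admin) ..."
log_entry = audit_record("toy-llm-0.1", "draft a skeleton argument about ...", draft, approved_by=None)
print(json.dumps(log_entry, indent=2))
print(release_draft(unverified_citations=["[2019] EWHC 1873 (Admin)"], reviewer=None))
```

The gate is deliberately dumb: it does not try to judge quality, only to make it impossible to release output that nobody has verified and nobody has taken responsibility for.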

Part 6: Where Do We Go From Here?

Future Directions for AI in Law and Sensitive Domains

1. Development of “safe-by-design” legal models. Narrow, citation-grounded models trained only on validated corpora (e.g., Westlaw-embedded tools) will likely replace general-purpose chatbots for legal contexts.

2. Mandatory AI literacy for professionals. Courts and regulators are already signaling that ignorance is not a defence. Expect:

  • Law schools teaching LLM limitations
  • Professional continuing education requirements
  • Similar expectations spreading to medicine, social care, finance

3. Increasing regulatory intervention. We may see:

  • Licensing of high-risk AI tools
  • Bans in certain contexts (as in Victoria)
  • Standardised disclosure rules
  • Liability for negligent AI use

4. Hybrid workflows. AI will remain useful for:

  • Drafting templates
  • Summaries
  • Organising documents

But always under strict verification.

5. Move toward “evidence-linked outputs” (see the sketch after this list). Future models may be required to:

  • Ground claims in specific documents
  • Provide retrieval-based citations
  • Produce only outputs backed by verifiable sources
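
The sketch below illustrates, very crudely, what an evidence-linked output could look like: only sentences whose key terms can be matched to a retrieved source passage are kept, and each kept sentence carries the identifier of its supporting document. The tiny `CORPUS` and the term-overlap rule are illustrative stand-ins; real systems would use proper retrieval and far more careful matching.

```python
# Minimal sketch of "evidence-linked output": the system may only return sentences it can
# tie back to a source passage; anything it cannot ground is dropped. The toy corpus and
# the term-overlap matching rule are illustrative stand-ins for a real retrieval index.
CORPUS = {
    "doc-1": "The authority must make inquiries if it has reason to believe an applicant may be homeless.",
    "doc-2": "Interim accommodation may be provided pending the outcome of those inquiries.",
}

def grounded_sentences(candidate_sentences: list) -> list:
    """Keep only sentences whose key terms all appear in some source document, and cite it."""
    kept = []
    for sentence in candidate_sentences:
        terms = {w.lower().strip(".,") for w in sentence.split() if len(w) > 4}
        for doc_id, text in CORPUS.items():
            if terms and all(term in text.lower() for term in terms):
                kept.append((sentence, doc_id))   # attach the supporting document id
                break
    return kept

draft = [
    "Interim accommodation may be provided pending the outcome of those inquiries.",
    "The court in R (Balogun) v Lambeth LBC held that the duty is mandatory.",  # ungrounded claim
]
for sentence, source in grounded_sentences(draft):
    print(f"{sentence}  [source: {source}]")
```

Note that the ungrounded (and, in this example, fabricated) claim is simply not emitted; the burden shifts from detecting hallucinations after the fact to refusing to output anything without a traceable source.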

Key Takeaways

1. LLM hallucinations are inevitable. Not a bug, but a structural property of how models work.

2. High-risk contexts amplify consequences. Small errors → massive real-world harm (legal, medical, welfare).

3. Verification is the core skill of the AI era. Professionals must assume AI is wrong until proven otherwise.

4. Tools do not remove responsibility. Users remain accountable for all outputs they rely on.

5. Data scientists play a central role. You will shape the safeguards, workflows, and cultural norms that enable safe AI use.

Further Reading & Resources

Technical background:

  • Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., & Fung, P. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, 55(12), Article 248. https://doi.org/10.1145/3571730
  • Kadavath, S., Conerly, T., Askell, A., Henighan, T., Drain, D., Perez, E., Schiefer, N., et al. (2022). Language Models (Mostly) Know What They Know. arXiv preprint arXiv:2207.05221.

Legal cases & commentary:

Regulatory guidance:

General AI practice: