The Challenge: Unreliable Scenario and Stress Testing

For most finance teams, scenario and stress testing is a painful mix of spreadsheets, slide decks and scattered assumptions. Risk managers struggle to create realistic macro and idiosyncratic scenarios, propagate them consistently through P&L, balance sheet and cash flow, and document every step in a way that stands up to internal and regulatory scrutiny. The result is fragile stress tests that are hard to maintain and even harder to explain.

Traditional approaches rely heavily on manual work: copying assumptions between files, reconciling versions, and re-keying parameters into models. Scenario libraries are often built once a year, then only lightly updated. Regulatory texts and supervisory expectations change constantly, but translating them into concrete stress designs is slow and error-prone. As data, markets and business models move faster, a spreadsheet-centric process simply cannot keep up.

The business impact is significant. Unreliable stress testing means blind spots in tail risk, underestimation of concentration and liquidity risk, and slower reaction to early warning signals. Management may take comfort in a set of scenarios that do not actually cover the institution’s real exposure profile. This increases the risk of unexpected losses, regulatory findings, capital surcharges, and missed opportunities to proactively derisk portfolios or renegotiate limits. It also ties up senior finance and risk talent in manual paperwork instead of higher-value analysis.

The good news: this challenge is tough but absolutely solvable. With modern AI, especially tools like Claude that can digest large risk reports, scenario libraries and regulatory texts, you can systematize how you design, challenge and document stress scenarios. At Reruption, we’ve helped organisations turn messy, document-heavy processes into streamlined AI-supported workflows, and in the rest of this page you’ll find pragmatic guidance on how to do the same for your own stress-testing framework.

Our Assessment

A strategic assessment of the challenge and high-level tips on how to tackle it.

From Reruption’s hands-on work building AI-powered document analysis and decision-support tools, we see a consistent pattern: finance teams don’t lack expertise, they lack a scalable way to apply it across thousands of pages of risk reports, policies and regulations. Claude is particularly strong here because it can reason over long, complex documents and produce structured outputs that plug into existing stress-testing frameworks. Used properly, it becomes an assistant for risk managers—enhancing judgment, standardising documentation and making scenario and stress testing more robust and auditable, not more opaque.

Think of Claude as a stress-testing co-analyst, not a black-box model

The most effective finance teams position Claude as a co-analyst that supports, rather than replaces, existing quantitative models. Instead of asking it to compute capital impacts, they ask it to scan supervisory guidance, risk reports and macro research, then propose candidate scenarios and highlight logical gaps. Human risk officers then decide what to implement.

This mindset is crucial for reducing financial risk. It keeps accountability and model ownership with the risk function while leveraging AI to expand scenario coverage, improve consistency and standardise documentation. Claude’s outputs become structured inputs into your existing P&L, balance sheet and liquidity engines, not a parallel universe of numbers.

Design an AI-first governance layer around scenarios

Unreliable stress testing is often a governance problem disguised as a modelling issue. Before you scale Claude, define how its recommendations will be reviewed, approved and archived. For each scenario family (macro, sectoral, counterparty-specific), clarify who owns the final decision, what evidence is required, and how challenges are recorded.

Claude can then be instructed to generate outputs in governance-friendly formats: traceable rationale, explicit references to regulatory paragraphs, and clear links between assumptions and business drivers. This AI-first governance layer turns a previously ad-hoc process into a repeatable workflow that withstands internal audit and regulatory review.

Prepare your team for AI-augmented stress testing

Successful adoption is less about technology and more about team readiness. Finance and risk professionals need to understand what Claude is good at—reasoning over unstructured text, comparing frameworks, drafting scenarios—and where traditional models remain superior (numerical calibration, portfolio simulations, capital metrics).

Invest in a small but focused enablement effort: short sessions where risk managers interact with Claude using real risk reports and scenario libraries, critique its suggestions, and iteratively refine prompts. This builds trust and ensures that AI in finance amplifies domain expertise instead of being perceived as yet another opaque “black box”.

Use Claude to widen scenario coverage without ballooning workload

A common strategic gap is that scenario libraries cover only a narrow band of plausible futures, often anchored on the last crisis. Claude can systematically scan internal loss data, external news, industry analyses and regulatory scenarios to propose additional stress themes and combinations you may have missed.

This allows you to expand from a handful of flagship scenarios to a structured library covering macro downturns, market dislocations, counterparty defaults, operational outages and climate-related risks. The key is to set clear priorities—where additional coverage materially changes risk decisions—and to use Claude to draft scenarios and documentation, while your team focuses on calibration and impact assessment.

Mitigate model risk with transparency and auditable outputs

Any use of AI in risk management must address model risk and explainability. The way to do this with Claude is to design prompts and workflows that explicitly demand transparency: references to source documents, step-by-step reasoning, and alternative scenario variants with trade-offs spelled out.

Strategically, this turns AI from a model-risk headache into part of your model-risk solution. Claude can help you document scenario rationale, align it with regulatory expectations, and maintain an auditable trail of changes over time. This lowers the risk of findings during inspections and strengthens the overall credibility of your stress-testing framework.

Used with the right governance and mindset, Claude can transform unreliable, manual stress testing into a disciplined, AI-supported process that expands scenario coverage, strengthens documentation and ultimately reduces financial risk. At Reruption, we’ve built similar AI workflows for document-heavy decision processes, and we know how to connect Claude’s strengths with your existing risk models and controls. If you’re exploring how to modernise scenario and stress testing, we’re happy to discuss a concrete setup rather than generic AI theory.

Real-World Case Studies

From E-commerce to Healthcare: Learn how companies successfully use AI.

Zalando

E-commerce

In the online fashion retail sector, high return rates, often exceeding 30-40% for apparel, stem primarily from fit and sizing uncertainties, as customers cannot physically try on items before purchase. Zalando, Europe's largest fashion e-tailer serving 27 million active customers across 25 markets, faced substantial challenges with these returns, incurring massive logistics costs, environmental impact, and customer dissatisfaction due to inconsistent sizing across over 6,000 brands and 150,000+ products. Traditional size charts and recommendations proved insufficient, with early surveys showing up to 50% of returns attributed to poor fit perception, hindering conversion rates and repeat purchases in a competitive market. This was compounded by the lack of immersive shopping experiences online, leading to hesitation among tech-savvy millennials and Gen Z shoppers who demanded more personalized, visual tools.

Solution

Zalando addressed these pain points by deploying a generative computer vision-powered virtual try-on solution, enabling users to upload selfies or use avatars to see realistic garment overlays tailored to their body shape and measurements. Leveraging machine learning models for pose estimation, body segmentation, and AI-generated rendering, the tool predicts optimal sizes and simulates draping effects, integrating with Zalando's ML platform for scalable personalization. The system combines computer vision (e.g., for landmark detection) with generative AI techniques to create hyper-realistic visualizations, drawing from vast datasets of product images, customer data, and 3D scans, ultimately aiming to cut returns while enhancing engagement. Piloted online and expanded to outlets, it forms part of Zalando's broader AI ecosystem including size predictors and style assistants.

Results

  • 30,000+ customers used virtual fitting room shortly after launch
  • 5-10% projected reduction in return rates
  • Up to 21% fewer wrong-size returns via related AI size tools
  • Expanded to all physical outlets by 2023 for jeans category
  • Supports 27 million customers across 25 European markets
  • Part of AI strategy boosting personalization for 150,000+ products

Lunar

Banking

Lunar, a leading Danish neobank, faced surging customer service demand outside business hours, with many users preferring voice interactions over apps due to accessibility issues. Long wait times frustrated customers, especially elderly or less tech-savvy ones struggling with digital interfaces, leading to inefficiencies and higher operational costs. This was compounded by the need for round-the-clock support in a competitive fintech landscape where 24/7 availability is key. Traditional call centers couldn't scale without ballooning expenses, and voice preference was evident but underserved, resulting in lost satisfaction and potential churn.

Solution

Lunar deployed Europe's first GenAI-native voice assistant powered by GPT-4, enabling natural, telephony-based conversations for handling inquiries anytime without queues. The agent processes complex banking queries like balance checks, transfers, and support in Danish and English. Integrated with advanced speech-to-text and text-to-speech, it mimics human agents, escalating only edge cases to humans. This conversational AI approach overcame scalability limits, leveraging OpenAI's tech for accuracy in regulated fintech.

Results

  • ~75% of all customer calls expected to be handled autonomously
  • 24/7 availability eliminating wait times for voice queries
  • Positive early feedback from app-challenged users
  • First European bank with GenAI-native voice tech
  • Significant operational cost reductions projected

Shell

Energy

Unplanned equipment failures in refineries and offshore oil rigs plagued Shell, causing significant downtime, safety incidents, and costly repairs that eroded profitability in a capital-intensive industry. According to a Deloitte 2024 report, 35% of refinery downtime is unplanned, with 70% preventable via advanced analytics—highlighting the gap in traditional scheduled maintenance approaches that missed subtle failure precursors in assets like pumps, valves, and compressors. Shell's vast global operations amplified these issues, generating terabytes of sensor data from thousands of assets that went underutilized due to data silos, legacy systems, and manual analysis limitations. Failures could cost millions per hour, risking environmental spills and personnel safety while pressuring margins amid volatile energy markets.

Solution

Shell partnered with C3 AI to implement an AI-powered predictive maintenance platform, leveraging machine learning models trained on real-time IoT sensor data, maintenance histories, and operational metrics to forecast failures and optimize interventions. Integrated with Microsoft Azure Machine Learning, the solution detects anomalies, predicts remaining useful life (RUL), and prioritizes high-risk assets across upstream oil rigs and downstream refineries. The scalable C3 AI platform enabled rapid deployment, starting with pilots on critical equipment and expanding globally. It automates predictive analytics, shifting from reactive to proactive maintenance, and provides actionable insights via intuitive dashboards for engineers.

Results

  • 20% reduction in unplanned downtime
  • 15% reduction in maintenance costs
  • £1M+ annual savings per site
  • 10,000 pieces of equipment monitored globally
  • 35% industry unplanned downtime addressed (Deloitte benchmark)
  • 70% preventable failures mitigated

bunq

Banking

As bunq experienced rapid growth as the second-largest neobank in Europe, scaling customer support became a critical challenge. With millions of users demanding personalized banking information on accounts, spending patterns, and financial advice on demand, the company faced pressure to deliver instant responses without proportionally expanding its human support teams, which would increase costs and slow operations. Traditional search functions in the app were insufficient for complex, contextual queries, leading to inefficiencies and user frustration. Additionally, ensuring data privacy and accuracy in a highly regulated fintech environment posed risks. bunq needed a solution that could handle nuanced conversations while complying with EU banking regulations, avoiding hallucinations common in early GenAI models, and integrating seamlessly without disrupting app performance. The goal was to offload routine inquiries, allowing human agents to focus on high-value issues.

Solution

bunq addressed these challenges by developing Finn, a proprietary GenAI platform integrated directly into its mobile app, replacing the traditional search function with a conversational AI chatbot. After hiring over a dozen data specialists in the prior year, the team built Finn to query user-specific financial data securely, answer questions on balances, transactions, budgets, and even provide general advice while remembering conversation context across sessions. Launched as Europe's first AI-powered bank assistant in December 2023 following a beta, Finn evolved rapidly. By May 2024, it became fully conversational, enabling natural back-and-forth interactions. This retrieval-augmented generation (RAG) approach grounded responses in real-time user data, minimizing errors and enhancing personalization.

Results

  • 100,000+ questions answered within months post-beta (end-2023)
  • 40% of user queries fully resolved autonomously by mid-2024
  • 35% of queries assisted, totaling 75% immediate support coverage
  • Hired 12+ data specialists pre-launch for data infrastructure
  • Second-largest neobank in Europe by user base (1M+ users)

Bank of America

Banking

Bank of America faced a high volume of routine customer inquiries, such as account balances, payments, and transaction histories, overwhelming traditional call centers and support channels. With millions of daily digital banking users, the bank struggled to provide 24/7 personalized financial advice at scale, leading to inefficiencies, longer wait times, and inconsistent service quality. Customers demanded proactive insights beyond basic queries, like spending patterns or financial recommendations, but human agents couldn't handle the sheer scale without escalating costs. Additionally, ensuring conversational naturalness in a regulated industry like banking posed challenges, including compliance with financial privacy laws, accurate interpretation of complex queries, and seamless integration into the mobile app without disrupting user experience. The bank needed to balance AI automation with human-like empathy to maintain trust and high satisfaction scores.

Solution

Bank of America developed Erica, an in-house NLP-powered virtual assistant integrated directly into its mobile banking app, leveraging natural language processing and predictive analytics to handle queries conversationally. Erica acts as a gateway for self-service, processing routine tasks instantly while offering personalized insights, such as cash flow predictions or tailored advice, using client data securely. The solution evolved from a basic navigation tool to a sophisticated AI, incorporating generative AI elements for more natural interactions and escalating complex issues to human agents seamlessly. Built with a focus on in-house language models, it ensures control over data privacy and customization, driving enterprise-wide AI adoption while enhancing digital engagement.

Results

  • 3+ billion total client interactions since 2018
  • Nearly 50 million unique users assisted
  • 58+ million interactions per month (2025)
  • 2 billion interactions reached by April 2024 (doubled from 1B in 18 months)
  • 42 million clients helped by 2024
  • 19% earnings spike linked to efficiency gains

Best Practices

Successful implementations follow proven patterns. Have a look at our tactical advice to get started.

Use Claude to mine regulatory texts into concrete scenario requirements

Start by consolidating the key regulatory and supervisory documents that drive your stress-testing obligations—for example, internal risk appetite statements, central bank stress manuals, and relevant EBA/BaFin/ECB guidelines. Upload or connect these texts to Claude (via your chosen secure integration), and ask it to extract scenario design requirements, coverage expectations and documentation standards.

Example prompt:

You are a senior banking supervisor with expertise in stress testing.

Input:
- Internal Stress Testing Policy (PDF)
- ECB Stress Test Methodology 20XX (PDF)

Task:
1. Extract all explicit and implicit expectations regarding:
   - Types of scenarios (macro, market, idiosyncratic, combined)
   - Horizon, severity and frequency
   - Required documentation and governance
2. Summarise them in a structured table with columns:
   - Source document and paragraph
   - Requirement
   - Implication for scenario design
3. Highlight any gaps or inconsistencies in our internal policy.

Expected outcome: a clear, audit-ready requirements list that your risk team can validate and use as a baseline for systematic scenario design.
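
To make the extracted requirements checkable before they become a baseline, the table can be loaded into typed records that reject any row without a source reference. The following is a minimal sketch, assuming Claude was asked to return a pipe-delimited table with the three columns named in the prompt. The `Requirement` class, the `parse_requirements` helper and the sample row are hypothetical illustrations, not part of any Claude API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    source: str        # source document and paragraph
    requirement: str   # what the text demands
    implication: str   # consequence for scenario design


def parse_requirements(table_text: str) -> list[Requirement]:
    """Parse a three-column, pipe-delimited table into Requirement records.

    Header and separator rows are skipped. A data row with an empty source
    cell raises ValueError, so untraceable requirements never enter the baseline.
    """
    records = []
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # not a table row with exactly three columns
        source = cells[0]
        if not source:
            raise ValueError(f"Row without source reference: {line!r}")
        # Assumption: the header starts with "Source" and separators use only - and :
        if source.lower().startswith("source") or set(source) <= set("-:"):
            continue
        records.append(Requirement(*cells))
    return records


# Hypothetical model output, for illustration only
sample = """
| Source document and paragraph | Requirement | Implication for scenario design |
|---|---|---|
| Internal Policy, para 4.2 | Annual review of scenarios | Schedule a yearly library refresh |
"""
requirements = parse_requirements(sample)
```

Failing loudly on a missing source is a deliberate choice here: the risk team validates the list, and a requirement that cannot be traced to a paragraph is not audit-ready.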

Generate and refine scenario narratives before quantification

Many stress tests jump too quickly to numbers. Use Claude to first craft robust, internally consistent scenario narratives. Feed it your current scenario library, recent risk committee minutes, and macro research, then instruct it to propose additional stress scenarios with explicit economic and behavioural chains.

Example prompt:

You are a stress testing expert in a banking group.

Input:
- Our current scenario library (DOCX)
- Last 4 quarterly risk reports (PDF)
- Macro research on "stagflation" and "energy shock" (PDF)

Task:
1. Propose 5 new macroeconomic stress scenarios relevant to our portfolio.
2. For each scenario, describe:
   - Trigger events and narrative (max 200 words)
   - Key macro variables (GDP, unemployment, inflation, rates, FX)
   - Expected impacts on:
     * Wholesale funding costs
     * Retail credit quality
     * Market risk positions
3. Ensure scenarios are distinct from existing ones and cover tail risks.
4. Output in a table + short narratives for management.

Risk managers then review, challenge and select scenarios for quantification, significantly reducing the time spent on initial drafting.
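
Before drafts reach reviewers, a simple automated screen can catch the formal defects the prompt tries to prevent. The sketch below assumes each proposed scenario has already been transcribed into a small record. The `ScenarioDraft` class, the variable keys and the example draft are hypothetical, and the checks only cover the word limit, the list of macro variables and duplicate names from the prompt above.

```python
from dataclasses import dataclass, field

# Variables and word limit taken from the example prompt
REQUIRED_MACRO = ("gdp", "unemployment", "inflation", "rates", "fx")
MAX_NARRATIVE_WORDS = 200


@dataclass
class ScenarioDraft:
    name: str
    narrative: str
    macro: dict[str, str] = field(default_factory=dict)  # variable -> described path


def screening_flags(draft: ScenarioDraft, existing_names: set[str]) -> list[str]:
    """Return formal issues a reviewer should see before reading the draft."""
    flags = []
    missing = [v for v in REQUIRED_MACRO if v not in draft.macro]
    if missing:
        flags.append("missing macro variables: " + ", ".join(missing))
    if len(draft.narrative.split()) > MAX_NARRATIVE_WORDS:
        flags.append("narrative exceeds 200 words")
    if draft.name.strip().lower() in {n.strip().lower() for n in existing_names}:
        flags.append("name duplicates an existing scenario")
    return flags


# Hypothetical draft, for illustration only
draft = ScenarioDraft(
    name="Energy shock with stagflation",
    narrative="Gas supply disruption pushes inflation up while output contracts.",
    macro={"gdp": "contraction", "inflation": "sharp rise", "rates": "higher for longer"},
)
flags = screening_flags(draft, existing_names={"Global recession"})
```

A screen like this does not judge plausibility or severity. It only ensures that reviewers spend their time on the economics rather than on incomplete drafts.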

Map scenarios systematically into P&L, balance sheet and cash flow drivers

To tackle the "propagation" problem, let Claude help you build and maintain a driver map between scenario variables and financial statement line items. Provide your existing ALM model documentation, P&L/balance sheet structures and any internal mapping documents.

Example prompt:

You are assisting with stress test model documentation.

Input:
- ALM/stress testing methodology document (PDF)
- Chart of accounts (XLSX export as text)
- Existing driver mapping notes (DOCX)

Task:
1. Create a mapping between stress variables and financial statement items:
   - Macro variables (GDP, unemployment, rates etc.)
   - Market variables (spreads, FX, equities)
   - Idiosyncratic variables (default of top 20 counterparties)
2. For each mapping, specify:
   - Direction of impact (positive/negative)
   - Main transmission channels
   - Relevant model or data source
3. Highlight any financial items without a clear driver linkage.

This produces a living document that supports both implementation and model governance, while revealing where your propagation logic is weak or missing.
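
Step 3 of the prompt, finding financial items without a driver linkage, is easy to verify mechanically once the mapping is stored as data. The following sketch assumes the mapping has been transcribed into simple records. The `DriverLink` class, the line-item names and the example links are hypothetical, and the directions shown are illustrative rather than statements about any particular balance sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverLink:
    variable: str    # stress variable, e.g. "unemployment"
    line_item: str   # financial statement item it affects
    direction: str   # "positive" or "negative" effect on earnings
    channel: str     # main transmission channel


def unlinked_items(line_items: list[str], links: list[DriverLink]) -> list[str]:
    """Return line items that no stress variable maps to, in chart order."""
    linked = {link.line_item for link in links}
    return [item for item in line_items if item not in linked]


# Hypothetical chart of accounts excerpt and mapping, for illustration only
chart = ["Net interest income", "Loan loss provisions", "Fee income"]
links = [
    DriverLink("rates", "Net interest income", "positive", "asset repricing"),
    DriverLink("unemployment", "Loan loss provisions", "negative", "higher retail defaults"),
]
gaps = unlinked_items(chart, links)
```

Running the same check after every update to the mapping keeps the document "living" in a verifiable sense: a new line item without a driver shows up immediately.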

Standardise scenario documentation and governance packs

Once scenarios are agreed, Claude can auto-generate consistent documentation for risk committees, internal audit and regulators. Define a documentation template covering narrative, assumptions, parameterisation, transmission channels, limitations and validation results. Then have Claude fill this based on your working files, meeting notes and model outputs.

Example prompt:

You are preparing governance documentation for a new stress scenario.

Input:
- Approved scenario narrative (DOCX)
- Parameter table (CSV as text)
- Excerpts from risk committee minutes (DOCX)

Task:
Using our internal "Scenario Governance Template" (section headings below), draft a complete document:
1. Scenario overview and rationale
2. Link to regulatory and internal requirements (cite sources)
3. Assumptions and parameters
4. Transmission channels into P&L, balance sheet and liquidity
5. Limitations and known weaknesses
6. Validation and backtesting evidence
7. Change log from previous version.

Ensure all claims are traceable back to the inputs and flag any missing information.

Expected outcome: governance packs that are consistent across scenarios, easier to review, and faster to update when assumptions change.
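
Consistency across packs can be enforced with a completeness check on each draft before it goes to review. This is a minimal sketch, assuming the draft is available as plain text and that headings appear verbatim. The section list mirrors the template headings in the prompt above, while `missing_sections` and the sample draft are hypothetical.

```python
# Section headings from the "Scenario Governance Template" in the prompt
REQUIRED_SECTIONS = [
    "Scenario overview and rationale",
    "Link to regulatory and internal requirements",
    "Assumptions and parameters",
    "Transmission channels into P&L, balance sheet and liquidity",
    "Limitations and known weaknesses",
    "Validation and backtesting evidence",
    "Change log from previous version",
]


def missing_sections(draft_text: str) -> list[str]:
    """Return required headings that do not appear in the draft (case-insensitive)."""
    lowered = draft_text.lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in lowered]


# Hypothetical, deliberately incomplete draft
draft_text = """
1. Scenario overview and rationale
Energy prices double within two quarters.

3. Assumptions and parameters
See parameter table.

5. Limitations and known weaknesses
Second-round effects are not modelled.
"""
missing = missing_sections(draft_text)
```

The check only confirms that a section exists, not that its content is adequate. That judgment stays with the human reviewer.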

Use Claude as a QA layer on models, scenarios and results

Beyond generation, Claude is highly effective as a quality-assurance layer on your stress-testing framework. After a stress round, feed it the scenario definitions, key outputs and commentary, and ask it to challenge internal consistency and coverage against your documented risk profile.

Example prompt:

You are an internal model validation expert.

Input:
- Description of all scenarios used in the latest stress test (DOCX)
- Summary of portfolio risk profile (DOCX)
- Key stress test results (PDF)

Task:
1. Check whether the scenario set adequately covers:
   - Our main concentrations (by sector, region, product)
   - Funding and liquidity risks
   - Counterparty and concentration risks
2. Identify any obvious blind spots or underrepresented risk drivers.
3. Flag internal inconsistencies (e.g., narrative vs. parameter severity).
4. Suggest 3–5 targeted improvements for the next cycle.

This does not replace formal model validation, but provides a fast, documented challenge that can significantly improve quality between full validation cycles.
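
A deterministic cross-check can complement Claude's qualitative challenge. The sketch below compares material concentrations against the exposures each scenario claims to stress. It assumes concentrations are expressed as portfolio shares and that scenarios carry simple tags; the function name, the 10% materiality threshold and all sample figures are hypothetical.

```python
def coverage_gaps(
    concentrations: dict[str, float],
    scenario_tags: dict[str, set[str]],
    threshold: float = 0.10,
) -> list[str]:
    """Return material concentrations that no scenario is tagged as stressing.

    concentrations: exposure name -> share of portfolio (0.0 to 1.0)
    scenario_tags:  scenario name -> set of exposure names it stresses
    threshold:      minimum share treated as material (an assumption)
    """
    covered = set().union(*scenario_tags.values())
    return sorted(
        name
        for name, share in concentrations.items()
        if share >= threshold and name not in covered
    )


# Hypothetical portfolio shares and scenario tags, for illustration only
concentrations = {"commercial real estate": 0.22, "automotive": 0.12, "shipping": 0.03}
scenario_tags = {
    "Global recession": {"automotive"},
    "Rate shock": {"retail mortgages"},
}
gaps = coverage_gaps(concentrations, scenario_tags)
```

Any gap found this way is a concrete item to hand to Claude or to the validation team, for example with the question of whether an existing scenario implicitly covers it.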

Integrate Claude into a secure, finance-ready workflow

For production use in finance, technical integration and security are non-negotiable. Work with IT and risk to route Claude through your approved infrastructure (e.g., private instance, API via controlled backend), enforce access controls, and implement logging of prompts and outputs for auditability.

Define a simple workflow: data preparation (anonymisation where needed), Claude interaction via pre-approved prompt templates, human review, and then storage of accepted outputs in your document management system. Reruption’s AI PoC approach can help you prototype this end-to-end flow on a contained use case—such as automating documentation for two key scenarios—before expanding to broader stress-testing tasks.
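
The logging and human-review steps of this workflow can be sketched as one audit record per interaction. The example assumes full prompts and outputs are stored in your document management system and that the log keeps SHA-256 fingerprints to prove they were not altered afterwards. The field names, the template identifier and the `audit_record` helper are hypothetical and do not correspond to any Claude or Reruption API.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(template_id: str, prompt: str, output: str,
                 reviewer: str, accepted: bool) -> dict:
    """Build one audit-log entry for a reviewed model interaction."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "template_id": template_id,  # which pre-approved prompt template was used
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "reviewer": reviewer,        # human who reviewed the output
        "accepted": accepted,        # only accepted outputs go to the archive
    }


# Hypothetical interaction, for illustration only
record = audit_record(
    template_id="scenario-governance-v1",
    prompt="You are preparing governance documentation for a new stress scenario.",
    output="1. Scenario overview and rationale ...",
    reviewer="second-line reviewer",
    accepted=True,
)
log_line = json.dumps(record, sort_keys=True)  # one JSON object per log line
```

Storing fingerprints rather than full text in the log is a design choice made for this sketch. Teams that need the full prompt in the log itself can add it as a further field.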

When implemented this way, finance teams typically see 30–50% faster scenario documentation, significantly broader scenario coverage without extra headcount, and a noticeable reduction in findings during internal reviews of scenario and stress testing, while keeping risk ownership firmly with the first and second lines of defence.

Frequently Asked Questions

How can Claude make scenario and stress testing more reliable?

Claude strengthens scenario and stress testing by tackling the document-heavy, judgment-intensive parts of the process. It can read large sets of risk reports, existing scenarios and regulatory texts, then propose additional stress themes, refine narratives and highlight gaps in coverage. It helps create systematic mappings from scenario variables to P&L, balance sheet and liquidity drivers, and generates consistent, audit-ready documentation for governance bodies.

Importantly, Claude does not replace your quantitative models. It complements them by improving scenario design, transparency and traceability—turning a manual, error-prone process into a structured workflow that is easier to defend to management, internal audit and regulators.

What skills and resources do we need to get started?

You mainly need three things: strong risk domain expertise, basic familiarity with AI tools like Claude, and secure technical access. Your risk and finance teams remain the decision-makers—they define requirements, validate scenarios and own model outputs. Selected team members should be trained on how to interact with Claude using well-structured prompts and how to critically assess its suggestions.

On the technical side, you need an approved way to provide Claude with documents (risk reports, policies, regulations) and to capture its outputs in your existing document management or model governance systems. Reruption typically configures this in close alignment with IT, risk and compliance to ensure security, logging and auditability from day one.

How quickly can we expect results?

Within a few weeks, most organisations see clear efficiency gains in scenario design and documentation: faster drafting of narratives, more consistent governance packs, and better coverage analyses. After one full stress-testing cycle, the cumulative impact becomes visible—more robust scenario libraries, fewer manual inconsistencies, and improved internal challenge.

In terms of numbers, many teams can reduce the time spent on documentation and desktop research for stress testing by 30–50%, while increasing the number of well-defined, governance-ready scenarios. The calibration and modelling steps still take time, but they start from a higher-quality, more transparent foundation.

What does it cost, and where does the ROI come from?

The direct cost of running Claude (via API or enterprise access) is usually small compared to the labour cost of senior risk and finance professionals. The ROI comes from reducing manual hours on low-value tasks (copy-pasting, drafting, versioning), improving the quality and breadth of stress scenarios, and lowering regulatory and model-risk exposure.

We recommend starting with a narrowly scoped use case—such as automating scenario documentation for a subset of portfolios—so you can measure concrete savings in preparation time, reduction in review cycles, and improved coverage. This data then supports a business case for expanding Claude’s role across the broader risk management and stress-testing landscape.

How can Reruption help us implement this?

Reruption supports you end-to-end, from identifying the right entry point to scaling an AI-supported stress-testing framework. With our AI PoC offering (9,900€), we rapidly validate a concrete use case—for example, using Claude to generate and document macroeconomic scenarios for a specific portfolio—by delivering a working prototype, performance metrics and an implementation roadmap.

Beyond the PoC, our Co-Preneur approach means we embed with your finance and risk teams like co-founders rather than external advisors. We work directly in your P&L, design prompts and workflows, integrate Claude securely into your infrastructure, and help establish governance and enablement so your teams can own and evolve the solution. The goal is not another slide deck, but a functioning AI capability that makes your scenario and stress testing more reliable, explainable and resilient.

Your Contact

Philipp M. W. Hoffmann

Founder & Partner

Address

Reruption GmbH

Falkertstraße 2

70176 Stuttgart
