The Challenge: Uncategorized Expense Entries

Uncategorized or vaguely coded expenses are a silent tax on your finance function. Employees submit credit card statements, travel receipts, and invoices with missing or generic categories like “Misc” or “Other”, leaving controllers and accountants to decipher descriptions and PDFs one by one. The result is slow month-end closing, inconsistent coding between teams and entities, and unreliable cost center or project views when management needs them most.

Traditional approaches rely on manual review, static expense policies, and basic rules in ERP or T&E systems. These rules quickly break down when merchants change descriptors, employees use different terms for the same thing, or new subscription and SaaS services appear. Shared mailboxes, spreadsheets, and manual journal adjustments might work for a small volume, but they do not scale across thousands of transactions per month or multiple legal entities.

The business impact is significant: misposted costs distort profitability by cost center, project, and customer. Controllers lose days each month chasing down unclear transactions instead of analyzing drivers of spend. Budget owners see outdated or incomplete reports and react too late to rein in travel, procurement, and software subscriptions. In the worst case, inconsistent coding weakens audit trails, increases the risk of policy violations or fraud going unnoticed, and undermines trust in your financial data.

The good news is that this problem is highly solvable with modern AI. By combining your historical postings with a tool like Claude that can read both transaction data and backing documents, you can turn uncategorized entries into clean, consistent, and auditable expense data. At Reruption, we’ve seen how AI-first approaches can replace fragile manual processes, and below we’ll walk through concrete ways to use Claude to regain control over expense categorization.

Need a sparring partner for this challenge?

Let's have a no-obligation chat and brainstorm together.


Our Assessment

A strategic assessment of the challenge and high-level tips on how to tackle it.

From Reruption’s perspective, using Claude to fix uncategorized expense entries is one of the most pragmatic starting points for AI in finance. You already have labeled historical data, clear policies, and a repetitive, text-heavy process that drains time from your team. With our hands-on experience building AI-powered document analysis and classification workflows, we’ve seen that combining Claude’s long-context understanding with targeted finance logic can quickly transform noisy expense data into reliable, real-time insight.

Treat Expense Categorization as a Data Quality Product, Not a One-Off Fix

Many finance teams approach uncategorized expenses as a “month-end clean-up task” instead of a product to design and continuously improve. To get value from Claude for expense classification, you need to think of your expense data as a product with clear owners, quality standards, and feedback loops. That means defining what “good” looks like: target classification accuracy, response time, and acceptable exception rates for manual review.

Strategically, this shifts the conversation from “Can AI tag some expenses?” to “How do we build a system that keeps our expense data clean at all times?” In practice, that involves product-like decisions: which data sources to include (card feeds, T&E, AP), how often to run classification, and how to surface AI outputs back into ERP or BI tools. When finance, IT, and controlling co-own this “data product”, you can iterate quickly on prompts, rules, and workflows instead of treating AI as a black box.

Start with High-Impact Categories and Clear Policies

Not all expense categories are equal. Strategically, you get the fastest ROI from Claude by focusing on areas where spend visibility and policy compliance matter most: travel and entertainment, software subscriptions, marketing, and specific project or customer-related costs. These usually have higher spend, more potential for leakage, and clearer rules that AI can learn.

Before you build anything, pressure-test your existing policies. If your travel policy is vague or cost center assignment rules are unclear, Claude will simply reflect that ambiguity. Use this as a trigger to refine category definitions, cost center mapping rules, and thresholds for approvals. A clear policy framework lets Claude learn consistent patterns, reduces edge cases, and makes the system easier for auditors and controllers to trust.

Design a Human-in-the-Loop Workflow from Day One

AI in finance should be assistive, not autonomous, especially for classification that affects financial statements. Strategically, you want Claude to handle the bulk of straightforward expenses while your finance team focuses on exceptions, policy conflicts, and potential fraud. This requires a designed human-in-the-loop workflow with clear escalation rules, not ad-hoc spot checks.

Define confidence thresholds up front: for example, classifications above 95% confidence and under a certain amount can be auto-posted, while anything below that or above a risk threshold routes to a reviewer. This protects data quality, builds trust with controllers, and creates training data: every human correction becomes a learning signal to refine prompts, rules, or models.

Align Stakeholders on Governance, Risk, and Compliance Early

For many CFOs, the biggest barrier to using AI in expense control isn’t technology, it’s governance. Risk, compliance, and internal audit need confidence that the system will not obscure who made which decision and why. Strategically, you should involve these stakeholders at design stage, not after deployment.

Clarify questions like: What documentation do we need for auditors? How do we log Claude’s suggestions, user overrides, and final postings? What are the approval rules for changing classification logic? By designing auditability and data lineage into your AI workflow, you avoid downstream resistance and unlock faster adoption. This is where Reruption’s focus on security, compliance, and AI-first architecture becomes particularly valuable.

Prepare Your Team for New Roles and Skills

When Claude takes over the repetitive part of expense categorization, your finance team’s work shifts from “doing” to “supervising and improving” the system. Strategically, you should anticipate this and invest in the skills to manage AI-driven processes: prompt design, reviewing AI outputs, defining heuristics, and interpreting classification metrics.

Controllers and accountants don’t need to become data scientists, but they do need a working understanding of how AI expense classification behaves, where it can fail, and how to provide structured feedback. Set expectations clearly: the goal is not to replace the team, but to let them move from low-value categorization work to higher-value analysis, forecasting, and scenario modeling.

Using Claude to clean up uncategorized expense entries is one of the most direct ways finance teams can turn messy data into reliable, real-time spend visibility. When you treat it as a data product, embed human-in-the-loop controls, and align governance from the start, you get both faster closing and stronger audit readiness. Reruption’s engineers and finance-focused consultants can help you scope, prototype, and harden such a solution quickly; if you’re exploring this use case, our AI PoC is a pragmatic way to test it on your own expense data before scaling further.

Need help implementing these ideas?

Feel free to reach out to us with no obligation.

Real-World Case Studies

From Manufacturing to Fintech: Learn how companies successfully use Claude.

Samsung Electronics

Manufacturing

Samsung Electronics faces immense challenges in consumer electronics manufacturing due to massive-scale production volumes, often exceeding millions of units daily across smartphones, TVs, and semiconductors. Traditional human-led inspections struggle with fatigue-induced errors, missing subtle defects like micro-scratches on OLED panels or assembly misalignments, leading to costly recalls and rework. In facilities like Gumi, South Korea, lines process 30,000 to 50,000 units per shift, where even a 1% defect rate translates to thousands of faulty devices shipped, eroding brand trust and incurring millions in losses annually. Additionally, supply chain volatility and rising labor costs demanded hyper-efficient automation. Pre-AI, reliance on manual QA resulted in inconsistent detection rates (around 85-90% accuracy), with challenges in scaling real-time inspection for diverse components amid Industry 4.0 pressures.

Solution

Samsung's solution integrates AI-driven machine vision, autonomous robotics, and NVIDIA-powered AI factories for end-to-end quality assurance (QA). Deploying over 50,000 NVIDIA GPUs with Omniverse digital twins, factories simulate and optimize production, enabling robotic arms for precise assembly and vision systems for defect detection at microscopic levels. Implementation began with pilot programs in Gumi's Smart Factory (Gold UL validated), expanding to global sites. Deep learning models trained on vast datasets achieve 99%+ accuracy, automating inspection, sorting, and rework while cobots (collaborative robots) handle repetitive tasks, reducing human error. This vertically integrated ecosystem fuses Samsung's semiconductors, devices, and AI software.

Results

  • 30,000-50,000 units inspected per production line daily
  • Near-zero (<0.01%) defect rates in shipped devices
  • 99%+ AI machine vision accuracy for defect detection
  • 50%+ reduction in manual inspection labor
  • Millions of dollars saved annually via early defect detection
  • 50,000+ NVIDIA GPUs deployed in AI factories

Shell

Energy

Unplanned equipment failures in refineries and offshore oil rigs plagued Shell, causing significant downtime, safety incidents, and costly repairs that eroded profitability in a capital-intensive industry. According to a Deloitte 2024 report, 35% of refinery downtime is unplanned, with 70% preventable via advanced analytics—highlighting the gap in traditional scheduled maintenance approaches that missed subtle failure precursors in assets like pumps, valves, and compressors. Shell's vast global operations amplified these issues, generating terabytes of sensor data from thousands of assets that went underutilized due to data silos, legacy systems, and manual analysis limitations. Failures could cost millions per hour, risking environmental spills and personnel safety while pressuring margins amid volatile energy markets.

Solution

Shell partnered with C3 AI to implement an AI-powered predictive maintenance platform, leveraging machine learning models trained on real-time IoT sensor data, maintenance histories, and operational metrics to forecast failures and optimize interventions. Integrated with Microsoft Azure Machine Learning, the solution detects anomalies, predicts remaining useful life (RUL), and prioritizes high-risk assets across upstream oil rigs and downstream refineries. The scalable C3 AI platform enabled rapid deployment, starting with pilots on critical equipment and expanding globally. It automates predictive analytics, shifting from reactive to proactive maintenance, and provides actionable insights via intuitive dashboards for engineers.

Results

  • 20% reduction in unplanned downtime
  • 15% slash in maintenance costs
  • £1M+ annual savings per site
  • 10,000 pieces of equipment monitored globally
  • 35% industry unplanned downtime addressed (Deloitte benchmark)
  • 70% preventable failures mitigated

UC San Francisco Health

Healthcare

At UC San Francisco Health (UCSF Health), one of the nation's leading academic medical centers, clinicians grappled with immense documentation burdens. Physicians spent nearly two hours on electronic health record (EHR) tasks for every hour of direct patient care, contributing to burnout and reduced patient interaction. This was exacerbated in high-acuity settings like the ICU, where sifting through vast, complex data streams for real-time insights was manual and error-prone, delaying critical interventions for patient deterioration. The lack of integrated tools meant predictive analytics were underutilized, with traditional rule-based systems failing to capture nuanced patterns in multimodal data (vitals, labs, notes). This led to missed early warnings for sepsis or deterioration, longer lengths of stay, and suboptimal outcomes in a system handling millions of encounters annually. UCSF sought to reclaim clinician time while enhancing decision-making precision.

Solution

UCSF Health built a secure, internal AI platform leveraging generative AI (LLMs) for "digital scribes" that auto-draft notes, messages, and summaries, integrated directly into their Epic EHR using GPT-4 via Microsoft Azure. For predictive needs, they deployed ML models for real-time ICU deterioration alerts, processing EHR data to forecast risks like sepsis. Partnering with H2O.ai for Document AI, they automated unstructured data extraction from PDFs and scans, feeding into both scribe and predictive pipelines. A clinician-centric approach ensured HIPAA compliance, with models trained on de-identified data and human-in-the-loop validation to overcome regulatory hurdles. This holistic solution addressed both administrative drag and clinical foresight gaps.

Results

  • 50% reduction in after-hours documentation time
  • 76% faster note drafting with digital scribes
  • 30% improvement in ICU deterioration prediction accuracy
  • 25% decrease in unexpected ICU transfers
  • 2x increase in clinician-patient face time
  • 80% automation of referral document processing

IBM

Technology

In a massive global workforce exceeding 280,000 employees, IBM grappled with high employee turnover rates, particularly among high-performing and top talent. The cost of replacing a single employee—including recruitment, onboarding, and lost productivity—can exceed $4,000-$10,000 per hire, amplifying losses in a competitive tech talent market. Manually identifying at-risk employees was nearly impossible amid vast HR data silos spanning demographics, performance reviews, compensation, job satisfaction surveys, and work-life balance metrics. Traditional HR approaches relied on exit interviews and anecdotal feedback, which were reactive and ineffective for prevention. With attrition rates hovering around industry averages of 10-20% annually, IBM faced annual costs in the hundreds of millions from rehiring and training, compounded by knowledge loss and morale dips in a tight labor market. The challenge intensified as retaining scarce AI and tech skills became critical for IBM's innovation edge.

Solution

IBM developed a predictive attrition ML model using its Watson AI platform, analyzing 34+ HR variables like age, salary, overtime, job role, performance ratings, and distance from home from an anonymized dataset of 1,470 employees. Algorithms such as logistic regression, decision trees, random forests, and gradient boosting were trained to flag employees with high flight risk, achieving 95% accuracy in identifying those likely to leave within six months. The model integrated with HR systems for real-time scoring, triggering personalized interventions like career coaching, salary adjustments, or flexible work options. This data-driven shift empowered CHROs and managers to act proactively, prioritizing top performers at risk.

Results

  • 95% accuracy in predicting employee turnover
  • Processed 1,470+ employee records with 34 variables
  • 93% accuracy benchmark in optimized Extra Trees model
  • Reduced hiring costs by averting high-value attrition
  • Potential annual savings exceeding $300M in retention (reported)

Zalando

E-commerce

In the online fashion retail sector, high return rates—often exceeding 30-40% for apparel—stem primarily from fit and sizing uncertainties, as customers cannot physically try on items before purchase. Zalando, Europe's largest fashion e-tailer serving 27 million active customers across 25 markets, faced substantial challenges with these returns, incurring massive logistics costs, environmental impact, and customer dissatisfaction due to inconsistent sizing across over 6,000 brands and 150,000+ products. Traditional size charts and recommendations proved insufficient, with early surveys showing up to 50% of returns attributed to poor fit perception, hindering conversion rates and repeat purchases in a competitive market. This was compounded by the lack of immersive shopping experiences online, leading to hesitation among tech-savvy millennials and Gen Z shoppers who demanded more personalized, visual tools.

Solution

Zalando addressed these pain points by deploying a generative computer vision-powered virtual try-on solution, enabling users to upload selfies or use avatars to see realistic garment overlays tailored to their body shape and measurements. Leveraging machine learning models for pose estimation, body segmentation, and AI-generated rendering, the tool predicts optimal sizes and simulates draping effects, integrating with Zalando's ML platform for scalable personalization. The system combines computer vision (e.g., for landmark detection) with generative AI techniques to create hyper-realistic visualizations, drawing from vast datasets of product images, customer data, and 3D scans, ultimately aiming to cut returns while enhancing engagement. Piloted online and expanded to outlets, it forms part of Zalando's broader AI ecosystem including size predictors and style assistants.

Results

  • 30,000+ customers used virtual fitting room shortly after launch
  • 5-10% projected reduction in return rates
  • Up to 21% fewer wrong-size returns via related AI size tools
  • Expanded to all physical outlets by 2023 for jeans category
  • Supports 27 million customers across 25 European markets
  • Part of AI strategy boosting personalization for 150,000+ products

Best Practices

Successful implementations follow proven patterns. Have a look at our tactical advice to get started.

Centralize Expense Data and Context for Claude

Claude delivers the best results when it sees the full picture of each expense: transaction data, descriptions, merchant information, receipts, and invoices. As a first step, work with IT to centralize inputs from your card provider, T&E tool, and AP system into a single pipeline or staging database that Claude can access. Include fields like GL account, cost center, project, vendor, and previous category assignments.

For long-context models, you can bundle multiple receipts or invoice PDFs into one request, letting Claude cross-reference descriptions against your chart of accounts and cost center hierarchies. This enables rules like “assign Uber rides to project cost centers if the description mentions the project code” that would be cumbersome to encode manually. Even if you start with file-based batches, make sure each transaction is enriched with as much structured data as possible before sending it to Claude.
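As a minimal sketch of this enrichment step, the function below merges a raw card-feed row, extracted receipt text, and prior coding history for the same merchant into one record before sending it to Claude. The field names are illustrative assumptions, not a fixed schema; adapt them to your card provider and ERP exports.

```python
# Sketch: build one enriched transaction record from card feed, receipt
# text, and historical coding. All field names are assumptions.

def enrich_transaction(card_row, receipt_text="", history=None):
    """Combine raw transaction data with receipt text and prior coding."""
    history = history or {}
    return {
        "merchant": card_row.get("merchant", "").strip(),
        "amount": card_row.get("amount"),
        "currency": card_row.get("currency", "EUR"),
        "date": card_row.get("date"),
        "description": card_row.get("description", "").strip(),
        "receipt_text": receipt_text.strip(),
        # Prior postings for the same merchant help keep coding consistent.
        "previous_category": history.get("gl_account"),
        "previous_cost_center": history.get("cost_center"),
    }

tx = enrich_transaction(
    {"merchant": "UBER *TRIP", "amount": 23.40, "currency": "EUR",
     "date": "2024-03-12", "description": "Ride to client PRJ-4589"},
    receipt_text="Uber receipt, trip to client site",
    history={"gl_account": "6120", "cost_center": "PRJ-4589"},
)
```

The point of the structure is that every downstream step (prompting, routing, audit logging) can rely on one consistent record shape regardless of which source system the transaction came from.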

Design a Robust Classification Prompt with Clear Instructions

Prompt design is crucial for consistent, auditable classification. Your prompt should explain your chart of accounts, cost center logic, and policy rules in concise but precise terms, then ask Claude to return a structured JSON output. Here’s a simplified example you can adapt:

System / Instruction to Claude:
You are an AI assistant helping a finance team classify business expenses.

Goals:
- Assign each expense to the correct GL account and cost center.
- Flag potential policy violations or suspicious transactions.

Use this chart of accounts (examples):
- 6100: Travel - Flights
- 6110: Travel - Hotels
- 6120: Travel - Ground Transport
- 6300: Software Subscriptions
- 6400: Marketing & Events
- 6999: Miscellaneous (use only if nothing else fits)

Rules:
- Prefer specific GL codes over Miscellaneous.
- If merchant or description indicates a known SaaS tool, use 6300.
- If a project code (e.g., PRJ-1234) appears, assign that cost center.
- Flag as "policy_violation": true if description suggests personal spend.

Return JSON only in this format:
{
  "gl_account": "<code>",
  "cost_center": "<id or null>",
  "confidence": <0-1>,
  "policy_violation": true/false,
  "notes": "<short rationale>"
}

Now classify this expense:
Merchant: <MERCHANT>
Amount: <AMOUNT>
Date: <DATE>
Description: <DESCRIPTION>
Receipt text: <EXTRACTED_TEXT_FROM_RECEIPT>

Iterate on this prompt with real data until Claude reliably picks your preferred categories and flags edge cases correctly. Small clarifications (for example, which keywords indicate software vs. marketing spend) can materially improve classification quality.
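To keep prompt iteration manageable, it helps to separate the static instruction from the per-transaction fields. The sketch below shows one hedged way to do that with a template; `PROMPT_TEMPLATE` abbreviates the full instruction shown above, and the placeholder names are assumptions to adapt to your own prompt.

```python
# Sketch: fill the classification prompt with transaction fields.
# PROMPT_TEMPLATE stands in for the full instruction text above.

PROMPT_TEMPLATE = """Now classify this expense:
Merchant: {merchant}
Amount: {amount}
Date: {date}
Description: {description}
Receipt text: {receipt_text}"""

def build_prompt(tx):
    """Render the per-transaction section of the classification prompt."""
    return PROMPT_TEMPLATE.format(
        merchant=tx.get("merchant", "UNKNOWN"),
        amount=tx.get("amount", "UNKNOWN"),
        date=tx.get("date", "UNKNOWN"),
        description=tx.get("description", ""),
        receipt_text=tx.get("receipt_text", ""),
    )
```

Keeping the template in version control alongside your category definitions makes each prompt change reviewable, which matters once auditors start asking how classification logic evolved.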

Implement Confidence Thresholds and Review Queues

To safely automate expense classification with Claude, you need a mechanism to distinguish between “safe to auto-post” and “requires review”. Use the confidence score returned by Claude, combined with transaction attributes, to route items accordingly. For example, you might auto-accept expenses under €200 with confidence > 0.97, while any transaction higher than €2,000 or with confidence < 0.9 goes to a human reviewer.
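The routing rule described above can be sketched as a small pure function. The thresholds are the example values from the text and should be tuned per entity; the queue names are illustrative.

```python
# Sketch: route a classified expense to a queue based on amount,
# confidence, and policy flags. Thresholds are example values.

AUTO_POST_MAX_AMOUNT = 200.0      # EUR: auto-accept only below this
AUTO_POST_MIN_CONFIDENCE = 0.97
REVIEW_AMOUNT_THRESHOLD = 2000.0  # EUR: always review above this
REVIEW_MAX_CONFIDENCE = 0.90      # always review below this

def route_expense(amount, confidence, policy_violation):
    """Return the queue a classified expense should land in."""
    if policy_violation:
        return "ai_policy_alerts"
    if amount > REVIEW_AMOUNT_THRESHOLD or confidence < REVIEW_MAX_CONFIDENCE:
        return "ai_low_confidence"
    if amount <= AUTO_POST_MAX_AMOUNT and confidence >= AUTO_POST_MIN_CONFIDENCE:
        return "auto_post"
    # Anything in between defaults to human review, never to auto-post.
    return "ai_low_confidence"
```

Defaulting the middle ground to review rather than auto-posting is the conservative choice: it costs reviewer time early on but prevents silent mispostings while you are still calibrating thresholds.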

In your workflow tool (ERP, T&E, or a custom app), create distinct queues such as “AI Approved”, “AI Low Confidence”, and “AI Policy Alerts”. Reviewers should see Claude’s proposed category, confidence, and rationale so they can quickly accept or correct. Every override can be logged and periodically sampled as training data to refine prompts, additional rules, or even fine-tuned models in the future.

Use Claude to Normalize Merchant and Description Data

One root cause of uncategorized expenses is messy free-text: different spellings, abbreviations, or cryptic merchant names from card schemes. Claude is very effective at normalizing merchant and description text before classification, which improves consistency across your finance systems.

Introduce a pre-processing step where Claude maps raw strings to standardized values. For example:

Instruction to Claude:
You are cleaning expense transaction data for a finance system.
For each input, return:
{
  "normalized_merchant": "standardized merchant name",
  "normalized_purpose": "short, clear purpose of the spend",
  "tags": ["travel", "software", "subscription", ...]
}

Input:
Merchant: UBER *TRIP HELP.UBER.COM
Description: Ride from office to client PRJ-4589

Expected output:
{
  "normalized_merchant": "Uber",
  "normalized_purpose": "Taxi ride from office to client site",
  "tags": ["travel", "ground_transport", "client_meeting"]
}

You can then base your classification rules on normalized merchants and purposes, dramatically reducing the number of edge cases and improving reporting consistency.
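Because card feeds repeat the same raw merchant strings constantly, a simple cache in front of the normalization call keeps costs down. The sketch below assumes a `call_claude_normalize` callable standing in for your actual API call; the stub used in the example is purely illustrative.

```python
# Sketch: cache merchant normalizations so repeated raw strings are
# resolved locally instead of re-querying the model.
# `call_claude_normalize` is a placeholder for your real API call.

normalization_cache = {}

def normalize_merchant(raw, call_claude_normalize):
    """Return a standardized merchant name, caching by uppercased raw string."""
    key = raw.strip().upper()
    if key not in normalization_cache:
        normalization_cache[key] = call_claude_normalize(raw)
    return normalization_cache[key]

# Usage with a stubbed model call (for illustration only):
stub = lambda raw: "Uber" if "UBER" in raw.upper() else raw.title()
normalize_merchant("UBER *TRIP HELP.UBER.COM", stub)
```

In production you would persist the cache (e.g. a lookup table in your staging database) so normalizations survive restarts and can be reviewed and corrected by finance.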

Automate Policy Checks and Annotation for Audits

Beyond categorization, Claude can evaluate each expense against your travel and expense policies and pre-annotate transactions for audit readiness. Feed your policy text (limits, allowed categories, required justifications) into the prompt and ask Claude to flag potential violations or missing documentation.

For example, require Claude to output fields like "policy_flag", "reason", and "missing_docs". A sample configuration might look like:

Instruction to Claude:
Given the company travel policy and the expense details, assess compliance.
Return:
{
  "policy_flag": "none" | "limit_exceeded" | "personal_suspected" | "missing_receipt",
  "reason": "<short explanation>",
  "required_action": "ok" | "request_justification" | "deny_reimbursement"
}

These annotations can be stored alongside each posting, giving auditors a clear trace of what was checked, why something was flagged, and how it was resolved. Over time, you’ll see fewer ad-hoc email chains and more structured, searchable evidence.
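One way to store such annotations is as append-only audit records next to each posting. The record layout below is an assumption to adapt to your ERP's custom fields or a sidecar table; it captures what was checked, when, and how a human resolved it.

```python
# Sketch: build an append-only audit line (JSON) for one policy check.
# The field layout is an assumption, not a fixed standard.
import json
from datetime import datetime, timezone

def audit_record(expense_id, claude_output, reviewer=None, resolution=None):
    """Serialize one policy-check result for the audit trail."""
    return json.dumps({
        "expense_id": expense_id,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "policy_flag": claude_output.get("policy_flag", "none"),
        "reason": claude_output.get("reason", ""),
        "required_action": claude_output.get("required_action", "ok"),
        "reviewer": reviewer,       # null until a human touches the item
        "resolution": resolution,   # e.g. "approved_with_justification"
    })
```

Writing these as immutable lines (rather than overwriting a status field) gives auditors the full history of a flagged expense, including cases where the AI flag was overridden.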

Instrument KPIs and Run a Controlled Pilot

Before rolling out AI-driven categorization to all entities, run a pilot on a subset of transactions (for example, one business unit’s travel and software spend). Define clear KPIs such as classification accuracy vs. current baseline, reduction in manual touch time, time saved at month-end close, and policy violation detection rate.

During the pilot, sample a percentage of “AI Approved” transactions for manual quality checks and compare them with a control group processed using your old method. Adjust prompts and thresholds until you consistently hit agreed targets (e.g. > 96% accuracy and > 40% reduction in manual review time). Once validated, you can expand coverage to more categories and entities with realistic expectations on performance.
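The core pilot metric, AI-versus-human agreement on sampled transactions, reduces to a small computation. The sketch below assumes you have paired each AI label with the human reviewer's confirmed label.

```python
# Sketch: score a pilot batch against human ground truth.
# `pairs` is a list of (ai_label, human_label) tuples from the sample.

def pilot_accuracy(pairs):
    """Share of AI classifications confirmed by the human reviewer."""
    if not pairs:
        return 0.0
    agree = sum(1 for ai, human in pairs if ai == human)
    return agree / len(pairs)

sample = [("6300", "6300"), ("6110", "6110"), ("6999", "6300"), ("6120", "6120")]
print(f"Pilot accuracy: {pilot_accuracy(sample):.0%}")  # 75%
```

Tracking this per category (not just overall) is worth the extra effort: a 96% aggregate can hide a category, such as Miscellaneous, where the model is barely better than chance.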

Implemented carefully, these practices typically lead to tangible results: finance teams often see a 30–60% reduction in manual expense review effort, closing times shortened by 1–3 days for affected entities, and a noticeable improvement in policy adherence and audit readiness. The exact metrics will depend on your baseline and data quality, but with a structured rollout, Claude can turn uncategorized expenses from a recurring headache into a controlled, largely automated process.

Need implementation expertise now?

Let's talk about your ideas!

Frequently Asked Questions

Claude reads the available data for each transaction — merchant name, amount, date, free-text description, and, if available, the receipt or invoice. Using a prompt tailored to your chart of accounts, cost centers, and expense policies, it proposes a GL account, cost center, and optional policy flags (for example, potential personal spend or missing documentation).

Technically, Claude converts your rules and historical examples into a pattern it can apply to new entries. You can have it return a structured JSON output with category, confidence score, and rationale, which your ERP or T&E system then uses either to auto-post low-risk items or route higher-risk items to a human reviewer.
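Before any posting reaches your ERP, the structured output should be validated defensively. The sketch below parses the JSON format used in this article; the allowed GL codes mirror the example chart of accounts and are illustrative.

```python
# Sketch: defensively parse Claude's structured JSON output before
# posting. Field names match the example format in this article;
# the allowed GL codes are illustrative.
import json

ALLOWED_GL_ACCOUNTS = {"6100", "6110", "6120", "6300", "6400", "6999"}

def parse_classification(raw):
    """Validate Claude's JSON; raise on anything an ERP should never see."""
    data = json.loads(raw)
    if data.get("gl_account") not in ALLOWED_GL_ACCOUNTS:
        raise ValueError(f"Unknown GL account: {data.get('gl_account')}")
    confidence = float(data.get("confidence", 0))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"Confidence out of range: {confidence}")
    data["confidence"] = confidence
    return data
```

Rejecting unknown GL codes outright, rather than mapping them to Miscellaneous, keeps model drift visible: a spike in validation errors is an early warning that the prompt or chart of accounts needs attention.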

You don’t need a large data science team to get started. The key ingredients are:

  • A finance owner (controller or head of accounting) who defines category rules, policies, and success metrics.
  • An IT/engineering contact who can connect Claude to your expense, card, and ERP systems or at least export/import batch files.
  • Someone comfortable iterating prompts and reviewing Claude’s outputs — this can be a power user in finance with light support from an AI engineer.

Reruption typically complements your team with the missing pieces: we bring the AI engineering, prompt design, and workflow automation expertise so your finance team can focus on validating results and refining business rules rather than building infrastructure from scratch.

Timelines depend on your system landscape and data readiness, but for a focused scope (e.g. travel and card expenses for one entity), you can usually see proof-of-value within a few weeks. In a typical setup:

  • Week 1: Scope definition, data access, and initial prompt design based on your chart of accounts and policies.
  • Weeks 2–3: Pilot on historical transactions, accuracy measurement, prompt and workflow refinement.
  • Weeks 4–6: Live pilot on current expenses with human-in-the-loop review and KPI tracking.

By the end of an initial 4–6 week period, most finance teams can quantify reductions in manual review time and improvements in categorization consistency, and decide whether to scale across more categories or entities.

The ROI comes from three main areas: reduced manual effort, faster and cleaner closing, and better spend control. For mid-sized and larger organizations processing thousands of expenses per month, it’s common to free up the equivalent of 0.5–2 FTE worth of manual categorization and chasing unclear entries. That time can be redirected to analysis, forecasting, and strategic projects.

On top of that, more accurate and timely categorization improves cost center and project reporting, which helps budget owners identify savings opportunities in travel, procurement, and subscriptions. While exact numbers depend on your baseline, many teams can justify the investment purely on labor and closing efficiency; the upside from better spend decisions is additional leverage rather than the only value driver.

Reruption works as a Co-Preneur — we embed with your team and build real solutions, not slide decks. For this specific use case, we typically start with our AI PoC offering (9,900€), where we:

  • Define the expense control use case and success metrics with your finance team.
  • Assess data sources (card feeds, T&E, ERP) and design the architecture for Claude-based classification.
  • Build a working prototype that classifies your own uncategorized expenses, including prompts, workflows, and basic dashboards.
  • Measure accuracy, speed, and cost per run, and outline a production rollout plan.

From there, we can stay on to harden the solution, integrate it with your existing tools, and help your finance team adopt new, AI-first ways of working. Because we operate in your P&L and move with high velocity, you get a tangible, tested system for AI-driven expense control in weeks, not months.

Contact Us!


Contact Directly

Your Contact

Philipp M. W. Hoffmann

Founder & Partner

Address

Reruption GmbH

Falkertstraße 2

70176 Stuttgart

Social Media