The Challenge: Uncategorized Expense Entries

Uncategorized or vaguely coded expenses are a silent tax on your finance function. Employees submit credit card statements, travel receipts, and invoices with missing or generic categories like “Misc” or “Other”, leaving controllers and accountants to decipher descriptions and PDFs one by one. The result is slow month-end closing, inconsistent coding between teams and entities, and unreliable cost center or project views when management needs them most.

Traditional approaches rely on manual review, static expense policies, and basic rules in ERP or T&E systems. These rules quickly break down when merchants change descriptors, employees use different terms for the same thing, or new subscription and SaaS services appear. Shared mailboxes, spreadsheets, and manual journal adjustments might work for a small volume, but they do not scale across thousands of transactions per month or multiple legal entities.

The business impact is significant: misposted costs distort profitability by cost center, project, and customer. Controllers lose days each month chasing down unclear transactions instead of analyzing drivers of spend. Budget owners see outdated or incomplete reports and react too late to rein in travel, procurement, and software subscriptions. In the worst case, inconsistent coding weakens audit trails, increases the risk of policy violations or fraud going unnoticed, and undermines trust in your financial data.

The good news is that this problem is highly solvable with modern AI. By combining your historical postings with a tool like Claude that can read both transaction data and backing documents, you can turn uncategorized entries into clean, consistent, and auditable expense data. At Reruption, we’ve seen how AI-first approaches can replace fragile manual processes, and below we’ll walk through concrete ways to use Claude to regain control over expense categorization.

Need a sparring partner for this challenge?

Let's have a no-obligation chat and brainstorm together.


Our Assessment

A strategic assessment of the challenge and high-level tips on how to tackle it.

From Reruption’s perspective, using Claude to fix uncategorized expense entries is one of the most pragmatic starting points for AI in finance. You already have labeled historical data, clear policies, and a repetitive, text-heavy process that drains time from your team. With our hands-on experience building AI-powered document analysis and classification workflows, we’ve seen that combining Claude’s long-context understanding with targeted finance logic can quickly transform noisy expense data into reliable, real-time insight.

Treat Expense Categorization as a Data Quality Product, Not a One-Off Fix

Many finance teams approach uncategorized expenses as a “month-end clean-up task” instead of a product to design and continuously improve. To get value from Claude for expense classification, you need to think of your expense data as a product with clear owners, quality standards, and feedback loops. That means defining what “good” looks like: target classification accuracy, response time, and acceptable exception rates for manual review.

Strategically, this shifts the conversation from “Can AI tag some expenses?” to “How do we build a system that keeps our expense data clean at all times?” In practice, that involves product-like decisions: which data sources to include (card feeds, T&E, AP), how often to run classification, and how to surface AI outputs back into ERP or BI tools. When finance, IT, and controlling co-own this “data product”, you can iterate quickly on prompts, rules, and workflows instead of treating AI as a black box.

Start with High-Impact Categories and Clear Policies

Not all expense categories are equal. Strategically, you get the fastest ROI from Claude by focusing on areas where spend visibility and policy compliance matter most: travel and entertainment, software subscriptions, marketing, and specific project or customer-related costs. These usually have higher spend, more potential for leakage, and clearer rules that AI can learn.

Before you build anything, pressure-test your existing policies. If your travel policy is vague or cost center assignment rules are unclear, Claude will simply reflect that ambiguity. Use this as a trigger to refine category definitions, cost center mapping rules, and thresholds for approvals. A clear policy framework lets Claude learn consistent patterns, reduces edge cases, and makes the system easier for auditors and controllers to trust.

Design a Human-in-the-Loop Workflow from Day One

AI in finance should be assistive, not autonomous, especially for classification that affects financial statements. Strategically, you want Claude to handle the bulk of straightforward expenses while your finance team focuses on exceptions, policy conflicts, and potential fraud. This requires a designed human-in-the-loop workflow with clear escalation rules, not ad-hoc spot checks.

Define confidence thresholds up front: for example, classifications above 95% confidence and under a certain amount can be auto-posted, while anything below that or above a risk threshold routes to a reviewer. This protects data quality, builds trust with controllers, and creates training data: every human correction becomes a learning signal to refine prompts, rules, or models.

Align Stakeholders on Governance, Risk, and Compliance Early

For many CFOs, the biggest barrier to using AI in expense control isn't technology; it's governance. Risk, compliance, and internal audit need confidence that the system will not obscure who made which decision and why. Strategically, you should involve these stakeholders at the design stage, not after deployment.

Clarify questions like: What documentation do we need for auditors? How do we log Claude’s suggestions, user overrides, and final postings? What are the approval rules for changing classification logic? By designing auditability and data lineage into your AI workflow, you avoid downstream resistance and unlock faster adoption. This is where Reruption’s focus on security, compliance, and AI-first architecture becomes particularly valuable.

Prepare Your Team for New Roles and Skills

When Claude takes over the repetitive part of expense categorization, your finance team’s work shifts from “doing” to “supervising and improving” the system. Strategically, you should anticipate this and invest in the skills to manage AI-driven processes: prompt design, reviewing AI outputs, defining heuristics, and interpreting classification metrics.

Controllers and accountants don’t need to become data scientists, but they do need a working understanding of how AI expense classification behaves, where it can fail, and how to provide structured feedback. Set expectations clearly: the goal is not to replace the team, but to let them move from low-value categorization work to higher-value analysis, forecasting, and scenario modeling.

Using Claude to clean up uncategorized expense entries is one of the most direct ways finance teams can turn messy data into reliable, real-time spend visibility. When you treat it as a data product, embed human-in-the-loop controls, and align governance from the start, you get both faster closing and stronger audit readiness. Reruption’s engineers and finance-focused consultants can help you scope, prototype, and harden such a solution quickly; if you’re exploring this use case, our AI PoC is a pragmatic way to test it on your own expense data before scaling further.

Need help implementing these ideas?

Feel free to reach out to us with no obligation.

Real-World Case Studies

From Logistics to Banking: Learn how companies successfully use Claude.

DHL

Logistics

DHL, a global logistics giant, faced significant challenges from vehicle breakdowns and suboptimal maintenance schedules. Unpredictable failures in its vast fleet of delivery vehicles led to frequent delivery delays, increased operational costs, and frustrated customers. Traditional reactive maintenance—fixing issues only after they occurred—resulted in excessive downtime, with vehicles sidelined for hours or days, disrupting supply chains worldwide. Inefficiencies were compounded by varying fleet conditions across regions, making scheduled maintenance inefficient and wasteful, often over-maintaining healthy vehicles while under-maintaining others at risk. These issues not only inflated maintenance costs by up to 20% in some segments but also eroded customer trust through unreliable deliveries. With rising e-commerce demands, DHL needed a proactive approach to predict failures before they happened, minimizing disruptions in a highly competitive logistics industry.

Solution

DHL implemented a predictive maintenance system leveraging IoT sensors installed on vehicles to collect real-time data on engine performance, tire wear, brakes, and more. This data feeds into machine learning models that analyze patterns, predict potential breakdowns, and recommend optimal maintenance timing. The AI solution integrates with DHL's existing fleet management systems, using algorithms like random forests and neural networks for anomaly detection and failure forecasting. Overcoming data silos and integration challenges, DHL partnered with tech providers to deploy edge computing for faster processing. Pilot programs in key hubs expanded globally, shifting from time-based to condition-based maintenance, ensuring resources focus on high-risk assets.

Results

  • Vehicle downtime reduced by 15%
  • Maintenance costs lowered by 10%
  • Unplanned breakdowns decreased by 25%
  • On-time delivery rate improved by 12%
  • Fleet availability increased by 20%
  • Overall operational efficiency up 18%
Read case study →

Walmart (Marketplace)

Retail

In the cutthroat arena of Walmart Marketplace, third-party sellers fiercely compete for the Buy Box, which accounts for the majority of sales conversions. These sellers manage vast inventories but struggle with manual pricing adjustments, which are too slow to keep pace with rapidly shifting competitor prices, demand fluctuations, and market trends. This leads to frequent loss of the Buy Box, missed sales opportunities, and eroded profit margins on a platform where price is the primary battleground. Additionally, sellers face data overload from monitoring thousands of SKUs, predicting optimal price points, and balancing competitiveness against profitability. Traditional static pricing strategies fail in this dynamic e-commerce environment, resulting in suboptimal performance and requiring excessive manual effort, often hours daily per seller. Walmart recognized the need for an automated solution to empower sellers and drive platform growth.

Solution

Walmart launched the Repricer, a free AI-driven automated pricing tool integrated into Seller Center, leveraging generative AI for decision support alongside machine learning models like sequential decision intelligence to dynamically adjust prices in real-time. The tool analyzes competitor pricing, historical sales data, demand signals, and market conditions to recommend and implement optimal prices that maximize Buy Box eligibility and sales velocity. Complementing this, the Pricing Insights dashboard provides account-level metrics and AI-generated recommendations, including suggested prices for promotions, helping sellers identify opportunities without manual analysis. For advanced users, third-party tools like Biviar's AI repricer—commissioned by Walmart—enhance this with reinforcement learning for profit-maximizing daily pricing decisions. This ecosystem shifts sellers from reactive to proactive pricing strategies.

Results

  • 25% increase in conversion rates from dynamic AI pricing
  • Higher Buy Box win rates through real-time competitor analysis
  • Maximized sales velocity for 3rd-party sellers on Marketplace
  • 850 million catalog data improvements via GenAI (broader impact)
  • 40%+ conversion boost potential from AI-driven offers
  • Reduced manual pricing time by hours daily per seller
Read case study →

NatWest

Banking

NatWest Group, a leading UK bank serving over 19 million customers, grappled with escalating demands for digital customer service. Traditional systems like the original Cora chatbot handled routine queries effectively but struggled with complex, nuanced interactions, often escalating 80-90% of cases to human agents. This led to delays, higher operational costs, and risks to customer satisfaction amid rising expectations for instant, personalized support. Simultaneously, the surge in financial fraud posed a critical threat, requiring seamless fraud reporting and detection within chat interfaces without compromising security or user trust. Regulatory compliance, data privacy under UK GDPR, and ethical AI deployment added layers of complexity, as the bank aimed to scale support while minimizing errors in high-stakes banking scenarios. Balancing innovation with reliability was paramount; poor AI performance could erode trust in a sector where customer satisfaction directly impacts retention and revenue.

Solution

Cora+, launched in June 2024, marked NatWest's first major upgrade using generative AI to enable proactive, intuitive responses for complex queries, reducing escalations and enhancing self-service. This built on Cora's established platform, which already managed millions of interactions monthly. In a pioneering move, NatWest partnered with OpenAI in March 2025—becoming the first UK-headquartered bank to do so—integrating LLMs into both customer-facing Cora and internal tool Ask Archie. This allowed natural language processing for fraud reports, personalized advice, and process simplification while embedding safeguards for compliance and bias mitigation. The approach emphasized ethical AI, with rigorous testing, human oversight, and continuous monitoring to ensure safe, accurate interactions in fraud detection and service delivery.

Results

  • 150% increase in Cora customer satisfaction scores (2024)
  • Proactive resolution of complex queries without human intervention
  • First UK bank OpenAI partnership, accelerating AI adoption
  • Enhanced fraud detection via real-time chat analysis
  • Millions of monthly interactions handled autonomously
  • Significant reduction in agent escalation rates
Read case study →

Tesla, Inc.

Automotive

The automotive industry faces a staggering reality: 94% of traffic accidents are attributed to human error, including distraction, fatigue, and poor judgment, resulting in over 1.3 million global road deaths annually. In the US alone, NHTSA data shows an average of one crash per 670,000 miles driven, highlighting the urgent need for advanced driver assistance systems (ADAS) to enhance safety and reduce fatalities. Tesla encountered specific hurdles in scaling vision-only autonomy, ditching radar and lidar for camera-based systems reliant on AI to mimic human perception. Challenges included variable AI performance in diverse conditions like fog, night, or construction zones, regulatory scrutiny over misleading Level 2 labeling despite Level 4-like demos, and ensuring robust driver monitoring to prevent over-reliance. Past incidents and studies criticized inconsistent computer vision reliability.

Solution

Tesla's Autopilot and Full Self-Driving (FSD) Supervised leverage end-to-end deep learning neural networks trained on billions of real-world miles, processing camera feeds for perception, prediction, and control without modular rules. Transitioning from HydraNet (multi-task learning for 30+ outputs) to pure end-to-end models, FSD v14 achieves door-to-door driving via video-based imitation learning. Overcoming challenges, Tesla scaled data collection from its fleet of 6M+ vehicles, using Dojo supercomputers for training on petabytes of video. Vision-only approach cuts costs vs. lidar rivals, with recent upgrades like new cameras addressing edge cases. Regulatory pushes target unsupervised FSD by end-2025, with China approval eyed for 2026.

Results

  • Autopilot Crash Rate: 1 per 6.36M miles (Q3 2025)
  • Safety Multiple: 9x safer than US average (670K miles/crash)
  • Fleet Data: Billions of miles for training
  • FSD v14: Door-to-door autonomy achieved
  • Q2 2025: 1 crash per 6.69M miles
  • 2024 Q4 Record: 5.94M miles between accidents
Read case study →

Amazon

Retail

In the vast e-commerce landscape, online shoppers face significant hurdles in product discovery and decision-making. With millions of products available, customers often struggle to find items matching their specific needs, compare options, or get quick answers to nuanced questions about features, compatibility, and usage. Traditional search bars and static listings fall short, leading to shopping cart abandonment rates as high as 70% industry-wide and prolonged decision times that frustrate users. Amazon, serving over 300 million active customers, encountered amplified challenges during peak events like Prime Day, where query volumes spiked dramatically. Shoppers demanded personalized, conversational assistance akin to in-store help, but scaling human support was impossible. Issues included handling complex, multi-turn queries, integrating real-time inventory and pricing data, and ensuring recommendations complied with safety and accuracy standards amid a $500B+ catalog.

Lösung

Amazon developed Rufus, a generative AI-powered conversational shopping assistant embedded in the Amazon Shopping app and desktop. Rufus leverages a custom-built large language model (LLM) fine-tuned on Amazon's product catalog, customer reviews, and web data, enabling natural, multi-turn conversations to answer questions, compare products, and provide tailored recommendations. Powered by Amazon Bedrock for scalability and AWS Trainium/Inferentia chips for efficient inference, Rufus scales to millions of sessions without latency issues. It incorporates agentic capabilities for tasks like cart addition, price tracking, and deal hunting, overcoming prior limitations in personalization by accessing user history and preferences securely. Implementation involved iterative testing, starting with a beta in February 2024, expanding to all US users by September, and global rollouts, addressing hallucination risks through grounding techniques and human-in-the-loop safeguards.

Ergebnisse

  • 60% higher purchase completion rate for Rufus users
  • $10B projected additional sales from Rufus
  • 250M+ customers used Rufus in 2025
  • Monthly active users up 140% YoY
  • Interactions surged 210% YoY
  • Black Friday sales sessions +100% with Rufus
  • 149% jump in Rufus users recently
Read case study →

Best Practices

Successful implementations follow proven patterns. Have a look at our tactical advice to get started.

Centralize Expense Data and Context for Claude

Claude delivers the best results when it sees the full picture of each expense: transaction data, descriptions, merchant information, receipts, and invoices. As a first step, work with IT to centralize inputs from your card provider, T&E tool, and AP system into a single pipeline or staging database that Claude can access. Include fields like GL account, cost center, project, vendor, and previous category assignments.

For long-context models, you can bundle multiple receipts or invoice PDFs into one request, letting Claude cross-reference descriptions against your chart of accounts and cost center hierarchies. This enables rules like “assign Uber rides to project cost centers if the description mentions the project code” that would be cumbersome to encode manually. Even if you start with file-based batches, make sure each transaction is enriched with as much structured data as possible before sending it to Claude.
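As an illustration, the enrichment step can be sketched in Python. The `ExpenseRecord` fields below are hypothetical, not a fixed schema; adapt them to whatever your card, T&E, and AP exports actually provide before the record is serialized into a Claude prompt:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ExpenseRecord:
    """One enriched transaction, assembled from card feed, T&E, and AP data.

    Field names are illustrative -- map them from your own source systems.
    """
    transaction_id: str
    merchant: str
    amount: float
    currency: str
    date: str
    description: str
    gl_account: Optional[str] = None    # previous assignment, if any
    cost_center: Optional[str] = None
    project: Optional[str] = None
    receipt_text: Optional[str] = None  # OCR output from the receipt PDF

def to_claude_payload(record: ExpenseRecord) -> str:
    """Serialize one enriched record as JSON for inclusion in a prompt."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)

sample = ExpenseRecord(
    transaction_id="TX-0001",
    merchant="UBER *TRIP HELP.UBER.COM",
    amount=34.50,
    currency="EUR",
    date="2024-05-12",
    description="Ride from office to client PRJ-4589",
)
print(to_claude_payload(sample))
```

Even a simple structure like this pays off: every downstream step (classification, review queues, audit logs) can rely on the same fields being present.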

Design a Robust Classification Prompt with Clear Instructions

Prompt design is crucial for consistent, auditable classification. Your prompt should explain your chart of accounts, cost center logic, and policy rules in concise but precise terms, then ask Claude to return a structured JSON output. Here’s a simplified example you can adapt:

System / Instruction to Claude:
You are an AI assistant helping a finance team classify business expenses.

Goals:
- Assign each expense to the correct GL account and cost center.
- Flag potential policy violations or suspicious transactions.

Use this chart of accounts (examples):
- 6100: Travel - Flights
- 6110: Travel - Hotels
- 6120: Travel - Ground Transport
- 6300: Software Subscriptions
- 6400: Marketing & Events
- 6999: Miscellaneous (use only if nothing else fits)

Rules:
- Prefer specific GL codes over Miscellaneous.
- If merchant or description indicates a known SaaS tool, use 6300.
- If a project code (e.g., PRJ-1234) appears, assign that cost center.
- Flag as "policy_violation": true if description suggests personal spend.

Return JSON only in this format:
{
  "gl_account": "<code>",
  "cost_center": "<id or null>",
  "confidence": <0-1>,
  "policy_violation": true/false,
  "notes": "<short rationale>"
}

Now classify this expense:
Merchant: <MERCHANT>
Amount: <AMOUNT>
Date: <DATE>
Description: <DESCRIPTION>
Receipt text: <EXTRACTED_TEXT_FROM_RECEIPT>

Iterate on this prompt with real data until Claude reliably picks your preferred categories and flags edge cases correctly. Small clarifications (for example, which keywords indicate software vs. marketing spend) can materially improve classification quality.
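Because Claude's reply is ultimately free text, it pays to validate the returned JSON before anything touches your ERP. A minimal sketch, assuming the example chart of accounts and JSON format from the prompt above:

```python
import json

# Valid GL codes from the example chart of accounts in the prompt above.
CHART_OF_ACCOUNTS = {"6100", "6110", "6120", "6300", "6400", "6999"}

def validate_classification(raw: str) -> dict:
    """Parse Claude's JSON reply and reject malformed or out-of-range values.

    Raises ValueError so the transaction can be routed to manual review
    instead of being silently posted with bad data.
    """
    result = json.loads(raw)
    if result.get("gl_account") not in CHART_OF_ACCOUNTS:
        raise ValueError(f"Unknown GL account: {result.get('gl_account')}")
    confidence = result.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0 <= confidence <= 1:
        raise ValueError(f"Confidence out of range: {confidence}")
    if not isinstance(result.get("policy_violation"), bool):
        raise ValueError("policy_violation must be true or false")
    return result

reply = ('{"gl_account": "6120", "cost_center": "PRJ-4589", '
         '"confidence": 0.94, "policy_violation": false, '
         '"notes": "Uber ride with project code"}')
classified = validate_classification(reply)
print(classified["gl_account"])  # 6120
```

Treating a validation failure as "route to human review" rather than as a hard error keeps the pipeline running even when the model occasionally returns something unexpected.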

Implement Confidence Thresholds and Review Queues

To safely automate expense classification with Claude, you need a mechanism to distinguish between “safe to auto-post” and “requires review”. Use the confidence score returned by Claude, combined with transaction attributes, to route items accordingly. For example, you might auto-accept expenses under €200 with confidence > 0.97, while any transaction higher than €2,000 or with confidence < 0.9 goes to a human reviewer.

In your workflow tool (ERP, T&E, or a custom app), create distinct queues such as “AI Approved”, “AI Low Confidence”, and “AI Policy Alerts”. Reviewers should see Claude’s proposed category, confidence, and rationale so they can quickly accept or correct. Every override can be logged and periodically sampled as training data to refine prompts, additional rules, or even fine-tuned models in the future.
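The routing logic itself can be a small, auditable function. This sketch uses the example thresholds above (auto-accept under €200 with confidence > 0.97, review anything over €2,000 or below 0.9); tune them to your own risk appetite:

```python
def route_expense(amount_eur: float, confidence: float,
                  policy_violation: bool) -> str:
    """Route a classified expense into one of the review queues.

    Thresholds are the illustrative values from the text; adjust them
    per entity, category, or risk profile as needed.
    """
    if policy_violation:
        return "AI Policy Alerts"
    if amount_eur > 2000 or confidence < 0.9:
        return "AI Low Confidence"
    if amount_eur < 200 and confidence > 0.97:
        return "AI Approved"
    # Default to human review when neither rule clearly applies.
    return "AI Low Confidence"

print(route_expense(34.50, 0.98, False))   # AI Approved
print(route_expense(2500.0, 0.99, False))  # AI Low Confidence
print(route_expense(120.0, 0.95, True))    # AI Policy Alerts
```

Keeping the thresholds in one explicit function (rather than scattered across prompts and workflow configs) makes them easy to review with controllers and auditors, and easy to tighten or loosen as trust in the system grows.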

Use Claude to Normalize Merchant and Description Data

One root cause of uncategorized expenses is messy free-text: different spellings, abbreviations, or cryptic merchant names from card schemes. Claude is very effective at normalizing merchant and description text before classification, which improves consistency across your finance systems.

Introduce a pre-processing step where Claude maps raw strings to standardized values. For example:

Instruction to Claude:
You are cleaning expense transaction data for a finance system.
For each input, return:
{
  "normalized_merchant": "standardized merchant name",
  "normalized_purpose": "short, clear purpose of the spend",
  "tags": ["travel", "software", "subscription", ...]
}

Input:
Merchant: UBER *TRIP HELP.UBER.COM
Description: Ride from office to client PRJ-4589

Expected output:
{
  "normalized_merchant": "Uber",
  "normalized_purpose": "Taxi ride from office to client site",
  "tags": ["travel", "ground_transport", "client_meeting"]
}

You can then base your classification rules on normalized merchants and purposes, dramatically reducing the number of edge cases and improving reporting consistency.
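One simple way to exploit the normalized output is a deterministic tag-to-GL lookup that runs before (or instead of) a second model call for clear-cut cases. The mapping below is illustrative and reuses the example chart of accounts from earlier:

```python
# Illustrative mapping from normalized tags (produced by the pre-processing
# step above) to GL accounts from the example chart of accounts.
TAG_TO_GL = {
    "ground_transport": "6120",
    "software": "6300",
    "subscription": "6300",
    "marketing": "6400",
}

def gl_from_tags(tags: list) -> str:
    """Return the first matching GL account, falling back to Miscellaneous."""
    for tag in tags:
        if tag in TAG_TO_GL:
            return TAG_TO_GL[tag]
    return "6999"  # Miscellaneous -- only when nothing else fits

print(gl_from_tags(["travel", "ground_transport", "client_meeting"]))  # 6120
print(gl_from_tags(["office_snacks"]))                                 # 6999
```

Deterministic rules over normalized data are cheap, fast, and fully explainable; reserve the full Claude classification call for transactions the lookup cannot resolve.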

Automate Policy Checks and Annotation for Audits

Beyond categorization, Claude can evaluate each expense against your travel and expense policies and pre-annotate transactions for audit readiness. Feed your policy text (limits, allowed categories, required justifications) into the prompt and ask Claude to flag potential violations or missing documentation.

For example, require Claude to output fields like "policy_flag", "reason", and "missing_docs". A sample configuration might look like:

Instruction to Claude:
Given the company travel policy and the expense details, assess compliance.
Return:
{
  "policy_flag": "none" | "limit_exceeded" | "personal_suspected" | "missing_receipt",
  "reason": "<short explanation>",
  "required_action": "ok" | "request_justification" | "deny_reimbursement"
}

These annotations can be stored alongside each posting, giving auditors a clear trace of what was checked, why something was flagged, and how it was resolved. Over time, you’ll see fewer ad-hoc email chains and more structured, searchable evidence.

Instrument KPIs and Run a Controlled Pilot

Before rolling out AI-driven categorization to all entities, run a pilot on a subset of transactions (for example, one business unit’s travel and software spend). Define clear KPIs such as classification accuracy vs. current baseline, reduction in manual touch time, time saved at month-end close, and policy violation detection rate.

During the pilot, sample a percentage of “AI Approved” transactions for manual quality checks and compare them with a control group processed using your old method. Adjust prompts and thresholds until you consistently hit agreed targets (e.g. > 96% accuracy and > 40% reduction in manual review time). Once validated, you can expand coverage to more categories and entities with realistic expectations on performance.
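Measuring classification accuracy against reviewer decisions can be as simple as comparing Claude's proposed GL codes with the final posted codes; the sample data below is hypothetical:

```python
def classification_accuracy(ai_labels: list, reviewed_labels: list) -> float:
    """Share of AI-assigned GL accounts that match the reviewer's final posting."""
    if len(ai_labels) != len(reviewed_labels) or not ai_labels:
        raise ValueError("Label lists must be non-empty and of equal length")
    matches = sum(a == r for a, r in zip(ai_labels, reviewed_labels))
    return matches / len(ai_labels)

# Hypothetical pilot sample: Claude's GL codes vs. the controller's final codes.
ai_codes = ["6120", "6300", "6300", "6999", "6400"]
final_codes = ["6120", "6300", "6110", "6999", "6400"]
accuracy = classification_accuracy(ai_codes, final_codes)
print(f"Pilot accuracy: {accuracy:.0%}")  # Pilot accuracy: 80%
```

Tracking this metric per category (travel vs. software vs. marketing) rather than only in aggregate shows you exactly where prompt refinements will pay off most.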

Implemented carefully, these practices typically lead to tangible results: finance teams often see a 30–60% reduction in manual expense review effort, closing times shortened by 1–3 days for affected entities, and a noticeable improvement in policy adherence and audit readiness. The exact metrics will depend on your baseline and data quality, but with a structured rollout, Claude can turn uncategorized expenses from a recurring headache into a controlled, largely automated process.

Need implementation expertise now?

Let's talk about your ideas!

Frequently Asked Questions

Claude reads the available data for each transaction — merchant name, amount, date, free-text description, and, if available, the receipt or invoice. Using a prompt tailored to your chart of accounts, cost centers, and expense policies, it proposes a GL account, cost center, and optional policy flags (for example, potential personal spend or missing documentation).

Technically, Claude converts your rules and historical examples into a pattern it can apply to new entries. You can have it return a structured JSON output with category, confidence score, and rationale, which your ERP or T&E system then uses either to auto-post low-risk items or route higher-risk items to a human reviewer.

You don’t need a large data science team to get started. The key ingredients are:

  • A finance owner (controller or head of accounting) who defines category rules, policies, and success metrics.
  • An IT/engineering contact who can connect Claude to your expense, card, and ERP systems or at least export/import batch files.
  • Someone comfortable iterating prompts and reviewing Claude’s outputs — this can be a power user in finance with light support from an AI engineer.

Reruption typically complements your team with the missing pieces: we bring the AI engineering, prompt design, and workflow automation expertise so your finance team can focus on validating results and refining business rules rather than building infrastructure from scratch.

Timelines depend on your system landscape and data readiness, but for a focused scope (e.g. travel and card expenses for one entity), you can usually see proof-of-value within a few weeks. In a typical setup:

  • Week 1: Scope definition, data access, and initial prompt design based on your chart of accounts and policies.
  • Weeks 2–3: Pilot on historical transactions, accuracy measurement, prompt and workflow refinement.
  • Weeks 4–6: Live pilot on current expenses with human-in-the-loop review and KPI tracking.

By the end of an initial 4–6 week period, most finance teams can quantify reductions in manual review time and improvements in categorization consistency, and decide whether to scale across more categories or entities.

The ROI comes from three main areas: reduced manual effort, faster and cleaner closing, and better spend control. For mid-sized and larger organizations processing thousands of expenses per month, it’s common to free up the equivalent of 0.5–2 FTE worth of manual categorization and chasing unclear entries. That time can be redirected to analysis, forecasting, and strategic projects.

On top of that, more accurate and timely categorization improves cost center and project reporting, which helps budget owners identify savings opportunities in travel, procurement, and subscriptions. While exact numbers depend on your baseline, many teams can justify the investment purely on labor and closing efficiency; the upside from better spend decisions is additional leverage rather than the only value driver.

Reruption works as a Co-Preneur — we embed with your team and build real solutions, not slide decks. For this specific use case, we typically start with our AI PoC offering (9,900€), where we:

  • Define the expense control use case and success metrics with your finance team.
  • Assess data sources (card feeds, T&E, ERP) and design the architecture for Claude-based classification.
  • Build a working prototype that classifies your own uncategorized expenses, including prompts, workflows, and basic dashboards.
  • Measure accuracy, speed, and cost per run, and outline a production rollout plan.

From there, we can stay on to harden the solution, integrate it with your existing tools, and help your finance team adopt new, AI-first ways of working. Because we operate in your P&L and move with high velocity, you get a tangible, tested system for AI-driven expense control in weeks, not months.

Contact Us!


Contact Directly

Your Contact

Philipp M. W. Hoffmann

Founder & Partner

Address

Reruption GmbH

Falkertstraße 2

70176 Stuttgart

Social Media