The Challenge: Uncategorized Expense Entries

Uncategorized or vaguely coded expenses are a silent tax on your finance function. Employees submit credit card statements, travel receipts, and invoices with missing or generic categories like “Misc” or “Other”, leaving controllers and accountants to decipher descriptions and PDFs one by one. The result is a slow month-end close, inconsistent coding across teams and entities, and unreliable cost center or project views when management needs them most.

Traditional approaches rely on manual review, static expense policies, and basic rules in ERP or T&E systems. These rules quickly break down when merchants change descriptors, employees use different terms for the same thing, or new subscription and SaaS services appear. Shared mailboxes, spreadsheets, and manual journal adjustments might work for a small volume, but they do not scale across thousands of transactions per month or multiple legal entities.

The business impact is significant: misposted costs distort profitability by cost center, project, and customer. Controllers lose days each month chasing down unclear transactions instead of analyzing drivers of spend. Budget owners see outdated or incomplete reports and react too late to rein in travel, procurement, and software subscriptions. In the worst case, inconsistent coding weakens audit trails, increases the risk of policy violations or fraud going unnoticed, and undermines trust in your financial data.

The good news is that this problem is highly solvable with modern AI. By combining your historical postings with a tool like Claude that can read both transaction data and backing documents, you can turn uncategorized entries into clean, consistent, and auditable expense data. At Reruption, we’ve seen how AI-first approaches can replace fragile manual processes, and below we’ll walk through concrete ways to use Claude to regain control over expense categorization.

Need a sparring partner for this challenge?

Let's have a no-obligation chat and brainstorm together.


Our Assessment

A strategic assessment of the challenge and high-level tips on how to tackle it.

From Reruption’s perspective, using Claude to fix uncategorized expense entries is one of the most pragmatic starting points for AI in finance. You already have labeled historical data, clear policies, and a repetitive, text-heavy process that drains time from your team. With our hands-on experience building AI-powered document analysis and classification workflows, we’ve seen that combining Claude’s long-context understanding with targeted finance logic can quickly transform noisy expense data into reliable, real-time insight.

Treat Expense Categorization as a Data Quality Product, Not a One-Off Fix

Many finance teams approach uncategorized expenses as a “month-end clean-up task” instead of a product to design and continuously improve. To get value from Claude for expense classification, you need to think of your expense data as a product with clear owners, quality standards, and feedback loops. That means defining what “good” looks like: target classification accuracy, response time, and acceptable exception rates for manual review.

Strategically, this shifts the conversation from “Can AI tag some expenses?” to “How do we build a system that keeps our expense data clean at all times?” In practice, that involves product-like decisions: which data sources to include (card feeds, T&E, AP), how often to run classification, and how to surface AI outputs back into ERP or BI tools. When finance, IT, and controlling co-own this “data product”, you can iterate quickly on prompts, rules, and workflows instead of treating AI as a black box.

Start with High-Impact Categories and Clear Policies

Not all expense categories are equal. Strategically, you get the fastest ROI from Claude by focusing on areas where spend visibility and policy compliance matter most: travel and entertainment, software subscriptions, marketing, and specific project or customer-related costs. These usually have higher spend, more potential for leakage, and clearer rules that AI can learn.

Before you build anything, pressure-test your existing policies. If your travel policy is vague or cost center assignment rules are unclear, Claude will simply reflect that ambiguity. Use this as a trigger to refine category definitions, cost center mapping rules, and thresholds for approvals. A clear policy framework lets Claude learn consistent patterns, reduces edge cases, and makes the system easier for auditors and controllers to trust.

Design a Human-in-the-Loop Workflow from Day One

AI in finance should be assistive, not autonomous, especially for classification that affects financial statements. Strategically, you want Claude to handle the bulk of straightforward expenses while your finance team focuses on exceptions, policy conflicts, and potential fraud. This requires a designed human-in-the-loop workflow with clear escalation rules, not ad-hoc spot checks.

Define confidence thresholds up front: for example, classifications above 95% confidence and under a certain amount can be auto-posted, while anything below that or above a risk threshold routes to a reviewer. This protects data quality, builds trust with controllers, and creates training data: every human correction becomes a learning signal to refine prompts, rules, or models.

Align Stakeholders on Governance, Risk, and Compliance Early

For many CFOs, the biggest barrier to using AI in expense control isn’t technology; it’s governance. Risk, compliance, and internal audit need confidence that the system will not obscure who made which decision and why. Strategically, you should involve these stakeholders at the design stage, not after deployment.

Clarify questions like: What documentation do we need for auditors? How do we log Claude’s suggestions, user overrides, and final postings? What are the approval rules for changing classification logic? By designing auditability and data lineage into your AI workflow, you avoid downstream resistance and unlock faster adoption. This is where Reruption’s focus on security, compliance, and AI-first architecture becomes particularly valuable.

Prepare Your Team for New Roles and Skills

When Claude takes over the repetitive part of expense categorization, your finance team’s work shifts from “doing” to “supervising and improving” the system. Strategically, you should anticipate this and invest in the skills to manage AI-driven processes: prompt design, reviewing AI outputs, defining heuristics, and interpreting classification metrics.

Controllers and accountants don’t need to become data scientists, but they do need a working understanding of how AI expense classification behaves, where it can fail, and how to provide structured feedback. Set expectations clearly: the goal is not to replace the team, but to let them move from low-value categorization work to higher-value analysis, forecasting, and scenario modeling.

Using Claude to clean up uncategorized expense entries is one of the most direct ways finance teams can turn messy data into reliable, real-time spend visibility. When you treat it as a data product, embed human-in-the-loop controls, and align governance from the start, you get both faster closing and stronger audit readiness. Reruption’s engineers and finance-focused consultants can help you scope, prototype, and harden such a solution quickly; if you’re exploring this use case, our AI PoC is a pragmatic way to test it on your own expense data before scaling further.

Need help implementing these ideas?

Feel free to reach out to us with no obligation.

Real-World Case Studies

From Logistics to Energy: Learn how companies successfully use Claude.

UPS

Logistics

UPS faced massive inefficiencies in delivery routing: the number of possible route combinations for a single driver's stops far exceeds the number of nanoseconds Earth has existed. Traditional manual planning led to longer drive times, higher fuel consumption, and elevated operational costs, exacerbated by dynamic factors like traffic, package volumes, terrain, and customer availability. These issues not only inflated expenses but also contributed to significant CO2 emissions in an industry under pressure to go green. Key challenges included driver resistance to new technology, integration with legacy systems, and ensuring real-time adaptability without disrupting daily operations. Pilot tests revealed adoption hurdles, as drivers accustomed to familiar routes questioned the AI's suggestions, highlighting the human element in tech deployment. Scaling across 55,000 vehicles demanded robust infrastructure and data handling for billions of data points daily.

Solution

UPS developed ORION (On-Road Integrated Optimization and Navigation), an AI-powered system blending operations research for mathematical optimization with machine learning for predictive analytics on traffic, weather, and delivery patterns. It dynamically recalculates routes in real-time, considering package destinations, vehicle capacity, right/left turn efficiencies, and stop sequences to minimize miles and time. The solution evolved from static planning to dynamic routing upgrades, incorporating agentic AI for autonomous decision-making. Training involved massive datasets from GPS telematics, with continuous ML improvements refining algorithms. Overcoming adoption challenges required driver training programs and gamification incentives, ensuring seamless integration via in-cab displays.

Results

  • 100 million miles saved annually
  • $300-400 million cost savings per year
  • 10 million gallons of fuel reduced yearly
  • 100,000 metric tons CO2 emissions cut
  • 2-4 miles shorter routes per driver daily
  • 97% fleet deployment by 2021
Read case study →

DBS Bank

Banking

DBS Bank, Southeast Asia's leading financial institution, grappled with scaling AI from experiments to production amid surging fraud threats, demands for hyper-personalized customer experiences, and operational inefficiencies in service support. Traditional fraud detection systems struggled to process up to 15,000 data points per customer in real-time, leading to missed threats and suboptimal risk scoring. Personalization efforts were hampered by siloed data and lack of scalable algorithms for millions of users across diverse markets. Additionally, customer service teams faced overwhelming query volumes, with manual processes slowing response times and increasing costs. Regulatory pressures in banking demanded responsible AI governance, while talent shortages and integration challenges hindered enterprise-wide adoption. DBS needed a robust framework to overcome data quality issues, model drift, and ethical concerns in generative AI deployment, ensuring trust and compliance in a competitive Southeast Asian landscape.

Solution

DBS launched an enterprise-wide AI program with over 20 use cases, leveraging machine learning for advanced fraud risk models and personalization, complemented by generative AI for an internal support assistant. Fraud models integrated vast datasets for real-time anomaly detection, while personalization algorithms delivered hyper-targeted nudges and investment ideas via the digibank app. A human-AI synergy approach empowered service teams with a GenAI assistant handling routine queries, drawing from internal knowledge bases. DBS emphasized responsible AI through governance frameworks, upskilling 40,000+ employees, and phased rollout starting with pilots in 2021, scaling production by 2024. Partnerships with tech leaders and Harvard-backed strategy ensured ethical scaling across fraud, personalization, and operations.

Results

  • 17% increase in savings from prevented fraud attempts
  • Over 100 customized algorithms for customer analyses
  • 250,000 monthly queries processed efficiently by GenAI assistant
  • 20+ enterprise-wide AI use cases deployed
  • Analyzes up to 15,000 data points per customer for fraud
  • Boosted productivity by 20% via AI adoption (CEO statement)
Read case study →

Upstart

Banking

Traditional credit scoring relies heavily on FICO scores, which evaluate only a narrow set of factors like payment history and debt utilization, often rejecting creditworthy borrowers with thin credit files, non-traditional employment, or education histories that signal repayment ability. This results in up to 50% of potential applicants being denied despite low default risk, limiting lenders' ability to expand portfolios safely. Fintech lenders and banks faced the dual challenge of regulatory compliance under fair lending laws while seeking growth. Legacy models struggled with inaccurate risk prediction amid economic shifts, leading to higher defaults or conservative lending that missed opportunities in underserved markets. Upstart recognized that incorporating alternative data could unlock lending to millions previously excluded.

Solution

Upstart developed an AI-powered lending platform using machine learning models that analyze over 1,600 variables, including education, job history, and bank transaction data, far beyond FICO's 20-30 inputs. Their gradient boosting algorithms predict default probability with higher precision, enabling safer approvals. The platform integrates via API with partner banks and credit unions, providing real-time decisions and fully automated underwriting for most loans. This shift from rule-based to data-driven scoring ensures fairness through explainable AI techniques like feature importance analysis. Implementation involved training models on billions of repayment events, continuously retraining to adapt to new data patterns.

Results

  • 44% more loans approved vs. traditional models
  • 36% lower average interest rates for borrowers
  • 80% of loans fully automated
  • 73% fewer losses at equivalent approval rates
  • Adopted by 500+ banks and credit unions by 2024
  • 157% increase in approvals at same risk level
Read case study →

UC San Diego Health

Healthcare

Sepsis, a life-threatening condition, poses a major threat in emergency departments, with delayed detection contributing to high mortality rates—up to 20-30% in severe cases. At UC San Diego Health, an academic medical center handling over 1 million patient visits annually, nonspecific early symptoms made timely intervention challenging, exacerbating outcomes in busy ERs. A randomized study highlighted the need for proactive tools beyond traditional scoring systems like qSOFA. Hospital capacity management and patient flow were further strained post-COVID, with bed shortages leading to prolonged admission wait times and transfer delays. Balancing elective surgeries, emergencies, and discharges required real-time visibility. Safely integrating generative AI, such as GPT-4 in Epic, risked data privacy breaches and inaccurate clinical advice. These issues demanded scalable AI solutions to predict risks, streamline operations, and responsibly adopt emerging tech without compromising care quality.

Solution

UC San Diego Health implemented COMPOSER, a deep learning model trained on electronic health records to predict sepsis risk up to 6-12 hours early, triggering Epic Best Practice Advisory (BPA) alerts for nurses. This quasi-experimental approach across two ERs integrated seamlessly with workflows. Mission Control, an AI-powered operations command center funded by $22M, uses predictive analytics for real-time bed assignments, patient transfers, and capacity forecasting, reducing bottlenecks. Led by Chief Health AI Officer Karandeep Singh, it leverages data from Epic for holistic visibility. For generative AI, pilots with Epic's GPT-4 enable NLP queries and automated patient replies, governed by strict safety protocols to mitigate hallucinations and ensure HIPAA compliance. This multi-faceted strategy addressed detection, flow, and innovation challenges.

Results

  • Sepsis in-hospital mortality: 17% reduction
  • Lives saved annually: 50 across two ERs
  • Sepsis bundle compliance: Significant improvement
  • 72-hour SOFA score change: Reduced deterioration
  • ICU encounters: Decreased post-implementation
  • Patient throughput: Improved via Mission Control
Read case study →

DHL

Logistics

DHL, a global logistics giant, faced significant challenges from vehicle breakdowns and suboptimal maintenance schedules. Unpredictable failures in its vast fleet of delivery vehicles led to frequent delivery delays, increased operational costs, and frustrated customers. Traditional reactive maintenance—fixing issues only after they occurred—resulted in excessive downtime, with vehicles sidelined for hours or days, disrupting supply chains worldwide. Inefficiencies were compounded by varying fleet conditions across regions, making scheduled maintenance inefficient and wasteful, often over-maintaining healthy vehicles while under-maintaining others at risk. These issues not only inflated maintenance costs by up to 20% in some segments but also eroded customer trust through unreliable deliveries. With rising e-commerce demands, DHL needed a proactive approach to predict failures before they happened, minimizing disruptions in a highly competitive logistics industry.

Solution

DHL implemented a predictive maintenance system leveraging IoT sensors installed on vehicles to collect real-time data on engine performance, tire wear, brakes, and more. This data feeds into machine learning models that analyze patterns, predict potential breakdowns, and recommend optimal maintenance timing. The AI solution integrates with DHL's existing fleet management systems, using algorithms like random forests and neural networks for anomaly detection and failure forecasting. Overcoming data silos and integration challenges, DHL partnered with tech providers to deploy edge computing for faster processing. Pilot programs in key hubs expanded globally, shifting from time-based to condition-based maintenance, ensuring resources focus on high-risk assets.

Results

  • Vehicle downtime reduced by 15%
  • Maintenance costs lowered by 10%
  • Unplanned breakdowns decreased by 25%
  • On-time delivery rate improved by 12%
  • Fleet availability increased by 20%
  • Overall operational efficiency up 18%
Read case study →

Best Practices

Successful implementations follow proven patterns. Have a look at our tactical advice to get started.

Centralize Expense Data and Context for Claude

Claude delivers the best results when it sees the full picture of each expense: transaction data, descriptions, merchant information, receipts, and invoices. As a first step, work with IT to centralize inputs from your card provider, T&E tool, and AP system into a single pipeline or staging database that Claude can access. Include fields like GL account, cost center, project, vendor, and previous category assignments.

For long-context models, you can bundle multiple receipts or invoice PDFs into one request, letting Claude cross-reference descriptions against your chart of accounts and cost center hierarchies. This enables rules like “assign Uber rides to project cost centers if the description mentions the project code” that would be cumbersome to encode manually. Even if you start with file-based batches, make sure each transaction is enriched with as much structured data as possible before sending it to Claude.

Design a Robust Classification Prompt with Clear Instructions

Prompt design is crucial for consistent, auditable classification. Your prompt should explain your chart of accounts, cost center logic, and policy rules in concise but precise terms, then ask Claude to return a structured JSON output. Here’s a simplified example you can adapt:

System / Instruction to Claude:
You are an AI assistant helping a finance team classify business expenses.

Goals:
- Assign each expense to the correct GL account and cost center.
- Flag potential policy violations or suspicious transactions.

Use this chart of accounts (examples):
- 6100: Travel - Flights
- 6110: Travel - Hotels
- 6120: Travel - Ground Transport
- 6300: Software Subscriptions
- 6400: Marketing & Events
- 6999: Miscellaneous (use only if nothing else fits)

Rules:
- Prefer specific GL codes over Miscellaneous.
- If merchant or description indicates a known SaaS tool, use 6300.
- If a project code (e.g., PRJ-1234) appears, assign that cost center.
- Flag as "policy_violation": true if description suggests personal spend.

Return JSON only in this format:
{
  "gl_account": "<code>",
  "cost_center": "<id or null>",
  "confidence": <0-1>,
  "policy_violation": true/false,
  "notes": "<short rationale>"
}

Now classify this expense:
Merchant: <MERCHANT>
Amount: <AMOUNT>
Date: <DATE>
Description: <DESCRIPTION>
Receipt text: <EXTRACTED_TEXT_FROM_RECEIPT>

Iterate on this prompt with real data until Claude reliably picks your preferred categories and flags edge cases correctly. Small clarifications (for example, which keywords indicate software vs. marketing spend) can materially improve classification quality.
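Because the model's reply feeds a financial system, it should never be posted unvalidated. A minimal validation step, assuming the JSON format from the prompt above and the example chart of accounts, might look like this (`parse_classification` is an illustrative helper, not a library function):

```python
import json

# Example chart of accounts from the prompt above
CHART_OF_ACCOUNTS = {"6100", "6110", "6120", "6300", "6400", "6999"}

def parse_classification(raw: str) -> dict:
    """Parse and validate the JSON Claude returns for one expense.

    Raises ValueError on any malformed payload so the transaction is
    routed to manual review instead of being posted blindly.
    """
    data = json.loads(raw)
    if data.get("gl_account") not in CHART_OF_ACCOUNTS:
        raise ValueError(f"unknown GL account: {data.get('gl_account')!r}")
    conf = data.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be a number between 0 and 1")
    if not isinstance(data.get("policy_violation"), bool):
        raise ValueError("policy_violation must be true or false")
    return data
```

Rejected payloads can simply be retried once or sent straight to the review queue.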

Implement Confidence Thresholds and Review Queues

To safely automate expense classification with Claude, you need a mechanism to distinguish between “safe to auto-post” and “requires review”. Use the confidence score returned by Claude, combined with transaction attributes, to route items accordingly. For example, you might auto-accept expenses under €200 with confidence > 0.97, while any transaction higher than €2,000 or with confidence < 0.9 goes to a human reviewer.

In your workflow tool (ERP, T&E, or a custom app), create distinct queues such as “AI Approved”, “AI Low Confidence”, and “AI Policy Alerts”. Reviewers should see Claude’s proposed category, confidence, and rationale so they can quickly accept or correct. Every override can be logged and periodically sampled as training data to refine prompts, additional rules, or even fine-tuned models in the future.
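The routing logic itself is deliberately simple. A sketch using the example thresholds from the text (tune the amounts and confidence cutoffs to your own risk appetite) could be:

```python
def route(amount_eur: float, confidence: float, policy_violation: bool) -> str:
    """Assign one classified expense to a review queue.

    Thresholds are the illustrative values from the text:
    auto-accept only small, high-confidence items; everything
    else gets a human look.
    """
    if policy_violation:
        return "AI Policy Alerts"
    if amount_eur <= 200 and confidence > 0.97:
        return "AI Approved"          # safe to auto-post
    return "AI Low Confidence"        # human reviewer required
```

Keeping the rule a pure function makes it trivial to unit-test and to adjust as the pilot data comes in.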

Use Claude to Normalize Merchant and Description Data

One root cause of uncategorized expenses is messy free-text: different spellings, abbreviations, or cryptic merchant names from card schemes. Claude is very effective at normalizing merchant and description text before classification, which improves consistency across your finance systems.

Introduce a pre-processing step where Claude maps raw strings to standardized values. For example:

Instruction to Claude:
You are cleaning expense transaction data for a finance system.
For each input, return:
{
  "normalized_merchant": "standardized merchant name",
  "normalized_purpose": "short, clear purpose of the spend",
  "tags": ["travel", "software", "subscription", ...]
}

Input:
Merchant: UBER *TRIP HELP.UBER.COM
Description: Ride from office to client PRJ-4589

Expected output:
{
  "normalized_merchant": "Uber",
  "normalized_purpose": "Taxi ride from office to client site",
  "tags": ["travel", "ground_transport", "client_meeting"]
}

You can then base your classification rules on normalized merchants and purposes, dramatically reducing the number of edge cases and improving reporting consistency.
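Since the same raw merchant strings recur constantly, it is worth caching normalization results so each distinct descriptor is only sent to the model once. A minimal sketch, where `call_model` stands in for whatever function actually sends the cleaning prompt to Claude and returns the parsed JSON:

```python
_CACHE: dict[str, dict] = {}

def normalize_merchant(raw: str, call_model) -> dict:
    """Return the normalized record for a raw merchant string.

    `call_model` is any callable that runs the cleaning prompt and
    returns the parsed JSON; identical raw strings (ignoring case and
    surrounding whitespace) hit the cache instead of the model.
    """
    key = raw.strip().upper()
    if key not in _CACHE:
        _CACHE[key] = call_model(raw)
    return _CACHE[key]
```

In production you would back the cache with a table rather than an in-memory dict, which also gives controllers a reviewable mapping of raw to normalized merchants.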

Automate Policy Checks and Annotation for Audits

Beyond categorization, Claude can evaluate each expense against your travel and expense policies and pre-annotate transactions for audit readiness. Feed your policy text (limits, allowed categories, required justifications) into the prompt and ask Claude to flag potential violations or missing documentation.

For example, require Claude to output fields like "policy_flag", "reason", and "missing_docs". A sample configuration might look like:

Instruction to Claude:
Given the company travel policy and the expense details, assess compliance.
Return:
{
  "policy_flag": "none" | "limit_exceeded" | "personal_suspected" | "missing_receipt",
  "reason": "<short explanation>",
  "required_action": "ok" | "request_justification" | "deny_reimbursement"
}

These annotations can be stored alongside each posting, giving auditors a clear trace of what was checked, why something was flagged, and how it was resolved. Over time, you’ll see fewer ad-hoc email chains and more structured, searchable evidence.
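Hard limits don't need a model at all: a deterministic pre-check can run before (or alongside) Claude's assessment, so the AI only handles the judgment calls. A sketch with hypothetical per-account caps (`check_policy` and the limit values are illustrative, not from any real policy):

```python
# Hypothetical caps per GL account: hotel nightly rate, ground transport ride
POLICY_LIMITS_EUR = {"6110": 250.0, "6120": 100.0}

def check_policy(gl_account: str, amount_eur: float, has_receipt: bool) -> dict:
    """Deterministic policy pre-check complementing Claude's assessment.

    Returns the same {policy_flag, required_action} shape as the prompt
    above, so downstream handling is uniform.
    """
    if not has_receipt:
        return {"policy_flag": "missing_receipt",
                "required_action": "request_justification"}
    limit = POLICY_LIMITS_EUR.get(gl_account)
    if limit is not None and amount_eur > limit:
        return {"policy_flag": "limit_exceeded",
                "required_action": "request_justification"}
    return {"policy_flag": "none", "required_action": "ok"}
```

Splitting rules this way keeps the auditable, quantitative checks in plain code while Claude covers the fuzzy cases like suspected personal spend.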

Instrument KPIs and Run a Controlled Pilot

Before rolling out AI-driven categorization to all entities, run a pilot on a subset of transactions (for example, one business unit’s travel and software spend). Define clear KPIs such as classification accuracy vs. current baseline, reduction in manual touch time, time saved at month-end close, and policy violation detection rate.

During the pilot, sample a percentage of “AI Approved” transactions for manual quality checks and compare them with a control group processed using your old method. Adjust prompts and thresholds until you consistently hit agreed targets (e.g. > 96% accuracy and > 40% reduction in manual review time). Once validated, you can expand coverage to more categories and entities with realistic expectations on performance.
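The pilot KPIs can be computed from the reviewed samples themselves. A minimal sketch, assuming each sample records the AI's category, the human-verified category, and whether the item was auto-posted:

```python
def pilot_kpis(samples: list[dict]) -> dict:
    """Compute pilot metrics from human-reviewed sample transactions.

    Each sample is assumed to look like:
    {"ai_category": "6300", "human_category": "6300", "auto_posted": True}
    """
    total = len(samples)
    correct = sum(s["ai_category"] == s["human_category"] for s in samples)
    auto = sum(s["auto_posted"] for s in samples)
    return {
        "accuracy": correct / total,
        "auto_post_rate": auto / total,
        "manual_review_rate": 1 - auto / total,
    }
```

Tracking these numbers per category (not just overall) shows where prompt clarifications or extra rules will pay off first.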

Implemented carefully, these practices typically lead to tangible results: finance teams often see a 30–60% reduction in manual expense review effort, closing times shortened by 1–3 days for affected entities, and a noticeable improvement in policy adherence and audit readiness. The exact metrics will depend on your baseline and data quality, but with a structured rollout, Claude can turn uncategorized expenses from a recurring headache into a controlled, largely automated process.

Need implementation expertise now?

Let's talk about your ideas!

Frequently Asked Questions

Claude reads the available data for each transaction — merchant name, amount, date, free-text description, and, if available, the receipt or invoice. Using a prompt tailored to your chart of accounts, cost centers, and expense policies, it proposes a GL account, cost center, and optional policy flags (for example, potential personal spend or missing documentation).

Technically, Claude converts your rules and historical examples into a pattern it can apply to new entries. You can have it return a structured JSON output with category, confidence score, and rationale, which your ERP or T&E system then uses either to auto-post low-risk items or route higher-risk items to a human reviewer.

You don’t need a large data science team to get started. The key ingredients are:

  • A finance owner (controller or head of accounting) who defines category rules, policies, and success metrics.
  • An IT/engineering contact who can connect Claude to your expense, card, and ERP systems or at least export/import batch files.
  • Someone comfortable iterating prompts and reviewing Claude’s outputs — this can be a power user in finance with light support from an AI engineer.

Reruption typically complements your team with the missing pieces: we bring the AI engineering, prompt design, and workflow automation expertise so your finance team can focus on validating results and refining business rules rather than building infrastructure from scratch.

Timelines depend on your system landscape and data readiness, but for a focused scope (e.g. travel and card expenses for one entity), you can usually see proof-of-value within a few weeks. In a typical setup:

  • Week 1: Scope definition, data access, and initial prompt design based on your chart of accounts and policies.
  • Weeks 2–3: Pilot on historical transactions, accuracy measurement, prompt and workflow refinement.
  • Weeks 4–6: Live pilot on current expenses with human-in-the-loop review and KPI tracking.

By the end of an initial 4–6 week period, most finance teams can quantify reductions in manual review time and improvements in categorization consistency, and decide whether to scale across more categories or entities.

The ROI comes from three main areas: reduced manual effort, faster and cleaner closing, and better spend control. For mid-sized and larger organizations processing thousands of expenses per month, it’s common to free up the equivalent of 0.5–2 FTE worth of manual categorization and chasing unclear entries. That time can be redirected to analysis, forecasting, and strategic projects.

On top of that, more accurate and timely categorization improves cost center and project reporting, which helps budget owners identify savings opportunities in travel, procurement, and subscriptions. While exact numbers depend on your baseline, many teams can justify the investment purely on labor and closing efficiency; the upside from better spend decisions is additional leverage rather than the only value driver.

Reruption works as a Co-Preneur — we embed with your team and build real solutions, not slide decks. For this specific use case, we typically start with our AI PoC offering (9,900€), where we:

  • Define the expense control use case and success metrics with your finance team.
  • Assess data sources (card feeds, T&E, ERP) and design the architecture for Claude-based classification.
  • Build a working prototype that classifies your own uncategorized expenses, including prompts, workflows, and basic dashboards.
  • Measure accuracy, speed, and cost per run, and outline a production rollout plan.

From there, we can stay on to harden the solution, integrate it with your existing tools, and help your finance team adopt new, AI-first ways of working. Because we operate in your P&L and move with high velocity, you get a tangible, tested system for AI-driven expense control in weeks, not months.

Contact Us!


Contact Directly

Your Contact

Philipp M. W. Hoffmann

Founder & Partner

Address

Reruption GmbH

Falkertstraße 2

70176 Stuttgart

Social Media