The Challenge: Uncategorized Expense Entries

In many organisations, employees submit expenses with vague or missing categories: “client meeting”, “software”, “travel”, or simply nothing at all. Finance teams are left guessing which GL account, cost center or project code is correct. The result is a slow and painful month-end close, endless back-and-forth with employees, and a constant fear of misposted costs hiding in the ledger.

Traditional approaches rely on static expense policies, manual spreadsheets, and after-the-fact clean-up by overworked accountants. Rules in ERP or expense tools can only go so far: they struggle with messy free-text descriptions, mixed-language receipts, and new vendors that don’t match existing mappings. Delegating the work to junior staff or shared service centres doesn’t remove the problem either – it just moves the bottleneck and introduces more room for human error.

The business impact is significant. Misclassified or uncategorised entries distort your view of spend by cost center and project, making it harder to control travel, SaaS, and procurement costs in time to act. Closing takes longer, forecast quality suffers, and finance loses credibility as a strategic partner when reports have to be restated. Worse, policy violations and potential fraud can hide inside a blob of “miscellaneous” expenses that no one has time to investigate properly.

The challenge is real, but it is solvable. Modern AI—especially tools like ChatGPT for expense classification—can read descriptions and receipts, infer the most likely category, and apply your finance policies at scale. At Reruption, we’ve helped teams turn similar unstructured, manual processes into robust AI-powered workflows. In the sections below, you’ll find practical guidance on how to use ChatGPT to bring order to uncategorised expenses and regain trustworthy spend visibility.

Need a sparring partner for this challenge?

Let's have a no-obligation chat and brainstorm together.

Innovators at these companies trust us:

Our Assessment

A strategic assessment of the challenge and high-level tips on how to tackle it.

From Reruption’s work building AI-powered internal tools and automations, we’ve seen that uncategorised expense entries are exactly the kind of repetitive, judgement-based pattern that ChatGPT for finance handles well. When you combine your existing chart of accounts and policies with a tailored ChatGPT workflow, you can dramatically reduce manual classification while keeping finance in full control of the rules.

Treat Expense Classification as a Policy Engine, Not a Guessing Game

Many teams approach AI for expense categorisation as a black box that simply “guesses” a GL code. Strategically, you should treat ChatGPT as a policy engine that operates under explicit guardrails: your chart of accounts, cost center structure, and expense policy become the reference system the model must follow. This shifts the mindset from hoping the AI is right to designing clear, auditable decision criteria.

Define which inputs ChatGPT should consider (description, merchant, amount, cost center, project, date, country) and which policies it must always enforce (e.g., no alcohol on cost centers 4xxx, no hardware on travel budgets). The more you encode your internal logic into the prompts and system design, the more reliable and compliant the output becomes.
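
As a minimal sketch of this idea, the reference data and hard rules could be encoded in a structure like the one below before being injected into prompts. The account codes, cost centers and rules here are illustrative placeholders, not a recommended chart of accounts:

# Hypothetical encoding of the policy engine's reference system (illustrative values only).
EXPENSE_POLICY = {
    "chart_of_accounts": {
        "6100": "Travel - Transport",
        "6400": "Software Subscriptions",
    },
    "cost_centers": ["100-Marketing", "200-Sales", "300-IT", "400-Operations"],
    "hard_rules": [
        "No alcohol on cost centers starting with 4.",
        "No hardware purchases on travel budgets.",
        "Software must go to GL 6400 unless it is clearly a marketing tool.",
    ],
    "inputs_to_consider": [
        "description", "merchant", "amount", "cost_center",
        "project", "date", "country",
    ],
}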

Start with a Focused Pilot on One Expense Category Cluster

Rather than automating every expense in one go, focus your initial ChatGPT deployment on 1–2 high-friction clusters: for example, travel and entertainment or software & subscriptions. These categories usually produce many uncategorised entries and require repetitive interpretation of descriptions like “SaaS invoice”, “Uber ride”, or “client dinner”.

A focused pilot lets you measure impact (e.g., reduction in manual touches, improved policy adherence) without risking disruption to your entire close process. It also helps your finance team build trust in AI-driven suggestions, because they can compare AI classifications with their own judgement in a contained, low-risk area before scaling.

Design a Human-in-the-Loop Workflow from Day One

For finance leaders, control and auditability are non-negotiable. Strategically, you should design human-in-the-loop approval into your ChatGPT workflow instead of aiming for full autonomy from the start. The model proposes categories and explains its reasoning; your team validates, corrects, and uses that feedback to improve future performance.

This approach reduces risk, helps auditors understand how decisions are made, and provides a safety net for edge cases such as unusual vendors, mixed business/private receipts, or new project codes. Over time, as error rates drop and confidence rises, you can increase the share of transactions that are auto-approved and reserve human review for flagged anomalies only.

Involve Finance, Not Just IT, in Designing the AI Rules

Successful use of ChatGPT for finance workflows is not an IT project with a finance stakeholder; it is a finance project with AI capabilities. The people who currently classify expenses understand the subtle patterns (e.g., how to distinguish marketing from sales spend, or which trainings belong to which cost center) that must be captured in prompts, examples, and evaluation criteria.

Bring experienced accountants and controllers into working sessions where you co-design the rules, examples, and exceptions that ChatGPT should follow. This aligns the AI’s behaviour with your real-world judgement and reduces resistance later, because the team recognises their own expertise in the system’s output.

Plan Governance, Data Protection and Change Management Upfront

Using ChatGPT in finance introduces new governance questions: which data can be sent to the model, how outputs are logged, and how to demonstrate compliance during audits. Strategically, you need clear policies on data anonymisation or pseudonymisation (e.g., masking employee names), retention of model inputs/outputs, and role-based access to the AI tools.
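
To make the anonymisation step concrete, here is a minimal sketch of masking an expense record before it is sent to the model. The field names and hashing approach are assumptions to be adapted together with your data protection officer, not a complete data protection solution:

import hashlib

def pseudonymise_expense(expense: dict) -> dict:
    """Replace direct identifiers with stable pseudonyms before calling the model."""
    masked = dict(expense)
    # Replace the employee name with a stable hash so feedback can still be linked
    # to the same (pseudonymous) submitter without exposing who they are.
    if "employee_name" in masked:
        masked["employee_pseudonym"] = hashlib.sha256(
            masked.pop("employee_name").encode("utf-8")
        ).hexdigest()[:12]
    # Drop fields the model does not need for classification (illustrative list).
    for field in ("iban", "private_address", "email"):
        masked.pop(field, None)
    return masked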

At the same time, don’t underestimate change management. Controllers and accountants need to understand when to trust the AI, when to override it, and how their role shifts from “doing every classification” to “designing and supervising the classification system”. Transparent communication and training reduce fear and help your team see ChatGPT as leverage, not a threat.

Used with the right strategy, ChatGPT can turn uncategorised expenses from a chronic manual headache into a controlled, auditable workflow that reflects your real cost structure. The key is to embed your policies, keep finance in the driver’s seat, and scale from a well-designed pilot. Reruption’s team combines deep AI engineering with hands-on finance process experience, so if you want to explore a proof of concept or bring a classification prototype into your existing tools, we’re ready to help you design and implement it in your real environment.

Need help implementing these ideas?

Feel free to reach out to us with no obligation.

Real-World Case Studies

From Healthcare to News Media: Learn how companies successfully use ChatGPT.

AstraZeneca

Healthcare

In the highly regulated pharmaceutical industry, AstraZeneca faced immense pressure to accelerate drug discovery and clinical trials, which traditionally take 10-15 years, cost billions, and succeed less than 10% of the time. Data silos, stringent compliance requirements (e.g., FDA regulations), and manual knowledge work hindered efficiency across R&D and business units. Researchers struggled to analyze vast 3D imaging datasets, review scientific literature, and draft trial protocols, delaying the delivery of new therapies to patients. Scaling AI was complicated by data privacy concerns, integration with legacy systems, and the need for reliable AI outputs in a high-stakes environment. Without rapid adoption, AstraZeneca risked falling behind competitors using AI to innovate faster toward its 2030 ambition of delivering novel medicines.

Solution

AstraZeneca launched an enterprise-wide generative AI strategy, deploying ChatGPT Enterprise customized for pharma workflows. This included AI assistants for 3D molecular imaging analysis, automated clinical trial protocol drafting, and knowledge synthesis from scientific literature. They partnered with OpenAI for secure, scalable LLMs and invested in training: ~12,000 employees across R&D and other functions completed GenAI programs by mid-2025. Infrastructure upgrades, such as AMD Instinct MI300X GPUs, optimized model training. Governance frameworks ensured compliance, with human-in-the-loop validation for critical tasks. Rollout was phased from pilots in 2023-2024 to full scaling in 2025, focusing on accelerating R&D via GenAI for molecule design and real-world evidence analysis.

Results

  • ~12,000 employees trained on generative AI by mid-2025
  • 85-93% of staff reported productivity gains
  • 80% of medical writers found AI protocol drafts useful
  • Significant reduction in life sciences model training time via MI300X GPUs
  • High AI maturity ranking per IMD Index (top global)
  • GenAI enabling faster trial design and dose selection
Read case study →

AT&T

Telecommunications

As a leading telecom operator, AT&T manages one of the world's largest and most complex networks, spanning millions of cell sites, fiber optics, and 5G infrastructure. The primary challenges included inefficient network planning and optimization, such as determining optimal cell site placement and spectrum acquisition amid exploding data demands from 5G rollout and IoT growth. Traditional methods relied on manual analysis, leading to suboptimal resource allocation and higher capital expenditures. Additionally, reactive network maintenance caused frequent outages, with anomaly detection lagging behind real-time needs. Detecting and fixing issues proactively was critical to minimize downtime, but vast data volumes from network sensors overwhelmed legacy systems. This resulted in increased operational costs, customer dissatisfaction, and delayed 5G deployment. AT&T needed scalable AI to predict failures, automate healing, and forecast demand accurately.

Solution

AT&T integrated machine learning and predictive analytics through its AT&T Labs, developing models for network design including spectrum refarming and cell site optimization. AI algorithms analyze geospatial data, traffic patterns, and historical performance to recommend ideal tower locations, reducing build costs. For operations, anomaly detection and self-healing systems use predictive models on NFV (Network Function Virtualization) to forecast failures and automate fixes, like rerouting traffic. Causal AI extends beyond correlations for root-cause analysis in churn and network issues. Implementation involved edge-to-edge intelligence, deploying AI across 100,000+ engineers' workflows.

Results

  • Billions of dollars saved in network optimization costs
  • 20-30% improvement in network utilization and efficiency
  • Significant reduction in truck rolls and manual interventions
  • Proactive detection of anomalies preventing major outages
  • Optimized cell site placement reducing CapEx by millions
  • Enhanced 5G forecasting accuracy by up to 40%
Read case study →

Airbus

Aerospace

In aircraft design, computational fluid dynamics (CFD) simulations are essential for predicting airflow around wings, fuselages, and novel configurations critical to fuel efficiency and emissions reduction. However, traditional high-fidelity RANS solvers require hours to days per run on supercomputers, limiting engineers to just a few dozen iterations per design cycle and stifling innovation for next-gen hydrogen-powered aircraft like ZEROe. This computational bottleneck was particularly acute amid Airbus' push for decarbonized aviation by 2035, where complex geometries demand exhaustive exploration to optimize lift-drag ratios while minimizing weight. Collaborations with DLR and ONERA highlighted the need for faster tools, as manual tuning couldn't scale to test thousands of variants needed for laminar flow or blended-wing-body concepts.

Solution

Machine learning surrogate models, including physics-informed neural networks (PINNs), were trained on vast CFD datasets to emulate full simulations in milliseconds. Airbus integrated these into a generative design pipeline, where AI predicts pressure fields, velocities, and forces, enforcing Navier-Stokes physics via hybrid loss functions for accuracy. Development involved curating millions of simulation snapshots from legacy runs, GPU-accelerated training, and iterative fine-tuning with experimental wind-tunnel data. This enabled rapid iteration: AI screens designs, high-fidelity CFD verifies top candidates, slashing overall compute by orders of magnitude while maintaining <5% error on key metrics.

Results

  • Simulation time: 1 hour → 30 ms (120,000x speedup)
  • Design iterations: +10,000 per cycle in same timeframe
  • Prediction accuracy: 95%+ for lift/drag coefficients
  • 50% reduction in design phase timeline
  • 30-40% fewer high-fidelity CFD runs required
  • Fuel burn optimization: up to 5% improvement in predictions
Read case study →

Amazon

Retail

In the vast e-commerce landscape, online shoppers face significant hurdles in product discovery and decision-making. With millions of products available, customers often struggle to find items matching their specific needs, compare options, or get quick answers to nuanced questions about features, compatibility, and usage. Traditional search bars and static listings fall short, leading to shopping cart abandonment rates as high as 70% industry-wide and prolonged decision times that frustrate users. Amazon, serving over 300 million active customers, encountered amplified challenges during peak events like Prime Day, where query volumes spiked dramatically. Shoppers demanded personalized, conversational assistance akin to in-store help, but scaling human support was impossible. Issues included handling complex, multi-turn queries, integrating real-time inventory and pricing data, and ensuring recommendations complied with safety and accuracy standards amid a $500B+ catalog.

Solution

Amazon developed Rufus, a generative AI-powered conversational shopping assistant embedded in the Amazon Shopping app and desktop. Rufus leverages a custom-built large language model (LLM) fine-tuned on Amazon's product catalog, customer reviews, and web data, enabling natural, multi-turn conversations to answer questions, compare products, and provide tailored recommendations. Powered by Amazon Bedrock for scalability and AWS Trainium/Inferentia chips for efficient inference, Rufus scales to millions of sessions without latency issues. It incorporates agentic capabilities for tasks like cart addition, price tracking, and deal hunting, overcoming prior limitations in personalization by accessing user history and preferences securely. Implementation involved iterative testing, starting with a beta in February 2024, expanding to all US users by September, and then rolling out globally, addressing hallucination risks through grounding techniques and human-in-the-loop safeguards.

Results

  • 60% higher purchase completion rate for Rufus users
  • $10B projected additional sales from Rufus
  • 250M+ customers used Rufus in 2025
  • Monthly active users up 140% YoY
  • Interactions surged 210% YoY
  • Black Friday sales sessions +100% with Rufus
  • 149% jump in Rufus users recently
Read case study →

American Eagle Outfitters

Apparel Retail

In the competitive apparel retail landscape, American Eagle Outfitters faced significant hurdles in fitting rooms, where customers crave styling advice, accurate sizing, and complementary item suggestions without waiting for overtaxed associates. Peak-hour staff shortages often resulted in frustrated shoppers abandoning carts, low try-on rates, and missed conversion opportunities, as traditional in-store experiences lagged behind personalized e-commerce. Early efforts like beacon technology in 2014 doubled fitting room entry odds but lacked depth in real-time personalization. Compounding this, data silos between online and offline hindered unified customer insights, making it tough to match items to individual style preferences, body types, or even skin tones dynamically. American Eagle needed a scalable solution to boost engagement and loyalty in flagship stores while experimenting with AI for broader impact.

Solution

American Eagle partnered with Aila Technologies to deploy interactive fitting room kiosks powered by computer vision and machine learning, rolled out in 2019 at flagship locations in Boston, Las Vegas, and San Francisco. Customers scan garments via iOS devices, triggering CV algorithms to identify items and ML models—trained on purchase history and Google Cloud data—to suggest optimal sizes, colors, and outfit complements tailored to inferred style and preferences. Integrated with Google Cloud's ML capabilities, the system enables real-time recommendations, associate alerts for assistance, and seamless inventory checks, evolving from beacon lures to a full smart assistant. This experimental approach, championed by CMO Craig Brommers, fosters an AI culture for personalization at scale.

Results

  • Double-digit conversion gains from AI personalization
  • 11% comparable sales growth for Aerie brand Q3 2025
  • 4% overall comparable sales increase Q3 2025
  • 29% EPS growth to $0.53 Q3 2025
  • Doubled fitting room try-on odds via early tech
  • Record Q3 revenue of $1.36B
Read case study →

Best Practices

Successful implementations follow proven patterns. Have a look at our tactical advice to get started.

Create a Standardised Classification Schema for ChatGPT

Before you write a single prompt, align on a standardised classification schema that ChatGPT must follow. This includes your chart of accounts, cost centers, project codes, tax codes (if relevant), and allowed expense categories per employee group or country. Put this schema into a concise, machine-readable form (e.g., JSON or a structured table) that can be embedded into prompts or retrieved via an internal API.

Here is an example of how you might present the allowed outputs in a prompt:

Allowed_GL_accounts = [
  {"code": "6100", "name": "Travel - Transport"},
  {"code": "6110", "name": "Travel - Accommodation"},
  {"code": "6120", "name": "Travel - Meals & Entertainment"},
  {"code": "6400", "name": "Software Subscriptions"},
  {"code": "6500", "name": "Office Supplies"}
]

Allowed_cost_centers = ["100-Marketing", "200-Sales", "300-IT", "400-Operations"]

By constraining ChatGPT to this schema, you reduce variance in the output and make it much easier to integrate with your ERP or expense system.

Use Structured Prompts to Infer Category, Cost Center and Policy Violations

For operational reliability, your prompts should ask ChatGPT to return a structured JSON output with specific fields: suggested GL account, cost center, project, policy compliance flag, and a short explanation. This makes it straightforward to parse and feed into your finance systems. Include the raw description, vendor name, amount, date and any existing metadata from your expense tool.

An example classification prompt could look like this:

You are an AI assistant helping a corporate finance team classify employee expenses.

Tasks:
1) Propose the most likely GL account from Allowed_GL_accounts.
2) Suggest the most plausible cost center from Allowed_cost_centers based on description.
3) Check if the expense violates company policy.
4) Explain your reasoning briefly.

Company expense policy (summary):
- Alcohol is not reimbursable.
- First-class travel is not reimbursable.
- Software purchases must go to GL 6400 and cost center 300-IT, unless clearly marketing tools.

Return ONLY valid JSON in this format:
{
  "gl_account": "code from Allowed_GL_accounts",
  "cost_center": "value from Allowed_cost_centers",
  "policy_violation": true/false,
  "violation_reason": "string or empty",
  "explanation": "short explanation for finance team"
}

Allowed_GL_accounts: <insert schema here>
Allowed_cost_centers: <insert schema here>

Expense to classify:
Description: "Uber to client meeting in Berlin"
Vendor: "Uber"
Amount: 32.50 EUR
Country: Germany

This level of structure turns ChatGPT’s output into a reliable building block for automation rather than a free-text suggestion.
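
Because downstream systems depend on that JSON, it is worth validating each response against your schema before anything is written back. A minimal sketch, assuming the allowed codes from the earlier schema example; adapt the checks to your own fields:

import json

# Illustrative allowed values, taken from the schema example above.
ALLOWED_GL = {"6100", "6110", "6120", "6400", "6500"}
ALLOWED_CC = {"100-Marketing", "200-Sales", "300-IT", "400-Operations"}

def parse_classification(raw_response: str) -> dict:
    """Parse and validate the model's JSON; raise so the expense falls back to manual review."""
    data = json.loads(raw_response)  # raises ValueError on invalid JSON
    if data.get("gl_account") not in ALLOWED_GL:
        raise ValueError(f"GL account {data.get('gl_account')} not in allowed schema")
    if data.get("cost_center") not in ALLOWED_CC:
        raise ValueError(f"Cost center {data.get('cost_center')} not in allowed schema")
    if not isinstance(data.get("policy_violation"), bool):
        raise ValueError("policy_violation must be true or false")
    return data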

Integrate via API into Your Existing Expense or ERP Workflow

To get real value, embed ChatGPT via API directly into the tools your team already uses—whether that’s your expense management system, card platform, or ERP workflow. When a new expense is submitted or imported, your middleware service can call the ChatGPT API with the relevant data, then write back the suggested codes and explanations.

A typical technical sequence looks like this:

1) An expense is created in your expense tool.
2) A webhook triggers a small integration service.
3) The service enriches the data (e.g., maps the merchant to known vendor categories) and calls ChatGPT with your standard prompt.
4) ChatGPT returns structured JSON with the classification and flags.
5) The service writes the suggestions and explanation back to the expense record for user or finance review.

This way, reviewers see AI suggestions directly in their normal approval screen, avoiding context switching or manual copy-paste.
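
For step 3 of that sequence, the call itself can stay small. The sketch below assumes the official OpenAI Python SDK and a hypothetical prompt template built from the classification prompt above; your webhook payload, field names and model choice will differ:

from openai import OpenAI  # assumes the official OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_expense(expense: dict, prompt_template: str) -> str:
    """Render the standard classification prompt for one expense and return the raw JSON text."""
    prompt = prompt_template.format(
        description=expense["description"],
        vendor=expense["vendor"],
        amount=expense["amount"],
        country=expense["country"],
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # ask for JSON-only output
        temperature=0,  # keep classifications as deterministic as possible
    )
    return response.choices[0].message.content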

Configure Confidence Thresholds and Routing Rules

Not every AI suggestion should be auto-approved. Implement confidence thresholds and routing rules that decide which expenses are auto-posted and which go to manual review. You can estimate confidence from ChatGPT by asking it to return a self-rated confidence score (0–100) or by building a simple heuristic (e.g., whether the explanation contains uncertainty phrases).

Example pattern in your prompt:

Additionally, estimate a confidence score from 0 to 100.
- 90-100: Very clear classification (known vendor, typical amount, matches policy).
- 60-89: Reasonable classification but might need review.
- <60: Unclear classification, must be reviewed.

Add this field to the JSON output as "confidence".

In your integration, you might auto-post expenses with confidence ≥ 90, send 60–89 to fast-lane human review, and route <60 or any policy_violation = true to senior finance review. This keeps risk under control while still delivering high automation rates.
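
The corresponding routing logic can stay deliberately simple. A sketch using the thresholds above; adjust the cut-offs and queue names to your own risk appetite and systems:

def route_expense(classification: dict) -> str:
    """Decide where an AI-classified expense goes next. Thresholds and queue names are illustrative."""
    if classification.get("policy_violation"):
        return "senior_finance_review"
    confidence = classification.get("confidence", 0)
    if confidence >= 90:
        return "auto_post"
    if confidence >= 60:
        return "fast_lane_review"
    return "senior_finance_review"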

Log All AI Decisions for Auditability and Continuous Improvement

For finance and audit teams, it’s crucial to keep a transparent record of how AI classifications were made. Store each ChatGPT response—including the suggested codes, confidence, and explanation—alongside the final human decision. This enables you to prove to auditors how classifications were derived and to analyse where the AI tends to be corrected.

Regularly export this log to evaluate performance: measure agreement rate between AI and human decisions, typical error patterns, and which policies are most often violated or misinterpreted. Use this feedback to refine prompts, update the allowed schema, or add more examples for tricky categories (e.g., mixed personal/business subscriptions or multi-purpose SaaS tools).
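
As a minimal sketch, one audit-log entry might look like the record below; the field names and storage layer are assumptions to adapt to your systems and audit requirements:

from datetime import datetime, timezone
from typing import Optional

def build_audit_record(expense_id: str, prompt: str, classification: dict,
                       final_decision: dict, reviewer: Optional[str]) -> dict:
    """Assemble one audit-log entry; persist it wherever your auditors can query it."""
    return {
        "expense_id": expense_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_input": prompt,             # exactly what the model saw
        "model_output": classification,    # suggested codes, confidence, explanation
        "final_decision": final_decision,  # what was actually booked
        "reviewer": reviewer,              # None if the entry was auto-posted
        "overridden": classification.get("gl_account") != final_decision.get("gl_account"),
    }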

Provide Clear, Human-Readable Explanations to Build Trust

Don’t just return codes—always include a short, human-readable explanation that makes it obvious why ChatGPT chose a specific category. This helps reviewers process entries faster and builds confidence in the system. It also serves as documentation for future audits or internal reviews.

Example explanation style:

"explanation": "Classified as 6100 Travel - Transport because the vendor is Uber, 
amount is typical for a local ride, and the description references a client meeting."

Over time, you’ll notice finance reviewers relying more on these explanations and less on manual inspection, which translates directly into fewer manual touches per transaction and shorter closing cycles.

Implemented carefully, these practices can realistically reduce manual expense classification work by 40–70%, cut days from month-end close, and improve visibility into spend by cost center and project. The exact metrics will depend on your current process maturity and data quality, but even conservative implementations of ChatGPT for expense categorisation typically pay back within a few closing cycles through saved hours and clearer cost control.

Need implementation expertise now?

Let's talk about your ideas!

Frequently Asked Questions

How accurate is ChatGPT at classifying expenses?

Accuracy depends on how well you configure the prompts, constraints and few-shot examples. In our experience, when you provide a clear chart of accounts, cost center list and policy summary, ChatGPT can match or exceed human-level accuracy on routine expenses such as travel, meals, and common subscriptions.

The best approach is to measure accuracy in a pilot: run ChatGPT on historical expenses, compare its suggestions with final booked entries, and track agreement rates. With proper setup and human-in-the-loop review for low-confidence cases, many organisations achieve 85–95% alignment on first pass, then improve further as they refine prompts and rules.
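
Measuring that agreement rate needs very little code. A sketch, assuming you have exported pairs of the AI's suggested GL account and the GL account that was finally booked:

def agreement_rate(pairs):
    """Share of expenses where the AI suggestion matched the final booked GL account."""
    if not pairs:
        return 0.0
    matches = sum(1 for suggested, booked in pairs if suggested == booked)
    return matches / len(pairs)

# Example: two of three historical expenses were classified the same way a human booked them.
print(agreement_rate([("6100", "6100"), ("6400", "6400"), ("6120", "6100")]))  # ~0.67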

What technical setup do we need to get started?

You don’t need to rebuild your finance stack. Typically, you need:

  • An integration point (API or webhook) from your expense tool or ERP when a new expense is created or changed.
  • A small middleware service that formats the data, calls the ChatGPT API with your standard prompts, and parses the JSON response.
  • Configuration work in your finance tools to store suggested GL codes, cost centers and explanations, and to route entries based on confidence or policy flags.

From a skills perspective, this usually requires a developer with API experience and a finance lead who can specify the rules and test the results. Reruption often co-builds this layer with clients so finance doesn’t have to rely solely on IT backlogs.

How long does it take to implement and see results?

A well-scoped pilot can be designed and implemented in a matter of weeks, not months. With our AI Proof of Concept approach, we typically get a working prototype running on real data within days, then spend another 1–3 weeks validating accuracy, refining prompts, and integrating it into a limited part of your workflow (e.g., travel expenses for one business unit).

Meaningful results—such as reduced manual touches per expense, faster approvals, and fewer uncategorised entries—usually appear within the first or second closing cycle after go-live. Full rollout to all expense categories and entities takes longer, but is incremental once the core pattern is proven.

Where does the ROI come from?

ROI comes from three main areas: time savings, better spend visibility, and risk reduction. Automating classification can free up significant hours during month-end close and throughout the month—especially for finance teams dealing with high volumes of small-ticket expenses. This time can be reallocated to analysis and business partnering.

Better categorisation improves your ability to manage travel, SaaS and procurement costs in real time, which often leads to concrete savings (e.g., spotting duplicate tools, out-of-policy spend, or unjustified travel). Finally, consistent application of policies and better anomaly detection reduces compliance and fraud risk. When you add these together, it’s common to see the investment in a ChatGPT-based solution pay back within a few quarters, sometimes much faster in high-volume environments.

How can Reruption help us implement this?

Reruption combines strategic finance understanding with deep AI engineering. We typically start with a 9.900€ AI PoC where we define the use case (expense categories, policies, systems), build a working prototype that classifies your real expenses via ChatGPT, and measure performance on accuracy, speed and cost per run.

From there, we apply our Co-Preneur approach: embedding alongside your team, co-owning outcomes, and pushing from prototype to production-ready workflows. We help with prompt design, API integration into your existing tools, security and compliance questions, and change management for your finance organisation. The goal is not just a demo, but a concrete, maintainable system that actually makes your month-end close faster and your spend data more reliable.

Contact Us!


Contact Directly

Your Contact

Philipp M. W. Hoffmann

Founder & Partner

Address

Reruption GmbH

Falkertstraße 2

70176 Stuttgart

Social Media