AI‑powered document intelligence.

An educational prototype developed as part of the Frankfurt School - AI for Finance Certificate Course. It explores how artificial intelligence can transform financial document processing.

This is a non-commercial learning project. No real customer data is used.

AI with guardrails · Human‑in‑the‑loop · Transparent by design

AuroraBank DocReader 3000

Reinventing document trust. Exploring how artificial intelligence can make financial data processing faster, more transparent, and verifiable.

The Architecture of Learning

OCR, natural-language extraction, and semantic search merge into a transparent cycle of continuous learning and improvement through human feedback.

Technical Foundation

A modular architecture with a PHP backend, a Python AI worker, and Docker infrastructure. MySQL holds the structured data, Qdrant serves vector search, and Redis provides caching for real-time responsiveness.

Research by Design

A proof of concept for intelligent, auditable automation. Inviting innovators to explore how trust, compliance, and AI evolve together.

System Overview

AuroraBank demonstrates a modular AI pipeline that runs from document ingestion to structured, explainable outputs. Each step is traceable and can be analyzed for performance and accuracy.

The Problem

Manual Document Processing: Banks spend countless hours manually processing invoices, contracts, and compliance documents. This work is slow, error-prone, and expensive, which makes it a natural candidate for AI-assisted automation.

The Solution

AI-Powered Automation: DocReader 3000 automatically extracts text, validates data, and makes documents searchable. Users can ask questions in natural language: "Show me all invoices from Hamburg" or "Calculate total expenses."

Banking Use Case

Real-World Application: Process loan applications, vendor invoices, and regulatory filings. Reduce processing time from hours to seconds while maintaining accuracy and compliance.

AI, on your terms

Clear boundaries

AI that stays in scope, built for predictable outcomes.

Human oversight

Review flows that keep people in control.

Learning system

Improves with use — templates and patterns, not guesswork.

Trustable outputs

Every result is explainable and ready to share.

Privacy first

Minimal data, masked where needed, switchable integrations.

Fast to value

From upload to insight with a clean, modern UI.

Explainable AI

Every document processed by AuroraBank passes through transparent steps — OCR, NLP, classification, and structured storage. The system highlights how AI decisions can be made visible and verifiable in a financial context.

Transparent Processing

Each document follows a clear path: OCR extraction → NLP analysis → field classification → structured storage. Every step is logged and auditable.
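The path described above can be sketched as a chain of steps that each leave an audit entry. This is a minimal illustration, not the project's code: the stage functions, the `AuditedDocument` class, and the digest format are all hypothetical stand-ins.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditedDocument:
    """A document payload plus the audit trail of every step applied to it."""
    payload: dict
    trail: list = field(default_factory=list)


def _digest(payload: dict) -> str:
    # Stable fingerprint of the payload so a reviewer can verify what each step saw.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()[:12]


def run_pipeline(doc: AuditedDocument,
                 steps: list[tuple[str, Callable[[dict], dict]]]) -> AuditedDocument:
    """Apply each step in order and log its input and output fingerprints."""
    for name, step in steps:
        before = _digest(doc.payload)
        doc.payload = step(doc.payload)
        doc.trail.append({"step": name, "input": before, "output": _digest(doc.payload)})
    return doc


# Hypothetical stand-ins for the real OCR, NLP, classification, and storage stages.
def ocr(p):
    return {**p, "text": "Invoice 42 Total 100.00 EUR"}


def nlp(p):
    return {**p, "tokens": p["text"].split()}


def classify(p):
    return {**p, "doc_type": "invoice" if "Invoice" in p["tokens"] else "other"}


def store(p):
    return {**p, "stored": True}


STEPS = [("ocr", ocr), ("nlp", nlp), ("classify", classify), ("store", store)]
result = run_pipeline(AuditedDocument({"file": "sample.pdf"}), STEPS)
for entry in result.trail:
    print(entry)
```

Because each entry records the fingerprint of its input and output, the output of one step can be matched against the input of the next when the trail is audited.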

Confidence Scoring

AI decisions include confidence levels for each extracted field. Low-confidence results are flagged for human review, ensuring accuracy in financial contexts.
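A small sketch of how per-field confidence could route results to human review. The threshold value, the `ExtractedField` class, and the sample fields are assumptions made for illustration; the page does not state the actual cut-off.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # assumed cut-off, not a value taken from the project


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Return (auto_accepted, needs_review) based on per-field confidence."""
    accepted = [f for f in fields if f.confidence >= threshold]
    flagged = [f for f in fields if f.confidence < threshold]
    return accepted, flagged


# Invented sample values for illustration only.
fields = [
    ExtractedField("invoice_number", "INV-0001", 0.98),
    ExtractedField("total_amount", "100.00", 0.91),
    ExtractedField("vendor_vat_id", "DE12345678?", 0.52),
]
accepted, flagged = split_for_review(fields)
print([f.name for f in flagged])  # ['vendor_vat_id']
```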

Decision Traceability

Track exactly why the AI made specific classifications or extractions. View the source text, applied patterns, and reasoning behind each automated decision.

What you get

Modern Review UI

Designed for speed and accuracy.

Template learning

Consistent results on recurring layouts.

Deterministic answers

Guardrails for reliable interactions.

Semantic discovery

Find the right parts across pages.

Built‑in privacy

Respectful by default; configurable when needed.

Easy export

Data you can act on — without friction.

AI Pipeline
Production Ready

Advanced document intelligence powered by modern AI/ML technologies

Document Processing Pipeline

Purpose: Transform semi-structured financial documents into fully structured, validated, and actionable business data through an iterative AI-powered pipeline with human oversight, continuous learning feedback loops, and multi-stage validation gates.

Iterative Learning

Continuous improvement through feedback loops

Validation Gates

Multi-stage quality assurance checkpoints

Error Recovery

Graceful handling of validation failures

1. Document Ingestion & Validation

Multi-Format Ingestion

PDF, JPG, PNG, TIFF support

✓ Format validation ✓ Size limits (100MB) ✓ Security scanning
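Format validation and the size limit could look roughly like the sketch below, which checks the leading "magic" bytes instead of trusting the file extension. The function names are hypothetical, the 100MB limit is read as 100 MiB, and security scanning is not covered.

```python
MAX_BYTES = 100 * 1024 * 1024  # the 100MB limit, interpreted here as 100 MiB (assumption)

# Well-known leading signature bytes for the supported formats.
SIGNATURES = {
    "pdf": (b"%PDF-",),
    "jpg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "tiff": (b"II*\x00", b"MM\x00*"),  # little-endian and big-endian TIFF
}


def detect_format(head: bytes):
    """Return the format whose signature matches the first bytes, else None."""
    for fmt, prefixes in SIGNATURES.items():
        if head.startswith(prefixes):
            return fmt
    return None


def validate_upload(data: bytes) -> str:
    """Raise ValueError for oversized or unsupported uploads; return the detected format."""
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds size limit")
    fmt = detect_format(data[:16])
    if fmt is None:
        raise ValueError("unsupported or mislabeled file type")
    return fmt


print(validate_upload(b"%PDF-1.7\n..."))      # pdf
print(validate_upload(b"\x89PNG\r\n\x1a\n"))  # png
```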

Security & Compliance

Authentication & authorization

✓ Basic Auth validation ✓ CSRF protection ✓ File type verification

2. AI-Powered Text Extraction & Analysis

Advanced OCR Engine

PaddleOCR + Tesseract hybrid

✓ High accuracy rate ✓ Multi-language support ✓ Confidence scoring

Field Extraction AI

Extraction aligned with EN 16931, the European e-invoicing standard

✓ 30+ field types ✓ Pattern recognition ✓ Context awareness

3. Human-in-the-Loop Quality Assurance

Expert Review Interface

Domain expert validation & correction

✓ Field-by-field validation ✓ Confidence threshold gates ✓ Correction reason tracking

Adaptive Learning Engine

Real-time pattern generation & model updates

✓ Correction pattern analysis ✓ Dynamic threshold adjustment ✓ Model performance tracking

4. Iterative Re-processing (Conditional Loop)

Enhanced Re-extraction

Apply learned patterns to improve accuracy

✓ Learned pattern integration ✓ Improved confidence scores ✓ Reduced human review load

Quality Gate Decision

Automated quality assessment & routing

✓ Confidence threshold evaluation ✓ Loop-back decision logic ✓ Convergence detection
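The conditional loop of this step can be sketched as follows: re-extract until the quality gate passes, and stop early when confidence no longer improves. The thresholds, the round limit, and the `extract` callback are assumptions for illustration, not the project's actual logic.

```python
def quality_gate(confidences, threshold=0.90):
    """Pass when even the weakest field meets the threshold."""
    return min(confidences) >= threshold


def reprocess_until_stable(extract, max_rounds=5, threshold=0.90, min_gain=0.01):
    """Re-run `extract` until the gate passes, gains stall, or rounds run out.

    `extract(round_no)` returns the per-field confidences for that attempt.
    Returns (decision, rounds_used).
    """
    previous = None
    for round_no in range(1, max_rounds + 1):
        confidences = extract(round_no)
        if quality_gate(confidences, threshold):
            return "accept", round_no
        mean = sum(confidences) / len(confidences)
        # Convergence detection: another loop is pointless if the mean barely moved.
        if previous is not None and mean - previous < min_gain:
            return "human_review", round_no
        previous = mean
    return "human_review", max_rounds


# Toy extractor: confidences improve as learned patterns are applied.
def toy_extract(round_no):
    history = {1: [0.70, 0.95], 2: [0.85, 0.96], 3: [0.93, 0.97]}
    return history.get(round_no, history[3])


print(reprocess_until_stable(toy_extract))  # ('accept', 3)
```

Gating on the minimum rather than the mean is one possible design choice: a single weak field is enough to keep the document in the loop or send it to a reviewer.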

5. Complete Data Storage & Indexing

Structured Data Storage

MySQL/PostgreSQL with ACID transaction safety

✓ Extracted field data (invoices, contracts) ✓ Relational data integrity & constraints ✓ ACID compliance (safe operations)

Document File Storage

Secure file system for original documents

✓ Original PDF/image files preserved ✓ Encrypted file system storage ✓ Backup & versioning system

6. AI Intelligence & Search

Vector Database (Qdrant)

Semantic search infrastructure for AI operations

✓ 1536-dimensional embeddings ✓ High-performance similarity search ✓ Real-time vector indexing
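To make similarity search over 1536-dimensional embeddings concrete, here is a brute-force cosine-similarity sketch in plain Python. It does not use the Qdrant client; a vector database performs the same ranking with an index instead of a full scan. The random vectors are stand-ins for real embeddings, and all names are hypothetical.

```python
import math
import random

DIM = 1536  # embedding size named above


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, index, k=2):
    """Brute-force nearest neighbours: score every stored vector, keep the best k."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


rng = random.Random(0)  # seeded random vectors stand in for real embeddings
index = {f"doc-{i}": [rng.gauss(0, 1) for _ in range(DIM)] for i in range(5)}

# A query that is a slightly noisy copy of doc-3 should rank doc-3 first.
query = [x + rng.gauss(0, 0.1) for x in index["doc-3"]]
for score, doc_id in top_k(query, index):
    print(f"{doc_id}: {score:.3f}")
```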

Hybrid RAG Assistant

Intelligent document analysis with dual-mode processing

✓ Instant responses for structured data ✓ AI-powered complex analysis ✓ Cost-optimized query routing
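One way to picture dual-mode processing is a router that sends simple, structured questions to a database lookup and everything else to the language model. The patterns and the `route` function below are hypothetical; the page does not describe how the real routing decision is made.

```python
import re

# Hypothetical patterns for questions that a plain database query can answer.
STRUCTURED_PATTERNS = [
    re.compile(r"\b(total|sum|count|average)\b", re.I),
    re.compile(r"\binvoices? from\b", re.I),
]


def route(question: str) -> str:
    """Return 'structured' for cheap database lookups, 'llm' for everything else."""
    if any(p.search(question) for p in STRUCTURED_PATTERNS):
        return "structured"
    return "llm"


for q in [
    "Show me all invoices from Hamburg",
    "Calculate total expenses",
    "Summarize the termination clauses in this contract",
]:
    print(f"{route(q):10s} <- {q}")
```

Keeping the cheap path rule-based means the model is only called, and paid for, when a question needs it.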

Multi-Modal Intelligence

Advanced reasoning & citation system

✓ Source document citations ✓ Multi-document reasoning ✓ Financial analysis insights

7. Multi-Stage Validation Gateway

Contract Compliance Validator

Validates stored contract data against legal rules

✓ High validation accuracy on stored data ✓ Multi-jurisdiction compliance rules ✓ Risk scoring & escalation

Sanctions Screening Engine

Screens stored entity data against watchlists

✓ Comprehensive screening coverage ✓ Global sanctions databases ✓ Real-time list updates

Financial Data Validator

Validates stored financial data for compliance

✓ High processing accuracy ✓ IBAN/BIC/VAT validation ✓ Tax compliance verification
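The IBAN part of this validation can be illustrated with the standard mod-97 checksum from ISO 13616. This is a minimal sketch with a hypothetical function name, not the project's validator; it does not enforce country-specific lengths, and BIC and VAT checks are out of scope.

```python
def is_valid_iban(iban: str) -> bool:
    """Check the IBAN mod-97 checksum; country-specific lengths are not enforced."""
    s = iban.replace(" ", "").upper()
    if not (s.isascii() and s.isalnum() and 15 <= len(s) <= 34):
        return False
    # Move the country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Letters map to 10..35 (A=10, ..., Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(is_valid_iban("DE89 3704 0044 0532 0130 00"))  # True (widely used example IBAN)
print(is_valid_iban("DE89 3704 0044 0532 0130 01"))  # False (last digit altered)
```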

AI Fraud Detection

Analyzes stored transaction patterns for fraud

✓ Advanced fraud detection accuracy ✓ Pattern anomaly analysis ✓ Real-time risk scoring

8. Business Integration & APIs

Business Intelligence

Actionable insights & recommendations

✓ Compliance risk analysis ✓ Financial trend detection ✓ Automated reporting

Payment & ERP APIs

Seamless integration with business systems

✓ SEPA payment processing ✓ SAP/Oracle ERP integration ✓ Banking API connectivity

External System Integration

Secure connectivity to enterprise systems

✓ REST/GraphQL API endpoints ✓ OAuth 2.0 authentication ✓ Rate limiting & monitoring

Why This Pipeline Excels

High Performance

High accuracy with sub-second response times

Scalable Architecture

Docker-based microservices with horizontal scaling capability

Self-Improving AI

Continuous learning from user feedback with pattern recognition

Technology Stack

AI & Machine Learning

PaddleOCR, Tesseract OCR, OpenAI GPT, OpenAI Embeddings, NumPy, OpenCV, scikit-learn, Pillow

Data & Storage

Qdrant Vector DB, MySQL, Redis Cache, Docker Volumes, File System Storage, JSON Data Exchange

Backend & Infrastructure

PHP-FPM, Python, Docker Compose, Apache HTTP Server, Linux Ubuntu, REST APIs

Frontend & UX

HTMX, JavaScript ES6, HTML5, CSS Grid/Flexbox, Glass Morphism, Responsive Design

Security & Compliance

Basic Auth, CSRF Protection, Input Validation, Privacy-by-Design

File Processing

PDF Processing, Image Processing, Multi-format Support, File Validation

Open Source & Frontier AI Models

This educational project demonstrates the combination of Open Source Software with modern Frontier AI Models for transparent and cost-effective solutions.

Open Source Foundation

PaddleOCR, Qdrant Vector DB, MySQL, Redis, Docker, Python, PHP, Apache

Cost-effective, transparent and adaptable technologies form the foundation of the system.

Frontier AI Models

OpenAI GPT-4o, Text Embeddings 3, API Integration, Semantic Search, Natural Language, RAG Architecture

State-of-the-art AI models via APIs for intelligent document analysis and natural language interaction.

Hybrid Architecture

Local Processing, Cloud APIs, Cost Optimization, Fallback Systems, Pattern Matching, Semantic Understanding

Intelligent combination of local processing and cloud-based AI services for optimal performance.

Live Interactive Demo

Experience the AI-powered document processing system in action

DocReader 3000

Upload documents, search semantically, and access the review interface

Health monitor

Real-time infrastructure monitoring

Document Processing: AI-powered document analysis (Total Processed, Today)
Vector Search: OpenAI-powered semantic search (Vectors, Indexed)
Performance: System responsiveness (Response, Load)
Processing Queue: Document workflow status (In Queue, Processing)
OCR Accuracy: Text recognition quality (Confidence, Success Rate)
Storage Usage: Capacity management (Used Space, Available)
Infrastructure Services: Database, Vector DB, Cache

DocReader 3000 Hybrid RAG Monitor

Real-time AI system transparency for educational insights

Metrics shown: Documents Processed, OCR Confidence, Vector Embeddings

Concept Demonstrations

Future vision & architectural concepts

System Orchestrator

The control layer connecting contracts, invoices, and compliance checks

Status: All Systems Operational

Contract Validator

Automated contract compliance verification

Validation Rate: 99.7%
Response Time: < 2s
Daily Checks: 1,247
Status: OPERATIONAL

Sanctions Validator

Real-time sanctions list screening

Screening Rate: 99.9%
Response Time: < 1s
Daily Screens: 3,891
Status: OPERATIONAL

Invoice Processing API

AI-powered invoice extraction & validation

Processing Rate: 98.4%
OCR Accuracy: 97.2%
Daily Invoices: 2,156

Compliance KPIs

Overall Compliance: 98.7%
Avg Response: < 1.3s
Daily Operations: 7,294
Critical Issues: 0

AI Fraud Detection

Real-time document analysis and anomaly detection for banking security

Active Monitoring

High Risk Alerts: 3 (+2 last hour)
Medium Risk: 12 (-1 last hour)
Documents Cleared: 1,847 (+156 last hour)

Active Detection Algorithms

Anomaly Pattern Detection: 92.4% accuracy, 1,234 detections today
Signature Verification AI: 89.7% accuracy, 892 signatures verified
Amount Consistency Check: 94.2% accuracy, 2,156 amounts validated
Entity Relationship Analysis: 88.5% accuracy, 567 relationships mapped

Recent Fraud Alerts

HIGH: Suspicious Amount Pattern
Invoice INV-2025-0847: Amount €50,000 exceeds vendor average by 340%
2 minutes ago

MEDIUM: Duplicate Document Detection
Potential duplicate: INV-2025-0845 matches 89% with INV-2025-0823
7 minutes ago

MEDIUM: Unusual Vendor Activity
TechCorp GmbH: 15 invoices in last 2 hours (avg: 3/day)
12 minutes ago
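The first alert above compares an invoice amount with the vendor's average. The sketch below shows the arithmetic behind such a figure. The vendor history is invented so that €50,000 lands roughly 340% above the mean, and the risk thresholds are hypothetical; the demo does not state how its alerts are computed.

```python
from statistics import mean


def percent_over_average(amount: float, history: list[float]) -> float:
    """How far `amount` lies above the mean of `history`, in percent."""
    avg = mean(history)
    return (amount - avg) / avg * 100


def risk_level(pct_over: float, high: float = 300.0, medium: float = 100.0) -> str:
    # Hypothetical thresholds chosen only for this illustration.
    if pct_over >= high:
        return "HIGH"
    if pct_over >= medium:
        return "MEDIUM"
    return "LOW"


# Invented vendor history with a mean of 11,364.
history = [10_000.0, 11_000.0, 12_000.0, 12_456.0]
pct = percent_over_average(50_000.0, history)
print(f"{pct:.0f}% over average -> {risk_level(pct)}")
```

Note that "exceeds the average by 340%" means the amount is 4.4 times the average, not 3.4 times.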

Overall Risk Assessment

80 SECURE
Document Authenticity: 94%
Pattern Compliance: 87%
Entity Verification: 91%
Amount Validation: 96%

Security & Privacy

Educational prototype with enterprise-grade security concepts for learning purposes

Authentication & Authorization

Demonstrates Basic Auth implementation with proper credential validation and session management for educational purposes.

Data Protection

Shows how sensitive document data can be protected with encryption, access controls, and privacy-by-design principles.

Audit & Compliance

Demonstrates comprehensive logging and audit trails essential for financial document processing systems.

Made by Humans

AuroraBank was developed as part of the Frankfurt School - AI for Finance Certificate Course. It is a collaborative prototype exploring how artificial intelligence can make financial document processing more transparent, auditable, and efficient.

Project Team

Martin Gräbing

System Architecture, AI Engineering & Concept Development

Dr. Melissa Schall

AI Methodology, Academic Insight & Ethical Framing

Maxi Leuchters

Data Modeling, Applied Research & Process Design

Legal Notice

Legal Notice according to § 5 TMG and § 18 MStV

Responsible for Content

Martin Gräbing
40235 Düsseldorf, Germany
https://www.aurorabank.de

Disclaimer

Content is created to the best of our knowledge but without warranty. We accept no responsibility for external links. Infringing content will be removed immediately upon notification.

Copyright

All content is subject to German copyright law. Use only with written permission or for private use.

Privacy (short)

We store no personal data on this portal page and set no tracking cookies. If you email us, your details are processed to handle your request.