FraudShield: Bringing Large Language Models to the Edge

Project Overview

The growth of mobile financial transactions has led to a surge in sophisticated SMS fraud ("smishing"). Traditional rule-based filters often miss these evolving threats, while powerful Cloud AI solutions raise serious privacy concerns—users do not want their private banking messages sent to a third-party server.

FraudShield is our answer to this dilemma. It is a lightweight mobile application that runs Large Language Models (LLMs) entirely on-device. By optimizing and compressing advanced AI models, we achieved 99.5% fraud detection accuracy in an offline environment, ensuring user data never leaves their phone.

The Challenge

Deploying GenAI for security presented a "Trilemma" of conflicting constraints:

1. The Privacy Barrier

Financial text messages contain sensitive personal data (OTP codes, balance alerts). Sending this data to a cloud API (like OpenAI) for analysis is a security risk and a privacy violation. The solution had to work 100% offline.

2. The Resource Gap

Standard LLMs require gigabytes of RAM and powerful GPUs. Running them on a standard smartphone usually drains the battery instantly or crashes the device.

3. Real-Time Latency

Fraud happens in seconds. A user might click a phishing link immediately. We couldn't afford the latency of network calls; the detection needed to happen in milliseconds.

Our Solution: Extreme Model Quantization

We moved the intelligence from the server to the pocket. Instead of building a simple app wrapper, we engaged in deep Machine Learning engineering to fine-tune and shrink open-source models.

Phase 1: Model Optimization & Fine-Tuning

We selected compact "Small Language Models" (SLMs)—specifically Llama-160M and Qwen1.5-0.5B. We fine-tuned these models on a curated dataset of financial fraud messages, teaching them to recognize subtle linguistic patterns used by scammers (urgency, fake authority, suspicious links).

Phase 2: Quantization & Compression

To fit these models on a phone, we utilized 4-bit quantization and converted them to the ONNX (Open Neural Network Exchange) format.

Drastic Size Reduction: We compressed the Llama-160M model down to a mere 168MB footprint.
Efficiency: This allowed the model to load into the RAM of even mid-range Android devices without affecting performance.

Phase 3: The Mobile Guardian

We built a React Native interface that runs the ONNX model in the background. It intercepts incoming SMS, tokenizes the text locally, infers the probability of fraud, and alerts the user—all within a fraction of a second.

Key Features

🛡️ 100% Offline Privacy

No internet connection is required for detection. Your messages are processed on your device's CPU/NPU, ensuring absolute data sovereignty. No data is ever uploaded to the cloud.

⚡ Sub-Second Inference

By using ONNX Runtime, the app delivers near-instant analysis. Users are warned about a "Suspicious Transaction Alert" before they even finish reading the message.

🧠 Context-Aware Understanding

Unlike keyword blockers (which block anything saying "Bank"), the LLM understands context. It can distinguish between a legitimate "Your balance is low" alert from your actual bank and a fake "URGENT: UPDATE DETAILS" phishing attempt.

Technical Implementation

Tech Stack

Core AI: Fine-tuned Llama-160M & Qwen1.5-0.5B models.
Model format: ONNX (Open Neural Network Exchange) for cross-platform hardware acceleration.
Mobile Framework: React Native with a custom C++ bridge for the inference engine.
Training: PyTorch with QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning.

Comparative Performance Data

We tested multiple model architectures to find the perfect balance of speed vs. accuracy.

Model Variant	Quantization	Accuracy	Model Size
Llama-160M-Chat	4-bit (bnb4)	99.47%	168 MB
Qwen1.5-0.5B-Chat	4-bit (bnb4)	99.50%	797 MB

The Llama-160M variant proved to be the "sweet spot," delivering enterprise-grade security with a footprint smaller than a typical social media app.

Results & Impact

FraudShield demonstrates that privacy and AI power can coexist.

99.47% Detection Accuracy: Outperformed traditional keyword-based filters by a significant margin.
Zero Data Leakage: Successfully validated that no data packets leave the device during the scanning process.
Battery Efficient: The optimized model consumes negligible battery power, running only when a message arrives.
Featherweight Footprint: At just ~170MB, the full AI engine is light enough for mass adoption in emerging markets with older devices.

Conclusion

FraudShield is more than an app; it is a proof of concept for the future of Edge AI. CodeScale has proven that we can take massive, complex Large Language Models and engineer them to solve real-world problems on the most constrained devices, bringing the power of AI to users without compromising their privacy.