Evidential Learning for Robust Classification
A deep learning project focused on uncertainty-aware classification using Dirichlet-based evidential models, improving F1 score by 10%.
Read the full project report
Code available at: https://github.com/amir-aghdam/evidential-classification
Deep neural networks are often overconfident and unaware of their own limitations, especially in uncertain or ambiguous situations. This project addresses that gap by applying evidential deep learning to visual classification, combining transformer-based representation learning with probabilistic uncertainty modeling.
We introduce a novel pipeline built on DINO v2 Vision Transformers, replacing the standard classification head with a Dirichlet-based evidential output. This lets the model not only predict a class but also quantify how much it trusts its own decision, an essential feature for safety-critical or fine-grained tasks.
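As a rough illustration of the idea, the sketch below shows one way a Dirichlet-based evidential head can be written in PyTorch: the head outputs non-negative evidence, which parameterizes a Dirichlet distribution whose strength yields both expected class probabilities and a scalar uncertainty. The class name, dimensions, and the softplus activation are assumptions for illustration, not necessarily the project's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps encoder features to Dirichlet parameters (illustrative sketch)."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, features: torch.Tensor):
        evidence = F.softplus(self.fc(features))      # non-negative evidence per class
        alpha = evidence + 1.0                        # Dirichlet concentration parameters
        strength = alpha.sum(dim=-1, keepdim=True)    # S = sum_k alpha_k
        probs = alpha / strength                      # expected class probabilities
        uncertainty = alpha.size(-1) / strength       # u = K / S, large when evidence is scarce
        return probs, uncertainty, alpha
```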
Quick Highlight
Achieved a +4.1% absolute accuracy gain and substantially improved uncertainty calibration over a standard cross-entropy baseline, showing promise for real-world deployment where reliability matters.
Dataset and Task
We validated our approach on a challenging fine-grained flower classification task, known for subtle inter-class differences. This serves as a perfect benchmark for testing uncertainty-aware methods.

Architecture and Training
We adapt a pretrained DINO v2 ViT-S/14 by keeping the encoder frozen and appending an evidential head in place of the standard classifier. The model is trained with a KL-regularized evidential loss, allowing it to output both a class prediction and an estimate of how much evidence supports it.
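The sketch below shows what such a training setup can look like: a frozen DINO v2 ViT-S/14 encoder loaded from the public torch.hub entry point, a linear evidence head, and a KL-regularized evidential loss in the style of Sensoy et al. (2018). The hub entry point, class count, optimizer, and annealing schedule are assumptions; the repository's actual implementation may differ.

```python
import torch
import torch.nn.functional as F

def kl_to_uniform_dirichlet(alpha: torch.Tensor) -> torch.Tensor:
    """KL( Dir(alpha) || Dir(1, ..., 1) ), computed per sample."""
    num_classes = alpha.size(-1)
    strength = alpha.sum(dim=-1)
    return (torch.lgamma(strength)
            - torch.lgamma(torch.tensor(float(num_classes), device=alpha.device))
            - torch.lgamma(alpha).sum(dim=-1)
            + ((alpha - 1.0)
               * (torch.digamma(alpha) - torch.digamma(strength.unsqueeze(-1)))).sum(dim=-1))

def evidential_loss(alpha: torch.Tensor, targets: torch.Tensor,
                    epoch: int, anneal_epochs: int = 10) -> torch.Tensor:
    """Digamma-form evidential loss plus annealed KL regularization of misleading evidence."""
    y = F.one_hot(targets, num_classes=alpha.size(-1)).float()
    strength = alpha.sum(dim=-1, keepdim=True)
    nll = (y * (torch.digamma(strength) - torch.digamma(alpha))).sum(dim=-1)
    alpha_tilde = y + (1.0 - y) * alpha          # strip evidence assigned to the true class
    coeff = min(1.0, epoch / anneal_epochs)      # ramp the regularizer in over early epochs
    return (nll + coeff * kl_to_uniform_dirichlet(alpha_tilde)).mean()

# Frozen DINO v2 ViT-S/14 encoder (384-d features) with a trainable evidence head.
encoder = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

num_classes = 102                                # illustrative class count, not the repo's config
head = torch.nn.Linear(384, num_classes)         # evidence head (see the sketch above)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def training_step(images: torch.Tensor, targets: torch.Tensor, epoch: int) -> float:
    with torch.no_grad():
        features = encoder(images)               # (B, 384) global image features
    alpha = F.softplus(head(features)) + 1.0     # Dirichlet parameters
    loss = evidential_loss(alpha, targets, epoch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```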
Performance
| Metric    | Cross-Entropy | Evidential |
|-----------|---------------|------------|
| Accuracy  | 94.55%        | 98.69%     |
| Precision | 86.32%        | 96.55%     |
| Recall    | 86.07%        | 96.61%     |
| F1 Score  | 86.14%        | 96.55%     |
Beyond the numerical gains, the evidential model is better calibrated: it lowers its confidence on incorrect predictions and reports high confidence only when the evidence justifies it.
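For intuition, the short sketch below shows one way this quantified uncertainty can be used at inference time: predictions whose Dirichlet uncertainty u = K / S exceeds a threshold are flagged for deferral rather than trusted. The threshold and function name are illustrative, not part of the project's reported pipeline.

```python
import torch

def predict_with_rejection(alpha: torch.Tensor, threshold: float = 0.5):
    """Return predictions plus a mask of samples the model is confident enough to keep."""
    strength = alpha.sum(dim=-1, keepdim=True)
    probs = alpha / strength                               # expected class probabilities
    uncertainty = (alpha.size(-1) / strength).squeeze(-1)  # u = K / S, in (0, 1]
    preds = probs.argmax(dim=-1)
    accept = uncertainty < threshold                       # defer uncertain samples for review
    return preds, uncertainty, accept
```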

Visual Interpretability
Grad-CAM and t-SNE visualizations show that evidential models develop more semantically meaningful feature spaces and focus on more relevant image regions.
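As an example of this kind of feature-space inspection, the sketch below projects encoder features with t-SNE and colors them by class. It assumes scikit-learn and matplotlib; the plots in the report may use different settings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_feature_tsne(features: np.ndarray, labels: np.ndarray, out_path: str = "tsne.png"):
    """Project (N, D) encoder features to 2-D and color the points by class label."""
    embedded = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(features)
    plt.figure(figsize=(6, 6))
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=5, cmap="tab20")
    plt.title("t-SNE of evidential model features")
    plt.tight_layout()
    plt.savefig(out_path, dpi=200)
```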


Why This Matters
This work demonstrates that uncertainty isn't just a bonus: it's essential. By embedding calibrated confidence into model outputs, we build a foundation for safer, more responsible AI systems in domains like healthcare, robotics, autonomous driving, and scientific discovery, where "I don't know" is often the most important answer.