Efficient and Accurate Potato Disease Classification Using Lightweight Vision Transformers: A Comparative Benchmark Against a Deep CNN Architecture

Authors

DOI:

https://doi.org/10.30546/UNECCSDT.2026.001.210

Keywords:

Potato diseases, Deep learning, Vision Transformer (ViT), Plant disease classification, Computational efficiency

Abstract

The potato is a cornerstone of global food security, but its cultivation is hampered by diseases such as Early Blight and Late Blight, which can cause severe crop losses. Visual inspection, the traditional means of diagnosis, is subjective, labor-intensive, and unreliable, which motivates automated, accurate, and scalable alternatives. This study develops and evaluates a deep learning system that classifies potato leaves as healthy, Early Blight, or Late Blight using the public PlantVillage dataset. We compare a leading Convolutional Neural Network (CNN), ResNet-101, against two prominent Vision Transformer (ViT) architectures, Swin Base and MobileViT v2. Model performance was assessed using accuracy, precision, recall, and F1-score, and computational efficiency was evaluated by parameter count and GFLOPs. All models achieved high classification performance, but the Vision Transformer architectures outperformed the conventional CNN. MobileViT v2 performed best, reaching the highest classification accuracy, 99.69% on the test set, with only 4.39M parameters. These findings indicate that modern, lightweight Vision Transformer architectures can offer a more accurate and far more efficient solution for potato disease classification than deeper, more established CNNs, which makes them promising for deployment in real-world, resource-constrained agricultural environments and for tools that support sustainable farming practices.
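For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a three-class problem of this kind. It is not the authors' evaluation code. The class label strings, the function names, and the use of macro averaging are assumptions made for illustration, and the toy labels at the end are invented solely to exercise the functions; they are not results from the study.

```python
from collections import Counter

# Hypothetical label names for the three classes described in the abstract.
CLASSES = ["healthy", "early_blight", "late_blight"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return precision, recall, and F1 for each class (one-vs-rest)."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(pairs[(t, c)] for t in classes if t != c)
        fn = sum(pairs[(c, p)] for p in classes if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def macro_average(report, key):
    """Unweighted mean of one metric over all classes (an assumed choice)."""
    return sum(m[key] for m in report.values()) / len(report)


# Invented toy labels, for illustration only (not data from the paper).
toy_true = ["healthy", "healthy", "early_blight",
            "early_blight", "late_blight", "late_blight"]
toy_pred = ["healthy", "early_blight", "early_blight",
            "early_blight", "late_blight", "late_blight"]

toy_report = per_class_metrics(toy_true, toy_pred)
print("accuracy:", accuracy(toy_true, toy_pred))
print("macro F1:", macro_average(toy_report, "f1"))
```

Macro averaging weights each class equally, which is one common choice when class sizes differ; a weighted average would be an equally reasonable alternative, and the abstract does not state which was used.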

Published

2025-06-17

How to Cite

Efficient and Accurate Potato Disease Classification Using Lightweight Vision Transformers: A Comparative Benchmark Against a Deep CNN Architecture. (2025). Journal of Computer Science and Digital Technologies, 2(1), 5-15. https://doi.org/10.30546/UNECCSDT.2026.001.210
