Deep Learning-based Computer Vision for Real-Time Weed Detection and Classification

Pranay Shah · 6 min read · Jul 1, 2021

Figure: A bounding-box approach for identifying weeds (red) and crops (green) [1]

Computer vision is becoming an increasingly effective technology for tackling common agricultural tasks such as yield prediction, plant species identification, disease detection and weed management. Of these, weed management is particularly important because it has one of the largest impacts on crop yield. Chemical weed management uses either full-coverage or selective spraying of herbicides. Full-coverage spraying can reduce crop yield and exacerbate health and ecological problems through excessive herbicide application, whereas selective spraying targets only the weeds, increasing crop yield and protecting the surrounding ecosystem. Traditional selective spraying is labour-intensive and time-consuming, so the primary aim of an agricultural machine vision system is automatic weed detection that is precise, efficient, cost-effective and real-time. This short article summarises the discussions in [1] and [2].

Solutions typically apply traditional machine learning (ML) or deep learning (DL). Generally, traditional ML methods require less training data and have lower computational requirements. However, deep domain expertise is critical for hand-crafting the most informative features as inputs to traditional ML algorithms. Weed identification is incredibly challenging because of the complex natural environment that weeds and crops occupy. The usual sources of variation in image data, such as occlusion, overlapping objects, shadow effects, motion blur and noise, are especially difficult to overcome in weed detection, where even expertly engineered features may not be optimal. Such features can be categorised into colour, texture, shape and spectral modalities. Used alone, colour-based features are the least reliable because both inter-class (between plant species) and intra-class (between instances of the same species) differences in colour can be negligible. Leaf colour is also highly dependent on factors such as season, climate, geography, lighting conditions and disease. Similar factors affect the efficacy of spectral features of leaves, such as their reflectivity. Shape and textural features are both sensitive to overlap and occlusion. Some textural features, such as the grey-level co-occurrence matrix (GLCM), are more robust but impractical for real-time inference because of their higher computational cost. Although features from these modalities can be combined to improve detection accuracy, the model's performance then rests on the quality of feature selection, and some important plant features will likely go unused.
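
To make the computational concern concrete, the short Python sketch below computes a handful of classic GLCM texture statistics with scikit-image. The chosen offsets, orientations and properties are illustrative assumptions rather than values from [1] or [2], and the dense co-occurrence counting is exactly the step that makes GLCM features expensive at inference time.

```python
# A minimal sketch of hand-crafted GLCM texture features using scikit-image.
# Function names follow scikit-image >= 0.19 ("graycomatrix"/"graycoprops";
# older versions spell them "greycomatrix"/"greycoprops").
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_patch: np.ndarray) -> np.ndarray:
    """Compute a small GLCM texture descriptor for an 8-bit grayscale patch."""
    # Co-occurrence matrix at one pixel offset and four orientations.
    glcm = graycomatrix(
        gray_patch,
        distances=[1],
        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
        levels=256,
        symmetric=True,
        normed=True,
    )
    # Classic Haralick-style statistics, averaged over orientations.
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.array([graycoprops(glcm, p).mean() for p in props])

# Example: a random 8-bit patch standing in for a segmented leaf region.
patch = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
print(glcm_features(patch))  # 4-dimensional texture descriptor
```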

Although traditional ML algorithms have shown promise as candidates for real-time precision weed management, most notably Support Vector Machines (SVM), most are outperformed by DL methods. DL automatically learns hierarchical features that approach optimality for the specific learning task. Unlike traditional ML, however, training a neural network for classification demands a much larger volume of data. Very few public datasets cover a diversity of weed or crop species, let alone contain real field images, and even datasets of the same species can have limited transferability because of variations induced by seasonality, geography and growth phase. What DL gains by automating feature engineering, it loses to the significant manual effort of labelling data. Future research in active, semi-supervised, self-supervised, few-shot or zero-shot learning can help to minimise the labelling cost.
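
For reference, a traditional-ML baseline of the kind mentioned above takes only a few lines to assemble. The sketch below trains an SVM on synthetic stand-in feature vectors (e.g. the GLCM descriptors from the previous sketch); the data, labels and hyperparameters are all placeholder assumptions.

```python
# A minimal sketch of a traditional-ML baseline: an SVM classifier over
# hand-crafted feature vectors. X and y are synthetic placeholders for a
# real weed/crop dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))     # e.g. 4 texture features per image patch
y = rng.integers(0, 2, size=200)  # 0 = crop, 1 = weed (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature scaling matters for kernel SVMs, hence the pipeline.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```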

Typically, digital or multi-spectral cameras are mounted on data-acquisition vehicles to capture RGB channels, Near-Infrared (NIR) data or other spectral data. Preprocessing these raw images makes them easier for the network to learn from, and can include resizing for resolution tuning or computational efficiency, background removal to eliminate shadow effects and debris, motion-blur removal, denoising, data whitening and channel normalisation. The training data are then augmented to improve the generalisation performance of the DL model. Augmentation techniques include geometric transformations, gamma correction, cropping, kernel filtering, noise injection and colour-space transformation. Colour augmentation is particularly advantageous for real-time inference, as it teaches the model to cope with the changing lighting conditions an agricultural robot encounters while in motion.
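
As a rough illustration, the following torchvision pipeline strings together several of the preprocessing and augmentation steps listed above. The resize target and the ImageNet normalisation statistics are assumptions made for the sketch, not values prescribed by [1] or [2].

```python
# A minimal sketch of a preprocessing/augmentation pipeline with torchvision.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),          # resolution tuning / efficiency
    transforms.RandomResizedCrop(224),      # cropping
    transforms.RandomHorizontalFlip(),      # geometric transformation
    transforms.ColorJitter(brightness=0.4,  # colour augmentation for
                           contrast=0.4,    # changing lighting conditions
                           saturation=0.4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # channel normalisation
                         std=[0.229, 0.224, 0.225]),  # (ImageNet statistics)
])

# Inference keeps only the deterministic steps.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```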

The DL architectures in the literature fall into one of five families, depending on the approach taken to classify the weeds: Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN), Region Proposal Networks (RPN), Graph Convolutional Networks (GCN) and Hybrid Networks (HN). Within the CNN family, researchers have compared popular image-classification architectures initialised via transfer learning with weights pretrained on public datasets such as ImageNet [3], COCO [4] and KITTI [5]. These include AlexNet, VGG-19, GoogLeNet, NasNet, Inception-ResNet, DetectNet, ResNet-50, ResNet-101 and Inception-v3. The best model varies with the species being classified, the number of classes considered and the classification runtime. HNs, such as AgroAVNET, which combines structural properties of AlexNet and VGGNet, merge architectural features from multiple models according to their influence on overall performance. The next level of abstraction is to predict bounding boxes around the weeds using RPNs. This object-detection approach has succeeded with models such as YOLO-v3, Faster R-CNN, Single Shot Detector (SSD) and Mask R-CNN, whose feature extractors can be a CNN or FCN built from fine-tuned versions of the aforementioned architectures. Alternatively, Tiny YOLO-v3 speeds up YOLO-v3 inference with a small compromise in accuracy, making it more suitable for real-time deployment. Modern GPUs are optimised for convolution operations, so moving towards fully convolutional networks can reduce training time. FCNs classify at the pixel level for semantic image segmentation, and common approaches adopt well-known encoder-decoder architectures such as SegNet or U-Net, with variants of ResNet or VGG, for example, as the encoder blocks. End-to-end backpropagation in FCNs couples the feature learner to the classifier, so the learned features are optimal for that classifier. Segmentation accuracy depends heavily on dataset size, so transfer learning can give the learning a head start, while data augmentation can artificially boost the volume of training data. Finally, GCNs, such as the Graph Weed Network (GWN), represent images as graphs and are considered semi-supervised methods because unannotated nodes can be approximately labelled by a weighted average of nearby nodes with known labels. For a more in-depth discussion of the many approaches to DL-based weed detection, please refer to Section 10 in [1].
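
As a concrete example of the transfer-learning recipe described above, the sketch below loads an ImageNet-pretrained ResNet-50 from torchvision and swaps its classification head for a hypothetical weed/crop label set. The class count, freezing strategy and optimiser settings are all illustrative assumptions, not choices taken from [1] or [2].

```python
# A minimal transfer-learning sketch: ImageNet-pretrained ResNet-50 with a
# new classification head (torchvision >= 0.13 weights API).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # e.g. 4 weed species + crop (hypothetical label set)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Optionally freeze the pretrained feature extractor to begin with.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for the new label set.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```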

Perhaps a more discriminative factor in model choice than minute, architecture-driven improvements in classification accuracy is low algorithmic complexity, needed to meet the requirements of real-time image processing. This enables precise and timely application of the herbicide that is right for the weed species. Processing speed also depends on the efficiency of the hardware, whether the robot is acquiring data for training or running inference on unseen field data. AI infrastructure companies have the capability and expert knowledge to build custom mobile units with advanced embedded GPUs and networking infrastructure. Modern solutions can meet the criteria of fast real-time image processing, allowing businesses to take full advantage of powerful DL in digital agriculture.
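
Whether a candidate model meets a real-time budget is straightforward to check empirically. The sketch below measures average per-frame latency for a backbone; the 30 FPS target is an illustrative assumption, not a figure from [1] or [2].

```python
# A minimal sketch for checking a real-time latency budget: average
# per-frame forward-pass time over repeated runs.
import time
import torch
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50().to(device).eval()  # randomly initialised stand-in
frame = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(frame)
    if device == "cuda":
        torch.cuda.synchronize()
    n = 100
    start = time.perf_counter()
    for _ in range(n):
        model(frame)
    if device == "cuda":
        torch.cuda.synchronize()
    latency_ms = (time.perf_counter() - start) / n * 1000

print(f"{latency_ms:.1f} ms/frame -> {1000 / latency_ms:.1f} FPS "
      f"(assumed target: >= 30 FPS for real-time spraying)")
```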

References:

[1] Hasan, A. S. M. M.; Sohel, F.; Diepeveen, D.; Laga, H.; Jones, M. G. K. A Survey of Deep Learning Techniques for Weed Detection from Images. CoRR abs/2103.01415, 2021. https://arxiv.org/abs/2103.01415

[2] Wu, Z.; Chen, Y.; Zhao, B.; Kang, X.; Ding, Y. Review of Weed Detection Methods Based on Computer Vision. Sensors 2021, 21, 3647. https://doi.org/10.3390/s21113647

[3] Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.

[4] Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common Objects in Context; Springer: Zurich, Switzerland, 2014; pp. 740–755.

[5] Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision Meets Robotics: The KITTI Dataset. The International Journal of Robotics Research 2013, 32(11), 1231–1237.
