Waste Sorting, a Classification Problem


Predictive models and applications

Today’s industrial system is increasingly shifting towards recycled raw materials, and in this context recycling and waste sorting play a crucial role. The goal of this work is to approach waste sorting scientifically and to propose ideas and solutions showing how Data Science can help improve resource utilization. The work therefore addresses two research questions: “How can waste sorting be analyzed as a statistical classification problem?” and “What applications could be implemented using predictive models?”
To answer these questions, waste data was collected through three strategies: Web Scraping, a purpose-built Telegram Chatbot, and an Arduino-based bin designed and built to record waste data. Once data collection was complete, Deep Learning and Machine Learning algorithms were used to classify the waste as accurately as possible. These models were then combined through Ensemble Learning to obtain a stronger predictor, which was used to build two applications: a Chatbot that recognizes the type of waste from photos and an automatic waste sorting bin.
To address this problem, the work was structured as follows:

Conclusions

The purpose of this study was to use statistical classification as a framework for analyzing waste sorting, with the aim of evaluating which applications can be implemented using predictive models. To this end, a quantitative investigation was conducted using several data collection methods. Combining these methods led to more accurate and complete results in the model estimation phase. Building on different Machine Learning algorithms, it was then possible to design two useful applications for waste sorting.


The models trained in this study draw on both structured data and unstructured information, with the goal of building multiple "strong learners" and combining them through Ensemble Learning methods to obtain robust and accurate estimates.
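One Ensemble Learning method, stacking, feeds the class probabilities produced by several base learners into a second-level "meta-learner" that makes the final decision. The sketch below is a minimal illustration under stated assumptions, not the study's implementation: the two base learners (imagined here as an image model and a sensor model), the three categories and all probability values are hypothetical, and the meta-learner is a simple nearest-centroid classifier chosen only to keep the example dependency-free. In practice a logistic regression is a common meta-learner, and scikit-learn's `StackingClassifier` automates the cross-validated version of this procedure.

```python
from math import dist

# Probability vectors are ordered as [paper, plastic, glass] (hypothetical classes).


def stack_features(base_outputs):
    """Concatenate the class-probability vectors of all base learners."""
    return [p for probs in base_outputs for p in probs]


class NearestCentroidMeta:
    """Meta-learner: picks the class whose average stacked vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            # Column-wise mean of the stacked feature vectors for this class.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        # Euclidean distance to each class centroid; the nearest one wins.
        return min(self.centroids, key=lambda c: dist(x, self.centroids[c]))


# Hypothetical out-of-fold outputs of two base learners for six training items:
# ((image_model_probs, sensor_model_probs), true_label)
train = [
    (([0.8, 0.1, 0.1], [0.6, 0.3, 0.1]), "paper"),
    (([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]), "paper"),
    (([0.2, 0.7, 0.1], [0.1, 0.8, 0.1]), "plastic"),
    (([0.1, 0.6, 0.3], [0.2, 0.7, 0.1]), "plastic"),
    (([0.1, 0.2, 0.7], [0.1, 0.1, 0.8]), "glass"),
    (([0.2, 0.1, 0.7], [0.1, 0.2, 0.7]), "glass"),
]
X = [stack_features(outputs) for outputs, _ in train]
y = [label for _, label in train]
meta = NearestCentroidMeta().fit(X, y)

new_item = stack_features(([0.6, 0.3, 0.1], [0.7, 0.2, 0.1]))
print(meta.predict(new_item))  # paper
```

The base predictions used to train the meta-learner should come from data the base learners were not trained on (out-of-fold), otherwise the meta-learner learns from over-confident scores.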
As the chart below shows, the resulting models achieve high accuracy. In particular, the Stacking models exceed 90% accuracy, with similarly high precision in each individual waste category.
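Accuracy and per-category precision answer different questions: accuracy is the share of all items classified correctly, while the precision of a category is the share of items predicted as that category that truly belong to it. The following sketch computes both with the standard library only; the labels are made up for illustration and are not the study's results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted category equals the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_per_class(y_true, y_pred):
    """Precision of class c = correct predictions of c / all predictions of c.

    Classes that are never predicted are omitted, since their precision is undefined.
    """
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in predicted.items()}


# Hypothetical labels for eight items (one plastic item misread as paper).
y_true = ["paper", "paper", "plastic", "plastic", "glass", "glass", "paper", "glass"]
y_pred = ["paper", "paper", "plastic", "paper", "glass", "glass", "paper", "glass"]

print(accuracy(y_true, y_pred))             # 0.875
print(precision_per_class(y_true, y_pred))  # {'paper': 0.75, 'plastic': 1.0, 'glass': 1.0}
```

A single misclassification lowers overall accuracy slightly but shows up specifically as reduced precision for the category that absorbed the error, which is why reporting precision per waste category is informative.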



However thorough, this project inevitably overlooks some variables and information that should be taken into account to improve the models. In particular, organic waste could not be analyzed because of the hygiene problems intrinsic to this category. Furthermore, the enclosure inside the data-recording "robot" cannot provide complete and stable isolation, so the measured parameters may be influenced by external factors (such as ambient light and static electricity) and distort the predictions.
Despite these limitations, the project produced results that highlight the potential of Data Science in this field, and further analysis could lead to significant improvements.
A first improvement would be to enlarge the Validation Set created in Chapter 3, which would increase the external validity of the predictions.
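One concrete effect of a larger validation set is a tighter estimate of the model's accuracy. As a rough guide, the half-width of an approximately 95% normal-approximation (Wald) interval for an accuracy p measured on n items is 1.96·√(p(1−p)/n), so quadrupling n halves the uncertainty. The sketch below assumes a hypothetical accuracy of 0.9 for illustration; note that size alone does not guarantee representativeness, which also depends on how the validation items are sampled.

```python
from math import sqrt


def accuracy_margin(acc, n, z=1.96):
    """Half-width of the ~95% Wald interval for accuracy `acc` on `n` items."""
    return z * sqrt(acc * (1 - acc) / n)


# Hypothetical accuracy of 0.9 evaluated on validation sets of different sizes.
for n in (100, 400, 1000):
    print(n, round(accuracy_margin(0.9, n), 3))
# 100 0.059
# 400 0.029
# 1000 0.019
```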
Furthermore, using ResNet-152 or Inception-v4 for image recognition could significantly increase model accuracy.
In addition, techniques that are not strictly statistical could make the analysis more complete and raise accuracy further. These include:

  • Reinforcement Learning, to allow the applications to learn from their errors.
  • Amazon Web Services tools, used together with the classifiers developed here, to build new and more accurate Ensemble Learning models.
  • Reading the barcode of the product to be disposed of, when it is visible in the image.
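For the barcode idea, a decoder (not shown here) would first extract the digits from the image; validating the EAN-13 check digit then lets the application reject misreads before looking the product up. The sketch below assumes a 13-digit EAN code and uses a hypothetical `PRODUCT_CATEGORY` table and example code; in a real application that mapping would have to come from a product database, and an unknown or invalid code would fall back to the image classifier.

```python
def ean13_is_valid(code: str) -> bool:
    """Check the length, digits and check digit of an EAN-13 barcode string."""
    if len(code) != 13 or not all(ch in "0123456789" for ch in code):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits (left to right).
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


# Hypothetical mapping from product code to waste category (illustration only).
PRODUCT_CATEGORY = {"1234567890128": "plastic"}


def category_from_barcode(code: str):
    """Return the waste category for a decoded barcode, or None if invalid/unknown."""
    if not ean13_is_valid(code):
        return None
    return PRODUCT_CATEGORY.get(code)


print(category_from_barcode("1234567890128"))  # plastic
print(category_from_barcode("1234567890127"))  # None (wrong check digit)
```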

In conclusion, I believe that this work can be a good starting point for developing tools that help improve resource utilization.