Waste Sorting

Today’s industrial system is increasingly shifting towards recycled raw materials. In this context, recycling and waste sorting play a crucial role. The goal of this work is to address the problem of waste sorting with a scientific approach and to propose ideas and solutions on how Data Science can help improve resource utilization. The work therefore poses two research questions: “How can waste sorting be analyzed as a statistical classification problem?” and “What applications could be implemented using predictive models?”
To answer these questions, waste data were collected through three strategies: Web Scraping, a purpose-built Telegram Chatbot, and an Arduino-based bin designed and built to record waste data. Once the data collection phase was completed, Deep Learning and Machine Learning algorithms were trained to classify the waste. These models were then combined through Ensemble Learning mechanisms to obtain a better predictor, with which two applications were built: a Chatbot that recognizes the type of waste from photos and an automatic waste sorting bin.
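The Web Scraping step relies on HTML parsing: raw HTML is interpreted tag by tag so that the image links it contains can be collected. The following is a minimal sketch of that idea, not the scraper used in this work. It uses Python's standard `html.parser`, which is event-based and does not build a full DOM tree; the class name `ImageCollector`, the function `extract_image_urls`, the sample page, and the `example.com` URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect absolute image URLs while the parser walks the HTML tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Only <img> tags carry the pictures we want to download later.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return every image URL found in an HTML string."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls


# Hypothetical search-result page for one waste category.
SAMPLE_HTML = """
<html><body>
  <h1>plastic bottle</h1>
  <img src="/img/bottle_01.jpg" alt="bottle">
  <img src="https://example.com/img/bottle_02.jpg">
  <img alt="missing source">
</body></html>
"""

urls = extract_image_urls(SAMPLE_HTML, "https://example.com/search?q=bottle")
for url in urls:
    print(url)
```

In a real collection run, each URL would then be downloaded and stored under the waste category that was searched for.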
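To address this problem, the work was structured in three parts: “Waste Sorting, a Classification Problem”, “Predictive models and applications”, and “Conclusions”.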

Conclusions

The purpose of this study was to use statistical classification as a framework to analyze the problem of waste sorting, with the aim of evaluating which applications can be implemented using predictive models. To this end, a quantitative investigation was conducted using three data collection methods, whose combination led to more accurate and complete results in the model estimation phase. Subsequently, by leveraging different Machine Learning algorithms, it was possible to design two useful applications for waste sorting.


{ "class": "go.GraphLinksModel",
  "nodeDataArray": [
{
  "key":1, "pos":"-180 -57", "icon":"Web", "iconWidth": 40, "iconHeight": 40, "portHeight": 20,
  "text":"Web\nScraping",
  "description": "Among the many possible solutions, the Web Scraping technique called HTML Parsing was chosen, which is a deserialization of HTML pages. This process receives raw HTML code, interprets it, and generates a DOM (Document Object Model) tree structure from the code. With this technique, 41042 images divided into 129 categories were collected.",
  "caption":"Web Scraping",
  "imgsrc":"http://webdata-scraping.com/media/2016/04/web_scraping_spider.png"
},
{
  "key":2, "pos":"-180 100", "icon":"telegram", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"Telegram\nChatbot",
  "description": "The Bot implemented in this section receives images labeled by waste category, saves them, and returns a thank you message to the sender. With this data collection technique, 1348 photos of different types of waste were collected.",
  "caption":"Telegram Chatbot",
  "imgsrc":"https://gioditcom.files.wordpress.com/2017/04/telegram-marketing.png"
},
{
  "key":3, "pos":"-180 250", "icon":"rbin", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"Automatic\nRecycling\nBin",
  "description": "As a final data source, a bin was built to also collect structured data on waste. This third data collection method was chosen to obtain a very precise dataset about the different types of waste. Building the bin required mechanical work to make it structurally functional; robotic work, as the bin had to move and send sensor data to the computer; and software work to write a program capable of receiving and saving the data sent by the bin.",
  "caption":"title",
  "imgsrc":"https://images-na.ssl-images-amazon.com/images/I/51nYpnkWBdL._SL500_AC_SS350_.jpg"
},
{
  "key":4, "pos":"-80 -57", "icon":"ML", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"ResNet1",
  "description": "ResNet is a CNN architecture from Microsoft Research designed by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. This architecture was created in 2015 and won the ImageNet 2015 competition.",
  "caption":"title",
  "imgsrc":"https://zdnet4.cbsistatic.com/hub/i/r/2018/08/07/828a6f3b-a1b8-4030-9a41-2907daada863/resize/370xauto/0a6fbd6c37fed9bc4d2ee15667255a14/ml-recorded-future.png"
},
{
  "key":5, "pos":"-80 170", "icon":"ML", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"ResNet2",
  "description": "ResNet is a CNN architecture from Microsoft Research designed by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. This architecture was created in 2015 and won the ImageNet 2015 competition.",
  "caption":"title",
  "imgsrc":"https://zdnet4.cbsistatic.com/hub/i/r/2018/08/07/828a6f3b-a1b8-4030-9a41-2907daada863/resize/370xauto/0a6fbd6c37fed9bc4d2ee15667255a14/ml-recorded-future.png"
},
{
  "key":7, "pos":"-80 300", "icon":"ML", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"GBM",
  "description": "Gradient Boosting is an implementation of the Boosting algorithm, from which it inherits the logic of building a Strong Learner out of weak learners. The models used are generally decision trees and its purpose is to minimize a generic cost function. The algorithm starts by training a weak learner on the training data and then fits each new learner to the errors of the current ensemble (the negative gradient of the cost function), adding it with its contribution scaled by a shrinkage parameter. This process is repeated until a stopping rule is reached.",
  "caption":"title",
  "imgsrc":"https://zdnet4.cbsistatic.com/hub/i/r/2018/08/07/828a6f3b-a1b8-4030-9a41-2907daada863/resize/370xauto/0a6fbd6c37fed9bc4d2ee15667255a14/ml-recorded-future.png"
},
{
  "key":10, "pos":"-80 400", "icon":"ML", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"MLP",
  "description": "A Multi-Layer Perceptron (MLP) is a neural network that combines multiple processing layers of artificial neurons. It consists of an input layer, one or more hidden layers, and an output layer. The neurons of adjacent layers are interconnected, with each layer using the output of the previous layer as input.",
  "caption":"title",
  "imgsrc":"https://zdnet4.cbsistatic.com/hub/i/r/2018/08/07/828a6f3b-a1b8-4030-9a41-2907daada863/resize/370xauto/0a6fbd6c37fed9bc4d2ee15667255a14/ml-recorded-future.png"
},
{
  "key":8, "pos":"200 0", "icon":"telegram", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"Telegram application",
  "description": "The first application examined is the implementation of a Chatbot. The software used as a platform is the messaging application Telegram. The implementation consists of a program that receives photos as input and returns the corresponding waste category of the object in the image as output.",
  "caption":"title",
  "imgsrc":"https://gioditcom.files.wordpress.com/2017/04/telegram-marketing.png"
},
{
  "key":9, "pos":"180 230", "icon":"rbin", "iconWidth": 40, "iconHeight": 60, "portHeight": 20,
  "text":"Automatic Recycling bin application",
  "description": "The second application examined consists of a bin capable of performing automatic waste sorting.",
  "caption":"title",
  "imgsrc":"https://images-na.ssl-images-amazon.com/images/I/51nYpnkWBdL._SL500_AC_SS350_.jpg"
}
],
  "linkDataArray": [
{"from":1, "to":4 },
{"from":2, "to":5, "label": "APG" },
{"from":3, "to":5 },
{"from":3, "to":7, "toSpot":"bottom" },
{"from":3, "to":10, "toSpot":"bottom" },
{"from":4, "to":8 },
{"from":5, "to":8 },
{"from":4, "to":9, "fromSpot":"rightsingle", "color":"orange"  },
{"from":5, "to":9, "fromSpot":"rightsingle", "color":"orange" }, 
{"from":7, "to":9, "fromSpot":"rightsingle", "color":"orange" }
 ]}
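
The boosting loop behind the GBM node can be illustrated with a small example: each round fits a weak learner to the errors left by the current ensemble and adds it with a shrinkage factor. The sketch below is a simplification under stated assumptions, not the model trained in this study: it uses squared-error loss (so the errors are plain residuals), decision stumps on a single feature, and invented toy data. The names `fit_stump`, `gradient_boost`, and `mse` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals."""
    best = None
    # Try every threshold that leaves at least one point on each side.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, shrinkage=0.1):
    """Return a model built by adding shrunken stumps round after round."""
    base = sum(ys) / len(ys)  # the initial prediction is the mean target
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + shrinkage * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + shrinkage * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented toy data: one sensor reading per item and a numeric target.
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

mse_baseline = mse(gradient_boost(XS, YS, n_rounds=0), XS, YS)
mse_one_round = mse(gradient_boost(XS, YS, n_rounds=1), XS, YS)
mse_fifty_rounds = mse(gradient_boost(XS, YS, n_rounds=50), XS, YS)
print(mse_baseline, mse_one_round, mse_fifty_rounds)
```

The training error shrinks as rounds are added; a small shrinkage value slows this down on purpose, which in practice tends to reduce overfitting at the cost of more rounds.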

The models trained in this study leverage both structured data and unstructured information, with the goal of creating multiple "Strong Learners" and combining them through Ensemble Learning methods to obtain robust and accurate estimates.
The models created, as can be observed in the chart below, have high accuracy. In particular, it can be noted that the Stacking models created have an accuracy above 90% and equally high precision in individual waste categories.
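
The idea behind Stacking can be sketched as follows: the predictions of the base models become the input features of a second-level model (the meta-learner), which learns how to combine them. This is a minimal illustration under several assumptions, not the ensemble built in this study: the task is reduced to a binary one (plastic or not), the base-model probabilities are invented, the meta-learner is a logistic regression trained by gradient descent, and the accuracy is measured on the same toy data rather than on a held-out set. All names are hypothetical.

```python
import math

# Invented out-of-fold probabilities that an item is plastic, as they might
# come from two base models: (image model, sensor model).
BASE_PROBS = [
    (0.9, 0.8), (0.8, 0.4), (0.4, 0.9), (0.7, 0.7),  # true label 1
    (0.2, 0.1), (0.6, 0.1), (0.2, 0.6), (0.1, 0.3),  # true label 0
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def meta_score(weights, bias, row):
    """Probability assigned by the meta-learner to one row of base outputs."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_meta_learner(rows, labels, learning_rate=0.5, epochs=2000):
    """Logistic regression on base-model outputs, by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, y in zip(rows, labels):
            error = meta_score(weights, bias, row) - y
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


weights, bias = fit_meta_learner(BASE_PROBS, LABELS)

# Each base model alone, thresholded at 0.5, versus the stacked combination.
image_only = [int(p_img >= 0.5) for p_img, _ in BASE_PROBS]
sensor_only = [int(p_sen >= 0.5) for _, p_sen in BASE_PROBS]
stacked = [int(meta_score(weights, bias, row) >= 0.5) for row in BASE_PROBS]

print(accuracy(image_only, LABELS), accuracy(sensor_only, LABELS),
      accuracy(stacked, LABELS))
```

In this toy data the two base models make mistakes on different items, which is the situation in which a meta-learner has something to gain from combining them.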

Although thorough, this project inevitably overlooks some variables and information that should be considered to improve the models. In particular, organic waste could not be analyzed because of the intrinsic hygiene problems of this waste category. Furthermore, the environment inside the data recording "robot" cannot be completely and stably isolated. For this reason, the recorded parameters could be influenced by external elements (such as ambient light and static electricity) and distort the predictions.
Despite these limitations, the project has produced results that highlight the potential of Data Science. In this context, further analysis could lead to significant improvements.
A first improvement could be to increase the size of the Validation Set created in Chapter 3 to increase the external validity of the prediction.
Furthermore, the use of ResNet 152 or Inception V4 for image recognition could lead to a significant increase in model accuracy.
In addition, processes that are not strictly statistical could make the analysis more complete and achieve higher accuracy. Among them are:

The use of Reinforcement Learning to allow applications to learn from errors.
The use of Amazon Web Services tools alongside the created classifiers to build new, more accurate Ensemble Learning models.
Reading, when present in the image, the barcode of the product to be disposed of.

In conclusion, I believe that this work can be a good starting point for developing tools that help improve resource utilization.
