Convolutional neural networks (CNNs) are a class of Deep Artificial Neural Networks that are commonly applied to image processing tasks, such as object detection and identification.
CNNs share the underlying structure with the MLPs seen previously, in fact:
MLPs can only work with images if they are first converted into a vector of pixel values. The problem with this transformation is the potential loss of spatial integrity, as information about how pixels combine with each other could be lost.
Convolutional neural networks maintain the spatial integrity of input images because they can extract information from a data matrix through convolutional filters; moreover, they can analyze color images (thus in three dimensions) by processing each channel individually, but keeping them grouped as the same input. In practice, convolutional filters are a set of weights that are applied to the pixel values in our input image. These weights are learned and refined through backpropagation during the training phase.
The use of convolutional filters in CNNs is inspired by the work of Hubel and Wiesel, two neuroscientists who studied how images trigger neuronal activation in the visual cortex. In their study, they suggested that visual processing is a hierarchical process that begins with the identification of basic features that are then combined into more complex constructs. Similarly, filters in CNNs allow the neural network to identify basic features and highlight them; these can then be forwarded for further processing, but must first be corrected by an activation function.
The filtering process, as can be seen in the example in Figure 2.16, consists of sliding a convolutional filter over an image and generating a filtered version of the image.
Esempio applicazione filtro convoluzionale
Convolutional filters can enhance and suppress specific features present in input images (for example curves, edges, or colors). Filters within a CNN can mainly differ in 4 characteristics:
Applicazione di un filtro convoluzionale con imbottitura
Applicazione di un filtro convoluzionale senza Padding
In a convolutional neural network, the inputs \(I\) are images, thus matrices in three dimensions. Formally \(I\) has dimension \(H \times W \times C\) where $H W $ are the pixels while \(C\) are the channels, therefore:
\[I \in \mathbb{R}^{H \times W \times C}\]
Assuming a filter \(K \in \mathbb{R}^{k_1 \times k_2 \times C \times D}\) and a bias \(b \in \mathbb{R}^{D}\) the output of the convolutional procedure is:
\[(I \ast K)_{ij} = \sum_{m = 0}^{k_1 - 1} \sum_{n = 0}^{k_2 - 1} \sum_{c = 1}^{C} K_{m,n,c} \cdot I_{i+m, j+n, c} + b \]
The weights and bias are updated through a forward-propagation and back-propagation procedure, generally inserting a fully connected MLP at the end of the filters.
In CNNs, the most common activation function is ReLU, because it ensures that only nodes with a positive activation send their values forward, which guarantees that:
Un attributo comune nelle reti neurali convoluzionali è quello del Pooling, esemplificato in Figura 2.18. Il pooling avviene normalmente dopo che le mappe delle funzionalità sono state passate tramite la funzione di attivazione ReLU. L’obiettivo del pooling è ridurre le dimensioni della matrice dell’immagine senza perdita di informazioni. A sua volta, questa operazione, riduce la quantità di elaborazione richiesta, risparmiando tempo e risorse nella fase di formazione. Le varietà di pooling più comuni sono:
Esempio Pooling
Un altro importante elemento costitutivo della CNN è la normalizzazione (normalization batch). Per aumentare la stabilità di una rete neurale si sottrae dai valori in input la loro media batch e successivamente si dividono per la deviazione standard. La normalizzazione viene, di norma, attuata prima di un livello di attivazione ed è utilizzata per velocizzare il processo di riduzione al minimo la funzione di perdita.
\[ \]
\[ \mu_B = \frac{1}{m} \sum_{i=1}^m x_i \]
\[ \sigma_B^2 = \frac{1}{m} \sum_{i=1}^m (x_i-\mu_B)^2\]
\[ \hat{x_i} = \frac{x_i - \mu_B}{\sqrt{\sigma^2_B + \epsilon}} \]
\[ y_i = \gamma \hat x_i + \beta == BN_{\gamma,\beta}(x_i) \]
\[ \]
L’offset \(\beta\) e lo scale factor \(\gamma\) sono parametri apprendibili che vengono aggiornati nella back-propagation come fossero normali pesi. Se si imposta la \(\gamma\) come 1 e \(\beta\) come 0 l’intero processo è solo standardizzazione.