Main tasks in Computer Vision:
- Regression: the output variable takes continuous values, e.g. the distance to a target.
- Classification: the output variable takes class labels, e.g. the probability of belonging to a class.
1. What is computer vision?
2. Learning visual features
<!-- Convolution and Padding -->
<!-- Filters, Strides, and Channels -->
3. Convolutional Neural Networks
4. Building and Training CNNs
5. Applications of CNNs
We want to build computer systems able not only to see what is present in the world, but also to predict and anticipate events.
In particular, deep learning enables automatic feature extraction, something that before deep neural networks required substantial human involvement.
To a computer, of course, images are just numbers.
An (RGB) image is just an NxNx3 array of values in [0, 255].
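As a minimal illustration in R (a random array stands in for a real image; the 28x28 size is an assumption for the example):

# a 28x28 RGB image as a 28x28x3 array of intensities in [0, 255]
img <- array(sample(0:255, 28 * 28 * 3, replace = TRUE), dim = c(28, 28, 3))
dim(img)     # 28 28 3  -> height, width, colour channels
img[1, 1, ]  # R, G, B values of the top-left pixel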
Each image is characterized by a different set of features.
Before attempting to build a computer vision system,
we need to be aware of which key features in our data need to be identified and detected.
Notice also that feature characterization needs to define a hierarchy of features that allows an increasing level of detail:
Head -> Eyes/Mouth/Nose/… -> …
Filters can be used to extract local features
Different features can be extracted with different filters.
Features that matter in one part of the input should matter elsewhere too, so the same filter is applied across the entire image (weight sharing).
The key question is how to define an operation that takes a filter (a small patch of weights)
and decides whether that patch is present in the image.
This operation is the convolution.
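A minimal sketch of this idea in plain R, sliding a 3x3 filter over a grayscale image to produce a feature map (the vertical-edge filter and image size are illustrative choices):

convolve2d <- function(image, kernel) {
  k <- nrow(kernel)
  out_h <- nrow(image) - k + 1
  out_w <- ncol(image) - k + 1
  feature_map <- matrix(0, out_h, out_w)
  for (p in 1:out_h) {
    for (q in 1:out_w) {
      patch <- image[p:(p + k - 1), q:(q + k - 1)]
      feature_map[p, q] <- sum(patch * kernel)  # elementwise product, then sum
    }
  }
  feature_map
}

img    <- matrix(runif(28 * 28), 28, 28)               # stand-in for a grayscale image
kernel <- matrix(c(1, 0, -1,
                   1, 0, -1,
                   1, 0, -1), nrow = 3, byrow = TRUE)  # responds to vertical edges
fmap <- convolve2d(img, kernel)                        # 26 x 26 feature map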
1. Convolution: apply filters to generate feature maps.
2. Non-linearity: e.g. ReLU, to deal with non-linear data.
3. Pooling: downsampling operations on feature maps.
For each neuron \((p, q)\) in the hidden layer, the output is computed by applying the shared filter weights to the corresponding local patch of the input, adding a bias, and passing the result through the non-linearity:
\[ h_{p,q} = \sigma\!\left( \sum_{i=1}^{k} \sum_{j=1}^{k} w_{i,j}\, x_{p+i-1,\, q+j-1} + b \right) \]
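As a small sketch of this computation for one neuron, assuming a 3x3 receptive field and ReLU as \(\sigma\) (all values here are illustrative):

relu <- function(z) pmax(z, 0)

img <- matrix(runif(28 * 28), 28, 28)  # input image (stand-in)
W   <- matrix(rnorm(9), 3, 3)          # 3x3 filter weights, shared by all hidden neurons
b   <- 0.1                             # bias

p <- 5; q <- 7                                        # neuron position in the hidden layer
h_pq <- relu(sum(W * img[p:(p + 2), q:(q + 2)]) + b)  # weighted sum over the patch + bias, then ReLU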
Pooling downsamples feature maps, reducing their spatial dimensions while retaining the essential information (a small sketch follows the list below).
Key objectives of pooling in CNNs:
- Dimensionality reduction
- Translation invariance
- Robustness to variations
- Extraction of salient features
- Spatial hierarchy
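A minimal sketch of 2x2 max pooling in plain R (illustrative, not the keras implementation):

max_pool_2x2 <- function(feature_map) {
  out_h <- nrow(feature_map) %/% 2
  out_w <- ncol(feature_map) %/% 2
  pooled <- matrix(0, out_h, out_w)
  for (i in 1:out_h) {
    for (j in 1:out_w) {
      block <- feature_map[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)]
      pooled[i, j] <- max(block)  # keep the strongest activation in each 2x2 block
    }
  }
  pooled
}

fmap   <- matrix(runif(26 * 26), 26, 26)  # a feature map, e.g. the output of a convolution
pooled <- max_pool_2x2(fmap)              # 13 x 13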
To build and train a CNN with keras in R, the input images are first reshaped into the tensor format expected by the network with the array_reshape() function, and the class labels are one-hot encoded with the to_categorical() function.
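A minimal sketch of this preprocessing, assuming MNIST-style 28x28 grayscale images loaded with dataset_mnist() (the dataset choice and the variable names x_train / y_train are assumptions for the example):

library(keras)

num_classes <- 10
input_shape <- c(28, 28, 1)

mnist   <- dataset_mnist()      # illustrative dataset choice
x_train <- mnist$train$x / 255  # rescale pixel values to [0, 1]
y_train <- mnist$train$y        # integer labels 0-9

x_train <- array_reshape(x_train, c(dim(x_train)[1], input_shape))  # (n, 28, 28, 1)
y_train <- to_categorical(y_train, num_classes)                     # one-hot encoded labels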
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 16,                    # 16 filters of size 3x3
                kernel_size = c(3, 3),
                activation = 'relu',
                input_shape = input_shape) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%  # downsample feature maps by a factor of 2
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%                            # flatten to a vector for the dense layers
  layer_dense(units = 10,
              activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes,
              activation = 'softmax')            # output class probabilities
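The resulting architecture can be inspected by printing the model, e.g. with summary(model):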
Model: "sequential"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
conv2d (Conv2D) (None, 26, 26, 16) 160
max_pooling2d (MaxPooling2D) (None, 13, 13, 16) 0
dropout_1 (Dropout) (None, 13, 13, 16) 0
flatten (Flatten) (None, 2704) 0
dense_1 (Dense) (None, 10) 27050
dropout (Dropout) (None, 10) 0
dense (Dense) (None, 10) 110
================================================================================
Total params: 27,320
Trainable params: 27,320
Non-trainable params: 0
________________________________________________________________________________
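A minimal sketch of how this model could then be compiled and trained with keras (the loss, optimizer, and hyperparameters here are illustrative choices; x_train, y_train, x_test, and y_test are assumed to be preprocessed as above):

model %>% compile(
  loss = "categorical_crossentropy",  # matches the one-hot labels and softmax output
  optimizer = optimizer_adam(),
  metrics = "accuracy"
)

history <- model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 10,
  validation_split = 0.2
)

model %>% evaluate(x_test, y_test)  # test data assumed to be preprocessed like the training data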