Description
After looking carefully at the HOG code and comparing the time needed to perform pedestrian detection in an image (I found it too slow), I would like to propose another version that is as fast as possible, even if it differs a little from the version originally proposed in academia. It would be nice to benchmark both versions in terms of F-score on a benchmark task. In any case, some of the ideas in the proposed version could probably be reused to rethink parts of the current HOG implementation, which could yield faster compute times.
After some discussion with @zygmuntszpak on Slack, I will start by outlining the different components needed to implement the standard HOG (a rough sketch in code follows the list):
1. Divide the window into adjacent, non-overlapping cells of 8 x 8 pixels.
2. For each cell, compute a histogram of the gradient orientations binned into B bins.
3. Group the cells into overlapping blocks of 2 x 2 cells (so each block covers 16 x 16 pixels).
4. Concatenate the four cell histograms in each block into a single block feature, and normalize the block feature by its Euclidean norm.
5. The resulting HOG feature is the concatenation of all block features within a specified window (e.g. 128 x 64).
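To make the steps concrete, here is a rough, self-contained Julia sketch of what steps 1–5 amount to for a single window. It operates on plain gradient matrices, the gradient computation itself is assumed to happen beforehand, and the helper names (`cell_histogram`, `hog_window`) are hypothetical, not the package's actual API:

```julia
using LinearAlgebra

# Step 2: histogram of gradient orientations for one cell (e.g. 8×8 pixels).
function cell_histogram(gx::AbstractMatrix, gy::AbstractMatrix, B::Int)
    hist = zeros(B)
    for i in eachindex(gx)
        mag = hypot(gx[i], gy[i])                      # gradient magnitude
        θ = mod(atan(gy[i], gx[i]), π)                 # unsigned orientation in [0, π)
        bin = clamp(floor(Int, θ / (π / B)) + 1, 1, B)
        hist[bin] += mag
    end
    return hist
end

# Steps 1–5 for a single window (e.g. 128×64), cell size 8, blocks of 2×2 cells.
function hog_window(gx, gy; cell = 8, B = 9)
    nrows, ncols = size(gx) .÷ cell
    cells = [cell_histogram(view(gx, (r-1)*cell+1:r*cell, (c-1)*cell+1:c*cell),
                            view(gy, (r-1)*cell+1:r*cell, (c-1)*cell+1:c*cell), B)
             for r in 1:nrows, c in 1:ncols]
    blocks = Vector{Float64}[]
    for r in 1:nrows-1, c in 1:ncols-1                 # overlapping 2×2 blocks
        b = vcat(cells[r, c], cells[r, c+1], cells[r+1, c], cells[r+1, c+1])
        push!(blocks, b ./ (norm(b) + eps()))          # L2 block normalization
    end
    return vcat(blocks...)                             # concatenated HOG feature
end
```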
Currently, the HOG code performs this process for a given input image and HOG() struct. This causes a basic problem for users who want to apply the descriptor to a 'big' image for object detection: a mix of redundant histogram computations (when windows overlap) and a lot of allocations (for each window, several arrays are created: gradients in the x and y directions, magnitudes, and orientations).
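To illustrate the problem, here is a hedged sketch of the sliding-window pattern described above, reusing the hypothetical `hog_window` from the previous sketch. Every window call allocates its own gradient arrays and recomputes cell histograms that overlapping windows share:

```julia
# Hypothetical per-window gradient computation: fresh allocations for every window.
function window_gradients(window::AbstractMatrix)
    gx = zeros(size(window))
    gy = zeros(size(window))
    gx[:, 2:end-1] .= (window[:, 3:end] .- window[:, 1:end-2]) ./ 2   # central differences
    gy[2:end-1, :] .= (window[3:end, :] .- window[1:end-2, :]) ./ 2   # (interior only)
    return gx, gy
end

function detect_naive(img::AbstractMatrix; win = (128, 64), stride = 8)
    descriptors = Vector{Vector{Float64}}()
    for r in 1:stride:size(img, 1)-win[1]+1
        for c in 1:stride:size(img, 2)-win[2]+1
            gx, gy = window_gradients(view(img, r:r+win[1]-1, c:c+win[2]-1))
            push!(descriptors, hog_window(gx, gy))     # redundant work where windows overlap
        end
    end
    return descriptors
end
```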
Fast HOG version 1
1. Same (8 x 8 cells).
2. Same (per-cell histogram with B bins).
3. Skip (no block grouping).
4. Skip (no block normalization).
5. The resulting HOG is a view of the cell features within a specified window.
Why skip steps 3 and 4?
Well, if we do not normalize the histograms, it seems a bit odd to keep the blocks: we would end up with the exact same cell histograms copied into different blocks, which is quite a lot of redundant information. With normalization it makes sense, since the normalization factor changes the "redundant" cells.
I will call the array made of the histograms a Hogmap, which might look like this:
C_11 C_12 C_13 C_14 ...
C_21 C_22 C_23 C_24 ...
C_31 C_32 C_33 C_34 ...
...
Where C_ij corresponds to a histogram with B bins.
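Here is a minimal sketch of how such a Hogmap could be built once per image, reusing the hypothetical `cell_histogram` helper from the first sketch. A window descriptor then becomes a zero-copy view into the cell array rather than a freshly allocated feature vector:

```julia
function hogmap(img::AbstractMatrix; cell = 8, B = 9)
    # Gradients computed once for the whole image.
    gx = zeros(size(img)); gy = zeros(size(img))
    gx[:, 2:end-1] .= (img[:, 3:end] .- img[:, 1:end-2]) ./ 2
    gy[2:end-1, :] .= (img[3:end, :] .- img[1:end-2, :]) ./ 2
    nrows, ncols = size(img) .÷ cell
    hmap = zeros(B, nrows, ncols)                      # C_ij stored along the first axis
    for r in 1:nrows, c in 1:ncols
        rows = (r-1)*cell+1:r*cell
        cols = (c-1)*cell+1:c*cell
        hmap[:, r, c] = cell_histogram(view(gx, rows, cols), view(gy, rows, cols), B)
    end
    return hmap
end

# A window descriptor is then a zero-copy view over the relevant cells,
# e.g. a 128×64 window anchored at cell (i, j):
# w = view(hmap, :, i:i+15, j:j+7)
```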
Hey, but this is not a HOG!
Well, it is a descriptor made from histograms of oriented gradients. It just does not normalize the different block regions, in order to get faster computation. I would like to test whether this actually incurs a large performance penalty. When the original HOG was proposed, no one (as far as I am aware) grew the training set online. We could do so to obtain samples with different illuminations in different regions, allowing the learning algorithm to learn to be invariant to such changes without the descriptor having to perform local normalizations.
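For illustration only, a tiny sketch of the kind of illumination augmentation meant here: scale a random region of a training sample by a random gain, so the learner sees local brightness changes that block normalization would otherwise absorb. All names and parameter ranges are made up for the example:

```julia
function augment_illumination(img::AbstractMatrix; gain = (0.5, 1.5))
    out = float.(img)                                  # work on a float copy
    h, w = size(out)
    r = rand(1:h÷2):rand(h÷2:h)                        # random sub-region
    c = rand(1:w÷2):rand(w÷2:w)
    out[r, c] .*= gain[1] + rand() * (gain[2] - gain[1])   # random local gain
    return clamp.(out, 0, 1)                           # assuming intensities in [0, 1]
end
```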