The OpenVX Specification  dba1aa3

## Detailed Description

Extracts Histogram of Oriented Gradients features from the input grayscale image.

The Histogram of Oriented Gradients (HOG) vision function is split into two nodes, vxHOGCellsNode and vxHOGFeaturesNode. The specification of these nodes covers a subset of possible HOG implementations. The vxHOGCellsNode calculates the gradient orientation histograms and average gradient magnitudes for each of the cells. The vxHOGFeaturesNode uses the cell histograms and, optionally, the average gradient magnitudes of the cells to produce a HOG feature vector. This involves grouping the cell histograms into blocks, which are then normalized. A moving window is applied to the input image, and for each location the block data associated with the window is concatenated into the HOG feature vector.

## Data Structures

struct  vx_hog_t
The HOG descriptor structure.

## Functions

vx_node VX_API_CALL vxHOGCellsNode (vx_graph graph, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins)
[Graph] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

vx_node VX_API_CALL vxHOGFeaturesNode (vx_graph graph, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t *params, vx_size hog_param_size, vx_tensor features)
[Graph] The node produces HOG features for the W1xW2 window in a sliding window fashion over the whole input image. Each position produces a HOG feature vector.

vx_status VX_API_CALL vxuHOGCells (vx_context context, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins)
[Immediate] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

vx_status VX_API_CALL vxuHOGFeatures (vx_context context, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t *params, vx_size hog_param_size, vx_tensor features)
[Immediate] Computes Histogram of Oriented Gradients features for the W1xW2 window in a sliding window fashion over the whole input image.

## Data Structure Documentation

 struct vx_hog_t

The HOG descriptor structure.

Definition at line 1699 of file vx_types.h.

Data Fields

| Type | Field | Description |
|---|---|---|
| vx_int32 | cell_width | The histogram cell width of type VX_TYPE_INT32. |
| vx_int32 | cell_height | The histogram cell height of type VX_TYPE_INT32. |
| vx_int32 | block_width | The histogram block width of type VX_TYPE_INT32. Must be divisible by cell_width. |
| vx_int32 | block_height | The histogram block height of type VX_TYPE_INT32. Must be divisible by cell_height. |
| vx_int32 | block_stride | The histogram block stride within the window of type VX_TYPE_INT32. Must be an integral multiple of both cell_width and cell_height. |
| vx_int32 | num_bins | The histogram size of type VX_TYPE_INT32. |
| vx_int32 | window_width | The feature descriptor window width of type VX_TYPE_INT32. |
| vx_int32 | window_height | The feature descriptor window height of type VX_TYPE_INT32. |
| vx_int32 | window_stride | The feature descriptor window stride of type VX_TYPE_INT32. |
| vx_float32 | threshold | The threshold for the maximum L2-norm value for a histogram bin. It is used as part of block normalization. It defaults to 0.2. |
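
The divisibility constraints on these fields can be checked before the structure is handed to a node. The following is a minimal sketch rather than part of the OpenVX API: `hog_params_t` is a local stand-in that mirrors the fields of vx_hog_t so the example compiles without the OpenVX headers, and `hog_params_valid` is a hypothetical helper. The positivity checks and the window check (an integer number of cells per window, as required by vxHOGFeaturesNode) are assumptions about what a caller would want to verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for vx_hog_t (assumption: same fields, same order),
 * used so that this sketch does not depend on the OpenVX headers. */
typedef struct {
    int32_t cell_width, cell_height;
    int32_t block_width, block_height;
    int32_t block_stride;
    int32_t num_bins;
    int32_t window_width, window_height;
    int32_t window_stride;
    float   threshold;
} hog_params_t;

/* Hypothetical helper: returns 1 when the documented constraints hold,
 * 0 otherwise. */
static int hog_params_valid(const hog_params_t *p)
{
    assert(p != NULL);

    /* Assumption: all sizes, strides and the bin count must be positive. */
    if (p->cell_width <= 0 || p->cell_height <= 0 ||
        p->block_width <= 0 || p->block_height <= 0 ||
        p->block_stride <= 0 || p->num_bins <= 0 ||
        p->window_width <= 0 || p->window_height <= 0 ||
        p->window_stride <= 0)
        return 0;

    /* Blocks must be divisible by the cell size. */
    if (p->block_width % p->cell_width != 0) return 0;
    if (p->block_height % p->cell_height != 0) return 0;

    /* The block stride must be a multiple of both cell dimensions. */
    if (p->block_stride % p->cell_width != 0) return 0;
    if (p->block_stride % p->cell_height != 0) return 0;

    /* The window must contain an integer number of cells. */
    if (p->window_width % p->cell_width != 0) return 0;
    if (p->window_height % p->cell_height != 0) return 0;

    return 1;
}
```

With 8x8 cells, 16x16 blocks, a block stride of 8 and a 64x128 window, the check passes; changing block_width to 12 makes it fail because 12 is not divisible by the cell width.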

## Function Documentation

 vx_node VX_API_CALL vxHOGCellsNode ( vx_graph graph, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins )

[Graph] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

First, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two 1-D centred point discrete derivative masks are applied to the input image, one in the horizontal direction and one in the vertical direction.

$M_h = [-1, 0, 1]$

and

$M_v = [-1, 0, 1]^T$

$$G_v$$ is the result of applying mask $$M_v$$ to the input image, and $$G_h$$ is the result of applying mask $$M_h$$ to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.

$G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2}$

$\theta(x,y) = \arctan(G_v(x,y), G_h(x,y))$

where

$\arctan(v, h) = \begin{cases} \tan^{-1}(v/h) & \text{if } h \neq 0 \\ -\pi/2 & \text{if } v < 0 \text{ and } h = 0 \\ \pi/2 & \text{if } v > 0 \text{ and } h = 0 \\ 0 & \text{if } v = 0 \text{ and } h = 0 \end{cases}$

Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.

$magnitudes(c) = \frac{1}{cell\_width * cell\_height}\sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h)$

where $$G_c(w,h)$$ is the gradient magnitude of the pixel at location ($$w$$, $$h$$) in cell $$c$$. The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.

$bins(c, n) = \sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h) * 1[B_c(w, h, num\_bins) == n]$

where $$B_c$$ produces the histogram bin number from the gradient orientation of the pixel at location ($$w$$, $$h$$) in cell $$c$$ and the $$num\_bins$$ parameter, and

$1[B_c(w, h, num\_bins) == n]$

is an indicator function with value 1 when $$B_c(w, h, num\_bins) == n$$ and 0 otherwise.
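
The two cell-level sums can be sketched as a single pass over the pixels of one cell. This is only an illustration: `hog_cell_accumulate` is a hypothetical helper, it assumes the per-pixel magnitudes and bin numbers have already been computed into row-major arrays, and it works in float for readability, whereas the output tensors of the node are of type VX_TYPE_INT16.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulates the orientation histogram of cell (cell_x, cell_y) into
 * hist[0..num_bins-1] and returns the cell's average gradient magnitude.
 * mag and bin_idx are per-pixel, row-major arrays of width image_width
 * (assumed layout). */
static float hog_cell_accumulate(const float *mag, const int *bin_idx,
                                 int image_width,
                                 int cell_x, int cell_y,
                                 int cell_width, int cell_height,
                                 int num_bins, float *hist)
{
    assert(mag != NULL && bin_idx != NULL && hist != NULL);
    assert(cell_width > 0 && cell_height > 0 && num_bins > 0);

    for (int n = 0; n < num_bins; ++n)
        hist[n] = 0.0f;

    float sum = 0.0f;
    for (int h = 0; h < cell_height; ++h) {
        for (int w = 0; w < cell_width; ++w) {
            size_t i = (size_t)(cell_y * cell_height + h) * (size_t)image_width
                     + (size_t)(cell_x * cell_width + w);
            assert(bin_idx[i] >= 0 && bin_idx[i] < num_bins);
            hist[bin_idx[i]] += mag[i];   /* the vote is the magnitude */
            sum += mag[i];
        }
    }
    return sum / (float)(cell_width * cell_height);
}
```

For a single 2x2 cell with magnitudes {1, 2, 3, 4} and bin numbers {0, 1, 1, 2}, the histogram is {1, 5, 4} and the average magnitude is 2.5.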

Parameters
| Direction | Name | Description |
|---|---|---|
| [in] | graph | The reference to the graph. |
| [in] | input | The input image of type VX_DF_IMAGE_U8. |
| [in] | cell_width | The histogram cell width of type VX_TYPE_INT32. |
| [in] | cell_height | The histogram cell height of type VX_TYPE_INT32. |
| [in] | num_bins | The histogram size of type VX_TYPE_INT32. |
| [out] | magnitudes | (Optional) The output average gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16 of size $$[floor(image_{width}/cell_{width}), floor(image_{height}/cell_{height})]$$. |
| [out] | bins | The output gradient orientation histograms per cell of vx_tensor of type VX_TYPE_INT16 of size $$[floor(image_{width}/cell_{width}), floor(image_{height}/cell_{height}), num_{bins}]$$. |
Returns
vx_node.
Return values
| Value | Description |
|---|---|
| 0 | Node could not be created. |
| * | Node handle. |

 vx_node VX_API_CALL vxHOGFeaturesNode ( vx_graph graph, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t * params, vx_size hog_param_size, vx_tensor features )

[Graph] The node produces HOG features for the W1xW2 window in a sliding window fashion over the whole input image. Each position produces a HOG feature vector.

First, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalised by the average cell gradient magnitudes.

$bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}$

To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping them into larger, spatially connected blocks. Blocks are rectangular grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, meaning that each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms $$h$$ are grouped together to form a vector $$v = [h_1, h_2, h_3, ... , h_n]$$. This vector is normalised using L2-Hys: the vector is L2-normalized, the result is clipped so that no element of $$v$$ exceeds threshold, and the clipped vector is L2-normalized again. If the threshold is equal to zero, L2-Hys normalization is not performed.

$L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}$

where $$\|v\|_k$$ is the k-norm of $$v$$ for k=1, 2, and $$\epsilon$$ is a small constant.

For a specific window, its HOG descriptor is then the concatenated vector of the components of the normalized cell histograms from all of the block regions contained in the window. The W1xW2 window starts at coordinates (0, 0). If the input image has dimensions that are not an integer multiple of W1xW2 blocks with the specified stride, then the last positions that contain only a partial W1xW2 window are calculated with the remaining part of the W1xW2 window padded with zeroes. The W1xW2 window must also be sized so that it contains an integer number of cells; otherwise the node is not well-defined. The final output tensor contains as many HOG descriptors as there are windows in the input image. The output features tensor has 3 dimensions, given by:

$[\, floor((image_{width}-window_{width})/window_{stride}) + 1,$

$floor((image_{height}-window_{height})/window_{stride}) + 1,$

$floor((window_{width} - block_{width})/block_{stride} + 1) * floor((window_{height} - block_{height})/block_{stride} + 1) * ((block_{width} * block_{height}) / (cell_{width} * cell_{height})) * num_{bins} \,]$

See vxCreateTensor and vxCreateVirtualTensor. We recommend the output tensors always be virtual objects, with this node connected directly to the classifier. The output tensor will be very large, and using non-virtual tensors will result in a poorly optimized implementation. Merging of this node with a classifier node such as that described in the classifier extension will result in better performance. Notice that this node creation function has more parameters than the corresponding kernel. Numbering of kernel parameters (required if you create this node using the generic interface) is explicitly specified here.
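
The three dimensions of the features tensor given above can be computed ahead of tensor creation. The sketch below is illustrative only: `hog_feature_dims` is a hypothetical helper, not an OpenVX function, and it relies on C integer division being equal to floor for non-negative operands, which the assertions enforce.

```c
#include <assert.h>
#include <stddef.h>

/* Fills dims[0..2] with the features tensor dimensions from the formula
 * in the text. All arguments are in pixels except num_bins. */
static void hog_feature_dims(int image_width, int image_height,
                             int cell_width, int cell_height,
                             int block_width, int block_height,
                             int block_stride, int num_bins,
                             int window_width, int window_height,
                             int window_stride, size_t dims[3])
{
    /* Non-negative numerators make integer division behave like floor. */
    assert(cell_width > 0 && cell_height > 0 && num_bins > 0);
    assert(block_stride > 0 && window_stride > 0);
    assert(image_width >= window_width && image_height >= window_height);
    assert(window_width >= block_width && window_height >= block_height);

    int blocks_x = (window_width - block_width) / block_stride + 1;
    int blocks_y = (window_height - block_height) / block_stride + 1;
    int cells_per_block = (block_width * block_height)
                        / (cell_width * cell_height);

    dims[0] = (size_t)((image_width - window_width) / window_stride + 1);
    dims[1] = (size_t)((image_height - window_height) / window_stride + 1);
    dims[2] = (size_t)blocks_x * (size_t)blocks_y
            * (size_t)cells_per_block * (size_t)num_bins;
}
```

For a 640x480 image with 8x8 cells, 16x16 blocks, a block stride of 8, 9 bins, and a 64x128 window with a window stride of 8, this yields 73 x 45 window positions with 7 * 15 * 4 * 9 = 3780 values per descriptor.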

Parameters
| Direction | Name | Description |
|---|---|---|
| [in] | graph | The reference to the graph. |
| [in] | input | The input image of type VX_DF_IMAGE_U8. (Kernel parameter #0) |
| [in] | magnitudes | (Optional) The gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16. It is the output of vxHOGCellsNode. (Kernel parameter #1) |
| [in] | bins | The gradient orientation histograms per cell of vx_tensor of type VX_TYPE_INT16. It is the output of vxHOGCellsNode. (Kernel parameter #2) |
| [in] | params | The parameters of type vx_hog_t. (Kernel parameter #3) |
| [in] | hog_param_size | Size of vx_hog_t in bytes. Note that this parameter is not counted as one of the kernel parameters. |
| [out] | features | The output HOG features of vx_tensor of type VX_TYPE_INT16. (Kernel parameter #4) |
Returns
vx_node.
Return values
| Value | Description |
|---|---|
| 0 | Node could not be created. |
| * | Node handle. |

 vx_status VX_API_CALL vxuHOGCells ( vx_context context, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins )

[Immediate] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

First, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two 1-D centred point discrete derivative masks are applied to the input image, one in the horizontal direction and one in the vertical direction.

$M_h = [-1, 0, 1]$

and

$M_v = [-1, 0, 1]^T$

$$G_v$$ is the result of applying mask $$M_v$$ to the input image, and $$G_h$$ is the result of applying mask $$M_h$$ to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.

$G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2}$

$\theta(x,y) = \arctan(G_v(x,y), G_h(x,y))$

where

$\arctan(v, h) = \begin{cases} \tan^{-1}(v/h) & \text{if } h \neq 0 \\ -\pi/2 & \text{if } v < 0 \text{ and } h = 0 \\ \pi/2 & \text{if } v > 0 \text{ and } h = 0 \\ 0 & \text{if } v = 0 \text{ and } h = 0 \end{cases}$

Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.

$magnitudes(c) = \frac{1}{cell\_width * cell\_height}\sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h)$

where $$G_c(w,h)$$ is the gradient magnitude of the pixel at location ($$w$$, $$h$$) in cell $$c$$. The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.

$bins(c, n) = \sum\limits_{w=0}^{cell\_width-1} \sum\limits_{h=0}^{cell\_height-1} G_c(w,h) * 1[B_c(w, h, num\_bins) == n]$

where $$B_c$$ produces the histogram bin number from the gradient orientation of the pixel at location ($$w$$, $$h$$) in cell $$c$$ and the $$num\_bins$$ parameter, and

$1[B_c(w, h, num\_bins) == n]$

is an indicator function with value 1 when $$B_c(w, h, num\_bins) == n$$ and 0 otherwise.

Parameters
| Direction | Name | Description |
|---|---|---|
| [in] | context | The reference to the overall context. |
| [in] | input | The input image of type VX_DF_IMAGE_U8. |
| [in] | cell_width | The histogram cell width of type VX_TYPE_INT32. |
| [in] | cell_height | The histogram cell height of type VX_TYPE_INT32. |
| [in] | num_bins | The histogram size of type VX_TYPE_INT32. |
| [out] | magnitudes | The output average gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16 of size $$[floor(image_{width}/cell_{width}), floor(image_{height}/cell_{height})]$$. |
| [out] | bins | The output gradient orientation histograms per cell of vx_tensor of type VX_TYPE_INT16 of size $$[floor(image_{width}/cell_{width}), floor(image_{height}/cell_{height}), num_{bins}]$$. |
Returns
A vx_status_e enumeration.
Return values
| Value | Description |
|---|---|
| VX_SUCCESS | Success |
| * | An error occurred. See vx_status_e. |

 vx_status VX_API_CALL vxuHOGFeatures ( vx_context context, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t * params, vx_size hog_param_size, vx_tensor features )

[Immediate] Computes Histogram of Oriented Gradients features for the W1xW2 window in a sliding window fashion over the whole input image.

First, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalised by the average cell gradient magnitudes.

$bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}$

To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping them into larger, spatially connected blocks. Blocks are rectangular grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, meaning that each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms $$h$$ are grouped together to form a vector $$v = [h_1, h_2, h_3, ... , h_n]$$. This vector is normalised using L2-Hys: the vector is L2-normalized, the result is clipped so that no element of $$v$$ exceeds threshold, and the clipped vector is L2-normalized again. If the threshold is equal to zero, L2-Hys normalization is not performed.

$L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}$

where $$\|v\|_k$$ is the k-norm of $$v$$ for k=1, 2, and $$\epsilon$$ is a small constant.

For a specific window, its HOG descriptor is then the concatenated vector of the components of the normalized cell histograms from all of the block regions contained in the window. The W1xW2 window starts at coordinates (0, 0). If the input image has dimensions that are not an integer multiple of W1xW2 blocks with the specified stride, then the last positions that contain only a partial W1xW2 window are calculated with the remaining part of the W1xW2 window padded with zeroes. The W1xW2 window must also be sized so that it contains an integer number of cells; otherwise the function is not well-defined. The final output tensor contains as many HOG descriptors as there are windows in the input image. The output features tensor has 3 dimensions, given by:

$[\, floor((image_{width}-window_{width})/window_{stride}) + 1,$

$floor((image_{height}-window_{height})/window_{stride}) + 1,$

$floor((window_{width} - block_{width})/block_{stride} + 1) * floor((window_{height} - block_{height})/block_{stride} + 1) * ((block_{width} * block_{height}) / (cell_{width} * cell_{height})) * num_{bins} \,]$

See vxCreateTensor and vxCreateVirtualTensor. The output tensor from this function may be very large. For this reason, it is not recommended that this "immediate mode" version of the function be used. The preferred method to perform this function is as a graph node with a virtual tensor as the output.

Parameters
| Direction | Name | Description |
|---|---|---|
| [in] | context | The reference to the overall context. |
| [in] | input | The input image of type VX_DF_IMAGE_U8. |
| [in] | magnitudes | The average gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16. It is the output of vxuHOGCells. |
| [in] | bins | The gradient orientation histograms per cell of vx_tensor of type VX_TYPE_INT16. It is the output of vxuHOGCells. |
| [in] | params | The parameters of type vx_hog_t. |
| [in] | hog_param_size | Size of vx_hog_t in bytes. |
| [out] | features | The output HOG features of vx_tensor of type VX_TYPE_INT16. |
Returns
A vx_status_e enumeration.
Return values
| Value | Description |
|---|---|
| VX_SUCCESS | Success |
| * | An error occurred. See vx_status_e. |