The OpenVX Specification
dba1aa3

Extracts Histogram of Oriented Gradients features from the input grayscale image.
The Histogram of Oriented Gradients (HOG) vision function is split into two nodes, vxHOGCellsNode and vxHOGFeaturesNode. The specification of these nodes covers a subset of possible HOG implementations. The vxHOGCellsNode calculates the gradient orientation histograms and average gradient magnitudes for each of the cells. The vxHOGFeaturesNode uses the cell histograms and, optionally, the average gradient magnitude of the cells to produce a HOG feature vector. This involves grouping the cell histograms into blocks, which are then normalized. A moving window is applied to the input image, and for each location the block data associated with the window is concatenated to the HOG feature vector.
Data Structures  
struct  vx_hog_t 
The HOG descriptor structure.
Functions  
vx_node VX_API_CALL  vxHOGCellsNode (vx_graph graph, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins) 
[Graph] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.
vx_node VX_API_CALL  vxHOGFeaturesNode (vx_graph graph, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t *params, vx_size hog_param_size, vx_tensor features) 
[Graph] The node produces HOG features for the W1xW2 window in a sliding window fashion over the whole input image. Each position produces a HOG feature vector.
vx_status VX_API_CALL  vxuHOGCells (vx_context context, vx_image input, vx_int32 cell_width, vx_int32 cell_height, vx_int32 num_bins, vx_tensor magnitudes, vx_tensor bins) 
[Immediate] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.
vx_status VX_API_CALL  vxuHOGFeatures (vx_context context, vx_image input, vx_tensor magnitudes, vx_tensor bins, const vx_hog_t *params, vx_size hog_param_size, vx_tensor features) 
[Immediate] Computes Histogram of Oriented Gradients features for the W1xW2 window in a sliding window fashion over the whole input image.
struct vx_hog_t 
The HOG descriptor structure.
Definition at line 1699 of file vx_types.h.
Data Fields  

vx_int32  cell_width 
The histogram cell width of type VX_TYPE_INT32 . 
vx_int32  cell_height 
The histogram cell height of type VX_TYPE_INT32 . 
vx_int32  block_width 
The histogram block width of type VX_TYPE_INT32 . Must be divisible by cell_width. 
vx_int32  block_height 
The histogram block height of type VX_TYPE_INT32 . Must be divisible by cell_height. 
vx_int32  block_stride 
The histogram block stride within the window of type VX_TYPE_INT32 . Must be an integral number of cell_width and cell_height. 
vx_int32  num_bins 
The histogram size of type VX_TYPE_INT32 . 
vx_int32  window_width 
The feature descriptor window width of type VX_TYPE_INT32 . 
vx_int32  window_height 
The feature descriptor window height of type VX_TYPE_INT32 . 
vx_int32  window_stride 
The feature descriptor window stride of type VX_TYPE_INT32 . 
vx_float32  threshold 
The threshold for the maximum L2-norm value for a histogram bin. It is used as part of block normalization. It defaults to 0.2. 
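The divisibility constraints listed for the data fields can be checked before the structure is handed to a node. The following is a minimal sketch, not part of the OpenVX API: `hog_params` is a hypothetical stand-in that mirrors the fields of vx_hog_t so that the example compiles without the OpenVX headers, and `hog_params_valid` is a hypothetical helper. The positivity check is an assumption, and "an integral number of cell_width and cell_height" is read here as "an integer multiple of both".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the fields of vx_hog_t, so that the
 * sketch compiles without the OpenVX headers. */
typedef struct {
    int32_t cell_width;
    int32_t cell_height;
    int32_t block_width;
    int32_t block_height;
    int32_t block_stride;
    int32_t num_bins;
    int32_t window_width;
    int32_t window_height;
    int32_t window_stride;
    float   threshold;
} hog_params;

/* Hypothetical helper: returns true when the documented divisibility
 * constraints hold. */
bool hog_params_valid(const hog_params *p)
{
    /* Assumption (not stated in the field descriptions): every size,
     * stride, and bin count must be strictly positive. */
    if (p->cell_width <= 0 || p->cell_height <= 0 ||
        p->block_width <= 0 || p->block_height <= 0 ||
        p->block_stride <= 0 || p->num_bins <= 0 ||
        p->window_width <= 0 || p->window_height <= 0 ||
        p->window_stride <= 0) {
        return false;
    }
    /* block_width must be divisible by cell_width. */
    if (p->block_width % p->cell_width != 0) {
        return false;
    }
    /* block_height must be divisible by cell_height. */
    if (p->block_height % p->cell_height != 0) {
        return false;
    }
    /* block_stride is interpreted as an integer multiple of both
     * cell_width and cell_height. */
    if (p->block_stride % p->cell_width != 0 ||
        p->block_stride % p->cell_height != 0) {
        return false;
    }
    return true;
}
```

With 8x8 cells, 16x16 blocks, and a block stride of 8, the check passes; changing the block width to 12 makes it fail because 12 is not divisible by 8.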
vx_node VX_API_CALL vxHOGCellsNode  (  vx_graph  graph, 
vx_image  input,  
vx_int32  cell_width,  
vx_int32  cell_height,  
vx_int32  num_bins,  
vx_tensor  magnitudes,  
vx_tensor  bins  
) 
[Graph] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.
Firstly, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two 1D centred, point discrete derivative masks are applied to the input image in the horizontal and vertical directions.
\[ M_h = [-1, 0, 1] \]
and
\[ M_v = [-1, 0, 1]^T \]
\(G_v\) is the result of applying mask \(M_v\) to the input image, and \(G_h\) is the result of applying mask \(M_h\) to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.
\[ G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2} \]
\[ \theta(x,y) = arctan(G_v(x,y), G_h(x,y)) \]
where \(arctan(v, h)\) is \( tan^{-1}(v/h)\) when \(h!=0\),
\( -pi/2 \) if \(v<0\) and \(h==0\),
\( pi/2 \) if \(v>0\) and \(h==0\)
and \( 0 \) if \(v==0\) and \(h==0\).
Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.
\[magnitudes(c) = \frac{1}{(cell\_width * cell\_height)}\sum\limits_{w=0}^{cell\_width} \sum\limits_{h=0}^{cell\_height} G_c(w,h)\]
where \(G_c\) is the gradient magnitudes related to cell \(c\). The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.
\[bins(c, n) = \sum\limits_{w=0}^{cell\_width} \sum\limits_{h=0}^{cell\_height} G_c(w,h) * 1[B_c(w, h, num\_bins) == n]\]
where \(B_c\) produces the histogram bin number based on the gradient orientation of the pixel at location ( \(w\), \(h\)) in cell \(c\) based on the \(num\_bins\) and
\[1[B_c(w, h, num\_bins) == n]\]
is a delta function with value 1 when \(B_c(w, h, num\_bins) == n\) or 0 otherwise.
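The cell-level accumulation can be sketched as follows. This is a hedged illustration, not the normative algorithm: the text does not define \(B_c\), so `hog_bin_index` assumes a uniform quantisation of orientations in [0, 360) degrees, and both function names are hypothetical. The inputs are assumed to be the per-pixel magnitudes and orientations (in degrees) of a single cell, stored row-major.

```c
#include <stddef.h>

/* Hypothetical B_c: uniform quantisation of [0, 360) degrees into
 * num_bins bins. The specification text does not fix this mapping. */
int hog_bin_index(double theta_deg, int num_bins)
{
    int n = (int)(theta_deg * num_bins / 360.0);
    if (n < 0) {
        n = 0;
    }
    if (n >= num_bins) {
        n = num_bins - 1;  /* guards theta_deg == 360 */
    }
    return n;
}

/* Accumulates one cell.
 * mag and theta_deg each hold cell_width * cell_height per-pixel values in
 * row-major order; bins must have room for num_bins entries.
 * Each pixel votes with its gradient magnitude for the bin selected by its
 * orientation. Returns the average gradient magnitude of the cell. */
double hog_accumulate_cell(const double *mag, const double *theta_deg,
                           int cell_width, int cell_height,
                           int num_bins, double *bins)
{
    double sum = 0.0;

    for (int n = 0; n < num_bins; ++n) {
        bins[n] = 0.0;
    }
    for (int h = 0; h < cell_height; ++h) {
        for (int w = 0; w < cell_width; ++w) {
            size_t i = (size_t)h * (size_t)cell_width + (size_t)w;
            bins[hog_bin_index(theta_deg[i], num_bins)] += mag[i];
            sum += mag[i];
        }
    }
    return sum / (double)(cell_width * cell_height);
}
```

The loops visit exactly cell_width * cell_height pixels, which matches the normalisation factor used for the average magnitude.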
[in]  graph  The reference to the graph. 
[in]  input  The input image of type VX_DF_IMAGE_U8 . 
[in]  cell_width  The histogram cell width of type VX_TYPE_INT32 . 
[in]  cell_height  The histogram cell height of type VX_TYPE_INT32 . 
[in]  num_bins  The histogram size of type VX_TYPE_INT32 . 
[out]  magnitudes  (Optional) The output average gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16 of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}) ] \). 
[out]  bins  The output gradient orientation histograms per cell of vx_tensor of type VX_TYPE_INT16 of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}), num_{bins}] \). 
Returns: vx_node.
Return values:
0  Node could not be created. 
*  Node handle. 
vx_node VX_API_CALL vxHOGFeaturesNode  (  vx_graph  graph, 
vx_image  input,  
vx_tensor  magnitudes,  
vx_tensor  bins,  
const vx_hog_t *  params,  
vx_size  hog_param_size,  
vx_tensor  features  
) 
[Graph] The node produces HOG features for the W1xW2 window in a sliding window fashion over the whole input image. Each position produces a HOG feature vector.
Firstly, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalised by the average cell gradient magnitudes.
\[bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}\]
To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping the cell histograms together into larger, spatially connected blocks. Blocks are rectangular grids represented by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, meaning that each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms \(h\) are grouped together to form a vector \(v = [h_1, h_2, h_3, ... , h_n]\). This vector is normalised using L2-Hys, which means performing L2-norm on this vector, clipping the result (by limiting the maximum values of v to be threshold), and renormalizing. If the threshold is equal to zero then L2-Hys normalization is not performed.
\[L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}\]
where \( \|v\|_k \) is the k-norm of \(v\) for k=1, 2, and \( \epsilon \) is a small constant. For a specific window, its HOG descriptor is then the concatenated vector of the components of the normalized cell histograms from all of the block regions contained in the window. The W1xW2 window starting position is at coordinates 0x0. If the input image has dimensions that are not an integer multiple of W1xW2 blocks with the specified stride, then the last positions that contain only a partial W1xW2 window will be calculated with the remaining part of the W1xW2 window padded with zeroes. The W1xW2 window must also have a size such that it contains an integer number of cells; otherwise the node is not well-defined. The final output tensor will contain HOG descriptors equal to the number of windows in the input image. The output features tensor has 3 dimensions, given by:
\[[ (floor((image_{width} - window_{width})/window_{stride}) + 1),\]
\[ (floor((image_{height} - window_{height})/window_{stride}) + 1),\]
\[ floor((window_{width} - block_{width})/block_{stride} + 1) * floor((window_{height} - block_{height})/block_{stride} + 1) *\]
\[ (((block_{width} * block_{height}) / (cell_{width} * cell_{height})) * num_{bins})] \]
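The three dimensions above can be computed ahead of tensor creation. The sketch below is illustrative only: `hog_geometry` and `hog_features_dims` are hypothetical names, not part of OpenVX, and the code assumes that the image is at least as large as the window, that the window is at least as large as the block, and that all values are positive, so that C integer division matches the floor in the formula.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of the quantities used by the dimension formula. */
typedef struct {
    int32_t image_width, image_height;
    int32_t window_width, window_height, window_stride;
    int32_t block_width, block_height, block_stride;
    int32_t cell_width, cell_height;
    int32_t num_bins;
} hog_geometry;

/* Fills dims[0..2] with the sizes of the features tensor.
 * Assumes image >= window >= block and positive strides, so that integer
 * division of non-negative operands equals floor(). */
void hog_features_dims(const hog_geometry *g, size_t dims[3])
{
    int32_t blocks_x =
        (g->window_width - g->block_width) / g->block_stride + 1;
    int32_t blocks_y =
        (g->window_height - g->block_height) / g->block_stride + 1;
    int32_t cells_per_block =
        (g->block_width * g->block_height) /
        (g->cell_width * g->cell_height);

    /* Number of window positions horizontally and vertically. */
    dims[0] = (size_t)((g->image_width - g->window_width) /
                       g->window_stride + 1);
    dims[1] = (size_t)((g->image_height - g->window_height) /
                       g->window_stride + 1);
    /* Descriptor length per window. */
    dims[2] = (size_t)blocks_x * (size_t)blocks_y *
              (size_t)cells_per_block * (size_t)g->num_bins;
}
```

For example, a 640x480 image with a 64x128 window, a window stride of 8, 16x16 blocks with a block stride of 8, 8x8 cells, and 9 bins gives dimensions [73, 45, 3780].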
See vxCreateTensor and vxCreateVirtualTensor. We recommend that the output tensors always be virtual objects, with this node connected directly to the classifier. The output tensor will be very large, and using non-virtual tensors will result in a poorly optimized implementation. Merging this node with a classifier node, such as that described in the classifier extension, will result in better performance. Notice that this node creation function has more parameters than the corresponding kernel. Numbering of kernel parameters (required if you create this node using the generic interface) is explicitly specified here.
[in]  graph  The reference to the graph. 
[in]  input  The input image of type VX_DF_IMAGE_U8 . (Kernel parameter #0) 
[in]  magnitudes  (Optional) The gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16 . It is the output of vxHOGCellsNode . (Kernel parameter #1) 
[in]  bins  The gradient orientation histograms per cell of vx_tensor of type VX_TYPE_INT16 . It is the output of vxHOGCellsNode . (Kernel parameter #2) 
[in]  params  The parameters of type vx_hog_t . (Kernel parameter #3) 
[in]  hog_param_size  Size of vx_hog_t in bytes. Note that this parameter is not counted as one of the kernel parameters. 
[out]  features  The output HOG features of vx_tensor of type VX_TYPE_INT16 . (Kernel parameter #4) 
Returns: vx_node.
Return values:
0  Node could not be created. 
*  Node handle. 
vx_status VX_API_CALL vxuHOGCells  (  vx_context  context, 
vx_image  input,  
vx_int32  cell_width,  
vx_int32  cell_height,  
vx_int32  num_bins,  
vx_tensor  magnitudes,  
vx_tensor  bins  
) 
[Immediate] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.
Firstly, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two 1D centred, point discrete derivative masks are applied to the input image in the horizontal and vertical directions.
\[ M_h = [-1, 0, 1] \]
and
\[ M_v = [-1, 0, 1]^T \]
\(G_v\) is the result of applying mask \(M_v\) to the input image, and \(G_h\) is the result of applying mask \(M_h\) to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.
\[ G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2} \]
\[ \theta(x,y) = arctan(G_v(x,y), G_h(x,y)) \]
where \(arctan(v, h)\) is \( tan^{-1}(v/h)\) when \(h!=0\),
\( -pi/2 \) if \(v<0\) and \(h==0\),
\( pi/2 \) if \(v>0\) and \(h==0\)
and \( 0 \) if \(v==0\) and \(h==0\).
Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.
\[magnitudes(c) = \frac{1}{(cell\_width * cell\_height)}\sum\limits_{w=0}^{cell\_width} \sum\limits_{h=0}^{cell\_height} G_c(w,h)\]
where \(G_c\) is the gradient magnitudes related to cell \(c\). The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.
\[bins(c, n) = \sum\limits_{w=0}^{cell\_width} \sum\limits_{h=0}^{cell\_height} G_c(w,h) * 1[B_c(w, h, num\_bins) == n]\]
where \(B_c\) produces the histogram bin number based on the gradient orientation of the pixel at location ( \(w\), \(h\)) in cell \(c\) based on the \(num\_bins\) and
\[1[B_c(w, h, num\_bins) == n]\]
is a delta function with value 1 when \(B_c(w, h, num\_bins) == n\) or 0 otherwise.
[in]  context  The reference to the overall context. 
[in]  input  The input image of type VX_DF_IMAGE_U8 . 
[in]  cell_width  The histogram cell width of type VX_TYPE_INT32 . 
[in]  cell_height  The histogram cell height of type VX_TYPE_INT32 . 
[in]  num_bins  The histogram size of type VX_TYPE_INT32 . 
[out]  magnitudes  The output average gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16 of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}) ] \). 
[out]  bins  The output gradient orientation histograms per cell of vx_tensor of type VX_TYPE_INT16 of size \( [floor(image_{width}/cell_{width}) ,floor(image_{height}/cell_{height}), num_{bins}] \). 
Returns: A vx_status_e enumeration.
Return values:
VX_SUCCESS  Success 
*  An error occurred. See vx_status_e . 
vx_status VX_API_CALL vxuHOGFeatures  (  vx_context  context, 
vx_image  input,  
vx_tensor  magnitudes,  
vx_tensor  bins,  
const vx_hog_t *  params,  
vx_size  hog_param_size,  
vx_tensor  features  
) 
[Immediate] Computes Histogram of Oriented Gradients features for the W1xW2 window in a sliding window fashion over the whole input image.
Firstly, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalised by the average cell gradient magnitudes.
\[bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}\]
To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping the cell histograms together into larger, spatially connected blocks. Blocks are rectangular grids represented by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, meaning that each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms \(h\) are grouped together to form a vector \(v = [h_1, h_2, h_3, ... , h_n]\). This vector is normalised using L2-Hys, which means performing L2-norm on this vector, clipping the result (by limiting the maximum values of v to be threshold), and renormalizing. If the threshold is equal to zero then L2-Hys normalization is not performed.
\[L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}\]
where \( \|v\|_k \) is the k-norm of \(v\) for k=1, 2, and \( \epsilon \) is a small constant. For a specific window, its HOG descriptor is then the concatenated vector of the components of the normalized cell histograms from all of the block regions contained in the window. The W1xW2 window starting position is at coordinates 0x0. If the input image has dimensions that are not an integer multiple of W1xW2 blocks with the specified stride, then the last positions that contain only a partial W1xW2 window will be calculated with the remaining part of the W1xW2 window padded with zeroes. The W1xW2 window must also have a size such that it contains an integer number of cells; otherwise the node is not well-defined. The final output tensor will contain HOG descriptors equal to the number of windows in the input image. The output features tensor has 3 dimensions, given by:
\[[ (floor((image_{width} - window_{width})/window_{stride}) + 1),\]
\[ (floor((image_{height} - window_{height})/window_{stride}) + 1),\]
\[ floor((window_{width} - block_{width})/block_{stride} + 1) * floor((window_{height} - block_{height})/block_{stride} + 1) *\]
\[ (((block_{width} * block_{height}) / (cell_{width} * cell_{height})) * num_{bins})] \]
See vxCreateTensor and vxCreateVirtualTensor. The output tensor from this function may be very large. For this reason, it is not recommended that this "immediate mode" version of the function be used. The preferred method to perform this function is as a graph node with a virtual tensor as the output.
[in]  context  The reference to the overall context. 
[in]  input  The input image of type VX_DF_IMAGE_U8 . 
[in]  magnitudes  The average gradient magnitudes per cell of vx_tensor of type VX_TYPE_INT16 . It is the output of vxuHOGCells . 
[in]  bins  The gradient orientation histogram per cell of vx_tensor of type VX_TYPE_INT16 . It is the output of vxuHOGCells . 
[in]  params  The parameters of type vx_hog_t . 
[in]  hog_param_size  Size of vx_hog_t in bytes. 
[out]  features  The output HOG features of vx_tensor of type VX_TYPE_INT16 . 
Returns: A vx_status_e enumeration.
Return values:
VX_SUCCESS  Success 
*  An error occurred. See vx_status_e . 