#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
#include "allheaders.h"
#define DEBUG_CROSSINGS 0 |
#define DEBUG_FREQUENCY 0 |
#define DEBUG_HISTO 0 |
Input: na Return: na with all values rounded to nearest integer, or null on error
Input: na halfwidth (of rectangular filter, minus the center) Return: na (after low-pass filtering), or null on error
Notes: (1) Full convolution takes place only from i = halfwidth to i = n - halfwidth - 1. We do the end parts using only the partial array available. We do not pad the ends with 0. (2) This implementation assumes specific fields in the Numa!
Input: nax (<optional> numa of abscissa values) nay (numa of ordinate values, corresponding to nax) delta (parameter used to identify when a new peak can be found) Return: nad (abscissa pts at threshold), or null on error
Notes: (1) If nax == NULL, we use startx and delx from nay to compute the crossing values in nad.
Input: nax (<optional> numa of abscissa values; can be NULL) nay (numa of ordinate values, corresponding to nax) thresh (threshold value for nay) Return: nad (abscissa pts at threshold), or null on error
Notes: (1) If nax == NULL, we use startx and delx from nay to compute the crossing values in nad.
l_int32 numaEvalBestHaarParameters | ( | NUMA * | nas, | |
l_float32 | relweight, | |||
l_int32 | nwidth, | |||
l_int32 | nshift, | |||
l_float32 | minwidth, | |||
l_float32 | maxwidth, | |||
l_float32 * | pbestwidth, | |||
l_float32 * | pbestshift, | |||
l_float32 * | pbestscore | |||
) |
Input: nas (numa of non-negative signal values) relweight (relative weight of (-1 comb) / (+1 comb) contributions to the 'convolution'. In effect, the convolution kernel is a comb consisting of alternating +1 and -weight.) nwidth (number of widths to consider) nshift (number of shifts to consider for each width) minwidth (smallest width to consider) maxwidth (largest width to consider) &bestwidth (<return> width giving largest score) &bestshift (<return> shift giving largest score) &bestscore (<optional return>=""> convolution with "Haar"-like comb) Return: 0 if OK, 1 on error
Notes: (1) This does a linear sweep of widths, evaluating at shifts for each width, computing the score from a convolution with a long comb, and finding the (width, shift) pair that gives the maximum score. The best width is the "half-wavelength" of the signal. (2) The convolving function is a comb of alternating values +1 and -1 * relweight, separated by the width and phased by the shift. This is similar to a Haar transform, except there the convolution is performed with a square wave. (3) The function is useful for finding the line spacing and strength of line signal from pixel sum projections. (4) The score is normalized to the size of nas divided by the number of half-widths. For image applications, the input is typically an array of pixel projections, so one should normalize by dividing the score by the image width in the pixel projection direction.
l_int32 numaEvalHaarSum | ( | NUMA * | nas, | |
l_float32 | width, | |||
l_float32 | shift, | |||
l_float32 | relweight, | |||
l_float32 * | pscore | |||
) |
Input: nas (numa of non-negative signal values) width (distance between +1 and -1 in convolution comb) shift (phase of the comb: location of first +1) relweight (relative weight of (-1 comb) / (+1 comb) contributions to the 'convolution'. In effect, the convolution kernel is a comb consisting of alternating +1 and -weight.) &score (<return> convolution with "Haar"-like comb) Return: 0 if OK, 1 on error
Notes: (1) This does a convolution with a comb of alternating values +1 and -relweight, separated by the width and phased by the shift. This is similar to a Haar transform, except that for Haar, (1) the convolution kernel is symmetric about 0, so the relweight is 1.0, and (2) the convolution is performed with a square wave. (2) The score is normalized to the size of nas divided by twice the "width". For image applications, the input is typically an array of pixel projections, so one should normalize by dividing the score by the image width in the pixel projection direction. (3) To get a Haar-like result, use relweight = 1.0. For detecting signals where you expect every other sample to be close to zero, as with barcodes or filtered text lines, you can use relweight > 1.0.
Input: nas (input values) delta (relative amount to resolve peaks and valleys) Return: nad (locations of extrema), or null on error
Notes: (1) This returns a sequence of extrema (peaks and valleys). (2) The algorithm is analogous to that for determining mountain peaks. Suppose we have a local peak, with bumps on the side. Under what conditions can we consider those 'bumps' to be actual peaks? The answer: if the bump is separated from the peak by a saddle that is at least 500 feet below the bump. (3) Operationally, suppose we are looking for a peak. We are keeping the largest value we've seen since the last valley, and are looking for a value that is delta BELOW our current peak. When we find such a value, we label the peak, use the current value to label the valley, and then do the same operation in reverse (looking for a valley).
Input: source na max number of peaks to be found fract1 (min fraction of peak value) fract2 (min slope) Return: peak na, or null on error.
Notes: (1) The returned na consists of sets of four numbers representing the peak, in the following order: left edge; peak center; right edge; normalized peak area
l_int32 numaGetHistogramStats | ( | NUMA * | nahisto, | |
l_float32 | startx, | |||
l_float32 | deltax, | |||
l_float32 * | pxmean, | |||
l_float32 * | pxmedian, | |||
l_float32 * | pxmode, | |||
l_float32 * | pxvariance | |||
) |
Input: nahisto (histogram: y(x(i)), i = 0 ... nbins - 1) startx (x value of first bin: x(0)) deltax (x increment between bins; the bin size; x(1) - x(0)) &xmean (<optional return>=""> mean value of histogram) &xmedian (<optional return>=""> median value of histogram) &xmode (<optional return>=""> mode value of histogram: xmode = x(imode), where y(xmode) >= y(x(i)) for all i != imode) &xvariance (<optional return>=""> variance of x) Return: 0 if OK, 1 on error
Notes: (1) If the histogram represents the relation y(x), the computed values that are returned are the x values. These are NOT the bucket indices i; they are related to the bucket indices by x(i) = startx + i * deltax
l_int32 numaGetHistogramStatsOnInterval | ( | NUMA * | nahisto, | |
l_float32 | startx, | |||
l_float32 | deltax, | |||
l_int32 | ifirst, | |||
l_int32 | ilast, | |||
l_float32 * | pxmean, | |||
l_float32 * | pxmedian, | |||
l_float32 * | pxmode, | |||
l_float32 * | pxvariance | |||
) |
numaGetHistogramStatsOnInterval()
Input: nahisto (histogram: y(x(i)), i = 0 ... nbins - 1) startx (x value of first bin: x(0)) deltax (x increment between bins; the bin size; x(1) - x(0)) ifirst (first bin to use for collecting stats) ilast (last bin for collecting stats; use 0 to go to the end) &xmean (<optional return>=""> mean value of histogram) &xmedian (<optional return>=""> median value of histogram) &xmode (<optional return>=""> mode value of histogram: xmode = x(imode), where y(xmode) >= y(x(i)) for all i != imode) &xvariance (<optional return>=""> variance of x) Return: 0 if OK, 1 on error
Notes: (1) If the histogram represents the relation y(x), the computed values that are returned are the x values. These are NOT the bucket indices i; they are related to the bucket indices by x(i) = startx + i * deltax
l_int32 numaGetStatsUsingHistogram | ( | NUMA * | na, | |
l_int32 | maxbins, | |||
l_float32 * | pmin, | |||
l_float32 * | pmax, | |||
l_float32 * | pmean, | |||
l_float32 * | pvariance, | |||
l_float32 * | pmedian, | |||
l_float32 | rank, | |||
l_float32 * | prval, | |||
NUMA ** | phisto | |||
) |
Input: na (an arbitrary set of numbers; not ordered and not a histogram) maxbins (the maximum number of bins to be allowed in the histogram; use 0 for consecutive integer bins) &min (<optional return>=""> min value of set) &max (<optional return>=""> max value of set) &mean (<optional return>=""> mean value of set) &variance (<optional return>=""> variance) &median (<optional return>=""> median value of set) rank (in [0.0 ... 1.0]; median has a rank 0.5; ignored if &rval == NULL) &rval (<optional return>=""> value in na corresponding to ) &histo (<optional return>=""> Numa histogram; use NULL to prevent) Return: 0 if OK, 1 on error
Notes: (1) This is a simple interface for gathering statistics from a numa, where a histogram is used 'under the covers' to avoid sorting if a rank value is requested. In that case, by using a histogram we are trading speed for accuracy, because the values in are quantized to the center of a set of bins. (2) If the median, other rank value, or histogram are not requested, the calculation is all performed on the input Numa. (3) The variance is the average of the square of the difference from the mean. The median is the value in na with rank 0.5. (4) There are two situations where this gives rank results with accuracy comparable to computing stastics directly on the input data, without binning into a histogram: (a) the data is integers and the range of data is less than , and (b) the data is floats and the range is small compared to , so that the binsize is much less than 1. (5) If a histogram is used and the numbers in the Numa extend over a large range, you can limit the required storage by specifying the maximum number of bins in the histogram. Use == 0 to force the bin size to be 1. (6) This optionally returns the median and one arbitrary rank value. If you need several rank values, return the histogram and use numaHistogramGetValFromRank(nah, rank, &rval) multiple times.
Input: na (histogram) rval (value of input sample for which we want the rank) &rank (<return> fraction of total samples below rval) Return: 0 if OK, 1 on error
Notes: (1) If we think of the histogram as a function y(x), normalized to 1, for a given input value of x, this computes the rank of x, which is the integral of y(x) from the start value of x to the input value. (2) This function only makes sense when applied to a Numa that is a histogram. The values in the histogram can be ints and floats, and are computed as floats. The rank is returned as a float between 0.0 and 1.0. (3) The numa parameters startx and binsize are used to compute x from the Numa index i.
Input: na (histogram) rank (fraction of total samples) &rval (<return> approx. to the bin value) Return: 0 if OK, 1 on error
Notes: (1) If we think of the histogram as a function y(x), this returns the value x such that the integral of y(x) from the start value to x gives the fraction 'rank' of the integral of y(x) over all bins. (2) This function only makes sense when applied to a Numa that is a histogram. The values in the histogram can be ints and floats, and are computed as floats. The val is returned as a float, even though the buckets are of integer width. (3) The numa parameters startx and binsize are used to compute x from the Numa index i.
Input: na maxbins (max number of histogram bins) &binsize (<return> size of histogram bins) &binstart (<optional return>=""> start val of minimum bin; input NULL to force start at 0) Return: na consisiting of histogram of integerized values, or null on error.
Note: (1) This simple interface is designed for integer data. The bins are of integer width and start on integer boundaries, so the results on float data will not have high precision. (2) Specify the max number of input bins. Then , the size of bins necessary to accommodate the input data, is returned. It is one of the sequence: {1, 2, 5, 10, 20, 50, ...}. (3) If &binstart is given, all values are accommodated, and the min value of the starting bin is returned. Otherwise, all negative values are discarded and the histogram bins start at 0.
Input: na (numa of floats; these may be integers) maxbins (max number of histogram bins; >= 1) Return: na consisiting of histogram of quantized float values, or null on error.
Notes: (1) This simple interface is designed for accurate binning of both integer and float data. (2) If the array data is integers, and the range of integers is smaller than , they are binned as they fall, with binsize = 1. (3) If the range of data, (maxval - minval), is larger than , or if the data is floats, they are binned into exactly bins. (4) Unlike numaMakeHistogram(), these bins in general have non-integer location and width, even for integer data.
Input: na binsize (typically 1.0) maxsize (of histogram ordinate) Return: na (histogram of bins of size , starting with the na[0] (x = 0.0) and going up to a maximum of x = , by increments of ), or null on error
Notes: (1) This simple function generates a histogram of values from na, discarding all values < 0.0 or greater than min(, maxval), where maxval is the maximum value in na. The histogram data is put in bins of size delx = , starting at x = 0.0. We use as many bins as are needed to hold the data.
l_int32 numaMakeRankFromHistogram | ( | l_float32 | startx, | |
l_float32 | deltax, | |||
NUMA * | nasy, | |||
l_int32 | npts, | |||
NUMA ** | pnax, | |||
NUMA ** | pnay | |||
) |
Input: startx (xval corresponding to first element in nay) deltax (x increment between array elements in nay) nasy (input histogram, assumed equally spaced) npts (number of points to evaluate rank function) &nax (<optional return>=""> array of x values in range) &nay (<return> rank array of specified npts) Return: 0 if OK, 1 on error
Input: nas (input histogram) area (target sum of all numbers in dest histogram; e.g., use area = 1.0 if this represents a probability distribution) Return: nad (normalized histogram), or null on error
Input: nas (input histogram) newsize (number of old bins contained in each new bin) Return: nad (more coarsely re-binned histogram), or null on error
l_int32 numaSelectCrossingThreshold | ( | NUMA * | nax, | |
NUMA * | nay, | |||
l_float32 | estthresh, | |||
l_float32 * | pbestthresh | |||
) |
Input: nax (<optional> numa of abscissa values; can be NULL) nay (signal) estthresh (estimated pixel threshold for crossing: e.g., for images, white <--> black; typ. ~120) &bestthresh (<return> robust estimate of threshold to use) Return: 0 if OK, 1 on error
Note: (1) When a valid threshold is used, the number of crossings is a maximum, because none are missed. If no threshold intersects all the crossings, the crossings must be determined with numaCrossingsByPeaks(). (2) is an input estimate of the threshold that should be used. We compute the crossings with 41 thresholds (20 below and 20 above). There is a range in which the number of crossings is a maximum. Return a threshold in the center of this stable plateau of crossings. This can then be used with numaCrossingsByThreshold() to get a good estimate of crossing locations.
l_int32 numaSplitDistribution | ( | NUMA * | na, | |
l_float32 | scorefract, | |||
l_int32 * | psplitindex, | |||
l_float32 * | pave1, | |||
l_float32 * | pave2, | |||
l_float32 * | pnum1, | |||
l_float32 * | pnum2, | |||
NUMA ** | pnascore | |||
) |
Input: na (histogram) scorefract (fraction of the max score, used to determine the range over which the histogram min is searched) &splitindex (<optional return>=""> index for splitting) &ave1 (<optional return>=""> average of lower distribution) &ave2 (<optional return>=""> average of upper distribution) &num1 (<optional return>=""> population of lower distribution) &num2 (<optional return>=""> population of upper distribution) &nascore (<optional return>=""> for debugging; otherwise use null) Return: 0 if OK, 1 on error
Notes: (1) This function is intended to be used on a distribution of values that represent two sets, such as a histogram of pixel values, and the goal is to determine the means of the two sets and the best splitting point. (2) The Otsu method finds a split point that divides the distribution into two parts by maximizing a score function that is the product of two terms: (a) the square of the difference of centroids, (ave1 - ave2)^2 (b) fract1 * (1 - fract1) where fract1 is the fraction in the lower distribution. This biases the split point into the larger "bump" (i.e., toward the point where the (b) term reaches its maximum of 0.25 at fract1 = 0.5. To avoid this, we define a range of values near the maximum of the score function, and choose the value within this range such that the histogram itself has a minimum value. The range is determined by scorefract: we include all abscissa values to the left and right of the value that maximizes the score, such that the score stays above (1 - scorefract) * maxscore. (3) We normalize the score so that if the two distributions were of equal size and at opposite ends of the numa, the score would be 1.0.
const l_int32 BinSizeArray[] [static] |
Инициализатор
{2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000, 200000000, 50000000, 100000000}