Functions for descriptive statistics
Descriptive statistics aims to characterise empirical data by summary parameters (and also by tables and plots).
Standard functions defined in math unit
The unit math of the runtime library provides a plethora of routines for descriptive statistics.

mean: Returns the mean value of an array.
stdDev: Returns the (sample) standard deviation of an array.
popnStdDev: Returns the (population) standard deviation of an array.
meanAndStdDev: Returns mean and standard deviation of an array.
momentSkewKurtosis: Returns the first four moments of an array, together with its skewness and kurtosis.
variance: Returns the (sample) variance of an array.
popnVariance: Returns the (population) variance of an array.
totalVariance: Returns the total variance of an array.
sumOfSquares: Returns the sum of squares of an array.
sum: Returns the sum of values of an array.
sumsAndSquares: Returns sum and sum of squares of the values in an array.
These functions accept either a static array of predefined length (e.g. array[1..100] of float) or a zero-based dynamic array (e.g. array of extended) whose length has been set with the setLength procedure.
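The following minimal sketch illustrates both ways of supplying data. The program name, the variable names and the sample values are assumptions made for illustration only; the routines called are the ones listed above.

```pascal
program MathUnitDemo;

{$mode objfpc}

uses
  math;

var
  // static array of predefined length, initialised with sample values
  fixedData: array[1..5] of float = (2, 4, 4, 4, 6);
  // zero-based dynamic array; it has no elements until setLength is called
  dynData: array of float;
  i: integer;

begin
  setLength(dynData, length(fixedData));
  for i := 0 to high(dynData) do
    dynData[i] := fixedData[i + 1]; // shift from 1-based to 0-based index

  // both kinds of array can be passed to the same routines
  writeln('mean (static):  ', mean(fixedData):0:4);
  writeln('mean (dynamic): ', mean(dynData):0:4);
  writeln('sum:            ', sum(dynData):0:4);
  writeln('variance:       ', variance(dynData):0:4);
  writeln('popnVariance:   ', popnVariance(dynData):0:4);
  writeln('stdDev:         ', stdDev(dynData):0:4);
end.
```

For these sample values the mean is 4, the sample variance is 2 and the population variance is 1.6, which makes the difference between the two variance routines easy to see.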
Standard functions defined in other units

length: Returns the length (n) of an array.
Custom functions
Some functions aren't defined in the RTL. The following sections list source code for some commonly used measures of central tendency and dispersion. Unless otherwise specified, the code is provided under a BSD license.
Together with other useful statistical code, expanded and thoroughly tested versions of these functions are also available in the QUANTUM SALIS project.
Median
The term median denotes the 50% quantile of a sample, i.e. the value separating the higher half of a data vector from its lower half. It can be calculated with:
type
  TExtArray = array of Extended;

// Based on Shell Sort - avoiding recursion allows for sorting
// of very large arrays, too
function sortExtArray(const data: TExtArray): TExtArray;
var
  data2: TExtArray;
  arrayLength, i, j, k: longint;
  h: extended;
begin
  arrayLength := high(data);               // index of the last element
  data2 := copy(data, 0, arrayLength + 1); // work on a copy, leave the input untouched
  // initial gap: half the number of elements, so that
  // two-element arrays are sorted, too
  k := (arrayLength + 1) div 2;
  while k > 0 do
  begin
    for i := 0 to arrayLength - k do
    begin
      j := i;
      while (j >= 0) and (data2[j] > data2[j + k]) do
      begin
        // swap the two elements that are out of order
        h := data2[j];
        data2[j] := data2[j + k];
        data2[j + k] := h;
        if j > k then
          dec(j, k)
        else
          j := 0;
      end;
    end;
    k := k div 2 // halve the gap for the next pass
  end;
  result := data2;
end;

// Expects a non-empty array
function median(const data: TExtArray): extended;
var
  centralElement: integer;
  sortedData: TExtArray;
begin
  sortedData := sortExtArray(data);
  centralElement := length(sortedData) div 2;
  if odd(length(sortedData)) then
    result := sortedData[centralElement]
  else
    result := (sortedData[centralElement - 1] + sortedData[centralElement]) / 2;
end;
Of course, the function sortExtArray may be replaced with another sorting algorithm, e.g. QuickSort.
The Shell Sort algorithm presented here has the advantage that, being non-recursive, it can sort very large vectors even on machines with very little memory (albeit at the expense of slightly reduced speed compared to QuickSort).
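For reference, the case distinction made in the median function can be written as a formula. The notation is an assumption chosen for this sketch: x_(1) ≤ x_(2) ≤ … ≤ x_(n) denotes the sorted sample with 1-based indices, whereas the Pascal code uses 0-based indices, so that centralElement corresponds to x_((n+1)/2) for odd n and to x_(n/2+1) for even n.

```latex
\tilde{x} =
\begin{cases}
  x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[1ex]
  \dfrac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)} \right) & \text{if } n \text{ is even}
\end{cases}
```

For example, the sample 3, 1, 2 has the median 2, and the sample 4, 1, 3, 2 has the median (2 + 3) / 2 = 2.5.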
Standard error of the mean
The standard error of the mean (SEM) is a measure that estimates how precisely the sample mean approximates the true mean of the population.
Calculation of SEM is simple:
function sem(const data: array of Extended): extended;
begin
  sem := stddev(data) / sqrt(length(data));
end;
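Written as a formula, the function above divides the sample standard deviation by the square root of the sample size. The symbols are assumptions chosen for this sketch: s is the sample standard deviation as returned by stdDev, n the number of observations and x̄ the sample mean.

```latex
\mathrm{SEM} = \frac{s}{\sqrt{n}},
\qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}
```

Since s is only defined for n ≥ 2, the function is meaningful only for arrays with at least two elements.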
Coefficient of variation
The coefficient of variation (CoV or CV), also known as relative standard deviation (RSD), is a measure of dispersion that is standardised with respect to the data's mean.
It can be calculated with:
{** calculates the coefficient of variation (CV or CoV) of a vector of extended }
function cv(const data: array of Extended): extended;
begin
  result := stddev(data) / mean(data);
end;
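As a formula, with s denoting the sample standard deviation and x̄ the sample mean (notation assumed for this sketch), the function above computes:

```latex
\mathrm{CV} = \frac{s}{\bar{x}}
```

The quotient is undefined for a mean of zero, so callers may want to check for that case before using the result.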