NumPy Statistical Functions
NumPy provides many statistical functions for finding the minimum, maximum, range, percentile, median, standard deviation, and variance of the elements in an array. The functions are described below:
numpy.amin() and numpy.amax()
numpy.amin() is used to calculate the minimum values of the elements along a specified axis in the array.
numpy.amax() is used to calculate the maximum values of the elements along a specified axis in the array.
Example
import numpy as np
a = np.array([[3,7,5],[8,4,3],[2,4,9]])
print ('Our array is:')
print (a)
print ('\n')
print ('Calling amin() function:')
print (np.amin(a,1))
print ('\n')
print ('Calling amin() function again:')
print (np.amin(a,0))
print ('\n')
print ('Calling amax() function:')
print (np.amax(a))
print ('\n')
print ('Calling amax() function again:')
print (np.amax(a, axis = 0))
Output:
Our array is:
[[3 7 5]
[8 4 3]
[2 4 9]]
Calling amin() function:
[3 3 2]
Calling amin() function again:
[2 4 3]
Calling amax() function:
9
Calling amax() function again:
[8 7 9]
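To make the meaning of the axis argument concrete, the following minimal sketch reuses the array from the example above and compares np.amin() with Python's built-in min() applied to each row and each column. The names row_mins and col_mins are illustrative only.

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# axis=1 collapses the column index, leaving one minimum per row.
row_mins = np.amin(a, axis=1)
# axis=0 collapses the row index, leaving one minimum per column.
col_mins = np.amin(a, axis=0)

# The same values computed with the built-in min() for comparison.
print(row_mins.tolist() == [min(row) for row in a.tolist()])    # True
print(col_mins.tolist() == [min(col) for col in a.T.tolist()])  # True
```

When no axis is given, as in np.amax(a) above, the array is treated as flattened and a single value is returned.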
numpy.ptp()
The numpy.ptp() function calculates the range of values (maximum - minimum) in the array.
Example
import numpy as np
a = np.array([[3,7,5],[8,4,3],[2,4,9]])
print ('Our array is:')
print (a)
print ('\n')
print ('Calling ptp() function:')
print (np.ptp(a))
print ('\n')
print ('Calling ptp() function along axis 1:')
print (np.ptp(a, axis = 1))
print ('\n')
print ('Calling ptp() function along axis 0:')
print (np.ptp(a, axis = 0))
Output:
Our array is:
[[3 7 5]
[8 4 3]
[2 4 9]]
Calling ptp() function:
7
Calling ptp() function along axis 1:
[4 5 7]
Calling ptp() function along axis 0:
[6 3 6]
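Since ptp() is described as maximum minus minimum, the result can be cross-checked against amax() and amin(). The sketch below does this for the same array; the name manual is illustrative only.

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# Range of the whole array: 9 - 2
print(np.amax(a) - np.amin(a))  # 7

# Per-row range computed by hand, compared with np.ptp along the same axis
manual = np.amax(a, axis=1) - np.amin(a, axis=1)
print(np.array_equal(manual, np.ptp(a, axis=1)))  # True
```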
numpy.percentile()
A percentile is a measure used in statistics that indicates the value below which a given percentage of the observations in a group falls. The numpy.percentile() function has the following form:
numpy.percentile(a, q, axis)
Parameters:
- a: Input array
- q: Percentile to compute, which must be between 0 and 100 inclusive
- axis: Axis along which the percentiles are computed
First, it helps to understand what a percentile is:
The p-th percentile is a value such that at least p% of the data items are less than or equal to this value, and at least (100-p)% of the data items are greater than or equal to this value.
For example: Entrance exam scores from colleges are often reported in percentile form. Suppose a candidate's raw score in the Chinese section of the entrance exam is 54. It is difficult to know how his score compares to other students who took the same exam. However, if the raw score of 54 corresponds exactly to the 70th percentile, we can understand that approximately 70% of the students scored lower than him, and about 30% scored higher.
Here, p = 70.
Example
import numpy as np
a = np.array([[10, 7, 4], [3, 2, 1]])
print ('Our array is:')
print (a)
print ('Calling percentile() function:')
# 50th percentile, i.e. the median of all elements of 'a'
print (np.percentile(a, 50))
# axis=0: one result per column
print (np.percentile(a, 50, axis=0))
# axis=1: one result per row
print (np.percentile(a, 50, axis=1))
# keepdims=True keeps the reduced axis with length 1
print (np.percentile(a, 50, axis=1, keepdims=True))
Output:
Our array is:
[[10 7 4]
[ 3 2 1]]
Calling percentile() function:
3.5
[6.5 4.5 2.5]
[7. 2.]
[[7.]
[2.]]
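The exam-score interpretation of the 70th percentile given earlier can be checked numerically. The sketch below uses made-up data (100 hypothetical candidates with raw scores 1 through 100, not taken from any real exam) and counts what share of scores lies at or below the computed percentile. The names scores, p70 and share_at_or_below are illustrative only.

```python
import numpy as np

# Hypothetical raw scores: 100 candidates scoring 1, 2, ..., 100.
scores = np.arange(1, 101)

# Value of the 70th percentile (lies between the scores 70 and 71).
p70 = np.percentile(scores, 70)

# Fraction of candidates whose score is at or below that value.
share_at_or_below = np.mean(scores <= p70)

print(p70)
print(share_at_or_below)  # 0.7
```

With small or irregular data sets the share is only approximately p%, because np.percentile() interpolates linearly between neighbouring values by default.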
numpy.median()
The numpy.median() function is used to calculate the median (middle value) of the elements in array a.
Example
import numpy as np
a = np.array([[10, 7, 4], [3, 2, 1]])
print ('Our array is:')
print (a)
print ('Calling median() function:')
print (np.median(a))
# axis=0: one median per column
print (np.median(a, axis=0))
# axis=1: one median per row
print (np.median(a, axis=1))
Output:
Our array is:
[[10 7 4]
[ 3 2 1]]
Calling median() function:
3.5
[6.5 4.5 2.5]
[7. 2.]
As a second example, for the array [[30 65 70] [80 95 10] [50 90 60]] the median() function returns 65.0 for the whole array, [50. 90. 60.] along axis 0, and [65. 80. 60.] along axis 1.
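The median values quoted for the 3x3 array above can be reproduced with the short sketch below, which also checks the earlier remark that the 50th percentile is the median.

```python
import numpy as np

a = np.array([[30, 65, 70], [80, 95, 10], [50, 90, 60]])

print(np.median(a))          # 65.0
print(np.median(a, axis=0))  # [50. 90. 60.]
print(np.median(a, axis=1))  # [65. 80. 60.]

# The median coincides with the 50th percentile.
print(np.median(a) == np.percentile(a, 50))  # True
```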
Standard Deviation
Standard deviation is a measure of the dispersion of a set of data from its mean.
It is the square root of the variance.
The formula for standard deviation is as follows:
std = sqrt(mean((x - x.mean())**2))
If the array is [1, 2, 3, 4], its mean is 2.5. The squared differences from the mean are therefore [2.25, 0.25, 0.25, 2.25]. Their sum is 5, so their mean is 5/4, and the square root of that mean, sqrt(5/4), is 1.1180339887498949.
Example
import numpy as np
print(np.std([1, 2, 3, 4]))
Output:
1.1180339887498949
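The worked arithmetic above can be followed step by step in code. This is a minimal sketch that applies the formula directly and compares the result with np.std(); the names x, mean, sq_diff and manual_std are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

mean = x.mean()                       # 2.5
sq_diff = (x - mean) ** 2             # squared differences from the mean
manual_std = np.sqrt(sq_diff.mean())  # sqrt(5/4)

print(sq_diff)                            # [2.25 0.25 0.25 2.25]
print(np.isclose(manual_std, np.std(x)))  # True
```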
Variance
Variance in statistics is the average of the squared differences from the mean, i.e., mean((x - x.mean())**2). This is the population variance, which is what np.var() computes by default.
In other words, standard deviation is the square root of the variance.
Example
import numpy as np
print(np.var([1, 2, 3, 4]))
Output:
1.25
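np.var() divides by the number of elements n by default. Its ddof parameter ("delta degrees of freedom") changes the divisor to n - ddof, so ddof=1 gives the variance with the n - 1 divisor commonly used for samples. The sketch below compares the two and confirms that the standard deviation is the square root of the variance; the names pop_var and sample_var are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

pop_var = np.var(x)             # divides by n = 4
sample_var = np.var(x, ddof=1)  # divides by n - 1 = 3

print(pop_var)                                  # 1.25
print(np.isclose(sample_var, 5 / 3))            # True
print(np.isclose(np.sqrt(pop_var), np.std(x)))  # True
```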