I am trying to figure out how to calculate covariance with the Python Numpy function cov. When I pass it two one-dimentional arrays, I get back a 2x2 matrix of results. I don't know what to do with that. I'm not great at statistics, but I believe covariance in such a situation should be a single number. This is what I am looking for. I wrote my own:
def cov(a, b):
if len(a) != len(b):
return
a_mean = np.mean(a)
b_mean = np.mean(b)
sum = 0
for i in range(0, len(a)):
sum += ((a[i] - a_mean) * (b[i] - b_mean))
return sum/(len(a)-1)
That works, but I figure the Numpy version is much more efficient, if I could figure out how to use it.
Does anybody know how to make the Numpy cov function perform like the one I wrote?
Thanks,
Dave
This question is related to
python
numpy
covariance
When a
and b
are 1-dimensional sequences, numpy.cov(a,b)[0][1]
is equivalent to your cov(a,b)
.
The 2x2 array returned by np.cov(a,b)
has elements equal to
cov(a,a) cov(a,b)
cov(a,b) cov(b,b)
(where, again, cov
is the function you defined above.)
Thanks to unutbu for the explanation. By default numpy.cov calculates the sample covariance. To obtain the population covariance you can specify normalisation by the total N samples like this:
Covariance = numpy.cov(a, b, bias=True)[0][1]
print(Covariance)
or like this:
Covariance = numpy.cov(a, b, ddof=0)[0][1]
print(Covariance)
Source: Stackoverflow.com