Standard deviation of a list

Question

I want to find the mean and standard deviation of the 1st, 2nd, ... elements of several lists (A through Z). For example, I have

A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc (up to Z_rank )...

Now I want to take the mean and std of *_rank[0], the mean and std of *_rank[1], etc.
(i.e. the mean and std of the 1st element from all the (A..Z)_rank lists;
the mean and std of the 2nd element from all the (A..Z)_rank lists;
the mean and std of the 3rd element...; etc).

Hello, viral. Stack Overflow works best as a question-and-answer site. You ask a question, and everyone else provides answers. Your post contains only statements, no questions. Do you have a specific programming question? To put it another way, what have you tried so far, and where are you stuck? — Robᵩ, Mar 13 '13 at 15:40
Sorry If I did not convey question properly. I want to take mean of A_rank[0] (0.8),B_rank[0](0.1),C_rank[0](1.2),...Z_rank[0]. same for A_rank[1](0.4),B_rank[1](2.8),C_rank[1](3.4),...Z_rank[1]. — physics_for_all, Mar 13 '13 at 15:45

NPE · Answer 1 · 2013-03-13 15:50:43Z

up vote 61 down vote

I would put A_rank et al. into a 2D NumPy array, and then use numpy.mean() and numpy.std() with axis=0 to compute the means and the standard deviations of each position across the lists:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])
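
For readers who want to run this outside an IPython session, here is a minimal sketch of the same computation as a script. It assumes NumPy is installed and uses only the three example lists from the question; the names `means`, `pop_std` and `sample_std` are illustrative and not part of the answer.

```python
import numpy as np

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]

# One row per list, one column per position.
arr = np.array([A_rank, B_rank, C_rank])

means = np.mean(arr, axis=0)               # mean of each column
pop_std = np.std(arr, axis=0)              # divisor N (ddof=0, the default)
sample_std = np.std(arr, axis=0, ddof=1)   # divisor N - 1

print(means)
print(pop_std)
print(sample_std)
```

With axis=0 the reduction runs down the rows, so each result has one entry per position in the lists.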

edited Mar 13 '13 at 15:50

answered Mar 13 '13 at 15:42

NPE


1

the result of numpy.std is not correct. Given these values: 20,31,50,69,80 and put in Excel using STDEV.S(A1:A5) the result is 25,109 NOT 22,45. – Jim Clermonts Oct 1 '15 at 9:28

2

This is correct: numpy.std(arr, ddof=1) – Jim Clermonts Oct 1 '15 at 9:34

11

@JimClermonts It has nothing to do with correctness. Whether to use ddof=0 (the default, which interprets the data as a population) or ddof=1 (which interprets it as a sample, i.e. estimates the true variance) depends on what you're doing. – runDOSrun Jan 15 '16 at 10:32

4

To further clarify @runDOSrun's point, the Excel function STDEV.P() and the Numpy function std(ddof=0) calculate the population sd, or uncorrected sample sd, whilst the Excel function STDEV.S() and Numpy function std(ddof=1) calculate the (corrected) sample sd, which equals sqrt(N/(N-1)) times the population sd, where N is the number of points. See more: en.m.wikipedia.org/wiki/… – binaryfunt Apr 9 '16 at 15:56
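
The relationship described in these comments can be checked with a short sketch that needs only the standard library. It uses the five values quoted in the first comment; the variable names are illustrative.

```python
import math
import statistics

values = [20, 31, 50, 69, 80]
n = len(values)

population_sd = statistics.pstdev(values)  # divisor N, like STDEV.P / ddof=0
sample_sd = statistics.stdev(values)       # divisor N - 1, like STDEV.S / ddof=1

print(round(population_sd, 2))  # 22.46
print(round(sample_sd, 2))      # 25.11

# The two differ only by the factor sqrt(N / (N - 1)).
print(math.isclose(sample_sd, population_sd * math.sqrt(n / (n - 1))))
```

Both figures from the comment appear here, which shows that neither is wrong: they answer different questions about the same data.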


Bengt · Answer 2 · 2014-02-02 00:27:10Z

up vote 45 down vote

Since Python 3.4 / PEP 450 there is a statistics module in the standard library, which has a function stdev for calculating the sample standard deviation of iterables like yours:

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952
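
The session above handles a single list, whereas the question asks for statistics per position across several lists. One possible way to do that with the same module, sketched here with only the three example lists, is to regroup the values with zip first. The names `columns`, `means`, `sample_sds` and `population_sds` are illustrative.

```python
import statistics

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]

# zip regroups the values by position: (A[0], B[0], C[0]), (A[1], B[1], C[1]), ...
columns = list(zip(A_rank, B_rank, C_rank))

means = [statistics.mean(col) for col in columns]
sample_sds = [statistics.stdev(col) for col in columns]       # divisor N - 1
population_sds = [statistics.pstdev(col) for col in columns]  # divisor N

print(means)
print(sample_sds)
print(population_sds)
```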

answered Feb 2 '14 at 0:27

Bengt


15

It's worth pointing out that pstdev should probably be used instead if your list represents the entire population (i.e. the list is not a sample of a population). stdev is calculated using the sample variance and will overestimate the population standard deviation. – Alex Riley Jan 3 '15 at 17:55

The functions are actually called stdev and pstdev, not using std for standard as one would expect. I couldn't edit the post as edits need to modify at least 6 chars... – mknaf Jan 6 at 16:00


Alex Riley · Answer 3 · 2015-02-20 20:21:33Z

up vote 27 down vote

Here's some pure-Python code you can use to calculate the mean and standard deviation.

All code below is based on the statistics module in Python 3.4.

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def pstdev(data):
    """Calculates the population standard deviation."""
    n = len(data)
    if n < 1:
        # a single data point is a valid population (its deviation is 0)
        raise ValueError('pstdev requires at least one data point')
    ss = _ss(data)
    pvar = ss/n # the population variance
    return pvar**0.5

Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum which I've used in its place.

Now we have for example:

>>> mean([1, 2, 3])
2.0
>>> pstdev([1, 2, 3])
0.816496580927726
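
The accuracy note above can be illustrated with a small sketch. It swaps in `math.fsum` from the standard library as a stand-in for more careful summation; this is not the statistics module's private `_sum`, and the helper names `mean_fsum` and `pstdev_fsum` are hypothetical.

```python
import math

data = [0.1] * 10

print(sum(data) == 1.0)        # False: rounding errors accumulate
print(math.fsum(data) == 1.0)  # True: fsum tracks the lost low-order bits

def mean_fsum(values):
    """Mean computed with math.fsum instead of the built-in sum."""
    if not values:
        raise ValueError('mean requires at least one data point')
    return math.fsum(values) / len(values)

def pstdev_fsum(values):
    """Population standard deviation using math.fsum for both sums."""
    c = mean_fsum(values)
    return math.sqrt(math.fsum((x - c) ** 2 for x in values) / len(values))

print(pstdev_fsum([1, 2, 3]))
```

For short lists of well-scaled numbers the difference is usually negligible; it matters more for long lists or values of very different magnitudes.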

edited Feb 20 '15 at 20:21

answered Jan 3 '15 at 18:48

Alex Riley


Should it not be pvar=ss/(n-1) ? – Ranjith Ramachandra Jun 8 '15 at 13:28

@Ranjith: if you want to calculate the sample variance (or sample SD) you can use n-1. The code above is for the population SD (so there are n degrees of freedom). – Alex Riley Jun 8 '15 at 13:38

Sorry. I didn't notice that – Ranjith Ramachandra Jun 8 '15 at 13:53


Ome · Answer 4 · 2015-09-15 12:49:25Z

up vote 15 down vote

In Python 2.7.1, you may calculate standard deviation using numpy.std() for:

Population std: Just use numpy.std() with no additional arguments besides your data list.
Sample std: You need to pass ddof (i.e. Delta Degrees of Freedom) set to 1, as in the following example:

numpy.std(< your-list >, ddof=1)

The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

With ddof=1, numpy.std() calculates the sample std rather than the population std.
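
To make the role of the divisor concrete, here is a pure-Python sketch of the "N - ddof" rule described above. It is only an illustration of the formula, not NumPy's implementation, and `std_with_ddof` is a hypothetical helper name.

```python
import math

def std_with_ddof(data, ddof=0):
    """Standard deviation with divisor N - ddof (hypothetical helper)."""
    n = len(data)
    if n - ddof <= 0:
        raise ValueError('need more than ddof data points')
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    return math.sqrt(ss / (n - ddof))

print(std_with_ddof([1, 2, 3]))          # population: divisor 3
print(std_with_ddof([1, 2, 3], ddof=1))  # sample: divisor 2
```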

edited Sep 15 '15 at 12:49

Alex Riley


answered Jul 12 '15 at 9:22

Ome


Thanks for posting this. I've spent a lot of hours trying to figure out the sample standard deviation in numpy. I didn't understand the ddof thing. – Roly Jul 16 '15 at 2:10

You are welcome. I've updated my previous post for a bit more clarification. – Ome Jul 16 '15 at 9:08


Community · Answer 5 · 2017-04-13 12:19:15Z

up vote 7 down vote

In Python 2.7 you can use NumPy's numpy.std(), which gives the population standard deviation.

In Python 3.4 statistics.stdev() returns the sample standard deviation. The pstdev() function gives the same result as numpy.std().

edited Apr 13 at 12:19

Community♦


answered Apr 24 '14 at 16:15

ab-user216125


Samy Bencherif · Answer 6 · 2017-05-22 16:11:16Z

The other answers cover how to compute a standard deviation in Python sufficiently, but no one explains how to do the bizarre traversal you've described.

I'm going to assume A-Z is the entire population. If not, see Ome's answer on how to infer from a sample.

So to get the standard deviation/mean of the first element of every list you would need something like this:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

To shorten the code and generalize this to any nth element, use the following function I generated for you:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]]

Now you can simply get the std and mean of all the nth places from A-Z like this:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))
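
As an alternative sketch, the 26 separate variables could be replaced by a single dictionary, so that the helper does not have to name every list. This is an assumption about how the data might be stored, not the layout used in the question. The names `ranks` and `get_all_nth_ranks` are hypothetical, and the standard-library statistics module is used here in place of NumPy to keep the example self-contained.

```python
import statistics

# Hypothetical container: one entry per list instead of 26 separate variables.
ranks = {
    'A': [0.8, 0.4, 1.2, 3.7, 2.6, 5.8],
    'B': [0.1, 2.8, 3.7, 2.6, 5, 3.4],
    'C': [1.2, 3.4, 0.5, 0.1, 2.5, 6.1],
    # further lists would be added here in the same way
}

def get_all_nth_ranks(ranks, n):
    """Return the nth element of every list in ranks."""
    return [values[n] for values in ranks.values()]

first = get_all_nth_ranks(ranks, 0)
print(first)
print(statistics.mean(first))
print(statistics.pstdev(first))  # population SD, as the answer assumes
```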

For any one interested, I generated the function using this messy one-liner: str([chr(x)+'_rank[n]' for x in range(65,65+26)]).replace("'", "") — Samy Bencherif, May 22 at 16:13

Elad Yehezkel · Answer 7 · 2017-06-08 15:24:56Z

up vote 0 down vote

pure python code:

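from math import sqrt

def stddev(lst):
    """Return the population standard deviation of lst."""
    mean = float(sum(lst)) / len(lst)
    # sum of squared deviations divided by N (population formula);
    # the built-in sum replaces reduce, which is not a builtin in Python 3
    return sqrt(sum((x - mean) ** 2 for x in lst) / len(lst))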

edited 2 hours ago

answered 3 hours ago

Elad Yehezkel


asked	4 years, 2 months ago
viewed	120,601 times
active	today
