Python package statistics - addition

In my earlier post today, I published some statistics about Python packages. Someone on a Python users mailing list pointed out that the mean and standard deviation are not meaningful for such highly skewed data, and that non-parametric statistics would give a better summary. So I will publish some new statistics in this post.
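To illustrate the point about skewed data, here is a small sketch using only the standard library. The numbers in `sizes_kb` are made up for the example and are not my PyPI sample; the idea is just to show how a single huge value drags the mean and standard deviation while the median barely notices.

```python
import statistics

# Hypothetical sizes in KB (NOT the real PyPI data):
# mostly small values plus one very large outlier.
sizes_kb = [2, 3, 4, 5, 8, 11, 12, 20, 38, 5000]

mean = statistics.mean(sizes_kb)      # pulled far upward by the outlier
stdev = statistics.stdev(sizes_kb)    # sample standard deviation, also dominated by it
median = statistics.median(sizes_kb)  # middle of the sorted data, robust to the outlier

print(f"mean   = {mean:.1f} KB")
print(f"stdev  = {stdev:.1f} KB")
print(f"median = {median:.1f} KB")
```

In this toy data nine of the ten values are below 40 KB, yet the mean comes out above 500 KB, which describes none of them.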

File Sizes

First, second and third quartiles are 4KB, 11.5KB and 38KB respectively. Here is a histogram of log(size):

Python package size histogram
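For anyone who wants to reproduce this kind of summary, a minimal sketch follows. It is not the script I used: the `quartiles` and `log10_histogram` helpers and the `sizes` list are hypothetical, and the quartiles use the default "exclusive" method of `statistics.quantiles` (Python 3.8+), which may differ slightly from other tools.

```python
import math
import statistics
from collections import Counter


def quartiles(values):
    """Return (Q1, Q2, Q3) using statistics.quantiles' default 'exclusive' method."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3


def log10_histogram(values):
    """Count positive values per power-of-ten bin.

    Bin k holds values v with 10**k <= v < 10**(k + 1).
    Zero and negative values are skipped because log10 is undefined for them.
    """
    return Counter(math.floor(math.log10(v)) for v in values if v > 0)


# Hypothetical file sizes in bytes (NOT the real PyPI data).
sizes = [900, 2_000, 4_000, 7_500, 11_500, 20_000, 38_000, 90_000, 1_500_000]

q1, q2, q3 = quartiles(sizes)
print(f"Q1={q1}, median={q2}, Q3={q3}")

# Crude text histogram of log10(size).
hist = log10_histogram(sizes)
for k in sorted(hist):
    print(f"10^{k}..10^{k + 1} bytes: {'#' * hist[k]}")
```

Binning on the logarithm is what keeps the few very large packages from flattening the rest of the histogram into a single bar.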

Downloads

Daily

First, second and third quartiles are 0, 2 and 8 respectively. I couldn't normalize the data, so there is no histogram here.

Weekly

First, second and third quartiles are 16, 39 and 100 respectively.

Monthly

First, second and third quartiles are 56, 147 and 375 respectively.

Lines of Python code

I am not quite sure exactly how skewness would affect my predictions about the population mean, so I will just publish the quartiles of my sample. First, second and third quartiles are 48, 199 and 498 respectively. It appears that most packages on PyPI are quite small. I also made a boxplot of this, but it was so squeezed that it looked meaningless, so I didn't post it.