This page introduces the scientific number-crunching package numpy. It covers only the basics of the basics. The goal is to justify its use and to make the distinction between arrays and lists clear. After going through this page, the reader should understand the concept of numpy arrays, be able to create arrays from scratch, and do calculations with them. We stick to 1D arrays here; the extension to multidimensional arrays is for a later tutorial.
Python’s ease-of-use often comes at a price: speed. Let’s try to compute the sine of 20 million (uniform) random floats using Python’s standard modules, and time it.
In [1]: import time # get access to the wall-clock time
In [2]: import math # standard module implementing mathematical functions
In [3]: import random # generate random numbers
In [4]: import numpy as np
In [5]: t0 = time.time()
In [6]: output = []
In [7]: for i in range(20000000):
   ...:     output.append(math.sin(random.random()))
...:
In [8]: print(time.time()-t0) # in seconds
7.37354803085
We can gain ~50% with some tricks (list comprehension, generator and local variables):
In [9]: sin,rand = math.sin,random.random
In [10]: t0 = time.time()
In [11]: output = [sin(rand()) for i in xrange(20000000)] # in Python 3, replace xrange with range
In [12]: print(time.time()-t0)
3.45774412155
With numpy, we gain almost another factor of five (3.46 s down to 0.72 s), and have a much cleaner implementation:
In [13]: t0 = time.time()
In [14]: output = np.sin(np.random.uniform(size=20000000))
In [15]: print(time.time()-t0)
0.719166040421
A pure FORTRAN program is, however, still almost 50% faster than numpy.
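The timings above come from a single call to time.time(), which is sensitive to whatever else the machine happens to be doing. As a rough sketch, the comparison can be repeated a few times with time.perf_counter (Python 3) and the best run kept. The helper name best_of and the smaller sample size n are illustrative choices, not part of the original session, and the numbers you get will differ from machine to machine.

```python
import math
import random
import time

import numpy as np

n = 200000  # much smaller than 20 million so the sketch runs quickly


def best_of(func, repeats=3):
    """Return the shortest wall-clock time (in seconds) over several runs."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func()
        timings.append(time.perf_counter() - t0)
    return min(timings)


def pure_python():
    # list comprehension over the standard-library functions
    return [math.sin(random.random()) for _ in range(n)]


def with_numpy():
    # one vectorized call on a whole array of random floats
    return np.sin(np.random.uniform(size=n))


t_list = best_of(pure_python)
t_numpy = best_of(with_numpy)
print("list comprehension: %.4f s" % t_list)
print("numpy:              %.4f s" % t_numpy)
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes; the standard timeit module offers the same idea with more options.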
Basically, numpy provides vectorized functions written in C or FORTRAN that can act on pure Python objects, with a little bit of function-call overhead. Most of the looping is done in C or FORTRAN, avoiding the expensive for loops in pure Python.
Sometimes doing things in a vectorized way is not possible or just too confusing. Vectorization is more an art than a science, so the basic answer is that if it runs fast enough then you are good to go. Otherwise things need to be vectorized or maybe coded in C or Fortran (see Optimization).
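To make "vectorizing" a bit more concrete, here is a minimal sketch that replaces an explicit loop containing a branch by a single call to np.where. The function names and the rule (set negative values to zero) are made up for this illustration.

```python
import numpy as np


def clip_negative_loop(values):
    """Loop version: replace negative entries by zero, one element at a time."""
    result = []
    for v in values:
        if v < 0:
            result.append(0.0)
        else:
            result.append(v)
    return np.array(result)


def clip_negative_vectorized(values):
    """Vectorized version: the branch is written once for the whole array."""
    return np.where(values < 0, 0.0, values)


x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(clip_negative_loop(x))
print(clip_negative_vectorized(x))
```

Both functions return the same array; the second one leaves the looping to numpy's compiled code.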
Arrays can be created in different ways:
In [16]: a = np.array([10, 20, 30, 40]) # create an array from a list of values
In [17]: a
Out[17]: array([10, 20, 30, 40])
In [18]: b = np.arange(4) # create an array of 4 integers, from 0 to 3
In [19]: b
Out[19]: array([0, 1, 2, 3])
In [20]: np.linspace(-np.pi, np.pi, 5) # create an array of 5 evenly spaced samples from -pi to pi
Out[20]: array([-3.14159265, -1.57079633, 0. , 1.57079633, 3.14159265])
In [21]: np.logspace(1,3,9) # create a log-spaced array of 9 floats between (and including) 10 and 1000
Out[21]:
array([ 10. , 17.7827941 , 31.6227766 , 56.23413252,
100. , 177.827941 , 316.22776602, 562.34132519,
1000. ])
There is also a submodule np.random which allows you to create simple random arrays:
In [22]: np.random.uniform(size=5,low=-5,high=5)
Out[22]: array([-0.1787103 , -0.67993912, -1.97491323, -1.34603415, 2.22377329])
In [23]: np.random.normal(size=6,loc=1,scale=4)
Out[23]:
array([ 1.16464237, 3.10491595, -2.21260085, 1.34827415, 3.4720418 ,
2.83764875])
Tip
You can set a seed via np.random.seed(100) which takes an integer as an argument. Setting the seed guarantees the same set of random variables when repeating execution.
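A small sketch of what the seed does in practice: resetting it to the same integer reproduces the same draws, while continuing without a reset moves on to new numbers. The variable names are arbitrary.

```python
import numpy as np

np.random.seed(100)
first = np.random.uniform(size=3)

np.random.seed(100)                 # reset to the same seed
second = np.random.uniform(size=3)  # same numbers as `first`

third = np.random.uniform(size=3)   # no reset: the sequence simply continues

print(np.array_equal(first, second))
print(np.array_equal(first, third))
```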
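The function arange is best used only with integer arguments. With a non-integer step, floating-point rounding can change the number of elements returned; use linspace in that case.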
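The following sketch contrasts the two functions. With integer arguments the result of arange is easy to predict; with a float step, whether the end point appears depends on rounding, so no output is claimed for that line here. The specific values 0.1 and 0.4 are just an example.

```python
import numpy as np

# Integer arguments: the length is unambiguous.
ints = np.arange(0, 10, 2)
print(ints)

# Non-integer step: the number of elements depends on floating-point
# rounding of (stop - start) / step, so it is hard to predict by eye.
floats = np.arange(0.1, 0.4, 0.1)
print(floats, len(floats))

# linspace takes the number of samples explicitly and includes both ends.
samples = np.linspace(0.1, 0.4, 4)
print(samples, len(samples))
```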
New arrays can be obtained by operating with existing arrays:
In [24]: a + b**2 # elementwise operations
Out[24]: array([10, 21, 34, 49])
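Other operators and numpy functions work element by element in the same way. The sketch below reuses the arrays a and b defined above; the particular operations are chosen only for illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)

product = a * b        # elementwise product, not a dot product
quarter = a / 4.0      # dividing by a float gives a float array
large = a > 25         # comparisons give a boolean array
roots = np.sqrt(a)     # numpy functions also act on every element

print(product)
print(quarter)
print(large)
print(roots)
```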
There are shortcuts to fill arrays with ones or zeros, or create arrays just like another one, but filled with ones or zeros:
In [25]: f = np.ones(3) # float array of ones
In [26]: g = np.zeros(4, dtype=int) # int array of zeros
In [27]: i = np.ones_like(g) # array of ones with same length/type as g
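As a quick check of what the "_like" functions inherit, this sketch prints the data type of each result. The array h is an extra example that does not appear in the session above.

```python
import numpy as np

g = np.zeros(4, dtype=int)               # int array of zeros
i = np.ones_like(g)                      # ones, same length and type as g
h = np.zeros_like(np.linspace(0, 1, 3))  # zeros, same length and type as a float array

print(i, i.dtype)
print(h, h.dtype)
```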
Exercise: Create a squared arctan curve
Create a squared arctan curve between -4*pi and 4*pi sampling 100 points. Note that the value of pi is in the numpy namespace (np.pi).
Solution:
In [28]: x = np.linspace(-4*np.pi, 4*np.pi, 100)
In [29]: y = np.arctan(x)**2
Lists and arrays behave differently!
Arithmetic:
In [30]: mylist = [1,2,3]
In [31]: myarray = np.array([1,2,3])
In [32]: mylist*2
Out[32]: [1, 2, 3, 1, 2, 3]
In [33]: myarray*2
Out[33]: array([2, 4, 6])
Manipulations: lists are easy to modify. Using .append() and .remove() makes them longer or shorter efficiently. Numpy arrays are not meant to be resized. There is no real alternative to .remove(), though np.append (or np.hstack) can stand in for .append(). These are very costly, however, because they build a new array on every call, and should be avoided in loops. If you don’t know the length of your array in advance, it is often better to first create a list with .append(), and turn that into an array after you’re done.
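The "list first, array afterwards" pattern can be sketched as follows. The stopping rule in the loop is arbitrary and only serves to make the final length unknown in advance.

```python
import numpy as np

# Collect values in a list while the final length is still unknown ...
values = []
x = 1.0
while x < 100:  # arbitrary stopping rule, for illustration only
    values.append(x)
    x *= 3

# ... and convert to an array once, at the end.
arr = np.array(values)
print(arr)

# np.append returns a new, longer array and leaves the original untouched,
# which is why calling it repeatedly inside a loop is costly.
bigger = np.append(arr, 243.0)
print(arr.size, bigger.size)
```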
But they are similar in some ways:
Indexing and slicing (though numpy is much more powerful - but we’ll get to that later):
In [34]: mylist = [1,2,3,4,5,6,7,8]
In [35]: myarray = np.array([1,2,3,4,5,6,7,8])
In [36]: mylist[1::2]
Out[36]: [2, 4, 6, 8]
In [37]: myarray[1::2]
Out[37]: array([2, 4, 6, 8])
Exercise: Sorting and reversing
Create a random array of size 100 (drawn from a Poisson distribution with numpy's default rate, lam=1), sort it, reverse it, and print the second-to-last element of the result.
Solution:
In [38]: x = np.random.poisson(size=100)
In [39]: x = np.sort(x)
In [40]: x = x[::-1]
In [41]: print(x[-2])
0
This is a non-exhaustive list of useful commands you might want to consider when creating arrays:
linspace | Return evenly spaced numbers over a specified interval. |
arange | Return evenly spaced values within a given interval. |
logspace | Return numbers spaced evenly on a log scale. |
meshgrid | Return coordinate matrices from two or more coordinate vectors. |
zeros | Return a new array of given shape and type, filled with zeros. |
zeros_like | Return an array of zeros with the same shape and type as a given array. |
ones | Return a new array of given shape and type, filled with ones. |
ones_like | Return an array of ones with the same shape and type as a given array. |