NumPy basics

This page introduces numpy, the scientific number-crunching package. It covers only the basics of the basics. The goal is to justify its use and to draw the distinction between arrays and lists. After going through this page, the reader should understand the concept of numpy arrays, and be able to create arrays from scratch and do calculations with them. We stick to 1D arrays here; the extension to multidimensional arrays is left for a later tutorial.

Why Numpy?

Python’s ease-of-use often comes at a price: speed. Let’s try to compute the sine of 20 million (uniform) random floats using Python’s standard modules, and time it.

In [1]: import time # wall-clock timing via time.time()

In [2]: import math # standard module implementing mathematical functions

In [3]: import random # generate random numbers

In [4]: import numpy as np

In [5]: t0 = time.time()

In [6]: output = []

In [7]: for i in range(20000000):
   ...:     output.append(math.sin(random.random()))
   ...: 

In [8]: print(time.time()-t0) # in seconds
7.37354803085

We can gain ~50% with some tricks (list comprehension, generator and local variables):

In [9]: sin,rand = math.sin,random.random

In [10]: t0 = time.time()

In [11]: output = [sin(rand()) for i in xrange(20000000)] # in Python 3, replace xrange with range

In [12]: print(time.time()-t0)
3.45774412155

With numpy, we can cut the run time by roughly another 80% (a factor of about five), and have a much cleaner implementation:

In [13]: t0 = time.time()

In [14]: output = np.sin(np.random.uniform(size=20000000))

In [15]: print(time.time()-t0)
0.719166040421

A pure FORTRAN program is, however, still almost 50% faster than numpy.

Basically, numpy provides vectorized functions written in C or FORTRAN that can act on pure Python objects, with a little bit of function-call overhead. Most of the looping is done in C or FORTRAN, avoiding the expensive for loops in pure Python.
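
To make the idea concrete, here is a minimal sketch that applies the sine to the same input twice, once with an explicit Python loop and once with a single vectorized call. The array size and the names `values`, `looped` and `vectorized` are chosen for illustration only, and the timings depend on the machine, so none are quoted here.

```python
import math
import time

import numpy as np

# Illustrative input: uniform random floats in [0, 1)
values = np.random.uniform(size=200_000)

# Explicit Python loop: one math.sin call per element
t0 = time.perf_counter()
looped = [math.sin(v) for v in values]
loop_seconds = time.perf_counter() - t0

# Vectorized call: the loop over the elements runs inside numpy's compiled code
t0 = time.perf_counter()
vectorized = np.sin(values)
vectorized_seconds = time.perf_counter() - t0

print(loop_seconds, vectorized_seconds)
print(np.allclose(looped, vectorized))  # both approaches give the same numbers
```

The two results agree to floating-point precision; only the place where the loop is executed differs.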

Sometimes doing things in a vectorized way is not possible or just too confusing. Vectorization is more an art than a science, so the basic answer is that if it runs fast enough then you are good to go. Otherwise things need to be vectorized or maybe coded in C or Fortran (see Optimization).

Making arrays

Arrays can be created in different ways:

In [16]: a = np.array([10, 20, 30, 40])   # create an array from a list of values

In [17]: a
Out[17]: array([10, 20, 30, 40])

In [18]: b = np.arange(4)                 # create an array of 4 integers, from 0 to 3

In [19]: b
Out[19]: array([0, 1, 2, 3])

In [20]: np.linspace(-np.pi, np.pi, 5)      # create an array of 5 evenly spaced samples from -pi to pi
Out[20]: array([-3.14159265, -1.57079633,  0.        ,  1.57079633,  3.14159265])

In [21]: np.logspace(1,3,9) # create a log-spaced array of 9 floats between (and including) 10 and 1000
Out[21]: 
array([   10.        ,    17.7827941 ,    31.6227766 ,    56.23413252,
         100.        ,   177.827941  ,   316.22776602,   562.34132519,
        1000.        ])

There is also a submodule np.random which allows you to create simple random arrays:

In [22]: np.random.uniform(size=5,low=-5,high=5)
Out[22]: array([-0.1787103 , -0.67993912, -1.97491323, -1.34603415,  2.22377329])

In [23]: np.random.normal(size=6,loc=1,scale=4)
Out[23]: 
array([ 1.16464237,  3.10491595, -2.21260085,  1.34827415,  3.4720418 ,
        2.83764875])

Tip

You can set a seed via np.random.seed(100), which takes an integer as an argument. Setting the seed guarantees the same sequence of random numbers each time the code is executed.
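
As a small illustration of the effect of seeding, the sketch below draws three numbers, resets the seed to the same value, and draws again. The names `first` and `second` are illustrative.

```python
import numpy as np

np.random.seed(100)                    # fix the state of the global generator
first = np.random.uniform(size=3)

np.random.seed(100)                    # reset to the same state
second = np.random.uniform(size=3)

print(np.array_equal(first, second))   # the two draws are identical
```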

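The function arange is best used only with integer arguments. With a floating-point step, rounding errors make the number of elements hard to predict; use linspace in that case.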
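
The following sketch shows why floating-point steps are awkward with arange. The particular interval (1.0 to 1.3 in steps of 0.1) is just an example; the number of elements arange returns depends on floating-point rounding, so no length is quoted here.

```python
import numpy as np

# With a float step, the number of elements depends on floating-point rounding,
# so the end point may or may not show up in the result.
risky = np.arange(1.0, 1.3, 0.1)

# linspace takes the number of samples directly and includes both end points.
safe = np.linspace(1.0, 1.3, 4)

print(len(risky), risky)
print(len(safe), safe)
```

With linspace the length is always the number of samples requested, which makes it the safer choice for non-integer grids.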

New arrays can be obtained by operating with existing arrays:

In [24]: a + b**2            # elementwise operations
Out[24]: array([10, 21, 34, 49])

There are shortcuts to fill arrays with ones or zeros, or create arrays just like another one, but filled with ones or zeros:

In [25]: f = np.ones(3)              # float array of ones

In [26]: g = np.zeros(4, dtype=int)  # int array of zeros

In [27]: i = np.ones_like(g)         # array of ones with same length/type as g
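
To check what these shortcuts return, one can inspect the dtype and shape attributes of the results. This is a minimal sketch that reuses the names f, g and i from the session above.

```python
import numpy as np

f = np.ones(3)               # default dtype is float
g = np.zeros(4, dtype=int)   # explicitly requested integer dtype
i = np.ones_like(g)          # ones, with the shape and dtype of g

print(f.dtype, g.dtype, i.dtype)
print(i.shape == g.shape)
```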

Exercise: Create a squared arctan curve

Create a squared arctan curve between -4*pi and 4*pi sampling 100 points. Note that the value of pi is in the numpy namespace (np.pi).

Solution:

In [28]: x = np.linspace(-4*np.pi, 4*np.pi, 100)

In [29]: y = np.arctan(x)**2

The difference between lists and arrays:

Lists and arrays behave differently!

  • Arithmetic:

    In [30]: mylist = [1,2,3]
    
    In [31]: myarray = np.array([1,2,3])
    
    In [32]: mylist*2
    Out[32]: [1, 2, 3, 1, 2, 3]
    
    In [33]: myarray*2
    Out[33]: array([2, 4, 6])
    
  • Manipulations: lists are easy to modify: .append() and .remove() lengthen or shorten them in place. Numpy arrays are not meant to be resized. There is no direct counterpart to .remove(), and the counterparts to .append(), namely np.append (or np.hstack), build a complete new array on every call. They are therefore very costly and should be avoided in loops. If you don’t know the length of your array in advance, it is often better to first create a list with .append(), and turn that into an array after you’re done.
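
The list-then-convert advice can be sketched as follows. The loop that doubles `value` is only a stand-in for any computation whose length is not known beforehand. The second pattern, preallocating with np.zeros when the length is known, is not described in the text above and is included here as a common alternative.

```python
import numpy as np

# Pattern 1: length unknown in advance -> grow a list, convert once at the end
collected = []
value = 1
while value < 1000:          # stand-in for any loop of unknown length
    collected.append(value)
    value *= 2
powers = np.array(collected)

# Pattern 2: length known in advance -> allocate once, then fill in place
squares = np.zeros(5, dtype=int)
for k in range(5):
    squares[k] = k**2

print(powers)
print(squares)
```

Both patterns avoid calling np.append inside the loop, so the array is created only once.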

But they are similar in some ways:

  • Indexing and slicing (though numpy is much more powerful - but we’ll get to that later):

    In [34]: mylist = [1,2,3,4,5,6,7,8]
    
    In [35]: myarray = np.array([1,2,3,4,5,6,7,8])
    
    In [36]: mylist[1::2]
    Out[36]: [2, 4, 6, 8]
    
    In [37]: myarray[1::2]
    Out[37]: array([2, 4, 6, 8])
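
One difference hides behind the identical slicing syntax: slicing a list makes an independent copy, whereas a basic slice of a numpy array is a view onto the same data. The sketch below illustrates this; the names `list_slice` and `array_slice` are illustrative.

```python
import numpy as np

mylist = [1, 2, 3, 4, 5, 6, 7, 8]
myarray = np.array(mylist)

list_slice = mylist[1::2]      # a new, independent list
array_slice = myarray[1::2]    # a view onto myarray's data

list_slice[0] = 99
array_slice[0] = 99

print(mylist[1])     # the original list is unchanged
print(myarray[1])    # the original array sees the change
```

If an independent array is needed, calling .copy() on the slice gives one.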
    

Exercise: Sorting and reversing

Create a random array of size 100 drawn from a Poisson distribution (with the default rate of 1), sort it, reverse it, and print the second-to-last element of the result.

Solution:

In [38]: x = np.random.poisson(size=100)

In [39]: x = np.sort(x)

In [40]: x = x[::-1]

In [41]: print(x[-2])
0

Go back to the matplotlib tutorial.

Summary

This is a non-exhaustive list of useful commands you might want to consider when creating arrays:

linspace      Return evenly spaced numbers over a specified interval.
arange        Return evenly spaced values within a given interval.
logspace      Return numbers spaced evenly on a log scale.
meshgrid      Return coordinate matrices from two or more coordinate vectors.
zeros         Return a new array of given shape and type, filled with zeros.
zeros_like    Return an array of zeros with the same shape and type as a given array.
ones          Return a new array of given shape and type, filled with ones.
ones_like     Return an array of ones with the same shape and type as a given array.

Copyright: Smithsonian Astrophysical Observatory under terms of CC Attribution 3.0 Creative Commons License