Download this page as: - :download:`a commented Python script ` - :download:`a minimal Python script ` .. _`loadtxt`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt .. _`lombscargle`: http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lombscargle.html#scipy.signal.lombscargle .. _`lstsq`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html .. _`argsort`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html .. _`pyfits`: http://www.stsci.edu/institute/software_hardware/pyfits .. _`urllib`: http://docs.python.org/2/library/urllib.html .. _`np.linspace()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html .. _`np.logspace()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.logspace.html .. _`np.arange()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html#numpy.arange .. _`np.meshgrid()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html .. _`np.ones()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html .. _`np.ones_like()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ones_like.html#numpy.ones_like .. _`np.zeros()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html .. _`np.zeros_like()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros_like.html#numpy.zeros_like .. _`np.array()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html .. _`dtype`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.dtype.html .. _`shape`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html .. _`np.mean()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html .. _`np.average()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.average.html .. _`np.median()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.median.html .. _`np.std()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html .. _`np.max()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.amax.html .. _`np.min()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html .. _`np.argmax()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html .. _`np.argmin()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html .. _`np.ptp()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ptp.html .. _`np.reshape()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html .. _`np.hstack()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html .. _`np.vstack()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html .. _`np.dstack()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.dstack.html .. _`np.column_stack()`: http://docs.scipy.org/doc/numpy/reference/generated/numpy.column_stack.html .. contents:: :depth: 2 Frequency Analysis ==================== NumPy is at the core of nearly every scientific Python application or module since it provides a fast N-d array datatype that can be manipulated in a vectorized form. This will be familiar to users of IDL or Matlab. NumPy has a good and systematic `basic tutorial `_ available. It is highly recommended that you read this tutorial to fill in the gaps left by this workshop. In this tutorial, we are going to take a real observed light curve of the :math:`\beta` Cephei star HD129929 (after `Aerts et al. 2004 `_), and fit a sine through the data: +-------------------------------+-------------------------------+ | .. image:: timeseries.png | .. image:: phasediagram.png | | :scale: 50 | :scale: 50 | +-------------------------------+-------------------------------+ We assume that the data :math:`F_i (i=0,N)` can be represented by .. math:: F_i = A \sin(\omega t_i + \phi) We will model the datapoints with a linear least squares. Note that we can write the problem also as .. math:: F_i = a \sin(\omega t_i) + b\cos(\omega t_i) with .. math:: A = \sqrt{a^2 + b^2} \phi = \mathrm{arctan} \left(\frac{a}{b}\right) If we can remove the dependency on the frequency :math:`\omega`, which is an unknown and nonlinear parameter, the system is linear and we can look for the least-squares solution of the equation .. math:: C X = B That is, the vector :math:`X` (2x1) that minimizes :math:`||B-CX||^2`, with :math:`C` the Nx2 matrix .. math:: C = \left(\begin{array}{cc} \sin(\omega t_1) & \cos(\omega t_1)\\ \sin(\omega t_2) & \cos(\omega t_2)\\ \vdots & \vdots\\ \sin(\omega t_N) & \cos(\omega t_N)\\ \end{array}\right) and :math:`B` (Nx1) .. math:: B = \left(\begin{array}{c} F_1 \\ \vdots \\ F_N \end{array}\right) The errors can be estimated as the square root of the diagonal elements of the covariance matrix, once the Jacobian :math:`J` and sum of squares :math:`SS` is calculated: .. math:: Cov = \frac{SS}{N-2} \left( [J^T J]^{-1} \right) From a statistical sense, this is not the optimal way of modelling the dataset. It is however, still very useful and traditionally widely applied. One only needs to be careful on how to interpret the results, the derived parameters and their errors. What's new ----------- We'll introduce or further explore: - numpy and scipy: fitting a sine to an observed time series - numpy: creating N-dimensional arrays - numpy: indexing arrays - `pyfits`_: a package for reading and writing FITS files - `urllib`_: a standard library providing common functionality to interface with the internet Setup -------- We will use a real FITS-format dataset that we download straight from the ViZieR service. For this, we'll need the `urllib`_ standard library. The dataset is contained in the catalog `J/A+A/415/241/table1 `_: .. ipython:: In [1]: import urllib In [1]: cat = 'J/A+A/415/241/table1' In [1]: uri = 'http://vizier.u-strasbg.fr/viz-bin/asu-binfits/VizieR?&-source={}&-out.all'.format(cat) In [1]: url = urllib.URLopener() In [1]: filen,msg = url.retrieve(uri,filename='cat.fits') In [1]: url.close() Reading the data ------------------ The FITS-file can easily be read in using `pyfits`_: .. ipython:: In [1]: import pyfits In [1]: hdus = pyfits.open('cat.fits') In [1]: hdus In [1]: hdus.info() Type ``?hdus`` to get a little more detail on the ``hdus`` object. Since this FITS file does not contain an image, only the first table extension (second HDU) is important: .. ipython:: In [1]: table = hdus[1].data .. raw:: html

Click to Show/Hide Tip: pyFITS

The HDUList from a FITS file acts like an ordered dictionary: instead of accessing via an index ``n``, you can also access the extensions by *name*: .. ipython:: In [1]: table = hdus['J_A_A_415_241_TABLE1'].data You can find (and set) the name of an extension throught the header keyword ``EXTNAME`` (case-insensitive). .. ipython:: In [1]: hdus[1].header['extname'] A frequently used convention for images and other telescope data, however, is to store scientific data in an extension named ``SCI``. For more information on FITS files and pyFITS, see the `pyFITS documentation `_ Even if the primary HDU contains no data, the header often contains useful information: .. ipython:: In [1]: hdus[0].header .. raw:: html
Let's have a look at the columns of the data. A lot of information is contained within the header of the table HDU: .. ipython:: In [1]: hdus[1].header But if you're only interested in the column names, you can easily retrieve them with .. ipython:: In [1]: hdus[1].columns.names The two columns we are interested in are the times (``HJD``) and the visual magnitudes (``Vmag``). so it is useful to create two new variables for them. Since we don't need the FITS file anymore, we can close it to free up some memory. .. ipython:: In [1]: hjd = hdus[1].data.field('HJD') In [1]: vmag = hdus[1].data.field('Vmag') In [1]: hdus.close() The variables ``hjd`` and ``vmag`` are *arrays*. They have a shape and a type: .. ipython:: In [1]: hjd.shape In [1]: vmag.dtype .. raw:: html

More on data types

In constrast to a list, all the elements in an array need to be of the same type. That data type is given by the `dtype `_ property of an array. Numpy understand the basic Python types ``bool``, ``int``, ``float``, ``str`` and ``complex``, and will always convert one type to another if more precision is required. So in almost all cases, a user shouldn't worry too much about these, unless memory size is important. But under the hood numpy supports a much greater variety of data types than Python does, e.g. because of the difference between 32-bit and 64-bit floats. Here is a `list of some of the dtypes `_, and their string representation: - ``np.int16``: 16-bit integer (``i2``) - ``np.int32``: 32-bit integer (``i4`` or ``i``) - ``np.int64``: 64-bit integer (``i8``) - ``np.uint``: unsigned 64-bit integer (``u8``) - ``np.float32``: 32-bit float (``f4``) - ``np.float64``: 64-bit float (``f8``) - ``np.complex128``: 128-bit complex numbe (``c16``) Also the byte order can specified (big-endian (``>``) or little-endian (``<``)), which can be important if you get the arrays from a binary file. Personal experience has showed me that often doing a simple:: data = np.array(data,float) Can solve some problems when numpy arrays from FITS files are passed on to C or Fortran libraries. Actually, there is one more data type, called ``object``: this makes it possible to put custom objects in an array and calculate with them. .. raw:: html
.. admonition:: Exercise: Array properties Figure out all the methods owned by these numpy arrays. Find out the totale time span of the data, and the mean visual magnitude. .. raw:: html

Click to Show/Hide Solution

:: hjd. x.T x.argsort x.clip x.ctypes x.dot x.flags x.item x.min x.prod x.repeat x.setfield x.squeeze x.take x.transpose x.all x.astype x.compress x.cumprod x.dtype x.flat x.itemset x.nbytes x.ptp x.reshape x.setflags x.std x.tofile x.var x.any x.base x.conj x.cumsum x.dump x.flatten x.itemsize x.ndim x.put x.resize x.shape x.strides x.tolist x.view x.argmax x.byteswap x.conjugate x.data x.dumps x.getfield x.max x.newbyteorder x.ravel x.round x.size x.sum x.tostring x.argmin x.choose x.copy x.diagonal x.fill x.imag x.mean x.nonzero x.real x.searchsorted x.sort x.swapaxes x.trace .. ipython:: In [1]: hjd.ptp() In [1]: vmag.mean() .. raw:: html
.. admonition:: Exercise: Plotting arrays Plot the visual magnitudes versus time, using black filled circles. .. raw:: html

Click to Show/Hide Solution

:: hjd. x.T x.argsort x.clip x.ctypes x.dot x.flags x.item x.min x.prod x.repeat x.setfield x.squeeze x.take x.transpose x.all x.astype x.compress x.cumprod x.dtype x.flat x.itemset x.nbytes x.ptp x.reshape x.setflags x.std x.tofile x.var x.any x.base x.conj x.cumsum x.dump x.flatten x.itemsize x.ndim x.put x.resize x.shape x.strides x.tolist x.view x.argmax x.byteswap x.conjugate x.data x.dumps x.getfield x.max x.newbyteorder x.ravel x.round x.size x.sum x.tostring x.argmin x.choose x.copy x.diagonal x.fill x.imag x.mean x.nonzero x.real x.searchsorted x.sort x.swapaxes x.trace .. ipython:: In [1]: import matplotlib.pyplot as plt In [1]: plt.plot(hjd,vmag,'ko') .. image:: timeseries.png :scale: 50% .. raw:: html
Calculating with arrays ------------------------- The use of lists in scientific scripting is very limited: they are well suited to store all kinds of information, but they are *extremely* slow when used in calculations. Numpy arrays are the solution to this problem. They can be easily used in calculations, where one has to remember that all the default operations on array are *element-wise*: .. ipython:: In [1]: import numpy as np In [1]: a = np.array([10,20,30,40]) In [1]: b = np.arange(4) In [1]: a + b**2 # elementwise operation Arrays may have more than one dimension: .. ipython:: In [1]: f = np.ones([3, 4]) # 3 x 4 float array of ones In [1]: f In [1]: g = np.zeros([2, 3, 4], dtype=int) # 3 x 4 x 5 int array of zeros In [1]: i = np.zeros_like(f) # array of zeros with same shape/type as f You can change the dimensions of existing arrays: .. ipython:: In [1]: w = np.arange(12) In [1]: w.shape = [3, 4] # does not modify the total number of elements In [1]: w In [1]: x = np.arange(5) In [1]: y = x.reshape(5, 1) In [1]: y = x.reshape(-1, 1) # Same thing but NumPy figures out correct length And you can stack arrays horizontally, vertically, as columns, and make a coordinate grid out of two arrays: .. ipython:: In [1]: x = np.linspace(0,1,3) In [1]: y = np.linspace(0,10,3) In [1]: np.hstack([x,y]) # shape (6,) In [1]: np.vstack([x,y]) # shape (2,3) In [1]: np.column_stack([x,y]) # shape (3,2) In [1]: xgrid, ygrid = np.meshgrid(x,y) # 2x shape (3,3) In [1]: print(xgrid) In [1]: print(ygrid) .. admonition:: Exercise: Make a ripple Calculate a surface ``z = cos(r) / (r + 5)`` where ``r = sqrt(x**2 + y**2)``. Set ``x`` and ``y`` to an array that goes from -20 to 20 with 160 elements. Use `np.meshgrid` to generate a coordinate array. Use `plt.imshow` to display the image of ``z``. .. raw:: html

Click to Show/Hide Solution

.. ipython:: In [1]: x = np.linspace(-20,20,160) In [1]: y = np.linspace(-20,20,160) In [1]: x,y = np.meshgrid(x,y) # converts x and y into 2D arrays representing coordinates In [1]: r = np.sqrt(x**2 + y**2) In [1]: z = np.cos(r) / (r + 5) In [1]: import matplotlib.pyplot as plt In [1]: plt.imshow(z) .. image:: ripple.png :scale: 50 .. raw:: html
Finding a frequency ---------------------- Now we want to compute a frequency spectrum. Since the time series is non-equidistant, we cannot easily use the Fast Fourier Transform implemented in `numpy's FFT pack `_. Luckily, scipy provides the solution within the `scipy.signal `_ package, where a fast implementation of the `Lomb-Scargle periodogram `_ is available: .. ipython:: In [1]: import scipy.signal Let's search in the frequency interval between 6.2 and 7.2 cy/d, with a frequency resolution equal to one-fifth of the inverse time span (note that `lombscargle`_ needs angular frequencies): .. ipython:: In [1]: df = 0.2 / hjd.ptp() In [1]: freq_interval = np.arange(6.2, 7.2, df)*2*np.pi Now we can compute the periodogram, after we subtracted the mean from the magnitude array: .. ipython:: In [1]: vmag = vmag - vmag.mean() In [1]: pgram = scipy.signal.lombscargle(hjd,vmag,freq_interval) Oops! What happened here? Apparently, the Lomb-Scargle periodogram fails to compute, and it has something to do with the type of the array. The Lomb-Scargle function is picky about the dtype of the array it receives. Since we loaded the times and magnitudes from a FITS record, these arrays (can) have a non-standard data type. Although this looks more like an omission in the Lomb-Scargle function (it should check for this and cast arrays to the right format), this is easily fixed by converting them to normal float arrays manually: .. ipython:: In [1]: hjd = np.array(hjd,float) In [1]: vmag = np.array(vmag,float) In [1]: pgram = scipy.signal.lombscargle(hjd,vmag,freq_interval) .. admonition:: Exercise: Normalise the periodogram The periodogram from scipy's `lombscargle`_ is a power spectrum, not an amplitude spectrum. Look at the documentation to figure out how to normalise it to an amplitude spectrum. Perform the normalisation and make a plot of the amplitude spectrum versus cyclical frequency, using a black connected line. .. raw:: html

Click to Show/Hide Solution

.. ipython:: In [1]: Ndata = len(hjd) # equivalent to Ndata = hjd.shape[0] In [1]: norm = np.sqrt(4.0 * pgram / Ndata) In [1]: freq_interval_cy = freq_interval/(2*np.pi) In [1]: plt.plot(freq_interval_cy,norm,'k-') .. image:: periodogram.png :scale: 50% .. raw:: html
The frequency that is most likely present in the data is the one that has the highest peak in the periodogram. We can easily retrieve it via: .. ipython:: In [1]: freq = freq_interval[np.argmax(pgram)] In [1]: print(freq/(2*np.pi)) The function ``argmax`` returns the index of the maximum value. We use that index to recover the frequency at that index. The value from the paper is 6.461699 cy/d. Fitting a sine --------------- Now let us create the array :math:`C` as described in the introduction and call the least square solver `lstsq`_ from the linear algebra module: .. ipython:: In [1]: C = np.column_stack([np.sin(freq*hjd), np.cos(freq*hjd)]) In [1]: X, sumsq, rank, s = np.linalg.lstsq(C,vmag) Now retrieve the value for the amplitude and the phase (see introduction): .. ipython:: In [1]: amplitude = np.sqrt(X[0]**2 + X[1]**2) In [1]: phase = np.arctan2(X[1],X[0]) In [1]: fit = amplitude * np.sin(freq*hjd + phase) A phasediagram can be a good way of evaluating the fit by eye. The phases are easily calculated and plotted with: .. ipython:: In [1]: phased = 2*np.pi * (hjd % (2*np.pi/freq)) # frequency in rad/s, but time in s! In [1]: plt.plot(phased,vmag,'ko'); The fit, however, we want to plot as a connected line. To do this, we need to sort the phases, as the order of the phases is still chronological: .. ipython:: In [1]: hjd In [1]: phased Indexing ------------ We've seen above that we can easily select an element from an array, and even ranges with strides. Indexing can also be done with **integer** arrays and **boolean** arrays. It is easiest to understand integer indexing when seen in action: .. ipython:: In [1]: x = np.linspace(0,1,11) In [1]: print(x) In [1]: index = np.array([0,1,2,3,4]) In [1]: print(x[index]) In [1]: index = np.array([0,3,1,4,2]) In [1]: print(x[index]) You can use this to sort one array with another array via numpy's `argsort`_: .. ipython:: In [1]: x = np.random.uniform(size=5) In [1]: x In [1]: index = np.argsort(x) In [1]: index In [1]: x[index] Alternatively, it is possible to index arrays with boolean arrays, allowing conditional selection. Again, seeing it in action clarifies things: .. ipython:: In [1]: x = np.linspace(0,1,11) In [1]: x In [1]: index = np.array([True,False,True,True]) In [1]: x[index] You can use this for example to select all positive numbers from an array: .. ipython:: In [1]: x = np.random.normal(size=5) In [1]: x In [1]: is_positive = x > 0 In [1]: is_positive In [1]: x[is_positive] You can combine conditions element-wise with the operators ``&`` (and) and ``|`` (or). We can use both methods to plot the fit as a thick connected red line, and the negative values in the first half of the phase as a (very) thick blue line: .. ipython:: In [1]: sortarray = np.argsort(phased) In [1]: select = (fit[sortarray] < 0) & (phased[sortarray] < 0.5) In [1]: plt.plot(phased[sortarray], fit[sortarray], 'r-',lw=2); In [1]: plt.plot(phased[sortarray][select], fit[sortarray][select], 'b-',lw=4); .. image:: phasediagram.png :scale: 50% .. raw:: html Calculating errors ------------------- To compute the errors, we first need to compute the Jacobian, which contains the partial derivatives of the fitting function: .. ipython:: In [1]: J = np.column_stack([np.sin(freq*hjd + phase), amplitude * np.cos(freq*hjd + phase)]) In [1]: cov = sumsq / (Ndata - 2) * np.linalg.inv(np.dot(J.T, J)) In [1]: e_amplitude = np.sqrt(cov.diagonal()[0]) In [1]: e_phase = np.sqrt(cov.diagonal()[1]) In [1]: print("Amplitude = {:.4f} +/- {:.4f} mag".format(amplitude,e_amplitude)) In [1]: print("Phase = {:.2f} +/- {:.2f} rad".format(phase,e_phase)) Wrapping up -------------- An uninterrupted version of what we've done:: import numpy as np import matplotlib.pyplot as plt import scipy.signal import urllib import pyfits # Download data cat = 'J/A+A/415/241/table1' uri = 'http://vizier.u-strasbg.fr/viz-bin/asu-binfits/VizieR?&-oc=deg,eq=J2000&-source={}&-out.all'.format(cat) url = urllib.URLopener() filen,msg = url.retrieve(uri,filename='cat.fits') url.close() # Read the FITS file with pyfits.open('cat.fits'): hdus = pyfits.open('cat.fits') hjd = np.array(hdus[1].data.field('HJD'),float) vmag = np.array(hdus[1].data.field('Vmag'),float) # Subtract mean from data vmag = vmag - vmag.mean() # Compute the periodogram and retrieve the frequency df = 0.2 / hjd.ptp() freq_interval = np.arange(6.2, 7.2, df)*2*np.pi pgram = scipy.signal.lombscargle(hjd,vmag,freq_interval) freq = freq_interval[np.argmax(pgram)] # Perform the linear fit C = np.column_stack([np.sin(freq*hjd), np.cos(freq*hjd)]) X, sumsq, rank, s = np.linalg.lstsq(C,vmag) # Retrieve the parameters amplitude = np.sqrt(X[0]**2 + X[1]**2) phase = np.arctan2(X[1],X[0]) fit = amplitude * np.sin(freq*hjd + phase) # Compute the errors Ndata = len(hjd) J = np.column_stack([np.sin(freq*hjd + phase), amplitude * np.cos(freq*hjd + phase)]) cov = sumsq / (Ndata - 2) * np.linalg.inv(np.dot(J.T, J)) e_amplitude = np.sqrt(cov.diagonal()[0]) e_phase = np.sqrt(cov.diagonal()[1]) # Report results print("Frequency = {:.6f} cy/d".format(freq/(2*np.pi))) print("Amplitude = {:.4f} +/- {:.4f} mag".format(amplitude,e_amplitude)) print("Phase = {:.2f} +/- {:.2f} rad".format(phase,e_phase)) Note the use of the ``with`` statement instead of the ``open`` / ``close`` paradigm. Summary ----------- **Array properties** Access via:: myarray. ========================= ========================================== `dtype`_ Data-type of the array’s elements. `shape`_ Tuple of array dimensions. ========================= ========================================== **Basic statistics** Some of the functions below have duplicate definitions as array properties. This is somewhat confusing, so it is advised to always use:: mymean = np.mean(myarray) instead of:: mymean = myarray.mean() All array methods have function equivalents, but not vice versa. ========================= ======================================================================== `np.mean()`_ Compute the arithmetic mean along the specified axis. `np.average()`_ Compute the weighted average along the specified axis. `np.median()`_ Compute the median along the specified axis. `np.std()`_ Compute the standard deviation along the specified axis. `np.ptp()`_ Range of values (maximum - minimum) along an axis. `np.max()`_ Return the maximum of an array or maximum along an axis. `np.min()`_ Return the minimum of an array or minimum along an axis. `np.argmax()`_ Indices of the maximum values along an axis. `np.argmin()`_ Indices of the minimum values along an axis. ========================= ======================================================================== **Creating arrays** ========================= ======================================================================== `np.array()`_ Create an array `np.linspace()`_ Return evenly spaced numbers over a specified interval. `np.arange()`_ Return evenly spaced values within a given interval. `np.logspace()`_ Return numbers spaced evenly on a log scale. `np.meshgrid()`_ Return coordinate matrices from two or more coordinate vectors. `np.zeros()`_ Return a new array of given shape and type, filled with zeros. `np.zeros_like()`_ Return an array of zeros with the same shape and type as a given array. `np.ones()`_ Return a new array of given shape and type, filled with ones. `np.ones_like()`_ Return an array of ones with the same shape and type as a given array. `np.reshape()`_ Gives a new shape to an array without changing its data. ========================= ======================================================================== **Combining arrays** ========================= ============================================================== `np.hstack()`_ Stack arrays in sequence horizontally (column wise). `np.vstack()`_ Stack arrays in sequence vertically (row wise). `np.column_stack()`_ Stack 1-D arrays as columns into a 2-D array. `np.dstack()`_ Stack arrays in sequence depth wise (along third axis). ========================= ==============================================================