Solar home electricity dataset by Ausgrid

I've started exploring an open dataset that should be very useful for research on residential solar energy (for example studying energy self-consumption with a PV-battery system).

My Python code to manipulate this dataset and some preliminary studies (simple statistics) is freely available on Github: This data exploration code is extensively using pandas (for slicing, grouping, resampling, etc.).

About the dataset

The “Solar home electricity dataset” is made available by Ausgrid, an Australian electric utility which operates the distribution grid in Sidney and nearby areas.

This dataset contains a pretty rich record:

  • electricity consumption and the PV production
  • of 300 customers (in Sidney and its area),
  • over three years (July 2010 to June 2013),
  • with a 30 minutes timestep.

Here is a small extract of the PV production for 3 randomly chosen customers, over 3 days in July 2011:

Many more plots and statistics are given in the Jupyter notebooks available on the Github repository.

Locating postcodes (geocoding)

Also, the exploration of this dataset was an occasion to discover the Google Maps Geocoding API. Indeed, the location of each anonymous customer is given by a postcode only. To enable quantitative study of the spatiotemporal pattern in PV production, I've tried to locate this postcodes. The Python code for this is in the Postcodes location.ipynb Notebook. Map plotting is done with cartopy.

Extracted from this notebook, here is an overview of the locations of the postcodes present in the dataset (in Australia, NSW). Red rectangles are the boundaries of each postcode, as returned by Google maps (small in urban areas along the coast, gigantic otherwise).


I've started to play with the pretty nice Cozir CO2 sensor from GSS. This relates to research projects on air quality control.

For testing purpose, the sensor is connected to my computer through a USB-serial converter cable. In order to communicate  with the sensor (e.g. grab the CO2 concentration data), I've written a bit of Python code to wrap the low-level ASCII communication protocol into a higher level, more compact API.

For example, instead of exchanging byte codes, reading the temperature becomes:

For people that might be interested, all the details and the code are on the Github repository:

Also, the repo contains a basic logging program that can be customized to anyone's needs. Tested for now with Python 2.7, on Linux and Windows.


Python training at ENS Rennes

I just put online the material of my Python training class I taught yesterday for the first time to a group of ~15 science teachers at my school ENS Rennes. It is a new requirement of the French Education Ministry that these teachers (in Math, Physics and Engineering) will have to teach Programming and Numerical computing with Python, starting next academic year in September 2013. They can otherwise choose Scilab, which can be useful for playing with block diagrams.

Since the targeted audience was mainly composed of Mechanical Engineering teachers, I tried to anchor several examples in that field. Nothing "advanced" though, because I'm not a mechanical engineer ! And more importantly, those examples were meant to be ready-to-use, for teachers to be soon in front of their new students. I hope they found this useful !

Unicode Cheatsheet for Greek & Math

I often find it useful to add some mathematical or greek symbols in my scientific computing Python programs (mainly in commentary). Here is a selection I've taken from Gnome Character Map for math. Greek was mainly generated by a Python script...


plus-minus _ ±
multiplication _ ×
division _ ÷
power _  ² ³
root _ √ ∛
infinity _ ∞
integrals and sum _ ∫ ∬ ∑
partial diff.             _ ∂
increment/Laplace _ ∆ (different from Greek delta : Δ)
nabla                      _ ∇
expectation             _ ⟨⟩ [apparently the double-struck E symbol kills Worpress...]


  • equality            = ≈ ≠ ≡
  • inequality          < > ≤ ≥ ⩽ ⩾
  • proportional to     ∝
  • element of          ∈ ∉
  • subset of           ⊂ ⊄
  • quantifiers         ∀ ∃ ∄


  • integers            ℕ ℤ
  • real numbers        ℝ
  • complex numbers     ℂ
  • empty set           ∅


  • arrows : → ⟶ ⇒
  • maps to : ↦ ⟼


pairs of lower case and upper case letters (putting a dot on letters which have very similar latin counterparts)

α .
β .
γ Γ
δ Δ
ε .
ζ .
η .
θ Θ
. .
κ .
λ Λ
μ .
ν .
ξ Ξ
. .
π Π
ρ .
ς .
σ Σ
τ .
υ .
φ Φ (and also ϕ at U+03d5)
χ .
ψ Ψ
ω Ω

This greek table was first generated by these three lines of Python:

Better stats with Python thanks to Patsy

Patsy, a Python package for describing statistical models and building design matrices using  "formulas" : a great addition in the ecosystem of Python statistical packages !

The first release a very nice package called Patsy was announced last week by Nathaniel Smith. Quoting Nathaniel's message, Patsy is « a Python package for describing statistical models and building design matrices using "formulas". Patsy's formulas are inspired by and largely compatible with those used in R ».

I should add to this message that Patsy is also King Arthur's assistant in the Monty Python and The Holy Grail film. While I discovered this by chance, I doubt it is a coincidence...

This new module is a great addition to the emerging ecosystem of Python statistical packages which already includes statsmodels and pandas as flagships, the latter being very actively developed. A complete description of Patsy's functionalities is provided on its overview page, but the best description to me is the clearly written quickstart section. Not all modules enjoy such a nice documentation, so let's enjoy !

Open Terminal Here

Friday afternoon, I finally got tired of a bug in "Open Terminal Here", a Bash script I was using so far. I made a Python replacement...

Open Terminal Here ?

"Open Terminal Here" just a simple Bash script meant to be launched from a right-click within a directory opened in Nautilus, the Gnome file manager. As the name states, the gnome-terminal should be launched with its working directory being set to the currently opened directory. Simply handy! Only one problem: setting the  working directory was failing whenever the name of directory contained non-ASCII characters. 🙁

This "Open Terminal Here" is not a standard functionality of Nautilus (as opposed to Dolphin where you can just press Maj+F4 to launch a "konsole"). Rather, you have to download an extension script from G-Scripts, a nice central repository, and drop it in a quite buried directory (~/.gnome2/nautilus-scripts). When it comes to the "Open Terminal Here" function, there are several scripts available. Choosing between these is up to the visitor...

"Open Terminal Here" in Bash

Here is the central part of the Bash script I was using so far:

I had never paid attention to this code until Friday (especially since I have such a poor Bash understanding). I now realize that line 1 does most of the job, which is :

  1. retreive the $NAUTILUS_SCRIPT_CURRENT_URI environment variable.
  2. cut the "file://" prefix from the URI
  3. replace "%20" by real space characters " "

This Bash script works most of the time, except with directory names containing non-ASCII characters. Indeed, in a directory called "études notes" for example, Nautilus sets the URI variable to "file:///home/pierre/%C3%A9tudes%20notes". One can see how "é" gets encoded as "%C3%A9" and the Bash script doesn't perform any decoding.

Now "Open Terminal Here" with Python

In order to support non-ASCII, I tried to build my own script. I chose Python simply because that's the language I'm the most familiar with. The entire source is a GitHub Gist. Here are the main lines:

Well, it's basically the same as in Bash, except with a useful call to the urllib.unquote function which performs all the URI decoding job. All imports are from the Python standard library: great !

And by the way, with Perl !

While writing this post, I realized I had on my laptop a second "Open Terminal Here" script. I had forgotten about it because it was disabled for I don't know what reason. It's a Perl script, also coming from G-Script. Here are the main lines:

What can be said except that it also works ! The URI decoding is nicely performed by "chr(hex($1))". I actually think the author wrote this script for the very same reason I wrote a Python one: "he loves Perl!" 😉

Conclusion ?

I don't know if it's that useful to have three versions of a script to solve the very same tiny problem... Still, one can notice the vivid differences in the programming approach: while my Python scripts relies mostly function coming from standard modules, the Perl script is literally built on Regular Expressions. No surprise !

Finally, the Bash script just calls command line utilities (cut and sed), but doesn't decode the URI. Is there a command line program for that ?

And what about Haskell ?

I'll keep this for another day 😉 I need to get a Fistful of Monads first. Would be a nice experiment though...

Autoregressive processes : learning with an interactive tool

I recently got interested in second order autoregressive processes ("AR(2)" processes), after spending quite some time working only with first order ones... It was time to get an extra degree of freedom/complexity ! However, I didn't have much intuition about the behavior of these models and neither about how to "tune the knobs" that set this behavior so I spent some time on this subject.

For the sake of clarity, let's write down a definition/notation for an AR(2) process X_k:

X_k = a_1 X_{k-1} + a_2 X_{k-2} + Z_k

with the real numbers a1 and a2 being the two parameters that tune the "behavior" of the process. Z_k may be call the "input" in the context of linear filters, while "innovation" (i.e. a white noise input) would better fit in the context of random processes. A nice property of AR(2) processes, as opposed to AR(1) is their ability to capture slowly oscillating behaviors, like ocean waves. Also, within the AR(2) model, AR(1) is included as the special case a2 = 0.

Bits of a non-interactive study

In order to understand the interplay of the two parameters a1 and a2, I spent extensive time doing paper-based analytical/theoretical studies. I also created a IPython Notebook which I should publish when there is proper Notebook -> HTML conversion system (edit: now available on nbviewer). Most of the AR(2) behavior is governed by the characteristic polynomial P(z) = z^2 - a_1 z - a_2. This polynomial helps in showing that there are two regimes : the one with two real roots (when  a_2 > - a_1^2/4 ) and the one with two complex-conjugate roots (when  a_2 < - a_1^2/4 ) where I can get my long-awaited slow oscillations. A third regime is the special case of a double root  a_2 = - a_1^2/4 which I don't think is of a special interest.

Beyond the study of the root, I also computed the variance of the process and its spectral density. I found both not so easy to compute !

Creating an interactive tool

After the static paper-based study and the still semi-static computer-based study, where I was able to derive and plot the impulse response and some realization of random trajectories, I wanted to get a real-time interactive experience of the interplay of the two parameters a1 and a2. To this end I wrote a tiny Python script, based on the event handling machinery of Matplotlib. More specifically, I started from the "looking glass" example script which I altered progressively as my understanding of event handling in Matplotlib was getting better. Here is a frozen screen capture of the resulting plots:

screen capture from the Interactive AR2 demo
Interactive demo of some properties of an auto-regressive AR(2) process

I put the code of this script on a GitHub Gist. This code is quite crude but it should work (assuming that you have NumPy, SciPy and Matplotlib installed). I've implemented the plots of three properties of the process:

  1. the root locus diagram, where one can see the transition between the two real roots and the two complex-conjugate roots as the  a_2 = - a_1^2/4 boundary is crossed.
  2. the Impulse Response (output of the filter for a discrete impulse  Z_k = \delta_k ), implemented simply using scipy.signal.lfilter.
  3. the Frequency Response, which I normalized using the variance of the process so that it always has unit integral. This latter property is mainly based on my own analytical derivations so that it should not be taken as a gold reference !

My feeling about interactive plotting with Matplotlib

Overall, I felt pretty at ease with the event handling machinery of Matplotlib. I feel that writing an interactive plot didn't add too many heavy lines of code compared to a regular static plot. Also, the learning curve was very smooth, since I didn't have to learn a new plotting syntax, as opposed to some of my older attempts at dynamic plotting with Traits and Chaco. Still, I feel that if I wanted to add several parameters in a small interactive GUI, Traits would be the way to go because I haven't seen a nice library of predefined widgets in Matplotlib (it's not it's purpose anyway). So I keep the Traits and Chaco packages for future iterations !

Electricity consumption peaks

Thanks to the publicly available RTE éCO2mix data about French electricity, I was able to analyze the French electricity consumption since June 24th 2000 (15 minutes time-step averages), that is a bit more than 11 years. This is the first data analysis based on my éCO2mix downloading tool I announced a few days ago. It shows the tremendous increase (44 %) in the peak consumption value over this period.

Note : tools (Python script) and data that generated these curves and tables are openly available on GitHub.

Consumption evolution

To get a glimpse at what is going on with French electricity consumption, here is a plot over those 11 years.

weekly electricity consumption 2001-2011 (with records)
French electricity consumption : weekly averages, with records pointed with red diamonds

How to read this plot

  • The read line shows weekly averages so that weekly and daily variations are smoothed out.
  • The light blue shaded region which is the range between min and max consumption during the given week. This gives an idea of intra-week variability that the red curve doesn't show.
  • The red diamonds shows the successive consumption records (see below)

Pattern and trend

On this plot, one can see the seasonal consumption pattern which is basically:

  • less consumption during summers
  • more consumption during winters

The reason for this being roughly that there is a bigger need for heating and lightning during winter than during summer.

Also, there is a general increasing trend. More precisely, one can see that consumption during summer is just slowing increasing, while winter demand is getting bigger and bigger. I will now illustrate this with the analysis of consumption records.

Consumption records

I searched for successive records in the consumption data (starting June 2000). I found 27 records ("I" being my Python code...) which are all listed in the table below. All records (but one) happened around 7pm in winter, that is during the so called "evening peak", when people come back home and put both heating and lighting on (and start watching TV...).

Also, all these records happened during weekdays other than Saturday or Sunday. This is related to the weekly pattern which contains a significant drop on weekends.

A tremendous increase of peak demand

If I summarize these successive records over the years, here is what I get :

  • During Winter 2000-2001, the highest peak was 70.5 GW (at 7pm on Tuesday January 9th, 2001)
  • During Winter 2005-2006, the highest peak was 86.2 GW (at 7pm on Friday January 27th 2006)
  • During Winter 2011-2012 an historical record was set at 101.7 GW (at 7pm on Wednesday February 8th, 2012)

This is +44 % increase in peak demand, which I feel is quite huge over this not-so-long period of 11 years. This requires the commissioning of new dedicated power plants to avoid black outs during winter.

Rationale for this increase

I'm not a specialist of this topic, but the basic idea is that this tremendous increase is French specific (if compared to similar European countries like Germany or UK).

It comes from the fact that a high proportion of house heating is electric (around 30 %, and even more in new buildings). Each new building with an electric heating system is prone to be an additional peak load during winter, and they are a lot.

This is summarized by the "thermal sensitivity" of French electricity consumption : there is an increase of +2.3 GW each time the outside air temperature goes down by 1°C. In Germany, it is only +0.5 GW...

And why do French people rely so much on electric heating ? Perhaps because of the price and also the culture of "cheap nuclear electricity". French regulated price for electrical energy is about 0.11 €/kWh for residential customers (including taxes). The regulated price has been frozen for some years by the French government to avoid angering people. Also, the installation cost of electric heaters is lower than of a fuel/gas based central heating system.

However, in my opinion, French electricity price is bound to increase of maybe 30 % in the coming years, partly due to the need to either stop or rebuild nuclear power plants. It is sad that a lot of peoplee who will have to pay this extra cost where trapped in the first place by an artificially low electricity price.

Electricity consumption peaks table

French electricity consumption peaks between June 2000 and February 2012 (demand records computed with 15 minutes averages)
DatePower (GW)WeekdayHour