Blog

Big Batteries Needed ?

From Big Batteries Needed To Make Fickle Wind And Solar Power Work, on NPR :

You can think of a fully charged battery as a source of energy, ready to sell its product to the electric grid, just the way a power plant does. For that to work, battery owners would need to buy electricity to charge the battery when the price is low, and then sell that electricity back to the grid when the price is high.

But that idea turns out to be a dud.

Not many articles in mass media or in scientific journals take the time to explain how useful batteries can be to integrate renewables. But fewer also explains, like this NPR post, that batteries, at their current cost and capabilities, are not ready for the massive deployment on the grid that is predicted by some.

Fortunately, there is much on-going research on battery technology (and on other storage technologies as well). The progress on batteries has been tremendous and steady since their invention (e.g. the impressive improvement of electric model aircraft since 70s), so there may still be technological leaps to come.

The Big Data Brain Drain: Why Science is in Trouble

This is the title of a blog post by Jake Vanderplas, researcher in Astronomy & Machine Learning at University of Washington. He points out that conducting successful research requires more and more data manipulation skills, going along programming skills. However, in academia, ability to write good software is not promoted, if not discouraged !

"academia has been singularly successful at discouraging these very practices that would contribute to its success"

"any time spent building and documenting software tools is time spent not writing research papers, which are the primary currency of the academic reward structure"

On the other hand, software skills are very important and thus well rewarded in the industry, thus the idea of "Big Data Brain Drain" which pumps talented young graduates out of academic research.

After the diagnosis

Jake's post is the "medical diagnosis", and each disease calls for a treatment ! Since the problem is sociological/organizational, the treatment must be sociological/organizational. Jake lays 4 propositions, in particular the evolution of research evaluation criteria. Of course the "implementation details" of evaluation are always a tough issue, not only for research (thinking of learning and teaching evaluation here).

But in general, I hope that the recognition of good software will change positively, along with the general issue of reproducibility. In fact, I think that many academics are aware of the issue, but they just don't see the practical track to recover from the current "dead end" (and also senior researcher don't have much time to thoroughly work on the issue) :

"Making an openly available program for electrical machine sizing would be immensely useful for our research community! It would summarize 20 years of research of our group. I just don't take/find the time for it."

This is an (approximate & very shortened) transcript of the reaction of Hamid Ben Ahmed, one of my PhD advisor when discussing the topic this week. This means that in the field Electrical Engineering (which is has been tied for decades with closed source softwares like Simulink or 3D finite elements models) the feeling that "something is not working" is already there, and that's a good start!

Pushing the change

Now, it is all about academics pushing "le Système" (i.e. French academia), and not waiting for the change to come "from the top". Indeed, I feel that top-level research directors have too many other things to deal with, like managing huge research consortium, writing huge evaluation reports, ... no time for "far away issues" such as reproducibility 😉

Let's just push !

PS : not all of the electrical engineering research runs on closed software. See for example the open source work of Prof. Bernard Uguen and his team on radio wave propagation : www.pylayers.org (from IETR, a neighbor lab of Rennes University 1)

Open science, reproducibility, coding errors

Just recently going through Fernando Perez' G+, I went across several links on open science and reproducibility of science.

One blog post about the non-evolution of Elsevier publishing policies. As a result, Greg Martin, a mathematician in Vancouver, has decided to resign from the editorial board of Elsevier’s Journal of Number Theory.

I found also two blog posts by Matthew Brett on the NiPy blog ("NeuroImaging with Python" community) about Unscientific Programming and the Ubiquity of Error in computing. The latter asserts that computing tools in science lead easily to results with many mistakes (not to mention the recent discussion about Excel spreadsheets mistakes). From my experience in computing for science, I very much agree with this fact. Often those mistakes are small though (i.e. the order of magnitude of the result is preserved), but not always...

From my research...

Matthew Brett blog reminded of a pretty bad example of error in computing that I encounter when working on my last conference paper dealing with the modeling of a sodium-sulfur battery (cf. PowerTech article on my publications page).

Just in time for the deadline, I submitted a "long abstract" in October 2012. I had just finished implementing the model of the battery and the simulation was up and running. One of the main figure presenting the results is copied here :

Evolution of a figure in my article between Abstract and Final version
Evolution of a figure in my article between Abstract and Final version

and now the interesting thing is to compare the October 2012 version to the February 2013 version (submission of the full paper). Beyond surface changes in the annotations, I've highlighted two big differences in the results. The most striking change is in the lower pane : the "cycle counting" drops from about 400 down to 25 cycles/month. One order of magnitude less !

What happened in the meantime...

Without entering the details, there were several really tiny errors in the implementation of the model. I would not even call these erros "bugs", those were just tiny mistakes. One was somehow like writing Q += i instead of Q += i*dt (omitting the time step when counting electric charges). And when the time step dt is 0.1, that's makes an easy method to miss exactly one order of magnitude ! Spotting those errors in fact takes quite some time and probably one or two weeks were devoted to debugging in November (just after the submission of the abstract).

Of course, reviewers have almost no way to spot this error. First the code is not accessible (the battery model is confidential) and second, the value that was wrong (cycle counting) cannot be easily checked with a qualitative reasoning.

Since I don't know how to solve the problem from the reviewer point of view, let's get back to my position : the man who writes the wrong code. And let's try to see how to make this code a little better.

Testing a physical simulation code

There is one thing which I feel makes model simulation code different from some other codes: it is the high number of tests required to check that it works well. Even with only 3 state variables and one input variable like the sodium-sulfur battery, there are quite a lot of combinations.

I ended up asking the code to report the evolution the mode on one single time step in this ASCII art fashion :

 ------ State at t0 ------
   SoE| DoD_c| T_mod| N_cyc
 0.500| 282.5| 305.0|   0.0
                                                      (dt=0.100 h)
        : ------------ Power flows [W] --------- | - State evolution -
  P_req :  P_sto | P_emf | P_heat| P_Joul| P_invl| ΔDoD | ΔT_m |100 ΔN
-99,000 : -47,619|-53,230|     +0| +3,230| +2,381| +6.68| +0.27|+0.620
-14,140 : -14,140|-15,107| +1,676|   +260|   +707| +1.90| +0.00|+0.176
-10,000 : -10,000|-10,629| +1,915|   +129|   +500| +1.33| +0.00|+0.124
     +0 :      +0|     +0| +2,300|     +0|     +0| +0.00| +0.00|+0.000
+10,000 : +10,000| +9,408| +2,435|    +92|   +500| -1.18| +0.00|+0.110
+14,140 : +14,140|+13,251| +2,437|   +182|   +707| -1.66| +0.00|+0.154
+99,000 : +52,632|+47,645| +1,093| +2,355| +2,632| -5.98| +0.00|+0.555

and to cover enough situations, I have in fact 6 text files containing each 5 blocks like this one. That's 30 tables to check manually, so that's still doable, but there is no easy way to tell "oh, that -15,107 over there doesn't looks right"... it just take time and a critical eye (the latter is the most difficult to get).

Freeze the test for the future

If I now want to upgrade the battery simulation code, I want to ensure that the time spent by my thesis advisor and I on the result checking is not lost.I want to enforce that the code always generates the exact same numerical result.

This is the (somehow dirty) method I've used : run the same test, generate the ASCII string, and compare the result with a previous run stored in a text file. I pasted the code here so that the word "dirty" gets an appropriate meaning :

def test_single_step_store(T_mod, N_cycles, write_test=False):
    '''test the NaSStorage.store_power() method on a single timestep
    and compare with results saved in files
    "test/NaSStorage single step test Tnnn.txt"

    each test is run for a given Temperature and N_cycles,
    for a predefined range of 
     * State of Energy [0, 0.001, 0.5, 0.999, 1] 
     * Power requests [-99e3,-14.14e3, -10e3, 0 , 10e3, 14.14e3, 99e3]
    '''
    SoE_list = [0, 0.001, 0.5, 0.999, 1]
    P_req_list = [-99e3,-14.14e3, -10e3, 0 , 10e3, 14.14e3, 99e3]

    # Run the series of tests
    header = 'Test of `NaSStorage.store_power` at '+\
             'T_mod={:03.1f}, N_cycles={:04.0f}'.format(T_mod, N_cycles)
    print(header)
    res = header + '\n'
    for SoE in SoE_list:
        res += single_step_analysis(SoE, T_mod, N_cycles, P_req_list,
                                    dt=0.1, display=False)
        res += '\n'

    import io
    test_fname = '80 test/NaSStorage single step test '+\
                 'T{:04.0f}-N{:04.0f}.txt'.format(T_mod*10, N_cycles)
    # Write test results
    if write_test:
        from datetime import datetime
        with io.open(test_fname,'w') as res_file:
            res_file.write('NaSStorage test file generated on %s\n' % \
                           datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
            res_file.write(res)

    # Read test results
    with io.open(test_fname) as res_file:
        res_file.readline() # skip the date header
        res_cmp = res_file.read()

    print('Comparing with recorded test "%s"' % test_fname)
    test_ok = (res_cmp == res)
    print('Comparison test OK!' if test_ok else 'There are differences in the test:')
    #assert test_ok
    ### Print the differences with red color:
    if not test_ok:
        from termcolor import colored
        res_list = [c if c == res_cmp[i] else colored(c, 'red', attrs = ['bold'])
                    for i,c in enumerate(res) if i<len(res_cmp)]
        print(''.join(res_list))

Now, hopefully the simulation code is doing what it should and there won't be a third version of the figure in my article... Hopefully !

Python training at ENS Rennes

I just put online the material of my Python training class I taught yesterday for the first time to a group of ~15 science teachers at my school ENS Rennes. It is a new requirement of the French Education Ministry that these teachers (in Math, Physics and Engineering) will have to teach Programming and Numerical computing with Python, starting next academic year in September 2013. They can otherwise choose Scilab, which can be useful for playing with block diagrams.

Since the targeted audience was mainly composed of Mechanical Engineering teachers, I tried to anchor several examples in that field. Nothing "advanced" though, because I'm not a mechanical engineer ! And more importantly, those examples were meant to be ready-to-use, for teachers to be soon in front of their new students. I hope they found this useful !

Les marchands de doute / Merchants of Doubt

Book cover of "les marchands de doute" (Fr) "Merchants of Doubt" (En)I just finished the book "les marchands de doute" ("Merchants of Doubt") by Naomi Oreskes et Erik M. Conway. It took me some weeks because I read so sparsely, but I'm glad I did.

Since by nature I like thinking in terms of doubts, I know I could get easily influenced by some skeptical arguments on Global warming. Arguments read or heard every now and then from friends, colleagues or randomly on the internet. Indeed, I think that the "Cartesian doubt" approach is fruitful when approaching a new research topic.

However, the book "Merchants of Doubt" sheds the light on an unfruitful kind of doubt I was unaware of: the unfair doubt, the doubt on a topic where a reasonable scientific consensus is already reached, the doubt that is overly supported by people which feels threatened by those scientific results. Such doubt is purported beyond reason in order to give a false impression to the general public. The impression that there is still an active scientific debate when this debate is actually long gone (or moved to minor details of the question).

I believe "Merchants of Doubt" is the outcome of a tremendous work of historical research from its authors. Naomi Oreskes et Erik M. Conway show the unexpected connection between the doubt on the dangers of Tobacco, Pesticides, Ozone depletion and Climate change in general. Indeed, it's quite incredible to learn that some of the people who were constantly harassing scientific results (and sometimes scientists themselves!) on those pretty different topics were actually the same people!

Just as a pointer, Naomi Oreskes and Jacques Treiner (translator of the French edition) were invited on the radio program Science publique just a year ago, when the French edition was published.

As a result this reading, I have now downloaded some of the IPCC reports on Climate change! I doubt I'll take the time to read those entirely, but my first very positive impression was on how clear their writing is. Figures are clearly presented and there is a very systematic approach in the treatment of uncertainty (since there is always some) by the use of a precisely defined vocabulary (like "Virtually certain", "Extremely likely", "More likely than not", depending on clearly expressed probability thresholds). This is clearly in contrast to the crude approach of the "Merchants of Doubt"...

Unicode Cheatsheet for Greek & Math

I often find it useful to add some mathematical or greek symbols in my scientific computing Python programs (mainly in commentary). Here is a selection I've taken from Gnome Character Map for math. Greek was mainly generated by a Python script...

Math

Arithmetic
plus-minus _ ±
multiplication _ ×
division _ ÷
power _  ² ³
root _ √ ∛
infinity _ ∞
Operators
integrals and sum _ ∫ ∬ ∑
partial diff.             _ ∂
increment/Laplace _ ∆ (different from Greek delta : Δ)
nabla                      _ ∇
expectation             _ ⟨⟩ [apparently the double-struck E symbol kills Worpress...]

Relationships

  • equality            = ≈ ≠ ≡
  • inequality          < > ≤ ≥ ⩽ ⩾
  • proportional to     ∝
  • element of          ∈ ∉
  • subset of           ⊂ ⊄
  • quantifiers         ∀ ∃ ∄

Sets

  • integers            ℕ ℤ
  • real numbers        ℝ
  • complex numbers     ℂ
  • empty set           ∅

Arrows

  • arrows : → ⟶ ⇒
  • maps to : ↦ ⟼

Greek

pairs of lower case and upper case letters (putting a dot on letters which have very similar latin counterparts)

α .
β .
γ Γ
δ Δ
ε .
ζ .
η .
θ Θ
. .
κ .
λ Λ
μ .
ν .
ξ Ξ
. .
π Π
ρ .
ς .
σ Σ
τ .
υ .
φ Φ (and also ϕ at U+03d5)
χ .
ψ Ψ
ω Ω

This greek table was first generated by these three lines of Python:

l = [unichr(a)+u' '+unichr(b)
     for a,b in zip(range(0x3b1, 0x3ca), range(0x391,0x3aa)) ]

print(u'\n'.join(l))

Better stats with Python thanks to Patsy

Patsy, a Python package for describing statistical models and building design matrices using  "formulas" : a great addition in the ecosystem of Python statistical packages !

The first release a very nice package called Patsy was announced last week by Nathaniel Smith. Quoting Nathaniel's message, Patsy is « a Python package for describing statistical models and building design matrices using "formulas". Patsy's formulas are inspired by and largely compatible with those used in R ».

I should add to this message that Patsy is also King Arthur's assistant in the Monty Python and The Holy Grail film. While I discovered this by chance, I doubt it is a coincidence...

This new module is a great addition to the emerging ecosystem of Python statistical packages which already includes statsmodels and pandas as flagships, the latter being very actively developed. A complete description of Patsy's functionalities is provided on its overview page, but the best description to me is the clearly written quickstart section. Not all modules enjoy such a nice documentation, so let's enjoy !

Open Terminal Here

Friday afternoon, I finally got tired of a bug in "Open Terminal Here", a Bash script I was using so far. I made a Python replacement...

Open Terminal Here ?

"Open Terminal Here" just a simple Bash script meant to be launched from a right-click within a directory opened in Nautilus, the Gnome file manager. As the name states, the gnome-terminal should be launched with its working directory being set to the currently opened directory. Simply handy! Only one problem: setting the  working directory was failing whenever the name of directory contained non-ASCII characters. 🙁

This "Open Terminal Here" is not a standard functionality of Nautilus (as opposed to Dolphin where you can just press Maj+F4 to launch a "konsole"). Rather, you have to download an extension script from G-Scripts, a nice central repository, and drop it in a quite buried directory (~/.gnome2/nautilus-scripts). When it comes to the "Open Terminal Here" function, there are several scripts available. Choosing between these is up to the visitor...

"Open Terminal Here" in Bash

Here is the central part of the Bash script I was using so far:

base="`echo $NAUTILUS_SCRIPT_CURRENT_URI | cut -d'/' -f3- | sed 's/%20/ /g'`"
if [ -z "$NAUTILUS_SCRIPT_SELECTED_FILE_PATHS" ]; then
     dir="$base"
else 
     while [ ! -z "$1" -a ! -d "$base/$1" ]; do shift; done
     dir="$base/$1"
fi

gnome-terminal --working-directory="$dir"

I had never paid attention to this code until Friday (especially since I have such a poor Bash understanding). I now realize that line 1 does most of the job, which is :

  1. retreive the $NAUTILUS_SCRIPT_CURRENT_URI environment variable.
  2. cut the "file://" prefix from the URI
  3. replace "%20" by real space characters " "

This Bash script works most of the time, except with directory names containing non-ASCII characters. Indeed, in a directory called "études notes" for example, Nautilus sets the URI variable to "file:///home/pierre/%C3%A9tudes%20notes". One can see how "é" gets encoded as "%C3%A9" and the Bash script doesn't perform any decoding.

Now "Open Terminal Here" with Python

In order to support non-ASCII, I tried to build my own script. I chose Python simply because that's the language I'm the most familiar with. The entire source is a GitHub Gist. Here are the main lines:

import os
from urllib import unquote
from subprocess import Popen

# 1a) Retrieve the URI of the current directory:
env = os.getenv("NAUTILUS_SCRIPT_CURRENT_URI", "/home/pierre")

# 1b) Process the URI to make it just a regular Path string
env = env.replace('file://', '') # Should fail with Python 3 ?!
env = unquote(env) # decode the URI

# 2) Launch gnome-terminal:
Popen(["gnome-terminal", '--working-directory=%s' % env])

Well, it's basically the same as in Bash, except with a useful call to the urllib.unquote function which performs all the URI decoding job. All imports are from the Python standard library: great !

And by the way, with Perl !

While writing this post, I realized I had on my laptop a second "Open Terminal Here" script. I had forgotten about it because it was disabled for I don't know what reason. It's a Perl script, also coming from G-Script. Here are the main lines:

use strict;

$_ = $ENV{'NAUTILUS_SCRIPT_CURRENT_URI'};
if ($_ and m#^file:///#) {
  s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
  s#^file://##;
  exec "gnome-terminal --working-directory='$_'";
}

What can be said except that it also works ! The URI decoding is nicely performed by "chr(hex($1))". I actually think the author wrote this script for the very same reason I wrote a Python one: "he loves Perl!" 😉

Conclusion ?

I don't know if it's that useful to have three versions of a script to solve the very same tiny problem... Still, one can notice the vivid differences in the programming approach: while my Python scripts relies mostly function coming from standard modules, the Perl script is literally built on Regular Expressions. No surprise !

Finally, the Bash script just calls command line utilities (cut and sed), but doesn't decode the URI. Is there a command line program for that ?

And what about Haskell ?

I'll keep this for another day 😉 I need to get a Fistful of Monads first. Would be a nice experiment though...

Autoregressive processes : learning with an interactive tool

I recently got interested in second order autoregressive processes ("AR(2)" processes), after spending quite some time working only with first order ones... It was time to get an extra degree of freedom/complexity ! However, I didn't have much intuition about the behavior of these models and neither about how to "tune the knobs" that set this behavior so I spent some time on this subject.

For the sake of clarity, let's write down a definition/notation for an AR(2) process X_k:

X_k = a_1 X_{k-1} + a_2 X_{k-2} + Z_k

with the real numbers a1 and a2 being the two parameters that tune the "behavior" of the process. Z_k may be call the "input" in the context of linear filters, while "innovation" (i.e. a white noise input) would better fit in the context of random processes. A nice property of AR(2) processes, as opposed to AR(1) is their ability to capture slowly oscillating behaviors, like ocean waves. Also, within the AR(2) model, AR(1) is included as the special case a2 = 0.

Bits of a non-interactive study

In order to understand the interplay of the two parameters a1 and a2, I spent extensive time doing paper-based analytical/theoretical studies. I also created a IPython Notebook which I should publish when there is proper Notebook -> HTML conversion system (edit: now available on nbviewer). Most of the AR(2) behavior is governed by the characteristic polynomial P(z) = z^2 - a_1 z - a_2. This polynomial helps in showing that there are two regimes : the one with two real roots (when  a_2 > - a_1^2/4 ) and the one with two complex-conjugate roots (when  a_2 < - a_1^2/4 ) where I can get my long-awaited slow oscillations. A third regime is the special case of a double root  a_2 = - a_1^2/4 which I don't think is of a special interest.

Beyond the study of the root, I also computed the variance of the process and its spectral density. I found both not so easy to compute !

Creating an interactive tool

After the static paper-based study and the still semi-static computer-based study, where I was able to derive and plot the impulse response and some realization of random trajectories, I wanted to get a real-time interactive experience of the interplay of the two parameters a1 and a2. To this end I wrote a tiny Python script, based on the event handling machinery of Matplotlib. More specifically, I started from the "looking glass" example script which I altered progressively as my understanding of event handling in Matplotlib was getting better. Here is a frozen screen capture of the resulting plots:

screen capture from the Interactive AR2 demo
Interactive demo of some properties of an auto-regressive AR(2) process

I put the code of this script on a GitHub Gist. This code is quite crude but it should work (assuming that you have NumPy, SciPy and Matplotlib installed). I've implemented the plots of three properties of the process:

  1. the root locus diagram, where one can see the transition between the two real roots and the two complex-conjugate roots as the  a_2 = - a_1^2/4 boundary is crossed.
  2. the Impulse Response (output of the filter for a discrete impulse  Z_k = \delta_k ), implemented simply using scipy.signal.lfilter.
  3. the Frequency Response, which I normalized using the variance of the process so that it always has unit integral. This latter property is mainly based on my own analytical derivations so that it should not be taken as a gold reference !

My feeling about interactive plotting with Matplotlib

Overall, I felt pretty at ease with the event handling machinery of Matplotlib. I feel that writing an interactive plot didn't add too many heavy lines of code compared to a regular static plot. Also, the learning curve was very smooth, since I didn't have to learn a new plotting syntax, as opposed to some of my older attempts at dynamic plotting with Traits and Chaco. Still, I feel that if I wanted to add several parameters in a small interactive GUI, Traits would be the way to go because I haven't seen a nice library of predefined widgets in Matplotlib (it's not it's purpose anyway). So I keep the Traits and Chaco packages for future iterations !