Python Northeast: The Scientist-Programmer
In a new location at Clavering House (and, in a weird scheduling quirk, on at the same time as the iPhone group: I walked into the wrong room before realising I’d misheard “iPhone” as “Python”!), this month’s Python North East went boldly where no meetup has gone before … the Python Song!
OK, this was in the lead-up, as there was a delay, but it cannot be unheard. (The lyrics are pretty clever, though.) There was also a heated discussion about whether MongoDB is a good database to use. Arguments against included that it doesn’t scale well and fails silently (“it’s good for messing around, but for proper stuff it is a pain”), and that better options exist for big projects, such as Cassandra, Elasticsearch or Solr/Sphinx.
No, this month’s meetup showed how coding can be used in the hard sciences, in this case modelling neutron diffraction (not quite reversing the polarity, but pretty close!). Talking to us about how Python, and particularly NumPy, can help with this was Rowan Hargreaves.
Hargreaves is part of the ISIS project (named not after the river but after the Egyptian goddess, as the project brings together various bits, as the goddess did when her husband was chopped into pieces by another god). The facility uses enough electricity to power a medium-sized town and is housed in an impressive structure with massive chutes that allow experiments with splitting deuterium (heavy hydrogen, as found in heavy water) into particles. His work involves a two-way process: doing experiments, attempting to model the behaviour of the diffraction (how the particles bounce off), testing the model and continuing to refine it. Each cycle can take six months, so this isn’t your high school science! (For the record, he also recommends Python Scripting for Computational Science as a good primer for his type of work.)
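For readers who haven’t met diffraction modelling, here is a toy sketch of the model, test and refine loop described above. It is a textbook illustration, not Hargreaves’s model: it assumes a simple cubic crystal, a single hypothetical neutron wavelength of 1.5 Å and Bragg’s law (nλ = 2d sin θ, taking n = 1). It then “refines” the lattice parameter by picking, from a grid of candidates, the value whose predicted peak positions best match the “measured” ones. The names `bragg_two_theta`, `refine_lattice` and `HKL` are hypothetical, and real refinement software uses proper least-squares optimisers rather than a grid search.

```python
import numpy as np

# Hypothetical neutron wavelength in angstroms (an assumption, not an ISIS value).
WAVELENGTH = 1.5

# Miller indices (h, k, l) of four low-order reflections of a simple cubic lattice.
HKL = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [2, 0, 0]])


def bragg_two_theta(a, hkl=HKL, wavelength=WAVELENGTH):
    """Predicted scattering angles 2θ in degrees for cubic lattice parameter(s) a.

    A scalar a gives one angle per reflection; a 1-D array of candidate
    values gives one row of angles per candidate.
    """
    a = np.asarray(a, dtype=float)[..., None]   # extra axis so a broadcasts
    d = a / np.sqrt((hkl ** 2).sum(axis=1))     # plane spacing for each (h, k, l)
    theta = np.arcsin(wavelength / (2 * d))     # Bragg's law with n = 1
    return np.degrees(2 * theta)


def refine_lattice(measured, candidates):
    """Return the candidate a whose predicted peaks best fit the measured ones."""
    predicted = bragg_two_theta(candidates)                # (n_candidates, n_peaks)
    residuals = ((predicted - measured) ** 2).sum(axis=1)  # sum of squared errors
    return candidates[np.argmin(residuals)]


# "Measured" peaks: the model at a = 4.0 Å plus a little fixed, made-up noise.
measured = bragg_two_theta(4.0) + np.array([0.01, -0.01, 0.005, -0.005])
candidates = np.linspace(3.5, 4.5, 1001)
best_a = refine_lattice(measured, candidates)
print(f"refined lattice parameter: {best_a:.3f} Å")
```

Each pass through the loop in a real experiment would replace the fixed noise with new data and the grid with a better model; the point here is only that NumPy evaluates every candidate against every peak in one broadcast expression.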
This is where NumPy comes in. When you’re working through hugely complex equations, savings in time (and memory) are key. Hargreaves saw huge speed improvements in his modelling when using the library, in line with the NASA benchmark figures replicated below.
| Package/compiler | Elapsed time (s) |
|---|---:|
| Python | 48.55 |
| NumPy | 0.96 |
| Matlab | 2.398 |
| gfortran | 0.540 |
| gfortran with -O3 | 0.280 |
| ifort | 0.276 |
| ifort with -O3 | 0.2600 |
| Java | 12.5518 |
Elapsed times for the same benchmark across various languages and compilers, taken from NASA
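To see where the gap between the Python and NumPy rows comes from, here is a minimal sketch (not the NASA benchmark, whose code isn’t reproduced here) that computes the same sum of squares two ways: with a pure-Python loop and with a single vectorised NumPy expression. The function names and the problem size `N` are illustrative choices; actual timings depend on your machine, so none are claimed here.

```python
import functools
import timeit

import numpy as np

N = 1_000_000  # problem size, chosen arbitrarily for illustration


def sum_of_squares_loop(n):
    """Pure Python: the interpreter runs the loop body once per element."""
    total = 0.0
    for i in range(n):
        total += float(i) * float(i)
    return total


def sum_of_squares_numpy(n):
    """NumPy: build the whole array, then let compiled code do the arithmetic."""
    x = np.arange(n, dtype=np.float64)
    return float(np.dot(x, x))


if __name__ == "__main__":
    for fn in (sum_of_squares_loop, sum_of_squares_numpy):
        # Average over a few runs to smooth out noise.
        seconds = timeit.timeit(functools.partial(fn, N), number=3) / 3
        print(f"{fn.__name__}: {seconds:.4f} s per call")
```

Both functions do the same arithmetic; the difference is that the NumPy version hands the whole loop to compiled code instead of interpreting it element by element, which is the same reason the NumPy row sits so close to the Fortran rows.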
Sure, Pascal and Fortran are faster … but then you have to write Pascal or Fortran.
He also gave a number of basic demos to show how it works.
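The demos themselves aren’t reproduced here, but a basic NumPy walkthrough usually covers something like the sketch below: elementwise arithmetic, broadcasting, boolean masks and reductions. The “detector counts” and per-column efficiencies are made-up numbers, used purely for illustration.

```python
import numpy as np

# A small 2-D grid of values, e.g. counts on a detector (made-up numbers).
counts = np.array([[3, 8, 1],
                   [9, 2, 7]], dtype=float)

# Elementwise arithmetic applies to every entry at once, with no explicit loop.
normalised = counts / counts.max()

# Broadcasting: a per-column factor of shape (3,) is stretched across both rows.
column_efficiency = np.array([1.0, 0.5, 2.0])
corrected = counts * column_efficiency

# Boolean masks select the entries that meet a condition (in row-major order).
bright = counts[counts > 5]

# Reductions along an axis: axis=0 collapses the rows, giving per-column totals.
column_totals = counts.sum(axis=0)

print(normalised, corrected, bright, column_totals, sep="\n")
```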
He has noticed a few quirks and inconsistencies (e.g. how you name things in one function may have to differ in another), but all in all he has found it useful. He can also use it to read in some of his boss’s FORTRAN data (yes, some people stick to what they were taught) or, conversely, pass his data on to the nicer visual modelling tools around.
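On the Fortran point, one common stumbling block is that Fortran stores arrays column by column, while NumPy defaults to row-by-row (C) order. The sketch below assumes the Fortran side wrote a raw binary dump of float64 values with no record markers (as stream-access output would be; sequential unformatted files add length markers that would need skipping). The helper `load_column_major` is hypothetical. NumPy also ships `numpy.f2py` for calling compiled Fortran routines directly, which this sketch doesn’t cover.

```python
import os
import tempfile

import numpy as np


def load_column_major(path, shape, dtype=np.float64):
    """Hypothetical helper: read a raw binary dump written in Fortran (column-major) order."""
    return np.fromfile(path, dtype=dtype).reshape(shape, order="F")


# A 2x3 matrix as we want to see it in Python.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fortran_data.bin")
    # Simulate the Fortran side: values written column by column (1, 4, 2, 5, 3, 6).
    matrix.ravel(order="F").tofile(path)

    # Reading with NumPy's default C order scrambles the layout...
    wrong = np.fromfile(path, dtype=np.float64).reshape(2, 3)
    # ...while telling NumPy the data is column-major restores it.
    right = load_column_major(path, (2, 3))

print(wrong)
print(right)
```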
He also made some interesting points about science and coding. He had a great quote about scientists who do computing only ever experimenting with it (my Google-fu has failed to bring up the source, sadly). [EDIT: thanks Rowan for clearing this up, the quote is:
However, there is a more serious aspect to this: there’s no real reason for people in academia to investigate coding in any depth, let alone make any contributions in this respect, as it’s not recognised as a valid contribution to research (or, in English HE terms, “contributing to the REF”). The most startling example I know of is the creator of the popular open source bibliographic software Zotero being denied tenure because he had not produced enough ‘research outputs’.
On a happier note, this talk was a reminder of how much bleed there is between disciplines when it comes to coding. You might be a web designer, a journalist, an artist or, as in this instance, a scientist-programmer: in every case, coding has some use.
[UPDATE March 2013]
Data Community DC has put together a blog post on being a data scientist with Python that is well worth a look (as is the related Hacker News discussion).