“You know Science… Is this as awesome as it sounds?”

Asked a good friend of mine at the very sociable hour of 12.45am. He was referring to Eureqa, the latest software from the Cornell Computational Synthesis Laboratory: a program designed for reverse engineering dynamical systems. After a brief moment of wistfully staring into space and wishing I did indeed know science (there’s a lot of it!!) I gave my knee-jerk response: “No”. I’ll be honest, at that point I had only read a press release and not the original paper, nor had I test driven the software. Terrible, I know, but the marketing of these things is usually so over-hyped that it’s hard to believe it could really be that awesome. Fear not though, I have rectified this situation and spent the past few evenings reading all about it whilst having a good old tinker with the software. I have to say it is as awesome as it sounds.
The Interface
For freeware, the interface is elegant and well thought out. The tabbed environment provides an intuitive walk-through of the stages of data analysis (data input, data smoothing, options for equation development, starting the search and analyzing the solutions). As a result the detailed instructions are not strictly necessary for a first run, but they do come in handy for fine-tuning a run.
Entering Data
The software requires the input of raw data. If the data is noisy you can smooth it in Eureqa (with or without relation to a confidence rating). For complex data sets it is recommended that you pre-process the data in another application before copying into Eureqa.
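As a rough illustration of what pre-processing in another application might look like, here is a minimal sketch of a centred moving average in Python. This is my own assumption of a reasonable smoothing step, not the method Eureqa itself uses, and the function name and window size are made up for the example.

```python
def moving_average(values, window=3):
    """Smooth a list of numbers with a centred moving average.

    Near the ends of the list the window is simply truncated,
    so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# A short noisy column with one obvious spike (the 9.0).
noisy = [1.0, 2.0, 9.0, 4.0, 5.0]
print(moving_average(noisy))
```

A wider window smooths more aggressively but also flattens genuine features, so it is worth plotting the result before pasting it into Eureqa.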
The second stage is selecting the type of equation the data represents. This is done by selecting the variable (x) you are interested in and which other variables can be used within the formula, such that x = f(selected variables). You also need to select how you want the fitness of the system to be measured. There are 12 fitness objectives to choose from and weightings can be applied to all.
Perhaps the most complex part of the selection criteria is choosing the ‘building blocks’ of the equation. Here you use your knowledge of the data set to limit the formula search to a certain group of mathematical terms (add, subtract, multiply etc).
Finding the Solution
Once all options are selected the fun part starts. Just click ‘Play’ and watch the error reduce as the software attempts to compute a viable solution to the data set. The program uses an algorithm that takes the derivative(s) of the data set(s). It then combines the previously selected building blocks into multiple candidate equations. The performance of each equation is compared to the data set and the equation with the smallest calculated error is kept (the ‘fittest’ solution). The next set of equations is based on the fittest solution from the previous set of calculations. Again, these equations are tested against the data set and the fittest solution is kept. This cycle is repeated until the fitness of the system is optimized (the computed error becomes negligible when compared to the data).
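To make the keep-the-fittest loop described above a bit more concrete, here is a toy sketch in Python. It is emphatically not Eureqa's actual algorithm: it skips the derivative step entirely, uses only add, subtract and multiply as building blocks, and every name in it (random_expr, mutate, search and so on) is something I made up for illustration.

```python
import operator
import random

# The 'building blocks' the search is allowed to combine.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_expr(rng, depth=2):
    """Build a random expression tree from 'x', small constants and OPS."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.7 else float(rng.randint(1, 3))
    op = rng.choice(list(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr, x):
    """Evaluate an expression tree at a given value of x."""
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))


def error(expr, data):
    """Mean absolute error of the expression over (x, y) pairs."""
    return sum(abs(evaluate(expr, x) - y) for x, y in data) / len(data)


def mutate(expr, rng):
    """Return a copy of expr with one randomly chosen subtree replaced."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return random_expr(rng, depth=2)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def search(data, generations=200, population=50, seed=0):
    """Repeatedly mutate the fittest expression and keep any improvement."""
    rng = random.Random(seed)
    best = random_expr(rng)
    best_err = error(best, data)
    for _ in range(generations):
        for candidate in (mutate(best, rng) for _ in range(population)):
            candidate_err = error(candidate, data)
            if candidate_err < best_err:
                best, best_err = candidate, candidate_err
        if best_err == 0:
            break
    return best, best_err


# Floats are used throughout so very large intermediate values stay floats.
data = [(float(x), float(x * x - x)) for x in range(1, 40)]
best, best_err = search(data)
print(best, best_err)
```

Each generation only ever keeps a candidate that beats the current best, so the reported error can never go up, which is the same "watch the error reduce" behaviour you see after pressing Play.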
Now, the best part about science is getting your hands dirty. I decided to start with a simple data set just to test the stability of the program and the computational speed. To this end I fed it a column of x = 1:39 and a second column of x^2-x. I asked it to find the relationship between these two columns. It quickly computed the correct solution. Great. Although, perhaps not all that impressive, seeing as this is something that I could have solved by inspection. Next stop: something harder.
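For anyone wanting to repeat the simple test, here is a small Python sketch that builds the same two columns, assuming x = 1:39 means the integers 1 to 39 inclusive. The helper names are hypothetical. It also shows the "by inspection" trick: for evenly spaced x, constant second differences give away a quadratic.

```python
def make_test_columns(n=39):
    """Return the two test columns: x = 1..n and y = x^2 - x."""
    xs = list(range(1, n + 1))
    ys = [x**2 - x for x in xs]
    return xs, ys


def differences(values):
    """Differences between consecutive entries of a list."""
    return [b - a for a, b in zip(values, values[1:])]


xs, ys = make_test_columns()

# Constant second differences point to a quadratic relationship.
second = differences(differences(ys))
print(set(second))

# Tab-separated rows, handy for pasting into a spreadsheet-style grid.
for x, y in zip(xs[:5], ys[:5]):
    print(f"{x}\t{y}")
```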
Using the true logic of a scientist I went from ‘little test case’ to ‘here is a shed load of data I’ve collected and don’t yet understand, do something impressive with it’. This, understandably, was not so successful. There are many reasons for this. First of all, the data could have benefited from pre-processing: whilst the data set should be curved, the limited number of data points led Eureqa to fit a straight line. I also left the ‘building blocks’ options at their defaults (I haven’t yet decided what should and shouldn’t be available for computing the data set). Still, Eureqa happily cranked the handle to produce an equation that fitted the data reasonably well. Unfortunately, this sequence of errors led to an equation that had several constants and only one variable (time, if you’re interested…). Simple analytical inspection of the data suggests there is a minimum of 3 variables. It did, however, notice that there was a non-linear dependence in the data. A minor win.
Overall
On the whole its failure at my second test is no surprise. The software is limited by the knowledge the user has of the data set, the amount of pre-processing that has been conducted on the data set and the mathematical terms that could apply to the system. I certainly would not say this is a drawback of the software at all. It is a tool for helping scientists discover otherwise indiscernible inter-dependencies within complex data sets. And, as with most tools in science, as long as the user understands its limitations it remains a valid tool for discovery. I’d certainly use it again, and hope to continue to tinker with it until eventually it gives me something as good as Newton’s second law of motion, but one that has not been discovered yet.
References
- Keim B. (2009) “Download Your Own Robot Scientist,” Wired Science, Dec 14 -18th.
- Schmidt M., Lipson H. (2009) “Distilling Free-Form Natural Laws from Experimental Data,” Science, Vol. 324, no. 5923, pp. 81–85.
- Handinflow image taken from CS4FN (Computer Science for Fun).
- Jones P. (2009) The fun and joys of early hours tweeting, Twitter.
Edited: December 24th, 2009



