Grid Running – Part I

I’ve just given a talk about CERN@school at #GridPP30, the 30th GridPP Collaboration Meeting, at the University of Glasgow. It’s been a fantastic meeting – kindly sponsored by Dell – and it’s been great to meet everyone and let them know about how CERN@school is having an Impact. If you’re interested, you can find the slides here (along with the rest of the GridPP30 programme), but in a nutshell the talk covers what I’ve been up to for the last six months with a GridPP-flavoured twist. Since my last post, we’ve got to the point of running jobs with some custom CERN@school software as a member of the CERN@school Virtual Organisation (VO). The next step is to develop and run the GEANT4 simulations of the Timepix detectors and the LUCID experiment.

For the moment, though, I thought it might be interesting to put into practice something that I’ve been thinking a lot about recently. For simplicity, the jobs I ran used a custom piece of stand-alone software that takes a Timepix data file, reads it and extracts information about which of its 65,536 pixels had recorded the tell-tale signs of ionising radiation. The Pixelman software that runs the Timepix detector outputs the data in the format XYC, where X is the x position on the detector’s sensor element (4 bytes), Y is the y position (4 bytes), and C is the amount of time that pixel has spent over threshold (2 bytes). So, in binary format, five “hit” pixels would look like this:

Binary Cluster

The software takes this data, reads in each byte and converts each pixel’s 10-byte sequence of 1s and 0s into the three numbers we need to make sense of the data. It also checks a separate file that contains the information about where new frames begin (data files can contain more than one frame). So the above fifty bytes translate into the following pixel information in a handy Comma Separated Value (CSV) file:

57, 1, 83
58, 1, 115
59, 1, 28
58, 2, 71
59, 2, 46
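
To make the 10-bytes-per-pixel idea concrete, here is a minimal Python sketch of that decoding step. It is not the CERN@school code (which is C++), and it rests on assumptions the post doesn’t state: that the values are unsigned and little-endian, and that records are packed back to back with no header. The names `RECORD`, `decode_xyc` and `to_csv` are made up for illustration. Since the original binary isn’t reproduced here, the example builds its own fifty bytes from the five pixels above and decodes them again.

```python
import struct

# Assumed record layout: X (4 bytes), Y (4 bytes), C (2 bytes) = 10 bytes.
# "<" means little-endian with no padding; byte order and unsigned types
# are assumptions, not something the post specifies.
RECORD = struct.Struct("<IIH")


def decode_xyc(raw):
    """Split raw bytes into 10-byte records and unpack each into (x, y, c)."""
    if len(raw) % RECORD.size != 0:
        raise ValueError("data length is not a multiple of 10 bytes")
    return [RECORD.unpack_from(raw, offset)
            for offset in range(0, len(raw), RECORD.size)]


def to_csv(pixels):
    """Format decoded pixels as one 'x, y, c' line per pixel."""
    return "\n".join(f"{x}, {y}, {c}" for x, y, c in pixels)


# Round trip: pack the five pixels from the post into bytes, then decode.
sample = [(57, 1, 83), (58, 1, 115), (59, 1, 28), (58, 2, 71), (59, 2, 46)]
raw = b"".join(RECORD.pack(x, y, c) for x, y, c in sample)

print(len(raw))                   # five records of 10 bytes each
print(to_csv(decode_xyc(raw)))
```

Frame boundaries are left out on purpose: they live in a separate file whose format isn’t described here.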

You can see by the x and y values that these particular pixels are adjacent to each other. This is, of course, intentional – these pixels form a single cluster that I’d picked out from the sample data earlier. Here are the pixels visualised in the Pixelman frame viewer:

cluster-for-blog
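
If you’d rather let the computer confirm the adjacency than eyeball it, a simple flood fill will do. The sketch below groups pixels that touch horizontally, vertically or diagonally; that choice of “8-connected” neighbours is my assumption for illustration, and the function name `clusters` is hypothetical rather than anything from Pixelman or the CERN@school software.

```python
def clusters(pixels):
    """Group (x, y) pixels into sets of 8-connected neighbours (flood fill)."""
    remaining = set(pixels)
    found = []
    while remaining:
        # Start a new cluster from any pixel not yet assigned.
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            x, y = stack.pop()
            # Visit the surrounding 3x3 block; the centre pixel has already
            # been removed from `remaining`, so it is skipped automatically.
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.add(neighbour)
                        stack.append(neighbour)
        found.append(cluster)
    return found


# The x, y positions of the five hit pixels listed above.
hits = [(57, 1), (58, 1), (59, 1), (58, 2), (59, 2)]
groups = clusters(hits)
print(len(groups), sorted(groups[0]))
```

Run on the five pixels above, this reports them as one group; add a far-away pixel to `hits` and it shows up as a second one.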

Sure, it’s not exactly the Higgs boson – it’s almost certainly an electron that has hit the sensor element of the detector and bounced about a bit within the silicon, creating a signal in multiple pixels – but it’s a nice bit of particle physics data processing. We’ve taken the “raw” output from a detector and processed it using some software in order to interpret what it might mean. Future versions might depend on other external software packages (such as CERN’s ROOT analysis framework) – which would need to be installed for the CERN@school VO – and be used to process data stored on GridPP Storage Elements (SEs).

So what? Well, the fact that this particular piece of software is stand-alone means that it shouldn’t be too difficult to make the source code available to anyone. I’ve been following the #openaccess and #opendata discussions with interest, as the two concepts are pretty fundamental to what CERN@school is trying to achieve, and I think something that naturally follows on from this is making sure that your analysis is reproducible by anyone – i.e. they can access and use your code to process the data that you also make available. So that’s what I’ve done:

This has also (finally!) given me the chance to use Digital Science’s figshare – something I’ve been meaning to do for ages. So, if you’ve got access to something that’ll compile and run some C++, you should be able to do a little CERN@school data analysis yourself. But there are some questions to think about:

  • Can you actually use this? If not, why not?
  • If you have, what have you/can you do with the data? What else do you need to know/want to know about the data set?
  • Would you want to actually use this? After all, it’s going to take some time to get set up and running. Is it worth the effort?

If you do do anything with the code or the data, I’d love to hear about it – please use the comments section below for this, or any other questions, or suggestions for improvement. Happy coding!

Update – 17:56 28th March 2013: There’s now a summary of the GridPP30 meeting here – thanks Neasan!

Update – 05:53 22nd May 2013: GitHub have very kindly given us an educational account, which you can find here – so I have moved the repository above there and updated the link. You can sign up for an educational account here.
