Apama@CERN

Application Monitoring with Apama at CERN

This project is carried out in cooperation with the European Organization for Nuclear Research (CERN) and Software AG. CERN in Geneva is widely known for operating the world's largest particle accelerator, the Large Hadron Collider (LHC), and as the birthplace of the World Wide Web (WWW).

CERN

The circular LHC lies between 50 and 175 meters underground and has a circumference of about 27 km. Along the LHC, seven detectors were constructed to observe particle collisions and the resulting phenomena. One of those detectors is ATLAS (A Toroidal LHC Apparatus). ATLAS was one of the two detectors that discovered a particle consistent with the Higgs boson (also known as the 'God particle') in 2012. The following photo of ATLAS was taken during the shutdown in 2013:

ATLAS detector

ATLAS is 45 meters long, 25 meters in diameter and weighs 7,000 tons (for comparison, the Eiffel Tower weighs 7,300 tons). When active, ATLAS continuously generates 40 million events per second. Each event has a size of about 25 MB (2 MB with zero suppression), resulting in a total of 1 petabyte (80 terabytes with zero suppression) of data per second. This enormous amount of data is first handled by hardware-based filtering on FPGAs, which reduces it to the 100,000 most interesting events per second. The hardware-based filters are located directly at ATLAS. After this first stage of filtering, the remaining data is forwarded to a data center located about 100 meters above the ATLAS cavern at ground level:

ATLAS data center
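The data rates quoted above follow directly from the event rate and the event sizes. The short calculation below is only a back-of-the-envelope check of those figures, assuming decimal units (1 MB = 10^6 bytes); the constant and function names are chosen here for illustration.

```python
# Figures quoted in the text; decimal units are assumed (1 PB = 10**9 MB).
EVENTS_PER_SECOND = 40_000_000        # events generated by ATLAS per second
RAW_EVENT_MB = 25                     # event size without zero suppression
SUPPRESSED_EVENT_MB = 2               # event size with zero suppression
FILTERED_EVENTS_PER_SECOND = 100_000  # events kept by the FPGA-based filters


def rate_mb_per_second(events_per_second: int, event_size_mb: int) -> int:
    """Data rate in MB/s for a given event rate and event size."""
    return events_per_second * event_size_mb


raw_rate_mb = rate_mb_per_second(EVENTS_PER_SECOND, RAW_EVENT_MB)
suppressed_rate_mb = rate_mb_per_second(EVENTS_PER_SECOND, SUPPRESSED_EVENT_MB)

# Factor by which the hardware filters reduce the event rate.
reduction_factor = EVENTS_PER_SECOND // FILTERED_EVENTS_PER_SECOND

print(f"raw:             {raw_rate_mb / 10**9} PB/s")
print(f"zero-suppressed: {suppressed_rate_mb / 10**6} TB/s")
print(f"hardware filter reduction: factor {reduction_factor}")
```

In other words, the hardware stage keeps only one event out of every 400.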

This data center consists of about 2,000 server machines with 17,000 CPU cores in total. The servers run more than 10,000 applications that reduce the data to the 1,000 most interesting events per second, which can finally be stored and analyzed in depth (still an event stream with the challenging volume of about 2 GB per second). Of course, it is essential for the experiments that these applications run smoothly. But software, especially massively parallel and distributed systems, can run into trouble or fail. In this project, we design and implement a solution based on modern event processing techniques and event processing products of Software AG, especially Apama, for monitoring all applications in real time. Detected problems are resolved automatically and immediately. For example, an application in trouble and all applications depending on it are reconfigured or simply restarted. We presented a showcase of this project at the Software AG booth at CeBIT 2014 (only part of our team is shown in the following photo):

CeBIT 2014
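The recovery step described above, where an application in trouble is restarted together with every application that depends on it, can be illustrated with a small sketch. This is not the Apama-based implementation used in the project; it is plain Python under the assumption that dependencies are available as a simple map, and all application names, the `DEPENDENTS` map, and the `restart` callback are hypothetical.

```python
from collections import deque


def affected_applications(failed, dependents):
    """Return the failed application followed by every application that
    depends on it, directly or transitively, in breadth-first order."""
    order = [failed]
    seen = {failed}
    queue = deque([failed])
    while queue:
        app = queue.popleft()
        for dependent in dependents.get(app, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


def handle_problem(failed, dependents, restart):
    """Restart the failed application and everything depending on it."""
    for app in affected_applications(failed, dependents):
        restart(app)


# Hypothetical dependency map: application -> applications depending on it.
DEPENDENTS = {
    "event_builder": ["filter_a", "filter_b"],
    "filter_a": ["storage_writer"],
    "filter_b": ["storage_writer"],
}

# The restart action is a stand-in that only records the application name.
restarted = []
handle_problem("event_builder", DEPENDENTS, restarted.append)
print(restarted)
```

Tracking already visited applications ensures that an application reachable through several dependency paths, such as `storage_writer` here, is restarted only once.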

The following video shows our final showcase in action (in my office a few days before CeBIT):

Related Documents