Seminar: Franck Cappello (Argonne National Laboratory) - June 26th, 9:45 - Amphithéâtre Leyteire, Campus Victoire, Bordeaux
On the occasion of the 14th Scheduling for Large Scale Systems Workshop, we are honoured to announce a seminar by
Franck Cappello from Argonne National Laboratory
HPC-BigData Convergence: What to do when scientific data becomes too big?
June 26th, 9:45 - Amphithéâtre Leyteire, Campus Victoire, Bordeaux
Franck Cappello received his Ph.D. from the University of Paris XI in 1994 and joined CNRS in 1995. In 2003, he joined Inria, where he initiated the Grid'5000 project (https://www.grid5000.fr) and served as its director until 2008. In 2009, Franck became a visiting research professor at the University of Illinois at Urbana-Champaign, where he created, with Marc Snir, the Joint Laboratory on Petascale Computing. In 2014, the joint lab grew into the Joint Laboratory on Extreme Scale Computing (JLESC: https://jlesc.github.io), gathering seven of the most prominent HPC research and production centers: NCSA, Inria, ANL, BSC, JSC, RIKEN CCS, and UTK. Starting in 2008, as a member of the executive committee of the International Exascale Software Project, he led the roadmap and strategy efforts for projects related to resilience at the extreme scale. In 2016, Franck became the director of two Exascale Computing Project (ECP: https://www.exascaleproject.org/) software projects related to resilience (VeloC) and lossy compression (SZ) of scientific data that will help Exascale applications run efficiently. These software packages are part of the Exascale software stack (https://e4s-project.github.io) that will run on US Exascale systems in 2021. He is an IEEE Fellow and the recipient of the 2018 IEEE TCPP Outstanding Service award.
A critical problem common to consumer big data applications and scientific computing (HPC) is the need to communicate, store, compute, and analyze extremely large volumes of high-velocity and diverse data. For many scientific simulations and instruments, data is already "too big". Architectural and technological trends of systems used in HPC call for a significant reduction of these big scientific datasets, which are mainly composed of floating-point data. In this talk, we present experimental results from currently identified use cases of generic lossy compression that address the different limitations related to processing and managing scientific big data. We show, from a collection of experiments run on parallel systems at a leadership facility, that lossy data compression not only can reduce the footprint of big scientific datasets on storage but also can reduce I/O and checkpoint/restart times, accelerate computation, and even allow significantly larger problems to be run than would be possible without lossy compression. These results suggest that lossy compression will become an important technology in many aspects of the convergence between HPC and big data. This talk is intended to foster discussion between the data compression, HPC, and scheduling communities.
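The error-bounded lossy compressors mentioned above (such as SZ) let the user set a point-wise absolute error bound on the reconstructed floating-point data. As a minimal illustrative sketch of the core idea only, the following Python snippet shows uniform scalar quantization under an absolute error bound; a production compressor like SZ additionally applies data prediction and entropy coding, which are omitted here, and the function names are invented for this example:

```python
import numpy as np

def quantize(data: np.ndarray, abs_err: float) -> np.ndarray:
    """Map each float to an integer bin of width 2*abs_err.

    Rounding to the nearest bin center guarantees that the
    reconstruction error is at most abs_err per value.
    """
    return np.round(data / (2.0 * abs_err)).astype(np.int64)

def dequantize(codes: np.ndarray, abs_err: float) -> np.ndarray:
    """Reconstruct approximate values from the integer bin codes."""
    return codes * (2.0 * abs_err)

# Example: compress synthetic floating-point data with a 1e-3 bound.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
codes = quantize(data, abs_err=1e-3)
recon = dequantize(codes, abs_err=1e-3)

# The promised error bound holds point-wise.
assert np.max(np.abs(recon - data)) <= 1e-3
```

The integer codes are far more compressible than raw IEEE floats (many nearby values share the same code), which is what a subsequent entropy-coding stage exploits to shrink storage and I/O volume.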