The US Department of Energy’s National Energy Research Scientific Computing Center (NERSC), Intel, and a consortium of five universities have jointly launched a new big data research center to tackle the DOE’s most urgent data-intensive science problems at scale.
Using the Cori supercomputer, the Big Data Center (BDC) will test whether current HPC systems can support data-intensive workloads that require analyzing datasets of 100 terabytes or more on 100,000 or more CPU cores. Ultimately, the DOE, Intel, and the five Intel Parallel Computing Centers (IPCCs) expect the BDC to deliver an optimized, scalable production data analytics and management stack on the supercomputer.
Cori, NERSC’s newest supercomputer, is a Cray XC40 made up of two partitions: one runs on Intel Xeon Phi “Knights Landing” processors and the other on Intel Xeon “Haswell” processors. All nodes share the same high-speed inter-node network based on Cray “Aries” technology.
The system also comes with a first-of-its-kind NVRAM “burst buffer” storage device, in addition to a large Lustre scratch file system, to increase I/O performance to 1.7 TB/sec. Cori ranks as the sixth most powerful supercomputer in the world on the June 2017 Top500 list, behind Sunway TaihuLight, Tianhe-2 (MilkyWay-2), Piz Daint, Titan, and Sequoia.
Prabhat, who leads NERSC’s analytics, data, and services team and serves as director of the BDC, stated that the project’s first task is to identify applications in the DOE data science community, articulate their analytics requirements, and then develop scalable algorithms. The key to success, however, lies in developing those algorithms in the context of the production stack.
Fortunately, the BDC has a top-notch multidisciplinary team that includes experienced performance optimization and scaling experts, so it is well placed to get the most out of capability applications on Cori.
Joseph Curley, director of the code modernization department at Intel, said that the BDC’s objective is to help users make full use of the supercomputer’s hardware capabilities so that they can solve their largest problems on Cori.
The BDC also has several other underlying objectives, such as building and hardening data analytics frameworks in the software stack and giving developers and data scientists a productive way to use Cori to gain insights from their data.
In addition to code modernization at scale, according to Curley, Intel expects to create the software environment and software stack needed in the process, in cooperation with NERSC and the IPCCs.