ISC'14

June 22–26, 2014
Leipzig, Germany

Session Details

 
Name: BoF 05: Drilling Down: Understanding User-Level Activity on Today's Supercomputers
 
Time: Tuesday, June 24, 2014
09:00 am - 10:00 am
 
Room:   Hall 4
CCL - Congress Center Leipzig
 
Presenters:   Mark Fahey, University of Tennessee
  Richard Gerber, NERSC
  Bilel Hadri, KAUST
  Robert McLay, TACC
  Tim Robinson, CSCS
  Zhengji Zhao, NERSC
 
Abstract:   Let’s talk real, no-kiddin’ supercomputer analytics, aimed at moving beyond monitoring the machine as a whole or even its individual hardware components. We’re interested in drilling down to the level of individual batch submissions, users, and binaries. And we’re not just targeting performance: we’re after ready answers to the "what, where, how, when and why" that stakeholders are clamoring for – everything from which libraries (or individual functions!) are in demand to preventing the problems that get in the way of successful science. This BoF will bring together those with experience and interest in present and future system tools and technologies that can provide this type of job-level insight, and will be the kickoff meeting for a new Special Interest Group (SIG) for those who want to explore this topic more deeply.
We hope to promote tools and technologies aimed at focusing attention across a wide range of metrics and measures of job-level activity, including intended activity (measures of demand and goals); successful activity (especially insight into problems and issues); and effective activity (measures of performance and impact). We see potential benefits to stakeholders beyond just end users: sponsoring institutions interested in strategic priorities and scientific impact; support organizations and development teams concerned about meeting users’ needs and expectations; and those seeking to study user activity to improve value and effectiveness. Perhaps most important of all, we envision tools that make possible active interventions: preventing, detecting and correcting problems; alerting users and support staff to issues warranting attention; identifying high (and low) value libraries and tools; and precisely characterizing training needs, deficiencies, and opportunities.
The proposers bring to the table passion and experience improving the end-user experience. Both Fahey and McLay are actively engaged in work focused on job-level activity. Dr. Fahey is the author of ALTD, a tool that reports software and library usage at the individual job level. McLay is the author of Lmod, an innovative environment module system with numerous features that facilitate job-level analysis and protect the user from common configuration problems. Both tools are currently deployed at numerous major centers across the United States and Europe.
Among those who have already committed to participate in this BoF are Richard Gerber (Senior Science Advisor, National Energy Research Scientific Computing Center); Bilel Hadri (King Abdullah University of Science and Technology); Tim Robinson (HPC Application Analyst, Swiss National Supercomputing Centre); and Zhengji Zhao (National Energy Research Scientific Computing Center). Others who have expressed strong interest include representatives from several other XSEDE, DoD, and DOE labs and service providers. We intend to solicit additional participants, and we have every reason to expect the discussion to be lively.
This is an open forum. We’ll kick off the discussion with a twelve-minute introduction to plant some seeds and kick up some dust: three-minute opening comments by four presenters with interest and experience in these questions (and a range of viewpoints and priorities). We'll then begin a moderated but open-ended discussion. We will conclude with a bit of logistics regarding the appropriate next step for the SIG. We are confident this will be only the beginning of an initiative that will prove both energizing and valuable.

Targeted Audience
This BoF will bring together those with experience and interest in present and future system tools and technologies that can provide job-level insight, including library and application usage tracking and job-level monitoring, and will be the kickoff meeting for a new Special Interest Group for those who want to explore this topic more deeply.