HPC sites had budgets in place to upgrade their power
and cooling capabilities, to the average tune of about $7
million. Not surprisingly, government sites typically have
the largest systems and are under the greatest energy and
spatial pressure. Industrial sites are least constrained —
half of them never see their energy bills, because someone
else in the company receives and pays them.
By and large, HPC data centers, with the help of vendors,
have been coping so far with rising requirements for power
and cooling, even at an average cost of about $1 million per
megawatt. The real question is what the future holds.
THE SWORD OF DAMOCLES?
As we approach the exascale era, is the power and cooling issue a disruptive sword of Damocles hanging over the
HPC community, or will it continue to be manageable?
The study I cited earlier asked HPC users and vendors whether they expected any revolutionary advances
in power and cooling technology in the next five years.
The users said no, the vendors said yes, and they were
referring in most cases to the same advances (because the
vendors had briefed the users about their plans).
The users are less optimistic about breakthroughs, but
most do not seem heavily concerned yet, except on one
important point: potential tradeoffs between productivity
and power efficiency. Tradeoffs frequently mentioned by
users include pressure to overbuy energy-efficient but harder-to-program coprocessors and accelerators, and acceptance of more service disruptions caused by shorter upgrade
cycles to deploy more energy-efficient systems.
But these tradeoffs seem manageable in the sense that
they are based on decisions users will make. Even the
vaunted goal of fitting an exascale computer into a 20
MW power envelope, which works out to roughly 50
gigaflops per watt, is a matter of when rather than if. It
might happen in 2020, 2024 or a different year, but it
will happen.
THE BIGGER UNKNOWNS
Much less certain is when deeper, more integral energy-efficiency capabilities will become available. These are considerably more challenging than squeezing a peak exascale system into a
20 MW package, however difficult that may be. These deeper
capabilities will go a long way toward making exascale and
lesser extreme-scale computing a reasonable proposition for
funders and users alike.
In particular, sophisticated power management (“power
steering”) will be needed throughout the system to dynamically shift power to where it’s needed at every moment. Both
hardware and software will need to be able to “learn” about
power needs on the fly. Achieving this goal will require large
investments to develop software that can power-profile and
power-steer many elements of the system (see the sketch
after this list), including:
• Cores and processors
• The interconnect and network interface
• The storage system
• The operating system, programming model and entire software stack
• Application codes (power-aware applications)
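To make “power steering” concrete, here is a minimal sketch of what such a control loop might look like on a single Linux node, assuming the node exposes Intel’s RAPL energy counters and power caps through the standard powercap sysfs interface. The busiest-to-idlest heuristic and the 5-watt shift size are illustrative assumptions, not any vendor’s actual implementation; production power steering would also have to span the interconnect, storage and application layers listed above.

# Minimal single-node power-steering sketch (illustrative only).
# Assumes Linux with Intel RAPL exposed via the powercap sysfs
# interface; writing power caps requires root privileges.
import glob
import time

RAPL = "/sys/class/powercap"
SHIFT_UW = 5_000_000  # shift 5 W of cap per step (hypothetical policy knob)

def read_int(path):
    with open(path) as f:
        return int(f.read())

def packages():
    """Top-level RAPL package zones (skips subzones like intel-rapl:0:0)."""
    zones = glob.glob(RAPL + "/intel-rapl:*")
    return sorted(z for z in zones if z.count(":") == 1)

def measure_power(zones, interval=1.0):
    """Average draw per zone, in microwatts, from the cumulative
    energy_uj counters. Counter wraparound is ignored for brevity."""
    before = [read_int(z + "/energy_uj") for z in zones]
    time.sleep(interval)
    after = [read_int(z + "/energy_uj") for z in zones]
    return [(a - b) / interval for b, a in zip(before, after)]

def steer(zones, power_uw):
    """Move SHIFT_UW of power-cap headroom from the idlest zone
    to the busiest one, keeping the total budget constant."""
    busy = max(range(len(zones)), key=power_uw.__getitem__)
    idle = min(range(len(zones)), key=power_uw.__getitem__)
    if busy == idle:
        return  # single package: nothing to shift
    for zone, delta in ((zones[busy], SHIFT_UW), (zones[idle], -SHIFT_UW)):
        limit_file = zone + "/constraint_0_power_limit_uw"
        new_limit = read_int(limit_file) + delta
        with open(limit_file, "w") as f:  # root required
            f.write(str(new_limit))

if __name__ == "__main__":
    zones = packages()
    while True:  # daemon-style loop; Ctrl-C to stop
        steer(zones, measure_power(zones))

The point of the sketch is the feedback shape: measure where power is actually being consumed, then move the budget there, continuously. Doing the same across thousands of nodes, the interconnect and the storage system is where the large software investment comes in.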
The HPC community is capable of developing these and
other needed capabilities, given enough time, money and
personnel. So the real question, as with so many major
HPC undertakings, is this: when will these necessary elements
come together in sufficient quantity? If the past is any
guide, a large chunk of the funding will need to come from
government sources. And for that to happen, it has become
increasingly clear that the HPC community will need to
make a strong case for the returns government funders can
anticipate from major HPC investments like this.
This discussion so far assumes that HPC data centers will
have access to enough reliable energy, even in the exascale
era. That is not a given. Today’s largest HPC systems already
consume as much electricity as a small city, and their exascale successors promise to devour more, even with expected
advances in energy efficiency. Some of the biggest HPC data
centers worry that their local power companies may balk at
fully supplying their future needs. A few sites have “plan B”
scenarios in place, in which they go off the grid and build
small nuclear reactors. And in some parts of the world, reliable access to adequate power is already a major challenge
for HPC data centers — an important reminder that power
and cooling are concerns not only for sites marching toward
exascale capacity, but for most HPC sites.
In their pursuit of energy efficiency at extreme scale,
HPC sites will likely have fellow travelers in the form of
major Internet players. A pattern is already forming in
which these companies locate new data centers in geographical areas where power is comparatively cheap and
plentiful. Google set the tone more than five years ago by
building a vast new data center along the Columbia River
near The Dalles Dam in Oregon, with its 1.8-gigawatt power
station and relatively inexpensive hydroelectric power. A
prominent HPC example is Oak Ridge National Laboratory, whose power appetite is fed by the Tennessee Valley
Authority. The lab’s data center hosts multiple petascale
systems from DOE, NSF and NOAA. The biggest HPC and
Internet data centers will likely have much to learn from
each other in the coming years.
Steve Conway is Research VP, HPC at IDC. He may be
reached at editor@ScientificComputing.com.