New storage technologies are allowing us to take advantage of the
performance provided by fast storage media (e.g., DSSD) at the
scale and economics provided by traditional spinning media and
object-based storage. Developments in this area, combined with
software-defined storage technologies, will change how we think
about I/O and storage infrastructure in HPC.
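To make the tiering idea concrete, below is a toy Python sketch of the
policy such systems automate: hot data lands on a fast flash tier, and
files that go cold drain to cheaper object storage. The paths, bucket
name and age threshold are hypothetical, and a real software-defined
storage stack would do this transparently rather than in a user script.

    # Toy sketch of a burst-buffer-style tiering policy (illustrative only).
    # FAST_TIER, BUCKET and COLD_AGE_SECONDS are hypothetical values.
    import os
    import time

    import boto3  # AWS SDK; any S3-compatible object store would work

    FAST_TIER = "/mnt/nvme/scratch"  # hypothetical flash (DSSD-class) tier
    COLD_AGE_SECONDS = 3600          # demote files idle for an hour
    BUCKET = "hpc-cold-tier"         # hypothetical object-store bucket

    s3 = boto3.client("s3")

    def drain_cold_files():
        """Move files that have not been read recently to the object tier."""
        now = time.time()
        for name in os.listdir(FAST_TIER):
            path = os.path.join(FAST_TIER, name)
            if os.path.isfile(path) and now - os.path.getatime(path) > COLD_AGE_SECONDS:
                s3.upload_file(path, BUCKET, name)  # cheap, scalable capacity
                os.remove(path)                     # free the fast tier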
On the cloud front, Dell EMC already has many HPC
research sites using cloud technologies to manage their HPC
infrastructure more flexibly. As these tools and
techniques continue to mature, they could change the way we view
and build HPC infrastructure.
Also, containerization technologies like Docker help researchers
collaborate better and guarantee reproducibility of results, while
providing close-to-bare-metal performance. Similarly, packages such as
Singularity are targeted more squarely at HPC and address the security
issues in contemporary container technologies. This can help shape
how applications are deployed on HPC systems in the future and
how researchers can easily collaborate across organizations.
Q: How specifically are these technologies helping
people do their jobs?
Onur: One particular example that we experienced recently
was related to bioinformatics and use of container technology in
HPC. Genomic pipelines are often complicated to deploy and run.
They have a lot of dependencies with other applications, compiler
versions, OS versions, etc. Because of this, it is very complicated
for a second researcher to reproduce another researcher's result on
a different physical cluster. Containers solve this problem, since
the application and its dependencies are all included in the package,
and all the second researcher needs to have is Docker installed.
Researchers have fewer things to debug now and can focus on the
science rather than software issues.
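As a minimal sketch of that point, the snippet below launches a
hypothetical pipeline image with the Docker SDK for Python; the image
name, command and data paths are invented for illustration. Because the
image carries the aligner, compiler and OS library versions it was
built against, the second researcher only needs Docker on the host.

    # Illustrative only: run a containerized genomics pipeline step.
    # The image, command and paths below are hypothetical.
    import docker

    client = docker.from_env()

    logs = client.containers.run(
        image="example/genomics-pipeline:1.0",             # hypothetical image
        command="run-pipeline --input /data/sample.fastq", # hypothetical step
        volumes={"/cluster/data": {"bind": "/data", "mode": "ro"}},
        remove=True,  # clean up the container after the run
    )
    print(logs.decode())

The same packaged image runs identically on any cluster with Docker,
which is exactly what makes the result reproducible.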
Another example is the work we did in understanding the right
building blocks to architect a genomics solution, which led to the
development of the Dell EMC HPC System for Life Sciences. This
enabled customers like the Translational Genomics Research
Institute (TGen)
to cut down their genomics analysis time from multiple weeks to
several hours per patient.
Availability of new processor and accelerator technologies,
including but not limited to NVIDIA GPUs, FPGAs, Intel Xeon Phi
processors and Intel Xeon processors, allows us to get far more
performance per node out of our systems today than was possible a
few years ago. This much computational power is now enabling us to
apply existing computational techniques (such as deep learning) to
new application areas (self-driving cars, medical image
classification, etc.). However, with more choice comes complexity as
well. That is why one of our goals is to do the early evaluation of
new technologies, design and integrate complete systems in a
balanced architecture, and help the HPC community make the
right design decisions.
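For illustration, here is a toy Python/PyTorch sketch of the kind of
deep-learning model behind applications like medical image
classification; the layer sizes and two-class output are arbitrary
assumptions, not any particular production network. The point is that
the same code runs on a CPU or, when present, an accelerator such as
a GPU.

    # Illustrative sketch: a tiny convolutional image classifier in PyTorch.
    # Layer sizes and the two-class output are arbitrary assumptions.
    import torch
    import torch.nn as nn

    class TinyClassifier(nn.Module):
        def __init__(self, num_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(32, num_classes)

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    # Pick an accelerator when available; otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = TinyClassifier().to(device)
    scores = model(torch.randn(8, 1, 64, 64, device=device))  # batch of 8 images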
Q: If any technologies are not working, how does
the HPC Innovation Lab work on improving them?
Any specific examples?
Onur: With any new technology under development, there
are often issues we run into in the early days that are related
to performance, stability and scalability. Our HPC Innovation
Lab has been involved with many such instances, ranging from
certain applications not scaling on a new processor technology,
to more complicated race conditions that occur on parallel file
systems only at scale, to requirements for fabric discovery tools
that can discover a multi-thousand-node fabric in parallel.
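To illustrate the parallel-discovery idea, the Python sketch below
probes a set of hypothetical hostnames concurrently; a real fabric tool
would query the subnet manager or switches rather than ping compute
nodes, but the concurrency pattern is the same.

    # Toy sketch: probe many nodes in parallel instead of one at a time.
    # The node names, count and ping-based probe are hypothetical.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    NODES = [f"node{i:04d}" for i in range(4096)]  # hypothetical hostnames

    def probe(host):
        """Return (host, reachable) from a single quick ping."""
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return host, result.returncode == 0

    # With hundreds of probes in flight, a multi-thousand-node sweep takes
    # seconds rather than hours.
    with ThreadPoolExecutor(max_workers=256) as pool:
        reachable = [host for host, ok in pool.map(probe, NODES) if ok]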
What makes the situation unique for the HPC industry is
that many HPC users, especially in research fields, like to be at the
bleeding edge of technology. To minimize the impact of issues
with new technologies, our approach is to engage early in the
development process. We have very close engineering relationships
with our technology partners, such as Bright Computing, Intel,
Mellanox, NVIDIA and Red Hat. We collaborate with them from
the alpha/beta phase through to release in order to understand
how the new technologies behave and to suggest improvements and
optimizations.
We also spend time looking at how to integrate new technologies
into our customers’ existing infrastructure, which may be based
on a different architecture. Being able to test and
evaluate these technologies early on gives us the
opportunity to get these critical issues fixed before
the technology becomes generally available.
However, certain real-life workloads cannot be
easily replicated in a lab environment. Realizing
this, we also partner with our Dell EMC HPC
Innovation Centers, such as the Texas Advanced
Computing Center (TACC), the Dell/Cambridge
HPC Solution Centre and the San Diego
Supercomputer Center (SDSC), to incorporate
their feedback from real-life applications into the
new technologies we work on.
Finally, we have production HPC resources
built on these new technologies that we make
available to our customers remotely so that they
can test their own workloads and evaluate how
they work for their own applications. ●