Professor John R. Williams and PhD alumnus Mohamad Sindi recently won the IEEE Innovative Paper Award for their paper “Using Container Migration for HPC Workloads Resilience”. The award was presented at the IEEE High Performance Extreme Computing Conference (HPEC’19) on September 25 in Waltham, MA. The paper competed against numerous submissions from top academic institutions, including MIT, Harvard, Stanford, Georgia Tech, Duke, and Carnegie Mellon.

The paper introduces an innovative method for addressing a global challenge in High Performance Computing (HPC): fault tolerance for large-scale HPC workloads. The work was rated “Outstandingly Novel”. Their invention uses a machine learning algorithm that detects sick machines with high accuracy and a low false positive rate. Upon detection, the container running on the sick machine is frozen, a snapshot of its memory is taken, and the container is migrated to a healthy machine, where the computation resumes. The whole process is automated and takes about 60 seconds on average.