Title: |
Learning from Failure Across Multiple Clusters: A Trace-Driven Approach to Understanding, Predicting, and Mitigating Job Terminations |
Article URLs: |
|
Alternative Article URLs: |
|
Authors: |
Nosayba El-Sayed |
-
MIT, Computer Science and Artificial Intelligence Lab
|
Hongyu Zhu |
-
University of Toronto, Department of Computer Science
|
Bianca Schroeder |
-
University of Toronto, Department of Computer Science
|
Sharing: |
Unknown
|
Verification: |
Authors have not verified information
|
Artifact Evaluation Badge: |
none
|
Artifact URLs: |
|
Artifact Correspondence Email Addresses: |
|
NSF Award Numbers: |
|
DBLP Key: |
conf/icdcs/El-SayedZS17
|
Author Comments: |
|