Versioned distributed arrays for resilience in scientific applications: Global view resilience A Chien, P Balaji, P Beckman, N Dun, A Fang, H Fujita, K Iskra, ... Procedia Computer Science 51, 29-38, 2015 | 42 | 2015 |
Exploring versioned distributed arrays for resilience in scientific applications: global view resilience A Chien, P Balaji, N Dun, A Fang, H Fujita, K Iskra, Z Rubenstein, Z Zheng, ... The International Journal of High Performance Computing Applications 31 (6 …, 2017 | 18 | 2017 |
Resilience for stencil computations with latent errors A Fang, A Cavelan, Y Robert, AA Chien 2017 46th International Conference on Parallel Processing (ICPP), 581-590, 2017 | 13 | 2017 |
How much SSD is useful for resilience in supercomputers A Fang, AA Chien Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale …, 2015 | 11 | 2015 |
Multi-versioning performance opportunities in BGAS system for resilience N Dun, D Pleiter, A Fang, N Vandenbergen, AA Chien International Conference on High Performance Computing, 486-504, 2016 | 8 | 2016 |
Fault tolerance assistant (FTA): An exception handling programming model for MPI applications A Fang, I Laguna, K Sato, T Islam, K Mohror Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States), 2016 | 8 | 2016 |
Towards understanding post-recovery efficiency for shrinking and non-shrinking recovery A Fang, H Fujita, AA Chien European Conference on Parallel Processing, 656-668, 2015 | 8 | 2015 |
Applying GVR to molecular dynamics: Enabling resilience for scientific computations A Fang, AA Chien University of Chicago, Tech. Rep. TR-2014-04, 2014 | 8 | 2014 |
Fault tolerance assistant (FTA): an exception handling approach for MPI programs A Fang, I Laguna, K Sato, T Islam, K Mohror ExaMPI15 Exascale MPI at Supercomputing 2015 (SC15), 2015 | 5 | 2015 |
Exploring versioning for resilience in scientific applications: Global view resilience A Chien, P Balaji, P Beckman, N Dun, A Fang, H Fujita, K Iskra, ... International Conference on Computational Science (ICCS), 2015 | 5 | 2015 |
ABFR: convenient management of latent error resilience using application knowledge A Fang, AA Chien Proceedings of the 27th international symposium on high-performance parallel …, 2018 | 4 | 2018 |
Resilient n-body tree computations with algorithm-based focused recovery: Model and performance analysis A Cavelan, A Fang, AA Chien, Y Robert High Performance Computing Systems. Performance Modeling, Benchmarking, and …, 2018 | 4 | 2018 |
Using global view resilience (GVR) to add resilience to exascale applications H Fujita, N Dun, A Fang, ZA Rubenstein, Z Zheng, K Iskra, J Hammond, ... Proceedings of the international conference for high performance computing …, 2014 | 3 | 2014 |
Flexible error recovery using versions in global view resilience N Dun, H Fujita, A Fang, Y Liu, AA Chien, P Balaji, K Iskra, W Bland, ... 2015 IEEE International Conference on Cluster Computing, 512-513, 2015 | 2 | 2015 |
Application-based Focused Recovery (ABFR): Convenient Management of Latent Error Resilience Using Application Knowledge A Fang The University of Chicago, 2018 | | 2018 |
Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience K Teranishi, MA Heroux, MF Hoemmen, A Chien, P Balaji, P Beckman, ... Sandia National Lab.(SNL-CA), Livermore, CA (United States); Sandia National …, 2015 | | 2015 |