PyTorch distributed: Experiences on accelerating data parallel training S Li, Y Zhao, R Varma, O Salpekar, P Noordhuis, T Li, A Paszke, J Smith, ... 2020 International Conference on Very Large Databases (VLDB 2020 …, 2020 | 459 | 2020 |
Sample-efficient neural architecture search by learning actions for Monte Carlo tree search L Wang, S Xie, T Li, R Fonseca, Y Tian IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (9), 5503-5515, 2021 | 97* | 2021 |
GPU resource sharing and virtualization on high performance computing systems T Li, VK Narayana, E El-Araby, T El-Ghazawi 2011 International Conference on Parallel Processing, 733-742, 2011 | 65 | 2011 |
Productivity of GPUs under different programming paradigms M Malik, T Li, U Sharif, R Shahid, T El-Ghazawi, G Newby Concurrency and Computation: Practice and Experience 24 (2), 179-191, 2012 | 29 | 2012 |
Symbiotic scheduling of concurrent GPU kernels for performance and energy optimizations T Li, VK Narayana, T El-Ghazawi Proceedings of the 11th ACM Conference on Computing Frontiers, 1-10, 2014 | 20 | 2014 |
A static task scheduling framework for independent tasks accelerated using a shared graphics processing unit T Li, VK Narayana, T El-Ghazawi 2011 IEEE 17th International Conference on Parallel and Distributed Systems …, 2011 | 20 | 2011 |
A power-aware symbiotic scheduling algorithm for concurrent GPU kernels T Li, VK Narayana, T El-Ghazawi 2015 IEEE 21st International Conference on Parallel and Distributed Systems …, 2015 | 19 | 2015 |
Exploring graphics processing unit (GPU) resource sharing efficiency for high performance computing T Li, VK Narayana, T El-Ghazawi Computers 2 (4), 176-214, 2013 | 13 | 2013 |
Reconfigurable active drive: An FPGA accelerated storage architecture for data-intensive applications T Li, M Huang, T El-Ghazawi, H Huang 2009 Symposium on Application Accelerators in High-Performance Computing, 1-3, 2009 | 10 | 2009 |
Accelerated high-performance computing through efficient multi-process GPU resource sharing T Li, VK Narayana, T El-Ghazawi Proceedings of the 9th Conference on Computing Frontiers, 269-272, 2012 | 8 | 2012 |
PyTorch distributed: Experiences on accelerating data parallel training S Li, Y Zhao, R Varma, O Salpekar, P Noordhuis, T Li, ..., S Chintala arXiv preprint arXiv:2006.15704, 2020 | 6 | 2020 |
Reordering GPU kernel launches to enable efficient concurrent execution T Li, VK Narayana, T El-Ghazawi arXiv preprint arXiv:1511.07983, 2015 | 5 | 2015 |