Megatron-LM: Training Multi-Billion Parameter Language Models Using GPU Model Parallelism M Shoeybi, M Patwary, R Puri, P LeGresley, J Casper, B Catanzaro arXiv preprint arXiv:1909.08053, 2019 | 1629 | 2019 |
Scalable Bayesian Optimization Using Deep Neural Networks J Snoek, O Rippel, K Swersky, R Kiros, N Satish, N Sundaram, M Patwary, ... arXiv preprint arXiv:1502.05700, 2015 | 1310 | 2015 |
Deep learning scaling is predictable, empirically J Hestness, S Narang, N Ardalani, G Diamos, H Jun, H Kianinejad, ... arXiv preprint arXiv:1712.00409, 2017 | 722 | 2017 |
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022 | 710* | 2022 |
Efficient large-scale language model training on GPU clusters using Megatron-LM D Narayanan, M Shoeybi, J Casper, P LeGresley, M Patwary, ... Proceedings of the International Conference for High Performance Computing …, 2021 | 562 | 2021 |
Twitter trending topic classification K Lee, D Palsetia, R Narayanan, MMA Patwary, A Agrawal, A Choudhary Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on …, 2011 | 481 | 2011 |
GraphMat: High performance graph analytics made productive N Sundaram, N Satish, MMA Patwary, SR Dulloor, MJ Anderson, ... Proceedings of the VLDB Endowment 8 (11), 1214-1225, 2015 | 404 | 2015 |
Navigating the maze of graph analytics frameworks using massive graph datasets N Satish, N Sundaram, MMA Patwary, J Seo, J Park, MA Hassaan, ... Proceedings of the 2014 ACM SIGMOD international conference on Management of …, 2014 | 246 | 2014 |
A new scalable parallel DBSCAN algorithm using the disjoint-set data structure MMA Patwary, D Palsetia, A Agrawal, W Liao, F Manne, A Choudhary SC'12: Proceedings of the International Conference on High Performance …, 2012 | 233 | 2012 |
Training Question Answering Models From Synthetic Data R Puri, R Spring, M Patwary, M Shoeybi, B Catanzaro arXiv preprint arXiv:2002.09599, 2020 | 156 | 2020 |
Controllable Story Generation with External Knowledge Using Large-Scale Language Models P Xu, M Patwary, M Shoeybi, R Puri, P Fung, A Anandkumar, B Catanzaro Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020 | 149* | 2020 |
Factuality enhanced language models for open-ended text generation N Lee, W Ping, P Xu, M Patwary, PN Fung, M Shoeybi, B Catanzaro Advances in Neural Information Processing Systems 35, 34586-34599, 2022 | 141 | 2022 |
BioMegatron: Larger Biomedical Domain Language Model HC Shin, Y Zhang, E Bakhturina, R Puri, M Patwary, M Shoeybi, R Mani Proceedings of the 2020 Conference on Empirical Methods in Natural Language …, 2020 | 137 | 2020 |
Fast maximum clique algorithms for large graphs RA Rossi, DF Gleich, AH Gebremedhin, MMA Patwary Proceedings of the companion publication of the 23rd international …, 2014 | 118 | 2014 |
Fast Algorithms for the Maximum Clique Problem on Massive Sparse Graphs B Pattabiraman, M Patwary, M Ali, AH Gebremedhin, W Liao, ... arXiv preprint arXiv:1209.5818, 2012 | 112 | 2012 |
ColPack: Software for graph coloring and related problems in scientific computing AH Gebremedhin, D Nguyen, MMA Patwary, A Pothen ACM Transactions on Mathematical Software (TOMS) 40 (1), 1-31, 2013 | 102 | 2013 |
Deep learning at 15PF: supervised and semi-supervised classification for scientific data T Kurth, J Zhang, N Satish, E Racah, I Mitliagkas, MMA Patwary, T Malas, ... Proceedings of the International Conference for High Performance Computing …, 2017 | 96 | 2017 |
StarCoder 2 and The Stack v2: The Next Generation A Lozhkov, R Li, LB Allal, F Cassano, J Lamy-Poirier, N Tazi, A Tang, ... arXiv preprint arXiv:2402.19173, 2024 | 91 | 2024 |
End-to-End Training of Neural Retrievers for Open-Domain Question Answering DS Sachan, M Patwary, M Shoeybi, N Kant, W Ping, WL Hamilton, ... arXiv preprint arXiv:2101.00408, 2021 | 90 | 2021 |
Parallel efficient sparse matrix-matrix multiplication on multicore platforms MMA Patwary, NR Satish, N Sundaram, J Park, MJ Anderson, ... International Conference on High Performance Computing, 48-57, 2015 | 84 | 2015 |