Nitish Shirish Keskar
OpenAI
Verified email at openai.com
Title
Cited by
Year
On large-batch training for deep learning: Generalization gap and sharp minima
NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang
arXiv preprint arXiv:1609.04836, 2016
Cited by: 3541; Year: 2016
GPT-4 technical report
J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ...
arXiv preprint arXiv:2303.08774, 2023
Cited by: 3310*; Year: 2023
Regularizing and optimizing LSTM language models
S Merity, NS Keskar, R Socher
arXiv preprint arXiv:1708.02182, 2017
Cited by: 1297; Year: 2017
CTRL: A conditional transformer language model for controllable generation
NS Keskar, B McCann, LR Varshney, C Xiong, R Socher
arXiv preprint arXiv:1909.05858, 2019
Cited by: 1165; Year: 2019
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
arXiv preprint arXiv:2206.04615, 2022
Cited by: 877; Year: 2022
The natural language decathlon: Multitask learning as question answering
B McCann, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1806.08730, 2018
Cited by: 670; Year: 2018
Improving generalization performance by switching from Adam to SGD
NS Keskar, R Socher
arXiv preprint arXiv:1712.07628, 2017
Cited by: 649; Year: 2017
Neural text summarization: A critical evaluation
W Kryściński, NS Keskar, B McCann, C Xiong, R Socher
arXiv preprint arXiv:1908.08960, 2019
Cited by: 395; Year: 2019
GeDi: Generative discriminator guided sequence generation
B Krause, AD Gotmare, B McCann, NS Keskar, S Joty, R Socher, ...
arXiv preprint arXiv:2009.06367, 2020
Cited by: 334; Year: 2020
A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation
A Gotmare, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1810.13243, 2018
Cited by: 302; Year: 2018
ProGen: Language modeling for protein generation
A Madani, B McCann, N Naik, NS Keskar, N Anand, RR Eguchi, ...
arXiv preprint arXiv:2004.03497, 2020
Cited by: 252; Year: 2020
An analysis of neural language modeling at multiple scales
S Merity, NS Keskar, R Socher
arXiv preprint arXiv:1803.08240, 2018
Cited by: 190; Year: 2018
Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains
N Naik, A Madani, A Esteva, NS Keskar, MF Press, D Ruderman, DB Agus, ...
Nature communications 11 (1), 5727, 2020
Cited by: 189; Year: 2020
Weighted transformer network for machine translation
K Ahmed, NS Keskar, R Socher
arXiv preprint arXiv:1711.02132, 2017
Cited by: 161; Year: 2017
Balancing communication and computation in distributed optimization
AS Berahas, R Bollapragada, NS Keskar, E Wei
IEEE Transactions on Automatic Control 64 (8), 3141-3155, 2018
Cited by: 120; Year: 2018
Sequence-to-sequence prediction using a neural network model
NS Keskar, K Ahmed, R Socher
US Patent 11,928,600, 2024
Cited by: 112; Year: 2024
Multitask learning as question answering
NS Keskar, B McCann, C Xiong, R Socher
US Patent 11,501,076, 2022
Cited by: 90; Year: 2022
Multitask learning as question answering
B McCann, NS Keskar, C Xiong, R Socher
US Patent 10,776,581, 2020
Cited by: 84; Year: 2020
XLDA: Cross-lingual data augmentation for natural language inference and question answering
J Singh, B McCann, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1905.11471, 2019
Cited by: 80; Year: 2019
Coarse-grain fine-grain coattention network for multi-evidence question answering
V Zhong, C Xiong, NS Keskar, R Socher
arXiv preprint arXiv:1901.00603, 2019
Cited by: 75; Year: 2019