Nitish Shirish Keskar
Salesforce Research
Verified email at salesforce.com
Title | Cited by | Year
On large-batch training for deep learning: Generalization gap and sharp minima
NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang
arXiv preprint arXiv:1609.04836, 2016
Cited by: 1309, Year: 2016
Regularizing and optimizing LSTM language models
S Merity, NS Keskar, R Socher
arXiv preprint arXiv:1708.02182, 2017
Cited by: 701, Year: 2017
The natural language decathlon: Multitask learning as question answering
B McCann, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1806.08730, 2018
Cited by: 224, Year: 2018
Improving generalization performance by switching from Adam to SGD
NS Keskar, R Socher
arXiv preprint arXiv:1712.07628, 2017
Cited by: 222, Year: 2017
CTRL: A conditional transformer language model for controllable generation
NS Keskar, B McCann, LR Varshney, C Xiong, R Socher
arXiv preprint arXiv:1909.05858, 2019
Cited by: 181, Year: 2019
An analysis of neural language modeling at multiple scales
S Merity, NS Keskar, R Socher
arXiv preprint arXiv:1803.08240, 2018
Cited by: 129, Year: 2018
Weighted transformer network for machine translation
K Ahmed, NS Keskar, R Socher
arXiv preprint arXiv:1711.02132, 2017
Cited by: 74, Year: 2017
Neural text summarization: A critical evaluation
W Kryściński, NS Keskar, B McCann, C Xiong, R Socher
arXiv preprint arXiv:1908.08960, 2019
Cited by: 65, Year: 2019
A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation
A Gotmare, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1810.13243, 2018
Cited by: 55, Year: 2018
Balancing communication and computation in distributed optimization
AS Berahas, R Bollapragada, NS Keskar, E Wei
IEEE Transactions on Automatic Control 64 (8), 3141-3155, 2018
Cited by: 45, Year: 2018
Coarse-grain fine-grain coattention network for multi-evidence question answering
V Zhong, C Xiong, NS Keskar, R Socher
arXiv preprint arXiv:1901.00603, 2019
Cited by: 33, Year: 2019
adaQN: An adaptive quasi-Newton algorithm for training RNNs
NS Keskar, AS Berahas
Joint European Conference on Machine Learning and Knowledge Discovery in…, 2016
Cited by: 30, Year: 2016
A second-order method for convex ℓ1-regularized optimization with active-set prediction
N Keskar, J Nocedal, F Öztoprak, A Waechter
Optimization Methods and Software 31 (3), 605-621, 2016
Cited by: 24, Year: 2016
CTRL: A conditional transformer language model for controllable generation
N Shirish Keskar, B McCann, LR Varshney, C Xiong, R Socher
arXiv e-prints, arXiv:1909.05858, 2019
Cited by: 23, Year: 2019
Identifying generalization properties in neural networks
H Wang, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1809.07402, 2018
Cited by: 22, Year: 2018
ProGen: Language modeling for protein generation
A Madani, B McCann, N Naik, NS Keskar, N Anand, RR Eguchi, ...
arXiv preprint arXiv:2004.03497, 2020
Cited by: 19, Year: 2020
XLDA: Cross-lingual data augmentation for natural language inference and question answering
J Singh, B McCann, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1905.11471, 2019
Cited by: 15, Year: 2019
Sequence-to-sequence prediction using a neural network model
NS Keskar, K Ahmed, R Socher
US Patent App. 15/884,125, 2019
Cited by: 14, Year: 2019
A nonmonotone learning rate strategy for SGD training of deep neural networks
NS Keskar, G Saon
2015 IEEE International Conference on Acoustics, Speech and Signal…, 2015
Cited by: 14, Year: 2015
Multitask Learning As Question Answering
NS Keskar, B McCann, C Xiong, R Socher
US Patent App. 15/974,075, 2019
Cited by: 12, Year: 2019