Jonathan Uesato
Unknown affiliation
Verified email at mit.edu
Title
Cited by
Year
Scaling language models: Methods, analysis & insights from training Gopher
JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ...
arXiv preprint arXiv:2112.11446, 2021
Cited by: 758, Year: 2021
Adversarial risk and the dangers of evaluating against weak attacks
J Uesato, B O'Donoghue, P Kohli, A van den Oord
International Conference on Machine Learning, 5025-5034, 2018
Cited by: 619, Year: 2018
Ethical and social risks of harm from language models
L Weidinger, J Mellor, M Rauh, C Griffin, J Uesato, PS Huang, M Cheng, ...
arXiv preprint arXiv:2112.04359, 2021
Cited by: 608, Year: 2021
On the effectiveness of interval bound propagation for training verifiably robust models
S Gowal, K Dvijotham, R Stanforth, R Bunel, C Qin, J Uesato, ...
arXiv preprint arXiv:1810.12715, 2018
Cited by: 478, Year: 2018
RobustFill: Neural program learning under noisy I/O
J Devlin, J Uesato, S Bhupatiraju, R Singh, A Mohamed, P Kohli
International Conference on Machine Learning, 990-998, 2017
Cited by: 428, Year: 2017
Technical report on the CleverHans v2.1.0 adversarial examples library
N Papernot, F Faghri, N Carlini, I Goodfellow, R Feinman, A Kurakin, ...
arXiv preprint arXiv:1610.00768, 2016
Cited by: 390, Year: 2016
Are labels required for improving adversarial robustness?
JB Alayrac, J Uesato, PS Huang, A Fawzi, R Stanforth, P Kohli
Advances in Neural Information Processing Systems 32, 2019
Cited by: 338, Year: 2019
Robustness via curvature regularization, and vice versa
SM Moosavi-Dezfooli, A Fawzi, J Uesato, P Frossard
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019
Cited by: 323, Year: 2019
Taxonomy of risks posed by language models
L Weidinger, J Uesato, M Rauh, C Griffin, PS Huang, J Mellor, A Glaese, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
Cited by: 315, Year: 2022
Improving alignment of dialogue agents via targeted human judgements
A Glaese, N McAleese, M Trębacz, J Aslanides, V Firoiu, T Ewalds, ...
arXiv preprint arXiv:2209.14375, 2022
Cited by: 295, Year: 2022
Uncovering the limits of adversarial training against norm-bounded adversarial examples
S Gowal, C Qin, J Uesato, T Mann, P Kohli
arXiv preprint arXiv:2010.03593, 2020
Cited by: 295, Year: 2020
Training verified learners with learned verifiers
K Dvijotham, S Gowal, R Stanforth, R Arandjelovic, B O'Donoghue, ...
arXiv preprint arXiv:1805.10265, 2018
Cited by: 170, Year: 2018
Scalable verified training for provably robust image classification
S Gowal, KD Dvijotham, R Stanforth, R Bunel, C Qin, J Uesato, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2019
Cited by: 164, Year: 2019
Challenges in detoxifying language models
J Welbl, A Glaese, J Uesato, S Dathathri, J Mellor, LA Hendricks, ...
arXiv preprint arXiv:2109.07445, 2021
Cited by: 150, Year: 2021
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
S Dathathri, K Dvijotham, A Kurakin, A Raghunathan, J Uesato, RR Bunel, ...
Advances in Neural Information Processing Systems 33, 5318-5331, 2020
Cited by: 101, Year: 2020
Specification gaming: the flip side of AI ingenuity
V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ...
DeepMind Blog 3, 2020
Cited by: 87, Year: 2020
Cyprien de Masson d’Autume
JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ...
Cited by: 80, Year: 2021
Rigorous agent evaluation: An adversarial approach to uncover catastrophic failures
J Uesato, A Kumar, C Szepesvari, T Erez, A Ruderman, K Anderson, ...
arXiv preprint arXiv:1812.01647, 2018
Cited by: 76, Year: 2018
An alternative surrogate loss for PGD-based adversarial testing
S Gowal, J Uesato, C Qin, PS Huang, T Mann, P Kohli
arXiv preprint arXiv:1910.09338, 2019
Cited by: 75, Year: 2019
Solving math word problems with process- and outcome-based feedback
J Uesato, N Kushman, R Kumar, F Song, N Siegel, L Wang, A Creswell, ...
arXiv preprint arXiv:2211.14275, 2022
Cited by: 73, Year: 2022