Hannah Rose Kirk

Cited by

	All	Since 2019
Citations	699	699
h-index	12	12
i10-index	14	14

420

210

105

315

20212022202320248 67 419 197

Public access

View all

2 articles

1 article

available

not available

Based on funding mandates

Co-authors

Bertie VidgenOxford, TuringVerified email at rewire.online
Paul RöttgerPostdoctoral Researcher, Bocconi UniversityVerified email at unibocconi.it
Aleksandar (Suny) ShtedritskiPhD student, University of OxfordVerified email at robots.ox.ac.uk
Scott A. HaleOxford Internet Institute, University of Oxford, Meedan, and the Alan Turing InstituteVerified email at oii.ox.ac.uk
Yuki M. AsanoAssistant Professor, University of AmsterdamVerified email at uva.nl
Yennie JunGoogle Research, Truveta, University of Oxford, UN Global PulseVerified email at google.com
Frédéric A. DreyerUniversity of OxfordVerified email at physics.ox.ac.uk
Siobhan Mackenzie HallDPhil Student, University of OxfordVerified email at nds.ox.ac.uk
Leon DerczynskiITU Copenhagen & NVIDIAVerified email at itu.dk
Max BainUniversity of OxfordVerified email at robots.ox.ac.uk
Jonas SchuettResearch Fellow, Centre for the Governance of AI, Oxford, UKVerified email at governance.ai
Luciano FloridiYale University - Alma Mater Studiorum University of BolognaVerified email at yale.edu
Jakob MökanderUniversity of OxfordVerified email at oii.ox.ac.uk
Tristan ThrushStanfordVerified email at stanford.edu
Wenjie YinQueen Mary University of LondonVerified email at qmul.ac.uk
abeba birhaneAdjunct assistant professor at the school of computer science and statistics, Trinity College DublinVerified email at tcd.ie
Yash BhalgatVisual Geometry Group, University of OxfordVerified email at robots.ox.ac.uk
Hugo BergUndergraduate student, Mathematics & Computer Science, University of OxfordVerified email at ccc.ox.ac.uk
Dirk HovyBocconi UniversityVerified email at unibocconi.it
Noah BroestlGoogle Research and Oxford Uehiro Centre for Practical EthicsVerified email at google.com

Hannah Rose Kirk

University of Oxford

Verified email at oii.ox.ac.uk - Homepage

Large language models NLP Ethics in AI Alignment AI Safety


Title Sort by citations Sort by year Sort by title	Cited by Cited by	Year
Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models HR Kirk, Y Jun, F Volpin, H Iqbal, E Benussi, F Dreyer, A Shtedritski, ... Advances in neural information processing systems 34, 2611-2624, 2021	127	2021
Auditing large language models: a three-layered approach J Mökander, J Schuett, HR Kirk, L Floridi AI and Ethics, 1-31, 2023	109	2023
Dataperf: Benchmarks for data-centric ai development M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ... Advances in Neural Information Processing Systems 36, 2024	72	2024
SemEval-2023 task 10: explainable detection of online sexism HR Kirk, W Yin, B Vidgen, P Röttger arXiv preprint arXiv:2303.04222, 2023	72	2023
A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning H Berg, SM Hall, Y Bhalgat, W Yang, HR Kirk, A Shtedritski, M Bain Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the …, 2022	62	2022
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale Proceedings of the 2022 Conference of the North American Chapter of the …, 2021	44	2021
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback HR Kirk, B Vidgen, P Röttger, SA Hale arXiv preprint arXiv:2303.05453, 2023	41	2023
Handling and Presenting Harmful Text in NLP HR Kirk, A Birhane, B Vidgen, L Derczynski EMNLP Findings, 2022	29*	2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements C Borchers, DS Gala, B Gilburt, E Oravkin, W Bounsi, YM Asano, HR Kirk Proceedings of the 4th workshop on gender bias in natural language …, 2022	24	2022
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset HR Kirk, Y Jun, P Rauba, G Wachtel, R Li, X Bai, N Broestl, M Doff-Sotta, ... Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021	23	2021
Xstest: A test suite for identifying exaggerated safety behaviours in large language models P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy arXiv preprint arXiv:2308.01263, 2023	22	2023
Assessing language model deployment with risk cards L Derczynski, HR Kirk, V Balachandran, S Kumar, Y Tsvetkov, MR Leiser, ... arXiv preprint arXiv:2303.18190, 2023	15	2023
Casteist but not racist? quantifying disparities in large language model bias between india and the west K Khandelwal, M Tonneau, AM Bean, HR Kirk, SA Hale arXiv preprint arXiv:2309.08573, 2023	10	2023
The nuances of Confucianism in technology policy: An inquiry into the interaction between cultural and political systems in Chinese digital ethics HR Kirk, K Lee, C Micallef International Journal of Politics, Culture, and Society, 1-24, 2020	10	2020
The past, present and better future of feedback learning in large language models for subjective human preferences and values HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale arXiv preprint arXiv:2310.07629, 2023	8	2023
Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets B Smith, M Farinha, SM Hall, HR Kirk, A Shtedritski, M Bain arXiv preprint arXiv:2305.15407, 2023	8	2023
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning HR Kirk, B Vidgen, SA Hale Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying …, 2022	6	2022
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy arXiv preprint arXiv:2402.16786, 2024	3	2024
Visogender: A dataset for benchmarking gender bias in image-text pronoun resolution SM Hall, F Gonçalves Abrantes, H Zhu, G Sodunke, A Shtedritski, HR Kirk Advances in Neural Information Processing Systems 36, 2024	3	2024
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger arXiv preprint arXiv:2311.08370, 2023	3	2023

The system can't perform the operation now. Try again later.

Articles 1–20

Citations per year

Duplicate citations

Merged citations

Add co-authorsCo-authors

Follow

Cited by

Co-authors