RLAIF: Scaling reinforcement learning from human feedback with AI feedback H Lee, S Phatale, H Mansoor, KR Lu, T Mesnard, J Ferret, C Bishop, ... | 403 | 2023 |
LLMs cannot find reasoning errors, but can correct them! G Tyen, H Mansoor, P Chen, T Mak, V Cărbune arXiv preprint arXiv:2311.08516, 2023 | 45 | 2023 |
ScreenAI: A vision-language model for UI and infographics understanding G Baechler, S Sunkara, M Wang, F Zubach, H Mansoor, V Etter, ... arXiv preprint arXiv:2402.04615, 2024 | 33 | 2024 |
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback H Lee, S Phatale, H Mansoor, T Mesnard, J Ferret, KR Lu, C Bishop, ... Forty-first International Conference on Machine Learning | 23 | |
Methods and systems for predicting conversion rates of content publisher and content provider pairs R Kirillov, H Mansoor US Patent 9,246,990, 2016 | 10 | 2016 |
RLAIF: Scaling reinforcement learning from human feedback with AI feedback, 2024 H Lee, S Phatale, H Mansoor, T Mesnard, J Ferret, K Lu, C Bishop, E Hall, ... URL https://openreview.net/forum | 9 | |
Methods and systems for providing an actionable object within a third-party content slot of an information resource of a content publisher R Kirillov, A Tyler, D Banfield, H Mansoor, DM Goodridge, LA Collard US Patent 10,067,916, 2018 | 6 | 2018 |
Methods and systems for providing an actionable object within a third-party content slot of an information resource of a content publisher R Kirillov, A Tyler, D Banfield, H Mansoor, DM Goodridge, LA Collard US Patent 9,461,936, 2016 | 6 | 2016 |
Chart-based reasoning: Transferring capabilities from LLMs to VLMs V Carbune, H Mansoor, F Liu, R Aralikatte, G Baechler, J Chen, A Sharma arXiv preprint arXiv:2403.12596, 2024 | 3 | 2024 |
PERL: Parameter Efficient Reinforcement Learning from Human Feedback H Sidahmed, S Phatale, A Hutcheson, Z Lin, Z Chen, Z Yu, J Jin, ... arXiv preprint arXiv:2403.10704, 2024 | 3 | 2024 |
VQA Training Sets are Self-play Environments for Generating Few-shot Pools T Misiunas, H Mansoor, J Uijlings, O Riva, V Carbune arXiv preprint arXiv:2405.19773, 2024 | | 2024 |
The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization S Gooding, H Mansoor arXiv preprint arXiv:2311.04919, 2023 | | 2023 |
Methods and systems for providing an actionable object within a third-party content slot of an information resource of a content publisher R Kirillov, A Tyler, D Banfield, H Mansoor, DM Goodridge, LA Collard US Patent 10,210,140, 2019 | | 2019 |