Optimal Design for Reward Modeling in RLHF

Summary

Authors: Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, Michal Valko

Journal publisher: arXiv

Published year: 2024