Show HN: AI Alignment at Your Discretion

When feeding alignment principles to an AI, what does it prioritize? What are humans prioritizing when they give feedback? Does AI have the same priorities as human feedback?

We formalize these questions as 'alignment discretion'. Just like we scrutinize judges when they interpret and balance legal principles (judicial discretion), we should scrutinize annotators and algorithms when they balance alignment principles. Our paper presents a set of metrics to do exactly this, and we find significant discrepancies between human and algorthmic discretion.


Comments URL: https://news.ycombinator.com/item?id=43116772

Points: 1

# Comments: 0