The failure of Google Gemini makes me wonder if anyone on the team ever read a book called "The Robots of Gotham". It is a long book, but an eye-opening look at a potential future in which AGIs heavily utilize specification gaming to carry out their machinations around the world.
Specification gaming is great fun. When humans do it, it is sometimes called malicious compliance. The poster on X only asked for "a pope", not the pope, nor an image of anyone who ever was pope. After all, an Indian woman could become pope someday!
I just saw your response. My point was that lots and lots of testing is done by various people over time during system and safety testing, NOT that I have evidence of specific prompts. So it's extremely unlikely, given the amount and variation of testing, that people did not realize that so many prompts would, for example, depict Vikings as Indian and British kings as Black.
I generally agree with the article. But I'm quite surprised that the author said things like "it’s impossible to safety-test all the prompts that users come up with once the model is released" [the "all" is irrelevant and misleading - "anti-whiteness" would show up for lots of questions] and "I don’t think they expected it to depict Vikings as Indian and British kings as Black." Well, of course they knew images would reflect "anti-whiteness", because such systems go through loads of system testing and safety testing - many Google staff would have entered prompts such as "show me pictures of the first US president", or Vikings, or British kings.
How do you know? As far as I know they didn't safety test those prompts you mention. Where's your evidence?
BTW, goody-2 at least detects the diagonal argument/barber paradox, though it refuses to answer (of course).
This whole neutrality thing is based on false assumptions; no wonder it leads to such disasters. It's a well-known result from formal logic that if an axiomatic system contains a false statement, any theorem can be proven true. The Asian German soldiers during WW2 are just an illustration.
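The result this comment is appealing to is the principle of explosion (ex falso quodlibet): from a contradiction, anything follows. As a hedged sketch of how the formal claim reads, here is the one-line proof in Lean (the `absurd` lemma discharges any goal `Q` from `P` and `¬P`):

```
-- Principle of explosion: a contradiction proves any proposition Q.
example (P Q : Prop) (h : P) (hn : ¬P) : Q := absurd h hn
```

The analogy being drawn is loose, of course: a model's training objective is not an axiomatic system, so a "false axiom" leading to arbitrary outputs is a metaphor, not a theorem about Gemini.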
> I don’t think they expected it to depict Vikings as Indian and British kings as Black
If they didn’t run these simple tests, then the testers are not fit for purpose. Do they not have testers and prompt engineers? (Of course they do.)
There will always be edge cases red teams won't try. It's easy to see them after the fact. I may agree with you but, in any case, how easy to spot these particular ones were is a question that's irrelevant to my central points.
As reported, there were either no or very few white people returned by any query. I’m sure that was testable.