if you want to take OpenAI’s own research into account
No, thank you.
OlympicArena validation set (text-only)
“Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy (28.67% for mathematics and 29.71% for physics)”
- The OlympicArena analysis that you cited.
The README in the repo indicates it’s based on the NEO-PI, which is kind of the gold standard in personality tests, at least right now, from what I understand.
Book recommendation for folks who might want to know more about personality psychology: Me, Myself, and Us: The Science of Personality and the Art of Well-Being by Dr. Brian Little.