Why exams intended for humans might not be good benchmarks for LLMs like GPT-4