News
Microsoft's Phi 3 Medium 128K Instruct, Meta's Llama 3 70B, Google's Gemma 2 and Mistral AI's Mistral 7B Instruct were able to output benchmark test sets verbatim. Hunt's data even provided the ...