
Meta gamed the benchmarks for its new AI model Llama 4
by Samuel Buchmann
When it released its latest AI model family, Llama 4, Meta boasted about high scores on a benchmark platform. However, the model only achieves those scores in a special version that isn't even publicly available.
The performance of artificial intelligence (AI) models is measured with benchmarks, and one of the leading platforms for this is LM Arena. Good results attract attention, as was the case with Meta's new Llama 4, which the company released at the weekend. It has since emerged, however, that Meta didn't play entirely fair in order to make its model look as good as possible, as reported by TechCrunch.
In its press release, Meta emphasises an Elo score of 1417 for Maverick, the medium-sized model in the Llama 4 family. This very high score means that Maverick often wins direct head-to-head comparisons against competitors. It suggests that Meta's model is ahead of OpenAI's GPT-4o and only just behind the current leader, Google's Gemini 2.5 Pro.
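To put that Elo number in perspective: a rating gap translates directly into an expected head-to-head win rate. The short sketch below applies the standard Elo formula to Maverick's published score; the rival rating of 1380 is an illustrative assumption, not a figure from the article.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Maverick's published LM Arena score; the rival's rating is an
# illustrative assumption, not a figure from the article.
maverick = 1417
rival = 1380
print(f"P(Maverick beats rival): {elo_win_probability(maverick, rival):.1%}")
# -> roughly 55%: even a ~40-point Elo gap only shifts the odds slightly
```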
Accordingly, Maverick made big waves in the community. It seemed as if Meta had jumped to the front of the pack after its previous models had consistently lagged behind. As it now turns out, however, the developers did not use the publicly available version of Maverick for the LM Arena benchmarks, but an "experimental chat version". This was only mentioned in the small print.
Meta's approach does not explicitly break LM Arena's rules, but it does contradict the idea behind the platform. Benchmarks lose their meaning when developers enter specially optimised versions of their models that aren't available anywhere, presumably because they come with other disadvantages. The resulting scores no longer reflect realistic performance and are useless for comparison.
The episode shows how much pressure Meta is under in the AI race, especially now that a strong second open-weight contender, China's DeepSeek, is on the market. Before its launch, Llama 4 was reportedly postponed several times because it did not fulfil internal expectations. In the end, it was released, oddly, on a Saturday (5 April) instead of the following Monday (7 April) as originally planned. When asked why, Meta CEO Mark Zuckerberg replied on Threads: "That's when it was ready."