LLM Colosseum

Project summary

Evaluate the quality of Large Language Models (LLMs) by having them fight in real time in Street Fighter III. Who is the best? OpenAI or MistralAI? Let them fight! Open source code and ranking.

Making LLMs fight in real time tests both their speed and their reasoning abilities. LLMs have to quickly assess their environment and take actions based on it.
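
To make the "assess, then act" idea concrete, here is a minimal sketch of a single decision step with a time budget. It is an illustration only and is not taken from the project's code: the move names, the state fields, `describe_state`, `choose_move`, and the `stub_model` stand-in for a real LLM call are all hypothetical.

```python
import time
from typing import Callable, Dict

# Hypothetical move set and fallback, for illustration only.
MOVES = ["move_closer", "move_away", "punch", "kick", "block"]
FALLBACK_MOVE = "block"


def describe_state(state: Dict[str, int]) -> str:
    """Turn a (hypothetical) game state into a short text prompt."""
    return (
        f"Your health: {state['own_health']}. "
        f"Opponent health: {state['opponent_health']}. "
        f"Distance to opponent: {state['distance']}. "
        f"Choose one move from: {', '.join(MOVES)}."
    )


def choose_move(
    ask_model: Callable[[str], str],
    state: Dict[str, int],
    time_budget_s: float,
) -> str:
    """Ask the model for a move; fall back if it is too slow or invalid.

    The call is not cancelled when the budget is exceeded. The late
    answer is simply discarded, which is enough to show why a slow
    model is penalized in a real-time setting.
    """
    prompt = describe_state(state)
    start = time.monotonic()
    reply = ask_model(prompt).strip().lower()
    elapsed = time.monotonic() - start
    if elapsed > time_budget_s or reply not in MOVES:
        return FALLBACK_MOVE
    return reply


def stub_model(prompt: str) -> str:
    """Placeholder for a real LLM call; always answers with the same move."""
    return " Punch\n"
```

For example, `choose_move(stub_model, {"own_health": 100, "opponent_health": 80, "distance": 1}, 1.0)` returns `"punch"`, while a model that answers with an unknown move or takes longer than the budget yields the fallback `"block"`. Under these assumptions, both a slow answer and a poorly reasoned one cost the player a turn.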

LLMs are different from Reinforcement Learning (RL) models, which are based on maximizing a reward function. LLMs have more prior knowledge about the concepts of fighting, video games, Street Fighter, available guides, etc. This is a different approach that can help advance how AI systems understand and interact with their environment.

This project was made by teams from phospho and Quivr.