Top Large Language Models (LLMs)
The top large language models in the Summer of 2024, along with recommendations for when to use each based upon needs like API, tunable, or fully hosted.
≈ 12 minutes readWe’re excited to bring you an updated guide on choosing the right Large Language Models (LLMs) for various use cases! Since our original post, the LLM landscape has rapidly evolved with the release of dozens of new LLMs from research labs and companies around the world. These models vary widely in size, performance, benchmarks, intended applications, and licenses, making the selection process increasingly complex.
In this post, which you can think of as a curated LLM gallery, we aim to simplify your decision-making by recommending our top LLM choices tailored to specific use cases. This guide is designed to help you navigate the diverse options and find the best LLM to meet your needs.
GPT-4o
Our Pick for a fully hosted, API-Accessible LLM (Paid Tier)
Our Pick for a fully hosted, API-Accessible Multimodal LLM
For a fully hosted, API-based large language model (LLM) that balances performance, cost, and features including multimodality, GPT-4o from OpenAI is our top recommendation. OpenAI has been at the forefront of developing generative pre-trained transformer-based language models, and GPT-4o continues to push the boundaries of AI.
Our team has conducted extensive testing to compare GPT-4o with other leading models in the market. On a range of tasks as diverse as information extraction from resumes and generating summaries with accurate citations, we observe that GPT-4o significantly outperforms models like Meta Llama 3 70B, Google Gemini Advanced, and GPT-3.5 Turbo.
These observations are supported by the LMSYS Chatbot Arena Leaderboard, GPT-4o’s top ranking on this leaderboard suggests that our experiences are not unique. Another standout feature of GPT-4o is its support for multimodal features through its API. This means that the model can handle not only text but also images and audio, thus making it our top choice for both fully hosted text as well as multimodal applications.
Other models considered: Google Gemini Advanced. Claude 3 Opus. Hosted versions of large OS models Llama3 70B and Mistral Large.
GPT-3.5 Turbo
Our pick for a fully hosted chat interface LLM (Free Tier)
GPT-3.5 is a text-only model that can perform a lot of the text-based functions that GPT-4 can, albeit GPT-4 usually exhibits better performance.
GPT-3.5 is a successor to InstructGPT and GPT-3. InstructGPT itself was specifically trained to receive prompts and provide detailed responses that follow specific instructions, while GPT-3.5 Turbo is designed to engage in natural language conversations. OpenAI frequently pushes updates and new features such as the ChatGPT plugins which unlock even more LLM use cases.
Basic (non-peak) access to GPT-3.5 Turbo via a web UI is free, however, if you want to use the more powerful variants such as GPT4o you only get a limited amount of usage before the free usage runs out. The high quality and free chat interface usage make GPT 3.5 turbo our top pick for a model to use if you don’t need API access and can make do with using the web interface to use the model.
CodeQwen-1.5
Our pick for best model for code understanding and completion
Announced in April 2024, CodeQwen-1.5 is a 7B model by the Qwen team of Alibaba optimized for code understanding and completion. The model has been trained on a large amount of code. EvalPlus is a popular benchmark that evaluates code-writing LLMs. As of June 2024, CodeQwen1.5 is the best open-source LLM on this benchmark achieving near GPT-4 level performance with just 7B parameters. The relatively small size of the model compared to other LLMs makes it easy to self-host or fine-tune with LoRA-based techniques. The model is open for limited commercial use under a customized license.
Other models considered: Codestral, Code Llama Family, DeepSeek-coder-33b
Mistral 7B Instruct v0.3
Our pick for best model to fine-tune for commercial or research purposesAnnounced in May 2024, Mistral-7B-Instruct-v0.3 is an improvement on the popular, and already excellent Mistral-7B-Instruct-v0.2 model. The new version extends the vocabulary size and adds better support for function calling and tool use which enable use-cases like Agentic RAG. The overall instruction-following capabilities of the model are also improved compared to the v0.2 version. The model comes with a permissive Apache 2.0 license along with excellent community support and finetuning recipes, making it our top pick for the model to choose if you need to finetune from a base model on your own data for both commercial as well as non-commercial purposes.
Other Models Considered: Llama 3 8B, Starling LM 7B Alph, Zephyr 7B Beta
Llama 3 70B
Our pick for a high-quality self-hosted model if compute isn’t an issue
Meta’s Llama 3 70B, announced in April 2024, is part of the third iteration of Llama models, and the 70B variant is the largest of the available sizes.
Llama 3 70B holds its own against several large models beating Claude Sonnet, GPT 3.5, and Mistral Medium on human evaluation benchmarks. The tokenizer for Llama 3 has a whopping 128K token vocabulary allowing efficient language encoding along with its 15 TRILLION token training data which includes 30 languages also enabling multilingual use cases. Llama 3 70B is available via a custom license which allows commercial use with some caveats.
Llama 3 70B’s high-scoring benchmarks make it our top choice for the LLM to use if you have the computing resources to host such a large variant!
Other models considered: Falcon 2 180B, Bloom
Llama 3 8B
Our pick for a self-hosted model for use without fine-tuning
Meta’s Llama 3 8B is the smaller sibling of the Llama 4 70B model described above. Benefitting from much of the same advances as its larger variant, Llama 3 8B variant sets a new state-of-the-art level of performance for models in its own size class across a variety of tasks.
The model is available on Huggingface, with a custom license. The model size is small allowing hosting on hobby grade GPUS, and out-of-the-box performance beats similarly sized Mistral and Gemini models. Both of these facts combine to make Llama 3 8B an ideal candidate for use by hobby or side-projects where one might not have a large budget for hosting or data for finetuning and needs to make do with a good performing out-of-the-box model.
Other models considered: Mistral 7B Instruct v0.3, Gemma 7B
Gorilla-openfunctions-v2
Our pick for a self-hosted model to use for tool use and function calling
Gorilla OpenFunctions-v2 stands out as the premier self-hosted model for function calling and tool use, both basic features needed in order to use an LLM as a home assistant or in an agentic scenario. It is a 7B model with support for multiple programming languages, including Python, Java, JavaScript, and REST APIs for function calling. It ranks at the same level if not higher than commercially available function calling models and surpasses numerous other open-source models of similar size. On the Berkeley function calling leaderboard, it is surpassed only by the GPT4 family of models.
Its advanced capabilities include parallel and multiple function calls, which allows developers to handle complex scenarios seamlessly. The performance, advanced function calling capabilities, and relatively manageable size make Gorilla OpenFunctions v2 our top choice for a self-hosted model for tool use and function calling.
Other models considered: NexusRaven-V2, Hermes 2 Pro Llama 3 8B