How to Run Llama 3 on Mac: A Step-by-Step Guide
Llama is a family of large language models developed by Meta. This tutorial walks through running Meta Llama 3 on a Mac with Ollama, a tool for downloading and running large language models locally.
Setup
For this demonstration, we are using a MacBook Pro running macOS Sonoma 14.4.1 with 64GB of memory. While we focus on macOS, similar steps can be followed for other operating systems like Linux or Windows.
Installing Ollama
This guide uses Ollama to run large language models like Llama locally. Here’s how to get started:
- Visit the Ollama Website: Go to the Ollama website and select your platform.
- Download Ollama for macOS: Click on “Download for macOS” to get the installation file.
- Install Ollama: Follow the on-screen instructions to complete the installation.
Downloading Meta Llama Models
Ollama provides Meta Llama models in a 4-bit quantized format, making them more efficient to run on local machines. Here’s how to download them:
- Open Terminal: Launch the terminal on your Mac.
- Download the 8B Model: Run the following command to download the 4-bit quantized Meta Llama 3 8B chat model:
ollama pull llama3
This model is about 4.7 GB in size.
- Download the 70B Model (Optional): For the larger 70B model, use:
ollama pull llama3:70b
This model is approximately 39 GB in size.
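As a rough sanity check on those download sizes, the sketch below estimates how much storage the weights need at 4 bits each. It assumes the nominal parameter counts implied by the model names (8 billion and 70 billion) and exactly 4 bits per weight; both are simplifications, and the helper name estimated_size_gb is made up for this illustration.

```python
def estimated_size_gb(parameters, bits_per_weight=4):
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    # bits -> bytes (divide by 8) -> gigabytes (divide by 10**9)
    return parameters * bits_per_weight / 8 / 1e9


# Nominal parameter counts taken from the model names (an assumption).
for name, params in [("llama3 (8B)", 8e9), ("llama3:70b (70B)", 70e9)]:
    print(f"{name}: about {estimated_size_gb(params):.1f} GB")
```

This prints about 4.0 GB and 35.0 GB. The actual downloads (4.7 GB and 39 GB) are somewhat larger, most likely because quantized files also store scaling data and keep some tensors at higher precision, so treat the estimate as a lower bound.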
Running the Model
Using ollama run
To run the Llama 3 model, follow these steps:
- Run the Model: In your terminal, type:
ollama run llama3
- Ask Questions: You can now interact with the model by typing your questions. For example:
Who wrote the book Godfather?
The model will respond with detailed information.
- Specific Responses: To get concise answers, specify your request:
Who wrote the book Godfather? Answer with only the name.
Using curl
You can also interact with the Llama model using the curl command:
- Run with curl: In your terminal, enter:
curl http://localhost:11434/api/chat -d '{ "model": "llama3", "messages": [ { "role": "user", "content": "Who wrote the book Godfather?" } ], "stream": false }'
- View Response: The server returns a JSON object, and the model’s answer is in its message.content field.
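The request above sets "stream" to false, so the whole answer arrives in one JSON object. With streaming enabled, the server instead sends a sequence of JSON lines, each carrying a fragment of the reply. The sketch below shows one way to reassemble them in Python using only the standard library. The function names collect_stream and stream_chat are made up for this illustration, and the chunk layout (a message.content fragment plus a done flag) is an assumption based on the response format used elsewhere in this guide.

```python
import json
import urllib.request

URL = "http://localhost:11434/api/chat"  # same endpoint as the curl example


def collect_stream(lines):
    """Join the content fragments from an iterable of JSON lines."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # assumed: the final chunk is marked done
    return "".join(parts)


def stream_chat(prompt, model="llama3"):
    """Send one prompt with streaming enabled and return the full reply."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    request = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),  # a body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The response object yields one line of bytes per iteration.
        return collect_stream(response)
```

With Ollama running, print(stream_chat("Who wrote the book Godfather?")) should print the assembled answer. To show text as it arrives, print each fragment inside the loop instead of joining them at the end.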
Using a Python Script
To run the Llama model using a Python script, follow these steps:
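- Install Python and requests: Visit the Python website to download and install Python for macOS, then install the requests library, which the script below depends on, by running pip install requests in your terminal.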
- Create a Script: Open your code editor and create a new Python file. Add the following code:
import requests

# Ollama's local chat endpoint (the same one used in the curl example)
url = "http://localhost:11434/api/chat"

def llama3(prompt):
    # Single-turn chat request; "stream": False returns one JSON object
    data = {
        "model": "llama3",
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "stream": False,
    }
    headers = {"Content-Type": "application/json"}
    response = requests.post(url, headers=headers, json=data)
    # The model's reply is in the message.content field
    return response.json()["message"]["content"]

response = llama3("Who wrote the book Godfather?")
print(response)
- Run the Script: In your terminal, navigate to the script’s directory and run the following (use python3 if the python command is not available on your Mac):
python <name_of_script>.py
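The script sends a single question, so the model has no memory of earlier turns. One way to hold a conversation is to resend the full message history with every request. The sketch below illustrates that idea. The Conversation class and its send parameter are hypothetical names invented for this example, and the 120-second timeout is an arbitrary choice.

```python
URL = "http://localhost:11434/api/chat"


class Conversation:
    """Keeps the message history and resends it on every turn."""

    def __init__(self, model="llama3", send=None):
        self.model = model
        self.messages = []
        # `send` lets you swap in a stand-in sender (e.g. for testing).
        self._send = send or self._post

    def _post(self, payload):
        # Imported here so the class can be tried with a stand-in sender
        # even when the requests library is not installed.
        import requests

        response = requests.post(URL, json=payload, timeout=120)
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json()

    def ask(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        payload = {
            "model": self.model,
            "messages": list(self.messages),  # copy of the history so far
            "stream": False,
        }
        reply = self._send(payload)["message"]
        # Store the reply so the next request includes it as context.
        self.messages.append(
            {"role": reply.get("role", "assistant"), "content": reply["content"]}
        )
        return reply["content"]
```

With Ollama running, create chat = Conversation(), then call chat.ask("Who wrote the book Godfather?") followed by chat.ask("What else did he write?"). The second question can only be resolved because the first exchange is sent along with it.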
Exploring More Examples and Resources
To further explore Llama models and integrate them into your applications, check out the following resources:
- Llama-Recipes GitHub Repo: Find detailed examples and walkthroughs for running Llama models on various platforms, including installation instructions, dependencies, and use cases.
- Build with Meta Llama Series: Discover more tutorials and videos that showcase the practical applications of Llama models.
I hope this guide helps you get started with Meta Llama on your Mac.