AskHandle Blog
How to Convert JSON to JSONL for OpenAI Fine-Tuning
- JSONL
- OpenAI
- Fine-Tuning

Fine-tuning OpenAI's models lets you customize a model's behavior to better suit your specific use case. One common task when preparing data for fine-tuning is converting JSON data into a format known as JSONL (JSON Lines). OpenAI's fine-tuning API expects training data in this format: each training example is a complete JSON object on its own line, so a file can be read and checked one example at a time.
In this guide, we’ll walk you through the process of converting a JSON dataset into JSONL format using a New York Giants sports team example. This will allow you to create a dataset that can be used to fine-tune a model that provides sports-related information.
What is JSONL?
JSONL stands for JSON Lines, a file format where each line is a separate JSON object. This structure makes it easy to read and process large datasets in a line-by-line fashion, which is perfect for tasks such as model fine-tuning. The OpenAI fine-tuning API expects data in JSONL format, where each line represents a separate interaction between the user and the assistant.
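As a minimal sketch of what "one JSON object per line" means in practice, the following uses only Python's standard json module to write a few records to a JSONL file and read them back one line at a time. The helper names write_jsonl and read_jsonl and the file name sample.jsonl are illustrative assumptions, not part of any OpenAI tooling.

```python
import json


def write_jsonl(records, path):
    """Write each record as one compact JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record stays on one line
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield one parsed object per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    sample = [
        {"team": "New York Giants", "division": "NFC East"},
        {"team": "New York Giants", "league": "NFL"},
    ]
    write_jsonl(sample, "sample.jsonl")  # hypothetical file name
    for obj in read_jsonl("sample.jsonl"):
        print(obj)
```

Because every line is parsed independently, a reader never needs to hold the whole file in memory.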
Example Data Structure for Fine-Tuning
When using OpenAI’s fine-tuning API, the data needs to follow a specific structure. The key elements of the JSONL format are:
- messages: An array of messages that represents the conversation between the system, user, and assistant.
- role: Defines who is sending the message (system, user, or assistant).
- content: The content of the message.
- weight (optional): Set on an assistant message to control whether it is used for training. The allowed values are 0 (skip this message) and 1 (train on it); leaving it out has the same effect as 1.
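The elements listed above can be checked programmatically before any file is written. The following is a rough sketch that covers only the fields described in this guide, not the full API schema; validate_example and ALLOWED_ROLES are hypothetical names chosen for illustration.

```python
# Roles described in this guide; this set is an assumption of the sketch
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example):
    """Return a list of problems found in one training example (empty if none)."""
    messages = example.get("messages") if isinstance(example, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
        if "weight" in msg:
            if role != "assistant":
                problems.append(f"message {i}: 'weight' only applies to assistant messages")
            elif msg["weight"] not in (0, 1):
                problems.append(f"message {i}: 'weight' must be 0 or 1")

    # An example without any assistant message gives the model nothing to learn
    if not any(isinstance(m, dict) and m.get("role") == "assistant" for m in messages):
        problems.append("no assistant message to learn from")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to report everything wrong with an example in a single pass.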
Here’s a typical example of the format:
    {
      "messages": [
        {"role": "system", "content": "You are a knowledgeable sports assistant who answers questions about teams, players, and sports events."},
        {"role": "user", "content": "Tell me about the New York Giants."},
        {"role": "assistant", "content": "The New York Giants are a professional football team based in East Rutherford, New Jersey. They are part of the NFC East division in the NFL."}
      ]
    }
Example: Creating a Dataset for the New York Giants
Let’s say you want to create a dataset where users can ask questions about the New York Giants, and the assistant will provide informative answers. Below is an example of the JSON structure that represents interactions between a user and the assistant:
    {
      "input": {
        "messages": [
          {
            "role": "user",
            "content": "What year did the New York Giants win the Super Bowl?"
          }
        ],
        "tools": [],
        "parallel_tool_calls": true
      },
      "preferred_output": [
        {
          "role": "assistant",
          "content": "The New York Giants won the Super Bowl four times: 1986, 1990, 2007, and 2011."
        }
      ],
      "non_preferred_output": [
        {
          "role": "assistant",
          "content": "The Giants have won several Super Bowls."
        }
      ]
    }
In this case, the user asks about the Super Bowl victories of the New York Giants, and the assistant provides two responses: a more detailed preferred output, and a shorter non-preferred output.
Converting JSON to JSONL
To fine-tune OpenAI’s models, we need to convert this JSON data into JSONL format. The key is ensuring that each line contains a complete conversation with the necessary system, user, and assistant roles, structured appropriately.
Steps to Convert JSON to JSONL
- Identify the Components: The input JSON data contains an array of messages and separate preferred_output and non_preferred_output fields. These need to be combined into a single conversation.
- Format Each Entry: Each line in the JSONL file must represent a full conversation, including the system, user, and assistant messages.
Here’s what the converted JSONL file will look like:
    {"messages": [{"role": "system", "content": "You are a knowledgeable sports assistant who answers questions about teams, players, and sports events."}, {"role": "user", "content": "What year did the New York Giants win the Super Bowl?"}, {"role": "assistant", "content": "The New York Giants won the Super Bowl four times: 1986, 1990, 2007, and 2011.", "weight": 1}]}
    {"messages": [{"role": "system", "content": "You are a knowledgeable sports assistant who answers questions about teams, players, and sports events."}, {"role": "user", "content": "What year did the New York Giants win the Super Bowl?"}, {"role": "assistant", "content": "The Giants have won several Super Bowls."}]}
Key Points:
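- Each line contains a single conversation with a system, user, and assistant message.
- The weight attribute is set to 1 on the preferred_output response. It accepts only 0 or 1, and an assistant message without it is still trained on, so it marks a message as included in training rather than grading its quality.
- The non_preferred_output is included as an alternative, shorter response from the assistant. Because it does not carry a weight of 0, the model is trained on it as well; leave that line out if only the preferred answers should be learned.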
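Supervised fine-tuning trains on every assistant message unless its weight is 0, so writing both responses as separate lines teaches the model both answers. If the intent is to learn only from the preferred answer, one option is to build a single entry per record and skip the non-preferred output. The sketch below assumes the input record shape shown earlier; preferred_only_entry and SYSTEM_PROMPT are illustrative names, not part of the original workflow.

```python
import json

SYSTEM_PROMPT = (
    "You are a knowledgeable sports assistant who answers questions "
    "about teams, players, and sports events."
)


def preferred_only_entry(record, system_prompt=SYSTEM_PROMPT):
    """Build one training example from the input messages and the preferred output."""
    messages = [{"role": "system", "content": system_prompt}]
    # Keep the conversation turns from the input as they are
    messages.extend(
        {"role": m["role"], "content": m["content"]}
        for m in record["input"]["messages"]
    )
    # Append the preferred assistant reply; the non-preferred one is skipped
    messages.extend(
        {"role": m["role"], "content": m["content"]}
        for m in record["preferred_output"]
    )
    return {"messages": messages}


if __name__ == "__main__":
    record = {
        "input": {"messages": [
            {"role": "user", "content": "What year did the New York Giants win the Super Bowl?"}
        ]},
        "preferred_output": [
            {"role": "assistant", "content": "The New York Giants won the Super Bowl four times: 1986, 1990, 2007, and 2011."}
        ],
        "non_preferred_output": [
            {"role": "assistant", "content": "The Giants have won several Super Bowls."}
        ],
    }
    print(json.dumps(preferred_only_entry(record)))
```

Whether to keep or drop the non-preferred responses depends on what the fine-tuned model is supposed to imitate; this variant simply makes that choice explicit.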
Automating the Conversion with Python
If you have a larger dataset, manually converting it to JSONL can be time-consuming. You can automate the process with a Python script. Below is a Python script that reads the input JSON file and converts it into JSONL format:
Python Script for Conversion
    import json

    SYSTEM_PROMPT = "You are a knowledgeable sports assistant who answers questions about teams, players, and sports events."

    def convert_json_to_jsonl(input_file, output_file):
        # Read the JSON data
        with open(input_file, 'r', encoding='utf-8') as infile:
            data = json.load(infile)

        # Accept either a single record (as in the example above) or a list of records
        records = data if isinstance(data, list) else [data]

        # Open output file for writing JSONL
        with open(output_file, 'w', encoding='utf-8') as outfile:
            for record in records:
                # The user question is the first message of the input
                user_content = record["input"]["messages"][0]["content"]

                # Pair each preferred output with its non-preferred counterpart
                for preferred, non_preferred in zip(record["preferred_output"], record["non_preferred_output"]):
                    # Write one line for the preferred response (weight 1)
                    # and one line for the non-preferred response as an alternative
                    for assistant_message in (
                        {"role": "assistant", "content": preferred["content"], "weight": 1},
                        {"role": "assistant", "content": non_preferred["content"]},
                    ):
                        jsonl_entry = {
                            "messages": [
                                {"role": "system", "content": SYSTEM_PROMPT},
                                {"role": "user", "content": user_content},
                                assistant_message,
                            ]
                        }
                        outfile.write(json.dumps(jsonl_entry) + '\n')

    if __name__ == "__main__":
        input_file = 'input.json'  # Path to your input JSON file
        output_file = 'output.jsonl'  # Path to your output JSONL file
        convert_json_to_jsonl(input_file, output_file)
How to Use the Python Script:
- Save the input JSON data in a file named input.json.
- Save the script as convert_json_to_jsonl.py.
- Run the script using Python:
    python convert_json_to_jsonl.py
This script will generate an output.jsonl file, where each line corresponds to a conversation about the New York Giants, complete with the system, user, and assistant messages.
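Before uploading the file, it can help to confirm that every line really is a standalone JSON object with a messages list. The following is a small sanity check under that assumption; check_jsonl is a hypothetical helper, and it does not replace whatever validation OpenAI performs on upload.

```python
import json
import os


def check_jsonl(path):
    """Return (number_of_valid_examples, list_of_error_strings) for a JSONL file."""
    valid, errors = 0, []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                errors.append(f"line {line_no}: empty line")
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as exc:
                # Typical cause: a pretty-printed object spread over several lines
                errors.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(obj, dict) or not isinstance(obj.get("messages"), list):
                errors.append(f"line {line_no}: missing 'messages' list")
                continue
            valid += 1
    return valid, errors


if __name__ == "__main__":
    # Only runs the check if the converter has already produced its output
    if os.path.exists("output.jsonl"):
        count, problems = check_jsonl("output.jsonl")
        print(f"{count} valid examples, {len(problems)} problems")
        for problem in problems:
            print(problem)
```

A pretty-printed JSON file fails this check on its first line, which is the most common mistake when a dataset is converted by hand.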