2024-09-09 web, development, javascript
Script to Process Phrases with OpenAI
By O. Wolfson
Introduction
This tutorial guides you through creating a Python script that processes Italian phrases from a JSON file, obtains explanations using OpenAI's GPT-3.5 Turbo model (gpt-3.5-turbo), and saves these explanations in another JSON file. A key aspect of this project is the use of a virtual environment for better dependency management and project isolation.
Prerequisites
- Basic understanding of Python.
- An OpenAI API key.
- A Python 3 installation. The venv module used in Step 1 is included with most Python 3 installations; virtualenv also works if you prefer it.
- A JSON file with Italian phrases.
Step 1: Setting Up a Virtual Environment
Before starting, it’s crucial to set up a virtual environment. This keeps your project dependencies separate from your global Python installation.
- Create a Virtual Environment:
bash
python -m venv openai-env
This command creates a new virtual environment named openai-env.
- Activate the Virtual Environment:
  - On Windows:
    bash
    openai-env\Scripts\activate
  - On macOS and Linux:
    bash
    source openai-env/bin/activate
- Install Required Packages: With the environment activated, install the openai and tqdm packages. The script calls openai.ChatCompletion.create, which was removed in version 1.0 of the openai package, so pin an earlier release:
bash
pip install "openai<1.0" tqdm
Step 2: Importing Libraries
In your Python script, import the necessary libraries:
python
import openai
import json
from tqdm import tqdm
import time
Step 3: OpenAI API Key Configuration
Set your OpenAI API key:
python
openai.api_key = "your-api-key"
You can get an API key from OpenAI. Sign up for an account and create a new API key. This may require you to enter your credit card information and pay a small fee, depending on the account type and your usage.
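Hard-coding the key in the script makes it easy to leak through version control. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable named OPENAI_API_KEY. The helper name load_api_key is hypothetical and is not part of the openai package.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable.

    load_api_key is a hypothetical helper for this tutorial, not an
    openai function. env defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage in the script (assumes the variable is set in your shell):
# openai.api_key = load_api_key()
```

Failing early with a clear message is a design choice here: it avoids sending requests with an empty key and getting a less obvious authentication error later.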
Step 4: Reading Input Data
Load your JSON file containing the Italian phrases:
python
with open("italian-language-phrases.json", "r") as file:
    phrases = json.load(file)
See the JSON data file used in this example.
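The script later reads item["id"] and item["phrase"] from every entry, so the input file needs to be a JSON array of objects with those two keys. The sketch below shows that shape and a small check for it. The sample phrases are invented for illustration and are not taken from the original data file, and validate_phrases is a hypothetical helper.

```python
import json

# Invented sample in the shape the script expects: a list of objects,
# each with an "id" and a "phrase" key.
SAMPLE_JSON = """
[
    {"id": 1, "phrase": "Buongiorno, come stai?"},
    {"id": 2, "phrase": "Quanto costa questo?"}
]
"""


def validate_phrases(phrases):
    """Raise ValueError if any entry lacks the keys the script relies on."""
    if not isinstance(phrases, list):
        raise ValueError("Expected a JSON array of phrase objects.")
    for index, item in enumerate(phrases):
        if not isinstance(item, dict):
            raise ValueError(f"Entry {index} is not a JSON object.")
        missing = {"id", "phrase"} - set(item)
        if missing:
            raise ValueError(f"Entry {index} is missing keys: {sorted(missing)}")
    return phrases


phrases = validate_phrases(json.loads(SAMPLE_JSON))
print(len(phrases))
```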
Step 5: Preparing for Phrase Processing
Check for an existing explanations file. If not found, create an empty list:
python
try:
    with open("italian-phrase-explanations.json", "r") as file:
        explanations = json.load(file)
except FileNotFoundError:
    explanations = []
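Once the existing explanations are loaded, it can help to work out which phrases still need processing. The sketch below is one possible way to do that, assuming every id is hashable (for example an integer or a string). The function name pending_phrases and the sample data are hypothetical.

```python
def pending_phrases(phrases, explanations):
    """Return the phrases whose id has no saved explanation yet.

    Building a set of processed ids once avoids rescanning the whole
    explanations list for every phrase.
    """
    processed_ids = {exp["id"] for exp in explanations}
    return [item for item in phrases if item["id"] not in processed_ids]


# Invented example data, for illustration only.
phrases = [{"id": 1, "phrase": "Ciao"}, {"id": 2, "phrase": "Grazie mille"}]
explanations = [{"id": 1, "explanation": "..."}]
print(pending_phrases(phrases, explanations))
```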
Step 6: Progress Tracking and Time Estimation
Utilize the tqdm library for a progress bar:
python
total_phrases = len(phrases)
estimated_time = total_phrases * 30  # Assuming 30 seconds per phrase
print(f"Starting to process {total_phrases} phrases. Estimated time: {estimated_time//60} minutes {estimated_time%60} seconds.\n")
for item in tqdm(phrases, desc="Processing phrases", unit="phrase"):
    ...  # processing logic goes here
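The estimate is simple arithmetic: multiply the phrase count by an assumed number of seconds per phrase, then split the total into minutes and seconds. A small sketch of that calculation follows. The function name format_estimate is hypothetical, and the 30-second default only mirrors the assumption above, since real response times vary.

```python
def format_estimate(phrase_count, seconds_per_phrase=30):
    """Format a rough duration as 'M minutes S seconds'.

    The 30-second default mirrors the assumption used in this tutorial;
    real response times vary.
    """
    minutes, seconds = divmod(phrase_count * seconds_per_phrase, 60)
    return f"{minutes} minutes {seconds} seconds"


print(format_estimate(5))
```

Passing the number of phrases that still need processing, rather than the total, would give a tighter estimate when the script resumes a partly finished run.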
Step 7: Processing Phrases and Storing Explanations
Inside the loop, use OpenAI to get explanations for each new phrase:
python
for item in tqdm(phrases, desc="Processing phrases", unit="phrase"):
    # Skip already processed phrases
    if any(exp["id"] == item["id"] for exp in explanations):
        continue
    phrase = item["phrase"]
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Explain the Italian phrase: {phrase}"}],
    )
    explanation = {"id": item["id"], "explanation": completion.choices[0].message.content}
    explanations.append(explanation)
    with open("italian-phrase-explanations.json", "w") as file:
        json.dump(explanations, file, indent=4)
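Network calls can fail temporarily, and a single failure would stop the loop above. The sketch below shows one way to retry a call with exponential backoff. call_with_retries is a hypothetical helper, not part of the openai package. It catches every exception for simplicity, and a real script would probably catch narrower error types.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry on any exception with exponential backoff.

    Hypothetical helper: waits base_delay, then 2 * base_delay, and so on
    between attempts, and re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Usage inside the loop (assumes the pre-1.0 openai interface used above):
# completion = call_with_retries(
#     lambda: openai.ChatCompletion.create(
#         model="gpt-3.5-turbo",
#         messages=[{"role": "user", "content": f"Explain the Italian phrase: {phrase}"}],
#     )
# )
```

The sleep parameter exists only so the waiting can be swapped out in tests. In normal use the default time.sleep applies.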
Step 8: Finalizing the Script
Once all phrases are processed, output a completion message:
python
print("COMPLETE! Explanations written to italian-phrase-explanations.json")
Step 9: Running the Script
Run the script from your terminal:
bash
# On Windows:
python process-phrases.py
# On macOS and Linux:
python3 process-phrases.py
output:
Check out the phrase explanations here. Note that each object contains Markdown that can be rendered as HTML.
Complete Python Script Code
python
import openai
import json
from tqdm import tqdm
import time
# OpenAI API key (use your own key and never publish a real one)
openai.api_key = "your-api-key"
# Read the input JSON containing Italian phrases
with open("italian-language-phrases.json", "r") as file:
    phrases = json.load(file)
# Check if explanations file already exists, if not, initialize an empty list
try:
    with open("italian-phrase-explanations.json", "r") as file:
        explanations = json.load(file)
except FileNotFoundError:
    explanations = []
# Estimate Time Remaining
total_phrases = len(phrases)
estimated_time = total_phrases * 30  # Assuming 30 seconds per phrase
print(
    f"Starting to process {total_phrases} phrases. Estimated time: {estimated_time//60} minutes {estimated_time%60} seconds.\n"
)
# Progress bar using tqdm; enumerate supplies the 1-based position of each phrase
for i, item in enumerate(tqdm(phrases, desc="Processing phrases", unit="phrase"), start=1):
    # If the item's explanation already exists, skip it
    if any(exp["id"] == item["id"] for exp in explanations):
        continue
    # Printing progress updates
    print(f"\nProcessing phrase {i} of {total_phrases}...")
    phrase = item["phrase"]
    # Query OpenAI for an explanation
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"Give me a brief linguistic / grammatical breakdown of the following Italian phrase (formatted in markdown): {phrase}. Please do not address the question or questioner in your response. Just deliver the explanation itself, as the entire text response will be used directly in some documentation. Please format each explanation so that all Italian words are in **bold** and the corresponding English words are in **bold** as well. The explanation should be in markdown format.",
            }
        ],
    )
    # Append the explanation to the explanations list
    explanation = {
        "id": item["id"],
        "explanation": completion.choices[0].message.content,
    }
    explanations.append(explanation)
    # Write the current explanation to the JSON file immediately
    with open("italian-phrase-explanations.json", "w") as file:
        json.dump(explanations, file, indent=4)
    print(f"Processed and saved explanation for phrase {i}.\n")
print("COMPLETE! Explanations written to italian-phrase-explanations.json")
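The script rewrites the whole explanations file after every phrase, so an interruption in the middle of a write could leave a truncated file behind. As a sketch of one way to reduce that risk, the helper below writes to a temporary file in the same directory and then renames it over the target. save_json_atomic is a hypothetical name, and this is an optional variation rather than part of the original script.

```python
import json
import os
import tempfile


def save_json_atomic(data, path):
    """Write data as JSON to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            json.dump(data, tmp_file, indent=4)
        # Replace the target in a single step (atomic on POSIX systems).
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage inside the loop, in place of the open/json.dump pair:
# save_json_atomic(explanations, "italian-phrase-explanations.json")
```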
Conclusion
By using a virtual environment, you keep the script's dependencies isolated from your global Python installation, which helps avoid version conflicts while you process Italian phrases with OpenAI’s API.
Additional Tips
- Deactivate your virtual environment when you're finished by typing deactivate in your terminal.
- Consider maintaining a requirements.txt file for easy setup of the environment on different machines.
- Regularly update your dependencies to catch up with the latest versions and security patches.