2023-11-23

Script to Process Phrases with OpenAI

    Introduction

    This tutorial guides you through creating a Python script that processes Italian phrases from a JSON file, obtains explanations using OpenAI's gpt-3.5-turbo model, and saves those explanations to another JSON file. A key aspect of this project is the use of a virtual environment for better dependency management and project isolation.

    Prerequisites

    • Basic understanding of Python.
    • An OpenAI API key.
    • Python environment with virtualenv installed.
    • A JSON file with Italian phrases.

    Step 1: Setting Up a Virtual Environment

    Before starting, it’s crucial to set up a virtual environment. This keeps your project dependencies separate from your global Python installation.

    1. Create a Virtual Environment:

      python -m venv openai-env
      

      This command creates a new virtual environment named openai-env.

    2. Activate the Virtual Environment:

      • On Windows:
        openai-env\Scripts\activate
        
      • On macOS and Linux:
        source openai-env/bin/activate
        
    3. Install Required Packages: With the environment activated, install the openai and tqdm packages (the script uses both):

      pip install openai tqdm
      

    Step 2: Importing Libraries

    In your Python script, import the necessary libraries:

    import openai
    import json
    from tqdm import tqdm
    import time
    

    Step 3: OpenAI API Key Configuration

    Set your OpenAI API key:

    openai.api_key = "your-api-key"
    

    You can get an API key from OpenAI: sign up for an account and create a new API key. Depending on your account type and usage, this may require entering credit card information and paying a small fee.
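
    Hardcoding the key is fine for a quick local test, but it is safer to keep it out of your source code. Here is a minimal sketch that reads the key from an environment variable instead (this assumes you have exported a variable named OPENAI_API_KEY in your shell before running the script):

    import os

    # Read the API key from the environment instead of hardcoding it.
    # Assumes OPENAI_API_KEY has been exported in the shell beforehand.
    openai.api_key = os.environ.get("OPENAI_API_KEY")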

    Step 4: Reading Input Data

    Load your JSON file containing the Italian phrases:

    with open("italian-language-phrases.json", "r") as file:
        phrases = json.load(file)
    

    See the JSON data file used in this example.
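
    The script only relies on two fields per entry: an id and the phrase text. A minimal input file could look like the sketch below (the field names are inferred from what the script reads; your actual file may contain additional fields, and the phrases shown here are only illustrative):

    [
        {"id": 1, "phrase": "In bocca al lupo"},
        {"id": 2, "phrase": "Non vedo l'ora"}
    ]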

    Step 5: Preparing for Phrase Processing

    Check for an existing explanations file. If not found, create an empty list:

    try:
        with open("italian-phrase-explanations.json", "r") as file:
            explanations = json.load(file)
    except FileNotFoundError:
        explanations = []
    
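    If a previous run was interrupted while writing, the explanations file could be left truncated and fail to parse. A slightly more defensive variant (an optional hardening, not used in the complete script below) also catches that case and starts over with an empty list:

    try:
        with open("italian-phrase-explanations.json", "r") as file:
            explanations = json.load(file)
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or unreadable file: start fresh with an empty list
        explanations = []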

    Step 6: Progress Tracking and Time Estimation

    Utilize the tqdm library for a progress bar:

    total_phrases = len(phrases)
    estimated_time = total_phrases * 30  # Assuming 30 seconds per phrase
    
    print(f"Starting to process {total_phrases} phrases. Estimated time: {estimated_time//60} minutes {estimated_time%60} seconds.\n")
    
    for item in tqdm(phrases, desc="Processing phrases", unit="phrase"):
        # ...processing logic here...
    

    Step 7: Processing Phrases and Storing Explanations

    Inside the loop, use OpenAI to get explanations for each new phrase:

    for item in tqdm(phrases, desc="Processing phrases", unit="phrase"):
        # Skip already processed phrases
        if any(exp["id"] == item["id"] for exp in explanations):
            continue
    
        phrase = item["phrase"]
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"Explain the Italian phrase: {phrase}"}],
        )
    
        explanation = {"id": item["id"], "explanation": completion.choices[0].message.content}
        explanations.append(explanation)
    
        with open("italian-phrase-explanations.json", "w") as file:
            json.dump(explanations, file, indent=4)
    
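    API calls can occasionally fail with rate-limit or other transient errors. Below is a hedged sketch of how you might wrap the request in a simple retry loop, reusing the openai and time modules already imported at the top of the script (the exception names come from the pre-1.0 openai SDK's openai.error module; adjust them if your SDK version differs):

    def get_explanation(phrase, retries=3):
        """Request an explanation, retrying on rate-limit or transient API errors."""
        for attempt in range(retries):
            try:
                completion = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[{"role": "user", "content": f"Explain the Italian phrase: {phrase}"}],
                )
                return completion.choices[0].message.content
            except (openai.error.RateLimitError, openai.error.APIError):
                # Back off a little longer after each failed attempt
                time.sleep(5 * (attempt + 1))
        raise RuntimeError(f"Could not get an explanation for: {phrase}")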

    Step 8: Finalizing the Script

    Once all phrases are processed, output a completion message:

    print("COMPLETE! Explanations written to italian-phrase-explanations.json")
    

    Step 9: Running the Script

    Run the script from your terminal:

    # On Windows:
    python process-phrases.py
    
    # On macOS and Linux:
    python3 process-phrases.py
    

    Output:

    Check out the phrase explanations here. Note that each object contains markdown that can be rendered as HTML.
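
    Each entry in the output file pairs a phrase's id with the model's markdown explanation. An illustrative (not verbatim) entry looks roughly like this:

    {
        "id": 1,
        "explanation": "**In bocca al lupo** literally means *into the mouth of the wolf* and is used like the English **good luck**."
    }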

    Complete Python Script Code

    import openai
    import json
    from tqdm import tqdm
    import time
    
    # OpenAI API key
    openai.api_key = "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"
    
    # Read the input JSON containing Italian phrases
    with open("italian-language-phrases.json", "r") as file:
        phrases = json.load(file)
    
    # Check if explanations file already exists, if not, initialize an empty list
    try:
        with open("italian-phrase-explanations.json", "r") as file:
            explanations = json.load(file)
    except FileNotFoundError:
        explanations = []
    
    # Estimate Time Remaining
    total_phrases = len(phrases)
    estimated_time = total_phrases * 30  # Assuming 30 seconds per phrase
    print(
        f"Starting to process {total_phrases} phrases. Estimated time: {estimated_time//60} minutes {estimated_time%60} seconds.\n"
    )
    
    # Progress bar using tqdm
    for i, item in enumerate(tqdm(phrases, desc="Processing phrases", unit="phrase"), start=1):
        # If the item's explanation already exists, skip it
        if any(exp["id"] == item["id"] for exp in explanations):
            continue
    
        # Print a progress update
        print(f"\nProcessing phrase {i} of {total_phrases}...")
    
        phrase = item["phrase"]
    
        # Query OpenAI for an explanation
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {
                    "role": "user",
                    "content": f"Give me a brief linguistic / grammatical breakdown of the following Italian phrase (formatted in markdown): {phrase}. Please do not address the question or questioner in your response. Just deliver the explanation itself, as the entire text response will be used directly in some documentation. please format each explanation so that all Italian words are in **bold** and the corresponding English are *bold** words as well. The explanation should be in markdown format.",
                }
            ],
        )
    
        # Append the explanation to the explanations list
        explanation = {
            "id": item["id"],
            "explanation": completion.choices[0].message.content,
        }
        explanations.append(explanation)
    
        # Write the current explanation to the JSON file immediately
        with open("italian-phrase-explanations.json", "w") as file:
            json.dump(explanations, file, indent=4)
    
        print(f"Processed and saved explanation for phrase {i}.\n")
    
    print(f"COMPLETE! Explanations written to italian-phrase-explanations.json")
    

    Conclusion

    Running this script inside a virtual environment gives you a reliable, isolated way to process Italian phrases with OpenAI's API, and it keeps your development environment clean and conflict-free.

    Additional Tips

    • Deactivate your virtual environment when you're finished by typing deactivate in your terminal.
    • Consider maintaining a requirements.txt file for easy setup of the environment on different machines (see the commands below).
    • Regularly update your dependencies to pick up the latest versions and security patches.
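
    For example, you can capture the active environment's packages and recreate them on another machine with the standard pip commands:

    # Record the packages installed in the active virtual environment
    pip freeze > requirements.txt

    # Recreate the environment elsewhere (after creating and activating a new venv)
    pip install -r requirements.txt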

    Setting up a virtual environment for your Python project keeps the development process organized and efficient, especially when integrating powerful tools like OpenAI's API.


    Thanks for reading. If you enjoyed this post, I invite you to explore more of my site. I write about web development, programming, and other fun stuff.