Remember the days when we all spent hours analyzing keywords in Excel or OpenRefine? Scrolling through thousands of rows, categorizing them by hand, judging relevance... Thankfully, those days are behind us. Artificial intelligence can now simplify and streamline this entire process remarkably.

Below I'm sharing a practical script to help you get started with AI-assisted keyword analysis.

Just copy it into a Jupyter Notebook and start experimenting. If you're not an experienced programmer, don't despair: whenever you get stuck, ask ChatGPT or another AI assistant to help you adapt the script to your exact needs.

What makes this approach so powerful?

The main advantage is the ability to process huge amounts of data at once. You can take a complete keyword export from Ahrefs or another SEO tool (easily tens of thousands of entries) and let the AI classify them automatically. You then work only with the filtered output, saving yourself hours of manual work.
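
To give you an idea of what working with that filtered output looks like: once the script below has finished, a couple of lines of pandas are enough to narrow the merged result down (a minimal sketch using the column names the script writes):

import pandas as pd

# Load the merged output produced by the script below
out = pd.read_csv('keywords_with_classifications.csv')

# Keep only relevant product searches with clear shopping intent
shortlist = out[(out['Relevant'] == 'Yes') & (out['Intent Specificity'] == 'Yes')]
print(shortlist.groupby('Product Category').size())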

AI can often uncover relationships between keywords that you would easily miss in a manual analysis. That helps you better understand user intent and structure your content more effectively.

How to get started with the script?

  1. Install Jupyter Notebook - the easiest way is to download the Anaconda distribution, which bundles Jupyter along with all the necessary libraries
  2. Create a new notebook and copy the code below into it
  3. Get an OpenAI API key:
    • Sign up on the OpenAI Platform
    • Generate a new key in the API keys section
    • Save the key in a .env file in the same directory as your notebook (format: OPENAI_API_KEY=your_key_here). Alternatively, you can enter the key directly when the script prompts you for it
  4. Prepare your data - save your keyword file as keywords.csv in the same directory and make sure it contains a column named "Keyword" (a quick sanity check follows this list)
  5. Adjust the prompt - in the script's configuration section, tailor the product categories and the description of your domain. If you're not sure how, let ChatGPT tweak the prompt for you. You can work absolute magic with it.
  6. Run the script and watch the AI classify your keywords automatically
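
Before launching the whole thing, you can sanity-check steps 3 and 4 with a few lines (a minimal sketch assuming the file names from the steps above):

import os
import pandas as pd
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current directory
assert os.environ.get("OPENAI_API_KEY"), "API key not found - check your .env file"

df = pd.read_csv("keywords.csv")
assert "Keyword" in df.columns, f"Missing 'Keyword' column, found: {list(df.columns)}"
print(f"Setup OK - {df['Keyword'].nunique()} unique keywords ready")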

In no time, you'll have the groundwork for a professional keyword analysis that would otherwise cost you hours of manual work. All it takes is a few clicks and a bit of experimentation!

And now, the promised code:

Step 1: Install the packages

# Run this cell first to install required packages
!pip install python-dotenv "pydantic>=2" tqdm pandas "openai>=1"  # the script needs the v1 OpenAI client and Pydantic v2

Step 2: Configuration

This is where you adjust the prompt to your needs. Your imagination is the only limit; an example of adapting the categories to a different domain follows the configuration code.

"""
SEO Keyword Classification Tool

This script uses OpenAI's API to automatically classify keywords for SEO analysis.
It processes keywords from a CSV file and classifies them according to various parameters.

Requirements:
- Python 3.8+ (required by typing.Literal and Pydantic v2)
- OpenAI API key saved in a .env file
- Input CSV file with a column named 'Keyword'
"""

#------------------------ CONFIGURATION - MODIFY AS NEEDED ------------------------#
# Input and output file settings
INPUT_FILE = 'keywords.csv'                 # CSV file containing keywords to analyze
KEYWORD_COLUMN = 'Keyword'                  # Name of column containing keywords

# OpenAI API settings
MODEL = "gpt-4.1"                           # OpenAI model to use
TEMPERATURE = 0.3                           # Temperature setting (0.0-1.0)
API_KEY_ENV_VAR = "OPENAI_API_KEY"          # Name of environment variable for API key

# Processing settings 
MAX_WORKERS = 10                            # Number of parallel requests (adjust based on API limits)
BATCH_SIZE = 100                            # Process keywords in batches of this size
SAVE_INTERVAL = 50                          # Save intermediate results every N keywords
RETRY_COUNT = 3                             # Number of retries for failed requests
INITIAL_RETRY_DELAY = 0.5                   # Initial delay before retrying (seconds)

# Category settings - CUSTOMIZE FOR YOUR DOMAIN
PRODUCT_CATEGORIES = [
    "Smartphones",                          # Category 1
    "Laptops",                              # Category 2
    "TVs",                                  # Category 3
    "Audio Devices",                        # Category 4
    "Cameras",                              # Category 5
    "Gaming Devices",                       # Category 6
    "Smart Home",                           # Category 7
    "Accessories",                          # Category 8
    "Other",                                # Other relevant products
    "N/A"                                   # Not applicable
]

# Custom classification prompt - CUSTOMIZE FOR YOUR DOMAIN
SYSTEM_PROMPT = """
Act as an SEO specialist and help classify keywords based on:
1. Search type (Product search, Informational search, Other)
2. Relevance to our website (Yes, No)
3. Product category ({categories})
4. Intent specificity (Yes, No) - whether the keyword specifically implies shopping for electronics

We sell these categories of electronics products: {categories_list}.

For the "Intent specificity" field:
- Mark "Yes" if the keyword explicitly mentions buying, shopping, prices, or clearly implies purchasing electronics
- Mark "No" if the keyword is generic or doesn't specifically imply shopping intent

Examples:
- "best smartphones 2025" would be "Yes" (specifically about shopping for smartphones)
- "phone" would be "No" (too generic, doesn't imply shopping intent)
- "laptop deals" would be "Yes" (specifically about purchasing laptops)
- "how to connect bluetooth device" would be "No" (doesn't imply shopping)
"""

Step 3: API client setup and imports

#------------------------ IMPORTS ------------------------#
import pandas as pd
import os
import time
import json
from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, Literal
from tqdm.notebook import tqdm  # For Jupyter Notebook progress bars
import concurrent.futures  # For parallel processing

#------------------------ SETUP ------------------------#
# Load environment variables from .env file
load_dotenv()

# Check for API key and provide helpful error
api_key = os.environ.get(API_KEY_ENV_VAR)
if not api_key:
    # Alternative for Jupyter - direct input if .env fails
    api_key = input("OpenAI API key not found in environment. Please enter your API key: ")
    os.environ[API_KEY_ENV_VAR] = api_key

# Initialize the OpenAI client with API key from environment variable
client = OpenAI(api_key=api_key)

# Format the system prompt with categories
categories_str = ", ".join(f'"{cat}"' for cat in PRODUCT_CATEGORIES)
FORMATTED_SYSTEM_PROMPT = SYSTEM_PROMPT.format(
    categories=categories_str,
    categories_list=", ".join(cat for cat in PRODUCT_CATEGORIES if cat != "N/A" and cat != "Other")
)

#------------------------ DATA MODEL ------------------------#
# Define Pydantic model for structured output
class KeywordClassification(BaseModel):
    keyword: str = Field(..., description="The keyword being analyzed")
    search_type: Literal["Product search", "Informational search", "Other"] = Field(
        ..., description="Type of search intent"
    )
    relevance: Literal["Yes", "No"] = Field(
        ..., description="Whether the keyword is relevant to the website"
    )
    product_category: Optional[Literal[tuple(PRODUCT_CATEGORIES)]] = Field(
        "N/A", description="Which product category the keyword relates to (if relevant)"
    )
    intent_specificity: Literal["Yes", "No"] = Field(
        ..., description="Whether the keyword specifically implies shopping for electronics"
    )

Step 4: Data processing functions

#------------------------ FUNCTIONS ------------------------#
# Function to classify keywords using OpenAI with Function Calling
def classify_keyword(keyword, model=MODEL, retry_count=RETRY_COUNT, retry_delay=INITIAL_RETRY_DELAY):
    for attempt in range(retry_count):
        try:
            # Define the function that specifies the response structure
            tools = [
                {
                    "type": "function",
                    "function": {
                        "name": "classify_keyword",
                        "description": "Classify a keyword for SEO purposes",
                        "parameters": KeywordClassification.model_json_schema()
                    }
                }
            ]

            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": FORMATTED_SYSTEM_PROMPT}, 
                    {"role": "user", "content": f"Classify this keyword: '{keyword}'"}
                ],
                tools=tools,
                tool_choice={"type": "function", "function": {"name": "classify_keyword"}},
                max_tokens=1000,
                temperature=TEMPERATURE
            )

            # Extract the function call arguments
            tool_call = response.choices[0].message.tool_calls[0]
            result_json = tool_call.function.arguments

            # Parse into Pydantic model
            classification = KeywordClassification.model_validate_json(result_json)
            return classification

        except Exception as e:
            if attempt < retry_count - 1:
                print(f"Error processing keyword '{keyword}', retrying... ({e})")
                time.sleep(retry_delay)
                retry_delay *= 2  # Exponential backoff
            else:
                print(f"Failed to process keyword '{keyword}' after {retry_count} attempts: {e}")
                return None

# Helper function to save intermediate results
def save_results(results, filename, timestamp=True):
    if not results:
        return None

    results_df = pd.DataFrame([{
        'Keyword': r.keyword,
        'Search Type': r.search_type,
        'Relevant': r.relevance,
        'Product Category': r.product_category,
        'Intent Specificity': r.intent_specificity
    } for r in results])

    # Add timestamp if requested
    if timestamp:
        time_str = time.strftime("%Y%m%d-%H%M%S")
        file_parts = filename.split('.')
        output_filename = f"{file_parts[0]}_{time_str}.{file_parts[1]}"
    else:
        output_filename = filename

    results_df.to_csv(output_filename, index=False)
    print(f"Results saved to '{output_filename}' with {len(results)} keywords")
    return results_df

# Process keywords with parallel execution
def process_keywords(df, max_workers=MAX_WORKERS, batch_size=BATCH_SIZE, save_interval=SAVE_INTERVAL):
    results = []
    errors = []

    # Get unique keywords from the dataframe
    keywords_to_process = df[KEYWORD_COLUMN].unique().tolist()
    total_keywords = len(keywords_to_process)

    print(f"\nProcessing {total_keywords} unique keywords with up to {max_workers} parallel workers...")

    # Create a checkpoint file to track progress
    checkpoint_file = 'processing_checkpoint.txt'
    processed_keywords = set()

    # Check if checkpoint file exists and load processed keywords
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, 'r') as f:
            processed_keywords = set(line.strip() for line in f)
        print(f"Resuming from checkpoint. {len(processed_keywords)} keywords already processed.")

    # Filter out already processed keywords
    keywords_to_process = [k for k in keywords_to_process if k not in processed_keywords]
    print(f"{len(keywords_to_process)} keywords remaining to process.")

    # Add previously processed keywords from checkpoint
    if processed_keywords and os.path.exists('keyword_classifications_final.csv'):
        # keep_default_na=False so the literal string "N/A" isn't parsed as NaN,
        # which would fail validation against the Literal categories below
        previous_results_df = pd.read_csv('keyword_classifications_final.csv', keep_default_na=False)
        previous_results = [
            KeywordClassification(
                keyword=row['Keyword'],
                search_type=row['Search Type'],
                relevance=row['Relevant'],
                product_category=row['Product Category'],
                intent_specificity=row['Intent Specificity']
            )
            for _, row in previous_results_df.iterrows()
        ]
        results.extend(previous_results)
        print(f"Loaded {len(previous_results)} previously processed keywords.")

    # Process in batches for better progress tracking
    for i in range(0, len(keywords_to_process), batch_size):
        batch = keywords_to_process[i:i+batch_size]
        batch_results = []
        batch_errors = []

        print(f"\nProcessing batch {i//batch_size + 1}/{(len(keywords_to_process) + batch_size - 1)//batch_size}...")

        # Use ThreadPoolExecutor for parallel processing
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Submit all tasks
            future_to_keyword = {executor.submit(classify_keyword, keyword): keyword for keyword in batch}

            # Process results as they complete with progress bar
            for future in tqdm(concurrent.futures.as_completed(future_to_keyword), total=len(batch), desc="Processing"):
                keyword = future_to_keyword[future]
                try:
                    classification = future.result()
                    if classification:
                        batch_results.append(classification)
                        results.append(classification)
                        # Update checkpoint file
                        with open(checkpoint_file, 'a') as f:
                            f.write(f"{keyword}\n")
                    else:
                        batch_errors.append(keyword)
                        errors.append(keyword)
                except Exception as e:
                    print(f"Exception processing keyword '{keyword}': {e}")
                    batch_errors.append(keyword)
                    errors.append(keyword)

                # Save intermediate results at regular intervals
                if len(results) % save_interval == 0:
                    save_results(results, 'keyword_classifications_interim.csv')

        # Save batch results
        if batch_results:
            save_results(batch_results, f'keyword_classifications_batch_{i//batch_size + 1}.csv')
            print(f"Batch results saved (processed {len(results)} keywords so far)")

    # Final results
    if results:
        final_results_df = save_results(results, 'keyword_classifications_final.csv', timestamp=False)

        print("\nClassification Results:")
        display(final_results_df.head(10))  # Show just the first 10 rows in Jupyter
        print(f"Total processed: {len(results)} keywords")

        # Merge with original data
        merged_df = df.merge(
            final_results_df,
            left_on=KEYWORD_COLUMN,
            right_on='Keyword',
            how='left'
        )

        # Save merged results
        merged_df.to_csv('keywords_with_classifications.csv', index=False)
        print("Merged results saved to 'keywords_with_classifications.csv'")

    # Report errors
    if errors:
        print(f"\nFailed to process {len(errors)} keywords:")
        for keyword in errors[:10]:  # Show just the first 10 errors
            print(f"- {keyword}")
        if len(errors) > 10:
            print(f"... and {len(errors) - 10} more")

        # Save errors to file
        pd.DataFrame({'Failed Keywords': errors}).to_csv('failed_keywords.csv', index=False)
        print("List of failed keywords saved to 'failed_keywords.csv'")

    return results, errors, (final_results_df if results else None)
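
With the functions defined, it's worth smoke-testing a single keyword before kicking off the full run (costs one API call):

# Optional smoke test - classify one keyword and inspect the result
test = classify_keyword("wireless headphones")
if test:
    print(test.model_dump())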

Step 5: Load and display

# Load data and display information
try:
    # Check if input file exists
    if not os.path.exists(INPUT_FILE):
        print(f"Error: Input file '{INPUT_FILE}' not found.")
        print("Please update the INPUT_FILE variable in the configuration section.")
        df = None
    else:
        # Read the CSV file into a DataFrame
        df = pd.read_csv(INPUT_FILE)

        # Check if keyword column exists
        if KEYWORD_COLUMN not in df.columns:
            print(f"Error: Column '{KEYWORD_COLUMN}' not found in input file.")
            print(f"Available columns: {', '.join(df.columns)}")
            print("Please update the KEYWORD_COLUMN variable in the configuration section.")
            df = None
        else:
            # Display information about the dataset
            print(f"Total rows in dataset: {len(df)}")
            print(f"Unique keywords in dataset: {df[KEYWORD_COLUMN].nunique()}")
            print("\nSample data:")
            display(df.head())  # Using display() for better formatting in Jupyter

except Exception as e:
    print(f"An error occurred while loading data: {e}")
    df = None

Step 6: Mischief managed!

# Execute keyword processing
if df is not None:
    # Process the keywords
    results, errors, classifications_df = process_keywords(
        df,
        max_workers=MAX_WORKERS,
        batch_size=BATCH_SIZE,
        save_interval=SAVE_INTERVAL
    )

    print("\nProcessing complete!")

    # Display final results in notebook
    if classifications_df is not None:
        print("\nFinal Classifications Sample:")
        display(classifications_df.head(10))

You'll find the output in the file keywords_with_classifications.csv.

If something doesn't work, let me know and I'll fix it. With minimal changes, the code should also run on Google Colab, where the whole thing lives in the cloud.
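
If you do try Colab, the main difference is getting the files into the runtime; a minimal sketch (assumes it runs in a Colab cell):

# In Google Colab: upload keywords.csv into the runtime
from google.colab import files
uploaded = files.upload()  # pick keywords.csv in the dialog

# Set the key directly instead of using a .env file
import os
os.environ["OPENAI_API_KEY"] = "your_key_here"  # placeholder - paste your own key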

Do you have any ideas on how best to build prompts for keyword classification? I'd be glad if you shared them.