Remember the days when we all spent hours analyzing keywords in Excel or OpenRefine? Scrolling through thousands of rows, categorizing by hand, rating relevance... Thankfully, those days are behind us. AI can now simplify and speed up this entire process dramatically.
Below I'm sharing a practical script to help you get started with AI-assisted keyword analysis.
Just copy it into a Jupyter Notebook and start experimenting. If you're not an experienced programmer, don't despair: whenever you get stuck, ask ChatGPT or another AI assistant to help you adapt the script to your exact needs.
What makes this approach so powerful?
The main advantage is the ability to process a huge amount of data at once. You can take a complete keyword export from Ahrefs or another SEO tool (easily tens of thousands of entries) and let the AI classify them automatically. You then work only with the filtered output, which saves you hours of manual labor.
The AI can often spot relationships between keywords that you'd easily miss in a manual analysis. That helps you better understand user intent and structure your content more effectively.
How do you get started with the script?
- Install Jupyter Notebook - the easiest way is to download the Anaconda distribution, which includes Jupyter and all the required libraries
- Create a new notebook and copy the code below into it
- Get an OpenAI API key: sign up on the OpenAI Platform, generate a new key in the API keys section, and save it to a .env file in the same directory as your notebook (format: OPENAI_API_KEY=your_key_here). Alternatively, you can enter the key directly when the script prompts you for it
- Prepare your data - save your keyword file as keywords.csv in the same directory and make sure it contains a column named "Keyword" (see the sample right after this list)
- Adjust the prompt - in the script's configuration section, customize the product categories and the description of your domain. If you're not sure how, let ChatGPT tailor the prompt to your needs. You can work absolute magic with it
- Run the script and watch the AI classify your keywords automatically
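For illustration, here's what a minimal keywords.csv might look like (the Volume column and its values are made up for this example; only the "Keyword" column is required):
Keyword,Volume
best smartphones 2025,4400
laptop deals,2900
how to connect bluetooth device,1300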
Within minutes you'll have the groundwork for a professional keyword analysis that would otherwise cost you hours of manual work. All it takes is a few clicks and a bit of experimentation!
And now for the promised code:
Step 1: Install the packages
# Run this cell first to install required packages
!pip install python-dotenv pydantic tqdm pandas openai
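If you're on a newer Jupyter, the %pip magic does the same thing while making sure the packages land in the active kernel's own environment:
# Alternative: %pip installs into the active kernel's environment
%pip install python-dotenv pydantic tqdm pandas openai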
Step 2: Configuration
This is where you tailor the prompt to your needs. The only limit is your imagination.
"""
SEO Keyword Classification Tool
This script uses OpenAI's API to automatically classify keywords for SEO analysis.
It processes keywords from a CSV file and classifies them according to various parameters.
Requirements:
- Python 3.6+
- OpenAI API key saved in a .env file
- Input CSV file with a column named 'Keyword'
"""
#------------------------ CONFIGURATION - MODIFY AS NEEDED ------------------------#
# Input and output file settings
INPUT_FILE = 'keywords.csv' # CSV file containing keywords to analyze
KEYWORD_COLUMN = 'Keyword' # Name of column containing keywords
# OpenAI API settings
MODEL = "gpt-4.1" # OpenAI model to use
TEMPERATURE = 0.3 # Temperature setting (0.0-1.0)
API_KEY_ENV_VAR = "OPENAI_API_KEY" # Name of environment variable for API key
# Processing settings
MAX_WORKERS = 10 # Number of parallel requests (adjust based on API limits)
BATCH_SIZE = 100 # Process keywords in batches of this size
SAVE_INTERVAL = 50 # Save intermediate results every N keywords
RETRY_COUNT = 3 # Number of retries for failed requests
INITIAL_RETRY_DELAY = 0.5 # Initial delay before retrying (seconds)
# Category settings - CUSTOMIZE FOR YOUR DOMAIN
PRODUCT_CATEGORIES = [
    "Smartphones",     # Category 1
    "Laptops",         # Category 2
    "TVs",             # Category 3
    "Audio Devices",   # Category 4
    "Cameras",         # Category 5
    "Gaming Devices",  # Category 6
    "Smart Home",      # Category 7
    "Accessories",     # Category 8
    "Other",           # Other relevant products
    "N/A"              # Not applicable
]
# Custom classification prompt - CUSTOMIZE FOR YOUR DOMAIN
SYSTEM_PROMPT = """
Act as an SEO specialist and help classify keywords based on:
1. Search type (Product search, Informational search, Other)
2. Relevance to our website (Yes, No)
3. Product category ({categories})
4. Intent specificity (Yes, No) - whether the keyword specifically implies shopping for electronics
We sell these categories of electronics products: {categories_list}.
For the "Intent specificity" field:
- Mark "Yes" if the keyword explicitly mentions buying, shopping, prices, or clearly implies purchasing electronics
- Mark "No" if the keyword is generic or doesn't specifically imply shopping intent
Examples:
- "best smartphones 2025" would be "Yes" (specifically about shopping for smartphones)
- "phone" would be "No" (too generic, doesn't imply shopping intent)
- "laptop deals" would be "Yes" (specifically about purchasing laptops)
- "how to connect bluetooth device" would be "No" (doesn't imply shopping)
"""
Step 3: Client setup and data import
#------------------------ IMPORTS ------------------------#
import pandas as pd
import os
import time
import json
from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional, Literal
from tqdm.notebook import tqdm # For Jupyter Notebook progress bars
import concurrent.futures  # For parallel processing
from IPython.display import display  # Explicit import so display() works reliably
#------------------------ SETUP ------------------------#
# Load environment variables from .env file
load_dotenv()
# Check for API key and provide helpful error
api_key = os.environ.get(API_KEY_ENV_VAR)
if not api_key:
    # Alternative for Jupyter - direct input if .env fails
    api_key = input("OpenAI API key not found in environment. Please enter your API key: ")
    os.environ[API_KEY_ENV_VAR] = api_key
# Initialize the OpenAI client with API key from environment variable
client = OpenAI(api_key=api_key)
# Format the system prompt with categories
categories_str = ", ".join(f'"{cat}"' for cat in PRODUCT_CATEGORIES)
FORMATTED_SYSTEM_PROMPT = SYSTEM_PROMPT.format(
    categories=categories_str,
    categories_list=", ".join(cat for cat in PRODUCT_CATEGORIES if cat not in ("N/A", "Other"))
)
#------------------------ DATA MODEL ------------------------#
# Define Pydantic model for structured output
class KeywordClassification(BaseModel):
    keyword: str = Field(..., description="The keyword being analyzed")
    search_type: Literal["Product search", "Informational search", "Other"] = Field(
        ..., description="Type of search intent"
    )
    relevance: Literal["Yes", "No"] = Field(
        ..., description="Whether the keyword is relevant to the website"
    )
    product_category: Optional[Literal[tuple(PRODUCT_CATEGORIES)]] = Field(
        "N/A", description="Which product category the keyword relates to (if relevant)"
    )
    intent_specificity: Literal["Yes", "No"] = Field(
        ..., description="Whether the keyword specifically implies shopping for electronics"
    )
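To check that the data model accepts the values you expect before burning any API credits, you can run a quick sanity test (the field values here are made up):
# Quick sanity check of the Pydantic model with illustrative values
sample = KeywordClassification(
    keyword="laptop deals",
    search_type="Product search",
    relevance="Yes",
    product_category="Laptops",
    intent_specificity="Yes"
)
print(sample.model_dump_json(indent=2))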
Step 4: Data processing functions
#------------------------ FUNCTIONS ------------------------#
# Function to classify keywords using OpenAI with Function Calling
def classify_keyword(keyword, model=MODEL, retry_count=RETRY_COUNT, retry_delay=INITIAL_RETRY_DELAY):
    for attempt in range(retry_count):
        try:
            # Define the function that specifies the response structure
            tools = [
                {
                    "type": "function",
                    "function": {
                        "name": "classify_keyword",
                        "description": "Classify a keyword for SEO purposes",
                        "parameters": KeywordClassification.model_json_schema()
                    }
                }
            ]
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": FORMATTED_SYSTEM_PROMPT},
                    {"role": "user", "content": f"Classify this keyword: '{keyword}'"}
                ],
                tools=tools,
                tool_choice={"type": "function", "function": {"name": "classify_keyword"}},
                max_tokens=1000,
                temperature=TEMPERATURE
            )
            # Extract the function call arguments
            tool_call = response.choices[0].message.tool_calls[0]
            result_json = tool_call.function.arguments
            # Parse into Pydantic model
            classification = KeywordClassification.model_validate_json(result_json)
            return classification
        except Exception as e:
            if attempt < retry_count - 1:
                print(f"Error processing keyword '{keyword}', retrying... ({e})")
                time.sleep(retry_delay)
                retry_delay *= 2  # Exponential backoff
            else:
                print(f"Failed to process keyword '{keyword}' after {retry_count} attempts: {e}")
                return None
# Helper function to save intermediate results
def save_results(results, filename, timestamp=True):
    if not results:
        return None
    results_df = pd.DataFrame([{
        'Keyword': r.keyword,
        'Search Type': r.search_type,
        'Relevant': r.relevance,
        'Product Category': r.product_category,
        'Intent Specificity': r.intent_specificity
    } for r in results])
    # Add timestamp if requested
    if timestamp:
        time_str = time.strftime("%Y%m%d-%H%M%S")
        file_parts = filename.split('.')
        output_filename = f"{file_parts[0]}_{time_str}.{file_parts[1]}"
    else:
        output_filename = filename
    results_df.to_csv(output_filename, index=False)
    print(f"Results saved to '{output_filename}' with {len(results)} keywords")
    return results_df
# Process keywords with parallel execution
def process_keywords(df, max_workers=MAX_WORKERS, batch_size=BATCH_SIZE, save_interval=SAVE_INTERVAL):
    results = []
    errors = []
    # Get unique keywords from the dataframe
    keywords_to_process = df[KEYWORD_COLUMN].unique().tolist()
    total_keywords = len(keywords_to_process)
    print(f"\nProcessing {total_keywords} unique keywords with up to {max_workers} parallel workers...")
    # Create a checkpoint file to track progress
    checkpoint_file = 'processing_checkpoint.txt'
    processed_keywords = set()
    # Check if checkpoint file exists and load processed keywords
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, 'r') as f:
            processed_keywords = set(line.strip() for line in f)
        print(f"Resuming from checkpoint. {len(processed_keywords)} keywords already processed.")
        # Filter out already processed keywords
        keywords_to_process = [k for k in keywords_to_process if k not in processed_keywords]
        print(f"{len(keywords_to_process)} keywords remaining to process.")
    # Add previously processed keywords from checkpoint
    if processed_keywords and os.path.exists('keyword_classifications_final.csv'):
        previous_results_df = pd.read_csv('keyword_classifications_final.csv')
        previous_results = [
            KeywordClassification(
                keyword=row['Keyword'],
                search_type=row['Search Type'],
                relevance=row['Relevant'],
                product_category=row['Product Category'],
                intent_specificity=row['Intent Specificity']
            )
            for _, row in previous_results_df.iterrows()
        ]
        results.extend(previous_results)
        print(f"Loaded {len(previous_results)} previously processed keywords.")
    # Process in batches for better progress tracking
    for i in range(0, len(keywords_to_process), batch_size):
        batch = keywords_to_process[i:i+batch_size]
        batch_results = []
        batch_errors = []
        print(f"\nProcessing batch {i//batch_size + 1}/{(len(keywords_to_process) + batch_size - 1)//batch_size}...")
        # Use ThreadPoolExecutor for parallel processing
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Submit all tasks
            future_to_keyword = {executor.submit(classify_keyword, keyword): keyword for keyword in batch}
            # Process results as they complete with progress bar
            for future in tqdm(concurrent.futures.as_completed(future_to_keyword), total=len(batch), desc="Processing"):
                keyword = future_to_keyword[future]
                try:
                    classification = future.result()
                    if classification:
                        batch_results.append(classification)
                        results.append(classification)
                        # Update checkpoint file
                        with open(checkpoint_file, 'a') as f:
                            f.write(f"{keyword}\n")
                    else:
                        batch_errors.append(keyword)
                        errors.append(keyword)
                except Exception as e:
                    print(f"Exception processing keyword '{keyword}': {e}")
                    batch_errors.append(keyword)
                    errors.append(keyword)
                # Save intermediate results at regular intervals
                if len(results) % save_interval == 0:
                    save_results(results, 'keyword_classifications_interim.csv')
        # Save batch results
        if batch_results:
            save_results(batch_results, f'keyword_classifications_batch_{i//batch_size + 1}.csv')
            print(f"Batch results saved (processed {len(results)} keywords so far)")
    # Final results
    final_results_df = None
    if results:
        final_results_df = save_results(results, 'keyword_classifications_final.csv', timestamp=False)
        print("\nClassification Results:")
        display(final_results_df.head(10))  # Show just the first 10 rows in Jupyter
        print(f"Total processed: {len(results)} keywords")
        # Merge with original data
        merged_df = df.merge(
            final_results_df,
            left_on=KEYWORD_COLUMN,
            right_on='Keyword',
            how='left'
        )
        # Save merged results
        merged_df.to_csv('keywords_with_classifications.csv', index=False)
        print("Merged results saved to 'keywords_with_classifications.csv'")
    # Report errors
    if errors:
        print(f"\nFailed to process {len(errors)} keywords:")
        for keyword in errors[:10]:  # Show just the first 10 errors
            print(f"- {keyword}")
        if len(errors) > 10:
            print(f"... and {len(errors) - 10} more")
        # Save errors to file
        pd.DataFrame({'Failed Keywords': errors}).to_csv('failed_keywords.csv', index=False)
        print("List of failed keywords saved to 'failed_keywords.csv'")
    return results, errors, final_results_df
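Before letting the script loose on tens of thousands of rows, it's worth testing the classifier on a single keyword. A minimal smoke test using the function defined above:
# Optional smoke test: classify one keyword before the full run
test = classify_keyword("best smartphones 2025")
if test:
    print(test.model_dump_json(indent=2))
else:
    print("Classification failed - check your API key and model settings.")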
Step 5: Load and display the data
# Load data and display information
try:
    # Check if input file exists
    if not os.path.exists(INPUT_FILE):
        print(f"Error: Input file '{INPUT_FILE}' not found.")
        print("Please update the INPUT_FILE variable in the configuration section.")
        df = None
    else:
        # Read the CSV file into a DataFrame
        df = pd.read_csv(INPUT_FILE)
        # Check if keyword column exists
        if KEYWORD_COLUMN not in df.columns:
            print(f"Error: Column '{KEYWORD_COLUMN}' not found in input file.")
            print(f"Available columns: {', '.join(df.columns)}")
            print("Please update the KEYWORD_COLUMN variable in the configuration section.")
            df = None
        else:
            # Display information about the dataset
            print(f"Total rows in dataset: {len(df)}")
            print(f"Unique keywords in dataset: {df[KEYWORD_COLUMN].nunique()}")
            print("\nSample data:")
            display(df.head())  # Using display() for better formatting in Jupyter
except Exception as e:
    print(f"An error occurred while loading data: {e}")
    df = None
Step 6: Mischief managed!
# Execute keyword processing
if df is not None:
    # Process the keywords
    results, errors, classifications_df = process_keywords(
        df,
        max_workers=MAX_WORKERS,
        batch_size=BATCH_SIZE,
        save_interval=SAVE_INTERVAL
    )
    print("\nProcessing complete!")
    # Display final results in notebook
    if classifications_df is not None:
        print("\nFinal Classifications Sample:")
        display(classifications_df.head(10))
You'll find the output in the file keywords_with_classifications.csv.
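From there you'll usually want just the filtered subset mentioned at the beginning. A minimal sketch (the column names match what save_results writes; the filter criteria are yours to choose):
import pandas as pd

out = pd.read_csv('keywords_with_classifications.csv')
# Keep only keywords that are relevant and imply shopping intent
filtered = out[(out['Relevant'] == 'Yes') & (out['Intent Specificity'] == 'Yes')]
filtered.to_csv('keywords_filtered.csv', index=False)
print(f"Kept {len(filtered)} of {len(out)} keywords")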
If something doesn't work, let me know and I'll fix it. With minimal tweaks, the code should also run on Google Colab, where everything lives in the cloud.
Do you have any tips on how best to build prompts for keyword classification? I'd love to hear them.