WebQA Generator
Last Update: 05/02/2025Overview
The WebQA Generator enables users to extract content directly from web pages and generate high-quality question-answer (QA) pairs. This tool is ideal for building datasets from online articles, documentation, blogs, or any publicly accessible web-based resource.

Configuration Options
URL Configuration
Input one or more URLs to extract content from. Each page will be processed in the same session.
Example:
https://example.com
https://docs.example.org/guide
Chunk Size
Default: 1900 characters
Splits the web content into smaller segments.
- Smaller chunks → More specific QA pairs
- Larger chunks → Broader contextual questions
Questions per Chunk
Default: 3
Specifies how many QA pairs to generate per chunk of text.
Auto-Download Dataset
Automatically downloads the generated QA pairs as a file when enabled.
Auto-Save to Database
Saves the generated QA pairs directly to a connected database when enabled.
Output Formats
Choose from the following supported formats:
DocSynth Single-Turn Format
Simple question-answer pairs in a single-turn structure.
Format: json
{
"conversations": [
{"from": "human", "value": "Q"},
{"from": "assistant", "value": "A"}
]
}
QA Format
Minimalistic format with standalone QA pairs.
Format: json
{
"question": "Q",
"answer": "A"
}
OpenAI Format
Compatible with OpenAI's fine-tuning format, includes system-level guidance.
Format: json
{
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "Q"},
{"role": "assistant", "content": "A"}
]
}
OpenAI Format – System Message Configuration
Define the system role's content in the OpenAI format to control the tone, behavior, or focus of generated QA pairs. This message provides initial context for the assistant.
Example:
{
"messages": [
{
"role": "system",
"content": "Generate QA pairs that summarize key technical concepts in a concise and professional manner."
},
{
"role": "user",
"content": "Q"
},
{
"role": "assistant",
"content": "A"
}
]
}
This can be uniquely crafted to suit OpenAI-specific training pipelines.
Actions
Generate Q&A Pairs
Starts processing the listed URLs using the configured chunking, output, and generation settings.
Start Over
Resets all fields and settings to their default state, clearing the URL list for a fresh session.