DocSynth Tools

Last Update: 05/02/2025

Overview

DocSynth Tools is a utility suite designed to process, convert, and manage QA pair datasets generated by either the DocQA Generator or SynthQA Generator. These tools streamline dataset preparation for AI model training across various platforms.

DocSynth Tools Overview

Available Tools

Doc Format Converter

Convert QA datasets between supported output formats to ensure compatibility with different training pipelines.

View

Doc Merge

Combine multiple QA files into a single, consolidated JSONL file for streamlined processing and analysis.

View

Doc Validation

Split a QA dataset into training and validation sets, a critical step for machine learning model development.

View

Doc Splitter

Divide a large QA dataset into smaller, manageable files for distributed processing or modular testing.

View

Process Flow

1. Generate

Create QA pairs with DocQA or SynthQA

2. Process

Manage and transform with DocSynth Tools

3. Train

Use datasets for AI model training