mdcraft.ai Reverse Conversion Scope

Objective

Define the first-release scope for the PDF -> Markdown beta.

This part of the product should be useful, but its messaging must stay disciplined. Reverse conversion is valuable because it restores editability, yet the achievable quality depends heavily on the quality of the source file.

HTML -> Markdown remains future scope and should not be treated as part of the current public Phase 1 workflow.

Product promise

Recover usable markdown from text-first PDFs, then review it before download.

Scope by format

PDF -> Markdown

This should launch as a beta.

In scope

text-first PDFs with clear section structure
reports, guides, notes, and whitepapers
moderate table usage
occasional code blocks

Out of scope

scanned image-only PDFs unless OCR quality is clearly acceptable
heavily designed magazine-style layouts
multi-column layouts with floating sidebars
charts that need semantic reconstruction
complex forms
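The in-scope and out-of-scope lists could be checked before extraction starts, so risky files get a warning up front instead of silently producing broken markdown. What follows is a minimal sketch. It assumes a hypothetical upstream parser that reports per-page statistics (`text_chars`, `image_area_ratio`, `has_floating_sidebar`). The names and thresholds are illustrative, not product values.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    IN_SCOPE = "in_scope"
    NEEDS_OCR = "needs_ocr"        # image-only content; OCR is optional and beta
    OUT_OF_SCOPE = "out_of_scope"


@dataclass
class PageStats:
    # Hypothetical per-page measurements from an upstream PDF parser.
    text_chars: int              # characters of extractable text on the page
    image_area_ratio: float      # fraction of page area covered by images, 0.0-1.0
    has_floating_sidebar: bool   # layout analysis detected a floating sidebar


# Illustrative thresholds only, not tuned values.
MIN_TEXT_CHARS = 50
IMAGE_ONLY_RATIO = 0.9


def triage_page(page: PageStats) -> Scope:
    """Classify one page against the beta scope."""
    if page.text_chars < MIN_TEXT_CHARS and page.image_area_ratio >= IMAGE_ONLY_RATIO:
        return Scope.NEEDS_OCR  # looks like a scanned, image-only page
    if page.has_floating_sidebar:
        return Scope.OUT_OF_SCOPE  # multi-column layout with floating sidebars
    return Scope.IN_SCOPE


def triage_document(pages: list[PageStats]) -> Scope:
    """Report the most restrictive page result for the whole document."""
    if not pages:
        return Scope.OUT_OF_SCOPE
    results = {triage_page(p) for p in pages}
    if Scope.OUT_OF_SCOPE in results:
        return Scope.OUT_OF_SCOPE
    if Scope.NEEDS_OCR in results:
        return Scope.NEEDS_OCR
    return Scope.IN_SCOPE
```

This sketch does not cover charts or complex forms. Detecting them would need layout signals that it does not assume.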

Review-and-fix workflow

Reverse conversion should never be a blind one-click download in the MVP.

Required review step

After extraction, show the user:

rendered markdown preview
raw markdown editor
highlighted confidence warnings for risky blocks
quick fixes for headings, tables, lists, and code fences
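Here is one possible shape for the heading quick fix. This minimal sketch assumes ATX (`#`) headings and backtick code fences. The function name `fix_heading_jumps` is hypothetical. It clamps each heading so it is never more than one level deeper than the heading before it, and it leaves fenced code untouched.

```python
import re

FENCE = "`" * 3  # a literal triple-backtick fence marker
HEADING_RE = re.compile(r"^(#{1,6})(\s.*)$")


def fix_heading_jumps(markdown: str) -> str:
    """Clamp ATX headings so none is more than one level deeper than the previous one.

    The first heading keeps its level; lines inside fenced code blocks are untouched.
    """
    out = []
    prev_level = 0  # 0 means no heading seen yet
    in_fence = False
    for line in markdown.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
            out.append(line)
            continue
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            if prev_level:
                level = min(level, prev_level + 1)
            out.append("#" * level + match.group(2))
            prev_level = level
        else:
            out.append(line)
    result = "\n".join(out)
    # splitlines() drops a trailing newline, so restore it if the input had one.
    return result + "\n" if markdown.endswith("\n") else result
```

Because this is a quick fix rather than an automatic pass, the user would see the proposed change in the raw editor and accept or reject it.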

Confidence indicators

Flag likely errors such as:

malformed tables
suspicious heading jumps
broken list indentation
missing code fences
OCR uncertainty
image-only pages
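Several of these indicators can be computed directly from the extracted markdown. This minimal sketch covers heading jumps, tables whose rows disagree on cell count, unclosed code fences, and a crude signal for code outside a fence (lines ending in `;`, `{`, or `}`). The `ConfidenceWarning` type, the `kind` labels, and the heuristics are all hypothetical.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # a literal triple-backtick fence marker
HEADING_RE = re.compile(r"^(#{1,6})\s")


@dataclass
class ConfidenceWarning:
    line: int      # 1-based line number in the extracted markdown
    kind: str      # e.g. "heading_jump", "malformed_table", "unclosed_fence"
    message: str


def cell_count(row: str) -> int:
    """Count cells in a pipe-table row such as '| a | b |' (escaped pipes not handled)."""
    return len(row.strip().strip("|").split("|"))


def find_warnings(markdown: str) -> list[ConfidenceWarning]:
    warnings: list[ConfidenceWarning] = []
    prev_level = 0        # level of the most recent heading (0 = none yet)
    in_fence = False
    fence_start = 0
    table_width = None    # cell count of the first row of the current table
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_fence:
                fence_start = number
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # do not inspect code content

        heading = HEADING_RE.match(line)
        if heading:
            level = len(heading.group(1))
            if prev_level and level > prev_level + 1:
                warnings.append(ConfidenceWarning(
                    number, "heading_jump", f"h{prev_level} followed by h{level}"))
            prev_level = level

        if stripped.startswith("|"):
            width = cell_count(stripped)
            if table_width is None:
                table_width = width
            elif width != table_width:
                warnings.append(ConfidenceWarning(
                    number, "malformed_table", f"expected {table_width} cells, found {width}"))
        else:
            table_width = None  # any non-table line ends the current table
            if not heading and stripped.endswith((";", "{", "}")):
                warnings.append(ConfidenceWarning(
                    number, "possible_unfenced_code", "line looks like code outside a fence"))

    if in_fence:
        warnings.append(ConfidenceWarning(
            fence_start, "unclosed_fence", "code fence opened but never closed"))
    return warnings
```

OCR uncertainty and image-only pages are not visible in the markdown text itself. A real pipeline would therefore attach those warnings from extraction metadata. List indentation is also left out, because valid nesting depth depends on the width of each list marker.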

Output quality goals

PDF -> Markdown

preserve section order for text-first files
reconstruct headings and lists where confidence is high
output usable tables only when row and column structure is believable
prefer plain text over fake precision
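The table goal and the plain-text preference can be expressed as a single gate. This minimal sketch assumes the extractor has already grouped a table candidate into rows of cell strings. `is_believable_table`, `render_table_candidate`, and the 30% empty-cell limit are hypothetical choices.

```python
def is_believable_table(rows: list[list[str]], max_empty_ratio: float = 0.3) -> bool:
    """Accept only rectangular candidates with at least 2 rows, 2 columns, and few empty cells."""
    if len(rows) < 2:
        return False
    width = len(rows[0])
    if width < 2 or any(len(row) != width for row in rows):
        return False
    cells = [cell for row in rows for cell in row]
    empty = sum(1 for cell in cells if not cell.strip())
    return empty / len(cells) <= max_empty_ratio


def escape_cell(cell: str) -> str:
    # A literal pipe inside a cell would otherwise split it into two columns.
    return cell.strip().replace("|", "\\|")


def render_table_candidate(rows: list[list[str]]) -> str:
    if is_believable_table(rows):
        header, *body = rows
        lines = ["| " + " | ".join(escape_cell(c) for c in header) + " |",
                 "| " + " | ".join("---" for _ in header) + " |"]
        lines += ["| " + " | ".join(escape_cell(c) for c in row) + " |" for row in body]
        return "\n".join(lines)
    # Fallback: one plain line per row, with no pretend column structure.
    lines = (" ".join(c.strip() for c in row if c.strip()) for row in rows)
    return "\n".join(line for line in lines if line)
```

The fallback discards column alignment on purpose. Readable plain text is easier to fix in review than a pipe table that misplaces values.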

UX copy guidance

Good promise

Best for text-first PDFs. Review before download for the cleanest result.

Bad promise

Perfect PDF to Markdown conversion for any file.

Technical notes

PDF conversion should normalize pages into text blocks and structure candidates before markdown assembly
OCR should remain optional and clearly labeled as beta
keep source snapshots and intermediate data only for as long as the current session requires, unless the user explicitly saves their work
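The normalization step can be pictured as a small intermediate representation between page parsing and markdown assembly. This minimal sketch assumes a hypothetical parser that yields positioned text blocks with font sizes. `TextBlock`, `Candidate`, the 1.2x heading ratio, and the list-marker pattern are illustrative assumptions, not the product's pipeline.

```python
import re
from dataclasses import dataclass
from statistics import median

HEADING_RATIO = 1.2  # assumed: a block this much larger than body text is a heading candidate
LIST_MARKER_RE = re.compile(r"^(?:[-•*]|\d+[.)])\s+")


@dataclass
class TextBlock:
    page: int         # 1-based page number
    top: float        # vertical position on the page (smaller = higher)
    text: str
    font_size: float


@dataclass
class Candidate:
    kind: str         # "heading", "list_item", or "paragraph"
    block: TextBlock
    level: int = 0    # heading level (1-6); unused for other kinds


def classify(blocks: list[TextBlock]) -> list[Candidate]:
    """Turn raw blocks into structure candidates in reading order."""
    if not blocks:
        return []
    body_size = median(b.font_size for b in blocks)
    # Distinct larger font sizes map to heading levels, largest first.
    heading_sizes = sorted(
        {b.font_size for b in blocks if b.font_size >= body_size * HEADING_RATIO},
        reverse=True)
    candidates = []
    for b in sorted(blocks, key=lambda blk: (blk.page, blk.top)):
        if b.font_size in heading_sizes:
            level = min(heading_sizes.index(b.font_size) + 1, 6)
            candidates.append(Candidate("heading", b, level))
        elif LIST_MARKER_RE.match(b.text):
            candidates.append(Candidate("list_item", b))
        else:
            candidates.append(Candidate("paragraph", b))
    return candidates


def assemble(candidates: list[Candidate]) -> str:
    """Render candidates as markdown, keeping consecutive list items in one tight list."""
    out: list[str] = []
    prev_kind = None
    for c in candidates:
        text = " ".join(c.block.text.split())  # collapse line-wrap whitespace
        if c.kind == "heading":
            line = "#" * c.level + " " + text
        elif c.kind == "list_item":
            line = "- " + LIST_MARKER_RE.sub("", text, count=1)
        else:
            line = text
        if out:
            out.append("\n" if prev_kind == c.kind == "list_item" else "\n\n")
        out.append(line)
        prev_kind = c.kind
    return "".join(out)
```

Keeping candidates separate from assembly leaves room for the review step, since low-confidence candidates can be flagged before any markdown is written. The sketch simplifies in two ways: it renders every list item as a bullet, and it relies on exact font-size matches.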

Beta release criteria

PDF -> Markdown produces useful output on most text-first PDFs
risky extraction cases are surfaced instead of silently producing broken markdown