mdcraft.ai Reverse Conversion Scope#
Objective#
Define the first-release scope for the PDF -> Markdown beta.
This part of the product should be useful, but the messaging must remain disciplined. Reverse conversion is valuable because it restores editability, yet the achievable quality depends heavily on the quality of the source PDF.
HTML -> Markdown remains future scope and should not be treated as part of the current public Phase 1 workflow.
Product promise#
Recover usable markdown from text-first PDFs, then review it before download.
Scope by format#
PDF -> Markdown#
PDF -> Markdown should launch as a beta.
In scope#
- text-first PDFs with clear section structure
- reports, guides, notes, and whitepapers
- moderate table usage
- occasional code blocks
Out of scope#
- scanned image-only PDFs unless OCR quality is clearly acceptable
- heavily designed magazine-style layouts
- multi-column layouts with floating sidebars
- charts that need semantic reconstruction
- complex forms
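One way to act on the in-scope and out-of-scope lists is a cheap pre-check that runs before extraction. The sketch below is illustrative only: `check_scope`, `ScopeCheck`, and both thresholds are assumptions, not part of this spec. It treats a page with almost no extractable text as image-only and marks the file out of scope when too many pages look that way.

```python
from dataclasses import dataclass


@dataclass
class ScopeCheck:
    image_only_pages: list[int]  # 1-based page numbers with almost no extractable text
    in_scope: bool               # False when the file looks mostly scanned


def check_scope(
    page_texts: list[str],
    min_chars: int = 40,                 # assumed threshold, tune on real files
    max_image_only_ratio: float = 0.2,   # assumed threshold, tune on real files
) -> ScopeCheck:
    """Classify a PDF from the raw text extracted for each page."""
    if not page_texts:
        return ScopeCheck([], False)

    image_only = [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]
    ratio = len(image_only) / len(page_texts)
    return ScopeCheck(image_only, ratio <= max_image_only_ratio)
```

The flagged page numbers can feed the "image-only pages" warning in the review step even when the file as a whole stays in scope.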
Review-and-fix workflow#
Reverse conversion should never be a blind one-click download in the MVP.
Required review step#
After extraction, show the user:
- rendered markdown preview
- raw markdown editor
- highlighted confidence warnings for risky blocks
- quick fixes for headings, tables, lists, and code fences
Confidence indicators#
Flag likely errors such as:
- malformed tables
- suspicious heading jumps
- broken list indentation
- missing code fences
- OCR uncertainty
- image-only pages
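Several of these indicators can be approximated with simple checks on the extracted markdown. The following is a minimal sketch, not a specified implementation: `find_warnings`, `BlockWarning`, and the warning names are hypothetical. It covers heading jumps, table rows whose cell count differs from the first row, and a fence that is opened but never closed (one detectable form of the "missing code fences" case). Escaped pipes inside cells are not handled.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # a markdown code fence marker
HEADING_RE = re.compile(r"^(#{1,6})\s+\S")


@dataclass
class BlockWarning:
    line: int   # 1-based line number in the extracted markdown
    kind: str   # "heading_jump", "malformed_table", or "unclosed_code_fence"


def find_warnings(markdown: str) -> list[BlockWarning]:
    warnings: list[BlockWarning] = []
    previous_level = 0      # level of the last heading seen, 0 if none yet
    expected_cells = None   # cell count of the table currently being read
    fence_start = None      # line where the currently open fence began

    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()

        if stripped.startswith(FENCE):
            fence_start = None if fence_start is not None else number
            expected_cells = None
            continue
        if fence_start is not None:
            continue  # ignore everything inside a code fence

        heading = HEADING_RE.match(stripped)
        if heading:
            level = len(heading.group(1))
            # Skipping a level (for example h1 straight to h3) is suspicious.
            if previous_level and level > previous_level + 1:
                warnings.append(BlockWarning(number, "heading_jump"))
            previous_level = level
            expected_cells = None
        elif stripped.startswith("|"):
            cells = len(stripped.strip("|").split("|"))
            if expected_cells is None:
                expected_cells = cells
            elif cells != expected_cells:
                warnings.append(BlockWarning(number, "malformed_table"))
        else:
            expected_cells = None  # any other line ends the current table

    if fence_start is not None:
        warnings.append(BlockWarning(fence_start, "unclosed_code_fence"))
    return warnings
```

Each warning carries a line number so the review UI can highlight the risky block in both the preview and the raw editor.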
Output quality goals#
PDF -> Markdown#
- preserve section order for text-first files
- reconstruct headings and lists where confidence is high
- output usable tables only when row and column structure is believable
- prefer plain text over fake precision
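The last two goals amount to a fallback rule: emit a table only when the extracted rows form a consistent grid, and otherwise keep the text without table syntax. A minimal sketch of that rule follows; `is_believable_grid`, `render_rows`, and the minimum row and column counts are assumptions chosen for illustration.

```python
def is_believable_grid(rows: list[list[str]], min_rows: int = 2, min_cols: int = 2) -> bool:
    """True when every row has the same number of cells and the grid is not trivial."""
    if len(rows) < min_rows:
        return False
    column_counts = {len(row) for row in rows}
    return len(column_counts) == 1 and column_counts.pop() >= min_cols


def render_rows(rows: list[list[str]]) -> str:
    """Render rows as a markdown table, or as plain text lines if the grid is doubtful."""
    if not is_believable_grid(rows):
        # Prefer plain text over fake precision: one line per row, no table syntax.
        return "\n".join(
            " ".join(cell.strip() for cell in row if cell.strip()) for row in rows
        )

    def format_row(row: list[str]) -> str:
        # Escape literal pipes so they do not split cells.
        cells = (cell.strip().replace("|", "\\|") for cell in row)
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([format_row(header), separator, *(format_row(row) for row in body)])
```

For example, `render_rows([["Name", "Qty"], ["Apple", "3"]])` yields a two-column table, while a ragged input such as `[["Name", "Qty"], ["Apple", "3", "extra"]]` falls back to two plain lines.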
UX copy guidance#
Good promise#
Best for text-first PDFs. Review before download for the cleanest result.
Bad promise#
Perfect PDF to Markdown conversion for any file.
Technical notes#
- PDF conversion should normalize pages into text blocks and structure candidates before markdown assembly
- OCR should remain optional and clearly labeled as beta
- keep source snapshots or intermediate data only as long as required for the current session unless the user explicitly saves work
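The first note describes a staged pipeline: text blocks, then structure candidates, then markdown assembly. The sketch below shows one possible shape for those stages. All names (`TextBlock`, `Candidate`, `propose`, `assemble`), the font-size heuristics, and the confidence values are assumptions for illustration, not the intended design.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    page: int
    text: str
    font_size: float


@dataclass
class Candidate:
    kind: str          # "heading", "list_item", or "paragraph"
    text: str
    confidence: float  # 0.0 to 1.0, assumed scale
    level: int = 0     # heading level, 0 for non-headings


def propose(block: TextBlock, body_size: float) -> Candidate:
    """Guess the structure of one block relative to the body font size."""
    text = " ".join(block.text.split())  # collapse extraction whitespace
    if text.startswith(("- ", "• ", "* ")):
        return Candidate("list_item", text[2:], 0.9)
    if block.font_size >= body_size * 1.2 and len(text) <= 80:
        level = 1 if block.font_size >= body_size * 1.6 else 2
        return Candidate("heading", text, 0.8, level)
    return Candidate("paragraph", text, 1.0)


def assemble(candidates: list[Candidate], threshold: float = 0.7) -> str:
    """Build markdown, falling back to plain text for low-confidence candidates."""
    parts = []
    for candidate in candidates:
        if candidate.confidence < threshold or candidate.kind == "paragraph":
            parts.append(candidate.text)
        elif candidate.kind == "heading":
            parts.append("#" * candidate.level + " " + candidate.text)
        elif candidate.kind == "list_item":
            parts.append("- " + candidate.text)
    return "\n\n".join(parts)
```

Keeping candidates as a separate stage means the confidence score is available to the review step, rather than being lost once the markdown string is built. This sketch separates every part with a blank line, so consecutive list items come out as a loose list.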
Beta release criteria#
- PDF -> Markdown produces useful output on most text-first PDFs
- risky extraction cases are surfaced instead of silently producing broken markdown