Show HN: Fixmydocuments.com – Transform any document into an optimized version

A few months ago, I submitted a webapp that lets you take any YouTube video and turn it into a polished written document in markdown format. I got feedback from people that they wanted something that could work for any audio file. Separately from that, I submitted an open-source project (llm_aided_ocr) a few months back that lets you "upgrade" the output of tesseract OCR, using an LLM to correct transcription errors and also to convert the formatting to use markdown. Well, I decided to combine all those features and more in my newest app, called FixMyDocuments.com.

You can submit any kind of document: PDFs (including scanned PDFs that require OCR), MS Word and PowerPoint files, images, and audio files (mp3, m4a, etc.). Each one is turned into a highly optimized version in nice markdown formatting, from which HTML and PDF versions are automatically generated. Once a document is converted, you can also edit it directly on the site using the built-in markdown editor, which saves a running revision history and regenerates the PDF/HTML versions.

In addition to getting the optimized version of the document, you can also generate many other kinds of "derived documents" from the original: interactive multiple choice quizzes that you can actually take and get graded on; slick looking presentation slides as PDF or HTML (using LaTeX and Reveal.js); an in-depth summary; a concept mind map (using Mermaid diagrams) and outline; custom lesson plans where you can select your target audience; a readability analysis and grade-level versions of your original document (good for simplifying concepts for students); Anki flashcards that you can import directly into the Anki app or use on the site in a nice interface; and more.
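For readers who haven't seen Mermaid's mind map syntax, here is a minimal sketch of how a concept hierarchy could be rendered as Mermaid text. It is illustrative only and not the site's actual code: the nested-dict input format and the `to_mermaid_mindmap` name are assumptions, and it uses Mermaid's indentation-based `mindmap` diagram type, which may or may not be the one the app uses.

```python
from typing import Any, Dict

# Hypothetical input format: each key is a node label, each value is a
# dict of that node's children (empty dict for a leaf).
Tree = Dict[str, Any]


def to_mermaid_mindmap(root: str, children: Tree) -> str:
    """Render a nested dict as Mermaid `mindmap` source text."""
    lines = ["mindmap", f"  root(({root}))"]

    def walk(nodes: Tree, depth: int) -> None:
        for label, sub in nodes.items():
            # Mermaid mind maps express hierarchy purely through indentation.
            lines.append("  " * depth + label)
            walk(sub, depth + 1)

    walk(children, 2)
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid_mindmap("Document", {"Topic A": {"Detail A1": {}}, "Topic B": {}}))
```

Labels containing parentheses or brackets would need escaping or quoting before being passed in, since Mermaid treats those characters as node-shape syntax.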

For any generated HTML content, you can also host it with one click. You get a unique URL that you can share with anyone, and viewers don't need an account to see it.

This has been a lot more challenging to make than I originally guessed it would be, but I'm pretty pleased with the final output quality, which is the result of tons of prompt engineering, iteration, and chaining different prompts together in pipelines. The mind map generation in particular is ~2,700 lines of Python code and involves many dozens, if not hundreds, of separate LLM inference calls to generate a single mind map from a source document. What I think is interesting about this is that, even though you could theoretically do many of these things manually in ChatGPT, it wouldn't be practical because of the many stages of complex logic involved in combining and transforming the LLM outputs.
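To give a rough idea of what chaining prompts in a pipeline looks like, here is a heavily simplified sketch. It is not the FixMyDocuments code: the three stages, the prompt wording, and the names `chunk_text` and `summarize_document` are all assumptions, and the `llm` argument stands in for whatever function sends a prompt to a model and returns its text.

```python
from typing import Callable, List

# Hypothetical interface: takes a prompt string, returns the completion text.
LLM = Callable[[str], str]


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Group paragraphs into chunks of roughly max_chars characters."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def summarize_document(text: str, llm: LLM, max_chars: int = 2000) -> str:
    """Three chained stages: per-chunk notes, a merge pass, a final rewrite."""
    # Stage 1: one LLM call per chunk.
    notes = [
        llm(f"List the key points of this passage:\n\n{chunk}")
        for chunk in chunk_text(text, max_chars)
    ]
    # Stage 2: one call to combine and deduplicate the per-chunk notes.
    merged = llm(
        "Merge these notes into one outline, removing duplicates:\n\n"
        + "\n\n".join(notes)
    )
    # Stage 3: one call to turn the outline into the final text.
    return llm(f"Rewrite this outline as a concise summary:\n\n{merged}")
```

Even this toy version makes one call per chunk plus two more, so a long document quickly reaches dozens of calls, and each stage's output has to be validated before it is fed to the next.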

There was also a lot of manual "quality control" filtering and post-processing involved to remove any traces of the LLM inserting irrelevant text, such as preambles or introductory comments (which show up even when the model is explicitly prompted not to add them).
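As an illustration of this kind of filtering, here is a minimal sketch that strips chatty opening lines from a model response. The pattern list and the `strip_preamble` name are assumptions made up for this example, not the rules the site actually uses, which are surely more extensive.

```python
import re

# Hypothetical patterns for lines like "Sure, here is the corrected text:".
PREAMBLE_RE = re.compile(
    r"^\s*(sure|certainly|of course|okay|here is|here's|here are|below is|i have|i've)\b"
    r".*[:!.]\s*$",
    re.IGNORECASE,
)


def strip_preamble(text: str, max_lines: int = 3) -> str:
    """Remove up to max_lines leading preamble lines from an LLM response."""
    lines = text.splitlines()
    dropped = 0
    while lines and dropped < max_lines:
        first = lines[0]
        if not first.strip():
            # Skip leading blank lines without counting them.
            lines.pop(0)
            continue
        if PREAMBLE_RE.match(first):
            lines.pop(0)
            dropped += 1
            continue
        break
    return "\n".join(lines).lstrip("\n")


if __name__ == "__main__":
    raw = "Sure, here is the corrected text:\n\n# Title\nBody"
    print(strip_preamble(raw))
```

A heuristic like this can also remove a legitimate first line that happens to start with "Here is", so in practice it would need to be paired with checks against the source document.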

Anyway, happy to answer any questions people have about it.

You get 100 free credits just for signing up with a Google account, which is enough to process a bunch of modest-sized documents. Please give it a try and let me know what you think!

Comments URL: https://news.ycombinator.com/item?id=42453651