We need an HTML Document standard

I’ve recently been messing with Markdown (to my utter chagrin) with the idea of using it to update my resume and blog. I was simply astounded by how much recreation of solved problems is going on in that space. I prefer JavaScript/Node so I’ve played with UnifiedJS (remark/rehype), MarkdownIt, MarkedJS, and others. It’s honestly just absurd how over-engineered each project is.

If all you’re producing is HTML - which is 99% of what Markdown is used for - having to deal with a raw AST in order to modify a document is like deciding to use Assembly instead of Python. Sure, it could be more efficient if you really want to spend the time, but it’s generally a step backwards in every way. I soon realized the only predictable, reliable and maintainable way to deal with Markdown was to extract the frontmatter, convert the rest into bog-standard CommonMark HTML, and then use JSDOM to do any additional manipulation. So instead of fighting with some wonky AST and its APIs, I can use the DOM and standard web tools and code.
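To make the first step of that pipeline concrete, here is a minimal sketch of the frontmatter split. It is written in Python with only the standard library so it stays self-contained; it is not the Node code I actually used. The function name split_frontmatter is made up for illustration, and it assumes the frontmatter is a flat list of "key: value" lines between two "---" delimiters rather than full YAML. Converting the body to HTML and editing it through a DOM would be left to a CommonMark library and a DOM library of your choice.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a Markdown file into (frontmatter, body).

    Assumption: frontmatter is flat "key: value" lines between two
    "---" delimiter lines at the very top of the file. Real YAML
    (lists, nesting, quoting) is not handled here.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all

    # Find the closing delimiter.
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # opening delimiter never closed; treat as body

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "key: value"
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return meta, body
```

Calling split_frontmatter on a file's contents gives back a plain dict plus the remaining Markdown, which can then be handed to whatever converter produces the CommonMark HTML.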

But then I gave up. Markdown was created in 2004, when web browsers were anemic compared to today's monsters. Using Markdown in 2022 is like using COBOL. Sure, it works, but we can do better.

What I would like to see is a new HTML Document standard (none of the various implementations out there qualify) that mimics the core reason Markdown and other plain-text systems like AsciiDoc or LaTeX exist: to separate the writing from the presentation, but with some basic formatting as needed for most documents. There are various custom HTML doc formats out there: ePub and mobi files use HTML inside, as do Microsoft’s CHM and MHT. And there are a hundred zipped XML file formats out there - docx, odt, etc. But they’re either write-only, proprietary or too complicated for this purpose. This doesn't even include MIME HTML, CBOR files like Web Bundles, the Internet Archive's web archive format (WARC) and more.

What I would want is a simple .htmd standard file format, which - like all the "lightweight markup languages" out there - is just text containing a strict subset of HTML (no forms or iframes) and CSS which basically mimics the output of Markdown. It wouldn't have any JavaScript, enforced by the file extension/MIME type and CSP, nor embedded files like images. The subset would be limited to just semantic tags and reasonable formatting, to guarantee editable HTML. Nothing dynamic or crazy. Just pure WYSIWYG. If the W3C were to adopt the standard, it could also allow custom editor skins like CKEditor, TinyMCE, Trix etc. But again, with standard output. This would be great for online forums like HN or Reddit. In standalone apps, like Apple’s TextEdit or Microsoft’s WordPad, the output would be a cross-platform rich text document that is readable and writable by any browser or standard .htmd editor.
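As a rough illustration of what "a strict subset of HTML" could mean in practice, here is a small checker sketched in Python with the standard library's html.parser. The tag list is purely my guess at what such a spec might allow; nothing here is an actual .htmd definition, and the names ALLOWED_TAGS, SubsetChecker and check_htmd are made up for the example. It only flags disallowed tags and inline event-handler attributes, so treat it as a sketch of the idea, not a sanitizer.

```python
from html.parser import HTMLParser

# Hypothetical allow-list: semantic and basic formatting tags only.
# No script, form, iframe, or embedded media. The real set would be
# whatever a spec decided; this one is an assumption for illustration.
ALLOWED_TAGS = {
    "html", "head", "title", "meta", "style", "body",
    "h1", "h2", "h3", "h4", "h5", "h6",
    "p", "br", "hr", "blockquote", "pre", "code",
    "em", "strong", "a", "ul", "ol", "li",
    "table", "thead", "tbody", "tr", "th", "td",
}


class SubsetChecker(HTMLParser):
    """Collects every place a document steps outside the allow-list."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag not in ALLOWED_TAGS:
            self.violations.append(f"tag <{tag}> is not allowed")
        for name, _value in attrs:
            # Inline handlers (onclick, onload, ...) would smuggle in JavaScript.
            if name.startswith("on"):
                self.violations.append(f"attribute {name} on <{tag}> is not allowed")


def check_htmd(source: str) -> list[str]:
    """Return a list of violations; an empty list means the document passes."""
    checker = SubsetChecker()
    checker.feed(source)
    checker.close()
    return checker.violations
```

An editor or forum backend could run something like check_htmd on save and refuse anything that returns a non-empty list, which is the kind of guarantee that would keep the files editable everywhere.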

The idea is to Keep It Simple, Stupid, but also provide basic cross-platform WYSIWYG editing where the simple, clean formatting is always displayed exactly as it looks when editing. I used Typora, which is a great little rich text editor that uses WebKit for the interface, and then exports Markdown, which I then process into a web page. It’s insane. We need to cut out that moronic middle step.

Since a basic HTML Document editor doesn't exist yet, I made one. It's called Hypertext. Go try it out. I'm using it now to write this.

Browser engines have progressed so far since Markdown was created. It’s all a matter of standardization at this point. Keep the spec simple and focused on just creating simple documents. If someone wants to use the output as a full-on web page, then it’s just a matter of post-processing (just as is done now) and adding full-strength CSS, JavaScript, etc. The CommonMark spec could even be updated so that .htmd is the standard output of a processed .md text file.

The web has tilted too far towards the dynamic app end of the spectrum, and lost its roots as a document format. I think something like this would be a great way to get back to that.