The Plain Text Problem

I’ve run into a problem. I want to keep my series on using plain text going. I’d like to help people transition their lives away from excessive applications like Microsoft Word, but, as it turns out, the next logical step doesn’t exist.

There are several great options for writing in Markdown and tools for enhancing it, but getting your existing documents converted into it seems to be a real challenge. It’s one that needs solving.

Readers of the series have responded positively to the idea of a lighter-weight approach to writing, one that is far more portable and far more future-proof, but they have made it clear that they need a way to transition from where they are now. They want to take their backlog of DOC, DOCX and RTF files and convert them into either Markdown files or plain text files with Markdown formatting. At the moment, there’s no solution that’s viable for average computer users.

Several of my geekier friends have offered up possible solutions, but many of them involve scripts and geekery that are way beyond me. There’s nothing out there that is even remotely accessible to most people who are just starting to dip their toes into plain text and Markdown. There is, however, an interest in moving away from overwrought, overpriced technology and toward creating text in a way that can easily make its way to the web. This is a void that needs filling.

My skills fall far short of what’s needed here, but I thought I’d get the ball rolling and offer up the easy part. I can’t provide Markdown newbies the right tool, but I can offer some insight for developers into what they might need and want. Keep in mind, I’ve already done this the long, hard, stupid way, manually converting my essential files, so this is in no way, shape or form a personal request. I don’t need this, but I think there is a significant and growing audience who does. The tool I’m suggesting needs to be simple; it needs to be something that anyone at any level can use. The minute you start talking in code, scripts, services and macros, you’ve lost those who need this most. This needs to be a self-contained, user-friendly application, not some combination of slick scripts and services.

With this in mind, here’s a rough idea for a product aimed at the average person writing for the web. Users should be able to drag several files (including at least RTF, DOC and DOCX) into an app or onto a Dock or menu bar icon to begin queuing the files. Once a specific set of files has been added, the user should be able to go in and select:

  • Whether to output .md files or .txt files with Markdown syntax.
  • Text to prepend and append to each file name, to build a naming convention for your library. For example, a user could prepend a category tag like Blogx and append a date stamp, entered manually or pulled from file data such as the creation date. The new file would then read Blogx – Original File Name – 12-10-02.
  • Add OpenMeta tags (mainly because I worry that Brett Terpstra would kick me if I didn’t suggest this).
  • Choose a location to send your files, with the likely intent being a single flat folder for use in an application like nvALT (although I’d still leave the option to manually change the folder for those who prefer a folder structure… and before you tell me that Hazel can probably do this after the fact, remember… we’re talking average users here).
  • The option to add a link to the original file (or for brownie points automatically add a link to the file in places where certain aspects of the original file cannot be converted).
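
For developers, the naming step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a spec: the functions library_name, file_date and plan_batch are hypothetical, the date stamp is assumed to be year-month-day with a two-digit year (one reading of the 12–10–02 example), the separator is a plain hyphen, and the document conversion itself is not shown. A true creation date is only exposed as st_birthtime on some platforms, such as macOS, so the sketch falls back to the last-modified time elsewhere.

```python
from datetime import date
from pathlib import Path


def file_date(path: Path) -> date:
    """Date stamp pulled from file data (hypothetical helper).

    st_birthtime (creation time) only exists on some platforms, such as
    macOS and the BSDs; elsewhere this falls back to the last-modified time.
    """
    info = path.stat()
    return date.fromtimestamp(getattr(info, "st_birthtime", info.st_mtime))


def library_name(original: Path, prefix: str, stamp: date,
                 extension: str = ".md", sep: str = " - ") -> str:
    """Hypothetical convention: 'Prefix - Original Name - YY-MM-DD.ext'."""
    if extension not in (".md", ".txt"):
        raise ValueError("extension must be '.md' or '.txt'")
    parts = [prefix.strip(), original.stem, stamp.strftime("%y-%m-%d")]
    # Skip an empty prefix so the name never starts with a separator.
    return sep.join(p for p in parts if p) + extension


def plan_batch(files, prefix: str, dest: Path, extension: str = ".md"):
    """Pair each queued file with its new path in one flat folder.

    Name collisions (same stem, same date) are not handled in this sketch.
    """
    return [(f, dest / library_name(f, prefix, file_date(f), extension))
            for f in map(Path, files)]
```

For example, library_name(Path("Original File Name.docx"), "Blogx", date(2012, 10, 2)) returns "Blogx - Original File Name - 12-10-02.md". Keeping the naming logic separate from file access makes it easy to preview the whole batch before anything is moved.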

From there, users could batch their files, name them correctly by grouping them with keywords and migrate their archives into a new system. There’s probably a lot more that could be done, but as limited as my understanding is, I get the feeling that the conversion is the real challenge. I know guys like Brett Terpstra are starting to work on some of the geekier ways to do this, but a script won’t be the thing that helps those who need it most.

So anyway, if someone with… you know… actual tangible skills wants to build this, I believe there’s a growing need and audience who would be willing to pay in exchange for the time it would save them in making the switch to a plain text system.

13 Responses to The Plain Text Problem

  1. Michael – I totally agree with your suggestion and would definitely purchase an app that implements this process. As long as we are wishing, I would add HTML to the list of input formats (although HTML conversion is already possible thanks to Brett Terpstra).

  2. I would say the easiest path is to output to html (MS-Word can do this, and Open/LibreOffice can as well, giving cleaner html code; there are various scripts out there to clean up the html that ms-word provides).

    Assuming the author has written using stylesheets rather than individually formatting every paragraph, the html file can be opened in a text editor and then some simple search and replace commands will give a good markdown shape: “<h1>” becomes “<h1># ” (and so on for the other heading levels), “<b>” becomes “<b>**” while “</b>” becomes “**</b>”, and correspondingly “<i>” and “</i>” become “<i>*” and “*</i>”.

    The resulting html file can then be opened in Firefox and “Save As” text. The text file then is OK as markdown.

    Even quicker, though offering less control, is this path: from MS-Word (or Open/LibreOffice) open the .doc file, Save As html, then open the html file in Firefox and straight Save As text. I think this gives bold as surrounded by * and italics surrounded by / with headings indented by pairs of spaces – H1 by 2 spaces, H2 by 4 spaces, and so forth. A few search and replaces will then get the thing into decent-enough Markdown.

    Some hand tooling of the first mechanical Markdown translation will still be needed. But that gives the author a good excuse for one more round of quick scanning and copyediting.

    This comes from a writer who has periodically transformed long pieces from one format to another several times as he falls in love with one more tech fad after another, rather than getting on with the work of writing…


  3. Another fairly common issue I run into with plain text is collaboration. It’s great to read. It’s easy to write. It works on everyone’s machine. But there is no convenient way to do commenting or track changes on something that is longer than a thousand words.

    So what I’m really looking for is an application that does a really good “diff” between two plain text files, shows both versions side by side, highlights the differences, and provides a good interface for moving through each of the differences and merging the files. The real geeks will probably be able to do this using git or some other version-control software. I want a way to send plain text files back and forth with my girlfriend or my sister and give them an easy way to move through my changes.

    Maybe it exists, but so many people I know will just say “use Google Docs” or “use git” but that’s not really a solution. Track changes is a huge part of what makes Word successful.

    • If you’re conversant with Emacs, you could use ediff (found by compare buffers on the menu bar). It’s the best differencing tool I’ve ever seen, including a number of commercial ones. It allows you to apply changes from either document into the other with just a single keystroke, and when it highlights a difference, it can show you what’s changed on the line, not just the line that’s changed. Even though I’ve switched from Emacs to Vim for editing, I still use Emacs just for the difference tool.

  4. I don’t understand the need for Markdown or HTML for WRITING. For future-proofing today’s documents, it seems to me one way to go is searchable PDFs: the text is workable and selectable, and it keeps the formatting of the original document. Is the Markdown/HTML push about writing for the web, or about writing in general? And what exactly would Markdown be better for?

  5. I would start by asking why you are using plain text and why it makes sense. For me it’s the simplicity, the future “proof-ness” and the ease of editing regardless of platform, but I see no reason not to choose a different format if it is better suited to the goal I want to achieve.

    Once you expect to add more features and hope to use it for more purposes, you add more and more complexity. At some point you may be trying to fit a square peg into a round hole, creating more constraints than you eliminate.

    I think each file type has its strengths that make sense in a certain context. I recall listening to David Sparks describe how he had written two books. He started with plain text and Scrivener, and then, once he needed to collaborate with an editor, he moved to Word.

    Plain text is excellent for starting but may not be appropriate for a finished product.

  6. There is a way of round-tripping from other formats: Scrivener – and at $45 it’s not expensive. Scrivener will import RTF, doc, docx and various other formats and you can choose between a couple of conversion options. Whilst Scrivener’s internal format is RTF, it will export and import text docs in Markdown format. In addition, it will sync a project to Dropbox, giving you Markdown files to work on or collaborate with, and re-sync them with the project on launch. Even if all you use Scrivener for is a file translator and repository and all the work is done in Markdown, it’s still a cheap solution. As for track changes, why not comment or mark up changes in Markdown? If you want something more complex than that, Scrivener can do diffs on files in two windows to show you the changes that have been made. I’m not a fan of big, complex, multi-person track changes in any event – they’re worse than useless.

    Disclaimer: I have no connection with Scrivener or the team.

  7. I’ve been an advocate for plain text for some time – largely because I come from a Unix command line background. The tools you are looking for come in two flavors, each pairing a lightweight markup language with a set of scripts.

    The first is reStructuredText, a lightweight markup language that leaves your text files looking like text files but, when run through the Python scripts supplied by Docutils, produces very good HTML output. It can even produce HTML with boxes, tables and sidebars. reStructuredText and Docutils can also produce LaTeX output, which can in turn be typeset as PostScript or PDF.

    The other potential solution uses Perl and, again, a set of scripts. It too uses a lightweight markup language, one that is very similar to reStructuredText but in many ways simpler. These scripts, built around the Almost Free Text (AFT) markup, can produce good HTML, RTF and PDF output. They are very simple to use and will work across multiple platforms (although, as an anti-Apple person, I can’t speak for any of them working on Apple OS X).

    All that is necessary is for the writer to write in plain text using the markup of choice (AFT or ReST), run it through the appropriate script and, bingo, one web page ready to go.

  8. I’m running into a similar problem, while thinking about adopting version control for writers. A lot of the writers, editors and publishers I’ve been talking with seem to agree that the version control tools available for programmers would be great if they could be made more accessible for the average user-of-text but it seems the tools aren’t completely ready just yet.
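
The search-and-replace route described in the second comment can also be scripted. The Python sketch below illustrates that idea under stated assumptions; it is not a finished converter. The function name html_to_markdown is hypothetical, it uses regular expressions rather than a real HTML parser, it assumes fairly clean stylesheet-based HTML (headings, bold, italics, paragraphs), and its final tag-stripping step stands in for the commenter’s Firefox “Save As” text step. Tables, lists, links and images are simply flattened, so the hand tooling that comment mentions is still needed.

```python
import html
import re


def html_to_markdown(source: str) -> str:
    """Rough HTML-to-Markdown pass modelled on the search-and-replace steps
    in the second comment. Regex-based, so it only suits simple HTML."""
    flags = re.IGNORECASE | re.DOTALL
    # Drop the <head> (Word exports its style sheets there) and script/style blocks.
    text = re.sub(r"<head\b.*?</head\s*>", "", source, flags=flags)
    text = re.sub(r"<(script|style)\b.*?</\1\s*>", "", text, flags=flags)
    # Headings: <h1 ...> gains "# ", <h2 ...> gains "## ", and so on.
    text = re.sub(r"<h([1-6])\b[^>]*>",
                  lambda m: "#" * int(m.group(1)) + " ", text, flags=flags)
    # Opening or closing bold/italic tags become Markdown emphasis markers.
    text = re.sub(r"</?(?:b|strong)(?:\s[^>]*)?>", "**", text, flags=flags)
    text = re.sub(r"</?(?:i|em)(?:\s[^>]*)?>", "*", text, flags=flags)
    # Closing block tags end a paragraph; <br> becomes a line break.
    text = re.sub(r"</(?:p|div|li|h[1-6])\s*>", "\n\n", text, flags=flags)
    text = re.sub(r"<br\s*/?>", "\n", text, flags=flags)
    # Strip every remaining tag (standing in for Firefox's "Save As" text).
    text = re.sub(r"<[^>]+>", "", text)
    # Turn entities such as &amp; back into plain characters.
    text = html.unescape(text)
    # Tidy up: no trailing spaces, at most one blank line in a row.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

Run on HTML saved from Word or LibreOffice, this turns <h1>Title</h1><p>Some <b>bold</b> text</p> into “# Title”, a blank line, and “Some **bold** text”. A proper HTML parser would cope better with messy exports; regular expressions were chosen here only to mirror the manual search-and-replace steps.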
