Markdown vs HTML – hoedown or showdown?

On a mailing list recently, a friend asked: should my workflow use an HTML editor or markdown? There is, of course, no easy answer. It depends what trade-offs you want to make. At Fire and Lion we use markdown for book production, and I know very smart people who think that’s crazy. They’d pick an HTML editor any day.

What worries me about working in an HTML editor is that it must make assumptions about what the user intends. That is, HTML editors abstract what you see from what you’re storing. What you see is what you hope you get. The real source format is what the user types, but HTML editors effectively discard it, jumping straight to rendered output and hiding the stored markup from view. Non-technical users often have no idea what they’re actually storing, and very little control over it.

And those assumptions are the root of all editor evil: they inevitably lead to a hidden mess of legacy markup. As users edit, and especially as they paste from other sources, their editing software has to make guesses about what formatting and what HTML elements the user wants to keep and what it can discard. You only have to glance at the source of a heavily edited WordPress page to find a teeming mass of unnecessary spans, redundant attributes and inline CSS.
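
To see what that cleanup costs, here’s a minimal sketch of the kind of post-hoc scrubbing WYSIWYG output forces on you. It assumes Python 3 with the beautifulsoup4 package, and the pasted snippet and stripped attributes are invented for illustration; real editor output needs rules tuned to each tool’s quirks.

    # A minimal cleanup sketch (assumes Python 3 and beautifulsoup4).
    from bs4 import BeautifulSoup

    # Invented example of the markup a WYSIWYG paste can leave behind.
    pasted = ('<p style="margin:0pt"><span class="MsoNormal">'
              '<span style="font-family:Calibri">Hello</span> world</span></p>')

    soup = BeautifulSoup(pasted, "html.parser")
    for span in soup.find_all("span"):
        span.unwrap()                      # keep the text, drop the wrapper
    for tag in soup.find_all(True):
        tag.attrs.pop("style", None)       # drop inline CSS
        tag.attrs.pop("class", None)       # drop editor-specific classes

    print(soup)                            # <p>Hello world</p>

Every heavily edited project ends up needing its own growing pile of rules like these.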

Over time, the HTML gets messier, and that mess is swept under the rug of the WYSIWYG view. And as it gets messier, it becomes less portable, and conversion tools become less useful. Suddenly I can’t just reuse my HTML somewhere else without unpredictable results. For any given reuse I lose hours to cleaning up my HTML, effectively creating a whole new fork of my project, and losing the ‘single-source master’ feature of my workflow. I’m sure every new HTML editor aims to solve that problem, but I haven’t found one that’s solved it yet.

So that’s why I remain a champion of markdown-based workflows: there is no abstraction in the editor, because I’m only ever working in plain text. The simplicity of plain text means my content stays clean as I go, because there is no rug to sweep a mess under.

The bare bones of markdown have other spin-offs, too:

  • Markdown is more portable. By ‘portable’ I mean between people and between machines. For non-technical people, markdown is more open than HTML: it’s instantly readable and copy-pastable. A format that’s useful without a developer in the room is exponentially cheaper to work with, especially when you have to move it between machines.
  • The constraints of markdown force us to keep document structures simpler, sticking to fewer, standardised elements.
  • And diffs of plain-text markdown (in Git especially) are easy to read. We can use them in editorial workflows as is.

However! Those who prefer HTML editors are right that markdown has serious constraints. Or, rather, that HTML5 (like many markup languages) offers more features than markdown supports natively. For instance, markdown can’t produce tables with merged cells, create plain divs and spans, or manage nested snippets for things like figures. They’d rightly argue that the markdown editing experience can be clunky, especially to those accustomed to Word-like UIs. And that non-technical users mostly don’t share my concerns about messy underlying markup: they just want an editor that looks great and is easy to use.

Like tabs versus spaces, I don’t expect this debate will ever be resolved. What matters is that we each pick our own trade-offs, and respect the trade-offs others make.


I love you, InDesign, but it’s time to let you go

I love you, InDesign, but it’s time to let you go. We just can’t be together in a multi-format world.

InDesign is expensive, so I can’t have my whole team working in it. It’s so powerful that it takes years of experience to use it without making a mess. And it’s fundamentally incapable of producing both print PDF and ready-to-use HTML from a single master file, despite some amazing hacks.

Adobe has tried valiantly to turn this page-based, hot-lead-replacement into a multi-format tool, but its roots in print are just too deep. Making books in InDesign and converting them to high-quality ebooks and websites is a rocky journey that leaves even the smartest typesetters bloodied and broke.

At Fire and Lion we make a lot of books for screen and paper (mostly for publishing companies and non-profits). To our clients, what matters most is that the books are well-crafted in every format, and that working with us is problem-free. Behind the scenes, we have to do something special to make that possible.

So the first thing we do is avoid InDesign for all but the most heavily illustrated books. We don’t do page-by-page layout and convert to HTML later. In fact, we do exactly the opposite: we make each book as a little website, and then output to PDF.

To put it another way: Fire and Lion makes responsive websites that respond not only to screen sizes but to the pages of a book. And we do it so well that, looking at the finished product, you can’t tell the difference between our books and those you’d get from a typesetter working in InDesign.

We’ve been lucky to work with clients who’ve let us make their books with this cutting-edge toolset. You have to be brave to accept a GitHub repository as your open files, rather than an InDesign package; but it’s brave people like that who move our industry forward.

Nothing we’re doing is a secret: our workflow is open. So when we’re not making books, we’ll be talking about how we make them. If you’re working with similar tools, or curious about ours, let us know.

Three things every editor should know about digital publishing

A while ago I gave this talk at the Cape Town Professional Editors Group. Here are my speaking notes.

Today, every passage you edit will sooner or later be read on screen. This digital world desperately needs our craft and high standards, but what does that mean for our daily work?

So what does this digitisation thing really mean for editors? I think, basically, it means you’re editing text that will be read on a screen. Importantly, you’re editing text that will be read on a screen and on paper.

Now if you’re going to edit for the screen, the single most important thing is to actually read on a screen yourself. If you aren’t reading on screen, you simply cannot edit for the screen. Just like you can’t fix a car if you’ve never ridden in one.

That said, we are all busy people and there is an infinite amount to learn about computers: the rate at which the technium evolves far outstrips the rate that we can understand it. Even the greatest minds in computing readily admit that the Internet is now bigger and more complex than any one person understands.

So the trick is to not try too hard to learn it. Rather, just start using web- and screen-oriented tools and the learning will come when you need it. No one went to a whole seminar on how to use email before they sent an email.

In the next thirty minutes or so I’ll pick out three big, important developments and talk about some of the tools we’re using to tackle them. This is basically show and tell.

The first is text-only editing. That is, the end of word processing as we know it.

The second is real-time, collaborative editing.

And third, I’ll talk a little about automagical pagination: how do we edit when there’s no such thing as ‘page two’?

Text-only editing

First, what is text-only editing? Text-only editing is editing in plain-text files. When you do this, you’ll probably be using a particular writing structure called Markdown. For instance, let’s use Stackedit to write plain-text markdown. Type on the left, and on the right Stackedit turns our plain text into formatted HTML.

On the left, I type plain text in a markdown structure. On the right, formatted HTML.

What we’re seeing here is the separation of content (which is structured text and image-references only) from formatting and design.
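
If you want to reproduce that separation outside Stackedit, here’s a tiny sketch using Python’s markdown package (my choice for illustration; Stackedit uses its own converter):

    # Plain text in, formatted HTML out (assumes: pip install markdown).
    import markdown

    source = "## A heading\n\nSome *emphasised* text."
    print(markdown.markdown(source))
    # Prints:
    # <h2>A heading</h2>
    # <p>Some <em>emphasised</em> text.</p>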

What are the big advantages of text-only editing?

  1. Smaller, faster files.
  2. Computers need perfect consistency (the digital age is a wonderful place for obsessive copy editors). Here the tools force our hand, and we learn to be less sloppy.
  3. Text-only means fewer copy-paste messes (when you copy and paste into a new document and the fonts go all weird), because you’re getting only and exactly what you see: plain text. We do have to learn some new tricks, like unicode glyphs – there’s no ‘Insert symbol’ menu, and no formatting gimmicks like superscripting an ‘o’ for a degree symbol. This is actually a good thing, even if it seems like more work at first.
  4. Less file corruption, because there is simply less going on – less code to go wrong.
  5. Better version control, especially if you learn to use a tool like Git.

Collaborative editing

Collaborative editing has literally changed the way I write, edit and deliver documents.

What is collaborative editing? In short, me and someone else editing the same online document at the same time. The biggest tool for this is Google Docs.

What are the major pros of collaborative editing?

  1. It lets others watch while you work. And you can watch while others work. Publishing is weird because it’s always been a team sport played by lonely freelancers from their own home offices. Collaborative editing instantly makes the team aspect real and useful.
  2. You can use commenting for feedback and discussion. Track changes just isn’t the same as actual live annotation. No more emailing documents with increasing repetitions of the word ‘final’ in the file name. (Also, see Hypothesis.)
  3. Instant delivery of work and real-time review. As soon as you’re ready for your client to check something, share the doc and the ball’s in their court. So much editing is problem solving, and collaborative editing means the publisher-editor-designer are basically always in the room together at the same time.

I cannot believe that Google Docs has been around for years and people are still editing in MS Word. I promise, promise, promise you want to move all your writing and editing into Google Docs. (You could also use something similarly cloud-based with live collaborative editing but, for better or for worse, most people are familiar with Google and already have Google accounts).

Automagical pagination

Lastly, what is automagical pagination? Well, on screen, our software and screen size are going to decide how much text is on the ‘page’, the visible area in front of us. On screens we might refer to this as the ‘viewport’. If you’ve used Kindle, iBooks, or Google Play Books you probably know what this looks like.

There are a few key issues that arise when text flows into a viewport. And very importantly, when you’re editing the same text for both that viewport and also print output.

  1. Hyphenation and non-breaking spaces. Of course you never want to put a hard hyphen into a line, because that line will be made and remade at countless different lengths in its life, and you don’t want your hyphens turning up in the middle of a line. You also don’t want the space in a number like 100 000 breaking over a line, so you need to learn how to insert a non-breaking space. And there are several other glyphs that have similar complications, like ellipses and en dashes. (There’s a small scripted example after this list.)
  2. Cross-references. That is, referring to other places in the document. On screen, you can’t say ‘see page twenty’, because ‘page twenty’ is completely different on my computer and on my phone. You can’t say ‘Click here to go to the figure’, because in print there is nothing to click. And you can’t say, ‘in the figure below’, because on screen the figure might shift position. Common solutions are to introduce numbering systems for sections and figures, or to completely rephrase cross references. (Some smart digital-first workflows let you insert a variable that becomes a page reference in print, and is a hyperlink on screen.)
  3. Elements that appear on screen but not in print. For instance, let’s say you want to include a YouTube clip in an ebook, but you can’t have the clip in the print version. In some systems, it is possible to mark certain elements to appear in one version but be completely hidden in another.
  4. Minimalist courage. For maximum compatibility with unknown reading systems, you have to use fewer, more carefully-chosen features. You can’t have ten different variations on headings or boxes. Pick very few features, and treat them consistently with the same styling rules. Make no single-instance exceptions (e.g. never say “I’ll make this one heading smaller because otherwise it’ll look funny here.”)
  5. Strict content hierarchies. You have to place every feature of the book in a hierarchy, as if your whole book was a tree of trunk, branches and leaves. Computers need hierarchy.
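
Here’s the small scripted example promised in item 1: a rough sketch of inserting non-breaking spaces into numbers automatically. The regex is illustrative, not production typesetting logic.

    # Replace the ordinary space in digit groups like '100 000' with
    # U+00A0 NO-BREAK SPACE so the number can't split across lines.
    # (Plain Python 3; the pattern is a rough illustration only.)
    import re

    text = "Sales grew from 98 000 to 100 000 units."
    fixed = re.sub(r"(?<=\d) (?=\d{3}\b)", "\u00A0", text)
    print(fixed)  # the spaces inside both numbers are now non-breaking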

There is lots more we could go into here but there isn’t enough time and we’d bore half the room. And as I suggested at the start, it doesn’t matter how much you try to stuff in your head now, when the only way to make it useful and make it stick is to deal with issues as they come up in your work.

I hope that you have some concrete questions, though, so we can spend some time dealing with those real issues that you’ve already come up against.

Uncapped: Can humans handle a blank cheque for data?

I remember the first time I used uncapped Internet. I was sitting in a friend’s coffee shop in Bristol, on a trip from South Africa, and my brain exploded. Till then, every time I’d clicked a link I’d worried about how much data it would use. I’d had to think carefully about every video, every download, every large page. And that anxiety was a constant force against browsing freely. That was just the way the web worked: you had a data allowance and, no matter how big it got, you had to use it wisely.

So uncapped Internet was not just fun, it was a revelation. Revolution, even. Suddenly nothing stood between me and anything my heart desired. I could indulge my curiosity at will. Games, video, porn, software, music, and learning. So. Much. Learning. That hour in a cafe in Bristol literally changed my life.

And now so many of us — almost anyone at the wealthier end of middle class — just take it for granted. Once we cross from limited to unlimited — from finite to infinite — we easily forget what it’s like to manage with limited resources. Our behaviour on one side of that line is very different from our behaviour on the other.

Doug Hoernle, founder of mobile-education business Rethink Education, tells me their young, low-income users are relentlessly careful about the data they consume on their phones. When a class is doing research, one student will open Wikipedia, screenshot the page, and WhatsApp the image to everyone else. This isn’t just to save data, but to ration it: they don’t know how big the Wikipedia page will be, but they can tell exactly how big a screenshot is before they download it. Similarly, before they decide to download a free app, they’ll check its size and calculate its cost in data: its real price. And even then, they’ll weigh up carefully whether that app’s size in their cheap phone’s memory is worth all the photos they could save in its place. For them, there is no such thing as ‘free’.

As much as I love unlimited data, it has a real danger: without a budgetary constraint on our browsing, there is far less pressure to choose carefully. We just open the Internet firehose and let it run. We’ll curate later, we think, and then we complain that there’s so much crap on the web and we never have any time for ourselves or our work. And then, perversely, we turn that firehose back on the web and upload our own stuff — often half-baked — for others to deal with.

Humans have a lot to learn about managing the firehose. I may learn a few tricks in my lifetime, and I’ll pass them on to my son, who’ll learn a few more. It’ll take generations for us to be comfortable, confident, happy dealing with an unlimited supply of information.

For some of those lessons, we should look to those who’re still capped, for whom every byte counts. How do they make their decisions? Who influences them? What sites and apps really matter? Are their constraints teaching them — at least till they cross that great divide — how to be more discerning people? Maybe they have something to teach us about priorities.

(This post was first published on Medium.)

Institutional licensing: the next textbook business model

Right now, in South Africa, the textbook-publishing industry faces a real threat to its future, because – faced with constant non-delivery of books – government is desperate to change the way it buys them. Other countries face similar challenges. Government officials often cite the ‘high price of books’ as a key issue. Whether you buy the state’s argument or not (I don’t, though I get why they’re fed up), the system is going to change, and publishers can either lead the change or be changed.

Are textbook publishers planning or being planned for? (image by Pedro Vezini, CC-BY-NC-SA, Flickr)

To have any real effect, the change must be a fundamental change in the mainstream textbook business model: instead of selling copies, we must sell licences. Specifically, we must sell licences with no limitations on the number of downstream copies. This is especially urgent and appropriate for school books, though it’ll work for universities and colleges, too. And it’s perfectly suited to a digital future of tablets and ebooks. Best of all, it will save government money and make publishers’ jobs simpler.

I’ll describe the model in some detail as I see it, and please contribute your thoughts and criticisms in the comments or directly with me. I’m particularly interested in hearing where similar models have already been implemented.

The textbook problem

The root of our problem is per-copy pricing. (I’ve written about this at length before.) From the moment they’re conceived to the time they’re paid for, textbooks are priced per copy:

  • If I’m a commissioning editor planning a new book, I use a spreadsheet to cost it. First, I enter my per-copy selling price, based primarily on what my customers are used to paying. Then I multiply that price by a sales guesstimate, and work out my production costs. If price × sales covers my costs, leaving some profit (often 10–15% nett), I can publish. (There’s a sketch of this arithmetic after this list.)
  • If I’m a writer, my potential royalty is a small percentage of the per-copy price. Will it justify the time I’ll spend writing? I do the maths in my head to work out how much I get from every book sold; but I have to gamble: what I’ll really earn is utterly unpredictable.
  • If I’m a marketing person, my job is to convince customers that my per-copy price represents ‘good value’, as if the effect of a given book on a child’s education can be reduced to currency. I may have a lot of work to do if I’m going to hit the publisher’s sales guesstimate. Or not – there is really no telling beforehand.
  • If I’m a state employee buying books for schools, I want to work out how many copies I can buy. So I divide my budget by the per-copy price of the books. For instance, one million rand divided by R100 per copy equals 10 000 copies. If I want more books, I ask the publisher to lower their price (or I change the rules to force a race to the per-copy-price bottom). The publisher then makes a judgement call, weighing maximum profitability against the possibility of losing the sale.
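
Here’s the sketch of the commissioning editor’s arithmetic mentioned above. Every figure is invented for illustration; the point is only that the whole calculation hangs on the sales guesstimate.

    # The commissioning spreadsheet in miniature (all figures invented).
    selling_price = 120            # rand per copy, what customers expect to pay
    sales_guess = 10_000           # the guesstimate everything rests on
    production_costs = 1_050_000   # writing, editing, design, setup

    revenue = selling_price * sales_guess
    profit = revenue - production_costs
    print(f"Profit R{profit:,}, margin {profit / revenue:.0%}")
    # Profit R150,000, margin 12% -- but halve sales_guess and this
    # 'profitable' book makes a large loss, which is the fiction at work.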

Here is a scary truth: from a textbook’s conception till the day the cash is banked, the whole system is based on false assumptions. The per-copy price is based on predicted sales, but there is no way to accurately predict sales for a given book. When future sales are always a wild guess, every aspect of a book’s financial forecast is fictional. So the per-copy price is fictional, too. For any given book, the per-copy price is probably much higher or much lower than it needs to be to make the publisher a sensible margin.

Per-copy prices only start looking accurate when averaged out over dozens of loss-making and profit-making books. Once you’ve sold many different books, this fiction begins to look like truth, till we hear publishers say, in defending their prices, ‘books must cost x per copy because it costs y to create them’. Publishers say this all the time, but it’s an illusion of averages, a mirage of the fictional world we’ve chosen to sell our books in.

This is why larger publishers seem so much more successful than small publishers: the big ones don’t make better decisions, they just publish enough books to even out their guesswork.

Since everyone is in on the price-per-copy game, the entire industry is set up to make the fiction look like truth, by building business models around per-copy margins.

How did we end up in this fiction? We confused customers with beneficiaries, and our product with our solution. When your customer is your beneficiary, per-copy-pricing makes sense, as it does in general-interest publishing – when I buy a novel for myself I am the customer and the beneficiary. But in education, especially in state schooling, large institutions are our customers and their students are our beneficiaries. The state doesn’t want ‘ten thousand copies’, it wants ‘every child to have textbooks’. Those are two fundamentally different products. As publishers, we keep trying to sell ‘ten thousand copies’, when we should be selling the solution to the state’s problem.

The licensing alternative

Per-copy pricing is so ingrained that an alternative model seems inconceivable. But licensing is not only worth exploring, it’s already happening. As I’ll explain later, it’s been around for years under other names.

What do I mean by licensing? Instead of selling copies of finished textbooks (in paper or as rights-managed ebooks), a publisher sells a single, flat-fee licence to an institution. That licence lets the institution produce and distribute as many copies as they like for their beneficiaries, in print and digitally, for as long as they like.

For example, let’s say the state needs to give each learner a grade-ten maths textbook. It has to allow for a lot of uncertainty:

  • It doesn’t know how many learners will take maths over the next few years.
  • It doesn’t know when central government will change the curriculum.
  • Some learners need the book in print, and others as an ebook. This ratio will vary constantly across the country.
  • Different schools have different kinds of learners, so schools need to be able to choose, from a wide range, the books that best suit theirs.

The state’s textbook team looks at several publishers’ books and identifies their favourites by publishing an approved-books list. Teachers then get to see the books and say which ones they want to teach with.

In the traditional per-copy-pricing system, provinces would estimate how many copies they need before placing orders. For the ebooks, they’d have to know whether the publishers’ ebooks will work with the province’s ebook platform, and especially their DRM scheme, and whether those ebooks come with time-restricted licences (e.g. some publishers’ ebooks expire after, say, three years). Some provinces also work through bookstores, whose margins are included in per-copy prices.

Under a licence-based system, instead, central government would buy a licence from each publisher in return for a flat fee. Each publisher hands over a set of open digital files watermarked (in document metadata and on visible pages) with plain-language details of the licence. For instance, the title page says ‘Smart Maths 10 may be freely used by teachers and learners at government schools in South Africa. For any other uses, contact the publisher.’ Each publisher provides the state with:

  • a print-ready PDF
  • a web-optimised PDF
  • an epub file
  • potentially a web-based, free-to-view version
  • potentially a content-database API.

Each of these formats makes the licence more valuable to the state. The licence lets the state print their own copies centrally in big runs, and lets its schools and parents print their own extra copies as needed. The licence lets them email the ebooks, with no digital rights management, to all their schools. This gives all teachers electronic access to all approved textbooks, in addition to the printed copies of the one they chose for their own students. This diversity enriches their teaching and exposes teachers to new publishing brands.
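
The plain-language watermark described above is easy to automate, at least for the metadata half (the visible title-page note belongs in the book’s layout). Here’s a sketch using the pikepdf library; the file names and licence wording follow the example above and are otherwise invented.

    # Stamp the licence wording into the PDF's metadata
    # (assumes pikepdf: pip install pikepdf; file names invented).
    import pikepdf

    LICENCE = ("Smart Maths 10 may be freely used by teachers and learners "
               "at government schools in South Africa. For any other uses, "
               "contact the publisher.")

    with pikepdf.open("smart-maths-10.pdf") as pdf:
        pdf.docinfo["/Subject"] = LICENCE   # plain-language licence note
        pdf.save("smart-maths-10-licensed.pdf")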

If the publisher provides a content-database API, too, the state, provinces and schools can integrate the content and exercises into Learner Management Systems. (This in turn feeds usage data to the publisher. If there is a free-to-view website version, the publisher also gets to collect data on user behaviour there.)

Importantly, there is no need for expensive DRM systems to control access to digital textbooks. Right now, the per-copy-pricing model requires installing and maintaining content servers running DRM schemes that uniquely identify devices and students and lock content to each, tracking identities with hardware addresses and user logins. In under-resourced state schools, DRM is going to be a massive, complicated expense fraught with technical and educational problems. With DRM-free ebooks and free-to-view websites, all that goes away. Learners and parents can even use their own devices in addition to those provided by the state or the school, which in turn mitigates the problem of lost or damaged devices and the security risk of having children carry valuable devices around.

The licence fee

The big question is this: how much should a licence cost?

We can safely say that a high-school textbook costs about R1m (US$100K) to create. That includes paying writers, editors, artists, photographers, designers, developers, project managers, and management and admin support staff, all of whom will be working on several books at a time. That means that for a R2m licence fee, a publisher could make a good margin that lets them invest in new books before future licence deals are secured, and cross-subsidises titles that don’t get licensed.

What would it change for the state? We can make some ballpark estimates. If a country has 100 000 grade-ten learners enrolled for maths, and buys a R120-per-copy textbook for each one, it’ll spend R12m on physical books before distribution costs. If they can’t deliver a copy to every child (perhaps they under-order or under-deliver in some districts, as often happens), they’ll incur further purchasing and logistical costs to fill the gaps. If they want to buy a further, say, 10 000 ebooks at R100 each, they’ll spend another R1m before the cost of DRM infrastructure, training and maintenance. If we conservatively ballpark the DRM setup at R10m per year, the state is in for at least R23m. (Today, the South African government budgets about R5bn for books, which is very roughly R20–50m per subject per grade.)

Let’s say the state wants ten different maths textbooks for teachers to choose from. If they bought a once-off R2m licence from each publisher instead, they’d spend R20m on licences. The state could then print 100 000 copies for, say, R3m, depending on the books’ specs.

But now that the licence is paid for, next year they could print more copies without paying any further fee. Schools could fill their own gaps by printing off extra copies on school copiers or at nearby print shops, reducing the need for top-up orders. This saving would apply every year till the curriculum changes, when the publisher creates a new book for a new licence sale. Curriculum changes and updates are inevitable, so there will always be work for publishers.
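
Pulling those ballpark figures together as a sketch (all numbers are the rough estimates above, not real pricing):

    # Year-one comparison of the two models, in rand (rough estimates only).

    # Per-copy model:
    printed = 100_000 * 120       # R12m of printed copies
    ebooks = 10_000 * 100         # R1m of DRM-locked ebooks
    drm = 10_000_000              # conservative DRM infrastructure ballpark
    per_copy_year_one = printed + ebooks + drm          # R23m

    # Licence model: ten flat licences plus the state's own print run.
    licences = 10 * 2_000_000     # R20m, paid once per curriculum cycle
    print_run = 3_000_000         # R3m, depending on the books' specs
    licence_year_one = licences + print_run             # R23m

    # Year one is a wash; from year two the licence model pays for
    # printing only, while per-copy purchases and DRM costs recur.
    print(per_copy_year_one, licence_year_one)          # 23000000 23000000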

Moreover, there is almost no cost for distributing ebooks, because there is no need for DRM, which is a major cost factor in ebook distribution, given the skills and server infrastructure required to manage it.

Licence fees would likely be negotiated over time and price bands might settle. Ideally, quality, popularity and value-added services (such as LMS-pluggability or multi-platform support) might factor into final pricing. This would be an important way to keep a diverse range of publishers in business. Book diversity is critical to a healthy educational system.

Leakage

What happens when the textbook leaks out of the institution? For instance, if a private school uses the textbook that was licensed to the state. Or, in contravention of the licence terms, a university puts on its intranet a textbook only licensed to another university.

Exactly what happens today when an institution illegally copies a book for its students: the publisher can choose to take legal action. Under a licence-based system, this is much easier to do than in the per-copy-pricing system:

  • First, the licensed files can be watermarked (and even digitally fingerprinted), which makes them trackable and identifiable downstream. The traditional per-copy-pricing system makes watermarking a book with every customer’s details very difficult.
  • Second, institutions that matter as customers are generally easy to identify and take legal action against. They are legally exposed and must stay above the law just to stay in business.

Of course it would still be important that anyone can buy single copies from publishers at a per-copy price for small-scale or private use. But those sales would be small compared to licence-based revenue.

Institutional licensing by any other name

Licensing like this is not new. The South African Department of Basic Education’s (DBE’s) printing and distribution of Siyavula books is one similar system. While Siyavula books are open-licensed for anyone, not just for a specific institution, the effect is the same. Essentially, Siyavula’s philanthropic and corporate funders have paid the licence fee up front on behalf of the education system. From then on, the state can distribute copies for as long as the books suit the curriculum.

Institutional licensing is already the norm for many publications: every time any institution commissions a publication, they are effectively buying a licence to make and distribute unlimited copies to their beneficiaries. For instance, if a medical company pays a small firm to brand a book on diabetes for them, they can print and give away as many copies as they like. Many media-production companies run this way, producing IP for institutions to distribute to their beneficiaries. There is no reason this can’t work for the ones we call textbook-publishing companies.

There will still be a measure of risk-for-reward: a publisher is a company that invests in creating content in advance of a sale. After much of the initial investment is made, the state must approve and buy the licence for the investment to pay off. But licensing is a much simpler model than per-copy pricing, with fewer marketing overheads.

In another way, institutional licensing is already baked into the DNA of the publishing industry. When a publishing company signs a contract with an author and pays a flat fee for the right to distribute their work, they are entering into exactly the kind of unlimited-copy, institutional licensing scheme I’m describing.

The challenge

To become reality, institutional licensing for textbooks requires a special kind of alchemy: perfect timing. The state must phase in a licence system at a time that fits with curriculum change and book-publishing timeframes. This is very difficult. And publishers must be ready to sell licences when the state comes knocking, despite limited resources and immediate challenges.

In South Africa, our current crisis provides, perhaps, the opportunity we need: a state department determined to change the way they buy books, and a clan of publishing companies facing an uncertain future. We are about to see what we’re made of, and it’s an exciting time to play a part.