Exporting HTML from QuarkXPress

A recent post gave an overview of Quark’s ability as a web-authoring tool, and my conclusion was essentially ‘nice looking pages, shame about the code’. Alternatively, you can export a string of text to HTML using the same method you would use to export Quark text to Word, although the results are similarly unsatisfying.

Word 2003 also produces some pretty horrible code and offers less web functionality. I’m informed that Word 2007 has improved on this somewhat, but it still has some way to go. (If you have experience of Word 2007 and Quark, please leave a comment on this page.)

Here’s an illustration. Open up Notepad (or similar) and type the following:

<p>Hello</p>

Select ‘Save As’, then type ‘file.htm’ into the filename box; this produces an HTML file that shows your default browser’s icon. Double-click the file to view the results. For comparison, open MS Word, type ‘Hello’ and save it as a webpage, then open that file in Notepad to see the code. You’ll agree that it’s awful, and there’s a lot of it. (QuarkXPress is no better.)
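If you want to put a number on ‘a lot of it’, this short Python sketch writes the same one-line file and compares its size with another file. The function names are made up for illustration, and the file name `word.htm` in the usage line is hypothetical: substitute whatever name you gave Word’s export.

```python
from pathlib import Path

def write_minimal_page(path: str) -> int:
    """Write the one-line page from the Notepad example; return its size in bytes."""
    Path(path).write_text("<p>Hello</p>", encoding="utf-8")
    return Path(path).stat().st_size

def size_ratio(bigger_file: str, minimal_file: str) -> float:
    """How many times larger one file is than the hand-written page."""
    return Path(bigger_file).stat().st_size / Path(minimal_file).stat().st_size

if __name__ == "__main__":
    print(write_minimal_page("file.htm"))  # prints 12
```

Usage, once Word has saved its version: `print(size_ratio("word.htm", "file.htm"))`.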

The straightforward way to publish online is to make a web-optimised PDF, but there are also good reasons to do it in HTML. There are proprietary tools that will produce nice code for you, but buying such a product might not be an option.

What I’m going to describe is tricky and a bit of a work-in-progress. It’s not an elegant solution, but it works reasonably well and will hopefully save you some time. It’s a better alternative to making a text-only file and then formatting it paragraph by paragraph.

Why is ‘nice’ code important?

The main reason is Cascading Style Sheets (CSS). Like the style functions in Word and Quark, they remove the need for manual formatting and keep content and formatting separate. With no formatting to worry about, for the most part all you need to do is produce bog-standard HTML (with perhaps some additional styles created by whoever wrote the CSS). This is the principle on which WordPress and other blogging platforms are founded, and the reason why we can change our page templates so easily.

Additional reasons to use standard code are accessibility and the desirability of keeping file sizes down.

The aim of this tutorial

This tutorial will essentially describe how to convert a long text-string from Quark into HTML. It is likely that you’ll still need to go into the code to fix some glitches and you’ll have to add any images, tables and suchlike manually. It’s a bit of a hack solution, but it has worked for me. As always, if you have any suggestions, please email me or leave a comment.

Structural considerations

The structure of HTML is essentially similar to that of any large document. In a book, you’ll have a visual hierarchy of styles where, for example, a chapter heading is superior to a main heading, which is in turn superior to a subheading. Likewise, HTML has headings <h1> to <h6>. You’ll also be using ordered (numbered) and unordered (bulleted) lists, whose items are <li> tags wrapped in <ol> and <ul> tags respectively. Your body copy will correspond to the HTML paragraph tag <p>. In a book or report, the chances are these tags will cover the vast majority of your document. The trick is to produce plain text that is wrapped in these tags.
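To make that correspondence concrete, here is a minimal Python sketch. It is not part of the Quark or Word workflow. It takes paragraphs already labelled with their HTML element and wraps each run of consecutive list items in a single <ul> or <ol>. The `(tag, text)` input format and the function name `to_html` are assumptions for illustration.

```python
from html import escape

def to_html(paragraphs):
    """Turn (tag, text) pairs into HTML.

    Block paragraphs use 'h1'..'h6' or 'p'. A list item is labelled 'ul'
    (bulleted) or 'ol' (numbered); consecutive items with the same label
    share one list element.
    """
    out = []
    open_list = None  # 'ul' or 'ol' while inside a list
    for tag, text in paragraphs:
        if tag in ("ul", "ol"):
            if open_list != tag:
                if open_list:  # switching list type: close the old list
                    out.append(f"</{open_list}>")
                out.append(f"<{tag}>")
                open_list = tag
            out.append(f"<li>{escape(text)}</li>")
        else:
            if open_list:  # a normal paragraph ends any open list
                out.append(f"</{open_list}>")
                open_list = None
            out.append(f"<{tag}>{escape(text)}</{tag}>")
    if open_list:
        out.append(f"</{open_list}>")
    return "\n".join(out)
```

For example, `to_html([("h2", "Introduction"), ("p", "Body copy."), ("ul", "First bullet"), ("ul", "Second bullet")])` gives an <h2>, a <p> and one <ul> holding two <li> items.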

Step-by-step HTML from Quark

1. Get rid of all formatting, numbers and bullet points

To guard against any residual fonts creeping into the HTML, convert all the text to 12pt Times New Roman. You can do this either in the style sheets or by replacing the fonts listed in the Usage box with Times New Roman. Use find and replace to get rid of all the bullet points in the document, and remove the numbers in numbered lists too. (Of course, this assumes your bulleted and numbered lists are built using styles, so they can still be identified in the later steps.)

Bear in mind that the aim is to produce HTML: the CSS and the browser will do the formatting for you.
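In Quark itself this is a job for the Find/Change dialog. If you are instead cleaning up text where the bullets and numbers were typed in by hand, the same idea can be sketched in Python with a regular expression. The markers it recognises (a bullet character, an en dash or a hyphen, or a number such as ‘1.’ or ‘12)’, followed by a space or tab) are assumptions; adjust the pattern to match your document.

```python
import re

# Assumed markers at the very start of a paragraph: a bullet (U+2022),
# an en dash (U+2013) or a hyphen, or a number followed by "." or ")",
# then at least one space or tab.
LIST_MARKER = re.compile(r"^\s*(?:[\u2022\u2013-]|\d+[.)])[ \t]+")

def strip_list_marker(paragraph: str) -> str:
    """Remove one leading bullet or list number, keeping the text itself."""
    return LIST_MARKER.sub("", paragraph, count=1)
```

Because it would also strip ‘2. ’ from a numbered heading, apply it only to paragraphs you know are list items.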

2. Save the text as MS Word

Save to Word using File>Save Text (if you’re unsure how to do this, it’s on the same principle as the method discussed here). Your Quark styles will carry through into the Word version.

3. Apply Word styles to the text

Open your new Word file. In Word, Format>Styles and Formatting will display the styles. This is where it gets a bit tricky, as Microsoft has generally overcomplicated this function and it’s a mess. You may have to go to the bottom of the Styles and Formatting pane and select Show: Custom, then display all styles.

The heading styles you’re looking for are Heading 1 to Heading 6, which correspond to <h1> to <h6> in HTML. By convention, Heading 1/<h1> is the title. Click the drop-down menu next to your main heading style, choose ‘Select All’, then apply Heading 2. Apply Heading 3 to the next heading level down, and so on.

For body text, you need to apply the style Normal (Web).

4. Bullets and numbers

Bullets and numbers are tricky: try as I might, I couldn’t find a setting that produced the right tags. The workaround is a hack. Select the styles that apply to bullets and numbers and apply one of the near-redundant HTML styles on the list instead (I use HTML Typewriter). You may want to do the same for other frequently used styles, but keep a note of which stand-in style you’ve used for each.
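One way to keep that note is in a form a script can reuse when you come to fix the code. This Python sketch records the mapping from steps 3 and 4 as a dictionary. Heading 1 to Heading 6 and Normal (Web) are Word’s built-in style names; the HTML Typewriter entry reflects the stand-in trick above and is an assumption you should change to match the styles you actually used. `STYLE_TO_TAG` and `tag_for` are hypothetical names.

```python
# Word style name -> HTML element it should become.
# "HTML Typewriter" is only the stand-in used for list items,
# so edit that entry (or add others) to match your own note.
STYLE_TO_TAG = {
    **{f"Heading {n}": f"h{n}" for n in range(1, 7)},
    "Normal (Web)": "p",
    "HTML Typewriter": "li",
}

def tag_for(style: str) -> str:
    """Return the HTML tag for a Word style, failing loudly if it isn't mapped."""
    if style not in STYLE_TO_TAG:
        raise KeyError(f"No HTML tag recorded for Word style {style!r}")
    return STYLE_TO_TAG[style]

if __name__ == "__main__":
    for style in ("Heading 2", "Normal (Web)", "HTML Typewriter"):
        print(f"{style} -> <{tag_for(style)}>")
```

Raising an error for an unmapped style is deliberate: it flags a style you forgot to convert rather than silently turning it into a plain paragraph.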

5. Save as html and fix the code

Saving as ‘Web page, filtered’ seems to produce better results than saving as ‘Web page’. You can then open your file in Notepad to do some more fixing.

  • All the code before the first line of copy and after the final line can be deleted. If you use a content management system like WordPress, you don’t need it. If you’re passing the HTML on to a web designer, he or she can sort it out. If you’re doing it yourself, you’ll know what code to use.
  • The bullets and numbers, to which you’ve applied a redundant style, will have code like this:
<p class=MsoNormal><tt>Bullet point 1</tt></p>

You can fix this using find and replace. In this example, replace <p class=MsoNormal><tt> with <li> and </tt></p> with </li>. These will all display as bullets in a browser, but to make valid HTML you’ll have to add <ul> before and </ul> after each group of list items to make proper bullets, or <ol> and </ol> to make numbers.

  • You’ll have to do a similar hack for any other styles you’ve swapped for a redundant style in this way. You may want to turn one into <blockquote>, for example.
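If you have many lists, the find-and-replace above can be scripted. This Python sketch assumes your list items look exactly like the <p class=MsoNormal><tt>…</tt></p> example (check your own file, as Word’s attributes can vary). It keeps only what sits inside the <body> element and wraps each run of consecutive items in one list tag. Regular expressions are adequate for this narrow, predictable pattern, but they are not a general HTML parser. The function names are hypothetical.

```python
import re

# Assumed shape of Word's filtered output for the stand-in list style.
TT_PARAGRAPH = re.compile(r"<p class=MsoNormal><tt>(.*?)</tt></p>", re.DOTALL)
# One or more <li> elements separated only by whitespace.
LI_RUN = re.compile(r"(?:<li>.*?</li>\s*)+", re.DOTALL)

def body_only(html: str) -> str:
    """Keep only the content between <body ...> and </body>."""
    match = re.search(r"<body[^>]*>(.*)</body>", html, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else html

def tt_to_list(html: str, list_tag: str = "ul") -> str:
    """Turn stand-in <tt> paragraphs into <li> items and wrap each run in a list."""
    html = TT_PARAGRAPH.sub(r"<li>\1</li>", html)
    return LI_RUN.sub(
        lambda m: f"<{list_tag}>\n{m.group(0).strip()}\n</{list_tag}>\n", html
    )
```

Because every list uses the same stand-in style, the script cannot tell bulleted lists from numbered ones. It uses <ul> by default, so change numbered lists to <ol> by hand (or give them a second stand-in style).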

6. Open up in a web browser and check your work

What you’ll see won’t look that special (the text will take on your browser’s default styling), but hopefully the structure will be okay. If anything looks awry, you’ll have to go into the code and fix it. The most common problem is font tags creeping into the HTML, which you don’t want. Formatting tags from your original file, such as those for superscript and italic, may also have gone missing.
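Rather than scanning by eye, you could list the offending tags with Python’s built-in html.parser module. This sketch flags <font> tags and inline style attributes, on the assumption that those are what you want the CSS to handle instead, and reports line numbers so you can find them in Notepad. It won’t spot missing italics or superscripts, which still need checking against the original. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class LeftoverFormattingFinder(HTMLParser):
    """Collect tags and attributes that carry formatting the CSS should own."""

    FORMATTING_TAGS = {"font"}

    def __init__(self):
        super().__init__()
        self.problems = []  # (line number, description) pairs

    def handle_starttag(self, tag, attrs):
        line, _offset = self.getpos()  # position of the start of this tag
        if tag in self.FORMATTING_TAGS:
            self.problems.append((line, f"<{tag}> tag"))
        for name, _value in attrs:
            if name == "style":
                self.problems.append((line, f"style attribute on <{tag}>"))

def find_leftover_formatting(html: str):
    """Return (line, description) pairs for formatting left in the HTML."""
    finder = LeftoverFormattingFinder()
    finder.feed(html)
    finder.close()
    return finder.problems
```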

4 Responses to “Exporting HTML from QuarkXPress”

  1. Hello – I am trying to figure out a way to easily extract text and graphics from a magazine produced in Quark to post the publication on-line via a Content Mgt. System.

    Your comments appear to apply to a primarily text document, so it may not work for me. Can you offer any advice? Thank you.

  2. Hi

    Good question, and there’s probably no easy answer that won’t require some work on your part.

    The hack above is good for a long text document, but for a magazine, I expect it wouldn’t take any longer to copy the text and apply the relevant CSS styles in the content management system.

    What will take longer is getting the images out of the Quark layout and into the CMS. What I’d do is make a copy of your QuarkXPress file, then go to Layout>Layout Properties and change the Layout Type from Print to Web. You can then use File>Export>HTML to make webpages from it.

    The HTML it produces won’t be much use for your CMS, but this action will downsample all the images in your layout to web resolution. You’ll then have to upload the relevant images to your CMS and place them appropriately in the document. Of course, you have far fewer formatting options than in Quark.

  3. Greetings!

    Been using Quark for years for print — now am trying my luck at designing my first webpage using this software, solely because my mind is not wired for code and this is the only design software I know reasonably well. I have built style sheets for my cascading menus (using Bliss 2 Heavy) but every time I convert to html files the font defaults back to Times New Roman. Blech! Doesn’t match the design at all.

    Also, even though I have given the shape of the menu boxes rounded corners, once I export to html the corners become squared again…

    I would love your advice on how to fix these two big issues — in ‘dummy speak’ please…bearing in mind that I am not at all familiar with code…

    Many, many thanks!
    Crystal (from Vancouver Island, Canada)
