I'm going to show you a simpler method of converting an rtf or doc file in Word to an HTML file in Notepad++. In the past I've used Mobipocket Creator for making the HTML, but that required an extra step or two in the word processor. This time, I'm going directly from Word to Notepad++.
But not automatically. I don't do "automatic."
Let's say you've written a standard novel in Word. All the paragraphs are indented and the whole shebang is double spaced. Curly quotes, em dashes, etc. Standard manuscript stuff.
If you're an indie pubber, you may no longer need such a document. What you do need is an HTML file you can surround with templates for tables of content and the opf file, along with a cover image: items you can feed into KindleGen. (I feed my book mush into Kindle Previewer, which has a built-in version of KindleGen to make the mobi file. Previewer also makes it easier to see the list of errors and warnings.)
Prepping the rtf or doc file:
Open your book in Word, click "File/Save As," and give it a new title. You don't want to be messing around with the original.
(Today's instructions are for creating a Kindle file for Amazon. You may still need a properly prepared doc file when dealing with other venues. Smashwords, for instance, has rules for preparing Word docs to get the best out of their formatting program, which is ominously called the Meatgrinder.)
Here's how to change your one-em dashes to double hyphens in Word. Click Edit and select "Replace..."
Find what: ^+
Replace with: --
Click Replace All
Now replace ellipses with three periods:
Find what: [Ctrl+Alt+.]
(hold Control and Alt keys down and tap the period key)
Replace with: ...
Click Replace All
It may be your ellipses are already just three periods. In this case, you won't find any "proper" ellipses, of the sort substituted by AutoCorrect.
The technique given above is similar to the one I posted earlier on this blog and included in the book (Kindle Creation for Control Freaks).
I'm starting to think it might be better to hold off converting dashes and ellipses until after the document is safely in Notepad++. In addition to questions about ellipses, I've also encountered some em dashes that were not "legitimate" dashes. A normal search in Word failed to find them.
Converting the document to a text file (which was part of the original process) turned those fake dashes into real hyphens, which I had to dig out of the background by tedious proofreading (i.e., examining every hyphen).
I think I would've been better off copying and pasting directly from Word into Notepad++. There's a decent chance those fake dashes would have come through the way real em dashes do: slightly longer than hyphens.
Real or fake, you can copy and paste one of those guys into Notepad++'s Find window and replace all of them with HTML dash code:
&mdash;
or
[space]&ndash;[space]
Same for ellipses: find one in Notepad++ and copy/paste it into the Find window. Replace it with:
.&nbsp;.&nbsp;.
An optional final space may be appropriate, depending on where the ellipsis appears in the sentence. If surrounded by text, use the final space. If the ellipsis comes at the end of a paragraph, don't. (If the ellipsis follows a complete sentence, you might want to put a period in front of the first non-breaking space, making four periods in all.)
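If you'd rather script the swap than work the Find window by hand, here's a minimal Python sketch of the same replacements. It assumes your pasted text uses the standard Unicode em dash (U+2014) and the single-character ellipsis (U+2026). The fake dashes I mentioned may well be some other character, so treat this as a starting point. The function name is my own invention, not part of any tool.

```python
# Sketch only: swap typographic dashes and ellipses for HTML codes.
# Assumes real em dashes (U+2014) and one-character ellipses (U+2026).
EM_DASH = "\u2014"
ELLIPSIS = "\u2026"

def swap_punctuation(text, use_en_dash=True):
    """Return text with dashes and ellipses replaced by HTML codes."""
    # Spaced en dash or unspaced em dash, whichever you prefer.
    dash_code = " &ndash; " if use_en_dash else "&mdash;"
    text = text.replace(EM_DASH, dash_code)
    # Periods joined by non-breaking spaces. The optional final space
    # still has to be added by hand where the ellipsis sits mid-sentence.
    text = text.replace(ELLIPSIS, ".&nbsp;.&nbsp;.")
    return text

print(swap_punctuation("Wait\u2014what\u2026"))
```

You'd still want to eyeball the results, since the script can't tell whether an ellipsis needs that final space.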
(Elsewhere on the blog you'll find my warning about using either dashes or ellipses at the end of any paragraph longer than 15 or 20 characters. Bad things happen when those guys are allowed to make contact with the right-hand margin.)
You can also use the HTML code for ellipses (&hellip;). It produces a small cluster of three dots, which will be centered in however big a space Kindle has cleared out in its quest to justify the right margin. I personally don't like that look. If you go this way, I suggest you put a non-breaking space (&nbsp;) in front of it. Couldn't hurt, and might actually work. Though not at gargantuan font sizes.
Waiting to convert dashes and ellipses in Notepad++ saves a step or two, and has no downside I can see. But don't give up on Word just yet. There's still some very useful stuff you can do to get your rtf or doc file ready for ebook publication.
With just a few keystrokes you can find all the text you've put in italics and mark it with HTML tags. This will save you buckets of time later. Here's how:
Find what: [Ctrl+i]
Replace with: <i>^&</i>
Click Replace All
To reset the search window, click "More" and select "No Formatting."
Same with words set in bold, if you have 'em:
Find what: [Ctrl+b]
Replace with: <b>^&</b>
Click Replace All
If you've already taken the time to set off the first few words of the opening paragraph of your chapters, this is a good time to preserve some of that formatting. Let's say you've put the words in bold and italics:
Find what: [Ctrl+b,i]
Replace with: <b><i>^&</i></b>
Click Replace All
Now we're going to add paragraph tags. We'll do that by searching for paragraph marks and replacing them with the appropriate code:
Find what: ^p
Replace with: </p>^p<p>
(and yes, the end tag comes first this time)
Click Replace All
Add a <p> tag to the beginning of the first paragraph, then go to the bottom of the document and delete the extra tags you may find there. Add this to the end:
</body>
</html>
Save this document as a "text only" file.
All of these searches work well in Word. I looked into Open Office Writer and found I couldn't search for special characters (paragraph marks, em dashes, and so forth). Maybe someone has found a way around that.
As for using Writer to enclose italics and other formatting in HTML tags, you can easily search for text in italics (though you have to select it off a font list; Ctrl-i doesn't work). Unfortunately, I can see no way to use wild cards (^& doesn't work here). You'd have to go to every use of italics or bold and manually add the front and back tags. Very disappointing.
Before leaving Word, make sure you have "chapter" or "chap" at the top of each chapter to make it easy to search for later. Also, put a symbol (like "#") on its own line to separate section breaks inside chapters.
Let the file cool off in Word for a moment and open Notepad++. Click the "Encoding" box (along the top line) and select "Encode in UTF-8 without BOM." Along the bottom of the page it will say: ANSI as UTF-8. (If you later format your book for ePub, this setting will help get your HTML file past the validator.)
Back to the Word document. Hold the Control key and tap the letters a and c, one at a time. This will select the entire document and copy it to the clipboard.
Go back to Notepad++ and paste the results there. Click File and select Save As. Call it "book" and pick "Hyper Text Markup Language" as the file type. You should save your book in a new folder set up just for it.
Use the book's actual title for the name of this new folder. Calling the file "book" will come in handy later, when you use my templates for TOCs and the opf file.
Now run the Punctuation Swap Grid, available on the right-hand sidebar of this blog. Follow the instructions precisely, running the searches in the order given. Happily, all but the first two searches are "Replace All."
(The Grid presumes you've already set up your dashes as double hyphens, etc. If you skipped that step in Word, copy and paste an example of a dash or ellipsis into the Find window, as detailed above.)
When you're finished, do a quick search for single and double straight quotes, just to make sure all of them have been dealt with. If you include some odd constructions in your novel (quoting words after dashes or inside parentheses, for instance), the Grid will ignore them. Fix your file now before you begin proofing. (Also, do it before you add HTML code that contains quote marks you need to leave alone.)
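The check for leftover straight quotes can also be scripted. This is a rough Python sketch, not part of the Grid: it assumes the book is already loaded as a string, and it ignores anything inside angle brackets so quotes in HTML code (class names and the like) don't get flagged. The function name is hypothetical.

```python
import re

# Anything between < and > is treated as an HTML tag and skipped.
TAG = re.compile(r"<[^>]*>")

def find_straight_quotes(text):
    """Return (line_number, line) pairs that still hold a straight quote."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        visible = TAG.sub("", line)  # drop tags, keep the visible text
        if '"' in visible or "'" in visible:
            hits.append((number, line))
    return hits

sample = 'All good here.\n<p class="start">Fine.</p>\nHe said "hi".'
for number, line in find_straight_quotes(sample):
    print(number, line)
```

Every line it reports still needs a human decision about which curly quote belongs there.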
Next, grab a template for front-of-file HTML and paste it at the top of your document. (You may want to try the beginning of "A Short Book," located at the top of the Templates list.) Modify it for this work (title, copyright date, and so forth). You could also remove any styles you won't be using. If you mostly write novels, you should probably customize your version of the front-end template and store it in your Kindle Making folder.
At this point I normally suggest you change all paragraphs to "body" style, then search for chapter and section starts to change the first paragraph there to "start" style. Alternatively, you could set all paragraphs to body style by putting this:
p {margin:0%; text-indent:1.2em;}
in your <style> section. Then you only have to add the "start" class style to the appropriate paragraphs:
<p class="start">Blah blah blah.</p>
You might also use two different "start" styles if you want more "margin-top" at the beginnings of chapters and less space between sections inside chapters. If you use an image for the chapter title, you may want little or no margin-top for the first paragraphs of chapters.
Some folks use 1.3 or 1.4 em for their indents. Feel free to experiment with the spec. Just remember, too big an indent makes the text look like a manuscript, not a finished book. Real book indents are pretty small.
Each section start should already have a "#" or other symbol to help you locate it. Add the "start" class for the first paragraph. Delete the symbol, but leave a blank line to make editing easier. (The empty line won't pass through into the book, which is why you have to use margin-top in the paragraph style to force the gap between sections.)
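For a long book, the marker hunt can be sketched in Python. The version below makes some assumptions: each paragraph sits on its own line, the paragraph tags have already been added (so the marker line reads <p>#</p>), and the file has been split into a list of lines. The function name and the default marker are mine.

```python
def mark_section_starts(lines, marker="<p>#</p>"):
    """Swap each marker line for a blank line and tag the next paragraph."""
    out = []
    after_marker = False
    for line in lines:
        if line.strip() == marker:
            out.append("")  # leave a blank line to make editing easier
            after_marker = True
            continue
        if after_marker and line.startswith("<p>"):
            # First paragraph after the marker gets the "start" class.
            line = '<p class="start">' + line[len("<p>"):]
            after_marker = False
        out.append(line)
    return out

book = ["<p>One.</p>", "<p>#</p>", "<p>Two.</p>", "<p>Three.</p>"]
print("\n".join(mark_section_starts(book)))
```

Chapter openings would need the same treatment, with whatever marker you used for them.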
Another reason you need to search for chapter starts is to add a page break between chapters (most folks want one) and a div id in front of each. You could just copy and paste what you find in the template at the beginning of Chapter One. Remember to adjust the chapter id numbers so the tables of content will work properly. (You don't want all your chapters coded "chap_1".)
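Numbering the chapter ids is the kind of chore a script handles well. Here's a hedged Python sketch. It assumes chapter lines begin with "<p>Chapter" (the search word suggested earlier), and the two lines it inserts are stand-ins for whatever your template actually uses. Only the "chap_" naming comes from the template mentioned above; the rest is my guess at typical markup.

```python
def number_chapters(lines, marker="<p>Chapter"):
    """Insert a page break and a numbered div id before each chapter line."""
    out = []
    count = 0
    for line in lines:
        if line.startswith(marker):
            count += 1
            # Stand-in markup: replace with the lines from your template.
            out.append("<mbp:pagebreak />")
            out.append('<div id="chap_%d"></div>' % count)
        out.append(line)
    return out

book = ["<p>Chapter One</p>", "<p>Text.</p>", "<p>Chapter Two</p>"]
print("\n".join(number_chapters(book)))
```

Because the counter only moves when a chapter line turns up, no two chapters end up with the same id.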
Add your boilerplate "About the Author" hunk (along with its own div id) after the last chapter and before the /body tag. Customize it for the genre (and audience) of the current work and mention any relevant books published since your last outing.
What remains is the usual stuff: special treatment for the first few words of chapters and sections, a method of naming or numbering your chapters, using either header code or images of numbers created in Gimp (see below for a link).
Images will retain the proper spacing top and bottom, which is not always easy to obtain in HTML for all devices. On the other hand, images add to the file size of the book, which could mean larger delivery fees deducted from your royalties (but only at the 70% royalty rate).
Open your new book in a browser, narrow the window size to simulate ebook format, and start proofing.
Eventually you'll need to load templates for the tables of content and the opf file, modifying each for the current book. If you have to add chapter listings to the TOCs, save this new, longer file as your template. It's always easier to delete chapter listings than to add them, especially in the ncx toc.
Go here to get a random number you can use in the ncx and opf files (same number for both, please). Some time ago I grabbed a hundred numbers off that site and pasted them into a notepad file I put in my Kindle Making folder. Copy the first unused number and paste it into the special files. Remember to annotate the listing to show the number has been used and for what book.
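If you can't get to a random-number site, a short Python sketch can stock your notepad file instead. This is my substitute, not the method described above. It assumes a ten-digit number is acceptable for your ncx and opf files (the length is a guess you can change), and the function name is hypothetical.

```python
import secrets

def make_id_numbers(count=100, digits=10):
    """Return a list of distinct random numbers below 10**digits."""
    numbers = set()
    # A set throws away repeats, so keep drawing until we have enough.
    while len(numbers) < count:
        numbers.add(secrets.randbelow(10 ** digits))
    return list(numbers)

for number in make_id_numbers(count=5):
    print(number)
```

Paste the output into your list and annotate each number as you use it, same as before.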
Load up Gimp (or your preferred photo-editing software) and create a cover. I like to make a title page image as well. It will work in all versions and screen resolutions of Kindle, now and in the future, and at every font-size selection (which is out of your hands). Using HTML header style to cobble together a title page is just asking for trouble.
I want to thank Guido Henkel for the shorthand method of tagging italics (and other text formats) in Word, using the wild card symbol. The symbol used above (^&) means "find what text."
Henkel's multi-part blog post on ebook formatting includes his technique for chapter titles. He creates a paragraph style instead of using header code. He also doubles up his "centering" command by using both a "class" and a "span" to apply it, a sort of belt and suspenders approach that may be useful when publishing across multiple platforms.
(Generally, I only use spans to apply "first words" treatment to the beginning lines of chapters and sections. Examples of this use can be found in the second half of the "Short Book" template.)
Nowadays I use Notepad++ for all my writing, even novels. When I want a dash (and I try hard not to), I just type the two hyphens and move on. Same thing for ellipses: I type the three periods and keep going. In the end I use the Swap Grid to change the punctuation to the desired version. (You may already have noticed I prefer en dashes to em dashes for ebooks. Up to you, though.)
I use the Search box in "extended mode" (bottom left, Search Mode section) to add paragraph tags:
Find what: \r\n
Replace with: </p>\r\n<p>
Click Replace All
The \r searches for CR (carriage return), the \n for LF (line feed), embedded formatting commands that are hidden until you click the pilcrow (paragraph) button on the features line. (Note these are back slashes.) The result is the same as the search in Word: I have to add a paragraph start tag at the beginning of the first paragraph and delete the extra one at the end of the document.
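To see why the stray tags show up, here's a small Python sketch of the same replacement, next to a version that wraps each line instead. It assumes Windows line endings (CR plus LF) and one paragraph per line. Both function names are mine.

```python
def naive_replace(text):
    """Mimic the Find/Replace: swap every CR+LF for </p>, CR+LF, <p>."""
    return text.replace("\r\n", "</p>\r\n<p>")

def wrap_paragraphs(text):
    """Wrap each non-empty line in its own <p>...</p> pair."""
    lines = text.split("\r\n")
    return "\r\n".join("<p>" + line + "</p>" for line in lines if line)

sample = "First.\r\nSecond.\r\n"
print(repr(naive_replace(sample)))
print(repr(wrap_paragraphs(sample)))
```

The first result has no opening tag on the first paragraph and a stray <p> at the end, which is exactly the hand cleanup described above. The second version avoids the cleanup, at the cost of dropping blank lines.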
(By the way, "pilcrow" is short for "pilled crow," "pill" being a form of "pillage" that means stripped or [in this case] de-feathered. The Oxford Universal Dictionary says the word is archaic and that its history is obscure. It dates from 1440.)
Bottom line, it's still handy to have an old copy of Word installed on your computer, if only to automate the tagging of italics (and other stuff) in novel manuscripts. After that, copy and paste the whole deal into Notepad++ and run the Swap Grid (modified as needed to find dashes and ellipses by copying instances found in the text). Add paragraph tags using the extended search, slap on some front-end HTML code from a template, and you'll be well on your way to producing a new ebook.
[Check the earliest posts on this blog for more detailed information on paragraph styles, page one treatment, image creation in Gimp, and so forth. Or get a copy of Kindle Creation for Control Freaks, which spends more time with each topic.]
Saturday, February 21, 2015
BACK TO BASICS
Labels: converting book files, doc, Gimp 2, html, Kindle formatting, Kindle Previewer, notepad++, rtf, Word
Tuesday, February 17, 2015
NO JUSTIFICATION FOR IT
The television ad for the latest version of the Kindle contains an image of the device displaying the opening of some document. There's an ornate drop cap at the beginning, probably created by embedding a font file in the mobi and using split code to put the big capital in kf8 (Fire) devices while using a simple two em capital for older devices, where drop caps won't work. (I guess it could also be a dropped image of a capital letter.)
You can link to one of my more popular posts for information on how to use drop caps. There's also a post showing you how to use embedded fonts.
But I should remind you using drop caps is problematical: You can get good vertical spacing in a few font sizes, but the big caps drift out of spec for other sizes, showing up a little too high or too low, relative to the surrounding lines. Different letters may also require different spacing specs. Getting a nice fat capital letter to look perfect in just one font size, as in the TV ad, is pretty easy.
But so much for drop caps. I think Amazon has always used them in their advertised samples.
The thing I find most interesting about the new Kindle is that the text is not justified on the right-side margin.
Elsewhere on this blog you will find my rants about right-side justification, and how it messes up the look of the page on your Kindle—depending on the font size, which you are not at liberty to specify.
Dashes and ellipses are particularly affected, which is why I make a great effort to eliminate as many of those guys as I can in my writing and editing. I have a post about that, too.
Up to now, the default mode of display has been a justified right margin. Amazon made a point of preferring it. They said it gave their customers the experience they desired from an ebook.
Have they changed their minds about that?
My position has always been this: If you can't hyphenate the text, don't justify the right-side margin.
I have a feeling automatic hyphenation (and it would have to be automatic, unless you had a way of locking the font size option) is still a ways off.
Is Amazon listening to complaints about justification?
It is possible to insert code into your mobi file to force non-justified text for your book. In the past Amazon has objected to the use of that code.
I've also heard some readers find non-justification of the text to be grounds for returning the book for a refund.
How you react to these facts is up to you.
If you want to go ragged on the right side, just add:
text-align:left;
to your paragraph styles.
(It works in mobi and kf8 [Kindle Classic and Fire], at least according to Kindle Previewer.)
Labels: drop caps, new Kindle fire, right justification