I mentioned last time I was going with Notepad++ for all my writing. I said the only thing I might miss from MS Word was the word count tool. I also said it's not that important for indie pubbers to have an accurate word count.
Ignoring that last bit (and to aid those who just gotta know what the count is), I set out to obtain a substitute for Word's counting tool.
I found a free program called Primitive Word Counter, but when I ran it, I was surprised to discover the program does not give you a count of the total number of words in a chunk of text. Rather, it tells you how many times you used each word (by raw number and percentage).
As a bonus, it provides the total number of different words used, in case you want to check the breadth of your working vocabulary.
(There was also a list of "phrases" used. In the work I used as a sample, the phrase "of the" was used the most [200 times]. Maybe I should watch that in the future.)
Since I didn't want to waste the time I spent downloading the program, I tried a little calculation. It said I used the word "the" the most: 1935 times or 6.56% of the time. I divided 1935 by .0656 and came up with 29,497 (after rounding). That should be a good approximation of the total words.
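For anyone who wants to repeat that calculation without a calculator, here is a minimal Python sketch. The function name is my own invention, and the only inputs are the two figures the counter reports for a single word.

```python
def estimate_total_words(occurrences: int, percent_of_text: float) -> int:
    """Estimate the total word count from one word's raw count and its usage percentage."""
    if percent_of_text <= 0:
        raise ValueError("percentage must be positive (a 0.00% word can't be used)")
    # Convert the percentage to a fraction, then divide the raw count by it.
    return round(occurrences / (percent_of_text / 100))


# Figures from the sample manuscript: "the" appeared 1935 times, 6.56% of the text.
print(estimate_total_words(1935, 6.56))  # 29497
```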
How close was it to the MS Word count? Word clocked this manuscript at 29,204. Assuming this to be accurate, the error was 293 words, or 1.0033%. Not too bad. Using other words from the top of the list resulted in similar accuracy. (There were 5132 different words listed.) If the Primitive counter had given the use percentage to three places, I could have done slightly better, though rounding alone only shifts the estimate by about 22 words either way. (Words used only once [more than half the total] produced a 0.00% use rate.)
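The error figures in this post all come from the same small formula, sketched below in Python. The helper name is hypothetical, and it treats the MS Word count as the reference, which is the assumption made throughout.

```python
def percent_error(count: int, reference: int) -> float:
    """Return how far a count is from the reference, as a percentage of the reference."""
    return abs(count - reference) / reference * 100


WORD_COUNT = 29204  # MS Word's figure, taken as the reference

# Estimate derived from the Primitive Word Counter percentages.
print(round(percent_error(29497, WORD_COUNT), 4))  # 1.0033
```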
Since I always end up proofing my books in Firefox, I also downloaded an add-on app for word counting. Its count for the same chunk of text was 29,871. An error of 2.28%. Less good...
Then I messed about with Notepad++ itself. Every word has a space after it (except the last one in a given paragraph). Counting the number of spaces gave me 28,449 "words." That's 2.58% low. Adding one word for every paragraph brought the count to 29,217. I then subtracted the number of dashes (16), because en dashes (when expressed on the final page) add an extra space to the text, distorting the count. Final tally: 29,201 words, just 3 words fewer than MS Word's count. Error: 0.01027%.
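The arithmetic above fits in a few lines of Python. This is only a sketch: the function name is made up, and the paragraph count of 768 is not stated outright but inferred from the jump from 28,449 to 29,217.

```python
def adjusted_word_count(spaces: int, paragraphs: int, spaced_dashes: int = 0) -> int:
    """Spaces, plus one word per paragraph, minus one per dash that carries an extra space."""
    return spaces + paragraphs - spaced_dashes


# 28,449 spaces, 768 paragraphs (inferred: 29,217 - 28,449), 16 dashes.
print(adjusted_word_count(28449, 768, 16))  # 29201
```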
I'll take it!
But here's the problem I just glossed over. I started with the "completed" HTML version of the manuscript. I opened the file in Firefox and highlighted the main text (to give the add-on app a chance to get a count), then copied and pasted the text into Word for its count.
I also pasted the text into a new tab in Notepad++ to get my text-based count of spaces. (Just open the Find window, tap the space bar to put a space in the "Find what" window, and click Count.)
(Skipping this step, and counting the spaces in the HTML file, is problematical. There are lots of extra spaces embedded in the tags.)
If you use Notepad++ to compose text, you can count spaces in your WIP to get a fast idea of the wordage to that point. The "space" count will be, as we've seen, some 2.5% low. You could just make the adjustment.
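One way to "make the adjustment" is to scale the space count up, as in this rough Python sketch. It assumes the shortfall of roughly 2.5% seen in my one sample carries over to other manuscripts, which is a guess; the function name is hypothetical.

```python
def quick_estimate(spaces: int, shortfall_percent: float = 2.5) -> int:
    """Scale a raw space count up, assuming it runs a fixed percentage below the true count."""
    return round(spaces / (1 - shortfall_percent / 100))


# The sample manuscript had 28,449 spaces.
print(quick_estimate(28449))  # 29178
```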
To get the most accurate count, however, you need to bring paragraphs and dashes into the mix. How do you find out how many paragraphs and dashes there are?
Since I was starting with an HTML version of the book, it was easy to count the number of times /p> showed up. (For more accuracy, I also temporarily removed the front matter of the document.) The procedure for counting dashes is similar: run a search for the HTML tag for dashes.
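If you would rather let a script do the searching, the same two counts can be sketched in Python. The function name is hypothetical, and the dash marker is an assumption: I use the `&ndash;` entity here, but your HTML may mark dashes differently (for example `&mdash;` or `&#8211;`), so pass in whatever your file actually contains.

```python
def count_html_markers(html: str, dash_marker: str = "&ndash;"):
    """Count closing paragraph tags and dash markers in an HTML string."""
    paragraphs = html.count("</p>")    # one closing tag per paragraph
    dashes = html.count(dash_marker)   # assumed marker; change to match your file
    return paragraphs, dashes


sample = "<p>One thing &ndash; then another.</p>\n<p>Second paragraph.</p>"
print(count_html_markers(sample))  # (2, 1)
```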
But if you haven't got that far in the writing process, you won't yet have an HTML version to make counting paragraphs easy. In a text-only environment, the Find window trick above won't count carriage returns (which appear when you click the Show All Characters icon), because you can't type one into the "Find what" window the way you can a space. And if you're well along in the writing there will probably be too many paragraphs to count manually.
I suppose you could use Primitive counter and make the calculation based on the use of selected words. You'll only be off by about one percent.
But there's a better way.
The method I suggest for creating text destined to run through MobiPocket Creator is to leave a blank line (an extra carriage return) between paragraphs. As a result, the number of actual paragraphs is easily calculated. Add one to Notepad++'s last line number and divide by two. Now add this to the number of spaces found.
Dashes in manuscripts are handled differently from final page text. In the draft, I suggest you use two hyphens (and no spaces) to denote dashes (they become en [or em] dashes after a search-and-replace in the HTML document). This means each "dash" links two words into one extra-long one, actually reducing the count. So: Count the double hyphens and for each add one word to the total. (Same with ellipses: add one word for each.)
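Put together, the draft-stage recipe looks something like the Python sketch below. It leans on several assumptions: single spaces between words, exactly one blank line between paragraphs, no blank line at the very end of the file, and ellipses typed as three periods with no spaces around them. The function name is my own.

```python
def draft_word_count(text: str) -> int:
    """Word count for a draft with blank-line-separated paragraphs and '--' dashes."""
    spaces = text.count(" ")
    # Same figure as Notepad++'s last line number: one more than the number of line breaks.
    last_line_number = text.count("\n") + 1
    paragraphs = (last_line_number + 1) // 2
    double_hyphens = text.count("--")   # each joins two words into one
    ellipses = text.count("...")        # assumed to join two words the same way
    return spaces + paragraphs + double_hyphens + ellipses


sample = "The cat sat--then it slept.\n\nNo dogs...none at all."
print(draft_word_count(sample))  # 11
```

In the sample there are 7 spaces, 2 paragraphs, 1 double hyphen and 1 ellipsis, which adds up to the 11 words you get by counting by hand.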
The adjusted "space" count for this blog post was dead on, compared to MS Word. Error: 0.00%.
Using the HTML file to create a word count is very, very accurate. Just not as accurate as working with the original text file. If you still have one.