Can I just paste content from Word?

The most likely complaint you will get after launching a Joomla site is a call or an email from the owner, saying that the page looks weird after he added his first content item. You can bet blindly that he (sometimes despite your warnings) just copy-pasted that content from a Word document, another web page or some other word processor. The order above isn't arbitrary, BTW! What can you do? Well, you can clean up his content item, but you must also make him understand why pasting like this isn't a good idea, and you must provide an alternative: an easy way to paste from Word without (badly) damaging the site.

The interesting and unexpected part is that the worst result shows up in IE8: in most cases Firefox, Chrome and Safari produce a good or at least acceptable result, while the output in IE8 is a complete disaster.

The surprise is even bigger when you have chosen the best available editor for Joomla, one with built-in tools to avoid problems like this: JCE Editor.

Let's see what the root of the problem is, and how it can be avoided! You get the first clues if you look at the article in raw HTML mode, especially when you compare it with the same view of a demo article shipped with Joomla. You don't need to be an HTML guru to see the obvious difference: the article pasted from Word has a lot of extra code in it. Most of that code is standard HTML, but anyone with at least marginal HTML knowledge will see that there is much more than that: an ocean of Microsoft-specific code, which may help MS Word render the content nicely, but is obviously junk when the same text has to be shown on a web page. An important part of this junk has nothing to do with standard HTML, and it often contains serious syntax-breaking errors that jeopardize the end result. Yes, this is a fact: MS Word adds a lot of its own markup to the content, and that markup is carried over to JCE or other WYSIWYG editors.

The markup is there to make sure the document looks the same when pasted. Since it is built by Microsoft and optimized for their own software, in most cases it is totally useless on a website and creates all sorts of problems. The problem mentioned above is just one. Another common problem is that articles pasted from Word look different from other content on the site, because they contain style information that conflicts with the stylesheet of the Joomla template. This has been a problem for years, and it will probably exist as long as Word is out there…

Bottom line: Don't paste directly from Word

So, the problem is not with Joomla itself, but with the content entered into the page. Let's see a very concrete example: if you look at the source of the article, you will very probably find code chunks like this, obviously originating from MS Word.

 [if gte mso 9]...

This snippet opens a conditional comment, a Microsoft-specific construct: the code between this bracket and its closing counterpart is meant only for software that satisfies the condition. Note that the 9 is not an Internet Explorer version: "mso" stands for Microsoft Office, so the condition reads "Office version 9 (Office 2000) or greater". IE is the only browser that looks at these IF statements at all. And this is only one example; believe me, there are brutal violations of the HTML standard too. The interesting thing is that most browsers behave in a standards-compliant way: they simply drop the code they don't understand. This explains why the page generally looks acceptable in these browsers. IE, however, tries to interpret this Microsoft-specific code. And the code is usually already malformed due to the cut-and-paste operation, the user's edits and other factors. A recipe for disaster.
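If you want to check quickly whether an article carries this kind of baggage, the telltale markers can be searched for mechanically. Here is a minimal Python sketch, assuming you have the article's raw HTML as a string; the function name `find_word_markers` and the list of patterns are my own illustration, not something shipped with Joomla or JCE, and the list is certainly not exhaustive.

```python
import re

# Illustrative patterns that usually betray content pasted from Word.
# This list is an assumption for demonstration purposes, not a complete catalogue.
WORD_MARKERS = {
    "conditional comment": re.compile(r"\[if\s+[^\]]*\bmso\b[^\]]*\]", re.IGNORECASE),
    "Mso class": re.compile(r"class\s*=\s*[\"']?Mso", re.IGNORECASE),
    "mso- style": re.compile(r"\bmso-[a-z-]+\s*:", re.IGNORECASE),
    "Office namespace tag": re.compile(r"</?[ovw]:[a-z]+", re.IGNORECASE),
}


def find_word_markers(html: str) -> list[str]:
    """Return the names of all Word-specific markers found in the HTML."""
    return [name for name, pattern in WORD_MARKERS.items() if pattern.search(html)]
```

An empty list means none of these particular markers were found; it does not prove the article is clean.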

If you must, then pasting from Word should be done with caution. What can you do? Use a good Joomla editor such as JCE, as I said above. Be careful not to paste directly from Word into JCE; at the very least, use the "Paste from Word" function when doing so. That will remove all (let's be realistic: most) of the MS Word-specific code from your page.

In most cases another workaround exists (and it can be used along with JCE's "Paste from Word" function): first transfer the text to Notepad or another simple text editor, save it, then copy and paste it from there. This will clear most of the Microsoft-specific junk, although the formatting is lost as well and has to be redone in the editor.
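Why does the Notepad detour work? A plain text editor keeps nothing but characters, so tags, attributes and comments cannot survive the trip. The Python sketch below only imitates that effect on an HTML string using the standard library's `html.parser`; it is not what the clipboard really does, and the names `TextOnly` and `to_plain_text` are hypothetical.

```python
from html.parser import HTMLParser

# Closing one of these tags starts a new line in the extracted text (a simplification).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Collects text content only; tags, attributes and comments are ignored."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag in BLOCK_TAGS:
            self.parts.append("\n")


def to_plain_text(html: str) -> str:
    """Return the text of an HTML fragment, roughly what a plain text editor would keep."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

The price is visible right away: bold, headings and links are gone together with the junk.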

Of course, in an ideal world the user would type the new content directly into JCE or even the default Joomla editor, and format it there. Or, even better, every single Joomla user would be an HTML guru... and beer and chicks would be free ;)

The bigger problem arises when the worst has already happened: a large amount of text has already been pasted into the site from Word. What can be done in this case?

Sometimes it is easier to cut the content from the article, paste it into Notepad, save it, then copy and paste it back. But with a large number of articles this might be out of the question. Fortunately JCE has a tool for that too: a nice button called "Clean up HTML", and its twin, the "Clean up Formatting" button. By default these are located at the far right end of the first icon row in JCE. Use them.
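To give an idea of what such a cleanup has to remove, here is a rough Python sketch. It is not how JCE's buttons work internally (I haven't looked), just a regex-based illustration under the assumption that the article HTML is available as a string; `clean_word_markup` and the pattern names are hypothetical. Regular expressions are no substitute for a real HTML parser, so keep a backup before running anything like this on real articles.

```python
import re

# Whole conditional comment blocks: <!--[if ...]> ... <![endif]-->
CONDITIONAL_COMMENT = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.IGNORECASE | re.DOTALL)
# Office namespace tags such as <o:p>, </o:p>, <w:WordDocument>, <v:shape>
OFFICE_TAG = re.compile(r"</?[ovw]:[^>]*>", re.IGNORECASE)
# class attributes whose value starts with "Mso" (e.g. MsoNormal)
MSO_CLASS = re.compile(r"\s+class\s*=\s*(\"Mso[^\"]*\"|'Mso[^']*'|Mso[^\s>]*)", re.IGNORECASE)
# double-quoted style attributes; their declarations are filtered below
STYLE_ATTR = re.compile(r"\s+style\s*=\s*\"([^\"]*)\"", re.IGNORECASE)


def _drop_mso_declarations(match: re.Match) -> str:
    """Keep only the style declarations that do not start with 'mso-'."""
    declarations = [d.strip() for d in match.group(1).split(";")]
    kept = [d for d in declarations if d and not d.lower().startswith("mso-")]
    if not kept:
        return ""  # nothing left, so drop the whole style attribute
    return ' style="' + "; ".join(kept) + '"'


def clean_word_markup(html: str) -> str:
    """Strip the most common Word-specific markup from an HTML fragment."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = OFFICE_TAG.sub("", html)
    html = MSO_CLASS.sub("", html)
    html = STYLE_ATTR.sub(_drop_mso_declarations, html)
    return html
```

Ordinary inline styles such as a colour are left alone on purpose; whether they should also go depends on how strictly you want the template's stylesheet to rule.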

Setting up JCE to improve pasting from Word

If you have users on your site who often forget to paste using the "Paste from Word" function, you can force JCE to strip Word styles.

Go to Components -> JCE -> Editor Profiles -> [Your profile]

Then navigate to "Plugin Parameters" and select the tab "Paste":

On this tab, you have some options that you can activate to have JCE remove more markup from the pasted content.