Wednesday, March 9, 2011

Getting clean HTML from a Word document using Gmail

Recently, I had to convert a Word document into an HTML file. Like most people, I first tried saving it directly from Word as a web page (HTML).
This did produce HTML, but when I viewed the source it was cluttered with so many unnecessary tags that it was almost illegible. So I had to find another way to get this done.

After some searching online, I found the following method, which gave me clean HTML from my Word document:
1. Send the Word document as an attachment to your Gmail account.
2. Log into Gmail.
3. Open the email with the attachment and click the "View" link next to it.
4. This opens a new window/tab in which Gmail displays your Word document.
5. On this page, click View->Plain HTML.
6. You should now see the same content as plain text, without the formatting.
7. Right-click the page and choose View Source to see its HTML.
8. This HTML is much cleaner, and you can easily work with it from here on.

That's it!
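If you would rather not route the document through Gmail, a rough local alternative is to post-process the HTML that Word's own "Save as Web Page" produces. This is a swapped-in technique, not the Gmail method above: the sketch uses Python's standard-library `html.parser` to drop Office-namespaced tags such as `<o:p>` (keeping their text), HTML comments (where Word places its `<!--[if ...]>` conditional blocks), the `<head>` with its embedded styles, and `class`/`style`/`lang` attributes. Which tags and attributes count as "unnecessary" is an assumption here, and the names `WordHtmlCleaner` and `clean_word_html` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Assumed set of presentational attributes to strip; not an exhaustive list.
DROP_ATTRS = {"class", "style", "lang"}
# Elements whose whole contents are discarded.
DROP_BLOCKS = {"head", "style", "script"}
# Void elements never get a closing tag.
VOID = {"br", "hr", "img", "meta", "link", "input"}


class WordHtmlCleaner(HTMLParser):
    """Rebuilds HTML while skipping Office-namespaced tags (their text is kept),
    comments, dropped blocks, and presentational attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped block element

    def _open_tag(self, tag, attrs):
        # Keep only attributes that are not in DROP_ATTRS, re-escaping values.
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name not in DROP_ATTRS and value is not None
        )
        return f"<{tag}{kept}>"

    def handle_starttag(self, tag, attrs):
        if tag in DROP_BLOCKS:
            self.skip_depth += 1
            return
        if self.skip_depth or ":" in tag:  # e.g. <o:p>, <w:WordDocument>
            return
        self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: emit a single open tag.
        if self.skip_depth or ":" in tag or tag in DROP_BLOCKS:
            return
        self.out.append(self._open_tag(tag, attrs))

    def handle_endtag(self, tag):
        if tag in DROP_BLOCKS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or ":" in tag or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped (convert_charrefs=True), so escape it again.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # handle_comment is not overridden, so comments are silently dropped.


def clean_word_html(source: str) -> str:
    cleaner = WordHtmlCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    sample = '<p class=MsoNormal style="mso-margin-top-alt:auto">Hello <b>world</b><o:p></o:p></p>'
    print(clean_word_html(sample))  # <p>Hello <b>world</b></p>
```

Unlike the Gmail route, this keeps the document's tag structure (paragraphs, bold, links) and only removes what the assumed lists above name, so you may need to extend `DROP_ATTRS` or `DROP_BLOCKS` for your own files.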

