I have a number of Word documents that will be converted to HTML. Each paragraph in the Word documents must be converted to its own <p> element.

After some tests using the SaveAs method of the Microsoft Office API to convert the documents to HTML, I realized that lines separated by manual line breaks (entered with Shift+Enter) are not placed in separate <p> elements. Instead, they are grouped together in a single <p> element.

To separate them, I have been trying to replace the Shift+Enter line breaks with Enter (paragraph marks) before doing the conversion. However, I couldn't find a suitable way to do the replacement. I tried the WdLineEndingType parameter of the SaveAs method, but it doesn't seem to have any effect on this issue.

For those doing this in MS Word itself: press Ctrl+H (Find & Replace).

Find what: Special character, Manual Line Break (^l)

Replace with: Paragraph Mark (^p)

Replace All applies the change to the whole document.


The MS Word Office API provides a Find object on the Range object, which can search for and replace strings.

The following code replaces the manual line breaks ("^l") with paragraph marks ("^p").

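object findText = "^l", replaceWith = "^p";  // manual line break -> paragraph mark
object replaceAll = WdReplace.wdReplaceAll;  // oMissing below is an object holding Type.Missing
Range r = oDoc.Content;
r.WholeStory();
r.Find.Execute(ref findText, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref replaceWith, ref replaceAll, ref oMissing, ref oMissing, ref oMissing, ref oMissing);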

Then use SaveAs to convert the Word document to HTML. Each line will now be placed in its own <p> element.
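For completeness, here is a rough sketch of the whole round trip (open, replace, save as HTML), in case it helps someone. It assumes C# 4 or later (so named arguments can be used and `ref` omitted on COM calls), a project reference to Microsoft.Office.Interop.Word, and Word installed on the machine. The class name `DocToHtml`, the `Convert` helper, and the choice to write the .html file next to the source document are all my own assumptions, not part of the question.

```csharp
using System;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

class DocToHtml
{
    // Hypothetical helper: converts a single Word document to HTML.
    static void Convert(string docPath)
    {
        // Assumption: the HTML file is written next to the source document.
        string htmlPath = Path.ChangeExtension(docPath, ".html");

        Word.Application word = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = word.Documents.Open(FileName: docPath);

            // Turn every manual line break (^l) into a paragraph mark (^p).
            Word.Range r = doc.Content;
            r.Find.Execute(FindText: "^l", ReplaceWith: "^p",
                           Replace: Word.WdReplace.wdReplaceAll);

            // Each former line is now its own paragraph, so it gets its own <p>.
            doc.SaveAs(FileName: htmlPath,
                       FileFormat: Word.WdSaveFormat.wdFormatHTML);
        }
        finally
        {
            // Close without writing the replacements back to the source file.
            if (doc != null)
                doc.Close(SaveChanges: Word.WdSaveOptions.wdDoNotSaveChanges);
            word.Quit();
        }
    }

    static void Main(string[] args)
    {
        // Word expects a full path, so resolve relative paths first.
        Convert(Path.GetFullPath(args[0]));
    }
}
```

The replacement is made only in the in-memory copy, and the document is closed without saving, so the original .doc/.docx file keeps its manual line breaks.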

Paragraph mark: ^p (doesn't work in the Find what box when the Use wildcards option is turned on), or ^13
