Parse error - o:p tag I think #60

malcb · 2021-12-31T17:38:28Z

I tried to convert a web page and got a parse error. I ran the site through an HTML validator, which suggested that the problem might be the <o:p></o:p> tags. These are not standard HTML tags; MS Word adds them (typical!). I saved the web page, stripped the <o:p></o:p> tags, and tried again with the local file. This time there was no parse error, so it looks like the problem is MS, as usual. Perhaps the fix would be to ignore unknown tags rather than throwing an error.
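To illustrate the "ignore unknown tags" idea, here is a minimal sketch in Python using the standard library's lenient `html.parser`. It is not how save-as-ebook is implemented; the class name, the function name, and the short `KNOWN_TAGS` allow-list are all hypothetical and chosen only for the example.

```python
from html.parser import HTMLParser

# Hypothetical allow-list for illustration; a real one would be much longer.
KNOWN_TAGS = {"html", "head", "body", "p", "div", "span", "a", "b", "i"}


class UnknownTagDropper(HTMLParser):
    """Re-emit the markup, silently skipping tags not in KNOWN_TAGS.

    Text inside an unknown tag is kept; only the tag itself is dropped.
    Comments and doctype declarations are not re-emitted in this sketch.
    """

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            # get_starttag_text() returns the tag exactly as it was written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <span/>; emit it once, unchanged.
        if tag in KNOWN_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def drop_unknown_tags(html):
    """Return html with every tag outside KNOWN_TAGS removed."""
    parser = UnknownTagDropper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


cleaned = drop_unknown_tags('<p class="MsoNormal">Hello<o:p></o:p> world</p>')
print(cleaned)
```

The point of the design is that an unrecognised tag such as <o:p> is skipped rather than treated as a fatal error, so the surrounding text survives.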

malcb · 2022-01-03T09:18:31Z

The same parse error also occurs when the web page itself has errors. These can be invisible errors, such as missing closing tags or corrupt tags, that the browser recovers from so that the page still renders OK. I think the browser just ignores the error so the text still displays, which is why the error is invisible, but the parser in save-as-ebook throws out the text, so the ebook doesn't match the web page.

I have a workaround for anyone having similar problems. The Rewriter extension lets you set up rules for rewriting a page, and these rules apply to the HTML too. Rewriter seems to affect the whole page, not just the visible text. Hence Rewriter can be set to remove all <o:p> and </o:p> tags so that save-as-ebook will work OK (unless the page has other errors, which is how I found that this was another problem). Rewriter can be restricted to specific URLs, so you can limit the effects to just where you need them. The matching and replacing use regex, so it is very powerful if need be, but replacing just the o:p tags does not need anything complicated.
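For anyone who wants to try the pattern before putting it into a rewrite rule, here is a small Python sketch of the kind of regex that removes the o:p tags. The pattern and the function name are my own illustration, not something taken from Rewriter, and Rewriter's regex dialect may differ slightly from Python's.

```python
import re

# Matches <o:p>, </o:p>, <o:p/> and <o:p ...attributes...>, in any letter case.
# The \b stops the pattern from matching longer names that merely start with "o:p".
O_P_TAG = re.compile(r"</?o:p\b[^>]*>", re.IGNORECASE)


def strip_o_p_tags(html):
    """Remove the o:p tags themselves but keep whatever text sits between them."""
    return O_P_TAG.sub("", html)


sample = "<p>Text<o:p></o:p></p><p><O:P>&nbsp;</O:P></p>"
print(strip_o_p_tags(sample))
```

The replacement is an empty string, so only the tags disappear and the rest of the page is left alone.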

alexadam self-assigned this Apr 14, 2023

alexadam added the bug and enhancement labels Apr 14, 2023
