Parse error - o:p tag I think #60

Open
malcb opened this issue Dec 31, 2021 · 1 comment

malcb commented Dec 31, 2021

I tried to convert a web page and got a parse error. I added an HTML validator to check the web site, and it suggested that the problem might be <o:p></o:p> tags. These are not standard HTML tags but are added by MS Word (typical!). I saved the web page, stripped the <o:p></o:p> tags, and then tried again with the local file. This time there was no parse error, so it looks like the problem is MS, as usual. Perhaps the fix would be to ignore unknown tags rather than throwing an error.
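A minimal sketch of what "ignore unknown tags" could look like, written in Python purely for illustration. It is not the save-as-ebook code: the `KNOWN_TAGS` allowlist, the `TolerantFilter` class and the `drop_unknown_tags` helper are all hypothetical names. The idea is to re-emit tags the converter knows about, silently drop the rest, and always keep the text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; a real converter would need a much longer one.
KNOWN_TAGS = {"p", "div", "span", "a", "b", "i", "em", "strong", "h1", "h2"}


class TolerantFilter(HTMLParser):
    """Re-emit known tags, drop unknown tags, always keep the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attributes are dropped for brevity.
        if tag in KNOWN_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is kept no matter which tag it sat in; re-escape it for output.
        self.out.append(escape(data, quote=False))


def drop_unknown_tags(html: str) -> str:
    parser = TolerantFilter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(drop_unknown_tags("<p>Hello<o:p></o:p> world</p>"))
# prints: <p>Hello world</p>
```

The design choice here is that an unrecognised tag costs only the tag itself, never the surrounding text, which is the behaviour the suggestion above asks for.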

malcb commented Jan 3, 2022

The same parse error also occurs when the web page itself has errors. These can be invisible errors, such as missing closing tags, corrupt tags, or similar, which the browser recovers from so that the page still renders OK. I think the browser just ignores the error, so the text still displays and the error is invisible, but the parser in save-as-ebook throws out the text, so the ebook doesn't match the web page.
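To illustrate the difference between a forgiving browser and a strict parser, here is a small Python sketch. It assumes, without confirmation from this thread, that the failure resembles what a strict XML parser does with such markup; the `is_well_formed` helper and the sample fragments are made up for the example and have nothing to do with the extension's actual code.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = {
    "clean": "<div><p>Hello</p></div>",
    # The o: prefix is never declared, so a strict parser rejects it.
    "o:p tag": "<div><p>Hello<o:p></o:p></p></div>",
    # <p> is never closed; browsers recover from this, strict parsers do not.
    "missing close": "<div><p>Hello</div>",
}

for name, fragment in samples.items():
    print(f"{name}: {is_well_formed(fragment)}")
# prints:
# clean: True
# o:p tag: False
# missing close: False
```

A browser would display "Hello" for all three fragments, which is why errors like these stay invisible until a stricter tool reads the same page.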

I have a workaround for anyone having similar problems. The Rewriter extension lets you set up rules for rewriting a page, and these rules apply to the HTML too. Rewriter seems to affect the whole page, not just the visible text. Hence Rewriter can be set to remove all <o:p> and </o:p> tags so that save-as-ebook will work OK (unless the page has other errors, which is how I found that this was another problem). Rewriter can be restricted to specific URLs, so you can limit the effects to just where you need them. The matching and replacing use regex, so it is very powerful if need be, but replacing just the o:p tags does not need anything complicated.
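For reference, a regex along these lines should be enough to match both the opening and the closing o:p tags. The thread does not show the actual Rewriter rule, so the pattern and the `strip_o_p_tags` helper below are assumptions, shown in Python only so the pattern can be tried out before putting it into a rule.

```python
import re

# Matches <o:p>, </o:p>, <o:p/> and <o:p ...attributes...>, in any letter case.
# Hypothetical pattern; adapt it to whatever syntax the rewriting tool expects.
O_P_TAG = re.compile(r"</?o:p\b[^>]*>", re.IGNORECASE)


def strip_o_p_tags(html: str) -> str:
    """Remove o:p tags and leave everything else untouched."""
    return O_P_TAG.sub("", html)


print(strip_o_p_tags('<p class="MsoNormal">Text<o:p></o:p></p>'))
# prints: <p class="MsoNormal">Text</p>
```

Only the tags themselves are removed. Any text sitting between an opening and a closing o:p tag stays in the page.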
