[BUGFIX] Respect language based style names on reading Word files #2597
Microsoft Office saves Office documents with language-based style
mappings for the default styles. For example, a German-language version
of Word writes different style entries to `word/styles.xml` in the
container archive (`*.docx`) than an English-language version does.
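A minimal Python sketch, not taken from the change itself, can illustrate the difference described above. The two XML fragments and their `w:styleId` values (`berschrift1`, `Heading1`) are assumptions chosen for illustration, and the function name `read_style_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative fragments only: the attribute values are assumptions,
# not copied from the pull request.
GERMAN_STYLES = f"""
<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="berschrift1">
    <w:name w:val="heading 1"/>
  </w:style>
</w:styles>
""".strip()

ENGLISH_STYLES = f"""
<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="Heading1">
    <w:name w:val="heading 1"/>
  </w:style>
</w:styles>
""".strip()


def read_style_entries(styles_xml: str) -> list:
    """Return (styleId, name) pairs for every <w:style> element."""
    root = ET.fromstring(styles_xml)
    entries = []
    for style in root.iter(f"{{{W_NS}}}style"):
        style_id = style.get(f"{{{W_NS}}}styleId", "")
        name_el = style.find(f"{{{W_NS}}}name")
        name = name_el.get(f"{{{W_NS}}}val", "") if name_el is not None else ""
        entries.append((style_id, name))
    return entries


# Same native name, different language-based alias.
print(read_style_entries(GERMAN_STYLES))
print(read_style_entries(ENGLISH_STYLES))
```

In this sketch both fragments share the same `<w:name />` value, while the `w:styleId` differs by language.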
The value of `<w:name />` defines the internal native code identifier,
whereas the `w:styleId` attribute on the outer `<w:style />` tag
describes the virtual or alias name.

Later parsing of the document structure, for example the paragraphs,
references the alias name (`w:styleId`) of a style. The reader code
matches style names with hardcoded, case-insensitive RegEx patterns
based on the English variant (`Header\s+d`), which do not match the
language-based names at all.
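The mismatch can be reproduced with a small Python sketch. The pattern below is a hypothetical stand-in for an English-only heading RegEx, not the reader's actual expression, and the style labels are assumed example values.

```python
import re
from typing import Optional

# Hypothetical English-only pattern (an assumption for this sketch),
# matched case-insensitively like the hardcoded patterns described above.
ENGLISH_HEADING = re.compile(r"heading\s*(\d+)", re.IGNORECASE)


def heading_depth(style_label: str) -> Optional[int]:
    """Return the heading level if the label matches the English pattern."""
    match = ENGLISH_HEADING.fullmatch(style_label)
    return int(match.group(1)) if match else None


# English alias: matches.
print(heading_depth("Heading1"))
# Assumed German alias: does not match, so the heading is not recognised.
print(heading_depth("berschrift1"))
# Native <w:name /> value: matches regardless of the Word language.
print(heading_depth("heading 1"))
```

Matching against the native name rather than the alias is what makes the result independent of the Word language in this sketch.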
Therefore, this change contains multiple tasks:
aliases. Along with this, a corresponding lookup method is added.
hardcoded language RegEx is needed to be used.
wordfile styles settings for all possible styles.
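The alias lookup mentioned in the task list could look roughly like the following Python sketch. The class `StyleAliasRegistry`, its method names, and the registered example values are all hypothetical; the sketch only assumes that each alias (`w:styleId`) is stored together with its native name (`<w:name />`).

```python
import re
from typing import Dict, Optional


class StyleAliasRegistry:
    """Hypothetical registry mapping w:styleId aliases to w:name values."""

    def __init__(self) -> None:
        self._name_by_alias: Dict[str, str] = {}

    def register(self, alias: str, name: str) -> None:
        # Aliases are stored lowercased; case-insensitive lookup is a
        # design choice of this sketch, not a documented requirement.
        self._name_by_alias[alias.lower()] = name

    def find_name(self, alias: str) -> Optional[str]:
        """Look up the native style name for a language-based alias."""
        return self._name_by_alias.get(alias.lower())

    def heading_depth(self, alias: str) -> Optional[int]:
        """Resolve the alias first, then inspect the native name only."""
        name = self.find_name(alias)
        if name is None:
            return None
        match = re.fullmatch(r"heading (\d+)", name)
        return int(match.group(1)) if match else None


registry = StyleAliasRegistry()
# Assumed example entries for a German and an English document.
registry.register("berschrift1", "heading 1")
registry.register("Heading1", "heading 1")
registry.register("Standard", "Normal")

print(registry.heading_depth("berschrift1"))
print(registry.heading_depth("Standard"))
```

Because the pattern is applied to the resolved native name, no language-specific expression is needed for the alias itself.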