Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[BUGFIX] Respect language based style names on reading Word files
Microsoft Office saves Office documents with language-based style mappings for the default styles. For example, a German version of Word writes the following to `word/styles.xml` in the container archive (*.docx):

```
<w:style w:type="paragraph" w:styleId="berschrift1">
    <w:name w:val="heading 1"/>
    ....
</w:style>
```

whereas an English version writes:

```
<w:style w:type="paragraph" w:styleId="Heading1">
    <w:name w:val="heading 1"/>
    ...
</w:style>
```

The value of `<w:name />` is the internal, native identifier of the style, whereas the `w:styleId` attribute on the outer `<w:style />` tag is the virtual or alias name. Later parts of the document structure, for example the paragraphs, reference a style by its alias (`w:styleId`).

The reader code matches style names with hardcoded, case-insensitive regular expressions based on the English variant (`Header\s+d`). Applied to a language-based alias, these expressions do not match at all.

This change therefore contains several tasks:

* An alias map is implemented and used to register title aliases, together with a corresponding lookup method.
* The lookup method is used to resolve the alias wherever the hardcoded English regular expression has to be applied.
* The alias names of all styles are gathered while reading the style settings of the Word file.
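The idea behind the alias map can be illustrated with a minimal sketch. This is not the reader code changed by this commit; it is a standalone Python example, and the names `build_alias_map`, `heading_depth`, `HEADING_RE` and the sample `STYLES_XML` are hypothetical. It assumes that the alias map keys each `w:styleId` to the value of its `<w:name />` and that the heading pattern is then applied to the resolved native name.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical sample: one German-style alias and one English-style alias.
STYLES_XML = f"""<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="berschrift1">
    <w:name w:val="heading 1"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Heading2">
    <w:name w:val="heading 2"/>
  </w:style>
</w:styles>"""

# Assumed pattern for the native heading name, e.g. "heading 1".
HEADING_RE = re.compile(r"heading\s+(\d+)", re.IGNORECASE)


def build_alias_map(styles_xml):
    """Map every w:styleId (alias) to its native <w:name w:val="..."/>."""
    root = ET.fromstring(styles_xml)
    aliases = {}
    for style in root.iter(f"{{{W_NS}}}style"):
        style_id = style.get(f"{{{W_NS}}}styleId")
        name = style.find(f"{{{W_NS}}}name")
        native = name.get(f"{{{W_NS}}}val") if name is not None else None
        if style_id and native:
            aliases[style_id] = native
    return aliases


def heading_depth(style_id, aliases):
    """Resolve the alias first, then match the language-neutral name."""
    native = aliases.get(style_id, style_id)
    match = HEADING_RE.fullmatch(native)
    return int(match.group(1)) if match else None


aliases = build_alias_map(STYLES_XML)
print(heading_depth("berschrift1", aliases))
print(heading_depth("Heading2", aliases))
```

Resolving the alias before matching keeps the pattern independent of the Word language version: both the German and the English style ID lead to the same native name. A style ID that is missing from the map is matched as-is in this sketch, so unknown styles simply yield no heading depth.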
Showing 3 changed files with 87 additions and 11 deletions.