Lost characters with complex emoji in command history (Grapheme Clustering) #6364

Open
peterjc opened this issue Nov 6, 2024 · 2 comments
Labels
bug Something isn't working

Comments

peterjc commented Nov 6, 2024

What Operating System(s) are you seeing this problem on?

macOS

Which Wayland compositor or X11 Window manager(s) are you using?

No response

WezTerm version

20240203-110809-5046fc22

Did you try the latest nightly build to see if the issue is better (or worse!) than your current version?

No, and I'll explain why below

Describe the bug

Using most emoji and a sample of double-width Asian characters in commands does not seem to be a problem for the history in WezTerm.

However, composite emoji that use a zero-width joiner (ZWJ), like "pirate flag" ("🏴‍☠️") or "farmer" ("🧑‍🌾"), have a display width (2) which is smaller than the sum of their parts (2 + 0 + 2 = 4). That mismatch is likely the cause of the glitch.

The pirate flag is black-flag, ZWJ, skull-and-crossbones (plus a trailing variation selector, U+FE0F, in its fully-qualified form). See https://en.wikipedia.org/wiki/Zero-width_joiner for background on ZWJ, including non-emoji real language examples.

See also https://mitchellh.com/writing/grapheme-clusters-in-terminals and https://contour-terminal.org/internals/text-stack/#other-terminal-emulator-related-challenges which notes mismatches in wcwidth() and utf8proc_charwidth.

See also https://github.com/contour-terminal/terminal-unicode-core for the "Terminal Unicode Core Specification", which is trying to standardize grapheme cluster behaviour. I see from #4320 that WezTerm should be following that.
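To make the width arithmetic above concrete, here is a minimal Python sketch that lists the code points of the two example emoji and sums a per-code-point width. The `naive_width` helper is my own approximation built on `unicodedata` (zero for combining and format characters, two for East Asian Wide or Fullwidth, one otherwise). It is not what bash, readline, or any particular `wcwidth()` actually computes, and real implementations can give different numbers.

```python
import unicodedata


def naive_width(ch):
    """Approximate column width of ONE code point (an assumption, not a real wcwidth())."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ (U+200D)
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth
    return 1


def summed_width(text):
    """Width a program would get by adding up code points, ignoring grapheme clusters."""
    return sum(naive_width(ch) for ch in text)


FARMER = "\U0001F9D1\u200D\U0001F33E"         # adult, ZWJ, ear of rice
PIRATE_FLAG = "\U0001F3F4\u200D\u2620\uFE0F"  # black flag, ZWJ, skull and crossbones, VS16

for name, text in (("farmer", FARMER), ("pirate flag", PIRATE_FLAG)):
    parts = [f"U+{ord(ch):04X}={naive_width(ch)}" for ch in text]
    print(name, parts, "sum:", summed_width(text))
```

A terminal that does grapheme clustering draws each of these as one cluster that is 2 columns wide, so any program that adds up widths per code point ends up with a larger number than the terminal used.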

To Reproduce

Running with no user configuration on macOS:

  1. start wezterm from finder.
  2. press Apple+EQUALS a dozen times to enlarge the font, and resize window accordingly
  3. Run the following:
(original prompt) $ export PS1="$ "
$ clear
$ uname -rsm
Darwin 21.6.0 x86_64
$ echo "[AB]"
[AB]
$ echo "[🏴‍☠️]"
[🏴‍☠️]
$

In the above the final command was typed using e, c, h, o, space, double-quote, open-square-bracket, ctrl+apple+space, search for pirate, select pirate flag, close-square-bracket, double-quote, enter.

[You can also use paste, or the system menu "Show emojis & symbols" but the above is simpler to describe reproducibly.]

So far so good, the pirate flag printed as expected with a width of two characters:

Screenshot 2024-11-06 at 10 02 31

Now press up to pull back the previous command echo "[🏴‍☠️]" from the history, again so far so good, the cursor is after the closing quote around the flag example:

Screenshot 2024-11-06 at 10 03 12

Now press up again to pull back the command echo "[AB]", and this time you should see:

$ echoAB]"
Screenshot 2024-11-06 at 10 05 19

Again the cursor is after the closing quote, but it is as if three characters are missing. The expected output is:

$ echo "[AB]"

I presume there is some cleverness not to redraw the first unchanged 7 characters echo "[ and only overwrite the changed part from 🏴‍☠️]" to AB]". The off-by-three would then fit with a width of 5, rather than the displayed 2, being wrongly used for the pirate flag?
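The guess above can be checked with a small simulation. This is only a sketch of the hypothesis, not of what readline really does. It assumes the shell moves the cursor left by a relative amount computed from its own idea of the flag's width, writes the new tail, and erases to the end of the line. The function name and the placeholder cell `F` are made up for illustration.

```python
PROMPT = "$ "
PREFIX = 'echo "['   # the 7 unchanged columns shared by both commands
OLD_TAIL = ']"'      # what followed the flag in the old command
NEW_TAIL = 'AB]"'    # what replaces the flag and its tail


def simulate(shell_flag_width, terminal_flag_width=2):
    """Return the row after the redraw, given the width the shell assumes for the flag."""
    fixed = len(PROMPT) + len(PREFIX)
    # Column where the terminal really left the cursor after the old command.
    real_cursor = fixed + terminal_flag_width + len(OLD_TAIL)
    # Column where the shell believes the cursor is.
    believed_cursor = fixed + shell_flag_width + len(OLD_TAIL)
    # The shell moves left to the first changed column, measured in its own model.
    move_left = believed_cursor - fixed
    start = max(real_cursor - move_left, 0)
    # One list entry per screen column; "F" stands in for each column of the flag.
    row = list(PROMPT + PREFIX + "F" * terminal_flag_width + OLD_TAIL)
    # Overwrite from where the cursor actually landed, then erase to end of line.
    return "".join(row[:start] + list(NEW_TAIL))


print(simulate(5))  # shell assumes 5 columns for the flag
print(simulate(2))  # shell agrees with the terminal
```

Under these assumptions a shell-side width of 5 moves the cursor three columns too far left, which swallows the space, quote and bracket after echo and reproduces the broken line shown above. A shell-side width of 2 gives the expected line.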

With a longer prompt, and a longer command, it is often nigh impossible to edit the command as one might expect.

Configuration

no config

Expected Behavior

I would expect that, even when using Unicode characters with ZWJ (grapheme clustering), moving through the command history would work without mispositioning the cursor.

Logs

No response

Anything else?

No response

@peterjc peterjc added the bug Something isn't working label Nov 6, 2024

peterjc commented Nov 8, 2024

I should clarify that this was using the elderly bash provided by Apple, and that it also happens on ARM.

peterjc commented Nov 8, 2024

Using the Apple provided zsh, this does not happen - while composing the command you don't see the composite character, but rather the three elements which make it up. That feels like a different issue.
