Red Hat Bugzilla – Bug 1487615
bash fails to execute commands containing multibyte characters
Last modified: 2018-04-10 09:51:33 EDT
Description of problem:
Seen only on RHEL7.4 installed from scratch. When a command name contains a non-ASCII character (e.g. "é") and the executable is not found, bash prints the name in a non-human-readable form:

# é
bash: $'\303\251': command not found

Maybe this issue is not within bash, but I have not found the root cause yet. Unexpectedly, when a RHEL7.3 system is upgraded with bash from RHEL7.4 (or upgraded completely), the issue cannot be reproduced on the upgraded system.

Version-Release number of selected component (if applicable):
bash-4.2.46-28.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install a RHEL7.4 VM from scratch
2. Run "é"

Actual results:
# é
bash: $'\303\251': command not found

Expected results:
# é
bash: é: command not found

Additional info:
I'm not at all sure that Bash itself is at fault, for the following reasons:
- trying with zsh on RHEL7.4 doesn't reproduce the issue
- upgrading a RHEL7.3 system to RHEL7.4 doesn't reproduce the issue
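For readers unfamiliar with the $'...' notation in the actual results: the escaped name is the same two bytes as "é", written as octal escapes. The sketch below is an illustration only and is not part of the original report; the variable names are made up for the example. It decodes the escapes with printf and dumps the resulting bytes in hex.

```shell
# Decode the octal escapes that bash printed back into raw bytes.
# POSIX printf interprets \NNN octal escapes in its format string.
decoded=$(printf '\303\251')

# Dump the bytes in hex. UTF-8 encodes U+00E9 (é) as the bytes 0xC3 0xA9.
hex=$(printf '%s' "$decoded" | od -An -tx1 | tr -d ' \n')
echo "$hex"
```

So the message names the right command; it is only displayed byte by byte in escaped form instead of as one printable character.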
Created attachment 1322344 [details] Fix quoting for wide characters Backported patch from http://git.savannah.gnu.org/cgit/bash.git/diff/lib/sh/strtrans.c?h=devel&id=7610e0c52
Comment on attachment 1322344 [details]
Fix quoting for wide characters

> diff --git a/lib/sh/strtrans.c b/lib/sh/strtrans.c
> --- a/lib/sh/strtrans.c
> +++ b/lib/sh/strtrans.c
> @@ -208,6 +211,8 @@ ansic_quote (str, flags, rlen)
>    char *r, *ret, *s;
>    int l, rsize;
>    unsigned char c;
> +  size_t slen;
> +  DECLARE_MBSTATE;
>
>    if (str == 0 || *str == 0)
>      return ((char *)0);
> @@ -219,9 +224,11 @@ ansic_quote (str, flags, rlen)
>    *r++ = '$';
>    *r++ = '\'';
>
> -  for (s = str, l = 0; *s; s++)
> +  s = str;
> +  slen = strlen (str);
> +
> +  for (s = str; c = *s; s++)
>      {
> -      c = *s;
>        l = 1;  /* 1 == add backslash; 0 == no backslash */
>        switch (c)
>          {

I know it comes from the upstream commit, but why is the slen variable assigned strlen (str) if the value is not used anywhere? Also, the 's' variable is assigned twice in a row for no apparent reason. Could this suggest that the upstream commit was incomplete?
Created attachment 1324104 [details]
Fix quoting for wide characters

Besides the upstream patch from comment 2, I have also backported the patches below:
http://git.savannah.gnu.org/cgit/bash.git/diff/lib/sh/strtrans.c?id=ec860d767
http://git.savannah.gnu.org/cgit/bash.git/diff/lib/sh/strtrans.c?id=c920c360d
http://git.savannah.gnu.org/cgit/bash.git/diff/lib/sh/strtrans.c?id=be06f7783
http://git.savannah.gnu.org/cgit/bash.git/diff/lib/sh/strtrans.c?id=595e3e692
Comment on attachment 1324104 [details] Fix quoting for wide characters Looks good.
As part of the testing process, I decided to try to reproduce this bug with every single character up to the end of Unicode's Supplementary Multilingual Plane (sorry CJK!). Detailed results and the reproducer script are attached to this report as `unicode-results.tar.gz`.

As a summary, below is the number of characters that trigger the bug:

$ grep -c '\$' *.txt
bash-4.2.46-28.el7.txt:129407
bash-4.2.46-30.el7.txt:64051
bash-4.4.12-7.fc26.txt:48655

My understanding of these results:
1. The patch greatly improves the situation: after applying it, more than half of the previously affected characters no longer trigger the issue.
2. The patch doesn't bring bash 4.2 to the bash 4.4 level: there are over 15000 characters that trigger the bug in the former, but not in the latter.
3. bash 4.4 is far from ideal: almost 50000 characters still trigger the bug (that is more than 1/3 of all tested characters).

I am kind of lost here. On the one hand, I want to mark this issue as verified. The character from the original report no longer triggers it, the patch brings vast improvements, most of the affected characters are affected upstream as well, and the chances of running into this are minuscule anyway (who puts emoji in the names of their command line tools?). On the other hand, the issue is clearly not fixed.

Thoughts?
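A test of this kind can be sketched roughly as follows. This is not the attached reproducer script, which is not shown in this report; the function names utf8_encode, triggers_bug and scan_range are hypothetical, and the sketch assumes the shell under test runs in a UTF-8 locale (in the C locale, escaping non-ASCII bytes is expected behaviour rather than the bug).

```shell
# Print the UTF-8 encoding of one code point (decimal or 0x-prefixed hex).
utf8_encode() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ]; then
        set -- "$cp"
    elif [ "$cp" -lt 2048 ]; then
        set -- $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        set -- $(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
               $(( 0x80 | (cp & 0x3F) ))
    else
        set -- $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
               $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
    # Emit each byte through an octal escape in the printf format string.
    for byte in "$@"; do
        printf "\\$(printf '%03o' "$byte")"
    done
}

# Succeed if the shell in $1 prints the command name $2 in $'...' form.
triggers_bug() {
    "$1" -c "$2" 2>&1 | grep -qF "\$'"
}

# Count code points in the range $2..$3 for which the shell in $1 escapes
# the name. Surrogates (U+D800..U+DFFF) are skipped: they are not characters.
scan_range() {
    n=$2
    hits=0
    while [ "$n" -le "$3" ]; do
        if [ "$n" -lt 55296 ] || [ "$n" -gt 57343 ]; then
            if triggers_bug "$1" "$(utf8_encode "$n")"; then
                hits=$(( hits + 1 ))
            fi
        fi
        n=$(( n + 1 ))
    done
    echo "$hits"
}

# Example (not run here): scan_range /bin/bash 0xA1 0xFF
```

Starting the range above the ASCII block avoids shell metacharacters, which would produce syntax errors rather than "command not found" messages.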
Created attachment 1333720 [details] Test results for bash 4.2 before and after the patch, plus bash 4.4
Mirosław,

Thanks for the thorough tests. It looks like most of the characters for which bash prints the error string "$<unicode character>: command not found" do not have a corresponding Unicode character in the UTF-8 table. Also, in some cases bash prints a separate character '𘊴' (instead of $<unicode character>) in the error message for some of these Unicode characters.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0800