Description of problem:
I'm used to putting spaces around pipes to improve readability. This has been
broken in terms lately though. A command like :
[nim@ulysse nim]$ ps auwx | grep tomcat
bash: grep: command not found
The problem is linked to the key sequence I use to type pipe-space with my
keyboard layout : pipe is altgr+6 and thus space is often altgr+space which is
defined as nobreakspace.
Anyway there is no reason for a shell to distinguish between space and
nobreakspace (unlike a text editor) so bash should be taught to treat all spaces
the same way.
Steps to Reproduce:
1. set fr-latin9 as default keyboard layout
2. type a command involving pipe-space, keeping altgr pressed when space is pressed
3. watch bash complain
bash-2.05b-25, probably always been broken
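The failure can also be reproduced without the fr-latin9 layout by building the no-break space explicitly. This is only a minimal sketch: it assumes bash is on the PATH and that the terminal sends U+00A0 as the UTF-8 bytes 0xC2 0xA0 (octal 302 240). The variable names are made up for illustration.

```shell
# U+00A0 encoded as UTF-8 (octal 302 240); built with printf so the
# invisible character is explicit in the script.
nbsp=$(printf '\302\240')

# The character right after the pipe is a no-break space, not a plain
# space, so bash looks for a command whose name starts with those bytes.
status=0
LC_ALL=C bash -c "echo tomcat |${nbsp}grep tomcat" 2>/dev/null || status=$?

# 127 is the shell's "command not found" exit status.
printf 'exit status: %s\n' "$status"
```

The stderr output is discarded here; without the redirection bash prints the confusing "command not found" message quoted in the report.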
See the bash man page, in particular the section about the IFS variable.
So what ?
Is there a good reason nobreakspace should not be added to the default IFS value ?
I really do not want to tweak IFS on dozens of systems unless there is a very
good reason not to add nobreakspace to IFS by default
Yes: it can be a filename character.
I'd agree with the thought. Except I've yet to find any app that uses
nobreakspace in file names and users certainly do not.
I can't remember ever not having to escape/quote file names with spaces, because
everyone uses normal space in them.
So I really don't see it an IFS showstopper. When it's so easy to insert
nobreakspace after a pipe on some layouts, it's a shame it's not in IFS. At
least in text editors it makes sense, because one *sees* the nobreakspace.
The initial value of IFS is mandated by POSIX.
Sounds like your problem is with the font used in whichever terminal you have
not displaying the non-breaking character correctly as a rotated '[' or whatever.
Seeing nobreakspaces in shell might help. But is it really possible ? It can't
be done changing fonts - people will use whatever font they like (vera,
corefonts...) and showing them is app dependent - do we want to see
nobreakspaces in html pages for example ?
I don't suppose whoever wrote the POSIX entry thought about nobreakspace. Is it
even available without UTF8/latin9 ? I'd hate to be stuck forever because no one
really thought of it and no one wants to discuss what was written long ago.
I'm not sure if it's a bug in urw-fonts. What is correct to display for
altgr+space? I have tried xterm with fixed font, it also has this problem!
Owen: any comment about this issue?
No-break-space should show as a space; if it displayed anything else,
it wouldn't work for text files. I don't think there is anything
to do here except to change the AltGR+space keyboard mapping
not to produce a no-break-space. (This doesn't seem like a very
useful thing to me as compared to the obvious problem here.)
Reassigning to XFree86 - Mike Harris will probably want a bug filed
nobreakspace is *very* useful in locales where typographic rules require spaces
before punctuation marks like : (ie in french for example like here). Without it
one can not do any serious editing because most/all the text apps out there will
consider it very smart to break the line before a trailing : or » (really, even MS word had
to add a nobreakspace access because despite all the automated workarounds in
its code people still have to handle it manually).
The problem here is bash follows posix and posix clearly didn't think about
nobreakspace. Can't we add the working behaviour to bash, at least in its
non-strict posix mode ? Or ask whatever org handles Posix or LSB their thoughts
on this ?
All the various workarounds proposed so far involved killing one class of app or
another, when the core problem is really in bash defaults.
Think about what you're asking:
That a non-breaking space character, i.e. one that should *not* be used to
delineate separate words, be used as a marker for splitting words from each other.
It's entirely clear to me that it would be the wrong thing to do, and that bash
is doing exactly the right thing in this case.
Well I'd agree with you *if* someone somewhere was taking advantage of this
*and* the shell provided some sort of visual hint a non-breaking space is used.
Right now both conditions are false : no app I know of uses it the way you
suggest, and when it's used otherwise (ie as a separator, which does happen) the
shell will issue a very confusing error message (really what would *you* do if
your shell told you bash: grep: command not found or you found it in some
server logs - cron jobs and such do use pipes)
People do not think of non-breaking spaces in shells. Hell even if an app
somewhere decided to get smart and replace spaces with non-breaking spaces in
filenames it'd be shot down by the number of disgruntled users that could no
longer figure how to type the file names in their shells (because a space and a
non-breaking space look the same in a term unlike a text processor).
Really from a usability POV, a shell should not differentiate between
space/non-breaking space. Or if it does provide some kind of bloody visual hint.
But the whole default as mandated by posix shows its writers didn't want to get
into makefile-style whitespace mess - tabs are treated exactly like spaces, as
non-breaking spaces should.
Actually, looking through the POSIX shell grammar, it implies that the
delimitation of tokens should be done based on isspace()/LC_CTYPE.
And in non-C locales, isspace(U+00A0) should be true; so perhaps
in theory this *is* a bash bug. In practice though, I suspect the chance
of things changing here is minuscule. It's certainly something where
we would be very hesitant to deviate from upstream.
(Note that the default IFS only contains U+0020, so U+00A0
won't be interchangeable with space in all circumstances, just when
parsing input into tokens.)
So, if the nobreakspace key combination is desired, I suspect that the
correct resolution is to move it back to bash and WONTFIX. Though
you could always try to get the change upstream.
Single UNIX, and POSIX last time I read it, says IFS starts as: the space
character, the tab character, and the newline character.
No bash bug here. Even if you thought that changing bash would make things
work: what about all the other shells?
FWIW, shells are not used only by users. It is perfectly OK to generate a string
of random non-(0, slash, space, tab, newline) characters and use them in a shell
script for a file name. Such script would break by changing IFS if all uses of
the variable were not quoted ("$key").
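To make the quoting point concrete, here is a small sketch of what an unquoted expansion would do if the no-break space were added to IFS. It is hypothetical: the key value, the count_args helper and the modified IFS are all invented for the illustration, and the IFS change is confined to a subshell.

```shell
nbsp=$(printf '\302\240')          # U+00A0 as UTF-8 bytes
key="part1${nbsp}part2"            # hypothetical generated file name

# Helper (made up for this sketch): print how many words it received.
count_args() { echo "$#"; }

# Hypothetical IFS with the no-break space bytes added to the blank.
unquoted=$(IFS=" ${nbsp}"; count_args $key)
quoted=$(IFS=" ${nbsp}"; count_args "$key")

printf 'unquoted: %s word(s), quoted: %s word(s)\n' "$unquoted" "$quoted"
```

The quoted expansion always stays one word; the unquoted one is split into several, and the exact count depends on whether the shell treats IFS bytewise or as multibyte characters.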
The original problem can be worked around by
alias ' grep'=grep
(where the second space is non-breaking), if there are only a few commands
often used after | (which is true in my case).
* Note that tokenization *does not* use IFS.
Try IFS="@ " shell -c "ls@|@cat"
It won't work. If you read the appropriate section of POSIX,
it strongly implies that LC_CTYPE does affect tokenization.
I'm not saying that bash's behavior should change - it might
be a noticeable speed hit, it would involve changing lots of
code, and it might even be a security concern; but, I do think
that bash's current behavior doesn't comply with the letter
* Any time you use filenames in a shell script,
you should generally quote them so it works if you have (normal)
spaces in them. If your shell script doesn't handle filenames
with spaces, it doesn't really matter if it also doesn't handle
filenames with non-breaking spaces in them.
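The first point above (tokenization does not consult IFS) can be checked with a variation on the "ls@|@cat" test. This sketch sets IFS inside a subshell instead of through the environment, because a shell may reset IFS at startup. The names literal, expanded and v are made up for the example.

```shell
# Literal text: the parser sees one word "hi@there" even though "@" is in IFS.
literal=$(IFS='@ '; eval 'echo hi@there')

# Result of an unquoted expansion: field splitting applies, so "@" separates.
expanded=$(IFS='@ '; v='hi@there'; echo $v)

printf 'literal:  %s\nexpanded: %s\n' "$literal" "$expanded"
```

IFS only affects the splitting of expansion results (and read), which is why adding a character to IFS would not by itself make it a token separator on the command line.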
Btw the problem with the aliasing is it breaks as soon as one tries to use a new
command or type two nobreakspaces. So it's a bit helpful, but not much.
I'll post the bug url to gnu.bash.bug so hopefully bash maintainers can chirp in.
Add non-breaking space to the .inputrc.
As has been pointed out, this is an interactive user-mistyping type of bug,
not a bug in semantics or behavior.
It should easily be fixable by having non-breaking space produce standard
space from within readline.
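One possible shape for such a binding, as an untested sketch: it assumes a UTF-8 terminal where altgr+space arrives as the bytes 0xC2 0xA0, and it uses the bash bind builtin rather than editing .inputrc directly (the same quoted string could go on a line of its own in ~/.inputrc). The variable name is invented for the example.

```shell
# Hypothetical readline binding: the key sequence is the UTF-8 encoding of
# U+00A0 written as octal escapes, and the macro it expands to is a plain space.
nbsp_binding='"\302\240": " "'

# bind only makes sense when line editing is active, so skip it in
# non-interactive shells.
case $- in
  *i*) bind "$nbsp_binding" ;;
esac
```

This only changes what is typed interactively; a no-break space pasted into a script or crontab from an editor is not affected.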
Another data point: the libc locale definition for LC_CTYPE
does *not* include U+00A0 (non-breaking space).
(From my reading of ISO C, it is not really obvious
whether it belongs there or not).
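Whether the current locale classifies U+00A0 as whitespace can be probed from the shell itself with a bracket-expression pattern. A rough sketch, assuming a UTF-8 encoding of the character; the answer depends on the locale and on the C library, so no particular result is claimed here.

```shell
nbsp=$(printf '\302\240')   # U+00A0 as UTF-8 bytes

# [[:space:]] is matched according to LC_CTYPE of the running shell.
case $nbsp in
  [[:space:]]) nbsp_is_space=yes ;;
  *)           nbsp_is_space=no ;;
esac

printf 'locale treats U+00A0 as space: %s\n' "$nbsp_is_space"
```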
I have filed a defect report against the POSIX standard regarding the
locale-dependent definition of token recognition.
Brian's solution looks nice, but I wouldn't want it on default
install. It's as if readline were silently rewriting 'a' to 'b'.
[ about using .inputrc ]
Except for the wonderful world of shell scripts and crontabs.
The original typing might be interactive, but the consequence may also manifest
itself long after on another computer in a non-interactive environment.
(though as workaround go this is certainly one of the best suggested yet)
Shell scripts and crontabs are not typed interactively, they are typed into an
editor (and, in the specific case presented here, into an editor running under
X-windows on a graphical system). I don't think bash should ignore the
differences between the two characters, and I don't think that editors should
hide the values of the characters that users type.
I do think that the purpose of interactive features in the shell is to make
life easier for the end-user -- those who wish to enter non-breaking spaces
into the command line may do so by prefixing them with C-v.
Sure. My (admittedly evil) user wish is to have the shell treat all non
quoted/escaped whitespace as a token separator which is what 99,99 % of users
will want I think.
Whitespace differentiation is ok and has lots of usages in editing (where one
can *see* the differing whitespace effects either directly or indirectly) but it
has no place in the shell IMHO. As long as the shell relies on some sort of
filtering *not* to feed it special whitespace some poor user will find a way to
feed it these chars. Because unix is that flexible.
Nicolas, by using other <blank> characters you are actually asking for trouble.
You type in a script, carefully test it and after you are confident that it works,
you distribute it/sell it/whatever. The poor recipient of your script will find
out that it does not, in fact, work, on his/her system. In fact, it won't work
on *your* system when run with LC_CTYPE=C. (e.g. editing /etc/rc.d/rc.sysinit).
Hey, I didn't say I actively *wanted* to do this. Don't get me wrong. Just that
this will happen in reality and the shell bloody well should not misbehave when
it encounters unusual whitespace.
[ and even if what you described happened to me why should I feel ashamed ? I
know at least one $wellknown $bigbucks product whose utilities crashed on any
non en_US locale. Of course this was not documented anywhere, one had to find a
fellow non-american to learn the workaround ]
Fact of life : what's "simple" for users is not always so for apps. Apps usually
have to adapt, because users won't ever (the user is evil and dumb - unless you
are the user:)
Here the simple user-understandable rule is whitespace acts as shell separator
We all know unicode locale is the near-future, and non-break-space is generic
enough to affect most locales. Or do we want to sprinkle every other script line
with LANG=C like I've seen RedHat doing lately for example ?
This is not a bug in XFree86 (or anything else). It's just simply a user
doing something wrong, and getting something unexpected. The proper thing
to do is don't do that.
Closing bug as NOTABUG.
Did we have the Posix defect report answer ?
The general opinion seemed to be the specs are not crystal-clear on this point
Not sure it should stay open (as mentioned above, it's not something
where we would want to deviate from upstream bash), but definitely not
an XFree86 issue.
XCU ERN 25 has been filed (see
but expect that to last a few weeks.
I have no idea whether it will be accepted, the standard is quite clear
that LC_CTYPE should be honored. I hope we will get some sort of rationale
Resolution of the POSIX defect report (XCU ERN 25, to separate tokens
only by <space> or <tab> instead of <blank>):
> The standard clearly states these requirements, and conforming implementations
> must conform to this.
> The group feels there is no defect here.
All I can say is I'm fine with current bash behavior, POSIX or not.
I won't say I agree with the POSIX guys, but it's their call -> closing