Description of problem:
I'm used to putting spaces around pipes to improve readability, but lately this has been breaking. A command like:

[nim@ulysse nim]$ ps auwx | grep tomcat

returns:

bash: grep: command not found

The problem is linked to the key sequence I use to type pipe-space with my keyboard layout: pipe is AltGr+6, so the following space is often typed as AltGr+space, which is mapped to nobreakspace (U+00A0). There is no reason for a shell to distinguish between space and nobreakspace (unlike a text editor), so bash should be taught to treat all spaces the same way.

How reproducible:
Always

Steps to Reproduce:
1. Set fr-latin9 as the default keyboard layout.
2. Type a command involving pipe-space, keeping AltGr pressed when space is pressed.
3. Watch bash complain.

Additional info:
bash-2.05b-25; probably always been broken.
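A minimal sketch to reproduce this without an fr-latin9 keyboard. It assumes a POSIX `sh` and a UTF-8 terminal, where U+00A0 is the byte pair 0xC2 0xA0 (in a Latin-9 terminal it is the single byte 0xA0); `echo hello | cat` stands in for `ps auwx | grep tomcat` so it runs anywhere. The error message is misleading because the invisible no-break space is part of the command name the shell reports.

```shell
#!/bin/sh
# U+00A0 NO-BREAK SPACE as UTF-8 bytes (use '\240' for a Latin-9 terminal).
nbsp=$(printf '\302\240')

# What AltGr+6 followed by AltGr+space produces: "|" then NO-BREAK SPACE.
cmd="echo hello |${nbsp}cat"

# The shell only splits words on ordinary space and tab, so the word after
# the pipe is "<NBSP>cat", which is not a command in PATH.
sh -c "$cmd" 2>/dev/null
status=$?
echo "exit status: $status"   # 127 is the shell's "command not found" status
```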
See the bash man page, in particular the section about the IFS variable.
So what? Is there a good reason nobreakspace should not be added to the default IFS value? I really do not want to tweak IFS on dozens of systems, so unless there is a very good reason against it, nobreakspace should be in IFS by default.
Yes: it can be a filename character.
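To make the filename point concrete, a small sketch (POSIX `sh` and UTF-8 encoding assumed; the file names are made up for illustration) of how unquoted expansion splits fields with the default IFS: a name containing U+00A0 stays one field, while one containing an ordinary space becomes two.

```shell
#!/bin/sh
# Default IFS is <space><tab><newline>; NO-BREAK SPACE is an ordinary character.
nbsp=$(printf '\302\240')

with_nbsp="report${nbsp}2004.txt"   # hypothetical file name containing U+00A0
with_space="report 2004.txt"        # hypothetical file name containing a space

# Unquoted expansions undergo field splitting on IFS.
set -- $with_nbsp
nbsp_fields=$#     # one field: the name survives intact
set -- $with_space
space_fields=$#    # two fields: "report" and "2004.txt"
echo "nbsp: $nbsp_fields field(s), space: $space_fields field(s)"
```

Adding U+00A0 to the default IFS would make the first case behave like the second.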
I'd agree in principle, except that I've yet to find any app that uses nobreakspace in file names, and users certainly don't. I have always had to escape or quote file names containing spaces, because everyone uses the normal space in them. So I really don't see it as an IFS showstopper. When it's so easy to insert a nobreakspace after a pipe on some layouts, it's a shame it's not in IFS. In text editors the distinction at least makes sense, because one *sees* the nobreakspace.
The initial value of IFS is mandated by POSIX. It sounds like your real problem is that the font in whichever terminal you use does not display the non-breaking space distinctly (as a rotated '[' or whatever).
Seeing nobreakspaces in the shell might help, but is it really possible? It can't be done by changing fonts: people will use whatever font they like (Vera, corefonts...), and showing them is application-dependent (do we want to see nobreakspaces in HTML pages, for example?). I don't suppose whoever wrote the POSIX entry thought about nobreakspace. Is it even available outside UTF-8/Latin-9? I'd hate to be stuck forever because no one really thought of it and no one wants to revisit what was written long ago.
I'm not sure whether it's a bug in urw-fonts. What is the correct glyph to display for AltGr+space? I tried xterm with the fixed font and it has the same problem. Owen, any comment on this issue?
No-break space should show as a space; if it displayed as anything else, it wouldn't work for text files. I don't think there is anything to do here except change the AltGr+space keyboard mapping so it doesn't produce a no-break space. (That mapping doesn't seem very useful to me compared to the obvious problem here.) Reassigning to XFree86; Mike Harris will probably want a bug filed at bugs.xfree86.org.
nobreakspace is *very* useful in locales whose typographic rules require a space before punctuation marks like ':' (French, for example, as here). Without it one cannot do any serious editing, because most or all text apps out there will happily wrap the line just before a closing ':' or '»' (really, even MS Word had to add nobreakspace access, because despite all the automated workarounds in its code people still have to handle it manually). The problem here is that bash follows POSIX, and POSIX clearly didn't think about nobreakspace. Can't we add the working behaviour to bash, at least in its non-strict POSIX mode? Or ask whichever organization handles POSIX or the LSB for their thoughts on this? All the workarounds proposed so far involve breaking one class of app or another, when the core problem is really in bash's defaults.
Think about what you're asking: that a non-breaking space character, i.e. one that should *not* be used to delineate separate words, be used as a marker for splitting words from each other. It's entirely clear to me that this would be the wrong thing to do, and that bash is doing exactly the right thing in this case.
Well, I'd agree with you *if* someone somewhere were taking advantage of this *and* the shell provided some sort of visual hint that a non-breaking space is being used. Right now both conditions are false: no app I know of uses it the way you suggest, and when it is used otherwise (i.e. as a separator, which does happen) the shell issues a very confusing error message. (Really, what would *you* do if your shell told you "bash: grep: command not found", or you found that in some server logs? Cron jobs and such do use pipes.) People do not think of non-breaking spaces in shells. Hell, even if an app somewhere decided to get smart and replace spaces with non-breaking spaces in filenames, it would be shot down by the number of disgruntled users who could no longer figure out how to type the file names in their shells (because a space and a non-breaking space look the same in a terminal, unlike in a word processor). Really, from a usability point of view, a shell should not differentiate between space and non-breaking space, or if it does, it should provide some kind of bloody visual hint. The whole default mandated by POSIX shows its writers didn't want to get into a makefile-style whitespace mess: tabs are treated exactly like spaces, and non-breaking spaces should be too.
Actually, looking through the POSIX shell grammar, it implies that tokens should be delimited based on isspace()/LC_CTYPE. In non-C locales isspace(U+00A0) should be true, so perhaps in theory this *is* a bash bug. In practice, though, I suspect the chance of things changing here is minuscule, and it's certainly something where we would be very hesitant to deviate from upstream. (Note that the default IFS contains only U+0020, tab and newline, so U+00A0 won't be interchangeable with space in all circumstances, just when parsing input into tokens.) So, if the nobreakspace key combination is desired, I suspect the correct resolution is to move this back to bash and WONTFIX, though you could always try to get the change upstream.
The Single UNIX Specification, and POSIX the last time I read it, say IFS starts as: the space character, the tab character, and the newline character. No bash bug here. Even if you thought that changing bash would make things work: what about all the other shells?
FWIW, shells are not used only by users. It is perfectly OK to generate a string of random characters other than NUL, slash, space, tab and newline, and use it in a shell script as a file name. Such a script would break if IFS were changed, unless every use of the variable is quoted ("$key"). The original problem can be worked around with alias ' grep'=grep (where the second space is non-breaking), if there are only a few commands often used after | (which is true in my case).
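A sketch of that alias workaround generalized to a few commands, e.g. for `~/.bashrc`. The command list is only an example, and a UTF-8 terminal is assumed. Since bash does not expand aliases in non-interactive shells unless `expand_aliases` is set, this does nothing for scripts or crontabs.

```shell
#!/bin/sh
# U+00A0 NO-BREAK SPACE in UTF-8; use '\240' in a Latin-9 terminal.
nbsp=$(printf '\302\240')

for c in grep less sort wc; do
  # Defines an alias named "<NBSP>grep" (etc.), so that
  # "ps auwx |<NBSP>grep tomcat" resolves to plain grep at the prompt.
  alias "${nbsp}${c}=${c}"
done
```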
* Note that tokenization *does not* use IFS. Try:

    IFS="@ " bash -c "ls@|@cat"

  It won't work. If you read the appropriate section of POSIX, it strongly implies that LC_CTYPE does affect tokenization. I'm not saying that bash's behavior should change (it might be a noticeable speed hit, it would involve changing lots of code, and it might even be a security concern), but I do think that bash's current behavior doesn't comply with the letter of POSIX.
* Any time you use filenames in a shell script, you should quote them so the script works when they contain (normal) spaces. If your script doesn't handle filenames with spaces, it hardly matters that it also doesn't handle filenames with non-breaking spaces.
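The distinction drawn above, sketched in POSIX `sh`: tokenization ignores IFS, field splitting of unquoted expansions uses it, and quoting turns field splitting off. This is an illustration of the three rules, not a conformance test.

```shell
#!/bin/sh
# 1. Tokenization ignores IFS: "@" never separates a command from its
#    arguments, so "ls@" and "@cat" are looked up as (nonexistent) commands.
IFS='@ ' sh -c 'ls@|@cat' 2>/dev/null
tokenize_status=$?      # 127: command not found

# 2. Field splitting does use IFS, but only on results of unquoted expansions.
old_ifs=$IFS
IFS='@'
v='a@b@c'
set -- $v
split_fields=$#         # three fields: "a", "b", "c"
IFS=$old_ifs

# 3. Quoting suppresses field splitting, which is why scripts should quote
#    file names.
name='my file.txt'
set -- "$name"
quoted_fields=$#        # one field
echo "$tokenize_status $split_fields $quoted_fields"
```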
By the way, the problem with aliasing is that it breaks as soon as one uses a new command or types two nobreakspaces. So it helps a bit, but not much. I'll post the bug URL to gnu.bash.bug so hopefully the bash maintainers can chime in.
Map non-breaking space to a regular space in .inputrc. As has been pointed out, this is an interactive user-mistyping type of bug, not a bug in semantics or behavior. It should be easy to fix by having readline turn a non-breaking space into a standard space.
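A possible `~/.inputrc` fragment along these lines. This is an untested sketch: readline accepts `\nnn` octal escapes in key sequences and `"keyseq": "text"` macros, but which byte sequence applies depends on the terminal's encoding. Because a personal `~/.inputrc` is read instead of `/etc/inputrc`, the system file is included first.

```
$include /etc/inputrc

# UTF-8 terminals send U+00A0 as the bytes 0xC2 0xA0: insert a plain space instead.
"\302\240": " "

# Latin-1/Latin-9 terminals send the single byte 0xA0; use this line there instead.
# "\240": " "
```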
Another data point: the libc locale definition for LC_CTYPE does *not* include U+00A0 (non-breaking space) in the space class. (From my reading of ISO C, it is not really obvious whether it belongs there or not.) I have filed a defect report against the POSIX standard regarding the locale-dependent definition of token recognition. Brian's solution looks nice, but I wouldn't want it in the default install. It's as if readline were silently rewriting 'a' to 'b'.
[About using .inputrc] Except for the wonderful world of shell scripts and crontabs. The original typing might be interactive, but the consequences may show up much later on another computer, in a non-interactive environment. (Though as workarounds go, this is certainly one of the best suggested yet.)
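For the scripts-and-crontabs case, a small checker may be easier than relying on the editor to show the character. A sketch of a hypothetical `find_nbsp` shell function, assuming UTF-8 files:

```shell
#!/bin/sh
# find_nbsp FILE... (hypothetical helper): print the lines that contain a
# UTF-8 encoded U+00A0, i.e. the bytes 0xC2 0xA0. LC_ALL=C makes grep compare
# raw bytes; in valid UTF-8 that byte pair can only be a NO-BREAK SPACE.
# For Latin-1/Latin-9 files, search for the single byte '\240' instead.
find_nbsp() {
  LC_ALL=C grep -n -- "$(printf '\302\240')" "$@"
}
```

For example, `find_nbsp /etc/crontab ~/bin/*.sh` prints `file:line:text` for each offending line; with a single file, grep omits the file name.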
Shell scripts and crontabs are not typed interactively, they are typed into an editor (and, in the specific case presented here, into an editor running under X-windows on a graphical system). I don't think bash should ignore the differences between the two characters, and I don't think that editors should hide the values of the characters that users type. I do think that the purpose of interactive features in the shell is to make life easier for the end-user -- those who wish to enter non-breaking spaces into the command line may do so by prefixing them with C-v.
Sure. My (admittedly evil) user wish is for the shell to treat all unquoted/unescaped whitespace as a token separator, which is what 99.99% of users will want, I think. Whitespace differentiation is fine and has lots of uses in editing (where one can *see* the effects of differing whitespace, either directly or indirectly), but it has no place in the shell IMHO. As long as the shell relies on some sort of filtering *not* to feed it special whitespace, some poor user will find a way to feed it these chars. Because Unix is that flexible.
Nicolas, by using other <blank> characters you are actually asking for trouble. You type in a script, carefully test it, and once you are confident that it works, you distribute it/sell it/whatever. The poor recipient of your script will find out that it does not, in fact, work on his/her system. In fact, it won't even work on *your* system when run with LC_CTYPE=C (e.g. after editing /etc/rc.d/rc.sysinit).
Hey, I didn't say I actively *wanted* to do this; don't get me wrong. Just that it will happen in reality, and the shell bloody well should not misbehave when it encounters unusual whitespace. [And even if what you described happened to me, why should I feel ashamed? I know of at least one $wellknown $bigbucks product whose utilities crashed in any non-en_US locale. Of course this was not documented anywhere; one had to find a fellow non-American to learn the workaround.] Fact of life: what's "simple" for users is not always simple for apps. Apps usually have to adapt, because users never will (the user is evil and dumb, unless you are the user :). Here the simple, user-understandable rule is that whitespace acts as a shell separator unless quoted or escaped. We all know Unicode locales are the near future, and no-break space is generic enough to affect most locales. Or do we want to sprinkle every other script line with LANG=C, as I've seen Red Hat doing lately?
This is not a bug in XFree86 (or anything else). It's just simply a user doing something wrong, and getting something unexpected. The proper thing to do is don't do that. Closing bug as NOTABUG.
Did we get an answer to the POSIX defect report? The general opinion seemed to be that the specs are not crystal-clear on this point.
Not sure it should stay open (as mentioned above, it's not something where we would want to deviate from upstream bash), but definitely not an XFree86 issue.
XCU ERN 25 has been filed (see http://www.opengroup.org/austin/aardvark/finaltext/xcubug.txt), but expect it to take a few weeks. I have no idea whether it will be accepted; the standard is quite clear that LC_CTYPE should be honored. I hope we will at least get some sort of rationale.
Resolution of the POSIX defect report (XCU ERN 25, to separate tokens only by <space> or <tab> instead of <blank>):

> The standard clearly states these requirements, and conforming implementations
> must conform to this.
>
> The group feels there is no defect here.

All I can say is I'm fine with the current bash behavior, POSIX or not.
I won't say I agree with the POSIX guys, but it's their call -> closing