Thanks to "hiredman" (Kevin Downey?) on Matrix for initially reporting this yesterday. Coincidentally, I ran into it myself today. I'm attaching a test text file.

To reproduce the problem:

1. Save the test file as both "test.txt" and "test.1".
2. Open a blank document in a text editor with a search function (I'm using gedit).
3. Run `less test.1`, copy the text `--retry-delay`, and paste it into the blank document on one line.
4. Run `less test.txt`, copy the text `--retry-delay`, and paste it into the blank document on a second line.
5. Search for "--" using ASCII dashes (just hit the dash key with the keyboard layout set to US). Note that it is found in the second line, but not the first.
6. Copy one of each type of dash into some kind of character analyzer (I used https://apps.timwhitlock.info/unicode/inspect ). Note that the dash from `less test.1` is a Unicode hyphen (U+2010, UTF-8 E2 80 90), but the dash from `less test.txt` is an ASCII dash (U+002D, UTF-8 2D).

This affects real-world man pages, since less is the pager used for those. The test text is copied from the man page for curl. If you run `man curl`, find the same text (which is a bit hard, because the search function won't find it easily; I search for some text near it instead), copy the `--retry-delay` text from it, and paste it, you will again find it has been converted to Unicode dashes.

The source file in the curl source tree uses ASCII dashes; I have verified this. If you gunzip the installed /usr/share/man/man1/curl.1.gz file and open it with something other than less, e.g. directly with gedit, the dashes are ASCII dashes. But when the page is opened with `man curl`, the dashes are Unicode dashes.

I tried to reduce the test case to just a single dash character, or just two dash characters, but the bug doesn't reproduce in that case. The attached file seems to be the simplest reproducer.

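For anyone who would rather do the character check from step 6 locally than in a web tool, here is a minimal sketch, assuming Python 3.8 or later. The helper name `describe` and the sample strings are hypothetical stand-ins; paste in the text you actually copied from each `less` run.

```python
import unicodedata


def describe(text):
    """Return (char, code point, UTF-8 hex bytes, Unicode name) per character."""
    rows = []
    for ch in text:
        rows.append((
            ch,
            f"U+{ord(ch):04X}",                    # code point, e.g. U+2010
            ch.encode("utf-8").hex(" ").upper(),   # UTF-8 bytes, e.g. E2 80 90
            unicodedata.name(ch, "UNKNOWN"),       # official character name
        ))
    return rows


if __name__ == "__main__":
    # Stand-in samples (assumption): replace with the text pasted from each pager run.
    samples = {
        "less test.1": "\u2010\u2010retry-delay",
        "less test.txt": "--retry-delay",
    }
    for label, text in samples.items():
        print(label)
        # Only the two leading dashes are interesting here.
        for _, code_point, utf8_bytes, name in describe(text[:2]):
            print(f"  {code_point}  {utf8_bytes}  {name}")
```

This only inspects whatever text it is given, so it says nothing about where in the less/man pipeline the substitution happens.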
I *think* there needs to be enough to trigger something within less to decide to "treat this as a manpage", i.e. the numeric file extension and the `.fi` directive. Note how when you run `less test.1` the ".fi" string does not appear, but when you run `less test.txt` it does.

It seems this affects text which isn't marked up in any way. For instance, in the raw curl manpage "source", there are all these uses of "--capath":

    See also \fI--capath\fP and \fI-k, --insecure\fP.
    curl --capath /local/directory https://example.com
    See also \fI--cacert\fP and \fI--capath\fP.
    See also \fI--proxy-insecure\fP, \fI--cacert\fP and \fI--capath\fP.
    See also \fI--proxy-capath\fP, \fI--cacert\fP, \fI--capath\fP and \fI-x, --proxy\fP. Added in 7.52.0.
    See also \fI--proxy-cacert\fP, \fI-x, --proxy\fP and \fI--capath\fP. Added in 7.52.0.

If you run `man curl`, all the uses which are 'marked up' as `\fI--capath\fP` are rendered with ASCII dashes. But the single use which is not 'marked up' at all (the example, "curl --capath /local/directory https://example.com") is rendered with Unicode dashes.

Reproducible: Always
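To compare the marked-up and plain uses of "--capath" without copying each one by hand, something like the following sketch could be used. It assumes the rendered page text has already been saved to a file by some means (I have not checked that every way of capturing it preserves the dashes); `count_option_variants` is a hypothetical helper name, and the file path is taken from the command line.

```python
import sys

HYPHEN = "\u2010"  # the Unicode hyphen seen in the pasted text


def count_option_variants(text, option):
    """Count `--option` written with ASCII dashes vs. with U+2010 hyphens."""
    ascii_form = "--" + option
    unicode_form = HYPHEN * 2 + option
    return {
        "ascii": text.count(ascii_form),
        "unicode": text.count(unicode_form),
    }


if __name__ == "__main__":
    # Usage (assumption): python3 count_dashes.py rendered-page.txt capath
    path, option = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8") as handle:
        counts = count_option_variants(handle.read(), option)
    print(counts)
```

A non-zero "unicode" count for an option that the source spells with ASCII dashes would match what I see when copying from `man curl`.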
Created attachment 1976607: example text file that less sees as a man page if saved with the extension .1
I think this may be downstream only, as it likely involves the shenanigans in lesspipe.sh, which is a downstream source.
Actually, since lesspipe.sh seems to sorta redirect to `man -P` for things it decides are man pages, maybe this is a man bug after all, not just man showing the bug because it's using less as a pager, as I first assumed? Yeesh. Let's reassign to man...
Aha. So if I temporarily remove `/usr/bin/man`, forcing lesspipe.sh to use the fallback to `groff` instead (in its `manfilter()` function), the bug doesn't happen. So this does indeed seem to ultimately be man's fault.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.