Bug 2224123 - man converts non-marked-up ASCII dashes - into unicode dashes ‐
Summary: man converts non-marked-up ASCII dashes - into unicode dashes ‐
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: groff
Version: 39
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nikola Forró
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 2249869
Depends On:
Blocks:
Reported: 2023-07-19 21:13 UTC by Adam Williamson
Modified: 2023-11-16 02:36 UTC (History)
CC List: 9 users

Fixed In Version: groff-1.23.0-3.fc39
Clone Of:
Environment:
Last Closed: 2023-11-16 02:36:17 UTC
Type: ---
Embargoed:


Attachments
example text file that less sees as a man page if saved with the extension .1 (57 bytes, text/plain)
2023-07-19 21:14 UTC, Adam Williamson
screenshot of the issue occurring in 2023-11-03 Fedora 39 (418.33 KB, image/png)
2023-11-03 16:08 UTC, Adam Williamson

Description Adam Williamson 2023-07-19 21:13:29 UTC
Thanks to "hiredman" (Kevin Downey?) on Matrix for initially reporting this yesterday. I then ran into it myself today, coincidentally.

I'm attaching a test text file. To reproduce the problem:

1. Save the test file as both "test.txt" and "test.1"
2. Open a blank document in a text editor with a search function (I'm using gedit)
3. Run `less test.1`, copy the text `--retry-delay`, and paste it into the blank document on one line
4. Run `less test.txt`, copy the text '--retry-delay', and paste it into the blank document on a second line
5. Run a search for "--", using ASCII dashes (just hit the dash key on the keyboard with the layout set to US). Note that it is found in the second line, but not the first
6. Copy one of each type of dash into some kind of character analyzer (I used https://apps.timwhitlock.info/unicode/inspect ). Note that the dash character from `less test.1` is a Unicode hyphen (U+2010, UTF-8 bytes E2 80 90), but the dash character from `less test.txt` is an ASCII hyphen-minus (U+002D, UTF-8 byte 2D).
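
For anyone who would rather not paste into a web tool, the check in step 6 can be approximated locally. This is only a minimal sketch using Python's standard `unicodedata` module; the function name `describe_chars` is made up for illustration, and the two strings stand in for text copied from `less test.1` and `less test.txt`.

```python
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode name, UTF-8 bytes in hex) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "UNKNOWN"),
            ch.encode("utf-8").hex(" ").upper(),
        )
        for ch in text
    ]


# Stand-ins for the pasted text: the first uses the Unicode hyphen,
# the second the ASCII dash typed from the keyboard.
print(describe_chars("\u2010"))
print(describe_chars("-"))
```

Pasting the copied `--retry-delay` strings in place of the stand-ins should show which of the two characters each one contains.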

This affects real-world man pages, since less is the pager used for those. The test text is copied from the man page for curl. If you run `man curl`, find the same text (which is a bit hard because the search function won't find it easily! I searched for some text near it...), copy the `--retry-delay` text from it, and paste it, again you will find it's converted to unicode dashes. The source file from the curl source tree uses ASCII dashes; I have verified this. If you gunzip the actual installed /usr/share/man/man1/curl.1.gz file and open it with something other than less, e.g. directly with gedit, the dashes are ASCII dashes. But when opened with `man curl`, the dashes are unicode dashes.

I tried to reduce the test case to just a single dash character, or just two dash characters, but the bug doesn't reproduce in that case. The attached file seems to be the simplest reproducer. I *think* there needs to be enough to trigger something within less to decide to "treat this as a manpage" - i.e. the numeric file extension and the `.fi` directive. Note how when you run `less test.1` the ".fi" string does not appear, but when you run "less test.txt" it does.

It seems this affects text which isn't marked up in any way. For instance, in the raw curl manpage "source", there are all these uses of "--capath":

See also \fI--capath\fP and \fI-k, --insecure\fP.
 curl --capath /local/directory https://example.com
See also \fI--cacert\fP and \fI--capath\fP.
See also \fI--proxy-insecure\fP, \fI--cacert\fP and \fI--capath\fP.
See also \fI--proxy-capath\fP, \fI--cacert\fP, \fI--capath\fP and \fI-x, --proxy\fP. Added in 7.52.0.
See also \fI--proxy-cacert\fP, \fI-x, --proxy\fP and \fI--capath\fP. Added in 7.52.0.

If you run "man curl", all the uses which are 'marked up' as `\fI--capath\fP` are rendered with ASCII dashes. But the single use which is not 'marked up' at all (the example, "curl --capath /local/directory https://example.com") is rendered with unicode dashes.
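
To make the search failure concrete, here is a rough Python sketch of why the unmarked example line is missed. It works on a hard-coded sample string rather than real `man` output, and the helper name `lines_hiding_from_search` is purely illustrative.

```python
HYPHEN = "\u2010"        # what the unmarked dashes end up rendered as
HYPHEN_MINUS = "\u002d"  # what the dash key on a US keyboard types


def lines_hiding_from_search(rendered, needle):
    """Return 1-based numbers of lines where `needle` matches only after
    U+2010 is folded back to U+002D, i.e. lines a plain search misses."""
    missed = []
    for number, line in enumerate(rendered.splitlines(), start=1):
        folded = line.replace(HYPHEN, HYPHEN_MINUS)
        if needle in folded and needle not in line:
            missed.append(number)
    return missed


# Line 1 mimics a marked-up reference (ASCII dashes survive);
# line 2 mimics the unmarked example (dashes became U+2010).
sample = (
    "See also --capath and -k, --insecure.\n"
    " curl \u2010\u2010capath /local/directory https://example.com\n"
)
print(lines_hiding_from_search(sample, "--capath"))  # prints [2]
```

Saving the rendered page to a file and passing its contents in place of `sample` would, under the same assumptions, list the lines where a search for `--capath` silently fails.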

Reproducible: Always

Comment 1 Adam Williamson 2023-07-19 21:14:56 UTC
Created attachment 1976607
example text file that less sees as a man page if saved with the extension .1

Comment 2 Adam Williamson 2023-07-19 21:20:38 UTC
I think this may be downstream only, as it likely involves the shenanigans in lesspipe.sh , which is a downstream source.

Comment 3 Adam Williamson 2023-07-19 21:22:38 UTC
Actually, since lesspipe.sh seems to sorta redirect to `man -P` for things it decides are man pages, maybe this is a man bug after all, not just man showing the bug because it's using less as a pager, as I first assumed? Yeesh. Let's reassign to man...

Comment 4 Adam Williamson 2023-07-19 21:26:20 UTC
Aha. So if I temporarily remove `/usr/bin/man`, forcing lesspipe.sh to use the fallback to `groff` instead (in its `manfilter()` function), the bug doesn't happen. So this does indeed seem to ultimately be man's fault.

Comment 5 Fedora Release Engineering 2023-08-16 08:08:00 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 6 Adam Williamson 2023-10-26 06:58:56 UTC
Per https://lwn.net/Articles/947941/ , it seems groff may indeed still somehow be responsible for this, specifically groff 1.23.0. The timing fits: that version went stable for F39 on 2023-07-17, right before this bug was reported.

Comment 7 Lukas Javorsky 2023-11-01 11:51:42 UTC
Thank you for the report.

I've been reading through the Debian discussions [1][2] about this issue and we've decided to follow their decision and revert the newly introduced mapping from the 1.23.0 groff version.

We made this choice because the alternative, keeping the mapping and convincing maintainers/upstreams to write their man pages with correctly used hyphens, proved unpopular.

Debian tried it and it resulted in a ton of emails to the groff maintainer (they didn't want to switch). With the current capacity shortage, we have to use our resources wisely, and thus we don't want to cause this type of chaos among the maintainers of packages with problematic man pages.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1041731
[2] https://lwn.net/Articles/947941/

Comment 8 Lukas Javorsky 2023-11-02 15:47:12 UTC
Hi Adam,

I've tried to reproduce your issue, however, I was able to reproduce it on the older groff-1.22.4 version as well.

Comment 9 Adam Williamson 2023-11-02 15:58:50 UTC
Hum. Well, possibly we're actually not seeing exactly the same thing as the Debian issue, then? You can look at my earlier debugging notes, especially https://bugzilla.redhat.com/show_bug.cgi?id=2224123#c4 . Does that help?

Comment 10 Lukas Javorsky 2023-11-02 17:42:29 UTC
My bad, I was skipping one step.

The reproducer works, I'll start working on the revert

Comment 11 Lukas Javorsky 2023-11-03 11:19:25 UTC
Sorry for the confusion, the reproducer indeed doesn't work, as it behaves the same with the older groff-1.22.4 version.

Comment 12 Lukas Javorsky 2023-11-03 11:52:05 UTC
I also tried 'man curl' and searched for the '--retry-delay 5...' text which is in your attached example, and it worked just fine with both the old and the new groff version.

I was also able to find the "curl --capath /local/directory https://example.com" with both versions of groff using 'man curl'.

Comment 13 Adam Williamson 2023-11-03 16:07:39 UTC
Huh. It is definitely still broken for me. See the screenshot I'm about to attach, this is `man curl` with a search for `--capath` active - note how three instances of it (which are escaped in the source) are highlighted, but the instance in `curl --capath /local/directory https://example.com` is not highlighted.

This is Fedora 39, man running on a GNOME terminal.

Comment 14 Adam Williamson 2023-11-03 16:08:17 UTC
Created attachment 1996998
screenshot of the issue occurring in 2023-11-03 Fedora 39

Comment 15 Nikola Forró 2023-11-03 16:42:35 UTC
(In reply to Adam Williamson from comment #13)
> Huh. It is definitely still broken for me. See the screenshot I'm about to
> attach, this is `man curl` with a search for `--capath` active - note how
> three instances of it (which are escaped in the source) are highlighted, but
> the instance in `curl --capath /local/directory https://example.com` is not
> highlighted.

I can reproduce this as well on Fedora 39. Search for --retry-delay doesn't work either. Everything works after downgrading to groff-1.22.4-11.fc38, and also after adding

.if '\*[.T]'utf8' \{\
.  char - \-
.\}

(taken from Debian's mandoc.local [1]) to /etc/groff/site-tmac/man.local.

[1] https://salsa.debian.org/debian/groff/-/blob/d5394c68d70e6c5199b01d2522e094c8fd52e64e/debian/mandoc.local

Comment 16 Lukas Javorsky 2023-11-03 22:03:59 UTC
Interesting, there might be something off with my setup.

I'll prepare a fix and a testing RPM which I will share with you and kindly ask you to verify if it fixes this bug. Is it okay with you?

Comment 17 Adam Williamson 2023-11-03 22:13:47 UTC
Sure, no problem. Thanks!

Comment 18 Lukas Javorsky 2023-11-06 10:10:15 UTC
Hi Adam,

I've created a PR: https://src.fedoraproject.org/rpms/groff/pull-request/5#

And here is the scratch-build (groff-base package is the one you need): https://koji.fedoraproject.org/koji/taskinfo?taskID=108648971

Please test the package and let me know if it fixes the bug you've encountered.

Thank you

Comment 19 Nikola Forró 2023-11-06 10:36:24 UTC
The fix works for me.

Comment 20 Lukas Javorsky 2023-11-06 11:12:05 UTC
Thank you, Nikola.

I'll now write an email to the Fedora community about this, and merge/build it ~1 week after the email (if there are no serious concerns about the revert).

Comment 21 Fedora Update System 2023-11-09 13:55:54 UTC
FEDORA-2023-1ca38baed6 has been submitted as an update to Fedora 39. https://bodhi.fedoraproject.org/updates/FEDORA-2023-1ca38baed6

Comment 22 Fedora Update System 2023-11-10 02:18:33 UTC
FEDORA-2023-1ca38baed6 has been pushed to the Fedora 39 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-1ca38baed6`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-1ca38baed6

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 23 Zbigniew Jędrzejewski-Szmek 2023-11-15 21:05:24 UTC
*** Bug 2249869 has been marked as a duplicate of this bug. ***

Comment 24 Fedora Update System 2023-11-16 02:36:17 UTC
FEDORA-2023-1ca38baed6 has been pushed to the Fedora 39 stable repository.
If the problem still persists, please make note of it in this bug report.

