Bug 1173619

Summary: dashes in command arguments are misrendered in xhtml output
Product: [Fedora] Fedora
Reporter: David Woodhouse <dwmw2>
Component: groff
Assignee: Nikola Forró <nforro>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 22
CC: jchaloup
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-07-19 12:31:45 UTC
Type: Bug

Description David Woodhouse 2014-12-12 14:18:18 UTC
This is the source of the man page for openconnect(8):
http://git.infradead.org/users/dwmw2/openconnect.git/blob_plain/HEAD:/openconnect.8.in

I have no idea why the dashes therein are all "escaped" as \- instead of just a simple - character. I probably cargo-culted it from whatever man page I started with and stripped down.

As well as being shipped as a man page, the manual is also converted to xhtml (groff -t -K UTF-8 -mandoc -Txhtml) and included in the HTML documentation:
http://www.infradead.org/openconnect/manual.html

It seems that the "escaped" dashes end up being rendered into HTML as &minus;, and thus I have received a complaint from a user that when copying from the browser and pasting into a command line, the options didn't work because he got U+2212 MINUS SIGN instead of U+002D HYPHEN-MINUS.
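
The mismatch can be reproduced without a browser. A minimal Python sketch, assuming only the standard library; the option name below is illustrative and not taken from the manual:

```python
import html
import unicodedata

# What a browser shows for an option whose dashes were emitted as &minus;
# (illustrative option name).
rendered = html.unescape("&minus;&minus;help")

pasted_dash = rendered[0]   # what ends up on the clipboard
typed_dash = "-"            # what the shell and option parser expect

print(f"U+{ord(pasted_dash):04X}", unicodedata.name(pasted_dash))
print(f"U+{ord(typed_dash):04X}", unicodedata.name(typed_dash))

# The pasted text is not the same string as the typed option.
print(rendered == "--help")
```

The two strings look alike on screen but compare unequal, which is why the pasted options are not recognised.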

Should I just do a global 's/\-/-/' on the source file, or was there a *reason* for them being that way in a man page?

Comment 1 David Woodhouse 2014-12-12 14:34:48 UTC
FWIW *every* other man page I've just looked at has also used \- instead of plain - in options. It wasn't just something odd that I made up :)

Comment 2 Jan Chaloupka 2015-01-19 14:46:45 UTC
> I have no idea why the dashes therein are all "escaped" as \- instead of
> just a simple - character. I probably cargo-culted it from whatever man page
> I started with and stripped down.

By default, groff renders a plain '-' as a hyphen (U+2010) instead of ASCII 0x2D (minus).

> Should I just do a global 's/\-/-/' on the source file, or was there
> a *reason* for them being that way in a man page?

This will render - as - (at least on my machine), but groff upstream would prefer that you use \- (as you can read in [1]). Even if it works, I would not change it; all man pages from man-pages upstream use \- for minus.

> It seems that the "escaped" dashes end up being rendered into HTML as &minus;,

Yes, because \- is the minus sign. To give the symbol its semantics, the &minus; entity is emitted instead of a plain -, and the browser renders that entity as U+2212. There is also the &shy; entity, but that is U+00AD SOFT HYPHEN rather than 0x2D, and it is rendered as a blank symbol. &#x2d;, however, works.

You can run:
$ groff -t -K UTF-8 -mandoc -Txhtml manpage | sed 's/\&minus;/\&#x2d;/g'
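
The same post-processing can be done without sed. A rough Python equivalent, assuming the groff output has already been captured as a string; the function name `ascii_dashes` and the sample fragment are made up for this sketch. It also replaces a literal U+2212, in case the output carries the character rather than the entity (an assumption, not something observed in this bug):

```python
def ascii_dashes(xhtml: str) -> str:
    """Replace the minus-sign entity (and a literal U+2212) with &#x2d;."""
    return xhtml.replace("&minus;", "&#x2d;").replace("\u2212", "&#x2d;")


# Hypothetical fragment of groff -Txhtml output.
sample = "<b>&minus;&minus;config</b>=<i>FILE</i>"
print(ascii_dashes(sample))
```

Like the sed command, this rewrites every minus sign in the page, including any that are meant as mathematical minus signs.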

According to [2], \- is still being replaced by U+2212, and from [3] it looks like the question is still not resolved.

[1] https://lists.debian.org/debian-devel/2003/03/msg01481.html
[2] http://lists.gnu.org/archive/html/groff/2007-09/msg00073.html
[3] http://lists.gnu.org/archive/html/groff/2007-09/msg00175.html

Comment 3 David Woodhouse 2015-01-26 13:01:23 UTC
cf. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=159872#54 in which Colin Watson points out that groff_char(7) says the following:

       -      the   ISO   Latin-1   `Hyphen,  Minus  Sign'
              (code 45) prints as a hyphen; a  minus  sign
              can be obtained with \-.

So if \- is supposed to be a minus sign, surely it is a bug that in UTF-8 terminal output it *isn't* being emitted as U+2212 MINUS SIGN?
Emitting U+002D HYPHEN-MINUS is wrong.
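
One way to check what a given output device actually emits is to scan its output for the dash-like code points discussed here. A small Python sketch, assuming the output has been saved as text; `dash_report` and the sample string are hypothetical:

```python
import unicodedata
from collections import Counter

# Dash-like characters mentioned in this bug.
DASHES = {"\u002d", "\u00ad", "\u2010", "\u2212"}


def dash_report(text: str) -> dict[str, int]:
    """Count each dash-like character in text, keyed by 'U+XXXX NAME'."""
    counts = Counter(ch for ch in text if ch in DASHES)
    return {f"U+{ord(ch):04X} {unicodedata.name(ch)}": n for ch, n in counts.items()}


# Sample mixing MINUS SIGN and HYPHEN-MINUS.
print(dash_report("\u2212\u2212help or --help"))
```

Running this over the terminal output and over the text of the xhtml output would show which mapping each device applies to \-.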

Comment 4 Jan Chaloupka 2015-01-26 15:44:12 UTC
citing Colin's response:

"Firstly, it is not converted to em-dash. On the utf8 device, it is
converted to either Unicode HYPHEN (0x2010) or SOFT HYPHEN (0x00AD)
depending on context."

The "depending on context" part is important.

Try the simplest example of html:
<html>
<head>
</head>
<body>

----
<br />
&minus;&minus;&minus;&minus;

</body>
</html>

and open it in a web browser. As you can see, the two lines have different widths and look different. I believe upstream chose U+2212 because it renders significantly better than the "correct" minus sign (U+002D) that can be copied one-to-one into a terminal. Terminal and HTML are two different devices and thus require two different mappings for \-.

Comment 5 Fedora Admin XMLRPC Client 2015-07-13 10:39:08 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 6 Fedora End Of Life 2016-07-19 12:31:45 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.