Bug 450787

Summary: optparse doesn't handle Unicode help text
Product: Red Hat Enterprise Linux 5
Reporter: Bryan Mason <nobody+bjmason>
Component: python
Assignee: James Antill <james.antill>
Status: CLOSED WONTFIX
QA Contact: Martin Jenner <mjenner>
Severity: low
Priority: low
Version: 5.2
CC: crobinso
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-06-12 16:34:34 UTC
Bug Blocks: 446950    
Attachments:
    Test case (flags: none)
    Proposed patch (flags: none)

Description Bryan Mason 2008-06-11 00:38:53 UTC
Description of problem:

    optparse doesn't handle Unicode help text

Version-Release number of selected component (if applicable):

    python-2.4.3-21.el5

How reproducible:

    Every time.

Steps to Reproduce:

    1. Create a program that uses Unicode text for the help text in
       OptionParser.add_option().
    2. Run the program created in step 1.
    3. Boom!

or

    1. Unpack attached tarball with test program.
    2. Run 'LANG=ja_JP.UTF-8 ./test3.py --help'
    3. Boom!
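
    For readers without the attached tarball, the first set of steps can be
    approximated with the sketch below. It is not the attached test3.py. It is
    written for Python 3, where the original Python 2.4 traceback no longer
    occurs on a UTF-8 terminal, so it forces the same class of failure by
    printing the help to a stream that can only encode ASCII. The option name,
    the help string, and both function names are made up for illustration.

```python
import io
from optparse import OptionParser


def build_parser():
    """Build a parser whose help text contains non-ASCII characters."""
    parser = OptionParser(usage="%prog [options]")
    # Hypothetical option; the help text is Japanese, written as escapes.
    parser.add_option("-n", "--name", dest="name",
                      help="\u540d\u524d\u3092\u6307\u5b9a\u3057\u307e\u3059")
    return parser


def print_help_to_ascii_stream(parser):
    """Print help to an ASCII-only text stream; return the error, if any."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
    try:
        # print_help() writes the formatted help to the given file object.
        parser.print_help(file=stream)
    except UnicodeEncodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    error = print_help_to_ascii_stream(build_parser())
    print("print_help failed with:", error)
```

    The point of the sketch is only that the help text is fine until it is
    written to a stream whose encoding cannot represent it.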
  
Actual results:

    $ LANG=ja_JP.UTF-8 ./test3.py --help
    Traceback (most recent call last):
      File "./test3.py", line 21, in ?
        (options,args) = parser.parse_args()
      File "/usr/lib64/python2.4/optparse.py", line 1275, in parse_args
        stop = self._process_args(largs, rargs, values)
      File "/usr/lib64/python2.4/optparse.py", line 1315, in _process_args
        self._process_long_opt(rargs, values)
      File "/usr/lib64/python2.4/optparse.py", line 1390, in _process_long_opt
        option.process(opt, value, values, self)
      File "/usr/lib64/python2.4/optparse.py", line 707, in process
        return self.take_action(
      File "/usr/lib64/python2.4/optparse.py", line 728, in take_action
        parser.print_help()
      File "/usr/lib64/python2.4/optparse.py", line 1534, in print_help
        file.write(self.format_help())
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 114-168: ordinal not in range(128)

Expected results:

    No errors.  Program runs normally.

Additional info:

    See http://bugs.python.org/issue1498146

Comment 1 Bryan Mason 2008-06-11 00:38:53 UTC
Created attachment 308878
Test case

Comment 2 Bryan Mason 2008-06-11 00:40:01 UTC
I should have a proposed patch that fixes this shortly.

Comment 3 Bryan Mason 2008-06-11 22:39:18 UTC
Created attachment 309010
Proposed patch

Make optparse handle Unicode correctly.  Adapted from upstream patches to
optparse.py here:

http://svn.python.org/view/python/trunk/Lib/optparse.py?rev=46861&r1=46507&r2=46861


and here:

http://svn.python.org/view/python/trunk/Lib/optparse.py?rev=50791&r1=46863&r2=50791

Comment 5 Cole Robinson 2008-06-12 15:48:46 UTC
Was this intended to be filed against python-virtinst? It seems like the patch
you are presenting is against python itself.

Comment 6 Bryan Mason 2008-06-12 16:03:34 UTC
D'oh!  I was working on two bugs at once and got confused.  Yes, this should be
against python, not python-virtinst.  Sorry.

Comment 7 James Antill 2008-06-12 16:34:34 UTC
 In general python just doesn't play well with unicode, IMO. The whole API
pretty much guarantees tracebacks.
 python 2.5.1 does the same thing; we worked around it in yum by doing:

-        self.optparser.print_help()
+        sys.stdout.write(self.optparser.format_help())

-        self.optparser.print_usage()
+        sys.stdout.write(self.optparser.format_usage())

...see upstream commit: a3a53f16b45e06aeaa3666b47705dc879b182724


 I'm loath to change anything in python/optparse to try and fix/workaround this,
because as I said, it's almost impossible to end up with something you _know_
won't traceback under all sets of input.
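
The workaround quoted in comment 7 amounts to asking the parser for the help
text with format_help() and writing it out yourself instead of calling
print_help(). A rough Python 3 sketch of that idea follows. The write_help
helper, its explicit encoding argument, and the "replace" error handler are
assumptions added for illustration; they are not taken from the yum commit.

```python
import io
from optparse import OptionParser


def write_help(parser, binary_stream, encoding="utf-8"):
    """Format the help text and write it as bytes in the given encoding.

    Characters the encoding cannot represent become '?' rather than
    raising UnicodeEncodeError.
    """
    text = parser.format_help()
    binary_stream.write(text.encode(encoding, "replace"))


# Hypothetical parser with a two-character non-ASCII help string.
parser = OptionParser(usage="%prog [options]")
parser.add_option("-n", "--name", help="\u540d\u524d")

# Even an ASCII-only target no longer raises; the help degrades to '?'.
ascii_buf = io.BytesIO()
write_help(parser, ascii_buf, encoding="ascii")

# With UTF-8 the help text survives intact.
utf8_buf = io.BytesIO()
write_help(parser, utf8_buf)
```

This trades a traceback for possibly degraded output, which does not remove
the concern in comment 7: the caller still has to pick the right encoding for
the terminal.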