Bug 1693712

Summary: Switch to a string for RPM calls
Product: Red Hat Enterprise Linux 8
Reporter: Panu Matilainen <pmatilai>
Component: rpmlint
Assignee: Thomas Woerner <twoerner>
Status: CLOSED ERRATA
QA Contact: Karel Srot <ksrot>
Severity: unspecified
Priority: medium
Version: 8.0
CC: emrakova, twoerner
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-11-05 21:13:21 UTC
Type: Bug
Bug Depends On: 1631292, 1715212

Description Panu Matilainen 2019-03-28 14:11:46 UTC
Description of problem:

Rpm's python3 API has been totally braindamaged all this time but people are only noticing now that it's starting to get used. 

We're changing rpm to return all string data as surrogate-escaped utf-8 python strings everywhere (instead of bytes with unknown encoding that the API doesn't otherwise even accept, see bug 1631292). This makes most rpm-scripts written for python2 just work with python3 too (from the rpm pov).
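
As an illustration of what "surrogate-escaped" means here, the following is a minimal sketch using only Python's built-in codecs; it does not call rpm, and the sample bytes are made up for the example. Bytes that are not valid UTF-8 are mapped to lone surrogate code points on decode and can be restored exactly on encode:

```python
# Made-up header data: valid UTF-8 ("café") followed by one stray Latin-1 byte.
raw = b"caf\xc3\xa9 \xe9"

# Strict decoding rejects the stray byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte becomes a lone surrogate
# in the range U+DC80..U+DCFF instead of raising.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes.
roundtrip = text.encode("utf-8", errors="surrogateescape")

print(strict_ok)         # False
print(ascii(text))       # 'caf\xe9 \udce9'
print(roundtrip == raw)  # True
```

The resulting object is an ordinary str, which is why code that still calls .decode() on it breaks.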

Most software that has kept python2 compatibility is automatically compatible with the fixed API, but unfortunately python3-only users like rpmlint need fixing for the new behavior.

There's at least one affected place in rpmlint, which will start failing after the change with the following traceback:

Traceback (most recent call last):
  File "/usr/bin/rpmlint", line 378, in <module>
    main()
  File "/usr/bin/rpmlint", line 166, in main
    runChecks(pkg)
  File "/usr/bin/rpmlint", line 223, in runChecks
    check.check(pkg)
  File "/usr/share/rpmlint/TagsCheck.py", line 695, in check
    self.check_summary(pkg, lang, ignored_words)
  File "/usr/share/rpmlint/TagsCheck.py", line 903, in check_summary
    if not Pkg.is_utf8_bytestr(summary):
  File "/usr/share/rpmlint/Pkg.py", line 168, in is_utf8_bytestr
    s.decode('UTF-8')
AttributeError: 'str' object has no attribute 'decode'

As the broken rpm versions are widely in use, it's best to keep compatibility with both initially. One possible way to fix this is simply:

--- Pkg.py.orig	2019-03-28 16:06:54.491218904 +0200
+++ Pkg.py	2019-03-28 16:07:13.412186582 +0200
@@ -168,6 +168,8 @@
         s.decode('UTF-8')
     except UnicodeError:
         return False
+    except AttributeError:
+        return True
     return True

Comment 1 Thomas Woerner 2019-06-13 12:54:06 UTC
The rpm change in 8.1 requires a corresponding change in rpmlint. Without a fix, the output of rpmlint will contain warning messages.

Here is a simple reproducer:

$ rpmlint systemd | grep UnicodeWarning
/var/str/source/Pkg.py:184: UnicodeWarning: decode() called on unicode string, see https://bugzilla.redhat.com/show_bug.cgi?id=1693751
  s.decode('UTF-8')
/var/str/source/Pkg.py:184: UnicodeWarning: decode() called on unicode string, see https://bugzilla.redhat.com/show_bug.cgi?id=1693751
  s.decode('UTF-8')

The fix proposed in #1693712#c0 does not work.

Here is a working proposal:

 def is_utf8_bytestr(s):
-    try:
-        s.decode('UTF-8')
-    except UnicodeError:
-        return False
-    return True
-
+    if isinstance(s, str):
+        try:
+            s.encode("utf8")
+        except UnicodeEncodeError:
+            return False
+        else:
+            return True
+    else:
+        try:
+            s.decode('UTF-8')
+        except UnicodeError:
+            return False
+        else:
+            return True

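The encode in the first branch is only there to make sure that we really have something that can be represented as UTF-8. A str in Python 3 can normally be encoded as UTF-8, but it may also contain bad escape sequences (for example lone surrogates left over from surrogate-escaped data), and a strict encode fails on those. This test catches that case.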
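
For reference, here is the proposal above written out as a standalone function with a few sample inputs. This is a sketch for experimentation, not the code as shipped in rpmlint; the sample values are made up:

```python
def is_utf8_bytestr(s):
    """Return True if s is valid UTF-8 (bytes) or encodable as UTF-8 (str)."""
    if isinstance(s, str):
        # A str fails a strict encode if it contains lone surrogates.
        try:
            s.encode("utf-8")
        except UnicodeEncodeError:
            return False
        return True
    # Otherwise assume a bytes-like object, as returned by older rpm versions.
    try:
        s.decode("utf-8")
    except UnicodeError:
        return False
    return True


results = {
    "clean_str": is_utf8_bytestr("caf\u00e9"),
    "escaped_str": is_utf8_bytestr(b"\xe9".decode("utf-8", "surrogateescape")),
    "valid_bytes": is_utf8_bytestr(b"caf\xc3\xa9"),
    "invalid_bytes": is_utf8_bytestr(b"caf\xe9"),
}
print(results)
```

Only the surrogate-escaped str and the invalid byte string are rejected; the other two inputs pass.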

The output of the gating tests that will be added with #1682275 also contains lots of the warning lines.

Comment 2 Panu Matilainen 2019-06-13 13:01:51 UTC
Oh, sorry, the proposed code predates the .decode() compatibility kludge, which once again is causing more problems than it's solving. I think we need to reconsider the compat thing on rpm side...
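
To illustrate why catching AttributeError stops helping once such a kludge exists, here is a rough sketch. It assumes the kludge amounts to a str subclass whose decode() emits a UnicodeWarning; the CompatStr class below is a hypothetical stand-in invented for this example, not rpm's actual implementation, which is not shown in this report:

```python
import warnings


class CompatStr(str):
    """Hypothetical stand-in for a str that still offers decode()."""

    def decode(self, encoding="utf-8", errors="strict"):
        # Warn instead of failing, so the caller never sees AttributeError.
        warnings.warn("decode() called on unicode string",
                      UnicodeWarning, stacklevel=2)
        return self


def is_utf8_bytestr_c0(s):
    # The check as patched in the description of this bug.
    try:
        s.decode("UTF-8")
    except UnicodeError:
        return False
    except AttributeError:
        return True
    return True


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = is_utf8_bytestr_c0(CompatStr("systemd"))

warned = [w.category.__name__ for w in caught]
print(result, warned)  # True ['UnicodeWarning']
```

Under that assumption the AttributeError branch is never reached: decode() succeeds, the function returns True, and the warning is emitted on every call. Checking isinstance(s, str) first avoids calling decode() on strings at all.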

Comment 11 errata-xmlrpc 2019-11-05 21:13:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3437