Bug 878363 - The /usr/lib/rpm/rpm2cpio.sh script does not work in RHEL7 due to changed behavior of awk
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rpm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Panu Matilainen
QA Contact: Patrik Kis
Keywords: EasyFix, Regression, Upstream
Depends On:
Blocks:
Reported: 2012-11-20 04:39 EST by Patrik Kis
Modified: 2014-06-13 06:50 EDT

See Also:
Fixed In Version: rpm-4.10.2-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 06:50:03 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Patrik Kis 2012-11-20 04:39:04 EST
Description of problem:
Characters above 127 are encoded as 2 octets (bytes) in RHEL7 with LANG=en_US.UTF-8. In RHEL6 they were encoded as a single octet.

Version-Release number of selected component (if applicable):
gawk-4.0.1-2.el7.x86_64

How reproducible:
always

Steps to Reproduce:
# export LANG=en_US.UTF-8
# awk 'BEGIN {printf("%c", 0x80); }' | hexdump 
0000000 80c2                                   
0000002
# export LANG=en_US.UTF-16
# awk 'BEGIN {printf("%c", 0x80); }' | hexdump 
0000000 0080                                   
0000001
# awk 'BEGIN {printf("%c%c", 0x79,0x80); }' | hexdump 
0000000 8079                                   
0000002

  
Actual results:
0x80 is encoded as the two bytes 0xc2 0x80

Expected results:
0x80 is encoded as the single byte 0x80
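The hexdump output above groups bytes into 16-bit words printed in host byte order, so on little-endian x86_64 the byte sequence 0xc2 0x80 is shown as "80c2". A minimal sketch that dumps bytes individually instead, assuming a POSIX od and using printf octal escapes so the result does not depend on awk or the locale (the helper name bytes_hex is hypothetical):

```shell
# Hypothetical helper: print the bytes read from stdin as one lowercase hex
# string, in file order. od -An drops the offset column and -tx1 prints one
# byte per unit, so there is no byte-order swapping (unlike hexdump's
# default two-byte word display).
bytes_hex() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# \302\200 is the UTF-8 encoding of U+0080, i.e. what gawk 4 emits for
# printf("%c", 0x80) in a UTF-8 locale.
printf '\302\200' | bytes_hex    # c280

# \200 is the single raw byte 0x80, i.e. what gawk 3.1.7 emitted on RHEL-6.
printf '\200' | bytes_hex        # 80
```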


Additional info:
On RHEL-6:
# rpm -q gawk
gawk-3.1.7-9.el6.x86_64
# awk 'BEGIN {printf("%c%c", 0x79 ,0x80); }' | od -x
0000000 8079
0000002
Comment 2 Martin Bříza 2012-11-21 07:52:47 EST
I confirm this is a valid issue. Processing invalid Unicode byte sequences is inconsistent with the previous gawk version and with the standard printf function provided by glibc.
Comment 3 Martin Bříza 2012-11-21 08:41:33 EST
I'm increasingly convinced this is a bug fix rather than a regression. Unicode support was hugely improved in gawk 4.x, to the extent that every character in printf is treated according to the locale, regardless of its actual size.
In a UTF-8 locale, a value above 0x7F passed as the parameter is the character's position in the Unicode table, so 0x80 here is "Padding Character". For example, 0xAC prints "¬", the "Not sign". The difference only shows above 0x7F because UTF-8 characters below 0x80 are identical to their values.
This means the 0x80 sequence is not invalid, contrary to what I wrote in the previous comment. I thought awk was trying to print the value directly, not to convert it to a wide character.
The same behavior applies to UTF-16 in your report, too, but with higher values.
Is there any reason to keep the previous behavior?
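The explanation in this comment can be checked by hand: UTF-8 encodes code points U+0080 through U+07FF as two bytes of the form 110xxxxx 10xxxxxx, carrying the top 5 and the low 6 bits of the code point. A minimal sketch in POSIX shell arithmetic (the function name utf8_two_byte is hypothetical, for illustration only):

```shell
# Hypothetical helper: print the two-byte UTF-8 encoding of a code point in
# the range 0x80..0x7FF as lowercase hex. Values outside that range are
# rejected, because UTF-8 uses a 1-, 3- or 4-byte form for them.
utf8_two_byte() {
    code=$(($1))
    if [ "$code" -lt 128 ] || [ "$code" -gt 2047 ]; then
        echo "code point out of two-byte range" >&2
        return 1
    fi
    lead=$((0xC0 | (code >> 6)))     # 110xxxxx: top 5 bits
    cont=$((0x80 | (code & 0x3F)))   # 10xxxxxx: low 6 bits
    printf '%02x%02x\n' "$lead" "$cont"
}

utf8_two_byte 0x80   # c280 (U+0080, "Padding Character")
utf8_two_byte 0xAC   # c2ac (U+00AC, "Not sign")
```

This is why 0x80 turns into 0xc2 0x80 in a UTF-8 locale, while values below 0x80 pass through as single bytes.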
Comment 4 Patrik Kis 2012-11-21 11:31:13 EST
Yes, you are right. I should have checked what awk actually prints, not only its binary value. With respect to UTF-8, awk's behavior seems to be correct, as far as I know.

I filed this bug because of the rpm2cpio.sh script from the rpm package. When the script stopped working on RHEL7, I found that awk's behavior had changed.

The script seems to "misuse" this functionality, using awk to get a particular binary value into a bash variable. Since what matters here is not the particular character but the hex value behind it, I think the script should be fixed.
I will move this bug to the rpm component so the script can be fixed there.
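One way to make such a binary check independent of the locale is to never produce the reference bytes as characters at all, and instead compare the file's leading bytes as hex text. The sketch below is not the actual rpm2cpio.sh code; it assumes the standard RPM lead magic ed ab ee db and uses only od and tr. The function name is_rpm_lead is hypothetical.

```shell
# Hypothetical helper (not part of rpm2cpio.sh): succeed if the file starts
# with the 4-byte RPM lead magic 0xed 0xab 0xee 0xdb.
# od renders the bytes as hex text, so the comparison never depends on how
# the current locale would encode a character above 0x7f.
is_rpm_lead() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "edabeedb" ]
}

# Usage:
# is_rpm_lead bash-4.2.37-6.el7.x86_64.rpm || echo "Unrecognized rpm file"
```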
Comment 5 Patrik Kis 2012-11-22 03:34:42 EST
Changing the bug's component to rpm.

Description of problem:
The script /usr/lib/rpm/rpm2cpio.sh stopped working in RHEL-7.

Version-Release number of selected component (if applicable):
rpm-4.10.1-3.el7.x86_64

How reproducible:
always

Steps to Reproduce:
# /usr/lib/rpm/rpm2cpio.sh bash-4.2.37-6.el7.x86_64.rpm 
  
Actual results:
Unrecognized rpm file: bash-4.2.37-6.el7.x86_64.rpm

Expected results:
The cpio archive.

Additional info:
This failure is caused by the changed behavior of awk, which seems to be a bug fix. See the details in the comments above.

The script can be fixed easily; just add this to the beginning:
export LANG=en_US
Comment 6 Panu Matilainen 2012-11-22 05:30:02 EST
Cases like these really make me think of "git rm scripts/rpm2cpio.sh"...

Anyway, it's fixed upstream now (but with LANG=C, not en_US). This also affects every Fedora version >= 16.
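The effect of running awk in the C locale can be checked directly. A caveat: if LC_ALL or LC_CTYPE is set in the caller's environment, it takes precedence over LANG, so a sketch that forces byte-oriented output for a single command uses LC_ALL=C. This assumes an awk (gawk, or one without multibyte support) that emits the raw byte for %c with a numeric argument in the C locale:

```shell
# Emit 0x79 ('y') and 0x80 with awk, forcing the C locale for this one
# command. LC_ALL overrides LC_CTYPE and LANG, so a UTF-8 setting in the
# caller's environment cannot turn 0x80 into the two bytes 0xc2 0x80.
# Decimal constants (121, 128) are used because hexadecimal literals in
# awk program text are a gawk extension.
LC_ALL=C awk 'BEGIN { printf("%c%c", 121, 128) }' | od -An -tx1
```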
Comment 7 Patrik Kis 2012-11-22 09:00:07 EST
(In reply to comment #6)
> What cases like these make me think of is "git rm scripts/rpm2cpio.sh"
> really...
> 
Yes, I don't think anybody is using it, but there is an automated test for it.
Comment 8 Panu Matilainen 2012-12-11 03:34:38 EST
Fixed in rpm-4.10.2-1.el7
Comment 10 Ludek Smid 2014-06-13 06:50:03 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
