Bug 802839

Summary: rpm2cpio fails intermittently when file is piped to stdin
Product: Red Hat Enterprise Linux 6
Component: rpm
Version: 6.2
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Keywords: Regression
Reporter: David Rennalls <david_rennalls>
Assignee: Panu Matilainen <pmatilai>
QA Contact: Patrik Kis <pkis>
CC: charlieb-fedora-bugzilla, ffesti, mvadkert, pkis
Doc Type: Bug Fix
Last Closed: 2013-02-21 10:51:16 UTC
Bug Blocks: 840699

Description David Rennalls 2012-03-13 15:37:58 UTC
Description of problem:
At times, piping a large .rpm from a local disk to rpm2cpio fails, e.g.:
[/tmp]$ cat large.rpm | rpm2cpio 1>/dev/null
error: rpm2cpio: headerRead failed: hdr blob(154603): BAD, read returned 98008
error reading header from package

strace confirms that read() is returning fewer bytes than requested. I ended up just looking at the code, as this used to work fine with rpm 4.4.x (in RHEL 5.x). It seems some code that handled all the cases for a read() was removed in 4.8.x. The case I'm hitting is straight from 'man 2 read':

...It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or...etc...

So this appears to be a regression unless I'm missing something.
In rpm 4.4.x (used on RHEL 5.x), ufdio_s->read=ufdRead; ufdRead loops until it gets all the data, i.e. it does things right.
see http://rpm.org/gitweb?p=rpm.git;a=blob;f=rpmio/rpmio.c;h=dcb68c676a4f90a355d28b6f700a3322c5c62f3b;hb=refs/heads/rpm-4.4.x#l2346

In rpm 4.8.x (RHEL 6), ufdio_s->read=fdRead (a single 'plain' read() is done, with no retry on a short read).
see http://rpm.org/gitweb?p=rpm.git;a=blob;f=rpmio/rpmio.c;h=6473d557766b1941fc9cc1ceaec7a1e9b90d572e;hb=refs/heads/rpm-4.8.x#l760

That was introduced with this change...
Eliminate ufdio-specific read, write, seek and close
- we dont do network IO anymore so ufdio only differs from fdio by
  downloading the file on open if necessary, after that it's just fdio
http://rpm.org/gitweb?p=rpm.git;a=commit;h=2dc82d4e3e9c2959f4f731895993645761905073

..so I guess it's an oversight ?
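
For illustration, the difference between a looping read and a single plain read() can be sketched as follows. This is a minimal sketch and not rpm's actual code: the helper name read_full is hypothetical, and the only assumption is standard POSIX read() semantics, where a short count is not an error.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of rpm): keep calling read() until
 * `count` bytes have arrived, end-of-file is reached, or a real error
 * occurs. Returns the number of bytes stored in `buf`, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* genuine I/O error */
        }
        if (n == 0)
            break;              /* end-of-file: no more data will come */
        total += (size_t)n;     /* short read: loop for the remainder */
    }
    return (ssize_t)total;
}
```

With a helper like this, a caller that needs a 154603-byte header blob would only see a smaller return value at a real end-of-file, not merely because the pipe happened to hold 98008 bytes at that moment.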

Version-Release number of selected component (if applicable):
[~]$ rpm -q rpm
rpm-4.8.0-19.el6.i686

How reproducible:
Happens most of the time for me if the rpm is pretty large (>150M) and on a local disk.


Steps to Reproduce:
1. Run 'cat large.rpm | rpm2cpio 1>/dev/null' for a large rpm on a local disk
  
Actual results:
error: rpm2cpio: headerRead failed: hdr blob(154603): BAD, read returned 98008
error reading header from package

Expected results:
no error

Additional info:

Comment 3 Charlie Brady 2012-04-25 13:25:49 UTC
Why does this bug have the Needinfo flag set? What additional information do you need? Clearly rpm 4.8.x misuses read().

Comment 4 Miroslav Vadkerti 2012-04-25 13:37:11 UTC
It's a needinfo on the developer (you are probably not authorized to see the recipient, or I forgot to add the developer's mail address).

Comment 5 Miroslav Vadkerti 2012-04-25 13:37:37 UTC
Needinfo reset back on Panu to get his attention on this bug.

Comment 6 Panu Matilainen 2012-04-26 08:54:44 UTC
I suppose "oversight" is a fitting description, rpm2cpio being the only thing in rpm that accepts input from a pipe. I don't think I was even aware of it supporting reading from stdin prior to this bug...

While this technically is a regression, it's in a very rarely used feature, and the "normal" usage of 'rpm2cpio package.rpm|cpio ...' works just fine. The rpmio subsystem is such a house of cards that I'm not going to risk breaking more commonly used functionality with a hurried fix for a corner-case issue - moving to 6.4.0.

Comment 7 Panu Matilainen 2012-07-05 13:07:27 UTC
devel_ack, in 4.8.x this can be handled in the timedRead() wrapper easily enough. Upstream will need something different though...

Comment 11 Patrik Kis 2012-10-05 12:44:30 UTC
I'm trying to reproduce this issue, but no error occurred in 200 attempts.

# rpm -q rpm
rpm-4.8.0-27.el6.x86_64
# du -h large-1-1.noarch.rpm
190M	large-1-1.noarch.rpm
# for i in `seq 1 200`; do cat large-1-1.noarch.rpm | rpm2cpio 1>/dev/null; done
#

Any ideas? Is it possible that it depends on particular rpm used for test?

Maybe another scenario that can be used to verify the bug fix?

Comment 12 Charlie Brady 2012-10-05 13:12:18 UTC
(In reply to comment #11)

> Any ideas? Is it possible that it depends on particular rpm used for test?

It will depend on OS and file system implementation. Maybe on memory pressure too. Try booting your system with mem=...

Comment 13 Panu Matilainen 2012-10-08 06:07:23 UTC
To reproduce, you need a package with a header larger than the system pipe buffer; the total size of the package (or the amount of memory) is not important. Of course there tends to be a correlation: packages with a large header tend to be large overall.

Anyway, the kernel package should be a fairly reliable reproducer.
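
The pipe-buffer effect described above can be shown in isolation with a small sketch. It is not rpm code: the helper name single_read_from_pipe is hypothetical, and it assumes Linux pipe behaviour, where one read() returns at most what the pipe currently holds. A child plays the role of 'cat' and writes more data than a default pipe can hold, while the parent issues exactly one read() for the full amount.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: a child writes `total` bytes into a pipe and
 * the parent asks for all of them with ONE read() call, no retry loop.
 * Returns what that single read() delivered, or -1 on setup failure. */
ssize_t single_read_from_pipe(size_t total)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    char *buf = malloc(total);
    if (buf == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    memset(buf, 'x', total);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: the writer, standing in for `cat large.rpm`. */
        close(fds[0]);
        size_t off = 0;
        while (off < total) {
            ssize_t n = write(fds[1], buf + off, total - off);
            if (n <= 0)
                break;          /* reader went away or write failed */
            off += (size_t)n;
        }
        _exit(0);
    }

    /* Parent: the reader, standing in for a plain read() of the header. */
    close(fds[1]);
    ssize_t got = read(fds[0], buf, total);
    close(fds[0]);              /* lets the blocked writer terminate */
    waitpid(pid, NULL, 0);
    free(buf);
    return got;
}
```

Calling single_read_from_pipe(4 * 1024 * 1024) on a typical Linux system should return a positive value well below the requested 4 MiB, which is the same pattern as the "read returned 98008" message for a 154603-byte header.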

Comment 15 Panu Matilainen 2012-10-30 10:01:35 UTC
Here's an actual reproducer (although you'll probably want to find a closer mirror for the kernel package):

$ wget http://ftp.funet.fi/pub/mirrors/fedora.redhat.com/pub/fedora/linux/releases/17/Fedora/x86_64/os/Packages/k/kernel-3.3.4-5.fc17.x86_64.rpm
$ cat kernel-3.3.4-5.fc17.x86_64.rpm |rpm2cpio|wc -l

Output with bug present will be something like:
error: rpm2cpio: headerRead failed: hdr blob(681753): BAD, read returned 129672
error reading header from package
0

Output with bug fixed (assuming same kernel package etc):
299593

Comment 16 Patrik Kis 2012-11-14 09:17:48 UTC
Thanks Panu for the reproducer; it works well on my workstation, but the reproduction rate falls dramatically on all of the servers in our lab.

I checked the pipe buffer size and it seems to be 65536 bytes on all systems except ppc64, where it is 1048576 bytes. On ppc64 I therefore cannot reproduce it at all, but at least that makes sense. What I don't understand is why it is so hard to reproduce even on x86_64. Any idea? Is there anything other than the pipe buffer size that should be considered on the test systems?

I also tried to create a test package which contained nothing but a large changelog; that again worked fine on my desktop but not on the servers.

Comment 17 Panu Matilainen 2012-11-15 11:16:52 UTC
I don't see what else besides pipe buffer size could affect it. Where are you getting the pipe buffer size from?

Also... if everything else fails, use a bigger hammer ;) Bigger hammer being a larger package in this case: even the kernel header is below 1M in size, whereas header max size is 64M.

Comment 18 Charlie Brady 2012-11-15 13:49:18 UTC
Do you need more than one reproducer? Isn't this a simple problem, fixed by correct use of read() inside fdRead():

http://linux.die.net/man/2/read

Comment 19 Patrik Kis 2012-11-15 15:48:44 UTC
(In reply to comment #17)
> I don't see what else besides pipe buffer size could affect it. Where are you
> getting the pipe buffer size from?
> 
I used the script found here:
http://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer
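
As an alternative to measuring it with a script, the pipe capacity can also be queried directly. The sketch below is an illustration only: the helper name default_pipe_capacity is hypothetical, and it assumes a Linux kernel that supports the F_GETPIPE_SZ fcntl (added upstream in 2.6.35), so it may not work on older kernels without a backport.

```c
#define _GNU_SOURCE             /* exposes F_GETPIPE_SZ in <fcntl.h> on glibc */
#include <fcntl.h>
#include <unistd.h>

#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032       /* Linux value, used if the header hides it */
#endif

/* Hypothetical helper: capacity in bytes of a freshly created pipe, or -1
 * if the pipe cannot be created or the kernel does not support the query. */
long default_pipe_capacity(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    long size = fcntl(fds[0], F_GETPIPE_SZ);

    close(fds[0]);
    close(fds[1]);
    return size;
}
```

A package header larger than the value returned here cannot arrive in a single read() from a pipe, so it is a candidate for reproducing the failure.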

> Also... if everything else fails, use a bigger hammer ;) Bigger hammer being
> a larger package in this case: even the kernel header is below 1M in size,
> whereas header max size is 64M.
Well, the big hammer seems to help. A spec file with a 15M changelog gives a quite good failure rate. (The lowest is on s390x, about 5%, but that is enough to verify the issue with more attempts.)

(In reply to comment #18)
> Do you need more than one reproducer? Isn't this a simple problem, fixed by
> correct use of read() inside fdRead():
> 
> http://linux.die.net/man/2/read
Yes, the problem seems to be quite simple, but I need a deterministic way to reproduce the issue, so we can confirm there is no regression in the future.
Now I think we found the way.

Comment 23 errata-xmlrpc 2013-02-21 10:51:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0461.html