Description of problem:
At times, piping a large .rpm on a local disk to rpm2cpio fails, e.g.:

[/tmp]$ cat large.rpm | rpm2cpio 1>/dev/null
error: rpm2cpio: headerRead failed: hdr blob(154603): BAD, read returned 98008
error reading header from package

strace confirms that read() returns fewer bytes than requested. I ended up looking at the code, because this used to work fine with rpm 4.4.x (in RHEL 5.x). It seems that code handling all the cases for a read() was removed in 4.8.x. The case I'm hitting is straight from 'man 2 read':

"...It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or... etc."

So this appears to be a regression, unless I'm missing something.

In rpm 4.4.x (used on RHEL 5.x), ufdio_s->read=ufdRead, and ufdRead loops until it gets all the data, i.e. it handles short reads correctly; see http://rpm.org/gitweb?p=rpm.git;a=blob;f=rpmio/rpmio.c;h=dcb68c676a4f90a355d28b6f700a3322c5c62f3b;hb=refs/heads/rpm-4.4.x#l2346

In rpm 4.8.x (RHEL 6), ufdio_s->read=fdRead (a 'plain' read is done); see http://rpm.org/gitweb?p=rpm.git;a=blob;f=rpmio/rpmio.c;h=6473d557766b1941fc9cc1ceaec7a1e9b90d572e;hb=refs/heads/rpm-4.8.x#l760

That was introduced with this change:

"Eliminate ufdio-specific read, write, seek and close - we dont do network IO anymore so ufdio only differs from fdio by downloading the file on open if necessary, after that it's just fdio"
http://rpm.org/gitweb?p=rpm.git;a=commit;h=2dc82d4e3e9c2959f4f731895993645761905073

...so I guess it's an oversight?

Version-Release number of selected component (if applicable):
[~]$ rpm -q rpm
rpm-4.8.0-19.el6.i686

How reproducible:
Happens most of the time for me if the rpm is fairly large (>150M) and on a local disk.

Steps to Reproduce:
1. Run 'cat large.rpm | rpm2cpio 1>/dev/null' for a large rpm on a local disk.

Actual results:
error: rpm2cpio: headerRead failed: hdr blob(154603): BAD, read returned 98008
error reading header from package

Expected results:
No error.

Additional info:
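The 4.4.x-style behavior described above can be sketched in C as follows. This is a read that loops until it has all the requested data. It is a minimal illustration, not rpm's actual ufdRead() or fdRead() code, and the helper name read_full() is hypothetical. The helper keeps calling read(2) until `count` bytes have arrived, the writer closes the pipe (EOF), or a real error occurs. It retries the call when a signal interrupts it (EINTR).

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of rpm): read exactly `count` bytes from
 * `fd` unless EOF or an error comes first. A single read(2) on a pipe may
 * legitimately return fewer bytes than requested, so keep looping.
 * Returns the number of bytes read (less than `count` only at EOF),
 * or -1 on error, even if some bytes had already been read. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    assert(buf != NULL || count == 0);

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;              /* genuine error, errno is set */
        }
        if (n == 0)
            break;                  /* EOF: return what we have so far */
        total += (size_t)n;         /* short read: ask for the rest */
    }
    return (ssize_t)total;
}
```

With a helper like this, a header reader can treat any result other than the requested size as a truncated package instead of tripping over a normal short read from a pipe.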
Why does this bug have the Needinfo flag set? What additional information do you need? Clearly rpm 4.8.x misuses read().
It's a needinfo on the developer (you are probably not authorized to see the recipient, or I forgot to add the developer's mail address).
Needinfo set back on Panu to get his attention on this bug.
I suppose "oversight" is a fitting description, rpm2cpio being the only thing in rpm that accepts input from a pipe. I don't think I was even aware of it supporting reading from stdin prior to this bug... While this is technically a regression, it's in a very rarely used feature, and the "normal" usage of 'rpm2cpio package.rpm|cpio ...' works just fine. The rpmio subsystem is such a house of cards that I'm not going to risk breaking more commonly used functionality with a hurried fix for a corner case issue. Moving to 6.4.0.
devel_ack. In 4.8.x this can be handled easily enough in the timedRead() wrapper. Upstream will need something different, though...
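Handling short reads in one wrapper, as suggested for 4.8.x, gives every caller the looping behavior for free. The sketch below does not reproduce rpm's actual timedRead(). The callback type read_fn, the wrapper read_all_via(), and the chunked mock reader are all hypothetical. The mock deliberately hands out at most a few bytes per call, so the short-read case happens every time instead of depending on pipe timing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical I/O callback with the same contract as read(2):
 * bytes read, 0 at EOF, -1 on error. */
typedef ssize_t (*read_fn)(void *cookie, char *buf, size_t count);

/* Hypothetical wrapper: keep calling `rd` until `count` bytes have been
 * collected, EOF is reached, or the callback reports an error. */
ssize_t read_all_via(read_fn rd, void *cookie, char *buf, size_t count)
{
    size_t total = 0;

    while (total < count) {
        ssize_t n = rd(cookie, buf + total, count - total);
        if (n < 0)
            return -1;          /* propagate the callback's error */
        if (n == 0)
            break;              /* EOF before `count` bytes */
        total += (size_t)n;     /* short read: loop for the remainder */
    }
    return (ssize_t)total;
}

/* Mock stream that returns at most `max_chunk` bytes per call,
 * imitating a pipe that only ever has part of the data buffered. */
struct chunked_source {
    const char *data;
    size_t len;
    size_t pos;
    size_t max_chunk;
};

ssize_t chunked_read(void *cookie, char *buf, size_t count)
{
    struct chunked_source *src = cookie;
    size_t left, n;

    assert(src->pos <= src->len);
    left = src->len - src->pos;
    n = count < left ? count : left;
    if (n > src->max_chunk)
        n = src->max_chunk;
    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (ssize_t)n;
}
```

Calling chunked_read() directly for 50 bytes yields only `max_chunk` bytes. Going through read_all_via() yields all of them. That is the difference between the 4.8.x fdRead() behavior and the 4.4.x looping read described in the report.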
I'm trying to reproduce this issue, but no error occurred in 200 attempts.

# rpm -q rpm
rpm-4.8.0-27.el6.x86_64
# du -h large-1-1.noarch.rpm
190M large-1-1.noarch.rpm
# for i in `seq 1 200`; do cat large-1-1.noarch.rpm | rpm2cpio 1>/dev/null; done
#

Any ideas? Is it possible that it depends on the particular rpm used for the test? Or is there another scenario that could be used to verify the bug fix?
(In reply to comment #11)
> Any ideas? Is it possible that it depends on particular rpm used for test?

It will depend on the OS and the file system implementation, and maybe on memory pressure too. Try booting your system with mem=...
To reproduce this, you need a package whose header is larger than the system pipe buffer; the total size of the package (or the amount of memory) is not important. Of course, there tends to be a correlation: packages with a large header tend to be large overall. Anyway, the kernel package should be a fairly reliable reproducer.
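The underlying pipe behavior can be shown without rpm at all. In the minimal sketch below, the function name single_read_from_pipe() is hypothetical, and the sketch assumes Linux pipe semantics. Only `available` bytes are written into a pipe whose write end is still open, as when the writer has not yet produced the rest of the data. A single read(2) asking for more returns just the buffered bytes instead of blocking until the full request can be met. A header larger than the pipe buffer can never be fully buffered at once, so a single read() for it is much more likely to come up short.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write `available` bytes into a fresh pipe, keep the
 * write end open, then issue ONE read(2) for `requested` bytes.
 * Assumes 0 < available <= requested <= 65536 and that `available` fits
 * in the pipe buffer, so the write does not block. */
ssize_t single_read_from_pipe(size_t available, size_t requested)
{
    static char out[65536], in[65536];
    int fds[2];
    ssize_t n;

    /* available == 0 would block forever: no data and a live writer */
    assert(available > 0 && available <= requested && requested <= sizeof in);
    if (pipe(fds) != 0)
        return -1;
    memset(out, 'h', available);
    if (write(fds[1], out, available) != (ssize_t)available) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = read(fds[0], in, requested);   /* returns as soon as data is present */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

For example, single_read_from_pipe(1000, 4000) returns 1000 rather than 4000. This is the same kind of result as "read returned 98008" for a 154603-byte header blob.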
Here's an actual reproducer (although you'll probably want to find a closer mirror for the kernel package):

$ wget http://ftp.funet.fi/pub/mirrors/fedora.redhat.com/pub/fedora/linux/releases/17/Fedora/x86_64/os/Packages/k/kernel-3.3.4-5.fc17.x86_64.rpm
$ cat kernel-3.3.4-5.fc17.x86_64.rpm |rpm2cpio|wc -l

Output with the bug present will be something like:

error: rpm2cpio: headerRead failed: hdr blob(681753): BAD, read returned 129672
error reading header from package
0

Output with the bug fixed (assuming the same kernel package etc.):

299593
Thanks, Panu, for the reproducer. It works well on my workstation, but the success rate drops dramatically on all of the servers in our lab. I checked the pipe buffer size, and it is 65536 bytes on all systems except ppc64, where it is 1048576 bytes. On ppc64, therefore, I cannot reproduce it at all, but at least that makes sense. What I don't understand is why it is so hard to reproduce even on x86_64. Any idea? Is there anything other than the pipe buffer size that should be considered on the test systems? I also tried to create a test package containing nothing but a large changelog; that again worked on my desktop but not on the servers.
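On Linux 2.6.35 and later, a pipe's capacity can be queried directly with fcntl(F_GETPIPE_SZ) instead of being measured by filling a pipe until it blocks. Older kernels reject the command with EINVAL. The minimal sketch below uses a hypothetical function name. It assumes the Linux value 1032 for F_GETPIPE_SZ when older userspace headers do not define the constant.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032   /* Linux: F_LINUX_SPECIFIC_BASE + 8 (assumed for old headers) */
#endif

/* Hypothetical helper: return the capacity in bytes of a freshly created
 * pipe, or -1 if pipe() fails or the kernel does not support F_GETPIPE_SZ. */
long default_pipe_capacity(void)
{
    int fds[2];
    long size;

    if (pipe(fds) != 0)
        return -1;
    size = fcntl(fds[0], F_GETPIPE_SZ);   /* either end reports the same size */
    close(fds[0]);
    close(fds[1]);
    return size;
}
```

Comparing this value with the size of a package's header shows whether that package is a plausible reproducer on a given machine.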
I don't see what else besides the pipe buffer size could affect it. Where are you getting the pipe buffer size from? Also... if everything else fails, use a bigger hammer ;) The bigger hammer in this case is a larger package: even the kernel header is below 1M in size, whereas the maximum header size is 64M.
Do you need more than one reproducer? Isn't this a simple problem, fixed by correct use of read() inside fdRead()? http://linux.die.net/man/2/read
(In reply to comment #17)
> I dont see what else besides pipe buffer size could affect it. Where are you
> getting the pipe buffer size from?

I used the script found here: http://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer

> Also... if everything else fails, use a bigger hammer ;) Bigger hammer being
> a larger package in this case: even the kernel header is below 1M in size,
> whereas header max size is 64M.

Well, the big hammer seems to help. A spec file with a 15M changelog gives quite good failure chances. (The poorest is on s390x, about 5%, but that is enough to verify the issue with more attempts.)

(In reply to comment #18)
> Do you need more than one reproducer? Isn't this a simple problem, fixed by
> correct use of read() inside fdRead():
>
> http://linux.die.net/man/2/read

Yes, the problem seems to be quite simple, but I need a deterministic way to reproduce the issue, so that we can say there is no regression in the future. Now I think we have found the way.
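Whether a 5% failure rate is "enough" can be estimated. Assume each run fails independently with probability p. Then the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The hypothetical helper below returns the smallest n that reaches a chosen confidence. The independence assumption is ours; real runs on the same machine may be correlated.

```c
#include <assert.h>

/* Hypothetical helper: smallest number of independent runs n such that
 * P(at least one failure) = 1 - (1 - p_fail)^n >= confidence.
 * Assumes 0 < p_fail <= 1 and 0 <= confidence < 1. */
int runs_needed(double p_fail, double confidence)
{
    double p_miss = 1.0;   /* probability that every run so far passed */
    int n = 0;

    assert(p_fail > 0.0 && p_fail <= 1.0);
    assert(confidence >= 0.0 && confidence < 1.0);

    while (1.0 - p_miss < confidence) {
        p_miss *= 1.0 - p_fail;
        n++;
    }
    return n;
}
```

For the s390x rate quoted above, runs_needed(0.05, 0.99) is 90. A 200-iteration loop like the one used earlier would then catch a missing fix with a probability above 99.99% (0.95^200 is about 3.5e-5).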
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0461.html