Created attachment 314500
pipetest.src.rpm

Description of problem:

The following code:

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        FILE *cpipe;
        const int BUFFER_SIZE = 4096;
        char buffer[BUFFER_SIZE];

        cpipe = popen(argv[1],"r");
        while(!feof(cpipe)) {
            fgets(buffer, BUFFER_SIZE, cpipe);
        }
        pclose(cpipe);
    }

when run with "uname" as its argument outputs "Linux" and hangs when run in a rawhide-i386 chroot on a F8 xen dom0 machine. This also appears to be reproducible on the Fedora F10 x86_64 xen builders. Cannot reproduce on non-rawhide chroots.

It runs okay under strace, but the bad behavior can be seen under gdb. There you can see that the file descriptor is all messed up:

    (gdb) print *cpipe
    $2 = {_flags = -72539000, _IO_read_ptr = 0x55577000 "",
      _IO_read_end = 0x55577000 "", _IO_read_base = 0x55577000 "",
      _IO_write_base = 0x55577000 "", _IO_write_ptr = 0x55577000 "",
      _IO_write_end = 0x55577000 "", _IO_buf_base = 0x55577000 "",
      _IO_buf_end = 0x55579000 <Address 0x55579000 out of bounds>,
      _IO_save_base = 0x0, _IO_backup_base = 0x0, _IO_save_end = 0x0,
      _markers = 0x0, _chain = 0x556f6560, _fileno = -14452, _flags2 = 0,
      _old_offset = 0, _cur_column = 0, _vtable_offset = 0 '\0',
      _shortbuf = "", _lock = 0x804a0a8, _offset = -1, __pad1 = 0x0,
      __pad2 = 0x0, __pad3 = 0x0, __pad4 = 0x0, __pad5 = 0, _mode = -1,
      _unused2 = '\0' <repeats 39 times>}

I'm attaching a src.rpm that can be used to test this as well.

How reproducible: Every time.
This can't really be caused by the Rawhide kernel: chroots don't use the kernel from the target system. The kernel in use is the respective Xen kernel; on that F8 machine it's an F8 kernel, on Koji it's a RHEL kernel.
Hm, good point. Re-assigning to glibc. It only happens in a xen environment, though.
Okay, looks like _IO_new_proc_open() has changed from F-9 to F-10. In F-9 we get the pipe file descriptor with:

    if (_IO_pipe (pipe_fds) < 0)
      return NULL;

where it appears (being a total glibc novice) that _IO_pipe calls the "pipe" Linux syscall. In F-10 we get the pipe file descriptor with:

    int r = __pipe2 (pipe_fds, O_CLOEXEC);

i.e. calling the "pipe2" Linux syscall. So, it looks like the pipe2 Linux syscall is borked in the xen kernel. I see nonsense being returned:

    (gdb) print pipe_fds
    $9 = {-1425810944, 10922}

Now, do we "fix" this by moving back to using __pipe in glibc, or can we get __pipe2 fixed in the xen kernel?
I seem to have the results backwards in the first comment: I'm only seeing this on x86_64 guests, not i386.
As Kevin mentions, processes run in the chroot on builders aren't using the rawhide kernel. So I'm not 100% sure that this is a xen-specific issue. I think it is simply that the F10 glibc wants to be running on a kernel supporting the new pipe2 syscall, but the F8/9 kernels used on the builders don't have it. I think glibc needs to detect when the kernel has no pipe2() syscall and fall back to the old syscall. Or perhaps glibc already has this fallback, and it is simply not working correctly on x86_64 kernels?
glibc checks for the pipe2 syscall with "__have_pipe2":

    #ifdef O_CLOEXEC
    # ifndef __ASSUME_PIPE2
      if (__have_pipe2 >= 0)
    # endif
        {
          int r = __pipe2 (pipe_fds, O_CLOEXEC);
    # ifndef __ASSUME_PIPE2
          if (__have_pipe2 == 0)
            __have_pipe2 = r != -1 || errno != ENOSYS ? 1 : -1;

          if (__have_pipe2 > 0)
    # endif
            if (r < 0)
              return NULL;
        }
    #endif
    #ifndef __ASSUME_PIPE2
    # ifdef O_CLOEXEC
      if (__have_pipe2 < 0)
    # endif
        if (__pipe (pipe_fds) < 0)
          return NULL;
    #endif

Presumably the xen kernels "have" pipe2, it just isn't working in the x86_64 version.
Can you strace it? glibc will fall back to pipe if the pipe2 syscall fails with -ENOSYS. Of course, if the kernel pretends the syscall succeeded and returns garbage, it is the kernel that has to be fixed.
    brk(0x1d04c000) = 0x1d04c000
    syscall_293(0x7fffd9777e80, 0x80000, 0x400799, 0x2b62d18c76f0, 0, 0x1d02b100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) = 0x125
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b62d18c7780) = 27744
    close(11106) = -1 EBADF (Bad file descriptor)
    fcntl(-779343360, F_SETFD, 0) = -1 EBADF (Bad file descriptor)
    mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b62d1351000
    read(3515623936, 0x2b62d1351000, 8192) = -1 EBADF (Bad file descriptor)
    read(3515623936, 0x2b62d1351000, 8192) = -1 EBADF (Bad file descriptor)
    read(3515623936, 0x2b62d1351000, 8192) = -1 EBADF (Bad file descriptor)
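The repeated read() calls failing with EBADF in the trace above are a likely explanation for the hang in the reproducer: a failed read sets the stream's error indicator, not its end-of-file indicator, so a loop that tests only feof() never exits. A minimal sketch of a read loop that stops on either condition follows. The helper name drain_stream is hypothetical and is not part of the attached pipetest; it only makes the program terminate, it does not address the bad descriptors themselves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the attached pipetest): reads fp until
 * fgets() stops returning data and reports the number of bytes consumed,
 * or -1 if the stream's error indicator is set afterwards. */
static long drain_stream(FILE *fp)
{
    char buffer[4096];
    long total = 0;

    assert(fp != NULL);

    /* fgets() returns NULL both at end-of-file and on a read error,
     * so this loop ends in either case, unlike while(!feof(fp)). */
    while (fgets(buffer, sizeof buffer, fp) != NULL)
        total += (long)strlen(buffer);

    /* Distinguish a clean end-of-file from a failed read. */
    return ferror(fp) ? -1 : total;
}
```

With popen(), the caller would check the returned FILE * for NULL, pass it to drain_stream(), and then call pclose() regardless of the result.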
Ug. There is a bug in the RHEL-5 x86_64 kernel only that would cause "unknown" system calls to return their system call numbers instead of -1 with ENOSYS, as they should. This is another instance of it. This will be fixed by 5.3. Chris Lalancette
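To illustrate why this kernel bug defeats the fallback: the strace earlier in this report shows syscall 293 returning 0x125, which is 293 in decimal, and the glibc test `r != -1 || errno != ENOSYS` treats any such value as proof that pipe2 exists. The following is only a sketch of a stricter caller-side check, not glibc's code and not a proposed glibc change. The helper names pipe_cloexec and set_cloexec are hypothetical, and treating every unexpected nonzero return as "not implemented" is an assumption made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: sets FD_CLOEXEC on fd, returns 0 or -1. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: creates a close-on-exec pipe, preferring pipe2
 * and falling back to pipe() plus fcntl(). Returns 0 or -1. */
static int pipe_cloexec(int fds[2])
{
    assert(fds != NULL);

#ifdef SYS_pipe2
    long r = syscall(SYS_pipe2, fds, O_CLOEXEC);
    if (r == 0)
        return 0;               /* the only value accepted as success */
    if (r == -1 && errno != ENOSYS)
        return -1;              /* genuine failure such as EMFILE */
    /* ENOSYS, or an unexpected nonzero value such as the syscall
     * number itself: assume pipe2 is unusable and fall through. */
#endif

    if (pipe(fds) < 0)
        return -1;
    if (set_cloexec(fds[0]) < 0 || set_cloexec(fds[1]) < 0) {
        int saved = errno;      /* keep the fcntl() error for the caller */
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }
    return 0;
}
```

The fallback path is not atomic: another thread could fork between pipe() and fcntl(), which is the race that pipe2 with O_CLOEXEC exists to close.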
Oh, I should have said "RHEL-5 xen x86_64 kernel"; and the relevant BZ is 453394. Can you try the test kernels at http://people.redhat.com/dzickus/el5/105.el5/ and see if it fixes the issue for you? Chris Lalancette
Well, we'll need that fixed kernel installed on the Xen host which hosts Koji.
Right. Well, I asked this on the other bug as well; what do we need to do to make that happen? Assuming that this does fix this issue (I would like confirmation), then we can propose it for the z-stream. Would that work? Chris Lalancette
New kernel works:

    syscall_293(0x7fffedee1aa0, 0x80000, 0x400799, 0x2ba7bd15e6f0, 0, 0x1517100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) = -1 (errno 38)
    pipe([5, 6]) = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ba7bd15e780) = 2185

I also saw this in the F-8 x86_64 xen kernel. Thanks!
> what do we need to do to make that happen? Talk to the infrastructure folks.
OK. It seems like we have what we need in Koji already (a workaround), so we don't need to do anything further. I'm going to close this bug as a dup of the other one, so we only have to track one bug. Chris Lalancette *** This bug has been marked as a duplicate of bug 453394 ***