Bug 459442 - popen() fails to connect output pipe to reader process running under xen
Status: CLOSED DUPLICATE of bug 453394
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.2
Hardware: x86_64 Linux
Priority: medium, Severity: medium
Assigned To: Xen Maintainance List
QA Contact: Martin Jenner
Reported: 2008-08-18 18:14 EDT by Orion Poplawski
Modified: 2008-08-28 14:59 EDT (History)
Doc Type: Bug Fix
Last Closed: 2008-08-28 14:59:19 EDT

Attachments:
pipetest.src.rpm (2.14 KB, application/x-rpm)
2008-08-18 18:14 EDT, Orion Poplawski
Description Orion Poplawski 2008-08-18 18:14:38 EDT
Created attachment 314500
pipetest.src.rpm

Description of problem:

The following code:

#include <stdio.h>
int main(int argc, char **argv) {
  FILE *cpipe;
  const int BUFFER_SIZE = 4096;
  char buffer[BUFFER_SIZE];

  cpipe = popen(argv[1],"r");
  while(!feof(cpipe))
    {
    fgets(buffer, BUFFER_SIZE, cpipe);
    }
  pclose(cpipe);
}

when run with "uname" as its argument, prints "Linux" and then hangs in a rawhide-i386 chroot on an F8 Xen dom0 machine.  This also appears to be reproducible on the Fedora F10 x86_64 Xen builders.  I cannot reproduce it in non-rawhide chroots.

It runs okay under strace, but the bad behavior can be seen under gdb.  There you can see that the FILE structure holds a garbage file descriptor (_fileno = -14452):

(gdb) print *cpipe
$2 = {_flags = -72539000, _IO_read_ptr = 0x55577000 "", _IO_read_end = 0x55577000 "",
  _IO_read_base = 0x55577000 "", _IO_write_base = 0x55577000 "",
  _IO_write_ptr = 0x55577000 "", _IO_write_end = 0x55577000 "",
  _IO_buf_base = 0x55577000 "",
  _IO_buf_end = 0x55579000 <Address 0x55579000 out of bounds>, _IO_save_base = 0x0,
  _IO_backup_base = 0x0, _IO_save_end = 0x0, _markers = 0x0, _chain = 0x556f6560,
  _fileno = -14452, _flags2 = 0, _old_offset = 0, _cur_column = 0,
  _vtable_offset = 0 '\0', _shortbuf = "", _lock = 0x804a0a8, _offset = -1,
  __pad1 = 0x0, __pad2 = 0x0, __pad3 = 0x0, __pad4 = 0x0, __pad5 = 0, _mode = -1,
  _unused2 = '\0' <repeats 39 times>}


I'm attaching a src.rpm that can be used to test this as well.

How reproducible:
Every time.
Comment 1 Kevin Kofler 2008-08-19 05:04:57 EDT
This can't really be caused by the Rawhide kernel: chroots don't use the kernel from the target system. The kernel in use is the respective Xen kernel: on that F8 machine it's an F8 kernel, and on Koji it's a RHEL kernel.
Comment 2 Orion Poplawski 2008-08-19 10:20:52 EDT
Hm, good point.  Re-assigning to glibc.  It only happens in a Xen environment, though.
Comment 3 Orion Poplawski 2008-08-28 12:05:55 EDT
Okay, looks like _IO_new_proc_open() has changed from F-9 to F-10.

In F-9 we get the pipe file descriptor with:

  if (_IO_pipe (pipe_fds) < 0)
    return NULL;

where it appears (being a total glibc novice) that _IO_pipe calls the "pipe" Linux syscall.

In F-10 we get the pipe file descriptor with:

      int r = __pipe2 (pipe_fds, O_CLOEXEC);

i.e. calling the "pipe2" Linux syscall.

So, it looks like the pipe2 Linux syscall is borked in the Xen kernel.  I see nonsense being returned:

(gdb) print pipe_fds
$9 = {-1425810944, 10922}

Now, do we "fix" by moving back to using __pipe in glibc or can we get __pipe2 fixed in the xen kernel?
Comment 4 Orion Poplawski 2008-08-28 12:13:55 EDT
I seem to have the results backwards in the first comment: I'm only seeing this on x86_64 guests, not i386.
Comment 5 Daniel Berrange 2008-08-28 12:21:17 EDT
As Kevin mentions, processes run in the chroot on the builders aren't using the rawhide kernel. So I'm not 100% sure that this is a Xen-specific issue.

I think it is simply that the F10 glibc expects to run on a kernel that supports the new pipe2 syscall, but the F8/9 kernels used on the builders don't have it. I think glibc needs to detect when the kernel has no pipe2() syscall and fall back to the old syscall. Or perhaps glibc already has this fallback, and it is simply not working correctly on x86_64 kernels?
Comment 6 Orion Poplawski 2008-08-28 12:32:53 EDT
glibc checks for the pipe2 syscall with "__have_pipe2":

#ifdef O_CLOEXEC
# ifndef __ASSUME_PIPE2
  if (__have_pipe2 >= 0)
# endif
    {
      int r = __pipe2 (pipe_fds, O_CLOEXEC);
# ifndef __ASSUME_PIPE2
      if (__have_pipe2 == 0)
        __have_pipe2 = r != -1 || errno != ENOSYS ? 1 : -1;

      if (__have_pipe2 > 0)
# endif
        if (r < 0)
          return NULL;
    }
#endif
#ifndef __ASSUME_PIPE2
# ifdef O_CLOEXEC
  if (__have_pipe2 < 0)
# endif
    if (__pipe (pipe_fds) < 0)
      return NULL;
#endif


Presumably the Xen kernels "have" pipe2; it just isn't working in the x86_64 version.
Comment 7 Jakub Jelinek 2008-08-28 12:35:11 EDT
Can you strace it?  glibc will fall back to pipe if the pipe2 syscall fails with -ENOSYS.  Of course, if the kernel pretends the syscall succeeded and returns garbage, it is the kernel that has to be fixed.
Comment 8 Orion Poplawski 2008-08-28 13:04:54 EDT
brk(0x1d04c000)                         = 0x1d04c000
syscall_293(0x7fffd9777e80, 0x80000, 0x400799, 0x2b62d18c76f0, 0, 0x1d02b100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) = 0x125
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b62d18c7780) = 27744
close(11106)                            = -1 EBADF (Bad file descriptor)
fcntl(-779343360, F_SETFD, 0)           = -1 EBADF (Bad file descriptor)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b62d1351000
read(3515623936, 0x2b62d1351000, 8192)  = -1 EBADF (Bad file descriptor)
read(3515623936, 0x2b62d1351000, 8192)  = -1 EBADF (Bad file descriptor)
read(3515623936, 0x2b62d1351000, 8192)  = -1 EBADF (Bad file descriptor)
Comment 9 Chris Lalancette 2008-08-28 13:13:46 EDT
Ug.  There is a bug, in the RHEL-5 x86_64 kernel only, that causes "unknown" system calls to return their system call numbers instead of -1 with ENOSYS, as they should.  This is another instance of it.  This will be fixed by 5.3.

Chris Lalancette
Comment 10 Chris Lalancette 2008-08-28 13:15:46 EDT
Oh, I should have said "RHEL-5 xen x86_64 kernel"; and the relevant BZ is 453394.  Can you try the test kernels at http://people.redhat.com/dzickus/el5/105.el5/ and see if it fixes the issue for you?

Chris Lalancette
Comment 11 Kevin Kofler 2008-08-28 13:36:04 EDT
Well, we'll need that fixed kernel installed on the Xen host which hosts Koji.
Comment 12 Chris Lalancette 2008-08-28 13:39:56 EDT
Right.  Well, I asked this on the other bug as well; what do we need to do to make that happen?  Assuming that this does fix this issue (I would like confirmation), then we can propose it for the z-stream.  Would that work?

Chris Lalancette
Comment 13 Orion Poplawski 2008-08-28 13:42:26 EDT
New kernel works:

syscall_293(0x7fffedee1aa0, 0x80000, 0x400799, 0x2ba7bd15e6f0, 0, 0x1517100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) = -1 (errno 38)
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ba7bd15e780) = 2185

I saw this in the F-8 x86_64 Xen kernel as well.

Thanks!
Comment 14 Kevin Kofler 2008-08-28 13:50:56 EDT
> what do we need to do to make that happen?

Talk to the infrastructure folks.
Comment 15 Chris Lalancette 2008-08-28 14:59:19 EDT
OK.  It seems like we have what we need in Koji already (a workaround), so we don't need to do anything further.  I'm going to close this bug as a dup of the other one, so we only have to track one bug.

Chris Lalancette

*** This bug has been marked as a duplicate of bug 453394 ***
