459442 – popen() fails to connect output pipe to reader process running under xen

Summary: popen() fails to connect output pipe to reader process running under xen

Keywords:
Status:	CLOSED DUPLICATE of bug 453394
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel-xen
Sub Component:
Version:	5.2
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Xen Maintainance List
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-08-18 22:14 UTC by Orion Poplawski
Modified:	2008-08-28 18:59 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-08-28 18:59:19 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
pipetest.src.rpm (2.14 KB, application/x-rpm) 2008-08-18 22:14 UTC, Orion Poplawski

Description Orion Poplawski 2008-08-18 22:14:38 UTC

Created attachment 314500
pipetest.src.rpm

Description of problem:

The following code:

#include <stdio.h>
int main(int argc, char **argv) {
  FILE *cpipe;
  const int BUFFER_SIZE = 4096;
  char buffer[BUFFER_SIZE];

  cpipe = popen(argv[1],"r");
  while(!feof(cpipe))
    {
    fgets(buffer, BUFFER_SIZE, cpipe);
    }
  pclose(cpipe);
}

when run with "uname" as its argument, prints "Linux" and then hangs in a rawhide-i386 chroot on an F8 xen dom0 machine.  This also appears to be reproducible on the Fedora F10 x86_64 xen builders.  I cannot reproduce it in non-rawhide chroots.

It runs okay under strace, but the bad behavior can be seen under gdb.  There you can see that the file descriptor is garbage (_fileno = -14452):

(gdb) print *cpipe
$2 = {_flags = -72539000, _IO_read_ptr = 0x55577000 "", _IO_read_end = 0x55577000 "",
  _IO_read_base = 0x55577000 "", _IO_write_base = 0x55577000 "",
  _IO_write_ptr = 0x55577000 "", _IO_write_end = 0x55577000 "",
  _IO_buf_base = 0x55577000 "",
  _IO_buf_end = 0x55579000 <Address 0x55579000 out of bounds>, _IO_save_base = 0x0,
  _IO_backup_base = 0x0, _IO_save_end = 0x0, _markers = 0x0, _chain = 0x556f6560,
  _fileno = -14452, _flags2 = 0, _old_offset = 0, _cur_column = 0,
  _vtable_offset = 0 '\0', _shortbuf = "", _lock = 0x804a0a8, _offset = -1,
  __pad1 = 0x0, __pad2 = 0x0, __pad3 = 0x0, __pad4 = 0x0, __pad5 = 0, _mode = -1,
  _unused2 = '\0' <repeats 39 times>}
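
One plausible reading of the hang (an interpretation, not something stated in the report): feof() only becomes true at end-of-file, so if every read on a bad descriptor fails with an error instead, the while(!feof(cpipe)) loop never terminates. The sketch below is a variant of the test program's read loop that checks return values; the helper name count_output_lines is hypothetical and is not part of the attached pipetest.src.rpm.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the attachment): runs `cmd` through popen()
 * and returns how many fgets() reads succeeded (one per line, as long as
 * every line is shorter than the buffer). Returns -1 if the pipe could not
 * be opened or a read error occurred. */
long count_output_lines(const char *cmd)
{
    char buffer[4096];
    long lines = 0;
    int failed;

    assert(cmd != NULL);

    FILE *cpipe = popen(cmd, "r");
    if (cpipe == NULL)
        return -1;

    /* fgets() returns NULL on end-of-file and on error alike, so the loop
     * ends in both cases; feof() alone stays false after a read error. */
    while (fgets(buffer, sizeof buffer, cpipe) != NULL)
        lines++;

    failed = ferror(cpipe);
    pclose(cpipe);
    return failed ? -1 : lines;
}
```

Calling count_output_lines("uname") on a working system should return 1. On an affected system it would be expected to return -1 rather than spin, although that has not been verified against the kernels discussed here.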


I'm attaching a src.rpm that can be used to test this as well.

How reproducible:
Every time.

Comment 1 Kevin Kofler 2008-08-19 09:04:57 UTC

This can't really be caused by the Rawhide kernel: chroots don't use the kernel from the target system. The kernel in use is the host's Xen kernel; on that F8 machine it's an F8 kernel, on Koji it's a RHEL kernel.

Comment 2 Orion Poplawski 2008-08-19 14:20:52 UTC

Hm, good point.  Re-assigning to glibc.  It only happens in a xen environment, though.

Comment 3 Orion Poplawski 2008-08-28 16:05:55 UTC

Okay, looks like _IO_new_proc_open() has changed from F-9 to F-10.

In F-9 we get the pipe file descriptor with:

  if (_IO_pipe (pipe_fds) < 0)
    return NULL;

where it appears (being a total glibc novice) that _IO_pipe calls the "pipe" Linux syscall.

In F-10 we get the pipe file descriptor with:

      int r = __pipe2 (pipe_fds, O_CLOEXEC);

i.e. calling the "pipe2" Linux syscall.

So, it looks like the pipe2 linux syscall is borked in the xen kernel.  I see nonsense being returned:

(gdb) print pipe_fds
$9 = {-1425810944, 10922}

Now, do we "fix" by moving back to using __pipe in glibc or can we get __pipe2 fixed in the xen kernel?
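
As a rough way to check whether pipe2 really delivers usable descriptors, the sketch below calls pipe2() with O_CLOEXEC and then verifies both results with fcntl(F_GETFD). The helper names is_open_cloexec and checked_pipe2 are hypothetical. The array is preset to -1 on the assumption that the garbage seen in gdb is simply memory the kernel never wrote to.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical check: 1 if fd is an open descriptor with FD_CLOEXEC set. */
static int is_open_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

/* Hypothetical wrapper: returns 0 if pipe2() succeeded and both descriptors
 * look sane, -1 otherwise. On success the caller owns fds[0] and fds[1]. */
int checked_pipe2(int fds[2])
{
    assert(fds != NULL);

    /* Preset the array so that an untouched result is recognisably invalid. */
    fds[0] = fds[1] = -1;

    if (pipe2(fds, O_CLOEXEC) != 0)
        return -1;

    /* A call that reports success but leaves bad descriptors behind
     * is caught here instead of in a later read(). */
    if (!is_open_cloexec(fds[0]) || !is_open_cloexec(fds[1]))
        return -1;

    return 0;
}
```

On a kernel with a working pipe2, checked_pipe2() returns 0 and data written to fds[1] can be read back from fds[0].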

Comment 4 Orion Poplawski 2008-08-28 16:13:55 UTC

I seem to have the results backwards in the first comment: I'm only seeing this on x86_64 guests, not i386.

Comment 5 Daniel Berrangé 2008-08-28 16:21:17 UTC

As Kevin mentions, processes run in the chroot on builders aren't using the rawhide kernel. So I'm not 100% sure that this is a xen specific issue.

I think it is simply that the F10 glibc expects to be running on a kernel supporting the new pipe2 syscall, but the F8/9 kernels used on the builders don't have it. I think glibc needs to detect when the kernel has no pipe2() syscall and fall back to the old syscall. Or perhaps glibc already has this fallback, and it is simply not working correctly on x86_64 kernels?

Comment 6 Orion Poplawski 2008-08-28 16:32:53 UTC

glibc checks for the pipe2 syscall with "__have_pipe2":

#ifdef O_CLOEXEC
# ifndef __ASSUME_PIPE2
  if (__have_pipe2 >= 0)
# endif
    {
      int r = __pipe2 (pipe_fds, O_CLOEXEC);
# ifndef __ASSUME_PIPE2
      if (__have_pipe2 == 0)
        __have_pipe2 = r != -1 || errno != ENOSYS ? 1 : -1;

      if (__have_pipe2 > 0)
# endif
        if (r < 0)
          return NULL;
    }
#endif
#ifndef __ASSUME_PIPE2
# ifdef O_CLOEXEC
  if (__have_pipe2 < 0)
# endif
    if (__pipe (pipe_fds) < 0)
      return NULL;
#endif


Presumably the xen kernels "have" pipe2; it just isn't working in the x86_64 version.
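
With the preprocessor conditionals stripped out, the probe-and-fallback logic quoted above can be sketched as a standalone function. This is a simplified illustration, not glibc's code: the names have_pipe2 and pipe_cloexec are hypothetical, and setting FD_CLOEXEC in the fallback path is an assumption made here so that both paths behave alike.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* 0 = not probed yet, 1 = pipe2 usable, -1 = kernel reported ENOSYS. */
static int have_pipe2;

/* Hypothetical helper: creates a close-on-exec pipe, preferring pipe2()
 * and falling back to pipe() + fcntl() when pipe2() fails with ENOSYS. */
int pipe_cloexec(int fds[2])
{
    assert(fds != NULL);

    if (have_pipe2 >= 0) {
        int r = pipe2(fds, O_CLOEXEC);

        /* Same test as in the quoted code: anything other than
         * "-1 with errno == ENOSYS" is taken to mean pipe2 exists. */
        if (have_pipe2 == 0)
            have_pipe2 = (r != -1 || errno != ENOSYS) ? 1 : -1;

        if (have_pipe2 > 0)
            return r;
    }

    /* Fallback for kernels without pipe2. */
    if (pipe(fds) < 0)
        return -1;
    if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) < 0 ||
        fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        int saved = errno;
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }
    return 0;
}
```

The probe depends entirely on the kernel answering an unknown syscall with ENOSYS. If the call returns some other value, `r != -1` is true, the probe records pipe2 as available, and the fallback branch is never reached.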

Comment 7 Jakub Jelinek 2008-08-28 16:35:11 UTC

Can you strace it?  glibc will fall back to pipe if the pipe2 syscall fails with -ENOSYS.  Of course if the kernel pretends the syscall succeeded and returns garbage, it is the kernel that has to be fixed.

Comment 8 Orion Poplawski 2008-08-28 17:04:54 UTC

brk(0x1d04c000)                         = 0x1d04c000
syscall_293(0x7fffd9777e80, 0x80000, 0x400799, 0x2b62d18c76f0, 0, 0x1d02b100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) = 0x125
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b62d18c7780) = 27744
close(11106)                            = -1 EBADF (Bad file descriptor)
fcntl(-779343360, F_SETFD, 0)           = -1 EBADF (Bad file descriptor)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b62d1351000
read(3515623936, 0x2b62d1351000, 8192)  = -1 EBADF (Bad file descriptor)
read(3515623936, 0x2b62d1351000, 8192)  = -1 EBADF (Bad file descriptor)
read(3515623936, 0x2b62d1351000, 8192)  = -1 EBADF (Bad file descriptor)

Comment 9 Chris Lalancette 2008-08-28 17:13:46 UTC

Ug.  There is a bug, in the RHEL-5 x86_64 kernel only, that causes "unknown" system calls to return their system call numbers instead of -1 with ENOSYS, as they should.  This is another instance of it.  This will be fixed by 5.3.

Chris Lalancette
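
To illustrate why returning the syscall number is harmful, the sketch below decodes a raw syscall return value using the usual Linux convention, in which raw values from -4095 to -1 are negated errno codes and anything else is a successful result. The helper names are hypothetical. The two inputs are taken from the strace output in this report: 0x125 (293, the same number strace prints in syscall_293) from the affected kernel, and errno 38 (ENOSYS) from the fixed one.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: raw values in [-4095, -1] become -1 with errno set;
 * every other value is passed through unchanged as a successful result. */
long decode_syscall_result(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}

/* The two raw results seen in this report, as a self-check. */
void check_report_values(void)
{
    /* Fixed kernel: raw -ENOSYS decodes to -1 with errno == ENOSYS,
     * which is what lets glibc fall back to pipe(). */
    assert(decode_syscall_result(-ENOSYS) == -1 && errno == ENOSYS);

    /* Affected kernel: 0x125 is not in the error range, so the caller
     * sees an apparently successful call. */
    assert(decode_syscall_result(0x125) == 0x125);
}
```

Under this convention a positive return value can never signal an error, so user space has no way to tell that the call did nothing.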

Comment 10 Chris Lalancette 2008-08-28 17:15:46 UTC

Oh, I should have said "RHEL-5 xen x86_64 kernel"; and the relevant BZ is 453394.  Can you try the test kernels at http://people.redhat.com/dzickus/el5/105.el5/ and see if it fixes the issue for you?

Chris Lalancette

Comment 11 Kevin Kofler 2008-08-28 17:36:04 UTC

Well, we'll need that fixed kernel installed on the Xen host which hosts Koji.

Comment 12 Chris Lalancette 2008-08-28 17:39:56 UTC

Right.  Well, I asked this on the other bug as well; what do we need to do to make that happen?  Assuming that this does fix this issue (I would like confirmation), then we can propose it for the z-stream.  Would that work?

Chris Lalancette

Comment 13 Orion Poplawski 2008-08-28 17:42:26 UTC

New kernel works:

syscall_293(0x7fffedee1aa0, 0x80000, 0x400799, 0x2ba7bd15e6f0, 0, 0x1517100, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) = -1 (errno 38)
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ba7bd15e780) = 2185

I also saw this in the F-8 x86_64 xen kernel.

Thanks!

Comment 14 Kevin Kofler 2008-08-28 17:50:56 UTC

> what do we need to do to make that happen?

Talk to the infrastructure folks.

Comment 15 Chris Lalancette 2008-08-28 18:59:19 UTC

OK.  It seems like we have what we need in Koji already (a workaround), so we don't need to do anything further.  I'm going to close this bug as a dup of the other one, so we only have to track one bug.

Chris Lalancette

*** This bug has been marked as a duplicate of bug 453394 ***
