Bug 573210

Summary: [utrace] gdb can't debug bash inside an LXC container
Product: [Fedora] Fedora
Reporter: Robin Green <greenrd>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 14
CC: anton, dougsland, gansalmon, itamar, jonathan, keithlscott, kernel-maint, onestero, roland
Keywords: Regression
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Fixed In Version: kernel-2.6.35.14-96.fc14
Doc Type: Bug Fix
Last Closed: 2011-09-06 23:58:46 UTC
Attachments:
Description: bash binary from Exherbo Linux
Flags: none

Description Robin Green 2010-03-13 12:41:33 UTC
Created attachment 399848
bash binary from Exherbo Linux

Description of problem:
gdb is unable to debug a certain binary (attached - a build of bash), but only when run inside an LXC container.

Version-Release number of selected component (if applicable):
kernel-PAE-2.6.32.9-70.fc12.i686

How reproducible:
Always

Steps to Reproduce:
1. Set up an LXC chroot containing gdb and the attached binary (e.g. using "mach yum install gdb")
2. Define the LXC chroot in virsh, e.g. virsh define foo.xml for some libvirt configuration file foo.xml (see bug 554203 for an example)
3. virsh --connect lxc:/// start foo && virsh --connect lxc:/// console foo
4. gdb bash.exherbo
5. Type 'r' to run the program
  
Actual results:
GNU gdb (GDB) Fedora (7.0.1-33.fc12)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /bin/bash...Reading symbols from /usr/lib/debug/bin/bash.debug...done.
done.
(gdb) r
Starting program: /bin/bash 
warning: linux_test_for_tracefork: failed to kill second child
waiting for new child: No child processes.
(gdb) 

Expected results:
GNU gdb (GDB) Fedora (7.0.1-33.fc12)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/greenrd/exherbo/bin/bash...Missing separate debuginfo for /home/greenrd/exherbo/bin/bash
(no debugging symbols found)...done.
(gdb) r
Starting program: /home/greenrd/exherbo/bin/bash 
warning: .dynamic section for "/lib/libc.so.6" is not at the expected address
warning: difference appears to be caused by prelink, adjusting expectations
Detaching after fork from child process 740.
Detaching after fork from child process 741.
Detaching after fork from child process 742.
Detaching after fork from child process 744.
Detaching after fork from child process 746.
Detaching after fork from child process 747.
Detaching after fork from child process 748.
Detaching after fork from child process 749.
Detaching after fork from child process 750.
Detaching after fork from child process 751.
Detaching after fork from child process 752.
Detaching after fork from child process 753.
Detaching after fork from child process 754.
Detaching after fork from child process 755.
[root@cspcnh exherbo]#

Additional info:
This bug does not occur on Fedora 13. It does not occur with the /bin/bash from Fedora 12, either (which is why I attached the particular binary it occurs with).

Comment 1 Robin Green 2010-03-13 15:19:45 UTC
Workaround: use an older kernel, e.g. kernel-PAE-2.6.31.12-174.2.22.fc12.i686

Comment 2 Robin Green 2010-08-23 05:26:06 UTC
Correction: This bug essentially *does* still occur on Fedora 13 (with a minimal Fedora 13 installed inside the chroot). But unlike in Fedora 12, bash.exherbo starts apparently successfully, and the error message "Waiting for child: no such process" only appears when you try to run an external command such as /bin/ls inside of bash.exherbo. So the manifestation of the bug is slightly different.

Comment 3 Robin Green 2010-08-23 05:29:36 UTC
And it turns out this bug also occurs with Fedora 13's /bin/bash on Fedora 13, in the same way I stated in comment#2 - so you don't need to use another distro's bash binary to reproduce this.

Comment 4 Chuck Ebbert 2010-08-24 16:21:36 UTC
Does the bug happen if you use an upstream kernel instead of the fedora kernel? I would suspect the utrace patches as the cause if it doesn't happen on an unmodified kernel.

Comment 5 Robin Green 2010-08-29 09:40:17 UTC
With upstream kernel 2.6.33.7 (the closest released version available) I get a different error message from gdb:

bash-4.1# ls
Couldn't write debug register: No such process

I've filed that bug upstream as https://bugzilla.kernel.org/show_bug.cgi?id=17281 but didn't add it to the upstream bug field of this bug, because it's not necessarily the same issue.

Comment 6 Robin Green 2010-09-29 04:13:38 UTC
The upstream bug is now fixed in head, by

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=068e35eee9ef98eb4cab55181977e24995d273be

and is also fixed in 2.6.35.6.

Can this patch be applied to the Fedora kernel, please, to see if it fixes this bug as well?

Comment 7 Robin Green 2010-10-12 16:06:47 UTC
Issue still exists in F14 with kernel-PAE-2.6.35.6-39.fc14.i686 (note: I tested with the same F13 installation inside the chroot - I only upgraded the host, not the guest).

If I am reading the RPM version correctly, the upstream patch should be in that version (unless it's overridden by some Fedora patch), so this looks like it's probably NOT the same bug as the upstream bug.

Comment 8 Robin Green 2011-04-08 11:58:14 UTC
(In reply to comment #4)
> Does the bug happen if you use an upstream kernel instead of the fedora kernel?
> I would suspect the utrace patches as the cause if it doesn't happen on an
> unmodified kernel.

Yup, I commented out the utrace patches and rebuilt kernel-PAE-2.6.35.11-82.fc14.i686, and this bug disappeared. Without those patches commented out, this bug occurs.

Comment 9 Robin Green 2011-04-08 11:59:00 UTC
Sorry, I meant -83, not -82.

Comment 10 Keith Scott 2011-05-05 13:19:32 UTC
I get the same error (waiting for child: No child processes.) from a very simple pthreads example program, using 2.6.35.12-90.fc14.i686. This happens immediately after the call to pthread_create().
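
The program from this comment was not attached to the bug. As a rough, hypothetical stand-in, a "very simple pthreads example" might look like the sketch below; the names worker and run_one_thread are invented here for illustration and are not taken from the report.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread body: stores a marker value so the caller
 * can tell that the thread actually ran. */
static void *worker(void *arg)
{
    int *value = arg;
    *value = 42;
    return NULL;
}

/* Creates a single thread and joins it.
 * Returns 0 on success, or the pthread error code on failure.
 * pthread_create() is the call after which the comment above
 * reports the gdb error. */
int run_one_thread(int *out)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, out);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Calling run_one_thread() from a main() and running the result under gdb inside the container would be one way to try to reproduce the report, assuming the original test program was of similar shape.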

Comment 11 Josh Boyer 2011-08-26 15:16:14 UTC
Oleg, any thoughts on this?  See comment #8.

Comment 12 Oleg Nesterov 2011-08-26 16:31:06 UTC
I am puzzled. And I don't know what exactly fails. OK, perhaps
gdb tries to attach (or auto-attach) to the forked task. I don't
think http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=068e35eee9ef98eb4cab55181977e24995d273be can make
any difference.

And yes, we have the problems with the pid_ns here, but utrace
looks bug-compatible in this respect. I'll try to investigate
when I reserve the testing machine.

Comment 13 Oleg Nesterov 2011-08-26 16:34:23 UTC
(In reply to comment #10)
> I get the same error (waiting for child: No child processes.) from a very
> simple pthreads example program, using 2.6.35.12-90.fc14.i686  --  This happens
> immediately after the call to pthread_create()

Aha, I didn't notice this message. So, yes, it seems that something
is wrong with pids... Maybe.

Any chance you can confirm that this doesn't happen without
CONFIG_UTRACE or with upstream kernel?
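
For context on why the message points at pids: "No child processes" is the standard error text for ECHILD, which waitpid() returns when asked to wait for a pid that is not a waitable child of the caller. The sketch below only demonstrates that errno behaviour; it is not gdb's code, and the helper name wait_error_text is made up for this illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: waits on `pid` and returns the error text if
 * waitpid() fails, or NULL if the wait succeeded. */
const char *wait_error_text(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return strerror(errno);
    return NULL;
}

/* A process is never its own child, so waiting on getpid() is a
 * simple way to provoke ECHILD. */
const char *wait_on_self(void)
{
    return wait_error_text(getpid());
}
```

A debugger that is handed a pid number that does not correspond to one of its tracees, for whatever reason, would see the same errno from waitpid().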

Comment 14 Oleg Nesterov 2011-08-27 13:42:33 UTC
(In reply to comment #12)
> 
> And yes, we have the problems with the pid_ns here, but utrace
> looks bug-compatible in this respect.

Ooh, it is not. Can't understand how I didn't notice this before.

> I'll try to investigate
> when I reserve the testing machine.

Unneeded. I'm pretty sure I understand the problem.

Comment 15 Oleg Nesterov 2011-08-27 17:54:07 UTC
[PATCH F-14] bz#573210: ptrace-utrace: fix PTRACE_GETEVENTMSG(pid) in LXC
http://lists.fedoraproject.org/pipermail/kernel/2011-August/003340.html
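
The patch itself is in the linked mail and is not reproduced here. As a tracer-side illustration only, the sketch below walks through the sequence that the patch title refers to: a tracer enables PTRACE_O_TRACEFORK, receives a fork event, asks PTRACE_GETEVENTMSG for the new child's pid, and then waits on that pid. It assumes (based on the patch title and the pid discussion above, not on the patch text) that the symptom arises when the reported pid is not one the tracer can wait on. The function name check_fork_event_pid and its return codes are hypothetical.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Kill a tracee and collect its exit status. */
static void reap(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}

/* Hypothetical check. Returns 1 if the pid reported by
 * PTRACE_GETEVENTMSG can be waited on, 0 if waitpid() rejects it
 * (the kind of failure discussed in this bug), -1 if the test could
 * not be set up (for example, ptrace is not permitted). */
int check_fork_event_pid(void)
{
    int status;
    unsigned long msg = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        /* Tracee: request tracing, stop so the parent can set
         * options, then fork once to generate the event. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        if (fork() == 0)
            _exit(0);
        _exit(0);
    }

    /* Wait for the tracee's initial SIGSTOP. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACEFORK) == -1 ||
        ptrace(PTRACE_CONT, child, NULL, NULL) == -1) {
        reap(child);
        return -1;
    }

    /* Wait for the fork event stop. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    if ((status >> 8) != (SIGTRAP | (PTRACE_EVENT_FORK << 8)) ||
        ptrace(PTRACE_GETEVENTMSG, child, NULL, &msg) == -1) {
        reap(child);
        return -1;
    }

    /* The new process is traced automatically; waiting on the
     * reported pid should succeed once it stops. */
    pid_t reported = (pid_t)msg;
    int waitable = (waitpid(reported, &status, 0) == reported);
    if (waitable)
        reap(reported);
    reap(child);
    return waitable ? 1 : 0;
}
```

On a kernel without the problem this would be expected to return 1 both outside and inside a container; a return of 0 inside a container would match the "No child processes" symptom reported above.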

Comment 16 Josh Boyer 2011-08-29 13:31:11 UTC
I've committed the patch from comment #15.  This should be in the next F14 kernel build.

Comment 17 Fedora Update System 2011-09-01 15:24:55 UTC
kernel-2.6.35.14-96.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14

Comment 18 Fedora Update System 2011-09-02 05:29:49 UTC
Package kernel-2.6.35.14-96.fc14:
* should fix your issue,
* was pushed to the Fedora 14 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.35.14-96.fc14'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14
then log in and leave karma (feedback).

Comment 19 Fedora Update System 2011-09-06 23:57:52 UTC
kernel-2.6.35.14-96.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.