This is the error message, which seems to indicate a bug in the sha1sum program itself:

sha1sum /sysroot/new
sha1sum[986]: segfault at 0 ip (null) sp bfe1887c error 4 in libc-2.10.1.so[110000+16b000]
Created attachment 345019
build.log

Search the attachment for 'sha1sum' and you can see the program segfaulting like crazy.
I wonder if this is a general 'coreutils programs segfault in the guest' problem. For example, here's another one:

md5sum /sysroot/upload
md5sum[1038]: segfault at b697 ip 0000af67 sp bfb332d0 error 4 in libc-2.10.1.so[110000+16b000]

from http://koji.fedoraproject.org/koji/getfile?taskID=1369168&name=build.log
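For comparing many of these kernel messages across build logs, a small parser can help. The following is only a sketch: the function names are made up for illustration, and the regular expression assumes the log format shown in the two examples above. The decoding of the error value uses the low three bits of the x86 page-fault error code (bit 0: page was present, bit 1: write access, bit 2: user mode), so "error 4" is a user-mode read of a page that was not present.

```python
import re

# Matches the kernel "segfault at ..." line in the format seen in this report.
# The trailing "in <object>[base+size]" part is treated as optional.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>\(null\)|[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 1),  # False: page not present; True: protection fault
        "write": bool(code & 2),    # False: read access; True: write access
        "user": bool(code & 4),     # True: fault happened in user mode
    }


def parse_segfault(line):
    """Return a dict describing a segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = m.group("ip")
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": None if ip == "(null)" else int(ip, 16),
        "sp": int(m.group("sp"), 16),
        "error": decode_error(int(m.group("err"), 16)),  # the kernel prints this in hex
        "object": m.group("obj"),
    }


if __name__ == "__main__":
    print(parse_segfault(
        "md5sum[1038]: segfault at b697 ip 0000af67 sp bfb332d0 "
        "error 4 in libc-2.10.1.so[110000+16b000]"
    ))
```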
Adding Jim Meyering to the CC of this bug. Jim: This is very out of left field for coreutils, but I wonder if you have any idea why the *sum programs in coreutils could possibly segfault randomly when run in a Fedora 11 i586 qemu guest? F-11, i586 and qemu [not KVM] all seem to be significant factors.
Hi Rich,

A stack trace, gdb backtrace from sha1sum would help, but here's a shot in the dark: could this be due to locale settings different from LC_ALL=C and running without locale-related files? I.e., could your removing some locale-related infrastructure have caused this?

Then I looked at your attachment. Right before the first segfault, I see what looks like serious FS trouble with a mkdir syscall. It's probably worth investigating that first.

------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:302 __writeback_single_inode+0x1d4/0x27a() (Tainted: G W )
Hardware name:
Modules linked in: ext2 virtio_net virtio_pci
Pid: 276, comm: mkdir Tainted: G W 2.6.29.3-155.fc11.i586 #1
Call Trace:
[<c042f2f2>] warn_slowpath+0x7c/0xa4
[<c0478bf0>] ? find_get_pages_tag+0x32/0xa2
[<c047fb86>] ? pagevec_lookup_tag+0x1e/0x25
[<c0479935>] ? wait_on_page_writeback_range+0xa2/0xdd
[<c04bb661>] ? wait_on_buffer+0x32/0x35
[<c04bb8e2>] ? sync_dirty_buffer+0x59/0x8d
[<d887409f>] ? brelse+0x11/0x13 [ext2]
[<d887432d>] ? ext2_update_inode+0x28c/0x2c7 [ext2]
[<c04b673d>] __writeback_single_inode+0x1d4/0x27a
[<c044077d>] ? wake_bit_function+0x0/0x3c
[<c04b6808>] sync_inode+0x25/0x38
[<d8873fa0>] ext2_sync_inode+0x2c/0x33 [ext2]
[<d8872838>] ext2_commit_chunk+0x92/0xa6 [ext2]
[<d88729af>] ext2_make_empty+0x163/0x17a [ext2]
[<d8875d03>] ext2_mkdir+0x9d/0xf1 [ext2]
[<c04a8501>] vfs_mkdir+0x61/0x9f
[<c04a9b8b>] sys_mkdirat+0x89/0xc2
[<c0433ee0>] ? _local_bh_enable+0x8d/0x9d
[<c043401a>] ? __do_softirq+0x12a/0x139
[<c04a9bd9>] sys_mkdir+0x15/0x17
[<c0403f72>] syscall_call+0x7/0xb
---[ end trace fe7116eb9e9c7886 ]---
This is the environment that commands running in the daemon get.

TERM=linux
PWD=/
guestfs=10.0.2.4:6666
SHLVL=0
HOME=/

I'm now in the process of adding more environment variables (particularly PATH) to see if that fixes the problem.
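To try the same idea outside the appliance, a command can be launched with an explicit environment that adds PATH and LC_ALL=C to the minimal set above. This is only an illustration in Python, not the daemon's code; the helper name and the PATH value are assumptions.

```python
import subprocess
import sys


def run_with_env(argv, extra_env=None):
    """Run argv with a small, explicit environment instead of the inherited one."""
    env = {
        "TERM": "linux",
        "HOME": "/",
        "PATH": "/usr/sbin:/usr/bin:/sbin:/bin",  # assumed value, not from the report
        "LC_ALL": "C",                            # rules out locale-dependent code paths
    }
    if extra_env:
        env.update(extra_env)
    return subprocess.run(argv, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Print the LC_ALL value that the child process actually sees.
    result = run_with_env(
        [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    )
    print(result.stdout.strip())
```

If the segfaults persist with LC_ALL=C and a full PATH, the environment can probably be ruled out as the cause.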
Nasty, nasty - could be worth trying to reproduce locally with qemu -no-kvm
fixed typo in title: s/libguetfs/libguestfs/
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
*** Bug 512680 has been marked as a duplicate of this bug. ***
Changed the summary to: "running commands segfault randomly, with 32 bit host and guest, when host is a Xen or VMWare guest"

All the factors in that summary seem to be crucial: the "host" (i.e. what is running libguestfs) will be a 32 bit Xen or VMware guest. Programs running in the guestfs will segfault randomly, or give a spurious kernel error like "no vm86_info: BAD error", or give a qemu error "tcg fatal error". The problem seems to be confined to F11, i586, RHEL 5 Xen.
Verified that this doesn't happen with the vanilla, upstream qemu 0.10.4. Solution is probably to upgrade qemu, but we need to do a proper bisect.
Setting Product to Virtualization Tools. Haven't seen this for a long time, and it's probably a TCG-related bug, but I'll leave it open for now.
Hrm, I just started seeing this (or something else with the same symptoms) after a domU reboot. 5.3 Dom0, F10 domU. When it gets like this, seemingly everything will segfault (even 'less'), though I can get a reboot to happen. A reboot of the domU will forestall the problem for something on the order of a day. I see no obvious problems on the Dom0 or in a light Fedora 11 DomU or a Nexenta (OpenSolaris) DomU.

segfaults look like:
Sep 29 15:14:38 borlaug kernel: MailScanner[26490]: segfault at 50 ip 008071ba sp bfdc7820 error 6 in libperl.so[6f7000+269000]
Sep 29 15:15:10 borlaug kernel: MailScanner[26599]: segfault at 639ce069 ip 007a83b1 sp bfdc7990 error 4 in libperl.so[6f7000+269000]

One difference would be these two kernel versions:
Aug 25 21:28:31 borlaug kernel: Linux version 2.6.27.21-170.2.56.fc10.i686.PAE (mockbuild.redhat.com) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Mon Mar 23 23:24:26 EDT 2009
Sep 14 18:35:31 borlaug kernel: Linux version 2.6.27.30-170.2.82.fc10.i686.PAE (mockbuild.phx.redhat.com) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP Mon Aug 17 08:24:23 EDT 2009

though there have been other subsequent package updates. Is it helpful to get a bt on e.g. 'less' when this happens again? I'll try switching back to the old kernel to see if that makes a difference.
The domU kernel didn't make a difference, but the dom0 kernel did.

Problems (any of several f10 domU kernels):
Sep 26 19:08:02 stevens kernel: Linux version 2.6.18-164.el5xen (mockbuild.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Thu Sep 3 04:47:32 EDT 2009

No problems (most recent f10 domU kernel):
Oct 7 23:42:28 stevens kernel: Linux version 2.6.18-128.1.10.el5xen (mockbuild.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Thu May 7 11:51:15 EDT 2009

Am I chasing the same bug here? I skipped many updates with a reboot of the Dom0 - I'll try a binary search of the updates to see where the problems started if this isn't already being hunted in another bug.
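The binary search over Dom0 updates mentioned above could be sketched roughly as follows. The function name and the `is_bad` callback are hypothetical; the sketch assumes the updates are ordered oldest to newest, that the first entry is known good and the last known bad, and that everything after the first bad update is also bad.

```python
def first_bad(candidates, is_bad):
    """Return the index of the first bad candidate.

    Assumes candidates[0] is good, candidates[-1] is bad, and that once a
    candidate is bad, every later one is bad too.
    """
    lo, hi = 0, len(candidates) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Placeholder update names; the real list would come from the package history.
    updates = ["update-%d" % i for i in range(10)]
    known_bad = set(updates[6:])  # pretend the problem starts at update-6
    idx = first_bad(updates, lambda u: u in known_bad)
    print(updates[idx])
```

Because the segfaults take on the order of a day to appear, `is_bad` would need to watch each candidate long enough before declaring it good; a false "good" sends the search in the wrong direction.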
I'm pretty certain this is a different bug. The current bug is with TCG emulation under QEMU, and nothing to do with Xen.
Ah, I see I've misparsed comment #10. I'll find/make a different one.
Still seeing this on Koji, e.g.:
http://koji.fedoraproject.org/koji/getfile?taskID=2025462&name=build.log

sha224sum /sysroot/known-3
[ 966.687020] sha224sum[5431]: segfault at eae7 ip 0000e3b7 sp bfa418d0 error 4 in ld-2.11.1.so[581000+1e000]
[ 966.691152] no vm86_info: BAD
sha256sum /sysroot/known-3
[ 967.673020] sha256sum[5441]: segfault at f056 ip 0000e926 sp bfbf4ef0 error 4 in libc-2.11.1.so[16a000+16f000]
[ 967.674848] no vm86_info: BAD
Another example:
http://koji.fedoraproject.org/koji/getfile?taskID=2025622&name=build.log

sha224sum /sysroot/known-3
[ 1013.971021] sha224sum[5281]: segfault at e864 ip 0000e134 sp bfbfb7f0 error 4 in libc-2.11.1.so[5fd000+16f000]
[ 1013.975147] no vm86_info: BAD

(Note that this test was exactly the same as comment 17, and yet it only failed once this time, so the failures are not completely deterministic.)
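Since the failures are not deterministic, one way to get a rough failure rate is to run the same command repeatedly and count how often it dies from SIGSEGV. This is a sketch only: the helper name is made up, and it relies on Python's POSIX convention that a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys


def count_segfaults(argv, runs=20):
    """Run argv `runs` times and count how many runs were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, returncode == -N means the child was terminated by signal N.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


if __name__ == "__main__":
    # Example: pass the command to test, e.g. sha224sum /sysroot/known-3
    if len(sys.argv) > 1:
        print(count_segfaults(sys.argv[1:]))
```

Run inside the guest against one of the *sum commands above, a count strictly between zero and the number of runs would match the intermittent behaviour seen in the Koji logs.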
Haven't seen this happen for quite a long time, and we regularly run the full test suite on i386 and x86-64. Closing ...