Test test_find_2 fails with reasonable reliability:

  find /sysroot/ -print0
  find string: /sysroot/
  find string: /sysroot/lost+found
  find string: /sysroot/a
  find string: /sysroot/b
  find string: /sysroot/b/c
  sock_read_event: 0x1ef69010 g->state = 3, fd = 3, events = 0x1
  sock_read_event: 0x1ef69010 g->state = 3, fd = 3, events = 0x1
  sock_write_event: 0x1ef69010 g->state = 3, fd = 3, events = 0x2
  sock_write_event: writing 40 bytes ...
  sock_write_event: wrote 40 bytes
  sock_write_event: done writing, calling send_cb
  /sbin/blockdev --setrw /dev/sda
  sock_read_event: 0x1ef69010 g->state = 3, fd = 3, events = 0x1
  sock_read_event: 0x1ef69010 g->state = 3, fd = 3, events = 0x1
  sock_write_event: 0x1ef69010 g->state = 3, fd = 3, events = 0x2
  sock_write_event: writing 28 bytes ...
  sock_write_event: wrote 28 bytes
  sock_write_event: done writing, calling send_cb
  mount
  umount /sysroot
  guestfsd: error: umount: /sysroot: umount: /sysroot: device is busy.
          (In some cases useful info about processes that use
           the device is found by lsof(8) or fuser(1))

See http://koji.fedoraproject.org/koji/getfile?taskID=1426167&name=build.log for the full build log.
*** Bug 507022 has been marked as a duplicate of this bug. ***
Updated summary. It's test_find_1 (i.e. the second find test) which is actually failing, silently, but the stuck mount then causes all the subsequent tests to fail.
This error is quite different from what I imagined. After some effort I've got down to this minimal test case:

  ./fish/guestfish -v -- \
    alloc /tmp/test.img 100M : \
    run : \
    sfdisk /dev/sda 0 0 0 , : \
    mkfs ext2 /dev/sda1 : \
    mount /dev/sda1 / : \
    touch /a : \
    umount-all : \
    echo done

which prints (in the umount-all):

  guestfsd: error: umount: /sysroot: umount: /sysroot: device is busy.

The 'touch' command seems to be important. Adding a sleep after 'touch' didn't make any difference. Also this error only appears to occur with the very latest Rawhide kernel (2.6.31-0.21.rc0.git18.fc12.x86_64).
lsof and fuser both report nothing using the filesystem. I think this must be a kernel bug ...
Tried replacing ext2 by ext3 .. same problem. Tried replacing ext2 by msdos .. same problem. So it doesn't appear to be a filesystem-specific problem, but something in the VFS or higher.
Doesn't happen with 'mkdir' (uses CHROOT_IN/OUT). Does happen with 'write_file' (uses CHROOT_IN/OUT). Doesn't happen with running 'echo 1 > /sysroot/a'. Appears to be connected to the CHROOT_IN/OUT macros, because when I modified the internal touch command to remove those, I didn't get the error.
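For readers following along, here is a rough sketch of the "chroot in, touch, chroot out" pattern being discussed. It is not the libguestfs source: it assumes the CHROOT_IN/OUT macros boil down to a chroot into the mounted filesystem and a chroot back out, and the function name touch_in_chroot, its parameters, and the chdir("/") step are made up for illustration. It needs root (CAP_SYS_CHROOT) to do anything useful.

```c
#define _DEFAULT_SOURCE   /* exposes chroot(2) on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (an illustration, not libguestfs code): create PATH
 * inside the tree rooted at ROOT by chrooting in, opening the file, and
 * chrooting back out.  Returns 0 on success, -1 on error with errno set
 * (EPERM if the caller lacks the privilege to chroot).
 */
int touch_in_chroot(const char *root, const char *path)
{
    assert(root != NULL && path != NULL);

    /* Assumption: keep the cwd at the real root, as a daemon would,
       so that chroot(".") below leads back out again. */
    if (chdir("/") == -1)
        return -1;

    /* "Chroot in": absolute paths now resolve under ROOT. */
    if (chroot(root) == -1)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_NOCTTY, 0666);
    int open_errno = errno;
    if (fd != -1)
        close(fd);

    /* "Chroot out": chroot(2) left the cwd alone, so "." is still the
       real root directory. */
    if (chroot(".") == -1)
        return -1;

    if (fd == -1) {
        errno = open_errno;   /* report the open() failure, not chroot's */
        return -1;
    }
    return 0;
}
```

Under these assumptions, touch_in_chroot("/sysroot", "/a") would stand in for the guestfish 'touch /a' step; the thread's finding is that on the affected kernel a sequence like this leaves the filesystem busy afterwards.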
This is a kernel bug - I've got a short reproducer program which I'll post here.
Created attachment 348882 [details] test.c
Created attachment 348883 [details] test.sh
To reproduce the bug:

1. Copy test.c and test.sh into /tmp
2. gcc -Wall -o /tmp/test /tmp/test.c
3. /tmp/test.sh

If you see "Finished test successfully." then the kernel is not susceptible to this bug. However, try this with 2.6.31-0.21.rc0.git18.fc12.x86_64 and the final umount will fail. You will end up with a filesystem at /tmp/mnt that cannot be unmounted. lsof and fuser show nothing using the filesystem.

(Note: all my testing is being done under qemu-kvm, not on bare metal.)
  bash-4.0# ./test.sh
  10+0 records in
  10+0 records out
  10485760 bytes (10 MB) copied, 0.251 s, 41.8 MB/s
  mke2fs 1.41.4 (27-Jan-2009)
  Filesystem label=
  OS type: Linux
  Block size=1024 (log=0)
  Fragment size=1024 (log=0)
  2560 inodes, 10240 blocks
  512 blocks (5.00%) reserved for the super user
  First data block=1
  Maximum filesystem blocks=10485760
  2 block groups
  8192 blocks per group, 8192 fragments per group
  1280 inodes per group
  Superblock backups stored on blocks:
          8193

  Writing inode tables: done
  Writing superblocks and filesystem accounting information: done

  This filesystem will be automatically checked every 39 mounts or
  180 days, whichever comes first.  Use tune2fs -c or -i to override.
  umount: /tmp/mnt: device is busy.
          (In some cases useful info about processes that use
           the device is found by lsof(8) or fuser(1))
  umount failed
  bash-4.0# mount
  none on /dev type tmpfs (rw)
  /proc on /proc type proc (rw)
  /sys on /sys type sysfs (rw)
  /dev/pts on /dev/pts type devpts (rw,gid=5,mode=620)
  /dev/sda on /sysroot type squashfs (rw)
  /dev/loop0 on /tmp/mnt type ext2 (rw)
  bash-4.0# ls
  fs  mnt  test  test.sh
  bash-4.0# ls /
  bin  etc   init  lib64  mnt  proc  sbin  sysroot  usr
  dev  home  lib   media  opt  root  sys   tmp      var
  bash-4.0# ls /tmp/mnt/
  hello  lost+found
  bash-4.0# umount /tmp/mnt
  umount: /tmp/mnt: device is busy.
          (In some cases useful info about processes that use
           the device is found by lsof(8) or fuser(1))
  bash-4.0# /usr/sbin/lsof -V /tmp/mnt
  lsof: no file system use located: /tmp/mnt
  bash-4.0# umount /tmp/mnt
  umount: /tmp/mnt: device is busy.
          (In some cases useful info about processes that use
           the device is found by lsof(8) or fuser(1))
  bash-4.0# /sbin/fuser /tmp/mnt
  bash-4.0# echo $?
  1
  bash-4.0# /sbin/fuser -v /tmp/mnt
  bash-4.0# umount /tmp/mnt
  umount: /tmp/mnt: device is busy.
          (In some cases useful info about processes that use
           the device is found by lsof(8) or fuser(1))
This is a problem. Your reproducer isn't quite right, since the cwd of the reproducer is the dir you are trying to umount, but once the test exits you should be able to unmount. I will look into this, thanks.
Never mind, apparently chroot doesn't change the cwd, so the test case should be able to umount as well. Sorry about that.
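The point about chroot leaving the cwd alone can be checked with a small probe along these lines. This is only a sketch: the function name cwd_survives_chroot is invented here, the "undo" step relies on the usual trick of keeping a descriptor open on the old root, and it needs root (CAP_SYS_CHROOT) to get past the chroot call.

```c
#define _DEFAULT_SOURCE   /* exposes chroot(2) on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if "." names the same directory before
 * and after chroot(new_root), 0 if it changed, and -1 on error with
 * errno set (EPERM when the caller may not chroot).
 */
int cwd_survives_chroot(const char *new_root)
{
    struct stat before, after;
    int result = -1;
    int saved_errno;

    assert(new_root != NULL);

    /* Keep handles on the old root and cwd so the chroot can be undone. */
    int old_root = open("/", O_RDONLY | O_DIRECTORY);
    int old_cwd = open(".", O_RDONLY | O_DIRECTORY);
    if (old_root == -1 || old_cwd == -1)
        goto out;

    if (stat(".", &before) == -1)
        goto out;
    if (chroot(new_root) == -1)
        goto out;

    /* Same device and inode means the cwd did not move. */
    if (stat(".", &after) == 0)
        result = (before.st_dev == after.st_dev &&
                  before.st_ino == after.st_ino);

    /* Undo: go back to the old root, make it the root again, then
       return to the original working directory. */
    if (fchdir(old_root) == -1 || chroot(".") == -1 ||
        fchdir(old_cwd) == -1)
        result = -1;

out:
    saved_errno = errno;      /* close() must not clobber the error */
    if (old_root != -1)
        close(old_root);
    if (old_cwd != -1)
        close(old_cwd);
    errno = saved_errno;
    return result;
}
```

Run as root, a call such as cwd_survives_chroot("/tmp") would be expected to return 1, which is why a reproducer that only calls chroot is not holding the mount point busy through its cwd.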
almost finished with the git bisect, hopefully i will have a fix for you tomorrow.
Thanks. I also posted a message on LKML, but there's been no response: http://lkml.org/lkml/2009/6/22/447
Created attachment 349094 [details]
patch to fix the problem

This patch fixes the test case, but I would like you to test with the original stuff you were having problems with to make sure I've not broken anything else. Thank you.
I rebuilt 2.6.31-0.24.rc0.git18.fc11.x86_64 + the patch from comment 16, and reran the libguestfs test suite, and that appears to have fixed the problem. Josef, do you need me to push the patch upstream? I think it's better if you do it, since you understand what's going on in the code.
yup i will push it upstream. Thanks for verifying it.
Has this patch gone anywhere? I can't see it in linux-2.6 git or linux-next ...
Al pushed a fuller fix a few weeks ago after I posted my patch, so the problem should be fixed. Let me know if it's not.
Yes I've verified that this works for me now. Closed -> Upstream. Thanks for looking at this.