Bug 1771976
| Summary: | Occasional supermin failures when the appliance is partially deleted, perhaps by a tmp cleaner | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Lubomír Sedlář <lsedlar> | |
| Component: | supermin | Assignee: | Richard W.M. Jones <rjones> | |
| Status: | CLOSED WONTFIX | QA Contact: | YongkuiGuo <yoguo> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 7.7 | CC: | bthurber, fdupont, nperic, ptoscano, rjones, tzheng, xchen, yoguo | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1796120 (view as bug list) | Environment: | ||
| Last Closed: | 2020-06-29 09:45:23 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1813809 | |||
| Bug Blocks: | ||||
Does /var/tmp/.guestfs-1000/appliance.d/root exist? How about other files under /var/tmp/.guestfs-1000 ?

The files exist now, but there may have been successful runs since seeing the issues.

ls -lh /var/tmp/.guestfs-1000/appliance.d/
total 337M
-rw-r--r--. 1 rcm-auto-compose rcm-auto-compose 3.3M Nov 13 03:34 initrd
-rwxr-xr-x. 1 rcm-auto-compose rcm-auto-compose 6.5M Nov 13 03:34 kernel
-rw-r--r--. 1 rcm-auto-compose rcm-auto-compose 4.0G Nov 13 03:34 root

/var/tmp/.guestfs-1000
├── appliance.d
│   ├── initrd
│   ├── kernel
│   └── root
├── lock
├── qemu-5258400-1565614036.devices
├── qemu-5258400-1565614036.help
├── qemu-5258400-1565614036.qmp-schema
├── qemu-5258400-1565614036.query-kvm
└── qemu-5258400-1565614036.stat

Could this be caused by multiple instances of guestmount running at the same time?

> Could this be caused by multiple instances of guestmount running at the same time?
No, if the appliance needs to be rebuilt then it takes a lock so
parallel runs are fine.
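(For illustration only: the pattern is roughly equivalent to holding an advisory lock on the per-user lock file, e.g. with flock(1). This is a sketch of the idea, not supermin's actual code.)

(
    flock 9    # blocks until any concurrent rebuild has released the lock
    # ... rebuild /var/tmp/.guestfs-1000/appliance.d here if it is out of date ...
) 9>/var/tmp/.guestfs-1000/lock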
Anyway it seems as if /var/tmp/.guestfs-1000/appliance.d/root now exists. Does
it work now?
I can't test it since I don't have access to the machine in question. I'll ask the owners to test.

On one affected machine /var/tmp/.guestfs-1000 is empty, and there it seems to not be working. What is responsible for creating the structure?

The directory is a cache, populated on demand when any libguestfs-using tool runs. You can delete it at any time if you want to free up space, at the cost of slower start-up the next time you use a libguestfs tool.

This bug will be addressed in the next major release.

Cannot reproduce this issue with the following steps:

Version of packages:
libguestfs-1.40.2-5.el7_7.1.x86_64
qemu-kvm-1.5.3-167.el7.x86_64
kernel-3.10.0-1062.el7.x86_64

1. Install the relevant libguestfs packages in a rhel7.7 vm
2. # wget http://download-node-02.eng.bos.redhat.com/rhel-7/nightly/RHEL-7/RHEL-7.8-20191031.n.1/work/x86_64/buildinstall/Server/images/efiboot.img
# mkdir /tmp/guestmount-test
# LIBGUESTFS_BACKEND=direct guestmount -a /home/efiboot.img -m /dev/sda /tmp/guestmount-test
# ll /tmp/guestmount-test
total 2
drwxr-xr-x. 3 root root 2048 Oct 31 19:47 EFI

Moving this back against RHEL 7.7.

We have seen the same problem again. The directory it complained about was /var/tmp/.guestfs-101448/appliance.d/root and right after the failure it did not exist. How can I debug this to provide more information?

Definitely enable debugging all the time, and capture the messages on every run. When it happens we'd want to see the full messages. (It's probably not going to help to enable debugging right now and run it again once. We want to see debug from the time it starts happening.)

(In reply to Lubomír Sedlář from comment #11)
> We have seen the same problem again. The directory it complained about was
> /var/tmp/.guestfs-101448/appliance.d/root and right after the failure it did
> not exist.

Under a properly cached appliance, you will see three files, something like (with different sizes of course):

$ du -hcs /var/tmp/.guestfs-101448/appliance.d/
1.6M    tmp/.guestfs-1000/appliance.d/initrd
9.9M    tmp/.guestfs-1000/appliance.d/kernel
383M    tmp/.guestfs-1000/appliance.d/root
394M    total

Since 'root' is a big file (at least 300-350M) and the directory is under /var/tmp, my theory is that systemd-tmpfiles is cleaning it to save space. In particular, /usr/lib/tmpfiles.d/tmp.conf sets 30d (at least on Fedora 31) as the age for /var/tmp, so if libguestfs was not run for more than 30 days then I think those files might be cleaned.

When this happens, can you please check the timestamps of appliance.d and all the files inside it?
- `stat /var/tmp/.guestfs-101448/appliance.d/`
- `stat /var/tmp/.guestfs-101448/appliance.d/*`

---

The issue I see is that supermin (which creates the appliance in appliance.d) does not check for the existence of all the files it creates, but only whether the directory exists. Most probably it should be smarter and check which files exist -- sadly ATM it is not easy to do so, as the files created change depending on the appliance type (tgz vs chroot) and depending on the files themselves.

For the latest failure I have this. It was run with LIBGUESTFS_DEBUG=1 and LIBGUESTFS_TRACE=1. Is there more debugging information I can turn on?
2020-01-28 14:10:12 [ERROR ] Command ['guestmount', '-a', '/mnt/redhat/rhel-8/devel/candidate-trees/RHEL-8/RHEL-8.1.1-updates-20200128.3/work/global/tmp/tweak_buildinstall_EkC1x6/images/efiboot.img', '-m', '/dev/sda', '/tmp/iso-mount-G5ZZbg'] exited with 1 and output:
libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
libguestfs: trace: set_backend "direct"
libguestfs: trace: set_backend = 0
libguestfs: create: flags = 0, handle = 0x563460863b50, program = guestmount
libguestfs: trace: set_recovery_proc false
libguestfs: trace: set_recovery_proc = 0
libguestfs: trace: add_drive "/mnt/redhat/rhel-8/devel/candidate-trees/RHEL-8/RHEL-8.1.1-updates-20200128.3/work/global/tmp/tweak_buildinstall_EkC1x6/images/efiboot.img"
libguestfs: trace: add_drive = 0
libguestfs: trace: launch
libguestfs: trace: max_disks
libguestfs: trace: max_disks = 255
libguestfs: trace: get_tmpdir
libguestfs: trace: get_tmpdir = "/tmp"
libguestfs: trace: version
libguestfs: trace: version = <struct guestfs_version = major: 1, minor: 40, release: 2, extra: rhel=7,release=5.el7_7.2,libvirt, >
libguestfs: trace: get_backend
libguestfs: trace: get_backend = "direct"
libguestfs: launch: program=guestmount
libguestfs: launch: version=1.40.2rhel=7,release=5.el7_7.2,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=direct
libguestfs: launch: tmpdir=/tmp/libguestfskaSVES
libguestfs: launch: umask=0002
libguestfs: launch: euid=101448
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: begin building supermin appliance
libguestfs: run supermin
libguestfs: command: run: /usr/bin/supermin5
libguestfs: command: run: \ --build
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ --if-newer
libguestfs: command: run: \ --lock /var/tmp/.guestfs-101448/lock
libguestfs: command: run: \ --copy-kernel
libguestfs: command: run: \ -f ext2
libguestfs: command: run: \ --host-cpu x86_64
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ -o /var/tmp/.guestfs-101448/appliance.d
supermin: version: 5.1.19
supermin: rpm: detected RPM version 4.13
supermin: package handler: fedora/rpm
supermin: acquiring lock on /var/tmp/.guestfs-101448/lock
supermin: if-newer: output does not need rebuilding
libguestfs: finished building supermin appliance
libguestfs: begin testing qemu features
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: checking for previously cached test results of /usr/libexec/qemu-kvm, in /var/tmp/.guestfs-101448
libguestfs: loading previously cached test results
libguestfs: QMP parse error: '[' or '{' expected near end of file (ignored)
libguestfs: qemu version: 1.5
libguestfs: qemu mandatory locking: no
libguestfs: qemu KVM: disabled
libguestfs: trace: get_backend_setting "force_tcg"
libguestfs: trace: get_backend_setting = NULL (error)
libguestfs: trace: get_sockdir
libguestfs: trace: get_sockdir = "/tmp"
libguestfs: finished testing qemu features
libguestfs: trace: get_backend_setting "gdb"
libguestfs: trace: get_backend_setting = NULL (error)
libguestfs: command: run: dmesg | grep -Eoh 'lpj=[[:digit:]]+'
libguestfs: read_lpj_from_dmesg: calculated lpj=2499998
/usr/libexec/qemu-kvm \
-global virtio-blk-pci.scsi=off \
-no-user-config \
-enable-fips \
-nodefaults \
-display none \
-machine accel=kvm:tcg \
-m 768 \
-no-reboot \
-rtc driftfix=slew \
-no-hpet \
-global kvm-pit.lost_tick_policy=discard \
-kernel /var/tmp/.guestfs-101448/appliance.d/kernel \
-initrd /var/tmp/.guestfs-101448/appliance.d/initrd \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0 \
-device virtio-scsi-pci,id=scsi \
-drive file=/mnt/redhat/rhel-8/devel/candidate-trees/RHEL-8/RHEL-8.1.1-updates-20200128.3/work/global/tmp/tweak_buildinstall_EkC1x6/images/efiboot.img,cache=writeback,id=hd0,if=none \
-device scsi-hd,drive=hd0 \
-drive file=/var/tmp/.guestfs-101448/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none,format=raw \
-device scsi-hd,drive=appliance \
-device virtio-serial-pci \
-serial stdio \
-device sga \
-chardev socket,path=/tmp/libguestfsiIhnQy/guestfsd.sock,id=channel0 \
-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
-append "panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check lpj=2499998 printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=linux"
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
qemu-kvm: -drive file=/var/tmp/.guestfs-101448/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none,format=raw: could not open disk image /var/tmp/.guestfs-101448/appliance.d/root: Could not open '/var/tmp/.guestfs-101448/appliance.d/root': No such file or directory
libguestfs: error: appliance closed the connection unexpectedly, see earlier error messages
libguestfs: child_cleanup: 0x563460863b50: child process died
libguestfs: sending SIGTERM to process 46402
libguestfs: error: guestfs_launch failed, see earlier error messages
libguestfs: trace: launch = -1 (error)
libguestfs: trace: close
libguestfs: closing guestfs handle 0x563460863b50 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfskaSVES
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsiIhnQy
There is no more debug.

> supermin: if-newer: output does not need rebuilding

This message is consistent with what Pino says.

https://github.com/libguestfs/supermin/blob/62d5c774d6c8fcac11e28fcba99754b5478e5088/src/supermin.ml#L232

Note on line 239 it only checks the date of the output directory. In this case the appliance.d/root file is 4.0G.
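For illustration, a rough sketch (assumed, not supermin's actual logic) of the extra check being suggested: verify that the expected output files are present, rather than only comparing the output directory's timestamp. The file names below are the ones produced for the ext2 (-f ext2) appliance seen in this log; other appliance formats produce different files, which is why this is not trivial to do in supermin itself.

d=/var/tmp/.guestfs-101448/appliance.d
for f in kernel initrd root; do
    # if any expected output is missing, the cached appliance cannot be trusted
    [ -f "$d/$f" ] || echo "missing $d/$f - appliance must be rebuilt"
done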
Looking around the machine, I see there is /var/tmp/.guestfs-$UID/ for multiple users where the only content is an empty appliance.d directory.
I can replicate the error now (a sketch of the failing case follows the list):
/var/tmp/.guestfs-$UID is missing => guestmount works
/var/tmp/.guestfs-$UID contains all files => guestmount works
/var/tmp/.guestfs-$UID/appliance.d is empty directory and nothing else exists in the directory => error
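A minimal sketch of that failing case, for anyone who wants to reproduce it (the image and mount point below are placeholders; any guestmount invocation like the one in the description works):

uid=$(id -u)
rm -rf /var/tmp/.guestfs-$uid
mkdir -p /var/tmp/.guestfs-$uid/appliance.d      # empty appliance.d, nothing else
LIBGUESTFS_BACKEND=direct guestmount -a efiboot.img -m /dev/sda /tmp/mnt
# supermin prints "if-newer: output does not need rebuilding", then qemu
# fails because appliance.d/root does not exist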
/usr/lib/tmpfiles.d/tmp.conf defines that files should be deleted after 30 days.
v /var/tmp 1777 root root 30d
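(One possible mitigation, not discussed further in this bug and shown only as an example of how these tmpfiles.d rules compose: an `x` entry in a local drop-in excludes matching paths from age-based cleanup. The drop-in file name below is arbitrary.)

cat > /etc/tmpfiles.d/guestfs-cache.conf <<'EOF'
# exclude the per-user libguestfs appliance caches from /var/tmp cleanup
x /var/tmp/.guestfs-*
EOF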
Is there something I can change about the guestmount command to avoid this? The only workaround I can probably do is to unconditionally remove /var/tmp/.guestfs-$UID before running it.
I think this is an actual bug in supermin, as described by Pino in comment 13. I wonder why we've not seen it before though - maybe the defaults in tmpfiles changed recently?

While removing /var/tmp/.guestfs-* will work, it means it'll rebuild the cache each time, so YMMV.

> While removing /var/tmp/.guestfs-* will work, it means it'll rebuild the cache each time, so YMMV.
Yeah, I will probably check if appliance.d is empty, and only remove the directory in that case. That still leaves some room for a possible race condition, but it seems unlikely enough not to warrant the performance penalty.
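A minimal sketch of that pre-check, assuming the default cache location and running as the compose user (the wrapper itself and the paths in the guestmount call are hypothetical, not part of any existing tooling):

d=/var/tmp/.guestfs-$(id -u)
# remove the cache only when appliance.d exists but has lost its contents,
# so an intact cache is still reused
if [ -d "$d/appliance.d" ] && [ -z "$(ls -A "$d/appliance.d")" ]; then
    rm -rf "$d"
fi
LIBGUESTFS_BACKEND=direct guestmount -a efiboot.img -m /dev/sda /tmp/iso-mount   # placeholder paths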
Anyway, thank you all for debugging this!
Closing per comment 21. |
Description of problem:
RHEL 7.8 nightly composes are sometimes failing when they cannot mount an image locally with guestmount. I'm not sure if this is a bug in guestmount, an incorrectly set up machine, or something else entirely. Please advise. The entire process is running inside a VM.

Version-Release number of selected component (if applicable):
libguestfs-tools-c-1.40.2-5.el7_7.1.x86_64

How reproducible:

Steps to Reproduce:
1. mktemp -d /tmp/iso-mount-XXXXXX
2. LIBGUESTFS_BACKEND=direct LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 guestmount -a /mnt/redhat/rhel-7/nightly/RHEL-7/RHEL-7.8-20191031.n.1/work/global/tmp/tweak_buildinstall_HpWAdm/images/efiboot.img -m /dev/sda /tmp/iso-mount-9Iw40N

Actual results:
libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
libguestfs: trace: set_backend "direct"
libguestfs: trace: set_backend = 0
libguestfs: create: flags = 0, handle = 0x5624eddd1b50, program = guestmount
libguestfs: trace: set_recovery_proc false
libguestfs: trace: set_recovery_proc = 0
libguestfs: trace: add_drive "/mnt/redhat/rhel-7/nightly/RHEL-7/RHEL-7.8-20191031.n.1/work/global/tmp/tweak_buildinstall_HpWAdm/images/efiboot.img"
libguestfs: trace: add_drive = 0
libguestfs: trace: launch
libguestfs: trace: max_disks
libguestfs: trace: max_disks = 255
libguestfs: trace: get_tmpdir
libguestfs: trace: get_tmpdir = "/tmp"
libguestfs: trace: version
libguestfs: trace: version = <struct guestfs_version = major: 1, minor: 40, release: 2, extra: rhel=7,release=5.el7_7.1,libvirt, >
libguestfs: trace: get_backend
libguestfs: trace: get_backend = "direct"
libguestfs: launch: program=guestmount
libguestfs: launch: version=1.40.2rhel=7,release=5.el7_7.1,libvirt
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=direct
libguestfs: launch: tmpdir=/tmp/libguestfstmaWpa
libguestfs: launch: umask=0002
libguestfs: launch: euid=1000
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: begin building supermin appliance
libguestfs: run supermin
libguestfs: command: run: /usr/bin/supermin5
libguestfs: command: run: \ --build
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ --if-newer
libguestfs: command: run: \ --lock /var/tmp/.guestfs-1000/lock
libguestfs: command: run: \ --copy-kernel
libguestfs: command: run: \ -f ext2
libguestfs: command: run: \ --host-cpu x86_64
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ -o /var/tmp/.guestfs-1000/appliance.d
supermin: version: 5.1.19
supermin: rpm: detected RPM version 4.11
supermin: package handler: fedora/rpm
supermin: acquiring lock on /var/tmp/.guestfs-1000/lock
supermin: if-newer: output does not need rebuilding
libguestfs: finished building supermin appliance
libguestfs: begin testing qemu features
libguestfs: trace: get_cachedir
libguestfs: trace: get_cachedir = "/var/tmp"
libguestfs: checking for previously cached test results of /usr/libexec/qemu-kvm, in /var/tmp/.guestfs-1000
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -display none
libguestfs: command: run: \ -help
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -display none
libguestfs: command: run: \ -machine accel=kvm:tcg
libguestfs: command: run: \ -device ?
libguestfs: command: run: echo '{ "execute": "qmp_capabilities" }' '{ "execute": "query-qmp-schema" }' '{ "execute": "quit" }' | "/usr/libexec/qemu-kvm" -display none -machine "accel=kvm:tcg" -qmp stdio
libguestfs: did not understand QMP monitor output from /usr/libexec/qemu-kvm (ignored)
libguestfs: command: run: echo '{ "execute": "qmp_capabilities" }' '{ "execute": "query-kvm" }' '{ "execute": "quit" }' | "/usr/libexec/qemu-kvm" -display none -machine "accel=kvm:tcg" -qmp stdio
libguestfs: saving test results
libguestfs: QMP parse error: '[' or '{' expected near end of file (ignored)
libguestfs: qemu version: 1.5
libguestfs: qemu mandatory locking: no
libguestfs: qemu KVM: disabled
libguestfs: trace: get_backend_setting "force_tcg"
libguestfs: trace: get_backend_setting = NULL (error)
libguestfs: trace: get_sockdir
libguestfs: trace: get_sockdir = "/tmp"
libguestfs: finished testing qemu features
libguestfs: trace: get_backend_setting "gdb"
libguestfs: trace: get_backend_setting = NULL (error)
libguestfs: command: run: dmesg | grep -Eoh 'lpj=[[:digit:]]+'
libguestfs: read_lpj_from_dmesg: calculated lpj=2499998
/usr/libexec/qemu-kvm \
-global virtio-blk-pci.scsi=off \
-no-user-config \
-enable-fips \
-nodefaults \
-display none \
-machine accel=kvm:tcg \
-m 768 \
-no-reboot \
-rtc driftfix=slew \
-no-hpet \
-global kvm-pit.lost_tick_policy=discard \
-kernel /var/tmp/.guestfs-1000/appliance.d/kernel \
-initrd /var/tmp/.guestfs-1000/appliance.d/initrd \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-rng-pci,rng=rng0 \
-device virtio-scsi-pci,id=scsi \
-drive file=/mnt/redhat/rhel-7/nightly/RHEL-7/RHEL-7.8-20191031.n.1/work/global/tmp/tweak_buildinstall_HpWAdm/images/efiboot.img,cache=writeback,id=hd0,if=none \
-device scsi-hd,drive=hd0 \
-drive file=/var/tmp/.guestfs-1000/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none,format=raw \
-device scsi-hd,drive=appliance \
-device virtio-serial-pci \
-serial stdio \
-device sga \
-chardev socket,path=/tmp/libguestfsPLBKqz/guestfsd.sock,id=channel0 \
-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
-append "panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check lpj=2499998 printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=linux"
Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory
Back to tcg accelerator.
qemu-kvm: -drive file=/var/tmp/.guestfs-1000/appliance.d/root,snapshot=on,id=appliance,cache=unsafe,if=none,format=raw: could not open disk image /var/tmp/.guestfs-1000/appliance.d/root: Could not open '/var/tmp/.guestfs-1000/appliance.d/root': No such file or directory
libguestfs: error: appliance closed the connection unexpectedly, see earlier error messages
libguestfs: child_cleanup: 0x5624eddd1b50: child process died
libguestfs: sending SIGTERM to process 53716
libguestfs: error: guestfs_launch failed, see earlier error messages
libguestfs: trace: launch = -1 (error)
libguestfs: trace: close
libguestfs: closing guestfs handle 0x5624eddd1b50 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfstmaWpa
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsPLBKqz