Bug 1813809

Summary: Supermin should check if the files exist, not just the directory.
Product: [Community] Virtualization Tools
Reporter: Pino Toscano <ptoscano>
Component: supermin
Assignee: Pino Toscano <ptoscano>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: unspecified
CC: kanda.motohiro, ptoscano, rjones
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1796120
Environment:
Last Closed: 2020-04-07 08:34:05 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1771976, 1796120

Description Pino Toscano 2020-03-16 07:35:43 UTC
+++ This bug was initially created as a clone of Bug #1796120 +++

+++ This bug was initially created as a clone of Bug #1771976 +++

--- Additional comment from Pino Toscano on 2020-01-28 15:08:41 UTC ---

(In reply to Lubomír Sedlář from comment #11)
> We have seen the same problem again. The directory it complained about was 
> /var/tmp/.guestfs-101448/appliance.d/root and right after the failure it did
> not exist.

Under a properly cached appliance, you will see three files, something like (with different sizes of course):
$ du -hcs tmp/.guestfs-1000/appliance.d/*
1.6M    tmp/.guestfs-1000/appliance.d/initrd
9.9M    tmp/.guestfs-1000/appliance.d/kernel
383M    tmp/.guestfs-1000/appliance.d/root
394M    total

Since 'root' is a big file (at least 300-350M) and the directory is under /var/tmp, my theory is that systemd-tmpfiles is cleaning it up to save space.
In particular, /usr/lib/tmpfiles.d/tmp.conf sets an age of 30d for /var/tmp (at least on Fedora 31), so if libguestfs has not been run for more than 30 days, I think the files might be cleaned up.

When this happens, can you please check the timestamp of the appliance.d and all the files inside it?
- `stat /var/tmp/.guestfs-101448/appliance.d/`
- `stat /var/tmp/.guestfs-101448/appliance.d/*`

---

The issue I see is that supermin (which creates the appliance in appliance.d) does not check for the existence of all the files it creates, but only whether the directory exists. Most probably it should be smarter and check which files exist -- sadly, ATM it is not easy to do so, as the files created change depending on the appliance type (tgz vs chroot) and depending on the files themselves.
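
To illustrate the kind of check meant here, a minimal shell sketch follows. It is not supermin's code and not the eventual fix. The function name appliance_is_complete is hypothetical, and the file names (kernel, initrd, root) are taken only from the du listing above, so this sketch does not cover appliance types that produce different files.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: succeed only if every
# expected appliance file is present in the given directory.
# Assumption: the expected names are kernel, initrd and root, as in the
# du listing above; other appliance types may create other files.
appliance_is_complete() {
    dir=$1
    for f in kernel initrd root; do
        # -s is true only if the file exists and is larger than zero bytes
        [ -s "$dir/$f" ] || return 1
    done
    return 0
}
```

For example, `appliance_is_complete "/var/tmp/.guestfs-$(id -u)/appliance.d"` returns non-zero both when the directory is missing and when it exists but some of the files have been removed, which is the case a directory-only check does not catch.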

--- Additional comment from Richard W.M. Jones on 2020-01-28 15:34:48 UTC ---

> supermin: if-newer: output does not need rebuilding

This message is consistent with what Pino says.

https://github.com/libguestfs/supermin/blob/62d5c774d6c8fcac11e28fcba99754b5478e5088/src/supermin.ml#L232

Note on line 239 it only checks the date of the output directory.

--- Additional comment from Lubomír Sedlář on 2020-01-29 13:35:07 UTC ---

In this case the appliance.d/root file is 4.0G.

Looking around the machine, I see there is /var/tmp/.guestfs-$UID/ for multiple users where the only content is an empty appliance.d directory.

I can replicate the error now:

/var/tmp/.guestfs-$UID is missing => guestmount works
/var/tmp/.guestfs-$UID contains all files => guestmount works
/var/tmp/.guestfs-$UID/appliance.d is an empty directory and nothing else exists in /var/tmp/.guestfs-$UID => error

/usr/lib/tmpfiles.d/tmp.conf defines that files should be deleted after 30 days. 
    v /var/tmp 1777 root root 30d

Is there something I can change about the guestmount command to avoid this? The only workaround I can think of is to unconditionally remove /var/tmp/.guestfs-$UID before running it.
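
A narrower variant of that workaround could remove the cache only when it is in the failing state described above (appliance.d present but empty). The sketch below is an illustration rather than a recommended or tested fix. The function name remove_empty_appliance_cache is hypothetical, and it assumes no other libguestfs process is building the appliance for the same user at that moment.

```shell
#!/bin/sh
# Hypothetical workaround, for illustration only: delete the per-user
# cache directory when its appliance.d subdirectory exists but is empty.
# Assumption: nothing else is writing to the cache while this runs.
remove_empty_appliance_cache() {
    cache=$1
    # Refuse to act on an empty argument
    [ -n "$cache" ] || return 1
    appliance="$cache/appliance.d"
    # Nothing to do if the appliance directory is not there at all
    [ -d "$appliance" ] || return 0
    # ls -A prints nothing for an empty directory
    if [ -z "$(ls -A "$appliance")" ]; then
        rm -rf "$cache"
    fi
}
```

It could be called as `remove_empty_appliance_cache "/var/tmp/.guestfs-$(id -u)"` before guestmount. A cache that still contains files is left alone, so the appliance is not rebuilt on every run.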

--- Additional comment from Richard W.M. Jones on 2020-01-29 16:17:46 UTC ---

I think this is an actual bug in supermin, as described by Pino above.
I wonder why we've not seen it before though - maybe the defaults in
tmpfiles changed recently?

--- Additional comment from Richard W.M. Jones on 2020-01-29 17:21:20 CET ---

Simple reproducer for this:

(1) Run libguestfs-test-tool.

This should run successfully.

(2) rm /var/tmp/.guestfs-`id -u`/appliance.d/*

(3) Run libguestfs-test-tool again.

This time it should fail with an error similar to:

qemu-img: /tmp/libguestfsY8hHID/overlay2.qcow2: Could not open '/var/tmp/.guestfs-1000/appliance.d/root': No such file or directory

Comment 1 Pino Toscano 2020-03-16 07:38:09 UTC
*** Bug 1813807 has been marked as a duplicate of this bug. ***

Comment 2 Pino Toscano 2020-04-03 10:21:36 UTC
Attempted fix posted:
https://www.redhat.com/archives/libguestfs/2020-April/msg00013.html