Bug 2104408 - /var/cache/dnf fills up and will cause upgrades to fail
Summary: /var/cache/dnf fills up and will cause upgrades to fail
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: ovirt-host-deploy-ansible
Version: 4.5.1
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.2
Target Release: 4.5.2
Assignee: Dana
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-07-06 07:52 UTC by Nathaniel Roach
Modified: 2022-08-30 08:49 UTC
CC List: 14 users

Fixed In Version: ovirt-engine-4.5.2
Doc Type: Release Note
Doc Text:
The yum cache is now always cleared before upgrading a host, to minimize issues with filling up the available space in /var.
Clone Of: 2055829
Environment:
Last Closed: 2022-08-08 08:17:16 UTC
oVirt Team: Infra
Embargoed:
mperina: ovirt-4.5+


Attachments (Terms of Use)
disk utilisation during upgrade (16.03 KB, image/png)
2022-07-06 07:52 UTC, Nathaniel Roach


Links
Github oVirt ovirt-engine pull 517 (open): Clean cache before upgrading packages, last updated 2022-07-06 12:13:01 UTC
Red Hat Bugzilla 2055829 (CLOSED): [RFE] /var/tmp should be on its own partition, last updated 2022-07-06 07:51:59 UTC
Red Hat Issue Tracker RHV-46999, last updated 2022-07-06 08:16:10 UTC

Description Nathaniel Roach 2022-07-06 07:52:00 UTC
Created attachment 1894856 [details]
disk utilisation during upgrade

This is probably made worse by changes as a result of #2055829

After 2-3 system updates of an oVirt Node (ONN) 4.5 host, upgrades will fail due to /var/cache/dnf filling up with copies of the upgrade packages. The events view will report that the last successful task is "Stop services" (I will log an RFE to improve the messaging there).

This did seem to occur on 4.4, but to a lesser extent.

The log on the node in question shows that the failure happens on the following ansible task:

python3[224883]: ansible-ansible.legacy.dnf Invoked with name=['ovirt-node-ng-image-update.noarch'] state=latest lock_timeout=300 conf_file=/tmp/yum.conf allow_downgrade=False autoremove=False bugfix=False cacheonly=False disable_gpg_check=False disable_plugin=[] disablerepo=[] download_only=False enable_plugin=[] enablerepo=[] exclude=[] installroot=/ install_repoquery=True install_weak_deps=True security=False skip_broken=False update_cache=False update_only=False validate_certs=True allowerasing=False nobest=False disable_excludes=None download_dir=None list=None releasever=None

After seeing this I was able to confirm that /var had filled to approximately 70-80% of its 5 GB, and that during the upgrade the filesystem would fill up completely (see the attached disk utilisation graph).

Would it be possible to have the available space checked and the cache cleared automatically? On my nodes, I had to manually run `rm -rfv /var/cache/dnf/*` to allow the upgrade to progress.

Comment 1 RHEL Program Management 2022-07-06 08:51:56 UTC
The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Sandro Bonazzola 2022-07-06 08:57:44 UTC
Moving to ovirt-engine; the ansible code taking care of `dnf update` should run `dnf clean packages` before performing the update.
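
Something along these lines could go before the update task (just a sketch, not the actual ovirt-engine ansible code; task names and placement are assumptions):

  # Sketch only: drop cached packages so /var/cache/dnf does not keep
  # accumulating copies of upgrade packages across upgrades.
  - name: Clean dnf package cache before upgrade
    ansible.builtin.command: dnf clean packages

  - name: Upgrade ovirt-node-ng image
    ansible.builtin.dnf:
      name: ovirt-node-ng-image-update.noarch
      state: latest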

As a side consideration, just enlarging /var won't solve the problem; it will only make this happen a bit later.

Comment 3 Yedidyah Bar David 2022-07-06 09:51:27 UTC
(In reply to Sandro Bonazzola from comment #2)
> Moving to ovirt-engine; the ansible code taking care of `dnf update` should
> run `dnf clean packages` before performing the update.

+1.

Perhaps do this conditionally. There might be cases where users would want to prevent it, e.g. when testing a particular upgrade or something similar, you might want to be able to repeatedly retry upgrading the same package without having to wait for the downloads again. Not that important.

Also, perhaps instead of this, or in addition to it, check for free space in /var (or /var/cache, although I do not think we have plans to split it into its own FS) and fail with a suitable message if there is not enough space (I am not sure what the minimum is; we need to check how much is actually needed right now and add some safety margin).
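
A rough sketch of such a check, assuming it runs as an ansible pre-task with gathered facts (the 1 GiB threshold below is an arbitrary placeholder, not a measured minimum):

  # Sketch only: fail early with a clear message when /var is low on space.
  # Relies on gathered mount facts; if /var is not a separate mount point,
  # the loop is empty and the check is effectively skipped.
  - name: Check free space on /var before upgrading
    ansible.builtin.assert:
      that:
        - item.size_available > 1073741824   # placeholder: 1 GiB
      fail_msg: "Not enough free space on /var to download and apply the upgrade"
    loop: "{{ ansible_facts.mounts | selectattr('mount', 'equalto', '/var') | list }}"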

> 
> As a side consideration, just enlarging /var won't solve the problem; it will
> only make this happen a bit later.

Not exactly. If you consider the current bug to be a regression caused by bug 2055829, then a trivial fix for that specific regression is to make /var itself 15 GB. Obviously this would still only "make this happen a bit later", but that is always the case, right?

The point is that in bug 2055829 we decided not to enlarge the total required space, because we thought we did not need to, and we did not want the "minimal requirements" to become bigger just because of splitting out /var/tmp. I still think that made sense.

Going forward, as part of the current bug or a future RFE, it might make sense to allow the installation process to allocate space for the LVs that we create with sizes that are not hard-coded but depend on the total disk/VG size, e.g. something like the current sizes (as set in IMGBASED_DEFAULT_VOLUMES) plus some percentage (different per LV) of the remaining free space.

Comment 5 Sandro Bonazzola 2022-08-30 08:49:07 UTC
This bug is included in the oVirt 4.5.2 release, published on August 10th 2022.
Since the problem described in this bug report should be resolved in the oVirt 4.5.2 release, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.

