Bug 2121665

Summary: Failed to install qemu-guest-agent package on the first boot after conversion on OpenSuse-Tumbleweed
Product: Red Hat Enterprise Linux 9
Reporter: Vera <vwu>
Component: virt-v2v
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED MIGRATED
QA Contact: Virtualization Bugs <virt-bugs>
Severity: low
Docs Contact:
Priority: low
Version: 9.1
CC: chhu, hongzliu, juzhou, lersek, mxie, rjones, tyan, tzheng, xiaodwan
Target Milestone: rc
Keywords: MigratedToJIRA, Triaged
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-07-07 20:39:36 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vera 2022-08-26 07:12:24 UTC
Description of problem:

As described in bz2028764, qemu-guest-agent failed to install on OpenSUSE Tumbleweed on the first boot after conversion.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Prepare an OpenSUSE Tumbleweed guest on ESX
2. Convert it to RHV with virt-v2v
3. Start the VM and check the qemu-guest-agent rpm/service

Actual results:
No qemu-guest-agent pkg installed


Expected results:
qemu-guest-agent pkg is installed successfully


Additional info:

The problem is that PackageKit blocks zypper. PackageKit starts auto-updating as soon as the network is connected, so it is very likely to conflict with the zypper operation. The problem is therefore not specific to Tumbleweed.

/usr/lib/virt-sysprep/firstboot.sh start
Scripts dir: /usr/lib/virt-sysprep/scripts
=== Running /usr/lib/virt-sysprep/scripts/5000-0001-wait-online ===
=== Running /usr/lib/virt-sysprep/scripts/5000-0002-setenforce-0 ===
=== Running /usr/lib/virt-sysprep/scripts/5000-0003-install-qga ===
PackageKit is blocking zypper. This happens if you have an updater applet or other software
management application using PackageKit running.
We can ask PackageKit to interrupt the current action as soon as possible, but it depends on
PackageKit how fast it will respond to this request.
Ask PackageKit to quit? [yes/no] (no): no
System management is locked by the application with pid 1509 (/usr/libexec/packagekitd).
Close this application before trying again.
=== Running /usr/lib/virt-sysprep/scripts/5000-0004-setenforce-restore ===

Comment 1 Laszlo Ersek 2022-09-21 09:53:17 UTC
Internet lore says we need to disable packagekit temporarily, like this, probably before wait-online succeeds:

systemctl stop packagekit
systemctl mask packagekit

then undo it at the end with

systemctl unmask packagekit
systemctl start packagekit

(dependent on its original status I guess)

Comment 2 Richard W.M. Jones 2022-09-21 12:54:50 UTC
Is masking the service actually needed?  Seems extreme ...

There's a danger with masking the service that users might not realise how
to turn it back on (since stopping < disabling < masking).

Comment 3 Laszlo Ersek 2022-09-21 15:49:18 UTC
I think masking may be suggested because packagekit could have triggers other than just "systemctl start". I think someone from QE suggested that it was network activation that triggered packagekit specifically. My own experience (with different packages/services, such as NFS) is that socket (?) activation can only be prevented with masking, not with "systemctl disable".

Either way, my idea here would be similar to what we do for SELinux at first boot -- first save the current state, then disable the service, then restore the original state. I hope we can get systemctl to tell us whether packagekit is currently masked, in machine-readable format.

Another complication is: what happens if packagekit is already running (or running a transaction) by the time we try to stop it? We might not be able to stop it (and if we do stop it, does that corrupt the transaction?). That's why I thought we should disable / mask packagekit before waiting for network connectivity.

I doubt we can ever make this robust :/

Comment 4 Laszlo Ersek 2022-09-23 09:31:50 UTC
(1) check whether packagekit is running:

# systemctl is-active packagekit

Exits with status 0 and prints "active" if packagekit is running. Exits with nonzero status and prints "inactive" otherwise.

Save the output in a file.

(2) check whether packagekit is currently masked or not (we don't care about any other enablement status, just masked or not masked).

# systemctl is-enabled packagekit

The systemctl manual lists a bunch of possible enablement states; we only care about "masked" and "masked-runtime".

Save the output in a file.

(3a) If packagekit was masked in (2), then stop the service.

# systemctl stop packagekit

(3b) If packagekit was not masked in (2), then mask and stop the service.

# systemctl mask --now packagekit

(4) Install the guest agent.

(5) If packagekit was not masked in (2), then unmask the service.

# systemctl unmask packagekit

(6) If packagekit was running in (1), then start it.

# systemctl start packagekit
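
Steps (1) through (6) above could be sketched as a small shell helper along the following lines. This is only an illustration under stated assumptions, not the actual virt-v2v firstboot code: the function names (is_masked_state, with_packagekit_quiesced) and the SYSTEMCTL override variable are hypothetical, and the sketch keeps the saved state in shell variables rather than in files.

```shell
#!/bin/sh
# Sketch only: the function names and the SYSTEMCTL override below are
# hypothetical; they are not part of virt-v2v or its firstboot scripts.

# Allows substituting a stub for systemctl (for example when testing).
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Succeeds if the given "systemctl is-enabled" output means "masked".
is_masked_state() {
    case "$1" in
        masked|masked-runtime) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: with_packagekit_quiesced COMMAND [ARGS...]
# Runs COMMAND while packagekit is stopped and masked, then restores the
# state recorded beforehand. Returns the exit status of COMMAND.
with_packagekit_quiesced() {
    # (1) Record whether the service is running. A nonzero exit status
    #     is expected when it is not, so it is deliberately ignored.
    was_active=$("$SYSTEMCTL" is-active packagekit 2>/dev/null) || :
    # (2) Record the enablement state; only masked vs. not masked matters.
    was_enabled=$("$SYSTEMCTL" is-enabled packagekit 2>/dev/null) || :

    # (3a)/(3b) Stop the service, masking it too if it was not masked.
    if is_masked_state "$was_enabled"; then
        "$SYSTEMCTL" stop packagekit
    else
        "$SYSTEMCTL" mask --now packagekit
    fi

    # (4) Run the installation command supplied by the caller.
    status=0
    "$@" || status=$?

    # (5) Unmask only if this function did the masking.
    if ! is_masked_state "$was_enabled"; then
        "$SYSTEMCTL" unmask packagekit
    fi
    # (6) Start the service again only if it was running before.
    if [ "$was_active" = active ]; then
        "$SYSTEMCTL" start packagekit
    fi
    return "$status"
}
```

A caller would wrap the existing installation command, for example "with_packagekit_quiesced zypper -n install qemu-guest-agent". The sketch does nothing about a packagekit transaction that is already in progress when the service is stopped.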


Now, whether stopping the packagekit service will actually stop a running transaction (or wait for the running transaction to complete) is anyone's guess.

I've also checked the zypper source code (@ 281f866999fc, "changes 1.14.56", 2022-09-02). When it asks the user about waiting for the concurrent packagekit transaction to complete, the default answer is "no", and in non-interactive mode, zypper *only* takes the default answer. There is no way to say "don't ask the user, just go ahead and answer 'yes'". The manual says as much, but I was incredulous, and checked the source code. The manual is factual.

Interestingly, the global "-n" (--non-interactive) option is the one we already use. There is also a command-specific (such as "install"-specific) option "-y" (--no-confirm). However, the manual does not recommend using "-y", and much more importantly, the source code handles "-y" by simulating "-n": "-y" does not change the default answer, it just enters non-interactive mode.