Description of problem:

Update/upgrade tasks time out, abort, and fail when an RHV-H host has been deployed using the "VPP - Protection Profile for Virtualization v. 1.0 for Red Hat Enterprise Linux Hypervisor (RHELH)" security policy during the Anaconda installation. The same failure can be triggered by simply adjusting the "ClientAlive..." options in /etc/ssh/sshd_config like this:

[root@rhhi1 ~]# grep -B1 '^ClientAlive' /etc/ssh/sshd_config
# Per CCE-CCE-27082-7: Set ClientAliveCountMax 0 in /etc/ssh/sshd_config
ClientAliveCountMax 0
--
# Per CCE-CCE-27433-2: Set ClientAliveInterval 300 in /etc/ssh/sshd_config
ClientAliveInterval 300

When the task is aborted, yum, rpm, and imgbased are left in an inconsistent state, and the update/upgrade cannot easily be retried.

How reproducible:

Install/deploy a host with that security profile, or with those sshd_config options. Upgrade the host from the RHVM GUI. Wait 5 minutes. The task fails.

Additional info:

It seems this could easily be avoided by setting a "ServerAliveInterval=60" parameter somewhere in the Ansible playbook/variables. Maybe like this:

cat << EOF >> /usr/share/ovirt-engine/playbooks/ansible.cfg
# Maintain a heartbeat for long-running tasks like ovirt-host-upgrade
ansible_ssh_common_args = '-o ServerAliveInterval=60'
EOF
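One note on the suggestion above: `ansible_ssh_common_args` is an inventory/play variable rather than an ansible.cfg key, so appending it to ansible.cfg would likely be ignored. If the heartbeat were set in ansible.cfg itself, a sketch of the equivalent knob (using Ansible's `[ssh_connection]` section and its `ssh_args` option; the defaults are repeated because `ssh_args` replaces them rather than appending, and the 60-second interval is just this report's suggested value):

```ini
# Hypothetical addition to /usr/share/ovirt-engine/playbooks/ansible.cfg:
# make the Ansible-spawned ssh client send a keepalive probe every 60s,
# so sshd's ClientAlive* idle detection never fires during long yum runs.
[ssh_connection]
ssh_args = -C -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=60
```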
Here are the RHVM (ovirt-engine) logs that show the disconnect after 5 minutes of waiting for yum to complete (which takes ~8 minutes on these hosts), followed by the `yum history` output from the affected host.

[root@rhvm host-deploy]# cat ovirt-host-mgmt-ansible-20190930160816-rhv1.cluster.net-bddbd03f-3a7f-4e6c-b79d-d4651f411d0f.log
[[[[ output from successful tasks omitted]]]]
2019-09-30 16:08:16,580 p=29853 u=ovirt |  Using /usr/share/ovirt-engine/playbooks/ansible.cfg as config file
2019-09-30 16:08:17,459 passlib.registry registered 'md5_crypt' handler: <class 'passlib.handlers.md5_crypt.md5_crypt'>
2019-09-30 16:08:17,466 p=29853 u=ovirt |  PLAY [all] *********************************************************************
2019-09-30 16:08:17,476 p=29853 u=ovirt |  TASK [Gathering Facts] *********************************************************
2019-09-30 16:08:22,727 p=29853 u=ovirt |  ok: [rhv1.cluster.net]
2019-09-30 16:08:22,763 p=29853 u=ovirt |  TASK [Check host certificate] **************************************************
2019-09-30 16:08:25,915 p=29853 u=ovirt |  changed: [rhv1.cluster.net] => (item={u'ca': u'/etc/pki/vdsm/certs/cacert.pem', u'cert': u'/etc/pki/vdsm/certs/vdsmcert.pem'}) => {
2019-09-30 16:08:28,919 p=29853 u=ovirt |  changed: [rhv1.cluster.net] => (item={u'ca': u'/etc/pki/libvirt/../CA/cacert.pem', u'cert': u'/etc/pki/libvirt/clientcert.pem'}) => {
2019-09-30 16:08:31,950 p=29853 u=ovirt |  changed: [rhv1.cluster.net] => (item={u'ca': u'/etc/pki/vdsm/libvirt-spice/ca-cert.pem', u'cert': u'/etc/pki/vdsm/libvirt-spice/server-cert.pem'}) => {
2019-09-30 16:08:31,990 p=29853 u=ovirt |  TASK [Set enrollment variable] *************************************************
2019-09-30 16:08:32,009 p=29853 u=ovirt |  ok: [rhv1.cluster.net] => {
2019-09-30 16:08:32,045 p=29853 u=ovirt |  TASK [ovirt-host-enroll-certificate : include_tasks] ***************************
2019-09-30 16:08:32,058 p=29853 u=ovirt |  skipping: [rhv1.cluster.net] => (item=loadconfig) => {
2019-09-30 16:08:32,062 p=29853 u=ovirt |  skipping: [rhv1.cluster.net] => (item=vdsm) => {
2019-09-30 16:08:32,065 p=29853 u=ovirt |  skipping: [rhv1.cluster.net] => (item=vmconsole) => {
2019-09-30 16:08:32,100 p=29853 u=ovirt |  TASK [ovirt-host-upgrade : Install ovirt-host package if it isn't installed] ***
2019-09-30 16:08:45,971 p=29853 u=ovirt |  ok: [rhv1.cluster.net] => {
    "changed": false,
    "changes": {
        "installed": [],
        "updated": []
    },
    "obsoletes": {
        "redhat-virtualization-host-image-update-placeholder": {
            "dist": "noarch",
            "repo": "installed",
            "version": "4.3.5-3.el7ev"
        }
    },
    "rc": 0,
    "results": [
        "All packages providing ovirt-host are up to date",
        ""
    ]
}
2019-09-30 16:08:46,004 p=29853 u=ovirt |  TASK [ovirt-host-upgrade : Update system] **************************************
2019-09-30 16:13:48,011 p=29853 u=ovirt |  fatal: [rhv1.cluster.net]: UNREACHABLE! => {
    "changed": false,
    "unreachable": true
}

MSG:

Failed to connect to the host via ssh: FIPS mode initialized
Shared connection to rhv1.cluster.net closed.
2019-09-30 16:13:48,012 p=29853 u=ovirt |  PLAY RECAP *********************************************************************
2019-09-30 16:13:48,012 p=29853 u=ovirt |  rhv1.cluster.net : ok=4 changed=1 unreachable=1 failed=0 skipped=1 rescued=0 ignored=0
[root@rhvm host-deploy]#

And here is the yum history from the host where the update failed/aborted:

[root@rhv1 ~]# yum history
Loaded plugins: enabled_repos_upload, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager,
              : vdsmupgrade, versionlock
ID     | Login user               | Date and time    | Action(s)      | Altered
-------------------------------------------------------------------------------
     4 | root <root>              | 2019-09-30 16:09 | Obsoleting     |    2 **
     3 | root <root>              | 2019-09-30 15:02 | Erase          |    1
     2 | root <root>              | 2019-09-30 13:47 | Install        |    1
     1 | System <unset>           | 2019-09-30 11:59 | Install        |   11
history list
Uploading Enabled Repositories Report
Loaded plugins: product-id, subscription-manager, versionlock
Unable to upload Enabled Repositories Report

[root@rhv1 ~]# yum history info 4
Loaded plugins: enabled_repos_upload, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager,
              : vdsmupgrade, versionlock
Transaction ID : 4
Begin time     : Mon Sep 30 16:09:40 2019
Begin rpmdb    : 899:25e583f565e071183b4b6d2b29f8489f6e5463d1
User           : root <root>
Return-Code    : ** Aborted **
######################## ABORTED #########################
Command Line   : -d 2 -y update
Transaction performed with:
    Installed     rpm-4.11.3-40.el7.x86_64                      installed
    Installed     subscription-manager-1.24.13-1.el7.x86_64     installed
    Installed     yum-3.4.3-163.el7.noarch                      installed
    Installed     yum-plugin-versionlock-1.1.31-52.el7.noarch   installed
Packages Altered:
 ** Obsoleting redhat-virtualization-host-image-update-4.3.5-20190920.0.el7_7.noarch @rhel-7-server-rhvh-4-rpms
Excluding 5 updates due to versionlock (use "yum versionlock status" to show them)
 ** Obsoleted  redhat-virtualization-host-image-update-placeholder-4.3.5-3.el7ev.noarch ?
Scriptlet output:
   1 warning: %post(redhat-virtualization-host-image-update-4.3.5-20190920.0.el7_7.noarch) scriptlet failed, signal 1
history info
Uploading Enabled Repositories Report
Loaded plugins: product-id, subscription-manager, versionlock
Unable to upload Enabled Repositories Report
[root@rhv1 ~]#
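For completeness, a sketch of a generic RHEL 7 recovery sequence after an aborted transaction like the one above (assumptions flagged inline: `yum-complete-transaction` comes from the yum-utils package and may not be present on RHV-H; imgbased-managed hosts may additionally need the image-update package reinstalled, which this sketch does not cover):

```shell
# Inspect the aborted transaction (ID 4 in the yum history above)
yum history info 4

# Remove the stale transaction journal left behind by the abort
# (yum-complete-transaction is provided by the yum-utils package)
yum-complete-transaction --cleanup-only

# Check which image-update packages rpm currently records as installed
rpm -qa 'redhat-virtualization-host-image-update*'

# Retry the update once the ssh keepalive problem is addressed
yum -y update
```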
Verified with: ovirt-engine-4.4.2.2-0.3.el8ev.noarch

Steps:

1. Test with the VPP security profile selected (RHVH 4.3 is used in this test, because an RHVH 4.4 host cannot be added to the engine when VPP is selected; see bug 1833254)
   1) Install a redhat-virtualization-host-4.3.10-20200513.0.el7_8 host, selecting the VPP security profile during installation
   2) Create a 4.3 data center and cluster on the engine, and add the RHVH 4.3 host to the engine
   3) Create a repo providing the redhat-virtualization-host-4.3.10-20200615.0.el7_8 update
   4) Upgrade the RHVH 4.3 host from the engine

2. Test with /etc/ssh/sshd_config configured
   1) Install a redhat-virtualization-host-4.4.1-20200722.0.el8_2 host
   2) Add the RHVH 4.4 host to the engine
   3) Modify /etc/ssh/sshd_config and restart sshd.service on the host:
      ClientAliveInterval 300
      ClientAliveCountMax 0
   4) Create a repo providing the redhat-virtualization-host-4.4.2-20200812.3.el8_2 update
   5) Upgrade the RHVH 4.4 host from the engine

Results:

1. When the VPP security profile was selected during host installation, the upgrade packages task succeeded:

2020-08-19 10:45:33 IDT - TASK [ovirt-host-upgrade : Upgrade packages] ***********************************
2020-08-19 10:45:33 IDT -
2020-08-19 10:45:33 IDT - {
  "status" : "OK",
  "msg" : "",
  "data" : {
    "event_data" : {
      "playbook" : "ovirt-host-upgrade.yml",
      "playbook_uuid" : "f22e9551-699e-43ec-a56e-fb296996f8e2",
      "play" : "all",
      "play_uuid" : "001a4a16-1064-9792-4690-000000000009",
      "play_pattern" : "all",
      "task" : "Upgrade packages",
      "task_uuid" : "001a4a16-1064-9792-4690-000000000034",
      "task_action" : "yum",
      "task_args" : "",
      "task_path" : "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-upgrade/tasks/main.yml:30",
      "role" : "ovirt-host-upgrade",
      "host" : "10.73.131.97",
      "remote_addr" : "10.73.131.97",
      "res" : {
        "results" : [ {
          "changed" : true,
          "results" : [ "Installed:\n  redhat-virtualization-host-image-update.noarch 0:4.3.10-20200615.0.el7_8\n\nReplaced:\n  redhat-virtualization-host-image-update-placeholder.noarch 0:4.3.10-1.el7ev\n\nComplete!\nUploading Enabled Repositories Report\n" ],
          "start" : "2020-08-19T07:34:50.825566",
          "end" : "2020-08-19T07:45:32.197702",

2. When /etc/ssh/sshd_config was configured, the upgrade packages task succeeded:

2020-08-19 16:37:29 IDT - TASK [ovirt-host-upgrade : Upgrade packages] ***********************************
2020-08-19 16:46:25 IDT - {
  "status" : "OK",
  "data" : {
    "event_data" : {
      "playbook" : "ovirt-host-upgrade.yml",
      "playbook_uuid" : "44f35c32-1975-4e6b-b27d-be87eabc0936",
      "play" : "all",
      "play_uuid" : "001a4a16-1064-115d-9337-00000000000a",
      "play_pattern" : "all",
      "task" : "Upgrade packages",
      "task_uuid" : "001a4a16-1064-115d-9337-000000000035",
      "task_action" : "yum",
      "task_args" : "",
      "task_path" : "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-upgrade/tasks/main.yml:30",
      "role" : "ovirt-host-upgrade",
      "host" : "10.73.73.13",
      "remote_addr" : "10.73.73.13",
      "res" : {
        "results" : [ {
          "msg" : "",
          "changed" : true,
          "results" : [ "Installed: redhat-virtualization-host-image-update-4.4.2-20200812.3.el8_2.noarch", "Removed: redhat-virtualization-host-image-update-placeholder-4.4.1-1.el8ev.noarch" ],
          "start" : "2020-08-19T13:37:29.373697",
          "end" : "2020-08-19T13:46:24.113720",
          "duration" : 534.740023,
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3807