Bug 1758024 - Long running Ansible tasks timeout and abort for RHV-H hosts with STIG/Security Profiles applied
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2
Assignee: Dana
QA Contact: Qin Yuan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-03 03:49 UTC by John Call
Modified: 2022-07-09 16:25 UTC
CC: 3 users

Fixed In Version: rhv-4.4.2-2 ovirt-engine-4.4.2.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-23 16:11:04 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-37465 0 None None None 2022-07-09 16:25:16 UTC
Red Hat Product Errata RHSA-2020:3807 0 None None None 2020-09-23 16:11:39 UTC
oVirt gerrit 110110 0 master MERGED engine: add server alive configurations 2020-12-14 22:55:14 UTC

Description John Call 2019-10-03 03:49:02 UTC
Description of problem:
Update/Upgrade tasks time out, abort, and fail when an RHV-H host has been deployed using the "VPP - Protection Profile for Virtualization v. 1.0 for Red Hat Enterprise Linux Hypervisor (RHELH)" Security Policy during the Anaconda installation.

The same failure can also be triggered simply by setting the "ClientAlive..." options in /etc/ssh/sshd_config like this:
[root@rhhi1 ~]# grep -B1 '^ClientAlive' /etc/ssh/sshd_config
# Per CCE-CCE-27082-7: Set ClientAliveCountMax 0 in /etc/ssh/sshd_config
ClientAliveCountMax 0
--
# Per CCE-CCE-27433-2: Set ClientAliveInterval 300 in /etc/ssh/sshd_config
ClientAliveInterval 300
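For reference, with these settings sshd drops an idle session after a single 300-second interval: ClientAliveCountMax 0 (as required by the STIG content on these releases) makes sshd disconnect as soon as ClientAliveInterval elapses without client traffic. A rough sketch of the effective idle window (`idle_window` is a hypothetical helper, not part of OpenSSH):

```shell
# Hypothetical helper: effective idle window (seconds) before sshd drops an
# unresponsive client. With ClientAliveCountMax 0 the session is cut at the
# first elapsed interval; otherwise after interval * countmax seconds.
idle_window() {
  local interval=$1 countmax=$2
  if [ "$countmax" -eq 0 ]; then
    echo "$interval"
  else
    echo $(( interval * countmax ))
  fi
}
idle_window 300 0   # -> 300, matching the 5-minute aborts described here
```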

When the task is aborted, yum, rpm, and imgbased are left in an inconsistent state, and the update/upgrade can't easily be retried.



How reproducible:
Install/Deploy a host with that Security Profile, or with those sshd_config options.  Upgrade the host from the RHVM GUI.  Wait 5 minutes.  The task fails.



Additional info:
It seems like this could easily be avoided by setting a "ServerAliveInterval=60" option somewhere in the ansible playbook / variables.  Maybe like this (note: in ansible.cfg the equivalent of ansible_ssh_common_args is ssh_args under the [ssh_connection] section)...

cat << EOF >> /usr/share/ovirt-engine/playbooks/ansible.cfg
# Maintain a heartbeat for long-running tasks like ovirt-host-upgrade
[ssh_connection]
ssh_args = -o ServerAliveInterval=60
EOF

Comment 1 John Call 2019-10-03 04:17:54 UTC
Here are the RHVM (ovirt-engine) logs showing the disconnect after 5 minutes of waiting for yum to complete (which takes ~8 minutes on these hosts), followed by the `yum history` from the affected host.

[root@rhvm host-deploy]# cat ovirt-host-mgmt-ansible-20190930160816-rhv1.cluster.net-bddbd03f-3a7f-4e6c-b79d-d4651f411d0f.log [[[[ output from successful tasks omitted]]]]
2019-09-30 16:08:16,580 p=29853 u=ovirt |  Using /usr/share/ovirt-engine/playbooks/ansible.cfg as config file
2019-09-30 16:08:17,459 passlib.registry registered 'md5_crypt' handler: <class 'passlib.handlers.md5_crypt.md5_crypt'>
2019-09-30 16:08:17,466 p=29853 u=ovirt |  PLAY [all] *********************************************************************
2019-09-30 16:08:17,476 p=29853 u=ovirt |  TASK [Gathering Facts] *********************************************************
2019-09-30 16:08:22,727 p=29853 u=ovirt |  ok: [rhv1.cluster.net]
2019-09-30 16:08:22,763 p=29853 u=ovirt |  TASK [Check host certificate] **************************************************
2019-09-30 16:08:25,915 p=29853 u=ovirt |  changed: [rhv1.cluster.net] => (item={u'ca': u'/etc/pki/vdsm/certs/cacert.pem', u'cert': u'/etc/pki/vdsm/certs/vdsmcert.pem'}) => {
2019-09-30 16:08:28,919 p=29853 u=ovirt |  changed: [rhv1.cluster.net] => (item={u'ca': u'/etc/pki/libvirt/../CA/cacert.pem', u'cert': u'/etc/pki/libvirt/clientcert.pem'}) => {
2019-09-30 16:08:31,950 p=29853 u=ovirt |  changed: [rhv1.cluster.net] => (item={u'ca': u'/etc/pki/vdsm/libvirt-spice/ca-cert.pem', u'cert': u'/etc/pki/vdsm/libvirt-spice/server-cert.pem'}) => {
2019-09-30 16:08:31,990 p=29853 u=ovirt |  TASK [Set enrollment variable] *************************************************
2019-09-30 16:08:32,009 p=29853 u=ovirt |  ok: [rhv1.cluster.net] => {
2019-09-30 16:08:32,045 p=29853 u=ovirt |  TASK [ovirt-host-enroll-certificate : include_tasks] ***************************
2019-09-30 16:08:32,058 p=29853 u=ovirt |  skipping: [rhv1.cluster.net] => (item=loadconfig)  => {
2019-09-30 16:08:32,062 p=29853 u=ovirt |  skipping: [rhv1.cluster.net] => (item=vdsm)  => {
2019-09-30 16:08:32,065 p=29853 u=ovirt |  skipping: [rhv1.cluster.net] => (item=vmconsole)  => {
2019-09-30 16:08:32,100 p=29853 u=ovirt |  TASK [ovirt-host-upgrade : Install ovirt-host package if it isn't installed] ***
2019-09-30 16:08:45,971 p=29853 u=ovirt |  ok: [rhv1.cluster.net] => {
    "changed": false, 
    "changes": {
        "installed": [], 
        "updated": []
    }, 
    "obsoletes": {
        "redhat-virtualization-host-image-update-placeholder": {
            "dist": "noarch", 
            "repo": "installed", 
            "version": "4.3.5-3.el7ev"
        }
    }, 
    "rc": 0, 
    "results": [
        "All packages providing ovirt-host are up to date", 
        ""
    ]
}
2019-09-30 16:08:46,004 p=29853 u=ovirt |  TASK [ovirt-host-upgrade : Update system] **************************************
2019-09-30 16:13:48,011 p=29853 u=ovirt |  fatal: [rhv1.cluster.net]: UNREACHABLE! => {
    "changed": false, 
    "unreachable": true
}

MSG:

Failed to connect to the host via ssh: FIPS mode initialized
Shared connection to rhv1.cluster.net closed.

2019-09-30 16:13:48,012 p=29853 u=ovirt |  PLAY RECAP *********************************************************************
2019-09-30 16:13:48,012 p=29853 u=ovirt |  rhv1.cluster.net           : ok=4    changed=1    unreachable=1    failed=0    skipped=1    rescued=0    ignored=0   
[root@rhvm host-deploy]# 
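The timestamps above line up with the sshd settings: the "Update system" task starts at 16:08:46 and the host is declared unreachable at 16:13:48, roughly one 300-second ClientAliveInterval later. A quick check of the gap (GNU date syntax assumed):

```shell
# Seconds between the "Update system" task start and the UNREACHABLE error
# in the log above (GNU date -d parses the timestamps).
start=$(date -u -d '2019-09-30 16:08:46' +%s)
fail=$(date -u -d '2019-09-30 16:13:48' +%s)
echo $(( fail - start ))   # -> 302, i.e. the 300s ClientAliveInterval plus slack
```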

And here is the `yum history` output from the host where the update failed/aborted:
[root@rhv1 ~]# yum history
Loaded plugins: enabled_repos_upload, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager,
              : vdsmupgrade, versionlock
ID     | Login user               | Date and time    | Action(s)      | Altered
-------------------------------------------------------------------------------
     4 | root <root>              | 2019-09-30 16:09 | Obsoleting     |    2 **
     3 | root <root>              | 2019-09-30 15:02 | Erase          |    1   
     2 | root <root>              | 2019-09-30 13:47 | Install        |    1   
     1 | System <unset>           | 2019-09-30 11:59 | Install        |   11   
history list
Uploading Enabled Repositories Report
Loaded plugins: product-id, subscription-manager, versionlock
Unable to upload Enabled Repositories Report

[root@rhv1 ~]# yum history info 4
Loaded plugins: enabled_repos_upload, imgbased-persist, package_upload, product-id, search-disabled-repos, subscription-manager,
              : vdsmupgrade, versionlock
Transaction ID : 4
Begin time     : Mon Sep 30 16:09:40 2019
Begin rpmdb    : 899:25e583f565e071183b4b6d2b29f8489f6e5463d1
User           : root <root>
Return-Code    : ** Aborted **                           ######################## ABORTED #########################
Command Line   : -d 2 -y update
Transaction performed with:
    Installed     rpm-4.11.3-40.el7.x86_64                    installed
    Installed     subscription-manager-1.24.13-1.el7.x86_64   installed
    Installed     yum-3.4.3-163.el7.noarch                    installed
    Installed     yum-plugin-versionlock-1.1.31-52.el7.noarch installed
Packages Altered:
 ** Obsoleting redhat-virtualization-host-image-update-4.3.5-20190920.0.el7_7.noarch    @rhel-7-server-rhvh-4-rpms
Excluding 5 updates due to versionlock (use "yum versionlock status" to show them)
 ** Obsoleted  redhat-virtualization-host-image-update-placeholder-4.3.5-3.el7ev.noarch ?
Scriptlet output:
   1 warning: %post(redhat-virtualization-host-image-update-4.3.5-20190920.0.el7_7.noarch) scriptlet failed, signal 1
history info
Uploading Enabled Repositories Report
Loaded plugins: product-id, subscription-manager, versionlock
Unable to upload Enabled Repositories Report

[root@rhv1 ~]#

Comment 5 Qin Yuan 2020-08-21 15:47:24 UTC
Verified with:
ovirt-engine-4.4.2.2-0.3.el8ev.noarch

Steps:
1. Test with VPP security profile selected
(Use RHVH 4.3 in this test, because RHVH 4.4 can't be added to engine when VPP is selected, see bug 1833254)
1) install redhat-virtualization-host-4.3.10-20200513.0.el7_8 host, select VPP security profile during installation
2) create 4.3 data center, cluster on engine, and add rhvh 4.3 host to engine
3) create a repo providing redhat-virtualization-host-4.3.10-20200615.0.el7_8 update
4) upgrade rhvh 4.3 host on engine

2. Test with /etc/ssh/sshd_config configured
1) install redhat-virtualization-host-4.4.1-20200722.0.el8_2 host
2) add rhvh 4.4 host to engine
3) modify /etc/ssh/sshd_config and restart sshd.service on host:
ClientAliveInterval 300
ClientAliveCountMax 0
4) create a repo providing redhat-virtualization-host-4.4.2-20200812.3.el8_2 update
5) upgrade rhvh 4.4 on engine
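Step 3) above can be sketched as follows; this variant appends to a scratch file rather than the real /etc/ssh/sshd_config so it is safe to run anywhere (on the host itself you would append to /etc/ssh/sshd_config and then restart sshd.service):

```shell
# Append the STIG-style ClientAlive settings; a temp file stands in for
# /etc/ssh/sshd_config in this illustration.
cfg=$(mktemp)
printf 'ClientAliveInterval 300\nClientAliveCountMax 0\n' >> "$cfg"
grep '^ClientAlive' "$cfg"
# On a real host: systemctl restart sshd.service
```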

Results:
1. When VPP security profile was selected during host installation, upgrade packages task succeeded:
2020-08-19 10:45:33 IDT - TASK [ovirt-host-upgrade : Upgrade packages] ***********************************
2020-08-19 10:45:33 IDT -
2020-08-19 10:45:33 IDT - {
  "status" : "OK",
  "msg" : "",
  "data" : {
    "event_data" : {
      "playbook" : "ovirt-host-upgrade.yml",
      "playbook_uuid" : "f22e9551-699e-43ec-a56e-fb296996f8e2",
      "play" : "all",
      "play_uuid" : "001a4a16-1064-9792-4690-000000000009",
      "play_pattern" : "all",
      "task" : "Upgrade packages",
      "task_uuid" : "001a4a16-1064-9792-4690-000000000034",
      "task_action" : "yum",
      "task_args" : "",
      "task_path" : "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-upgrade/tasks/main.yml:30",
      "role" : "ovirt-host-upgrade",
      "host" : "10.73.131.97",
      "remote_addr" : "10.73.131.97",
      "res" : {
        "results" : [ {
          "changed" : true,
          "results" : [ "Installed:\n  redhat-virtualization-host-image-update.noarch 0:4.3.10-20200615.0.el7_8      \n\nReplaced:\n  redhat-virtualization-host-image-update-placeholder.noarch 0:4.3.10-1.el7ev   \n\nComplete!\nUploading Enabled Repositories Report\n" ],
      "start" : "2020-08-19T07:34:50.825566",
      "end" : "2020-08-19T07:45:32.197702",


2. When /etc/ssh/sshd_config was configured, upgrade packages task succeeded:

2020-08-19 16:37:29 IDT - TASK [ovirt-host-upgrade : Upgrade packages] ***********************************
2020-08-19 16:46:25 IDT - {
  "status" : "OK",
  "data" : {
    "event_data" : {
      "playbook" : "ovirt-host-upgrade.yml",
      "playbook_uuid" : "44f35c32-1975-4e6b-b27d-be87eabc0936",
      "play" : "all",
      "play_uuid" : "001a4a16-1064-115d-9337-00000000000a",
      "play_pattern" : "all",
      "task" : "Upgrade packages",
      "task_uuid" : "001a4a16-1064-115d-9337-000000000035",
      "task_action" : "yum",
      "task_args" : "",
      "task_path" : "/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-upgrade/tasks/main.yml:30",
      "role" : "ovirt-host-upgrade",
      "host" : "10.73.73.13",
      "remote_addr" : "10.73.73.13",
      "res" : {
        "results" : [ {
          "msg" : "",
          "changed" : true,
          "results" : [ "Installed: redhat-virtualization-host-image-update-4.4.2-20200812.3.el8_2.noarch", "Removed: redhat-virtualization-host-image-update-placeholder-4.4.1-1.el8ev.noarch" ],
      "start" : "2020-08-19T13:37:29.373697",
      "end" : "2020-08-19T13:46:24.113720",
      "duration" : 534.740023,

Comment 9 errata-xmlrpc 2020-09-23 16:11:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3807

