Description of problem:

When installing RHV-H with the PCI-DSS Security profile, sshd_config is modified to include the following values:

~~~
ClientAliveCountMax 0
ClientAliveInterval 900
~~~

This causes long-running tasks (such as exporting a large VM to OVA) to fail. It is also somewhat of a silent failure: engine.log only says that the export failed, and the log in /var/log/ovirt-engine/ova/ just shows it sitting on this task:

~~~
ovirt-ova-pack : Run packing script
~~~

This also results in ova/tmp files being left on the storage despite the failure.

Version-Release number of selected component (if applicable):
4.4.10

How reproducible:
100%

Steps to Reproduce:
1. Install RHV-H with the PCI-DSS profile
2. Export a large VM (here, one that was 800G was used)

Actual results:
OVA export fails and leaves files on the storage

Expected results:
OVA export succeeds

Additional info:

In `/var/log/ovirt-engine/ansible-runner-service.log` on RHV-M, the closed SSH connection can be observed:

~~~
2022-02-11 14:28:37,445 - runner_service.controllers.playbooks - INFO - Playbook ovirt-ova-export.yml, UUID=84d50674-8b3e-11ec-8367-0cc47a078176 initiated : status=starting
...
2022-02-11 14:44:02,799 - runner_service.services.playbook - DEBUG - cb_event_handler event_data={'uuid': '4bd7e6dd-52d9-4908-a54e-7a19ded1652f', 'counter': 32, 'stdout': 'fatal: [rhevh1.xxx.xxx]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Shared connection to rhevh1.xxxx.xxxx closed.", "unreachable": true}', 'start_line': 31, 'end_line': 32, 'runner_ident': '84d50674-8b3e-11ec-8367-0cc47a078176', 'event': 'runner_on_unreachable', 'pid': 3471730, 'created': '2022-02-11T13:44:02.798127', 'parent_uuid': '0cc47a07-8176-4cac-0217-000000000020', 'event_data': {'playbook': 'ovirt-ova-export.yml', 'playbook_uuid': 'e0dc7c95-fc9f-4ec4-9997-59e29c84ae30', 'play': 'all', 'play_uuid': '0cc47a07-8176-4cac-0217-000000000009', 'play_pattern': 'all', 'task': 'Run packing script', 'task_uuid': '0cc47a07-8176-4cac-0217-000000000020', 'task_action': 'script', 'task_args': '', 'task_path': '/usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-ova-pack/tasks/main.yml:2', 'role': 'ovirt-ova-pack', 'host': 'rhevh1.xxxx.xxxx', 'remote_addr': 'rhevh1.xxxx.xxxx', 'start': '2022-02-11T13:28:44.958720', 'end': '2022-02-11T13:44:02.797920', 'duration': 917.8392, 'res': {'unreachable': True, 'msg': 'Failed to connect to the host via ssh: Shared connection to rhevh1.xxxx.xxxxx closed.', 'changed': False}, 'uuid': '4bd7e6dd-52d9-4908-a54e-7a19ded1652f'}}
~~~

The SSH connection gets closed here. 'duration': 917.8392 = 15m 17s, which lines up with the 900-second ClientAliveInterval.

Workaround:

Comment out these lines in /etc/ssh/sshd_config:

~~~
# ClientAliveCountMax 0
# ClientAliveInterval 900
~~~

Restart sshd.service and the issue goes away.
Assuming it was set to Virt by mistake, I moved it to Infra, which makes sense because the issue is not specific to OVA export.
The solution seems to be to use Ansible async with poll < 900. This can also impact other long-running tasks, but I am not sure whether any of them runs in a single Ansible step for more than 900 seconds.
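
For illustration, a minimal sketch of what an async task with a short poll interval could look like. This is not the actual change made in ovirt-engine. The script path and the timeout values are hypothetical. Whether the `script` action used by the ovirt-ova-pack role supports `async` is not confirmed here, so the sketch assumes the script is already present on the host and runs it with `ansible.builtin.command` instead.

~~~yaml
# Sketch only: the path and the numeric values below are assumptions, not taken from the role.
- name: Run a long packing step without one long silent SSH session
  hosts: all
  tasks:
    - name: Run packing script in the background and poll for completion
      ansible.builtin.command: /tmp/pack_ova.py   # hypothetical path; arguments omitted
      async: 86400   # allow the job to run for up to 24 hours
      poll: 60       # check every 60 seconds, well below ClientAliveInterval 900
~~~

With `async` set and `poll` greater than 0, Ansible starts the job on the host and then checks its status every `poll` seconds, so the connection should see regular traffic instead of staying idle for the whole run.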
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4711