Bug 2101708
Summary: | when host is deleted on hypervisor while ansible job is running, host gets deleted on hypervisor level | |
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Stefan Nemeth <snemeth> |
Component: | Remote Execution | Assignee: | Adam Ruzicka <aruzicka> |
Status: | CLOSED ERRATA | QA Contact: | Peter Ondrejka <pondrejk> |
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 6.10.6 | CC: | aruzicka, pcreech |
Target Milestone: | 6.13.0 | Keywords: | Triaged |
Target Release: | Unused | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | smart_proxy_remote_execution_ssh-0.10.1 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2023-05-03 13:21:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Description
Stefan Nemeth
2022-06-28 08:01:52 UTC
Just to double check, do I read that right that you do not remove the host from Satellite? Just kick off a job, go to the hypervisor, and remove the host there? Adding a proper needinfo.

Right, I managed to reproduce it. Local libvirt reproducer:

1) Have a satellite and a vm
2) Run a long-running ansible job against the vm
3) Do shut down > force off on the vm

The foreman-proxy service runs ansible, and ansible runs ssh. When the remote host is forcefully killed (or removed), the connection does not break; it remains ESTABLISHED long after the host went away. We could probably start setting a combination of ServerAliveInterval and ServerAliveCountMax for ssh (see the keepalive sketch after the comments below). Both ansible and script (in ssh mode) jobs are susceptible to this; ansible will need to be fixed in the puppet foreman_proxy modules, rex in rex itself.

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/35924 has been resolved.

To elaborate, the fix for ssh is merged and we can ship it for 6.13. The ansible part needs to happen in the puppet modules and will need an additional installer change. We can deliver the ssh part for 6.13, but not the rest.

Hi Adam, thank you for the details. For the installer changes mentioned in comment 6, is there another bugzilla to track those changes, or should this bugzilla be cloned for Installer?

As far as I know there is no other BZ, although I have it laid out in Jira as subtasks if that counts.

Looking at a 6.13 snap 13 box, this seems to have been fully delivered already:

satellite-6.13.0-6.el8sat.noarch
rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.el8sat.noarch
foreman-installer-katello-3.5.2.1-1.el8sat.noarch
foreman-installer-3.5.2.1-1.el8sat.noarch
satellite-installer-6.13.0.7-1.el8sat.noarch

Verified on Sat 6.13 sn 15: both ansible and ssh script jobs get terminated when the target host becomes unreachable.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Satellite 6.13 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2097
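A note on the fix mechanics: ServerAliveInterval and ServerAliveCountMax are stock ssh_config(5) client options, so the behaviour described in the reproducer can be confirmed outside Satellite with plain OpenSSH and libvirt. A minimal shell sketch; the VM name testvm and the host testvm.example.com are placeholders, not values from this BZ:

```
# Start a long-running command over ssh, then force the VM off on the
# hypervisor (the "shut down > force off" step from the reproducer).
ssh root@testvm.example.com sleep 3600 &
virsh destroy testvm

# Without keepalive probes the client never notices the host is gone;
# the TCP connection lingers in ESTABLISHED:
ss -o state established '( dport = :ssh )'

# With server-alive probes the client gives up once no reply arrives for
# ServerAliveInterval * ServerAliveCountMax seconds (5 * 3 = 15 here)
# and the ssh process exits, which lets the job terminate:
ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=3 \
    root@testvm.example.com sleep 3600
```

This is also why the connection "does not break" on its own: with no traffic expected from the client, TCP has no reason to tear the session down after the peer silently disappears.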
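For the ansible leg (the part deferred to the puppet foreman_proxy modules and the installer), the knob those changes ultimately turn can be illustrated with Ansible's own ssh-argument plumbing. A hedged sketch using the real ANSIBLE_SSH_ARGS environment variable; the inventory and playbook names are made up, and this is not the actual change shipped for this BZ:

```
# Illustration only - the real fix is delivered via puppet/installer-managed
# configuration, not an environment variable. ANSIBLE_SSH_ARGS replaces the
# ssh arguments Ansible passes to its ssh child processes, so the keepalive
# probes below make a job against a dead host fail instead of hanging.
ANSIBLE_SSH_ARGS='-o ServerAliveInterval=15 -o ServerAliveCountMax=3' \
    ansible-playbook -i inventory.ini long_job.yml
```

Note that overriding ANSIBLE_SSH_ARGS drops Ansible's default ControlMaster/ControlPersist options, which is one reason the proper fix belongs in managed configuration rather than an ad-hoc override.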