Bug 2101708 - when a host is deleted on the hypervisor while an ansible job is running, the job gets stuck
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Remote Execution
Version: 6.10.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: 6.13.0
Assignee: Adam Ruzicka
QA Contact: Peter Ondrejka
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-28 08:01 UTC by Stefan Nemeth
Modified: 2023-05-10 12:55 UTC
CC: 2 users

Fixed In Version: smart_proxy_remote_execution_ssh-0.10.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-03 13:21:12 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
- Foreman Issue Tracker 35924 (Normal, Closed): ssh is not able to detect if the remote host just stops responding (last updated 2023-03-17 11:56:00 UTC)
- Foreman Issue Tracker 35925 (Normal, Closed): ansible is not able to detect if the remote host just stops responding (last updated 2023-03-17 11:56:02 UTC)
- Red Hat Issue Tracker SAT-14863 (last updated 2023-01-11 14:42:01 UTC)
- Red Hat Product Errata RHSA-2023:2097 (last updated 2023-05-03 13:21:34 UTC)

Internal Links: 2077529

Description Stefan Nemeth 2022-06-28 08:01:52 UTC
Description of problem:

While an Ansible job is executing, the target host gets deleted at the hypervisor level.

The job will not fail over time; it gets stuck either in Pending or at some percentage of progress for a long time.

Version-Release number of selected component (if applicable):

6.10.6

How reproducible:

100%

Steps to Reproduce:
1. Execute an Ansible job that runs longer than a few seconds (see the hammer sketch below).
2. Delete the targeted host on the hypervisor while the job is running.
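
For step 1, a minimal hammer sketch, assuming a host named host.example.com and the stock "Run Command - Ansible Default" template (both names are illustrative):

    # Kick off a long-running Ansible job against a single host
    hammer job-invocation create \
      --job-template "Run Command - Ansible Default" \
      --inputs command="sleep 600" \
      --search-query "name = host.example.com"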

Actual results:

The job gets stuck in its current state.

Expected results:

The job fails after some time.

Additional info:

Comment 1 Adam Ruzicka 2022-06-28 09:26:31 UTC
Just to double-check, am I reading that right that you do not remove the host from Satellite? You just kick off a job, go to the hypervisor, and remove the host there?

Comment 2 Marek Hulan 2022-11-07 17:36:30 UTC
Adding a proper needinfo

Comment 4 Adam Ruzicka 2023-01-10 15:10:02 UTC
Right, I managed to reproduce it.

Local libvirt reproducer:
1) Have a Satellite and a VM
2) Run a long-running Ansible job against the VM
3) Do shut down > force off on the VM (see the virsh sketch below)
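
For step 3, "force off" on a local libvirt box amounts to destroying the domain without a clean shutdown (the domain name is illustrative):

    # Pull the virtual plug on the guest; no clean shutdown
    virsh destroy testvm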

The foreman-proxy service runs ansible, and ansible runs ssh. When the remote host is forcefully killed (or removed), the connection does not break; it remains ESTABLISHED long after the host has gone away.
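
One way to observe this from the Satellite or proxy host is to watch the outgoing ssh connections (a sketch; the filter simply matches any session to port 22):

    # Show established TCP connections to port 22 with owning processes;
    # the session to the dead host keeps showing up as ESTABLISHED
    ss -tnp state established '( dport = :22 )'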

We could probably start setting a combination of ServerAliveInterval and ServerAliveCountMax for ssh.
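
On the command line, that combination looks like the following (interval and count are illustrative; with these values a dead peer is detected after roughly 3 x 15 s):

    # Probe the server every 15 s, give up after 3 unanswered probes
    ssh -o ServerAliveInterval=15 -o ServerAliveCountMax=3 root@host.example.com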

Both ansible and script (in ssh mode) jobs are susceptible to this; ansible will need to be fixed in the puppet foreman_proxy modules, rex in rex itself.
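
For the ansible side, one plausible shape of the change (not necessarily what the puppet modules ended up doing; the config path and the values are assumptions) is passing the same keepalive options through ssh_args in the ansible.cfg the proxy uses:

    # Illustrative only: /etc/foreman-proxy/ansible.cfg is an assumed path
    cat >> /etc/foreman-proxy/ansible.cfg <<'EOF'
    [ssh_connection]
    ssh_args = -o ServerAliveInterval=15 -o ServerAliveCountMax=3
    EOF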

Comment 5 Bryan Kearney 2023-01-14 16:03:07 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/35924 has been resolved.

Comment 6 Adam Ruzicka 2023-01-18 15:17:23 UTC
To elaborate, the fix for ssh is merged and we can ship it for 6.13. The ansible part needs to happen in the puppet modules and will need an additional installer change. We can deliver the ssh part for 6.13, but not the rest.

Comment 7 Brad Buckingham 2023-01-23 14:53:12 UTC
Hi Adam,

Thank you for the details.

For the installer changes mentioned in comment 6, is there another bugzilla to track those changes or should this bugzilla be cloned for Installer?

Comment 8 Adam Ruzicka 2023-01-24 13:44:37 UTC
As far as I know there is no other BZ, although I have it laid out in jira as subtasks if that counts.

Comment 9 Adam Ruzicka 2023-03-17 12:02:25 UTC
Looking at a 6.13 snap 13 box, this seems to have been fully delivered already.

satellite-6.13.0-6.el8sat.noarch
rubygem-smart_proxy_remote_execution_ssh-0.10.1-1.el8sat.noarch
foreman-installer-katello-3.5.2.1-1.el8sat.noarch
foreman-installer-3.5.2.1-1.el8sat.noarch
satellite-installer-6.13.0.7-1.el8sat.noarch
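
To check any given box for the fix, querying the plugin package should be enough (per "Fixed In Version" above, the fix shipped in 0.10.1):

    # Anything >= 0.10.1 carries the ssh keepalive fix
    rpm -q rubygem-smart_proxy_remote_execution_ssh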

Comment 10 Peter Ondrejka 2023-03-22 15:03:54 UTC
Verified on Sat 6.13 snap 15; both ansible and ssh script jobs get terminated when the target host becomes unreachable.

Comment 13 errata-xmlrpc 2023-05-03 13:21:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.13 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2097

