Bug 1907514

Summary: Cannot delete a host when the capsule it's registered to is down
Product: Red Hat Satellite
Reporter: Julio Entrena Perez <jentrena>
Component: Remote Execution
Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA
QA Contact: Peter Ondrejka <pondrejk>
Severity: high
Docs Contact:
Priority: high
Version: 6.8.0
CC: ahumbe, aruzicka, dgross, gtaylor, inecas, jjeffers, joseph.alexander, jpasqual, ktordeur, lstejska, mmccune, ngalvin, pjagtap, riehecky, risantam, sadas, saydas, zhunting
Target Milestone: 6.9.5
Keywords: PrioBumpGSS, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version: tfm-rubygem-foreman_remote_execution-4.2.3.1-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1973365 (view as bug list)
Environment:
Last Closed: 2021-08-31 12:04:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Julio Entrena Perez 2020-12-14 16:41:54 UTC
Description of problem:
Attempting to delete a host in the web UI fails while the capsule that it's registered to is down.
Satellite tries to contact the capsule to delete the host's SSH known host key. Because the capsule is down, the request fails, the deletion is rolled back, and the host is not deleted.

Version-Release number of selected component (if applicable):
satellite-6.8.1-1.el7sat.noarch

How reproducible:
Always

Steps to Reproduce:
1. Register a host to a capsule
2. Run a remote command against the host
3. Shut down the capsule
4. Attempt to delete the host

Actual results:
Via the web UI, the error message "Failed to delete <hostname>: []" is displayed.
Via hammer, "Host deleted." is returned with a 0 exit code but the host is not deleted.

Expected results:
Host is deleted.
The capsule may have been removed/destroyed and may no longer be available. This should not block deletion of hosts.

Additional info:
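To illustrate the expected behaviour, here is a minimal Python sketch of a delete that treats the known hosts cleanup as best effort. It is not the foreman_remote_execution code; `ProxyUnreachable`, `delete_host`, `unreachable_capsule` and the set used as an inventory are all hypothetical names made up for this example.

```python
class ProxyUnreachable(Exception):
    """Hypothetical error raised when the capsule cannot be contacted."""


def delete_host(host, inventory, remove_known_hosts):
    """Delete `host` from `inventory`, treating key cleanup as best effort.

    Returns a list of warnings for cleanup steps that had to be skipped.
    """
    warnings = []
    try:
        remove_known_hosts(host)
    except ProxyUnreachable as exc:
        # Record the failure, but do not roll back the deletion.
        warnings.append(f"known hosts cleanup skipped for {host}: {exc}")
    inventory.discard(host)
    return warnings


def unreachable_capsule(host):
    """Stand-in for a capsule that is down (assumption for the example)."""
    raise ProxyUnreachable("capsule.example.com:9090 is down")


inventory = {"host.example.com"}
warnings = delete_host("host.example.com", inventory, unreachable_capsule)
print(inventory, warnings)
```

In this sketch the capsule failure is reported as a warning and the host is still removed, which is the outcome requested above.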

Comment 1 Julio Entrena Perez 2020-12-14 16:49:16 UTC
2020-12-14T16:13:13 [W|app|] Remove SSH known hosts for host.example.com task failed with the following error: ERF12-6886 [ProxyAPI::ProxyException]: Unable to remove host from known hosts ([SocketError]: Failed to open TCP connection to capsule.example.com:9090 (getaddrinfo: Name or service not known)) for Capsule https://capsule.example.com:9090/ssh
2020-12-14T16:13:13 [W|app|] Rolling back due to a problem: [#<Orchestration::Task:0x0000000011f995f0 @name="Remove SSH known hosts for host.example.com", @id="ssh_remove_known_hosts_interface_10.33.8.73_2", @status="failed", @priority=200, @action=[#<Nic::Bridge id: 15, mac: "52:54:00:14:4d:08", ip: "10.33.8.73", type: "Nic::Bridge", name: "host.example.com", host_id: 8, subnet_id: 1, domain_id: 1, attrs: {"bridge"=>true}, created_at: "2020-12-14 14:51:07", updated_at: "2020-12-14 15:03:18", provider: nil, username: nil, password: nil, virtual: true, link: true, identifier: "br0", tag: "", attached_to: "", managed: true, mode: "balance-rr", attached_devices: "", bond_options: "", primary: true, provision: true, compute_attributes: {}, execution: true, ip6: "", subnet6_id: nil>, :drop_from_known_hosts, 2], @created=1607962393.0890071, @timestamp=2020-12-14 16:13:13 UTC>]
2020-12-14T16:13:13 [I|bac|] Task {label: Actions::Katello::Host::Destroy, id: e81809da-1c33-4a0c-84ae-ab20536a62e9, execution_plan_id: a8f006df-a5ac-420d-90d3-b49336a39e42} state changed: stopped  result: success
2020-12-14T16:13:13 [I|bac|] Task {label: Actions::Katello::Host::Destroy, id: e81809da-1c33-4a0c-84ae-ab20536a62e9, execution_plan_id: a8f006df-a5ac-420d-90d3-b49336a39e42} state changed: stopped  result: success
2020-12-14T16:13:13 [I|bac|] Task {label: Actions::BulkAction, id: 22c644c1-e0d7-49c2-88f0-514ca9140cf1, execution_plan_id: 9d74e080-ea14-4eb0-8c8d-ca6e30a39790} state changed: stopped  result: success
2020-12-14T16:13:13 [I|bac|] Task {label: Actions::BulkAction, id: 22c644c1-e0d7-49c2-88f0-514ca9140cf1, execution_plan_id: 9d74e080-ea14-4eb0-8c8d-ca6e30a39790} state changed: stopped  result: success
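The first warning above names the failed orchestration task, the error code, the exception class and the unreachable endpoint. A minimal Python sketch for pulling those fields out of similar lines follows. The regular expressions and the `summarize_failure` helper are assumptions derived only from the line format shown in this comment, not a Satellite tool.

```python
import re

# Patterns inferred from the single warning line quoted in this comment.
FAILED_TASK = re.compile(
    r"\[W\|app\|\] (?P<task>.+?) task failed with the following error: "
    r"(?P<code>ERF\d+-\d+) \[(?P<exception>[^\]]+)\]"
)
CONNECTION = re.compile(
    r"Failed to open TCP connection to (?P<host>[^:\s]+):(?P<port>\d+)"
)


def summarize_failure(line):
    """Return a dict describing a failed task, or None for other lines."""
    task = FAILED_TASK.search(line)
    if task is None:
        return None
    summary = task.groupdict()
    conn = CONNECTION.search(line)
    if conn is not None:
        summary["unreachable"] = f"{conn.group('host')}:{conn.group('port')}"
    return summary


line = (
    "2020-12-14T16:13:13 [W|app|] Remove SSH known hosts for host.example.com "
    "task failed with the following error: ERF12-6886 [ProxyAPI::ProxyException]: "
    "Unable to remove host from known hosts ([SocketError]: Failed to open TCP "
    "connection to capsule.example.com:9090 (getaddrinfo: Name or service not known)) "
    "for Capsule https://capsule.example.com:9090/ssh"
)
summary = summarize_failure(line)
print(summary)
```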

Comment 16 Adam Ruzicka 2021-04-30 11:55:52 UTC
*** Bug 1942366 has been marked as a duplicate of this bug. ***

Comment 21 Adam Ruzicka 2021-07-13 13:41:48 UTC
On second thought, REX has diverged quite a bit since 6.9 went out, so a cherry-pick would be better. The patches should still apply cleanly; if not, feel free to ping me.

Comment 25 Peter Ondrejka 2021-08-16 09:52:11 UTC
Verified on Satellite 6.9.5 sn 2: the host can be removed successfully even if the REX capsule it was registered to is down.

Comment 30 errata-xmlrpc 2021-08-31 12:04:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Satellite 6.9.5 Async Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3387