1858923 – Can not modify associated hosts if connection to vCenter is down from the Satellite server.

Summary: Can not modify associated hosts if connection to vCenter is down from the Sat...

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Compute Resources - VMWare
Sub Component:
Version:	6.7.0
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	Unspecified
Assignee:	Ondřej Ezr
QA Contact:	Lukáš Hellebrandt
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-07-20 18:46 UTC by Sayan Das
Modified:	2024-10-01 16:43 UTC
CC List:	10 users
Fixed In Version:	foreman-2.5.1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-07-07 06:36:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	31307	Normal	New	Can not modify associated hosts if connection to vCenter is down from the Satellite server.	2021-02-19 11:33:40 UTC
Red Hat Knowledge Base (Solution)	5237461	None	None	None	2020-07-20 20:55:28 UTC
Red Hat Knowledge Base (Solution)	5525281	None	None	None	2021-06-25 16:49:06 UTC

Description Sayan Das 2020-07-20 18:46:02 UTC

Description of problem:

It is not possible to set or change the release version, content view, or lifecycle environment (LCE) of a content host if the host is associated with a VMware compute resource and the connection from the Satellite server to vCenter is down.


Version-Release number of selected component (if applicable):
Satellite 6.7 (older versions are affected as well)

How reproducible:
Always


Steps to Reproduce:
1. Configure a VMware compute resource.
2. Build a host from Satellite using the compute resource.
3. Use the following command to block the connection from Satellite to vCenter:
   # iptables -I OUTPUT -d vcenter.example.com -j DROP

4. Try to set or change the release version, CV, or LCE of the host.


Actual results:

* The request hangs for some time, and then the following error is shown in the web UI:
~~
An error occurred saving the Content Host: Failed to find compute attributes, please check if VM test-rhel7.example.com was deleted
~~

* At the same time, the following is logged in production.log:
~~
2020-07-20T22:54:33 [I|app|ad7b7494] Started PUT "/api/v2/hosts/24" for 10.74.9.157 at 2020-07-20 22:54:33 +0530
2020-07-20T22:54:33 [I|app|ad7b7494] Processing by Api::V2::HostsController#update as JSON
2020-07-20T22:54:33 [I|app|ad7b7494]   Parameters: {"id"=>"24", "host"=>{"subscription_facet_attributes"=>{"id"=>19, "autoheal"=>true, "purpose_role"=>"", "purpose_usage"=>"", "service_level"=>"", "release_version"=>"7Server"}}, "apiv"=>"v2"}
2020-07-20T22:55:06 [W|app|22f93f5a] Action failed
2020-07-20T22:55:06 [I|app|22f93f5a] Deface: [WARNING] No :original defined for 'change 500 page content', you should change its definition to include:
 :original => '35d2b4f7aac0c083740c6de6775473457e9ae9d8' 
2020-07-20T22:55:06 [I|app|22f93f5a]   Rendering common/500.html.erb
2020-07-20T22:55:06 [I|app|22f93f5a]   Rendered common/500.html.erb (5.4ms)
2020-07-20T22:55:06 [I|app|22f93f5a] Completed 500 Internal Server Error in 60039ms (Views: 19.5ms | ActiveRecord: 3.2ms)
2020-07-20T22:55:20 [I|app|1c9b8bb8] Started GET "/notification_recipients" for 10.74.9.157 at 2020-07-20 22:55:20 +0530
2020-07-20T22:55:20 [I|app|1c9b8bb8] Processing by NotificationRecipientsController#index as JSON
2020-07-20T22:55:20 [I|app|1c9b8bb8] Completed 200 OK in 12ms (Views: 0.1ms | ActiveRecord: 2.7ms)
2020-07-20T22:55:34 [I|app|ad7b7494] Adding Compute instance for test-rhel7.example.com
2020-07-20T22:55:34 [W|app|ad7b7494] Failed to find compute attributes, please check if VM test-rhel7.example.com was deleted
2020-07-20T22:55:34 [W|app|ad7b7494] Rolling back due to a problem: [#<Orchestration::Task:0x00007f479fe84a48 @name="Set up compute instance test-rhel7.example.com", @id="Set up compute instance test-rhel7.example.com", @status="failed", @priority=3, @action=[#<Host::Managed id: 24, name: "test-rhel7.example.com", last_compile: "2020-07-19 20:19:25", last_report: nil, updated_at: "2020-07-19 20:19:25", created_at: "2020-07-17 20:57:27", root_pass: "$5$S5makfAftJJIxgUd$dhLS9YrQMfXjNGD.ADRzt2oGzZgvqJ...", architecture_id: 1, operatingsystem_id: 4, environment_id: nil, ptable_id: 106, medium_id: nil, build: false, comment: "", disk: "", installed_at: "2020-07-19 17:24:08", model_id: 1, hostgroup_id: 1, owner_id: 4, owner_type: "User", enabled: true, puppet_ca_proxy_id: nil, managed: true, use_image: nil, image_file: nil, uuid: "5000ed8f-054b-f2c8-80ba-f993a114d1a5", compute_resource_id: 1, puppet_proxy_id: nil, certname: nil, image_id: nil, organization_id: 1, location_id: 2, type: "Host::Managed", otp: nil, realm_id: nil, compute_profile_id: 4, provision_method: "build", grub_pass: "$5$S5makfAftJJIxgUd$dhLS9YrQMfXjNGD.ADRzt2oGzZgvqJ...", discovery_rule_id: nil, global_status: 1, lookup_value_matcher: "fqdn=test-rhel7.example.com", pxe_loader: "PXELinux BIOS", initiated_at: "2020-07-19 17:18:22", build_errors: nil, openscap_proxy_id: nil>, :setCompute], @created=1595265934.121043, @timestamp=2020-07-20 17:25:34 UTC>]
2020-07-20T22:55:34 [I|app|ad7b7494] Processed 1 tasks from queue 'Host::Managed Main', completed 0/2
2020-07-20T22:55:34 [E|app|ad7b7494] Task 'Set up compute instance test-rhel7.example.com' *failed*
2020-07-20T22:55:34 [E|app|ad7b7494] Task 'Query instance details for test-rhel7.example.com' *canceled*
2020-07-20T22:55:34 [E|app|ad7b7494] Unprocessable entity Host::Managed (id: 24):
  Failed to find compute attributes, please check if VM test-rhel7.example.com was deleted

~~


Expected results:

1. Satellite should allow changing or setting content host values that are not related to the compute resource, e.g. the release version.


2. production.log should show a more meaningful message than just "Completed 500 Internal Server Error".

For example:
~~
Unable to reach vcenter.example.com. 
~~
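Until Satellite reports this itself, a client-side pre-check can produce the kind of message asked for in point 2 before an update is attempted. This is only a sketch under assumptions: check_vcenter is a hypothetical helper (not part of Satellite or Hammer), port 443 is assumed as the vCenter API port, and it relies on bash's /dev/tcp redirection plus coreutils timeout, so it fails within a few seconds instead of waiting out the roughly 60-second request seen in production.log above.

```bash
#!/usr/bin/env bash
# check_vcenter HOST [PORT] [SECONDS]
# Hypothetical helper; not part of Satellite or Hammer.
# Returns 0 if a TCP connection to HOST:PORT opens within SECONDS.
# Otherwise prints "Unable to reach HOST." to stderr and returns 1.
check_vcenter() {
  local host="$1" port="${2:-443}" secs="${3:-5}"
  # A child bash opens the connection on fd 3 via the /dev/tcp pseudo-device
  # and exits, which closes it again. timeout bounds the wait when packets are
  # silently dropped (as with the iptables DROP rule from the reproducer).
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    return 0
  fi
  echo "Unable to reach ${host}." >&2
  return 1
}
```

For example, check_vcenter vcenter.example.com && hammer host update --name <FQDN> --content-view testcv --organization-id 1 --location-id 2 only attempts the update when vCenter answers. It does not make the update succeed; it only replaces the long wait with a readable error.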


Additional info:
NA

Comment 3 Ondřej Ezr 2020-11-12 10:00:03 UTC

Created redmine issue https://projects.theforeman.org/issues/31307 from this bug

Comment 7 Bryan Kearney 2021-05-15 22:41:03 UTC

Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/31307 has been resolved.

Comment 8 Lukáš Hellebrandt 2021-07-14 10:02:18 UTC

Failed with Sat 6.10.0 snap 5.0.

TL;DR: Not much changed, and what did improve is very slow to use. Some error messages are now more meaningful, but the root issue seems to be unsolved. I'm open to other points of view, but right now I don't see this as passing verification.

Snap 5.0, i.e. before fix:
===
Using the reproducer from the OP through the web UI, editing a host that is associated with a CR that is unavailable due to the network times out with: "Oops, we're sorry but something went wrong execution expired"

Using Hammer:
# hammer host update --name mae-opsahl.vms.sat.rdu2.redhat.com --content-view testcv --organization-id 1 --location-id 2
Could not update the host:
Failed to find compute attributes, please check if VM mae-opsahl.vms.sat.rdu2.redhat.com was deleted

Snap 7.0, i.e. after fix:
===
Using the reproducer from the OP through the web UI, the edit page for a host associated with a CR that is unavailable due to the network takes several minutes to appear. It then allows editing some values (e.g. the content view), but saving the changes again takes several minutes and then fails with "Receiving vm data for host '<FQDN>' from used compute resource 'testvmware (VMware)' failed: 'Connection to compute resource timed out'."
=> This still doesn't allow editing. Opening the edit page works, but very slowly. I understand this is due to waiting for network connectivity, but it renders the new functionality almost unusable by default, even if it actually worked.

Using Hammer:
# hammer host update --name <FQDN> --content-view testcv --organization-id 1 --location-id 2
Could not update the host:
Receiving vm data for host '<FQDN>' from used compute resource 'testvmware (VMware)' failed: 'Connection to compute resource timed out'.
=> The error in Hammer has changed and probably does a better job of suggesting that this is a network issue. But I highly doubt this is the intended result. Wasn't the goal to make this actually editable, not only through the web UI but also through Hammer?

Creating a VMware host, changing the CR password to an incorrect one, and then going to Host -> Edit in the web UI leads to: "Oops, we're sorry but something went wrong InvalidLogin: Cannot complete login due to an incorrect user name or password."
Doing the same in Hammer:
# hammer host update --name <FQDN> --content-view testcv --organization-id 1 --location-id 2
Could not update the host:
Failed to find compute attributes, please check if VM <FQDN> was deleted
=> Nothing changed here. This is a similar case, but the CR unavailability is caused not by the network but by an incorrectly specified password. Some parameters of the host could and should be editable even in this case. The impact is low, though: most people probably don't expect hosts on a CR with an incorrectly specified password to work in any way. That being said, the error message in Hammer could be better. If this won't be fixed as part of this BZ, I may file a separate low-severity, low-priority BZ about it.
===

Comment 17 Ron Lavi 2022-07-07 08:00:12 UTC

Hi, while the issue is valid, it has existed since Satellite day 1 and is currently not going to change.
Modifying the host also tries to modify the host on the compute resource, and if the compute resource is unavailable, that would lead to an inconsistent state.
If the compute resource is down intentionally and the user still needs to modify the host, they first need to disassociate the host (edit host -> disassociate VM button) and then perform the modifications.

Feel free to re-open if the workaround doesn't fit this case.
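The disassociate-then-modify workaround can also be scripted from the command line. This is a sketch under assumptions: it assumes the installed Hammer provides hammer host disassociate as the CLI counterpart of the Disassociate VM button (check hammer host disassociate --help on your version, which may also need organization or location options), and run, disassociate_then_update, and DRY_RUN are hypothetical names. With DRY_RUN set, the commands are printed instead of executed so they can be reviewed first.

```bash
#!/usr/bin/env bash
# Runs a command, or only prints it when DRY_RUN is set and non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# disassociate_then_update FQDN CONTENT_VIEW ORG_ID LOC_ID
# Hypothetical wrapper for the workaround described above.
disassociate_then_update() {
  local fqdn="$1" cv="$2" org="$3" loc="$4"
  # Step 1: detach the host record from its VM so that saving the host
  # should no longer depend on the unreachable compute resource.
  # Stop here if this step fails.
  run hammer host disassociate --name "$fqdn" || return 1
  # Step 2: perform the original modification (here: the content view).
  run hammer host update --name "$fqdn" --content-view "$cv" \
    --organization-id "$org" --location-id "$loc"
}
```

For a dry run: DRY_RUN=1 disassociate_then_update <FQDN> testcv 1 2. Disassociation removes the link between the host and its VM, so compute-resource actions for that host are unavailable until the VM is associated again.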
