Bug 1858923 - Can not modify associated hosts if connection to vCenter is down from the Satellite server.
Keywords:
Status: NEW
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Compute Resources - VMWare
Version: 6.7.0
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Ondřej Ezr
QA Contact: Lukáš Hellebrandt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-20 18:46 UTC by Sayan Das
Modified: 2022-05-25 00:30 UTC (History)

Fixed In Version: foreman-2.5.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 31307 | 0 | Normal | New | Can not modify associated hosts if connection to vCenter is down from the Satellite server. | 2021-02-19 11:33:40 UTC
Red Hat Knowledge Base (Solution) | 5237461 | 0 | None | None | None | 2020-07-20 20:55:28 UTC
Red Hat Knowledge Base (Solution) | 5525281 | 0 | None | None | None | 2021-06-25 16:49:06 UTC

Description Sayan Das 2020-07-20 18:46:02 UTC
Description of problem:

Unable to set or change the release version, content view, or lifecycle environment (LCE) of a content host that is associated with a VMware compute resource while the connection from the Satellite server to vCenter is down.


Version-Release number of selected component (if applicable):
Satellite 6.7 (older versions as well)

How reproducible:
Always


Steps to Reproduce:
1. Configure a VMware compute resource.
2. Build a host from Satellite using the compute resource.
3. Use the following command to block the connection from Satellite to vCenter.
   # iptables -I OUTPUT -d vcenter.example.com -j DROP

4. Try to set or change the release version, content view (CV), or lifecycle environment (LCE) of the host.


Actual results:

* The GUI waits for some time and then shows the following error.
~~
An error occurred saving the Content Host: Failed to find compute attributes, please check if VM test-rhel7.example.com was deleted
~~

* At the same time, the following details are logged in production.log.
~~
2020-07-20T22:54:33 [I|app|ad7b7494] Started PUT "/api/v2/hosts/24" for 10.74.9.157 at 2020-07-20 22:54:33 +0530
2020-07-20T22:54:33 [I|app|ad7b7494] Processing by Api::V2::HostsController#update as JSON
2020-07-20T22:54:33 [I|app|ad7b7494]   Parameters: {"id"=>"24", "host"=>{"subscription_facet_attributes"=>{"id"=>19, "autoheal"=>true, "purpose_role"=>"", "purpose_usage"=>"", "service_level"=>"", "release_version"=>"7Server"}}, "apiv"=>"v2"}
2020-07-20T22:55:06 [W|app|22f93f5a] Action failed
2020-07-20T22:55:06 [I|app|22f93f5a] Deface: [WARNING] No :original defined for 'change 500 page content', you should change its definition to include:
 :original => '35d2b4f7aac0c083740c6de6775473457e9ae9d8' 
2020-07-20T22:55:06 [I|app|22f93f5a]   Rendering common/500.html.erb
2020-07-20T22:55:06 [I|app|22f93f5a]   Rendered common/500.html.erb (5.4ms)
2020-07-20T22:55:06 [I|app|22f93f5a] Completed 500 Internal Server Error in 60039ms (Views: 19.5ms | ActiveRecord: 3.2ms)
2020-07-20T22:55:20 [I|app|1c9b8bb8] Started GET "/notification_recipients" for 10.74.9.157 at 2020-07-20 22:55:20 +0530
2020-07-20T22:55:20 [I|app|1c9b8bb8] Processing by NotificationRecipientsController#index as JSON
2020-07-20T22:55:20 [I|app|1c9b8bb8] Completed 200 OK in 12ms (Views: 0.1ms | ActiveRecord: 2.7ms)
2020-07-20T22:55:34 [I|app|ad7b7494] Adding Compute instance for test-rhel7.example.com
2020-07-20T22:55:34 [W|app|ad7b7494] Failed to find compute attributes, please check if VM test-rhel7.example.com was deleted
2020-07-20T22:55:34 [W|app|ad7b7494] Rolling back due to a problem: [#<Orchestration::Task:0x00007f479fe84a48 @name="Set up compute instance test-rhel7.example.com", @id="Set up compute instance test-rhel7.example.com", @status="failed", @priority=3, @action=[#<Host::Managed id: 24, name: "test-rhel7.example.com", last_compile: "2020-07-19 20:19:25", last_report: nil, updated_at: "2020-07-19 20:19:25", created_at: "2020-07-17 20:57:27", root_pass: "$5$S5makfAftJJIxgUd$dhLS9YrQMfXjNGD.ADRzt2oGzZgvqJ...", architecture_id: 1, operatingsystem_id: 4, environment_id: nil, ptable_id: 106, medium_id: nil, build: false, comment: "", disk: "", installed_at: "2020-07-19 17:24:08", model_id: 1, hostgroup_id: 1, owner_id: 4, owner_type: "User", enabled: true, puppet_ca_proxy_id: nil, managed: true, use_image: nil, image_file: nil, uuid: "5000ed8f-054b-f2c8-80ba-f993a114d1a5", compute_resource_id: 1, puppet_proxy_id: nil, certname: nil, image_id: nil, organization_id: 1, location_id: 2, type: "Host::Managed", otp: nil, realm_id: nil, compute_profile_id: 4, provision_method: "build", grub_pass: "$5$S5makfAftJJIxgUd$dhLS9YrQMfXjNGD.ADRzt2oGzZgvqJ...", discovery_rule_id: nil, global_status: 1, lookup_value_matcher: "fqdn=test-rhel7.example.com", pxe_loader: "PXELinux BIOS", initiated_at: "2020-07-19 17:18:22", build_errors: nil, openscap_proxy_id: nil>, :setCompute], @created=1595265934.121043, @timestamp=2020-07-20 17:25:34 UTC>]
2020-07-20T22:55:34 [I|app|ad7b7494] Processed 1 tasks from queue 'Host::Managed Main', completed 0/2
2020-07-20T22:55:34 [E|app|ad7b7494] Task 'Set up compute instance test-rhel7.example.com' *failed*
2020-07-20T22:55:34 [E|app|ad7b7494] Task 'Query instance details for test-rhel7.example.com' *canceled*
2020-07-20T22:55:34 [E|app|ad7b7494] Unprocessable entity Host::Managed (id: 24):
  Failed to find compute attributes, please check if VM test-rhel7.example.com was deleted

~~
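The excerpt above interleaves several requests, which can be told apart by the ID in the bracketed prefix (for example, the PUT is logged under ad7b7494, while the "Action failed" warning appears under 22f93f5a). As a reading aid, here is a minimal Python sketch that groups such lines by that ID. The line layout is inferred only from this excerpt, and the function and variable names are hypothetical.

```python
import re
from collections import defaultdict

# Prefix layout inferred from the excerpt: "<timestamp> [<level>|<logger>|<id>] <message>"
LINE_RE = re.compile(r"^(\S+) \[(\w)\|(\w+)\|(\w+)\] (.*)$")


def group_by_request(lines):
    """Group log lines by the ID found in the bracketed prefix.

    Lines without a prefix are treated as continuations of the previous entry.
    """
    groups = defaultdict(list)
    current = None
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            timestamp, level, _logger, request_id, message = match.groups()
            current = request_id
            groups[current].append((timestamp, level, message))
        elif current is not None and line.strip():
            # Continuation line: append it to the last message of the current ID.
            timestamp, level, message = groups[current][-1]
            groups[current][-1] = (timestamp, level, message + "\n" + line.rstrip())
    return dict(groups)


# A few lines copied from the excerpt above.
SAMPLE = [
    "2020-07-20T22:54:33 [I|app|ad7b7494] Processing by Api::V2::HostsController#update as JSON",
    "2020-07-20T22:55:06 [W|app|22f93f5a] Action failed",
    "2020-07-20T22:55:34 [I|app|ad7b7494] Adding Compute instance for test-rhel7.example.com",
    "2020-07-20T22:55:34 [E|app|ad7b7494] Unprocessable entity Host::Managed (id: 24):",
    "  Failed to find compute attributes, please check if VM test-rhel7.example.com was deleted",
]

if __name__ == "__main__":
    for request_id, entries in group_by_request(SAMPLE).items():
        print(request_id)
        for timestamp, level, message in entries:
            print(f"  {timestamp} {level} {message}")
```

On a full log, the same function could be fed `open("production.log")` (path assumed), since iterating over a file object yields lines.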


Expected results:

1. Satellite should allow setting or changing content host values that are not related to the compute resource, e.g. the release version.

2. production.log should show a more meaningful message than just "Completed 500 Internal Server Error", for example:
~~
Unable to reach vcenter.example.com. 
~~
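To illustrate the expected behavior, here is a minimal Python sketch under stated assumptions. It is not Foreman's implementation: `update_host`, `fetch_vm`, `ComputeResourceUnreachable`, and the `COMPUTE_RELATED` set are hypothetical names chosen for illustration. The idea is to contact the compute resource only when a compute-related attribute changes, and to turn a timeout into a message that names the unreachable server.

```python
# Hypothetical set of attributes that require talking to the compute resource.
COMPUTE_RELATED = {"compute_attributes", "compute_profile_id", "compute_resource_id"}


class ComputeResourceUnreachable(Exception):
    """Raised when the compute resource cannot be contacted (hypothetical)."""


def update_host(host, changes, fetch_vm):
    """Apply `changes` to the `host` dict.

    `fetch_vm` is a caller-supplied function that looks up the VM on the
    compute resource and may raise TimeoutError. It is called only when the
    update touches a compute-related attribute.
    """
    if COMPUTE_RELATED.intersection(changes):
        try:
            fetch_vm(host["uuid"])
        except TimeoutError as exc:
            # Report the actual cause instead of a generic server error.
            raise ComputeResourceUnreachable(
                f"Unable to reach {host['compute_resource']}."
            ) from exc
    host.update(changes)
    return host


if __name__ == "__main__":
    def unreachable(_uuid):
        # Simulates a blocked connection to vCenter.
        raise TimeoutError("execution expired")

    host = {
        "name": "test-rhel7.example.com",
        "uuid": "5000ed8f-054b-f2c8-80ba-f993a114d1a5",
        "compute_resource": "vcenter.example.com",
    }
    # A content-only change does not need the compute resource.
    print(update_host(host, {"release_version": "7Server"}, unreachable))
```

With this split, a change to the release version succeeds while vCenter is blocked, and a change to a compute-related attribute fails with the message suggested above.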


Additional info:
NA

Comment 3 Ondřej Ezr 2020-11-12 10:00:03 UTC
Created redmine issue https://projects.theforeman.org/issues/31307 from this bug

Comment 7 Bryan Kearney 2021-05-15 22:41:03 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/31307 has been resolved.

Comment 8 Lukáš Hellebrandt 2021-07-14 10:02:18 UTC
Failed with Sat 6.10.0 snap 5.0.

TL;DR: not much has changed, and what did improve is very slow to use. Some error messages are now more meaningful, but the root issue seems to be unsolved. I'm open to other points of view, but right now I don't see this as verification material.


Snap 5.0, i.e. before fix:
===
Using the reproducer from the OP through the WebUI, trying to edit a host that is associated with a CR that is unavailable due to the network times out: "Oops, we're sorry but something went wrong execution expired"

Using Hammer:
# hammer host update --name mae-opsahl.vms.sat.rdu2.redhat.com --content-view testcv --organization-id 1 --location-id 2
Could not update the host:
  Failed to find compute attributes, please check if VM mae-opsahl.vms.sat.rdu2.redhat.com was deleted

Creating a VMware host, changing the CR password to an incorrect one, and then going to Host -> Edit in the WebUI leads to: "Oops, we're sorry but something went wrong InvalidLogin: Cannot complete login due to an incorrect user name or password."
Doing the same in Hammer:
# hammer host update --name <FQDN> --content-view testcv --organization-id 1 --location-id 2
Could not update the host:
  Failed to find compute attributes, please check if VM <FQDN> was deleted
===

Snap 7.0, i.e. after fix:
===
Using the reproducer from the OP through the WebUI: when editing a host associated with a CR that is unavailable due to the network, the edit page takes several minutes to appear. It then allows editing some values (e.g. content view), but saving the changes again takes several minutes and then fails with "Receiving vm data for host '<FQDN>' from used compute resource 'testvmware (VMware)' failed: 'Connection to compute resource timed out'."
=> This still doesn't allow editing. Opening the edit page works, but very slowly. I understand this is due to waiting for network connectivity, but it would make the new functionality almost unusable by default even if saving actually worked.

Using Hammer:
# hammer host update --name <FQDN> --content-view testcv --organization-id 1 --location-id 2
Could not update the host:
  Receiving vm data for host '<FQDN>' from used compute resource 'testvmware (VMware)' failed: 'Connection to compute resource timed out'.
=> The error in Hammer has changed and probably does a better job of suggesting that this is a network issue. But I highly doubt this is the intended result. Wasn't the goal to make the host actually editable, not only through the WebUI but also through Hammer?

Creating a VMware host, changing the CR password to an incorrect one, and then going to Host -> Edit in the WebUI leads to: "Oops, we're sorry but something went wrong InvalidLogin: Cannot complete login due to an incorrect user name or password."
Doing the same in Hammer:
# hammer host update --name <FQDN> --content-view testcv --organization-id 1 --location-id 2
Could not update the host:
  Failed to find compute attributes, please check if VM <FQDN> was deleted
=> Nothing changed here. This is a similar case, but the CR is unavailable because of an incorrectly specified password rather than the network. Some parameters of the host could and should be editable even in this case. The impact is low, though: most people probably don't expect hosts on a CR with an incorrectly specified password to work in any way. That said, the error message in Hammer could be better. If you won't fix this as part of this BZ, I may file a separate BZ with low severity and low priority for it.
===
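
One possible way to avoid the multi-minute waits described above is to probe the compute resource with a short, bounded timeout before attempting any slow operation against it. The following Python sketch is an illustration only, not how Satellite behaves: `is_reachable` is a hypothetical helper, and port 443 and the 3-second timeout are assumed defaults.

```python
import socket


def is_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    With a DROP rule in place, the connection attempt would otherwise hang,
    so the timeout bounds how long the caller waits.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


if __name__ == "__main__":
    # vcenter.example.com is the placeholder host name used in this report.
    print(is_reachable("vcenter.example.com"))
```

A caller could use the result to skip compute-resource queries and show a connectivity warning right away, rather than waiting for each request to time out.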

