Bug 1564567
Summary: | User Interface does not come up after reboot | |||
---|---|---|---|---|
Product: | Red Hat CloudForms Management Engine | Reporter: | Ryan Spagnola <rspagnol> | |
Component: | Appliance | Assignee: | Joe Rafaniello <jrafanie> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tasos Papaioannou <tpapaioa> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 5.7.0 | CC: | abellott, cpelland, jrafanie, obarenbo, tpapaioa | |
Target Milestone: | GA | Keywords: | TestOnly, ZStream | |
Target Release: | 5.10.0 | |||
Hardware: | All | |||
OS: | All | |||
Whiteboard: | ||||
Fixed In Version: | 5.10.0.0 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1568158 1568159 | Environment: | |
Last Closed: | 2019-02-11 14:01:50 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | Bug | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | CFME Core | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1568158, 1568159 |
Description
Ryan Spagnola
2018-04-06 15:50:30 UTC
We had a discussion with the customer with the hope of fixing the problem but also trying to understand the root cause.

The customer reported they rebooted appliances after reconfiguring memory thresholds. When the appliances were rebooted, the server responsible for distributing roles (master server) was changed. The new master server was then encountering a timeout when it was activating roles. This prevented restarted appliances from being given roles. Upon further inspection, we found a higher latency to the database from the master server encountering this timeout. This latency could be responsible for the inability to assign roles due to the timeout. We forced the master server to move to a different appliance without such a large latency. When a new master server took over, previously restarted appliances started to be given roles as expected.

We believe the default 1 minute timeout for this very important work is too small so we will be increasing it.

New commit detected on ManageIQ/manageiq/master:
https://github.com/ManageIQ/manageiq/commit/1f564cddadf625bfaf044fa6b1b6932f45c8d8dd

commit 1f564cddadf625bfaf044fa6b1b6932f45c8d8dd
Author: Joe Rafaniello <jrafanie>
AuthorDate: Fri Apr 6 17:37:03 2018 -0400
Commit: Joe Rafaniello <jrafanie>
CommitDate: Fri Apr 6 17:37:03 2018 -0400

    Add timeout knob for monitoring server roles

    https://bugzilla.redhat.com/show_bug.cgi?id=1564567

    Monitoring server roles as the master server is so important, it should finish and not ever timeout. If it times out, servers will not be able to gain roles.

    Previously, the default lock timeout of 1 minute is too low in situations where the master server has higher than normal latency to the database. We need to give it more time to finish before timing it out.

    Additionally, we can specify this value in advanced settings in the server section if 5.minutes is still not enough or just a wrong value.

 app/models/miq_server/role_management.rb | 6 +-
 config/settings.yml | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

We added a monitor_server_roles_timeout setting in the "advanced settings" "server" section. We now default to 5 minutes, previously 1 minute, and this value can be configured on a case by case basis.

Verified on 5.10.0.3.
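
For readers who want to see the shape of the change, here is a minimal standalone sketch of a configurable timeout around role monitoring. It is not the code from the commit: only the setting name monitor_server_roles_timeout, its "server" section, and the 5 minute default come from this report. The DEFAULT_SETTINGS hash, the method bodies, and the use of Ruby's standard Timeout module in place of the lock timeout mentioned in the commit message are assumptions made for illustration.

```ruby
require "timeout"

# Hypothetical stand-in for the advanced settings "server" section.
# Only the key name and the 5 minute default are taken from the bug report;
# the value is expressed here in plain seconds.
DEFAULT_SETTINGS = { server: { monitor_server_roles_timeout: 300 } }.freeze

# Returns the configured timeout in seconds, falling back to the default
# when the setting is absent.
def monitor_server_roles_timeout(settings = DEFAULT_SETTINGS)
  settings.dig(:server, :monitor_server_roles_timeout) ||
    DEFAULT_SETTINGS.dig(:server, :monitor_server_roles_timeout)
end

# Runs the role-monitoring work (the given block) under the configured limit.
# Returns :completed, or :timed_out if the work exceeded the limit.
# Timeout.timeout is used here only to illustrate the idea of a tunable limit.
def monitor_server_roles(settings = DEFAULT_SETTINGS)
  Timeout.timeout(monitor_server_roles_timeout(settings)) do
    yield
  end
  :completed
rescue Timeout::Error
  :timed_out
end
```

With a sketch like this, a master server with high database latency could be given more headroom by overriding the one value, for example monitor_server_roles(server: { monitor_server_roles_timeout: 600 }) { ... }, while other appliances keep the default.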