Looks like there was an SPM handover in the middle. Low severity if there was no functional impact. Was there any? Reproducible?
(In reply to Michal Skrivanek from comment #5)
> Looks like there was an SPM handover in the middle. Low severity if there was
> no functional impact. Was there any? Reproducible?

It reproduces in two different environments, one of which has not been updated, so it is not related to the update. There is no functional impact: the host successfully gets into maintenance, but an exception occurs that makes Ansible fail and raises an unknown error in the UI.
Note that any Ansible play that uses the following Ansible module may hit the exception:

    ovirt.ovirt.ovirt_host:
      state: maintenance
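For reference, a minimal sketch of an affected task (the host name and the ovirt_auth variable below are placeholders; any play that moves a host to maintenance through this module hits the same code path):

    # Minimal sketch of an affected task; host name and auth variable are placeholders
    - name: Move host to maintenance
      ovirt.ovirt.ovirt_host:
        auth: "{{ ovirt_auth }}"
        name: some_host
        state: maintenance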
Moving to the UX team for investigation; the UI error window is raised while nothing significant is logged in the backend, not even in DEBUG mode.
As the ovirt-maintain-host task also failed, this does not seem like just a UI issue; the Ansible failure should also be investigated, as it is the main issue blocking several QE jobs. We are trying to find a workaround and avoid using this host, but I would advise starting with the failed Ansible issue, unless you think that fixing the front end/UI can also resolve the Ansible failure.
StatusStorageThread::ERROR::2022-07-17 18:53:41,591::status_broker::98::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
FileNotFoundError: [Errno 2] No such file or directory: '/run/vdsm/storage/6f5c1326-822a-42d7-90b5-ec18e4054d50/e33de9f3-a354-4321-a574-5306be9798a7/a52870ad-2fcd-4315-8b1b-27e23f99a2b7'

Note that the reported missing file actually exists, but with "nobody:nobody" permissions (is this expected?):

[root@caracal01 ~]# ll /run/vdsm/storage/6f5c1326-822a-42d7-90b5-ec18e4054d50/e33de9f3-a354-4321-a574-5306be9798a7/a52870ad-2fcd-4315-8b1b-27e23f99a2b7
-rw-rw----. 1 nobody nobody 134217728 Jul 17 23:45 /run/vdsm/storage/6f5c1326-822a-42d7-90b5-ec18e4054d50/e33de9f3-a354-4321-a574-5306be9798a7/a52870ad-2fcd-4315-8b1b-27e23f99a2b7
[root@caracal01 ~]#
I don't think it matters in any way; there's no functional impact, just the bogus error window in the UI, it seems. Even if there are more issues, I'd like to start with that window first.
I disagree, @Michal, and will explain why: there is a clear QE automation impact, since host deactivation API calls also fail, which means multiple test cases that deactivate hosts will fail as well. This should get higher attention, as those missed tests will mask the true state of 4.5.2. Raising severity to high and marking this bug as an automation blocker. Customers that rely on SDK/API/Ansible scripts will also hit this failure, so it is worth fixing, and soon please. It's a clear regression, as it was not seen in the latest rhv-4.5.1-5 (engine 4.5.1.3-0.28). For some reason this occurs only on Hosted Engine environments and not on regular ones, but since we (QE) and customers mostly use HE environments, it's important enough to fix.
I can reproduce that via UI and REST on any host deactivation. When did it start occurring?
Impact on tests doesn't make the bug more severe. Priority-wise it's important to fix, but it's not a severe bug; there's no functional impact on the maintenance operation.
(In reply to Michal Skrivanek from comment #13)
> I can reproduce that via UI and REST on any host deactivation. When did it
> start occurring?

It started with the latest version, 4.5.2-1 (before it was released to QE).
There is no oVirt 4.5.2-1. If it reproduces in the nightly, can you pinpoint the day? If it's on QE downstream builds, can you please share the last passing one and the failing one? 4.5.0-11 passed?
Shmuel, can you please take a look? You've been the last one touching this area.

A simple REST API call to /api/hosts/[UUID]/deactivate that moves the host to Maintenance actually returns:

    <action>
      <fault>
        <detail>[]</detail>
        <reason>Operation Failed</reason>
      </fault>
      <status>failed</status>
    </action>
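For anyone who wants to reproduce it from a playbook as well, here is a minimal sketch of the same deactivate action issued through the generic uri module (the engine URL, host ID and credentials are placeholders, and certificate validation is disabled only for the sake of the example):

    - name: Deactivate a host via the oVirt REST API (illustrative sketch)
      ansible.builtin.uri:
        url: "https://engine.example.com/ovirt-engine/api/hosts/{{ host_id }}/deactivate"
        method: POST
        user: admin@internal
        password: "{{ engine_password }}"
        force_basic_auth: true
        validate_certs: false
        headers:
          Content-Type: application/xml
          Accept: application/xml
        body: "<action/>"
        return_content: true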
(In reply to Michal Skrivanek from comment #16)
> there is no oVirt 4.5.2-1. If it reproduces in nightly can you pinpoint the
> day?

About QE nightly runs: they started failing on Friday, July 17th; the issue reproduces with ovirt-engine-4.5.1.3-0.36.el8ev.noarch. Before that, the July 15th nightly failed at an earlier stage not related to this issue. On July 14th the nightly passed with ovirt-engine-4.5.1.3-654.af4ac851f145.35.el8ev.noarch. So this issue entered between July 14th and July 17th.

> If it's on QE downstream builds, can you please share the last passing
> one

rhv-4.5.1-5 was the last QE downstream build that passed.

> and the failing one?

rhv-4.5.2-1 was the failing one (which is the next downstream build QE got after rhv-4.5.1-5).

> For some reason this occurs only on Hosted Engine environments and not in
> regular ones

Correcting myself on this one: I double-checked and this issue occurs on ALL environments, also on non-HE environments.
(In reply to Avihai from comment #20)
> I double-checked and this issue occurs on ALL environments, also on non-HE
> environments.

I didn't succeed in reproducing this issue in a non-HE environment. But anyway, the fix I've just posted solves it in the HE environment, as I've checked.
Roni, please update here when you're able to run the tests with a build that includes [1].

[1] https://github.com/oVirt/ovirt-engine/commit/478fda94c0dd56a6a31efdd812fc7e6b38211299
This issue is a side effect of the fix for bz 1789389
Verified using automation version: 4.5.2-0.3.el8ev
This bugzilla is included in the oVirt 4.5.2 release, published on August 10th 2022. Since the problem described in this bug report should be resolved in the oVirt 4.5.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.