Created attachment 1511662 [details] engine & vdsm logs Description of problem: API deactivate host Post request failed with ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-17) [hosts_syncAction_c88085a1-1fa6-4f1b] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '55b614dc-f75d-11e8-80fd-001a4a168bfc' as there are uncleared tasks 'Task '791e2033-3b6a-469b-90c6-966ed6691d5e', status 'finished'' Version-Release number of selected component (if applicable): ovirt-engine-4.2.7.5-0.1.el7ev.noarch vdsm vdsm-4.20.43-1.el7ev.x86_64 libvirt libvirt-4.5.0-10.el7_6.3.x86_64 qemu-img-rhev-2.12.0-18.el7_6.1.x86_64 How reproducible: once in automation run https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-tier1/272/ Steps to Reproduce: 1. Send deactivate host action Action request content is -- url:/ovirt-engine/api/hosts/5c19ab08-36c7-4efe-bec2-47b75ccec856/deactivate body:<action> <async>false</async> <grace_period> <expiry>10</expiry> </grace_period> </action> Actual results: deactivation fails with ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (default task-17) [hosts_syncAction_c88085a1-1fa6-4f1b] SpmStopVDSCommand::Not stopping SPM on vds 'host_mixed_2', pool id '55b614dc-f75d-11e8-80fd-001a4a168bfc' as there are uncleared tasks 'Task '791e2033-3b6a-469b-90c6-966ed6691d5e', status 'finished'' Expected results: host is deactivated Additional info:for qe - rhevmtests/compute/sla/multiple_pinning/multiple_pinning_test.py (TestMultiplePinning01)
Raising severity as this bug fails 10 cases in our last tier1 cycle. Also adding this piece from the engine.log that looks very suspicious: 2018-12-04 08:05:31,284+02 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [63214c63] SPMAsyncTask::PollTask: Polling task '791e2033-3b6a-469b-90c6-966ed6691d5e' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'success'. 2018-12-04 08:05:31,284+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedThreadFactory-engineScheduled-Thread-34) [63214c63] BaseAsyncTask::logEndTaskFailure: Task '791e2033-3b6a-469b-90c6-966ed6691d5e' (Parent Command 'Unknown', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure: -- Result: 'success' -- Message: '', -- Exception: '[null]'
the engine.log is not complete and I suppose we'll need to see the whole thing, but it anyway looks like some pending/stuck/stale storage operation, reassigning to Storage
It seems like you have a pending task in which case the SPM should indeed not be deactivated, any actions you ran before trying to deactivate the host?
Created attachment 1513078 [details] engine.log for the automation run The full engine.log from the run is attached. Nothing specific was done to the host before. The test starts on the environment with three hosts in Up state. scenario: Start VM golden_env_mixed_virtio_1_0 Get the VM's host - host_mixed_1 , Stop VM. Deactivate the host host_mixed_1. with {'expected_status': 'maintenance'} Start VM golden_env_mixed_virtio_1_0 Get the VM golden_env_mixed_virtio_1_0 host - host_mixed_2. Stop vm golden_env_mixed_virtio_1_0. Deactivate the host host_mixed_2. with {'expected_status': 'maintenance'} 05:34:27 2018-12-04 05:34:27,047 ERROR Result: FAILED
Can you check in the database or on the host which tasks was running on the SPM?
This environment was changed already. So, I can't run 'vdsm-client Host getAllTasks' on the host. Could be this info extracted from the vdsm.logs?
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
need more logs for vdsm and engine, the command for the task is not in the logs.
The deactivation action happens at 2018-12-04 05:34:26 and the attached logs include this period of time. engine.log starts at 2018-12-04 04:55:25 - ends at 2018-12-04 06:01:01 vdsm.log 2018-12-04 04:55:25 - 2018-12-04 06:01:01 Since this run was in December I have no more logs to add since the infra team removes the old jobs. If I see such incident again in further runs I'll add additional info to this BZ.
(In reply to Polina from comment #9) > The deactivation action happens at 2018-12-04 05:34:26 > and the attached logs include this period of time. > engine.log starts at 2018-12-04 04:55:25 - ends at 2018-12-04 06:01:01 > vdsm.log 2018-12-04 04:55:25 - 2018-12-04 06:01:01 > > Since this run was in December I have no more logs to add since the infra > team removes the old jobs. > If I see such incident again in further runs I'll add additional info to > this BZ. @Polin do you think it's the same as? https://bugzilla.redhat.com/show_bug.cgi?id=1664045
not sure . I think that Bz 1664045 talks about "no error to the user". Here the deactivation fails with error reported to logs
followed reproducing steps, and I got no error. when the virtual machines where running, the result was that there virtual machines running. when I shut down virtual machines, the action has been completed successfully. from the attached logs, we can get enough info about the running task on SPM host. if its long task or stuck task! <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <action> <async>false</async> <grace_period> <expiry>10</expiry> </grace_period> <host href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412" id="f8c911e3-e205-4afd-9f06-2fb51b3ef412"> <actions> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/refresh" rel="refresh"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/install" rel="install"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/activate" rel="activate"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/deactivate" rel="deactivate"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/upgrade" rel="upgrade"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/setupnetworks" rel="setupnetworks"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/syncallnetworks" rel="syncallnetworks"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/approve" rel="approve"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/commitnetconfig" rel="commitnetconfig"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/enrollcertificate" rel="enrollcertificate"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/fence" rel="fence"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/forceselectspm" rel="forceselectspm"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/iscsidiscover" rel="iscsidiscover"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/iscsilogin" rel="iscsilogin"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/unregisteredstoragedomainsdiscover" rel="unregisteredstoragedomainsdiscover"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/upgradecheck" rel="upgradecheck"/> </actions> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/nics" rel="nics"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/affinitylabels" rel="affinitylabels"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/fenceagents" rel="fenceagents"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/devices" rel="devices"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/hooks" rel="hooks"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/numanodes" rel="numanodes"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/storage" rel="storage"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/networkattachments" rel="networkattachments"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/storageconnectionextensions" rel="storageconnectionextensions"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/unmanagednetworks" rel="unmanagednetworks"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/externalnetworkproviderconfigurations" rel="externalnetworkproviderconfigurations"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/katelloerrata" rel="katelloerrata"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/permissions" rel="permissions"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/tags" rel="tags"/> <link href="/ovirt-engine/api/hosts/f8c911e3-e205-4afd-9f06-2fb51b3ef412/statistics" rel="statistics"/> </host> <status>complete</status> </action>
Hi Ahmad, the problem was not reproduced easily with the described steps. I just described the scenario in which it was faced once with the hope that some hint could be found in logs. If it doesn't help and we don't find what could be fixed in this area, I think we could close the issue. If it happens again in our automation runs, I'll reopen the bz and will try to get the host's tasks the 'vdsm-client Host getAllTasks'.