Description of problem:
Although the validation of add-vm fails, the command remains in command_entities. After engine restart, the callback is executed and fails to end the add-vm command. Since we don't set retry to false, it will fail periodically, as shown by the audit logs in the log file.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create VM such that the validation will fail
2.
3.

Actual results:
The command remains in FAILED state in command_entities.
If the end-action of add-vm (that closes the disk operations) fails, an audit log will be periodically produced.

Expected results:
No periodic audit log should be produced and no callback should be executed.

Additional info:
Created attachment 1176302 [details] engine_db_backup
Created attachment 1176303 [details] engine_log_1
Created attachment 1176304 [details] engine_log_2
Not sure if this is Storage's or Infra's, but definitely needs solving. Liron - can you take a look please?
Arik/Israel - we have a few issues here:
1. endAction() failure.
2. An orphaned CommandEntity record remains until the next engine restart for commands with callbacks when validate() fails.
3. endAction() is executed.

The endAction() failure shouldn't happen and I couldn't reproduce it. Israel - can you check the exact scenario again and see if it reproduces? Please also run the engine under DEBUG logging level and open a separate bug for that issue with the logs.
In this bug I'll handle #2, and I'll clone this bug to a new bug for #3.
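To make issue #2 concrete, here is a minimal sketch of the intended behavior: when validation fails for a command that has a callback, its persisted record is dropped, so nothing is left for a callback to pick up after a restart. This is not the engine code. Every type and method name below (SketchCommandStore, SketchCommand, SketchCommandRunner, run) is hypothetical, the store is an in-memory stand-in for command_entities, and the assumption that the record is persisted before validation is inferred only from the behavior described in this bug.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// All types below are hypothetical stand-ins, not ovirt-engine classes.

// In-memory stand-in for the command_entities table.
class SketchCommandStore {
    private final Set<UUID> records = new HashSet<>();

    void persist(UUID id) { records.add(id); }
    void remove(UUID id) { records.remove(id); }
    boolean contains(UUID id) { return records.contains(id); }
}

// A command reduced to the two properties that matter for this bug.
class SketchCommand {
    final UUID id = UUID.randomUUID();
    final boolean passesValidation;
    final boolean hasCallback;

    SketchCommand(boolean passesValidation, boolean hasCallback) {
        this.passesValidation = passesValidation;
        this.hasCallback = hasCallback;
    }
}

class SketchCommandRunner {
    private final SketchCommandStore store;

    SketchCommandRunner(SketchCommandStore store) {
        this.store = store;
    }

    // Returns true if the command passed validation and would go on to execute.
    boolean run(SketchCommand command) {
        // Assumption: commands with callbacks are persisted before validation.
        if (command.hasCallback) {
            store.persist(command.id);
        }
        if (!command.passesValidation) {
            // Nothing was started, so drop the record instead of leaving it
            // behind for a callback to find after the next restart.
            store.remove(command.id);
            return false;
        }
        // Execution would happen here; the record stays for the callback.
        return true;
    }
}
```

With this shape, a failed validation leaves the store empty for that command, while a command that passes validation keeps its record until its callback ends it.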
Liron, previously this problem did not happen in the reported flow (AddVm), and therefore it is a regression. (Sure, it seems like a general problem in coco, but AddVm is affected by this bug only now that coco is used in this flow, so I think we should treat it as a regression.)
Verified on ovirt-engine-4.0.2-0.1.rc.el7ev.noarch

Followed the steps provided by Liron:
1. Create a VM based on a template so that it fails on validation
2. Delete the template
3. Restart the engine