Bug 881803
| Summary: | engine: loop in event log report on task that failed with NPE | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> |
| Component: | ovirt-engine | Assignee: | Mooli Tayer <mtayer> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Leonid Natapov <lnatapov> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.1.0 | CC: | aberezin, acathrow, bazulay, dron, emesika, gklein, iheim, jkt, lpeer, lsvaty, mavital, pstehlik, Rhev-m-bugs, yeylon, yzaslavs |
| Target Milestone: | --- | | |
| Target Release: | 3.4.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | infra | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-12 14:04:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
This is a problem with the async task mechanism. The AsyncTaskMechanism will try to perform endSuccessfully or endWithFailure; if an NPE is thrown, the task can be stuck forever. Fixing this requires an infra change for async_tasks (such situations are currently handled by flow-related bugs; Dafna, I suggest you open a bug for the specific flow that hits the NPE). I am not sure we will be able to fix the "loop forever" issue in AsyncTaskManager for 3.2, so I suggest deferring this to a future release.

Mooli, please check whether this bug is still relevant.

Following Yair's comment, as for this flow: the NPE seems to be thrown from a null VM in VmHandler.UnLockVm. Investigating AddVmTemplateCommand to see how this is possible.

An NPE in such flows usually happens due to the compensation mechanism plus an Engine restart while the task is being created. However, this is not the case described above.

We have investigated this bug and could not reproduce it with the flow above. We need to investigate whether it happens on engine restart (when the object disappears from the DB), and then see whether the phenomenon still occurs.

Dafna, we could not reproduce this. Do you have a clear reproducer, or is it OK to close it as WORKSFORME?

Meital, please try to reproduce this issue.

Re-adding comment (Gil pointed out that I might have added it incorrectly :)

I came across the same issue in an older rhevm33 version, reproducible about 5% of the time. (I could not reproduce it on demand for log collection.) Steps to reproduce:

1. Create a template from a VM.
2. Wait for the VM status to become locked.
3. In the database, change the VM status in the vm_dynamic table from 15 (locked) to 0 (down).
4. Check the engine log.

At the moment there is no NPE; in my opinion this was fixed in a previous version. Some errors still appear, but the engine recovered successfully. I suggest closing this, as it currently works, and reopening it if the NPE appears in a similar workflow. Attaching engine.log of my workflow and of the action that got stuck while recovering from the error.

Created attachment 868105
lsvaty-engine.log

Moving to ON-QA per comment 9.

I suggest closing it, as it is fixed in the current release 3.4.0-0.3.master.el6ev.

Closing as part of 3.4.0.
Created attachment 654373
logs

Description of problem:
I have a task that failed with an NPE, and the failure is reported in an endless loop in the event log:

2012-Nov-29, 17:31 Failed to complete creation of Template BUG from VM <UNKNOWN>.
2012-Nov-29, 17:31 Failed to complete creation of Template BUG from VM <UNKNOWN>.
2012-Nov-29, 17:31 Failed to complete creation of Template BUG from VM <UNKNOWN>.
2012-Nov-29, 17:31 Failed to complete creation of Template BUG from VM <UNKNOWN>.

Version-Release number of selected component (if applicable):
si24.5

How reproducible:
Unknown.

Steps to Reproduce:
1. I created a template from a VM.
2. The command was hanging, so I reloaded the UI.
3.

Actual results:
The command ended with an NPE and the event log started reporting the event in a loop.

Expected results:
The event should not be reported in a loop.

Additional info:
Logs are attached. Please note that the task is not cleaned up:

[root@gold-vdsc ~]# vdsClient -s 0 getAllTasksInfo
1e17cd6e-5e64-440e-a1d3-9c4e58ecf576 : verb = copyImage id = 1e17cd6e-5e64-440e-a1d3-9c4e58ecf576
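To make the failure mode discussed in this bug concrete, here is a minimal sketch of a poll-based task manager, assuming (as the comments describe) that a finished task is only cleared after its end action (endSuccessfully/endWithFailure) completes. It is NOT the ovirt-engine AsyncTaskManager and not the fix that shipped in 3.4.0; the class `AsyncTaskLoopSketch`, the `EndAction` interface and the `clearOnFailure` flag are hypothetical names chosen for illustration. The end action dereferences a null VM, standing in for the NPE reported from VmHandler.UnLockVm.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of a poll-based async task manager.
// It is NOT the ovirt-engine AsyncTaskManager; all names are illustrative only.
public class AsyncTaskLoopSketch {

    /** Stand-in for the command's endSuccessfully/endWithFailure step. */
    interface EndAction {
        void run();
    }

    static final class Task {
        final String id;
        final EndAction endAction;

        Task(String id, EndAction endAction) {
            this.id = id;
            this.endAction = endAction;
        }
    }

    private final List<Task> pending = new ArrayList<>();
    private final List<String> eventLog = new ArrayList<>();
    private final boolean clearOnFailure; // hypothetical guard flag

    AsyncTaskLoopSketch(boolean clearOnFailure) {
        this.clearOnFailure = clearOnFailure;
    }

    /** One polling cycle over all finished tasks. */
    void poll() {
        Iterator<Task> it = pending.iterator();
        while (it.hasNext()) {
            Task task = it.next();
            try {
                task.endAction.run();
                it.remove(); // the task is cleared only after its end action succeeds
            } catch (RuntimeException e) {
                // Without the guard the task stays pending, so the same event is
                // logged again on every cycle: the endless loop seen in this bug.
                eventLog.add("Failed to complete task " + task.id);
                if (clearOnFailure) {
                    it.remove(); // report once, then stop retrying
                }
            }
        }
    }

    /** Runs one task whose end action hits a null VM; returns {events, stillPending}. */
    static int[] simulate(boolean clearOnFailure, int cycles) {
        AsyncTaskLoopSketch manager = new AsyncTaskLoopSketch(clearOnFailure);
        Object vm = null; // the VM is already gone when the end action runs
        manager.pending.add(new Task("copyImage", () -> vm.hashCode())); // throws NPE
        for (int i = 0; i < cycles; i++) {
            manager.poll();
        }
        return new int[] {manager.eventLog.size(), manager.pending.size()};
    }

    public static void main(String[] args) {
        int[] noGuard = simulate(false, 4);
        int[] withGuard = simulate(true, 4);
        System.out.println("no guard:   events=" + noGuard[0] + " pending=" + noGuard[1]);
        System.out.println("with guard: events=" + withGuard[0] + " pending=" + withGuard[1]);
    }
}
```

Over four polling cycles the unguarded variant logs the failure four times and keeps the task pending, while the guarded variant logs it once and drops the task. A real fix would also have to release the VM lock and run compensation, and the leftover copyImage task on the host (shown by vdsClient above) lives on the VDSM side, which this sketch does not model.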