Created attachment 632063 [details]
engine

Description of problem:
error message "VdcBLLException: null" when libvirt & vdsm are in deadlock

Version-Release number of selected component (if applicable):
3.1/si21

How reproducible:
always

Steps to Reproduce:
1. Installed rhevm+dwh+reports
2. Created setup - DC, Host, Storage (iSCSI), VMs
3. Changed engine & vds date to 31/12/12

Actual results:
All VMs show "Not Responding"
Host CPU jumps to 99%
libvirt in deadlock while trying to destroy qemu process while it can't communicate with its monitor socket

Expected results:
should give such exceptions in engine log

Additional info:
logs
The problem is that when the engine tries to connect to vdsm while vdsm is in deadlock, the connection times out. But the engine also writes an exception with a null message to the log, which it shouldn't.

error message "VdcBLLException: null" when the engine tries to run a VM and the operation fails due to a network timeout exception
I'd also rename VdcBllException...
Is there a BZ for the deadlock, which is more important than the error message that we can clean up in 3.2?
(In reply to comment #3)
> Is there a BZ for the deadlock which is more important than the error
> message that we can clean up in 3.2

The deadlock originated in libvirt and is already solved in libvirt-0.9.10-21.el6_3.6.x86_64.
Better error reporting was merged [1], so now you should see something like:

2012-12-06 17:23:14,424 ERROR [org.ovirt.engine.core.bll.StopVmCommand] (pool-10-thread-48) [7c2d1e0a] Command org.ovirt.engine.core.bll.StopVmCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.util.concurrent.TimeoutException

[1] http://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=commit;h=6b7ae33a8ef9682adf00bf2495487d3617ffc99b
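For readers wondering how a log line can end in "null" in the first place, here is a minimal Java sketch. It is not the engine's code: SketchBllException, SketchNetworkException and ErrorMessageSketch are hypothetical stand-ins, and the log format is only an assumption modelled on the line above. It relies on standard java.lang.Throwable behaviour: an exception built with no message and no cause returns null from getMessage(), while one built from a cause reports that cause's toString().

```java
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a network-level exception; NOT the real VDSNetworkException.
class SketchNetworkException extends RuntimeException {
    SketchNetworkException(Throwable cause) {
        super(cause); // message becomes cause.toString()
    }
}

// Hypothetical stand-in for a business-logic exception; NOT the real VdcBLLException.
class SketchBllException extends RuntimeException {
    SketchBllException() {
        super(); // no message, no cause: getMessage() returns null
    }

    SketchBllException(Throwable cause) {
        super(cause); // message becomes cause.toString()
    }
}

class ErrorMessageSketch {
    // Assumed log format: a fixed prefix concatenated with getMessage().
    static String logLine(RuntimeException e) {
        return "With error message SketchBllException: " + e.getMessage();
    }

    // The underlying failure is dropped, so the log line ends in "null".
    static RuntimeException withoutCause() {
        return new SketchBllException();
    }

    // The underlying failure is chained, so it shows up in the log line.
    static RuntimeException withCause() {
        Throwable timeout = new TimeoutException();
        return new SketchBllException(new SketchNetworkException(timeout));
    }

    public static void main(String[] args) {
        System.out.println(logLine(withoutCause()));
        System.out.println(logLine(withCause()));
    }
}
```

The first line printed ends in "null"; the second ends in the chained java.util.concurrent.TimeoutException, which matches the shape of the improved message quoted above. Whether the merged change passes the cause exactly this way is not confirmed here.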
To test the change, you can block traffic to port 54321 on a host and see that the underlying exception is now printed.

host shell:
iptables -I INPUT --proto tcp --dport 54321 -j REJECT

engine:
2012-12-30 14:37:15,282 WARN [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-13) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = ab416c3a-4374-43b1-961b-008897d74b87 : suz, VDS Network Error, continuing. java.net.NoRouteToHostException: No route to host
Already solved and verified.
3.2 has been released