Created attachment 644870: logs (vdsm, rhevm)

Description of problem:
VDSM reports the wrong state to the engine. While libvirt is deadlocked and not responding, the host stays in the “UP” state.

Version-Release number of selected component (if applicable):
RHEVM 3.1 - SI24.1
RHEVM: rhevm-3.1.0-28.el6ev.noarch
VDSM: vdsm-4.9.6-42.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.5.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.5.x86_64
SANLOCK: sanlock-2.3-4.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the HSM server, put libvirt into a deadlock by attaching gdb to the libvirt process.

Actual results:
Libvirt enters a deadlock.
VDSM fails to respond to “vdsClient -s 0 getVdsCaps”.
The engine sends “vdsClient -s 0 getVdsStats”, receives a response, and concludes that there is no problem on the host.
VDSM reports status “UP”.

Expected results:
The engine should send “vdsClient -s 0 getVdsCaps” and, if it gets no response, move the host to the “Non Responsive” state. In this case, only “vdsClient -s 0 getVdsStats” is sent, and it does get a response.

Additional info:
Real-life scenario:
1. Create an iSCSI DC with 2 hosts.
2. Create a VM with multiple disks on multiple storage domains.
3. Run the VM on the HSM.
4. Install the OS (RHEL 6.3).
5. Install the RHEV Agent (Guest Agent).
6. Create a snapshot.
7. Snapshot --> Preview
8. Snapshot --> Commit
9. Snapshot --> Delete snapshot
10. Power on the VM. The OS gets stuck during boot.
11. Powering off the VM fails.
12. Powering on newly created VMs in the same DC fails.
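The check described under Expected results could look roughly like the sketch below. This is a minimal illustration and not VDSM or engine code. check_host, probe_caps, probe_stats, deadlocked_caps and healthy_probe are hypothetical names. The probes are plain callables standing in for the getVdsCaps and getVdsStats verbs. The sketch assumes a host counts as “UP” only if both verbs answer within a timeout. Under that rule, a host whose getVdsStats still answers while getVdsCaps hangs, as in this report, is marked “Non Responsive”.

```python
import concurrent.futures
import time

# Hypothetical state labels, taken from the state names used in this report.
UP = "UP"
NON_RESPONSIVE = "Non Responsive"


def check_host(probe_caps, probe_stats, timeout=2.0):
    """Return UP only if both probes answer within `timeout` seconds.

    probe_caps and probe_stats are hypothetical stand-ins for the
    getVdsCaps and getVdsStats verbs. Any exception or timeout from
    either one marks the host NON_RESPONSIVE.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    try:
        # Probe getVdsCaps first: in this report it is the call that hangs.
        for probe in (probe_caps, probe_stats):
            future = pool.submit(probe)
            try:
                future.result(timeout=timeout)
            except Exception:  # includes concurrent.futures.TimeoutError
                return NON_RESPONSIVE
        return UP
    finally:
        # Return immediately instead of waiting for a probe stuck in a deadlock.
        pool.shutdown(wait=False)


def deadlocked_caps():
    # Simulates getVdsCaps blocked on a frozen libvirt (bounded here so the
    # example script can still exit).
    time.sleep(0.5)


def healthy_probe():
    return {}


print(check_host(deadlocked_caps, healthy_probe, timeout=0.1))  # Non Responsive
print(check_host(healthy_probe, healthy_probe, timeout=0.1))    # UP
```

The key design choice is to require a successful getVdsCaps rather than trusting getVdsStats alone. The 2-second default timeout is an arbitrary assumption, not a value taken from RHEV.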
After a discussion with Yaniv.K and Miki, we concluded that there seems to be only one scenario in which libvirt deadlocks. If more issues come up, we'll reopen this BZ. For now, we are closing it as DEFERRED.