Created attachment 696148
logs

Description of problem:
I blocked my iSCSI storage domain (1 domain) from both of my hosts using iptables. Connecting to supervdsm failed due to a failure in prepareForShutdown, and vdsm is "stuck":

MainThread::DEBUG::2013-02-11 13:58:14,051::clientIF::228::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
MainThread::DEBUG::2013-02-11 13:58:15,051::clientIF::228::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently

Version-Release number of selected component (if applicable):
si27
vdsm-4.10.2-1.3.el6.x86_64
libvirt-0.10.2-18.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a pool with 1 iSCSI storage domain on a two-host cluster.
2. Block connectivity to the storage from both hosts using iptables.

Actual results:
Connecting to supervdsm failed due to a failure in prepareForShutdown, which leaves vdsm "stuck".

Expected results:
vdsm should succeed in prepareForShutdown.

Additional info:
logs
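The repeated log line suggests that prepareForShutdown is guarded against re-entry: once one call is stuck (here, presumably, waiting on supervdsm), every later call only logs the message and returns. A minimal Python sketch of such a guard, assuming a non-blocking lock; `ShutdownGuard` and its `log` parameter are hypothetical names, not vdsm's clientIF code.

```python
import threading


class ShutdownGuard:
    """Hypothetical guard: allow only one prepare-for-shutdown run at a time."""

    def __init__(self, log=print):
        self._lock = threading.Lock()
        self._log = log

    def prepare_for_shutdown(self, work):
        # Non-blocking acquire: if another call is already running (or stuck),
        # refuse immediately instead of waiting behind it.
        if not self._lock.acquire(blocking=False):
            self._log("cannot run prepareForShutdown concurrently")
            return False
        try:
            work()
            return True
        finally:
            self._lock.release()
```

If `work` never returns, the lock is never released, so every later call logs the same message, which matches the once-per-second lines in the log above.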
The socket file used to communicate with supervdsm did not exist. I am still working out how that happened, but as we can see, it can happen. I'm not sure it is related to the failure in prepareForShutdown, because after the reset we also clean up the socket file and create it again. Until now, when the connect failed, we simply tried again and that was it. My patch changes this process: now we try to communicate with the supervdsm socket 3 times, and if all attempts fail, we kill the supervdsm process and start it again. The socket is created as part of the process's initialization.
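A minimal sketch of the retry-then-restart flow described in the previous comment, in Python. It assumes supervdsm listens on a Unix-domain socket that it recreates during startup; `connect_with_retries` and the `restart` callback are hypothetical names, not vdsm's actual API.

```python
import socket
import time


def connect_with_retries(path, restart, attempts=3, delay=0.1):
    """Connect to the Unix-domain socket at `path`, retrying on failure.

    `restart` is a caller-supplied callable (hypothetical) that kills and
    respawns the server. Because the server creates its socket during
    initialization, one more connect is tried after the restart; if that
    also fails, the OSError propagates to the caller.
    """
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError:
            # Missing socket file, refused connection, etc.
            sock.close()
            if attempt + 1 < attempts:
                time.sleep(delay)
    # All attempts failed: restart the server so it recreates the socket file.
    restart()
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    return sock
```

Bounding the retries matters here: an unbounded "try again" loop is exactly what can leave the caller stuck when the socket file has disappeared.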
Instead of killing the process and starting it again, I changed it so that vdsm panics and kills itself; respawn then restarts vdsm. This scenario has never happened to me, so if it is reproducible as described in the bug description, please let me know.
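The respawn mechanism itself is not shown in this bug. As a rough illustration only, this Python sketch shows the general panic-then-respawn pattern: a supervisor loop restarts a child whenever it exits with a nonzero status. `respawn`, `demo`, and the counter-file child are hypothetical; this is not vdsm's actual respawn script.

```python
import os
import subprocess
import sys
import tempfile
import time


def respawn(cmd, max_starts=5, delay=0.0):
    """Start `cmd`, restarting it each time it exits with a nonzero status.

    Stops when the child exits cleanly (status 0) or after `max_starts`
    launches. Returns (number_of_starts, last_exit_status).
    """
    starts = 0
    while True:
        starts += 1
        status = subprocess.call(cmd)
        if status == 0 or starts >= max_starts:
            return starts, status
        # The child "panicked" (exited nonzero); wait briefly, then restart it.
        time.sleep(delay)


# Child that "panics" (exits with status 1) on its first two runs, then exits
# cleanly. A counter file stands in for whatever condition made it panic.
_CHILD = """
import sys
path = sys.argv[1]
with open(path) as f:
    runs = int(f.read() or 0)
with open(path, "w") as f:
    f.write(str(runs + 1))
sys.exit(1 if runs < 2 else 0)
"""


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return respawn([sys.executable, "-c", _CHILD, path])
    finally:
        os.remove(path)
```

Calling `demo()` starts the child three times and returns `(3, 0)`. Exiting and letting a supervisor restart the whole process discards all in-process state, including a broken supervdsm connection, which is simpler than repairing that state in place.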
Verified on vdsm-4.10.2-8.0.el6ev.x86_64: vdsm still gets stuck, but it manages to recover.
*** Bug 909967 has been marked as a duplicate of this bug. ***
3.2 has been released