Bug 910005

Summary: vdsm: connect to super vdsm failed due to fail in prepareForShutDown
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: vdsm Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.1.3 CC: abaron, bazulay, hateya, iheim, knesenko, lpeer, mkalinin, sgrinber, ybronhei, ykaul
Target Milestone: --- Keywords: Regression
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: vdsm-4.10.2-8.0.el6ev Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 910490 (view as bug list) Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 910490, 915537    
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-02-11 15:52:37 UTC
Created attachment 696148
logs

Description of problem:

I blocked my iSCSI storage domain (1 domain) from both my hosts using iptables.
The connection to supervdsm failed because of a failure in prepareForShutdown, and vdsm is "stuck".


MainThread::DEBUG::2013-02-11 13:58:14,051::clientIF::228::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently
MainThread::DEBUG::2013-02-11 13:58:15,051::clientIF::228::vds::(prepareForShutdown) cannot run prepareForShutdown concurrently

 
Version-Release number of selected component (if applicable):

si27 
vdsm-4.10.2-1.3.el6.x86_64
libvirt-0.10.2-18.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a pool with 1 iSCSI storage domain on a two-host cluster.
2. Block connectivity to the storage from both hosts using iptables.
3.
  
Actual results:

The connection to supervdsm fails because of a failure in prepareForShutdown, which leaves vdsm "stuck".

Expected results:

vdsm should succeed in prepareForShutdown.

Additional info: logs

Comment 2 Yaniv Bronhaim 2013-02-11 16:48:26 UTC
The socket file used to communicate with supervdsm did not exist. I am still trying to figure out how that happened, but as we can see, it can happen.

I'm not sure whether it is related to the failure in prepareForShutdown, because after the reset we also clean up the socket file and create it again.

When the connection fails, we try again and that's it. My patch changes this process: now we'll try to communicate with the supervdsm socket 3 times, and if we fail, we'll kill the process and start it again. The socket is created as part of the initialization of the process.
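
For illustration only, the retry behaviour described in this comment could be sketched roughly as below. This is not the actual vdsm patch: the function name connect_with_retry, its parameters, and the on_give_up callback are hypothetical, and the sketch assumes supervdsm is reached over a plain Unix domain socket.

```python
import socket
import time


def connect_with_retry(sock_path, attempts=3, delay=1.0, on_give_up=None):
    """Try to connect to a Unix socket up to `attempts` times.

    Hypothetical helper: returns the connected socket on success.
    After the last failed attempt it calls on_give_up() (for example,
    code that restarts the server process) and returns None.
    """
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(sock_path)
            return sock
        except OSError:
            # Socket file missing or nobody listening: release and retry.
            sock.close()
            if attempt < attempts:
                time.sleep(delay)
    if on_give_up is not None:
        on_give_up()
    return None
```

The recovery action is passed in as a callback so that the same loop covers either policy: restarting the supervdsm process, or any other reaction to the socket being unreachable.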

Comment 3 Yaniv Bronhaim 2013-02-12 10:58:27 UTC
Instead of killing the process and starting it again, I changed the patch so that vdsm panics and kills itself. When this occurs, vdsm will be restarted by respawn.

This scenario has never happened to me yet. If it is reproducible as described in the bug description, please let me know so I can see it.

Comment 6 Dafna Ron 2013-02-20 17:34:38 UTC
verified on vdsm-4.10.2-8.0.el6ev.x86_64

I can see that vdsm is still stuck but manages to recover

Comment 7 Yaniv Bronhaim 2013-06-09 08:41:57 UTC
*** Bug 909967 has been marked as a duplicate of this bug. ***

Comment 8 Itamar Heim 2013-06-11 09:40:43 UTC
3.2 has been released

Comment 9 Itamar Heim 2013-06-11 09:41:12 UTC
3.2 has been released

Comment 10 Itamar Heim 2013-06-11 09:54:01 UTC
3.2 has been released