Bug 689814

Summary: 2.2.8 - Host became non-responsive after attaching ISCSI data storage domain to the DataCenter
Product: Red Hat Enterprise Linux 5
Reporter: Evgeniy German <egerman>
Component: vdsm22
Assignee: Igor Lvovsky <ilvovsky>
Status: CLOSED CURRENTRELEASE
QA Contact: yeylon <yeylon>
Severity: high
Priority: high
Version: 5.6
CC: abaron, bazulay, danken, iheim, lpeer, srevivo, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-04-14 12:46:49 UTC
Attachments:
  Description              Flags
  RHEVM and VDSM22 logs    none
  vdsm log (RHEL6 log)     none

Description Evgeniy German 2011-03-22 14:16:49 UTC
Created attachment 486804
RHEVM and VDSM22 logs

Description of problem:
The non-SPM host becomes non-responsive after an iSCSI data storage domain is attached to the data center.

Version-Release number of selected component (if applicable):
RHEVM version: ic 104
vdsm22 on both hosts: vdsm22-4.5-63.24.el5_6


Steps to Reproduce:
1. Create a data center and cluster (type iSCSI, version 2.2).
2. Add at least two hosts.
3. Create an iSCSI data storage domain.
4. Attach the created storage domain to the data center.

Actual results:
One host becomes SPM and the other becomes non-responsive.

Expected results:
All hosts are in status Up and one of them is SPM.

Additional info:
* The same behaviour also occurs when using the REST API.

Comment 1 Dan Kenigsberg 2011-03-30 22:43:36 UTC
For how long does the non-SPM host stay non-responsive? Forever?

Is this behavior new to rhev-m-2.3?

Either way,

Thread-12940::ERROR::2011-03-22 15:45:57,547::misc::66::irs::'masterValidate'
Thread-12940::ERROR::2011-03-22 15:45:57,548::misc::67::irs::Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 978, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1696, in public_repoStats
    valid = (master_stats['masterValidate']['mount'] and
KeyError: 'masterValidate'

smells like the outcome of a race between adding a SD and reporting its repoStats.
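
To illustrate the suspected race, here is a minimal sketch. It is not the actual vdsm code: the function names and the sample dictionaries are hypothetical, and only the 'masterValidate' and 'mount' keys are taken from the traceback above. It assumes the stats dictionary can be read before the entry for the newly attached domain has been published, and shows one possible way to treat that case as "not valid yet" instead of raising.

```python
# Hypothetical sketch, NOT vdsm code. Only the 'masterValidate' and
# 'mount' keys come from the traceback; everything else is assumed.

def repo_stats_unsafe(master_stats):
    # Same style of lookup as the failing line: raises KeyError when
    # the 'masterValidate' entry has not been published yet.
    return bool(master_stats['masterValidate']['mount'])


def repo_stats_tolerant(master_stats):
    # Treat a missing entry as "not valid yet" instead of raising.
    entry = master_stats.get('masterValidate')
    if entry is None:
        return False
    return bool(entry.get('mount', False))


# Stats as they might look while the domain is still being attached
# (assumed shape, for illustration only).
stats_during_attach = {}
# Stats once the monitoring side has published its result.
stats_after_attach = {'masterValidate': {'mount': True}}

try:
    repo_stats_unsafe(stats_during_attach)
    raised = False
except KeyError:
    raised = True
```

Whether a tolerant lookup is the right fix, or whether the caller should instead wait until the domain's stats exist, depends on how the monitoring code populates the dictionary, which the logs here do not show.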

Comment 2 Evgeniy German 2011-03-31 05:09:03 UTC
(In reply to comment #1)
> For how long does the non-SPM host stay non-responsive? Forever?
> 
> Is this behavior new to rhev-m-2.3?
> 
> Either way,
> 
> Thread-12940::ERROR::2011-03-22 15:45:57,547::misc::66::irs::'masterValidate'
> Thread-12940::ERROR::2011-03-22 15:45:57,548::misc::67::irs::Traceback (most
> recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 978, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 1696, in public_repoStats
>     valid = (master_stats['masterValidate']['mount'] and
> KeyError: 'masterValidate'
> 
> smells like the outcome of a race between adding a SD and reporting its
> repoStats.

The non-SPM host stays in the non-responsive state forever.

Comment 3 Igor Lvovsky 2011-04-10 15:23:19 UTC
It looks like a setup issue.
I just added several logs and the problem disappeared.
Let's try to reproduce it on RHEL6.

Comment 4 Evgeniy German 2011-04-12 09:47:35 UTC
Created attachment 491439
vdsm log (RHEL6 log)

Comment 5 Evgeniy German 2011-04-12 09:48:19 UTC
Reproduced on RHEL6 (log attached).