Bug 882641

Summary: 3.1.z vdsm: [POSIX] wrong master domain or version for only one of my hosts
Product: Red Hat Enterprise Linux 6
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED DUPLICATE
QA Contact: vvyazmin <vvyazmin>
Severity: high
Priority: high
Version: 6.3
CC: abaron, amureini, bazulay, cpelland, hateya, iheim, lpeer, ykaul
Target Milestone: rc
Target Release: 6.4
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: vdsm-4.10.2-2.0
Doc Type: Bug Fix
Last Closed: 2013-03-12 17:21:30 UTC
Type: Bug
Attachments: logs (flags: none)

Description Dafna Ron 2012-12-02 13:02:12 UTC
Created attachment 655977
logs

Description of problem:

I activated one of my hosts and got the error "wrong master domain or version".
Looking at the getStorageDomainInfo output, only one of my hosts sees the master domain with master_ver 0 instead of 1.
After a vdsm restart, the issue was resolved.
It looks like it might be a cache issue.

Version-Release number of selected component (if applicable):

vdsm-4.9.6-44.0.el6_3.x86_64
si24.5

How reproducible:

Steps to Reproduce:
1. On NFS, create a pool while 2 of the 3 hosts are in maintenance.
2. Attach and activate an ISO domain and an export domain.
3. Activate the other two hosts.
  
Actual results:

One host is unable to connect to the pool and fails with "wrong master domain or version".

Expected results:

The host should be able to connect to the pool, and the domain information should not be read from the cache.

Additional info:

Before and after the vdsm restart on camel-vdsb:

[root@camel-vdsb vdsm]# vdsClient -s 0 getStorageDomainsList
72ec1321-a114-451f-bee1-6790cbca1bc6
eebcad17-cbf7-4098-880d-81e2c2504169
74b2f39e-d374-4bda-b8a1-fb2aca984a5a

[root@camel-vdsb vdsm]# vdsClient -s 0 getStorageDomainInfo 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	uuid = 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	pool = ['85047de7-c841-4ff0-9ebb-3f722001b0b4']
	lver = -1
	version = 3
	role = Master
	remotePath = filer01.qa.lab.tlv.redhat.com:/Dafna-GA
	spm_id = -1
	type = POSIXFS
	class = Data
	master_ver = 0
	name = Dafna-01

[root@camel-vdsb vdsm]# service vdsmd restart
Shutting down vdsm daemon: 
vdsm watchdog stop                                         [  OK  ]
vdsm stop                                                  [  OK  ]
vdsm: libvirt already configured for vdsm                  [  OK  ]
Starting iscsid: 
Starting up vdsm daemon: 
vdsm start                                                 [  OK  ]
[root@camel-vdsb vdsm]# vdsClient -s 0 getStorageDomainInfo 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	uuid = 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	pool = ['85047de7-c841-4ff0-9ebb-3f722001b0b4']
	lver = 0
	version = 3
	role = Master
	remotePath = filer01.qa.lab.tlv.redhat.com:/Dafna-GA
	spm_id = 1
	type = POSIXFS
	class = Data
	master_ver = 1
	name = Dafna-01

My two other hosts:

[root@gold-vdsc ~]# vdsClient -s 0 getStorageDomainInfo 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	uuid = 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	pool = ['85047de7-c841-4ff0-9ebb-3f722001b0b4']
	lver = 0
	version = 3
	role = Master
	remotePath = filer01.qa.lab.tlv.redhat.com:/Dafna-GA
	spm_id = 1
	type = POSIXFS
	class = Data
	master_ver = 1
	name = Dafna-01


[root@gold-vdsd ~]# vdsClient -s 0 getStorageDomainInfo 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	uuid = 74b2f39e-d374-4bda-b8a1-fb2aca984a5a
	pool = ['85047de7-c841-4ff0-9ebb-3f722001b0b4']
	lver = 0
	version = 3
	role = Master
	remotePath = filer01.qa.lab.tlv.redhat.com:/Dafna-GA
	spm_id = 1
	type = POSIXFS
	class = Data
	master_ver = 1
	name = Dafna-01
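
To spot which host disagrees, the getStorageDomainInfo outputs above can be compared field by field. This is a minimal Python sketch, assuming the tab-indented `key = value` format shown above; `parse_domain_info`, `find_stale_hosts` and `sample` are hypothetical helpers, not part of vdsClient or vdsm.

```python
from collections import Counter


def parse_domain_info(text):
    """Parse the 'key = value' lines printed by getStorageDomainInfo (assumed format)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep:
            info[key] = value
    return info


def find_stale_hosts(outputs):
    """Return hosts whose master_ver differs from the most common value."""
    versions = {host: int(parse_domain_info(text)["master_ver"])
                for host, text in outputs.items()}
    majority, _ = Counter(versions.values()).most_common(1)[0]
    return sorted(host for host, ver in versions.items() if ver != majority)


def sample(lver, spm_id, master_ver):
    # Hypothetical helper: a trimmed copy of the fields reported above.
    return (
        "\tuuid = 74b2f39e-d374-4bda-b8a1-fb2aca984a5a\n"
        "\trole = Master\n"
        f"\tlver = {lver}\n"
        f"\tspm_id = {spm_id}\n"
        f"\tmaster_ver = {master_ver}\n"
    )


outputs = {
    "camel-vdsb": sample(-1, -1, 0),  # before the vdsm restart
    "gold-vdsc": sample(0, 1, 1),
    "gold-vdsd": sample(0, 1, 1),
}
print(find_stale_hosts(outputs))
```

With three hosts, a simple majority vote is enough to single out camel-vdsb; with only two hosts a tie would need a different rule (for example, trusting the SPM's view).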

Comment 1 Allon Mureinik 2012-12-05 10:27:07 UTC
Fede, is this a duplicate of bug 879253?

Comment 2 Federico Simoncelli 2012-12-12 10:14:16 UTC
(In reply to comment #1)
> Fede, is this a duplicate of bug 879253?

Yes, I think we can close this as a duplicate of bug 879253. Dafna, would you like to try to reproduce it before closing?

Comment 4 Ayal Baron 2012-12-12 12:54:29 UTC
commit 9d042bdd276c4a3a6c75fe506bb5897044c8ebf8
Author: Federico Simoncelli <fsimonce>
Date:   Thu Nov 22 09:04:00 2012 -0500

    sdcache: add refresh to connectStoragePool
    
    Change-Id: I2d3adcff7bb0e97be5c797cd720c6353920d9db0
    Signed-off-by: Federico Simoncelli <fsimonce>

http://gerrit.ovirt.org/#/c/9422/
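
The commit title suggests the fix refreshes the storage-domain cache when connectStoragePool is called, so a host cannot validate the master version against metadata it read earlier. This is a minimal Python sketch of that idea only; `DomainCache`, `connect_storage_pool` and `WrongMasterError` are hypothetical names, not vdsm's actual sdcache code or the patch above.

```python
class WrongMasterError(Exception):
    """Hypothetical stand-in for the 'wrong master domain or version' error."""


class DomainCache:
    """Hypothetical cache of storage-domain metadata keyed by domain UUID."""

    def __init__(self, read_metadata):
        self._read = read_metadata  # callable(sd_uuid) -> dict read from storage
        self._cache = {}

    def produce(self, sd_uuid):
        # Read from storage only on a cache miss; otherwise serve the cached copy.
        if sd_uuid not in self._cache:
            self._cache[sd_uuid] = self._read(sd_uuid)
        return self._cache[sd_uuid]

    def refresh(self):
        # Drop every cached entry so the next produce() re-reads storage.
        self._cache.clear()


def connect_storage_pool(cache, master_uuid, expected_master_ver, refresh=True):
    if refresh:
        cache.refresh()
    md = cache.produce(master_uuid)
    if md["master_ver"] != expected_master_ver:
        raise WrongMasterError(
            f"wrong master domain or version: cached master_ver "
            f"{md['master_ver']}, expected {expected_master_ver}")
    return md


SD = "74b2f39e-d374-4bda-b8a1-fb2aca984a5a"
storage = {SD: {"master_ver": 0}}                # metadata when the host first read it
cache = DomainCache(lambda u: dict(storage[u]))  # dict() copy acts as a snapshot read
cache.produce(SD)                                # host caches master_ver = 0
storage[SD]["master_ver"] = 1                    # master version bumped on storage

try:
    connect_storage_pool(cache, SD, 1, refresh=False)
except WrongMasterError as err:
    print("without refresh:", err)
print("with refresh:", connect_storage_pool(cache, SD, 1)["master_ver"])
```

Clearing the whole cache is the simplest possible policy; a real implementation might invalidate entries more selectively.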

Comment 6 Chris Pelland 2013-03-12 17:21:30 UTC

*** This bug has been marked as a duplicate of bug 879253 ***