Bug 1272075

Summary: [vdsm] cannot add host with 3.6 vdsm into 3.5.4 engine / Storage domain does not exist
Product: [oVirt] vdsm
Reporter: Jiri Belka <jbelka>
Component: General
Assignee: Ala Hino <ahino>
Status: CLOSED WORKSFORME
QA Contact: Aharon Canan <acanan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.17.9
CC: ahino, amureini, bugs, jbelka, oourfali, tnisan, ybronhei, ylavi
Target Milestone: ovirt-3.6.2
Keywords: Triaged
Target Release: 4.17.10
Flags: ybronhei: ovirt-3.6.z?, ybronhei: blocker?, rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-20 14:45:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1264667

Description Jiri Belka 2015-10-15 12:18:33 UTC
Description of problem:

cannot add host with 3.6 vdsm into 3.5.4 engine, imo it should work:

[root@dell-r210ii-04 ~]# egrep "^[[:blank:]]*\'supportedENGINEs|clusterLevels" /usr/share/vdsm/dsaversion.py                                                                                                        
    'clusterLevels': ['3.4', '3.5', '3.6'],
    version_info['clusterLevels'] = ['3.6']
[root@dell-r210ii-04 ~]# rpm -q vdsm
vdsm-4.17.9-1.el7ev.noarch
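
The compatibility question behind the grep above can be sketched in Python. This is only an illustration, assuming the host advertises a plain list of cluster levels like the ones printed above; `host_supports_cluster_level` is a hypothetical helper, not a vdsm or engine function, and which of the two lists is actually reported on a given host depends on code not shown here.

```python
def host_supports_cluster_level(supported_levels, engine_level):
    """Return True if engine_level (e.g. '3.5') is among the levels the host reports.

    Hypothetical helper for illustration; not part of vdsm.
    """
    return engine_level in supported_levels


# Both lists are copied from the grep output above.
default_levels = ['3.4', '3.5', '3.6']
overridden_levels = ['3.6']

print(host_supports_cluster_level(default_levels, '3.5'))     # True
print(host_supports_cluster_level(overridden_levels, '3.5'))  # False
```

If the host reported only `['3.6']`, a 3.5 cluster would be expected to reject it on compatibility grounds, which is a different failure from the storage error shown in the log below.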

----%----
...
Thread-31::DEBUG::2015-10-15 13:55:28,901::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2012-06.brq.str-01:brq-setup -I default -p 10.34.63.202:3260,1 -n node.startup -v manual --op=update (cwd None)
Thread-31::DEBUG::2015-10-15 13:55:28,910::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-31::DEBUG::2015-10-15 13:55:28,910::utils::676::root::(execCmd) /sbin/udevadm settle --timeout=5 (cwd None)
Thread-31::DEBUG::2015-10-15 13:55:33,925::utils::694::root::(execCmd) FAILED: <err> = ''; <rc> = 1
Thread-31::ERROR::2015-10-15 13:55:33,925::udevadm::61::root::(settle) Process failed with rc=1 out='' err=''
...
Thread-36::DEBUG::2015-10-15 13:56:10,022::fileUtils::143::Storage.fileUtils::(createdir) Creating directory: /rhev/data-center/3a63d854-bed0-11e0-b671-545200312d04 mode: None
Thread-43::ERROR::2015-10-15 13:56:10,023::sdc::138::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 219bc71f-c5ec-4ace-80f5-f07b2f892163
Thread-39::DEBUG::2015-10-15 13:56:11,201::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/1brq-setup|/dev/mapper/1brqsetup02|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name e469d56b-af8f-4a4a-b0dd-9fe2e8f84e69 (cwd None)
Thread-37::DEBUG::2015-10-15 13:56:11,201::lvm::514::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-43::ERROR::2015-10-15 13:56:11,202::sdc::144::Storage.StorageDomainCache::(_findDomain) domain 219bc71f-c5ec-4ace-80f5-f07b2f892163 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 142, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 172, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'219bc71f-c5ec-4ace-80f5-f07b2f892163',)
Thread-43::ERROR::2015-10-15 13:56:11,210::monitor::250::Storage.Monitor::(_monitorDomain) Error monitoring domain 219bc71f-c5ec-4ace-80f5-f07b2f892163
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 238, in _monitorDomain
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 774, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/share/vdsm/storage/monitor.py", line 297, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 99, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 142, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 172, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'219bc71f-c5ec-4ace-80f5-f07b2f892163',)
...
----%----
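
To see which storage domain UUIDs a log like the excerpt above complains about, and how often, the exception lines can be counted. The following is a rough sketch, not a vdsm tool: `count_missing_domains` is a hypothetical helper, and the pattern assumes the exception text has exactly the form shown in the excerpt.

```python
import re
from collections import Counter

# Matches the exception line as it appears in the excerpt above (assumed format).
_MISSING_DOMAIN = re.compile(
    r"StorageDomainDoesNotExist: Storage domain does not exist: "
    r"\(u?'([0-9a-f-]+)',\)"
)


def count_missing_domains(lines):
    """Count StorageDomainDoesNotExist occurrences per storage domain UUID."""
    counts = Counter()
    for line in lines:
        match = _MISSING_DOMAIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the excerpt above.
sample = [
    "Thread-43::ERROR::2015-10-15 13:56:11,202::sdc::144::Storage.StorageDomainCache::(_findDomain) domain 219bc71f-c5ec-4ace-80f5-f07b2f892163 not found",
    "StorageDomainDoesNotExist: Storage domain does not exist: (u'219bc71f-c5ec-4ace-80f5-f07b2f892163',)",
    "StorageDomainDoesNotExist: Storage domain does not exist: (u'219bc71f-c5ec-4ace-80f5-f07b2f892163',)",
]

for sd_uuid, hits in count_missing_domains(sample).items():
    print(sd_uuid, hits)
```

On a real host the same function could be fed the lines of the vdsm log file; a single UUID with a steadily growing count would match the "repeating forever" pattern rather than a one-off lookup failure.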

Version-Release number of selected component (if applicable):
vdsm-4.17.9-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. install rhel 7.2 (i used RHEL-7.2-20151008.0)
2. install vdsm from 3.6
3. add the host into 3.5.4 (the current released version) engine

Actual results:
failure

	
2015-Oct-15, 14:13  Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com cannot access the Storage Domain(s) str03-brqsetup03 attached to the Data Center DEF. Setting Host state to Non-Operational.  362378b7  oVirt
2015-Oct-15, 14:13  Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com reports about one of the Active Storage Domains as Problematic.

Expected results:
should work

Additional info:

Comment 2 Oved Ourfali 2015-10-16 08:18:47 UTC
The errors seem storage related. Yaniv, can you take a look?

Comment 3 Yaniv Bronhaim 2015-10-19 11:13:28 UTC
Yes, in the log I see only the exception above repeating forever after the upgrade. Please update the target milestone to 3.6.0 if this is really 100% reproducible as a new upgrade regression.

Comment 4 Tal Nisan 2015-10-19 12:46:31 UTC
Ala, please have a look asap

Comment 5 Ala Hino 2015-10-20 07:17:25 UTC
Couldn't reproduce.

In both logs there are no messages regarding an unsupported cluster level.
The only thing seen in the vdsm log is StorageDomainDoesNotExist.
This error indicates that there is an existing storage domain on this host, even though the host is just being installed.

Jiri,
Can you make sure this host is clean? Maybe remove /rhev/ dir (assuming no data to lose).
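
Before removing anything, it may help to look at what is actually left on the host. A minimal read-only sketch, assuming leftovers would show up as entries under `/rhev/data-center` (the path seen in the vdsm log above); `list_leftovers` is a hypothetical helper, not part of vdsm.

```python
import os


def list_leftovers(root):
    """Return sorted entry names directly under root, or [] if root is missing.

    Read-only: nothing is created, modified, or deleted.
    """
    if not os.path.isdir(root):
        return []
    return sorted(os.listdir(root))


# Path taken from the vdsm log in the description; prints nothing
# if the directory is absent or empty.
for name in list_leftovers("/rhev/data-center"):
    print(name)
```

An empty result would suggest there are no stale pool directories in that location; it says nothing about leftover LVM volume groups or iSCSI sessions, which would need to be checked separately.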

Comment 6 Jiri Belka 2015-10-22 07:39:53 UTC
3.5.4 was our long running RHEVM in hosted-engine setup. would you like to have access to this env or should i try to replicate the issue on clean env?

Comment 7 Ala Hino 2015-10-22 07:50:58 UTC
I would recommend to reproduce on a clean env and if issue still exists, I will take a look at the env.

Please make sure host is clean regarding old storage domains.

Thanks!

Comment 8 Allon Mureinik 2015-11-18 13:37:23 UTC
Pushing out until we have a reproducer.
If this is just a dirty env issue, it's not 3.6.1 material.

Comment 9 Jiri Belka 2015-11-20 14:45:08 UTC
(In reply to Ala Hino from comment #7)
> I would recommend to reproduce on a clean env and if issue still exists, I
> will take a look at the env.
> 
> Please make sure host is clean regarding old storage domains.
> 
> Thanks!

Hm, I can't reproduce on clean env:

- rhevm 3.5.6 and rhel 7.1 with 3.5.6 vdsm
  > storage domain up
- rhel 7.2 with vdsm-4.17.10.1-0.el7ev.noarch