Bug 1272075 - [vdsm] cannot add host with 3.6 vdsm into 3.5.4 engine / Storage domain does not exist
Status: CLOSED WORKSFORME
Product: vdsm
Classification: oVirt
Component: General
Version: 4.17.9
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.2
Target Release: 4.17.10
Assigned To: Ala Hino
QA Contact: Aharon Canan
Whiteboard: storage
Keywords: Triaged
Depends On:
Blocks: 1264667
Reported: 2015-10-15 08:18 EDT by Jiri Belka
Modified: 2016-03-10 02:26 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-20 09:45:08 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ybronhei: ovirt‑3.6.z?
ybronhei: blocker?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments: None
Description Jiri Belka 2015-10-15 08:18:33 EDT
Description of problem:

Cannot add a host with 3.6 vdsm to a 3.5.4 engine; IMO it should work:

[root@dell-r210ii-04 ~]# egrep "^[[:blank:]]*\'supportedENGINEs|clusterLevels" /usr/share/vdsm/dsaversion.py
    'clusterLevels': ['3.4', '3.5', '3.6'],
    version_info['clusterLevels'] = ['3.6']
[root@dell-r210ii-04 ~]# rpm -q vdsm
vdsm-4.17.9-1.el7ev.noarch
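
The check above can also be done programmatically. A minimal sketch, assuming only the two line shapes shown in the egrep output; `parse_cluster_levels` is a hypothetical helper, not part of vdsm, and the report does not show which of the two assignments takes effect at runtime:

```python
import ast
import re

# Matches both shapes seen in the egrep output above:
#   'clusterLevels': ['3.4', '3.5', '3.6'],
#   version_info['clusterLevels'] = ['3.6']
LEVELS = re.compile(r"clusterLevels'\]?\s*[:=]\s*(\[[^\]]*\])")


def parse_cluster_levels(line):
    """Return the list literal assigned to clusterLevels, or None."""
    match = LEVELS.search(line)
    if match is None:
        return None
    # literal_eval only evaluates the list literal, never arbitrary code.
    return ast.literal_eval(match.group(1))


def supports_level(line, wanted):
    """True if the parsed clusterLevels list contains the wanted level."""
    levels = parse_cluster_levels(line)
    return levels is not None and wanted in levels
```

With the first output line, `supports_level(line, "3.5")` is true; with the second, it is false, so which assignment applies on this host matters.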

----%----
...
Thread-31::DEBUG::2015-10-15 13:55:28,901::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) /usr/bin/sudo -n /sbin/iscsiadm -m node -T iqn.2012-06.brq.str-01:brq-setup -I default -p 10.34.63.202:3260,1 -n node.startup -v manual --op=update (cwd None)
Thread-31::DEBUG::2015-10-15 13:55:28,910::iscsiadm::97::Storage.Misc.excCmd::(_runCmd) SUCCESS: <err> = ''; <rc> = 0
Thread-31::DEBUG::2015-10-15 13:55:28,910::utils::676::root::(execCmd) /sbin/udevadm settle --timeout=5 (cwd None)
Thread-31::DEBUG::2015-10-15 13:55:33,925::utils::694::root::(execCmd) FAILED: <err> = ''; <rc> = 1
Thread-31::ERROR::2015-10-15 13:55:33,925::udevadm::61::root::(settle) Process failed with rc=1 out='' err=''
...
Thread-36::DEBUG::2015-10-15 13:56:10,022::fileUtils::143::Storage.fileUtils::(createdir) Creating directory: /rhev/data-center/3a63d854-bed0-11e0-b671-545200312d04 mode: None
Thread-43::ERROR::2015-10-15 13:56:10,023::sdc::138::Storage.StorageDomainCache::(_findDomain) looking for unfetched domain 219bc71f-c5ec-4ace-80f5-f07b2f892163
Thread-39::DEBUG::2015-10-15 13:56:11,201::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/1brq-setup|/dev/mapper/1brqsetup02|'\'', '\''r|.*|'\'' ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1  use_lvmetad=0 }  backup {  retain_min = 50  retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name e469d56b-af8f-4a4a-b0dd-9fe2e8f84e69 (cwd None)
Thread-37::DEBUG::2015-10-15 13:56:11,201::lvm::514::Storage.OperationMutex::(_invalidatelvs) Operation 'lvm reload operation' is holding the operation mutex, waiting...
Thread-43::ERROR::2015-10-15 13:56:11,202::sdc::144::Storage.StorageDomainCache::(_findDomain) domain 219bc71f-c5ec-4ace-80f5-f07b2f892163 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 142, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 172, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'219bc71f-c5ec-4ace-80f5-f07b2f892163',)
Thread-43::ERROR::2015-10-15 13:56:11,210::monitor::250::Storage.Monitor::(_monitorDomain) Error monitoring domain 219bc71f-c5ec-4ace-80f5-f07b2f892163
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 238, in _monitorDomain
    self._produceDomain()
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 774, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/share/vdsm/storage/monitor.py", line 297, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 99, in produce
    domain.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 142, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 172, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'219bc71f-c5ec-4ace-80f5-f07b2f892163',)
...
----%----
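
To see which storage domain UUIDs the host keeps failing to find, the exception lines in the excerpt above can be tallied. A rough sketch, assuming the exception text has exactly the form shown in this log; `count_missing_domains` is a hypothetical helper and the log path is left to the caller:

```python
import re
from collections import Counter

# Matches the exception line as it appears in the excerpt above, e.g.
# StorageDomainDoesNotExist: Storage domain does not exist: (u'<uuid>',)
MISSING_DOMAIN = re.compile(
    r"StorageDomainDoesNotExist: Storage domain does not exist: "
    r"\(u?'([0-9a-f-]{36})',\)"
)


def count_missing_domains(log_lines):
    """Count StorageDomainDoesNotExist occurrences per storage domain UUID."""
    counts = Counter()
    for line in log_lines:
        match = MISSING_DOMAIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file (for example `count_missing_domains(open(path))`) works because file objects iterate line by line. A single UUID with a steadily growing count would be consistent with the monitor retrying the same missing domain.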

Version-Release number of selected component (if applicable):
vdsm-4.17.9-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL 7.2 (I used RHEL-7.2-20151008.0)
2. Install vdsm from 3.6
3. Add the host to a 3.5.4 (the current released version) engine

Actual results:
failure

	
2015-Oct-15, 14:13  Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com cannot access the Storage Domain(s) str03-brqsetup03 attached to the Data Center DEF. Setting Host state to Non-Operational.  362378b7  oVirt
2015-Oct-15, 14:13  Host dell-r210ii-04.rhev.lab.eng.brq.redhat.com reports about one of the Active Storage Domains as Problematic.

Expected results:
should work

Additional info:
Comment 2 Oved Ourfali 2015-10-16 04:18:47 EDT
The errors seem storage related. Yaniv, can you take a look?
Comment 3 Yaniv Bronhaim 2015-10-19 07:13:28 EDT
Yes, in the log I see only the exception above, repeating indefinitely after the upgrade. Please update the target milestone to 3.6.0 if this is really 100% reproducible as a new upgrade regression.
Comment 4 Tal Nisan 2015-10-19 08:46:31 EDT
Ala, please have a look asap
Comment 5 Ala Hino 2015-10-20 03:17:25 EDT
Couldn't reproduce.

In both logs there are no messages regarding an unsupported cluster level.
The only thing seen in the vdsm log is StorageDomainDoesNotExist.
This error indicates that there is an existing storage domain on this host, even though the host is just being installed.

Jiri,
Can you make sure this host is clean? Maybe remove /rhev/ dir (assuming no data to lose).
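
Before removing anything, it may help to see what is actually left under /rhev. A minimal sketch, assuming leftovers would sit under /rhev/data-center (the path vdsm creates directories in, per the log in the description); `list_leftovers` is a hypothetical helper that only lists entries and deletes nothing:

```python
import os


def list_leftovers(root="/rhev/data-center"):
    """Return the sorted full paths of entries directly under root.

    An absent root is treated as clean and yields an empty list.
    """
    if not os.path.isdir(root):
        return []
    return sorted(os.path.join(root, name) for name in os.listdir(root))
```

An empty result on a host that has not yet been added to any engine would suggest there is nothing stale to clean up; what counts as stale otherwise is a judgment call this sketch does not make.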
Comment 6 Jiri Belka 2015-10-22 03:39:53 EDT
3.5.4 was our long-running RHEVM in a hosted-engine setup. Would you like access to this env, or should I try to replicate the issue on a clean env?
Comment 7 Ala Hino 2015-10-22 03:50:58 EDT
I would recommend reproducing on a clean env; if the issue still exists, I will take a look at the env.

Please make sure host is clean regarding old storage domains.

Thanks!
Comment 8 Allon Mureinik 2015-11-18 08:37:23 EST
Pushing out until we have a reproducer.
If this is just a dirty env issue, it's not 3.6.1 material.
Comment 9 Jiri Belka 2015-11-20 09:45:08 EST
(In reply to Ala Hino from comment #7)
> I would recommend to reproduce on a clean env and if issue still exists, I
> will take a look at the env.
> 
> Please make sure host is clean regarding old storage domains.
> 
> Thanks!

Hm, I can't reproduce on clean env:

- rhevm 3.5.6 and rhel 7.1 with 3.5.6 vdsm
  > storage domain up
- rhel 7.2 with vdsm-4.17.10.1-0.el7ev.noarch
