Bug 1766582 - After engine upgrade from rhv-4.3.7-1 to 4.4 all ISCSI storage domains does not go up
Summary: After engine upgrade from rhv-4.3.7-1 to 4.4 all ISCSI storage domains does n...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
: ---
Assignee: Vojtech Juranek
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-29 13:01 UTC by Avihai
Modified: 2020-05-12 12:24 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-03 11:53:47 UTC
oVirt Team: Storage
Embargoed:


Attachments (Terms of Use)
engine and vdsm logs (624.12 KB, application/gzip)
2019-10-29 13:01 UTC, Avihai
no flags Details

Description Avihai 2019-10-29 13:01:52 UTC
Created attachment 1630169 [details]
engine and vdsm logs

Description of problem:
Upgrade engine with working 3 ISCSI storage domains('iscsi_0', 'iscsi_1', 'iscsi_2') from 4.3.7 to 4.4.

Than add the first newly installed RHEL8.1 host to the data center all other(NFS,gluster) storage domains are up but ISCSI storage domain remains down with the following errors:

Engine log:
2019-10-29 12:49:49,948+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (EE-ManagedThreadFactory-engine-Thread-28) [73422e2c] Failed in 'GetDeviceListVDS' method
2019-10-29 12:49:49,959+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-28) [73422e2c] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_1 command GetDeviceListVDS failed: Error block device action: ()
2019-10-29 12:49:49,960+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (EE-ManagedThreadFactory-engine-Thread-28) [73422e2c] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand' return value '
LUNListReturn:{status='Status [code=600, message=Error block device action: ()]'}
'
2019-10-29 12:49:49,961+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (EE-ManagedThreadFactory-engine-Thread-28) [73422e2c] HostName = host_mixed_1
2019-10-29 12:49:49,961+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (EE-ManagedThreadFactory-engine-Thread-28) [73422e2c] Command 'GetDeviceListVDSCommand(HostName = host_mixed_1, GetDeviceListVDSCommandParameters:{hostId='d199123d-9772-4276-9d28-c83d3ee2d999', storageType='UNKNOWN', checkStatus='false', lunIds='null'})' execution failed: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600
2019-10-29 12:49:49,961+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetDeviceListVDSCommand] (EE-ManagedThreadFactory-engine-Thread-28) [73422e2c] FINISH, GetDeviceListVDSCommand, return: , log id: 1d7be8db
2019-10-29 12:49:49,961+02 ERROR [org.ovirt.engine.core.bll.storage.pool.SyncStorageDomainsLunsCommand] (EE-ManagedThreadFactory-engine-Thread-28) [73422e2c] Command 'org.ovirt.engine.core.bll.storage.pool.SyncStorageDomainsLunsCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetDeviceListVDS, error = Error block device action: (), code = 600 (Failed with error BlockDeviceActionError and code 600)
2019-10-29 12:49:58,127+02 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-30) [] domain '788f4bfb-380f-4b98-93c1-dfa967ea4f3a:iscsi_2' in problem 'STORAGE_ACCCESS_ERROR'. vds: 'host_mixed_1'
2019-10-29 12:49:58,128+02 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-30) [] domain '45f2708d-4b6d-418d-aaf1-10bea1b94552:iscsi_0' in problem 'STORAGE_ACCCESS_ERROR'. vds: 'host_mixed_1'
2019-10-29 12:49:58,129+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-30) [] Host 'd199123d-9772-4276-9d28-c83d3ee2d999' has reported new storage access problem to the following domains '788f4bfb-380f-4b98-93c1-dfa967ea4f3a,45f2708d-4b6d-418d-aaf1-10bea1b94552' marking it for storage connections and pool metadata refresh (report id: 'd0139dc9-ea67-4866-ada7-4242311f8345')
'

VDSM log(5 hours behind due to timezone bug on the host): 
2019-10-29 06:49:40,142-0400 WARN  (monitor/45f2708) [storage.LVM] Reloading VGs failed (vgs=['45f2708d-4b6d-418d-aaf1-10bea1b94552'] rc=5 out=[] err=['  Volume group "45f2708d-4b6d-418d-aaf1-10bea1b94552" not 
found', '  Cannot process volume group 45f2708d-4b6d-418d-aaf1-10bea1b94552']) (lvm:501)
2019-10-29 06:49:40,156-0400 WARN  (monitor/788f4bf) [storage.LVM] Reloading VGs failed (vgs=['788f4bfb-380f-4b98-93c1-dfa967ea4f3a'] rc=5 out=[] err=['  Volume group "788f4bfb-380f-4b98-93c1-dfa967ea4f3a" not 
found', '  Cannot process volume group 788f4bfb-380f-4b98-93c1-dfa967ea4f3a']) (lvm:501)
2019-10-29 06:49:40,241-0400 ERROR (monitor/45f2708) [storage.Monitor] Setting up monitor for 45f2708d-4b6d-418d-aaf1-10bea1b94552 failed (monitor:332)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 329, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 351, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python3.6/site-packages/vdsm/utils.py", line 153, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 369, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 108, in produce
    domain.getRealDomain()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 50, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 132, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 149, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
vdsm.storage.exception.StorageDomainDoesNotExist: Storage domain does not exist: ('45f2708d-4b6d-418d-aaf1-10bea1b94552',)
2019-10-29 06:49:40,242-0400 INFO  (monitor/45f2708) [storage.Monitor] Domain 45f2708d-4b6d-418d-aaf1-10bea1b94552 became INVALID (monitor:472)
2019-10-29 06:49:40,247-0400 ERROR (monitor/788f4bf) [storage.Monitor] Setting up monitor for 788f4bfb-380f-4b98-93c1-dfa967ea4f3a failed (monitor:332)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 329, in _setupLoop
    self._setupMonitor()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 351, in _setupMonitor
    self._produceDomain()
  File "/usr/lib/python3.6/site-packages/vdsm/utils.py", line 153, in wrapper
    value = meth(self, *a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 369, in _produceDomain
    self.domain = sdCache.produce(self.sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 108, in produce
    domain.getRealDomain()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 50, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 132, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 149, in _findDomain
    return findMethod(sdUUID)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/sdc.py", line 174, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
vdsm.storage.exception.StorageDomainDoesNotExist: Storage domain does not exist: ('788f4bfb-380f-4b98-93c1-dfa967ea4f3a',)
2019-10-29 06:49:40,248-0400 INFO  (monitor/788f4bf) [storage.Monitor] Domain 788f4bfb-380f-4b98-93c1-dfa967ea4f3a became INVALID (monitor:472)


Version-Release number of selected component (if applicable):
rhv-4.4.0-4 (http://bob.eng.lab.tlv.redhat.com/builds/4.4/rhv-4.4.0-4/)

ovirt-engine-4.4.0-0.4.master.el7.noarch
vdsm-4.40.0-127.gitc628cce.el8ev.x86_64
python3-libvirt-5.0.0-5.module+el8.0.1+3755+6782b0ed.x86_64
qemu-kvm-3.1.0-30.module+el8.0.1+3755+6782b0ed.x86_64


How reproducible:
Upgraded one stand so far

Steps to Reproduce:
1. ENV(engine + host) with 4.3.7 connected to ISCSI+NFS storage domain
2. Upgrade Engine(RHEL7.7) with new RHV4.4
3. As hosts are RHL8.1 and no upgrade is available between RHEL7.7 -> 8.1 you need to fresh install RHEL8.1 on the hosts
4. install host packages(VDSM)
5. Via webadmin add host to engine



Actual results:
ISCSI SD's does not come up , NFS and gluster is ok.

Expected results:
ISCSI SD's should come up.

Additional info:

Comment 2 Tal Nisan 2019-11-03 10:29:29 UTC
*** Bug 1766595 has been marked as a duplicate of this bug. ***

Comment 4 Avihai 2019-11-03 11:53:19 UTC
Ok , so the issue that the SD's did not go up and no SCSI attached devices were seen is due to the fact that the initiatorname.iscsi was deleted in OS re-provisioning.
When I changed the initiatorname.iscsi to their old value + iscsid service restart + reconnect ISCSI target -> the pre upgrade ISCSI SD's went up.

*** But still we have the creating new SD's issue from UI, I still see that we are stuck in 'new domain' window on loading luns stage.***
*** So we are still blocked creating a new SD via UI *** 

I am closing this bug as ISCSI SD's will go up if you configure initiatorname.iscsi correctly. 
I am reopening bug 1766595 as we still cannot create new SD's via REST/UI.

Now iscsi attachment is working:
[root@storage-ge4-vdsm1 ~]#  iscsiadm -m session  -P 3
iSCSI Transport Class version 2.0-870
version 6.2.0.877-0
Target: iqn.2000-05.com.3pardata:20210002ac021f6b (non-flash)
	Current Portal: 10.35.146.1:3260,21
	Persistent Portal: 3par-iscsi-1.scl.lab.tlv.redhat.com:3260,21
		**********
		Interface:
		**********
		Iface Name: default
		Iface Transport: tcp
		Iface Initiatorname: iqn.1994-05.com.redhat:storage-ge4-vdsm1
		Iface IPaddress: 10.35.82.63
		Iface HWaddress: default
		Iface Netdev: default
		SID: 6
		iSCSI Connection State: LOGGED IN
		iSCSI Session State: LOGGED_IN
		Internal iscsid Session State: NO CHANGE
		*********
		Timeouts:
		*********
		Recovery Timeout: 10
		Target Reset Timeout: 30
		LUN Reset Timeout: 30
		Abort Timeout: 15
		*****
		CHAP:
		*****
		username: <empty>
		password: ********
		username_in: <empty>
		password_in: ********
		************************
		Negotiated iSCSI params:
		************************
		HeaderDigest: None
		DataDigest: None
		MaxRecvDataSegmentLength: 262144
		MaxXmitDataSegmentLength: 65536
		FirstBurstLength: 16384
		MaxBurstLength: 262144
		ImmediateData: Yes
		InitialR2T: Yes
		MaxOutstandingR2T: 1
		************************
		Attached SCSI devices:
		************************
		Host Number: 3	State: running
		scsi3 Channel 00 Id 0 Lun: 0
			Attached scsi disk sdb		State: running
		scsi3 Channel 00 Id 0 Lun: 1
			Attached scsi disk sdc		State: running
		scsi3 Channel 00 Id 0 Lun: 2
			Attached scsi disk sdd		State: running
		scsi3 Channel 00 Id 0 Lun: 254
		scsi3 Channel 00 Id 0 Lun: 3
			Attached scsi disk sde		State: running
		scsi3 Channel 00 Id 0 Lun: 4
			Attached scsi disk sdf		State: running
		scsi3 Channel 00 Id 0 Lun: 5
			Attached scsi disk sdg		State: running
Target: iqn.2000-05.com.3pardata:21210002ac021f6b (non-flash)
	Current Portal: 10.35.146.2:3260,121
	Persistent Portal: 10.35.146.2:3260,121
		**********
		Interface:
		**********
		Iface Name: default
		Iface Transport: tcp
		Iface Initiatorname: iqn.1994-05.com.redhat:storage-ge4-vdsm1
		Iface IPaddress: 10.35.82.63
		Iface HWaddress: default
		Iface Netdev: default
		SID: 7
		iSCSI Connection State: LOGGED IN
		iSCSI Session State: LOGGED_IN
		Internal iscsid Session State: NO CHANGE
		*********
		Timeouts:
		*********
		Recovery Timeout: 10
		Target Reset Timeout: 30
		LUN Reset Timeout: 30
		Abort Timeout: 15
		*****
		CHAP:
		*****
		username: <empty>
		password: ********
		username_in: <empty>
		password_in: ********
		************************
		Negotiated iSCSI params:
		************************
		HeaderDigest: None
		DataDigest: None
		MaxRecvDataSegmentLength: 262144
		MaxXmitDataSegmentLength: 65536
		FirstBurstLength: 16384
		MaxBurstLength: 262144
		ImmediateData: Yes
		InitialR2T: Yes
		MaxOutstandingR2T: 1
		************************
		Attached SCSI devices:
		************************
		Host Number: 4	State: running
		scsi4 Channel 00 Id 0 Lun: 0
			Attached scsi disk sdh		State: running
		scsi4 Channel 00 Id 0 Lun: 1
			Attached scsi disk sdi		State: running
		scsi4 Channel 00 Id 0 Lun: 2
			Attached scsi disk sdj		State: running
		scsi4 Channel 00 Id 0 Lun: 254
		scsi4 Channel 00 Id 0 Lun: 3
			Attached scsi disk sdk		State: running
		scsi4 Channel 00 Id 0 Lun: 4
			Attached scsi disk sdl		State: running
		scsi4 Channel 00 Id 0 Lun: 5
			Attached scsi disk sdm		State: running

Comment 5 Avihai 2019-11-03 11:53:47 UTC
See prior comment.


Note You need to log in before you can comment on or make changes to this bug.