Bug 854151

Summary: 3.1 - vdsm: iscsi connections are not disconnected when host is put in maintenance and backend is sending disconnectStorageServer
Product: Red Hat Enterprise Linux 6 Reporter: Dafna Ron <dron>
Component: vdsmAssignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED ERRATA QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: medium    
Version: 6.3CC: abaron, bazulay, hateya, iheim, ilvovsky, jbiddle, lpeer, ykaul
Target Milestone: betaKeywords: Regression, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: vdsm-4.9.6-38.0 Doc Type: Bug Fix
Doc Text:
Previously, when a host was put into maintenance mode, the back end would send disconnectStorageServer but the VDSM failed to actually disconnect. If there was a problem with the target, an error would remain even after the host was put back into up mode as the disconnection never occurred as it should have. Now, the disconnectStorageServer call works correctly, allowing issues to be repaired as necessary and the host to be re-activated without errors.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-12-04 19:09:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2012-09-04 08:19:42 UTC
Created attachment 609580 [details]
logs

Description of problem:

when we put host in maintenance backend sends disconnectStorageServer but vdsm is not disconnecting the connections. 
if the system is functioning correctly we would not notice that but when there is a problem with the targets and we do not disconnect the connections, even after we fix the issue we would still be detecting a problem in the storage (devices would be unknown). 

Version-Release number of selected component (if applicable):

si16

How reproducible:

100%

Steps to Reproduce:
1. run iscsiadm -m session on the host
2. put host in maintenance
3. run iscsiadm -m session again
4. check the vdsm log for disconnectStorageServer
  
Actual results:

although disconnectStorageServer was sent vdsm does not disconnect the connections. as a result, if there was a problem in the storage (in my case I removed my hosts from the access list) vdsm will keep detecting a problem because its reading from cache.
Expected results:

we should disconnect the iscsi connection when putting a host in maintenance. 

Additional info: logs

in the logs attached, I had a problem with the storage and even after vdsm restart I still cannot detect all the devices. 

when I start the host we see right away that we cannot detect one of the domains: 

Thread-21::ERROR::2012-09-04 11:04:15,795::sp::1566::Storage.StoragePool::(updateMonitoringThreads) Unexpected error while trying to monitor domain `0dc1433f-72e6-4b62-9845-dc022a191f4f`
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sp.py", line 1561, in updateMonitoringThreads
    .startMonitoring(sdCache.produce(sdUUID), self.id)
  File "/usr/share/vdsm/storage/sdc.py", line 110, in produce
    dom.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 147, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: (u'0dc1433f-72e6-4b62-9845-dc022a191f4f',)
Thread-22::INFO::2012-09-04 11:04:15,796::safelease::160::SANLock::(acquireHostId) Acquiring host id for domain 6a1f9f02-2f53-4d67-8f6a-567c0d01777c (id: 1)


I put the host in maintenance again and I can see that disconnectStorage server is sent: 

Thread-212::INFO::2012-09-04 11:06:01,165::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=3, spUUID='f570527f-004a-4cab-8bee-129fa589bec5', conList=[{'connection': '10.35.64.10', 'iqn': 'dafna-si16-02', 'portal': '1', 'user': '', 'password': '******', 'id': 'bc1efcff-3f14-4502-9728-89fbc29e0100', 'port': '3260'}, {'connection': '10.35.64.11', 'iqn': 'iqn.1992-08.com.netapp:sn.151709391', 'portal': '2000', 'user': '', 'password': '******', 'id': 'c5a034c9-db6e-4741-ac6e-b84495e7cc37', 'port': '3260'}, {'connection': '10.35.160.7', 'iqn': 'iqn.2008-05.com.xtremio:001b21b545c0', 'portal': '1', 'user': '', 'password': '******', 'id': 'ff6680e9-2550-4a30-b947-1648617c62cc', 'port': '3260'}, {'connection': '10.35.64.10', 'iqn': 'dafna-si16-03', 'portal': '1', 'user': '', 'password': '******', 'id': '6a41fd87-0726-43a3-9306-32fb3e241ed5', 'port': '3260'}, {'connection': '10.35.64.10', 'iqn': 'dafna-si16-01', 'portal': '1', 'user': '', 'password': '******', 'id': 'bed3f09f-d7d4-4f8f-bcd6-e7f55df5be31', 'port': '3260'}], options=None)
Thread-212::DEBUG::2012-09-04 11:06:01,176::misc::1080::SamplingMethod::(__call__) Trying to enter sampling method (storage.sdc.refreshStorage)
Thread-212::DEBUG::2012-09-04 11:06:01,176::misc::1082::SamplingMethod::(__call__) Got in to sampling method
Thread-212::DEBUG::2012-09-04 11:06:01,177::misc::1080::SamplingMethod::(__call__) Trying to enter sampling method (storage.iscsi.rescan)
Thread-212::DEBUG::2012-09-04 11:06:01,177::misc::1082::SamplingMethod::(__call__) Got in to sampling method
Thread-212::DEBUG::2012-09-04 11:06:01,177::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/iscsiadm -m session -R' (cwd None)
Thread-212::DEBUG::2012-09-04 11:06:01,370::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
Thread-212::DEBUG::2012-09-04 11:06:01,371::misc::1090::SamplingMethod::(__call__) Returning last result
Thread-212::DEBUG::2012-09-04 11:06:01,378::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/multipath' (cwd None)
Thread-212::DEBUG::2012-09-04 11:06:01,816::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = ''; <rc> = 0
Thread-212::DEBUG::2012-09-04 11:06:01,817::lvm::460::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-212::DEBUG::2012-09-04 11:06:01,817::lvm::462::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-212::DEBUG::2012-09-04 11:06:01,818::lvm::472::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-212::DEBUG::2012-09-04 11:06:01,818::lvm::474::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-212::DEBUG::2012-09-04 11:06:01,819::lvm::493::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-212::DEBUG::2012-09-04 11:06:01,819::lvm::495::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-212::DEBUG::2012-09-04 11:06:01,819::misc::1090::SamplingMethod::(__call__) Returning last result


but running iscsiadm -m session will show that the connections are alive: 

[root@localhost ~]# iscsiadm -m session
tcp: [10] 10.35.64.10:3260,1 dafna-si16-01
tcp: [11] 10.35.64.10:3260,1 dafna-si16-02
tcp: [12] 10.35.64.10:3260,1 dafna-si16-03
tcp: [8] 10.35.64.11:3260,2000 iqn.1992-08.com.netapp:sn.151709391
tcp: [9] 10.35.160.7:3260,1 iqn.2008-05.com.xtremio:001b21b545c0
[root@localhost ~]# date
Tue Sep  4 11:07:17 IDT 2012
[root@localhost ~]# 


activating the host again will show that it cannot detect all the domains again:

StorageDomainDoesNotExist: Storage domain does not exist: (u'0dc1433f-72e6-4b62-9845-dc022a191f4f',)

after I manually disconnect the iscsi connection, I activate the host and it can detect the domain and everything is fine. 


root@localhost ~]# iscsiadm -m session
tcp: [10] 10.35.64.10:3260,1 dafna-si16-01
tcp: [11] 10.35.64.10:3260,1 dafna-si16-02
tcp: [12] 10.35.64.10:3260,1 dafna-si16-03
tcp: [8] 10.35.64.11:3260,2000 iqn.1992-08.com.netapp:sn.151709391
tcp: [9] 10.35.160.7:3260,1 iqn.2008-05.com.xtremio:001b21b545c0
[root@localhost ~]# iscsiadm -m node -u
Logging out of session [sid: 10, target: dafna-si16-01, portal: 10.35.64.10,3260]
Logging out of session [sid: 11, target: dafna-si16-02, portal: 10.35.64.10,3260]
Logging out of session [sid: 12, target: dafna-si16-03, portal: 10.35.64.10,3260]
Logging out of session [sid: 8, target: iqn.1992-08.com.netapp:sn.151709391, portal: 10.35.64.11,3260]
Logging out of session [sid: 9, target: iqn.2008-05.com.xtremio:001b21b545c0, portal: 10.35.160.7,3260]
Logout of [sid: 10, target: dafna-si16-01, portal: 10.35.64.10,3260] successful.
Logout of [sid: 11, target: dafna-si16-02, portal: 10.35.64.10,3260] successful.
Logout of [sid: 12, target: dafna-si16-03, portal: 10.35.64.10,3260] successful.
Logout of [sid: 8, target: iqn.1992-08.com.netapp:sn.151709391, portal: 10.35.64.11,3260] successful.
Logout of [sid: 9, target: iqn.2008-05.com.xtremio:001b21b545c0, portal: 10.35.160.7,3260] successful.
[root@localhost ~]# iscsiadm -m session
iscsiadm: No active sessions.
[root@localhost ~]# date
Tue Sep  4 11:10:06 IDT 2012


as you can see the vgs returns with all devices: 

[root@localhost ~]# vgs -o +pv_name
  VG                                   #PV #LV #SN Attr   VSize   VFree   PV                                           
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g /dev/mapper/1Dafna-si16-011346574            
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g /dev/mapper/3514f0c56958002de                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g /dev/mapper/3514f0c5695800328                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g /dev/mapper/3514f0c5695800329                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g /dev/mapper/3514f0c569580032a                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g /dev/mapper/3514f0c569580032b                
  0dc1433f-72e6-4b62-9845-dc022a191f4f   7  59   0 wz--n- 352.38g 241.50g /dev/mapper/3514f0c569580032c                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/360a98000572d45366b4a6d4156565377
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ec                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002eb                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002e9                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c56958002ed                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c5695800320                
  6a1f9f02-2f53-4d67-8f6a-567c0d01777c   7   6   0 wz--n- 347.38g 343.50g /dev/mapper/3514f0c5695800322                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/1Dafna-si16-021346574            
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c56958002e3                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c56958002e6                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c56958002e4                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c56958002e5                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c5695800326                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c5695800327                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c5695800331                
  7d1dd321-16a3-4557-99c2-7d524b60f33f   9   6   0 wz--n- 416.62g 412.75g /dev/mapper/3514f0c5695800333                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ee                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002ef                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f3                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f2                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f0                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c56958002f1                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c569580031d                
  b1f82e00-4647-4df0-8190-6ba4e811f5a5   8  23   0 wz--n- 397.00g 350.75g /dev/mapper/3514f0c569580031e                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/1Dafna-si16-031346574            
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e2                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e7                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002e8                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c56958002ea                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c5695800323                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c5695800324                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032e                
  bc7fde7a-4d43-4dd4-874a-bff5ca517bae   9   7   0 wz--n- 416.62g 410.75g /dev/mapper/3514f0c569580032f                
  vg0                                    1   3   0 wz--n- 136.24g      0  /dev/sda2                                    
[root@localhost ~]# date
Tue Sep  4 11:10:49 IDT 2012

Comment 3 Eduardo Warszawski 2012-09-30 07:49:45 UTC
http://gerrit.ovirt.org/#/c/8275/

Comment 5 Dafna Ron 2012-10-21 13:33:42 UTC
verified on si21.1

Comment 8 errata-xmlrpc 2012-12-04 19:09:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html