Bug 1281909

Summary: Errors when resizing devices after disconnecting storage server during maintenance flow
Product: [oVirt] vdsm
Reporter: Nir Soffer <nsoffer>
Component: Core
Assignee: Fred Rolland <frolland>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: medium
Priority: unspecified
Version: 4.17.11
CC: amureini, bugs, frolland, nsoffer, tnisan, ylavi
Target Milestone: ovirt-3.6.1
Target Release: 4.17.11
Flags: rule-engine: ovirt-3.6.z+, ylavi: planning_ack+, tnisan: devel_ack+, rule-engine: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
oVirt Team: Storage
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-01-19 15:37:24 UTC
Attachments: Vdsm log showing the errors

Description Nir Soffer 2015-11-13 19:01:54 UTC
Created attachment 1093791 [details]
Vdsm log showing the errors

Description of problem:

When moving a host to maintenance, vdsm disconnects all storage servers.
As part of the disconnect, vdsm performs a storage refresh operation,
which includes resizing all multipath devices.

There seems to be a race between removing the iSCSI session and the
removal of the sysfs devices, so when we enumerate devices after
disconnecting from the storage server, we get various errors:

Multipath device with no slaves:

2015-11-13 20:20:55,252 WARNING [Storage.Multipath] (jsonrpc.Executor/3) Map '3600140587a1af8ecb9e4fa9ad76f9b28' has no slaves [multipath:107(_resize_if_needed)]

This device will probably disappear soon.

Devices with a missing /sys/block/sdi/queue/logical_block_size:

2015-11-13 20:20:55,253 ERROR [Storage.Multipath] (jsonrpc.Executor/3) Could not resize device 360014052f7915069dd94a6eaf25b4edf [multipath:98(resize_devices)]
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/multipath.py", line 96, in resize_devices
    _resize_if_needed(guid)
  File "/usr/share/vdsm/storage/multipath.py", line 104, in _resize_if_needed
    for slave in devicemapper.getSlaves(name)]
  File "/usr/share/vdsm/storage/multipath.py", line 161, in getDeviceSize
    bs, phyBs = getDeviceBlockSizes(devName)
  File "/usr/share/vdsm/storage/multipath.py", line 153, in getDeviceBlockSizes
    "queue", "logical_block_size")).read())
IOError: [Errno 2] No such file or directory: '/sys/block/sdi/queue/logical_block_size'
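
The failing call in the traceback boils down to reading the slave device's
block sizes from sysfs. A minimal sketch of that step (names and layout are
illustrative, approximating rather than copying the vdsm helper named in the
traceback) shows why a device that disappears mid-enumeration produces
exactly this error:

    # Approximate sketch of the sysfs read behind getDeviceBlockSizes in the
    # traceback above; names are illustrative, not vdsm's actual code.
    import os

    def get_block_sizes(dev_name):
        """Return (logical, physical) block sizes for a SCSI device like 'sdi'.

        If the iSCSI session is torn down while we iterate, the sysfs entry
        vanishes and open() raises IOError with ENOENT - the error logged above.
        """
        queue_dir = os.path.join("/sys/block", dev_name, "queue")
        with open(os.path.join(queue_dir, "logical_block_size")) as f:
            logical = int(f.read())
        with open(os.path.join(queue_dir, "physical_block_size")) as f:
            physical = int(f.read())
        return logical, physical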

Version-Release number of selected component (if applicable):
Every version since multipath device resizing was introduced

How reproducible:
Always

Steps to Reproduce:
1. Have a setup with an iSCSI storage domain
2. Put the host into maintenance

Actual results:
Warnings and errors during the disconnect storage server flow

Expected results:
Clean disconnect

There are two issues:

1. Resizing devices is not needed after disconnect.
   This operation is needed only when:
   1. connecting to storage
   2. getting the device list
   3. performing vg operations
   Currently this operation is part of the storage refresh, which is
   performed when disconnecting from a storage server.

2. We do not wait until the iSCSI session is removed, so we race with
   the SCSI subsystem when enumerating devices.

Solving the first issue will probably eliminate the second.
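
As an illustration only (not the actual vdsm patch), until the resize is
moved out of the disconnect path the race could be tolerated by skipping
maps whose slave devices already vanished; the helper below is hypothetical:

    # Hypothetical guard, not vdsm's fix: tolerate devices that disappear
    # while we iterate, instead of logging a full traceback.
    import errno
    import logging

    log = logging.getLogger("Storage.Multipath")

    def resize_devices_tolerant(guids, resize_one):
        """Call resize_one(guid) for every multipath map, skipping maps
        whose slave devices were already removed (e.g. during disconnect).
        """
        for guid in guids:
            try:
                resize_one(guid)
            except (IOError, OSError) as e:
                if e.errno != errno.ENOENT:
                    raise
                # The slave vanished under /sys/block; the map is going
                # away together with the iSCSI session, nothing to resize.
                log.debug("Skipping map %s: slave device removed", guid)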

Comment 1 Nir Soffer 2015-11-13 19:05:54 UTC
This is a regression caused by adding support for device resizing, but it
does not affect the functionality of the system.

Comment 2 Sandro Bonazzola 2015-11-24 16:43:22 UTC
Please set target release or I can't move the bug to ON_QA automatically.

Comment 3 Red Hat Bugzilla Rules Engine 2015-11-24 18:07:49 UTC
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.

Comment 4 Elad 2015-12-01 09:27:34 UTC
Right now, vdsm doesn't disconnect its iSCSI sessions upon storage domain deactivation due to BZ #1279485.

Fred/Nir, should we wait for the fix of BZ #1279485 in order to test the scenario described here, or is it OK to test it by manually disconnecting the iSCSI sessions as a workaround?

Thanks

Comment 5 Fred Rolland 2015-12-01 10:21:17 UTC
Elad, this issue appeared only after the fix in BZ #1279485.
I don't think you can test it without it.

Comment 6 Elad 2016-01-18 08:31:36 UTC
No IOError while putting host to maintenance on a DC with active iSCSI domains.
Host moves to maintenance and iSCSI sessions get disconnected successfully.

Verified with:
vdsm-4.17.17-0.el7ev.noarch
rhevm-3.6.2.5-0.1.el6.noarch