Bug 1281909 - Errors when resizing devices after disconnecting storage server during maintenance flow
Product: vdsm
Classification: oVirt
Component: Core
Hardware: Unspecified OS: Unspecified
Priority: unspecified Severity: medium
Target Milestone: ovirt-3.6.1
Target Release: 4.17.11
Assigned To: Fred Rolland
Depends On:
Reported: 2015-11-13 14:01 EST by Nir Soffer
Modified: 2016-01-19 10:37 EST
CC: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-01-19 10:37:24 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt-3.6.z+
ylavi: planning_ack+
tnisan: devel_ack+
rule-engine: testing_ack+

Attachments
Vdsm log showing the errors (1.29 MB, text/plain)
2015-11-13 14:01 EST, Nir Soffer

External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 48578 master MERGED hsm : Do not resize on disconnectStorageServer Never
oVirt gerrit 48705 ovirt-3.6 MERGED hsm : Do not resize on disconnectStorageServer Never

Description Nir Soffer 2015-11-13 14:01:54 EST
Created attachment 1093791 [details]
Vdsm log showing the errors

Description of problem:

When moving a host to maintenance, vdsm disconnects all storage servers.
As part of the disconnect, vdsm performs a storage refresh operation.
This includes resizing all multipath devices.

There seems to be a race between removing the iSCSI session and the
removal of the sysfs devices, so when we enumerate devices after
disconnecting from the storage server, we get various errors:

Multipath device with no slaves:

2015-11-13 20:20:55,252 WARNING [Storage.Multipath] (jsonrpc.Executor/3) Map '3600140587a1af8ecb9e4fa9ad76f9b28' has no slaves [multipath:107(_resize_if_needed)]

This device will probably disappear soon.

Devices with a missing /sys/block/sdi/queue/logical_block_size:

2015-11-13 20:20:55,253 ERROR [Storage.Multipath] (jsonrpc.Executor/3) Could not resize device 360014052f7915069dd94a6eaf25b4edf [multipath:98(resize_devices)]
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/multipath.py", line 96, in resize_devices
  File "/usr/share/vdsm/storage/multipath.py", line 104, in _resize_if_needed
    for slave in devicemapper.getSlaves(name)]
  File "/usr/share/vdsm/storage/multipath.py", line 161, in getDeviceSize
    bs, phyBs = getDeviceBlockSizes(devName)
  File "/usr/share/vdsm/storage/multipath.py", line 153, in getDeviceBlockSizes
    "queue", "logical_block_size")).read())
IOError: [Errno 2] No such file or directory: '/sys/block/sdi/queue/logical_block_size'
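
For context, the failing code path is essentially an unguarded sysfs read.
A minimal sketch, reconstructed from the traceback above (this is not the
exact vdsm source; only the sysfs paths are taken from the error message):

import os

SYS_BLOCK = "/sys/block"

def get_device_block_sizes(dev_name):
    # Read the logical and physical block sizes from sysfs. If the
    # iSCSI session was just removed, /sys/block/<dev> may already be
    # gone, and open() raises the ENOENT IOError seen in the log.
    queue = os.path.join(SYS_BLOCK, dev_name, "queue")
    with open(os.path.join(queue, "logical_block_size")) as f:
        logical = int(f.read())
    with open(os.path.join(queue, "physical_block_size")) as f:
        physical = int(f.read())
    return logical, physical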

Version-Release number of selected component (if applicable):
Any version since multipath device resizing was introduced.

How reproducible:

Steps to Reproduce:
1. Set up a host with an iSCSI storage domain
2. Put the host into maintenance

Actual results:
Warnings and errors during the disconnect storage server flow

Expected results:
Clean disconnect

There are two issues:

1. Resizing devices is not needed after disconnect.
   This operation is needed only when:
   1. connecting to storage
   2. getting the device list
   3. performing VG operations
   Currently this operation is part of the storage refresh, which is needed
   when disconnecting from a storage server.

2. We don't wait until the iSCSI session is removed, racing with the
   SCSI subsystem when enumerating devices.

Solving the first issue will probably eliminate the second.
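
The merged patches listed under External Trackers take the first route.
A minimal sketch of that direction, assuming a resize flag threaded
through the refresh path (the helper names here are illustrative stubs;
only resize_devices corresponds to a function visible in the traceback):

import logging

log = logging.getLogger("Storage.HSM")

def rescan_iscsi():
    pass  # stub: rescan iSCSI sessions

def rescan_multipath():
    pass  # stub: rescan multipath maps

def resize_devices():
    pass  # stub: the multipath resize seen in the traceback

def refresh_storage(resize=True):
    # A refresh is still wanted on disconnect so stale devices are
    # dropped, but resizing only makes sense while connected: when
    # connecting, getting the device list, or doing VG operations.
    rescan_iscsi()
    rescan_multipath()
    if resize:
        resize_devices()

def disconnect_storage_server(connections):
    for con in connections:
        log.info("Disconnecting %s", con)
        # stub: tear down the session for this connection
    # The devices behind the dropped sessions are about to vanish;
    # resizing them now can only race with sysfs cleanup and fail.
    refresh_storage(resize=False)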
Comment 1 Nir Soffer 2015-11-13 14:05:54 EST
This is a regression caused by adding support for device resizing, but it
does not affect the functionality of the system.
Comment 2 Sandro Bonazzola 2015-11-24 11:43:22 EST
Please set target release or I can't move the bug to ON_QA automatically.
Comment 3 Red Hat Bugzilla Rules Engine 2015-11-24 13:07:49 EST
Bug tickets that are moved to testing must have target release set to make sure tester knows what to test. Please set the correct target release before moving to ON_QA.
Comment 4 Elad 2015-12-01 04:27:34 EST
Right now, vdsm doesn't disconnect its iSCSI sessions upon storage domain deactivation due to BZ #1279485.

Fred/Nir, should we wait for the fix of BZ #1279485 in order to test the scenario described here, or will it be OK to test it with manual disconnection of the iSCSI sessions as a workaround?

Comment 5 Fred Rolland 2015-12-01 05:21:17 EST
Elad, this issue appeared only after the fix for BZ #1279485.
I don't think you can test it without that fix.
Comment 6 Elad 2016-01-18 03:31:36 EST
No IOError while putting host to maintenance on a DC with active iSCSI domains.
Host moves to maintenance and iSCSI sessions get disconnected successfully.

Verified with:
