Bug 1281909
Summary: | Errors when resizing devices after disconnecting storage server during maintenance flow
---|---
Product: | [oVirt] vdsm
Component: | Core
Reporter: | Nir Soffer <nsoffer>
Assignee: | Fred Rolland <frolland>
QA Contact: | Elad <ebenahar>
CC: | amureini, bugs, frolland, nsoffer, tnisan, ylavi
Status: | CLOSED CURRENTRELEASE
Severity: | medium
Priority: | unspecified
Version: | 4.17.11
Target Milestone: | ovirt-3.6.1
Target Release: | 4.17.11
Flags: | rule-engine: ovirt-3.6.z+, ylavi: planning_ack+, tnisan: devel_ack+, rule-engine: testing_ack+
Hardware: | Unspecified
OS: | Unspecified
Whiteboard: | storage
Doc Type: | Bug Fix
Type: | Bug
oVirt Team: | Storage
Last Closed: | 2016-01-19 15:37:24 UTC
This is a regression caused by adding support for device resizing, but it does not affect the functionality of the system.

Please set target release or I can't move the bug to ON_QA automatically.

Bug tickets that are moved to testing must have target release set to make sure the tester knows what to test. Please set the correct target release before moving to ON_QA.

Right now, vdsm doesn't disconnect its iSCSI sessions upon storage domain deactivation due to BZ #1279485. Fred/Nir, should we wait for the fix of BZ #1279485 in order to test the scenario described here, or will it be OK to test it by disconnecting the iSCSI sessions manually as a workaround? Thanks

Elad, this issue appeared only after the fix in BZ #1279485. I don't think you can test it without it.

No IOError while putting host to maintenance on a DC with active iSCSI domains. Host moves to maintenance and iSCSI sessions get disconnected successfully.
Verified with:
vdsm-4.17.17-0.el7ev.noarch
rhevm-3.6.2.5-0.1.el6.noarch
Created attachment 1093791
Vdsm log showing the errors

Description of problem:

When moving a host to maintenance, vdsm disconnects all storage servers. As part of the disconnect, vdsm performs a storage refresh operation. This includes resizing all multipath devices.

There seems to be a race between removing the iSCSI session and the removal of the sysfs devices, so when we enumerate devices after disconnecting from a storage server, we get various errors:

Multipath device with no slaves:

    2015-11-13 20:20:55,252 WARNING [Storage.Multipath] (jsonrpc.Executor/3) Map '3600140587a1af8ecb9e4fa9ad76f9b28' has no slaves [multipath:107(_resize_if_needed)]

This device will probably disappear soon.

Devices with a missing /sys/block/sdi/queue/logical_block_size:

    2015-11-13 20:20:55,253 ERROR [Storage.Multipath] (jsonrpc.Executor/3) Could not resize device 360014052f7915069dd94a6eaf25b4edf [multipath:98(resize_devices)]
    Traceback (most recent call last):
      File "/usr/share/vdsm/storage/multipath.py", line 96, in resize_devices
        _resize_if_needed(guid)
      File "/usr/share/vdsm/storage/multipath.py", line 104, in _resize_if_needed
        for slave in devicemapper.getSlaves(name)]
      File "/usr/share/vdsm/storage/multipath.py", line 161, in getDeviceSize
        bs, phyBs = getDeviceBlockSizes(devName)
      File "/usr/share/vdsm/storage/multipath.py", line 153, in getDeviceBlockSizes
        "queue", "logical_block_size")).read())
    IOError: [Errno 2] No such file or directory: '/sys/block/sdi/queue/logical_block_size'

Version-Release number of selected component (if applicable):
Any version since multipath device resizing is supported

How reproducible:
Always

Steps to Reproduce:
1. Get a setup with an iSCSI storage domain
2. Put the host to maintenance

Actual results:
Warnings and errors during the disconnect storage server flow

Expected results:
Clean disconnect

There are two issues:

1. Resizing devices is not needed after disconnect. This operation is needed only when:
   1. connecting to storage
   2. getting the device list
   3. performing VG operations

   Currently this operation is part of storage refresh, which is needed when disconnecting from a storage server.

2. We don't wait until the iSCSI session is removed, so we race with the SCSI subsystem when enumerating devices.

Solving the first issue will probably eliminate the second.
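
To illustrate the first issue, here is a minimal sketch of separating device resizing from the storage refresh, so that the disconnect flow can refresh without resizing. It is not the actual vdsm code or the actual fix: the class StorageRefresher, its methods, and the injected rescan and resize_devices callables are hypothetical names used only for illustration.

```python
class StorageRefresher(object):
    """Hypothetical stand-in for the storage refresh flow (not vdsm code)."""

    def __init__(self, rescan, resize_devices):
        # Both callables are injected so the flow can be exercised
        # without touching real storage.
        self._rescan = rescan
        self._resize_devices = resize_devices

    def refresh(self, resize=True):
        # Rescanning is always needed; resizing only when the caller asks.
        self._rescan()
        if resize:
            self._resize_devices()

    def connect_storage_server(self):
        # New or grown devices may appear after connecting, so resize.
        self.refresh(resize=True)

    def disconnect_storage_server(self):
        # Devices are going away; resizing them only races with removal.
        self.refresh(resize=False)
```

With this shape, the disconnect flow never enumerates slave devices for resizing, so it would not hit the missing sysfs entries described above.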
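
For the second issue, the following sketch shows two generic ways to tolerate the race with sysfs removal: treating a vanished sysfs attribute as "device is gone" instead of an error, and polling until a sysfs path disappears. Both helpers (read_logical_block_size and wait_for_removal) and their parameters are assumptions made for this example; they are not functions from vdsm.

```python
import errno
import os
import time


def read_logical_block_size(dev_name, sys_block="/sys/block"):
    """Return the logical block size, or None if the device vanished."""
    path = os.path.join(sys_block, dev_name, "queue", "logical_block_size")
    try:
        with open(path) as f:
            return int(f.read())
    except IOError as e:
        # ENOENT means the sysfs entry was removed while we were
        # enumerating; any other error is still reported.
        if e.errno == errno.ENOENT:
            return None
        raise


def wait_for_removal(path, timeout=5.0, interval=0.1):
    """Poll until path no longer exists; return False on timeout."""
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller could, for example, wait for the session directory under /sys/class/iscsi_session/ to disappear before enumerating devices; which path to wait on, and the timeout values, are assumptions that would need to be checked against the real disconnect flow.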