Bug 1613282 - Failed to hot-Unplug disk from VM with Code 46 (Timeout detaching)

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Yosi Ben Shimon <ybenshim> |
| Component: | BLL.Storage | Assignee: | Arik <ahadas> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Liran Rotenberg <lrotenbe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.2.5 | CC: | ahadas, bugs, ebenahar, eshames, michal.skrivanek, ratamir, rbarry, ybenshim |
| Target Milestone: | ovirt-4.2.6 | Keywords: | Automation, AutomationBlocker, Regression |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.2+, rule-engine: blocker+ |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.2.6.4 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-03 15:07:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Yosi Ben Shimon
2018-08-07 11:15:48 UTC
Moving to Virt for inspection as it might be related to DomainXML changes.

Created attachment 1473977 [details]: logs

The first error is not related. I would suspect a guest cooperation issue first, as it seems to simply time out; libvirt and guest logs would help.

The second item, about the upstream user report, is unrelated. It seems to be about an inability to hot-add a disk due to an underlying (and likely pre-existing) libvirt issue. Without logs or any other details there is nothing to look at. The users-list discussion points to the -U parameter of qemu-img, which I believe has long been fixed.

The third item, about the missing alias, should be tracked separately; please do not mix two issues in one bug. There is nothing about it in the attached logs anyway. Please do not mix unrelated issues in one report.

So, if we talk about the first issue only, please provide additional logs and a reproduction scenario. Without further data this bug will likely get closed.

Created attachment 1474372 [details]: engine + vdsm + libvirt + art logs

Hi Michal, we will keep this bug for the third issue (for QE info: reproduced using TestCase 5044).
Attached logs for this issue.
In the vdsm log we see the following exception:
2018-07-25 23:36:02,978+0300 INFO (jsonrpc/4) [api.virt] START hotunplugDisk(params={'xml': '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><hotunplug><devices><disk><alias name=""/></disk></devices></hotunplug>', 'vmId': '58e0bb05-f526-4e59-8cfd-0bd497a07dc3'}) from=::ffff:10.46.16.248,33718, flow_id=diskattachments_update_0472abcd-b37f, vmId=58e0bb05-f526-4e59-8cfd-0bd497a07dc3 (api:46)
2018-07-25 23:36:02,980+0300 ERROR (jsonrpc/4) [api] FINISH hotunplugDisk error=('Unrecognized device name: %s', '') (api:132)
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 122, in method
ret = func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 419, in hotunplugDisk
return self.vm.hotunplugDisk(params)
File "<string>", line 2, in hotunplugDisk
File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 99, in method
return func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3902, in hotunplugDisk
diskParams = storagexml.parse(elem, meta)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/storagexml.py", line 91, in parse
add_vdsm_parameters(params)
File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/storagexml.py", line 98, in add_vdsm_parameters
_, params['index'] = drivename.split(params['name'])
File "/usr/lib/python2.7/site-packages/vdsm/virt/vmdevices/drivename.py", line 51, in split
raise ValueError('Unrecognized device name: %s', devname)
ValueError: ('Unrecognized device name: %s', '')
2018-07-25 23:36:02,982+0300 INFO (jsonrpc/4) [api.virt] FINISH hotunplugDisk return={'status': {'message': 'General Exception: ("(\'Unrecognized device name: %s\', \'\')",)', 'code': 100}}
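For context on the traceback above: vdsm maps a guest drive name such as 'vda' or 'sdb' to a device-interface prefix and an index, and it cannot do that for an empty string, which is what it receives when the request carries an empty `<alias name=""/>`. Below is an illustrative sketch of that kind of parser; it is not the actual vdsm drivename module, only a demonstration of why the empty name is rejected:

```python
import re

# Illustrative sketch only -- not the real vdsm drivename module.
# A guest drive name like 'vda' or 'sdb' consists of a two-letter
# device prefix ('vd', 'sd', 'hd', ...) and a letter suffix that
# encodes the drive index in bijective base 26 ('a' -> 0, 'z' -> 25,
# 'aa' -> 26, ...).
_NAME_RE = re.compile(r'^([a-z]{2})([a-z]+)$')

def split(devname):
    """Split a drive name into (prefix, index); reject anything else."""
    m = _NAME_RE.match(devname)
    if m is None:
        # An empty string -- the result of an empty <alias name=""/>
        # in the hotunplug request -- ends up here.
        raise ValueError('Unrecognized device name: %s', devname)
    prefix, suffix = m.groups()
    index = 0
    for ch in suffix:
        index = index * 26 + (ord(ch) - ord('a') + 1)
    return prefix, index - 1

print(split('vda'))  # ('vd', 0)
split('')            # raises ValueError('Unrecognized device name: %s', '')
```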
The steps to reproduce are (an illustrative sketch of the equivalent SDK calls follows the list):
1. Create and start a VM
2. Create 8 thin-provisioned disks of 1 GB each
3. Attach all disks to the VM
4. Deactivate a randomly selected disk (x) from the VM
5. Activate disk x
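For reference, a hedged sketch of what steps 4-5 look like through the Python SDK (ovirtsdk4): deactivating and reactivating a disk attachment maps to updating its `active` flag. The connection details and VM name below are placeholders, and the automated test (TestCase 5044) may drive this through a different layer:

```python
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details -- adjust for a real environment.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=test-vm')[0]  # hypothetical VM name
attachments_service = vms_service.vm_service(vm.id).disk_attachments_service()

# Pick one attached disk and flip it inactive (hot-unplug), then active again.
attachment = attachments_service.list()[0]
attachment_service = attachments_service.attachment_service(attachment.id)
attachment_service.update(types.DiskAttachment(active=False))  # step 4: deactivate
attachment_service.update(types.DiskAttachment(active=True))   # step 5: activate

connection.close()
```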
Arik, this is likely related to your recent change. How is it possible that the alias is empty?

(In reply to Michal Skrivanek from comment #5)
> Arik, this is likely related to your recent change. How is it possible that
> the alias is empty?

I see two possible reasons for this:
1. The correlation logic is incorrect.
2. The unplug operation was triggered before the engine managed to receive the up-to-date devices.

The right fix, though, is to change the hot-plug disk command to work like hot-plug NIC: we can update the alias without waiting for the devices monitoring, since in 4.2 clusters (and above) we know the device alias is based on the device-id.

What's your estimation of the time required for a fix?

Hi Ryan, this bug is marked as a regression, therefore it should be targeted for 4.2.6.

(In reply to Ryan Barry from comment #7)
> What's your estimation of the time required for a fix?

Should be fairly simple, something like an hour or two.

(In reply to Elad from comment #8)
> Hi Ryan, this bug is marked as a regression, therefore it should be targeted
> for 4.2.6.

Note that the test does not represent a realistic scenario, though: the disk is unplugged 3 seconds after it was plugged, so it is unlikely to affect users.

Verified on:
ovirt-engine-4.2.6.4-0.0.master.20180821115903.git1327b2f.el7.noarch
vdsm-4.20.37-3.git924eec4.el7.x86_64

Steps to reproduce: run TestCase 5044 of the storage team.
1. Create and start a VM
2. Create 8 thin-provisioned disks of 1 GB each
3. Attach all disks to the VM
4. Deactivate a randomly selected disk (x) from the VM
5. Activate disk x

Results: no errors; hot-unplugging the disk from the VM succeeded.

QE verification bot: the bug was verified upstream.
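Purely as an illustration of the fix direction described above (the engine filling in the alias from the device id instead of waiting for devices monitoring), here is a minimal sketch. The `ua-` prefix is an assumption about how user-defined aliases are formed for 4.2+ cluster levels, and `alias_for_disk` is a hypothetical helper, not engine code:

```python
# Hypothetical helper, not actual ovirt-engine code: for 4.2+ cluster
# levels the libvirt alias of a managed device is assumed to be derived
# from its oVirt device id, so the engine can set it immediately on
# hot-plug instead of waiting for the next devices-monitoring cycle.
USER_ALIAS_PREFIX = 'ua-'  # assumed convention

def alias_for_disk(device_id):
    return USER_ALIAS_PREFIX + device_id

# With the alias known up front, the hotunplug request no longer carries
# an empty <alias name=""/>:
example_id = '123e4567-e89b-12d3-a456-426614174000'  # placeholder device id
print('<alias name="%s"/>' % alias_for_disk(example_id))
```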