Bug 2123008
Summary: | engine qemu-nbd lock virtual disk even with process failed | ||
---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | D. Ercolani <licenze+RedHat> |
Component: | Backup-Restore.Engine | Assignee: | Benny Zlotnik <bzlotnik> |
Status: | CLOSED NEXTRELEASE | QA Contact: | Shir Fishbain <sfishbai> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.5.2.1 | CC: | ahadas, bugs, bzlotnik, dfodor |
Target Milestone: | ovirt-4.5.3 | Flags: | pm-rhel: ovirt-4.5? |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | ovirt-engine-4.5.3 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2022-09-19 14:31:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
D. Ercolani
2022-08-31 14:36:00 UTC
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align relevant severity, flags and keywords to raise PM_Score or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).

Reproducer:
1. Start downloading a disk.
2. During the download, stop the ovirt-engine and ovirt-imageio services.
3. Start both again.

The problem is in line [1]:
if (getParameters().getTransferClientType().isBrowserTransfer()) {

where getTransferClientType() returns null (probably after the parameters are reloaded following the engine restart).

[1] https://github.com/oVirt/ovirt-engine/blob/d318da21f71653ea095cfa5b30552d6ea0c74787/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/disk/image/TransferDiskImageCommand.java#L1357

The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

(In reply to Benny Zlotnik from comment #2)
> Reproducer:
> 1. Start downloading a disk
> 2. During download stop ovirt-engine and the ovirt-imageio service
> 3. Start both again
>
> The problem is in line[1]:
> if (getParameters().getTransferClientType().isBrowserTransfer()) {
>
> Where getTransferClientType() is null (probably after reloading the
> parameters after engine restart)
> [1]
> https://github.com/oVirt/ovirt-engine/blob/d318da21f71653ea095cfa5b30552d6ea0c74787/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/disk/image/TransferDiskImageCommand.java#L1357

I think I understand how I ran into this situation: I keep seeing recurring "hangs" of the ovirt-engine related to some lock in the Gluster implementation. In vdsm.log I have many entries like these:

2022-09-06 20:33:48,960+0000 ERROR (qgapoller/0) [virt.periodic.Operation] <bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7fb17130e4a8>> operation failed (periodic:204)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/periodic.py", line 202, in __call__
    self._func()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 493, in _poller
    vm_id, self._qga_call_get_vcpus(vm_obj))
  File "/usr/lib/python3.6/site-packages/vdsm/virt/qemuguestagent.py", line 814, in _qga_call_get_vcpus
    if 'online' in vcpus:
TypeError: argument of type 'NoneType' is not iterable
2022-09-06 20:33:51,358+0000 ERROR (check/loop) [storage.monitor] Error checking path /rhev/data-center/mnt/glusterSD/localhost:_glen/3577c21e-f757-4405-97d1-0f827c9b4e22/dom_md/metadata (monitor:511)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 509, in _pathChecked
    delay = result.delay()
  File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 398, in delay
    raise exception.MiscFileReadException(self.path, self.rc, self.err)
vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/glusterSD/localhost:_glen/3577c21e-f757-4405-97d1-0f827c9b4e22/dom_md/metadata', 1, 'Read timeout')
2022-09-06 20:33:51,358+0000 INFO (check/loop) [storage.monitor] Domain 3577c21e-f757-4405-97d1-0f827c9b4e22 became INVALID (monitor:482)

If this happens while I'm backing up a VM, the hang can trigger this problem.

This bug has low overall severity and passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.
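The failure described in comment #2 is a plain null dereference. After the engine restart, getTransferClientType() presumably comes back null once the command parameters are reloaded, so chaining .isBrowserTransfer() onto it throws. The following is a minimal, self-contained sketch of a null-safe guard. TransferClientType, TransferParameters, their BROWSER/API constants and the isBrowserTransfer(...) helpers are hypothetical stand-ins, not the real ovirt-engine classes. It is also not necessarily how the fix shipped in ovirt-engine-4.5.3 was written.

```java
// Hypothetical stand-ins for the engine types referenced in comment #2;
// the real ovirt-engine classes and constants may differ.
public class TransferClientTypeGuard {

    enum TransferClientType {
        BROWSER(true),
        API(false);

        private final boolean browserTransfer;

        TransferClientType(boolean browserTransfer) {
            this.browserTransfer = browserTransfer;
        }

        boolean isBrowserTransfer() {
            return browserTransfer;
        }
    }

    static class TransferParameters {
        // Assumption (from comment #2): this can be null when the
        // parameters are restored after an engine restart.
        private TransferClientType transferClientType;

        TransferClientType getTransferClientType() {
            return transferClientType;
        }

        void setTransferClientType(TransferClientType type) {
            this.transferClientType = type;
        }
    }

    // Same shape as the quoted line: throws NullPointerException
    // when the client type is missing.
    static boolean isBrowserTransferUnsafe(TransferParameters params) {
        return params.getTransferClientType().isBrowserTransfer();
    }

    // Null-safe variant: a missing client type is treated as
    // "not a browser transfer" (an assumed default).
    static boolean isBrowserTransfer(TransferParameters params) {
        TransferClientType type = params.getTransferClientType();
        return type != null && type.isBrowserTransfer();
    }

    public static void main(String[] args) {
        // Simulates parameters reloaded without the client type.
        TransferParameters reloaded = new TransferParameters();
        System.out.println(isBrowserTransfer(reloaded)); // prints false

        try {
            isBrowserTransferUnsafe(reloaded);
        } catch (NullPointerException e) {
            System.out.println("unsafe check threw NullPointerException");
        }
    }
}
```

Treating a missing value as "not a browser transfer" is only one possible choice. Another option is to persist the client type with the command parameters so that it survives the restart, which would remove the null case instead of guarding against it.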