Bug 1001626
| Field | Value |
|---|---|
| Summary | Failed Remove Storage Domain after restart “ovirt-engine” service |
| Product | Red Hat Enterprise Virtualization Manager |
| Reporter | vvyazmin <vvyazmin> |
| Component | ovirt-engine |
| Assignee | Maor <mlipchuk> |
| Status | CLOSED CURRENTRELEASE |
| QA Contact | Leonid Natapov <lnatapov> |
| Severity | medium |
| Priority | medium |
| Version | 3.3.0 |
| CC | abaron, acanan, acathrow, amureini, bazulay, iheim, lpeer, mlipchuk, ratamir, Rhev-m-bugs, scohen, yeylon |
| Keywords | Triaged |
| Target Release | 3.3.0 |
| Flags | abaron: Triaged+ |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | storage |
| Fixed In Version | is20.2 |
| Doc Type | Bug Fix |
| Type | Bug |
| oVirt Team | Storage |
| Bug Blocks | 1026487 |
Remove storage domain and format storage domain should roll forward. If a removal is interrupted, the SD should still be removed from the engine, since it is probably unusable anyway; in the worst case, some leftovers will remain on the storage.

According to the flow, the *engine* was restarted while VDSM was performing a formatStorageDomain operation. After the engine came back up, it sent the command again, and VDSM reported that it could not find this storage domain. This sounds like an engine issue to me.

The engine removes the storage domain only after it has been removed from VDSM. The proposed solution: if the engine gets "Storage domain does not exist" when calling formatStorageDomain, it should roll forward and remove the storage domain.

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS14

Environment:
Host OS: RHEL 6.5
RHEVM: rhevm-3.3.0-0.21.master.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.13-1.el6ev.noarch
VDSM: vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-23.el6.bz964359.eblake.1.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.401.el6.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Remove a Storage Domain and reboot the RHEVM host.
2. After the “START, FormatStorageDomainVDSCommand” entry appears, restart the “ovirt-engine” service.

Actual results:
Removing the domain fails once WebAdmin is back. The SD is removed from VDSM, but not from the RHEVM DB.

Expected results:
The domain is removed successfully once WebAdmin is back.
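The roll-forward behavior proposed above can be sketched as follows. This is a minimal illustration written in Python for brevity, even though ovirt-engine itself is Java; `FakeVdsm`, `FakeEngine`, `VdsmError` and `remove_storage_domain` are hypothetical stand-ins, not engine or VDSM APIs. The only detail taken from the logs is that VDSM reports a missing domain with status code 358 (StorageDomainDoesNotExist). The sketch assumes the engine treats that code on formatStorageDomain as "already removed" and deletes its own record, while any other VDSM error still aborts the removal.

```python
from dataclasses import dataclass, field

# VDSM status code reported for "Storage domain does not exist" in the engine.log excerpts.
STORAGE_DOMAIN_DOES_NOT_EXIST = 358


class VdsmError(Exception):
    """Hypothetical stand-in for the engine's VDSErrorException."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


@dataclass
class FakeVdsm:
    """Hypothetical VDSM stub: tracks which storage domains exist on the host."""
    domains: set = field(default_factory=set)

    def format_storage_domain(self, sd_id: str) -> None:
        if sd_id not in self.domains:
            raise VdsmError(STORAGE_DOMAIN_DOES_NOT_EXIST,
                            f"Storage domain does not exist: ('{sd_id}',)")
        self.domains.remove(sd_id)


@dataclass
class FakeEngine:
    """Hypothetical engine: its DB record is deleted only after VDSM formats the SD."""
    vdsm: FakeVdsm
    db: set = field(default_factory=set)

    def remove_storage_domain(self, sd_id: str) -> None:
        try:
            self.vdsm.format_storage_domain(sd_id)
        except VdsmError as err:
            if err.code != STORAGE_DOMAIN_DOES_NOT_EXIST:
                raise  # any other VDSM failure still aborts the removal
            # Roll forward: VDSM no longer knows the SD (for example, a previous
            # format completed before the engine restarted), so finish the removal.
        self.db.discard(sd_id)


if __name__ == "__main__":
    sd = "fd2bb500-0744-405d-9860-d2b3824f13cf"
    vdsm = FakeVdsm({sd})
    engine = FakeEngine(vdsm, {sd})
    vdsm.format_storage_domain(sd)    # first attempt formats, then the engine restarts
    engine.remove_storage_domain(sd)  # retry: VDSM answers 358, engine rolls forward
    print(sd in engine.db)            # False
```

Only code 358 is treated as success, so the roll-forward is limited to the interrupted-removal case described in this bug; other VDSM errors propagate unchanged.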
Additional info:
/var/log/ovirt-engine/engine.log
2013-09-16 16:02:27,761 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Failed in FormatStorageDomainVDS method
2013-09-16 16:02:27,762 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,763 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)]]
2013-09-16 16:02:27,763 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) HostName = tigris01.scl.lab.tlv.redhat.com
2013-09-16 16:02:27,807 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,807 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) FINISH, FormatStorageDomainVDSCommand, log id: 47105bd3
2013-09-16 16:02:27,808 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',) (Failed with error StorageDomainDoesNotExist and code 358)
2013-09-16 16:02:27,850 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) Correlation ID: 76be222, Job ID: e25df7b6-40a8-4ae0-9e4e-065d2934264c, Call Stack: null, Custom Event ID: -1, Message: Failed to remove Storage Domain SD-FCP-04. (User: admin@internal)
2013-09-16 16:02:27,905 INFO [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Lock freed to object EngineLock [exclusiveLocks= key: fd2bb500-0744-405d-9860-d2b3824f13cf value: STORAGE

Created attachment 798656
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm
Tested on is23: fixed. Tested according to the steps to reproduce; the SD was removed without problems after restarting ovirt-engine.

Closing - RHEV 3.3 Released

*** Bug 1019295 has been marked as a duplicate of this bug. ***
Created attachment 790947
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Description of problem:
Failed to remove a Storage Domain after restarting the “ovirt-engine” service.

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS11

Environment:
RHEVM: rhevm-3.3.0-0.16.master.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.11-1.el6ev.noarch
VDSM: vdsm-4.12.0-72.git287bb7e.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an unattached Storage Domain (SD).
2. Remove the SD (RemoveStorageDomain command).
3. During the FormatStorageDomain command, restart the “ovirt-engine” service.
4. Removing the SD fails (as expected).
5. Once WebAdmin is back, run remove SD again.

Actual results:
Removing the SD fails on the FormatStorageDomain command.

Expected results:
The SD is removed successfully.

Impact on user:
The SD cannot be removed.

Workaround:
Run the ForceRemoveStorageDomain command.

Additional info:
/var/log/ovirt-engine/engine.log
2013-08-27 14:57:24,968 INFO [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Running command: RemoveStorageDomainCommand internal: false . Entities affected : ID: 5aa0e6b6-6969-4c81-b676-db85d548249a Type: Storage
2013-08-27 14:57:24,974 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) START, ConnectStorageServerVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: f7e66fe5-e840-4987-a339-03234a63d57a, connection: 10.35.160.7, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 128cf66
2013-08-27 14:57:25,639 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, ConnectStorageServerVDSCommand, return: {f7e66fe5-e840-4987-a339-03234a63d57a=0}, log id: 128cf66
2013-08-27 14:57:25,641 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) START, FormatStorageDomainVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storageDomainId=5aa0e6b6-6969-4c81-b676-db85d548249a), log id: 49710f12
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Failed in FormatStorageDomainVDS method
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)]]
2013-08-27 14:57:44,281 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) HostName = tigris01.scl.lab.tlv.redhat.com
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, FormatStorageDomainVDSCommand, log id: 49710f12
2013-08-27 14:57:44,282 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',) (Failed with error StorageDomainDoesNotExist and code 358)

/var/log/vdsm/vdsm.log