Bug 1001626 - Failed Remove Storage Domain after restart “ovirt-engine” service
Summary: Failed Remove Storage Domain after restart “ovirt-engine” service
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assignee: Maor
QA Contact: Leonid Natapov
URL:
Whiteboard: storage
Duplicates: 1019295
Depends On:
Blocks: 3.3snap1
Reported: 2013-08-27 12:24 UTC by vvyazmin@redhat.com
Modified: 2016-02-10 20:34 UTC
CC List: 12 users

Fixed In Version: is20.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:
abaron: Triaged+


Attachments
Logs rhevm, vdsm, libvirt, thread dump, superVdsm (5.46 MB, application/x-gzip)
2013-08-27 12:24 UTC, vvyazmin@redhat.com
no flags
Logs rhevm, vdsm, libvirt, thread dump, superVdsm (2.45 MB, application/x-gzip)
2013-09-17 07:54 UTC, vvyazmin@redhat.com
no flags


Links
oVirt gerrit 20158: MERGED, "core: Remove non-existent Storage Domain will be rolled forward." (last updated 2020-10-21 19:11:25 UTC)
oVirt gerrit 20333: MERGED, "core: Remove non-existent Storage Domain will be rolled forward." (last updated 2020-10-21 19:11:26 UTC)

Description vvyazmin@redhat.com 2013-08-27 12:24:52 UTC
Created attachment 790947
Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Description of problem:
Removing a Storage Domain fails after restarting the “ovirt-engine” service

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS11 environment:

RHEVM:  rhevm-3.3.0-0.16.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.11-1.el6ev.noarch
VDSM:  vdsm-4.12.0-72.git287bb7e.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an unattached Storage Domain (SD).
2. Remove the SD (RemoveStorageDomain command).
3. While the FormatStorageDomain command is running, restart the “ovirt-engine” service.
4. Removing the SD fails (as expected).
5. Once webadmin is back, remove the SD again.

Actual results:
Removing the SD fails: the FormatStorageDomain command fails.

Expected results:
The SD is removed successfully.

Impact on user:
The SD cannot be removed.

Workaround:
Run the ForceRemoveStorageDomain command.

Additional info:

/var/log/ovirt-engine/engine.log
2013-08-27 14:57:24,968 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Running command: RemoveStorageDomainCommand internal: false. Entities affected :  ID: 5aa0e6b6-6969-4c81-b676-db85d548249a Type: Storage
2013-08-27 14:57:24,974 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) START, ConnectStorageServerVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: f7e66fe5-e840-4987-a339-03234a63d57a, connection: 10.35.160.7, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 128cf66
2013-08-27 14:57:25,639 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, ConnectStorageServerVDSCommand, return: {f7e66fe5-e840-4987-a339-03234a63d57a=0}, log id: 128cf66
2013-08-27 14:57:25,641 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) START, FormatStorageDomainVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storageDomainId=5aa0e6b6-6969-4c81-b676-db85d548249a), log id: 49710f12
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Failed in FormatStorageDomainVDS method
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)]]
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) HostName = tigris01.scl.lab.tlv.redhat.com
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, FormatStorageDomainVDSCommand, log id: 49710f12
2013-08-27 14:57:44,282 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',) (Failed with error StorageDomainDoesNotExist and code 358)

/var/log/vdsm/vdsm.log

Comment 1 Allon Mureinik 2013-08-28 07:33:08 UTC
Remove storage domain and format storage domain should roll forward.
If a removal is interrupted, the SD should be removed from the engine, since it is probably unusable anyway; in the worst case there will be some leftovers on the storage.
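A minimal sketch of the roll-forward policy described in Comment 1, assuming a hypothetical engine that marks a domain as REMOVING in its DB before asking the host to format it, and that finishes any such removal at startup instead of undoing it. `EngineDb`, `DomainStatus`, `beginRemoval` and `rollForwardInterruptedRemovals` are illustrative names only, not oVirt engine classes or APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-domain status kept in the engine DB (illustration only).
enum DomainStatus { ACTIVE, REMOVING }

// Hypothetical in-memory stand-in for the engine DB (not an oVirt class).
final class EngineDb {
    private final Map<String, DomainStatus> domains = new HashMap<>();

    void add(String id) { domains.put(id, DomainStatus.ACTIVE); }
    void setStatus(String id, DomainStatus s) { domains.put(id, s); }
    void delete(String id) { domains.remove(id); }
    boolean contains(String id) { return domains.containsKey(id); }
    // Copy, so callers can delete entries while iterating.
    Map<String, DomainStatus> snapshot() { return new HashMap<>(domains); }
}

public class RollForwardOnStartup {
    // First step of a removal: persist the intent before touching the host.
    static void beginRemoval(EngineDb db, String id) {
        db.setStatus(id, DomainStatus.REMOVING);
    }

    // Hypothetical startup hook: every removal that was interrupted is
    // finished (rolled forward) rather than rolled back, because the host
    // may already have formatted the domain.
    static int rollForwardInterruptedRemovals(EngineDb db) {
        int finished = 0;
        for (Map.Entry<String, DomainStatus> e : db.snapshot().entrySet()) {
            if (e.getValue() == DomainStatus.REMOVING) {
                db.delete(e.getKey());
                finished++;
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        EngineDb db = new EngineDb();
        db.add("sd-1");
        db.add("sd-2");
        beginRemoval(db, "sd-1");
        // ... engine restarts here, before the DB row for sd-1 is deleted ...
        int n = rollForwardInterruptedRemovals(db);
        System.out.println(n + " " + db.contains("sd-1") + " " + db.contains("sd-2"));
        // prints: 1 false true
    }
}
```

Rolling back would mean restoring a domain that the host may already have formatted, which is why finishing the removal is the safer direction here; the cost, as the comment says, is possible leftovers on the storage.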

Comment 4 Ayal Baron 2013-09-02 15:01:27 UTC
According to the flow, the *engine* was restarted while vdsm was performing a formatStorageDomain op. Then the engine sent the command again (after it was started) and vdsm reported that it cannot find this storage domain. This sounds like an engine issue to me.

Comment 5 Maor 2013-09-15 15:09:49 UTC
The engine removes the storage domain from its DB only after it has been removed from VDSM.

The proposed solution: if the engine gets "Storage domain does not exist" when calling formatStorageDomain, the engine should roll forward and remove the storage domain.
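A sketch of the proposed handling, assuming a hypothetical host stub whose `formatStorageDomain` returns an integer status code. The value 358 and the name StorageDomainDoesNotExist come from the engine.log excerpts in this bug; `FakeHost` and `removeStorageDomain` are illustrative names and are not the code merged in gerrit 20158/20333.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the VDSM host: it only knows domains that
// have not been formatted yet (not a real VDSM or engine class).
final class FakeHost {
    // Code reported for StorageDomainDoesNotExist in engine.log.
    static final int STORAGE_DOMAIN_DOES_NOT_EXIST = 358;
    private final Set<String> domains = new HashSet<>();

    void add(String id) { domains.add(id); }

    // Returns 0 on success, otherwise an error code.
    int formatStorageDomain(String id) {
        return domains.remove(id) ? 0 : STORAGE_DOMAIN_DOES_NOT_EXIST;
    }
}

public class RemoveStorageDomainSketch {
    // Returns true when the engine removed the domain from its DB.
    static boolean removeStorageDomain(FakeHost host, Set<String> engineDb, String id) {
        int code = host.formatStorageDomain(id);
        // Roll forward: "does not exist" on the host means an earlier,
        // interrupted attempt already formatted the domain.
        if (code == 0 || code == FakeHost.STORAGE_DOMAIN_DOES_NOT_EXIST) {
            engineDb.remove(id);
            return true;
        }
        // Any other error aborts the removal and keeps the DB row.
        return false;
    }

    public static void main(String[] args) {
        FakeHost host = new FakeHost();
        Set<String> engineDb = new HashSet<>();
        host.add("sd-1");
        engineDb.add("sd-1");

        // First attempt: the host formats the domain, but the engine is
        // restarted before it updates its DB (simulated by skipping the
        // engineDb.remove call).
        host.formatStorageDomain("sd-1");

        // Retry after the restart: the host answers 358 and the engine
        // rolls forward.
        boolean removed = removeStorageDomain(host, engineDb, "sd-1");
        System.out.println(removed + " " + engineDb.contains("sd-1"));
        // prints: true false
    }
}
```

Treating code 358 as success makes the format step idempotent: a retry after an engine restart ends in the same state as an uninterrupted removal.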

Comment 6 vvyazmin@redhat.com 2013-09-17 07:53:30 UTC
Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS14 environment:

Host OS: RHEL 6.5

RHEVM:  rhevm-3.3.0-0.21.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.13-1.el6ev.noarch
VDSM:  vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-23.el6.bz964359.eblake.1.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.401.el6.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Remove a Storage Domain and reboot the RHEVM host.
2. After the “START, FormatStorageDomainVDSCommand” entry appears in engine.log, restart the “ovirt-engine” service.

Actual results:
Removing the domain fails once WebAdmin is back.
The SD is removed from VDSM, but not from the RHEVM DB.

Expected results:
The domain is removed successfully once WebAdmin is back.

Additional info:

/var/log/ovirt-engine/engine.log


2013-09-16 16:02:27,761 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Failed in FormatStorageDomainVDS method
2013-09-16 16:02:27,762 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,763 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)]]
2013-09-16 16:02:27,763 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) HostName = tigris01.scl.lab.tlv.redhat.com
2013-09-16 16:02:27,807 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,807 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) FINISH, FormatStorageDomainVDSCommand, log id: 47105bd3
2013-09-16 16:02:27,808 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',) (Failed with error StorageDomainDoesNotExist and code 358)
2013-09-16 16:02:27,850 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) Correlation ID: 76be222, Job ID: e25df7b6-40a8-4ae0-9e4e-065d2934264c, Call Stack: null, Custom Event ID: -1, Message: Failed to remove Storage Domain SD-FCP-04. (User: admin@internal)
2013-09-16 16:02:27,905 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Lock freed to object EngineLock [exclusiveLocks= key: fd2bb500-0744-405d-9860-d2b3824f13cf value: STORAGE

Comment 7 vvyazmin@redhat.com 2013-09-17 07:54:04 UTC
Created attachment 798656
Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Comment 12 Leonid Natapov 2013-11-18 13:24:45 UTC
is23: fixed. Tested according to the steps to reproduce; the SD was removed without problems after restarting ovirt-engine.

Comment 13 Itamar Heim 2014-01-21 22:21:19 UTC
Closing - RHEV 3.3 Released

Comment 14 Itamar Heim 2014-01-21 22:26:43 UTC
Closing - RHEV 3.3 Released

Comment 15 Liron Aravot 2014-10-19 14:57:23 UTC
*** Bug 1019295 has been marked as a duplicate of this bug. ***

