Bug 1001626 - Failed Remove Storage Domain after restart “ovirt-engine” service
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Maor
QA Contact: Leonid Natapov
Whiteboard: storage
Keywords: Triaged
Duplicates: 1019295
Depends On:
Blocks: 3.3snap1
Reported: 2013-08-27 08:24 EDT by vvyazmin@redhat.com
Modified: 2016-02-10 15:34 EST
CC List: 12 users

See Also:
Fixed In Version: is20.2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
abaron: Triaged+


Attachments
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm (5.46 MB, application/x-gzip)
2013-08-27 08:24 EDT, vvyazmin@redhat.com
no flags
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm (2.45 MB, application/x-gzip)
2013-09-17 03:54 EDT, vvyazmin@redhat.com
no flags


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 20158 None None None Never
oVirt gerrit 20333 None None None Never

Description vvyazmin@redhat.com 2013-08-27 08:24:52 EDT
Created attachment 790947
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Description of problem:
Removing a Storage Domain fails after restarting the “ovirt-engine” service.

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS11 environment:

RHEVM:  rhevm-3.3.0-0.16.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.11-1.el6ev.noarch
VDSM:  vdsm-4.12.0-72.git287bb7e.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an unattached Storage Domain (SD).
2. Remove the SD (RemoveStorageDomain command).
3. While the FormatStorageDomain command is running, restart the “ovirt-engine” service.
4. The SD removal fails (as expected).
5. Once WebAdmin is back, remove the SD again.

Actual results:
Removing the SD fails on the FormatStorageDomain command.

Expected results:
The SD is removed successfully.

Impact on user:
The SD cannot be removed.

Workaround:
Run the ForceRemoveStorageDomain command.

Additional info:

/var/log/ovirt-engine/engine.log
2013-08-27 14:57:24,968 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Running command: RemoveStorageDomainCommand internal: false. Entities affected :  ID: 5aa0e6b6-6969-4c81-b676-db85d548249a Type: Storage
2013-08-27 14:57:24,974 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) START, ConnectStorageServerVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: f7e66fe5-e840-4987-a339-03234a63d57a, connection: 10.35.160.7, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 128cf66
2013-08-27 14:57:25,639 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, ConnectStorageServerVDSCommand, return: {f7e66fe5-e840-4987-a339-03234a63d57a=0}, log id: 128cf66
2013-08-27 14:57:25,641 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) START, FormatStorageDomainVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storageDomainId=5aa0e6b6-6969-4c81-b676-db85d548249a), log id: 49710f12
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Failed in FormatStorageDomainVDS method
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)]]
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) HostName = tigris01.scl.lab.tlv.redhat.com
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, FormatStorageDomainVDSCommand, log id: 49710f12
2013-08-27 14:57:44,282 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',) (Failed with error StorageDomainDoesNotExist and code 358)
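When searching engine.log for this failure, the useful facts are the error name, the numeric code (StorageDomainDoesNotExist, 358) and the domain UUID. The snippet below is a minimal sketch that pulls them out of a RemoveStorageDomainCommand error line; the regular expressions only assume the message wording shown in the log above and are not an official log format.

```python
import re
from typing import Optional, Tuple

# Trailer the engine appends to the RemoveStorageDomainCommand error line,
# e.g. "(Failed with error StorageDomainDoesNotExist and code 358)".
ERROR_TRAILER = re.compile(r"\(Failed with error (\w+) and code (\d+)\)")
# Domain UUID quoted in VDSM's "Storage domain does not exist" message.
MISSING_DOMAIN = re.compile(r"Storage domain does not exist: \('([0-9a-f-]{36})',\)")


def parse_remove_failure(line: str) -> Optional[Tuple[str, int, Optional[str]]]:
    """Return (error name, error code, domain id) for a failure line, else None."""
    trailer = ERROR_TRAILER.search(line)
    if trailer is None:
        return None
    domain = MISSING_DOMAIN.search(line)
    domain_id = domain.group(1) if domain else None
    return trailer.group(1), int(trailer.group(2)), domain_id
```

Feeding it each line of /var/log/ovirt-engine/engine.log and keeping the non-None results lists every removal that hit this error.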

/var/log/vdsm/vdsm.log
Comment 1 Allon Mureinik 2013-08-28 03:33:08 EDT
Remove storage domain and format storage domain should roll forward.
If a removal is interrupted, the SD should be removed from the engine, since it is probably unusable anyway; in the worst case, some leftovers will remain on the storage.
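A roll-forward removal can be sketched as a toy model: record that a removal has started before calling the host, and on startup finish every recorded removal, ignoring host errors. EngineModel, pending and format_on_host are hypothetical names for illustration, not ovirt-engine code; in a real engine the "removal started" mark would have to be persisted (for example in the DB), whereas here it is an in-memory set handed to the "restarted" instance.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class EngineModel:
    """Toy model of the engine side of RemoveStorageDomain (hypothetical)."""
    db: Dict[str, str]                              # domain id -> domain name
    pending: Set[str] = field(default_factory=set)  # "removal started" marks

    def remove_domain(self, sd_id: str, format_on_host: Callable[[str], None]) -> None:
        # Mark the removal as started *before* calling the host, so a restart
        # between the host call and the DB delete can be rolled forward.
        self.pending.add(sd_id)
        format_on_host(sd_id)  # on the normal path, host errors still propagate
        self._finish(sd_id)

    def on_startup(self, format_on_host: Callable[[str], None]) -> None:
        # Complete every removal that a restart interrupted.
        for sd_id in sorted(self.pending):
            try:
                format_on_host(sd_id)
            except Exception:
                # Roll forward anyway: the domain is probably unusable, and the
                # worst outcome is some leftovers on the storage.
                pass
            self._finish(sd_id)

    def _finish(self, sd_id: str) -> None:
        self.db.pop(sd_id, None)
        self.pending.discard(sd_id)
```

The design choice is that an interrupted removal never goes back to "active"; it can only move forward to "gone", which matches the behavior requested in Comment 1.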
Comment 4 Ayal Baron 2013-09-02 11:01:27 EDT
According to the flow, the *engine* was restarted while vdsm was performing a formatStorageDomain operation. After the engine came back up, it sent the command again, and vdsm reported that it could not find this storage domain. This sounds like an engine issue to me.
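The flow described above can be reproduced with a small simulation, assuming (as Comment 5 states) that the engine deletes its DB row only after VDSM reports success. Host, remove_storage_domain and crash_after_format are hypothetical names; the exception class only stands in for VDSM's error code 358.

```python
from typing import Dict, Iterable


class StorageDomainDoesNotExist(Exception):
    """Stand-in for VDSM's reply with error code 358 (hypothetical class)."""


class Host:
    """Toy VDSM host that knows a set of storage domain IDs."""

    def __init__(self, domains: Iterable[str]) -> None:
        self.domains = set(domains)

    def format_storage_domain(self, sd_id: str) -> None:
        if sd_id not in self.domains:
            raise StorageDomainDoesNotExist(sd_id)
        self.domains.remove(sd_id)


def remove_storage_domain(engine_db: Dict[str, str], host: Host, sd_id: str,
                          crash_after_format: bool = False) -> None:
    """Pre-fix behavior: the DB row is deleted only after the host succeeds."""
    host.format_storage_domain(sd_id)
    if crash_after_format:
        # Models the engine restart: VDSM finished the format, but the engine
        # never reached the DB delete.
        raise RuntimeError("ovirt-engine restarted")
    del engine_db[sd_id]
```

Running the first attempt with crash_after_format=True and then retrying leaves the domain gone from the host but still present in engine_db, and the retry raises StorageDomainDoesNotExist, which is the state reported in this bug.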
Comment 5 Maor 2013-09-15 11:09:49 EDT
Engine removes the storage domain only after it is removed from VDSM.

The proposed solution: if the engine gets "Storage domain does not exist" when calling formatStorageDomain, it should roll forward and remove the storage domain.
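The proposal can be sketched as follows: treat error code 358 from the format call as "already done" and delete the domain from the engine DB, while any other error still aborts the removal. This is an illustration of the idea in this comment under assumed names (VdsError, remove_storage_domain_roll_forward, format_on_host), not the patch that was actually merged via the linked gerrit changes.

```python
from typing import Callable, Dict

STORAGE_DOMAIN_DOES_NOT_EXIST = 358  # error code seen in engine.log above


class VdsError(Exception):
    """Hypothetical stand-in for a VDSM error reply carrying a numeric code."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def remove_storage_domain_roll_forward(engine_db: Dict[str, str],
                                       format_on_host: Callable[[str], None],
                                       sd_id: str) -> None:
    try:
        format_on_host(sd_id)
    except VdsError as err:
        if err.code != STORAGE_DOMAIN_DOES_NOT_EXIST:
            raise  # any other failure still aborts the removal
        # The domain is already gone from the host (for example, an earlier
        # format finished before an engine restart): roll forward.
    engine_db.pop(sd_id, None)
```

Limiting the exception to code 358 keeps genuine format failures visible to the user instead of silently dropping a domain that still exists on storage.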
Comment 6 vvyazmin@redhat.com 2013-09-17 03:53:30 EDT
Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS14 environment:

Host OS: RHEL 6.5

RHEVM:  rhevm-3.3.0-0.21.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.13-1.el6ev.noarch
VDSM:  vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-23.el6.bz964359.eblake.1.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.401.el6.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Remove a Storage Domain and reboot the RHEVM host.
2. After the “START, FormatStorageDomainVDSCommand” entry appears in engine.log, restart the “ovirt-engine” service.

Actual results:
Removing the domain fails once WebAdmin is back.
The SD is removed from VDSM, but not from the RHEVM DB.

Expected results:
The domain is removed successfully once WebAdmin is back.
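The inconsistent state in the actual results (gone from VDSM, still in the RHEVM DB) can be spotted with a set difference. This is a minimal sketch assuming you already have the domain IDs the engine lists and the IDs the host reports for the same storage connections; how to obtain those lists is not covered here, and stale_engine_domains is a hypothetical helper.

```python
from typing import Iterable, List


def stale_engine_domains(engine_ids: Iterable[str], host_ids: Iterable[str]) -> List[str]:
    """IDs the engine DB still lists but the host no longer reports, sorted."""
    return sorted(set(engine_ids) - set(host_ids))
```

Any ID it returns is a candidate for the ForceRemoveStorageDomain workaround rather than a normal removal.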

Additional info:

/var/log/ovirt-engine/engine.log


2013-09-16 16:02:27,761 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Failed in FormatStorageDomainVDS method
2013-09-16 16:02:27,762 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,763 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)]]
2013-09-16 16:02:27,763 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) HostName = tigris01.scl.lab.tlv.redhat.com
2013-09-16 16:02:27,807 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,807 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) FINISH, FormatStorageDomainVDSCommand, log id: 47105bd3
2013-09-16 16:02:27,808 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',) (Failed with error StorageDomainDoesNotExist and code 358)
2013-09-16 16:02:27,850 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) Correlation ID: 76be222, Job ID: e25df7b6-40a8-4ae0-9e4e-065d2934264c, Call Stack: null, Custom Event ID: -1, Message: Failed to remove Storage Domain SD-FCP-04. (User: admin@internal)
2013-09-16 16:02:27,905 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Lock freed to object EngineLock [exclusiveLocks= key: fd2bb500-0744-405d-9860-d2b3824f13cf value: STORAGE
Comment 7 vvyazmin@redhat.com 2013-09-17 03:54:04 EDT
Created attachment 798656
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm
Comment 12 Leonid Natapov 2013-11-18 08:24:45 EST
Verified on is23: fixed. Tested according to the steps to reproduce; the SD was removed without problems after restarting ovirt-engine.
Comment 13 Itamar Heim 2014-01-21 17:21:19 EST
Closing - RHEV 3.3 Released
Comment 14 Itamar Heim 2014-01-21 17:26:43 EST
Closing - RHEV 3.3 Released
Comment 15 Liron Aravot 2014-10-19 10:57:23 EDT
*** Bug 1019295 has been marked as a duplicate of this bug. ***
