Bug 1001626

Summary: Failed Remove Storage Domain after restart “ovirt-engine” service
Product: Red Hat Enterprise Virtualization Manager
Reporter: vvyazmin <vvyazmin>
Component: ovirt-engine
Assignee: Maor <mlipchuk>
Status: CLOSED CURRENTRELEASE
QA Contact: Leonid Natapov <lnatapov>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3.0
CC: abaron, acanan, acathrow, amureini, bazulay, iheim, lpeer, mlipchuk, ratamir, Rhev-m-bugs, scohen, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.3.0
Flags: abaron: Triaged+
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: is20.2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1026487
Attachments:
Description Flags
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm none
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm none

Description vvyazmin@redhat.com 2013-08-27 12:24:52 UTC
Created attachment 790947 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Description of problem:
Removing a Storage Domain fails after the “ovirt-engine” service is restarted.

Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS11 environment:

RHEVM:  rhevm-3.3.0-0.16.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.11-1.el6ev.noarch
VDSM:  vdsm-4.12.0-72.git287bb7e.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an unattached Storage Domain (SD).
2. Remove the SD (RemoveStorageDomain command).
3. While the FormatStorageDomain command is running, restart the “ovirt-engine” service.
4. Removing the SD fails (as expected).
5. Once webadmin is back, remove the SD again.

Actual results:
Removing the SD fails on the FormatStorageDomain command.

Expected results:
The SD is removed successfully.

Impact on user:
The SD cannot be removed.

Workaround:
Run the ForceRemoveStorageDomain command.

Additional info:

/var/log/ovirt-engine/engine.log
2013-08-27 14:57:24,968 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Running command: RemoveStorageDomainCommand internal: false. Entities affected :  ID: 5aa0e6b6-6969-4c81-b676-db85d548249a Type: Storage
2013-08-27 14:57:24,974 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) START, ConnectStorageServerVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: f7e66fe5-e840-4987-a339-03234a63d57a, connection: 10.35.160.7, iqn: iqn.2008-05.com.xtremio:001e675b8ee0, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 128cf66
2013-08-27 14:57:25,639 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, ConnectStorageServerVDSCommand, return: {f7e66fe5-e840-4987-a339-03234a63d57a=0}, log id: 128cf66
2013-08-27 14:57:25,641 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) START, FormatStorageDomainVDSCommand(HostName = tigris01.scl.lab.tlv.redhat.com, HostId = 9576d8ca-4466-46e6-bebc-ccd922075ac6, storageDomainId=5aa0e6b6-6969-4c81-b676-db85d548249a), log id: 49710f12
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Failed in FormatStorageDomainVDS method
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)]]
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) HostName = tigris01.scl.lab.tlv.redhat.com
2013-08-27 14:57:44,281 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',)
2013-08-27 14:57:44,281 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-10) FINISH, FormatStorageDomainVDSCommand, log id: 49710f12
2013-08-27 14:57:44,282 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-10) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('5aa0e6b6-6969-4c81-b676-db85d548249a',) (Failed with error StorageDomainDoesNotExist and code 358)

/var/log/vdsm/vdsm.log

Comment 1 Allon Mureinik 2013-08-28 07:33:08 UTC
Remove storage domain and format storage domain should roll forward.
If a removal is interrupted, the SD should still be removed from the engine, since it is probably unusable anyway; in the worst case there will be some leftovers on the storage.

Comment 4 Ayal Baron 2013-09-02 15:01:27 UTC
According to the flow, the *engine* was restarted while vdsm was performing a formatStorageDomain op. The engine then sent the command again (after it had started), and vdsm reported that it could not find this storage domain. This sounds like an engine issue to me.

Comment 5 Maor 2013-09-15 15:09:49 UTC
The engine removes the storage domain only after it has been removed from VDSM.

The proposed solution is that if the engine gets "Storage domain does not exist" when calling formatStorageDomain, it should roll forward and remove the storage domain.
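
A minimal sketch of that roll-forward idea, for illustration only. It is not the engine's actual code: `VdsError`, `remove_storage_domain`, `format_on_host` and `engine_db` are hypothetical names, and only the error code 358 (StorageDomainDoesNotExist) is taken from the engine.log above.

```python
# Hypothetical sketch; none of these names are real oVirt engine APIs.

STORAGE_DOMAIN_DOES_NOT_EXIST = 358  # mCode=358 as reported in engine.log


class VdsError(Exception):
    """Stand-in for an error returned by VDSM (hypothetical)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def remove_storage_domain(domain_id, format_on_host, engine_db):
    """Format the domain on the host, then drop it from the engine's records.

    format_on_host: callable that formats the domain or raises VdsError.
    engine_db: a set of domain IDs standing in for the engine database.
    """
    try:
        format_on_host(domain_id)
    except VdsError as err:
        if err.code != STORAGE_DOMAIN_DOES_NOT_EXIST:
            raise  # any other failure is still treated as a failure
        # The host no longer knows the domain, most likely because an earlier,
        # interrupted removal already formatted it: roll forward.
    engine_db.discard(domain_id)
    return domain_id not in engine_db


def already_formatted(domain_id):
    """Simulates VDSM after the first, interrupted removal attempt."""
    raise VdsError(STORAGE_DOMAIN_DOES_NOT_EXIST,
                   "Storage domain does not exist: (%r,)" % domain_id)


db = {"5aa0e6b6-6969-4c81-b676-db85d548249a"}
removed = remove_storage_domain(
    "5aa0e6b6-6969-4c81-b676-db85d548249a", already_formatted, db)
```

With this handling, the second removal attempt from the steps to reproduce would finish by dropping the domain from the engine side instead of failing again; any other VDSM error is still propagated.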

Comment 6 vvyazmin@redhat.com 2013-09-17 07:53:30 UTC
Version-Release number of selected component (if applicable):
RHEVM 3.3 - IS14 environment:

Host OS: RHEL 6.5

RHEVM:  rhevm-3.3.0-0.21.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.13-1.el6ev.noarch
VDSM:  vdsm-4.12.0-127.gitedb88bf.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-23.el6.bz964359.eblake.1.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.401.el6.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Remove a Storage Domain and reboot the RHEVM host.
2. After the “START, FormatStorageDomainVDSCommand” message is logged, restart the “ovirt-engine” service.

Actual results:
Removing the domain fails once WebAdmin is back.
The SD is removed from VDSM, but not from the RHEVM DB.

Expected results:
The domain is removed successfully once WebAdmin is back.

Additional info:

/var/log/ovirt-engine/engine.log


2013-09-16 16:02:27,761 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Failed in FormatStorageDomainVDS method
2013-09-16 16:02:27,762 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Error code StorageDomainDoesNotExist and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,763 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=358, mMessage=Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)]]
2013-09-16 16:02:27,763 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) HostName = tigris01.scl.lab.tlv.redhat.com
2013-09-16 16:02:27,807 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',)
2013-09-16 16:02:27,807 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (ajp-/127.0.0.1:8702-7) FINISH, FormatStorageDomainVDSCommand, log id: 47105bd3
2013-09-16 16:02:27,808 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Storage domain does not exist: ('fd2bb500-0744-405d-9860-d2b3824f13cf',) (Failed with error StorageDomainDoesNotExist and code 358)
2013-09-16 16:02:27,850 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-7) Correlation ID: 76be222, Job ID: e25df7b6-40a8-4ae0-9e4e-065d2934264c, Call Stack: null, Custom Event ID: -1, Message: Failed to remove Storage Domain SD-FCP-04. (User: admin@internal)
2013-09-16 16:02:27,905 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (ajp-/127.0.0.1:8702-7) Lock freed to object EngineLock [exclusiveLocks= key: fd2bb500-0744-405d-9860-d2b3824f13cf value: STORAGE
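
When triaging logs like the one above, the failure summary that the engine appends to the RemoveStorageDomainCommand ERROR entry can be pulled out with a short script. This is only a sketch: the regular expressions are derived from the log lines in this report, not from any documented log format, and `parse_failure` is a hypothetical helper name.

```python
import re

# Matches the trailing summary seen in the log above, e.g.
# "(Failed with error StorageDomainDoesNotExist and code 358)".
FAILURE_RE = re.compile(
    r"\(Failed with error (?P<name>\w+) and code (?P<code>\d+)\)")
# Matches a lowercase UUID such as the storage domain ID.
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def parse_failure(line):
    """Return (error_name, code, first_uuid) or None if no summary is present."""
    failure = FAILURE_RE.search(line)
    if failure is None:
        return None
    uuid = UUID_RE.search(line)
    return (failure.group("name"),
            int(failure.group("code")),
            uuid.group(0) if uuid else None)


# Tail of the 16:02:27,808 ERROR entry from the log above.
sample = ("VDSErrorException: Failed to FormatStorageDomainVDS, error = "
          "Storage domain does not exist: "
          "('fd2bb500-0744-405d-9860-d2b3824f13cf',) "
          "(Failed with error StorageDomainDoesNotExist and code 358)")
result = parse_failure(sample)
```

Here `result` holds the error name, the numeric code and the storage domain ID, which is enough to tell this failure apart from other FormatStorageDomainVDS errors.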

Comment 7 vvyazmin@redhat.com 2013-09-17 07:54:04 UTC
Created attachment 798656 [details]
## Logs rhevm, vdsm, libvirt, thread dump, superVdsm

Comment 12 Leonid Natapov 2013-11-18 13:24:45 UTC
Fixed in is23. Tested according to the steps to reproduce. The SD was removed without problems after restarting ovirt-engine.

Comment 13 Itamar Heim 2014-01-21 22:21:19 UTC
Closing - RHEV 3.3 Released

Comment 14 Itamar Heim 2014-01-21 22:26:43 UTC
Closing - RHEV 3.3 Released

Comment 15 Liron Aravot 2014-10-19 14:57:23 UTC
*** Bug 1019295 has been marked as a duplicate of this bug. ***