Bug 850859

Summary: RHEVM - Backend: Remove of DC with one corrupted domain fails and leaves healthy domain stuck in Locked status
Product: Red Hat Enterprise Virtualization Manager Reporter: Daniel Paikov <dpaikov>
Component: ovirt-engine Assignee: Federico Simoncelli <fsimonce>
Status: CLOSED NOTABUG QA Contact: Haim <hateya>
Severity: high Docs Contact:
Priority: high    
Version: 3.1.0 CC: abaron, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---   
Target Release: 3.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: SI21 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-17 09:19:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description  Flags
engine.log   none
vdsm.log     none
vdsm.log     none

Description Daniel Paikov 2012-08-22 14:49:34 UTC
Created attachment 606300 [details]
engine.log

* DC with one healthy domain (status = active/master), and one domain with bad metadata (status = inactive).
* Move the active domain to Maintenance.
* Try to remove DC.
* Removal fails: the DC is in Maintenance mode, the corrupt domain is in Inactive mode, and the healthy domain is stuck in Locked mode.

Comment 1 Daniel Paikov 2012-08-22 14:51:18 UTC
Created attachment 606301 [details]
vdsm.log

Comment 2 Daniel Paikov 2012-08-22 14:53:52 UTC
Created attachment 606305 [details]
vdsm.log

Comment 3 Federico Simoncelli 2012-10-15 16:40:29 UTC
The logs contain an infinite loop of:

Thread-18057::ERROR::2012-08-28 01:04:21,782::SecureXMLRPCServer::77::root::(handle_error) client ('10.35.97.174', 47219)
Traceback (most recent call last):
  File "/usr/lib64/python2.6/SocketServer.py", line 560, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 68, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:490: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

Are you sure the engine was able to communicate with vdsm?

Comment 4 Haim 2012-10-16 10:03:24 UTC
(In reply to comment #3)
> The logs contain an infinite loop of:
> 
> Thread-18057::ERROR::2012-08-28
> 01:04:21,782::SecureXMLRPCServer::77::root::(handle_error) client
> ('10.35.97.174', 47219)
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/SocketServer.py", line 560, in
> process_request_thread
>     self.finish_request(request, client_address)
>   File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line
> 68, in finish_request
>     request.do_handshake()
>   File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
>     self._sslobj.do_handshake()
> SSLError: [Errno 1] _ssl.c:490: error:1407609C:SSL
> routines:SSL23_GET_CLIENT_HELLO:http request
> 
> Are you sure the engine was able to communicate with vdsm?

Yes, it just means that some other backend is trying to communicate with this VDS and failing.
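
To check which clients are behind these failed handshakes, the handle_error entries in vdsm.log can be grouped by client address. The `http request` reason in the SSLError generally indicates a plain-HTTP request arriving on the SSL port. The following is a minimal sketch that assumes the log line format quoted from comment 3; the helper name `count_handshake_errors` is hypothetical and not part of vdsm, and the second sample entry is invented for illustration.

```python
import re
from collections import Counter

# Matches the handle_error line format quoted from comment 3, e.g.
# "...::SecureXMLRPCServer::77::root::(handle_error) client ('10.35.97.174', 47219)"
HANDLE_ERROR_RE = re.compile(
    r"SecureXMLRPCServer::\d+::root::\(handle_error\) client \('([^']+)', \d+\)"
)


def count_handshake_errors(lines):
    """Count handle_error entries per client IP (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = HANDLE_ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# First entry is copied from the log; the second is an invented example.
sample = [
    "Thread-18057::ERROR::2012-08-28 01:04:21,782::SecureXMLRPCServer::77"
    "::root::(handle_error) client ('10.35.97.174', 47219)",
    "Traceback (most recent call last):",
    "Thread-18058::ERROR::2012-08-28 01:04:23,790::SecureXMLRPCServer::77"
    "::root::(handle_error) client ('10.35.97.174', 47220)",
]
errors_by_client = count_handshake_errors(sample)
print(errors_by_client)
```

A single address dominating the counts would point at one stray backend rather than at the engine under test.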

Comment 5 Federico Simoncelli 2012-10-16 10:29:39 UTC
I tried to reproduce but the master domain (the consistent one) doesn't remain locked (it is correctly moved to maintenance) and the DC removal is prevented by the error: "Error while executing action: Cannot remove Data Center when there are more than one Storage Domain attached."

Are you still affected by this issue? Please also update the steps to reproduce if you think I missed something.

If you are going to test this again, could you provide cleaner logs (both vdsm and engine) without other backends involved?

Thanks.

Comment 6 Daniel Paikov 2012-10-17 09:18:42 UTC
Can't reproduce anymore, since we no longer allow removal of a DC with more than one domain attached. This bug is now irrelevant, since this flow can't be reached.

GUI:
Error while executing action: Cannot remove Data Center when there are more than one Storage Domain attached.

engine.log:
2012-10-17 10:57:05,756 WARN  [org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand] (http--0.0.0.0-8700-5) CanDoAction of action RemoveStoragePool failed. Reasons:VAR__TYPE__STORAGE__POOL,VAR__ACTION__REMOVE,ERROR_CANNOT_REMOVE_STORAGE_POOL_WITH_NONMASTER_DOMAINS
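
When scanning engine.log for validation failures like the one above, the action name and the reason codes can be pulled out of the line. This is a minimal sketch based only on the single log line quoted here; the pattern and the helper name `parse_can_do_action_failure` are assumptions, and other engine versions may format the message differently.

```python
import re

# Pattern derived from the one engine.log line quoted above (an assumption,
# not an official log format specification).
CAN_DO_ACTION_RE = re.compile(
    r"CanDoAction of action (\w+) failed\. Reasons:\s*(\S+)"
)


def parse_can_do_action_failure(line):
    """Return (action, [reason codes]) or None if the line does not match."""
    match = CAN_DO_ACTION_RE.search(line)
    if match is None:
        return None
    action, reasons = match.groups()
    return action, reasons.split(",")


line = (
    "2012-10-17 10:57:05,756 WARN  "
    "[org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand] "
    "(http--0.0.0.0-8700-5) CanDoAction of action RemoveStoragePool failed. "
    "Reasons:VAR__TYPE__STORAGE__POOL,VAR__ACTION__REMOVE,"
    "ERROR_CANNOT_REMOVE_STORAGE_POOL_WITH_NONMASTER_DOMAINS"
)
action, reasons = parse_can_do_action_failure(line)
print(action, reasons)
```

The last reason code is the one that corresponds to the GUI error about more than one Storage Domain being attached.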