Bug 850859 - RHEVM - Backend: Remove of DC with one corrupted domain fails and leaves healthy domain stuck in Locked status
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.1.0
Assigned To: Federico Simoncelli
QA Contact: Haim
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-08-22 10:49 EDT by Daniel Paikov
Modified: 2016-02-10 14:50 EST (History)
10 users

See Also:
Fixed In Version: SI21
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-10-17 05:19:20 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
engine.log (4.50 KB, application/x-compressed-tar), 2012-08-22 10:49 EDT, Daniel Paikov
vdsm.log (472.29 KB, application/x-compressed-tar), 2012-08-22 10:51 EDT, Daniel Paikov
vdsm.log (267.47 KB, application/x-compressed-tar), 2012-08-22 10:53 EDT, Daniel Paikov

Description Daniel Paikov 2012-08-22 10:49:34 EDT
Created attachment 606300 [details]
engine.log

* DC with one healthy domain (status = active/master), and one domain with bad metadata (status = inactive).
* Move the active domain to Maintenance.
* Try to remove DC.
* Removal fails: the DC is in Maintenance mode, the corrupt domain is Inactive, and the healthy domain is stuck in Locked mode.
Comment 1 Daniel Paikov 2012-08-22 10:51:18 EDT
Created attachment 606301 [details]
vdsm.log
Comment 2 Daniel Paikov 2012-08-22 10:53:52 EDT
Created attachment 606305 [details]
vdsm.log
Comment 3 Federico Simoncelli 2012-10-15 12:40:29 EDT
The logs contain an infinite loop of:

Thread-18057::ERROR::2012-08-28 01:04:21,782::SecureXMLRPCServer::77::root::(handle_error) client ('10.35.97.174', 47219)
Traceback (most recent call last):
  File "/usr/lib64/python2.6/SocketServer.py", line 560, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 68, in finish_request
    request.do_handshake()
  File "/usr/lib64/python2.6/ssl.py", line 279, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:490: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request

Are you sure the engine was able to communicate with vdsm?
Comment 4 Haim 2012-10-16 06:03:24 EDT
(In reply to comment #3)
> Are you sure the engine was able to communicate with vdsm?

Yes. It just means that some other backend is trying to communicate with this VDS (sending plain HTTP to the SSL port), without success.
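The reason OpenSSL reports "http request" in the loop above is that the first bytes received on the SSL port were an ASCII HTTP method rather than a TLS ClientHello, whose first record byte is 0x16 (handshake). A minimal sketch of that distinction follows; it is illustrative only and is not taken from vdsm's actual SecureXMLRPCServer code:

```python
def looks_like_tls(first_bytes):
    """Guess whether a just-accepted connection is speaking TLS.

    A TLS record starts with content type 0x16 (handshake), while a
    plain HTTP request starts with an ASCII method such as b"GET".
    Illustrative sketch, not vdsm code.
    """
    return first_bytes[:1] == b"\x16"


# A TLS ClientHello record header: type 0x16, version 3.x
assert looks_like_tls(b"\x16\x03\x01\x00\xa5")
# The plain-HTTP traffic that triggers SSL23_GET_CLIENT_HELLO:http request
assert not looks_like_tls(b"GET / HTTP/1.0\r\n")
```

This is why the traceback repeats indefinitely: each plain-HTTP connection attempt fails the handshake, the client retries, and the server logs the same SSLError again.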
Comment 5 Federico Simoncelli 2012-10-16 06:29:39 EDT
I tried to reproduce, but the master domain (the consistent one) does not remain locked (it is correctly moved to Maintenance), and the DC removal is prevented by the error: "Error while executing action: Cannot remove Data Center when there are more than one Storage Domain attached."

Are you still affected by this issue? Also update the steps to reproduce if you think I missed something.

If you are going to test this again, could you provide cleaner logs (both vdsm and engine) without other backends involved?

Thanks.
Comment 6 Daniel Paikov 2012-10-17 05:18:42 EDT
Can't reproduce anymore, since we no longer allow removal of a DC with more than one attached domain. This bug is now irrelevant because this flow can no longer be reached.

GUI:
Error while executing action: Cannot remove Data Center when there are more than one Storage Domain attached.

engine.log:
2012-10-17 10:57:05,756 WARN  [org.ovirt.engine.core.bll.storage.RemoveStoragePoolCommand] (http--0.0.0.0-8700-5) CanDoAction of action RemoveStoragePool failed. Reasons:VAR__TYPE__STORAGE__POOL,VAR__ACTION__REMOVE,ERROR_CANNOT_REMOVE_STORAGE_POOL_WITH_NONMASTER_DOMAINS
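The guard that now blocks this flow (the CanDoAction failure in RemoveStoragePoolCommand above) can be illustrated with a hypothetical sketch. The function name and return shape below are illustrative, not the actual ovirt-engine (Java) API:

```python
def can_remove_storage_pool(attached_domains):
    """Hypothetical sketch of the engine-side validation described above:
    a Data Center (storage pool) with more than one attached storage
    domain cannot be removed. Illustrative names, not ovirt-engine code."""
    if len(attached_domains) > 1:
        return (False,
                "ERROR_CANNOT_REMOVE_STORAGE_POOL_WITH_NONMASTER_DOMAINS")
    return (True, None)


# With both domains from this bug still attached, removal is refused:
ok, reason = can_remove_storage_pool(["master-domain", "corrupted-domain"])
assert not ok
```

Because the check runs before any domain is touched, the healthy domain can no longer be left in a Locked state by a failed removal.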
