Bug 684595

Summary: [vdsm] [storage] [scale] deactivate storage domain doesn't return with valid return response
Product: Red Hat Enterprise Linux 6 Reporter: Haim <hateya>
Component: vdsm Assignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Haim <hateya>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.1 CC: abaron, bazulay, dnaori, ewarszaw, hateya, iheim, mgoldboi, syeghiay, yeylon, ykaul
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-01 08:36:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
vdsm logs. none

Description Haim 2011-03-13 17:40:45 UTC
Created attachment 484029 [details]
vdsm logs.

Description of problem:

Running on a small-scale system with the new metadata, at some point rhevm decided to deactivate a storage domain due to its problematic state (high latency) and sent deactivateStorageDomain. vdsm started to process this command but never returned a valid response code (I grepped all the logs). It looks like this:
------------------------------------------------------------------------------
Thread-86543::INFO::2011-03-12 15:36:31,328::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: deactivateStorageDomain, args: ( sdUUID=aa6a8d53-8c0a-4be1-865e-452948c2ef83 spUUID=a8e3a5e0-1437-4dfb-9ac5-c6835227a074 msdUUID=00000000-0000-0000-0000-000000000000 masterVersion=146)
Thread-86543::DEBUG::2011-03-12 15:36:31,684::task::491::TaskManager.Task::(_debug) Task f3e5ac66-b788-414f-a57f-ff9b35e7c97c: moving from state init -> state preparing
------------------------------------------------------------------------------

After that I see a few log entries for this thread. At some point it goes to sleep for 2 minutes (it did not manage to acquire a resource), and then I see the following:
------------------------------------------------------------------------------
Thread-86543::INFO::2011-03-12 15:38:18,908::sp::942::Storage.StoragePool::(deactivateSD) sdUUID=aa6a8d53-8c0a-4be1-865e-452948c2ef83 spUUID=a8e3a5e0-1437-4dfb-9ac5-c6835227a074 msdUUID=00000000-0000-0000-0000-000000000000
------------------------------------------------------------------------------

No return response whatsoever. Towards the end of the log I get the following errors, but none of them concern that specific SD:
------------------------------------------------------------------------------
- RuntimeError: _handleRequests._checkForMail - Could not read mailbox
- AttributeError: 'NoneType' object has no attribute 'partial'

[root@rhev-i32c-01 vdsm]# zgrep  deactivateStorage /var/log/vdsm/vdsm.log.*  |grep aa6a8d53-8c0a-4be1-865e-452948c2ef83 | grep Run

/var/log/vdsm/var/log/vdsm/vdsm.log.27.gz:Thread-86543::INFO::2011-03-12 15:36:31,328::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: deactivateStorageDomain, args: ( sdUUID=aa6a8d53-8c0a-4be1-865e-452948c2ef83 spUUID=a8e3a5e0-1437-4dfb-9ac5-c6835227a074 msdUUID=00000000-0000-0000-0000-000000000000 masterVersion=146)

/var/log/vdsm/vdsm.log.42.gz:Thread-63070::INFO::2011-03-12 00:27:12,374::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: deactivateStorageDomain, args: ( sdUUID=aa6a8d53-8c0a-4be1-865e-452948c2ef83 spUUID=a8e3a5e0-1437-4dfb-9ac5-c6835227a074 msdUUID=00000000-0000-0000-0000-000000000000 masterVersion=141)
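The manual zgrep above can be automated. The following is a minimal sketch, not part of vdsm: it pairs each "Run and protect: <verb>, args:" line with a later line for the same thread and verb, and reports the calls that were never answered. It assumes that a completed call is logged as "Run and protect: <verb>, Return response: ..."; that marker, the helper name find_unanswered, and the getSpmStatus sample lines are assumptions made for illustration and are not taken from the attached logs.

```python
import re

# Matches the dispatcher lines quoted in this report, e.g.
#   Thread-86543::INFO::2011-03-12 15:36:31,328::dispatcher::94::...::(run)
#   Run and protect: deactivateStorageDomain, args: (...)
# ASSUMPTION: a finished call is logged with "Return response" after the verb.
LINE_RE = re.compile(
    r"^(?P<thread>\S+?)::\w+::"
    r"(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d+)::"
    r".*?Run and protect: (?P<verb>\w+), (?P<kind>args|Return response)"
)


def find_unanswered(lines, verb=None):
    """Return {(thread, verb): timestamp} for calls with no logged response.

    Hypothetical helper; `lines` is any iterable of log lines in
    chronological order. If `verb` is given, only that verb is tracked.
    """
    pending = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        if verb is not None and m["verb"] != verb:
            continue
        key = (m["thread"], m["verb"])
        if m["kind"] == "args":
            pending[key] = m["ts"]   # call started
        else:
            pending.pop(key, None)   # call answered
    return pending


# The first line is quoted from this report (args shortened); the
# getSpmStatus lines are made-up samples showing an answered call.
sample = [
    "Thread-86543::INFO::2011-03-12 15:36:31,328::dispatcher::94::"
    "Storage.Dispatcher.Protect::(run) Run and protect: "
    "deactivateStorageDomain, args: ( sdUUID=aa6a8d53-8c0a-4be1-865e-452948c2ef83 )",
    "Thread-1::INFO::2011-03-12 15:36:32,000::dispatcher::94::"
    "Storage.Dispatcher.Protect::(run) Run and protect: getSpmStatus, args: ()",
    "Thread-1::INFO::2011-03-12 15:36:32,100::dispatcher::100::"
    "Storage.Dispatcher.Protect::(run) Run and protect: getSpmStatus, "
    "Return response: {}",
]

unanswered = find_unanswered(sample)
for (thread, verb_name), ts in sorted(unanswered.items()):
    print(f"{ts} {thread} {verb_name}: no response logged")
```

To run it over the rotated logs, the lines can be read with gzip.open(path, "rt") and fed to find_unanswered oldest file first, since thread names may be reused over time.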

result: 

The VG is active but has no link in '/rhev/data-center/mnt/blockSD/'; the backend rolled back the command and 'thinks' the VG (domain) is up. In short, a total mess.

setup:

1) fcp
2) 31 storage domains 
3) vm load - 194

Comment 1 RHEL Program Management 2011-04-04 02:12:42 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 2 Eduardo Warszawski 2011-04-07 09:53:08 UTC
The attached logs are not from this bug.
Haim, please add them or reproduce.

Comment 3 Haim 2011-05-01 08:36:43 UTC
(In reply to comment #2)
> The attached logs are not from this bug.
> Haim, please add them or reproduce.

Small chance of reproducing; will re-open if I hit it again.