Bug 917576 - engine: connectStorageServer is not sent for inactive domains before connectStoragePool
Summary: engine: connectStorageServer is not sent for inactive domains before connectS...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.2.0
Assignee: mkublin
QA Contact: Elad
URL:
Whiteboard: infra
Depends On:
Blocks: 923116 948448
 
Reported: 2013-03-04 11:22 UTC by Elad
Modified: 2016-02-10 19:14 UTC (History)

Fixed In Version: sf11
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 923116 (view as bug list)
Environment:
Last Closed:
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
logs (315.34 KB, application/x-gzip)
2013-03-04 11:22 UTC, Elad


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 12805 0 None MERGED engine: connectStorageServer is not sent for inactive domains before connectStoragePool 2021-02-11 12:56:23 UTC

Description Elad 2013-03-04 11:22:50 UTC
Created attachment 704880 [details]
logs

Description of problem:

When trying to connect a host to a storage pool with an inactive master domain, the operation fails because the engine does not perform connectStorageServer on the inactive domain. Instead, the engine sends connectStorageServer to another domain that is unattached and then sends connectStoragePool for the relevant inactive domain.

Version-Release number of selected component (if applicable):

vdsm-4.10.2-10.0.el6ev.x86_64

How reproducible:

100%

Steps to Reproduce:

1. Have one host and 2 domains: 1 inactive and 1 unattached.
2. Activate the host
  
Actual results:

The engine performs connectStorageServer on another domain that is unattached and then sends connectStoragePool for the relevant inactive domain.


Expected results:

The engine should perform connectStorageServer on the inactive domain.


Additional info: logs

Comment 2 Dafna Ron 2013-03-04 12:28:01 UTC
The reproduction is:
1. In an iSCSI DC with 1 data domain, 1 ISO domain and 1 host, block the storage domain from the host using iptables.
2. Once the host becomes non-operational, add a second host.

Both hosts will be non-operational, the data domain will be inactive and the ISO domain will be unknown.

3. Remove the iptables block from the storage.

4. Put the hosts in maintenance and activate them.

We send connectStoragePool without connectStorageServer for the data domain,
but we do send connectStorageServer for the ISO domain.

Comment 3 Ayal Baron 2013-03-04 12:48:24 UTC
(In reply to comment #2)
> the reproduction is: 
> 1. in an iscsi DC with 1 data domain, 1 iso domain and 1 host, block the
> storage domain from the host using iptables 
> 2. once host becomes non-operational add a second host
> 
> both hosts will be non-operational, the data domain will be inactive and the
> iso will be unknown. 
> 
> 3. remove the iptables block from the storage. 
> 
> 4. put hosts in maintenance and activate the hosts
> 
> we send connectStoragePool without connectStorageServer for the data domain
> but we do send connectStorageServer for the iso domain.

Liron, is this use case covered by the bug you're fixing? (if so please close as duplicate)

Comment 4 Liron Aravot 2013-03-04 13:47:37 UTC
Ayal, this seems to be a different issue than the other bug (though they are somehow related).

From a brief look, the issue here seems to be that during initVdsOnUp we perform ConnectStorageServer only for domains that are unknown/active, queried using the stored procedure Getstorage_server_connectionsByStoragePoolId (storage_domains.status in(0,3));

The domain auto recovery process performs the connect operations only for hosts that are in status up, so since the hosts are non-operational, this flow won't connect them either.

So basically, IIUC, initVdsOnUp can't succeed because we don't connect to the inactive domain's storage server, while auto recovery won't help because the hosts aren't in status up.

I'm adding Michael for his opinion.
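
A minimal sketch of the filter described above, to make the effect concrete. This is not the engine's code: the `Domain` record, the `connections_for_pool` helper and the addresses are hypothetical, and only the status values 0 and 3 come from the comment (unknown/active). The value 4 for an inactive domain is an assumption made for the sketch.

```python
from dataclasses import dataclass

# 0 and 3 are the values named in the stored procedure filter (unknown/active).
# INACTIVE = 4 is an assumption for this sketch, not taken from the bug.
UNKNOWN, ACTIVE, INACTIVE = 0, 3, 4


@dataclass(frozen=True)
class Domain:
    name: str
    status: int
    connection: str  # hypothetical storage server address


def connections_for_pool(domains, statuses):
    """Return the connections of all domains whose status is in `statuses`."""
    return sorted({d.connection for d in domains if d.status in statuses})


domains = [
    Domain("data", INACTIVE, "10.0.0.1"),
    Domain("iso", UNKNOWN, "10.0.0.2"),
]

# Filter as described in the comment: the inactive data domain is skipped.
filtered = connections_for_pool(domains, {UNKNOWN, ACTIVE})
# Same lookup widened to include inactive domains.
widened = connections_for_pool(domains, {UNKNOWN, ACTIVE, INACTIVE})

print(filtered)
print(widened)
```

With the narrow filter, the data domain's storage server never receives a connect request, which matches the behaviour reported in comment 2.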

Comment 5 Ayal Baron 2013-03-04 14:41:54 UTC
(In reply to comment #4)
> Ayal, this seems to be a different issue than the other bug (though they are
> somehow related)
> 
> From brief look - The issue here seems to me that during initVdsOnUp, we
> perform ConnectStorageServer only to domains that are unknown/active -
> queried using the stored procedure
> Getstorage_server_connectionsByStoragePoolId (storage_domains.status
> in(0,3));

This seems wrong to me.  Once we split maintenance from inactive, I don't understand why initvdsonup doesn't try to connect to inactive domains (this is exactly the operation that may move them back to active).

> 
> The domain auto recovery process performs the connect operations only for
> hosts that are in status up, so basically as the hosts should be non
> operational we won't do connect operation either by this flow to them.
> 
> so basically IIUC, we can't have success in initvdsonup as we don't connect
> to the inactive domain storage server, while auto recovery won't help as the
> hosts aren't in status up.
> 
> I'm adding Michael for his opinion.

Comment 6 mkublin 2013-03-05 13:02:40 UTC
I think this bug has existed since at least version 3.1, when the InActive status was introduced.
The first bug is in our code: the method DbFacade.getInstance().getStorageServerConnectionDao().getAllForStoragePool()
does not return all connections, only those of active/unknown domains. This is a bug.
What is funnier, the code has two tests that check that it returns all domains; nothing to add.
The fix for this is easy.
The next problem:
The connection succeeds, but a reconstruct is needed because of an error during
ConnectStoragePool.
The reconstruct will fail, because the only storage domain (or all of them) is inactive. (Actually, the reconstruct will not run at all, because no new master domain is elected.)
If the reconstruct is called from InitVdsOnUp, a domain with any status (Active/Unknown/InActive) can be chosen as the new master, but Active should have first priority.
So, two fixes are needed.
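
The election rule proposed above could look roughly like the sketch below. It is only an illustration: `elect_new_master` and its arguments are hypothetical names, the relative order of Unknown and InActive is an assumption (the comment only says Active comes first), and treating Active as the only eligible status outside InitVdsOnUp is likewise assumed.

```python
ACTIVE, UNKNOWN, INACTIVE = "Active", "Unknown", "InActive"


def elect_new_master(domains, from_init_vds_on_up):
    """Pick a new master from (name, status) pairs, or return None.

    Statuses are tried in priority order, Active first. Non-active
    domains are considered only when called from InitVdsOnUp.
    """
    if from_init_vds_on_up:
        priority = [ACTIVE, UNKNOWN, INACTIVE]  # order after Active is assumed
    else:
        priority = [ACTIVE]  # assumed behaviour outside InitVdsOnUp
    for wanted in priority:
        for name, status in domains:
            if status == wanted:
                return name
    return None


only_inactive = [("data1", INACTIVE)]
mixed = [("data1", INACTIVE), ("data2", ACTIVE)]

print(elect_new_master(only_inactive, from_init_vds_on_up=False))
print(elect_new_master(only_inactive, from_init_vds_on_up=True))
print(elect_new_master(mixed, from_init_vds_on_up=True))
```

The first call models the failure described above: with only inactive domains and no relaxed rule, no master is elected and the reconstruct never runs.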

Comment 7 mkublin 2013-03-07 07:19:53 UTC
http://gerrit.ovirt.org/#/c/12805/

This fixes the connection bug. If there is a problem after the connect, a new bug should be opened.

Comment 9 Elad 2013-03-25 09:24:40 UTC
Verified on SF11. connectStorageServer was sent to the inactive data domain and then connectStoragePool.


Thread-38639::INFO::2013-03-25 11:14:27,392::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='f8ed642d-5e2f-4065-b22f-8a3f4d88e318', conList=[{'connection': '10.35.64.81', 'iqn': 'elad203', 'portal': '1', 'user': '', 'password': '******', 'id': 'ad6dccb6-365f-465c-bcb6-25a444169528', 'port': '3260'}], options=None)


Thread-38641::INFO::2013-03-25 11:14:28,169::logUtils::37::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='f8ed642d-5e2f-4065-b22f-8a3f4d88e318', hostID=2, scsiKey='f8ed642d-5e2f-4065-b22f-8a3f4d88e318', msdUUID='29daeb89-f858-4d2a-ba38-3027210862a8', masterVersion=1, options=None)
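
The ordering shown in the two log lines above can be checked mechanically. The sketch below is one possible way, assuming the "Run and protect: <verb>(" prefix seen in these lines; the helper names are hypothetical and the sample lines are abridged from the log excerpt (the conList and most connectStoragePool arguments are omitted).

```python
import re

# Matches the verb name in vdsm lines of the form "Run and protect: verb(...".
VERB_RE = re.compile(r"Run and protect: (\w+)\(")


def protected_calls(log_text):
    """Return the verbs logged with 'Run and protect:' in order of appearance."""
    return VERB_RE.findall(log_text)


def server_connected_before_pool(log_text):
    """True if a connectStorageServer precedes the first connectStoragePool."""
    calls = protected_calls(log_text)
    if "connectStoragePool" not in calls:
        return False
    first_pool = calls.index("connectStoragePool")
    return "connectStorageServer" in calls[:first_pool]


# Abridged from the verification log above.
SAMPLE_LOG = (
    "Thread-38639::INFO::2013-03-25 11:14:27,392::logUtils::37::dispatcher::(wrapper) "
    "Run and protect: connectStorageServer(domType=3, "
    "spUUID='f8ed642d-5e2f-4065-b22f-8a3f4d88e318', options=None)\n"
    "Thread-38641::INFO::2013-03-25 11:14:28,169::logUtils::37::dispatcher::(wrapper) "
    "Run and protect: connectStoragePool("
    "spUUID='f8ed642d-5e2f-4065-b22f-8a3f4d88e318', hostID=2, options=None)\n"
)

print(protected_calls(SAMPLE_LOG))
print(server_connected_before_pool(SAMPLE_LOG))
```

This only checks call order. It does not confirm that the connectStorageServer call targets the inactive domain's storage server, which still has to be read from the conList in the full log line.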

Comment 10 Itamar Heim 2013-06-11 09:51:18 UTC
3.2 has been released

Comment 11 Itamar Heim 2013-06-11 09:51:38 UTC
3.2 has been released

Comment 12 Itamar Heim 2013-06-11 09:58:40 UTC
3.2 has been released

