Bug 1651840 - Power up of HostedEngine environment does not connect Host to all storage domains
Keywords:
Status: CLOSED DUPLICATE of bug 1772688
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Benny Zlotnik
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks: 1772688
 
Reported: 2018-11-21 02:17 UTC by Germano Veit Michel
Modified: 2022-03-22 10:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-24 19:46:45 UTC
oVirt Team: Storage
Embargoed:


Description Germano Veit Michel 2018-11-21 02:17:51 UTC
Description of problem:

When powering up a HostedEngine environment, the Host that the HostedEngine VM runs on is connected only to hosted_storage, not to all Storage Domains of the DC, yet it is in Up status. This host gets the SPM role without seeing all SDs (hosted_storage is master), and several commands start failing because the host is not connected to the other SDs.

It happens in this situation:
1. Enable global maintenance
2. Migrate Hosted-Engine to Host X
3. Shutdown the Hosted-Engine
4. Power cycle the host
5. Start Hosted-Engine on Host X
6. HostedEngine goes up on Host X
   - Host X is still in Up state in the DB since step 2
7. Host X is in Up state, but the engine did not send connectStorageServer commands to connect it to all the other SDs; it is only connected to hosted_storage (master).

I think it happens because the HE goes up on a host that was already set to Up in the DB before the shutdown, so the engine doesn't send the connect storage commands to it. If the HE goes up on another host that was not Up in the DB during shutdown, the problem does not seem to happen.

Host in Up status after powering up the env:

# vdsm-client Host getStorageDomains
[
    "4d38d22c-88c1-4054-a942-15bc51cd8214"  <-- hosted_storage
]

Expected SDs to be connected to if host is in Up status:

# vdsm-client Host getStorageDomains
[
    "f7eeca0e-b360-4d88-959a-1e0e0730f846", 
    "4d38d22c-88c1-4054-a942-15bc51cd8214", 
    "8b5fc9dc-019b-4c8a-ba89-7b0dc19c5186", 
    "c9d1a566-d436-4fd7-82c1-f886ca239a14", 
    "72114491-e7e4-4680-b095-7d3b83a967c7"
]
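
The gap between the two outputs above can be computed mechanically. The following is a minimal Python sketch, not part of the original report: the helper name missing_domains is hypothetical, and both lists are copied from the vdsm-client output quoted in this description (the "<-- hosted_storage" annotation is dropped so the reported list is valid JSON).

```python
import json


def missing_domains(expected, reported):
    """Return the expected storage domain IDs that the host did not report, in order."""
    seen = set(reported)
    return [sd for sd in expected if sd not in seen]


# What the affected host reported (taken from this bug report).
reported = json.loads('["4d38d22c-88c1-4054-a942-15bc51cd8214"]')

# What a host in Up status is expected to report (taken from this bug report).
expected = [
    "f7eeca0e-b360-4d88-959a-1e0e0730f846",
    "4d38d22c-88c1-4054-a942-15bc51cd8214",
    "8b5fc9dc-019b-4c8a-ba89-7b0dc19c5186",
    "c9d1a566-d436-4fd7-82c1-f886ca239a14",
    "72114491-e7e4-4680-b095-7d3b83a967c7",
]

missing = missing_domains(expected, reported)
for sd in missing:
    print(sd)
```

In this case the four domains other than hosted_storage are the ones left unconnected.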

Version-Release number of selected component (if applicable):
ovirt-engine-4.2.7.4-1.el7.noarch
vdsm-4.20.43-1.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
As above

Actual results:
Host is in Up status, but only connected to hosted_storage

Expected results:
If Host is in Up status, it must be connected to the entire Pool.

Additional information:

1. ovirt-engine starts

2018-11-21 11:58:02,652+10 INFO  [org.ovirt.engine.core.uutils.config.ShellLikeConfd] (ServerService Thread Pool -- 44) [] Loaded file '/usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf'.

2018-11-21 11:58:05,557+10 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] (ServerService Thread Pool -- 41) [] Initialize vdsBroker 'host1.rhvlab:54321'

2018-11-21 11:58:07,598+10 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to host1.rhvlab/192.168.100.1

2018-11-21 11:58:12,266+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetHardwareInfoAsyncVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-5) [] START, GetHardwareInfoAsyncVDSCommand(HostName = host1.rhvlab, VdsIdAndVdsVDSCommandParametersBase:{hostId='8b0876b5-6e38-464f-a018-a93c91d27724', vds='Host[host1.rhvlab,8b0876b5-6e38-464f-a018-a93c91d27724]'}), log id: 14d9690b

2. warnings about SDs not connected:

2018-11-21 11:58:13,616+10 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-4) [6bb8e8de] domain '72114491-e7e4-4680-b095-7d3b83a967c7:Export' in problem 'NOT_REPORTED'. vds: 'host1.rhvlab'
2018-11-21 11:58:13,643+10 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-4) [6bb8e8de] domain 'c9d1a566-d436-4fd7-82c1-f886ca239a14:NFS' in problem 'NOT_REPORTED'. vds: 'host1.rhvlab'
2018-11-21 11:58:13,652+10 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-4) [6bb8e8de] domain 'f7eeca0e-b360-4d88-959a-1e0e0730f846:iSCSI' in problem 'NOT_REPORTED'. vds: 'host1.rhvlab'
2018-11-21 11:58:13,676+10 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-4) [6bb8e8de] domain '8b5fc9dc-019b-4c8a-ba89-7b0dc19c5186:ISO' in problem 'NOT_REPORTED'. vds: 'host1.rhvlab'
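
For anyone triaging a similar engine.log, the domains flagged by IrsProxy can be pulled out with a short script. This is a rough Python sketch under assumptions: the regular expression is inferred only from the four WARN lines above, not from the engine source, and the function name domains_in_problem is hypothetical.

```python
import re

# Pattern inferred from the WARN lines in this report (an assumption, not an official format).
PATTERN = re.compile(
    r"domain '(?P<id>[0-9a-f-]{36}):(?P<name>[^']+)' "
    r"in problem '(?P<problem>\w+)'\. vds: '(?P<host>[^']+)'"
)


def domains_in_problem(lines):
    """Return (domain_id, domain_name, problem, host) for each matching log line."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append(
                (
                    match.group("id"),
                    match.group("name"),
                    match.group("problem"),
                    match.group("host"),
                )
            )
    return found


# Two of the lines quoted above, plus one unrelated line that must be ignored.
sample = [
    "2018-11-21 11:58:13,616+10 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-4) [6bb8e8de] domain "
    "'72114491-e7e4-4680-b095-7d3b83a967c7:Export' in problem 'NOT_REPORTED'. vds: 'host1.rhvlab'",
    "2018-11-21 11:58:13,643+10 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] "
    "(EE-ManagedThreadFactory-engine-Thread-4) [6bb8e8de] domain "
    "'c9d1a566-d436-4fd7-82c1-f886ca239a14:NFS' in problem 'NOT_REPORTED'. vds: 'host1.rhvlab'",
    "2018-11-21 11:58:05,557+10 INFO  [org.ovirt.engine.core.vdsbroker.VdsManager] "
    "(ServerService Thread Pool -- 41) [] Initialize vdsBroker 'host1.rhvlab:54321'",
]

results = domains_in_problem(sample)
for domain_id, name, problem, host in results:
    print(domain_id, name, problem, host)
```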

3. connect storage pool

2018-11-21 11:58:17,993+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [8d9bb30] START, ConnectStoragePoolVDSCommand(HostName = host1.rhvlab, ConnectStoragePoolVDSCommandParameters:{hostId='8b0876b5-6e38-464f-a018-a93c91d27724', vdsId='8b0876b5-6e38-464f-a018-a93c91d27724', storagePoolId='bcced3da-e61d-11e8-9e0a-52540015c1ff', masterVersion='1'}), log id: 717c0c84

4. spm start on this host

2018-11-21 11:58:18,650+10 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [8d9bb30] START, SpmStartVDSCommand(HostName = host1.rhvlab, SpmStartVDSCommandParameters:{hostId='8b0876b5-6e38-464f-a018-a93c91d27724', storagePoolId='bcced3da-e61d-11e8-9e0a-52540015c1ff', prevId='-1', prevLVER='4', storagePoolFormatType='V4', recoveryMode='Manual', SCSIFencing='false'}), log id: 416a4529

5. Random things start failing because the host is not connected to several SDs, but the engine thinks it is in Up state and sends commands that require those SDs to be connected

2018-11-21 12:00:49,605+10 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [aefb9ca2-8ac3-4ba2-bd53-79e2f21299e3] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host1.rhvlab command GetVolumeInfoVDS failed: Storage domain does not exist: (u'c9d1a566-d436-4fd7-82c1-f886ca239a14',)

Comment 1 Simone Tiraboschi 2018-11-26 09:12:59 UTC
(In reply to Germano Veit Michel from comment #0)
> It happens on this situation:
> 1. Enable global maintenance
> 2. Migrate Hosted-Engine to Host X
> 3. Shutdown the Hosted-Engine
> 4. Power cycle the host
> 5. Start Hosted-Engine on Host X

Here the engine is also going to start on the host.
AFAIK after engine start there is a kind of grace period where the engine should try to reconcile hosts status before taking any further action.
We should investigate why this wasn't enough in the reported case.


> 6. HostedEngine goes up on Host X
>    - Host X is still in Up state in the DB since step 2
> 7. Host X is in Up state, but engine did not send connectStorageServer
> commands to connect to all other SDs, it is only connected to hosted_storage
> (master).

Comment 2 Eli Mesika 2018-12-23 10:49:53 UTC
(In reply to Germano Veit Michel from comment #0)

Please use:

Host A = initial Hosted-Engine Host 
Host X = New Hosted-Engine Host

> 
> It happens on this situation:
> 1. Enable global maintenance
> 2. Migrate Hosted-Engine to Host X
> 3. Shutdown the Hosted-Engine

Which one? A or X?

> 4. Power cycle the host

Which one? A or X?


> 5. Start Hosted-Engine on Host X
> 6. HostedEngine goes up on Host X
>    - Host X is still in Up state in the DB since step 2
> 7. Host X is in Up state, but engine did not send connectStorageServer
> commands to connect to all other SDs, it is only connected to hosted_storage
> (master).

Comment 3 Germano Veit Michel 2019-01-06 23:05:11 UTC
(In reply to Eli Mesika from comment #2)
> (In reply to Germano Veit Michel from comment #0)
> 
> Please use :
> 
> Host A = initial Hosted-Engine Host 
> Host X = New Hosted-Engine Host
> 
> > 
> > It happens on this situation:
> > 1. Enable global maintenance
> > 2. Migrate Hosted-Engine to Host X
> > 3. Shutdown the Hosted-Engine
> 
> which one ? A or X ???

X

> 
> > 4. Power cycle the host
> 
> which one ? A or X ???

X

> 
> 
> > 5. Start Hosted-Engine on Host X
> > 6. HostedEngine goes up on Host X
> >    - Host X is still in Up state in the DB since step 2
> > 7. Host X is in Up state, but engine did not send connectStorageServer
> > commands to connect to all other SDs, it is only connected to hosted_storage
> > (master).

In fact, I think any order will reproduce this; I get this bug every single time I power up my test environment.

Comment 4 Eli Mesika 2019-01-09 10:37:33 UTC
This warning is related to storage.
Please look at IrsProxy::addDomainInProblemData.

It seems that in that case the host must go to non-operational.

Tal, can you take a look and move this to storage?

Comment 5 Sandro Bonazzola 2019-01-28 09:34:40 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 8 Marina Kalinin 2021-05-24 19:46:45 UTC
Closing the upstream bug in favor of the downstream one, to concentrate efforts.

*** This bug has been marked as a duplicate of bug 1772688 ***

