Bug 1251912

Summary: hosted-engine-setup fails updating the vlan property on the management network if more than one data center exists
Product: Red Hat Enterprise Virtualization Manager
Reporter: wanghui <huiwa>
Component: ovirt-hosted-engine-setup
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.4
CC: cshao, cwu, fdeutsch, gklein, huiwa, huzhao, leiwang, lsurette, mburman, sbonazzo, stirabos, yaniwang, ycui, ykaul, ylavi
Target Milestone: ovirt-3.6.0-rc
Keywords: ZStream
Target Release: 3.6.0
Flags: huiwa: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, if more than one data center was defined in Red Hat Enterprise Virtualization Manager while using tagged VLANs, 'hosted-engine --deploy' failed updating the VLAN property on the management network. Now, multiple data centers are handled correctly.
Story Points: ---
Clone Of:
Clones: 1260143 (view as bug list)
Environment:
Last Closed: 2016-03-09 19:13:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1250199, 1260143
Attachments:
rhevh log files (flags: none)
engine log files (flags: none)

Description wanghui 2015-08-10 09:47:57 UTC
Created attachment 1060994 [details]
rhevh log files

Description of problem:
Hosted engine setup fails when deploying over a VLAN tagged network. The failure occurs while registering the rhevh host to rhevm.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.1-20150805.0.el7ev
ovirt-node-3.2.3-16.el7.noarch
ovirt-node-plugin-hosted-engine-0.2.0-18.0.el7ev.noarch
ovirt-hosted-engine-setup-1.2.5.2-1.el7ev.noarch
rhevm-appliance-20150727.0-1.x86_64.rhevm.ova  

How reproducible:
100%

Steps to Reproduce:
1. Clean install rhev-hypervisor7-7.1-20150805.0.el7ev
2. Create a VLAN tagged network
3. Set up hosted engine using the OVA appliance type.
4. Configure the VM with rhevm 3.5.4. Create a new cluster for rhevh to register to, and modify the rhevm network to be VLAN tagged in rhevm.
5. Select as follows

To continue make a selection from the options below:
          (1) Continue setup - engine installation is complete
          (2) Power off and restart the VM
          (3) Abort setup
          (4) Destroy VM and abort setup

          (1,2,3,4)[1]: <Enter>

Actual results:
1. It reports an error like the following.

[ERROR]Failed to execute stage 'Closing up': [ERROR]::Used query (name=rhevm) produces ambiguous results.

Expected results:


Additional info:

Comment 1 wanghui 2015-08-10 09:49:40 UTC
Created attachment 1060995 [details]
engine log files

Comment 2 Sandro Bonazzola 2015-08-10 13:02:11 UTC
Looking at the setup logs, the setup correctly detected the tagged VLAN:

Please indicate a nic to set rhevm bridge on: (enp7s4, enp32s0f0, enp32s0f1, enp63s0, enp32s0f0.20) [enp7s4]: enp32s0f0.20

When the setup queries the engine for the rhevm network, it receives:

2015-08-10 05:22:45 DEBUG otopi.plugins.ovirt_hosted_engine_setup.engine.add_host add_host._closeup:579 Getting engine's management network via engine's APIs
2015-08-10 05:22:46 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 142, in _executeMethod
  File "/usr/share/ovirt-hosted-engine-setup/plugins/ovirt-hosted-engine-setup/engine/add_host.py", line 582, in _closeup
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 15230, in get
  File "/usr/lib/python2.7/site-packages/ovirtsdk/utils/filterhelper.py", line 31, in getItem
AmbiguousQueryError: [ERROR]::Used query (name=rhevm) produces ambiguous results.
2015-08-10 05:22:46 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Closing up': [ERROR]::Used query (name=rhevm) produces ambiguous results.
2015-08-10 05:22:46 DEBUG otopi.context context.dumpEnvironment:490 ENVIRONMENT DUMP - BEGIN
2015-08-10 05:22:46 DEBUG otopi.context context.dumpEnvironment:500 ENV BASE/error=bool:'True'
2015-08-10 05:22:46 DEBUG otopi.context context.dumpEnvironment:500 ENV BASE/exceptionInfo=list:'[(<class 'ovirtsdk.infrastructure.errors.AmbiguousQueryError'>, AmbiguousQueryError('[ERROR]::Used query (name=rhevm) produces ambiguous results.',), <traceback object at 0x49133b0>)]'
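
The error comes from the SDK's name filter, which only tolerates a single match. A conceptual sketch of what filterhelper.getItem does when a query matches more than one network (not the actual ovirtsdk source, just an illustration):

    from ovirtsdk.infrastructure.errors import AmbiguousQueryError

    def get_item(matches, query):
        # The name query matched more than one network, so the lookup
        # cannot pick one and gives up instead of guessing.
        if len(matches) > 1:
            raise AmbiguousQueryError(query)
        return matches[0] if matches else None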

Comment 3 Sandro Bonazzola 2015-08-10 13:03:22 UTC
vdsm correctly creates the network bridge:

Thread-16::DEBUG::2015-08-10 04:51:04,253::BindingXMLRPC::1133::vds::(wrapper) client [127.0.0.1]::call setupNetworks with ({'rhevm': {'nic': 'enp32s0f0', 'vlan': 20, 'blockingdhcp': True, 'bootproto': 'dhcp'}}, {}, {'connectivityCheck': False}) {}
Thread-16::DEBUG::2015-08-10 04:51:09,191::BindingXMLRPC::1140::vds::(wrapper) return setupNetworks with {'status': {'message': 'Done', 'code': 0}}
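
For reference, the same setupNetworks verb can be driven through vdsm's local client; a rough sketch (assuming the vdscli helper shipped with this vdsm version, with the values taken from the log line above):

    from vdsm import vdscli

    # Local connection to vdsmd; host, port and certificates come from
    # the vdsm configuration.
    conn = vdscli.connect()
    networks = {'rhevm': {'nic': 'enp32s0f0',
                          'vlan': 20,
                          'blockingdhcp': True,
                          'bootproto': 'dhcp'}}
    result = conn.setupNetworks(networks, {}, {'connectivityCheck': False})
    assert result['status']['code'] == 0, result['status']['message']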

Comment 4 Sandro Bonazzola 2015-08-10 13:06:39 UTC
In the engine log I see:
2015-08-10 01:20:52,363 INFO  [org.ovirt.engine.core.bll.storage.AddEmptyStoragePoolCommand] (ajp-/127.0.0.1:8702-2) [53b563d7] Running command: AddEmptyStoragePoolCommand internal: false. Entities affected :  ID: aaa00000-0000-0000-0000-123456789aaa Type: SystemAction group CREATE_STORAGE_POOL with role type ADMIN
2015-08-10 01:20:52,493 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-2) [53b563d7] Correlation ID: 53b563d7, Call Stack: null, Custom Event ID: -1, Message: Data Center test, Compatibility Version 3.5 and Quota Type DISABLED was added by admin@internal
2015-08-10 01:21:59,949 INFO  [org.ovirt.engine.core.bll.AddVdsGroupCommand] (ajp-/127.0.0.1:8702-1) [40d94fad] Running command: AddVdsGroupCommand internal: false. Entities affected :  ID: 4b83150d-db61-484c-8400-5a3de39acdc5 Type: StoragePoolAction group CREATE_CLUSTER with role type ADMIN
2015-08-10 01:22:00,128 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-1) [40d94fad] Correlation ID: 40d94fad, Call Stack: null, Custom Event ID: -1, Message: Cluster test was added by admin@internal
2015-08-10 01:22:31,632 INFO  [org.ovirt.engine.core.bll.network.dc.UpdateNetworkCommand] (ajp-/127.0.0.1:8702-5) [5270129c] Running command: UpdateNetworkCommand internal: false. Entities affected :  ID: c9dda052-25ae-4f4c-9487-b1a6837d3de2 Type: NetworkAction group CONFIGURE_STORAGE_POOL_NETWORK with role type ADMIN
2015-08-10 01:22:31,773 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-5) [5270129c] Correlation ID: 5270129c, Call Stack: null, Custom Event ID: -1, Message: Network rhevm was updated on Data Center: test

Has the rhevm network been somehow modified manually during the setup?

Comment 5 wanghui 2015-08-12 02:30:06 UTC
(In reply to Sandro Bonazzola from comment #4)
> In engine log I see:
> 2015-08-10 01:20:52,363 INFO 
> [org.ovirt.engine.core.bll.storage.AddEmptyStoragePoolCommand]
> (ajp-/127.0.0.1:8702-2) [53b563d7] Running command:
> AddEmptyStoragePoolCommand internal: false. Entities affected :  ID:
> aaa00000-0000-0000-0000-123456789aaa Type: SystemAction group
> CREATE_STORAGE_POOL with role type ADMIN
> 2015-08-10 01:20:52,493 INFO 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ajp-/127.0.0.1:8702-2) [53b563d7] Correlation ID: 53b563d7, Call Stack:
> null, Custom Event ID: -1, Message: Data Center test, Compatibility Version
> 3.5 and Quota Type DISABLED was added by admin@internal
> 2015-08-10 01:21:59,949 INFO  [org.ovirt.engine.core.bll.AddVdsGroupCommand]
> (ajp-/127.0.0.1:8702-1) [40d94fad] Running command: AddVdsGroupCommand
> internal: false. Entities affected :  ID:
> 4b83150d-db61-484c-8400-5a3de39acdc5 Type: StoragePoolAction group
> CREATE_CLUSTER with role type ADMIN
> 2015-08-10 01:22:00,128 INFO 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ajp-/127.0.0.1:8702-1) [40d94fad] Correlation ID: 40d94fad, Call Stack:
> null, Custom Event ID: -1, Message: Cluster test was added by admin@internal
> 2015-08-10 01:22:31,632 INFO 
> [org.ovirt.engine.core.bll.network.dc.UpdateNetworkCommand]
> (ajp-/127.0.0.1:8702-5) [5270129c] Running command: UpdateNetworkCommand
> internal: false. Entities affected :  ID:
> c9dda052-25ae-4f4c-9487-b1a6837d3de2 Type: NetworkAction group
> CONFIGURE_STORAGE_POOL_NETWORK with role type ADMIN
> 2015-08-10 01:22:31,773 INFO 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (ajp-/127.0.0.1:8702-5) [5270129c] Correlation ID: 5270129c, Call Stack:
> null, Custom Event ID: -1, Message: Network rhevm was updated on Data
> Center: test
> 
> Has the rhevm network been somehow modified manually during the setup?

No. All the configuration is done by the HE setup process.

Comment 6 Fabian Deutsch 2015-08-12 08:48:07 UTC
Can this be reproduced?

Comment 7 Sandro Bonazzola 2015-08-12 12:36:38 UTC
I think that what breaks the setup is this step from comment #0:

4. Configure the VM with rhevm 3.5.4. Create a new cluster for rhevh to register to, and modify the rhevm network to be VLAN tagged in rhevm.

I guess that modifying the rhevm network at this point causes issues within the deployment.

Comment 8 Ying Cui 2015-08-13 04:13:32 UTC
(In reply to Fabian Deutsch from comment #6)
> Can this be reproduced?

Fabian, could you please rephrase your question? We don't understand it; we can reproduce this issue 100% of the time following the steps in the bug description. Thanks.

Comment 9 Fabian Deutsch 2015-08-13 07:20:19 UTC
(In reply to Ying Cui from comment #8)
> (In reply to Fabian Deutsch from comment #6)
> > Can this be reproduced?
> 
> Fabian, could you please rephrase your question here? We don't understand
> the question here, we can reproduce this issue 100% in bug description.
> Thanks.

That was just my question: whether this can be reproduced 100% of the time.

Comment 10 Fabian Deutsch 2015-08-13 07:28:20 UTC
Considering Sandro's comment 7, it looks like the tested flow is not the usual one.
The bit that makes it different is the manual creation of the cluster.

Could you please test the following flow:

Steps to Reproduce:
1. Clean install rhev-hypervisor7-7.1-20150805.0.el7ev
2. Create a VLAN tagged network
3. Set up hosted engine using the OVA appliance type.
4. Configure the VM with rhevm 3.5.4. (Use the Default cluster)
5. Select as follows [1]

6. Modify the rhevm network as vlan tagged in rhevm.

IIUIC this should be safer.

Sandro, does this flow look good to you?

Comment 11 wanghui 2015-08-13 07:37:55 UTC
(In reply to Fabian Deutsch from comment #10)
> Considering Sandro's comment 7, it looks like the tested flow is not the
> usual one.
> The bit which is making it different, is the manual creation of the cluster.
> 
> COuld you please test the following flow:
> 
> Steps to Reproduce:
> 1. Clean install rhev-hypervisor7-7.1-20150805.0.el7ev
> 2. Create network throug vlan tagged
> 3. Setup hosted engine through ova type.
> 4. Configure vm to rhevm3.5.4. (Use the Default cluster)
> 5. Select as follows [1]
> 
> 6. Modify the rhevm network as vlan tagged in rhevm.
> 
> IIUIC this should be more safe.
> 
> Sandro, does this flow look good to you?

Hey Fabian,

That flow is OK. If you register to the Default cluster and modify the rhevm network to be VLAN tagged in rhevm, HE setup succeeds.

But I have a question: should we always use the Default cluster when setting up HE?

Thanks
Hui Wang

Comment 12 Simone Tiraboschi 2015-08-13 07:45:30 UTC
In my opinion the issue is here:

> Steps to Reproduce:
> ...
> 4. Configure vm to rhevm3.5.4. Create new cluster for rhevh to register and modify the rhevm network as vlan tagged in rhevm.

because hosted-engine-setup itself automatically modifies the management network to be VLAN tagged before adding the first host:

                    mgmt_network = engine_api.networks.get(
                        name=self.environment[
                            ohostedcons.NetworkEnv.BRIDGE_NAME]
                    )
                    mgmt_network.set_vlan(
                        self._ovirtsdk_xml.params.VLAN(id=vlan_id)
                    )
                    mgmt_network.update()

If the user manually tweaked the management network configuration before that step, the automatic procedure will fail because it is not able to correctly identify the management network.
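
Since the lookup is a plain name query, it breaks as soon as the name is no longer unique. One way to make it unambiguous would be to resolve the management network through the data center of the target cluster; a minimal sketch (assuming the ovirtsdk 3 accessors used below; cluster_name and bridge_name are placeholder variables, and the actual fix may differ in the details):

    # Placeholder inputs: the cluster the host is being added to and the
    # management bridge name (e.g. 'rhevm').
    cluster = engine_api.clusters.get(name=cluster_name)
    dc_id = cluster.get_data_center().get_id()

    mgmt_network = None
    for net in engine_api.networks.list():
        # Keep only the network with the right name in the right data
        # center, so a second data center no longer makes it ambiguous.
        if (net.get_name() == bridge_name and
                net.get_data_center().get_id() == dc_id):
            mgmt_network = net
            break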

wanghui, could you please try to reproduce while skipping the network tweaking part of your step 4? If it doesn't reproduce, it's not a bug.

Comment 13 Simone Tiraboschi 2015-08-13 07:48:30 UTC
(In reply to wanghui from comment #11)

> But I have the question, should we always use default cluster when setup HE?

No, you can create a specific one for HE and it shouldn't be an issue.
But you should avoid modifying the management network, otherwise the setup will fail to identify it.

Could you please try reproducing with a custom cluster and the default untouched management network?

Comment 14 wanghui 2015-08-13 08:04:07 UTC
(In reply to Simone Tiraboschi from comment #12)
> On my opinion the issue is here:
> 
> > Steps to Reproduce:
> > ...
> > 4. Configure vm to rhevm3.5.4. Create new cluster for rhevh to register and modify the rhevm network as vlan tagged in rhevm.
> 
> cause hosted-engine-setup is going by itself to automatically modify the
> management network to be vlan tagged before adding the first host.
> 
>                     mgmt_network = engine_api.networks.get(
>                         name=self.environment[
>                             ohostedcons.NetworkEnv.BRIDGE_NAME]
>                     )
>                     mgmt_network.set_vlan(
>                         self._ovirtsdk_xml.params.VLAN(id=vlan_id)
>                     )
>                     mgmt_network.update()
> 
> If the user manually tweaked the management network configuration before
> that step then the automatic procedure will fail cause it's not able to
> correctly identify the management network.
> 
> wanghui, could you please try to reproduce skipping the network tweaking
> part on your step 4? If it doesn't reproduce it's not a bug.

Hi Simone Tiraboschi,

IIRC rhevm 3.5.z needs the rhevm network to be tagged, otherwise this VLAN tagged rhevh cannot be registered to rhevm 3.5.z.

And when I register to the Default cluster, I also need to modify the rhevm network to be VLAN tagged. But that scenario succeeds.

Thanks
Hui Wang

Comment 15 Fabian Deutsch 2015-08-13 08:22:27 UTC
Because using the Default cluster as described in comment 11 works, I'd say this is not blocking 3.5.4, and I am lowering the severity.

Comment 16 Simone Tiraboschi 2015-08-13 08:52:52 UTC
(In reply to wanghui from comment #14)
> IIRC rhevm3.5.z need to rhevm network be tagged otherwise this vlan tagged
> rhevh can not be registered to rhevm3.5.z.

hosted-engine-setup should correctly add the tagging information by itself since this change: https://gerrit.ovirt.org/#/c/32374/

> And when I register to default cluster, I also need to modify rhevm network
> to vlan tagged. But this scenario is succeed.

So the issue happens only if you have more than one cluster.

Comment 17 Simone Tiraboschi 2015-08-13 09:03:24 UTC
OK, reproduced:
it happens if you have more than one data center, because the management network is data center specific.

Traceback (most recent call last):
  File "testvlan.py", line 16, in <module>
    name='ovirtmgmt'
  File "/usr/lib/python2.7/site-packages/ovirtsdk/infrastructure/brokers.py", line 15230, in get
    query="name=" + name
  File "/usr/lib/python2.7/site-packages/ovirtsdk/utils/filterhelper.py", line 31, in getItem
    raise AmbiguousQueryError(query)
ovirtsdk.infrastructure.errors.AmbiguousQueryError: [ERROR]::Used query (name=ovirtmgmt) produces ambiguous results.
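
For reference, a reproducer along these lines (a hypothetical reconstruction, not the actual testvlan.py; the URL and credentials below are placeholders):

    from ovirtsdk.api import API

    api = API(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='password',
        insecure=True,
    )
    # With a second data center carrying its own 'ovirtmgmt' network,
    # the name-based lookup no longer yields a single match and the SDK
    # raises AmbiguousQueryError.
    mgmt_network = api.networks.get(name='ovirtmgmt')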

Comment 20 Michael Burman 2015-12-14 09:49:12 UTC
Verified on - rhevm-3.6.1.3-0.1.el6.noarch
ovirt-hosted-engine-setup-1.3.1.2-1.el7ev.noarch
vdsm-4.17.13-1.el7ev.noarch
Red Hat Enterprise Virtualization Hypervisor (Beta) release 7.2 (20151210.1.el7ev)

Comment 22 errata-xmlrpc 2016-03-09 19:13:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0375.html