Bug 1785800 - Get Host Capabilities fails if Infiniband interfaces exist
Summary: Get Host Capabilities fails if Infiniband interfaces exist
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: General
Version: 4.40.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.4.1
Target Release: ---
Assignee: bugs@ovirt.org
QA Contact: Dominik Holler
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-21 04:07 UTC by Alexander Murashkin
Modified: 2020-07-08 08:26 UTC
CC List: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-07-08 08:26:03 UTC
oVirt Team: Network
Embargoed:
sbonazzo: ovirt-4.4?


Attachments (Terms of Use)
Engine log (59.43 KB, text/plain)
2019-12-21 04:07 UTC, Alexander Murashkin
Engine packages (57.73 KB, text/plain)
2019-12-21 04:08 UTC, Alexander Murashkin
Engine and host logs (379.04 KB, application/gzip)
2019-12-21 04:49 UTC, Alexander Murashkin
Unexpected failure of libnm when running the mainloop (7.71 MB, application/gzip)
2020-01-09 01:40 UTC, Alexander Murashkin


Links
GitHub nmstate/nmstate issue 655 (closed): nmstate fails when Infiniband interfaces are present. Last updated 2020-11-03 03:18:00 UTC.
Red Hat Bugzilla 1791156 (medium, CLOSED): Failed on showing MAC address of Infiniband interface. Last updated 2023-04-20 15:23:20 UTC.
Red Hat Bugzilla 1841017 (high, CLOSED): RFE: Allow IP over InfiniBand (IPoIB) device to be configured. Last updated 2023-06-27 13:13:16 UTC.

Description Alexander Murashkin 2019-12-21 04:07:13 UTC
Created attachment 1646957 [details]
Engine log

Description of problem:

Get Host Capabilities fails if Infiniband interfaces exist. The Infiniband MAC address is 20 octets long and does not match the 6-octet schema pattern.

The failure happened while the host was being reinstalled. Currently the host status is "Unassigned".

Here is the pattern:

    ^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$

Here is the Infiniband MAC:

    80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11
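
For illustration, a minimal standalone Python check (not part of vdsm or nmstate) showing that the schema pattern rejects the 20-octet address:

    import re

    # Pattern from the libnmstate schema: exactly 6 colon-separated octets.
    MAC_PATTERN = r"^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$"

    # 20-octet Infiniband address of ib0 on this host.
    IB_MAC = "80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11"

    print(bool(re.match(MAC_PATTERN, "f4:52:14:03:00:8d")))  # True  (6-octet MAC)
    print(bool(re.match(MAC_PATTERN, IB_MAC)))               # False (20-octet Infiniband MAC)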

Version-Release number of selected component (if applicable):

ovirt-engine-4.4.0-0.0.master.20191219143318.git65c2ffb.el7.noarch

How reproducible:

Go to Hosts -> select the host -> Maintenance -> Reinstall.

Actual results:

2019-12-20 21:37:32,014-06 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM harrier command Get Host Capabilities failed: Internal JSON-RPC error: {'reason': "'80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11' does not match '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$'\n\nFailed validating 'pattern' in schema['properties']['interfaces']['items']['allOf'][0]['properties']['mac-address']:\n    {'pattern': '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$', 'type': 'string'}\n\nOn instance['interfaces'][6]['mac-address']:\n    '80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11'"}

Expected results:

Successful Get Host Capabilities.

Additional info:

Here is an example of an Infiniband MAC address:

6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

Comment 1 Alexander Murashkin 2019-12-21 04:08:08 UTC
Created attachment 1646958 [details]
Engine packages

Comment 2 Alexander Murashkin 2019-12-21 04:49:27 UTC
Created attachment 1646961 [details]
Engine and host logs

Similar errors appear in vdsm.log and supervdsm.log.

Comment 3 Alexander Murashkin 2019-12-21 05:25:35 UTC
As the host was unusable, I had to mask rdma.service on the host (to disable Infiniband) and remove the Infiniband network from oVirt. Obviously, I rebooted the host a few times.

After that, the host reinstall completed, but our Infiniband configuration is lost.

The configuration will be restored, but we need to be able to run a reinstall while Infiniband is configured.

BTW, the failed reinstall also broke the boot configuration (seems to be another bug). I had to reboot using an older kernel.

Comment 4 Alexander Murashkin 2019-12-21 18:34:38 UTC
The error comes from the host's python3-libnmstate.noarch package. 

See also https://bugzilla.redhat.com/show_bug.cgi?id=1740644 

Relevant version

python3-libnmstate-0.2.1-0.20191212.805git055f39e.el8.noarch

How to reproduce (assuming that the host has Infiniband interfaces and rdma.service is started):

# nmstatectl show
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==0.2.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 59, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 203, in show
    state = _filter_state(libnmstate.show(), args.only)
  File "/usr/lib/python3.6/site-packages/libnmstate/netinfo.py", line 71, in show
    validator.validate(report)
  File "/usr/lib/python3.6/site-packages/libnmstate/validator.py", line 56, in validate
    js.validate(data, validation_schema)
  File "/usr/lib/python3.6/site-packages/jsonschema/validators.py", line 541, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/usr/lib/python3.6/site-packages/jsonschema/validators.py", line 130, in validate
    raise error
jsonschema.exceptions.ValidationError: '80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11' does not match '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$'

Failed validating 'pattern' in schema['properties']['interfaces']['items']['allOf'][0]['properties']['mac-address']:
    {'pattern': '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$', 'type': 'string'}

On instance['interfaces'][6]['mac-address']:
    '80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11'
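
The same validation failure can be reproduced in isolation with the jsonschema library (a minimal sketch against a reduced schema fragment, not the actual libnmstate code path):

    import jsonschema

    # Reduced fragment of the mac-address rule from operational-state.yaml.
    mac_schema = {
        "type": "string",
        "pattern": "^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$",
    }

    ib_mac = "80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11"

    # Raises jsonschema.exceptions.ValidationError:
    # '80:00:...:52:11' does not match '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$'
    jsonschema.validate(ib_mac, mac_schema)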

Comment 5 Alexander Murashkin 2019-12-21 18:43:30 UTC
nmstate (https://github.com/nmstate/nmstate) does not have an Infiniband interface type implemented. It also requires MAC addresses to be 6 octets.

nmstate should either add Infiniband support or, at least, accept MAC addresses 20 octets in size.

Meanwhile, as a workaround, apply the following patch on the host:

--- /usr/lib/python3.6/site-packages/libnmstate/schemas/operational-state.yaml.orig 2019-12-12 09:24:22.000000000 -0600
+++ /usr/lib/python3.6/site-packages/libnmstate/schemas/operational-state.yaml	2019-12-21 12:09:38.277135571 -0600
@@ -63,7 +63,7 @@
         - down
     mac-address:
       type: string
-      pattern: "^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$"
+      pattern: "^(([a-fA-F0-9]{2}:){5}|([a-fA-F0-9]{2}:){19})[a-fA-F0-9]{2}$"
     bridge-vlan-tag:
       type: integer
       minimum: 0

With the patch applied, the host was reinstalled successfully.
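
As a standalone sanity check (independent of libnmstate), the widened pattern accepts both 6-octet and 20-octet addresses:

    import re

    PATCHED_PATTERN = r"^(([a-fA-F0-9]{2}:){5}|([a-fA-F0-9]{2}:){19})[a-fA-F0-9]{2}$"

    for mac in (
        "f4:52:14:03:00:8d",                                            # 6-octet Ethernet MAC
        "80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11",  # 20-octet Infiniband MAC
    ):
        print(mac, bool(re.match(PATCHED_PATTERN, mac)))  # prints True for both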

Comment 6 Alexander Murashkin 2019-12-21 19:03:59 UTC
Submitted nmstate issue https://github.com/nmstate/nmstate/issues/655

Comment 7 Alexander Murashkin 2019-12-21 20:51:47 UTC
Applying the patch fixes the reinstall, but other problems remain. nmstate returns "unknown" as the interface type. As a result, vdsm cannot apply any network changes.

We had to disable nmstate completely.

# cat /etc/vdsm/vdsm.conf.d/99_local.conf 
[vars]

# Control nmstate network backend provider.
net_nmstate_enabled = false

How is the VdsmUseNmstate property related to net_nmstate_enabled? Disabling the property

# engine-config -g VdsmUseNmstate
VdsmUseNmstate: false version: 4.2
VdsmUseNmstate: false version: 4.3
VdsmUseNmstate: false version: 4.4
VdsmUseNmstate: false version: general

does not disable nmstate usage by vdsm.

Comment 8 Alexander Murashkin 2019-12-21 20:55:18 UTC
In summary, there are 3 related issues:

1. nmstate does not support Infiniband
2. currently hosts with Infiniband have to have net_nmstate_enabled=false in vdsm config
3. VdsmUseNmstate=false does not disable nmstate usage by vdsm

Comment 10 Alexander Murashkin 2020-01-09 01:40:09 UTC
Created attachment 1650884 [details]
Unexpected failure of libnm when running the mainloop

If the MAC address validation patch is applied, the engine log gets errors similar to the one below:

2019-12-21 13:14:23,022-06 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-125) [7c65ff99-7cbd-4247-b577-e9149862f141] Error: VDSGenericException: VDSErrorException: Failed to HostSetupNetworksVDS, error = Internal JSON-RPC error: {'reason': 'Unexpected failure of libnm when running the mainloop: run execution'}, code = -32603

See the attached log file and nmstatectl output.

Here is what nmstatectl show returns on the host (all fields are wrong except the MAC address):

- name: ib0
  type: unknown
  state: down
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  mac-address: 80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11
  mtu: 65520

Comment 11 Edward Haas 2020-01-09 07:21:53 UTC
(In reply to Alexander Murashkin from comment #10)

> If the MAC address validation patch is applied, the engine log gets errors

After applying the patch, have you restarted supervdsm service?

> Here what nmstatectl show on the host returns (all fields are wrong except
> the MAC address) 
> 
> - name: ib0
>   type: unknown
>   state: down
>   ipv4:
>     enabled: false
>   ipv6:
>     enabled: false
>   mac-address: 80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11
>   mtu: 65520

Interfaces that are not managed by NetworkManager are shown with state `down`.
Interface types that are not supported are shown with type `unknown`.

It seems that we also do not report the IP address if the interface is not managed by NM,
which is wrong and should be fixed.
I will open a BZ on that.

Comment 12 Alexander Murashkin 2020-01-10 01:35:09 UTC
> After applying the patch, have you restarted supervdsm service?

Yes, and the MAC format validation exception does not happen anymore. 

> Interfaces that are not managed by NetworkManager are showing with state `down`.

Will the state attribute be used by oVirt in any way? 

NetworkManager (CLI at least) reports state as "connected", "disconnected", and "unmanaged". It seems that "connected" and "unmanaged" correspond to ip link state UP.

# nmcli dev
DEVICE       TYPE         STATE         CONNECTION 
enp4s4       ethernet     connected     enp4s4               <--- the interface is "up" and managed by NetworkManager
enp5s5       ethernet     disconnected  --                   <--- the interface is "down"
enp6s0       ethernet     disconnected  --         
ib1          infiniband   disconnected  --         
vnet0        tun          disconnected  --         
vnet1        tun          disconnected  --         
;vdsmdummy;  bridge       unmanaged     --         
ovirtmgmt    bridge       unmanaged     --                   <--- the interface is "up" and is not managed
enp10s0      ethernet     unmanaged     --         
ib0          infiniband   unmanaged     --         
lo           loopback     unmanaged     --         
br-int       openvswitch  unmanaged     --         
ovs-system   openvswitch  unmanaged     --

Comment 13 Dominik Holler 2020-05-27 12:36:29 UTC
Alexander, is the issue fixed for you?

Comment 14 Dominik Holler 2020-07-01 09:00:37 UTC
Alexander, thanks a lot for reporting this bug.
It is fixed now:
nmstate-0.2.10-1.el8.noarch
vdsm-4.40.20-5.gitd3e64e6a9.el8.x86_64

MainProcess|jsonrpc/0::DEBUG::2020-07-01 10:53:22,054::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) return network_caps with {'networks': {'ovirtmgmt': {'ports': ['enp0s31f6'], 'stp': 'off', 'iface': 'ovirtmgmt', 'bridged': True, 'addr': '192.168.178.90', 'netmask': '255.255.255.0', 'ipv4addrs': ['192.168.178.90/24'], 'ipv6addrs': ['fd00::3697:f6ff:fe9c:641f/64', '2001:16b8:17b7:eb00:3697:f6ff:fe9c:641f/64'], 'ipv6autoconf': True, 'gateway': '192.168.178.1', 'ipv6gateway': 'fe80::464e:6dff:fe3a:dccd', 'ipv4defaultroute': True, 'mtu': '1500', 'switch': 'legacy', 'southbound': 'enp0s31f6', 'dhcpv4': True, 'dhcpv6': True}}, 'bondings': {}, 'bridges': {'ovirtmgmt': {'ports': ['enp0s31f6'], 'stp': 'off', 'addr': '192.168.178.90', 'ipv4addrs': ['192.168.178.90/24'], 'ipv6addrs': ['fd00::3697:f6ff:fe9c:641f/64', '2001:16b8:17b7:eb00:3697:f6ff:fe9c:641f/64'], 'ipv6autoconf': True, 'gateway': '192.168.178.1', 'ipv6gateway': 'fe80::464e:6dff:fe3a:dccd', 'mtu': '1500', 'netmask': '255.255.255.0', 'ipv4defaultroute': True, 'dhcpv4': True, 'dhcpv6': True, 'opts': {'ageing_time': '30000', 'multicast_query_use_ifaddr': '0', 'gc_timer': '4242', 'hello_time': '200', 'multicast_router': '1', 'nf_call_iptables': '0', 'group_addr': '01:80:c2:00:00:00', 'group_fwd_mask': '0x0', 'multicast_querier': '0', 'hash_max': '512', 'tcn_timer': '0', 'vlan_protocol': '0x8100', 'vlan_stats_per_port': '0', 'root_port': '0', 'vlan_filtering': '0', 'multicast_query_interval': '12500', 'bridge_id': '8000.3497f69c641f', 'max_age': '2000', 'nf_call_arptables': '0', 'multicast_startup_query_interval': '3125', 'multicast_stats_enabled': '0', 'multicast_query_response_interval': '1000', 'topology_change': '0', 'priority': '32768', 'multicast_mld_version': '1', 'hash_elasticity': '4', 'hello_timer': '0', 'default_pvid': '1', 'root_path_cost': '0', 'multicast_igmp_version': '2', 'stp_state': '0', 'multicast_startup_query_count': '2', 'topology_change_detected': '0', 'multicast_last_member_interval': '100', 'topology_change_timer': '0', 'root_id': '8000.3497f69c641f', 'forward_delay': '1500', 'multicast_membership_interval': '26000', 'multicast_querier_interval': '25500', 'vlan_stats_enabled': '0', 'multicast_snooping': '1', 'nf_call_ip6tables': '0', 'multicast_last_member_count': '2'}}}, 'nics': {'enp4s0f0': {'hwaddr': 'a0:36:9f:13:8c:66', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 1000}, 'enp4s0f1': {'hwaddr': 'a0:36:9f:13:8c:67', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 1000}, 'enp0s31f6': {'hwaddr': '34:97:f6:9c:64:1f', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 1000}, 'ib0': {'hwaddr': '80:00:02:08:fe:80:00:00:00:00:00:00:58:49:56:0e:4e:57:0a:02', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '4092', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 0}}, 'vlans': {}, 'nameservers': ['fd00::464e:6dff:fe3a:dccd'], 'supportsIPv6': True}

Comment 15 Sandro Bonazzola 2020-07-08 08:26:03 UTC
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

