Created attachment 1646957 [details]
Engine log

Description of problem:

Get Host Capabilities fails if InfiniBand interfaces exist. An InfiniBand MAC address is very long and does not match the schema pattern. The failure happened while the host was being reinstalled. Currently the host status is "Unassigned".

Here is the pattern

  ^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$

Here is the InfiniBand MAC

  80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11

Version-Release number of selected component (if applicable):

ovirt-engine-4.4.0-0.0.master.20191219143318.git65c2ffb.el7.noarch

How reproducible:

Do Hosts -> the host -> Maintenance -> Reinstall.

Actual results:

2019-12-20 21:37:32,014-06 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM harrier command Get Host Capabilities failed: Internal JSON-RPC error: {'reason': "'80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11' does not match '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$'\n\nFailed validating 'pattern' in schema['properties']['interfaces']['items']['allOf'][0]['properties']['mac-address']:\n {'pattern': '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$', 'type': 'string'}\n\nOn instance['interfaces'][6]['mac-address']:\n '80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11'"}

Expected results:

Successful Get Host Capabilities.

Additional info:

Here is an example of an InfiniBand interface and its MAC address as shown by "ip link":

6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UP group default qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
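For reference, the mismatch can be checked standalone with Python's re module (a minimal sketch; the constant name is illustrative, not taken from the engine or vdsm code):

import re

MAC_PATTERN = r"^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$"

# A 6-octet Ethernet MAC matches the schema pattern...
assert re.match(MAC_PATTERN, "f4:52:14:03:00:8d")
# ...but the 20-octet InfiniBand address does not.
assert not re.match(MAC_PATTERN, "80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11")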
Created attachment 1646958 [details]
Engine packages
Created attachment 1646961 [details]
Engine and host logs

Similar errors appear in vdsm.log and supervdsm.log.
As the host was unusable, I had to mask rdma.service on the host (to disable InfiniBand) and remove the InfiniBand network from oVirt, rebooting the host a few times in the process. After that, the host reinstall completed, but our InfiniBand configuration is lost. The configuration will be restored, but we need to be able to run reinstall while InfiniBand is configured.

BTW, the failed reinstall also broke the boot configuration (seems to be another bug). I had to reboot using an older kernel.
The error comes from the host's python3-libnmstate.noarch package. See also https://bugzilla.redhat.com/show_bug.cgi?id=1740644

Relevant version:

python3-libnmstate-0.2.1-0.20191212.805git055f39e.el8.noarch

How to reproduce (assuming that the host has InfiniBand interfaces and rdma.service is started):

# nmstatectl show
Traceback (most recent call last):
  File "/usr/bin/nmstatectl", line 11, in <module>
    load_entry_point('nmstate==0.2.1', 'console_scripts', 'nmstatectl')()
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 59, in main
    return args.func(args)
  File "/usr/lib/python3.6/site-packages/nmstatectl/nmstatectl.py", line 203, in show
    state = _filter_state(libnmstate.show(), args.only)
  File "/usr/lib/python3.6/site-packages/libnmstate/netinfo.py", line 71, in show
    validator.validate(report)
  File "/usr/lib/python3.6/site-packages/libnmstate/validator.py", line 56, in validate
    js.validate(data, validation_schema)
  File "/usr/lib/python3.6/site-packages/jsonschema/validators.py", line 541, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/usr/lib/python3.6/site-packages/jsonschema/validators.py", line 130, in validate
    raise error
jsonschema.exceptions.ValidationError: '80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11' does not match '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$'

Failed validating 'pattern' in schema['properties']['interfaces']['items']['allOf'][0]['properties']['mac-address']:
    {'pattern': '^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$', 'type': 'string'}

On instance['interfaces'][6]['mac-address']:
    '80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11'
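The same failure can be reproduced with jsonschema alone, outside nmstate (a minimal sketch that mirrors only the relevant slice of the schema above):

import jsonschema

schema = {
    "type": "object",
    "properties": {
        "mac-address": {
            "type": "string",
            "pattern": "^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$",
        }
    },
}

instance = {"mac-address": "80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11"}

# Raises jsonschema.exceptions.ValidationError with the same
# "does not match" message as in the traceback above.
jsonschema.validate(instance, schema)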
nmstate (https://github.com/nmstate/nmstate) does not have an InfiniBand type implemented. It also requires MAC addresses to be 6 octets. nmstate should either add InfiniBand support or, at a minimum, accept MAC addresses 20 octets in size.

Meanwhile, as a workaround, apply the following patch on the host:

--- /usr/lib/python3.6/site-packages/libnmstate/schemas/operational-state.yaml.orig	2019-12-12 09:24:22.000000000 -0600
+++ /usr/lib/python3.6/site-packages/libnmstate/schemas/operational-state.yaml	2019-12-21 12:09:38.277135571 -0600
@@ -63,7 +63,7 @@
             - down
           mac-address:
             type: string
-            pattern: "^([a-fA-F0-9]{2}:){5}[a-fA-F0-9]{2}$"
+            pattern: "^(([a-fA-F0-9]{2}:){5}|([a-fA-F0-9]{2}:){19})[a-fA-F0-9]{2}$"
           bridge-vlan-tag:
             type: integer
             minimum: 0

With the patch, the host reinstall completed successfully.
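A quick sanity check that the widened pattern accepts both address lengths (a standalone sketch, not part of the patch itself):

import re

PATCHED_PATTERN = r"^(([a-fA-F0-9]{2}:){5}|([a-fA-F0-9]{2}:){19})[a-fA-F0-9]{2}$"

# 6-octet Ethernet and 20-octet InfiniBand MACs both validate now.
for mac in (
    "f4:52:14:03:00:8d",
    "80:00:02:08:fe:80:00:00:00:00:00:00:f4:52:14:03:00:8d:52:11",
):
    assert re.match(PATCHED_PATTERN, mac), mac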
Submitted nmstate issue https://github.com/nmstate/nmstate/issues/655
Applying the patch fixes the reinstall, but other problems remain. nmstate returns "unknown" as the interface type, and as a result vdsm cannot apply any network changes. We had to disable nmstate completely:

# cat /etc/vdsm/vdsm.conf.d/99_local.conf
[vars]
# Control nmstate network backend provider.
net_nmstate_enabled = false

How is the VdsmUseNmstate property related to net_nmstate_enabled? Disabling the property

# engine-config -g VdsmUseNmstate
VdsmUseNmstate: false version: 4.2
VdsmUseNmstate: false version: 4.3
VdsmUseNmstate: false version: 4.4
VdsmUseNmstate: false version: general

does not disable nmstate usage by vdsm.
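To double-check that the drop-in parses the way it is written, the file can be read back with standard INI semantics (a minimal sketch; vdsm's actual config loader may merge multiple drop-ins and built-in defaults):

import configparser

cp = configparser.ConfigParser()
cp.read("/etc/vdsm/vdsm.conf.d/99_local.conf")

# Should print False if the override above is in effect.
print(cp.getboolean("vars", "net_nmstate_enabled", fallback=True))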
In summary, there are 3 related issues:

1. nmstate does not support InfiniBand
2. currently, hosts with InfiniBand have to have net_nmstate_enabled=false in the vdsm config
3. VdsmUseNmstate=false does not disable nmstate usage by vdsm
Created attachment 1650884 [details]
Unexpected failure of libnm when running the mainloop

If the MAC address validation patch is applied, the engine log gets errors similar to the one below:

2019-12-21 13:14:23,022-06 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HostSetupNetworksVDSCommand] (default task-125) [7c65ff99-7cbd-4247-b577-e9149862f141] Error: VDSGenericException: VDSErrorException: Failed to HostSetupNetworksVDS, error = Internal JSON-RPC error: {'reason': 'Unexpected failure of libnm when running the mainloop: run execution'}, code = -32603

See the attached log file and nmstatectl output. Here is what "nmstatectl show" returns on the host (all fields are wrong except the MAC address):

- name: ib0
  type: unknown
  state: down
  ipv4:
    enabled: false
  ipv6:
    enabled: false
  mac-address: 80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11
  mtu: 65520
(In reply to Alexander Murashkin from comment #10)
> If the MAC address validation patch is applied, the engine log gets errors

After applying the patch, have you restarted the supervdsm service?

> Here is what "nmstatectl show" returns on the host (all fields are wrong
> except the MAC address):
>
> - name: ib0
>   type: unknown
>   state: down
>   ipv4:
>     enabled: false
>   ipv6:
>     enabled: false
>   mac-address: 80:00:02:08:FE:80:00:00:00:00:00:00:F4:52:14:03:00:8D:52:11
>   mtu: 65520

Interfaces that are not managed by NetworkManager are shown with state `down`. Interface types that are not supported are shown with type `unknown`.

It seems that we also do not report the IP address if the interface is not managed by NM, which is wrong and should be fixed. I will open a BZ on that.
> After applying the patch, have you restarted the supervdsm service?

Yes, and the MAC format validation exception does not happen anymore.

> Interfaces that are not managed by NetworkManager are shown with state `down`.

Will the state attribute be used by oVirt in any way? NetworkManager (its CLI at least) reports state as "connected", "disconnected", and "unmanaged". It seems that "connected" and "unmanaged" correspond to ip link state UP.

# nmcli dev
DEVICE       TYPE         STATE         CONNECTION
enp4s4       ethernet     connected     enp4s4    <--- the interface is "up" and managed by NetworkManager
enp5s5       ethernet     disconnected  --        <--- the interface is "down"
enp6s0       ethernet     disconnected  --
ib1          infiniband   disconnected  --
vnet0        tun          disconnected  --
vnet1        tun          disconnected  --
;vdsmdummy;  bridge       unmanaged     --
ovirtmgmt    bridge       unmanaged     --        <--- the interface is "up" and is not managed
enp10s0      ethernet     unmanaged     --
ib0          infiniband   unmanaged     --
lo           loopback     unmanaged     --
br-int       openvswitch  unmanaged     --
ovs-system   openvswitch  unmanaged     --
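To illustrate the mapping, here is a small sketch that classifies devices from nmcli's terse output (treating "connected" and "unmanaged" as link-up is my observation above, not something nmcli guarantees):

import subprocess

# -t gives machine-readable "device:type:state" lines; -f picks the fields.
out = subprocess.run(
    ["nmcli", "-t", "-f", "DEVICE,TYPE,STATE", "dev"],
    stdout=subprocess.PIPE, universal_newlines=True, check=True,
).stdout

for line in out.strip().splitlines():
    device, dev_type, state = line.split(":", 2)
    link_up = state in ("connected", "unmanaged")
    print("%-12s %-12s %-12s link_up=%s" % (device, dev_type, state, link_up))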
Alexander, is the issue fixed for you?
Alexander, thanks a lot for reporting this bug. It is fixed now with:

nmstate-0.2.10-1.el8.noarch
vdsm-4.40.20-5.gitd3e64e6a9.el8.x86_64

MainProcess|jsonrpc/0::DEBUG::2020-07-01 10:53:22,054::supervdsm_server::100::SuperVdsm.ServerCallback::(wrapper) return network_caps with {'networks': {'ovirtmgmt': {'ports': ['enp0s31f6'], 'stp': 'off', 'iface': 'ovirtmgmt', 'bridged': True, 'addr': '192.168.178.90', 'netmask': '255.255.255.0', 'ipv4addrs': ['192.168.178.90/24'], 'ipv6addrs': ['fd00::3697:f6ff:fe9c:641f/64', '2001:16b8:17b7:eb00:3697:f6ff:fe9c:641f/64'], 'ipv6autoconf': True, 'gateway': '192.168.178.1', 'ipv6gateway': 'fe80::464e:6dff:fe3a:dccd', 'ipv4defaultroute': True, 'mtu': '1500', 'switch': 'legacy', 'southbound': 'enp0s31f6', 'dhcpv4': True, 'dhcpv6': True}}, 'bondings': {}, 'bridges': {'ovirtmgmt': {'ports': ['enp0s31f6'], 'stp': 'off', 'addr': '192.168.178.90', 'ipv4addrs': ['192.168.178.90/24'], 'ipv6addrs': ['fd00::3697:f6ff:fe9c:641f/64', '2001:16b8:17b7:eb00:3697:f6ff:fe9c:641f/64'], 'ipv6autoconf': True, 'gateway': '192.168.178.1', 'ipv6gateway': 'fe80::464e:6dff:fe3a:dccd', 'mtu': '1500', 'netmask': '255.255.255.0', 'ipv4defaultroute': True, 'dhcpv4': True, 'dhcpv6': True, 'opts': {'ageing_time': '30000', 'multicast_query_use_ifaddr': '0', 'gc_timer': '4242', 'hello_time': '200', 'multicast_router': '1', 'nf_call_iptables': '0', 'group_addr': '01:80:c2:00:00:00', 'group_fwd_mask': '0x0', 'multicast_querier': '0', 'hash_max': '512', 'tcn_timer': '0', 'vlan_protocol': '0x8100', 'vlan_stats_per_port': '0', 'root_port': '0', 'vlan_filtering': '0', 'multicast_query_interval': '12500', 'bridge_id': '8000.3497f69c641f', 'max_age': '2000', 'nf_call_arptables': '0', 'multicast_startup_query_interval': '3125', 'multicast_stats_enabled': '0', 'multicast_query_response_interval': '1000', 'topology_change': '0', 'priority': '32768', 'multicast_mld_version': '1', 'hash_elasticity': '4', 'hello_timer': '0', 'default_pvid': '1', 'root_path_cost': '0', 'multicast_igmp_version': '2', 'stp_state': '0', 'multicast_startup_query_count': '2', 'topology_change_detected': '0', 'multicast_last_member_interval': '100', 'topology_change_timer': '0', 'root_id': '8000.3497f69c641f', 'forward_delay': '1500', 'multicast_membership_interval': '26000', 'multicast_querier_interval': '25500', 'vlan_stats_enabled': '0', 'multicast_snooping': '1', 'nf_call_ip6tables': '0', 'multicast_last_member_count': '2'}}}, 'nics': {'enp4s0f0': {'hwaddr': 'a0:36:9f:13:8c:66', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 1000}, 'enp4s0f1': {'hwaddr': 'a0:36:9f:13:8c:67', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 1000}, 'enp0s31f6': {'hwaddr': '34:97:f6:9c:64:1f', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '1500', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 1000}, 'ib0': {'hwaddr': '80:00:02:08:fe:80:00:00:00:00:00:00:58:49:56:0e:4e:57:0a:02', 'addr': '', 'ipv4addrs': [], 'ipv6addrs': [], 'ipv6autoconf': False, 'gateway': '', 'ipv6gateway': '::', 'mtu': '4092', 'netmask': '', 'ipv4defaultroute': False, 'dhcpv4': False, 'dhcpv6': False, 'speed': 0}}, 'vlans': {}, 'nameservers': ['fd00::464e:6dff:fe3a:dccd'], 'supportsIPv6': True}
This bugzilla is included in the oVirt 4.4.1 release, published on July 8th, 2020.

Since the problem described in this bug report should be resolved in the oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.