Created attachment 1669984 [details] ironic-inspector log Description of problem: OSP 16 DCN with OVS network backend and Spine-Leaf network topology. Scaled up node failed on introspection ......... Waiting for introspection to finish... Introspection of node attempt failed:dcn3-compute-0. Retrying 1 nodes that failed introspection. Attempt 1 of 3 Introspection of node attempt failed:4299a8c6-98b4-4935-9372-ecaaf2794190. Retrying 1 nodes that failed introspection. Attempt 2 of 3 Introspection of node attempt failed:4299a8c6-98b4-4935-9372-ecaaf2794190. Retrying 1 nodes that failed introspection. Attempt 3 of 3 Introspection of node attempt failed:4299a8c6-98b4-4935-9372-ecaaf2794190. Retry limit reached with 1 nodes still failing introspectionWaiting for messages on queue 'tripleo' with no timeout. Introspection completed with errors: Retry limit reached with 1 nodes still failing introspection non-zero return code ironic-inspector.log 2020-03-13 09:23:43.692 8 ERROR ironic_inspector.node_cache [-] Introspection for nodes ['4299a8c6-98b4-4935-9372-ecaaf2794190'] has timed out 2020-03-13 09:23:43.696 8 DEBUG ironic_inspector.node_cache [-] [node: 4299a8c6-98b4-4935-9372-ecaaf2794190 state waiting] Attempting to acquire l ock acquire_lock /usr/lib/python3.6/site-packages/ironic_inspector/node_cache.py:105 2020-03-13 09:23:43.696 8 DEBUG ironic_inspector.node_cache [-] [node: 4299a8c6-98b4-4935-9372-ecaaf2794190 state waiting] Successfully acquired l ock acquire_lock /usr/lib/python3.6/site-packages/ironic_inspector/node_cache.py:107 2020-03-13 09:23:43.696 8 DEBUG ironic_inspector.node_cache [-] [node: 4299a8c6-98b4-4935-9372-ecaaf2794190 state waiting] Successfully released l ock release_lock /usr/lib/python3.6/site-packages/ironic_inspector/node_cache.py:119 2020-03-13 09:23:43.697 8 DEBUG ironic_inspector.node_cache [-] [node: 4299a8c6-98b4-4935-9372-ecaaf2794190 state waiting] Executing fsm(waiting). process_event(timeout) fsm_event /usr/lib/python3.6/site-packages/ironic_inspector/node_cache.py:207 2020-03-13 09:23:43.697 8 INFO ironic_inspector.node_cache [-] [node: 4299a8c6-98b4-4935-9372-ecaaf2794190 state waiting] Updating node state: wai ting --> error 2020-03-13 09:23:43.698 8 DEBUG ironic_inspector.node_cache [-] [node: 4299a8c6-98b4-4935-9372-ecaaf2794190 state waiting] Committing fields: {'st ate': 'error'} _commit /usr/lib/python3.6/site-packages/ironic_inspector/node_cache.py:150 2020-03-13 09:23:43.708 8 DEBUG ironic_inspector.node_cache [-] [node: 4299a8c6-98b4-4935-9372-ecaaf2794190 state error] Committing fields: {'fini shed_at': datetime.datetime(2020, 3, 13, 9, 23, 43, 697300), 'error': 'Introspection timeout'} _commit /usr/lib/python3.6/site-packages/ironic_ins pector/node_cache.py:150 ironic-conductor.log 2020-03-13 15:19:06.047 7 DEBUG ironic.common.utils [req-dc3d6f4a-e35e-47e3-9db6-d203f4865372 - - - - -] Execution completed, command line is "ipmitool -I lanplus -H 172.16.10.1 -L ADMINISTRATOR -p 6234 -U admin -v -R 12 -N 5 -f /tmp/tmpfvihl7u4 power status" execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:77 2020-03-13 15:19:06.047 7 DEBUG ironic.common.utils [req-dc3d6f4a-e35e-47e3-9db6-d203f4865372 - - - - -] Command stdout is: "Chassis Power is on " execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:78 2020-03-13 15:19:06.047 7 DEBUG ironic.common.utils [req-dc3d6f4a-e35e-47e3-9db6-d203f4865372 - - - - -] Command stderr is: "Running Get PICMG Properties my_addr 0x20, transit 0, target 0x20 Error response 0xc1 from Get PICMG Properities Running Get VSO Capabilities my_addr 0x20, transit 0, target 0x20 Invalid completion code received: Invalid command Discovered IPMB address 0x0 " execute /usr/lib/python3.6/site-packages/ironic/common/utils.py:79 Version-Release number of selected component (if applicable): RHOS_TRUNK-16.0-RHEL-8-20200226.n.1 puppet-ironic-15.4.1-0.20191022165413.8fe6978.el8ost.noarch python3-ironicclient-3.1.0-0.20190926124316.04ef2d7.el8ost.noarch openstack-ironic-python-agent-builder-1.1.1-0.20191203040321.a34dfda.el8ost.noarch python3-ironic-inspector-client-3.7.0-0.20190923163033.d95a4cd.el8ost.noarch How reproducible: https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/edge/job/DFG-edge-day2-16-rhel-virthost-3cont-2comp-2leafs-x-2comp-ovs-scale-up-down/7/consoleFull or https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/edge/job/DFG-edge-day2-16-rhel-virthost-3cont-2comp-2leafs-x-2comp-ovs-scale-up-down/2/console Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: The same test flow with OVN network backend works great. https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/edge/job/DFG-edge-day2-16-rhel-virthost-3cont-2comp-2leafs-x-2comp-ovn-scale-up-down/1/consoleFull
Created attachment 1669985 [details] ironic-conductor log
Bogdan - note that is the neutron dhcp server you are looking at but this bug is an introspection issue, so that's the inspector dnsmasq.
@Bob, indeed thanks. I've retried it for the br-ctlplane but the result look unchanged - no replies logged by tcpdump
Could this perchance be a duplicate of bug 1814616? [stack@site-undercloud-0 ~]$ sudo ovs-vsctl -V ovs-vsctl (Open vSwitch) 2.11.0 DB Schema 7.16.1 [stack@site-undercloud-0 ~]$ cat /etc/redhat-release Red Hat Enterprise Linux release 8.1 (Ootpa)
Bogdan - interesting idea. We had been assuming that the problem with DHCP packets not being received by inspector dnsmasq was only happening with 16.1 and OVS 2.13, but it does look similar to this issue. We haven't had any other reports from 16.0 CI of this problem though.
so this is likely the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1814616#c9
(In reply to Bogdan Dobrelya from comment #7) > Could this perchance be a duplicate of bug 1814616? > I think this is not a duplicate, of bug 1814616 Ironic inspector's dnsmasq does receive requests here: [root@site-undercloud-0 ~]# tail -n 5 /var/log/containers/ironic-inspector/dnsmasq.log Mar 20 16:14:59 dnsmasq-dhcp[7]: no address range available for DHCP request via 10.35.64.34 Mar 20 16:15:09 dnsmasq-dhcp[7]: no address range available for DHCP request via 10.35.64.34 Mar 20 16:15:09 dnsmasq-dhcp[7]: no address range available for DHCP request via 10.35.64.34 Mar 20 16:15:20 dnsmasq-dhcp[7]: no address range available for DHCP request via 10.35.64.34 Mar 20 16:15:20 dnsmasq-dhcp[7]: no address range available for DHCP request via 10.35.64.34 [root@site-undercloud-0 ~]# cat /var/lib/config-data/puppet-generated/ironic_inspector/etc/ironic-inspector/dnsmasq.conf port=0 interface=br-ctlplane dhcp-range=set:leaf0,192.168.24.100,192.168.24.190,255.255.255.0,10m dhcp-option=tag:leaf0,option:router,192.168.24.254 dhcp-option-force=tag:leaf0,option:mtu,1500 dhcp-range=set:leaf1,192.168.34.100,192.168.34.190,255.255.255.0,10m dhcp-option=tag:leaf1,option:router,192.168.34.254 dhcp-option-force=tag:leaf1,option:mtu,1500 dhcp-range=set:leaf2,192.168.44.100,192.168.44.190,255.255.255.0,10m dhcp-option=tag:leaf2,option:router,192.168.44.254 dhcp-option-force=tag:leaf2,option:mtu,1500 dhcp-sequential-ip dhcp-match=ipxe,175 dhcp-match=set:efi,option:client-arch,7 dhcp-match=set:efi,option:client-arch,9 dhcp-match=set:efi,option:client-arch,11 # dhcpv6s for Client System Architecture Type (61) dhcp-match=set:efi6,option6:61,0007 dhcp-match=set:efi6,option6:61,0009 dhcp-match=set:efi6,option6:61,0011 dhcp-userclass=set:ipxe6,iPXE # Client is already running iPXE; move to next stage of chainloading dhcp-boot=tag:ipxe,http://192.168.24.1:8088/inspector.ipxe dhcp-option=tag:ipxe6,option6:bootfile-url,http://192.168.24.1:8088/inspector.ipxe # Client is PXE booting over EFI without iPXE ROM; send EFI version of iPXE chainloader dhcp-boot=tag:efi,tag:!ipxe,ipxe.efi dhcp-option=tag:efi6,tag:!ipxe6,option6:bootfile-url,tftp://192.168.24.1/ipxe.efi # Client is running PXE over BIOS; send BIOS version of iPXE chainloader dhcp-boot=undionly.kpxe,localhost.localdomain,192.168.24.1 dhcp-hostsdir=/var/lib/ironic-inspector/dhcp-hostsdir
If I add log-dhcp I can also see dnsmasq-dhcp[7]: no address range available for DHCP request via 192.168.54.254 logged when I do virsh reset dcn3-compute-0 so this probably a configuration issue indeed? ()[root@site-undercloud-0 /]# grep range /etc/ironic-inspector/dnsmasq.conf dhcp-range=172.20.0.100,172.20.0.110,255.255.255.0,10m dhcp-range=set:leaf0,192.168.24.100,192.168.24.190,255.255.255.0,10m dhcp-range=set:leaf1,192.168.34.100,192.168.34.190,255.255.255.0,10m dhcp-range=set:leaf2,192.168.44.100,192.168.44.190,255.255.255.0,10m [root@site-undercloud-0 stack]# grep 192.168.54. undercloud.conf cidr = 192.168.54.0/24 dhcp_start = 192.168.54.10 dhcp_end = 192.168.54.90 inspection_iprange = 192.168.54.100,192.168.54.190 gateway = 192.168.54.254
That is not bug just configuration issue in undercloud.conf Should be re-installed UC with new subnet in subnets = leaf0,leaf1,leaf2,leaf3 Test passed https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/edge/job/DFG-edge-day2-16-rhel-virthost-3cont-2comp-2leafs-x-2comp-ovs-scale-up-down/16/ I appreciate your help and time Thank you Yuri