Created attachment 1187474
Example of failures

Introspection hangs on OSPd 9 in my bare-metal lab (PowerEdge R320). We're using the following iPXE script (generated by OSPd, with kernel parameters stripped):

#!ipxe

:retry_dhcp
dhcp || goto retry_dhcp

:retry_boot
imgfree
kernel --timeout 600000 http://172.21.64.1:8088/agent.kernel initrd=agent.ramdisk || goto retry_boot
initrd --timeout 600000 http://172.21.64.1:8088/agent.ramdisk || goto retry_boot
boot

When I start introspection, the nodes get DHCP successfully and then proceed with iPXE (second screenshot). After downloading the iPXE scripts (from the same host, http://172.21.64.1:8088), iPXE starts displaying three kinds of messages (first screenshot): "file not found" (which is not true), "not enough space", and "connection timed out". Note that the last error appears after it starts downloading the kernel. Even more interesting: it always breaks at one of two positions!

Name        : ipxe-bootimgs
Arch        : noarch
Version     : 20160127
Release     : 1.git6366fa7a.el7
Size        : 3.4 M
Repo        : installed
From repo   : rhelosp-9.0-director-puddle
Created attachment 1187475
iPXE versions as seen by the machine
We are seeing similar errors and hangs during introspection in CI jobs (on both OVB and real bare-metal hardware), but only in master jobs; Mitaka jobs are passing at the moment. The job logs show:

Introspection completed with errors:
2a296b82-47d2-446f-961a-e8f34e0b21ea: Introspection timeout
678319e5-a270-49af-ab2f-f88aa52b9f7c: Introspection timeout
ca5e6f69-32e8-4580-88ba-602685a3a2f1: Introspection timeout
ad54e211-f5ad-41cf-833a-10a556f2181a: Introspection timeout
Setting nodes for introspection to manageable...
Starting introspection of manageable nodes
Waiting for introspection to finish...
Introspection for UUID 2a296b82-47d2-446f-961a-e8f34e0b21ea finished with error: Introspection timeout
Introspection for UUID 678319e5-a270-49af-ab2f-f88aa52b9f7c finished with error: Introspection timeout
Introspection for UUID ca5e6f69-32e8-4580-88ba-602685a3a2f1 finished with error: Introspection timeout
Introspection for UUID ad54e211-f5ad-41cf-833a-10a556f2181a finished with error: Introspection timeout
No nodes in manageable state found for introspection

Looking at the console, I see repeated 'Connection timeout...' messages. Attaching a screenshot of the console.
Created attachment 1187565
Screenshot of OVB instance console during introspection
Sorry for the noise: my problem was caused by some missed steps, which left the images missing. Ronelle, please double-check that the agent image is actually present in /httpboot.
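The check requested above can be scripted. The following is a minimal Python sketch, assuming the HTTP boot directory is /httpboot and the images are named agent.kernel and agent.ramdisk as in the iPXE script from this bug. The helper name `missing_boot_images` is hypothetical and not part of any OSPd or Ironic tool; treating a zero-byte file as missing is also an assumption (it catches a failed copy).

```python
from pathlib import Path

# Image names taken from the iPXE script in this bug; adjust if yours differ.
REQUIRED_FILES = ("agent.kernel", "agent.ramdisk")


def missing_boot_images(httpboot="/httpboot", required=REQUIRED_FILES):
    """Return the required images that are absent or empty in the HTTP boot dir.

    Hypothetical helper: a zero-byte file is treated as missing, since it
    usually means the image was never copied correctly.
    """
    root = Path(httpboot)
    missing = []
    for name in required:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list from `missing_boot_images()` means both agent images are in place; otherwise the returned names show what still needs to be uploaded.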
Dmitry, I checked the undercloud:

[stack@undercloud ~]$ ls -la /httpboot
total 339516
drwxr-xr-x.  2 ironic ironic        66 Aug  4 15:08 .
dr-xr-xr-x. 19 root   root        4096 Aug  4 14:50 ..
-rwxr-xr-x.  1 root   root     5158704 Aug  4 15:08 agent.kernel
-rw-r--r--.  1 root   root   342493754 Aug  4 15:08 agent.ramdisk
-rw-r--r--.  1 ironic ironic       465 Aug  4 14:52 inspector.ipxe
Watching the console during introspection, I can see that the node fetches inspector.ipxe just fine but times out on agent.kernel.
OK, I think this is caused by the MTU of the provisioning interface being reset to 1500 during the undercloud install, which overwrites the value set during the OVB setup. I'm now resetting the value before introspection; this is under test.
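To catch the reset described above before starting introspection, the current MTU can be read from Linux sysfs (/sys/class/net/<interface>/mtu). This is a minimal Python sketch, not an OSPd tool: the expected value 1450 and the interface name used in the usage note are hypothetical placeholders, so substitute your provisioning interface and the MTU your OVB environment actually requires.

```python
from pathlib import Path
from typing import Optional

# Hypothetical placeholder: use the MTU your OVB/network setup requires.
EXPECTED_MTU = 1450


def read_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read the current MTU of a network interface from Linux sysfs."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def mtu_mismatch(iface: str, expected: int = EXPECTED_MTU,
                 sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the actual MTU if it differs from `expected`, otherwise None.

    A non-None result (for example 1500 after the undercloud install
    reset it) means the MTU should be set again before introspection.
    """
    actual = read_mtu(iface, sysfs_root)
    return actual if actual != expected else None
```

For example, `mtu_mismatch("eth1")` (with `eth1` standing in for the provisioning interface) returning 1500 would indicate that the value set during the OVB setup has been overwritten.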
Confirmed it was an MTU issue.
Thanks, I'm closing it then. Do you think we could update the documentation to mention this potential MTU issue?
The instructions for modifying the MTU are documented (and I had made those modifications). The issue is that the undercloud install overwrites them; that could be documented. It only affects OVB and some hardware platforms.