Bug 1984860 - Baremetal IPI is permafailing - workers are failing to PXE
Summary: Baremetal IPI is permafailing - workers are failing to PXE
Keywords:
Status: CLOSED DUPLICATE of bug 1984576
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Tomas Sedovic
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-22 11:04 UTC by Stephen Benjamin
Modified: 2021-07-22 11:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-22 11:09:52 UTC
Target Upstream Version:
Embargoed:



Description Stephen Benjamin 2021-07-22 11:04:27 UTC
We're seeing an increased rate of workers failing to provision, with introspection timing out. This started with the build https://amd64.ocp.releases.ci.openshift.org/releasestream/4.9.0-0.nightly/release/4.9.0-0.nightly-2021-07-21-130417 and appears to be caused by the "provisioning network optional" PR: https://github.com/openshift/installer/pull/5015

This is causing nightly builds to be rejected, so the PR needs to be reverted or fixed ASAP.


Example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi/1417945225736753152

baremetal-operator reports that inspection times out [1]:

{"level":"info","ts":1626905438.6808221,"logger":"provisioner.ironic","msg":"current provision state","host":"openshift-machine-api~ostest-worker-0","lastError":"timeout reached while inspecting the node","current":"inspect failed","target":"manageable"}
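
To find every host in this state in a downloaded copy of the log [1], a minimal sketch along these lines may help. It assumes each relevant log entry is a single-line JSON object with the same "host", "lastError" and "current" keys as the entry above; the function name find_inspect_failures is hypothetical and not part of baremetal-operator.

```python
import json


def find_inspect_failures(log_lines):
    """Return (host, lastError) pairs for entries whose state is 'inspect failed'.

    Assumes one JSON object per line, shaped like the baremetal-operator
    entry quoted in this report. Non-JSON lines are skipped.
    """
    failures = []
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a structured log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed entry
        if entry.get("current") == "inspect failed":
            failures.append((entry.get("host"), entry.get("lastError")))
    return failures


# The log entry quoted in this report, used as sample input.
sample = (
    '{"level":"info","ts":1626905438.6808221,"logger":"provisioner.ironic",'
    '"msg":"current provision state","host":"openshift-machine-api~ostest-worker-0",'
    '"lastError":"timeout reached while inspecting the node",'
    '"current":"inspect failed","target":"manageable"}'
)

for host, error in find_inspect_failures([sample]):
    print(f"{host}: {error}")
```

The same function can be fed the lines of the full log file once it has been downloaded locally.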

libvirt serial console shows a PXE timeout[2]:

>>Start PXE over IPv4.
  PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0001 "UEFI PXEv4 (MAC:0098F104545C)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(0098F104545C,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0002 "UEFI PXEv6 (MAC:0098F104545C)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(0098F104545C,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Not Found

>>Start HTTP Boot over IPv4.
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Server response timeout.
BdsDxe: failed to load Boot0003 "UEFI HTTPv4 (MAC:0098F104545C)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(0098F104545C,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(): Not Found

>>Start HTTP Boot over IPv6.
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Unexpected network error.
BdsDxe: failed to load Boot0004 "UEFI HTTPv6 (MAC:0098F104545C)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(0098F104545C,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri(): Not Found

>>Start PXE over IPv4.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0005 "UEFI PXEv4 (MAC:0098F104545E)" from PciRoot(0x0)/Pci(0x2,0x1)/Pci(0x0,0x0)/MAC(0098F104545E,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6.
  PXE-E16: No valid offer received.
BdsDxe: failed to load Boot0006 "UEFI PXEv6 (MAC:0098F104545E)" from PciRoot(0x0)/Pci(0x2,0x1)/Pci(0x0,0x0)/MAC(0098F104545E,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Not Found

>>Start HTTP Boot over IPv4.....
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Server response timeout.
BdsDxe: failed to load Boot0007 "UEFI HTTPv4 (MAC:0098F104545E)" from PciRoot(0x0)/Pci(0x2,0x1)/Pci(0x0,0x0)/MAC(0098F104545E,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(): Not Found

>>Start HTTP Boot over IPv6.
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Unexpected network error.
BdsDxe: failed to load Boot0008 "UEFI HTTPv6 (MAC:0098F104545E)" from PciRoot(0x0)/Pci(0x2,0x1)/Pci(0x0,0x0)/MAC(0098F104545E,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri(): Not Found
BdsDxe: No bootable option or device was found.
BdsDxe: Press any key to enter the Boot Manager Menu.
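
When comparing several serial console logs, a rough sketch like the following can summarize which PXE error codes appear and which boot entries failed. It only pattern-matches the line formats visible in the output above; the function name summarize_console is hypothetical, and the device path in the sample is abbreviated for readability.

```python
import re
from collections import Counter

# Patterns based solely on the console lines shown in this report.
PXE_ERROR = re.compile(r"(PXE-E\d+):")
BOOT_FAIL = re.compile(r'BdsDxe: failed to load (Boot[0-9A-Fa-f]{4}) "([^"]+)"')


def summarize_console(text):
    """Count PXE error codes and list (boot entry, description) pairs that failed."""
    errors = Counter()
    failed = []
    for line in text.splitlines():
        match = PXE_ERROR.search(line)
        if match:
            errors[match.group(1)] += 1
        match = BOOT_FAIL.search(line)
        if match:
            failed.append((match.group(1), match.group(2)))
    return errors, failed


# Abbreviated excerpt of the console output above (device path shortened).
sample = """>>Start PXE over IPv4.
  PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0001 "UEFI PXEv4 (MAC:0098F104545C)" from PciRoot(0x0): Not Found
>>Start PXE over IPv6.
  PXE-E16: No valid offer received.
"""

errors, failed = summarize_console(sample)
print(dict(errors))
print(failed)
```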





[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi/1417945225736753152/artifacts/e2e-metal-ipi/gather-extra/artifacts/pods/openshift-machine-api_metal3-5679654987-4gw68_metal3-baremetal-operator.log
[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-ipi/1417945225736753152/artifacts/e2e-metal-ipi/baremetalds-devscripts-gather/artifacts/

Comment 1 Derek Higgins 2021-07-22 11:09:52 UTC
Looks like a dup of bz#1984576

*** This bug has been marked as a duplicate of bug 1984576 ***

