Bug 1773108 - All 4.3 metal CI jobs are failing: Due to "provisioning time limit exceeded; the Packet team will investigate"
Summary: All 4.3 metal CI jobs are failing: Due to "provisioning time limit exceeded; ...
Keywords:
Status: CLOSED DUPLICATE of bug 1775388
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.4.0
Assignee: Scott Dodson
QA Contact: David Sanz
URL:
Whiteboard:
Duplicates: 1772212
Depends On: 1775388 1779755 1782546
Blocks: 1776011
Reported: 2019-11-15 22:13 UTC by Greg Sheremeta
Modified: 2019-12-13 13:59 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1776011
Environment:
Last Closed: 2019-12-13 13:59:02 UTC
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
Github openshift installer pull | 2688 | None | closed | WIP Bug 1773108: upi/metal Only configure the first interface | 2020-08-06 02:23:21 UTC
Github openshift installer pull | 2695 | None | closed | upi/metal various fixes | 2020-08-06 02:23:20 UTC

Description Greg Sheremeta 2019-11-15 22:13:15 UTC
Description of problem:

Metal jobs are failing with "provisioning time limit exceeded; the Packet team will investigate".

Example:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.2/183

Comment 2 Scott Dodson 2019-11-20 13:53:48 UTC
I met with Zac and Golden at Packet yesterday to discuss this. They informed me that this particular message indicates that the device was provisioned but the host OS never reached a running state, so this is most likely a failure in the PXE / Ignition process. There is definitely something to look into here.

In all cases where a running OS became available, NetworkManager-wait-online.service is in a failed state because the second interface is not properly configured. This service now blocks other services, which introduces a 300-second delay in the boot process; because hosts reboot multiple times, this delay can cause the job to fail. I'm attempting to configure only the first interface to see whether that produces better results.
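A rough arithmetic sketch of the failure mode described above, assuming every boot stalls for the full 300-second wait-online delay: the added time then scales linearly with the number of boots. The base provisioning time, boot count, and time limit used below are hypothetical placeholders, not values reported by Packet or taken from the CI jobs, and `added_delay` / `exceeds_limit` are illustrative names only.

```python
# Hypothetical back-of-the-envelope model: how a per-boot wait-online
# delay can push provisioning past a fixed time limit.
# All inputs other than the 300 s delay are assumptions for illustration.

WAIT_ONLINE_DELAY_S = 300  # per-boot delay reported in Comment 2


def added_delay(boots: int, delay_s: int = WAIT_ONLINE_DELAY_S) -> int:
    """Total extra seconds spent waiting if every boot hits the delay."""
    if boots < 0:
        raise ValueError("boots must be non-negative")
    return boots * delay_s


def exceeds_limit(base_s: int, boots: int, limit_s: int,
                  delay_s: int = WAIT_ONLINE_DELAY_S) -> bool:
    """True if base provisioning time plus wait-online delays exceeds limit_s."""
    return base_s + added_delay(boots, delay_s) > limit_s


if __name__ == "__main__":
    # Illustrative numbers only: 20-minute base time, hypothetical 30-minute limit.
    base, limit = 20 * 60, 30 * 60
    print(added_delay(3))                  # 900
    print(exceeds_limit(base, 3, limit))   # True  (2100 > 1800)
    print(exceeds_limit(base, 1, limit))   # False (1500 <= 1800)
```

With the same hypothetical base time and limit, a single delayed boot still fits, while three delayed boots do not, which is why repeated reboots matter more than any one 300-second stall.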

Comment 3 Scott Dodson 2019-11-24 15:41:57 UTC
*** Bug 1772212 has been marked as a duplicate of this bug. ***

Comment 5 Clayton Coleman 2019-12-02 18:00:09 UTC
We cannot ship without this in 4.3, marking appropriately.

Comment 6 Scott Dodson 2019-12-13 13:59:02 UTC

*** This bug has been marked as a duplicate of bug 1775388 ***