Bug 1773108

Summary: All 4.3 metal CI jobs are failing: Due to "provisioning time limit exceeded; the Packet team will investigate"
Product: OpenShift Container Platform
Reporter: Greg Sheremeta <gshereme>
Component: Installer
Assignee: Scott Dodson <sdodson>
Installer sub component: openshift-installer
QA Contact: David Sanz <dsanzmor>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: urgent    
Priority: urgent
CC: ccoleman, dgoodwin, mifiedle, sdodson
Version: unspecified
Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1776011
Environment:
Last Closed: 2019-12-13 13:59:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1775388, 1779755, 1782546    
Bug Blocks: 1776011    

Description Greg Sheremeta 2019-11-15 22:13:15 UTC
Description of problem:

metal jobs failing because "provisioning time limit exceeded; the Packet team will investigate"

Example:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.2/183

Comment 2 Scott Dodson 2019-11-20 13:53:48 UTC
I met with Zac and Golden at Packet yesterday to discuss this. They informed me that this particular message indicates that the device was provisioned but the host OS never reached a running state, so this is most likely a failure in the PXE / Ignition processes. There is definitely something to look into here.

In all cases where a running OS became available, NetworkManager-wait-online.service is in a failed state because the second interface is not properly configured. This service now blocks other services, which introduces a 300 second delay in the boot process; with hosts rebooting multiple times, this has the potential to cause job failure. I'm attempting to configure only the first interface to see if that produces better results.
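
As a rough illustration of why the repeated delay matters, the sketch below adds up the per-boot wait across several boots and compares it with a time budget. Only the 300 second figure comes from the comment above; the boot count and the 1800 second budget are hypothetical values chosen for the example and do not reflect the actual Packet or CI limits.

```python
# Sketch only: estimates how much of a time budget is consumed by a fixed
# per-boot delay. The 300 s default comes from the comment above; the boot
# count and the budget used below are hypothetical.

def cumulative_wait_delay(boots: int, delay_per_boot_s: int = 300) -> int:
    """Total seconds spent waiting if every boot incurs the same delay."""
    return boots * delay_per_boot_s


def remaining_budget(budget_s: int, boots: int, delay_per_boot_s: int = 300) -> int:
    """Seconds left in the budget after the accumulated delay (may be negative)."""
    return budget_s - cumulative_wait_delay(boots, delay_per_boot_s)


if __name__ == "__main__":
    boots = 3          # hypothetical: initial boot plus two reboots
    budget_s = 1800    # hypothetical time limit, not a documented value
    print("delay:", cumulative_wait_delay(boots), "s")
    print("remaining:", remaining_budget(budget_s, boots), "s")
```

With these assumed numbers, half of the budget is spent waiting on the failed service before any useful work happens, which is consistent with the concern that multiple reboots could push a job past its limit.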

Comment 3 Scott Dodson 2019-11-24 15:41:57 UTC
*** Bug 1772212 has been marked as a duplicate of this bug. ***

Comment 5 Clayton Coleman 2019-12-02 18:00:09 UTC
We cannot ship without this in 4.3, marking appropriately.

Comment 6 Scott Dodson 2019-12-13 13:59:02 UTC

*** This bug has been marked as a duplicate of bug 1775388 ***