Bug 2030488
Summary: | Numerous Azure CI jobs are Failing with Partially Rendered machinesets | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | rvanderp | |
Component: | Cloud Compute | Assignee: | Joel Speed <jspeed> | |
Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | low | CC: | wking | |
Version: | 4.7 | |||
Target Milestone: | --- | |||
Target Release: | 4.10.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2047845 | Environment: | |
Last Closed: | 2022-03-10 16:32:46 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2047845 |
Description
rvanderp
2021-12-08 22:45:58 UTC
I meant to mention: the Azure account that services the CI jobs was checked to see whether any limits were being approached, and the account appeared to be operating within limits.

After talking to @wking in Slack, it appears this issue is occurring primarily on 4.6/4.7. From wking:

    $ w3m -dump -cols 200 'https://search.ci.openshift.org/?search=Timed+out+waiting+for+node+count&maxAge=48h&type=junit&name=azure' | grep 'failures match' | sort
    periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 97 runs, 77% failed, 1% of failures match = 1% impact
    periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-azure-upgrade (all) - 133 runs, 99% failed, 2% of failures match = 2% impact
    periodic-ci-openshift-release-master-ci-4.6-e2e-azure (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
    periodic-ci-openshift-release-master-ci-4.7-e2e-azure-ovn (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
    periodic-ci-openshift-release-master-nightly-4.6-e2e-azure (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
    periodic-ci-openshift-release-master-nightly-4.7-e2e-azure (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
    pull-ci-openshift-machine-api-provider-azure-main-e2e-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
    release-openshift-origin-installer-launch-azure-modern (all) - 51 runs, 76% failed, 5% of failures match = 4% impact

Still seeing this fail in the last 6 hours. Log: https://search.ci.openshift.org/?search=Timed+out+waiting+for+node+count&maxAge=6h&context=1&type=junit&name=azure&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

I've looked through the test failures over the last week. There is only 1 failure for 4.10, which in my opinion is a different symptom: the current 4.10 failure shows the machine provisioned, but it then failed for some reason during the ignition phase. The other failures are all on older versions, to which we haven't backported the fix. I think we can move this to verified.

Acknowledged @Joel, we can move this to verified. (I looked at 4.7 by mistake.)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
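
For anyone re-checking the per-job numbers from the search summary quoted earlier in this report, here is a minimal Python sketch that parses those summary lines and picks out the jobs with the highest reported impact. The line format is taken from the quoted output. The names `parse_summary` and `estimated_impact` are hypothetical, and the relation "impact is roughly failed% times match%" is only an assumption that happens to agree with the quoted figures; it is not a statement of how search.ci.openshift.org computes the value.

```python
import re

# Summary lines copied from the search output quoted in this report.
SUMMARY = """\
periodic-ci-openshift-release-master-ci-4.10-e2e-azure-ovn-upgrade (all) - 97 runs, 77% failed, 1% of failures match = 1% impact
periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-azure-upgrade (all) - 133 runs, 99% failed, 2% of failures match = 2% impact
periodic-ci-openshift-release-master-ci-4.6-e2e-azure (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.7-e2e-azure-ovn (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.6-e2e-azure (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
periodic-ci-openshift-release-master-nightly-4.7-e2e-azure (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
pull-ci-openshift-machine-api-provider-azure-main-e2e-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
release-openshift-origin-installer-launch-azure-modern (all) - 51 runs, 76% failed, 5% of failures match = 4% impact
"""

# Pattern for one "failures match" summary line, as it appears above.
LINE_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def parse_summary(text):
    """Return one dict per job line; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        row = {"job": m["job"]}
        for key in ("runs", "failed", "match", "impact"):
            row[key] = int(m[key])
        rows.append(row)
    return rows


def estimated_impact(row):
    """Assumed relation (not confirmed by the tool): failed% * match% / 100."""
    return round(row["failed"] * row["match"] / 100)


rows = parse_summary(SUMMARY)

# Jobs where at least 20% of all runs hit the matching failure.
high_impact = [r["job"] for r in rows if r["impact"] >= 20]

for job in high_impact:
    print(job)
```

In this sample the 4.6 and 4.7 periodic jobs are all in the high-impact group, while the 4.10 periodics are not, which is in line with the observation that the issue occurs primarily on 4.6/4.7. Note that the older jobs have only 5 runs each, so the percentages are coarse.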