Bug 1900378
| Summary: | Infinite loop on provisioning error when scaling up machineset with error in yaml config |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Bare Metal Hardware Provisioning |
| Bare Metal Hardware Provisioning sub component: | baremetal-operator |
| Status: | CLOSED WORKSFORME |
| Severity: | low |
| Priority: | low |
| Version: | 4.7 |
| Target Release: | 4.7.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Lubov <lshilin> |
| Assignee: | Steven Hardy <shardy> |
| QA Contact: | Lubov <lshilin> |
| CC: | afasano, zbitter |
| Keywords: | Triaged |
| Type: | Bug |
| Last Closed: | 2020-12-14 06:51:15 UTC |
| Bug Blocks: | 1886327 |
Description
Lubov
2020-11-22 16:39:31 UTC
Created attachment 1732105: example of configuration yaml
There is a workaround for the problem: scale down the machineset, delete the BMH, fix the YAML configuration, and recreate the BMH.

Repeatedly retrying the reprovisioning is expected. We don't currently make any distinction between configuration errors (this will never work) and transient errors - mainly because Ironic cannot be relied on to give us granular enough information about the cause. What you should see is an increasing error count, and increasing time between retries.

It shouldn't be necessary to delete the BMH to work around this; simply updating with the correct spec should be enough. If it were not, that would be a bug in the baremetal-operator; however at first glance the code appears correct (and this was the subject of several previous bugs, so it should have been fairly thoroughly verified.) Did you attempt to update the BMH in place?

(In reply to Zane Bitter from comment #3)
> It shouldn't be necessary to delete the BMH to work around this; simply
> updating with the correct spec should be enough. If it were not, that would
> be a bug in the baremetal-operator; however at first glance the code appears
> correct (and this was the subject of several previous bugs, so it should
> have been fairly thoroughly verified.) Did you attempt to update the BMH in
> place?

My bad. You are right: after I fixed the device hint in the existing BMH configuration with
$ oc edit BMHNAME
the next attempt to provision the machine succeeded.

> Repeatedly retrying the reprovisioning is expected. We don't currently make
> any distinction between configuration errors (this will never work) and
> transient errors - mainly because Ironic cannot be relied on to give us
> granular enough information about the cause.

Should this bz be closed as NOTABUG or WONTFIX/CANTFIX?
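
For readers unfamiliar with the retry behavior described in this thread (an increasing error count and an increasing time between retries), the following is a minimal Python sketch of one way such a delay could be derived from the error count. It is an illustration only, not the baremetal-operator's actual code: the function name `retry_delay`, the constants `BASE_DELAY` and `MAX_DELAY`, and the doubling rule are all assumptions made for this example.

```python
from datetime import timedelta

# Hypothetical values chosen for illustration; not the operator's real settings.
BASE_DELAY = timedelta(seconds=10)
MAX_DELAY = timedelta(hours=1)


def retry_delay(error_count: int) -> timedelta:
    """Return how long to wait before the next provisioning attempt.

    The delay doubles with each consecutive error and is capped at MAX_DELAY.
    """
    if error_count < 1:
        # No errors recorded yet, so there is nothing to back off from.
        return timedelta(0)
    # Bound the exponent so the product stays well inside timedelta's range.
    exponent = min(error_count - 1, 20)
    return min(BASE_DELAY * (2 ** exponent), MAX_DELAY)


if __name__ == "__main__":
    for count in range(1, 8):
        print(f"error_count={count} -> wait {retry_delay(count)}")
```

Under these assumed values the wait grows from 10 seconds after the first error to the one-hour cap from the tenth error onward, so a misconfigured host is retried less and less often rather than in a tight loop.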