Bug 1900538
| Summary: | [OSP] mapi_instance_create_failed doesn't work on openstack | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Milind Yadav <miyadav> |
| Component: | Cloud Compute | Assignee: | ShiftStack Bugwatcher <shiftstack-bugwatcher> |
| Cloud Compute sub component: | OpenStack Provider | QA Contact: | Jon Uriarte <juriarte> |
| Status: | CLOSED DEFERRED | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | eduen, m.andre, mbooth, pprinett, zhsun |
| Version: | 4.6 | Keywords: | Triaged |
| Target Milestone: | --- | ||
| Target Release: | 4.8.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1890456 | Environment: | |
| Last Closed: | 2023-03-09 01:00:15 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1890456 | ||
| Bug Blocks: | |||
Description
Milind Yadav
2020-11-23 10:17:50 UTC
Reproduced with:

```
$ oc get machineset -n openshift-machine-api -o json \
    | jq '.items[0].spec.template.spec.providerSpec.value.flavor="invalid"' \
    | jq '.items[0].spec.replicas=4' \
    | oc apply -f -
$ sleep 5
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -sSk -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" \
    'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' \
    | jq \
    | grep "mapi_instance_"
```

One failed machine on AWS generates a Prometheus value; the same information does not appear in Prometheus on OpenStack.

Raising the severity, as this looks like a fairly sensitive gap in OCP's observability. Marked as "blocker-" because it is not a regression (the issue is present in 4.6+).

In contrast to CAPA[1], CAPO doesn't seem to be instrumented to report create, update or delete failures to Prometheus. The team will have to decide whether to introduce the change before the upstream rebase.

[1]: https://github.com/openshift/cluster-api-provider-aws/blob/2d4e76faac97d3e4a26d2685d8efd78173bae52e/pkg/actuators/machine/reconciler.go#L77

Since this is not a regression, and there don't seem to be any customer cases attached, we are postponing work on this bug until after we complete the rebase work we are planning for CAPO. I am restoring the original priority and severity (low/low).

Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-8820
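
The reproduction above lists every metric name and greps for the `mapi_instance_` prefix, which leaves the result to be read by eye. A narrower check for the one metric named in the summary could look like the sketch below. The helper name `metric_listed` is hypothetical and not part of any tool; it assumes the response has the usual shape of the Prometheus label-values endpoint (a JSON object whose `data` array holds quoted metric names), and it matches on the quoted string rather than parsing the JSON.

```shell
# Hypothetical helper (not part of oc or Prometheus): reads a Prometheus
# label-values response on stdin and exits 0 only if the exact metric name
# given as $1 appears in it as a quoted JSON string.
metric_listed() {
  grep -qF "\"$1\""
}

# Usage against a live cluster, reusing the curl call from the reproduction
# (left commented out because it needs cluster access):
#
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -sSk -H "Authorization: Bearer $(oc sa get-token prometheus-k8s -n openshift-monitoring)" \
#     'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' \
#     | metric_listed mapi_instance_create_failed \
#     && echo "metric present" || echo "metric missing"
```

Matching on the surrounding quotes keeps a shorter prefix such as `mapi_instance_create` from counting as a hit, so the exit status reflects only the exact metric name.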