Description of problem (please be as detailed as possible and provide log snippets):
After deleting the StorageSystem, the following errors appear:

openshift-storage 24s Warning FailedCreate job/cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.internal Error creating: Pod "cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-rlpxk" is invalid: [metadata.generateName: Invalid value: "cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), metadata.name: Invalid value: "cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-rlpxk": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]

I believe the problem is caused by the pod name cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-rlpxk, specifically the "--" after the period.

Version of all relevant components (if applicable):
odf 4.9.132-ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No

Is there any workaround available to the best of your knowledge?
I don't know what the impact is besides the error.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install OCP 4.9 and ODF 4.9
2. Create a StorageSystem
3. Delete the StorageSystem
4. Run: oc get events -n openshift-storage --sort-by='.metadata.creationTimestamp' | grep cleanup

Actual results:
All cluster-cleanup-job pods fail to start with this error:

2m53s Warning FailedCreate job/cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.internal Error creating: Pod "cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-r574v" is invalid: [metadata.generateName: Invalid value: "cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), metadata.name: Invalid value: "cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-r574v": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')]

The problem is probably caused by the pod name, which contains ".--".

Expected results:
The cluster-cleanup-job pods should start.

Additional info:
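The rule in the error message can be checked directly against the names from the event. The Go sketch below applies the regex quoted in the message, anchored with ^ and $ so the whole name must match. This is an approximation: Kubernetes' own validation also enforces length limits, which are omitted here. It shows that the job name passes, while the generated pod name fails because the label after the last "." starts with "-".

```go
package main

import (
	"fmt"
	"regexp"
)

// dns1123Subdomain is the pattern quoted verbatim in the FailedCreate event,
// anchored with ^ and $ so the entire name has to match (length limits omitted).
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isValidSubdomain reports whether name matches the quoted RFC 1123 pattern.
func isValidSubdomain(name string) bool {
	return dns1123Subdomain.MatchString(name)
}

func main() {
	names := []string{
		// Job name from the event: every dot-separated label starts and ends alphanumeric.
		"cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.internal",
		// Generated pod name from the event: the label after "compute." starts with "-".
		"cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-rlpxk",
	}
	for _, n := range names {
		fmt.Printf("%-66s valid=%v\n", n, isValidSubdomain(n))
	}
}
```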
Not a 4.9 blocker, moving it out. Nitin, PTAL once you have some bandwidth.
As far as I know, cleanup jobs are handled by rook, not ocs-operator. I will confirm with Jose and then either work on it or move it accordingly.
This is still not a real blocker, so I'm pushing it out of ODF 4.10.0. That said, I don't want to leave it hanging, so I'll try to poke at it again and hopefully update with some answers.
Took a quick glance, and yes it looks like rook is the one generating these jobs. Moving accordingly.
I tested it with a regex tool, and yes, the error seems to be caused by the `--` after the `.`. Rook generates the name in the format `cluster-cleanup-job-<node-name>`. Can you confirm whether the name of the node is `ip-10-0-188-204.us-east-2.compute.--1-rlpxk`?
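A possible explanation for the `.--1-` tail, under an assumption that is not confirmed anywhere in this thread: suppose the Job controller in this OCP release builds the pod `generateName` as `<job-name>-<index>-` (with index -1 here) and truncates the job-name part so the prefix fits in 58 characters (the 63-character name limit minus a 5-character random suffix). The Go sketch below models that hypothetical rule; `podGeneratePrefix` and the constants are illustrative names, not Kubernetes or Rook APIs. Applied to the 62-character job name from the event, it cuts right after `compute.` and reproduces the rejected prefix exactly, which suggests the node name itself is `ip-10-0-188-204.us-east-2.compute.internal`.

```go
package main

import (
	"fmt"
	"strconv"
)

// Assumed limits: names capped at 63 characters, 5 of which are reserved
// for the random suffix appended to generateName.
const (
	maxNameLength          = 63
	randomSuffixLength     = 5
	maxGeneratedNameLength = maxNameLength - randomSuffixLength // 58
)

// podGeneratePrefix is a hypothetical model (an assumption, not verified against
// this OCP build) of how a pod generateName could be derived from a job name:
// "<job-name>-<index>-", with the job-name part cut so the prefix fits in 58 chars.
func podGeneratePrefix(jobName string, index int) string {
	appendIndex := "-" + strconv.Itoa(index) + "-"
	prefix := jobName + appendIndex
	if len(prefix) > maxGeneratedNameLength {
		prefix = prefix[:maxGeneratedNameLength-len(appendIndex)] + appendIndex
	}
	return prefix
}

func main() {
	// Job name taken from the FailedCreate event.
	job := "cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.internal"
	fmt.Println(len(job))                   // 62
	fmt.Println(podGeneratePrefix(job, -1)) // cluster-cleanup-job-ip-10-0-188-204.us-east-2.compute.--1-
}
```

Under this model the name only becomes invalid when the cut lands right after a `.`, leaving a label that begins with `-`. A job name of 54 characters or fewer would not be truncated at all.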
Subham, please look at the TruncateNodeNameForJob() method called here: https://github.com/rook/rook/blob/4ea8cc6224efb0e9c18ffb8a39a6955f32d79a60/pkg/operator/ceph/cluster/cleanup.go#L75
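This thread does not show what `TruncateNodeNameForJob()` does internally, so the following is only a hypothetical sketch of one way such a helper could avoid the problem: if the formatted job name would be too long, replace the node name with a short lowercase hex hash, so the job name stays short enough that no later truncation can cut it at a `.`. The function name `truncateNodeNameForJobSketch`, the 50-character budget, and the choice of SHA-256 are all assumptions for illustration, not Rook's implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maxJobNameLength is a hypothetical budget, chosen to leave room for a
// "-<index>-" suffix plus a random suffix within the 63-character limit.
const maxJobNameLength = 50

// truncateNodeNameForJobSketch is a hypothetical helper (not Rook's code).
// It fills format with nodeName and, if the result exceeds the budget, uses the
// first 16 hex characters of the node name's SHA-256 hash instead. Lowercase hex
// is always valid inside an RFC 1123 label. Assumes format's fixed part is short.
func truncateNodeNameForJobSketch(format, nodeName string) string {
	name := fmt.Sprintf(format, nodeName)
	if len(name) <= maxJobNameLength {
		return name
	}
	sum := sha256.Sum256([]byte(nodeName))
	return fmt.Sprintf(format, hex.EncodeToString(sum[:])[:16])
}

func main() {
	// Short node name: used as-is.
	fmt.Println(truncateNodeNameForJobSketch("cluster-cleanup-job-%s", "node-a"))
	// Long node name from this bug: replaced by a 16-character hash.
	fmt.Println(truncateNodeNameForJobSketch("cluster-cleanup-job-%s", "ip-10-0-188-204.us-east-2.compute.internal"))
}
```

The trade-off in this sketch is that a hashed job name no longer shows which node it belongs to; a real implementation might keep a readable prefix or record the node name elsewhere, such as in a label.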
It seems like we already have a fix for this, but it is not present in 4.9. We need to backport https://github.com/rook/rook/pull/9312 to 4.9. Travis, please confirm. Thanks
The fix Subham mentioned is included in 4.10 and newer, but not 4.9. Given how old and low priority this BZ is, I'm going to assume it's not critical to backport to 4.9 and we can close it as fixed in 4.10 and newer.