Possibly related to bug 1809345 and such, we have an AWS cluster which got stuck partway through a 4.3.1 -> 4.3.8 update. Symptoms included a degraded ingress controller:

$ yaml2json <cluster-scoped-resources/config.openshift.io/clusteroperators/ingress.yaml | jq '.status.conditions[] | select(.type == "Degraded")' | json2yaml
lastTransitionTime: '2020-03-26T14:10:50Z'
message: 'Some ingresscontrollers are degraded: axa-de-dev-internal,ago-fr-dev-internal,shared-srv-internal,axa-it-dev-internal,axa-rev-dev-internal,axa-rev-srv-internal,shared-dev-internal,shared-prod-lpl-internal,axa-ebp-fr-srv-internal,axa-fr-dev-internal,axa-ebp-fr-dev-internal,axa-fr-srv-internal,axa-de-srv-internal,axa-it-srv-internal,ago-fr-srv-internal,default,shared-preprod-lpl-internal'
reason: IngressControllersDegraded
status: 'True'
type: Degraded

Two of the three etcd-quorum-guard pods were also stuck Pending:

$ grep phase namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-*/*.yaml
namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-5fmbg/etcd-quorum-guard-665864f9d4-5fmbg.yaml:    phase: Pending
namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-nsrst/etcd-quorum-guard-665864f9d4-nsrst.yaml:    phase: Running
namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-sccs4/etcd-quorum-guard-665864f9d4-sccs4.yaml:    phase: Pending

$ yaml2json <namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-5fmbg/etcd-quorum-guard-665864f9d4-5fmbg.yaml | jq .status.conditions | json2yaml
- lastProbeTime: 'null'
  lastTransitionTime: '2020-03-26T14:08:42Z'
  message: '0/39 nodes are available: 3 node(s) didn''t match pod affinity/anti-affinity, 3 node(s) didn''t satisfy existing pods anti-affinity rules, 36 node(s) didn''t match node selector.'
  reason: Unschedulable
  status: 'False'
  type: PodScheduled

Not clear yet why this cluster had 'localhost' for kubernetes.io/hostname; the node names themselves were correct. For example:

$ yaml2json <cluster-scoped-resources/core/nodes/ip-100-72-3-91.eu-central-1.compute.internal.yaml | jq -r .metadata.name
ip-100-72-3-91.eu-central-1.compute.internal

But the issue affected all the nodes:

$ grep -hr kubernetes.io/hostname cluster-scoped-resources/core/nodes | uniq -c
     39     kubernetes.io/hostname: localhost

We'll hopefully get console logs for one of the nodes in the next 24h, in case that helps with debugging.
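For anyone looking at a live cluster rather than a must-gather, a quick check along these lines should surface the same mismatch (a sketch, assuming `oc` and `jq` are available and that on this cluster the label is expected to match the node name):

```
# Show the kubernetes.io/hostname label next to each node name...
oc get nodes -L kubernetes.io/hostname

# ...or list only the nodes where the label and the node name disagree.
oc get nodes -o json | jq -r '
  .items[]
  | select(.metadata.labels["kubernetes.io/hostname"] != .metadata.name)
  | "\(.metadata.name)\t\(.metadata.labels["kubernetes.io/hostname"])"'
```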
Looping over all nodes and using 'oc edit node/$NAME' to set kubernetes.io/hostname to the .metadata.name value unlocked the scheduling issue, and the update completed successfully.
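For reference, a non-interactive equivalent of that loop might look something like this (a sketch, not the exact commands used; it assumes the label should simply mirror `.metadata.name`, which was the case here):

```
# Relabel every node so kubernetes.io/hostname matches the node name again.
for NAME in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do
  oc label node "$NAME" "kubernetes.io/hostname=${NAME}" --overwrite
done
```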
I've also filed bug 1817774 requesting alerting to make this easier to discover going forward.
Through conversations on Slack, we found the following in the logs:

```
Mar 26 09:49:18 localhost systemd[1]: systemd-hostnamed.service: Main process exited, code=exited, status=1/FAILURE
Mar 26 09:49:18 localhost systemd[1]: systemd-hostnamed.service: Failed with result 'exit-code'.
Mar 26 09:49:18 localhost systemd[1]: systemd-hostnamed.service: Consumed 260ms CPU time
Mar 27 13:56:47 localhost systemd[1]: Starting Hostname Service...
Mar 27 13:56:48 localhost systemd-hostnamed[2655670]: Failed to read hostname and machine information: Is a directory
Mar 27 13:56:48 localhost systemd[1]: Started Hostname Service.
Mar 27 13:56:48 localhost systemd[1]: systemd-hostnamed.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 13:56:48 localhost systemd[1]: systemd-hostnamed.service: Failed with result 'exit-code'.
Mar 27 13:56:48 localhost systemd[1]: systemd-hostnamed.service: Consumed 471ms CPU time
```

This failure is the same problem we saw in BZ#1746968, and deeper investigation showed that in this case there was a custom `fluentd` DaemonSet that was trying to mount `/etc/hostname/` as a directory. I'm going to leave this open for now (in case there is additional action we can take to prevent this kind of configuration), but my gut says that we are going to end up closing this BZ as a duplicate.
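For anyone seeing the same "Is a directory" failure, one way to confirm this state on a node is to check the type of /etc/hostname from a debug pod (a sketch; the node name is just the example from above):

```
# Expect "regular file"; "directory" means a hostPath mount has created
# /etc/hostname as a directory on the host.
oc debug node/ip-100-72-3-91.eu-central-1.compute.internal -- chroot /host stat -c '%F' /etc/hostname
```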
Clarifying the issue, fluentd had:

  volumeMounts:
  - mountPath: /etc/docker-hostname
    name: dockerhostname
    readOnly: true
  volumes:
  - hostPath:
      path: /etc/hostname
      type: ""
    name: dockerhostname

And CRI-O/runc said "I'm supposed to mount the host's /etc/hostname in, and it doesn't exist, so I'll create a directory." The behavior on non-existent sources is not covered in the latest runtime-spec release [1]. Looking in runc, it looks like it should fail on missing source directories [2]? Or maybe I'm reading that wrong. Not entirely clear on who's creating the missing path as a directory.

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#mounts
[2]: https://github.com/opencontainers/runc/blob/2fc03cc11c775b7a8b2e48d7ee447cb9bef32ad0/libcontainer/rootfs_linux.go#L188
Digging through the source code, it seems to be CRI-O; this is coming from this line:

https://github.com/cri-o/cri-o/blob/372d7127613f7a40dafd3ad8f6460d47f0c845df/server/container_create_linux.go#L1035

which is *very* old:

https://github.com/cri-o/cri-o/commit/f3650533f031af3889a51011856f6b1f57bd6513
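For what it's worth, auto-creating a missing bind-mount source is a long-standing convenience behavior in container engines generally; with Docker the same thing is easy to see locally (a hypothetical reproduction, not from this cluster; the /tmp path is made up):

```
# /tmp/missing-hostname does not exist; Docker creates it as a directory on
# the host so the bind mount can proceed, and the container then sees an
# empty directory where a file was expected.
docker run --rm -v /tmp/missing-hostname:/etc/docker-hostname:ro busybox ls -ld /etc/docker-hostname
```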
So is this WONTFIX for CRI-O? Or is it worth pushing something into the Kubernetes CRI space for "volume mount this if it exists on the host, but silently drop the mount request if the source does not exist yet"?
Just checked the VolumeMount spec [1] to be sure, and there is no knob for "but ignore this if the source does not exist" in the spec (yet).

[1]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core
*** Bug 1813895 has been marked as a duplicate of this bug. ***
Marking UpgradeBlocker based on the duplicate bug 1813895.
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted?
* Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
* All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time

What is the impact?
* Up to 2 minute disruption in edge routing
* Up to 90 seconds of API downtime
* etcd loses quorum and you have to restore from backup

How involved is remediation?
* Issue resolves itself after five minutes
* Admin uses oc to fix things
* Admin must SSH to hosts, restore from backups, or other non-standard admin activities

Is this a regression?
* No, it's always been like this, we just never noticed
* Yes, from 4.y.z to 4.y+1.z
* Or 4.y.z to 4.y.z+1
fluentd will need to work around the case where the hostname file does not exist, perhaps with an init container that waits until the hostname file is present.
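A minimal sketch of what such an init container could run, assuming it bind-mounts the host's /etc (not /etc/hostname itself, which would just recreate the problem) at a hypothetical /host-etc path:

```
# Block fluentd's main container until the host's /etc/hostname exists as a
# regular file.
until [ -f /host-etc/hostname ]; do
  echo "waiting for the host's /etc/hostname to appear"
  sleep 5
done
```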
Retitled to make this more clearly about the fluentd ask. Also, this is not just update-specific, right? You could trigger the same issue simply by touching a MachineConfig which triggers a rolling node reboot.
@Trevor can you provide details of the version of cluster logging from which this error originated? This mount point was removed in 4.2 [1] because of [2].

[1] https://github.com/openshift/cluster-logging-operator/pull/239/files#diff-977fde508b09ae1d524afe3125ac12f1L248
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1746968
My assessment is that the impact here is vanishingly small and is something that's already out there in the wild: it's not going to be new to 4.4, it could already be affecting 4.3.z to 4.3.z+1 upgrades, and presumably we're not seeing that.

Closing as a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1746968, where we think this was fixed before 4.2.0 ever shipped. Impacted users would be someone who installed logging 4.1 and hasn't upgraded logging since, and those users would already be potentially impacted on every node restart (upgrade or otherwise); it's not new or specific to 4.4 upgrades.

*** This bug has been marked as a duplicate of bug 1746968 ***
This bugzilla can be closed