Bug 1817769 - fluentd volume-mounting /etc/hostname before the file exists leads to directory creation and eventually: kubernetes.io/hostname: localhost
Summary: fluentd volume-mounting /etc/hostname before the file exists leads to directory creation and eventually: kubernetes.io/hostname: localhost
Keywords:
Status: CLOSED DUPLICATE of bug 1746968
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Duplicates: 1813895 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-26 23:35 UTC by W. Trevor King
Modified: 2021-08-09 07:30 UTC (History)
17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-23 16:08:03 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1746968 0 unspecified CLOSED rsyslog and fluentd write /etc/hostname at startup, causes systemd-hostnamed to not start 2023-03-24 15:21:06 UTC
Red Hat Bugzilla 1817774 0 unspecified CLOSED Alert if any node has: kubernetes.io/hostname: localhost 2021-02-22 00:41:40 UTC

Internal Links: 1818086

Description W. Trevor King 2020-03-26 23:35:21 UTC
Possibly related to bug 1809345 and such, we have an AWS cluster which got stuck in a 4.3.1 -> 4.3.8 update.  Symptoms included a degraded ingress controller:

$ yaml2json <cluster-scoped-resources/config.openshift.io/clusteroperators/ingress.yaml | jq '.status.conditions[] | select(.type == "Degraded")' | json2yaml 
lastTransitionTime: '2020-03-26T14:10:50Z'
message: 'Some ingresscontrollers are degraded: axa-de-dev-internal,ago-fr-dev-internal,shared-srv-internal,axa-it-dev-internal,axa-rev-dev-internal,axa-rev-srv-internal,shared-dev-internal,shared-prod-lpl-internal,axa-ebp-fr-srv-internal,axa-fr-dev-internal,axa-ebp-fr-dev-internal,axa-fr-srv-internal,axa-de-srv-internal,axa-it-srv-internal,ago-fr-srv-internal,default,shared-preprod-lpl-internal'
reason: IngressControllersDegraded
status: 'True'
type: Degraded
$ grep phase namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-*/*.yaml
namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-5fmbg/etcd-quorum-guard-665864f9d4-5fmbg.yaml:  phase: Pending
namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-nsrst/etcd-quorum-guard-665864f9d4-nsrst.yaml:  phase: Running
namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-sccs4/etcd-quorum-guard-665864f9d4-sccs4.yaml:  phase: Pending
$ yaml2json <namespaces/openshift-machine-config-operator/pods/etcd-quorum-guard-665864f9d4-5fmbg/etcd-quorum-guard-665864f9d4-5fmbg.yaml | jq .status.conditions | json2yaml 
- lastProbeTime: 'null'
  lastTransitionTime: '2020-03-26T14:08:42Z'
  message: '0/39 nodes are available: 3 node(s) didn''t match pod affinity/anti-affinity, 3 node(s) didn''t satisfy existing
    pods anti-affinity rules, 36 node(s) didn''t match node selector.'
  reason: Unschedulable
  status: 'False'
  type: PodScheduled

Not clear yet why this cluster had 'localhost' for kubernetes.io/hostname; the node names were correct.  For example:

$ yaml2json <cluster-scoped-resources/core/nodes/ip-100-72-3-91.eu-central-1.compute.internal.yaml | jq -r .metadata.name
ip-100-72-3-91.eu-central-1.compute.internal

But the issue affected all the nodes:

$ grep -hr kubernetes.io/hostname cluster-scoped-resources/core/nodes | uniq -c
     39     kubernetes.io/hostname: localhost
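
On a live cluster (rather than a must-gather dump) the same check can be made directly; this is just the equivalent query, not output captured from the affected cluster:

$ oc get nodes -L kubernetes.io/hostname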

We'll hopefully get console logs for one of the nodes in the next 24h, in case that helps with debugging.

Comment 1 W. Trevor King 2020-03-26 23:37:14 UTC
Looping over all nodes and using 'oc edit node/$NAME' to set kubernetes.io/hostname to the .metadata.name value unlocked the scheduling issue, and the update completed successfully.
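
A scripted equivalent of that workaround (a minimal sketch only; the actual fix was done interactively with `oc edit`, and this assumes cluster-admin access):

```
# Re-label every node with its own .metadata.name, overwriting the bad
# kubernetes.io/hostname: localhost value.
for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do
  oc label node "$node" "kubernetes.io/hostname=$node" --overwrite
done
```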

Comment 2 W. Trevor King 2020-03-26 23:48:41 UTC
I've also filed bug 1817774 requesting alerting to make this easier to discover going forward.

Comment 3 Micah Abbott 2020-03-27 15:56:56 UTC
Through conversations on Slack, we found the following in the logs:

```
Mar 26 09:49:18 localhost systemd[1]: systemd-hostnamed.service: Main process exited, code=exited, status=1/FAILURE
Mar 26 09:49:18 localhost systemd[1]: systemd-hostnamed.service: Failed with result 'exit-code'.
Mar 26 09:49:18 localhost systemd[1]: systemd-hostnamed.service: Consumed 260ms CPU time
Mar 27 13:56:47 localhost systemd[1]: Starting Hostname Service...
Mar 27 13:56:48 localhost systemd-hostnamed[2655670]: Failed to read hostname and machine information: Is a directory
Mar 27 13:56:48 localhost systemd[1]: Started Hostname Service.
Mar 27 13:56:48 localhost systemd[1]: systemd-hostnamed.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 13:56:48 localhost systemd[1]: systemd-hostnamed.service: Failed with result 'exit-code'.
Mar 27 13:56:48 localhost systemd[1]: systemd-hostnamed.service: Consumed 471ms CPU time
```

This failure is the same problem we saw in BZ#1746968. Deeper investigation showed that, in this case, a custom `fluentd` DaemonSet was trying to mount `/etc/hostname/` as a directory.

I'm going to leave this open for now (in case there is additional action we can take to prevent this kind of configuration), but my gut says that we are going to end up closing this BZ as a duplicate.

Comment 4 W. Trevor King 2020-03-27 16:30:48 UTC
Clarifying the issue, fluentd had:

        volumeMounts:
        - mountPath: /etc/docker-hostname
          name: dockerhostname
          readOnly: true
      volumes:
      - hostPath:
          path: /etc/hostname
          type: ""
        name: dockerhostname

And CRI-O/runc said "I'm supposed to mount the host's /etc/hostname in, and it doesn't exist, so I'll create a directory."  The behavior on non-existent sources is not covered in the latest runtime-spec release [1].  Looking in runc, it looks like it should fail on missing source directories [2]?  Or maybe I'm reading that wrong.  Not entirely clear on who's creating the missing path as a directory.

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.2/config.md#mounts
[2]: https://github.com/opencontainers/runc/blob/2fc03cc11c775b7a8b2e48d7ee447cb9bef32ad0/libcontainer/rootfs_linux.go#L188
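
For reference (a sketch only, not the fix that was applied): declaring the hostPath with an explicit `type: File` would avoid the silent directory creation, because the kubelet then requires an existing regular file at the path and fails pod creation otherwise, instead of leaving the runtime to create a directory. Assuming a DaemonSet named `fluentd` whose first volume is `dockerhostname`:

```
# Sketch: flip the hostPath's empty type ("") to "File" so a missing
# /etc/hostname surfaces as a pod-creation failure rather than a new directory.
# The DaemonSet name and the volume index are assumptions for illustration.
oc patch daemonset fluentd --type=json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/volumes/0/hostPath/type",
   "value": "File"}
]'
```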

Comment 5 Peter Hunt 2020-03-27 17:18:04 UTC
It seems to be CRI-O.

Digging through the source code, this is coming from this line:
https://github.com/cri-o/cri-o/blob/372d7127613f7a40dafd3ad8f6460d47f0c845df/server/container_create_linux.go#L1035

which is *very* old:
https://github.com/cri-o/cri-o/commit/f3650533f031af3889a51011856f6b1f57bd6513

Comment 6 W. Trevor King 2020-03-27 18:46:58 UTC
So is this WONTFIX for CRI-O?  Or is it worth pushing something into the Kubernetes CRI space for "volume mount this if it exists on the host, but silently drop the mount request if the source does not exist yet"?

Comment 7 W. Trevor King 2020-03-27 19:01:32 UTC
Just checked the VolumeMount spec [1] to be sure, and there is no knob for "but ignore this if the source does not exist" in the spec (yet).

[1]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#volumemount-v1-core

Comment 8 Daneyon Hansen 2020-03-31 01:34:32 UTC
*** Bug 1813895 has been marked as a duplicate of this bug. ***

Comment 9 Mike Fiedler 2020-04-01 13:48:38 UTC
Marking UpgradeBlocker based on the duplicate bug 1813895.

Comment 10 Scott Dodson 2020-04-07 18:18:22 UTC
We're asking the following questions to evaluate whether or not this bug warrants blocking an upgrade edge from either the previous X.Y or X.Y.Z. The ultimate goal is to avoid delivering an update which introduces new risk or reduces cluster functionality in any way. Sample answers are provided to give more context and the UpgradeBlocker flag has been added to this bug. It will be removed if the assessment indicates that this should not block upgrade edges. The expectation is that the assignee answers these questions.

Who is impacted?
  Customers upgrading from 4.y.Z to 4.y+1.z running on GCP with thousands of namespaces, approximately 5% of the subscribed fleet
  All customers upgrading from 4.y.z to 4.y+1.z fail approximately 10% of the time
What is the impact?
  Up to 2 minute disruption in edge routing
  Up to 90 seconds of API downtime
  etcd loses quorum and you have to restore from backup
How involved is remediation?
  Issue resolves itself after five minutes
  Admin uses oc to fix things
  Admin must SSH to hosts, restore from backups, or other non standard admin activities
Is this a regression?
  No, it’s always been like this; we just never noticed
  Yes, from 4.y.z to 4.y+1.z Or 4.y.z to 4.y.z+1

Comment 11 Ryan Phillips 2020-04-07 18:42:37 UTC
fluentd will need to work around the case where the hostname file does not exist. Perhaps an init container could wait until the hostname file is present.
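
One possible shape for that init-container idea (illustrative only, not a shipped change): mount the host's `/etc` directory, which always exists, rather than `/etc/hostname` itself, so the wait cannot trigger the missing-source directory creation, and block until the hostname file shows up:

```
# Illustrative init-container command; assumes the host's /etc is mounted
# read-only at /host-etc via a directory hostPath.
until [ -s /host-etc/hostname ]; do
  echo "waiting for the host's /etc/hostname to appear..."
  sleep 5
done
```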

Comment 12 W. Trevor King 2020-04-08 20:08:23 UTC
Retitled to make this more clearly about the fluentd ask.  Also, this is not just update-specific, right?  You could trigger the same issue simply by touching a MachineConfig which triggers a rolling node reboot.

Comment 13 Jeff Cantrill 2020-04-09 13:05:41 UTC
@Trevor can you provide details of the version of cluster logging from which this error originated?  This mount point was removed in 4.2 [1] because of [2].


[1] https://github.com/openshift/cluster-logging-operator/pull/239/files#diff-977fde508b09ae1d524afe3125ac12f1L248
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1746968

Comment 15 Ben Parees 2020-04-23 16:08:03 UTC
My assessment is that the risk here is vanishingly small and that this is something already out there in the wild: it's not new to 4.4, it could already be affecting 4.3.z to 4.3.z+1 upgrades, and presumably we're not seeing that.

Closing as a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1746968, where we think this was fixed before 4.2.0 ever shipped.

Impacted users would be those who installed logging 4.1 and haven't upgraded logging since.

And those users would already be potentially impacted on every node restart (upgrade or otherwise), not new/specific to 4.4 upgrades.

*** This bug has been marked as a duplicate of bug 1746968 ***

Comment 17 Patrick Azogni 2021-08-09 07:30:01 UTC
This bugzilla can be closed

