Bug 1746968
Summary: | rsyslog and fluentd write /etc/hostname at startup, causes systemd-hostnamed to not start | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Mike Fiedler <mifiedle> |
Component: | Logging | Assignee: | Jeff Cantrill <jcantril> |
Status: | CLOSED ERRATA | QA Contact: | Mike Fiedler <mifiedle> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.2.0 | CC: | akamra, anli, aos-bugs, bbreard, dustymabe, ewolinet, grodrigu, imcleod, jcantril, jligon, mfisher, miabbott, nelluri, nhosoi, nstielau, rmeggins, wking |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | 4.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | aos-scalability-42 | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-10-16 06:38:50 UTC | Type: | Bug |
Description
Mike Fiedler
2019-08-29 15:59:04 UTC
(In reply to Mike Fiedler from comment #0)
> What can we capture if this happens again?

Let's start with the full journal after the reboot. Maybe NetworkManager isn't able to set the hostname correctly and the error is logged?

On the failing systems the hostname command as root returns "localhost". On working systems, hostname returns the real DNS name of the system.

Recovery on this latest instance was

    mv /etc/hostname /etc/hostname.bad
    systemctl restart systemd-hostnamed
    hostnamectl status and verify hostname is there
    systemctl restart kubelet

Thought I tried that on the node yesterday and it did not recover, but maybe order was different.

Removing testblocker since we've figured out how to rescue the nodes in this state.

ok I think I've got some new information. It appears the rsyslog daemonset in the openshift-logging namespace has a volume mount of `/etc/hostname` into the container:

```
- hostPath:
    path: /etc/hostname
    type: ""
  name: dockerhostname
```

It appears if `/etc/hostname` doesn't exist on the host something will create it (as a directory in this case) and thus we get an `/etc/hostname` directory created at approximately the same time as the rsyslog container is started:

```
[root@ats-cl-sn2h2-w-c-zrknt etc]# stat /etc/hostname
  File: /etc/hostname
  Size: 6               Blocks: 0          IO Block: 4096   directory
Device: 803h/2051d      Inode: 142620217   Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:etc_t:s0
Access: 2019-09-03 20:30:30.475026448 +0000
Modify: 2019-09-03 20:29:52.774519598 +0000
Change: 2019-09-03 20:29:52.774519598 +0000
 Birth: -
```

From the journal:

```
Sep 03 20:29:52 ats-cl-sn2h2-w-c-zrknt.c.openshift-perfscale.internal hyperkube[1421]: I0903 20:29:52.771555 1421 kubelet_pods.go:151] container: openshift-logging/rsyslog-nw5gd/rsyslog podIP: "10.131.97.187" creating hosts mount: true
```

Out of this investigation come two questions:

- Should the volumemount for the rsyslog ds explicitly be for `type: File` and not
a directory?
- Why is the `/etc/hostname` file not there to begin with? Is that desired behavior?

Jeff, could someone from the Logging team help out with this BZ?

Adding Rich and Noriko

not sure - we should either mount /etc/hostname readonly, or not mount it at all

Compared rsyslog vs. fluentd. In terms of hostname, they look identical. I wonder why this issue is observed just for rsyslog... (or is it?)

    $ oc rsh $FLUENTD_POD
    sh-4.2# ls -l /etc/hostname
    -rw-r--r--. 1 root root 14 Sep 4 17:18 /etc/hostname
    sh-4.2# ls -lZ /etc/hostname
    -rw-r--r--. root root system_u:object_r:container_file_t:s0:c523,c879 /etc/hostname
    sh-4.2# cat /etc/hostname
    fluentd-ch4jb
    sh-4.2# mount | egrep hostname
    tmpfs on /etc/hostname type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
    /dev/xvda3 on /etc/docker-hostname type xfs (ro,relatime,seclabel,attr2,inode64,prjquota)
    sh-4.2# ls -l /etc/docker-hostname/
    total 0

    volumeMounts:
    - mountPath: /etc/docker-hostname
      name: dockerhostname
      readOnly: true
    volumes:
    - hostPath:
        path: /etc/hostname
        type: ""
      name: dockerhostname

    $ oc rsh $RSYSLOG_POD
    sh-4.2# ls -l /etc/hostname
    -rw-r--r--. 1 root root 14 Sep 4 17:33 /etc/hostname
    sh-4.2# ls -lZ /etc/hostname
    -rw-r--r--. root root system_u:object_r:container_file_t:s0:c113,c896 /etc/hostname
    sh-4.2# cat /etc/hostname
    rsyslog-gvnrq
    sh-4.2# mount | egrep hostname
    tmpfs on /etc/hostname type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
    /dev/xvda3 on /etc/docker-hostname type xfs (ro,relatime,seclabel,attr2,inode64,prjquota)
    sh-4.2# ls -l /etc/docker-hostname/
    total 0

    volumeMounts:
    - mountPath: /etc/docker-hostname
      name: dockerhostname
      readOnly: true
    volumes:
    - hostPath:
        path: /etc/hostname
        type: ""
      name: dockerhostname

I'm not sure why we don't see a problem with fluentd - but we have several problems with rsyslog related to file/directory interactions with the host, configmaps, and secrets, that should affect fluentd the same way, but don't. I don't know why rsyslog has these problems that fluentd does not.
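The `stat` output and the empty `/etc/docker-hostname/` listings above all turn on one question: is the path a regular file, a directory, or missing? A minimal shell sketch of that check follows. The function name `hostname_path_kind` is hypothetical and is not part of any tool mentioned in this bug.

```shell
#!/bin/sh
# Hypothetical helper (not from this bug report): print what kind of
# object sits at a path. Defaults to /etc/hostname when no path is given.
hostname_path_kind() {
    path="${1:-/etc/hostname}"
    if [ -d "$path" ]; then
        echo "directory"   # the broken state seen on the failing nodes
    elif [ -f "$path" ]; then
        echo "file"        # a regular file, as on the working nodes
    elif [ -e "$path" ]; then
        echo "other"       # exists, but neither a file nor a directory
    else
        echo "missing"
    fi
}
```

Running `hostname_path_kind /etc/hostname` on a node, or `hostname_path_kind /etc/docker-hostname` inside a collector pod, would report `directory` in the failing state described here.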
For this particular problem, maybe we should just make sure all of our mounts that should not modify files/dirs are mounted readonly.

(In reply to Rich Megginson from comment #12)
> For this particular problem, maybe we should just make sure all of our
> mounts that should not modify files/dirs are mounted readonly.

For dockerhostname, this is not good enough?

    volumeMounts:
    - mountPath: /etc/docker-hostname
      name: dockerhostname
      readOnly: true
      ^^^^^^^^^^^^^^

Note: we mount the following mount paths without readOnly: true

    - mountPath: /run/log/journal
      name: runlogjournal
    - mountPath: /var/log
      name: varlog
    - mountPath: /var/run
      name: varrun
    - mountPath: /var/lib/rsyslog.pod
      name: filebufferstorage
    - mountPath: /etc/rsyslog/metrics
      name: collector-metrics

> For dockerhostname, this is not good enough?

I'm not sure how it is creating the directory on the host if it is mounted readOnly. Maybe we need to add a `type: File` as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1746968#c6

> Note: we mount the following mount paths without readOnly: true

I think we can mount /run/log/journal readonly - we should not be writing there. Same with /etc/rsyslog/metrics. Everything else in that list should be writable.

(In reply to Rich Megginson from comment #14)
> > For dockerhostname, this is not good enough?
>
> I'm not sure how it is creating the directory on the host if it is mounted
> readOnly. Maybe we need to add a `type: File` as suggested in
> https://bugzilla.redhat.com/show_bug.cgi?id=1746968#c6

I think it's the platform (kubernetes) that is creating the directory before the rsyslog container is started. Probably making it an optional mount and also adding `type: File` as I suggested before.

(In reply to Dusty Mabe from comment #15)
> I think it's the platform (kubernetes) that is creating the directory before the
> rsyslog container is started. Probably making it an optional mount and also
> adding `type: File` as I suggested before.
Thanks for your comments, @Dusty. Adding `type: File` is done. Regarding the "optional mount", does that mean we are supposed to introduce an environment variable or something to control whether "/etc/hostname" is mounted or not? (I'd appreciate it if you could correct me if I'm wrong...)

Hi Noriko. Let me describe the behavior that we want:

- the type of mount should be `type: File`. We don't want to mount in a directory and we certainly don't want any file or directory to be created by Kubernetes on the host if it doesn't already exist.
- If the `/etc/hostname` file doesn't exist on the host the container should be able to be started and continue to operate normally.

You said you already took care of that first bullet point. Can you test to make sure the 2nd bullet point is satisfied? If not can we satisfy that condition?

We hit this today with fluentd as the collector, so a possible answer to comment 11 and comment 12. Still have never seen this on Azure or AWS though. Marking this testblocker again.

Not sure what triggers systemd to restart systemd-hostnamed but that is when the /etc/hostname dir issue gets noticed. As soon as fluentd or rsyslogd start the directory gets created, but the node will stay in Ready state until something causes the restart. Not sure of the full chain of events, but the node is in trouble from the time the collector starts, it just doesn't realize it yet.

*** Bug 1748149 has been marked as a duplicate of this bug. ***

Verified on 4.2.0-0.nightly-2019-09-12-114308 - 250 node cluster is stable with fluentd running

This bug affects both fluentd and rsyslogd and is already fixed in 4.2.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:2922

*** Bug 1777098 has been marked as a duplicate of this bug. ***

*** Bug 1817769 has been marked as a duplicate of this bug. ***
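
For nodes already in the broken state, the recovery sequence reported earlier in this bug can be sketched as a small shell script. This is a sketch under assumptions, not a supported procedure: the `run` and `recover_hostname` names and the `RUN` variable are hypothetical, and the script only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical helper that replays the recovery steps from this bug.
recover_hostname() {
    target="${1:-/etc/hostname}"
    # Only act when the path is a directory, the broken state in this bug.
    if [ ! -d "$target" ]; then
        echo "$target is not a directory; nothing to do"
        return 0
    fi
    run mv "$target" "$target.bad"            # move the stray directory aside
    run systemctl restart systemd-hostnamed
    run hostnamectl status                    # verify the hostname is there
    run systemctl restart kubelet
}
```

Calling `recover_hostname` as root on an affected node lists the four steps; `RUN=1 recover_hostname` would execute them in the order given in the report.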