Description of problem:

A customer is seeing fluentd fail to read container logs because the symlinks are missing, possibly resulting in errors such as the following:

Nov 13 14:31:09 server atomic-openshift-node: E1113 14:31:09.000898 28872 logs.go:368] Unexpected fsnotify event: "/var/log/pods/dc8b8bb6-e13f-11e8-aa0f-001a4a409202/mule_0.log": RENAME, retrying...
Nov 13 14:31:13 server atomic-openshift-node: E1113 14:31:13.819755 28872 logs.go:368] Unexpected fsnotify event: "/var/log/pods/dc8b8bb6-e13f-11e8-aa0f-001a4a409202/mule_0.log": RENAME, retrying...
Nov 13 14:31:17 server atomic-openshift-node: E1113 14:31:17.239105 28872 logs.go:368] Unexpected fsnotify event: "/var/log/pods/dc8b8bb6-e13f-11e8-aa0f-001a4a409202/mule_0.log": CHMOD, retrying...

The issue seems closely related to the one described in the following discussion, which causes the symbolic links to disappear from the OpenShift nodes on a regular basis:
https://github.com/kubernetes/kubernetes/issues/52172

Version-Release number of selected component (if applicable):
atomic-openshift-3.10.72-1.git.0.3cb2fdc.el7.x86_64
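To make the failure mode concrete, here is a sketch of the symlink chain involved, recreated under a temporary directory (the paths and names are illustrative, not the real node layout):

```shell
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/var/lib/docker/containers/abc" "$tmp/var/log/pods/poduid" "$tmp/var/log/containers"
# The container runtime writes the real log file:
printf '%s\n' '{"log":"hello"}' > "$tmp/var/lib/docker/containers/abc/abc-json.log"
# The kubelet maintains two layers of symlinks on top of it:
ln -s "$tmp/var/lib/docker/containers/abc/abc-json.log" "$tmp/var/log/pods/poduid/mule_0.log"
ln -s "$tmp/var/log/pods/poduid/mule_0.log" "$tmp/var/log/containers/mule.log"
# A healthy chain resolves all the way down to the real file:
target=$(readlink -f "$tmp/var/log/containers/mule.log")
echo "resolves to: $target"
# If the intermediate symlink disappears (the symptom in this bug),
# the /var/log/containers entry dangles and reads through it fail:
rm "$tmp/var/log/pods/poduid/mule_0.log"
if [ -e "$tmp/var/log/containers/mule.log" ]; then
  state="still readable"
else
  state="dangling"
fi
echo "after removal: $state"
rm -rf "$tmp"
```

On a real node, `readlink -f /var/log/containers/<name>.log` against the actual paths would show whether the chain is intact.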
Fluentd reads from /var/log/containers/*.log. It does not know, and should not care, that /var/log/containers/*.log are symlinks to /var/log/pods/*/*.log, which in turn are symlinks to /var/lib/docker/containers/*/*.log. This has worked fine since day 1; it is not clear why it is a problem now. There are no pertinent errors in the fluentd logs, but of course there are many errors in the atomic-openshift-node logs.

Are you saying that messages such as this are benign and do not indicate that fluentd should have a problem reading the logs?

Dec 11 17:02:09 prlonr34.nz.thenational.com atomic-openshift-node[114311]: E1211 17:02:09.445383 114311 logs.go:387] Unexpected fsnotify event: "/var/log/pods/bb05d500-fcf8-11e8-88cc-001a4a408f4f/sti-build/0.log": RENAME, retrying..

> Sending to Logging as this flow is bypassing the kubelet, i.e. fluentd is
> reading the raw container logs on the host from the container runtime.

Can I infer from this statement that fluentd should be doing something other than reading the raw container logs on the host from the container runtime? If so, what should it be doing?

> It seems that fluentd is not handling the log rotation done by the container
> runtime.

Fluentd does have log-rotation handling code.

> Or something is messed up with the customer's Splunk connector.

We can try to debug what is happening with fluentd, to at least see whether fluentd is reading the records:

1) oc project logging # or openshift-logging if logging is not present

2) oc edit cm logging-fluentd

Comment out this line:

    @include configs.d/openshift/system.conf

Add these lines just below:

    <system>
      @log_level trace
    </system>

Find these lines:

    <label @INGRESS>
      ## filters

Add this just below:

    <filter kubernetes.**>
      @type stdout
    </filter>

3) oc set env ds/logging-fluentd VERBOSE=true DEBUG=true

This will cause every fluentd pod in the cluster to restart and write its log file to /var/log/fluentd.log (or /var/log/fluentd/fluentd.log, depending on which version you are running).
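For reference, after the edits in steps 1 and 2 the relevant part of the logging-fluentd configmap would look roughly like this (a sketch; surrounding configuration is omitted and the `@INGRESS` contents vary by release):

```
## step 2a: default system config commented out
# @include configs.d/openshift/system.conf

## step 2b: raise fluentd's own log level to trace
<system>
  @log_level trace
</system>

<label @INGRESS>
  ## filters
  ## step 2c: dump every kubernetes.** record to fluentd's own log
  <filter kubernetes.**>
    @type stdout
  </filter>
  # (remaining INGRESS configuration unchanged)
</label>
```

With the stdout filter in place, each record fluentd ingests from the container logs is echoed into /var/log/fluentd.log, so a gap in that stream shows where ingestion stops.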
Then we can see if and when fluentd stops reading, and hopefully why.