Bug 1777098 - Fluentd creates /etc/hostname directory and breaks systemd-hostnamed.service on the nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1746968
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-26 22:44 UTC by Greg Rodriguez II
Modified: 2023-03-24 16:12 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-09 18:34:19 UTC
Target Upstream Version:
Embargoed:
Flags: grodrigu: needinfo-



Description Greg Rodriguez II 2019-11-26 22:44:37 UTC
Description of problem:

~~~From Customer Description~~~

Fluentd creates /etc/hostname directory and breaks systemd-hostnamed.service on the nodes

Basically, /etc/hostname does not exist on our nodes (they are purely DHCP driven), and when the fluentd pod starts, it creates the /etc/hostname path as a directory, due to the following piece in the daemon set:

        - name: dockerhostname
          hostPath:
            path: /etc/hostname
            type: ''   # empty type: no checks before mounting, and the missing path ends up created as a directory
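
With an empty hostPath type, no checks are performed before mounting, and a missing path ends up created as a directory, exactly as described above. A guarded variant, sketched below for illustration only (the actual fix removed the mount entirely, see comment 5), pins the type so the mount fails instead of the path being created:

        - name: dockerhostname
          hostPath:
            path: /etc/hostname
            type: File   # kubelet rejects the mount if /etc/hostname is not an existing file, instead of creating a directory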


/etc/hostname being a directory causes systemd-hostnamed.service to fail at the subsequent boot with:

Nov 25 13:02:56 localhost systemd-hostnamed[1859]: Failed to read hostname and machine information: Is a directory
Nov 25 13:02:56 localhost systemd[1]: Started Hostname Service.

This causes the node to revert its hostname to 'localhost' - and basically self-destruct.

According to bug 1746968, this is allegedly fixed in 4.2, but it is not clear what the actual fix is.
Has the daemon set template been changed in any way? Is /etc/hostname supposed to exist as a file in every case?

Regardless of the actual cause, we are definitely still experiencing this problem on 4.2, and this is a brand new 4.2 deployment, not an upgrade from 4.1.x (which, if it were the case, could maybe explain the existence of some stale, buggy deployment).

~~~

Version-Release number of selected component (if applicable):
OCP 4.2.7 with Logging Operator 4.2.5

How reproducible:
Customer verified it to be easily and repeatedly reproducible

Additional info:
Logging dump data and must-gather in ticket (too large to attach to bz)

Comment 2 Jeff Cantrill 2019-11-27 16:02:11 UTC

*** This bug has been marked as a duplicate of bug 1746968 ***

Comment 5 Jeff Cantrill 2019-12-03 21:44:11 UTC
(In reply to Greg Rodriguez II from comment #3)
> Please see the detail from the description that is pertinent to c#2:
> 
> ~~~
> 
> According to this 1746968 bug, this is allegedly fixed in 4.2 - but it is
> not clear what the actual fix is?

The fix was to remove the mount point [1]; as shown in [2], it does NOT exist in 4.2.

[1] https://github.com/openshift/cluster-logging-operator/pull/239/files#diff-977fde508b09ae1d524afe3125ac12f1L285
[2] https://github.com/openshift/cluster-logging-operator/blob/release-4.2/pkg/k8shandler/fluentd.go#L280-L289

> Has the daemon set template been changed in any way? Is /etc/hostname
> supposed to exist as a file in every case?

See [1]

> 
> Regardless of the actual cause, we are definitively still experiencing this
> problem on 4.2  - and this is brand new 4.2 deployment, not an upgrade from
> 4.1.x (which - if it was the case -  could maybe explain existence of some
> stale, buggy deployment)

This is only possible if there is a bug in the operator where it is not updating the DS. You can look to see for certain, but the code I pointed you to is evidence that the fix is in the 4.2 branch.

Alternatively, are you certain you have properly upgraded logging? The initial 4.1 release subscriptions pointed to the 'preview' channel, whereas subsequent releases are associated with a '4.x' channel.
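
For reference, a correctly subscribed logging operator would have a Subscription roughly like the sketch below; the channel, names, and catalog source here are assumptions, so verify them against what is actually on the cluster:

  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: cluster-logging
    namespace: openshift-logging        # assumed install namespace
  spec:
    channel: '4.2'                      # not the old 'preview' channel from the 4.1 era
    name: cluster-logging
    source: redhat-operators            # assumed catalog source
    sourceNamespace: openshift-marketplace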

> 
> ~~~
> 
> I have acknowledged that 1746968 exists and is closed per errata, however I
> have a specific question.
> 
> Please reopen ticket as it is not resolved.

Comment 7 Jeff Cantrill 2019-12-09 18:34:19 UTC

*** This bug has been marked as a duplicate of bug 1746968 ***

