Bug 1565625 - Unable to use hostpath mount for fluentd [NEEDINFO]
Summary: Unable to use hostpath mount for fluentd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Severity: urgent
Priority: urgent
Target Milestone: ---
Target Release: 3.10.0
Assignee: Hemant Kumar
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-10 12:54 UTC by Jeff Cantrill
Modified: 2018-07-30 19:13 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:12:35 UTC
Target Upstream Version:
hekumar: needinfo? (jcantril)


Attachments (Terms of Use)
fluentd ds file (5.22 KB, text/plain)
2018-04-28 09:21 UTC, Junqi Zhao


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:13:02 UTC

Description Jeff Cantrill 2018-04-10 12:54:03 UTC
Description of problem:

The OpenShift EFK stack is part of the infrastructure; it deploys fluentd as a daemonset to collect node and container logs. It has mounted (since inception in v3.2) various parts of the host to collect application and journal logs [1]. Since the mount propagation feature landed [2], log collection is completely broken, which is blocking bug fixing in the current stack and testing of the ES5 stack release.

[1] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_fluentd/templates/2.x/fluentd.j2#L215-L245
[2] https://github.com/kubernetes/kubernetes/issues/61058
https://github.com/kubernetes/kubernetes/pull/61126

Version-Release number of selected component (if applicable):

openshift v3.10.0-alpha.0+435f98f-619 and kubernetes v1.10.0+b81c8f8

How reproducible:

Always

Steps to Reproduce:
1. Deploy openshift EFK stack

Comment 1 Jan Safranek 2018-04-10 13:02:14 UTC
You can use whole /var/lib/docker instead of /var/lib/docker/containers in your pod:

spec:
    containers:
        - volumeMounts:
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true

...
    volumes:
      - name: varlibdocker
        hostPath:
          path: /var/lib/docker


A proper fix would require changing the API in Kubernetes, which is a long and tedious process that does not work well with urgent bugs.

Comment 2 Hemant Kumar 2018-04-10 15:31:33 UTC
Jeff - we made rslave the default mount propagation in 1.10, and since docker explicitly marks /var/lib/docker/containers as a private mount, it can't be mounted within a pod.

Mounting /var/lib/docker still works because it is within the "/" filesystem and can be mounted as rslave.

If this workaround does not work, we can disable the mount propagation feature in the cluster. Providing a private mount as an optional parameter would require an API change upstream and will take time.
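The private vs. rslave distinction above can be inspected on a node with findmnt, whose PROPAGATION column shows a mount's propagation flags (a diagnostic sketch; on an affected node you would point it at /var/lib/docker/containers):

```shell
# Show the propagation flags for the root filesystem; "/" is normally
# shared, which is why /var/lib/docker (within "/") can still be
# bind-mounted rslave into a pod.
findmnt -o TARGET,PROPAGATION /
```

On a node hitting this bug, `findmnt -o TARGET,PROPAGATION /var/lib/docker/containers` should report `private`, which is what prevents the rslave bind of that subtree.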

Comment 3 Hemant Kumar 2018-04-12 18:43:02 UTC
For now I have opened a PR to disable mount propagation via ansible - https://github.com/openshift/openshift-ansible/pull/7936

Comment 4 Jan Safranek 2018-04-13 13:56:08 UTC
Long-term, I want Kubernetes to revert to "private" propagation by default (i.e. same as was in 1.9 and earlier): https://github.com/kubernetes/kubernetes/pull/62462

Comment 6 Hemant Kumar 2018-04-16 21:46:30 UTC
Revert PR for Openshift as well - https://github.com/openshift/origin/pull/19364

Comment 7 Junqi Zhao 2018-04-28 09:20:53 UTC
(In reply to Jan Safranek from comment #1)
> You can use whole /var/lib/docker instead of /var/lib/docker/containers in
> your pod:
> 
> spec:
>     containers:
>         - volumeMounts:
>             - name: varlibdocker
>               mountPath: /var/lib/docker
>               readOnly: true
> 
> ...
>     volumes:
>       - name: varlibdocker
>         hostPath:
>           path: /var/lib/docker
> 
> 
> Proper fix would require us to change API in Kubernetes, which is long and
> tedious process that does not work well with urgent bugs.

There is one interesting scenario: with the following parameters set, there is no need for the workaround and the fluentd pods start up:
openshift_logging_use_ops=true
openshift_logging_es_cluster_size=2
openshift_logging_es_ops_cluster_size=2


  
# oc get pod
NAME                                          READY     STATUS    RESTARTS   AGE
logging-curator-1-rjbpp                       1/1       Running   0          47m
logging-curator-ops-1-676fp                   1/1       Running   0          47m
logging-es-data-master-i39dne2b-1-ckqkp       2/2       Running   0          46m
logging-es-data-master-tumll5zj-1-k6rzh       2/2       Running   0          46m
logging-es-ops-data-master-4z0dr5nh-1-cx67r   2/2       Running   0          46m
logging-es-ops-data-master-vj9lewcb-1-7zk2l   2/2       Running   0          46m
logging-fluentd-b6dcw                         1/1       Running   0          46m
logging-fluentd-s28sd                         1/1       Running   0          46m
logging-kibana-1-frx6j                        2/2       Running   0          48m
logging-kibana-ops-1-cpkrs                    2/2       Running   0          47m

The workaround is still needed without the following settings:
openshift_logging_es_cluster_size=2
openshift_logging_es_ops_cluster_size=2


For more info, see the attached fluentd ds file.

Comment 8 Junqi Zhao 2018-04-28 09:21:27 UTC
Created attachment 1428003 [details]
fluentd ds file

Comment 10 Junqi Zhao 2018-05-30 02:38:28 UTC
Issue is fixed; fluentd pods can be started up now.
/var/lib/docker is used as the hostPath for fluentd:

         - mountPath: /var/lib/docker
           name: varlibdockercontainers
           readOnly: true

       - hostPath:
           path: /var/lib/docker
           type: ""
         name: varlibdockercontainers
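For context, a minimal sketch of how the fragments above sit in a fluentd DaemonSet spec; the name, image, and labels are placeholders, and only the varlibdockercontainers wiring comes from the attached ds file:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-fluentd
spec:
  selector:
    matchLabels:
      component: fluentd               # placeholder label
  template:
    metadata:
      labels:
        component: fluentd
    spec:
      containers:
        - name: fluentd                # placeholder container name
          image: logging-fluentd:v3.10 # placeholder image
          volumeMounts:
            - mountPath: /var/lib/docker   # whole dir, not .../containers
              name: varlibdockercontainers
              readOnly: true
      volumes:
        - hostPath:
            path: /var/lib/docker
            type: ""
          name: varlibdockercontainers
```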

Images version: v3.10.0-0.54.0.0

Comment 12 errata-xmlrpc 2018-07-30 19:12:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

