Bug 1851694

Summary: Fluentd stuck in `Init:CrashLoopBackOff` when log forwarding is enabled and only Fluentd is deployed.
Product: OpenShift Container Platform
Reporter: Qiaoling Tang <qitang>
Component: Logging
Assignee: Periklis Tsirakidis <periklis>
Status: CLOSED DUPLICATE
QA Contact: Anping Li <anli>
Severity: medium
Priority: unspecified
Version: 4.5
CC: aos-bugs, periklis
Target Milestone: ---
Keywords: Regression
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2020-06-30 06:01:47 UTC
Type: Bug

Description Qiaoling Tang 2020-06-28 09:32:25 UTC
Description of problem:
Deploy a log receiver, then create a ClusterLogging instance with log forwarding enabled and only the Fluentd collector deployed; the Fluentd pods never reach the Running state:

$ oc get pod
NAME                                       READY   STATUS                  RESTARTS   AGE
cluster-logging-operator-74cc99dfd-drts4   1/1     Running                 0          5h33m
fluentd-6mlbv                              0/1     Init:CrashLoopBackOff   6          10m
fluentd-7bzpz                              0/1     Init:CrashLoopBackOff   6          10m
fluentd-7c77w                              0/1     Init:CrashLoopBackOff   6          10m
fluentd-8zd7l                              0/1     Init:Error              7          10m
fluentd-czq9n                              0/1     Init:Error              7          10m
fluentd-fhp9v                              0/1     Init:CrashLoopBackOff   6          10m
fluentd-g47c9                              0/1     Init:CrashLoopBackOff   6          10m
fluentd-ht252                              0/1     Init:CrashLoopBackOff   6          10m
fluentd-vv4l5                              0/1     Init:CrashLoopBackOff   6          10m
fluentdserver-578777544c-b5nwq             1/1     Running                 0          11m
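
For more detail on why the pods are stuck, the failing init container can be inspected; a minimal check (pod name taken from the listing above, the init container name has to be read from the describe output):

$ oc describe pod fluentd-6mlbv -n openshift-logging
$ oc logs fluentd-6mlbv -n openshift-logging -c <init-container-name>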

$ oc get clusterlogging -oyaml
  spec:
    collection:
      logs:
        fluentd: {}
        type: fluentd
    managementState: Managed
  status:
    collection:
      logs:
        fluentdStatus:
          clusterCondition:
            fluentd-6mlbv:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-7bzpz:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-7c77w:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-8zd7l:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-czq9n:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-fhp9v:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-g47c9:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-ht252:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
            fluentd-vv4l5:
            - lastTransitionTime: "2020-06-28T08:27:28Z"
              reason: PodInitializing
              status: "True"
              type: ContainerWaiting
          daemonSet: fluentd
          nodes:
            fluentd-6mlbv: ip-10-0-137-205.us-east-2.compute.internal
            fluentd-7bzpz: ip-10-0-153-51.us-east-2.compute.internal
            fluentd-7c77w: ip-10-0-142-205.us-east-2.compute.internal
            fluentd-8zd7l: ip-10-0-201-66.us-east-2.compute.internal
            fluentd-czq9n: ip-10-0-203-83.us-east-2.compute.internal
            fluentd-fhp9v: ip-10-0-183-222.us-east-2.compute.internal
            fluentd-g47c9: ip-10-0-162-142.us-east-2.compute.internal
            fluentd-ht252: ip-10-0-161-84.us-east-2.compute.internal
            fluentd-vv4l5: ip-10-0-192-161.us-east-2.compute.internal
          pods:
            failed: []
            notReady:
            - fluentd-6mlbv
            - fluentd-7bzpz
            - fluentd-7c77w
            - fluentd-8zd7l
            - fluentd-czq9n
            - fluentd-fhp9v
            - fluentd-g47c9
            - fluentd-ht252
            - fluentd-vv4l5
            ready: []
    curation: {}
    logStore: {}
    visualization: {}
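
The daemonset referenced in the status above can also be checked directly:

$ oc get daemonset fluentd -n openshift-logging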

$ oc get logforwarding -oyaml
  spec:
    outputs:
    - endpoint: fluentdserver.openshift-logging.svc:24224
      insecure: true
      name: fluentd-created-by-user
      type: forward
    pipelines:
    - inputSource: logs.app
      name: app-pipeline
      outputRefs:
      - fluentd-created-by-user
    - inputSource: logs.infra
      name: infra-pipeline
      outputRefs:
      - fluentd-created-by-user
    - inputSource: logs.audit
      name: audit-pipeline
      outputRefs:
      - fluentd-created-by-user
  status:
    lastUpdated: "2020-06-28T08:15:36Z"
    reason: ResourceName
    state: Accepted
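
The collector configuration generated for this pipeline can be reviewed as well; a minimal check, assuming the rendered fluent.conf lives in a ConfigMap named fluentd in openshift-logging (name not verified here):

$ oc get configmap fluentd -n openshift-logging -o yaml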


Version-Release number of selected component (if applicable):
$ oc get csv
NAME                                           DISPLAY                  VERSION                 REPLACES   PHASE
clusterlogging.4.5.0-202006271533.p0           Cluster Logging          4.5.0-202006271533.p0              Succeeded
elasticsearch-operator.4.5.0-202006261904.p0   Elasticsearch Operator   4.5.0-202006261904.p0              Succeeded


How reproducible:
Always

Steps to Reproduce:
1. Deploy a log receiver.
2. Create a LogForwarding instance (see the example manifest after these steps).
3. Create a ClusterLogging instance with:
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  annotations:
    clusterlogging.openshift.io/logforwardingtechpreview: enabled
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  collection:
    logs:
      type: "fluentd"
      fluentd: {}
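
For reference, the LogForwarding instance created in step 2 matches the spec shown in the `oc get logforwarding -oyaml` output above; a minimal manifest sketch, assuming the tech-preview logging.openshift.io/v1alpha1 API and the same instance name/namespace as the ClusterLogging resource:

apiVersion: "logging.openshift.io/v1alpha1"
kind: "LogForwarding"
metadata:
  # name and namespace are assumed to match the ClusterLogging instance
  name: "instance"
  namespace: "openshift-logging"
spec:
  # spec copied from the oc get logforwarding -oyaml output above
  outputs:
  - endpoint: fluentdserver.openshift-logging.svc:24224
    insecure: true
    name: fluentd-created-by-user
    type: forward
  pipelines:
  - inputSource: logs.app
    name: app-pipeline
    outputRefs:
    - fluentd-created-by-user
  - inputSource: logs.infra
    name: infra-pipeline
    outputRefs:
    - fluentd-created-by-user
  - inputSource: logs.audit
    name: audit-pipeline
    outputRefs:
    - fluentd-created-by-user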


Actual results:
The Fluentd pods are stuck in Init:CrashLoopBackOff and never become ready.

Expected results:
The Fluentd pods start successfully and reach the Running state.

Additional info:
This issue does not occur in 4.6.

Comment 1 Periklis Tsirakidis 2020-06-30 06:01:47 UTC
@Qiaoling Tang

This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1850076. The fix is awaiting verification in the parent BZ for 4.6: https://bugzilla.redhat.com/show_bug.cgi?id=1849188

*** This bug has been marked as a duplicate of bug 1850076 ***