Bug 1288414

Summary: logging-deployment creates logging-es-xxxxx, logging-fluentd and logging-kibana which have wrong image name
Product: OpenShift Container Platform Reporter: Kenjiro Nakayama <knakayam>
Component: Logging Assignee: Luke Meyer <lmeyer>
Status: CLOSED NOTABUG QA Contact: chunchen <chunchen>
Severity: low Docs Contact:
Priority: high    
Version: 3.1.0 CC: aos-bugs, jcantril, wsun
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-18 14:54:50 UTC Type: Bug

Description Kenjiro Nakayama 2015-12-04 06:58:46 UTC
Description of the problem:
- After following the documentation [1] and deploying logging-deployment, all of the resulting pods end up with status "CrashLoopBackOff" or "ImagePullBackOff".

To fix this issue:
- The image name in each DC needs to be updated. For example:

(WRONG)
image: logging-fluentd

(SHOULD BE)
image: openshift3/logging-fluentd

[1] https://docs.openshift.com/enterprise/3.1/install_config/aggregate_logging.html
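
A minimal shell sketch of the workaround described above. `qualify_image` is a hypothetical helper (not part of `oc`) that only illustrates the text change; the `openshift3/` default prefix is taken from this report, and the actual edit would still be made by hand in the DC.

```shell
# Hypothetical helper: prefix a bare image name with a namespace.
# Names that already contain a "/" are returned unchanged.
qualify_image() {
  prefix="${2:-openshift3/}"   # default taken from the example in this report
  case "$1" in
    */*) printf '%s\n' "$1" ;;
    *)   printf '%s%s\n' "$prefix" "$1" ;;
  esac
}

qualify_image logging-fluentd   # prints: openshift3/logging-fluentd
```

The resulting value would then be entered in the `image:` field of the DC, for instance via `oc edit dc/logging-fluentd` on a logged-in client.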

Comment 1 Luke Meyer 2015-12-18 13:07:41 UTC
What did you use for your IMAGE_PREFIX and IMAGE_VERSION in the `oc process logging-deployer-template` command?
https://docs.openshift.com/enterprise/3.1/install_config/aggregate_logging.html#deploying-the-efk-stack
It's *supposed* to default to the right thing, but maybe the default is wrong or you did something non-standard. The image name definitely shouldn't need manual modification.
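
For reference, setting both parameters explicitly might look like the sketch below. It assumes the comma-separated `-v` syntax of the 3.1-era `oc process`, that the template lives in the `openshift` namespace, and that any other parameters the documentation requires are appended to the same list. `deployer_params` and `deploy_logging` are hypothetical wrapper names.

```shell
# Build the parameter list for the deployer template (pure string formatting).
deployer_params() {
  printf 'IMAGE_PREFIX=%s,IMAGE_VERSION=%s\n' "$1" "$2"
}

# Process the template with explicit image settings and create the result.
# Requires a logged-in oc client; defining the function does not contact the cluster.
deploy_logging() {
  oc process logging-deployer-template -n openshift \
    -v "$(deployer_params "$1" "$2")" | oc create -f -
}

# Example call (the version value here is only an illustration):
# deploy_logging registry.access.redhat.com/openshift3/ latest
```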

Comment 2 Kenjiro Nakayama 2015-12-18 14:10:37 UTC
(In reply to Luke Meyer from comment #1)
> What did you use for your IMAGE_PREFIX and IMAGE_VERSION in the `oc process
> logging-deployer-template` command?

I didn't set IMAGE_PREFIX and IMAGE_VERSION in the `oc process logging-deployer-template` command. 

I didn't think it was necessary, since it is set by default:

~~~
  name: IMAGE_PREFIX
  value: "registry.access.redhat.com/openshift3/"
~~~

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_examples/files/examples/v1.0/infrastructure-templates/enterprise/logging-deployer.yaml#L83-L85

Comment 3 Jeff Cantrill 2016-01-06 20:26:17 UTC
Moving this out of 'must fix' as there is a workaround if the default is not being set correctly: the DC can be manually updated.

Comment 4 Luke Meyer 2016-01-08 13:58:45 UTC
I think I understand the issue. I don't think it's a bug, but it points to something else that has gone wrong.

How this works is that when the DC is created, the pod spec has a placeholder image setting. This is supposed to be overwritten immediately by the ImageChange trigger. E.g. if you look at logging-fluentd-template it comes out as:

    template:
      metadata:
        labels:
          component: fluentd
          provider: openshift
        name: fluentd-elasticsearch
      spec:
        containers:
        - image: logging-fluentd     <--- placeholder value
          imagePullPolicy: Always
          name: fluentd-elasticsearch
          [...]
    triggers:
    - type: ConfigChange
    - imageChangeParams:             <--- specifies overwrite
        automatic: true
        containerNames:
        - fluentd-elasticsearch
        from:                      
          kind: ImageStreamTag
          name: logging-fluentd:latest
      type: ImageChange

Once the DC is actually created, the image in the template.spec should be changed to the real value, e.g. registry.access.redhat.com/openshift3/logging-fluentd:latest, which comes from the ImageStream "logging-fluentd". If, however, your IS hasn't been populated for some reason, the trigger won't fire and the DC will just try to use the placeholder.
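
One rough way to spot a DC that is still using the placeholder is to look for image values without any registry or namespace part. The sketch below is a heuristic under that assumption only (an image name with no `/` is treated as unresolved, which would also flag legitimately unqualified images). `find_placeholder_images` is a hypothetical helper, not an `oc` feature.

```shell
# Hypothetical helper: read YAML on stdin and print every "image:" value
# that contains no "/", i.e. values that look like unresolved placeholders.
find_placeholder_images() {
  # Match lines such as "- image: logging-fluentd"; "imagePullPolicy:" does not match.
  sed -n 's/^[[:space:]-]*image:[[:space:]]*\([^[:space:]]*\).*$/\1/p' \
    | grep -v '/' || true   # no match is not an error here
}
```

On a logged-in client this could be fed with `oc get dc -o yaml | find_placeholder_images`; empty output would suggest the ImageChange triggers have fired.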

I'm not sure what would cause ImageStream population to fail if you can then just change the image name to the right value and it starts working. ImageStreams do have their own docker client settings (I know this from trying to use an internal registry with an unknown CA; the IS doesn't use the system docker settings for this). However, this is working fine with registry.access.redhat.com in all my tests.

It would probably be helpful to get the output of `oc describe is/logging-fluentd` (and/or any others that are misbehaving) as well as master logs if you see this occur.
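
Gathering that output for several ImageStreams at once could be sketched as follows. `describe_image_streams` is a hypothetical wrapper around the `oc describe` call mentioned above; the ImageStream names passed to it are whatever exists in the project, and collecting master logs is left out because that depends on how the master is run.

```shell
# Hypothetical wrapper: describe each named ImageStream under a small header.
# Requires a logged-in oc client when actually called.
describe_image_streams() {
  for name in "$@"; do
    printf '== is/%s ==\n' "$name"
    oc describe "is/$name"
  done
}

# Example call:
# describe_image_streams logging-fluentd logging-kibana > is-report.txt
```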

Since the customer case is closed now (sorry for the delay), shall we close this bug and see if we can get further information if someone reproduces this?

Comment 5 Kenjiro Nakayama 2016-01-17 05:14:11 UTC
Thank you for your detailed explanation.
I confirmed that the DC has "image: registry.access.redhat.com/openshift3/logging-xxxx" after I ran:

  $ oc process logging-support-template | oc create -f -

As you explained, I also suspect that ImageStream population from logging-support-template had failed.
Please feel free to close this bugzilla ticket.