Bug 1366936

Summary: The 3.2.1-level Kibana pod keeps redeploying itself when logging is deployed on an OSE 3.3.0 master

Product: OpenShift Container Platform
Component: Node
Version: 3.3.0
Severity: high
Priority: high
Status: CLOSED ERRATA
Reporter: Xia Zhao <xiazhao>
Assignee: Michail Kargakis <mkargaki>
QA Contact: chunchen <chunchen>
CC: aos-bugs, ccoleman, eparis, ewolinet, jokerman, lmeyer, mfojtik, mmccomas, tdawson, wsun, xtian
Last Closed: 2016-09-27 09:44:02 UTC
Type: Bug
Doc Type: Bug Fix
Doc Text:
The trigger controller we use for handling deployment triggers was not correctly matching ImageChangeTriggers from different namespaces, resulting in hot-looping between deployments.

Comment 2 Xia Zhao 2016-08-15 03:27:10 UTC
My typo; the last sentence of comment #1 should be:
The issue happens with A+B+C and B+C; it does not reproduce with any of the other combinations.

Comment 3 ewolinet 2016-08-15 14:05:34 UTC
I know we've seen this behavior once before. Is this something that occurs every time we run Aggregated Logging 3.2.1 on an OSE 3.3 master?

We've removed triggers in the 3.3 DCs, so that may be why we don't see this happening there. Can we check whether using Aggregated Logging 3.3 with scenario B+C causes issues? You'll probably need to update the image versions in the image streams and pull them in too.
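For context, the 3.2.x logging-kibana DC carries two ImageChange triggers, one per container, roughly like the following (an illustrative sketch of the trigger section only; exact container and tag names may differ per deployment):

```yaml
triggers:
- type: ImageChange
  imageChangeParams:
    automatic: true
    containerNames: ["kibana"]
    from:
      kind: ImageStreamTag
      name: logging-kibana:3.2.1
- type: ImageChange
  imageChangeParams:
    automatic: true
    containerNames: ["kibana-proxy"]
    from:
      kind: ImageStreamTag
      name: logging-auth-proxy:3.2.1
```

Two triggers resolving near-simultaneously on image import is the scenario suspected of racing below.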

Comment 4 Luke Meyer 2016-08-15 20:13:02 UTC
I can reproduce this with 3.2.1 and 3.2.0 logging deploying on OSE 3.3.0. I suspect that the two ImageChange triggers are both attempting a deploy at the same time and tripping over each other. I'm seeing errors like this in the journal:

controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:404] found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:404] found previous inflight deployment for logging/logging-kibana - requeuing
replication_controller.go:498] Too many "logging"/"logging-kibana-97" replicas, need 0, deleting 1
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
event.go:216] Event(api.ObjectReference{Kind:"ReplicationController", Namespace:"logging", Name:"logging-kibana-97", UID:"9521e1e7-6323-11e6-976c-5254002ddfb5", APIVersion:"v1", ResourceVersion:"271783", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: logging-kibana-97-qkm0a
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:404] found previous inflight deployment for logging/logging-kibana - requeuing

This may not actually be a problem for customers, as I think it only occurs when the images are imported and kick off the deploy. I think the path of install 3.2 -> upgrade OSE -> upgrade logging will probably work fine (though this would be good to test). However, this is a change in behavior, and I think it's worth having the platform team take a look and see if they can make it behave better.

Comment 5 Clayton Coleman 2016-08-15 21:32:46 UTC
This is pretty serious - it could certainly explain bugs we've seen.  It should resolve all triggers at once.

Comment 6 Xia Zhao 2016-08-16 07:00:19 UTC
(In reply to ewolinet from comment #3)
> I know we've seen this behavior once before, is this something that occurs
> every time we run Aggregated Logging 3.2.1 on an OSE 3.3 master?

Yes, currently it reproduces 100% of the time for me.

> We've removed triggers in the 3.3 DCs, so that may be why we don't see this
> happening then. Can we check if using Aggregated Logging 3.3 with scenarios
> B+C causes issues? You'll probably need to update the image versions of the
> IS and pull it in too.

Yes, I placed image triggers B+C with image versions = 3.3.0 in logging 3.3.0, and the issue is reproducible there.

Comment 7 Michal Fojtik 2016-08-17 14:23:47 UTC
Fix here: https://github.com/openshift/origin/pull/10444

Comment 8 openshift-github-bot 2016-08-17 21:10:07 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/90fc4171e146646eb38b7973fc20ae49b84eafb8
Bug 1366936: fix ICT matching in the trigger controller

Comment 12 Troy Dawson 2016-08-19 20:55:46 UTC
The commit from comment 8 has been merged into OSE and is in OSE v3.3.0.23 or newer.

Comment 14 Xia Zhao 2016-08-23 04:34:24 UTC
Verified on openshift v3.3.0.23; the 3.2.1-level logging stacks are stable and working fine there:

$ oc get po
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-0n8h5        0/1       Completed   0          4m
logging-es-7igyqphg-1-2muff   1/1       Running     0          2m
logging-fluentd-1-ixyt2       1/1       Running     0          2m
logging-kibana-1-vrg6o        2/2       Running     0          2m

# openshift version
openshift v3.3.0.23-dirty
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

Comment 16 errata-xmlrpc 2016-09-27 09:44:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933