Bug 1366936 - The 3.2.1 level kibana pod kept on redeploying itself when logging is deployed on OSE 3.3.0 master
Summary: The 3.2.1 level kibana pod kept on redeploying itself when logging is deploye...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Michail Kargakis
QA Contact: chunchen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-15 03:20 UTC by Xia Zhao
Modified: 2017-03-08 18:26 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The trigger controller that handles deployment triggers did not correctly handle ImageChangeTriggers from different namespaces, which resulted in hot-looping between deployments.
Clone Of:
Environment:
Last Closed: 2016-09-27 09:44:02 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift origin pull 10444 | 0 | None | closed | Bug 1366936: fix ICT matching in the trigger controller | 2020-02-23 08:47:43 UTC
Red Hat Product Errata | RHBA-2016:1933 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.3 Release Advisory | 2016-09-27 13:24:36 UTC

Comment 2 Xia Zhao 2016-08-15 03:27:10 UTC
My typo. The last sentence of comment #1 should be:
The issue happens with A+B+C and B+C; it does not reproduce with any of the other combinations.

Comment 3 ewolinet 2016-08-15 14:05:34 UTC
I know we've seen this behavior once before, is this something that occurs every time we run Aggregated Logging 3.2.1 on an OSE 3.3 master?

We've removed triggers in the 3.3 DCs, so that may be why we don't see this happening then. Can we check if using Aggregated Logging 3.3 with scenarios B+C causes issues? You'll probably need to update the image versions of the IS and pull it in too.

Comment 4 Luke Meyer 2016-08-15 20:13:02 UTC
I can reproduce this with 3.2.1 and 3.2.0 logging deploying on OSE 3.3.0. I suspect that the two ImageChange triggers are both attempting a deploy at the same time and tripping over each other. I'm seeing errors like this in the journal:

controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:404] found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:404] found previous inflight deployment for logging/logging-kibana - requeuing
replication_controller.go:498] Too many "logging"/"logging-kibana-97" replicas, need 0, deleting 1
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
event.go:216] Event(api.ObjectReference{Kind:"ReplicationController", Namespace:"logging", Name:"logging-kibana-97", UID:"9521e1e7-6323-11e6-976c-5254002ddfb5", APIVersion:"v1", ResourceVersion:"271783", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: logging-kibana-97-qkm0a
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:404] found previous inflight deployment for logging/logging-kibana - requeuing

This may not actually be a problem for customers, as I think it only occurs when the images are imported and kick off the deploy. I think the path of install 3.2 -> upgrade OSE -> upgrade logging will probably work fine (though this would be good to test). However, this is a change in behavior, and I think it's worth having the platform team take a look at it and see if they can make it behave better.
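
To gauge how tightly a deployment config is looping, journal lines like the ones quoted above can be tallied per config and per error kind. The following is only a minimal sketch: the `tally` helper and its regular expression are assumptions made for illustration, and the sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Matches the two error shapes seen in the excerpt above (assumed log format).
PATTERN = re.compile(r"Error (syncing|instantiating) deployment config (\S+?):")


def tally(lines):
    """Count errors per (deployment config, error kind) pair."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            kind, config = match.group(1), match.group(2)
            counts[(config, kind)] += 1
    return counts


sample = """\
controller.go:399] Error syncing deployment config logging/logging-kibana-ops: found previous inflight deployment for logging/logging-kibana-ops - requeuing
controller.go:226] Error instantiating deployment config logging/logging-kibana-ops: couldn't retrieve deployment for deployment config "logging/logging-kibana-ops": replicationcontrollers "logging-kibana-ops-91" not found
controller.go:399] Error syncing deployment config logging/logging-kibana: found previous inflight deployment for logging/logging-kibana - requeuing
controller.go:404] found previous inflight deployment for logging/logging-kibana - requeuing
""".splitlines()

counts = tally(sample)
for (config, kind), n in sorted(counts.items()):
    print(f"{config}\t{kind}\t{n}")
```

Lines that carry no "Error ..." prefix, such as the `controller.go:404]` requeue notices, are deliberately ignored by this sketch.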

Comment 5 Clayton Coleman 2016-08-15 21:32:46 UTC
This is pretty serious - it could certainly explain bugs we've seen.  It should resolve all triggers at once.
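
As a rough illustration of what "resolving all triggers at once" could mean, the sketch below applies every pending image change to a config in a single pass and reports one combined change, instead of starting one deployment per trigger. The `resolve_triggers` function, the container names, and the image references are all hypothetical; this is not the controller's actual logic.

```python
def resolve_triggers(containers, triggers, latest_images):
    """Apply every pending image change in one pass.

    containers:    dict of container name -> currently deployed image
    triggers:      list of (container name, image stream tag) pairs
    latest_images: dict of image stream tag -> newest image reference
    Returns (updated containers, changed flag).
    """
    updated = dict(containers)
    for container, tag in triggers:
        newest = latest_images.get(tag)
        if newest is not None and updated.get(container) != newest:
            updated[container] = newest
    return updated, updated != containers


# Illustrative data only: two containers, each with its own image trigger.
containers = {"kibana": "kibana@sha256:old", "kibana-proxy": "proxy@sha256:old"}
triggers = [("kibana", "kibana:tag"), ("kibana-proxy", "proxy:tag")]
latest = {"kibana:tag": "kibana@sha256:new", "proxy:tag": "proxy@sha256:new"}

updated, changed = resolve_triggers(containers, triggers, latest)
print(changed)        # True: both images updated, one rollout needed

_, changed_again = resolve_triggers(updated, triggers, latest)
print(changed_again)  # False: nothing left to do, so no second rollout
```

With both updates folded into one pass, a second evaluation finds nothing to change, which is the property that avoids two triggers starting competing deployments.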

Comment 6 Xia Zhao 2016-08-16 07:00:19 UTC
(In reply to ewolinet from comment #3)
> I know we've seen this behavior once before, is this something that occurs
> every time we run Aggregated Logging 3.2.1 on an OSE 3.3 master?

Yes, it currently reproduces 100% of the time for me.

> We've removed triggers in the 3.3 DCs, so that may be why we don't see this
> happening then. Can we check if using Aggregated Logging 3.3 with scenarios
> B+C causes issues? You'll probably need to update the image versions of the
> IS and pull it in too.

Yes, I added image triggers B+C with image versions set to 3.3.0 in logging 3.3.0, and the issue is reproducible there.

Comment 7 Michal Fojtik 2016-08-17 14:23:47 UTC
Fix here: https://github.com/openshift/origin/pull/10444

Comment 8 openshift-github-bot 2016-08-17 21:10:07 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/90fc4171e146646eb38b7973fc20ae49b84eafb8
Bug 1366936: fix ICT matching in the trigger controller
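
The commit title refers to matching ImageChangeTriggers (ICTs). As a hedged illustration of why namespaces matter for that matching, the sketch below compares a trigger against an image stream by both namespace and name, and treats an empty trigger namespace as the deployment config's own namespace (an assumption made for this example). The `Trigger` class, the `matches` function, and the "openshift" namespace are hypothetical and are not taken from the pull request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    """Hypothetical stand-in for an image change trigger's image stream reference."""
    name: str            # image stream name
    namespace: str = ""  # empty is assumed to mean "same namespace as the config"


def matches(trigger, dc_namespace, stream_namespace, stream_name):
    """Return True only if the trigger points at this exact image stream."""
    effective_ns = trigger.namespace or dc_namespace
    return effective_ns == stream_namespace and trigger.name == stream_name


# A config in "logging" whose trigger references a stream in another namespace.
cross_ns = Trigger(name="logging-kibana", namespace="openshift")
print(matches(cross_ns, "logging", "openshift", "logging-kibana"))  # True
print(matches(cross_ns, "logging", "logging", "logging-kibana"))    # False

# A trigger without a namespace falls back to the config's namespace.
local = Trigger(name="logging-kibana")
print(matches(local, "logging", "logging", "logging-kibana"))       # True
```

Comparing on name alone would treat the two same-named streams above as identical, which is the kind of mismatch that could keep a trigger firing.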

Comment 12 Troy Dawson 2016-08-19 20:55:46 UTC
The commit from comment 8 has been merged into OSE and is in OSE v3.3.0.23 or newer.

Comment 14 Xia Zhao 2016-08-23 04:34:24 UTC
Verified on openshift v3.3.0.23. The 3.2.1-level logging stack is stable and works fine there:

$ oc get po
NAME                          READY     STATUS      RESTARTS   AGE
logging-deployer-0n8h5        0/1       Completed   0          4m
logging-es-7igyqphg-1-2muff   1/1       Running     0          2m
logging-fluentd-1-ixyt2       1/1       Running     0          2m
logging-kibana-1-vrg6o        2/2       Running     0          2m

# openshift version
openshift v3.3.0.23-dirty
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

Comment 16 errata-xmlrpc 2016-09-27 09:44:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
