Created attachment 1212648 [details]
full journal logs of the openshift-dedicated-role service

Description of problem:
We're seeing these non-fatal errors in journalctl on one host running openshift-dedicated-role:

Oct 20 20:01:36 ip-172-31-57-66.ec2.internal python[34206]: Adding role admin to groups 'dedicated-admins' in project ops-health-build-nodejs-ex-qcs01-master-06cda ...
Oct 20 20:01:36 ip-172-31-57-66.ec2.internal python[34206]: OK
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: AddException in thread Thread-1:
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: Traceback (most recent call last):
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: self.run()
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: File "/usr/lib64/python2.7/threading.py", line 764, in run
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: self.__target(*self.__args, **self.__kwargs)
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: File "/usr/bin/apply-dedicated-roles.py", line 266, in async_sync_role
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: p.add_role()
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: File "/usr/bin/apply-dedicated-roles.py", line 140, in add_role
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: print "Failed to add role %s to groups %s in project %s: %s" % (dedicated_role, groups_str, self.name, error)
Oct 20 20:02:12 ip-172-31-57-66.ec2.internal python[34206]: NameError: global name 'error' is not defined
Oct 20 20:26:04 ip-172-31-57-66.ec2.internal python[34206]: ing role dedicated-project-admin to groups 'dedicated-admins' in project ops-health-build-nodejs-ex-qcs01-master-aeba9 ...
Oct 20 20:26:04 ip-172-31-57-66.ec2.internal python[34206]: OK

It errors periodically, then recovers.
Performance doesn't seem to be impacted, but the service is restarting a few times per day.

Version-Release number of selected component (if applicable):
openshift-scripts-dedicated-3.3.0.6-1.el7.x86_64

How reproducible:
Only a handful of times, on one host. But it seems to be recurring. This is our only host running openshift-scripts-dedicated-3.3.0.6.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
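The traceback points at the error-reporting line in add_role: it formats a name, error, that is not defined at that point, so the failure handler itself raises NameError and kills the worker thread. The source of apply-dedicated-roles.py is not included in this report, so the following is only a minimal sketch of that failure pattern and one common way to avoid it. The function names, the run_command parameter, and the broad except clause are all assumptions for illustration, and the sketch is written for Python 3 rather than the Python 2.7 shown in the log.

```python
MESSAGE = "Failed to add role %s to groups %s in project %s: %s"


def add_role_buggy(run_command, role, groups, project):
    """Hypothetical handler that mirrors the NameError in the traceback."""
    try:
        run_command(role, groups, project)
    except Exception:
        # The exception is never bound to a name, so `error` does not exist
        # here and building the message raises NameError.
        print(MESSAGE % (role, groups, project, error))
        return False
    print("OK")
    return True


def add_role_fixed(run_command, role, groups, project):
    """Same hypothetical handler, with the exception bound as `error`."""
    try:
        run_command(role, groups, project)
    except Exception as error:
        # `error` is now defined, so the failure is reported instead of
        # crashing the calling thread.
        print(MESSAGE % (role, groups, project, error))
        return False
    print("OK")
    return True
```

With a run_command that raises, add_role_buggy propagates NameError to the caller, while add_role_fixed prints the failure message and returns False. Whether the actual fix takes this form is not confirmed by this report.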
Fixed with https://github.com/openshift/online/pull/509
Bumping severity just to help with tracking. It's a toss-up between low and medium.
This is expected to be built and tested in Dedicated INT shortly.
This can be tested on Dedicated clusters that have been upgraded to Online version 3.3.1.3+
Checked this with OCP 3.4; no such error is logged.

openshift v3.4.0.26+f7e109e
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

@dakini would you please help try this in your online env? If it is fixed in the online env, then I will change the status to verified.
This needs to be tested on a Dedicated cluster with an openshift-scripts-dedicated version of 3.3.1.3+. OCP does not have this role - dedicated clusters do.
I just set this up in ded-int-aws, with version 3.3.1.6-1. So far it looks good. I'll let it run for a day or so to see if the error occurs again.
We'll let QE test it on AWS INT and verify it.
@abhgupta, QE doesn't have permission to run the script in the AWS INT environment. As #C7 and #C8 indicate, the fix works in OCP 3.4.0.26 and in ded-int-aws 3.3.1.6-1, so I am changing the status to verified. If the issue happens again later, please re-open it, thanks.