Bug 1385530 - atomic-openshift-controller shouldn't be started when running upgrade_nodes.yml
Summary: atomic-openshift-controller shouldn't be started when running upgrade_nodes.yml
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Scott Dodson
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-17 09:31 UTC by Anping Li
Modified: 2019-01-31 15:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-31 15:59:23 UTC
Target Upstream Version:


Attachments
upgrade Group1 nodes (98.38 KB, text/plain)
2016-10-17 09:35 UTC, Anping Li
master system status before and after (10.00 KB, application/x-tar)
2016-10-31 09:01 UTC, Anping Li
Journal logs for controllers and nodes (1.97 MB, application/x-gzip)
2016-11-01 02:00 UTC, Anping Li
Journal logs for the above comment (892.01 KB, application/x-gzip)
2017-08-24 05:24 UTC, Anping Li

Description Anping Li 2016-10-17 09:31:58 UTC
Description of problem:
The atomic-openshift-master-controllers service was restarted while running upgrade_nodes.yml.

Version-Release number of selected component (if applicable):
openshift-ansible-3.3.35-1.git.0.1be8ddc.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OSE 3.2.
2. Upgrade the control plane with v3_3/upgrade_control_plane.yml:
   ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_control_plane.yml
3. Collect the system service status.
4. Upgrade the nodes with v3_3/upgrade_nodes.yml:
   ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_nodes.yml
5. Collect the system service status again and diff it against the output from step 3.

Actual results:

The atomic-openshift-master-controllers service was restarted during the node upgrade.

Expected results:

The master services shouldn't be restarted during the node upgrade.

Additional info:

Comment 1 Anping Li 2016-10-17 09:35:49 UTC
Created attachment 1211256 [details]
upgrade Group1 nodes

Comment 2 Devan Goodwin 2016-10-28 15:27:17 UTC
Cannot reproduce with latest master.

Was this a containerized or an RPM install? Could you add your inventory?

Exactly what criteria were you using to determine if the service restarted?

Was there anything relevant in the journalctl logs for the service?

I assumed it was probably RPM-based and that you were referring to the atomic-openshift-master-controllers service. Neither it nor the -api service restarted during upgrade_nodes: same PID, same start time in systemctl status, and no restart indications in the journalctl logs.
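
The PID and start-time comparison described above can be made repeatable by capturing `MainPID` and `ExecMainStartTimestamp` from `systemctl show` before and after the node upgrade. What follows is a minimal Python sketch under that assumption; it is not part of openshift-ansible, and the helper names (`snapshot`, `parse_show`, `restarted`) are hypothetical.

```python
import subprocess

# Properties that change whenever systemd starts a new main process for a unit.
RESTART_KEYS = ("MainPID", "ExecMainStartTimestamp")


def snapshot(unit):
    """Return the raw KEY=VALUE text from `systemctl show` for one unit.

    Only meaningful on a host where the unit exists (for example a master).
    """
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=" + ",".join(RESTART_KEYS)],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_show(text):
    """Parse `systemctl show` KEY=VALUE lines into a dict of strings."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key.strip()] = value.strip()
    return props


def restarted(before_text, after_text, keys=RESTART_KEYS):
    """True if any restart-sensitive property differs between two snapshots."""
    before, after = parse_show(before_text), parse_show(after_text)
    return any(before.get(k) != after.get(k) for k in keys)
```

A possible use is to call `snapshot("atomic-openshift-master-controllers")` on each master before upgrade_nodes.yml, call it again afterwards, and pass both results to `restarted`. Comparing only these two properties avoids false positives from fields that change without a restart, such as memory counters.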

Comment 3 Anping Li 2016-10-31 09:01:30 UTC
Created attachment 1215720 [details]
master system status before and after

Comment 5 Anping Li 2016-10-31 09:09:11 UTC
This is an RPM upgrade.

Comment 6 Devan Goodwin 2016-10-31 12:48:24 UTC
Do you still have the journalctl logs for atomic-openshift-node and atomic-openshift-master-controllers?

There's no indication of a restart in the Ansible logs, and I am as yet unable to tell why it would restart. It is interesting, however, that the API server does not restart.

Could you also check whether a systemd-journald rpm upgrade was triggered anywhere in the journal?

Comment 7 Anping Li 2016-11-01 02:00:30 UTC
Created attachment 1215962 [details]
Journal logs for controllers and nodes

All three atomic-openshift-master-controllers services were restarted. I was running v3_3/upgrade_nodes.yml on the latest v3.3.1 environment, so it shouldn't have been caused by a package upgrade.

Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:00.422137   25405 kubelet.go:2704] SyncLoop (housekeeping)
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:00.593568   25405 generic.go:181] GenericPLEG: Relisting
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: I1031 16:25:00.629276    9262 trace.go:61] Trace "Update /api/v1/nodes/openshift-138.lab.eng.nay.redhat.com/status" (starte
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [37.896µs] [37.896µs] About to convert to expected version
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [182.233µs] [144.337µs] Conversion done
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [190.943µs] [8.71µs] About to store object in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020790176s] [1.020599233s] Object stored in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020803997s] [13.821µs] Self-link added
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020933432s] [129.435µs] END
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: I1031 16:25:00.630029    9262 trace.go:61] Trace "Update /api/v1/nodes/openshift-153.lab.eng.nay.redhat.com/status" (starte
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [30.678µs] [30.678µs] About to convert to expected version
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [152.421µs] [121.743µs] Conversion done
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [160.591µs] [8.17µs] About to store object in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701706558s] [1.701545967s] Object stored in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701713439s] [6.881µs] Self-link added
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701780068s] [66.629µs] END
Oct 31 16:25:01 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:01.595735   25405 generic.go:181] GenericPLEG: Relisting
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-controllers[30094]: F1031 16:25:02.047280   30094 start_master.go:569] Controller graceful shutdown requested
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-master-controllers.service failed.
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:02.422223   25405 kubelet.go:2704] SyncLoop (housekeeping)
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:02.598055   25405 generic.go:181] GenericPLEG: Relisting
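
To pull the controller shutdown lines out of a large journal export like the attached one, something along these lines could be used. It is a rough Python sketch that assumes the short syslog-style journal format shown in the excerpt above; the function name `find_exit_events` and the marker strings are illustrative choices taken from that excerpt, not an exhaustive list of what systemd or the controllers may log.

```python
import re

# Matches "Mon DD HH:MM:SS host ident[pid]: message" as in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"
    r"(?P<host>\S+)\s"
    r"(?P<ident>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<msg>.*)$"
)

# Lower-cased fragments seen in the excerpt when the controllers process went down.
EXIT_MARKERS = (
    "graceful shutdown requested",
    "main process exited",
    "entered failed state",
    ".service failed",
)


def find_exit_events(lines, unit="atomic-openshift-master-controllers"):
    """Return (timestamp, identifier, message) for lines about the unit stopping."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # continuation lines or other formats are ignored
        ident, msg = match.group("ident"), match.group("msg")
        about_unit = ident == unit or unit in msg
        if about_unit and any(marker in msg.lower() for marker in EXIT_MARKERS):
            events.append((match.group("ts"), ident, msg))
    return events
```

Feeding it the lines of a decompressed journal file, for example `find_exit_events(open(path, errors="replace"))` with `path` pointing at the export, gives a short list of timestamps to line up against the Ansible run.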

Comment 8 Devan Goodwin 2016-11-01 12:32:54 UTC
Thanks for the additional info. Still trying to figure out what's happening and find a reproducer.

Moving this off the blocker list, however, as the impact is very low. It shouldn't be happening, but I do not believe it should block the 3.4 release.

Comment 9 Anping Li 2017-08-24 05:20:39 UTC
With atomic-openshift-master-3.6.173.0.7-1.git.0.163458a.el7.x86_64, the atomic-openshift-controller was sometimes restarted on one of the three masters.

Comment 10 Anping Li 2017-08-24 05:24:51 UTC
Created attachment 1317411 [details]
Journal logs for the above comment

Comment 11 Scott Dodson 2019-01-31 15:59:23 UTC
There appear to be no active cases related to this bug. As such we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if you feel it was closed in error or a new active case is attached.

