| Summary: | atomic-openshift-controller shouldn't be started when run upgrade_nodes.yml | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> |
| Component: | Cluster Version Operator | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED WONTFIX | QA Contact: | Anping Li <anli> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.3.1 | CC: | anli, aos-bugs, dgoodwin, jokerman, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-01-31 15:59:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Created attachment 1211256 [details]
upgrade Group1 nodes
Cannot reproduce with latest master. Was this containers or rpm? Could you add your inventory? Exactly what criteria were you using to determine whether the service restarted? Was there anything relevant in the journalctl logs for the service?

I assumed it was probably rpm-based and was referring to the atomic-openshift-master-controllers service; both it and the -api service did not restart during upgrade_nodes: same PID, same start time in systemctl status, and no restart indications in the journalctl logs.

Created attachment 1215720 [details]
master system status before and after
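A minimal sketch of the kind of check described above (assumed commands, not necessarily the exact ones used in triage): an unchanged MainPID and ActiveEnterTimestamp across the node upgrade means systemd did not restart the unit.

```bash
# Unchanged MainPID/ActiveEnterTimestamp before vs. after upgrade_nodes.yml
# indicates the unit was not restarted; service names are taken from this bug.
systemctl show atomic-openshift-master-api atomic-openshift-master-controllers \
    -p MainPID -p ActiveEnterTimestamp
```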
This is an RPM upgrade.

Would you still have the journalctl logs for atomic-openshift-node and atomic-openshift-master-controllers? There's no indication of a restart in the ansible logs, so I am as yet unable to tell why it would restart. It's interesting that the API server does not restart, however. Could you also check whether a systemd-journald rpm upgrade was triggered anywhere in the journal?

Created attachment 1215962 [details]
Journal logs for controllers and nodes
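One possible way to answer the two questions above (assumed commands; the time window is a placeholder for the actual upgrade run):

```bash
# Was systemd (and thus systemd-journald) upgraded recently?
# rpm -qa --last sorts packages by install/upgrade time, newest first.
rpm -qa --last | grep -i systemd | head

# Journal entries for the two services around the upgrade window
# (the --since/--until values are placeholders).
journalctl -u atomic-openshift-master-controllers -u atomic-openshift-node \
    --since "2016-10-31 16:00" --until "2016-10-31 17:00"
```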
All three atomic-openshift-master-controllers services were restarted. I was running v3_3/upgrade_nodes.yml on the latest v3.3.1 environment, so it shouldn't be caused by a package upgrade.
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:00.422137 25405 kubelet.go:2704] SyncLoop (housekeeping)
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:00.593568 25405 generic.go:181] GenericPLEG: Relisting
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: I1031 16:25:00.629276 9262 trace.go:61] Trace "Update /api/v1/nodes/openshift-138.lab.eng.nay.redhat.com/status" (starte
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [37.896µs] [37.896µs] About to convert to expected version
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [182.233µs] [144.337µs] Conversion done
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [190.943µs] [8.71µs] About to store object in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020790176s] [1.020599233s] Object stored in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020803997s] [13.821µs] Self-link added
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020933432s] [129.435µs] END
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: I1031 16:25:00.630029 9262 trace.go:61] Trace "Update /api/v1/nodes/openshift-153.lab.eng.nay.redhat.com/status" (starte
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [30.678µs] [30.678µs] About to convert to expected version
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [152.421µs] [121.743µs] Conversion done
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [160.591µs] [8.17µs] About to store object in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701706558s] [1.701545967s] Object stored in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701713439s] [6.881µs] Self-link added
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701780068s] [66.629µs] END
Oct 31 16:25:01 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:01.595735 25405 generic.go:181] GenericPLEG: Relisting
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-controllers[30094]: F1031 16:25:02.047280 30094 start_master.go:569] Controller graceful shutdown requested
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-master-controllers.service failed.
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:02.422223 25405 kubelet.go:2704] SyncLoop (housekeeping)
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:02.598055 25405 generic.go:181] GenericPLEG: Relisting
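The fatal "Controller graceful shutdown requested" above makes the controllers process exit with status 255, after which the unit enters the failed state; whether systemd then brings it straight back up depends on the unit's Restart= policy. A hedged way to inspect that policy (the unit name is taken from the logs above; the grep pattern is illustrative):

```bash
# Print the unit file for the controllers service and check its restart policy;
# Restart=always would explain the service coming back up after the fatal exit.
systemctl cat atomic-openshift-master-controllers.service | grep -i 'restart'
```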
Thanks for the additional info. Still trying to figure out what's happening and find a reproducer. Moving this off the blocker list, however, as the impact is very low; it shouldn't be happening, but I do not believe it should block the 3.4 release.

With atomic-openshift-master-3.6.173.0.7-1.git.0.163458a.el7.x86_64, the atomic-openshift-master-controllers service was sometimes restarted on one of the three masters.

Created attachment 1317411 [details]
Journal logs for the above comment

There appear to be no active cases related to this bug. As such, we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if you feel it was closed in error or a new active case is attached.
Description of problem:
atomic-openshift-master-controllers was restarted when running upgrade_nodes.yml.

Version-Release number of selected component (if applicable):
openshift-ansible-3.3.35-1.git.0.1be8ddc.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install OSE 3.2.
2. Upgrade the control plane with v3_3/upgrade_control_plane.yml:
   ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_control_plane.yml
3. Collect system service status.
4. Upgrade the nodes with v3_3/upgrade_nodes.yml:
   ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_nodes.yml
5. Collect system service status again and diff it against step 3 (see the sketch after this list).

Actual results:
atomic-openshift-master-controllers was restarted during the node upgrade.

Expected results:
The master services should not be restarted during a node upgrade.

Additional info:
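A minimal sketch of steps 3 to 5 above, assuming the same inventory file and playbook path shown in the steps; the output file paths are placeholders:

```bash
# Step 3: record master/node service status before the node upgrade.
systemctl status atomic-openshift-master-api atomic-openshift-master-controllers \
    atomic-openshift-node > /tmp/services-before.txt

# Step 4: run the node upgrade playbook.
ansible-playbook -i hosts \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_nodes.yml

# Step 5: record status again and diff; a changed "Main PID" or
# "Active: active (running) since" line for the controllers service
# indicates the unexpected restart.
systemctl status atomic-openshift-master-api atomic-openshift-master-controllers \
    atomic-openshift-node > /tmp/services-after.txt
diff /tmp/services-before.txt /tmp/services-after.txt
```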