| Summary: | atomic-openshift-controller shouldn't be started when run upgrade_nodes.yml | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> |
| Component: | Cluster Version Operator | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED WONTFIX | QA Contact: | Anping Li <anli> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.3.1 | CC: | anli, aos-bugs, dgoodwin, jokerman, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-01-31 15:59:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Created attachment 1211256 [details]
upgrade Group1 nodes
Cannot reproduce with latest master. Was this containers or rpm? Could you add your inventory? Exactly what criteria were you using to determine whether the service restarted? Was there anything relevant in the journalctl logs for the service?

I assumed it was probably rpm-based and was referring to the atomic-openshift-master-controllers service; both it and the -api service did not restart during upgrade_nodes: same PID, same start time in systemctl status, and no restart indications in the journalctl logs.

Created attachment 1215720 [details]
master system status before and after
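A minimal sketch of the kind of check described above (assumed commands, not necessarily the exact ones used in triage): an unchanged MainPID and ActiveEnterTimestamp across the node upgrade means systemd did not restart the unit.

```bash
# Unchanged MainPID/ActiveEnterTimestamp before vs. after upgrade_nodes.yml
# indicates the unit was not restarted; service names are taken from this bug.
systemctl show atomic-openshift-master-api atomic-openshift-master-controllers \
    -p MainPID -p ActiveEnterTimestamp
```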
This is an RPM upgrade.

Would you still have the journalctl logs for atomic-openshift-node and atomic-openshift-master-controllers? There's no indication of a restart in the ansible logs, so I am as yet unable to tell why it would restart. It's interesting that the API server does not restart, however. Could you also check whether a systemd-journald rpm upgrade was triggered anywhere in the journal?

Created attachment 1215962 [details]
Journal logs for controllers and nodes
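One possible way to answer the two questions above (assumed commands; the time window is a placeholder for the actual upgrade run):

```bash
# Was systemd (and thus systemd-journald) upgraded recently?
# rpm -qa --last sorts packages by install/upgrade time, newest first.
rpm -qa --last | grep -i systemd | head

# Journal entries for the two services around the upgrade window
# (the --since/--until values are placeholders).
journalctl -u atomic-openshift-master-controllers -u atomic-openshift-node \
    --since "2016-10-31 16:00" --until "2016-10-31 17:00"
```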
All three atomic-openshift-master-controllers services were restarted. I was running v3_3/upgrade_nodes.yml on the latest v3.3.1 environment, so it shouldn't be caused by a package upgrade.
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:00.422137 25405 kubelet.go:2704] SyncLoop (housekeeping)
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:00.593568 25405 generic.go:181] GenericPLEG: Relisting
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: I1031 16:25:00.629276 9262 trace.go:61] Trace "Update /api/v1/nodes/openshift-138.lab.eng.nay.redhat.com/status" (starte
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [37.896µs] [37.896µs] About to convert to expected version
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [182.233µs] [144.337µs] Conversion done
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [190.943µs] [8.71µs] About to store object in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020790176s] [1.020599233s] Object stored in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020803997s] [13.821µs] Self-link added
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.020933432s] [129.435µs] END
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: I1031 16:25:00.630029 9262 trace.go:61] Trace "Update /api/v1/nodes/openshift-153.lab.eng.nay.redhat.com/status" (starte
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [30.678µs] [30.678µs] About to convert to expected version
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [152.421µs] [121.743µs] Conversion done
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [160.591µs] [8.17µs] About to store object in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701706558s] [1.701545967s] Object stored in database
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701713439s] [6.881µs] Self-link added
Oct 31 16:25:00 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-api[9262]: [1.701780068s] [66.629µs] END
Oct 31 16:25:01 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:01.595735 25405 generic.go:181] GenericPLEG: Relisting
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-master-controllers[30094]: F1031 16:25:02.047280 30094 start_master.go:569] Controller graceful shutdown requested
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com systemd[1]: atomic-openshift-master-controllers.service failed.
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:02.422223 25405 kubelet.go:2704] SyncLoop (housekeeping)
Oct 31 16:25:02 openshift-153.lab.eng.nay.redhat.com atomic-openshift-node[25405]: I1031 16:25:02.598055 25405 generic.go:181] GenericPLEG: Relisting
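The fatal "Controller graceful shutdown requested" above makes the controllers process exit with status 255, after which the unit enters the failed state; whether systemd then brings it straight back up depends on the unit's Restart= policy. A hedged way to inspect that policy (the unit name is taken from the logs above; the grep pattern is illustrative):

```bash
# Print the unit file for the controllers service and check its restart policy;
# Restart=always would explain the service coming back up after the fatal exit.
systemctl cat atomic-openshift-master-controllers.service | grep -i 'restart'
```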
Thanks for the additional info. Still trying to figure out what's happening and find a reproducer. Moving this off the blocker list, however, as the impact is very low; it shouldn't be happening, but I do not believe it should block the 3.4 release.

With atomic-openshift-master-3.6.173.0.7-1.git.0.163458a.el7.x86_64, the atomic-openshift-master-controllers service was sometimes restarted on one of the three masters.

Created attachment 1317411 [details]
Journal logs for the above comment

There appear to be no active cases related to this bug. As such, we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if you feel it was closed in error or a new active case is attached.
Description of problem:
atomic-openshift-master-controllers was restarted when running upgrade_nodes.yml.

Version-Release number of selected component (if applicable):
openshift-ansible-3.3.35-1.git.0.1be8ddc.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install OSE 3.2.
2. Upgrade the control plane with v3_3/upgrade_control_plane.yml:
   ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_control_plane.yml
3. Collect system service status.
4. Upgrade the nodes with v3_3/upgrade_nodes.yml:
   ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_nodes.yml
5. Collect system service status again and diff it against step 3 (see the sketch after this list).

Actual results:
atomic-openshift-master-controllers was restarted during the node upgrade.

Expected results:
The master services should not be restarted during a node upgrade.

Additional info:
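A minimal sketch of steps 3 to 5 above, assuming the same inventory file and playbook path shown in the steps; the output file paths are placeholders:

```bash
# Step 3: record master/node service status before the node upgrade.
systemctl status atomic-openshift-master-api atomic-openshift-master-controllers \
    atomic-openshift-node > /tmp/services-before.txt

# Step 4: run the node upgrade playbook.
ansible-playbook -i hosts \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade_nodes.yml

# Step 5: record status again and diff; a changed "Main PID" or
# "Active: active (running) since" line for the controllers service
# indicates the unexpected restart.
systemctl status atomic-openshift-master-api atomic-openshift-master-controllers \
    atomic-openshift-node > /tmp/services-after.txt
diff /tmp/services-before.txt /tmp/services-after.txt
```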