1419255 – Fail to redeploy certificates due to restart node's delay

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Andrew Butcher
QA Contact:	Gaoyun Pei

Reported:	2017-02-04 10:30 UTC by liujia
Modified:	2017-07-24 14:11 UTC
CC List:	5 users
Doc Type:	No Doc Update
Last Closed:	2017-04-11 21:24:30 UTC

Attachments
redeploy log (1.38 MB, text/plain) 2017-02-04 10:32 UTC, liujia
node service (236.39 KB, text/x-vhdl) 2017-02-04 10:33 UTC, liujia

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2017:0903	0	normal	SHIPPED_LIVE	OpenShift Container Platform atomic-openshift-utils bug fix and enhancement	2017-04-12 22:45:42 UTC

Description liujia 2017-02-04 10:30:56 UTC

Description of problem:
When the redeploy-certificates playbook is run against OCP 3.5 (master/node/etcd), it fails and exits at task [restart node] of play [Restart nodes]. However, the node service reaches the "running" state after the playbook halts.

TASK [restart node] ************************************************************
fatal: [x.x.x.x]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

Unable to restart service atomic-openshift-node: Job for atomic-openshift-node.service failed because the control process exited with error code. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.

	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.retry


Version-Release number of selected component (if applicable):
openshift-ansible-3.5.3-1.git.0.80c2436.el7.noarch
ansible-2.2.0.0-1.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Perform a containerized install of OCP 3.5 on an Atomic Host.
2. Run the redeploy-certificates playbook:
# ansible-playbook -i /tmp/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml -v | tee /tmp/redeploy.log

Actual results:
The playbook exits at task [restart node].

Expected results:
It should redeploy certificates successfully.

Additional info:
The redeploy log is attached.
The atomic-openshift-node service log is attached.
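
The symptom described above (the restart task reports failure, yet the node service is "running" shortly afterwards) can be checked with a small polling loop. The following is only a sketch of that idea in Python, not the change that was made in openshift-ansible. The function names, the retry count, and the delay are hypothetical; the only external command used is `systemctl is-active --quiet`, which exits with status 0 when the unit is active.

```python
import subprocess
import time


def systemd_unit_is_active(unit):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit],
        check=False,
    )
    return result.returncode == 0


def wait_until_active(check, retries=10, delay=5.0, sleep=time.sleep):
    """Poll check() up to `retries` times, sleeping `delay` seconds between attempts.

    Returns the 1-based attempt number on which check() first returned True,
    or None if it never returned True. The defaults are illustrative only.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            sleep(delay)
    return None
```

On an affected host, calling `wait_until_active(lambda: systemd_unit_is_active("atomic-openshift-node"))` right after the failed task would show how many checks pass before the service becomes active. The `check` and `sleep` parameters are injectable so the loop can be exercised without systemd.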

Comment 2 liujia 2017-02-04 10:32:46 UTC

Created attachment 1247674
redeploy log

Comment 3 liujia 2017-02-04 10:33:11 UTC

Created attachment 1247675
node service

Comment 5 Gaoyun Pei 2017-02-10 07:21:15 UTC

(In reply to Andrew Butcher from comment #4)
> The new certificate redeploy playbooks have merged from
> https://github.com/openshift/openshift-ansible/pull/2671 so moving to ON_QA
> to try with the new changes.
> 
> @Gaoyun, have you encountered a similar issue while verifying the new playbooks?

I didn't encounter this issue during testing.

There are three certificate redeploy playbooks that may trigger a node restart:
playbooks/byo/openshift-cluster/redeploy-openshift-ca.yml
playbooks/byo/openshift-cluster/redeploy-certificates.yml
playbooks/byo/openshift-cluster/redeploy-node-certificates.yml

I tried all of them against various OCP 3.5 environments, including containerized and RPM installs, multi-master setups, and a 7-node cluster. All node restarts were successful.
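
For anyone repeating this verification, the three invocations can be assembled as in the sketch below. It assumes the install prefix and inventory path from the original reproduction command (`/usr/share/ansible/openshift-ansible` and `/tmp/hosts`); that prefix is confirmed in this report only for redeploy-certificates.yml and is assumed for the other two playbooks. The helper name `build_commands` is hypothetical.

```python
import posixpath

# Install prefix and inventory path taken from the reproduction command
# in the bug description; adjust both for the environment under test.
PLAYBOOK_ROOT = "/usr/share/ansible/openshift-ansible"
INVENTORY = "/tmp/hosts"

# Relative playbook paths exactly as listed in this comment.
REDEPLOY_PLAYBOOKS = [
    "playbooks/byo/openshift-cluster/redeploy-openshift-ca.yml",
    "playbooks/byo/openshift-cluster/redeploy-certificates.yml",
    "playbooks/byo/openshift-cluster/redeploy-node-certificates.yml",
]


def build_commands(inventory=INVENTORY, root=PLAYBOOK_ROOT):
    """Return one ansible-playbook argument list per redeploy playbook."""
    return [
        ["ansible-playbook", "-i", inventory, posixpath.join(root, playbook), "-v"]
        for playbook in REDEPLOY_PLAYBOOKS
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to run the playbooks one at a time and stop at the first failure.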

Moving this bug to VERIFIED with openshift-ansible-3.5.6-1.git.0.5e6099d.el7.noarch.
