Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1591957

Summary: node upgrade failed - node service unit stuck in "activating" state
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Cluster Version Operator
Assignee: Scott Dodson <sdodson>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Gaoyun Pei <gpei>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---
Target Release: 3.10.z
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-02 20:10:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: ansible, pod and system logs. inventory.
Flags: none

Description Mike Fiedler 2018-06-15 20:43:28 UTC
Created attachment 1452065
ansible, pod and system logs.  inventory.

Description of problem:

1. Started from a 3.9.27 HA cluster: 1 lb, 3 masters with co-located etcd, 2 infra nodes, 5 nodes.
2. Ran upgrade_control_plane.yml successfully. Master/etcd pods were healthy and all nodes were still Ready at the 3.9 level.
3. Ran upgrade_nodes.yml.

Result:

1 node successfully upgraded to 3.10
3 nodes Ready at the 3.9 level
1 node NotReady and not schedulable at the 3.9 level. systemd unit status for the node service on that node:

● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: activating (start) since Fri 2018-06-15 20:29:26 UTC; 2min 24s ago
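
For anyone trying to catch this state while an upgrade is running, here is a minimal Python sketch that pulls the state and sub-state out of the `Active:` line of `systemctl status` output. The function names `parse_active_line` and `is_activating` are made up for illustration, and the sample text is copied from the status output above. A unit normally passes through "activating" briefly, so a caller would still need its own idea of how long is too long.

```python
import re

# Matches lines such as:
#   Active: activating (start) since Fri 2018-06-15 20:29:26 UTC; 2min 24s ago
ACTIVE_RE = re.compile(
    r"^\s*Active:\s+(?P<state>\S+)\s+\((?P<sub>[^)]+)\)",
    re.MULTILINE,
)


def parse_active_line(status_text):
    """Return (state, sub_state) from `systemctl status` text, or None."""
    match = ACTIVE_RE.search(status_text)
    if match is None:
        return None
    return match.group("state"), match.group("sub")


def is_activating(status_text):
    """True if the unit reports the 'activating' state (hypothetical helper)."""
    parsed = parse_active_line(status_text)
    return parsed is not None and parsed[0] == "activating"


# Sample taken from the status output in this report.
SAMPLE_STATUS = (
    "● atomic-openshift-node.service - OpenShift Node\n"
    "   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; "
    "enabled; vendor preset: disabled)\n"
    "   Active: activating (start) since Fri 2018-06-15 20:29:26 UTC; "
    "2min 24s ago\n"
)

print(parse_active_line(SAMPLE_STATUS), is_activating(SAMPLE_STATUS))
```

In practice the text would come from running `systemctl status atomic-openshift-node.service` on the affected node rather than from a hard-coded sample.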

Last messages in the node log:


Jun 15 20:29:27 ip-172-31-24-3.us-west-2.compute.internal atomic-openshift-node[126627]: I0615 20:29:27.259195  126627 server.go:739] cloud provider determined current node name to be ip-172-31-24-3...e.internal
Jun 15 20:29:27 ip-172-31-24-3.us-west-2.compute.internal atomic-openshift-node[126627]: I0615 20:29:27.259289  126627 bootstrap.go:53] Using bootstrap kubeconfig to generate TLS client cert, key an...onfig file
Jun 15 20:29:27 ip-172-31-24-3.us-west-2.compute.internal atomic-openshift-node[126627]: I0615 20:29:27.260999  126627 bootstrap.go:79] No valid private key found for bootstrapping, creating a new one
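
The node messages above use the glog-style header (severity letter, month/day, time, process id, source file and line, then the message). As a rough sketch for filtering the attached journal, the Python below splits one such line into its parts. The name `parse_glog_line` and the returned dictionary keys are illustrative assumptions, not part of any OpenShift tooling.

```python
import re

# glog-style header: Lmmdd hh:mm:ss.uuuuuu <id> file:line] message
GLOG_RE = re.compile(
    r"(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def parse_glog_line(journal_line):
    """Return a dict of header fields and message, or None if no match."""
    match = GLOG_RE.search(journal_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["pid"] = int(fields["pid"])
    return fields


# Last journal line quoted in this report.
SAMPLE_LINE = (
    "Jun 15 20:29:27 ip-172-31-24-3.us-west-2.compute.internal "
    "atomic-openshift-node[126627]: I0615 20:29:27.260999  126627 "
    "bootstrap.go:79] No valid private key found for bootstrapping, "
    "creating a new one"
)

print(parse_glog_line(SAMPLE_LINE))
```

Applied to the full journal, this would make it easier to list which source locations logged last before the unit stopped making progress.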


Gathered the following (see attachment):

ansible -vvv log
journal from the failed node
/etc/origin tar from the failed node
node logs for master api and controllers
inventory



Version-Release number of the following components:
# ansible --version
ansible 2.4.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Apr 19 2018, 05:40:55) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]


How reproducible: Unknown

Steps to Reproduce:
1. See above