Bug 1384457

Summary: [ocp-on-osp]Scaling up failed due to the new node can't start
Product: OpenShift Container Platform
Reporter: Gan Huang <ghuang>
Component: Installer
Assignee: Scott Dodson <sdodson>
Status: CLOSED ERRATA
QA Contact: Gan Huang <ghuang>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3.0
CC: aos-bugs, jokerman, jprovazn, mmccomas
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-12-01 13:59:45 UTC
Type: Bug

Description Gan Huang 2016-10-13 10:17:28 UTC
Description of problem:
"volume_quota: 3" was set when creating the Heat stack. After the stack was created, it was scaled up by one node. The stack update failed because atomic-openshift-node could not start on the new node.

Version-Release number of selected component (if applicable):
openshift-on-openstack v0.9.2

How reproducible:
100%

Steps to Reproduce:
1. Create a Heat stack with "volume_quota: 3":
<--snip-->
  node_count: 1
  master_count: 3
  infra_count: 2
  volume_quota: 3
  container_quota: 3
  deploy_registry: true
<--snip-->
2. Update the stack with "node_count: 2"

Actual results:
The Heat stack update fails because the openshift-ansible run fails.

On the bastion node:

[root@ghuang-verify-bugs-bastion ~]# cat /var/lib/ansible/playbooks/main.yml 
- include: /var/lib/ansible/playbooks/dns.yml


- include: /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
  vars:
    openshift_infra_nodes: "{{ groups.infra | default([]) }}"


- hosts: masters[0]
  sudo: yes
  tasks:
  - name: Fetch cert file
    fetch:
      src=/etc/origin/master/ca.crt
      dest=/var/run/heat-config/heat-config-script/01637a88-ce4e-492e-99c3-c65183c3533c.ca_cert
      flat=yes
<--snip-->

On the failed node:

Oct 13 06:15:53 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: Failed to start Atomic OpenShift Node.
Oct 13 06:15:53 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: Unit atomic-openshift-node.service entered failed state.
Oct 13 06:15:53 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: atomic-openshift-node.service failed.
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: atomic-openshift-node.service holdoff time over, scheduling restart.
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: Starting Atomic OpenShift Node...
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.307417   55005 docker.go:329] Start docker client with request timeout=2m0s
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.364409   55005 openstack.go:187] Got instance id from /var/lib/cloud/data/instance-id: 039f64f6-7ae4-4948-bac4-8205cefaff98
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.364432   55005 node_config.go:274] Successfully initialized cloud provider: "openstack" from the config file: "/etc/origin/cloudprovider/openstack.conf"
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.364444   55005 node.go:42] Initializing SDN node of type "redhat/openshift-ovs-subnet" with configured hostname "ghuang-verify-bugs-openshift-node-b23nh2b8.example.com" (IP ""), iptables sync period "30s"
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.365228   55005 start_node.go:298] Starting node ghuang-verify-bugs-openshift-node-b23nh2b8.example.com (v3.3.0.35)
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.373716   55005 start_node.go:307] Connecting to API server https://ghuang-verify-bugs-lb.example.com:8443
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.375179   55005 docker.go:309] Connecting to docker on unix:///var/run/docker.sock
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.375198   55005 docker.go:329] Start docker client with request timeout=0
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 docker-current: time="2016-10-13T06:15:58.378683379-04:00" level=info msg="{Action=_ping, LoginUID=4294967295, PID=55005}"
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.379212   55005 node.go:138] Connecting to Docker at unix:///var/run/docker.sock
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: I1013 06:15:58.402608   55005 node.go:217] Replacing empty-dir volume plugin with quota wrapper
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 atomic-openshift-node: F1013 06:15:58.407136   55005 node.go:222] Could not set up local quota, /var/lib/origin/openshift.local.volumes is not on a filesystem mounted with the grpquota option
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: atomic-openshift-node.service: main process exited, code=exited, status=255/n/a
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: Failed to start Atomic OpenShift Node.
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: Unit atomic-openshift-node.service entered failed state.
Oct 13 06:15:58 ghuang-verify-bugs-openshift-node-b23nh2b8 systemd: atomic-openshift-node.service failed.
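
The fatal message above says the volumes directory is not on a filesystem mounted with the grpquota option. As a rough diagnostic aid (not part of openshift-on-openstack or of the fix), the sketch below checks this from text in /proc/mounts format. The function names are hypothetical, and it assumes the option appears literally as "grpquota" in the options column; other spellings and escaped mount point names are not handled.

```python
import os


def mount_options_for(path, mounts_text):
    """Return the mount options of the filesystem that contains `path`.

    `mounts_text` is assumed to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    path = os.path.normpath(path)
    best_point, best_opts = None, []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        point = fields[1]
        # A mount point contains the path if it equals it or is a parent directory.
        contains = path == point or path.startswith(point.rstrip("/") + "/")
        # The longest matching mount point is the most specific one; on equal
        # length the later line wins, since later mounts shadow earlier ones.
        if contains and (best_point is None or len(point) >= len(best_point)):
            best_point, best_opts = point, fields[3].split(",")
    return best_opts


def has_grpquota(path, mounts_text):
    """True if the filesystem holding `path` lists the grpquota option."""
    return "grpquota" in mount_options_for(path, mounts_text)
```

On the failed node this could be called as has_grpquota("/var/lib/origin/openshift.local.volumes", open("/proc/mounts").read()); a False result would be consistent with the error logged by node.go:222.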


Expected results:
The stack scales up successfully.

Additional info:

Comment 1 Sylvain Baubeau 2016-10-13 17:00:32 UTC
Fixed by https://github.com/redhat-openstack/openshift-on-openstack/pull/280

Comment 2 Jan Provaznik 2016-10-17 07:44:07 UTC
Fixed in 0.9.3.

Comment 3 Gan Huang 2016-10-18 07:16:15 UTC
Verified with v0.9.3

The stack scales up and down successfully.