Bug 1331380
Summary: | Prepare for Node evacuation failed during containerized upgrade | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> | ||||
Component: | Cluster Version Operator | Assignee: | Devan Goodwin <dgoodwin> | ||||
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 3.2.0 | CC: | anli, aos-bugs, bleanhar, dgoodwin, jokerman, mmccomas, trankin | ||||
Target Milestone: | --- | ||||||
Target Release: | 3.2.1 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2016-06-27 15:04:22 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Description
Anping Li
2016-04-28 12:14:50 UTC
Created attachment 1152143
Logs for upgrade on Atomic hosts

When upgrading a native HA cluster on Atomic Host, I hit the same issue. This is a test blocker for the containerized OSE upgrade.

-bash-4.2# systemctl status atomic-openshift-master-api
● atomic-openshift-master-api.service - Atomic OpenShift Master API
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master-api.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2016-04-29 06:30:13 UTC; 11min ago
     Docs: https://github.com/openshift/origin
  Process: 18174 ExecStop=/usr/bin/docker stop atomic-openshift-master-api (code=exited, status=0/SUCCESS)
  Process: 17413 ExecStartPost=/usr/bin/sleep 10 (code=exited, status=0/SUCCESS)
  Process: 17412 ExecStart=/usr/bin/docker run --rm --privileged --net=host --name atomic-openshift-master-api --env-file=/etc/sysconfig/atomic-openshift-master-api -v /var/lib/origin:/var/lib/origin -v /var/run/docker.sock:/var/run/docker.sock -v /etc/origin:/etc/origin openshift3/ose:${IMAGE_VERSION} start master api --config=${CONFIG_FILE} $OPTIONS (code=exited, status=2)
  Process: 17407 ExecStartPre=/usr/bin/docker rm -f atomic-openshift-master-api (code=exited, status=1/FAILURE)
 Main PID: 17412 (code=exited, status=2)

Apr 29 06:29:11 atomic1master1.example.com atomic-openshift-master-api[17412]: [185.391µs] [12.375µs] About to store object in database
Apr 29 06:29:11 atomic1master1.example.com atomic-openshift-master-api[17412]: [475.968183ms] [475.782792ms] END
Apr 29 06:29:12 atomic1master1.example.com atomic-openshift-master-api[17412]: I0429 02:29:12.225890 1 ensure.go:86] Added replication-controller service accounts to the system:replication-co...role: <nil>
Apr 29 06:29:12 atomic1master1.example.com atomic-openshift-master-api[17412]: I0429 02:29:12.551309 1 run_components.go:199] DNS listening at 0.0.0.0:53
Apr 29 06:30:12 atomic1master1.example.com systemd[1]: Stopping Atomic OpenShift Master API...
Apr 29 06:30:13 atomic1master1.example.com atomic-openshift-master-api[18174]: atomic-openshift-master-api
Apr 29 06:30:13 atomic1master1.example.com systemd[1]: atomic-openshift-master-api.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 29 06:30:13 atomic1master1.example.com systemd[1]: Stopped Atomic OpenShift Master API.
Apr 29 06:30:13 atomic1master1.example.com systemd[1]: Unit atomic-openshift-master-api.service entered failed state.
Apr 29 06:30:13 atomic1master1.example.com systemd[1]: atomic-openshift-master-api.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

Quicker and simpler reproducer: install on RHEL Server 7.2 with openshift_image_tag=v3.1.1.6, using the latest openshift-ansible. Make sure this lands you with docker 1.8. Then change to openshift_image_tag=v3.2.0.20 and run the upgrade playbook.

https://github.com/openshift/openshift-ansible/pull/1918

Fixed by using systemctl to restart docker, which automatically restarts the dependent services. Ansible's service command does a full stop and start, which does not. We will tackle the other issue, where we bounce docker and then try to evacuate the node, as part of upcoming improvements to the upgrade.

For this bug, node evacuation now passes, so moving to VERIFIED.
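The fix described above rests on how systemd propagates jobs along unit dependencies. The sketch below is a toy model only, not systemd's job engine, and it assumes (the unit files are not shown in this report) that the containerized OpenShift units declare `Requires=docker.service`. Under that assumption, stopping docker also stops its dependents, starting docker does not start them again, and a restart brings back whatever the propagated stop took down. `Unit`, `Manager`, and `simulate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    requires: set = field(default_factory=set)  # names listed in this unit's Requires=
    active: bool = False


class Manager:
    """Toy model of systemd dependency propagation; not systemd's real job engine."""

    def __init__(self, units):
        self.units = {u.name: u for u in units}

    def _dependents(self, name):
        # Reverse edge: every unit that lists `name` in its Requires=.
        return [u for u in self.units.values() if name in u.requires]

    def start(self, name):
        unit = self.units[name]
        for req in sorted(unit.requires):  # starting pulls in requirements...
            self.start(req)
        unit.active = True  # ...but never the units that depend on this one

    def stop(self, name):
        for dep in self._dependents(name):  # stopping propagates to dependents
            if dep.active:
                self.stop(dep.name)
        self.units[name].active = False

    def restart(self, name):
        # Remember what was running, then bring back everything the propagated
        # stop took down (roughly what systemd's restart propagation achieves).
        was_active = {n for n, u in self.units.items() if u.active}
        self.stop(name)
        self.start(name)
        for n in sorted(was_active):
            if not self.units[n].active:
                self.start(n)

    def active_units(self):
        return sorted(n for n, u in self.units.items() if u.active)


def simulate(use_restart: bool) -> list:
    # Assumption: the containerized OpenShift units declare Requires=docker.service.
    m = Manager([
        Unit("docker.service"),
        Unit("atomic-openshift-master-api.service", {"docker.service"}),
        Unit("atomic-openshift-node.service", {"docker.service"}),
    ])
    for name in list(m.units):
        m.start(name)
    if use_restart:
        m.restart("docker.service")  # like `systemctl restart docker`
    else:
        m.stop("docker.service")  # separate stop...
        m.start("docker.service")  # ...then start
    return m.active_units()


if __name__ == "__main__":
    print("stop + start:", simulate(use_restart=False))
    print("restart:     ", simulate(use_restart=True))
```

In this model the stop-then-start path leaves only `docker.service` active, which is consistent with the master API in the `systemctl status` output above being stopped and never started again; the restart path leaves all three units active.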
TASK: [Prepare for Node evacuation] *******************************************
<host4master.example.com> ESTABLISH CONNECTION FOR USER: root
<host4master.example.com> REMOTE_MODULE command /usr/local/bin/oadm manage-node host4node.example.com --schedulable=false
<host4master.example.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 host4master.example.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1465383087.54-96651821464828 && echo $HOME/.ansible/tmp/ansible-tmp-1465383087.54-96651821464828'
<host4master.example.com> PUT /tmp/tmp7XkjxL TO /root/.ansible/tmp/ansible-tmp-1465383087.54-96651821464828/command
<host4master.example.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 host4master.example.com /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1465383087.54-96651821464828/command; rm -rf /root/.ansible/tmp/ansible-tmp-1465383087.54-96651821464828/ >/dev/null 2>&1'
changed: [host4node.example.com -> host4master.example.com] => {"changed": true, "cmd": ["/usr/local/bin/oadm", "manage-node", "host4node.example.com", "--schedulable=false"], "delta": "0:00:03.309119", "end": "2016-06-08 06:51:30.566772", "rc": 0, "start": "2016-06-08 06:51:27.257653", "stderr": "\n================================================================================\nATTENTION: You are running oadm via a wrapper around 'docker run openshift3/ose:v3.2.1.1'.\nThis wrapper is intended only to be used to bootstrap an environment. Please\ninstall client tools on another host once you have granted cluster-admin\nprivileges to a user. \nSee https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html\n=================================================================================", "stdout": "NAME STATUS AGE\nhost4node.example.com Ready,SchedulingDisabled 2h", "warnings": []}

TASK: [Evacuate Node for Kubelet upgrade] *************************************
<host4master.example.com> ESTABLISH CONNECTION FOR USER: root
<host4master.example.com> REMOTE_MODULE command /usr/local/bin/oadm manage-node host4node.example.com --evacuate --force
<host4master.example.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 host4master.example.com /bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-tmp-1465383091.08-106825151439623 && echo $HOME/.ansible/tmp/ansible-tmp-1465383091.08-106825151439623'
<host4master.example.com> PUT /tmp/tmp4NPAR_ TO /root/.ansible/tmp/ansible-tmp-1465383091.08-106825151439623/command
<host4master.example.com> EXEC ssh -C -tt -vvv -o ControlMaster=auto -o ControlPersist=60s -o ControlPath="/root/.ansible/cp/ansible-ssh-%h-%p-%r" -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 host4master.example.com /bin/sh -c 'LANG=C LC_CTYPE=C /usr/bin/python /root/.ansible/tmp/ansible-tmp-1465383091.08-106825151439623/command; rm -rf /root/.ansible/tmp/ansible-tmp-1465383091.08-106825151439623/ >/dev/null 2>&1'
changed: [host4node.example.com -> host4master.example.com] => {"changed": true, "cmd": ["/usr/local/bin/oadm", "manage-node", "host4node.example.com", "--evacuate", "--force"], "delta": "0:00:02.874339", "end": "2016-06-08 06:51:33.655613", "rc": 0, "start": "2016-06-08 06:51:30.781274", "stderr": "\n================================================================================\nATTENTION: You are running oadm via a wrapper around 'docker run openshift3/ose:v3.2.1.1'.\nThis wrapper is intended only to be used to bootstrap an environment. Please\ninstall client tools on another host once you have granted cluster-admin\nprivileges to a user. \nSee https://docs.openshift.com/enterprise/latest/cli_reference/get_started_cli.html\n=================================================================================", "stdout": "\nMigrating these pods on node: host4node.example.com\n\nNAME READY STATUS RESTARTS AGE\ncakephp-mysql-example-1-gao39 1/1 Running 0 16s\nmysql-1-6fwhs 1/1 Running 0 2h\ndocker-registry-2-ppfh8 1/1 Running 0 2h\nrouter-1-f9xba 1/1 Running 0 2h", "warnings": []}

TASK: [Upgrade packages] ******************************************************
skipping: [host4node.example.com]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1344
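The verification log above runs `oadm manage-node` twice: first `--schedulable=false`, then `--evacuate --force`. Marking the node unschedulable first presumably keeps the scheduler from placing the migrated pods back on the node being drained. The sketch below reproduces that order in Python and parses the pod table from the evacuation task's `stdout`. It assumes the `/usr/local/bin/oadm` wrapper path seen in the log and the NAME/READY/STATUS/RESTARTS/AGE layout shown there; `evacuation_commands`, `parse_migrated_pods`, and `evacuate` are hypothetical helpers, not part of openshift-ansible.

```python
import subprocess

OADM = "/usr/local/bin/oadm"  # wrapper path as it appears in the task log


def evacuation_commands(node: str, oadm: str = OADM) -> list:
    """argv lists for the two tasks, in order: mark unschedulable, then evacuate."""
    return [
        [oadm, "manage-node", node, "--schedulable=false"],
        [oadm, "manage-node", node, "--evacuate", "--force"],
    ]


def parse_migrated_pods(stdout: str) -> list:
    """Turn the pod table printed by `manage-node --evacuate` into dicts."""
    lines = [ln for ln in stdout.splitlines() if ln.strip()]
    start = next((i for i, ln in enumerate(lines) if ln.split()[0] == "NAME"), None)
    if start is None:  # no table printed, so nothing was migrated
        return []
    header = [col.lower() for col in lines[start].split()]
    return [dict(zip(header, ln.split())) for ln in lines[start + 1:]]


def evacuate(node: str) -> list:
    """Run both steps in order; check=True raises on the first non-zero exit."""
    result = None
    for argv in evacuation_commands(node):
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return parse_migrated_pods(result.stdout)


if __name__ == "__main__":
    # "stdout" of the evacuation task, copied from the log above.
    sample = ("\nMigrating these pods on node: host4node.example.com\n\n"
              "NAME READY STATUS RESTARTS AGE\n"
              "cakephp-mysql-example-1-gao39 1/1 Running 0 16s\n"
              "mysql-1-6fwhs 1/1 Running 0 2h\n"
              "docker-registry-2-ppfh8 1/1 Running 0 2h\n"
              "router-1-f9xba 1/1 Running 0 2h")
    for pod in parse_migrated_pods(sample):
        print(pod["name"], pod["status"])
```

Splitting on any whitespace means the parser accepts both the column-aligned table the real CLI prints and the single-spaced form captured in this report.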