Bug 1573307 - FFU: ceph upgrade fails because Docker service is not running on the Ceph OSD nodes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: beta
Target Release: ---
Assignee: RHOS Maint
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-30 19:58 UTC by Marius Cornea
Modified: 2018-05-02 15:24 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-02 15:24:40 UTC
Target Upstream Version:



Description Marius Cornea 2018-04-30 19:58:48 UTC
Description of problem:

During a fast-forward upgrade (FFU), the Ceph upgrade fails because the Docker service is not running on the Ceph OSD nodes. Snippet from /var/log/mistral/ceph-install-workflow.log:

[...]
2018-04-30 15:53:22,353 p=11902 u=mistral |  task path: /usr/share/ceph-ansible/roles/ceph-docker-common/tasks/fetch_image.yml:179
2018-04-30 15:53:22,353 p=11902 u=mistral |  Monday 30 April 2018  15:53:22 -0400 (0:00:00.036)       0:06:32.099 ********** 
2018-04-30 15:53:22,788 p=11902 u=mistral |  FAILED - RETRYING: pulling registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest image (3 retries left).
2018-04-30 15:53:33,033 p=11902 u=mistral |  FAILED - RETRYING: pulling registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest image (2 retries left).
2018-04-30 15:53:43,266 p=11902 u=mistral |  FAILED - RETRYING: pulling registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest image (1 retries left).
2018-04-30 15:53:53,508 p=11902 u=mistral |  fatal: [192.168.24.10]: FAILED! => {"attempts": 3, "changed": false, "cmd": ["timeout", "300s", "docker", "pull", "registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest"], "delta": "0:00:00.025122", "end": "2018-04-30 19:53:52.221277", "msg": "non-zero return code", "rc": 1, "start": "2018-04-30 19:53:52.196155", "stderr": "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?", "stderr_lines": ["Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"], "stdout": "", "stdout_lines": []}
2018-04-30 15:53:53,510 p=11902 u=mistral |  PLAY RECAP *********************************************************************
2018-04-30 15:53:53,511 p=11902 u=mistral |  192.168.24.10              : ok=42   changed=4    unreachable=0    failed=1   
2018-04-30 15:53:53,511 p=11902 u=mistral |  192.168.24.12              : ok=121  changed=26   unreachable=0    failed=0   
2018-04-30 15:53:53,511 p=11902 u=mistral |  192.168.24.13              : ok=111  changed=21   unreachable=0    failed=0   
2018-04-30 15:53:53,511 p=11902 u=mistral |  192.168.24.18              : ok=2    changed=0    unreachable=0    failed=0   
2018-04-30 15:53:53,511 p=11902 u=mistral |  192.168.24.19              : ok=110  changed=22   unreachable=0    failed=0   
2018-04-30 15:53:53,511 p=11902 u=mistral |  192.168.24.23              : ok=2    changed=0    unreachable=0    failed=0   
2018-04-30 15:53:53,511 p=11902 u=mistral |  localhost                  : ok=0    changed=0    unreachable=0    failed=0   
2018-04-30 15:53:53,512 p=11902 u=mistral |  Monday 30 April 2018  15:53:53 -0400 (0:00:31.158)       0:07:03.257 ********** 
2018-04-30 15:53:53,512 p=11902 u=mistral |  =============================================================================== 
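
To pick out the failing host from a log like the one above, the PLAY RECAP lines can be filtered for a non-zero failed= counter. The following is a minimal sketch, not part of the original report: the function name failed_hosts is hypothetical, and it assumes the recap lines keep the "host : ok=N changed=N unreachable=N failed=N" layout shown in the snippet.

```shell
#!/bin/sh
# Hypothetical helper: print every host whose PLAY RECAP line reports failed > 0.
# Reads the log from the files given as arguments, or from stdin if none are given.
failed_hosts() {
  awk '
    # Recap lines look like: "... |  <host>   : ok=42 changed=4 unreachable=0 failed=1"
    / : ok=[0-9]+/ {
      host = ""; failed = 0
      for (i = 1; i <= NF; i++) {
        # The host name is the field just before the standalone ":" separator.
        if ($i == ":" && i > 1) host = $(i - 1)
        # Extract the numeric value of the failed= counter.
        if ($i ~ /^failed=[0-9]+$/) { split($i, kv, "="); failed = kv[2] + 0 }
      }
      if (host != "" && failed > 0) print host
    }
  ' "$@"
}

# Example usage (path taken from the report):
#   failed_hosts /var/log/mistral/ceph-install-workflow.log
```

Run against the recap above, this would list only 192.168.24.10, the one host with failed=1.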


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.2-4.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. openstack overcloud ffwd-upgrade prepare 
2. openstack overcloud ffwd-upgrade run
3. openstack overcloud upgrade run --roles Controller --skip-tags validation
4. openstack overcloud upgrade run --roles Compute --skip-tags validation
5. openstack overcloud ffwd-upgrade converge
6. openstack overcloud ceph-upgrade run \
--timeout 100 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/ffu_repos.yaml \
-e /home/stack/cli_opts_params.yaml \
-e /home/stack/ceph-ansible-env.yaml \
--ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml'

Actual results:
The switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook fails because the Docker service is not running on the Ceph OSD nodes.

Expected results:
The Ceph upgrade playbooks finish without errors.

Additional info:

Comment 1 Lukas Bezdicka 2018-05-02 15:24:40 UTC
The upgrade step (openstack overcloud upgrade run) has to be run on all nodes, including the Ceph nodes. The reproduction steps ran it only for the Controller and Compute roles.
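
As a rough illustration of the comment above, the per-role upgrade step could be generated for every role instead of only Controller and Compute. This is a sketch under assumptions: the role name CephStorage is assumed to match the deployment's roles data (it may differ in a custom roles file), the function name print_upgrade_commands is hypothetical, and the commands are only printed, not executed. The --roles and --skip-tags validation options are the ones used in the reproduction steps.

```shell
#!/bin/sh
# Assumed role list; adjust it to the roles actually defined in the deployment.
ROLES="${ROLES:-Controller Compute CephStorage}"

# Dry run: print one "upgrade run" command per role so the list can be
# reviewed before anything is executed on the undercloud.
print_upgrade_commands() {
  for role in $ROLES; do
    printf 'openstack overcloud upgrade run --roles %s --skip-tags validation\n' "$role"
  done
}

print_upgrade_commands
```

Running every printed command, including the one for the Ceph role, before ffwd-upgrade converge and ceph-upgrade run would follow the order given in the reproduction steps.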

