Description of problem:
-----------------------
Attempt to set up heat's output for the minor update failed:

openstack overcloud update stack --init-minor-update --container-registry-file /home/stack/virt/docker-images.yaml

Waiting for messages on queue '737184d8-0c45-4990-bb26-9acf4f23731e' with no timeout.
Started Mistral Workflow tripleo.package_update.v1.package_update_plan. Execution ID: a5b114d8-7973-4551-891b-0c1b1a067d5f
2017-11-23 15:38:35Z [Networks]: UPDATE_IN_PROGRESS state changed
2017-11-23 15:38:36Z [overcloud-Networks-oe4at5i7bgvr]: UPDATE_IN_PROGRESS Stack UPDATE started
...
2017-11-23 15:51:42Z [overcloud-AllNodesDeploySteps-xhntoqtltqqf.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS state changed
2017-11-23 15:52:51Z [overcloud-AllNodesDeploySteps-xhntoqtltqqf.WorkflowTasks_Step2_Execution]: CREATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
2017-11-23 15:52:52Z [overcloud-AllNodesDeploySteps-xhntoqtltqqf]: UPDATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
2017-11-23 15:52:53Z [AllNodesDeploySteps]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.WorkflowTasks_Step2_Execution: ERROR
2017-11-23 15:52:54Z [overcloud]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.WorkflowTasks_Step2_Execution: ERROR

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::Mistral::ExternalResource
  physical_resource_id: ea461a72-48b4-46d7-b21b-0298049bd9ca
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR

Heat Stack update failed.
Heat Stack update failed.

From /var/log/mistral/ceph-install-workflow.log:

...
2017-11-23 10:52:48,775 p=19122 u=mistral | PLAY [confirm whether user really meant to upgrade the cluster] ****************
2017-11-23 10:52:48,787 p=19122 u=mistral | TASK [Gathering Facts] *********************************************************
2017-11-23 10:52:49,239 p=19122 u=mistral | fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 1}
2017-11-23 10:52:49,241 p=19122 u=mistral | PLAY RECAP *********************************************************************
2017-11-23 10:52:49,241 p=19122 u=mistral | localhost : ok=0 changed=0 unreachable=0 failed=1

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
puppet-ceph-2.4.2-0.20170927195215.718a5ff.el7ost.noarch
ceph-ansible-3.0.14-1.el7cp.noarch
python-mistral-5.2.0-1.el7ost.noarch
openstack-mistral-executor-5.2.0-1.el7ost.noarch
python-mistralclient-3.1.3-2.el7ost.noarch
python-mistral-lib-0.3.1-1.el7ost.noarch
openstack-mistral-common-5.2.0-1.el7ost.noarch
openstack-mistral-engine-5.2.0-1.el7ost.noarch
openstack-mistral-api-5.2.0-1.el7ost.noarch
puppet-mistral-11.3.1-0.20170825184651.cf2e493.el7ost.noarch

Steps to Reproduce:
-------------------
1. Update the undercloud to 2017-11-22.7
2. Upload the later container images
3. Try to set up heat's output (openstack overcloud update stack --init-minor-update)

Actual results:
---------------
Failed to set up heat's output; the overcloud stack ended in UPDATE_FAILED.

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 3 Ceph nodes
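Based on the "sudo: a password is required" failure above, a quick check (a sketch, not part of the original report) is to confirm on the undercloud whether the user running the ceph-ansible playbook (mistral, per the u=mistral field in the log) has passwordless sudo:

# Run on the undercloud node
sudo -l -U mistral | grep -i nopasswd
# No NOPASSWD entry means any ansible task that escalates with 'become'
# will fail exactly as shown in ceph-install-workflow.log.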
Giulio, could you please have a look at the SOS report?
One workaround for this is to add the user running ansible to the sudoers file.
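A minimal sketch of that workaround, assuming the playbook runs as the mistral user on the undercloud (as the u=mistral entries in ceph-install-workflow.log suggest) and using an illustrative drop-in file name:

echo 'mistral ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/mistral
sudo chmod 0440 /etc/sudoers.d/mistral
# Validate the drop-in before relying on it
sudo visudo -cf /etc/sudoers.d/mistral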
Done, feel free to re-arrange.
Failed to update the overcloud with the latest ceph container image with an error on one of the OSDs:

fatal: [192.168.24.11]: FAILED! => {"changed": false, "cmd": ["docker", "run", "--rm", "--entrypoint", "/usr/bin/ceph", "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhceph:ceph-2-rhel-7-docker-candidate-81064-20180205070134", "--version"], "delta": "0:00:00.662242", "end": "2018-02-06 03:13:19.996086", "msg": "non-zero return code", "rc": 127, "start": "2018-02-06 03:13:19.333844", "stderr": "container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"open /sys/fs/cgroup/pids/system.slice/docker-5f32487f85449859a1d51d2cb12ff2336ffdeeec8876ea7132ce438830f51147.scope/cgroup.procs: no such file or directory\\\"\"\n/usr/bin/docker-current: Error response from daemon: invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:258: applying cgroup configuration for process caused \\\\\\\"open /sys/fs/cgroup/pids/system.slice/docker-5f32487f85449859a1d51d2cb12ff2336ffdeeec8876ea7132ce438830f51147.scope/cgroup.procs: no such file or directory\\\\\\\"\\\"\\n\".", "stderr_lines": ["container_linux.go:247: starting container process caused \"process_linux.go:258: applying cgroup configuration for process caused \\\"open /sys/fs/cgroup/pids/system.slice/docker-5f32487f85449859a1d51d2cb12ff2336ffdeeec8876ea7132ce438830f51147.scope/cgroup.procs: no such file or directory\\\"\"", "/usr/bin/docker-current: Error response from daemon: invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:258: applying cgroup configuration for process caused \\\\\\\"open /sys/fs/cgroup/pids/system.slice/docker-5f32487f85449859a1d51d2cb12ff2336ffdeeec8876ea7132ce438830f51147.scope/cgroup.procs: no such file or directory\\\\\\\"\\\"\\n\"."], "stdout": "", "stdout_lines": []}
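To separate a Docker engine problem from an image problem, the same version check can be re-run by hand on the affected OSD node; this is a sketch based on the "cmd" field in the failure above, using the same candidate image reference:

sudo docker run --rm --entrypoint /usr/bin/ceph \
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhceph:ceph-2-rhel-7-docker-candidate-81064-20180205070134 \
  --version
# If this still fails with the cgroup.procs error, the Docker/cgroup state on the
# node is at fault rather than the image; restarting the docker service (or the
# node) is a reasonable next step to rule that out.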
This looks like a Docker error to me. Nothing related to the container image.
Yogev, can you investigate this error further? It looks like the Docker engine is having an issue. Let us know if there is anything we can help you with, but for now I believe you're hitting an issue that is unrelated to the original bug. What is your plan? Are you going to test on another env? Thanks
The controller Ceph image was updated, but the Ceph storage nodes (the OSDs) were not:

[heat-admin@ceph-0 ~]$ sudo docker ps
CONTAINER ID        IMAGE                                                              COMMAND             CREATED             STATUS              PORTS               NAMES
a618c3ca7c97        docker-registry.engineering.redhat.com/ceph/rhceph-2-rhel7:2.4-4   "/entrypoint.sh"    About an hour ago   Up About an hour                        ceph-osd-ceph-0-vdb
2826a5b4a576        192.168.24.1:8787/rhosp12/openstack-cron:2018-01-24.2              "kolla_start"       12 hours ago        Up About an hour                        logrotate_crond

[heat-admin@controller-2 ~]$ sudo docker ps | grep ceph
ce5f796336e8        brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhceph:ceph-2-rhel-7-docker-candidate-81064-20180205070134   "/entrypoint.sh"    8 minutes ago       Up 8 minutes                            ceph-mon-controller-2
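One quick way to compare which Ceph image each node is actually running (a sketch; adjust the grep pattern as needed):

sudo docker ps --format '{{.Names}} {{.Image}}' | grep ceph
# Run this on a controller and on each Ceph storage node; different image
# references for the ceph-mon and ceph-osd containers confirm that the OSD
# containers were not updated.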
The version there is: ceph-ansible-3.0.23-1.el7cp.noarch
Yogev, this looks like a different issue. Which test are you running? Why do you expect the image to change? Anyway, can you provide the playbook logs? Ideally, an env with the error as well. Thanks in advance.
leseb, the environment is being preserved for you.
Thanks, let me know when it's available and send me the details so I can log in.
The environment is ready and available; its details have been provided on IRC.
This is solved in https://bugzilla.redhat.com/show_bug.cgi?id=1526513
Verified on ceph-ansible-3.0.25-1.el7cp.noarch.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0340