Created attachment 1717522
package_update.log

Description of problem:
During the fast-forward upgrade (FFU) from 13 to 16.1, we got stuck at the steps below.

1. To run the Leapp upgrade on the overcloud nodes, we executed:
~~~
openstack overcloud upgrade run --tags system_upgrade --limit overcloud-controller-0
~~~

2. The deployment failed with:
~~~
2020-09-21 15:04:43,623 p=5132 u=mistral n=ansible | fatal: [overcloud-controller-0]: FAILED! => {"changed": false, "msg": "There are no enabled repos.\n Run \"yum repolist all\" to see the repos you have.\n To enable Red Hat Subscription Management repositories:\n subscription-manager repos --enable <repo>\n To enable custom repositories:\n yum-config-manager --enable <repo>\n", "rc": 1, "results": []}
~~~

3. We enabled the repos, but the deployment now fails much earlier, while trying to set flags on the OSDs:
~~~
2020-09-21 15:25:58,604 p=5945 u=mistral n=ansible | failed: [overcloud-controller-0 -> 172.16.0.21] (item=nodeep-scrub) => {"ansible_loop_var": "item", "changed": true, "cmd": "docker exec -u root ceph-mon-${HOSTNAME} ceph osd set nodeep-scrub", "delta": "0:00:00.023928", "end": "2020-09-29 11:12:28.027507", "item": "nodeep-scrub", "msg": "non-zero return code", "rc": 1, "start": "2020-09-29 11:12:28.003579", "stderr": "Error response from daemon: No such container: ceph-mon-overcloud-controller-0", "stderr_lines": ["Error response from daemon: No such container: ceph-mon-overcloud-controller-0"], "stdout": "", "stdout_lines": []}
~~~

4. The first run of the command had already stopped all Docker containers, so the rerun now fails at this earlier step because the ceph-mon container is no longer there:
~~~
2020-09-21 15:01:49,161 p=5132 u=mistral n=ansible | TASK [Stop all services by stopping all docker containers] *********************
2020-09-21 15:01:49,162 p=5132 u=mistral n=ansible | Monday 21 September 2020 15:01:49 -0400 (0:00:03.066) 0:00:50.624 ******
2020-09-21 15:01:49,355 p=5132 u=mistral n=ansible | TASK [tripleo-podman : Check if docker is enabled in the system] ***************
2020-09-21 15:01:49,355 p=5132 u=mistral n=ansible | Monday 21 September 2020 15:01:49 -0400 (0:00:00.193) 0:00:50.818 ******
2020-09-21 15:01:49,632 p=5132 u=mistral n=ansible | ok: [overcloud-controller-0] => {"changed": false, "failed_when_result": false, "stat": {"atime": 1601376605.4744918, "attr_flags": "", "attributes": [], "block_size": 4096, "blocks": 0, "charset": "binary", "ctime": 1600358610.215621, "dev": 19, "device_type": 0, "executable": false, "exists": true, "gid": 1002, "gr_name": "docker", "inode": 65479, "isblk": false, "ischr": false, "isdir": false, "isfifo": false, "isgid": false, "islnk": false, "isreg": false, "issock": true, "isuid": false, "mimetype": "inode/socket", "mode": "0660", "mtime": 1600358610.215621, "nlink": 1, "path": "/var/run/docker.sock", "pw_name": "root", "readable": true, "rgrp": true, "roth": false, "rusr": true, "size": 0, "uid": 0, "version": null, "wgrp": true, "woth": false, "writeable": true, "wusr": true, "xgrp": false, "xoth": false, "xusr": false}}
2020-09-21 15:01:49,673 p=5132 u=mistral n=ansible | TASK [tripleo-podman : Stop all services by stopping all Docker containers] ****
~~~

We need some mechanism to get out of this situation: either skip those tasks, or provide a way to start the playbook at the step where it actually fails. The upgrade is completely blocked by this.

I will attach the package_update.log file.

Version-Release number of selected component (if applicable):
OSP13 to 16

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
On rerun, the upgrade fails at a much earlier step than the one that originally failed.

Expected results:
A rerun should skip the steps that were already executed.

Additional info:
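To help pin down the step a rerun would need to resume from, here is a minimal sketch that scans an Ansible log in the format of the excerpts above and reports which task each `fatal:` or `failed:` line belongs to. The function name `find_failed_tasks`, the regular expressions, and the sample task name are hypothetical and are only inferred from the log lines quoted in this report. Whether the upgrade command can actually be resumed from a given task is not established here.

```python
import re

# Log prefix as seen in the excerpts above, e.g.
# "2020-09-21 15:04:43,623 p=5132 u=mistral n=ansible | "
PREFIX = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+ p=\d+ u=\S+ n=\S+ \| "
)
# Task header, e.g. "TASK [tripleo-podman : Check if docker ...] ****"
TASK = re.compile(r"^TASK \[(?P<name>.+?)\] \*+\s*$")
# Failure result, e.g. "fatal: [host]: FAILED!" or "failed: [host -> addr] (item=...)"
FAILURE = re.compile(r"^(?:fatal|failed): \[(?P<host>[^\]\s]+)")


def find_failed_tasks(log_text):
    """Return (task_name, host) pairs, one per fatal/failed result line.

    task_name is None when no TASK header precedes the failure.
    """
    current_task = None
    failures = []
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw.strip())
        task = TASK.match(line)
        if task:
            current_task = task.group("name")
            continue
        failure = FAILURE.match(line)
        if failure:
            failures.append((current_task, failure.group("host")))
    return failures


# Abbreviated sample; the second task name is a placeholder, not taken from the real log.
sample = """\
2020-09-21 15:01:49,161 p=5132 u=mistral n=ansible | TASK [Stop all services by stopping all docker containers] *********************
2020-09-21 15:01:49,632 p=5132 u=mistral n=ansible | ok: [overcloud-controller-0] => {"changed": false}
2020-09-21 15:04:43,000 p=5132 u=mistral n=ansible | TASK [example task name (hypothetical)] ***
2020-09-21 15:04:43,623 p=5132 u=mistral n=ansible | fatal: [overcloud-controller-0]: FAILED! => {"changed": false, "rc": 1}
"""

print(find_failed_tasks(sample))
```

Running this over the attached package_update.log should list the failing tasks in order. It does not handle failures that Ansible ignores or retries, so the output is only a starting point for finding the step to resume from.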