Created attachment 1604646: Undercloud files minus var folder

Description of problem:
When performing a scale up, the process terminates with error [1] on all controller nodes.

Version-Release number of selected component (if applicable):
OSP15
core_puddle: RHOS_TRUNK-15.0-RHEL-8-20190813.n.0
CEPH compose: ceph-4.0-rhel-8-containers-candidate-64389-20190813102853

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP with Ceph: 3 controller, 1 compute, and 1 Ceph node.
2. Scale up to 3 controller, 2 compute, and 3 Ceph nodes.
3. The scale-up script fails with an error.

Actual results:
Scale up fails with an error.

Expected results:
Scale up should succeed.

Additional info:
[1] http://pastebin.test.redhat.com/789335
Created attachment 1604649: undercloud var folder
Created attachment 1604653: controller files
Tested using builds 17 and 18 in: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/ceph/view/rhos/job/DFG-ceph-rhos-15_director-rhel-virthost-3cont_1_to_2comp_1_to_3ceph-ipv4-geneve-scale-up/
It seems there is an issue with the generated systemd unit file (?). In our lab, "tripleo_memcached.service" looks like this:

[root@controller-0 ~]# cat /etc/systemd/system/tripleo_memcached.service
[Unit]
Description=memcached container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start memcached
ExecStop=/usr/bin/podman stop -t 10 memcached
KillMode=none
Type=forking
PIDFile=/var/run/memcached.pid
[Install]
WantedBy=multi-user.target

while here it is:

$ cat tripleo_memcached-a9pap7zv.service
[Unit]
Description=memcached-a9pap7zv container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start memcached-a9pap7zv
ExecStop=/usr/bin/podman stop -t 10 memcached-a9pap7zv
KillMode=none
Type=forking
PIDFile=/var/run/memcached-a9pap7zv.pid
[Install]
WantedBy=multi-user.target
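Both unit files follow the same template with only the container name substituted, so a suffixed container name propagates into the unit filename, `Description`, `ExecStart`, `ExecStop`, and `PIDFile`. A minimal Python sketch reproducing that template; the helpers `render_unit` and `unit_filename` are hypothetical, written from the two files shown here, and are not taken from paunch's source:

```python
def unit_filename(container_name: str) -> str:
    # Hypothetical helper: unit name pattern observed in this bug.
    return f"tripleo_{container_name}.service"


def render_unit(container_name: str) -> str:
    # Hypothetical helper: mirrors the unit files shown in this comment,
    # with the container name substituted in every field that uses it.
    lines = [
        "[Unit]",
        f"Description={container_name} container",
        "After=paunch-container-shutdown.service",
        "Wants=",
        "[Service]",
        "Restart=always",
        f"ExecStart=/usr/bin/podman start {container_name}",
        f"ExecStop=/usr/bin/podman stop -t 10 {container_name}",
        "KillMode=none",
        "Type=forking",
        f"PIDFile=/var/run/{container_name}.pid",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Compare the expected name with the bogus one from this bug.
    for name in ("memcached", "memcached-a9pap7zv"):
        print("#", unit_filename(name))
        print(render_unit(name))
```

Rendering the unit for `memcached-a9pap7zv` yields the second file shown in this comment, which is why the service name no longer matches the expected `tripleo_memcached.service`.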
The issue is that paunch/podman create the container with a bogus name:

"Start container memcached.",
"$ podman create --name memcached-a9pap7zv --label config_id=tripleo_step1 --label container_name=memcached --label ...

Will try to reproduce.
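The mismatch is visible in the `podman create` line itself: `--name` is `memcached-a9pap7zv` while the `container_name` label is still `memcached`. A hedged Python sketch that flags this, assuming (from the single example in this bug) that the bogus suffix is 8 lowercase alphanumeric characters; `name_and_label` and `is_suffixed` are hypothetical helpers:

```python
import re
import shlex
from typing import Optional, Tuple

# Heuristic (assumption based on one observed name, "memcached-a9pap7zv"):
# a bogus name is "<original>-<8 lowercase letters/digits>".
SUFFIX_RE = re.compile(r"^(?P<base>.+)-(?P<suffix>[a-z0-9]{8})$")


def name_and_label(command: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the --name value and the container_name label value
    from a 'podman create' command line (None if absent)."""
    args = shlex.split(command)
    name: Optional[str] = None
    label: Optional[str] = None
    # Stop one short of the end so args[i + 1] is always valid.
    for i, arg in enumerate(args[:-1]):
        if arg == "--name":
            name = args[i + 1]
        elif arg == "--label":
            key, _, value = args[i + 1].partition("=")
            if key == "container_name":
                label = value
    return name, label


def is_suffixed(command: str) -> bool:
    """True if --name is the container_name label plus a random-looking suffix."""
    name, label = name_and_label(command)
    if name is None or label is None or name == label:
        return False
    match = SUFFIX_RE.match(name)
    return match is not None and match.group("base") == label


if __name__ == "__main__":
    # Shortened from the truncated log line in this comment.
    cmd = ("podman create --name memcached-a9pap7zv "
           "--label config_id=tripleo_step1 --label container_name=memcached")
    print(name_and_label(cmd), is_suffixed(cmd))
```

Comparing against the `container_name` label, rather than only matching the suffix pattern, avoids flagging containers whose intended name legitimately ends in a hyphenated 8-character word.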
Reproduced, and spent some time with Michele trying to figure it out. This looks like the same issue as https://bugs.launchpad.net/tripleo/+bug/1839929, fixed in Stein by https://review.opendev.org/#/c/676984/. The paunch version in the latest puddle does not include that patch.
*** Bug 1743402 has been marked as a duplicate of this bug. ***
*** Bug 1744675 has been marked as a duplicate of this bug. ***
[root@controller-0 heat-admin]# rpm -q python3-paunch
python3-paunch-4.5.1-0.20190829080435.f9349e0.el8ost.noarch

Scale out completed successfully.
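To double-check a node after updating paunch, one could scan the generated units for the suffixed names seen in this bug. A minimal sketch, assuming unit files live in `/etc/systemd/system` (as on controller-0 in this bug) and that a bogus unit matches `tripleo_<name>-<8 lowercase alphanumerics>.service`; this is a heuristic, not an official check, and `suffixed_units` is a hypothetical helper:

```python
import os
import re
from typing import Iterable, List

# Heuristic assumption: a unit generated for a bogus container name looks
# like "tripleo_<name>-<8 lowercase alphanumerics>.service",
# e.g. "tripleo_memcached-a9pap7zv.service".
SUFFIXED_UNIT_RE = re.compile(r"^tripleo_.+-[a-z0-9]{8}\.service$")


def suffixed_units(filenames: Iterable[str]) -> List[str]:
    """Return, sorted, the unit filenames that look like they were
    generated for a randomly suffixed container name."""
    return sorted(f for f in filenames if SUFFIXED_UNIT_RE.match(f))


if __name__ == "__main__":
    unit_dir = "/etc/systemd/system"
    if os.path.isdir(unit_dir):
        bad = suffixed_units(os.listdir(unit_dir))
        print("suspicious units:", bad if bad else "none")
```

An empty result is consistent with the fixed paunch build, but it only checks file names; it does not confirm that the services are running.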
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811