Bug 1742169 - Scale up fails on all controller nodes with error "Job for tripleo_memcached-dmtfeqpf.service failed because the service did not take the steps required by its unit configuration"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-paunch
Version: 15.0 (Stein)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 15.0 (Stein)
Assignee: Steve Baker
QA Contact: nlevinki
URL:
Whiteboard:
Duplicates: 1743402 1744675
Depends On:
Blocks: 1690784 1737456
Reported: 2019-08-16 15:08 UTC by Eliad Cohen
Modified: 2019-11-11 20:31 UTC (History)

Fixed In Version: python-paunch-4.5.1-0.20190829080435.f9349e0.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:24:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Undercloud files minus var folder (4.36 MB, application/x-xz)
2019-08-16 15:08 UTC, Eliad Cohen
undercloud var folder (15.92 MB, application/x-xz)
2019-08-16 15:09 UTC, Eliad Cohen
controller files (6.55 MB, application/x-xz)
2019-08-16 15:10 UTC, Eliad Cohen


Links
System | ID | Private | Priority | Status | Summary | Last Updated
OpenStack gerrit | 679073 | 0 | 'None' | MERGED | Revert "Handle defined containers that are stopped." | 2020-06-04 20:39:24 UTC
Red Hat Product Errata | RHEA-2019:2811 | 0 | None | None | None | 2019-09-21 11:25:00 UTC

Description Eliad Cohen 2019-08-16 15:08:46 UTC
Created attachment 1604646 [details]
Undercloud files minus var folder

Description of problem:
When performing a scale up, the process terminates with error [1] on all controller nodes.

Version-Release number of selected component (if applicable):
OSP15 core_puddle: RHOS_TRUNK-15.0-RHEL-8-20190813.n.0
CEPH compose: ceph-4.0-rhel-8-containers-candidate-64389-20190813102853

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP with Ceph: 3 controller, 1 compute, and 1 Ceph node
2. Scale up to 3 controller, 2 compute, and 3 Ceph nodes respectively
3. The scale up script fails with the error

Actual results:
Scale up fails with error

Expected results:
Scale up should succeed

Additional info:
[1] http://pastebin.test.redhat.com/789335

Comment 1 Eliad Cohen 2019-08-16 15:09:58 UTC
Created attachment 1604649 [details]
undercloud var folder

Comment 2 Eliad Cohen 2019-08-16 15:10:40 UTC
Created attachment 1604653 [details]
controller files

Comment 4 Luca Miccini 2019-08-19 08:15:30 UTC
There seems to be an issue with the systemd unit file being generated (?). In our lab the unit is named "tripleo_memcached.service":

[root@controller-0 ~]# cat /etc/systemd/system/tripleo_memcached.service
[Unit]
Description=memcached container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start memcached
ExecStop=/usr/bin/podman stop -t 10 memcached
KillMode=none
Type=forking
PIDFile=/var/run/memcached.pid

[Install]
WantedBy=multi-user.target


whereas in the failing environment the unit name carries a random suffix:

$ cat tripleo_memcached-a9pap7zv.service
[Unit]
Description=memcached-a9pap7zv container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start memcached-a9pap7zv
ExecStop=/usr/bin/podman stop -t 10 memcached-a9pap7zv
KillMode=none
Type=forking
PIDFile=/var/run/memcached-a9pap7zv.pid

[Install]
WantedBy=multi-user.target
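
To make the difference between the two units explicit, here is a minimal sketch in Python. The `UNIT_TEMPLATE` string is only a reconstruction of the unit files quoted above, not paunch's actual template, and `changed_lines` is a hypothetical helper introduced for illustration. It renders the unit once with the expected container name and once with the suffixed name, then lists the lines that differ.

```python
# Reconstruction of the unit files quoted in this comment; NOT paunch's
# real template. Only the container name varies between the two units.
UNIT_TEMPLATE = """[Unit]
Description={name} container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start {name}
ExecStop=/usr/bin/podman stop -t 10 {name}
KillMode=none
Type=forking
PIDFile=/var/run/{name}.pid

[Install]
WantedBy=multi-user.target
"""


def changed_lines(expected: str, actual: str):
    """Return (expected_line, actual_line) pairs that differ (hypothetical helper)."""
    pairs = zip(expected.splitlines(), actual.splitlines())
    return [(e, a) for e, a in pairs if e != a]


expected_unit = UNIT_TEMPLATE.format(name="memcached")
actual_unit = UNIT_TEMPLATE.format(name="memcached-a9pap7zv")

for exp, act in changed_lines(expected_unit, actual_unit):
    print(f"- {exp}\n+ {act}")
```

Under these assumptions the units differ only in the lines that embed the container name (Description, ExecStart, ExecStop, PIDFile), which points at the container name rather than at the unit layout itself.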

Comment 5 Luca Miccini 2019-08-19 09:48:04 UTC
The issue is that paunch/podman is creating the container with a bogus name:

        "Start container memcached.",
        "$ podman create --name memcached-a9pap7zv --label config_id=tripleo_step1 --label container_name=memcached --label ...


Will try to reproduce.
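
The mismatch in the logged command can also be checked mechanically. The following is a rough sketch, not part of paunch: `name_mismatch` is a hypothetical helper, and the sample command contains only the portion of the `podman create` line quoted above (the rest is truncated in the log). It compares the value passed to `--name` with the `container_name` label.

```python
import shlex


def name_mismatch(create_cmd: str):
    """Return (actual_name, expected_name) if --name differs from the
    container_name label, else None. Hypothetical helper for illustration."""
    args = shlex.split(create_cmd)
    actual = None
    expected = None
    # Walk option/value pairs; the last token can never be an option
    # that still has a value after it.
    for i, arg in enumerate(args[:-1]):
        value = args[i + 1]
        if arg == "--name":
            actual = value
        elif arg == "--label" and value.startswith("container_name="):
            expected = value.split("=", 1)[1]
    if actual is not None and expected is not None and actual != expected:
        return actual, expected
    return None


# Only the part of the command visible in the log; the remainder is elided there.
cmd = (
    "podman create --name memcached-a9pap7zv "
    "--label config_id=tripleo_step1 --label container_name=memcached"
)
print(name_mismatch(cmd))
```

A non-None result means the container was created under a name other than the one recorded in its `container_name` label, which is the symptom seen here.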

Comment 6 Luca Miccini 2019-08-19 12:42:35 UTC
Reproduced and spent some time with Michele trying to figure it out.

This looks like the same issue as in:

https://bugs.launchpad.net/tripleo/+bug/1839929

which is fixed in Stein by:

https://review.opendev.org/#/c/676984/

The paunch version in the latest puddle does not include the patch above.

Comment 7 Michele Baldessari 2019-08-20 05:40:55 UTC
*** Bug 1743402 has been marked as a duplicate of this bug. ***

Comment 9 Michele Baldessari 2019-08-22 17:02:33 UTC
*** Bug 1744675 has been marked as a duplicate of this bug. ***

Comment 15 Marius Cornea 2019-08-30 23:17:30 UTC
[root@controller-0 heat-admin]# rpm -q python3-paunch
python3-paunch-4.5.1-0.20190829080435.f9349e0.el8ost.noarch

Scale out completed successfully.

Comment 19 errata-xmlrpc 2019-09-21 11:24:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

