1742169 – Scale up fails on all controller nodes with error "Job for tripleo_memcached-dmtfeqpf.service failed because the service did not take the steps required by its unit configuration"


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	python-paunch
Sub Component:
Version:	15.0 (Stein)
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	15.0 (Stein)
Assignee:	Steve Baker
QA Contact:	nlevinki
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1743402 1744675
Depends On:
Blocks:	1690784 1737456

Reported:	2019-08-16 15:08 UTC by Eliad Cohen
Modified:	2019-11-11 20:31 UTC
CC List:	13 users
Fixed In Version:	python-paunch-4.5.1-0.20190829080435.f9349e0.el8ost
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-09-21 11:24:25 UTC
Target Upstream Version:
Embargoed:

Attachments
Undercloud files minus var folder (4.36 MB, application/x-xz) 2019-08-16 15:08 UTC, Eliad Cohen	no flags
undercloud var folder (15.92 MB, application/x-xz) 2019-08-16 15:09 UTC, Eliad Cohen	no flags
controller files (6.55 MB, application/x-xz) 2019-08-16 15:10 UTC, Eliad Cohen	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	679073	0	'None'	MERGED	Revert "Handle defined containers that are stopped."	2020-06-04 20:39:24 UTC
Red Hat Product Errata	RHEA-2019:2811	0	None	None	None	2019-09-21 11:25:00 UTC

Description Eliad Cohen 2019-08-16 15:08:46 UTC

Created attachment 1604646
Undercloud files minus var folder

Description of problem:
When performing a scale-up, the process terminates with error [1] on all controller nodes.

Version-Release number of selected component (if applicable):
OSP15 core_puddle: RHOS_TRUNK-15.0-RHEL-8-20190813.n.0
CEPH compose: ceph-4.0-rhel-8-containers-candidate-64389-20190813102853

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP with Ceph: 3 controller nodes, 1 compute node, and 1 Ceph node.
2. Scale up to 3 controller, 2 compute, and 3 Ceph nodes.
3. The scale-up script fails with error [1].

Actual results:
Scale-up fails with error [1].

Expected results:
Scale up should succeed

Additional info:
[1] http://pastebin.test.redhat.com/789335

Comment 1 Eliad Cohen 2019-08-16 15:09:58 UTC

Created attachment 1604649
undercloud var folder

Comment 2 Eliad Cohen 2019-08-16 15:10:40 UTC

Created attachment 1604653
controller files

Comment 3 Eliad Cohen 2019-08-16 15:11:00 UTC

Tested using builds 17 and 18 of: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/ceph/view/rhos/job/DFG-ceph-rhos-15_director-rhel-virthost-3cont_1_to_2comp_1_to_3ceph-ipv4-geneve-scale-up/

Comment 4 Luca Miccini 2019-08-19 08:15:30 UTC

There seems to be an issue with the systemd unit file being generated. In our lab the unit is named "tripleo_memcached.service":

[root@controller-0 ~]# cat /etc/systemd/system/tripleo_memcached.service
[Unit]
Description=memcached container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start memcached
ExecStop=/usr/bin/podman stop -t 10 memcached
KillMode=none
Type=forking
PIDFile=/var/run/memcached.pid

[Install]
WantedBy=multi-user.target


whereas in this environment the unit name, container name, and PID file all carry an extra suffix:

$ cat tripleo_memcached-a9pap7zv.service
[Unit]
Description=memcached-a9pap7zv container
After=paunch-container-shutdown.service
Wants=
[Service]
Restart=always
ExecStart=/usr/bin/podman start memcached-a9pap7zv
ExecStop=/usr/bin/podman stop -t 10 memcached-a9pap7zv
KillMode=none
Type=forking
PIDFile=/var/run/memcached-a9pap7zv.pid

[Install]
WantedBy=multi-user.target
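
To make the difference between the two unit files easier to spot across many nodes, here is a minimal Python sketch (not part of paunch) that flags unit file names carrying an extra suffix. The helper name find_suffixed_units and the pattern are assumptions for illustration: the pattern assumes the suffix is a hyphen followed by 8 lowercase letters or digits, which matches the two names seen in this report (a9pap7zv, dmtfeqpf) but is not confirmed to be how the names are generated.

```python
import re

# Assumption: the unexpected suffix is "-" plus 8 lowercase letters/digits,
# inferred only from the names in this report (a9pap7zv, dmtfeqpf).
SUFFIXED_UNIT = re.compile(
    r"^tripleo_(?P<name>.+)-(?P<suffix>[a-z0-9]{8})\.service$"
)


def find_suffixed_units(unit_files):
    """Return (unit_file, container_name, suffix) for names that look suffixed."""
    hits = []
    for unit in unit_files:
        match = SUFFIXED_UNIT.match(unit)
        if match:
            hits.append((unit, match.group("name"), match.group("suffix")))
    return hits


if __name__ == "__main__":
    units = ["tripleo_memcached.service", "tripleo_memcached-a9pap7zv.service"]
    for unit, name, suffix in find_suffixed_units(units):
        print(f"{unit}: expected tripleo_{name}.service (extra suffix {suffix})")
```

A container whose real name happens to end in a hyphen and eight such characters would be reported as well, so any hit should be checked by hand.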

Comment 5 Luca Miccini 2019-08-19 09:48:04 UTC

The issue is that paunch/podman create the container with a bogus name:

        "Start container memcached.",
        "$ podman create --name memcached-a9pap7zv --label config_id=tripleo_step1 --label container_name=memcached --label ...


I will try to reproduce.
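
The mismatch in the logged command above is between the value passed to --name and the container_name label. A small Python sketch of that comparison follows; it is only an illustration for scanning deployment logs, not paunch code, and the function name name_vs_label is hypothetical. The sample command contains only the arguments visible in the log line above; the elided remainder is left out.

```python
import shlex


def name_vs_label(command):
    """Extract the --name value and the container_name label from a podman command."""
    tokens = shlex.split(command)
    name = None
    label_name = None
    # Walk over (option, value) pairs; both options take the next token as value.
    for option, value in zip(tokens, tokens[1:]):
        if option == "--name":
            name = value
        elif option == "--label" and value.startswith("container_name="):
            label_name = value.split("=", 1)[1]
    return name, label_name


if __name__ == "__main__":
    # Only the arguments visible in the log line; the rest is elided there.
    cmd = (
        "podman create --name memcached-a9pap7zv "
        "--label config_id=tripleo_step1 --label container_name=memcached"
    )
    name, label_name = name_vs_label(cmd)
    if name != label_name:
        print(f"mismatch: --name {name} vs container_name label {label_name}")
```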

Comment 6 Luca Miccini 2019-08-19 12:42:35 UTC

Reproduced and spent some time with Michele trying to figure it out.

This looks like the same issue as:

https://bugs.launchpad.net/tripleo/+bug/1839929

which was fixed (stein) by:

https://review.opendev.org/#/c/676984/

The paunch version in the latest puddle does not include the patch above.

Comment 7 Michele Baldessari 2019-08-20 05:40:55 UTC

*** Bug 1743402 has been marked as a duplicate of this bug. ***

Comment 9 Michele Baldessari 2019-08-22 17:02:33 UTC

*** Bug 1744675 has been marked as a duplicate of this bug. ***

Comment 15 Marius Cornea 2019-08-30 23:17:30 UTC

[root@controller-0 heat-admin]# rpm -q python3-paunch
python3-paunch-4.5.1-0.20190829080435.f9349e0.el8ost.noarch

Scale out completed successfully.

Comment 19 errata-xmlrpc 2019-09-21 11:24:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811
