Bug 2028413

Summary: After UC FFU OSP13 to 16.2.0 tripleo_memcached_healthcheck.service fails
Product: Red Hat OpenStack Reporter: Amedeo Salvati <asalvati>
Component: openstack-tripleo-common Assignee: Adriano Petrich <apetrich>
Status: CLOSED DUPLICATE QA Contact: David Rosenfeld <drosenfe>
Severity: medium Docs Contact:
Priority: medium    
Version: 16.2 (Train) CC: apevec, asalvati, aschultz, jschluet, lhh, lmiccini, mburns, ravsingh, slinaber
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-11 06:43:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Amedeo Salvati 2021-12-02 11:03:20 UTC
Description of problem:
After an undercloud (UC) fast forward upgrade (FFU) to 16.2.0, the tripleo service tripleo_memcached_healthcheck.service fails with the error: "healthcheck_memcached[132238]: /usr/share/openstack-tripleo-common/healthcheck/common.sh: line 160: $1: unbound variable" [1].

The script /usr/share/openstack-tripleo-common/healthcheck/memcached calls the wrap_ipv6 function on the address that awk extracts from the /etc/sysconfig/memcached file:

 [root@undercloud sysconfig]# cat /usr/share/openstack-tripleo-common/healthcheck/memcached 
#!/bin/bash
. ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh

listen_addr=$(wrap_ipv6 $(awk 'match($0, /-l +([0-9a-fA-F\.\:]+) /, a) {print a[1]}' /etc/sysconfig/memcached))

echo "version" | socat - TCP:$listen_addr:11211 1>/dev/null
exit $?

However, running that awk command against the generated config returns an empty string:

 [root@undercloud ~]# awk 'match($0, /-l +([0-9a-fA-F\.\:]+) /, a) {print a[1]}' /var/lib/config-data/puppet-generated/memcached/etc/sysconfig/memcached 
 [root@undercloud ~]# cat /var/lib/config-data/puppet-generated/memcached/etc/sysconfig/memcached 
PORT="11211"
USER="memcached"
MAXCONN="8192"
CACHESIZE="15979"
OPTIONS="-v -l 127.0.0.1,192.168.201.30 -U 0 -X -t 4"

The old config, by contrast, has a single listen address followed by a space, so the same awk returns the loopback address:

 [root@undercloud ~]# cat /etc/sysconfig/memcached.rpmsave 
PORT="11211"
USER="memcached"
MAXCONN="8192"
CACHESIZE="11975"
OPTIONS="-v -l 127.0.0.1 -U 0 -X -t 4 >> /var/log/memcached.log 2>&1"

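IMO the awk could be fixed by replacing the mandatory trailing space with "\s*", so that the match no longer fails when the first address is followed by a comma: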

 [root@undercloud ~]# awk 'match($0, /-l +([0-9a-fA-F\.\:]+)\s*/, a) {print a[1]}' /var/lib/config-data/puppet-generated/memcached/etc/sysconfig/memcached 
127.0.0.1

 [root@undercloud healthcheck]# diff -u /usr/share/openstack-tripleo-common/healthcheck/memcached.ORIG /usr/share/openstack-tripleo-common/healthcheck/memcached
--- /usr/share/openstack-tripleo-common/healthcheck/memcached.ORIG      2021-05-15 12:48:09.000000000 +0000
+++ /usr/share/openstack-tripleo-common/healthcheck/memcached   2021-12-02 10:52:53.925922152 +0000
@@ -1,7 +1,7 @@
 #!/bin/bash
 . ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh
 
-listen_addr=$(wrap_ipv6 $(awk 'match($0, /-l +([0-9a-fA-F\.\:]+) /, a) {print a[1]}' /etc/sysconfig/memcached))
+listen_addr=$(wrap_ipv6 $(awk 'match($0, /-l +([0-9a-fA-F\.\:]+)\s*/, a) {print a[1]}' /etc/sysconfig/memcached))
 
 echo "version" | socat - TCP:$listen_addr:11211 1>/dev/null
 exit $?
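
The difference between the two patterns can be reproduced without awk. The following is a minimal sketch using bash's own regex matching rather than the gawk match() call from the healthcheck script; the helper name first_listen_addr and the variable names are hypothetical, and the backslashes inside the bracket expression are dropped because POSIX ERE treats them literally there.

```shell
#!/bin/bash
# Hypothetical helper (not part of tripleo-common): print the first capture
# group of regex $1 matched against line $2, or nothing if there is no match.
first_listen_addr() {
    local re=$1 line=$2
    if [[ $line =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# Pattern equivalent to the shipped one: the address must be followed by a space.
orig_re='-l +([0-9a-fA-F.:]+) '
# Pattern equivalent to the proposed one: trailing whitespace is optional.
fixed_re='-l +([0-9a-fA-F.:]+)[[:space:]]*'

# OPTIONS lines taken from the two config files quoted in this report.
single='OPTIONS="-v -l 127.0.0.1 -U 0 -X -t 4"'
multi='OPTIONS="-v -l 127.0.0.1,192.168.201.30 -U 0 -X -t 4"'

echo "orig/single:  '$(first_listen_addr "$orig_re" "$single")'"
echo "orig/multi:   '$(first_listen_addr "$orig_re" "$multi")'"
echo "fixed/multi:  '$(first_listen_addr "$fixed_re" "$multi")'"
```

With the comma-separated list, the original pattern finds no space after any prefix of the first address, so nothing is captured; the relaxed pattern captures only the first address, which is enough for the socat probe.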


[1] 
 [root@undercloud sysconfig]# systemctl status tripleo_memcached_healthcheck.service
● tripleo_memcached_healthcheck.service - memcached healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_memcached_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2021-12-02 10:35:17 UTC; 2s ago
  Process: 153371 ExecStart=/usr/bin/podman exec --user root memcached /openstack/healthcheck (code=exited, status=1/FAILURE)
 Main PID: 153371 (code=exited, status=1/FAILURE)

Dec 02 10:35:16 undercloud.example.com systemd[1]: Starting memcached healthcheck...
Dec 02 10:35:17 undercloud.example.com podman[153371]: 2021-12-02 10:35:17.365222614 +0000 UTC m=+0.523695166 container exec 2c1b6d0fff654efbbeaa6d4797225830997f0886108796e9514036b33443f24f (image=undercloud.ctlplane.example.com:8787/r>
Dec 02 10:35:17 undercloud.example.com healthcheck_memcached[153371]: /usr/share/openstack-tripleo-common/healthcheck/common.sh: line 160: $1: unbound variable
Dec 02 10:35:17 undercloud.example.com systemd[1]: tripleo_memcached_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Dec 02 10:35:17 undercloud.example.com systemd[1]: tripleo_memcached_healthcheck.service: Failed with result 'exit-code'.
Dec 02 10:35:17 undercloud.example.com systemd[1]: Failed to start memcached healthcheck.
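
The "unbound variable" message above is consistent with wrap_ipv6 being called with no argument while the nounset option is active. The following is a minimal sketch of that failure mode, assuming nounset is enabled in the healthcheck environment; wrap_ipv6_sketch is a hypothetical stand-in and not the code from common.sh.

```shell
#!/bin/bash
# Hypothetical stand-in for wrap_ipv6: bracket an IPv6 address, pass
# anything else through unchanged. The real function lives in common.sh.
wrap_ipv6_sketch() {
    local ip=$1
    if [[ $ip == *:* && $ip != \[*\] ]]; then
        printf '[%s]\n' "$ip"
    else
        printf '%s\n' "$ip"
    fi
}

# Mimic the healthcheck call when the address extraction yields nothing:
# the unquoted empty expansion disappears, so the function gets no $1.
# A subshell is used so that the nounset failure does not abort the caller.
call_with_empty_extraction() {
    ( set -u; empty=""; wrap_ipv6_sketch $empty ) 2>&1
}

if ! msg=$(call_with_empty_extraction); then
    echo "call failed: $msg"
fi
```

Because the command substitution in the healthcheck script is unquoted, an empty awk result produces zero arguments rather than one empty argument, which is why the error is reported inside common.sh instead of at the awk call.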

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. UC FFU from 13 to 16.2.0
2. systemctl status tripleo_memcached_healthcheck.service
3.

Actual results:
Service tripleo_memcached_healthcheck.service fails with error:
healthcheck_memcached[201289]: /usr/share/openstack-tripleo-common/healthcheck/common.sh: line 160: $1: unbound variable


Expected results:
Service tripleo_memcached_healthcheck.service is able to check memcached status


Additional info:

Comment 1 Lon Hohberger 2022-01-10 17:57:25 UTC
healthcheck_memcached[201289]: /usr/share/openstack-tripleo-common/healthcheck/common.sh: line 160: $1: unbound variable

https://opendev.org/openstack/tripleo-common/src/branch/stable/train/healthcheck/common.sh#L160
->
https://opendev.org/openstack/tripleo-common/src/branch/stable/train/healthcheck/memcached

This does not appear to be the memcached component failing; rather, it appears to be a bug in the healthcheck script in tripleo-common.

Comment 2 Alex Schultz 2022-01-10 22:58:30 UTC
https://bugs.launchpad.net/tripleo/+bug/1929881
https://review.opendev.org/c/openstack/tripleo-common/+/794736

It's likely fixed in a subsequent 16.2.x zstream.

Comment 3 Luca Miccini 2022-01-11 06:43:50 UTC

*** This bug has been marked as a duplicate of bug 1961321 ***