so:
apparently, redis claims to be up and running.
haproxy can not connect to redis.

84:M 16 Jul 19:04:11.583 * The server is now ready to accept connections on port 6379
84:M 16 Jul 19:04:11.583 * The server is now ready to accept connections at /var/run/redis/redis.sock
84:S 16 Jul 19:04:13.352 * SLAVE OF no-such-master:6379 enabled (user request from 'id=4 addr=/var/run/redis/redis.sock:0 fd=6 name= age=0 idle=0 flags=U db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
84:S 16 Jul 19:04:13.586 * Connecting to MASTER no-such-master:6379
84:S 16 Jul 19:04:13.589 # Unable to connect to MASTER: No such file or directory
84:S 16 Jul 19:04:14.591 * Connecting to MASTER no-such-master:6379

Jul 16 19:17:14 controller-0 haproxy[12]: 172.17.1.16:37784 [16/Jul/2019:19:17:14.189] redis redis/<NOSRV> -1/-1/0 0 SC 52/0/0/0/0 0/0
Jul 16 19:17:14 controller-0 haproxy[12]: 172.17.1.16:37786 [16/Jul/2019:19:17:14.189] redis redis/<NOSRV> -1/-1/0 0 SC 52/0/0/0/0 0/0
Jul 16 19:17:14 controller-0 haproxy[12]: 172.17.1.23:33529 [16/Jul/2019:19:17:14.185] mysql mysql/controller-0.internalapi.localdomain 1/0/158 8792 -- 51/51/51/51/0 0/0
Jul 16 19:17:14 controller-0 haproxy[12]: 172.17.1.16:37796 [16/Jul/2019:19:17:14.599] redis redis/<NOSRV> -1/-1/0 0 SC 52/0/0/0/0 0/0
Jul 16 19:17:14 controller-0 haproxy[12]: 172.17.1.16:37798 [16/Jul/2019:19:17:14.599] redis redis/<NOSRV> -1/-1/0 0 SC 52/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.16:37816 [16/Jul/2019:19:17:15.047] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.16:37818 [16/Jul/2019:19:17:15.048] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.16:37820 [16/Jul/2019:19:17:15.049] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.16:37822 [16/Jul/2019:19:17:15.049] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.14:41586 [16/Jul/2019:19:17:15.253] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.14:41588 [16/Jul/2019:19:17:15.253] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.16:37826 [16/Jul/2019:19:17:15.432] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.16:37828 [16/Jul/2019:19:17:15.433] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.14:41590 [16/Jul/2019:19:17:15.471] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.14:41592 [16/Jul/2019:19:17:15.471] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
Jul 16 19:17:15 controller-0 haproxy[12]: 172.17.1.16:37846 [16/Jul/2019:19:17:15.663] redis redis/<NOSRV> -1/-1/0 0 SC 53/0/0/0/0 0/0
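As a quick sanity check on the redis side, the listener can be probed directly, both on the TCP port and on the unix socket mentioned in the log above. A minimal sketch, assuming redis-cli is available on the controller and reusing the controller-0 backend address and the <PASSWORD> placeholder from the haproxy configuration further down:

    # probe the TCP listener that haproxy balances to
    redis-cli -h 172.17.1.17 -p 6379 -a <PASSWORD> ping        # expect: PONG
    # probe the local unix socket from the redis log
    redis-cli -s /var/run/redis/redis.sock -a <PASSWORD> ping  # expect: PONG

If both answer PONG, the server itself is up and the problem is on the haproxy/healthcheck side.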
(In reply to Matthias Runge from comment #5)
> so:
> apparently, redis claims to be up and running.
> haproxy can not connect to redis.

we tracked this down to be an issue with haproxy 1.5.18-9 shipped in the latest containers, specifically this change:

  2019-02-27 Ryan O'Hara <rohara> - 1.5.18-9
  - Detect down servers with mutliple tcp-check connect rules (#1677420)

apparently this breaks the multiple tcp-check logic we use to detect if redis is up or not.
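To confirm which haproxy build a given controller is actually running (and whether it carries the changelog entry above), the RPM changelog can be queried inside the container. A sketch, assuming docker is the container runtime and the bundle container is named haproxy-bundle-docker-0 (the name may differ per deployment and runtime):

    docker exec haproxy-bundle-docker-0 rpm -q haproxy
    docker exec haproxy-bundle-docker-0 rpm -q --changelog haproxy | head -n 5

The first changelog entry in the output shows whether the 1.5.18-9 rebuild from 2019-02-27 is present in that container image.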
listen redis
  bind 172.17.1.23:6379 transparent
  balance first
  option tcp-check
  option tcplog
  tcp-check send AUTH\ <PASSWORD>\r\n
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  server controller-0.internalapi.localdomain 172.17.1.17:6379 check fall 5 inter 2000 on-marked-down shutdown-sessions rise 2
  server controller-1.internalapi.localdomain 172.17.1.15:6379 check fall 5 inter 2000 on-marked-down shutdown-sessions rise 2
  server controller-2.internalapi.localdomain 172.17.1.18:6379 check fall 5 inter 2000 on-marked-down shutdown-sessions rise 2

we reverted to the '2019-06-27.1' haproxy container and now it can correctly detect redis to be up:

[root@controller-2 haproxy]# echo "show info;show stat" | nc -U stats | grep ^redis
redis,FRONTEND,,,9,9,4096,9,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,19,0,,,,0,0,0,8,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
redis,controller-0.internalapi.localdomain,0,0,9,9,,9,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,21,0,,1,19,1,,9,,2,0,,8,L7OK,0,0,,,,,,,0,,,,0,0,,,,,19,(tcp-check),,0,0,0,0,
redis,controller-1.internalapi.localdomain,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,9,9,,1,19,2,,0,,2,0,,0,L7TOUT,,10001,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
redis,controller-2.internalapi.localdomain,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,9,9,,1,19,3,,0,,2,0,,0,L7TOUT,,10001,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
redis,BACKEND,0,0,9,9,410,9,0,0,0,0,,0,0,0,0,UP,1,1,0,,0,21,0,,1,19,0,,9,,1,0,,8,,,,,,,,,,,,,,0,0,0,0,0,0,19,,,0,0,0,0
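For reference, the send/expect sequence in the listen block can be replayed by hand against one of the backends to see what the healthcheck is actually matching. A sketch, again assuming redis-cli is available on the controller and substituting the real password for <PASSWORD>:

    redis-cli -h 172.17.1.17 -p 6379 -a <PASSWORD> ping
    # a healthy node answers PONG (the raw protocol reply is +PONG, which is what tcp-check matches)
    redis-cli -h 172.17.1.17 -p 6379 -a <PASSWORD> info replication | grep role:
    # only the promoted node reports role:master

Because the check requires role:master, only one backend being UP in the stats output above is the expected state for this proxy.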
As stated in rhbz#1677420, put a "tcp-check connect port <n>" statement before all of the send/expect statements. Alternatively, specify a healthcheck port on each server line. If that does not work, use tcpdump to check that the healthchecks are being made to the correct port.

I realize the docs are a bit unclear on this -- you have a port on the server line (e.g. 172.17.1.17:6379), so port 6379 should be used for the healthchecks unless "port" is specified. But you don't use "port" on the server line, so I am not sure your tcp-check knows which port to connect to.

I'm reluctant to blame the fix for rhbz#1677420, because that fix actually addressed a problem where tcp-check was not failing when it should. Now you're seeing failures with the fix in place, which makes me wonder whether your redis proxy was ever detecting failed healthchecks correctly.
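A minimal sketch of the two suggested variants, reusing the addresses and check options from the configuration above (an illustration of where the statements go, not a verified fix; only the controller-0 server line is shown, the other two are analogous):

    # variant 1: explicit connect before the send/expect sequence
    listen redis
      bind 172.17.1.23:6379 transparent
      balance first
      option tcp-check
      option tcplog
      tcp-check connect port 6379
      tcp-check send AUTH\ <PASSWORD>\r\n
      tcp-check send PING\r\n
      tcp-check expect string +PONG
      tcp-check send info\ replication\r\n
      tcp-check expect string role:master
      tcp-check send QUIT\r\n
      tcp-check expect string +OK
      server controller-0.internalapi.localdomain 172.17.1.17:6379 check fall 5 inter 2000 on-marked-down shutdown-sessions rise 2

    # variant 2: pin the healthcheck port on the server line instead
      server controller-0.internalapi.localdomain 172.17.1.17:6379 check port 6379 fall 5 inter 2000 on-marked-down shutdown-sessions rise 2

Either way makes the port used by the checks explicit, which appears to be the ambiguity the rhbz#1677420 change is sensitive to.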
The detection has worked fine in the past.

I added the connect stuff here: https://review.opendev.org/671526. The problem I see is that it will be hard/impossible to land all of this for all OSP releases *before* RHEL 7.7 comes out.
(In reply to Michele Baldessari from comment #9)
> The detection has worked fine in the past.
>
> I added the connect stuff here https://review.opendev.org/671526. The
> problem I see is that it will be hard/impossible to land all this stuff for
> all OSP releases *before* RHEL7.7 comes out

We should try to proactively backport this ASAP for OSP 13 and OSP 14, and possibly/probably OSP 10, although we have not reproduced it there yet. I notice that the upstream code review already has a +2 from Emilien.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2421
*** Bug 1751725 has been marked as a duplicate of this bug. ***