Bug 1737911

Summary: [UPDATE] gnocchi update fails with connection error
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Reporter: Raviv Bar-Tal <rbartal>
Assignee: RHOS Maint <rhos-maint>
QA Contact: Sasha Smolyak <ssmolyak>
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Target Milestone: z9
Target Release: ---
Keywords: Regression, Triaged, ZStream
Hardware: Unspecified
OS: Unspecified
CC: amichel, apevec, jjoyce, jschluet, lhh, lmiccini, mburns, mrunge, pkilambi
Bug Blocks: 1748364
Last Closed: 2019-09-03 14:20:19 UTC
Type: Bug

Attachments:
- Update log
- redis log

Description Raviv Bar-Tal 2019-08-06 11:31:29 UTC
Description of problem:
OSP13 update from z5 to passed_phase3 (currently 2019.07.23.1) fails with the following error:

2019-08-05 11:19:29 |         "2019-08-05 15:17:48,688 [22] INFO     gnocchi.cli.manage: Upgrading incoming storage RedisStorage: StrictRedis<ConnectionPool<Connection<host=
172.17.1.13,port=6379,db=0>>>", 
2019-08-05 11:19:29 |         "2019-08-05 15:17:48,694 [22] CRITICAL root: Traceback (most recent call last):", 
2019-08-05 11:19:29 |         "  File \"/usr/bin/gnocchi-upgrade\", line 10, in <module>", 
2019-08-05 11:19:29 |         "    sys.exit(upgrade())", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/gnocchi/cli/manage.py\", line 73, in upgrade", 
2019-08-05 11:19:29 |         "    i.upgrade(conf.sacks_number)", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/gnocchi/incoming/__init__.py\", line 71, in upgrade", 
2019-08-05 11:19:29 |         "    self.set_storage_settings(num_sacks)", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/gnocchi/incoming/redis.py\", line 38, in set_storage_settings", 
2019-08-05 11:19:29 |         "    self._client.hset(self.CFG_PREFIX, self.CFG_SACKS, num_sacks)", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/client.py\", line 1853, in hset", 
2019-08-05 11:19:29 |         "    return self.execute_command('HSET', name, key, value)", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/client.py\", line 570, in execute_command", 
2019-08-05 11:19:29 |         "    connection.send_command(*args)", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 556, in send_command", 
2019-08-05 11:19:29 |         "    self.send_packed_command(self.pack_command(*args))", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 532, in send_packed_command", 
2019-08-05 11:19:29 |         "    self.connect()", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 440, in connect", 
2019-08-05 11:19:29 |         "    self.on_connect()", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 508, in on_connect", 
2019-08-05 11:19:29 |         "    if nativestr(self.read_response()) != 'OK':", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 569, in read_response", 
2019-08-05 11:19:29 |         "    response = self._parser.read_response()", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 224, in read_response", 
2019-08-05 11:19:29 |         "    response = self._buffer.readline()", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 162, in readline", 
2019-08-05 11:19:29 |         "    self._read_from_socket()", 
2019-08-05 11:19:29 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 137, in _read_from_socket", 
2019-08-05 11:19:29 |         "    (e.args,))", 
2019-08-05 11:19:29 |         "ConnectionError: Error while reading from socket: ('Connection closed by server.',)", 
2019-08-05 11:19:29 |         "stdout: Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=US-ASCII", 
2019-08-05 11:19:29 |         "Debug: Evicting cache entry for environment 'production'", 
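
A quick way to confirm whether the Redis endpoint gnocchi-upgrade is talking to accepts connections at all (a minimal sketch; 172.17.1.13:6379 is the host/port printed in the traceback above, and redis-cli must be available on the controller or run from the redis container):

# Probe the Redis endpoint from the traceback. A PONG (or even a NOAUTH
# error, if requirepass is set) shows the TCP endpoint is up; a refused or
# immediately closed connection matches the ConnectionError above.
redis-cli -h 172.17.1.13 -p 6379 ping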

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. install osp13 z6
2. run update to passed_phase2 (currently 2019-07-23.1)
3. 

Actual results:
Controller update fails

Expected results:


Additional info:

Comment 1 Raviv Bar-Tal 2019-08-06 11:35:26 UTC
Created attachment 1600954 [details]
Update log

Comment 3 Matthias Runge 2019-08-19 14:47:49 UTC
Could you please provide redis logs?

Comment 5 Raviv Bar-Tal 2019-09-02 09:49:38 UTC
Redis log from controller-0 is attached

More logs can be found in https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-upgrades-updates-13-from-z7-HA-ipv4/3/artifact/

Comment 6 Raviv Bar-Tal 2019-09-02 09:51:46 UTC
Created attachment 1610729 [details]
redis log

Comment 7 Matthias Runge 2019-09-02 10:00:27 UTC
Okay, what I see is:
4:S 01 Sep 17:04:27.293 # Connection with slave 172.17.1.12:6379 lost.
84:S 01 Sep 17:04:27.293 # Connection with slave 172.17.1.21:6379 lost.
84:S 01 Sep 17:04:27.293 * SLAVE OF no-such-master:6379 enabled (user request from 'id=14978 addr=/var/run/redis/redis.sock:0 fd=7 name= age=0 idle=0 flags=U db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
84:S 01 Sep 17:04:28.253 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:28.257 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:29.259 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:29.260 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:30.262 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:30.264 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:31.248 * SLAVE OF would result into synchronization with the master we are already connected with. No operation performed.
84:S 01 Sep 17:04:31.269 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:31.270 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:32.273 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:32.273 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:33.277 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:33.277 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:34.278 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:34.279 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:35.281 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:35.282 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:36.285 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:36.285 # Unable to connect to MASTER: No such file or directory
84:S 01 Sep 17:04:37.289 * Connecting to MASTER no-such-master:6379
84:S 01 Sep 17:04:37.289 # Unable to connect to MASTER: No such file or directory
84:signal-handler (1567357477) Received SIGTERM scheduling shutdown...
84:S 01 Sep 17:04:37.790 # User requested shutdown...
84:S 01 Sep 17:04:37.790 * Saving the final RDB snapshot before exiting.
84:S 01 Sep 17:04:37.791 * DB saved on disk
84:S 01 Sep 17:04:37.792 * Removing the pid file.
84:S 01 Sep 17:04:37.792 * Removing the unix socket file.
84:S 01 Sep 17:04:37.792 # Redis is now ready to exit, bye bye...
[Redis 3.2.8 (00000000/0) 64-bit startup banner: running in standalone mode, Port: 6379, PID: 84]

84:M 01 Sep 17:08:44.497 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
84:M 01 Sep 17:08:44.497 # Server started, Redis version 3.2.8
84:M 01 Sep 17:08:44.497 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
84:M 01 Sep 17:08:44.497 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
84:M 01 Sep 17:08:44.498 * DB loaded from disk: 0.000 seconds
84:M 01 Sep 17:08:44.498 * The server is now ready to accept connections on port 6379
84:M 01 Sep 17:08:44.498 * The server is now ready to accept connections at /var/run/redis/redis.sock
84:S 01 Sep 17:08:46.162 * SLAVE OF controller-1:6379 enabled (user request from 'id=7 addr=/var/run/redis/redis.sock:0 fd=9 name= age=0 idle=0 flags=U db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=slaveof')
84:S 01 Sep 17:08:46.503 * Connecting to MASTER controller-1:6379
84:S 01 Sep 17:08:46.504 * MASTER <-> SLAVE sync started
84:S 01 Sep 17:08:46.504 * Non blocking connect for SYNC fired the event.
84:S 01 Sep 17:08:46.505 * Master replied to PING, replication can continue...
84:S 01 Sep 17:08:46.505 * Partial resynchronization not possible (no cached master)
84:S 01 Sep 17:08:46.507 * Full resync from master: 13fa3c3161e127343053bb4939df2dc85f76414f:323
84:S 01 Sep 17:08:46.554 * MASTER <-> SLAVE sync: receiving 1461 bytes from master
84:S 01 Sep 17:08:46.554 * MASTER <-> SLAVE sync: Flushing old data
84:S 01 Sep 17:08:46.554 * MASTER <-> SLAVE sync: Loading DB in memory
84:S 01 Sep 17:08:46.554 * MASTER <-> SLAVE sync: Finished with success



At the same time, in the HAProxy logs:
Sep  1 17:08:34 controller-0 haproxy[12]: Server redis/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 2 of tcp-check (send)", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.


Sep  1 17:08:33 controller-0 haproxy[11]: Proxy aodh started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy cinder started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy glance_api started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy gnocchi started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy haproxy.stats started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy heat_api started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy heat_cfn started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy horizon started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy keystone_admin started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy keystone_public started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy mysql started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy neutron started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy nova_metadata started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy nova_novncproxy started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy nova_osapi started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy nova_placement started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy panko started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy redis started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy sahara started.
Sep  1 17:08:33 controller-0 haproxy[11]: Proxy swift_proxy_server started.
Sep  1 17:08:33 controller-0 haproxy[11]: Backup Server mysql/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 2 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:08:34 controller-0 haproxy[12]: Server redis/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 2 of tcp-check (send)", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:08:44 controller-0 haproxy[12]: Server redis/controller-1.internalapi.redhat.local is DOWN, reason: Layer7 timeout, info: " at step 2 of tcp-check (send)", check duration: 10001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:08:44 controller-0 haproxy[12]: Server redis/controller-2.internalapi.redhat.local is DOWN, reason: Layer7 timeout, info: " at step 2 of tcp-check (send)", check duration: 10002ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:08:44 controller-0 haproxy[12]: proxy redis has no server available!
Sep  1 17:22:48 controller-0 haproxy[12]: Server keystone_admin/controller-0.ctlplane.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:22:48 controller-0 haproxy[12]: Server keystone_public/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:22:50 controller-0 haproxy[12]: Server horizon/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:23:38 controller-0 haproxy[12]: Server horizon/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 301, info: "Moved Permanently", check duration: 0ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:23:40 controller-0 haproxy[12]: Server keystone_admin/controller-0.ctlplane.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 5ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:23:40 controller-0 haproxy[12]: Server keystone_public/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 5ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:26:46 controller-0 haproxy[12]: Server nova_metadata/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:46 controller-0 haproxy[12]: Server swift_proxy_server/controller-0.storage.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:46 controller-0 haproxy[12]: Server glance_api/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:46 controller-0 haproxy[12]: Server panko/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:46 controller-0 haproxy[12]: Server nova_osapi/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:47 controller-0 haproxy[12]: Server nova_placement/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:47 controller-0 haproxy[12]: Server heat_cfn/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:48 controller-0 haproxy[12]: Server neutron/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:50 controller-0 haproxy[12]: Server sahara/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:52 controller-0 haproxy[12]: Server aodh/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:53 controller-0 haproxy[12]: Server nova_novncproxy/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:26:59 controller-0 haproxy[12]: Server nova_novncproxy/controller-0.internalapi.redhat.local is UP, reason: Layer4 check passed, check duration: 0ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:01 controller-0 haproxy[12]: Server aodh/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 3ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:05 controller-0 haproxy[12]: Server heat_cfn/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 1795ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:06 controller-0 haproxy[12]: Server sahara/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 2ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:09 controller-0 haproxy[12]: Server nova_placement/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 1ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:12 controller-0 haproxy[12]: Server swift_proxy_server/controller-0.storage.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 5ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:12 controller-0 haproxy[12]: Server neutron/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 6ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:12 controller-0 haproxy[12]: Server glance_api/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 5ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:17 controller-0 haproxy[12]: Server panko/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 1325ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:18 controller-0 haproxy[12]: Server nova_metadata/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 3ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:27:20 controller-0 haproxy[12]: Server nova_osapi/controller-0.internalapi.redhat.local is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 6ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Sep  1 17:28:27 controller-0 haproxy[12]: Server gnocchi/controller-0.internalapi.redhat.local is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Sep  1 17:38:27 controller-0 haproxy[12]: Server gnocchi/controller-1.internalapi.redhat.local is DOWN, reason: Layer7 timeout, check duration: 10000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
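
A quick way to check what role each controller's Redis thinks it has at that point (a sketch; the unix socket path is the one shown in the redis log above, and if requirepass is set redis-cli also needs -a <password> with the value from the node's redis configuration):

# Ask the local Redis instance for its replication state on each controller.
redis-cli -s /var/run/redis/redis.sock info replication
# "role:slave" together with "master_host:no-such-master" would match the
# log above: the instance was demoted and points at a placeholder master.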

Comment 8 Luca Miccini 2019-09-03 10:42:48 UTC
db_sync fails:

[root@controller-0 gnocchi]# docker ps -a |grep "(1)"
91fe53111859        192.168.24.1:8787/rhosp13/openstack-gnocchi-api:2019-08-19.1                 "kolla_start"            7 minutes ago       Exited (1) 7 minutes ago                        gnocchi_db_sync

because redis is down:

redis,FRONTEND,,,0,0,4096,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,19,0,,,,0,0,0,0,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
redis,controller-0.internalapi.redhat.local,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,1294,1294,,1,19,1,,0,,2,0,,0,L7TOUT,,10001,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
redis,controller-1.internalapi.redhat.local,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,1284,1284,,1,19,2,,0,,2,0,,0,L7TOUT,,10001,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
redis,controller-2.internalapi.redhat.local,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,1284,1284,,1,19,3,,0,,2,0,,0,L7TOUT,,10001,,,,,,,0,,,,0,0,,,,,-1,,,0,0,0,0,
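
For reference, backend state like the CSV above can be dumped live from HAProxy's admin socket (a sketch; the socket path is an assumption and depends on the stats socket configured in haproxy.cfg for this deployment):

# Dump HAProxy runtime stats (CSV) and keep only the redis backend rows.
echo "show stat" | socat stdio UNIX-CONNECT:/var/lib/haproxy/stats | grep '^redis'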

config timestamp is newer than the container start time:

[root@controller-0 gnocchi]# ll /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg 
-rw-r-----. 1 root root 13354 Sep  3 10:19 /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg

[root@controller-0 gnocchi]# docker inspect c717c4cd2174 |grep -i start
            "/usr/local/bin/kolla_start"
            "Restarting": false,
            "StartedAt": "2019-09-03T10:15:41.760521542Z",
        "RestartCount": 0,
            "RestartPolicy": {
                "/usr/local/bin/kolla_start"

To me this looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1745857.

Comment 9 Luca Miccini 2019-09-03 10:52:39 UTC
btw with z7 you would also hit https://bugzilla.redhat.com/show_bug.cgi?id=1730723 (fixed by puppet-tripleo-8.4.1-20.el7ost).

Comment 10 Raviv Bar-Tal 2019-09-03 12:33:17 UTC
Same behaviour in OSP14:
2019-09-02 13:15:39 |         "2019-09-02 17:06:12,739 [24] INFO     gnocchi.common.ceph: Ceph storage backend use 'cradox' python library", 
2019-09-02 13:15:39 |         "2019-09-02 17:06:12,781 [24] INFO     gnocchi.cli.manage: Upgrading storage CephStorage: c5278b74-cd6e-11e9-89a5-52540024bdc1", 
2019-09-02 13:15:39 |         "2019-09-02 17:06:12,784 [24] INFO     gnocchi.cli.manage: Upgrading incoming storage RedisStorage: StrictRedis<ConnectionPool<Connection<host=
172.17.1.14,port=6379,db=0>>>", 
2019-09-02 13:15:39 |         "2019-09-02 17:06:12,788 [24] CRITICAL root: Traceback (most recent call last):", 
2019-09-02 13:15:39 |         "  File \"/usr/bin/gnocchi-upgrade\", line 11, in <module>", 
2019-09-02 13:15:39 |         "    load_entry_point('gnocchi==4.3.3.dev4', 'console_scripts', 'gnocchi-upgrade')()", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/gnocchi/cli/manage.py\", line 77, in upgrade", 
2019-09-02 13:15:39 |         "    i.upgrade(conf.sacks_number)", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/gnocchi/incoming/__init__.py\", line 129, in upgrade", 
2019-09-02 13:15:39 |         "    self.set_storage_settings(num_sacks)", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/gnocchi/incoming/redis.py\", line 69, in set_storage_settings", 
2019-09-02 13:15:39 |         "    self._client.hset(self.CFG_PREFIX, self.CFG_SACKS, num_sacks)", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/client.py\", line 1853, in hset", 
2019-09-02 13:15:39 |         "    return self.execute_command('HSET', name, key, value)", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/client.py\", line 570, in execute_command", 
2019-09-02 13:15:39 |         "    connection.send_command(*args)", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 556, in send_command", 
2019-09-02 13:15:39 |         "    self.send_packed_command(self.pack_command(*args))", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 532, in send_packed_command", 
2019-09-02 13:15:39 |         "    self.connect()", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 440, in connect", 
2019-09-02 13:15:39 |         "    self.on_connect()", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 508, in on_connect", 
2019-09-02 13:15:39 |         "    if nativestr(self.read_response()) != 'OK':", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 569, in read_response", 
2019-09-02 13:15:39 |         "    response = self._parser.read_response()", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 224, in read_response", 
2019-09-02 13:15:39 |         "    response = self._buffer.readline()", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 162, in readline", 
2019-09-02 13:15:39 |         "    self._read_from_socket()", 
2019-09-02 13:15:39 |         "  File \"/usr/lib/python2.7/site-packages/redis/connection.py\", line 137, in _read_from_socket", 
2019-09-02 13:15:39 |         "    (e.args,))", 
2019-09-02 13:15:39 |         "ConnectionError: Error while reading from socket: ('Connection closed by server.',)", 
2019-09-02 13:15:39 |         "stdout: Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=US-ASCII", 
2019-09-02 13:15:39 |         "Debug: Evicting cache entry for environment 'production'", 
2019-09-02 13:15:39 |         "Debug: Caching environment 'production' (ttl = 0 sec)", 
2019-09-02 13:15:39 |         "Debug: Loading external facts from /etc/puppet/modules/openstacklib/facts.d",

Comment 12 Luca Miccini 2019-09-03 14:20:19 UTC

*** This bug has been marked as a duplicate of bug 1745857 ***