Description of problem:

If Gnocchi is unable to create a metric directory (mkdir) on the backend, it leaves its connection to Redis open. In this environment the Gnocchi backend has a limit of 65532 files (we know we can work around this by deleting expired metrics), so gnocchi-carbonara is unable to create the metric directory:

~~~
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara [-] Error processing new measures
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara Traceback (most recent call last):
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/gnocchi/storage/_carbonara.py", line 538, in process_new_measures
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara     self._create_metric(metric)
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/gnocchi/storage/file.py", line 108, in _create_metric
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara     os.mkdir(path, 0o750)
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara OSError: [Errno 5] Input/output error: '/var/lib/gnocchi/afba56af-9fc7-4778-ab13-c65ff8a24688'
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara
~~~

When this happens, Gnocchi leaves its connection to Redis open instead of closing it, and eventually Redis suffers from resource starvation. As the following example shows, roughly 75% of the established connections through haproxy go to Redis (port 6379). At some point this is going to blow up and impact other services as well.

~~~
# ss -tanp | grep haproxy | grep -oP "^ESTAB .*192.168.1.(11|28):\K([0-9]+)" | sort | uniq -c | sort -nr | head -1
   6131 6379
# ss -tanp | grep haproxy | wc -l
8245
~~~

Version-Release number of selected component (if applicable):
haproxy-1.5.18-6.el7.x86_64
openstack-gnocchi-api-3.0.15-1.el7ost.noarch
openstack-gnocchi-carbonara-3.0.15-1.el7ost.noarch
openstack-gnocchi-common-3.0.15-1.el7ost.noarch
openstack-gnocchi-indexer-sqlalchemy-3.0.15-1.el7ost.noarch
openstack-gnocchi-metricd-3.0.15-1.el7ost.noarch
openstack-gnocchi-statsd-3.0.15-1.el7ost.noarch
puppet-gnocchi-9.5.0-3.el7ost.noarch
puppet-haproxy-1.5.0-3.f8c5f27git.el7ost.noarch
puppet-redis-1.2.3-2.el7ost.noarch
python-gnocchi-3.0.15-1.el7ost.noarch
python-gnocchiclient-2.8.2-2.el7ost.noarch
python-redis-2.10.3-3.el7ost.noarch
redis-3.0.6-2.el7ost.x86_64

How reproducible:
All the time. It takes ~48h in this environment to break.

Steps to Reproduce:
1. Gnocchi uses file storage
2. The storage backend is unreliable
3. Wait ~48h

Actual results:
Connections to Redis pile up.

Expected results:
Connections should get closed.

Additional info:
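For illustration only, a minimal hypothetical sketch of the failure mode being reported. This is not Gnocchi's actual code, and the host/port are placeholders; it just shows how an exception escaping before cleanup strands one Redis socket per failed attempt:

~~~
# Hypothetical minimal sketch of the leak pattern described above -- this is
# NOT Gnocchi's actual code, just an illustration of how an exception escaping
# before cleanup strands a Redis connection on every failed attempt.
import os
import redis  # python-redis, as shipped in this environment


def process_new_measures(path):
    client = redis.StrictRedis(host="127.0.0.1", port=6379)  # placeholder address
    client.ping()  # opens the TCP connection
    try:
        os.mkdir(path, 0o750)  # raises OSError (EIO) on the broken backend
        # ... coordination / aggregation work would happen here ...
    finally:
        # Without this finally block, each OSError leaves an ESTABLISHED
        # socket behind, which is how the haproxy connection count piles up.
        client.connection_pool.disconnect()
~~~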
This does not make any sense to me for now. There should be one connection to Redis per running metricd processor, and one per metricd scheduler, and that's all. The exception you see is caught and handled, and the lock is then released on the Redis side. There's no reconnection or anything to Redis at this point, so I'm struggling to see what the source of this might be. Are you sure that Gnocchi is the source? Redis is also used by e.g. Ceilometer, FWIW.
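One way to check which service actually owns the connections: a quick attribution sketch (assuming psutil is installed and the script runs as root so socket PIDs are visible) that counts established TCP connections to the Redis port per owning process name. That should show whether metricd, Ceilometer, or something else holds the ~6000 connections:

~~~
# Diagnostic sketch (assumptions: psutil installed, run as root so PIDs on
# sockets are visible). Counts ESTABLISHED TCP connections to the Redis port,
# grouped by the name of the owning process.
import collections
import psutil

REDIS_PORT = 6379  # matches the port seen in the ss output above

counts = collections.Counter()
for conn in psutil.net_connections(kind="tcp"):
    if conn.status != psutil.CONN_ESTABLISHED or not conn.raddr:
        continue
    if conn.raddr.port != REDIS_PORT:
        continue
    try:
        name = psutil.Process(conn.pid).name() if conn.pid else "unknown"
    except psutil.NoSuchProcess:
        name = "exited"
    counts[name] += 1

for name, count in counts.most_common():
    print("%6d  %s" % (count, name))
~~~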
3.0.21 released upstream with the fix. Need a rebase. Prad? :)
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2671