Bug 1562121 - When using the file driver and gnocchi fails to mkdir, the connection to redis is not dropped
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-gnocchi
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z9
Target Release: 10.0 (Newton)
Assigned To: Julien Danjou
QA Contact: Sasha Smolyak
Keywords: Triaged, ZStream
Reported: 2018-03-29 10:52 EDT by David Vallee Delisle
Modified: 2018-09-17 13:00 EDT (History)
Fixed In Version: openstack-gnocchi-3.0.21-1.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, Gnocchi attempted to create a storage directory on every startup, even if the storage directory already existed. Gnocchi failed to start if the directory creation failed. With this update, gnocchi-upgrade creates the storage directory only once. As a result, Gnocchi starts successfully.
Last Closed: 2018-09-17 12:59:16 EDT
Type: Bug
dvd: needinfo+


External Trackers
Tracker                 ID                           Priority  Status  Summary  Last Updated
RDO                     14078                        None      None    None     2018-06-05 18:04 EDT
Github                  gnocchixyz/gnocchi/pull/893  None      None    None     2018-05-21 07:46 EDT
Red Hat Product Errata  RHBA-2018:2671               None      None    None     2018-09-17 13:00 EDT

Description David Vallee Delisle 2018-03-29 10:52:25 EDT
Description of problem:
If gnocchi is unable to create the metric directory (mkdir) on the backend, it leaves the connection to redis open.

There's a situation right now where the gnocchi backend has a limit of 65532 files (we know we can solve this by deleting expired metrics), so gnocchi-carbonara is unable to create the directory:
~~~
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara [-] Error processing new measures
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara Traceback (most recent call last):
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/gnocchi/storage/_carbonara.py", line 538, in process_new_measures
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara     self._create_metric(metric)
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/gnocchi/storage/file.py", line 108, in _create_metric
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara     os.mkdir(path, 0o750)
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara OSError: [Errno 5] Input/output error: '/var/lib/gnocchi/afba56af-9fc7-4778-ab13-c65ff8a24688'
2018-03-29 14:44:22.443 86371 ERROR gnocchi.storage._carbonara 
~~~
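
The traceback shows `os.mkdir` raising `OSError` with errno 5 (`EIO`) from the file driver. As a minimal sketch, and not Gnocchi's actual code, a directory-creation helper can treat "already exists" as benign while letting storage faults such as `EIO` propagate to the caller. The name `create_metric_dir` and its return convention are assumptions made for illustration.

```python
import errno
import os


def create_metric_dir(path, mode=0o750):
    """Create a per-metric directory (hypothetical helper, not Gnocchi's API).

    Returns True if the directory was created, False if it already existed.
    Any other OSError (for example errno.EIO, as in the traceback above)
    is re-raised so the caller can decide how to handle it.
    """
    try:
        os.mkdir(path, mode)
    except OSError as exc:
        # An existing directory is fine; anything else is a real failure.
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            return False
        raise
    return True
```

With this shape, an I/O error from the backend still reaches the caller, which is then responsible for releasing whatever resources it holds.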

When this happens, gnocchi leaves the connection to redis open instead of closing it, and at some point redis suffers from resource starvation.

As the following example shows, about 75% of the connections to haproxy are for redis (port 6379). At some point this is going to blow up and impact other services.
~~~
# ss -tanp | grep haproxy | grep -oP "^ESTAB .*192.168.1.(11|28):\K([0-9]+)" | sort | uniq -c | sort -nr | head -1
   6131 6379
# ss -tanp | grep haproxy | wc -l
8245
~~~
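
For anyone who wants to repeat this count without the shell pipeline, here is a rough Python sketch of the same idea. It uses the same pattern as the `grep` above, with the dots escaped and a capture group in place of `\K`. The function name `count_ports` and the sample lines are made up for illustration and are not data from this environment.

```python
import re
from collections import Counter

# Same pattern as the grep above, with the dots escaped and a capture
# group in place of \K (Python's re module does not support \K).
PORT_RE = re.compile(r"^ESTAB .*192\.168\.1\.(?:11|28):([0-9]+)")


def count_ports(lines):
    """Count established connections per port on the matched addresses."""
    counts = Counter()
    for line in lines:
        match = PORT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample in the column layout of `ss -tan`; not data from this bug.
SAMPLE = """\
ESTAB 0 0 192.168.1.28:6379 192.168.1.15:51234
ESTAB 0 0 192.168.1.28:6379 192.168.1.16:40022
ESTAB 0 0 192.168.1.11:3306 192.168.1.15:51300
LISTEN 0 128 192.168.1.28:6379 *:*
""".splitlines()

if __name__ == "__main__":
    # Print "count port" pairs, busiest port first.
    for port, n in count_ports(SAMPLE).most_common():
        print(n, port)
```

In practice the lines would come from the output of `ss -tanp | grep haproxy` rather than from `SAMPLE`.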

 

Version-Release number of selected component (if applicable):
haproxy-1.5.18-6.el7.x86_64
openstack-gnocchi-api-3.0.15-1.el7ost.noarch
openstack-gnocchi-carbonara-3.0.15-1.el7ost.noarch
openstack-gnocchi-common-3.0.15-1.el7ost.noarch
openstack-gnocchi-indexer-sqlalchemy-3.0.15-1.el7ost.noarch
openstack-gnocchi-metricd-3.0.15-1.el7ost.noarch
openstack-gnocchi-statsd-3.0.15-1.el7ost.noarch
puppet-gnocchi-9.5.0-3.el7ost.noarch
puppet-haproxy-1.5.0-3.f8c5f27git.el7ost.noarch
puppet-redis-1.2.3-2.el7ost.noarch
python-gnocchi-3.0.15-1.el7ost.noarch
python-gnocchiclient-2.8.2-2.el7ost.noarch
python-redis-2.10.3-3.el7ost.noarch
redis-3.0.6-2.el7ost.x86_64


How reproducible:
All the time. It takes ~48h in this environment to break.

Steps to Reproduce:
1. Configure Gnocchi to use the file storage driver
2. Make the storage backend unreliable, so that mkdir fails
3. Wait 48h

Actual results:
Connections to redis are piling up

Expected results:
Connections should get closed

Additional info:
Comment 1 Julien Danjou 2018-03-29 12:01:05 EDT
This does not make any sense to me for now.

There should be one connection to Redis per metricd processor running, and one per metricd scheduler, that's all. The exception you see is caught and handled, and the lock is then released on the Redis side. There's no reconnection or anything similar to Redis at this point.

So I'm struggling to see what the source of this might be. Are you sure that Gnocchi is the source of this?

Redis is also used by e.g. Ceilometer FWIW.
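
The behaviour described in this comment (one long-lived connection per worker, the exception caught, the lock released) can be sketched roughly as below. This is an illustration under stated assumptions, not Gnocchi's code: `Worker`, `connect` and `process` are hypothetical names, and a `threading.Lock` stands in for the Redis-backed lock.

```python
import threading


class Worker:
    """Hypothetical stand-in for a metricd processor; not Gnocchi's code."""

    def __init__(self, connect):
        # One long-lived connection per worker, opened once at startup.
        self.conn = connect()
        # Stands in for a Redis-backed lock.
        self.lock = threading.Lock()
        self.errors = 0

    def process(self, job):
        # The "with" block releases the lock even if job() raises.
        with self.lock:
            try:
                job()
            except OSError:
                # Counted and swallowed, in the spirit of the
                # "Error processing new measures" log entry.
                self.errors += 1
```

If a worker behaves like this, repeated mkdir failures should not open any new connections, which is why the connection growth reported here is surprising.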
Comment 20 Julien Danjou 2018-05-31 10:26:40 EDT
3.0.21 released upstream with the fix. Need a rebase. Prad? :)
Comment 33 Alex 2018-09-03 04:00:57 EDT
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex
Comment 35 errata-xmlrpc 2018-09-17 12:59:16 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2671
