Bug 1479889 - measures not being flushed to swift - how to drop the backlog / drop the db content?
Status: CLOSED NOTABUG
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-gnocchi
Version: 10.0 (Newton)
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Pradeep Kilambi
QA Contact: Sasha Smolyak
Keywords: Triaged
Reported: 2017-08-09 12:19 EDT by Luca Miccini
Modified: 2017-09-01 04:24 EDT (History)
Last Closed: 2017-09-01 04:24:27 EDT
Type: Bug
Description Luca Miccini 2017-08-09 12:19:07 EDT
Description of problem:

The Gnocchi backlog is stuck with ~9k measures to process:

$ gnocchi status
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 367   |
| storage/total number of measures to process         | 8694  |
+-----------------------------------------------------+-------+

We suspect we ended up in this situation because of an issue with the underlying Swift (under investigation).

Swift has now been fixed, but metrics/measures are still not being flushed.

Our suspicion is that some of the containers no longer exist (from metricd.log):

2017-08-09 10:26:46.597 410023 ERROR swiftclient ClientException: Object DELETE failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e090e2-aac4-40d3-aa7f-3bb4bb10aefd/none_v3 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.621 410023 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-08-09 10:26:46.621 410023 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-08-09 10:26:46.621 410023 ERROR swiftclient [-] Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e090e2-aac4-40d3-aa7f-3bb4bb10aefd?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.621 410023 ERROR swiftclient ClientException: Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e090e2-aac4-40d3-aa7f-3bb4bb10aefd?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.634 410023 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-08-09 10:26:46.634 410023 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-08-09 10:26:46.634 410023 ERROR swiftclient [-] Object DELETE failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2/none_v3 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.634 410023 ERROR swiftclient ClientException: Object DELETE failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2/none_v3 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.659 410023 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-08-09 10:26:46.660 410023 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-08-09 10:26:46.660 410023 ERROR swiftclient [-] Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.660 410023 ERROR swiftclient ClientException: Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
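To check the suspicion above against the whole log rather than a short excerpt, the 404 lines can be reduced to a list of distinct container names. This is a minimal sketch, assuming log lines shaped like the excerpt above; the function name `missing_containers` and the regular expression are illustrative and not part of Gnocchi or swiftclient. Only "Container GET failed" lines are counted, since a 404 on an object DELETE alone does not prove the container is gone.

```python
import re

# Assumed URL shape, taken from the excerpt above:
#   .../v1/AUTH_<tenant>/<container>?format=json 404 Not Found
CONTAINER_404 = re.compile(
    r"/v1/AUTH_[0-9a-f]+/(?P<container>[^/?\s]+)\S*\s+404 Not Found"
)


def missing_containers(log_lines):
    """Return sorted, de-duplicated container names whose GET returned 404."""
    seen = set()
    for line in log_lines:
        # A failed container listing is the clearest sign of a missing container.
        if "Container GET failed" not in line:
            continue
        match = CONTAINER_404.search(line)
        if match:
            seen.add(match.group("container"))
    return sorted(seen)
```

It could be fed the log directly, for example `missing_containers(open(path_to_metricd_log))`, where `path_to_metricd_log` is wherever metricd.log lives on the controller.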


As the overcloud is almost empty, is there a way to drop the contents of the backlog (as well as the rest of the data in the Gnocchi Swift containers) and start from an empty Ceilometer/Gnocchi DB?


Version-Release number of selected component (if applicable):

openstack-gnocchi-api-3.0.6-2.el7ost.noarch                 
openstack-gnocchi-carbonara-3.0.6-2.el7ost.noarch           
openstack-gnocchi-common-3.0.6-2.el7ost.noarch              
openstack-gnocchi-indexer-sqlalchemy-3.0.6-2.el7ost.noarch  
openstack-gnocchi-metricd-3.0.6-2.el7ost.noarch             
openstack-gnocchi-statsd-3.0.6-2.el7ost.noarch             
puppet-gnocchi-9.5.0-1.el7ost.noarch                        
python-gnocchi-3.0.6-2.el7ost.noarch                       
python-gnocchiclient-2.8.2-2.el7ost.noarch                  


How reproducible:

unsure / not reproducible

Actual results:

Measures are not being saved.

Expected results:

Gnocchi recovers once the storage backend is functional again.

Comment 2 Luca Miccini 2017-08-10 08:29:59 EDT
Hi,

We managed to start from scratch by:

1. truncating the tables in the Gnocchi MySQL DB
2. cleaning the Swift content (actually wiping Swift clean by reformatting the drives)

The backlog is being processed now:


[stack@ospd-osp1-cs-01 ~]$ gnocchi status
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 487   |
| storage/total number of measures to process         | 487   |
+-----------------------------------------------------+-------+
...
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 228   |
| storage/total number of measures to process         | 228   |
+-----------------------------------------------------+-------+
...
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 0     |
| storage/total number of measures to process         | 0     |
+-----------------------------------------------------+-------+
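The repeated `gnocchi status` runs above amount to polling until both counters reach zero. As a minimal sketch, assuming the table layout shown above, the output can be parsed into a dictionary so the check is scriptable; `parse_status` and `backlog_drained` are hypothetical helper names, not part of gnocchiclient.

```python
import re

# One data row of the table: "| <field> | <integer> |".
# Border rows ("+---+") and the "Field | Value" header do not match.
ROW = re.compile(r"^\|\s*(?P<field>[^|]+?)\s*\|\s*(?P<value>\d+)\s*\|$")

TOTAL_FIELD = "storage/total number of measures to process"


def parse_status(table_text):
    """Map each field of a `gnocchi status` table to its integer value."""
    status = {}
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            status[match.group("field")] = int(match.group("value"))
    return status


def backlog_drained(status):
    """True once the total number of measures left to process is zero."""
    return status.get(TOTAL_FIELD) == 0
```

The text passed to `parse_status` would be the captured standard output of `gnocchi status`, run with the usual overcloud credentials loaded.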

Lowering the severity of this BZ.

Regards
Luca
Comment 4 Julien Danjou 2017-08-11 08:56:47 EDT
Can it be closed in the end?
Comment 6 Deepthi V V 2017-08-29 04:58:56 EDT
I am facing the same issue in a ROSP10 environment.
I am using the default configurations for Ceilometer, Gnocchi and Swift.
Do I have to modify any configurations?

Luca, did you face the issue again after the Swift cleanup? How do I perform the Swift cleanup?
Comment 8 Luca Miccini 2017-09-01 03:07:11 EDT
(In reply to Deepthi V V from comment #6)
> I am facing same issue in ROSP10 environment.
> I am using default configurations for Ceilometer, gnocchi and swift.
> Do I have to modify any configurations?
> 
> Luca, did you face the issue again after swift clean up? How do I perform
> swift clean up?

Hi Deepthi,

Cleaning up the DB and Swift helped in this specific case, but only because of the specifics of this environment.

To answer your question: you can simply delete the containers named gnocchi_*, but without details of your environment and a clear understanding of why you are facing this issue, I don't know whether that would be enough.
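For illustration only, deleting containers by name prefix could be sketched as below. The assumptions are loud ones: `conn` is expected to behave like a python-swiftclient `swiftclient.client.Connection` (its `get_account`, `get_container`, `delete_object` and `delete_container` methods are used), `purge_containers` is a hypothetical helper name, and the prefix is left as a parameter because the log excerpt in the description shows names of the form gnocchi.<uuid>. The sketch defaults to a dry run, since the deletion is irreversible.

```python
def purge_containers(conn, prefix, dry_run=True):
    """Delete every container whose name starts with `prefix`.

    Swift refuses to delete a non-empty container, so the objects inside
    each matched container are deleted first. With dry_run=True nothing is
    deleted. Returns the list of matched container names either way.
    """
    _headers, containers = conn.get_account(full_listing=True)
    matched = [c["name"] for c in containers if c["name"].startswith(prefix)]
    if dry_run:
        return matched
    for name in matched:
        _headers, objects = conn.get_container(name, full_listing=True)
        for obj in objects:
            conn.delete_object(name, obj["name"])
        conn.delete_container(name)
    return matched
```

Running it first with the default `dry_run=True` and reviewing the returned names before repeating with `dry_run=False` keeps unrelated containers from being removed by an overly broad prefix.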

I think it's better for you to open a support case so that one of our colleagues can perform a proper analysis and suggest the best plan to address your issue.

Thanks
Luca
Comment 9 Julien Danjou 2017-09-01 04:24:27 EDT
Thanks for the reply Luca!
