Bug 1479889

Summary: measures not being flushed to swift - how to drop the backlog / drop the db content?
Product: Red Hat OpenStack
Reporter: Luca Miccini <lmiccini>
Component: openstack-gnocchi
Assignee: Pradeep Kilambi <pkilambi>
Status: CLOSED NOTABUG
QA Contact: Sasha Smolyak <ssmolyak>
Severity: high
Priority: unspecified
Version: 10.0 (Newton)
CC: apevec, deepthi.v.v, jdanjou, jschluet, lhh, lmiccini, pgrist
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2017-09-01 08:24:27 UTC
Type: Bug

Description Luca Miccini 2017-08-09 16:19:07 UTC
Description of problem:

gnocchi backlog is stuck with ~9k measures to process:

$ gnocchi status
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 367   |
| storage/total number of measures to process         | 8694  |
+-----------------------------------------------------+-------+
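To track the backlog from a script rather than by eye, the table printed by `gnocchi status` can be parsed into numbers. The following is a minimal sketch that assumes the exact ASCII-table layout shown above (a "Field"/"Value" header and one "| field | value |" row per counter); the function name parse_gnocchi_status is hypothetical, not part of gnocchiclient.

```python
from typing import Dict


def parse_gnocchi_status(table: str) -> Dict[str, int]:
    """Turn the ASCII table printed by `gnocchi status` into {field: count}."""
    counts: Dict[str, int] = {}
    for raw in table.splitlines():
        line = raw.strip()
        # Border rows start with "+"; only "| field | value |" rows carry data.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that does not have exactly two columns.
        if len(cells) != 2 or cells[0] == "Field":
            continue
        field, value = cells
        if value.isdigit():
            counts[field] = int(value)
    return counts


status = """\
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 367   |
| storage/total number of measures to process         | 8694  |
+-----------------------------------------------------+-------+
"""
counts = parse_gnocchi_status(status)
print(counts["storage/total number of measures to process"])  # 8694
```

In practice the input would be the captured stdout of `gnocchi status` (for example via Python's subprocess module) instead of the pasted string.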

we suspect we ended up in this situation because of an issue with the underlying swift (under investigation).

swift has now been fixed, but metrics/measures are still not being flushed.

We suspect that some of the containers no longer exist (from metricd.log):

2017-08-09 10:26:46.597 410023 ERROR swiftclient ClientException: Object DELETE failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e090e2-aac4-40d3-aa7f-3bb4bb10aefd/none_v3 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.621 410023 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-08-09 10:26:46.621 410023 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-08-09 10:26:46.621 410023 ERROR swiftclient [-] Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e090e2-aac4-40d3-aa7f-3bb4bb10aefd?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.621 410023 ERROR swiftclient ClientException: Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e090e2-aac4-40d3-aa7f-3bb4bb10aefd?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.634 410023 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-08-09 10:26:46.634 410023 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-08-09 10:26:46.634 410023 ERROR swiftclient [-] Object DELETE failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2/none_v3 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.634 410023 ERROR swiftclient ClientException: Object DELETE failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2/none_v3 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.659 410023 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-08-09 10:26:46.660 410023 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-08-09 10:26:46.660 410023 ERROR swiftclient [-] Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-08-09 10:26:46.660 410023 ERROR swiftclient ClientException: Container GET failed: http://10.1.0.167:8080/v1/AUTH_f30fb12391754e1dad13da9aa3e57f24/gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
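To confirm which containers are actually gone, the 404 errors above can be reduced to a de-duplicated list of container names. This is a sketch that assumes the swiftclient message format shown in this log ("<Container|Object> <METHOD> failed: <url> 404 Not Found"); the regex and the helper name missing_containers are illustrative, not an official tool.

```python
import re
from typing import Iterable, List

# Matches the swiftclient errors in metricd.log, e.g.
#   "Container GET failed: http://HOST:8080/v1/AUTH_x/gnocchi.<uuid>?format=json 404 Not Found"
#   "Object DELETE failed: http://HOST:8080/v1/AUTH_x/gnocchi.<uuid>/none_v3 404 Not Found"
SWIFT_404 = re.compile(
    r"(?:Container|Object) [A-Z]+ failed: \S+?/v1/[^/\s]+/"
    r"(?P<container>[^/?\s]+)\S* 404 Not Found"
)


def missing_containers(lines: Iterable[str]) -> List[str]:
    """Return each container that produced a 404, in order of first appearance."""
    seen: List[str] = []
    for line in lines:
        match = SWIFT_404.search(line)
        if match and match.group("container") not in seen:
            seen.append(match.group("container"))
    return seen
```

Passing an open handle on metricd.log (any iterable of lines works) would yield names such as gnocchi.85e090e2-aac4-40d3-aa7f-3bb4bb10aefd and gnocchi.85e0ea0d-6d5f-4b71-9809-b6d67bbc57a2 for the excerpt above; the "RESP STATUS" lines are ignored because they carry no URL.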


As the overcloud is almost empty, is there a way to drop the content of the backlog (as well as the rest of the data stored in the gnocchi swift containers) and start from an empty ceilometer/gnocchi db?


Version-Release number of selected component (if applicable):

openstack-gnocchi-api-3.0.6-2.el7ost.noarch                 
openstack-gnocchi-carbonara-3.0.6-2.el7ost.noarch           
openstack-gnocchi-common-3.0.6-2.el7ost.noarch              
openstack-gnocchi-indexer-sqlalchemy-3.0.6-2.el7ost.noarch  
openstack-gnocchi-metricd-3.0.6-2.el7ost.noarch             
openstack-gnocchi-statsd-3.0.6-2.el7ost.noarch             
puppet-gnocchi-9.5.0-1.el7ost.noarch                        
python-gnocchi-3.0.6-2.el7ost.noarch                       
python-gnocchiclient-2.8.2-2.el7ost.noarch                  


How reproducible:

unsure / not reproducible

Actual results:

measures are not being saved.

Expected results:

gnocchi to recover once the storage backend is functional again.

Comment 2 Luca Miccini 2017-08-10 12:29:59 UTC
Hi,

we somehow managed to start from scratch by:

1. truncating the contents of the tables in the gnocchi mysql DB
2. cleaning the swift content (actually wiping swift clean by re-formatting the drives)
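Step 1 can be sketched as a small generator of a MySQL script. Assumptions: the indexer database is MySQL, the table list is taken by the operator from SHOW TABLES in the gnocchi database (the names in the example are hypothetical), and alembic_version is kept because Gnocchi's SQLAlchemy indexer uses Alembic to record its schema version. This only builds the SQL; it does not run it, and it permanently discards indexer data.

```python
from typing import Iterable, Sequence


def truncate_script(tables: Iterable[str],
                    exclude: Sequence[str] = ("alembic_version",)) -> str:
    """Build a MySQL script that empties every table in `tables` except `exclude`."""
    # Disabling foreign-key checks is the common idiom for letting TRUNCATE
    # proceed on tables that other tables reference.
    statements = ["SET FOREIGN_KEY_CHECKS = 0;"]
    for name in tables:
        if name in exclude:
            continue
        # Quote the identifier with backticks, doubling any embedded backtick.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"TRUNCATE TABLE {quoted};")
    statements.append("SET FOREIGN_KEY_CHECKS = 1;")
    return "\n".join(statements)


# Hypothetical table names; use the real output of SHOW TABLES instead.
print(truncate_script(["resource", "metric", "alembic_version"]))
```

The resulting script would presumably be run with the gnocchi services stopped, so that metricd does not write new rows while the tables are being emptied.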

backlog is being processed now:


[stack@ospd-osp1-cs-01 ~]$ gnocchi status
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 487   |
| storage/total number of measures to process         | 487   |
+-----------------------------------------------------+-------+
...
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 228   |
| storage/total number of measures to process         | 228   |
+-----------------------------------------------------+-------+
...
+-----------------------------------------------------+-------+
| Field                                               | Value |
+-----------------------------------------------------+-------+
| storage/number of metric having measures to process | 0     |
| storage/total number of measures to process         | 0     |
+-----------------------------------------------------+-------+
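The sequence of snapshots above (487, then 228, then 0) is what a healthy drain looks like, whereas the original problem was a count stuck at 8694. A minimal polling sketch that tells the two apart is shown below; read_backlog is a hypothetical caller-supplied function (for example, one that runs `gnocchi status` and returns "storage/total number of measures to process"), and the thresholds are arbitrary assumptions. Because Ceilometer keeps pushing new measures, a temporarily non-decreasing count is not proof of a stuck backlog, so "stalled" is only a heuristic.

```python
import time
from typing import Callable, List, Tuple


def watch_backlog(read_backlog: Callable[[], int],
                  interval: float = 30.0,
                  max_polls: int = 120,
                  stall_limit: int = 5,
                  sleep: Callable[[float], None] = time.sleep) -> Tuple[str, List[int]]:
    """Poll the measures backlog and classify it as drained, stalled or timed out.

    "stalled" means the count failed to go down for `stall_limit` consecutive
    polls; "timeout" means `max_polls` readings were taken without either outcome.
    """
    history: List[int] = []
    stalled = 0
    for _ in range(max_polls):
        count = read_backlog()
        history.append(count)
        if count == 0:
            return "drained", history
        if len(history) > 1 and count >= history[-2]:
            stalled += 1
            if stalled >= stall_limit:
                return "stalled", history
        else:
            stalled = 0
        sleep(interval)
    return "timeout", history
```

The sleep function is injectable so the loop can be exercised without waiting, e.g. watch_backlog(iter([487, 228, 0]).__next__, sleep=lambda s: None).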

lowering the severity of this BZ.

Regards
Luca

Comment 4 Julien Danjou 2017-08-11 12:56:47 UTC
Can it be closed in the end?

Comment 6 Deepthi V V 2017-08-29 08:58:56 UTC
I am facing the same issue in a ROSP10 environment.
I am using the default configuration for Ceilometer, gnocchi and swift.
Do I have to modify any configuration?

Luca, did you face the issue again after the swift cleanup? How do I perform the swift cleanup?

Comment 8 Luca Miccini 2017-09-01 07:07:11 UTC
(In reply to Deepthi V V from comment #6)
> I am facing same issue in ROSP10 environment.
> I am using default configurations for Ceilometer, gnocchi and swift.
> Do I have to modify any configurations?
> 
> Luca, did you face the issue again after swift clean up? How do I perform
> swift clean up?

Hi Deepthi,

cleaning up the db and swift helped in this specific case, but only because of the particulars of this environment.

To answer your question: you can simply delete the gnocchi containers (those whose names start with gnocchi, such as the gnocchi.<metric UUID> containers in the metricd log above), but without any details about your environment and a clear understanding of why you are facing this issue, I don't know if this would be enough.
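Deleting the gnocchi containers, as opposed to re-formatting the drives, can be sketched with python-swiftclient. Assumptions: conn behaves like swiftclient.client.Connection (get_account, get_container, delete_object, delete_container) and is authenticated as the project that owns the containers; the prefix "gnocchi" matches the container names in the log above; the helper names are hypothetical. Depending on the Gnocchi version, the incoming-measures backlog may live in a container that does not carry this prefix, so running with dry_run=True first and reviewing the returned names is a cheap check.

```python
from typing import Iterable, List


def select_containers(names: Iterable[str], prefix: str = "gnocchi") -> List[str]:
    """Keep only container names that start with the gnocchi prefix."""
    return [name for name in names if name.startswith(prefix)]


def purge_containers(conn, prefix: str = "gnocchi", dry_run: bool = True) -> List[str]:
    """Empty and delete every container whose name starts with `prefix`.

    `conn` is assumed to behave like swiftclient.client.Connection.
    With dry_run=True (the default) nothing is deleted; matching names are returned.
    """
    _, containers = conn.get_account(full_listing=True)
    targets = select_containers((c["name"] for c in containers), prefix)
    if dry_run:
        return targets
    for name in targets:
        _, objects = conn.get_container(name, full_listing=True)
        # Swift refuses to delete a non-empty container (409 Conflict),
        # so remove its objects first.
        for obj in objects:
            conn.delete_object(name, obj["name"])
        conn.delete_container(name)
    return targets
```

Deleting objects one by one is slow for large accounts, but it keeps the sketch limited to basic Connection methods.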

I think it's better for you to open a support case so that one of our colleagues can perform a proper analysis and suggest the best plan to address your issue.

thanks
Luca

Comment 9 Julien Danjou 2017-09-01 08:24:27 UTC
Thanks for the reply Luca!