Bug 1465529
| Summary: | Autoscaling fails, RabbitMQ being killed | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Morgan Weetman <mweetman> |
| Component: | openstack-ceilometer | Assignee: | Julien Danjou <jdanjou> |
| Status: | CLOSED NOTABUG | QA Contact: | Sasha Smolyak <ssmolyak> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 10.0 (Newton) | CC: | ftaylor, jdanjou, jruzicka, mabaakou, mweetman, rlocke, sclewis, srevivo, vstinner |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | 10.0 (Newton) | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-09-04 07:23:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1467947 | ||
| Bug Blocks: | |||
Description
Morgan Weetman
2017-06-27 14:51:28 UTC
Could you give more information about your problem and describe in detail how to reproduce it? It's hard to understand exactly what the problem is.

I have tried the lab, and Ceph does not look healthy:

$ ceph health
HEALTH_ERR 68pgs are stuck inactive....

The Ceph pools have a size of 3 replicas with a min_size of 1, which makes Ceph slow. size should be 1 as well, because you have only one node. Also, three OSDs are configured in Ceph when you have only one, so Ceph tries to reach nonexistent nodes. That slows Ceph down further, because it waits a long time on OSDs that will never come back.

Even without running tempest, Ceph is already reporting slow requests, some taking more than 500s to write data, so adding the tempest load is not going to work. These slow requests most likely come from the missing OSD nodes.

So my guess is that the Ceph node is too slow and Gnocchi can't write the backlog to it. Ceilometer also can't post measures to Gnocchi, because Ceph is too slow to write them. That leaves many messages waiting to be processed in RabbitMQ. You should first fix the Ceph setup.

I got the Ceph issue fixed, and can reproduce the issue. I have added a "Depends On" link to the other issue, since the root cause has a good chance of being the same for both BZs.

I'm closing this bug on the conclusion that this is not a bug and that the root cause is a lack of resources, and because the rest is discussed in https://bugzilla.redhat.com/show_bug.cgi?id=1467947
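
For anyone hitting the same single-node lab situation, the Ceph sizing advice in the comments above (replica size of 1 to match the one available node) could be applied roughly as follows. This is only a sketch: the helper name `pool_resize_commands` and the pool name `metrics` are hypothetical and not taken from this bug, and the helper only builds the `ceph osd pool set` invocations without running them. Newer Ceph releases may ask for additional confirmation before accepting a size of 1.

```python
def pool_resize_commands(pool, size=1, min_size=1):
    """Build the ceph CLI invocations that set a pool's replica counts.

    Nothing is executed here; the caller decides whether to run them.
    """
    # Sanity check: the minimum replica count cannot exceed the target count.
    if min_size > size:
        raise ValueError("min_size cannot exceed size")
    return [
        ["ceph", "osd", "pool", "set", pool, "size", str(size)],
        ["ceph", "osd", "pool", "set", pool, "min_size", str(min_size)],
    ]


if __name__ == "__main__":
    # "metrics" is a hypothetical pool name; list the real ones with
    # `ceph osd lspools` before changing anything.
    for cmd in pool_resize_commands("metrics"):
        print(" ".join(cmd))
```

Printing the commands first, rather than executing them, leaves room to review them against `ceph health` and `ceph osd tree` output before touching the cluster.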