Created attachment 1136980 [details]
Horizon good screenshot

Description of problem:
I successfully created 5000 instances in a small-scale environment on 10.03. Horizon was OK and showed me all instances in the UI (see attached screenshot).

I left it running over the weekend in idle mode (without load), and now Horizon is not accessible. It raises the error "504 Gateway Time-out" (see screenshot).

I checked the environment using the command
$ openstack-status
and everything looks OK (see attached output).

1. I updated the httpd timeout and restarted it - didn't help
/etc/httpd/conf/httpd.conf
#Timeout 120
Timeout 600

2. I updated the HAProxy timeouts and restarted it - didn't help
/etc/haproxy/haproxy.cfg
defaults
  log  global
  maxconn  4096
  mode  tcp
  retries  3
#  timeout  http-request 10s
  timeout  http-request 1m
  timeout  queue 1m
#  timeout  connect 10s
  timeout  connect 1m
  timeout  client 1m
  timeout  server 1m
#  timeout  check 10s
  timeout  check 1m

Version-Release number of selected component (if applicable):
rhos-release 7 -p 2016-02-24.1

How reproducible:
Create 5000 instances/VMs for 50 tenants using
- image: cirros-0.3.2-sc (12.6 MB)
- flavor: m1.nano (1 VCPU, 64 MB RAM, 1 GB disk)
Then try to log in to Horizon. (One way to script the instance creation is sketched under Additional info below.)

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
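A minimal sketch of how the instance creation could be scripted with python-novaclient; the Keystone endpoint, credentials, and per-tenant loop count below are placeholders and assumptions, not values taken from this report.

```python
# Sketch only: boot many tiny instances against one tenant.
# Assumes python-novaclient and keystoneauth1 are installed and that the
# cirros-0.3.2-sc image and m1.nano flavor from the report already exist.
from keystoneauth1 import loading
from keystoneauth1 import session
from novaclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://192.0.2.10:5000/v2.0',  # assumed Keystone endpoint
    username='admin',                        # placeholder credentials
    password='secret',
    project_name='admin')
sess = session.Session(auth=auth)
nova = client.Client('2', session=sess)

image = nova.images.find(name='cirros-0.3.2-sc')   # image from the report
flavor = nova.flavors.find(name='m1.nano')         # 1 VCPU, 64 MB RAM, 1 GB disk

# 100 instances per tenant; repeat per tenant (50 tenants) to reach ~5000.
for i in range(100):
    nova.servers.create(name='scale-test-%04d' % i,
                        image=image,
                        flavor=flavor)
```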
Created attachment 1136981 [details] Horizon timeout screenshot
Created attachment 1136982 [details] openstack-status output
Test result here: https://mojo.redhat.com/docs/DOC-1071620
Could you please provide logs, like the error log from Horizon? I'd also be curious to see the haproxy logs.
Created attachment 1136990 [details] horizon controller log file
Created attachment 1136991 [details] journalctl -u haproxy >haproxy.log
Created attachment 1136997 [details] correct horizon controller log file
Created attachment 1137035 [details] httpd logs
Created attachment 1137051 [details] correct httpd logs
After digging around in that installation, finding many stopped neutron services, I'd tend to close this. The neutron logs are showing lots of errors like this:

2016-03-17 00:06:11.767 15007 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:11.770 15007 WARNING keystonemiddleware.auth_token [-] Identity response: {"error": {"message": "Could not find token: 447bdac7a94548cc9e7ec6e29e4bd942", "code": 404, "title": "Not Found"}}
2016-03-17 00:06:11.771 15007 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:14.526 15011 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:14.528 15011 WARNING keystonemiddleware.auth_token [-] Identity response: {"error": {"message": "Could not find token: 99dbb2f98e654ee08acc16502a100ccc", "code": 404, "title": "Not Found"}}
2016-03-17 00:06:14.529 15011 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:18.295 15002 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:18.296 15002 WARNING keystonemiddleware.auth_token [-] Identity response: {"error": {"message": "Could not find token: 1f5fa24792ec4f7ab741a9d600b41695", "code": 404, "title": "Not Found"}}
2016-03-17 00:06:18.296 15002 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:18.303 15002 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:30.417 15003 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:06:30.419 15003 WARNING keystonemiddleware.auth_token [-] Identity response: {"error": {"message": "Could not find token: 70dd139424894c37bae880f8c8ae0add", "code": 404, "title": "Not Found"}}
2016-03-17 00:06:30.419 15003 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:16:20.058 15009 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:16:20.059 15009 WARNING keystonemiddleware.auth_token [-] Identity response: {"error": {"message": "Could not find token: 8974d93d849445dc967504b295e16e1e", "code": 404, "title": "Not Found"}}
2016-03-17 00:16:20.059 15009 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:16:22.496 15021 WARNING keystonemiddleware.auth_token [-] Authorization failed for token
2016-03-17 00:16:22.497 15021 WARNING keystonemiddleware.auth_token [-] Identity response: {"error": {"message": "Could not find token: 55ea136f54574b14b948fd4928083958", "code": 404, "title": "Not Found"}}
2016-03-17 00:16:22.498 15021 WARNING keystonemiddleware.auth_token [-] Authorization failed for token

and also on another node:

2016-03-16 15:56:39.237 14714 ERROR neutron.plugins.ml2.managers [req-7b4ea77c-fe69-4f77-b78d-10517dbeaca5 ] Failed to bind port 48d926c7-c5e9-4c79-a7b9-ab06f28841eb on host overcloud-controller-1.localdomain
2016-03-16 15:56:39.238 14714 ERROR neutron.plugins.ml2.managers [req-7b4ea77c-fe69-4f77-b78d-10517dbeaca5 ] Failed to bind port 48d926c7-c5e9-4c79-a7b9-ab06f28841eb on host overcloud-controller-1.localdomain
2016-03-16 15:56:39.265 14714 WARNING neutron.plugins.ml2.plugin [req-7b4ea77c-fe69-4f77-b78d-10517dbeaca5 ] In _notify_port_updated(), no bound segment for port 48d926c7-c5e9-4c79-a7b9-ab06f28841eb on network adfe1297-56ec-4b68-ae4c-da48483c40de
2016-03-16 15:56:41.224 14729 WARNING neutron.plugins.ml2.rpc [req-d03b4f72-c9bd-4caa-9115-cc38de8ea7aa ] Device 48d926c7-c5e9-4c79-a7b9-ab06f28841eb requested by agent ovs-agent-overcloud-controller-0.localdomain on network adfe1297-56ec-4b68-ae4c-da48483c40de not bound, vif_type: binding_failed
2016-03-16 15:56:41.290 14729 WARNING neutron.plugins.ml2.rpc [req-d03b4f72-c9bd-4caa-9115-cc38de8ea7aa ] Device b0e449b3-05ab-4b6a-a54e-758c142d8886 requested by agent ovs-agent-overcloud-controller-0.localdomain on network 653efbd9-4213-4f20-a93b-94ce5f2b3548 not bound, vif_type: binding_failed
2016-03-16 15:56:42.749 14734 WARNING neutron.plugins.ml2.drivers.type_tunnel [req-74c3026c-c431-4e8e-abfd-c001cbe297b2 ] Endpoint with ip 172.16.0.10 already exists
2016-03-16 15:56:44.028 14737 WARNING neutron.plugins.ml2.rpc [req-74c3026c-c431-4e8e-abfd-c001cbe297b2 ] Device 48d926c7-c5e9-4c79-a7b9-ab06f28841eb requested by agent ovs-agent-overcloud-controller-1.localdomain on network adfe1297-56ec-4b68-ae4c-da48483c40de not bound, vif_type: binding_failed
I don't think this is a correct verification. The bug happened when the stack had 5000 instances, but you are digging around in a stack with no instances at all.
Yuri, the bug still exists, even on an empty machine without running instances.
Yuri, any update on this, or can I close it? Thanks
Hi,
I tried to reproduce the bug on rhos-release 8 -p 2016-03-24.2 and created 4,849 VMs.
Horizon is accessible right now, but I'd like to let it run idle for 24 hours.
I'll update with the result. Thanks
After deleting the instances, it seems like the admin/overview page is not available any more. While looking at the APIs used, I found that Horizon is calling os-simple-tenant-usage, which has been involved in an older bug: https://bugzilla.redhat.com/show_bug.cgi?id=1243301. Unfortunately, the proposed solution there doesn't solve *this* situation here.
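To confirm that the os-simple-tenant-usage call itself is the slow part (rather than something else in Horizon), it could be timed directly with python-novaclient. This is only a sketch: the credentials and endpoint are placeholders and the 30-day window is arbitrary, not the exact period the dashboard uses.

```python
# Rough timing check for GET /os-simple-tenant-usage (sketch, not from this report).
import datetime
import time

from keystoneauth1 import loading
from keystoneauth1 import session
from novaclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://192.0.2.10:5000/v2.0',  # assumed Keystone endpoint
    username='admin', password='secret', project_name='admin')
nova = client.Client('2', session=session.Session(auth=auth))

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(days=30)   # arbitrary window for the check

t0 = time.time()
# detailed=True asks Nova for per-instance usage, including deleted instances
# in the period, which is where the load comes from at this scale.
usages = nova.usage.list(start, end, detailed=True)
print("%d tenant usage records in %.1f s" % (len(usages), time.time() - t0))
```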
In addition to comment 16, I would add

OPENSTACK_NOVA_EXTENSIONS_BLACKLIST = ["SimpleTenantUsage"]

to /etc/openstack-dashboard/local_settings and restart httpd. I cannot verify that solution from here.
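For clarity, the suggestion above would look like this in /etc/openstack-dashboard/local_settings (a Python file); the comment text is mine, not part of the original suggestion.

```python
# /etc/openstack-dashboard/local_settings
# Blacklist the SimpleTenantUsage nova extension so that Horizon skips the
# expensive os-simple-tenant-usage queries discussed in the comments above.
OPENSTACK_NOVA_EXTENSIONS_BLACKLIST = ["SimpleTenantUsage"]
```

After changing the file, httpd needs to be restarted for the dashboard to pick up the new setting, as noted in the comment.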
For the terminated instances killing the page, it may help to do this:

--- a/openstack_dashboard/dashboards/admin/overview/views.py
+++ b/openstack_dashboard/dashboards/admin/overview/views.py
@@ -44,6 +44,7 @@ class GlobalUsageCsvRenderer(csvbase.BaseCsvResponse):
 
 
 class GlobalOverview(usage.UsageView):
+    show_terminated = False
     table_class = usage.GlobalUsageTable
     usage_class = usage.GlobalUsage
     template_name = 'admin/overview/usage.html'

The problem with this is that by default the admin overview page will query usage for all instances, including the deleted ones, which can cause a high load if the number of instances is too high.

Would it be possible to get the env details so I can ssh into it and see it myself? Thanks!
Provided env details to Itxaka.
This problem should be greatly alleviated (and in most cases completely fixed) by implementing https://bugzilla.redhat.com/show_bug.cgi?id=1388171