Bug 1759511 - changing replication settings in global appliance sometimes leads to 500/502/503 error, infinispinner and sometimes to hung appliance
Summary: changing replication settings in global appliance sometimes leads to 500/502/...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Replication
Version: 5.11.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: GA
: 5.11.z
Assignee: Joe Rafaniello
QA Contact: Ievgen Zapolskyi
Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-08 11:56 UTC by Ievgen Zapolskyi
Modified: 2019-12-13 15:20 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-13 15:20:24 UTC
Category: Bug
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
global appliance logs (1.16 MB, application/gzip)
2019-10-08 11:56 UTC, Ievgen Zapolskyi
no flags Details
502 error screenshot (237.83 KB, image/png)
2019-10-08 11:57 UTC, Ievgen Zapolskyi
no flags Details
503 error screenshot (224.86 KB, image/png)
2019-10-08 11:58 UTC, Ievgen Zapolskyi
no flags Details
infinispinner (210.69 KB, image/png)
2019-10-08 11:58 UTC, Ievgen Zapolskyi
no flags Details
hung appliance (214.76 KB, image/png)
2019-10-08 11:59 UTC, Ievgen Zapolskyi
no flags Details

Description Ievgen Zapolskyi 2019-10-08 11:56:52 UTC
Created attachment 1623489 [details]
global appliance logs

Description of problem:
when I make changes in replication settings of global appliance, I see 502/503 errors from time to time. In some cases additional steps in UI lead to infinispinner on Replication tab and more rarely it leads to hung appliance and inability to use it until appliance processes restart.

I noticed 502/503 erorrs after different steps but I managed to find reproducible steps only in one case.

Version-Release number of selected component (if applicable):
5.11.0.27

How reproducible:
60%

Steps to Reproduce:
(env set up command: miq setup-env multi-region --cfme-version 5.11.0.27 --provider-type rhevm --remote-nodes 3 --add-prov vsphere67-nested --add-prov rhv43 --add-prov rhos13 --add-prov azure --add-prov ec2west  --add-prov scvmm2016)
1. Set up one appliance in global region and another one in remote region
2. set up subscription from remote region to global one
3. Go to Configuration->CFME Region->Replication tab in global appliance
4. Choose Delete on subscription and click Save button
5. Click Cancel Delete and Save button
6. Switch to Tasks page
7. wait for 3-5 min and switch back to Configuration->Replication tab

Actual results:
500/502/503 error or infinispinner or hung appliance in 2 out of 4 cases

Expected results:
there should be better handling of such issues

Additional info:
several screenshots and global appliance logs are attached

Comment 2 Ievgen Zapolskyi 2019-10-08 11:57:46 UTC
Created attachment 1623490 [details]
502 error screenshot

Comment 3 Ievgen Zapolskyi 2019-10-08 11:58:18 UTC
Created attachment 1623491 [details]
503 error screenshot

Comment 4 Ievgen Zapolskyi 2019-10-08 11:58:58 UTC
Created attachment 1623492 [details]
infinispinner

Comment 5 Ievgen Zapolskyi 2019-10-08 11:59:30 UTC
Created attachment 1623493 [details]
hung appliance

Comment 7 Joe Rafaniello 2019-10-08 14:26:06 UTC
I'm seeing the long running requests starting in the production.log with these log lines;

[----] W, [2019-10-08T05:14:37.977253 #9879:2af8b69d2c20]  WARN -- : Long running http(s) request: '/ops/pglogical_subscriptions_form_fields/new' handled by #9879:2af8bae848a0, running for 93.79 seconds
[----] W, [2019-10-08T05:15:13.987312 #9879:2af8b69d2c20]  WARN -- : Long running http(s) request: '/ops/pglogical_subscriptions_form_fields/new' handled by #9879:2af8bae848a0, running for 129.8 seconds

That thread is handling this request:

[----] I, [2019-10-08T05:13:04.186500 #9879:2af8bae848a0]  INFO -- : Started GET "/ops/pglogical_subscriptions_form_fields/new" for X.X.X.X at 2019-10-08 05:13:04 -0400
[----] I, [2019-10-08T05:13:04.190051 #9879:2af8bae848a0]  INFO -- : Processing by OpsController#pglogical_subscriptions_form_fields as HTML
[----] I, [2019-10-08T05:13:04.190139 #9879:2af8bae848a0]  INFO -- :   Parameters: {"id"=>"new"}

Just prior to that there's a no method error I need to investigate (from the same client ip address):

----] I, [2019-10-08T05:12:06.934553 #9879:2af8bae81894]  INFO -- : Started POST "/ops/pglogical_save_subscriptions/new?button=save" for X.X.X.X at 2019-10-08 05:12:06 -0400
[----] I, [2019-10-08T05:12:06.947142 #9879:2af8bae81894]  INFO -- : Processing by OpsController#pglogical_save_subscriptions as JS
[----] I, [2019-10-08T05:12:06.947272 #9879:2af8bae81894]  INFO -- :   Parameters: {"replication_type"=>"global", "button"=>"save", "id"=>"new"}
[----] F, [2019-10-08T05:12:06.970998 #9879:2af8bae81894] FATAL -- : Error caught: [NoMethodError] undefined method `each' for nil:NilClass
/opt/rh/cfme-gemset/bundler/gems/cfme-ui-classic-3a49768831a8/app/controllers/ops_controller/settings/common.rb:234:in `prepare_subscriptions_for_saving'
/opt/rh/cfme-gemset/bundler/gems/cfme-ui-classic-3a49768831a8/app/controllers/ops_controller/settings/common.rb:183:in `pglogical_save_subscriptions'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/basic_implicit_render.rb:4:in `send_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/base.rb:186:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/rendering.rb:30:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/callbacks.rb:20:in `block in process_action'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/callbacks.rb:131:in `run_callbacks'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/callbacks.rb:19:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/rescue.rb:20:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/instrumentation.rb:32:in `block in process_action'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/notifications.rb:166:in `block in instrument'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/notifications/instrumenter.rb:21:in `instrument'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/notifications.rb:166:in `instrument'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/instrumentation.rb:30:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/params_wrapper.rb:252:in `process_action'
/opt/rh/cfme-gemset/gems/activerecord-5.1.7/lib/active_record/railties/controller_runtime.rb:22:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/base.rb:124:in `process'
/opt/rh/cfme-gemset/gems/actionview-5.1.7/lib/action_view/rendering.rb:30:in `process'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal.rb:189:in `dispatch'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal.rb:253:in `dispatch'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/routing/route_set.rb:49:in `dispatch'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/routing/route_set.rb:31:in `serve'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/journey/router.rb:50:in `block in serve'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/journey/router.rb:33:in `each'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/journey/router.rb:33:in `serve'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/routing/route_set.rb:844:in `call'
/opt/rh/cfme-gemset/bundler/gems/cfme-graphql-ef6880aaae05/lib/manageiq/graphql/rest_api_proxy.rb:18:in `call'
/opt/rh/cfme-gemset/gems/secure_headers-3.0.3/lib/secure_headers/middleware.rb:10:in `call'
/var/www/miq/vmdb/lib/request_started_on_middleware.rb:12:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/etag.rb:25:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/conditional_get.rb:38:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/head.rb:12:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:232:in `context'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:226:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/cookies.rb:613:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/callbacks.rb:26:in `block in call'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/callbacks.rb:97:in `run_callbacks'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/callbacks.rb:24:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/debug_exceptions.rb:59:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/show_exceptions.rb:31:in `call'
/opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/rack/logger.rb:36:in `call_app'
/opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/rack/logger.rb:26:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/remote_ip.rb:79:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/request_id.rb:25:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/method_override.rb:22:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/runtime.rb:22:in `call'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/cache/strategy/local_cache_middleware.rb:27:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/executor.rb:12:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/sendfile.rb:111:in `call'
/opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/engine.rb:522:in `call'
/usr/share/gems/gems/puma-3.7.1/lib/puma/configuration.rb:232:in `call'
/usr/share/gems/gems/puma-3.7.1/lib/puma/server.rb:578:in `handle_request'
/usr/share/gems/gems/puma-3.7.1/lib/puma/server.rb:415:in `process_client'
/usr/share/gems/gems/puma-3.7.1/lib/puma/server.rb:275:in `block in run'
/usr/share/gems/gems/puma-3.7.1/lib/puma/thread_pool.rb:120:in `block in spawn_thread'
[----] I, [2019-10-08T05:12:06.989932 #9879:2af8bae81894]  INFO -- :   Rendered /opt/rh/cfme-gemset/bundler/gems/cfme-ui-classic-3a49768831a8/app/views/layouts/_exception_contents.html.haml (14.9ms)
[----] I, [2019-10-08T05:12:06.990702 #9879:2af8bae81894]  INFO -- : Completed 200 OK in 43ms (Views: 19.0ms | ActiveRecord: 2.8ms)



I don't know if the segfault is related or not.

Comment 8 Yuri Rudman 2019-10-08 14:41:18 UTC
 Ievgen, were you testing on version  5.11.0.25 or older (to have https://bugzilla.redhat.com/show_bug.cgi?id=1741240 ) ?

Comment 11 Yuri Rudman 2019-10-11 15:51:35 UTC
PR to resolve nil error described in Comment #7:
https://github.com/ManageIQ/manageiq-ui-classic/pull/6294


Note You need to log in before you can comment on or make changes to this bug.