Bug 1759511

Summary: changing replication settings in global appliance sometimes leads to 500/502/503 error, infinispinner and sometimes to hung appliance
Product: Red Hat CloudForms Management Engine Reporter: Ievgen Zapolskyi <izapolsk>
Component: ReplicationAssignee: Joe Rafaniello <jrafanie>
Status: CLOSED CURRENTRELEASE QA Contact: Ievgen Zapolskyi <izapolsk>
Severity: medium Docs Contact: Red Hat CloudForms Documentation <cloudforms-docs>
Priority: medium    
Version: 5.11.0CC: dmetzger, gtanzillo, jrafanie, mshriver, obarenbo, simaishi, yrudman
Target Milestone: GA   
Target Release: 5.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-12-13 15:20:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: Bug
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: CFME Core Target Upstream Version:
Embargoed:
Attachments:
Description Flags
global appliance logs
none
502 error screenshot
none
503 error screenshot
none
infinispinner
none
hung appliance none

Description Ievgen Zapolskyi 2019-10-08 11:56:52 UTC
Created attachment 1623489 [details]
global appliance logs

Description of problem:
when I make changes in replication settings of global appliance, I see 502/503 errors from time to time. In some cases additional steps in UI lead to infinispinner on Replication tab and more rarely it leads to hung appliance and inability to use it until appliance processes restart.

I noticed 502/503 erorrs after different steps but I managed to find reproducible steps only in one case.

Version-Release number of selected component (if applicable):
5.11.0.27

How reproducible:
60%

Steps to Reproduce:
(env set up command: miq setup-env multi-region --cfme-version 5.11.0.27 --provider-type rhevm --remote-nodes 3 --add-prov vsphere67-nested --add-prov rhv43 --add-prov rhos13 --add-prov azure --add-prov ec2west  --add-prov scvmm2016)
1. Set up one appliance in global region and another one in remote region
2. set up subscription from remote region to global one
3. Go to Configuration->CFME Region->Replication tab in global appliance
4. Choose Delete on subscription and click Save button
5. Click Cancel Delete and Save button
6. Switch to Tasks page
7. wait for 3-5 min and switch back to Configuration->Replication tab

Actual results:
500/502/503 error or infinispinner or hung appliance in 2 out of 4 cases

Expected results:
there should be better handling of such issues

Additional info:
several screenshots and global appliance logs are attached

Comment 2 Ievgen Zapolskyi 2019-10-08 11:57:46 UTC
Created attachment 1623490 [details]
502 error screenshot

Comment 3 Ievgen Zapolskyi 2019-10-08 11:58:18 UTC
Created attachment 1623491 [details]
503 error screenshot

Comment 4 Ievgen Zapolskyi 2019-10-08 11:58:58 UTC
Created attachment 1623492 [details]
infinispinner

Comment 5 Ievgen Zapolskyi 2019-10-08 11:59:30 UTC
Created attachment 1623493 [details]
hung appliance

Comment 7 Joe Rafaniello 2019-10-08 14:26:06 UTC
I'm seeing the long running requests starting in the production.log with these log lines;

[----] W, [2019-10-08T05:14:37.977253 #9879:2af8b69d2c20]  WARN -- : Long running http(s) request: '/ops/pglogical_subscriptions_form_fields/new' handled by #9879:2af8bae848a0, running for 93.79 seconds
[----] W, [2019-10-08T05:15:13.987312 #9879:2af8b69d2c20]  WARN -- : Long running http(s) request: '/ops/pglogical_subscriptions_form_fields/new' handled by #9879:2af8bae848a0, running for 129.8 seconds

That thread is handling this request:

[----] I, [2019-10-08T05:13:04.186500 #9879:2af8bae848a0]  INFO -- : Started GET "/ops/pglogical_subscriptions_form_fields/new" for X.X.X.X at 2019-10-08 05:13:04 -0400
[----] I, [2019-10-08T05:13:04.190051 #9879:2af8bae848a0]  INFO -- : Processing by OpsController#pglogical_subscriptions_form_fields as HTML
[----] I, [2019-10-08T05:13:04.190139 #9879:2af8bae848a0]  INFO -- :   Parameters: {"id"=>"new"}

Just prior to that there's a no method error I need to investigate (from the same client ip address):

----] I, [2019-10-08T05:12:06.934553 #9879:2af8bae81894]  INFO -- : Started POST "/ops/pglogical_save_subscriptions/new?button=save" for X.X.X.X at 2019-10-08 05:12:06 -0400
[----] I, [2019-10-08T05:12:06.947142 #9879:2af8bae81894]  INFO -- : Processing by OpsController#pglogical_save_subscriptions as JS
[----] I, [2019-10-08T05:12:06.947272 #9879:2af8bae81894]  INFO -- :   Parameters: {"replication_type"=>"global", "button"=>"save", "id"=>"new"}
[----] F, [2019-10-08T05:12:06.970998 #9879:2af8bae81894] FATAL -- : Error caught: [NoMethodError] undefined method `each' for nil:NilClass
/opt/rh/cfme-gemset/bundler/gems/cfme-ui-classic-3a49768831a8/app/controllers/ops_controller/settings/common.rb:234:in `prepare_subscriptions_for_saving'
/opt/rh/cfme-gemset/bundler/gems/cfme-ui-classic-3a49768831a8/app/controllers/ops_controller/settings/common.rb:183:in `pglogical_save_subscriptions'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/basic_implicit_render.rb:4:in `send_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/base.rb:186:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/rendering.rb:30:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/callbacks.rb:20:in `block in process_action'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/callbacks.rb:131:in `run_callbacks'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/callbacks.rb:19:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/rescue.rb:20:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/instrumentation.rb:32:in `block in process_action'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/notifications.rb:166:in `block in instrument'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/notifications/instrumenter.rb:21:in `instrument'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/notifications.rb:166:in `instrument'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/instrumentation.rb:30:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal/params_wrapper.rb:252:in `process_action'
/opt/rh/cfme-gemset/gems/activerecord-5.1.7/lib/active_record/railties/controller_runtime.rb:22:in `process_action'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/abstract_controller/base.rb:124:in `process'
/opt/rh/cfme-gemset/gems/actionview-5.1.7/lib/action_view/rendering.rb:30:in `process'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal.rb:189:in `dispatch'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_controller/metal.rb:253:in `dispatch'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/routing/route_set.rb:49:in `dispatch'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/routing/route_set.rb:31:in `serve'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/journey/router.rb:50:in `block in serve'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/journey/router.rb:33:in `each'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/journey/router.rb:33:in `serve'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/routing/route_set.rb:844:in `call'
/opt/rh/cfme-gemset/bundler/gems/cfme-graphql-ef6880aaae05/lib/manageiq/graphql/rest_api_proxy.rb:18:in `call'
/opt/rh/cfme-gemset/gems/secure_headers-3.0.3/lib/secure_headers/middleware.rb:10:in `call'
/var/www/miq/vmdb/lib/request_started_on_middleware.rb:12:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/etag.rb:25:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/conditional_get.rb:38:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/head.rb:12:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:232:in `context'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/session/abstract/id.rb:226:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/cookies.rb:613:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/callbacks.rb:26:in `block in call'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/callbacks.rb:97:in `run_callbacks'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/callbacks.rb:24:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/debug_exceptions.rb:59:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/show_exceptions.rb:31:in `call'
/opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/rack/logger.rb:36:in `call_app'
/opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/rack/logger.rb:26:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/remote_ip.rb:79:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/request_id.rb:25:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/method_override.rb:22:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/runtime.rb:22:in `call'
/opt/rh/cfme-gemset/gems/activesupport-5.1.7/lib/active_support/cache/strategy/local_cache_middleware.rb:27:in `call'
/opt/rh/cfme-gemset/gems/actionpack-5.1.7/lib/action_dispatch/middleware/executor.rb:12:in `call'
/opt/rh/cfme-gemset/gems/rack-2.0.7/lib/rack/sendfile.rb:111:in `call'
/opt/rh/cfme-gemset/gems/railties-5.1.7/lib/rails/engine.rb:522:in `call'
/usr/share/gems/gems/puma-3.7.1/lib/puma/configuration.rb:232:in `call'
/usr/share/gems/gems/puma-3.7.1/lib/puma/server.rb:578:in `handle_request'
/usr/share/gems/gems/puma-3.7.1/lib/puma/server.rb:415:in `process_client'
/usr/share/gems/gems/puma-3.7.1/lib/puma/server.rb:275:in `block in run'
/usr/share/gems/gems/puma-3.7.1/lib/puma/thread_pool.rb:120:in `block in spawn_thread'
[----] I, [2019-10-08T05:12:06.989932 #9879:2af8bae81894]  INFO -- :   Rendered /opt/rh/cfme-gemset/bundler/gems/cfme-ui-classic-3a49768831a8/app/views/layouts/_exception_contents.html.haml (14.9ms)
[----] I, [2019-10-08T05:12:06.990702 #9879:2af8bae81894]  INFO -- : Completed 200 OK in 43ms (Views: 19.0ms | ActiveRecord: 2.8ms)



I don't know if the segfault is related or not.

Comment 8 Yuri Rudman 2019-10-08 14:41:18 UTC
 Ievgen, were you testing on version  5.11.0.25 or older (to have https://bugzilla.redhat.com/show_bug.cgi?id=1741240 ) ?

Comment 11 Yuri Rudman 2019-10-11 15:51:35 UTC
PR to resolve nil error described in Comment #7:
https://github.com/ManageIQ/manageiq-ui-classic/pull/6294