Bug 1199904 - delete_monitor() in f5-icontrol-rest.rb only deletes type "http"
Summary: delete_monitor() in f5-icontrol-rest.rb only deletes type "http"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Unknown
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Assignee: Miciah Dashiel Butler Masters
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-03-09 08:36 UTC by Kenjiro Nakayama
Modified: 2019-05-20 11:35 UTC
CC List: 12 users

Fixed In Version: rubygem-openshift-origin-routing-daemon-0.23.2.0-1.el6op
Doc Type: Bug Fix
Doc Text:
Previously if the routing daemon was configured for use with F5 BIG-IP LTM®, attempting to delete pools using the oo-admin-ctl-routing tool failed. This was due to a bug in the routing daemon. This bug fix updates the routing daemon and oo-admin-ctl-routing tool to address these issues, and as a result these errors no longer occur. Additionally, new commands have been added for listing the monitors associated with a given pool, associating an existing monitor with a pool, or disassociating a monitor from a pool (without deleting either the pool or the monitor). The tool's usage and built-in help text has also been made clearer, and the daemon has been made more resilient when the administrator makes changes to monitors in F5's configuration while the daemon is running.
Clone Of:
Environment:
Last Closed: 2015-04-06 17:06:31 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2015:0779 (SHIPPED_LIVE): Red Hat OpenShift Enterprise 2.2.5 bug fix and enhancement update, last updated 2015-04-06 21:05:45 UTC

Description Kenjiro Nakayama 2015-03-09 08:36:55 UTC
Version-Release number of selected component (if applicable):
===

- OSE 2.2


Actual results:
===
~~~
def delete_monitor monitor_name
# TODO: delete_monitor needs a 'type' parameter for the REST API.
  delete(url: "https://#{@host}/mgmt/tm/ltm/monitor/http/#{monitor_name}")
  #delete(url: "https://#{@host}/mgmt/tm/ltm/monitor/#{type}/#{monitor_name}")
end
~~~


Expected results:
===
~~~
def delete_monitor monitor_name, type
  delete(url: "https://#{@host}/mgmt/tm/ltm/monitor/#{type}/#{monitor_name}")
end
~~~


See also
===
https://github.com/openshift/enterprise-server/blob/enterprise-2.2/routing-daemon/lib/openshift/routing/models/f5-icontrol-rest.rb#L115-L119

Comment 2 chris alfonso 2015-03-11 19:09:48 UTC
PR opened upstream for origin-server https://github.com/openshift/origin-server/pull/6092

Comment 6 Johnny Liu 2015-03-17 11:21:09 UTC
Retested this bug with rubygem-openshift-origin-routing-daemon-0.22.2.2-1.el6op.noarch: FAIL.


# oo-admin-ctl-routing delete-monitor 
Requires a pool name and monitor type.

# oo-admin-ctl-routing delete-monitor pool_ose_myapp_jialiu_80 https
Deleting monitor pool_ose_myapp_jialiu_80 of type https.
I, [2015-03-17T07:18:51.050410 #16181]  INFO -- : Initializing controller...
I, [2015-03-17T07:18:51.050842 #16181]  INFO -- : Initializing F5 iControl REST interface model...
wrong number of arguments (2 for 3)

# oo-admin-ctl-routing delete-monitor monitor_ose_myapp_jialiu pool_ose_myapp_jialiu_80 http
Requires a pool name and monitor type.

Comment 7 chris alfonso 2015-03-17 16:53:19 UTC
https://github.com/openshift/origin-server/pull/6100 has been opened to fix this issue. It's worth noting there is no way to remove a monitor from a pool via this oo-admin-ctl-routing tool without using delete-pool. You can delete a monitor if it's not being used by a pool, though.

Comment 8 Johnny Liu 2015-03-18 02:36:08 UTC
Checked https://github.com/openshift/origin-server/pull/6092/files. The issue in comment 6 is that the oo-admin-ctl-routing tool passes 2 arguments, but the delete_monitor definition needs 3 arguments; they are mismatched.
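To illustrate the mismatch (the method and argument names below are assumptions, not the actual source):

~~~
# Illustrative sketch of the arity mismatch behind "wrong number of
# arguments (2 for 3)" in comment 6; names here are assumptions.
def delete_monitor monitor_name, pool_name, monitor_type   # definition expects 3 arguments
  # ... REST call using all three ...
end

# The admin tool was effectively calling it with only two arguments:
#   delete_monitor("pool_ose_myapp_jialiu_80", "https")   # => ArgumentError (2 for 3)
~~~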

Comment 9 Brenton Leanhardt 2015-03-18 13:01:51 UTC
This should be fixed with the latest PR that Chris submitted.  I built it yesterday but forgot to move this back to ON_QA.

http://etherpad.corp.redhat.com/puddle-2-2-2015-03-17

Comment 10 Johnny Liu 2015-03-19 06:27:56 UTC
Re-tested this bug with rubygem-openshift-origin-routing-daemon-0.22.2.3-1.el6op.noarch: FAIL.

# oo-admin-ctl-routing list-pools; oo-admin-ctl-routing list-aliases; oo-admin-ctl-routing list-monitors
Listing pools.
I, [2015-03-19T02:03:33.028528 #20750]  INFO -- : Initializing controller...
I, [2015-03-19T02:03:33.028949 #20750]  INFO -- : Initializing F5 iControl REST interface model...
I, [2015-03-19T02:03:33.061574 #20750]  INFO -- : Requesting list of pools from load balancer...
  pool_ose_myapp_jialiu_80 (1 members)
Listing aliases for all pools.
I, [2015-03-19T02:03:33.348718 #20765]  INFO -- : Initializing controller...
I, [2015-03-19T02:03:33.349151 #20765]  INFO -- : Initializing F5 iControl REST interface model...
I, [2015-03-19T02:03:33.372453 #20765]  INFO -- : Requesting list of pools from load balancer...
Pool pool_ose_myapp_jialiu_80 has alias ha-myapp-jialiu.example.com.
Listing monitors.
I, [2015-03-19T02:03:33.659971 #20772]  INFO -- : Initializing controller...
I, [2015-03-19T02:03:33.660491 #20772]  INFO -- : Initializing F5 iControl REST interface model...
I, [2015-03-19T02:03:33.684408 #20772]  INFO -- : Requesting list of monitors from load balancer...
monitor_ose_myapp_jialiu

# oo-admin-ctl-routing -h
<--snip-->
list-monitors
  List all monitors.
create-monitor <name> <path> <up-code>
  Create a new monitor with the specified name that queries <path>
  and expects to receive <up-code> to indicate that the pool is up.
delete-monitor <name> <pool> <type>
  Delete the specified monitor.

# oo-admin-ctl-routing delete-monitor monitor_ose_myapp_jialiu pool_ose_myapp_jialiu_80 http
Deleting monitor monitor_ose_myapp_jialiu from pool pool_ose_myapp_jialiu_80 of type http.
I, [2015-03-19T02:24:41.687544 #2114]  INFO -- : Initializing controller...
I, [2015-03-19T02:24:41.687964 #2114]  INFO -- : Initializing F5 iControl REST interface model...
D, [2015-03-19T02:24:41.713858 #2114] DEBUG -- : Deleting monitor monitor_ose_myapp_jialiu, pool_ose_myapp_jialiu_80, http
I, [2015-03-19T02:24:41.713904 #2114]  INFO -- : Requesting list of monitors from load balancer...
RestClient::BadRequest: 400 Bad Request
/opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.1/lib/restclient/abstract_response.rb:48:in `return!'
/opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.1/lib/restclient/request.rb:220:in `process_result'
/opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.1/lib/restclient/request.rb:169:in `block in transmit'
/opt/rh/ruby193/root/usr/share/ruby/net/http.rb:746:in `start'
/opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.1/lib/restclient/request.rb:166:in `transmit'
/opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.1/lib/restclient/request.rb:60:in `execute'
/opt/rh/ruby193/root/usr/share/gems/gems/rest-client-1.6.1/lib/restclient/request.rb:31:in `execute'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.22.2.3/lib/openshift/routing/models/f5-icontrol-rest.rb:40:in `rest_request'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.22.2.3/lib/openshift/routing/models/f5-icontrol-rest.rb:77:in `delete'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.22.2.3/lib/openshift/routing/models/f5-icontrol-rest.rb:117:in `delete_monitor'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.22.2.3/lib/openshift/routing/controllers/simple.rb:109:in `delete_monitor'
/usr/sbin/oo-admin-ctl-routing:29:in `method_missing'
/usr/sbin/oo-admin-ctl-routing:210:in `block in <main>'
/usr/sbin/oo-admin-ctl-routing:158:in `each'
/usr/sbin/oo-admin-ctl-routing:158:in `<main>'

Besides the above issue, there is one more minor issue: the help message is not consistent.
# oo-admin-ctl-routing -h
<--snip-->
delete-monitor <name> <pool> <type>
  Delete the specified monitor.
<--snip-->

# oo-admin-ctl-routing delete-monitor
Requires a pool name and monitor type.

Comment 11 Miciah Dashiel Butler Masters 2015-03-19 14:08:10 UTC
Is monitor_ose_myapp_jialiu still associated with pool_ose_myapp_jialiu_80 when you try to delete it? F5 will not allow a monitor to be deleted if it is associated with any pool.  (Given that delete-monitor requires the pool name as an argument, this is confusing, granted.)

I have a pull request pending that improves error handling, makes the pool name an optional argument to delete-monitor, and adds a subcommand to oo-admin-ctl-routing to dissociate a monitor from a pool, which will allow it to be deleted: https://github.com/openshift/origin-server/pull/6102
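For context, dissociating before deleting roughly corresponds to something like the following at the iControl REST level. This is only a sketch: the pool-update payload and the put/delete helper methods used here are assumptions, not the project's actual code.

~~~
require 'json'

# Rough sketch only; payload format and helper methods are assumptions.
# 1. Dissociate the monitor from the pool by clearing the pool's monitor setting.
put(url: "https://#{@host}/mgmt/tm/ltm/pool/#{pool_name}",
    payload: { 'monitor' => 'none' }.to_json)

# 2. With no pool referencing it, F5 should now allow the monitor to be deleted.
delete(url: "https://#{@host}/mgmt/tm/ltm/monitor/#{monitor_type}/#{monitor_name}")
~~~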

Comment 12 Johnny Liu 2015-03-20 02:57:28 UTC
(In reply to Miciah Dashiel Butler Masters from comment #11)
> Is monitor_ose_myapp_jialiu still associated with pool_ose_myapp_jialiu_80
> when you try to delete it? 

Yeah, it is.

Comment 14 Johnny Liu 2015-03-26 07:49:20 UTC
Re-tested this bug with rubygem-openshift-origin-routing-daemon-0.23.1.1-1.el6op.noarch, but it failed.


Create a scaling app with F5 LB.
# oo-admin-ctl-routing list-pools; oo-admin-ctl-routing list-aliases; oo-admin-ctl-routing list-monitors
Listing pools.
<--SNIP-->
  pool_ose_myapp_jialiu_80 (1 members)
<--SNIP-->
Listing aliases for all pools.
<--SNIP-->
Pool pool_ose_myapp_jialiu_80 has alias ha-myapp-jialiu.example.com.
<--SNIP-->
Listing monitors.
<--SNIP-->
monitor_ose_myapp_jialiu
<--SNIP-->

Issue 1: a traceback is seen when deleting a monitor with the <name> + <pool> + <type> options.
# oo-admin-ctl-routing delete-monitor monitor_ose_myapp_jialiu pool_ose_myapp_jialiu_80 http
Deleting monitor monitor_ose_myapp_jialiu from pool http.
I, [2015-03-26T03:24:04.760936 #12599]  INFO -- : Initializing controller...
I, [2015-03-26T03:24:04.761387 #12599]  INFO -- : Initializing F5 iControl REST interface model...
I, [2015-03-26T03:24:05.010365 #12599]  INFO -- : Requesting list of pools from load balancer...
NoMethodError: undefined method `delete_monitor' for nil:NilClass
/usr/sbin/oo-admin-ctl-routing:220:in `block in <main>'
/usr/sbin/oo-admin-ctl-routing:164:in `each'
/usr/sbin/oo-admin-ctl-routing:164:in `<main>'
 
But "oo-admin-ctl-routing delete-monitor monitor_ose_myapp_jialiu http pool_ose_myapp_jialiu_80", after checked the patch code, obviously, when 3 parameters are given, the logic pick up wrong parameter from command line.

Issue 2: as a workaround for issue 1, I deleted the monitor successfully via the command line.
After that, I deleted the app via the rhc command and then created the app again, and saw the following error log:
I, [2015-03-26T03:40:48.461710 #7702]  INFO -- : Creating new monitor monitor_ose_myapp_jialiu with path /health_check.php
W, [2015-03-26T03:40:48.461870 #7702]  WARN -- : Got an exception: Monitor already exists: monitor_ose_myapp_jialiu
D, [2015-03-26T03:40:48.461951 #7702] DEBUG -- : Backtrace:
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.23.1.1/lib/openshift/routing/controllers/simple.rb:110:in `create_monitor'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.23.1.1/lib/openshift/routing/daemon.rb:307:in `create_application'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.23.1.1/lib/openshift/routing/daemon.rb:343:in `add_endpoint'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.23.1.1/lib/openshift/routing/daemon.rb:254:in `handle'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-routing-daemon-0.23.1.1/lib/openshift/routing/daemon.rb:226:in `listen'
/etc/init.d/openshift-routing-daemon:94:in `block (2 levels) in <main>'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:215:in `call'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:215:in `block in start_proc'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/daemonize.rb:192:in `call'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/daemonize.rb:192:in `call_as_daemon'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:219:in `start_proc'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/application.rb:255:in `start'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/controller.rb:69:in `run'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons.rb:188:in `block in run_proc'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/cmdline.rb:105:in `call'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons/cmdline.rb:105:in `catch_exceptions'
/opt/rh/ruby193/root/usr/share/gems/gems/daemons-1.0.10/lib/daemons.rb:187:in `run_proc'
/etc/init.d/openshift-routing-daemon:93:in `block in <main>'
/etc/init.d/openshift-routing-daemon:37:in `block (2 levels) in locked'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.29.1.1/lib/openshift-origin-common/utils/path_utils.rb:94:in `block in flock'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.29.1.1/lib/openshift-origin-common/utils/path_utils.rb:88:in `open'
/opt/rh/ruby193/root/usr/share/gems/gems/openshift-origin-common-1.29.1.1/lib/openshift-origin-common/utils/path_utils.rb:88:in `flock'
/etc/init.d/openshift-routing-daemon:36:in `block in locked'
/opt/rh/ruby193/root/usr/share/ruby/timeout.rb:69:in `timeout'
/etc/init.d/openshift-routing-daemon:35:in `locked'
/etc/init.d/openshift-routing-daemon:80:in `<main>'


And run "oo-admin-ctl-routing list-pools; oo-admin-ctl-routing list-aliases; oo-admin-ctl-routing list-monitors", found there is even no pool created, though app is created successfully from command line.

Once issue 2 happens, the user has to restart the routing service.

Issue 3 (minor):
raise ArgumentError.new "Requires a pool name and monitor type." unless [2,3].include? argv.length

The pool name parameter is optional, so it should be "Requires a monitor name and monitor type."
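That is, the same check with the corrected message would read:

raise ArgumentError.new "Requires a monitor name and monitor type." unless [2,3].include? argv.length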

Comment 15 Johnny Liu 2015-03-26 07:54:35 UTC
To be clearer about issue 1 in comment 14: the help message misleads the user:

delete-monitor <name> [<pool>] <type>
Delete the specified monitor.

Comment 16 Miciah Dashiel Butler Masters 2015-03-26 18:33:49 UTC
Issue 1: It makes sense for the monitor name and type to be together and for the optional argument to be last, so rather than changing the command, I changed the help text to match the command: delete-monitor <name> <type> [<pool>].


Issue 2: The exception is rescued in OpenShift::RoutingDaemon#handle, so its caller, OpenShift::RoutingDaemon#listen, which runs the listen loop, should have continued doing its work without the need to restart the daemon. Routing to myapp can be expected to be broken because you have modified the configuration related to myapp and put it in an inconsistent state, but the daemon should have continued to handle events for other applications without problems.

Why did you need to restart the routing daemon after the error in Issue 2? What errors were you seeing? Are you saying that the daemon was no longer handling events related to other applications after the error related to myapp?

Today I've made two changes:

• Added a rescue to the invocation of @lb_controller.create_monitor in OpenShift::RoutingDaemon#create_application so the routing daemon will still try to create the pool even if it cannot create the monitor.  That is, the daemon will now keep going and try a little harder to set up routing for the application when it runs into this kind of problem.  (Note that when deleting an application, the daemon deletes the pool *before* it tries to delete the monitor, so an error deleting the monitor will prevent the daemon from cleaning up the pool.)

• Added code to the controller so that it (1) refreshes its cached list of monitors from the load balancer if the controller is told to create a monitor that already exists in the controller's list, or if it is told to delete a monitor that does not exist in its cached list of monitors, and then (2) checks the cache again before raising an exception.  That is, now the daemon will refresh its state from the load balancer if the daemon is told to do something that is inconsistent with its current state.

You are manually putting the configuration in an inconsistent state with the oo-admin-ctl-routing command, and the daemon shouldn't necessarily be expected to be able to cope perfectly with deliberate sabotage, but it should cope better now.  Following the steps you describe in comment 14, you can expect to see an error when you delete the application using rhc for which you have already deleted the monitor using oo-admin-ctl-routing, but the daemon should continue operating without any other problems.  Please explain if I am misunderstanding the situation.
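A rough sketch of the refresh-before-raise behaviour described above; apart from delete_monitor and LBControllerException (both mentioned in this bug), the names are assumptions:

~~~
# Rough sketch only, not the actual controller code; @monitors, @lb_model
# and get_monitors are assumed names for the cached state and model API.
def delete_monitor monitor_name, monitor_type, pool_name=nil
  unless @monitors.include? monitor_name
    # The cached state disagrees with the request: refresh from the load
    # balancer, then check again before raising.
    @monitors = @lb_model.get_monitors
    unless @monitors.include? monitor_name
      raise LBControllerException.new "Monitor not found: #{monitor_name}"
    end
  end
  # ... proceed with the actual deletion ...
end
~~~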


Issue 3: Changed "pool name" to "monitor name" in the error message.

Thanks!

PR: https://github.com/openshift/origin-server/pull/6112

Comment 17 openshift-github-bot 2015-03-26 19:53:29 UTC
Commits pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/b3430784e3e9ad381677cf9b75c322ec5bd8458e
oo-admin-ctl-routing: Fix delete-monitor help text

Fix the help text for oo-admin-ctl-routing's delete-monitor command.

The delete-monitor command takes arguments <name> <type> [<pool>].  It
makes sense for the monitor name and type to be together and for the
optional argument to be last, so rather than changing ordering of arguments
in the command, it makes more sense to change the help text to match the
way the command actually works.

This commit is related to bug 1199904.

https://github.com/openshift/origin-server/commit/edccadd835fdb08e1772143fdc2ad443b82ea3e6
oo-admin-ctl-routing: Fix delete-monitor error msg

Fix the error message for oo-admin-ctl-routing's delete-monitor command
when it is given a wrong number of arguments.  The pool name is optional,
but the monitor name is always required.

This commit is related to bug 1199904.

https://github.com/openshift/origin-server/commit/b24b436e5881c2c6e8e596e731bdcf1667f6e506
routing-daemon: Refresh monitors in case of error

Modify the controllers' create_monitor and delete_monitor methods to
refresh the controller's cached list of monitors from the load balancer
when the controller is told to create a monitor that already exists in
the controller's list, or if it is told to delete a monitor that does
not exist in its list.

After this commit, the routing daemon is better able to cope with the
load-balancer's state's being modified from outside the routing daemon
while the routing daemon is running.

This commit is related to bug 1199904.

https://github.com/openshift/origin-server/commit/2bfadca0ac23452486b3b25244e4008b583a02d6
routing-daemon: Try harder to create pool

Modify OpenShift::RoutingDaemon#create_application to rescue and log any
LBControllerException raised in the controller's create_monitor method
so that create_application will continue on to try to create the pool.

The goal of this commit is to make the routing daemon try a little harder
to cope with inconsistent state.

This commit is related to bug 1199904.
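As a rough illustration of this change (only create_application, create_monitor and LBControllerException come from this bug; the other names are assumptions):

~~~
# Sketch of the rescue described in the commit message above, not the
# actual daemon code; create_pool, @logger and the argument names are
# assumptions for illustration.
begin
  @lb_controller.create_monitor monitor_name, monitor_path, up_code
rescue LBControllerException => e
  # Log the failure but keep going so the pool still gets created.
  @logger.warn "Could not create monitor #{monitor_name}: #{e.message}"
end
@lb_controller.create_pool pool_name
~~~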

Comment 21 Johnny Liu 2015-03-27 03:37:22 UTC
Verified this bug with rubygem-openshift-origin-routing-daemon-0.23.2.0-1.el6op.noarch, and PASS.


Issues 1 and 3 are fixed. Issue 2 did not reproduce in the latest build.

Create a scaling app with F5 LB.
# oo-admin-ctl-routing list-pools; oo-admin-ctl-routing list-aliases; oo-admin-ctl-routing list-monitors
Listing pools.
<--SNIP-->
  pool_ose_myapp_jialiu_80 (1 members)
<--SNIP-->
Listing aliases for all pools.
<--SNIP-->
Pool pool_ose_myapp_jialiu_80 has alias ha-myapp-jialiu.example.com.
<--SNIP-->
Listing monitors.
<--SNIP-->
monitor_ose_myapp_jialiu
<--SNIP-->

# oo-admin-ctl-routing -h
<--SNIP-->
list-monitors
  List all monitors.
create-monitor <name> <path> <up-code>
  Create a new monitor with the specified name that queries <path>
  and expects to receive <up-code> to indicate that the pool is up.
delete-monitor <name> <type> [<pool>]
  Delete the specified monitor.
add-pool-monitor <pool> <monitor>
  Associate the specified monitor with the specified pool.
delete-pool-monitor <pool> <monitor>
  Dissociate the specified monitor from the specified pool.
list-pool-monitors <pool>
  List all monitors associated with the specified pool.
<--SNIP-->

# oo-admin-ctl-routing delete-monitor
Requires a monitor name and monitor type.


# oo-admin-ctl-routing delete-monitor monitor_ose_myapp_jialiu http pool_ose_myapp_jialiu_80
Deleting monitor monitor_ose_myapp_jialiu from pool pool_ose_myapp_jialiu_80.
I, [2015-03-26T23:26:31.182914 #13579]  INFO -- : Initializing controller...
I, [2015-03-26T23:26:31.183369 #13579]  INFO -- : Initializing F5 iControl REST interface model...
I, [2015-03-26T23:26:31.209437 #13579]  INFO -- : Requesting list of pools from load balancer...
Deleting monitor monitor_ose_myapp_jialiu of type http.
D, [2015-03-26T23:26:31.421819 #13579] DEBUG -- : Deleting monitor monitor_ose_myapp_jialiu, pool_ose_myapp_jialiu_80, http
I, [2015-03-26T23:26:31.421871 #13579]  INFO -- : Requesting list of monitors from load balancer...


About question in comment 16:
> Why did you need to restart the routing daemon after the error in Issue 2?
> What errors were you seeing? Are you saying that the daemon was no longer
> handling events related to other applications after the error related to myapp?

In yesterday's build, just as you said, routing to myapp was expected to be broken because I had modified the configuration related to myapp and put it in an inconsistent state; when deleting myapp, I saw an exception that the requested monitor was not found. This is expected behaviour.
The error I pasted in comment 14 occurred when I created another app (I deleted myapp and then created a new myapp). And yes, I am indeed saying that the daemon was no longer handling events related to other applications after the error related to myapp.

However, when I tried the same test scenarios against today's build, they passed.

Comment 23 errata-xmlrpc 2015-04-06 17:06:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0779.html

