Bug 1217572

Summary:

[RFE] routing daemon should have a sync option for F5

Product:

OpenShift Container Platform

Reporter:

Eric Rich <erich>

Component:

Networking

Assignee:

Miciah Dashiel Butler Masters <mmasters>

Networking sub component:

router

QA Contact:

Anping Li <anli>

Status:

CLOSED ERRATA

Docs Contact:

Severity:

unspecified

Priority:

high

CC:

adellape, anli, aos-bugs, bleanhar, erich, mmasters, nicholas_schuetz, pep, philfest, rhowe, sauchter, tiwillia, xtian

Version:

2.2.0

Flags:

anli: needinfo-

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

rubygem-openshift-origin-routing-daemon-0.25.1.2-1.el6op

Doc Type:

Enhancement

Doc Text:

After performing updates to F5 BIG-IP, the routing daemon should call the F5 iControl REST API to synchronize F5 BIG-IP's configuration within a preconfigured device-group. This enables a system administrator to set up an F5 BIG-IP cluster (or "device group" in F5 terminology) for high availability and have configuration automatically synchronized within the cluster. This enhancement adds a new setting BIGIP_DEVICE_GROUP, and the F5 iControl REST API model was changed to read the value for this setting and, if a value is set, update the specified device group. The routing daemon can now be configured to initiate a configuration synchronization for a configured F5 BIG-IP device group. The routing daemon will initiate this synchronization at an interval specified with the existing UPDATE_INTERVAL setting (default value 5).

Story Points:

---

Clone Of:

Clones:

1278947

Environment:

Last Closed:

2015-09-30 16:37:34 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

1278947

Attachments:

Description	Flags
Failed to create openshift_application_aliases during initialization	none

Description Eric Rich 2015-04-30 17:06:35 UTC

Description of problem:

There should be a way to configure the routing daemon to run a final configuration 'sync' command on the LTM once it has finished its updates.

This will then use the F5's clustering (if configured) to sync configuration between the load balancers. 

This is important when you have a pair of LTMs set up for HA, because it allows them to share the same configuration.

Additional info:

As a current workaround, multiple daemons should be used to configure multiple load balancers. Because we use a topic to send messages, a published message goes to all interested subscribers (routing daemons), so zero to many subscribers will receive a copy of the message.
   Note: Only subscribers who had an active subscription at the time the broker receives the message will get a copy of the message.
         (So it is possible for load balancers to get out of sync if connections to the message broker are lost.) That is the driver for this request.
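
To illustrate the delivery behavior described in the note, here is a minimal in-memory sketch. It is a toy model, not the message broker client the routing daemon uses; the class names, method names, and message strings are all hypothetical.

```ruby
# Toy model of topic (publish/subscribe) delivery. Every name here is
# hypothetical and chosen only for illustration.
class SketchTopic
  def initialize
    @subscribers = []
  end

  def subscribe(subscriber)
    @subscribers << subscriber
  end

  def unsubscribe(subscriber)
    @subscribers.delete(subscriber)
  end

  # Deliver a copy of the message to every currently active subscriber.
  # Returns the number of copies delivered (zero to many).
  def publish(message)
    @subscribers.each { |subscriber| subscriber.receive(message) }
    @subscribers.size
  end
end

# Stands in for one routing daemon configuring one load balancer.
class SketchDaemon
  attr_reader :applied

  def initialize
    @applied = []
  end

  def receive(message)
    @applied << message
  end
end

# Publish two messages while the second daemon is disconnected for the
# second one. Returns what each daemon applied.
def simulate_missed_message
  topic = SketchTopic.new
  daemon_a = SketchDaemon.new
  daemon_b = SketchDaemon.new
  topic.subscribe(daemon_a)
  topic.subscribe(daemon_b)

  topic.publish('create_application app1')    # both daemons receive this
  topic.unsubscribe(daemon_b)                 # daemon_b loses its connection
  topic.publish('add_alias app1 example.com') # only daemon_a receives this
  topic.subscribe(daemon_b)                   # reconnecting replays nothing

  [daemon_a.applied, daemon_b.applied]
end
```

After the simulation, the second daemon has applied only the first message, so the load balancer it manages no longer matches the first one. A sync performed on the F5 side would not depend on every daemon seeing every message.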

Comment 2 Miciah Dashiel Butler Masters 2015-05-01 18:53:18 UTC

The iControl REST User Guides for 11.4, 11.5, or 11.6 do not document any config-sync API, but I found a report that such an API does in fact exist in 11.5 and later:

    https://devcentral.f5.com/questions/rest-api-and-config-sync-question

Per the comments at the above link, the following should work:

    curl -svku admin:password https://bigip_host/mgmt/tm/util/tmsh -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{"apiOptions":"to-group sync-failover-1"}'

I tried this command and got back an HTTP 400 Bad Request with the error message "Missing name." The reason for this error may be that the F5 deployment I have available for testing is not a clustered environment, so I have no device groups configured.

Is there any chance you could test the above curl command in a clustered F5 deployment to confirm that it works and does what the customer is expecting? If it does, we can add the REST call to the routing daemon and add two settings in routing-daemon.conf for the device-group name ("sync-failover-1") and the interval at which the config-sync should be performed.

Comment 3 Eric Rich 2015-05-01 20:37:52 UTC

(In reply to Miciah Dashiel Butler Masters from comment #2)
> the interval at which the config-sync should be performed.

Shouldn't this just be run after each update event?

Comment 4 Miciah Dashiel Butler Masters 2015-05-02 05:08:18 UTC

I get the impression that a config-sync is a somewhat heavy-weight operation, so we would want to rate-limit it.  Whether the rate limit should be on the order of once per second, once per minute, or once per hour is something we may need to research or get some advice on from an F5 engineer.
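
One way such a rate limit could look, as a sketch only: the helper below runs a block at most once per configured interval. The class name and the injectable clock are assumptions made for illustration and are not part of the routing daemon.

```ruby
# Hypothetical helper: runs a block at most once per min_interval seconds.
class SketchRateLimiter
  # clock is any callable returning the current time in seconds; it is
  # injectable so the behavior can be checked without waiting.
  def initialize(min_interval, clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @min_interval = min_interval
    @clock = clock
    @last_run = nil
  end

  # Runs the block and returns true if at least min_interval seconds have
  # passed since the last run; otherwise skips the block and returns false.
  def run
    now = @clock.call
    return false if @last_run && (now - @last_run) < @min_interval

    @last_run = now
    yield
    true
  end
end
```

With an interval of 5, a call at time 0 runs, a call at time 3 is skipped, and a call at time 5 runs again. The right interval for a real config-sync is still the open question raised above.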

Comment 5 nicholas_schuetz 2015-05-05 00:09:26 UTC

(In reply to Miciah Dashiel Butler Masters from comment #2)
> The iControl REST User Guides for 11.4, 11.5, or 11.6 do not document any
> config-sync API, but I found a report that such an API does in fact exist in
> 11.5 and later:
> 
>     https://devcentral.f5.com/questions/rest-api-and-config-sync-question
> 
> Per the comments at the above link, the following should work:
> 
>     curl -svku admin:password https://bigip_host/mgmt/tm/util/tmsh -X POST
> -H 'Content-Type: application/json' -H 'Accept: application/json' -d
> '{"apiOptions":"to-group sync-failover-1"}'
> 
> I tried this command and got back an HTTP 400 Bad Request with the error
> message "Missing name." The reason for this error may be that the F5
> deployment I have available for testing is not a clustered environment, so I
> have no device groups configured.
> 
> Is there any chance you could test the above curl command in a clustered F5
> deployment to confirm that it works and does what the customer is expecting?
> If it does, we can add the REST call to the routing daemon and add two
> settings in routing-daemon.conf for the device-group name
> ("sync-failover-1") and the interval at which the config-sync should be
> performed.

I get an HTTP 400 when running that command in a Sync-Failover configuration.

However, I did find that this command successfully does the sync:

curl -svku 'admin:password' https://bigip_host/mgmt/tm/cm -H 'Content-Type: application/json' -H 'Accept: application/json' -X POST -d '{"command":"run","utilCmdArgs":"config-sync to-group Sync_Failover"}'
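
For reference, the same request could be built in Ruby roughly as follows. This is a sketch using only the standard library; the method name and its parameters are hypothetical, and the endpoint and payload are copied from the curl command above rather than from F5 documentation.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds (but does not send) the config-sync request from the curl command
# above. The method name and parameters are hypothetical.
def build_config_sync_request(host, user, password, device_group)
  uri = URI("https://#{host}/mgmt/tm/cm")
  request = Net::HTTP::Post.new(uri.request_uri)
  request.basic_auth(user, password)
  request['Content-Type'] = 'application/json'
  request['Accept'] = 'application/json'
  payload = {
    'command' => 'run',
    'utilCmdArgs' => "config-sync to-group #{device_group}"
  }
  request.body = JSON.generate(payload)
  [uri, request]
end

# Sending is left as a comment because it needs a reachable BIG-IP.
# curl's -k flag skips certificate verification; VERIFY_NONE is the
# equivalent and is only appropriate for testing.
#
#   uri, request = build_config_sync_request('bigip.example.com', 'admin',
#                                            'password', 'Sync_Failover')
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true,
#                   verify_mode: OpenSSL::SSL::VERIFY_NONE) do |http|
#     http.request(request)
#   end
```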

Comment 6 Miciah Dashiel Butler Masters 2015-05-05 00:25:01 UTC

Thanks, Nicholas Schuetz! What version of F5 did you test against? I'm going to guess that you have 11.4, and that the command that works for you is using an old API that works in 11.4 whereas the command that I provided is the new API in 11.5 onwards.

If we're lucky, the old API also works on newer F5 versions.  I tried your curl command against 11.6.0 and got "01070734:3: Configuration error: Device group (Sync_Failover) not found in device group sync", which suggests to me that we have indeed hit on the correct API.

Comment 7 nicholas_schuetz 2015-05-05 00:30:22 UTC

(In reply to Miciah Dashiel Butler Masters from comment #6)
> Thanks, Nicholas Schuetz! What version of F5 did you test against?

I tested this on 11.5.2.

Comment 8 Miciah Dashiel Butler Masters 2015-05-28 21:36:00 UTC

PR: https://github.com/openshift/origin-server/pull/6154

I will need to perform some manual testing and get the PR merged before I can mark this report ON_QA.

Comment 13 Anping Li 2015-08-28 08:30:44 UTC

QE is now starting to set up an F5 BIG-IP instance in AWS. We foresee needing a lot of help from you; thanks in advance. To reduce noise, issues that block QE from setting up the environment will be discussed via email.
Once the environment is set up, QE will verify this bug.

Comment 14 openshift-github-bot 2015-08-31 20:47:39 UTC

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c
routing-daemon: F5: Sync device-group on update

routing-daemon.conf: Add commented-out example BIGIP_DEVICE_GROUP setting.

F5IControlRestLoadBalancerModel#read_config: Read BIGIP_DEVICE_GROUP
setting and assign it to @device_group.

F5IControlRestLoadBalancerModel: Add update method that synchronizes
the device group if @device_group is set.

This commit fixes bug 1217572.
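
The behavior described in this commit message can be pictured with the sketch below. It is not the code from the commit: the class is a hypothetical stand-in for F5IControlRestLoadBalancerModel, the configuration is assumed to consist of KEY=value lines, and the REST call is replaced by recording the group name.

```ruby
# Hypothetical stand-in for the model described in the commit message. The
# real F5IControlRestLoadBalancerModel is not reproduced here.
class SketchLoadBalancerModel
  attr_reader :device_group, :synced_groups

  def initialize
    @device_group = nil
    @synced_groups = []
  end

  # Parse KEY=value lines (an assumed format), ignoring blank lines and
  # '#' comments, and keep the BIGIP_DEVICE_GROUP value if one is set.
  def read_config(text)
    settings = {}
    text.each_line do |line|
      line = line.strip
      next if line.empty? || line.start_with?('#')

      key, value = line.split('=', 2)
      next if value.nil?

      # Strip one leading and one trailing quote character, if present.
      settings[key.strip] = value.strip.gsub(/\A["']|["']\z/, '')
    end
    group = settings['BIGIP_DEVICE_GROUP']
    @device_group = group unless group.nil? || group.empty?
    settings
  end

  # Synchronize only when a device group is configured.
  def update
    return false if @device_group.nil?

    # A real model would call the F5 REST API here.
    @synced_groups << @device_group
    true
  end
end
```

With the setting commented out, update does nothing, which would leave existing deployments unchanged until an administrator opts in by setting a device group.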

Comment 15 Anping Li 2015-09-09 07:46:14 UTC

Created attachment 1071598
Failed to create openshift_application_aliases during initialization

Comment 16 Miciah Dashiel Butler Masters 2015-09-09 15:53:51 UTC

Do you encounter any problems if you `yum install ruby193-rubygem-rest-client` before starting the daemon?

Comment 17 Miciah Dashiel Butler Masters 2015-09-09 16:31:16 UTC

Actually, some of those errors are coming from a problem that this pull request fixes: https://github.com/openshift/origin-server/pull/6234

Thanks!

Comment 18 Anping Li 2015-09-09 23:30:20 UTC

That is great. I had created openshift_application_aliases manually, so it is not a blocker.

Comment 19 Anping Li 2015-09-10 10:06:27 UTC

When is update called? It looks like update is in an endless loop.

Comment 20 Miciah Dashiel Butler Masters 2015-09-10 13:28:32 UTC

update is called by the daemon in its listen loop:

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/daemon.rb#L243

The UPDATE_INTERVAL setting in /etc/openshift/routing-daemon.conf (with a default value of 5) specifies the interval at which the daemon calls update:

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/conf/routing-daemon.conf#L6

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/daemon.rb#L87

If you are trying to trace the callpath, note that SimpleLoadBalancerController inherits its update method from LoadBalancerController, and these controllers' update method simply calls the update method of the model:

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/controllers/simple.rb#L14

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/controllers/load_balancer.rb#L143

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/models/f5-icontrol-rest.rb#L337
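
As a simplified picture of that call path, the sketch below uses hypothetical stand-in classes rather than the real ones, and a bounded loop rather than the daemon's actual listen loop, which also processes messages. The injectable sleeper is an assumption added so the loop can be exercised without waiting.

```ruby
# Hypothetical stand-in for the load-balancer model.
class SketchModel
  attr_reader :update_calls

  def initialize
    @update_calls = 0
  end

  def update
    @update_calls += 1
  end
end

# Stand-in for LoadBalancerController: update simply delegates to the model.
class SketchController
  def initialize(model)
    @model = model
  end

  def update
    @model.update
  end
end

# Stand-in for SimpleLoadBalancerController: update is inherited unchanged.
class SketchSimpleController < SketchController
end

# Simplified loop: call update, then wait update_interval seconds. The
# iteration count is bounded here only so the sketch terminates.
def listen(controller, update_interval, iterations, sleeper = ->(seconds) { sleep(seconds) })
  iterations.times do
    controller.update
    sleeper.call(update_interval)
  end
end
```

Seen this way, update being called over and over is expected: it runs once per pass of the loop for as long as the daemon is running.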

Comment 21 Anping Li 2015-09-11 14:44:37 UTC

The fix works. The ConfigSync status changed from 'Awaiting Initial Sync' to 'In Sync' after a new application was created in OpenShift.

Comment 22 Anping Li 2015-09-11 15:40:01 UTC

The F5 cluster can sync configuration data now. I downloaded the code and did some testing; the bug fix works well.

Waiting for new puddle to do more verification.

Comment 23 nicholas_schuetz 2015-09-11 20:54:30 UTC

Does the sync save and persist the F5 configuration as well?

Comment 24 Anping Li 2015-09-14 01:09:14 UTC

(In reply to Nicholas Schuetz from comment #23)
> Does the sync save and persist the F5 configuration as well?

Yes, the data is saved and persists on the failover F5 instance.

Comment 26 Anping Li 2015-09-15 08:05:20 UTC

Moving to MODIFIED status. Once the puddle is kicked off, I will verify it.

Comment 30 Anping Li 2015-09-18 06:39:02 UTC

rubygem-openshift-origin-routing-daemon-0.25.1.1-1.el6op wasn't in puddle-2-2-2015-09-17.  Another puddle is required.

Comment 36 Miciah Dashiel Butler Masters 2015-09-23 13:25:35 UTC

rubygem-openshift-origin-gear-placement-0.0.2.1-1.el6op should not have been in fixed_in for this Bugzilla report; rather, that package is related to bug 1241750.  I am fixing that mistake in this Bugzilla report and in bug 1241750.

I'll look into why you are still seeing that syntax error.

Comment 38 Anping Li 2015-09-24 06:49:13 UTC

Verified and passed.
The configuration can be synced to the standby F5 instance.
1. Create a Sync-Failover manual device group.
2. Set the device group in routing-daemon.conf.
3. Create scalable applications, add aliases, etc.
4. The new pools and pool members are synced to the standby F5 instance.

Comment 41 errata-xmlrpc 2015-09-30 16:37:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1844.html