Bug 1217572
Summary: | [RFE] routing daemon should have a sync option for F5 | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Eric Rich <erich> |
Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> |
Networking sub component: | router | QA Contact: | Anping Li <anli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | unspecified | | |
Priority: | high | CC: | adellape, anli, aos-bugs, bleanhar, erich, mmasters, nicholas_schuetz, pep, philfest, rhowe, sauchter, tiwillia, xtian |
Version: | 2.2.0 | Flags: | anli: needinfo- |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | rubygem-openshift-origin-routing-daemon-0.25.1.2-1.el6op | Doc Type: | Enhancement |
Doc Text: | After performing updates to F5 BIG-IP, the routing daemon should call the F5 iControl REST API to synchronize F5 BIG-IP's configuration within a preconfigured device-group. This enables a system administrator to set up an F5 BIG-IP cluster (or "device group" in F5 terminology) for high availability and have configuration automatically synchronized within the cluster. This enhancement adds a new setting BIGIP_DEVICE_GROUP, and the F5 iControl REST API model was changed to read the value for this setting and, if a value is set, update the specified device group. The routing daemon can now be configured to initiate a configuration synchronization for a configured F5 BIG-IP device group. The routing daemon will initiate this synchronization at an interval specified with the existing UPDATE_INTERVAL setting (default value 5). | Story Points: | --- |
Clone Of: | | | |
: | 1278947 | Environment: | |
Last Closed: | 2015-09-30 16:37:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1278947 | | |
Attachments: | | | |

Description
Eric Rich
2015-04-30 17:06:35 UTC
The iControl REST User Guides for 11.4, 11.5, or 11.6 do not document any config-sync API, but I found a report that such an API does in fact exist in 11.5 and later:

https://devcentral.f5.com/questions/rest-api-and-config-sync-question

Per the comments at the above link, the following should work:

curl -svku admin:password https://bigip_host/mgmt/tm/util/tmsh -X POST -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{"apiOptions":"to-group sync-failover-1"}'

I tried this command and got back an HTTP 400 Bad Request with the error message "Missing name." The reason for this error may be that the F5 deployment I have available for testing is not a clustered environment, so I have no device groups configured.

Is there any chance you could test the above curl command in a clustered F5 deployment to confirm that it works and does what the customer is expecting? If it does, we can add the REST call to the routing daemon and add two settings in routing-daemon.conf for the device-group name ("sync-failover-1") and the interval at which the config-sync should be performed.

(In reply to Miciah Dashiel Butler Masters from comment #2)
> the interval at which the config-sync should be performed.

Shouldn't this just be run after each update event?

I get the impression that a config-sync is a somewhat heavy-weight operation, so we would want to rate-limit it. Whether the rate limit should be on the order of once per second, once per minute, or once per hour is something we may need to research or get some advice on from an F5 engineer.

(In reply to Miciah Dashiel Butler Masters from comment #2)
> The iControl REST User Guides for 11.4, 11.5, or 11.6 do not document any
> config-sync API, but I found a report that such an API does in fact exist in
> 11.5 and later:
>
> https://devcentral.f5.com/questions/rest-api-and-config-sync-question
>
> Per the comments at the above link, the following should work:
>
> curl -svku admin:password https://bigip_host/mgmt/tm/util/tmsh -X POST
> -H 'Content-Type: application/json' -H 'Accept: application/json' -d
> '{"apiOptions":"to-group sync-failover-1"}'
>
> I tried this command and got back an HTTP 400 Bad Request with the error
> message "Missing name." The reason for this error may be that the F5
> deployment I have available for testing is not a clustered environment, so I
> have no device groups configured.
>
> Is there any chance you could test the above curl command in a clustered F5
> deployment to confirm that it works and does what the customer is expecting?
> If it does, we can add the REST call to the routing daemon and add two
> settings in routing-daemon.conf for the device-group name
> ("sync-failover-1") and the interval at which the config-sync should be
> performed.

I get an HTTP 400 when running that command in a Sync-Failover configuration. However, I did find that this command successfully does the sync:

curl -svku 'admin:password' https://bigip_host/mgmt/tm/cm -H 'Content-Type: application/json' -H 'Accept: application/json' -X POST -d '{"command":"run","utilCmdArgs":"config-sync to-group Sync_Failover"}'

Thanks, Nicholas Schuetz! What version of F5 did you test against? I'm going to guess that you have 11.4, and that the command that works for you is using an old API that works in 11.4 whereas the command that I provided is the new API in 11.5 onwards. If we're lucky, the old API also works on newer F5 versions.

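For readers who want to try the working request from Ruby rather than curl, here is a minimal sketch that builds the same POST to /mgmt/tm/cm. The function names `build_config_sync_request` and `send_config_sync` and their parameters are hypothetical and are not taken from the routing daemon; only the URL path, headers, and JSON body come from the curl command above.

```ruby
require 'json'
require 'net/http'
require 'openssl'
require 'uri'

# Build (but do not send) the config-sync request shown in the curl command.
# Hypothetical helper: the name and parameters are illustrative only.
def build_config_sync_request(host, user, password, device_group)
  uri = URI("https://#{host}/mgmt/tm/cm")
  request = Net::HTTP::Post.new(uri)
  request.basic_auth(user, password)
  request['Content-Type'] = 'application/json'
  request['Accept'] = 'application/json'
  request.body = JSON.generate(
    'command' => 'run',
    'utilCmdArgs' => "config-sync to-group #{device_group}"
  )
  [uri, request]
end

# Send the request. VERIFY_NONE mirrors curl's -k flag and is only
# reasonable for lab systems with self-signed certificates.
def send_config_sync(uri, request)
  Net::HTTP.start(uri.host, uri.port, use_ssl: true,
                  verify_mode: OpenSSL::SSL::VERIFY_NONE) do |http|
    http.request(request)
  end
end
```

Calling `send_config_sync(*build_config_sync_request(host, 'admin', 'password', 'Sync_Failover'))` would issue the request. Whether it succeeds depends on the BIG-IP version and on the named device group actually existing, as the comments in this report show.
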
I tried your curl command against 11.6.0 and got "01070734:3: Configuration error: Device group (Sync_Failover) not found in device group sync", which suggests to me that we have indeed hit on the correct API.

(In reply to Miciah Dashiel Butler Masters from comment #6)
> Thanks, Nicholas Schuetz! What version of F5 did you test against?

I tested this on 11.5.2.

PR: https://github.com/openshift/origin-server/pull/6154

I will need to perform some manual testing and get the PR merged before I can mark this report ON_QA.

QE is starting to set up an F5 BIG-IP instance in AWS now. We foresee needing a lot of help from you. Thanks in advance. To reduce noise, issues that block QE from setting up the environment will be discussed via email. Once the environment is set up, QE will verify this bug.

Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c
routing-daemon: F5: Sync device-group on update

routing-daemon.conf: Add commented-out example BIGIP_DEVICE_GROUP setting.

F5IControlRestLoadBalancerModel#read_config: Read BIGIP_DEVICE_GROUP setting and assign it to @device_group.

F5IControlRestLoadBalancerModel: Add update method that synchronizes the device group if @device_group is set.

This commit fixes bug 1217572.

Created attachment 1071598: Failed to create openshift_application_aliases during initialization

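The behaviour described in the commit message (read BIGIP_DEVICE_GROUP, then synchronize only if it is set) can be illustrated with a small stand-in. This is a sketch, not the code from the commit: `parse_conf`, `SketchF5Model`, and `synced_groups` are hypothetical names, the KEY=VALUE file format is an assumption, and the REST call is replaced by a recorder so the example runs without an F5 device.

```ruby
# Parse KEY=VALUE lines, skipping blank lines and '#' comments.
# Assumption: routing-daemon.conf uses this simple format.
def parse_conf(text)
  text.each_line.with_object({}) do |line, settings|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    key, value = line.split('=', 2)
    next if value.nil?
    settings[key.strip] = value.strip.gsub(/\A["']|["']\z/, '')
  end
end

# Hypothetical stand-in for the F5 iControl REST model.
class SketchF5Model
  attr_reader :synced_groups

  def initialize
    @synced_groups = []
  end

  # Remember the device group, treating a missing or empty value as unset.
  def read_config(settings)
    group = settings['BIGIP_DEVICE_GROUP']
    @device_group = group.nil? || group.empty? ? nil : group
  end

  # Synchronize only when a device group has been configured.
  def update
    return unless @device_group
    sync_device_group(@device_group)
  end

  private

  # Placeholder for the REST call; records the group name instead.
  def sync_device_group(name)
    @synced_groups << name
  end
end
```

With the setting left commented out, as in the shipped example configuration, `update` does nothing, so existing single-device deployments are unaffected.
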
Do you encounter any problems if you `yum install ruby193-rubygem-rest-client` before starting the daemon?

Actually, some of those errors are coming from a problem that this pull request fixes: https://github.com/openshift/origin-server/pull/6234

Thanks! That is great. I had created openshift_application_aliases manually. It is not a blocker.

When is update called? It looks like update is in an endless loop.

update is called by the daemon in its listen loop:

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/daemon.rb#L243

The UPDATE_INTERVAL setting in /etc/openshift/routing-daemon.conf (with a default value of 5) specifies the interval at which the daemon calls update:

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/conf/routing-daemon.conf#L6
https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/daemon.rb#L87

If you are trying to trace the call path, note that SimpleLoadBalancerController inherits its update method from LoadBalancerController, and these controllers' update method simply calls the update method of the model:

https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/controllers/simple.rb#L14
https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/controllers/load_balancer.rb#L143
https://github.com/openshift/origin-server/blob/f783dfab3a0555dab6170d1e7a39e4e5dcf37f1c/routing-daemon/lib/openshift/routing/models/f5-icontrol-rest.rb#L337

The fix works. The ConfigSync status changed from 'Awaiting Initial Sync' to 'In Sync' after a new application was created in OpenShift. The F5 cluster can sync configuration data now. I downloaded the code and did some testing; the bug fix works well.

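The call path traced above (listen loop, then controller, then model) can be sketched with stand-in classes. Everything here is hypothetical: `SketchDaemon`, `SketchController`, `SketchModel`, and the `tick` method are illustrative names, the real listen loop also processes messages, and treating UPDATE_INTERVAL as seconds is an assumption. The clock is injectable so the timing can be exercised without sleeping.

```ruby
# Stand-in model: counts how often update is called.
class SketchModel
  attr_reader :update_count

  def initialize
    @update_count = 0
  end

  def update
    @update_count += 1
  end
end

# Stand-in controller: update simply delegates to the model.
class SketchController
  def initialize(model)
    @model = model
  end

  def update
    @model.update
  end
end

# Stand-in daemon: calls controller.update once per interval.
class SketchDaemon
  MONOTONIC = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  def initialize(controller, update_interval, clock: MONOTONIC)
    @controller = controller
    @update_interval = update_interval
    @clock = clock
    @last_update = @clock.call
  end

  # One pass of the loop body. Returns true if update was called.
  def tick
    now = @clock.call
    return false if now - @last_update < @update_interval
    @controller.update
    @last_update = now
    true
  end
end
```

This shape explains why update looks like it runs in an endless loop: the loop itself never ends, but the interval check limits how often the comparatively heavy config-sync can be triggered.
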
Waiting for a new puddle to do more verification.

Does the sync save and persist the F5 configuration as well?

(In reply to Nicholas Schuetz from comment #23)
> Does the sync save and persist the F5 configuration as well?

Yes, the data are saved and persisted on the failover F5 instance.

Moving to MODIFIED status. Once the puddle is kicked off, I will verify it.

rubygem-openshift-origin-routing-daemon-0.25.1.1-1.el6op wasn't in puddle-2-2-2015-09-17. Another puddle is required.

rubygem-openshift-origin-gear-placement-0.0.2.1-1.el6op should not have been in fixed_in for this Bugzilla report; rather, that package is related to bug 1241750. I am fixing that mistake in this Bugzilla report and in bug 1241750. I'll look into why you are still seeing that syntax error.

Verified and passed. The configuration can be synced to the standby F5 instance.
1. Create a Sync-Failover manual device group.
2. Set the device group in routing-daemon.conf.
3. Create scalable applications, add aliases, etc.
4. The new pools and pool members are synced to the standby F5 instance.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1844.html