Bug 1362163

Summary: Improve configurability of ipfailover container(Track https://trello.com/c/228zu7Br/267-5-improve-the-configurability-of-the-ipfailover-container)
Product: OpenShift Container Platform Reporter: Miheer Salunke <misalunk>
Component: RFEAssignee: Phil Cameron <pcameron>
Status: CLOSED NOTABUG QA Contact: Johnny Liu <jialiu>
Severity: urgent Docs Contact:
Priority: medium    
Version: 3.1.0CC: aloughla, aos-bugs, bbennett, erich, jkaur, jokerman, mcurry, mmccomas, pcameron, stwalter
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1466033 (view as bug list) Environment:
Last Closed: 2017-12-12 15:30:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1466033    

Description Miheer Salunke 2016-08-01 12:38:46 UTC
1. Proposed title of this feature request  
Improve configurability of ipfailover container


3. What is the nature and description of the request?  

There is concern that the keepalived configuration doesn't monitor the IP address assignment is still current (restarting networking manually, or as a side-effect from yum update, or by an admin in error). Someone may also wish to add extra monitoring (of another service perhaps) to the check script. Similarly, they may wish to customize the actions taken when it cuts over (e.g. to program something upstream).

It is probably better to allow extension of the keepalived configuration to allow the user to specify custom check and notify scripts, or parts of them: https://tobrunet.ch/2013/07/keepalived-check-and-notify-scripts/

We may also want to allow the check period to be configured.

Part of this task is to look at the design and see what we want to allow to be customized. Ideally, we could customize the whole config, but given that it uses shell scripts at the moment rather than a templatized config file, we may not be able to do too much in that direction.

Acceptance Criteria
- Add a way to specify custom user-specified check and notify scripts to the keepalived configuration
- Document the method, and why you want to do it




7. Is there already an existing RFE upstream or in Red Hat Bugzilla?  
 
https://trello.com/c/228zu7Br/267-5-improve-the-configurability-of-the-ipfailover-container
    
10. List any affected packages or components.
ipfailover image

Comment 2 Phil Cameron 2016-10-11 21:11:39 UTC
This can be done with a configmap

root@wsfd-netdev22: ~ # oc get po | grep ipf
ipf-ha-router-5-ktol1      1/1       Running   0          3h
ipf-ha-router-5-t1xbd      1/1       Running   0          3h
root@wsfd-netdev22: ~ # oc rsh ipf-ha-router-5-ktol1 cat /etc/keepalived/keepalived.conf > save-file

Edit the save-file to add the desired checks and notifies then create a config map
with the file. Edit the dc/ipf-..... to include the config map setup and 
an ENV, KEEPALIVED_OPTIONS="-f ppp" pointing to config map mount point.

The config map provides a complete replacement for the keepalived.conf file so you can do anything that keepalived supports.

I will verify that this will work.

Comment 3 Phil Cameron 2016-10-12 12:52:30 UTC
bbennett This corresponds to the Trello card I am working on, so I took the bug.

Comment 4 Phil Cameron 2016-10-12 17:03:17 UTC
Spoke to Ben about this. The comment 2 approach is not flexible enough. The keepalived.conf file is created every time the pod is started and items can change in the file. A statically captured file passed in the config map won't have the changes. 

Instead we will have a env var for the path names to the check and notify scripts. One way to supply the check and notify scripts is through configmaps.

Comment 5 Phil Cameron 2016-12-09 18:21:50 UTC
Code changes:
https://github.com/openshift/origin/pull/11644

Documentation changes:
https://github.com/openshift/openshift-docs/pull/3355

Comment 6 openshift-github-bot 2016-12-19 19:13:03 UTC
Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/9ee13b59401fe86c867fa6a498673785a210f254
Ipfailover check and notify script support

Openshift 3.5 feature.

Add options to 'oadm ipfailover' to configure the check script and
notify scripts and also control the period the check script runs.

Keepalived periodically checks whether the application is running
properly.  In the default case the test is a simple verification that
something is listening on the watch port. This PR permits the user to
supply an additional check script that is run in the ipfailover container
context to verify that the application is operating properly. For
example, a web server can be tested by accessing the watch port and
verifying the response.

Whenever a node changes state to MASTER, BACKUP, or FAULT a notify
script can be called. This script has 3 parameters filled in by
keepalived:
$1 - "GROUP"|"INSTANCE"
$2 - name of group or instance
$3 - target state of transition ("MASTER"|"BACKUP"|"FAULT")

--check-script="check_script"
The check script is a script in the keepalived container that verifies
the service is running properly. The script must return 0 for OK and 1
for FAIL.
These checks are in addition to verifying that the watch port is
listening.

--notify-script="notify_script"
The notify script is a script in the keepalived container that is
called whenever the keepalived state transitions to
(MASTER|BACKUP|FAULT)

--check-interval=
The check script is run every seconds. Default is 2.

Note: the scripts name is the full path to the script.

Fixes bug 1362163
https://trello.com/c/228zu7Br/267-5-improve-the-configurability-of-the-ipfailover-container-operations

Signed-off-by: Phil Cameron <pcameron>

Comment 7 Phil Cameron 2017-01-05 20:08:09 UTC
*** Bug 1363697 has been marked as a duplicate of this bug. ***

Comment 11 Phil Cameron 2017-06-28 20:54:26 UTC
Cloned this bug to add the preemption strategy support see 1466033