Description of problem:
- The ipfailover pod does not clean up its virtual IPs (VIPs).

Version-Release number of selected component (if applicable):
- v3.1.1.6-21-gcd70c35

Steps to Reproduce:
1. Deploy the ipfailover and router pods following https://docs.openshift.com/enterprise/3.1/admin_guide/high_availability.html
2. Check that the nodes on which the ipfailover pods are running have the VIPs.
3. Scale the ipfailover deployment to 0 replicas, or delete the ipfailover pods.
4. Check the nodes on which the ipfailover pods were running: they still have the VIPs.

Actual results:
After the ipfailover pods are removed, the nodes still have the VIPs.

Expected results:
After step 3, the nodes should NOT have the VIPs.
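For reference, a minimal sketch of the reproduction commands, assuming a deployment configuration named "ipf", the eth0 interface, and the two VIPs used later in this bug (the name, selector, and VIP range here are assumptions; the linked guide describes the actual procedure):

# Deploy ipfailover on the labelled nodes (values are illustrative)
$ oadm ipfailover ipf --virtual-ips="192.168.133.111-112" --selector="ha-router=primary" --replicas=2 --create

# On each selected node, confirm the VIPs were assigned
$ ip -4 addr show dev eth0 | grep 192.168.133.11

# Scale the ipfailover pods down (or delete them)
$ oc scale dc/ipf --replicas=0

# Re-check on the nodes: the /32 VIP addresses are still present (the bug)
$ ip -4 addr show dev eth0 | grep 192.168.133.11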
This appears not to be a regression, and there's a manual workaround, so I'm flagging this for the next release.
No, I think it is critical. When the ipfailover pods are restarted, the previous VIPs remain and the ipfailover pods do not work correctly. For example, see below: 192.168.133.111 (a VIP) is duplicated across two nodes.

* Test with two VIPs (192.168.133.111, 192.168.133.112)

Node 1:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:b3:3d:1e brd ff:ff:ff:ff:ff:ff
    inet 192.168.133.3/24 brd 192.168.133.255 scope global dynamic eth0
       valid_lft 3106sec preferred_lft 3106sec
    inet 192.168.133.112/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.133.111/32 scope global eth0
       valid_lft forever preferred_lft forever

Node 2:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:b3:3d:1f brd ff:ff:ff:ff:ff:ff
    inet 192.168.133.4/24 brd 192.168.133.255 scope global dynamic eth0
       valid_lft 2650sec preferred_lft 2650sec
    inet 192.168.133.111/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:feb3:3d1f/64 scope link
       valid_lft forever preferred_lft forever
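One way to spot the duplication is to run the same check on each node and compare; the node names below are hypothetical:

$ for node in node1 node2; do ssh "$node" 'hostname; ip -4 addr show dev eth0 | grep 192.168.133.111'; done
# The same /32 VIP showing up on more than one node indicates the stale assignment.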
(In reply to Ben Bennett from comment #1)
> This appears not to be a regression, and there's a manual workaround, so I'm
> flagging this for the next release.

This might be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1300298. Ben, can you confirm?
No, this is not related to https://bugzilla.redhat.com/show_bug.cgi?id=1300298. Even if the fix for bz#1300298 has been released, this issue (bz#1321989) still makes the ipfailover function unstable. I would like a reply to my comment #2, and to know what Ben means by "manual workaround".
OK, I think by "manual workaround" you mean something like $ ip addr del ... Although I initially thought that would not meet the customer's requirement, I have since noticed that this is a fairly rare case. Even if the ipfailover pods are re-deployed and their old VIPs remain, the pods manage their VIP range by themselves once they are deployed correctly.
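For completeness, a concrete form of that manual cleanup, using the VIP and interface from the output shown above (substitute your own values):

# Run on the node that still holds the stale VIP
$ ip addr del 192.168.133.111/32 dev eth0
# Confirm the /32 address is gone
$ ip -4 addr show dev eth0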
Ram, can you clarify what the expected behavior is? I'm not sure how we would get rid of the VIP assignment when a pod is killed.
@Ben: we don't currently clean up the VIP(s). We can probably do it on a clean/normal shutdown, but for failures (container, CPU, or node down) it could cause issues. Node down is probably OK, since the node would lose the VIP(s) on restart, but if the container/keepalived is forcefully killed we won't be able to handle that. The flip side is that if someone does a forceful kill, the onus of cleanup may need to be on them.

As part of the GitHub issue "ipfailover should handle SIGTERM", I created a PR a few weeks back: https://github.com/openshift/origin/pull/9214 That PR removes the VIPs on shutdown/termination. However, whether that path runs depends on how the process/container is terminated. So for normal cases this would do the cleanup, but a forceful kill is outside that "comfort zone".
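To illustrate the general idea (this is a sketch, not the actual code in PR 9214; the VIP list and interface are assumed from the earlier example):

#!/bin/bash
# Hypothetical supervisor sketch: remove the managed VIPs when the container
# receives SIGTERM (a normal scale-down/delete sends SIGTERM; SIGKILL bypasses this).
VIPS="192.168.133.111 192.168.133.112"   # assumed VIPs
DEV="eth0"                               # assumed interface

cleanup() {
    for vip in $VIPS; do
        # Ignore errors if the address is not assigned to this node right now.
        ip addr del "${vip}/32" dev "$DEV" 2>/dev/null || true
    done
    exit 0
}
trap cleanup TERM INT

# ... start and monitor keepalived here ...
# Stay in the foreground so the trap can fire when the pod is stopped.
while true; do sleep 1; done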
@Nakayama-san: is the above fix, which cleans up on a normal (non-forceful-kill) shutdown, a good solution, or is something more warranted? That said, one other idea would be to clean out the VIP(s) on startup; that would at least start from a cleaner slate. However, that depends on whether the current VIP(s) match what they were in the prior incarnation (you could well have had 5 VIPs in the first run, killed the processes, and then edited the dc to have, say, 2 VIPs). See the sketch below.
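A rough sketch of that startup-cleanup idea, assuming the configured VIPs are available as a comma-separated list in an OPENSHIFT_HA_VIRTUAL_IPS-style environment variable (an assumption here); note the limitation above: it can only remove the VIPs in the current configuration, not ones left over from an older, different configuration:

#!/bin/bash
# Hypothetical startup cleanup: delete any leftover copies of the configured
# VIPs before keepalived starts, so this instance begins from a clean slate.
DEV="${OPENSHIFT_HA_NETWORK_INTERFACE:-eth0}"   # assumed variable and default
VIPS="${OPENSHIFT_HA_VIRTUAL_IPS:-}"            # assumed comma-separated VIP list

for vip in ${VIPS//,/ }; do
    ip addr del "${vip}/32" dev "$DEV" 2>/dev/null || true
done

# ... then start keepalived as usual ...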
Hi Ram-san, your PR #9214 would be great. The reason I opened this case is that my customer noticed the old VIPs still remained after he scaled down the ipfailover pods, so PR #9214 meets the requirement of this bz.
Cool - that sounds good. Thanks. Associated PR: https://github.com/openshift/origin/pull/9214
@Ram: since the above PR failed to merge and this bug was reported against OSE as well, I am changing the status to 'ASSIGNED' for now. If PR 9214 is merged into Origin successfully, you can change the status to 'MODIFIED'. I will test it on Origin first; when the fix is merged into OSE, this bug will be moved to 'ON_QA' according to the flow. Thanks.
It's already merged in Origin
Tested this issue with the latest ipfailover image (id=12298e9d8630): the VIPs are deleted when the ipfailover pod is deleted.
This has been merged and is in OSE v3.3.0.8 or newer.
Verified this bug with the openshift3/ose-keepalived-ipfailover image (id=ebfba8c15d55).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933