Bug 1508435
| Summary: | keepalived v1.3.5 segfaults with older versions of selinux-policy and when running in a container or Linode VM | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Quentin Armitage <quentin> |
| Component: | keepalived | Assignee: | Ryan O'Hara <rohara> |
| Status: | CLOSED ERRATA | QA Contact: | Brandon Perkins <bperkins> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | aherr, cfeist, christoph.sievers, cluster-maint, quentin |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | keepalived-1.3.5-4.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-10 18:15:45 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Quentin Armitage
2017-11-01 12:29:07 UTC
I ran some tests over the past couple of days and here is what I've been able to reproduce so far:

keepalived-1.3.9-1.el7 (the rebase done in RHEL7.4) will definitely segfault at startup in a container, although the log messages don't give a reason. I'm assuming that this is the issue with the iptables or ipset modules as described in comment #0. Here is the output:

# keepalived -PRln
Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Keepalived[268]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Unable to resolve default script username 'keepalived_script' - ignoring
Keepalived[268]: Unable to resolve default script username 'keepalived_script' - ignoring
Opening file '/etc/keepalived/keepalived.conf'.
Keepalived[268]: Opening file '/etc/keepalived/keepalived.conf'.
Starting VRRP child process, pid=269
Keepalived[268]: Starting VRRP child process, pid=269
Registering Kernel netlink reflector
Keepalived_vrrp[269]: Registering Kernel netlink reflector
Registering Kernel netlink command channel
Keepalived_vrrp[269]: Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Keepalived_vrrp[269]: Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
Keepalived_vrrp[269]: Opening file '/etc/keepalived/keepalived.conf'.
Keepalived_vrrp exited due to segmentation fault (SIGSEGV).
Keepalived[268]: Keepalived_vrrp exited due to segmentation fault (SIGSEGV).

A quick test with the latest upstream keepalived (1.3.9) worked without segfault.

Quentin,

Although keepalived-1.3.9 worked without segfault when run inside a container, I did run into this message:

Keepalived_vrrp[278]: Netlink: error: Operation not permitted, type=(20), seq=1510151532, pid=0

Any idea what this is about? There is no message in the logs indicating what operation was attempted.

Ryan,

I presume the beginning of the second paragraph of comment 2 should start "keepalived-1.3.5-1.el7 ..." and not "keepalived-1.3.9-1.el7 ..."; when I first read it as 1.3.9 I was somewhat worried!

Re comment 3, the type in the (somewhat unhelpful) Netlink error message is the type of message, as defined in /usr/include/linux/rtnetlink.h, and in this case, 20 == RTM_NEWADDR, so for some reason keepalived doesn't have permission within the container to add an address to an interface.

I must have a look to see if those Netlink error messages can give some more useful information, such as the address and the interface.

(In reply to Quentin Armitage from comment #4)
> Ryan,
>
> I presume the beginning of the second paragraph of comment 2 should start
> "keepalived-1.3.5-1.el7 ..." and not "keepalived-1.3.9-1.el7 ..."; when I
> first read it as 1.3.9 I was somewhat worried!

Yes, you are correct. That is a typo.

> Re comment 3, the type in the (somewhat unhelpful) Netlink error message is
> the type of message, as defined in /usr/include/linux/rtnetlink.h, and in
> this case, 20 == RTM_NEWADDR, so for some reason keepalived doesn't have
> permission within the container to add an address to an interface.

Ah ok. I thought 20 was indicative of "operation not permitted", but I misread the netlink code. Anyway, that confirms my suspicion that the netlink error was a result of failing to add the virtual IP address to the interface.

> I must have a look to see if those Netlink error messages can give some more
> useful information, such as the address and the interface.

Yes, I am looking at this as well. In the meantime I will get the patches you referenced backported to 1.3.5, since it is too late to rebase to 1.3.9 in RHEL7.5.

Ryan,

Are you also able to add the BuildRequires for selinux-policy at this stage?

(In reply to Quentin Armitage from comment #6)
> Ryan,
>
> Are you also able to add the BuildRequires for selinux-policy at this stage?

Yes. Not a problem at all.
By the way, my scratch build of 1.3.9 was failing because README was being installed twice: once to /usr/share/doc/keepalived/ and once to /usr/share/keepalived-1.3.9/. I had to change the RHEL7 spec file to remove /usr/share/doc/keepalived/README, since the README file is handled by the %doc line under %files -- that is what ultimately gets the README into the doc directory.

(In reply to Ryan O'Hara from comment #7)
> (In reply to Quentin Armitage from comment #6)
> > Ryan,
> >
> > Are you also able to add the BuildRequires for selinux-policy at this stage?
>
> Yes. Not a problem at all.

Actually this might be a problem. Consider a docker container where selinux-policy is not installed and is not required. I need to investigate whether we can use an rpm macro to check if selinux is enabled prior to installing the policy.

A couple of comments/questions:

1. I don't think we will add "Requires: selinux-policy >= 3.13.1-158". I talked to a few people about this and they advised against it, and I tend to agree. This would force the selinux-policy package to be installed anywhere keepalived is installed. That is not always desirable. First, consider containers. There is no need for it there. Also, how would we handle policy types (e.g. mls, targeted)? Most important here is that we require complete upgrades. For example, when 7.4 is released you should not simply update keepalived -- you should update everything. That is supported. Ad-hoc updates are not. I think this problem was observed because a user updated individual package(s), not the entire system.

2. Are we sure that keepalived even runs correctly in docker containers on RHEL/CentOS? I was able to reproduce the segfault, but even with that fixed, keepalived running inside a container cannot create the virtual IP address on the interface (netlink reports operation not permitted). It seems like a very minor victory to fix the segfault if, in the end, keepalived doesn't work in a container.
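The type lookup described in comment 4 (type=(20) being RTM_NEWADDR) can be sketched in a few lines of Python. This is only an illustration, not part of keepalived: the names `RTM_TYPES` and `decode_netlink_error` are hypothetical, the table is a hand-copied subset of the values in /usr/include/linux/rtnetlink.h, and the regular expression assumes the log format quoted in comment 3.

```python
import re

# Hand-copied subset of rtnetlink message types; the numeric values are
# those defined in /usr/include/linux/rtnetlink.h.
RTM_TYPES = {
    16: "RTM_NEWLINK",
    17: "RTM_DELLINK",
    18: "RTM_GETLINK",
    19: "RTM_SETLINK",
    20: "RTM_NEWADDR",
    21: "RTM_DELADDR",
    22: "RTM_GETADDR",
    24: "RTM_NEWROUTE",
    25: "RTM_DELROUTE",
    26: "RTM_GETROUTE",
}

# Assumed log format, taken from the message quoted in comment 3.
NETLINK_ERROR = re.compile(
    r"Netlink: error: (?P<reason>.+?), type=\((?P<type>\d+)\), "
    r"seq=(?P<seq>\d+), pid=(?P<pid>\d+)"
)


def decode_netlink_error(line):
    """Return the error reason and message-type name for a keepalived
    netlink error line, or None if the line is not such an error."""
    match = NETLINK_ERROR.search(line)
    if match is None:
        return None
    msg_type = int(match.group("type"))
    return {
        "reason": match.group("reason"),
        "type": msg_type,
        # Types missing from the subset above are reported as "unknown".
        "name": RTM_TYPES.get(msg_type, "unknown"),
    }


sample = ("Keepalived_vrrp[278]: Netlink: error: Operation not permitted, "
          "type=(20), seq=1510151532, pid=0")
print(decode_netlink_error(sample))
```

The point of the sketch is that the number in "type=(...)" identifies the request that failed (here, adding an address), while the errno text ("Operation not permitted") is reported separately.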
Re 1 above, could you add "Conflicts: selinux-policy < 3.13.1-158"? I think this deals with the situation of selinux-policy not being installed. Whilst I see the point of complete upgrades, if someone does a "yum install keepalived" on a system that hasn't been upgraded, then I think there would be a problem.

Re 2, we had a number of issue reports where people were running keepalived inside containers, so we know it is being done. It might be that they are using the healthchecker (IPVS) parts of keepalived rather than the VRRP part.

Here are some test results/notes:

Using the base RHEL7 docker image (registry.access.redhat.com/rhel7/rhel), start a container and connect to it with:

# docker exec -it --privileged <ID> /bin/bash

Once in the container, I had to set up yum repos to grab keepalived and write a simple config file. Since systemd is not used in the docker container, start keepalived from the command line and dump the output to the console:

# keepalived -DRln
Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Unable to resolve default script username 'keepalived_script' - ignoring
Opening file '/etc/keepalived/keepalived.conf'.
Starting Healthcheck child process, pid=461
Initializing ipvs
Starting VRRP child process, pid=462
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
(VI_1): Cannot start in MASTER state if not address owner
VRRP_Instance(VI_1) removing protocol VIPs.
VRRP_Instance(VI_1) removing protocol iptable drop rule
IPVS: Can't initialize ipvs: Protocol not available
Keepalived_vrrp exited due to segmentation fault (SIGSEGV).

Note that the above segfault occurred with keepalived-1.3.5-1.el7.x86_64 from RHEL7.4.

With the new patched version of keepalived, run the same test.
The ip_vs module may fail to load (different issue), but you should not get a segfault:

# keepalived -DRln
Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Opening file '/etc/keepalived/keepalived.conf'.
Starting Healthcheck child process, pid=554
Initializing ipvs
Starting VRRP child process, pid=555
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
VRRP_Instance(VRRP) removing protocol VIPs.
Using LinkWatch kernel netlink reflector...
VRRP sockpool: [ifindex(9), proto(112), unicast(0), fd(9,10)]
IPVS: Can't initialize ipvs: Protocol not available
Stopped
Keepalived_healthcheckers exited with permanent error FATAL. Terminating
Stopping
Stopped
Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2

On the host itself, you can run 'modprobe ip_vs' and run keepalived again and it will work. There is a better way to do this (so the container can load the module itself, by way of keepalived). I think we would need extra capabilities (--cap-add option) given when the container is started (docker run). But the three patches mentioned in comment #0 do fix segfaults.

docker run -it --privileged --cap-add=ALL -v /lib/modules:/lib/modules registry.access.redhat.com/rhel7/rhel

This allowed keepalived to load the ip_vs module (which resides on the host) from the container. I'm not sure it is ideal, but it works for testing.

*** Bug 1492827 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0972