Bug 1187244
Summary: [RHEL 7.0 + 7.1] Host configured with DHCP is losing connectivity after some time - dhclient is not running

Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Version: 3.5.0
Target Milestone: ovirt-3.6.0-rc
Target Release: 3.6.0
Hardware: x86_64
OS: Linux
Keywords: ZStream
Flags: ylavi: Triaged+
Reporter: Gil Klein <gklein>
Assignee: Dan Kenigsberg <danken>
QA Contact: Michael Burman <mburman>
CC: adahms, bazulay, cshao, danken, eedri, fdeutsch, gklein, huiwa, initscripts-maint-list, jentrena, jpopelka, kgoldbla, lbopf, leiwang, lnykryn, lpeer, lsurette, mburman, mkalinin, ovasik, pdwyer, sbonazzo, yaniwang, ycui, yeylon, ykaul, ylavi
Doc Type: Bug Fix
Clones: 1204782 (view as bug list)
Last Closed: 2016-03-09 19:29:45 UTC
Type: Bug
oVirt Team: Network
Bug Blocks: 1035038, 1082754, 1164311, 1186161, 1196220, 1197441, 1204782
(In reply to Gil Klein from comment #0)
> Network manager is inactive.
> ...
> --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=9759, si_uid=0}
> +++ killed by SIGTERM +++

To know what exactly stopped dhclient, we would need to see what is (or was) the process with PID 9759. For now my only idea is initscripts (since NM is not running); for example, ifdown-eth stops dhclient:
https://git.fedorahosted.org/cgit/initscripts.git/tree/sysconfig/network-scripts/ifdown-eth#n98
Maybe Lukas has some idea -> let's start by reassigning to initscripts.

I really doubt that this is an initscripts bug. Initscripts do not do anything on their own, so even if it was ifdown that killed dhclient, something else had to call ifdown. Can you post your journal logs here?

Created attachment 986759 [details]
journalctl output (puma20.scl.lab.tlv.redhat.com)
Network initscript was not restarted, so there must be a different culprit.

(In reply to Gil Klein from comment #3)
> journalctl output (puma20.scl.lab.tlv.redhat.com)

I'm getting puzzled by the journalctl output. It shows something completely different from what Nisim originally added to bug #1116004, comment #54. According to your journalctl output there has been one dhclient instance (pid 5660) running since Jan 25th, successfully renewing addresses for:
enp4s0f0 (10.35.160.49)
enp4s0f1 (10.35.163.98)
enp5s0f0 (10.35.162.32)
rhevm (10.35.160.49)

What I'm wondering in the first place is why there's only one instance of dhclient running for all network interfaces, and what started that instance, because AFAIK initscripts run one dhclient instance per network interface. Could you try to look at what the parent process of this dhclient is?

# ps -o ppid= 5660
# ps -f <ppid>

The server was rebooted due to a power outage, but indeed, only one dhclient is running now:

# ps aux | grep "/sbin/dhclient"
root      6235  0.0  0.0 112640   968 pts/1 S+ 15:39 0:00 grep --color=auto /sbin/dhclient
root     29361  0.0  0.0 103736 12720 ?     Ss 14:43 0:00 /sbin/dhclient -H puma20 -1 -q -lf /var/lib/dhclient/dhclient--rhevm.lease -pf /var/run/dhclient-rhevm.pid rhevm
# ps -o ppid=29361
29361 26761 26742
[root@puma20 ~]# ps -f 26761
UID        PID  PPID C STIME TTY   STAT TIME CMD
root     26761 26742 0 14:25 pts/1 Ss   0:00 -bash
[root@puma20 ~]# ps -f 26742
UID        PID  PPID C STIME TTY   STAT TIME CMD
root     26742  1584 0 14:24 ?     Ss   0:03 sshd: root@pts/1

(In reply to Gil Klein from comment #7)
> # ps -o ppid=29361

There's a space between = and the pid, i.e.
# ps -o ppid= 29361

Here it is:
# ps -o ppid= 29361
1

(In reply to Gil Klein from comment #0)
> attached strace output for dhclient that shows a SIGTERM signal was sent to the process.

Also 'dhclient -r/-x ${DEVICE}' sends a SIGTERM to the already running dhclient 'on' ${DEVICE}, but I don't think this info moves us any further.

As I stated in comment #6, the journalctl output from comment #3 looks good to me, i.e. dhclient seems to have been running for a couple of days. So what next? If there's a problem, it's most likely not a problem in dhclient itself, but I have no idea where to move from here.

I've tried to capture the SIGTERM while monitoring the process list frequently, but it seems hard to reproduce this exact problem. The same host is still losing its IP each and every day, but I'm not sure it is the same issue. Please see the attached strace and pcap captures.

Created attachment 987668 [details]
journalctl output (puma20.scl.lab.tlv.redhat.com) 3/Feb
Created attachment 987669 [details]
strace output (puma20.scl.lab.tlv.redhat.com) 3/Feb
Created attachment 987671 [details]
pcap output (puma20.scl.lab.tlv.redhat.com) 3/Feb
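As the `ps` exchange above shows, `ps -o ppid=` needs a space between `=` and the PID; without it, the number is parsed as part of the (empty) column-header specification rather than as a process ID. A quick self-contained demonstration, using the current shell's PID (`$$`) as a stand-in for dhclient's PID:

```shell
# Print only the parent PID of a process; note the space after '='.
# 'ps -o ppid=29361' would treat 29361 as a column header, not a PID.
parent=$(ps -o ppid= $$ | tr -d ' ')
echo "parent of shell $$ is PID $parent"

# Inspect the parent's full command line, as suggested in the thread:
ps -f -p "$parent"
```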
(In reply to Gil Klein from comment #12)
> journalctl output (puma20.scl.lab.tlv.redhat.com) 3/Feb

> 16:12:08 network[1020]: Bringing up interface rhevm:
> 16:12:08 dhclient[1305]: DHCPREQUEST on rhevm to 255.255.255.255 port 67 (xid=0x1bb44b21)
> 16:12:08 dhclient[1305]: DHCPACK from 10.35.160.254 (xid=0x1bb44b21)
> 16:12:10 avahi-daemon[959]: Registering new address record for 10.35.160.49 on rhevm.IPv4.
> 16:12:11 dhclient[1305]: bound to 10.35.160.49 -- renewal in 16302 seconds.
> 16:12:11 network[1020]: Determining IP information for rhevm... done.

Here the network service configured the rhevm interface.

> 16:12:11 systemd[1]: Reached target Network.
> 16:12:13 vdsmd_init_common.sh[990]: vdsm: Running unified_network_persistence_upgrade
> 16:12:13 vdsmd_init_common.sh[990]: vdsm: Running restore_nets
> 16:12:13 avahi-daemon[959]: Withdrawing address record for 10.35.160.49 on rhevm.

For some reason the configured address 10.35.160.49 on rhevm disappeared - maybe something (vdsmd_init_common.sh?) called 'dhclient -r'?

> 16:12:13 kernel: rhevm: port 1(enp4s0f0) entered disabled state
> 16:12:15 dhclient[3032]: DHCPREQUEST on rhevm to 255.255.255.255 port 67 (xid=0x128c4693)
> 16:12:15 dhclient[3032]: DHCPACK from 10.35.160.254 (xid=0x128c4693)
> 16:12:17 avahi-daemon[959]: Registering new address record for 10.35.160.49 on rhevm.IPv4.
> 16:12:17 dhclient[3032]: bound to 10.35.160.49 -- renewal in 18213 seconds.

And here a new dhclient (pid 3032, instead of the previous 1305) configured the rhevm interface again. Might vdsmd_init_common.sh be the one that later stops dhclient? What component does this belong to?

Dan, is it possible that vdsmd_init_common.sh stops dhclient?

(In reply to Gil Klein from comment #16)
> Dan,
> Is it possible that vdsmd_init_common.sh stops dhclient?

Yes, when it calls `vdsm-tool restore-nets`. However, it should have started a new dhclient process again. Is that the issue that you see? This is different from the reports I heard about dhclient dying during a normal server lifetime, with no restarts involved.

(In reply to Dan Kenigsberg from comment #17)
> yes, when it calls `vdsm-tool restore-nets`. However, it should have started
> a new dhclient process again. Is that the issue that you see?

I'm still trying to reproduce it, but I'm pretty sure something has killed dhclient, based on the strace output.

> This is different from the reports I heard about dhclient dying during a
> normal server lifetime, with no restarts involved.

I'm not sure this is the same issue we see on RHEV-H 7.0; I'm trying to reproduce it too.

We were able to reproduce this issue on RHEL 7.0:

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.0 (Maipo)

# uname -a
Linux puma28.scl.lab.tlv.redhat.com 3.10.0-123.20.1.el7.x86_64 #1 SMP Wed Jan 21 09:45:55 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa | grep dhc
dhcp-4.2.5-35.el7.x86_64
dhcp-common-4.2.5-35.el7.x86_64
dhclient-4.2.5-35.el7.x86_64
dhcp-libs-4.2.5-35.el7.x86_64

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp5s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 44:1e:a1:71:a2:fc brd ff:ff:ff:ff:ff:ff
3: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP qlen 1000
    link/ether 00:9c:02:b0:ff:28 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::29c:2ff:feb0:ff28/64 scope link
       valid_lft forever preferred_lft forever
4: enp5s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 44:1e:a1:71:a2:fd brd ff:ff:ff:ff:ff:ff
5: enp4s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:9c:02:b0:ff:2c brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
    link/ether fe:3b:ab:d3:72:06 brd ff:ff:ff:ff:ff:ff
8: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 2e:fe:c7:30:af:3e brd ff:ff:ff:ff:ff:ff
9: rhevm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:9c:02:b0:ff:28 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::29c:2ff:feb0:ff28/64 scope link
       valid_lft forever preferred_lft forever

/etc/sysconfig/network-scripts/ifcfg-enp4s0f0
# Generated by VDSM version 4.16.8.1-6.el7ev
DEVICE=enp4s0f0
HWADDR=00:9c:02:b0:ff:28
BRIDGE=rhevm
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no

/etc/sysconfig/network-scripts/ifcfg-enp5s0f0
HWADDR=44:1E:A1:71:A2:FC
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp5s0f0
UUID=f1eb15d0-303e-49d8-8de9-a8d1b92b8f0e
ONBOOT=no

/etc/sysconfig/network-scripts/ifcfg-enp5s0f1
HWADDR=44:1E:A1:71:A2:FD
TYPE=Ethernet
BOOTPROTO=dhcp
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=enp5s0f1
UUID=6faa28a1-559f-4c80-b4ff-cdad27b7865d
ONBOOT=no

/etc/sysconfig/network-scripts/ifcfg-lo
DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
# If you're having problems with gated making 127.0.0.0/8 a martian,
# you can change this to something else (255.255.255.255, for example)
BROADCAST=127.255.255.255
ONBOOT=yes
NAME=loopback

/etc/sysconfig/network-scripts/ifcfg-rhevm
# Generated by VDSM version 4.16.8.1-6.el7ev
DEVICE=rhevm
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
HOTPLUG=no

# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
rhevm           8000.009c02b0ff28       no              enp4s0f0

Created attachment 989365 [details]
dmesg output (puma28.scl.lab.tlv.redhat.com) RHEL 7.0
Created attachment 989369 [details]
journalctl output (puma28.scl.lab.tlv.redhat.com) RHEL 7.0
I can't see any useful info in the last two dmesg & journalctl logs. Anyway, to summarize what we know so far:

Something (vdsm?) stops/kills dhclient (we know that from strace). dhclient (or rather dhclient-script) sets an address lifetime when adding the address to an interface. So when the address is not renewed (for example because something has stopped/killed dhclient), the kernel automatically removes the address when its lifetime ends. Gil verified (as he wrote in e-mail) that setting the lifetime to 'forever' makes the problem not appear.

This setting of the address lifetime was added with
http://pkgs.devel.redhat.com/cgit/rpms/dhcp/commit/?h=rhel-7.0&id=d62cff5d6c06b40aefe2c5839101305c54f6aba7
http://pkgs.devel.redhat.com/cgit/rpms/dhcp/commit/?h=rhel-7.0&id=dbe9812c437730d64c2ec39ba54d53b79201d16f
due to bug #1032809. If there's no other way, it would be possible to revert those commits, but that would just be hiding a bug somewhere else - and where, I have no idea.

Jiri, I have another reproducer of the issue. Here, the `vdsm-tool restore-nets` helper script calls `ifdown enp6s0`, which ends up killing dhclient. Then the script calls `ifup enp6s0`, which creates a new dhclient process. For some reason, this new process dies after 5 seconds or so, shortly after handling a DHCPREQUEST. This does not happen on el6 hosts. Based on timing, it seems as if the dhclient process is killed after ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start finishes. Do you happen to know what can be doing this? systemd?

No idea. We'd need to see strace output first to know whether it's still the same case of dhclient being killed/stopped (as in the previous reproducers), or whether it crashed or exited.

We're not yet sure that this is really a vdsm issue, but it's definitely vdsm that's triggering it. Changing product, though it may be changed back soon...

*** Bug 1189837 has been marked as a duplicate of this bug. ***

Created attachment 990419 [details]
strace output of a relevant dhclient
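The address-lifetime mechanism described above can be observed in isolation, without dhclient at all. The sketch below is illustrative only: it requires root and the kernel dummy module, and `dummyX` plus the 192.0.2.10 address are made-up names for the demonstration. It adds an address with a short valid_lft and lets the kernel expire it, mirroring what happens when a killed dhclient never renews its lease:

```shell
# Illustrative only; requires root. 'dummyX' is a hypothetical test interface.
ip link add dummyX type dummy
ip link set dummyX up

# Add an address with a 5-second valid lifetime, similar to how
# dhclient-script applies the lease time obtained from dhclient:
ip addr add 192.0.2.10/24 dev dummyX valid_lft 5 preferred_lft 5

ip -4 addr show dev dummyX    # address present, with a finite valid_lft
sleep 6
ip -4 addr show dev dummyX    # address gone: the kernel removed it on expiry

# 'forever' (the default) disables expiry, which is why setting the
# lifetime to 'forever' made the problem disappear in Gil's test:
ip addr add 192.0.2.10/24 dev dummyX valid_lft forever preferred_lft forever

ip link del dummyX
```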
Gil, in the reproduced scenario, initscripts starts dhclient on boot. Then, vdsm calls ifdown to kill it (I suspect that this is what you see in auditctl). Then, vdsm starts a new dhclient up, and it is killed by something after spawning into background. Can you double-check that you find the killer of the second dhclient?

(In reply to Dan Kenigsberg from comment #34)
> Gil, in the reproduced scenario, initscripts starts dhclient on boot. Then,
> vdsm calls ifdown to kill it (I suspect that this is what you see in
> auditctl). Then, vdsm starts a new dhclient up, and it is killed by
> something after spawning into background. Can you double-check that you find
> the killer of the second dhclient?

It seems to be ifdown-eth in all cases:

# ausearch -k teste_kill | grep -A 6 -B 6 dhclient
type=SYSCALL msg=audit(1423773423.638:91): arch=c000003e syscall=62 success=no exit=-3 a0=1247 a1=f a2=7f5a562bf640 a3=8 items=0 ppid=1330 pid=4674 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="sshd" exe="/usr/sbin/sshd" subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 key="teste_kill"
----
time->Thu Feb 12 22:37:07 2015
type=SYSCALL msg=audit(1423773427.327:103): arch=c000003e syscall=62 success=no exit=-3 a0=1208 a1=f a2=0 a3=1208 items=0 ppid=4740 pid=4743 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ifdown-eth" exe="/usr/bin/bash" subj=system_u:system_r:initrc_t:s0 key="teste_kill"
----
time->Thu Feb 12 22:37:07 2015
type=OBJ_PID msg=audit(1423773427.300:102): opid=4616 oauid=-1 ouid=0 oses=-1 obj=system_u:system_r:dhcpc_t:s0 ocomm="dhclient"
type=SYSCALL msg=audit(1423773427.300:102): arch=c000003e syscall=62 success=yes exit=0 a0=1208 a1=f a2=0 a3=7fffb7b49490 items=0 ppid=4740 pid=4743 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ifdown-eth" exe="/usr/bin/bash" subj=system_u:system_r:initrc_t:s0 key="teste_kill"
----
time->Thu Feb 12 22:37:14 2015
type=OBJ_PID msg=audit(1423773434.284:108): opid=4718 oauid=0 ouid=0 oses=24 obj=unconfined_u:system_r:systemd_passwd_agent_t:s0 ocomm="systemd-tty-ask"
type=SYSCALL msg=audit(1423773434.284:108): arch=c000003e syscall=62 success=yes exit=0 a0=126e a1=f a2=7fce6a7bb768 a3=7fffbb645310 items=0 ppid=4711 pid=4717 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=24 comm="systemctl" exe="/usr/bin/systemctl" subj=unconfined_u:system_r:initrc_t:s0 key="teste_kill"
----
--
----
time->Sun Feb 15 18:54:40 2015
type=OBJ_PID msg=audit(1424019280.153:33136): opid=1860 oauid=-1 ouid=107 oses=-1 obj=system_u:system_r:svirt_t:s0:c310,c1006 ocomm="qemu-kvm"
type=SYSCALL msg=audit(1424019280.153:33136): arch=c000003e syscall=62 success=yes exit=0 a0=744 a1=0 a2=7fc5f0000048 a3=0 items=0 ppid=1 pid=32424 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="libvirtd" exe="/usr/sbin/libvirtd" subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 key="teste_kill"
----
time->Sun Feb 15 18:54:54 2015
type=OBJ_PID msg=audit(1424019294.006:33164): opid=5241 oauid=-1 ouid=0 oses=-1 obj=system_u:system_r:dhcpc_t:s0 ocomm="dhclient"
type=SYSCALL msg=audit(1424019294.006:33164): arch=c000003e syscall=62 success=yes exit=0 a0=1479 a1=f a2=0 a3=7fffa4f7fce0 items=0 ppid=1946 pid=1949 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ifdown-eth" exe="/usr/bin/bash" subj=system_u:system_r:initrc_t:s0 key="teste_kill"
----
time->Sun Feb 15 18:54:54 2015
type=SYSCALL msg=audit(1424019294.034:33165): arch=c000003e syscall=62 success=no exit=-3 a0=1479 a1=f a2=0 a3=1479 items=0 ppid=1946 pid=1949 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ifdown-eth" exe="/usr/bin/bash" subj=system_u:system_r:initrc_t:s0 key="teste_kill"
----
time->Sun Feb 15 18:57:05 2015
--
----
time->Sun Feb 15 19:02:05 2015
type=OBJ_PID msg=audit(1424019725.007:66): opid=2464 oauid=-1 ouid=107 oses=-1 obj=system_u:system_r:virtd_t:s0-s0:c0.c1023 ocomm="qemu-kvm"
type=SYSCALL msg=audit(1424019725.007:66): arch=c000003e syscall=62 success=yes exit=0 a0=9a0 a1=9 a2=7f337c000040 a3=2 items=0 ppid=1 pid=2354 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="libvirtd" exe="/usr/sbin/libvirtd" subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 key="teste_kill"
----
time->Sun Feb 15 19:02:06 2015
type=OBJ_PID msg=audit(1424019726.445:69): opid=1287 oauid=-1 ouid=0 oses=-1 obj=system_u:system_r:dhcpc_t:s0 ocomm="dhclient"
type=SYSCALL msg=audit(1424019726.445:69): arch=c000003e syscall=62 success=yes exit=0 a0=507 a1=f a2=0 a3=7fff2e7f92a0 items=0 ppid=2496 pid=2503 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ifdown-eth" exe="/usr/bin/bash" subj=system_u:system_r:init_t:s0 key="teste_kill"
----
time->Sun Feb 15 19:02:06 2015
type=SYSCALL msg=audit(1424019726.470:70): arch=c000003e syscall=62 success=no exit=-3 a0=507 a1=f a2=0 a3=507 items=0 ppid=2496 pid=2503 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ifdown-eth" exe="/usr/bin/bash" subj=system_u:system_r:init_t:s0 key="teste_kill"

Created attachment 992150 [details]
alma07 logs (auditd/journalctl/dmesg)
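The `ausearch -k teste_kill` output above implies an audit rule keyed "teste_kill" was installed on these hosts to trace signal senders. The rule itself is not shown in the thread; a rule along the following lines is a reconstruction of what was presumably configured, not the verbatim rule used:

```shell
# Reconstruction of the presumed audit rule (requires root and a running
# auditd). Log every kill(2) syscall - syscall 62 on x86_64, matching the
# 'syscall=62' fields in the records above - tagged with key "teste_kill":
auditctl -a always,exit -F arch=b64 -S kill -k teste_kill

# Later, after the address is lost, find out who signalled dhclient:
ausearch -k teste_kill | grep -B 6 -A 6 dhclient
```

The OBJ_PID records in the captured output name the signalled process (ocomm="dhclient"), while the paired SYSCALL records name the sender (comm="ifdown-eth"), which is how the thread pins the kill on ifdown-eth.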
One idea that just came to my mind: we can put systemd in debug mode to see if systemd is killing dhclient. Add systemd.log_level=debug to the kernel cmdline (maybe it can also be put in debug mode from userspace).

These bugs weren't included in the 3.5.1 tracker, hence not marked as critical to the version. 3.5.1 was done and already has a high number of bugs, so postponing this to 3.5.2.

I have also just experienced this issue on both my hosts. I am running RHEL 7.1 on my hosts with RHEV-M 3.5.1 vt14 and vdsm-4.16.12-2.el7ev.x86_64.

ifconfig -a displays no IP. /var/log/messages reports:

Mar 4 18:25:27 nott-vds2 puppet: Error: NetworkManager is not running.
Mar 4 18:25:27 nott-vds2 puppet: Could not retrieve fact='ipaddress', resolution='<anonymous>': Network is unreachable - sendto(2)

Resolved the issue by running dhclient.

Created attachment 998341 [details]
server, engine and vdsm logs
Adding logs
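On the debug-mode suggestion above: besides the kernel cmdline, systemd's log level can indeed be raised from userspace at runtime. Both variants below are standard systemd mechanisms; SIGRTMIN+22/+23 are the documented signals for switching PID 1's log level on systemd versions of that era (such as the 208 builds on these hosts):

```shell
# Variant 1: persistent, takes effect after a reboot - append this to the
# kernel command line in the bootloader configuration:
#     systemd.log_level=debug

# Variant 2: at runtime, no reboot (requires root). SIGRTMIN+22 tells
# PID 1 (systemd) to switch its log level to debug:
kill -RTMIN+22 1

# Watch PID 1's log output for it deciding to kill dhclient:
journalctl -f _PID=1

# SIGRTMIN+23 restores the log level to info when done:
kill -RTMIN+23 1
```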
Now that we have a fix almost ready for this deeply-troubling el7 bug, I would like to include it in the upcoming rhev-3.5.1.

The same behavior (systemd killing dhclient) also occurs during a service restart. This is a caveat in the posted solution that can be fixed simply by telling systemd not to do so.

Verified and tested on RHEL 7.1:
3.6.0-0.0.master.20150412172306.git55ba764.el6.noarch
vdsm-4.17.0-632.git19a83a2.el7.x86_64
vdsm-4.17.0-633.git7ad88bc.el7.x86_64
systemd-208-20.el7_1.2.x86_64
systemd-208-20.el7_1.3.x86_64
dhclient-4.2.5-36.el7.x86_64

Moving 'requires_doc_text' to '-', as this bug is verified and no longer needs to be documented as a known issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0362.html
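On the "telling systemd not to do so" remark above: one standard way to express this is a KillMode override on the service whose stop/restart was taking dhclient down with it. The drop-in below is a hypothetical sketch only; the unit name and the exact mechanism of the fix actually shipped in vdsm may differ. KillMode=process makes systemd kill only the unit's main process on stop/restart, leaving other processes in the unit's cgroup (such as a spawned dhclient) alone:

```shell
# Hypothetical drop-in (requires root); the actual shipped fix may differ.
mkdir -p /etc/systemd/system/vdsmd.service.d
cat > /etc/systemd/system/vdsmd.service.d/killmode.conf <<'EOF'
[Service]
KillMode=process
EOF
systemctl daemon-reload
systemctl restart vdsmd
```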
Created attachment 985664 [details]
strace output from dhclient

Description of problem:
A host that is configured with DHCP is losing its IP and connectivity after ~1 day.

Version-Release number of selected component (if applicable):
[root@puma20 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.1 Beta (Maipo)
[root@puma20 ~]# uname -a
Linux puma20.scl.lab.tlv.redhat.com 3.10.0-223.el7.x86_64 #1 SMP Tue Jan 13 10:49:13 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
100% on specific hosts

Steps to Reproduce:
1. Configure the host to use DHCP
2. Use the host for ~1 day

Actual results:
The host loses its IP, dhclient is no longer running, and the host loses its connectivity to the network.

Expected results:
The host should not lose its IP address.

Additional info:
The interface that loses its IP is part of a bridge. NetworkManager is inactive.

systemctl status NetworkManager
NetworkManager.service - Network Manager
   Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; disabled)
   Active: inactive (dead)

# rpm -qa | grep dhc
dhcp-4.2.5-35.el7.x86_64
dhcp-debuginfo-4.2.5-35.el7.x86_64
dhclient-4.2.5-35.el7.x86_64
dhcp-devel-4.2.5-35.el7.x86_64
dhcp-libs-4.2.5-35.el7.x86_64
dhcp-common-4.2.5-35.el7.x86_64

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master rhevm state UP qlen 1000
    link/ether 00:9c:02:b0:df:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.35.160.49/24 brd 10.35.160.255 scope global dynamic enp4s0f0
       valid_lft 28273sec preferred_lft 28273sec
    inet6 fe80::29c:2ff:feb0:dfb0/64 scope link
       valid_lft forever preferred_lft forever
3: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 44:1e:a1:73:39:92 brd ff:ff:ff:ff:ff:ff
    inet 10.35.162.32/24 brd 10.35.162.255 scope global dynamic enp5s0f0
       valid_lft 26136sec preferred_lft 26136sec
    inet6 fe80::461e:a1ff:fe73:3992/64 scope link
       valid_lft forever preferred_lft forever
4: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:9c:02:b0:df:b4 brd ff:ff:ff:ff:ff:ff
    inet 10.35.163.98/24 brd 10.35.163.255 scope global dynamic enp4s0f1
       valid_lft 42912sec preferred_lft 42912sec
    inet6 fe80::29c:2ff:feb0:dfb4/64 scope link
       valid_lft forever preferred_lft forever
5: enp5s0f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 44:1e:a1:73:39:93 brd ff:ff:ff:ff:ff:ff
6: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 42:87:eb:f0:40:44 brd ff:ff:ff:ff:ff:ff
8: ;vdsmdummy;: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 02:bf:b8:c7:e1:7e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::bf:b8ff:fec7:e17e/64 scope link
       valid_lft forever preferred_lft forever
9: rhevm: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:9c:02:b0:df:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.35.160.49/24 brd 10.35.160.255 scope global dynamic rhevm
       valid_lft 25776sec preferred_lft 25776sec
    inet6 fe80::29c:2ff:feb0:dfb0/64 scope link
       valid_lft forever preferred_lft forever

# brctl show
bridge name     bridge id               STP enabled     interfaces
;vdsmdummy;     8000.000000000000       no
rhevm           8000.009c02b0dfb0       no              enp4s0f0

Please see the attached strace output for dhclient, which shows a SIGTERM signal was sent to the process:

--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=9759, si_uid=0}
+++ killed by SIGTERM +++