I don't get an IP address through DHCP when I run QEMU to install an AtomicHost ppc64le ISO image:
Fedora-AtomicHost-ostree-ppc64le-Rawhide-20181105.n.1.iso

qemu command:
/usr/bin/qemu-system-ppc64 -name vm90 -enable-kvm -M pseries -smp 1 -m 8G -nographic -nodefaults -monitor stdio -serial pty -device virtio-net-pci,netdev=net10130,mac=c0:ff:ee:00:00:90 -netdev bridge,br=br0,id=net10130 -cdrom isolerawhide_atomic -drive file=hd1.qcow2 -drive file=hd2.qcow2 -boot d -S

Note that in my environment it should connect to a DHCP server and get an IP address based on the given MAC.

When I reach the anaconda prompt to choose between starting VNC or text mode for the installation:

Starting installer, one moment...
anaconda 30.8-1.fc30 for Fedora Rawhide (pre-release) started.
 * installation log files are stored in /tmp during the installation
 * shell is available on TTY2
 * when reporting a bug add logs from /tmp as separate text/plain attachments
15:29:08 X startup failed, falling back to text mode
================================================================================
================================================================================
1) Start VNC
2) Use text mode
Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to refresh]:

If I choose VNC, it doesn't get a valid IP address:

15:29:56 Starting VNC...
15:30:02 The VNC server is now running.
15:30:02 WARNING!!! VNC server running with NO PASSWORD!
You can use the vncpassword=PASSWORD boot option
if you would like to secure the server.
15:30:02 Please manually connect your vnc client to IP-ADDRESS:1 to begin the install. Switch to the shell (Ctrl-B 2) and run 'ip addr' to find the IP-ADDRESS.
15:30:02 Attempting to start vncconfig

I can check that there is no IPv4 address:

[anaconda root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp0s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c0:ff:ee:00:00:90 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::58bd:d42a:2b9a:3878/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

In the syslog, I can find:

...
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9460] bus-manager: (dhcp) accepted connection 0x1002b18e910 on private socket
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9466] dhcp4 (enp0s0): unmapped DHCP state 'PREINIT'
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9468] dhcp4 (enp0s0): DHCP state 'unknown' -> 'unknown' (reason: 'PREINIT')
15:30:24,948 DEBUG NetworkManager:<debug> [1541691024.9481] bus-manager: (dhcp) closed connection 0x1002b18e910 on private socket
15:30:24,949 ERR dhclient:Can't install packet filter program: Unknown error 524
15:30:24,950 ERR dhclient:or 524
15:30:24,950 ERR dhclient:This version of ISC DHCP is based on the release available
15:30:24,950 ERR dhclient:on ftp.isc.org. Features have been added and other changes
15:30:24,950 ERR dhclient:have been made to the base software release in order to make
15:30:24,950 ERR dhclient:it work better with this distribution.
15:30:24,950 ERR dhclient:ution.
15:30:24,951 ERR dhclient:Please report issues with this software via:
15:30:24,951 ERR dhclient:https://bugzilla.redhat.com/
15:30:24,951 ERR dhclient:ution.
15:30:24,951 ERR dhclient:exiting.
15:30:24,953 INFO NetworkManager:<info> [1541691024.9535] dhcp4 (enp0s0): client pid 2820 exited with status 1
15:30:24,953 INFO NetworkManager:<info> [1541691024.9536] dhcp4 (enp0s0): state changed unknown -> done
15:30:24,953 DEBUG NetworkManager:<debug> [1541691024.9539] device[0x1002b1f45b0] (enp0s0): new DHCPv4 client state 3
15:30:24,954 DEBUG NetworkManager:<debug> [1541691024.9540] device[0x1002b1f45b0] (enp0s0): DHCPv4 failed (ip_state conf)
15:30:24,954 DEBUG NetworkManager:<debug> [1541691024.9542] device[0x1002b1f45b0] (enp0s0): remove_pending_action (1): 'dhcp4'
15:30:24,954 INFO NetworkManager:<info> [1541691024.9545] dhcp4 (enp0s0): canceled DHCP transaction
"dhclient:Can't install packet filter program: Unknown error 524" and no IPv4 address is what I got when trying 4.20-pre kernel on my F-28 system.
And from what I see in the x86 openQA instance for Rawhide composes, x86 also suffers from this "no IP" problem.
Adding kernel maintainers to CC; something might be wrong on the kernel side.
still a problem with kernel-4.20.0-0.rc1.git3.1.fc30
strace output from dhclient looks like:

...
3756 socket(AF_PACKET, SOCK_RAW, 768) = 7
3756 ioctl(7, SIOCGIFINDEX, {ifr_name="enp0s1", }) = 0
3756 bind(7, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("enp0s1"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
3756 setsockopt(7, SOL_PACKET, PACKET_AUXDATA, [1], 4) = 0
3756 setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x116fc27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
3756 getpid() = 3756
3756 send(3, "<27>Nov 13 12:13:21 dhclient[375"..., 90, MSG_NOSIGNAL) = 90
3756 write(2, "Can't install packet filter prog"..., 54) = 54
...

Building a kernel with CONFIG_BPFILTER enabled to see if it helps.
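The failing call in the strace output is setsockopt(SO_ATTACH_FILTER). A minimal sketch of how that call could be exercised outside dhclient follows. It is not dhclient's code: it uses a UDP socket instead of AF_PACKET so it needs no CAP_NET_RAW, it attaches a one-instruction "accept everything" filter rather than dhclient's 11-instruction program, and the function names are made up for illustration. Whether it reproduces the error on a given system depends on the kernel configuration and on the caller's capabilities.

```python
import ctypes
import socket
import struct

# Assumption: 26 is the value from asm-generic/socket.h; a few architectures differ.
SO_ATTACH_FILTER = 26


def build_accept_all_filter():
    """Return one classic BPF instruction: 'ret #0xffffffff' (accept the packet)."""
    # Layout of struct sock_filter: __u16 code; __u8 jt; __u8 jf; __u32 k
    return struct.pack("HBBI", 0x06, 0, 0, 0xFFFFFFFF)  # 0x06 = BPF_RET | BPF_K


def attach_filter(sock, program):
    """Attach a classic BPF program to sock; raises OSError if the kernel refuses."""
    buf = ctypes.create_string_buffer(program, len(program))
    # Layout of struct sock_fprog: unsigned short len; struct sock_filter *filter
    fprog = struct.pack("HP", len(program) // 8, ctypes.addressof(buf))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)


def try_attach():
    """Return 0 on success, or the errno reported by setsockopt()."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            attach_filter(sock, build_accept_all_filter())
        except OSError as exc:
            return exc.errno
    return 0
```

Calling try_attach() returns 0 when the filter is accepted; on a system showing the problem described here, one would expect it to return 524 instead, though that is not guaranteed for an unprivileged UDP socket.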
Note that I found the problem by investigating an openQA test failure on the AtomicHost ISO (in my own openQA environment), but this test is fine on x86-64, which is why I thought at the beginning that it was a ppc64le-specific problem.
The test on the AtomicHost ISO is OK on x86-64 with Fedora-Rawhide-20181112.n.0: https://openqa.stg.fedoraproject.org/tests/393668
Nothing to do with dhclient in this case. errno 524 (ENOTSUPP) is internal to the kernel/bpfilter(?) and should not be exposed to userspace (see GETSOCKOPT(2)).
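The "Unknown error 524" text comes from userspace having no name for that number. A small sketch that shows this from Python is below; the ENOTSUPP constant and the describe_errno helper are defined here for illustration only, since the value lives in the kernel's internal include/linux/errno.h and is not part of the userspace errno headers.

```python
import errno
import os

# Kernel-internal value (include/linux/errno.h); not defined in userspace headers.
ENOTSUPP = 524


def describe_errno(code):
    """Format an errno value with its userspace name (if any) and strerror() text."""
    name = errno.errorcode.get(code, "<no userspace name>")
    return f"{code} {name}: {os.strerror(code)}"
```

On a glibc system, describe_errno(ENOTSUPP) should yield the same "Unknown error 524" wording that dhclient and strace print, while describe_errno(errno.EPERM) shows a named error for comparison.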
switch back to ppc64le, seems x86_64 really isn't affected by this
Indeed, the official openQA tests on ppc64le do seem to be suffering from this, same tests on other arches are not. I just spent an hour rediscovering this, I should've looked for bug reports from Guy first :P
I get the same error if I use rtl8139 as the network device rather than virtio-net, if it helps at all.
This looks related to capabilities.

I had a system at hand (custom kernel "4.20.0-rc1.skt", ppc64le) where NetworkManager's dhclient would fail with this strace output:

setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)

Interestingly, when starting dhclient in a terminal, it would succeed.

So I removed

CapabilityBoundingSet=CAP_NET_ADMIN CAP_DAC_OVERRIDE CAP_NET_RAW CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_MODULE CAP_AUDIT_WRITE CAP_KILL CAP_SYS_CHROOT

from /usr/lib/systemd/system/NetworkManager.service, and then dhclient started working with NetworkManager.
adding CAP_SYS_ADMIN to CapabilityBoundingSet made it work.
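To see whether a running process (for example NetworkManager or the dhclient it spawns) actually has CAP_SYS_ADMIN in its bounding set, the CapBnd line of /proc/PID/status can be decoded. The sketch below is only an illustration with made-up helper names; the capability numbers are the ones from linux/capability.h.

```python
# Capability bit numbers from <linux/capability.h>.
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13
CAP_SYS_ADMIN = 21


def parse_cap_bnd(status_text):
    """Extract the bounding-set bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapBnd line found")


def has_cap(mask, cap):
    """Return True if capability number cap is set in the bitmask."""
    return bool((mask >> cap) & 1)


def pid_has_cap(pid, cap):
    """Check the bounding set of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        return has_cap(parse_cap_bnd(status.read()), cap)
```

For instance, pid_has_cap(pid_of_networkmanager, CAP_SYS_ADMIN) (where pid_of_networkmanager is a placeholder for the real PID) would be expected to flip from False to True after the unit file change and a daemon-reload plus restart.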
I see the problem even when running dhclient from the command line with "sudo dhclient enp0s1" (in a terminal app under XFCE).
And the problem is still there in NM with CAP_SYS_ADMIN added. Could these be two distinct issues, one of them ppc64/ppc64le specific?
Thomas says the system he's testing on is ppc64le.
(In reply to Adam Williamson from comment #15)
> Thomas says the system he's testing on is ppc64le.

right, I missed that :-)
What is the next step for this bug?
* In comment#12 there was a proposal to add CAP_SYS_ADMIN to CapabilityBoundingSet in /usr/lib/systemd/system/NetworkManager.service.
* Is it only a workaround or a proposed fix?
Michel, does adding CAP_SYS_ADMIN fix the problem for you? Because it didn't for me.
Created attachment 1511450 [details]
bug1647947_still_failed_despite_workaround.png

As shown in the attached image bug1647947_still_failed_despite_workaround.png, I tried the workaround from comment#12, modifying the NetworkManager.service file in an openQA test with the latest Rawhide compose (20181204).
But despite the service reload and restart:
* we still have error 524 when installing the packet filter (the red text in the png file)
* and no IP address is assigned.
Created attachment 1511451 [details]
bug1647947_still_failed_despite_workaround.png

My previous image was incomplete, so I am adding this new one.
Comment on attachment 1511450 [details] bug1647947_still_failed_despite_workaround.png * keep first png to show sed command for workaround in NetworkManager.service * and 2nd png to show ip a command output.
Did you do systemctl daemon-reload (IIRC) after modifying the service file? Just modifying the service file and restarting the service won't do the trick. I can actually probably hack up a test which uses a modified NetworkManager package both during and after install, and see what happens with that...
yes I did the daemon-reload as detailed in my local patch https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/424a1787038557f134ebf3f899c688a39324adde?branch=debug_1647947
There are a couple recently-proposed patches, specific to ppc64, which I think may address this issue: https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182399.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182454.html
I guess that's it: dhclient succeeds after manually setting bpf_jit_limit to a positive number.
Laura, Justin, could we maybe put those in Rawhide and see if the openQA tests start working again? thanks!
------- Comment From hannsj_uhl.com 2018-12-07 07:21 EDT-------
Comment from Sandipan Das 2018-12-07 06:10:48 CST

A workaround would be to add something like the following to /etc/sysctl.conf. This way it will persist across reboots and nothing else has to be modified.

net.core.bpf_jit_limit = 262144000
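Before and after applying such a workaround, the current value of the knob can be checked through procfs. The following is a small sketch with illustrative helper names; it assumes the kernel exposes net.core.bpf_jit_limit at all (it only exists on kernels that have the knob and a BPF JIT), and reading it will probably require root.

```python
from pathlib import Path

# procfs path that corresponds to the net.core.bpf_jit_limit sysctl.
BPF_JIT_LIMIT_PATH = Path("/proc/sys/net/core/bpf_jit_limit")


def parse_limit(text):
    """Convert the file contents (a decimal number and a newline) to an int."""
    return int(text.strip())


def read_limit(path=BPF_JIT_LIMIT_PATH):
    """Read the current limit; raises OSError if the knob is absent or unreadable."""
    return parse_limit(path.read_text())


def limit_is_usable(limit):
    """Assumption for this sketch: only a positive limit leaves room for JIT allocations."""
    return limit > 0
```

On a system where dhclient fails as described in this bug, limit_is_usable(read_limit()) would be expected to return False until the sysctl is set to a positive value.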
Yes, but it needs a successful installation first. AFAIK it's not possible to pass the setting thru the kernel command line.
could probably set it with sysctl from a shell in anaconda. I could try and hack the openQA tests to do that as a check...
(In reply to Adam Williamson from comment #29)
> could probably set it with sysctl from a shell in anaconda. I could try and
> hack the openQA tests to do that as a check...

I tried a patch (1) for some openQA tests and confirmed that setting the sysctl works around the problem for some install flows, but not all of them.

(1) https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/050466890c332a46285341daba6625367a68c314?branch=bug1647947_workaround
yeah, ones where the network needs to be working before you can get to a console won't be fixed, obviously. but if it works for at least some of the tests, it gives us a solid indication that is the problem. oddly enough, I've noticed the network sometimes not being up on *x86_64* tests recently too (far less often than on ppc64, though). not sure if this is something somehow similar, or entirely unrelated.
With the patch from https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I see dhclient is working again.
It seems like there's been a lot of discussion, so I'd like to wait until a patch hits a maintainer's tree. Once it gets committed we can certainly bring it to Fedora.
------- Comment From hannsj_uhl.com 2018-12-17 03:38 EDT-------
(In reply to comment #15)
> With the patch from
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I
> see dhclient is working again.

... which has been accepted upstream in the bpf tree as git commit
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3
("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
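The commit title hints at why only kernels with 64K pages were hit. The sketch below illustrates the arithmetic under an assumption that is not stated in this thread: that the compile-time default was PAGE_SIZE * 40000 stored in a signed 32-bit integer, as the upstream commit message appears to describe. The function name is made up for illustration.

```python
import ctypes


def default_limit_as_int32(page_size, pages=40000):
    """Return page_size * pages as it would look when stored in a signed 32-bit int.

    Assumption: the multiplier 40000 and the 32-bit storage are taken from the
    upstream commit description, not from this bug report.
    """
    # ctypes integer types wrap silently instead of raising on overflow.
    return ctypes.c_int32(page_size * pages).value
```

With 4K pages the product fits and stays positive, whereas with 64K pages it exceeds the signed 32-bit range and wraps to a negative number, which would be consistent with dhclient starting to work once bpf_jit_limit was set to a positive value by hand.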
And now also in the mainline tree as https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3 (post-rc7)