Bug 1647947 - dhclient fails with "Can't install packet filter program: Unknown error 524" [ppc64le]
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords: Patch
Depends On:
Blocks: PPCTracker
Reported: 2018-11-08 15:34 UTC by Menanteau Guy
Modified: 2019-01-03 13:01 UTC (History)
Clone Of:
Last Closed: 2019-01-03 13:01:50 UTC


Attachments
bug1647947_still_failed_despite_workaround.png (116.78 KB, image/png)
2018-12-04 18:03 UTC, Michel Normand
bug1647947_still_failed_despite_workaround.png (114.33 KB, image/png)
2018-12-04 18:12 UTC, Michel Normand


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 173962 None None None 2019-04-18 16:05 UTC

Description Menanteau Guy 2018-11-08 15:34:07 UTC
I don't get an IP address through DHCP when I run qemu to install an AtomicHost ppc64le ISO image.

Fedora-AtomicHost-ostree-ppc64le-Rawhide-20181105.n.1.iso

qemu command:
/usr/bin/qemu-system-ppc64 -name vm90 -enable-kvm -M pseries -smp 1 -m 8G -nographic -nodefaults -monitor stdio -serial pty -device virtio-net-pci,netdev=net10130,mac=c0:ff:ee:00:00:90 -netdev bridge,br=br0,id=net10130 -cdrom isolerawhide_atomic -drive file=hd1.qcow2 -drive file=hd2.qcow2 -boot d -S

Note that in my environment it should contact a DHCP server and get an IP address based on the given MAC address.

When I reach the anaconda panel to choose between starting VNC or text mode for installation:

Starting installer, one moment...
anaconda 30.8-1.fc30 for Fedora Rawhide (pre-release) started.
 * installation log files are stored in /tmp during the installation
 * shell is available on TTY2
 * when reporting a bug add logs from /tmp as separate text/plain attachments
15:29:08 X startup failed, falling back to text mode
================================================================================
================================================================================

1) Start VNC
2) Use text mode

Please make a selection from the above ['c' to continue, 'q' to quit, 'r' to
refresh]: 

If I choose VNC, it doesn't get a valid IP address:

15:29:56 Starting VNC...
15:30:02 The VNC server is now running.
15:30:02 

WARNING!!! VNC server running with NO PASSWORD!
You can use the vncpassword=PASSWORD boot option
if you would like to secure the server.

15:30:02 Please manually connect your vnc client to IP-ADDRESS:1 to begin the install. Switch to the shell (Ctrl-B 2) and run 'ip addr' to find the IP-ADDRESS.
15:30:02 Attempting to start vncconfig

I can check that there is no ip address:
[anaconda root@localhost ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c0:ff:ee:00:00:90 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::58bd:d42a:2b9a:3878/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever


In the syslog, I can find:
...
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9460] bus-manager: (dhcp) accepted connection 0x1002b18e910 on private socket
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9466] dhcp4 (enp0s0): unmapped DHCP state 'PREINIT'
15:30:24,946 DEBUG NetworkManager:<debug> [1541691024.9468] dhcp4 (enp0s0): DHCP state 'unknown' -> 'unknown' (reason: 'PREINIT')
15:30:24,948 DEBUG NetworkManager:<debug> [1541691024.9481] bus-manager: (dhcp) closed connection 0x1002b18e910 on private socket
15:30:24,949 ERR dhclient:Can't install packet filter program: Unknown error 524
15:30:24,950 ERR dhclient:or 524
15:30:24,950 ERR dhclient:This version of ISC DHCP is based on the release available
15:30:24,950 ERR dhclient:on ftp.isc.org. Features have been added and other changes
15:30:24,950 ERR dhclient:have been made to the base software release in order to make
15:30:24,950 ERR dhclient:it work better with this distribution.
15:30:24,950 ERR dhclient:ution.
15:30:24,951 ERR dhclient:Please report issues with this software via:
15:30:24,951 ERR dhclient:https://bugzilla.redhat.com/
15:30:24,951 ERR dhclient:ution. 
15:30:24,951 ERR dhclient:exiting.
15:30:24,953 INFO NetworkManager:<info>  [1541691024.9535] dhcp4 (enp0s0): client pid 2820 exited with status 1
15:30:24,953 INFO NetworkManager:<info>  [1541691024.9536] dhcp4 (enp0s0): state changed unknown -> done
15:30:24,953 DEBUG NetworkManager:<debug> [1541691024.9539] device[0x1002b1f45b0] (enp0s0): new DHCPv4 client state 3
15:30:24,954 DEBUG NetworkManager:<debug> [1541691024.9540] device[0x1002b1f45b0] (enp0s0): DHCPv4 failed (ip_state conf)
15:30:24,954 DEBUG NetworkManager:<debug> [1541691024.9542] device[0x1002b1f45b0] (enp0s0): remove_pending_action (1): 'dhcp4'
15:30:24,954 INFO NetworkManager:<info>  [1541691024.9545] dhcp4 (enp0s0): canceled DHCP transaction

Comment 1 Dan Horák 2018-11-08 15:42:31 UTC
"dhclient:Can't install packet filter program: Unknown error 524" and no IPv4 address is what I got when trying 4.20-pre kernel on my F-28 system.

Comment 2 Dan Horák 2018-11-08 16:42:48 UTC
And from what I see in the x86 openqa instance for Rawhide composes, also x86 suffers from this "no IP" problem.

Comment 3 Dan Horák 2018-11-09 11:35:32 UTC
adding kernel maintainers to CC, it might be something wrong on the kernel side.

Comment 4 Dan Horák 2018-11-09 11:45:25 UTC
still a problem with kernel-4.20.0-0.rc1.git3.1.fc30

Comment 5 Dan Horák 2018-11-13 12:42:43 UTC
strace output from dhclient looks like

...
3756  socket(AF_PACKET, SOCK_RAW, 768)  = 7
3756  ioctl(7, SIOCGIFINDEX, {ifr_name="enp0s1", }) = 0
3756  bind(7, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("enp0s1"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
3756  setsockopt(7, SOL_PACKET, PACKET_AUXDATA, [1], 4) = 0
3756  setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x116fc27f8}, 16) = -1 ENOTSUPP (Unknown error 524)
3756  getpid()                          = 3756
3756  send(3, "<27>Nov 13 12:13:21 dhclient[375"..., 90, MSG_NOSIGNAL) = 90
3756  write(2, "Can't install packet filter prog"..., 54) = 54
...

Building kernel with CONFIG_BPFILTER enabled to see if it helps.
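
The failing setsockopt() call in the strace above can be exercised outside dhclient. What follows is a minimal Python sketch, not dhclient's code: it attaches a one-instruction classic BPF filter (accept everything) to an ordinary UDP socket instead of an AF_PACKET socket, and reports the resulting errno. The value SO_ATTACH_FILTER = 26 is an assumption taken from the generic Linux socket headers, and the helper names (pack_insn, accept_all_filter, try_attach, describe) are made up for illustration. Whether this exact reproducer fails on an affected kernel is not confirmed anywhere in this report.

```python
import ctypes
import errno
import socket
import struct

# Assumption: value from the generic Linux socket headers; Python's
# socket module does not export this constant.
SO_ATTACH_FILTER = 26
ENOTSUPP = 524  # kernel-internal errno that has no libc name


def pack_insn(code, jt, jf, k):
    """Pack one struct sock_filter {u16 code; u8 jt; u8 jf; u32 k}."""
    return struct.pack("HBBI", code, jt, jf, k)


def accept_all_filter():
    """Classic BPF program made of a single 'ret 0xffffffff' instruction."""
    bpf_ret_k = 0x06  # BPF_RET | BPF_K
    return pack_insn(bpf_ret_k, 0, 0, 0xFFFFFFFF)


def try_attach(sock, insns):
    """Attach the filter; return 0 on success or the errno on failure."""
    # Keep the instruction buffer alive for the duration of the call.
    buf = ctypes.create_string_buffer(insns, len(insns))
    # struct sock_fprog {unsigned short len; struct sock_filter *filter;}
    fprog = struct.pack("HP", len(insns) // 8, ctypes.addressof(buf))
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
    except OSError as exc:
        return exc.errno
    return 0


def describe(err):
    """Turn the result of try_attach() into a short message."""
    if err == 0:
        return "filter attached"
    if err == ENOTSUPP:
        return "ENOTSUPP (524): same failure as in this bug"
    return errno.errorcode.get(err, "errno %d" % err)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print(describe(try_attach(s, accept_all_filter())))
```

Using a UDP socket keeps the sketch runnable without root; dhclient itself uses a raw AF_PACKET socket and an 11-instruction filter, as the strace shows.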

Comment 6 Menanteau Guy 2018-11-13 15:18:48 UTC
Note that I found the problem while investigating an openQA test failure on the AtomicHost ISO (in my own openQA environment). The same test passes on x86-64, which is why I initially thought it was a ppc64le-specific problem.
The AtomicHost ISO test passes on x86-64 with Fedora-Rawhide-20181112.n.0: https://openqa.stg.fedoraproject.org/tests/393668

Comment 7 Pavel Zhukov 2018-11-14 08:21:59 UTC
Nothing to do with dhclient in this case. 
errno 524 (ENOTSUPP) is internal to the kernel/bpfilter(?) and should not be exposed to userspace (see GETSOCKOPT(2))

Comment 8 Dan Horák 2018-11-15 14:06:45 UTC
Switching the hardware field back to ppc64le; it seems x86_64 really isn't affected by this.

Comment 9 Adam Williamson 2018-11-15 22:27:23 UTC
Indeed, the official openQA tests on ppc64le do seem to be suffering from this, same tests on other arches are not. I just spent an hour rediscovering this, I should've looked for bug reports from Guy first :P

Comment 10 Adam Williamson 2018-11-16 00:56:33 UTC
I get the same error if I use rtl8139 as the network device rather than virtio-net, if it helps at all.

Comment 11 Thomas Haller 2018-11-19 16:23:12 UTC
This looks related to capabilities.


I had a system at hand (custom kernel "4.20.0-rc1.skt", ppc64le), where NetworkManager's dhclient would fail with strace output:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524)

Interestingly, when starting dhclient in a terminal, it would succeed. So, I removed

  CapabilityBoundingSet=CAP_NET_ADMIN CAP_DAC_OVERRIDE CAP_NET_RAW CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID CAP_SYS_MODULE CAP_AUDIT_WRITE CAP_KILL CAP_SYS_CHROOT

from /usr/lib/systemd/system/NetworkManager.service, and then dhclient started working with NetworkManager.

Comment 12 Thomas Haller 2018-11-19 16:26:25 UTC
adding CAP_SYS_ADMIN to CapabilityBoundingSet made it work.
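
To check which capabilities a process is actually bounded to, the CapBnd mask in /proc/<pid>/status can be decoded. The following is a small Python sketch under stated assumptions: the capability numbers (CAP_NET_ADMIN = 12, CAP_NET_RAW = 13, CAP_SYS_ADMIN = 21) come from the Linux capability header, and the helper names parse_capbnd and has_cap are hypothetical. It only inspects the mask; it does not explain why CAP_SYS_ADMIN matters for the filter attach.

```python
# Capability bit numbers as defined in the Linux capability header.
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13
CAP_SYS_ADMIN = 21


def parse_capbnd(status_text):
    """Return the CapBnd mask from /proc/<pid>/status text as an int."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapBnd line found")


def has_cap(mask, cap):
    """True if capability number `cap` is set in the bounding-set mask."""
    return bool((mask >> cap) & 1)


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as fh:
            mask = parse_capbnd(fh.read())
    except (OSError, ValueError) as exc:
        print("cannot read bounding set:", exc)
    else:
        print("CAP_NET_RAW:  ", has_cap(mask, CAP_NET_RAW))
        print("CAP_SYS_ADMIN:", has_cap(mask, CAP_SYS_ADMIN))
```

Replacing "self" with the pid of NetworkManager or of the dhclient child shows the bounding set that the service file actually produced.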

Comment 13 Dan Horák 2018-11-20 10:33:32 UTC
I see the problem even when running dhclient from the command line with "sudo dhclient enp0s1" (in a terminal app under XFCE).

Comment 14 Dan Horák 2018-11-20 13:07:44 UTC
And the problem still occurs in NM with CAP_SYS_ADMIN added. Could it be 2 distinct issues, with one ppc64/ppc64le specific?

Comment 15 Adam Williamson 2018-11-20 16:27:03 UTC
Thomas says the system he's testing on is ppc64le.

Comment 16 Dan Horák 2018-11-20 16:43:09 UTC
(In reply to Adam Williamson from comment #15)
> Thomas says the system he's testing on is ppc64le.

right, I missed that :-)

Comment 17 Michel Normand 2018-12-04 17:26:17 UTC
What is the next step for this bug?

* Comment#12 proposed adding CAP_SYS_ADMIN to CapabilityBoundingSet in /usr/lib/systemd/system/NetworkManager.service.

* Is that only a workaround, or a proposed fix?

Comment 18 Dan Horák 2018-12-04 17:42:44 UTC
Michel, does adding CAP_SYS_ADMIN fix the problem for you? Because it didn't for me.

Comment 19 Michel Normand 2018-12-04 18:03 UTC
Created attachment 1511450
bug1647947_still_failed_despite_workaround.png

As the attached image bug1647947_still_failed_despite_workaround.png shows, I tried the workaround from comment#12 (modifying the NetworkManager.service file) in an openQA test with the latest Rawhide compose (20181204).
But despite a service reload and restart:
* we still get error 524 when installing the packet filter (the red text in the PNG file)
* and no IP address is assigned.

Comment 20 Michel Normand 2018-12-04 18:12 UTC
Created attachment 1511451
bug1647947_still_failed_despite_workaround.png

My previous image was not complete, so I am replacing it with this new one.

Comment 21 Michel Normand 2018-12-04 18:16:52 UTC
Comment on attachment 1511450
bug1647947_still_failed_despite_workaround.png

* keep the first PNG to show the sed command for the workaround in NetworkManager.service
* and the second PNG to show the ip a command output.

Comment 22 Adam Williamson 2018-12-04 18:35:20 UTC
Did you do systemctl daemon-reload (IIRC) after modifying the service file? Just modifying the service file and restarting the service won't do the trick.

I can actually probably hack up a test which uses a modified NetworkManager package both during and after install, and see what happens with that...

Comment 23 Michel Normand 2018-12-04 19:02:31 UTC
yes I did the daemon-reload as detailed in my local patch https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/424a1787038557f134ebf3f899c688a39324adde?branch=debug_1647947

Comment 24 Michael Roth 2018-12-06 17:21:25 UTC
There are a couple recently-proposed patches, specific to ppc64, which I think may address this issue:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182399.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182454.html

Comment 25 Dan Horák 2018-12-06 17:42:56 UTC
I guess that's it, dhclient succeeds after manually setting bpf_jit_limit to a positive number.

Comment 26 Adam Williamson 2018-12-06 19:43:49 UTC
Laura, Justin, could we maybe put those in Rawhide and see if the openQA tests start working again? thanks!

Comment 27 IBM Bug Proxy 2018-12-07 12:30:53 UTC
------- Comment From hannsj_uhl@de.ibm.com 2018-12-07 07:21 EDT-------
Comment from  Sandipan Das 2018-12-07 06:10:48 CST

A workaround would be to add something like the following in /etc/sysctl.conf. This way it will persist across reboots and nothing else has to be modified.

net.core.bpf_jit_limit = 262144000
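
Before applying the workaround it may help to look at the current value of the knob. The following Python sketch reads /proc/sys/net/core/bpf_jit_limit (the procfs path that corresponds to the sysctl name above) and reports whether the value is positive. The helper names are hypothetical, the "-1" in the usage note is only an illustrative input, and reading the file is assumed to require root.

```python
from pathlib import Path

# procfs location of the net.core.bpf_jit_limit sysctl.
SYSCTL_PATH = Path("/proc/sys/net/core/bpf_jit_limit")


def parse_limit(text):
    """Convert the sysctl file contents to an integer."""
    return int(text.strip())


def limit_is_usable(value):
    """A zero or negative limit leaves no room for JIT allocations."""
    return value > 0


def read_limit(path=SYSCTL_PATH):
    """Return the current limit, or None if the knob cannot be read."""
    try:
        return parse_limit(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    limit = read_limit()
    if limit is None:
        print("bpf_jit_limit not readable (missing knob or not root)")
    elif limit_is_usable(limit):
        print("bpf_jit_limit looks sane:", limit)
    else:
        print("bpf_jit_limit is not positive:", limit)
```

For example, limit_is_usable(parse_limit("-1")) is False, while the value suggested above, 262144000, passes the check.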

Comment 28 Dan Horák 2018-12-07 12:41:01 UTC
Yes, but it needs a successful installation first. AFAIK it's not possible to pass the setting thru the kernel command line.

Comment 29 Adam Williamson 2018-12-07 18:20:44 UTC
could probably set it with sysctl from a shell in anaconda. I could try and hack the openQA tests to do that as a check...

Comment 30 Michel Normand 2018-12-11 16:01:36 UTC
(In reply to Adam Williamson from comment #29)
> could probably set it with sysctl from a shell in anaconda. I could try and
> hack the openQA tests to do that as a check...

I tried a patch (1) for some openQA tests and confirmed that setting the sysctl works around the problem for some install flows, but not all of them.

(1) https://pagure.io/fork/michelmno/fedora-qa/os-autoinst-distri-fedora/c/050466890c332a46285341daba6625367a68c314?branch=bug1647947_workaround

Comment 31 Adam Williamson 2018-12-11 16:57:44 UTC
yeah, ones where the network needs to be working before you can get to a console won't be fixed, obviously. but if it works for at least some of the tests, it gives us a solid indication that is the problem.

oddly enough, I've noticed the network sometimes not being up on *x86_64* tests recently too (far less often than on ppc64, though). not sure if this is something somehow similar, or entirely unrelated.

Comment 32 Dan Horák 2018-12-11 17:45:24 UTC
With the patch from https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I see dhclient is working again.

Comment 33 Laura Abbott 2018-12-11 17:55:21 UTC
It seems like there's been a lot of discussion so I'd like to wait until a patch hits a maintainer's tree. Once it gets committed we can certainly bring it to Fedora.

Comment 34 IBM Bug Proxy 2018-12-17 08:40:28 UTC
------- Comment From hannsj_uhl@de.ibm.com 2018-12-17 03:38 EDT-------
(In reply to comment #15)
> With the patch from
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-December/182631.html I
> see dhclient is working again.
>
... which is upstream accepted in the bpf tree as git commit
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3
("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")

Comment 35 Dan Horák 2018-12-23 10:00:58 UTC
And now also in the mainline tree as https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fdadd04931c2d7cd294dc5b2b342863f94be53a3 (post-rc7)
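
The commit title ("PAGE_SIZE >= 64K") together with comment 25 (dhclient works once bpf_jit_limit is a positive number) suggests an integer overflow. The Python sketch below works through that arithmetic under two assumptions that are not stated in this report and come from my reading of the upstream commit: that the default limit was computed as PAGE_SIZE * 40000, and that it was stored in a signed 32-bit int. The function names are hypothetical.

```python
INT_MAX = 2**31 - 1


def as_int32(value):
    """Reinterpret an integer as a wrapping signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT_MAX else value


def default_limit(page_size, pages=40000):
    """Assumed default: a fixed number of pages times the page size."""
    return page_size * pages


if __name__ == "__main__":
    for page_size in (4096, 65536):
        exact = default_limit(page_size)
        print(page_size, exact, as_int32(exact))
```

With 4K pages the product is 163840000 and fits in an int. With 64K pages it is 2621440000, which exceeds INT_MAX and wraps to -1673527296. The workaround value from comment 27, 262144000, happens to equal 65536 * 4000 and stays well inside the 32-bit range.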

