Created attachment 1286381 [details]
[ 13.098430] general protection fault: 0000 [#1] SMP
[ 13.100073] Modules linked in: act_tunnel_key cls_matchall sch_htb vxlan ip6_udp_tunnel udp_tunnel bridge stp llc v
[ 13.109247] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted 3.10.0-678.el7.x86_64 #1
[ 13.110538] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 13.111388] Workqueue: ipv6_addrconf addrconf_dad_work
[ 13.112232] task: ffff880036b52f70 ti: ffff880036b24000 task.ti: ffff880036b24000
[ 13.113510] RIP: 0010:[<ffffffff815ae840>] [<ffffffff815ae840>] tc_classify_compat+0x40/0x80
[ 13.114952] RSP: 0018:ffff880036b279f0 EFLAGS: 00010286
[ 13.115748] RAX: 00000000ffffffff RBX: f908c0c748d26348 RCX: 0000000000000001
[ 13.116703] RDX: ffff880036b27a50 RSI: f908c0c748d26348 RDI: ffff88003c183b00
[ 13.117658] RBP: ffff880036b27a10 R08: 00000000894881b0 R09: 0000000000000003
[ 13.118617] R10: ffff88003e001600 R11: ffffffff815ac210 R12: 000000000000dd86
[ 13.119575] R13: ffff88003c183b00 R14: ffff880036b27a50 R15: 0000000000000000
[ 13.120532] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 13.121869] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 13.122714] CR2: 00007f87659d2e60 CR3: 000000003cc3b000 CR4: 00000000000006f0
[ 13.123682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 13.124643] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 13.125604] Stack:
[ 13.126122] 0000000000000005 f908c0c748d26348 ffff880036b27a50 ffff88003c183b00
[ 13.127605] ffff880036b27a40 ffffffff815af263 ffff8800368af000 ffff88003c183b00
[ 13.129093] ffff8800368af09c ffffffff810992e2 ffff880036b27a88 ffffffffc02f45b0
[ 13.130575] Call Trace:
[ 13.131133] [<ffffffff815af263>] tc_classify+0x33/0x90
[ 13.131926] [<ffffffff810992e2>] ? del_timer_sync+0x52/0x60
[ 13.132757] [<ffffffffc02f45b0>] htb_enqueue+0xb0/0x3a0 [sch_htb]
[ 13.133639] [<ffffffff810992e2>] ? del_timer_sync+0x52/0x60
[ 13.134475] [<ffffffff81587588>] __dev_queue_xmit+0x298/0x550
[ 13.142918] [<ffffffff81587850>] dev_queue_xmit+0x10/0x20
[ 13.143754] [<ffffffff8159322b>] neigh_resolve_output+0x11b/0x220
[ 13.144654] [<ffffffff8163e5da>] ip6_finish_output2+0x19a/0x4c0
[ 13.145522] [<ffffffff812bbc0a>] ? selinux_ipv6_postroute+0x1a/0x20
[ 13.146421] [<ffffffff815bde10>] ? nf_iterate+0x70/0xb0
[ 13.147223] [<ffffffff81640abc>] ip6_finish_output+0x8c/0xf0
[ 13.148064] [<ffffffff81640b77>] ip6_output+0x57/0x100
[ 13.148864] [<ffffffff81640a30>] ? ip6_fragment+0xad0/0xad0
[ 13.149738] [<ffffffff81657c46>] NF_HOOK_THRESH.constprop.26+0x36/0xa0
[ 13.150654] [<ffffffff8164fba3>] ? icmp6_dst_alloc+0x113/0x150
[ 13.151509] [<ffffffff81657e2e>] ndisc_send_skb+0x17e/0x290
[ 13.152346] [<ffffffff81658d6c>] ndisc_send_ns+0xfc/0x1d0
[ 13.153165] [<ffffffff8164800e>] addrconf_dad_work+0x27e/0x360
[ 13.154020] [<ffffffff810bf0b0>] ? finish_task_switch+0x50/0x160
[ 13.154889] [<ffffffff810a87fa>] process_one_work+0x17a/0x440
[ 13.155733] [<ffffffff810a94c6>] worker_thread+0x126/0x3c0
[ 13.156557] [<ffffffff810a93a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[ 13.157544] [<ffffffff810b096f>] kthread+0xcf/0xe0
[ 13.158398] [<ffffffff810b08a0>] ? insert_kthread_work+0x40/0x40
[ 13.160050] [<ffffffff816b2958>] ret_from_fork+0x58/0x90
[ 13.160871] [<ffffffff810b08a0>] ? insert_kthread_work+0x40/0x40
[ 13.161819] Code: 53 f6 87 a3 00 00 00 10 48 89 f3 44 0f b7 a7 a0 00 00 00 66 44 0f 44 67 7e 48 85 f6 75 0d eb 3b 0
[ 13.168137] RIP [<ffffffff815ae840>] tc_classify_compat+0x40/0x80
[ 13.169071] RSP <ffff880036b279f0>
[ 13.169744] ---[ end trace ff2c16620cf64c35 ]---
[ 13.170504] Kernel panic - not syncing: Fatal exception in interrupt
[ 13.171591] Kernel Offset: disabled
Created attachment 1296176 [details]
reproducer - many qdiscs + testmode
extend the reproducer to try it on multiple qdiscs. Using testmode =1, so that res.class is initialized to 0x100, before invoking the filter, Iobserved the NULL pointer dereference using the following queue disciplines:
At least in the htb case, the patch just causes htb to classify on a random classid (depending on the previous memory contents).
The correct way is to behave like there was no filter, i.e. invoke continue if no classification happened. The problem here is the current API does not pass that information out of the filters. This will require deeper changes, I'm afraid.
Created attachment 1322756 [details]
patch for upstream 4.13
(In reply to Jiri Benc from comment #6)
> At least in the htb case, the patch just causes htb to classify on a random
> classid (depending on the previous memory contents).
> The correct way is to behave like there was no filter, i.e. invoke continue
> if no classification happened. The problem here is the current API does not
> pass that information out of the filters. This will require deeper changes,
> I'm afraid.
'matchall' seems to be the only tc filter that does not do anything to the tcf_result. Other filters assign their own configured values (tipically class = 0 and classid = < 32-bit id configured from userspace>. Maybe it's enough to add a single line in mall_classify(), before the actions are invoked, where the configured values are copied into tcf_result.
I tried this one-line patch with reproducer and observed no more crashes _ and accidentally saw that the 'classid' parameter of the matchall man page (http://man7.org/linux/man-pages/man8/tc-matchall.8.html) is then used in htb_classify.
Do you think this is acceptable upstream?
thank you in advance,
(In reply to Davide Caratti from comment #7)
> Do you think this is acceptable upstream?
It does not solve the problem, it just hides it. I don't think that's what upstream wants.
Btw, I spoke about this with Jiri Pirko and Jamal but I'm quite sure they forgot by now :-) I'm adding Jiri and opening the bug, there's nothing secret in here.
(In reply to Jiri Benc from comment #8)
> (In reply to Davide Caratti from comment #7)
> > Do you think this is acceptable upstream?
> It does not solve the problem, it just hides it. I don't think that's what
> upstream wants.
looking at the sources, it seems that 'matchall' never worked as classifier since the beginning (i.e. it doesn't provide a classid value in the output, like other classifiers do).
But this does not look intentional, otherwise the UAPI wouldn't expose TCA_MATCHALL_CLASSID as a control for the 'matchall' classifier (and there wouldn't be a reference of it in the man page). Currently, whatever 32bit number is written in tb[TCA_MATCHALL_CLASSID], the kernel ignores it; IMHO this is worth fixing (or we should deprecate it), because similar controls for other filters (i.e TCA_U32_CLASSID, TCA_TCINDEX_CLASSID or TCA_BASIC_CLASSID) are functional.
That said, users might expect that cls_matchall skips classification, and jumps to the next filter as you propose in comment #6, in case the TCA_MATCHALL_CLASSID is not specified. This behavior can be also made generic so that it can be implemented in other classifiers (like u32, or tc_index, or others).
thank you in advance,
Created attachment 1325902 [details]
patch for upstream 4.13 (updated message)
hi Jiri (Benc),
I looked at other classifiers code: all of them write to tcf_result when the filter matches, even when the 'classid' parameter is not specified by the user (in this last case, the struct is filled with zero).
To let 'matchall' avoid classification and jump to the next filter (like you suggest in comment #6) , we can make mall_classify() return a negative value (e.g. -1, TC_ACT_UNSPEC): same as what 'tcindex' classifier does when it's configured with 'pass_on' option and the packet doesn't match (see http://man7.org/linux/man-pages/man8/tc-tcindex.8.html).
Anyway, this is a change of behavior for 'matchall', we should at least make it configurable and documented, i.e. specifying a keyword in iproute2 that makes it jump to the next filter (I'm testing a patch that implements this).
And, before changing 'matchall' behavior, we should at least fix the current behavior - where the user can configure 'classid' but matchall doesn't care of that value.
Unless you have objections, I will post the attached patch (where I tried to describe the problem more clearly) targeting net.git, and a proposal for adding configurable 'pass_on' to cls_matchall, targeting net-next.git.
thank you in advance for any feedback,
bz1494554 has been filed as followup of comment #6 and conmment #11
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing
Patch(es) available on kernel-3.10.0-742.el7
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.