Bug 1460213 - cls_matchall: kernel panic when used with classful qdiscs
Summary: cls_matchall: kernel panic when used with classful qdiscs
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
: ---
Assignee: Davide Caratti
QA Contact: Li Shuang
Ioanna Gkioka
Depends On:
Blocks: 1323132 1470965
TreeView+ depends on / blocked
Reported: 2017-06-09 11:40 UTC by Jiri Benc
Modified: 2018-04-10 20:48 UTC (History)
11 users (show)

Fixed In Version: kernel-3.10.0-742.el7
Doc Type: Bug Fix
Doc Text:
Using `cls_matchall` with classful queue disciplines no longer causes the kernel to crash Previously, the matchall classifier `(cls_matchall)` did not assign the `classic` option to a packet. As a consequence, the kernel terminated unexpectedly when trying to use `cls_matchall` with classful queueing disciplines `(classful qdiscs)`, such as Hierarchical Token Bucket (HTB) or Class Based Queueing (CBQ). With this update, when `cls_matchall` processes "classid", "classid" is assigned to a packet. As a result, `cls_matchall` with `classful qdiscs` can now be used successfully and the user-provided value of "classid" is no longer ignored in the described scenario. For more details on the kernel actions related to "classid", see the `OPTIONS` section in the "tc-matchall (8)" man page.
Clone Of:
Last Closed: 2018-04-10 20:47:23 UTC
Target Upstream Version:

Attachments (Terms of Use)
Reproducer (983 bytes, text/plain)
2017-06-09 11:40 UTC, Jiri Benc
no flags Details
reproducer - many qdiscs + testmode (1.48 KB, text/plain)
2017-07-11 10:38 UTC, Davide Caratti
no flags Details
patch for upstream 4.13 (1.30 KB, patch)
2017-09-06 17:00 UTC, Davide Caratti
no flags Details | Diff
patch for upstream 4.13 (updated message) (2.07 KB, patch)
2017-09-14 10:54 UTC, Davide Caratti
no flags Details | Diff

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1062 0 None None None 2018-04-10 20:48:38 UTC

Description Jiri Benc 2017-06-09 11:40:55 UTC
Created attachment 1286381 [details]

[   13.098430] general protection fault: 0000 [#1] SMP
[   13.100073] Modules linked in: act_tunnel_key cls_matchall sch_htb vxlan ip6_udp_tunnel udp_tunnel bridge stp llc v
[   13.109247] CPU: 0 PID: 213 Comm: kworker/0:2 Not tainted 3.10.0-678.el7.x86_64 #1
[   13.110538] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   13.111388] Workqueue: ipv6_addrconf addrconf_dad_work
[   13.112232] task: ffff880036b52f70 ti: ffff880036b24000 task.ti: ffff880036b24000
[   13.113510] RIP: 0010:[<ffffffff815ae840>]  [<ffffffff815ae840>] tc_classify_compat+0x40/0x80
[   13.114952] RSP: 0018:ffff880036b279f0  EFLAGS: 00010286
[   13.115748] RAX: 00000000ffffffff RBX: f908c0c748d26348 RCX: 0000000000000001
[   13.116703] RDX: ffff880036b27a50 RSI: f908c0c748d26348 RDI: ffff88003c183b00
[   13.117658] RBP: ffff880036b27a10 R08: 00000000894881b0 R09: 0000000000000003
[   13.118617] R10: ffff88003e001600 R11: ffffffff815ac210 R12: 000000000000dd86
[   13.119575] R13: ffff88003c183b00 R14: ffff880036b27a50 R15: 0000000000000000
[   13.120532] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   13.121869] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   13.122714] CR2: 00007f87659d2e60 CR3: 000000003cc3b000 CR4: 00000000000006f0
[   13.123682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   13.124643] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   13.125604] Stack:
[   13.126122]  0000000000000005 f908c0c748d26348 ffff880036b27a50 ffff88003c183b00
[   13.127605]  ffff880036b27a40 ffffffff815af263 ffff8800368af000 ffff88003c183b00
[   13.129093]  ffff8800368af09c ffffffff810992e2 ffff880036b27a88 ffffffffc02f45b0
[   13.130575] Call Trace:
[   13.131133]  [<ffffffff815af263>] tc_classify+0x33/0x90
[   13.131926]  [<ffffffff810992e2>] ? del_timer_sync+0x52/0x60
[   13.132757]  [<ffffffffc02f45b0>] htb_enqueue+0xb0/0x3a0 [sch_htb]
[   13.133639]  [<ffffffff810992e2>] ? del_timer_sync+0x52/0x60
[   13.134475]  [<ffffffff81587588>] __dev_queue_xmit+0x298/0x550
[   13.142918]  [<ffffffff81587850>] dev_queue_xmit+0x10/0x20
[   13.143754]  [<ffffffff8159322b>] neigh_resolve_output+0x11b/0x220
[   13.144654]  [<ffffffff8163e5da>] ip6_finish_output2+0x19a/0x4c0
[   13.145522]  [<ffffffff812bbc0a>] ? selinux_ipv6_postroute+0x1a/0x20
[   13.146421]  [<ffffffff815bde10>] ? nf_iterate+0x70/0xb0
[   13.147223]  [<ffffffff81640abc>] ip6_finish_output+0x8c/0xf0
[   13.148064]  [<ffffffff81640b77>] ip6_output+0x57/0x100
[   13.148864]  [<ffffffff81640a30>] ? ip6_fragment+0xad0/0xad0
[   13.149738]  [<ffffffff81657c46>] NF_HOOK_THRESH.constprop.26+0x36/0xa0
[   13.150654]  [<ffffffff8164fba3>] ? icmp6_dst_alloc+0x113/0x150
[   13.151509]  [<ffffffff81657e2e>] ndisc_send_skb+0x17e/0x290
[   13.152346]  [<ffffffff81658d6c>] ndisc_send_ns+0xfc/0x1d0
[   13.153165]  [<ffffffff8164800e>] addrconf_dad_work+0x27e/0x360
[   13.154020]  [<ffffffff810bf0b0>] ? finish_task_switch+0x50/0x160
[   13.154889]  [<ffffffff810a87fa>] process_one_work+0x17a/0x440
[   13.155733]  [<ffffffff810a94c6>] worker_thread+0x126/0x3c0
[   13.156557]  [<ffffffff810a93a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[   13.157544]  [<ffffffff810b096f>] kthread+0xcf/0xe0
[   13.158398]  [<ffffffff810b08a0>] ? insert_kthread_work+0x40/0x40
[   13.160050]  [<ffffffff816b2958>] ret_from_fork+0x58/0x90
[   13.160871]  [<ffffffff810b08a0>] ? insert_kthread_work+0x40/0x40
[   13.161819] Code: 53 f6 87 a3 00 00 00 10 48 89 f3 44 0f b7 a7 a0 00 00 00 66 44 0f 44 67 7e 48 85 f6 75 0d eb 3b 0
[   13.168137] RIP  [<ffffffff815ae840>] tc_classify_compat+0x40/0x80
[   13.169071]  RSP <ffff880036b279f0>
[   13.169744] ---[ end trace ff2c16620cf64c35 ]---
[   13.170504] Kernel panic - not syncing: Fatal exception in interrupt
[   13.171591] Kernel Offset: disabled

Comment 4 Davide Caratti 2017-07-11 10:38:20 UTC
Created attachment 1296176 [details]
reproducer - many qdiscs + testmode

extend the reproducer to try it on multiple qdiscs. Using testmode =1, so that res.class is initialized to 0x100, before invoking the filter, Iobserved the NULL pointer dereference using the following queue disciplines:


Comment 6 Jiri Benc 2017-07-11 11:42:23 UTC
At least in the htb case, the patch just causes htb to classify on a random classid (depending on the previous memory contents).

The correct way is to behave like there was no filter, i.e. invoke continue if no classification happened. The problem here is the current API does not pass that information out of the filters. This will require deeper changes, I'm afraid.

Comment 7 Davide Caratti 2017-09-06 17:00:31 UTC
Created attachment 1322756 [details]
patch for upstream 4.13

(In reply to Jiri Benc from comment #6)
> At least in the htb case, the patch just causes htb to classify on a random
> classid (depending on the previous memory contents).
> The correct way is to behave like there was no filter, i.e. invoke continue
> if no classification happened. The problem here is the current API does not
> pass that information out of the filters. This will require deeper changes,
> I'm afraid.

hello Jiri,

'matchall' seems to be the only tc filter that does not do anything to the tcf_result. Other filters assign their own configured values (tipically class = 0 and classid = < 32-bit id configured from userspace>. Maybe it's enough to add a single line in mall_classify(), before the actions are invoked, where the configured values are copied into tcf_result.

I tried this one-line patch with reproducer and observed no more crashes _ and accidentally saw that the 'classid' parameter of the matchall man page (http://man7.org/linux/man-pages/man8/tc-matchall.8.html) is then used in htb_classify.

Do you think this is acceptable upstream?
thank you in advance,

Comment 8 Jiri Benc 2017-09-06 18:31:17 UTC
(In reply to Davide Caratti from comment #7)
> Do you think this is acceptable upstream?

It does not solve the problem, it just hides it. I don't think that's what upstream wants.

Comment 9 Jiri Benc 2017-09-06 18:35:31 UTC
Btw, I spoke about this with Jiri Pirko and Jamal but I'm quite sure they forgot by now :-) I'm adding Jiri and opening the bug, there's nothing secret in here.

Comment 10 Davide Caratti 2017-09-07 13:34:38 UTC
(In reply to Jiri Benc from comment #8)
> (In reply to Davide Caratti from comment #7)
> > Do you think this is acceptable upstream?
> It does not solve the problem, it just hides it. I don't think that's what
> upstream wants.


looking at the sources, it seems that 'matchall' never worked as classifier since the beginning (i.e. it doesn't provide a classid value in the output, like other classifiers do). 

But this does not look intentional, otherwise the UAPI wouldn't expose TCA_MATCHALL_CLASSID as a control for the 'matchall' classifier (and there wouldn't be a reference of it in the man page). Currently, whatever 32bit number is written in tb[TCA_MATCHALL_CLASSID], the kernel ignores it; IMHO this is worth fixing (or we should deprecate it), because similar controls for other filters (i.e TCA_U32_CLASSID, TCA_TCINDEX_CLASSID or TCA_BASIC_CLASSID) are functional.

That said, users might expect that cls_matchall skips classification, and jumps to the next filter as you propose in comment #6, in case the TCA_MATCHALL_CLASSID is not specified. This behavior can be also made generic so that it can be implemented in other classifiers (like u32, or tc_index, or others).

Any opinion?
thank you in advance,

Comment 11 Davide Caratti 2017-09-14 10:54:49 UTC
Created attachment 1325902 [details]
patch for upstream 4.13 (updated message)

hi Jiri (Benc),

I looked at other classifiers code: all of them write to tcf_result when the filter matches, even when the 'classid' parameter is not specified by the user (in this last case, the struct is filled with zero).
To let 'matchall' avoid classification and jump to the next filter (like you suggest in comment #6) , we can make mall_classify() return a negative value (e.g. -1, TC_ACT_UNSPEC): same as what 'tcindex' classifier does when it's configured with 'pass_on' option and the packet doesn't match (see http://man7.org/linux/man-pages/man8/tc-tcindex.8.html).

Anyway, this is a change of behavior for 'matchall', we should at least make it configurable and documented, i.e. specifying a keyword in iproute2 that makes it jump to the next filter (I'm testing a patch that implements this).
And, before changing 'matchall' behavior, we should at least fix the current behavior - where the user can configure 'classid' but matchall doesn't care of that value.

Unless you have objections, I will post the attached patch (where I tried to describe the problem more clearly) targeting net.git, and a proposal for adding configurable 'pass_on' to cls_matchall, targeting net-next.git.

thank you in advance for any feedback,

Comment 12 Davide Caratti 2017-09-22 14:12:44 UTC
bz1494554 has been filed as followup of comment #6 and conmment #11

Comment 14 Davide Caratti 2017-09-22 14:15:29 UTC

Comment 16 Rafael Aquini 2017-10-20 04:13:09 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 18 Rafael Aquini 2017-10-21 03:11:51 UTC
Patch(es) available on kernel-3.10.0-742.el7

Comment 28 errata-xmlrpc 2018-04-10 20:47:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.