Bug 2127774 - nft: netlink_delinearize.c:2695: netlink_delinearize_rule: Assertion `pctx->table != NULL' failed.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: nftables
Version: 8.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.8
Assignee: Phil Sutter
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On: 2211076
Blocks: 2130721
Reported: 2022-09-18 23:38 UTC by Jonathan Maxwell
Modified: 2024-06-05 12:10 UTC (History)

Fixed In Version: nftables-1.0.4-2.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2130721 (view as bug list)
Environment:
Last Closed: 2024-06-05 12:10:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
System                 ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  RHELPLAN-134226  0        None      None    None     2022-09-18 23:44:59 UTC

Comment 3 Phil Sutter 2022-09-20 17:10:13 UTC
Hi Jon,

The assert there triggers if the rule received from kernel references a table that doesn't exist in cache. This is a bit odd because in cache_init() (src/rule.c), table cache is unconditionally populated before fetching rule cache.

Do you know how nft was called and what the ruleset was? Maybe there's a bug in RHEL8.5 nftables. This is nftables-0.9.3-21.el8, right?

Cheers, Phil

Comment 5 Phil Sutter 2022-09-20 17:44:10 UTC
I'm currently looking at the C01S01F1 core dump. It seems lookup is for IPv6
family table "filter", while cache contains only IPv4 family table "filter":

(gdb) print h->family
$12 = 10
(gdb) print h->table.name
$13 = 0x55b359887980 "filter"

(gdb) print ((struct table *)(ctx->nft->cache->list))->handle.family
$14 = 2
(gdb) print ((struct table *)(ctx->nft->cache->list))->handle.table.name
$15 = 0x55b359887d70 "filter"
(gdb) print ctx->nft->cache->list->next == ctx->nft->cache->list->prev
$16 = 1

I did not find a related fix yet.
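
To illustrate why the lookup above comes back empty, here is a minimal sketch, assuming a simplified cache in which a table is identified by both family and name. The `sketch_table` struct and `sketch_table_lookup()` are hypothetical names for illustration only, not the actual nftables data structures. On Linux, `AF_INET` is 2 and `AF_INET6` is 10, which matches the family values printed by gdb.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET (2 on Linux), AF_INET6 (10 on Linux) */

/* Hypothetical, simplified cache entry; not the real nftables struct table. */
struct sketch_table {
    int family;                  /* address family of the table */
    const char *name;            /* table name, e.g. "filter" */
    struct sketch_table *next;   /* next cache entry, NULL at the end */
};

/* Return the entry matching both family and name, or NULL if there is none.
 * A table named "filter" in AF_INET does not satisfy a lookup for "filter"
 * in AF_INET6. */
const struct sketch_table *
sketch_table_lookup(const struct sketch_table *head, int family,
                    const char *name)
{
    for (const struct sketch_table *t = head; t != NULL; t = t->next) {
        if (t->family == family && strcmp(t->name, name) == 0)
            return t;
    }
    return NULL;
}
```

With a cache holding only the IPv4 "filter" table, a lookup for the IPv6 "filter" table returns NULL in this sketch, which corresponds to the condition the assertion rejects.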

Comment 6 Phil Sutter 2022-09-20 23:09:00 UTC
According to the dump, the command was 'nft monitor rules'.

When monitoring, nft has to consistently keep the cache up to date. It reuses
the events for this: A new table event will add said table to the cache.

If a new rule event is received, nft assumes it has either seen the rule's
table and chain at startup (where initially a full cache is fetched) or there
must have been a new table/chain event prior to the new rule event.

The dump indicates nft either missed this new table event or it is possible
somehow to make it drop the cache when it should not. Or the kernel did not
send the new table event.

Either way, I'm a bit out of ideas when it comes to reproducing the problem,
also I didn't find a potential fix in nftables git history at least.

How frequently does it happen in the customer's env? Are they able to reproduce
the crash?
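
The event-driven cache update described in this comment can be sketched roughly as follows. This is a simplified model under stated assumptions, not nftables code: the event types, the fixed-size cache and all `sketch_` names are hypothetical. It only shows that a new rule event is consistent with the cache when the table was either present at startup or announced by an earlier new table event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TABLES 16

/* Hypothetical event types; the real monitor handles many more. */
enum sketch_event_type { SKETCH_NEWTABLE, SKETCH_NEWRULE };

struct sketch_event {
    enum sketch_event_type type;
    int family;          /* address family of the affected table */
    const char *table;   /* name of the affected table */
};

/* Hypothetical table cache: a small fixed-size array of (family, name). */
struct sketch_cache {
    struct { int family; const char *name; } tables[SKETCH_MAX_TABLES];
    size_t count;
};

bool sketch_cache_has(const struct sketch_cache *c, int family,
                      const char *name)
{
    for (size_t i = 0; i < c->count; i++) {
        if (c->tables[i].family == family &&
            strcmp(c->tables[i].name, name) == 0)
            return true;
    }
    return false;
}

/* Apply one event. Returns 0 if the event is consistent with the cache,
 * -1 if a rule refers to a table the cache has never seen (or the cache
 * is full). */
int sketch_handle_event(struct sketch_cache *c, const struct sketch_event *ev)
{
    switch (ev->type) {
    case SKETCH_NEWTABLE:
        if (sketch_cache_has(c, ev->family, ev->table))
            return 0;
        if (c->count >= SKETCH_MAX_TABLES)
            return -1;
        c->tables[c->count].family = ev->family;
        c->tables[c->count].name = ev->table;
        c->count++;
        return 0;
    case SKETCH_NEWRULE:
        return sketch_cache_has(c, ev->family, ev->table) ? 0 : -1;
    }
    return -1;
}
```

In this model, a new rule event that arrives without a preceding new table event for the same family and name yields -1. That is the situation the core dump points to.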

Comment 7 Jonathan Maxwell 2022-09-20 23:12:43 UTC
(In reply to Phil Sutter from comment #3)
> Hi Jon,
> 
> The assert there triggers if the rule received from kernel references a
> table that doesn't exist in cache. This is a bit odd because in cache_init()
> (src/rule.c), table cache is unconditionally populated before fetching rule
> cache.
> 
> Do you know how nft was called and what the ruleset was? Maybe there's a bug
> in RHEL8.5 nftables. This is nftables-0.9.3-21.el8, right?
> 
> Cheers, Phil

Hi Phil,

Yes, it's:

nftables-0.9.3-21.el8.x86_64

Regards

Jon

Comment 8 Jonathan Maxwell 2022-09-21 00:03:59 UTC
(In reply to Phil Sutter from comment #6)
> According to the dump, the command was 'nft monitor rules'.
> 
> When monitoring, nft has to consistently keep the cache up to date. It reuses
> the events for this: A new table event will add said table to the cache.
> 
> If a new rule event is received, nft assumes it has either seen the rule's
> table and chain at startup (where initially a full cache is fetched) or there
> must have been a new table/chain event prior to the new rule event.
> 
> The dump indicates nft either missed this new table event or it is possible
> somehow to make it drop the cache when it should not. Or the kernel did not
> send the new table event.
> 
> Either way, I'm a bit out of ideas when it comes to reproducing the problem,
> also I didn't find a potential fix in nftables git history at least.
> 

Thanks for the hypothesis, Phil, that makes sense.

> How frequently does it happen in the customer's env? Are they able to reproduce
> the crash?

They are not able to reproduce it per se. They said "It happens only on some system and not always." I have asked if they can provide us with the application code and a reproducer if possible.

Regards

Jon

Comment 14 Phil Sutter 2022-09-24 09:51:36 UTC
Turns out my attempts at reproducing the issue were just not aggressive enough:
I had tried to run 'iptables -A' and 'nft flush ruleset' in a loop and called
'nft monitor' a few times. But since this is a race condition between 'nft add
table' and 'nft monitor' startup, it doesn't happen that often. Starting and
killing 'nft monitor' in a loop as well did the trick:

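| #!/bin/bash
| # Reproducer: race ruleset changes against 'nft monitor' startup.
|
| # Background loop: keep flushing the ruleset and re-adding a table,
| # chain and rule. The here-document is left unindented so it does not
| # depend on tab stripping.
| while true; do
|         ./install/sbin/nft flush ruleset
|         nft -f - <<EOF
| table t {
|         chain c {
|                 counter
|         }
| }
| EOF
| done &
| maniploop=$!
|
| # On exit, stop the background loop and the most recent monitor process.
| trap "kill $maniploop; kill \$!; wait" EXIT
|
| # Foreground loop: start 'nft monitor rules' and kill it shortly after.
| while true; do
|         ./install/sbin/nft monitor rules >/dev/null &
|         sleep 0.2
|         kill $!
| done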

I tried to make 'nft monitor' refresh cache once after receiving the first
event, but it made the abort more likely. I'll try to eliminate the assert()
calls next week - they're bad practice within a library anyway.

Comment 16 Phil Sutter 2022-09-28 23:33:09 UTC
Fix submitted upstream: 

https://lore.kernel.org/netfilter-devel/20220928223248.25933-1-phil@nwl.cc/

I'll clone this ticket for RHEL9 since the problem exists there as well.

Comment 17 Jonathan Maxwell 2022-10-23 05:15:52 UTC
Phil,

Seeing that it will now return NULL and set errno to ENOENT, will any changes be required to the calling program?

Regards

Jon

Comment 18 Phil Sutter 2022-11-02 10:51:03 UTC
Hi Jon,

(In reply to Jonathan Maxwell from comment #17)
> Seeing that it will now return NULL and set errno to ENOENT, will any changes
> be required to the calling program?

No, it's fine as it is: The calling function in 'nft monitor',
netlink_events_rule_cb(), will print 'W: Received event for an unknown table.'
on stderr and otherwise ignore the event if netlink_delinearize_rule() returns
NULL. The program then returns to listening for the next event.

Cheers, Phil
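
The behaviour described in this reply can be sketched as below. This is an illustration under stated assumptions, not the upstream patch: `sketch_delinearize()` and `sketch_rule_event_cb()` are hypothetical stand-ins for netlink_delinearize_rule() and netlink_events_rule_cb(), and the `table_known` flag replaces the real cache lookup. Only the warning text is taken from the comment above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, minimal rule object. */
struct sketch_rule {
    int handle;
};

/* Hypothetical stand-in for the delinearize step: instead of asserting,
 * it fails softly with NULL and errno set to ENOENT when the rule's table
 * is not in the cache. */
struct sketch_rule *sketch_delinearize(struct sketch_rule *storage,
                                       int table_known)
{
    if (!table_known) {
        errno = ENOENT;
        return NULL;
    }
    storage->handle = 1;
    return storage;
}

/* Hypothetical event callback: warn and skip the event on failure so the
 * caller can go back to listening. Returns 1 if the rule was processed,
 * 0 if the event was ignored. */
int sketch_rule_event_cb(int table_known)
{
    struct sketch_rule storage;
    struct sketch_rule *r = sketch_delinearize(&storage, table_known);

    if (r == NULL) {
        fprintf(stderr, "W: Received event for an unknown table.\n");
        return 0;
    }
    /* A real monitor would print the rule here. */
    return 1;
}
```

The point of the design is that a library function reports the inconsistency to its caller rather than aborting the process, so the caller decides how to proceed.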

Comment 20 sushil kulkarni 2022-11-16 21:09:32 UTC
Dropping from the 8.8 RPL because of lack of Votes (devel whiteboard).

-Sushil

Comment 21 Phil Sutter 2023-08-24 11:32:42 UTC
Inherited the fix mentioned in comment 16 by package rebase, marking as TestOnly.

Comment 22 Marcelo Ricardo Leitner 2023-11-27 14:55:13 UTC
Hi QE, please adjust the stale date if you still want to have a specific test case for this. Otherwise, this bz will be closed a week from now.
Thanks.

Comment 23 Marcelo Ricardo Leitner 2024-06-05 12:10:22 UTC
The program is considering auto-closing old/stale bugzilla tickets, like this one. I'll go ahead and close it manually already.

