Bug 1659706

Summary: general protection fault in nf_conncount_destroy [nf_conncount]
Product: [Fedora] Fedora
Reporter: Harald Reindl <h.reindl>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 28
CC: airlied, bskeggs, ewk, hdegoede, home+fedora, ichavero, itamar, jarodwilson, jcline, jforbes, jglisse, joe, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, steved, y9t7sypezp
Hardware: x86_64   
OS: Linux   
Last Closed: 2019-01-29 16:22:39 UTC
Type: Bug
Attachments:
* screenshot from VMware vSphere (flags: none)
* reproducer without forwarding attached (flags: none)

Description Harald Reindl 2018-12-15 13:31:14 UTC
Created attachment 1514613
screenshot from VMware vSphere

https://www.spinics.net/lists/netdev/msg533253.html
This, or something related, is still an issue on every 4.19.x kernel up to 4.19.9.

I see this on 2 out of 30 virtual servers on vSphere 6.5, and randomly on my physical home server, where it takes up to a day until the machine freezes. Sadly, Linux cannot show the stack trace while a GUI is running, so I can only assume it's the same root cause.

Comment 1 Harald Reindl 2018-12-24 13:03:57 UTC
I recently triggered something similar to https://www.spinics.net/lists/netdev/msg533254.html with 4.19.12-200.fc28.x86_64 just by calling my "iptables.sh", which clears and sets up all sorts of rules, chains and ipsets. It was a one-time event, triggered by the call "iptables -t filter -P INPUT DROP", but I guess that should not happen, and it may point to a general problem explaining the random instability of the whole 4.19.x series.

IPTABLES="/usr/sbin/iptables"
IPTABLES_FLT="$IPTABLES -t filter"

/scripts/iptables.sh: line 617:  7874 Segmentation fault      $IPTABLES_FLT -P INPUT DROP

[root@firewall:~]$ dmesg -c
[Mon Dec 24 13:49:01 2018] general protection fault: 0000 [#1] SMP PTI
[Mon Dec 24 13:49:01 2018] CPU: 0 PID: 7874 Comm: iptables Not tainted 4.19.12-200.fc28.x86_64 #1
[Mon Dec 24 13:49:01 2018] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
[Mon Dec 24 13:49:01 2018] RIP: 0010:rb_erase+0x216/0x370
[Mon Dec 24 13:49:01 2018] Code: e9 6b fe ff ff 4d 89 48 10 e9 91 fe ff ff c3 48 89 06 48 89 d0 48 8b 52 10 e9 b1 fe ff ff 48 8b 07 48 89 c1 48 83 e1 fc 74 53 <48> 3b 79 10 0f 84 94 00 00 00 4c 89 41 08 4d 85 c0 75 4c a8 01 0f
[Mon Dec 24 13:49:01 2018] RSP: 0018:ffffb63fc2263d28 EFLAGS: 00010286
[Mon Dec 24 13:49:01 2018] RAX: ffd7d18a01ee7a26 RBX: ffff9651a1b1c960 RCX: ffd7d18a01ee7a24
[Mon Dec 24 13:49:01 2018] RDX: 0000000000000000 RSI: ffff96519890d3e8 RDI: ffff9651a1b1c960
[Mon Dec 24 13:49:01 2018] RBP: ffff9651a5942c08 R08: 0000000000000000 R09: ffffffffc02a23de
[Mon Dec 24 13:49:01 2018] R10: ffff96519890b000 R11: 0000000000000000 R12: ffff96519890d3e8
[Mon Dec 24 13:49:01 2018] R13: ffff96519890d808 R14: ffff96519890d000 R15: ffff9651a1b1c980
[Mon Dec 24 13:49:01 2018] FS:  00007f76a1b53740(0000) GS:ffff9651a5e00000(0000) knlGS:0000000000000000
[Mon Dec 24 13:49:01 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Dec 24 13:49:01 2018] CR2: 0000564a46078000 CR3: 000000001951e005 CR4: 00000000001606f0
[Mon Dec 24 13:49:01 2018] Call Trace:
[Mon Dec 24 13:49:01 2018]  nf_conncount_destroy+0x58/0xc0 [nf_conncount]
[Mon Dec 24 13:49:01 2018]  cleanup_match+0x45/0x70
[Mon Dec 24 13:49:01 2018]  cleanup_entry+0x3e/0xc0
[Mon Dec 24 13:49:01 2018]  __do_replace+0x1ca/0x230
[Mon Dec 24 13:49:01 2018]  do_ipt_set_ctl+0x146/0x1a2
[Mon Dec 24 13:49:01 2018]  nf_setsockopt+0x44/0x70
[Mon Dec 24 13:49:01 2018]  __sys_setsockopt+0x82/0xe0
[Mon Dec 24 13:49:01 2018]  __x64_sys_setsockopt+0x20/0x30
[Mon Dec 24 13:49:01 2018]  do_syscall_64+0x5b/0x160
[Mon Dec 24 13:49:01 2018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Dec 24 13:49:01 2018] RIP: 0033:0x7f76a0a7e4ea
[Mon Dec 24 13:49:01 2018] Code: ff ff ff c3 48 8b 15 b5 d9 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 86 d9 2b 00 f7 d8 64 89 01 48
[Mon Dec 24 13:49:01 2018] RSP: 002b:00007ffc72e63a78 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
[Mon Dec 24 13:49:01 2018] RAX: ffffffffffffffda RBX: 0000564a46032268 RCX: 00007f76a0a7e4ea
[Mon Dec 24 13:49:01 2018] RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004
[Mon Dec 24 13:49:01 2018] RBP: 0000564a46060060 R08: 0000000000015f80 R09: 0000000000000000
[Mon Dec 24 13:49:01 2018] R10: 0000564a46060060 R11: 0000000000000202 R12: 0000564a460600c0
[Mon Dec 24 13:49:01 2018] R13: 0000564a46032268 R14: 0000000000015f20 R15: 0000564a46032260
[Mon Dec 24 13:49:01 2018] Modules linked in: bridge stp llc nf_nat_ftp nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_connlimit nf_conncount xt_recent nf_conntrack_ftp xt_CT xt_multiport xt_set iptable_raw xt_nat xt_NETMAP xt_iprange iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle ip_set_bitmap_port ip_set_hash_net ip_set nfnetlink crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_balloon vmxnet3 vmw_vmci crc32c_intel vmw_pvscsi
[Mon Dec 24 13:49:01 2018] ---[ end trace b66858b9c9a97ef2 ]---
[Mon Dec 24 13:49:01 2018] RIP: 0010:rb_erase+0x216/0x370
[Mon Dec 24 13:49:01 2018] Code: e9 6b fe ff ff 4d 89 48 10 e9 91 fe ff ff c3 48 89 06 48 89 d0 48 8b 52 10 e9 b1 fe ff ff 48 8b 07 48 89 c1 48 83 e1 fc 74 53 <48> 3b 79 10 0f 84 94 00 00 00 4c 89 41 08 4d 85 c0 75 4c a8 01 0f
[Mon Dec 24 13:49:01 2018] RSP: 0018:ffffb63fc2263d28 EFLAGS: 00010286
[Mon Dec 24 13:49:01 2018] RAX: ffd7d18a01ee7a26 RBX: ffff9651a1b1c960 RCX: ffd7d18a01ee7a24
[Mon Dec 24 13:49:01 2018] RDX: 0000000000000000 RSI: ffff96519890d3e8 RDI: ffff9651a1b1c960
[Mon Dec 24 13:49:01 2018] RBP: ffff9651a5942c08 R08: 0000000000000000 R09: ffffffffc02a23de
[Mon Dec 24 13:49:01 2018] R10: ffff96519890b000 R11: 0000000000000000 R12: ffff96519890d3e8
[Mon Dec 24 13:49:01 2018] R13: ffff96519890d808 R14: ffff96519890d000 R15: ffff9651a1b1c980
[Mon Dec 24 13:49:01 2018] FS:  00007f76a1b53740(0000) GS:ffff9651a5e00000(0000) knlGS:0000000000000000
[Mon Dec 24 13:49:01 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Dec 24 13:49:01 2018] CR2: 0000564a46078000 CR3: 000000001951e005 CR4: 00000000001606f0
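An editorial reading of the oops above, not something established in this thread: the bytes before the `<48>` marker (`48 8b 07 48 89 c1 48 83 e1 fc`) load the node's parent/color word into RAX and copy it into RCX with the two low color bits cleared, which matches RAX=...7a26 and RCX=...7a24. The faulting bytes `48 3b 79 10` decode to `cmp 0x10(%rcx),%rdi`, plausibly a `parent->rb_left == node` check. RCX (`ffd7d18a01ee7a24`) is not a canonical x86_64 address, and dereferencing a non-canonical address raises a general protection fault rather than a page fault. That is consistent with `rb_erase()` following a garbage parent pointer, for example from already-freed memory. The sketch below assumes 4-level paging (48-bit virtual addresses), and `is_canonical_addr` is a hypothetical helper for classifying register values from such traces.

```bash
#!/usr/bin/env bash
# Returns 0 if the given 64-bit value is a canonical x86_64 virtual address
# under 4-level paging (bits 63..47 all equal), 1 if it is non-canonical,
# and 2 if the argument is not a hex number of at most 16 digits.
is_canonical_addr() {
    local hex=${1#0x}
    hex=${hex,,}                                   # normalise to lowercase
    [[ $hex =~ ^[0-9a-f]{1,16}$ ]] || return 2
    while (( ${#hex} < 16 )); do hex="0$hex"; done # left-pad to 16 digits
    local top=${hex:0:4}   # bits 63..48
    local nib=${hex:4:1}   # bits 47..44; bit 47 is the high bit of this digit
    case $top in
        0000) [[ $nib == [01234567] ]] ;;          # lower half: bit 47 clear
        ffff) [[ $nib == [89abcdef] ]] ;;          # upper half: bit 47 set
        *)    return 1 ;;
    esac
}

# RCX, RBX and CR2 from the first oops in this report.
for reg in ffd7d18a01ee7a24 ffff9651a1b1c960 0000564a46078000; do
    if is_canonical_addr "$reg"; then
        echo "$reg canonical"
    else
        echo "$reg NON-canonical"
    fi
done
```

The loop reports RCX as non-canonical and RBX and CR2 as canonical. The RCX values in the later traces (`89a7851347204a0c`, `92c28cd212bb14dc`) are non-canonical as well.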

Comment 2 Harald Reindl 2018-12-24 14:05:37 UTC
Sadly, so far I have not been able to get a tiny reproducer.

In a complex forwarding/NAT setup, the presence of the "-p all -m set --match-set DNS_PORT dst -m recent" rule is a 1:1 reproducer here: set it to a low hitcount, make some telnet calls, and then call "iptables.sh", which segfaults at "iptables -t filter -P INPUT DROP". Without all the forwarding stuff and a ton of other rules, however, the config below doesn't crash.

Anyway, in the forwarding setup, 3 telnet connections followed by "iptables.sh" (even the small script below) reproducibly crash with the stack trace above. There are tons of other xt_recent / ipset rules, but *that one*, in combination, makes it a reproducer, because it was the only change in an 800-line shell script that has been maintained and in use for months.

All other crashes happen randomly after hours. It would be so helpful if you could tell the kernel: dump all your stack trace info to this USB stick and don't care about its consistency, instead of sitting in front of a frozen desktop or getting just a few lines in a vSphere screenshot.

#!/bin/bash
ipset -exist create DNS_PORT bitmap:port range 53-53
ipset flush DNS_PORT
ipset add DNS_PORT "53"
iptables -t filter -P INPUT DROP
iptables -t filter -P FORWARD DROP
iptables -t filter -P OUTPUT ACCEPT
iptables -t filter -F
iptables -t filter -X
for table in $(cat /proc/net/ip_tables_names); do iptables -t "$table" -F; done
for table in $(cat /proc/net/ip_tables_names); do iptables -t "$table" -X; done
for table in $(cat /proc/net/ip_tables_names); do iptables -t "$table" -Z; done
iptables -t filter -N INBOUND
iptables -t filter -A INBOUND -p all -m set --match-set DNS_PORT dst -m recent --name "limit_dns_global" --update --seconds 2 --hitcount 1 --rsource --rttl --reap -j REJECT
iptables -t filter -A INBOUND -p all -m set --match-set DNS_PORT dst -m recent --name "limit_dns_global" --set --rsource
iptables -t filter -A INBOUND -p tcp --dport 10022 -j ACCEPT
iptables -t filter -A INPUT -p all -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -t filter -A INPUT -p all -j INBOUND

Comment 3 Harald Reindl 2018-12-24 15:16:48 UTC
Created attachment 1516559
reproducer without forwarding attached

See the shell script, the stack trace, and the files "/etc/sysconfig/iptables" and "/etc/ipset/ipset".

* The SSH port is 10022 on the VMware guest
* Execute "iptables-debug.sh"
* Fire some connections to port 10022 while SSH is still connected
* Call "iptables-debug.sh" repeatedly

[Mon Dec 24 16:08:04 2018] general protection fault: 0000 [#1] SMP PTI
[Mon Dec 24 16:08:04 2018] CPU: 0 PID: 890 Comm: iptables Not tainted 4.19.12-200.fc28.x86_64 #1
[Mon Dec 24 16:08:04 2018] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
[Mon Dec 24 16:08:04 2018] RIP: 0010:rb_erase+0x216/0x370
[Mon Dec 24 16:08:04 2018] Code: e9 6b fe ff ff 4d 89 48 10 e9 91 fe ff ff c3 48 89 06 48 89 d0 48 8b 52 10 e9 b1 fe ff ff 48 8b 07 48 89 c1 48 83 e1 fc 74 53 <48> 3b 79 10 0f 84 94 00 00 00 4c 89 41 08 4d 85 c0 75 4c a8 01 0f
[Mon Dec 24 16:08:04 2018] RSP: 0018:ffffa8a840ac7d28 EFLAGS: 00010286
[Mon Dec 24 16:08:04 2018] RAX: 89a7851347204a0d RBX: ffff940e233086c0 RCX: 89a7851347204a0c
[Mon Dec 24 16:08:04 2018] RDX: 0000000000000000 RSI: ffff940e22502730 RDI: ffff940e233086c0
[Mon Dec 24 16:08:04 2018] RBP: ffff940e25942ec8 R08: 0000000000000000 R09: ffffffffc01533de
[Mon Dec 24 16:08:04 2018] R10: ffff940e24aac268 R11: 00000000000003c0 R12: ffff940e22502730
[Mon Dec 24 16:08:04 2018] R13: ffff940e22502808 R14: ffff940e22502000 R15: ffff940e233086e0
[Mon Dec 24 16:08:04 2018] FS:  00007f29173e7740(0000) GS:ffff940e25e00000(0000) knlGS:0000000000000000
[Mon Dec 24 16:08:04 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Dec 24 16:08:04 2018] CR2: 000055f6ca2fab48 CR3: 000000002242e003 CR4: 00000000001606f0
[Mon Dec 24 16:08:04 2018] Call Trace:
[Mon Dec 24 16:08:04 2018]  nf_conncount_destroy+0x58/0xc0 [nf_conncount]
[Mon Dec 24 16:08:04 2018]  cleanup_match+0x45/0x70
[Mon Dec 24 16:08:04 2018]  cleanup_entry+0x3e/0xc0
[Mon Dec 24 16:08:04 2018]  __do_replace+0x1ca/0x230
[Mon Dec 24 16:08:04 2018]  do_ipt_set_ctl+0x146/0x1a2
[Mon Dec 24 16:08:04 2018]  nf_setsockopt+0x44/0x70
[Mon Dec 24 16:08:04 2018]  __sys_setsockopt+0x82/0xe0
[Mon Dec 24 16:08:04 2018]  __x64_sys_setsockopt+0x20/0x30
[Mon Dec 24 16:08:04 2018]  do_syscall_64+0x5b/0x160
[Mon Dec 24 16:08:04 2018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Dec 24 16:08:04 2018] RIP: 0033:0x7f29163124ea
[Mon Dec 24 16:08:04 2018] Code: ff ff ff c3 48 8b 15 b5 d9 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 86 d9 2b 00 f7 d8 64 89 01 48
[Mon Dec 24 16:08:04 2018] RSP: 002b:00007ffcc79633b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
[Mon Dec 24 16:08:04 2018] RAX: ffffffffffffffda RBX: 000055f6ca2f8268 RCX: 00007f29163124ea
[Mon Dec 24 16:08:04 2018] RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004
[Mon Dec 24 16:08:04 2018] RBP: 000055f6ca2f9e50 R08: 0000000000000cf8 R09: 0000000000000000
[Mon Dec 24 16:08:04 2018] R10: 000055f6ca2f9e50 R11: 0000000000000202 R12: 000055f6ca2f9eb0
[Mon Dec 24 16:08:04 2018] R13: 000055f6ca2f8268 R14: 0000000000000c98 R15: 000055f6ca2f8260
[Mon Dec 24 16:08:04 2018] Modules linked in: bridge stp llc xt_recent xt_set xt_connlimit nf_conncount xt_conntrack iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle ip_set_bitmap_port ip_set nfnetlink crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_balloon vmxnet3 vmw_vmci crc32c_intel vmw_pvscsi
[Mon Dec 24 16:08:04 2018] ---[ end trace a31b84c2a1d265ac ]---
[Mon Dec 24 16:08:04 2018] RIP: 0010:rb_erase+0x216/0x370
[Mon Dec 24 16:08:04 2018] Code: e9 6b fe ff ff 4d 89 48 10 e9 91 fe ff ff c3 48 89 06 48 89 d0 48 8b 52 10 e9 b1 fe ff ff 48 8b 07 48 89 c1 48 83 e1 fc 74 53 <48> 3b 79 10 0f 84 94 00 00 00 4c 89 41 08 4d 85 c0 75 4c a8 01 0f
[Mon Dec 24 16:08:04 2018] RSP: 0018:ffffa8a840ac7d28 EFLAGS: 00010286
[Mon Dec 24 16:08:04 2018] RAX: 89a7851347204a0d RBX: ffff940e233086c0 RCX: 89a7851347204a0c
[Mon Dec 24 16:08:04 2018] RDX: 0000000000000000 RSI: ffff940e22502730 RDI: ffff940e233086c0
[Mon Dec 24 16:08:04 2018] RBP: ffff940e25942ec8 R08: 0000000000000000 R09: ffffffffc01533de
[Mon Dec 24 16:08:04 2018] R10: ffff940e24aac268 R11: 00000000000003c0 R12: ffff940e22502730
[Mon Dec 24 16:08:04 2018] R13: ffff940e22502808 R14: ffff940e22502000 R15: ffff940e233086e0
[Mon Dec 24 16:08:04 2018] FS:  00007f29173e7740(0000) GS:ffff940e25e00000(0000) knlGS:0000000000000000
[Mon Dec 24 16:08:04 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Mon Dec 24 16:08:04 2018] CR2: 000055f6ca2fab48 CR3: 000000002242e003 CR4: 00000000001606f0
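The "fire some connections" step in the list above can be scripted. This is a minimal sketch and not part of the attached reproducer: `open_conns` is a hypothetical helper that assumes bash built with `/dev/tcp` support and coreutils `timeout`. Holding each connection open for a while is an assumption about what the step needs, so that the entries stay in the conntrack table while the rules are reloaded.

```bash
#!/usr/bin/env bash
# open_conns HOST PORT COUNT [HOLD_SECONDS]
# Hypothetical helper: opens COUNT concurrent TCP connections to HOST:PORT
# via bash's /dev/tcp redirection, keeps each open for HOLD_SECONDS
# (default 5), then prints how many of them connected successfully.
open_conns() {
    local host=$1 port=$2 count=$3 hold=${4:-5}
    local -a pids=()
    local i pid ok=0
    for (( i = 0; i < count; i++ )); do
        # timeout(1) bounds each attempt so a silently dropped SYN cannot
        # hang the loop; the exec fails (and sleep is skipped) if the
        # connect is refused.
        timeout "$(( hold + 3 ))" bash -c \
            'exec 3<>"/dev/tcp/$0/$1" && sleep "$2"' \
            "$host" "$port" "$hold" 2>/dev/null &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        if wait "$pid"; then ok=$(( ok + 1 )); fi
    done
    echo "$ok"
}
```

For example, run `open_conns 192.0.2.10 10022 20 30 &` from another machine (192.0.2.10 stands in for the guest's address), then call `iptables-debug.sh` on the guest while the connections are still held.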

Comment 4 Harald Reindl 2018-12-25 09:38:12 UTC
Same with 4.20.0. It's amazing to see how broken the network stack got during just one short break by Linus.

[Tue Dec 25 10:36:09 2018] general protection fault: 0000 [#1] SMP PTI
[Tue Dec 25 10:36:09 2018] CPU: 0 PID: 1482 Comm: iptables Not tainted 4.20.0-1.fc30.x86_64 #1
[Tue Dec 25 10:36:09 2018] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
[Tue Dec 25 10:36:09 2018] RIP: 0010:rb_erase+0x216/0x370
[Tue Dec 25 10:36:09 2018] Code: e9 6b fe ff ff 4d 89 48 10 e9 91 fe ff ff c3 48 89 06 48 89 d0 48 8b 52 10 e9 b1 fe ff ff 48 8b 07 48 89 c1 48 83 e1 fc 74 53 <48> 3b 79 10 0f 84 94 00 00 00 4c 89 41 08 4d 85 c0 75 4c a8 01 0f
[Tue Dec 25 10:36:09 2018] RSP: 0018:ffffa012c1533d28 EFLAGS: 00010282
[Tue Dec 25 10:36:09 2018] RAX: 92c28cd212bb14de RBX: ffff8a61e5b88c00 RCX: 92c28cd212bb14dc
[Tue Dec 25 10:36:09 2018] RDX: 0000000000000000 RSI: ffff8a61e2f5c3b0 RDI: ffff8a61e5b88c00
[Tue Dec 25 10:36:09 2018] RBP: ffff8a61e595ad68 R08: 0000000000000000 R09: ffffffffc01d73de
[Tue Dec 25 10:36:09 2018] R10: ffff8a61ddc3e000 R11: 0000000000000001 R12: ffff8a61e2f5c3b0
[Tue Dec 25 10:36:09 2018] R13: ffff8a61e2f5c808 R14: ffff8a61e2f5c000 R15: ffff8a61e5b88c20
[Tue Dec 25 10:36:09 2018] FS:  00007f0584f23740(0000) GS:ffff8a61e5e00000(0000) knlGS:0000000000000000
[Tue Dec 25 10:36:09 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Dec 25 10:36:09 2018] CR2: 00005572afa8d000 CR3: 000000001eef8004 CR4: 00000000001606f0
[Tue Dec 25 10:36:09 2018] Call Trace:
[Tue Dec 25 10:36:09 2018]  nf_conncount_destroy+0x58/0xc0 [nf_conncount]
[Tue Dec 25 10:36:09 2018]  cleanup_match+0x45/0x70
[Tue Dec 25 10:36:09 2018]  cleanup_entry+0x3e/0xc0
[Tue Dec 25 10:36:09 2018]  __do_replace+0x1ca/0x230
[Tue Dec 25 10:36:09 2018]  do_ipt_set_ctl+0x146/0x1a2
[Tue Dec 25 10:36:09 2018]  nf_setsockopt+0x44/0x70
[Tue Dec 25 10:36:09 2018]  __sys_setsockopt+0x82/0xe0
[Tue Dec 25 10:36:09 2018]  __x64_sys_setsockopt+0x20/0x30
[Tue Dec 25 10:36:09 2018]  do_syscall_64+0x5b/0x160
[Tue Dec 25 10:36:09 2018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Dec 25 10:36:09 2018] RIP: 0033:0x7f0583e4e4ea
[Tue Dec 25 10:36:09 2018] Code: ff ff ff c3 48 8b 15 b5 d9 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 86 d9 2b 00 f7 d8 64 89 01 48
[Tue Dec 25 10:36:09 2018] RSP: 002b:00007ffdfbf8ffb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
[Tue Dec 25 10:36:09 2018] RAX: ffffffffffffffda RBX: 00005572afa47268 RCX: 00007f0583e4e4ea
[Tue Dec 25 10:36:09 2018] RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004
[Tue Dec 25 10:36:09 2018] RBP: 00005572afa75060 R08: 0000000000015f80 R09: 0000000000000000
[Tue Dec 25 10:36:09 2018] R10: 00005572afa75060 R11: 0000000000000202 R12: 00005572afa750c0
[Tue Dec 25 10:36:09 2018] R13: 00005572afa47268 R14: 0000000000015f20 R15: 00005572afa47260
[Tue Dec 25 10:36:09 2018] Modules linked in: bridge stp llc nf_nat_ftp nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_connlimit nf_conncount xt_recent nf_conntrack_ftp xt_CT xt_multiport xt_set iptable_raw xt_nat xt_NETMAP xt_iprange iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set nfnetlink crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_balloon vmxnet3 vmw_vmci crc32c_intel vmw_pvscsi
[Tue Dec 25 10:36:09 2018] ---[ end trace fd2c7198d3893a8f ]---
[Tue Dec 25 10:36:09 2018] RIP: 0010:rb_erase+0x216/0x370
[Tue Dec 25 10:36:09 2018] Code: e9 6b fe ff ff 4d 89 48 10 e9 91 fe ff ff c3 48 89 06 48 89 d0 48 8b 52 10 e9 b1 fe ff ff 48 8b 07 48 89 c1 48 83 e1 fc 74 53 <48> 3b 79 10 0f 84 94 00 00 00 4c 89 41 08 4d 85 c0 75 4c a8 01 0f
[Tue Dec 25 10:36:09 2018] RSP: 0018:ffffa012c1533d28 EFLAGS: 00010282
[Tue Dec 25 10:36:09 2018] RAX: 92c28cd212bb14de RBX: ffff8a61e5b88c00 RCX: 92c28cd212bb14dc
[Tue Dec 25 10:36:09 2018] RDX: 0000000000000000 RSI: ffff8a61e2f5c3b0 RDI: ffff8a61e5b88c00
[Tue Dec 25 10:36:09 2018] RBP: ffff8a61e595ad68 R08: 0000000000000000 R09: ffffffffc01d73de
[Tue Dec 25 10:36:09 2018] R10: ffff8a61ddc3e000 R11: 0000000000000001 R12: ffff8a61e2f5c3b0
[Tue Dec 25 10:36:09 2018] R13: ffff8a61e2f5c808 R14: ffff8a61e2f5c000 R15: ffff8a61e5b88c20
[Tue Dec 25 10:36:09 2018] FS:  00007f0584f23740(0000) GS:ffff8a61e5e00000(0000) knlGS:0000000000000000
[Tue Dec 25 10:36:09 2018] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Dec 25 10:36:09 2018] CR2: 00005572afa8d000 CR3: 000000001eef8004 CR4: 00000000001606f0

Comment 5 Harald Reindl 2018-12-25 19:08:27 UTC
Since the Fedora bug tracker is autistic, see the upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=202065

Comment 6 Jeremy Cline 2019-01-17 20:23:46 UTC
The upstream series referenced in the kernel BZ (https://marc.info/?l=netfilter-devel&m=154595714909303) has been merged into 5.0-rc1 as commit c78e7818f16f through a007232066f6.

Comment 7 Jeremy Cline 2019-01-17 20:27:59 UTC
*** Bug 1662509 has been marked as a duplicate of this bug. ***

Comment 8 Harald Reindl 2019-01-17 20:53:26 UTC
And will we ever see a fix, or do we need to run 4.18.20-100.fc27.x86_64 on F28/F29 until Fedora is rebased to 5.0 in a few months?

Comment 9 Harald Reindl 2019-01-17 20:54:58 UTC
I don't give a fuck about 5.0-rc1 in the context of https://bugzilla.kernel.org/show_bug.cgi?id=202065#c2

Comment 10 Jeremy Cline 2019-01-17 21:16:36 UTC
Hi Harald,

I think several of your comments on this bug are out of line with the Fedora Code of Conduct[0], and I recommend that you review that document. I realize you're frustrated, but there are a great many bugs and very few people. In the future, I recommend that you politely point to upstream fixes in the corresponding Red Hat Bugzilla, as it's much more likely one of us will see that and be inclined to pick up the patches.

Since the series apply cleanly to v4.20 it's likely Fedora will carry them.

[0] https://docs.fedoraproject.org/en-US/project/code-of-conduct/

Comment 11 Jeremy Cline 2019-01-17 21:44:26 UTC
I would appreciate it if folks could test https://koji.fedoraproject.org/koji/taskinfo?taskID=32096600 when it's done.

Comment 12 Harald Reindl 2019-01-17 21:58:01 UTC
Harald Reindl 2018-12-25 19:08:27 UTC
Since the Fedora bug tracker is autistic, see the upstream report: https://bugzilla.kernel.org/show_bug.cgi?id=202065

https://bugzilla.kernel.org/show_bug.cgi?id=202065#c2

Frankly, *what else* do you expect from me than to

a) post a notice on the Fedora kernel list that 4.19.x crashes all the time
b) risk my data by still playing around with it
c) write a reproducer
d) point to the reproducer on the Fedora kernel list
e) open an upstream bug report pointing to the Fedora Bugzilla and attach the reproducer

There was *not a single* response to this bug report or on the mailing list, so please don't play the "code of conduct" card with me when I was completely ignored on every channel.

Comment 13 Harald Reindl 2019-01-17 22:02:41 UTC
> In the future I recommend that you politely point to upstream fixes in the corresponding Red Hat Bugzilla

That happened with https://bugzilla.redhat.com/show_bug.cgi?id=1659706#c5
I commented on each and every kernel update since then with a reference to the connlimit bug.


SERIOUSLY:
Tell me what else I should have done!
Are you kidding me?
Just read your karma feedback:

================================================================================
     kernel-4.19.13-200.fc28 kernel-headers-4.19.13-200.fc28
================================================================================
  Update ID: FEDORA-2019-b968817096
    Release: Fedora 28
     Status: pending
       Type: bugfix
   Severity: unspecified
      Karma: 1
   Critpath: True
    Request: testing
      Notes: The v4.19.13 stable update contains important fixes across the
           : tree.
  Submitter: jcline
  Submitted: 2019-01-01 17:31:50.282651
   Comments: hreindl - 2019-01-01 18:21:20.567472 (karma 1)
             works for me - at least when you don't use iptables
             connlimit because the patches for
             https://bugzilla.kernel.org/show_bug.cgi?id=202065 did
             not make it to 4.19.x until now - amazing how a
             complete kernel series can be broken like that
             bodhi - 2019-01-01 17:31:50.293743 (karma 0)
             This update has been submitted for testing by jcline.

Comment 14 Harald Reindl 2019-01-18 02:21:51 UTC
The Koji build has been running here on F28 for now (3:20 AM local time). If the machine hasn't frozen in around 5 hours from now, I would say it's fixed, and I hope F28 gets officially rebased to 4.20 as soon as possible.

[root@srv-rhsoft:~]$ uname -a
Linux srv-rhsoft.rhsoft.net 4.20.3-200.rhbz1659706.fc29.x86_64 #1 SMP Thu Jan 17 22:47:56 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[root@srv-rhsoft:~]$ firewall_status | grep conn
7        0     0 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            #conn src/32 > 50
8        0     0 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            #conn src/24 > 150

[root@srv-rhsoft:~]$ uptime
03:18:57 up  1:30,  7 users,  load average: 1,53, 0,89, 0,74
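The two `#conn` lines in the listing above use iptables' display format for the connlimit match: `src/32 > 50` means more than 50 connections per source /32, and `src/24 > 150` more than 150 per source /24. A sketch of rules that would list like that, assuming they live in the INPUT chain of the filter table (the chain and the `firewall_status` script are not shown in this thread); `add_connlimit_drop` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# add_connlimit_drop MASK ABOVE
# Hypothetical helper: builds an iptables rule that drops TCP packets once a
# source network of prefix length MASK has more than ABOVE tracked
# connections. With DRY_RUN=1 it only prints the command; otherwise it runs
# it (requires root).
add_connlimit_drop() {
    local mask=$1 above=$2
    [[ $mask =~ ^[0-9]+$ && $mask -le 32 && $above =~ ^[0-9]+$ ]] || return 2
    local -a cmd=(iptables -t filter -A INPUT -p tcp
                  -m connlimit --connlimit-above "$above"
                  --connlimit-mask "$mask" -j DROP)
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

`add_connlimit_drop 32 50` followed by `add_connlimit_drop 24 150` would correspond to rules 7 and 8 above. Any such connlimit rule loads `nf_conncount`, the module in the crashing call trace.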

Comment 15 Harald Reindl 2019-01-18 02:22:57 UTC
Somewhat late to the party, the patches are now *proposed* for 4.19.x too:

-------- Forwarded Message --------
Subject: stable fixes for nf_conncount 4.19.x
Date: Fri, 18 Jan 2019 02:24:14 +0100
From: Pablo Neira Ayuso <pablo>
To: Greg Kroah-Hartman <gregkh>
CC: stable.org, netfilter-devel.org

Hi Greg,

Could you cherry-pick the follow list of patches into -stable 4.19.x, please?

a007232066f6 netfilter: nf_conncount: fix argument order to find_next_bit
c80f10bc973a netfilter: nf_conncount: speculative garbage collection on empty lists
2f971a8f4255 netfilter: nf_conncount: move all list iterations under spinlock
df4a90250976 netfilter: nf_conncount: merge lookup and add functions
e8cfb372b38a netfilter: nf_conncount: restart search when nodes have been erased
f7fcc98dfc2d netfilter: nf_conncount: split gc in two phases
4cd273bb91b3 netfilter: nf_conncount: don't skip eviction when age is negative
c78e7818f16f netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS

conncount infrastructure is not in good shape, for more details see:

https://bugzilla.kernel.org/show_bug.cgi?id=202013
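To check whether a given kernel tree carries all eight patches from the list above, one option is to compare commit subjects rather than hashes, because stable backports are new commits with new hashes (as the 4.19.17 log in Comment 21 shows). A minimal sketch; `CONNCOUNT_FIXES` and `missing_conncount_fixes` are hypothetical names.

```bash
#!/usr/bin/env bash
# Subjects of the eight nf_conncount fixes Pablo listed for 4.19.x.
CONNCOUNT_FIXES=(
    "netfilter: nf_conncount: fix argument order to find_next_bit"
    "netfilter: nf_conncount: speculative garbage collection on empty lists"
    "netfilter: nf_conncount: move all list iterations under spinlock"
    "netfilter: nf_conncount: merge lookup and add functions"
    "netfilter: nf_conncount: restart search when nodes have been erased"
    "netfilter: nf_conncount: split gc in two phases"
    "netfilter: nf_conncount: don't skip eviction when age is negative"
    "netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS"
)

# Reads one commit subject per line on stdin and prints every fix subject
# that does not appear; empty output means all eight were found.
missing_conncount_fixes() {
    local -A seen=()
    local line want
    while IFS= read -r line; do
        seen["$line"]=1
    done
    for want in "${CONNCOUNT_FIXES[@]}"; do
        [[ -n ${seen["$want"]} ]] || printf '%s\n' "$want"
    done
}
```

For instance, `git -C linux-4.19 log --format=%s v4.19.16..v4.19.17 | missing_conncount_fixes` should print nothing, given the log in Comment 21.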

Comment 16 Harald Reindl 2019-01-18 09:05:57 UTC
Confirmed: this is the longest uptime without a crash since 4.18.0 when you have connlimit rules.

[harry@srv-rhsoft:~]$ uptime
10:04:42 up  8:15,  9 users,  load average: 1,34, 1,38, 1,31

[harry@srv-rhsoft:~]$ uname -a
Linux srv-rhsoft.rhsoft.net 4.20.3-200.rhbz1659706.fc29.x86_64 #1 SMP Thu Jan 17 22:47:56 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Comment 17 Harald Reindl 2019-01-18 09:48:36 UTC

-------- Forwarded Message --------
Subject: Re: stable fixes for nf_conncount 4.19.x
Date: Fri, 18 Jan 2019 10:30:31 +0100
From: Reindl Harald <h.reindl>
To: Greg Kroah-Hartman <gregkh>, Pablo Neira Ayuso <pablo>
CC: stable.org, netfilter-devel.org


On 18.01.19 at 09:14, Greg Kroah-Hartman wrote:
> On Fri, Jan 18, 2019 at 07:57:07AM +0100, Greg Kroah-Hartman wrote:
>> On Fri, Jan 18, 2019 at 02:24:14AM +0100, Pablo Neira Ayuso wrote:
>>> Hi Greg,
>>>
>>> Could you cherry-pick the follow list of patches into -stable 4.19.x, please?
>>>
>>> a007232066f6 netfilter: nf_conncount: fix argument order to find_next_bit
>>> c80f10bc973a netfilter: nf_conncount: speculative garbage collection on empty lists
>>> 2f971a8f4255 netfilter: nf_conncount: move all list iterations under spinlock
>>> df4a90250976 netfilter: nf_conncount: merge lookup and add functions
>>> e8cfb372b38a netfilter: nf_conncount: restart search when nodes have been erased
>>> f7fcc98dfc2d netfilter: nf_conncount: split gc in two phases
>>> 4cd273bb91b3 netfilter: nf_conncount: don't skip eviction when age is negative
>>> c78e7818f16f netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS
>>>
>>> conncount infrastructure is not in good shape, for more details see:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=202013
>>
>> These should also go into 4.20.y as well, right?  I don't want people to
>> experience regressions moving from 4.19 to a newer kernel release.

There is a 4.20.3 Fedora build with the patches:
https://koji.fedoraproject.org/koji/taskinfo?taskID=32096601

[harry@srv-rhsoft:~]$ uname -a
Linux srv-rhsoft.rhsoft.net 4.20.3-200.rhbz1659706.fc29.x86_64 #1 SMP Thu Jan 17 22:47:56 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

[harry@srv-rhsoft:~]$ uptime
10:29:17 up  8:40,  9 users,  load average: 0,63, 0,64, 0,82

[root@srv-rhsoft:~]$ firewall_status | grep conn
7        0     0 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            #conn src/32 > 50
8        0     0 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            #conn src/24 > 150

Before this, the machine crashed within 4 hours on every kernel from 4.19.0 until recently.

Comment 18 Jeremy Cline 2019-01-18 19:45:09 UTC
*** Bug 1667220 has been marked as a duplicate of this bug. ***

Comment 19 Steve 2019-01-19 17:29:39 UTC
Here is a reliable reproducer with 4.19.15-300.fc29.x86_64 in a VM using "httpd" (Apache) and "ab" (Apache benchmark).

Install the "httpd" and "httpd-tools" packages (the latter has "ab").

Verify that httpd is running and can be connected to:
$ curl -I 10.0.2.15 # Use "ip addr" to get the VM's IP address.

Stop firewalld:
# systemctl stop firewalld.service

Install and verify this netfilter rule (thank you, Vsevolod: Bug 1667220):
# iptables -A INPUT -m connlimit --connlimit-above 100 -j REJECT --reject-with icmp-port-unreachable
# iptables -S # This should show only one rule.

In a separate terminal window, run:
$ dmesg -w # Monitor log messages as they are written.

In a separate terminal window run:
$ ab -n 10000 -c 100 10.0.2.15/

Switch to a root terminal window and run:
# iptables -Z
Segmentation fault

The "dmesg" window should show the "nf_conncount_destroy+0x58/0xc0 [nf_conncount]" call trace.

The VM host is F28 with:
qemu-system-x86-2.11.2-4.fc28.x86_64
libvirt-4.1.0-5.fc28.x86_64
virt-manager-1.5.1-1.fc28.noarch
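The check in the last step of this reproducer (looking for the `nf_conncount_destroy` call trace in `dmesg`) can be automated. A minimal sketch; `conncount_oops_seen` is a hypothetical helper, and matching on "general protection fault" plus an `nf_conncount_destroy` frame is an assumption based on the traces in this report, not an official signature.

```bash
#!/usr/bin/env bash
# Reads kernel log text on stdin. If it contains a general protection fault
# together with an nf_conncount_destroy frame (present in every trace in
# this report), prints the matching frame lines and returns 0; else 1.
conncount_oops_seen() {
    local log
    log=$(cat)
    grep -q 'general protection fault' <<<"$log" || return 1
    grep -F 'nf_conncount_destroy' <<<"$log"
}
```

Typical use is `dmesg | conncount_oops_seen`. After a hard freeze, `journalctl -k -b -1 | conncount_oops_seen` may help if the journal is persistent, although a trace that was never flushed to disk will not be there.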

Comment 20 Harald Reindl 2019-01-22 21:33:10 UTC
https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.20.4

For God's sake, can you rebase F28 too, given that 4.19.x, broken in multiple ways, was rebased at 4.19.2 before 4.18.x was EOL?

Comment 21 Steve 2019-01-23 07:04:20 UTC
The "netfilter: nf_conncount: ..." commits are in 4.19.17:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.17

$ git -C linux-4.19 log --oneline --grep 'netfilter: nf_conncount:' v4.19.16..v4.19.17
6567515e4 netfilter: nf_conncount: fix argument order to find_next_bit
b01b92417 netfilter: nf_conncount: speculative garbage collection on empty lists
aea1d1959 netfilter: nf_conncount: move all list iterations under spinlock
bdc6c893b netfilter: nf_conncount: merge lookup and add functions
13c639424 netfilter: nf_conncount: restart search when nodes have been erased
d6b3ff022 netfilter: nf_conncount: split gc in two phases
ef68fdb51 netfilter: nf_conncount: don't skip eviction when age is negative
c5cbe95a4 netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTS
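The log above places the fixes in 4.19.17. Comment 6 places them in mainline 5.0-rc1, and the Fedora 4.20.4 update referenced in Comments 20 to 23 closed this bug. A rough version check based only on those numbers follows. `conncount_bug_expected` is a hypothetical helper. Distribution kernels can carry backports, so its answer is only a hint: for example, the `4.20.3-200.rhbz1659706` test build already had the patches. Kernels outside the range discussed here return 1, but that does not mean they were verified safe.

```bash
#!/usr/bin/env bash
# conncount_bug_expected [RELEASE]
# Returns 0 if RELEASE (default: uname -r) falls in 4.19.0-4.19.16 or
# 4.20.0-4.20.3, the range this report identifies as affected; 1 for any
# other version; 2 if RELEASE cannot be parsed.
conncount_bug_expected() {
    local rel=${1:-$(uname -r)}
    local ver=${rel%%-*}          # "4.19.15-300.fc29.x86_64" -> "4.19.15"
    local major minor patch rest
    IFS=. read -r major minor patch rest <<<"$ver"
    patch=${patch:-0}             # "5.0" has no patch level
    [[ $major =~ ^[0-9]+$ && $minor =~ ^[0-9]+$ && $patch =~ ^[0-9]+$ ]] || return 2
    (( 10#$major == 4 )) || return 1
    case $minor in
        19) (( 10#$patch < 17 )) ;;
        20) (( 10#$patch < 4 )) ;;
        *)  return 1 ;;
    esac
}
```

With this heuristic, `4.19.15-300.fc29.x86_64` from Comment 19 is flagged, while `4.20.4-100.fc28.x86_64` from Comment 22 is not.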

Comment 22 Harald Reindl 2019-01-23 19:35:18 UTC
God bless you for https://koji.fedoraproject.org/koji/buildinfo?buildID=1181740
The x86_64 part (https://koji.fedoraproject.org/koji/taskinfo?taskID=32211932) is already done.

[harry@srv-rhsoft:~]$ uname -a
Linux srv-rhsoft.rhsoft.net 4.20.4-100.fc28.x86_64 #1 SMP Wed Jan 23 16:46:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Comment 23 Justin M. Forbes 2019-01-29 16:22:39 UTC
The 4.20.4 update has been pushed to stable. Closing this for now, reopen if the issue is not resolved.