Bug 1531680 - openvswitch: list_add corruption splat on running OVS check-kernel tests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Eric Garver
QA Contact: Jiying Qiu
URL:
Whiteboard:
Duplicates: 1548330
Depends On:
Blocks:
Reported: 2018-01-05 19:27 UTC by Eric Garver
Modified: 2019-05-09 08:42 UTC (History)
CC List: 9 users

Fixed In Version: kernel-3.10.0-842.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-10 23:26:19 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2018:1062  0        None      None    None     2018-04-10 23:28:45 UTC

Description Eric Garver 2018-01-05 19:27:24 UTC
Running the OVS check-kernel tests on kernel-3.10.0-826.el7 results in a splat and/or panic. I bisected this down to:

  fc2302dde0d9 ("[net] openvswitch: Fix refcount leak on force commit")

I believe this is caused by a missing del_timer() when deleting the ct in OVS. RHEL7 does not yet have the changes to remove the timers (f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer")). The force commit feature was added upstream _after_ f330a7fdbe16.
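To make the suspected mechanism concrete, here is a toy model in Python, not kernel code. It assumes the stale `ct->timeout` entry is still linked into a timer list when the same memory is reused and its timer is re-initialised (which clears `next`) and armed again. The `Node` class, `list_add_tail_checked()`, and `simulate()` are hypothetical names for this sketch. The three checks are modelled on the debug checks in `lib/list_debug.c`.

```python
class Node:
    """Stand-in for a struct list_head embedded in a timer (toy model)."""

    def __init__(self, name):
        self.name = name
        self.next = None
        self.prev = None


def init_list_head(head):
    """An empty circular list points at itself in both directions."""
    head.next = head
    head.prev = head


def list_add_tail_checked(new, head):
    """Link `new` before `head` and return the debug warnings that would fire."""
    prev, nxt = head.prev, head
    warnings = []
    if nxt.prev is not prev:
        warnings.append("next->prev should be prev")
    if prev.next is not nxt:
        warnings.append("prev->next should be next")
    if new is prev or new is nxt:
        warnings.append("double add")
    # The debug checks only warn; the entry is linked regardless.
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new
    return warnings


def simulate(detach_before_reuse):
    """Arm a timer, drop the entry, then re-arm a timer at the same address."""
    slot = Node("timer list")
    init_list_head(slot)
    ct_timeout = Node("ct->timeout")
    warnings = list_add_tail_checked(ct_timeout, slot)  # first confirm arms the timer
    if detach_before_reuse:
        # Unlinking the entry is the part of del_timer() that matters here.
        ct_timeout.prev.next = ct_timeout.next
        ct_timeout.next.prev = ct_timeout.prev
    # Assumption: memory is reused and the timer re-initialised, clearing `next`.
    ct_timeout.next = None
    warnings += list_add_tail_checked(ct_timeout, slot)  # second confirm re-arms it
    return warnings
```

In this model, `simulate(False)` reports "prev->next should be next" followed by "double add", the same pair and order as the two warnings in the log below. `simulate(True)` reports nothing.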


---->8----


 19: conntrack - force commit
[  292.986769] openvswitch: Open vSwitch switching datapath
[  293.005376] gre: GRE over IPv4 demultiplexor driver
[  293.011158] ip_gre: GRE over IPv4 tunneling driver
[  293.080647] device ovs-system entered promiscuous mode
[  293.087169] device br0 entered promiscuous mode
[  293.158999] IPv6: ADDRCONF(NETDEV_UP): ovs-p0: link is not ready
[  293.166910] device ovs-p0 entered promiscuous mode
[  293.193609] IPv6: ADDRCONF(NETDEV_CHANGE): ovs-p0: link becomes ready
[  293.218107] IPv6: ADDRCONF(NETDEV_UP): ovs-p1: link is not ready
[  293.223920] device ovs-p1 entered promiscuous mode
[  293.249218] IPv6: ADDRCONF(NETDEV_CHANGE): ovs-p1: link becomes ready
[  293.282799] ------------[ cut here ]------------
[  293.283729] WARNING: CPU: 0 PID: 7047 at lib/list_debug.c:33 __list_add+0xac/0xc0
[  293.284744] list_add corruption. prev->next should be next (ffffffff890b6078), but was           (null). (prev=ffff8936f764ad10).
[  293.286499] Modules linked in: vport_vxlan vxlan vport_gre ip_gre ip_tunnel gre vport_geneve geneve ip6_udp_tunnel udp_tunnel openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_netlink nfnetlink bonding 8021q garp mrp stp llc veth nf_conntrack snd_hda_codec_generic ext4 snd_hda_intel iosf_mbi snd_hda_codec crc32_pclmul mbcache jbd2 snd_hda_core ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device snd_pcm ppdev aesni_intel lrw gf128mul glue_helper ablk_helper snd_timer cryptd pcspkr snd joydev virtio_balloon soundcore i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi qxl ata_piix virtio_blk drm_kms_helper virtio_net virtio_console syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel libata drm serio_raw i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
[  293.298457] CPU: 0 PID: 7047 Comm: handler1 Not tainted 3.10.0+ #36
[  293.299505] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  293.300517] Call Trace:
[  293.301331]  [<ffffffff886f4b0d>] dump_stack+0x19/0x1b
[  293.302338]  [<ffffffff8808e338>] __warn+0xd8/0x100
[  293.303318]  [<ffffffff8808e3bf>] warn_slowpath_fmt+0x5f/0x80
[  293.304372]  [<ffffffff88362eac>] __list_add+0xac/0xc0
[  293.305354]  [<ffffffff8809fb03>] __internal_add_timer+0x113/0x130
[  293.306396]  [<ffffffff8809fb52>] internal_add_timer+0x32/0x70
[  293.307431]  [<ffffffff880a0fae>] mod_timer+0x13e/0x220
[  293.308418]  [<ffffffff880a10a8>] add_timer+0x18/0x20
[  293.309412]  [<ffffffffc05582ca>] __nf_conntrack_confirm+0x34a/0x510 [nf_conntrack]
[  293.310538]  [<ffffffffc05d08c8>] ovs_ct_execute+0x598/0x6a0 [openvswitch]
[  293.311616]  [<ffffffffc05c8e2d>] ? reserve_sfa_size+0x2d/0xe0 [openvswitch]
[  293.312700]  [<ffffffffc05c010e>] do_execute_actions+0x4ee/0xa30 [openvswitch]
[  293.313827]  [<ffffffffc05c8db9>] ? nla_alloc_flow_actions+0x39/0x80 [openvswitch]
[  293.314999]  [<ffffffffc05c0a4c>] ovs_execute_actions+0x4c/0x140 [openvswitch]
[  293.316111]  [<ffffffffc05c3c4b>] ovs_packet_cmd_execute+0x2ab/0x2e0 [openvswitch]
[  293.317234]  [<ffffffff8860f02a>] genl_family_rcv_msg+0x1fa/0x420
[  293.318296]  [<ffffffff885bfacd>] ? __alloc_skb+0x5d/0x2d0
[  293.319354]  [<ffffffff8860f2ab>] genl_rcv_msg+0x5b/0xc0
[  293.320354]  [<ffffffff8860b5c0>] ? __netlink_lookup+0xc0/0x110
[  293.321382]  [<ffffffff8860f250>] ? genl_family_rcv_msg+0x420/0x420
[  293.322429]  [<ffffffff8860d2c9>] netlink_rcv_skb+0xa9/0xc0
[  293.323456]  [<ffffffff8860d808>] genl_rcv+0x28/0x40
[  293.324418]  [<ffffffff8860cc4a>] netlink_unicast+0x16a/0x210
[  293.325425]  [<ffffffff8860cff8>] netlink_sendmsg+0x308/0x420
[  293.326428]  [<ffffffff885b52f0>] sock_sendmsg+0xb0/0xf0
[  293.327516]  [<ffffffff8868c796>] ? unix_dgram_sendmsg+0x3c6/0x770
[  293.328692]  [<ffffffff885b60e9>] ___sys_sendmsg+0x3a9/0x3c0
[  293.329714]  [<ffffffff885b5a4e>] ? SYSC_sendto+0x17e/0x1c0
[  293.330721]  [<ffffffff885b7691>] __sys_sendmsg+0x51/0x90
[  293.331744]  [<ffffffff885b76e2>] SyS_sendmsg+0x12/0x20
[  293.332699]  [<ffffffff88705d09>] system_call_fastpath+0x16/0x1b
[  293.333743] ---[ end trace 037a31a4a0de0f18 ]---
[  293.334679] ------------[ cut here ]------------
[  293.335584] WARNING: CPU: 0 PID: 7047 at lib/list_debug.c:36 __list_add+0x8a/0xc0
[  293.336661] list_add double add: new=ffff8936f764ad10, prev=ffff8936f764ad10, next=ffffffff890b6078.
[  293.337854] Modules linked in: vport_vxlan vxlan vport_gre ip_gre ip_tunnel gre vport_geneve geneve ip6_udp_tunnel udp_tunnel openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_netlink nfnetlink bonding 8021q garp mrp stp llc veth nf_conntrack snd_hda_codec_generic ext4 snd_hda_intel iosf_mbi snd_hda_codec crc32_pclmul mbcache jbd2 snd_hda_core ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device snd_pcm ppdev aesni_intel lrw gf128mul glue_helper ablk_helper snd_timer cryptd pcspkr snd joydev virtio_balloon soundcore i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi qxl ata_piix virtio_blk drm_kms_helper virtio_net virtio_console syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel libata drm serio_raw i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
[  293.350046] CPU: 0 PID: 7047 Comm: handler1 Tainted: G        W      ------------   3.10.0+ #36
[  293.351205] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  293.352182] Call Trace:
[  293.352941]  [<ffffffff886f4b0d>] dump_stack+0x19/0x1b
[  293.353861]  [<ffffffff8808e338>] __warn+0xd8/0x100
[  293.354769]  [<ffffffff8808e3bf>] warn_slowpath_fmt+0x5f/0x80
[  293.355731]  [<ffffffff88362e8a>] __list_add+0x8a/0xc0
[  293.356683]  [<ffffffff8809fb03>] __internal_add_timer+0x113/0x130
[  293.357706]  [<ffffffff8809fb52>] internal_add_timer+0x32/0x70
[  293.358681]  [<ffffffff880a0fae>] mod_timer+0x13e/0x220
[  293.359637]  [<ffffffff880a10a8>] add_timer+0x18/0x20
[  293.360566]  [<ffffffffc05582ca>] __nf_conntrack_confirm+0x34a/0x510 [nf_conntrack]
[  293.361649]  [<ffffffffc05d08c8>] ovs_ct_execute+0x598/0x6a0 [openvswitch]
[  293.362683]  [<ffffffffc05c8e2d>] ? reserve_sfa_size+0x2d/0xe0 [openvswitch]
[  293.364140]  [<ffffffffc05c010e>] do_execute_actions+0x4ee/0xa30 [openvswitch]
[  293.365513]  [<ffffffffc05c8db9>] ? nla_alloc_flow_actions+0x39/0x80 [openvswitch]
[  293.366821]  [<ffffffffc05c0a4c>] ovs_execute_actions+0x4c/0x140 [openvswitch]
[  293.368014]  [<ffffffffc05c3c4b>] ovs_packet_cmd_execute+0x2ab/0x2e0 [openvswitch]
[  293.369156]  [<ffffffff8860f02a>] genl_family_rcv_msg+0x1fa/0x420
[  293.370305]  [<ffffffff885bfacd>] ? __alloc_skb+0x5d/0x2d0
[  293.371320]  [<ffffffff8860f2ab>] genl_rcv_msg+0x5b/0xc0
[  293.372282]  [<ffffffff8860b5c0>] ? __netlink_lookup+0xc0/0x110
[  293.373319]  [<ffffffff8860f250>] ? genl_family_rcv_msg+0x420/0x420
[  293.374403]  [<ffffffff8860d2c9>] netlink_rcv_skb+0xa9/0xc0
[  293.375391]  [<ffffffff8860d808>] genl_rcv+0x28/0x40
[  293.376334]  [<ffffffff8860cc4a>] netlink_unicast+0x16a/0x210
[  293.377336]  [<ffffffff8860cff8>] netlink_sendmsg+0x308/0x420
[  293.378355]  [<ffffffff885b52f0>] sock_sendmsg+0xb0/0xf0
[  293.379322]  [<ffffffff8868c796>] ? unix_dgram_sendmsg+0x3c6/0x770
[  293.380353]  [<ffffffff885b60e9>] ___sys_sendmsg+0x3a9/0x3c0
[  293.381349]  [<ffffffff885b5a4e>] ? SYSC_sendto+0x17e/0x1c0
[  293.382337]  [<ffffffff885b7691>] __sys_sendmsg+0x51/0x90
[  293.384515]  [<ffffffff885b76e2>] SyS_sendmsg+0x12/0x20
[  293.385444]  [<ffffffff88705d09>] system_call_fastpath+0x16/0x1b
[  293.386456] ---[ end trace 037a31a4a0de0f19 ]---
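To compare traces like the two above without reading every frame by hand, a small helper can pull out the function and module names. The following is a sketch only: `FRAME_RE` and `parse_frames()` are hypothetical names, and the pattern assumes the `[<address>] symbol+0xoff/0xsize [module]` frame layout seen in this log. Frames prefixed with `?` are skipped by default, since the kernel marks those as unreliable.

```python
import re

# Matches frames such as:
#   [  293.309412]  [<ffffffffc05582ca>] __nf_conntrack_confirm+0x34a/0x510 [nf_conntrack]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text, include_unreliable=False):
    """Return (function, module) pairs for each stack frame found in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a stack frame (banner, module list, etc.)
        if match.group("unreliable") and not include_unreliable:
            continue  # '?' frames may be stale stack contents
        # Frames without a [module] suffix belong to the core kernel image.
        frames.append((match.group("func"), match.group("module") or "vmlinux"))
    return frames
```

Running `parse_frames()` over each warning separately and comparing the two results is a quick way to confirm that both warnings share the `add_timer` -> `__nf_conntrack_confirm` -> `ovs_ct_execute` path.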

Comment 2 Eric Garver 2018-01-05 19:31:15 UTC
Sorry, the correct commit that I bisected this down to is:

  f37ed043ed24 ("[net] openvswitch: Add force commit")

Comment 8 Jiying Qiu 2018-01-11 10:29:09 UTC
While reproducing this bug, I hit the issue below. Is this the same bug, or should I file a new one? Thanks.


[23151.875866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[23151.884623] IP: [<ffffffffa8ea1268>] get_next_timer_interrupt+0x1b8/0x260
[23151.892212] PGD 0 
[23151.894461] Oops: 0000 [#1] SMP 
[23151.898078] Modules linked in: nf_conntrack_netlink nfnetlink vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack mlx4_en mlx4_core devlink sunrpc sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd ipmi_devintf sg ipmi_msghandler iTCO_wdt iTCO_vendor_support pcspkr mxm_wmi dcdbas mei_me shpchp wmi mei acpi_power_meter lpc_ich ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm i40e tg3 libahci crct10dif_pclmul crct10dif_common libata crc32c_intel megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[23152.002627] CPU: 39 PID: 0 Comm: swapper/39 Not tainted 3.10.0-826.el7.x86_64 #1
[23152.010879] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.4.3 01/17/2017
[23152.019227] task: ffff9e9a34fccf10 ti: ffff9e9a34fe0000 task.ti: ffff9e9a34fe0000
[23152.027576] RIP: 0010:[<ffffffffa8ea1268>]  [<ffffffffa8ea1268>] get_next_timer_interrupt+0x1b8/0x260
[23152.037874] RSP: 0018:ffff9e9a34fe3df0  EFLAGS: 00010017
[23152.043798] RAX: ffff9e9a33841428 RBX: 0000150e2f762f80 RCX: 0000000000000000
[23152.051759] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000040573
[23152.059720] RBP: ffff9e9a34fe3e40 R08: ffff9e9a33841788 R09: 0000000000000001
[23152.067681] R10: 0000000000000036 R11: 0000000000000033 R12: 00000001015cac6f
[23152.075642] R13: ffff9e9a33840000 R14: ffff9e9a34fe3e00 R15: ffff9e9a34fe3e10
[23152.083602] FS:  0000000000000000(0000) GS:ffff9ea19e2c0000(0000) knlGS:0000000000000000
[23152.092628] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[23152.099038] CR2: 0000000000000018 CR3: 000000018300e000 CR4: 00000000003427e0
[23152.106999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[23152.114959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[23152.122920] Call Trace:
[23152.125653]  [<ffffffffa8efd798>] tick_nohz_stop_sched_tick+0x1e8/0x370
[23152.133032]  [<ffffffffa8efd9bf>] __tick_nohz_idle_enter+0x9f/0x160
[23152.140024]  [<ffffffffa8efdeff>] tick_nohz_idle_enter+0x3f/0x70
[23152.146726]  [<ffffffffa8eef007>] cpu_startup_entry+0xa7/0x1e0
[23152.153237]  [<ffffffffa8e548f6>] start_secondary+0x1b6/0x230
[23152.159649]  [<ffffffffa8e000d5>] start_cpu+0x5/0x14
[23152.165186] Code: 00 48 89 55 c8 41 89 fb 41 83 e3 3f 45 89 da 0f 1f 40 00 4d 63 c2 49 c1 e0 04 49 01 c0 49 8b 10 4c 39 c2 74 25 66 0f 1f 44 00 00 <f6> 42 18 01 75 11 48 8b 72 10 41 b9 01 00 00 00 48 39 ce 48 0f 
[23152.186752] RIP  [<ffffffffa8ea1268>] get_next_timer_interrupt+0x1b8/0x260
[23152.194432]  RSP <ffff9e9a34fe3df0>
[23152.198320] CR2: 0000000000000018

Comment 9 Eric Garver 2018-01-11 13:47:49 UTC
(In reply to Jiying Qiu from comment #8)
> While reproducing this bug, I hit the issue below. Is this the same bug, or
> should I file a new one? Thanks.

It's likely the same bug. This instance is a bad dereference when doing work on the timer list.
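The oops in comment #8 can be cross-checked by decoding the faulting instruction from its `Code:` line, where the byte in angle brackets marks the instruction that faulted. The sketch below handles only the single x86-64 form needed here (`F6 /0 ib`, i.e. `TEST r/m8, imm8`, with register-plus-8-bit-displacement addressing and no REX prefix). `faulting_bytes()` and `decode_test_rm8_disp8()` are hypothetical helper names, not part of any existing tool.

```python
import re


def faulting_bytes(code_line):
    """Return the opcode bytes starting at the <xx> marker of an oops Code: line."""
    tokens = code_line.split()
    start = next(
        i for i, tok in enumerate(tokens) if re.fullmatch(r"<[0-9a-f]{2}>", tok)
    )
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_test_rm8_disp8(insn):
    """Decode F6 /0 ib with [reg + disp8] addressing; reject anything else."""
    opcode, modrm, disp8, imm8 = insn[:4]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 01 means an 8-bit displacement follows; rm == 100 would need a SIB byte.
    if opcode != 0xF6 or reg != 0 or mod != 0b01 or rm == 0b100:
        raise ValueError("not TEST r/m8, imm8 with [reg + disp8] addressing")
    base = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"][rm]
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8  # disp8 is sign-extended
    return base, disp, imm8


# Tail of the Code: line from the oops, starting a few bytes before the marker.
code = "Code: 66 0f 1f 44 00 00 <f6> 42 18 01 75 11 48 8b 72 10"
base, disp, imm = decode_test_rm8_disp8(faulting_bytes(code))
registers = {"rdx": 0x0}  # RDX value from the register dump
fault_address = registers[base] + disp
```

Here the instruction decodes to a one-byte test of bit 0 at `0x18(%rdx)`. With RDX equal to zero, the access lands on address 0x18, which matches the CR2 value in the oops. That is consistent with reading a field at offset 0x18 through a NULL list-entry pointer, although the exact structure layout is not confirmed by anything in this report.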

Comment 10 Bruno Meneguele 2018-01-31 10:29:29 UTC
Patch(es) committed to the kernel repository; an interim kernel build is undergoing testing.

Comment 12 Bruno Meneguele 2018-02-01 15:55:11 UTC
Patch(es) available on kernel-3.10.0-842.el7
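As a rough aid for checking whether a given RHEL 7 kernel is at or past the build named above, the sketch below compares the build number in a release string against 842. It is not a real RPM version comparison. `RELEASE_RE` and `contains_fix()` are hypothetical names, and the handling of z-stream releases (an assumed `build.x.y.el7` pattern) is an assumption: such kernels may or may not carry a backport, so the function returns `None` for them rather than guessing.

```python
import re

# Accepts strings such as "kernel-3.10.0-842.el7" or "3.10.0-826.el7.x86_64".
RELEASE_RE = re.compile(r"(?:kernel-)?3\.10\.0-(\d+)((?:\.\d+)*)\.el7")


def contains_fix(release, fixed_build=842):
    """Return True, False, or None (unknown) for a RHEL 7 kernel release string."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised RHEL 7 kernel release: {release!r}")
    build, zstream = int(match.group(1)), match.group(2)
    if build >= fixed_build:
        return True  # at or after the build where the patches became available
    if zstream:
        return None  # z-stream release: a backport is possible, cannot tell
    return False
```

For the kernels mentioned in this report, `contains_fix("3.10.0-826.el7.x86_64")` returns `False` and `contains_fix("kernel-3.10.0-842.el7")` returns `True`.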

Comment 16 Jiri Benc 2018-02-27 10:17:51 UTC
*** Bug 1548330 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2018-04-10 23:26:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062

