Bug 1531680
| Summary: | openvswitch: list_add corruption splat on running OVS check-kernel tests | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Eric Garver <egarver> |
| Component: | kernel | Assignee: | Eric Garver <egarver> |
| kernel sub component: | OVS | QA Contact: | Jiying Qiu <jiqiu> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | egarver, fleitner, jbenc, jiqiu, lmiksik, network-qe, qding, rkeshri, rkhan |
| Version: | 7.5 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-3.10.0-842.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-10 23:26:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Sorry, the correct commit that I bisected this down to is:
f37ed043ed24 ("[net] openvswitch: Add force commit")
When reproducing this bug, I hit the issue below. Is this the same as this bug? Should I submit a new bug? Thanks.

[23151.875866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[23151.884623] IP: [<ffffffffa8ea1268>] get_next_timer_interrupt+0x1b8/0x260
[23151.892212] PGD 0
[23151.894461] Oops: 0000 [#1] SMP
[23151.898078] Modules linked in: nf_conntrack_netlink nfnetlink vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat xt_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack mlx4_en mlx4_core devlink sunrpc sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ipmi_si cryptd ipmi_devintf sg ipmi_msghandler iTCO_wdt iTCO_vendor_support pcspkr mxm_wmi dcdbas mei_me shpchp wmi mei acpi_power_meter lpc_ich ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci drm i40e tg3 libahci crct10dif_pclmul crct10dif_common libata crc32c_intel megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[23152.002627] CPU: 39 PID: 0 Comm: swapper/39 Not tainted 3.10.0-826.el7.x86_64 #1
[23152.010879] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.4.3 01/17/2017
[23152.019227] task: ffff9e9a34fccf10 ti: ffff9e9a34fe0000 task.ti: ffff9e9a34fe0000
[23152.027576] RIP: 0010:[<ffffffffa8ea1268>] [<ffffffffa8ea1268>] get_next_timer_interrupt+0x1b8/0x260
[23152.037874] RSP: 0018:ffff9e9a34fe3df0 EFLAGS: 00010017
[23152.043798] RAX: ffff9e9a33841428 RBX: 0000150e2f762f80 RCX: 0000000000000000
[23152.051759] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000040573
[23152.059720] RBP: ffff9e9a34fe3e40 R08: ffff9e9a33841788 R09: 0000000000000001
[23152.067681] R10: 0000000000000036 R11: 0000000000000033 R12: 00000001015cac6f
[23152.075642] R13: ffff9e9a33840000 R14: ffff9e9a34fe3e00 R15: ffff9e9a34fe3e10
[23152.083602] FS: 0000000000000000(0000) GS:ffff9ea19e2c0000(0000) knlGS:0000000000000000
[23152.092628] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[23152.099038] CR2: 0000000000000018 CR3: 000000018300e000 CR4: 00000000003427e0
[23152.106999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[23152.114959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[23152.122920] Call Trace:
[23152.125653] [<ffffffffa8efd798>] tick_nohz_stop_sched_tick+0x1e8/0x370
[23152.133032] [<ffffffffa8efd9bf>] __tick_nohz_idle_enter+0x9f/0x160
[23152.140024] [<ffffffffa8efdeff>] tick_nohz_idle_enter+0x3f/0x70
[23152.146726] [<ffffffffa8eef007>] cpu_startup_entry+0xa7/0x1e0
[23152.153237] [<ffffffffa8e548f6>] start_secondary+0x1b6/0x230
[23152.159649] [<ffffffffa8e000d5>] start_cpu+0x5/0x14
[23152.165186] Code: 00 48 89 55 c8 41 89 fb 41 83 e3 3f 45 89 da 0f 1f 40 00 4d 63 c2 49 c1 e0 04 49 01 c0 49 8b 10 4c 39 c2 74 25 66 0f 1f 44 00 00 <f6> 42 18 01 75 11 48 8b 72 10 41 b9 01 00 00 00 48 39 ce 48 0f
[23152.186752] RIP [<ffffffffa8ea1268>] get_next_timer_interrupt+0x1b8/0x260
[23152.194432] RSP <ffff9e9a34fe3df0>
[23152.198320] CR2: 0000000000000018

(In reply to Jiying Qiu from comment #8)
> When reproducing this bug, I hit the issue below. Is this the same as this
> bug? Should I submit a new bug? Thanks.

It's likely the same bug. This instance is a bad dereference when doing work on the timer list.

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing.

Patch(es) available on kernel-3.10.0-842.el7.

*** Bug 1548330 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062
Running the OVS check-kernel on kernel-3.10.0-826.el7 results in a splat and/or panic.

I bisected this down to:
fc2302dde0d9 ("[net] openvswitch: Fix refcount leak on force commit")

I believe this is caused by a missing del_timer() when deleting the ct in OVS. RHEL7 does not yet have the changes to remove the timers (f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer")). The force commit feature was added upstream _after_ f330a7fdbe16.

---->8----
19: conntrack - force commit
[ 292.986769] openvswitch: Open vSwitch switching datapath
[ 293.005376] gre: GRE over IPv4 demultiplexor driver
[ 293.011158] ip_gre: GRE over IPv4 tunneling driver
[ 293.080647] device ovs-system entered promiscuous mode
[ 293.087169] device br0 entered promiscuous mode
[ 293.158999] IPv6: ADDRCONF(NETDEV_UP): ovs-p0: link is not ready
[ 293.166910] device ovs-p0 entered promiscuous mode
[ 293.193609] IPv6: ADDRCONF(NETDEV_CHANGE): ovs-p0: link becomes ready
[ 293.218107] IPv6: ADDRCONF(NETDEV_UP): ovs-p1: link is not ready
[ 293.223920] device ovs-p1 entered promiscuous mode
[ 293.249218] IPv6: ADDRCONF(NETDEV_CHANGE): ovs-p1: link becomes ready
[ 293.282799] ------------[ cut here ]------------
[ 293.283729] WARNING: CPU: 0 PID: 7047 at lib/list_debug.c:33 __list_add+0xac/0xc0
[ 293.284744] list_add corruption. prev->next should be next (ffffffff890b6078), but was (null). (prev=ffff8936f764ad10).
[ 293.286499] Modules linked in: vport_vxlan vxlan vport_gre ip_gre ip_tunnel gre vport_geneve geneve ip6_udp_tunnel udp_tunnel openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_netlink nfnetlink bonding 8021q garp mrp stp llc veth nf_conntrack snd_hda_codec_generic ext4 snd_hda_intel iosf_mbi snd_hda_codec crc32_pclmul mbcache jbd2 snd_hda_core ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device snd_pcm ppdev aesni_intel lrw gf128mul glue_helper ablk_helper snd_timer cryptd pcspkr snd joydev virtio_balloon soundcore i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi qxl ata_piix virtio_blk drm_kms_helper virtio_net virtio_console syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel libata drm serio_raw i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
[ 293.298457] CPU: 0 PID: 7047 Comm: handler1 Not tainted 3.10.0+ #36
[ 293.299505] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 293.300517] Call Trace:
[ 293.301331] [<ffffffff886f4b0d>] dump_stack+0x19/0x1b
[ 293.302338] [<ffffffff8808e338>] __warn+0xd8/0x100
[ 293.303318] [<ffffffff8808e3bf>] warn_slowpath_fmt+0x5f/0x80
[ 293.304372] [<ffffffff88362eac>] __list_add+0xac/0xc0
[ 293.305354] [<ffffffff8809fb03>] __internal_add_timer+0x113/0x130
[ 293.306396] [<ffffffff8809fb52>] internal_add_timer+0x32/0x70
[ 293.307431] [<ffffffff880a0fae>] mod_timer+0x13e/0x220
[ 293.308418] [<ffffffff880a10a8>] add_timer+0x18/0x20
[ 293.309412] [<ffffffffc05582ca>] __nf_conntrack_confirm+0x34a/0x510 [nf_conntrack]
[ 293.310538] [<ffffffffc05d08c8>] ovs_ct_execute+0x598/0x6a0 [openvswitch]
[ 293.311616] [<ffffffffc05c8e2d>] ? reserve_sfa_size+0x2d/0xe0 [openvswitch]
[ 293.312700] [<ffffffffc05c010e>] do_execute_actions+0x4ee/0xa30 [openvswitch]
[ 293.313827] [<ffffffffc05c8db9>] ? nla_alloc_flow_actions+0x39/0x80 [openvswitch]
[ 293.314999] [<ffffffffc05c0a4c>] ovs_execute_actions+0x4c/0x140 [openvswitch]
[ 293.316111] [<ffffffffc05c3c4b>] ovs_packet_cmd_execute+0x2ab/0x2e0 [openvswitch]
[ 293.317234] [<ffffffff8860f02a>] genl_family_rcv_msg+0x1fa/0x420
[ 293.318296] [<ffffffff885bfacd>] ? __alloc_skb+0x5d/0x2d0
[ 293.319354] [<ffffffff8860f2ab>] genl_rcv_msg+0x5b/0xc0
[ 293.320354] [<ffffffff8860b5c0>] ? __netlink_lookup+0xc0/0x110
[ 293.321382] [<ffffffff8860f250>] ? genl_family_rcv_msg+0x420/0x420
[ 293.322429] [<ffffffff8860d2c9>] netlink_rcv_skb+0xa9/0xc0
[ 293.323456] [<ffffffff8860d808>] genl_rcv+0x28/0x40
[ 293.324418] [<ffffffff8860cc4a>] netlink_unicast+0x16a/0x210
[ 293.325425] [<ffffffff8860cff8>] netlink_sendmsg+0x308/0x420
[ 293.326428] [<ffffffff885b52f0>] sock_sendmsg+0xb0/0xf0
[ 293.327516] [<ffffffff8868c796>] ? unix_dgram_sendmsg+0x3c6/0x770
[ 293.328692] [<ffffffff885b60e9>] ___sys_sendmsg+0x3a9/0x3c0
[ 293.329714] [<ffffffff885b5a4e>] ? SYSC_sendto+0x17e/0x1c0
[ 293.330721] [<ffffffff885b7691>] __sys_sendmsg+0x51/0x90
[ 293.331744] [<ffffffff885b76e2>] SyS_sendmsg+0x12/0x20
[ 293.332699] [<ffffffff88705d09>] system_call_fastpath+0x16/0x1b
[ 293.333743] ---[ end trace 037a31a4a0de0f18 ]---
[ 293.334679] ------------[ cut here ]------------
[ 293.335584] WARNING: CPU: 0 PID: 7047 at lib/list_debug.c:36 __list_add+0x8a/0xc0
[ 293.336661] list_add double add: new=ffff8936f764ad10, prev=ffff8936f764ad10, next=ffffffff890b6078.
[ 293.337854] Modules linked in: vport_vxlan vxlan vport_gre ip_gre ip_tunnel gre vport_geneve geneve ip6_udp_tunnel udp_tunnel openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_netlink nfnetlink bonding 8021q garp mrp stp llc veth nf_conntrack snd_hda_codec_generic ext4 snd_hda_intel iosf_mbi snd_hda_codec crc32_pclmul mbcache jbd2 snd_hda_core ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device snd_pcm ppdev aesni_intel lrw gf128mul glue_helper ablk_helper snd_timer cryptd pcspkr snd joydev virtio_balloon soundcore i2c_piix4 parport_pc parport ip_tables xfs libcrc32c ata_generic pata_acpi qxl ata_piix virtio_blk drm_kms_helper virtio_net virtio_console syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel libata drm serio_raw i2c_core virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_defrag_ipv4]
[ 293.350046] CPU: 0 PID: 7047 Comm: handler1 Tainted: G W ------------ 3.10.0+ #36
[ 293.351205] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 293.352182] Call Trace:
[ 293.352941] [<ffffffff886f4b0d>] dump_stack+0x19/0x1b
[ 293.353861] [<ffffffff8808e338>] __warn+0xd8/0x100
[ 293.354769] [<ffffffff8808e3bf>] warn_slowpath_fmt+0x5f/0x80
[ 293.355731] [<ffffffff88362e8a>] __list_add+0x8a/0xc0
[ 293.356683] [<ffffffff8809fb03>] __internal_add_timer+0x113/0x130
[ 293.357706] [<ffffffff8809fb52>] internal_add_timer+0x32/0x70
[ 293.358681] [<ffffffff880a0fae>] mod_timer+0x13e/0x220
[ 293.359637] [<ffffffff880a10a8>] add_timer+0x18/0x20
[ 293.360566] [<ffffffffc05582ca>] __nf_conntrack_confirm+0x34a/0x510 [nf_conntrack]
[ 293.361649] [<ffffffffc05d08c8>] ovs_ct_execute+0x598/0x6a0 [openvswitch]
[ 293.362683] [<ffffffffc05c8e2d>] ? reserve_sfa_size+0x2d/0xe0 [openvswitch]
[ 293.364140] [<ffffffffc05c010e>] do_execute_actions+0x4ee/0xa30 [openvswitch]
[ 293.365513] [<ffffffffc05c8db9>] ? nla_alloc_flow_actions+0x39/0x80 [openvswitch]
[ 293.366821] [<ffffffffc05c0a4c>] ovs_execute_actions+0x4c/0x140 [openvswitch]
[ 293.368014] [<ffffffffc05c3c4b>] ovs_packet_cmd_execute+0x2ab/0x2e0 [openvswitch]
[ 293.369156] [<ffffffff8860f02a>] genl_family_rcv_msg+0x1fa/0x420
[ 293.370305] [<ffffffff885bfacd>] ? __alloc_skb+0x5d/0x2d0
[ 293.371320] [<ffffffff8860f2ab>] genl_rcv_msg+0x5b/0xc0
[ 293.372282] [<ffffffff8860b5c0>] ? __netlink_lookup+0xc0/0x110
[ 293.373319] [<ffffffff8860f250>] ? genl_family_rcv_msg+0x420/0x420
[ 293.374403] [<ffffffff8860d2c9>] netlink_rcv_skb+0xa9/0xc0
[ 293.375391] [<ffffffff8860d808>] genl_rcv+0x28/0x40
[ 293.376334] [<ffffffff8860cc4a>] netlink_unicast+0x16a/0x210
[ 293.377336] [<ffffffff8860cff8>] netlink_sendmsg+0x308/0x420
[ 293.378355] [<ffffffff885b52f0>] sock_sendmsg+0xb0/0xf0
[ 293.379322] [<ffffffff8868c796>] ? unix_dgram_sendmsg+0x3c6/0x770
[ 293.380353] [<ffffffff885b60e9>] ___sys_sendmsg+0x3a9/0x3c0
[ 293.381349] [<ffffffff885b5a4e>] ? SYSC_sendto+0x17e/0x1c0
[ 293.382337] [<ffffffff885b7691>] __sys_sendmsg+0x51/0x90
[ 293.384515] [<ffffffff885b76e2>] SyS_sendmsg+0x12/0x20
[ 293.385444] [<ffffffff88705d09>] system_call_fastpath+0x16/0x1b
[ 293.386456] ---[ end trace 037a31a4a0de0f19 ]---