Bug 2054037

Summary: use-after-free in sctp_do_8_2_transport_strike [rhel-7.9.z]
Product: Red Hat Enterprise Linux 7 Reporter: Jonathan Maxwell <jmaxwell>
Component: kernel Assignee: Xin Long <lxin>
kernel sub component: sctp QA Contact: ying xu <yinxu>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: urgent CC: arawal, brstephe, cutaylor, dhoward, jaeshin, jiji, kjeon, kpfleming, linzhao, lxin, mleitner, mtesar, network-qe, nmurray, sababu, stanislav.moravec, sukulkar, yinxu
Version: 7.9 Keywords: Triaged, ZStream
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-3.10.0-1160.85.1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-03-07 09:54:02 UTC Type: Bug
Bug Blocks: 1880027    

Description Jonathan Maxwell 2022-02-14 04:03:07 UTC
Description of problem:

A customer reported a crashed VM and uploaded a vmcore.

From sureshk's analysis:

crash> sys |grep -e RELEASE -e PANIC
     RELEASE: 3.10.0-1160.6.1.el7.x86_64
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000268"

The backtrace of the crash is:
+++
    [exception RIP: sctp_do_8_2_transport_strike+0x71]
    RIP: ffffffffc07fc991  RSP: ffff9383f9643b80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff93837b8c2c00  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff93837b8c2c00  RDI: ffff9383ed6e6000
    RBP: ffff9383f9643b98   R8: 0000000000000003   R9: ffff9383f9643c90
    R10: ffff938377345204  R11: 0000000000000005  R12: ffff9383ed6e6000
    R13: 0000000000000000  R14: 0000000000000003  R15: ffff9383f9643c90
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff9383f9643ba0] sctp_cmd_interpreter at ffffffffc07fe385 [sctp]
#11 [ffff9383f9643c38] sctp_do_sm at ffffffffc07fcc91 [sctp]
#12 [ffff9383f9643e10] sctp_generate_timeout_event at ffffffffc07fd305 [sctp]
#13 [ffff9383f9643e58] sctp_generate_t2_shutdown_event at ffffffffc07fd3e3 [sctp]
#14 [ffff9383f9643e68] call_timer_fn at ffffffffa7cabd58
#15 [ffff9383f9643ea0] run_timer_softirq at ffffffffa7cae1ed
#16 [ffff9383f9643f18] __do_softirq at ffffffffa7ca4b95
#17 [ffff9383f9643f88] call_softirq at ffffffffa83984ec
#18 [ffff9383f9643fa0] do_softirq at ffffffffa7c2f715
#19 [ffff9383f9643fc0] irq_exit at ffffffffa7ca4f15
#20 [ffff9383f9643fd8] smp_apic_timer_interrupt at ffffffffa8399a88
#21 [ffff9383f9643ff0] apic_timer_interrupt at ffffffffa8395fba
+++

The sctp module was trying to access the SCTP association through the transport, but crashed because the association pointer was NULL.

+++
Crashed while trying to access "transport->asoc->rto_max",

and the association is NULL:

crash> struct sctp_transport.asoc ffff93837b8c2c00
  asoc = 0x0 <--- NULL
+++
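
As a sanity check on the reading above, the fault address can be reproduced arithmetically: a load of asoc->rto_max through a NULL asoc resolves to 0 plus the field's offset. The sketch below is illustrative only. The demo_* types are hypothetical stand-ins, not the kernel's struct sctp_association or struct sctp_transport, and the padding is chosen on the assumption that rto_max sits at offset 0x268 in this build, which is what the fault address suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the association. NOT the real kernel layout:
 * the padding is an assumption that places rto_max at offset 0x268. */
struct demo_asoc {
    unsigned char pad[0x268];
    unsigned long rto_max;
};

/* Hypothetical stand-in for the transport; only the back-pointer matters. */
struct demo_transport {
    struct demo_asoc *asoc;
};

/* Compute the address a load of t->asoc->rto_max would touch,
 * without actually dereferencing t->asoc. */
static uintptr_t rto_max_load_address(const struct demo_transport *t)
{
    assert(t != NULL);
    return (uintptr_t)t->asoc + offsetof(struct demo_asoc, rto_max);
}
```

With asoc set to NULL, rto_max_load_address() returns 0x268, which matches the address in the PANIC string.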

The transport at that address is a freed slab object:

crash> kmem ffff93837b8c2c00
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff9383f9007500     1024       3352      3456    108    32k  kmalloc-1024
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  fffffb6542ee3000  ffff93837b8c0000     0     32         24     8
  FREE / [ALLOCATED]
   ffff93837b8c2c00  (cpu 1 cache)
+++
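
The combination of a free-listed object and a timer that still points at it can be modelled in a few lines. This is a toy model under stated assumptions, not the kernel's code path: the demo_* names are hypothetical, the in_use flag stands in for slab allocation state, and clearing the object on free simply mirrors the asoc = 0x0 seen in the vmcore (how that field was actually cleared is not established here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature types; not the kernel's definitions. */
struct demo_asoc {
    unsigned long rto_max;
};

struct demo_transport {
    struct demo_asoc *asoc;
    bool in_use;    /* stands in for "allocated" vs. "on the slab free list" */
};

/* Model of freeing: the object goes back to the allocator and its
 * contents, including the asoc back-pointer, are no longer valid. */
static void demo_transport_free(struct demo_transport *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Timer-side code that still holds the old pointer. It returns false at
 * the point where the crashing path dereferenced asoc and faulted; the
 * check exists only so this model stays well-defined. */
static bool demo_strike(const struct demo_transport *stale,
                        unsigned long *rto_max_out)
{
    assert(stale != NULL && rto_max_out != NULL);
    if (!stale->in_use || stale->asoc == NULL)
        return false;
    *rto_max_out = stale->asoc->rto_max;
    return true;
}
```

Calling demo_strike() on a live transport succeeds; calling it again after demo_transport_free() hits the stale-pointer case, which is the situation the backtrace shows the timer handler running into.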

Actual results:

Use after free crash.

Expected results:

No crash.

Additional info:

This looks very similar to:

http://lkml.iu.edu/hypermail/linux/kernel/2104.2/05811.html

But I can't see a fix for that in upstream.

Comment 3 Jonathan Maxwell 2022-02-14 04:12:23 UTC
> sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 net/sctp/sm_sideeffect.c:531

531         if (transport->state != SCTP_INACTIVE)

So it looks like the KASAN report was a good match.

Comment 4 Jonathan Maxwell 2022-02-23 01:23:49 UTC
Hi Xin, do you have any update on this?

Regards

Jon

Comment 5 Xin Long 2022-02-23 03:51:14 UTC
(In reply to Jonathan Maxwell from comment #4)
> Hi Xin, do you have any update on this?
> 
Hi Jon, sorry for the late reply.

Yes, I think it's the same issue as the one in the KASAN report.
The fix is already upstream:

  35b4f24415c8 sctp: do asoc update earlier in sctp_sf_do_dupcook_a

we may need this one too:

  51eac7f2f06b sctp: do asoc update earlier in sctp_sf_do_dupcook_b

Comment 6 Jonathan Maxwell 2022-02-23 03:53:51 UTC
Thanks Xin, awesome. I'll tell the customer we have a fix but it won't go into RHEL7 and report back here.

Comment 7 Sangam 2022-08-15 05:28:14 UTC
We have another customer facing the same issue, are we going to backport the known fix to RHEL 7 z-stream?

Comment 8 Curtis Taylor 2022-08-29 17:14:06 UTC
  commit a50d19c2501493fa7d8de3385c83329f5f42f93f

    Merge: sctp: fix a use after free crash of sctp_transport structure
      
    Xin Long (3):
      sctp: do asoc update earlier in sctp_sf_do_dupcook_a
      sctp: do asoc update earlier in sctp_sf_do_dupcook_b
      Revert "sctp: Fix SHUTDOWN CTSN Ack in the peer restart case"  <------ not mentioned so far in this BZ.

[] Confirm that the commit fixed by these patches is in RHEL 7.9
  $ git show 35b4f24415c8 | grep Fixes:  <--- linux tree
    Fixes: 145cb2f7177d ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK")
  $ git log --oneline --grep="sctp: Fix bundling of SHUTDOWN"   <---- rhel7 tree
    92504ce6d122 [net] sctp: Fix bundling of SHUTDOWN with COOKIE-ACK

[] Is the revert also needed if this is ported to rhel7.9.z?
  $ git log --oneline --grep="sctp: Fix SHUTDOWN CTSN"  <--- rhel7 tree
    9836dfeb3786 [net] sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
  $ git tag --contains=9836dfeb3786 | head -2
    RHEL-7.9
    kernel-3.10.0-1144.el7

Xin, would the revert be necessary for rhel7.9?

Comment 9 Xin Long 2022-08-29 18:19:42 UTC
(In reply to Curtis Taylor from comment #8)
>   commit a50d19c2501493fa7d8de3385c83329f5f42f93f
> 
>     Merge: sctp: fix a use after free crash of sctp_transport structure
>       
>     Xin Long (3):
>       sctp: do asoc update earlier in sctp_sf_do_dupcook_a
>       sctp: do asoc update earlier in sctp_sf_do_dupcook_b
>       Revert "sctp: Fix SHUTDOWN CTSN Ack in the peer restart case"  <------
> not mentioned so far in this BZ.
> 
> [] Confirms commit fixed by these patches is in RHEL7.9
>   $ git show 35b4f24415c8 | grep Fixes:  <--- linux tree
>     Fixes: 145cb2f7177d ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK")
>   $ git log --oneline --grep="sctp: Fix bundling of SHUTDOWN"   <---- rhel7
> tree
>     92504ce6d122 [net] sctp: Fix bundling of SHUTDOWN with COOKIE-ACK
> 
> [] Is the revert also needed if this is ported to rhel7.9.z?
>   $ git log --oneline --grep="sctp: Fix SHUTDOWN CTSN"  <--- rhel7 tree
>     9836dfeb3786 [net] sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
>   $ git tag --contains=9836dfeb3786 | head -2
>     RHEL-7.9
>     kernel-3.10.0-1144.el7
> 
> Xin would the revert be necessary for rhel7.9?

Not really. The revert is just an improvement; there is no fix in it.

Thanks.

Comment 11 Jonathan Maxwell 2022-09-09 04:53:14 UTC
Hi Xin,

Can you please provide devel_ack so that Norm can proceed?

Regards

Jon

Comment 14 Abhishek Rawal 2022-10-12 02:25:27 UTC
We have another customer facing the same issue with kernel-3.10.0-1160.76.1.el7:

++
https://galvatron-x86.cee.redhat.com/manager/301222650
retrace-server-interact 301222650 shell
retrace-server-interact 301222650 crash
++

Would it be possible for the engineering team to share a tentative date for when the known fixes/commits will be merged into the 7.z stream?

Comment 15 Xin Long 2022-10-12 18:34:03 UTC
(In reply to Abhishek Rawal from comment #14)
> We have another customer facing the same issue with
> kernel-3.10.0-1160.76.1.el7 ;
> 
> ++
> https://galvatron-x86.cee.redhat.com/manager/301222650
> retrace-server-interact 301222650 shell
> retrace-server-interact 301222650 crash
> ++
> 
> Will it be possible, for engineering team to share us the information
> when(tentatively) the known fixes|commits will be merged in 7.z stream,
> please ?

It should be Dec 13th, the date of the GA release.

Thanks.

Comment 47 errata-xmlrpc 2023-03-07 09:54:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1091