Bug 1956416

Summary: [kernel-rt] RIP: 0010:__transport_wait_for_tasks+0x122/0x150 [target_core_mod]
Product: Red Hat Enterprise Linux 8 Reporter: Martin Hoyer <mhoyer>
Component: kernel-rtAssignee: Maurizio Lombardi <mlombard>
kernel-rt sub component: Kernel Target (LIO) QA Contact: Martin Hoyer <mhoyer>
Status: MODIFIED --- Docs Contact:
Severity: unspecified    
Priority: medium CC: bhu, cleech, jlelli, lgoncalv, mlombard, mstowell, tgill
Version: 8.4Keywords: Triaged
Target Milestone: beta   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-4.18.0-314.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
console log with call trace none

Description Martin Hoyer 2021-05-03 16:10:05 UTC
Created attachment 1779049 [details]
console log with call trace

Description of problem:
Call trace is generated on kernel-rt when running IO on local iSCSI target.

Version-Release number of selected component (if applicable):
4.18.0-304.7.rt7.78.el8.x86_64 (RHEL-8.4)
4.18.0-240.rt7.54.el8.x86_64 (RHEL-8.3)
have been observed on:
4.18.0-240.22.1.rt7.77.el8_3.mreq409.x86_64
4.18.0-193.52.1.rt13.102.el8_2.bz1949845.x86_64
4.18.0-304.7.rt7.78.el8debugcallfn

How reproducible:
It seems that /kernel/storage/iscsi/python/iscsi-basic test is a good reproducer

Steps to Reproduce:
1. run /kernel/storage/iscsi/python/iscsi-basic
2. check logs for call trace

Actual results:
WARNING: CPU: 1 PID: 829784 at drivers/target/target_core_transport.c:3013 __transport_wait_for_tasks+0x122/0x150 [target_core_mod]

Expected results:
No warning, no call trace

Additional info:
Maurizio already did some investigation and maybe found potential root cause in lwn.net/Articles/146861/:

```
In PREEMPT_RT, normal spinlocks (spinlock_t and rwlock_t) are preemptible, as are RCU read-side critical sections (rcu_read_lock() and rcu_read_unlock()). Semaphore critical sections are preemptible, but they already are in both PREEMPT and non-PREEMPT kernels (but more on semaphores later). This preemptibility means that you can block while acquiring a spinlock, which in turn means that it is illegal to acquire a spinlock with either preemption or interrupts disabled (the one exception to this rule being the _trylock variants, at least as long as you don't repeatedly invoke them in a tight loop). This also means that spin_lock_irqsave() does -not- disable hardware interrupts when used on a spinlock_t.
```

Comment 1 Juri Lelli 2021-05-04 06:30:35 UTC
Hi Maurizio,

Is the WARN_ON_ONCE in __transport_wait_for_tasks

static bool
__transport_wait_for_tasks(struct se_cmd *cmd, bool fabric_stop,
                           bool *aborted, bool *tas, unsigned long *flags)
        __releases(&cmd->t_state_lock)
        __acquires(&cmd->t_state_lock)
{

        assert_spin_locked(&cmd->t_state_lock);
        WARN_ON_ONCE(!irqs_disabled());                 <---

        if (fabric_stop)
                cmd->transport_state |= CMD_T_FABRIC_STOP;

What you think is emitting the warning message?

Comment 2 Maurizio Lombardi 2021-05-04 07:35:33 UTC
(In reply to Juri Lelli from comment #1)
> Hi Maurizio,
> 
> Is the WARN_ON_ONCE in __transport_wait_for_tasks
> 
> static bool
> __transport_wait_for_tasks(struct se_cmd *cmd, bool fabric_stop,
>                            bool *aborted, bool *tas, unsigned long *flags)
>         __releases(&cmd->t_state_lock)
>         __acquires(&cmd->t_state_lock)
> {
> 
>         assert_spin_locked(&cmd->t_state_lock);
>         WARN_ON_ONCE(!irqs_disabled());                 <---
> 
>         if (fabric_stop)
>                 cmd->transport_state |= CMD_T_FABRIC_STOP;
> 
> What you think is emitting the warning message?


Yes, the warning we are triggering is  WARN_ON_ONCE(!irqs_disabled())
and I am quite sure the root cause is due to the fact that on a real
time kernel spin_lock_irq*() DO NOT disable the interrupts and this trigger the warning.

Comment 3 Maurizio Lombardi 2021-05-04 07:38:06 UTC
(In reply to Juri Lelli from comment #1)
> What you think is emitting the warning message?

Do you mean the kernel component?
It's the target iSCSI driver.