Bug 2176919 - INFO: task systemd-udevd:20742 blocked for more than 122 seconds.
Summary: INFO: task systemd-udevd:20742 blocked for more than 122 seconds.
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: iscsi-initiator-utils
Version: 9.3
Hardware: x86_64
OS: Linux
unspecified
low
Target Milestone: rc
: ---
Assignee: Chris Leech
QA Contact: Storage QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-03-09 16:29 UTC by Martin Hoyer
Modified: 2023-06-27 11:07 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-151289 0 None None None 2023-03-09 16:31:39 UTC

Description Martin Hoyer 2023-03-09 16:29:07 UTC
Description of problem:
After repeatedly logging in and out to iSCSI target using qedi interfaces, call trace was observed on dmesg:


[ 7906.133082] scsi 1:0:2:1: Direct-Access     LIO-ORG  st86backstore    4.0  PQ: 0 ANSI: 6
[ 7906.143294] scsi 1:0:2:1: alua: supports implicit and explicit TPGS
[ 7906.150292] scsi 1:0:2:1: alua: device naa.60014056e1c42009a0c47b4a3bfc7365 port group 0 rel port 1
[ 7906.160502] sd 1:0:2:1: Attached scsi generic sg3 type 0
[ 7906.160774] sd 1:0:2:1: [sdd] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[ 7906.175170] sd 1:0:2:1: [sdd] Write Protect is off
[ 7906.180522] sd 1:0:2:1: [sdd] Mode Sense: 43 00 00 08
[ 7906.180918] sd 1:0:2:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 7906.191324] sd 1:0:2:1: [sdd] Preferred minimum I/O size 512 bytes
[ 7906.198217] sd 1:0:2:1: [sdd] Optimal transfer size 262144 bytes
[ 7906.214771] sd 1:0:2:1: [sdd] Attached SCSI disk
[ 7911.550804] [0000:02:00.5]:[qedi_offload_work:888]:2: Offload conn TIMEOUT iscsi_cid=1, qedi_ep=0000000002fa705b
[ 7917.139541] scsi 2:0:3:1: Direct-Access     LIO-ORG  st86backstore    4.0  PQ: 0 ANSI: 6
[ 7917.150408] scsi 2:0:3:1: alua: supports implicit and explicit TPGS
[ 7917.157408] scsi 2:0:3:1: alua: device naa.60014056e1c42009a0c47b4a3bfc7365 port group 0 rel port 1
[ 7917.167612] sd 2:0:3:1: Attached scsi generic sg4 type 0
[ 7917.167971] sd 2:0:3:1: [sde] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
[ 7917.182327] sd 2:0:3:1: [sde] Write Protect is off
[ 7917.187677] sd 2:0:3:1: [sde] Mode Sense: 43 00 00 08
[ 7917.188116] sd 2:0:3:1: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 7917.198540] sd 2:0:3:1: [sde] Preferred minimum I/O size 512 bytes
[ 7917.205455] sd 2:0:3:1: [sde] Optimal transfer size 262144 bytes
[ 7917.223530] sd 2:0:3:1: [sde] Attached SCSI disk
[ 7962.361415] device-mapper: multipath: 253:4: Failing path 8:16.
[ 7962.370106] [0000:02:00.5]:[qedi_iscsi_event_cb:129]:2: Recv Unknown Event 12
[ 7962.383875] scsi 1:0:2:1: alua: Detached
[ 7962.388476] [0000:02:00.4]:[qedi_iscsi_event_cb:129]:1: Recv Unknown Event 12
[ 7962.400898] scsi 1:0:1:1: rejecting I/O to dead device
[ 7962.404237] [0000:02:00.5]:[qedi_iscsi_event_cb:129]:2: Recv Unknown Event 12
[ 7962.406673] scsi 1:0:1:1: alua: rtpg failed, result 65536
[ 7962.420240] [0000:02:00.4]:[qedi_iscsi_event_cb:129]:1: Recv Unknown Event 12
[ 7962.420648] naa.60014056e1c42009a0c47b4a3bfc7365: no device found for rtpg
[ 7962.436289] device-mapper: multipath: 253:4: Failing path 8:32.
[ 7962.442933] device-mapper: multipath: 253:4: Failing path 8:64.
[ 7962.449599] I/O error, dev dm-4, sector 41942912 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ 7962.459862] I/O error, dev dm-4, sector 41942912 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 7962.469673] Buffer I/O error on dev dm-4, logical block 5242864, async page read
[ 7963.511808] scsi 2:0:0:1: alua: Detached
[ 7963.516347] device-mapper: multipath: 253:4: Failing path 8:32.
[ 7963.522987] device-mapper: multipath: 253:4: Failing path 8:64.
[ 7963.550810] scsi 1:0:1:1: alua: Detached
[ 7963.555324] device-mapper: multipath: 253:4: Failing path 8:64.
[ 7963.562132] device-mapper: uevent: dm_send_uevents: skipping sending uevent for lost device
[ 7963.576827] scsi 2:0:3:1: alua: Detached
[ 8113.278942] INFO: task systemd-udevd:20742 blocked for more than 122 seconds.
[ 8113.286904]       Not tainted 5.14.0-284.el9.x86_64 #1
[ 8113.292640] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8113.301382] task:systemd-udevd   state:D stack:    0 pid:20742 ppid:   663 flags:0x00004006
[ 8113.310708] Call Trace:
[ 8113.313430]  <TASK>
[ 8113.315762]  __schedule+0x248/0x620
[ 8113.319658]  ? radix_tree_node_ctor+0x50/0x50
[ 8113.324524]  schedule+0x5a/0xc0
[ 8113.328030]  io_schedule+0x42/0x70
[ 8113.331827]  folio_wait_bit_common+0x147/0x380
[ 8113.336789]  ? file_check_and_advance_wb_err+0xd0/0xd0
[ 8113.342527]  truncate_inode_pages_range+0x34e/0x500
[ 8113.347991]  blkdev_flush_mapping+0x58/0xc0
[ 8113.352664]  blkdev_put_whole+0x3a/0x50
[ 8113.356961]  blkdev_put+0x9f/0x1b0
[ 8113.360770]  blkdev_close+0x18/0x20
[ 8113.364681]  __fput+0x91/0x250
[ 8113.368106]  task_work_run+0x59/0x90
[ 8113.372112]  do_exit+0x262/0x4a0
[ 8113.375732]  do_group_exit+0x2d/0x90
[ 8113.379736]  get_signal+0x9d3/0xa00
[ 8113.383647]  ? new_sync_read+0xf9/0x180
[ 8113.387953]  arch_do_signal_or_restart+0x25/0x100
[ 8113.393206]  ? vfs_read+0x13c/0x190
[ 8113.397115]  exit_to_user_mode_loop+0x9c/0x130
[ 8113.402093]  exit_to_user_mode_prepare+0xb6/0x100
[ 8113.407360]  syscall_exit_to_user_mode+0x12/0x30
[ 8113.412531]  do_syscall_64+0x69/0x90
[ 8113.416538]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 8113.422194] RIP: 0033:0x7fbcfe53eaf2
[ 8113.426204] RSP: 002b:00007ffc1c42fae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 8113.434671] RAX: fffffffffffffffc RBX: 000055ace72ba9a8 RCX: 00007fbcfe53eaf2
[ 8113.442652] RDX: 0000000000040000 RSI: 000055ace72ba9b8 RDI: 000000000000000f
[ 8113.450631] RBP: 000055ace72084e0 R08: 0000000000000003 R09: 0000000000000000
[ 8113.458612] R10: 0000000000040000 R11: 0000000000000246 R12: 000000027ff80000
[ 8113.466591] R13: 0000000000040000 R14: 000055ace72ba990 R15: 000055ace7208530
[ 8113.474590]  </TASK>

[ 8236.159046] INFO: task systemd-udevd:20742 blocked for more than 245 seconds.
[ 8236.167014]       Not tainted 5.14.0-284.el9.x86_64 #1
[ 8236.172744] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8236.181483] task:systemd-udevd   state:D stack:    0 pid:20742 ppid:   663 flags:0x00004006
[ 8236.190809] Call Trace:
[ 8236.193540]  <TASK>
[ 8236.195881]  __schedule+0x248/0x620
[ 8236.199777]  ? radix_tree_node_ctor+0x50/0x50
[ 8236.204641]  schedule+0x5a/0xc0
[ 8236.208146]  io_schedule+0x42/0x70
[ 8236.211942]  folio_wait_bit_common+0x147/0x380
[ 8236.216894]  ? file_check_and_advance_wb_err+0xd0/0xd0
[ 8236.222631]  truncate_inode_pages_range+0x34e/0x500
[ 8236.228083]  blkdev_flush_mapping+0x58/0xc0
[ 8236.232753]  blkdev_put_whole+0x3a/0x50
[ 8236.237034]  blkdev_put+0x9f/0x1b0
[ 8236.240831]  blkdev_close+0x18/0x20
[ 8236.244725]  __fput+0x91/0x250
[ 8236.248127]  task_work_run+0x59/0x90
[ 8236.252118]  do_exit+0x262/0x4a0
[ 8236.255721]  do_group_exit+0x2d/0x90
[ 8236.259704]  get_signal+0x9d3/0xa00
[ 8236.263599]  ? new_sync_read+0xf9/0x180
[ 8236.267881]  arch_do_signal_or_restart+0x25/0x100
[ 8236.273132]  ? vfs_read+0x13c/0x190
[ 8236.277026]  exit_to_user_mode_loop+0x9c/0x130
[ 8236.281990]  exit_to_user_mode_prepare+0xb6/0x100
[ 8236.287242]  syscall_exit_to_user_mode+0x12/0x30
[ 8236.292387]  do_syscall_64+0x69/0x90
[ 8236.296378]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 8236.302016] RIP: 0033:0x7fbcfe53eaf2
[ 8236.306006] RSP: 002b:00007ffc1c42fae8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 8236.314455] RAX: fffffffffffffffc RBX: 000055ace72ba9a8 RCX: 00007fbcfe53eaf2
[ 8236.322420] RDX: 0000000000040000 RSI: 000055ace72ba9b8 RDI: 000000000000000f
[ 8236.330386] RBP: 000055ace72084e0 R08: 0000000000000003 R09: 0000000000000000
[ 8236.338352] R10: 0000000000040000 R11: 0000000000000246 R12: 000000027ff80000
[ 8236.346317] R13: 0000000000040000 R14: 000055ace72ba990 R15: 000055ace7208530
[ 8236.354284]  </TASK>

Version-Release number of selected component (if applicable):
kernel-5.14.0-284.el9.x86_64
device-mapper-multipath-libs-0.8.7-20.el9.x86_64
device-mapper-multipath-0.8.7-20.el9.x86_64
systemd-udev-252-8.el9.x86_64

How reproducible:
once

Additional info:
I wasn't sure what component to open this against, please re-assign as needed

Comment 1 Martin Hoyer 2023-03-09 17:19:59 UTC
So I suspect it was caused by the iSCSI initiator trying to connect to a target which it had no network path to.

After running
`iscsiadm -m discovery -t st -p 172.16.0.20 -I qedi0`
172.16.1.20:3260,1 iqn.1994-05.com.redhat:stqe-lio
172.16.0.20:3260,1 iqn.1994-05.com.redhat:stqe-lio
`iscsiadm -m discovery -t st -p 172.16.1.20 -I qedi1`
172.16.1.20:3260,1 iqn.1994-05.com.redhat:stqe-lio
172.16.0.20:3260,1 iqn.1994-05.com.redhat:stqe-lio

target will provide two portals for each interface, however the 172.16.1 is not reachable by qedi0 and 172.16.0 is not reachable by qedi1.

So when user instead of specifying a portal and interface simply runs `iscsiadm -m node -l`, timeouts will arise and this call trace can be reproduced.

Comment 2 Ben Marzinski 2023-03-22 16:20:26 UTC
Reassigning to iscsi-initiator-utils based on comment #1. Assuming that the multipath device was set to queue_if_no_path, this is exactly what you'd expect if you had no paths to your device.  I'm not sure that there's actually a bug here. But if there is, and it further investigation points to multipath, feel free to reassign it back.

Comment 3 liunana 2023-06-27 11:07:26 UTC
Hi, I reported one similar bug on RHEL8.9 with high reproducibility: Bug 2217866 - [RHEL8.9] INFO: task systemd-udevd:2691 blocked for more than 120 seconds with over 256 vcpus, please check.


Note You need to log in before you can comment on or make changes to this bug.