Pulling a SCSI command off the request queue and submitting it to the HBA is subject to a rare race condition. The timeout timer is started first, and only then is the command sent to the HBA or requeued. If the timer fires before the command reaches the HBA driver, or before the requeue stops the timer, the SCSI error handler can take ownership of the command to handle the timeout while the task trying to submit or requeue the request still believes it owns the command. The command ends up active and submitted twice, which can corrupt the request queue lists or the softirq done lists and lead to accesses of freed request structs.

A customer has had several systems which appear to trigger this condition due to setting a very low timeout of 4 seconds. These were large-CPU-count systems with several lpfc HBAs generating heavy I/O and network traffic. The vmcore showed a system that crashed on a corrupt block softirq completion list, containing an already-freed request and loops in the list. The vmcore also showed a task interrupted in scsi_request_fn, in the "not_ready" path, after dropping the host_lock and before it could call blk_requeue_request. A series of interrupts and time spent in interrupt context came close to reproducing the issue on that system.

Version-Release number of selected component (if applicable): 2.6.32-504.el6

How reproducible: Occurs randomly on the customer's system. It is not the exact same trigger, but the same basic issue can be caused by turning up SCSI logging so that scsi_log_send() prints messages for every command to a slow serial console.

Steps to Reproduce:
1. On a multi-CPU, multi-HBA system, configure the kernel to log messages to a serial console.
2. Set the timeout for the SCSI disks very low, e.g. a value of 2.
3. Set scsi_logging_level to a value like 0x3d00 so that scsi_log_send() prints and the SCSI layer generates lots of messages.
The system will crash when a delay in scsi_log_send() to log messages holds up the submission of a command long enough for the race to occur.

Actual results: The system crashes when using a very short SCSI timeout.

Expected results: The system should continue to run even if an overly short SCSI timeout is selected.

Additional info: Secure-information concerns mean we had only limited viewing of the vmcore; we do not have free access to it or to most of its output.
This is not a regression in 6.8, so considering the schedule, and the fact that this problem is rare on systems running with the default configuration parameters, it should not block 6.8. Moving to 6.9. Nonetheless, this should be considered a high priority.
I have access to the vmcore referenced above and can assist as needed with any debugger commands someone may want. Am I reading this right: the SCSI commands submitted have not yet made it to the HBA in this case (i.e., after around 4 seconds?), and the issue is what happens when those commands get aborted while the HBA is attempting to process them? Is there anything else that could be adjusted to speed up the I/O getting submitted to the HBA, or is this just a basic issue when heavy I/O is being done? This delay may explain the I/O delays we get when the system is under stress (the SAN not responding correctly: bad cables and lost packets). It may also explain some other crashes I have looked at where we did not get vmcores; the I/O subsystem was under stress (SAN issues again) and the node panicked during the stress. That is also pretty rare, and does not involve the extra driver being loaded in the kernel. Good work finding it, and good luck fixing it.
If you have access to the dump, could you examine the current "jiffies" value as well as the "->start_time" of the request that caused the machine to crash? I am not sure how to locate that request given the bug description above, but another request might be locatable in the task that is in the "not_ready" path in scsi_request_fn() as mentioned. The "->start_time" of that request would also be interesting. I am looking through the RHEL6 code to see how we can protect against this race.
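For reference, something along these lines in a crash session should pull those values (the request address below is a placeholder; crash's struct command accepts a comma-separated member list, though the exact syntax may vary by crash version):

```
crash> p jiffies
crash> struct request.start_time,deadline,timeout <request-address>
```

Comparing the request's ->deadline against the current jiffies would show whether the timeout had already expired while the request was still being submitted or requeued.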
I added all of the previously requested debugging info, the jiffies, and the struct request for the bad structure. Not sure if it is what you want, but this may be enough for you to tell me the command to get what you want.

Oops trace:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
IP: [<ffffffff8127834f>] blk_done_softirq+0x7f/0xa0
PGD dfd338b067 PUD cd80ed6067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:0b:00.1/host6/rport-6:0-4/target6:0:2/6:0:2:83/state
CPU 50
Modules linked in: hangcheck_timer oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) krg_11_0_0_1130_impRHEL6K1smp-x86_64(P)(U) mptctl mptbase nfsd exportfs oracleasm(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 ext3 jbd dm_round_robin iTCO_wdt iTCO_vendor_support be2net ixgbe dca mdio e1000e ptp pps_core microcode ipmi_devintf serio_raw lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg power_meter acpi_ipmi ipmi_si ipmi_msghandler bnx2 shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod lpfc scsi_transport_fc scsi_tgt crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 205, comm: ksoftirqd/50 Tainted: P --------------- 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL980 G7
RIP: 0010:[<ffffffff8127834f>] [<ffffffff8127834f>] blk_done_softirq+0x7f/0xa0
RSP: 0018:ffff88a070c03f18 EFLAGS: 00010216
RAX: 0000000000000000 RBX: ffff88a070c03f18 RCX: ffff88adba447490
RDX: ffff88a070c03f18 RSI: ffff881fd299bbc0 RDI: ffff88a070c137c0
RBP: ffff88a070c03f38 R08: 0000000000000000 R09: 0000000000000000
R10: ffff881fd1a74400 R11: 00000000055b9b1f R12: ffffffff81a830a0
R13: 0000000000000020 R14: 0000000000000100 R15: 0000000000000004
FS: 0000000000000000(0000) GS:ffff88a070c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000b0 CR3: 000000cd920a0000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/50 (pid: 205, threadinfo ffff881fd3a42000, task ffff881fd3a3eaa0)
Stack:
 ffff88adba447490 ffff88ae8b7d14d0 0000000000000100 0000000000000029
<d> ffff88a070c03fa8 ffffffff8107d8b1 0000000000000032 ffff88a070c03f68
<d> 0000003200000005 ffff881fd3a43fd8 ffff881fd3a43fd8 0000000000011440
Call Trace:
 <IRQ>
 [<ffffffff8107d8b1>] __do_softirq+0xc1/0x1e0
 [<ffffffff8100c30c>] call_softirq+0x1c/0x30
 <EOI>
 [<ffffffff8100fc15>] ? do_softirq+0x65/0xa0
 [<ffffffff8107d470>] ksoftirqd+0x80/0x110
 [<ffffffff8107d3f0>] ? ksoftirqd+0x0/0x110
 [<ffffffff8109e66e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: e0 48 39 d8 74 34 66 0f 1f 44 00 00 48 8d 78 f0 48 8b 4f 10 48 8b 57 18 48 89 51 08 48 89 0a 48 89 47 10 48 89 47 18 48 8b 47 38 <ff> 90 b0 00 00 00 48 8b 45 e0 48 39 d8 75 d2 48 83 c4 18 5b c9
RIP [<ffffffff8127834f>] blk_done_softirq+0x7f/0xa0
 RSP <ffff88a070c03f18>
CR2: 00000000000000b0

Here is all the previous debugging that was asked for; the current jiffies are at the end. Not sure which one was the crash; I have included the struct request ffff88adb2c49080, which is the out-of-bounds one.

      KERNEL: vmlinux
    DUMPFILE: vmcore_charabldb1_010716 [PARTIAL DUMP]
        CPUS: 160
        DATE: Wed Jan 6 16:12:18 2016
      UPTIME: 1 days, 23:27:12
LOAD AVERAGE: 38.86, 14.07, 6.71
       TASKS: 4163
    NODENAME: charabldb1
     RELEASE: 2.6.32-504.1.3.el6.x86_64
     VERSION: #1 SMP Fri Oct 31 11:37:10 EDT 2014
     MACHINE: x86_64 (2393 Mhz)
      MEMORY: 1006 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
         PID: 205
     COMMAND: "ksoftirqd/50"
        TASK: ffff881fd3a3eaa0 [THREAD_INFO: ffff881fd3a42000]
         CPU: 50
       STATE: TASK_RUNNING (PANIC)

crash> where
No stack.
gdb: gdb request failed: where
crash> p
Usage: p [-x|-d][-u] [expression | symbol[:cpuspec]]
Enter "help p" for details.
crash> p ((struct scsi_cmnd *)0xffff888eaf5f3980)->device->host->hostt[0] $1 = { module = 0xffffffffa030f560, name = 0xffffffffa02f5754 "lpfc", detect = 0x0, release = 0x0, info = 0xffffffffa02c97a0, ioctl = 0x0, compat_ioctl = 0x0, queuecommand = 0xffffffffa02cbcc0, transfer_response = 0x0, eh_abort_handler = 0xffffffffa02c8080, eh_device_reset_handler = 0xffffffffa02c7e40, eh_target_reset_handler = 0xffffffffa02c7bf0, eh_bus_reset_handler = 0xffffffffa02c77f0, eh_host_reset_handler = 0xffffffffa02c6da0, slave_alloc = 0xffffffffa02c9310, slave_configure = 0xffffffffa02cdec0, slave_destroy = 0xffffffffa02c90c0, target_alloc = 0x0, target_destroy = 0x0, scan_finished = 0xffffffffa02b40d0, scan_start = 0x0, change_queue_depth = 0xffffffffa02cb000, change_queue_type = 0xffffffffa02c6740, bios_param = 0x0, proc_info = 0x0, eh_timed_out = 0x0, proc_name = 0x0, proc_dir = 0x0, can_queue = 0, this_id = -1, sg_tablesize = 64, max_sectors = 65535, dma_boundary = 0, cmd_per_lun = 3, present = 0 '\000', supported_mode = 0, unchecked_isa_dma = 0, use_clustering = 1, emulated = 0, skip_settle_delay = 0, ordered_tag = 0, lockless = 0, max_host_blocked = 0, shost_attrs = 0xffffffffa030d680, sdev_attrs = 0x0, legacy_hosts = { next = 0x0, prev = 0x0 }, vendor_id = 72057594037932255 } crash> list -r -H ffff88a070c03f18 ffff88ae8b7d14d0 ffff88dfce1f3b38 ffff88add004fcc0 ffff88aeb126b370 ffff88ae63fb1cc0 ffff88ae7bce1080 ffff88adb7ac91d0 ffff88add004f080 ffff883fd2aaccc0 ffff88ae7ef11e48 ffff88ae9114c330 ffff88cdd7542518 ffff88e9f984b828 ffff88ae148ba6a0 ffff88ae7bce16a0 ffff88ae66172390 ffff88adecabe370 ffff88adcad66518 ffff88adcad66390 ffff88ae8fc3b518 ffff88ae7bce1e48 ffff88adcad66208 ffff88bfd14d56a0 ffff88ae50d3e828 ffff88e999b25080 ffff88ad97f6c828 ffff88ad97f6c6a0 ffff88aeb13c8390 ffff88ae9138f6a0 ffff88ae907ee9b0 ffff88aeb13bcb70 ffff88ad97f6c080 ffff88ad97f6c208 ffff88add004f6a0 ffff88add004f9b0 ffff88aeb105c080 ffff88bfd1be1208 ffff88ae911aacc0 ffff88bfd196dcc0 
ffff88ae912f76a0 ffff88aeb13c8cc0 ffff88adac7ff828 ffff88ada7246208 ffff88ae68f7bcd0 ffff88adac7ff9b0 ffff88adac7ffcc0 ffff88ae21d21370 ffff88ae10a21210 ffff88ae90eefe48 ffff88eaa1817080 ffff88cda8c889b0 ffff88adb2d489b0 ffff88ae90eefcc0 ffff88ae906cae48 ffff88ae14ab7630 ffff88aeb2bcde48 ffff88ae25263828 ffff88ae21d21790 ffff88ae66172828 ffff88bfd14d59b0 ffff88bfd14d5cc0 ffff88cdd77ef390 ffff88adb2d48518 ffff88ae7ed65cc0 ffff882dd458bb38 ffff88ce5cabd518 ffff88ae250e16a0 ffff88ae250e1e48 ffff88dfce1f3cc0 ffff88bfd2b6fcc0 ffff88bfd2b6f390 ffff88adb6738e48 ffff88ad93fb41d0 ffff88e9e0911518 ffff88ae66328e48 ffff88ade36a06a0 ffff882d95c7a080 ffff88ad93fb4070 ffff88e9746d0e48 ffff88aeb102c210 ffff88ae8e307b38 ffff882db52f3e48 ffff88ae8e307828 ffff88adb65ae6a0 ffff88ad93fae210 ffff88bfd2b6f080 ffff88bfd2b6f828 ffff88aeb19d7e48 ffff88ceaa1159b0 ffff88adfd1359b0 ffff88ae7679c208 ffff88bfd0d38b38 ffff88bfd0d38828 ffff88cda179ab38 ffff88adfd172518 ffff88ae7679c828 ffff88ae906ca080 ffff88add0150b70 ffff88ae906ca9b0 ffff88ceaaccf828 ffff88ae9111c390 ffff88ae0c49da10 ffff88dfd08c16a0 ffff88adecabe8f0 ffff88adc3abd518 ffff88ae2537e828 ffff88adb7803a10 ffff88adc3abd390 ffff88ae911aa390 ffff88ae68f7bb70 ffff88e9a7483b38 ffff88adf3bd3cd0 ffff88bfd1b31518 ffff88aeb1bf7080 ffff88ad93faf518 ffff88adb65a8e48 ffff88aeb13c8208 ffff88bfd0d38518 ffff88aeb2ada390 ffff88adc3abdcc0 ffff88bfd0905b38 ffff88ae25263080 ffff88bfd0905080 ffff880e94186390 ffff88ae25263518 ffff88ade1586390 ffff88ae25263208 ffff88ae5753c210 ffff88adb2c49080 ffff88aeb12bd518 ffff880e9eef7208 ffff886d1630c390 ffff88ae923cf390 ffff88cda8c88518 ffff88ae31b04e48 ffff88adb67389b0 ffff88ae17e926a0 ffff88aeb19e9cc0 ffff88adc3abd208 ffff88adf3809080 ffff88ae911bc828 ffff88aeb19e9518 ffff88ae906ca6a0 ffff88adf3809518 ffff88aeb19e9080 ffff88aeaeae26a0 ffff884eaf304b38 ffff88bfd2d2be48 ffff88aea9c6a080 ffff88adf3809e48 ffff88cda8c88828 ffff88aeb1331390 ffff88addadc9e48 ffff882d75dec6a0 ffff880e9eef79b0 ffff88ade36a09b0 
ffff880cc28f99b0 ffff88ae7ef11390 ffff884c6dc6f9b0 ffff884c12dc3e48 ffff885fd08ddcc0 ffff88add406a6a0 ffff88bfd2067208 ffff88ae250e4cc0 ffff88ae250e4208 ffff88bfd0d92e48 ffff88ae8f98ae48 ffff886d0217de48 ffff886eab1ef080 ffff886ce6b22cc0 ffff880e9dc8d9b0 ffff880e8dad4390 ffff88ade791e518 ffff880db033db38 ffff88bfd20679b0 ffff88ae221679b0 ffff88ae66172208 ffff88aeaebf7080 ffff88ada720ae48 ffff88bfd29239b0 ffff88bfd2923cc0 ffff88ad9c712080 ffff88aeaebf7208 ffff88adf1563cc0 ffff88ade8ec7b38 ffff88ae7bcad390 ffff88cdd7543cc0 ffff884eaad80080 ffff88bfd1bc46a0 ffff884eaa444518 ffff88ae8ff2f390 ffff88ae21d24080 ffff88add01ee208 ffff88ae911aa518 ffff88adb67386a0 ffff88e99dc8e828 ffff88ad97f6ccc0 ffff88ae66172e48 ffff88ad97f6cb38 ffff884eaafb06a0 ffff88e9e09106a0 ffff88ae907eecc0 ffff88ae663289b0 ffff88ae907ee208 ffff88add007b9b0 ffff88ae265f4cc0 ffff884eac0f0cc0 ffff88adf752ccc0 ffff88ae84c5b9b0 ffff88aeb12bd208 ffff88bfd2923b38 ffff88ade8ec7080 ffff884dc4ec7828 ffff88adf752ce48 ffff88eab6f846a0 ffff884d3d465390 ffff88aeb1331828 ffff88aeb1331cc0 ffff88bfd1bc4390 ffff88e9635bc828 ffff88bfd114a6a0 ffff88adcb596080 ffff88aeb13c8518 ffff88bfd114ab38 ffff88e929f02518 ffff88ade8ec7518 ffff88ae910ef828 ffff88ae24062e48 ffff88adb79b1cc0 ffff88ae66172518 ffff88ada72476a0 ffff884bad02ab38 ffff88adb79b19b0 ffff884ddb078828 ffff88aeb12bd390 ffff88adf14d1080 ffff884ca3563390 ffff88adac7ff6a0 ffff88aeb1331208 ffff88ae661726a0 ffff88bfd2923e48 ffff884eabc9cb38 ffff88ade8e76828 ffff88aeb19e96a0 ffff88dfd27a1828 ffff884e3e8a5828 ffff88aeb12bd080 ffff88aeb12bde48 ffff88adf14d19b0 ffff88e929d91e48 ffff88ae250e1518 ffff88ad93fafb38 ffff885fcf29a208 ffff88e95d1fce48 ffff88ae90d95cc0 ffff88aeb1331518 ffff88bfd2e03208 ffff88ae250e1828 ffff88bfd0a40080 ffff88bfd1ff0828 ffff886cf997d6a0 ffff884eac132208 ffff88e9504af9b0 ffff88aeb13c8b38 ffff88adb2c49518 ffff88ade8e769b0 ffff88ae911bce48 ffff884e3eae3828 ffff88adfd134080 ffff88e9af9bc080 ffff887fd1a2e390 ffff88ae17db66a0 ffff884eafd48208 
ffff88eaa1221518 ffff88adac785390 ffff88ae616abcc0 ffff88ada7246080 ffff88bfd29236a0 ffff88cdf26bb9b0 ffff884eaa7d76a0 ffff884eaa895390 ffff88adac785828 ffff88ae04b65cc0 ffff88bfd1ff0080 ffff88ae1ec7de48 ffff88bfd235e518 ffff88bfd235e6a0 ffff88e9f30f9390 ffff88adb65a8390 ffff88adb65a8208 ffff880db1feb208 ffff88ceaa488518 ffff880e9dcfeb38 ffff88ae91381b38 ffff88cdc72e9390 ffff88bfd235ecc0 ffff88e9a9af6518 ffff88ceaa3789b0 ffff88aeb13c8e48 ffff88cd7c904518 ffff88bfd1bc4080 ffff88adfd1349b0 ffff88adb2d48208 ffff88ae08f06208 ffff884d4c98d208 ffff88ae91140080 ffff88cd7c9046a0 ffff886ea7a13828 ffff88ae90c65208 ffff88ae19a32390 ffff88aeaebf76a0 ffff88ae04b65828 ffff88ae08f06390 ffff884eaf34e6a0 ffff88ae04b65b38 ffff88bfd0a9a080 ffff885fd08dd208 ffff881fcac1de48 ffff88ae9138fcc0 ffff88bfd2067390 ffff88ae50d3e080 ffff88bfd2067828 ffff88bfd2067080 ffff88bfd2067e48 ffff88aeb105cb38 ffff88aeb105c6a0 ffff88ae0ef5eb38 ffff881fcbb20b38 ffff88aeb105c828 ffff88ae90668cc0 ffff88ae7655b6a0 ffff88adb2c49cc0 ffff88adbccc3080 ffff88adb2c49e48 ffff88ad93fafcc0 ffff88bfd0d92b38 ffff88aeb1360518 ffff886e753c6b38 ffff88ae90d95e48 ffff88adb2c49828 ffff88aeb1360390 ffff88ae90c65080 ffff88e9cfc4d9b0 ffff88fbd356e6a0 ffff88aeb13609b0 ffff886d0db7a390 ffff88ae91140b38 ffff88ad9c712390 ffff88e9b1b3d9b0 ffff886e3558f390 ffff88ae1d3d6cc0 ffff88e950606cc0 ffff88ad9c712cc0 ffff88ae7ef11cc0 ffff88bfd1b31080 ffff88bfd2067518 ffff88aeb1360828 ffff88bfd2067cc0 ffff884ea9c949b0 ffff88adbccc3828 ffff88bfd2ae5208 ffff88ae9040fe48 ffff88ae04b65390 ffff88dfd09846a0 ffff88ade8ec7e48 ffff88eaa1221080 ffff88ae04b65e48 ffff88ae9139f828 ffff880db1febb38 ffff88ae92350208 ffff88ae66185390 ffff88ae04b65080 ffff88bfd2ae5080 ffff88ae04b65518 ffff88adcb5946a0 ffff88ae9139fb38 ffff88ae04b65208 ffff88ae7ef11b38 ffff88e9b1bba390 ffff880db1feb6a0 ffff88adfd172080 ffff88adb6738b38 ffff88ae2abbfb38 ffff88ae19ac86a0 ffff88ae7bcadcc0 ffff88aea9c856a0 ffff88bfd1b31cc0 ffff88ae161799b0 ffff88aeaeb68e48 ffff88bfd2d92080 
ffff88bfd0d92390 ffff88bfd0d92208 ffff88add02ac208 ffff88ae16179cc0 ffff88ae911bccc0 ffff88ad9c712828 ffff88ae16179518 ffff88addbcd6828 ffff88addadc9208 ffff88ae8e307208 ffff88ae8e307080 ffff88ae9139f208 ffff88ae265f4b38 ffff88ade1586208 ffff88ae8e307390 ffff88adb2c49208 ffff88bfd0d389b0 ffff88cd8f5f56a0 ffff88ae1d3d6e48 ffff88ad93faf9b0 ffff88ae90c65390 ffff88adfd172828 ffff88ce47371cc0 ffff880cb9542390 ffff88ae7655b828 ffff884eaed1a390 ffff88ade791ee48 ffff88add02ace48 ffff88bfd0905390 ffff884eaa200e48 ffff88aea9c85828 ffff88add01ee6a0 ffff88ade791e9b0 ffff88ae90c65518 ffff88add02ac390 ffff88bfd0c80080 ffff88ae92350390 ffff886e7af02b38 ffff88ae220879b0 ffff88ae22087b38 ffff880d89d5c9b0 ffff880d5d7df208 ffff88adceeb6390 ffff880c82b4eb38 ffff88ad97f42cc0 ffff88adf14e9080 ffff88ae910ed518 ffff88adf3bc7518 ffff884ca14e7390 ffff88bfd0d38080 ffff88cdae95e208 ffff88bfd2923080 ffff880dbb94e6a0 ffff880c6d79eb38 ffff880dbbb52080 ffff88fbd6a23e48 ffff88ae17e92080 ffff884e1ae86390 ffff88cd8f5f5518 ffff884eb06e59b0 ffff88e9504b6208 ffff88eab511ae48 ffff88fbd5324208 ffff88adc15ce828 ffff88ae221cbb38 ffff88adb390b208 ffff880d5b42ce48 ffff88bfd0d92cc0 ffff88dfd1b1f390 ffff88adb390b9b0 ffff88ae90f66080 ffff88aea9c85cc0 ffff88a070c117e0 ffff88ae7ed9f518 ffff88aeaeae2080 ffff88ae22486cc0 ffff88aeb24e1518 ffff886eaac6b828 ffff88addad83390 ffff886eaa826b38 ffff88bfd2ae5b38 ffff88ce0c985828 ffff88adcae27b38 ffff88ae8e323390 ffff88adf1563080 ffff88aeb2adacc0 ffff88ae90eef828 ffff88e9ebe57390 ffff88adfd0d6b38 ffff88ae9111c6a0 ffff88add403a390 ffff88e975de4518 ffff886d2ccf2b38 ffff88e9b1b3d828 ffff88ae9111c208 ffff88cdc7005080 ffff88add02ac9b0 ffff88e9f4de2828 ffff88adfd0d6390 ffff88e9de77a9b0 ffff88addadc9390 ffff88ae9111ce48 ffff886ea759c390 ffff88ae22167518 ffff88ceaa115208 ffff88adf3bc7390 ffff88adf3bc7080 ffff88ae907ee6a0 ffff88eab56d8b38 ffff884b6d2e8080 ffff882db73d7390 ffff88fbd3589080 ffff886e4eeea208 ffff88ad97db7208 ffff88adb2d486a0 ffff88ad97db7390 ffff88e99bccb518 
ffff88ae250e1208 ffff886ea6dc8828 ffff88adb2c49b38 ffff88adfd0d6208 ffff88ae907ee828 ffff88bfd1be1b38 ffff885fd0b82208 ffff88bfd0d38208 ffff88bfd0d38cc0 ffff886d35178cc0 ffff88ae148ba828 ffff88aeb2adae48 ffff88cdd7776b38 ffff88ae0edf8b38 ffff88cdeeaa2cc0 ffff88aeb13c89b0 ffff88ae21d24828 ffff88ae9131a9b0 ffff88ea1e277b38 ffff88ae905906a0 ffff88aeb2bcd828 ffff88cd9a917828 ffff88aeaea71b38 ffff88aeaea71cc0 ffff88ae911aab38 ffff884e8e773518 ffff88aeaea71828 ffff88bfd0a409b0 ffff88ae911aa6a0 ffff88e9504b6518 ffff88aeb1bd0518 ffff88fbd6a23b38 ffff884d7a193e48 ffff88ae265f4828 ffff88aeaeae2208 ffff88ae8ff2f080 ffff88cdd77ef828 ffff88ae8ff2fcc0 ffff88ae66172cc0 ffff88aeaea71e48 ffff88ae2537e6a0 ffff88bfd2b6f9b0 ffff88ae923cfcc0 ffff880d5b51bb38 ffff880e4d5da9b0 ffff88bfd0a40518 ffff88aeb12ad390 ffff88ae910f7e48 ffff885fd127b390 ffff88ad9c675390 ffff88eab66ff6a0 ffff88eab5714518 ffff88ae905909b0 ffff88adb7acc208 ffff88ae66172b38 ffff884eaca9e080 ffff886ce6b229b0 ffff88adb7acc080 ffff88ae7ef11208 ffff88ae22486080 ffff88ae90f66b38 ffff88ae7ed9fcc0 ffff88ae2536ab38 ffff88ae2537e080 ffff88adf1563518 ffff88ae7ed9f390 ffff88bfd0d38390 ffff88adbcfdc208 ffff88ada720ab38 ffff88ae616ab9b0 ffff88adb7accb38 ffff88adb390b6a0 ffff882dc623b390 ffff880db1feb390 ffff88ae25263b38 ffff88adb65ae9b0 ffff88adb79b1b38 ffff884ead8aa208 ffff880c6d79e6a0 ffff88aeb12adcc0 ffff88aeb1bd0b38 ffff88bfd0c80208 ffff88ae209bb080 ffff88addadc9b38 ffff88ae7bcad9b0 ffff88ae51ed0828 ffff88ae8fc3bcc0 ffff882e75511b38 ffff882d5df6e9b0 ffff882dd826f208 ffff882dd8a6f9b0 ffff882d75cb8080 ffff882d9f8636a0 ffff882ea8ca7390 ffff883fd0b3ab38 ffff88adb7acc518 ffff88ae22167b38 ffff882db924e6a0 ffff882db73d6b38 ffff882e7548eb38 ffff88ae9131a208 ffff883fd0ae3518 ffff882ea948b6a0 ffff88ae7ed65080 ffff88ae22486208 ffff88ae8f98a390 ffff88ae21d24518 ffff882dd7da1390 ffff88aeb19e9828 ffff88ada70f89b0 ffff88bfd114acc0 ffff88adf1563390 ffff88ae7ed65518 ffff88ae8f98ab38 ffff886cf9a4e6a0 ffff88ae224869b0 ffff88adfd134390 
ffff88addad83cc0 ffff88ae7bcad080 ffff88adfd0d6518 ffff88bfd2a5f518 ffff88ae8f98a828 ffff88bfd2e03390 ffff88ae907ee080 ffff88addad836a0 ffff88ae8f98acc0 ffff88ae21d24390 ffff88ae51ed06a0 ffff88ae21d24e48 ffff88adf3809b38 ffff88ae8f98a518 ffff88aea9c85518 ffff88aeb1bf7828 ffff88aea9c85208 ffff88aeb1bf7390 ffff88ae21d24b38 ffff88ae072c1518 ffff886d054779b0 ffff88aeb19e9e48 ffff88adb390bb38 ffff88adb390b080 ffff88aeaeb689b0 ffff88aeaeb68cc0 ffff88aeb1bf79b0 ffff88ae7bcad518 ffff88ae221cb080 ffff88bfd0a40e48 ffff88bfd0a40828 ffff88adf38096a0 ffff88aeb1bf7518 ffff88aeb1bf7208 ffff88aeaeb686a0 ffff88ae072c1e48 ffff88ae9139fe48 ffff88bfd0a406a0 ffff88ae2536a080 ffff88ae2536a208 ffff88aea9c85b38 ffff88bfd0a40390 ffff88ae221cb828 ffff88addbcd66a0 ffff88aeb12bd9b0 ffff88ae910ed080 ffff88ae252636a0 ffff88ae252639b0 ffff884eac548080 ffff88ae2536a390 ffff88ae0edf8390 ffff88ae58504390 ffff88bfd0a40208 ffff88ae58504208 ffff88adb2c49390 ffff88aea9c859b0 ffff88ae7ed9f828 ffff88adcee819b0 ffff88add004f208 ffff88aeb12e6e48 ffff88ada720a6a0 ffff88aeb12e6b38 ffff88adcd691cc0 ffff88adcd691828 ffff88adb2d48cc0 ffff88bfd2d2bb38 ffff88adb2d48390 ffff88ae910ed6a0 ffff88bfd2d2b828 ffff88bfd2d2b208 ffff88adcd691b38 ffff88ae25263cc0 ffff88ada720a9b0 ffff88aeaeae29b0 ffff88ae910ed9b0 ffff88ae616ab6a0 ffff88ae616ab208 ffff88aeb12e6208 ffff88ae910edb38 ffff88ae25263e48 ffff88adb67a3390 ffff88adbcfdc080 ffff88adbcfdc518 ffff88adbcfdcb38 ffff88adbcfdc9b0 ffff88adbcfdc828 ffff88adbcfdccc0 ffff88adb7acc9b0 ffff88aeb12ad9b0 ffff88adb67a39b0 ffff88aeb12ad828 ffff88aeb12adb38 ffff88bfd0c80518 ffff88bfd2923208 ffff88ae7ef11828 ffff88bfd0c80390 ffff88aeb12ad080 ffff88adb2c496a0 ffff88ae7ed9f6a0 ffff88ad97f6ce48 ffff88ad97f6c390 ffff88aeb12bdb38 ffff88ae7ef119b0 ffff88aeb12bd828 ffff88bfd0c80b38 ffff88bfd2923518 ffff88bfd2b6f6a0 ffff88cdd77efe48 ffff88bfd2d2b390 ffff88bfd2d2b9b0 ffff88adf14e9390 ffff88adbccc3390 ffff88aeb2ada9b0 ffff88ae91296390 ffff88ae905046a0 ffff88ae9138f828 ffff88ae9138f9b0 
ffff88aeb2ada828 ffff88aeb2ada518 ffff88adb67a3828 ffff88ad97f6c518 ffff88bfd0c80cc0 ffff88bfd14d5518 ffff88bfd0c806a0 ffff88ade791e208 ffff88bfd2923828 ffff88aeaeae2b38 ffff88ae2537e9b0 ffff88addad83828 ffff88adb2d48b38 ffff88ae90668518 ffff88ae91296828 ffff88addd186208 ffff88ae911bcb38 ffff88adb6739518 ffff88ae91296e48 ffff88adfd135390 ffff88bfd235e208 ffff88bfd235eb38 ffff88ae90504208 ffff88ae91296cc0 ffff88bfd2923390 ffff88ae911aa828 ffff88ae2536a9b0 ffff88ae90590b38 ffff88ae911aae48 ffff88aeb12ad518 ffff88bfd14d5208 ffff88adfd134828 ffff88add007bcc0 ffff88bfd1ab1828 ffff88bfd2e036a0 ffff88adfd134cc0 ffff88addbcd6cc0 ffff88aeb1e53e48 ffff88ade8e766a0 ffff88bfd0d929b0 ffff88adf1563e48 ffff88ade36a0390 ffff88adb7acce48 ffff88add02ac080 ffff88bfd114a9b0 ffff88adfd134e48 ffff88adf14e9e48 ffff88addd186e48 ffff88ae0ece3208 ffff88aeb24e1cc0 ffff88ae90c98080 ffff88ada70f8518 ffff88adcb594208 ffff88ae911bc208 ffff88ae265f4208 ffff88ae7ed9f208 ffff88adb390b518 ffff88ae90f666a0 ffff88ada70f8390 ffff88aeaea71080 ffff88adceeb6828 ffff88aeb19e99b0 ffff88ae7bcadb38 ffff88ae911406a0 ffff88ae2536acc0 ffff88ae19a329b0 ffff88adf1656cc0 ffff88adb7acc6a0 ffff88ae90590518 ffff88ae51ed0e48 ffff88ae7ef11518 ffff88ae1ec7d6a0 ffff88ae90504518 ffff88ad9c712b38 ffff88ae2536ae48 ffff88aeaebf79b0 ffff88adb65a8b38 ffff88ae2d0f0828 ffff88ae911bc9b0 ffff88aea9c85e48 ffff88aeaea71390 ffff88ae90668208 ffff88adf14e9cc0 ffff88adf752c208 ffff88aeaeb17cc0 ffff88adb67a3cc0 ffff88cda13109b0 ffff88adcad66828 ffff88ae7ed659b0 ffff88aeaea71518 ffff88adf14e9b38 ffff88adcb594e48 ffff88aeae8d0390 ffff88bfd1be1080 ffff886e744a1e48 ffff88ae2536a828 ffff88ae2536a6a0 ffff88adcae27208 ffff88ae923cfb38 ffff88addd1869b0 ffff88aea9e22828 ffff88adcb594828 ffff88ae8fc3bb38 ffff88ae91296080 ffff88aeb24e19b0 ffff88aea9c85080 ffff88adbccb9828 ffff88ae905049b0 ffff88ae585049b0 ffff88adb67a3b38 ffff88adb67a3e48 ffff88ae29d81080 ffff88adcee81b38 ffff88ae912966a0 ffff88ae2d0f0e48 ffff88ae923cf6a0 ffff88ae51ed0208 
ffff88adcad669b0 ffff88bfd19a9cc0 ffff88adf14e9208 ffff88ae072c1cc0 ffff88ae7bce1b38 ffff88ae911aa208 ffff88ae22087828 ffff88ae8e3076a0 ffff88bfd196d828 ffff88ae7655b9b0 ffff88ae9040f518 ffff88ada70f8b38 ffff88ae7679c390 ffff88ade8e76cc0 ffff88bfd196d390 ffff88ae17db6208 ffff88ae9040f390 ffff88ae616ab828 ffff88ae19ac8b38 ffff88ae84c5b080 ffff88ae8b457080 ffff88ae8b457e48 ffff88ae8b457cc0 ffff88adb79b1518 ffff88aeb24e16a0 ffff88aeb1bf7b38 ffff88bfd2a5f080 ffff88bfd2a5fcc0 ffff88bfd2a5f9b0 ffff88bfd2a5f390 ffff88bfd2a5fe48 ffff88cdd77ef9b0 ffff88bfd2a5f828 ffff88ae250e4518 ffff88ae250e4b38 ffff88ae923cf9b0 ffff880cb52c3518 ffff88ae911bc6a0 ffff880dbb86fcc0 ffff88e925bd7828 ffff88adb2c499b0 ffff886d12663e48 ffff88bfd2b6fb38 ffff88eaa0fdd208 ffff88ae923cf518 ffff88ae923cf208 ffff88dfd08c1080 ffff884bfe6f9518 ffff88adb2c49080 list: duplicate list entry: ffff88adb2c49080 crash> kmem ffff88aeb1331b38 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff88fbd6740380 dm_rq_target_io 392 47375 57320 5732 4k SLAB MEMORY TOTAL ALLOCATED FREE ffff88aeb1331000 ffff88aeb1331058 10 6 4 FREE / [ALLOCATED] [ffff88aeb1331b10] PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea02636c32b8 aeb1331000 0 ffff88ae34acf3c0 1 2c0000000000080 slab crash> crash> kmem ffff88adb2c49080 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff88fbd6740380 dm_rq_target_io 392 47375 57320 5732 4k SLAB MEMORY TOTAL ALLOCATED FREE ffff88adb2c49000 ffff88adb2c49058 10 10 0 FREE / [ALLOCATED] [ffff88adb2c49058] PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea025ff1aff8 adb2c49000 0 ffff88addafbd100 1 2c0000000000080 slab crash> kmem ffff884bfe6f9518 CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff88fbd6740380 dm_rq_target_io 392 47375 57320 5732 4k SLAB MEMORY TOTAL ALLOCATED FREE ffff884bfe6f9000 ffff884bfe6f9058 10 4 6 FREE / [ALLOCATED] [ffff884bfe6f94f0] PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea0109fa8678 4bfe6f9000 0 ffff885fd2c63380 1 140000000000080 slab crash> kmem ffff88aeb12bd518 CACHE NAME OBJSIZE 
ALLOCATED TOTAL SLABS SSIZE ffff88fbd6740380 dm_rq_target_io 392 47375 57320 5732 4k SLAB MEMORY TOTAL ALLOCATED FREE ffff88aeb12bd000 ffff88aeb12bd058 10 10 0 FREE / [ALLOCATED] [ffff88aeb12bd4f0] PAGE PHYSICAL MAPPING INDEX CNT FLAGS ffffea02636c1958 aeb12bd000 0 ffff88ae90ce8d80 1 2c0000000000080 slab crash> struct request ffff88aeb1331b28 struct request { queuelist = { next = 0xffff881fcd733088, prev = 0xffff884eac5b3b28 }, csd = { list = { next = 0xffff88adb2c49080, prev = 0xffff88cd7cb87828 }, func = 0xffffffff81278370 <trigger_softirq>, info = 0xffff88aeb1331b28, flags = 0, priv = 0 }, q = 0xffff881fcd732ce8, cmd_flags = 16784965, cmd_type = REQ_TYPE_FS, atomic_flags = 1, cpu = 130, __data_len = 8192, __sector = 803116671, bio = 0xffff88ae209fabc0, biotail = 0xffff88ae209fabc0, hash = { next = 0x0, pprev = 0x0 }, { rb_node = { rb_parent_color = 18446612882611444648, rb_right = 0x0, rb_left = 0x0 }, completion_data = 0xffff88aeb1331ba8 }, { elevator_private = {0x0, 0x0, 0x0}, flush = { seq = 0, list = { next = 0x0, prev = 0x0 } } }, rq_disk = 0xffff883fd1dff000, start_time = 4465498210, start_time_ns = 170936376757782, io_start_time_ns = 170936624155168, nr_phys_segments = 1, ioprio = 0, ref_count = 1, special = 0xffff88ceaa2399c0, buffer = 0x0, tag = 0, errors = 0, __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", cmd = 0xffff88bfd18e6710 "*", cmd_len = 16, extra_len = 0, sense_len = 0, resid_len = 8192, sense = 0x0, deadline = 4465502457, timeout_list = { next = 0xffff884eac5b3c50, prev = 0xffff881fcd7330f8 }, timeout = 4000, retries = 0, end_io = 0xffffffffa0002a00, end_io_data = 0xffff88aeb1331b10, next_rq = 0x0, pad = 0x0 } crash> struct request ffff88adb2c49070 struct request { queuelist = { next = 0xffff88adb67389a0, prev = 0xffff88adb2c499a0 }, csd = { list = { next = 0xffff884bfe6f9518, prev = 0xffff88aeb12bd518 }, func = 0xffffffff81278370 <trigger_softirq>, info = 0xffff88adb2c49070, flags = 1, priv = 0 }, q = 
0xffff881fcd641328, cmd_flags = 16784965, cmd_type = REQ_TYPE_FS, atomic_flags = 1, cpu = 135, __data_len = 8192, __sector = 721182495, bio = 0xffff88adf39925c0, biotail = 0xffff88adf39925c0, hash = { next = 0x0, pprev = 0x0 }, { rb_node = { rb_parent_color = 18446612878342787312, rb_right = 0x0, rb_left = 0x0 }, completion_data = 0xffff88adb2c490f0 }, { elevator_private = {0x0, 0x0, 0x0}, flush = { seq = 0, list = { next = 0x0, prev = 0x0 } } }, rq_disk = 0xffff883fd1dd6000, start_time = 4465499326, start_time_ns = 170937493383578, io_start_time_ns = 170938157020987, nr_phys_segments = 1, ioprio = 0, ref_count = 1, special = 0xffff888eaf5f3980, buffer = 0x0, tag = 11, errors = 0, __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", cmd = 0xffff88ae63c08410 "*", cmd_len = 16, extra_len = 0, sense_len = 0, resid_len = 8192, sense = 0x0, deadline = 4465503989, timeout_list = { next = 0xffff88adb2c49ac8, prev = 0xffff88adb6738ac8 }, timeout = 4000, retries = 0, end_io = 0xffffffffa0002a00, end_io_data = 0xffff88adb2c49058, next_rq = 0x0, pad = 0x0 } crash> struct request ffff884bfe6f9508 struct request { queuelist = { next = 0xffff88e9ee3aa690, prev = 0xffff88eaae2b0b28 }, csd = { list = { next = 0xffff88dfd08c1080, prev = 0xffff88adb2c49080 }, func = 0xffffffff81278370 <trigger_softirq>, info = 0xffff884bfe6f9508, flags = 1, priv = 0 }, q = 0xffff881fcd924ea8, cmd_flags = 16784965, cmd_type = REQ_TYPE_FS, atomic_flags = 1, cpu = 135, __data_len = 8192, __sector = 88864255, bio = 0xffff884d934b0780, biotail = 0xffff884d934b0780, hash = { next = 0x0, pprev = 0x0 }, { rb_node = { rb_parent_color = 18446612458705491336, rb_right = 0x0, rb_left = 0x0 }, completion_data = 0xffff884bfe6f9588 }, { elevator_private = {0x0, 0x0, 0x0}, flush = { seq = 0, list = { next = 0x0, prev = 0x0 } } }, rq_disk = 0xffff883fd34ba800, start_time = 4465499330, start_time_ns = 170937497327657, io_start_time_ns = 170938154952269, nr_phys_segments = 1, ioprio = 0, ref_count = 
1, special = 0xffff885fd220b680, buffer = 0x0, tag = 0, errors = 0, __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", cmd = 0xffff88adb4da2710 "*", cmd_len = 16, extra_len = 0, sense_len = 0, resid_len = 8192, sense = 0x0, deadline = 4465503987, timeout_list = { next = 0xffff88eaae2b0c50, prev = 0xffff88e9ee3aa7b8 }, timeout = 4000, retries = 0, end_io = 0xffffffffa0002a00, end_io_data = 0xffff884bfe6f94f0, next_rq = 0x0, pad = 0x0 } crash> struct request ffff88aeb12bd508 struct request { queuelist = { next = 0xffff88aeb12bd1f8, prev = 0xffff88e9ee3aa9a0 }, csd = { list = { next = 0xffff88adb2c49080, prev = 0xffff880e9eef7208 }, func = 0xffffffff81278370 <trigger_softirq>, info = 0xffff88aeb12bd508, flags = 1, priv = 0 }, q = 0xffff881fcdbb2238, cmd_flags = 16784965, cmd_type = REQ_TYPE_FS, atomic_flags = 1, cpu = 135, __data_len = 8192, __sector = 90617247, bio = 0xffff88aeb1109600, biotail = 0xffff88aeb1109600, hash = { next = 0x0, pprev = 0x0 }, { rb_node = { rb_parent_color = 18446612882610967944, rb_right = 0x0, rb_left = 0x0 }, completion_data = 0xffff88aeb12bd588 }, { elevator_private = {0x0, 0x0, 0x0}, flush = { seq = 0, list = { next = 0x0, prev = 0x0 } } }, rq_disk = 0xffff883fd1d7dc00, start_time = 4465499327, start_time_ns = 170937494653426, io_start_time_ns = 170938156582742, nr_phys_segments = 1, ioprio = 0, ref_count = 1, special = 0xffff88e9ac078380, buffer = 0x0, tag = 10, errors = 0, __cmd = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", cmd = 0xffff88ae76713c90 "*", cmd_len = 16, extra_len = 0, sense_len = 0, resid_len = 8192, sense = 0x0, deadline = 4465503989, timeout_list = { next = 0xffff88e9ee3aaac8, prev = 0xffff88aeb12bd320 }, timeout = 4000, retries = 0, end_io = 0xffffffffa0002a00, end_io_data = 0xffff88aeb12bd4f0, next_rq = 0x0, pad = 0x0 } crash> list ffff88a070c03f18 ffff88a070c03f18 ffff88adba447490 ffff88aeb13bc1d0 ffff88cdc50f8b38 ffff88ce15a05518 ffff88aea9e7f790 ffff88ce94064208 
ffff88ad93fae630 ffff884ead10d518 ffff88ae90eef208 ffff88bfd1b319b0 ffff88ae90eef080 ffff88adf15a9d10 ffff884d7a317080 ffff88e9bf547390 ffff88e9bf547080 ffff882db3077cc0 ffff88aeb105c518 ffff88dfce1f3518 ffff88ae63c08750 ffff88adac7ffe48 ffff88adb79b1208 ffff88ae90f66e48 ffff88ae2d0f0080 ffff88cd7cb87828 ffff88aeb1331b38 ffff88adb2c49080 ffff884bfe6f9518 ffff88dfd08c1080 ffff88ae923cf208 ffff88ae923cf518 ffff88eaa0fdd208 ffff88bfd2b6fb38 ffff886d12663e48 ffff88adb2c499b0 ffff88e925bd7828 ffff880dbb86fcc0 ffff88ae911bc6a0 ffff880cb52c3518 ffff88ae923cf9b0 ffff88ae250e4b38 ffff88ae250e4518 ffff88bfd2a5f828 ffff88cdd77ef9b0 ffff88bfd2a5fe48 ffff88bfd2a5f390 ffff88bfd2a5f9b0 ffff88bfd2a5fcc0 ffff88bfd2a5f080 ffff88aeb1bf7b38 ffff88aeb24e16a0 ffff88adb79b1518 ffff88ae8b457cc0 ffff88ae8b457e48 ffff88ae8b457080 ffff88ae84c5b080 ffff88ae19ac8b38 ffff88ae616ab828 ffff88ae9040f390 ffff88ae17db6208 ffff88bfd196d390 ffff88ade8e76cc0 ffff88ae7679c390 ffff88ada70f8b38 ffff88ae9040f518 ffff88ae7655b9b0 ffff88bfd196d828 ffff88ae8e3076a0 ffff88ae22087828 ffff88ae911aa208 ffff88ae7bce1b38 ffff88ae072c1cc0 ffff88adf14e9208 ffff88bfd19a9cc0 ffff88adcad669b0 ffff88ae51ed0208 ffff88ae923cf6a0 ffff88ae2d0f0e48 ffff88ae912966a0 ffff88adcee81b38 ffff88ae29d81080 ffff88adb67a3e48 ffff88adb67a3b38 ffff88ae585049b0 ffff88ae905049b0 ffff88adbccb9828 ffff88aea9c85080 ffff88aeb24e19b0 ffff88ae91296080 ffff88ae8fc3bb38 ffff88adcb594828 ffff88aea9e22828 ffff88addd1869b0 ffff88ae923cfb38 ffff88adcae27208 ffff88ae2536a6a0 ffff88ae2536a828 ffff886e744a1e48 ffff88bfd1be1080 ffff88aeae8d0390 ffff88adcb594e48 ffff88adf14e9b38 ffff88aeaea71518 ffff88ae7ed659b0 ffff88adcad66828 ffff88cda13109b0 ffff88adb67a3cc0 ffff88aeaeb17cc0 ffff88adf752c208 ffff88adf14e9cc0 ffff88ae90668208 ffff88aeaea71390 ffff88aea9c85e48 ffff88ae911bc9b0 ffff88ae2d0f0828 ffff88adb65a8b38 ffff88aeaebf79b0 ffff88ae2536ae48 ffff88ad9c712b38 ffff88ae90504518 ffff88ae1ec7d6a0 ffff88ae7ef11518 ffff88ae51ed0e48 ffff88ae90590518 
ffff88adb7acc6a0 ffff88adf1656cc0 ffff88ae19a329b0 ffff88ae2536acc0 ffff88ae911406a0 ffff88ae7bcadb38 ffff88aeb19e99b0 ffff88adceeb6828 ffff88aeaea71080 ffff88ada70f8390 ffff88ae90f666a0 ffff88adb390b518 ffff88ae7ed9f208 ffff88ae265f4208 ffff88ae911bc208 ffff88adcb594208 ffff88ada70f8518 ffff88ae90c98080 ffff88aeb24e1cc0 ffff88ae0ece3208 ffff88addd186e48 ffff88adf14e9e48 ffff88adfd134e48 ffff88bfd114a9b0 ffff88add02ac080 ffff88adb7acce48 ffff88ade36a0390 ffff88adf1563e48 ffff88bfd0d929b0 ffff88ade8e766a0 ffff88aeb1e53e48 ffff88addbcd6cc0 ffff88adfd134cc0 ffff88bfd2e036a0 ffff88bfd1ab1828 ffff88add007bcc0 ffff88adfd134828 ffff88bfd14d5208 ffff88aeb12ad518 ffff88ae911aae48 ffff88ae90590b38 ffff88ae2536a9b0 ffff88ae911aa828 ffff88bfd2923390 ffff88ae91296cc0 ffff88ae90504208 ffff88bfd235eb38 ffff88bfd235e208 ffff88adfd135390 ffff88ae91296e48 ffff88adb6739518 ffff88ae911bcb38 ffff88addd186208 ffff88ae91296828 ffff88ae90668518 ffff88adb2d48b38 ffff88addad83828 ffff88ae2537e9b0 ffff88aeaeae2b38 ffff88bfd2923828 ffff88ade791e208 ffff88bfd0c806a0 ffff88bfd14d5518 ffff88bfd0c80cc0 ffff88ad97f6c518 ffff88adb67a3828 ffff88aeb2ada518 ffff88aeb2ada828 ffff88ae9138f9b0 ffff88ae9138f828 ffff88ae905046a0 ffff88ae91296390 ffff88aeb2ada9b0 ffff88adbccc3390 ffff88adf14e9390 ffff88bfd2d2b9b0 ffff88bfd2d2b390 ffff88cdd77efe48 ffff88bfd2b6f6a0 ffff88bfd2923518 ffff88bfd0c80b38 ffff88aeb12bd828 ffff88ae7ef119b0 ffff88aeb12bdb38 ffff88ad97f6c390 ffff88ad97f6ce48 ffff88ae7ed9f6a0 ffff88adb2c496a0 ffff88aeb12ad080 ffff88bfd0c80390 ffff88ae7ef11828 ffff88bfd2923208 ffff88bfd0c80518 ffff88aeb12adb38 ffff88aeb12ad828 ffff88adb67a39b0 ffff88aeb12ad9b0 ffff88adb7acc9b0 ffff88adbcfdccc0 ffff88adbcfdc828 ffff88adbcfdc9b0 ffff88adbcfdcb38 ffff88adbcfdc518 ffff88adbcfdc080 ffff88adb67a3390 ffff88ae25263e48 ffff88ae910edb38 ffff88aeb12e6208 ffff88ae616ab208 ffff88ae616ab6a0 ffff88ae910ed9b0 ffff88aeaeae29b0 ffff88ada720a9b0 ffff88ae25263cc0 ffff88adcd691b38 ffff88bfd2d2b208 ffff88bfd2d2b828 
ffff88ae910ed6a0 ffff88adb2d48390 ffff88bfd2d2bb38 ffff88adb2d48cc0 ffff88adcd691828 ffff88adcd691cc0 ffff88aeb12e6b38 ffff88ada720a6a0 ffff88aeb12e6e48 ffff88add004f208 ffff88adcee819b0 ffff88ae7ed9f828 ffff88aea9c859b0 ffff88adb2c49390 ffff88ae58504208 ffff88bfd0a40208 ffff88ae58504390 ffff88ae0edf8390 ffff88ae2536a390 ffff884eac548080 ffff88ae252639b0 ffff88ae252636a0 ffff88ae910ed080 ffff88aeb12bd9b0 ffff88addbcd66a0 ffff88ae221cb828 ffff88bfd0a40390 ffff88aea9c85b38 ffff88ae2536a208 ffff88ae2536a080 ffff88bfd0a406a0 ffff88ae9139fe48 ffff88ae072c1e48 ffff88aeaeb686a0 ffff88aeb1bf7208 ffff88aeb1bf7518 ffff88adf38096a0 ffff88bfd0a40828 ffff88bfd0a40e48 ffff88ae221cb080 ffff88ae7bcad518 ffff88aeb1bf79b0 ffff88aeaeb68cc0 ffff88aeaeb689b0 ffff88adb390b080 ffff88adb390bb38 ffff88aeb19e9e48 ffff886d054779b0 ffff88ae072c1518 ffff88ae21d24b38 ffff88aeb1bf7390 ffff88aea9c85208 ffff88aeb1bf7828 ffff88aea9c85518 ffff88ae8f98a518 ffff88adf3809b38 ffff88ae21d24e48 ffff88ae51ed06a0 ffff88ae21d24390 ffff88ae8f98acc0 ffff88addad836a0 ffff88ae907ee080 ffff88bfd2e03390 ffff88ae8f98a828 ffff88bfd2a5f518 ffff88adfd0d6518 ffff88ae7bcad080 ffff88addad83cc0 ffff88adfd134390 ffff88ae224869b0 ffff886cf9a4e6a0 ffff88ae8f98ab38 ffff88ae7ed65518 ffff88adf1563390 ffff88bfd114acc0 ffff88ada70f89b0 ffff88aeb19e9828 ffff882dd7da1390 ffff88ae21d24518 ffff88ae8f98a390 ffff88ae22486208 ffff88ae7ed65080 ffff882ea948b6a0 ffff883fd0ae3518 ffff88ae9131a208 ffff882e7548eb38 ffff882db73d6b38 ffff882db924e6a0 ffff88ae22167b38 ffff88adb7acc518 ffff883fd0b3ab38 ffff882ea8ca7390 ffff882d9f8636a0 ffff882d75cb8080 ffff882dd8a6f9b0 ffff882dd826f208 ffff882d5df6e9b0 ffff882e75511b38 ffff88ae8fc3bcc0 ffff88ae51ed0828 ffff88ae7bcad9b0 ffff88addadc9b38 ffff88ae209bb080 ffff88bfd0c80208 ffff88aeb1bd0b38 ffff88aeb12adcc0 ffff880c6d79e6a0 ffff884ead8aa208 ffff88adb79b1b38 ffff88adb65ae9b0 ffff88ae25263b38 ffff880db1feb390 ffff882dc623b390 ffff88adb390b6a0 ffff88adb7accb38 ffff88ae616ab9b0 ffff88ada720ab38 
ffff88adbcfdc208 ffff88bfd0d38390 ffff88ae7ed9f390 ffff88adf1563518 ffff88ae2537e080 ffff88ae2536ab38 ffff88ae7ed9fcc0 ffff88ae90f66b38 ffff88ae22486080 ffff88ae7ef11208 ffff88adb7acc080 ffff886ce6b229b0 ffff884eaca9e080 ffff88ae66172b38 ffff88adb7acc208 ffff88ae905909b0 ffff88eab5714518 ffff88eab66ff6a0 ffff88ad9c675390 ffff885fd127b390 ffff88ae910f7e48 ffff88aeb12ad390 ffff88bfd0a40518 ffff880e4d5da9b0 ffff880d5b51bb38 ffff88ae923cfcc0 ffff88bfd2b6f9b0 ffff88ae2537e6a0 ffff88aeaea71e48 ffff88ae66172cc0 ffff88ae8ff2fcc0 ffff88cdd77ef828 ffff88ae8ff2f080 ffff88aeaeae2208 ffff88ae265f4828 ffff884d7a193e48 ffff88fbd6a23b38 ffff88aeb1bd0518 ffff88e9504b6518 ffff88ae911aa6a0 ffff88bfd0a409b0 ffff88aeaea71828 ffff884e8e773518 ffff88ae911aab38 ffff88aeaea71cc0 ffff88aeaea71b38 ffff88cd9a917828 ffff88aeb2bcd828 ffff88ae905906a0 ffff88ea1e277b38 ffff88ae9131a9b0 ffff88ae21d24828 ffff88aeb13c89b0 ffff88cdeeaa2cc0 ffff88ae0edf8b38 ffff88cdd7776b38 ffff88aeb2adae48 ffff88ae148ba828 ffff886d35178cc0 ffff88bfd0d38cc0 ffff88bfd0d38208 ffff885fd0b82208 ffff88bfd1be1b38 ffff88ae907ee828 ffff88adfd0d6208 ffff88adb2c49b38 ffff886ea6dc8828 ffff88ae250e1208 ffff88e99bccb518 ffff88ad97db7390 ffff88adb2d486a0 ffff88ad97db7208 ffff886e4eeea208 ffff88fbd3589080 ffff882db73d7390 ffff884b6d2e8080 ffff88eab56d8b38 ffff88ae907ee6a0 ffff88adf3bc7080 ffff88adf3bc7390 ffff88ceaa115208 ffff88ae22167518 ffff886ea759c390 ffff88ae9111ce48 ffff88addadc9390 ffff88e9de77a9b0 ffff88adfd0d6390 ffff88e9f4de2828 ffff88add02ac9b0 ffff88cdc7005080 ffff88ae9111c208 ffff88e9b1b3d828 ffff886d2ccf2b38 ffff88e975de4518 ffff88add403a390 ffff88ae9111c6a0 ffff88adfd0d6b38 ffff88e9ebe57390 ffff88ae90eef828 ffff88aeb2adacc0 ffff88adf1563080 ffff88ae8e323390 ffff88adcae27b38 ffff88ce0c985828 ffff88bfd2ae5b38 ffff886eaa826b38 ffff88addad83390 ffff886eaac6b828 ffff88aeb24e1518 ffff88ae22486cc0 ffff88aeaeae2080 ffff88ae7ed9f518 ffff88a070c117e0 ffff88aea9c85cc0 ffff88ae90f66080 ffff88adb390b9b0 ffff88dfd1b1f390 
ffff88bfd0d92cc0 ffff880d5b42ce48 ffff88adb390b208 ffff88ae221cbb38 ffff88adc15ce828 ffff88fbd5324208 ffff88eab511ae48 ffff88e9504b6208 ffff884eb06e59b0 ffff88cd8f5f5518 ffff884e1ae86390 ffff88ae17e92080 ffff88fbd6a23e48 ffff880dbbb52080 ffff880c6d79eb38 ffff880dbb94e6a0 ffff88bfd2923080 ffff88cdae95e208 ffff88bfd0d38080 ffff884ca14e7390 ffff88adf3bc7518 ffff88ae910ed518 ffff88adf14e9080 ffff88ad97f42cc0 ffff880c82b4eb38 ffff88adceeb6390 ffff880d5d7df208 ffff880d89d5c9b0 ffff88ae22087b38 ffff88ae220879b0 ffff886e7af02b38 ffff88ae92350390 ffff88bfd0c80080 ffff88add02ac390 ffff88ae90c65518 ffff88ade791e9b0 ffff88add01ee6a0 ffff88aea9c85828 ffff884eaa200e48 ffff88bfd0905390 ffff88add02ace48 ffff88ade791ee48 ffff884eaed1a390 ffff88ae7655b828 ffff880cb9542390 ffff88ce47371cc0 ffff88adfd172828 ffff88ae90c65390 ffff88ad93faf9b0 ffff88ae1d3d6e48 ffff88cd8f5f56a0 ffff88bfd0d389b0 ffff88adb2c49208 ffff88ae8e307390 ffff88ade1586208 ffff88ae265f4b38 ffff88ae9139f208 ffff88ae8e307080 ffff88ae8e307208 ffff88addadc9208 ffff88addbcd6828 ffff88ae16179518 ffff88ad9c712828 ffff88ae911bccc0 ffff88ae16179cc0 ffff88add02ac208 ffff88bfd0d92208 ffff88bfd0d92390 ffff88bfd2d92080 ffff88aeaeb68e48 ffff88ae161799b0 ffff88bfd1b31cc0 ffff88aea9c856a0 ffff88ae7bcadcc0 ffff88ae19ac86a0 ffff88ae2abbfb38 ffff88adb6738b38 ffff88adfd172080 ffff880db1feb6a0 ffff88e9b1bba390 ffff88ae7ef11b38 ffff88ae04b65208 ffff88ae9139fb38 ffff88adcb5946a0 ffff88ae04b65518 ffff88bfd2ae5080 ffff88ae04b65080 ffff88ae66185390 ffff88ae92350208 ffff880db1febb38 ffff88ae9139f828 ffff88ae04b65e48 ffff88eaa1221080 ffff88ade8ec7e48 ffff88dfd09846a0 ffff88ae04b65390 ffff88ae9040fe48 ffff88bfd2ae5208 ffff88adbccc3828 ffff884ea9c949b0 ffff88bfd2067cc0 ffff88aeb1360828 ffff88bfd2067518 ffff88bfd1b31080 ffff88ae7ef11cc0 ffff88ad9c712cc0 ffff88e950606cc0 ffff88ae1d3d6cc0 ffff886e3558f390 ffff88e9b1b3d9b0 ffff88ad9c712390 ffff88ae91140b38 ffff886d0db7a390 ffff88aeb13609b0 ffff88fbd356e6a0 ffff88e9cfc4d9b0 ffff88ae90c65080 
ffff88aeb1360390 ffff88adb2c49828 ffff88ae90d95e48 ffff886e753c6b38 ffff88aeb1360518 ffff88bfd0d92b38 ffff88ad93fafcc0 ffff88adb2c49e48 ffff88adbccc3080 ffff88adb2c49cc0 ffff88ae7655b6a0 ffff88ae90668cc0 ffff88aeb105c828 ffff881fcbb20b38 ffff88ae0ef5eb38 ffff88aeb105c6a0 ffff88aeb105cb38 ffff88bfd2067e48 ffff88bfd2067080 ffff88bfd2067828 ffff88ae50d3e080 ffff88bfd2067390 ffff88ae9138fcc0 ffff881fcac1de48 ffff885fd08dd208 ffff88bfd0a9a080 ffff88ae04b65b38 ffff884eaf34e6a0 ffff88ae08f06390 ffff88ae04b65828 ffff88aeaebf76a0 ffff88ae19a32390 ffff88ae90c65208 ffff886ea7a13828 ffff88cd7c9046a0 ffff88ae91140080 ffff884d4c98d208 ffff88ae08f06208 ffff88adb2d48208 ffff88adfd1349b0 ffff88bfd1bc4080 ffff88cd7c904518 ffff88aeb13c8e48 ffff88ceaa3789b0 ffff88e9a9af6518 ffff88bfd235ecc0 ffff88cdc72e9390 ffff88ae91381b38 ffff880e9dcfeb38 ffff88ceaa488518 ffff880db1feb208 ffff88adb65a8208 ffff88adb65a8390 ffff88e9f30f9390 ffff88bfd235e6a0 ffff88bfd235e518 ffff88ae1ec7de48 ffff88bfd1ff0080 ffff88ae04b65cc0 ffff88adac785828 ffff884eaa895390 ffff884eaa7d76a0 ffff88cdf26bb9b0 ffff88bfd29236a0 ffff88ada7246080 ffff88ae616abcc0 ffff88adac785390 ffff88eaa1221518 ffff884eafd48208 ffff88ae17db66a0 ffff887fd1a2e390 ffff88e9af9bc080 ffff88adfd134080 ffff884e3eae3828 ffff88ae911bce48 ffff88ade8e769b0 ffff88adb2c49518 ffff88aeb13c8b38 ffff88e9504af9b0 ffff884eac132208 ffff886cf997d6a0 ffff88bfd1ff0828 ffff88bfd0a40080 ffff88ae250e1828 ffff88bfd2e03208 ffff88aeb1331518 ffff88ae90d95cc0 ffff88e95d1fce48 ffff885fcf29a208 ffff88ad93fafb38 ffff88ae250e1518 ffff88e929d91e48 ffff88adf14d19b0 ffff88aeb12bde48 ffff88aeb12bd080 ffff884e3e8a5828 ffff88dfd27a1828 ffff88aeb19e96a0 ffff88ade8e76828 ffff884eabc9cb38 ffff88bfd2923e48 ffff88ae661726a0 ffff88aeb1331208 ffff88adac7ff6a0 ffff884ca3563390 ffff88adf14d1080 ffff88aeb12bd390 ffff884ddb078828 ffff88adb79b19b0 ffff884bad02ab38 ffff88ada72476a0 ffff88ae66172518 ffff88adb79b1cc0 ffff88ae24062e48 ffff88ae910ef828 ffff88ade8ec7518 ffff88e929f02518 
ffff88bfd114ab38 ffff88aeb13c8518 ffff88adcb596080 ffff88bfd114a6a0 ffff88e9635bc828 ffff88bfd1bc4390 ffff88aeb1331cc0 ffff88aeb1331828 ffff884d3d465390 ffff88eab6f846a0 ffff88adf752ce48 ffff884dc4ec7828 ffff88ade8ec7080 ffff88bfd2923b38 ffff88aeb12bd208 ffff88ae84c5b9b0 ffff88adf752ccc0 ffff884eac0f0cc0 ffff88ae265f4cc0 ffff88add007b9b0 ffff88ae907ee208 ffff88ae663289b0 ffff88ae907eecc0 ffff88e9e09106a0 ffff884eaafb06a0 ffff88ad97f6cb38 ffff88ae66172e48 ffff88ad97f6ccc0 ffff88e99dc8e828 ffff88adb67386a0 ffff88ae911aa518 ffff88add01ee208 ffff88ae21d24080 ffff88ae8ff2f390 ffff884eaa444518 ffff88bfd1bc46a0 ffff884eaad80080 ffff88cdd7543cc0 ffff88ae7bcad390 ffff88ade8ec7b38 ffff88adf1563cc0 ffff88aeaebf7208 ffff88ad9c712080 ffff88bfd2923cc0 ffff88bfd29239b0 ffff88ada720ae48 ffff88aeaebf7080 ffff88ae66172208 ffff88ae221679b0 ffff88bfd20679b0 ffff880db033db38 ffff88ade791e518 ffff880e8dad4390 ffff880e9dc8d9b0 ffff886ce6b22cc0 ffff886eab1ef080 ffff886d0217de48 ffff88ae8f98ae48 ffff88bfd0d92e48 ffff88ae250e4208 ffff88ae250e4cc0 ffff88bfd2067208 ffff88add406a6a0 ffff885fd08ddcc0 ffff884c12dc3e48 ffff884c6dc6f9b0 ffff88ae7ef11390 ffff880cc28f99b0 ffff88ade36a09b0 ffff880e9eef79b0 ffff882d75dec6a0 ffff88addadc9e48 ffff88aeb1331390 ffff88cda8c88828 ffff88adf3809e48 ffff88aea9c6a080 ffff88bfd2d2be48 ffff884eaf304b38 ffff88aeaeae26a0 ffff88aeb19e9080 ffff88adf3809518 ffff88ae906ca6a0 ffff88aeb19e9518 ffff88ae911bc828 ffff88adf3809080 ffff88adc3abd208 ffff88aeb19e9cc0 ffff88ae17e926a0 ffff88adb67389b0 ffff88ae31b04e48 ffff88cda8c88518 ffff88ae923cf390 ffff886d1630c390 ffff880e9eef7208 ffff88aeb12bd518 ffff88adb2c49080 list: duplicate list entry: ffff88adb2c49080 crash> list -r ffff88a070c03f18 ffff88a070c03f18 ffff88adba447490 ffff88aeb13bc1d0 ffff88cdc50f8b38 ffff88ce15a05518 ffff88aea9e7f790 ffff88ce94064208 ffff88ad93fae630 ffff884ead10d518 ffff88ae90eef208 ffff88bfd1b319b0 ffff88ae90eef080 ffff88adf15a9d10 ffff884d7a317080 ffff88e9bf547390 ffff88e9bf547080 ffff882db3077cc0 
ffff88aeb105c518 ffff88dfce1f3518 ffff88ae63c08750 ffff88adac7ffe48 ffff88adb79b1208 ffff88ae90f66e48 ffff88ae2d0f0080 ffff88cd7cb87828 ffff88aeb1331b38 ffff88adb2c49080 ffff884bfe6f9518 ffff88dfd08c1080 ffff88ae923cf208 ffff88ae923cf518 ffff88eaa0fdd208 ffff88bfd2b6fb38 ffff886d12663e48 ffff88adb2c499b0 ffff88e925bd7828 ffff880dbb86fcc0 ffff88ae911bc6a0 ffff880cb52c3518 ffff88ae923cf9b0 ffff88ae250e4b38 ffff88ae250e4518 ffff88bfd2a5f828 ffff88cdd77ef9b0 ffff88bfd2a5fe48 ffff88bfd2a5f390 ffff88bfd2a5f9b0 ffff88bfd2a5fcc0 ffff88bfd2a5f080 ffff88aeb1bf7b38 ffff88aeb24e16a0 ffff88adb79b1518 ffff88ae8b457cc0 ffff88ae8b457e48 ffff88ae8b457080 ffff88ae84c5b080 ffff88ae19ac8b38 ffff88ae616ab828 ffff88ae9040f390 ffff88ae17db6208 ffff88bfd196d390 ffff88ade8e76cc0 ffff88ae7679c390 ffff88ada70f8b38 ffff88ae9040f518 ffff88ae7655b9b0 ffff88bfd196d828 ffff88ae8e3076a0 ffff88ae22087828 ffff88ae911aa208 ffff88ae7bce1b38 ffff88ae072c1cc0 ffff88adf14e9208 ffff88bfd19a9cc0 ffff88adcad669b0 ffff88ae51ed0208 ffff88ae923cf6a0 ffff88ae2d0f0e48 ffff88ae912966a0 ffff88adcee81b38 ffff88ae29d81080 ffff88adb67a3e48 ffff88adb67a3b38 ffff88ae585049b0 ffff88ae905049b0 ffff88adbccb9828 ffff88aea9c85080 ffff88aeb24e19b0 ffff88ae91296080 ffff88ae8fc3bb38 ffff88adcb594828 ffff88aea9e22828 ffff88addd1869b0 ffff88ae923cfb38 ffff88adcae27208 ffff88ae2536a6a0 ffff88ae2536a828 ffff886e744a1e48 ffff88bfd1be1080 ffff88aeae8d0390 ffff88adcb594e48 ffff88adf14e9b38 ffff88aeaea71518 ffff88ae7ed659b0 ffff88adcad66828 ffff88cda13109b0 ffff88adb67a3cc0 ffff88aeaeb17cc0 ffff88adf752c208 ffff88adf14e9cc0 ffff88ae90668208 ffff88aeaea71390 ffff88aea9c85e48 ffff88ae911bc9b0 ffff88ae2d0f0828 ffff88adb65a8b38 ffff88aeaebf79b0 ffff88ae2536ae48 ffff88ad9c712b38 ffff88ae90504518 ffff88ae1ec7d6a0 ffff88ae7ef11518 ffff88ae51ed0e48 ffff88ae90590518 ffff88adb7acc6a0 ffff88adf1656cc0 ffff88ae19a329b0 ffff88ae2536acc0 ffff88ae911406a0 ffff88ae7bcadb38 ffff88aeb19e99b0 ffff88adceeb6828 ffff88aeaea71080 ffff88ada70f8390 
ffff88ae90f666a0 ffff88adb390b518 ffff88ae7ed9f208 ffff88ae265f4208 ffff88ae911bc208 ffff88adcb594208 ffff88ada70f8518 ffff88ae90c98080 ffff88aeb24e1cc0 ffff88ae0ece3208 ffff88addd186e48 ffff88adf14e9e48 ffff88adfd134e48 ffff88bfd114a9b0 ffff88add02ac080 ffff88adb7acce48 ffff88ade36a0390 ffff88adf1563e48 ffff88bfd0d929b0 ffff88ade8e766a0 ffff88aeb1e53e48 ffff88addbcd6cc0 ffff88adfd134cc0 ffff88bfd2e036a0 ffff88bfd1ab1828 ffff88add007bcc0 ffff88adfd134828 ffff88bfd14d5208 ffff88aeb12ad518 ffff88ae911aae48 ffff88ae90590b38 ffff88ae2536a9b0 ffff88ae911aa828 ffff88bfd2923390 ffff88ae91296cc0 ffff88ae90504208 ffff88bfd235eb38 ffff88bfd235e208 ffff88adfd135390 ffff88ae91296e48 ffff88adb6739518 ffff88ae911bcb38 ffff88addd186208 ffff88ae91296828 ffff88ae90668518 ffff88adb2d48b38 ffff88addad83828 ffff88ae2537e9b0 ffff88aeaeae2b38 ffff88bfd2923828 ffff88ade791e208 ffff88bfd0c806a0 ffff88bfd14d5518 ffff88bfd0c80cc0 ffff88ad97f6c518 ffff88adb67a3828 ffff88aeb2ada518 ffff88aeb2ada828 ffff88ae9138f9b0 ffff88ae9138f828 ffff88ae905046a0 ffff88ae91296390 ffff88aeb2ada9b0 ffff88adbccc3390 ffff88adf14e9390 ffff88bfd2d2b9b0 ffff88bfd2d2b390 ffff88cdd77efe48 ffff88bfd2b6f6a0 ffff88bfd2923518 ffff88bfd0c80b38 ffff88aeb12bd828 ffff88ae7ef119b0 ffff88aeb12bdb38 ffff88ad97f6c390 ffff88ad97f6ce48 ffff88ae7ed9f6a0 ffff88adb2c496a0 ffff88aeb12ad080 ffff88bfd0c80390 ffff88ae7ef11828 ffff88bfd2923208 ffff88bfd0c80518 ffff88aeb12adb38 ffff88aeb12ad828 ffff88adb67a39b0 ffff88aeb12ad9b0 ffff88adb7acc9b0 ffff88adbcfdccc0 ffff88adbcfdc828 ffff88adbcfdc9b0 ffff88adbcfdcb38 ffff88adbcfdc518 ffff88adbcfdc080 ffff88adb67a3390 ffff88ae25263e48 ffff88ae910edb38 ffff88aeb12e6208 ffff88ae616ab208 ffff88ae616ab6a0 ffff88ae910ed9b0 ffff88aeaeae29b0 ffff88ada720a9b0 ffff88ae25263cc0 ffff88adcd691b38 ffff88bfd2d2b208 ffff88bfd2d2b828 ffff88ae910ed6a0 ffff88adb2d48390 ffff88bfd2d2bb38 ffff88adb2d48cc0 ffff88adcd691828 ffff88adcd691cc0 ffff88aeb12e6b38 ffff88ada720a6a0 ffff88aeb12e6e48 ffff88add004f208 
ffff88adcee819b0 ffff88ae7ed9f828 ffff88aea9c859b0 ffff88adb2c49390 ffff88ae58504208 ffff88bfd0a40208 ffff88ae58504390 ffff88ae0edf8390 ffff88ae2536a390 ffff884eac548080 ffff88ae252639b0 ffff88ae252636a0 ffff88ae910ed080 ffff88aeb12bd9b0 ffff88addbcd66a0 ffff88ae221cb828 ffff88bfd0a40390 ffff88aea9c85b38 ffff88ae2536a208 ffff88ae2536a080 ffff88bfd0a406a0 ffff88ae9139fe48 ffff88ae072c1e48 ffff88aeaeb686a0 ffff88aeb1bf7208 ffff88aeb1bf7518 ffff88adf38096a0 ffff88bfd0a40828 ffff88bfd0a40e48 ffff88ae221cb080 ffff88ae7bcad518 ffff88aeb1bf79b0 ffff88aeaeb68cc0 ffff88aeaeb689b0 ffff88adb390b080 ffff88adb390bb38 ffff88aeb19e9e48 ffff886d054779b0 ffff88ae072c1518 ffff88ae21d24b38 ffff88aeb1bf7390 ffff88aea9c85208 ffff88aeb1bf7828 ffff88aea9c85518 ffff88ae8f98a518 ffff88adf3809b38 ffff88ae21d24e48 ffff88ae51ed06a0 ffff88ae21d24390 ffff88ae8f98acc0 ffff88addad836a0 ffff88ae907ee080 ffff88bfd2e03390 ffff88ae8f98a828 ffff88bfd2a5f518 ffff88adfd0d6518 ffff88ae7bcad080 ffff88addad83cc0 ffff88adfd134390 ffff88ae224869b0 ffff886cf9a4e6a0 ffff88ae8f98ab38 ffff88ae7ed65518 ffff88adf1563390 ffff88bfd114acc0 ffff88ada70f89b0 ffff88aeb19e9828 ffff882dd7da1390 ffff88ae21d24518 ffff88ae8f98a390 ffff88ae22486208 ffff88ae7ed65080 ffff882ea948b6a0 ffff883fd0ae3518 ffff88ae9131a208 ffff882e7548eb38 ffff882db73d6b38 ffff882db924e6a0 ffff88ae22167b38 ffff88adb7acc518 ffff883fd0b3ab38 ffff882ea8ca7390 ffff882d9f8636a0 ffff882d75cb8080 ffff882dd8a6f9b0 ffff882dd826f208 ffff882d5df6e9b0 ffff882e75511b38 ffff88ae8fc3bcc0 ffff88ae51ed0828 ffff88ae7bcad9b0 ffff88addadc9b38 ffff88ae209bb080 ffff88bfd0c80208 ffff88aeb1bd0b38 ffff88aeb12adcc0 ffff880c6d79e6a0 ffff884ead8aa208 ffff88adb79b1b38 ffff88adb65ae9b0 ffff88ae25263b38 ffff880db1feb390 ffff882dc623b390 ffff88adb390b6a0 ffff88adb7accb38 ffff88ae616ab9b0 ffff88ada720ab38 ffff88adbcfdc208 ffff88bfd0d38390 ffff88ae7ed9f390 ffff88adf1563518 ffff88ae2537e080 ffff88ae2536ab38 ffff88ae7ed9fcc0 ffff88ae90f66b38 ffff88ae22486080 ffff88ae7ef11208 
ffff88adb7acc080 ffff886ce6b229b0 ffff884eaca9e080 ffff88ae66172b38 ffff88adb7acc208 ffff88ae905909b0 ffff88eab5714518 ffff88eab66ff6a0 ffff88ad9c675390 ffff885fd127b390 ffff88ae910f7e48 ffff88aeb12ad390 ffff88bfd0a40518 ffff880e4d5da9b0 ffff880d5b51bb38 ffff88ae923cfcc0 ffff88bfd2b6f9b0 ffff88ae2537e6a0 ffff88aeaea71e48 ffff88ae66172cc0 ffff88ae8ff2fcc0 ffff88cdd77ef828 ffff88ae8ff2f080 ffff88aeaeae2208 ffff88ae265f4828 ffff884d7a193e48 ffff88fbd6a23b38 ffff88aeb1bd0518 ffff88e9504b6518 ffff88ae911aa6a0 ffff88bfd0a409b0 ffff88aeaea71828 ffff884e8e773518 ffff88ae911aab38 ffff88aeaea71cc0 ffff88aeaea71b38 ffff88cd9a917828 ffff88aeb2bcd828 ffff88ae905906a0 ffff88ea1e277b38 ffff88ae9131a9b0 ffff88ae21d24828 ffff88aeb13c89b0 ffff88cdeeaa2cc0 ffff88ae0edf8b38 ffff88cdd7776b38 ffff88aeb2adae48 ffff88ae148ba828 ffff886d35178cc0 ffff88bfd0d38cc0 ffff88bfd0d38208 ffff885fd0b82208 ffff88bfd1be1b38 ffff88ae907ee828 ffff88adfd0d6208 ffff88adb2c49b38 ffff886ea6dc8828 ffff88ae250e1208 ffff88e99bccb518 ffff88ad97db7390 ffff88adb2d486a0 ffff88ad97db7208 ffff886e4eeea208 ffff88fbd3589080 ffff882db73d7390 ffff884b6d2e8080 ffff88eab56d8b38 ffff88ae907ee6a0 ffff88adf3bc7080 ffff88adf3bc7390 ffff88ceaa115208 ffff88ae22167518 ffff886ea759c390 ffff88ae9111ce48 ffff88addadc9390 ffff88e9de77a9b0 ffff88adfd0d6390 ffff88e9f4de2828 ffff88add02ac9b0 ffff88cdc7005080 ffff88ae9111c208 ffff88e9b1b3d828 ffff886d2ccf2b38 ffff88e975de4518 ffff88add403a390 ffff88ae9111c6a0 ffff88adfd0d6b38 ffff88e9ebe57390 ffff88ae90eef828 ffff88aeb2adacc0 ffff88adf1563080 ffff88ae8e323390 ffff88adcae27b38 ffff88ce0c985828 ffff88bfd2ae5b38 ffff886eaa826b38 ffff88addad83390 ffff886eaac6b828 ffff88aeb24e1518 ffff88ae22486cc0 ffff88aeaeae2080 ffff88ae7ed9f518 ffff88a070c117e0 ffff88aea9c85cc0 ffff88ae90f66080 ffff88adb390b9b0 ffff88dfd1b1f390 ffff88bfd0d92cc0 ffff880d5b42ce48 ffff88adb390b208 ffff88ae221cbb38 ffff88adc15ce828 ffff88fbd5324208 ffff88eab511ae48 ffff88e9504b6208 ffff884eb06e59b0 ffff88cd8f5f5518 
ffff884e1ae86390 ffff88ae17e92080 ffff88fbd6a23e48 ffff880dbbb52080 ffff880c6d79eb38 ffff880dbb94e6a0 ffff88bfd2923080 ffff88cdae95e208 ffff88bfd0d38080 ffff884ca14e7390 ffff88adf3bc7518 ffff88ae910ed518 ffff88adf14e9080 ffff88ad97f42cc0 ffff880c82b4eb38 ffff88adceeb6390 ffff880d5d7df208 ffff880d89d5c9b0 ffff88ae22087b38 ffff88ae220879b0 ffff886e7af02b38 ffff88ae92350390 ffff88bfd0c80080 ffff88add02ac390 ffff88ae90c65518 ffff88ade791e9b0 ffff88add01ee6a0 ffff88aea9c85828 ffff884eaa200e48 ffff88bfd0905390 ffff88add02ace48 ffff88ade791ee48 ffff884eaed1a390 ffff88ae7655b828 ffff880cb9542390 ffff88ce47371cc0 ffff88adfd172828 ffff88ae90c65390 ffff88ad93faf9b0 ffff88ae1d3d6e48 ffff88cd8f5f56a0 ffff88bfd0d389b0 ffff88adb2c49208 ffff88ae8e307390 ffff88ade1586208 ffff88ae265f4b38 ffff88ae9139f208 ffff88ae8e307080 ffff88ae8e307208 ffff88addadc9208 ffff88addbcd6828 ffff88ae16179518 ffff88ad9c712828 ffff88ae911bccc0 ffff88ae16179cc0 ffff88add02ac208 ffff88bfd0d92208 ffff88bfd0d92390 ffff88bfd2d92080 ffff88aeaeb68e48 ffff88ae161799b0 ffff88bfd1b31cc0 ffff88aea9c856a0 ffff88ae7bcadcc0 ffff88ae19ac86a0 ffff88ae2abbfb38 ffff88adb6738b38 ffff88adfd172080 ffff880db1feb6a0 ffff88e9b1bba390 ffff88ae7ef11b38 ffff88ae04b65208 ffff88ae9139fb38 ffff88adcb5946a0 ffff88ae04b65518 ffff88bfd2ae5080 ffff88ae04b65080 ffff88ae66185390 ffff88ae92350208 ffff880db1febb38 ffff88ae9139f828 ffff88ae04b65e48 ffff88eaa1221080 ffff88ade8ec7e48 ffff88dfd09846a0 ffff88ae04b65390 ffff88ae9040fe48 ffff88bfd2ae5208 ffff88adbccc3828 ffff884ea9c949b0 ffff88bfd2067cc0 ffff88aeb1360828 ffff88bfd2067518 ffff88bfd1b31080 ffff88ae7ef11cc0 ffff88ad9c712cc0 ffff88e950606cc0 ffff88ae1d3d6cc0 ffff886e3558f390 ffff88e9b1b3d9b0 ffff88ad9c712390 ffff88ae91140b38 ffff886d0db7a390 ffff88aeb13609b0 ffff88fbd356e6a0 ffff88e9cfc4d9b0 ffff88ae90c65080 ffff88aeb1360390 ffff88adb2c49828 ffff88ae90d95e48 ffff886e753c6b38 ffff88aeb1360518 ffff88bfd0d92b38 ffff88ad93fafcc0 ffff88adb2c49e48 ffff88adbccc3080 ffff88adb2c49cc0 
ffff88ae7655b6a0 ffff88ae90668cc0 ffff88aeb105c828 ffff881fcbb20b38 ffff88ae0ef5eb38 ffff88aeb105c6a0 ffff88aeb105cb38 ffff88bfd2067e48 ffff88bfd2067080 ffff88bfd2067828 ffff88ae50d3e080 ffff88bfd2067390 ffff88ae9138fcc0 ffff881fcac1de48 ffff885fd08dd208 ffff88bfd0a9a080 ffff88ae04b65b38 ffff884eaf34e6a0 ffff88ae08f06390 ffff88ae04b65828 ffff88aeaebf76a0 ffff88ae19a32390 ffff88ae90c65208 ffff886ea7a13828 ffff88cd7c9046a0 ffff88ae91140080 ffff884d4c98d208 ffff88ae08f06208 ffff88adb2d48208 ffff88adfd1349b0 ffff88bfd1bc4080 ffff88cd7c904518 ffff88aeb13c8e48 ffff88ceaa3789b0 ffff88e9a9af6518 ffff88bfd235ecc0 ffff88cdc72e9390 ffff88ae91381b38 ffff880e9dcfeb38 ffff88ceaa488518 ffff880db1feb208 ffff88adb65a8208 ffff88adb65a8390 ffff88e9f30f9390 ffff88bfd235e6a0 ffff88bfd235e518 ffff88ae1ec7de48 ffff88bfd1ff0080 ffff88ae04b65cc0 ffff88adac785828 ffff884eaa895390 ffff884eaa7d76a0 ffff88cdf26bb9b0 ffff88bfd29236a0 ffff88ada7246080 ffff88ae616abcc0 ffff88adac785390 ffff88eaa1221518 ffff884eafd48208 ffff88ae17db66a0 ffff887fd1a2e390 ffff88e9af9bc080 ffff88adfd134080 ffff884e3eae3828 ffff88ae911bce48 ffff88ade8e769b0 ffff88adb2c49518 ffff88aeb13c8b38 ffff88e9504af9b0 ffff884eac132208 ffff886cf997d6a0 ffff88bfd1ff0828 ffff88bfd0a40080 ffff88ae250e1828 ffff88bfd2e03208 ffff88aeb1331518 ffff88ae90d95cc0 ffff88e95d1fce48 ffff885fcf29a208 ffff88ad93fafb38 ffff88ae250e1518 ffff88e929d91e48 ffff88adf14d19b0 ffff88aeb12bde48 ffff88aeb12bd080 ffff884e3e8a5828 ffff88dfd27a1828 ffff88aeb19e96a0 ffff88ade8e76828 ffff884eabc9cb38 ffff88bfd2923e48 ffff88ae661726a0 ffff88aeb1331208 ffff88adac7ff6a0 ffff884ca3563390 ffff88adf14d1080 ffff88aeb12bd390 ffff884ddb078828 ffff88adb79b19b0 ffff884bad02ab38 ffff88ada72476a0 ffff88ae66172518 ffff88adb79b1cc0 ffff88ae24062e48 ffff88ae910ef828 ffff88ade8ec7518 ffff88e929f02518 ffff88bfd114ab38 ffff88aeb13c8518 ffff88adcb596080 ffff88bfd114a6a0 ffff88e9635bc828 ffff88bfd1bc4390 ffff88aeb1331cc0 ffff88aeb1331828 ffff884d3d465390 ffff88eab6f846a0 
ffff88adf752ce48 ffff884dc4ec7828 ffff88ade8ec7080 ffff88bfd2923b38 ffff88aeb12bd208 ffff88ae84c5b9b0 ffff88adf752ccc0 ffff884eac0f0cc0 ffff88ae265f4cc0 ffff88add007b9b0 ffff88ae907ee208 ffff88ae663289b0 ffff88ae907eecc0 ffff88e9e09106a0 ffff884eaafb06a0 ffff88ad97f6cb38 ffff88ae66172e48 ffff88ad97f6ccc0 ffff88e99dc8e828 ffff88adb67386a0 ffff88ae911aa518 ffff88add01ee208 ffff88ae21d24080 ffff88ae8ff2f390 ffff884eaa444518 ffff88bfd1bc46a0 ffff884eaad80080 ffff88cdd7543cc0 ffff88ae7bcad390 ffff88ade8ec7b38 ffff88adf1563cc0 ffff88aeaebf7208 ffff88ad9c712080 ffff88bfd2923cc0 ffff88bfd29239b0 ffff88ada720ae48 ffff88aeaebf7080 ffff88ae66172208 ffff88ae221679b0 ffff88bfd20679b0 ffff880db033db38 ffff88ade791e518 ffff880e8dad4390 ffff880e9dc8d9b0 ffff886ce6b22cc0 ffff886eab1ef080 ffff886d0217de48 ffff88ae8f98ae48 ffff88bfd0d92e48 ffff88ae250e4208 ffff88ae250e4cc0 ffff88bfd2067208 ffff88add406a6a0 ffff885fd08ddcc0 ffff884c12dc3e48 ffff884c6dc6f9b0 ffff88ae7ef11390 ffff880cc28f99b0 ffff88ade36a09b0 ffff880e9eef79b0 ffff882d75dec6a0 ffff88addadc9e48 ffff88aeb1331390 ffff88cda8c88828 ffff88adf3809e48 ffff88aea9c6a080 ffff88bfd2d2be48 ffff884eaf304b38 ffff88aeaeae26a0 ffff88aeb19e9080 ffff88adf3809518 ffff88ae906ca6a0 ffff88aeb19e9518 ffff88ae911bc828 ffff88adf3809080 ffff88adc3abd208 ffff88aeb19e9cc0 ffff88ae17e926a0 ffff88adb67389b0 ffff88ae31b04e48 ffff88cda8c88518 ffff88ae923cf390 ffff886d1630c390 ffff880e9eef7208 ffff88aeb12bd518 ffff88adb2c49080 list: duplicate list entry: ffff88adb2c49080 crash> p jiffies jiffies = $2 = 4465500172 crash> struct request ffff88adb2c49080 struct request { queuelist = { next = 0xffff884bfe6f9518, prev = 0xffff88aeb12bd518 }, csd = { list = { next = 0xffffffff81278370 <trigger_softirq>, prev = 0xffff88adb2c49070 }, func = 0x1, info = 0xffff881fcd641328, flags = 7749, priv = 256 }, q = 0x1, cmd_flags = 135, cmd_type = 8192, atomic_flags = 721182495, cpu = -208067136, __data_len = 4294936749, __sector = 18446612879430460864, bio = 0x0, biotail = 
0x0, hash = { next = 0xffff88adb2c490f0, pprev = 0x0 }, { rb_node = { rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0 }, completion_data = 0x0 }, { elevator_private = {0x0, 0xffff883fd1dd6000, 0x10a2a1cbe}, flush = { seq = 0, list = { next = 0xffff883fd1dd6000, prev = 0x10a2a1cbe } } }, rq_disk = 0x9b777c92699a, start_time = 170938157020987, start_time_ns = 4294967297, io_start_time_ns = 18446612745141827968, nr_phys_segments = 0, ioprio = 0, ref_count = 0, special = 0xb, buffer = 0x0, tag = 0, errors = 0, __cmd = "\020\204\300c\256\210\377\377\020\000\000\000\000\000\000", cmd = 0x200000000000 <Address 0x200000000000 out of bounds>, cmd_len = 0, extra_len = 0, sense_len = 170536693, resid_len = 1, sense = 0xffff88adb2c49ac8, deadline = 18446612878404586184, timeout_list = { next = 0xfa0, prev = 0xffffffffa0002a00 }, timeout = 2999226456, retries = -30547, end_io = 0x0, end_io_data = 0x0, next_rq = 0xffffffff00000000, pad = 0xffff88ae616aaae8 }
Thank you. Can you provide the output of > struct request ffff88a070c137c0 > foreach bt
Created attachment 1146143 [details] struct request ffff88a070c137c0 and foreach bt output from corefile.
OK, thank you very much.
If the below looks like the bug, then we have reproduced it with the same test-node load but without the Imperva module loaded. An Oracle flashback was in progress, doing in excess of 10,000 write IOPS (480 MB/second) to a flash array. iostat was reporting queue values on the disks of up to 27 around the time the panic happened, with busy values of 95+, avgresp of around 15, and avgwait as high as 472, so the I/O subsystem was being pushed hard during the flashback. <4>[86408.053121] ------------[ cut here ]------------ <2>[86408.062550] kernel BUG at block/blk-core.c:1144! <4>[86408.074664] invalid opcode: 0000 [#1] SMP <4>[86408.087949] last sysfs file: /sys/devices/virtual/net/bond0/carrier <4>[86408.099259] CPU 20 <4>[86408.101943] Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) hangcheck_timer mptctl mptbase nfsd exportfs oracleasm(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 ext3 jbd dm_round_robin iTCO_wdt iTCO_vendor_support be2net ixgbe dca mdio e1000e ptp pps_core microcode ipmi_devintf serio_raw lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg power_meter acpi_ipmi ipmi_si ipmi_msghandler bnx2 shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod lpfc scsi_transport_fc scsi_tgt crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] <4>[86408.214181] <4>[86408.220736] Pid: 990, comm: kblockd/20 Tainted: P --------------- 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL980 G7 <4>[86408.243917] RIP: 0010:[<ffffffff8126ee84>] [<ffffffff8126ee84>] blk_requeue_request+0x94/0xa0 <4>[86408.258790] RSP: 0018:ffff881fd2a11bc0 EFLAGS: 00010002 <4>[86408.269941] RAX: ffff88e9edbb5630 RBX: ffff88e9edbb5508 RCX: ffff88e9edbb5630 <4>[86408.283143] RDX: ffff88e9edbb5630 RSI: ffff88e9edbb5508 RDI: ffff88e9edbb5508 <4>[86408.296269] RBP: ffff881fd2a11be0 R08: 0000000000000001 R09: 0000000000000002 
<4>[86408.309316] R10: 0000000000000000 R11: 0000000000000000 R12: ffff889fcff0ab68
Since the purpose of the SCSI timeout appears to be to detect a bad or slow SCSI device, shouldn't the timer start once the I/O is submitted to the actual device, rather than when the I/O is submitted to the SCSI layer and starts working its way through? Or maybe we need two timeouts: one for the on-the-wire time and one for this layer. The purpose of the timeout in our usage is to deal with underlying physical SAN issues, so a timer that measured things closer to the actual SAN device would do a better job for our primary usage.
cc. 2.6.32-504.16.2.el6.x86_64 filename: /lib/modules/2.6.32-504.16.2.el6.x86_64/weak-updates/elx-lpfc/lpfc.ko version: 0:10.4.255.16 Think we hit this one under moderate tape IO.
(In reply to Tore H. Larsen from comment #17) > cc. 2.6.32-504.16.2.el6.x86_64 > > filename: > /lib/modules/2.6.32-504.16.2.el6.x86_64/weak-updates/elx-lpfc/lpfc.ko > version: 0:10.4.255.16 > > Think we hit this one under moderate tape IO. OK, thank you. Do you have a crash dump we could examine?
Unfortunately no dump. Have to admit that this was the elx-lpfc driver pulled from Emulex, as I had issues with the default one. The kernel was also tainted with CXFS (SGI) 7.3.0.3 and lin_tape (IBM) 2.9.4. [root@pem-adm1 ~]# systool -c scsi_host -v |grep -i fwrev fwrev = "10.0.803.25, sli-4:2:b" fwrev = "10.0.803.25, sli-4:2:b" fwrev = "10.0.803.25, sli-4:2:b" fwrev = "10.0.803.25, sli-4:2:b" fwrev = "10.0.803.25, sli-4:2:b" fwrev = "10.0.803.25, sli-4:2:b" [root@pem-adm1 ~]# systool -c scsi_host -v |grep -i driv lpfc_drvr_version = "Emulex LightPulse Fibre Channel SCSI driver 10.4.255.16" lpfc_drvr_version = "Emulex LightPulse Fibre Channel SCSI driver 10.4.255.16" lpfc_drvr_version = "Emulex LightPulse Fibre Channel SCSI driver 10.4.255.16" lpfc_drvr_version = "Emulex LightPulse Fibre Channel SCSI driver 10.4.255.16" lpfc_drvr_version = "Emulex LightPulse Fibre Channel SCSI driver 10.4.255.16" lpfc_drvr_version = "Emulex LightPulse Fibre Channel SCSI driver 10.4.255.16" [root@pem-adm1 ~]# systool -c scsi_host -v |grep -i model modeldesc = "Emulex LPe16002B-M6 PCIe 2-port 16Gb Fibre Channel Adapter" modelname = "LPe16002B-M6" modeldesc = "Emulex LPe16002B-M6 PCIe 2-port 16Gb Fibre Channel Adapter" modelname = "LPe16002B-M6" modeldesc = "Emulex LightPulse LPe16004-M6 4-Port 16Gb Fibre Channel Adapter" modelname = "LPe16004-M6" modeldesc = "Emulex LightPulse LPe16004-M6 4-Port 16Gb Fibre Channel Adapter" modelname = "LPe16004-M6" modeldesc = "Emulex LightPulse LPe16004-M6 4-Port 16Gb Fibre Channel Adapter" modelname = "LPe16004-M6" modeldesc = "Emulex LightPulse LPe16004-M6 4-Port 16Gb Fibre Channel Adapter" modelname = "LPe16004-M6" [root@pem-adm1 modprobe.d]# more lpfc.conf options lpfc lpfc_sg_seg_cnt=256 \ lpfc_fcp_io_channel=4 \ lpfc_fcp_io_sched=1 \ lpfc_fcp_imax=500000 \ lpfc_lun_queue_depth=32 \ lpfc_fcp2_no_tgt_reset=1 messages log prior to panic: Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:231: [sdes] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 
kernel: lpfc 0000:81:00.1: 1:(0):0722 Target Reset rport failure: rdata xffff882062770fe8 Sep 23 05:26:58 pem-adm1 kernel: end_request: I/O error, dev sdeu, sector 26437910144 Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:232: [sdet] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:233: [sdeu] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:234: [sdev] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:240: [sdew] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:241: [sdex] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:242: [sdey] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:243: [sdez] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:244: [sdfa] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:245: [sdfb] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:246: [sdfc] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:247: [sdfd] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:248: [sdfe] Synchronizing SCSI cache Sep 23 05:26:58 pem-adm1 kernel: sd 12:0:2:249: [sdff] Synchronizing SCSI cache Sep 23 05:27:31 pem-adm1 kernel: lpfc 0000:81:00.1: 1:(0):2756 LOGO failure DID:011A00 Status:x3/x31000002 Sep 23 05:28:34 pem-adm1 kernel: lpfc 0000:81:00.1: 1:(0):2753 PLOGI failure DID:011A00 Status:x3/x31000002 Sep 23 05:29:39 pem-adm1 kernel: lpfc 0000:84:00.1: 3:(0):0727 TMF FCP_LUN_RESET to TGT 1 LUN 231 failed (3, 805306372) iocb_flag x6 Sep 23 05:29:39 pem-adm1 kernel: lpfc 0000:84:00.1: 3:(0):0713 SCSI layer issued Device Reset (1, 231) return x2003 Sep 23 05:30:09 pem-adm1 kernel: lpfc 0000:84:00.1: 3:(0):0203 Devloss timeout on WWPN 20:32:00:80:e5:29:a1:38 NPort x011a00 Data: x100 x5 x6 Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:231: [sdnb] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: lpfc 0000:84:00.1: 3:(0):0722 Target Reset rport failure: rdata 
xffff88205e5f37e8 Sep 23 05:30:09 pem-adm1 kernel: end_request: I/O error, dev sdnb, sector 26439833216 Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:232: [sdnc] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:233: [sdnd] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:234: [sdne] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:240: [sdnf] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:241: [sdng] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:242: [sdnh] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:243: [sdni] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:244: [sdnj] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:245: [sdnk] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:246: [sdnl] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:247: [sdnm] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:248: [sdnn] Synchronizing SCSI cache Sep 23 05:30:09 pem-adm1 kernel: sd 14:0:1:249: [sdno] Synchronizing SCSI cache Sep 23 05:30:38 pem-adm1 kernel: lpfc 0000:85:00.1: 5:(0):0727 TMF FCP_LUN_RESET to TGT 7 LUN 231 failed (3, 805306372) iocb_flag x6 Sep 23 05:30:38 pem-adm1 kernel: lpfc 0000:85:00.1: 5:(0):0713 SCSI layer issued Device Reset (7, 231) return x2003 Sep 23 05:30:42 pem-adm1 kernel: lpfc 0000:84:00.1: 3:(0):2756 LOGO failure DID:011A00 Status:x3/x31000002 Sep 23 06:25:11 pem-adm1 kernel: imklog 5.8.10, log source = /proc/kmsg started. No indication on NetApp/SGI E5660F storage side eventlog. Have to admit that the array which includes lun's 231 and 233 had a complete drawer failure some weeks back, but due to enough GHS and enclosure protection it rebuilt completely in the background. Storage array firmware 08.10.15.00. Planning to go to latest qualified 08.10.19.00 after survey. As well as kernel 2.6.32-504.30.3.
(In reply to Roger Heflin from comment #13)
> Since the purpose of the scsi timeout appears to be to detect a bad/slow
> scsi device, shouldn't the timer start once the io is submitted to the
> actual device rather than when the io is submitted to the scsi layer and
> starts working its way through?  Or maybe do we need 2 timeouts, one for
> the on the wire time and one for this layer.  The purpose of the timeout in
> our usage is to deal with underlying physical SAN issues so a timer that
> timed things closer to the actual SAN device would do a better job for our
> primary usage.

Just to be clear: the timeout is started when the request is taken off the queue and submitted to the HBA to go out on the fabric; it is not running while the request sits on the queue. This happens when scsi_request_fn() calls blk_start_request() -> blk_add_timer(). scsi_request_fn() then calls scsi_dispatch_cmd(), which calls the lpfc driver's lpfc_queuecommand() function. If at any point the command cannot be started, the timer is stopped and the command is requeued. The timer is started again from the beginning the next time an attempt is made to issue the command.

The rare race that David Jeffery refers to is that there is a finite amount of processing performed between the call to blk_add_timer() and the time when the driver adds the command to its internal data structures. Until this is done, the abort logic (described below) would not work correctly.

A timeout, when it happens, results in a call to abort the command. The abort call into the driver will check whether the command is still pending (i.e. whether it is still in the driver's internal data structures). If it is, an ABTS is issued to the fabric and we wait for completion. The logic is designed to ensure that the driver is no longer using the command by the time the abort call completes. After this, the SCSI midlayer performs error recovery to attempt to regain connectivity to the device.
If the command is in the process of being added to the driver and it times out, an abort call would succeed, and the normal command processing would also succeed. In practice this does not happen because there is not (usually) any significant delay in the driver's _queuecommand() routine.

There is also a small amount of processing performed if the driver's _queuecommand() routine rejects the command, causing it to be requeued, or if one of the checks for (host, target, device) blocked causes it to be requeued without being issued. In those cases the timer is stopped shortly after being started.

There is also the possibility of a race condition upon the completion of a command, when the driver is removing the reference to the command from its internal data structures. In the case of lpfc, however, this is not done until after ->scsi_done() is called, so it does not look like it would be the cause of the problem. It might be a problem for other drivers, though.

---

It is difficult to see how we would not make it to the point of calling lpfc_get_scsi_buf() and setting lpfc_cmd->pCmd and cmnd->host_scribble in lpfc_queuecommand() within a 4 second timeout. The sources of delay are:

  spin_lock(shost->host_lock);                              [ in scsi_request_fn() ]
  spin_lock_irqsave(host->host_lock, flags);                [ in scsi_dispatch_cmd() ]
  spin_lock_irqsave(&phba->scsi_buf_list_get_lock, iflag);

---
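The ordering described above can be sketched as a toy model (the names and structure here are illustrative only, not kernel code): the timer is armed before the driver records the command, so a timeout firing inside that window leaves the error handler and the submitting context each believing it owns the command.

```c
/* Toy model of the submission-path race window: hypothetical names,
 * single-threaded, only the ownership ordering is modeled. */
#include <assert.h>
#include <stdbool.h>

enum owner { OWNER_NONE, OWNER_SUBMITTER, OWNER_EH };

struct cmd {
    bool timer_armed;   /* models blk_add_timer() having run */
    bool in_hba;        /* models the driver recording the command */
    enum owner owner;   /* which context believes it owns the command */
};

/* blk_start_request() analogue: the timer is armed *before* dispatch. */
static void start_request(struct cmd *c)
{
    c->timer_armed = true;
    c->owner = OWNER_SUBMITTER;
}

/* Timeout fires: the error handler claims the command. */
static void timeout_fires(struct cmd *c)
{
    if (c->timer_armed)
        c->owner = OWNER_EH;
}

/* Dispatch completes: the driver records the command. */
static void dispatch(struct cmd *c)
{
    c->in_hba = true;
}
```

In the normal ordering, dispatch() runs long before any timeout, so only the submitter ever owns the command. With a very short timeout plus a delay in the window, timeout_fires() can run between start_request() and dispatch(), and both contexts end up operating on the same command.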
Here is some new stuff: we have 4 nodes, each with slightly different loads, on which we can reproduce this issue with the 4 second timeout. 2 of those nodes stop being able to reproduce the kernel panic with the timeout set to 15 seconds. The other 2 nodes have still reproduced the scsi crash with the timeout set as high as 180 seconds.

In both of those cases the SAN is very likely losing random requests: a number of hosts share the ports on the array, so there is a high probability of requests being lost and the abort code having to run while this is happening. This exact same setup had no issues under a rhel5.10 errata kernel with the same badly overused SAN and the 4 second timeout.

All nodes reproducing the issue have extreme load. The first one that reproduced it was doing an oracle flashback and, I believe, successfully overloading the write cache on a flash array: it writes at 800mb/second for quite a while (a number of minutes) with good response times, then the response times go up >10x, and we hit the issue if the timeout is 4sec but not with higher timeouts. In this case the SAN array and ports are dedicated to the given hosts, so I would not expect SAN requests to be lost, just slow, because the array's write cache is possibly being overrun. The other cases are not doing sustained writes for long periods of time, but they do rapidly apply oracle archive logs every so often, which takes a minute or more of intensive writes.

Given that the 180 second timeout also hits it, it may be a simple case of the abort running at the exact same time something else is attempting to work on the same request. I would be surprised if, in the overused SAN case, a request were actually coming back as complete after 180 seconds, but it may be possible that the array had been trying to send back the completed request for a while and only then finally got it through.
In all of these cases, though, we do see extremely high response times and wait times, and I am unable to get my disk response monitoring tool to successfully write to a local internal disk (HP cciss/hpsa driver) for large periods of time while this is happening, so it does appear that the entire disk IO subsystem was backed up and not responding in a reasonable time. I am able to run the same tool with the output to the screen being saved on another machine, so it is not an issue with the underlying iostat command I am using to collect my data. It does show very high response and wait times (in the thousands of ms, >4000) when the issue happens.
It would be helpful to at least get the stack traces from the crashes you mention, particularly those where the timeout values were much higher.
This is from the serial console we have logging everything on the troublesome machines:

kernel BUG at block/blk-core.c:2166!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:0b:00.0/host2/rport-2:0-6/target2:0:4/2:0:4:32/state
CPU 55
Modules linked in: bridge oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) hangcheck_timer krg_11_0_0_1130_impRHEL6K1smp-x86_64(P)(U) mptctl mptbase oracleasm(U) bonding 8021q garp stp llc ipv6 ext3 jbd microcode be2net iTCO_wdt iTCO_vendor_support serio_raw lpc_ich mfd_core hpwdt hpilo i7core_edac edac_core e1000e ptp pps_core ses enclosure ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler sg bnx2 shpchp ext4 jbd2 mbcache dm_round_robin sr_mod cdrom sd_mod pata_acpi ata_generic ata_piix lpfc scsi_transport_fc scsi_tgt crc_t10dif hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 833, comm: kblockd/55 Tainted: P --------------- 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL980 G7
RIP: 0010:[<ffffffff8126eafb>]  [<ffffffff8126eafb>] blk_start_request+0x4b/0x50
RSP: 0018:ffff881fd2f59c50  EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff880668431690 RCX: 000000000000cb38
RDX: 000107c1fd5fbd2e RSI: ffff88c070dc0000 RDI: ffff880668431690
RBP: ffff881fd2f59c60 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff881fced65e20
R13: 000000000000000e R14: 000000000000000e R15: ffff881fced94e68
FS:  0000000000000000(0000) GS:ffff88c070dc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f8714c81978 CR3: 000000c6c4dc7000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kblockd/55 (pid: 833, threadinfo ffff881fd2f58000, task ffff881fd2f57500)
Stack:
 ffff881fd2f59c70 ffff880668431690 ffff881fd2f59ca0 ffffffff81273a39
<d> ffff881fd2f59c90 ffff881fced6f000 ffff881fced94e68 ffff880668431690
<d> ffff881fd1973000 ffff88c070dd9ac8 ffff881fd2f59d10 ffffffff813875c1
Call Trace:
 [<ffffffff81273a39>] blk_queue_start_tag+0x89/0x120
 [<ffffffff813875c1>] scsi_request_fn+0x131/0x750
 [<ffffffff8108748d>] ? del_timer+0x7d/0xe0
 [<ffffffff8126f562>] __generic_unplug_device+0x32/0x40
 [<ffffffff8126f59e>] generic_unplug_device+0x2e/0x50
 [<ffffffff8126b3e4>] blk_unplug+0x34/0x70
 [<ffffffffa000461c>] dm_table_unplug_all+0x5c/0x100 [dm_mod]
 [<ffffffff8126b440>] ? blk_unplug_work+0x0/0x70
 [<ffffffff8126f562>] ? __generic_unplug_device+0x32/0x40
 [<ffffffff8126b440>] ? blk_unplug_work+0x0/0x70
 [<ffffffffa0000fa6>] dm_unplug_all+0x36/0x50 [dm_mod]
 [<ffffffff8126b476>] blk_unplug_work+0x36/0x70
 [<ffffffff8126b440>] ? blk_unplug_work+0x0/0x70
 [<ffffffff81097fe0>] worker_thread+0x170/0x2a0
 [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81097e70>] ? worker_thread+0x0/0x2a0
 [<ffffffff8109e66e>] kthread+0x9e/0xc0
 [<ffffffff8100c20a>] child_rip+0xa/0x20
 [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
 [<ffffffff8100c200>] ? child_rip+0x0/0x20
Code: 8b 83 50 01 00 00 48 85 c0 75 15 f6 43 48 01 75 1a 48 89 df e8 f7 9a 00 00 48 83 c4 08 5b c9 c3 8b 50 54 89 90 14 01 00 00 eb e0 <0f> 0b eb fe 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 31 c0
RIP  [<ffffffff8126eafb>] blk_start_request+0x4b/0x50
 RSP <ffff881fd2f59c50>

I only have this one; we have the timeouts set such that oracle usually reboots the node before the panic, as the panic leaves the node hung up. Our kdump setup does not appear to have enough memory allocated to kdump (it OOMs when bringing in the ~900 SAN luns) and hangs the node rather than doing anything. Not sure how to change that in rhel6 since that is all supposed to be automagic; probably a bug in rhel6.6 kdump, I assume.
If we really end up needing much more, I will have to have them upgrade to 6.7, as 6.6 attempts to dump hugepages, which are a significant size on this node; that bug is fixed in 6.7+, and possibly the other kdump issue as well.
void blk_start_request(struct request *req)
{
	blk_dequeue_request(req);

	/*
	 * We are now handing the request to the hardware, initialize
	 * resid_len to full count and add the timeout handler.
	 */
	req->resid_len = blk_rq_bytes(req);
	if (unlikely(blk_bidi_rq(req)))
		req->next_rq->resid_len = blk_rq_bytes(req->next_rq);

	BUG_ON(test_bit(REQ_ATOM_COMPLETE, &req->atomic_flags));    <==== HERE
	blk_add_timer(req);
}

The machine crashed because the request already had the completion bit set before the timer was started. The bit is cleared when the request is allocated and initialized (memset to 0), when it is requeued (after the timer is stopped), or when the timer is restarted in certain cases (e.g. due to an FC port being in the blocked state and the devloss timer not yet having expired).

This will be very difficult to diagnose without a crash dump to examine. I suspect it was caused by the lpfc driver somehow completing the request after it had been aborted and requeued (and thus the REQ_ATOM_COMPLETE bit had been reset). There might be some evidence of this earlier in the console output; is there anything in there about any aborted commands?
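The role of the completion bit can be sketched as a test-and-set handshake (a simplified model, not the actual blk-core implementation): whichever side flips the bit first, normal completion or the timeout handler, owns the request's completion, and the BUG_ON above corresponds to finding the bit still set when a request is (re)started.

```c
/* Simplified model of the REQ_ATOM_COMPLETE handshake, using C11
 * atomics in place of the kernel's test_and_set_bit(). */
#include <assert.h>
#include <stdatomic.h>

static atomic_int req_atom_complete;   /* models REQ_ATOM_COMPLETE */

/* Returns 1 if this caller won the right to complete the request;
 * a second caller (a double completion) gets 0 and must back off. */
static int try_complete(atomic_int *bit)
{
    return atomic_exchange(bit, 1) == 0;
}

/* blk_start_request()'s sanity check: the bit must be clear. */
static int start_request_ok(atomic_int *bit)
{
    return atomic_load(bit) == 0;
}

/* Requeue clears the bit (after the timer has been stopped). */
static void requeue(atomic_int *bit)
{
    atomic_store(bit, 0);
}
```

In this model, a driver that completes a request after it was aborted and requeued would leave the bit set again behind the block layer's back, so the next start_request_ok() check fails, which is exactly the BUG_ON the first crash tripped.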
No aborts on the console or in messages documented in the several hours prior to the crash. I have 3 other similar crashes with the timeout set low.

<2>kernel BUG at block/blk-core.c:1144!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 60
<4>Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) hangcheck_timer krg_11_0_0_1130_impRHEL6K1smp-x86_64(P)(U) mptctl mptbase nfsd exportfs oracleasm(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 ext3 jbd dm_round_robin iTCO_wdt iTCO_vendor_support be2net ixgbe dca mdio e1000e ptp pps_core microcode ipmi_devintf serio_raw lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg power_meter acpi_ipmi ipmi_si ipmi_msghandler bnx2 shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod lpfc scsi_transport_fc scsi_tgt crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 1030, comm: kblockd/60 Tainted: P --------------- 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL980 G7
<4>RIP: 0010:[<ffffffff8126ee84>]  [<ffffffff8126ee84>] blk_requeue_request+0x94/0xa0
<4>RSP: 0018:ffff881fd2aa3bc0  EFLAGS: 00010006
<4>RAX: ffff88c3bec68f60 RBX: ffff88c3bec68e38 RCX: ffff88c3bec68f60
<4>RDX: ffff88c3bec68f60 RSI: ffff88c3bec68e38 RDI: ffff88c3bec68e38
<4>RBP: ffff881fd2aa3be0 R08: 0000000000000001 R09: 000000000000003c
<4>R10: 0000000000000001 R11: 0000000000000000 R12: ffff881fce404678
<4>R13: 0000000000000000 R14: ffff881fcf314000 R15: ffff886e9ed89580
<4>FS:  0000000000000000(0000) GS:ffff88c070c00000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000089ca3014 CR3: 0000000e9f403000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process kblockd/60 (pid: 1030, threadinfo ffff881fd2aa2000, task ffff881fd2aa1500)
<4>Stack:
<4> ffffffffffffff04 ffff881fcebf4000 ffff881fce404678 ffff88c3bec68e38
<4><d> ffff881fd2aa3c50 ffffffff81387665 ffff881fd2aa3d00 ffff88c73cd0e070
<4><d> ffff883fd2a90280 0000000000000004 ffff881fcebf4138 ffff881fcebf4048
<4>Call Trace:
<4> [<ffffffff81387665>] scsi_request_fn+0x1d5/0x750
<4> [<ffffffff8126f3c1>] __blk_run_queue+0x31/0x40
<4> [<ffffffff8126a89a>] elv_insert+0xfa/0x190
<4> [<ffffffff8126a970>] __elv_add_request+0x40/0x90
<4> [<ffffffff8126edad>] blk_insert_cloned_request+0x7d/0xc0
<4> [<ffffffffa000315c>] dm_dispatch_request+0x3c/0x70 [dm_mod]
<4> [<ffffffffa0003c37>] dm_request_fn+0x187/0x2f0 [dm_mod]
<4> [<ffffffff8126b440>] ? blk_unplug_work+0x0/0x70
<4> [<ffffffff8126f562>] __generic_unplug_device+0x32/0x40
<4> [<ffffffff8126f59e>] generic_unplug_device+0x2e/0x50
<4> [<ffffffff8126b440>] ? blk_unplug_work+0x0/0x70
<4> [<ffffffffa0000fb5>] dm_unplug_all+0x45/0x50 [dm_mod]
<4> [<ffffffff8126b476>] blk_unplug_work+0x36/0x70
<4> [<ffffffff8126b440>] ? blk_unplug_work+0x0/0x70
<4> [<ffffffff81097fe0>] worker_thread+0x170/0x2a0
<4> [<ffffffff8109eb00>] ? autoremove_wake_function+0x0/0x40
<4> [<ffffffff81097e70>] ? worker_thread+0x0/0x2a0
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 00 00 eb d1 4c 8b 2d 1c 3a 96 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
<1>RIP  [<ffffffff8126ee84>] blk_requeue_request+0x94/0xa0
<4> RSP <ffff881fd2aa3bc0>

And I have this one:

<2>kernel BUG at block/blk-core.c:1144!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/system/cpu/online
<4>CPU 91
<4>Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) hangcheck_timer mptctl mptbase oracleasm(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 ext3 jbd dm_round_robin iTCO_wdt iTCO_vendor_support microcode ipmi_devintf serio_raw lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg power_meter acpi_ipmi ipmi_si ipmi_msghandler ixgbe dca ptp pps_core mdio be2net bnx2 shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod lpfc scsi_transport_fc scsi_tgt crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 3310, comm: scsi_eh_3 Tainted: P --------------- 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL980 G7
<4>RIP: 0010:[<ffffffff8126ee84>]  [<ffffffff8126ee84>] blk_requeue_request+0x94/0xa0
<4>RSP: 0018:ffff881fce925d70  EFLAGS: 00010097
<4>RAX: ffff880e536f54a8 RBX: ffff880e536f5380 RCX: ffff880e536f54a8
<4>RDX: ffff880e536f54a8 RSI: ffff880e536f5380 RDI: ffff880e536f5380
<4>RBP: ffff881fce925d90 R08: ffff881fce925e90 R09: 0000000000000000
<4>R10: 0000000000000002 R11: 0000000000000000 R12: ffff881fce419328
<4>R13: 0000000000000000 R14: ffff881fce419328 R15: ffff881fd169b000
<4>FS:  0000000000000000(0000) GS:ffff882070ec0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007fa7239e32b8 CR3: 0000006e20455000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process scsi_eh_3 (pid: 3310, threadinfo ffff881fce924000, task ffff881fcefde040)
<4>Stack:
<4> 0000000000001057 0000000000000286 ffff882ddcefac80 ffff881fce404800
<4><d> ffff881fce925de0 ffffffff8138881b ffff881fce925dd0 ffff881fce925e80
<4><d> ffff881fce404800 ffff882ddcefac80 ffff881fce925e78 ffff881fce925e90
<4>Call Trace:
<4> [<ffffffff8138881b>] __scsi_queue_insert+0x9b/0x140
<4> [<ffffffff81388f93>] scsi_queue_insert+0x13/0x20
<4> [<ffffffff81384093>] scsi_eh_flush_done_q+0x93/0x150
<4> [<ffffffff81385e11>] scsi_error_handler+0x3d1/0x7c0
<4> [<ffffffff81385a40>] ? scsi_error_handler+0x0/0x7c0
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 00 00 eb d1 4c 8b 2d 1c 3a 96 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
<1>RIP  [<ffffffff8126ee84>] blk_requeue_request+0x94/0xa0

And this one:

<2>kernel BUG at block/blk-core.c:1144!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/devices/virtual/net/bond0/carrier
<4>CPU 20
<4>Modules linked in: oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) hangcheck_timer mptctl mptbase oracleasm(U) nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 ext3 jbd dm_round_robin iTCO_wdt iTCO_vendor_support microcode ipmi_devintf serio_raw lpc_ich mfd_core hpilo hpwdt i7core_edac edac_core sg power_meter acpi_ipmi ipmi_si ipmi_msghandler ixgbe dca ptp pps_core mdio be2net bnx2 shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod lpfc scsi_transport_fc scsi_tgt crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 3346, comm: scsi_eh_5 Tainted: P --------------- 2.6.32-504.1.3.el6.x86_64 #1 HP ProLiant DL980 G7
<4>RIP: 0010:[<ffffffff8126ee84>]  [<ffffffff8126ee84>] blk_requeue_request+0x94/0xa0
<4>RSP: 0018:ffff881fd00c9d70  EFLAGS: 00010006
<4>RAX: ffff884de19a44a8 RBX: ffff884de19a4380 RCX: ffff884de19a44a8
<4>RDX: ffff884de19a44a8 RSI: ffff884de19a4380 RDI: ffff884de19a4380
<4>RBP: ffff881fd00c9d90 R08: ffff881fd00c9e90 R09: 0000000000000000
<4>R10: 0000000000000002 R11: 0000000000000000 R12: ffff885fd1dbe778
<4>R13: 0000000000000000 R14: ffff885fd1dbe778 R15: ffff885fd2b50000
<4>FS:  0000000000000000(0000) GS:ffff884070c00000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007fe796932008 CR3: 0000002e67737000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process scsi_eh_5 (pid: 3346, threadinfo ffff881fd00c8000, task ffff881fd0845500)
<4>Stack:
<4> 0000000000001057 0000000000000286 ffff884de1f07dc0 ffff885fd3fdc000
<4><d> ffff881fd00c9de0 ffffffff8138881b ffff881fd00c9dd0 ffff881fd00c9e80
<4><d> ffff885fd3fdc000 ffff884de1f07dc0 ffff881fd00c9e78 ffff881fd00c9e90
<4>Call Trace:
<4> [<ffffffff8138881b>] __scsi_queue_insert+0x9b/0x140
<4> [<ffffffff81388f93>] scsi_queue_insert+0x13/0x20
<4> [<ffffffff81384093>] scsi_eh_flush_done_q+0x93/0x150
<4> [<ffffffff81385e11>] scsi_error_handler+0x3d1/0x7c0
<4> [<ffffffff81385a40>] ? scsi_error_handler+0x0/0x7c0
<4> [<ffffffff8109e66e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c20a>] child_rip+0xa/0x20
<4> [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c200>] ? child_rip+0x0/0x20
<4>Code: 00 00 eb d1 4c 8b 2d 1c 3a 96 00 4d 85 ed 74 bf 49 8b 45 00 49 83 c5 08 48 89 de 4c 89 e7 ff d0 49 8b 45 00 48 85 c0 75 eb eb a4 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00
<1>RIP  [<ffffffff8126ee84>] blk_requeue_request+0x94/0xa0
<4> RSP <ffff881fd00c9d70>
Hello Roger,

A couple of quick questions, because I have been watching the BZ here and have fallen behind the current configuration status:

What is the current configuration with respect to the settings for eh_deadline and eh_timeout?
What is the current setting within multipath.conf for dev_loss_tmo etc.?
What is the current Oracle disk heartbeat timer set to here?

You are statistically within a very tight margin with these very low scsi timeouts, where you are now exposed to these timing races. How long is the maximum SCSI error recovery time allowed here before we have evictions? I know you have these very low SCSI timeouts for that reason.

Many Thanks
Laurence Oberman
We have not changed eh* from the defaults. dev_loss_tmo is, I believe, 30. We are not getting any declared timeouts on the SAN, and no scsi errors in general; when we do get scsi errors they take 24 seconds. The disk timeout on some of the nodes is as high as 120 seconds, with similar values for oracle's hb timers. When it hits 120 seconds, oracle usually (but not always) evicts the node. If we set the oracle timeout a few seconds higher, we hit the scsi bugs.
Created attachment 1217410 [details] Commands requested through 01557267
Some of the array vendors have a SAN "jammer". I believe it is basically a SAN analyzer with some sort of license option that allows one to cause a SAN packet to get misplaced. Based on what we think is going on, losing a read or a write often enough may be enough to reproduce this error, as that is what our cleanup would have reduced. Not sure if RedHat can borrow the device from an array or switch vendor, or if RedHat knows of someone else with the device.

One might be able to insert something in the kernel to randomly lose a SAN read or write packet, just to attempt to simulate SAN packet loss caused by congestion in the SAN fabric. Thinking about it, adding some magic in the kernel may be simpler; the vendor indicated the "license" was close to the cost of the analyzer. Something like the scsi_dh module, with the ability to make a clean SAN dirty and possibly some limited ability to drop packets based on a rule. That may give redhat a better ability to run tests against updated/fixed modules, to make sure things correctly handle the sort of weirdness one gets in a large SAN when things are sub-optimal.
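The "randomly lose a packet" idea boils down to a drop-rate policy hooked into the dispatch or completion path. A minimal sketch of just the policy (the function name and per-mille knob are made up for illustration; this is userspace code modeling the decision, not a kernel hook):

```c
/* Toy "jammer" policy: decide whether to drop a given SCSI command,
 * with a configurable drop rate expressed per thousand commands.
 * A real fault injector would make this decision inside the driver's
 * dispatch or completion path; this only models the dice roll. */
#include <assert.h>
#include <stdlib.h>

static int should_drop(unsigned int drop_per_mille)
{
    return (unsigned int)(rand() % 1000) < drop_per_mille;
}
```

With a seeded PRNG this gives a repeatable loss pattern, which matters for reproducing a timing-dependent race: the same drop sequence can be replayed against a patched and an unpatched kernel.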
Roger,

I have developed such an option. It runs via the tcm_qla2xxx target driver and allows SCSI commands to a tcm array to be discarded with some control. I have been chatting with Ewan about possibly setting something up to reproduce this. How many discards or drops do you think are needed to reproduce? Any details you can provide will help in my efforts to use my jammer here.

Thanks
Laurence
Hi Ewan,

After realizing latest 4.9 is unstable with the jammer patch, I reverted to the one I used when testing prior to upstream submission, version 4.5.1.

I have 11 LIO Target LUNS. I am running 11 direct_io reads to the mpath devices, and 1 oflag write job to an FS on one of the mpaths. I have tried with eh_timeout and the scsi timeout at 2s, and with defaults of 10s and 30s. I am discarding only non-TUR commands on the jammer and see the I/O stall, then recover. Depending on how long I drop the commands, I will get a stall and then it continues with no logging, or I will get the recovery of the adapter with a reset (lpfc), and this is logged.

[root@fcoe-test-rhel6 data1]# uname -a
Linux fcoe-test-rhel6 2.6.32-504.1.3.el6.x86_64 #1 SMP Fri Oct 31 11:37:10 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

[root@fcoe-test-rhel6 ~]# cat ./set_eh_timeout+scsi_timeout.sh
#!/bin/bash
for d in /sys/block/sd*
do
    echo 2 > $d/device/eh_timeout
    echo 2 > $d/device/timeout
done

With the above tuning and 15s blocks I get the hard sd device I/O errors. This will temporarily lose the path on this failure:

[ 2062.043034] lpfc 0000:08:00.0: 0:(0):3053 lpfc_log_verbose changed from 0 (x0) to 4115 (x1013)
[ 2062.091842] lpfc 0000:05:00.0: 1:(0):3053 lpfc_log_verbose changed from 0 (x0) to 4115 (x1013)
[ 2218.681126] sd 3:0:0:6: [sdad] Result: hostbyte=DID_REQUEUE driverbyte=DRIVER_OK
[ 2218.723973] sd 3:0:0:6: [sdad] CDB: Write(10): 2a 00 00 2b fb a0 00 00 10 00
[ 2218.763554] end_request: I/O error, dev sdad, sector 2882464
[ 2218.795537] sd 3:0:0:6: [sdad]
[ 2218.795589] device-mapper: multipath: Failing path 65:208.
[ 2218.844046] Result: hostbyte=DID_REQUEUE driverbyte=DRIVER_OK
[ 2218.875699] sd 3:0:0:6: [sdad] CDB: Write(10): 2a 00 00 2c 2b a0 00 00 10 00
[ 2218.918841] end_request: I/O error, dev sdad, sector 2894752
[ 2218.950428] sd 3:0:0:6: [sdad] Result: hostbyte=DID_REQUEUE driverbyte=DRIVER_OK
[ 2218.991945] sd 3:0:0:6: [sdad] CDB: Write(10): 2a 00 00 2c 99 80 00 00 10 00
[ 2219.036910] end_request: I/O error, dev sdad, sector 2922880
[ 2220.585006] sd 3:0:0:6: alua: port group 00 state A non-preferred supports TOlUSNA

However, I have been unable to reproduce this panic so far.

Thanks
Laurence
OK, thanks. Let's let it run for a while and see what we get.
Still busy here. Moved to a dual-port 8G LPFC; I was only running a single port before because one of the ports was faulty. Thanks, Laurence
Hello Roger,

After many attempts I have not yet had success in reproducing here. I have tried with very short timeouts per below:

#!/bin/bash
for d in /sys/block/sd*
do
    echo 2 > $d/device/eh_timeout
    echo 2 > $d/device/timeout
done

I have tried with no eh_deadline or timeout tuning as well, so at defaults, and still have been unable to reproduce. I watch the kernel and see the impact of the discards, but after a timeout period the stack recovers.

What is clear to me, and I have been down this same road recently with customers running large Oracle RAC configurations, is that you have to tune the css misscount. The oracle default of 27s is simply too low for multipath to reconfigure on path loss etc. Most customers these days are running with Oracle voting heartbeats at 90s and above.

If you have any more details to share that you think would be helpful about your configuration and how this plays out, please let me know.

My configuration (most recent): host with 100 LUNS, 4 paths, so 400 sd devices, accessed via F/C lpfc 4G on a target LIO array running my jammer code.

host port 1 -> DS5000 switch -> targetLIO array
host port 2 -> DS5000 switch -> targetLIO array

Running multibus:

3600140570c3b35be40245618bef4e8e9 dm-15 LIO-ORG,block-6
size=20G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 12:0:1:5 sdz  65:144 active ready running
  |- 13:0:0:5 sdap 66:144 active ready running
  |- 12:0:0:5 sdj  8:144  active ready running
  `- 13:0:1:5 sdbf 67:144 active ready running

Jamming is done to a specific array port and commands are discarded. This was done jamming all commands, and then only data movement commands.

Thanks
Laurence and Ewan
Dick, the following patch went into RHEL7.4 as part of the lpfc driver update:

commit 8ed2b039a3a7751d5b436b57fcaa5a256b04bcd9
Author: Rob Evers <revers>
Date:   Tue Jan 31 18:18:39 2017 -0500

    [scsi] lpfc: Correct panics with eh_timeout and eh_deadline

    Message-id: <1485886737-16352-39-git-send-email-revers>
    Patchwork-id: 164440
    O-Subject: [RHEL7.4 e-stor PATCH 38/56] scsi: lpfc: Correct panics with eh_timeout and eh_deadline
    Bugzilla: 1382101
    RH-Acked-by: Ewan Milne <emilne>
    RH-Acked-by: Maurizio Lombardi <mlombard>
    RH-Acked-by: Jarod Wilson <jarod>

    From: James Smart <james.smart>

    Correct panics with eh_timeout and eh_deadline

    We were having double completions on our SLI-3 version of adapters.
    Solved by clearing our command pointer before calling scsi_done.
    The eh paths potentially ran simulatenously and would see the non-null
    value and invoke scsi_done again.

    Signed-off-by: Dick Kennedy <dick.kennedy>
    Signed-off-by: James Smart <james.smart>
    Reviewed-by: Johannes Thumshirn <jthumshirn>
    Reviewed-by: Hannes Reinecke <hare>
    Signed-off-by: Martin K. Petersen <martin.petersen>
    (cherry picked from commit 89533e9be08aeda5cdc4600d46c1540c7b440299)
    Signed-off-by: Rob Evers <revers>
    Signed-off-by: Rafael Aquini <aquini>

Do you think it would be wise to put this into RHEL6? We have seen some cases of duplicate completions with lpfc on RHEL6, especially with short timeouts.
Ewan, I have a critical customer case seeing a similar issue with SLI-3.

Kernel: 2.6.32-642.6.2.el6.x86_64
Host template: hostt = 0xffffffffa0116740 <lpfc_template_s3>

We will need to get this into RHEL6.8+.

crash> bt
PID: 0      TASK: ffffffff81a95020  CPU: 0   COMMAND: "swapper"
 #0 [ffff88009a203a90] machine_kexec at ffffffff8103fdcb
 #1 [ffff88009a203af0] crash_kexec at ffffffff810d1dc2
 #2 [ffff88009a203bc0] oops_end at ffffffff8154d110
 #3 [ffff88009a203bf0] die at ffffffff8101102b
 #4 [ffff88009a203c20] do_trap at ffffffff8154c964
 #5 [ffff88009a203c80] do_invalid_op at ffffffff8100cd95
 #6 [ffff88009a203d20] invalid_op at ffffffff8100c01b
    [exception RIP: blk_requeue_request+148]
    RIP: ffffffff8127c554  RSP: ffff88009a203dd0  RFLAGS: 00010093
    RAX: ffff8802ef189a08  RBX: ffff8802ef1898e0  RCX: ffff8802ef189a08
    RDX: ffff8802ef189a08  RSI: ffff8802ef1898e0  RDI: ffff8802ef1898e0
    RBP: ffff88009a203df0  R8:  ffff88200b0a1178  R9:  0000000000000000
    R10: ffff88200f64f5e0  R11: 0000000000000000  R12: ffff88200a52cba8
    R13: 0000000000000000  R14: ffff88200a52cba8  R15: ffff88200f652000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88009a203df8] __scsi_queue_insert at ffffffff813a1cdb
 #8 [ffff88009a203e48] scsi_queue_insert at ffffffff813a2453
 #9 [ffff88009a203e58] scsi_softirq_done at ffffffff813a253d
#10 [ffff88009a203e88] blk_done_softirq at ffffffff81285a05
#11 [ffff88009a203eb8] __do_softirq at ffffffff81085275
#12 [ffff88009a203f38] call_softirq at ffffffff8100c38c
#13 [ffff88009a203f50] do_softirq at ffffffff8100fca5
#14 [ffff88009a203f70] irq_exit at ffffffff81085105
#15 [ffff88009a203f80] do_IRQ at ffffffff81552c65
--- <IRQ stack> ---
#16 [ffffffff81a03d68] ret_from_intr at ffffffff8100ba53
    [exception RIP: intel_idle+254]
    RIP: ffffffff812fc0be  RSP: ffffffff81a03e18  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: ffffffff81a03ea8  RCX: 0000000000000000
    RDX: 00000000000002f5  RSI: 0000000000000000  RDI: 00000000000b8d36
    RBP: ffffffff8100ba4e  R8:  0000000000000005  R9:  0000000000000386
    R10: 000149477ed15797  R11: 0000000000000000  R12: ffff88009a211b80
    R13: ffffffff81a03da8  R14: ffffffff810b71ec  R15: ffffffff81a03d98
    ORIG_RAX: ffffffffffffffac  CS: 0010  SS: 0018
#17 [ffffffff81a03eb0] cpuidle_idle_call at ffffffff81441d0a
#18 [ffffffff81a03ed0] cpu_idle at ffffffff81009fe6

The faulting function:

    void blk_requeue_request(struct request_queue *, struct request *);

crash> request ffff8802ef1898e0
struct request {
  ...
  q = 0xffff88200a52cba8,
  ...
  rq_disk = 0xffff882009f62400,

crash> gendisk 0xffff882009f62400
struct gendisk {
  major = 135,
  first_minor = 48,
  minors = 16,
  disk_name = "sdij",

crash> dev -d | grep sdij
135 ffff882009f62400 sdij ffff88200a52cba8 1 0 1 0

In the vmcore sdij is online:

360050768018e02d61800000000000a18 (360050768018e02d61800000000000a18) dm-2 IBM 2145
size=512000.00M features='1 queue_if_no_path' hwhandler=None
+- policy='round-robin'
`- 1:0:0:0 sdb    8:16    [scsi_device: 0xffff884013303000 sdev_state: SDEV_RUNNING]
`- 2:0:0:0 sdrr   134:336 [scsi_device: 0xffff884013276000 sdev_state: SDEV_RUNNING]
+- policy='round-robin'
`- 1:0:1:0 sdij   135:48  [scsi_device: 0xffff88200a50f000 sdev_state: SDEV_RUNNING]
`- 2:0:1:0 sdaaz  133:624 [scsi_device: 0xffff882008ddd800 sdev_state: SDEV_RUNNING]

Why did we panic?

    /**
     * blk_requeue_request - put a request back on queue
     * @q: request queue where request should be inserted
     * @rq: request to be inserted
     *
     * Description:
     *    Drivers often keep queueing requests until the hardware cannot accept
     *    more, when that condition happens we need to put the request back
     *    on the queue. Must be called with queue lock held.
     */
    void blk_requeue_request(struct request_queue *q, struct request *rq)
    {
            blk_delete_timer(rq);
            blk_clear_rq_complete(rq);
            trace_block_rq_requeue(q, rq);

            if (blk_rq_tagged(rq))
                    blk_queue_end_tag(q, rq);

            BUG_ON(blk_queued_rq(rq));    <==== UD2, this is where we died

            elv_requeue_request(q, rq);
    }

The request's queuelist:

    struct request {
      queuelist = {
        next = 0xffff88200a52cba8,
        prev = 0xffff88200a52cba8

    #define blk_queued_rq(rq)  (!list_empty(&(rq)->queuelist))

Note that list_empty() returns (head->next == head). Here next and prev point at the request queue (0xffff88200a52cba8), not back at the queuelist head itself, so the list is NOT empty, blk_queued_rq() is true, and we hit the BUG_ON: the request being requeued is still linked onto a queue.

Other notes
-----------
We had issues on both scsi host1 and scsi host2. We started with offline devices on host1 and then we timed out on host2.

sd 1:0:1:65: Device offlined - not ready after error recovery
sd 1:0:1:1: Device offlined - not ready after error recovery
sd 1:0:1:1: Device offlined - not ready after error recovery
sd 1:0:1:1: Device offlined - not ready after error recovery
sd 1:0:1:1: Device offlined - not ready after error recovery
sd 1:0:1:1: Device offlined - not ready after error recovery
sd 1:0:1:1: Device offlined - not ready after error recovery
sd 1:0:1:65: [sdkw] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 1:0:1:65: [sdkw] CDB: Write(10): 2a 00 15 56 dd b0 00 00 18 00
end_request: I/O error, dev sdkw, sector 358014384
device-mapper: multipath: Failing path 67:320.
sd 1:0:1:1: [sdik] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 1:0:1:1: [sdik] CDB: Read(10): 28 00 00 58 0a 00 00 02 00 00
end_request: I/O error, dev sdik, sector 5769728
sd 1:0:1:1: [sdik] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 1:0:1:1: [sdik] CDB: Read(10): 28 00 00 58 0e 00 00 02 00 00
end_request: I/O error, dev sdik, sector 5770752
sd 1:0:1:1: [sdik] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 1:0:1:1: [sdik] CDB: Read(10): 28 00 00 58 08 00 00 02 00 00
end_request: I/O error, dev sdik, sector 5769216
sd 1:0:1:1: [sdik] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 1:0:1:1: [sdik] CDB: Read(10): 28 00 00 58 0c 00 00 02 00 00
end_request: I/O error, dev sdik, sector 5770240
sd 1:0:1:1: [sdik] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 1:0:1:1: [sdik] CDB: Read(10): 28 00 00 58 08 00 00 02 00 00
end_request: I/O error, dev sdik, sector 5769216
sd 1:0:1:1: [sdik] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 1:0:1:1: [sdik] CDB: Read(10): 28 00 00 58 0c 00 00 02 00 00
end_request: I/O error, dev sdik, sector 5770240
device-mapper: multipath: Failing path 135:64.
sd 2:0:1:65: Device offlined - not ready after error recovery
sd 2:0:1:1: Device offlined - not ready after error recovery
sd 2:0:1:1: Device offlined - not ready after error recovery
sd 2:0:1:1: Device offlined - not ready after error recovery
sd 2:0:1:1: Device offlined - not ready after error recovery
sd 2:0:1:1: Device offlined - not ready after error recovery
sd 2:0:1:1: Device offlined - not ready after error recovery
sd 2:0:1:1: Device offlined - not ready after error recovery
sd 2:0:1:65: [sdadm] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 2:0:1:65: [sdadm] CDB: Write(10): 2a 00 16 86 e8 28 00 00 28 00
end_request: I/O error, dev sdadm, sector 377940008
sd 2:0:1:1: [sdaba] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 2:0:1:1: [sdaba] CDB: Read(10): 28 00 00 58 08 00 00 02 00 00
end_request: I/O error, dev sdaba, sector 5769216
device-mapper: multipath: Failing path 65:896.
sd 2:0:1:1: [sdaba] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
sd 2:0:1:1: [sdaba] CDB: Read(10): 28 00 00 58 0c 00 00 02 00 00
end_request: I/O error, dev sdaba, sector 5770240
device-mapper: multipath: Failing path 133:640.
sd 2:0:1:8: timing out command, waited 7s
sd 2:0:1:152: timing out command, waited 7s
device-mapper: multipath: Failing path 133:752.
device-mapper: multipath: Failing path 70:1008.
sd 1:0:1:212: timing out command, waited 7s
device-mapper: multipath: Failing path 132:368.
sd 2:0:1:169: timing out command, waited 7s
device-mapper: multipath: Failing path 128:768.
sd 1:0:1:216: timing out command, waited 7s
device-mapper: multipath: Failing path 132:432.
sd 2:0:1:170: timing out command, waited 7s
device-mapper: multipath: Failing path 128:784.
sd 1:0:1:11: timing out command, waited 7s
sd 1:0:1:158: timing out command, waited 7s
sd 1:0:1:172: timing out command, waited 7s
device-mapper: multipath: Failing path 135:224.
device-mapper: multipath: Failing path 129:272.
device-mapper: multipath: Failing path 129:496.
sd 2:0:1:209: timing out command, waited 7s
device-mapper: multipath: Failing path 134:544.
device-mapper: multipath: Failing path 134:688.
device-mapper: multipath: Failing path 70:896.
device-mapper: multipath: Failing path 70:944.
device-mapper: multipath: Failing path 70:880.
device-mapper: multipath: Failing path 71:768.
device-mapper: multipath: Failing path 71:896.
device-mapper: multipath: Failing path 71:928.
device-mapper: multipath: Failing path 128:832.
device-mapper: multipath: Failing path 128:864.
device-mapper: multipath: Failing path 129:784.
device-mapper: multipath: Failing path 130:896.
sd 1:0:1:124: timing out command, waited 7s
sd 1:0:1:200: timing out command, waited 7s
device-mapper: multipath: Failing path 70:496.
device-mapper: multipath: Failing path 131:432.

At the time of the panic host2 was in recovery:

HOST    DRIVER
NAME    NAME    Scsi_Host         shost_data        &.hostdata[0]
-------------------------------------------------------------------
host2   lpfc    ffff88200f64f000  ffff88200b27f000  ffff88200f64f5e0

DRIVER VERSION : 0:11.0.0.4
HOST BUSY      : 226
HOST BLOCKED   : 0
HOST FAILED    : 224
SELF BLOCKED   : 0
SHOST STATE    : SHOST_RECOVERY
MAX LUN        : 4096
CMD/LUN        : 3
WORK Q NAME    : scsi_wq_2
Dick, see comment # 58. This patch should go into RHEL6.10.
(In reply to Ewan D. Milne from comment #67)
> Dick, see comment # 58. This patch should go into RHEL6.10.

Yes, it makes sense to put it in for the case that Laurence reported in comment 60. I am not sure if it will do anything for the original customer, because they were running on Lancer.

For SLI-4 I/Os, if you are running lots of I/Os and things start to get aborted, then the RRQ timer will be started for that Tgt/Lun/XRI combo. While the timer is active, that XRI cannot be used by that Tgt/Lun combo. If you have enough I/Os in flight and enough waiting for RRQ to clear them, then lpfc_get_scsi_buf could fail to get a usable buffer/XRI and fail the command in queuecommand.

I was thinking about the driver trying to find a buffer when this timeout happens and the abort is sent. The driver should fail the abort because it does not know about the command. But what if the timing is perfect and it finds a buffer right after the abort was sent? Then we are in trouble. How would we resolve this? Is the block layer holding the shost lock when it processes the queue for lpfc? In this race condition, could the abort beat the I/O? The abort handler is just using the hbalock for synchronization, and queuecommand relies on the caller holding the shost lock. lpfc_get_scsi_buf uses a get_buf_list lock.

Ewan, can you verify what locks are held by the mid-layer when issuing an I/O and when aborting it?
In RHEL6, the host_lock will be held during the call to lpfc_queuecommand because you do not specify .lockless = 1 in the host template. (A few RHEL6 drivers use this, but lpfc does not.)

The calls to the various ->eh_ routines, including ->eh_abort_handler(), occur while the SCSI EH thread is running and all other I/O is stopped on RHEL6, so you will not see other ->queuecommand() calls except for the ones that the EH generates (e.g. TEST UNIT READY). These all occur in the same EH thread and are synchronous, so you should not see any concurrency.

(RHEL7 is different: the ->eh_abort_handler() calls are made when the command timeout expires. It is possible to disable this by setting .no_async_abort = 1 in the host template, but so far we have not seen any problems that we are sure were caused by this. It remains an option for RHEL7, but will go away in the next major release.)

Where you can run into trouble is in the FC transport code, because there are multiple worker threads that can be running, and the different code paths take locks; we have seen problems with that. But I don't think that is the case here.

If the driver cannot issue an abort or handle one of the other ->eh_ functions, you should return a failure code and the SCSI EH will escalate. In the end you will get a reset if nothing else works.
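For reference, the two host-template knobs mentioned above would appear in a driver roughly like this. This is an illustrative fragment only, not lpfc's actual template: the handler names are placeholders, and the fields carry the RHEL-specific meanings described in the comment above (.lockless in RHEL6, .no_async_abort in RHEL7).

```c
static struct scsi_host_template example_template = {
    .name             = "example",
    .queuecommand     = example_queuecommand,   /* placeholder handler */
    .eh_abort_handler = example_eh_abort,       /* placeholder handler */

    /* RHEL6: opt out of the midlayer holding host_lock around
     * ->queuecommand(). lpfc does NOT set this, so lpfc_queuecommand
     * runs with host_lock held. */
    .lockless         = 1,

    /* RHEL7: disable asynchronous ->eh_abort_handler() calls at
     * command-timeout time, falling back to the serialized EH-thread
     * behavior described above. */
    .no_async_abort   = 1,
};
```

In other words, a driver that sets neither flag gets the most conservative behavior on RHEL6 (queuecommand under host_lock, all EH serialized in one thread) but gets asynchronous aborts on RHEL7 unless it opts out.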
I am not sure how I can help fix this issue if the command has not reached the lpfc driver. I don't remember seeing an issue like this, but we have several I/O path fixes that correct locking.

I would like to point out that the customer is still on 10.0.x.x firmware while the driver is 10.4.x.x; someone should get them up to date, or at least in sync. The firmware of a 10.x release should be paired with the driver from that release.
Thanks Dick, I relayed the requests to my customer. Will report back any news.
Please be aware, I have added the customer to the CC for this bug. I passed along the suggestions, and they have some concerns. The customer cited that "if they have specific firmware requirements for a given driver that they *MUST* bring the firmware with the driver and install it on driver load". I believe this is already done with qlogic. Additionally, an explanation was requested for the suggestion to sync the firmware and driver versions. Is there a known error/issue at play?
Dick, could you respond to the concerns in comment #72?
The lpfc driver does not load firmware. We have always used a utility like lputil and ocmanager to download firmware, and most of the HBAs require a reload of the driver to activate the new firmware.

The management tool:
https://www.broadcom.com/support/download-search/?pg=Storage+Adapters,+Controllers,+and+ICs&pf=Fibre+Channel+Host+Bus+Adapters&pn=LPe12002+FC+Host+Bus+Adapter&po=Emulex&pa=&dk=

The firmware image to download (LPe16000-series firmware and boot code version 10.4.255.23):
https://www.broadcom.com/support/download-search/?pg=&pf=Fibre+Channel+Host+Bus+Adapters&pn=LPe16002B+FC+Host+Bus+Adapter&po=Emulex&pa=&dk=
Since we get the driver with Red Hat (the inbox driver), firmware does not come with it at all. *IF* there is a firmware dependency in the driver, the driver should be informing us of that during startup, or should be loading the firmware itself. Is this a case where you know the firmware will not work, or a case where you have not certified the combination? Note that we will have firmware/driver combinations all over the place in our environment, and *ONLY* the machines with abnormally high batch loads and SANs that tend to sometimes lose packets have this crashing issue (3 pairs of database nodes out of >1000 database nodes).

So are you saying that this RHEL7 bug/patch:

    [scsi] lpfc: Correct panics with eh_timeout and eh_deadline

is not applicable to RHEL6.[789]? Because the description of the bug matches exactly what is happening when the machine crashes (timeouts and deadlines are triggering because of SAN packet loss). And we really cannot test this in prod; right now we have mostly stabilized it by increasing the timeouts to 120 seconds and rearranging the SAN to reduce the incidence of packet loss.
Hello Roger,

I hope all is well. As far as firmware is concerned, yes, it's harder on Emulex because the firmware is loaded from flash. Typically HPE, for example, if you were using their driver, would provide the matching driver and firmware, but with the inbox driver it's a tougher ask.

The fix below is valid for the SLI-3 adapters:

    Correct panics with eh_timeout and eh_deadline

    We were having double completions on our SLI-3 version of adapters.
    Solved by clearing our command pointer before calling scsi_done.
    The eh paths potentially ran simultaneously and would see the
    non-null value and invoke scsi_done again.

    Signed-off-by: Dick Kennedy <dick.kennedy>
    Signed-off-by: James Smart <james.smart>

    drivers/scsi/lpfc/lpfc_scsi.c |  6 +++---
    drivers/scsi/lpfc/lpfc_sli.c  | 12 ++++++++----
    2 files changed, 11 insertions(+), 7 deletions(-)

We had other issues as well with eh_deadline no longer being settable on RHEL7, where the echo would fail. Whereas on RHEL6 we have the silent issue where the SLI-3 adapters have the eh_host_reset_handler removed from the host template, but if you still had it enabled you could fall through to a NULL pointer and panic. This is not yet fixed in RHEL6 kernels but is in progress. Another one found by the genius of David Jeffery.

Your very low timeouts set previously are the catalyst for this race in this BZ. I think Dick Kennedy was just making a point that the firmware is way behind the driver here, but we do of course understand this can get out of sync sometimes:

    "I would like to point out that the customer is still on a 10.0.x.x fw and the driver is 10.4.x.x, someone should get them up to date or at least in sync. The fw of a 10.x release should have the driver from that release."

I don't think the firmware mismatch is the root of this very-low-timeout race as described originally by David Jeffery. The fix you called out is valid and needs to get into the RHEL6.9 zstream if possible.

Thanks,
Laurence
We'll include this patch in the RHEL6.10 and RHEL7.5 patchset. Laurie
Can the patches also be included in the 6.9 zstream, and possibly even in the 7.3 zstream? I know the customer here will be pressing hard for 6.9 and possibly 7.3.
From the initial description:

    Actual results: System crashes when using a very short SCSI timeout
    Expected results: System should continue to run even if an overly short SCSI timeout is selected.

The timeout value is unreasonably low. This problem is not going to be resolved in RHEL6.10.
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available. The official life cycle policy can be reviewed here: http://redhat.com/rhel/lifecycle This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL: https://access.redhat.com/