Description of problem:
Without any "slow request" information on the Ceph cluster side, we still see "Access Lost" messages in the Monitor tab on the VMware side. We are testing with a single path and a single iSCSI gateway node.

For example, from the VMware vmkernel.log we found the messages below:

2018-04-22T22:26:09.748Z cpu5:66599)NMP: nmp_ThrottleLogForDevice:3617: Cmd 0x28 (0x439541441500, 0) to dev "naa.600140551aee71664944242a4503557d" on path "vmhba64:C1:T0:L32" Failed: H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:NONE
2018-04-22T22:26:09.748Z cpu5:66599)ScsiDeviceIO: 2927: Cmd(0x439541441500) 0x28, CmdSN 0x5631 from world 0 to dev "naa.600140551aee71664944242a4503557d" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x76 0x6d 0x68.
2018-04-22T22:26:09.772Z cpu0:65662)HBX: 283: 'CEPH-LUN32': HB at offset 3276800 - Reclaimed heartbeat [Timeout]:
2018-04-22T22:26:09.772Z cpu0:65662) [HB state abcdef02 offset 3276800 gen 225 stampUS 38117571952 uuid 5adc77ba-8cc0f028-bcdd-2c55d3d66700 jrnl <FB 41943053> drv 24.81 lockImpl 4 ip 10.2.226.137]
2018-04-22T22:26:09.777Z cpu5:66599)ScsiDeviceIO: 2927: Cmd(0x439d46e1c740) 0x28, CmdSN 0xf3 from world 70149 to dev "naa.600140551aee71664944242a4503557d" failed H:0x0 D:0x28 P:0x0 Invalid sense data: 0x0 0x0 0x0.

From the gateway messages log, we found many ABORT_TASK entries. For example:

Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399889
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399889
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399893
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399893
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399894
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399894
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399895
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399895
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399896
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399896
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399897
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399897
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399898
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399898
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399899
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399899
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399900
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399900
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399909
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399909
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399910
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399910
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399911
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399911
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399912
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399912
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399913
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399913
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399914
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399914
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399915
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399915
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399916
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399916
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399925

Version-Release number of selected component (if applicable):

How reproducible:
100% reproducible

Steps to Reproduce:
1.
2.
3.

Actual results:
Path access is lost for an unknown reason.

Expected results:
Identify the reason path access is lost.

Additional info:
[collab-shell] https://gitlab.cee.redhat.com/gss-tools/collab-shell
# ssh your_kerb.redhat.com
# cd /cases/02083677
Did you only look for the "slow request" message in the logs? Did you get any of the other debugging info we talked about? I did not see any perf dump type of info in /cases/02083677.
On a side note (though this may actually make the problem worse), you are getting these errors:

failed H:0x0 D:0x28

because the tcmu-runner ring is too small. If you want to avoid them, open /etc/ceph/iscsi-gateway.cfg on the iSCSI target systems and add:

[target]
max_data_area_mb = N

Here N should be big enough to handle all the commands the initiators are going to send. It is only 8 MB by default. It would normally be something like:

MaxIoSizeKB * min(64, esx device queue depth) * connected initiators

You can get the IO size limit from:

esxcli system settings advanced list -o /ISCSI/MaxIoSizeKB

This assumes that the Ceph cluster can handle that many IOs. So if, as in that email thread, we think we might be over-queueing, then increasing this limit might cause more problems. Just an FYI for when we get that fixed.
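To make the sizing concrete, here is a hypothetical worked example of that formula. The numbers below are made up for illustration, not measured from this case:

MaxIoSizeKB (from esxcli)   = 128 KB
esx device queue depth      = 32, so min(64, 32) = 32
connected initiators        = 4

128 KB * 32 * 4 = 16384 KB = 16 MB

so you would set something like:

[target]
max_data_area_mb = 16

(or round up a bit for headroom) instead of the 8 MB default.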
Just a quick question, how can we get the value of esx device queue depth?
(In reply to Jiaju Zhang from comment #10)
> Just a quick question, how can we get the value of esx device queue depth?

Ignore that, I found it.
I am not going to change the component yet, but I think this might be a kernel bug. It is just too suspicious that, a lot of the time, right when we get an ABORT the command completes within the same second, so we are then able to complete the ABORT.

Note that there do seem to be other errors that are probably slowdowns in the cluster, like when we see:

tcmu_rbd_handle_timedout_cmd

or when we see the ABORT time out, like here:

WARNING: iscsi_vmk: iscsivmk_TaskMgmtIssue: vmhba64:CH:1 T:0 L:32 : Task mgmt "Abort Task" with itt=0x2423b22 (refITT=0x2423a85) timed out.

In those cases commands look like they are stuck in different layers, like Ceph.

Can you guys:

1. Answer the question about whether you got the perf data.
2. Confirm that the clocks on the ESX and target boxes are in sync? They seem like they might be off, because there are times when the ESX log is noisy due to aborts but the target logs are not showing anything.
3. What is the value of /VMFS3/UseATSForHBOnVMFS5 on each machine?
4. What is the value of scsi.LogAborts? (Example esxcli commands for reading 3 and 4 are sketched after this list.)
5. Did you increase the vmkernel.log size, or are you still cutting it off?
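For items 3 and 4, I would expect the values can be read with the same esxcli pattern we used for MaxIoSizeKB. The /Scsi/LogAborts option path below is my assumption about how that setting is named in the advanced-options tree, so adjust it if your build spells it differently:

esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5
esxcli system settings advanced list -o /Scsi/LogAborts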
(In reply to Mike Christie from comment #9)
> /etc/ceph/iscsi-gateway.cfg
>
> and add:
>
> [target]
> max_data_area_mb = N

Oh yeah, to make this take hold you need to do:

systemctl stop rbd-target-gw
systemctl start rbd-target-gw
(In reply to Mike Christie from comment #13)
> (In reply to Mike Christie from comment #9)
> > /etc/ceph/iscsi-gateway.cfg
> >
> > and add:
> >
> > [target]
> > max_data_area_mb = N
>
> Oh yeah, to make this take hold you need to do:
>
> systemctl stop rbd-target-gw
> systemctl start rbd-target-gw

Correct me if I'm wrong here, but I don't think ceph-iscsi-config supports the 'max_data_area_mb' override. That's included as part of BZ 1562388.
Upstream doesn't support it, but we added the patch, 0016-ceph-iscsi-config-add-max_data_area-setting.patch, thinking that was how we were going to do it upstream. We later decided to make rtslib's constructor more generic so we would not have to update rtslib for every new setting, and Paul requested that we do per device settings. So we are now doing what Venky is doing in that bz.
liuwei or Jiaju Zhang,

Have you guys done SystemTap before? For this case, where we are seeing ABORTs complete immediately, like here:

Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399889
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399889

I basically just want the transport_state, state_active, cmd_kref.refcount and scsi_status values of the se_cmd when core_tmr_abort_task calls __target_check_io_state and when it calls transport_wait_for_tasks.

I think the tcmu queue/ring is backed up because commands are taking longer than they should on the Ceph cluster. This leads to commands getting failed with TASK SET FULL and retried immediately in a sort of busy loop. An abort is eventually sent, and that completes quickly because the command is retried so quickly. We can confirm this by dumping the se_cmd fields when those functions are run.
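To illustrate what I mean, here is a minimal sketch of that kind of probe. This is only a sketch: it assumes kernel debuginfo is installed, that these functions are not inlined, and that the parameter and struct se_cmd field names match the 3.10.0-862.el7 target code. The attached script in the next comment is the real version. Also note the transport_wait_for_tasks probe will fire for calls that do not come from core_tmr_abort_task as well.

#!/usr/bin/stap
# Sketch: dump the se_cmd fields requested above when the two functions
# in the abort path are entered. Names assume kernel-3.10.0-862.el7 and
# may need adjusting.

probe module("target_core_mod").function("__target_check_io_state")
{
    printf("check_io_state: se_cmd %p transport_state 0x%x state_active %d kref %d scsi_status 0x%x\n",
           $se_cmd, $se_cmd->transport_state, $se_cmd->state_active,
           $se_cmd->cmd_kref->refcount->counter, $se_cmd->scsi_status)
}

# This function is also called outside the abort path, so expect extra
# output from this probe.
probe module("target_core_mod").function("transport_wait_for_tasks")
{
    printf("wait_for_tasks: se_cmd %p transport_state 0x%x state_active %d kref %d scsi_status 0x%x\n",
           $cmd, $cmd->transport_state, $cmd->state_active,
           $cmd->cmd_kref->refcount->counter, $cmd->scsi_status)
}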
Created attachment 1428985 [details]
Systemtap script based on kernel-3.10.0-862.el7 source
I've attached a stap script that should work based on the kernel-3.10.0-862.el7 sources, but I do not have a test system and cannot justify the time or knowledge acquisition required to set one up, given this is not a core RADOS issue and I am here by invitation only :)

If you need to run this on a different kernel, you'll need to verify the line numbers for the function call sites. The script probably needs some tweaking to get working properly, and I would be happy to do the testing if someone can provide the environment.

From a purely SystemTap perspective, the easiest way to set up the system to run the script is the stap-prep command.
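For whoever ends up running it, the rough workflow would be something like the following (the script filename is just a placeholder for the attachment):

# install the kernel debuginfo/devel packages SystemTap needs for the running kernel
stap-prep
# run the attached script; -v prints progress while it compiles
stap -v abort_task_dump.stp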