Bug 1571129
| Summary: | Without any "slow request" information on the Ceph cluster side, we still see "Access Lost" messages in the VMware monitor tab. | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | liuwei <wliu> |
| Component: | iSCSI | Assignee: | Mike Christie <mchristi> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Tejas <tchandra> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | CC: | bhubbard, ceph-eng-bugs, ceph-qe-bugs, jdillama, jiajzhan |
| Version: | 3.0 | Target Milestone: | rc |
| Target Release: | 3.* | Hardware: | x86_64 |
| OS: | Linux | Whiteboard: | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-02-16 22:03:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Attachments: | Systemtap script based on kernel-3.10.0-862.el7 source (attachment 1428985) |
**Description** liuwei 2018-04-24 07:21:25 UTC
---

Did you only look for the "slow request" message in the logs? Did you get any of the other debugging info we talked about? I did not see any perf dump type of info in /cases/02083677.

On a side note, something that may actually make the problem worse: you are getting these errors:

```
failed H:0x0 D:0x28
```

because the runner ring is too small. If you want to avoid them, on the iSCSI target systems open /etc/ceph/iscsi-gateway.cfg and add:

```
[target]
max_data_area_mb = N
```

Here N should be big enough to handle all the commands the initiators are going to send. It is only 8 MB by default. It would normally be something like MaxIoSizeKB * min(64, esx device queue depth) * connected initiators. You can get the IO size limit from:

```
esxcli system settings advanced list -o /ISCSI/MaxIoSizeKB
```

This assumes the ceph cluster can handle that many IOs. So if, like in that email thread, we were thinking we might be over-queueing, then increasing this limit might cause more problems. Just a FYI for when we get that fixed.

---

Just a quick question, how can we get the value of the esx device queue depth?

---

(In reply to Jiaju Zhang from comment #10)
> Just a quick question, how can we get the value of esx device queue depth?

Ignore that, I found it.

---

I am not going to change the component yet, but I think this might be a kernel bug. It is just too suspicious that, a lot of the time, right when we get an ABORT the cmd completes the same second, so we are then able to complete the ABORT.

Note that there do seem to be other errors that are probably slowdowns in the cluster, like when we see tcmu_rbd_handle_timedout_cmd, or when we see the ABORT timeout like here:

```
WARNING: iscsi_vmk: iscsivmk_TaskMgmtIssue: vmhba64:CH:1 T:0 L:32 : Task mgmt "Abort Task" with itt=0x2423b22 (refITT=0x2423a85) timed out.
```

In those cases commands look like they are stuck in different layers, like ceph. Can you guys:

1. Answer the question about whether you got the perf data.
2. Confirm that the times on the ESX and target boxes are in sync?
   It seems they might be off, because there are times when the ESX log is noisy due to aborts but the target logs are not showing anything.
3. What is the value of /VMFS3/UseATSForHBOnVMFS5 on each machine?
4. What is the value of scsi.LogAborts?
5. Did you increase the vmkernel.log size, or are you guys cutting it off now?

---

(In reply to Mike Christie from comment #9)
> /etc/ceph/iscsi-gateway.cfg
>
> and add:
>
> [target]
> max_data_area_mb = N

Oh yeah, to make this take hold you need to do:

```
systemctl stop rbd-target-gw
systemctl start rbd-target-gw
```

---

(In reply to Mike Christie from comment #13)
> (In reply to Mike Christie from comment #9)
> > /etc/ceph/iscsi-gateway.cfg
> >
> > and add:
> >
> > [target]
> > max_data_area_mb = N
>
> Oh yeah, to make this take hold you need to do:
>
> systemctl stop rbd-target-gw
> systemctl start rbd-target-gw

Correct me if I'm wrong here, but I don't think ceph-iscsi-config supports the 'max_data_area_mb' override. That's included as part of BZ 1562388.

---

Upstream doesn't support it, but we added the patch, 0016-ceph-iscsi-config-add-max_data_area-setting.patch, thinking that was how we were going to do it upstream. We later decided to make rtslib's constructor more generic, so we would not have to update rtslib for every new setting, and Paul requested that we do per-device settings. So we are now doing what Venky is doing in that bz.

---

liuwei or Jiaju Zhang, have you guys done systemtap before?
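The sizing rule of thumb quoted above (MaxIoSizeKB * min(64, esx device queue depth) * connected initiators) can be sketched as a small helper. This is an illustrative calculation only; the function name and the example inputs are hypothetical, not values taken from this cluster:

```python
# Rough sizing sketch for the tcmu max_data_area_mb setting, following the
# rule of thumb from the comment above. All inputs are hypothetical examples.

def max_data_area_mb(max_io_size_kb, device_queue_depth, initiators):
    """Suggested max_data_area_mb, rounded up to whole MB.

    The queue depth is capped at 64 per the min(64, depth) term in the
    rule of thumb quoted in this bug.
    """
    total_kb = max_io_size_kb * min(64, device_queue_depth) * initiators
    return -(-total_kb // 1024)  # ceiling division, KB -> MB

# e.g. 128 KB max IO size, device queue depth 32, 4 connected initiators
print(max_data_area_mb(128, 32, 4))   # -> 16, vs. the 8 MB default
```

Note that a queue depth above 64 makes no difference here, because the rule caps it at 64.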
---

For this case where we are seeing ABORTs complete immediately, like here:

```
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Found referenced iSCSI task_tag: 33399889
Apr 22 22:26:27 osd1 kernel: ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33399889
```

I basically just want the transport_state, state_active, cmd_kref.refcount and scsi_status values of the se_cmd when core_tmr_abort_task calls __target_check_io_state and when it calls transport_wait_for_tasks.

I think the tcmu queue/ring is backed up because commands are taking longer than they should on the ceph cluster. This leads to commands getting failed with task set full and retried immediately, in a sort of busy loop. An abort is eventually sent, and it completes quickly because the command is being retried so quickly. We can confirm this by dumping the se_cmd fields when that function is run.

---

Created attachment 1428985 [details]
Systemtap script based on kernel-3.10.0-862.el7 source
I've attached a stap script that should work based on the kernel-3.10.0-862.el7 source, but I do not have a test system and cannot justify the time or knowledge acquisition required to set one up, given this is not a core RADOS issue and I am here by invitation only :) If you need to run this on a different kernel, you'll need to verify the line numbers for the function call sites. The script probably needs some tweaking to get it working properly, and I would be happy to do the testing if someone can provide the environment. From a purely systemtap perspective, the easiest way to prepare the system to run the script is the stap-prep command.
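The back-pressure theory described earlier in this bug (a fixed-size ring, commands that complete slowly on the cluster, and rejected commands retried immediately so the offered load never drains) can be illustrated with a toy queue model. The slot count and service time below are made-up numbers for illustration, not measurements of tcmu or this cluster:

```python
from collections import deque

RING_SLOTS = 8      # hypothetical ring capacity (made-up number)
SERVICE_TICKS = 20  # ticks each command spends "in the cluster" (made-up)

ring = deque()      # completion times of commands occupying ring slots
task_set_full = 0

# Each tick one command is offered; rejected commands are retried on the
# very next tick, so the offered load stays constant (the "busy loop").
for tick in range(100):
    # retire commands whose service time has elapsed
    while ring and ring[0] <= tick:
        ring.popleft()
    if len(ring) < RING_SLOTS:
        ring.append(tick + SERVICE_TICKS)
    else:
        task_set_full += 1  # initiator sees TASK SET FULL and retries

print(task_set_full)  # -> 60: the ring stays saturated most of the time
```

With service time well above what the ring can absorb at this arrival rate, the model settles into a cycle where most ticks are rejections, which is the sustained retry storm the ABORTs eventually land in.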