Bug 1500757
| Summary: | [iSCSI]: conn errors and IO errors seen when deep-scrub is in progress | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Tejas <tchandra> |
| Component: | iSCSI | Assignee: | Mike Christie <mchristi> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Tejas <tchandra> |
| Severity: | urgent | Docs Contact: | Erin Donnelly <edonnell> |
| Priority: | high | | |
| Version: | 3.0 | CC: | bniver, ceph-eng-bugs, ceph-qe-bugs, edonnell, hnallurv, jdillama, mchristi, tchandra |
| Target Milestone: | rc | | |
| Target Release: | 4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | python-rtslib-2.1.fb64-2.el7cp ceph-iscsi-config-2.3-12.el7cp | Doc Type: | Known Issue |

Doc Text:

.An iSCSI initiator can send more than `max_data_area_mb` worth of data when a Ceph cluster is under heavy load, causing a temporary performance drop
When a Ceph cluster is under heavy load, an iSCSI initiator might send more data than specified by the `max_data_area_mb` parameter. Once the `max_data_area_mb` limit has been reached, the `target_core_user` module returns a queue-full status for new commands. The initiators might not retry these commands fairly; the commands can then hit initiator-side timeouts and be failed in the multipath layer. The multipath layer retries the commands on another path while other commands are still being executed on the original path. This causes a temporary performance drop, and in some extreme cases on Linux the `multipathd` daemon can terminate unexpectedly.
If the `multipathd` daemon crashes, restart it manually:
----
# systemctl restart multipathd
----
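
Once `multipathd` is running again, a quick way to confirm from the initiator that the failed paths have recovered (output varies by setup):
----
# multipath -ll
# multipathd show paths
----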

| Story Points: | --- | | |
| --- | --- | --- | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-02-27 20:42:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1480434 | | |
| Bug Blocks: | 1494421 | | |
| Attachments: | | | |
Description
Tejas
2017-10-11 12:16:46 UTC
@Tejas: did Windows report any (IO) errors when this was occurring? Given the size and performance of your cluster, I wouldn't be surprised that things were timing out on the iSCSI side when the OSDs were being thrashed.

Yes, I could see a few thousand IO errors in IOmeter. I need to check the event log in more detail. ESX reported quite a few errors in vmkernel.log.

@Tejas: a couple of quick things:

1) It looks like ceph-ansible didn't configure any librbd logging, so I added the following to those hosts' ceph.conf file:

    [client]
    log file = /var/log/ceph/$name.$pid.log
    admin socket = /var/run/ceph/$name.$pid.asok

I think that we should open a BZ against ceph-ansible to ensure this is properly configured for gateway roles.

2) Your multipath.conf on buckeye was incorrect. It should be the following (a quick way to verify both changes is sketched just before the first attachment below):

    devices {
        device {
            vendor "LIO-ORG"
            hardware_handler "1 alua"
            path_grouping_policy "failover"
            path_selector "queue-length 0"
            failback 60
            path_checker tur
            prio alua
            prio_args exclusive_pref_bit
            fast_io_fail_tmo 25
            no_path_retry queue
        }
    }

Hey Tejas, I just wanted to confirm the initial failure time of the test for the bz. The messages in the initial bz description look like they start around Oct 11 13:*. However, earlier at Oct 10 18:16* there is an error which should never happen, where a memory allocation failed. If they are separate issues, do you by any chance know what test was run at around 18:16 on Oct 10th? Also, are you using dmesg to dump the log messages you pasted here? If so, it might be easier in the future to use "dmesg -T", because you get human-readable timestamps and, if the system is rebooted or crashes, we can later figure out when the message occurred.

Tejas, it looks like the Linux initiator errors are caused by a lio/tcmu bug. I will have a new build with a fix for this by tomorrow for you. For the Linux initiator connection errors, the qfull changes will fix the problem. I have not figured out the reason for the kmalloc failures yet. Here is the bz for the kernel changes: https://bugzilla.redhat.com/show_bug.cgi?id=1480434
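As a side note, a minimal way to confirm the librbd logging and multipath.conf changes suggested above (the admin socket name is illustrative; real sockets follow the $name.$pid pattern):
----
# ls /var/run/ceph/                      # librbd admin sockets appear as $name.$pid.asok
# ceph --admin-daemon /var/run/ceph/client.admin.12345.asok config show | grep log_file
# multipathd reconfigure                 # reload the corrected multipath.conf
# multipathd show config | grep -A 12 'LIO-ORG'
----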
Created attachment 1341832 [details]
disable qfull waiting
This disables the qfull timeout for downstream. The patch is downstream only.
Upstream we have an internal queue, but in RHEL-based kernels the iSCSI target receive path and tcmu's wait for ring space run in the same context. This leads to LUN access blocking on one bad device, and/or iSCSI requests such as pings timing out, as seen in the log in this bz.
Another patch for ceph-iscsi-config that will be needed is: https://github.com/ceph/ceph-iscsi-config/pull/31

The patch in comment #11 will prevent initiator-side timeouts when the ring is temporarily full. The patch in this comment will prevent us from hitting qfull retries too often for small and medium sized workloads, or at least allow the user to control how much memory they want to devote to preventing qfull retries.
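As an aside, on kernels that expose the TCMU data area size in configfs (recent upstream kernels do; whether a given RHEL kernel does is not covered by this bz), the per-device value currently in effect can be read directly:
----
# grep . /sys/kernel/config/target/core/user_*/*/attrib/max_data_area_mb
----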
Created attachment 1343359 [details]
kernel: make qfull timeout settable
Yeah, here is the patch I am going to post to rh-kernel. It works like the latest upstream posting where 0 means do not wait.
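Purely as an illustration, assuming the patch exposes the timeout the way the upstream posting does (a per-device `qfull_time_out` attribute in configfs; the HBA and device names below are made up), it could be checked and disabled like this:
----
# cat /sys/kernel/config/target/core/user_0/rbd.disk_1/attrib/qfull_time_out
# echo 0 > /sys/kernel/config/target/core/user_0/rbd.disk_1/attrib/qfull_time_out   # 0 = do not wait for ring space
----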