Bug 1300572
| Summary: | SMB: while running dd from multiple cifs mounts with aio enabled, cancelling the I/Os causes the mount point to hang | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | surabhi <sbhaloth> |
| Component: | samba | Assignee: | Sachin Prabhu <sprabhu> |
| Status: | CLOSED DUPLICATE | QA Contact: | Vivek Das <vdas> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.1 | CC: | amukherj, annair, anoopcs, asriram, ira, lbailey, madam, nlevinki, rcyriac, sanandpa, sankarshan, sprabhu, storage-qa-internal, vdas |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Due to a bug in the Linux CIFS client, SMB 2.0+ connections from Linux clients to Red Hat Gluster Storage did not work properly. The issue was corrected in kernel-3.10.0-481.el7. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-11-20 10:03:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1268895 | ||
Description (surabhi, 2016-01-21 07:56:06 UTC)
Continuously seeing the following errors from the Linux cifs client when running I/O from a cifs mount using vers=3:

    Feb 8 23:24:02 dhcp46-56 kernel: CIFS VFS: SMB response too long (262224 bytes)
    Feb 8 23:24:03 dhcp46-56 kernel: CIFS VFS: SMB response too long (262224 bytes)
    Feb 8 23:24:03 dhcp46-56 kernel: CIFS VFS: Send error in read = -11
    Feb 8 23:24:04 dhcp46-56 kernel: CIFS VFS: SMB response too long (524368 bytes)
    Feb 8 23:24:04 dhcp46-56 kernel: CIFS VFS: Send error in read = -11
    Feb 8 23:24:06 dhcp46-56 kernel: CIFS VFS: SMB response too long (524368 bytes)
    Feb 8 23:24:06 dhcp46-56 kernel: CIFS VFS: Send error in read = -11
    Feb 8 23:24:08 dhcp46-56 kernel: CIFS VFS: SMB response too long (1048656 bytes)

This is a generic error; it has nothing to do with RHGS, alas.

"There is a known issue in the RHEL SMB client with SMB vers=2, 2.1 or 3. They will not work properly with RHGS."

Your thoughts Michael?

The original bug description reads
> Tried the same test with xfs-samba share, dd command stops
> as we try to cancel it.
So is this really a generic bug?
Also, there is no configuration or exact description of the setup.
E.g. the only reference to aio is in the subject (added afterwards).
Somehow the problem is not qualified well enough to be a known
issue that I could propose a doc text for...
- I assume it is happening with vfs_glusterfs and aio enabled.
- Does it happen with vfs_glusterfs but without aio?
- Does it happen with gluster fuse mount and aio enabled / disabled?
- Does it happen with xfs and aio enabled?
- What is the actual configuration of the cifs mounts?
  I.e. how many are running? Is 2 enough? Does a single
  cifs mount not have the problem? ...

It could be that this is a bug specifically triggered by
the vfs_glusterfs aio and a problem in the cifs mount.
So more data please! :-)
1. The aio has been added to the subject because aio is enabled by default for 3.1.2. (Just wanted to bring to notice that aio is enabled when the issue is seen.)
2. Yes, it happens when aio is enabled and we do multiple cifs mounts and run dd on all the mount points. The dd doesn't exit and the mount point hangs.
3. If we disable aio, dd exits from all mount points when cancelled.
4. vfs_glusterfs without aio doesn't see the hang.
5. With a single mount the issue is not seen; if there is more than one cifs mount with aio enabled, then the issue is seen.

Following is the data:

| AIO | Share | dd on cancelling | Mount point |
|---|---|---|---|
| disabled | glusterfs-samba share | dd exits (from single mount as well as multiple mounts) | no hang |
| disabled | xfs-samba share | dd exits (from single mount as well as multiple mounts) | no hang |
| enabled | xfs-samba share | dd exits (from single mount as well as multiple cifs mounts) | |
| enabled | glusterfs-samba share | dd doesn't exit (always with more than one cifs mount) | hang |

The cifs client shows:

    [1018560.875127] INFO: task dd:30545 blocked for more than 120 seconds.
    [1018560.877405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [1018560.879689] dd D ffff8800c3bee9f8 0 30545 30352 0x00000080
    [1018560.879697] ffff8801ec80fc70 0000000000000086 ffff88021233c500 ffff8801ec80ffd8
    [1018560.879703] ffff8801ec80ffd8 ffff8801ec80ffd8 ffff88021233c500 ffff8800c3bee9f0
    [1018560.879707] ffff8800c3bee9f4 ffff88021233c500 00000000ffffffff ffff8800c3bee9f8
    [1018560.879712] Call Trace:
    [1018560.879727] [<ffffffff8163b9e9>] schedule_preempt_disabled+0x29/0x70
    [1018560.879736] [<ffffffff816396e5>] __mutex_lock_slowpath+0xc5/0x1c0
    [1018560.879741] [<ffffffff81638b4f>] mutex_lock+0x1f/0x2f
    [1018560.879747] [<ffffffff811eb9af>] do_last+0x28f/0x1270
    [1018560.879754] [<ffffffff811c11ce>] ? kmem_cache_alloc_trace+0x1ce/0x1f0
    [1018560.879759] [<ffffffff811ee672>] path_openat+0xc2/0x490
    [1018560.879765] [<ffffffff811efe3b>] do_filp_open+0x4b/0xb0
    [1018560.879771] [<ffffffff811fc9c7>] ? __alloc_fd+0xa7/0x130
    [1018560.879778] [<ffffffff811dd7e3>] do_sys_open+0xf3/0x1f0
    [1018560.879784] [<ffffffff811dd8fe>] SyS_open+0x1e/0x20
    [1018560.879791] [<ffffffff81645909>] system_call_fastpath+0x16/0x1b
    [1018726.314764] SELinux: initialized (dev cifs, type cifs), uses genfs_contexts

The error mentioned in #C6 happens when we mount with vers=3 and aio enabled. Another BZ, https://bugzilla.redhat.com/show_bug.cgi?id=1305657, has been raised for the same. Let me know if you need more data.

Ira, could you provide your RCA w.r.t. #C4, which will help everyone.

The issue appears to be that SMB2+ on the Linux CIFS client can't handle async operations. This really hurts us, because for Windows the async ops make a big speed difference, but they are incompatible with the Linux client. For now I'm going to recommend the use of SMB1 for Linux, given that we expect Windows to be the dominant use case for CIFS/SMB.

"Due to a bug within the Linux CIFS client, we do not support the use of SMB2+ with RHGS." is my suggestion for the known issue text, for now. I will work on seeing if we can countermeasure it, up to the release.

It is a bug in Linux, not RHGS. It should clearly be documented that way, because we'll run into it with any Linux CIFS using SMB2+.
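For context on item 3 above (the hang disappearing once aio is disabled): Samba's asynchronous I/O on a share is controlled by the `aio read size` and `aio write size` parameters. Below is a minimal sketch of a vfs_glusterfs share with aio turned off; the share name and volume name are placeholders, and this only illustrates the workaround configuration tested in this comment, not a final fix.

```
# /etc/samba/smb.conf (sketch; share and volume names are placeholders)
[gluster-vol1]
    path = /
    vfs objects = glusterfs
    glusterfs:volume = vol1
    # Setting both values to 0 disables Samba async I/O for this share,
    # which is the configuration reported above as not hanging.
    aio read size = 0
    aio write size = 0
```

With aio re-enabled (the RHGS 3.1.2 default), the hang described above reappears once more than one cifs mount is running dd.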
The wording I chose was intentional, because we'll have this as an issue for a while even after a fix, since the fix is just a Linux kernel fix. Fixes can take a bit to be applied to clients. Once a fix is found we'll clearly direct people to update their clients, but well... production is production, and they can't always do that. Thankfully, I don't expect much SMB2+ on Linux in the field, so this should largely be a non-issue.

The wording is awkward: is SMB 2.0, 2.1 and 3.0 not usable for ALL clients due to the Linux cifs client, or just for the Linux cifs client? It should be usable from Windows :).

(In reply to Ira Cooper from comment #17)
> The wording is awkward, is SMB 2.0, 2.1 and 3.0 not usable for ALL clients
> due to the linux cifs client, or just for the linux cifs client?
>
> It should be usable from Windows :).

Right, the wording has to be chosen carefully. It has to make clear that:

1) SMB 2 and newer are generally supported on RHGS.
2) SMB 2+ are NOT supported with the Linux cifs client (due to a bug in the cifs client).
3) It is up to the user of the cifs client to ensure that it does not mount with SMB version >= 2. (In order to not impact other clients.)

Furthermore:

4) If a cifs client triggers the hang with SMB >= 2, are other clients (Windows...) affected as well? If yes, we should document that, too.

Cheers - Michael

Other clients should not be impacted. -Ira

Try the text above, I think it is a hair awkward, but it is factually correct.
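To illustrate point 3) above (keeping a Linux cifs client on SMB1 so the known issue is not triggered), a hedged example of the client-side mount might look like the following; the server name, share name, user and mount point are placeholders.

```sh
# Force the Linux cifs client to negotiate SMB1, which is unaffected by
# the async-read bug discussed in this report. All names are placeholders.
mount -t cifs //rhgs-server/gluster-vol1 /mnt/smb -o user=testuser,vers=1.0

# Mounting the same share with vers=2.0, 2.1 or 3.0 on an affected kernel
# is what produced the "SMB response too long" errors and the dd hang.
```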