Bug 670858

Summary: cifs hangs on Netapp shares
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.8
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Reporter: Jack Waterworth <jwaterwo>
Assignee: Jeff Layton <jlayton>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: bfields, dhowells, jlayton, rwheeler, steved
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-01-26 12:25:09 UTC

Description Jack Waterworth 2011-01-19 14:51:02 UTC
Description of problem:
NetApp filer shares respond to CIFS requests with malformed packets that do not abide by the RFC1001 protocol, which causes the filesystem to hang. The problem appears to be in the SMB header length.

Version-Release number of selected component (if applicable):
kernel-2.6.9-89.ELsmp


How reproducible:
Only occurs under heavy load.

Steps to Reproduce:

1. Increase load on the machine
2. Execute an ls of the filesystem
# ls -lR /opt/filestore/shared

Actual results:
The command hangs, and the following messages appear in /var/log/messages:

Dec 20 11:50:10 inet879 kernel:  CIFS VFS: RFC1001 size 210 bigger than SMB for Mid=54074
Dec 20 11:50:32 inet879 kernel:  CIFS VFS: server not responding
Dec 20 11:50:32 inet879 kernel:  CIFS VFS: No response for cmd 50 mid 54074
Dec 20 11:50:38 inet879 kernel:  CIFS VFS: RFC1001 size 210 bigger than SMB for Mid=54078
Dec 20 11:51:02 inet879 kernel:  CIFS VFS: server not responding
Dec 20 11:51:02 inet879 kernel:  CIFS VFS: No response for cmd 50 mid 54078
Dec 20 11:51:06 inet879 kernel:  CIFS VFS: RFC1001 size 210 bigger than SMB for Mid=54082
Dec 20 11:51:32 inet879 kernel:  CIFS VFS: server not responding


Expected results:
ls lists the files in the directory.


Additional info:

Upstream Samba Bugzilla:
cifs hangs on Netapp DFS shares
https://bugzilla.samba.org/show_bug.cgi?id=7860

There appears to be a patch in the upstream bug, but I believe this is more of a problem on the NetApp side, as the storage is sending malformed packets back to the RHEL server. The kernel sees that the packet has extra garbage on the end and drops it.

Comment 1 Jeff Layton 2011-01-19 16:37:04 UTC
Yep, known problem. It's really NetApp's bug (and a rather nasty one too -- wonder if there's anything interesting in that extra junk?). EMC also had a similar bug a few years ago but they fixed theirs...

The checks in CIFS are too strict though. There's no real reason for us to drop packets on the floor just because the server tacked some extra stuff on the end. We should just ignore that part.
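
For illustration only, the kind of length comparison being discussed can be sketched as follows. This is not the kernel's code. It assumes the usual SMB1 layout (a 32-byte header, then WordCount, the parameter words, ByteCount and the data bytes) behind a 4-byte session header whose last three bytes carry the frame length. The function names are made up for this sketch. The "RFC1001 size ... bigger than SMB" messages in the log presumably correspond to the "trailing bytes" case, which a more lenient client could accept.

```python
import struct

SMB_HEADER_LEN = 32  # fixed-size SMB1 header, begins with b"\xffSMB"


def rfc1001_length(buf: bytes) -> int:
    """Payload length from the 4-byte session header (1 type byte, 3 length bytes)."""
    return int.from_bytes(buf[1:4], "big")


def smb_calc_size(smb: bytes) -> int:
    """Size the SMB claims for itself: header, WordCount, words, ByteCount, bytes.

    Assumes the buffer is long enough to contain WordCount and ByteCount.
    """
    word_count = smb[SMB_HEADER_LEN]
    bcc_offset = SMB_HEADER_LEN + 1 + 2 * word_count
    (byte_count,) = struct.unpack_from("<H", smb, bcc_offset)
    return bcc_offset + 2 + byte_count


def classify(buf: bytes) -> str:
    """Compare the frame length with the SMB's own idea of its size."""
    frame_len = rfc1001_length(buf)
    calc_len = smb_calc_size(buf[4:])
    if frame_len == calc_len:
        return "match"
    if frame_len > calc_len:
        return "trailing bytes"  # extra data after the SMB; could simply be ignored
    return "overrun"  # the SMB's lengths point past the end of the frame
```

A strict client rejects everything except "match"; the suggestion above amounts to also accepting "trailing bytes" and using only the first calc_len bytes of the frame.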

If and when the upstream maintainer takes this patch we can consider putting it into RHEL, but I'm not sure it's appropriate for RHEL4 at this point since it'll be in maintenance mode soon.

Comment 2 Jeff Layton 2011-01-26 12:25:09 UTC
It turns out that I was wrong in my initial analysis of the packets coming from the Netapp. The problem there is that the SMB packet has lengths that go beyond the end of the RFC1001 frame.

The patch I had proposed upstream is also wrong and I've self-nak'ed it there. It's unlikely we'll be able to easily make CIFS work with this server.

One possibility that Steve F. suggested was to try to "fix up" the lengths in the packet when they are wrong like this. We know where the RFC1001 container ends, so we could fudge those lengths so that they stay within it.

This would have to be done after checking the signature on the packet if signing is enabled, however, which means overhauling how signature checks are actually handled...
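
A rough sketch of the "fix up" idea, under stated assumptions: the comment does not say which length fields are wrong, so this example picks ByteCount purely for illustration. The function name and the `verify` callback (standing in for the signature check) are hypothetical. The point is only the ordering: verify the packet as received first, then clamp the length so it stays inside the RFC1001 frame.

```python
import struct
from typing import Callable, Optional

SMB_HEADER_LEN = 32  # fixed-size SMB1 header


def fix_up_byte_count(
    smb: bytes, verify: Optional[Callable[[bytes], bool]] = None
) -> bytes:
    """Clamp ByteCount to what the frame actually holds.

    `smb` is the frame payload (session header already stripped).
    `verify` is a hypothetical signature check; it sees the unmodified bytes.
    """
    # The signature covers the packet as sent, so check before touching it.
    if verify is not None and not verify(smb):
        raise ValueError("signature check failed")

    if len(smb) <= SMB_HEADER_LEN:
        raise ValueError("frame too short to hold WordCount")
    word_count = smb[SMB_HEADER_LEN]
    bcc_offset = SMB_HEADER_LEN + 1 + 2 * word_count
    if bcc_offset + 2 > len(smb):
        raise ValueError("frame too short to hold ByteCount")

    (byte_count,) = struct.unpack_from("<H", smb, bcc_offset)
    available = len(smb) - (bcc_offset + 2)  # data bytes really present
    if byte_count <= available:
        return smb  # lengths already fit inside the frame

    fixed = bytearray(smb)
    struct.pack_into("<H", fixed, bcc_offset, available)
    return bytes(fixed)
```

Passing `verify=None` models a session without signing. With signing enabled, the clamped packet no longer matches its signature, which is why the check has to run before the fix-up.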

In any case, this is really too much for RHEL4, particularly when the real problem is server-side. If you want to reopen this request against RHEL6, then that might be reasonable.