Description of problem:
NetApp filer shares respond to CIFS requests with malformed packets that do not abide by the RFC1001 protocol, which causes the filesystem to hang. It appears to be a problem with the SMB header length.

Version-Release number of selected component (if applicable):
kernel-2.6.9-89.ELsmp

How reproducible:
Only occurs under heavy load.

Steps to Reproduce:
1. Increase load on the machine
2. Execute an ls of the filesystem
   # ls -lR /opt/filestore/shared

Actual results:
Command hangs with the following messages in /var/log/messages:
Dec 20 11:50:10 inet879 kernel: CIFS VFS: RFC1001 size 210 bigger than SMB for Mid=54074
Dec 20 11:50:32 inet879 kernel: CIFS VFS: server not responding
Dec 20 11:50:32 inet879 kernel: CIFS VFS: No response for cmd 50 mid 54074
Dec 20 11:50:38 inet879 kernel: CIFS VFS: RFC1001 size 210 bigger than SMB for Mid=54078
Dec 20 11:51:02 inet879 kernel: CIFS VFS: server not responding
Dec 20 11:51:02 inet879 kernel: CIFS VFS: No response for cmd 50 mid 54078
Dec 20 11:51:06 inet879 kernel: CIFS VFS: RFC1001 size 210 bigger than SMB for Mid=54082
Dec 20 11:51:32 inet879 kernel: CIFS VFS: server not responding

Expected results:
ls is able to display the files in the directory.

Additional info:
Upstream Samba Bugzilla: cifs hangs on Netapp DFS shares
https://bugzilla.samba.org/show_bug.cgi?id=7860

There appears to be a patch in the upstream bug, but I believe this is more of a problem on the NetApp side, as the storage is sending malformed packets back to the RHEL server. The kernel sees that the packet has extra garbage on the end and drops it.
Yep, known problem. It's really NetApp's bug (and a rather nasty one too -- wonder if there's anything interesting in that extra junk?). EMC also had a similar bug a few years ago but they fixed theirs... The checks in CIFS are too strict though. There's no real reason for us to drop packets on the floor just because the server tacked some extra stuff on the end. We should just ignore that part. When and if the upstream maintainer takes this patch we can consider putting it into RHEL, but not sure if it's appropriate for RHEL4 at this point since it'll be in maintenance mode soon.
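For readers following along, here is a minimal sketch of the kind of length comparison being discussed. It is not the kernel's code. It assumes an SMB1 message laid out as a 32-byte header, a 1-byte WordCount, the parameter words, a 2-byte little-endian ByteCount and the data bytes, and it treats the low 24 bits of the 4-byte session header as the RFC1001 length (as direct-hosted SMB over TCP does). The names `build_frame`, `smb_calculated_length` and `classify` are made up for illustration.

```python
import struct

SMB1_HEADER_LEN = 32  # fixed-size SMB1 header that precedes WordCount


def rfc1001_length(frame: bytes) -> int:
    """Length carried in the 4-byte session header (assumed: low 24 bits, big-endian)."""
    return struct.unpack(">I", frame[:4])[0] & 0x00FFFFFF


def smb_calculated_length(frame: bytes) -> int:
    """Length the SMB claims for itself: header + WordCount + words + ByteCount + bytes."""
    smb = frame[4:]
    word_count = smb[SMB1_HEADER_LEN]
    bcc_offset = SMB1_HEADER_LEN + 1 + 2 * word_count
    (byte_count,) = struct.unpack_from("<H", smb, bcc_offset)
    return bcc_offset + 2 + byte_count


def classify(frame: bytes) -> str:
    """Compare the outer (RFC1001) length with the inner (SMB-calculated) length."""
    outer = rfc1001_length(frame)
    inner = smb_calculated_length(frame)
    if outer == inner:
        return "ok"
    if outer > inner:
        # Extra bytes after the SMB: a lenient client could simply ignore them.
        return "trailing-bytes"
    # The SMB claims more data than the frame actually carries.
    return "smb-overruns-frame"


def build_frame(words: bytes, data: bytes, trailing: bytes = b"",
                claimed_byte_count=None) -> bytes:
    """Hypothetical test helper: build a frame, optionally with a wrong ByteCount."""
    header = b"\xffSMB" + bytes(SMB1_HEADER_LEN - 4)
    bcc = len(data) if claimed_byte_count is None else claimed_byte_count
    smb = (header + bytes([len(words) // 2]) + words
           + struct.pack("<H", bcc) + data + trailing)
    return struct.pack(">I", len(smb)) + smb
```

A strict check rejects anything that is not "ok". The relaxation suggested in this comment would accept the "trailing-bytes" case and parse only the part of the frame that the SMB itself accounts for.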
It turns out that I was wrong in my initial analysis of the packets coming from the NetApp. The problem there is that the SMB packet has lengths that go beyond the end of the RFC1001 frame. The patch I had proposed upstream is also wrong and I've self-nak'ed it there.

It's unlikely we'll be able to easily make CIFS work with this server. One possibility that Steve F. suggested was to try to "fix up" the lengths in the packet when they are wrong like this. We know where the RFC1001 container ends, so we could fudge those lengths so that they stay within it. This would have to be done after checking the signature on the packet if signing is enabled, however, which means overhauling how signature checks are actually handled...

In any case, this is really too much for RHEL4, particularly when the real problem is server-side. If you want to reopen this request against RHEL6, then that might be reasonable.
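As a rough illustration of the "fix up the lengths" idea above, the sketch below clamps a single length field, the SMB1 ByteCount, so that it no longer reaches past the end of the RFC1001 frame. This is an assumption-laden example and not a proposed patch: the report does not say which length fields the NetApp responses get wrong, the function name `clamp_byte_count` is hypothetical, and the layout assumed is a 32-byte SMB1 header followed by WordCount, the parameter words and a little-endian ByteCount. Any signature check would have to run on the unmodified bytes before a fix-up like this.

```python
import struct

SMB1_HEADER_LEN = 32  # fixed-size SMB1 header that precedes WordCount


def clamp_byte_count(frame: bytes) -> bytes:
    """Return a copy of frame whose ByteCount fits inside the RFC1001 length.

    Assumes signature verification (if signing is enabled) has already been
    done on the original, unmodified frame.
    """
    # Outer length: assumed to be the low 24 bits of the 4-byte session header.
    outer = struct.unpack(">I", frame[:4])[0] & 0x00FFFFFF
    smb = bytearray(frame[4:4 + outer])

    if len(smb) < SMB1_HEADER_LEN + 1:
        raise ValueError("frame too short to hold an SMB1 header and WordCount")

    word_count = smb[SMB1_HEADER_LEN]
    bcc_offset = SMB1_HEADER_LEN + 1 + 2 * word_count
    if bcc_offset + 2 > len(smb):
        raise ValueError("frame too short to hold ByteCount")

    (byte_count,) = struct.unpack_from("<H", smb, bcc_offset)
    available = len(smb) - (bcc_offset + 2)  # data bytes actually present
    if byte_count > available:
        # Fudge the claimed length down to what the container really holds.
        struct.pack_into("<H", smb, bcc_offset, available)

    return frame[:4] + bytes(smb)
```

A frame whose ByteCount already fits is returned unchanged, so the function can be applied unconditionally after the signature check.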