Bug 247204

Summary:

[PATCH] cifs / smbfs, mount error 20, NAS device only supports smb

Product:

[Fedora] Fedora

Reporter:

Kai Engert (:kaie) (inactive account) <kengert>

Component:

kernel

Assignee:

Jeff Layton <jlayton>

Status:

CLOSED WORKSFORME

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

low

Priority:

low

CC:

abartlet, chris.brown, esandeen, smfrench, steved, triage

Hardware:

All

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2008-06-17 01:48:37 UTC

Attachments:

Description	Flags
tcpdump output from mount attempt	none
dmesg output from mount attempt	none
dmesg output from working mount (after disabling unix extensions)	none
patch to retry failed Unix Query Path Info with older CIFS style equivalent	none
cifs 1.50c as a patch against fedora 7 kernel rpm	none

Description Kai Engert (:kaie) (inactive account) 2007-07-05 23:02:00 UTC

Description of problem:
I own a network attached storage device.
German vendor, seems quite popular in stores here.
Raidsonic IB-NAS900
It supports "Windows networking".
I am trying to access it using mount.cifs, but I get:

[root@leise mnt]# mount -t cifs '//192.168.2.73/PUBLIC' /mnt/icystor/
Password:
mount error 20 = Not a directory
Refer to the mount.cifs(8) manual page (e.g.man mount.cifs)

No password set at all.

A simple
[root@leise mnt]# smbclient '//192.168.2.73/PUBLIC'
Password:

allows me to access all files on the drive.


I am not an SMB insider, but it appears that cifs might not be a full
replacement for smbfs?

When searching the web for the above error message, I found reports from
people who used the original Fedora RPMs, enabled the missing smbfs pieces,
and report that this lets them access their shares.

For example:
  http://www-user.tu-chemnitz.de/~tott/FC5-smbfs-HOWTO.html
  http://www.tux-planet.fr/blog/?2007/03/02/140-support-smbfs-pour-fedora-core



Version-Release number of selected component (if applicable):
Fedora 5, Fedora 6, Fedora 7, Rawhide


Actual results:
Unable to mount network share.


Additional info:
Because I can't mount it, that device is more or less a scratch toy, so I'm
willing to run all kinds of tests. Just let me know how I can help.
I have considered trying the above instructions to confirm that they help with
my device, too.
Do you want me to?

Comment 1 Kai Engert (:kaie) (inactive account) 2007-07-05 23:06:12 UTC

Bug 190006 is similar.

Comment 2 Andrew Bartlett 2007-07-09 06:32:07 UTC

The long-term aim is for the cifs kernel modules to be a replacement for smbfs.
Steve (CC'ed) should be able to clarify the status.

BTW, what does smbclient -L show as the server name (often Samba 1.9, if smbfs
is having trouble)?

A network trace (pcap format) might also show what it claims to support, and
help chase this down.

Comment 3 Kai Engert (:kaie) (inactive account) 2007-07-09 16:08:23 UTC

[kaie@kaiez1:~]$ smbclient -L '//192.168.2.73'
Password:
Domain=[ȇ] OS=[] Server=[�]

        Sharename       Type      Comment
        ---------       ----      -------
        PUBLIC          Disk
        IPC$            IPC
Domain=[ȇ] OS=[] Server=[�]

        Server               Comment
        ---------            -------

        Workgroup            Master
        ---------            -------


(yes, smbclient reports one "special/weird" character for Domain and Server).

Comment 4 Kai Engert (:kaie) (inactive account) 2007-07-09 16:25:11 UTC

(In reply to comment #2)
> A network trace (pcap format) might also show what it claims to support, and
> help chase this down. 

Can you please recommend what tool / commandline I should use to produce the
dump you require?

tcpdump  -w /tmp/smblog host 192.168.2.73

strings smblog

D DBDJDCCODBDGDICODCCODHDDCACACACA
 ELE
SMBr
SMBr
SMBs
SMBs
SMBu
SMBu
SMB%
SMB%
SMB%
SMB%
SMB%
SMB%
SMBq
SMBq
D DBDJDCCODBDGDICODCCODHDDCACACACA
 ELE
SMBr
SMBr
SMBs
SMBs
SMBu
SMBu
SMB%
SMB%
SMB%
SMB%
SMBq
SMBq

Comment 5 Steve French 2007-07-09 17:27:44 UTC

Possible problem with access control for that user on the root of the share. 
Perhaps the user is authenticating with a different default user id in the two
different cases.  In any case an ethereal/tcpdump/wireshark binary trace is more
helpful for us to read/debug.

See
http://wiki.samba.org/index.php/Capture_Packets
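
As a rough sketch of such a binary capture (not taken from the wiki page above): the function below writes untruncated packets to a pcap file using tcpdump's standard `-s` and `-w` options. The function name, the default output path and the `TCPDUMP_CMD` override are assumptions made up for this illustration.

```shell
# Sketch: capture full SMB packets to a binary pcap file for later analysis.
# TCPDUMP_CMD is a hypothetical override added for illustration (e.g. set it
# to "echo" for a dry run); by default the real tcpdump is used, as root.
capture_smb_trace() {
    server=$1                              # e.g. 192.168.2.73
    outfile=${2:-/tmp/smb-trace.pcap}      # assumed default output path
    tcpdump_cmd=${TCPDUMP_CMD:-tcpdump}
    # -s 0 : do not truncate packets
    # -w   : write raw packets to a file instead of printing a text summary
    "$tcpdump_cmd" -s 0 -w "$outfile" host "$server"
}
```

Start the capture (for example `capture_smb_trace 192.168.2.73`), attempt the mount in a second terminal, stop the capture with Ctrl-C, and attach the resulting file unmodified; running `strings` over it discards the information needed for debugging.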

Alternatively turn on cifs debugging:
"dmesg -c"                              (to clear the error log)
"echo 7 > /proc/fs/cifs/cifsFYI"
attempt the mount then send or attach the dmesg output
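
The debugging steps above can be wrapped in a small helper, shown here only as a sketch. The quotation marks in the instructions are not part of the commands; the redirection has to be interpreted by the shell. The function name and the `CIFS_PROC_DIR` variable are hypothetical and exist only so the helper can be tried against a scratch directory; the real file is /proc/fs/cifs/cifsFYI.

```shell
# Hypothetical helper: set the cifs debug level by writing to cifsFYI.
# CIFS_PROC_DIR is an assumption for illustration; it defaults to the real
# location, /proc/fs/cifs, which only exists once the cifs module is loaded.
set_cifs_debug() {
    level=$1
    proc_dir=${CIFS_PROC_DIR:-/proc/fs/cifs}
    if [ ! -e "$proc_dir/cifsFYI" ]; then
        echo "cifsFYI not found under $proc_dir (is the cifs module loaded?)" >&2
        return 1
    fi
    # The redirection must stay outside any quotes; otherwise the shell looks
    # for a command literally named "echo 7 > /proc/fs/cifs/cifsFYI".
    echo "$level" > "$proc_dir/cifsFYI"
}
```

A typical session as root would be `dmesg -c`, then `set_cifs_debug 7`, then the mount attempt, then `dmesg > cifs-debug.txt` to save the log for attaching (the output file name is arbitrary).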

Comment 6 Kai Engert (:kaie) (inactive account) 2007-07-09 17:41:49 UTC

on a fedora 7 machine:

[root@intel tmp]# "echo 7 > /proc/fs/cifs/cifsFYI"
-bash: echo 7 > /proc/fs/cifs/cifsFYI: No such file or directory

Comment 7 Kai Engert (:kaie) (inactive account) 2007-07-09 17:42:34 UTC

(In reply to comment #6)
> on a fedora 7 machine:
> 
> [root@intel tmp]# "echo 7 > /proc/fs/cifs/cifsFYI"
> -bash: echo 7 > /proc/fs/cifs/cifsFYI: No such file or directory

Argh, I'm dumb. I removed the " chars then it's ok ;-)

Comment 8 Kai Engert (:kaie) (inactive account) 2007-07-09 17:45:49 UTC

Created attachment 158793 [details]
tcpdump output from mount attempt

Comment 9 Kai Engert (:kaie) (inactive account) 2007-07-09 17:46:21 UTC

Created attachment 158794 [details]
dmesg output from mount attempt

Comment 10 Simo Sorce 2007-07-10 14:00:52 UTC

From Steve French:

Looks like the Linux cifs client is authenticating ok - but I don't
recognize the server type.  The server does not seem to report its
version.  The server is mishandling at least one request (probably
two) but it claims to support the CIFS Unix Extensions (I wonder if it
is a fork of Samba).

In any case, I am suspicious that the server is mishandling the
response to frame 31 (in your trace) but the main problem seems to be
the response to the SetFSUnixInfo (see frame 30) which is malformed
(looks like invalid word count).

Try disabling the Unix Extensions to see if that will work around the
server bug (the client will then not try this SetFSInfo request that
the server seems to incorrectly respond to)

("echo 0 > /proc/fs/cifs/LinuxExtensionsEnabled" then try the mount)
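
A minimal sketch of that workaround as a reusable helper follows. It prints the previous setting so it can be restored after testing. The function name and the `CIFS_PROC_DIR` variable are hypothetical, introduced only for this example; the real control file is /proc/fs/cifs/LinuxExtensionsEnabled.

```shell
# Hypothetical helper: turn the cifs Unix Extensions off (0) or on (1) for
# subsequent mounts, and print the previous setting so it can be restored.
# CIFS_PROC_DIR is an assumption for illustration; default is /proc/fs/cifs.
set_cifs_unix_extensions() {
    new_value=$1
    proc_dir=${CIFS_PROC_DIR:-/proc/fs/cifs}
    knob=$proc_dir/LinuxExtensionsEnabled
    case $new_value in
        0|1) ;;
        *) echo "usage: set_cifs_unix_extensions 0|1" >&2; return 2 ;;
    esac
    if [ ! -e "$knob" ]; then
        echo "$knob not found (is the cifs module loaded?)" >&2
        return 1
    fi
    old_value=$(cat "$knob")
    echo "$new_value" > "$knob"
    echo "$old_value"
}
```

For example, `old=$(set_cifs_unix_extensions 0)` before the mount attempt and `set_cifs_unix_extensions "$old"` afterwards. The setting is global, so it affects every cifs mount made while it is in force.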

Comment 11 Kai Engert (:kaie) (inactive account) 2007-07-11 17:29:39 UTC

(In reply to comment #10)
> Try disabling the Unix Extensions to see if that will work around the
> server bug (the client will then not try this SetFSInfo request that
> the server seems to incorrectly respond to)
> 
> ("echo 0 > /proc/fs/cifs/LinuxExtensionsEnabled" then try the mount)

Wow, that helped.
Thanks!
I was able to mount and access/read/write files on the device now, wonderful.

I will attach a dmesg debug logfile in case you are curious about the details.

Should we (I) report something to the hardware vendor?

(Should cifs retry without extensions by default?)

Comment 12 Kai Engert (:kaie) (inactive account) 2007-07-11 17:31:32 UTC

Created attachment 158978 [details]
dmesg output from working mount (after disabling unix extensions)

Comment 13 Simo Sorce 2007-09-12 13:38:35 UTC

I am moving this to the kernel as it is obviously a cifs fs issue.

Comment 14 Christopher Brown 2007-09-17 20:36:01 UTC

Hello Kai,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris

Comment 15 Kai Engert (:kaie) (inactive account) 2007-09-19 01:23:13 UTC

Christopher, yes, this bug is still happening with latest Fedora-7
kernel-2.6.22.5-76.fc7

This bug is still waiting for an answer to my comment 11.

While a workaround exists (disable linux extensions in order to work around a
bug in the server device), the open question is: 


  If a server behaves as described in this bug report,
  should the cifs code automatically retry without extensions?

Comment 16 Christopher Brown 2007-09-19 08:33:47 UTC

I can't answer that one Kai so I'm re-assigning to the filesystem maintainer who
might be able to comment better.

Cheers
Chris

Comment 17 Jeff Layton 2007-09-19 13:36:44 UTC

Answering Kai's questions in comments #11 and #15 about fallback...

First, there is a mount option (nounix) in very recent kernels that should allow
you to forcibly disable unix extensions on a per-mount basis. That patch should
make it into 2.6.23 (I think).

It sounds like the server is just plain misbehaving here. I'd definitely file a
bug report with the vendor. You might want to include network traces and point
them toward the info in this BZ case.

As to falling back if the mount fails...

If the server claims to support unix extensions, I don't see why we should
second guess that and retry the mount without them. Mounts can fail for many
reasons and trying to automatically guess the reason why it failed can be very
difficult. This is the first case I've heard of where a server claimed to
support unix extensions when it actually doesn't. If this problem were more
widespread, I'd be more inclined to consider automatic workarounds for it.

I suggest we close this as NOTABUG...

Comment 18 Kai Engert (:kaie) (inactive account) 2007-09-19 14:46:33 UTC

Jeff, thanks for your explanations.
It's good to hear that we will soon see mount options that would enable the use
of such devices.

But I think we should try hard to be compatible with as many devices as possible
out of the box. The average user will connect the device and see a failure. IMHO
we should target those users. IIRC my device isn't the only one that behaves in
such a way, see the posts I quoted.

If we claim to be compatible with smb devices, and other or older distributions
can access the device out of the box, but our new code can not, then WE appear
broken (even if you can argue we are not).

After all, we're talking about an extension to the original protocol. If using
the extension fails, we should fall back.

FYI, this strategy is what we use with the SSL/TLS protocols in Firefox. That
protocol is evolving, too, and there are many servers that don't correctly
implement the new extensions. We try hard to fall back and give the users a
pleasant experience without error messages.
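
To make the proposal concrete, here is a userspace sketch of the retry idea. It is not what the cifs code does, and it retries on any failure rather than only on the error seen with this device. It relies on the `nounix` mount option mentioned in comment 17; the function name and the `MOUNT_CMD` override are hypothetical, added so the logic can be exercised without root.

```shell
# Sketch of "retry without Unix Extensions", done in userspace.
# MOUNT_CMD is a hypothetical override for illustration; default is mount.
mount_cifs_with_fallback() {
    share=$1          # e.g. //192.168.2.73/PUBLIC
    mountpoint=$2     # e.g. /mnt/icystor
    mount_cmd=${MOUNT_CMD:-mount}
    # First attempt: default behaviour, Unix Extensions used if offered.
    if "$mount_cmd" -t cifs "$share" "$mountpoint"; then
        return 0
    fi
    # Say so loudly, so that a misbehaving server does not go unnoticed.
    echo "warning: mount of $share failed, retrying with -o nounix" >&2
    "$mount_cmd" -t cifs -o nounix "$share" "$mountpoint"
}
```

The warning on stderr is a deliberate choice: a silent retry would hide the fact that something is wrong on the server side.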

Comment 19 Chuck Ebbert 2007-09-19 19:50:22 UTC

(In reply to comment #18)

Problem is, falling back might become a security problem for some users if they
are counting on the extensions being enabled.

Comment 20 Jeff Layton 2007-09-19 20:29:25 UTC

Right. There's also the issue of interpreting what the error you get back
actually means. In this case, we got an ENOTDIR, but that's not necessarily a
1:1 correspondence of problem and error. I'm not convinced that we can reliably
detect when the server is reporting that it supports POSIX extensions but doesn't.

There's also the chicken and egg problem. If we automatically work around this,
then most people won't know that their servers are broken this way. At least
with this method, users will know something is wrong...

The safest thing to me seems to be to leave things as they are. The manual
workarounds should work for anyone needing them until these servers are patched.

Eric suggested trying to report a more helpful error when this occurs. That
seems like it might be possible. I'll plan to give that a look and see whether
we can...

Comment 21 Steve French 2007-09-20 14:13:23 UTC

There are more problems here than responding incorrectly to the "Query File Unix
Basic" SMB transact2 call.  Various non-Unix (original SMB/CIFS) requests have
problems too, e.g.:
a) SMB TreeConnect does not return the file system type (so we cannot work
around this whenever we see a server whose filesystem is of the same type).
b) SMB SessionSetup does not return the server type (NativeOS and Network
Operating System name).  It does not even return the server's domain.
The response to QueryFSDeviceInfo returns an error, which is in fact
malformed/illegal (0xFFFF0002).  None of these have anything to do with Unix
Extensions.

It does claim to support Unix Extensions (in the negotiate protocol response
capabilities) but then returns a bizarre return code 0x00040002 on the initial
SetFSInfo, which presumably we don't map.  We could try to map this particular
return code to a new POSIX return code (e.g. make up
ERR_BROKEN_UNIX_EXTENSIONS) which we could look for in SetFSInfo.  We certainly
should log the failed SetFSInfo.  Perhaps we could add more text to the error on
line 4458 of fs/cifs/cifssmb.c
       cERROR(1, ("Send error in SETFSUnixInfo = %d", rc));
or add more text to line 1744 of fs/cifs/connect.c suggesting disabling unix
extensions?

                if (CIFSSMBSetFSUnixInfo(xid, tcon, cap)) {
                        cFYI(1, ("setting capabilities failed"));
                }

In any case, there are multiple problems with this server implementation as can
be seen from the various warning messages it triggered in dmesg on the client,
and these can also be seen from looking at the tcpdump trace.  Not all of these
problems have to do with the Unix extensions.  There may be a workaround
possibility based on the strange return code on SetFSInfo, but the other bugs
(not related to the Unix extensions) should also be fixed in the server even if
we work around this.

Comment 22 Steve French 2007-09-20 19:58:13 UTC

I added better debug messages for this case (instructing how to turn off Unix
Extensions when this error encountered) - could you try them out and see if they
help?

http://pserver.samba.org/samba/ftp/cifs-cvs/cifs-1.50c.tar.gz

Comment 23 Andrew Bartlett 2007-09-20 22:43:06 UTC

Given this is not a successful negotiation of the Unix extensions, I agree with
Kai that they should not be considered negotiated in the first place.  Just
logging it and asking the admin to fix things seems wrong.

If the server negotiated Unix extensions, then failed to honor them, this would
be a different situation.

Comment 24 Steve French 2007-09-21 01:52:26 UTC

The problem is that various servers supported (only) the original Unix
extensions but not SetFSInfo (Unix SetFSInfo level) and would be
indistinguishable at first from the broken server.

The server has already said:
1) it supports Unix Extensions (capability returned on Negotiate)
2) it supports some optional Unix capabilities (on the QFSInfo)

That it did not support SetFSInfo makes it look like an older (pre-"CIFS POSIX
Extensions") server, like a server which simply implements the original CIFS
Unix Extensions (i.e. those documented in the SNIA CIFS Technical Reference and
by HP before that).  We do not want to forbid using the Unix Extensions simply
because SetFSInfo was not supported; that was only added in the last few years,
and is not as important as the QueryFSInfo Unix (which it does support).

But it is after that that we get problems ...

It is not until the first stat (QueryPathInfo Unix for the root directory of the
mount) that we can see a malformed response.

It seems risky to turn off a set of features just because QueryUnixInfo of "."
fails.   Perhaps, on a corrupt/malformed Query Unix Info response, we could
always fall back to the non-Unix NT-style QueryPathInfo, but it seems drastic
to turn off other Unix Info features because of a bug in implementing one.

In any case, it is important for a bug report to be opened against the server
(it seems to have other SMB problems as I noted).

Comment 25 Steve French 2007-09-21 04:04:00 UTC

Created attachment 201661 [details]
patch to retry failed Unix Query Path Info with older CIFS style equivalent

Here is a patch (on top of cifs version 1.50c referenced above) to retry
UnixQPathInfo with older style Query Path Info.  Could you see if that gets you
past the worst of the problems (it will still log error messages a lot to dmesg
- we will have to make a change to only log those once - but I want to make
sure that something else is not broken too in the Unix Extensions
implementation).

Comment 26 Christopher Brown 2008-01-09 15:50:19 UTC

Can I bug people for an update on this? Is the nounix mount option working for
reporters here and have the error messages been made more .. helpful? Steve -
what is the latest on the patch you attached to this report?

Comment 27 Kai Engert (:kaie) (inactive account) 2008-01-09 16:08:11 UTC

Thanks for the reminder.
I apologize for not yet having recompiled a kernel with this patch.
Let me try to compile the latest Fedora 7 kernel with the attached patch now.

Comment 28 Kai Engert (:kaie) (inactive account) 2008-01-09 16:35:36 UTC

I downloaded cifs 1.50c from the link in comment 22.
Then I tried to apply the attached patch.
Hunk 2 applied.

There is no context that completely matches Hunk 1, because I can't find 
  cERROR(1, ("Malformed FILE_UNIX_BASIC_INFO response.\n"
  ...
anywhere in the file.

But the second part of the context
      } else {
              __u16 data_offset = le16_to_cpu(pSMBr->t2.DataOffset);
              memcpy((char *) pFindData,
is present in the file only once,
so I think I know where I must change -EIO to -EOPNOTSUPP

Comment 29 Kai Engert (:kaie) (inactive account) 2008-01-10 10:50:19 UTC

It confuses me that I'm getting compiler errors when trying to recompile the
latest Fedora 7 kernel SRPM on a Fedora 7 system...

drivers/net/wireless/ipw2200.c:1236:23: error: bad constant expression
drivers/net/wireless/ath5k/base.c:288:38: error: marked inline, but without a
definition
drivers/net/wireless/ath5k/base.c:253:36: error: marked inline, but without a
definition
drivers/net/wireless/ath5k/base.c:253:36: error: marked inline, but without a
definition
drivers/net/wireless/ath5k/base.c:253:36: error: marked inline, but without a
definition
drivers/net/wireless/ath5k/base.c:281:35: error: marked inline, but without a
definition
drivers/net/wireless/ath5k/base.c:288:38: error: marked inline, but without a
definition


It seems strange, but the above compiler errors seem to be a consequence of
cifs-1.50c

I can successfully rebuild the unchanged kernel source rpm.

When I try to build a rpm that has the cifs-1.50c added as a patch, please a
small change to kernel.spec to apply that patch, then I get above compiler errors...

Comment 30 Kai Engert (:kaie) (inactive account) 2008-01-10 10:51:40 UTC

Created attachment 291263 [details]
cifs 1.50c as a patch against fedora 7 kernel rpm

This is the patch I used, which results in the compiler errors mentioned in the
previous comment.

Comment 31 Kai Engert (:kaie) (inactive account) 2008-01-10 10:53:06 UTC

> When I try to build a rpm that has the cifs-1.50c added as a patch, please a
> small change to kernel.spec to apply that patch, then I get above compiler
errors...


s/please a/plus a/

Comment 32 Kai Engert (:kaie) (inactive account) 2008-04-22 15:53:12 UTC

I would like to donate the NAS hardware, with free shipping, to someone who would
be interested in working on this and testing it.

Unfortunately I've been unable to get it working for testing, see above. I think
it would be simpler if coding and testing can be done at the same time.

Who wants it? Steve?

Comment 33 Bug Zapper 2008-05-14 13:25:47 UTC

This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 34 Bug Zapper 2008-06-17 01:48:35 UTC

Fedora 7 changed to end-of-life (EOL) status on June 13, 2008. 
Fedora 7 is no longer maintained, which means that it will not 
receive any further security or bug fix updates. As a result we 
are closing this bug. 

If you can reproduce this bug against a currently maintained version 
of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 35 Kai Engert (:kaie) (inactive account) 2009-06-22 12:23:14 UTC

More than one year later, I have now repeated my attempts to use this hardware with Fedora 10.

I'm happy to report that it's working now, when using -o nosfu,nounix

Thanks a lot!
Regards,
Kai