From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3

Description of problem:
Hello,

We have two Fedora 5 servers clustered with GFS. We installed samba and exported the same shares from both of them. All went fine at first, with people accessing their own files, but for some programs (minitab, matlab, ...) people need to access the same file at once. Then samba begins to fail and the clients hang. To recover, it is necessary to restart the samba service.

We have tried putting the shares on a filesystem without GFS and all goes well: people can access the same file simultaneously without problems.

It is a weird behaviour because the shares are exported from the two servers, but we really only access files simultaneously through the first server. The other server exports the shares too, but it isn't used by those clients.

I don't know how to debug this problem to see what is happening. It seems to be something related to GFS and samba. I have seen mails from people with samba+GFS problems, but we aren't using the same configuration, and the GFS rpms are updated:

GFS-6.1.5-0.FC5.1
GFS-kernel-2.6.15.1-5.FC5.32

Any help will be greatly appreciated.

Thanks,

Version-Release number of selected component (if applicable):
GFS-6.1.5-0.FC5.1

How reproducible:
Always

Steps to Reproduce:
1. Log in from four or five Windows XP clients.
2. Try to run the minitab program on all of them simultaneously.
3. Samba and the PCs hang.

The minitab program is in a GFS share, but if we put it in a share without GFS all goes well.

Actual Results:
Samba hangs. Minitab hangs also.

Expected Results:
The file is accessed without problems.

Additional info:
The same happens with other programs (matlab).
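The reproduction steps above go through samba and Windows clients. To narrow down whether concurrent access alone is enough to trigger the hang, a local check run directly on the server might help. The following is only a minimal sketch under assumptions: the function names are hypothetical, it uses a plain shared flock() rather than whatever lock mode samba requests, and it has not been run against GFS.

```python
import fcntl
import os
import threading
import time


def open_and_lock(path, results, index):
    """Open path read-only, take and release a shared flock, record success."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)  # shared lock: several holders may coexist
        os.read(fd, 4096)
        fcntl.flock(fd, fcntl.LOCK_UN)
        results[index] = True
    finally:
        os.close(fd)


def concurrent_access(path, workers=5, timeout=10.0):
    """Return how many workers finished before the overall timeout expired."""
    results = [False] * workers
    threads = [
        threading.Thread(target=open_and_lock, args=(path, results, i), daemon=True)
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    for t in threads:
        # Wait only for the time remaining until the shared deadline.
        t.join(max(0.0, deadline - time.monotonic()))
    return sum(results)
```

Calling concurrent_access() with the path of a file on the GFS share and again with a file on a non-GFS filesystem would show whether fewer workers complete on GFS. A result lower than the number of workers means some calls were still blocked when the timeout expired.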
I attach more information (strace) about the samba hangs. It seems to be exclusively a GFS problem (not samba), and it happens when 4 or more users access the same information simultaneously.

I tried "strace -f -ttT -o /tmp/smbd.out -p <smbd-pid>" to work out what is happening, and it seems that system calls like write, open and flock never finish until samba is restarted.

4665 11:09:31.068381 kill(4666, SIG_0 <unfinished ...>
4665 11:09:31.068750 <... kill resumed> ) = -1 EPERM (Operation not permitted) <0.000310>
4665 11:09:31.068996 kill(4665, SIG_0 <unfinished ...>
4665 11:09:31.069260 <... kill resumed> ) = 0 <0.000205>
4665 11:09:31.069458 kill(4667, SIG_0 <unfinished ...>
4665 11:09:31.069617 <... kill resumed> ) = 0 <0.000099>
4665 11:09:31.069781 open("cint95-intel.mtw", O_RDONLY|O_LARGEFILE <unfinished ...>
4665 11:09:31.070150 <... open resumed> ) = 22 <0.000293>
4665 11:09:31.070396 geteuid32( <unfinished ...>
4665 11:09:31.070649 <... geteuid32 resumed> ) = 503 <0.000195>
4665 11:09:31.070937 write(19, "prova03 opened file cint95-intel"..., 67 <unfinished ...>
4665 11:09:31.071282 <... write resumed> ) = 67 <0.000261>
4665 11:09:31.071511 flock(22, 0x60 /* LOCK_??? */ <unfinished ...>
4665 11:09:31.071770 <... flock resumed> ) = 0 <0.000197>
4665 11:09:31.072127 write(5, "\0\0\0g\377SMB\242\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 107 <unfinished ...>
4665 11:09:31.072447 <... write resumed> ) = 107 <0.000212>
.....................................................................
4665 11:09:31.242316 <... geteuid32 resumed> ) = 503 <0.000118>
4665 11:09:31.242405 write(19, "close fd=22 fnum=6371 (numopen=2"..., 34) = 34 <0.000031>
4665 11:09:31.242572 nanosleep({0, 2000001}, <unfinished ...>
4667 11:09:31.245063 kill(4665, SIG_0) = 0 <0.000018>
4665 11:09:31.248047 <... nanosleep resumed> NULL) = 0 <0.005406>
4665 11:09:31.249355 nanosleep({0, 2000001}, NULL) = 0 <0.002621>
4665 11:09:31.252091 nanosleep({0, 2000001}, NULL) = 0 <0.003853>
4665 11:09:31.256088 nanosleep({0, 2000001}, NULL) = 0 <0.003906>
.................. a lot of nanosleeps ..............................
4665 11:10:04.887037 nanosleep({0, 2000001}, <unfinished ...>
4665 11:10:04.887219 <... nanosleep resumed> 0) = ? ERESTART_RESTARTBLOCK (To be restarted) <0.000111>
4665 11:10:04.888197 +++ killed by SIGKILL +++
4667 11:10:04.890712 kill(4665, SIG_0 <unfinished ...>
4666 11:10:04.920965 kill(4665, SIG_0) = -1 ESRCH (No such process) <0.000017>
4667 11:10:04.934486 kill(4665, SIG_0 <unfinished ...>
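Since the full /tmp/smbd.out file is long, one way to find the calls that never return might be to scan it for "<unfinished ...>" entries that have no later "resumed" line for the same PID. The script below is a minimal sketch, assuming the "strace -f -ttT" line layout shown in the excerpt above. The function name pending_calls is hypothetical and the patterns cover only that layout.

```python
import re

# "PID TIME syscall(args <unfinished ...>"
UNFINISHED = re.compile(r"^(\d+)\s+(\S+)\s+(\w+)\(.*<unfinished \.\.\.>\s*$")
# "PID TIME <... syscall resumed> ..."
RESUMED = re.compile(r"^(\d+)\s+\S+\s+<\.\.\. (\w+) resumed>")


def pending_calls(lines):
    """Map each PID to the (syscall, timestamp) of its last unresumed call."""
    pending = {}
    for raw in lines:
        line = raw.strip()
        started = UNFINISHED.match(line)
        if started:
            pid, timestamp, name = started.groups()
            pending[pid] = (name, timestamp)
            continue
        resumed = RESUMED.match(line)
        if resumed:
            pid, name = resumed.groups()
            # Clear the entry only if it is the call that was waiting.
            if pid in pending and pending[pid][0] == name:
                del pending[pid]
    return pending
```

For example, running it over the trace file and printing the result:

```python
with open("/tmp/smbd.out") as trace:
    for pid, (name, timestamp) in pending_calls(trace).items():
        print(pid, name, timestamp)
```

Any PID left in the output was inside a system call when the trace ended, which is where a blocked open, write or flock would show up.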
Hi Sandra,

I believe you were able to get past this issue with the CVS RHEL4 codebase on RHES (http://www.redhat.com/archives/linux-cluster/2006-October/msg00291.html). Can you please verify that it works for you on Fedora as well, so I can close this bugzilla? If not, we need to find a solution to this.

Thanks,
--Abhi
Hi Abhi,

I finally compiled CVS RHEL4 for Fedora 5, but it was impossible to make ccsd work. Perhaps it is because the Fedora 5 machine that we have for testing had earlier GFS versions installed, and they were interfering with the new installation. I was unable to make it work even with ccs in debug mode, so I gave up on it.

In fact, we are now asking in Spain for offers on a RHEL academic license plus GFS.

Best Regards,
Sandra Hernández
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change the bug to that version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out
http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.