Bug 144865
Summary: | Samba nmbd fails with error | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Ken Hall <kjhall55> |
Component: | samba | Assignee: | Jay Fenlason <fenlason> |
Status: | CLOSED RAWHIDE | Last Closed: | 2005-10-24 19:53:48 UTC |
Severity: | medium | Priority: | medium |
Version: | 3 | CC: | andrew, dmitry, jakub, jfeeney, oliva |
Hardware: | i686 | OS: | Linux |
Doc Type: | Bug Fix | | |
Description
Ken Hall
2005-01-12 04:00:04 UTC
wins support = yes
wins proxy = yes (also fails if set to no)

Please attach your complete /etc/samba/smb.conf file to this bug report.

Created attachment 109707
smb.conf requested
WORKSFORME here. My nmbd has been stubbornly staying up for hours ever since I configured it similarly to yours. Could you try running nmbd under valgrind and see if it gives you any useful information? Kill your running nmbd processes and use something like

    valgrind --tool=memcheck --trace-children=yes /usr/sbin/nmbd -D

(you might want to run it in a terminal window under script(1) so you'll have a permanent record of everything it says).

Created attachment 109767
valgrind log from nmbd
I ran it for about an hour under valgrind. The attachment Thursday.txt
contains the end of the output, about 20-30 minutes' worth. Under valgrind, it
never stopped responding, so I'm not sure whether the results in the file are
significant. Run normally, it fails consistently anywhere from 5
seconds to a couple of hours after starting. Note that the processes don't
die completely; nmbd just stops responding to queries. One of the processes
always hangs in a state that requires kill -9 to terminate.
I also tried a "cold start", clearing out the tdb files from
/var/cache/samba, but it didn't help.
For reference, this is a 1 GHz Athlon with 256 MB of DDR RAM.
kernel-2.6.9-1.724_FC3
All packages were current per up2date as of about a week ago.
Same issue with roughly the same setup:

    FC3 (up to date)
    kernel-2.6.10-1.766_FC3
    samba-3.0.10-1.fc3

I have just rewound samba back to the FC3 base version (3.0.8-0.pre1.3) and found comparable glibc-based issues. I then rewound glibc back to the FC3 base (2.3.3-74), and things were much the same. For completeness, I then wound samba forward while leaving glibc behind, and things were again much the same. I am now back up to date.

I am running a script to kill and restart nmbd when it can no longer answer its own name. As you can see, it is getting some exercise, and there doesn't seem to be any pattern to the lifetime.

    # ./nmbd_restarter
    Fri Feb 18 13:09:07 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:10:47 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:11:09 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:13:09 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:13:36 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:14:04 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:14:27 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:14:44 GMT 2005 Kill and restart nmbd
    Fri Feb 18 13:15:43 GMT 2005 Kill and restart nmbd
    .....

I actually "solved" this by copying nmbd from one of my other servers that's still running Samba 2.2.7a under Red Hat 9. That version works perfectly, even running under FC3.

The same issue here too, even with samba-3.0.11 (from rawhide) recompiled under FC3 ...

    wins support = yes
    wins proxy = yes
    dns proxy = yes

It seems to be a compiler bug. When I install samba-3.0.7-2.FC1 from the latest FC1 updates, all is OK. When I recompile the same package under FC3, the issue appears. Then I recompile samba-3.0.7-2.FC1 under FC3 with compat-gcc (gcc 3.3): all is OK too. Now I compile samba-3.0.11-5 (from rawhide) with the same results: when compiled by FC3's gcc-3.4.2, the issue appears; when compiled by the old gcc (3.3 from compat-gcc-8-3.3.4.2), all works fine.

Jay, are there newer rawhide-like versions of the gcc-3.4 compiler somewhere? I have some time now to test them.
(IMHO, reproducing this issue in another environment would take much more effort...)

The issue disappears even with gcc-3.4 when we don't use "-mtune=pentium4"; "-mcpu=i686" and "-mtune=pentium3" are OK. Also note that the target host (where nmbd fails) has a Pentium III, not a Pentium 4. The CPU microcode is updated by microcode_ctl at each boot.

The file "source/nmbd/nmbd_winsproxy.c" is the problem. When this file is compiled with "-march=i386 -mtune=pentium4", the issue appears (even if only this one file is built for P4). Conversely, when all files are compiled with "-mtune=pentium4" but this one with "-mtune=pentium3", the issue disappears...

Created attachment 111818
a patch to make -mtune=pentium4 compilation result in an OK nmbd binary
This patch localizes the problematic part of the code.
If this patch is applied, compiling with "-march=i386 -mtune=pentium4"
results in a correct nmbd binary. If not, the nmbd issue appears (the target
CPU is a Pentium III).
Any ideas?
Simplified testcase:

    /* { dg-do run } */
    /* { dg-options "-O2" } */
    /* { dg-options "-O2 -fpic -march=i386 -mtune=pentium4" { target i?86-*-* } } */

    extern void abort (void);

    struct S
    {
      void (*s1) (void);
      void (*s2) (void);
      unsigned int s3;
      char s4[16];
    };

    static void h1 (void) { }
    static void h2 (void) { }

    void
    bar (struct S *x)
    {
      long *a[sizeof (struct S) / sizeof (long *) + 1];
      if (x->s1 != h1 || x->s2 != h2 || x->s3 != sizeof (a))
        abort ();
    }

    void
    foo (void *x)
    {
      long *a[sizeof (struct S) / sizeof (long *) + 1];
      struct S *b = (struct S *) a;
      __builtin_memset (a, '\0', sizeof (a));
      b->s1 = h1;
      b->s2 = h2;
      b->s3 = sizeof (a);
      __builtin_memcpy (b->s4, (char *) &x, sizeof (void *));
      bar (b);
    }

    int
    main (void)
    {
      foo ((void *) 0);
      return 0;
    }

Now, I'm not sure whether this is valid C. Although the struct is written using memset (i.e. through a char pointer, which may alias anything), its declared type has a different alias set than the one used when writing the structure fields. If this is valid C, the fix would be http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02682.html (to stay on the safe side, perhaps just the set_mem_alias_set and set_mem_size after set_mem_attributes). Certainly, using a long * array in nmbd is completely useless; it should have been, e.g., a union of struct userdata_struct and an appropriately sized char pad[...];.

Unfortunately, it is valid C code. The simplified test is good (but unlike nmbd, it also crashes with -mtune=pentium3 and any other Pentium). Some playing with this test:

    -O   -mtune=*                               OK
    -O2  -mtune={i386|i486}                     OK
    -O2  -mtune={i586|i686|pentium3|pentium4}   abort
    -O3  -mtune=*                               OK

Also, bug #142943 seems to be a duplicate of this nmbd issue...

I think the code invokes undefined behavior because the initial memset writes to a, so it's not obvious to me that this conflicts with the sets that use b. Does changing the memset to use b make any difference?
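The union suggested above (instead of a long * array) can be sketched roughly as follows. This is a minimal illustration built on the testcase's struct S, not the actual nmbd change: union S_storage, s_is_filled and fill_and_check are hypothetical names, and the pad size is an assumption chosen to leave at least as much room as the original long * array did. The idea is that the buffer's declared type now contains struct S, so the field stores no longer go into an object declared as long *[].

```c
#include <assert.h>
#include <string.h>

/* Same layout as struct S in the simplified testcase. */
struct S
{
  void (*s1) (void);
  void (*s2) (void);
  unsigned int s3;
  char s4[16];
};

/* Hypothetical replacement for the "long *a[...]" scratch buffer.
   The union's declared type contains struct S, so the stores through
   u.s below target an object of that type.  The pad member keeps at
   least the extra room the original array provided (an assumption). */
union S_storage
{
  struct S s;
  char pad[sizeof (struct S) + sizeof (long *)];
};

static void h1 (void) { }
static void h2 (void) { }

/* Mirrors bar () from the testcase, returning 1 instead of aborting. */
int
s_is_filled (const struct S *x)
{
  return x->s1 == h1 && x->s2 == h2
         && x->s3 == sizeof (union S_storage);
}

/* Mirrors foo () from the testcase, using the union as storage. */
int
fill_and_check (void *x)
{
  union S_storage u;

  assert (sizeof (u.pad) >= sizeof (struct S));
  memset (&u, '\0', sizeof (u));        /* zero every byte of the union */
  u.s.s1 = h1;
  u.s.s2 = h2;
  u.s.s3 = sizeof (u);
  memcpy (u.s.s4, &x, sizeof (void *)); /* copy the pointer's bytes */
  return s_is_filled (&u.s);
}
```

Calling fill_and_check ((void *) 0) returns 1 when every field reads back as written. Compared with casting a long * array, the union gives the compiler an accurate type for the storage while still providing the trailing room that the original buffer was sized for.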
I've submitted a patch to the upstream Samba folks that's been accepted for 3.0.12, so this'll all be academic eventually.

Created attachment 111861
Patch to replace ugly code that upsets gcc with less ugly code
> Does changing the memset to use b make any difference?
Yes, the abort disappears.
IMHO, there is a lot of similar "ugly" code in other sources
(not only Samba). And many programmers, unfortunately, do not consider
such code ugly.
So this is not only a Samba problem. Maybe something could be done
in gcc itself? At least a warning message...
After updating to gcc-3.4.3-22.fc3, the issue still occurs (with the simplified test, see above). With gcc4 (gcc4-4.0.0-0.41.fc3), the issue disappears...

It still occurs after updating to the newest FC3 gcc-3.4.4-2.fc3 ...

My patch went upstream, so I can close this.