Bug 144865 - Samba nmbd fails with error
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: samba
Version: 3
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: Jay Fenlason
Reported: 2005-01-11 23:00 EST by Ken Hall
Modified: 2014-08-31 19:27 EDT (History)
Doc Type: Bug Fix
Last Closed: 2005-10-24 15:53:48 EDT
Attachments
smb.conf requested (12.35 KB, text/plain)
2005-01-12 21:42 EST, Ken Hall
valgrind log from nmbd (24.59 KB, text/plain)
2005-01-13 21:57 EST, Ken Hall
a patch to make -mtune=pentium4 compilation results in OK nmbd binary (1.18 KB, patch)
2005-03-09 13:21 EST, Dmitry Butskoy
Patch to replace ugly code that upsets gcc with less ugly code (811 bytes, patch)
2005-03-10 12:19 EST, Jay Fenlason

Description Ken Hall 2005-01-11 23:00:04 EST
Description of problem:

nmbd crashes after 10-15 minutes with the following:

[2005/01/11 21:54:35, 4] nmbd/nmbd_packets.c:retransmit_or_expire_response_records(1606)
  retransmit_or_expire_response_records: timeout for packet id 4922 to IP 192.168.1.100 on subnet UNICAST_SUBNET
*** glibc detected *** free(): invalid next size (fast): 0x09366e68 ***



Version-Release number of selected component (if applicable):

samba-3.0.10-1.fc3
glibc-2.3.4-2.fc3

How reproducible:

Every time

Steps to Reproduce:
1.  Start nmbd
  
Actual results:

Processes continue to run, but nmbd does not answer queries

Expected results:


Additional info:
Comment 1 Ken Hall 2005-01-11 23:00:57 EST
wins support = yes
wins proxy = yes (also fails if set to no)
Comment 2 Jay Fenlason 2005-01-12 17:10:41 EST
Please attach your complete /etc/samba/smb.conf file to this bug 
report. 
Comment 3 Ken Hall 2005-01-12 21:42:00 EST
Created attachment 109707 [details]
smb.conf requested
Comment 4 Jay Fenlason 2005-01-13 19:18:35 EST
WORKSFORME here.  My nmbd has been stubbornly staying up for hours
ever since I configured it similarly to yours.

Could you try running nmbd under valgrind and see if it gives you
any useful information?  Kill your running nmbd processes and use
something like
valgrind --tool=memcheck --trace-children=yes /usr/sbin/nmbd -D
(you might want to run it in a terminal window under script so
you'll have a permanent record of everything it says).
Comment 5 Ken Hall 2005-01-13 21:57:37 EST
Created attachment 109767 [details]
valgrind log from nmbd

I ran it for about an hour under valgrind.  The attachment Thursday.txt
contains the end of the output, about 20-30 minutes' worth.  Under valgrind, it
never stopped responding, so I'm not sure whether the results in the file are
significant.  Run normally, it fails consistently, anywhere from 5
seconds to a couple of hours after starting.  Note that the processes don't
fail completely; nmbd just stops responding to queries.  One of the processes
always hangs in a state that requires kill -9 to terminate.

I also tried a "cold start", clearing out the tdb files from
/var/cache/samba, but it didn't help.

For reference, this is a 1 GHz Athlon with 256 MB of DDR RAM.

kernel-2.6.9-1.724_FC3

All packages were current per up2date as of about a week ago.
Comment 6 Andrew Meredith 2005-02-18 06:28:53 EST
Same issue with roughly the same setup.

FC3 (up to date)

kernel-2.6.10-1.766_FC3
samba-3.0.10-1.fc3
Comment 7 Andrew Meredith 2005-02-18 09:19:23 EST
I have just rewound samba back to the FC3 base version
(3.0.8-0.pre1.3) and found comparable glibc-based issues. I then
rewound glibc back to the FC3 base (2.3.3-74) and things were much the
same. For completeness I then wound samba forward, leaving glibc behind,
and things were again much the same. I am now back up to date.

I am running a script to kill and restart nmbd when it can no longer
answer its own name. As you can see it is getting some exercise, and
there doesn't seem to be any pattern to the lifetime.

# ./nmbd_restarter
Fri Feb 18 13:09:07 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:10:47 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:11:09 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:13:09 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:13:36 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:14:04 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:14:27 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:14:44 GMT 2005 Kill and restart nmbd
Fri Feb 18 13:15:43 GMT 2005 Kill and restart nmbd
.....
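
The restarter script itself is not attached to this bug. As a rough sketch of the idea only, written in C to match the test case later in this report, a watchdog of that kind could look like the following. The function names and the choice to treat any non-zero exit status of the probe command as a failure are assumptions; both commands are supplied by the caller, since the report does not say what the script actually ran.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 when the probe command exits with status 0, else 0.
   (Hypothetical helper; the probe command is chosen by the caller.)  */
int
probe_ok (const char *probe_cmd)
{
  assert (probe_cmd != NULL);
  return system (probe_cmd) == 0;
}

/* One watchdog pass: if the probe fails, log a timestamped line and run
   the restart command.  Returns 1 if a restart was triggered, else 0.  */
int
watchdog_pass (const char *probe_cmd, const char *restart_cmd)
{
  char stamp[64];
  time_t now;

  if (probe_ok (probe_cmd))
    return 0;

  now = time (NULL);
  strftime (stamp, sizeof (stamp), "%a %b %e %H:%M:%S %Z %Y",
            localtime (&now));
  printf ("%s Kill and restart nmbd\n", stamp);
  fflush (stdout);

  if (system (restart_cmd) != 0)
    fprintf (stderr, "restart command failed\n");
  return 1;
}

/* Repeat the pass forever, sleeping interval_sec seconds in between.  */
void
watchdog_loop (const char *probe_cmd, const char *restart_cmd,
               unsigned int interval_sec)
{
  for (;;)
    {
      watchdog_pass (probe_cmd, restart_cmd);
      sleep (interval_sec);
    }
}
```

Calling watchdog_loop () with a name lookup for the server's own name as the probe (an nmblookup query would be one guess) and the init script's restart command would produce log lines in the same format as the output above.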

Comment 8 Ken Hall 2005-02-20 21:03:21 EST
I actually "solved" this by copying nmbd from one of my other servers
that's still running Samba 2.2.7a under Red Hat Linux 9.  That version works
perfectly, even running under FC3.
Comment 9 Dmitry Butskoy 2005-03-04 13:01:54 EST
  Same issue here, even with samba-3.0.11 (from rawhide) recompiled
under FC3 ...

wins support = yes
wins proxy = yes
dns proxy = yes
Comment 10 Dmitry Butskoy 2005-03-05 12:22:29 EST
  It seems to be a compiler bug.

  When I install samba-3.0.7-2.FC1 from the latest FC1 updates, all is OK.
  When I recompile the same package under FC3, the issue appears.
  When I then recompile samba-3.0.7-2.FC1 under FC3 with compat-gcc (gcc
3.3), all is OK too.

  Compiling samba-3.0.11-5 (from rawhide) gives the same results:
when compiled with FC3's gcc-3.4.2, the issue appears; when compiled with
the old gcc (3.3 from compat-gcc-8-3.3.4.2), all works fine.


  Jay,

  Are there newer rawhide-like builds of the gcc-3.4 compiler available
somewhere? I have a little time now to test them. (IMHO, reproducing
this issue in another environment would take much more effort...)
Comment 11 Dmitry Butskoy 2005-03-09 09:36:55 EST
  The issue disappears even with gcc-3.4, as long as we don't use
"-mtune=pentium4"; "-mcpu=i686" and "-mtune=pentium3" are OK.

  Also note that the target host (where nmbd fails) has a pentium3, not
a pentium4. The CPU microcode is updated by microcode_ctl on each boot.
Comment 12 Dmitry Butskoy 2005-03-09 10:43:58 EST
  The file "source/nmbd/nmbd_winsproxy.c" is the problem.
  When this file is compiled with "-march=i386 -mtune=pentium4", the
issue appears (even if it is the only file built for pentium4). Conversely,
when all other files are compiled with "-mtune=pentium4" but this file with
"-mtune=pentium3", the issue disappears...
Comment 13 Dmitry Butskoy 2005-03-09 13:21:40 EST
Created attachment 111818 [details]
a patch to make -mtune=pentium4 compilation results in OK nmbd binary

  This patch "localizes" the problematic part of the code.
  If this patch is applied, compilation with "-march=i386 -mtune=pentium4"
results in a correct "nmbd" binary. If not, the nmbd issue appears (the target
CPU is a pentium3).

  Any ideas?
Comment 14 Jakub Jelinek 2005-03-09 19:04:36 EST
Simplified testcase:
/* { dg-do run } */
/* { dg-options "-O2" } */
/* { dg-options "-O2 -fpic -march=i386 -mtune=pentium4" { target i?86-*-* } } */

extern void abort (void);

struct S
{
  void (*s1) (void);
  void (*s2) (void);
  unsigned int s3;
  char s4[16];
};

static void h1 (void) { }
static void h2 (void) { }

void
bar (struct S *x)
{
  long *a[sizeof (struct S) / sizeof (long *) + 1];
  if (x->s1 != h1 || x->s2 != h2 || x->s3 != sizeof (a))
    abort ();
}

void
foo (void *x)
{
  long *a[sizeof (struct S) / sizeof (long *) + 1];
  struct S *b = (struct S *) a;

  __builtin_memset (a, '\0', sizeof (a));
  b->s1 = h1;
  b->s2 = h2;
  b->s3 = sizeof (a);
  __builtin_memcpy (b->s4, (char *) &x, sizeof (void *));
  bar (b);
}

int
main (void)
{
  foo ((void *) 0);
  return 0;
}

Now, I'm not sure whether this is valid C or not.  Although the struct is written
using memset (i.e. through a char pointer, which aliases anything), its declared type
has a different alias set than the one used when writing the structure fields.
If this is valid C, the fix would be
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg02682.html
(to stay on the safe side, perhaps just the set_mem_alias_set and set_mem_size
after set_mem_attributes).

Certainly using a long * array in nmbd is completely useless; it should have been
e.g. a union of struct userdata_struct and an appropriately sized char pad[...];.
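
The union suggested above might be sketched as follows, reusing struct S from the testcase rather than the real struct userdata_struct. This is only an illustration: the names union S_buf, pad and the int-returning foo () and bar () are hypothetical, the pad size is an arbitrary choice, and this is not the patch that went upstream. The point is that the buffer's declared type now contains struct S, so the field stores no longer go through a cast from an unrelated array type.

```c
#include <assert.h>
#include <string.h>

/* Same layout as struct S in the testcase above.  */
struct S
{
  void (*s1) (void);
  void (*s2) (void);
  unsigned int s3;
  char s4[16];
};

/* Hypothetical replacement for the long * array: the declared type of
   the buffer includes struct S, and pad only reserves extra room.  */
union S_buf
{
  struct S s;
  char pad[sizeof (struct S) + sizeof (void *)];
};

static void h1 (void) { }
static void h2 (void) { }

/* Returns 1 when every field reads back as written, 0 otherwise.  */
static int
bar (const struct S *x, void *expected)
{
  void *stored;

  memcpy (&stored, x->s4, sizeof (stored));
  return x->s1 == h1 && x->s2 == h2
         && x->s3 == sizeof (union S_buf) && stored == expected;
}

/* Counterpart of foo () in the testcase, built on the union.  */
int
foo (void *x)
{
  union S_buf buf;

  assert (sizeof (buf.s.s4) >= sizeof (x));
  memset (&buf, '\0', sizeof (buf));
  buf.s.s1 = h1;
  buf.s.s2 = h2;
  buf.s.s3 = sizeof (buf);
  memcpy (buf.s.s4, &x, sizeof (x));
  return bar (&buf.s, x);
}
```

Here foo () returns 1 when the fields survive the round trip, instead of calling abort () as the testcase does.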
Comment 15 Dmitry Butskoy 2005-03-10 06:23:41 EST
  Unfortunately, it is valid C code.
  The simplified test is good (but unlike nmbd, it also crashes with
-mtune=pentium3 and any of the other pentiums).

  Some playing with this test:
-0     -mtune=*                                 OK
-O2    -mtune={i386|i486}                       OK
-O2    -mtune={i586|i686|pentium3|pentium4}     abort
-O3    -mtune=*                                 OK


  Also, bug #142943 seems to be a duplicate of this nmbd issue...
Comment 16 Alexandre Oliva 2005-03-10 11:53:49 EST
I think the code invokes undefined behavior because the initial memset
writes to a, so it's not obvious to me that this conflicts with the
sets that use b.  Does changing the memset to use b make any difference?
Comment 17 Jay Fenlason 2005-03-10 12:16:25 EST
I've submitted a patch to the upstream Samba folks that's been 
accepted for 3.0.12, so this'll all be academic eventually. 
 
Comment 18 Jay Fenlason 2005-03-10 12:19:18 EST
Created attachment 111861 [details]
Patch to replace ugly code that upsets gcc with less ugly code
Comment 19 Dmitry Butskoy 2005-03-10 12:43:02 EST
> Does changing the memset to use b make any difference?
Yes, the abort disappears.

  IMHO, there is a lot of similar "ugly" code in other sources
(not only in samba). And many programmers, unfortunately, do not consider
such code ugly.
  Therefore it is not only a samba-related problem. Maybe something
should be done in gcc itself? At least a warning message...
  
Comment 20 Dmitry Butskoy 2005-04-15 07:39:39 EDT
  After the update to gcc-3.4.3-22.fc3, the issue still occurs (with the simplified
test, see above).
  With gcc4 (gcc4-4.0.0-0.41.fc3) the issue disappears...
Comment 21 Dmitry Butskoy 2005-08-04 10:54:05 EDT
 It still occurs after the update to FC3's newest gcc, gcc-3.4.4-2.fc3 ...
Comment 22 Jay Fenlason 2005-10-24 15:53:48 EDT
My patch went upstream, so I can close this. 
