Description of problem:
Newer kernels are causing an existing app to fail via seg fault when the app uses ftw(). Reverting to an older kernel fixes the problem. Recompiling the app under the newer kernel still causes the seg fault.

Version-Release number of selected component (if applicable):
I don't know at what kernel this started, but several recent kernels don't work as expected. 2.6.15-1.1833_FC4 works just fine.

How reproducible:
Simple call to ftw() from one kernel release to the next shows that under newer kernels, ftw seg faults.
I should also mention that the routine uses ftw twice. This works as expected in older kernels. The second call to ftw is the one that immediately fails with a seg fault. I placed some fprintf statements in the subroutine that ftw is supposed to call for each iteration, and apparently my subroutine never gets called as the fprintf's at the very top of that subroutine never get executed.
Please provide a self-contained testcase which reproduces this, i.e. a small C program with the ftw calls and directory tree on which ftw is called.
I tried to create a short test pgm, but can't get it to fail in the trivial case. However, I modified the actual routine that is failing to show that the failure is somewhere in the ftw code.

Here's my function definition:

  __ftw_func_t ftw_func (const char *filename, const struct stat *stats, int flag)

The only executable code in it right now is:

  fprintf(stderr, "Working on %s.\n", filename);
  return 0;

I modified the main routine that uses ftw as follows:

  fprintf(stderr, "Replacing strings. %s \n", clone_hierarchy_to);
  ftw_func(NULL,NULL,NULL);
  fprintf(stderr, "Replacing strings. %s \n", clone_hierarchy_to);
  if (ftw(clone_hierarchy_to, (__ftw_func_t) ftw_func, 10) != 0) {
    perror("ftw_func");
    exit(2);
  }
  fprintf(stderr, "Done Replacing strings. %s \n", clone_hierarchy_to);

When I call my ftw_func directly with NULL's, I get the fprintf output. When I try to get ftw to call it, it aborts as follows via strace:

  write(2, "Replacing strings. /home/vpopmai"..., 61Replacing strings. /home/vpopmail/domains/private.ycc/1/qqq
  ) = 61
  write(2, "Working on (null).\n", 19Working on (null).
  ) = 19
  write(2, "Replacing strings. /home/vpopmai"..., 61Replacing strings. /home/vpopmail/domains/private.ycc/1/qqq
  ) = 61
  stat64("/home/vpopmail/domains/private.ycc/1/qqq", {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0
  open("/home/vpopmail/domains/private.ycc/1/qqq", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
  fstat64(3, {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0
  fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
  --- SIGSEGV (Segmentation fault) @ 0 (0) ---
  +++ killed by SIGSEGV +++
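For comparison, here is a minimal sketch of what a self-contained ftw testcase could look like. It is not the reporter's code: the names report_entry, walk_tree and visited are made up for illustration. The one deliberate difference from the definition above is that the callback is declared as returning int, which is the signature ftw() expects, so no cast to __ftw_func_t is needed.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Number of entries seen by the most recent walk (illustrative global). */
static long visited;

/* File-scope callback with the signature ftw() expects: it returns int. */
static int report_entry(const char *filename, const struct stat *stats, int flag)
{
    (void)stats;   /* unused in this sketch */
    (void)flag;    /* unused in this sketch */
    fprintf(stderr, "Working on %s.\n", filename);
    visited++;
    return 0;      /* 0 tells ftw() to keep walking */
}

/* Walk the tree rooted at `root`; return the number of entries seen,
 * or -1 if ftw() reported an error. */
long walk_tree(const char *root)
{
    assert(root != NULL);
    visited = 0;
    if (ftw(root, report_entry, 10) != 0) {
        perror("ftw");
        return -1;
    }
    return visited;
}
```

Calling walk_tree() from a small main() on the failing directory would show whether a plain file-scope callback also crashes on the affected kernels.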
You can install glibc-debuginfo, run the program under gdb and get the backtrace. From the above it is not clear where exactly it segfaults, nor why.
gdb /home/vpopmail/bin/vadduser
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run qqq xxx
Starting program: /home/vpopmail/bin/vadduser qqq xxx
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xad1000
Replacing strings. /home/vpopmail/domains/private.ycc/1/qqq
Working on (null).
Replacing strings. /home/vpopmail/domains/private.ycc/1/qqq

Program received signal SIGSEGV, Segmentation fault.
0x00bad1a4 in ftw_dir (data=0xbfa281b8, st=0xbfa28160, old_dir=0x0) at ftw.c:461
461       result = (*data->func) (data->dirbuf, st, FTW_D, &data->ftw);
(gdb) bt
#0  0x00bad1a4 in ftw_dir (data=0xbfa281b8, st=0xbfa28160, old_dir=0x0) at ftw.c:461
#1  0x00bad78e in ftw_startup (dir=Variable "dir" is not available.
) at ftw.c:699
#2  0x00bad864 in ftw (path=0x8057880 "/home/vpopmail/domains/private.ycc/1/qqq", func=0xbfa2836a, descriptors=10) at ftw.c:743
#3  0x0804cad3 in vadduser (username=0xbfa2879c "qqq", domain=0xbfa2869c "private.ycc", password=0x8057660 "xxx", gecos=0x8057780 "qqq", apop=0) at vpopmail.c:662
#4  0x0804993e in main (argc=3, argv=0xbfa28934) at vadduser.c:94
(gdb)
func=0xbfa2836a

That sounds like you are passing the address of a nested routine to ftw. If that's so, please check the binary:

  readelf -Wl /home/vpopmail/bin/vadduser | grep GNU_STACK

If that shows RWE, then this is the kernel's fault: recent kernels (certainly FC5, but I think FC4 too) have messed up PT_GNU_STACK support.
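The value func=0xbfa2836a sits in the same address range as the stack variables in the backtrace (data=0xbfa281b8, st=0xbfa28160), which is what one would expect if the callback is a GCC nested function reached through a trampoline on the stack. Assuming that is what vpopmail.c does (its source is not shown in this report), one possible way to avoid depending on an executable stack is to move the callback to file scope and hand it its context through a static pointer. The sketch below only illustrates that pattern; walk_ctx, active_ctx, count_match and count_paths_containing are hypothetical names, not taken from vpopmail.

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* State a nested function would have captured from its enclosing scope. */
struct walk_ctx {
    const char *needle;   /* substring to look for in each path */
    long matches;         /* number of paths containing it */
};

/* ftw() has no user-data argument, so the context is passed through a
 * file-scope pointer. This is neither reentrant nor thread-safe. */
static struct walk_ctx *active_ctx;

/* Ordinary file-scope callback: its address is in the text segment, so
 * calling it does not require an executable stack. */
static int count_match(const char *filename, const struct stat *stats, int flag)
{
    (void)stats;
    (void)flag;
    if (strstr(filename, active_ctx->needle) != NULL)
        active_ctx->matches++;
    return 0;
}

/* Count paths under `root` containing `needle`; return -1 on ftw() error. */
long count_paths_containing(const char *root, const char *needle)
{
    struct walk_ctx ctx = { needle, 0 };
    int rc;

    assert(root != NULL && needle != NULL);
    active_ctx = &ctx;
    rc = ftw(root, count_match, 10);
    active_ctx = NULL;
    return rc == 0 ? ctx.matches : -1;
}
```

The trade-off is the shared static pointer: only one walk can be in progress at a time, which is usually acceptable for a command-line tool.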
If so, this is a dup of #187853 btw.
readelf -Wl /home/vpopmail/bin/vadduser |grep GNU_STACK
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4
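The same PT_GNU_STACK flags can also be read from inside a running program, which may help when comparing what the binary requests with how a given kernel behaves. The following is a rough sketch assuming glibc on Linux, where dl_iterate_phdr() reports the main program as the first object. The helper names are invented for this example, and it only reports what the ELF header asks for, not what the kernel actually grants.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* dl_iterate_phdr() callback: look for PT_GNU_STACK in the first object
 * (the main program under glibc) and then stop iterating. */
static int find_gnu_stack(struct dl_phdr_info *info, size_t size, void *data)
{
    long *flags = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_GNU_STACK) {
            *flags = (long)info->dlpi_phdr[i].p_flags;
            break;
        }
    }
    return 1;   /* non-zero return stops after the first object */
}

/* Return the p_flags of the main program's PT_GNU_STACK header,
 * or -1 if the binary has no such header. */
long main_program_stack_flags(void)
{
    long flags = -1;
    dl_iterate_phdr(find_gnu_stack, &flags);
    return flags;
}

/* Render flags with readelf's R/W/E letters; '-' marks an absent
 * permission (readelf itself prints a space there). */
void format_stack_flags(long flags, char out[4])
{
    out[0] = (flags & PF_R) ? 'R' : '-';
    out[1] = (flags & PF_W) ? 'W' : '-';
    out[2] = (flags & PF_X) ? 'E' : '-';
    out[3] = '\0';
}
```

A binary built with nested-function trampolines would be expected to report RWE here, matching the readelf output above.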
*** This bug has been marked as a duplicate of 187853 ***