Bug 72139 - SIGBUS on fread() on networked file system
Summary: SIGBUS on fread() on networked file system
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Public Beta
Classification: Retired
Component: glibc
Version: null
Hardware: i386 Linux
medium
high
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Jay Turner
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 67218
 
Reported: 2002-08-21 13:20 UTC by jeroen
Modified: 2016-11-24 12:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2002-09-02 15:19:42 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Backtrace with symbols for nautilus, libgnomeui, gnome-vfs2 (2.00 KB, text/plain)
2002-08-29 14:44 UTC, Tim Waugh
I attached to nautilus with gdb before browsing to an SMB mount. (16.37 KB, text/plain)
2002-08-29 15:21 UTC, Tim Waugh
test case of fread/truncate SIGBUS interaction (470 bytes, text/plain)
2002-08-29 16:46 UTC, Stephen Tweedie

Description jeroen 2002-08-21 13:20:25 UTC
I've got a mounted samba share called "/mnt/movies"

Entry from fstab:
//glamdring/movies      /mnt/movies             smbfs   owner,rw,username=foo,password=bar 0 0

This gets mounted during startup.

Now when I right-click on a directory on that SMB share, nautilus
crashes immediately and reproducibly (the popup menu never appears).

This happens on other smb mounted shares too (on all directories).


Debugging Information:

Backtrace was generated from '/usr/bin/nautilus'

(no debugging symbols found)...[New Thread 1024 (LWP 27716)]
[New Thread 2049 (LWP 27717)]
[New Thread 1026 (LWP 27718)]
[New Thread 2051 (LWP 27719)]
[New Thread 3076 (LWP 27720)]
[New Thread 4101 (LWP 27721)]
[New Thread 5126 (LWP 27722)]
[New Thread 6151 (LWP 27723)]
[New Thread 7176 (LWP 27724)]
[New Thread 8201 (LWP 27725)]
[New Thread 9226 (LWP 27726)]
[New Thread 10251 (LWP 27727)]
0x420a0a89 in wait4 () from /lib/i686/libc.so.6
#0  0x420a0a89 in wait4 () from /lib/i686/libc.so.6
#1  0x4211921c in __DTOR_END__ () from /lib/i686/libc.so.6
#2  0x408eab13 in waitpid () from /lib/i686/libpthread.so.0
#3  0x40234035 in libgnomeui_module_info_get () from
/usr/lib/libgnomeui-2.so.0

Thread 12 (Thread 10251 (LWP 27727)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 11 (Thread 9226 (LWP 27726)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 10 (Thread 8201 (LWP 27725)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 9 (Thread 7176 (LWP 27724)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 8 (Thread 6151 (LWP 27723)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 7 (Thread 5126 (LWP 27722)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 6 (Thread 4101 (LWP 27721)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 5 (Thread 3076 (LWP 27720)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 4 (Thread 2051 (LWP 27719)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 3 (Thread 1026 (LWP 27718)):
#0  0x420285a9 in sigsuspend () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e7fe8 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
No symbol table info available.
#2  0x408e4f8b in pthread_cond_wait () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x407563f6 in gnome_vfs_thread_pool_wait_for_work ()
   from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#4  0x4075644f in thread_entry () from /usr/lib/libgnomevfs-2.so.0
No symbol table info available.
#5  0x40930377 in g_thread_create_proxy () from
/usr/lib/libglib-2.0.so.0
No symbol table info available.
#6  0x408e6871 in pthread_start_thread () from
/lib/i686/libpthread.so.0
No symbol table info available.

Thread 2 (Thread 2049 (LWP 27717)):
#0  0x420c3aeb in poll () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x408e5cce in __pthread_manager () from /lib/i686/libpthread.so.0
No symbol table info available.

Thread 1 (Thread 1024 (LWP 27716)):
#0  0x420a0a89 in wait4 () from /lib/i686/libc.so.6
No symbol table info available.
#1  0x4211921c in __DTOR_END__ () from /lib/i686/libc.so.6
No symbol table info available.
#2  0x408eab13 in waitpid () from /lib/i686/libpthread.so.0
No symbol table info available.
#3  0x40234035 in libgnomeui_module_info_get () from
/usr/lib/libgnomeui-2.so.0
No symbol table info available.
#0  0x420a0a89 in wait4 () from /lib/i686/libc.so.6

Comment 1 Havoc Pennington 2002-08-21 14:34:41 UTC
As an initial guess, it's the smb: VFS backend.

Comment 2 Alexander Larsson 2002-08-22 08:16:56 UTC
No, it's not. That backend isn't being used, since this is a normally mounted
Unix filesystem (albeit smbfs).

But there is amazingly little data in those backtraces. All the gnome-vfs
threads are waiting for work, and the main thread is in
libgnomeui_module_info_get() which is strange.


Comment 3 Havoc Pennington 2002-08-22 21:11:49 UTC
Note, every no-symbols trace I've seen for this release has been in
libgnomeui_module_info_get(). That just means "no symbols"

Comment 4 Tim Waugh 2002-08-29 14:19:58 UTC
Do we think the crash is in libgnomeui, or in nautilus?

Comment 5 Tim Waugh 2002-08-29 14:44:30 UTC
Created attachment 73744
Backtrace with symbols for nautilus, libgnomeui, gnome-vfs2

Comment 6 Havoc Pennington 2002-08-29 15:09:11 UTC
The trace has threads 6 and 7, but not 1 through 5?

Comment 7 Tim Waugh 2002-08-29 15:13:00 UTC
I guess bug-buddy ate it.  I'll try again.

Comment 8 Tim Waugh 2002-08-29 15:21:13 UTC
Created attachment 73748
I attached to nautilus with gdb before browsing to an SMB mount.

Comment 9 Tim Waugh 2002-08-29 15:23:21 UTC
You're probably wondering what's in /mnt/smb/cvsLLmFBI.  It just contains a 
single newline.

Comment 10 Tim Waugh 2002-08-29 15:34:51 UTC
jeroen@xs4all.nl: if you run 'dmesg' does it show smb_open errors?

Comment 11 Tim Waugh 2002-08-29 15:50:45 UTC
Jakub, this seems to be a glibc problem.  This small program: 
 
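#include <stdio.h>

int main (int argc, char **argv)
{
        FILE *f;
        char buffer;

        if (argc < 2 || (f = fopen (argv[1], "r")) == NULL)
                return 1;       /* no path given, or open failed */
        /* First read through the stream; this is where SIGBUS is raised. */
        fread (&buffer, 1, 1, f);
        fclose (f);
        return 0;
}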
 
crashes when run as './crash /mnt/smb/file', where /mnt/smb is an SMB mount 
and 'file' has mode 0600 on the UNIX samba server.

Comment 12 Tim Waugh 2002-08-29 15:54:24 UTC
Changing component.

Comment 13 Alan Cox 2002-08-29 16:16:52 UTC
The fread code (and all stdio code that touches an mmap buffer) needs to catch
SIGBUS and return a sensible error. Otherwise fread can fail in weird and bogus
ways when:

- An NFS error occurs (not too bad)
- The file is on SMB, AFS or another fs where file permission changes are
  reflected immediately
- The underlying file is truncated, in which case an EOF return should
  occur but touching the mmap buffer may SIGBUS

Ditto fgetc etc. From sct's pastings, they are required to report EOF
properly rather than randomly SIGBUS.


Comment 14 Matt Wilson 2002-08-29 16:43:48 UTC
Proposal: for read-only stdio, mmap with MAP_COPY


Comment 15 Stephen Tweedie 2002-08-29 16:46:40 UTC
Created attachment 73757
test case of fread/truncate SIGBUS interaction

Comment 16 Stephen Tweedie 2002-08-29 16:49:10 UTC
truncates can interact with the stdio mmap()s to cause a SIGBUS to be raised on
fread().  I don't think we're allowed to expose that SIGBUS to user-land.  EOF
would be fine, but not a signal.

And yes, this happens on ext3 too.

This *will* be visible to applications.  Think about a logfile-processing
application which collides with a background log rotation.
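
A minimal sketch of the truncation case, for illustration only (this is not the attached test case). It assumes Linux semantics, where touching a mapped page that lies wholly beyond the end of the file raises SIGBUS, and it uses plain mmap() rather than stdio so the outcome does not depend on the glibc version. The names touch_after_truncate and on_sigbus and the temporary path are made up for this example.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);     /* unwind out of the faulting load */
}

/* Returns 1 if touching the mapping after truncation raised SIGBUS,
 * 0 if the load went through, -1 on a setup failure. */
int touch_after_truncate(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";   /* hypothetical scratch file */
    struct sigaction sa, old;
    volatile int result = -1;
    volatile char *map;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "x\n", 2) != 2) { close(fd); return -1; }
    map = mmap(NULL, 2, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    if (sigsetjmp(bus_env, 1) == 0) {
        if (ftruncate(fd, 0) == 0) {
            char c = map[0];    /* the mapped page is now past EOF */
            (void)c;
            result = 0;
        }
    } else {
        result = 1;             /* arrived here via siglongjmp */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)map, 2);
    close(fd);
    return result;
}
```

Calling touch_after_truncate() from a small main() and printing the result is enough to try it on a given filesystem; the scratch path can be pointed at a different mount to compare behaviour.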

Comment 17 Stephen Tweedie 2002-08-29 16:50:06 UTC
We don't have MAP_COPY.

Comment 18 Matt Wilson 2002-08-29 16:50:49 UTC
make that MAP_PRIVATE

Comment 19 Roland McGrath 2002-08-29 18:52:56 UTC
MAP_PRIVATE doesn't help (MAP_COPY would if it existed).
That only affects modifications made to that mapping.

I raised this on the libc-hacker list some time ago and it is demonstrated
by the existing libio/tst-mmap2-eofsync test program if you comment out the 
fflush call.

There is no simple solution.

Comment 20 Stephen Tweedie 2002-08-29 18:57:45 UTC
Hmm --- what was the reaction on the libc list?  Is this not considered a
regression?  I don't see any simple fix either, other than to disable the mmaped
IO by default.

Comment 21 Stephen Tweedie 2002-08-29 19:04:25 UTC
To be clear, there are two issues at large here.  There's one smbfs-specific
one, but also one general one which will cause the SIGBUS on any filesystem.  We
can open the general case as a separate bugzilla entry if necessary, but I think
it's really one issue overall --- glibc *will* get SIGBUS from mmaped file IO,
and needs to handle that.

Comment 22 Roland McGrath 2002-08-29 19:17:01 UTC
... Thundering silence. :)  I would like to hear Ulrich's comments,
since he pushed this feature in the first place.  But I cannot see how
it can be done without breaking POSIX.1 conformance, unless we do a radically
more hairy implementation that would break old ABI conformance (i.e. cope with
the signals, even in getc macros).

Comment 23 Roland McGrath 2002-08-29 19:20:13 UTC
wrt sct's last comment: the glibc issue is the same.
In smbfs a remote permission change is equivalent from a client's perspective
to the file having been truncated.

Comment 24 Ulrich Drepper 2002-08-29 21:20:02 UTC
The two issues are not the same.

Truncation of a file need only be handled gracefully if the user follows the
rules.  This includes calling fflush() before any I/O operation if the file
might have changed.  The currently used code to handle fflush() calls does not
unconditionally check the new file size etc but this can be added.

For semantically and otherwise broken filesystems the situation is different. 
The changing of the access permissions mustn't have any effect on already opened
files.  This seems not to be the case with some network filesystems which makes
them non-conformant in POSIX environments.  In theory this means we can do what
we want.  This isn't practical so the next best thing is to recognize such
filesystems and not attempt any optimization.

Disabling the automatic optimization is really a bad idea.  Nobody will enable
it and the 10+% in performance improvement is lost just because of some people's
setups and special needs.  The fact that there was only this one report in all
the time of beta testing should show how viable this optimization is.

Comment 25 Roland McGrath 2002-08-29 21:42:00 UTC
You are going out on a limb to say that "the user following the rules"
includes making sure that no other user modifies any file.  In fact,
I think I can make a case that a single strictly conformant POSIX.1 program 
can be written that will break with this.  The "Interactions of Other FILE-Type
C Functions" rules in fact talk about a single file description and I can find
nothing that would make these apply to multiple separate opens of a file.



Comment 26 Alexander Larsson 2002-08-30 06:26:29 UTC
Ok. I have some code (nautilus) that must not randomly SIGBUS, whatever the
POSIX compliance of the filesystem is. How do I disable mmaped stdio?


Comment 27 Roland McGrath 2002-08-30 06:38:24 UTC
Any call to setbuf/setvbuf will revert a stream to non-mmap mode.
e.g. setvbuf(stream, NULL, _IOFBF, BUFSIZ);
(Note mmap is only ever used for read-only opens.)

But I think there will be no sane choice but to turn the mmap stdio off
for the release.
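
A minimal sketch of the setvbuf workaround described in this comment. The helper name fopen_no_mmap is hypothetical; the setvbuf call is the one quoted above, and the claim that it reverts the stream to non-mmap mode comes from this comment rather than from anything the code can check.

```c
#include <stdio.h>

/* Hypothetical helper: open a file for reading and immediately give the
 * stream an ordinary stdio buffer. Returns NULL on any failure. */
FILE *fopen_no_mmap(const char *path)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return NULL;
    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(f, NULL, _IOFBF, BUFSIZ) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```

An application such as nautilus could route its read-only fopen calls through a wrapper like this; the returned stream is used with fread/getc exactly as before.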



Comment 28 Jakub Jelinek 2002-08-30 08:07:14 UTC
How hard would it be to implement MAP_COPY in the kernel?

glibc could also intercept SIGBUS and check whether the fault address falls
into the mmap area of any currently open FILE. If it does, mmap MAP_ANON over
that page and retry; otherwise just continue with the application's SIGBUS
handler (or abort if SIG_DFL).

I see Ulrich changed mmap stdio for now to be only enabled with "m" f*open flag,
but anyway it would be good to solve this RSN so that apps could take advantage
of mmap stdio automatically.
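
A rough sketch of the SIGBUS-interception idea from this comment, under several assumptions: Linux, where si_addr carries the faulting address; a single tracked region standing in for "all open FILEs' mmap areas"; and abort() standing in for handing off to the application's handler. All names here (fixup_sigbus, read_truncated_with_fixup, the scratch path) are hypothetical, and calling mmap() from a signal handler is not guaranteed safe by POSIX, so this shows the mechanism rather than a production design. It also does nothing about reporting EOF; the reader simply sees zero bytes.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* One tracked region stands in for every open FILE's mmap area. */
static char *region_start;
static size_t region_len;
static size_t page_size;

static void fixup_sigbus(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;
    char *page;

    (void)sig;
    (void)ctx;
    if (region_start == NULL || addr < region_start
        || addr >= region_start + region_len)
        abort();    /* not ours: stands in for the application's handler */

    /* Replace the faulting page with anonymous zero-filled memory. */
    page = (char *)((uintptr_t)addr & ~(uintptr_t)(page_size - 1));
    if (mmap(page, page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
        abort();
    /* Returning from the handler retries the faulting load. */
}

/* Returns the byte read from the truncated mapping, or -1 on setup failure. */
int read_truncated_with_fixup(void)
{
    char path[] = "/tmp/sigbus-fixup-XXXXXX";  /* hypothetical scratch file */
    struct sigaction sa, old;
    int fd = mkstemp(path);
    int c = -1;
    char *map;

    if (fd < 0)
        return -1;
    unlink(path);
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    if (write(fd, "x\n", 2) != 2) { close(fd); return -1; }
    map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }
    region_start = map;
    region_len = page_size;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fixup_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    if (ftruncate(fd, 0) == 0)
        c = *(volatile unsigned char *)map;  /* faults, is patched, retried */

    sigaction(SIGBUS, &old, NULL);
    region_start = NULL;
    munmap(map, page_size);
    close(fd);
    return c;
}
```

The load from the truncated mapping completes and yields a zero byte from the patched-in page instead of killing the process. A real implementation would still have to turn that into an EOF or error indication on the stream.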

Comment 29 Stephen Tweedie 2002-08-30 09:01:14 UTC
An optimisation is *not* viable if it means that any application using stdio
streams on a live filesystem can be killed by glibc.  Nautilus is a perfect
example of an application which *must* stay alive and which *must* cope somehow
with an active filesystem.

Sure, if you use stdio to read a file which is undergoing concurrent
modifications, you can't predict the results, but you do expect the application
to survive --- you just can't predict _what_ data will be shown.

Comment 30 Stephen Tweedie 2002-08-30 09:28:28 UTC
Jakub, the kernel already does the sort of thing you describe, except with a
twist.  Whenever it takes a page fault, it looks for the address not of the
absent page, but of the calling function.  If the fault happened in a function
registered as expecting memory violations, it dispatches the fault to an
appropriate exception handler for that function.  

That allows us to access user space from the kernel with the semantics "access
this memory, and if anything goes wrong, do THIS rather than raising an oops." 
If such a mechanism could be used by glibc (either with kernel support or just
by trapping the SIGBUS), it would be ideal for stdio's use.

Comment 31 Jakub Jelinek 2002-08-30 09:41:02 UTC
Of course I know about asm/uaccess.h. Unfortunately, this cannot be done for
glibc: some stdio access functions (e.g. unlocked getc) are inlined in
applications. Even if all stdio accesses in glibc were guarded by similar magic
(and unlike the kernel this would be in lots of places, not just a bunch of
macros, since once a FILE stream has a buffer it's accessed freely by glibc
until it reaches its boundaries), there would still be places outside of glibc
which could happily end up with SIGBUS.

IMHO MAP_COPY would be useful for lots of other things.

Comment 32 Stephen Tweedie 2002-08-30 10:27:09 UTC
Alex made the suggestion that if the getc macro:

       ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end ? __uflow (_fp) \
	: *(unsigned char *) (_fp)->_IO_read_ptr++)

were to be forced into the __uflow() call each time by careful management of the
pointer fields, we could force compatibility for old binaries at the cost of a
significant performance loss if those binaries ever find themselves processing a
new mmaped stream.

Symbol versioning could take care of the common cases by avoiding passing a mmap
stream to old applications, but obviously new apps might still link against, and
pass such streams to, old libraries with the unprotected inline getc.
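
A toy model of the pointer-management trick suggested here, not glibc's actual FILE layout: struct toy_stream, toy_uflow, toy_open and TOY_GETC are all hypothetical names. The inline macro has the same shape as the getc macro quoted above; because the read window it sees is always kept empty, every call falls through to the out-of-line function, which is the one place that would need SIGBUS protection.

```c
#include <stddef.h>
#include <stdio.h>      /* EOF */

struct toy_stream {
    const unsigned char *read_ptr;  /* the window the inline macro sees */
    const unsigned char *read_end;
    const unsigned char *data;      /* the real (e.g. mmaped) data */
    size_t len, pos;
    int uflow_calls;                /* counts trips through the slow path */
};

/* Out-of-line slow path. A real implementation would guard the data access
 * against SIGBUS here. read_ptr and read_end are deliberately left equal so
 * the next TOY_GETC comes back to this function. */
static int toy_uflow(struct toy_stream *s)
{
    s->uflow_calls++;
    if (s->pos >= s->len)
        return EOF;
    return s->data[s->pos++];
}

/* Same shape as the inlined getc macro: fast path only if the window is
 * non-empty, which never happens for this kind of stream. */
#define TOY_GETC(s) \
    ((s)->read_ptr >= (s)->read_end ? toy_uflow(s) \
     : *(s)->read_ptr++)

void toy_open(struct toy_stream *s, const unsigned char *data, size_t len)
{
    s->read_ptr = s->read_end = data;   /* empty window forces toy_uflow */
    s->data = data;
    s->len = len;
    s->pos = 0;
    s->uflow_calls = 0;
}
```

The uflow_calls counter makes the cost visible: one function call per character, which is the performance loss the comment refers to.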

Comment 33 Alexander Larsson 2002-08-30 10:48:05 UTC
That would of course have to be combined with fixups on SIGBUS for the glibc
code. Much like the kernel uaccess code. This might even make it possible to use
mmaped streams on RW streams.


Comment 34 Jakub Jelinek 2002-09-02 15:19:35 UTC
glibc-2.2.92-1 and later already use mmap stdio only when fopen("...", "rm");
etc., so for now this can be closed...
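
A short sketch of the opt-in form mentioned here, assuming glibc, where "m" in the fopen mode string requests mmap-based access for a read-only stream. The helper name count_bytes_mmap is hypothetical. A caller that opts in this way also accepts the SIGBUS exposure discussed in this report if the file is truncated or becomes unreadable underneath it.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in a file, asking for mmap stdio.
 * Returns -1 if the file cannot be opened. */
long count_bytes_mmap(const char *path)
{
    FILE *f = fopen(path, "rm");    /* "m": request mmap access (glibc) */
    long n = 0;

    if (f == NULL)
        return -1;
    while (getc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

Dropping the "m" gives the ordinary buffered behaviour with the same results, so the flag can be limited to files known to live on a well-behaved local filesystem.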

Comment 35 Jay Turner 2002-09-03 20:23:12 UTC
Closing out at Jakub's request.

