325141 – mount.ntfs-3g hogs up the system when invoked by thunderbird

Bug 325141 - mount.ntfs-3g hogs up the system when invoked by thunderbird

Summary: mount.ntfs-3g hogs up the system when invoked by thunderbird

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	ntfs-3g
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Christopher Aillon
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2007-10-09 16:43 UTC by Julian Sikorski
Modified:	2007-11-30 22:12 UTC (History)
CC List:	6 users (show)
Fixed In Version:	2:1.1104-1.fc8
Clone Of:
Environment:
Last Closed:	2007-11-21 22:55:19 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
/var/log/messages (189.42 KB, application/x-bzip2) 2007-10-12 05:47 UTC, Julian Sikorski	no flags	Details
/var/log/messages-20071007 (293.92 KB, application/x-bzip2) 2007-10-12 05:48 UTC, Julian Sikorski	no flags	Details
/var/log/messages.1 (131.67 KB, application/x-bzip2) 2007-10-12 05:48 UTC, Julian Sikorski	no flags	Details
/var/log/messages.2 (137.53 KB, application/x-bzip2) 2007-10-12 05:49 UTC, Julian Sikorski	no flags	Details
/var/log/messages.3 (210.97 KB, application/x-bzip2) 2007-10-12 05:49 UTC, Julian Sikorski	no flags	Details
/var/log/messages.4 (61.85 KB, application/x-bzip2) 2007-10-12 05:49 UTC, Julian Sikorski	no flags	Details
strace of thunderbird process (161.23 KB, text/plain) 2007-10-14 13:31 UTC, Julian Sikorski	no flags	Details
ltrace of ntfs-3g process (979.54 KB, text/plain) 2007-10-14 13:31 UTC, Julian Sikorski	no flags	Details
fuse.log file (259.20 KB, application/x-bzip) 2007-10-16 07:57 UTC, Julian Sikorski	no flags	Details
Full thunderbird strace, from start to lockup (153.72 KB, application/x-bzip2) 2007-10-16 12:04 UTC, Julian Sikorski	no flags	Details
strace of mount.ntfs-3g (135.29 KB, application/x-bzip2) 2007-11-01 15:48 UTC, Julian Sikorski	no flags	Details
strace of mount.ntfs-3g (166.94 KB, application/x-bzip2) 2007-11-01 17:42 UTC, Julian Sikorski	no flags	Details
debug patch for fuse (643 bytes, patch) 2007-11-02 15:52 UTC, Miklos Szeredi	no flags	Details \| Diff
Show Obsolete (1) View All

Description Julian Sikorski 2007-10-09 16:43:39 UTC

Description of problem:
Hi. This has been happening for quite a while, but I found some meaningful info
(hopefully) just recently.
The problem seems quite strange. I keep my thunmerbird mailboxes on an ntfs
partition, so that I can share the emails easily when I dual boot. There are 2
POP3 and one IMAP accounts set up. Sometimes when thunderbird finishes fetching
emails from either of POP3 ones, cpu usage goes 100 % and top shows it's
mount.ntfs-3g. A quick look into /var/log/messages reveals the following:

Oct  9 18:25:56 localhost ntfs-3g[1601]: ntfs_attr_pread: na=0x66c180 
b=0x66d160  pos=2138112  count=-1023473: Bad argument
Oct  9 18:26:38 localhost ntfs-3g[1601]:last message repeated 113551 times

That's more than a hundred thousand times in less than a minute. Nice.

Version-Release number of selected component (if applicable):
2:1.1004-1.fc8

How reproducible:
most of the times

Steps to Reproduce:
1. set up your thunderbird mail storage folders on an ntfs partition
2. fetch emails
3. wait until it breaks
  
Actual results:
cpu usage goes sky high

Expected results:
cpu usage stays normal

Additional info:
This has been working OK for ages with vfat, and since I bought this laptop in
January with ntfs-3g. The breakage began roughly about the same time when I
started to use the rawhide kernel with f7. I thought it's just a peculiar setup,
but now I'm rawhide-only and the issue persists.

Comment 1 Szabolcs Szakacsits 2007-10-11 14:06:11 UTC

Did you convert VFAT to NTFS? It sometimes corrupts the filesystem. Please run
'chkdsk DRIVE: /f' on Windows and reboot into Windows twice. Can you still
reproduce the problem on Linux afterwards? If yes then please send your
/var/log/messages* files here or to szaka. Thanks.

Comment 2 Julian Sikorski 2007-10-11 15:58:22 UTC

This laptop has been ntfs-3g from the beginning. IIRC I did run chkdsk, but I'll
do that again just to be sure.

Comment 3 Julian Sikorski 2007-10-12 05:45:24 UTC

I ran chkdsk and then rebooted into linux. The thunderbird issue persisted. I'm
attaching the logs for further investigation. Watch out, they are *huge* when
unzipped.

Comment 4 Julian Sikorski 2007-10-12 05:47:41 UTC

Created attachment 224991 [details]
/var/log/messages

Comment 5 Julian Sikorski 2007-10-12 05:48:13 UTC

Created attachment 225001 [details]
/var/log/messages-20071007

Comment 6 Julian Sikorski 2007-10-12 05:48:36 UTC

Created attachment 225011 [details]
/var/log/messages.1

Comment 7 Julian Sikorski 2007-10-12 05:49:00 UTC

Created attachment 225021 [details]
/var/log/messages.2

Comment 8 Julian Sikorski 2007-10-12 05:49:29 UTC

Created attachment 225031 [details]
/var/log/messages.3

Comment 9 Julian Sikorski 2007-10-12 05:49:54 UTC

Created attachment 225041 [details]
/var/log/messages.4

Comment 10 Szabolcs Szakacsits 2007-10-12 14:42:49 UTC

Thanks a lot. Can you compile/install the driver if I send you patches?

Comment 11 Julian Sikorski 2007-10-12 15:21:20 UTC

Sure. I'll just add them to rpm.

Comment 12 Szabolcs Szakacsits 2007-10-13 13:10:55 UTC

Could you please do an strace -p <PID_of_thunderbird> what it is doing? If it
infinitely passes the wrong read(2) parameters to the kernel and ntfs-3g returns
the valid EINVAL error, what your logs indicate, then this is a thunderbird bug.

It could be also 64-bit related rawhide kernel or FUSE problem. I couldn't find
anyhing related to your unique problem in ntfs-3g yet which could result falling
into an infinite loop internally.

If it's not a thunderbird bug then could you please run 'ltrace -p
<pid_of_ntfs-3g>' when thunderbird is hanging and send the output?

Thanks.

Comment 13 Julian Sikorski 2007-10-13 13:19:40 UTC

It can be a kernel problem - as I said before, it started when I installed
rawhide kernel under Fedora 7. It was pre-2.6.23, while F-7 2.6.22 worked fine.
I'll strace/ptrace it as soon as it happens again.

Comment 14 Szabolcs Szakacsits 2007-10-13 13:39:17 UTC

Then it indeed looks to be a pre-2.6.23 kernel problem (I remember some issues,
but they are not related to your problem). 

So, the problem is not reproducible? Please also note, that I have asked the
outputs of strace and ltrace. The later tracks the ntfs-3g library calls.

Comment 15 Julian Sikorski 2007-10-13 14:48:09 UTC

strace and ltrace, of course. That was my mistake. As for reproducibility, it
does not happen every time, but quite often.

Comment 16 Szabolcs Szakacsits 2007-10-13 19:43:12 UTC

Btw, I couldn't find the ntfs-3g version and mount options in the logs which are
always logged. The logs are continuous, the info is still not there which means
that apparently there is some reliability problem with the rawhide system
logging as well.

Comment 17 Julian Sikorski 2007-10-13 23:23:17 UTC

strace of thunderbird-bin process spams the console with the following:
read(43, 0x1408ac0, 1024)               = -1 EINVAL (Invalid argument)

ltrace of ntfs-3g process throws out loads of these:
ntfs_attr_open(0x64f940, 128, 0x60964c, 0, 0x65f4f0) = 0x65f4f0
ntfs_attr_pread(0x65f4f0, 880640, -419862, 0x64e930, 0x65f900 <unfinished ...>
ntfs_log_handler_syslog(0x3ad641fcf0, 0x3ad641eb2c, 757, 256, 0) = 82
<... ntfs_attr_pread resumed> )                  = -1
__errno_location()                               = 0x2aaaaaad1348
ntfs_attr_close(0x65f4f0, 0x65f7b0, -1, 0x65f7a0, 0x65f7b0) = 56097
ntfs_inode_close(0x64f940, 0x65f4f0, 0x65f540, 0x65f4e0, 0x65f4f0) = 0
free(0x64e3e0)                                   = <void>
__strdup(0x613140, 0x7fff47846fc8, 0x7fff47846fc0, 880640, 0x7fff478470c0) =
0x64e3e0
ntfs_pathname_to_inode(0x60e3e0, 0, 0x64e3e0, 0x786f626e492f6d,
0x542f61747a636f70) = 0x64f940

I mount ntfs-3g with default options.

Comment 18 Szabolcs Szakacsits 2007-10-14 09:06:41 UTC

Thanks. Could you please send more info from the context? At least 100 lines in
both cases unless they are absolutely indentical. i.e. any single character.

So far I can see that there seems to be a problem at least in thunderbird which
apparently repeats the failing call instead of reporting the error to the user.

There may be further problems in the kernel and I also found a problem in
ntfs-3g which may not be the root cause of the error but it may flood the log
needlessly.

Soon I'm sending you an ntfs-3g patch, so we can get more detail about what's
really going. Thanks again.

Comment 19 Julian Sikorski 2007-10-14 09:22:17 UTC

OK. IIRC the lines were exactly the same, but I'll redirect the output to the
text file next time just to be sure. Will report back ASAP.

Comment 20 Julian Sikorski 2007-10-14 09:26:46 UTC

CCing thunderbird owner.

Comment 21 Szabolcs Szakacsits 2007-10-14 11:42:39 UTC

The ntfs-3g patch is at http://ntfs-3g.org/download/bogus-offset-read-fix.diff

It seems ntfs-3g is called with wrong parameters (kernel or thunderbird problem)
what the driver can't fix but the log flooding shouldn't happen anymore unless
there is a real ntfs-3g problem as well.

Please apply it to ntfs-3g and test the problem. All recent ntfs-3g version
should be ok. Thanks.

Comment 22 Julian Sikorski 2007-10-14 13:29:40 UTC

Will try the patch in a moment. In the meantime, attaching more strace/ltrace
lines. Looks like these are really all the same.

Comment 23 Julian Sikorski 2007-10-14 13:31:11 UTC

Created attachment 226621 [details]
strace of thunderbird process

Comment 24 Julian Sikorski 2007-10-14 13:31:51 UTC

Created attachment 226631 [details]
ltrace of ntfs-3g process

Comment 25 Julian Sikorski 2007-10-14 13:46:39 UTC

Looks like the flood problem is indeed fixed, I remounted the volume, ran into
the issue and nothing at all got written to /var/log/messages. The CPU hogging
problem persists, though.

Comment 26 Szabolcs Szakacsits 2007-10-14 13:57:10 UTC

Thunderbird wants to read over the end of a file and never gives up. It's not
clear if this is a pure thunderbird or kernel problem too. Ntfs-3g logged this
error which caused the log flooding. 

The log flooding shouldn't happen anymore with the patch, so your problem will
be a thunderbird hang what the thunderbird should be able to fix by just
stopping readin when it doesn't makes sense anymore.

It's also possible that the kernel missed to send a setattr() to ntfs-3g and
that's why the files size is not the one thunderbird expectes. There is indeed
kernel work recently on this area, so maybe a kernel bug was indeed introduced.

Comment 27 Julian Sikorski 2007-10-14 14:04:01 UTC

Let's CC the kernel folks as well then

Comment 28 Szabolcs Szakacsits 2007-10-15 22:44:36 UTC

Hi Julian,

Could you please reproduce the problem again with this ntfs-3g debug patch to
get more information about the nature of the problem?
http://ntfs-3g.org/download/bogus-offset-read-debug.diff

This will flood again the log but with useful information. Please do these steps:

apply the ntfs-3g patch
mount -t ntfs-3g -o debug <device> <mountpoint> &> fuse.log
reproduce the problem
umount <mountpoint>

Then please send the fuse.log file and the ntfs-3g outputs from
/var/log/messages. One line is enough in the latter case if they are the same,
probably that's the case.

Then you can revert to the old ntfs-3g patch, which will be included in the next
ntfs-3g release, so logs don't get flooded in case of similar softwares errors.
Thanks.

Comment 29 Julian Sikorski 2007-10-16 07:57:45 UTC

Created attachment 228341 [details]
fuse.log file

This time nothing got written to /var/log/messages, fuse.log was flooded
instead. Attached. It's approx 25 MiB unzipped, I left it baking for a bit too
long.

Comment 30 Szabolcs Szakacsits 2007-10-16 08:38:33 UTC

If nothing was written to /var/log/messages then the old patch was applied, or 
not the patched driver was installed or run. The output from /var/log/messages
is crucial. Thanks.

Comment 31 Szabolcs Szakacsits 2007-10-16 08:46:35 UTC

Sorry, yesterday I was too tired and now still sleeping. You did everything
right, the ntfs-3g log also goes to fuse.log when the debug mount option is
used. I have all the needed info. 

What is the size of /Julian/poczta/Thunderbird/pop3.poczta.onet.pl/Inbox?
Thunderbird tries to read at offset 712704 but it's size is only 364696.

Comment 32 Szabolcs Szakacsits 2007-10-16 09:55:41 UTC

Could you please also start 'strace -f -p <pid_of_thunderbird> -o
thunderbird.log'  before the problem happens and send the log? 

This way we maybe able to confirm that all the problems are in thunderbird or
there is an additional one in the kernel which is probably in FUSE if no one can
reproduce it on any other file systems.

What is your exact kernel version? Any pointer to somewhere what patches the
kernel exactly has?

Comment 33 Julian Sikorski 2007-10-16 10:05:40 UTC

The size is 364696 bytes, exactly as you reported.

The kernel I am running currently is 2.6.23-5.fc8.x86_64. For the patches it
uses, take a look into fedora cvs [1]. Look in the spec file to check which ones
are actually applied.

I'll try to get the full strace as soon as possible.

[1] http://cvs.fedoraproject.org/viewcvs/rpms/kernel/devel/

Comment 34 Julian Sikorski 2007-10-16 10:12:02 UTC

One thing came to my mind: the size of the file is is 364696 bytes, but that's
without a couple of emails that were received and then deleted during the bug
capture. So maybe it is some kind of race condition, which tries to read content
from disk before actually writing it? This could explain the non-reproducibility.

Comment 35 Julian Sikorski 2007-10-16 10:15:02 UTC

More: after receiving these emails Inbox should be bigger than 364696.

Comment 36 Szabolcs Szakacsits 2007-10-16 11:08:44 UTC

The Inbox size is always under 400 kB. The reads are behind 700 kB. Apparently
thunderbird does this

fd = open(file, ...)
read(fd, buf, bufsize);

which should arrive to ntfs-3 as

READ bufsize at offset 0

but it arrives as 

READ bufsize at offset 712704.

So, something corrupts or doesn't reset the ofset value. It could be a
thinderbird multi-threading issue, or kernel, or FUSE.

I couldn't find anything in the log which would have ordered ntfs-3g to increase
the file size to above 712704.

Comment 37 Szabolcs Szakacsits 2007-10-16 11:18:21 UTC

Julian, could you please change the Component field in the bug report from
ntfs-3g to thunderbird? I can't find anything which would indicate that this
would be an ntfs-3g problem. There is at least one thunderbird problem (no error
handling) and there is another one somewhere (file descriptor offset corruption).

Comment 38 Julian Sikorski 2007-10-16 12:03:22 UTC

OK, I managed to obtain full strace of thunderbird.

Comment 39 Julian Sikorski 2007-10-16 12:04:39 UTC

Created attachment 228651 [details]
Full thunderbird strace, from start to lockup

Comment 40 Szabolcs Szakacsits 2007-10-16 13:52:05 UTC

Looks like the whence argument of the second lseek() below was supposed to be
SEEK_SET, not SEEK_CUR. This seems to be a pure thunderbird problem. Thanks a
lot Julian.

3022  open("/mnt/windows/Julian/poczta/Thunderbird/pop.gmail.com/Inbox",
O_RDONLY) = 43
3022  lseek(43, 229086, SEEK_CUR)       = 229086
3022  read(43, "From - Tue Oct 16 13:58:27 2007\n"..., 1024) = 1024
3022  read(43, "108.195.44 as permitted sender) "..., 1024) = 1024
3022  lseek(43, 231133, SEEK_CUR)       = 462267
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
3022  read(43, 0x147c380, 1024)         = -1 EINVAL (Invalid argument)
[ ... forever ... ]

Comment 41 Julian Sikorski 2007-10-16 14:53:50 UTC

(In reply to comment #40)
> Looks like the whence argument of the second lseek() below was supposed to be
> SEEK_SET, not SEEK_CUR. This seems to be a pure thunderbird problem. Thanks a
> lot Julian.
> 
No problem. I am interested in a fix too, after all :)

Comment 42 Eric Sandeen 2007-10-16 15:38:23 UTC

Not to be too pedantic about it, but why is this only seen on the ntfs-3g
filesystem?  (or is it?)

Comment 43 Julian Sikorski 2007-10-16 16:00:58 UTC

I keep my mailbox on an ntfs partition. I could move it to ext3 and check, but
is that really necessary?

Comment 44 Szabolcs Szakacsits 2007-10-16 16:03:39 UTC

Here is another recent thunderbird hang:
https://bugzilla.redhat.com/show_bug.cgi?id=302791

Moreover thunderbird may do some wrong assumption and behaves differently as
usual. There were plenty such problems in the past for example made by Firefox,
etc. This can be especially true if they still share the broken code.

Comment 45 Eric Sandeen 2007-10-16 16:11:08 UTC

Not necessary I suppose, was just curious.  It would possibly help absolutely
verify that this is a pure tbird bug, not related to the filesystem.  But, up to
you :)

Comment 46 Szabolcs Szakacsits 2007-10-16 16:20:45 UTC

(In reply to comment #45)
> Not necessary I suppose, was just curious.  It would possibly help absolutely
> verify that this is a pure tbird bug, not related to the filesystem.  But, up to
> you :)

It's pretty clear from the logs that this is a pure thunderbird bug.

Comment 47 Julian Sikorski 2007-10-16 16:24:18 UTC

Given the peculiarity of the issue, I'd prefer to leave the things as they are
for now. I wouldn't like lose the ability to reproduce the behaviour. It's tough
enough already. Let's see what callion will say.

Comment 48 Szabolcs Szakacsits 2007-10-17 10:39:27 UTC

The problem is getting reported from non-rawhide distros too. Please fix
thunderburd, it should be very simple, it was pointed out exactly where the
problem is in comment #40. Thanks.

Comment 49 Julian Sikorski 2007-10-29 20:12:24 UTC

Any updates on this?

Comment 50 Szabolcs Szakacsits 2007-10-31 13:41:48 UTC

Thanks to Tullio Andreatta for pointing out that ntfs-3g shouldn't return EINVAL
but 0 in such cases (the core of the driver is highly paranoid to catch
corruptions and other errors but this case should be indeed handled differently
in an upper layer). So we have several problems in both softwares.

I'll post an ntfs-3g patch for testing ASAP.

Comment 51 Szabolcs Szakacsits 2007-10-31 14:42:21 UTC

Well, this seems to be a kernel issue too in FUSE. Apparently the problem in
ntfs-3g was hidden until recently because the FUSE kernel module handled these
cases correctly (it's not really true that FUSE drivers are user space drivers
because often the syscalls are done purely by the FUSE kernel module -- that's
also why none can see any difference sometimes in performance compared to
in-kernel drivers). But this may not be true anymore for this case, though I
suspect thunderbird multithreading may have something to do with the issue. 
I'll discuss the issue with Miklos Szeredi.

Comment 52 Szabolcs Szakacsits 2007-10-31 15:16:52 UTC

The ntfs-3g fix is against release 1.1030:
http://ntfs-3g.org/download/read-over-file-end-fix.diff

Please let me know how it works. Thanks.

Comment 53 Miklos Szeredi 2007-10-31 21:36:19 UTC

Hi Julian,

I'm intereted in pursuing the kernel side of this bug, if you can help with that.

Two things that may shed some light on the issue:

  - trying a kernel based on 2.6.24-rc1, which has a fuse bug fixed, which might
be related to this

  - doing a 'strace -f -v -x -s 256 ...' on the ntfs-3g process

You should obviously do this with an unpatched ntfs-3g, so hitting the bug is
visible.

Thanks

Comment 54 Julian Sikorski 2007-10-31 21:47:33 UTC

Are you saying that Szaka's patch only hides the bug? I don't want to get overly
enthusiastic, but since I installed it a few hours ago thunderbird never locked up.
As for the 2.6.24-based kernel, I'll have to wait until it hits rawhide.
As for the strace, I can downgrade ntfs-3g and get it.

Comment 55 Miklos Szeredi 2007-10-31 21:55:57 UTC

Yes, ntfs-3g should never have got a request that was over the file size.  This
can only happen if ntfs-3g is reporting the file size incorrectly to fuse, or if
there's some bug in fuse.

There was a related bug that could make fuse beleive the file size is shorter
than it actually is, hence preveting reads up to the end of file.  So I'm
looking for something similar.

Comment 56 Julian Sikorski 2007-11-01 15:48:20 UTC

Created attachment 245801 [details]
strace of mount.ntfs-3g

That's just the lockup. If you need more context, let me know.

Comment 57 Szabolcs Szakacsits 2007-11-01 16:06:21 UTC

> Yes, ntfs-3g should never have got a request that was over the file size.
> This can only happen if ntfs-3g is reporting the file size incorrectly to
> fuse, or if there's some bug in fuse.

The problem started with kernel upgrades and if ntfs-3g reported wrong
sizes then people would report it with older kernels too, I guess.

> There was a related bug that could make fuse beleive the file size is
> shorter than it actually is, hence preveting reads up to the end of file.
> So I'm looking for something similar.

For me it seems that FUSE thinks that the file is actually larger than it
is in real. Maybe because of the 2nd lseek. I don't know, I can't reproduce
the problem with my kernels.

Btw Julian, could you try the FUSE kernel module from the FUSE package,
not the one in the Fedora kernel?

Comment 58 Miklos Szeredi 2007-11-01 16:07:49 UTC

Thanks, more context would be useful. 

If you start 'less strace-mount.ntfs-3g.log' and search for the string 'writev'
then you will notice when the pattern starts to repeat.  That's where the lockup
starts, and basically the interesting part is before that.

Comment 59 Szabolcs Szakacsits 2007-11-01 16:20:48 UTC

The Gentoo kernel having the same problem is 2.6.23-rc8-mm2.

Comment 60 Julian Sikorski 2007-11-01 17:42:40 UTC

Created attachment 245901 [details]
strace of mount.ntfs-3g

This one should have more info as I attached to the process before starting
thunderbird.

Comment 61 Miklos Szeredi 2007-11-02 15:52:07 UTC

Created attachment 246861 [details]
debug patch for fuse

I didn't find the reason for the bug in the traces.

So as a next step, we need to look for the problem inside the kernel.

Attaching a patch that adds some debugging code to the fuse kernel module.

Can you please try this?  The simplest is probably to grab the fuse source
package from

http://switch.dl.sourceforge.net/sourceforge/fuse/fuse-2.7.1.tar.gz

after unpacking apply the patch:

  patch fuse-2.7.1/kernel/file.c < fuse_read_debug.patch

compile the kernel module:

  cd fuse-2.7.1/kernel; ./configure --enable-kernel-module; make

then replace the kernel module:

  rmmod fuse; insmod fuse-2.7.1/kernel/fuse.ko

You will probably need to install a kernel-source or kernel-headers package or
something similar, before compiling the module.

The debug output should be in /var/log/messages, and I only need the first
stack trace after the hang (if there is a stack trace at all), as all of them
will probably be the same.

Thanks.

Comment 62 Julian Sikorski 2007-11-08 20:17:41 UTC

Nov  8 21:15:20 localhost ntfs-3g[21577]: Version 1.1030
Nov  8 21:15:20 localhost ntfs-3g[21577]: Mounted /dev/sda1 (Read-Write, label
"windows", NTFS 3.1)
Nov  8 21:15:20 localhost ntfs-3g[21577]: Cmdline options: rw
Nov  8 21:15:20 localhost ntfs-3g[21577]: Mount options:
noatime,rw,silent,allow_other,nonempty,fsname=/dev/sda1,blkdev,blksize=4096
Nov  8 21:15:23 localhost kernel: fuse: read beyond EOF
Nov  8 21:15:23 localhost kernel: 
Nov  8 21:15:23 localhost kernel: Call Trace:
Nov  8 21:15:23 localhost kernel:  [<ffffffff88992a26>]
:fuse:fuse_read_fill+0x58/0xb0
Nov  8 21:15:23 localhost kernel:  [<ffffffff889920ea>]
:fuse:fuse_readdir+0xbf/0x1df
Nov  8 21:15:23 localhost kernel:  [<ffffffff810a769c>] filldir+0x0/0xb7
Nov  8 21:15:23 localhost kernel:  [<ffffffff810a769c>] filldir+0x0/0xb7
Nov  8 21:15:23 localhost kernel:  [<ffffffff810a77ca>] vfs_readdir+0x77/0xa9
Nov  8 21:15:23 localhost kernel:  [<ffffffff810a7a1c>] sys_getdents+0x75/0xbd
Nov  8 21:15:23 localhost kernel:  [<ffffffff8100bce1>] tracesys+0x71/0xda
Nov  8 21:15:23 localhost kernel:  [<ffffffff8100bd45>] tracesys+0xd5/0xda
Nov  8 21:15:23 localhost kernel: 
Nov  8 21:15:24 localhost kernel: fuse: read beyond EOF
Nov  8 21:15:24 localhost kernel: 
Nov  8 21:15:24 localhost kernel: Call Trace:
Nov  8 21:15:24 localhost kernel:  [<ffffffff88992a26>]
:fuse:fuse_read_fill+0x58/0xb0
Nov  8 21:15:24 localhost kernel:  [<ffffffff88992b32>]
:fuse:fuse_send_read+0x23/0x39
Nov  8 21:15:24 localhost kernel:  [<ffffffff889935a0>]
:fuse:fuse_readpage+0x86/0xc3
Nov  8 21:15:24 localhost kernel:  [<ffffffff810732c0>]
do_generic_mapping_read+0x250/0x3f7
Nov  8 21:15:24 localhost kernel:  [<ffffffff81072956>] file_read_actor+0x0/0x18a
Nov  8 21:15:24 localhost kernel:  [<ffffffff81074972>]
generic_file_aio_read+0x11d/0x160
Nov  8 21:15:24 localhost kernel:  [<ffffffff8109b70c>] do_sync_read+0xc9/0x10c
Nov  8 21:15:24 localhost kernel:  [<ffffffff889931a4>]
:fuse:fuse_open_common+0x141/0x152
Nov  8 21:15:24 localhost kernel:  [<ffffffff810493c1>]
autoremove_wake_function+0x0/0x2e
Nov  8 21:15:24 localhost kernel:  [<ffffffff8109a26d>] do_filp_open+0x2a/0x38
Nov  8 21:15:24 localhost kernel:  [<ffffffff810f7924>]
selinux_file_permission+0x57/0x117
Nov  8 21:15:24 localhost kernel:  [<ffffffff8109bf83>] vfs_read+0xcb/0x173
Nov  8 21:15:24 localhost kernel:  [<ffffffff8109c364>] sys_read+0x45/0x6e
Nov  8 21:15:24 localhost kernel:  [<ffffffff8100bd45>] tracesys+0xd5/0xda
Nov  8 21:15:24 localhost kernel: 

Then the second trace repeats 2 more times.

Comment 63 Miklos Szeredi 2007-11-12 12:20:00 UTC

Found the culprit, it's this commit, which moves the i_size check after
->readpage():

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a32ea1e1f925399e0d81ca3f7394a44a6dafa12c

It can be fixed in fuse by checking the size in ->readpage().  I'll submit a
fix.  Maybe this also qualifies for -stable.

I think this bug can now be closed.  Thanks for the help.

Comment 64 Miklos Szeredi 2007-11-12 17:02:00 UTC

After more thought fuse should not be changed, except to document that a read
may be performed beyond EOF, and the fs should return zero in that case.

Such reads could happen anyway (even with 2.6.22 or earlier kernels), if a
truncate races with a read, since these are not serialized.

As an optimization, the check for read beyond EOF could be added, but that may
just hide bugs, so I think it's better to let the filesystem always deal with
this situation.

Comment 65 cheguaka 2007-11-21 09:19:03 UTC

Same scenario same result. I've bitten by this bug.

Comment 66 Szabolcs Szakacsits 2007-11-21 09:45:12 UTC

(In reply to comment #65)
> Same scenario same result. I've bitten by this bug.

The file system part of the problem was fixed in NTFS-3G 1.1104 release:
http://ntfs-3g.org/releases.html

FreeBSD had the same issue and they also confirmed that the fix works. The
current Fedora NTFS-3G package is newer, 1.1120. I suggest, just update.

Thank you Miklos for the great analyzes.

Comment 67 cheguaka 2007-11-21 16:06:05 UTC

Well, not exactly the same scenario.
Fedora 8 -stable.

kernel-2.6.23.1-49.fc8.i686.rpm 
ntfs-3g-1.1030-1.fc8.i386.rpm 
thunderbird-2.0.0.9-1.fc8.i386.rpm 

Update to (test) ntfs-3g-1.1104-1.fc8.i386.rpm   

Seems ok. No hang in several tries.

Note You need to log in before you can comment on or make changes to this bug.