Bug 641926 - repeatable segfault in e4defrag
Summary: repeatable segfault in e4defrag
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: e2fsprogs
Version: rawhide
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Eric Sandeen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-10-11 14:43 UTC by Michał Piotrowski
Modified: 2011-11-02 00:50 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-02 00:50:32 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
canto_della_terra2.sh (2.58 KB, application/octet-stream)
2010-10-11 15:14 UTC, Michał Piotrowski
no flags Details
CDT version for comment #3 (2.58 KB, application/octet-stream)
2010-10-11 19:01 UTC, Michał Piotrowski
no flags Details
2.6.36 ext4 error (120.53 KB, text/plain)
2010-11-04 23:18 UTC, Michał Piotrowski
no flags Details
CDT version for comment #9 (5.54 KB, application/octet-stream)
2010-11-04 23:19 UTC, Michał Piotrowski
no flags Details
defrag.py (3.78 KB, application/octet-stream)
2010-12-04 03:05 UTC, Michał Piotrowski
no flags Details
ext4 spews dmesg (474.36 KB, text/plain)
2011-09-16 18:51 UTC, Harshavardhana
no flags Details

Description Michał Piotrowski 2010-10-11 14:43:12 UTC
Description of problem:
e4defrag[23597]: segfault at 20 ip 000000000040232c sp 00007fff5bb0cd20 error 4 in e4defrag[400000+5000]

Version-Release number of selected component (if applicable):


How reproducible:
canto_della_terra2.sh /mnt/tmp/test/ /mnt/tmp/reply.sh 50

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Eric Sandeen 2010-10-11 14:52:25 UTC
Can you please attach the scripts in question?

Thanks,
-Eric

Comment 2 Michał Piotrowski 2010-10-11 15:14:09 UTC
Created attachment 452734 [details]
canto_della_terra2.sh

I attached it when I created a report. Strange.

Comment 3 Michał Piotrowski 2010-10-11 18:59:45 UTC
This is something interesting. I ran cdt (new version attached) canto_della_terra2.sh /mnt/tmp/test/ /mnt/tmp/reply.sh 50

When the script finished, I checked /mnt/tmp/test/ with e4defrag

[14/75]/mnt/tmp/test/10/1:      100%    [ OK ]
[15/75]/mnt/tmp/test/31:        100%    [ OK ]
[16/75]/mnt/tmp/test/7:   0%    [ NG ]
[17/75]/mnt/tmp/test/21:        100%    [ OK ]
[23/75]/mnt/tmp/test/5/8:         0%    [ NG ]
[29/75]/mnt/tmp/test/40:        100%    [ OK ]
[31/75]/mnt/tmp/test/13/43:       0%    [ NG ]
[43/75]/mnt/tmp/test/26/20/46:  100%    [ OK ]
[44/75]/mnt/tmp/test/26/16:     100%    [ OK ]
[47/75]/mnt/tmp/test/20:          0%    [ NG ]
[53/75]/mnt/tmp/test/4/2:         0%    [ NG ]
[58/75]/mnt/tmp/test/11:        100%    [ OK ]
[59/75]/mnt/tmp/test/14:          0%    [ NG ]
[70/75]/mnt/tmp/test/35/19:     100%    [ OK ]
[74/75]/mnt/tmp/test/16:        100%    [ OK ]

        Success:                        [ 9/75 ]
        Failure:                        [ 66/75 ]
[michal@ozzy tmp]$ e4defrag test/
ext4 defragmentation for directory(test/)
[14/75]/mnt/tmp/test/10/1:      100%    [ OK ]
[15/75]/mnt/tmp/test/31:        100%    [ OK ]
[16/75]/mnt/tmp/test/7: 100%    [ OK ]
[17/75]/mnt/tmp/test/21:        100%    [ OK ]
[23/75]/mnt/tmp/test/5/8:         0%    [ NG ]
[29/75]/mnt/tmp/test/40:        100%    [ OK ]
[31/75]/mnt/tmp/test/13/43:     100%    [ OK ]
[43/75]/mnt/tmp/test/26/20/46:  100%    [ OK ]
[44/75]/mnt/tmp/test/26/16:     100%    [ OK ]
[47/75]/mnt/tmp/test/20:        100%    [ OK ]
[53/75]/mnt/tmp/test/4/2:         0%    [ NG ]
[58/75]/mnt/tmp/test/11:        100%    [ OK ]
[59/75]/mnt/tmp/test/14:          0%    [ NG ]
[70/75]/mnt/tmp/test/35/19:     100%    [ OK ]
[74/75]/mnt/tmp/test/16:          0%    [ NG ]

        Success:                        [ 11/75 ]
        Failure:                        [ 64/75 ]

fewer failures than previously

[..]

[michal@ozzy tmp]$ e4defrag test/
ext4 defragmentation for directory(test/)
[14/75]/mnt/tmp/test/10/1:      100%    [ OK ]
[15/75]/mnt/tmp/test/31:        100%    [ OK ]
[16/75]/mnt/tmp/test/7: 100%    [ OK ]
[17/75]/mnt/tmp/test/21:        100%    [ OK ]
[23/75]/mnt/tmp/test/5/8:         0%    [ NG ]
[29/75]/mnt/tmp/test/40:        100%    [ OK ]
[31/75]/mnt/tmp/test/13/43:       0%    [ NG ]
[43/75]/mnt/tmp/test/26/20/46:  100%    [ OK ]
[44/75]/mnt/tmp/test/26/16:     100%    [ OK ]
[47/75]/mnt/tmp/test/20:        100%    [ OK ]
[53/75]/mnt/tmp/test/4/2:         0%    [ NG ]
[58/75]/mnt/tmp/test/11:        100%    [ OK ]
[59/75]/mnt/tmp/test/14:          0%    [ NG ]
[70/75]/mnt/tmp/test/35/19:     100%    [ OK ]
[74/75]/mnt/tmp/test/16:          0%    [ NG ]

        Success:                        [ 10/75 ]
        Failure:                        [ 65/75 ]


/mnt/tmp/test/13/43 - it's a normal sparse file
/mnt/tmp/test/16 - also is a sparse file

Comment 4 Michał Piotrowski 2010-10-11 19:01:11 UTC
Created attachment 452776 [details]
CDT version for comment #3

Comment 5 Eric Sandeen 2010-10-11 19:03:28 UTC
Thanks!

Comment 6 Michał Piotrowski 2010-10-11 19:04:04 UTC
BTW. there is a free space on fs
/mnt/tmp/ext.img         63461     48429     11756  81% /mnt/tmp/test
behavior described above is repeatable.

Comment 7 Michał Piotrowski 2010-10-11 20:08:39 UTC
I "discovered" -v option in e4defrag.

[43/59]/mnt/tmp/test/24:          0%
        Failed to defrag:Bad file descriptor    [ NG ]
[..]
[43/59]/mnt/tmp/test/24:        100%  extents: 3 -> 3   [ OK ]

Comment 8 Michał Piotrowski 2010-11-04 23:05:30 UTC
Mr. Akira Fujita wrote (quote from e-mail correspondence):

"The problems you reported are as follows, right?
1) Got the segmentation fault with e4defrag.
2) Mismatch of success/failure count by multiple e4derag.

I guess 2) is because contiguous blocks is short on your FS.

The e4defrag tries to get multiple blocks with mballoc which
manages contiguous blocks by the buddy bitmap,
and removes donor file which has exchanged original blocks
to release at its last.

Until the journal is committed (every 5 sec or sync),
the buddy information is not updated (I mean we can not reuse these blocks).

So, when you do e4defrag for many files, you should do
"sync" in before and after to reused blocks immediately."

I confirm that the problem apparent from the fact that I didn't sync before run e4defrag. When I run
sync && e4defrag -v /mnt/tmp/test/
problem doesn't appear.

Comment 9 Michał Piotrowski 2010-11-04 23:17:16 UTC
Something strange is going on here when I run CDT2 script (new version attached) on 2.6.36 kernel

[46519.654518] EXT4-fs: Can't allocate: Allocation context details:
[46519.654663] EXT4-fs: status 1 flags 2080
[46519.654791] EXT4-fs: orig 1/0/7404@0, goal 15/4338/2048@0, best 0/0/0@0 cr 3
[46519.654923] EXT4-fs: 0 scanned, 0 found
[46519.655130] EXT4-fs: groups:
[46519.655271]
[46519.656509] EXT4-fs: Can't allocate: Allocation context details:
[46519.656671] EXT4-fs: status 1 flags 1120
[46519.656810] EXT4-fs: orig 5/5209/449@2275, goal 5/5209/449@2275, best 0/0/0@2275 cr 3
[46519.657160] EXT4-fs: 0 scanned, 0 found
[46519.657294] EXT4-fs: groups:
[46519.657427]
[46519.660664] EXT4-fs: Can't allocate: Allocation context details:
[46519.660817] EXT4-fs: status 1 flags 2080
[46519.660955] EXT4-fs: orig 1/0/3576@0, goal 15/4338/2048@0, best 0/0/0@0 cr 3
[46519.661176] EXT4-fs: 0 scanned, 0 found
[46519.661286] EXT4-fs: groups:
[46519.661396] 6: 363/1
[46519.661500] 7: 352/1
[46519.661634] 10: 755/1
[46519.661782] 11: 471/2
[46519.661891] 14: 328/1
[46519.661990]

dmesg attached

Comment 10 Michał Piotrowski 2010-11-04 23:18:36 UTC
Created attachment 457962 [details]
2.6.36 ext4 error

Comment 11 Michał Piotrowski 2010-11-04 23:19:52 UTC
Created attachment 457963 [details]
CDT version for comment #9

Comment 12 Michał Piotrowski 2010-11-04 23:22:46 UTC
BTW. Kernel is self build 2.6.36-1
http://koji.fedoraproject.org/koji/buildinfo?buildID=201305
running on F14

Comment 13 Michał Piotrowski 2010-11-16 10:41:57 UTC
Kazuya Mio fixed reported problem.

This patch is needed
http://marc.info/?l=linux-ext4&m=128272690010784&w=4

And this

Signed-off-by: Kazuya Mio <k-mio.nec.com>
---
 misc/e4defrag.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/e4defrag.c b/misc/e4defrag.c
index db92e1c..eb3269a 100644
--- a/misc/e4defrag.c
+++ b/misc/e4defrag.c
@@ -936,7 +936,7 @@ static int get_file_extents(int fd, struct fiemap_extent_list **ext_list_head)
               fiemap_buf->fm_start = pos;
               memset(ext_buf, 0, ext_buf_size);
               ret = ioctl(fd, FS_IOC_FIEMAP, fiemap_buf);
-               if (ret < 0)
+               if (ret < 0 || fiemap_buf->fm_mapped_extents == 0)
                       goto out;
               for (i = 0; i < fiemap_buf->fm_mapped_extents; i++) {
                       ext_list = NULL;

Eric, you can find this patch in your mail box - subiect "e4defrag problems".

Kind regards,
Michal

Comment 14 Eric Sandeen 2010-11-16 17:45:29 UTC
I'll bug upstream to get these in, first.

-Eric

Comment 15 Michał Piotrowski 2010-11-16 20:24:41 UTC
Ok.

What about this "Can't allocate: Allocation context details"? I belive it was introduced intentionally in 2.6.36, but I do not think that the intention was to spam dmesg log. Am I right?

BTW. I was thinking about a small program that can be started from Gnome/nautilus context menu. Two options - "show information about file/dir fragmentation" and "defrag this file/dir". I've got a simple non threaded version now. What do you think about this idea? Cool enough? Or maybe I should not waste time on it?

Best regards,
Michal

Comment 16 Eric Sandeen 2010-12-03 15:19:33 UTC
Michał that might be neat, but it'll only work for files on some filesystems.  You'd need to extend it to work with every available defragmenter, I think.  Simply reporting fragmentation stats is more generic, though.

Comment 17 Michał Piotrowski 2010-12-04 03:05:18 UTC
Created attachment 464699 [details]
defrag.py

Attached version defrags ext4, xfs, btrfs and shows info about files fragmentation on ext4 and xfs.

Defragmentation
./defrag-v2.py -d -o /mnt/tmp3

Information about files fragmentation
./defrag-v2.py -o /mnt/tmp3

Comment 18 Harshavardhana 2011-09-16 18:51:51 UTC
Created attachment 523608 [details]
ext4 spews dmesg

I am afraid i see the 'Allocation context details' even in 2.6.18-239.5.1 with ext4.

Correlation was a 9TB ext4 volume reported suddenly overnight that its full and is 100% utilized and dmesg filled with these log messages. 


After a umount everything seems to be normal again while it has used only 900G out of 9TB. 

Are there any other tools to debug this? blktrace? anything? 

Attaching the dmesg 

kernel version 2.6.18-238.9.1 

mount params "rw,barrier=1,data=ordered 0 0"

Comment 19 Eric Sandeen 2011-09-16 19:07:46 UTC
Harshavardhana, are you on the right bug?  This one is for rawhide & e4defrag, which is not shipped in RHEL5 (which is indicated as your OS from your kernel version...)

Comment 20 Harshavardhana 2011-09-16 19:57:44 UTC
Eric, i was searching for the related messages. Ended up here, i will go ahead and open a new bug if there isn't one.

Comment 21 Eric Sandeen 2011-11-01 18:57:59 UTC
Much newer e2fsprogs are in rawhide now.  Can anyone confirm that these e4defrag problems have been fixed?

Comment 22 Michał Piotrowski 2011-11-01 19:06:51 UTC
Hi,

I don't have working rawhide system, but I can build this version for F16 if it is sufficient to test.

Best regards,
Michal

Comment 23 Eric Sandeen 2011-11-01 19:13:31 UTC
That should be a sufficient test, if you are willing to do it! :)

Comment 24 Michał Piotrowski 2011-11-01 20:39:08 UTC
I can't create fs with this version on test image.
$ mkfs.ext4 ext4.img
mkfs.ext4: symbol lookup error: mkfs.ext4: undefined symbol: set_com_err_gettext

Comment 25 Eric Sandeen 2011-11-01 20:45:11 UTC
You'd need to update the libraries (libcom_err) too.

Comment 26 Michał Piotrowski 2011-11-01 21:12:57 UTC
I don't see any segfaults or any other problems with e4defrag. Thanks!

Comment 27 Eric Sandeen 2011-11-02 00:50:32 UTC
Great, thanks for testing!


Note You need to log in before you can comment on or make changes to this bug.