Bug 459436 - ext4 assembly bitops failures on s390
Summary: ext4 assembly bitops failures on s390
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 447797
Reported: 2008-08-18 20:41 UTC by Eric Sandeen
Modified: 2009-01-20 20:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 20:10:49 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2009:0225
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
Last Updated: 2009-01-20 16:06:24 UTC

Description Eric Sandeen 2008-08-18 20:41:39 UTC
There appear to be failures in the assembly bitops that ext4 uses on s390.  (I tested on 2.6.27-rc*)

Simply making a filesystem with mkfs.ext4dev (from e2fsprogs-1.41.0), mounting it, and copying a few files to it shows the problem: the cp never completes, and the thread is either locked or spinning in the ext4 block group (bg) initialization code, which uses these bitops (via inlines, so they do not appear in the following trace):

Call Trace:
([<0000000000016472>] show_trace+0xb2/0x130)
 [<00000000001793f4>] showacpu+0x48/0x68
 [<0000000000024816>] do_ext_call_interrupt+0x8a/0xc4
 [<000000000001c2b6>] do_extint+0xe2/0xfc
 [<0000000000020ea8>] ext_no_vtime+0x16/0x1a
 [<0000000020cd99f2>] ext4_mb_init_cache+0x9fe/0x1020 [ext4dev]
([<0000000020cd91a4>] ext4_mb_init_cache+0x1b0/0x1020 [ext4dev])
 [<0000000020cdba9c>] ext4_mb_load_buddy+0x264/0x36c [ext4dev]
 [<0000000020cdc446>] ext4_mb_regular_allocator+0x53e/0x1218 [ext4dev]
 [<0000000020ce0d06>] ext4_mb_new_blocks+0x1a2/0x7e8 [ext4dev]
 [<0000000020cd6538>] ext4_ext_get_blocks+0xe3c/0x1074 [ext4dev]
 [<0000000020cc32be>] ext4_get_blocks_wrap+0x132/0x190 [ext4dev]
 [<0000000020cc40aa>] ext4_getblk+0x8a/0x26c [ext4dev]
 [<0000000020cc4b12>] ext4_bread+0x26/0xd8 [ext4dev]
 [<0000000020cc89fa>] ext4_mkdir+0x18e/0x3c8 [ext4dev]
 [<00000000000be54c>] vfs_mkdir+0x10c/0x1a8
 [<00000000000c204e>] sys_mkdirat+0xca/0x114
 [<00000000000208c0>] sysc_tracego+0xe/0x14
 [<00000200001345e6>] 0x200001345e6

I've also pinged Martin (schwidefsky.com) and he said he'd look into it but I've not heard back after about a week.

We'd like to ship ext4 as tech preview in RHEL5.3, and it'd be... best... if it worked on s390 too.

I'd appreciate any help in getting this tracked down, and I can backport the fix to the RHEL5.3 kernel.

Thanks,
-Eric

Comment 1 Jan Glauber 2008-08-20 11:34:47 UTC
Eric,
the first version of ext4 that compiled on s390 after we had the bitops support
implemented should have been working. Maybe you could try that and do a bisect search for the change that broke ext4?

Comment 2 Eric Sandeen 2008-08-21 05:39:12 UTC
Jan, which version was that, out of curiosity?

FWIW, this sort of change:

Index: linux-2.6/arch/s390/include/asm/bitops.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/bitops.h	2008-08-11 16:23:58.000000000 -0500
+++ linux-2.6/arch/s390/include/asm/bitops.h	2008-08-20 22:43:55.516165589 -0500
@@ -865,7 +865,7 @@ static inline int ext2_find_next_bit(voi
 		 * s390 version of ffz returns __BITOPS_WORDSIZE
 		 * if no zero bit is present in the word.
 		 */
-		set = ffs(__load_ulong_le(p, 0) >> bit) + bit;
+		set = __ffs(__load_ulong_le(p, 0) >> bit) + bit;
 		if (set >= size)
 			return size + offset;
 		if (set < __BITOPS_WORDSIZE)

at least gets the "copy /lib/modules to an ext4 filesystem" test working; however, when I run fsstress I'm running into other trouble.

The above changes the semantics of counting bits from starting at 1 to starting at 0; IOW, for a bitmap of all 1's, the original code did this:

find next set bit starting at 0: 0
find next set bit starting at 1: 2

with the change, it's (properly, I think):

find next set bit starting at 0: 0
find next set bit starting at 1: 1
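
For illustration, here is a minimal userspace sketch of that off-by-one. It models only the `if (bit)` branch touched by the diff, and every helper name in it is hypothetical: the portable loop stands in for the s390 assembly and is not the kernel implementation. It assumes `bit` is smaller than the word size.

```c
#include <assert.h>

/* Hypothetical stand-in for ffs(): 1-based index of the lowest set bit,
 * or 0 if the word is zero. */
static unsigned int demo_ffs_one_based(unsigned long word)
{
	unsigned int n = 1;

	if (word == 0)
		return 0;
	while (!(word & 1UL)) {
		word >>= 1;
		n++;
	}
	return n;
}

/* Hypothetical stand-in for __ffs(): 0-based index of the lowest set bit.
 * The caller must pass a non-zero word. */
static unsigned int demo_ffs_zero_based(unsigned long word)
{
	assert(word != 0);
	return demo_ffs_one_based(word) - 1;
}

/* Original expression: the 1-based result is added to 'bit' unadjusted. */
static unsigned int demo_next_set_old(unsigned long word, unsigned int bit)
{
	return demo_ffs_one_based(word >> bit) + bit;
}

/* Expression after the change: the 0-based result is added to 'bit'. */
static unsigned int demo_next_set_new(unsigned long word, unsigned int bit)
{
	return demo_ffs_zero_based(word >> bit) + bit;
}
```

With an all-ones word and a start position of 1, `demo_next_set_old(~0UL, 1)` yields 2 while `demo_next_set_new(~0UL, 1)` yields 1, matching the two result lists above.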

-Eric

Comment 3 Eric Sandeen 2008-08-21 06:10:43 UTC
Ok, posted that a bit too soon.  I think this gets it going:

Index: linux-2.6/arch/s390/include/asm/bitops.h
===================================================================
--- linux-2.6.orig/arch/s390/include/asm/bitops.h	2008-08-11 16:23:58.000000000 -0500
+++ linux-2.6/arch/s390/include/asm/bitops.h	2008-08-21 00:49:40.950176518 -0500
@@ -862,10 +862,10 @@ static inline int ext2_find_next_bit(voi
 	p = addr + offset / __BITOPS_WORDSIZE;
 	if (bit) {
 		/*
-		 * s390 version of ffz returns __BITOPS_WORDSIZE
-		 * if no zero bit is present in the word.
+		 * s390 version of ffs returns __BITOPS_WORDSIZE
+		 * if no set bit is present in the word.
 		 */
-		set = ffs(__load_ulong_le(p, 0) >> bit) + bit;
+		set = __ffs(__load_ulong_le(p, 0) & (~0UL << bit));
 		if (set >= size)
 			return size + offset;
 		if (set < __BITOPS_WORDSIZE)

-Eric
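
As a rough illustration of the masked form in the patch above: clearing the bits below `bit` and then taking a 0-based lowest-set-bit index gives the absolute position directly, with no `+ bit` correction. The sketch below is a userspace approximation with hypothetical helper names, not the s390 code. It assumes `bit` is smaller than the word size and that at least one bit at or above `bit` is set. The patch comment describes a different convention for an empty word on s390, which is not modeled here.

```c
#include <assert.h>

/* Hypothetical 0-based lowest-set-bit helper; the word must be non-zero. */
static unsigned int demo_lowest_set(unsigned long word)
{
	unsigned int n = 0;

	assert(word != 0);
	while (!(word & 1UL)) {
		word >>= 1;
		n++;
	}
	return n;
}

/* Shift form: drop the low bits, search, then add the offset back. */
static unsigned int demo_next_by_shift(unsigned long word, unsigned int bit)
{
	return demo_lowest_set(word >> bit) + bit;
}

/* Mask form, as in the patch: clear the bits below 'bit' and search the
 * word in place, so the result is already an absolute bit index. */
static unsigned int demo_next_by_mask(unsigned long word, unsigned int bit)
{
	return demo_lowest_set(word & (~0UL << bit));
}
```

For a word with bits 3 and 5 set (0x28), both forms return 3 when starting at bit 1 and 5 when starting at bit 4.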

Comment 7 Don Zickus 2008-09-10 20:14:54 UTC
in kernel-2.6.18-110.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 11 errata-xmlrpc 2009-01-20 20:10:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

