Bug 608154 - fsck.gfs2: unaligned access on ia64
Summary: fsck.gfs2: unaligned access on ia64
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.0
Hardware: ia64
OS: Linux
Target Milestone: rc
Assignee: Robert Peterson
QA Contact: Cluster QE
Depends On:
Blocks: 608158
Reported: 2010-06-25 20:22 UTC by Robert Peterson
Modified: 2010-11-10 19:59 UTC

Clone Of:
: 608158
Last Closed: 2010-11-10 19:59:29 UTC

Attachments
Proposed patch for STABLE3 (6.16 KB, patch)
2010-06-25 20:29 UTC, Robert Peterson
Addendum patch (787 bytes, patch)
2010-07-28 01:13 UTC, Robert Peterson

Description Robert Peterson 2010-06-25 20:22:41 UTC
Description of problem:
While working on bug #606468 on an ia64 box I discovered that
my latest and greatest fsck.gfs2 produced multiple unaligned
access errors during execution.  I backtracked it to a
regression as described in:


The regression is here:

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. fsck.gfs2 /dev/device

Actual results:
fsck.gfs2(10238): unaligned access to 0x600000000006930b, ip=0x400000000004c730

Expected results:
None of these messages should be received

Additional info:
I have a patch to fix the problem

Comment 1 Robert Peterson 2010-06-25 20:23:44 UTC
Requesting ack flags to get this into 6.0.

Comment 2 Robert Peterson 2010-06-25 20:29:23 UTC
Created attachment 426982
Proposed patch for STABLE3

To fix the problem, I simply ported Steve Whitehouse's kernel
version of the latest gfs2_bitfit function back to user space.
I tested it on system a1 and it works properly.
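
As a point of reference (an illustrative sketch, not the attached
patch): GFS2 bitmaps store two bits of state per block, so four blocks
per byte.  A byte-at-a-time search like the one below never issues a
wide load, so it cannot trigger unaligned access, though it is slower
than a word-at-a-time search.  The names bitfit_bytewise and NO_ENTRY
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_ENTRY   2               /* two state bits per block */
#define ENTRIES_PER_BYTE 4               /* so four blocks per byte */
#define NO_ENTRY         ((uint32_t)~0u) /* hypothetical "not found" value */

/*
 * Hypothetical reference search: return the first block number >= goal
 * whose 2-bit state equals `state`, or NO_ENTRY if there is none.
 * Only single bytes are read, so buffer alignment does not matter.
 */
uint32_t bitfit_bytewise(const uint8_t *buf, size_t len,
                         uint32_t goal, uint8_t state)
{
    uint32_t total = (uint32_t)(len * ENTRIES_PER_BYTE);
    uint32_t blk;

    assert(buf != NULL);
    for (blk = goal; blk < total; blk++) {
        uint8_t byte = buf[blk / ENTRIES_PER_BYTE];
        unsigned int shift = (blk % ENTRIES_PER_BYTE) * BITS_PER_ENTRY;

        if (((byte >> shift) & 0x3) == state)
            return blk;
    }
    return NO_ENTRY;
}
```

Because the loop starts at goal and only counts upward, a search of
this shape can never return a block below the goal, which makes it a
handy cross-check for a faster implementation.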

Comment 3 Bill Nottingham 2010-06-25 20:56:34 UTC
Why is fixing a bug that only shows on a platform we don't ship a blocker?

Comment 4 Robert Peterson 2010-06-25 21:06:05 UTC
Because the patch to fix the problem affects all platforms
and the patch that introduced the problem affected all platforms
and is a regression?

Comment 5 Bill Nottingham 2010-06-25 21:33:06 UTC
OK. It just didn't seem like a particularly relevant regression, if it only affects ia64.

Comment 8 RHEL Product and Program Management 2010-06-28 15:42:55 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for

Comment 9 Robert Peterson 2010-06-28 22:38:46 UTC
I pushed the patch to the master branch of the gfs2-utils git
tree and the STABLE3 and RHEL6 branches of the cluster git tree
for inclusion into 6.0.  Changing status to POST until it gets
built.  This was tested on system a1 for ia64 and on roth-08
for x86_64.

Comment 11 Nate Straz 2010-07-27 19:16:54 UTC
Ran into an issue where fsck.gfs2 got stuck in an infinite loop reading the same block.

<bob> refried: This is a bug with gfs2's bitfit algorithm.  For 608154 I ported it from kernel space to user space.  The problem is that the algorithm doesn't return blocks in ascending order.
<bob> I'm calling the algorithm to "get the next bitmap block higher than 0x2016" and it comes back with 0x2015.
<bob> I believe I noticed the non-sequential issue in the kernel code months ago and swhiteho_ohnl and I discussed it.
<bob> So I guess the proper thing to do is to mark 608154 as FAILS_QA because that's what I'm going to have to rework

Comment 12 Robert Peterson 2010-07-28 01:13:57 UTC
Created attachment 434885
Addendum patch

Okay, mystery solved.  I enhanced a private copy of gfs2_edit
so that it would tell me the block allocations as I walked the
bitmaps.  That enabled me to figure out what the proper values
should be, and that, in turn, enabled me to figure out the problem.

The problem, as it turns out, is a "thinko" in this patch
that only affects i386.  I was using sizeof(unsigned long) when
I should have been using sizeof(unsigned long long).  The two
sizes are the same on x86_64 but differ on 32-bit machines like the
one that failed.  That bad size caused a miscalculation of the
shift point, which threw everything off.  The infinite loop came
from the search repeatedly returning the same block because of
this bad shift point.

All in all that's good news because it means the original
algorithm in the kernel is sound and doesn't need changing.
This addendum patch should take care of the problem for user space.
I'll push it out shortly.
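
To illustrate the size mix-up (a sketch under assumptions, not the
addendum patch itself): suppose the search scans the bitmap in 64-bit
words holding 32 two-bit entries each, and masks off entries below the
goal by shifting the mask by (goal * 2) modulo the word width.  The
helpers start_shift and first_entry_searched are hypothetical, and the
word_bytes parameter stands in for the sizeof expression so both cases
can be compared on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bit offset, inside one search word, of the entry
 * for block `goal`, given two bits per block.  `word_bytes` plays the
 * role of the sizeof() expression and must be a power of two.
 */
unsigned int start_shift(uint32_t goal, size_t word_bytes)
{
    assert(word_bytes != 0 && (word_bytes & (word_bytes - 1)) == 0);
    return (unsigned int)((goal << 1) & (8 * word_bytes - 1));
}

/* Index, within the word, of the first entry the shifted mask lets through. */
unsigned int first_entry_searched(uint32_t goal, size_t word_bytes)
{
    return start_shift(goal, word_bytes) / 2;
}
```

With goal 0x2016 (entry 22 of its 64-bit word), a word size of 8 gives
a shift of 44, so the search starts at entry 22.  A word size of 4, as
sizeof(unsigned long) yields on i386, gives a shift of 12, so the mask
still admits entries 6 through 21, which are blocks below the goal.
Under these assumptions that is consistent with asking for a block
above 0x2016 and getting 0x2015 back.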

Comment 13 Robert Peterson 2010-07-28 01:32:33 UTC
I tested the new patch on system morph-01.  The patch was pushed
to the master branch of the gfs2-utils git repository and
the STABLE3 and RHEL6 branches of the cluster git repository.
Changing status to POST until this gets built into a new
cman package.

Comment 15 Steve Whitehouse 2010-07-30 12:03:53 UTC
Is it not better to use fixed size types here? The kernel one always uses chunks of u64 even on 32-bit arches. It's possible that it's a bit slower (due to the smaller register set) but I doubt it makes a great deal of difference overall.

Comment 16 Robert Peterson 2010-07-30 13:13:22 UTC
Well, actually that was my first thought, to use declarations
such as "uint64_t" but I was leery of doing that because the
previous algorithm did that and got into trouble with the
unaligned access messages on ia64.
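
One common way to reconcile the two concerns (a sketch, not what
either patch does): a fixed-width type such as uint64_t only causes
unaligned access when it is loaded through a pointer that is not
suitably aligned, for example a cast of an arbitrary byte address.
Copying the bytes into a local uint64_t with memcpy keeps the fixed
width and leaves the compiler to emit loads the target allows.  The
helper name load_u64 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read 8 bytes starting at `p`, which may have any
 * alignment, and return them as a uint64_t in host byte order.
 */
uint64_t load_u64(const uint8_t *p)
{
    uint64_t v;

    assert(p != NULL);
    /* p is never cast to uint64_t *, so no misaligned dereference occurs */
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Any byte-order conversion the on-disk format needs would be applied to
the returned value afterwards.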

Comment 17 Nate Straz 2010-08-13 17:07:04 UTC
I can get through gfs_fsck_stress on i686 again.

Comment 18 releng-rhel@redhat.com 2010-11-10 19:59:29 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
