Bug 608154 - fsck.gfs2: unaligned access on ia64
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: cluster
Version: 6.0
Hardware: ia64 Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Robert Peterson
QA Contact: Cluster QE
Docs Contact:
Depends On:
Blocks: 608158
Reported: 2010-06-25 16:22 EDT by Robert Peterson
Modified: 2010-11-10 14:59 EST

See Also:
Fixed In Version: cluster-3.0.12-20.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 608158
Environment:
Last Closed: 2010-11-10 14:59:29 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Proposed patch for STABLE3 (6.16 KB, patch)
2010-06-25 16:29 EDT, Robert Peterson
Addendum patch (787 bytes, patch)
2010-07-27 21:13 EDT, Robert Peterson

Description Robert Peterson 2010-06-25 16:22:41 EDT
Description of problem:
While working on bug #606468 on an ia64 box, I discovered that
my latest and greatest fsck.gfs2 produced multiple unaligned
access errors during execution.  I backtracked it to a
regression as described in:

https://bugzilla.redhat.com/show_bug.cgi?id=606468#c2

The regression is here:
http://git.fedoraproject.org/git/?p=cluster.git;a=commitdiff;h=7ccf5c0c60ff29fc6e8a1ef0fea01d510f2df79b

Version-Release number of selected component (if applicable):
ia64

How reproducible:
Easily

Steps to Reproduce:
1. fsck.gfs2 /dev/device

Actual results:
fsck.gfs2(10238): unaligned access to 0x600000000006930b, ip=0x400000000004c730

Expected results:
None of these messages should be received

Additional info:
I have a patch to fix the problem
Comment 1 Robert Peterson 2010-06-25 16:23:44 EDT
Requesting ack flags to get this into 6.0.
Comment 2 Robert Peterson 2010-06-25 16:29:23 EDT
Created attachment 426982
Proposed patch for STABLE3

To fix the problem, I simply ported Steve Whitehouse's kernel
version of the latest gfs2_bitfit function back to user space.
I tested it on system a1 and it works properly.
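
For readers unfamiliar with what a "bitfit" search does, here is a minimal sketch, assuming the usual GFS2 bitmap layout of two state bits per block (four blocks per byte). It is not the kernel's gfs2_bitfit and not the attached patch: the name bitfit_simple and the NO_BLOCK marker are hypothetical, and it scans one block at a time instead of a 64-bit word at a time. Because it only ever reads single bytes, it cannot trigger an unaligned access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BLOCK  2u          /* two state bits per block */
#define BLOCKS_PER_BYTE 4u          /* 8 bits / BITS_PER_BLOCK */
#define STATE_MASK      0x3u        /* mask for one 2-bit state */
#define NO_BLOCK        UINT32_MAX  /* hypothetical "nothing found" marker */

/*
 * Return the first block number >= goal whose 2-bit state equals `state`,
 * or NO_BLOCK if the len-byte bitmap holds no such block.
 * Simplified illustration only: reads the bitmap one byte at a time.
 */
static uint32_t bitfit_simple(const unsigned char *bitmap, size_t len,
                              uint32_t goal, unsigned state)
{
    assert(bitmap != NULL);
    assert(state <= STATE_MASK);

    uint32_t nblocks = (uint32_t)(len * BLOCKS_PER_BYTE);

    for (uint32_t blk = goal; blk < nblocks; blk++) {
        /* Bit offset of this block's state inside its byte. */
        unsigned shift = (blk % BLOCKS_PER_BYTE) * BITS_PER_BLOCK;
        unsigned cur = (bitmap[blk / BLOCKS_PER_BYTE] >> shift) & STATE_MASK;

        if (cur == state)
            return blk;
    }
    return NO_BLOCK;
}
```

A caller walking a bitmap would pass the last result plus one as the next goal. A word-at-a-time version does the same job faster, but it has to be careful about how it loads each word from the buffer.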
Comment 3 Bill Nottingham 2010-06-25 16:56:34 EDT
Why is fixing a bug that only shows on a platform we don't ship a blocker?
Comment 4 Robert Peterson 2010-06-25 17:06:05 EDT
Because the patch to fix the problem affects all platforms
and the patch that introduced the problem affected all platforms
and is a regression?
Comment 5 Bill Nottingham 2010-06-25 17:33:06 EDT
OK. It just didn't seem like a particularly relevant regression, if it only affects ia64.
Comment 8 RHEL Product and Program Management 2010-06-28 11:42:55 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 9 Robert Peterson 2010-06-28 18:38:46 EDT
I pushed the patch to the master branch of the gfs2-utils git
tree and the STABLE3 and RHEL6 branches of the cluster git tree
for inclusion into 6.0.  Changing status to POST until it gets
built.  This was tested on system a1 for ia64 and on roth-08
for x86_64.
Comment 11 Nate Straz 2010-07-27 15:16:54 EDT
Ran into an issue where fsck.gfs2 got stuck in an infinite loop reading the same block.

<bob> refried: This is a bug with gfs2's bitfit algorithm.  For 608154 I ported it from kernel space to user space.  The problem is that the algorithm doesn't return blocks in ascending order.
<bob> I'm calling the algorithm to get "Get the next bitmap block higher than 0x2016" and it comes back with 0x2015.
<bob> I believe I noticed the non-sequential issue in the kernel code months ago and swhiteho_ohnl and I discussed it.
<bob> So I guess the proper thing to do is to mark 608154 as FAILS_QA because that's what I'm going to have to rework
Comment 12 Robert Peterson 2010-07-27 21:13:57 EDT
Created attachment 434885
Addendum patch

Okay, mystery solved.  I enhanced a private copy of gfs2_edit
so that it would tell me the block allocations as I walked the
bitmaps.  That enabled me to figure out what the proper values
should be, and that, in turn, enabled me to figure out the problem.

The problem, as it turns out, is a "thinko" in this patch
that only affects i386.  I was using sizeof(unsigned long) when
I should have been using sizeof(unsigned long long).  The value
is the same in x86_64 but different in 32-bit machines like the
one that failed.  That bad size caused a miscalculation of the
shift point, which threw everything off.  The infinite loop was
repeatedly returning the same block due to this bad shift point.

All in all that's good news because it means the original
algorithm in the kernel is sound and doesn't need changing.
This addendum patch should take care of the problem for user space.
I'll push it out shortly.
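
To illustrate how a word-size mix-up can move the shift point, here is a small sketch. It assumes the shift point is the bit offset of the goal block inside one search word at two bits per block; whether that matches the exact expression in the patch is an assumption, and shift_point is a hypothetical helper, not a function from gfs2-utils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit offset of block `goal` inside a search word that is word_bytes
 * bytes wide, at two bits per block.  Hypothetical helper for
 * illustration only.
 */
static unsigned shift_point(uint32_t goal, size_t word_bytes)
{
    assert(word_bytes == 4 || word_bytes == 8);
    return (unsigned)((goal << 1) & (8 * word_bytes - 1));
}

/* What the code computes if it sizes the word with unsigned long. */
static unsigned shift_point_ulong(uint32_t goal)
{
    return shift_point(goal, sizeof(unsigned long));
}

/* What it computes if it sizes the word with unsigned long long. */
static unsigned shift_point_ulonglong(uint32_t goal)
{
    return shift_point(goal, sizeof(unsigned long long));
}
```

With goal 0x2016, an 8-byte word gives a shift point of 44, while a 4-byte word gives 12. On x86_64 Linux both helper functions agree because unsigned long is 8 bytes there; on i386 it is 4 bytes, so the two results diverge for any goal whose offset falls in the upper half of a 64-bit word.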
Comment 13 Robert Peterson 2010-07-27 21:32:33 EDT
I tested the new patch on system morph-01.  The patch was pushed
to the master branch of the gfs2-utils git repository and
the STABLE3 and RHEL6 branches of the cluster git repository.
Changing status to POST until this gets built into a new
cman package.
Comment 15 Steve Whitehouse 2010-07-30 08:03:53 EDT
Is it not better to use fixed-size types here? The kernel one always uses chunks of u64, even on 32-bit arches. It's possible that it's a bit slower (due to the smaller register set), but I doubt it makes a great deal of difference overall.
Comment 16 Robert Peterson 2010-07-30 09:13:22 EDT
Well, actually that was my first thought, to use declarations
such as "uint64_t" but I was leery of doing that because the
previous algorithm did that and got into trouble with the
unaligned access messages on ia64.
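
For reference, fixed-size types and alignment safety are not mutually exclusive in general C. The sketch below shows one common technique, copying the bytes into a uint64_t with memcpy instead of dereferencing a cast pointer. It is a generic illustration, not what either patch in this bug did, and load_u64 and is_aligned8 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read 8 bytes starting at p into a uint64_t without requiring p to be
 * 8-byte aligned.  memcpy lets the compiler pick loads that are safe
 * for the target, unlike *(const uint64_t *)p on a misaligned pointer.
 * The value is in host byte order.
 */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* True if addr is a multiple of 8, i.e. naturally aligned for a u64. */
static int is_aligned8(uint64_t addr)
{
    return (addr & 7u) == 0;
}
```

As a data point, the address in the original report, 0x600000000006930b, is not a multiple of 8, so is_aligned8 returns 0 for it. Any 64-bit load through a cast pointer at that address would be misaligned, which is consistent with the ia64 warning, assuming the faulting access was such a load.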
Comment 17 Nate Straz 2010-08-13 13:07:04 EDT
I can get through gfs_fsck_stress on i686 again.
Comment 18 releng-rhel@redhat.com 2010-11-10 14:59:29 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
