608154 – fsck.gfs2: unaligned access on ia64



Summary: fsck.gfs2: unaligned access on ia64

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	cluster
Sub Component:
Version:	6.0
Hardware:	ia64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Robert Peterson
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	608158

Reported:	2010-06-25 20:22 UTC by Robert Peterson
Modified:	2010-11-10 19:59 UTC
CC List:	9 users
Fixed In Version:	cluster-3.0.12-20.el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	608158
Environment:
Last Closed:	2010-11-10 19:59:29 UTC
Target Upstream Version:
Embargoed:

Attachments
Proposed patch for STABLE3 (6.16 KB, patch) 2010-06-25 20:29 UTC, Robert Peterson
Addendum patch (787 bytes, patch) 2010-07-28 01:13 UTC, Robert Peterson

Description Robert Peterson 2010-06-25 20:22:41 UTC

Description of problem:
While working on bug #606468 on an ia64 box, I discovered that
my latest and greatest fsck.gfs2 produced multiple unaligned
access errors during execution.  I traced it back to a
regression as described in:

https://bugzilla.redhat.com/show_bug.cgi?id=606468#c2

The regression is here:
http://git.fedoraproject.org/git/?p=cluster.git;a=commitdiff;h=7ccf5c0c60ff29fc6e8a1ef0fea01d510f2df79b

Version-Release number of selected component (if applicable):
ia64

How reproducible:
Easily

Steps to Reproduce:
1. fsck.gfs2 /dev/device

Actual results:
fsck.gfs2(10238): unaligned access to 0x600000000006930b, ip=0x400000000004c730

Expected results:
None of these messages should be received

Additional info:
I have a patch to fix the problem
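The faulting address in the sample message, 0x600000000006930b, ends in binary 011, so it sits 3 bytes past an 8-byte boundary and is misaligned for any 2-, 4-, or 8-byte load. As far as we know, the ia64 kernel emulates such user-space accesses and logs a line like the one above. The sketch below uses hypothetical helper names (`misalign`, `load_u64`) to show the alignment arithmetic and one portable way to read a 64-bit value from an arbitrary byte offset, by copying through memcpy instead of dereferencing a cast pointer. It is an illustration only, not code from the fsck.gfs2 patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: how many bytes addr lies past the previous
 * multiple of align (align must be a power of two). 0 means aligned. */
static unsigned misalign(uint64_t addr, uint64_t align)
{
    return (unsigned)(addr & (align - 1));
}

/* Hypothetical helper: read a 64-bit value from a possibly misaligned
 * address. memcpy lets the compiler choose an access sequence that is
 * safe for the target, so no misaligned uint64_t pointer is dereferenced. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

For the address in the report, `misalign(0x600000000006930bULL, 8)` is 3, so a direct 8-byte load from it would trap on ia64.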

Comment 1 Robert Peterson 2010-06-25 20:23:44 UTC

Requesting ack flags to get this into 6.0.

Comment 2 Robert Peterson 2010-06-25 20:29:23 UTC

Created attachment 426982 [details]
Proposed patch for STABLE3

To fix the problem, I simply ported Steve Whitehouse's kernel
version of the latest gfs2_bitfit function back to user space.
I tested it on system a1 and it works properly.
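For context, gfs2_bitfit searches a resource-group bitmap for the next block in a requested state. GFS2 bitmaps use 2 bits per block, so each byte covers 4 blocks (state 0 is free, 1 is in use). The following is a simplified, hypothetical byte-at-a-time sketch of that search; the names `block_state`, `bitfit_simple`, and `NO_ENTRY` are made up here, and the kernel routine that was ported scans 64-bit words instead, so treat this only as a reference for what the result should be.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_BYTE 4          /* 2 bits per block, as in GFS2 bitmaps */
#define NO_ENTRY ((uint32_t)~0U)   /* hypothetical "not found" marker */

/* Return the 2-bit state of block blk; the lowest-numbered block in each
 * byte occupies the lowest two bits. */
static uint8_t block_state(const unsigned char *buf, uint32_t blk)
{
    unsigned shift = (blk % BLOCKS_PER_BYTE) * 2;
    return (uint8_t)((buf[blk / BLOCKS_PER_BYTE] >> shift) & 0x3);
}

/* Find the first block >= goal whose state equals state, or NO_ENTRY. */
static uint32_t bitfit_simple(const unsigned char *buf, size_t len,
                              uint32_t goal, uint8_t state)
{
    uint32_t nblocks = (uint32_t)(len * BLOCKS_PER_BYTE);

    for (uint32_t blk = goal; blk < nblocks; blk++) {
        if (block_state(buf, blk) == state)
            return blk;
    }
    return NO_ENTRY;
}
```

By construction the result is never below `goal`, which is the property whose violation shows up in Comment 11.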

Comment 3 Bill Nottingham 2010-06-25 20:56:34 UTC

Why is fixing a bug that only shows on a platform we don't ship a blocker?

Comment 4 Robert Peterson 2010-06-25 21:06:05 UTC

Because the patch to fix the problem affects all platforms
and the patch that introduced the problem affected all platforms
and is a regression?

Comment 5 Bill Nottingham 2010-06-25 21:33:06 UTC

OK. It just didn't seem like a particularly relevant regression, if it only affects ia64.

Comment 8 RHEL Program Management 2010-06-28 15:42:55 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux major release.  This request is not yet committed for
inclusion.

Comment 9 Robert Peterson 2010-06-28 22:38:46 UTC

I pushed the patch to the master branch of the gfs2-utils git
tree and the STABLE3 and RHEL6 branches of the cluster git tree
for inclusion into 6.0.  Changing status to POST until it gets
built.  This was tested on system a1 for ia64 and on roth-08
for x86_64.

Comment 11 Nate Straz 2010-07-27 19:16:54 UTC

Ran into an issue where fsck.gfs2 got stuck in an infinite loop reading the same block.

<bob> refried: This is a bug with gfs2's bitfit algorithm.  For 608154 I ported it from kernel space to user space.  The problem is that the algorithm doesn't return blocks in ascending order.
<bob> I'm calling the algorithm to get "Get the next bitmap block higher than 0x2016" and it comes back with 0x2015.
<bob> I believe I noticed the non-sequential issue in the kernel code months ago and swhiteho_ohnl and I discussed it.
<bob> So I guess the proper thing to do is to mark 608154 as FAILS_QA because that's what I'm going to have to rework
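The symptom here is that a request for the next block above 0x2016 returned 0x2015, so a caller that resumes from the returned block never makes progress. A hypothetical guard like the one below treats any non-ascending result as an error, turning a silent infinite loop into a reportable failure. The callback type, `walk_blocks`, and the two example callbacks are illustrative names, not fsck.gfs2 code; `next_buggy` merely mimics the reported symptom.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_ENTRY ((uint32_t)~0U)   /* hypothetical "no more blocks" marker */

/* Hypothetical callback: return the next block strictly above prev. */
typedef uint32_t (*next_block_fn)(uint32_t prev, void *ctx);

/* Walk blocks after start and count them. Returns -1 if the callback
 * ever fails to advance, instead of looping forever on the same block. */
static long walk_blocks(next_block_fn next, void *ctx, uint32_t start)
{
    long visited = 0;
    uint32_t cur = start;

    for (;;) {
        uint32_t nxt = next(cur, ctx);
        if (nxt == NO_ENTRY)
            return visited;
        if (nxt <= cur)
            return -1;   /* non-ascending result: bail out */
        cur = nxt;
        visited++;
    }
}

/* Example callback: *ctx holds the last valid block number. */
static uint32_t next_ok(uint32_t prev, void *ctx)
{
    uint32_t last = *(const uint32_t *)ctx;
    return prev < last ? prev + 1 : NO_ENTRY;
}

/* Mimics the report: asked for a block above 0x2016, returns 0x2015. */
static uint32_t next_buggy(uint32_t prev, void *ctx)
{
    (void)ctx;
    return prev == 0x2016 ? 0x2015 : prev + 1;
}
```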

Comment 12 Robert Peterson 2010-07-28 01:13:57 UTC

Created attachment 434885 [details]
Addendum patch

Okay, mystery solved.  I enhanced a private copy of gfs2_edit
so that it would tell me the block allocations as I walked the
bitmaps.  That enabled me to figure out what the proper values
should be, and that, in turn, enabled me to figure out the problem.

The problem, as it turns out, is a "thinko" in this patch
that only affects i386.  I was using sizeof(unsigned long) when
I should have been using sizeof(unsigned long long).  The two values
are the same (8 bytes) on x86_64 but differ on 32-bit machines like the
one that failed, where unsigned long is only 4 bytes.  That bad size
caused a miscalculation of the shift point, which threw everything off.
The infinite loop came from the function repeatedly returning the same
block because of this bad shift point.
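To illustrate why the element size matters: with 2 bits per block, a 64-bit chunk covers 32 blocks while a 32-bit chunk covers 16, so the bit position of a goal block within its chunk depends on which size the code assumes. On x86_64 both unsigned long and unsigned long long are 8 bytes, which hides the mistake; on i386 unsigned long is 4 bytes. The hypothetical helpers below only demonstrate this arithmetic and are not the actual patch.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BLOCK 2   /* GFS2 bitmaps use 2 bits per block */

/* Number of blocks described by one chunk of chunk_bytes bytes. */
static uint32_t blocks_per_chunk(size_t chunk_bytes)
{
    return (uint32_t)(chunk_bytes * 8 / BITS_PER_BLOCK);
}

/* Bit shift of block goal within its chunk. If chunk_bytes disagrees
 * with the width of the words actually loaded, this shift is wrong. */
static unsigned goal_shift(uint32_t goal, size_t chunk_bytes)
{
    return (unsigned)((goal % blocks_per_chunk(chunk_bytes)) * BITS_PER_BLOCK);
}
```

For block 0x2016 the shift is 44 with an 8-byte chunk but 12 with a 4-byte one, so code that loads 64-bit words while sizing them with a 4-byte type would start examining the wrong bits.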

All in all that's good news because it means the original
algorithm in the kernel is sound and doesn't need changing.
This addendum patch should take care of the problem for user space.
I'll push it out shortly.

Comment 13 Robert Peterson 2010-07-28 01:32:33 UTC

I tested the new patch on system morph-01.  The patch was pushed
to the master branch of the gfs2-utils git repository and
the STABLE3 and RHEL6 branches of the cluster git repository.
Changing status to POST until this gets built into a new
cman package.

Comment 15 Steve Whitehouse 2010-07-30 12:03:53 UTC

Is it not better to use fixed-size types here? The kernel version always uses u64 chunks, even on 32-bit arches. It's possible that it's a bit slower (due to the smaller register set), but I doubt it makes a great deal of difference overall.

Comment 16 Robert Peterson 2010-07-30 13:13:22 UTC

Well, actually that was my first thought, to use declarations
such as "uint64_t" but I was leery of doing that because the
previous algorithm did that and got into trouble with the
unaligned access messages on ia64.
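The two positions need not conflict: code can work in fixed-size uint64_t chunks, as suggested in Comment 15, while still avoiding misaligned loads by checking leading bytes one at a time until it reaches an 8-byte boundary. The sketch below assumes a simple task, finding the first byte that differs from a fill value (for example 0x55, which in a GFS2 bitmap means four in-use blocks), and uses a hypothetical function name. It shows one common head/body/tail pattern, not what the committed patch does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: index of the first byte in buf[0..len) that
 * differs from fill, or len if every byte matches. */
static size_t first_mismatch(const unsigned char *buf, size_t len,
                             unsigned char fill)
{
    const uint64_t pattern = 0x0101010101010101ULL * fill;
    size_t i = 0;

    /* Head: check bytes one at a time until buf + i is 8-byte aligned. */
    while (i < len && ((uintptr_t)(buf + i) & 7) != 0) {
        if (buf[i] != fill)
            return i;
        i++;
    }

    /* Body: whole 64-bit words read from aligned addresses. memcpy avoids
     * strict-aliasing problems; compilers typically emit one aligned load. */
    while (len - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof(w));
        if (w != pattern)
            break;   /* the mismatch is somewhere in this word */
        i += sizeof(uint64_t);
    }

    /* Tail, plus the word that stopped the body loop: byte by byte. */
    for (; i < len; i++) {
        if (buf[i] != fill)
            return i;
    }
    return len;
}
```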

Comment 17 Nate Straz 2010-08-13 17:07:04 UTC

I can get through gfs_fsck_stress on i686 again.

Comment 18 releng-rhel@redhat.com 2010-11-10 19:59:29 UTC

Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
