Bug 173157 - kernel dm-log: big endian 64-bit corruption
kernel dm-log: big endian 64-bit corruption
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
All Linux
medium Severity medium
: ---
: ---
Assigned To: Alasdair Kergon
Brian Brock
Depends On:
Blocks: 168429
  Show dependency treegraph
Reported: 2005-11-14 13:28 EST by Alasdair Kergon
Modified: 2007-11-30 17:07 EST (History)
1 user (show)

See Also:
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2006-03-07 15:43:47 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)

  None (edit)
Description Alasdair Kergon 2005-11-14 13:28:56 EST
The linux bitset operators (test_bit, set_bit etc) work on arrays of
"unsigned long".  dm-log uses such bitsets but treats them as
arrays of uint32_t, only allocating and zeroing a multiple of 4 bytes
(as 'clean_bits' is a uint32_t).
The patch below fixes this problem.
The problem is specific to 64-bit big endian machines such as s390x or
ppc-64 and can prevent pvmove terminating.
In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus the memory for this butset, after allocation and zeroing would
   0 0 0 0 X X X X
On a bigendian 64bit machine, bit 0 for this bitset is in the 8th
byte! (and every bit that dm-log would use would be in the X area).
   0 0 0 0 X X X X
which hasn't been cleared properly.
As the dm-raid1 code only syncs and counts regions which have a 0 in
the 'sync_bits' bitset, and only finishes when it has counted high
enough, a large number of 1's among those 'X's will cause the sync to
not complete.
It is worth noting that the code uses the same bitsets for in-memory
and on-disk logs.  As these bitsets are host-endian and host-sized,
this means that they cannot safely be moved between computers with
different architectures.
Signed-off-by: Neil Brown <neilb@suse.de>

Comment 5 Red Hat Bugzilla 2006-03-07 15:43:47 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.