Bug 960284

Summary: thin_dump and other persistent-data tools may perform IO to misaligned buffers
Product: Red Hat Enterprise Linux 6 Reporter: Mike Snitzer <msnitzer>
Component: device-mapper-persistent-dataAssignee: Heinz Mauelshagen <heinzm>
Status: CLOSED ERRATA QA Contact: yanfu,wang <yanwang>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4CC: agk, ccui, cmarthal, heinzm, jbrassow, jorris, msnitzer, prajnoha, prockai, thornber, xiaoli, yanwang, zkabelac
Target Milestone: rc   
Target Release: 6.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Tools provided by the device-mapper-persistent-data package fail to operate on 4K hard-sectored metadata devices.
Story Points: ---
Clone Of:
: 1022834 1035990 (view as bug list) Environment:
Last Closed: 2013-11-21 22:59:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1022834, 1035990    

Description Mike Snitzer 2013-05-06 21:31:22 UTC
Description of problem:
On devices with logical_block_size=4096 thin_check and thin_dump can perform IO that isn't 4K aligned.

Steps to Reproduce:
# modprobe scsi_debug sector_size=4096 dev_size_mb=128
# thin_dump /dev/sdh
read failed
# strace thin_dump /dev/sdh
...
open("/dev/sdh", O_RDONLY|O_SYNC|O_DIRECT) = 3
brk(0x155f000)                          = 0x155f000
lseek(3, 0, SEEK_SET)                   = 0
read(3, 0x153d600, 4096)                = -1 EINVAL (Invalid argument)
brk(0x155e000)                          = 0x155e000

0x153d600 isn't a 4K aligned address.

(I've also seen thin_check fail IO to when reading from an mmap() buffer that isn't 4K aligned). 
  
Actual results:
Buffers aren't aligned to logical_block_size (e.g. 4K).

Expected results:
Buffers should be aligned to logical_block_size (e.g. 4K).

Comment 1 Heinz Mauelshagen 2013-08-05 16:09:59 UTC
Mike,

please test with 0.2.3 tools if the issue is fixed.

Comment 2 Mike Snitzer 2013-08-12 21:22:51 UTC
(In reply to Heinz Mauelshagen from comment #1)
> Mike,
> 
> please test with 0.2.3 tools if the issue is fixed.

thin_dump clearly isn't using direct IO anymore, so not seeing any issue now:

# strace thin_dump /dev/sdh
...
stat("/dev/sdh", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 112), ...}) = 0
open("/dev/sdh", O_RDONLY)              = 3
brk(0x1875000)                          = 0x1875000
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
close(3)                                = 0

Comment 4 yanfu,wang 2013-10-24 06:49:59 UTC
hi Joe,
Could I know what's the patch since I didn't find it in source package?

And I also test on device-mapper-persistent-data-0.2.8-2.el6 and still got the same issue:
[root@sun-x2270m2-01 ~]# modprobe scsi_debug sector_size=4096 dev_size_mb=128
[root@sun-x2270m2-01 ~]# dmesg|tail
scsi 10:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
sd 10:0:0:0: Attached scsi generic sg25 type 0
sd 10:0:0:0: [sdz] 32768 4096-byte logical blocks: (134 MB/128 MiB)
sd 10:0:0:0: [sdz] Write Protect is off
sd 10:0:0:0: [sdz] Mode Sense: 73 00 10 08
sd 10:0:0:0: [sdz] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 10:0:0:0: [sdz] 32768 4096-byte logical blocks: (134 MB/128 MiB)
 sdz: unknown partition table
sd 10:0:0:0: [sdz] 32768 4096-byte logical blocks: (134 MB/128 MiB)
sd 10:0:0:0: [sdz] Attached SCSI disk

[root@sun-x2270m2-01 ~]# thin_dump /dev/sdz
read failed
[root@sun-x2270m2-01 ~]# echo $?
1
[root@sun-x2270m2-01 ~]# strace thin_dump /dev/sdz
execve("/usr/sbin/thin_dump", ["thin_dump", "/dev/sdz"], [/* 36 vars */]) = 0
brk(0)                                  = 0x92c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f45d17d1000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=47522, ...}) = 0
mmap(NULL, 47522, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f45d17c5000
close(3)                                = 0
open("/usr/lib64/libstdc++.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360c\5\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=987096, ...}) = 0
mmap(NULL, 3166648, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f45d12ad000
mprotect(0x7f45d1395000, 2097152, PROT_NONE) = 0
mmap(0x7f45d1595000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe8000) = 0x7f45d1595000
mmap(0x7f45d159e000, 82360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f45d159e000
close(3)                                = 0
open("/lib64/libexpat.so.1", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320<\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=165264, ...}) = 0
mmap(NULL, 2260432, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f45d1085000
mprotect(0x7f45d10ab000, 2093056, PROT_NONE) = 0
mmap(0x7f45d12aa000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f45d12aa000
close(3)                                = 0
open("/lib64/libm.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p>\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=596264, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f45d17c4000
mmap(NULL, 2633912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f45d0e01000
mprotect(0x7f45d0e84000, 2093056, PROT_NONE) = 0
mmap(0x7f45d1083000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x82000) = 0x7f45d1083000
close(3)                                = 0
open("/lib64/libgcc_s.so.1", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20)\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=90880, ...}) = 0
mmap(NULL, 2186584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f45d0beb000
mprotect(0x7f45d0c01000, 2093056, PROT_NONE) = 0
mmap(0x7f45d0e00000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f45d0e00000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\356\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1921216, ...}) = 0
mmap(NULL, 3750152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f45d0857000
mprotect(0x7f45d09e2000, 2093056, PROT_NONE) = 0
mmap(0x7f45d0be1000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18a000) = 0x7f45d0be1000
mmap(0x7f45d0be6000, 18696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f45d0be6000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f45d17c3000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f45d17c1000
arch_prctl(ARCH_SET_FS, 0x7f45d17c1720) = 0
mprotect(0x7f45d0be1000, 16384, PROT_READ) = 0
mprotect(0x7f45d1083000, 4096, PROT_READ) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f45d17c0000
mprotect(0x7f45d1595000, 28672, PROT_READ) = 0
mprotect(0x7f45d17d2000, 4096, PROT_READ) = 0
munmap(0x7f45d17c5000, 47522)           = 0
brk(0)                                  = 0x92c000
brk(0x94d000)                           = 0x94d000
stat("/dev/sdz", {st_mode=S_IFBLK|0660, st_rdev=makedev(65, 144), ...}) = 0
open("/dev/sdz", O_RDONLY)              = 3
ioctl(3, BLKGETSIZE64, 0x7fff762d81b0)  = 0
close(3)                                = 0
stat("/dev/sdz", {st_mode=S_IFBLK|0660, st_rdev=makedev(65, 144), ...}) = 0
open("/dev/sdz", O_RDONLY|O_SYNC|O_DIRECT) = 3
brk(0x96e000)                           = 0x96e000
lseek(3, 0, SEEK_SET)                   = 0
read(3, 0x94c600, 4096)                 = -1 EINVAL (Invalid argument)
close(3)                                = 0
write(2, "read failed", 11read failed)             = 11
write(2, "\n", 1
)                       = 1
brk(0x94d000)                           = 0x94d000
exit_group(1)                           = ?

[root@sun-x2270m2-01 ~]# rpm -qa|grep device
device-mapper-event-libs-1.02.79-6.el6.x86_64
device-mapper-persistent-data-0.2.8-2.el6.x86_64
device-mapper-1.02.79-6.el6.x86_64
device-mapper-libs-1.02.79-6.el6.x86_64
device-mapper-event-1.02.79-6.el6.x86_64

Comment 7 Mike Snitzer 2013-10-29 21:59:16 UTC
The change was to no longer use O_DIRECT.  But it was reported that that change resulted in a regression, see:
http://www.redhat.com/archives/dm-devel/2013-October/msg00017.html

(In reply to yanfu,wang from comment #5)
> no feedback from Joe, could any developer remove the bug from the errata
> since userspace errata deadline is 10/30 and no time to full test now?

Heinz, any insight here?  Did the change that Joe mentioned (restoring the use of O_DIRECT) make it into RHEL6?

If so then there is a decent chance that this issue (misaligned buffers) still exists.

Comment 10 Heinz Mauelshagen 2013-10-30 14:15:50 UTC
Created attachment 817469 [details]
Patch fixing the alignment issue

Comment 18 errata-xmlrpc 2013-11-21 22:59:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2013-1696.html

Comment 19 Alasdair Kergon 2013-11-27 14:50:19 UTC
This was closed incorrectly and remains unresolved.

Comment 20 yanfu,wang 2013-11-28 02:08:00 UTC
(In reply to Alasdair Kergon from comment #19)
> This was closed incorrectly and remains unresolved.

It's closed automatically when release. Pls remember fix the issue in 6.6 or z-stream.

Comment 22 Alasdair Kergon 2013-11-28 02:35:45 UTC
(In reply to yanfu,wang from comment #20)
> It's closed automatically when release.

The fix failed QA and has not been released.

A new bug needs opening for 6.6 now.

Comment 25 yanfu,wang 2013-11-29 09:59:21 UTC
Copy Radek's reply(thank you very much!):
Well, someone should have removed the bug from the advisory before the advisory was shipped live. Although the bug was back in ASSIGNED and had 'FailedQA' in the Verified field, and the ET was aware of that, the ET doesn't take this information into account and won't keep rel-eng from shipping the advisory. If you think this should be a blocking issue in the REL_PREP state, please file an RFE.

And I've filed a errata tool improvement bug 1036058.