Bug 821384 - Errors reported during reboot of system with thinp LV
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: Unspecified Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Zdenek Kabelac
QA Contact: Cluster QE
Depends On:
Blocks: 773507 840699
Reported: 2012-05-14 06:11 EDT by Nenad Peric
Modified: 2012-12-10 14:38 EST
Doc Type: Bug Fix
Last Closed: 2012-12-10 14:38:59 EST
Type: Bug

Attachments
vgchange -vvvv -ay (80.43 KB, text/plain)
2012-05-15 04:02 EDT, Nenad Peric
reboot log + vgchange + vgremove (217.12 KB, text/plain)
2012-05-15 04:19 EDT, Nenad Peric
Description Nenad Peric 2012-05-14 06:11:15 EDT
Description of problem:

When a thin LV has been created, errors are reported during reboot.
With a filesystem created on it and mounted, the errors can sometimes cause the reboot to hang.

Version-Release number of selected component (if applicable):

Kernel 2.6.32-269.el6.x86_64 #1 

lvm2-2.02.95-7.el6.x86_64
lvm2-devel-2.02.95-7.el6.x86_64
lvm2-libs-2.02.95-7.el6.x86_64
lvm2-cluster-2.02.95-7.el6.x86_64
lvm2-debuginfo-2.02.95-7.el6.x86_64

device-mapper-persistent-data-0.1.4-1.el6.x86_64
device-mapper-event-libs-1.02.74-7.el6.x86_64
device-mapper-libs-1.02.74-7.el6.x86_64
device-mapper-devel-1.02.74-7.el6.x86_64
device-mapper-event-devel-1.02.74-7.el6.x86_64
device-mapper-1.02.74-7.el6.x86_64
device-mapper-event-1.02.74-7.el6.x86_64


How reproducible:
Every time

Steps to Reproduce:
1. Create VG, Create thin pool
2. Create thin LV
3. reboot

  vgcreate vgforthin /dev/sda1 /dev/sdb1 /dev/sdc1
  lvcreate -T -L 1G vgforthin/thin_pool
  lvcreate -V1G vgforthin/thin_pool -T -n virtual1

[root@node01:~]$ lvs
  LV        VG        Attr     LSize    Pool      Origin Data%  Move Log Copy%  Convert
  lv_root   VolGroup  -wi-ao--    8.52g                                                
  lv_swap   VolGroup  -wi-ao-- 1008.00m                                                
  thin_pool vgforthin twi-a-tz    1.00g                    0.00                        
  virtual1  vgforthin Vwi-a-tz    1.00g thin_pool          0.00            


  reboot
  
Actual results:

During reboot these errors are reported:

Stopping monitoring for VG VolGroup:   /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073676288: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073733632: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 4096: Input/output error
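Note on the offsets: the following is a small sketch of the arithmetic, assuming the pool device is exactly 1 GiB as requested with "lvcreate -L 1G". It shows that the four failed reads fall in the first 8 KiB and the last 64 KiB of the device. That pattern looks like a scan probing the start and end of the device, but this reading is an interpretation and is not stated in the report. The helper name offset_from_end is made up for illustration.

```shell
# Offsets copied from the "read failed" messages above.
offsets="1073676288 1073733632 0 4096"

# Assumption: the device size is exactly 1 GiB ("lvcreate -L 1G").
dev_size=$((1024 * 1024 * 1024))

# Print how many bytes an offset lies before the end of the device.
offset_from_end() {
    echo $(( dev_size - $1 ))
}

for off in $offsets; do
    printf 'offset %10d = %d bytes before end of device\n' \
        "$off" "$(offset_from_end "$off")"
done
```

The first two offsets come out as 65536 and 8192 bytes before the end of the device; the last two are the first two 4096-byte blocks.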

Expected results:

Reboot without errors. 

Additional info:

Only one node running, no cluster. 

 locking_type = 1


In the following comments I will try to add the behaviour seen when there is a FS created on top of the LV and mounted.
Comment 1 Nenad Peric 2012-05-14 06:20:30 EDT
Same steps as above, but with filesystems added:

  vgcreate vgforthin /dev/sda1 /dev/sdb1 /dev/sdc1
  lvcreate -T -L 1G vgforthin/thin_pool
  lvcreate -V1G vgforthin/thin_pool -T -n virtual1
  lvcreate -V1G vgforthin/thin_pool -T -n virtual2

  mke2fs /dev/vgforthin/virtual1
  mke2fs -t ext4 /dev/vgforthin/virtual2

  mount /dev/vgforthin/virtual1 /mnt/virtual1/
  mount /dev/vgforthin/virtual2 /mnt/virtual2/
  touch /mnt/virtual1/file
  touch /mnt/virtual2/file
  reboot


Results:

Stopping monitoring for VG VolGroup:   /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073676288: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073733632: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 4096: Input/output error
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get driver version
Unmounting file systems:  Buffer I/O error on device dm-7, logical block 1
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 73
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 81
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 97
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 0
lost page write due to I/O error on dm-7
Aborting journal on device dm-7-8.
Buffer I/O error on device dm-7, logical block 131072
lost page write due to I/O error on dm-7
JBD2: I/O error detected when updating journal superblock for dm-7-8.
EXT4-fs error (device dm-7): ext4_put_super: Couldn't clean up the journal
EXT4-fs (dm-7): Remounting filesystem read-only
device-mapper: thin: dm_thin_find_block() failed, error = -19
device-mapper: thin: dm_thin_find_block() failed, error = -19
Buffer I/O error on device dm-6, logical block 0
lost page write due to I/O error on dm-6
------------[ cut here ]------------
WARNING: at fs/buffer.c:1161 mark_buffer_dirty+0x82/0xa0() (Tainted: G           ---------------  T)
Hardware name: KVM
Modules linked in: ext2 dm_thin_pool(T) dm_persistent_data dm_bufio libcrc32c nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs uio cxgb4 libcxgbi cxgb3 mdio ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ib_addr]
Pid: 2652, comm: umount Tainted: G           ---------------  T 2.6.32-269.el6.x86_64 #1
Call Trace:
 [<ffffffff8106b6b7>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b70a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff811ae282>] ? mark_buffer_dirty+0x82/0xa0
 [<ffffffffa04a26d3>] ? ext2_sync_fs+0x43/0xb0 [ext2]
 [<ffffffff811dde8e>] ? sync_quota_sb+0x5e/0x130
 [<ffffffff811aa92a>] ? __sync_filesystem+0x7a/0x90
 [<ffffffff811aab3b>] ? sync_filesystem+0x4b/0x70
 [<ffffffff8117d957>] ? generic_shutdown_super+0x27/0xe0
 [<ffffffff8117da41>] ? kill_block_super+0x31/0x50
 [<ffffffff8117eaf0>] ? deactivate_super+0x70/0x90
 [<ffffffff8119aadf>] ? mntput_no_expire+0xbf/0x110
 [<ffffffff8119b57b>] ? sys_umount+0x7b/0x3a0
 [<ffffffff81082d51>] ? sigprocmask+0x71/0x110
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
---[ end trace da3e93115e42f319 ]---
device-mapper: thin: dm_thin_find_block() failed, error = -19
Buffer I/O error on device dm-6, logical block 67
lost page write due to I/O error on dm-6


Expected: 

normal reboot
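
Note on the numbers in the log above: a minimal sketch for decoding them. The rejected allocation size 67108864 is 64 MiB, and on Linux errno 19 is ENODEV ("No such device"). The helper names to_mib and errno_name are invented for this example, and the lookup table is deliberately tiny. Whether ENODEV here means the underlying disks had already gone away is an assumption, not something the log proves.

```shell
# Convert a byte count to MiB (integer division).
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Map a few Linux errno numbers to their names. Not exhaustive;
# the values are those in the kernel's errno-base.h.
errno_name() {
    case "${1#-}" in          # strip a leading minus sign: -19 -> 19
        5)  echo "EIO" ;;
        12) echo "ENOMEM" ;;
        19) echo "ENODEV" ;;
        *)  echo "unknown" ;;
    esac
}

echo "rejected allocation: $(to_mib 67108864) MiB"
echo "dm_thin_find_block() error -19: $(errno_name -19)"
```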
Comment 3 Zdenek Kabelac 2012-05-14 09:32:07 EDT
Can you please attach the -vvvv output from 'vgchange -vvvv -ay' activation after reboot?

Did thin_check pass successfully in this case?
Comment 4 Alasdair Kergon 2012-05-14 13:23:14 EDT
Two separate fixes may be needed.

1) The "Huge memory allocation" messages should not be appearing - the failure condition, whatever it is, should be caught (by thin_check or lvm2) earlier and handled cleanly.

2) If this is caused by an inconsistent metadata state on disk, the (kernel?) code needs fixing to avoid that state occurring.
Comment 5 Nenad Peric 2012-05-15 04:02:48 EDT
Created attachment 584576 [details]
vgchange -vvvv -ay

I have attached the log of vgchange -ay with -vvvv

As far as I can see, the issue may be in the order in which things are stopped, since the disks being used are iscsi targets.

I tested the same on "normal" hardware and no errors occurred during reboot.

So, maybe during the reboot the lvm monitoring should be stopped after iscsi has stopped?
Or maybe there is some issue with iscsi itself, such that it does not sync properly, and what we see in lvm later is an effect rather than the cause?

As a test, I tried changing the sequence of shutdown events, and when lvm monitoring is stopped before iscsi stops there are no errors.
Not sure if this is how it should be done though :)
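
For anyone wanting to check the shutdown order on their own machine, here is a rough sketch. It assumes a RHEL 6 style SysV init layout, where the K (kill) links in /etc/rc6.d are run in lexical order at reboot, and it assumes the init scripts are named lvm2-monitor, iscsi and iscsid. The function name kill_order is made up for this example.

```shell
# Filter a list of rc link names (one per line on stdin) down to the
# kill links of the services of interest, sorted in the order rc
# would run them at shutdown (lexical order of the link names).
kill_order() {
    grep -E '^K[0-9]+(lvm2-monitor|iscsi|iscsid)$' | LC_ALL=C sort
}

# Usage on the affected machine (the path is an assumption, see above):
#   ls /etc/rc6.d | kill_order
```

The service listed first is stopped first, so this shows whether lvm monitoring is stopped before or after iscsi.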
Comment 6 Nenad Peric 2012-05-15 04:18:19 EDT
An UPDATE:

I just noticed that the second machine I was testing this on did not have device-mapper-persistent-data installed, so I have installed it now and I will add a new attachment with the result.
Sorry for not noticing it sooner. IMHO this package should be part of the lvm install, since it seems to be an integral part of thinp.

The second attachment contains a reboot log with iscsi errors and a log of vgchange/vgremove.
Comment 7 Nenad Peric 2012-05-15 04:19:37 EDT
Created attachment 584585 [details]
reboot log + vgchange + vgremove
Comment 8 Alasdair Kergon 2012-05-15 07:33:01 EDT
(In reply to comment #6)

> Sorry for not noticing it sooner. This package imho should be a part of lvm
> install since it is an integral part of thinp it seems. 

We did not do that because we do not want a Tech Preview package installed on everyone's machine.  But if it's not installed and lvm can't find the checking binary required to use thinp, it should be issuing an error.
Comment 9 Nenad Peric 2012-05-15 07:41:43 EDT
Yes, that is how I noticed the error after I already uploaded the first log.

Spotting the error when running with -vvvv is a bit complicated, so that is why I missed it. 

But yes, it is issuing an error that it cannot find thin_check.
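
A quicker check than reading the -vvvv output is to test for the binary directly before using thinp. This is only a sketch: it looks for thin_check on PATH, which may differ from the location lvm is actually configured to use. The helper name have_tool is invented for this example.

```shell
# Succeed if the named executable can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool thin_check; then
    echo "thin_check found at $(command -v thin_check)"
else
    echo "thin_check not found; is device-mapper-persistent-data installed?" >&2
fi
```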
