821384 – Errors reported during reboot of system with thinp LV

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	6.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Zdenek Kabelac
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	773507 840699

Reported:	2012-05-14 10:11 UTC by Nenad Peric
Modified:	2012-12-10 19:38 UTC
CC List:	15 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-12-10 19:38:59 UTC
Target Upstream Version:
Embargoed:

Attachments
vgchange -vvvv -ay (80.43 KB, text/plain) 2012-05-15 08:02 UTC, Nenad Peric
reboot log + vgchange + vgremove (217.12 KB, text/plain) 2012-05-15 08:19 UTC, Nenad Peric

Description Nenad Peric 2012-05-14 10:11:15 UTC

Description of problem:

When a thin LV has been created, errors are reported during reboot.
With a filesystem created on the thin LV and mounted, the errors can sometimes cause the reboot to hang.

Version-Release number of selected component (if applicable):

Kernel 2.6.32-269.el6.x86_64 #1 

lvm2-2.02.95-7.el6.x86_64
lvm2-devel-2.02.95-7.el6.x86_64
lvm2-libs-2.02.95-7.el6.x86_64
lvm2-cluster-2.02.95-7.el6.x86_64
lvm2-debuginfo-2.02.95-7.el6.x86_64

device-mapper-persistent-data-0.1.4-1.el6.x86_64
device-mapper-event-libs-1.02.74-7.el6.x86_64
device-mapper-libs-1.02.74-7.el6.x86_64
device-mapper-devel-1.02.74-7.el6.x86_64
device-mapper-event-devel-1.02.74-7.el6.x86_64
device-mapper-1.02.74-7.el6.x86_64
device-mapper-event-1.02.74-7.el6.x86_64


How reproducible:
Every time

Steps to Reproduce:
1. Create VG, Create thin pool
2. Create thin LV
3. reboot

  vgcreate vgforthin /dev/sda1 /dev/sdb1 /dev/sdc1
  lvcreate -T -L 1G vgforthin/thin_pool
  lvcreate -V1G vgforthin/thin_pool -T -n virtual1

[root@node01:~]$ lvs
  LV        VG        Attr     LSize    Pool      Origin Data%  Move Log Copy%  Convert
  lv_root   VolGroup  -wi-ao--    8.52g                                                
  lv_swap   VolGroup  -wi-ao-- 1008.00m                                                
  thin_pool vgforthin twi-a-tz    1.00g                    0.00                        
  virtual1  vgforthin Vwi-a-tz    1.00g thin_pool          0.00            


  reboot
  
Actual results:

During reboot these errors are reported:

Stopping monitoring for VG VolGroup:   /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073676288: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073733632: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 4096: Input/output error
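
A note on the offsets above, assuming the pool device is exactly 1 GiB (as created with -L 1G): two of the failed reads lie 64 KiB and 8 KiB before the end of the device, and the other two hit its first two 4 KiB blocks. That pattern would be consistent with a device scan probing both ends of the device, although nothing in this report confirms it. A small sketch of the arithmetic (the function name is made up for illustration):

```shell
#!/bin/sh
# Size of the thin_pool LV in bytes, assuming exactly 1 GiB (lvcreate -T -L 1G).
pool_size=$((1024 * 1024 * 1024))

# Print how many bytes before the end of the device a given offset lies.
distance_from_end() {
    echo $(( pool_size - $1 ))
}

# The four offsets taken from the "read failed" messages in this report.
for off in 1073676288 1073733632 0 4096; do
    echo "offset $off: $(distance_from_end "$off") bytes before end of device"
done
```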

Expected results:

Reboot without errors. 

Additional info:

Only one node running, no cluster. 

 locking_type = 1


In the following comments I will try to add the behaviour seen when there is a FS created on top of the LV and mounted.

Comment 1 Nenad Peric 2012-05-14 10:20:30 UTC

Same steps as above, but with filesystems added:

  vgcreate vgforthin /dev/sda1 /dev/sdb1 /dev/sdc1
  lvcreate -T -L 1G vgforthin/thin_pool
  lvcreate -V1G vgforthin/thin_pool -T -n virtual1
  lvcreate -V1G vgforthin/thin_pool -T -n virtual2

  mke2fs /dev/vgforthin/virtual1
  mke2fs -t ext4 /dev/vgforthin/virtual2

  mount /dev/vgforthin/virtual1 /mnt/virtual1/
  mount /dev/vgforthin/virtual2 /mnt/virtual2/
  touch /mnt/virtual1/file
  touch /mnt/virtual2/file
  reboot


Results:

Stopping monitoring for VG VolGroup:   /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073676288: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 1073733632: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/vgforthin-thin_pool: read failed after 0 of 4096 at 4096: Input/output error
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get driver version
Unmounting file systems:  Buffer I/O error on device dm-7, logical block 1
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 73
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 81
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 97
lost page write due to I/O error on dm-7
Buffer I/O error on device dm-7, logical block 0
lost page write due to I/O error on dm-7
Aborting journal on device dm-7-8.
Buffer I/O error on device dm-7, logical block 131072
lost page write due to I/O error on dm-7
JBD2: I/O error detected when updating journal superblock for dm-7-8.
EXT4-fs error (device dm-7): ext4_put_super: Couldn't clean up the journal
EXT4-fs (dm-7): Remounting filesystem read-only
device-mapper: thin: dm_thin_find_block() failed, error = -19
device-mapper: thin: dm_thin_find_block() failed, error = -19
Buffer I/O error on device dm-6, logical block 0
lost page write due to I/O error on dm-6
------------[ cut here ]------------
WARNING: at fs/buffer.c:1161 mark_buffer_dirty+0x82/0xa0() (Tainted: G           ---------------  T)
Hardware name: KVM
Modules linked in: ext2 dm_thin_pool(T) dm_persistent_data dm_bufio libcrc32c nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc sg sd_mod crc_t10dif be2iscsi iscsi_boot_sysfs uio cxgb4 libcxgbi cxgb3 mdio ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi microcode virtio_balloon virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ib_addr]
Pid: 2652, comm: umount Tainted: G           ---------------  T 2.6.32-269.el6.x86_64 #1
Call Trace:
 [<ffffffff8106b6b7>] ? warn_slowpath_common+0x87/0xc0
 [<ffffffff8106b70a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff811ae282>] ? mark_buffer_dirty+0x82/0xa0
 [<ffffffffa04a26d3>] ? ext2_sync_fs+0x43/0xb0 [ext2]
 [<ffffffff811dde8e>] ? sync_quota_sb+0x5e/0x130
 [<ffffffff811aa92a>] ? __sync_filesystem+0x7a/0x90
 [<ffffffff811aab3b>] ? sync_filesystem+0x4b/0x70
 [<ffffffff8117d957>] ? generic_shutdown_super+0x27/0xe0
 [<ffffffff8117da41>] ? kill_block_super+0x31/0x50
 [<ffffffff8117eaf0>] ? deactivate_super+0x70/0x90
 [<ffffffff8119aadf>] ? mntput_no_expire+0xbf/0x110
 [<ffffffff8119b57b>] ? sys_umount+0x7b/0x3a0
 [<ffffffff81082d51>] ? sigprocmask+0x71/0x110
 [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
---[ end trace da3e93115e42f319 ]---
device-mapper: thin: dm_thin_find_block() failed, error = -19
Buffer I/O error on device dm-6, logical block 67
lost page write due to I/O error on dm-6
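
The log above is repetitive, so a per-type tally can make it easier to compare runs. The following is only a sketch: it assumes the console output was saved to a file, and the function name and the choice of patterns (taken from the messages in this comment) are illustrative rather than an exhaustive list.

```shell
#!/bin/sh
# Count occurrences of the error types seen in this report.
# Reads a saved reboot log on stdin and prints "count message", most frequent first.
summarize_errors() {
    grep -E -o 'Huge memory allocation|read failed after|Buffer I/O error on device dm-[0-9]+|dm_thin_find_block\(\) failed' \
        | sort | uniq -c | sort -rn
}

# Usage (the log file path is supplied by the caller): ./tally.sh reboot.log
if [ -n "${1:-}" ]; then
    summarize_errors < "$1"
fi
```

Grouping the buffer I/O errors by device (dm-6 vs. dm-7) keeps the two thin LVs apart in the summary.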


Expected: 

normal reboot

Comment 3 Zdenek Kabelac 2012-05-14 13:32:07 UTC

Can you please attach the -vvvv output from 'vgchange -vvvv -ay' activation after reboot?

Did thin_check pass successfully in this case?

Comment 4 Alasdair Kergon 2012-05-14 17:23:14 UTC

Two separate fixes may be needed.

1) The "Huge memory allocation" messages should not be appearing - the failure condition, whatever it is, should be caught (by thin_check or lvm2) earlier and handled cleanly.

2) If this is caused by an inconsistent metadata state on disk, the (kernel?) code needs fixing to avoid that state occurring.

Comment 5 Nenad Peric 2012-05-15 08:02:48 UTC

Created attachment 584576
vgchange -vvvv -ay

I have attached the log of vgchange -ay with -vvvv

As far as I have seen, the issue seems to be (maybe) in the order of stopping things, since the disks being used are iscsi targets. 

I tested the same on "normal" hardware and no errors occurred during reboot. 

So, during the reboot maybe the lvm monitoring should be stopped after iscsi stopped? 
Or maybe there is some issue with iscsi itself not syncing properly, and what we see in lvm later is an effect rather than the cause?

As for testing, I tried changing the sequence of shutdown events, and when lvm monitoring is stopped before iscsi stops there are no errors. 
Not sure if this is how it should be done though :)
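
To see the stop order being discussed, one can list the kill (K) scripts in the reboot runlevel directory, since SysV init runs them in sorted name order. This is a minimal sketch under assumptions: the directory /etc/rc6.d and the service names lvm2-monitor, iscsi and iscsid are not stated in this report and may differ on a given system, and the function name is made up.

```shell
#!/bin/sh
# Print the K scripts of the given services in the order rc would run them.
# $1 is the runlevel directory; remaining arguments are service names.
kill_order() {
    rcdir=$1
    shift
    for svc in "$@"; do
        for f in "$rcdir"/K[0-9][0-9]"$svc"; do
            # An unmatched glob stays literal, so check that the entry exists.
            if [ -e "$f" ] || [ -L "$f" ]; then
                basename "$f"
            fi
        done
    done | sort
}

# Assumed directory and service names; prints nothing if none are present.
kill_order /etc/rc6.d lvm2-monitor iscsi iscsid
```

A script listed earlier in the output is stopped earlier, which makes it easy to check whether lvm monitoring stops before or after iscsi on the machine in question.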

Comment 6 Nenad Peric 2012-05-15 08:18:19 UTC

An UPDATE:

I just noticed that the second machine I was testing this on did not have device-mapper-persistent-data installed, so I have installed it now and I will add a new attachment as a result. 
Sorry for not noticing it sooner. This package imho should be a part of lvm install since it is an integral part of thinp it seems. 

The second attachment contains a reboot log with iscsi errors and a log of vgchange/vgremove.

Comment 7 Nenad Peric 2012-05-15 08:19:37 UTC

Created attachment 584585
reboot log + vgchange + vgremove

Comment 8 Alasdair Kergon 2012-05-15 11:33:01 UTC

(In reply to comment #6)

> Sorry for not noticing it sooner. This package imho should be a part of lvm
> install since it is an integral part of thinp it seems. 

We did not do that because we do not want a Tech Preview package installed on everyone's machine.  But if it's not installed and lvm can't find the checking binary required to use thinp, it should be issuing an error.

Comment 9 Nenad Peric 2012-05-15 11:41:43 UTC

Yes, that is how I noticed the error after I already uploaded the first log.

Spotting the error when running with -vvvv is a bit complicated, so that is why I missed it. 

But yes, it is issuing an error that it cannot find thin_check.
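
Since the missing-binary error is easy to overlook in -vvvv output, a quick check before testing thinp can help. The sketch below only tests whether thin_check is reachable on PATH; lvm2 may be configured to look for the binary at a specific path instead, so treat a "not found" result as a hint rather than proof. The function name is made up for illustration.

```shell
#!/bin/sh
# Return success if the named binary (default: thin_check) is on PATH.
have_thin_check() {
    command -v "${1:-thin_check}" >/dev/null 2>&1
}

if have_thin_check; then
    echo "thin_check found: $(command -v thin_check)"
else
    # Package name as given in this report.
    echo "thin_check not found; is device-mapper-persistent-data installed?" >&2
fi
```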
