Bug 844493 - Various dm_task_run failures cause many continuing problems in LVM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Peter Rajnoha
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 619574 624148
Reported: 2012-07-30 22:19 UTC by Jonathan Earl Brassow
Modified: 2016-05-11 01:19 UTC (History)

Fixed In Version: lvm2-2.02.140-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 813954
Environment:
Last Closed: 2016-05-11 01:19:39 UTC


Links
System ID: Red Hat Product Errata RHBA-2016:0964
Priority: normal
Status: SHIPPED_LIVE
Summary: lvm2 bug fix and enhancement update
Last Updated: 2016-05-10 22:57:40 UTC

Description Jonathan Earl Brassow 2012-07-30 22:19:15 UTC
+++ This bug was initially created as a clone of Bug #813954 +++

Several bugs that are attributed to mirroring or other LVM components boil down to issues with 'dm_task_run'.  These 'dm_task_run' errors crop up intermittently and make the bugs difficult to reproduce.  Often, the way to reproduce them is to loop over a particular operation until /something/ is hit.

This bug is going to be used to encapsulate the various RHEL5 bugs that have popped up due to 'dm_task_run' failures.  The 'dm_task_run' issues should be solved before these other dependent bugs.

Comment 1 Jonathan Earl Brassow 2012-07-30 22:22:16 UTC
This bug is the RHEL6 equivalent of bug 813954: a bug designed to encapsulate all the issues that revolve around 'dm_task_run' errors.

Again, the 'dm_task_run' issues should be solved before these other dependent bugs.

Comment 3 RHEL Product and Program Management 2012-12-14 08:20:23 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 4 Zdenek Kabelac 2013-05-14 10:11:13 UTC
I believe the current code no longer produces random dm_task_run errors.
So I'm closing this BZ. If they appear again, reopen this BZ with a trace.

Comment 5 Jonathan Earl Brassow 2013-05-29 17:20:05 UTC
This bug is not fixed:
https://bugzilla.redhat.com/show_bug.cgi?id=624148#c27

Comment 6 Zdenek Kabelac 2013-05-30 07:40:48 UTC
This doesn't look like a failure of dm_task_run(); rather, the lvm2 mirror code is doing bad things here. The ioctl is getting a signal to wake up from some kernel event, so this is not really a fault of dm_task_run.

Comment 7 Corey Marthaler 2013-05-30 22:00:40 UTC
These don't just happen during mirror operations though. Lots of lvm operations can cause these messages to show up.


May 30 16:52:54 qalvm-01 qarshd[1908]: Running cmdline: lvremove -f /dev/snapper_thinp/snap1
May 30 16:54:12 qalvm-01 lvm[1149]: Monitoring snapshot snapper_thinp-snap2
May 30 16:54:12 qalvm-01 systemd-udevd[1916]: inotify_add_watch(7, /dev/dm-14, 10) failed: No such file or directory
May 30 16:54:12 qalvm-01 systemd-udevd[1919]: inotify_add_watch(7, /dev/dm-12, 10) failed: No such file or directory
May 30 16:54:12 qalvm-01 qarshd[1953]: Running cmdline: dmsetup ls
May 30 16:54:12 qalvm-01 qarshd[1955]: Running cmdline: ls /dev/snapper_thinp/snap1
May 30 16:54:12 qalvm-01 lvm[1149]: Logical volume snap1 not found in volume group snapper_thinp
May 30 16:54:12 qalvm-01 lvm[1149]: Failed to extend snapshot snapper_thinp-snap1.
May 30 16:54:12 qalvm-01 lvm[1149]: dm_task_run failed, errno = 6, No such device or address
May 30 16:54:12 qalvm-01 lvm[1149]: snapper_thinp-snap1 disappeared, detaching
May 30 16:54:12 qalvm-01 lvm[1149]: No longer monitoring snapshot snapper_thinp-snap1
May 30 16:54:14 qalvm-01 lvm[1149]: No longer monitoring snapshot snapper_thinp-snap2
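
When triaging logs like the one above, it can help to turn the numeric errno into its symbolic name so that different failure modes (for example errno 6 here versus the "Device or resource busy" and "Invalid argument" errors discussed elsewhere in this bug) are not lumped together. The following is a minimal sketch, assuming the message format matches the "dm_task_run failed, errno = N, text" line shown in the log; the function name parse_dm_task_run_failure is hypothetical and not part of lvm2.

```python
import errno
import re

# Assumed message format, taken from the syslog line in this comment.
_DM_FAIL = re.compile(r"dm_task_run failed, errno = (\d+), (.*)$")


def parse_dm_task_run_failure(line):
    """Return (errno number, symbolic name, logged text), or None if no match.

    Hypothetical triage helper; not an lvm2 or libdevmapper function.
    """
    match = _DM_FAIL.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps the platform's errno numbers to their names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, match.group(2).strip()


line = ("May 30 16:54:12 qalvm-01 lvm[1149]: dm_task_run failed, "
        "errno = 6, No such device or address")
print(parse_dm_task_run_failure(line))
```

On Linux, errno 6 is ENXIO, which is consistent with the snapshot device having already been removed when the monitoring thread tried to use it.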

Comment 8 Zdenek Kabelac 2014-11-25 15:45:14 UTC
I believe this is another one related to Bug #1108540 - where dmeventd was incorrectly unmonitoring devices.

This is fixed with lvm2 2.02.112.

Comment 9 Peter Rajnoha 2014-11-26 07:42:27 UTC
(In reply to Zdenek Kabelac from comment #8)
> I believe another one related to Bug #1108540 - where dmeventd was
> incorrectly unmonitoring devices.
> 
> This is fixed with lvm2 2.02.112.

Well, bug #1108540 had a "Device or resource busy" error, while this bug (and bug #813954) has an "Invalid argument" error issued. So it seems a bit different. Anyway, it would be good to see if the other error (Invalid argument) is still reproducible...

Comment 10 Peter Rajnoha 2015-04-14 14:16:07 UTC
(In reply to Peter Rajnoha from comment #9)
> (In reply to Zdenek Kabelac from comment #8)
> > I believe another one related to Bug #1108540 - where dmeventd was
> > incorrectly unmonitoring devices.
> > 
> > This is fixed with lvm2 2.02.112.
> 
> Well, the bug #1108540 had "Device or resource busy" error, while this bug
> (and bug #813954) has "Invalid argument" error issued. So it seems a bit
> different. 

(moving back to NEW as this is not yet resolved - the error described in this report differs)

Comment 11 Marian Csontos 2015-05-05 13:25:16 UTC
Can we insert an Internal Error into dmeventd at the point where it is failing here?

Then QEs will have to run tests with abort_on_internal_errors set and provide the coredump should it happen again.

Comment 12 Peter Rajnoha 2015-10-15 07:19:07 UTC
We don't have a concrete reproducer for now, so marking with "devel cond nak reproducer".

Comment 13 Peter Rajnoha 2015-10-15 07:20:56 UTC
Also, there were lots of fixes in dmeventd - it's probably worth trying the new lvm2 release first once we do the new build for 6.8.

Comment 14 Zdenek Kabelac 2015-10-29 12:48:15 UTC
AFAIK this bug should be closed and reopened when something new appears.

We've fixed a long list of bugs - e.g. dmeventd was interrupting the lvm2 command it was itself processing with SIGALRM, causing unexpected ioctl failures if the timer expired while the command was being processed.

I'm considering this BZ as solved with release 2.02.133.
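
The SIGALRM problem described above can be illustrated generically. The sketch below is not dmeventd code and does not claim to show how lvm2 fixed the issue: blocking_operation is a hypothetical stand-in for a blocking ioctl, and masking the signal is just one common way to keep a timer from interrupting work in progress. Python normally retries interrupted system calls, so the handler raises an exception to make the interruption visible.

```python
import signal
import time


class TimerFired(Exception):
    """Raised by the SIGALRM handler so the interruption is observable."""


def _on_alarm(signum, frame):
    raise TimerFired()


def blocking_operation(duration=0.2):
    # Hypothetical stand-in for a blocking ioctl; not a libdevmapper call.
    time.sleep(duration)
    return "completed"


def run_with_timer(mask_alarm):
    """Arm a 50 ms timer, then run the blocking operation.

    With mask_alarm=True, SIGALRM is blocked while the operation runs and
    is delivered only after the operation has finished.
    Must be called from the main thread.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        if mask_alarm:
            signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
        signal.setitimer(signal.ITIMER_REAL, 0.05)
        try:
            result = blocking_operation()
        except TimerFired:
            result = "interrupted"
        finally:
            signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        if mask_alarm:
            try:
                # The pending SIGALRM is delivered here, after the work is done.
                signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGALRM})
            except TimerFired:
                pass
        return result
    finally:
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    print(run_with_timer(mask_alarm=False))
    print(run_with_timer(mask_alarm=True))
```

The first call shows the failure mode (the timer cuts the operation short); the second shows the same timer firing without disturbing the operation.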

Comment 21 Corey Marthaler 2016-02-22 16:09:27 UTC
Marking verified (SanityOnly) in the latest rpms.

2.6.32-615.el6.x86_64

lvm2-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-libs-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-cluster-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
udev-147-2.71.el6    BUILT: Wed Feb 10 07:07:17 CST 2016
device-mapper-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-libs-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-libs-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-persistent-data-0.6.2-0.1.rc1.el6    BUILT: Wed Feb 10 09:52:15 CST 2016
cmirror-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016

Comment 23 errata-xmlrpc 2016-05-11 01:19:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html

