Bug 844493 - Various dm_task_run failures cause many continuing problems in LVM
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Peter Rajnoha
QA Contact: cluster-qe@redhat.com
Keywords: Reopened
Depends On:
Blocks: 619574 624148
Reported: 2012-07-30 18:19 EDT by Jonathan Earl Brassow
Modified: 2016-05-10 21:19 EDT

See Also:
Fixed In Version: lvm2-2.02.140-1.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 813954
Environment:
Last Closed: 2016-05-10 21:19:39 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Jonathan Earl Brassow 2012-07-30 18:19:15 EDT
+++ This bug was initially created as a clone of Bug #813954 +++

Several bugs that are attributed to mirroring or other LVM components boil down to issues with 'dm_task_run'.  These 'dm_task_run' errors crop up intermittently and make the bugs difficult to reproduce.  Often, the way to reproduce them is to loop over a particular operation until /something/ is hit.
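
The "loop until something is hit" approach can be sketched as a small harness. This is only an illustration: loop_until_failure is a hypothetical helper, and checking the command's own output for the text "dm_task_run failed" is an assumption, since the message may be logged by a daemon to syslog rather than printed by the command being looped.

```python
import subprocess


def loop_until_failure(cmd, max_iterations=1000, marker="dm_task_run failed"):
    """Run cmd repeatedly and stop at the first iteration that looks broken.

    Returns a dict describing the failing iteration, or None if every
    iteration succeeded without printing the marker text.
    """
    for iteration in range(1, max_iterations + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        output = result.stdout + result.stderr
        # Treat a non-zero exit status or the marker text as a hit.
        if result.returncode != 0 or marker in output:
            return {
                "iteration": iteration,
                "returncode": result.returncode,
                "output": output,
            }
    return None
```

The command would be whatever single operation is suspected, wrapped so that one invocation is one iteration, for example loop_until_failure(["./one_iteration.sh"]) with a hypothetical script name.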

This bug is going to be used to encapsulate the various RHEL5 bugs that have popped up due to 'dm_task_run' failures.  The 'dm_task_run' issues should be solved before these other dependent bugs.
Comment 1 Jonathan Earl Brassow 2012-07-30 18:22:16 EDT
This bug is the RHEL6 equivalent to bug 813954 - a bug designed to encapsulate all the issues that revolve around 'dm_task_run' errors.

Again, the 'dm_task_run' issues should be solved before these other dependent bugs.
Comment 3 RHEL Product and Program Management 2012-12-14 03:20:23 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 4 Zdenek Kabelac 2013-05-14 06:11:13 EDT
I believe there are no random dm_task_run errors anymore with the current code.
So I'm closing this BZ. If they appear again, reopen this BZ with a trace.
Comment 5 Jonathan Earl Brassow 2013-05-29 13:20:05 EDT
This bug is not fixed:
https://bugzilla.redhat.com/show_bug.cgi?id=624148#c27
Comment 6 Zdenek Kabelac 2013-05-30 03:40:48 EDT
This doesn't look like a failure of dm_task_run(); rather, the lvm2 mirror code is doing bad things here.  The ioctl is getting a signal to wake up from some kernel event, so this is not really a fault of dm_task_run.
Comment 7 Corey Marthaler 2013-05-30 18:00:40 EDT
These don't just happen during mirror operations though. Lots of lvm operations can cause these messages to show up.


May 30 16:52:54 qalvm-01 qarshd[1908]: Running cmdline: lvremove -f /dev/snapper_thinp/snap1
May 30 16:54:12 qalvm-01 lvm[1149]: Monitoring snapshot snapper_thinp-snap2
May 30 16:54:12 qalvm-01 systemd-udevd[1916]: inotify_add_watch(7, /dev/dm-14, 10) failed: No such file or directory
May 30 16:54:12 qalvm-01 systemd-udevd[1919]: inotify_add_watch(7, /dev/dm-12, 10) failed: No such file or directory
May 30 16:54:12 qalvm-01 qarshd[1953]: Running cmdline: dmsetup ls
May 30 16:54:12 qalvm-01 qarshd[1955]: Running cmdline: ls /dev/snapper_thinp/snap1
May 30 16:54:12 qalvm-01 lvm[1149]: Logical volume snap1 not found in volume group snapper_thinp
May 30 16:54:12 qalvm-01 lvm[1149]: Failed to extend snapshot snapper_thinp-snap1.
May 30 16:54:12 qalvm-01 lvm[1149]: dm_task_run failed, errno = 6, No such device or address
May 30 16:54:12 qalvm-01 lvm[1149]: snapper_thinp-snap1 disappeared, detaching
May 30 16:54:12 qalvm-01 lvm[1149]: No longer monitoring snapshot snapper_thinp-snap1
May 30 16:54:14 qalvm-01 lvm[1149]: No longer monitoring snapshot snapper_thinp-snap2
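
For triage, it can help to pull the dm_task_run failures out of a log like the one above and attach the symbolic errno name. The following is a rough sketch, not an LVM tool: find_dm_task_run_failures and DmTaskRunFailure are hypothetical names, and the pattern assumes the syslog layout shown in this comment. On Linux, errno 6 is ENXIO, 16 is EBUSY and 22 is EINVAL.

```python
import errno
import re
from typing import List, NamedTuple


class DmTaskRunFailure(NamedTuple):
    timestamp: str
    host: str
    code: int      # numeric errno from the log line
    name: str      # symbolic name on the machine running this script
    message: str


# Matches lines such as:
#   May 30 16:54:12 host lvm[1149]: dm_task_run failed, errno = 6, No such device or address
PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+.*?"
    r"dm_task_run failed, errno = (?P<code>\d+), (?P<message>.*)$"
)


def find_dm_task_run_failures(lines) -> List[DmTaskRunFailure]:
    """Return one record per log line that reports a dm_task_run failure."""
    failures = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match is None:
            continue
        code = int(match.group("code"))
        failures.append(
            DmTaskRunFailure(
                timestamp=match.group("timestamp"),
                host=match.group("host"),
                code=code,
                name=errno.errorcode.get(code, "UNKNOWN"),
                message=match.group("message"),
            )
        )
    return failures
```

Grouping the records by code separates reports with different underlying errors that would otherwise all be filed under "dm_task_run failed".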
Comment 8 Zdenek Kabelac 2014-11-25 10:45:14 EST
I believe another one related to Bug #1108540 - where dmeventd was incorrectly unmonitoring devices.

This is fixed with lvm2 2.02.112.
Comment 9 Peter Rajnoha 2014-11-26 02:42:27 EST
(In reply to Zdenek Kabelac from comment #8)
> I believe another one related to Bug #1108540 - where dmeventd was
> incorrectly unmonitoring devices.
> 
> This is fixed with lvm2 2.02.112.

Well, the bug #1108540 had "Device or resource busy" error, while this bug (and bug #813954) has "Invalid argument" error issued. So it seems a bit different. Anyway, would be good to see if the other error (Invalid argument) is still reproducible...
Comment 10 Peter Rajnoha 2015-04-14 10:16:07 EDT
(In reply to Peter Rajnoha from comment #9)
> (In reply to Zdenek Kabelac from comment #8)
> > I believe another one related to Bug #1108540 - where dmeventd was
> > incorrectly unmonitoring devices.
> > 
> > This is fixed with lvm2 2.02.112.
> 
> Well, the bug #1108540 had "Device or resource busy" error, while this bug
> (and bug #813954) has "Invalid argument" error issued. So it seems a bit
> different. 

(moving back to NEW as this is not yet resolved - the error described in this report differs)
Comment 11 Marian Csontos 2015-05-05 09:25:16 EDT
Can we insert an Internal Error into the dmeventd code that is failing here?

Then QEs will have to run tests with abort_on_internal_errors set and provide the coredump should it happen again.
Comment 12 Peter Rajnoha 2015-10-15 03:19:07 EDT
We don't have a concrete reproducer yet, so marking with "devel cond nak reproducer" for now.
Comment 13 Peter Rajnoha 2015-10-15 03:20:56 EDT
Also, there were lots of fixes in dmeventd - it's probably worth trying the new lvm2 release first once we do the new build for 6.8.
Comment 14 Zdenek Kabelac 2015-10-29 08:48:15 EDT
AFAIK this bug should be closed and reopened when something new appears.

We've fixed a large list of bugs. For example, dmeventd was interrupting the lvm2 command it was processing with SIGALRM, causing unexpected ioctl failures if the timer expired while the command was still being processed.

I'm considering this BZ as solved with release 2.02.133.
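
The SIGALRM effect described in this comment can be shown in miniature. This is a sketch under assumptions, not lvm2 or dmeventd code: it assumes a Unix-like system, uses the C library's usleep() as a stand-in for the device-mapper ioctl, and sleep_with_timer is a hypothetical helper. Blocking the signal around the call is shown as one generic mitigation; it is not claimed to be how the lvm2 fix was done.

```python
import ctypes
import signal

# Load the C library of the running process so usleep() can be called directly.
libc = ctypes.CDLL(None, use_errno=True)


def sleep_with_timer(sleep_us, timer_s, block_signal):
    """Call usleep(sleep_us) while a SIGALRM timer of timer_s seconds is armed.

    Returns (return_code, errno_value) as reported by the C call.
    """
    # A no-op handler: the point is only that the signal gets delivered.
    old_handler = signal.signal(signal.SIGALRM, lambda signum, frame: None)
    try:
        if block_signal:
            # Keep SIGALRM pending until the stand-in "command" has finished.
            signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
        signal.setitimer(signal.ITIMER_REAL, timer_s)
        ctypes.set_errno(0)
        rc = libc.usleep(sleep_us)
        return rc, ctypes.get_errno()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        if block_signal:
            # Unblock while our handler is still installed, then restore.
            signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGALRM})
        signal.signal(signal.SIGALRM, old_handler)
```

With the timer set shorter than the sleep, the unblocked call is expected to come back early with EINTR, while the blocked variant runs to completion and the signal is delivered afterwards. A blocking ioctl interrupted the same way fails even though nothing is wrong with the device.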
Comment 21 Corey Marthaler 2016-02-22 11:09:27 EST
Marking verified (SanityOnly) in the latest rpms.

2.6.32-615.el6.x86_64

lvm2-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-libs-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
lvm2-cluster-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
udev-147-2.71.el6    BUILT: Wed Feb 10 07:07:17 CST 2016
device-mapper-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-libs-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-event-libs-1.02.115-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
device-mapper-persistent-data-0.6.2-0.1.rc1.el6    BUILT: Wed Feb 10 09:52:15 CST 2016
cmirror-2.02.141-2.el6    BUILT: Wed Feb 10 07:49:03 CST 2016
Comment 23 errata-xmlrpc 2016-05-10 21:19:39 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0964.html
