Bug 624148 - 'dm_task_run failed' when attempting to deactivate cmirror after successful device failure iteration [NEEDINFO]
'dm_task_run failed' when attempting to deactivate cmirror after successful d...
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2 (Show other bugs)
6.0
All Linux
low Severity low
: rc
: ---
Assigned To: Alasdair Kergon
Corey Marthaler
:
Depends On: 844493
Blocks: 960054 756082 1268411
  Show dependency treegraph
 
Reported: 2010-08-13 16:11 EDT by Corey Marthaler
Modified: 2016-04-07 12:02 EDT (History)
11 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-02 10:57:43 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
agk: needinfo? (cmarthal)


Attachments (Terms of Use)
Logs from 3 machines that were running tests when problem occurred (184.83 KB, application/octet-stream)
2015-03-17 18:06 EDT, Jonathan Earl Brassow
no flags Details

  None (edit)
Description Corey Marthaler 2010-08-13 16:11:23 EDT
Description of problem:
From helter_skelter output:

./helter_skelter -l /home/msp/cmarthal/work/rhel6/sts-root -r /usr/tests/sts-rhel6.0/ -R ../../var/share/resource_files/taft.xml -e kill_primary_synced_2_legs -i 3

[...] # all the failure stuff worked

Stopping the io load (collie/xdoio) on mirror(s)
Unmounting gfs and removing mnt point on taft-01...
Unmounting gfs and removing mnt point on taft-02...
Unmounting gfs and removing mnt point on taft-03...
Unmounting gfs and removing mnt point on taft-04...
Deactivating and removing mirror(s)
  Error locking on node taft-01: Unable to deactivate open helter_skelter-syncd_primary_2legs_1_mlog (253:3)
couldn't deactivate mirror syncd_primary_2legs_1

From log:
Aug 13 20:00:59 taft-01 qarshd[9870]: Running cmdline: lvchange -an /dev/helter_skelter/syncd_primary_2legs_1
Aug 13 20:00:59 taft-01 lvm[2371]: dm_task_run failed, errno = 22, Invalid argument
Aug 13 20:00:59 taft-01 lvm[2371]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_1 for events.


Version-Release number of selected component (if applicable):
2.6.32-59.1.el6.x86_64

lvm2-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-libs-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-cluster-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
udev-147-2.22.el6    BUILT: Fri Jul 23 07:21:33 CDT 2010
device-mapper-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-libs-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-libs-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
cmirror-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
Comment 1 Corey Marthaler 2010-08-16 12:35:13 EDT
This is reproducible.

Unmounting gfs and removing mnt point on taft-01...
Unmounting gfs and removing mnt point on taft-02...
Unmounting gfs and removing mnt point on taft-03...
Unmounting gfs and removing mnt point on taft-04...
Deactivating and removing mirror(s)
  Error locking on node taft-01: Unable to deactivate open helter_skelter-nonsyncd_log_4legs_1_mlog (253:3)
couldn't deactivate mirror nonsyncd_log_4legs_1

Aug 14 00:30:53 taft-01 lvm[2311]: dm_task_run failed, errno = 22, Invalid argument
Aug 14 00:30:53 taft-01 lvm[2311]: No longer monitoring mirror device helter_skelter-nonsyncd_log_4legs_1 for events.


Version-Release number of selected component (if applicable):
2.6.32-59.1.el6.x86_64

lvm2-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-libs-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-cluster-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
udev-147-2.22.el6    BUILT: Fri Jul 23 07:21:33 CDT 2010
device-mapper-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-libs-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-libs-1.02.53-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
cmirror-2.02.72-7.el6    BUILT: Wed Aug 11 17:12:24 CDT 2010
Comment 3 Peter Rajnoha 2010-09-09 07:06:14 EDT
(In reply to comment #1)
> This is reproducible.
> Deactivating and removing mirror(s)
>   Error locking on node taft-01: Unable to deactivate open
> helter_skelter-nonsyncd_log_4legs_1_mlog (253:3)

Well, at first, does this happen with udevd stopped?
Comment 5 Corey Marthaler 2011-02-10 16:28:04 EST
I just repo'ed this with the latest 6.1 lvm rpms. 

Feb 10 15:22:17 taft-03 lvm[1235]: dm_task_run failed, errno = 22, Invalid argument
Feb 10 15:22:17 taft-03 lvm[1235]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_1 for events.
Feb 10 15:22:17 taft-03 lvm[1235]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_2 for events.

2.6.32-94.el6.x86_64

lvm2-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-libs-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
lvm2-cluster-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
udev-147-2.31.el6    BUILT: Wed Jan 26 05:39:15 CST 2011
device-mapper-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
device-mapper-event-libs-1.02.62-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
cmirror-2.02.83-2.el6    BUILT: Tue Feb  8 10:10:57 CST 2011
Comment 6 Corey Marthaler 2011-02-10 16:34:35 EST
I'm not sure how to attempt this with udev stopped since nothing works w/ out it. I can't even get a mirror created with that process killed.
Comment 11 Jonathan Earl Brassow 2011-06-03 13:14:22 EDT
Ok, we'll try to reproduce this again and I'll see if prajnoha can help me figure it out.

Also, is this bug limited to 'after successful device failure iteration'?
Comment 12 Peter Rajnoha 2011-06-06 05:44:40 EDT
Corey, just to make sure - udisks is *not* installed, right?
Comment 13 Corey Marthaler 2011-06-06 15:02:02 EDT
Peter, if by "udisks" you mean udev, then yes. See comment #6, udev *is* involved. 

If you mean something else by "udisks", then I don't think udisks are involved.

[root@hayes-01 ~]# rpm -q udisks
package udisks is not installed
Comment 14 Peter Rajnoha 2011-06-07 04:26:49 EDT
Yes, I meant udisks package - this one contains 80-udisks.rules with the OPTIONS+="watch" set for DM devices. So that minimizes a chance that this is a udev interaction problem now.
Comment 15 Corey Marthaler 2011-09-16 12:40:27 EDT
I'm still seeing the dm_task_failed messages in the syslog during device failure testing. However, I didn't see the deactivation failure due to the locking errors.

[root@taft-01 ~]# grep dm_task_run /var/log/messages | grep failed
Sep 15 16:42:38 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument
Sep 15 16:45:49 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument
Sep 15 16:49:46 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument
Sep 15 16:57:09 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument
Sep 15 17:03:51 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument
Sep 15 17:07:16 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument

2.6.32-195.el6.x86_64

lvm2-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-libs-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
lvm2-cluster-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
udev-147-2.38.el6    BUILT: Fri Sep  9 16:25:50 CDT 2011
device-mapper-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
device-mapper-event-libs-1.02.66-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
cmirror-2.02.87-2.1.el6    BUILT: Wed Sep 14 09:44:16 CDT 2011
Comment 25 Jonathan Earl Brassow 2013-05-15 16:56:33 EDT
Corey, can you test to see if this bug is still present?

Bug 844493 has been closed - asserting that there are no more 'dm_task_run failed' errors.  I'm dubious.  Since those errors are what were causing this bug to trigger, this bug is probably a good place to ensure it is fixed.
Comment 26 Jonathan Earl Brassow 2013-05-15 16:59:38 EDT
moving to 6.6.  I will not prioritize this bug until it is reproduced and found that bug 844493 is not the cause.  If bug 844493 is still occurring, then it should be reopened and /that/ bug should be prioritized.
Comment 27 Corey Marthaler 2013-05-17 15:30:02 EDT
These still exist while doing failure testing.

TAFT-02:
May 17 13:41:28 taft-02 lvm[14338]: Mirror status: 1 of 2 images failed.
May 17 13:41:28 taft-02 lvm[14338]: Repair of mirrored device helter_skelter-syncd_primary_2legs_3 finished successfully.
May 17 13:41:28 taft-02 lvm[14338]: Primary mirror device 253:4 has failed (D).
May 17 13:41:28 taft-02 lvm[14338]: Device failure in helter_skelter-syncd_primary_2legs_1.
May 17 13:41:30 taft-02 lvm[14338]: device-mapper: waitevent ioctl on  failed: Interrupted system call
May 17 13:41:30 taft-02 lvm[14338]: dm_task_run failed, errno = 22, Invalid argument


TAFT-04:
May 17 13:41:35 taft-04 lvm[31955]: Repair of mirrored device helter_skelter-syncd_primary_2legs_3 finished successfully.
May 17 13:41:35 taft-04 lvm[31955]: Primary mirror device 253:4 read failed.
May 17 13:41:35 taft-04 lvm[31955]: Primary mirror device 253:8 read failed.
May 17 13:41:35 taft-04 lvm[31955]: Primary mirror device 253:4 has failed (D).
May 17 13:41:35 taft-04 lvm[31955]: Device failure in helter_skelter-syncd_primary_2legs_1.
May 17 13:41:37 taft-04 lvm[31955]: device-mapper: waitevent ioctl on  failed: Interrupted system call
May 17 13:41:37 taft-04 lvm[31955]: dm_task_run failed, errno = 22, Invalid argument
May 17 13:41:37 taft-04 lvm[31955]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_3 for events.
May 17 13:41:37 taft-04 lvm[31955]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_2 for events.
May 17 13:41:40 taft-04 lvm[31955]: Couldn't find device with uuid eeDBek-f0JX-Rb4R-0RgM-9CeD-PGbM-TMvHzN.
May 17 13:41:40 taft-04 lvm[31955]: Repair of mirrored device helter_skelter-syncd_primary_2legs_1 finished successfully.
May 17 13:41:42 taft-04 lvm[31955]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_1 for events.



2.6.32-354.el6.x86_64
lvm2-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-libs-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
lvm2-cluster-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
udev-147-2.46.el6    BUILT: Tue Jan 15 06:41:59 CST 2013
device-mapper-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
device-mapper-event-libs-1.02.77-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
cmirror-2.02.98-9.el6    BUILT: Wed Jan 23 10:06:55 CST 2013
Comment 29 Jonathan Earl Brassow 2014-08-26 11:56:49 EDT
Ok, so the messages are still there, but the original problem is not (mirror unable to deactivate)?

I will have to push this out to 6.7 for a possible fix.

I am also seeing more of these "device-mapper: waitevent ioctl on  failed: Interrupted system call" messages in bug 1127231, which is another failure testing scenario involving RAID.  I'm not yet sure why those are happening, but they seem to accompany these issues - not cause them.
Comment 34 Jonathan Earl Brassow 2015-02-25 18:53:48 EST
I still see the messages:
Feb 25 18:39:32 bp-01 lvm[9299]: device-mapper: waitevent ioctl on  failed: Interrupted system call
Feb 25 18:39:32 bp-01 lvm[9299]: dm_task_run failed, errno = 22, Invalid argument

However, they have no ill effect.  I am unclear what is causing them, but I'm setting priority and severity to "low".  Likely, the cause won't be discovered and fixed in 6.7.
Comment 35 Jonathan Earl Brassow 2015-03-02 10:20:15 EST
need verbose logs from:
- dmeventd
- and other LVM commands
Comment 36 Jonathan Earl Brassow 2015-03-17 18:06:18 EDT
Created attachment 1002962 [details]
Logs from 3 machines that were running tests when problem occurred

This tarball contains the logs from three machines that were performing the tests at the time the message was printed.  The error only seemed to be produced from bp-02, but the other logs are provided for completeness.
Comment 37 Alasdair Kergon 2015-04-29 12:15:04 EDT
Mar 17 12:28:06 bp-02 lvm[33921]: device-mapper: waitevent ioctl on  failed: Interrupted system call
Mar 17 12:28:06 bp-02 lvm[33921]: dm_task_run failed, errno = 22, Invalid argument


Could we try to reproduce this with debug logging enabled in dmeventd to get a better idea of the sequence of events that might be occurring?  Add -ddd (making sure syslog captures it)
Comment 38 Alasdair Kergon 2015-05-18 15:08:39 EDT
(We're wondering whether the original errno gets overwritten before the caller reads it, so we'll change the code to store and return it explicitly.)
Comment 40 Alasdair Kergon 2015-10-16 14:29:01 EDT
The errno code has been improved (upstream) since the last reports of this here, so once we have 6.8 builds, we need to see whether the problem is still there or not, and if it is, obtain the updated diagnostics that the new code provides.
Comment 44 Jonathan Earl Brassow 2016-03-02 10:57:43 EST
This bug no longer exists in its original harmful form.  We had seen harmless error messages that we were looking to address with this bug, but we haven't seen them lately either.  closing this bug...

Note You need to log in before you can comment on or make changes to this bug.