Bug 624148
Summary: | 'dm_task_run failed' when attempting to deactivate cmirror after successful device failure iteration | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal> | ||||
Component: | lvm2 | Assignee: | Alasdair Kergon <agk> | ||||
lvm2 sub component: | Clustered Mirror / cmirrord (RHEL6) | QA Contact: | Corey Marthaler <cmarthal> | ||||
Status: | CLOSED WONTFIX | Docs Contact: | |||||
Severity: | low | ||||||
Priority: | low | CC: | agk, cmarthal, coughlan, dwysocha, heinzm, jbrassow, joe.thornber, msnitzer, prajnoha, prockai, zkabelac | ||||
Version: | 6.0 | ||||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2016-03-02 15:57:43 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 844493 | ||||||
Bug Blocks: | 756082, 960054, 1268411 | ||||||
Attachments: |
|
Description
Corey Marthaler
2010-08-13 20:11:23 UTC
This is reproducible. Unmounting gfs and removing mnt point on taft-01... Unmounting gfs and removing mnt point on taft-02... Unmounting gfs and removing mnt point on taft-03... Unmounting gfs and removing mnt point on taft-04... Deactivating and removing mirror(s) Error locking on node taft-01: Unable to deactivate open helter_skelter-nonsyncd_log_4legs_1_mlog (253:3) couldn't deactivate mirror nonsyncd_log_4legs_1 Aug 14 00:30:53 taft-01 lvm[2311]: dm_task_run failed, errno = 22, Invalid argument Aug 14 00:30:53 taft-01 lvm[2311]: No longer monitoring mirror device helter_skelter-nonsyncd_log_4legs_1 for events. Version-Release number of selected component (if applicable): 2.6.32-59.1.el6.x86_64 lvm2-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 lvm2-libs-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 lvm2-cluster-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 udev-147-2.22.el6 BUILT: Fri Jul 23 07:21:33 CDT 2010 device-mapper-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 device-mapper-libs-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 device-mapper-event-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 device-mapper-event-libs-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 cmirror-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010 (In reply to comment #1) > This is reproducible. > Deactivating and removing mirror(s) > Error locking on node taft-01: Unable to deactivate open > helter_skelter-nonsyncd_log_4legs_1_mlog (253:3) Well, at first, does this happen with udevd stopped? I just repo'ed this with the latest 6.1 lvm rpms. Feb 10 15:22:17 taft-03 lvm[1235]: dm_task_run failed, errno = 22, Invalid argument Feb 10 15:22:17 taft-03 lvm[1235]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_1 for events. Feb 10 15:22:17 taft-03 lvm[1235]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_2 for events. 2.6.32-94.el6.x86_64 lvm2-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 lvm2-libs-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 lvm2-cluster-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 udev-147-2.31.el6 BUILT: Wed Jan 26 05:39:15 CST 2011 device-mapper-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 device-mapper-libs-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 device-mapper-event-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 device-mapper-event-libs-1.02.62-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 cmirror-2.02.83-2.el6 BUILT: Tue Feb 8 10:10:57 CST 2011 I'm not sure how to attempt this with udev stopped since nothing works w/ out it. I can't even get a mirror created with that process killed. Ok, we'll try to reproduce this again and I'll see if prajnoha can help me figure it out. Also, is this bug limited to 'after successful device failure iteration'? Corey, just to make sure - udisks is *not* installed, right? Peter, if by "udisks" you mean udev, then yes. See comment #6, udev *is* involved. If you mean something else by "udisks", then I don't think udisks are involved. [root@hayes-01 ~]# rpm -q udisks package udisks is not installed Yes, I meant udisks package - this one contains 80-udisks.rules with the OPTIONS+="watch" set for DM devices. So that minimizes a chance that this is a udev interaction problem now. I'm still seeing the dm_task_failed messages in the syslog during device failure testing. However, I didn't see the deactivation failure due to the locking errors. [root@taft-01 ~]# grep dm_task_run /var/log/messages | grep failed Sep 15 16:42:38 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument Sep 15 16:45:49 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument Sep 15 16:49:46 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument Sep 15 16:57:09 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument Sep 15 17:03:51 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument Sep 15 17:07:16 taft-01 lvm[3083]: dm_task_run failed, errno = 22, Invalid argument 2.6.32-195.el6.x86_64 lvm2-2.02.87-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 lvm2-libs-2.02.87-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 lvm2-cluster-2.02.87-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 udev-147-2.38.el6 BUILT: Fri Sep 9 16:25:50 CDT 2011 device-mapper-1.02.66-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 device-mapper-libs-1.02.66-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 device-mapper-event-1.02.66-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 device-mapper-event-libs-1.02.66-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 cmirror-2.02.87-2.1.el6 BUILT: Wed Sep 14 09:44:16 CDT 2011 Corey, can you test to see if this bug is still present? Bug 844493 has been closed - asserting that there are no more 'dm_task_run failed' errors. I'm dubious. Since those errors are what were causing this bug to trigger, this bug is probably a good place to ensure it is fixed. moving to 6.6. I will not prioritize this bug until it is reproduced and found that bug 844493 is not the cause. If bug 844493 is still occurring, then it should be reopened and /that/ bug should be prioritized. These still exist while doing failure testing. TAFT-02: May 17 13:41:28 taft-02 lvm[14338]: Mirror status: 1 of 2 images failed. May 17 13:41:28 taft-02 lvm[14338]: Repair of mirrored device helter_skelter-syncd_primary_2legs_3 finished successfully. May 17 13:41:28 taft-02 lvm[14338]: Primary mirror device 253:4 has failed (D). May 17 13:41:28 taft-02 lvm[14338]: Device failure in helter_skelter-syncd_primary_2legs_1. May 17 13:41:30 taft-02 lvm[14338]: device-mapper: waitevent ioctl on failed: Interrupted system call May 17 13:41:30 taft-02 lvm[14338]: dm_task_run failed, errno = 22, Invalid argument TAFT-04: May 17 13:41:35 taft-04 lvm[31955]: Repair of mirrored device helter_skelter-syncd_primary_2legs_3 finished successfully. May 17 13:41:35 taft-04 lvm[31955]: Primary mirror device 253:4 read failed. May 17 13:41:35 taft-04 lvm[31955]: Primary mirror device 253:8 read failed. May 17 13:41:35 taft-04 lvm[31955]: Primary mirror device 253:4 has failed (D). May 17 13:41:35 taft-04 lvm[31955]: Device failure in helter_skelter-syncd_primary_2legs_1. May 17 13:41:37 taft-04 lvm[31955]: device-mapper: waitevent ioctl on failed: Interrupted system call May 17 13:41:37 taft-04 lvm[31955]: dm_task_run failed, errno = 22, Invalid argument May 17 13:41:37 taft-04 lvm[31955]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_3 for events. May 17 13:41:37 taft-04 lvm[31955]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_2 for events. May 17 13:41:40 taft-04 lvm[31955]: Couldn't find device with uuid eeDBek-f0JX-Rb4R-0RgM-9CeD-PGbM-TMvHzN. May 17 13:41:40 taft-04 lvm[31955]: Repair of mirrored device helter_skelter-syncd_primary_2legs_1 finished successfully. May 17 13:41:42 taft-04 lvm[31955]: No longer monitoring mirror device helter_skelter-syncd_primary_2legs_1 for events. 2.6.32-354.el6.x86_64 lvm2-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 lvm2-libs-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 lvm2-cluster-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 udev-147-2.46.el6 BUILT: Tue Jan 15 06:41:59 CST 2013 device-mapper-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 device-mapper-libs-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 device-mapper-event-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 device-mapper-event-libs-1.02.77-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 cmirror-2.02.98-9.el6 BUILT: Wed Jan 23 10:06:55 CST 2013 Ok, so the messages are still there, but the original problem is not (mirror unable to deactivate)? I will have to push this out to 6.7 for a possible fix. I am also seeing more of these "device-mapper: waitevent ioctl on failed: Interrupted system call" messages in bug 1127231, which is another failure testing scenario involving RAID. I'm not yet sure why those are happening, but they seem to accompany these issues - not cause them. I still see the messages: Feb 25 18:39:32 bp-01 lvm[9299]: device-mapper: waitevent ioctl on failed: Interrupted system call Feb 25 18:39:32 bp-01 lvm[9299]: dm_task_run failed, errno = 22, Invalid argument However, they have no ill effect. I am unclear what is causing them, but I'm setting priority and severity to "low". Likely, the cause won't be discovered and fixed in 6.7. need verbose logs from: - dmeventd - and other LVM commands Created attachment 1002962 [details]
Logs from 3 machines that were running tests when problem occurred
This tarball contains the logs from three machines that were performing the tests at the time the message was printed. The error only seemed to be produced from bp-02, but the other logs are provided for completeness.
Mar 17 12:28:06 bp-02 lvm[33921]: device-mapper: waitevent ioctl on failed: Interrupted system call Mar 17 12:28:06 bp-02 lvm[33921]: dm_task_run failed, errno = 22, Invalid argument Could we try to reproduce this with debug logging enabled in dmeventd to get a better idea of the sequence of events that might be occurring? Add -ddd (making sure syslog captures it) (We're wondering whether the original errno gets overwritten before the caller reads it, so we'll change the code to store and return it explicitly.) The errno code has been improved (upstream) since the last reports of this here, so once we have 6.8 builds, we need to see whether the problem is still there or not, and if it is, obtain the updated diagnostics that the new code provides. This bug no longer exists in its original harmful form. We had seen harmless error messages that we were looking to address with this bug, but we haven't seen them lately either. closing this bug... |