Bug 543298 - dmeventd doesn't work
Summary: dmeventd doesn't work
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2
Version: 5.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Milan Broz
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 502927
Reported: 2009-12-02 04:49 UTC by Mikuláš Patočka
Modified: 2013-03-01 04:07 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-18 12:50:54 UTC
Target Upstream Version:
Embargoed:


Attachments
strace -f of dmeventd (787.70 KB, text/plain)
2009-12-02 17:11 UTC, Mikuláš Patočka
syslog of dmeventd (6.92 KB, text/plain)
2009-12-02 17:16 UTC, Mikuláš Patočka

Description Mikuláš Patočka 2009-12-02 04:49:48 UTC
On a RHEL 5.5 system, I created 4 loopback devices, created 4 dm-linear devices that map the whole loopback devices, and created a volume group on top of them.

I created a mirror in the volume group and activated it.

I reloaded the primary leg with the dm-error target.

I tried to read from and write to the mirror. The reads and writes were redirected to the secondary leg, but dmeventd did nothing (it is supposed to remove the failing leg). lvconvert --repair did nothing either: it claimed "The mirror is consistent, nothing to repair.", even though the status line clearly shows the 'D' (write error) status.

See this:

# dmsetup table
loop4: 0 131072 linear 7:4 0
vg2-test_long_mimage_1: 0 65536 linear 253:1 128
loop3: 0 131072 error
vg2-test_long_mimage_0: 0 65536 linear 253:2 128
vg2-test_long_mlog: 0 4096 linear 253:0 20864
loop2: 0 131072 linear 7:2 0
vg2-test_long: 0 65536 mirror disk 3 253:4 1024 block_on_error 2 253:5 0 253:6 0
loop1: 0 131072 linear 7:1 0
# dmsetup status
loop4: 0 131072 linear
vg2-test_long_mimage_1: 0 65536 linear
loop3: 0 131072 error
vg2-test_long_mimage_0: 0 65536 linear
vg2-test_long_mlog: 0 4096 linear
loop2: 0 131072 linear
vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A
loop1: 0 131072 linear
# lvconvert --repair vg2/test_long
  /dev/mapper/loop3: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/vg2-test_long_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  The mirror is consistent, nothing to repair.
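To make the 'D' in the status line above concrete, here is a minimal Python sketch that parses a dm-mirror `dmsetup status` line and lists the legs whose health character is not `A`. It assumes the field layout visible in this report (leg count, leg devices, synced/total regions, a single health field, then the log arguments) and the health characters used by the kernel dm-raid1 target; `parse_mirror_status` and `MirrorStatus` are hypothetical names, not part of LVM2 or device-mapper.

```python
from dataclasses import dataclass

# Per-leg health characters reported by the kernel dm-raid1 (mirror) target.
HEALTH = {
    "A": "alive",
    "F": "flush failure",
    "D": "write failure",
    "S": "sync failure",
    "R": "read failure",
    "U": "unclassified failure",
}


@dataclass
class MirrorStatus:
    name: str
    legs: list[str]      # leg devices as major:minor
    synced: int          # regions in sync
    total: int           # total regions
    health: str          # one health character per leg
    log_args: list[str]  # e.g. ["disk", "253:4", "A"]

    def failed_legs(self) -> list[tuple[str, str]]:
        """Return (device, reason) for every leg whose health is not 'A'."""
        return [(dev, HEALTH.get(ch, "unknown"))
                for dev, ch in zip(self.legs, self.health) if ch != "A"]


def parse_mirror_status(line: str) -> MirrorStatus:
    """Parse one 'dmsetup status' line of a mirror target (assumed layout)."""
    name, rest = line.split(":", 1)  # only the first colon separates the name
    fields = rest.split()
    if len(fields) < 4 or fields[2] != "mirror":
        raise ValueError(f"{name}: not a mirror status line")
    nr_legs = int(fields[3])
    legs = fields[4:4 + nr_legs]
    pos = 4 + nr_legs
    synced, total = (int(x) for x in fields[pos].split("/"))
    if fields[pos + 1] != "1":
        raise ValueError(f"{name}: expected a single health field")
    health = fields[pos + 2]
    if len(health) != nr_legs:
        raise ValueError(f"{name}: health string does not match leg count")
    nr_log = int(fields[pos + 3])
    log_args = fields[pos + 4:pos + 4 + nr_log]
    return MirrorStatus(name, legs, synced, total, health, log_args)


status = parse_mirror_status(
    "vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A")
print(status.failed_legs())  # [('253:5', 'write failure')]
```

Under these assumptions the sketch flags 253:5, the first leg, as having a write failure while 253:6 and the disk log (253:4) remain alive.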

Comment 1 Milan Broz 2009-12-02 11:27:56 UTC
rpm -q lvm2 lvm2-cluster device-mapper ?

Comment 2 Mikuláš Patočka 2009-12-02 16:32:34 UTC
package lvm2-cluster is not installed
device-mapper-1.02.32-1.el5
device-mapper-1.02.32-1.el5

(I upgraded it with yum just before reporting, and the upgraded version doesn't work either.)

Comment 3 Mikuláš Patočka 2009-12-02 16:33:25 UTC
lvm2 version: lvm2-2.02.46-8.el5_4.2

Comment 4 Mikuláš Patočka 2009-12-02 17:11:09 UTC
Created attachment 375498 [details]
strace -f of dmeventd

This is strace -f of dmeventd.

Notice the repeated select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout) [ that's where nothing was happening ],
then <... ioctl resumed> , 0x1c5703c0) = 0 [ the event happened: I replaced the primary leg with dm-error and wrote to the mirror ]
... then a long list of calls, as dmeventd scans the devices
... and then it goes back to sleep with ioctl(7, DM_DEV_WAIT <unfinished ...>
without fixing the mirror
... and then the repeated selects again ...

Comment 5 Mikuláš Patočka 2009-12-02 17:16:17 UTC
Created attachment 375500 [details]
syslog of dmeventd

Syslog. At 18:00:22 the device was activated. At 18:00:48 I simulated the error. Dmeventd scans all the devices and does nothing.

Comment 6 Milan Broz 2009-12-02 17:23:57 UTC
> device-mapper-1.02.32-1.el5
> lvm2-2.02.46-8.el5_4.2  

Please retest with lvm2 2.02.56 and dm 1.02.29 (I sent information about the testing packages to the lvm-team list).

dmeventd mirror handling has changed completely (the --repair change).

Comment 7 Mikuláš Patočka 2009-12-02 18:18:31 UTC
I upgraded to device-mapper-event-1.02.39-1.el5.x86_64.rpm, device-mapper-1.02.39-1.el5.x86_64.rpm, lvm2-2.02.56-1.el5.x86_64.rpm and it still doesn't work.

The log message differs:
Dec  2 19:19:37 schizoid lvm[30078]: Monitoring mirror device vg2-test_long for events
Dec  2 19:19:37 schizoid lvm[30078]: vg2-test_long is now in-sync
Dec  2 19:20:24 schizoid lvm[30078]: Mirror device, 253:5, has failed.
Dec  2 19:20:24 schizoid lvm[30078]: Device failure in vg2-test_long
Dec  2 19:20:24 schizoid lvm[30078]: Failed to remove faulty devices in vg2-test

Comment 8 Mikuláš Patočka 2009-12-02 18:20:00 UTC
Manual execution of lvconvert --repair vg2/test_long doesn't work either:
/dev/mapper/loop3: read failed after 0 of 4096 at 0: Input/output error
/dev/mapper/vg2-test_long_mimage_0: read failed after 0 of 4096 at 0: Input/output error
The mirror is consistent, nothing to repair.

even though "dmsetup status" shows there is an error:
vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A

Comment 9 Milan Broz 2009-12-04 17:13:21 UTC
Just FYI - if you have this mapping (0 offset):

  loop4: 0 131072 linear 7:4 0

the system will also find the failed PV on the underlying device (/dev/loop4), so this will not work anyway. Just add some offset to the mapping so that the PV signature is not visible at its expected place on the /dev/loop device.

That's probably why it says "The mirror is consistent, nothing to repair."

(But it still fails even with this change; this time it is probably a real bug in the code ;-)
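The offset advice can be illustrated with a small simulation. This Python sketch assumes 512-byte sectors, a PV label starting with `LABELONE` in sector 1 of the dm device (the LVM2 default), and that LVM2 looks for the label only in the first four sectors of each device it scans. The byte arrays merely stand in for the loop device and the dm-linear device; the function names are hypothetical.

```python
SECTOR = 512
LABEL = b"LABELONE"     # LVM2 physical volume label identifier
LABEL_SCAN_SECTORS = 4  # assumption: LVM2 looks for the label in sectors 0..3


def backing_with_pv(size_sectors: int, linear_offset: int) -> bytearray:
    """Simulate a loop device mapped by 'linear <dev> <linear_offset>' on which
    a PV was created through the dm device, with the label in dm sector 1."""
    disk = bytearray(size_sectors * SECTOR)
    pos = (linear_offset + 1) * SECTOR  # dm sector 1 lands at backing sector offset+1
    disk[pos:pos + len(LABEL)] = LABEL
    return disk


def dm_view(backing: bytes, linear_offset: int) -> bytes:
    """What the dm-linear device exposes: the backing device shifted by the offset."""
    return bytes(backing[linear_offset * SECTOR:])


def label_visible(dev: bytes) -> bool:
    """True if a label starts at the beginning of any of the first four sectors."""
    return any(dev[s * SECTOR:s * SECTOR + len(LABEL)] == LABEL
               for s in range(LABEL_SCAN_SECTORS))


for offset in (0, 128):
    loop = backing_with_pv(256, offset)
    print(offset, label_visible(dm_view(loop, offset)), label_visible(loop))
# 0 True True      -> the same PV shows up on both the dm device and the loop device
# 128 True False   -> only the dm device carries a visible label
```

With offset 0 both devices present an identical PV, so LVM2 may pick the raw loop device and never see the dm-error replacement. Under this model an offset of at least 3 sectors is enough to move the label out of the scanned area.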

Comment 10 Mikuláš Patočka 2009-12-07 14:18:09 UTC
I fixed it by adding the filter "r|/dev/loop.*|" to lvm.conf, so it was really caused by LVM finding these real loop devices and using them instead of the dm-linear devices.
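The workaround above relies on the lvm.conf `devices` filter, which in lvm.conf would read `filter = [ "r|/dev/loop.*|" ]`. The Python sketch below approximates the documented rule: entries are tried in order, each is `a` (accept) or `r` (reject) followed by a delimited regex, the first matching entry decides, and a path that matches no entry is accepted. It uses Python's `re` module rather than LVM2's own regex engine, and `device_accepted` is a hypothetical helper.

```python
import re


def parse_filter_entry(entry: str) -> tuple[str, re.Pattern]:
    """Split an entry such as 'r|/dev/loop.*|' into its action and compiled regex."""
    if len(entry) < 3 or entry[0] not in "ar" or entry[-1] != entry[1]:
        raise ValueError(f"malformed filter entry: {entry!r}")
    return entry[0], re.compile(entry[2:-1])


def device_accepted(path: str, filters: list[str]) -> bool:
    """First matching entry decides; a path that matches no entry is accepted."""
    for entry in filters:
        action, regex = parse_filter_entry(entry)
        if regex.search(path):  # unanchored: may match anywhere in the path
            return action == "a"
    return True


lvm_filter = ["r|/dev/loop.*|"]
for dev in ("/dev/loop3", "/dev/mapper/loop3"):
    print(dev, device_accepted(dev, lvm_filter))
# /dev/loop3 False
# /dev/mapper/loop3 True
```

The raw loop devices are rejected while the dm-linear devices under /dev/mapper still pass, which is the behaviour the workaround needs.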

Milan, is it OK to close the bug as Invalid? Or is there any other problem that needs to be investigated?

Comment 11 Milan Broz 2009-12-07 15:46:13 UTC
Thanks to this bug I found a much more serious problem in a cluster when handling a local VG (but with cluster locking), so please do not close it yet; I'll close it later :)

