543298 – dmeventd doesn't work

Summary: dmeventd doesn't work

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	lvm2
Sub Component:
Version:	5.5
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Milan Broz
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	502927

Reported:	2009-12-02 04:49 UTC by Mikuláš Patočka
Modified:	2013-03-01 04:07 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-12-18 12:50:54 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
strace -f of dmeventd (787.70 KB, text/plain) 2009-12-02 17:11 UTC, Mikuláš Patočka
syslog of dmeventd (6.92 KB, text/plain) 2009-12-02 17:16 UTC, Mikuláš Patočka

Description Mikuláš Patočka 2009-12-02 04:49:48 UTC

On a RHEL 5.5 system, I created 4 loopback devices, created 4 dm-linear devices that each map a whole loopback device, and created a volume group on top of them.

I created a mirror in the volume group and activated it.

I reloaded the primary leg with the dm-error target.

I tried to read from and write to the mirror. The reads and writes were redirected to the secondary leg, but dmeventd did nothing (it is supposed to remove the failing leg). lvconvert --repair did nothing either: it claimed "The mirror is consistent, nothing to repair.", but the status line clearly shows the 'D' status for a write error.

See this:

# dmsetup table
loop4: 0 131072 linear 7:4 0
vg2-test_long_mimage_1: 0 65536 linear 253:1 128
loop3: 0 131072 error
vg2-test_long_mimage_0: 0 65536 linear 253:2 128
vg2-test_long_mlog: 0 4096 linear 253:0 20864
loop2: 0 131072 linear 7:2 0
vg2-test_long: 0 65536 mirror disk 3 253:4 1024 block_on_error 2 253:5 0 253:6 0
loop1: 0 131072 linear 7:1 0
# dmsetup status
loop4: 0 131072 linear
vg2-test_long_mimage_1: 0 65536 linear
loop3: 0 131072 error
vg2-test_long_mimage_0: 0 65536 linear
vg2-test_long_mlog: 0 4096 linear
loop2: 0 131072 linear
vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A
loop1: 0 131072 linear
# lvconvert --repair vg2/test_long
  /dev/mapper/loop3: read failed after 0 of 4096 at 0: Input/output error
  /dev/mapper/vg2-test_long_mimage_0: read failed after 0 of 4096 at 0: Input/output error
  The mirror is consistent, nothing to repair.
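
For reference, the 'D' flag in the status line above can be checked mechanically. The following is a minimal Python sketch, not part of LVM or dmeventd, that splits a "dmsetup status" mirror line into its fields. It assumes the layout seen in this report (leg count, leg devices, in-sync/total regions, a count field, then one health character per leg). The helper names parse_mirror_status and failed_legs are made up for illustration, and the meaning of the health characters reflects my reading of the kernel mirror target, so treat it as an assumption.

```python
# Health characters as I understand them from the kernel mirror target
# (assumption): one character per leg, in leg order.
HEALTH = {
    "A": "alive, no failure",
    "D": "write failure, leg is out of sync",
    "S": "resynchronization failure",
    "R": "read failure",
}


def parse_mirror_status(line):
    """Parse one 'dmsetup status' line of a mirror target into a dict."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    # Expected fields: start length "mirror" nr_legs dev... synced/total count health ...
    if len(fields) < 4 or fields[2] != "mirror":
        raise ValueError("not a mirror status line: %r" % line)
    nr_legs = int(fields[3])
    if len(fields) < 7 + nr_legs:
        raise ValueError("truncated mirror status line: %r" % line)
    legs = fields[4:4 + nr_legs]
    in_sync, regions = (int(x) for x in fields[4 + nr_legs].split("/"))
    # fields[5 + nr_legs] is a count field; the health string follows it.
    health = fields[6 + nr_legs]
    if len(health) != nr_legs:
        raise ValueError("health string does not match leg count: %r" % line)
    return {
        "name": name.strip(),
        "in_sync": in_sync,
        "regions": regions,
        "health": dict(zip(legs, health)),
    }


def failed_legs(status):
    """Return {device: flag} for every leg whose flag is not 'A'."""
    return {dev: flag for dev, flag in status["health"].items() if flag != "A"}


if __name__ == "__main__":
    line = "vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A"
    for dev, flag in failed_legs(parse_mirror_status(line)).items():
        print(dev, flag, HEALTH.get(flag, "unknown"))
```

For the line reported here, the sketch flags 253:5 with 'D' and reports 63 of 64 regions in sync, which is the state that lvconvert --repair did not act on.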

Comment 1 Milan Broz 2009-12-02 11:27:56 UTC

rpm -q lvm2 lvm2-cluster device-mapper ?

Comment 2 Mikuláš Patočka 2009-12-02 16:32:34 UTC

package lvm2-cluster is not installed
device-mapper-1.02.32-1.el5
device-mapper-1.02.32-1.el5

(I upgraded it with yum just before reporting, and the upgraded version doesn't work either.)

Comment 3 Mikuláš Patočka 2009-12-02 16:33:25 UTC

lvm2 version: lvm2-2.02.46-8.el5_4.2

Comment 4 Mikuláš Patočka 2009-12-02 17:11:09 UTC

Created attachment 375498
strace -f of dmeventd

This is strace -f of dmeventd.

Notice the repeated select(6, [5], NULL, NULL, {1, 0}) = 0 (Timeout) [ that's where nothing was happening ],
then <... ioctl resumed> , 0x1c5703c0) = 0 [ the event happened, I replaced the primary leg with dm-error and wrote to the mirror ]
... then the long list of calls, as dmeventd is scanning devices.
... and then it goes to sleep with ioctl(7, DM_DEV_WAIT <unfinished ...>
without fixing the mirror
... and repeated selects again ...

Comment 5 Mikuláš Patočka 2009-12-02 17:16:17 UTC

Created attachment 375500
syslog of dmeventd

Syslog. At 18:00:22 the device was activated. At 18:00:48 I simulated the error. Dmeventd scans all the devices and does nothing.

Comment 6 Milan Broz 2009-12-02 17:23:57 UTC

> device-mapper-1.02.32-1.el5
> lvm2-2.02.46-8.el5_4.2  

Please retest with lvm2 2.02.56 and dm 1.02.29 (I sent info about the testing packages to the lvm-team list).

dmeventd mirror handling changed completely (--repair change).

Comment 7 Mikuláš Patočka 2009-12-02 18:18:31 UTC

I upgraded to device-mapper-event-1.02.39-1.el5.x86_64.rpm, device-mapper-1.02.39-1.el5.x86_64.rpm, lvm2-2.02.56-1.el5.x86_64.rpm and it still doesn't work.

The log message differs:
Dec  2 19:19:37 schizoid lvm[30078]: Monitoring mirror device vg2-test_long for events
Dec  2 19:19:37 schizoid lvm[30078]: vg2-test_long is now in-sync
Dec  2 19:20:24 schizoid lvm[30078]: Mirror device, 253:5, has failed.
Dec  2 19:20:24 schizoid lvm[30078]: Device failure in vg2-test_long
Dec  2 19:20:24 schizoid lvm[30078]: Failed to remove faulty devices in vg2-test

Comment 8 Mikuláš Patočka 2009-12-02 18:20:00 UTC

Manual execution of lvconvert --repair vg2/test_long doesn't work either:
/dev/mapper/loop3: read failed after 0 of 4096 at 0: Input/output error
/dev/mapper/vg2-test_long_mimage_0: read failed after 0 of 4096 at 0: Input/output error
The mirror is consistent, nothing to repair.

even though "dmsetup status" shows there is an error:
vg2-test_long: 0 65536 mirror 2 253:5 253:6 63/64 1 DA 3 disk 253:4 A

Comment 9 Milan Broz 2009-12-04 17:13:21 UTC

Just FYI - if you have this mapping (0 offset):

  loop4: 0 131072 linear 7:4 0

the system will find the failed PV on the underlying device (/dev/loop4), so this will not work anyway (just add some offset there so that the PV signature is not directly visible at the expected place in /dev/loop).

That's probably why it says "The mirror is consistent, nothing to repair."

(But it still fails even with this change; this time it is probably a real bug in the code ;-)
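
To illustrate the offset suggestion, here is a small Python sketch that builds a dm-linear table line with a non-zero offset into the backing device, so that data written at the start of the dm device no longer lands at the start of the loop device. The helper name linear_table and the 2048-sector offset are arbitrary assumptions for illustration; only the "start length linear device offset" layout is taken from the dmsetup table output in this report.

```python
def linear_table(device, device_sectors, offset_sectors=0):
    """Return a dm-linear table line mapping the rest of `device`
    starting at `offset_sectors` (all sizes in 512-byte sectors)."""
    if not 0 <= offset_sectors < device_sectors:
        raise ValueError("offset must lie inside the backing device")
    # The mapped length shrinks by the offset so the mapping
    # does not run past the end of the backing device.
    length = device_sectors - offset_sectors
    return "0 %d linear %s %d" % (length, device, offset_sectors)


if __name__ == "__main__":
    # Mapping as used in this report: whole device, offset 0.
    print(linear_table("7:4", 131072))
    # Hypothetical variant with an offset (2048 sectors is an arbitrary choice).
    print(linear_table("7:4", 131072, 2048))
```

With an offset, the dm device is smaller than the loop device by the offset, so the volume group would have to be created on the shifted mapping from the start.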

Comment 10 Mikuláš Patočka 2009-12-07 14:18:09 UTC

I fixed it by filtering "r|/dev/loop.*|" in lvm.conf --- so it was really caused by lvm finding these real loop devices and using them instead of "dm-linear" devices.

Milan, is it OK to set the bug as Invalid? Or do you see any other problem that needs to be investigated?
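
For anyone reproducing this setup, the effect of the filter mentioned above can be sketched in Python. This is only an approximation of how I understand lvm.conf filters to be evaluated (patterns are tried in order, the first matching one decides, and a path that matches nothing is accepted). It looks at a single path and ignores device aliases, and Python's re module is assumed to be close enough to LVM's regex syntax for simple patterns like this one. The helper names parse_filter and accepts are made up for illustration.

```python
import re


def parse_filter(patterns):
    """Turn lvm.conf-style filter strings such as 'r|/dev/loop.*|'
    into a list of (accept, compiled_regex) rules."""
    rules = []
    for p in patterns:
        if len(p) < 3 or p[0] not in ("a", "r") or p[-1] != p[1]:
            raise ValueError("malformed filter pattern: %r" % p)
        # p[0] is the action, p[1] the delimiter, the regex sits in between.
        rules.append((p[0] == "a", re.compile(p[2:-1])))
    return rules


def accepts(rules, path):
    """First matching rule decides; no match means the path is accepted."""
    for accept, regex in rules:
        if regex.search(path):
            return accept
    return True


if __name__ == "__main__":
    # Corresponds to: devices { filter = [ "r|/dev/loop.*|" ] } in lvm.conf
    rules = parse_filter(["r|/dev/loop.*|"])
    for path in ("/dev/loop3", "/dev/mapper/loop3"):
        print(path, "accepted" if accepts(rules, path) else "rejected")
```

Under these assumptions the raw /dev/loop3 is rejected while /dev/mapper/loop3 (the dm-linear device) is still scanned, which matches the behaviour that made the test work.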

Comment 11 Milan Broz 2009-12-07 15:46:13 UTC

Thanks to this bug I found a much more serious problem on a cluster when handling a local VG (but with cluster locking), so please do not close it yet; I'll close it later :)
