Description of problem:
This is related to bz 177716. Those errors are caused by missing cmirror modules. An error stating which modules are needed should be given instead.

[root@taft-04 lvm]# lvcreate -l 34875 -n mirror_20 -m 2 mirror_2
  Error locking on node taft-03: Internal lvm error, check syslog
  Error locking on node taft-01: Internal lvm error, check syslog
  Error locking on node taft-02: Internal lvm error, check syslog
  Error locking on node taft-04: Internal lvm error, check syslog
  Failed to activate new LV.

Feb 21 09:17:00 link-08 kernel: device-mapper: dm-mirror: Invalid number of mirrors
Feb 21 09:17:00 link-08 kernel: device-mapper: error adding target to table
I doubt that this is the error you will receive with the latest RPMs... Also, use '-m1'. We are not interested in more than 2-sided mirrors at this point.
By the "latest rpms" you mean the latest cmirror rpms, correct? If you don't have any cmirror rpms installed but still have the latest device-mapper and lvm2, you will see these errors. What you should see, though, is a missing-module or missing-rpm warning.

[root@link-01 ~]# rpm -q device-mapper
device-mapper-1.02.03-1.0.RHEL4
[root@link-01 ~]# rpm -q lvm2
lvm2-2.02.02-1.0.RHEL4
[root@link-01 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.01-1.2.RHEL4

[root@link-02 ~]# rpm -q device-mapper
device-mapper-1.02.03-1.0.RHEL4
[root@link-02 ~]# rpm -q lvm2
lvm2-2.02.02-1.0.RHEL4
[root@link-02 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.01-1.2.RHEL4

[root@link-08 ~]# rpm -q device-mapper
device-mapper-1.02.03-1.0.RHEL4
[root@link-08 ~]# rpm -q lvm2
lvm2-2.02.02-1.0.RHEL4
[root@link-08 ~]# rpm -q lvm2-cluster
lvm2-cluster-2.02.01-1.2.RHEL4

[root@link-02 ~]# lvcreate -m 1 -L 100M VG -n cmirror1
  Error locking on node link-01: Internal lvm error, check syslog
  Error locking on node link-08: Internal lvm error, check syslog
  Error locking on node link-02: Internal lvm error, check syslog

device-mapper: dm-mirror: Error creating mirror dirty log
device-mapper: error adding target to table

[root@link-02 ~]# lvs
  LV       VG   Attr   LSize   Origin Snap%  Move Log            Copy%
  cmirror  VG   mwi-d- 100.00M                    cmirror_mlog    0.00
  cmirror1 VG   mwi-d- 100.00M                    cmirror1_mlog   0.00

Without the cmirror rpm/module it shouldn't attempt to create any clustered mirror.
I mean the latest device-mapper and lvm2[-cluster] RPMs, which you now have; and, as you can see, the error messages are different from the original post:

Feb 21 09:17:00 link-08 kernel: device-mapper: dm-mirror: Invalid number of mirrors
Feb 21 09:17:00 link-08 kernel: device-mapper: error adding target to table

vs.

device-mapper: dm-mirror: Error creating mirror dirty log
device-mapper: error adding target to table

As far as not attempting to create a cluster mirror if the module is not loaded, I think that's stretching it... I think it should fail, but perhaps give a better indication of the problem, like "Error creating mirror dirty log of type 'clustered_disk'". What do you think?
I still gotta go with what I posted: it shouldn't try it if you don't have the code. Take snapshots, for instance. I don't have the code, and it doesn't try:

[root@link-02 ~]# lvcreate -s /dev/VG/cmirror -L 100M -n snappy
  Clustered snapshots are not yet supported.

In this mirror case, I end up stuck with a "real" volume in some unknown state that, depending on the user, will either get deleted or get used anyway. There are many instances in CS where, if you don't have the right rpm/module, the code will call you an idiot. As 177716 shows, we've already got one possible customer and one tester trying this and not knowing what to do after it failed, or even why it failed.

To me, the following does not say, "look moron, why don't you load the proper rpm if you want clustered mirrors"; to me it says, "for some reason your clustered mirror attempt failed", which leaves me thinking: why did it fail?

device-mapper: dm-mirror: Error creating mirror dirty log
device-mapper: error adding target to table
An init script has been introduced to fix this problem.
What if the init script doesn't get run or fails for some reason? If we are just going to have the attempt fail instead of not attempting it at all, then a better message is needed. Something about not being able to contact cmirror or the module not being loaded. To a newbie, "dm-mirror: Error creating mirror dirty log" doesn't really mean anything other than something is wrong.
More ammo in defense of mirror creation not being attempted when the module is not loaded is the whole Gulm issue (193597 and 193907). If we don't support mirroring on gulm, we shouldn't try it. Instead, it blindly attempts it and then half works/half fails. What's the user supposed to think at that point?

[root@taft-04 ~]# gulm_tool getstats $(hostname)
I_am = Client
Master = taft-01.lab.msp.redhat.com
rank = -1
quorate = true
GenerationID = 1149257635110865
run time = 24
pid = 4284
verbosity = Default
failover = enabled

[root@taft-04 ~]# lvcreate -L 10G -m 1 -n deanmirror mirror_1
  Error locking on node taft-03: Internal lvm error, check syslog
  Error locking on node taft-01: Internal lvm error, check syslog
  Error locking on node taft-02: Internal lvm error, check syslog
  Error locking on node taft-04: Internal lvm error, check syslog
  Failed to activate new LV.

[root@taft-04 ~]# lvscan
  ACTIVE   '/dev/mirror_1/deanmirror' [10.00 GB] inherit

[root@taft-04 ~]# lvs
  LV         VG       Attr   LSize  Origin Snap%  Move Log              Copy%
  deanmirror mirror_1 mwi-d- 10.00G                    deanmirror_mlog   0.00
Hit this scenario again. I forgot to load the module (it's still early in the morning, give me a break :) and it took me a while to realize why my creations were suddenly failing. Nothing in the syslog or in the errors from the command says, "the module isn't loaded, dummy".
We don't actually have a proper mechanism yet for userspace to ask the kernel what mirror logs are registered.
(for now, the best you could do is have mirrored.c's local target_present check for CLUSTERED and log_lv and extend target_present() to check /proc/modules & issue modprobe for unsupported cases like this - and there'll be more in future if modules get shared with multipath)
Something would have to be added to the activation code. Otherwise, the node issuing the command will load the module, but all the other nodes won't, which means that the create still fails (on most of the nodes).
Moving this to the 4.6 release consideration due to the impact of the code changes required.
*** Bug 236345 has been marked as a duplicate of this bug. ***
Now that this is causing confusion for customers, maybe we should do something with this bug.
Adding cc ecs-dev-list for tracking.
If comments 4, 6, and 7 (especially 7) aren't good enough, check out how some of our other components deal with not having the init script started:

[root@taft-03 ~]# ccs_tool update 2
Unable to connect to the CCS daemon: Connection refused
Failed to update config file.

[root@taft-03 ~]# cman_tool nodes
cman_tool: can't open /proc/cluster/nodes, cman not running

[root@taft-03 ~]# gulm_tool getstats taft-02
Failed to connect to taft-02 (::ffff:10.15.89.68 40040) Connection refused
In src/gulm_tool.c:607 (1.0.10) death by: Failed to connect to server

[root@taft-03 ~]# clustat
Could not connect to cluster service

They all either don't attempt the operation or give a meaningful error.
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.