Created attachment 954858 [details]
kernel panic after auto-install reboot

Description: Using the edit-node tool to add a plugin causes a kernel panic after auto-install.

Version:
rhev-hypervisor6-6.6-20141106.1.iso
ovirt-node-3.0.1-19.el6.22.noarch.rpm
ovirt-node-tools-3.0.1-19.el6.22.noarch.rpm
ovirt-node-plugin-puppet-3.0.1-19.el6.22.noarch.rpm

Steps:
1. Add the puppet plugin:
   edit-node --install-plugin=ovirt-node-plugin-puppet-3.0.1-19.el6.22.noarch.rpm --repo=edit-node.repo /rhev-hypervisor6-6.6-20141106.1.iso
2. Auto-install with the edited ISO using "storage_init=ata firstboot".

Actual results:
After reboot, the host kernel panics.

Additional info:
This also happens when adding the automation plugin, so it blocks automated testing.
This bug impacts RHEV-H 6.6 for 3.4.z automation testing; more than 100 automation test cases cannot be executed.
Created attachment 955047 [details]
Console output
Harald - this appears to be a problem with devicemapper.

If you drop to a rescue shell on the image (which is only possible by changing root=... to something different - a possible bug in itself? I'd expect rdshell rddebug to give me a shell instead of panicking if it times out waiting for the root device), devicemapper is holding on to the disk.

/dev/mapper/1ATA_QEMU_HARDDISK_QM00001 is present, but none of its partitions are.

/dev/sda3 is present, but cannot be mounted.

dmsetup table shows:
1ATA_QEMU_HARDDISK_QM00001: 0 16777216 multipath 0 0 1 1 round-robin 0 1 1 8:0 1

Removing it (dmsetup remove 1ATA...) lets sda3 mount, and booting continues as normal.

This is unexpected behavior when booting with rd_NO_DM, and a change from the last image. An SRPM diff is below, and it's in one of these packages:

--- rhev-hypervisor6-6.6-20141021.0.iso.d/isolinux/manifest-srpm.txt 2014-10-21 00:35:05.000000000 -0700
+++ rhev-hypervisor6-6.6-20141106.1.iso.d/isolinux/manifest-srpm.txt 2014-11-06 10:24:00.000000000 -0700
-curl-7.19.7-37.el6_5.3.src.rpm
+curl-7.19.7-40.el6_6.1.src.rpm
-initscripts-9.03.46-1.el6.src.rpm
-ipmitool-1.8.11-21.el6.src.rpm
-iproute-2.6.32-32.el6_5.src.rpm
+initscripts-9.03.46-1.el6_6.1.src.rpm
+ipmitool-1.8.11-20.el6.src.rpm
+iproute-2.6.32-33.el6_6.src.rpm
-kernel-2.6.32-504.1.2.el6.src.rpm
+kernel-2.6.32-504.el6.src.rpm
-ovirt-node-3.0.1-19.el6.18.src.rpm
-ovirt-node-plugin-vdsm-0.1.1-26.el6ev.src.rpm
+ovirt-node-3.0.1-19.el6.22.src.rpm
+ovirt-node-plugin-vdsm-0.1.1-27.el6ev.src.rpm
-tzdata-2014h-1.el6.src.rpm
+tzdata-2014i-1.el6.src.rpm
-vdsm-4.14.17-1.el6ev.src.rpm
+vdsm-4.14.13-2.el6ev.src.rpm
-wget-1.12-5.el6.src.rpm
+wget-1.12-5.el6_6.1.src.rpm

Serial output with rdinitdebug rddebug is attached. If I knew how to generate an rdsosreport manually, I'd attach that, too... Let me know if you need anything else.
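For reference, the manual recovery from the rescue shell was roughly this sequence (a sketch; the map name is the one shown by dmsetup above, and /sysroot is assumed as the usual dracut mount point):

dmsetup table                                  # shows the stale multipath map holding sda
dmsetup remove 1ATA_QEMU_HARDDISK_QM00001      # release the disk
mount /dev/sda3 /sysroot                       # now succeeds, and boot can continue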
When installing via the TUI, the kernel still panics after reboot, with the same error output.
(In reply to Ryan Barry from comment #3)
> This is unexpected behavior when booting with rd_NO_DM, and a change from
> the last image. An SRPM diff is below, and it's in one of these packages:

So, dracut gets triggered by the existence of /dev/disk/by-label/Root.

If I understand it correctly, multipath stole /dev/sda and does not provide partitions for the multipath disk.

Is multipath wanted for the disk? If not, I suggest adding rd_NO_MULTIPATH to the kernel command line.
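For illustration, assuming the same auto-install arguments as in the description, the boot line would then be something like:

storage_init=ata firstboot rd_NO_MULTIPATH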
If multipath _is_ wanted, please check that the initramfs image contains 40-multipath.rules:

# lsinitrd <image> | grep 40-multipath.rules

If the initramfs contains this file, please reassign the bug to device-mapper-multipath, because these rules should run kpartx, which should provide the partitions.
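If it helps, one way to pull that rule file out of the initramfs for a closer look is a plain cpio extract (a sketch, assuming a gzip-compressed cpio image; the initrd path is illustrative):

mkdir /tmp/initrd && cd /tmp/initrd
zcat /path/to/initrd0.img | cpio -idmv 'etc/udev/rules.d/40-multipath.rules'
cat etc/udev/rules.d/40-multipath.rules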
(In reply to Harald Hoyer from comment #5)
> (In reply to Ryan Barry from comment #3)
> > This is unexpected behavior when booting with rd_NO_DM, and a change from
> > the last image. An SRPM diff is below, and it's in one of these packages:
>
> So, dracut gets triggered by the existence of /dev/disk/by-label/Root.
>
> If I understand it correctly, multipath stole /dev/sda and does not provide
> partitions for the multipath disk.
>
> Is multipath wanted for the disk? If not, I suggest adding rd_NO_MULTIPATH
> to the kernel command line.

We use rd_NO_MULTIPATH for installing, but the actual boots exclude it.

Your understanding is spot-on: multipath stole /dev/sda, but doesn't provide partitions for the multipath disk. Experimentally, running kpartx from the dracut shell correctly handles this.

This is not reproducible in earlier images. The big change that I'm noticing is to 40-multipath.rules: http://gerrit.ovirt.org/#/c/34792/

Is it possible that this is a regression caused by the patch in bug 1148979 comment 12, which we applied to resolve a different issue?
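For reference, the manual call from the dracut shell was along these lines (a sketch - the device name is the one from comment 3, and "-a -p p" matches what the multipath rule would run):

kpartx -a -p p /dev/mapper/1ATA_QEMU_HARDDISK_QM00001
ls /dev/mapper/    # the partition maps (...QM00001p1, p2, p3) should now appear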
Maybe Peter also knows something about this issue; he was also involved in bug 1148979.
(In reply to Fabian Deutsch from comment #10)
> Maybe Peter also knows something about this issue; he was also involved in
> bug 1148979.

When dropped to the dracut debug shell, what is the udev db content (udevadm info --export-db)? Can you attach it here?

There should be an mpath device set up with the DM_ACTIVATION=1 variable set - this is the variable based on which kpartx is then triggered... Let's check if this is so.
Created attachment 956680 [details]
Booted with udev debug output

Peter, I cannot reproduce this on my local machine, but I've got some logs from a previous attempt - do they help as well? (See attachment.)
This could also be a dupe of bug 1161520
(In reply to Fabian Deutsch from comment #14)
> Created attachment 956680 [details]
> Booted with udev debug output
>
> Peter, I cannot reproduce this on my local machine, but I've got some logs
> from a previous attempt - do they help as well? (See attachment.)

Well, it would be better to grab the complete "udevadm info --export-db". The logs attached mostly contain information about commands executed within udev, where I can see various variables imported from external commands (like blkid or dmsetup), but I can't see the DM_ACTIVATION variable, which is set solely by a udev rule (not imported from any external command).

As such, I can see that kpartx was not triggered, but I don't see why - I think the full udev db content would reveal more, so I can't say from the udev debug logs alone. If you could reproduce once more and grab the udevadm info --export-db output, that would be great.
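A sketch of one way to grab the full db from the dracut shell without mounting anything (so the dm state is left untouched) is to dump it to the serial console and capture it there:

udevadm info --export-db > /dev/console

(This assumes the VM has a serial console attached; writing to a file on a mounted partition works too, but mounting may first require removing the mpath map, which changes the state being debugged.)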
(In reply to Peter Rajnoha from comment #16)
> (In reply to Fabian Deutsch from comment #14)
…
> As such, I can see that kpartx was not triggered, but I don't see why - I
> think the full udev db content would reveal more, so I can't say from the
> udev debug logs alone. If you could reproduce once more and grab the
> udevadm info --export-db output, that would be great.

Right, no problem. I just can't provide it, but I think Ryan can.
Created attachment 956764 [details]
udevadm info --export-db
Workaround: using the "--skip-initramfs" parameter of the edit-node tool when adding the plugin to the ISO, there is no kernel panic on boot after installing this ISO on a host:

edit-node --repo=edit-node.repo --install=ovirt-node-plugin-puppet-3.1.0-0.25.20141107gitf6dc7b9.el6.noarch.rpm --skip-initramfs rhev-hypervisor6-6.6-20141107.0.iso

So we can guess that the root cause is in the "_rebuild_initramfs" function of the edit-node tool code.
(In reply to Ryan Barry from comment #18)
> Created attachment 956764 [details]
> udevadm info --export-db

This log shows that the mpath device was not even set up in this case - there's no dm-* device present. (However, in the previous case, the log from comment #14 shows that there is at least one dm device - presumably the mpath one, judging by the commands executed by udev - just missing the kpartx call.) So it also seems to be a different case from comment #3.

Was the reproducing environment exactly the same as in comment #3?
AFAIK yes, he uses a VM.
Created attachment 957189 [details]
udevadm info console
(In reply to Peter Rajnoha from comment #20)
> (In reply to Ryan Barry from comment #18)
> > Created attachment 956764 [details]
> > udevadm info --export-db
>
> This log shows that the mpath device was not even set up in this case -
> there's no dm-* device present. (However, in the previous case, the log
> from comment #14 shows that there is at least one dm device - presumably
> the mpath one, judging by the commands executed by udev - just missing the
> kpartx call.) So it also seems to be a different case from comment #3.
>
> Was the reproducing environment exactly the same as in comment #3?

Same virtual machine, same cmdline. The only difference in the environment is that once I hit the dracut shell, I manually stopped the dm devices (which was necessary to mount a partition to save the udev info). I don't know if that would affect the output.

I attached a log from the serial console with the DM device left active.
What do you mean by "manually stopped the dm devices"? I'm trying to figure out if the multipath device got removed, or was never created.
(In reply to Ryan Barry from comment #3)
> /dev/mapper/1ATA_QEMU_HARDDISK_QM00001 is present, but none of its
> partitions are.
>
> /dev/sda3 is present, but cannot be mounted.
>
> dmsetup table shows:
> 1ATA_QEMU_HARDDISK_QM00001: 0 16777216 multipath 0 0 1 1 round-robin 0 1 1
> 8:0 1
>
> Removing it (dmsetup remove 1ATA...) lets sda3 mount, and booting continues
> as normal.

The part quoted above is relevant. The multipath device was created, but none of its partitions were. It grabbed sda, and I wasn't able to mount any partitions on sda (to save the udevadm info --export-db output) without "dmsetup remove".

A devicemapper device is created for the disk, but not for any of its partitions. /dev/disk/by-label/Root points to sda3, but sda3 cannot be mounted since multipath is holding onto the disk. It should point to 1ATA_QEMU_HARDDISK_QM00001p3, but that device does not exist.

The most recent attachment (udevadm info console) did not have this step taken, since I grabbed the output by attaching a serial console to the VM and didn't need to mount any partitions to save it.
(In reply to Ryan Barry from comment #22)
> Created attachment 957189 [details]
> udevadm info console

Hmm, the variable based on which kpartx should trigger is there:

P: /devices/virtual/block/dm-0
N: dm-0
...
S: mapper/1ATA_QEMU_HARDDISK_QM00001
...
E: MAJOR=253
E: MINOR=0
E: DEVNAME=/dev/dm-0
E: DEVTYPE=disk
E: SUBSYSTEM=block
...
E: DM_ACTIVATION=1
...
E: ID_PART_TABLE_TYPE=gpt
...

Also, it's clear there's a partition table header (blkid gives E: ID_PART_TABLE_TYPE=gpt). So for some reason the kpartx call is skipped; I'll try to recheck the rules...
(In reply to Peter Rajnoha from comment #26)
> (In reply to Ryan Barry from comment #22)
> > Created attachment 957189 [details]
> > udevadm info console
>
> Hmm, the variable based on which kpartx should trigger is there:

This one exactly:

> E: DM_ACTIVATION=1
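As a quick cross-check from a live dracut shell (a sketch - dm-0 is assumed to be the mpath device, as in the attached db), the relevant variables can be queried for just that device:

udevadm info --query=all --name=/dev/dm-0 | grep -e DM_ACTIVATION -e ID_PART_TABLE_TYPE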
For completeness, the difference between a working and non-working image is exactly the rule which was introduced with bug 1148979.

(a: does not work, b: works)

diff -ur a/etc/udev/rules.d/40-multipath.rules b/etc/udev/rules.d/40-multipath.rules
--- a/etc/udev/rules.d/40-multipath.rules	2014-11-13 16:21:38.965409873 +0100
+++ b/etc/udev/rules.d/40-multipath.rules	2014-11-13 16:22:17.326373921 +0100
@@ -20,5 +20,5 @@
 ENV{DM_UUID}!="mpath-?*", GOTO="end_mpath"
 ENV{DM_SUSPENDED}=="1", GOTO="end_mpath"
 ENV{DM_ACTION}=="PATH_FAILED", GOTO="end_mpath"
-ENV{DM_ACTIVATION}==1, RUN+="$env{MPATH_SBIN_PATH}/kpartx -a -p p $tempnode"
+RUN+="$env{MPATH_SBIN_PATH}/kpartx -a -p p $tempnode"
 LABEL="end_mpath"
(In reply to Fabian Deutsch from comment #28)
> For completeness, the difference between a working and non-working image is
> exactly the rule which was introduced with bug 1148979.
>
> (a: does not work, b: works)
>
> diff -ur a/etc/udev/rules.d/40-multipath.rules
> b/etc/udev/rules.d/40-multipath.rules
> --- a/etc/udev/rules.d/40-multipath.rules 2014-11-13 16:21:38.965409873 +0100
> +++ b/etc/udev/rules.d/40-multipath.rules 2014-11-13 16:22:17.326373921 +0100
> @@ -20,5 +20,5 @@
> ENV{DM_UUID}!="mpath-?*", GOTO="end_mpath"
> ENV{DM_SUSPENDED}=="1", GOTO="end_mpath"
> ENV{DM_ACTION}=="PATH_FAILED", GOTO="end_mpath"
> -ENV{DM_ACTIVATION}==1, RUN+="$env{MPATH_SBIN_PATH}/kpartx -a -p p $tempnode"

(...I'm thinking whether that should be ENV{DM_ACTIVATION}=="1" instead of ...==1 (no quotes))
Yes, it's the quoting :) So it should be ENV{DM_ACTIVATION}=="1"
(so just a typo when copying the original patch from bug #1148979)
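For clarity, the corrected line in 40-multipath.rules (matching the intent of the original patch from bug 1148979) would read:

ENV{DM_ACTIVATION}=="1", RUN+="$env{MPATH_SBIN_PATH}/kpartx -a -p p $tempnode"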
Created attachment 958409 [details]
console_output_after_autoinstall_reboot

Tested on:
rhev-hypervisor6-6.6-20141113.0.iso
ovirt-node-plugin-puppet-3.0.1-19.el6.23.noarch.rpm
ovirt-node-tools-3.0.1-19.el6.23.noarch

Test steps:
Auto-install with "bootif= adminpw= storage_init= firstboot".

After reboot, the kernel still panics.
I would be very surprised if 6.6-20141113 worked, since it does not include the fix attached to this bug. Fabian, was that the intended build?
Tested on:
rhev-hypervisor6-6.6-20141114.0.el6ev
ovirt-node-plugin-puppet-3.0.1-19.el6.23.noarch.rpm
ovirt-node-tools-3.0.1-19.el6.23.noarch

Test steps:
Auto-install with "bootif= adminpw= storage_init= firstboot".

After reboot, I can successfully log in; the issue is fixed in this build.
Fabian, according to comment 37, the patch does fix this issue, so we need to backport it to 3.4.z.
Tested on:
rhev-hypervisor6-6.6-20141119.0.el6ev
ovirt-node-plugin-puppet-3.0.1-19.el6.23.noarch.rpm
ovirt-node-tools-3.0.1-19.el6.23.noarch

Test steps:
Auto-install with "bootif= adminpw= storage_init= firstboot".

After reboot, I can successfully log in and the puppet plugin shows in the TUI menu; the issue is fixed in this build.
Correcting the ovirt-node-plugin-puppet and ovirt-node-tools versions from comment 41; they should be:
ovirt-node-plugin-puppet-3.0.1-19.el6.24.noarch.rpm
ovirt-node-tools-3.0.1-19.el6.24.noarch.rpm

Tested on:
rhev-hypervisor6-6.6-20141119.0.el6ev
ovirt-node-plugin-puppet-3.0.1-19.el6.24.noarch.rpm
ovirt-node-tools-3.0.1-19.el6.24.noarch.rpm

Test steps:
Auto-install with "bootif= adminpw= storage_init= firstboot".

After reboot, I can successfully log in and the puppet plugin shows in the TUI menu; the issue is fixed in this build.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0160.html