Bug 1176087 - [6.6-3.5]kernel panic occurred when boot hypervisor from UEFI machine
Summary: [6.6-3.5]kernel panic occurred when boot hypervisor from UEFI machine
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Fabian Deutsch
QA Contact: Virtualization Bugs
URL:
Whiteboard: node
Depends On:
Blocks: rhev35rcblocker rhev35gablocker
 
Reported: 2014-12-19 11:09 UTC by cshao
Modified: 2016-02-10 20:03 UTC
CC List: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-22 10:39:37 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments
r510.log (53.72 KB, text/plain)
2014-12-22 07:30 UTC, cshao
r510-new-output.log (45.76 KB, text/plain)
2014-12-22 08:47 UTC, cshao
init.log (529.00 KB, text/plain)
2014-12-22 08:47 UTC, cshao

Description cshao 2014-12-19 11:09:28 UTC
Created attachment 971101 [details]
kernel-panic-r510.png

Description of problem:
[6.6-3.5] A kernel panic occurred when booting from a UEFI machine (Dell-R510).

Version-Release number of selected component (if applicable):
rhev-hypervisor6-6.6-20141218.0.el6ev
ovirt-node-3.1.0-0.37.20141218gitcf277e1.el6.noarch

How reproducible:
Intermittent: I tested about 4 times and hit this bug 2 times.

Steps to Reproduce:
1. Enter UEFI mode on the Dell-R510.
2. Attach virtual media and boot from it.
3. Reinstall the hypervisor on the Dell-R510 in UEFI mode.
4. Reboot.
5. Boot the hypervisor in UEFI mode.
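
One way to confirm that the boot in step 5 really used UEFI is to check for the EFI firmware directory that Linux exposes in sysfs when it was started through EFI. This is a minimal sketch, not part of the original report; the function name `boot_mode` and its optional root-prefix argument are hypothetical, added only so the check can be run against a test directory.

```shell
#!/bin/sh
# Print "uefi" if the kernel exposes EFI firmware data under sysfs,
# otherwise print "legacy".
# $1 is an optional root prefix (hypothetical, for testing only);
# with no argument the real /sys/firmware/efi is checked.
boot_mode() {
    root="${1:-}"
    if [ -d "$root/sys/firmware/efi" ]; then
        echo "uefi"
    else
        echo "legacy"
    fi
}
```

Running `boot_mode` with no argument on the installed hypervisor reports the mode of the current boot.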

Actual results:
A kernel panic occurred when booting from the UEFI machine (Dell-R510).

Expected results:
The hypervisor boots successfully in UEFI mode.

Additional info:
We did not hit this issue on rhev-hypervisor6-6.6-20141119.0.iso (6.6-3.5), so we consider it a regression.

Since this is a kernel panic issue, I can provide more log info.

Comment 1 Fabian Deutsch 2014-12-19 11:24:49 UTC
Mike, the change between the previous RHEV-H build, which was not affected by this bug, and the affected build is (among others):

    -kernel-2.6.32-504.1.3.el6.src.rpm
    +kernel-2.6.32-504.3.3.el6.src.rpm

I saw that many dm patches went in between those two versions. Can you tell if this might be related to those patches?

Comment 2 Mike Snitzer 2014-12-19 14:13:15 UTC
(In reply to Fabian Deutsch from comment #1)
> Mike, the change between the previous RHEV-H which was not affected by this
> bug, and the build which is affected is (beyond others):
> 
>     -kernel-2.6.32-504.1.3.el6.src.rpm
>     +kernel-2.6.32-504.3.3.el6.src.rpm
> 
> I saw that many dm patches went in between those two versions. Can you tell
> if this might be related to those patches?

The changes that went in were focused on improving DM thin-provisioning.

$ git log rhel-6.6.z/master -- drivers/md | grep "RHEL6.7 PATCH" | tac
    O-Subject: [RHEL6.7 PATCH 01/25] dm thin: fix DMERR typo in pool_status error path
    O-Subject: [RHEL6.7 PATCH 02/25] dm thin: cleanup noflush_work to use a proper completion
    O-Subject: [RHEL6.7 PATCH 03/25] dm thin metadata: do not allow the data block size to change
    O-Subject: [RHEL6.7 PATCH 04/25] dm bufio: use kzalloc when allocating dm_bufio_client
    O-Subject: [RHEL6.7 PATCH 05/25] dm bufio: update last_accessed when relinking a buffer
    O-Subject: [RHEL6.7 PATCH 06/25] dm bufio: switch from a huge hash table to an rbtree
    O-Subject: [RHEL6.7 PATCH 07/25] dm bufio: evict buffers that are past the max age but retain some buffers
    O-Subject: [RHEL6.7 PATCH 08/25] dm bio prison: switch to using a red black tree
    O-Subject: [RHEL6.7 PATCH 09/25] dm thin metadata: change dm_thin_find_block to allow blocking, but not issuing, IO
    O-Subject: [RHEL6.7 PATCH 10/25] dm transaction manager: add support for prefetching blocks of metadata
    O-Subject: [RHEL6.7 PATCH 11/25] dm thin: prefetch missing metadata pages
    O-Subject: [RHEL6.7 PATCH 12/25] dm thin: throttle incoming IO
    O-Subject: [RHEL6.7 PATCH 14/25] dm thin: adjust max_sectors_kb based on thinp blocksize
    O-Subject: [RHEL6.7 PATCH 15/25] dm: improve documentation and code clarity in dm_merge_bvec
    O-Subject: [RHEL6.7 PATCH 16/25] dm thin: implement thin_merge
    O-Subject: [RHEL6.7 PATCH 17/25] dm thin: grab a virtual cell before looking up the mapping
    O-Subject: [RHEL6.7 PATCH 18/25] dm thin: performance improvement to discard processing
    O-Subject: [RHEL6.7 PATCH 19/25] dm thin: factor out remap_and_issue_overwrite
    O-Subject: [RHEL6.7 PATCH 20/25] dm thin: defer whole cells rather than individual bios
    O-Subject: [RHEL6.7 PATCH 21/25] dm thin: remap the bios in a cell immediately
    O-Subject: [RHEL6.7 PATCH 22/25] dm thin: direct dispatch when breaking sharing
    O-Subject: [RHEL6.7 PATCH 23/25] dm thin: sort the deferred cells
    O-Subject: [RHEL6.7 PATCH 24/25] dm thin: optimize retry_bios_on_resume
    O-Subject: [RHEL6.7 PATCH 25/25] dm thin: refactor requeue_io to eliminate spinlock bouncing
    O-Subject: [RHEL6.7 PATCH 26/25] dm thin: fix potential for infinite loop in pool_io_hints
    O-Subject: [RHEL6.7 PATCH v2 27/25] dm thin: fix pool_io_hints to avoid looking at max_hw_sectors

I see you're using the old DM snapshot target (which has nothing to do with dm-thinp), and there are errors about trying to use "DM_snapshot_cow" as a filesystem type when mounting.  But beyond that I have no context to be able to _really_ say what the system was doing.

But I really doubt these DM changes have anything to do with your UEFI boot problem.

Comment 3 Mike Snitzer 2014-12-19 14:28:32 UTC
(In reply to Mike Snitzer from comment #2)

> I see you're using old DM snapshot (which has nothing to do with dm-thinp)..

NOTE: dm-snapshot does use dm-bufio.  And there were a handful of dm-bufio changes listed in comment#2.  But I'm not aware of any potential for dm-snapshot regression with these dm-bufio changes.
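
To see which of the patches listed in comment #2 touch dm-bufio, the O-Subject lines can be filtered on the subsystem prefix. This is a minimal sketch, assuming the list is fed on standard input; the function name `bufio_patches` is hypothetical.

```shell
#!/bin/sh
# Read "O-Subject:" lines on stdin and print only those whose subject
# starts with the "dm bufio:" subsystem prefix right after the
# "[RHEL6.7 PATCH nn/25]" tag.
bufio_patches() {
    grep -E '\] dm bufio:'
}
```

For example, appending `| bufio_patches` to the `git log ... | grep "RHEL6.7 PATCH" | tac` pipeline from comment #2 would narrow that list to patches 04/25 through 07/25.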

I think you need to first silence the "mount: unknown filesystem type 'DM_snapshot_cow'" errors.

Comment 4 cshao 2014-12-22 07:30:10 UTC
Created attachment 971897 [details]
r510.log

Hi fabiand, 

I just obtained the panic log via the serial console and have attached it for you to debug.
Thanks!

Comment 5 Ying Cui 2014-12-22 07:44:40 UTC
(In reply to shaochen from comment #4)
> Created attachment 971897 [details]
> r510.log
> 
> Hi fabiand, 
> 
> I just obtain the panic log info via serial console, provides for you to
> debug.
> Thanks!

Chen, Thanks.
We also need more: please add _rdshell_ and _rdinitdebug_ and remove _quiet_ so that we can get /init.log to help. BTW, rdsosreport is not available on RHEL 6.6.

Thanks
Ying
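
The requested change to the boot arguments can be sketched as a small filter: drop `quiet` and append `rdshell` and `rdinitdebug`. This only illustrates the edit under stated assumptions; the function name `debug_cmdline` is hypothetical, and the arguments are typically edited by hand at the boot loader prompt.

```shell
#!/bin/sh
# Rewrite a kernel command line for dracut debugging:
# remove "quiet", then append "rdshell" and "rdinitdebug" exactly once.
# $1 is the original command line as a single string.
debug_cmdline() {
    out=""
    for arg in $1; do
        case "$arg" in
            quiet|rdshell|rdinitdebug) ;;   # dropped here; debug flags are re-added below
            *) out="${out:+$out }$arg" ;;
        esac
    done
    echo "${out:+$out }rdshell rdinitdebug"
}
```

For example, `debug_cmdline "$(cat /proc/cmdline)"` prints the current arguments with the change applied, which can then be typed in at the next boot.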

Comment 6 cshao 2014-12-22 08:46:47 UTC
(In reply to Ying Cui from comment #5)
> (In reply to shaochen from comment #4)
> > Created attachment 971897 [details]
> > r510.log
> > 
> > Hi fabiand, 
> > 
> > I just obtain the panic log info via serial console, provides for you to
> > debug.
> > Thanks!
> 
> Chen, Thanks.
> We also need more, please add _rdshell_ _rdinitdebug_ and removing _quiet_
> to get /init.log for helps, btw rdsosreport is not available a on rhel 6.6.
> 
> Thanks
> Ying

OK, I have added "rdshell" and "rdinitdebug" to the kernel command line and obtained the new log. Please check "r510-new-output.log" and "init.log" for more details.

Thanks!

Comment 7 cshao 2014-12-22 08:47:22 UTC
Created attachment 971916 [details]
r510-new-output.log

Comment 8 cshao 2014-12-22 08:47:59 UTC
Created attachment 971917 [details]
init.log

Comment 11 cshao 2014-12-22 10:39:37 UTC
Test version:
rhev-hypervisor6-6.6-20141218.0.el6ev
ovirt-node-3.1.0-0.37.20141218gitcf277e1.el6.noarch

Tested 5 times after pulling out the USB disk and did not hit the kernel panic any more, so I am closing this bug as WORKSFORME.

Thanks!

Comment 12 Ying Cui 2014-12-22 10:54:40 UTC
Since this was an environment issue, I am closing it as NOTABUG. Thanks.

