Bug 1251335

Summary: lvm2 errors, system will not boot on 4.1 kernels
Product: [Fedora] Fedora
Reporter: fdinoto
Component: lvm2
Assignee: Heinz Mauelshagen <heinzm>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 22
CC: agk, bmarzins, bmr, casper, dwysocha, heinzm, jonathan, lvm-team, mcsontos, msnitzer, prajnoha, prockai, zkabelac
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-07-06 13:07:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description               Flags
Good-boot_4.0.8_kernel    none
Bad-boot_4.1_kernell      none

Description fdinoto 2015-08-07 02:50:20 UTC
The system will not boot on 4.1 kernels, but 4.0 kernels boot as expected. My system has many local disks and 3 volume groups. The volume that seems to be generating the "lvm2" errors at boot time is mirrored. It is not a file system the OS requires.

Is there something fundamentally different about how the 4.1 kernel views physical disks or the way LVM works? I believe there have been two 4.1 kernels for Fedora 22 so far, and I have had this problem with both.

I was unsure what to specify as the component. I chose lvm2 only because it was generating the errors, but I don't think it is the root cause.

I removed the mount points that reference the volumes LVM is having problems with from fstab, but the problem persists. It seems that LVM itself is unable to initialize the logical volume, and the system won't boot past emergency mode.

The rest of the volumes in that volume group are accessible and mountable in emergency mode. Trying to manually mount the mirrored LV produces an error saying that the special file under /dev/mapper doesn't exist. The error is accurate: only the special files for the 2 mirrors and the 2 log volumes exist, not the one that you actually mount.
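
The symptom described above (sub-volume nodes present, top-level node missing) can be checked mechanically. The following is a minimal sketch, not part of the original report. It assumes the usual LVM sub-LV suffixes (`_rimage_N`/`_rmeta_N` for raid-type LVs, `_mimage_N`/`_mlog` for legacy mirrors). The helper name `missing_parents` and the device names in the sample (`vg_data`, `r1v0`, `home`) are hypothetical.

```python
import re

# Sub-LV naming assumed here: raid-type LVs use _rimage_N / _rmeta_N,
# legacy mirror LVs use _mimage_N / _mlog.
SUB_LV = re.compile(
    r"^(?P<parent>.+)_(?:rimage|rmeta|mimage)_\d+$|^(?P<mparent>.+)_mlog$"
)

def missing_parents(names):
    """Return device names whose sub-LV nodes exist but whose own node does not.

    `names` is a listing of /dev/mapper (hypothetical helper, for illustration).
    """
    present = set(names)
    parents = set()
    for name in names:
        m = SUB_LV.match(name)
        if m:
            parents.add(m.group("parent") or m.group("mparent"))
    return sorted(parents - present)

# Hypothetical listing resembling the state described in this report.
sample = [
    "control",
    "vg_data-home",
    "vg_data-r1v0_rimage_0",
    "vg_data-r1v0_rimage_1",
    "vg_data-r1v0_rmeta_0",
    "vg_data-r1v0_rmeta_1",
]
print(missing_parents(sample))
```

On a live system the same function could be fed `os.listdir("/dev/mapper")` from the emergency shell. A non-empty result points at LVs whose components were created but whose top-level device was never activated.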

Comment 1 fdinoto 2015-08-07 19:45:45 UTC
Created attachment 1060460 [details]
Good-boot_4.0.8_kernel

This boot log shows what happens on a pre-4.1 kernel. All of my LVM groups/volumes come online with no issues.

Comment 2 fdinoto 2015-08-07 19:47:33 UTC
Created attachment 1060461 [details]
Bad-boot_4.1_kernell

This log shows what happens when I boot a 4.1 kernel. My disks/LVM do not come online as expected and the system boots into emergency mode.

Comment 3 fdinoto 2015-08-08 02:16:01 UTC
I ended up migrating the data to another VG and deleting the VG that was having issues. I was then able to boot on the 4.1.3-201 kernel.

I wiped the 3 PVs and added them to the VG I had just used to migrate my data. I've been able to reboot multiple times on 4.1.3-201 now. Luckily I had just enough space to move things around.

Comment 4 Marian Csontos 2015-08-10 15:15:08 UTC
What exactly was the layout?
The name of the LV (r1v0) suggests you were using dm-raid with RAID1.
Looks like an instance of kernel bug https://bugzilla.kernel.org/show_bug.cgi?id=100491.

Aug 06 18:54:02 host.example.com kernel: md/raid1:mdX: active with 2 out of 2 mirrors
Aug 06 18:54:02 host.example.com kernel: md-cluster module not found.
Aug 06 18:54:02 host.example.com kernel: mdX: Could not setup cluster service (256)
Aug 06 18:54:02 host.example.com kernel: mdX: bitmap file superblock:
Aug 06 18:54:02 host.example.com kernel:          magic: 6d746962
Aug 06 18:54:02 host.example.com kernel:        version: 4
Aug 06 18:54:02 host.example.com kernel:           uuid: 00000000.00000000.00000000.00000000
Aug 06 18:54:02 host.example.com kernel:         events: 1119
Aug 06 18:54:02 host.example.com kernel: events cleared: 1119
Aug 06 18:54:02 host.example.com kernel:          state: 00000000
Aug 06 18:54:02 host.example.com kernel:      chunksize: 524288 B
Aug 06 18:54:02 host.example.com kernel:   daemon sleep: 5s
Aug 06 18:54:02 host.example.com kernel:      sync size: 976752640 KB
Aug 06 18:54:02 host.example.com kernel: max write behind: 0
Aug 06 18:54:02 host.example.com kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000100
Aug 06 18:54:02 host.example.com kernel: IP: [<ffffffff817a1795>] _raw_spin_lock_irq+0x15/0x50
Aug 06 18:54:02 host.example.com kernel: PGD 0
Aug 06 18:54:02 host.example.com kernel: Oops: 0002 [#1] SMP
Aug 06 18:54:02 host.example.com kernel: Modules linked in: dm_raid snd_hda_codec_realtek snd_hda_codec_generic cfg80211 snd_hda_intel snd_hda_controller snd_hda_codec raid456 ssb snd_usb_audio intel_rapl snd_usbmidi_lib snd_h
Aug 06 18:54:02 host.example.com kernel:  video
Aug 06 18:54:02 host.example.com kernel: CPU: 2 PID: 949 Comm: lvm Not tainted 4.1.3-201.fc22.x86_64 #1
Aug 06 18:54:02 host.example.com kernel: Hardware name: ASUS All Series/Z87-DELUXE/DUAL, BIOS 2103 08/15/2014
Aug 06 18:54:02 host.example.com kernel: task: ffff880880a1d8e0 ti: ffff88088a1f4000 task.ti: ffff88088a1f4000
Aug 06 18:54:02 host.example.com kernel: RIP: 0010:[<ffffffff817a1795>]  [<ffffffff817a1795>] _raw_spin_lock_irq+0x15/0x50
Aug 06 18:54:02 host.example.com kernel: RSP: 0018:ffff88088a1f7b48  EFLAGS: 00010006
Aug 06 18:54:02 host.example.com kernel: RAX: 0000000000010000 RBX: 0000000000000100 RCX: 0000000000000000
Aug 06 18:54:02 host.example.com kernel: RDX: ffff88088a1f7ba8 RSI: 0000000000000000 RDI: 0000000000000100
Aug 06 18:54:02 host.example.com kernel: RBP: ffff88088a1f7b48 R08: 0000000800000000 R09: 00000008ffffffff
Aug 06 18:54:02 host.example.com kernel: R10: ffffffff816129a7 R11: 0000000000000246 R12: 0000000000000000
Aug 06 18:54:02 host.example.com kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff88088a1f7ba8
Aug 06 18:54:02 host.example.com kernel: FS:  00007f11fa40e880(0000) GS:ffff8808afa80000(0000) knlGS:0000000000000000
Aug 06 18:54:02 host.example.com kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 06 18:54:02 host.example.com kernel: CR2: 0000000000000100 CR3: 000000088794e000 CR4: 00000000001407e0
Aug 06 18:54:02 host.example.com kernel: Stack:
Aug 06 18:54:02 host.example.com kernel:  ffff88088a1f7b98 ffffffff8160aa78 ffff88088a1f7b88 00000000897356c7
Aug 06 18:54:02 host.example.com kernel:  ffff880017de4458 0000000000000000 ffff880888d96810 0000000000000100
Aug 06 18:54:02 host.example.com kernel:  ffff880889170e00 0000000000000000 ffff88088a1f7bd8 ffffffff8160c709
Aug 06 18:54:02 host.example.com kernel: Call Trace:
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff8160aa78>] bitmap_start_sync+0x48/0x100
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff8160c709>] bitmap_load+0x59/0x150
Aug 06 18:54:02 host.example.com kernel:  [<ffffffffa0747e25>] raid_resume+0x135/0x230 [dm_raid]
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff81615209>] dm_table_resume_targets+0x99/0xf0
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff81612641>] dm_resume+0xc1/0x100
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff81617b4b>] dev_suspend+0x12b/0x280
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff81617a20>] ? table_load+0x370/0x370
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff816184a2>] ctl_ioctl+0x232/0x520
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff81324988>] ? SYSC_semtimedop+0x2c8/0xea0
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff816187a3>] dm_ctl_ioctl+0x13/0x20
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff8123fcb6>] do_vfs_ioctl+0x2c6/0x4d0
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff81335901>] ? security_shm_alloc+0x11/0x20
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff8123ff41>] SyS_ioctl+0x81/0xa0
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff810250d7>] ? syscall_trace_leave+0xc7/0x140
Aug 06 18:54:02 host.example.com kernel:  [<ffffffff817a1a6e>] system_call_fastpath+0x12/0x71
Aug 06 18:54:02 host.example.com kernel: Code: 66 39 ca 75 f1 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 05 5d c3
Aug 06 18:54:02 host.example.com kernel: RIP  [<ffffffff817a1795>] _raw_spin_lock_irq+0x15/0x50
Aug 06 18:54:02 host.example.com kernel:  RSP <ffff88088a1f7b48>
Aug 06 18:54:02 host.example.com kernel: CR2: 0000000000000100
Aug 06 18:54:02 host.example.com kernel: ---[ end trace d7ce3b74c5de09f9 ]---
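
To compare a trace like the one above against another boot log, it helps to reduce it to the fault message and the reliable call-trace frames. The following is a minimal sketch that assumes the frame format shown in this log (`[<address>] symbol+0xoff/0xlen [module]`, with `? ` marking unreliable frames). The function name `summarize_oops` is hypothetical, and the sample lines are abbreviated copies of lines from the log above.

```python
import re

# One call-trace frame: "[<addr>] [? ]symbol+0xoff/0xlen [module]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)

def summarize_oops(lines):
    """Return (fault message, [(symbol, module or None), ...]) for the first oops."""
    bug = None
    frames = []
    in_trace = False
    for line in lines:
        if bug is None and "BUG:" in line:
            bug = line.split("BUG:", 1)[1].strip()
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME.search(line)
            if not m:
                in_trace = False  # first non-frame line ends the trace
                continue
            unreliable, symbol, module = m.groups()
            if not unreliable:  # skip frames marked with "? "
                frames.append((symbol, module))
    return bug, frames

sample = [
    "kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000100",
    "kernel: IP: [<ffffffff817a1795>] _raw_spin_lock_irq+0x15/0x50",
    "kernel: Call Trace:",
    "kernel:  [<ffffffff8160aa78>] bitmap_start_sync+0x48/0x100",
    "kernel:  [<ffffffff8160c709>] bitmap_load+0x59/0x150",
    "kernel:  [<ffffffffa0747e25>] raid_resume+0x135/0x230 [dm_raid]",
    "kernel:  [<ffffffff81617a20>] ? table_load+0x370/0x370",
    "kernel: Code: 66 39 ca 75 f1 5d c3",
]
bug, frames = summarize_oops(sample)
print(bug)
for symbol, module in frames:
    print(symbol, module or "(core kernel)")
```

For this log the top reliable frames are `bitmap_start_sync`, `bitmap_load`, and `raid_resume` in `dm_raid`, which places the fault in the MD bitmap code reached while dm-raid resumes the device.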

Comment 5 Heinz Mauelshagen 2016-07-06 13:07:40 UTC
This early error, from when the new MD raid1 cluster support was being added, was fixed a while ago; closing. Please reopen if you can still reproduce it.