Bug 1180717

Summary: device-mapper causes kernel panic/Oops if iSCSI devices containing old thinp volumes are discovered
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Thin Provisioning
Version: 7.1
Status: CLOSED DUPLICATE
Severity: high
Priority: unspecified
Keywords: Regression
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-01-12 19:30:27 UTC
Reporter: Nenad Peric <nperic>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: Cluster QE <mspqa-list>
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac

Description Nenad Peric 2015-01-09 19:18:10 UTC
Description of problem:
After installing the latest RHEL 7.1 nightly (RHEL 7.1-20150109.n.0), the machine either crashes and reboots into the kdump kernel at first boot, or manages to boot and runs for a few minutes before hitting a kernel Oops/panic as soon as the network comes up and the iSCSI devices are discovered.


Version-Release number of selected component (if applicable):
RHEL 7.1-20150109.n.0


How reproducible:

Every time

Steps to Reproduce:

The exact trigger is unknown, but this is what the same disks show when attached to a RHEL 7.0 server (which does not have this issue):

[root@tardis-01 ~]# vgs
  VG              #PV #LV #SN Attr   VSize   VFree
  rhel_tardis-01    1   3   0 wz--n- 278.88g      0
  thinpool_1_9059   2   6   0 wz--n- 186.25g 185.24g
[root@tardis-01 ~]# lvs -a
  LV                       VG              Attr       LSize   Pool             Origin Data%  Move Log Cpy%Sync Convert
  home                     rhel_tardis-01  -wi-ao---- 224.88g
  root                     rhel_tardis-01  -wi-ao----  50.00g
  swap                     rhel_tardis-01  -wi-ao----   4.00g
  [lvol0_pmspare]          thinpool_1_9059 ewi-------   4.00m
  thinpool_1_90590         thinpool_1_9059 twima-tz--   1.00g                           0.00
  [thinpool_1_90590_tdata] thinpool_1_9059 Twi-ao----   1.00g
  [thinpool_1_90590_tmeta] thinpool_1_9059 ewi-ao----   4.00m
  virt1                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt2                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt3                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt4                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt5                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
[root@tardis-01 ~]# dmsetup ls
rhel_tardis--01-swap	(253:0)
rhel_tardis--01-root	(253:1)
thinpool_1_9059-thinpool_1_90590-tpool	(253:5)
thinpool_1_9059-thinpool_1_90590_tdata	(253:4)
thinpool_1_9059-thinpool_1_90590_tmeta	(253:3)
thinpool_1_9059-thinpool_1_90590	(253:130)
thinpool_1_9059-virt5	(253:10)
thinpool_1_9059-virt4	(253:9)
thinpool_1_9059-virt3	(253:8)
rhel_tardis--01-home	(253:2)
thinpool_1_9059-virt2	(253:7)
thinpool_1_9059-virt1	(253:6)
[root@tardis-01 ~]# dmsetup status
rhel_tardis--01-swap: 0 8388608 linear
rhel_tardis--01-root: 0 104857600 linear
thinpool_1_9059-thinpool_1_90590-tpool: 0 2097152 thin-pool 5 15/1024 0/16384 - rw no_discard_passdown queue_if_no_space
thinpool_1_9059-thinpool_1_90590_tdata: 0 2097152 linear
thinpool_1_9059-thinpool_1_90590_tmeta: 0 8192 linear
thinpool_1_9059-thinpool_1_90590: 0 2097152 linear
thinpool_1_9059-virt5: 0 1024000 thin 0 -
thinpool_1_9059-virt4: 0 1024000 thin 0 -
thinpool_1_9059-virt3: 0 1024000 thin 0 -
rhel_tardis--01-home: 0 471597056 linear
thinpool_1_9059-virt2: 0 1024000 thin 0 -
thinpool_1_9059-virt1: 0 1024000 thin 0 -
[root@tardis-01 ~]# lvs -a -o+devices
  LV                       VG              Attr       LSize   Pool             Origin Data%  Move Log Cpy%Sync Convert Devices
  home                     rhel_tardis-01  -wi-ao---- 224.88g                                                          /dev/sda2(1024)
  root                     rhel_tardis-01  -wi-ao----  50.00g                                                          /dev/sda2(58592)
  swap                     rhel_tardis-01  -wi-ao----   4.00g                                                          /dev/sda2(0)
  [lvol0_pmspare]          thinpool_1_9059 ewi-------   4.00m                                                          /dev/sdd1(0)
  thinpool_1_90590         thinpool_1_9059 twima-tz--   1.00g                           0.00                           thinpool_1_90590_tdata(0)
  [thinpool_1_90590_tdata] thinpool_1_9059 Twi-ao----   1.00g                                                          /dev/sdd1(1)
  [thinpool_1_90590_tmeta] thinpool_1_9059 ewi-ao----   4.00m                                                          /dev/sdf1(0)
  virt1                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt2                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt3                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt4                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
  virt5                    thinpool_1_9059 Vwi-a-tz-- 500.00m thinpool_1_90590          0.00
[root@tardis-01 ~]#

Actual results:

[  202.273535] device-mapper: thin: Data device (dm-4) discard unsupported: Disabling discard passdown. 
2015-01-09 20:05:42,243 backend linfo: INFO BackendFactory: Started to connect. 
2015-01-09 20:05:42,247 backend linfo: INFO BackendFactory: Connected.  Address: IPv4Address(TCP, '10.34.71.138', 12432) 
2015-01-09 20:05:42,247 backend linfo: INFO BackendFa
[  202.328672] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  202.338271] IP: [<ffffffff810a06fb>] __wake_up_common+0x2b/0x90
[  202.344892] PGD c1a920067 PUD c1c04d067 PMD 0
[  202.349911] Oops: 0000 [#1] SMP
[  202.353526] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iptable_filter ip_tables sctp kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sp5100_tco ipmi_devintf tpm_infineon ipmi_si serio_raw amd64_edac_mod ipmi_msghandler i2c_piix4 edac_mce_amd pcspkr edac_core shpchp k10temp fam15h_power acpi_cpufreq dm_multipath xfs libcrc32c sr_mod cdrom ata_generic pata_acpi mgag200 syscopyarea sysfillrect sysimgblt igb drm_kms_helper sd_mod ttm crc_t10dif ptp ahci crct10dif_common libahci pata_atiixp pps_core drm dca libata be2net hpsa i2c_algo_bit vxlan i2c_core ip_tunnel dm_mirror dm_region_hash dm_log dm_mod
[  202.431133] CPU: 28 PID: 275 Comm: kworker/u64:1 Not tainted 3.10.0-221.el7.x86_64 #1
[  202.439871] Hardware name: HP ProLiant DL165 G7, BIOS O37 10/17/2012
[  202.448510] Workqueue: dm-thin do_worker [dm_thin_pool] 
[  202.454355] task: ffff88061d0e2220 ti: ffff88061d110000 task.ti: ffff88061d110000 
[  202.462704] RIP: 0010:[<ffffffff810a06fb>]  [<ffffffff810a06fb>] __wake_up_common+0x2b/0x90 
[  202.472041] RSP: 0018:ffff88061d113cd8  EFLAGS: 00010086 
[  202.477972] RAX: 0000000000000286 RBX: ffff88181d37b598 RCX: 0000000000000000 
[  202.485932] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88181d37b598 
[  202.493892] RBP: ffff88061d113d10 R08: 0000000000000000 R09: 000000018020001f 
[  202.501860] R10: ffffffff81159187 R11: ffffea004878df00 R12: ffff88181d37b5a0 
[  202.509828] R13: 0000000000000286 R14: 0000000000000000 R15: 0000000000000003 
[  202.517790] FS:  00007f04faffd700(0000) GS:ffff88183fd00000(0000) knlGS:0000000000000000 
[  202.526824] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b 
[  202.533234] CR2: 0000000000000000 CR3: 0000000c1e3fb000 CR4: 00000000000407e0 
[  202.541202] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[  202.549168] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
[  202.557135] Stack: 
[  202.559378]  00000001a04067ea 0000000000000000 ffff88181d37b598 ffff88181d37b590 
[  202.567681]  0000000000000286 0000000000000246 0000000000000001 ffff88061d113d38 
[  202.575986]  ffffffff810a241c ffff880c110e01f8 ffff880c110e0000 ffff88061d113db0 
[  202.584290] Call Trace: 
[  202.587024]  [<ffffffff810a241c>] complete+0x3c/0x50 
[  202.592570]  [<ffffffffa0401ad8>] thin_put+0x28/0x30 [dm_thin_pool] 
[  202.599566]  [<ffffffffa0401b44>] get_next_thin+0x64/0x70 [dm_thin_pool] 
[  202.607048]  [<ffffffffa0406bf3>] do_worker+0x323/0x860 [dm_thin_pool] 
[  202.614335]  [<ffffffff8108f0ab>] process_one_work+0x17b/0x470 
[  202.620848]  [<ffffffff8108fe8b>] worker_thread+0x11b/0x400 
[  202.627070]  [<ffffffff8108fd70>] ? rescuer_thread+0x400/0x400 
[  202.633582]  [<ffffffff8109726f>] kthread+0xcf/0xe0 
[  202.639029]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 
[  202.646317]  [<ffffffff816151fc>] ret_from_fork+0x7c/0xb0 
[  202.652343]  [<ffffffff810971a0>] ? kthread_create_on_node+0x140/0x140 
[  202.659625] Code: 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67 08 53 48 83 ec 10 89 55 cc 48 8b 57 08 4c 89 45 d0 <48> 8b 0a 49 39 d4 48 8d 42 e8 4c 8d 69 e8 75 0b eb 3b 0f 1f 00  
[  202.681525] RIP  [<ffffffff810a06fb>] __wake_up_common+0x2b/0x90 
[  202.688241]  RSP <ffff88061d113cd8> 
[  202.692130] CR2: 0000000000000000 

Expected results:

Discovering iSCSI devices that do not belong to this system should not cause a kernel panic.

Additional info:

Comment 3 Scott Dodson 2015-01-12 15:46:49 UTC
I occasionally see the same behavior when working with Docker. ABRT reported it as Bug 1181218.

Comment 4 Nenad Peric 2015-01-12 19:23:09 UTC
Looks like the same issue. 

I first hit the ABRT report during my tests, and ran into this issue later when I tried a 'clean' start by reinstalling the machine with the newest 7.1 snapshot.

I suspected it might be related to how the machine was set up (upgraded several times rather than cleanly installed), but that was clearly not the cause.

Comment 5 Mike Snitzer 2015-01-12 19:30:27 UTC

*** This bug has been marked as a duplicate of bug 1175282 ***