Bug 1269677 - cache policy change: NULL pointer dereference at dm_tm_new_block+0x15/0xb0 [dm_persistent_data]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Zdenek Kabelac
QA Contact: cluster-qe@redhat.com
Docs Contact:
Depends On:
Blocks: 1269959 1277134
Reported: 2015-10-07 18:56 EDT by Corey Marthaler
Modified: 2015-11-19 07:48 EST (History)
7 users (show)

See Also:
Fixed In Version: lvm2-2.02.130-4.el7
Doc Type: Bug Fix
Doc Text:
Cause: The kernel dm-cache target crashed with a NULL pointer dereference when the cleaner cache policy was used with writeback mode. Consequence: Uncaching a 'writeback' cached LV, or using the cleaner policy directly with writeback mode, resulted in a kernel crash. Fix: The code now detects use of the cleaner policy and implicitly changes the mode to writethrough. Result: The system no longer panics, and it should no longer loop endlessly when uncaching a cached LV.
Story Points: ---
Clone Of:
: 1269959 (view as bug list)
Environment:
Last Closed: 2015-11-19 07:48:06 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Corey Marthaler 2015-10-07 18:56:26 EDT
Description of problem:

[root@host-116 ~]# lvcreate -L 4G -n display_cache cache_sanity /dev/sdd1
  Logical volume "display_cache" created.
[root@host-116 ~]# lvcreate -L 4G -n pool cache_sanity /dev/sde1
  Logical volume "pool" created.
[root@host-116 ~]# lvcreate -L 12M -n pool_meta cache_sanity /dev/sde1
  Logical volume "pool_meta" created.
[root@host-116 ~]# lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writeback -c 32 --poolmetadata cache_sanity/pool_meta cache_sanity/pool
  WARNING: Converting logical volume cache_sanity/pool and cache_sanity/pool_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted cache_sanity/pool to cache pool.
[root@host-116 ~]# lvconvert --yes --type cache --cachepool cache_sanity/pool cache_sanity/display_cache
  Logical volume cache_sanity/display_cache is now cached.
[root@host-116 ~]# lvs -a -o +devices,cachepolicy
  LV                    Attr       LSize   Pool   Origin                Data%  Meta% Cpy%Sync Devices                Cache Policy
  display_cache         Cwi-a-C---   4.00g [pool] [display_cache_corig] 0.00   8.66  100.00   display_cache_corig(0) cleaner
  [display_cache_corig] owi-aoC---   4.00g                                                    /dev/sdd1(0)
  [lvol0_pmspare]       ewi-------  12.00m                                                    /dev/sdb1(0)
  [pool]                Cwi---C---   4.00g                              0.00   8.66  100.00   pool_cdata(0)          cleaner
  [pool_cdata]          Cwi-ao----   4.00g                                                    /dev/sde1(0)
  [pool_cmeta]          ewi-ao----  12.00m                                                    /dev/sde1(1024)


[root@host-116 ~]# lvchange --cachepolicy smq cache_sanity/display_cache
  Logical volume "display_cache" changed.
[root@host-116 ~]# lvs -a -o +devices,cachepolicy
  LV                    Attr       LSize   Pool   Origin                Data%  Meta% Cpy%Sync Devices                Cache Policy
  display_cache         Cwi-a-C---   4.00g [pool] [display_cache_corig] 0.00   8.69  100.00   display_cache_corig(0) smq
  [display_cache_corig] owi-aoC---   4.00g                                                    /dev/sdd1(0)
  [lvol0_pmspare]       ewi-------  12.00m                                                    /dev/sdb1(0)
  [pool]                Cwi---C---   4.00g                              0.00   8.69  100.00   pool_cdata(0)          smq
  [pool_cdata]          Cwi-ao----   4.00g                                                    /dev/sde1(0)
  [pool_cmeta]          ewi-ao----  12.00m                                                    /dev/sde1(1024)


[root@host-116 ~]# lvchange --cachepolicy mq cache_sanity/display_cache
[781422.660884] BUG: unable to handle kernel NULL pointer dereference at           (null)
[781422.661008] IP: [<ffffffffa0359135>] dm_tm_new_block+0x15/0xb0 [dm_persistent_data]
[781422.661008] PGD 0 
[781422.661008] Oops: 0000 [#1] SMP 
[781422.661008] Modules linked in: dm_cache_cleaner dm_cache_smq dm_cache_mq dm_cache dm_snapshot dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio sd_mod crc_t10dif crct10dif_generic crct10dif_common sg iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi iptable_filter ppdev pcspkr i2c_piix4 virtio_balloon i2c_core i6300esb parport_pc parport nfsd auth_rpcgss sch_fq_codel nfs_acl lockd grace dm_multipath sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk virtio_net ata_piix libata serio_raw virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
[781422.661008] CPU: 0 PID: 18559 Comm: lvchange Not tainted 3.10.0-319.el7.x86_64 #1
[781422.661008] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[781422.661008] task: ffff880024306780 ti: ffff880025fb0000 task.ti: ffff880025fb0000
[781422.661008] RIP: 0010:[<ffffffffa0359135>]  [<ffffffffa0359135>] dm_tm_new_block+0x15/0xb0 [dm_persistent_data]
[781422.661008] RSP: 0018:ffff880025fb3b20  EFLAGS: 00010282
[781422.661008] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff880025f3b9f5
[781422.661008] RDX: ffff880025fb3b68 RSI: ffffffffa035e210 RDI: 0000000000000000
[781422.661008] RBP: ffff880025fb3b48 R08: ffff880025f3b9f5 R09: ffff880025f3b9b8
[781422.661008] R10: ffffffffa0358adc R11: 0000000000000000 R12: ffff88001b53f000
[781422.661008] R13: ffff880025f3b9b8 R14: ffff880025f3b988 R15: ffff880025f3b8a0
[781422.661008] FS:  00007f7398b83880(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[781422.661008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[781422.661008] CR2: 0000000000000000 CR3: 000000003bfd6000 CR4: 00000000000006f0
[781422.661008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[781422.661008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[781422.661008] Stack:
[781422.661008]  0000000000000000 0000000000000005 ffff880025f3b8d0 ffff88001b53f000
[781422.661008]  ffff880025f3b9b8 ffff880025fb3b58 ffffffffa035bb8b ffff880025fb3ba0
[781422.661008]  ffffffffa0359294 ffffffffa0354f3d 00000000ce13a9d5 ffff880025f3b800
[781422.661008] Call Trace:
[781422.661008]  [<ffffffffa035bb8b>] new_block+0x1b/0x20 [dm_persistent_data]
[781422.661008]  [<ffffffffa0359294>] dm_btree_empty+0x34/0xe0 [dm_persistent_data]
[781422.661008]  [<ffffffffa0354f3d>] ? get_array_entry+0x5d/0xe0 [dm_persistent_data]
[781422.661008]  [<ffffffffa03541f2>] dm_array_empty+0x12/0x20 [dm_persistent_data]
[781422.661008]  [<ffffffffa038f739>] dm_cache_write_hints+0x169/0x1f0 [dm_cache]
[781422.661008]  [<ffffffff81638302>] ? down_write+0x12/0x30
[781422.661008]  [<ffffffffa038cc67>] cache_postsuspend+0x447/0x4b0 [dm_cache]
[781422.661008]  [<ffffffff810a5a80>] ? wake_up_atomic_t+0x30/0x30
[781422.661008]  [<ffffffffa00078aa>] dm_table_postsuspend_targets+0x4a/0x60 [dm_mod]
[781422.661008]  [<ffffffffa0004cb1>] dm_suspend+0xe1/0xf0 [dm_mod]
[781422.661008]  [<ffffffffa0009ff0>] ? table_load+0x380/0x380 [dm_mod]
[781422.661008]  [<ffffffffa000a184>] dev_suspend+0x194/0x250 [dm_mod]
[781422.661008]  [<ffffffffa0009ff0>] ? table_load+0x380/0x380 [dm_mod]
[781422.661008]  [<ffffffffa000aa35>] ctl_ioctl+0x255/0x500 [dm_mod]
[781422.661008]  [<ffffffff811939a7>] ? do_wp_page+0x3b7/0x800
[781422.661008]  [<ffffffffa000acf3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[781422.661008]  [<ffffffff811f0a45>] do_vfs_ioctl+0x2e5/0x4c0
[781422.661008]  [<ffffffff8128a6ee>] ? file_has_perm+0xae/0xc0
[781422.661008]  [<ffffffff811f0cc1>] SyS_ioctl+0xa1/0xc0
[781422.661008]  [<ffffffff816440c9>] system_call_fastpath+0x16/0x1b
[781422.661008] Code: 8b 3c 24 e8 6e f9 ff ff 48 8b 03 48 89 c7 ff 10 44 89 e8 eb c3 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 10 <8b> 0f 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 b8 f5 ff ff 
[781422.661008] RIP  [<ffffffffa0359135>] dm_tm_new_block+0x15/0xb0 [dm_persistent_data]
[781422.661008]  RSP <ffff880025fb3b20>
[781422.661008] CR2: 0000000000000000
[781422.704139] ---[ end trace 8be363a17bac77e1 ]---
[781422.704658] Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
3.10.0-319.el7.x86_64
lvm2-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-libs-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
lvm2-cluster-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-event-libs-1.02.107-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-2.el7    BUILT: Tue Sep 15 07:15:40 CDT 2015
Comment 2 Corey Marthaler 2015-10-07 19:24:01 EDT
This is reproducible with the exact setup and changes below. However, when simply looping randomly over cache policy changes, I'm unable to hit it.


lvcreate -L 4G -n display_cache cache_sanity /dev/sdg1
lvcreate -L 4G -n pool cache_sanity /dev/sdf1
lvcreate -L 12M -n pool_meta cache_sanity /dev/sdf1
lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writeback -c 64 --poolmetadata cache_sanity/pool_meta cache_sanity/pool
lvconvert --yes --type cache --cachepool cache_sanity/pool cache_sanity/display_cache
lvchange --cachepolicy smq cache_sanity/display_cache
lvchange --cachepolicy mq cache_sanity/display_cache
Comment 3 Zdenek Kabelac 2015-10-14 04:49:37 EDT
The lvm2 code now enforces 'writethrough' mode for the cleaner policy:

https://www.redhat.com/archives/lvm-devel/2015-October/msg00034.html
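The enforced mode can be checked mechanically from the `dmsetup table` output. A minimal sketch (the `cache_mode` helper is hypothetical, not part of lvm2; the awk field positions assume the dm-cache target table layout `name: start len cache <meta> <data> <origin> <blksize> <#features> <features...> <policy> <#policy args>`, with the cache mode appearing among the feature words):

```shell
# Hypothetical helper: print the feature words of a dm-cache target line
# from `dmsetup table`; the cache mode (writethrough/writeback) is among them.
cache_mode() {
    echo "$1" | awk '$4 == "cache" { n = $9; for (i = 10; i < 10 + n; i++) printf "%s", $i }'
}

# Sample line taken from this report (cleaner policy, mode forced to writethrough):
line="cache_sanity-display_cache: 0 8388608 cache 253:4 253:3 253:5 128 1 writethrough cleaner 0"
cache_mode "$line"   # → writethrough
```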
Comment 5 Corey Marthaler 2015-10-15 15:50:55 EDT
This fix didn't make the 130-5 rpm.

3.10.0-322.el7.x86_64
lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015

[root@host-082 ~]# vgcreate cache_sanity /dev/sd[abcdefgh]1
  Volume group "cache_sanity" successfully created
[root@host-082 ~]# lvcreate -L 4G -n display_cache cache_sanity /dev/sdg1
  Logical volume "display_cache" created.
[root@host-082 ~]# lvcreate -L 4G -n pool cache_sanity /dev/sdf1
  Logical volume "pool" created.
[root@host-082 ~]# lvcreate -L 12M -n pool_meta cache_sanity /dev/sdf1
  Logical volume "pool_meta" created.

[root@host-082 ~]# lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writeback -c 64 --poolmetadata cache_sanity/pool_meta cache_sanity/pool
  WARNING: Converting logical volume cache_sanity/pool and cache_sanity/pool_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted cache_sanity/pool to cache pool.
[root@host-082 ~]# lvconvert --yes --type cache --cachepool cache_sanity/pool cache_sanity/display_cache
  Logical volume cache_sanity/display_cache is now cached.

[root@host-082 ~]# lvs -a -o +devices,cache_policy,cache_mode
  LV                    VG           Attr       LSize   Pool   Origin                Data%  Meta%  Cpy%Sync Devices                Cache Policy Cachemode
  display_cache         cache_sanity Cwi-a-C---   4.00g [pool] [display_cache_corig] 0.00   4.39   100.00   display_cache_corig(0) cleaner      writeback
  [display_cache_corig] cache_sanity owi-aoC---   4.00g                                                     /dev/sdg1(0)
  [lvol0_pmspare]       cache_sanity ewi-------  12.00m                                                     /dev/sda1(0)
  [pool]                cache_sanity Cwi---C---   4.00g                              0.00   4.39   100.00   pool_cdata(0)          cleaner      writeback
  [pool_cdata]          cache_sanity Cwi-ao----   4.00g                                                     /dev/sdf1(0)
  [pool_cmeta]          cache_sanity ewi-ao----  12.00m                                                     /dev/sdf1(1024)
[root@host-082 ~]# lvchange --cachepolicy smq cache_sanity/display_cache
  Logical volume "display_cache" changed.
[root@host-082 ~]# lvchange --cachepolicy mq cache_sanity/display_cache


[  266.840237] device-mapper: cache cleaner: version 1.0.0 loaded
Oct 15 14:43:58 host-082 kernel: device-mapper: cache cleaner: version 1.0.0 loaded
[  301.147662] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  301.148017] IP: [<ffffffffa034c135>] dm_tm_new_block+0x15/0xb0 [dm_persistent_data]
[  301.148017] PGD 0 
[  301.148017] Oops: 0000 [#1] SMP 
[...]
Comment 6 Zdenek Kabelac 2015-10-16 03:59:52 EDT
This also needs a kernel fix.

On the lvm2 side, 'dmsetup table' should now show 'writethrough' mode when the 'cleaner' policy is in use.

(Later we will provide more command warnings about the enforced use of 'writethrough' mode when the cleaner policy is set.)
Comment 7 Corey Marthaler 2015-10-19 11:40:38 EDT
This is fixed with the latest kernel. Even though "writeback" mode was specified on the command line, "writethrough" is shown by 'dmsetup table'.

[root@host-113 ~]# lvconvert --yes --type cache-pool --cachepolicy cleaner --cachemode writeback -c 64 --poolmetadata cache_sanity/pool_meta cache_sanity/pool
  WARNING: Converting logical volume cache_sanity/pool and cache_sanity/pool_meta to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
  Converted cache_sanity/pool to cache pool.
[root@host-113 ~]# lvconvert --yes --type cache --cachepool cache_sanity/pool cache_sanity/display_cache
  Logical volume cache_sanity/display_cache is now cached.

[root@host-113 ~]# lvs -a -o +devices,cache_policy,cache_mode
  LV                    Attr       LSize  Pool   Origin                Data%  Meta%  Cpy%Sync  Devices                Cache Policy Cachemode
  display_cache         Cwi-a-C---  4.00g [pool] [display_cache_corig] 0.00   4.39   100.00    display_cache_corig(0) cleaner      writeback
  [display_cache_corig] owi-aoC---  4.00g                                                      /dev/sda1(0)
  [lvol0_pmspare]       ewi------- 12.00m                                                      /dev/sda1(1024)
  [pool]                Cwi---C---  4.00g                              0.00   4.39   100.00    pool_cdata(0)          cleaner      writeback
  [pool_cdata]          Cwi-ao----  4.00g                                                      /dev/sdb1(0)
  [pool_cmeta]          ewi-ao---- 12.00m                                                      /dev/sdb1(1024)

# cache mode is "writethrough" even though "writeback" was specified:
[root@host-113 ~]# dmsetup table
cache_sanity-display_cache: 0 8388608 cache 253:4 253:3 253:5 128 1 writethrough cleaner 0
cache_sanity-display_cache_corig: 0 8388608 linear 8:1 2048
cache_sanity-pool_cdata: 0 8388608 linear 8:17 2048
cache_sanity-pool_cmeta: 0 24576 linear 8:17 8390656

[root@host-113 ~]# lvchange --cachepolicy smq cache_sanity/display_cache
  Logical volume "display_cache" changed.
[root@host-113 ~]# lvchange --cachepolicy mq cache_sanity/display_cache
  Logical volume "display_cache" changed.
[root@host-113 ~]# dmsetup table
cache_sanity-display_cache: 0 8388608 cache 253:4 253:3 253:5 128 1 writeback mq 0
cache_sanity-display_cache_corig: 0 8388608 linear 8:1 2048
cache_sanity-pool_cdata: 0 8388608 linear 8:17 2048
cache_sanity-pool_cmeta: 0 24576 linear 8:17 8390656
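The two tables above can be summarized in one pass. A minimal sketch (assuming the same dm-cache table layout as in this report, with the policy word immediately following the feature list) that prints `<policy> <mode>` per cache target, using sample lines copied from the output above:

```shell
# Print "<policy> <mode>" for each dm-cache target line; layout assumption:
# name: start len cache meta data origin blksize #feat feat... policy #pargs
awk '$4 == "cache" { n = $9; print $(10 + n), $10 }' <<'EOF'
cache_sanity-display_cache: 0 8388608 cache 253:4 253:3 253:5 128 1 writethrough cleaner 0
cache_sanity-display_cache: 0 8388608 cache 253:4 253:3 253:5 128 1 writeback mq 0
EOF
# → cleaner writethrough
# → mq writeback
```

This makes the behavior visible at a glance: the mode is forced to writethrough only while the cleaner policy is active, and reverts to the requested writeback once the policy changes to mq.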


3.10.0-325.el7.x86_64

lvm2-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-libs-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
lvm2-cluster-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-event-libs-1.02.107-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
device-mapper-persistent-data-0.5.5-1.el7    BUILT: Thu Aug 13 09:58:10 CDT 2015
cmirror-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
sanlock-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
sanlock-lib-3.2.4-1.el7    BUILT: Fri Jun 19 12:48:49 CDT 2015
lvm2-lockd-2.02.130-5.el7    BUILT: Wed Oct 14 08:27:29 CDT 2015
Comment 8 errata-xmlrpc 2015-11-19 07:48:06 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2147.html
