Bug 1506287

Summary: Kernel NULL pointer dereference, device mapper in call trace
Product: [Community] LVM and device-mapper Reporter: Tony Asleson <tasleson>
Component: device-mapperAssignee: Mikuláš Patočka <mpatocka>
Status: NEW --- QA Contact: cluster-qe <cluster-qe>
Severity: unspecified Docs Contact:
Priority: medium    
Version: 2.02.173CC: agk, heinzm, jbrassow, mpatocka, msnitzer, prajnoha, zkabelac
Target Milestone: ---Keywords: Triaged
Target Release: ---Flags: rule-engine: lvm-technical-solution?
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Kernel oops output none

Description Tony Asleson 2017-10-25 15:17:58 UTC
Created attachment 1343311 [details]
Kernel oops output

Description of problem:

While running stratis unit tests on loopback devices I encountered a kernel Oops which has a call trace which shows dm in the stack.

Version-Release number of selected component (if applicable):
F27

4.13.8-300.fc27.x86_64
lvm2-2.02.173-4.fc27
device-mapper-1.02.142-4.fc27

$ sudo lvm version
  LVM version:     2.02.173(2) (2017-07-20)
  Library version: 1.02.142 (2017-07-20)
  Driver version:  4.37.0
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-lvm1_fallback --enable-fsadm --with-pool=internal --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-applib --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --enable-python2-bindings --enable-python3-bindings --with-cluster=internal --with-clvmd=corosync --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --enable-lvmetad --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-dbus-service --enable-notify-dbus --enable-dmfilemap

How reproducible:
Unsure

Steps to Reproduce:
I was running the stratis `make test-loop` unit tests with some experimental code which systematically umounts all stratis mounted XFS filesystems and then runs `dmsetup remove_all --force` between each unit test to ensure we start clean for each unit test.  With these changes in place I encountered a kernel oops.

Actual results:
Kernel Oops

Expected results:
No kernel Oops

Additional info:

Comment 1 Tony Asleson 2017-10-30 16:22:30 UTC
It appears that when dm devices are stacked that they need to be removed in reverse order.  If you use dmsetup remove_all --force or systematically remove devices using the deferred remove flag in an incorrect order you cause this kernel oops.  If this is expected we can certainly close out this issue.

Comment 2 Mike Snitzer 2017-10-30 20:12:45 UTC
(In reply to Tony Asleson from comment #1)
> It appears that when dm devices are stacked that they need to be removed in
> reverse order.  If you use dmsetup remove_all --force or systematically
> remove devices using the deferred remove flag in an incorrect order you
> cause this kernel oops.  If this is expected we can certainly close out this
> issue.

Yeah, don't use remove_all --force.  Using --force is a HUGE hammer.  Your testing needs to be way more careful to teardown more systematically.

As for using deferred removal.  That _should_ be OK.  Because it doesn't actually remove the device until it is no longer used.

So I'm not sure why you'd be having problems with it.

Comment 3 Tony Asleson 2017-10-31 15:10:50 UTC
(In reply to Mike Snitzer from comment #2)
> As for using deferred removal.  That _should_ be OK.  Because it doesn't
> actually remove the device until it is no longer used.
> 
> So I'm not sure why you'd be having problems with it.

Well maybe there is a bug when using the deferred flag... when I don't use the flag I get an error that device is busy, but I don't oops the kernel.

Basically I just iterate over all the dm devices doing a remove with the deferred flag, so in theory it should result in all the devices getting removed eventually, but doing so I can cause a kernel oops pretty easily.  If I repeatedly loop over the device list without the deferred flag, I can get all the devices to get removed without causing a kernel oops while getting the device in use errors.

Comment 4 Tony Asleson 2017-11-17 19:57:05 UTC
I discovered that the unit tests were being massively threaded by default, thus we had concurrent access to the device mapper ioctl interface.  I would still assume that in this use case that the dm code would not end up dereferencing a NULL pointer.  Hopefully this will help in root cause determination.

Comment 5 Zdenek Kabelac 2017-11-17 20:21:49 UTC
The kernel trace suggest - there was already a trouble reported by thin-pool targets.


remove --force  is 'very nasty' last-defense logic implement to help with cases the device is holding some resources but cannot be closed - so with --force at least 'error' target is put as replacement.

However this 'replacement' is NOT support with more complex target like thin-pool  as some operation inside this complex target do support only 'clean' target removal not such brute force trying.

So I assume there can be code path where thin-pool is put into an untested territory where all 'reads/write'  fail in  preresume case.

The logic behind kernel 'preresume' somewhat changed it's original meaning of simple validation of user's passed state into a very complex 'metadata' parsing stage and at it's known 'postponned' issue. 

There is problem that such preresume/resume failure is typically unsolvable by any user-land app as happens after commit point...

Comment 6 Mikuláš Patočka 2017-11-21 00:31:45 UTC
BTW. what does the dmsetup --force flag really do?

My tests show that it does nothing (dmsetup remove --force or remove_all --force don't remove an open device). And looking at the kernel source, I don't see any code that tests for that flag at all in dev_remove or remove_all.

dm_hash_remove_all will perform forced remove only if the first argument is false, but it is called with the first false argument only from dm_hash_exit and that is called from dm_interface_init and dm_interface_exit - that is module initialization and cleanup and it can't be triggered by userspace ioctls.

I.e. there seems to be no way how userspace could trigger forced removal of a device mapper device.

BTW. could it be that your test scripts do "rmmod -f"? That could cause crashes and the scripts shouldn't do that.

Comment 7 Zdenek Kabelac 2017-11-21 15:49:36 UTC
(In reply to Mikulas Patocka from comment #6)
> BTW. what does the dmsetup --force flag really do?
> 
> My tests show that it does nothing (dmsetup remove --force or remove_all
> --force don't remove an open device). And looking at the kernel source, I
> don't see any code that tests for that flag at all in dev_remove or
> remove_all.
> 


Just to answer this specifically -

'dmsetup remove --force'   

tries to 'remove'  a device - however if device is found to be in-use (opened) - it cannot be dropped from table - but resource associated with a device can be usually dropped -

so i.e. if linear device holds a reference to  /dev/sdX device - it can be dropped by replacing target with 'error' segment of the equal size.

The clear purpose is - any further I/O gets error (possibly making it even possible to actually close such device -  of course this may from time to time hit a bug on the road - but i.e.   ext4 or XFS handles such case well enough these days.

But of course - whever user is using --force - he is doing something seriously unwanted and should consider twice whether such risk is worth...

Comment 8 Mikuláš Patočka 2017-11-21 19:22:09 UTC
Zdenek Kabeláč: replacing a device with error target shouldn't crash - so the advices to not use --force are not correct and we should find the bug and fix it.