Bug 1559692

Summary: Hang or NULL pointer dereference if reading sysfs during VDO start/stop.
Product: Red Hat Enterprise Linux 7
Reporter: Sweet Tea Dorminy <sweettea>
Component: kmod-kvdo
Assignee: Thomas Jaskiewicz <tjaskiew>
Status: CLOSED ERRATA
QA Contact: Jakub Krysl <jkrysl>
Severity: unspecified
Docs Contact:
Priority: high
Version: 7.5
CC: awalsh, bgurney, jkrysl, rhandlin, tjaskiew
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 6.1.1.60
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1567744 (view as bug list)
Environment:
Last Closed: 2018-10-30 09:39:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1567744

Description Sweet Tea Dorminy 2018-03-23 03:21:31 UTC
Description of problem:
If someone reads the VDO sysfs entries while a VDO is starting up or shutting down, a hang or a NULL pointer dereference may occur. The parts of the VDO that the sysfs handler needs may already have been freed, or may not yet exist.

Version-Release number of selected component (if applicable):
6.1.0.155

How reproducible:
1 in 10

Steps to Reproduce:
1. In a shell, run 'while true; do cat /sys/kvdo/vdo0/statistics/data_blocks_used || true; done'
2. Make a VDO but don't start it.
3. Start and stop the VDO in a loop.
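
The steps above can be sketched as a pair of shell functions. This is only an illustration, not a script taken from this report: the function names and the VDO_NAME, STAT, and VDO_CMD variables are hypothetical, the loops are bounded instead of infinite, and the vdo start/stop invocations are the ones given in Comment 2. It assumes the VDO has already been created and activated.

```shell
#!/bin/sh
# Hypothetical reproducer sketch; all names here are illustrative.
VDO_NAME=${VDO_NAME:-vdo0}
STAT=${STAT:-/sys/kvdo/$VDO_NAME/statistics/data_blocks_used}
VDO_CMD=${VDO_CMD:-vdo}

# Read the sysfs statistic n times, ignoring failures that happen
# while the entry does not exist (VDO stopped or not yet started).
read_stat_loop() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        cat "$STAT" 2>/dev/null || true
        i=$((i + 1))
    done
}

# Start and stop the VDO n times; stop at the first failing command.
cycle_vdo() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        "$VDO_CMD" start --name "$VDO_NAME" || return 1
        "$VDO_CMD" stop --name "$VDO_NAME" || return 1
        i=$((i + 1))
    done
}
```

One possible way to use it, as root, is to run `read_stat_loop 1000000 &` in the background and then `cycle_vdo 100` in the foreground. On an affected build, the foreground loop would be expected to stall once the hang occurs.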

Actual results:
Eventually, the kernel reports 120-second hung task warnings for both a dmsetup command and a cat process. Alternatively, a NULL pointer dereference may occur.

Expected results:
No hung tasks or crashes.

Additional info:

Comment 2 Jakub Krysl 2018-04-13 14:24:40 UTC
Reproduced, acking...

1) # vdo create --name vdo0 --device /dev/sdb --activate disabled
2) # vdo activate --name vdo0
3) # while true; do vdo start --name vdo0 --verbose; vdo stop --name vdo0 --verbose; done;
4) (in a separate terminal, after a few cycles of step 3) # while true; do cat /sys/kvdo/vdo0/statistics/data_blocks_used || true; done;
5) Both terminals get stuck.
6) sudo shutdown -r now
7) This appears on the console:

[  OK  ] Stopped Availability of block devices.
[     *] (2 of 2) A stop job is running for ... user root (1min 25s / 1min 30s)
[  963.792399] INFO: task dmsetup:7961 blocked for more than 120 seconds.
[  963.827624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  963.865243] Call Trace:
[  963.876890]  [<ffffffff89b12f49>] schedule+0x29/0x70
[  963.900072]  [<ffffffff896a124d>] __kernfs_remove+0x17d/0x260
[  963.926933]  [<ffffffff894bbe20>] ? wake_up_atomic_t+0x30/0x30
[  963.954539]  [<ffffffff896a21b1>] kernfs_remove+0x21/0x30
[  963.980037]  [<ffffffff896a46c0>] sysfs_remove_dir+0x50/0x80
[  964.006754]  [<ffffffff8974ccb8>] kobject_del+0x18/0x50
[  964.031334]  [<ffffffff8974cd4e>] kobject_release+0x5e/0x1b0
[  964.057751]  [<ffffffff8974cc08>] kobject_put+0x28/0x60
[  964.082184]  [<ffffffffc087d663>] freeKernelLayer+0x223/0x2f0 [kvdo]
[  964.112320]  [<ffffffffc086e8ad>] vdoDtr+0xfd/0x1b0 [kvdo]
[  964.138357]  [<ffffffff89548250>] ? dyntick_save_progress_counter+0x30/0x30
[  964.171241]  [<ffffffffc0140763>] dm_table_destroy+0x73/0x120 [dm_mod]
[  964.171252]  [<ffffffffc013c726>] __dm_destroy+0x136/0x230 [dm_mod]
[  964.171269]  [<ffffffffc013ec23>] dm_destroy+0x13/0x20 [dm_mod]
[  964.171281]  [<ffffffffc0144c5e>] dev_remove+0x11e/0x1a0 [dm_mod]
[  964.171292]  [<ffffffffc0145b02>] ctl_ioctl+0x212/0x4e0 [dm_mod]
[  964.171308]  [<ffffffffc0144b40>] ? dev_suspend+0x260/0x260 [dm_mod]
[  964.171319]  [<ffffffffc0145dde>] dm_ctl_ioctl+0xe/0x20 [dm_mod]
[  964.171325]  [<ffffffff8962fb90>] do_vfs_ioctl+0x350/0x560
[  964.171329]  [<ffffffff896d82bf>] ? file_has_perm+0x9f/0xb0
[  964.171333]  [<ffffffff8962fe41>] SyS_ioctl+0xa1/0xc0
[  964.171340]  [<ffffffff89b1f7d5>] system_call_fastpath+0x1c/0x21
[  964.171343] INFO: task cat:7963 blocked for more than 120 seconds.
[  964.171343] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  964.171510] Call Trace:
[  964.171515]  [<ffffffff89751384>] ? __radix_tree_lookup+0x84/0xf0
[  964.171521]  [<ffffffff89b12f49>] schedule+0x29/0x70
[  964.171524]  [<ffffffff89b108b9>] schedule_timeout+0x239/0x2c0
[  964.171530]  [<ffffffff895962de>] ? filemap_fault+0x17e/0x490
[  964.171535]  [<ffffffff894f8c5f>] ? __getnstimeofday64+0x3f/0xd0
[  964.171538]  [<ffffffff894f8cfe>] ? getnstimeofday64+0xe/0x30
[  964.171542]  [<ffffffff89b132fd>] wait_for_completion+0xfd/0x140
[  964.171548]  [<ffffffff894cee80>] ? wake_up_state+0x20/0x20
[  964.171570]  [<ffffffffc0873040>] ? finishVDOAction+0x20/0x20 [kvdo]
[  964.171587]  [<ffffffffc0872fb2>] performKVDOOperation+0xb2/0xe0 [kvdo]
[  964.171603]  [<ffffffffc0873040>] ? finishVDOAction+0x20/0x20 [kvdo]
[  964.171616]  [<ffffffffc0873040>] ? finishVDOAction+0x20/0x20 [kvdo]
[  964.171632]  [<ffffffffc08737a7>] getKVDOStatistics+0x57/0x80 [kvdo]
[  964.171648]  [<ffffffffc0877c06>] poolStatsDataBlocksUsedShow+0x36/0x70 [kvdo]
[  964.171663]  [<ffffffffc0873c41>] poolStatsAttrShow+0x21/0x30 [kvdo]
[  964.171667]  [<ffffffff896a3e8f>] sysfs_kf_seq_show+0xcf/0x1f0
[  964.171671]  [<ffffffff896a25d6>] kernfs_seq_show+0x26/0x30
[  964.171675]  [<ffffffff89641410>] seq_read+0x110/0x3f0
[  964.171679]  [<ffffffff896a2e35>] kernfs_fop_read+0xf5/0x160
[  964.171683]  [<ffffffff8961ab3f>] vfs_read+0x9f/0x170
[  964.171686]  [<ffffffff8961ba0f>] SyS_read+0x7f/0xf0
[  964.171692]  [<ffffffff89b1f7d5>] system_call_fastpath+0x1c/0x21
[   ***] (1 of 2) A stop job is running for ...1 of user root (2min 28s / 3min)
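
When checking a long console or dmesg capture for this symptom, the hung-task reports can be pulled out with a small filter. The sketch below is only an illustration: hung_tasks is a hypothetical helper name, and it assumes the "INFO: task NAME:PID blocked for more than N seconds." message format seen in the log above.

```shell
# Print one "name:pid" per line for each hung-task report on stdin.
# Assumes the kernel message format shown in the console output above.
hung_tasks() {
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than [0-9]* seconds\..*/\1/p'
}
```

For example, `dmesg | hung_tasks` lists the blocked tasks. Applied to the log above, this would list dmsetup:7961 and cat:7963.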

Comment 3 Thomas Jaskiewicz 2018-04-13 18:24:12 UTC
*** Bug 1567215 has been marked as a duplicate of this bug. ***

Comment 6 Jakub Krysl 2018-07-03 08:01:22 UTC
Tested on:
RHEL-7.6-20180626.0
kernel-3.10.0-915.el7
kmod-vdo-6.1.1.99-1.el7
vdo-6.1.1.99-2.el7

I was no longer able to reproduce this; the stop/start cycle keeps going.
Regression testing did not find any issues.

Comment 8 errata-xmlrpc 2018-10-30 09:39:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3094