Bug 1018171

Summary: /dev/md/xxx symlink not removed
Product: Red Hat Enterprise Linux 6
Reporter: Jan Safranek <jsafrane>
Component: udev
Assignee: Harald Hoyer <harald>
Status: CLOSED ERRATA
QA Contact: Leos Pol <lpol>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.4
CC: jsafrane, lpol, mnavrati, psklenar, udev-maint-list
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If udev processed the uevent queue for a device that had already been removed, the internal handling failed, and consequently some symbolic links were not removed for these devices. Now, udev no longer relies on the existence of the device when dealing with the backlog of the uevent queue, and all symbolic links are removed as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 07:19:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
- reproducer. Don't forget to change DISKS variable inside! (flags: none)
- reproducer. Don't forget to change DISKS variables inside! (flags: none)
- Updated reproducer (flags: none)
- script output (flags: none)
- syslog messages (flags: none)

Description Jan Safranek 2013-10-11 11:12:33 UTC
Created attachment 811002 [details]
reproducer. Don't forget to change DISKS variable inside!

'mdadm -S /dev/md/test' sometimes does not remove /dev/md/test symlink, while /dev/md127 device is destroyed correctly.

Version-Release number of selected component (if applicable):
udev-147-2.48.el6.x86_64
kernel-2.6.32-422.el6.x86_64

How reproducible:
sometimes

Steps to Reproduce:
1. run attached test script

Actual results (typically):
Failed after: 2

Expected results:
Endless loop of mdadm -C and mdadm -S

Additional info:
It is reproducible only sometimes. When I clear both RAID members with 'dd', I usually get correct behavior. I have a complex test suite for the OpenLMI project which creates/destroys MD RAIDs, partitions, LVs, various filesystems etc., and after the test suite finishes, I can reproduce the bug reliably. It seems some stray data on the RAID members or in memory makes the bug reproducible.
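
The attached script is not reproduced here; for illustration only, its core is a create/stop loop roughly like the following sketch (DISKS is a placeholder, point it at scratch devices):

  DISKS="/dev/sda /dev/sdb1"    # placeholder, change before running
  i=0
  while true; do
      # create a RAID0 array from the two members, then tear it down
      mdadm -C /dev/md/test -l 0 -n 2 --run $DISKS || break
      mdadm -S /dev/md/test || break
      sleep 2
      # after -S, the /dev/md/test symlink should be gone; if the
      # (now dangling) symlink is still there, the bug has hit
      if [ -L /dev/md/test ]; then
          echo "Failed after: $i"
          break
      fi
      i=$((i + 1))
  done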

Comment 1 Jan Safranek 2013-10-11 11:13:40 UTC
dmesg output from 'mdadm -C' and 'mdadm -S' at the time of the buggy behavior; IMHO nothing interesting:

md: bind<sda>
md: bind<sdb1>
bio: create slab <bio-1> at 1
md/raid0:md127: md_size is 2287616 sectors.
md: RAID0 configuration for md127 - 2 zones
md: zone0=[sda/sdb1]
      zone-offset=         0KB, device-offset=         0KB, size=    191488KB
md: zone1=[sda]
      zone-offset=    191488KB, device-offset=     95744KB, size=    952320KB

md127: detected capacity change from 0 to 1171259392
 md127: unknown partition table
md127: detected capacity change from 1171259392 to 0
md: md127 stopped.
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sda>
md: export_rdev(sda)

Comment 3 Jan Safranek 2013-10-11 11:35:37 UTC
Created attachment 811006 [details]
reproducer. Don't forget to change DISKS variables inside!

Attached a new reproducer script; now I can reproduce it with a 100% success rate.

Comment 4 RHEL Program Management 2013-10-15 01:34:14 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 5 Harald Hoyer 2013-10-31 12:07:32 UTC
(In reply to Jan Safranek from comment #3)
> Created attachment 811006 [details]
> reproducer. Don't forget to change DISKS variables inside!
> 
> Attached a new reproducer script; now I can reproduce it with a 100%
> success rate.

instead of "sleep 2", you probably should use "udevadm settle"

See, if you can reproduce it then.
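
That is, something like this instead of the fixed delay (untested sketch):

  mdadm -S /dev/md/test
  # block until udev has processed all queued events
  udevadm settle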

Comment 6 Jan Safranek 2013-10-31 12:48:14 UTC
The result is the same with udevadm settle; it fails after several loops.

Comment 7 Harald Hoyer 2013-10-31 14:07:37 UTC
(In reply to Jan Safranek from comment #6)
> Result is the same with udevadm settle, fails after several loops.

And after the script ends/fails, the symlinks are still there and never removed, even after 3 minutes?

maybe add

# udevadm control --log-priority=info

or run in parallel:

# udevadm monitor

and attach the last relevant messages from /var/log/messages and the monitor output
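
Combined, the debugging session would look roughly like this (the reproducer name is hypothetical):

  udevadm control --log-priority=info
  udevadm monitor > /tmp/monitor.log 2>&1 &
  MON_PID=$!
  ./reproducer.sh
  kill $MON_PID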

Comment 8 Jan Safranek 2013-10-31 16:13:10 UTC
I can't reproduce the bug with 'udevadm control --log-priority=info'. This probably indicates that the bug depends on exact timing, i.e. a race condition.

Comment 9 Jan Safranek 2013-10-31 16:15:06 UTC
Created attachment 817971 [details]
Updated reproducer

Updated reproducer script with enhanced logging (basically, run udevadm monitor in the background and print timestamps as the script progresses).
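
The timestamp part is roughly the following (illustrative sketch, not the exact script):

  # print a high-resolution UNIX timestamp before each step
  log() { echo "$(date +%s.%N) $*"; }

  log "mdadm -C"
  mdadm -C /dev/md/test -l 0 -n 2 --run $DISKS
  log "mdadm -S"
  mdadm -S /dev/md/test

with udevadm monitor running in the background as in comment 7.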

Comment 10 Jan Safranek 2013-10-31 16:23:59 UTC
Created attachment 817976 [details]
script output

Log with the udevadm and script output mixed together. udevadm seems to write its log in batches; I'm not able to judge whether that's due to buffered output or some delay in the kernel/udev.

Anyway, just beware that the udevadm logs are written late; merge them into the appropriate place.

Comment 11 Jan Safranek 2013-10-31 16:24:52 UTC
Created attachment 817979 [details]
syslog messages

syslog from the time the reproducer was running

Comment 12 Jan Safranek 2013-10-31 16:30:56 UTC
umm, syslog shows this error at the time the bug occurs:

Oct 31 16:10:01 rhel6_test udevd-work[1613]: inotify_add_watch(6, /dev/md127, 10) failed: No such file or directory

(=1383232199.000000000 UNIX time)

Comment 13 Harald Hoyer 2013-11-04 16:24:54 UTC
And after the script ends/fails, the symlinks are still there and never removed, even after 3 minutes?

Comment 14 Jan Safranek 2013-11-04 16:43:59 UTC
Yes, after a few minutes of poking around the system, the symlink is still there.

Comment 15 Harald Hoyer 2013-11-04 17:07:33 UTC
I guess stracing udevd and its workers would also change the timing.

Comment 16 Harald Hoyer 2013-11-04 17:21:59 UTC
(In reply to Jan Safranek from comment #10)
> Log with udevadm + script mixed together. udevadm seems to write log in
> batches, I'm not able to judge if it's due to buffered output or some delay
> in kernel/udev.

Redirect stdout to stderr for udevadm monitor.

stdout is buffered, stderr is not.
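
E.g. (untested):

  # stderr is not buffered, so events show up immediately
  udevadm monitor 1>&2

  # or force line-buffered stdout when writing to a file
  stdbuf -oL udevadm monitor > monitor.log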

Comment 20 Leos Pol 2015-04-15 08:31:10 UTC
# rpm -q udev
udev-147-2.57.el6.x86_64
....
lrwxrwxrwx. 1 root root 8 Apr 15 04:21 /dev/md/test -> ../md127
Failed after: 2


# rpm -q udev
udev-147-2.61.el6.x86_64
....
runs forever

Honza, thanks for the reproducer.

Comment 21 errata-xmlrpc 2015-07-22 07:19:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1382.html