Bug 1018171 - /dev/md/xxx symlink not removed
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: udev
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Harald Hoyer
QA Contact: Leos Pol
Depends On:
Blocks:
 
Reported: 2013-10-11 07:12 EDT by Jan Safranek
Modified: 2015-07-22 03:19 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If udev processed the uevent queue for a device that was already removed, the internal handling failed to process an already removed device. Consequently, some symbolic links were not removed for these devices. Now, udev no longer relies on the existence of a device when dealing with the backlog of the uevent queue, and all symbolic links are removed as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 03:19:42 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
reproducer. Don't forget to change DISKS variable inside! (305 bytes, application/x-shellscript)
2013-10-11 07:12 EDT, Jan Safranek
reproducerDon't forget to change DISKS variables inside! (526 bytes, application/x-shellscript)
2013-10-11 07:35 EDT, Jan Safranek
Updated reproducer (882 bytes, text/plain)
2013-10-31 12:15 EDT, Jan Safranek
script output (20.77 KB, text/plain)
2013-10-31 12:23 EDT, Jan Safranek
syslog messages (18.14 KB, text/plain)
2013-10-31 12:24 EDT, Jan Safranek

Description Jan Safranek 2013-10-11 07:12:33 EDT
Created attachment 811002 [details]
reproducer. Don't forget to change DISKS variable inside!

'mdadm -S /dev/md/test' sometimes does not remove the /dev/md/test symlink, even though the /dev/md127 device is destroyed correctly.

Version-Release number of selected component (if applicable):
udev-147-2.48.el6.x86_64
kernel-2.6.32-422.el6.x86_64

How reproducible:
sometimes

Steps to Reproduce:
1. run attached test script

Actual results (typically):
Failed after: 2

Expected results:
Endless loop of mdadm -C and mdadm -S

Additional info:
It is reproducible only sometimes. When I clear both RAID members with 'dd', I usually get correct behavior. I have a complex test suite for the OpenLMI project which creates/destroys MD RAIDs, partitions, LVs, various filesystems etc., and after the test suite finishes, I can reproduce the bug reliably. It seems some stray data on the RAID members or in memory makes the bug reproducible.
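
The attached script is not reproduced in this thread. As a rough sketch only, a create/stop loop of the kind described might look like the following. The function names, the RAID0 level, the two-device count and the 2-second pauses are assumptions for illustration, and DISKS must be set to scratch devices whose contents may be destroyed.

```shell
#!/bin/bash
# Sketch only, NOT the attached reproducer. DISKS and LINK are placeholders.
DISKS="${DISKS:-}"            # two scratch devices; their data is destroyed
LINK="${LINK:-/dev/md/test}"

# True when the given path is still a symlink (dangling or not).
symlink_left_behind() {
    [ -L "$1" ]
}

# Create and stop a RAID0 array until the symlink survives a stop.
reproduce() {
    [ -n "$DISKS" ] || { echo "set DISKS first" >&2; return 2; }
    local i=0
    while :; do
        i=$((i + 1))
        # $DISKS is left unquoted on purpose so it splits into two devices.
        mdadm -C "$LINK" --run -l 0 -n 2 $DISKS || return 2
        sleep 2
        mdadm -S "$LINK" || return 2
        sleep 2
        if symlink_left_behind "$LINK"; then
            ls -l "$LINK"
            echo "Failed after: $i"
            return 1
        fi
    done
}
```

Running `DISKS="/dev/sdX /dev/sdY" reproduce` as root (device names hypothetical) loops until the symlink is left behind; on an unaffected system it does not terminate.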
Comment 1 Jan Safranek 2013-10-11 07:13:40 EDT
dmesg output of mdadm -C and mdadm -S at the time of the buggy behavior, IMHO nothing interesting:

md: bind<sda>
md: bind<sdb1>
bio: create slab <bio-1> at 1
md/raid0:md127: md_size is 2287616 sectors.
md: RAID0 configuration for md127 - 2 zones
md: zone0=[sda/sdb1]
      zone-offset=         0KB, device-offset=         0KB, size=    191488KB
md: zone1=[sda]
      zone-offset=    191488KB, device-offset=     95744KB, size=    952320KB

md127: detected capacity change from 0 to 1171259392
 md127: unknown partition table
md127: detected capacity change from 1171259392 to 0
md: md127 stopped.
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sda>
md: export_rdev(sda)
Comment 3 Jan Safranek 2013-10-11 07:35:37 EDT
Created attachment 811006 [details]
reproducerDon't forget to change DISKS variables inside!

Attached new reproducer script, now I can reproduce it with 100% success rate.
Comment 4 RHEL Product and Program Management 2013-10-14 21:34:14 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 5 Harald Hoyer 2013-10-31 08:07:32 EDT
(In reply to Jan Safranek from comment #3)
> Created attachment 811006 [details]
> reproducerDon't forget to change DISKS variables inside!
> 
> Attached new reproducer script, now I can reproduce it with 100% success
> rate.

Instead of "sleep 2", you should probably use "udevadm settle".

See if you can reproduce it then.
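
A minimal sketch of that substitution, assuming a helper named wait_for_udev (hypothetical) and an arbitrary 30-second default timeout:

```shell
# Wait for udev to drain its event queue instead of sleeping a fixed time.
# A non-zero status from 'udevadm settle' is treated as "queue still busy".
wait_for_udev() {
    local timeout="${1:-30}"
    if ! udevadm settle --timeout="$timeout"; then
        echo "udev queue not empty after ${timeout}s" >&2
        return 1
    fi
}
```

In a reproducer loop, each `sleep 2` would then become `wait_for_udev`, so the check for the symlink runs only after udev reports that its queue is empty.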
Comment 6 Jan Safranek 2013-10-31 08:48:14 EDT
Result is the same with udevadm settle, fails after several loops.
Comment 7 Harald Hoyer 2013-10-31 10:07:37 EDT
(In reply to Jan Safranek from comment #6)
> Result is the same with udevadm settle, fails after several loops.

And after the script ends/fails, the symlinks are still there and are never removed? Even after 3 minutes?

maybe add

# udevadm control --log-priority=info

or run in parallel:

# udevadm monitor

and attach the last relevant messages from /var/log/messages and the monitor output
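
One way to wrap a test run in the higher log priority is sketched below. The wrapper name is hypothetical, and restoring the priority to 'err' assumes that is the configured default (check /etc/udev/udev.conf).

```shell
# Raise udevd's log priority for the duration of one command, then lower it.
with_udev_debug() {
    local rc=0
    udevadm control --log-priority=info
    "$@" || rc=$?
    # 'err' is an assumed default; adjust to match /etc/udev/udev.conf.
    udevadm control --log-priority=err
    return "$rc"
}
```

For example, `with_udev_debug ./reproducer.sh` (script name hypothetical) leaves the extra udevd messages in /var/log/messages for the time the script was running.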
Comment 8 Jan Safranek 2013-10-31 12:13:10 EDT
I can't reproduce the bug with 'udevadm control --log-priority=info'. This probably indicates the bug is dependent on exact timing / race.
Comment 9 Jan Safranek 2013-10-31 12:15:06 EDT
Created attachment 817971 [details]
Updated reproducer

Updated reproducer script with enhanced logging (basically runs udevadm monitor in the background and prints timestamps as the script progresses).
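
The attachment itself is not shown here. A sketch of this kind of logging, with hypothetical helper names and an assumed one-second start-up delay for the monitor, could be:

```shell
# Print a message prefixed with seconds.nanoseconds since the epoch (GNU date).
log_ts() {
    echo "$(date +%s.%N) $*"
}

# Run a command while 'udevadm monitor' records events to a file.
run_with_monitor() {
    local out="$1"; shift
    local rc=0
    udevadm monitor > "$out" 2>&1 &
    local mon=$!
    sleep 1                      # assumed: give the monitor time to start
    "$@" || rc=$?
    kill "$mon" 2>/dev/null || true
    wait "$mon" 2>/dev/null || true
    return "$rc"
}
```

Calling `log_ts "stopping array"` before each mdadm step makes it possible to line up the script's progress with the monitor output afterwards.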
Comment 10 Jan Safranek 2013-10-31 12:23:59 EDT
Created attachment 817976 [details]
script output

Log with udevadm + script mixed together. udevadm seems to write log in batches, I'm not able to judge if it's due to buffered output or some delay in kernel/udev.

Anyway, beware that the udevadm logs are written late; merge them into the appropriate place when reading.
Comment 11 Jan Safranek 2013-10-31 12:24:52 EDT
Created attachment 817979 [details]
syslog messages

syslog from the time the reproducer was running
Comment 12 Jan Safranek 2013-10-31 12:30:56 EDT
umm, syslog shows this error at the time the bug occurs:

Oct 31 16:10:01 rhel6_test udevd-work[1613]: inotify_add_watch(6, /dev/md127, 10) failed: No such file or directory

(=1383232199.000000000 UNIX time)
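
To check a syslog file for the same kind of message, a small filter along these lines may help. The function name is hypothetical and the pattern is modelled only on the single line quoted above.

```shell
# Print udevd worker messages that report a failed inotify watch.
# Defaults to /var/log/messages; pass another file as the first argument.
find_watch_failures() {
    grep -E 'udevd-work\[[0-9]+\]: inotify_add_watch\(.*\) failed' \
        "${1:-/var/log/messages}"
}
```

Each matching line carries the syslog timestamp, which can then be compared with the timestamps printed by the reproducer.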
Comment 13 Harald Hoyer 2013-11-04 11:24:54 EST
And after the script ends/fails, the symlinks are still there and are never removed? Even after 3 minutes?
Comment 14 Jan Safranek 2013-11-04 11:43:59 EST
Yes, after a few minutes of poking around the system, the symlink is still there.
Comment 15 Harald Hoyer 2013-11-04 12:07:33 EST
I guess stracing udevd and its workers will also change the timing.
Comment 16 Harald Hoyer 2013-11-04 12:21:59 EST
(In reply to Jan Safranek from comment #10)
> Log with udevadm + script mixed together. udevadm seems to write log in
> batches, I'm not able to judge if it's due to buffered output or some delay
> in kernel/udev.

Redirect stdout to stderr for udevadm monitor.

stdout is buffered, stderr is not.
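
Whether a plain shell redirect is enough may depend on where the output ends up. A different, commonly used technique is to force line buffering with `stdbuf`, assuming GNU coreutils provides it on the test machine. This is a swapped-in alternative, not the suggestion above, and the wrapper name is hypothetical.

```shell
# Run a command with line-buffered stdout (requires GNU coreutils stdbuf).
line_buffered() {
    stdbuf -oL "$@"
}
```

For example, `line_buffered udevadm monitor >> script.log 2>&1 &` (log file name hypothetical) should make monitor lines appear in the log close to the time the events occur.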
Comment 20 Leos Pol 2015-04-15 04:31:10 EDT
# rpm -q udev
udev-147-2.57.el6.x86_64
....
lrwxrwxrwx. 1 root root 8 15. dub 04.21 /dev/md/test -> ../md127
Failed after: 2


# rpm -q udev
udev-147-2.61.el6.x86_64
....
runs forever

Honzo, thanks for the reproducer.
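
To check whether a machine has at least the build that passed here, a comparison like the following could be used. It is a sketch: the function name is hypothetical, GNU `sort -V` only approximates rpm's own version ordering, and the thread does not say which build first contained the fix, only that 147-2.61.el6 passed.

```shell
# True when version-release string $1 sorts at or after $2 under GNU sort -V.
release_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

A possible call is `release_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' udev)" 147-2.61.el6 && echo "verified build or newer"`.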
Comment 21 errata-xmlrpc 2015-07-22 03:19:42 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1382.html
