Created attachment 811002 [details]
reproducer. Don't forget to change the DISKS variable inside!
'mdadm -S /dev/md/test' sometimes does not remove the /dev/md/test symlink, while the /dev/md127 device itself is destroyed correctly.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. run attached test script
Actual results (typically):
Failed after: 2
The endless loop of 'mdadm -C' and 'mdadm -S' fails after two iterations.
It is reproducible only sometimes. When I clear both RAID members with 'dd', I usually get correct behavior. I have a complex test suite for the OpenLMI project which creates/destroys MD RAIDs, partitions, LVs, various filesystems etc., and after the test suite finishes, I can reproduce the bug reliably. It seems that some stray data on the RAID members or in memory makes the bug reproducible.
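The core of the loop can be sketched like this (a sketch only, not the actual attachment; DISKS and the /dev/loop* names are placeholders, and 'run' just echoes each command so nothing here touches real devices):

```shell
# Sketch of the reproducer loop; 'run' only echoes the commands so the
# sequence can be read safely -- replace its body with "$@" (and run as
# root on scratch devices) to exercise the real bug.
run() { echo "+ $*"; }
DISKS="/dev/loop0 /dev/loop1"   # placeholder; point at scratch devices

for i in 1 2; do
    run mdadm -C /dev/md/test -l 0 -n 2 --run $DISKS
    run udevadm settle
    run mdadm -S /dev/md/test
    run udevadm settle
    # the bug: /dev/md127 is gone, but /dev/md/test may still be there
    run test -L /dev/md/test
done
```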
dmesg output from 'mdadm -C' and 'mdadm -S' at the time of the buggy behavior; IMHO nothing interesting:
bio: create slab <bio-1> at 1
md/raid0:md127: md_size is 2287616 sectors.
md: RAID0 configuration for md127 - 2 zones
zone-offset= 0KB, device-offset= 0KB, size= 191488KB
zone-offset= 191488KB, device-offset= 95744KB, size= 952320KB
md127: detected capacity change from 0 to 1171259392
md127: unknown partition table
md127: detected capacity change from 1171259392 to 0
md: md127 stopped.
Created attachment 811006 [details]
reproducer. Don't forget to change the DISKS variable inside!
Attached a new reproducer script; now I can reproduce it with a 100% success rate.
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
(In reply to Jan Safranek from comment #3)
> Created attachment 811006 [details]
> reproducer. Don't forget to change the DISKS variable inside!
> Attached new reproducer script, now I can reproduce it with 100% success
instead of "sleep 2", you probably should use "udevadm settle"
See if you can reproduce it then.
Result is the same with udevadm settle, fails after several loops.
(In reply to Jan Safranek from comment #6)
> Result is the same with udevadm settle, fails after several loops.
and after the script ends/fails, the symlinks are still there and never removed? Even after 3 minutes?
# udevadm control --log-priority=info
or run in parallel:
# udevadm monitor
and attach the last relevant messages from /var/log/messages and the monitor output
I can't reproduce the bug with 'udevadm control --log-priority=info'. This probably indicates the bug is dependent on exact timing / race.
Created attachment 817971 [details]
Updated reproducer script with enhanced logging (basically, run udevadm monitor in the background + print timestamps as the script progresses).
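The logging additions amount to roughly this (a sketch with assumed file and function names, not the actual attachment):

```shell
# Capture udev events in the background while the reproducer runs, and
# timestamp every step so the two logs can be merged later.
# 'udev-monitor.log' and 'log' are assumed names, not from the attachment.
udevadm monitor --kernel --udev > udev-monitor.log 2>&1 &
MON=$!

log() { echo "$(date +%s.%N) $*"; }

log "loop starting"
# ... the mdadm -C / mdadm -S loop goes here ...
log "loop finished"

kill "$MON" 2>/dev/null
wait "$MON" 2>/dev/null || true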
Created attachment 817976 [details]
Log with udevadm + script output mixed together. udevadm seems to write its log in batches; I'm not able to judge whether it's due to buffered output or some delay in the kernel/udev.
Anyway, just be aware that the udevadm log lines are written late, and merge them into the appropriate place.
Created attachment 817979 [details]
syslog from the time the reproducer was running
umm, syslog shows this error at the time the bug occurs:
Oct 31 16:10:01 rhel6_test udevd-work: inotify_add_watch(6, /dev/md127, 10) failed: No such file or directory
(=1383232199.000000000 UNIX time)
> and after the script ends/fails, the symlinks are still there and never ever removed? even after 3 minutes?
Yes, after a few minutes of poking around the system, the symlink is still there.
I guess stracing udevd and its workers would change the timing too.
(In reply to Jan Safranek from comment #10)
> Log with udevadm + script mixed together. udevadm seems to write log in
> batches, I'm not able to judge if it's due to buffered output or some delay
> in kernel/udev.
redirect stdout to stderr for udevadm monitor.
stdout is buffered, stderr is not
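The buffering difference shows up with any stdio program writing into a pipe (python3 below is just a stand-in for udevadm monitor): stdout going to a pipe is block-buffered and only flushed on exit, while stderr lines come out immediately. Note that '1>&2' helps when stderr ends up on a terminal, since stdio picks the buffering mode from whatever fd 1 points at.

```shell
# stdout into a pipe is block-buffered; stderr is not.  The stderr line
# reaches 'cat' right away, the stdout line only when the program exits,
# so "stderr line" is printed first.
python3 -c '
import sys, time
sys.stdout.write("stdout line\n")   # sits in the stdio buffer
sys.stderr.write("stderr line\n")   # flushed immediately
time.sleep(1)
' 2>&1 | cat
```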
# rpm -q udev
lrwxrwxrwx. 1 root root 8 15. dub 04.21 /dev/md/test -> ../md127
Failed after: 2
# rpm -q udev
Honzo, thanks for the reproducer.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.