1018171 – /dev/md/xxx symlink not removed


Bug 1018171 - /dev/md/xxx symlink not removed

Summary: /dev/md/xxx symlink not removed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	udev
Sub Component:
Version:	6.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	Harald Hoyer
QA Contact:	Leos Pol
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-10-11 11:12 UTC by Jan Safranek
Modified:	2015-07-22 07:19 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	If udev processed the uevent queue for a device that had already been removed, its internal handling of that device failed. Consequently, some symbolic links for these devices were not removed. Now, udev no longer relies on the existence of a device when dealing with the backlog of the uevent queue, and all symbolic links are removed as expected.
Clone Of:
Environment:
Last Closed:	2015-07-22 07:19:42 UTC
Target Upstream Version:
Embargoed:

Attachments
reproducer. Don't forget to change DISKS variable inside! (305 bytes, application/x-shellscript) 2013-10-11 11:12 UTC, Jan Safranek	no flags
reproducer. Don't forget to change DISKS variables inside! (526 bytes, application/x-shellscript) 2013-10-11 11:35 UTC, Jan Safranek	no flags
Updated reproducer (882 bytes, text/plain) 2013-10-31 16:15 UTC, Jan Safranek	no flags
script output (20.77 KB, text/plain) 2013-10-31 16:23 UTC, Jan Safranek	no flags
syslog messages (18.14 KB, text/plain) 2013-10-31 16:24 UTC, Jan Safranek	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2015:1382	0	normal	SHIPPED_LIVE	udev bug fix update	2015-07-20 17:58:19 UTC

Description Jan Safranek 2013-10-11 11:12:33 UTC

Created attachment 811002
reproducer. Don't forget to change DISKS variable inside!

'mdadm -S /dev/md/test' sometimes does not remove the /dev/md/test symlink, although the /dev/md127 device is destroyed correctly.
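Assuming, as stated above, that /dev/md127 itself is removed when the array stops, a leftover /dev/md/test is a dangling symlink. A minimal sketch for spotting such leftovers; the function name `stale_md_links` is hypothetical and not part of udev or mdadm:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list symlinks in a directory (default /dev/md)
# whose target no longer exists, e.g. /dev/md/test -> ../md127 after
# the md127 device node has been removed.
stale_md_links() {
    local dir=${1:-/dev/md} link
    for link in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: its target cannot be resolved
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$link")"
        fi
    done
}
```

Run right after `mdadm -S /dev/md/test`, it should print nothing when udev has cleaned up correctly.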

Version-Release number of selected component (if applicable):
udev-147-2.48.el6.x86_64
kernel-2.6.32-422.el6.x86_64

How reproducible:
sometimes

Steps to Reproduce:
1. Run the attached test script

Actual results (typically):
Failed after: 2

Expected results:
An endless loop of mdadm -C and mdadm -S

Additional info:
It is reproducible only sometimes. When I clear both RAID members with 'dd', I usually get the correct behavior. I have a complex test suite for the OpenLMI project which creates and destroys MD RAIDs, partitions, LVs, various filesystems etc., and after the test suite finishes, I can reproduce the bug reliably. It seems that some stray data on the RAID members or in memory makes the bug reproducible.
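The attached script itself is not shown on this page. The following is a hypothetical reconstruction of the loop it describes (create a two-member RAID0 array named test, stop it, check whether the symlink survives), not the original. It keeps the script's `DISKS` variable; `MD_LINK`, the iteration limit and the function name `reproduce` are assumptions. It waits with `udevadm settle` instead of a fixed `sleep 2`, as comment 5 recommends. It is destructive: it overwrites RAID metadata on the listed devices.

```bash
#!/usr/bin/env bash
# Hypothetical reconstruction of the reproducer loop (not the attachment).
# DESTRUCTIVE: it overwrites RAID metadata on $DISKS.
#   DISKS    two scratch block devices, e.g. "/dev/sda /dev/sdb1"
#   MD_LINK  array name / symlink to check (default /dev/md/test)
# Usage: reproduce [MAX_ITERATIONS]   (omitted or 0 = loop forever)
reproduce() {
    local max=${1:-0} i=0
    local link=${MD_LINK:-/dev/md/test}
    if [ -z "${DISKS:-}" ]; then
        echo "set DISKS to two scratch block devices" >&2
        return 2
    fi
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # $DISKS is deliberately unquoted so it splits into separate devices;
        # --run suppresses mdadm's confirmation prompt when members look
        # like they belong to another array or filesystem
        mdadm --create "$link" --run --level=0 --raid-devices=2 $DISKS || return 2
        udevadm settle    # wait until udev has processed the queued events
        mdadm --stop "$link" || return 2
        udevadm settle
        # The array is stopped, so a surviving symlink is the bug
        if [ -L "$link" ]; then
            echo "Failed after: $i"
            return 1
        fi
    done
    echo "No stale symlink after $i iterations"
}
```

The optional iteration limit is a design choice for unattended runs; with no argument the function loops until the symlink survives a stop, matching the expected "endless loop" behavior.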

Comment 1 Jan Safranek 2013-10-11 11:13:40 UTC

dmesg of mdadm -C and mdadm -S at the time of the buggy behavior, IMHO nothing interesting:

md: bind<sda>
md: bind<sdb1>
bio: create slab <bio-1> at 1
md/raid0:md127: md_size is 2287616 sectors.
md: RAID0 configuration for md127 - 2 zones
md: zone0=[sda/sdb1]
      zone-offset=         0KB, device-offset=         0KB, size=    191488KB
md: zone1=[sda]
      zone-offset=    191488KB, device-offset=     95744KB, size=    952320KB

md127: detected capacity change from 0 to 1171259392
 md127: unknown partition table
md127: detected capacity change from 1171259392 to 0
md: md127 stopped.
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sda>
md: export_rdev(sda)

Comment 3 Jan Safranek 2013-10-11 11:35:37 UTC

Created attachment 811006
reproducer. Don't forget to change DISKS variables inside!

Attached a new reproducer script; now I can reproduce the bug with a 100% success rate.

Comment 4 RHEL Program Management 2013-10-15 01:34:14 UTC

This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 5 Harald Hoyer 2013-10-31 12:07:32 UTC

(In reply to Jan Safranek from comment #3)
> Created attachment 811006
> reproducer. Don't forget to change DISKS variables inside!
> 
> Attached new reproducer script, now I can reproduce it with 100% success
> rate.

Instead of "sleep 2", you should probably use "udevadm settle".

See if you can reproduce it then.

Comment 6 Jan Safranek 2013-10-31 12:48:14 UTC

The result is the same with udevadm settle; it fails after several loops.

Comment 7 Harald Hoyer 2013-10-31 14:07:37 UTC

(In reply to Jan Safranek from comment #6)
> Result is the same with udevadm settle, fails after several loops.

And after the script ends or fails, are the symlinks still there and never removed, even after 3 minutes?

Maybe add

# udevadm control --log-priority=info

or run in parallel:

# udevadm monitor

and attach the last relevant messages from /var/log/messages and the monitor output.

Comment 8 Jan Safranek 2013-10-31 16:13:10 UTC

I can't reproduce the bug with 'udevadm control --log-priority=info'. This probably indicates that the bug depends on exact timing (a race).

Comment 9 Jan Safranek 2013-10-31 16:15:06 UTC

Created attachment 817971
Updated reproducer

Updated reproducer script with enhanced logging (it basically runs udevadm monitor in the background and prints timestamps as the script progresses).
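The logging approach described above (udevadm monitor in the background plus timestamped progress lines) might look roughly like this. It is a sketch, not the attached script; the function names `log`, `start_monitor` and `stop_monitor` are hypothetical. Timestamps use GNU `date +%s.%N`, which produces the seconds.nanoseconds form quoted in comment 12.

```bash
#!/usr/bin/env bash
# Hypothetical logging helpers in the spirit of the updated reproducer.

# Print a progress message prefixed with UNIX time in seconds.nanoseconds
log() {
    printf '%s %s\n' "$(date +%s.%N)" "$*"
}

# Start udevadm monitor (kernel uevents and udev events) in the background,
# appending its output to the file given as $1; remember its PID
start_monitor() {
    udevadm monitor --kernel --udev >>"$1" 2>&1 &
    MONITOR_PID=$!
}

# Stop the background monitor started by start_monitor
stop_monitor() {
    kill "$MONITOR_PID" 2>/dev/null
}
```

A run would call `start_monitor` once with a log file path, `log "mdadm -C"` and `log "mdadm -S"` around each step, and `stop_monitor` at the end.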

Comment 10 Jan Safranek 2013-10-31 16:23:59 UTC

Created attachment 817976
script output

Log with udevadm and script output mixed together. udevadm seems to write its log in batches; I'm not able to judge whether it's due to buffered output or some delay in the kernel/udev.

Anyway, just be aware that the udevadm logs are written late, and mentally merge them into the appropriate place.

Comment 11 Jan Safranek 2013-10-31 16:24:52 UTC

Created attachment 817979
syslog messages

syslog from the time the reproducer was running

Comment 12 Jan Safranek 2013-10-31 16:30:56 UTC

Umm, syslog shows this error at the time the bug occurs:

Oct 31 16:10:01 rhel6_test udevd-work[1613]: inotify_add_watch(6, /dev/md127, 10) failed: No such file or directory

(=1383232199.000000000 UNIX time)
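Syslog stamps lines in the machine's local time, while the script logs absolute UNIX time. A small sketch for converting between them (hypothetical function name, GNU date assumed): 1383232199 is 2013-10-31 15:09:59 UTC, which is two seconds before the 16:10:01 syslog entry if the test machine ran one hour ahead of UTC. The report does not state the machine's time zone.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a UNIX timestamp such as 1383232199.000000000
# to a UTC date (GNU date syntax); the fractional part is dropped first.
epoch_to_utc() {
    TZ=UTC date -d "@${1%%.*}" '+%Y-%m-%d %H:%M:%S %Z'
}
```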

Comment 13 Harald Hoyer 2013-11-04 16:24:54 UTC

And after the script ends or fails, are the symlinks still there and never removed, even after 3 minutes?

Comment 14 Jan Safranek 2013-11-04 16:43:59 UTC

Yes, after a few minutes of poking around the system, the symlink is still there.

Comment 15 Harald Hoyer 2013-11-04 17:07:33 UTC

I guess stracing udevd and its workers would also change the timing.

Comment 16 Harald Hoyer 2013-11-04 17:21:59 UTC

(In reply to Jan Safranek from comment #10)
> Log with udevadm + script mixed together. udevadm seems to write log in
> batches, I'm not able to judge if it's due to buffered output or some delay
> in kernel/udev.

Redirect stdout to stderr for udevadm monitor.

stdout is buffered, stderr is not.
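Background on the advice above: glibc's stdio keeps stderr unbuffered, but makes stdout line-buffered only when it is a terminal and block-buffered when it is a file or pipe. With `1>&2` the monitor's stdout goes wherever stderr goes, so this helps when stderr is the terminal. When the whole output is captured to one file, stdout would still be block-buffered. In that case coreutils `stdbuf -oL` is an alternative that forces line buffering, assuming udevadm writes its events through stdio and does not set its own buffering. The wrapper name `line_buffered` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with its stdout line-buffered via
# coreutils stdbuf, so each line reaches a file or pipe when printed
# rather than in batches. Only effective for dynamically linked programs
# that write through C stdio and do not set their own buffering.
line_buffered() {
    stdbuf -oL "$@"
}
```

For example: `line_buffered udevadm monitor --kernel --udev >>/tmp/udev-monitor.log 2>&1 &` (the log path is an arbitrary example).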

Comment 17 Harald Hoyer 2013-11-11 15:18:53 UTC

http://cgit.freedesktop.org/systemd/systemd/commit/?id=bf9d233f781f27841be6638ee745e9c80bda5f4d

Comment 20 Leos Pol 2015-04-15 08:31:10 UTC

# rpm -q udev
udev-147-2.57.el6.x86_64
....
lrwxrwxrwx. 1 root root 8 Apr 15 04:21 /dev/md/test -> ../md127
Failed after: 2


# rpm -q udev
udev-147-2.61.el6.x86_64
....
runs forever

Honzo, thanks for the reproducer.

Comment 21 errata-xmlrpc 2015-07-22 07:19:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1382.html
