Bug 1018171
Summary: /dev/md/xxx symlink not removed

Product: Red Hat Enterprise Linux 6
Reporter: Jan Safranek <jsafrane>
Component: udev
Assignee: Harald Hoyer <harald>
Status: CLOSED ERRATA
QA Contact: Leos Pol <lpol>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.4
CC: jsafrane, lpol, mnavrati, psklenar, udev-maint-list
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When udev processed a backlog of queued uevents for a device that had already been removed, the internal handling failed because the device no longer existed. Consequently, some symbolic links for such devices were never removed. With this update, udev no longer relies on the device still existing when it works through the uevent queue backlog, and all symbolic links are removed as expected.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 07:19:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:

dmesg of mdadm -C and mdadm -S at the time of the buggy behavior; IMHO nothing interesting:

md: bind<sda>
md: bind<sdb1>
bio: create slab <bio-1> at 1
md/raid0:md127: md_size is 2287616 sectors.
md: RAID0 configuration for md127 - 2 zones
md: zone0=[sda/sdb1] zone-offset= 0KB, device-offset= 0KB, size= 191488KB
md: zone1=[sda] zone-offset= 191488KB, device-offset= 95744KB, size= 952320KB
md127: detected capacity change from 0 to 1171259392
md127: unknown partition table
md127: detected capacity change from 1171259392 to 0
md: md127 stopped.
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sda>
md: export_rdev(sda)

Created attachment 811006 [details]
reproducer

Don't forget to change the DISKS variable inside!

Attached a new reproducer script; now I can reproduce it with a 100% success rate.
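The attached script itself is not inlined in this report; purely as an illustration, a minimal create/stop loop of the kind described might look like the sketch below. The DISKS values, RAID level, and the /dev/md/test array name are assumptions taken from the dmesg and description in this bug, not from the actual attachment.

    #!/bin/bash
    # Hypothetical sketch of the reproducer loop -- NOT the attached script.
    # DISKS must point to scratch devices that may be overwritten.
    DISKS="/dev/sda /dev/sdb1"

    i=0
    while true; do
        i=$((i + 1))
        # Create a 2-member RAID0 array with the named symlink /dev/md/test.
        mdadm -C /dev/md/test -l 0 -n 2 --run $DISKS || exit 1
        udevadm settle   # the original used "sleep 2"; settle was suggested later
        # Stop the array; udev is expected to remove the /dev/md/test symlink.
        mdadm -S /dev/md/test || exit 1
        udevadm settle
        if [ -L /dev/md/test ]; then
            # With the buggy udev the symlink is sometimes left dangling here.
            echo "Failed after: $i"
            exit 1
        fi
    done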
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.

(In reply to Jan Safranek from comment #3)
> Created attachment 811006 [details]
> reproducer
>
> Don't forget to change the DISKS variable inside!
>
> Attached a new reproducer script; now I can reproduce it with a 100% success rate.

Instead of "sleep 2", you should probably use "udevadm settle". See if you can reproduce it then.

Result is the same with udevadm settle; it fails after several loops.

(In reply to Jan Safranek from comment #6)
> Result is the same with udevadm settle; it fails after several loops.

And after the script ends/fails, are the symlinks still there and never removed, even after 3 minutes?

Maybe add:

# udevadm control --log-priority=info

or run in parallel:

# udevadm monitor

and attach the last relevant messages from /var/log/messages and the monitor output.

I can't reproduce the bug with 'udevadm control --log-priority=info'. This probably indicates the bug depends on exact timing / a race.

Created attachment 817971 [details]
Updated reproducer
Updated reproducer script with enhanced logging (basically, it runs udevadm monitor in the background and prints timestamps as the script progresses).
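As a rough idea of what such enhanced logging could look like, building on the sketch earlier in this report (the log file path and date format are illustrative assumptions, not taken from the attachment):

    # Hypothetical logging additions -- NOT the attached script.
    # Run udevadm monitor in the background and keep its output for later.
    udevadm monitor > /tmp/udev-monitor.log 2>&1 &
    MONITOR_PID=$!

    # Print a timestamp before each step so the script output can be
    # correlated with the monitor log and /var/log/messages afterwards.
    echo "$(date +%s.%N) mdadm -C"
    mdadm -C /dev/md/test -l 0 -n 2 --run $DISKS
    echo "$(date +%s.%N) mdadm -S"
    mdadm -S /dev/md/test

    kill "$MONITOR_PID"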
Created attachment 817976 [details]
script output
Log with udevadm and script output mixed together. udevadm seems to write its log in batches; I can't tell whether that is due to buffered output or some delay in the kernel/udev.
Anyway, be aware that the udevadm lines show up late in the log and need to be merged back into the appropriate place.
Created attachment 817979 [details]
syslog messages
Syslog from the time the reproducer was running.
Umm, syslog shows this error at the time the bug occurs:

Oct 31 16:10:01 rhel6_test udevd-work[1613]: inotify_add_watch(6, /dev/md127, 10) failed: No such file or directory

(= 1383232199.000000000 UNIX time)

And after the script ends/fails, are the symlinks still there and never removed, even after 3 minutes?

Yes, after a few minutes of poking around the system, the symlink is still there. I guess stracing udevd and its workers would change the timing as well.

(In reply to Jan Safranek from comment #10)
> Log with udevadm + script mixed together. udevadm seems to write log in
> batches, I'm not able to judge if it's due to buffered output or some delay
> in kernel/udev.

Redirect stdout to stderr for udevadm monitor; stdout is buffered, stderr is not (see the short sketch below).

# rpm -q udev
udev-147-2.57.el6.x86_64
....
lrwxrwxrwx. 1 root root 8 15. dub 04.21 /dev/md/test -> ../md127
Failed after: 2

# rpm -q udev
udev-147-2.61.el6.x86_64
....
runs forever

Honza, thanks for the reproducer.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1382.html
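For completeness, applying that stdout/stderr suggestion to the monitoring sketch above might look as follows; this is only a sketch, and whether it actually removes the batching depends on where the script's stderr points in a given run.

    # Send udevadm monitor output to stderr instead of stdout, as suggested
    # above, so it is not held back by stdout buffering.
    udevadm monitor >&2 &
    MONITOR_PID=$!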
Created attachment 811002 [details]
reproducer. Don't forget to change the DISKS variable inside!

'mdadm -S /dev/md/test' sometimes does not remove the /dev/md/test symlink, while the /dev/md127 device itself is destroyed correctly.

Version-Release number of selected component (if applicable):
udev-147-2.48.el6.x86_64
kernel-2.6.32-422.el6.x86_64

How reproducible:
sometimes

Steps to Reproduce:
1. Run the attached test script.

Actual results (typically):
Failed after: 2

Expected results:
An endless loop of mdadm -C and mdadm -S.

Additional info:
It is reproducible only sometimes. When I clear both RAID members with 'dd', I usually get correct behavior. I have a complex test suite for the OpenLMI project which creates/destroys MD RAIDs, partitions, LVs, various filesystems, etc., and after the test suite finishes, I can reproduce the bug reliably. It seems some stray data on the RAID members or in memory makes the bug reproducible.
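Clearing both RAID members with 'dd', as mentioned in the additional info, presumably means something along these lines; the device names and the amount wiped are illustrative assumptions only.

    # Wipe the start of both members so no stale MD metadata or filesystem
    # signatures survive; with freshly cleared members the symlink removal
    # usually behaves correctly, per the report above.
    dd if=/dev/zero of=/dev/sda bs=1M count=16
    dd if=/dev/zero of=/dev/sdb1 bs=1M count=16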