Bug 130232

Summary: IDE subsystem causes infinite hotplug add/remove loop
Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Hardware: i686
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Reporter: David Zeuthen <davidz>
Assignee: Arjan van de Ven <arjanv>
QA Contact: Brian Brock <bbrock>
CC: alan, kjw, mclasen, wtogami
Doc Type: Bug Fix
Last Closed: 2004-11-16 23:09:18 UTC
Attachments: /var/log/messages logs (flags: none)

Description David Zeuthen 2004-08-18 12:05:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7)
Gecko/20040706 Firefox/0.9.1

Description of problem:
The ide-cs driver may cause an infinite hotplug add/remove loop if the
device files for the block devices it manages are touched before
being mounted. Specifically, mount(1) reads from these device files to
determine the filesystem type if it isn't explicitly given as
an option. hal also probes the device to determine whether
the media contains a filesystem without partition tables, and this
probing is what causes the infinite loop (the current hal in rawhide
detects whether ide-cs is in use and so avoids triggering the problem).

Notably, this doesn't happen if the compact flash media has no
partition tables (which may be rare, but it's not illegal). This
bug also seems to affect ide-floppy with Zip drives; see
http://marc.theaimsgroup.com/?t=109239500600001&r=1&w=2 for details.
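
For illustration, here is a minimal Python sketch of the kind of read a userspace prober performs on the whole-disk node to decide whether the media carries a partition table. It is not the hal or mount(1) code; the names parse_mbr and read_first_sector are hypothetical, and only the classic MBR layout (four 16-byte entries at offset 446, signature at offset 510) is assumed. A real prober validates much more, since a bare FAT boot sector carries the same signature.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446      # first of four 16-byte partition entries
SIGNATURE_OFFSET = 510  # boot signature bytes 0x55 0xAA


def parse_mbr(sector):
    """Return (slot, type, start_lba, sectors) for each used MBR entry.

    Returns an empty list if the boot signature is missing or no
    entry is in use.
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != b"\x55\xaa":
        return []
    entries = []
    for slot in range(4):
        raw = sector[TABLE_OFFSET + 16 * slot:TABLE_OFFSET + 16 * (slot + 1)]
        # Entry layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
        # little-endian start LBA, little-endian sector count.
        _status, ptype, start_lba, count = struct.unpack("<B3xB3xII", raw)
        if ptype != 0 and count != 0:
            entries.append((slot + 1, ptype, start_lba, count))
    return entries


def read_first_sector(device_path):
    # Opening the node is the step that, per this report, can trigger
    # the remove/add pair on affected kernels.
    with open(device_path, "rb") as dev:
        return dev.read(SECTOR_SIZE)
```

An empty result from parse_mbr(read_first_sector("/dev/hde")) would correspond to the "no partition tables" case above, where the loop is not seen.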


Version-Release number of selected component (if applicable):
kernel-2.6.7-1.517

How reproducible:
Always

Steps to Reproduce:
1. See attached document

Actual Results:  The ide-cs driver causes hotplug remove/add on the block
device representing a partition.

Expected Results:  The ide-cs driver shouldn't cause hotplug
remove/add on the block device representing a partition.

Additional info:

Another bug is that the place in sysfs for the PCMCIA (not Cardbus)
device is something like /sys/devices/ide2/... It would be nice to
include these patches which should be in 2.6-mm, and place the ide2 as
a child of the 16-bit PCMCIA devices (this is not happening today even
with these patches)

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7/2.6.7-mm7/broken-out/driver-model-and-sysfs-support-for-pcmcia-1-3.patch
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7/2.6.7-mm7/broken-out/update-drivers-net-pcmcia-2-3.patch
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7/2.6.7-mm7/broken-out/update-drivers-net-wireless-3-3.patch

The first patch does a few things:

        - actually use the device model (struct device) with pcmcia
          devices
        - create sysfs entries for the device
        - generate hotplug events

The latter two patches create the network class device for PCMCIA
networking devices, since that is done manually for each device via a
macro.

Comment 1 David Zeuthen 2004-08-18 12:06:46 UTC
Created attachment 102836 [details]
/var/log/messages logs

This describes how to reproduce the bug

Comment 2 David Zeuthen 2004-08-18 12:11:38 UTC
Oh, '# touch /dev/hde1' just before the Oops should be '# remove
PCMCIA card reader'. Sorry about that.

Comment 3 Alan Cox 2004-08-18 12:28:29 UTC
IDE hotplug in current 2.6 is terminally broken. I've been working on
fixing it for about a week and it's approaching the point of basic
stability. Even then it cannot fully support sysfs, and making it
handle sysfs is a major job that would require a full-time engineering
resource and a lot of prior upstream discussion and design.

Secondly, ide-cs does nothing to deal with hotplug events. You are
discussing a property of the core block layer code with removable
devices, where each open causes a recheck of the partition data. That's
"UPSTREAM" and isn't PCMCIA-specific. Deal with it in your HAL design,
I suspect.

With my current patches I can rmmod/insmod ide drivers without
crashes, security holes in /proc and so on. I'm now working on getting
them into a form the maintainer is happy with and merged upstream.
Until then it's Arjan's call, but essentially /proc/ide and IDE
hotplugging of any kind are not safe in the FC1/FC2 kernel.


Comment 4 David Zeuthen 2004-08-18 12:45:19 UTC
So if hotplug remove/add is a property of the core block layer code
when rereading partition data, how come I'm only seeing this for block
devices backed by IDE and not USB or IEEE1394? Are you trying to say
this is a bug upstream?

Comment 5 Alan Cox 2004-08-18 13:07:25 UTC
I'm not sure why USB doesn't trigger it. Perhaps that uses a different
approach for reading partition tables. The IDE layer itself has
nothing to do with the hotplug events, however. In fact it's gloriously
ignorant of a lot of hotplug issues.

One of the problems IDE has is that there isn't a good way to learn
about media changes reliably. So each open assumes the media might
have changed - much like floppy except we don't partition them.


Comment 6 David Zeuthen 2004-08-18 13:25:10 UTC
I wasn't clear; I'm implying it's a bug that hotplug events are
triggered from the block layer for IDE devices, not that USB etc.
should also trigger them! That would be a nightmare.

Here's why triggering the hotplug events is a bad idea: from userspace
we do want to open the device even if it's not mounted, because we want
to read the drive_id (serial# etc.) from the top-level block
device and the volume_id (label etc.) from the partition-level block
devices before even creating a mount point (the mount point may
include the label). Just look at some of the callouts included with
udev or look at the hal code.

Another general problem is that even though USB Mass Storage devices
have mechanisms for reporting media changes, very few vendors implement
them correctly. So, in conclusion, all polling for removable media
should occur from userspace.

Therefore, is it correct to say that these hotplug events from my
PCMCIA Compact Flash card readers are an (upstream) bug? That was my
original question anyway :-)

Comment 7 Alan Cox 2004-08-18 13:31:33 UTC
When the last user closes an IDE device we discard all information
about it. We simply don't know if it's the same CF card on the next open. We
don't even get a media changed error on an I/O because the IDE
controller is in the CF card, so it hasn't seen a media change.

So every time you are the first opener we will generate a partition table.
This, I suspect, generates the hotplug events.
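
The behaviour described above can be pictured with a toy Python model. It is only a sketch of the "first opener rescans, last closer forgets" rule as stated in this comment, not kernel code; the class ToyRemovableDisk and the has_media_change_detection flag are hypothetical names introduced for illustration.

```python
class ToyRemovableDisk:
    """Toy model of the open/rescan rule described above (not kernel code)."""

    def __init__(self, name, partitions, has_media_change_detection=False):
        self.name = name
        self.partitions = list(partitions)
        # Assumption: a device that can report media changes reliably
        # does not need to rescan on every first open.
        self.has_media_change_detection = has_media_change_detection
        self.openers = 0
        self.events = []

    def open(self):
        self.openers += 1
        first_opener = self.openers == 1
        if first_opener and not self.has_media_change_detection:
            # Nothing is trusted from the previous open, so the partition
            # devices are dropped and recreated from a fresh scan.
            for part in self.partitions:
                self.events.append(("remove", part))
            for part in self.partitions:
                self.events.append(("add", part))

    def close(self):
        if self.openers == 0:
            raise RuntimeError("close without open")
        self.openers -= 1
```

In this model a probe that opens and closes the device while nothing else holds it open produces a remove/add pair each time, which is the pattern a udev-style listener would then react to.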

Side item - there are two cases to teach your HAL code about for drive
vendor and model where duplicates occur which might be worth knowing
about.

#1 Maxtor in the model and a serial of "M0000000000000000000"
#2 "Integrated Technology Express" in the model/vendor info. h/w IDE
raid volumes all with the same id/serial.

Alan
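
A minimal sketch of how the two duplicate-identity cases above might be checked in userspace, in Python. The function name identity_is_unique and its argument layout are hypothetical; only the two patterns quoted in this comment are encoded, and real hal matching rules may differ.

```python
MAXTOR_PLACEHOLDER_SERIAL = "M0000000000000000000"
ITE_MARKER = "Integrated Technology Express"


def identity_is_unique(vendor, model, serial):
    """Return False for the two known cases where model/serial repeat."""
    # Case 1: Maxtor drives reporting the placeholder serial.
    if "Maxtor" in model and serial.strip() == MAXTOR_PLACEHOLDER_SERIAL:
        return False
    # Case 2: hardware IDE RAID volumes that all share one id/serial.
    if ITE_MARKER in model or ITE_MARKER in vendor:
        return False
    return True
```

A caller would fall back to some other key (for example the device path) whenever this returns False.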


Comment 8 David Zeuthen 2004-08-18 13:46:41 UTC
Well, USB and all the other buses (SCSI emulation I presume) do it
somewhat differently [1] and also support removable media. Since the
kernel is an abstraction mechanism this is bad: it breaks the
abstraction, as the behaviour depends on physical connection mechanisms.

I think by now it's sane to assume that some userspace process is
polling on the devices with removable storage if the use of the system
is desktop etc. (hal does this for Fedora but right now I have to
blacklist the ide-cs stuff, so yeah, it works, but I can't read the
volume label before creating the mount point etc)

I guess upstream agrees here as well cf. the 'removable' file in sysfs
for every block device (which is only an approximation btw), so how
about changing the behaviour in the IDE code?

[1] : btw, why is that, isn't the kernel layered, e.g. partition table
detection should be above the block layer? (I'm just trying to pick up
some kernel tricks on the side, thank you :-)
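
For reference, a small Python sketch of reading the 'removable' file mentioned above before deciding whether to poll a device. The helper names and the sysfs_root parameter are hypothetical; the only assumption about sysfs is that /sys/block/<disk>/removable contains "1" for removable devices and "0" otherwise.

```python
import os


def parse_removable_flag(text):
    """Interpret the contents of a sysfs 'removable' attribute."""
    return text.strip() == "1"


def is_removable(disk, sysfs_root="/sys"):
    """Return True/False from sysfs, or None if the attribute is unreadable."""
    path = os.path.join(sysfs_root, "block", disk, "removable")
    try:
        with open(path) as attr:
            return parse_removable_flag(attr.read())
    except OSError:
        return None
```

Reading this attribute does not open the block device node itself, so it should not count as an "open" in the sense discussed in this bug. As noted above, the flag is only an approximation.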

Comment 9 Alan Cox 2004-08-18 14:36:31 UTC
The IDE hardware doesn't support a change in the way the IDE code
assumes that media can change without warning. I refuse to break that
and let users trash disks without warning just because it gives HAL
some hiccups.

The partition table scanning is a separate library routine called by
various drivers.

Alan


Comment 10 Alan Cox 2004-08-18 14:50:31 UTC
PS: not just ide-cs - all ide removables will do this I suspect - eg
ide floppies, M/O drives 


Comment 11 David Zeuthen 2004-08-18 15:09:53 UTC
As I've already stated this is not specific to HAL at all - it applies
to mount(1) (when using -t auto), udev callouts and anything else that
opens the device before it's mounted. While this may not have been an
issue in the past, it certainly is now given that the kernel sends
hotplug events and udev creates/removes device nodes based on this.
This is a real problem. Even for mount(1).

To me this seems like a split personality :-). On the one hand the
kernel refuses to poll for new media (which is fair enough), and on the
other hand it sends lots of hotplug events if userspace tries to. For
some devices. Either way, hal already works around this issue, so I'll
just shut up now.

Thanks,
David

Comment 12 David Zeuthen 2004-11-16 20:45:44 UTC
Reopening this bug as it is the root cause for a regression in FC3.

Comment 13 Kevin Wang 2005-08-14 19:34:31 UTC
I note the following behaviour in FC3 (full updates as of 2005.08.14):

mkdir /var/log/hotplug
touch /var/log/hotplug/events
cardctl eject 0
tail -f /var/log/hotplug/events &
cardctl insert 0      # slot 0 contains cf card

You should see in the log:
add for pcmcia
remove for module /module/ide_cs
remove for module drivers /bus/pcmcia/drivers/ide-cs
add for module /module/ide_cs
add for module drivers /bus/pcmcia/drivers/ide-cs
(pause as pcmcia scripts run)
add for ide
add for block (hdc)
add for block (hdc1)
remove for block (hdc1)
add for block (hdc1)

and I'm not trying to mount the drive or anything.  hotplug just seems to
excessively thrash on the adding and deleting of hdc1.  here's some more
strangeness:

# ls -al /dev/hdc1 ; mount -v /dev/hdc1 /mnt/cf ; ls -al /dev/hdc1
brw-rw----  1 root disk 22, 1 Aug 14 11:30 /dev/hdc1
mount: you didn't specify a filesystem type for /dev/hdc1
       I will try all types mentioned in /etc/filesystems or /proc/filesystems
Trying vfat
mount: special device /dev/hdc1 does not exist
ls: /dev/hdc1: No such file or directory

and simultaneously the log shows a remove/add for hdc1 pair. so the act of
mounting causes a remove? That sure is strange.

However, mount specifying -t vfat seems to work, but only because it seems to
beat hotplug to the punch.  note that the trailing ls still fails to find the
device, because /dev/hdc1 has been removed by the same hotplug remove/add pair
that you saw above:

# ls -al /dev/hdc1 ; mount -v -t vfat /dev/hdc1 /mnt/cf ; ls -al /dev/hdc1
brw-rw----  1 root disk 22, 1 Aug 14 11:33 /dev/hdc1
/dev/hdc1 on /mnt/cf type vfat (rw)
ls: /dev/hdc1: No such file or directory

Similarly, hdparm -i /dev/hdc causes hotplug to remove/add hdc1

Indeed, the above suggested "touch /dev/hdc1" triggers a hotplug remove/add pair.

What does this break? /etc/pcmcia/ide for one. Though bug 120486 is suggesting
that it should be migrated out of the pcmcia scripts into hotplug.  but while
hotplug exhibits this hyper-aggressive remove/add behaviour, that just isn't
going to happen.
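
A rough Python sketch for spotting the remove/add thrash in an event log like the one quoted above. It assumes lines of the form "add for block (hdc1)" exactly as written in this comment, which may not match the real format of /var/log/hotplug/events; find_thrash is a hypothetical helper.

```python
import re

# Assumed line format, taken from the listing in this comment.
EVENT_RE = re.compile(r"^(add|remove) for block \((\w+)\)$")


def find_thrash(lines):
    """Return devices whose remove is followed by an add as the next block event."""
    thrashed = []
    last = None
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # non-block events (pcmcia, module, ide) are ignored
        action, dev = match.groups()
        if action == "add" and last == ("remove", dev):
            thrashed.append(dev)
        last = (action, dev)
    return thrashed
```

Fed the listing above, this would report hdc1 once; running it over a longer log gives a quick count of how often the pair fires.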

Comment 14 Eli 2005-10-29 16:08:50 UTC
This is still broken with a fully updated FC3 as of 2005-10-29.

*poke* *poke* :)

Comment 15 Alan Cox 2005-12-07 10:03:09 UTC
FC3 won't fix. Upstream changes for this did get discussed and will probably get
into FC5, and maybe FC4 as the upstream kernel changes.



Comment 16 David Zeuthen 2006-08-07 01:48:10 UTC
It appears this is working now (and probably has been for some time), so I've
removed the special handling for ide-cs in hal.
 http://gitweb.freedesktop.org/?p=hal;a=commit;h=602bbb270d0851047a0bebc442a1fdc92a4f91c7

Do you know when in 2.6 this change was introduced? The git history doesn't tell
me much...

Comment 18 Alan Cox 2006-08-08 10:27:19 UTC
It was done as part of fixing the ide_cs handling logic, unrelated to but happening
to fix the problem you see in that case. Hal will, I suspect, still trigger the
same behaviour if faced with a true removable such as an Iomega or maybe a clik
drive. Actually I've got a Clik; I should see if we still handle that right.


Comment 19 Alan Cox 2006-08-08 10:29:48 UTC
Ok we handle clik! as ide-floppy and it correctly avoids duplicate scans because
it has proper media change logic.


Comment 20 David Zeuthen 2006-08-08 15:00:43 UTC
Yup, that's here

 http://gitweb.freedesktop.org/?p=hal;a=blob;h=7cb74ee685d9da847c1060ea1277211ee6b8b657;hb=7b1d143b988b378b3269b767259d387e64b14718;f=fdi/preprobe/10osvendor/10-ide-drives.fdi

Right now we only support partitioned media (with the fs on partition 4) but
that's about to change in a release or two...