Bug 754518 (sd_revalidate_disk)
Summary: | oops in sd_revalidate_disk
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 16
Hardware: | x86_64
OS: | Unspecified
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | unspecified
Reporter: | Jaganadh G <jaganadhg>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | 2011zm, abdulkarimmemon, adundovi, aldrian_math, ananttickoo, anthonio_2002, a.thiaville, bcl, bonomi4, claudiomar.costa, emmanuel.pacaud, gansalmon, garyjeffersii, gindar, gpreziuso, hc_hdez, hous3y, hsirig+redhat, itamar, jks, jonathan, kb6tal, kernel-maint, kxra, lacombar, madhu.chinakonda, marco.capile, mikolaj.bugzilla, online-print, rwahl, sam, samuel, sgruszka, terry1, thenscheid, trialero, waltersheridan, zulu
Whiteboard: | abrt_hash:23f7c5cf9455b4f31807fa7b6e4907300b0b31e6
Fixed In Version: | kernel-2.6.42.7-1.fc15
Doc Type: | Bug Fix
Last Closed: | 2012-02-23 02:25:53 UTC
Description
Jaganadh G
2011-11-16 17:49:37 UTC
Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)

Comment
-----
Hibernate and resume the system

Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)

Comment
-----
kernel bug

Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)

Comment
-----
Resume the system after hibernating

*** Bug 757708 has been marked as a duplicate of this bug. ***

drivers/scsi/sd.c:2356

    static int sd_revalidate_disk(struct gendisk *disk)
    {
            struct scsi_disk *sdkp = scsi_disk(disk);
    ==>     struct scsi_device *sdp = sdkp->device;

sdkp is NULL here (scsi_disk(disk) returned NULL)

*** Bug 757563 has been marked as a duplicate of this bug. ***

*** Bug 757334 has been marked as a duplicate of this bug. ***

*** Bug 758793 has been marked as a duplicate of this bug. ***

*** Bug 759412 has been marked as a duplicate of this bug. ***

Package: kernel
Architecture: x86_64
OS Release: Fedora release 15 (Lovelock)

Comment
-----
I don't know exactly how this happened, as I was switching active windows while GNOME 3 was running, and the display briefly switched to the text console to show the oops. Prior to this I had unmounted and then unplugged a USB stick.

There was a patch posted for this, and a subsequent discussion of it:

http://www.spinics.net/lists/linux-scsi/msg55636.html

As far as I can tell, it died out with this final question:

http://www.spinics.net/lists/linux-scsi/msg55654.html

I spent the afternoon trying to recreate some sort of crash, to no avail. Using this:

    while [ 1 ]; do sudo fdisk -l /dev/sdc; sleep 0.05; done

to try and "read" a USB stick here, I then ran this:

    while [ 1 ]; do echo 1 > /sys/bus/usb/devices/1-1.2/bConfigurationValue ; sleep 1.5; echo 0 > /sys/bus/usb/devices/1-1.2/bConfigurationValue ; done

to simulate plugging and unplugging it. (I plugged/unplugged by hand for about 15 minutes, but that got really old.) There are all kinds of things in dmesg, but none of them are an oops. So it seems to be somewhat difficult to hit.
*** Bug 761324 has been marked as a duplicate of this bug. ***

*** Bug 761328 has been marked as a duplicate of this bug. ***

*** Bug 767960 has been marked as a duplicate of this bug. ***

*** Bug 768595 has been marked as a duplicate of this bug. ***

*** Bug 768642 has been marked as a duplicate of this bug. ***

1. Insert an SD card with an adapter in a card reader slot.
2. Wait something like 30 seconds.
3. Gnome-Shell crashes.

This has happened every time I have done this. The gnome-shell cannot be recovered even if I do: f2 + r + enter. The only way I have seen the computer recover "naturally" is when I suspend the machine while the crash is in progress, but the SD card is still not recognized.

Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)

I may have caused this by plugging in a 4GB thumbdrive for 1-2 seconds and then unplugging it.

Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)

I have the same issue under F15 with kernel 2.6.41.4-1.fc15.x86_64. It happened twice when removing a 16GB USB stick containing just random data and no valid partition table.

*** Bug 770606 has been marked as a duplicate of this bug. ***

*** Bug 771116 has been marked as a duplicate of this bug. ***

*** Bug 771055 has been marked as a duplicate of this bug. ***

*** Bug 750389 has been marked as a duplicate of this bug. ***

I got this report just after I disconnected a properly unmounted mass storage USB device.

Package: kernel
Architecture: x86_64
OS Release: Fedora release 16 (Verne)

Can reproduce on Fedora 16 using kernel 3.1.6-1.fc16.x86_64 when using a USB block device and a damaged cable that seems to 'bounce' the connection (connect, disconnect, repeatedly). Device has a valid 8GB FAT32 partition.
Stack trace: http://pastebin.com/raw.php?i=bAHdm0J4

I hit this again:

* Used digikam to import photos from my camera, mounted on /media/NIKON D40
* unmounted it
* ran sync
* waited a few seconds (camera light was not blinking)
* pulled cable

Created attachment 552276 [details]
oops
kernel is 2.6.41.4-1.fc15.x86_64 and system is a quad core i7-2600K
Package: kernel
Architecture: i686
OS Release: Fedora release 15 (Lovelock)

Comment
-----
I don't know how this happened.

Package: kernel
Architecture: x86_64
OS Release: Fedora release 15 (Lovelock)

Comment
-----
don't know

Mounting another partition, then unmounting it.

Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)

*** Bug 787029 has been marked as a duplicate of this bug. ***

*** Bug 789057 has been marked as a duplicate of this bug. ***

*** Bug 789052 has been marked as a duplicate of this bug. ***

Created attachment 560699 [details]
Don't dereference sdkp if it is NULL
Here's some helpful info. I hit this exact bug on vanilla 3.1.6 doing the same thing as the bug reporters were doing. I ejected then unmounted an sdcard. Then it crashed with the same backtrace. Looking at the code I have here:
(gdb) li *sd_revalidate_disk+0x39
0x26c9 is in sd_revalidate_disk (/home/rostedt/work/git/nobackup/linux-build.git/drivers/scsi/sd.c:2356).
2351 * @disk: struct gendisk we care about
2352 **/
2353 static int sd_revalidate_disk(struct gendisk *disk)
2354 {
2355 struct scsi_disk *sdkp = scsi_disk(disk);
2356 struct scsi_device *sdp = sdkp->device;
2357 unsigned char *buffer;
2358 unsigned flush = 0;
Seems that scsi_disk(disk) is returning NULL, which will crash when the next line is hit.
Looking at the disassembly of this code:
0x00000000000026c9 <+57>: mov 0x8(%rbx),%r15
And in my backtrace, %rbx is zero. The simple solution here is to return if sdkp is NULL. If it isn't supposed to be NULL, perhaps we can add a warn on, but let's not crash the kernel. It's becoming annoying (this is the third time it happened to me).
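To illustrate why a zero %rbx faults on that instruction, here is a small user-space sketch of the layout this reading assumes: a structure whose second pointer-sized member is `device`. The type names below are hypothetical stand-ins, not the kernel's definitions, and the offset of 8 only holds where pointers are 8 bytes wide (as on x86_64).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the layout matters. */
struct scsi_device_model { int id; };

struct scsi_disk_model {
    void *driver;                      /* assumed first member */
    struct scsi_device_model *device;  /* assumed second member */
};

/* Address the CPU would read for sdkp->device: base plus member offset. */
static size_t device_load_address(size_t sdkp_base)
{
    return sdkp_base + offsetof(struct scsi_disk_model, device);
}
```

With a base of zero, `device_load_address(0)` is just the member offset, which is one pointer width. Under these assumptions that corresponds to the `0x8(%rbx)` operand in the disassembly, an address in the unmapped first page.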
I just wrote the attached patch and will apply it to my custom kernel.
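The guard described in this comment can be sketched in user space roughly as follows. This is not the attached patch: the types are simplified stand-ins, `scsi_disk_lookup` is a hypothetical replacement for `scsi_disk()`, and the return values are an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layout). */
struct scsi_device { int online; };
struct scsi_disk   { struct scsi_device *device; };
struct gendisk     { struct scsi_disk *private_data; };

/* Hypothetical stand-in for scsi_disk(): may return NULL after teardown. */
static struct scsi_disk *scsi_disk_lookup(struct gendisk *disk)
{
    return disk ? disk->private_data : NULL;
}

/* Returns 1 if the device is present and online, 0 otherwise. */
static int revalidate_disk_sketch(struct gendisk *disk)
{
    struct scsi_disk *sdkp = scsi_disk_lookup(disk);
    struct scsi_device *sdp;

    if (!sdkp)          /* guard: do not dereference a NULL sdkp */
        return 0;

    sdp = sdkp->device; /* safe only after the check above */
    return sdp != NULL && sdp->online;
}
```

Calling `revalidate_disk_sketch()` on a `struct gendisk` whose `private_data` is NULL returns 0 instead of faulting, which is the behaviour the comment asks for. It does not address why the pointer is NULL in the first place.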
1. hibernating
2. plug-in mobile broadband usb stick
3. resuming
4. crash

Package: kernel
OS Release: Fedora release 16 (Verne)

(In reply to comment #35)
> Created attachment 560699 [details]
> Don't dereference sdkp if it is NULL
>
> Here's some helpful info. I hit this exact bug on vanilla 3.1.6 doing the same
> thing as the bug reporters were doing. I ejected then unmounted an sdcard. Then
> it crashed with the same backtrace. Looking at the code I have here:
>
> (gdb) li *sd_revalidate_disk+0x39
> 0x26c9 is in sd_revalidate_disk
> (/home/rostedt/work/git/nobackup/linux-build.git/drivers/scsi/sd.c:2356).
> 2351 * @disk: struct gendisk we care about
> 2352 **/
> 2353 static int sd_revalidate_disk(struct gendisk *disk)
> 2354 {
> 2355 struct scsi_disk *sdkp = scsi_disk(disk);
> 2356 struct scsi_device *sdp = sdkp->device;
> 2357 unsigned char *buffer;
> 2358 unsigned flush = 0;
>
> Seems that scsi_disk(disk) is returning NULL, which will crash when the next
> line is hit.
>
> Looking at the disassembly of this code:
>
> 0x00000000000026c9 <+57>: mov 0x8(%rbx),%r15
>
> And in my backtrace, %rbx is zero. The simple solution here is to return if
> sdkp is NULL. If it isn't supposed to be NULL, perhaps we can add a warn on, but
> let's not crash the kernel. It's becoming annoying (this is the third time it
> happened to me).
>
> I just wrote the attached patch and will apply it to my custom kernel.

Yeah, that's basically the exact same patch that was sent upstream. James replied with something hand-wavy about that not being proper. The maintainer's response was... lacking.

However, I think I agree with you. I don't care if it shouldn't be NULL, because it obviously is and we shouldn't crash. I'll poke upstream about this one more time and if it doesn't go anywhere we should just apply this patch.

There's yet another thread asking for status on this upstream and no response.
http://thread.gmane.org/gmane.linux.scsi/71496/focus=1233463

I'm applying Steven's patch across the releases. There's no good excuse for letting Fedora users trip over this at this point.

Applied across all Fedora branches.

*** Bug 787047 has been marked as a duplicate of this bug. ***

Patch is just workaround, I prefer if we add WARN_ON(), so we will know this bug is still unfixed.

(In reply to comment #41)
> Patch is just workaround, I prefer if we add WARN_ON(), so we will know this
> bug is still unfixed.

I'm not opposed to that. I'll look at doing it later today, though I will probably use WARN_ONCE.

Created attachment 561664 [details]
WARN_ONCE on null skdp
Stanislaw, Steven, does this look suitable?
I tried to think of printing more information that might be relevant to root cause, but that is going to be much more involved than doing a simple stop-gap patch. One would likely need to track the reference counting on the device from creation to tear-down.
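As a rough user-space illustration of the warn-once idea discussed here (not the attached patch), the sketch below uses a hypothetical `warn_once()` helper that reports the condition on every call but prints only the first time. The names `warn_once`, `warnings_printed`, `scsi_disk_model` and `revalidate_with_warning` are all invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int warnings_printed; /* counts emitted messages, for demonstration */

/* Hypothetical imitation of a warn-once helper: returns the condition
 * every time, but prints the message only on the first true condition. */
static bool warn_once(bool condition, const char *msg)
{
    static bool already_warned;

    if (condition && !already_warned) {
        already_warned = true;
        warnings_printed++;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return condition;
}

struct scsi_disk_model { void *device; };

/* Returns 0 after bailing out on a NULL sdkp, 1 otherwise. */
static int revalidate_with_warning(struct scsi_disk_model *sdkp)
{
    if (warn_once(sdkp == NULL, "sdkp is NULL in revalidate sketch"))
        return 0;
    return 1;
}
```

Calling `revalidate_with_warning(NULL)` repeatedly bails out each time but logs a single message, so the condition stays visible without flooding the log. One simplification to note: this helper keeps one flag shared by every caller, whereas the kernel's `WARN_ONCE` macro keeps a separate flag per call site.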
*** Bug 790423 has been marked as a duplicate of this bug. ***

(In reply to comment #43)
> Stanislaw, Steven, does this look suitable?

Patch is ok for me.

(In reply to comment #45)
> (In reply to comment #43)
> > Stanislaw, Steven, does this look suitable?
> Patch is ok for me.

Thanks. I've updated the patch in the f17 and master branches since those are the newest kernel where this remains unfixed.

*** Bug 790982 has been marked as a duplicate of this bug. ***

Seems proper fix showed up:
http://marc.info/?l=linux-scsi&m=132935572512352&w=2

(In reply to comment #48)
> Seems proper fix showed up:
> http://marc.info/?l=linux-scsi&m=132935572512352&w=2

Yep, I saw that too. I'm watching the thread to see if it works for some of the users reporting the issue (and to see if/when it goes into the block tree). I'll bring it into Fedora soon.

*** Bug 795081 has been marked as a duplicate of this bug. ***

kernel-3.2.7-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.2.7-1.fc16

kernel-2.6.42.7-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.42.7-1.fc15

Package kernel-2.6.42.7-1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.42.7-1.fc15'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2136/kernel-2.6.42.7-1.fc15
then log in and leave karma (feedback).

kernel-3.2.7-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.

kernel-2.6.42.7-1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.

*** Bug 799308 has been marked as a duplicate of this bug. ***