Description of problem: I just loaded a brand new DVD-RAM disc into a Plextor DVD PX-810SA (SATA) drive and tried to format it with dvd+rw-format. The format never finishes. It reports progress up to about 70% of the disc size and then aborts (the computer may even freeze, if my memory serves me well). When I try to mount the DVD-RAM (without first creating a new filesystem, which I have not managed to do either) the computer immediately CRASHES. I cannot use this drive for anything useful (I even gave up trying to upgrade my system to Fedora 7 with it, although in that case there seems to be a widely reported bug that may account for the failure). It has failed every time for me. I don't think this particular drive is faulty, since the vendor claims it works OK with one of its Windows machines, and because I also briefly tried another drive of the same model.
Plextor drives may not be completely MMC compliant. Apart from that, the computer should _not_ crash. Does it crash _hard_? Or are you able to do some things after the crash?
(In reply to comment #1)
> Plextor drives may not be completely MMC compliant. Apart from that, the computer
> should _not_ crash.
> Does it crash _hard_? Or are you able to do some things after the crash?

It really crashes; the first thing I see after that is the POST screen. (This happens when trying to mount the disc; when trying to format I can often continue working on other tasks.)
kernel should not crash on mounting a dvd-ram
There is no need to format or use any special tools at all with DVD-RAM discs. To create a bootable disk from the Fedora install ISO, just:

    dd if=<filename> of=<cd device> bs=64k

How are you trying to mount the disc? It should be pre-formatted as UDF.
Here is all the info I can remember right now. I doubt I'll be able to tell you more than this...

1) The drive is a Plextor DVD PX-810SA (SATA), which according to some rumours on the web is just a model from another maker in disguise (possibly with Plextor's own firmware).

2) The discs are marked Verbatim DVD-RAM 4.7 GB 120 min 3x certified, bought brand new.

3) I was never able to format the discs with dvd+rw-format; I used something like

    dvd+rw-format -force=full -ssa=default /dev/scd0

and I think I also tried other combinations of the options. The format stopped after reaching about 70% of the disc (this took a long time, certainly more than 30 min) and then the system would _hang_ on some occasions (I think). Maybe there is an implicit time bound in dvd+rw-format that should be increased so that the formatting has enough time to finish.

4) Supposedly, the discs come pre-formatted, but (most likely) with no predefined filesystem; I wasn't able to define a filesystem, because mkfs.ext2 would tell me that no changes could be saved:

    prompt# /sbin/mkfs.ext2 /dev/scd0
    mke2fs 1.40.2 (12-Jul-2007)
    /dev/scd0 is entire device, not just one partition!
    Proceed anyway? (y,n) y
    /dev/scd0: Read-only file system while setting up superblock

The device is not declared in fstab, does it make a difference? Alternatively:

    prompt# /sbin/mkfs.ext2 -F -F /dev/scd0p1
    mke2fs 1.40.2 (12-Jul-2007)
    mkfs.ext2: No such file or directory while trying to determine filesystem size

Or:

    prompt# /sbin/mkfs.ext2 /dev/scd0p1
    mke2fs 1.40.2 (12-Jul-2007)
    Could not stat /dev/scd0p1 --- No such file or directory

    The device apparently does not exist; did you specify it correctly?

And if I try fdisk:

    prompt# fdisk /dev/scd0
    You will not be able to write the partition table.
    Note: sector size is 2048 (not 512)
    Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
    Building a new DOS disklabel. Changes will remain in memory only,
    until you decide to write them. After that, of course, the previous
    content won't be recoverable.

    Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

    Command (m for help): p

    Disk /dev/scd0: 4580 MB, 4580769792 bytes
    255 heads, 63 sectors/track, 139 cylinders
    Units = cylinders of 16065 * 2048 = 32901120 bytes

    Device Boot Start End Blocks Id System

    Command (m for help): q

If I try to create a (single) partition it then tells me:

    Unable to write /dev/scd0

5) It is the first time I am attempting to record DVD-RAM... I don't know much about UDF, I just tried

    mkudffs --udfrev=0x0201 --blocksize=4096 --vid=DVD-RAM --lvid=DVD-RAM --media-type=dvdram /dev/scd0

and got the message:

    trying to change type of multiple extents

6) So in despair I tried to mount a disc with

    mount /dev/scd0 /mnt/mount_point

and the net result was an immediate crash of the computer (I tried this a few times and the result was always the same). Even if there is no filesystem the result should be just an error message sent to the screen, no?

7) To sum up: I haven't executed a single successful command yet!
What does /proc/sys/dev/cdrom/info say about the CD drive? Please post the contents...
prompt# cat /proc/sys/dev/cdrom/info
CD-ROM information, Id: cdrom.c 3.20 2003/12/17

drive name:             sr1     sr0
drive speed:            40      0
drive # of slots:       1       1
Can close tray:         1       0
Can open tray:          1       0
Can lock tray:          1       1
Can change speed:       1       1
Can select disk:        0       0
Can read multisession:  1       1
Can read MCN:           1       1
Reports media changed:  1       1
Can play audio:         1       1
Can write CD-R:         1       0
Can write CD-RW:        1       0
Can read DVD:           0       0
Can write DVD-R:        0       0
Can write DVD-RAM:      0       0
Can read MRW:           1       0
Can write MRW:          1       0
Can write RAM:          1       1

Observations: there are really two drives, the other being a PX-W4012A (a CD recorder from Plextor too, which has been working fine as far as I am aware, although I don't use it often); the actual devices are /dev/scd0 and /dev/scd1 (not sr0 and sr1 as listed in the info file, but that is probably OK); looking at the column for the DVD it sure looks a bit "unsupported"...
I have this for one that I know works (PATA, not SATA though.) Yours is SATA and isn't detecting any DVD capability at all.

drive name:             sr0
drive speed:            24
drive # of slots:       1
Can close tray:         1
Can open tray:          1
Can lock tray:          1
Can change speed:       1
Can select disk:        0
Can read multisession:  1
Can read MCN:           1
Reports media changed:  1
Can play audio:         1
Can write CD-R:         1
Can write CD-RW:        1
Can read DVD:           1
Can write DVD-R:        1
Can write DVD-RAM:      1
Can read MRW:           1
Can write MRW:          1
Can write RAM:          1

So something is very wrong somewhere with yours. Please post the contents of /var/log/dmesg so we can see what gets detected. Alan, does the ATA subsystem work with SATA DVD-RAM drives?
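For anyone comparing listings like the ones above, a small helper along these lines may save some squinting. It is only a sketch: cdrom_cap is a made-up name, and it assumes the info file keeps the "label: value value" layout shown here, with one column per drive in the order given by the "drive name" row.

```shell
#!/bin/sh
# Print the value of one capability for one drive, reading text in the
# format of /proc/sys/dev/cdrom/info from stdin.
# cdrom_cap is a hypothetical helper name, not an existing tool.
# Usage: cdrom_cap DRIVE "CAPABILITY LABEL" < /proc/sys/dev/cdrom/info
cdrom_cap() {
    awk -v drive="$1" -v cap="$2" '
        BEGIN { FS = ":" }
        # The "drive name" row fixes the column order, e.g. "sr1 sr0".
        $1 == "drive name" {
            n = split($2, names, " ")
            for (i = 1; i <= n; i++)
                if (names[i] == drive) col = i
        }
        # Every later row holds one value per drive, in the same order.
        col && $1 == cap {
            split($2, vals, " ")
            print vals[col]
        }
    '
}
```

For example, `cdrom_cap sr0 "Can write DVD-RAM" < /proc/sys/dev/cdrom/info` would print the DVD-RAM write flag for sr0 only.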
Created attachment 237981 [details] Contents of dmesg Contents of dmesg in attachment.
libata is happy with SATA DVD devices and actually just uses the core SCSI code for them. Might be worth forcing the controller out of ADMA mode ?
From dmesg:

    sr0: scsi3-mmc drive: 0x/0x caddy

So it's not detecting the drive capabilities at all... Paulo, can you try disabling ADMA mode? You would need to edit /etc/modprobe.conf and then rebuild the initrd to get the driver options into the initrd. Directions are at: https://fedoraproject.org/wiki/KernelCommonProblems
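A minimal sketch of that change, assuming the sata_nv option is called adma (the name used for it later in this report) and that mkinitrd is the initrd tool on this Fedora release. add_adma_option is a hypothetical helper; it takes the file path as an argument so it can be tried on a copy before touching the real /etc/modprobe.conf.

```shell
#!/bin/sh
# Append "options sata_nv adma=0" to a modprobe configuration file,
# unless an adma setting for sata_nv is already present in it.
# add_adma_option is a made-up name for illustration.
add_adma_option() {
    conf="$1"
    if ! grep -Eq '^[[:space:]]*options[[:space:]]+sata_nv[[:space:]].*adma=' "$conf" 2>/dev/null; then
        echo 'options sata_nv adma=0' >> "$conf"
    fi
}

# Intended use, as root (not executed by this sketch):
#   add_adma_option /etc/modprobe.conf
#   mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
```

Keeping a copy of the original initrd image before rebuilding makes it easy to boot the unmodified setup again.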
From new dmesg: (!)

    sr0: scsi3-mmc drive: 39x/39x writer dvd-ram cd/rw xa/form2 cdda tray

The new /proc/sys/dev/cdrom/info looks very promising too... so let us try mkfs.

    prompt# /sbin/mkfs.ext2 /dev/scd0
    mke2fs 1.40.2 (12-Jul-2007)
    /dev/scd0 is entire device, not just one partition!
    Proceed anyway? (y,n) y
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    560000 inodes, 1118352 blocks
    55917 blocks (5.00%) reserved for the super user
    First data block=0
    Maximum filesystem blocks=1149239296
    35 block groups
    32768 blocks per group, 32768 fragments per group
    16000 inodes per group
    Superblock backups stored on blocks:
            32768, 98304, 163840, 229376, 294912, 819200, 884736

    Writing inode tables: done
    Writing superblocks and filesystem accounting information: done

    This filesystem will be automatically checked every 32 mounts or
    180 days, whichever comes first. Use tune2fs -c or -i to override.

Next try mount...

    prompt# mount /dev/scd0 /mnt/dvd
    prompt# df -k | grep scd0
    /dev/scd0 4403064 8760 4170636 1% /mnt/dvd

Now copy a full (but small) directory, unmount, eject, close tray, mount again, and try a recursive diff. Success! I'm speechless! I will try fdisk, dvd+rw-format, and mkudffs hopefully tomorrow or else early next week. Thanks!
Can we detect ATAPI and drop out of ADMA mode automatically? Or is it supposed to work with ATAPI devices?
fdisk: fdisk seems able to modify the partition table of the DVD-RAM disk; for example, after creating some partitions I ran fdisk a second time just to check whether the changes had been saved and got

    prompt# fdisk /dev/scd0
    Note: sector size is 2048 (not 512)

    Command (m for help): p

    Disk /dev/scd0: 4580 MB, 4580769792 bytes
    255 heads, 63 sectors/track, 139 cylinders
    Units = cylinders of 16065 * 2048 = 32901120 bytes

         Device Boot      Start         End      Blocks   Id  System
    /dev/scd0p1               1         100     3212874   83  Linux
    /dev/scd0p2             101         139     1253070   83  Linux

But then trying to use mkfs (mkfs.ext2, mkfs.ext3, mkreiserfs) got me nowhere, since I get messages like

    Could not stat /dev/scd0p1 --- No such file or directory

    The device apparently does not exist; did you specify it correctly?

or

    Stat of the device '/dev/scd0p1' failed.

As I said, I'm new to DVD-RAM, so maybe I simply should not partition the disk... or use a different notation for the partitions...

When using the full device /dev/scd0 as the argument to mkfs, I seem to have obtained successful filesystem creation with the following: mkfs.ext2, mkfs.ext3, mkreiserfs (with -f).

As for mkudffs, it also seems OK:

    prompt# mkudffs --udfrev=0x0201 --blocksize=4096 --vid=DVD-RAM --lvid=DVD-RAM --media-type=dvdram /dev/scd0
    start=0, blocks=8, type=RESERVED
    start=8, blocks=3, type=VRS
    start=11, blocks=245, type=USPACE
    start=256, blocks=1, type=ANCHOR
    start=257, blocks=16, type=PVDS
    start=273, blocks=1, type=LVID
    start=274, blocks=1117821, type=PSPACE
    start=1118095, blocks=1, type=ANCHOR
    start=1118096, blocks=239, type=USPACE
    start=1118335, blocks=16, type=RVDS
    start=1118351, blocks=1, type=ANCHOR

It remains to check dvd+rw-format, which I may need to perform a low level blanking (or maybe I can use badblocks?)

By the way, is there a simple explanation of what ADMA is?
You can't low-level format a DVD-RAM disk; they are permanently formatted and work just like a removable hard drive. If you want to blank one, just:

    dd if=/dev/zero of=/dev/scdX bs=64K

Since there's no support for partitioned CDs, you probably can't partition the DVD-RAM. LVM might work. Best bet is to just stick with UDF on unpartitioned disks, though.
The driver is supposed to switch the controller out of ADMA mode internally if an ATAPI device is connected, and it seems that does happen here (see the code in nv_adma_slave_config which sets different DMA parameters for ATAPI devices and also sets the NV_ADMA_ATAPI_SETUP_COMPLETE flag to indicate that requests should be issued and completed in non-ADMA mode). Therefore theoretically the adma=0 option should make no difference and should not be needed. Apparently in some way it does, though. It's curious (and rather unfortunate) that the sr driver is saying "0x/0x caddy" but doesn't indicate why. Is there a way to make it more verbose about what happened?
Collar Jeff Garzik on sata_nv stuff. The 0x/0x is hard to tell - I see no obvious errors so it may just be a side effect of whatever the real breakage is.
Created attachment 242451 [details] new dmesg file The new dmesg file seems to differ from the old file at some other points, for the ATA/SATA thing. One thing that caught my attention was how some hexadecimal constants could be so different. Maybe that is just part of what the adma option does... if not, it could provide you with a clue. New dmesg as attachment.
The difference in the cmd and ctl values is expected, as we are using MMIO instead of PIO to access the controller. I see your machine has memory mapped above the 4GB mark. In ADMA mode we can access the entire 64-bit address space but in legacy mode, as we switch into for ATAPI, we can only access 32-bit (4GB). We are calling blk_queue_bounce_limit with the lower mask, but maybe that's not working somehow. Can you try booting with ADMA enabled and the option mem=3000M to see what that does? That will prevent that high memory from being used which may make a difference. If that doesn't help, could you try moving the DVD drive to another SATA port on the motherboard? Each controller device runs two of the ports on the board, and you have the DVD and the HD on the same one. Could be an issue with ATAPI and non-ATAPI on the same controller.
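One way to check whether a machine really has RAM mapped above the 4GB mark is to look at the "System RAM" ranges in /proc/iomem. The function below is a rough sketch of that check: ram_above_4g is a made-up name, and it assumes the usual "start-end : name" layout with hexadecimal addresses.

```shell
#!/bin/sh
# Read text in /proc/iomem format on stdin and print "yes" if any
# "System RAM" range ends above 0xffffffff (the 32-bit limit),
# otherwise "no". ram_above_4g is a hypothetical helper name.
ram_above_4g() {
    awk -F' : ' '
        $2 == "System RAM" {
            split($1, r, "-")       # r[2] is the end address in hex
            end = r[2]
            sub(/^0+/, "", end)     # drop leading zeros
            # More than 8 hex digits means the address is above 0xffffffff.
            if (length(end) > 8) found = 1
        }
        END { print (found ? "yes" : "no") }
    '
}
```

Run as `ram_above_4g < /proc/iomem`. Booting with mem=3000M, as suggested above, should turn a "yes" into a "no" on an affected machine.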
Created attachment 242611 [details] dmesg with mem=3000M and original initrd-2.6.22.9-91.fc7.img

Robert, that is probably a good guess, because I tried it and things appear to be fine at the moment; in fact I am now (running with the original initrd and the memory bound you suggested) verifying a backup I did a couple of hours ago on a DVD-RAM disk while using the modified initrd with adma=0, and it seems to be doing its work (if somewhat slowly, but the backup amounts to about 1.5 GB).

This reminds me that when I had problems doing the upgrade to FC7 the behaviour of the installation process seemed to depend on the amount of RAM installed, either 2GB or 4GB (i.e., it never worked right, I think, but even then the sequence of options presented to me by the installation procedure was not the same, as far as I remember). Then after using memtest and finding no problems, I borrowed another drive and forgot about it.

Third dmesg follows, for the original initrd-2.6.22.9-91.fc7.img and mem=3000M.

PS I occasionally get sporadic errors when copying (and then verifying) large directories (say hundreds of megabytes or gigabytes); is there any suspicion that e.g. the nv chipset is buggy? To give an example, the recursive diff I mentioned before has now been run twice with conflicting and erroneous output: the first time it told me that a single file was different, and the second time that the contents of an unrelated directory were different; however, when I restricted the diff to just those files, it reported no differences. I also remember something of the kind involving two hard drives, the internal ATA hard disk and a similar external one connected via USB.
Created attachment 246361 [details] Patch to fix ATAPI issues with memory >4GB It looks like there were some problems with ATAPI device handling where the PRD table and padding buffer could be allocated with an incorrect DMA mask for operating in legacy mode. Can you test this patch? It's against 2.6.24-rc1-git10, but I think it should also apply to 2.6.23.
If I understand correctly, you would like me to apply this patch to a source version of the kernel, then compile and install it. Please note that I am mostly used to yum and (binary?) rpms... OK, I have also used configure/Make a few times, but _never_ for the kernel... (assuming it's the same process) If it's somewhat different, are there any instructions available? Can one clean the "mess" afterwards without much trouble?

I downloaded a couple of kernels from kernel.org, and just tried patch for a start; you'll find the results below...

    prompt# patch ./linux-2.6.23.1/drivers/ata/sata_nv.c sata_nv-fix-atapi-issues-over-4gb.patch
    patching file ./linux-2.6.23.1/drivers/ata/sata_nv.c
    Hunk #1 FAILED at 247.
    Hunk #2 FAILED at 748.
    Hunk #3 FAILED at 782.
    Hunk #4 succeeded at 693 (offset -133 lines).
    Hunk #5 succeeded at 1172 (offset -1 lines).
    Hunk #6 succeeded at 1056 (offset -133 lines).
    Hunk #7 FAILED at 1266.
    4 out of 7 hunks FAILED -- saving rejects to file ./linux-2.6.23.1/drivers/ata/sata_nv.c.rej

    prompt# patch ./linux-2.6.24-rc1/drivers/ata/sata_nv.c sata_nv-fix-atapi-issues-over-4gb.patch
    patching file ./linux-2.6.24-rc1/drivers/ata/sata_nv.c
    Hunk #5 succeeded at 1172 (offset -1 lines).
    Hunk #7 FAILED at 1399.
    1 out of 7 hunks FAILED -- saving rejects to file ./linux-2.6.24-rc1/drivers/ata/sata_nv.c.rej

How do I get the exact version of the kernel you mentioned? (there are some patches on kernel.org but I am not sure about the conventions!)
There are directions for building a custom Fedora kernel here: http://fedoraproject.org/wiki/Docs/CustomKernel
If you want to try it against the exact version, you'd need to first get 2.6.23 (not 2.6.23.1 - use the B link on the main kernel.org page for 2.6.24-rc1 to get plain 2.6.23), then get the 2.6.24-rc1 patch and apply it, then get the 2.6.24-rc1-git10 patch (which is off the main page now, but git11 would likely work just as well, or you can dig through their download site to find git10) and apply that. Or, a Fedora development kernel source RPM would likely work just as well, assuming it's tracking upstream -git..
I am now more confused than before and need help, please. The link on comment #23 leads to a page where it is assumed one has a src.rpm of the kernel, right? I simply have a source tree for _some_ versions of the kernel, and I can't seem to get src.rpms for 2.6.23 either (kernel.org does not distribute src.rpms, right?). Updates for Fedora 7 seem to have jumped from 2.6.22.9-91 to 2.6.23.1-10. So while I could get a src.rpm for 2.6.23.1-10 you seem to imply that is no good. Just before reading comment #24 I had tried applying patch 2.6.24-rc1-git10 on 2.6.24-rc1, and then your patch on the result, with no apparent errors; however, you seem to imply that is not OK, not to mention that I have no src.rpm either.
If you've got 2.6.24-rc1-git10 patched without errors, you should be able to use that tree. The procedure on the Fedora site is for building a kernel from a source RPM, not from the kernel source tarball. To build from the tarball you would:

Install the latest Fedora update kernel, and copy its config-(version) file into the root directory of the patched kernel tree, renaming it to .config. Then do:

    make oldconfig

This will prompt you for all the config settings which are new in 2.6.24. You can likely just hit enter to accept the defaults for all of them. Then do a regular "make", then when it finishes, log in as root and do:

    make modules_install install

That should get you a bootloader entry that you can select on reboot.
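The steps above could be collected into a small function along these lines. This is only a sketch: build_patched_kernel is a made-up name, both paths in the example call are assumptions (in particular that the installed kernel's config file sits under /boot), and the last step needs root.

```shell
#!/bin/sh
# Sketch of the build steps described above. Nothing runs until the
# function is called. The body is a subshell, so the caller's working
# directory is left alone.
#   $1  path to the patched kernel tree
#   $2  path to the config file of the installed Fedora kernel
build_patched_kernel() (
    tree="$1"
    config="$2"
    # Each step only runs if the previous one succeeded.
    # "make oldconfig" asks about options that are new in this kernel;
    # "make modules_install install" must be run as root.
    cp "$config" "$tree/.config" &&
    cd "$tree" &&
    make oldconfig &&
    make &&
    make modules_install install
)

# Example call (paths are assumptions, adjust to your system):
#   build_patched_kernel ~/linux-2.6.24-rc1 /boot/config-$(uname -r)
```

Running the first four steps as a normal user and only the install step as root keeps the source tree owned by the user who patched it.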
Created attachment 247451 [details] dmesg for 2.6.24-rc1-git10 The patch does not work yet, I'm afraid. With 2.6.24-rc1-git10 the system boots (I am running it right now) although it prints "sysctl table check failed" With your patch sata_nv-fix-atapi-issues-over-4gb.patch the system does not seem to boot (if it does it will take a very long time): a bit after printing the above message it then enters into a sequence (cycle?) of "Call Trace" error notifications, clearly involving the terms "nv_adma_interrupt" "libata" "scsi_mod" Attachment is for dmesg with normal 2.6.24-rc1-git10
Not a lot to go on just from that I'm afraid.. Could you capture the call trace contents, maybe by taking a picture of the screen with a digital camera? If you boot with "vga=ask" on the kernel command line, it will prompt you to select the console video mode. If you select the mode with the most rows (80x60 or something) it will let you see more of the crash on the screen.
Better still would be having a boot option like calltracedelay=50s so that one could have a good look at each report (Ctrl-S seems not to work). Now that I have a properly compiled kernel tree, would it be very difficult for you to give me a patch for the routine that displays Call Traces, assuming that is possible? (I mean a simple delay implemented via a local parameter.) I don't have a digital camera myself; I'll see what I can do.
The fedora kernel has a patch for slowing down the boot messages: linux-2.6-debug-boot-delay.patch You add "boot_delay=<number>" to the command line to enable it.
Here's what I have for now -- let's hope there is no transcription error... I'll try to confirm anything that may not seem quite right. Please note that the output may have been _slightly_ reformatted.

    Welcome to Fedora
    Press 'I' to enter interactive startup.
    Setting clock (utc): Tue Nov 6 ...
    Starting udev: Unable to handle NULL pointer dereference at 0000000000000000 RIP:
    [<ffffffff880f30de>] :libata:ata_qc_prep+0xdb/0x156
    PGD 101d1a067 PUD 10218a067 PMD 0
    Oops: 0002 [1] SMP
    CPU 1
    Modules linked in: button sr_mod k8temp soundcore pata_amd hwmon i2c_nforce2 pcspkr forcedeth snd_page_alloc cdrom i2c_core sg sata_nv ata_generic libata sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
    Pid: 830, comm: modprobe Not tainted 2.6.24-rc1-git10 #0
    RIP: 0010:[<ffffffff880830de>] [<ffffffff880830de>] :libata:ata_qc_prep+0xdb/0x156
    RSP: 0000:ffff8101025d1968 EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff810102c2c0e0 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000080 RDI: ffff81010247d600
    RBP: ffff810102c2c000 R08: 0000000000000000 R09: 0000000000c00000
    R10: 0000000000000001 R11: ffffffff880ab000 R12: ffff810102c2e420
    R13: ffff810102c2c000 R14: ffff81010247d600 R15: ffff810102c2e2f0
    FS: 00002b21005576f0(0000) GS:ffff810100001800(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000001024d6000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

At some later point:

    Call Trace:
    <IRQ> [<ffffffff880ac9f7>] :sata_nv_adma_interrupt+0x19/0x3cc
    [<ffffffff8106a88c>] handle_IRQ_event+0x25/0x53[<ffffffff8106a88c>] handle_
    [<ffffffff8106be07>] handle_fasteoi_irq+0x94/0xd1
    [<ffffffff8100e65b>] do_IRQ+oxf1/0x162
    [<ffffffff8100c3a1>] ret_from_intr+0x0/0xa
    <EOI> [<ffffffff811175b3>] number +0x37/0x1f9
    [<ffffffff8111768a>] number +0x10e/0x1f9
    [<ffffffff810a6ee9>] d_rehash+0x21/0x23
    [<ffffffff810d35bc>] proc_lookup+0x96/0xea
    [<ffffffff81072e6f>] __rmqueue_smallest+0x89/0x103
    [<ffffffff81118371>] vsnprintf+0x554/0x598
    [<ffffffff810afc25>] seq_printf+0x67/0x8f
    [<ffffffff81074b04>] __alloc_pages+0x7e/0x313
    [<ffffffff81074065>] __get_free_pages+0xe/0x4d
    [<ffffffff810d55c3>] show_stat+0x3af/0x41a
    [<ffffffff810b00da>] seq_read+0x106/0x284
    [<ffffffff81011887>] arch_get_unmapped_area+0x184/0x1f9
    [<ffffffff810cf2d9>] proc_reg_read+0x7e/0x99
    [<ffffffff8109811b>] vfs_read+0xc3/0x16b
    [<ffffffff810984fc>] sys_read+0x45/0x6e
    [<ffffffff8100be8e>] system_call+0x7e/0x83
Sorry, the earlier call trace should read as follows.

    Call Trace:
    <IRQ> [<ffffffff880ac9f7>] :sata_nv_adma_interrupt+0x19/0x3ec
    [<ffffffff8106a88c>] handle_IRQ_event+0x25/0x53
    [<ffffffff8106be07>] handle_fasteoi_irq+0x94/0xd1
    [<ffffffff8100e65b>] do_IRQ+0xf1/0x162
    [<ffffffff8100c3a1>] ret_from_intr+0x0/0xa
    <EOI> [<ffffffff811175b3>] number +0x37/0x1f9
    [<ffffffff8111768a>] number +0x10e/0x1f9
    [<ffffffff810a6ee9>] d_rehash+0x21/0x34
    [<ffffffff810d35bc>] proc_lookup+0x96/0xea
    [<ffffffff81072e6f>] __rmqueue_smallest+0x89/0x103
    [<ffffffff81118371>] vsnprintf+0x554/0x598
    [<ffffffff810afc25>] seq_printf+0x67/0x8f
    [<ffffffff81074b04>] __alloc_pages+0x7e/0x313
    [<ffffffff81074065>] __get_free_pages+0xe/0x4d
    [<ffffffff810d55c3>] show_stat+0x3af/0x41a
    [<ffffffff810b00da>] seq_read+0x106/0x284
    [<ffffffff81011887>] arch_get_unmapped_area+0x184/0x1f9
    [<ffffffff810cf2d9>] proc_reg_read+0x7e/0x99
    [<ffffffff8109811b>] vfs_read+0xc3/0x16b
    [<ffffffff810984fc>] sys_read+0x45/0x6e
    [<ffffffff8100be8e>] system_call+0x7e/0x83
I suspect either the stack trace is missing some things or the two crashes don't really line up, since it shouldn't be able to get from nv_adma_interrupt to ata_qc_prep, at least not in one step. Can you tell if these are from the same oops output? The key parts needed would be the RIP line and the first few call trace entries..
If you are saying that these two sets of messages should not be consecutive, you may well be right. I did write "At some later point:" between the two. In no way did I mean to say that the call trace I reported was the first one. I didn't find the patch linux-2.6-debug-boot-delay.patch yet, I'll try a bit more (please note that this report does not refer to a patched (?) Fedora kernel!). So I cannot really tell what is the first call trace error dumped on the screen.
I believe that patch has been merged already in 2.6.24-rc1, so you should be able to use it already. The delay value is in milliseconds, so boot_delay=1000 would delay 1 second for each line..
Another change you could try - I don't have time to make up a proper patch right now, but in drivers/ata/sata_nv.c in nv_adma_qc_prep, around line 1383, you've got:

    if (nv_adma_use_reg_mode(qc)) {
            nv_adma_register_mode(qc->ap);
            ata_qc_prep(qc);
            return;
    }

Can you try changing that to:

    if (nv_adma_use_reg_mode(qc)) {
            BUG_ON((qc->flags & ATA_QCFLAG_DMAMAP) &&
                   !(pp->flags & NV_ADMA_ATAPI_SETUP_COMPLETE));
            nv_adma_register_mode(qc->ap);
            ata_qc_prep(qc);
            return;
    }

i.e. add the BUG_ON line, and see what happens there? I suspect what's happening is that somebody's issuing a DMA command on a device which ends up going through the register interface but shouldn't be because we haven't set up the port for non-ADMA mode yet, but we can't do register-mode DMA in ADMA mode. In that case with this change it will still crash, but hopefully earlier and in a more useful manner..
The first part is very similar:

    Welcome to Fedora
    Press 'I' to enter interactive startup.
    Setting clock (utc): Tue Nov 6 ...
    Starting udev: Unable to handle NULL pointer dereference at 0000000000000000 RIP:
    [<ffffffff880830de>] :libata:ata_qc_prep+0xdb/0x156
    PGD 102db2067 PUD 102dc8067 PMD 0
    Oops: 0002 [1] SMP
    CPU 1
    Modules linked in: button sr_mod k8temp soundcore pata_amd hwmon i2c_nforce2 pcspkr forcedeth snd_page_alloc cdrom i2c_core sg sata_nv ata_generic libata sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
    Pid: 875, comm: modprobe Not tainted 2.6.24-rc1-git10 #0
    RIP: 0010:[<ffffffff880830de>] [<ffffffff880830de>] :libata:ata_qc_prep+0xdb/0x156
    RSP: 0018:ffff8101025d1968 EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff810102c200e0 RCX: ffff810102095968
    RDX: 0000000000000000 RSI: 0000000000000080 RDI: ffff810102795100
    RBP: ffff810102c2c000 R08: 0000000000000000 R09: 0000000000c00000

Now the first call trace, I think... (the extra BUG_ON line was included in the source code)

    Process modprobe (pid:738, threadinfo ffff810102086000, task ffff8101020f5180
    Stack: ffff810102c22420 etc (total of 12 0x constants)
    Call Trace:
    [<ffffffff88084a0b>] :libata:ata_qc_issue+0x4aa/0x521
    :scsi_mod:scsi_alloc_sgtable+0xbd/0x18d
    :scsi_mod:scsi_done+0x0/0x18
    :libata:atapi_xlat+0x0/0xee
    :libata:ata_scsi_translate+0x101/0x12e

Is there some other critical piece of information needed?

By the way, the display option set by means of vga=ask lasts for a few seconds only; it seems to be automatically overridden just before the message "Welcome to Fedora" is displayed. Is this also a bug?
Whoops, there's an obvious bug in my patch - it forgot to actually allocate the PRD buffer for the non-ADMA mode case. I'll cook up a new one shortly. As far as the display option not sticking, I think that the Fedora initscripts are setting their own display font that overrides the one the kernel selected. So it's expected, though annoying. I suspect there's another setting that would set that font after boot starts as well..
Created attachment 254191 [details] Updated patch to fix ATAPI issues with memory >4GB OK, can you test out this updated patch? (Assuming your tree has the previous patch applied, you'll have to do a patch -R on that one and then patch in this one.) This should actually allocate the PRD table when using the ATAPI device and hopefully fix the problem you were seeing..
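For swapping one applied patch for another on a single file, something like the following sketch could be used. swap_patch is a made-up name, and it follows the "patch FILE PATCHFILE" form used earlier in this report; it assumes the old patch is still applied cleanly to the file.

```shell
#!/bin/sh
# Reverse a previously applied patch on one file, then apply a newer
# patch in its place. The second step only runs if the reversal worked.
# swap_patch is a hypothetical helper name.
# Usage: swap_patch FILE OLD_PATCH NEW_PATCH
swap_patch() {
    file="$1"
    old="$2"
    new="$3"
    patch -R "$file" "$old" && patch "$file" "$new"
}
```

Adding --dry-run to the first patch call is a cheap way to confirm beforehand that the old patch really reverses without rejects.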
Created attachment 255811 [details] dmesg for latest patch

Boot errors have vanished! (OK, with the exception of "sysctl table check failed:" and an old one about "/bin/sh" and "line 1 of ..." that is probably not very relevant.) In the new dmesg (attachment) one may notice that the DVD detection looks fine. There is a new value for the CPU 0 aperture pointer, is this OK?

At the O/S level, there is no longer a crash when trying to mount the DVD-RAM...

...although another problem seems to exist (I think it existed before now, I simply did not try it then): the "mount" command does not like UDF-formatted DVD-RAMs (or else mkudffs is not running OK, even if it looks like it does). Mount will display the following:

    prompt# mount -t udf /dev/scd0 /mnt/tmp
    mount: wrong fs type, bad option, bad superblock on /dev/scd0,
           missing codepage or other error
           In some cases useful info is found in syslog - try
           dmesg | tail or so

Or if run with no fstype:

    prompt# mount /dev/scd0 /mnt/tmp
    mount: you must specify the filesystem type

Now dd does not complain, e.g.

    prompt# dd if=file.gz of=/dev/scd0 bs=64K
    5+1 records in
    5+1 records out
    350924 bytes (351 kB) copied, 0.104168 s, 3.4 MB/s

but I cannot see the result! Actually, I then ran mkudffs over a DVD-RAM disk which had reiserfs on it, and nothing changed, i.e. I could still mount it as reiserfs, even after ejecting and loading it again!

I am still seeing the sporadic errors I reported before (comment #20), but this one is very tricky; there could be many reasons for it, starting with the driver/media/etc. I'll try to record at the lowest speed.

Something I did _not_ see in dmesg: is Linux detecting ECC memory errors? My guess: when configuring the BIOS (and thus the chipset) for using ECC memory the whole thing stays under the radar (i.e., no O/S notifications) because the Linux kernel does not interfere with it (should work though). Is this correct?

Lastly, how do I clean up the files created by the install procedures outside of the kernel source tree? Is /boot the only other place where modifications are made?

PS /proc/sys/dev/cdrom/info:

CD-ROM information, Id: cdrom.c 3.20 2003/12/17

drive name:             sr1     sr0
drive speed:            40      40
drive # of slots:       1       1
Can close tray:         1       1
Can open tray:          1       1
Can lock tray:          1       1
Can change speed:       1       1
Can select disk:        0       0
Can read multisession:  1       1
Can read MCN:           1       1
Reports media changed:  1       1
Can play audio:         1       1
Can write CD-R:         1       1
Can write CD-RW:        1       1
Can read DVD:           0       1
Can write DVD-R:        0       1
Can write DVD-RAM:      0       1
Can read MRW:           1       0
Can write MRW:          1       0
Can write RAM:          1       1
Thanks for the report that this seems to fix the problem (someone else also reported success with it). I'll submit the patch shortly. As far as ECC, normally the kernel is not involved unless there are errors happening (tends to trigger NMI interrupts, etc.) The kernel install process will put some files in /boot as well as installing modules in the /lib/modules/(kernel version) directory.
Created attachment 257561 [details] Simpler updated patch to fix ATAPI issues with memory >4GB Could you test the attached, simpler patch to fix this problem instead? You might get some "applied with offset" messages when applying this one, that's OK.
No, it doesn't work -- my machine FROZE when trying to mount a DVD-RAM disk. The drive detection problem is also seen in dmesg. This time (with make) several (new?) files were compiled in the raid and infiniband drivers. I don't remember clearing anything, but am not 100% sure.

PS1 The problem with mount/mkudffs is likely due to the blocksize; before it was 4096, and it seems OK now with 2048.

PS2 With 2.6.24-rc1-git10 (did not see it with Fedora 7, and have not tried Fedora 8):

    prompt# find /proc -name cdrom
    /proc/sys/dev/cdrom
    find: WARNING: Hard link count is wrong for /proc/net: this may be a bug in your filesystem driver. Automatically turning on find's -noleaf option. Earlier results may have failed to include directories that should have been searched.
Created attachment 259141 [details] Additional patch That's a bit curious - can you try this one as well and see what behavior you get?
Rather similar results, only that it crashed this time...
Did you get the stack trace from that crash?
Don't remember seeing any... after waiting a while for the DVD-RAM disk to be mounted (I could hear the disk spinning in the drive, at least a few times) the machine just crashed leaving a blank (ie black) screen and then displayed the POST screen. PS The _original_ 2.6.24-rc1-git10 sata_nv.c was patched with the above attachment, id 259141.
Ah, actually I wanted you to try with both patches applied (259141 and 257561).
It doesn't work either... In fact, mounting the disk is not even required, it seems. This time I did not issue the mount command, just closed the tray and then heard the disk spinning, waited a bit more and then the computer crashed like on some of the previous occasions.
Curious.. I'm rather puzzled as to what's different with these two patches applied versus the last one. Could you try adding some debug prints to the nv_adma_port_start function at around line 1139, just after these lines:
	rc = ata_port_start(ap);
	if (rc)
		return rc;
Add in:
	ata_port_printk(ap, KERN_NOTICE, "prd alloc, virt %p, dma %llx\n", ap->prd, (unsigned long long)ap->prd_dma);
	ata_port_printk(ap, KERN_NOTICE, "pad alloc, virt %p, dma %llx\n", ap->pad, (unsigned long long)ap->pad_dma);
Then boot up with that change, and post the dmesg that you get on startup from that (before attempting to access the DVD-RAM)?
Created attachment 263811 [details] dmesg for request in comment #50
That seems to show that you're getting the PRD and pad buffers allocated above 4GB, which really shouldn't happen since the DMA mask isn't set to 64-bit until after those buffers get allocated. Can you make certain that you only have the patches in attachments 257561 and 259141 applied on top of the stock kernel and there are no remnants of either of the older patches remaining? Maybe re-extract a clean copy and copy sata_nv.c from it into the copy you're using, then apply the patches, just to be sure.
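To make the "above 4GB" check concrete, here is a minimal user-space sketch (not kernel code) that pulls the `dma` value out of one of the debug lines requested above and tests whether the buffer is reachable under a given DMA mask width. The helper names (`dma_bit_mask`, `dma_range_fits`, `parse_dma_field`) are made up for this illustration; only the idea of a mask with the low n bits set is taken from the kernel's `DMA_BIT_MASK`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space mirror of the kernel's DMA_BIT_MASK(n): the n low bits set. */
static uint64_t dma_bit_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* True if every byte of [addr, addr + len) is reachable under the mask. */
static bool dma_range_fits(uint64_t addr, uint64_t len, unsigned bits)
{
    uint64_t mask = dma_bit_mask(bits);

    if (len == 0 || addr > mask)
        return false;
    return len - 1 <= mask - addr;   /* written this way to avoid overflow */
}

/*
 * Extract the hex value that follows ", dma " in a debug line such as
 * "... prd alloc, virt <ptr>, dma <hex>".
 * Returns false if the line has no parsable field.
 */
static bool parse_dma_field(const char *line, uint64_t *out)
{
    const char *key = ", dma ";
    const char *p = strstr(line, key);
    char *end;

    if (p == NULL)
        return false;
    p += strlen(key);
    *out = strtoull(p, &end, 16);
    return end != p;
}
```

Feeding each "prd alloc" and "pad alloc" line from dmesg through `parse_dma_field` and then `dma_range_fits(addr, len, 32)` would flag any buffer that a 32-bit-only DMA engine cannot reach. The buffer length to pass is an assumption the caller has to supply; it is not printed by the debug lines.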
Created attachment 264021 [details] sata_nv.c corresponding to dmesg of comment 51 It looks ok to me, please check too!
Hmm, I think I know what is happening. It works OK for the first port, but with the second port the DMA mask is still set to 64-bit when allocating the PRD and pad buffers and so it fails. How about this: Add in some lines in nv_adma_port_start around line 1146:
	VPRINTK("ENTER\n");
	pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
	pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
	rc = ata_port_start(ap);
	if (rc)
		return rc;
i.e. add in the pci_set_dma_mask and pci_set_consistent_dma_mask lines. Hopefully that will fix the problem..
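The ordering problem described here can be modelled in a few lines of user-space C. This is only a toy simulation under stated assumptions: one device-wide mask shared by both ports, an allocator that hands out a high address whenever the mask permits it, and a port start routine that widens the mask to 64 bits after allocating. All names (`sim_dev`, `sim_alloc`, `sim_port_start`) and both addresses are hypothetical and do not come from sata_nv.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW_ADDR  UINT64_C(0x7f000000)   /* hypothetical address below 4GB */
#define HIGH_ADDR UINT64_C(0x120000000)  /* hypothetical address above 4GB */

/* One PCI device; its DMA mask is shared by every port on it. */
struct sim_dev {
    unsigned mask_bits;
};

/* Toy allocator: returns a high address whenever the mask allows one. */
static uint64_t sim_alloc(const struct sim_dev *dev)
{
    assert(dev->mask_bits == 32 || dev->mask_bits == 64);
    return dev->mask_bits > 32 ? HIGH_ADDR : LOW_ADDR;
}

/*
 * Models port start: allocate the PRD/pad buffer, then widen the
 * device mask to 64-bit for data transfers. With reset_mask_first
 * set, the mask is narrowed to 32-bit before allocating, which is
 * the change proposed in the comment above.
 */
static uint64_t sim_port_start(struct sim_dev *dev, bool reset_mask_first)
{
    uint64_t buf;

    if (reset_mask_first)
        dev->mask_bits = 32;
    buf = sim_alloc(dev);
    dev->mask_bits = 64;
    return buf;
}
```

Starting two ports on one `sim_dev` without the reset gives the first port a low buffer and the second a high one, matching the symptom in the dmesg above; with the reset, both buffers stay below 4GB. The real allocator does not always return high memory when allowed, so the actual failure would be less deterministic than this model.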
Created attachment 265241 [details] dmesg for latest patching (comment 54) Now it seems to work! Even in the /proc/sys/dev/cdrom/info file all the entries for the DVD drive are set to "1" except for "Can select disk:", obviously. That was not the case a week ago, since the entries "MRW" were set to zero (see comment #40).
Created attachment 265521 [details] call trace dump (kernel bug) There may be other bugs associated with the problems I reported, I'm afraid. This dump was extracted from /var/log/messages and was generated after copying > 1GB of data to a DVD-RAM, udf format. When it seemed to have stopped (after 2 hours!) I issued the eject command. It then seemed to write more data to the disk (possibly part of the data had not been really copied, and it was syncing) and a few minutes after this notice came up. Maybe I should go back to FC7, but I am not really sure...
> Maybe I should go back to FC7, but I am not really sure... The last sentence of my previous comment may be misleading -- I am/was in 2.6.24-rc1-git10, not FC8...
That sounds like a different problem - could be a UDF filesystem bug. I'd report that one on the Linux kernel mailing list.. I'll be submitting the sata_nv patches upstream shortly. Quite likely they should be submitted to the 2.6.23-stable series as well..
Created attachment 265601 [details] Another updated patch for ATAPI issues Just for reference, here's the final patch I'm submitting (along with the one in attachment 259141 [details] which is also still needed).
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage There hasn't been much activity on this bug for a while. Have the above patches resolved the issue or is this still outstanding? If the problem no longer exists then please close this bug or I'll do so in a few days if there is no additional information lodged.
Can you leave it open until we have a final 2.6.24 tree with definite fixes and workarounds for some of the ugly cases?
Hi Robert. I'm writing my details down here again, to keep everything in one central place. As I told you on lkml, I have the same problem with the following system:
- Debian sid using a custom 2.6.23.10 kernel
- 4 GB ECC registered RAM
- 2x DualCore AMD Opteron
- Nvidia nforce professional chipset using the sata_nv driver for SATA
- LG GGW-H20L SATA Blu-ray drive
It crashes as soon as I close the tray with any media in it. No need to mount or anything like that (although it might be that HAL tries to access the drive). Anyway, if you need further information or want me to do some testing, don't hesitate to ask :-) Best wishes, Chris.
Created attachment 293360 [details] Yet another attempt at fixing the problem Can those experiencing this problem test out the attached patch, against 2.6.24? It does boot for me, but I can't test the actual issue since I don't have an ATAPI device connected.
Created attachment 293601 [details] dmesg for 2.6.24 with patch (id=293360)
Seems to work OK with 4 GB RAM. There are a couple of (minor?) things that caught my attention so far:
1) in /proc/sys/dev/cdrom/info a couple of entries are now set to zero (this is not the first time, see e.g. comment #40, but then in comment #55 I reported those entries were set to one):
Can read MRW: 0
Can write MRW: 0
(this time DVD entries only)
2) there are a few differences in dmesg (mainly for the RAM and masks), hence the attachment
Robert, do you know whether the earlier patch was actually integrated into 2.6.23.x and/or 2.6.24.x, and if so, which versions?
No, I didn't push the earlier patch upstream as it didn't fix the problem for everyone. This one should, hopefully.
Is this fixed now? Will it go into 2.6.25?
Yes (hopefully), and yes.
The information we've requested above is required in order to review this problem report further and diagnose/fix the issue if it is still present. Since there have been no updates to the report for thirty (30) days or more after we requested additional information, we're assuming the problem is either no longer present in the current Fedora release, or that there is no longer any interest in tracking the problem. Setting status to "CLOSED INSUFFICIENT_DATA". If you still experience this problem after updating to our latest Fedora release and can provide the information previously requested, please feel free to reopen the bug report. Thank you in advance. Note that maintenance for Fedora 7 will end 30 days after the GA of Fedora 9.
Re-opening; not sure why this one got closed as INSUFFICIENT_DATA.
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists. Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs: http://docs.fedoraproject.org/release-notes/ The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping