Bug 219639
Summary: Crash dump fails on IA64 with block_order set to 10
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Ben Romer <benjamin.romer>
Assignee: Takao Indoh <tindoh>
QA Contact: Brian Brock <bbrock>
CC: coughlan, jbaron, luyu, lwang, tao
Fixed In Version: RHSA-2008-0665
Doc Type: Bug Fix
Last Closed: 2008-07-24 19:12:20 UTC
Bug Blocks: 456425
Description
Ben Romer
2006-12-14 16:20:02 UTC
Hello Wade,

Thank you for the patch. I like it. Some adapters can handle a larger I/O size than their max_sectors field indicates, and some customers actually use this to improve the performance of diskdump. I prefer that diskdump just show a warning, as this patch does, when the specified block size is larger than the size allowed by the adapter, instead of prohibiting it, because that leaves customers a choice.

Hello Wade,

What is the status of this bug? I cannot see the patch. If you need my help to get the patch into RHEL 4.7, please just open comment #1 to me. I can probably help review, test and post it.

Thanks,
Luming

User ntachino's account has been closed

Luming, the patch Wade proposed is as follows:

The attached patch makes diskdump display the warning message when the diskdump service starts.

diff -ruNp linux-2.6.9.old/drivers/block/diskdump.c linux-2.6.9/drivers/block/diskdump.c
--- linux-2.6.9.old/drivers/block/diskdump.c 2006-12-19 21:08:23.000000000 +0900
+++ linux-2.6.9/drivers/block/diskdump.c 2006-12-19 21:05:09.000000000 +0900
@@ -1205,6 +1205,11 @@ static int add_dump(struct device *dev,
 		blkdev_put(bdev);
 		return ret;
 	}
+	/*
+	 * If the device has limitations of transfer size, use it.
+	 */
+	if (dump_device->max_blocks < (1 << block_order))
+		Warn("I/O size exceeds the maximum block size of SCSI device. Signature check may fail");
 	if (!try_module_get(dump_type->owner)) {
 		kfree(dump_device);
 		blkdev_put(bdev);

However, Ernie has a different approach:

--- linux-2.6.9/drivers/block/diskdump.c.orig
+++ linux-2.6.9/drivers/block/diskdump.c
@@ -354,12 +354,7 @@ static int check_dump_partition(struct d
 	if (sample_rate < 0) /* No check */
 		return 1;
-	/*
-	 * If the device has limitations of transfer size, use it.
-	 */
 	chunk_blks = 1 << block_order;
-	if (dump_part->device->max_blocks < chunk_blks)
-		Warn("I/O size exceeds the maximum block size of SCSI device. Signature check may fail");
 	skips = chunk_blks << sample_rate;
 	lapse = 0;
@@ -1205,7 +1200,8 @@ static int add_dump(struct device *dev,
 		blkdev_put(bdev);
 		return ret;
 	}
-	if (!try_module_get(dump_type->owner)) {
+	if (dump_device->max_blocks < (1 << block_order) ||
+	    !try_module_get(dump_type->owner)) {
 		kfree(dump_device);
 		blkdev_put(bdev);
 		return -EINVAL;

Since Takao-san is taking over the diskdump kernel issues, I am reassigning this issue to him. FYI.

*** Bug 233057 has been marked as a duplicate of this bug. ***

*** Bug 234114 has been marked as a duplicate of this bug. ***

Created attachment 296089 [details]
print waring patch
Hello Ben,

The attached file is the latest patch to fix this problem. It is actually the same as the patch in comment #5. If the I/O size of diskdump is larger than the max_sectors value of the driver, a warning message is printed when the diskdump module is loaded, so customers can notice it when the diskdump service starts. Please let me know whether this patch works for you.

Thanks

I'm a little confused - this patch doesn't appear to actually solve the problem of the dump failing when the value is too high. Wouldn't it be safer to reset it to a default, known-good value rather than taking it and then failing later? Other kernel parameters are smart enough to know to do this when given bad values...

I'm concerned about it because someone could easily miss a one-line warning in their boot log, get hit with a serious problem later, have their dump fail, and then we won't be able to address the failure because we have no dump. I'd rather still get the dump, regardless of how slow a default value might be in taking the dump, than lose data due to a customer setting the value too high by mistake.

Hello Ben,

I understand what you pointed out, but it is difficult to reset block_order to a safe value, for the following reason.

Diskdump cannot know the safe value of block_order for each driver. The only way to estimate it is from the max_sectors value of each driver, but this value is not precise. For example, the max_sectors value of the MegaRAID driver is 128, so the maximum block_order for MegaRAID would be 4 (128 * 512 / 4096 = 16 blocks = 1 << 4), yet MegaRAID works even if a value larger than 4 is specified. In fact, some customers use a larger value to improve performance; I know a customer who uses the MegaRAID driver with block_order=8.

Therefore, all diskdump can do is print a warning message when block_order is larger than the value calculated from max_sectors. If I changed diskdump to reset block_order according to max_sectors, it would be a regression for some customers.

Thanks,
Takao Indoh

OK, that makes sense to me then - if the value in max_sectors isn't reliable, then there's no way to fix the issue in Diskdump. Just printing out a warning sounds like the best solution. We'll have to put something in our documentation to strongly warn our customers about using a number higher than the device can accept. Thanks! :)

According to the patch in comment #8 and comments #11 and #12, this is a diskdump-specific problem that should affect other architectures, not just ia64, so I am moving it to the "ALL" category.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Committed in 68.25. Released in 68.26. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Hello Takao,

I guess we still have one corner case not covered by the patch. The situation happens when using SATA in LBA48 mode, so the max_sectors will be ATA_MAX_SECTORS_LBA48 = 65535 and then no warning is printed unless block_order is 13, but the kernel panics with 7.

Flavio
(flipping from ON_QA to Assigned)

Unfortunately this bugzilla was not resolved in time for RHEL 4.7 Beta. It has now been proposed for inclusion in RHEL 4.8 but must regain Product Management approval.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Hello Flavio,
>I guess we still have one corner case not covered by the patch.
>The situation happens when using SATA in LBA48 mode, so the max_sectors
>will be ATA_MAX_SECTORS_LBA48 = 65535 and then no warning is printed
>unless block_order is 13, but the kernel panics with 7.
What kind of card did you use? Could you upload dmesg and /proc/diskdump?
I tried using Promise TX2plus(sata_promise) in LBA48 mode, and warning was
printed with 8. In this case, max_sectors is SCSI_DEFAULT_MAX_SECTORS(1024), so
block_order is 8.
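
The relationship between max_sectors and the block_order at which the warning starts can be checked with a small standalone calculation. This is only a sketch, not diskdump code: the function names are hypothetical, and the 4096-byte dump block (a shift of 12) is an assumption taken from the arithmetic quoted in this thread.

```c
#include <assert.h>

/* Assumption: diskdump blocks are 4096 bytes, i.e. a shift of 12. */
#define DUMP_BLOCK_SHIFT 12

/* Hypothetical helper: number of dump blocks per request implied by the
 * adapter's advertised max_sectors. */
unsigned int max_blocks_for(unsigned int sector_size, unsigned int max_sectors)
{
	return (sector_size * max_sectors) >> DUMP_BLOCK_SHIFT;
}

/* Hypothetical helper: largest block_order for which
 * max_blocks < (1 << block_order) is still false, i.e. the largest
 * value that does not trigger the warning. */
int largest_quiet_block_order(unsigned int max_blocks)
{
	int order = 0;

	assert(max_blocks > 0);
	/* Count how many times max_blocks can be halved before reaching 0. */
	while (order < 31 && (max_blocks >> (order + 1)) != 0)
		order++;
	return order;
}
```

With 512-byte sectors, max_sectors of 128, 1024 and 65535 give 16, 128 and 8191 blocks, so the warning first appears at block_order 5, 8 and 13 respectively, which matches the figures quoted in the comments.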
Hello Takao,
> What kind of card did you use? Could you upload dmesg and /proc/diskdump?
> I tried using Promise TX2plus(sata_promise) in LBA48 mode, and warning was
> printed with 8. In this case, max_sectors is SCSI_DEFAULT_MAX_SECTORS(1024),
> so block_order is 8.
Yes, I must have misunderstood something - I've checked again and piix
should be using max_sectors = SCSI_DEFAULT_MAX_SECTORS too.
This is the condition to print the warning message:
if (dump_device->max_blocks < (1 << block_order))
The /proc/diskdump shows block_order = 7. 1 << 7 = 128.
max_blocks is (sdev->sector_size * sdev->host->max_sectors) >> DUMP_BLOCK_SHIFT
512 * 1024 >> 12 = 128
then the condition to print the warning message fails but the box
panics at the start of dumping by diskdump when it fails to allocate a
buffer by using SWIOTLB.
Perhaps, to be on the safe side, change to:
- if (dump_device->max_blocks < (1 << block_order))
+ if (dump_device->max_blocks <= (1 << block_order))
dmesg: (I'll attach the full file asap)
Linux version 2.6.9-72.ELsmp (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP Tue Jun 3 16:32:03 EDT 2008
...
libata version 2.00 loaded.
ata_piix 0000:00:1f.2: version 2.00ac7
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x1C20 ctl 0x1C16 bmdma 0x1C00 irq 217
ata2: SATA max UDMA/133 cmd 0x1C18 ctl 0x1C12 bmdma 0x1C08 irq 217
scsi0 : ata_piix
ata1.00: ATA-7, max UDMA/133, 156250080 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ata_piix
Using cfq io scheduler
  Vendor: ATA       Model: WDC WD800JD-19MS  Rev: 10.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 156250080 512-byte hdwr sectors (80000 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 156250080 512-byte hdwr sectors (80000 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4 < sda5 >
...
/proc/diskdump
# sample_rate: 8
# block_order: 7
# fallback_on_err: 1
# allow_risky_dumps: 1
# dump_level: 0
# compress: 0
# total_blocks: 1146392
#
sda5 88293303 20016927
Flavio
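
The boundary case in the comment above can be reproduced in isolation. The sketch below compares the strict comparison from the patch with the inclusive variant proposed here; the function names are hypothetical and the code only models the comparison, not the SWIOTLB allocation failure itself.

```c
#include <assert.h>

/* Comparison used by the warning patch: warn only when the requested
 * chunk is strictly larger than the advertised limit. */
int warns_strict(unsigned int max_blocks, unsigned int block_order)
{
	assert(block_order < 32);
	return max_blocks < (1u << block_order);
}

/* Proposed variant: also warn when the chunk equals the limit. */
int warns_inclusive(unsigned int max_blocks, unsigned int block_order)
{
	assert(block_order < 32);
	return max_blocks <= (1u << block_order);
}
```

For max_blocks = 128 and block_order = 7, the strict form stays silent while the inclusive form warns. Note that the inclusive form would also warn on configurations where a request of exactly the advertised size works, so it trades a possible false alarm for coverage of this case.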
Created attachment 309867 [details]
dmesg.log
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2008-0665.html