Bug 219639

Summary:

Crash dump fails on IA64 with block_order set to 10

Product:

Red Hat Enterprise Linux 4

Reporter:

Ben Romer <benjamin.romer>

Component:

kernel

Assignee:

Takao Indoh <tindoh>

Status:

CLOSED ERRATA

QA Contact:

Brian Brock <bbrock>

Severity:

medium

Priority:

medium

Version:

4.4

CC:

coughlan, jbaron, luyu, lwang, tao

Hardware:

All

OS:

Linux

Fixed In Version:

RHSA-2008-0665

Doc Type:

Bug Fix

Last Closed:

2008-07-24 19:12:20 UTC

Bug Blocks:

456425

Attachments:

Description	Flags
print waring patch	none
dmesg.log	none

Description Ben Romer 2006-12-14 16:20:02 UTC

Description of problem:
When attempting to take a crash dump with an LSI SAS 3442 card and block_order set to 10, we receive the following error messages:

<4>disk_dump: I/O size exceeds the maximum block size of SCSI device. Signature check may fail
<3>disk_dump: read error on block 0
<3>disk_dump: check partition failed.



Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Configure diskdump to dump to the LSI SAS adapter
2. Trigger a force dump from the system console 

Actual Results:
Delaying for 5 seconds...
NaT bits        0000000000000000
pr              0bad0bad0bad4825
b0              e0000000fd01c070 0xe0000000fd01c070
ar.rsc          0000000000000000
cr.iip          e0000000fd039592 0xe0000000fd039592
cr.ipsr         0000141008120010
cr.ifs          800000000000050a
xip             a0000001000162c0 ia64_pal_call_static+0xa0/0xc0
xpsr            0000101008126010
xfs             800000000000050a
b1              00000000fdf3b410 0xfdf3b410
 
static registers r0-r15:
 r0- 3 0000000000000000 a0000001009bb720 a0000001006e7e20 000000000000050f
 r4- 7 00000000ffd07800 0000000000000000 00000000fd000010 0000000000000004
 r8-11 fffffffffffffffe 0000000000000000 0000000000000000 0000000000000000
r12-15 a0000001006e7d80 a0000001006e0000 0000000000000000 0000101008122010
 
bank 0:
r16-19 a0000001006e1018 0000000000000308 0000000000000000 0000000000000000
r20-23 0009804c8a70433f a000000100017940 5ffc000000000008 000000000001003e
r24-27 0000000000000006 0000000000000000 000000000000050f 0000000000000003
r28-31 a0000001000162c0 0000101008126010 800000000000050a 0bad0bad0bad4825
 
bank 1:
r16-19 0000000000000001 0bad0bad0bad4825 0000000000000000 00000000000002ab
r20-23 0000101008120010 a0000001000162a0 0000001008122010 0000000080000100
r24-27 0000000080000100 0003000000000000 80000000ffe4e0f0 a0000001006e1158
r28-31 0000000000000000 0000000000000003 0000000000000000 0000000000000000
Backtrace of current task (pid 0, swapper)
 
Call Trace:
 [<a0000001000162c0>] ia64_pal_call_static+0xa0/0xc0
                                sp=a0000001006e7d80 bsp=a0000001006e0fc0
 [<a000000100017940>] default_idle+0x140/0x1e0
                                sp=a0000001006e7d80 bsp=a0000001006e0f70
 [<a000000100017b00>] cpu_idle+0x120/0x2c0
                                sp=a0000001006e7e20 bsp=a0000001006e0f28
 [<a0000001000091b0>] rest_init+0xb0/0x120
                                sp=a0000001006e7e20 bsp=a0000001006e0f10
 [<a000000100680cf0>] start_kernel+0x4b0/0x6c0
                                sp=a0000001006e7e20 bsp=a0000001006e0ea8
 [<a000000100008190>] __end_ivt_text+0x270/0x290
                                sp=a0000001006e7e30 bsp=a0000001006e0e10
NaT bits        0000000000000000
pr              0bad0bad0bad1827
b0              e0000000fd01c070 0xe0000000fd01c070
ar.rsc          0000000000000000
cr.iip          e0000000fd039592 0xe0000000fd039592
cr.ipsr         0000141008120010
cr.ifs          800000000000050a
xip             a0000001000162c0 ia64_pal_call_static+0xa0/0xc0
xpsr            0000101008126010
xfs             800000000000050a
b1              00000000fdf238b0 0xfdf238b0
 
static registers r0-r15:
 r0- 3 0000000000000000 a0000001009bb720 e0000001052d7e30 000000000000050f
 r4- 7 00000000fdf33730 0000000000000001 00000000fd000010 0000000000000004
 r8-11 fffffffffffffffe 0000000000000000 0000000000000000 0000000000000000
r12-15 e0000001052d7d90 e0000001052d0000 0000000000000000 0000101008122010
 
bank 0:
r16-19 e0000001052d0f30 0000000000000308 0000000000000000 0000000000000000
r20-23 0009804c8a70433f a000000100017940 0000000000000000 000000000001003e
r24-27 800000000000000c 0000000000000000 000000000000050f 0000000000000003
r28-31 a0000001000162c0 0000101008126010 800000000000050a 0bad0bad0bad1827
 
bank 1:
r16-19 0000000000000004 0bad0bad0bad1827 0000000000000000 00000000000002ab
r20-23 0000101008120010 a0000001000162a0 0000001008122010 0000000080000100
r24-27 0000000080000100 0003000000000000 80000000ffe4e0f0 e0000001052d1078
r28-31 0000000000000000 0000000000000003 0000000000000000 0000000000000000
Backtrace of current task (pid 0, swapper)
 
Call Trace:
 [<a0000001000162c0>] ia64_pal_call_static+0xa0/0xc0
                                sp=e0000001052d7d90 bsp=e0000001052d0ee0
 [<a000000100017940>] default_idle+0x140/0x1e0
                                sp=e0000001052d7d90 bsp=e0000001052d0e90
 [<a000000100017b00>] cpu_idle+0x120/0x2c0
                                sp=e0000001052d7e30 bsp=e0000001052d0e48
 [<a00000010005d170>] start_secondary+0x2b0/0x2e0
                                sp=e0000001052d7e30 bsp=e0000001052d0e10
 [<a000000100008180>] __end_ivt_text+0x260/0x290
                                sp=e0000001052d7e30 bsp=e0000001052d0e10
timeout 1
CPU frozen:
CPU#0 is executing diskdump.
start dumping to sdb1
<4>mptscsi: ioc0: Attempting host reset! (sc=a000000200174728)
<6>mptbase: Initiating ioc0 recovery
check dump partition...
<4>disk_dump: I/O size exceeds the maximum block size of SCSI device. Signature check may fail
<3>disk_dump: read error on block 0
<3>disk_dump: check partition failed.
<3>disk_dump: No more dump device found
<4>scsi_dump: SYNCHRONIZE_CACHE failed, but try to continue dumping
<6>disk_dump: diskdump failed, fall back to trying netdump

Expected Results:
The device driver should have recognized that 10 was an unacceptable value and used an acceptable value instead (the actual maximum of the device, or the default value).
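The behavior requested above amounts to clamping block_order to what the device allows. The following is only a sketch of that proposal, not what diskdump does (comment 11 explains why the maintainers declined it). It assumes, hypothetically, that the device limit is given as `max_blocks` in 4096-byte dump blocks, with 0 meaning "unknown", in which case a caller-chosen `default_order` is used; the function name `clamp_block_order` is invented for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper illustrating the reporter's proposal; not diskdump code.
 * max_blocks: adapter limit in 4096-byte dump blocks (0 = unknown).
 * Returns requested_order if it fits, otherwise the largest order that fits,
 * or default_order when no usable limit is known. */
static int clamp_block_order(int requested_order, unsigned long max_blocks,
                             int default_order)
{
    const int max_shift = (int)(sizeof(unsigned long) * CHAR_BIT) - 1;
    int order = 0;

    assert(requested_order >= 0 && requested_order <= max_shift);

    if (max_blocks == 0)
        return default_order;           /* no limit known: fall back */

    if ((1UL << requested_order) <= max_blocks)
        return requested_order;         /* request already fits */

    /* Largest order with (1 << order) <= max_blocks. */
    while (order < max_shift && (1UL << (order + 1)) <= max_blocks)
        order++;
    return order;
}
```

With a hypothetical limit of 128 blocks, a requested block_order of 10 would be reduced to 7, while a request of 5 would be left unchanged.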

Additional info:

Comment 2 Nobuhiro Tachino 2007-03-21 14:42:46 UTC

Hello Wade,
 
Thank you for the patch. I like this patch. Some adapters can handle I/O sizes
larger than their max_sectors fields allow, and some customers actually use this
to improve the performance of diskdump. I prefer that diskdump just show a
warning, as this patch does, when the specified block size is larger than the
size allowed by the adapter, instead of prohibiting it, because that leaves
customers a choice.

Comment 3 Luming Yu 2007-08-13 07:07:33 UTC

Hello Wade,

What is the status of this bug? I cannot see the patch. If you need my help to
get the patch into RHEL 4.7, please just open comment #1 to me. I can probably
help review, test, and post it.

Thanks,
Luming

Comment 4 Red Hat Bugzilla 2007-10-19 04:05:58 UTC

User ntachino's account has been closed

Comment 5 Linda Wang 2007-11-28 00:08:09 UTC

Luming, the patch Wade proposed is as follows:

The attached patch makes diskdump display the warning message when diskdump
service starts.

diff -ruNp linux-2.6.9.old/drivers/block/diskdump.c linux-2.6.9/drivers/block/diskdump.c
--- linux-2.6.9.old/drivers/block/diskdump.c	2006-12-19 21:08:23.000000000 +0900
+++ linux-2.6.9/drivers/block/diskdump.c	2006-12-19 21:05:09.000000000 +0900
@@ -1205,6 +1205,11 @@ static int add_dump(struct device *dev, 
 			blkdev_put(bdev);
 			return ret;
 		}
+		/*
+		 * If the device has limitations of transfer size, use it.
+		 */
+		if (dump_device->max_blocks < (1 << block_order))
+			Warn("I/O size exceeds the maximum block size of SCSI device. Signature check may fail");
 		if (!try_module_get(dump_type->owner)) {
 			kfree(dump_device);
 			blkdev_put(bdev);

However, Ernie has a different approach:

--- linux-2.6.9/drivers/block/diskdump.c.orig
+++ linux-2.6.9/drivers/block/diskdump.c
@@ -354,12 +354,7 @@ static int check_dump_partition(struct d
 	if (sample_rate < 0)		/* No check */
 		return 1;
 
-	/*
-	 * If the device has limitations of transfer size, use it.
-	 */
 	chunk_blks = 1 << block_order;
-	if (dump_part->device->max_blocks < chunk_blks)
-		Warn("I/O size exceeds the maximum block size of SCSI device. Signature check may fail");
 	skips = chunk_blks << sample_rate;
 
 	lapse = 0;
@@ -1205,7 +1200,8 @@ static int add_dump(struct device *dev, 
 			blkdev_put(bdev);
 			return ret;
 		}
-		if (!try_module_get(dump_type->owner)) {
+		if (dump_device->max_blocks < (1 << block_order) ||
+		    !try_module_get(dump_type->owner)) {
 			kfree(dump_device);
 			blkdev_put(bdev);
 			return -EINVAL;
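The practical difference between the two patches is what happens when the requested I/O size exceeds the adapter limit: the first registers the dump device and only warns, while the second refuses to register it. The sketch below models that choice outside the kernel. The enum, the function name `check_dump_device`, and the use of `fprintf` in place of diskdump's `Warn()` are assumptions made for illustration; only the comparison `max_blocks < (1 << block_order)` and the `-EINVAL` return come from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical model of the two proposals; names are invented for this sketch. */
enum order_policy {
    POLICY_WARN,    /* first patch: register the device, print a warning */
    POLICY_REJECT   /* second patch: refuse to register the device */
};

/* Uses the same test as both patches: max_blocks < (1 << block_order).
 * Returns 0 if the dump device would be registered, -EINVAL if rejected. */
static int check_dump_device(unsigned long max_blocks, int block_order,
                             enum order_policy policy)
{
    assert(block_order >= 0 &&
           block_order < (int)(sizeof(unsigned long) * CHAR_BIT));

    if (max_blocks >= (1UL << block_order))
        return 0;                       /* I/O size within the limit */

    if (policy == POLICY_WARN) {
        /* stands in for diskdump's Warn() */
        fprintf(stderr, "disk_dump: I/O size exceeds the maximum block size "
                        "of SCSI device. Signature check may fail\n");
        return 0;
    }
    return -EINVAL;                     /* device is not added */
}
```

Under the warn-only policy an oversized block_order still yields a registered device, which is what lets customers who rely on larger-than-advertised transfers keep using them.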

Since Takao-san is taking over the diskdump kernel issues, I am reassigning
this bug to him. FYI.

Comment 6 Takao Indoh 2008-01-22 16:42:37 UTC

*** Bug 233057 has been marked as a duplicate of this bug. ***

Comment 7 Takao Indoh 2008-01-22 16:47:43 UTC

*** Bug 234114 has been marked as a duplicate of this bug. ***

Comment 8 Takao Indoh 2008-02-27 16:54:54 UTC

Created attachment 296089
print waring patch

Comment 9 Takao Indoh 2008-02-27 17:03:46 UTC

Hello Ben, 

The attached file is the latest patch to fix this bug. It is actually the same
as the first patch in Comment #5. If the I/O size of diskdump is larger than the
driver's max_sectors value, a warning message is printed when the diskdump
module is loaded, so customers can notice it when the diskdump service starts.
Please let me know whether this patch works for you.

Thanks

Comment 10 Ben Romer 2008-02-29 13:53:10 UTC

I'm a little confused - this patch doesn't appear to actually solve the problem
of the dump failing when the value is too high. Wouldn't it be safer to reset it
to a default, known-good value rather than taking it and then failing later?
Other kernel parameters are smart enough to know to do this when given bad values...

I'm concerned about it because someone could easily miss a one-line warning in
their boot log, get hit with a serious problem later, have their dump fail, and
then we won't be able to address the failure because we have no dump. I'd rather
still get the dump regardless of how slow a default value might be in taking the
dump, than lose data due to a customer setting the value too high by mistake.

Comment 11 Takao Indoh 2008-02-29 21:24:48 UTC

Hello Ben,

I understand your point, but it is difficult to reset block_order to a safe
value for the following reason.

Diskdump cannot know the safe value of block_order for each driver. The only
way to estimate it is from each driver's "max_sectors" value, but this value is
not precise. For example, the max_sectors value of the MegaRAID driver is 128,
so the maximum block_order for MegaRAID is 4 (128*512/4096 = 16 = 2^4), but
MegaRAID works even if a value larger than 4 is specified. In fact, some
customers use values larger than 4 to improve performance. I know a customer
who uses the MegaRAID driver with block_order=8.
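The max_sectors arithmetic above can be written out as a small C sketch. It assumes 4096-byte dump blocks (DUMP_BLOCK_SHIFT = 12, consistent with the 128*512/4096 calculation here and the `512 * 1024 >> 12` shown in comment 28); the helper names are hypothetical. It computes the largest block_order that does not trigger the warning for a given max_sectors.

```c
#include <assert.h>
#include <limits.h>

/* 4096-byte dump blocks, matching the ">> 12" arithmetic in this report. */
#define DUMP_BLOCK_SHIFT 12

/* max_blocks = (sector_size * max_sectors) >> DUMP_BLOCK_SHIFT */
static unsigned long max_dump_blocks(unsigned long sector_size,
                                     unsigned long max_sectors)
{
    return (sector_size * max_sectors) >> DUMP_BLOCK_SHIFT;
}

/* Largest block_order with (1 << block_order) <= max_blocks, i.e. the
 * largest value that does not trigger the warning; -1 if none fits. */
static int max_unwarned_block_order(unsigned long max_blocks)
{
    const int max_shift = (int)(sizeof(unsigned long) * CHAR_BIT) - 1;
    int order = -1;

    while (order < max_shift && (1UL << (order + 1)) <= max_blocks)
        order++;
    return order;
}
```

For MegaRAID (max_sectors = 128) this gives 4; for SCSI_DEFAULT_MAX_SECTORS (1024) it gives 7; for ATA_MAX_SECTORS_LBA48 (65535) it gives 12, so the warning only appears from block_order 13 upward in that case. As the comment above points out, these values are derived from max_sectors and do not guarantee what the adapter can actually handle.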

Therefore, what diskdump can do is print a warning message when block_order is
larger than the value calculated from max_sectors. If I changed diskdump to reset
block_order according to max_sectors, it would be a regression for some customers.

Thanks,
Takao Indoh

Comment 12 Ben Romer 2008-03-03 17:30:30 UTC

OK, that makes sense to me then - if the value in max_sectors isn't reliable,
then there's no way to fix the issue in Diskdump. Just printing out a warning
sounds like the best solution. We'll have to put something in our documentation
to strongly warn our customers about using a number higher than the device can
accept.

Thanks! :)

Comment 13 Luming Yu 2008-03-14 02:03:42 UTC

According to the patch in comment #8 and comments #11 and #12, this is a
diskdump-specific problem that should affect other architectures, not just
ia64. Moving it to the "All" category.

Comment 14 RHEL Program Management 2008-03-15 21:18:52 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 17 Vivek Goyal 2008-03-25 21:04:54 UTC

Committed in 68.25. Released in 68.26. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 22 Flavio Leitner 2008-05-16 15:21:24 UTC

Hello Takao,

I guess we still have one corner case not covered by the patch.
The situation happens when using SATA in LBA48 mode, so the max_sectors
will be ATA_MAX_SECTORS_LBA48 = 65535 and then no warning is printed 
unless block_order is 13, but the kernel panics with 7.

Flavio
(flipping from ON_QA to Assigned)

Comment 23 Suzanne Logcher 2008-05-28 21:25:19 UTC

Unfortunately this bugzilla was not resolved in time for RHEL 4.7 Beta.
It has now been proposed for inclusion in RHEL 4.8 but must regain Product
Management approval.

Comment 24 RHEL Program Management 2008-05-28 21:40:08 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 25 Takao Indoh 2008-06-04 16:31:26 UTC

Hello Flavio,

>I guess we still have one corner case not covered by the patch.
>The situation happens when using SATA in LBA48 mode, so the max_sectors
>will be ATA_MAX_SECTORS_LBA48 = 65535 and then no warning is printed 
>unless block_order is 13, but the kernel panics with 7.

What kind of card did you use? Could you upload dmesg and /proc/diskdump?
I tried using Promise TX2plus(sata_promise) in LBA48 mode, and warning was
printed with 8. In this case, max_sectors is SCSI_DEFAULT_MAX_SECTORS(1024), so
block_order is 8.

Comment 28 Flavio Leitner 2008-06-19 18:02:42 UTC

Hello Takao,

> What kind of card did you use? Could you upload dmesg and /proc/diskdump?
> I tried using Promise TX2plus(sata_promise) in LBA48 mode, and warning was
> printed with 8. In this case, max_sectors is SCSI_DEFAULT_MAX_SECTORS(1024),
> so block_order is 8.

Yes, I must have misunderstood something - I've checked again and piix 
should be using max_sectors = SCSI_DEFAULT_MAX_SECTORS too.

This is the condition to print the warning message:
if (dump_device->max_blocks < (1 << block_order))

The /proc/diskdump shows block_order = 7.  1 << 7 = 128.

max_blocks is (sdev->sector_size * sdev->host->max_sectors) >> DUMP_BLOCK_SHIFT
512 * 1024 >> 12 = 128

so the condition to print the warning message is false and no warning appears,
but the box panics at the start of dumping, when diskdump fails to allocate a
buffer using SWIOTLB.

Perhaps, to be on the safe side, change it to:
- if (dump_device->max_blocks < (1 << block_order))
+ if (dump_device->max_blocks <= (1 << block_order))
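The boundary case described above can be checked with a short C sketch comparing the committed condition with the proposed one. The helper names are hypothetical; the formula for max_blocks and the shift of 12 are taken from this comment.

```c
#include <assert.h>
#include <stdbool.h>

#define DUMP_BLOCK_SHIFT 12   /* 4096-byte dump blocks, as in "512 * 1024 >> 12" */

/* max_blocks = (sector_size * max_sectors) >> DUMP_BLOCK_SHIFT */
static unsigned long max_dump_blocks(unsigned long sector_size,
                                     unsigned long max_sectors)
{
    return (sector_size * max_sectors) >> DUMP_BLOCK_SHIFT;
}

/* Committed condition: warn only if the request is strictly larger. */
static bool warns_current(unsigned long max_blocks, int block_order)
{
    return max_blocks < (1UL << block_order);
}

/* Proposed condition: also warn when the request equals the limit. */
static bool warns_proposed(unsigned long max_blocks, int block_order)
{
    return max_blocks <= (1UL << block_order);
}
```

With max_sectors = 1024 (max_blocks = 128) and block_order = 7, the current check stays silent while the proposed one fires. The trade-off is that `<=` would also warn on configurations that exactly match the advertised limit, which may well work on other adapters.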

dmesg: (I'll attach the full file asap)
Linux version 2.6.9-72.ELsmp (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-9)) #1 SMP Tue Jun 3 16:32:03 EDT 2008
...
libata version 2.00 loaded.
ata_piix 0000:00:1f.2: version 2.00ac7
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 217
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x1C20 ctl 0x1C16 bmdma 0x1C00 irq 217
ata2: SATA max UDMA/133 cmd 0x1C18 ctl 0x1C12 bmdma 0x1C08 irq 217
scsi0 : ata_piix
ata1.00: ATA-7, max UDMA/133, 156250080 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ata_piix
Using cfq io scheduler
  Vendor: ATA       Model: WDC WD800JD-19MS  Rev: 10.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 156250080 512-byte hdwr sectors (80000 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 156250080 512-byte hdwr sectors (80000 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 >
...

/proc/diskdump
# sample_rate: 8
# block_order: 7
# fallback_on_err: 1
# allow_risky_dumps: 1
# dump_level: 0
# compress: 0
# total_blocks: 1146392
#
sda5 88293303 20016927


Flavio

Comment 29 Flavio Leitner 2008-06-19 18:04:57 UTC

Created attachment 309867
dmesg.log

Comment 35 errata-xmlrpc 2008-07-24 19:12:20 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html