Bug 662952

Summary:	PPC64 sometimes failed to boot up after finished diskdump
Product:	Red Hat Enterprise Linux 4	Reporter:	Chao Ye <cye>
Component:	kernel	Assignee:	Jiri Skala <jskala>
Status:	CLOSED WONTFIX	QA Contact:	Red Hat Kernel QE team <kernel-qe>
Severity:	medium	Docs Contact:
Priority:	high
Version:	4.9	CC:	aglotov, anderson, plyons, qcai, sbest, tindoh
Target Milestone:	rc
Target Release:	---
Hardware:	ppc64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	662950	Environment:
Last Closed:	2012-06-20 16:18:46 UTC	Type:	---
Bug Depends On:	662950
Bug Blocks:

Description Chao Ye 2010-12-14 09:15:06 UTC

+++ This bug was initially created as a clone of Bug #662950 +++

Description of problem:
I set up the diskdump service. When I triggered a crash, it did not start dumping the vmcore. I got this on the console:
===================================================
SysRq : Crashing the kernel by request
cpu 0x2: Vector: 300 (Data Access) at [c00000002964f920]
    pc: c0000000001d047c: .sysrq_handle_crash+0x4/0xc
    lr: c0000000001d08d8: .__handle_sysrq+0xb8/0x18c
    sp: c00000002964fba0
   msr: 8000000000001032
   dar: 0
 dsisr: 42000000
  current = 0xc00000002f46c5c0
  paca    = 0xc000000000409800
    pid   = 3812, comm = bash
enter ? for help
2:mon> 

Version-Release number of selected component (if applicable):
# rpm -q kernel diskdumputils
kernel-2.6.9-92.EL
diskdumputils-1.4.1-7

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL4-U9-re20101130.0
2. Set up the diskdump service
3. Trigger a crash
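
One way to perform step 3, consistent with the "SysRq : Crashing the kernel by request" console message above, is to write "c" to /proc/sysrq-trigger. A minimal sketch follows. The trigger_crash helper and the DRY_RUN variable are hypothetical names used only for illustration, and the sketch defaults to printing the command instead of running it.

```shell
# Hypothetical helper (not part of the diskdump test suite).
# DRY_RUN defaults to 1, so the crash command is only printed unless
# the caller explicitly sets DRY_RUN=0 and runs this as root.
trigger_crash() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "echo c > /proc/sysrq-trigger"
        return 0
    fi
    # Ask the kernel to crash itself via the SysRq "c" handler.
    echo c > /proc/sysrq-trigger
}

# Safe by default: prints the command that would be run.
trigger_crash
```

To really crash a test machine, run it as root with DRY_RUN=0.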
  
Actual results:


Expected results:
Diskdump should save vmcore

Additional info:

Comment 1 Chao Ye 2010-12-15 10:55:27 UTC

I appended "xmon=off" to the kernel options, and with that diskdump works. But after the dump finished, the system failed to boot:
=============================================
ibm-js20-04.lab.bos.redhat.com login: SysRq : Crashing the kernel by request
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=64 NUMA PSERIES LPAR 
NIP: C0000000001D04FC XER: 0000000000000000 LR: C0000000001D0958
REGS: c000000072c2f920 TRAP: 0300   Not tainted  (2.6.9-91.EL)
MSR: 8000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000000000, DSISR: 0000000042000000
TASK: c00000000f490bb0[3450] 'runtest.sh' THREAD: c000000072c2c000 CPU: 0
GPR00: C0000000001D04F8 C000000072C2FBA0 C00000000050C740 0000000000000063 
GPR04: 0000000000000000 0000000000000000 0000000000000001 00000001000F7C2B 
GPR08: 0000000000000000 0000000000000000 C00000000053DBB8 C0000000001D0504 
GPR12: 0000000044242428 C000000000408800 00000000100E50D0 0000000000000000 
GPR16: 00000000FFFFFFFF 0000000010060000 0000000000000000 0000000000000000 
GPR20: 0000000000000000 0000000000000000 00000000100C0000 0000000000000000 
GPR24: 8000000000009032 0000000000000000 0000000000000000 0000000000000006 
GPR28: 0000000000000063 C00000000044C790 C00000000048A080 C000000000475068 
NIP [c0000000001d04fc] .sysrq_handle_crash+0x4/0xc
LR [c0000000001d0958] .__handle_sysrq+0xb8/0x18c
Call Trace:
[c000000072c2fba0] [c0000000001d0924] .__handle_sysrq+0x84/0x18c (unreliable)
[c000000072c2fc50] [c00000000011b59c] .write_sysrq_trigger+0x84/0xb4
[c000000072c2fcf0] [c0000000000c4694] .vfs_write+0x148/0x1ac
[c000000072c2fd90] [c0000000000c47d0] .sys_write+0x4c/0x8c
[c000000072c2fe30] [c000000000011280] syscall_exit+0x0/0x18
CPU frozen: #1
CPU#0 is executing diskdump.
start dumping to hda3
check dump partition...
dumping memory(partial dump with dump_level 19)..
507904(332176 saved 175728 skipped)/507904                        
<6>disk_dump: diskdump succeeded

E209
E20A
E20F
E200
E20B
E20D
E211
E201
E212
E202
E213
E214
E215
E216
E217
E218
E204
E201
E219
E21A
E21C
E21D
E21E
E21F
E21B
E210
E21B
E203
E206
E20C
E13A
E134
E19D
E138
E440
E441
E139
E442
E443
D010
D011
E13A
E134
E19D
E138
E440
E441
E139
E149
E14C
20D00902  
E101
E102
E10A
E10B
E19E
E150
E154
E154
E154
E442
E443
D012
D00E
D00D
E170
E172  U8842.4TX.23GLF8H-P1
E151  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
E172  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1-T6
E153  U8842.4TX.23GLF8H-P1-T6
E152  U8842.4TX.23GLF8H-P1-T7
E153  U8842.4TX.23GLF8H-P1-T7
EAA1  U8842.4TX.23GLF8H-P1
E172  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1
E172  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1-C5
E152  U8842.4TX.23GLF8H-P1-C5-T1
E153  U8842.4TX.23GLF8H-P1-C5-T1
E152  U8842.4TX.23GLF8H-P1-C5-T2
E153  U8842.4TX.23GLF8H-P1-C5-T2
EAA1  U8842.4TX.23GLF8H-P1-C5
D001
D003
D004
E139
E14A
D008
E1F0
	20d00902





E1F1



D099
D5BB 1 


E1AA
E1AD
BOOTP: chosen-network-type = ethernet,auto,none,auto
BOOTP: server   IP =        0.0.0.0
BOOTP: requested filename = 
BOOTP: client   IP =        0.0.0.0
BOOTP: client   HW addr =   0 d 60 1e e0 87
BOOTP: gateway  IP =        0.0.0.0
BOOTP: device    /pci@8000000f8000000/pci@0/ethernet@1,1
BOOTP: loc-code  U8842.4TX.23GLF8H-P1-T7


BOOTP R = 1 BOOTP S = 2 
FILE: ppc/0a102c55
BOOTP: read-first-block fail: 0 

FILE: ppc/0a102c55
BOOTP: read-first-block fail: 1 

FILE: ppc/0a102c55
BOOTP: read-first-block fail: 2 
BOOTP ERROR: TFTP of first block failed, ABORT

20A80005 

-
Elapsed time since release of system processors: 0 mins 50 secs

E105
Config file read, 32768 bytes
Config file error: Token is too long near line 0 in file /etc/yaboot.conf
Syntax error or read error config
Welcome to yaboot version 1.3.12
Enter "help" to get some basic usage information
boot: linux
Please wait, loading kernel...
linux: Not a valid ELF image
boot: help

Press the tab key for a list of defined images.
The label marked with a "*" is is the default image, press <return> to boot it.

To boot any other label simply type its name and press <return>.

To boot a kernel image which is not defined in the yaboot configuration 
file, enter the kernel image name as [[device:][partno],]/path, where 
"device:" is the OpenFirmware device path to the disk the image 
resides on, and "partno" is the partition number the image resides on.
Note that the comma (,) is only required if you specify an OpenFirmware
device, if you only specify a filename you should not start it with a ","

If you omit "device:" and "partno" yaboot will use the values of 
"device=" and "partition=" in yaboot.conf, right now those are set to: 
device=/pci@8000000f8000000/ide@4,1/disk@0
partition=-1

boot: 


I also tested with ibm-sf2a-lp2.rhts.eng.rdu.redhat.com. It failed to boot as well.
When I tried to install a new system on these two machines, the partition table seemed fine, but I could not determine whether anything had been corrupted.

Comment 2 Chao Ye 2010-12-21 06:39:40 UTC

Tested with RHEL4-U8; diskdump dumps the vmcore successfully there. Marked as a regression.

Comment 4 Chao Ye 2010-12-22 08:01:03 UTC

I didn't set the release flag because it seems I don't have permission.

Comment 5 Chao Ye 2010-12-27 07:59:26 UTC

Tested with RHEL4-U9-re20101215.1_nfs-AS-ppc64 on ibm-js12-vios-01-lp3.rhts.eng.bos.redhat.com; diskdump saved the vmcore successfully.
Also tested RHEL4-U9-re20101130.0_nfs-AS-ppc64 on ibm-js22-vios-01-lp3.rhts.eng.bos.redhat.com; diskdump saved the vmcore successfully.

Comment 6 Chao Ye 2010-12-27 08:02:02 UTC

[root@ibm-js20-04 ~]# cd /mnt/tests/kernel/diskdump/setup/
[root@ibm-js20-04 setup]# make run
chmod a+x ./runtest.sh
./runtest.sh
0
24
* Getting dump partition
    dump partition is /dev/hda3
* Umount dump partition
* initial format dump partition
Formatting dump device: 
Do you want to format /dev/hda3 (yes/NO)?
/dev/hda3: [100.0%]
* Starting diskdump on startup
* Reboot in 10 sec after completing dumping
* Set dump_level to 19
* Delete dump partition entry in fstab
* Start diskdump service if not enabled
diskdump not enabled
Starting diskdump: [  OK  ]
* Install Kernel Debug info
- install debuginfo packages from yum repo.
Setting up Install Process
Setting up Repos
beaker-tasks              100% |=========================| 1.1 kB    00:00     
beaker-distro1            100% |=========================| 1.1 kB    00:00     
beaker-harness            100% |=========================|  951 B    00:00     
Reading repository metadata in from local files
beaker-tas: ################################################## 7524/7524
beaker-dis: ################################################## 1814/1814
beaker-har: ################################################## 21/21
Parsing package install arguments
No Match for argument: kernel-debuginfo-2.6.9-93.EL.ppc64
Nothing to do
- warn: error installing kernel-debuginfo-2.6.9-93.EL.ppc64
- install debuginfo packages from brew.
Retrieving http://download.devel.redhat.com/brewroot/packages/kernel/2.6.9/93.EL/ppc64/kernel-debuginfo-2.6.9-93.EL.ppc64.rpm
Preparing...        ########################################### [100%]  (100%)
   1:kernel-debuginfo       ########################################### [100%]
* All done! Good luck!
/kernel/diskdump/setup result: PASS
   metric: 0
   Log: /tmp/tmp.tN3187
   DMesg: /tmp/dmesg.log
   Info: Searching AVC errors produced since 1293453030 (Mon Dec 27 07:30:30 2010)
   Info: No AVC messages found with /usr/bin/env LC_ALL=en_US.UTF-8 /sbin/ausearch -sv no -m AVC -m USER_AVC -m SELINUX_ERR -ts 12/27/2010 07:30:30 < /dev/null

   AvcLog: /tmp/tmp.NE3194
[root@ibm-js20-04 setup]# cd ../crash/
[root@ibm-js20-04 crash]# make run
chmod a+x ./runtest.sh
./runtest.sh
0
24
* Crashing system
/mnt/tests/kernel/diskdump/include/runtest.sh: line 82: [: -ne: unary operator expected
/mnt/tests/kernel/diskdump/include/runtest.sh: line 86: [: -ne: unary operator expected
/kernel/diskdump/crash/boot-2nd-kernel result: PASS
   metric: 0
   Log: /tmp/tmp.mt3390
   Info: Searching AVC errors produced since 1293453890 (Mon Dec 27 07:44:50 2010)
   Info: No AVC messages found with /usr/bin/env LC_ALL=en_US.UTF-8 /sbin/ausearch -sv no -m AVC -m USER_AVC -m SELINUX_ERR -ts 12/27/2010 07:44:50 < /dev/null

   AvcLog: /tmp/tmp.gb3397
SysRq : Crashing the kernel by request
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=64 NUMA PSERIES LPAR 
NIP: C0000000001D057C XER: 0000000000000000 LR: C0000000001D09D8
REGS: c00000007545b920 TRAP: 0300   Not tainted  (2.6.9-93.EL)
MSR: 8000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000000000, DSISR: 0000000042000000
TASK: c00000007b231620[3388] 'runtest.sh' THREAD: c000000075458000 CPU: 1
GPR00: C0000000001D0578 C00000007545BBA0 C00000000050C740 0000000000000063 
GPR04: 0000000000000000 0000000000000000 0000000000000001 000000010018C7CF 
GPR08: 0000000000000000 0000000000000000 C00000000053DBB8 C0000000001D0584 
GPR12: 0000000044242428 C000000000409000 00000000100E1128 0000000000000000 
GPR16: 00000000FFFFFFFF 0000000010060000 0000000000000000 0000000000000000 
GPR20: 0000000000000000 0000000000000000 00000000100C0000 0000000000000000 
GPR24: 8000000000009032 0000000000000000 0000000000000000 0000000000000006 
GPR28: 0000000000000063 C00000000044C790 C00000000048A080 C000000000475068 
NIP [c0000000001d057c] .sysrq_handle_crash+0x4/0xc
LR [c0000000001d09d8] .__handle_sysrq+0xb8/0x18c
Call Trace:
[c00000007545bba0] [c0000000001d09a4] .__handle_sysrq+0x84/0x18c (unreliable)
[c00000007545bc50] [c00000000011b53c] .write_sysrq_trigger+0x84/0xb4
[c00000007545bcf0] [c0000000000c4624] .vfs_write+0x148/0x1ac
[c00000007545bd90] [c0000000000c4760] .sys_write+0x4c/0x8c
[c00000007545be30] [c000000000011280] syscall_exit+0x0/0x18
CPU frozen: #0
CPU#1 is executing diskdump.
start dumping to hda3
check dump partition...
dumping memory(partial dump with dump_level 19)..
507904(332858 saved 175046 skipped)/507904                        
<6>disk_dump: diskdump succeeded
<========== Got stuck here; rebooted manually via Beaker.
E209
E20A
E20F
E200
E20B
E20D
E211
E201
E212
E202
E213
E214
E215
E216
E217
E218
E204
E201
E219
E21A
E21C
E21D
E21E
E21F
E21B
E210
E21B
E203
E206
E20C
E13A
E134
E19D
E138
E440
E441
E139
E442
E443
D010
D011
E13A
E134
E19D
E138
E440
E441
E139
E149
E14C
20D00902  
E101
E102
E10A
E10B
E19E
E150
E154
E154
E154
E442
E443
D012
D00E
D00D
E170
E172  U8842.4TX.23GLF8H-P1
E151  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
E172  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1-T6
E153  U8842.4TX.23GLF8H-P1-T6
E152  U8842.4TX.23GLF8H-P1-T7
E153  U8842.4TX.23GLF8H-P1-T7
EAA1  U8842.4TX.23GLF8H-P1
E172  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
E152  U8842.4TX.23GLF8H-P1
E153  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1
E172  U8842.4TX.23GLF8H-P1
EAA1  U8842.4TX.23GLF8H-P1-C5
E152  U8842.4TX.23GLF8H-P1-C5-T1
E153  U8842.4TX.23GLF8H-P1-C5-T1
E152  U8842.4TX.23GLF8H-P1-C5-T2
E153  U8842.4TX.23GLF8H-P1-C5-T2
EAA1  U8842.4TX.23GLF8H-P1-C5
D001
D003
D004
E139
E14A
D008
E1F0
	20d00902





E1F1



D099
D5BB 1 


E1AA
E1AD
BOOTP: chosen-network-type = ethernet,auto,none,auto
BOOTP: server   IP =        0.0.0.0
BOOTP: requested filename = 
BOOTP: client   IP =        0.0.0.0
BOOTP: client   HW addr =   0 d 60 1e e0 87
BOOTP: gateway  IP =        0.0.0.0
BOOTP: device    /pci@8000000f8000000/pci@0/ethernet@1,1
BOOTP: loc-code  U8842.4TX.23GLF8H-P1-T7


BOOTP R = 1 BOOTP S = 3 
FILE: ppc/0a102c55
BOOTP: read-first-block fail: 0 

FILE: ppc/0a102c55
BOOTP: read-first-block fail: 1 

FILE: ppc/0a102c55
BOOTP: read-first-block fail: 2 
BOOTP ERROR: TFTP of first block failed, ABORT

20A80005 

-
Elapsed time since release of system processors: 0 mins 51 secs

E105
Config file read, 32768 bytes
Config file error: Token is too long near line 0 in file /etc/yaboot.conf
Syntax error or read error config
Welcome to yaboot version 1.3.12
Enter "help" to get some basic usage information
boot: linux
Please wait, loading kernel...
linux: Not a valid ELF image
boot:

Comment 7 Takao Indoh 2011-01-03 21:24:26 UTC

I tested diskdump with 2.6.9-89.EL, 2.6.9-92.EL and 2.6.9-94.EL on ibm-js20-04.lab.bos.redhat.com. All tests succeeded. I could not find any problems, except that the system did not reboot automatically after diskdump finished.

> * Getting dump partition
>     dump partition is /dev/hda3
> * Umount dump partition
> * initial format dump partition
> Formatting dump device: 
> Do you want to format /dev/hda3 (yes/NO)? /dev/hda3:                         
> /dev/hda3:                                                              [ 

It seems that /dev/hda3 was used as the dump device. Chao, was hda3 really unused? Could you collect an sosreport first and then test diskdump again?

Comment 8 Chao Ye 2011-01-04 05:49:07 UTC

(In reply to comment #7)
> I tested diskdump with 2.6.9-89.EL, 2.6.9-92.EL and 2.6.9-94.EL on
> ibm-js20-04.lab.bos.redhat.com. All tests succeeded. I could not find any
> problems except that system was not rebooted automatically after diskdump
> finished.
> 
> > * Getting dump partition
> >     dump partition is /dev/hda3
> > * Umount dump partition
> > * initial format dump partition
> > Formatting dump device: 
> > Do you want to format /dev/hda3 (yes/NO)? /dev/hda3:                         
> > /dev/hda3:                                                              [ 
> 
> It seems that /dev/hda3 was used as dump device. Chao, hda3 was really unused
> disk? Could you get sosreport at first and then test diskdump again?

I just reproduced this issue again on ibm-js20-04.lab.bos.redhat.com with 1222.0.
# cd /mnt/tests/kernel/diskdump/setup/; make run
/dev/hda3 was configured as the dump target. ibm-js20-04.lab.bos.redhat.com has less than 2GB of memory, and I allocated 4GB for /dev/hda3.
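
The sizing above can be checked mechanically. The sketch below assumes that a dump partition should be at least as large as physical memory. That is a conservative rule of thumb, not something stated in this report, and a partial dump such as dump_level 19 needs less. mem_total_kb and dump_fits are hypothetical helper names.

```shell
# Hypothetical helpers for illustration only.

# Print MemTotal in kB from a meminfo-style file (default /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# dump_fits MEM_KB PART_KB
# Succeed if the partition is at least as large as memory.
# Assumption: a full dump needs roughly one byte of disk per byte of RAM.
dump_fits() {
    [ "$2" -ge "$1" ]
}

# Figures from this comment, rounded up: 2GB of memory, 4GB partition.
dump_fits 2097152 4194304 && echo "dump partition is large enough"
```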

Comment 9 Takao Indoh 2011-01-05 00:40:41 UTC

(In reply to comment #8)
> I just reproduced this issue on ibm-js20-04.lab.bos.redhat.com again with
> 1222.0. 
> # cd /mnt/tests/kernel/diskdump/setup/; make run
> This /dev/hda3 was configured as dump target. ibm-js20-04.lab.bos.redhat.com
> got less than 2GB Mem, I allocated 4GB for /dev/hda3.

When I tested, I installed RHEL4.9 manually to set up the partitions as follows.

/dev/hda1 /boot
/dev/hda2 /
/dev/hda3 swap
/dev/hda4 extend
/dev/hda5 dump device

How did you install? What was your partition layout?

Comment 10 Chao Ye 2011-01-05 01:36:55 UTC

(In reply to comment #9)
> (In reply to comment #8)
> > I just reproduced this issue on ibm-js20-04.lab.bos.redhat.com again with
> > 1222.0. 
> > # cd /mnt/tests/kernel/diskdump/setup/; make run
> > This /dev/hda3 was configured as dump target. ibm-js20-04.lab.bos.redhat.com
> > got less than 2GB Mem, I allocated 4GB for /dev/hda3.
> 
> When I tested, I installed RHEL4.9 manually to set up the partitions as
> follows.
> 
> /dev/hda1 /boot
> /dev/hda2 /
> /dev/hda3 swap
> /dev/hda4 extend
> /dev/hda5 dump device
> 
> How did you install? How was the partition layout?
Here is the layout:
/dev/hda2 /boot
/dev/hda3 /dump
/dev/hda4 extend
/dev/hda5 LVM

Comment 11 Linda Wang 2011-01-05 21:54:52 UTC

Chao, can you use Indoh-san's partition layout and see
if you can reproduce the reboot problem?

Comment 12 Chao Ye 2011-01-06 03:55:11 UTC

(In reply to comment #9)
> (In reply to comment #8)
> > I just reproduced this issue on ibm-js20-04.lab.bos.redhat.com again with
> > 1222.0. 
> > # cd /mnt/tests/kernel/diskdump/setup/; make run
> > This /dev/hda3 was configured as dump target. ibm-js20-04.lab.bos.redhat.com
> > got less than 2GB Mem, I allocated 4GB for /dev/hda3.
> 
> When I tested, I installed RHEL4.9 manually to set up the partitions as
> follows.
> 
> /dev/hda1 /boot
> /dev/hda2 /
> /dev/hda3 swap
> /dev/hda4 extend
> /dev/hda5 dump device
> 
> How did you install? How was the partition layout?

I tried your layout with a small modification, because anaconda told me that RHEL4 can't be installed without a PPC PReP Boot partition. So here is my new layout:
/dev/hda1 PPC PReP Boot
/dev/hda2 /boot
/dev/hda3 /
/dev/hda4 extend
/dev/hda5 swap
/dev/hda6 /dump

With this layout, I can't reproduce the reboot issue with kernel-2.6.9-94.EL on ibm-js20-04.lab.bos.redhat.com. All four attempts saved the vmcore successfully.

Comment 13 Takao Indoh 2011-01-06 04:18:57 UTC

> I tried your layout with a little modification. But from anaconda, I was told
> that RHEL4 can't be installed without a PPC PReP Boot partition. So here is my
> new layout:
> /dev/hda1 PPC PReP Boot
> /dev/hda2 /boot
> /dev/hda3 /
> /dev/hda4 extend
> /dev/hda5 swap
> /dev/hda6 /dump
Yeah, the layout I wrote in comment 9 was wrong. Yours is exactly the same as the layout I used.
 
> With this layout, I can't reproduce reboot issue on kernel-2.6.9-94.EL on
> ibm-js20-04.lab.bos.redhat.com box. Four times all success to save vmcore.
Ok, so this problem seems to be related to the partition layout. If you have the recipe you used for the install, could you send it to me?

Comment 14 Chao Ye 2011-01-06 10:21:09 UTC

(In reply to comment #13)
> Ok, so this problem seems to be related to the partition layout. If you have
> the  recipe you used when install, could you send it to me?

I just tried another layout:
/dev/hda2 /boot    100M
/dev/hda3 /dump  4GB
/dev/hda4 extend
/dev/hda5 LVM      10GB
/dev/        Free

I didn't use the entire disk. My plan is to use the reserved free space to install a new system if the machine fails to boot after diskdump, so that I can check /etc/yaboot.conf and see whether it was damaged.
But this layout did not reproduce the issue: every dump succeeded, and the manual reboot via Beaker succeeded too.
I'll do more testing tomorrow.

Comment 15 Takao Indoh 2011-01-13 23:27:07 UTC

Chao, could you send me the job XML you used when you reproduced this problem? I cloned your job (J:42744) and tried to set up the same environment as yours, but the installation failed.

Comment 16 Chao Ye 2011-01-24 10:22:59 UTC

I found that at least these systems have this issue:
ibm-js20-5.rhts.eng.rdu.redhat.com
ibm-l4b-lp4.rhts.eng.rdu.redhat.com
ibm-js20-04.lab.bos.redhat.com

I also tried installing RHEL4-U8 on ibm-sf2a-lp2.rhts.eng.rdu.redhat.com and then adding a RHEL4-U9 repo to upgrade all packages to the latest versions.
But the boot issue did not occur this way.
ibm-sf2a-lp2.rhts.eng.rdu.redhat.com has two disks; I used the entire second disk, hdb, as the dump target. The Beaker job ID is 48255:
https://beaker.engineering.redhat.com/jobs/48255

Comment 17 Chao Ye 2011-01-24 10:27:37 UTC

(In reply to comment #15)
> Chao, could you send me your job XML when you reproduced this problem? I cloned
> your job(J:42744) and tried to make the same environment as yours, but
> installation failed.

You can append "manual" to kernel_options and "vnc" to kernel_options_post to start a manual installation.
When the system is ready, you can run the rh-tests-kernel-diskdump-setup test, which sets up diskdump properly.

Comment 20 Takao Indoh 2011-01-24 23:29:19 UTC

Thanks, I was finally able to reproduce this on ibm-js20-5.rhts.eng.rdu.redhat.com.
After installing RHEL4-U9-re20101222.0, I did the following.

1) Install rh-tests-kernel-diskdump-setup
2) run setup script
3) reboot

After the reboot, the system hung with the error message below.

Config file read, 32768 bytes
Config file error: Token is too long near line 0 in file /etc/yaboot.conf
Syntax error or read error config

Still under investigation.
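
For automated runs, a captured console log can be scanned for this error signature. The following is only a sketch; has_yaboot_conf_error is a hypothetical helper name, and the log file path is whatever the test harness happens to save.

```shell
# Hypothetical helper for illustration only.
# has_yaboot_conf_error LOGFILE
# Succeed if LOGFILE contains the yaboot config error seen in this bug.
has_yaboot_conf_error() {
    grep -q 'Config file error: Token is too long' "$1"
}

# Example usage (console.log is a placeholder path):
#   has_yaboot_conf_error console.log && echo "yaboot could not parse its config"
```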

Comment 22 Takao Indoh 2011-01-28 20:11:14 UTC

Adding information I found.

This seems to be a boot loader (yaboot) problem rather than a diskdump problem,
because it can be reproduced without diskdump as follows. It also
occurs on RHEL4.8.

1. Create a free partition immediately after the /boot partition.
   For example,

Minor    Start       End     Type      Filesystem  Flags
1          0.031      7.844  primary               boot <== PPC PReP
2          7.844    109.819  primary   ext3             <== /boot
3        109.819   4110.380  primary   ext3             <== /test
4       4110.381  38154.375  extended
5       4110.412  38154.375  logical   ext3             <== /

In this case, /test is the free partition.


2. Unmount /test and fill the head of the partition with zeros.
dd if=/dev/zero of=/dev/hda3 bs=1024 count=1024

3. Reboot. The system then hangs with the error message below.

Config file read, 32768 bytes
Config file error: Token is too long near line 0 in file /etc/yaboot.conf
Syntax error or read error config
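
Before rebooting in step 3, it can be useful to confirm that step 2 took effect. The sketch below checks whether the first 1 MiB of a file or device (the amount written by dd with bs=1024 count=1024) contains only zero bytes. head_is_zeroed is a hypothetical helper name, and the demonstration uses a scratch file, not a real partition.

```shell
# Hypothetical helper for illustration only.
# head_is_zeroed PATH [BYTES]
# Succeed if the first BYTES bytes (default 1 MiB) of PATH are all zero.
head_is_zeroed() {
    path=$1
    bytes=${2:-1048576}
    # Strip NUL bytes; whatever remains is non-zero data.
    nonzero=$(head -c "$bytes" "$path" | tr -d '\000' | wc -c | tr -d ' ')
    [ "$nonzero" -eq 0 ]
}

# Demonstration on a scratch file rather than /dev/hda3.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1024 2>/dev/null
head_is_zeroed "$img" && echo "first 1 MiB is all zeros"
rm -f "$img"
```

On the test machine the same check would be run as root against the partition from step 2.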

Comment 25 Takao Indoh 2011-09-20 20:56:41 UTC

> Seems to be the same issue as BZ#579834 filed against RHEL-5. I've prepared a
> scratch build to test.
> 
> https://brewweb.devel.redhat.com/taskinfo?taskID=3649692
> 
> Takao, please, could you test it? Thanks, 

I tested with yaboot-1.3.12-7.6.bz662952.el4.ppc.rpm, but the problem is not
fixed. Here is the test log.

1. Install yaboot
[root@ibm-js20-04 ~]# rpm -qa | grep yaboot
yaboot-1.3.12-7.6
[root@ibm-js20-04 ~]# rpm -Uvh yaboot-1.3.12-7.6.bz662952.el4.ppc.rpm
Preparing...                ########################################### [100%]
   1:yaboot                 ########################################### [100%]
[root@ibm-js20-04 ~]# rpm -qa | grep yaboot
yaboot-1.3.12-7.6.bz662952.el4
[root@ibm-js20-04 ~]# reboot

Broadcast message from root (hvc0) (Tue Sep 20 12:44:53 2011):

The system is going down for reboot NOW!
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Stopping HAL daemon: [  OK  ]
Stopping system message bus: [  OK  ]
(snip)

2. Run test
[root@ibm-js20-04 ~]# mount
/dev/hda5 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/hda2 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/hda3 on /test type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[root@ibm-js20-04 ~]# umount /test
[root@ibm-js20-04 ~]# dd if=/dev/zero of=/dev/hda3 bs=1024 count=1024
1024+0 records in
1024+0 records out
[root@ibm-js20-04 ~]# reboot

Broadcast message from root (hvc0) (Tue Sep 20 12:49:27 2011):

The system is going down for reboot NOW!
INIT: Switching to runlevel: 6
INIT: Sending pStopping HAL daemon: [  OK  ]
(snip)
Elapsed time since release of system processors: 0 mins 53 secs

E105
Config file read, 32768 bytes
Config file error: Token is too long near line 0 in file /etc/yaboot.conf
Syntax error or read error config
Welcome to yaboot version 1.3.12
Enter "help" to get some basic usage information
boot:

Comment 28 Jiri Pallich 2012-06-20 16:18:46 UTC

Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested a review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.