Bug 463915

Summary: [5.3] SCP - dd: /dev/mem: Bad address
Product: Red Hat Enterprise Linux 5
Reporter: Qian Cai <qcai>
Component: kexec-tools
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: low
Priority: low
Version: 5.3
CC: duck, riek, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: ia64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-01-20 21:00:37 UTC
Attachments:
Description                                                          Flags
patch to interrogate /proc/iomem to find a place to dd from dev/mem  none
new patch to fix dd errors                                           none
/proc/iomem from altix4.rhts.bos.redhat.com                          none
new version of patch                                                 none

Description Qian Cai 2008-09-25 12:14:46 UTC
Description of problem:
With Kdump configured for an SCP target:

net root.bos.redhat.com

I see the following on IA64 when dumping:

Saving to remote location root.bos.redhat.com
dd: /dev/mem: Bad address
[: 1189: unknown operand
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
[: 1189: unknown operand.06 MB
Copied 828.484 MB / 1531.06 MB

The init file in the initramfs contains this line:

dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=100

Running it manually gives:
[root@hp-lp1 ~]# dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=100
dd: reading `/dev/mem': Bad address
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000457773 seconds, 0.0 kB/s

There is no such problem on i386 and x86_64.

Neil, do you also have any idea where those "unknown operand" errors come from?

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-40.el5
kernel-2.6.18-10

How reproducible:
always

Comment 1 Neil Horman 2008-09-25 17:21:07 UTC
Hmm, where the unknown operand error comes from depends on the init script in your initramfs. If you could attach it, that would be helpful.

As for the /dev/mem error, my guess is that you're testing on a system with a memory hole early in RAM. I'll attach a patch for that shortly.
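To illustrate the memory-hole hypothesis: `skip=100` with `bs=512` makes dd read physical address 100 × 512 = 51200 (0xc800), which can only succeed if that address is backed by a "System RAM" entry in /proc/iomem. The helper below is hypothetical (not part of kexec-tools or mkdumprd); it assumes the usual top-level `start-end : name` layout with hexadecimal addresses and 64-bit shell arithmetic.

```shell
# Hypothetical helper (not part of kexec-tools or mkdumprd).
# Usage: in_system_ram OFFSET [IOMEM_FILE]
# Succeeds if the decimal byte OFFSET falls inside a top-level
# "System RAM" range of IOMEM_FILE (default: /proc/iomem).
in_system_ram() {
    offset=$1
    iomem=${2:-/proc/iomem}
    # Top-level lines look like "00000000-0009ffff : System RAM";
    # the addresses are hexadecimal without a 0x prefix.
    grep '^[0-9a-fA-F]*-[0-9a-fA-F]* : System RAM$' "$iomem" | (
        while IFS=' ' read -r range _rest; do
            start=$((0x${range%-*}))
            end=$((0x${range#*-}))
            if [ "$offset" -ge "$start" ] && [ "$offset" -le "$end" ]; then
                exit 0    # found a RAM range containing the offset
            fi
        done
        exit 1            # no RAM range contains the offset
    )
}

# Example: check the offset used by "skip=100 bs=512".
#   in_system_ram $((100 * 512)) || echo "offset 51200 is not System RAM"
```

If the first System RAM range starts above 0xc800, the check fails, which would be consistent with dd reporting "Bad address" at that offset.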

Comment 2 Neil Horman 2008-09-25 17:26:57 UTC
Created attachment 317712 [details]
patch to interrogate /proc/iomem to find a place to dd from dev/mem

Could you try this patch please and let me know if it fixes the /dev/mem access problem?  Thanks!

Comment 3 Qian Cai 2008-09-26 06:52:59 UTC
I'm afraid it doesn't work:

- config file:
net root.bos.redhat.com

- mkdumprd snip:
...
+ mkdir -p /tmp/initrd.d10855/root
+ cp -a /root/.ssh /tmp/initrd.d10855/root/
+ cp -a /etc/ssh /tmp/initrd.d10855/etc
+ mknod /tmp/initrd.d10855/dev/urandom c 1 9
+ emit 'START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`'
+ NONL=
+ '[' 'START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`' == -n ']'
+ echo 'START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`'
+ emit 'SKIP_COUNT=`dc -e"$START_ADDR 512 / 1 +`'
+ NONL=
+ '[' 'SKIP_COUNT=`dc -e"$START_ADDR 512 / 1 +`' == -n ']'
+ echo 'SKIP_COUNT=`dc -e"$START_ADDR 512 / 1 +`'
+ emit 'dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=$SKIP_COUNT'
+ NONL=
+ '[' 'dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=$SKIP_COUNT' == -n ']'
+ echo 'dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=$SKIP_COUNT'
+ emit 'ssh -q -o BatchMode=yes -o StrictHostKeyChecking=no root.65.108 mkdir /var/crash/10.16.64.220-$DATE'
+ NONL=
+ '[' 'ssh -q -o BatchMode=yes -o StrictHostKeyChecking=no root.65.108 mkdir /var/crash/10.16.64.220-$DATE' == -n ']'
...
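The trace above suggests how mkdumprd's `emit` works: it strips an optional `-n` and writes its argument verbatim into the generated init script. The single quotes keep the backticks and `$START_ADDR` unexpanded, so those commands run later inside the kdump initramfs, not on the host building the image. A minimal reconstruction, assuming that behaviour; the real function uses `echo` and may redirect elsewhere, while this sketch uses `printf` for portability.

```shell
# Hypothetical reconstruction of mkdumprd's emit(), inferred from the
# "set -x" trace above; not copied from kexec-tools.
emit() {
    if [ "$1" = "-n" ]; then
        # -n: write the line without a trailing newline
        shift
        printf '%s' "$*"
    else
        printf '%s\n' "$*"
    fi
}

# Single quotes defer evaluation: the emitted line still contains the
# literal backticks, to be evaluated only when the generated init runs.
emit 'START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`'
```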

- run SysRq-C
...
Activating logical volumes
  2 logical volume(s) in volume group "VolGroup00" now active
hwclock: Could not access RTC: No such file or directory
mapping eth0 to eth0
udhcpc (v1.2.0) started
udhcpc[1104]: udhcpc (v1.2.0) started
Sending discover...
udhcpc[1104]: Sending discover...
Sending discover...
udhcpc[1104]: Sending discover...
tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
Sending discover...
udhcpc[1104]: Sending discover...
Sending select for 10.16.64.220...
udhcpc[1104]: Sending select for 10.16.64.220...
Lease of 10.16.64.220 obtained, lease time 86400
udhcpc[1104]: Lease of 10.16.64.220 obtained, lease time 86400
deleting routers
route: SIOC[ADD|DEL]RT: No such process
adding dns 10.16.255.2
adding dns 10.16.255.3
Saving to remote location root.bos.redhat.com
dd: invalid number `'
[: 1147: unknown operand
[: 1147: unknown operand.85 MB
[: 1147: unknown operand.85 MB
[: 1147: unknown operand.85 MB
[: 1147: unknown operand.85 MB
...

- init from the initramfs snip:
...
echo Saving to remote location root.bos.redhat.com
START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`
SKIP_COUNT=`dc -e"$START_ADDR 512 / 1 +`
dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=$SKIP_COUNT
ssh -q -o BatchMode=yes -o StrictHostKeyChecking=no root.65.108 mkdir /var/crash/10.16.64.220-$DATE
VMCORE=/var/crash/10.16.64.220-$DATE/vmcore
export VMCORE
monitor_scp_progress root.65.108 /var/crash/10.16.64.220-$DATE/vmcore-incomplete &
scp -q -o BatchMode=yes -o StrictHostKeyChecking=no /proc/vmcore root.65.108:$VMCORE-incomplete
exitcode=$?
if [ $exitcode == 0 ]
...
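The `dd: invalid number `'` error means `$SKIP_COUNT` expanded to an empty string. Most likely the expression passed to `dc -e` has no `p` (print) command, so dc computes the value but never prints it; the revised script in the next comment pipes `... 1 + p` into dc instead. Independently of dc, a guard like the hypothetical function below would keep dd from running with an unusable value.

```shell
# Hypothetical guard, not from mkdumprd: succeed only if $1 is a
# non-empty string of decimal digits that dd accepts for skip=.
valid_skip_count() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit
        *) return 0 ;;
    esac
}

# Possible use in the generated init script:
#   if valid_skip_count "$SKIP_COUNT"; then
#       dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=$SKIP_COUNT
#   fi
```

The same check also rejects floating-point output such as `5.85953e+06`.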

Comment 5 Neil Horman 2008-09-26 15:51:49 UTC
Created attachment 317804 [details]
new patch to fix dd errors

Sorry, this one should take care of it. I tested it myself and it corrected both errors you were seeing. Please test and confirm, and I'll check it in ASAP.

Comment 6 Qian Cai 2008-09-27 03:10:22 UTC
Neil, those "unknown operand" errors are gone, but the dd step still fails here.

- run SysRq-C
...
Saving to remote location root.bos.redhat.com
dd: invalid number `5.85953e+06'
Copied 317.062 MB / 7323.19 MB
...

- init from the initramfs snip:
...
echo Saving to remote location root.bos.redhat.com
START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`
SKIP_COUNT=`echo "$START_ADDR 512 / 1 + p" | dc`
dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=$SKIP_COUNT
...

Running those commands manually, I get the following:

# START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`
# echo $START_ADDR
3000080000
# SKIP_COUNT=`echo "$START_ADDR 512 / 1 + p" | dc`
# echo $SKIP_COUNT
5859532

- Using the busybox version of dc reproduces the original error:

# SKIP_COUNT=`echo "$START_ADDR 512 / 1 + p" | busybox dc`
# echo $SKIP_COUNT
5.85953e+06

- Even with the seemingly correct offset, it still fails:

# dd if=/dev/mem of=/dev/urandom count=1 bs=512 skip=5859532
dd: reading `/dev/mem': Bad address
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00036695 seconds, 0.0 kB/s
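Two things seem to be going on with the dc step. Busybox dc prints the result in floating-point notation (`5.85953e+06`), which dd rejects. Separately, /proc/iomem reports addresses in hexadecimal, while the dc pipeline treats `3000080000` as decimal, so 5859532 corresponds to byte offset 3000080384 rather than to the System RAM start 0x3000080000. A sketch that sidesteps dc with POSIX shell arithmetic, which accepts `0x` constants and always produces integers; the function name is hypothetical and 64-bit shell arithmetic is assumed.

```shell
# Hypothetical replacement for the dc step (not the actual patch).
# Converts a /proc/iomem start address (hex digits, no 0x prefix) into
# a dd skip= count of 512-byte blocks using integer shell arithmetic.
skip_count_from_hex() {
    # $((0x...)) parses the address as hexadecimal and always yields an
    # integer; "+ 1" mirrors the "1 +" in the dc expression.
    echo "$(( 0x$1 / 512 + 1 ))"
}

START_ADDR=3000080000                 # value shown in the output above
skip_count_from_hex "$START_ADDR"     # prints 402654209 (read as hex)
echo "$(( START_ADDR / 512 + 1 ))"    # prints 5859532 (read as decimal, like dc)
```

Whether a correctly computed offset would have avoided the failure is another matter: Neil later found that /dev/mem on this ia64 system returned EFAULT at every offset he tried.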

Comment 7 Neil Horman 2008-09-28 01:10:29 UTC
ugh, why can't anything ever be easy with ia64? :) 

Could you do me a favor and attach the /proc/iomem file from the ia64 system in question? Thanks!

Comment 8 Qian Cai 2008-09-28 01:56:32 UTC
Created attachment 317878 [details]
/proc/iomem from altix4.rhts.bos.redhat.com

Comment 9 Neil Horman 2008-09-29 12:51:27 UTC
Cai, I just noticed that you seem to have been using several different machines to test this. The /proc/iomem file comes from a different system than the one where you got the last failure. While that's not a big deal, we probably need to test consistently on one machine so that we don't get confused about the data we're looking at. I think the results you got on ndnc-1.lab.bos.redhat.com point to a minor problem in my use of the dc command: I need to tweak it so that it doesn't output numbers in scientific notation. I'll attach a corrected patch shortly.

Comment 10 Neil Horman 2008-09-29 12:56:13 UTC
Created attachment 317971 [details]
new version of patch

Here's a new version of the patch. It sets the output precision of dc so that it should not use scientific notation for anything, which should let this work properly.

Comment 11 Qian Cai 2008-09-30 05:38:07 UTC
Neil, the "dd: reading `/dev/mem': Bad address" error is reproducible on both of the IA64 systems I have tested so far, one of which is altix4.rhts.bos.redhat.com. ndnc-1.lab.bos.redhat.com is just the Kdump scp target.

Anyway, with the patched version, I get this on altix4.rhts.bos.redhat.com:
...
deleting routers
route: SIOC[ADD|DEL]RT: No such process
adding dns 10.16.255.2
adding dns 10.16.255.3
Saving to remote location root.bos.redhat.com
dc: k: syntax error.
dd: invalid number `'
Copied 56.2344 MB / 7323.19 MB
...

Running the code generated in the Kdump initramfs manually:
# START_ADDR=`grep "System RAM" /proc/iomem | head -n 1 | cut -d"-" -f1`
# SKIP_COUNT=`echo "100 k $START_ADDR 512 / 1 + p" | busybox dc`
dc: k: syntax error.

Comment 12 Neil Horman 2008-09-30 11:08:26 UTC
Can I get on that system to poke around? It appears that you (or someone else) are actively working on it at the moment.

Comment 13 Qian Cai 2008-09-30 11:20:39 UTC
Sure, it is running RHEL 4.7 at the moment. Do you need to install RHEL 5.3? If so, I will cancel my reservation and reserve it for you.

Comment 14 Qian Cai 2008-09-30 11:59:52 UTC
I am signing off for today now. I have made a reservation of this machine for you, so it should be ready for you to have a look soon.

Comment 15 Neil Horman 2008-09-30 12:44:59 UTC
Thank you, Cai. I'll have this working by the time you get back.

Comment 16 Neil Horman 2008-09-30 19:41:16 UTC
Ok, bad news/good news.

The bad news is that /dev/mem on ia64 seems to have a problem. No matter where I seek to, reads fail with EFAULT ("Bad address"). I'll need to dig into that further.

The good news is that, after testing, the system in question doesn't actually need the additional entropy in /dev/urandom for ssh to work properly. As such, I can work around the problem by simply suppressing the error, so that's how we'll be handling this for now.
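The workaround described here, seeding /dev/urandom on a best-effort basis and discarding any dd error, could look roughly like the following. This is a sketch, not the actual -44.el5 change; the function name and its parameters are hypothetical, and source and destination are arguments so the step can be exercised without /dev/mem.

```shell
# Hypothetical best-effort entropy step: copy one 512-byte block from
# SRC (normally /dev/mem) into DST (normally /dev/urandom). Any dd
# failure, such as EFAULT ("Bad address") on ia64, is discarded so
# the dump continues; the function always returns 0.
seed_entropy() {
    src=$1 dst=$2 skip=$3
    dd if="$src" of="$dst" count=1 bs=512 skip="$skip" 2>/dev/null || :
}

# In the generated init script this would be called as:
#   seed_entropy /dev/mem /dev/urandom "$SKIP_COUNT"
```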

I'll dig into why we keep getting EFAULT asap.

This is fixed in -44.el5.  Thanks!

Comment 20 errata-xmlrpc 2009-01-20 21:00:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0105.html