Bug 1041999 - ppc64 traceback after core test
Summary: ppc64 traceback after core test
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Hardware Certification Program
Classification: Retired
Component: Test Suite (harness)
Version: 1.7.0
Hardware: ppc64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Greg Nichols
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 1022752
Depends On:
Blocks: 1052374 1083333
Reported: 2013-12-12 20:30 UTC by Brian Brock
Modified: 2023-09-14 01:55 UTC (History)

Fixed In Version: hwcert-client 1.7.0-20140401
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1083333
Environment:
Last Closed: 2020-03-27 15:32:50 UTC
Embargoed:


Attachments
patch adds -S to the "tree" command. (731 bytes, patch)
2014-03-31 23:10 UTC, Greg Nichols
patch removing logging of /proc/cpuinfo and call to "tree" (5.14 KB, patch)
2014-04-02 01:50 UTC, Greg Nichols


Links
System: Red Hat Product Errata
ID: RHBA-2014:0752
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: hwcert-client-1.7 bug fix and enhancement update
Last Updated: 2014-06-12 14:10:15 UTC

Description Brian Brock 2013-12-12 20:30:31 UTC
Description of problem:
hwcert-backend began to crash with traceback, ending with

  File "/usr/share/hwcert/lib/hwcert/documentbase.py", line 281, in save
    file.write(self.document.toxml())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 89835-89837: ordinal not in range(128)
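
The failing call writes the result of toxml() straight to a file, and the error says the ASCII codec was used for the conversion. The following is a minimal sketch of that failure and one possible remedy, written for Python 3 and assuming the document is built with xml.dom.minidom (the later tracebacks show minidom being used to load it). The save_document helper and the "output" element are hypothetical names for illustration, not code from hwcert.

```python
import os
import tempfile
from xml.dom import minidom


def save_document(document, path):
    """Write a minidom document as UTF-8 bytes (hypothetical helper)."""
    # toxml(encoding=...) returns already-encoded bytes, so the file
    # object never has to fall back to an implicit ASCII conversion.
    data = document.toxml(encoding="utf-8")
    with open(path, "wb") as handle:
        handle.write(data)


# Build a tiny document holding the kind of line the "tree" command prints.
tree_line = "\u251c\u2500\u2500 PowerPC,POWER7@0"
doc = minidom.Document()
root = doc.createElement("output")
root.appendChild(doc.createTextNode(tree_line))
doc.appendChild(root)

# Forcing the serialized text through ASCII fails like the traceback does.
try:
    doc.toxml().encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Writing UTF-8 bytes succeeds, and the file parses back to the same text.
path = os.path.join(tempfile.mkdtemp(), "results.xml")
save_document(doc, path)
reloaded = minidom.parse(path)
```

Passing an explicit encoding to toxml() also adds a matching encoding declaration to the XML prolog, so the file can be parsed again without guessing.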


Version-Release number of selected component (if applicable):

hwcert-client-1.7.0-20131210.el7


How reproducible:
unsure


log of session showing error:

Thu Dec 12 11:03:43
root@ibm-p720-02-lp4 ~
$ hwc clean all
Error: hwcert is already running (lock file /var/lock/subsys/hwcert found)
Override? (y|n) y
response: y
Are you sure you want to delete all test results? (y|n) y
response: y
Also remove certification data? (y|n) y
response: y

Thu Dec 12 11:03:54
root@ibm-p720-02-lp4 ~
$ hwc print
No test results or plan to print

Thu Dec 12 11:03:57
root@ibm-p720-02-lp4 ~
$ hwc plan
unable to initialize libusb: -99
unable to initialize libusb: -99
unable to initialize libusb: -99
skipping bad char \x00
skipping bad char \x00
Hardware: IBM 8202-E4C 8202-E4C
OS: Maipo 7

Please verify the hardware product information:
    vendor:bbrock
    make:8202-E4C
    model:8202-E4C
    product-url:
    category (Desktop/Workstation|Laptop|Component/Peripheral|Server) Server
What certification is this system being tested for? (new|existing|none) new
response: new
Red Hat Catalog User Name:
Password:
Error: could not open a new certification:
    Fault code: 50
    Fault string: The function requires a login argument, and that argument was not set.
Local Hardware Certification Test Server: hwcert.bos.devel.redhat.com
Created a new plan with 8 tests on 188 devices
package kabi-whitelists is not installed
The following packages are required for testing:
kabi-whitelists
Would you like to install them now? (y|n) n
response: n
Warning: some tests may fail due to missing packages:
    info requires kabi-whitelists

Thu Dec 12 11:06:03
root@ibm-p720-02-lp4 ~
$ hwc print

Test Plan:
----------------------------------------------------------------
1GigEthernet eth0       vio/30000003/net/eth0
memory
storage    host0      host0/target0:0:1/0:0:1:0/block/sda
core
profiler
info
kdump      nfs
kdump      local

Thu Dec 12 11:09:09
root@ibm-p720-02-lp4 ~
$ hwc run -t memory -t core -t profiler -t kdump
unable to initialize libusb: -99
unable to initialize libusb: -99
unable to initialize libusb: -99
Set server to hwcert.bos.devel.redhat.com for  5 test(s)

Running the following tests:
memory
core
profiler
kdump      nfs
kdump      local
info
System Memory: 1486 MB
Free Memory: 1023 MB
Swap Memory: 3103 MB

Test Verification Passed
mkdir -p /tmp/hwcert-memory-zClUZ6
cp -a threaded_memtest.c memory.py Makefile /tmp/hwcert-memory-zClUZ6
Warning: test build produced errors.
"make build" has output on stderr
threaded_memtest.c: In function ‘mem_twiddler’:
threaded_memtest.c:171:19: warning: variable ‘garbage’ set but not used [-Wunused-but-set-variable]
     volatile long garbage;
                   ^

Subtest: Limits - Get test parameters based on hardware
System Memory: 1486 MB
Free Memory: 1021 MB
Swap Memory: 3103 MB
PASS

Subtest Single-process:
Starting Threaded Memory Test
running for more than free memory at 1072 MB for 60 sec.
Warning: memsize > free_mem. You will probably hit swap.
Detected 4 processors.
RAM: 10.8% free (160M/1486M)
Testing 1072M RAM for 60 seconds using 8 threads:
thread 0: mapping 134M RAM
thread 1: mapping 134M RAM
thread 2: mapping 134M RAM
thread 4: mapping 134M RAM
thread 3: mapping 134M RAM
thread 5: mapping 134M RAM
thread 6: mapping 134M RAM
thread 7: mapping 134M RAM
thread 6: mapping complete
thread 5: mapping complete
thread 1: mapping complete
thread 0: mapping complete
thread 4: mapping complete
thread 3: mapping complete
thread 2: mapping complete
thread 7: mapping complete
thread 6 (1): test start
thread 0 (2): test start
thread 5 (3): test start
thread 4 (4): test start
thread 3 (5): test start
thread 1 (6): test start
thread 7 (7): test start
thread 2 (8): test start
thread 6 (7): test start
thread 2 (6): test start
thread 7 (5): test start
thread 1 (4): test start
thread 3 (3): test start
thread 4 (2): test start
thread 0 (1): test start
thread 5 (0): test start
thread 5 unmapping and exiting
thread 6 unmapping and exiting
thread 2 unmapping and exiting
thread 3 unmapping and exiting
thread 7 unmapping and exiting
thread 4 unmapping and exiting
thread 0 unmapping and exiting
thread 1 unmapping and exiting
Runtime was 61.07s
thread 0: 2674897 loops
thread 1: 2653124 loops
thread 2: 2669645 loops
thread 3: 2650183 loops
thread 4: 2644443 loops
thread 5: 2698453 loops
thread 6: 2693925 loops
thread 7: 2681450 loops
Total loops per second: 349848.84
Testing complete.
done.
running for free memory
Detected 4 processors.
RAM: 79.9% free (1188M/1486M)
Testing 1128M RAM for 900 seconds using 8 threads:
thread 0: mapping 141M RAM
thread 1: mapping 141M RAM
thread 2: mapping 141M RAM
thread 3: mapping 141M RAM
thread 4: mapping 141M RAM
thread 5: mapping 141M RAM
thread 6: mapping 141M RAM
thread 7: mapping 141M RAM
thread 2: mapping complete
thread 3: mapping complete
thread 1: mapping complete
thread 0: mapping complete
thread 4: mapping complete
thread 7: mapping complete
thread 6: mapping complete
thread 5: mapping complete
thread 2 (1): test start
thread 3 (2): test start
thread 1 (3): test start
thread 0 (4): test start
thread 7 (5): test start
thread 4 (6): test start
thread 5 (7): test start
thread 6 (8): test start
thread 7 (7): test start
thread 3 (6): test start
thread 2 (5): test start
thread 6 (4): test start
thread 5 (3): test start
thread 0 (2): test start
thread 4 (1): test start
thread 1 (0): test start
thread 1 unmapping and exiting
thread 7 unmapping and exiting
thread 2 unmapping and exiting
thread 6 unmapping and exiting
thread 5 unmapping and exiting
thread 0 unmapping and exiting
thread 4 unmapping and exiting
thread 3 unmapping and exiting
Runtime was 901.25s
thread 0: 39178840 loops
thread 1: 39376033 loops
thread 2: 39446784 loops
thread 3: 39406407 loops
thread 4: 39169527 loops
thread 5: 39455355 loops
thread 6: 39501084 loops
thread 7: 39164056 loops
Total loops per second: 349179.12
Testing complete.
done.
PASS
copying attachments...
checking directory /var/log/hwcert/runs/1/memory
Skipping output.log
saveOutput: /var/log/hwcert/runs/1/memory/output.log
Return value was 0
mkdir -p /tmp/hwcert-core-YSEBuI
cp -a clocktest.c CORE2 core.py Makefile /tmp/hwcert-core-YSEBuI
cc -Wall -DCPU_ALLOC -lrt clocktest.c -o clocktest
chmod a+x ./CORE2 ./core.py

Subtest: Limits - Get test parameters based on hardware
System Memory: 1486 MB
Free Memory: 1247 MB
Swap Memory: 3103 MB
PASS

Subtest: CORE2 - Run CORE2 script for cpu info
+----------ppc64 CPU info start----------+
+-----/proc/cpuinfo-----+
processor       : 0
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

processor       : 1
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

processor       : 2
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

processor       : 3
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

timebase        : 512000000
platform        : pSeries
model           : IBM,8202-E4C
machine         : CHRP IBM,8202-E4C

+-----/proc/device-tree/cpus-----+
├── PowerPC,POWER7@0

+----------ppc64 CPU info end----------+

PASS

Subtest: clocktest - Clock tests
Clock Info: ------------------------------------------

Warning: could not determine clocksource
Clock Source in /sys/devices/system/clocksource/clocksource*/current_clocksource: timebase

Running clock tests
Testing for clock jitter on 4 cpus
using CPU_CALLOC
PASSED, largest jitter seen was 0.002015
largest jitter seen was 0.002015
clock direction test: start time 1386865580, stop time 1386865640, sleeptime 60, delta 0
PASSED
PASS

Subtest Stress:
Note: scaling back 12 processes at 103 MB for memory limit of 1247 MB
Running stress for 10 min.
stress --cpu 12 --io 12 --vm 12 --vm-bytes 103M --timeout 10m
stress: info: [4089] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd
stress: info: [4089] successful run completed in 600s
PASS
copying attachments...
checking directory /var/log/hwcert/runs/1/core
Skipping output.log
saveOutput: /var/log/hwcert/runs/1/core/output.log
Return value was 0
Traceback (most recent call last):
  File "/usr/bin/hwcert-backend", line 45, in <module>
    success = hwcertBackend.do(args)
  File "/usr/share/hwcert/lib/hwcert/backend.py", line 182, in do
    result = self.commands[self.command]()
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 393, in doRun
    return self._doRun(tests)
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 540, in _doRun
    self.certification.save(self.environment.getResultsPath())
  File "/usr/share/hwcert/lib/hwcert/documentbase.py", line 281, in save
    file.write(self.document.toxml())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 89835-89837: ordinal not in range(128)

Thu Dec 12 11:37:25
root@ibm-p720-02-lp4 ~
$ hwc print
Error: hwcert is already running (lock file /var/lock/subsys/hwcert found)
Override? (y|n) n
response: n

Thu Dec 12 11:49:33
root@ibm-p720-02-lp4 ~
$ hwc print
Error: hwcert is already running (lock file /var/lock/subsys/hwcert found)
Override? (y|n) y
response: y
Traceback (most recent call last):
  File "/usr/bin/hwcert-backend", line 45, in <module>
    success = hwcertBackend.do(args)
  File "/usr/share/hwcert/lib/hwcert/backend.py", line 182, in do
    result = self.commands[self.command]()
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 702, in doPrint
    self.load()
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 57, in load
    self.certification.load(self.environment.getResultsPath())
  File "/usr/share/hwcert/lib/hwcert/certificationtest.py", line 182, in load
    DocumentBase.load(self, filename)
  File "/usr/share/hwcert/lib/hwcert/documentbase.py", line 275, in load
    self.document = parse(file)
  File "/usr/lib64/python2.7/xml/dom/minidom.py", line 1921, in parse
    return expatbuilder.parse(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 211, in parseFile
    parser.Parse("", True)
xml.parsers.expat.ExpatError: no element found: line 1, column 0
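
"no element found: line 1, column 0" is what expat reports for an empty file. That suggests the failed save left the results file truncated, although this is an inference and the report does not confirm it. The sketch below, written for Python 3, reproduces the parser error with an empty file and shows one common way to avoid the situation: write to a temporary file and rename it into place only after the write succeeds. The save_atomically helper and the file names are hypothetical.

```python
import os
import tempfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def save_atomically(text, path, encoding="utf-8"):
    """Write text to a temp file, then rename it over path (hypothetical)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(text.encode(encoding))
        # The rename happens only after a complete, successful write.
        os.replace(tmp_path, path)
    except Exception:
        # Drop the partial temp file; the old file at path is untouched.
        os.remove(tmp_path)
        raise


workdir = tempfile.mkdtemp()

# An empty file reproduces the parser error seen in the log.
empty_path = os.path.join(workdir, "empty.xml")
open(empty_path, "wb").close()
try:
    minidom.parse(empty_path)
    message = ""
except ExpatError as error:
    message = str(error)

# A save that fails part-way leaves the previous results file readable.
results_path = os.path.join(workdir, "results.xml")
save_atomically("<results/>", results_path)
try:
    save_atomically("\u251c\u2500\u2500", results_path, encoding="ascii")
except UnicodeEncodeError:
    pass
survivor = minidom.parse(results_path).documentElement.tagName
```

With this pattern an encoding failure would still abort the run, but a later "print" would read the last good results file instead of an empty one.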

Comment 1 Brian Brock 2013-12-12 20:42:00 UTC
cleaned up and ran again, this time it worked:

Thu Dec 12 12:36:31
root@ibm-p720-02-lp4 ~
$ hwc clean all
Error: hwcert is already running (lock file /var/lock/subsys/hwcert found)
Override? (y|n) y
response: y
Are you sure you want to delete all test results? (y|n) y
response: y
Also remove certification data? (y|n) y
response: y

Thu Dec 12 12:36:52
root@ibm-p720-02-lp4 ~
$ hwc print
No test results or plan to print

Thu Dec 12 12:36:58
root@ibm-p720-02-lp4 ~
$ hwc plan
unable to initialize libusb: -99
unable to initialize libusb: -99
unable to initialize libusb: -99
skipping bad char \x00
skipping bad char \x00
Hardware: IBM 8202-E4C 8202-E4C
OS: Maipo 7

Please verify the hardware product information:
    vendor:bbrock
    make:8202-E4C
    model:8202-E4C
    product-url:
    category (Desktop/Workstation|Laptop|Component/Peripheral|Server) Server
What certification is this system being tested for? (new|existing|none) new
response: new
Red Hat Catalog User Name:
Password:
Error: could not open a new certification:
    Fault code: 50
    Fault string: The function requires a login argument, and that argument was not set.
Local Hardware Certification Test Server: hwcert.bos.devel.redhat.com
Created a new plan with 8 tests on 188 devices
package kabi-whitelists is not installed
The following packages are required for testing:
kabi-whitelists
Would you like to install them now? (y|n) n
response: n
Warning: some tests may fail due to missing packages:
    info requires kabi-whitelists

Thu Dec 12 12:38:12
root@ibm-p720-02-lp4 ~
$ hwc print

Test Plan:
----------------------------------------------------------------
1GigEthernet eth0       vio/30000003/net/eth0
memory
storage    host0      host0/target0:0:1/0:0:1:0/block/sda
core
profiler
info
kdump      nfs
kdump      local

Thu Dec 12 12:38:15
root@ibm-p720-02-lp4 ~
$ hwc run -t memory -t profiler -t kdump
unable to initialize libusb: -99
unable to initialize libusb: -99
unable to initialize libusb: -99
Set server to hwcert.bos.devel.redhat.com for  4 test(s)

Running the following tests:
memory
profiler
kdump      nfs
kdump      local
info
System Memory: 1486 MB
Free Memory: 1217 MB
Swap Memory: 3103 MB

Test Verification Passed
mkdir -p /tmp/hwcert-memory-1TyjFc
cp -a threaded_memtest.c memory.py Makefile /tmp/hwcert-memory-1TyjFc
Warning: test build produced errors.
"make build" has output on stderr
threaded_memtest.c: In function ‘mem_twiddler’:
threaded_memtest.c:171:19: warning: variable ‘garbage’ set but not used [-Wunused-but-set-variable]
     volatile long garbage;
                   ^

Subtest: Limits - Get test parameters based on hardware
System Memory: 1486 MB
Free Memory: 1216 MB
Swap Memory: 3103 MB
PASS

Subtest Single-process:
Starting Threaded Memory Test
running for more than free memory at 1276 MB for 60 sec.
Warning: memsize > free_mem. You will probably hit swap.
Detected 4 processors.
RAM: 74.8% free (1112M/1486M) 
Testing 1276M RAM for 60 seconds using 8 threads:
thread 0: mapping 159M RAM
thread 1: mapping 159M RAM
thread 2: mapping 159M RAM
thread 3: mapping 159M RAM
thread 5: mapping 159M RAM
thread 4: mapping 159M RAM
thread 6: mapping 159M RAM
thread 7: mapping 159M RAM
thread 2: mapping complete
thread 3: mapping complete
thread 0: mapping complete
thread 6: mapping complete
thread 5: mapping complete
thread 4: mapping complete
thread 1: mapping complete
thread 7: mapping complete
thread 2 (1): test start
thread 3 (2): test start
thread 0 (3): test start
thread 6 (4): test start
thread 4 (5): test start
thread 5 (6): test start
thread 1 (7): test start
thread 7 (8): test start
thread 3 (7): test start
thread 5 (6): test start
thread 0 (5): test start
thread 6 (4): test start
thread 7 (3): test start
thread 4 (2): test start
thread 1 (1): test start
thread 2 (0): test start
thread 2 unmapping and exiting
thread 3 unmapping and exiting
thread 5 unmapping and exiting
thread 0 unmapping and exiting
thread 6 unmapping and exiting
thread 7 unmapping and exiting
thread 4 unmapping and exiting
thread 1 unmapping and exiting
Runtime was 62.30s
thread 0: 2082 loops
thread 1: 1751 loops
thread 2: 2118 loops
thread 3: 2195 loops
thread 4: 2257 loops
thread 5: 2000 loops
thread 6: 1911 loops
thread 7: 2059 loops
Total loops per second: 262.81
Testing complete.
done.
running for free memory
Detected 4 processors.
RAM: 85.5% free (1271M/1486M)
Testing 1208M RAM for 900 seconds using 8 threads:
thread 0: mapping 151M RAM
thread 1: mapping 151M RAM
thread 2: mapping 151M RAM
thread 3: mapping 151M RAM
thread 4: mapping 151M RAM
thread 5: mapping 151M RAM
thread 6: mapping 151M RAM
thread 7: mapping 151M RAM
thread 2: mapping complete
thread 1: mapping complete
thread 0: mapping complete
thread 3: mapping complete
thread 4: mapping complete
thread 6: mapping complete
thread 5: mapping complete
thread 7: mapping complete
thread 2 (1): test start
thread 1 (2): test start
thread 0 (3): test start
thread 3 (4): test start
thread 6 (5): test start
thread 4 (6): test start
thread 5 (7): test start
thread 7 (8): test start
thread 2 (7): test start
thread 3 (6): test start
thread 1 (5): test start
thread 0 (4): test start
thread 4 (3): test start
thread 7 (2): test start
thread 5 (1): test start
thread 6 (0): test start
thread 6 unmapping and exiting
thread 0 unmapping and exiting
thread 7 unmapping and exiting
thread 1 unmapping and exiting
thread 3 unmapping and exiting
thread 2 unmapping and exiting
thread 5 unmapping and exiting
thread 4 unmapping and exiting
Runtime was 939.25s
thread 0: 40230628 loops
thread 1: 40455146 loops
thread 2: 40427525 loops
thread 3: 40339166 loops
thread 4: 40293960 loops
thread 5: 40403623 loops
thread 6: 40553402 loops
thread 7: 40332870 loops
Total loops per second: 343929.38
Testing complete.
done.
PASS
copying attachments...
checking directory /var/log/hwcert/runs/1/memory
Skipping output.log
saveOutput: /var/log/hwcert/runs/1/memory/output.log
Return value was 0
mkdir -p /tmp/hwcert-profiler-mZcAUP
cp -a  Makefile /tmp/hwcert-profiler-mZcAUP
make: `build' is up to date.  

Subtest Initialize:
using linux image /usr/lib/debug/lib/modules/3.10.0-54.0.1.el7.ppc64/vmlinux
Using Linux image /usr/lib/debug/lib/modules/3.10.0-54.0.1.el7.ppc64/vmlinux
NMI Watchdog = 1
reseting NMI watchdog
PASS

Subtest Reset:

==== START: Errors during reset may be ignored. ====
Warning:
"opcontrol --shutdown" has output on stderr
Verified data has beed removed
^^^^ END: Errors during reset may be ignored. ^^^^

PASS

Subtest Start Daemon:
starting opcontrold
ATTENTION: Use of opcontrol is discouraged.  Please see the man page for operf.
Using default event: CYCLES:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
started
PASS

Subtest Start OProfile:
oprofile start: initializing...
checking /dev/oprofile filesystem presence
oprofile filesystem mounted
The profiling daemon is currently active, so changes to the configuration
will be used the next time you restart oprofile after a --shutdown or --deinit.
Profiler running.
oprofile version: 0.9.9
PASS

Subtest Report:
Warning: "opreport" has output on stderr
Using /var/lib/oprofile/samples/ for samples directory.
PASS
restoring NMI watchdog

Subtest Reset:

==== START: Errors during reset may be ignored. ====
Stopping profiling.
Killing daemon.
Verified data has beed removed
^^^^ END: Errors during reset may be ignored. ^^^^

PASS
copying attachments...
checking directory /var/log/hwcert/runs/1/profiler
Skipping output.log
saveOutput: /var/log/hwcert/runs/1/profiler/output.log
Return value was 0
mkdir -p /tmp/hwcert-kdump-zJGM0y
cp -a  Makefile /tmp/hwcert-kdump-zJGM0y
make: `build' is up to date.

Subtest initialize:
Checking required packages:
kexec-tools-2.0.4-13.el7.ppc64
crash-7.0.2-2.el7.ppc64
kernel-debuginfo-3.10.0-54.0.1.el7.ppc64
Checking kdump configuration
Found crashkernel=auto boot parameter
Kernel panic reboot timeout is 10
kdump configuration:
--------------------
path = /var/crash
--------------------

Adding core_collector = makedumpfile -c -d 31 to /etc/kdump.conf
Adding nfs hwcert.bos.devel.redhat.com:/var/www/hwcert/export to /etc/kdump.conf
Attempting to mount nfs setting hwcert.bos.devel.redhat.com:/var/www/hwcert/export as nfs.
updated kdump configuration:
--------------------
path = /var/crash
core_collector = makedumpfile -c -d 31
nfs = hwcert.bos.devel.redhat.com:/var/www/hwcert/export
--------------------

restarting kdump with new configuration...
Error: kdump restart failed
"systemctl start kdump" has output on stderr
Job for kdump.service failed. See 'systemctl status kdump.service' and 'journalctl -xn' for details.
Checking kdump service
Error: kdump is not running - can not test it
"systemctl status kdump" no match for regular expression Active: active
Warning: fail to get the network interface info in-use
"/sbin/ip -o route get to hwcert.bos.devel.redhat.com" no match for regular expression .* dev (?P<ifname>[a-z]+\d+) .*
copying attachments...
checking directory /var/log/hwcert/runs/1/kdump
Skipping output.log
saveOutput: /var/log/hwcert/runs/1/kdump/output.log
Warning: could not merge output XML, reading as text file.
no element found: line 33, column 0
Return value was 1
mkdir -p /tmp/hwcert-kdump-j81CKR
cp -a  Makefile /tmp/hwcert-kdump-j81CKR
make: `build' is up to date.

Subtest initialize:
Checking required packages:
kexec-tools-2.0.4-13.el7.ppc64
crash-7.0.2-2.el7.ppc64
kernel-debuginfo-3.10.0-54.0.1.el7.ppc64
Checking kdump configuration
Found crashkernel=auto boot parameter
Kernel panic reboot timeout is 10
kdump configuration:
--------------------
path = /var/crash
core_collector = makedumpfile -c -d 31
nfs = hwcert.bos.devel.redhat.com:/var/www/hwcert/export
--------------------

core_collector currently set to "makedumpfile -c -d 31"
Adding xfs UUID=17cb29a2-be07-40a6-a55d-834effd4bf57 to /etc/kdump.conf
Removing nfs parameter for local target dump
updated kdump configuration:
--------------------
path = /var/crash
xfs = UUID=17cb29a2-be07-40a6-a55d-834effd4bf57
core_collector = makedumpfile -c -d 31
--------------------

restarting kdump with new configuration...
Error: kdump restart failed
"systemctl start kdump" has output on stderr
Job for kdump.service failed. See 'systemctl status kdump.service' and 'journalctl -xn' for details.
Checking kdump service
Error: kdump is not running - can not test it
"systemctl status kdump" no match for regular expression Active: active
copying attachments...
checking directory /var/log/hwcert/runs/1/kdump
Skipping output.log
saveOutput: /var/log/hwcert/runs/1/kdump/output.log
Warning: could not merge output XML, reading as text file.
junk after document element: line 3, column 0
Return value was 1
mkdir -p /tmp/hwcert-info-IvOs9n
cp -a  Makefile /tmp/hwcert-info-IvOs9n
make: `build' is up to date.

Subtest: Log versions - Log hwcert and OS version and release
Tested OS: Red Hat Enterprise Linux Everything 7 (Maipo)
Kernel RPM: kernel-3.10.0-54.0.1.el7
hwcert-client version 1.7.0, release 20131210
PASS

Subtest: Verify hwcert-client - Verify the hwcert-client installation
Checking hwcert configuration.
    Using defaults.
PASS

Subtest: Kernel - Check OS kernel build, version
+ rpm -ql kernel-3.10.0-54.0.1.el7
Error: Kernel is 3.10.0-54.0.1 and not Red Hat Enterprise Linux Everything 7.0 GA (None)
Boot Parameters: BOOT_IMAGE=/vmlinuz-3.10.0-54.0.1.el7.ppc64 root=/dev/mapper/rhel_ibm--p720--02--lp4-root ro rd.lvm.lv=rhel_ibm-p720-02-lp4/root vconsole.keymap=us crashkernel=auto rd.lvm.lv=rhel_ibm-p720-02-lp4/swap vconsole.font=latarcyrheb-sun16 LANG=en_US.UTF-8
FAIL

Subtest: Modules - Check kernel modules
checking modules...
PASS

Subtest: SE Linux - Capture SE Linux status
PASS

Subtest: System Report - generate system report
Usage: sosreport [options]


sosreport (version 3.0)

This command will collect system configuration and
diagnostic information from this Red Hat Enterprise Linux
system. An archive containing the collected information
will be generated in /var/tmp and may be provided to a Red
Hat support representative or used for local diagnostic or
recording purposes.

Any information provided to Red Hat will be treated in
strict confidence in accordance with the published support
policies at:

https://access.redhat.com/support/

The generated archive may contain data considered
sensitive and its content should be reviewed by the
originating organization before being passed to any third
party.

No changes will be made to system configuration.


Running plugins. Please wait ...
  
  Running 63/63: yum...
Creating compressed archive...

Your sosreport has been generated and saved in:
/var/tmp/sosreport-ibm-p720-02-lp4.rhts.eng.nay.redhat.com-20131212130318.tar.xz
Copied sosreport --batch -n selinux /var/tmp/sosreport-ibm-p720-02-lp4.rhts.eng.nay.redhat.com-20131212130318.tar.xz to /var/log/hwcert/runs/1/info
PASS
copying attachments...
checking directory /var/log/hwcert/runs/1/info
Skipping output.log
Adding /var/log/hwcert/runs/1/info/sosreport-ibm-p720-02-lp4.rhts.eng.nay.redhat.com-20131212130318.tar.xz
copied attachment file sosreport-ibm-p720-02-lp4.rhts.eng.nay.redhat.com-20131212130318.tar.xz
saveOutput: /var/log/hwcert/runs/1/info/output.log
Return value was 1

Thu Dec 12 03:18:21
root@ibm-p720-02-lp4 ~
$ hwc print

Red Hat Hardware Certification test
--------------------------------------------
Test Suite:    1.7.0    Release: 20131210
Plan Created:  2013-12-12 17:38:03
Test Server:   hwcert.bos.devel.redhat.com
--------------------------------------------

Run: 1 on 2013-12-12 17:39:13 
--------------------------------------------
Tests: 8 planned,  5 run, 2 passed, 3 failed
--------------------------------------------


Test Run 1
----------------------------------------------------------------
1GigEthernet eth0    vio/30000003/net/eth0                -
memory                                               - PASS
storage host0   host0/target0:0:1/0:0:1:0/block/sda  -
core                                                 -
profiler                                              - PASS
info                                                 - FAIL
kdump   nfs                                          - FAIL
kdump   local                                        - FAIL

Combined Results for 1 Runs:  
--------------------------------------------
   8 tests planned
   5 tests run
   3 tests always failed
   2 tests always passed

Comment 3 Brian Brock 2013-12-12 21:09:06 UTC
running 'core' is what causes the problem:

Thu Dec 12 03:52:30
root@ibm-p720-02-lp4 ~
$ hwc run -t core
unable to initialize libusb: -99
unable to initialize libusb: -99
unable to initialize libusb: -99
Set server to hwcert.bos.devel.redhat.com for  1 test(s)

Running the following tests:
core
info

Test Verification Passed
mkdir -p /tmp/hwcert-core-Z15MzT
cp -a clocktest.c CORE2 core.py Makefile /tmp/hwcert-core-Z15MzT
cc -Wall -DCPU_ALLOC -lrt clocktest.c -o clocktest
chmod a+x ./CORE2 ./core.py

Subtest: Limits - Get test parameters based on hardware
System Memory: 1486 MB
Free Memory: 1147 MB
Swap Memory: 3103 MB
PASS

Subtest: CORE2 - Run CORE2 script for cpu info
+----------ppc64 CPU info start----------+
+-----/proc/cpuinfo-----+
processor       : 0
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

processor       : 1
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

processor       : 2
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

processor       : 3
cpu             : POWER7 (architected), altivec supported
clock           : 3024.000000MHz
revision        : 2.3 (pvr 003f 0203)

timebase        : 512000000
platform        : pSeries
model           : IBM,8202-E4C
machine         : CHRP IBM,8202-E4C

+-----/proc/device-tree/cpus-----+
├── PowerPC,POWER7@0

+----------ppc64 CPU info end----------+

PASS

Subtest: clocktest - Clock tests
Clock Info: ------------------------------------------

Warning: could not determine clocksource
Clock Source in /sys/devices/system/clocksource/clocksource*/current_clocksource: timebase

Running clock tests
Testing for clock jitter on 4 cpus
using CPU_CALLOC
PASSED, largest jitter seen was 0.002013
largest jitter seen was 0.002013
clock direction test: start time 1386881578, stop time 1386881638, sleeptime 60, delta 0
PASSED
PASS

Subtest Stress:
Note: scaling back 12 processes at 95 MB for memory limit of 1147 MB
Running stress for 10 min.
stress --cpu 12 --io 12 --vm 12 --vm-bytes 95M --timeout 10m
stress: info: [22837] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd
stress: info: [22837] successful run completed in 600s
PASS
copying attachments...
checking directory /var/log/hwcert/runs/2/core
Skipping output.log
saveOutput: /var/log/hwcert/runs/2/core/output.log
Return value was 0
Traceback (most recent call last):
  File "/usr/bin/hwcert-backend", line 45, in <module>
    success = hwcertBackend.do(args)
  File "/usr/share/hwcert/lib/hwcert/backend.py", line 182, in do
    result = self.commands[self.command]()
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 393, in doRun
    return self._doRun(tests)
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 540, in _doRun
    self.certification.save(self.environment.getResultsPath())
  File "/usr/share/hwcert/lib/hwcert/documentbase.py", line 281, in save
    file.write(self.document.toxml())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2827292-2827294: ordinal not in range(128)
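
Each reported position range spans three characters, which is consistent with the three box-drawing characters that precede "PowerPC,POWER7@0" in the CORE2 output. A small diagnostic along these lines can confirm which characters are responsible. It is a Python 3 sketch, and the find_non_ascii helper and the sample string are assumptions for illustration, not part of hwcert.

```python
def find_non_ascii(text):
    """Return (offset, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Sample modeled on the CORE2 output captured in this report.
sample = (
    "+-----/proc/device-tree/cpus-----+\n"
    "\u251c\u2500\u2500 PowerPC,POWER7@0\n"
)

hits = find_non_ascii(sample)
for offset, ch in hits:
    # Print the offset and code point of each offending character.
    print(offset, "U+%04X" % ord(ch))
```

Running the same helper over a saved output.log (decoded as UTF-8) would show whether any other test output carries non-ASCII text.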

Comment 4 Brian Brock 2013-12-12 21:11:53 UTC
After the failed core test, print is broken, and "hwcert-backend run -t info" immediately fails with the same traceback:

Thu Dec 12 04:03:59
root@ibm-p720-02-lp4 ~
$ hwc print
Error: hwcert is already running (lock file /var/lock/subsys/hwcert found)
Override? (y|n) y
response: y
Traceback (most recent call last):
  File "/usr/bin/hwcert-backend", line 45, in <module>
    success = hwcertBackend.do(args)
  File "/usr/share/hwcert/lib/hwcert/backend.py", line 182, in do
    result = self.commands[self.command]()
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 702, in doPrint
    self.load()
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 57, in load
    self.certification.load(self.environment.getResultsPath())
  File "/usr/share/hwcert/lib/hwcert/certificationtest.py", line 182, in load
    DocumentBase.load(self, filename)
  File "/usr/share/hwcert/lib/hwcert/documentbase.py", line 275, in load
    self.document = parse(file)
  File "/usr/lib64/python2.7/xml/dom/minidom.py", line 1921, in parse
    return expatbuilder.parse(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/lib64/python2.7/xml/dom/expatbuilder.py", line 211, in parseFile
    parser.Parse("", True)
xml.parsers.expat.ExpatError: no element found: line 1, column 0
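
The "already running" prompt after each crash indicates that the lock file outlives the failed process. The report does not show how hwcert manages /var/lock/subsys/hwcert, so the following Python 3 sketch is only one way a stale lock could be avoided: create the lock exclusively and remove it in a finally clause. The run_lock name and the temporary lock path are hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def run_lock(path):
    """Hold a lock file for the duration of the block (hypothetical)."""
    # O_EXCL makes creation fail if another run already holds the lock.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield path
    finally:
        # Runs on normal exit and when the body raises.
        os.remove(path)


lock_path = os.path.join(tempfile.mkdtemp(), "hwcert.lock")
try:
    with run_lock(lock_path):
        held = os.path.exists(lock_path)
        raise RuntimeError("simulated crash while saving results")
except RuntimeError:
    pass
stale = os.path.exists(lock_path)
```

A finally clause does not help if the process is killed outright, so recording the PID in the file, as done here, leaves room for a later liveness check.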

Comment 6 Greg Nichols 2013-12-13 13:24:48 UTC
Could you try running clean, then core by itself?

Comment 8 Brian Brock 2014-03-20 20:13:30 UTC
Created attachment 877021 [details]
/var/hwcert and /var/log/hwcert

captured after running "clean" followed by "run -t core"

Comment 9 Brian Brock 2014-03-20 20:17:42 UTC
To clarify, today's attachment was captured after encountering the situation in comment 3.

Comment 10 Brian Brock 2014-03-20 20:30:24 UTC
stress: info: [19623] dispatching hogs: 12 cpu, 12 io, 12 vm, 0 hdd
stress: info: [19623] successful run completed in 600s
PASS
copying attachments...
checking directory /var/log/hwcert/runs/1/core
Skipping output.log
saveOutput: /var/log/hwcert/runs/1/core/output.log
Return value was 0
Traceback (most recent call last):
  File "/usr/bin/hwcert-backend", line 45, in <module>
    success = hwcertBackend.do(args)
  File "/usr/share/hwcert/lib/hwcert/backend.py", line 182, in do
    result = self.commands[self.command]()
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 358, in doRun
    return self._doRun(tests)
  File "/usr/share/hwcert/lib/hwcert/harness.py", line 505, in _doRun
    self.certification.save(self.environment.getResultsPath())
  File "/usr/share/hwcert/lib/hwcert/documentbase.py", line 285, in save
    file.write(self.document.toxml())
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115981-115983: ordinal not in range(128)

Comment 21 Brian Brock 2014-03-26 21:52:37 UTC
*** Bug 1022752 has been marked as a duplicate of this bug. ***

Comment 27 Greg Nichols 2014-03-31 23:10:12 UTC
Created attachment 881130 [details]
patch adds -S to the "tree" command.

Comment 32 Greg Nichols 2014-04-02 01:50:45 UTC
Created attachment 881583 [details]
patch removing logging of /proc/cpuinfo and call to "tree"
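
The attached patch avoids the problem by no longer logging the tree output. For comparison only, and not as a description of what the patch does, command output could instead be made ASCII-safe before it is added to the results document. The Python 3 sketch below uses the standard "backslashreplace" error handler; the to_ascii_safe helper is a hypothetical name.

```python
def to_ascii_safe(text):
    """Replace non-ASCII characters with backslash escapes."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


line = "\u251c\u2500\u2500 PowerPC,POWER7@0"
safe = to_ascii_safe(line)

# The escaped line now survives an ASCII-only write.
encoded = safe.encode("ascii")
```

Escaping keeps the information that non-ASCII characters were present, at the cost of a less readable log line.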

Comment 33 Brian Brock 2014-04-02 21:41:01 UTC
no obvious side effects to other arches

Comment 34 Brian Brock 2014-04-03 04:57:18 UTC
still waiting for ppc64 hardware to test patch.

Comment 35 Brian Brock 2014-04-03 20:29:14 UTC
Verified that a system that broke with 20140325.1 does not break with 20140402.1. That system is another LPAR on the same machine as the LPAR that showed the error before.

Comment 38 Red Hat Bugzilla 2023-09-14 01:55:19 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

