Bug 724937

Summary: hwloc-1.2-0.fc16 fails xmlbuffer self check on PPC, but passes on PPC64
Product: [Fedora] Fedora
Reporter: Karsten Hopp <karsten>
Component: hwloc
Assignee: Jiri Hladky <hladky.jiri>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: Brice.Goglin, hladky.jiri
Target Milestone: ---
Target Release: ---
Hardware: powerpc
OS: Unspecified
Whiteboard:
Fixed In Version: hwloc-1.3-1.fc16
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-11-25 02:16:01 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Karsten Hopp 2011-07-22 11:16:50 UTC
Description of problem:
The xmlbuffer self check fails on ppc; see
http://ppc.koji.fedoraproject.org/koji/getfile?taskID=256586&name=build.log

The difference between the first exported buffer and the second exported buffer is in the lines
<page_type size="17179869184" count="0"/>
vs.
<page_type size="4294967295" count="0"/>

Version-Release number of selected component (if applicable):
hwloc-1.2-0.fc16

How reproducible:
always

Steps to Reproduce:
1. ppc-koji build --scratch dist-f16 hwloc-1.2-0.fc16.src.rpm
  
Actual results:
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=256555

Comment 1 Jiri Hladky 2011-09-21 21:56:18 UTC
Just tested hwloc-1.2.1; the bug is still there. Contacting the hwloc developers.

ppc-koji build --scratch dist-f16 rpmbuild/SRPMS/hwloc-1.2.1-0.fc14.src.rpm

Please see a complete build log at
http://ppc.koji.fedoraproject.org/koji/getfile?taskID=285892&name=build.log

Thanks
Jirka

PASS: glibc-sched
exported to buffer 0x10568a30 length 1835
re-exported to buffer 0x1056d118 length 1834
### First exported buffer is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" local_memory="16091512832">
    <page_type size="17179869184" count="0"/>
    <page_type size="65536" count="245537"/>
    <page_type size="16777216" count="0"/>
    <info name="Backend" value="Linux"/>
    <info name="OSName" value="Linux"/>
    <info name="OSRelease" value="2.6.32-131.6.1.el6.ppc64"/>
    <info name="OSVersion" value="#1 SMP Mon Jun 20 14:15:43 EDT 2011"/>
    <info name="HostName" value="ppc-comm01"/>
    <info name="Architecture" value="ppc"/>
    <object type="Socket" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
      <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="4194304" depth="2"
cache_linesize="128">
        <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="65536" depth="1" cache_linesize="128">
          <object type="Core" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
            <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
complete_cpuset="0x00000001" online_cpuset="0x00000001"
allowed_cpuset="0x00000001"/>
            <object type="PU" os_level="-1" os_index="1" cpuset="0x00000002"
complete_cpuset="0x00000002" online_cpuset="0x00000002"
allowed_cpuset="0x00000002"/>
          </object>
        </object>
      </object>
    </object>
  </object>
</topology>
### End of first export buffer
### Second exported buffer is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" local_memory="16091512832">
    <page_type size="4294967295" count="0"/>
    <page_type size="65536" count="245537"/>
    <page_type size="16777216" count="0"/>
    <info name="Backend" value="Linux"/>
    <info name="OSName" value="Linux"/>
    <info name="OSRelease" value="2.6.32-131.6.1.el6.ppc64"/>
    <info name="OSVersion" value="#1 SMP Mon Jun 20 14:15:43 EDT 2011"/>
    <info name="HostName" value="ppc-comm01"/>
    <info name="Architecture" value="ppc"/>
    <object type="Socket" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
      <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="4194304" depth="2"
cache_linesize="128">
        <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="65536" depth="1" cache_linesize="128">
          <object type="Core" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
            <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
complete_cpuset="0x00000001" online_cpuset="0x00000001"
allowed_cpuset="0x00000001"/>
            <object type="PU" os_level="-1" os_index="1" cpuset="0x00000002"
complete_cpuset="0x00000002" online_cpuset="0x00000002"
allowed_cpuset="0x00000002"/>
          </object>
        </object>
      </object>
    </object>
  </object>
</topology>
### End of second export buffer
FAIL: xmlbuffer
========================================================
1 of 26 tests failed
Please report to http://www.open-mpi.org/community/help/
========================================================
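
Judging from the log above, the check exports the topology to an XML buffer, imports it back, re-exports it, and compares the two texts. The one-byte length difference (1835 vs. 1834) matches an 11-digit size being replaced by a 10-digit one. A minimal sketch of locating the first mismatching line between two such buffers follows; `first_differing_line` is a hypothetical helper written for illustration and is not part of hwloc or its test suite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an hwloc function): returns the 1-based number
 * of the first line at which two NUL-terminated buffers differ, or 0 if
 * the buffers are identical. */
size_t first_differing_line(const char *a, const char *b)
{
    size_t line = 1;

    assert(a != NULL && b != NULL);

    /* Walk both buffers while they agree, counting newlines seen so far. */
    while (*a != '\0' && *a == *b) {
        if (*a == '\n')
            line++;
        a++;
        b++;
    }

    /* Both ended together: identical. Otherwise report the current line. */
    return (*a == *b) ? 0 : line;
}
```

Applied to the two buffers in the log, a helper of this kind would point at the first `page_type` line, which is the only line that differs.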

Comment 2 Brice Goglin 2011-09-22 04:38:13 UTC
Looks like we cast the page sizes to unsigned long during XML import+export. Please try this patch. It should work with your 16 GB pages :)
Thanks!
Brice


Index: src/topology-xml.c
===================================================================
--- src/topology-xml.c	(revision 3812)
+++ src/topology-xml.c	(working copy)
@@ -280,9 +280,9 @@
       const xmlChar *value = hwloc__xml_import_attr_value(attr);
       if (value) {
 	if (!strcmp((char *) attr->name, "size"))
-	  size = strtoul((char *) value, NULL, 10);
+	  size = strtoull((char *) value, NULL, 10);
 	else if (!strcmp((char *) attr->name, "count"))
-	  count = strtoul((char *) value, NULL, 10);
+	  count = strtoull((char *) value, NULL, 10);
 	else
 	  fprintf(stderr, "ignoring unknown pagetype attribute %s\n", (char *) attr->name);
       }
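
To illustrate why the parsing call matters: 17179869184 is 2^34, which does not fit in a 32-bit unsigned long, and the C standard specifies that strtoul() returns ULONG_MAX when the value is out of range. With a 32-bit unsigned long that is 4294967295, the value seen in the second exported buffer. The sketch below emulates that behaviour portably so it gives the same result on any machine; both helper names are hypothetical, and the 32-bit width of unsigned long is an assumption about the ppc build.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: mimics what strtoul() yields on a platform where
 * unsigned long is 32 bits wide (assumed here for 32-bit PPC). Values
 * that do not fit are clamped to the maximum, UINT32_MAX. */
uint32_t parse_like_32bit_strtoul(const char *text)
{
    unsigned long long wide;

    assert(text != NULL);
    wide = strtoull(text, NULL, 10);
    return wide > UINT32_MAX ? UINT32_MAX : (uint32_t)wide;
}

/* Hypothetical helper: parses into a 64-bit type, as the patched code
 * does by calling strtoull(). */
uint64_t parse_with_strtoull(const char *text)
{
    assert(text != NULL);
    return (uint64_t)strtoull(text, NULL, 10);
}
```

Under these assumptions, `parse_like_32bit_strtoul("17179869184")` gives 4294967295 while `parse_with_strtoull("17179869184")` keeps the full value, and a size such as 65536 is unaffected either way.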

Comment 3 Brice Goglin 2011-09-22 05:03:01 UTC
Oh, you'll need this too; otherwise the lines would be misordered. I reproduced and fixed the problem on x86_32, so I assume it'll work for you too.

Index: src/topology.c
===================================================================
--- src/topology.c	(revision 3828)
+++ src/topology.c	(working copy)
@@ -889,7 +889,12 @@
   const struct hwloc_obj_memory_page_type_s *a = _a;
   const struct hwloc_obj_memory_page_type_s *b = _b;
   /* consider 0 as larger so that 0-size page_type go to the end */
-  return b->size ? (int)(a->size - b->size) : -1;
+  if (!b->size)
+    return -1;
+  /* don't cast a-b in int since those are ullongs */
+  if (b->size == a->size)
+    return 0;
+  return a->size < b->size ? -1 : 1;
 }
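
The ordering problem can be reproduced outside hwloc. Subtracting two 64-bit sizes and narrowing the result to a 32-bit int can flip the sign: 17179869184 - 65536 is 0x3FFFF0000, whose low 32 bits (0xFFFF0000) have the sign bit set, so the 16 GB page type compares as "smaller" than the 64 KB one. The sketch below is a standalone illustration, not the hwloc source: `struct page_type` is a simplified stand-in for the real structure, all function names are hypothetical, and the comparator also sends zero-size entries to the end when they appear on the left-hand side, which is a small variation on the patch above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for hwloc's page type record (assumption: only the
 * two attributes visible in the XML are modeled). */
struct page_type {
    uint64_t size;
    uint64_t count;
};

/* Three-way comparison without subtraction, so no narrowing can occur.
 * Zero-size entries sort after all non-zero sizes. */
int compare_page_sizes(const void *pa, const void *pb)
{
    const struct page_type *a = pa;
    const struct page_type *b = pb;

    if (a->size == b->size)
        return 0;
    if (a->size == 0)
        return 1;
    if (b->size == 0)
        return -1;
    return a->size < b->size ? -1 : 1;
}

/* Returns nonzero if the low 32 bits of (a - b) have the sign bit set,
 * i.e. if narrowing the difference to a 32-bit two's complement int
 * would produce a negative number. */
int truncated_difference_looks_negative(uint64_t a, uint64_t b)
{
    uint32_t low = (uint32_t)(a - b);
    return (low & 0x80000000u) != 0;
}

/* Sorts page types by ascending size using the comparator above. */
void sort_page_types(struct page_type *types, size_t n)
{
    assert(types != NULL || n == 0);
    qsort(types, n, sizeof(types[0]), compare_page_sizes);
}
```

Sorting the three sizes from the log (17179869184, 65536, 16777216) with this comparator yields 65536, 16777216, 17179869184, whereas the exported buffers above list the 16 GB entry first.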

Comment 4 Jiri Hladky 2011-09-23 23:16:16 UTC
Hi Brice,

I have tried to apply your patches 
https://bugzilla.redhat.com/show_bug.cgi?id=724937#c2
https://bugzilla.redhat.com/show_bug.cgi?id=724937#c3
to both hwloc-1.2 and hwloc-1.2.1
but it's failing:

===================================================================
patching file src/topology.c
Hunk #1 FAILED at 889.

patching file src/topology-xml.c
Hunk #1 FAILED at 280.
===================================================================

Could you please provide a new complete patch using version hwloc-1.2.1 as base?

http://www.open-mpi.org/software/hwloc/v1.2/downloads/hwloc-1.2.1.tar.bz2

Thanks a lot!
Jirka

Comment 5 Brice Goglin 2011-09-24 04:56:38 UTC
The patch I backported to v1.2 is
  https://svn.open-mpi.org/trac/hwloc/changeset/3834

By the way, there's a 1.2.2rc1 online, and I will do the final 1.2.2 next week.

Brice

Comment 6 Jiri Hladky 2011-09-24 20:46:24 UTC
Hi Brice,

thanks a lot for creating 1.2.2rc1. I have tested it and the issue is fixed:-)

I will wait for 1.2.2 to submit a new rpm for Fedora.

Thanks
Jiri

Comment 7 Fedora Update System 2011-10-04 23:21:34 UTC
hwloc-1.2.2-0.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-0.fc16

Comment 8 Fedora Update System 2011-10-04 23:44:59 UTC
hwloc-1.2.2-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-0.fc15

Comment 9 Fedora Update System 2011-10-05 17:16:40 UTC
Package hwloc-1.2.2-0.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing hwloc-1.2.2-0.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/hwloc-1.2.2-0.fc16
then log in and leave karma (feedback).

Comment 10 Fedora Update System 2011-10-07 00:53:31 UTC
hwloc-1.2.2-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-1.fc15

Comment 11 Fedora Update System 2011-10-07 01:03:30 UTC
hwloc-1.2.2-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-1.fc16

Comment 12 Fedora Update System 2011-10-15 23:25:04 UTC
hwloc-1.3-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/hwloc-1.3-0.fc15

Comment 13 Fedora Update System 2011-11-15 00:26:37 UTC
hwloc-1.3-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/hwloc-1.3-1.fc16

Comment 14 Fedora Update System 2011-11-25 02:16:01 UTC
hwloc-1.3-0.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2011-12-12 21:55:31 UTC
hwloc-1.3-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.