Bug 724937 - hwloc-1.2-0.fc16 fails xmlbuffer self check on PPC, but passes on PPC64
Summary: hwloc-1.2-0.fc16 fails xmlbuffer self check on PPC, but passes on PPC64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: hwloc
Version: rawhide
Hardware: powerpc
OS: Unspecified
Target Milestone: ---
Assignee: Jiri Hladky
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-07-22 11:16 UTC by Karsten Hopp
Modified: 2011-12-12 21:55 UTC (History)

Fixed In Version: hwloc-1.3-1.fc16
Doc Type: Bug Fix
Last Closed: 2011-11-25 02:16:01 UTC

Description Karsten Hopp 2011-07-22 11:16:50 UTC
Description of problem:
The xmlbuffer self check fails on ppc; see
http://ppc.koji.fedoraproject.org/koji/getfile?taskID=256586&name=build.log

The difference between the first exported buffer and the second exported buffer is in the lines
<page_type size="17179869184" count="0"/>
vs.
<page_type size="4294967295" count="0"/>

Version-Release number of selected component (if applicable):
hwloc-1.2-0.fc16

How reproducible:
always

Steps to Reproduce:
1. ppc-koji build --scratch dist-f16 hwloc-1.2-0.fc16.src.rpm
  
Actual results:
http://ppc.koji.fedoraproject.org/koji/taskinfo?taskID=256555

Comment 1 Jiri Hladky 2011-09-21 21:56:18 UTC
Just tested hwloc-1.2.1; the bug is still there. Contacting the hwloc developers.

ppc-koji build --scratch dist-f16 rpmbuild/SRPMS/hwloc-1.2.1-0.fc14.src.rpm

Please see a complete build log at
http://ppc.koji.fedoraproject.org/koji/getfile?taskID=285892&name=build.log

Thanks
Jirka

PASS: glibc-sched
exported to buffer 0x10568a30 length 1835
re-exported to buffer 0x1056d118 length 1834
### First exported buffer is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" local_memory="16091512832">
    <page_type size="17179869184" count="0"/>
    <page_type size="65536" count="245537"/>
    <page_type size="16777216" count="0"/>
    <info name="Backend" value="Linux"/>
    <info name="OSName" value="Linux"/>
    <info name="OSRelease" value="2.6.32-131.6.1.el6.ppc64"/>
    <info name="OSVersion" value="#1 SMP Mon Jun 20 14:15:43 EDT 2011"/>
    <info name="HostName" value="ppc-comm01"/>
    <info name="Architecture" value="ppc"/>
    <object type="Socket" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
      <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="4194304" depth="2"
cache_linesize="128">
        <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="65536" depth="1" cache_linesize="128">
          <object type="Core" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
            <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
complete_cpuset="0x00000001" online_cpuset="0x00000001"
allowed_cpuset="0x00000001"/>
            <object type="PU" os_level="-1" os_index="1" cpuset="0x00000002"
complete_cpuset="0x00000002" online_cpuset="0x00000002"
allowed_cpuset="0x00000002"/>
          </object>
        </object>
      </object>
    </object>
  </object>
</topology>
### End of first export buffer
### Second exported buffer is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" local_memory="16091512832">
    <page_type size="4294967295" count="0"/>
    <page_type size="65536" count="245537"/>
    <page_type size="16777216" count="0"/>
    <info name="Backend" value="Linux"/>
    <info name="OSName" value="Linux"/>
    <info name="OSRelease" value="2.6.32-131.6.1.el6.ppc64"/>
    <info name="OSVersion" value="#1 SMP Mon Jun 20 14:15:43 EDT 2011"/>
    <info name="HostName" value="ppc-comm01"/>
    <info name="Architecture" value="ppc"/>
    <object type="Socket" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
      <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="4194304" depth="2"
cache_linesize="128">
        <object type="Cache" os_level="-1" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003" cache_size="65536" depth="1" cache_linesize="128">
          <object type="Core" os_level="-1" os_index="0" cpuset="0x00000003"
complete_cpuset="0x00000003" online_cpuset="0x00000003"
allowed_cpuset="0x00000003">
            <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
complete_cpuset="0x00000001" online_cpuset="0x00000001"
allowed_cpuset="0x00000001"/>
            <object type="PU" os_level="-1" os_index="1" cpuset="0x00000002"
complete_cpuset="0x00000002" online_cpuset="0x00000002"
allowed_cpuset="0x00000002"/>
          </object>
        </object>
      </object>
    </object>
  </object>
</topology>
### End of second export buffer
FAIL: xmlbuffer
========================================================
1 of 26 tests failed
Please report to http://www.open-mpi.org/community/help/
========================================================

Comment 2 Brice Goglin 2011-09-22 04:38:13 UTC
Looks like we cast the page sizes to unsigned long during XML import+export. Please try this patch. It should work with your 16 GB pages :)
Thanks!
Brice


Index: src/topology-xml.c
===================================================================
--- src/topology-xml.c	(revision 3812)
+++ src/topology-xml.c	(working copy)
@@ -280,9 +280,9 @@
       const xmlChar *value = hwloc__xml_import_attr_value(attr);
       if (value) {
 	if (!strcmp((char *) attr->name, "size"))
-	  size = strtoul((char *) value, NULL, 10);
+	  size = strtoull((char *) value, NULL, 10);
 	else if (!strcmp((char *) attr->name, "count"))
-	  count = strtoul((char *) value, NULL, 10);
+	  count = strtoull((char *) value, NULL, 10);
 	else
 	  fprintf(stderr, "ignoring unknown pagetype attribute %s\n", (char *) attr->name);
       }
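
For anyone reading along, here is a minimal sketch of why the second export shows 4294967295. On a platform where unsigned long is 32 bits (assumed here for 32-bit ppc), strtoul() cannot represent 17179869184 (16 GiB), so per the C standard it returns ULONG_MAX and sets errno to ERANGE. The helper names below are hypothetical and not part of hwloc; parse_as_32bit_ulong() only emulates the 32-bit behaviour so that the sketch gives the same result on a 64-bit machine.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: mimics what strtoul() returns where unsigned long
 * is 32 bits wide. Out-of-range input saturates at the maximum value and
 * sets errno to ERANGE. */
uint32_t parse_as_32bit_ulong(const char *text)
{
    unsigned long long wide = strtoull(text, NULL, 10);
    if (wide > UINT32_MAX) {
        errno = ERANGE;
        return UINT32_MAX;
    }
    return (uint32_t) wide;
}

/* Hypothetical helper: parsing with strtoull(), as in the patch, keeps
 * the full 64-bit value. */
uint64_t parse_as_64bit(const char *text)
{
    return (uint64_t) strtoull(text, NULL, 10);
}

/* Usage: the 16 GiB page size from the build log, parsed both ways. */
void check_page_size_parsing(void)
{
    assert(parse_as_32bit_ulong("17179869184") == UINT32_MAX); /* saturated */
    assert(parse_as_64bit("17179869184") == 17179869184ULL);   /* preserved */
    assert(parse_as_32bit_ulong("65536") == 65536u);           /* fits anyway */
}
```

UINT32_MAX is 4294967295, which matches the size reported in the second exported buffer.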

Comment 3 Brice Goglin 2011-09-22 05:03:01 UTC
Oh, you'll need this too, otherwise the lines would be misordered. I reproduced and fixed the problem on x86_32, so I assume it'll work for you too.

Index: src/topology.c
===================================================================
--- src/topology.c	(revision 3828)
+++ src/topology.c	(working copy)
@@ -889,7 +889,12 @@
   const struct hwloc_obj_memory_page_type_s *a = _a;
   const struct hwloc_obj_memory_page_type_s *b = _b;
   /* consider 0 as larger so that 0-size page_type go to the end */
-  return b->size ? (int)(a->size - b->size) : -1;
+  if (!b->size)
+    return -1;
+  /* don't cast a-b in int since those are ullongs */
+  if (b->size == a->size)
+    return 0;
+  return a->size < b->size ? -1 : 1;
 }
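
A short sketch of the comparison issue, for illustration only. Subtracting two 64-bit sizes and narrowing the result to int can flip the sign: 17179869184 - 65536 = 17179803648, whose low 32 bits are 0xFFFF0000. On typical two's-complement platforms that reads as -65536, so the 16 GiB entry sorts before the 64 KiB one, which is the order seen in the first exported buffer. The struct and function names below are hypothetical stand-ins, not hwloc's own, and the zero-size special case from the patch is left out to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for hwloc's page-type record; the field names are
 * assumptions made for this sketch. */
struct page_type {
    uint64_t size;
    uint64_t count;
};

/* Three-way comparison that never narrows a 64-bit difference to int. */
int compare_page_sizes(const void *lhs, const void *rhs)
{
    const struct page_type *a = lhs;
    const struct page_type *b = rhs;
    if (a->size == b->size)
        return 0;
    return a->size < b->size ? -1 : 1;
}

/* Sort an array of page types by ascending size. */
void sort_page_types(struct page_type *types, size_t n)
{
    qsort(types, n, sizeof *types, compare_page_sizes);
}

/* Usage: the three page sizes from the build log, in their logged order. */
void check_page_type_order(void)
{
    struct page_type types[3] = {
        { 17179869184ULL, 0 },
        { 65536, 245537 },
        { 16777216, 0 },
    };
    sort_page_types(types, 3);
    assert(types[0].size == 65536);
    assert(types[1].size == 16777216);
    assert(types[2].size == 17179869184ULL);
}
```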

Comment 4 Jiri Hladky 2011-09-23 23:16:16 UTC
Hi Brice,

I have tried to apply your patches 
https://bugzilla.redhat.com/show_bug.cgi?id=724937#c2
https://bugzilla.redhat.com/show_bug.cgi?id=724937#c3
to both hwloc-1.2 and hwloc-1.2.1
but it's failing:

===================================================================
patching file src/topology.c
Hunk #1 FAILED at 889.

patching file src/topology-xml.c
Hunk #1 FAILED at 280.
===================================================================

Could you please provide a new complete patch using version hwloc-1.2.1 as base?

http://www.open-mpi.org/software/hwloc/v1.2/downloads/hwloc-1.2.1.tar.bz2

Thanks a lot!
Jirka

Comment 5 Brice Goglin 2011-09-24 04:56:38 UTC
The patch I backported to v1.2 is
  https://svn.open-mpi.org/trac/hwloc/changeset/3834

By the way, there's a 1.2.2rc1 online, and I will do the final 1.2.2 next week.

Brice

Comment 6 Jiri Hladky 2011-09-24 20:46:24 UTC
Hi Brice,

thanks a lot for creating 1.2.2rc1. I have tested it and the issue is fixed:-)

I will wait for 1.2.2 to submit a new rpm for Fedora.

Thanks
Jiri

Comment 7 Fedora Update System 2011-10-04 23:21:34 UTC
hwloc-1.2.2-0.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-0.fc16

Comment 8 Fedora Update System 2011-10-04 23:44:59 UTC
hwloc-1.2.2-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-0.fc15

Comment 9 Fedora Update System 2011-10-05 17:16:40 UTC
Package hwloc-1.2.2-0.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing hwloc-1.2.2-0.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/hwloc-1.2.2-0.fc16
then log in and leave karma (feedback).

Comment 10 Fedora Update System 2011-10-07 00:53:31 UTC
hwloc-1.2.2-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-1.fc15

Comment 11 Fedora Update System 2011-10-07 01:03:30 UTC
hwloc-1.2.2-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/hwloc-1.2.2-1.fc16

Comment 12 Fedora Update System 2011-10-15 23:25:04 UTC
hwloc-1.3-0.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/hwloc-1.3-0.fc15

Comment 13 Fedora Update System 2011-11-15 00:26:37 UTC
hwloc-1.3-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/hwloc-1.3-1.fc16

Comment 14 Fedora Update System 2011-11-25 02:16:01 UTC
hwloc-1.3-0.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2011-12-12 21:55:31 UTC
hwloc-1.3-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

