Bug 736489 - [ppc64] wrong mapping for G_TYPE_ENUM to libffi type
Summary: [ppc64] wrong mapping for G_TYPE_ENUM to libffi type
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: glib2
Version: 16
Hardware: ppc64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Matthias Clasen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 742529
Reported: 2011-09-07 20:02 UTC by Mark Hamzy
Modified: 2014-05-26 18:22 UTC (History)

Fixed In Version: glib2-2.30.0-2.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-11 03:19:11 UTC


Attachments
reduced testcase for marshalling G_TYPE_FILE_MONITOR_EVENT (1.84 KB, text/plain)
2011-09-22 20:58 UTC, Dan Williams
Hackish patch which fixes marshalling on ppc64 and has no regressions on x86-64 (4.65 KB, patch)
2011-09-26 23:00 UTC, Dan Williams
Untested patch to fix 32-bit return values in libffi (543 bytes, patch)
2011-09-27 00:28 UTC, Benjamin Herrenschmidt
Comment (96.52 KB, text/plain)
2011-09-14 15:24 UTC, Mark Hamzy
Comment (123.35 KB, text/plain)
2011-09-14 15:26 UTC, Mark Hamzy


Links
System ID Priority Status Summary Last Updated
GNOME Bugzilla 659881 None None None Never

Description Mark Hamzy 2011-09-07 20:02:09 UTC
bash-4.2# cat << __EOF__ > /etc/sysconfig/network-scripts/ifcfg-eth0
> DEVICE="eth0"
> NM_CONTROLLED="yes"
> HWADDR="2A:0F:4A:82:98:04"
> ONBOOT="yes"
> BOOTPROTO=dhcp
> __EOF__
bash-4.2# [  131.676275] RTAS: event: 35, Type: Platform Information Event,
Severity: 1
[  139.675007] RTAS: event: 36, Type: Platform Information Event, Severity: 1
[  147.671431] RTAS: event: 37, Type: Platform Information Event, Severity: 1

bash-4.2# /bin/systemctl start NetworkManager.service
[  151.730553] systemd[1]: Accepted connection on private bus.
[  151.730589] systemd[1]: Running GC...
[  151.733642] systemd[1]: Got D-Bus request:
org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
[  151.733718] systemd[1]: Trying to enqueue job
NetworkManager.service/start/replace
[  151.733899] systemd[1]: Installed new job NetworkManager.service/start as 56
[  151.733918] systemd[1]: Installed new job network.target/start as 99
[  151.733930] systemd[1]: Installed new job arp-ethers.service/start as 100
[  151.733944] systemd[1]: Enqueued job NetworkManager.service/start as 56
[  151.734096] systemd[1]: Starting of arp-ethers.service requested but
condition failed. Ignoring.
[  151.734115] systemd[1]: Job arp-ethers.service/start finished, result=done
[  151.734267] systemd[1]: About to execute: /usr/sbin/NetworkManager
--no-daemon
[  151.763782] systemd[1]: Forked /usr/sbin/NetworkManager as 615
[  151.764048] systemd[1]: NetworkManager.service changed dead -> start
[  151.764247] systemd[1]: Got D-Bus request:
org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
[  151.764480] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.Properties.Get() on
/org/freedesktop/systemd1/unit/NetworkManager_2eservice
[  152.200371] systemd[1]: Incoming traffic on syslog.socket
[  152.200407] NetworkManager[615]: <info> NetworkManager (version
0.8.9997-7.git20110721.fc16) is starting...
[  152.200430] systemd[1]: syslog.socket changed listening -> running
[  152.200436] NetworkManager[615]: <info> Read config file
/etc/NetworkManager/NetworkManager.conf
[  152.201788] systemd[1]: Incoming traffic on dbus.socket
[  152.201811] systemd[1]: Trying to enqueue job dbus.service/start/replace
[  152.201944] systemd[1]: Installed new job dbus.service/start as 101
[  152.201955] systemd[1]: Enqueued job dbus.service/start as 101
[  152.201976] systemd[1]: dbus.socket changed listening -> running
[  152.202030] systemd[1]: About to execute: /bin/dbus-uuidgen --ensure
[  152.233788] systemd[1]: Forked /bin/dbus-uuidgen as 616
[  152.233870] systemd[1]: dbus.service changed dead -> start-pre
[  152.242162] systemd[1]: Received SIGCHLD from PID 616 (dbus-uuidgen).
[  152.242260] systemd[1]: Got SIGCHLD for process 616 (dbus-uuidgen)
[  152.242572] systemd[1]: Child 616 died (code=exited, status=0/SUCCESS)
[  152.242582] systemd[1]: Child 616 belongs to dbus.service
[  152.242595] systemd[1]: dbus.service: control process exited, code=exited
status=0
[  152.242606] systemd[1]: dbus.service running next control command for state
start-pre
[  152.242629] systemd[1]: About to execute: /bin/rm -f /var/run/messagebus.pid
[  152.273875] systemd[1]: Forked /bin/rm as 618
[  152.274097] systemd[1]: Accepted connection on private bus.
[  152.274443] systemd[1]: Got D-Bus request:
org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
[  152.274776] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[  152.277556] systemd[1]: Received SIGCHLD from PID 618 (rm).
[  152.277660] systemd[1]: Got SIGCHLD for process 618 (rm)
[  152.277989] systemd[1]: Child 618 died (code=exited, status=0/SUCCESS)
[  152.278000] systemd[1]: Child 618 belongs to dbus.service
[  152.278012] systemd[1]: dbus.service: control process exited, code=exited
status=0
[  152.278025] systemd[1]: dbus.service got final SIGCHLD for state start-pre
[  152.278102] systemd[1]: About to execute: /bin/dbus-daemon --system
--address=systemd: --nofork --systemd-activation
[  152.313908] systemd[1]: Forked /bin/dbus-daemon as 620
[  152.314324] systemd[1]: dbus.service changed start-pre -> running
[  152.314357] systemd[1]: Job dbus.service/start finished, result=done
[  152.430768] systemd[1]: Successfully connected to system D-Bus bus
44edc23d2a7e479f6be2414100000098 as :1.0
[  152.432543] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameAcquired() on /org/freedesktop/DBus
[  152.432722] systemd[1]: Accepted connection on private bus.
[  152.432958] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  152.432988] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameAcquired() on /org/freedesktop/DBus
[  152.433019] systemd[1]: Successfully acquired name.
[  152.433452] systemd[1]: Got D-Bus request:
org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
[  152.433939] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[  152.525020] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  152.526937] dbus[620]: [system] Activating service
name='org.freedesktop.PolicyKit1' (using servicehelper)
[  152.562077] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  152.572014] polkitd[623]: started daemon version 0.101 using authority
implementation `local' version `0.101'
[  152.572604] dbus[620]: [system] Successfully activated service
'org.freedesktop.PolicyKit1'
[  152.572873] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  152.612887] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  152.613348] NetworkManager[615]: ifcfg-rh: Acquired D-Bus service
com.redhat.ifcfgrh1
[  152.613376] NetworkManager[615]: <info> Loaded plugin ifcfg-rh: (c) 2007 -
2010 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
[  152.613672] NetworkManager[615]: <info> Loaded plugin keyfile: (c) 2007 -
2010 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
[  152.614035] NetworkManager[615]: ifcfg-rh: parsing
/etc/sysconfig/network-scripts/ifcfg-eth0 ...
[  152.720738] NetworkManager[615]: ifcfg-rh:     read connection 'System eth0'
[  152.720809] NetworkManager[615]: ifcfg-rh: parsing
/etc/sysconfig/network-scripts/ifcfg-lo ...
[  152.722363] NetworkManager[615]: <info> trying to start the modem manager...
[  152.722875] dbus[620]: [system] Activating service
name='org.freedesktop.ModemManager' (using servicehelper)
[  152.726514] NetworkManager[615]: <info> monitoring kernel firmware directory
'/lib/firmware'.
[  152.729666] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  152.729731] systemd[1]: NetworkManager.service's D-Bus name
org.freedesktop.NetworkManager now registered by :1.1
[  152.730052] systemd[1]: NetworkManager.service changed start -> running
[  152.730072] systemd[1]: Job NetworkManager.service/start finished,
result=done
[  152.730412] systemd[1]: network.target changed dead -> active
[  152.730434] systemd[1]: Job network.target/start finished, result=done
[  152.730541] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[  152.730740] NetworkManager[615]: <info> WiFi enabled by radio killswitch;
enabled by state file
[  152.730770] NetworkManager[615]: <info> WWAN enabled by radio killswitch;
enabled by state file
[  152.730792] NetworkManager[615]: <info> WiMAX enabled by radio killswitch;
enabled by state file
[  152.730811] NetworkManager[615]: <info> Networking is enabled by state file
[  152.731689] dbus[620]: [system] Activated service
'org.freedesktop.ModemManager' failed: Cannot launch daemon, file not found or
permissions invalid
[  152.734026] NetworkManager[615]: <error> [1314825991.163488]
[nm-device-ethernet.c:751] real_update_permanent_hw_address(): (eth0): unable
to read permanent MAC address (error 0)
[  152.735800] NetworkManager[615]: <info> (eth0): carrier is OFF
[  152.736037] NetworkManager[615]: <info> (eth0): new Ethernet device (driver:
'ibmveth' ifindex: 2)
[  152.736057] NetworkManager[615]: <info> (eth0): exported as
/org/freedesktop/NetworkManager/Devices/0
[  152.736414] NetworkManager[615]: <info> (eth0): now managed
[  152.736434] NetworkManager[615]: <info> (eth0): device state change:
unmanaged -> unavailable (reason 'managed') [10 20 2]
[  152.736530] NetworkManager[615]: <info> (eth0): bringing up device.
[  152.753302] NetworkManager[615]: <info> (eth0): preparing device.
[  152.753320] NetworkManager[615]: <info> (eth0): deactivating device (reason:
2).
[  152.755623] NetworkManager[615]: <info> (eth0): carrier now ON (device state
20)
[  152.755646] NetworkManager[615]: <info> (eth0): device state change:
unavailable -> disconnected (reason 'carrier-changed') [20 30 40]
[  152.756250] NetworkManager[615]: <warn> bluez error getting default adapter:
The name org.bluez was not provided by any .service files
[  152.756837] NetworkManager[615]: <info> Auto-activating connection 'System
eth0'.
[  152.756937] NetworkManager[615]: <info> Activation (eth0) starting
connection 'System eth0'
[  152.756954] NetworkManager[615]: <info> (eth0): device state change:
disconnected -> prepare (reason 'none') [30 40 0]
[  152.757093] NetworkManager[615]: <info> Activation (eth0) Stage 1 of 5
(Device Prepare) scheduled...
[  152.757289] NetworkManager[615]: <info> Activation (eth0) Stage 1 of 5
(Device Prepare) started...
[  152.757310] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5
(Device Configure) scheduled...
[  152.757330] NetworkManager[615]: <info> Activation (eth0) Stage 1 of 5
(Device Prepare) complete.
[  152.757347] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5
(Device Configure) starting...
[  152.757393] NetworkManager[615]: <info> (eth0): device state change: prepare
-> config (reason 'none') [40 50 0]
[  152.757622] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5
(Device Configure) successful.
[  152.757643] NetworkManager[615]: <info> Activation (eth0) Stage 3 of 5 (IP
Configure Start) scheduled.
[  152.757685] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5
(Device Configure) complete.
[  152.757748] NetworkManager[615]: <info> Activation (eth0) Stage 3 of 5 (IP
Configure Start) started...
[  152.757768] NetworkManager[615]: <info> (eth0): device state change: config
-> ip-config (reason 'none') [50 70 0]
[  152.758104] NetworkManager[615]: <info> Activation (eth0) Beginning DHCPv4
transaction (timeout in 45 seconds)
[  152.768800] NetworkManager[615]: <info> dhclient started with pid 628
[  152.769346] NetworkManager[615]: <info> Activation (eth0) Stage 3 of 5 (IP
Configure Start) complete.
bash-4.2# [  153.163692] dhclient[628]: Internet Systems Consortium DHCP Client
4.2.2
[  153.163751] dhclient[628]: Copyright 2004-2011 Internet Systems Consortium.
[  153.163774] dhclient[628]: All rights reserved.
[  153.163793] dhclient[628]: For info, please visit
https://www.isc.org/software/dhcp/
[  153.191386] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.193510] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.193994] NetworkManager[615]: <info> (eth0): DHCPv4 state changed nbi ->
preinit
[  153.195386] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.195593] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.212030] dhclient[628]: Listening on LPF/eth0/2a:0f:4a:82:98:04
[  153.212101] dhclient[628]: Sending on   LPF/eth0/2a:0f:4a:82:98:04
[  153.212283] dhclient[628]: Sending on   Socket/fallback
[  153.212429] dhclient[628]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67
interval 7
[  153.212498] ibmveth 30000004: DMA-API: device driver frees DMA memory with
wrong function [device address=0x0000000003040490] [size=342 bytes] [mapped as
single] [unmapped as page]
[  153.212528] ------------[ cut here ]------------
[  153.212533] WARNING: at lib/dma-debug.c:829
[  153.212539] Modules linked in: squashfs nls_utf8 ibmvscsic
scsi_transport_srp ibmveth scsi_tgt
[  153.212558] NIP: c000000000343fb0 LR: c000000000343fac CTR: c000000000068324
[  153.212567] REGS: c000000272153090 TRAP: 0700   Not tainted 
(3.0.1-5.fc16.kh.ppc64)
[  153.212574] MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 48222482  XER:
00000009
[  153.212592] TASK = c0000002710d4e80[628] 'dhclient' THREAD: c000000272150000
CPU: 0
[  153.212600] GPR00: c000000000343fac c000000272153310 c00000000141a7b8
00000000000000bb
[  153.212613] GPR04: 0000000000000001 c0000000000ac328 0000000000000000
0000000000000002
[  153.212625] GPR08: 0000000000000000 c0000002710d4e80 0000000000017f20
0000000000000001
[  153.212638] GPR12: 0000000084222442 c00000000ee54000 0000000000000000
c000000271318a00
[  153.212651] GPR16: 0000000000000000 0000000020801988 0000000003040490
0000000000000020
[  153.212664] GPR20: 0000000000000e60 0000000000000000 c0000002753ac278
0000000003040490
[  153.212677] GPR24: 0000000000000156 0000000000000000 0000000000000001
c000000002040d00
[  153.212690] GPR28: c0000002760ec420 c000000272153440 c000000001399b88
c000000272153310
[  153.212710] NIP [c000000000343fb0] .check_unmap+0x3dc/0x77c
[  153.212718] LR [c000000000343fac] .check_unmap+0x3d8/0x77c
[  153.212724] Call Trace:
[  153.212728] [c000000272153310] [c000000000343fac] .check_unmap+0x3d8/0x77c
(unreliable)
[  153.212738] [c0000002721533d0] [c0000000003445a4]
.debug_dma_unmap_page+0x78/0x80
[  153.212750] [c000000272153510] [d0000000048ce194]
.ibmveth_start_xmit+0x53c/0x67c [ibmveth]
[  153.212760] [c000000272153640] [c0000000005854fc]
.dev_hard_start_xmit+0x5a8/0x7e8
[  153.212769] [c000000272153740] [c0000000005a6104]
.sch_direct_xmit+0x7c/0x278
[  153.212777] [c0000002721537f0] [c000000000585e84]
.dev_queue_xmit+0x748/0xa48
[  153.212787] [c0000002721538b0] [c00000000067caf8]
.packet_sendmsg+0xb54/0xc70
[  153.212796] [c000000272153a00] [c000000000569168]
.sock_aio_write+0x138/0x150
[  153.212805] [c000000272153b40] [c0000000001cf95c] .do_sync_write+0xa8/0xe4
[  153.212813] [c000000272153cc0] [c0000000001d00cc] .vfs_write+0xe4/0x188
[  153.212822] [c000000272153d70] [c0000000001d03d8] .SyS_write+0x58/0x88
[  153.212831] [c000000272153e30] [c000000000009928] syscall_exit+0x0/0x40
[  153.212838] Instruction dump:
[  153.212843] e97c001a e93e8048 e87e80e0 e8dd0028 e81d001a e8fd0030 796b1f24
78001f24
[  153.212857] 7d09582a 7d29002a 4835d5b9 60000000 <0fe00000> 480000c4 2f800003
409e0100
[  153.212873] ---[ end trace 23ebc7de1702caf3 ]---
[  153.212878] Mapped at:
[  153.212882]  [<c000000000344b90>] .debug_dma_map_page+0x9c/0x1c0
[  153.212889]  [<d0000000048cdf18>] .ibmveth_start_xmit+0x2c0/0x67c [ibmveth]
[  153.212897]  [<c0000000005854fc>] .dev_hard_start_xmit+0x5a8/0x7e8
[  153.212905]  [<c0000000005a6104>] .sch_direct_xmit+0x7c/0x278
[  153.212912]  [<c000000000585e84>] .dev_queue_xmit+0x748/0xa48
[  153.213305] dhclient[628]: DHCPREQUEST on eth0 to 255.255.255.255 port 67
[  153.213384] dhclient[628]: DHCPOFFER from 9.5.250.185
[  153.213653] dhclient[628]: DHCPACK from 9.5.250.185
[  153.221422] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.222206] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.223081] NetworkManager[615]: <info> (eth0): DHCPv4 state changed preinit
-> bound
[  153.223121] NetworkManager[615]: <info> Activation (eth0) Stage 4 of 5 (IP4
Configure Get) scheduled...
[  153.223220] NetworkManager[615]: <info> Activation (eth0) Stage 4 of 5 (IP4
Configure Get) started...
[  153.223472] NetworkManager[615]: <info>   address 9.5.250.146
[  153.223544] NetworkManager[615]: <info>   prefix 24 (255.255.255.0)
[  153.223573] NetworkManager[615]: <info>   gateway 9.5.250.1
[  153.223600] NetworkManager[615]: <info>   nameserver '9.10.244.100'
[  153.223671] NetworkManager[615]: <info>   nameserver '9.10.244.200'
[  153.223699] NetworkManager[615]: <info>   domain name 'rchland.ibm.com'
[  153.224288] NetworkManager[615]: <info> Activation (eth0) Stage 5 of 5 (IP
Configure Commit) scheduled...
[  153.224333] NetworkManager[615]: <info> Activation (eth0) Stage 4 of 5 (IP4
Configure Get) complete.
[  153.224566] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.224838] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  153.225977] NetworkManager[615]: <info> Activation (eth0) Stage 5 of 5 (IP
Configure Commit) started...
[  153.226040] dhclient[628]: bound to 9.5.250.146 -- renewal in 268 seconds.
[  154.227957] NetworkManager[615]: <info> (eth0): device state change:
ip-config -> activated (reason 'none') [70 100 0]
[  154.229133] NetworkManager[615]: <info> Policy set 'System eth0' (eth0) as
default for IPv4 routing and DNS.
[  154.229160] NetworkManager[615]: <info> Activation (eth0) successful, device
activated.
[  154.230229] NetworkManager[615]: <info> Activation (eth0) Stage 5 of 5 (IP
Configure Commit) complete.
[  154.230481] dbus[620]: [system] Activating service
name='org.freedesktop.nm_dispatcher' (using servicehelper)
[  154.263691] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  154.265094] dbus[620]: [system] Successfully activated service
'org.freedesktop.nm_dispatcher'
[  154.265330] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  154.276050] nm-dispatcher.action[633]: Script
'/etc/NetworkManager/dispatcher.d/00-netreport' exited with error status 1.
[  155.667990] RTAS: event: 38, Type: Platform Information Event, Severity: 1
[  163.662767] RTAS: event: 39, Type: Platform Information Event, Severity: 1
[  165.011042] systemd[1]: Received SIGCHLD from PID 633 (nm-dispatcher.a).
[  165.011221] systemd[1]: Got SIGCHLD for process 633 (nm-dispatcher.a)
[  165.011491] systemd[1]: Child 633 died (code=exited, status=0/SUCCESS)
[  165.011512] systemd[1]: Running GC...
[  165.011933] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  165.011965] systemd[1]: Got D-Bus request:
org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[  171.659875] RTAS: event: 40, Type: Platform Information Event, Severity: 1

bash-4.2# [  179.656435] RTAS: event: 41, Type: Platform Information Event,
Severity: 1
ifconfig
eth0      Link encap:Ethernet  HWaddr 2A:0F:4A:82:98:04
          inet addr:9.5.250.146  Bcast:9.5.250.255  Mask:255.255.255.0
          inet6 addr: 2002:905:150e:302:280f:4aff:fe82:9804/64 Scope:Global
          inet6 addr: fe80::280f:4aff:fe82:9804/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:362 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31283 (30.5 KiB)  TX bytes:1090 (1.0 KiB)
          Interrupt:20

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Comment 1 Mark Hamzy 2011-09-07 20:09:46 UTC
I can bring the network up in a bash session.  However, anaconda cannot bring the network up.

We thought that it might be a virtual ethernet problem.  However, we installed a physical ethernet card in the system.  It still fails.

Comment 2 Jirka Klimes 2011-09-14 10:24:37 UTC
I see no error regarding NetworkManager; the log shows a successful connection activation.
However, there is an error with memory access in the ibmveth driver, so I am reassigning this to kernel.

Comment 3 Mark Hamzy 2011-09-14 14:05:20 UTC
The error you are talking about already has a bug assigned to it: bug 733766.

I am able to bring the network up successfully from the command line using ifup.  I cannot bring the network up using the Anaconda installer (which uses the NetworkManager libraries).

Comment 4 Mark Hamzy 2011-09-14 15:24:13 UTC
Created attachment 915366 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 5 Mark Hamzy 2011-09-14 15:26:10 UTC
Created attachment 915367 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 6 Dan Williams 2011-09-14 17:50:09 UTC
Is there any chance we can get anaconda output for this error?  Based on these dmesg logs, it appears as though NetworkManager is correctly configuring the device, and that the device does indeed have IP connectivity (as shown by the resulting ifconfig output).  But the bug title indicates that anaconda is saying that the network was not correctly configured?  Is that correct?

Also, what does 'route -n' say?

Comment 7 Mark Hamzy 2011-09-14 19:55:59 UTC
[anaconda root@localhost dev]# ifup eth1
[anaconda root@localhost dev]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.5.250.1       0.0.0.0         UG    0      0        0 eth1
9.5.250.0       0.0.0.0         255.255.255.0   U     0      0        0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 eth1
[anaconda root@localhost dev]# /bin/systemctl start loader2.service

This brings up the anaconda installer.  I see the text dialog saying "You have multiple network devices on this system. Which would you like to install through?"

I select eth1 and then see the following text dialog: "Waiting for NetworkManager to configure eth1."

This displays for a while.  I then see the dialog to check the media ("To begin testing the media before installation press OK."), which I skip, and it goes back to "Waiting for NetworkManager to configure eth1."  After a while it goes to the "Configure TCP/IP" dialog.

The network is still up from when I started it before.  I can ssh into the box and I only see these logs from anaconda:

[anaconda root@localhost ~]# ls -l /tmp
total 5
-rw-r--r-- 1 root root 3601 Sep 14 19:53 anaconda.log
-rw-r--r-- 1 root root   98 Sep 14 19:51 program.log
[anaconda root@localhost ~]# cat /tmp/program.log
19:51:13,711 INFO program: Running... /bin/mount -n -t iso9660 -o ro /dev/sr0 /mnt/install/source
[anaconda root@localhost ~]# cat /tmp/anaconda.log
19:48:39,815 INFO loader: kernel command line:
19:48:39,815 INFO loader:     vnc=1
19:48:39,815 INFO loader: vnc forced graphical mode from cmdline
19:48:39,815 INFO loader: early networking required for vnc
19:48:39,815 INFO loader:     ro
19:48:39,815 INFO loader:     selinux=0
19:48:39,815 INFO loader:     systemd.log_target=kmsg
19:48:39,815 INFO loader:     systemd.log_level=debug
19:48:39,815 INFO loader:     rd.break
19:48:39,815 INFO loader:     serial
19:48:39,815 INFO loader:     root=live:CDLABEL=Fedora-20110907-ppc64-DVD
19:48:39,817 DEBUG loader: readNetInfo /tmp/s390net not found, early return
19:48:39,831 INFO loader: anaconda version 16.16 on ppc64 starting
19:48:39,832 INFO loader: set console to /dev/hvc0 at 108
19:48:39,850 INFO loader: 8519680 kB (8320 MB) are available
19:48:39,850 INFO loader: 8519680 kB (8320 MB) are available
19:48:40,106 DEBUG loader: going to set language to en_US.UTF-8
19:48:40,106 DEBUG loader: locale en_US.UTF-8: base: en_US, mod: (null), charset: UTF-8
19:48:40,107 INFO loader: going to prepare locales for en_US.UTF-8 (locale: en_US, charset: UTF-8)
19:48:41,266 INFO loader: setting language to en_US.UTF-8
19:48:41,269 DEBUG loader: Saving module scsi_dh_rdac
19:48:41,269 DEBUG loader: Saving module scsi_dh_hp_sw
19:48:41,269 DEBUG loader: Saving module scsi_dh_alua
19:48:41,269 DEBUG loader: Saving module scsi_dh_emc
19:48:41,269 DEBUG loader: Saving module iscsi_tcp
19:48:41,269 DEBUG loader: Saving module libiscsi_tcp
19:48:41,269 DEBUG loader: Saving module libiscsi
19:48:41,269 DEBUG loader: Saving module scsi_transport_iscsi
19:48:41,269 DEBUG loader: Saving module cramfs
19:48:41,269 DEBUG loader: Saving module squashfs
19:48:41,269 DEBUG loader: Saving module nls_utf8
19:48:41,270 DEBUG loader: Saving module ibmvscsic
19:48:41,270 DEBUG loader: Saving module scsi_transport_srp
19:48:41,270 DEBUG loader: Saving module scsi_tgt
19:48:41,270 DEBUG loader: Saving module e1000e
19:48:41,270 DEBUG loader: probing buses
19:48:41,411 DEBUG loader: waiting for hardware to initialize
19:48:47,870 INFO loader: restarting NetworkManager
19:48:48,418 INFO loader: No iBFT table detected.
19:50:28,662 INFO loader: doing kickstart... setting it up
19:50:28,664 DEBUG loader: activating device eth1
19:51:13,704 ERR loader: failed to configure network interface
19:51:13,704 ERR loader: unable to activate device eth1
19:51:13,706 INFO loader: trying to mount CD device /dev/sr0 on /mnt/install/source
19:51:13,707 INFO loader: drive status is CDS_DISC_OK
19:51:13,719 DEBUG loader: Found installation media, so skipping lang and kbd
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_LANG
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_KBD
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_METHOD
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_DRIVER
19:53:03,930 DEBUG loader: in doLoaderMain, step = STEP_NETWORK
19:53:03,965 INFO loader: need to set up networking
19:53:03,966 DEBUG loader: in doLoaderMain, step = STEP_IP
19:53:03,966 DEBUG loader: in doLoaderMain, calling setupIfaceStruct()
19:53:03,966 DEBUG loader: in doLoaderMain, calling readNetConfig()
19:53:03,966 INFO loader: doing kickstart... setting it up
19:53:03,967 DEBUG loader: activating device eth1
19:53:49,010 ERR loader: failed to configure network interface
19:53:49,010 DEBUG loader: in STEP_IP, retry (LOADER_ERROR)
19:53:49,010 DEBUG loader: in doLoaderMain, step = STEP_IP
19:53:49,010 DEBUG loader: in doLoaderMain, calling setupIfaceStruct()
19:53:49,010 DEBUG loader: in doLoaderMain, calling readNetConfig()

Comment 8 Dan Williams 2011-09-14 20:45:37 UTC
I'm confused.  At what point here does the error this bug is about occur?  At no point should 'ifup eth1' be necessary, and I assume you're doing that because you got the aforementioned error?  Why is ifup being run manually at all, and at what point during the install are you running ifup?

Basically, what I need here is ifconfig -a and route -n after the error has occurred, but *before* any additional action like 'ifup' has been taken.  Otherwise we simply pollute the debugging results.

From the logs, it looks like NM is doing everything correctly and setting up the interface.  It shouldn't be necessary to run 'ifup' because networking is apparently being set up correctly by NetworkManager already.

Comment 9 Mark Hamzy 2011-09-14 21:06:33 UTC
Boot Fedora-20110907-ppc64-DVD-respin.iso with the following parameters:
    linux serial vnc=1 selinux=0

You will immediately run into the problem.

I use an HMC to access the console.  Unfortunately, I cannot access the other windows that anaconda creates.  So, I create the network setup files manually and start the network before anaconda runs.  This allows me to get into a bash window and poke around.

Comment 10 Phil Knirsch 2011-09-15 16:21:55 UTC
http://ppc.koji.fedoraproject.org/scratch/karsten/iso/ is the URL where the isos can be found.

Comment 11 Dan Williams 2011-09-16 21:16:06 UTC
Some observations about logs in comment #5:

1) systemd starts NM

2) NM finds there is no defined connection for eth0 or eth1, and creates some default DHCP connections and starts them up

3) these default connections successfully start at time 99

4) at time 274, NM is told to stop

5) at time 275, NM is told to start again

6) NM finds new config files ifcfg-eth0 and ifcfg-eth1, but these contain NM_CONTROLLED=no so NM does not start any network interfaces

7) at 278, anaconda is supposed to flip NM_CONTROLLED to 'yes' to let NM start the interfaces, as indicated by:
[  278.609893] loader[745]: activating device eth1

8) NM never gets an inotify signal indicating that the files have changed, thus it never starts the interfaces.  If NM got the inotify signal, we'd expect to see a message like:

NetworkManager: updating /etc/sysconfig/network-scripts/ifcfg-eth0


One other possibility is that the code in NM's config file parsing for handling NM_CONTROLLED isn't properly parsing the change from 'no' to 'yes'.  There were some changes committed around 2011-08-07 to that part of the code, but they were supposed to fix an issue like this.  I just checked and verified that flipping NM_CONTROLLED on my machine with the official NM 0.9 release (in F15+) works as expected, so I don't suspect this change yet.
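
To make the NM_CONTROLLED check in step 6 concrete, here is a minimal sketch in C of reading a yes/no flag out of ifcfg-style text. This is not NetworkManager's parser: the function name `ifcfg_flag_is_yes` and the `fallback` parameter are hypothetical, and the value used when the key is missing is left to the caller because the thread does not state NM's default.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not NetworkManager code): scan ifcfg-style text for
 * a "KEY=value" line and report whether the value is yes.
 * Returns `fallback` when the key does not appear; the last match wins.
 */
bool ifcfg_flag_is_yes(const char *text, const char *key, bool fallback)
{
    assert(text != NULL && key != NULL);

    size_t keylen = strlen(key);
    bool result = fallback;
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Match "KEY=" at the very start of the line. */
        if (len > keylen && strncmp(line, key, keylen) == 0 &&
            line[keylen] == '=') {
            const char *val = line + keylen + 1;
            size_t vlen = len - keylen - 1;

            /* Strip one pair of double quotes, as in NM_CONTROLLED="yes". */
            if (vlen >= 2 && val[0] == '"' && val[vlen - 1] == '"') {
                val++;
                vlen -= 2;
            }
            result = (vlen == 3 && strncmp(val, "yes", 3) == 0);
        }
        line = end ? end + 1 : line + len;
    }
    return result;
}
```

Calling `ifcfg_flag_is_yes(contents, "NM_CONTROLLED", false)` on the file from the original description returns true, and on a file with NM_CONTROLLED=no it returns false. Parsing is only half of the picture: the daemon still has to be told that the file changed, which is what step 8 is about.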

Comment 12 Dan Williams 2011-09-16 21:18:13 UTC
One way to test out whether it's anaconda or NM or the kernel is to:

1) after you get the "failed to configure interface" error, do whatever you need to do to get shell access to the machine's filesystem

2) check /etc/sysconfig/network-scripts/ifcfg-eth0 and ifcfg-eth1 and see what the value of NM_CONTROLLED is

3) change that value to the opposite, ie "no" -> "yes" or "yes" -> "no"

4) look at the last 50 lines of syslog or wherever the NM output goes, and see if NM noticed the change to the ifcfg file

Comment 13 Karsten Hopp 2011-09-19 16:59:46 UTC
/etc/sysconfig/network-scripts/ifcfg-eth0 is empty after NM failed to configure eth0:


DEVICE=eth0
HWADDR=C6:.....
ONBOOT=yes
BOOTPROTO=dhcp
IPV6INIT=yes
IPV6_AUTOCONF=yes


That was with an aborted anaconda.  I need to try again and see whether I can just suspend it while making the change.

Comment 14 IBM Bug Proxy 2011-09-19 22:06:52 UTC
Before NM runs, /etc/sysconfig/network-scripts/ifcfg-eth0 is:

DEVICE=eth0
HWADDR=2A:....
ONBOOT=yes
BOOTPROTO=dhcp

After it fails to configure, it's:

DEVICE=eth0
HWADDR=2A:....
ONBOOT=yes
BOOTPROTO=dhcp
IPV6INIT=yes
IPV6_AUTOCONF=yes

I added NM_CONTROLLED=yes to /etc/sysconfig/network-scripts/ifcfg-eth0 and hit "Retry" in Anaconda and it just overwrote the file.

Comment 15 Dan Williams 2011-09-19 22:27:54 UTC
Yeah, anaconda will rewrite the file when trying to configure the network, that's expected.  But after you manually modified ifcfg-eth0 and added NM_CONTROLLED=yes, did you see anything in syslog that indicated NM reread the file?  If not, try changing IPV6INIT to "no" and see if that triggers a reread.

Comment 16 Dan Williams 2011-09-21 17:55:26 UTC
From further debugging with modified NM versions it looks like NM is only getting glib's CHANGED event for file change notifications, and not CHANGES_DONE_HINT.  This could be due to changes in either glib's file monitor support or the kernel.

One thing I noticed is that glib's virtual changes done hint doesn't appear to be firing.  It has a 2-second timer to send the changes done hint but that apparently never happens.  A second odd thing is that we'd expect inotify to send an IN_CLOSE_WRITE event when the modified file is closed for writing, and that gets sent through directly by glib, but that's also not happening.

So the next step is probably to build a glib2 package with some debug logging and figure out what's going on.

Comment 17 Dan Williams 2011-09-22 19:03:55 UTC
So after much debugging on the part of rangerpb, hamzy, and me, it comes down to libffi as the culprit.  glib now uses libffi to magically figure out the marshalling for signal arguments.  Apparently that doesn't work correctly on ppc64, as the following log shows:

18:25:31,0 INFO NetworkManager: GLib-GIO-Message: (730) emit_cb (0x100205378b0): path '/etc/sysconfig/network-scripts/ifcfg-eth0' change 0x1002059fa20 event 3 (other <none>)
18:25:31,0 NOTICE NetworkManager:    ifcfg-rh: (0x100205378b0) path changed /etc/sysconfig/network-scripts/ifcfg-eth0 (other <none>) event 0

The first part is from glib/gio/gfilemonitor.c's emit_cb() function where it does this:

       g_signal_emit (monitor, signals[CHANGED], 0,
	  	      change->child, change->other_file, change->event_type);

while the second simply prints out the signal arguments that NetworkManager got.  Note that NetworkManager receives an event type of '0', while glib is emitting an event type of 3.  That's the core problem here.

At the moment we blame libffi, but we need to run both the libffi and glib test suites on a ppc64 box before we can more definitively say where the problem lies.

Comment 18 Dan Williams 2011-09-22 19:28:26 UTC
Just for the record, the glib signal arguments here are:

  signals[CHANGED] =
    g_signal_new (I_("changed"),
		  G_TYPE_FILE_MONITOR,
		  G_SIGNAL_RUN_LAST,
		  G_STRUCT_OFFSET (GFileMonitorClass, changed),
		  NULL, NULL,
		  NULL,
		  G_TYPE_NONE, 3,
		  G_TYPE_FILE, G_TYPE_FILE, G_TYPE_FILE_MONITOR_EVENT);

and the G_TYPE_FILE_MONITOR_EVENT is the one that's apparently not getting marshalled appropriately.  That GType just represents an enum/uint32.

Comment 19 Anthony Green 2011-09-22 19:36:03 UTC
Which version of libffi? 3.0.10?

Comment 21 Dan Williams 2011-09-22 20:58:20 UTC
Created attachment 524478 [details]
reduced testcase for marshalling G_TYPE_FILE_MONITOR_EVENT

Build with:

gcc -Wall -o fmtest `pkg-config --libs --cflags gio-2.0` fmtest.c

This test passes on x86_64 but fails on ppc64.

Comment 22 Dan Williams 2011-09-22 21:00:03 UTC
The testcase requires only the glib2-devel package to be installed in order to build.

Comment 23 Mark Hamzy 2011-09-22 21:03:57 UTC
libffi-3.0.9-3.fc16.ppc64
libffi-devel-3.0.9-3.fc16.ppc64

Comment 24 Dan Williams 2011-09-22 21:08:47 UTC
Note that I don't think we can rule glib2 out completely yet...  still investigating glib's usage of libffi and the internal mapping of types for marshalling.

Comment 25 Dan Williams 2011-09-22 21:19:04 UTC
G_TYPE_ENUM (which is what G_TYPE_FILE_MONITOR_EVENT is derived from) was not correctly mapped to libffi types during marshalling in gclosure.c.  Internally it was stored as a 'signed long' but when mapped to libffi types, was retrieved as a 'signed int' leading ppc64 to always return 0 since it presumably got the wrong 32 bits of the union in which the mapped value was held.
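
To make the failure mode above concrete, here is a minimal sketch in plain C. It uses neither glib nor libffi; the helper names (`store_be64`, `load_be32`, `read_slot_as_int`, `read_low_half`) are made up for illustration, and the byte array only models how a 64-bit 'long' is laid out on a big-endian machine such as ppc64, so the result is the same on any host.

```c
#include <stdint.h>

/* Hypothetical helper: write a 64-bit value into buf in big-endian
 * byte order, the way a ppc64 'long' is laid out in memory. */
void store_be64(unsigned char buf[8], uint64_t value)
{
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char)(value >> (56 - 8 * i));
}

/* Hypothetical helper: read 4 bytes at buf as a big-endian 32-bit
 * value, i.e. what a 32-bit load from that address would yield. */
uint32_t load_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* Store as a 64-bit value, then read back as a 32-bit value from the
 * same address: on big-endian this picks up the HIGH half. */
uint32_t read_slot_as_int(uint64_t stored)
{
    unsigned char slot[8];
    store_be64(slot, stored);
    return load_be32(slot);
}

/* The half that actually carries a small value lives 4 bytes in. */
uint32_t read_low_half(uint64_t stored)
{
    unsigned char slot[8];
    store_be64(slot, stored);
    return load_be32(slot + 4);
}
```

With a stored value of 3, `read_slot_as_int(3)` yields 0 and `read_low_half(3)` yields 3, which matches the "event 3" emitted versus "event 0" received in the log from comment 17. On a little-endian machine the low half sits at the slot's own address, which would explain why the same mismatch goes unnoticed on x86_64.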

Comment 26 Anthony Green 2011-09-23 03:04:32 UTC
(In reply to comment #25)
> G_TYPE_ENUM (which is what G_TYPE_FILE_MONITOR_EVENT is derived from) was not
> correctly mapped to libffi types during marshalling in gclosure.c.  Internally
> it was stored as a 'signed long' but when mapped to libffi types, was retrieved
> as a 'signed int' leading ppc64 to always return 0 since it presumably got the
> wrong 32 bits of the union in which the mapped value was held.

Great - thanks for tracking this down.

Comment 27 Karsten Hopp 2011-09-23 14:37:14 UTC
I've built a custom glib2 with upstream commits 1df8160fa675b225809eed2f86d2489133e5e54d and
f42fe6cdc056b77f74ff6e332389d444c50ae7dc

the fmtest testcase still fails with
** Message: a_foo_emit: emitting signal with G_FILE_MONITOR_EVENT_CREATED (3)
**
ERROR:fmtest.c:62:check_cb: assertion failed: (event_type == G_FILE_MONITOR_EVENT_CREATED)



The glib2 selfcheck fails with:
TEST: signals... (pid=21717)
  /gobject/signals/variant:                                            OK
  /gobject/signals/generic-marshaller-1:                               **
ERROR:signals.c:193:on_generic_marshaller_2: assertion failed (v_enum == TEST_ENUM_BAR): (0 == 2)
OK
  /gobject/signals/generic-marshaller-2:                               FAIL
GTester: last random seed: R02S8fe9c3d2d27713f79a52c55453f3343b
/bin/sh: line 1: 21656 Terminated              MALLOC_CHECK_=2 MALLOC_PERTURB_=$((${RANDOM:-256} % 256)) ../../glib/gtester --verbose boxed enums param signals threadtests dynamictests binding properties reference ifaceproperties valuearray
make[4]: *** [test-nonrecursive] Error 143

Comment 28 Mark Hamzy 2011-09-26 14:15:41 UTC
In gobject/genums.c, there also seems to be mismatched types.

Also, since enums are ints in C, shouldn't we be storing them in ints rather than longs?

Comment 29 Dan Williams 2011-09-26 15:00:21 UTC
(In reply to comment #28)
> In gobject/genums.c, there also seems to be mismatched types.
> 
> Also, since enums are ints in C, shouldn't we be storing them in ints rather
> than longs?

Not storing the enums in v_long would be an ABI break.  I did a patch to stuff them into a temporary integer but that produced signed/unsigned issues with enums instead.  Better, but still not fixed.  The patch I had works fine on ppc32, just not ppc64.  Are there sign extension rules for storing an int to a long in ppc64 that might be coming into play?

Comment 30 Benjamin Herrenschmidt 2011-09-26 21:26:36 UTC
nothing more than any other arch :-)

If you store as an int you need to retrieve as an int, that's the one cardinal rule. IE, retrieve with the same type you used to store.

As for casting before you store, that's business as usual, ie, int -> long -will- sign extend, if you want to avoid that, you need to treat things as unsigned int -> unsigned long
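
A small sketch of the widening rule described above, using fixed-width types as stand-ins for 'int' and 'long' on an LP64 platform (the function names are illustrative only):

```c
#include <stdint.h>

/* int -> long: the value is preserved, so the upper 32 bits are
 * filled with copies of the sign bit (sign extension). */
int64_t widen_signed(int32_t v)
{
    return (int64_t)v;
}

/* unsigned int -> unsigned long: the upper 32 bits are zero
 * (zero extension). */
uint64_t widen_unsigned(uint32_t v)
{
    return (uint64_t)v;
}
```

For the same 32-bit pattern 0xFFFFFFE2, `widen_signed(-30)` gives 0xFFFFFFFFFFFFFFE2 while `widen_unsigned(0xFFFFFFE2u)` gives 0x00000000FFFFFFE2, so the choice of signed or unsigned before the store decides what ends up in the upper half.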

Comment 31 Dan Williams 2011-09-26 21:58:14 UTC
(In reply to comment #30)
> nothing more than any other arch :-)
> 
> If you store as an int you need to retrieve as an int, that's the one cardinal
> rule. IE, retrieve with the same type you used to store.
> 
> As for casting before you store, that's business as usual, ie, int -> long
> -will- sign extend, if you want to avoid that, you need to treat things as
> unsigned int -> unsigned long

The patch we've come up with works so far.  What's not working is marshalling return values.  libffi is told that the value is a 32-bit value (ffi_type_sint) yet apparently when it comes back from ffi_call() it's been munged into a 64-bit value.  I would have expected a 32-bit value back from ffi_call, and everything I can see in ffi says that a 4-byte value got allocated on the stack for the return value, but on return it's not.

g_cclosure_marshal_generic:

  ffi_call (&cif, marshal_data ? marshal_data : cc->callback, rvalue, args);

(rvalue is a void* pointer to a stack-allocated signed int here)

but if we expect a return value of -30 and do this after ffi_call():

fprintf (stderr, "int %d\n", *(gint*)rvalue);
fprintf (stderr, "long %ld\n", *(glong*)rvalue);

we get output of:

int -1
long -30

where the -30 is really 0xFFFFFFFFFFFFFFE2 underneath.  So the question of the day is why libffi appears to be returning a 64-bit value when we requested a 32-bit signed int.

Comment 32 Dan Williams 2011-09-26 23:00:31 UTC
Created attachment 525010 [details]
Hackish patch which fixes marshalling on ppc64 and has no regressions on x86-64

This patch passes the testcases on ppc64 and x86-64, but I'm really not sure why we need to do the return munging in value_from_ffi_type() since we're telling libffi that we expect a return type of 'sint' which is a 4-byte value, but it appears libffi is returning a pointer to a 8-byte/64-bit value instead.

Comment 33 Benjamin Herrenschmidt 2011-09-26 23:53:14 UTC
IMHO that's wrong. Instead you should marshall G_TYPE_ENUM to a full long (arguably an unsigned one even).

The target code will expect all arguments to be fully zero or sign extended to 64-bit anyways, whether they are passed in registers or on the stack, and your g_value as v_long is already appropriately extended by glib, so just marshall it as a long.

Of course I have no way to tell whether that will work or not on x86_64, sparc64, mips64, ia64, ... tho I wouldn't be surprised if sparc is just like powerpc here

Comment 34 Benjamin Herrenschmidt 2011-09-27 00:28:18 UTC
Created attachment 525015 [details]
Untested patch to fix 32-bit return values in libffi

So from my quick look at it, it appears that libffi doesn't properly deal
with being asked for a 32-bit return value, it always returns a 64-bit
value on ppc64. I believe this is a bug.

This patch uses the FLAG_RETURNS_64BITS that is set by the prep code to
decide whether to use a stw or a std instruction for the return value,
which should -hopefully- fix it.

Untested here so let me know.

Comment 35 Anthony Green 2011-09-27 00:47:50 UTC
For 64-bit ABIs that extend integral return types to 64 bits, libffi always returns full 64-bit values that you can truncate in the calling code.   It's just the way it has always been.  Please don't change libffi.  I'll document this clearly for the next version (perhaps there is a mention of this, I haven't looked yet).

The same is true for returning 8-bit values, for instance, on 32-bit systems.  All ABIs extend those results to the full 32-bits so you need to provide a properly aligned buffer that's big enough to hold the result.

Comment 36 Anthony Green 2011-09-27 00:52:37 UTC
BTW - you can just use ffi_arg as the storage for all integral return values.  Look at the libffi testsuite code, like return_sc.c.  For ppc64 ffi_arg is an unsigned long.
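
The calling pattern suggested here can be sketched without linking libffi. In the sketch below, `wide_slot_t` stands in for `ffi_arg` and `fake_call` stands in for `ffi_call` with an 'int' return type; both are hypothetical and only model the behaviour described in comment 35, namely that the result is stored as a full register-width value.

```c
#include <stdint.h>

/* Stand-in for ffi_arg (assumption: uintptr_t is register-width on
 * the target, as 'unsigned long' is on ppc64). */
typedef uintptr_t wide_slot_t;

/* Stand-in for ffi_call() with an 'int' return type: the result is
 * sign-extended and stored as a full-width value, so rvalue must
 * point at storage of at least sizeof(wide_slot_t) bytes. */
void fake_call(int (*fn)(void), void *rvalue)
{
    *(wide_slot_t *)rvalue = (wide_slot_t)(intptr_t)fn();
}

/* Example callee. */
int returns_minus_30(void)
{
    return -30;
}

/* Caller side: hand over a full-width slot, then narrow by VALUE
 * rather than by reinterpreting the pointer. The narrowing
 * conversion is implementation-defined in ISO C but wraps modulo
 * 2^32 on the usual compilers (gcc, clang). */
int call_and_narrow(int (*fn)(void))
{
    wide_slot_t slot = 0;
    fake_call(fn, &slot);
    return (int)slot;
}
```

Because the narrowing happens on the value, `call_and_narrow(returns_minus_30)` gives -30 regardless of byte order, and the callee never writes past the end of the buffer it was given.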

Comment 37 Benjamin Herrenschmidt 2011-09-27 00:58:11 UTC
That doesn't sound right to me and totally defeats the point of typing the
return value....

If I tell libffi that I want a return value of type "int", why should I pass a
buffer to something other than "int" ?

What you say means that the caller of libffi would have to hold ABI specific
knowledge, which completely defeats the purpose of libffi here.

So I don't think you are on the right track :-) libffi should convert the
return value to whatever type has been requested by the caller. The caller
shouldn't have to make various assumptions on what ABI it's running on.
  
Now I'm not familiar with libffi, just looking at it in the context of that bug, but what happens for arguments (rather than return values) ?

If I pass an argument of type "int", am I expected to pass a pointer to a 32-bit int or a 64-bit sign extended int ?

My understanding is that the former is true, and that's how glib uses libffi. I don't see why the return argument should work differently.

Comment 38 Benjamin Herrenschmidt 2011-09-27 01:06:10 UTC
Note: If that's one of those too-late-to-fix issues (since if we change it, then things using ffi_arg in turn will be broken), then we should probably go through a big fat code audit of everything that uses the library.

IE. I can perfectly see about everybody using it getting it wrong (simply because it just doesn't make any sense to do it that way imho :-), not just glib here.

Comment 39 Dan Williams 2011-09-27 01:12:01 UTC
If you don't want to break libffi ABI but still handle return type in a platform agnostic way, I'd suggest adding some function like

ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void *rvalue_out);

or something like that.  Which would certainly help insulate callers from platform differences like this.  Then something like glib could avoid return value handling errors with

  if (ffi_prep_cif (&cif, FFI_DEFAULT_ABI, n_args, rtype, atypes) != FFI_OK)
    return;

  ffi_call (&cif, marshal_data ? marshal_data : cc->callback, rvalue, args);

  ffi_collect_return (&cif, rtype, rvalue, rvalue_sanitized);

  if (return_gvalue && G_VALUE_TYPE (return_gvalue))
    value_from_ffi_type (return_gvalue, rvalue_sanitized);
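
Note that `ffi_collect_return` is only a proposal in this comment and is not a libffi function. As a rough sketch of what such a helper might do, the code below narrows a full-width return slot into a buffer of exactly the requested type. The names `collect_return`, `ret_kind` and `wide_slot_t` are hypothetical; a real helper would presumably take an `ffi_type` instead of the made-up enum used here.

```c
#include <stdint.h>
#include <string.h>

typedef uintptr_t wide_slot_t;   /* stand-in for ffi_arg */

/* Hypothetical return-type tags (libffi describes types with
 * ffi_type, which this sketch does not use). */
enum ret_kind {
    RET_SINT8,  RET_UINT8,
    RET_SINT16, RET_UINT16,
    RET_SINT32, RET_UINT32
};

/* Narrow the full-width slot to the requested type and copy exactly
 * that many bytes to out. Returns the number of bytes written, or 0
 * for an unknown kind. */
size_t collect_return(enum ret_kind kind, wide_slot_t slot, void *out)
{
    switch (kind) {
    case RET_SINT8:  { int8_t   v = (int8_t)slot;   memcpy(out, &v, sizeof v); return sizeof v; }
    case RET_UINT8:  { uint8_t  v = (uint8_t)slot;  memcpy(out, &v, sizeof v); return sizeof v; }
    case RET_SINT16: { int16_t  v = (int16_t)slot;  memcpy(out, &v, sizeof v); return sizeof v; }
    case RET_UINT16: { uint16_t v = (uint16_t)slot; memcpy(out, &v, sizeof v); return sizeof v; }
    case RET_SINT32: { int32_t  v = (int32_t)slot;  memcpy(out, &v, sizeof v); return sizeof v; }
    case RET_UINT32: { uint32_t v = (uint32_t)slot; memcpy(out, &v, sizeof v); return sizeof v; }
    }
    return 0;
}
```

The point of the design is that the caller keeps a buffer of its own native type and the one endian-sensitive step lives in a single place. Non-scalar return types (structs, floating point) are deliberately left out of this sketch.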

Comment 40 Anthony Green 2011-09-27 01:25:09 UTC
(In reply to comment #39)
> If you don't want to break libffi ABI but still handle return type in a
> platform agnostic way, I'd suggest adding some function like
> 
> ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void
> *rvalue_out);

An interesting idea, although I'm not sure it's necessary.  There's nothing platform specific about casting the ffi_arg result to the shorter type that you want, although it's usually not even necessary to do that.

Comment 41 Benjamin Herrenschmidt 2011-09-27 01:39:11 UTC
Anyhow, looks like we have to use ffi_arg, it's unfixable, too many things already using it that way.

On the other hand, quite a few users of libffi around seem to basically pass the pointer to their native type along just like glib does, tho it's hard to tell with a simple google code search. You may want to double check the various JNIs etc... out there.

This is such a horribly fragile design decision, so error prone, it's really sad. This is the kind of stuff that people -will- get wrong over and over again and will lead to subtle memory corruption bugs (since the problem is a lot less visible on LE, at least on BE we see the bad return value immediately).

If you meant to always get an ffi_arg for scalar type, then you shouldn't have made the return argument a void * but some kind of union or something a tad more strongly typed.

What made you ever think that doing return values differently than arguments was ever a good idea ?

Comment 42 Dan Williams 2011-09-27 01:48:29 UTC

(In reply to comment #40)
> (In reply to comment #39)
> > If you don't want to break libffi ABI but still handle return type in a
> > platform agnostic way, I'd suggest adding some function like
> > 
> > ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void
> > *rvalue_out);
> 
> An interesting idea, although I'm not sure it's necessary.  There's nothing
> platform specific about casting the ffi_arg result to the shorter type that you
> want, although it's usually not even necessary to do that.

For sake of argument:

int foo;
void *rvalue;

rtype = &ffi_type_sint;
rvalue = alloca (ffi_type_sint.size);
ffi_prep_cif (&cif, FFI_DEFAULT_ABI, n_args, rtype, ...);
ffi_call (&cif, callback, rvalue, ...);

Well, the problem is on ppc64 (and maybe any 64-bit BE platform) that casting directly doesn't work.  To get the right value out of an ffi_type_sint return value, you need to:

foo = (int) (*(long*) rvalue);

instead of what you'd expect, which would be:

foo = *(int *) rvalue;

since on ppc64 the second one (the direct cast) returns the wrong 32 bits of the 64-bit value from ffi_call().  Or am I missing something here?

Comment 43 Dan Williams 2011-09-27 01:56:10 UTC
(In reply to comment #42)
> 
> (In reply to comment #40)
> > (In reply to comment #39)
> > > If you don't want to break libffi ABI but still handle return type in a
> > > platform agnostic way, I'd suggest adding some function like
> > > 
> > > ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void
> > > *rvalue_out);
> > 
> > An interesting idea, although I'm not sure it's necessary.  There's nothing
> > platform specific about casting the ffi_arg result to the shorter type that you
> > want, although it's usually not even necessary to do that.
> 
> For sake of argument:
> 
> int foo;
> void *rvalue;
> 
> rtype = &ffi_type_sint;
> rvalue = alloca (ffi_type_sint.size);
> ffi_prep_cif (&cif, FFI_DEFAULT_ABI, n_args, rtype, ...);
> ffi_call (&cif, callback, rvalue, ...);
> 
> Well, the problem is on ppc64 (and maybe any 64-bit BE platform) that casting
> directly doesn't work.  To get the right value out of an ffi_type_sint return
> value, you need to:
> 
> foo = (int) (*(long*) rvalue);
> 
> instead of what you'd expect, which would be:
> 
> int foo = (int) rvalue;
> 
> since on ppc64 the second one (the direct cast) returns the wrong 32 bits of
> the 64-bit value from ffi_call().  Or am I missing something here?

so benh says using ffi_arg for rvalue instead of void would do the trick and then we could directly cast.  So nevermind this.

Comment 44 Benjamin Herrenschmidt 2011-09-27 02:15:36 UTC
So Anthony, correct me if I'm wrong, but what that means is that
anything that's "bridging" two types interface, ie glib vs. ffi
java vs. ffi, js vs. ffi, smalltalk vs. ffi (to take a few examples
I picked with google) must do something along those lines:

 Is return type scalar ?

    Yes : use a pointer to a temp ffi_arg
    No  : use a pointer to the original return type

 Do the call

 Was return type scalar ?

    Yes : Do some kind of giant switch/case that appropriately casts
          that ffi_arg to the appropriate original type

From what I can tell, the various bridges i've seen around don't do that,
so it's VERY possible that I've not totally grasped what they do (they are
all quite convoluted code and all seem to manipulate language implementation
specific types that I don't know about) but it looks like they also all
do something akin to what glib did (which is broken) which is to essentially
pass a pointer to the original type to ffi_call and convert the type itself
to the appropriate ffi_type, just like for arguments.

If I'm right, that means that the majority of your users out there are wrong
and happily silently corrupting memory beyond the return value on x86_64
(or getting the return value wrong on ppc64/sparc64/mips64).

I would VERY STRONGLY suggest that you revisit this and provide a new variant
of ffi_call that converts the return values so that the pointer passed by the caller corresponds precisely to the type requested, just like arguments, and then start the process of deprecating the current variant of ffi_call.

The current situation as far as I can tell is just way too error prone and boils down to bad engineering.

Comment 45 Benjamin Herrenschmidt 2011-09-27 02:18:38 UTC
Dan: Won't work on x86, double won't fit 64-bit, unless ffi_arg is big enough to hold it there.

I'm afraid you'll really have to special case scalar vs. non-scalar.

I don't have the bandwidth for that, but somebody should really audit all users of libffi, that stuff smells real bad.

Comment 46 Anthony Green 2011-09-27 03:12:01 UTC
(In reply to comment #41)
> On the other hand, quite a few users of libffi around seem to basically pass
> the pointer to their native type along just like glib does, tho it's hard to
> tell with a simple google code search. You may want to double check the various
> JNIs etc... out there.

I'll write something up for the libffi-discuss list.

> If you meant to always get an ffi_arg for scalar type, then you shouldn't have
> made the return argument a void * but some kind of union or something a tad
> more strongly typed.
> 
> What made you ever think that doing return values differently than arguments
> was ever a good idea ?

I don't recall.  It was 15 years ago.

Comment 47 Benjamin Herrenschmidt 2011-09-27 03:18:08 UTC
Fair enough, I did stupid stuff 15 years ago too :-) Still, I believe it would be useful to introduce a new variant of ffi_call that does argument conversion.

In the end, the callers will have to do it if libffi doesn't do it and I'd rather see that sort of stuff done right in one place than with 5 different bugs in 5 different places.

Comment 48 Anthony Green 2011-09-27 03:41:03 UTC
(In reply to comment #44)
> So Anthony, correct me if I'm wrong, but what that means is that
> anything that's "bridging" two types interface, ie glib vs. ffi
> java vs. ffi, js vs. ffi, smalltalk vs. ffi (to take a few examples
> I picked with google) must do something along those lines:
> 
>  Is return type scalar ?
> 
>     Yes : use a pointer to a temp ffi_arg
>     No  : use a pointer to the original return type
> 
>  Do the call
> 
>  Was return type scalar ?
> 
>     Yes : Do some kind of giant switch/case that appropriately casts
>           that ffi_arg to the appropriate original type

Essentially, yes.

> From what I can tell, the various bridges i've seen around don't do that,

I just checked python (ctypes), JNA, Ruby FFI and gcj and they all get it
right.

This would be an interesting check for one of the static analysis tools out
there to add.

> I would VERY STRONGLY suggest that you revisit this and provide a new variant
> of ffi_call that converts the return values so that the pointer passed by the
> caller corresponds precisely to the type requested, just like arguments, and
> then start the process of deprecating the current variant of ffi_call.

Fair enough.  This may be of value, since many users have to go through this
manual step.  As I mentioned, I'll send a note to libffi-discuss.  Patches are
also welcome, of course.

Some of the reasoning for this decision is coming back to me.  The thought was
that some users will immediately want to promote the result back, so leaving
those results in their promoted sizes would be a win in those cases.  Or
something like that.  15 years is a long time!

Comment 49 Benjamin Herrenschmidt 2011-09-27 04:32:36 UTC
> Some of the reasoning for this decision is coming back to me.  The thought was
> that some users will immediately want to promote the result back, so leaving
> those results in their promoted sizes would be a win in those cases.  Or
> something like that.  15 years is a long time!

But such users could just have requested a long and be done with it :-)

No biggie, I'm happy you managed to parse JNA code, I gave up :-)

One I haven't managed to get my head around is

http://code.google.com/p/gpsee/source/browse/modules/gffi/CFunction.c?spec=svn312dc94085ad58a411b4b04281f9dbe4667e6b41&r=312dc94085ad58a411b4b04281f9dbe4667e6b41

There was another one (smalltalk related) but can't seem to find it anymore,
and of course glib :-)

I think Dalvik (android) has it wrong. I didn't find the canonical repo from google but there's a few offshoots here or there, here's one of them:

http://gitorious.org/0xdroid/dalvik/blobs/master/vm/arch/generic/Call.c

The uint32 case will be wrong here as far as I can tell

I will be away from friday for a few weeks, I don't have time to help much
more with this, sorry about that, hopefully Dan will be able to sort out glib
and we'll see what else eventually hits

Comment 50 IBM Bug Proxy 2011-09-27 21:33:53 UTC
Phil and I were looking at this today and the various patches didn't seem to allow glib2 to pass tests.  Are we still contemplating our next move?

Comment 51 Brent Baude 2011-09-27 21:36:19 UTC
Previous comment was by Brent (firefox seemed to hold my bugproxy-ness).  This bug(s) is keeping us from alpha on ppc64.

Comment 52 Benjamin Herrenschmidt 2011-09-27 21:38:11 UTC
Right, Dan needs to do a new patch for the return values, are there other tests that fail elsewhere ?

Comment 53 Phil Knirsch 2011-09-28 11:25:41 UTC
Quick update from Karsten via phone today:

He managed to compose new images yesterday with Ben's fix in libffi and the ones that Dan mentioned need to be in glib (back out one of the "wrong" patches).

The second marshaller test still fails now, but the installer now successfully works with NM and NM doesn't need to be manually restarted in the installer anymore!

IIRC Dan also mentioned he tested the libffi change on x86_64 to see whether it breaks things there, and at least according to his tests it didn't.

In any case, please retest with the latest images available from:

http://ppc.koji.fedoraproject.org/scratch/karsten/iso/Fedora-20110928-ppc64-DVD.iso

If those work we'll use the fixes for now, at least for PPC64.

Thanks & regards, Phil

Comment 54 Dan Williams 2011-09-28 15:06:32 UTC
Unfortunately I need to get back to the networking stuff I've been ignoring the last few days trying to track this down, so we need to get the glib people to work on this instead, since glib is where the bug actually is (well, and libffi's odd API but whatever).  So either Matthias or Colin Walters need to take this bug over at this point, now that we've diagnosed what the issue is.  I'm happy to grab and test whatever fixes they come up with on power02.str since that wouldn't take a large amount of time but I can't spend a ton of time on this bug for a bit...

Comment 55 Dan Williams 2011-09-28 19:25:53 UTC
My recommendation is to back out the libffi patch since it's not sanctioned upstream, and apply the attached patch to glib2 builds, pending approval by glib maintainers.

Comment 56 Dan Williams 2011-09-28 19:26:28 UTC
RFC/proposed glib patch to address marshalling issues:

http://bugzilla-attachments.gnome.org/attachment.cgi?id=197698

Comment 57 Brent Baude 2011-09-28 19:34:37 UTC
It was my understanding that the glib2 patch alone was not enough to make NM work.  Do others recall this as well?

Comment 58 Benjamin Herrenschmidt 2011-09-28 22:18:31 UTC
I would recommend renaming value_to_ffi_type to rvalue_to_ffi_type : added the "r" to make it clear that this function is only to be used on a return value, maybe add a comment explaining why we do that too, ie:

 /* ffi_call return value for scalars is always an ffi_arg, regardless
  * of the requested type, so we need to convert it appropriately
  */

Comment 59 Benjamin Herrenschmidt 2011-09-28 22:20:06 UTC
Brent: I think the previous glib patch wasn't enough as it didn't handle the return value problem, the new patch should make it work.

Do we still have failures in glib2 tests ? Is there another case that isn't handled properly ?

Comment 60 Dan Williams 2011-09-29 15:07:25 UTC
(In reply to comment #57)
> It was my understanding that the glib2 patch alone was not enough to make NM
> work.  Do others recall this as well?

No, the glib2 patch alone should make NM work.  The original problem was the failure of glib to properly marshal enums, which NM was depending on for inotify events, which in turn would allow NM to detect that anaconda had updated the ifcfg file.

Comment 61 Dan Williams 2011-09-29 15:08:51 UTC
(In reply to comment #59)
> Brent: I think the previous glib patch wasn't enough as it didn't handle the
> return value problem, the new patch should make it work.
> 
> Do we still have failures in glib2 tests ? Is there another case that isn't
> handled properly ?

The linked patch passes the marshalling tests that we added to glib to expose this bug.  AFAIK there is no other case that we care about that fails.  We do need better testcase coverage of all the other return types, but given the changes in this patch those *should* work (but wouldn't hurt to verify).

Comment 62 Dan Williams 2011-10-05 19:12:49 UTC
The patch has been committed to upstream glib.

Comment 63 Fedora Update System 2011-10-05 20:22:16 UTC
glib2-2.30.0-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/glib2-2.30.0-2.fc16

Comment 64 Fedora Update System 2011-10-06 21:22:24 UTC
Package glib2-2.30.0-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glib2-2.30.0-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-13892
then log in and leave karma (feedback).

Comment 65 Fedora Update System 2011-10-11 03:19:11 UTC
glib2-2.30.0-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 66 Wes Garland 2014-05-26 18:22:22 UTC
If you ever want to try and get your head around the gffi module mentioned in comment 49, please let me know. Sorry about the readability. I expect it to be relatively future-safe and portable across SUS v3 (and Linux) providing the libffi API does not change.  It was developed on 3.0.8 and we currently use 3.0.10.  Tested on x86_64, x86, 32-bit sparc v9.  Mac, Linux, Solaris.

I wrote that code several years ago, but I do recall having to handle type marshalling differently for return values and arguments, and being puzzled by that.

IIRC, the marshalling converts between JavaScript values and appropriate C types, then converts between the C types and the FFI types.  ISTR int-like values are categorized by examining the storage size and signedness.

The type converters (marshalling functions) are decided when the JS CFunction class is instantiated, based on the function arguments passed in the constructor. The type converter which is actually invoked for a given argument is stored as a function pointer in the argConverters array, which is unique per instance of the JS CFunction class.

