bash-4.2# cat << __EOF__ > /etc/sysconfig/network-scripts/ifcfg-eth0 > DEVICE="eth0" > NM_CONTROLLED="yes" > HWADDR="2A:0F:4A:82:98:04" > ONBOOT="yes" > BOOTPROTO=dhcp > __EOF__ bash-4.2# [ 131.676275] RTAS: event: 35, Type: Platform Information Event, Severity: 1 [ 139.675007] RTAS: event: 36, Type: Platform Information Event, Severity: 1 [ 147.671431] RTAS: event: 37, Type: Platform Information Event, Severity: 1 bash-4.2# /bin/systemctl start NetworkManager.service [ 151.730553] systemd[1]: Accepted connection on private bus. [ 151.730589] systemd[1]: Running GC... [ 151.733642] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1 [ 151.733718] systemd[1]: Trying to enqueue job NetworkManager.service/start/replace [ 151.733899] systemd[1]: Installed new job NetworkManager.service/start as 56 [ 151.733918] systemd[1]: Installed new job network.target/start as 99 [ 151.733930] systemd[1]: Installed new job arp-ethers.service/start as 100 [ 151.733944] systemd[1]: Enqueued job NetworkManager.service/start as 56 [ 151.734096] systemd[1]: Starting of arp-ethers.service requested but condition failed. Ignoring. [ 151.734115] systemd[1]: Job arp-ethers.service/start finished, result=done [ 151.734267] systemd[1]: About to execute: /usr/sbin/NetworkManager --no-daemon [ 151.763782] systemd[1]: Forked /usr/sbin/NetworkManager as 615 [ 151.764048] systemd[1]: NetworkManager.service changed dead -> start [ 151.764247] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1 [ 151.764480] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/NetworkManager_2eservice [ 152.200371] systemd[1]: Incoming traffic on syslog.socket [ 152.200407] NetworkManager[615]: <info> NetworkManager (version 0.8.9997-7.git20110721.fc16) is starting... 
[ 152.200430] systemd[1]: syslog.socket changed listening -> running [ 152.200436] NetworkManager[615]: <info> Read config file /etc/NetworkManager/NetworkManager.conf [ 152.201788] systemd[1]: Incoming traffic on dbus.socket [ 152.201811] systemd[1]: Trying to enqueue job dbus.service/start/replace [ 152.201944] systemd[1]: Installed new job dbus.service/start as 101 [ 152.201955] systemd[1]: Enqueued job dbus.service/start as 101 [ 152.201976] systemd[1]: dbus.socket changed listening -> running [ 152.202030] systemd[1]: About to execute: /bin/dbus-uuidgen --ensure [ 152.233788] systemd[1]: Forked /bin/dbus-uuidgen as 616 [ 152.233870] systemd[1]: dbus.service changed dead -> start-pre [ 152.242162] systemd[1]: Received SIGCHLD from PID 616 (dbus-uuidgen). [ 152.242260] systemd[1]: Got SIGCHLD for process 616 (dbus-uuidgen) [ 152.242572] systemd[1]: Child 616 died (code=exited, status=0/SUCCESS) [ 152.242582] systemd[1]: Child 616 belongs to dbus.service [ 152.242595] systemd[1]: dbus.service: control process exited, code=exited status=0 [ 152.242606] systemd[1]: dbus.service running next control command for state start-pre [ 152.242629] systemd[1]: About to execute: /bin/rm -f /var/run/messagebus.pid [ 152.273875] systemd[1]: Forked /bin/rm as 618 [ 152.274097] systemd[1]: Accepted connection on private bus. [ 152.274443] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent [ 152.274776] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local [ 152.277556] systemd[1]: Received SIGCHLD from PID 618 (rm). 
[ 152.277660] systemd[1]: Got SIGCHLD for process 618 (rm) [ 152.277989] systemd[1]: Child 618 died (code=exited, status=0/SUCCESS) [ 152.278000] systemd[1]: Child 618 belongs to dbus.service [ 152.278012] systemd[1]: dbus.service: control process exited, code=exited status=0 [ 152.278025] systemd[1]: dbus.service got final SIGCHLD for state start-pre [ 152.278102] systemd[1]: About to execute: /bin/dbus-daemon --system --address=systemd: --nofork --systemd-activation [ 152.313908] systemd[1]: Forked /bin/dbus-daemon as 620 [ 152.314324] systemd[1]: dbus.service changed start-pre -> running [ 152.314357] systemd[1]: Job dbus.service/start finished, result=done [ 152.430768] systemd[1]: Successfully connected to system D-Bus bus 44edc23d2a7e479f6be2414100000098 as :1.0 [ 152.432543] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameAcquired() on /org/freedesktop/DBus [ 152.432722] systemd[1]: Accepted connection on private bus. [ 152.432958] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 152.432988] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameAcquired() on /org/freedesktop/DBus [ 152.433019] systemd[1]: Successfully acquired name. 
[ 152.433452] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent [ 152.433939] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local [ 152.525020] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 152.526937] dbus[620]: [system] Activating service name='org.freedesktop.PolicyKit1' (using servicehelper) [ 152.562077] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 152.572014] polkitd[623]: started daemon version 0.101 using authority implementation `local' version `0.101' [ 152.572604] dbus[620]: [system] Successfully activated service 'org.freedesktop.PolicyKit1' [ 152.572873] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 152.612887] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 152.613348] NetworkManager[615]: ifcfg-rh: Acquired D-Bus service com.redhat.ifcfgrh1 [ 152.613376] NetworkManager[615]: <info> Loaded plugin ifcfg-rh: (c) 2007 - 2010 Red Hat, Inc. To report bugs please use the NetworkManager mailing list. [ 152.613672] NetworkManager[615]: <info> Loaded plugin keyfile: (c) 2007 - 2010 Red Hat, Inc. To report bugs please use the NetworkManager mailing list. [ 152.614035] NetworkManager[615]: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ... [ 152.720738] NetworkManager[615]: ifcfg-rh: read connection 'System eth0' [ 152.720809] NetworkManager[615]: ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ... [ 152.722363] NetworkManager[615]: <info> trying to start the modem manager... [ 152.722875] dbus[620]: [system] Activating service name='org.freedesktop.ModemManager' (using servicehelper) [ 152.726514] NetworkManager[615]: <info> monitoring kernel firmware directory '/lib/firmware'. 
[ 152.729666] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 152.729731] systemd[1]: NetworkManager.service's D-Bus name org.freedesktop.NetworkManager now registered by :1.1 [ 152.730052] systemd[1]: NetworkManager.service changed start -> running [ 152.730072] systemd[1]: Job NetworkManager.service/start finished, result=done [ 152.730412] systemd[1]: network.target changed dead -> active [ 152.730434] systemd[1]: Job network.target/start finished, result=done [ 152.730541] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local [ 152.730740] NetworkManager[615]: <info> WiFi enabled by radio killswitch; enabled by state file [ 152.730770] NetworkManager[615]: <info> WWAN enabled by radio killswitch; enabled by state file [ 152.730792] NetworkManager[615]: <info> WiMAX enabled by radio killswitch; enabled by state file [ 152.730811] NetworkManager[615]: <info> Networking is enabled by state file [ 152.731689] dbus[620]: [system] Activated service 'org.freedesktop.ModemManager' failed: Cannot launch daemon, file not found or permissions invalid [ 152.734026] NetworkManager[615]: <error> [1314825991.163488] [nm-device-ethernet.c:751] real_update_permanent_hw_address(): (eth0): unable to read permanent MAC address (error 0) [ 152.735800] NetworkManager[615]: <info> (eth0): carrier is OFF [ 152.736037] NetworkManager[615]: <info> (eth0): new Ethernet device (driver: 'ibmveth' ifindex: 2) [ 152.736057] NetworkManager[615]: <info> (eth0): exported as /org/freedesktop/NetworkManager/Devices/0 [ 152.736414] NetworkManager[615]: <info> (eth0): now managed [ 152.736434] NetworkManager[615]: <info> (eth0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2] [ 152.736530] NetworkManager[615]: <info> (eth0): bringing up device. [ 152.753302] NetworkManager[615]: <info> (eth0): preparing device. 
[ 152.753320] NetworkManager[615]: <info> (eth0): deactivating device (reason: 2). [ 152.755623] NetworkManager[615]: <info> (eth0): carrier now ON (device state 20) [ 152.755646] NetworkManager[615]: <info> (eth0): device state change: unavailable -> disconnected (reason 'carrier-changed') [20 30 40] [ 152.756250] NetworkManager[615]: <warn> bluez error getting default adapter: The name org.bluez was not provided by any .service files [ 152.756837] NetworkManager[615]: <info> Auto-activating connection 'System eth0'. [ 152.756937] NetworkManager[615]: <info> Activation (eth0) starting connection 'System eth0' [ 152.756954] NetworkManager[615]: <info> (eth0): device state change: disconnected -> prepare (reason 'none') [30 40 0] [ 152.757093] NetworkManager[615]: <info> Activation (eth0) Stage 1 of 5 (Device Prepare) scheduled... [ 152.757289] NetworkManager[615]: <info> Activation (eth0) Stage 1 of 5 (Device Prepare) started... [ 152.757310] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) scheduled... [ 152.757330] NetworkManager[615]: <info> Activation (eth0) Stage 1 of 5 (Device Prepare) complete. [ 152.757347] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) starting... [ 152.757393] NetworkManager[615]: <info> (eth0): device state change: prepare -> config (reason 'none') [40 50 0] [ 152.757622] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) successful. [ 152.757643] NetworkManager[615]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) scheduled. [ 152.757685] NetworkManager[615]: <info> Activation (eth0) Stage 2 of 5 (Device Configure) complete. [ 152.757748] NetworkManager[615]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) started... 
[ 152.757768] NetworkManager[615]: <info> (eth0): device state change: config -> ip-config (reason 'none') [50 70 0] [ 152.758104] NetworkManager[615]: <info> Activation (eth0) Beginning DHCPv4 transaction (timeout in 45 seconds) [ 152.768800] NetworkManager[615]: <info> dhclient started with pid 628 [ 152.769346] NetworkManager[615]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) complete. bash-4.2# [ 153.163692] dhclient[628]: Internet Systems Consortium DHCP Client 4.2.2 [ 153.163751] dhclient[628]: Copyright 2004-2011 Internet Systems Consortium. [ 153.163774] dhclient[628]: All rights reserved. [ 153.163793] dhclient[628]: For info, please visit https://www.isc.org/software/dhcp/ [ 153.191386] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.193510] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.193994] NetworkManager[615]: <info> (eth0): DHCPv4 state changed nbi -> preinit [ 153.195386] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.195593] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.212030] dhclient[628]: Listening on LPF/eth0/2a:0f:4a:82:98:04 [ 153.212101] dhclient[628]: Sending on LPF/eth0/2a:0f:4a:82:98:04 [ 153.212283] dhclient[628]: Sending on Socket/fallback [ 153.212429] dhclient[628]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7 [ 153.212498] ibmveth 30000004: DMA-API: device driver frees DMA memory with wrong function [device address=0x0000000003040490] [size=342 bytes] [mapped as single] [unmapped as page] [ 153.212528] ------------[ cut here ]------------ [ 153.212533] WARNING: at lib/dma-debug.c:829 [ 153.212539] Modules linked in: squashfs nls_utf8 ibmvscsic scsi_transport_srp ibmveth scsi_tgt [ 153.212558] NIP: c000000000343fb0 LR: c000000000343fac CTR: c000000000068324 [ 153.212567] REGS: 
c000000272153090 TRAP: 0700 Not tainted (3.0.1-5.fc16.kh.ppc64) [ 153.212574] MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 48222482 XER: 00000009 [ 153.212592] TASK = c0000002710d4e80[628] 'dhclient' THREAD: c000000272150000 CPU: 0 [ 153.212600] GPR00: c000000000343fac c000000272153310 c00000000141a7b8 00000000000000bb [ 153.212613] GPR04: 0000000000000001 c0000000000ac328 0000000000000000 0000000000000002 [ 153.212625] GPR08: 0000000000000000 c0000002710d4e80 0000000000017f20 0000000000000001 [ 153.212638] GPR12: 0000000084222442 c00000000ee54000 0000000000000000 c000000271318a00 [ 153.212651] GPR16: 0000000000000000 0000000020801988 0000000003040490 0000000000000020 [ 153.212664] GPR20: 0000000000000e60 0000000000000000 c0000002753ac278 0000000003040490 [ 153.212677] GPR24: 0000000000000156 0000000000000000 0000000000000001 c000000002040d00 [ 153.212690] GPR28: c0000002760ec420 c000000272153440 c000000001399b88 c000000272153310 [ 153.212710] NIP [c000000000343fb0] .check_unmap+0x3dc/0x77c [ 153.212718] LR [c000000000343fac] .check_unmap+0x3d8/0x77c [ 153.212724] Call Trace: [ 153.212728] [c000000272153310] [c000000000343fac] .check_unmap+0x3d8/0x77c (unreliable) [ 153.212738] [c0000002721533d0] [c0000000003445a4] .debug_dma_unmap_page+0x78/0x80 [ 153.212750] [c000000272153510] [d0000000048ce194] .ibmveth_start_xmit+0x53c/0x67c [ibmveth] [ 153.212760] [c000000272153640] [c0000000005854fc] .dev_hard_start_xmit+0x5a8/0x7e8 [ 153.212769] [c000000272153740] [c0000000005a6104] .sch_direct_xmit+0x7c/0x278 [ 153.212777] [c0000002721537f0] [c000000000585e84] .dev_queue_xmit+0x748/0xa48 [ 153.212787] [c0000002721538b0] [c00000000067caf8] .packet_sendmsg+0xb54/0xc70 [ 153.212796] [c000000272153a00] [c000000000569168] .sock_aio_write+0x138/0x150 [ 153.212805] [c000000272153b40] [c0000000001cf95c] .do_sync_write+0xa8/0xe4 [ 153.212813] [c000000272153cc0] [c0000000001d00cc] .vfs_write+0xe4/0x188 [ 153.212822] [c000000272153d70] [c0000000001d03d8] .SyS_write+0x58/0x88 [ 
153.212831] [c000000272153e30] [c000000000009928] syscall_exit+0x0/0x40 [ 153.212838] Instruction dump: [ 153.212843] e97c001a e93e8048 e87e80e0 e8dd0028 e81d001a e8fd0030 796b1f24 78001f24 [ 153.212857] 7d09582a 7d29002a 4835d5b9 60000000 <0fe00000> 480000c4 2f800003 409e0100 [ 153.212873] ---[ end trace 23ebc7de1702caf3 ]--- [ 153.212878] Mapped at: [ 153.212882] [<c000000000344b90>] .debug_dma_map_page+0x9c/0x1c0 [ 153.212889] [<d0000000048cdf18>] .ibmveth_start_xmit+0x2c0/0x67c [ibmveth] [ 153.212897] [<c0000000005854fc>] .dev_hard_start_xmit+0x5a8/0x7e8 [ 153.212905] [<c0000000005a6104>] .sch_direct_xmit+0x7c/0x278 [ 153.212912] [<c000000000585e84>] .dev_queue_xmit+0x748/0xa48 [ 153.213305] dhclient[628]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 [ 153.213384] dhclient[628]: DHCPOFFER from 9.5.250.185 [ 153.213653] dhclient[628]: DHCPACK from 9.5.250.185 [ 153.221422] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.222206] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.223081] NetworkManager[615]: <info> (eth0): DHCPv4 state changed preinit -> bound [ 153.223121] NetworkManager[615]: <info> Activation (eth0) Stage 4 of 5 (IP4 Configure Get) scheduled... [ 153.223220] NetworkManager[615]: <info> Activation (eth0) Stage 4 of 5 (IP4 Configure Get) started... [ 153.223472] NetworkManager[615]: <info> address 9.5.250.146 [ 153.223544] NetworkManager[615]: <info> prefix 24 (255.255.255.0) [ 153.223573] NetworkManager[615]: <info> gateway 9.5.250.1 [ 153.223600] NetworkManager[615]: <info> nameserver '9.10.244.100' [ 153.223671] NetworkManager[615]: <info> nameserver '9.10.244.200' [ 153.223699] NetworkManager[615]: <info> domain name 'rchland.ibm.com' [ 153.224288] NetworkManager[615]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) scheduled... 
[ 153.224333] NetworkManager[615]: <info> Activation (eth0) Stage 4 of 5 (IP4 Configure Get) complete. [ 153.224566] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.224838] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 153.225977] NetworkManager[615]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) started... [ 153.226040] dhclient[628]: bound to 9.5.250.146 -- renewal in 268 seconds. [ 154.227957] NetworkManager[615]: <info> (eth0): device state change: ip-config -> activated (reason 'none') [70 100 0] [ 154.229133] NetworkManager[615]: <info> Policy set 'System eth0' (eth0) as default for IPv4 routing and DNS. [ 154.229160] NetworkManager[615]: <info> Activation (eth0) successful, device activated. [ 154.230229] NetworkManager[615]: <info> Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete. [ 154.230481] dbus[620]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper) [ 154.263691] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 154.265094] dbus[620]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher' [ 154.265330] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus [ 154.276050] nm-dispatcher.action[633]: Script '/etc/NetworkManager/dispatcher.d/00-netreport' exited with error status 1. [ 155.667990] RTAS: event: 38, Type: Platform Information Event, Severity: 1 [ 163.662767] RTAS: event: 39, Type: Platform Information Event, Severity: 1 [ 165.011042] systemd[1]: Received SIGCHLD from PID 633 (nm-dispatcher.a). [ 165.011221] systemd[1]: Got SIGCHLD for process 633 (nm-dispatcher.a) [ 165.011491] systemd[1]: Child 633 died (code=exited, status=0/SUCCESS) [ 165.011512] systemd[1]: Running GC... 
[ 165.011933] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[ 165.011965] systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
[ 171.659875] RTAS: event: 40, Type: Platform Information Event, Severity: 1
bash-4.2# [ 179.656435] RTAS: event: 41, Type: Platform Information Event, Severity: 1
ifconfig
eth0      Link encap:Ethernet HWaddr 2A:0F:4A:82:98:04
          inet addr:9.5.250.146 Bcast:9.5.250.255 Mask:255.255.255.0
          inet6 addr: 2002:905:150e:302:280f:4aff:fe82:9804/64 Scope:Global
          inet6 addr: fe80::280f:4aff:fe82:9804/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:362 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31283 (30.5 KiB) TX bytes:1090 (1.0 KiB)
          Interrupt:20

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
I can bring the network up in a bash session, but anaconda cannot. We thought it might be a virtual ethernet problem, so we installed a physical ethernet card in the system. It still fails.
I see no error from NetworkManager; the log shows a successful connection activation. However, there is a memory access error in the ibmveth driver, so I am reassigning to kernel.
The error you are talking about already has a bug assigned to it: bug 733766. I am able to bring the network up successfully from the command line using ifup. I cannot bring the network up using the Anaconda installer (which uses the NetworkManager libraries).
Created attachment 915366 [details] Comment (This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
Created attachment 915367 [details] Comment (This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).
Is there any chance we can get anaconda output for this error? Based on these dmesg logs, it appears as though NetworkManager is correctly configuring the device, and that the device does indeed have IP connectivity (as shown by the resulting ifconfig output). But the bug title indicates that anaconda is saying that the network was not correctly configured? Is that correct? Also, what does 'route -n' say?
[anaconda root@localhost dev]# ifup eth1
[anaconda root@localhost dev]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         9.5.250.1       0.0.0.0         UG    0      0        0 eth1
9.5.250.0       0.0.0.0         255.255.255.0   U     0      0        0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 eth1
[anaconda root@localhost dev]# /bin/systemctl start loader2.service

This brings up the anaconda installer. I see the text dialog saying "You have multiple network devices on this system. Which would you like to install through?" I select eth1 and then see the following text dialog: "Waiting for NetworkManager to configure eth1." It displays for a while. I then see the dialog to check the media ("To begin testing the media before installation press OK."), which I skip, and it goes back to "Waiting for NetworkManager to configure eth1." After a while it goes to the "Configure TCP/IP" dialog. The network is still up from when I started it before. I can ssh into the box and I only see these logs from anaconda:

[anaconda root@localhost ~]# ls -l /tmp
total 5
-rw-r--r-- 1 root root 3601 Sep 14 19:53 anaconda.log
-rw-r--r-- 1 root root   98 Sep 14 19:51 program.log
[anaconda root@localhost ~]# cat /tmp/program.log
19:51:13,711 INFO program: Running... /bin/mount -n -t iso9660 -o ro /dev/sr0 /mnt/install/source
[anaconda root@localhost ~]# cat /tmp/anaconda.log
19:48:39,815 INFO loader: kernel command line:
19:48:39,815 INFO loader: vnc=1
19:48:39,815 INFO loader: vnc forced graphical mode from cmdline
19:48:39,815 INFO loader: early networking required for vnc
19:48:39,815 INFO loader: ro
19:48:39,815 INFO loader: selinux=0
19:48:39,815 INFO loader: systemd.log_target=kmsg
19:48:39,815 INFO loader: systemd.log_level=debug
19:48:39,815 INFO loader: rd.break
19:48:39,815 INFO loader: serial
19:48:39,815 INFO loader: root=live:CDLABEL=Fedora-20110907-ppc64-DVD
19:48:39,817 DEBUG loader: readNetInfo /tmp/s390net not found, early return
19:48:39,831 INFO loader: anaconda version 16.16 on ppc64 starting
19:48:39,832 INFO loader: set console to /dev/hvc0 at 108
19:48:39,850 INFO loader: 8519680 kB (8320 MB) are available
19:48:39,850 INFO loader: 8519680 kB (8320 MB) are available
19:48:40,106 DEBUG loader: going to set language to en_US.UTF-8
19:48:40,106 DEBUG loader: locale en_US.UTF-8: base: en_US, mod: (null), charset: UTF-8
19:48:40,107 INFO loader: going to prepare locales for en_US.UTF-8 (locale: en_US, charset: UTF-8)
19:48:41,266 INFO loader: setting language to en_US.UTF-8
19:48:41,269 DEBUG loader: Saving module scsi_dh_rdac
19:48:41,269 DEBUG loader: Saving module scsi_dh_hp_sw
19:48:41,269 DEBUG loader: Saving module scsi_dh_alua
19:48:41,269 DEBUG loader: Saving module scsi_dh_emc
19:48:41,269 DEBUG loader: Saving module iscsi_tcp
19:48:41,269 DEBUG loader: Saving module libiscsi_tcp
19:48:41,269 DEBUG loader: Saving module libiscsi
19:48:41,269 DEBUG loader: Saving module scsi_transport_iscsi
19:48:41,269 DEBUG loader: Saving module cramfs
19:48:41,269 DEBUG loader: Saving module squashfs
19:48:41,269 DEBUG loader: Saving module nls_utf8
19:48:41,270 DEBUG loader: Saving module ibmvscsic
19:48:41,270 DEBUG loader: Saving module scsi_transport_srp
19:48:41,270 DEBUG loader: Saving module scsi_tgt
19:48:41,270 DEBUG loader: Saving module e1000e
19:48:41,270 DEBUG loader: probing buses
19:48:41,411 DEBUG loader: waiting for hardware to initialize
19:48:47,870 INFO loader: restarting NetworkManager
19:48:48,418 INFO loader: No iBFT table detected.
19:50:28,662 INFO loader: doing kickstart... setting it up
19:50:28,664 DEBUG loader: activating device eth1
19:51:13,704 ERR loader: failed to configure network interface
19:51:13,704 ERR loader: unable to activate device eth1
19:51:13,706 INFO loader: trying to mount CD device /dev/sr0 on /mnt/install/source
19:51:13,707 INFO loader: drive status is CDS_DISC_OK
19:51:13,719 DEBUG loader: Found installation media, so skipping lang and kbd
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_LANG
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_KBD
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_METHOD
19:53:03,928 DEBUG loader: in doLoaderMain, step = STEP_DRIVER
19:53:03,930 DEBUG loader: in doLoaderMain, step = STEP_NETWORK
19:53:03,965 INFO loader: need to set up networking
19:53:03,966 DEBUG loader: in doLoaderMain, step = STEP_IP
19:53:03,966 DEBUG loader: in doLoaderMain, calling setupIfaceStruct()
19:53:03,966 DEBUG loader: in doLoaderMain, calling readNetConfig()
19:53:03,966 INFO loader: doing kickstart... setting it up
19:53:03,967 DEBUG loader: activating device eth1
19:53:49,010 ERR loader: failed to configure network interface
19:53:49,010 DEBUG loader: in STEP_IP, retry (LOADER_ERROR)
19:53:49,010 DEBUG loader: in doLoaderMain, step = STEP_IP
19:53:49,010 DEBUG loader: in doLoaderMain, calling setupIfaceStruct()
19:53:49,010 DEBUG loader: in doLoaderMain, calling readNetConfig()
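A quick check of the timestamps in the anaconda.log above: each "activating device eth1" entry is followed by "failed to configure network interface" roughly 45 seconds later. That may line up with the 45-second DHCPv4 timeout NetworkManager prints in the earlier dmesg output, but that link is an assumption, not something these logs confirm. A minimal Python sketch of the arithmetic (the helper name seconds_between is made up for illustration):

```python
from datetime import datetime

# anaconda.log timestamps look like "19:50:28,664" (HH:MM:SS,milliseconds).
_FMT = "%H:%M:%S,%f"


def seconds_between(start, end):
    """Return the elapsed seconds between two same-day anaconda.log timestamps."""
    delta = datetime.strptime(end, _FMT) - datetime.strptime(start, _FMT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Pairs of (activating device eth1, failed to configure network interface)
    # copied from the log above.
    attempts = [
        ("19:50:28,664", "19:51:13,704"),
        ("19:53:03,967", "19:53:49,010"),
    ]
    for start, end in attempts:
        print(f"{start} -> {end}: {seconds_between(start, end):.3f} s")
```

Both attempts come out at just over 45 seconds, which at least suggests the loader is waiting on a fixed timeout rather than failing immediately.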
I'm confused. At what point here does the error this bug is about occur? At no point should 'ifup eth1' be necessary, and I assume you're doing that because you got the aforementioned error? Why is ifup being run manually at all, and at what point during the install are you running ifup? Basically, what I need here is ifconfig -a and route -n after the error has occurred, but *before* any additional action like 'ifup' has been taken. Otherwise we simply pollute the debugging results. From the logs, it looks like NM is doing everything correctly and setting up the interface. It shouldn't be necessary to run 'ifup' because networking is apparently being set up correctly by NetworkManager already.
Boot Fedora-20110907-ppc64-DVD-respin.iso with the following parameters:

    linux serial vnc=1 selinux=0

You will immediately run into the problem. I use an HMC to access the console. Unfortunately, I cannot access the other windows that anaconda creates. So, I create the network setup files manually and start the network before anaconda runs. This allows me to get into a bash window and poke around.
http://ppc.koji.fedoraproject.org/scratch/karsten/iso/ is the URL where the isos can be found.
Some observations about logs in comment #5:

1) systemd starts NM
2) NM finds there is no defined connection for eth0 or eth1, and creates some default DHCP connections and starts them up
3) these default connections successfully start at time 99
4) at time 274, NM is told to stop
5) at time 275, NM is told to start again
6) NM finds new config files ifcfg-eth0 and ifcfg-eth1, but these contain NM_CONTROLLED=no so NM does not start any network interfaces
7) at 278, anaconda is supposed to flip NM_CONTROLLED to 'yes' to let NM start the interfaces, as indicated by:

   [ 278.609893] loader[745]: activating device eth1

8) NM never gets an inotify signal indicating that the files have changed, thus it never starts the interfaces. If NM got the inotify signal, we'd expect to see a message like:

   NetworkManager: updating /etc/sysconfig/network-scripts/ifcfg-eth0

One other possibility is that the code in NM's config file parsing for handling NM_CONTROLLED isn't properly parsing the change from 'no' to 'yes'. There were some changes committed around 2011-08-07 to that part of the code, but they were supposed to fix an issue like this. I just checked and verified that flipping NM_CONTROLLED on my machine with the official NM 0.9 release (in F15+) works as expected, so I don't suspect this change yet.
One way to test whether it's anaconda, NM, or the kernel is to:

1) after you get the "failed to configure interface" error, do whatever you need to do to get shell access to the machine's filesystem
2) check /etc/sysconfig/network-scripts/ifcfg-eth0 and ifcfg-eth1 and see what the value of NM_CONTROLLED is
3) change that value to the opposite, i.e. "no" -> "yes" or "yes" -> "no"
4) look at the last 50 lines of syslog or wherever the NM output goes, and see if NM noticed the change to the ifcfg file
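Steps 2 to 4 above could be scripted roughly as follows. This is only a sketch in Python, not something taken from anaconda or NetworkManager: the helper names flip_nm_controlled and nm_noticed_change are hypothetical, appending NM_CONTROLLED=yes when the key is absent is an assumption, and the "updating <path>" text is the message an earlier comment says NM would be expected to log.

```python
import re

# Matches NM_CONTROLLED=yes / NM_CONTROLLED="no" etc.; group 2 keeps the quote style.
_NM_LINE = re.compile(r'^(NM_CONTROLLED=)(["\']?)(yes|no)\2$', re.IGNORECASE)


def flip_nm_controlled(ifcfg_text):
    """Return ifcfg contents with NM_CONTROLLED toggled between yes and no.

    Assumption: if the key is missing, NM_CONTROLLED=yes is appended.
    """
    out = []
    found = False
    for line in ifcfg_text.splitlines():
        match = _NM_LINE.match(line.strip())
        if match:
            found = True
            new_value = "no" if match.group(3).lower() == "yes" else "yes"
            line = f"{match.group(1)}{match.group(2)}{new_value}{match.group(2)}"
        out.append(line)
    if not found:
        out.append("NM_CONTROLLED=yes")
    return "\n".join(out) + "\n"


def nm_noticed_change(log_lines, ifcfg_path, tail=50):
    """Check whether the last `tail` log lines mention NM updating ifcfg_path."""
    needle = "updating " + ifcfg_path
    return any(needle in line for line in log_lines[-tail:])


if __name__ == "__main__":
    before = 'DEVICE="eth0"\nNM_CONTROLLED="yes"\nONBOOT="yes"\nBOOTPROTO=dhcp\n'
    print(flip_nm_controlled(before))
```

On a real system the text would be read from and written back to the ifcfg file, and log_lines would come from wherever the NM output goes; those paths are left out here on purpose.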
/etc/sysconfig/network-scripts/ifcfg-eth0 is empty after NM failed to configure eth0:

DEVICE=eth0
HWADDR=C6:.....
ONBOOT=yes
BOOTPROTO=dhcp
IPV6INIT=yes
IPV6_AUTOCONF=yes

That was with an aborted anaconda; I need to try again and see if I can merely suspend it while making the change.
Before NM runs, /etc/sysconfig/network-scripts/ifcfg-eth0 is:

DEVICE=eth0
HWADDR=2A:....
ONBOOT=yes
BOOTPROTO=dhcp

After it fails to configure, it's:

DEVICE=eth0
HWADDR=2A:....
ONBOOT=yes
BOOTPROTO=dhcp
IPV6INIT=yes
IPV6_AUTOCONF=yes

I added NM_CONTROLLED=yes to /etc/sysconfig/network-scripts/ifcfg-eth0 and hit "Retry" in Anaconda and it just overwrote the file.
Yeah, anaconda will rewrite the file when trying to configure the network, that's expected. But after you manually modified ifcfg-eth0 and added NM_CONTROLLED=yes, did you see anything in syslog that indicated NM reread the file? If not, try changing IPV6INIT to "no" and see if that triggers a reread.
From further debugging with modified NM versions it looks like NM is only getting glib's CHANGED event for file change notifications, and not CHANGES_DONE_HINT. This could be due to changes in either glib's file monitor support or the kernel. One thing I noticed is that glib's virtual changes done hint doesn't appear to be firing. It has a 2-second timer to send the changes done hint but that apparently never happens. A second odd thing is that we'd expect inotify to send an IN_CLOSE_WRITE event when the modified file is closed for writing, and that gets sent through directly by glib, but that's also not happening. So the next step is probably to build a glib2 package with some debug logging and figure out what's going on.
So after much debugging on the part of rangerpb, hamzy, and me, it comes down to libffi as the culprit. glib now uses libffi to magically figure out the marshalling for signal arguments. Apparently that doesn't work correctly on ppc64, as the following log shows:

18:25:31,0 INFO NetworkManager: GLib-GIO-Message: (730) emit_cb (0x100205378b0): path '/etc/sysconfig/network-scripts/ifcfg-eth0' change 0x1002059fa20 event 3 (other <none>)
18:25:31,0 NOTICE NetworkManager: ifcfg-rh: (0x100205378b0) path changed /etc/sysconfig/network-scripts/ifcfg-eth0 (other <none>) event 0

The first line is from glib/gio/gfilemonitor.c's emit_cb() function where it does this:

    g_signal_emit (monitor, signals[CHANGED], 0, change->child, change->other_file, change->event_type);

while the second simply prints out the signal arguments that NetworkManager got. Note that NetworkManager receives an event type of '0', while glib is emitting an event type of 3. That's the core problem here. At the moment we blame libffi, but we need to run both the libffi and glib test suites on a ppc64 box before we can more definitively say where the problem lies.
Just for the record, the glib signal arguments here are:

    signals[CHANGED] =
      g_signal_new (I_("changed"),
                    G_TYPE_FILE_MONITOR,
                    G_SIGNAL_RUN_LAST,
                    G_STRUCT_OFFSET (GFileMonitorClass, changed),
                    NULL, NULL,
                    NULL,
                    G_TYPE_NONE, 3,
                    G_TYPE_FILE, G_TYPE_FILE, G_TYPE_FILE_MONITOR_EVENT);

and the G_TYPE_FILE_MONITOR_EVENT is the one that's apparently not getting marshalled appropriately. That GType just represents an enum/uint32.
Which version of libffi? 3.0.10?
Created attachment 524478 [details]
reduced testcase for marshalling G_TYPE_FILE_MONITOR_EVENT

Build with:

    gcc -Wall -o fmtest `pkg-config --libs --cflags gio-2.0` fmtest.c

This test passes on x86_64 but fails on ppc64.
testcase requires only glib2-devel package installed to build.
libffi-3.0.9-3.fc16.ppc64 libffi-devel-3.0.9-3.fc16.ppc64
Note that I don't think we can rule glib2 out completely yet... still investigating glib's usage of libffi and the internal mapping of types for marshalling.
G_TYPE_ENUM (which is what G_TYPE_FILE_MONITOR_EVENT is derived from) was not correctly mapped to libffi types during marshalling in gclosure.c. Internally it was stored as a 'signed long' but when mapped to libffi types, was retrieved as a 'signed int' leading ppc64 to always return 0 since it presumably got the wrong 32 bits of the union in which the mapped value was held.
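The "wrong 32 bits" effect can be reproduced on any host by laying out a 64-bit value in big-endian byte order by hand and then reading only four bytes of it. This is a minimal sketch and not glib code: the helper names and the 8-byte buffer standing in for the value's storage slot are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write v into buf in big-endian byte order,
 * the way a 64-bit 'long' sits in memory on ppc64. */
void store_be64(uint64_t v, unsigned char buf[8])
{
    assert(buf != NULL);
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Hypothetical helper: read a big-endian 32-bit value starting at p. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read an 'int' at the start of the slot, as a mismatched mapping would:
 * on big-endian these are the high-order bytes of the stored long. */
uint32_t read_as_int_at_slot_start(uint64_t stored_long)
{
    unsigned char slot[8];
    store_be64(stored_long, slot);
    return load_be32(slot);
}

/* Read the low-order half, which is where a small value actually lives. */
uint32_t read_low_half(uint64_t stored_long)
{
    unsigned char slot[8];
    store_be64(stored_long, slot);
    return load_be32(slot + 4);
}
```

With a stored value of 3 (the event type glib emitted), `read_as_int_at_slot_start` yields 0 and `read_low_half` yields 3. On a little-endian host the low-order bytes come first, which is why the same mismatch goes unnoticed on x86_64.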
(In reply to comment #25)
> G_TYPE_ENUM (which is what G_TYPE_FILE_MONITOR_EVENT is derived from) was not
> correctly mapped to libffi types during marshalling in gclosure.c. Internally
> it was stored as a 'signed long' but when mapped to libffi types, was retrieved
> as a 'signed int' leading ppc64 to always return 0 since it presumably got the
> wrong 32 bits of the union in which the mapped value was held.
Great - thanks for tracking this down.
I've built a custom glib2 with upstream commits 1df8160fa675b225809eed2f86d2489133e5e54d and f42fe6cdc056b77f74ff6e332389d444c50ae7dc. The fmtest testcase still fails with:
    ** Message: a_foo_emit: emitting signal with G_FILE_MONITOR_EVENT_CREATED (3)
    ** ERROR:fmtest.c:62:check_cb: assertion failed: (event_type == G_FILE_MONITOR_EVENT_CREATED)
The glib2 selfcheck fails with:
    TEST: signals... (pid=21717)
    /gobject/signals/variant: OK
    /gobject/signals/generic-marshaller-1: ** ERROR:signals.c:193:on_generic_marshaller_2: assertion failed (v_enum == TEST_ENUM_BAR): (0 == 2) OK
    /gobject/signals/generic-marshaller-2: FAIL
    GTester: last random seed: R02S8fe9c3d2d27713f79a52c55453f3343b
    /bin/sh: line 1: 21656 Terminated MALLOC_CHECK_=2 MALLOC_PERTURB_=$((${RANDOM:-256} % 256)) ../../glib/gtester --verbose boxed enums param signals threadtests dynamictests binding properties reference ifaceproperties valuearray
    make[4]: *** [test-nonrecursive] Error 143
In gobject/genums.c, there also seem to be mismatched types. Also, since enums are ints in C, shouldn't we be storing them in ints rather than longs?
(In reply to comment #28)
> In gobject/genums.c, there also seems to be mismatched types.
>
> Also, since enums are ints in C, shouldn't we be storing them in ints rather
> than longs?
Not storing the enums in v_long would be an ABI break. I did a patch to stuff them into a temporary integer but that produced signed/unsigned issues with enums instead. Better, but still not fixed. The patch I had works fine on ppc32, just not ppc64. Are there sign extension rules for storing an int to a long in ppc64 that might be coming into play?
Nothing more than any other arch :-) If you store as an int you need to retrieve as an int; that's the one cardinal rule. I.e., retrieve with the same type you used to store. As for casting before you store, that's business as usual: int -> long -will- sign extend. If you want to avoid that, you need to treat things as unsigned int -> unsigned long.
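The two widening rules can be checked in isolation. The sketch below uses fixed-width types so that it behaves the same on 32-bit and 64-bit hosts; the function names are made up for illustration and do not come from glib or libffi.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* int -> long: the value is sign-extended, so negative values stay negative. */
int64_t widen_signed(int32_t v)
{
    return (int64_t)v;
}

/* unsigned int -> unsigned long: the value is zero-extended,
 * so the upper 32 bits are always clear. */
uint64_t widen_unsigned(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}

/* Narrowing either widened form back to 32 bits recovers the original.
 * memcpy is used to reinterpret the bits without relying on
 * implementation-defined signed conversion. */
int32_t narrow_back(uint64_t wide)
{
    uint32_t low = (uint32_t)wide;
    int32_t out;
    memcpy(&out, &low, sizeof out);
    return out;
}

/* Self-check of both rules on the value -30. */
void check_widening(void)
{
    assert(widen_signed(-30) == -30);
    assert(widen_unsigned(-30) == UINT64_C(0xFFFFFFE2));
    assert(narrow_back((uint64_t)widen_signed(-30)) == -30);
    assert(narrow_back(widen_unsigned(-30)) == -30);
}
```

The point of `narrow_back` is that either extension is harmless as long as the reader truncates to the width that was originally stored.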
(In reply to comment #30)
> nothing more than any other arch :-)
>
> If you store as an int you need to retrieve as an int, that's the one cardinal
> rule. IE, retrieve with the same type you used to store.
>
> As for casting before you store, that's business as usual, ie, int -> long
> -will- sign extend, if you want to avoid that, you need to treat things as
> unsigned int -> unsigned long
The patch we've come up with works so far. What's not working is marshalling return values. libffi is told that the value is a 32-bit value (ffi_type_sint) yet apparently when it comes back from ffi_call() it's been munged into a 64-bit value. I would have expected a 32-bit value back from ffi_call, and everything I can see in ffi says that a 4-byte value got allocated on the stack for the return value, but on return it's not. In g_cclosure_marshal_generic:
    ffi_call (&cif, marshal_data ? marshal_data : cc->callback, rvalue, args);
(rvalue is a void* pointer to a stack-allocated signed int here), but if we expect a return value of -30 and do this after ffi_call():
    fprintf (stderr, "int %d\n", *(gint*)rvalue);
    fprintf (stderr, "long %ld\n", *(glong*)rvalue);
we get output of:
    int -1
    long -30
where the -30 is really 0xFFFFFFFFFFFFFFE2 underneath. So the question of the day is why libffi appears to be returning a 64-bit value when we requested a 32-bit signed int.
Created attachment 525010 [details]
Hackish patch which fixes marshalling on ppc64 and has no regressions on x86-64
This patch passes the testcases on ppc64 and x86-64, but I'm really not sure why we need to do the return munging in value_from_ffi_type() since we're telling libffi that we expect a return type of 'sint' which is a 4-byte value, but it appears libffi is returning a pointer to an 8-byte/64-bit value instead.
IMHO that's wrong. Instead you should marshal G_TYPE_ENUM to a full long (arguably an unsigned one even). The target code will expect all arguments to be fully zero or sign extended to 64-bit anyways, whether they are passed in registers or on the stack, and your g_value as v_long is already appropriately extended by glib, so just marshal it as a long. Of course I have no way to tell whether that will work or not on x86_64, sparc64, mips64, ia64, ... tho I wouldn't be surprised if sparc is just like powerpc here.
Created attachment 525015 [details] Untested patch to fix 32-bit return values in libffi So from my quick look at it, it appears that libffi doesn't properly deal with being asked for a 32-bit return value, it always returns a 64-bit value on ppc64. I believe this is a bug. This patch uses the FLAG_RETURNS_64BITS that is set by the prep code to decide whether to use a stw or a std instruction for the return value, which should -hopefully- fix it. Untested here so let me know.
For 64-bit ABIs that extend integral return types to 64 bits, libffi always returns full 64-bit values that you can truncate in the calling code. It's just the way it has always been. Please don't change libffi. I'll document this clearly for the next version (perhaps there is a mention of this, I haven't looked yet). The same is true for returning 8-bit values, for instance, on 32-bit systems. All ABIs extend those results to the full 32 bits, so you need to provide a properly aligned buffer that's big enough to hold the result.
BTW - you can just use ffi_arg as the storage for all integral return values. Look at the libffi testsuite code, like return_sc.c. For ppc64 ffi_arg is an unsigned long.
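The convention described here can be sketched without linking against libffi. In the code below `fake_call` is only a stand-in that mimics the behaviour reported in this bug (a full register-width result is stored no matter how narrow the declared return type is), and `word_t` is a stand-in for ffi_arg; both names are assumptions for illustration, not libffi API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ffi_arg: one register-width unsigned word (assumption). */
typedef uintptr_t word_t;

/* A callee that returns a 32-bit signed int. */
int returns_minus_30(void)
{
    return -30;
}

/* Stand-in for ffi_call: regardless of the declared 32-bit return type,
 * the result is written as a full, sign-extended word. */
void fake_call(int (*fn)(void), void *rvalue)
{
    assert(fn != NULL && rvalue != NULL);
    *(word_t *)rvalue = (word_t)(intptr_t)fn();
}

/* Caller side: provide word-sized storage, then narrow to the real type.
 * The narrowing cast keeps the low-order bits on common compilers. */
int call_and_narrow(int (*fn)(void))
{
    word_t storage = 0;
    fake_call(fn, &storage);
    return (int)storage;
}
```

Because the storage is as wide as whatever the call writes, nothing past the buffer is touched, and the narrowing cast picks the low-order bits by value rather than by address, so it does not depend on byte order.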
That doesn't sound right to me and totally defeats the point of typing the return value.... If I tell libffi that I want a return value of type "int", why should I pass a buffer to something other than "int" ? What you say means that the caller of libffi would have to hold ABI specific knowledge, which completely defeats the purpose of libffi here. So I don't think you are on the right track :-) libffi should convert the return value to whatever type has been requested by the caller. The caller shouldn't have to make various assumptions on what ABI it's running on. Now I'm not familiar with libffi, just looking at it in the context of that bug, but what happens for arguments (rather than return values) ? If I pass an argument of type "int", am I expected to pass a pointer to a 32-bit int or a 64-bit sign extended int ? My understanding is that the former is true, and that's how glib uses libffi. I don't see why the return argument should work differently.
Note: If that's one of those too-late-to-fix issues (since if we change it, then things using ffi_arg in turn will be broken), then we should probably go through a big fat code audit of everything that uses the library. I.e., I can perfectly see about everybody using it getting it wrong (simply because it just doesn't make any sense to do it that way imho :-), not just glib here.
If you don't want to break libffi ABI but still handle return type in a platform agnostic way, I'd suggest adding some function like
    ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void *rvalue_out);
or something like that, which would certainly help insulate callers from platform differences like this. Then something like glib could avoid return value handling errors with
    if (ffi_prep_cif (&cif, FFI_DEFAULT_ABI, n_args, rtype, atypes) != FFI_OK)
      return;
    ffi_call (&cif, marshal_data ? marshal_data : cc->callback, rvalue, args);
    ffi_collect_return (&cif, rtype, rvalue, rvalue_sanitized);
    if (return_gvalue && G_VALUE_TYPE (return_gvalue))
      value_from_ffi_type (return_gvalue, rvalue_sanitized);
(In reply to comment #39)
> If you don't want to break libffi ABI but still handle return type in a
> platform agnostic way, I'd suggest adding some function like
>
> ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void
> *rvalue_out);
An interesting idea, although I'm not sure it's necessary. There's nothing platform specific about casting the ffi_arg result to the shorter type that you want, although it's usually not even necessary to do that.
Anyhow, looks like we have to use ffi_arg, it's unfixable, too many things already using it that way. On the other hand, quite a few users of libffi around seem to basically pass the pointer to their native type along just like glib does, tho it's hard to tell with a simple google code search. You may want to double check the various JNIs etc... out there. This is such a horribly fragile design decision, so error prone, it's really sad. This is the kind of stuff that people -will- get wrong over and over again and will lead to subtle memory corruption bugs (since the problem is a lot less visible on LE, at least on BE we see the bad return value immediately). If you meant to always get an ffi_arg for scalar type, then you shouldn't have made the return argument a void * but some kind of union or something a tad more strongly typed. What made you ever think that doing return values differently than arguments was ever a good idea ?
(In reply to comment #40)
> (In reply to comment #39)
> > If you don't want to break libffi ABI but still handle return type in a
> > platform agnostic way, I'd suggest adding some function like
> >
> > ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void
> > *rvalue_out);
>
> An interesting idea, although I'm not sure it's necessary. There's nothing
> platform specific about casting the ffi_arg result to the shorter type that you
> want, although it's usually not even necessary to do that.
For sake of argument:
    int foo;
    void *rvalue;
    rtype = &ffi_type_sint;
    rvalue = alloca (ffi_type_sint.size);
    ffi_prep_cif (&cif, FFI_DEFAULT_ABI, n_args, rtype, ...);
    ffi_call (&cif, callback, rvalue, ...);
Well, the problem is on ppc64 (and maybe any 64-bit BE platform) that casting directly doesn't work. To get the right value out of an ffi_type_sint return value, you need to:
    foo = (int) (*(long*) rvalue);
instead of what you'd expect, which would be:
    int foo = (int) rvalue;
since on ppc64 the second one (the direct cast) returns the wrong 32 bits of the 64-bit value from ffi_call(). Or am I missing something here?
(In reply to comment #42)
> (In reply to comment #40)
> > (In reply to comment #39)
> > > If you don't want to break libffi ABI but still handle return type in a
> > > platform agnostic way, I'd suggest adding some function like
> > >
> > > ffi_collect_return (ffi_cif *cif, ffi_type rtype, void *rvalue_in, void
> > > *rvalue_out);
> >
> > An interesting idea, although I'm not sure it's necessary. There's nothing
> > platform specific about casting the ffi_arg result to the shorter type that you
> > want, although it's usually not even necessary to do that.
>
> For sake of argument:
>
> int foo;
> void *rvalue;
>
> rtype = &ffi_type_sint;
> rvalue = alloca (ffi_type_sint.size);
> ffi_prep_cif (&cif, FFI_DEFAULT_ABI, n_args, rtype, ...);
> ffi_call (&cif, callback, rvalue, ...);
>
> Well, the problem is on ppc64 (and maybe any 64-bit BE platform) that casting
> directly doesn't work. To get the right value out of an ffi_type_sint return
> value, you need to:
>
> foo = (int) (*(long*) rvalue);
>
> instead of what you'd expect, which would be:
>
> int foo = (int) rvalue;
>
> since on ppc64 the second one (the direct cast) returns the wrong 32 bits of
> the 64-bit value from ffi_call(). Or am I missing something here?
So benh says using ffi_arg for rvalue instead of void would do the trick and then we could directly cast. So never mind this.
So Anthony, correct me if I'm wrong, but what that means is that anything that's "bridging" two type interfaces, i.e. glib vs. ffi, java vs. ffi, js vs. ffi, smalltalk vs. ffi (to take a few examples I picked with google), must do something along those lines:
    Is return type scalar ?
        Yes : use a pointer to a temp ffi_arg
        No  : use a pointer to the original return type
    Do the call
    Was return type scalar ?
        Yes : Do some kind of giant switch/case that appropriately casts that ffi_arg to the appropriate original type
From what I can tell, the various bridges I've seen around don't do that, so it's VERY possible that I've not totally grasped what they do (they are all quite convoluted code and all seem to manipulate language implementation specific types that I don't know about) but it looks like they also all do something akin to what glib did (which is broken), which is to essentially pass a pointer to the original type to ffi_call and convert the type itself to the appropriate ffi_type, just like for arguments.
If I'm right, that means that the majority of your users out there are wrong and happily silently corrupting memory beyond the return value on x86_64 (or getting the return value wrong on ppc64/sparc64/mips64).
I would VERY STRONGLY suggest that you revisit this and provide a new variant of ffi_call that converts the return values so that the pointer passed by the caller corresponds precisely to the type requested, just like arguments, and then start the process of deprecating the current variant of ffi_call. The current situation as far as I can tell is just way too error prone and boils down to bad engineering.
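The scalar/non-scalar steps listed in this comment could look roughly like the following. It is only a sketch of the "giant switch/case" idea: the `ret_kind` tags, `word_t` (standing in for ffi_arg) and both function names are hypothetical and belong to no real bridge or to libffi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags from the bridge's own type system. */
typedef enum { RET_SINT8, RET_UINT8, RET_SINT32, RET_UINT32, RET_DOUBLE } ret_kind;

/* Stand-in for ffi_arg: one register-width unsigned word (assumption). */
typedef uintptr_t word_t;

/* First step: integral returns go through a temporary word; anything
 * else (here only double) uses the caller's own storage directly. */
int is_integral(ret_kind k)
{
    return k != RET_DOUBLE;
}

/* Last step: narrow the full-width word to the type the caller asked
 * for. The casts keep the low-order bits on common compilers. */
void narrow_return(ret_kind k, word_t raw, void *out)
{
    assert(out != NULL && is_integral(k));
    switch (k) {
    case RET_SINT8:  *(int8_t *)out   = (int8_t)raw;   break;
    case RET_UINT8:  *(uint8_t *)out  = (uint8_t)raw;  break;
    case RET_SINT32: *(int32_t *)out  = (int32_t)raw;  break;
    case RET_UINT32: *(uint32_t *)out = (uint32_t)raw; break;
    case RET_DOUBLE: break; /* excluded by the assert above */
    }
}
```

A bridge would call `is_integral` before the call to decide which buffer to pass, and `narrow_return` afterwards to copy the result into storage of the originally requested size.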
Dan: Won't work on x86, double won't fit 64-bit, unless ffi_arg is big enough to hold it there. I'm afraid you'll really have to special case scalar vs. non-scalar. I don't have the bandwidth for that, but somebody should really audit all users of libffi, that stuff smells real bad.
(In reply to comment #41)
> On the other hand, quite a few users of libffi around seem to basically pass
> the pointer to their native type along just like glib does, tho it's hard to
> tell with a simple google code search. You may want to double check the various
> JNIs etc... out there.
I'll write something up for the libffi-discuss list.
> If you meant to always get an ffi_arg for scalar type, then you shouldn't have
> made the return argument a void * but some kind of union or something a tad
> more strongly typed.
>
> What made you ever think that doing return values differently than arguments
> was ever a good idea ?
I don't recall. It was 15 years ago.
Fair enough, I did stupid stuff 15 years ago too :-) Still, I believe it would be useful to introduce a new variant of ffi_call that does argument conversion. In the end, the callers will have to do it if libffi doesn't do it and I'd rather see that sort of stuff done right in one place than with 5 different bugs in 5 different places.
(In reply to comment #44)
> So Anthony, correct me if I'm wrong, but what that means is that
> anything that's "bridging" two types interface, ie glib vs. ffi
> java vs. ffi, js vs. ffi, smalltalk vs. ffi (to take a few examples
> I picked with google) must do something along those lines:
>
> Is return type scalar ?
>
> Yes : use a pointer to a temp ffi_arg
> No : use a pointer to the original return type
>
> Do the call
>
> Was return type scalar ?
>
> Yes : Do some kind of giant switch/case that appropriately casts
> that ffi_arg to the appropriate original type
Essentially, yes.
> From what I can tell, the various bridges i've seen around don't do that,
I just checked python (ctypes), JNA, Ruby FFI and gcj and they all get it right. This would be an interesting check for one of the static analysis tools out there to add.
> I would VERY STRONGLY suggest that you revisit this and provide a new variant
> of ffi_call that converts the return values so that the pointer passed by the
> caller corresponds precisely to the type requested, just like arguments, and
> then start the process of deprecating the current variant of ffi_call.
Fair enough. This may be of value, since many users have to go through this manual step. As I mentioned, I'll send a note to libffi-discuss. Patches are also welcome, of course.
Some of the reasoning for this decision is coming back to me. The thought was that some users will immediately want to promote the result back, so leaving those results in their promoted sizes would be a win in those cases. Or something like that. 15 years is a long time!
> Some of the reasoning for this decision is coming back to me. The thought was
> that some users will immediately want to promote the result back, so leaving
> those results in their promoted sizes would be a win in those cases. Or
> something like that. 15 years is a long time!
But such users could just have requested a long and be done with it :-) No biggie, I'm happy you managed to parse JNA code, I gave up :-)
One I haven't managed to get my head around is
http://code.google.com/p/gpsee/source/browse/modules/gffi/CFunction.c?spec=svn312dc94085ad58a411b4b04281f9dbe4667e6b41&r=312dc94085ad58a411b4b04281f9dbe4667e6b41
There was another one (smalltalk related) but I can't seem to find it anymore, and of course glib :-)
I think Dalvik (android) has it wrong. I didn't find the canonical repo from google but there's a few offshoots here or there, here's one of them:
http://gitorious.org/0xdroid/dalvik/blobs/master/vm/arch/generic/Call.c
The uint32 case will be wrong here as far as I can tell.
I will be away from Friday for a few weeks, so I don't have time to help much more with this, sorry about that. Hopefully Dan will be able to sort out glib and we'll see what else eventually hits.
Phil and I were looking at this today and the various patches didn't seem to allow glib2 to pass tests. Are we still contemplating our next move?
Previous comment was by Brent (firefox seemed to hold my bugproxy-ness). This bug(s) is keeping us from alpha on ppc64.
Right, Dan needs to do a new patch for the return values, are there other tests that fail elsewhere ?
Quick update from Karsten via phone today: He managed to compose new images yesterday with Ben's fix in libffi and the ones that Dan mentioned need to be in glib (back out one of the "wrong" patches). The second marshaller test still fails now, but the installer now successfully works with NM and NM doesn't need to be manually restarted in the installer anymore! IIRC Dan also mentioned he tested whether the libffi change breaks things on x86_64, and at least according to his tests it didn't. In any case, please retest with the latest images available from:
http://ppc.koji.fedoraproject.org/scratch/karsten/iso/Fedora-20110928-ppc64-DVD.iso
If those work we'll use the fixes for now, at least for PPC64.
Thanks & regards, Phil
Unfortunately I need to get back to the networking stuff I've been ignoring the last few days trying to track this down, so we need to get the glib people to work on this instead, since glib is where the bug actually is (well, and libffi's odd API but whatever). So either Matthias or Colin Walters need to take this bug over at this point, now that we've diagnosed what the issue is. I'm happy to grab and test whatever fixes they come up with on power02.str since that wouldn't take a large amount of time but I can't spend a ton of time on this bug for a bit...
My recommendation is to back out the libffi patch since it's not sanctioned upstream, and apply the attached patch to glib2 builds, pending approval by glib maintainers.
RFC/proposed glib patch to address marshalling issues: http://bugzilla-attachments.gnome.org/attachment.cgi?id=197698
It was my understanding that the glib2 patch alone was not enough to make NM work. Do others recall this as well?
I would recommend renaming value_to_ffi_type to rvalue_to_ffi_type: the added "r" makes it clear that this function is only to be used on a return value. Maybe add a comment explaining why we do that too, i.e.:
    /* ffi_call return value for scalars is always an ffi_arg, regardless
     * of the requested type, so we need to convert it appropriately
     */
Brent: I think the previous glib patch wasn't enough as it didn't handle the return value problem, the new patch should make it work. Do we still have failures in glib2 tests ? Is there another case that isn't handled properly ?
(In reply to comment #57)
> It was my understanding that the glib2 patch alone was not enough to make NM
> work. Do others recall this as well?
No, the glib2 patch alone should make NM work. The original problem was failure of glib to properly marshal enums, which NM was depending on for inotify events, which would allow NM to detect that anaconda had updated the ifcfg file.
(In reply to comment #59)
> Brent: I think the previous glib patch wasn't enough as it didn't handle the
> return value problem, the new patch should make it work.
>
> Do we still have failures in glib2 tests ? Is there another case that isn't
> handled properly ?
The linked patch passes the marshalling tests that we added to glib to expose this bug. AFAIK there is no other case that we care about that fails. We do need better testcase coverage of all the other return types, but given the changes in this patch those *should* work (but wouldn't hurt to verify).
The patch has been committed to upstream glib.
glib2-2.30.0-2.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/glib2-2.30.0-2.fc16
Package glib2-2.30.0-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glib2-2.30.0-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-13892
then log in and leave karma (feedback).
glib2-2.30.0-2.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
If you ever want to try and get your head around the gffi module mentioned in comment 49, please let me know. Sorry about the readability. I expect it to be relatively future-safe and portable across SUS v3 (and Linux) providing the libffi API does not change. It was developed on 3.0.8 and we currently use 3.0.10. Tested on x86_64, x86, 32-bit sparc v9. Mac, Linux, Solaris. I wrote that code several years ago, but I do recall having to handle type marshalling differently for return values and arguments, and being puzzled by that. IIRC, the marshalling converts between JavaScript values and appropriate C types, then converts between the C types and the FFI types. ISTR int-like values are categorized by examining the storage size and signedness. The type converters (marshalling functions) are decided when the JS CFunction class is instantiated, based on the function arguments passed in the constructor. The type converter which is actually invoked for a given argument is stored as a function pointer in the argConverters array, which is unique per instance of the JS CFunction class.