Description of problem: I did F18 Beta TC1 i386 install from PXE. In firstboot the "network time" checkbox is checked. If I click Forward, a dialog "starting NTP service" appears. If I hit Cancel on that dialog, firstboot ends prematurely (and it is run again on the next boot). This is what I found in /var/log/messages: > Oct 3 14:02:45 dhcp-28-116 firstboot[395]: INFO:firstboot.frontend:window manager started successfully > Oct 3 14:02:45 dhcp-28-116 firstboot[395]: INFO:firstboot.frontend:merging the Xresources > Oct 3 14:02:45 dhcp-28-116 firstboot[395]: INFO:firstboot.frontend:starting the window manager > Oct 3 14:02:46 dhcp-28-116 firstboot[395]: Window manager warning: Locale not understood by C library, internationalization will not work > Oct 3 14:02:46 dhcp-28-116 firstboot[395]: (metacity:599): Gtk-WARNING **: Locale not supported by C library. > Oct 3 14:02:46 dhcp-28-116 firstboot[395]: Using the fallback 'C' locale. > Oct 3 14:02:46 dhcp-28-116 firstboot[395]: (process:395): Gtk-WARNING **: Locale not supported by C library. > Oct 3 14:02:46 dhcp-28-116 firstboot[395]: Using the fallback 'C' locale. > Oct 3 14:02:46 dhcp-28-116 firstboot[395]: INFO:firstboot.loader:loading module create_user > Oct 3 14:02:53 dhcp-28-116 firstboot[395]: INFO:firstboot.loader:loading module date > Oct 3 14:02:53 dhcp-28-116 firstboot[395]: INFO:firstboot.loader:loading module keyboard > Oct 3 14:02:53 dhcp-28-116 firstboot[395]: INFO:firstboot.loader:module only available in reconfig mode > Oct 3 14:02:53 dhcp-28-116 firstboot[395]: INFO:firstboot.loader:loading module welcome > Oct 3 14:02:53 dhcp-28-116 firstboot[395]: INFO:firstboot.loader:loading module eula > Oct 3 14:02:53 dhcp-28-116 firstboot[395]: INFO:root:Setting size to 1680x1050 > Oct 3 14:02:54 dhcp-28-116 firstboot[395]: /usr/lib/python2.7/site-packages/scdate/core/zonetab.py:227: RuntimeWarning: Couldn't set locale: unsupported locale setting > Oct 3 14:02:54 dhcp-28-116 firstboot[395]: entry = ZoneTabEntry (code, lat, long, tz, comments) > Oct 3 14:02:55 dhcp-28-116 firstboot[395]: Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message with a timestamp of 0 for 0x60007b (firstboot) > Oct 3 14:02:55 dhcp-28-116 firstboot[395]: Window manager warning: meta_window_activate called by a pager with a 0 timestamp; the pager needs to be fixed. > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: Unhandled exception in firstboot occured. > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: Bugreport directory was created and once you login, ABRT should > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: let you file a bug. (Don't forget to check "Show all problems") > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: Do you want to start /usr/bin/setup so that you can > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: create your user account? [Y/n] Error in sys.excepthook: > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: Traceback (most recent call last): > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: File "/usr/sbin/firstboot", line 94, in <lambda> > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: config) > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: File "/usr/sbin/firstboot", line 84, in exception_handler > Oct 3 14:03:15 dhcp-28-116 firstboot[395]: answer = raw_input (_('Do you want to start /usr/bin/setup so that you can\n' > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: EOFError: EOF when reading a line > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: Original exception was: > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: Traceback (most recent call last): > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: File "/usr/lib/python2.7/site-packages/firstboot/interface.py", line 105, in _nextClicked > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: self.advance() > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: File "/usr/lib/python2.7/site-packages/firstboot/interface.py", line 148, in advance > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: result = module.apply(self, self.testing) > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: File "/usr/share/firstboot/modules/date.py", line 49, in apply > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: rc = self.scd.firstboot_apply() > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: File "/usr/share/system-config-date/scdMainWindow.py", line 213, in firstboot_apply > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: return self.apply () > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: File "/usr/share/system-config-date/scdMainWindow.py", line 154, in apply > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: time.sleep (2) > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: KeyboardInterrupt > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: [xcb] Unknown sequence number while processing queue > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: [xcb] Most likely this is a multi-threaded client and XInitThreads has not been called > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: [xcb] Aborting, sorry about that. > Oct 3 14:03:16 dhcp-28-116 firstboot[395]: python: xcb_io.c:273: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed. > Oct 3 14:03:18 dhcp-28-116 abrt[645]: Saved core dump of pid 644 (/usr/bin/python2.7) to /var/spool/abrt/ccpp-2012-10-03-14:03:16-644 (18407424 bytes) > Oct 3 14:03:18 dhcp-28-116 firstboot[395]: firstboot: Fatal IO error 4 (Interrupted system call) on X server :9. > Oct 3 14:03:18 dhcp-28-116 systemd[1]: firstboot-graphical.service: main process exited, code=exited, status=1 > Oct 3 14:03:18 dhcp-28-116 firstboot[395]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting. > Oct 3 14:03:18 dhcp-28-116 firstboot[395]: Window manager warning: Fatal IO error 11 (Resource temporarily unavailable) on display ':9'. > Oct 3 14:03:18 dhcp-28-116 kernel: [ 75.865268] [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on vga encoder (output 0) > Oct 3 14:03:18 dhcp-28-116 kernel: [ 75.895449] [drm] nouveau 0000:01:00.0: Setting dpms mode 0 on vga encoder (output 0) > Oct 3 14:03:18 dhcp-28-116 kernel: [ 75.895452] [drm] nouveau 0000:01:00.0: Output VGA-1 is running on CRTC 0 using output A > Oct 3 14:03:18 dhcp-28-116 kernel: [ 75.895677] [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on TV encoder (output 2) > Oct 3 14:03:18 dhcp-28-116 kernel: [ 75.928927] [drm] nouveau 0000:01:00.0: Setting dpms mode 0 on TV encoder (output 2) > Oct 3 14:03:18 dhcp-28-116 kernel: [ 75.928931] [drm] nouveau 0000:01:00.0: Output TV-1 is running on CRTC 1 using output B > Oct 3 14:03:21 dhcp-28-116 systemd[1]: Failed to start firstboot configuration program (graphical mode). > Oct 3 14:03:21 dhcp-28-116 systemd[1]: Unit firstboot-graphical.service entered failed state. However, I tried the same approach on a different computer and I haven't received the traceback. Instead, I was told that the service couldn't be started and asked what to do. So I wonder, is there some race condition? Version-Release number of selected component (if applicable): firstboot-18.4-1.fc18.i686 How reproducible: somewhat Steps to Reproduce: 1. install F18 Beta TC1 from PXE 2. use network time 3. hit Cancel on the dialog "starting NTP service"
This can't be fixed with an update, so might be a F18 Blocker.
Did abrt capture anything? After I attempted to reproduce this bug, abrt had two exception reports: This one was new: Bug 863609 - KeyboardInterrupt The other one was said to be a duplicate of: Bug 858768 - [abrt] python-2.7.3-7.2.fc17: poll_for_event: Process /usr/bin/python2.7 was killed by signal 6 (SIGABRT)
This is again system-config-date's fault. As firstboot only imports that screen.
*** Bug 863609 has been marked as a duplicate of this bug. ***
Clicked cancel while the "Starting NTP Service" dialog was displayed. With F18-Beta-TC4: $ rpm -q firstboot system-config-date firstboot-18.4-1.fc18.x86_64 system-config-date-1.9.68-1.fc18.noarch Package: firstboot-18.4-1.fc18.x86_64 OS Release: Fedora release 18 (Spherical Cow)
Created attachment 629010 [details] firstboot-tb-1 showing KeyboardInterrupt traceback in firstboot 18.4 firstboot 18.4 exception report Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/firstboot/interface.py", line 105, in _nextClicked self.advance() File "/usr/lib/python2.7/site-packages/firstboot/interface.py", line 148, in advance result = module.apply(self, self.testing) File "/usr/share/firstboot/modules/date.py", line 49, in apply rc = self.scd.firstboot_apply() File "/usr/share/system-config-date/scdMainWindow.py", line 213, in firstboot_apply return self.apply () File "/usr/share/system-config-date/scdMainWindow.py", line 154, in apply time.sleep (2) KeyboardInterrupt
Created attachment 629023 [details] backtrace-1 for coredump captured at the same time When this problem occurs, you get *two* abrt dump directories, one from libreport and one from ccpp. Abrt said the dump in the libreport directory was a duplicate of Bug 863609, so I have manually attached the backtrace file here. Abrt could not report the ccpp dump for some reason, so I am manually attaching the backtrace file here. warning: core file may not match specified executable file. [New LWP 984] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". warning: "/usr/lib/debug/usr/lib64/libicudata.so.49.1.1.debug": separate debug info file has no debug info Core was generated by `python /usr/sbin/firstboot'. Program terminated with signal 6, Aborted. #0 0x00007f4d49122ba5 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:63 63 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Created attachment 629026 [details] /var/log/messages covering the same events I ran the cancel test twice, once when rebooting after installation and once after restarting (as Kamil noted above, firstboot ran again). The attached /var/log/messages file has records for both of those. The messages appear to be the same as what Kamil originally reported. Reproduced with: $ qemu-kvm -m 2048 -hda f18-test-1.img -cdrom ~/xfr/fedora/F18/F18-Beta/TC4/Fedora-18-Beta-TC4-x86_64-DVD.iso -usb -vga qxl -boot menu=on -usbdevice mouse [snippets from /var/log/messages] ... Oct 17 10:48:18 localhost abrtd: Directory 'libreport-2012-10-17-13:48:18-984' creation detected ... Oct 17 10:48:20 localhost abrtd: Directory 'ccpp-2012-10-17-13:48:18-984' creation detected ... Oct 17 14:18:33 localhost abrtd: Directory 'libreport-2012-10-17-14:18:33-961' creation detected ... Oct 17 14:18:35 localhost abrtd: Directory 'ccpp-2012-10-17-14:18:33-961' creation detected ...
Two more factoids: Normally, there is a file /etc/sysconfig/firstboot, but it is now missing on my test VM. chronyd seems to be running normally.
Those strings in /var/log/messages are from firstboot, not system-config-date. They are in the exception handler: $ less -N ./firstboot-18.4-1/progs/firstboot 44 def exception_handler(type, value, tb, obj): The exception handler is asking the user a question ... $ egrep -r "'Unhandled exception in firstboot occured.'|exception_handler" ./firstboot-18.4-1 ./firstboot-18.4-1/progs/firstboot:def exception_handler(type, value, tb, obj): ./firstboot-18.4-1/progs/firstboot: print _('Unhandled exception in firstboot occured.') ./firstboot-18.4-1/progs/firstboot: sys.excepthook = lambda ty, value, tb: exception_handler(ty, value, tb,
Created attachment 629141 [details] strace log for strace attached to firstboot processes This strace log was captured starting with the first firstboot panel. The strings for the firstboot exception handler messages can be found by searching for 'occur'. The string 'KeyboardInterrupt' is near the end of the log. Command-line: # pgrep -f firstboot # strace -tt -f -s 256 -o strace-1.log -p 451 -p 768
$ grep signo strace-1.log 2209 22:22:58.587119 --- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=451, si_uid=0} --- 2209 22:22:58.923541 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=2209, si_uid=0} --- 451 22:23:00.032133 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=2209, si_status=SIGABRT, si_utime=14, si_stime=4} --- 768 22:23:00.089392 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
Created attachment 629608 [details] scdMainWindow.py with strace records After looking at scdMainWindow.py and the strace log, I think that what is happening is that when Cancel is clicked, 451 sends SIGINT to 2209 while 2209 is in select() (which implements sleep()). Rather than cleaning up and terminating, the firstboot exception handler is invoked. This attachment shows lines from scdMainWindow.py with what I believe are corresponding lines from the strace log. NB: IIUC, fork() is implemented with clone() (per "man fork").
(In reply to comment #13) > Created attachment 629608 [details] > scdMainWindow.py with strace records > > After looking at scdMainWindow.py and the strace log, I think that what is > happening is that when Cancel is clicked, 451 sends SIGINT to 2209 while > 2209 is in select() (which implements sleep()). Rather than cleaning up and > terminating, the firstboot exception handler is invoked. ... Correcting myself ... an unhandled KeyboardInterrupt exception is raised by the SIGINT, and then the firstboot exception handler is invoked.
[Copied from Bug 863609, Comment 12] Martin Sivák 2012-10-17 08:13:52 EDT Did you press Ctrl-C? That is what KeyboardInterrupt exception really represents. -- Thanks for pointing that out. The Python documentation says: exception KeyboardInterrupt Raised when the user hits the interrupt key (normally Control-C or Delete). http://docs.python.org/library/exceptions.html?highlight=keyboardinterrupt#exceptions.KeyboardInterrupt However, what it really, really represents is SIGINT: :-) $ man 7 signal SIGINT 2 Term Interrupt from keyboard http://man7.org/linux/man-pages/man7/signal.7.html
*** Bug 868068 has been marked as a duplicate of this bug. ***
Steve, you are of course right here. There is couple of points which I can clarify from those outputs: The exception is (not) handled by firstboot, because we use system-config-date as a python module and while unrolling the stack, it gets back to the caller - the forked firstboot. If you look at the code after the fork, the parent (451, master copy of firstboot), starts the child and then waits two seconds before sending SIGINT to the child (2209). The child was at that moment supposed to terminate ages (2s, time is relative :) ago but got stuck (or is just slow..) for some reason. That means the child will emit KeyboardInterrupt exception and as it is a forked/cloned copy of firstboot, it will show the appropriate exception handler. But it is the forked process which does that, the master one happily waits in the background (pid 451). Well not happilly as it immediately falls into the code on the line 155 and gets stuck while starting NTP too. The code is not how fork should be handled I am affraid... I do not have commit rights, so I can only try attaching patches, but as I do not know s-c-date at all, we need Niels to start commenting on all those bugs.
Thanks for your clarification: The _copy_ of the firstboot exception handler in the forked child process (2209) is invoked: 2209 22:22:58.901005 write(1, "Unhandled exception in firstboot occured. ... Is that the way it is supposed to work? I am confused about your other comments. Lines 151-158 of scdMainWindow.py are executed in the child: pid = os.fork () if pid == 0: child_process() parent_process() Another problem with scdMainWindow.py is that it does not check for fork() failing: FORK(2) http://man7.org/linux/man-pages/man2/fork.2.html "On success, the PID of the child process is returned in the parent, and 0 is returned in the child. On failure, -1 is returned in the parent, no child process is created, and errno is set appropriately."
Sorry, you are right, I misread the code... But the exception handler is really called from the cloned copy, which is definitely not right. Also the parent sends SIGINT while the child is still waiting. That seems to be also wrong.
(In reply to comment #18) ... > Another problem with scdMainWindow.py is that it does not check for fork() > failing: I'm thinking like a C programmer. Python programs catch exceptions ... :-) What should happen if fork() fails? os.fork() Fork a child process. Return 0 in the child and the child’s process id in the parent. If an error occurs OSError is raised. http://docs.python.org/library/os.html?highlight=fork#os.fork
Well, if fork fails while you are root, you have a big problem with your system (out of pids or memory). But you are right that a well behaved program should probably check that.
There hasn't been any activity in this bug for over a month and there have been no new reports or indications from the reporter that this is still an issue. I suspect that this is no longer an issue and I'm closing the bug. If I'm missing something here, please re-open the bug.
Guessing is not testing ... :-) Easily reproduced with: $ qemu-kvm -m 2048 -hda f18-test-2.img -cdrom ~/xfr/fedora/F18/F18-Beta/Final/Fedora-18-Beta-x86_64-Live-Desktop.iso -usb -vga qxl -boot menu=on -usbdevice mouse -rtc base=localtime
(In reply to comment #23) > Guessing is not testing ... :-) > > Easily reproduced with: > $ qemu-kvm -m 2048 -hda f18-test-2.img -cdrom > ~/xfr/fedora/F18/F18-Beta/Final/Fedora-18-Beta-x86_64-Live-Desktop.iso -usb > -vga qxl -boot menu=on -usbdevice mouse -rtc base=localtime Then re-open the bug instead of just making smart comments. We have a LOT of proposed blockers for F18 right now and I'm trying to filter out the ones that we don't need to discuss in a meeting. The initial report makes it sound like this is not an 'every time' bug. How often does this happen? If it isn't every time, what circumstances cause this to manifest? Is it a timing bug or tied to an install method or hardware etc.? I haven't seen this yet, so I know it isn't every install in every situation.
*** Bug 882402 has been marked as a duplicate of this bug. ***
Created attachment 655290 [details] /var/log/messages showing numerous error messages from firstboot/system-config-date Basically, firstboot runs again after rebooting a second time. Kamil reported the same thing in Comment 0. To see the ABRT problem reports, you need to start ABRT and click "Show all problems". ABRT doesn't seem to be sending notifications. The attached log file shows numerous error messages from firstboot/system-config-date. Test procedure with Fedora-18-Beta-x86_64-Live-Desktop.iso in a VM: 1. Install from the F18-Beta Live CD using default NTP settings and disk configuration. 2. Reboot into the installed system. 3. Click Cancel when firstboot/system-config-date displays "Starting NTP service ..." 4. Log in and reboot again into the installed system. 5. Observe that firstboot runs again. 6. After logging in, start ABRT, click "Show all problems", and observe that there are two problem reports (from python and firstboot). 7. Look at /var/log/messages and observe that there are numerous error messages from firstboot. Tested with: $ qemu-kvm -m 2048 -hda f18-test-2.img -cdrom ~/xfr/fedora/F18/F18-Beta/Final/Fedora-18-Beta-x86_64-Live-Desktop.iso -usb -vga qxl -boot menu=on -usbdevice mouse BTW, I can't reopen the bug, because I didn't create it. Thanks for reopening.
(In reply to comment #26) > Created attachment 655290 [details] > /var/log/messages showing numerous error messages from > firstboot/system-config-date Corresponding version info copied from Bug 882402: firstboot-18.6-1.fc18.x86_64 system-config-date-1.10.2-1.fc18.noarch python-libs-2.7.3-13.fc18.x86_64
Discussed at 2012-12-05 blocker review meeting - http://meetbot.fedoraproject.org/fedora-bugzappers/2012-12-05/f18final-blocker-review-2.2012-12-05-17.01.log.txt . We agreed weakly to accept this bug as a blocker - there's some concern about whether it'd survive a go/no-go meeting as the last bug, but there are plausible cases for cancelling that dialog and it looks pretty bad if you do. The criterion intended to cover firstboot doesn't quite cover this scenario, but we agreed in principle to amend it. Currently it reads: In most cases (see Blocker_Bug_FAQ), a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to the 'firstboot' utility on the first boot after installation, without unintended user intervention, unless the user explicitly chooses to boot in non-graphical mode. This includes correctly accessing any encrypted partitions when the correct passphrase is supplied. The firstboot utility must be able to create a working user account I will amend it to also require firstboot to exit 'successfully', i.e., such that it does not run again on the next boot (that's the consequence we're most worried about here, as it forces the creation of a second user account).
Thanks. The requirement is that firstboot runs on the first boot after installation and that it does not run on subsequent boots. Testing for that requirement should be easier than testing for whether it exits "successfully".
system-config-date-1.10.4-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/system-config-date-1.10.4-1.fc18
The update seems to fix the problem.
Package system-config-date-1.10.4-1.fc18: * should fix your issue, * was pushed to the Fedora 18 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing system-config-date-1.10.4-1.fc18' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-19863/system-config-date-1.10.4-1.fc18 then log in and leave karma (feedback).
It occurs to me that the criteria actually do cover this, though we may still want to clarify it a bit. The subsequent criterion reads: "Following on from the previous criterion, after firstboot is completed and on subsequent boots, a system installed according to any of the above criteria (or the appropriate Beta or Final criteria, when applying this criterion to those releases) must boot to a working graphical environment without unintended user intervention." Which rather implies that subsequent boots should *not* bring up firstboot. Arguably, this was already covered.
That's good, although is GDM "a working graphical environment"? Arguably, firstboot and GDM are graphical environments ... :-)
system-config-date-1.10.4-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
system-config-date-1.10.5-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/system-config-date-1.10.5-1.fc16
system-config-date-1.10.5-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/system-config-date-1.10.5-1.fc17
system-config-date-1.10.5-1.fc16, python-slip-0.2.24-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.
system-config-date-1.10.5-1.fc17, python-slip-0.2.24-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.