Description of problem:
Jun 28 15:29:45 localhost systemd[1]: Job initialise-layout-daemon.service/start failed with result 'dependency'.
Jun 28 15:29:45 localhost kernel: [ 682.454946] systemd[1]: Job start-players.service/start failed with result 'dependency'.
Jun 28 15:29:45 localhost kernel: [ 682.454955] systemd[1]: Job layout-daemon.service/start failed with result 'dependency'.
Jun 28 15:29:45 localhost kernel: [ 682.454963] systemd[1]: Assertion 'j->installed' failed at src/manager.c:1202, function transaction_apply(). Aborting.
Jun 28 15:29:45 localhost systemd[1]: Caught <ABRT>, dumped core as pid 2162.
Jun 28 15:29:45 localhost systemd[1]: Freezing execution.

Version-Release number of selected component (if applicable):
systemd-sysv-26-3.fc15.i686
systemd-units-26-3.fc15.i686
systemd-26-3.fc15.i686

How reproducible:
Haven't yet figured out how.

Additional info:
Core dump attached.
Created attachment 510301 [details] Core dump.
Unfortunately I am unable to get anything useful from the core dump:

(gdb) bt
#0  0x00b2c424 in __kernel_vsyscall ()
#1  0x0070539e in ?? ()
#2  0xbf9b2bc8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

What are the services initialise-layout-daemon, start-players, and layout-daemon? I wanted to test them myself, but I could not find anything about them.

> How reproducible:
> Haven't yet figured out how.

Any news? Finding a way to reproduce it would be very useful.
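In case it saves a round-trip: an all-question-marks backtrace like the one above usually means gdb is missing the matching binary and debug symbols. The usual routine on Fedora looks roughly like this (treat the paths as assumptions; the location of the systemd binary differs between releases):

    # install matching debug symbols (debuginfo-install comes from yum-utils)
    su -c 'debuginfo-install systemd'

    # point gdb at the systemd binary that produced the core
    gdb /bin/systemd /path/to/the/core
    (gdb) bt full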
(In reply to comment #2)
> Unfortunately I am unable to get anything useful from the core dump:
>
> (gdb) bt
> #0  0x00b2c424 in __kernel_vsyscall ()
> #1  0x0070539e in ?? ()
> #2  0xbf9b2bc8 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> What are the services initialise-layout-daemon, start-players, and
> layout-daemon? I wanted to test them myself, but I could not find anything
> about them.

They are our internal services, but they do work most of the time - apart from this time, when they caused systemd to crash.

> > How reproducible:
> > Haven't yet figured out how.
>
> Any news? Finding a way to reproduce it would be very useful.

Nothing yet, unfortunately.
I'm getting this with systemd-26-8.fc15.x86_64. I was able to reproduce it by commenting out a line in /etc/fstab, which caused other lines (bind mounts) to fail.

Here's the backtrace:

#0  0x0000003fa340edbb in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x000000000040ab8c in crash (sig=6) at src/main.c:121
#2  <signal handler called>
#3  0x0000003fa30352d5 in __GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4  0x0000003fa3036beb in __GI_abort () at abort.c:93
#5  0x000000000045ea21 in log_assert (file=0x4795e2 "src/manager.c", line=1205, func=0x47af30 "transaction_apply", format=<optimized out>) at src/log.c:496
#6  0x000000000040eb59 in transaction_apply (mode=<optimized out>, m=0x3640830) at src/manager.c:1205
#7  transaction_activate (e=0x0, mode=JOB_ISOLATE, m=0x3640830) at src/manager.c:1342
#8  manager_add_job (m=0x3640830, type=JOB_START, unit=0x37544e0, mode=JOB_ISOLATE, override=<optimized out>, e=0x0, _ret=0x0) at src/manager.c:1692
#9  0x000000000046b848 in unit_trigger_on_failure (u=0x37475e0) at src/unit.c:1120
#10 0x000000000047218b in job_finish_and_invalidate (j=<optimized out>, result=<optimized out>) at src/job.c:613
#11 0x00000000004723a4 in job_finish_and_invalidate (j=<optimized out>, result=JOB_FAILED) at src/job.c:575
#12 0x000000000046bdd9 in unit_notify (u=0x373bdf0, os=UNIT_ACTIVATING, ns=UNIT_FAILED, reload_success=true) at src/unit.c:1200
#13 0x0000000000427854 in mount_set_state (m=0x373bdf0, state=MOUNT_FAILED) at src/mount.c:690
#14 0x0000000000428b9c in mount_sigchld_event (u=0x373bdf0, pid=491, code=1, status=32) at src/mount.c:1266
#15 0x000000000040bd62 in manager_dispatch_sigchld (m=0x3640830) at src/manager.c:2055
#16 0x0000000000411961 in manager_process_signal_fd (m=<optimized out>) at src/manager.c:2254
#17 process_event (ev=0x7fff992f9890, m=0x3640830) at src/manager.c:2279
#18 manager_loop (m=0x3640830) at src/manager.c:2413
#19 0x0000000000408c6a in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:1280

(This is from /core.518, which I can keep around for a while -- the xz-compressed size is 718 KB.)

Here's my /etc/fstab from a run that failed:

/dev/mapper/vg_worm01-LogVol00                         /                ext4    defaults        1 1
UUID=cc6e7172-7f4e-4edf-8f08-a067ce88202e              /boot            ext3    defaults        1 2
/dev/mapper/vg_worm00-LogVol00                         /home            ext4    defaults        1 2
#/dev/mapper/luks-cd4fe06b-cad2-4a42-8925-9d309ca30c62 /mnt/backup      ext3    defaults        1 2
UUID=66412111-7bc5-404f-a072-0f764d8007fa              swap             swap    defaults        0 0
tmpfs                                                  /dev/shm         tmpfs   defaults        0 0
devpts                                                 /dev/pts         devpts  gid=5,mode=620  0 0
sysfs                                                  /sys             sysfs   defaults        0 0
proc                                                   /proc            proc    defaults        0 0
/mnt/backup/mock                                       /var/lib/mock    none    bind            0 0
/mnt/backup/mock-cache                                 /var/cache/mock  none    bind            0 0

As you can see, with the "/mnt/backup" line commented out, the final two lines (the bind mounts) cannot succeed.
Tim,

(In reply to comment #4)
> Here's the backtrace:
> [...]

It shows that the mount unit failed and that, as expected, local-fs.target was considered failed too. local-fs.target has "OnFailure=emergency.target", and the crash happened while trying to isolate emergency.target.

I am still unable to reproduce it on my system though. Can you reproduce it at will?

> (This is from /core.518, which I can keep around for a while -- the
> xz-compressed size is 718 KB.)

Do you still have it? Could you attach it here or send it to me directly?
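For anyone else trying to reproduce this without rearranging /etc/fstab: any failing unit with an OnFailure= pointing at an isolatable target should, in principle, exercise the same code path, provided other jobs are still queued when the failure hits. A hypothetical, untested sketch (the unit name is made up):

    # /etc/systemd/system/failme.service (hypothetical)
    [Unit]
    Description=Deliberately failing unit to drive the OnFailure isolate path
    OnFailure=emergency.target

    [Service]
    Type=oneshot
    ExecStart=/bin/false

Timing likely matters: starting such a unit while many other jobs are running (e.g. during boot) should be more promising than starting it on an idle system.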
Created attachment 528034 [details] core.518.xz
The assertion failed in transaction_apply(), where the list of installed jobs is iterated in order to remove some of the jobs:

        /* When isolating first kill all installed jobs which
         * aren't part of the new transaction */
        HASHMAP_FOREACH(j, m->jobs, i) {
                assert(j->installed);

                if (hashmap_get(m->transaction_jobs, j->unit))
                        continue;

                job_finish_and_invalidate(j, JOB_CANCELED);
        }

Let's take a look at the job "j" for which the assertion failed:

(gdb) p j
$18 = (Job *) 0x396aed0
(gdb) p j->unit->meta.id
$19 = 0x393a5f0 "\001"

This looks like garbage. In fact, the whole Unit structure pointed to by j->unit looks like nonsense.

I wrote a gdb script to dump the whole m->jobs hashmap:

define dump_job_list
  set $iter = ($arg0)->iterate_list_head
  while ($iter != 0)
    set $pjob = (Job*)$iter->value
    printf "Job %p; unit id: %s\n", $pjob, $pjob->unit->meta.id
    set $iter = $iter->iterate_next
  end
end

All the jobs in m->jobs look perfectly alright, with readable names:

(gdb) dump_job_list m->jobs
[...]
Job 0x396a070; unit id: systemd-tmpfiles-setup.service
Job 0x396a710; unit id: systemd-random-seed-load.service
Job 0x396b070; unit id: cryptsetup@luks\x2d2a3bee45\x2d582c\x2d4045\x2d9042\x2d2e48d5febf50.service
Job 0x396b190; unit id: dev-disk-by\x2duuid-2a3bee45\x2d582c\x2d4045\x2d9042\x2d2e48d5febf50.device
Job 0x396b2f0; unit id: dev-mapper-luks\x2d2a3bee45\x2d582c\x2d4045\x2d9042\x2d2e48d5febf50.device
[...]

The listed job addresses increase monotonically, and there is no job corresponding to "j" (address 0x396aed0). The m->jobs hashmap itself is consistent, so how did that address end up in "j"?

The answer is in the way HASHMAP_FOREACH operates. It is written to be safe against the removal of the current entry: the iterator "i" always points to the entry that will be retrieved next. This also means that HASHMAP_FOREACH is NOT safe against the removal of the _next_ entry. job_finish_and_invalidate(j, JOB_CANCELED) can remove not only the current job, but also, recursively, other jobs. If one of those is the entry the iterator "i" points to, the next pass of the loop reads freed memory. This explains the crash.
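To make the failure mode concrete, here is a minimal standalone sketch - a toy singly linked list, not systemd's actual hashmap code - of an iteration that fetches the next element before the body runs, the way HASHMAP_FOREACH does. Removing the current element from the body is safe; removing the next one leaves the pre-fetched pointer dangling. Compile with gcc -fsanitize=address to see the use-after-free reported on the second pass:

    #include <stdio.h>
    #include <stdlib.h>

    struct job {
            int id;
            struct job *next;
    };

    int main(void) {
            /* Build a list of three fake jobs: 0 -> 1 -> 2. */
            struct job *head = NULL, **tail = &head;
            for (int i = 0; i < 3; i++) {
                    struct job *j = calloc(1, sizeof *j);
                    j->id = i;
                    *tail = j;
                    tail = &j->next;
            }

            /* HASHMAP_FOREACH-style iteration: "next" plays the role of
             * the iterator "i" and is fetched before the body runs. */
            struct job *next;
            for (struct job *cur = head; cur; cur = next) {
                    next = cur->next;

                    printf("visiting job %d\n", cur->id);

                    /* Simulate job_finish_and_invalidate() recursively
                     * cancelling another job: visiting job 0 frees job 1,
                     * which is exactly what "next" points to. On the
                     * following pass "cur" is dangling - the analogue of
                     * the garbage Job* seen in the core dump. */
                    if (cur->id == 0) {
                            free(cur->next);
                            cur->next = NULL;
                    }
            }
            return 0;
    }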
Upstream fix: http://cgit.freedesktop.org/systemd/commit/?id=563ba9ea6e60774086555998b957edf923e24b46
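The commit is the authoritative reference; as a rough sketch of the idea, expressed against the loop quoted above, the scan is restarted whenever a cancellation may have invalidated other jobs, so a possibly stale iterator is never dereferenced. This simplified variant restarts unconditionally, which is heavier-handed than the real fix needs to be:

        /* When isolating first kill all installed jobs which
         * aren't part of the new transaction */
    rescan:
        HASHMAP_FOREACH(j, m->jobs, i) {
                assert(j->installed);

                if (hashmap_get(m->transaction_jobs, j->unit))
                        continue;

                /* "j" itself is safe to remove, but the recursive
                 * invalidation may have freed the entry the iterator
                 * points to - so throw the iterator away and rescan. */
                job_finish_and_invalidate(j, JOB_CANCELED);
                goto rescan;
        }

This terminates because each restart removes at least one job from m->jobs, and jobs that are part of the new transaction are skipped, so eventually a full pass finds nothing left to cancel.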
systemd-37-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/FEDORA-2011-14239
systemd-26-12.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/FEDORA-2011-14203
Package systemd-37-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-37-2.fc16'
as soon as you are able to, then reboot.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14239
then log in and leave karma (feedback).
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
systemd-26-13.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
systemd-37-3.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.