Bug 1415295 - lsyncd.service: Main process exited, code=exited, status=255/n/a
Summary: lsyncd.service: Main process exited, code=exited, status=255/n/a
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: lsyncd
Version: 24
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Jason Taylor
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-20 19:34 UTC by Charles R. Anderson
Modified: 2017-02-27 22:52 UTC (History)
8 users (show)

Fixed In Version: lsyncd-2.2.2-1.fc24 lsyncd-2.2.2-1.fc25
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-27 22:48:40 UTC
Type: Bug


Attachments (Terms of Use)
lsyncd.conf (2.47 KB, text/plain)
2017-01-20 19:40 UTC, Charles R. Anderson
no flags Details
lsyncd.log (77.98 KB, text/plain)
2017-01-20 19:47 UTC, Charles R. Anderson
no flags Details
lsyncd config test (1.08 KB, text/x-vhdl)
2017-01-21 14:31 UTC, Jason Taylor
no flags Details
testcase: lsynctest.conf — lsync config file used to reproduce bug (408 bytes, text/plain)
2017-01-22 10:13 UTC, "FeRD" (Frank Dana)
no flags Details

Description Charles R. Anderson 2017-01-20 19:34:45 UTC
Description of problem:

lsyncd.service fails.  This happens at some point in time after the initial rsyncs are finished.  It was about 30 minutes after bootup in this case.

The message from systemd is:

● lsyncd.service - Live Syncing (Mirror) Daemon
   Loaded: loaded (/usr/lib/systemd/system/lsyncd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-01-20 14:07:36 EST; 13min ago
  Process: 1064 ExecStart=/usr/bin/lsyncd -nodaemon $LSYNCD_OPTIONS (code=exited, status=255)
 Main PID: 1064 (code=exited, status=255)

Version-Release number of selected component (if applicable):
lsyncd-2.2.1-1.fc24.x86_64

How reproducible:
always

Steps to Reproduce:
1. reboot or manually (re)start lsyncd.service
2. wait for lsyncd to finish the initial rsyncs
3. after some time, perhaps 30-60 minutes, lsyncd exits with code 255

Actual results:
lsyncd service is in failed state

Expected results:
lsyncd service keeps running in normal active state

Additional info:

Full output from "systemctl status lsyncd":

● lsyncd.service - Live Syncing (Mirror) Daemon
   Loaded: loaded (/usr/lib/systemd/system/lsyncd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2017-01-20 14:07:36 EST; 13min ago
  Process: 1064 ExecStart=/usr/bin/lsyncd -nodaemon $LSYNCD_OPTIONS (code=exited, status=255)
 Main PID: 1064 (code=exited, status=255)

Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/sessionstore-backups/recovery.js.tmp
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-shm
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-shm
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-wal
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-wal
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite
Jan 20 14:07:36 hostname lsyncd[1064]: 14:07:36 Normal: Finished a list after exitcode: 0
Jan 20 14:07:36 hostname systemd[1]: lsyncd.service: Main process exited, code=exited, status=255/n/a
Jan 20 14:07:36 hostname systemd[1]: lsyncd.service: Unit entered failed state.
Jan 20 14:07:36 hostname systemd[1]: lsyncd.service: Failed with result 'exit-code'.

Comment 1 Charles R. Anderson 2017-01-20 19:37:39 UTC
This may be the same bug as #1196710 Failure with a list width exitcode = 2.  I just checked the lsync log and it has this at the end:

/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/5C7DD8CEFC3DA1517138DD8C7AF8AE57862EBBEE
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/12727A8AA6DB17B6F46A446B9D783D7FBB122D5E
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/55AD675C4FC6E4658DC1B97AF24DF464B7D8A8A8
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/AE800BE3D43F2B1829D865A1DA506B118F33C9D8
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/D433919A2D95FDB7D61AEA753F86DE7CAE72386F
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/E44B7E34C6A535D8D7ED04F1CA15E16BFBD297C4
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/E1B92D58525CDAB8AA41123359BD4C86F65C6081
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/CE31F1A0EBA42BC45F6C5D4C360A21D0F74050F5
/home/cra/.cache/mozilla/firefox/7u6riz94.default/cache2/entries/B504B08C1DDC0FCF571A7FDF3824FBD67732B458
Fri Jan 20 14:35:11 2017 Error: Failure with a list with exitcode = 2
Fri Jan 20 14:35:11 2017 Error: Critical exitcode.

Comment 2 Charles R. Anderson 2017-01-20 19:40:07 UTC
Created attachment 1242971 [details]
lsyncd.conf

Comment 3 Charles R. Anderson 2017-01-20 19:47:13 UTC
Created attachment 1242972 [details]
lsyncd.log

Log showing entire lsyncd run from initial startup to failure with a list with exitcode = 2.

Comment 4 Charles R. Anderson 2017-01-20 23:42:28 UTC
Here is more detail from the journal.  It looks like the problem is cause by "ABORTING due to invalid path from sender: home/cra/.cache/mozilla/firefox/7u6riz94.default/safebrowsing-backup/base-track-d".

Jan 20 14:07:36 hostname lsyncd[1064]: file has vanished: "/home/cra/.cache/mozilla/firefox/7u6riz94.default/safebrowsing-to_delete"
Jan 20 14:07:36 hostname lsyncd[1064]: file has vanished: "/home/cra/.cache/mozilla/firefox/7u6riz94.default/safebrowsing-backup"
Jan 20 14:07:36 hostname lsyncd[1064]: file has vanished: "/home/cra/.cache/mozilla/firefox/7u6riz94.default/safebrowsing-to_delete"
Jan 20 14:07:36 hostname lsyncd[1064]: file has vanished: "/home/cra/.cache/mozilla/firefox/7u6riz94.default/safebrowsing-backup"
Jan 20 14:07:36 hostname lsyncd[1064]: file has vanished: "/home/cra/.cache/mozilla/firefox/7u6riz94.default/safebrowsing-to_delete"
Jan 20 14:07:36 hostname lsyncd[1064]: ABORTING due to invalid path from sender: home/cra/.cache/mozilla/firefox/7u6riz94.default/safebrowsing-backup/base-track-d
Jan 20 14:07:36 hostname lsyncd[1064]: rsync error: protocol incompatibility (code 2) at generator.c(1271) [generator=3.1.2]
Jan 20 14:07:36 hostname lsyncd[1064]: 14:07:36 Error: Failure with a list with exitcode = 2
Jan 20 14:07:36 hostname lsyncd[1064]: 14:07:36 Error: Critical exitcode.
Jan 20 14:07:36 hostname lsyncd[1064]: .default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-wal
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-shm
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-shm
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-shm
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-wal
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-wal
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/sessionstore-backups/recovery.js.tmp
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/sessionstore-backups/recovery.js.tmp
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/sessionstore-backups/recovery.js
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/sessionstore-backups/recovery.bak
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/sessionstore-backups/recovery.js.tmp
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-shm
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-shm
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-wal
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite-wal
Jan 20 14:07:36 hostname lsyncd[1064]: /cra/.mozilla/firefox/7u6riz94.default/storage/permanent/chrome/idb/2918063365piupsah.sqlite
Jan 20 14:07:36 hostname lsyncd[1064]: 14:07:36 Normal: Finished a list after exitcode: 0
Jan 20 14:07:36 hostname systemd[1]: lsyncd.service: Main process exited, code=exited, status=255/n/a
Jan 20 14:07:36 hostname systemd[1]: lsyncd.service: Unit entered failed state.
Jan 20 14:07:36 hostname systemd[1]: lsyncd.service: Failed with result 'exit-code'.

Comment 5 Jason Taylor 2017-01-21 00:09:47 UTC
Hi Charles, Thanks for the filing the report. 

Are you able to add an exclude entry for the transient files? If so, does the lsyncd service stay running?

Comment 6 Charles R. Anderson 2017-01-21 00:11:57 UTC
I'm struggling with the proper exclude syntax.  I can't get it to exlude these files.  Here is what I just tried, but it is still going into /home/cra/.cache

sync {
	default.rsync,
	source = "/home/",
	target = "/mnt/hddbackup/home",
	exclude = {
		"/home/cra/.cache",
		"/home/cra/.cache/*",
		"/cra/.cache",
		"/cra/.cache/*",
		"/.cache",
		"/.cache/*",
		"/*/mnt",
		"/*/.ccache",
		"/*/.buildroot-ccache",
		"/mock/var/cache",
		"/mock/var/lib/mock/*/root"
	},
	rsync = {
		one_file_system = true,
		owner = true,
		group = true,
		perms = true,
		hard_links = true,
		acls = true,
		xattrs = true,
		sparse = true,
		-- delete_excluded = true - unsupported option
	}
}

Comment 7 Jason Taylor 2017-01-21 00:42:29 UTC
It looks like it might be getting picked up in the source "/" sync definition?

Comment 8 Charles R. Anderson 2017-01-21 00:44:08 UTC
(In reply to Jason Taylor from comment #7)
> It looks like it might be getting picked up in the source "/" sync
> definition?

If that's true then one_file_system = true is broken in this version.

Comment 9 Jason Taylor 2017-01-21 00:50:21 UTC
I'll do some more testing and see if I can recreate what you are seeing.

Comment 10 Jason Taylor 2017-01-21 14:31:23 UTC
Created attachment 1243162 [details]
lsyncd config test

Comment 11 Jason Taylor 2017-01-21 14:34:00 UTC
Hi Charles,

I attached the config that I used for testing.

I was unable to reproduce the exit/errors you are seeing with this configuration.

The excludes worked as expected, I was able to restart lsyncd a number of times and run lsyncd for varied times up to and exceeding 60 minutes.

Let me know if there any additional details you see in the logs that may be of help if you continue to run into the problem.

Thanks

JT

Comment 12 "FeRD" (Frank Dana) 2017-01-22 07:37:24 UTC
This appears to be a regression in 2.2.1 — I'm seeing the same issue in the test update, whereas 2.1.5 doesn't exhibit the same behavior.

I also notice that 2.2.1 is listing a lot of duplicate files in the sync logs. For instance, where 2.1.5 might log...

Sun Jan 22 02:24:44 2017 Normal: Rsyncing list
/path/to/dir1
/path/to/dir1/file1
/path/to/dir1/file2
/path/to/dir2
/path/to/dir2/file1
/path/to/file3

2.2.1 often logs...

Sun Jan 22 02:24:44 2017 Normal: Rsyncing list
/path/to/dir1
/path/to/dir1/file1
/path/to/dir1/file1
/path/to/dir1/file2
/path/to/dir1/file2
/path/to/dir2
/path/to/dir2/file1
/path/to/file3
/path/to/file3
/path/to/file3

This makes me SUSPECT that the issue might be a race condition between multiple local rsync processes being fired off, so that when deletions are enabled they're essentially pulling files out from under each other.

I've seen references to a "max processes" config file directive, in lsync's github Issues area... I'm going to look into that and test whether it has any effect on this issue with 2.2.1.

Comment 13 "FeRD" (Frank Dana) 2017-01-22 07:43:39 UTC
Alas, adding "maxProcesses = 1" to my config file's settings {} stanza has no effect, lsyncd-2.2.1 still crashes when attempting to sync deletions.

Comment 14 "FeRD" (Frank Dana) 2017-01-22 10:13:30 UTC
Created attachment 1243306 [details]
testcase: lsynctest.conf — lsync config file used to reproduce bug

Re: comment #7, I can provide steps to reproduce this error. (Ignoring the exclusion possibility, as I'm seeing this with regular file trees — local offline archives of HTML pages — that do need to be synced, so exclusion is not an option.)

1. Store the attached lsyncd config file as /var/tmp/lsynctest.conf
2. `mkdir /var/tmp/lsynctest` on both source and destination machines
3. `lsyncd --nodaemon /var/tmp/lsynctest.conf`
4. Once lsyncd has started up, on the source machine run:
   `cp -R /etc/systemd /var/tmp/lsynctest/`
5. Wait until lsyncd syncs all of the files in /var/tmp/lsynctest/
6. `rm -r /var/tmp/lsynctest/systemd/system/multi-user.target.wants`
7. Wait for lsyncd to attempt to sync the deletion, and... boom.

file has vanished: "/var/tmp/lsynctest/systemd/system/multi-user.target.wants"
ABORTING due to invalid path from sender: systemd/system/multi-user.target.wants/ModemManager.service
rsync error: protocol incompatibility (code 2) at generator.c(1271) [generator=3.1.2]
05:05:09 Error: Failure (list): 2
05:05:09 Error: Critical exitcode.


(NB: In step 4, /etc/systemd could be any other directory tree with a little depth to it. We need a subdirectory populated with multiple files, so we can delete it in step 6.)

Comment 15 "FeRD" (Frank Dana) 2017-01-22 10:15:27 UTC
Whoops, I left my destination machine's .local name in the config file as the host="" parameter. You should edit that accordingly, of course.

Comment 16 "FeRD" (Frank Dana) 2017-01-22 10:51:40 UTC
I've just run this by the upstream developers as well, see:
https://github.com/axkibe/lsyncd/issues/422

Comment 17 Jason Taylor 2017-01-22 17:32:16 UTC
Thanks guys for testing this and reporting the issues.

Thanks Frank, I saw your report in my github feed. I'll follow up with Axel as well if we don't hear anything in the near future.

JT

Comment 18 Jason Taylor 2017-02-04 22:34:44 UTC
Frank or Charles,

Any updates on this? 

Axel responded and it looks like lsyncd is functioning as expected and the error being seen is potentially related to .bashrc[0].

[0]https://github.com/axkibe/lsyncd/issues/422

Comment 19 "FeRD" (Frank Dana) 2017-02-04 22:46:08 UTC
Yeah, I saw that response... I can't deny that environmental factors are a POSSIBILITY, but nevertheless, 2.1.5 exhibits no problems at all with the same rsync version, in the exact same user environment. So, my response to "must be a bad config" would be, "config worked fine with 2.1.5".

But I did want to run some experiments under newly-created, minimal user accounts, to explore/eliminate that possibility. I have not had the chance to do so yet, I'm afraid. I'll see if I can fit it in tomorrow (Sunday) or Monday.

Comment 20 Charles R. Anderson 2017-02-05 07:52:57 UTC
(In reply to "FeRD" (Frank Dana) from comment #19)
> Yeah, I saw that response... I can't deny that environmental factors are a
> POSSIBILITY, but nevertheless, 2.1.5 exhibits no problems at all with the
> same rsync version, in the exact same user environment. So, my response to
> "must be a bad config" would be, "config worked fine with 2.1.5".

Same here.

(In reply to Jason Taylor from comment #18)
> Axel responded and it looks like lsyncd is functioning as expected and the
> error being seen is potentially related to .bashrc[0].

That makes no sense--there is no remote rsyncing happening in my case, and the local rsync runs as root.  I have not modified root's bashrc, but the default one does source /etc/bashrc, so it would seem rsync could be at the mercy of Fedora's default bashrc configuration.  If this truly is causing the problem, it is a very fragile setup and lsyncd should be fixed to not rely on the bashrc configuration.

I have a hunch that this problem has always existed, even with 2.1.5, but that I worked around it by excluding rapidly-changing directories like the mozilla cache and other caches, and that 2.2.1 somehow broke exclusion or one_file_system as I said in comment #6 and comment #7.

Comment 21 Fedora Update System 2017-02-19 13:19:28 UTC
lsyncd-2.2.2-1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-4085621c29

Comment 22 Jason Taylor 2017-02-19 13:22:57 UTC
Charles and Frank,

Axel released version 2.2.2 which should resolve the issues described above. The 2.2.2 packages have been pushed to testing.

Please test and post the outcomes when you all get a chance.

Thanks in advance!

JT

Comment 23 Fedora Update System 2017-02-19 13:27:57 UTC
lsyncd-2.2.2-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2017-0214bf1405

Comment 24 Fedora Update System 2017-02-20 00:49:59 UTC
lsyncd-2.2.2-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-4085621c29

Comment 25 Fedora Update System 2017-02-20 05:22:15 UTC
lsyncd-2.2.2-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-0214bf1405

Comment 26 Charles R. Anderson 2017-02-27 02:26:05 UTC
(In reply to Jason Taylor from comment #22)
> Axel released version 2.2.2 which should resolve the issues described above.
> The 2.2.2 packages have been pushed to testing.

Looks much better..I have been running it for the last 5 days and it hasn't crashed.  Thanks!

Comment 27 Fedora Update System 2017-02-27 22:48:40 UTC
lsyncd-2.2.2-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 28 Fedora Update System 2017-02-27 22:52:58 UTC
lsyncd-2.2.2-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.