Bug 1650289 - some services won't start (timeout) when the server is stopped then started (reboot works)
Status: CLOSED EOL
Product: Fedora
Classification: Fedora
Component: systemd
Version: 29
Hardware: Unspecified
OS: Unspecified
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2018-11-15 18:13 UTC by Dario Lesca
Modified: 2019-11-27 22:38 UTC (History)

Last Closed: 2019-11-27 22:38:15 UTC
Type: Bug


Attachments
some log files before and after "Wants=network-online.target" addition (650.00 KB, application/x-tar)
2018-11-15 18:13 UTC, Dario Lesca

Description Dario Lesca 2018-11-15 18:13:05 UTC
Created attachment 1506198
some log files before and after "Wants=network-online.target" addition

Description of problem:
If I stop and then start my up-to-date qemu/kvm Fedora server,
some services such as bind or gssproxy fail to start and time out.
Even the sshd service is slow to start, although after a few minutes it is ready.

A simple restart of the server does not cause this issue.

Version-Release number of selected component (if applicable):
[root@s-addc after]# cat /etc/redhat-release 
Fedora release 29 (Twenty Nine)
[root@s-addc after]# rpm -q samba gssproxy bind
samba-4.9.2-0.fc29.x86_64
gssproxy-0.8.0-6.fc29.x86_64
bind-9.11.4-10.P2.fc29.x86_64
[root@s-addc after]# uname -rv
4.18.18-300.fc29.x86_64 #1 SMP Mon Nov 12 03:12:14 UTC 2018
[root@s-addc after]# 

How reproducible:
I set up a full Samba AD DC server, then enabled the samba and named services (gssproxy was enabled by default).
I think enabling only named is sufficient to reproduce the error.

Steps to Reproduce:
1. configure and enable samba ad-dc with bind dns
2. enable named service
3. stop the server
4. start the server

Actual results:
The bind and gssproxy services fail to start.

Expected results:
These services start.

Additional info:
I added the following lines to these files:
/usr/lib/systemd/system/{gssproxy,named,sshd}.service

Wants=network-online.target
After=network.target network-online.target

as already done in samba.service, and with that change everything works fine.

Attached log files:
./before/
./before/journalctl-b.txt
./before/journalctl-xe.txt
./before/systemctl-status-gssproxy+named.txt
./after/
./after/systemctl-status-gssproxy+named.txt
./after/journalctl-b.txt
./after/journalctl-xe.txt
./wants=network-online.target-to-gssproxy-named-sshd.service.txt

Comment 1 Harald Reindl 2018-11-19 01:07:53 UTC
hell don't touch unit files below /usr/lib/systemd/system/
this is distribution area

one of the great benefits of systemd is that you can place your local changes and overrides in /etc/systemd/system, and for most cases drop-ins in /etc/systemd/system/servicename.service.d/ are enough to add additional Before/After and similar stuff

Comment 2 Dario Lesca 2018-11-19 10:47:54 UTC
Thanks Harald, I'm sorry for bothering the devel ML.
 
So this is not a bug? named.service failing to start when I stop/start my server is not a unit file bug?

Good to know.
Then I must resolve the issue myself.

What is the best way to do that?

Many thanks

Dario

Comment 3 Harald Reindl 2018-11-19 10:51:03 UTC
normally named doesn't need network-online.target but there are setups where it does, and i explained above how to extend systemd units without touching files below /usr/lib/systemd/system/

Googling "systemd drop-ins" leads to:
https://coreos.com/os/docs/latest/using-systemd-drop-in-units.html

Comment 4 Dario Lesca 2018-11-19 16:35:31 UTC
Thanks for the suggestion.

I did this:

  # mkdir -p /etc/systemd/system/named.service.d
  # printf "[Unit]\nWants=network-online.target\nAfter=network-online.target\n" > /etc/systemd/system/named.service.d/10-want-network-online.conf
  # systemctl daemon-reload
  # systemd-delta --type=extended
  [EXTENDED]   /usr/lib/systemd/system/named.service → /etc/systemd/system/named.service.d/10-want-network-online.conf
  [EXTENDED]   /usr/lib/systemd/system/systemd-udev-trigger.service → /usr/lib/systemd/system/systemd-udev-trigger.service.d/systemd-udev-trigger-no-reload.conf

  2 overridden configuration files found.

  # systemctl enable named --now
  # poweroff

and now when the server starts, the named service starts correctly.
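
The steps above could be wrapped in a small helper so the same drop-in can be written for other units. This is only a sketch: `add_network_online_dropin` and its optional second argument are hypothetical names, and the example writes into a scratch directory rather than the real /etc/systemd/system so it can be tried without touching the system.

```shell
# add_network_online_dropin UNIT [ROOT]
# Hypothetical helper: writes a drop-in that makes UNIT want, and start after,
# network-online.target. ROOT defaults to /etc/systemd/system.
add_network_online_dropin() {
    dropin_dir="${2:-/etc/systemd/system}/$1.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Unit]\nWants=network-online.target\nAfter=network-online.target\n' \
        > "$dropin_dir/10-want-network-online.conf"
}

# Dry run into a scratch directory, then inspect the result.
scratch=$(mktemp -d)
for svc in named sshd; do
    add_network_online_dropin "$svc" "$scratch"
done
cat "$scratch/named.service.d/10-want-network-online.conf"
```

On a real system the helper would be run as root without the second argument, followed by `systemctl daemon-reload`.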

I have masked gssproxy (I do not know what it is used for), and sshd comes online after a while.

Does anyone know why, in some situations, services like sshd and named behave this way?

How can I investigate why this happens in my case?

Many thanks

Comment 5 Harald Reindl 2018-11-19 17:42:11 UTC
you don't need to disable/enable
systemctl daemon-reload && systemctl restart named

the situation mostly arises when the service is configured to listen on a specific IP

httpd: "Listen 0.0.0.0:80" versus "Listen 129.168.196.1:80"
the same goes for sshd when you use "ListenAddress" instead of just configuring "Port"

with 0.0.0.0 the service does a "free bind" and answers on whatever interface a packet comes in on, whereas when you tell a service to listen on a specific address, that address needs to be up at the time the service is started

for named: it typically notices interfaces that come up later when not bound to a specific IP and starts responding there too - i prefer ordering after network, and in cases where openvpn is part of the game even after openvpn, because at the end of the day it's mostly faster answering when all interfaces and routes are present at start
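
One way to check the point about specific listen addresses is to test whether the address is actually assigned before the service starts. The following is a minimal sketch, assuming the one-line-per-address output format of `ip -o addr show` from iproute2; `has_local_addr` is a hypothetical helper name and 192.168.50.10 is a made-up example address.

```shell
# has_local_addr ADDR
# Hypothetical helper: reads `ip -o addr show`-style lines on stdin and
# succeeds if ADDR is assigned to some interface.
has_local_addr() {
    awk -v want="$1" '
        ($3 == "inet" || $3 == "inet6") {
            split($4, parts, "/")        # drop the /prefix length
            if (parts[1] == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Sample line in the assumed format; on a live system use: ip -o addr show
sample='2: eth0    inet 192.168.50.10/24 brd 192.168.50.255 scope global eth0'
if printf '%s\n' "$sample" | has_local_addr 192.168.50.10; then
    echo "address is up"
else
    echo "address is not assigned yet"
fi
```

If the address is missing at the moment the unit starts, a daemon bound to that specific address would be expected to fail, which is the case where ordering the unit after network-online.target helps.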

Comment 6 Dario Lesca 2018-11-20 13:44:27 UTC
I tried setting bind to listen on 0.0.0.0 and removing the override configuration files, but the problem still exists.

Another issue is the slow start of sshd: when the interface is up and I can ping it, I must wait a few minutes (Connection refused) before I can connect to the server via ssh.

On my PC I ran this test:

[lesca@s-scarwall ~]$ while ! ssh root@s-addc; do date; sleep 2; done
Last login: Tue Nov 20 14:02:12 2018 from 192.168.50.254
[root@s-addc ~]# poweroff
[root@s-addc ~]# Connection to s-addc closed by remote host.
Connection to s-addc closed.
mar 20 nov 2018, 14.04.33, CET
ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.04.35, CET
(at this point I have started the machine ...)
ssh: connect to host s-addc port 22: Connection timed out
mar 20 nov 2018, 14.05.40, CET

(at this point the machine is on and I can access via console login)

ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.05.42, CET
ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.05.44, CET
ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.05.46, CET

(and so on ...)

ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.07.20, CET
ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.07.22, CET
ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.07.24, CET
ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.07.26, CET
Last login: Tue Nov 20 14:04:23 2018 from 192.168.50.254
[root@s-addc ~]# 

At this point I am finally in the machine, and in this case the named service did not start because of a timeout, even with the override configuration!

This is the systemd-analyze output:
[root@s-addc ~]# systemd-analyze blame|head
    1min 48.801s named.service
          3.465s lvm2-monitor.service
          2.719s systemd-journal-flush.service
          2.562s systemd-udevd.service
          2.219s samba.service
          1.610s NetworkManager-wait-online.service
          1.552s firewalld.service
          1.412s dracut-initqueue.service
          1.378s sssd.service
          1.169s initrd-switch-root.service
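
The blame output above can be scanned automatically for units that took unusually long. This is a rough sketch, not part of systemd: `slow_units` is a hypothetical helper, the 30-second default threshold is arbitrary, and only the h, min, s and ms duration suffixes are handled.

```shell
# slow_units [SECONDS]
# Hypothetical helper: reads `systemd-analyze blame` lines on stdin and prints
# "<seconds> <unit>" for every unit slower than the threshold (default 30).
slow_units() {
    LC_ALL=C awk -v limit="${1:-30}" '
    {
        total = 0
        # Every field except the last is a duration token; the last is the unit.
        for (i = 1; i < NF; i++) {
            n = $i + 0
            if      ($i ~ /^[0-9.]+h$/)   total += n * 3600
            else if ($i ~ /^[0-9.]+min$/) total += n * 60
            else if ($i ~ /^[0-9.]+s$/)   total += n
            else if ($i ~ /^[0-9.]+ms$/)  total += n / 1000
        }
        if (total > limit) printf "%.3f %s\n", total, $NF
    }'
}

# Two lines taken from the output above; prints "108.801 named.service".
slow_units 30 <<'EOF'
    1min 48.801s named.service
          3.465s lvm2-monitor.service
EOF
```

On a live system the same filter could be fed directly, for example `systemd-analyze blame | slow_units 30`.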

I do not know why sshd does not respond even though it is active.

I retried the same cycle after a while, and this second time the system came up faster and the services started correctly.

[root@s-addc ~]# poweroff
[root@s-addc ~]# Connection to s-addc closed by remote host.
Connection to s-addc closed.
mar 20 nov 2018, 14.14.42, CET
ssh: connect to host s-addc port 22: Connection refused
mar 20 nov 2018, 14.14.44, CET
ssh: connect to host s-addc port 22: Connection timed out
mar 20 nov 2018, 14.15.49, CET
Last login: Tue Nov 20 14:07:28 2018 from 192.168.50.254
[root@s-addc ~]# 

This is the systemd-analyze output for this boot:
[root@s-addc ~]# systemd-analyze blame|head
          4.938s lvm2-monitor.service
          2.758s samba.service
          1.694s firewalld.service
          1.547s NetworkManager-wait-online.service
          1.384s lvm2-pvscan@252:2.service
          1.356s dracut-initqueue.service
          1.294s sssd.service
          1.187s named.service
          1.171s initrd-switch-root.service
           702ms systemd-vconsole-setup.service

How do I get rid of this issue?
Does anyone have a suggestion?
Many thanks

Comment 7 Ben Cotton 2019-10-31 19:14:22 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Ben Cotton 2019-11-27 22:38:15 UTC
Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
