844607 – aiccu treats lack of network connectivity on startup as a fatal error

Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	aiccu
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Pavel Šimerda (pavlix)
QA Contact:	Fedora Extras Quality Assurance
Duplicates (1):	887203

Reported:	2012-07-31 07:36 UTC by Eric Hopper
Modified:	2016-10-19 10:49 UTC
CC List:	14 users
Last Closed:	2014-05-02 12:03:08 UTC
Type:	Bug

Attachments
Patch to retry TIC network connection on net outtage (3.13 KB, patch) 2012-08-07 01:19 UTC, Conrad Meyer
Patch to retry on net outage (4.47 KB, patch) 2012-08-07 01:25 UTC, Conrad Meyer
Patch AICCU to retry with backoff (4.04 KB, patch) 2012-08-07 15:49 UTC, Eric Hopper
Updated: Patch AICCU to retry with backoff (3.96 KB, patch) 2012-08-07 16:01 UTC, Eric Hopper
NetworkManager dispatcher script to handle aiccu (543 bytes, text/plain) 2012-08-09 09:45 UTC, David Waring
Updated: NetworkManager dispatcher script to handle aiccu (333 bytes, application/octet-stream) 2013-03-20 15:56 UTC, Josh Reynolds
Update2: NetworkManager dispatcher script to handle aiccu (267 bytes, text/plain) 2013-03-22 01:27 UTC, Josh Reynolds
startup script for aiccu waiting for chrony sync (2.58 KB, text/plain) 2014-09-07 15:55 UTC, Frank Ansari

Description Eric Hopper 2012-07-31 07:36:20 UTC

Description of problem:
When aiccu starts up, sometimes the network isn't in a state where aiccu can actually reach the outside world. aiccu then crashes and it never starts again.

Also, sometimes aiccu can crash for some other reason, and it doesn't restart then either.

Version-Release number of selected component (if applicable):
aiccu-2007.01.15-12.fc17.x86_64

How reproducible:
On my setup, all the time.

Steps to Reproduce:
1. Boot the computer
2. Check the status of aiccu
3. Notice that it's dead and not coming back
  
Actual results:
No IPv6 tunnel

Expected results:
A working IPv6 tunnel

Additional info:
I've fixed this myself by modifying the aiccu.service file. This is an ugly thing to have to do, but it works.

I added these two lines to the [Service] section:

Restart=always
RestartSec=10

This restarts aiccu if it crashes or fails to start. This is the correct behavior as aiccu is supposed to be the enabling program for a permanent persistent IPv6 tunnel. If it's down, something is wrong and the system should be trying to bring it back up again.

Comment 1 Eric Hopper 2012-07-31 07:37:45 UTC

Oh, I realize that this is related to:

https://bugzilla.redhat.com/show_bug.cgi?id=735538

but the proposed solution there is wrong, and the assertion that the problem can't be fixed is wrong. aiccu is supposed to be maintaining a long-term persistent tunnel. It should be restarted when it crashes.

Comment 2 Fedora Update System 2012-08-01 05:50:41 UTC

aiccu-2007.01.15-14.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/aiccu-2007.01.15-14.fc16

Comment 3 Fedora Update System 2012-08-01 05:50:51 UTC

aiccu-2007.01.15-14.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/aiccu-2007.01.15-14.fc17

Comment 4 Conrad Meyer 2012-08-01 05:52:08 UTC

Added the two lines, pushed updates to F16 and F17:

https://admin.fedoraproject.org/updates/aiccu-2007.01.15-14.fc16
https://admin.fedoraproject.org/updates/aiccu-2007.01.15-14.fc17

Any chance you'd be willing to confirm that this fixes the issue?

Thanks.

Comment 5 Eric Hopper 2012-08-01 17:28:32 UTC

Well, I know the modification to the .service file certainly seems to fix the issue. But I'll see about testing this today on a different box to make sure that the version in updates-testing solves the issue. Because, who knows, maybe I just got lucky, or maybe, somehow, the modification to the .service file didn't get into the repo. :-)

Comment 6 Fedora Update System 2012-08-01 22:27:53 UTC

Package aiccu-2007.01.15-14.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing aiccu-2007.01.15-14.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-11383/aiccu-2007.01.15-14.fc16
then log in and leave karma (feedback).

Comment 7 Conrad Meyer 2012-08-02 00:52:56 UTC

(In reply to comment #5)
> Well, I know the modification to the .service file certainly seems to fix
> the issue. But I'll see about testing this today on a different box to make
> sure that the version in updates-testing solves the issue. Because, who
> knows, maybe I just got lucky, or maybe, somehow, the modification to the
> .service file didn't get into the repo. :-)

Thanks :-)

Comment 8 Lars Seipel 2012-08-03 18:10:32 UTC

Confirming. The update fixed the issue for me on F17.

Comment 9 Eric Hopper 2012-08-03 18:23:05 UTC

SixXS is being really slow about my tunnel approval, and my only connectivity right now to the machine where I originally experienced the issue is through IPv6. :-)

Comment 10 Fedora Update System 2012-08-05 21:23:37 UTC

aiccu-2007.01.15-14.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 11 Jeroen Massar 2012-08-06 07:45:07 UTC

We (SixXS) were fortunately notified of this change, which will cause severe issues for our infrastructure and for the users who get to use this.

It also surprises us again that another distribution is ignorant of the requests of the producer of the code and does not contact the upstream for information on these kinds of changes or, for that matter, any other bug.

> I added these two lines to the [Service] section:
> 
> Restart=always
> RestartSec=10

Thank you for attempting to DDoS the SixXS TIC servers.

By adding this you are causing wrongly configured clients to automatically re-connect to tic.sixxs.net every 10 seconds. When AICCU exits it does so for a reason which it logs. Fix that issue, restarting it does not resolve that problem.


From doc/README in the aiccu source:
8<-------------------------------------------
WARNING: never run AICCU from DaemonTools or a similar automated
'restart' tool/script. When AICCU does not start, it has a reason
not to start which it gives on either the stdout or in the (sys)log
file. The TIC server *will* automatically disable accounts which
are detected to run in this mode. Use 'verbose true' to see more
information which is especially handy when starting fails.
----------------------------------------------->8

As such, undo this patch ASAP, it does not resolve any problem and creates a lot of them for our infrastructure and the users.

The original bug mentioned:

"Also, sometimes aiccu can crash for some other reason, and it doesn't restart then either."

Which crashes? I do not see a crash dump or anything in this report.

Comment 12 Conrad Meyer 2012-08-06 14:25:53 UTC

The common complaint seems to be that aiccu will crash when it cannot contact the network; restarting in 10 seconds seems reasonable, because the network may be up by then. The patch resolves this problem. On network outage, of course, aiccu will not ddos sixxs.

Do you think there are enough wrongly configured clients to ddos?

I don't have a better solution to this and I'm not about to dive into an abandoned, five years dead piece of software. If you have a patch, it is welcome. Otherwise I'll end up orphaning or killing aiccu in Fedora.

Comment 13 Eric Hopper 2012-08-06 14:46:57 UTC

(In reply to comment #11)
> By adding this you are causing wrongly configured clients to automatically
> re-connect to tic.sixxs.net every 10 seconds. When AICCU exits it does so
> for a reason which it logs. Fix that issue, restarting it does not resolve
> that problem.

Your concern is understandable. I did not think it through as carefully as I ought to have. For example, if the username and password are wrong, it does little good to restart it and thousands of people doing this might well cause a DDOS.

But the thing crashes for reasons that it shouldn't. Crashing because the network isn't there when it starts up is not correct behavior. This does not prevent a DDOS of SixXS's server, and causes enormous problems for users. There are numerous situations in which the network may not be fully available when AICCU is started.

I do not know of other cases in which it incorrectly crashes. But if it crashes instead of implementing exponential backoff when the network starts having issues, this is also incorrect behavior.

It should only crash when there is a clear and obvious misconfiguration that can only be fixed by the user.
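
A retry with exponential backoff, as opposed to a fixed 10 second restart, might look roughly like this. It is a sketch, not AICCU code: the function names, the 10 second base delay, the one hour cap and the SLEEP_CMD override are all assumptions chosen for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the delay (in seconds) to wait before
# retry number $1 (0-based). The delay starts at $2 seconds (default 10),
# doubles on every retry and never exceeds $3 seconds (default 3600).
backoff_delay() {
    n=$1
    base=${2:-10}
    cap=${3:-3600}
    delay=$base
    i=0
    while [ "$i" -lt "$n" ] && [ "$delay" -lt "$cap" ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    if [ "$delay" -gt "$cap" ]; then
        delay=$cap
    fi
    echo "$delay"
}

# Hypothetical helper: run the given command until it succeeds, giving
# up after $1 attempts. SLEEP_CMD can replace "sleep" when testing.
retry_with_backoff() {
    max=$1
    shift
    attempt=0
    until "$@"; do
        attempt=$((attempt + 1))
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        ${SLEEP_CMD:-sleep} "$(backoff_delay $((attempt - 1)))"
    done
    return 0
}
```

With these defaults the waits grow as 10, 20, 40, 80 seconds and so on, so a client that keeps failing contacts the server less and less often instead of every 10 seconds. Giving up after a bounded number of attempts still leaves real misconfigurations for the user to fix.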

Comment 14 Jeroen Massar 2012-08-06 15:02:07 UTC

(In reply to comment #12)
> The common complaint seems to be that aiccu will crash when it cannot
> contact the network;

You state 'crash', I do not see a crash report, no log files, no crash dump, no coredump etc.

> restarting in 10 seconds seems reasonable, because the
> network may be up by then.

What about starting the daemon AFTER network connectivity is there and the time is properly synced?

This is what is done with other VPN tools too.

> The patch resolves this problem.

Which exact problem does it 'resolve'? It causes all kinds of problems on external services.

> On network
> outage, of course, aiccu will not ddos sixxs.


> Do you think there are enough wrongly configured clients to ddos?

Yes, otherwise we would not be bothering to stop people from doing this.
And we would not have added extra code to block repeated offenders and notify them of this problem.

We have people who never realize that it is broken, for instance:

2012-03-14_15:02:10 2012-05-22_17:46:43 xxxx 3903819 xxx.xxx.xxx.xxx

and that is likely not even a setting for 10 seconds, and most of the requests are filtered out as they are dropped by the TIC server before even being processed

And that is just one out of currently 2546 examples that are blocked. Even though they get an email they do not fix their system as they will never look in the logs and just think 'oh it is broken'.
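
For a sense of scale, the sample log line above can be turned into an average request rate. The sketch below is only an illustration: it assumes the first two fields are the first and last time the client was seen (in UTC) and that the fourth field, 3903819, is a request count, neither of which is stated in this report. The function names are made up, and `date -d` as used here needs GNU date.

```sh
#!/bin/sh
# Assumption: fields 1 and 2 of the log line are first-seen and
# last-seen timestamps in UTC, and field 4 is a request count.

# Convert "YYYY-MM-DD_HH:MM:SS" to seconds since the epoch (GNU date).
to_epoch() {
    date -u -d "$(echo "$1" | tr '_' ' ')" +%s
}

# Seconds elapsed between the first and the second timestamp.
elapsed_seconds() {
    echo $(( $(to_epoch "$2") - $(to_epoch "$1") ))
}

# Average gap between requests in milliseconds (integer division).
avg_interval_ms() {
    echo $(( $(elapsed_seconds "$1" "$2") * 1000 / $3 ))
}

avg_interval_ms 2012-03-14_15:02:10 2012-05-22_17:46:43 3903819
# prints 1529
```

Under those assumptions the client averaged roughly one request every 1.5 seconds over about 69 days.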

The fun thing is though that when you restart you are generating more log messages and then it will even nicely fill up the disk if you are lucky, as then people complain that it logs too much....
 
> I don't have a better solution to this and I'm not about to dive into an
> abandoned, five years dead piece of software.

"Abandoned, five years dead piece"???

Changes are made regularly and patches from various people are integrated too: http://www.sixxs.net/tools/aiccu/changelog

Thousands of users are using it daily, and various vendors have it included in their commercial routers (DrayTek, Cisco and D-Link, for instance), so it is clearly not dead in any way.


> If you have a patch, it is welcome.

A patch for what? This bug report claims 'crash' while there is none.

> Otherwise I'll end up orphaning or killing aiccu in Fedora.

That seems to be something that is common on distributions that cannot understand the problem at hand.


I'll make a repeat request: please work together with upstreams in resolving problems and make proper bug reports that can actually be looked at.

Comment 15 Jeroen Massar 2012-08-06 15:08:49 UTC

(In reply to comment #13)
> (In reply to comment #11)
> > By adding this you are causing wrongly configured clients to automatically
> > re-connect to tic.sixxs.net every 10 seconds. When AICCU exits it does so
> > for a reason which it logs. Fix that issue, restarting it does not resolve
> > that problem.
> 
> Your concern is understandable. I did not think it through as carefully as I
> ought to have. For example, if the username and password are wrong, it does
> little good to restart it and thousands of people doing this might well
> cause a DDOS.

Bingo. And next to that people's clocks seem to be misconfigured a lot too.

> But the thing crashes for reasons that it shouldn't.

You state crash. Where is the core dump for this? Where is the debug output?

> Crashing because the
> network isn't there when it starts up is not correct behavior.

I think you have the term 'crash' confused with 'logs and exit', which is proper behaviour.

> This does not
> prevent a DDOS of SixXS's server, and causes enormous problems for users.

What 'enormous' problems does AICCU exiting cause?

If there is no network, a wrong NTP time, a wrong password and a myriad of other problems, keeping restarting it won't fix anything.

The user will simply not have IPv6, which is perfectly fine as applications fall back to IPv4 and we have Happy Eyeballs anyway.

> There are numerous situations in which the network may not be fully
> available when AICCU is started.

Then do not start it yet when there is no network.

> I do not know of other cases in which it incorrectly crashes.

You claim 'crashes', again, you likely mean 'log and exit'. Fix the problem it reports.

> But if it
> crashes instead of implementing exponential backoff when the network starts
> having issues, this is also incorrect behavior.

If you start up your machine and you automatically start AICCU and there is no network connectivity, then there is nothing that AICCU can do about that situation. The user has to resolve it and then they can start AICCU.

> It should only crash when there is a clear and obvious misconfiguration that
> can only be fixed by the user.

No working IPv4 connectivity is a misconfiguration. It is a requirement for IPv6-over-IPv4 tunnels.

Misconfigured clock is a misconfiguration, the user needs to configure their clock correctly and likely use NTP.

Wrong username/password etc, is a thing the user needs to fix.


Now, if there is actually a proper argument for this change, then please provide it, but the README clearly states that automatically restarting is not that.


I'll also remind again that we had to find out about this change from a user who noticed it. Nobody ever bothered to cc info (the upstream aiccu author) of this change. And clearly nobody bothered to read the short README.

Comment 16 Eric Hopper 2012-08-06 17:34:04 UTC

(In reply to comment #15)
> I think you have the term 'crash' confused with 'logs and exit', which is
> proper behaviour.

You are correct. And from the perspective of a user, there is little difference. It stops working and there is no apparent reason why. Users don't read log files. System administrators do. The set sometimes overlaps, but not all that often.

> What 'enormous' problems does AICCU exiting cause?

I use IPv6 to get connectivity to machines that are behind a NAT. These are machines that I administrate remotely. The users and/or corporations who own these machines are not capable of diagnosing the problem when AICCU logs and exits, they rely on me to do that.

In my particular case, I do a lot of testing to make sure that AICCU is working before I begin relying on it. So I will not be leaving it running in a state where it's constantly restarting over a problem that requires user intervention to correct.

But, of course, there are many system administrators and users who use this who are not me.

> If there is no network, a wrong NTP time, a wrong password and a myriad of
> other problems, keeping restarting it won't fix anything.
> 
> The user will simply not have IPv6, which is perfectly fine as applications
> fall back to IPv4 and we have Happy Eyeballs anyway.

Unfortunately, no, you happen to be wrong about this.

Modern systems frequently lose and re-acquire network connections. My personal MacBook Pro must lose and re-acquire its network connection 4-5 times a day at least. And it frequently does not have any network connectivity at all for periods of time.

It will automatically re-acquire network connectivity when it can with little or no intervention on my part.

Lack of network connectivity is not sufficient reason to 'log and exit'.

This is similarly true for broken time. My computer sometimes gets out of sync with the correct time by seconds or minutes. But it fixes itself without my intervention because I have it correctly configured to use NTP. Once it re-acquires a network connection and has a bit of time to learn what time it really is, the clock comes right back to being correct.

Neither of these are 'log and exit' situations. They will often fix themselves over time. And in particular, no network connectivity is most definitely not a 'log and exit' situation as retrying will most definitely not cause a DDOS. If there is no network connectivity, no DDOS is possible.

And when users need me to log into their computer remotely to fix something, they have no 'happy eyeballs'. They are, in fact, quite unhappy and will remain so until the problem (which is very rarely network connectivity) is fixed, or at the very least, diagnosed.

> Then do not start it yet when there is no network.

I was thinking through how to do this, and doing so is very non-trivial. There are so many situations in which the network can disappear or not be there when it starts. It's not possible for the system to handle this with a startup script.

Now, if AICCU were hooked into the NetworkManager infrastructure, it would probably be possible to make NetworkManager start AICCU whenever it acquired network connectivity. I do not believe that making this happen is a trivial exercise.

> You claim 'crashes', again, you likely mean 'log and exit'. Fix the problem
> it reports.

As I've explained, it 'logs and exits' in cases where it shouldn't, where the problem it reports will fix itself. Ordinary users cannot be expected to read log files to learn that AICCU got in a tiny little fit over the fact that it couldn't immediately talk to the tic server and decides to commit suicide in a fit of bizarre ennui.

> If you start up your machine and you automatically start AICCU and there is
> no network connectivity, then there is nothing that AICCU can do about that
> situation. The user has to resolve it and then they can start AICCU.

It can retry until there is network connectivity, which will frequently happen with no user intervention whatsoever.

The fact that this change has solved several people's problems is ample demonstration of the fact that yes, indeed, this problem fixes itself with zero user intervention. And, as I've previously stated, retrying in the case of no network connectivity is highly unlikely to result in a DDOS. It only will in situations in which there is connectivity outward, but the incoming packets are not making it back for some reason.

> No working IPv4 connectivity is a misconfiguration. It is a requirement for
> IPv6-over-IPv4 tunnels.

It is not a misconfiguration. It is a common situation that occurs frequently on modern network setups. It often fixes itself.

Heck, the whole TCP protocol was designed with the idea that network connectivity might drop for hours (or days) at a time. And as long as the computers on either end still remember the state of the connection, it's just fine. So the idea that things might be just fine without network connectivity for minutes, hours or days at a time is not a new idea.

> Misconfigured clock is a misconfiguration, the user needs to configure their
> clock correctly and likely use NTP.

This is also something that might well fix itself. Particularly if the machine is configured to use NTP in conjunction with unreliable network connectivity. But I will agree that sometimes this is also a result of user misconfiguration and it will never fix itself.

> Now, if there is actually a proper argument for this change, then please
> provide it, but the README clearly states that automatically restarting is
> not that.

AICCU needs to change to take into account the fact that network connectivity is frequently unavailable or unreliable and handle the situation better. Until then, the only option is to restart. That is, unless AICCU can be clearer about why it exits. It might be possible to not restart in cases that truly are user misconfiguration. systemd is fairly flexible that way.
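
One way to read "systemd is fairly flexible that way" is a restart policy that skips exit codes reserved for configuration errors. The sketch below prints such a drop-in. `Restart=`, `RestartSec=` and `RestartPreventExitStatus=` are real systemd.service options, but exit status 2 meaning "misconfigured" is purely hypothetical, since nothing in this report says AICCU distinguishes its exit codes. The function name and the default values are assumptions as well.

```sh
#!/bin/sh
# Hypothetical helper: print a systemd drop-in for aiccu.service.
#   $1 = seconds to wait between restarts (default 60)
#   $2 = exit status that must NOT trigger a restart (default 2,
#        an invented "configuration error" code)
print_aiccu_dropin() {
    cat <<EOF
[Service]
Restart=on-failure
RestartSec=${1:-60}
RestartPreventExitStatus=${2:-2}
EOF
}

print_aiccu_dropin
```

The output could be saved as a drop-in such as /etc/systemd/system/aiccu.service.d/restart.conf, followed by `systemctl daemon-reload`. It only helps if the daemon really does exit with a distinct status for errors that need user action.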

> I'll also remind again that we had to find out about this change from a user
> who noticed it. Nobody ever bothered to cc info (the upstream
> aiccu author) of this change. And clearly nobody bothered to read the short
> README.

Yes, you've got me (and a few other people) there. I should've read the README, and we should've tried reporting it upstream to see if we could get it fixed there.

Comment 17 Jeroen Massar 2012-08-06 18:29:18 UTC

This is getting long..... and the main issue remains:

PLEASE UNDO THIS PATCH before people start running this code and start hurting SixXS infrastructure!

This bug report is not a real bug report (there are no logs included anywhere, or anything else) and the "fix" is going to hurt SixXS servers.


(In reply to comment #16)
> (In reply to comment #15)
> > I think you have the term 'crash' confused with 'logs and exit', which is
> > proper behaviour.
> 
> You are correct. And from the perspective of a user, there is little
> difference. It stops working and there is no apparent reason why. Users
> don't read log files. System administrators do. The set sometimes overlaps,
> but not all that often.

As there is no generic way of alerting a user of a problem except for said log file, there is nothing better that can be done from the AICCU pov.

What could possibly be done is that something presents a GUI element to the user indicating that they have an 'error' message in their logs and point this out to them.

This is IMHO out of scope of AICCU (which in the form that is released is not a GUI tool) and should be handled by the platform.

> > What 'enormous' problems does AICCU exiting cause?
> 
> I use IPv6 to get connectivity to machines that are behind a NAT. These are
> machines that I administrate remotely. The users and/or corporations who own
> these machines are not capable of diagnosing the problem when AICCU logs and
> exits, they rely on me to do that.
> 
> In my particular case, I do a lot of testing to make sure that AICCU is
> working before I begin relying on it. So I will not be leaving it running in
> a state where it's constantly restarting over a problem that requires user
> intervention to correct.

It should never have to restart. I run AICCU in various remote locations and it does not have a problem running for extremely long times, both in AYIYA and heartbeat mode, behind (a double) NAT and directly connected to the Internet.

If you do have issues where the connectivity breaks while it is running, then describe that situation, gather detailed information and do please report it to info; I do not think that reporting it at the distribution is the right location, though it could be reported there, and then hopefully, finally the distribution's maintainer forwards that report to us.

> But, of course, there are many system administrators and users who use this
> who are not me.

People are different, that happens. Most people are just 'it is broken' and will never fix it. With this fix it will mean that their computer will be hammering every 10 seconds on our TIC servers and they will never notice it.

> > If there is no network, a wrong NTP time, a wrong password and a myriad of
> > other problems, keeping restarting it won't fix anything.
> > 
> > The user will simply not have IPv6, which is perfectly fine as applications
> > fall back to IPv4 and we have Happy Eyeballs anyway.
> 
> Unfortunately, no, you happen to be wrong about this.
> 
> Modern systems frequently lose and re-acquire network connections. My
> personal MacBook Pro must lose and re-acquire its network connection 4-5
> times a day at least. And it frequently does not have any network
> connectivity at all for periods of time.

There should not be a problem with this. If AICCU has started, it will, once, contact the TIC server and gather its configuration details and keep on running.
If it loses connectivity or its local IP address changes, this configuration remains the same, which is why restarting does not have any effect.

> It will automatically re-acquire network connectivity when it can with
> little or no intervention on my part.
> 
> Lack of network connectivity is not sufficient reason to 'log and exit'.

AICCU does not exit after it has retrieved its TIC configuration. It configures the tunnel and keeps on running. If you have reason to believe this is not the case, then please provide details so we can look into this.

Indeed, if you start it while your connectivity is broken, it will log and exit because, well, it is broken and AICCU cannot do anything about that situation.


> This is similarly true for broken time. My computer sometimes gets out of
> sync with the correct time by seconds or minutes. But it fixes itself
> without my intervention because I have it correctly configured to use NTP.
> Once it re-acquires a network connection and has a bit of time to learn what
> time it really is, the clock comes right back to being correct.

If you start AICCU with a broken clock it assumes the computer's clock is broken and the user has not set up NTP or another syncing mechanism. One can very simply do a "rdate <ntpserver>" before starting AICCU to force a time sync and all is fine.

AICCU, or more importantly the server side of the heartbeat protocol and AYIYA allow the time to be off by 2 minutes. Any packets being sent where the clock is off more than that are silently discarded. This is a protocol feature to avoid packet replay attacks.
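
The two minute window described above amounts to a simple comparison. A minimal sketch, assuming both timestamps are already available as seconds since the epoch (how a reference time is obtained is left out); the function name is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: succeed when the local clock is within the
# allowed tolerance of a reference clock.
#   $1 = local time, seconds since the epoch
#   $2 = reference time, seconds since the epoch
#   $3 = tolerance in seconds (default 120, the two minutes above)
clock_within_tolerance() {
    skew=$(( $1 - $2 ))
    if [ "$skew" -lt 0 ]; then
        skew=$(( 0 - skew ))
    fi
    [ "$skew" -le "${3:-120}" ]
}

# Example: a clock that is 90 seconds behind the reference still passes.
if clock_within_tolerance 1000 1090; then
    echo "clock OK"
else
    echo "clock off by more than the tolerance"
fi
```

A startup script could run a check like this and refuse to start the daemon, with a clear message, when the skew is too large.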

In a similar way you need proper time for SSL certificates. As such, if your clock is wrong, your system is broken. There is nothing AICCU can do about that.


> Neither of these are 'log and exit' situations.

On startup it does this as AICCU cannot resolve it. When it is already running it will keep on running (as it then has no time reference to compare against which it gets from the TIC server), it just silently will break then though.

> They will often fix
> themselves over time.

There are large swaths of SixXS users who have broken clocks. By notifying them at start time they have been fixing it, instead of just sitting there dumbfounded with "it does not work" on their face.

> And in particular, no network connectivity is most
> definitely not a 'log and exit' situation as retrying will most definitely
> not cause a DDOS. If there is no network connectivity, no DDOS is possible.

No network could be a lot of things, amongst others that the service is already overloaded by other clients etc. Restarts are not good.

As a small example: this bugzilla instance will not be happy if an automated tool starts creating new tickets every 10 seconds complaining about the fact that this "fix" is not accepted by the authors and runners of the SixXS service.
(it is really good that we are not that kind of people...)
Even though that client reporting those bugs automatically is broken in that it repeats it does not resolve anything. Yes, bad example.


> And when users need me to log into their computer remotely to fix something,
> they have no 'happy eyeballs'.

Please actually look up what Happy Eyeballs is (hint: RFC6555)

They have this on the client side and that should fall back to IPv4.

> They are, in fact, quite unhappy and will
> remain so until the problem (which is very rarely network connectivity) is
> fixed, or at the very least, diagnosed.

If they read the log file they will know the problem and then it can be fixed.
Nothing AICCU can do about these situations.

> > Then do not start it yet when there is no network.
> 
> I was thinking through how to do this, and doing so is very non-trivial.
> There are so many situations in which the network can disappear or not be
> there when it starts. It's not possible for the system to handle this with a
> startup script.

Doing something like the following, once, at startup:
---------
# Wait until the probe host answers before going on
until fping -q bugzilla.redhat.com
do
   echo "No network yet, sleeping"
   sleep 5
done

rdate ntp.redhat.com

start aiccu
--------

would do the trick perfectly fine.

Though that indeed does not guarantee that you actually have connectivity to either the TIC servers or even the PoP, it should be quite fine.


> Now, if AICCU were hooked into the NetworkManager infrastructure, it would
> probably be possible to make NetworkManager start AICCU whenever it acquired
> network connectivity. I do not believe that making this happen is a trivial
> exercise.

Likely no, and not every platform runs with or even supports Network Manager.

Also, do note that AICCU only needs to be started exactly ONCE, it will then retrieve config with TIC and all will be happy.


> > You claim 'crashes', again, you likely mean 'log and exit'. Fix the problem
> > it reports.
> 
> As I've explained, it 'logs and exits' in cases where it shouldn't, where
> the problem it reports will fix itself.

If the problem is there, it did not fix itself. As you state above you do not know how to easily resolve it at start time.

Proper suggestions are more than welcome though.

> Ordinary users cannot be expected to
> read log files to learn that AICCU got in a tiny little fit over the fact
> that it couldn't immediately talk to the tic server and decides to commit
> suicide in a fit of bizarre ennui.

If you misconfigure any other service it will also log and exit, that is how services work.

Even on Windows the pretty IIS service will log and fail and you will have to check the log files in that case.

> > If you start up your machine and you automatically start AICCU and there is
> > no network connectivity, then there is nothing that AICCU can do about that
> > situation. The user has to resolve it and then they can start AICCU.
> 
> It can retry until there is network connectivity, which will frequently
> happen with no user intervention whatsoever.

Retry sending packets to a destination that is determined to not work.
That does not resolve the problem.

> The fact that this change has solved several people's problems is ample
> demonstration of the fact that yes, indeed, this problem fixes itself with
> zero user intervention.

I only see 1 "people" claiming that this fixes their problem. Though what exactly the problem is is still unknown as no log files, crash dumps or actually any other information has been provided in this bug report.

The only thing is that 1 person "tested" this and it "worked" for them. It does affect every single other user of SixXS.

I know of, now already again, 2563 people who have not fixed anything (and they all got an email which they are obviously ignoring...).

> And, as I've previously stated, retrying in the case
> of no network connectivity is highly unlikely to result in a DDOS.

SixXS serves over 20000 users who are using our TIC servers. Your change causes it to try to contact our TIC servers every 10 seconds, that is thus possibly 20000 useless connections per second, not even speaking about the used server resources by that.

We rather serve users who really are able to properly connect and retrieve details.

> It only
> will in situations in which there is connectivity outward, but the incoming
> packets are not making it back for some reason.

Or when the clock is misconfigured, or when the username/password is broken and lots of other situations.


> > No working IPv4 connectivity is a misconfiguration. It is a requirement to
> > do IPv6 over IPv4 tunnels over.
> 
> It is not a misconfiguration. It is a common situation that occurs
> frequently on modern network setups. It often fixes itself.


 
> Heck, the whole TCP protocol was designed with the idea that network
> connectivity might drop for hours (or days) at a time.

Great example. TCP connections stop working (disconnect) when there is no SYN/ACK for a while from the remote peer. Hence why things like MOSH and DTN exist.

Also note that TCP does not automatically retry to build things up. The application layer takes care of that.

I guess you have used SSH and tried to swap from one network (eg wifi) to another (eg wired) once. Did your connectivity break? Yep.


> And as long as the
> computers on either end still remember the state of the connection, it's
> just fine. So the idea that things might be just fine without network
> connectivity for minutes, hours or days at a time is not a new idea.

If TCP did that, neither MOSH nor DTN would exist. You came up with a very bad example which is totally unrelated to any problem you are trying to describe.

 
> > Misconfigured clock is a misconfiguration, the user needs to configure their
> > clock correctly and likely use NTP.
> 
> This is also something that might well fix itself.

The 'might' is the key. Lots of people do not set up their NTP correctly.

> Particularly if the
> machine is configured to use NTP in conjunction with unreliable network
> connectivity. But I will agree that sometimes this is also a result of user
> misconfiguration and it will never fix itself.

Bingo. And that is why AICCU notifies the user in a log and exits. The user can then set up NTP properly and the problem is resolved.

 
> > Now, if there is actually a proper argument for this change, then please
> > provide it, but the README clearly states that automatically restarting is
> > not that.
> 
> AICCU needs to change to take into account the fact that network
> connectivity is frequently unavailable or unreliable and handle the
> situation better.

Once AICCU is started successfully it should keep on working fine.

> Until then, the only option is to restart.

No, the only option is to resolve the issue. If the problem is not happening at start time, then please provide logs and details so that the actual issue can be researched and resolved properly.

Please note that the popular "restart it and it works again" way is not a proper bug fix. You will have to define the actual bug first though.

> That is, unless AICCU can be clearer about why it exits.

As you have not shown a single log file entry or crash dump it is completely unknown what could be made clearer.

It is obvious from this "bug report" that even the maintainers do not read simple doc/README's.

> It might be possible to not restart
> in cases that truly are user misconfiguration. systemd is fairly flexible
> that way.

What problem does or would systemd resolve?


> > I'll also remind again that we had to find out about this change from a user
> > who noticed it. Nobody ever bothered to cc info (the upstream
> > aiccu author) of this change. And clearly nobody bothered to read the short
> > README.
> 
> Yes, you've got me (and a few other people) there. I should've read the
> README, and we should've tried reporting it upstream to see if we could get
> it fixed there.

Start first by reversing this broken fix for a problem that is undefined.

Then provide log output and problem descriptions.

Instead of this thread we could have likely already determined the cause of your real issue and resolved it.

Comment 18 Eric Hopper 2012-08-06 21:32:12 UTC

There is indeed still a problem with AICCU, and it is doing the wrong thing. It aborts over a problem that causes no issues for SixXS, is a problem that fixes itself, and for which it isn't reasonable to figure out a decent way of handling it.

But I will agree that your DDoS concerns are quite valid, and this patch should likely be pulled. :-/

But, AICCU should stop 'logging and exiting' on trivial issues that will solve themselves with no user intervention.

It is true that DNS will abort if the config file is wrong. But it doesn't abort if it can't contact a root server on startup, it simply tries until it can. Numerous other services behave in the same way. A problem that is likely transient is treated as something to possibly complain about, but not abort over. A problem in the config file that results in permanent breakage causes errors in the log file and an abort. Even a problem in the config file that is merely questionable or easily worked around will often result in logging, but not exiting.

I will keep the patch applied on my servers regardless as it causes an AICCU installation that was broken (yes, AICCU IS BROKEN if it aborts on startup when not able to connect to the network instantly) to actually work. But it is the wrong fix to be rolled out in general. The right fix is modifying AICCU to actually do the right thing and not abort on transient errors. I may look into that myself.

Comment 19 Jeroen Massar 2012-08-06 21:45:31 UTC

(In reply to comment #18)
> There is indeed still a problem with AICCU, and it is doing the wrong thing.
> It aborts over a problem that causes no issues for SixXS, is a problem that
> fixes itself, and for which it isn't reasonable to figure out a decent way
> of handling it.

Please define your exact problem, provide logs and the exact situation and file a bug report to info. I see exactly 0 emails from you there.

> But I will agree that your DDoS concerns are quite valid, and this patch
> should likely be pulled. :-/
> 
> But, AICCU should stop 'logging and exiting' on trivial issues that will
> solve themselves with no user intervention.

They are not trivial and most users will not fix them.

AICCU also cannot fix them as it is a problem of other parts of the system (system clock, network connectivity) that it has no effect on.


> It is true that DNS will abort if the config file is wrong. But it doesn't
> abort if it can't contact a root server on startup, it simply tries until it
> can.

I don't know which "DNS" you mean, but likely you are talking about a recursive DNS server. Please note that various recursive servers do abort in that situation, cause startup delays etc. Sendmail is a notorious one: when DNS/connectivity does not work, it will not start and will delay the rest of the boot process until it times out.


> Numerous other services behave in the same way. A problem that is
> likely transient is treated as something to possibly complain about, but not
> abort over.

We made the decision to let AICCU abort as then hopefully the user reads the log and resolves the problem.

If the user has resolved the problem a start will work. Typically this means they have finally configured their clock correctly, set their username/password correctly etc.

VPN-alike tools (which AICCU effectively is) are not meant for full unattended operation. OpenVPN, tinc etc all abort when they cannot reach their initial server connection for the same reason.

> A problem in the config file that results in permanent breakage
> causes errors in the log file and an abort. Even a problem in the config
> file that is merely questionable or easily worked around will often result
> in logging, but not exiting.

For AICCU, TIC is the configuration, without those details it cannot do anything.

Note also that for a static tunnel configuration AICCU will exit, as it has nothing to do anymore. Your patch causes it now to restart every 10 seconds even though the tunnel is likely perfectly correctly configured.

 
> I will keep the patch applied on my servers regardless as it causes an AICCU
> installation that was broken (yes, AICCU IS BROKEN if it aborts on startup
> when not able to connect to the network instantly) to actually work.

It is your opinion of broken.

> But it
> is the wrong fix to be rolled out in general. The right fix is modifying
> AICCU to actually do the right thing and not abort on transient errors. I
> may look into that myself.

Unless you are going to make AICCU fix the network, fix the clock (which it could in theory if one would force the clock to be set from the date by the TIC server), fix username/password, fix any weird connection settings etc, there is not much to be done here.

It's a VPN tool, not a magic fix it tool, there are too many failure scenarios to cover to make that work.

Comment 20 Eric Hopper 2012-08-06 22:21:53 UTC

(In reply to comment #19)
> Please define your exact problem, provide logs and the exact situation and
> file a bug report to info. I see exactly 0 emails from you there.

Given this exchange, I have absolutely 0 confidence that you are in the least bit interested in fixing anything. I expect that my email will be silently eaten and I'll never hear another word, or a long tirade of abuse for having a problem with how it works.

As it stands, your software fails to work for me as it does not provide connectivity absent user intervention and there is simply no reasonable way for it to do so. With my patch, it works for what I need it to do.

> They are not trivial and most users will not fix them.

They, in fact, are completely trivial and transient, and they already fix themselves with no user intervention. Otherwise I, and the other person who claimed that it fixed their problem would not have had it fix their problem.

> AICCU also cannot fix them as it is a problem of other parts of the system
> (system clock, network connectivity) that it has no effect on.

It can't 'fix' them no. But it can avoid aborting on something that will trivially fix itself. The clock issue is very tricky, but a lack of network connectivity is easy. Looking at the code I already have a plan for how to solve that issue that won't cause SixXS any problems.

> Note also that for a static tunnel configuration AICCU will exit, as it has
> nothing to do anymore. Your patch causes it now to restart every 10 seconds
> even though the tunnel is likely perfectly correctly configured.

Yep, that is a huge issue. And good reason for this patch to be immediately reverted.

> Unless you are going to make AICCU fix the network, fix the clock (which it
> could in theory if one would force the clock to be set from the date by the
> TIC server), fix username/password, fix any weird connection settings etc,
> there is not much to be done here.

I will simply have it back off on certain kinds of connection failure and try again. I will have it give up if it's been trying for a certain length of time without success.

It will start at 5 seconds, and have the delay go up by 50% with each try (up to a max of 300 seconds) and give up after 24 hours or so.

It will abort immediately if the server sends a non-200 response to the initial connection attempt (though it may instead retry in 1200-7200 seconds or so), a time sync problem (until I can think of better behavior) and login failure. I notice that a challenge response system is used instead of cleartext passwords. That's a big positive.

Weird response to the initial connection attempt is problematic as it's possible someone configured it to talk to a server that isn't even a TIC server. And that would be a big problem. I should look at the draft spec you have to see what kind of response the server is supposed to send. Maybe there's something that identifies it positively as a TIC server. Because if it isn't a TIC server, the thing to do is immediately abort.
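The schedule described above (start at 5 seconds, grow the delay by 50% per attempt, cap it at 300 seconds, give up after about 24 hours) could be sketched roughly as follows. This is an illustration in Python, not code from AICCU or from any of the patches attached to this bug; the name `backoff_delays` and its parameters are made up for the sketch.

```python
from typing import Iterator


def backoff_delays(initial: float = 5.0,
                   factor: float = 1.5,
                   max_delay: float = 300.0,
                   give_up_after: float = 24 * 3600.0) -> Iterator[float]:
    """Yield successive retry delays (seconds) until the time budget is spent."""
    elapsed = 0.0
    delay = initial
    # Stop once waiting for the next delay would exceed the total budget.
    while elapsed + delay <= give_up_after:
        yield delay
        elapsed += delay
        # Grow by `factor` each time, but never beyond the cap.
        delay = min(delay * factor, max_delay)


if __name__ == "__main__":
    delays = list(backoff_delays())
    print("first delays:", delays[:5])
    print("attempts before giving up:", len(delays))
```

Under these numbers the delay reaches the 300 second cap after roughly a dozen attempts, so a client that stays offline for a long time settles at one attempt every five minutes rather than one every 10 seconds.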

Comment 21 Conrad Meyer 2012-08-06 23:55:03 UTC

(In reply to comment #14)
> What about you start the daemon AFTER network connectivity is there and the
> time is properly synced.

Again: we don't have a good way of doing this.

> "Abandonded Five years dead piece"???
> 
> Changes are made regularly and patches from various people are integrated
> too: http://www.sixxs.net/tools/aiccu/changelog

Then please, update your source tarball link. When I go to download the source tarball, I get the 2007xxxx tarball; this tells me the latest release hasn't been updated since 2007! Can you understand why I thought it was five years dead now?

> A patch for what? This bug report claims 'crash' while there is none.

A patch to either aiccu or the package itself that delays startup until the network is available.

Comment 22 Eric Hopper 2012-08-06 23:55:53 UTC

(In reply to comment #12)
> The common complaint seems to be that aiccu will crash when it cannot
> contact the network; restarting in 10 seconds seems reasonable, because the
> network may be up by then. The patch resolves this problem. On network
> outage, of course, aiccu will not ddos sixxs.
> 
> Do you think there are enough wrongly configured clients to ddos?
> 
> I don't have a better solution to this and I'm not about to dive into an
> abandoned, five years dead piece of software. If you have a patch, it is
> welcome. Otherwise I'll end up orphaning or killing aiccu in Fedora.

Could you please revert this patch and keep it from being propagated? My contentious conversation with Jeroen Massar has led me to believe that this patch is problematic for SixXS, and also will cause weird problems and sub-optimal behavior for people who are using AICCU to set up static tunnels.

I will work on a patch that makes AICCU more robust about restarting when connecting to the tic server fails for network connectivity reasons. I make no promises about having it done in any particular timeframe. In the meantime, this bug can stay as a suggestion (with lots of caveats) for how to handle the problem until there's a real fix.

Comment 23 Conrad Meyer 2012-08-06 23:56:08 UTC

(In reply to comment #21)
> (In reply to comment #14)
> > "Abandonded Five years dead piece"???
> > 
> > Changes are made regularly and patches from various people are integrated
> > too: http://www.sixxs.net/tools/aiccu/changelog
> 
> Then please, update your source tarball link. When I go to download the
> source tarball, I get the 2007xxxx tarball; this tells me the latest release
> hasn't been updated since 2007! Can you understand why I thought it was five
> years dead now?

Oh, this bit also smells like dead upstream: "Latest UNIX/Console version: 2007.01.15"

Comment 24 Conrad Meyer 2012-08-07 00:07:01 UTC

(In reply to comment #22)
> (In reply to comment #12)
> > The common complaint seems to be that aiccu will crash when it cannot
> > contact the network; restarting in 10 seconds seems reasonable, because the
> > network may be up by then. The patch resolves this problem. On network
> > outage, of course, aiccu will not ddos sixxs.
> > 
> > Do you think there are enough wrongly configured clients to ddos?
> > 
> > I don't have a better solution to this and I'm not about to dive into an
> > abandoned, five years dead piece of software. If you have a patch, it is
> > welcome. Otherwise I'll end up orphaning or killing aiccu in Fedora.
> 
> Could you please revert this patch and keep it from being propagated? My
> contentious conversation with Jeroen Massar has led me to believe that this
> patch is problematic for SixXS, and also will cause weird problems and
> sub-optimal behavior for people who are using AICCU to set up static tunnels.
> 
> I will work on a patch that makes AICCU more robust about restarting when
> connecting to the tic server fails for network connectivity reasons. I make
> no promises about having it done in any particular timeframe. In the
> meantime, this bug can stay as a suggestion (with lots of caveats) for how
> to handle the problem until there's a real fix.

Yeah, that's what I'm doing now. The plan for the patch is, basically retry to fetch configuration from TIC in a loop. On success, exit the loop; on network error, continue; on configuration error, log and exit as per status quo.
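The plan above amounts to classifying each failed TIC fetch as either transient or permanent. A minimal sketch of that loop, assuming a hypothetical `fetch` callable that reports one of three outcomes (none of these names come from AICCU; the real patches are in C and attached to this bug):

```python
import enum
import time
from typing import Callable, Iterable


class Outcome(enum.Enum):
    OK = "ok"
    NETWORK_ERROR = "network error"        # transient: worth retrying
    CONFIG_ERROR = "configuration error"   # permanent: log and exit as before


def fetch_with_retry(fetch: Callable[[], Outcome],
                     delays: Iterable[float],
                     sleep: Callable[[float], None] = time.sleep) -> Outcome:
    """Call fetch() until it succeeds, fails permanently, or the delays run out."""
    outcome = fetch()
    for delay in delays:
        if outcome is not Outcome.NETWORK_ERROR:
            break  # success or configuration error: stop retrying
        sleep(delay)
        outcome = fetch()
    return outcome
```

The caller would start the tunnel on `Outcome.OK` and otherwise log the outcome and exit, so a configuration error still ends the process immediately, and a network error ends it once the supplied delays are exhausted.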

Having a hostile upstream is wonderful.

Comment 25 Jeroen Massar 2012-08-07 00:13:41 UTC

(In reply to comment #20)
> (In reply to comment #19)
> > Please define your exact problem, provide logs and the exact situation and
> > file a bug report to info. I see exactly 0 emails from you there.
> 
> Given this exchange, I have absolutely 0 confidence that you are in the
> least bit interested in fixing anything.

I've asked in this very thread already several times for log files, core dumps and related things. I've not seen these yet.

But indeed, the problem you are describing should be resolved outside of AICCU.

And I've even provided a suggested solution to your problem in this very thread, a solution mind you that can be done outside of AICCU that does not affect other users of AICCU.

> I expect that my email will be
> silently eaten and I'll never hear another word, or a long tirade of abuse
> for having a problem with how it works.

Like everybody else you will be requested to provide actual data instead of saying that things are broken and bad.

> As it stands, your software fails to work for me as it does not provide
> connectivity absent user intervention and there is simply no reasonable way
> for it to do so.

It is designed that way, it is documented that way. If something is broken it reports it and the operator can fix it, AICCU cannot automatically fix these problems.

> With my patch, it works for what I need it to do.

With your patch you are creating problems for a lot of users and it does not fix the real issue that has a cause outside of AICCU.

> > They are not trivial and most users will not fix them.
> 
> They, in fact, are completely trivial and transient, and they already fix
> themselves with no user intervention.

NTP does not automatically get installed or configured correctly when it is not installed yet or configured properly. Clock skew in systems does not resolve itself automatically either. Username/passwords do not fix themselves either.

Lots of things do not fix themselves. Your 'solution' does not fix things, it only hides problems and causes problems for our services.

> Otherwise I, and the other person who
> claimed that it fixed their problem would not have had it fix their problem.

You claim that it fixes things, yet you are completely unable to provide even a single log line showing the problem.

> > AICCU also cannot fix them as it is a problem of other parts of the system
> > (system clock, network connectivity) that it has no effect on.
> 
> It can't 'fix' them no. But it can avoid aborting on something that will
> trivially fix itself.

It cannot know that things will fix themselves. It notifies the operator who can do so though.

> The clock issue is very tricky, but a lack of network
> connectivity is easy.

Yes, it is very easy, if you actually read the messages you wrote you had the answer already.

> > Note also that for a static tunnel configuration AICCU will exit, as it has
> > nothing to do anymore. Your patch causes it now to restart every 10 seconds
> > even though the tunnel is likely perfectly correctly configured.
> 
> Yep, that is a huge issue. And good reason for this patch to be immediately
> reverted.

I do sincerely hope that happens ASAP.

> > Unless you are going to make AICCU fix the network, fix the clock (which it
> > could in theory if one would force the clock to be set from the date by the
> > TIC server), fix username/password, fix any weird connection settings etc,
> > there is not much to be done here.
> 
> I will simply have it back off on certain kinds of connection failure and
> try again. I will have it give up if it's been trying for a certain length
> of time without success.
>
> It will start at 5 seconds, and have the delay go up by 50% with each try
> (up to a max of 300 seconds) and give up after 24 hours or so.

Back-off strategies do not resolve the real issue. It only hides them.

> It will abort immediately if the server sends a non-200 response to the
> initial connection attempt (though it may instead retry in 1200-7200 seconds
> or so), a time sync problem (until I can think of better behavior) and login
> failure. I notice that a challenge response system is used instead of
> cleartext passwords. That's a big positive.
> 
> Weird response to the initial connection attempt is problematic as it's
> possible someone configured it to talk to a server that isn't even a TIC
> server. And that would be a big problem. I should look at the draft spec you
> have to see what kind of response the server is supposed to send. Maybe
> there's something that identifies it positively as a TIC server. Because if
> it isn't a TIC server, the thing to do is immediately abort.

You are trying to second guess the system. Please do not do that.

Comment 26 Conrad Meyer 2012-08-07 00:29:30 UTC

(In reply to comment #25)
> > As it stands, your software fails to work for me as it does not provide
> > connectivity absent user intervention and there is simply no reasonable way
> > for it to do so.
> 
> It is designed that way, it is documented that way. If something is broken
> it reports it and the operator can fix it, AICCU cannot automatically fix
> these problems.

No. Fedora functions in a network-variable environment. It is not really acceptable to just abort on network outage, especially for system daemons. Because you think it is desired behavior, we will have to patch around it. That's all.

> > > They are not trivial and most users will not fix them.
> > 
> > They, in fact, are completely trivial and transient, and they already fix
> > themselves with no user intervention.
> 
> NTP does not automatically get installed or configured correctly when it is
> not installed yet or configured properly. Clock skew in systems does not
> resolve itself automatically either. Username/passwords do not fix
> themselves either.
> 
> Lots of things do not fix themselves. Your 'solution' does not fix things,
> it only hides problems and causes problems for our services.

The one thing you left out -- network connectivity -- does. So we will retry this and fail out when some misconfiguration is detected.

> > Otherwise I, and the other person who
> > claimed that it fixed their problem would not have had it fix their problem.
> 
> You claim that it fixes things, yet you are completely unable to provide
> even a single log line showing the problem.

Why are you so hostile? The reporter has clearly described the problem and the symptoms. Logs won't tell you anything you don't know already; you're just being difficult. Stop it.

> > > AICCU also cannot fix them as it is a problem of other parts of the system
> > > (system clock, network connectivity) that it has no effect on.
> > 
> > It can't 'fix' them no. But it can avoid aborting on something that will
> > trivially fix itself.
> 
> It can not know that things fix itself. It notifies the operator who can do
> so though.

The network can. It is very easy to just wait.

Re-opening for a better patch...

Comment 27 Fedora Update System 2012-08-07 00:32:42 UTC

aiccu-2007.01.15-15.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/aiccu-2007.01.15-15.fc17

Comment 28 Fedora Update System 2012-08-07 00:32:52 UTC

aiccu-2007.01.15-15.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/aiccu-2007.01.15-15.fc16

Comment 29 Eric Hopper 2012-08-07 00:37:16 UTC

(In reply to comment #24)
> Yeah, that's what I'm doing now. The plan for the patch is, basically retry
> to fetch configuration from TIC in a loop. On success, exit the loop; on
> network error, continue; on configuration error, log and exit as per status
> quo.

It also needs different behavior on 'aiccu start' vs. 'aiccu test'. Presumably, if someone is doing 'aiccu test' they will actually be paying attention to what it says and want immediate feedback about what's going wrong. And those features of aiccu should be documented. If someone gets really extra ambitious they could even create a manpage or something!

> Having a hostile upstream is wonderful.

Yeah. This whole thread led me to look things up about SixXS. They have a reputation for being very 'friendly' that way. I'm a little worried my tunnels will be taken down over this. I wish AYIYA (and hence AICCU) wasn't so incredibly useful. *sigh*

Maybe I can convince he.net to run an AYIYA tunnel. :-)

Comment 30 Eric Hopper 2012-08-07 00:39:00 UTC

https://bitbucket.org/omnifarious/aiccu/

Comment 31 Jeroen Massar 2012-08-07 00:49:29 UTC

(In reply to comment #21)
> (In reply to comment #14)
> > What about you start the daemon AFTER network connectivity is there and the
> > time is properly synced.
>
> Again: we don't have a good way of doing this.

See my proposal in this very thread for a very good solution.

> > "Abandonded Five years dead piece"???
> >
> > Changes are made regularly and patches from various people are integrated
> > too: http://www.sixxs.net/tools/aiccu/changelog
>
> Then please, update your source tarball link. When I go to download the
> source tarball, I get the 2007xxxx tarball; this tells me the latest release
> hasn't been updated since 2007! Can you understand why I thought it was five
> years dead now?

There have been no critical bug fixes since 2007 for non-Windows systems; as such,
no new release was made, as none was necessary.

If you as a downstream have issues, ever thought about contacting the upstream?

I see exactly 0 messages from you on info thus you obviously also did
not bother to ask around.

Instead you are applying patches that are supposed to fix things but that are
causing problems for a lot of users as you are not aware of all the implications
of changing things.

> > A patch for what? This bug report claims 'crash' while there is none.
>
> A patch to either aiccu or the package itself that delays startup until the
> network is available.

I proposed a simple solution for this that can be implemented in a very simple
shell script, eg the startup script, see this very thread.

(In reply to comment #22)
> (In reply to comment #12)
> > The common complaint seems to be that aiccu will crash when it cannot
> > contact the network; restarting in 10 seconds seems reasonable, because the
> > network may be up by then. The patch resolves this problem. On network
> > outage, of course, aiccu will not ddos sixxs.
> >
> > Do you think there are enough wrongly configured clients to ddos?
> >
> > I don't have a better solution to this and I'm not about to dive into an
> > abandoned, five years dead piece of software. If you have a patch, it is
> > welcome. Otherwise I'll end up orphaning or killing aiccu in Fedora.
>
> Could you please revert this patch and keep it from being propagated? My
> contentious conversation with Jeroen Massar has led me to believe that this
> patch is problematic for SixXS, and also will cause weird problems and
> sub-optimal behavior for people who are using AICCU to set up static tunnels.

This is indeed one of the many failure scenarios why this patch is very problematic.

> I will work on a patch that makes AICCU more robust about restarting when
> connecting to the tic server fails for network connectivity reasons. I make
> no promises about having it done in any particular timeframe. In the
> meantime, this bug can stay as a suggestion (with lots of caveats) for how
> to handle the problem until there's a real fix.

Unfortunately bad ideas cannot be removed from the Internet. I do hope that
people at least read the whole thread before thinking this is the super fix
for a problem that is not even well described.

(In reply to comment #26)
> (In reply to comment #25)
> > > As it stands, your software fails to work for me as it does not provide
> > > connectivity absent user intervention and there is simply no reasonable way
> > > for it to do so.
> > 
> > It is designed that way, it is documented that way. If something is broken
> > it reports it and the operator can fix it, AICCU cannot automatically fix
> > these problems.
> 
> No. Fedora functions in a network-variable environment. It is not really
> acceptable to just abort on network outage, especially for system daemons.

Then file a bug with OpenVPN, tinc, pppoe and every other VPN-alike tool in the system as they all behave in that manner.

Yes, AICCU is a daemon, but not a standard one. It requires reaching out to external servers and thus without network connectivity and other correctness it cannot function.

> Because you think it is desired behavior, we will have to patch around it.
> That's all.

I'll state again: Clients which are detected to automatically reconnect will be auto disabled.

Please define and resolve the real problem, and do not try to patch around it.

> > > > They are not trivial and most users will not fix them.
> > > 
> > > They, in fact, are completely trivial and transient, and they already fix
> > > themselves with no user intervention.
> > 
> > NTP does not automatically get installed or configured correctly when it is
> > not installed yet or configured properly. Clock skew in systems does not
> > resolve itself automatically either. Username/passwords do not fix
> > themselves either.
> > 
> > Lots of things do not fix themselves. Your 'solution' does not fix things,
> > it only hides problems and causes problems for our services.
> 
> The one thing you left out -- network connectivity -- does. So we will retry
> this and fail out when some misconfiguration is detected.

As you obviously do not realize even half of the problems that can occur, you will only create more problems.

The nicest thing is that we'll get more mail at info and will not be able to help those users as we do not know then what weird changes were made.

We'll have to just redirect them to Redhat then it seems. I guess you'll love helping those people as you seem to know very well how the system works.


> > > Otherwise I, and the other person who
> > > claimed that it fixed their problem would not have had it fix their problem.
> > 
> > You claim that it fixes things, yet you are completely unable to provide
> > even a single log line showing the problem.
> 
> Why are you so hostile? The reporter has clearly described the problem and
> the symptoms. Logs won't tell you anything you don't know already; you're
> just being difficult. Stop it.

There is no clear problem description. There are only rants and accusations.

I suggest reverting this patch and opening a NEW bug report that clearly summarises the problem you claim to have. A fresh start seems very useful.

Then again, likely it will end up in a repeat of all the arguments and questions given here.


> > > > AICCU also cannot fix them as it is a problem of other parts of the system
> > > > (system clock, network connectivity) that it has no effect on.
> > > 
> > > It can't 'fix' them no. But it can avoid aborting on something that will
> > > trivially fix itself.
> > 
> > It can not know that things fix itself. It notifies the operator who can do
> > so though.
> 
> The network can. It is very easy to just wait.

Which is something I proposed in this very thread and which does not require any changes to AICCU itself. It does require a change to the init script though.
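The proposal referred to here appears earlier in the thread and is not reproduced in this comment, so the following is only a generic sketch of the "wait for the network in the startup script" idea, written in Python for illustration rather than as an actual init script. It assumes a Linux host where a default IPv4 route shows up in `/proc/net/route` as an entry whose Destination column is `00000000`; the function names are hypothetical. It checks local state only and sends no traffic to any TIC server.

```python
import time
from typing import Callable


def has_default_ipv4_route(path: str = "/proc/net/route") -> bool:
    """Return True if the kernel routing table lists a default IPv4 route."""
    try:
        with open(path) as table:
            next(table, None)  # skip the header line
            rows = (line.split() for line in table)
            return any(len(fields) > 1 and fields[1] == "00000000" for fields in rows)
    except OSError:
        return False  # table unreadable: treat as "no route yet"


def wait_for(probe: Callable[[], bool],
             attempts: int = 30,
             interval: float = 2.0,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll probe() up to `attempts` times, sleeping `interval` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt + 1 < attempts:
            sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for(has_default_ipv4_route):
        print("network looks up; the startup script would start aiccu here")
    else:
        print("no default route after waiting; the startup script would log and give up")
```

A default route only shows that the interface is configured, not that the TIC server is reachable or that the clock is synced, so AICCU's own log-and-exit checks would still apply after this wait.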


> Re-opening for a better patch...

I do hope you mean with that an instant revert of the current one first...


(In reply to comment #29)
> (In reply to comment #24)
> > Yeah, that's what I'm doing now. The plan for the patch is, basically retry
> > to fetch configuration from TIC in a loop. On success, exit the loop; on
> > network error, continue; on configuration error, log and exit as per status
> > quo.
> 
> It also needs different behavior on 'aiccu start' vs. 'aiccu test'.

Amongst others.

> Presumably, if someone is doing 'aiccu test' they will actually be paying
> attention to what it says and want immediate feedback about what's going
> wrong. And those features of aiccu should be documented. If someone gets
> really extra ambitious they could even create a manpage or something!

Or just type 'aiccu help' which details this.

> > Having a hostile upstream is wonderful.
> 
> Yeah. This whole thread led me to look things up about SixXS. They have a
> reputation for being very 'friendly' that way.

People who abuse the system indeed get what they deserve. One will always piss off one person or another if you reach enough users.

> I'm a little worried my tunnels will be taken down over this.

Wow, you really think that will happen?

We are trying to FIX problems, not make nonsense patches.

We are trying to open dialog, not mess things up.

> I wish AYIYA (and hence AICCU) wasn't so incredibly useful. *sigh*

Isn't it great that somebody made that protocol to solve problems for other people and is providing that service for nearly 10 years already in their spare time?

Please realize that SixXS exists to solve problems, not to cause more of them.

> Maybe I can convince he.net to run an AYIYA tunnel. :-)

They had a PPTP beta service for a while which addressed parts of the problem, it apparently did not scale enough or was crashy (as in core dumps and aborts). Not everybody is able to make a stable service it seems and I guess they needed the /15 IPv4 space for other purposes than giving it away for free.

That said though, they tried to 'buy' SixXS several years ago, guess they didn't realize it was a hobby project that is run for fun and education and general public good...


(In reply to comment #24)
> Yeah, that's what I'm doing now. The plan for the patch is, basically retry
> to fetch configuration from TIC in a loop. On success, exit the loop; on
> network error, continue; on configuration error, log and exit as per status
> quo.

I'll reference it another time: which part of doc/README:
8<--------------------------------
WARNING: never run AICCU from DaemonTools or a similar automated
'restart' tool/script. When AICCU does not start, it has a reason
not to start which it gives on either the stdout or in the (sys)log
file. The TIC server *will* automatically disable accounts which
are detected to run in this mode. Use 'verbose true' to see more
information which is especially handy when starting fails.
------------------------------------------------------------->8

is unclear?...... I am really wondering.

> Having a hostile upstream is wonderful.

What you are proposing is wrong and causes problems to our infrastructure
and thus other users, that is why we react in a, what you perceive as, hostile way.

Having people modify code, distribute it to users and having us take
the cost of more complaints that things are broken is VERY annoying
and time consuming, time better spent on actually resolving matters.

Note also that this 'hostile' upstream contacted the downstream with
the broken patch instead of the other way around and that we are asking
for problem reports and that you report things.

Please instead of trying to come up with improper solutions and insults
towards us, consider that we are spending large amounts of our free time
on the SixXS service (which is not a company, it is just a spare time
hobby project, just like Fedora/Redhat/etc is for others) as a whole
and that all time spent on handling people who break things is time
not spent on resolving other matters. I am fairly sure that you do not
like it if your stuff you spend time on gets broken or misused either.


As such, I'll state for the final time:
 - Please contact info in case of issues
 - Please provide log files and descriptions where possible
   (there is a nice list on http://www.sixxs.net/contact/ of what is useful)
 - In case you think there is something to enhance, do bring it up as otherwise
   the only way we find out what happens is when it is too late.

And if you really are of the opinion that your 'problems will be ignored',
then open a bug here, and cc info on the bugreport, as then
we are in the loop on things and can give proper comments before things
escalate like this here.

Comment 32 Conrad Meyer 2012-08-07 01:19:38 UTC

Created attachment 602614 [details]
Patch to retry TIC network connection on net outtage

Eric Hopper, Jeroen Massar: Attached patch is what I have in mind. Any complaints?

Comment 33 Conrad Meyer 2012-08-07 01:25:53 UTC

Created attachment 602615 [details]
Patch to retry on net outage

Slight modification to previous patch: only log failure to connect once. On eventual success, log an info-level message iff there was a failure message before.
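
For readers without the attachment at hand, the behaviour described here (retry on network error, stop on configuration error, log the failure once and the recovery once) can be sketched in shell roughly as below. This is only an illustration, not the attached C patch: the function name retry_fetch, the exit-code convention (0 = success, 1 = network error, anything else = configuration error) and the RETRY_DELAY variable are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: retry a fetch command until it succeeds, logging sparingly.
# Assumed convention for the command passed in: exit 0 = success,
# exit 1 = network error (retry), any other code = configuration error (give up).

RETRY_DELAY=${RETRY_DELAY:-15}   # seconds between attempts (assumed default)

retry_fetch() {
    fetch_cmd=$1
    failure_logged=0
    while :; do
        rc=0
        "$fetch_cmd" || rc=$?
        case $rc in
            0)
                # Report recovery only if a failure was reported earlier.
                if [ "$failure_logged" -eq 1 ]; then
                    echo "info: connected after earlier failure"
                fi
                return 0
                ;;
            1)
                # Log the first network failure only, then retry quietly.
                if [ "$failure_logged" -eq 0 ]; then
                    echo "warning: connection failed, retrying every ${RETRY_DELAY}s"
                    failure_logged=1
                fi
                sleep "$RETRY_DELAY"
                ;;
            *)
                echo "error: configuration problem (code $rc), not retrying"
                return "$rc"
                ;;
        esac
    done
}
```

Called as `retry_fetch some_fetch_function`, it stays silent when the first attempt succeeds and prints at most two lines otherwise.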

Comment 34 Eric Hopper 2012-08-07 01:30:20 UTC

(In reply to comment #33)
> Created attachment 602615 [details]
> Patch to retry on net outage
> 
> Slight modification to previous patch: only log failure to connect once. On
> eventual success, log an info-level message iff there was a failure message
> before.

That looks good to me. Simple and effective. I was going to get fancier, but really simpler is probably a lot better. If the connect fails there is no real need for exponential backoff because there isn't any connectivity anyway.

Comment 35 Eric Hopper 2012-08-07 01:34:25 UTC

https://bitbucket.org/omnifarious/aiccu/changeset/4d2e1ddb2750bbfd7faac51bbbef1647c2ccaf8f

Comment 36 Conrad Meyer 2012-08-07 01:41:02 UTC

(In reply to comment #34)
> That looks good to me. Simple and effective. I was going to get fancier, but
> really simpler is probably a lot better. If the connect fails there is no
> real need for exponential backoff because there isn't any connectivity
> anyway.

Thanks. Yeah, that is more or less what I'm thinking. 15 seconds isn't going to kill power efficiency / shouldn't cause other issues; it'll respond more quickly to the network coming back online than exp. back-off.

(In reply to comment #35)
> https://bitbucket.org/omnifarious/aiccu/changeset/
> 4d2e1ddb2750bbfd7faac51bbbef1647c2ccaf8f

Looks like the patch got clobbered a little -- there's an extra "tic" at the top of unix-console/main.c that shouldn't be there.

I'll wait for feedback from Jeroen before pushing this further. Given the previous conversation, I shouldn't have to wait long ;).

Comment 37 Eric Hopper 2012-08-07 01:57:23 UTC

(In reply to comment #36)
> Looks like the patch got clobbered a little -- there's an extra "tic" at the
> top of unix-console/main.c that shouldn't be there.

Oops. :-)

https://bitbucket.org/omnifarious/aiccu/changeset/65fab5dc784e9a3a3a5e1283fd79fb8170b73e18

> I'll wait for feedback from Jeroen before pushing this further. Given the
> previous conversation, I shouldn't have to wait long ;).

Likely not, though he might've gone to bed, since I think he lives in Europe somewhere.

Comment 38 Lars Seipel 2012-08-07 05:35:38 UTC

Do you really think patching the aiccu source over this is such a good idea?

After all, it would be useful to have a system-wide way to start services after the network is *really* up. I don't see any trivial solution for this, though.

I (obviously) didn't think this through when I confirmed the objectionable fix. Yeah, it fixed _my_ issue of a correctly configured aiccu failing because it couldn't resolve tic.sixxs.net and then succeeding 10 secs later. But it created many other potential problems including the static tunnel case.

So, now that the original patch is reverted, could we maybe find a solution that is either acceptable for inclusion by aiccu upstream or (even better) a distribution-wide method that allows us to start services after network connectivity is established?

Waiting for Jeroen's feedback...
(it's 7 AM in the morning here in Central Europe ;-))

Comment 39 Conrad Meyer 2012-08-07 06:50:56 UTC

(In reply to comment #38)
> Do you really think patching the aiccu source over this is such a good idea?

Sure. We don't have another good way of determining that the network is up -- systemd folks don't like daemons relying on this -- aside from NetworkManager integration. And upstream appears to dislike this approach, and it's more complicated than our simple patch. The patch improves the usability of aiccu when the only issue is network outage (very likely during early bootup).

> After all, it would be useful to have a system-wide way to start services
> after the network is *really* up. I don't see any trivial solution for this,
> though.

Yes, I agree. Systemd doesn't provide this, though, and NetworkManager integration is non-trivial. I'm a volunteer, not a full time engineer =).

> So, now that the original patch is reverted, could we maybe find a solution
> that is either acceptable for inclusion by aiccu upstream or (even better) a
> distribution-wide method that allows us to start services after network
> connectivity is established?

The second idea isn't really possible. Hopefully the patch is either upstream-able, or can at least be cleaned up in such a way that upstream would accept it. If Jeroen has valid objections, we may drop it. Waiting to hear back from him =).

Comment 40 Jeroen Massar 2012-08-07 13:46:57 UTC

(In reply to comment #36)
> (In reply to comment #34)
> > That looks good to me. Simple and effective. I was going to get fancier, but
> > really simpler is probably a lot better. If the connect fails there is no
> > real need for exponential backoff because there isn't any connectivity
> > anyway.
> 
> Thanks. Yeah, that is more or less what I'm thinking. 15 seconds isn't going
> to kill power efficiency / shouldn't cause other issues; it'll respond more
> quickly to the network coming back online than exp. back-off.

Let's assume the situation that somebody was overloading the TIC server already, or that some network issue causes connections to fail. By doing a connection every 15 seconds, and at the moment there are two published TIC servers, you are doing 8 connections per minute, 8*60 = 480 per hour, 480*24 = 11520 per day.

Guess what, that client will never ever get out of the rate-limited queue because of that.
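
The arithmetic can be checked with a few lines of shell. The function name attempts_per_day is made up for this illustration; the inputs are the 15-second interval and the two published TIC servers mentioned above, assuming each retry round contacts every server once.

```shell
#!/bin/sh
# Sketch: connection attempts per day for a fixed retry interval.
# $1 = retry interval in seconds, $2 = number of servers tried per round.
attempts_per_day() {
    interval=$1
    servers=$2
    rounds=$(( 86400 / interval ))   # retry rounds in 24 hours
    echo $(( rounds * servers ))
}

attempts_per_day 15 2   # prints 11520
```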

Next to that, because there is absolute silence after the first log message, the user will never be able to find out what is going on once that log message has been rotated away, and will just conclude that things are broken...

That is not the way to solve your problem. Please see the proposal I made in this very thread for a much better alternative if you are so insistent to require it to work when starting it during broken network connectivity (and other undefined situations).


(In reply to comment #38)
[..]
> So, now that the original patch is reverted, could we maybe find a solution
> that is either acceptable for inclusion by aiccu upstream or (even better) a
> distribution-wide method that allows us to start services after network
> connectivity is established?

Just to be clear: I am all for a proper solution.

Restarting inside AICCU is not that. See my proposal above which is being ignored...

 
> Waiting for Jeroen's feedback...
> (it's 7 AM in the morning here in Central Europe ;-))

Where I am not at the moment... things with work, travel and private life next to this thing called the Internet.

(In reply to comment #39)
> (In reply to comment #38)
> > Do you really think patching the aiccu source over this is such a good idea?
> 
> Sure. We don't have another good way of determining that the network is up

Ping/access a well known host that should be up is a really effective way of checking this. This is what Apple and Microsoft do with their products, that little thing called a connectivity-check.

One can easily combine this with fetching the current NTP date, if that succeeds you have connectivity and proper time in one go. See my proposal in this thread (search for 'rdate').

> -- systemd folks don't like daemons relying on this -- aside from
> NetworkManager integration. And upstream appears to dislike this approach,

As I have not seen a proposal of how things could be integrated with NetworkManager, there is little I can say if that would be a proper way to resolve your problem or not.


> > After all, it would be useful to have a system-wide way to start services
> > after the network is *really* up. I don't see any trivial solution for this,
> > though.
> 
> Yes, I agree. Systemd doesn't provide this, though, and NetworkManager
> integration is non-trivial. I'm a volunteer, not a full time engineer =).

Most people in open source and even these little services like SixXS who provide AICCU are volunteers and do it in their free time, next to generally being full time engineers which take the rest of their time next to private life etc.

> > So, now that the original patch is reverted, could we maybe find a solution
> > that is either acceptable for inclusion by aiccu upstream or (even better) a
> > distribution-wide method that allows us to start services after network
> > connectivity is established?
> 
> The second idea isn't really possible. Hopefully the patch is either
> upstream-able, or can at least be cleaned up in such a way that upstream
> would accept it. If Jeroen has valid objections, we may drop it. Waiting to
> hear back from him =).

Next to the comments above, there is also a minor nit:
8<------------------
	if (connection_failure_logged) {
		dolog(LOG_INFO, "Succesfully connected to the TIC server %s\n", server);
	}
------------------>8

That does not do what you want it to do, it is missing a !.

But again: it is not the proper way to solve this. Please check my proposal (search for 'rdate') in this thread.

Comment 41 Lars Seipel 2012-08-07 15:17:03 UTC

(In reply to comment #40)
> Restarting inside AICCU is not that. See my proposal above which is being
> ignored...

I noticed your proposal and it should work fine. The thing is it'd require the (re-)introduction of a wrapper-/initscript which Fedora is trying to get rid of.

I might use something along those lines on my own system but I don't know if it's ok for the Fedora package. Conrad, what do you think?

Comment 42 Eric Hopper 2012-08-07 15:19:18 UTC

(In reply to comment #40)
> Lets assume the situation that somebody was overloading the TIC server
> already, or that some network issue causes connections to fail. By doing a
> connection every 15 seconds, and at the moment there are two published TIC
> servers, you are doing 8 connections per minute, 8*60 = 480 per hour, 11520
> per day.

I calculated out the bandwidth requirements. I would be comfortable running a TIC server on my own domain hosted off a DSL line with this code.

> That is not the way to solve your problem. Please see the proposal I made in
> this very thread for a much better alternative if you are so insistent to
> require it to work when starting it during broken network connectivity (and
> other undefined situations).

It is an interesting solution. Maybe that's what should be done instead. For the record, Jeroen is proposing that aiccu is wrapped in a shell script that systemd starts instead of the actual aiccu daemon. It would work like this:

-------------------------------
#!/bin/sh

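# Hosts used for the reachability test; quoted so both names stay in one variable.
pingable_servers="c.root-servers.net f.root-servers.net"

# fping -a prints only the hosts that answered; loop until at least one does.
until [ $(fping -a $pingable_servers | wc -c) -gt 0 ]; do
    logger -p local7.warning "AICCU network connectivity test failed."
    sleep 5
done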

# The suggestion is some means of fetching and setting the date, but there is
# no real way to accomplish this with chrony.
# rdate ntp.redhat.com

exec aiccu start
-------------------------------

This fails to acknowledge that transient network failures are not a reason for aiccu to abort. But it would work. It does add a dependency between aiccu and fping.

I'm also not sure what systemd does with something that calls itself a forking daemon that takes a while to fork.

> Ping/access a well known host that should be up is a really effective way of
> checking this. This is what Apple and Microsoft do with their products, that
> little thing called a connectivity-check.

I do not believe they have system daemons doing this. Their connectivity check is simply so their network GUI status indicators can show the right thing.

> Most people in open source and even these little services like SixXS who
> provide AICCU are volunteers and do it in their free time, next to generally
> being full time engineers which take the rest of their time next to private
> life etc.

See the above line about running a TIC server off my own DSL line with this code. If you want me to, I'll do it. Though I'll say that my ISP isn't always the most reliable.

> Next to the comments above, there is also a minor nit:
> 8<------------------
> 	if (connection_failure_logged) {
> 		dolog(LOG_INFO, "Succesfully connected to the TIC server %s\n", server);
> 	}
> ------------------>8
> 
> That does not do what you want it to do, it is missing a !.

I double-checked. No, it's correct. The idea is that it only reports successful connection if it's previously reported that it's been having problems.

Comment 43 Eric Hopper 2012-08-07 15:49:43 UTC

Created attachment 602791 [details]
Patch AICCU to retry with backoff

This is an update of Conrad's patch to address some of Jeroen's concerns. I implement backoff and eventual connection failure if it retries for 16 hours or so.
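
To give a feel for what capped backoff with a give-up deadline does to the attempt count, here is a small shell sketch. It is not the attached patch: the function name backoff_attempts and the numbers used (15 s first delay, 16 minute cap, 16 hour deadline) are assumptions picked for illustration only.

```shell
#!/bin/sh
# Sketch: count retries under capped exponential backoff with an overall deadline.
# $1 = first delay (s), $2 = maximum delay (s), $3 = give-up deadline (s).
# Prints "<attempts> <seconds waited>". The values passed below are illustrative.
backoff_attempts() {
    delay=$1
    max_delay=$2
    deadline=$3
    waited=0
    attempts=0
    while [ "$waited" -lt "$deadline" ]; do
        attempts=$(( attempts + 1 ))      # one connection attempt
        waited=$(( waited + delay ))      # then sleep for the current delay
        delay=$(( delay * 2 ))            # double the delay for next time...
        if [ "$delay" -gt "$max_delay" ]; then
            delay=$max_delay              # ...but never beyond the cap
        fi
    done
    echo "$attempts $waited"
}

backoff_attempts 15 960 57600   # prints "66 58545"
```

With these made-up parameters the client makes 66 attempts before giving up, where a fixed 15-second interval over the same 16 hours would make 3840.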

Comment 44 Eric Hopper 2012-08-07 16:01:31 UTC

Created attachment 602794 [details]
Updated: Patch AICCU to retry with backoff

Oops, it had a few minor mistakes. This should fix those, and make the final failure timeout a little more than a day.

Comment 45 Conrad Meyer 2012-08-07 17:08:31 UTC

Responding to all of this (sorry, in a hurry, have to go to work):

(1) We don't have init scripts any more. We could wrap aiccu startup in a separate shell script. I'd rather not do this, but it's possible.

(2) The only way my proposed patch produces network traffic for a TIC server is if the client is firewalled at an iptables level (in which case, the client cannot possibly know this). If you send a protocol-level error, the client will not retry. It will log the error and exit like before.

(3) You suggest a connectivity check; attempting to connect to TIC is a connectivity check. Given that it needs the TIC service at a given domain, why introduce dependencies on 3rd party servers that don't necessarily show client<->TIC connectivity?

(4) Why log only once? Because otherwise we're just filling the disk. The client will not log anything further until connection either fails or succeeds, so when the admin checks the logs, the last thing she will see is "connection failed". This seems fairly unambiguous. It will not get rotated, because the program does not proceed to log anything else until connection succeeds/fails. As Eric points out, the second log message is intentionally only logged when prior connection attempt(s) failed.

(5) Eric's proposed back-off patch also LGTM. If it allows clients to maybe become automatically un-blacklisted, that's cool. But I'm not sure that's true...

Comment 46 Jeroen Massar 2012-08-07 18:48:51 UTC

(In reply to comment #41)
> (In reply to comment #40)
> > Restarting inside AICCU is not that. See my proposal above which is being
> > ignored...
> 
> I noticed your proposal and it should work fine. The thing is it'd require
> the (re-)introduction of a wrapper-/initscript which Fedora is trying to get
> rid of.

Aha, I was not aware of that.

As such, as an alternative proposal: I could add similar logic to AICCU instead of a shell script, that is, the ability to define a server to 'ping' so that it does check for connectivity before connecting to the TIC server.

And when it does connect we could have an option that forces the local time to that of the TIC server. It would be an option, as I don't think it is always a good idea, and it is better if there is an NTP client on the host to keep it up to date.



(In reply to comment #42)
> (In reply to comment #40)
> > Lets assume the situation that somebody was overloading the TIC server
> > already, or that some network issue causes connections to fail. By doing a
> > connection every 15 seconds, and at the moment there are two published TIC
> > servers, you are doing 8 connections per minute, 8*60 = 480 per hour, 11520
> > per day.
> 
> I calculated out the bandwidth requirements. I would be comfortable running
> a TIC server on my own domain hosted off a DSL line with this code.

While indeed this might work, it is the people who add broken restarters (this is not the only thread about this) who will kill your link.


> > That is not the way to solve your problem. Please see the proposal I made in
> > this very thread for a much better alternative if you are so insistent to
> > require it to work when starting it during broken network connectivity (and
> > other undefined situations).
> 
> It is an interesting solution. Maybe that's what should be done instead. For
> the record, Jeroen is proposing that aiccu is wrapped in a shell script that
> systemd starts instead of the actual aiccu daemon. It would work like this:

Yes, that is what it could look like, though I would not hang them off the root servers and not retry every 5 seconds. Having a special DNS label for it might be a better thing to use, e.g. along the lines of the <distribution>.pool.ntp.org setups.

And as mentioned above, likely better to do it in C.

> This fails to acknowledge that transient network failures are not a reason
> for aiccu to abort.

Do you mean transient failures at start or while running?

> But it would work. It does add a dependency between
> aiccu and fping.

Unless we do it in C.
 
> I'm also not sure what systemd does with something that calls itself a
> forking daemon that takes awhile to fork.
>
> > Ping/access a well known host that should be up is a really effective way of
> > checking this. This is what Apple and Microsoft do with their products, that
> > little thing called a connectivity-check.
> 
> I do not believe they have system daemons doing this. Their connectivity
> check is simply so their network GUI status indicators can show the right
> thing.

Vista and up have a very nice "Your internet is working" notification that depends on checking a couple of MS resources.
 
> > Most people in open source and even these little services like SixXS who
> > provide AICCU are volunteers and do it in their free time, next to generally
> > being full time engineers which take the rest of their time next to private
> > life etc.
> 
> See the above line about running a TIC server off my own DSL line with this
> code. If you want me to, I'll do it. Though I'll say that my ISP isn't
> always the most reliable.

See above, there are unfortunately stupid people on this planet, it thus will not scale.

> > Next to the comments above, there is also a minor nit:
> > 8<------------------
> > 	if (connection_failure_logged) {
> > 		dolog(LOG_INFO, "Succesfully connected to the TIC server %s\n", server);
> > 	}
> > ------------------>8
> > 
> > That does not do what you want it to do, it is missing a !.
> 
> I double-checked. No, it's correct. The idea is that it only reports
> successful connection if it's previously reported that it's been having
> problems.

Ah, that makes sense indeed.


(In reply to comment #43)
> Created attachment 602791 [details]
> Patch AICCU to retry with backoff
> 
> This is an update of Conrad's patch to address some of Jeroen's concerns. I
> implement backoff and eventual connection failure if it retries for 16 hours
> or so.

Why 16? Don't you think that the user would be missing IPv6 connectivity if they started their box and it was not working for 16 hours?

Note that this is only being done when starting aiccu, not at any other point.
I am also quite sure that the client is then well above the rate limiter count.


(In reply to comment #44)
> Created attachment 602794 [details]
> Updated: Patch AICCU to retry with backoff
> 
> Oops, it had a few minor mistakes. This should fix those, and make the final
> failure timeout a little more than a day.

Why that limit? See above.

(In reply to comment #45)
> Responding to all of this (sorry, in a hurry, have to go to work):
> 
> (1) We don't have init scripts any more. We could wrap aiccu startup in a
> separate shell script. I'd rather not do this, but it's possible.

Then the best option is to C-ify it. Should not be too tricky.

> (2) The only way my proposed patch produces network traffic for a TIC server
> is if the client is firewalled at an iptables level (in which case, the
> client cannot possibly know this). If you send a protocol-level error, the
> client will not retry. It will log the error and exit like before.

And that firewall can happen on the client and any intermediate box.

> (3) You suggest a connectivity check; attempting to connect to TIC is a
> connectivity check. Given that it needs the TIC service at a given domain,
> why introduce dependencies on 3rd party servers that don't necessarily show
> client<->TIC connectivity?

I agree, this is why I started thinking about having a special connectivity check label.

> (4) Why log only once? Because otherwise we're just filling the disk. The
> client will not log anything further until connection either fails or
> succeeds, so when the admin checks the logs, the last thing she will see is
> "connection failed".

You are forgetting that other things (eg CRON) do logging and can thus hide that log message.

> This seems fairly unambiguous. It will not get rotated,
> because the program does not proceed to log anything else until connection
> succeeds/fails.

AICCU will not be the only thing having connectivity issues, CRON also runs etc. As such the log will be rotated out at one point or another and/or the real issue hidden.

Unless there is a special "administrative support needed" log file that gets a message noted once (and dupes removed) and that the user gets pushed in their faces...

> As Eric points out, the second log message is intentionally
> only logged when prior connection attempt(s) failed.

That one I get now indeed.

> (5) Eric's proposed back-off patch also LGTM. If it allows clients to maybe
> become automatically un-blacklisted, that's cool. But I'm not sure that's
> true...

No they will not, as every hit you make you extend the listing.

Comment 47 Conrad Meyer 2012-08-07 19:43:35 UTC

(In reply to comment #46)
> (In reply to comment #41)
> > I noticed your proposal and it should work fine. The thing is it'd require
> > the (re-)introduction of a wrapper-/initscript which Fedora is trying to get
> > rid of.
> 
> Aha, I was not aware of that.
> 
> As such, as an alternative proposal: I could add similar logic to AICCU
> instead of a shell script, that is, the ability to define a server to 'ping'
> so that it does check for connectivity before connecting to the TIC server.

This sounds like exactly what we want. We can point it to google.com (or redhat, I guess) if you are opposed to using the TIC server for this. I would prefer TCP-connectivity check instead of ICMP ping (it's probably easier to implement this anyway). Is this a reasonable compromise?

> And when it does connect we could have an option that forces the local time
> to that of the TIC server, option as I don't think it is always a good idea,
> and it is better if there is a NTP client on the host to keep it up to date.

For many reasons, this isn't a good idea. Let's not worry about trying to fix bad time automatically.

> > This fails to acknowledge that transient network failures are not a reason
> > for aiccu to abort.
> 
> Do you mean transient failures at start or while running?

Both.

> (In reply to comment #45)
> > (2) The only way my proposed patch produces network traffic for a TIC server
> > is if the client is firewalled at an iptables level (in which case, the
> > client cannot possibly know this). If you send a protocol-level error, the
> > client will not retry. It will log the error and exit like before.
> 
> And that firewall can happen on the client and any intermediate box.

Sorry, I meant if the TIC server has a firewall for the specific client ip. I don't think a few clients sending a SYN packet every 15 seconds is going to overload your server (or how has it survived the open internet thus far?). But this is irrelevant if we point to another domain.

> > (4) Why log only once? Because otherwise we're just filling the disk. The
> > client will not log anything further until connection either fails or
> > succeeds, so when the admin checks the logs, the last thing she will see is
> > "connection failed".
> 
> You are forgetting that other things (eg CRON) do logging and can thus hide
> that log message.

Ah, I was really forgetting that aiccu does not have a dedicated log file. My day job involves a commercial unix system; the daemon I work on has a dedicated log file. Still, messages seem to last a while -- on my Fedora 17 system, I seem to have /var/log/messages dating back to July 8. If an admin hasn't figured out why aiccu isn't working for a month, it seems unlikely he ever will ;).


Jeroen: It seems like you would be willing to implement the test for network connectivity in aiccu, in an acceptable way. Would you do that? While you're at it, could you bump the source tarball and "latest version" notes on the website? Even if no critical bugs have been fixed, it helps people (myself included) understand that the project isn't dead. Thanks.

Comment 48 Eric Hopper 2012-08-07 22:19:24 UTC

(In reply to comment #45)
> (5) Eric's proposed back-off patch also LGTM. If it allows clients to maybe
> become automatically un-blacklisted, that's cool. But I'm not sure that's
> true...

The goal isn't to automatically un-blacklist them, it's to keep them from being blacklisted in the first place in the very unusual case of a TIC server being too overloaded to answer but still there enough to notice the client asking to talk to it every 15 seconds. Or the (similarly rare) case of a misconfigured firewall blocking the reply SYN/ACK from the TIC server once the SYN is sent.

It reduces the number of daily connection attempts to ~1246 on the first day, and ~254 in the first ~5 hours of the second, followed by the client giving up.

I really wish the working version of the code (and its complete revision history) were publicly available.

Comment 49 Eric Hopper 2012-08-07 22:39:42 UTC

(In reply to comment #46)
> Vista and up have a very nice "Your internet is working" notification that
> depends on checking a couple of MS resources.

As I said, I suspect they do not have system daemons that rely on this notification, but I might well be wrong. But I suspect the nice background connectivity check is only used for populating the GUI.

> Why 16? Don't you think that the user would be missing IPv6 connectivity if
> they started their box and it was not working for 16 hours?

You know, that's likely true. The idea is to wait as long as it would take for someone running this to connect their ethernet cable or bring up the wifi if it needed a password or something.

Somewhere in the 2-10 hour range is likely fine.

But, this highlights that the real solution is to bring this into NetworkManager somehow (at least I think NetworkManager can make some connections contingent on others) so it can only bring up AICCU once there's IPv4 connectivity, and bring it down once IPv4 connectivity goes away for more than 5 minutes or so.

If NetworkManager doesn't currently do this, someone could write something that listened for connectivity notifications on D-Bus and did stuff when connectivity was achieved or lost. NetworkManager could probably use a couple of different connectivity checks itself. Simply saying "I'm connected!" once an address has been assigned through DHCP is rather naive.
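
One existing hook for this is NetworkManager's dispatcher directory: executables placed in /etc/NetworkManager/dispatcher.d/ are run with the interface name as the first argument and the action (for example "up" or "down") as the second. A rough sketch of such a script follows. It is not one of the scripts attached to this bug; the unit name aiccu.service, the DRY_RUN switch and the choice to react to every interface are assumptions, and by default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch of a NetworkManager dispatcher script (not one of the attached scripts).
# NetworkManager calls dispatcher scripts as: script <interface> <action>.
# Assumption: the tunnel daemon is managed as a systemd unit named aiccu.service.

AICCU_UNIT=${AICCU_UNIT:-aiccu.service}

# Map a dispatcher action to a systemctl verb; prints nothing for
# actions this sketch does not care about.
verb_for_action() {
    case $1 in
        up)   echo start ;;
        down) echo stop ;;
    esac
}

handle_event() {
    verb=$(verb_for_action "${2:-}")
    if [ -z "$verb" ]; then
        return 0
    fi
    # DRY_RUN (hypothetical switch, default on) prints instead of executing.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "systemctl $verb $AICCU_UNIT"
    else
        systemctl "$verb" "$AICCU_UNIT"
    fi
}

handle_event "$@"
```

As written it would stop the tunnel when any interface goes down, even if another one still provides IPv4 connectivity, so a real script would need to filter on the interface or check connectivity first.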

Comment 50 Jeroen Massar 2012-08-08 07:13:39 UTC

(In reply to comment #47)
> (In reply to comment #46)
> > (In reply to comment #41)
> > > I noticed your proposal and it should work fine. The thing is it'd require
> > > the (re-)introduction of a wrapper-/initscript which Fedora is trying to get
> > > rid of.
> > 
> > Aha, I was not aware of that.
> > 
> > As such, as an alternative proposal: I could add similar logic to AICCU
> > instead of a shell script, that is, the ability to define a server to 'ping'
> > so that it does check for connectivity before connecting to the TIC server.
> 
> This sounds like exactly what we want. We can point it to google.com (or
> redhat, I guess) if you are opposed to using the TIC server for this. I
> would prefer TCP-connectivity check instead of ICMP ping (it's probably
> easier to implement this anyway). Is this a reasonable compromise?

I would make a special DNS label for this instead, e.g. check.sixxs.net or so, with both an IPv4-only and an IPv6-only variant to be able to easily force the IP version.

Ping is actually very easy to implement, but should be followed up with a TCP check. A check.torproject.org-like thing could work there, which is in line with one of the 'surprise' items in the changelog.
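A minimal sketch of what such a TCP follow-up check could look like, with the option of forcing the IP version. This is an illustration only, not AICCU code: the function name `tcp_reachable` is hypothetical, and any host name passed to it (such as the check.sixxs.net label floated above) is an assumption.

```python
import socket


def tcp_reachable(host, port, family=socket.AF_UNSPEC, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established.

    Pass family=socket.AF_INET or socket.AF_INET6 to force the IP version.
    """
    try:
        candidates = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed: treat this as "no connectivity yet".
        return False
    for af, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, unreachable or timed out: try the next address.
            continue
    return False
```

A caller would run this check first and only contact the TIC server once it returns True, so that a machine without connectivity never generates traffic towards TIC.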

> > And when it does connect we could have an option that forces the local time
> > to that of the TIC server, option as I don't think it is always a good idea,
> > and it is better if there is a NTP client on the host to keep it up to date.
> 
> For many reasons, this isn't a good idea. Let's not worry about trying to
> fix bad time automatically.

I am very aware of why it is not a good idea, which is also why I state that it would be an optional feature. Note that broken time happens to a lot of folks.

> > > This fails to acknowledge that transient network failures are not a reason
> > > for aiccu to abort.
> > 
> > Do you mean transient failures at start or while running?
> 
> Both.

This proposed change would only address a network issue when starting, as then it cannot fetch the configuration details from TIC and thus it would not know how to configure itself.

While AICCU is running, thus after it has received its configuration details from TIC, it should not exit; it might log errors though. If it does have an issue there, I really suggest filing a bug report, as mentioned either at info or by filing it e.g. against a distro package AND including info in the report so that we actually see it. In that case, do provide as much detail as possible though.


> > (In reply to comment #45)
> > > (2) The only way my proposed patch produces network traffic for a TIC server
> > > is if the client is firewalled at an iptables level (in which case, the
> > > client cannot possibly know this). If you send a protocol-level error, the
> > > client will not retry. It will log the error and exit like before.
> > 
> > And that firewall can happen on the client and any intermediate box.
> 
> Sorry, I meant if the TIC server has a firewall for the specific client ip.
> I don't think a few clients sending a SYN packet every 15 seconds is going
> to overload your server (or how has it survived the open internet thus
> far?). But this is irrelevant if we point to another domain.

It survived by being over-provisioned and having upstream filtering abilities.
Unfortunately there are still people on the Internet who think they can hide using IPv6 and that they can thus start harassing people on the Internet, and then there are those people who retaliate by massively DDoSing unrelated servers.
Hence SixXS has strict signup policies and very quick abuse handling (which takes first priority over handling other things, as otherwise there would be no service left to handle other things for).

> > > (4) Why log only once? Because otherwise we're just filling the disk. The
> > > client will not log anything further until connection either fails or
> > > succeeds, so when the admin checks the logs, the last thing she will see is
> > > "connection failed".
> > 
> > You are forgetting that other things (eg CRON) do logging and can thus hide
> > that log message.
> 
> Ah, I was really forgetting that aiccu does not have a dedicated log file.
> My day job involves a commercial unix system; the daemon I work on has a
> dedicated log file. Still, messages seem to last a while -- on my Fedora 17
> system, I seem to have /var/log/messages dating back to July 8. If an admin
> hasn't figured out why aiccu isn't working for a month, it seems unlikely he
> ever will ;).

And THAT is exactly the reason why I think a backoff or trying for an extended time is a bad idea: the user won't notice and will never fix it.

People who do not care will NEVER fix the real problem because, well, they do not care and/or do not know where to find the problem. This is a generic issue on hosts where there is no urgent message service to the user. (GUI popups are not always possible as a GUI is not always there, email might go unread or be misconfigured too, and otherwise some people do not log in to their shell to see it from a MOTD type of thing.)

Now the proposed patch, though, is only about retrying the TIC connection attempt until it succeeds once, and it will still exit when a failure condition arises that it cannot resolve.


> Jeroen: It seems like you would be willing to implement the test for network
> connectivity in aiccu, in an acceptable way. Would you do that?

As I proposed, yes I am very willing to resolve problems.

> While you're
> at it, could you bump the source tarball and "latest version" notes on the
> website? Even if no critical bugs have been fixed, it's helps people (myself
> included) understand that the project isn't dead. Thanks.

While there are no critical fixes that change the behaviour of AICCU, there is a lot of other work that has been done (hence the 'surprise' things in the changelog) that are not complete yet and need a lot more testing before I am willing to put that out for generic consumption, as the current code works on all platforms that are tested and the new code might not.

(In reply to comment #48)
> (In reply to comment #45)
> > (5) Eric's proposed back-off patch also LGTM. If it allows clients to maybe
> > become automatically un-blacklisted, that's cool. But I'm not sure that's
> > true...
> 
> The goal isn't to automatically un-blacklist them, it's to keep them from
> being blacklisted in the first place in the very unusual case of a TIC
> server being to overloaded

The TIC Server being overloaded is an extremely rare case that has never happened, due to the blacklisting of clients that try to do this.

> to answer but still there enough to notice the
> client asking to talk to it every 15 seconds.

There is always enough bandwidth unless the pipes are filled with more than a gigabit of traffic. That will not easily happen, though it has.

> Or the (similarly rare) case
> of a misconfigured firewall blocking the reply SYN/ACK from the TIC server
> once the SYN is sent.

This has also happened already and is a common scenario, for instance in corporate networks. TIC uses a non-standard port (that is, not 80/443) and thus might quite well be blocked in- or outbound for 'security reasons'. Yes, that means that the firewall is misconfigured, but that is what it is.

Note that full TCP state is heavier than a simple ICMP check, which is why I propose doing a ping check first. Other simple tests can be added after that, e.g. connecting to the TIC server would constitute part of that.

> It reduces the number of daily connection attempts to ~1246 on the first
> day, and ~254 on the first ~5 hours of the second followed the client giving
> up.

As mentioned above, if the user is not noticing they do not have connectivity, they will likely never notice it. That is the whole issue with this problem.

> I really wish the working version of the code (and its complete revision
> history) were publicly available.

It is, there is a public changelog (http://www.sixxs.net/tools/aiccu/changelog) and a summarized history of changes (http://www.sixxs.net/tools/aiccu/history/).
Although, yes, there are some surprise features that are not listed yet as they are not complete yet.

(In reply to comment #49)
> (In reply to comment #46)
> > Vista and up have a very nice "Your internet is working" notification that
> > depends on checking a couple of MS resources.
> 
> As I said, I suspect they do not have system daemons that rely on this
> notification, but I might well be wrong.

While I am not aware of any that do, one can hook off these events to get a notification that proper connectivity exists.

> But I suspect the nice background
> connectivity check is only used for populating the GUI.

It is and it notifies the user that their connectivity is broken and the user will notice this and click on it and get options for resolving this problem.

Note that this check is for both IPv4 and IPv6.

> > Why 16? Don't you think that the user would be missing connectivity IPv6 if
> > they started their box started, and it is not working for 16 hours?
> 
> You know, that's likely true. The idea is to wait as long as it would take
> for someone running this to connect their ethernet cable or bring up the
> wifi if it needed a password or something.
> 
> Somewhere in the 2-10 hour range is likely fine.

I would say that if, after startup, you did not connect to a network within one (1) hour, you are already done and over. The Internet is a requirement today.

At that, we would get: 1 hour of tries, at say every 15 seconds initially, then backing off to every 30 and finally every 60 seconds within the first 10 minutes; thus about 10 minutes at one try every 30 seconds (20 connects) plus 50 minutes at one try every minute, is 30+50=80 connection attempts. That is already a lot, but doable, as likely not all networks are as brokenly configured as above.
 
> But, this highlights that the real solution is to bring this into
> NetworkManager somehow (at least I think NetworkManager can make some
> connections contingent on others) so it can only bring up AICCU once there's
> IPv4 connectivity, and bring it down once IPv4 connectivity goes away for
> more than 5 minutes or so.

AICCU does not need to be brought down or restarted. Heartbeat + AYIYA are made for changing IP addresses and dis-connectivity.

> If NetworkManager doesn't currently do this, someone could write something
> that listened for connectivity notifications on D-Bus and did stuff when
> connectivity was achieved or lost. NetworkManager could probably use a
> couple of different connectivity checks itself. Simply saying "I'm
> connected!" once an address has been assigned through DHCP is rather naive.

That is what I describe above: it is what Windows and OS X do, and they then provide a notification/event so that client applications can do what they need to do based on that.

Greets,
 Jeroen

Comment 51 Eric Hopper 2012-08-08 13:39:30 UTC

(In reply to comment #50)
> It is, there is a public changelog
> (http://www.sixxs.net/tools/aiccu/changelog) and a summarized history of
> changes (http://www.sixxs.net/tools/aiccu/history/).
> Although, yes, there are some surprise features that are not listed yet as
> they are not complete yet.

It would be very nice if the Subversion repository were available for use with an 'svn co' operation by random users. But those two things are helpful. Thanks.

Comment 52 Jeroen Massar 2012-08-08 14:56:28 UTC

(In reply to comment #51)
> (In reply to comment #50)
> > It is, there is a public changelog
> > (http://www.sixxs.net/tools/aiccu/changelog) and a summarized history of
> > changes (http://www.sixxs.net/tools/aiccu/history/).
> > Although, yes, there are some surprise features that are not listed yet as
> > they are not complete yet.
> 
> It would be very nice if the Subversion repository were available for use
> with an 'svn co' operation by random users. But those two things are
> helpful. Thanks.

The SVN we use is for the whole SixXS project and that SVN is very restricted.

As such, we are internally debating how to best get the AICCU source in the open, so that we can sync our internal SVN copy to that external resource.

Likely, it will sooner/later appear at https://github.com/sixxs/aiccu though...
(which would also enable bug/issue tracking for AICCU, to let people propose merges etc.... the only thing is that there are a lot of code changes which are untested, thus for short-term those features should not be used yet when we do that...) When this happens, I'll comment here too, or better, open a new bug against the various distros that they can sync against that tree.

Comment 53 David Waring 2012-08-09 09:45:21 UTC

Created attachment 603220 [details]
NetworkManager dispatcher script to handle aiccu

I had a similar problem with aiccu and solved it by adding in the attached NetworkManager dispatcher script as /etc/NetworkManager/dispatcher.d/15-aiccu

The script stops aiccu when there is no default route in the IPv4 routing tables and starts it if the interface being brought up has defined a default IPv4 route.

This handles aiccu running at the correct times so long as you are using NetworkManager to handle your network interfaces.
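The attachment itself is not reproduced in this report, so the following is only a sketch of the decision logic described above, not the attached script. It assumes the routing table is read in the format of Linux's /proc/net/route, and the function names are hypothetical.

```python
def has_default_ipv4_route(route_table):
    """Return True if text in /proc/net/route format contains a default route."""
    lines = route_table.splitlines()[1:]  # skip the header row
    for line in lines:
        fields = line.split()
        # Column 1 is the destination and column 7 the netmask, both in hex;
        # a default route has both set to all zeroes.
        if len(fields) >= 8 and fields[1] == "00000000" and fields[7] == "00000000":
            return True
    return False


def aiccu_action(nm_action, route_table):
    """Map a NetworkManager dispatcher action to 'start', 'stop' or None."""
    if not has_default_ipv4_route(route_table):
        return "stop"   # no default IPv4 route left: stop aiccu
    if nm_action == "up":
        return "start"  # an interface came up and a default route exists
    return None         # nothing to do
```

NetworkManager passes the interface name and the action (for example "up" or "down") as the first and second arguments to a dispatcher script, so a real script would feed its second argument and the contents of /proc/net/route into `aiccu_action` and then start or stop aiccu.service accordingly.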

Comment 54 Jeroen Massar 2012-08-09 11:40:54 UTC

(In reply to comment #53)
> Created attachment 603220 [details]
> NetworkManager dispatcher script to handle aiccu
> 
> I had a similar problem with aiccu and solved it by adding in the attached
> NetworkManager dispatcher script as /etc/NetworkManager/dispatcher.d/15-aiccu
> 
> The script stops aiccu when there is no default route in the IPv4 routing
> tables and starts it if the interface being brought up has defined a default
> IPv4 route.
> 
> This handles aiccu running at the correct times so long as you are using
> NetworkManager to handle your network interfaces.

This whole bug report is about the fact that restarting AICCU is not needed.

This bug report only serves to solve the issue of AICCU exiting when there is no connectivity when it starts. After that it should continue functioning even if the connectivity goes away for a bit and comes back later (which is what you are describing). If you have an issue with that, do post a bug report about that.

Comment 55 Jeroen Massar 2012-08-09 11:49:22 UTC

(In reply to comment #53)
> Created attachment 603220 [details]
> NetworkManager dispatcher script to handle aiccu
> 
> I had a similar problem with aiccu and solved it by adding in the attached
> NetworkManager dispatcher script as /etc/NetworkManager/dispatcher.d/15-aiccu
> 
> The script stops aiccu when there is no default route in the IPv4 routing
> tables and starts it if the interface being brought up has defined a default
> IPv4 route.
> 
> This handles aiccu running at the correct times so long as you are using
> NetworkManager to handle your network interfaces.

As an addendum to the above comment (#54): even if one would not 'stop' AICCU when the default route goes missing (thus removing that part of the dispatcher script), the dispatcher would still attempt to start it even though AICCU had previously logged that something was wrong, which is the big issue with this bug.

Comment 56 Fedora Update System 2012-08-09 22:49:52 UTC

aiccu-2007.01.15-15.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 57 Fedora Update System 2012-08-09 23:26:46 UTC

aiccu-2007.01.15-15.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 58 Eric Hopper 2012-08-10 23:58:20 UTC

(In reply to comment #55)
> As an addendum to the above comment (#54), even if one would not 'stop'
> AICCU when the default route goes missing (thus remove that part of the
> dispatcher script), the start would attempt to start it even though AICCU
> had previously logged already that it indicated that something was wrong,
> which is the big issue with this bug.

No connectivity on boot is the normal situation in Fedora, and is most definitely not an error. When aiccu is started on boot, it reports no connectivity as being an error. This is a problem, since it isn't really.

Then in comment #19 you state that AICCU is a VPN-alike tool. Which is a fair characterization. VPN tunnels are brought down when there is no connectivity and brought up when there is. The NetworkManager thing accomplishes this.

But you don't like that solution either. You claim (rightly I expect) that AICCU handles losing connectivity while it's in operation, and that it also handles IP address changes. This is very unlike normal VPN behavior.

So, which is it? Is it a VPN-alike tool in which the VPN only exists when there's network connectivity? Or is it something of a different order in which the association persists forever regardless of connectivity?

If it's the latter, the appropriate thing to do is to make it periodically poll the needed server until connectivity is achieved and then continue on its merry and ultra-reliable way.

If it's the former, the appropriate thing is to make it a VPN-like thing that NetworkManager brings up once connectivity is achieved, and shuts down once connectivity is lost.

The polling in the patches that Conrad and I (mostly Conrad) have created creates an extremely mild load, and then only in somewhat unusual circumstances. I expect that in the field they will be fairly unlikely to result in any client being blacklisted. As I stated, I would be willing to field a TIC server on my DSL line with this code as long as all the servers were actually properly configured.

There is a problem, and that problem is that a lack of connectivity on boot is not an error. And there is a perfectly reasonable desire to have AICCU start without user intervention when connectivity is achieved. What would you suggest the solution should be?

Note that telling us that all the things we're doing are wrong is not providing a solution. Please tell us how to make it work in a way we would all find acceptable.

Comment 59 Conrad Meyer 2012-08-28 06:45:51 UTC

(In reply to comment #46)
> (In reply to comment #41)
> > (In reply to comment #40)
> > > Restarting inside AICCU is not that. See my proposal above which is being
> > > ignored...
> > 
> > I noticed your proposal and it should work fine. The thing is it'd require
> > the (re-)introduction of a wrapper-/initscript which Fedora is trying to get
> > rid of.
> 
> Aha, I was not aware of that.
> 
> As such, as an alternative proposal: I could add similar logic to AICCU
> instead of a shell script, that is, the ability to define a server to 'ping'
> so that it does check for connectivity before connecting to the TIC server.

Jeroen, it's been a little over two weeks since we've heard from you. What's the status on this?

Comment 60 Jeroen Massar 2012-08-28 07:53:13 UTC

Working on it (next to that thing called real life, work, boxes that die off and need to be fixed and other such unplanned and more urgent things, helping users, etc etc etc)

Please see the changelog which is published at the following URL for a few updates that should be addressing amongst others the concerns in this ticket:
http://www.sixxs.net/tools/aiccu/changelog

More updates are following in the coming week, hopefully that time will include the ability to test this version on a large variety of platforms and then I'll push the full tree to https://github.com/SixXS/aiccu so that folks can beta test it; after various people have then confirmed that it still properly works it would then be prudent to finally update it in the repos of various distributions.

The github thing is also meant so that folks can file issues against aiccu there and of course patches so that we can more easily integrate them into the mainline codebase.

As said though, I first want it better tested to be sure it does not break working usage, as that would just cause more complaining users and thus more work which detracts from getting good things done.

Comment 61 Laurent Rineau 2012-10-31 15:06:11 UTC

Anything new?

Comment 62 Rolf Fokkens 2012-11-11 18:28:13 UTC

I've been running into this issue for quite a while now. What actually happens is this:

Nov 11 15:50:02 home01 aiccu: Couldn't resolve host tic.sixxs.net, service 3874
Nov 11 15:50:02 home01 aiccu: Couldn't connect to the TIC server tic.sixxs.net
Nov 11 15:50:02 home01 aiccu: Couldn't retrieve first tunnel for the above reason

This is caused by the fact that I run my own DNS server which is not running at the time aiccu starts.

This mostly happens unnoticed by me which means that an increasing number of IPv6 sites (google, facebook, ..) are becoming really slow here because IPv4 connectivity is used only after IPv6 fails for 30 seconds or so.

I made a small change to /usr/lib/systemd/system/aiccu.service:

After=syslog.target network.target named.service

As you can see I added named.service. I haven't been able to test this yet, but it might be of use.

Comment 63 Rolf Fokkens 2012-11-13 10:11:24 UTC

After several reboots I can confirm that this works for me: every time aiccu runs like a charm after a reboot now.

Comment 64 Pavel Šimerda (pavlix) 2012-11-13 11:54:19 UTC

I'm very sorry to step into your discussions, but as one of the NetworkManager developers, a SixXS user since at least 2008, and a writer of IPv6-related articles since 2007, I would like to add my 2 cents.

(In reply to comment #12)
> I don't have a better solution to this and I'm not about to dive into an
> abandoned, five years dead piece of software. If you have a patch, it is
> welcome. Otherwise I'll end up orphaning or killing aiccu in Fedora.

I'm 100% sure someone will take it.

(In reply to comment #54)
> > Created attachment 603220 [details]
> > NetworkManager dispatcher script to handle aiccu
> 
> This whole bug report is about the fact that restarting AICCU is not needed.

Jeroen, I can imagine how you are angry, but… are you aware that you are replying to a solution that does *exactly* what you requested?

The dispatcher script is about starting aiccu *when global connectivity is achieved* and stopping it when global connectivity is lost.

> This bug report only serves to to solve the issue of AICCU exiting when
> there is no connectivity when it starts.

Therefore it should never be started at boot on systems that rely on NetworkManager for getting connectivity. It should be started from the dispatcher script *only*.

> After that it should continue functioning even if the connectivity
> goes away for a bit and comes back later (which is what you are describing).

This usually is not possible. Connectivity goes away for a reason, too. When connectivity is down, we have *no* information whether it will be back in two minutes or in two weeks.

Can AICCU cope with two weeks offline? If yes, it might be desirable to store configuration persistently. It might also be desirable if it detected network connections and time changes and ran as a proper system service. These are just questions.

If I read correctly, you promote aiccu as a tool to connect to dynamic tunnels. And I would expect that they can connect dynamically when it's possible.

There are two ways to achieve it:

1) The tool is aware of connectivity and waits until the computer is actually
connected. Then it must not exit just because there's no connectivity.

2) The tool is started when there's connectivity. And it's best to stop it when
the connectivity is lost, so that it can be started again when it's back.

In either case, we're leaving the NTP problem unresolved, as NTP also waits for
connectivity. Therefore aiccu should be also able to react to NTP changes. The two options above apply just as well to aiccu, where network connectivity is replaced with NTP synchronization.

(In reply to comment #60)
> Working on it (next to that thing called real life, work, boxes that die off
> and need to be fixed and other such unplanned and more urgent things,
> helping users, etc etc etc)
> 
> Please see the changelog which is published at the following URL for a few
> updates that should be addressing amongst others the concerns in this ticket:
> http://www.sixxs.net/tools/aiccu/changelog

Ah, I see some of the stuff above implemented. I'm curious about any follow-ups.

(In reply to comment #63)
> After several reboots I can confirm that this works for me: every time aiccu
> runs like a charm after a reboot now.

I'm curious about how this works and in which environment, and whether it also works when aiccu is not yet configured (delete the cache file) and is started before NetworkManager gets connectivity. For example, when you start aiccu offline with no cached information and with the wrong system time, and then you connect to a network and receive the time from NTP.

Comment 65 Conrad Meyer 2012-12-14 17:54:04 UTC

*** Bug 887203 has been marked as a duplicate of this bug. ***

Comment 66 Conrad Meyer 2012-12-30 18:45:53 UTC

(In reply to comment #64)
> I'm very sorry to step into your discussions, but as one of NetworkManager
> developers, as a SixXS user since at least 2008, IPv6-related articlers
> writer
> since 2007, I would like to add my 2 cents.
> 
> (In reply to comment #12)
> > I don't have a better solution to this and I'm not about to dive into an
> > abandoned, five years dead piece of software. If you have a patch, it is
> > welcome. Otherwise I'll end up orphaning or killing aiccu in Fedora.
> 
> I'm 100% sure someone will take it.

Pavel, would you be interested in taking over aiccu? You seem to be much more qualified for the job, and I have lost some motivation since I stopped using the package. Let me know and we can do the pkgdb shuffle.

Comment 67 Pavel Šimerda (pavlix) 2012-12-31 00:53:19 UTC

(In reply to comment #66)
> Pavel, would you be interested in taking over aiccu? You seem to be much
> more qualified for the job, and I have lost some motivation since I stopped
> using the package. Let me know and we can do the pkgdb shuffle.

I'm still using aiccu. I'll take it if there's nobody else to do that. But I'm afraid I will not be into quick action, so if there's anyone more active...

Comment 68 Conrad Meyer 2013-01-03 07:01:14 UTC

Okay, orphaned in pkgdb. All yours unless someone beats you to it.

Comment 69 Fedora Admin XMLRPC Client 2013-02-23 17:09:06 UTC

This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 70 Pavel Šimerda (pavlix) 2013-02-23 17:18:17 UTC

Enough time for others. Taken. Still I can't promise a quick solution, any suggestions and help are welcome.

Comment 71 Josh Reynolds 2013-03-11 02:49:48 UTC

This bug looks to be a rehash of Bug 735538 which seems to have had a reasonable solution already provided although nothing came of it.... Any thoughts on that route?

Comment 72 Pavel Šimerda (pavlix) 2013-03-11 02:52:37 UTC

(In reply to comment #71)
> This bug looks to be a rehash of Bug 735538 which seems to have had a
> reasonable solution already provided although nothing came of it.... Any
> thoughts on that route?

Will comment there.

Comment 73 Josh Reynolds 2013-03-11 04:37:34 UTC

Okay well I'm not sure about my last comment actually having a solution now but can we get a recap on what the plan is on how to handle this? Here's what I gather for now although I easily could have missed something.

I'm thinking in the interim we could have a wrapper script to run it normally on boot like it currently does & if it doesn't connect then wait 2 minutes to try running it once more, then give up on trying again to let the user figure out why it's not running.

Alternatively we could have NM load it via dispatcher script after an IPv4 network connection is established & leave it be afterwards (i.e. not force shutdown when connection lost). This will only hit the SixXS servers when an IPv4 connection comes up, so it shouldn't be too often, although it could cause a small increase in connection attempts.

Long term seems best for upstream to change the source code so AICCU client will keep retrying if no network connection (specifically when "Couldn't connect to the TIC server" is returned) but exit if anything else (like bad configs) doesn't allow it to complete the connection.
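A rough sketch of that long-term behaviour: retry only when the failure looks like missing connectivity, and let every other error end the attempt. This is not upstream code. The `connect` callable, the choice of `OSError` as the "no network" signal, and the default attempt count and delay are all assumptions made for illustration.

```python
import time


def connect_with_retry(connect, max_attempts=5, delay=15.0, sleep=time.sleep):
    """Call connect() until it succeeds, retrying only on network errors.

    OSError covers DNS failures, timeouts, refused connections and
    unreachable networks. Any other exception (for example one raised
    for a bad configuration) propagates at once, so the caller can log
    it and exit.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # still no connectivity: give up and report it
            sleep(delay)
```

The `sleep` parameter exists only so that the delay can be replaced in tests; a daemon would leave it at its default.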

Comment 74 Eric Hopper 2013-03-12 17:14:58 UTC

I actually think one of the two supplied patches is the best option until there's NetworkManager integration.

Comment 75 Pavel Šimerda (pavlix) 2013-03-13 02:48:14 UTC

(In reply to comment #73)
> Okay well I'm not sure about my last comment actually having a solution now
> but can we get a recap on what the plan is on how to handle this?

Yes, please.

> Here's what I gather for now although I easily could have missed something.
> 
> I'm thinking in the interim we could have a wrapper script to run it
> normally on boot like it currently does & if it doesn't connect then wait 2
> minutes to try running it once more then give up on trying again to let the
> user figure out why its not running.

There's nm-online available for that with NetworkManager. We could improve that if it's not good enough. Adding dcbw to Cc to comment about it.

> Alternatively we could have NM load it via dispatcher script after IPv4
> network connection established & leave it be afterwards (i.e. not force
> shutdown when connection lost). This will only hit the SixXS servers when a
> IPv4 connection comes up so shouldn't be terribly too often although could
> cause a low increase in connection attempts.

At least the dispatcher scripts can restart it so that there's no need to retry.

> Long term seems best for upstream to change the source code so AICCU client
> will keep retrying if no network connection (specifically when "Couldn't
> connect to the TIC server" is returned) but exit if anything else (like bad
> configs) doesn't allow it to complete the connection.

Could be. Let's contact Jeroen about that one.

(In reply to comment #74)
> I actually think one of the two supplied patches is the best option until
> there's NetworkManager integration.

Do I understand it correctly that if I apply either patch, the DOSing would not occur?

Plus, would you consider the attached dispatcher script a good enough integration tool to avoid the need of those patches?

Thanks,

Pavel

Comment 76 Eric Hopper 2013-03-14 21:24:12 UTC

(In reply to comment #75)
> Do I understand it correctly that if I apply either patch, the DOSing would
> not occur?

Well, I checked it out. The patch by Conrad will never stop retrying, doesn't do exponential backoff, and tries every 15 seconds. So, with thousands of clients, there is a potential DDOS.

Though it only retries if the connection timed out, which means there's no connectivity to the TIC server. So, IMHO, the potential for a DDOS is low.

The modifications I made will cause it to start retrying every second, but add half the current backoff timeout to the backoff timeout every time it fails, until it backs off to trying once every 2 minutes or so. (i.e. 1, 2, 3, 4, 6, 9, 13...)

It also gives up after a few hours of trying for a max of 37 hours (57 retries). So it will not result in a system continuously trying for days.

Much lower potential for a DDOS. It has the same property of not retrying on any error other than connection timeout.
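The interval growth described above can be sketched as follows. This is a reconstruction from the quoted sequence, not the patch itself: integer halving with a minimum step of 1 is an assumption that happens to reproduce 1, 2, 3, 4, 6, 9, 13, the 120-second cap stands in for "2 minutes or so", and the give-up limit is not modelled.

```python
from itertools import islice


def backoff_intervals(cap=120, start=1):
    """Yield retry delays in seconds: each is the previous plus half of it.

    Integer division is used, with a minimum step of 1. Once the delay
    reaches cap it stays there.
    """
    delay = start
    while True:
        yield min(delay, cap)
        if delay < cap:
            delay += max(1, delay // 2)


# First few delays, matching the sequence quoted in the comment.
print(list(islice(backoff_intervals(), 7)))  # [1, 2, 3, 4, 6, 9, 13]
```

Growing by half rather than doubling keeps the early retries frequent, which suits the common case of a network that comes up a few seconds after boot.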

I would not be comfortable running a TIC server with thousands of clients on my own DSL line with Conrad's patch. I'd be wary, but willing with my modifications.

> Plus, would you consider the attached dispatcher script a good enough
> integration tool to avoid the need of those patches?

It probably is.

As far as I know, NM does not have a test to see if it 'really' has connectivity or not. So that's a concern. NM may have assigned a local IP address but there is still no connectivity to the public Internet until a VPN or something is brought up. So that runs the risk of AICCU failing in those sorts of environments.

But those aren't the common case.

If NM does have a reliable test for public Internet connectivity, then using that as the trigger for bringing up AICCU is a better idea.

The dispatcher script has one other disadvantage as far as Jeroen is concerned. It shuts down AICCU when connectivity is lost. This is apparently not the desired behavior, as it's supposed to handle local IP changes and the like just fine after it's retrieved the setup information from the TIC server.

With the shutdown code, this script will likely generate more overall traffic to the TIC servers than my patch. If enough people are running AICCU on mobile devices like laptops it's possible that this will be a very noticeable increase. My laptop's network connectivity changes 3-10 times a day.

Comment 77 Pavel Šimerda (pavlix) 2013-03-14 22:14:13 UTC

(In reply to comment #76)
> (In reply to comment #75)
> > Do I understand it correctly that if I apply either patch, the DOSing would
> > not occur?
> 
> Well, I checked it out. The patch by Conrad will never stop retrying,
> doesn't do exponential backoff, and tries every 15 seconds. So, with
> thousands of clients, there is a potential DDOS.
> 
> Though it only retries if the connection timed out, which means there's no
> connectivity to the TIC server. So, IMHO, the potential for a DDOS is low.
> 
> The modifications I made will cause it to start retrying every second, but
> add half the current backoff timeout to the backoff timeout every time it
> fails until it backs off to trying once every 2 minutes or so. (i.e. 1, 2,
> 3, 4, 6, 9, 13...)
> 
> It also gives up after a few hours of trying for a max of 37 hours (57
> retries). So it will not result in a system continuously trying for days.
> 
> Much lower potential for a DDOS. It has the same property of not retrying on
> any error other than connection timeout.
> 
> I would not be comfortable running a TIC server with thousands of clients on
> my own DSL line with Conrad's patch. I'd be wary, but willing with my
> modifications.

Sounds good. But I'm still not very comfortable with getting this in without Jeroen's consent (even better if they accept it upstream).

> > Plus, would you consider the attached dispatcher script a good enough
> > integration tool to avoid the need of those patches?
> 
> It probably is.

Good.
 
> As far as I know, NM does not have a test to see if it 'really' has
> connectivity or not.

As NetworkManager does the configuration, in most cases it does know about the connectivity. Of course we can talk about the minority of cases where it doesn't work.

Plus, if you configure NetworkManager to actually do the connectivity check, then that's even better. There's a way to do that, it's just not enabled by default.
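
As a rough illustration of what such a configuration might look like, NetworkManager reads a `[connectivity]` section from `NetworkManager.conf`. The URL and response string below are placeholders, not a service recommended anywhere in this report, and the interval is an arbitrary choice.

```ini
# /etc/NetworkManager/NetworkManager.conf (fragment, hypothetical values)
[connectivity]
# Page that NetworkManager fetches to decide whether there is real connectivity.
uri=http://example.com/check.txt
# Text the fetched page is expected to contain.
response=OK
# Seconds between checks.
interval=300
```

Whether the installed NetworkManager version honors this section should be checked against its own documentation.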

> So that's a concern. NM may have assigned a local IP
> address but there is still no connectivity to the public Internet until a
> VPN or something is brought up.

The current version of NetworkManager allows you to bind the VPN to a physical connection so that it doesn't consider the connection to be established before the VPN is.

> So that runs the risk of AICCU failing in those sorts of environments.

Now that in most cases you can configure NetworkManager to handle this, I would consider the dispatcher.d solution as a good solution at least for the first step.

> But those aren't the common case.
> 
> If NM does have a reliable test for public Internet connectivity, then using
> that as the trigger for bringing up AICCU is a better idea.

If NM doesn't provide you with the options you need and you can't obtain them alongside NetworkManager, then we can most probably help you.
 
> The dispatcher script has one other disadvantage as far as Jeroen is
> concerned. It shuts down AICCU when connectivity is lost. This is apparently
> not the desired behavior as it's supposed to handle local IP changes and the
> like just fine after it's retrieved the setup information from the TIC
> server.

So my proposal is to use the dispatcher script by default and let the user decide whether he wants to modify it or remove it.

> With the shutdown code, this script will likely generate more overall
> traffic to the TIC servers than my patch.

Unless you expect to take down your connections often, I don't think that should be the case.

> If enough people are running AICCU
> on mobile devices like laptops it's possible that this will be a very
> noticeable increase. My laptop's network connectivity changes 3-10 times a
> day.

That's not so much. I'm open to more discussion and I'm reluctant to act before we have at least a rough consensus here.

Comment 78 Eric Hopper 2013-03-15 02:21:42 UTC

(In reply to comment #77)
> Plus, if you configure NetworkManager to actually do the connectivity check,
> then that's even better. There's a way to do that, it's just not enabled by
> default.

It would be nice if there was more documentation lying about. But, to be fair, I also haven't hunted really hard.

> So my proposal is to use the dispatcher script by default and let the user
> decide whether he wants to modify it or remove it.

    .
    .
    .

> That's not so much. I'm open to more discussion and I'm reluctant to act
> before we have at least a rough consensus here.

Using the dispatcher seems fine. The things I mentioned as negatives do not seem to me to be show stoppers in any regard. As you said, all the cases I mention that might not work so well are minority cases that can be worked around by knowledgeable users. And the script would certainly work better than what happens now.

As for upstream acceptance, well, anything that smacks of restarting or retrying in any way seems to provoke outrage. So, from that perspective I can't see much difference between the code and the script.

My first proposed solution way up there was extremely problematic of course, and likely deserved a bit of outrage. I'm not used to thinking in terms of 'if tens or hundreds of thousands of users ran this against my server'.

Your NM dispatcher script is likely the less risky option. The behavior of the code is not nearly so easy to understand, even though I do believe it's correct.

And some solution is better than none. So please go with it. :-)

Comment 79 Jeroen Massar 2013-03-15 22:25:42 UTC

I am in the middle of a physical move, thus I'll keep it simple:

------------------
Notes
Please also read the README included.

Following is an important section of that text:

WARNING: Never run AICCU from DaemonTools or a similar automated 'restart' tool/script. When AICCU does not start, it has a reason not to start which it gives on either the stdout or in the (sys)log file. The TIC server *will* automatically disable accounts which are detected to run in this mode. Use 'verbose true' to see more information which is especially handy when starting fails.
---------------------

Which part of that do you people not understand?

Comment 80 Eric Hopper 2013-03-15 22:59:46 UTC

(In reply to comment #79)
> I am in the middle of a physical move, thus I'll keep it simple:
> 
> ------------------
> Notes
> Please also read the README included.
> 
> Following is an important section of that text:
> 
> WARNING: Never run AICCU from DaemonTools or a similar automated 'restart'
> tool/script. When AICCU does not start, it has a reason not to start which
> it gives on either the stdout or in the (sys)log file. The TIC server *will*
> automatically disable accounts which are detected to run in this mode. Use
> 'verbose true' to see more information which is especially handy when
> starting fails.
> ---------------------
> 
> Which part of that do you people not understand?

I understand it perfectly.

So, is your solution to simply not use AICCU since it's not fit for use as is?

Comment 81 Pavel Šimerda (pavlix) 2013-03-16 05:28:05 UTC

(In reply to comment #79)
> I am in the middle of a physical move, thus I'll keep it simple:
> 
> ------------------
> Notes
> Please also read the README included.
> 
> Following is an important section of that text:
> 
> WARNING: Never run AICCU from DaemonTools or a similar automated 'restart'
> tool/script. When AICCU does not start, it has a reason not to start which
> it gives on either the stdout or in the (sys)log file. The TIC server *will*
> automatically disable accounts which are detected to run in this mode. Use
> 'verbose true' to see more information which is especially handy when
> starting fails.
> ---------------------
> 
> Which part of that do you people not understand?

We in Fedora are used to being respectful to each other. It keeps the community working better. I could just as well ask what you didn't understand in the bugreport. But on the contrary, I will be happy to answer whatever questions you have. By posting the above citation you clearly indicate that you fail to understand the difference between daemontools and what we are trying to achieve.

This is the last time I'm asking for cooperation on resolving (as opposed to ignoring) the issues. I will be happy to cooperate on making aiccu useful for day-to-day use without having to run it manually from the command line. But if upstream refuses to help us, we have no other choice than to do our best with our own knowledge.

We are fond of our users and want to provide them with good networking experience with the open source tools they want to use.

You should know better than to be rude to a fresh new maintainer of the Fedora aiccu package who kindly asked you for advice and opinion *before* taking action.

(In reply to comment #80)
> I understand it perfectly.
> 
> So, is your solution to simply not use AICCU since it's not fit for use as
> is?

I must admit I am a little bit sad about Jeroen's attitude and still hope that he'll change it. But as the software is open source and can be used to connect to other services, too, we will finally have to make a careful decision at the Fedora downstream level.

Comment 82 Leoš Bitto 2013-03-18 22:10:45 UTC

I have a Fedora 18 server which runs bind as a caching nameserver and radvd, which announces the IPv6 prefix to the LAN. This seems to be a very basic setup, yet it does not work: aiccu starts early and exits with "Couldn't resolve host tic.sixxs.net, service 3874", because bind is not started yet. I think that a temporary DNS outage like this deserves much better handling!

Comment 83 Pavel Šimerda (pavlix) 2013-03-19 10:24:53 UTC

(In reply to comment #82)
> I have a Fedora 18 server which runs bind as a caching nameserver

By the way bind has the same problem and we use a dispatcher.d script to work around it. I wonder whether after 'service bind restart', bind is immediately able to answer DNS queries or not.

> and radvd which announces the IPv6 prefix to LAN.

Just curious, what's your use case for aiccu on a network *with* IPv6?

> This seems to be a very basic setup, yet it does not work

True. Aiccu is apparently not prepared to be run at boot with dynamic configuration and we probably can't expect help from upstream with that.

> - aiccu starts early and exits with "Couldn't resolve
> host tic.sixxs.net, service 3874", because bind is not started yet. I think
> that temporary DNS outage like this deserves much better handling!

Definitely.

Comment 84 Leoš Bitto 2013-03-19 10:48:38 UTC

(In reply to comment #83)
> (In reply to comment #82)
> > I have a Fedora 18 server which runs bind as a caching nameserver
> 
> By the way bind has the same problem and we use a dispatcher.d script to
> work around it. I wonder whether after 'service bind restart', bind is
> immediately able to answer DNS queries or not.

I don't understand how this is related. I have no problems with bind (besides starting after aiccu, which I do not consider to really be a problem with bind).

> > and radvd which announces the IPv6 prefix to LAN.
> 
> Just curious, what's your use case for aiccu on a network *with* IPv6?

My use case is that aiccu is providing IPv6 - I don't have any other source of IPv6 connectivity. I know that there is a possibility for improvement - radvd should start announcing the IPv6 prefix only after aiccu successfully connects, but that would either require aiccu to be able to run some script after it connects (like pppd does with /etc/ppp/ip-up for example) or I would need to implement some polling of the network interface created by aiccu.
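
The polling variant mentioned above could look roughly like the following shell sketch. It is only an illustration: the function name `wait_for_iface` and the `SYSFS_NET` override are made up for this example, and the interface name used in the usage note is an assumption about the local aiccu configuration.

```shell
#!/bin/sh
# Sketch: wait until a network interface exists, polling once per second.
# SYSFS_NET is a hypothetical override that only exists to make the
# function easy to try out; by default the kernel's sysfs tree is used.
wait_for_iface() {
    iface=$1
    timeout=${2:-30}                       # seconds to wait before giving up
    sysfs=${SYSFS_NET:-/sys/class/net}
    waited=0
    while [ ! -e "$sysfs/$iface" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1                       # interface never appeared
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A possible use, assuming the tunnel interface is called `sixxs`, would be `wait_for_iface sixxs 60 && systemctl start radvd.service`. Note that the interface existing does not prove that the tunnel is actually routing traffic.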

Comment 85 Laurent Rineau 2013-03-19 10:59:42 UTC

> I know that there is a possibility for improvement -
> radvd should start announcing the IPv6 prefix only after aiccu successfully
> connects, but that would either require aiccu to be able to run some script
> after it connects (like pppd does with /etc/ppp/ip-up for example) or I
> would need to implement some polling of the network interface created by
> aiccu.

By the way, the Debian version of aiccu carries a patch that allows running "setupscripts" after the connectivity is enabled. That is an upstream feature of aiccu, but it does not work correctly. The Debian patch fixes that.

Comment 86 Pavel Šimerda (pavlix) 2013-03-19 11:45:06 UTC

(In reply to comment #84)
> (In reply to comment #83)
> > (In reply to comment #82)
> > > I have a Fedora 18 server which runs bind as a caching nameserver
> > 
> > By the way bind has the same problem and we use a dispatcher.d script to
> > work around it. I wonder whether after 'service bind restart', bind is
> > immediately able to answer DNS queries or not.
> 
> I don't understand how this is related.

Let me explain.

It is related in that the bind upstream code is not capable of properly running in a dynamically configured environment. We work around it in Fedora using NetworkManager dispatcher scripts.

> I have no problems with bind

Because it is worked around using the dispatcher scripts in Fedora.

> (besides starting after aiccu, which I do not consider to really be a
> problem with bind).

It is a problem with aiccu that can be most probably worked around the same way as the bind problem is.

I was also curious whether bind will answer DNS queries immediately after a restart, since bind would be restarted and, very shortly afterwards, aiccu would be restarted too, as both would use the same workaround.

> My use case is that aiccu is providing IPv6 - I don't have any other source
> of IPv6 connectivity.

Therefore you're only running the radvd daemon on that machine and that's entirely irrelevant to both aiccu and bind.

> I know that there is a possibility for improvement -
> radvd should start announcing the IPv6 prefix only after aiccu successfully
> connects,

This is not necessarily true. Not running radvd means that your local IPv6 network is not fully configured and that may or may not be what you want. But unless you have a full integration of aiccu with NetworkManager, this doesn't really make sense.

> but that would either require aiccu to be able to run some script
> after it connects (like pppd does with /etc/ppp/ip-up for example) or I
> would need to implement some polling of the network interface created by
> aiccu.

NetworkManager will be polling for network configuration in the future and will be able to react to it. Then you could use its scripting capabilities for radvd, too, at your option.

(In reply to comment #85)
> By the way, the Debian version of aiccu carries a patch that allows to run
> "setupscripts" after the connectivity is enabled. That is an upstream
> feature of aiccu, but not working correctly. The Debian patch fixes that.

If this is the case, we might want to use that patch, too, for people with special needs.

Comment 87 Leoš Bitto 2013-03-19 12:46:35 UTC

(In reply to comment #86)
> (In reply to comment #84)
> > (In reply to comment #83)
> > > (In reply to comment #82)
> > > > I have a Fedora 18 server which runs bind as a caching nameserver
> > > 
> > > By the way bind has the same problem and we use a dispatcher.d script to
> > > work around it. I wonder whether after 'service bind restart', bind is
> > > immediately able to answer DNS queries or not.
> > 
> > I don't understand how this is related.
> 
> Let me explain.
> 
> It is related in that the bind upstream code is not capable of properly
> running in a dynamically configured environment. We work around it in Fedora
> using NetworkManager dispatcher scripts.

I supposed so, but I was not able to find any NetworkManager dispatcher scripts for this in my Fedora 18.

> > I have no problems with bind
> 
> Because it is worked around using the dispatcher scripts in Fedora.

Or maybe because I have used bind for a much longer time (even before NetworkManager existed) and I always use listen-on and listen-on-v6 in named.conf with IP addresses of a reliable local network connection only.

> > (besides starting after aiccu, which I do not consider to really be a
> > problem with bind).
> 
> It is a problem with aiccu that can be most probably worked around the same
> way as the bind problem is.
> 
> I was also curious whether bind will answer DNS queries immediately after
> restart. As bind will be restarted and very short after also aiccu would be
> restarted as both would use the same workaround.

I don't think that this is the right approach, unless you would be able to synchronously wait for bind to finish its restart and only then restart aiccu.

> > My use case is that aiccu is providing IPv6 - I don't have any other source
> > of IPv6 connectivity.
> 
> Therefore you're only running radvd daemon on that machine and that's
> entirely irrelevant to both aiccu and bind.

I have described my whole environment, because I wanted everybody to see that I want a fully automatic solution which properly handles common transient situations (like DNS or network outages), not something like "do restart aiccu manually after you investigate and correct whatever the problem was".

> > I know that there is a possibility for improvement -
> > radvd should start announcing the IPv6 prefix only after aiccu successfully
> > connects,
> 
> This is not necessarily true. Not running radvd means that your local IPv6
> network is not fully configured and that may or may not be what you want.

In my case I am fine with running my local network with IPv4, but I do not want to announce any global IPv6 prefix when it is not routable yet. I know that I could use some additional local IPv6 prefix, but that is just too inconvenient when IPv4 works fine. No, I am not looking for some theoretical IPv6-only network yet.

> NetworkManager will be polling for network configuration in future and will
> be able to react to it. Then you could use its scripting capabilities for
> radvd, too, at your option.

Great!

Comment 88 Josh Reynolds 2013-03-20 15:56:03 UTC

Created attachment 713303 [details]
Updated: NetworkManager dispatcher script to handle aiccu

Edit of previous NM script with section to stop on interface down removed.

Comment 89 Josh Reynolds 2013-03-20 16:15:29 UTC

(In reply to comment #78)
> (In reply to comment #77)
> > So my proposal is to use the dispatcher script by default and let the user
> > decide whether he wants to modify it or remove it.
> 
>     .
>     .
>     .
> 
> > That's not so much. I'm open to more discussion and I'm reluctant to act
> > before we have at least a rough consensus here.
> 
> Using the dispatcher seems fine. The things I mentioned as negatives do not
> seem to me to be show stoppers in any regard. As you said, all the cases I
> mention that might not work so well are minority cases that can be worked
> around by knowledgeable users. And the script would certainly work better
> than what happens now.
> 
> As for upstream acceptance, well, anything that smacks of restarting or
> retrying in any way seems to provoke outrage. So, from that perspective I
> can't see much difference between the code and the script.
> 
> My first proposed solution way up there was extremely problematic of course,
> and likely deserved a bit of outrage. I'm not used to thinking in terms of
> 'if tens or hundreds of thousands of users ran this against my server'.
> 
> Your NM dispatcher script is likely the less risky option. The behavior of
> the code is not nearly so easy to understand, even though I do believe it's
> correct.
> 
> And some solution is better than none. So please go with it. :-)

After rereading things once again, I agree the NM dispatcher script approach would be best and simplest for the short term. I do understand upstream's frustration, though: he was very clear that we shouldn't restart AICCU (see below), but the current NM script does. I've attached a revision of the existing script so that it *only* starts aiccu after the interface goes up. Can we include that as part of the package ASAP, since it gets old having to manually restart the aiccu service on boot as is?

(In reply to comment #17)
> > 
> > Modern systems frequently lose and re-acquire network connections. My
> > personal MacBook Pro must lose and re-acquire its network connection 4-5
> > times a day at least. And it frequently does not have any network
> > connectivity at all for periods of time.
> 
> There should not be a problem with this. If AICCU has started, it will,
> once, contact the TIC server and gather its configuration details and keep
> on running.
> If it loses connectivity or its local IP address changes this configuration
> remains the same, which is why restarting does not have any effect.
> 
> > It will automatically re-acquire network connectivity when it can with
> > little or no intervention on my part.
> > 
> > Lack of network connectivity is not sufficient reason to 'log and exit'.
> 
> AICCU does not exit after it has retrieved its TIC configuration. It
> configures the tunnel and keeps on running. If you have reason to believe
> this is not the case, then please provide details so we can look into this.
> 

I'm not sure the attached patch is the best option for now, since it retries too many times for my comfort, although otherwise it is what I referred to earlier as the better long-term solution. I think that if upstream integrates or approves it, that should be enough of a green light for that approach.

A middle ground between those two solutions would be to use NetworkManager-wait-online.service (which uses nm-online), but bug 837793 is holding that up for now.

Comment 90 Josh Reynolds 2013-03-20 16:44:29 UTC

Hrm... well, that's what I get for rushing it. The edited script looked okay but didn't work for me, so could someone verify that it works as designed? I don't have the time right now to check the logs to see whether it even attempted to load aiccu. Thanks.

Comment 91 Pavel Šimerda (pavlix) 2013-03-20 17:47:58 UTC

(In reply to comment #87)
> I supposed so, but I was not able to find any NetworkManager dispatcher
> scripts for this in my Fedora 18.

# rpm -qf /etc/NetworkManager/dispatcher.d/13-named 
bind-9.9.2-8.P1.fc18.x86_64

> > > I have no problems with bind
> > 
> > Because it is worked around using the dispatcher scripts in Fedora.
> 
> Or maybe because I have used bind for a much longer time (even before
> NetworkManager existed) and I always use listen-on and listen-on-v6 in
> named.conf with IP addresses of a reliable local network connection only.

Or that. There are a number of use cases for which the problem doesn't occur.

> > I was also curious whether bind will answer DNS queries immediately after
> > restart. As bind will be restarted and very short after also aiccu would be
> > restarted as both would use the same workaround.
> 
> I don't think that this is the right approach,

It's cheap to say that unless you have a better one. That doesn't mean you're not right, though.

> unless you would be able to synchronously wait for bind to finish
> its restart and only then restart aiccu.

That is why I'm bringing up this issue.

> I have described my whole environment, because I wanted everybody to see
> that I want a fully automatic solution which properly handles common
> transient situations (like DNS or network outages), not something like "do
> restart aiccu manually after you investigate and correct whatever the
> problem was".

Good point.

> > This is not necessarily true. Not running radvd means that your local IPv6
> > network is not fully configured and that may or may not be what you want.
> 
> In my case I am fine with running my local network with IPv4, but I do not
> want to announce any global IPv6 prefix when it is not routable yet. I know
> that I could use some additional local IPv6 prefix, but that is just too
> inconvenient when IPv4 works fine. No, I am not looking for some theoretical
> IPv6-only network yet.

Yet I do have to think about IPv4-only, IPv6-only and dual-protocol networks when looking for solutions that actually make sense and don't bring a bunch of new problems.

But I admit that with aiccu we're in a situation that might justify quick fixes to just make it work.

(In reply to comment #89)
> Can
> we include that as part of the package ASAP since it gets old having to
> manually restart the aiccu service on boot as is.

Definitely. Did you test the dispatcher script you're attaching?

> (In reply to comment #17)
> I'm not sure the attached patch is best for now since it retries too many
> times for my comfort although otherwise is what I referred to earlier as
> better long-term. I think if upstream integrates or approves it then should
> be enough of a green light for that approach.

Ok, let's see whether the dispatcher script makes people happy.

> In-between each of those 2 solutions would be to use
> NetworkManager-wait-online.service (uses nm-online) but bug 837793 is
> holding that up for now.

I consider NetworkManager-wait-online.service a much uglier hack than the dispatcher scripts.

Comment 92 Leoš Bitto 2013-03-21 10:34:31 UTC

(In reply to comment #91)
> 
> > > I was also curious whether bind will answer DNS queries immediately after
> > > restart. As bind will be restarted and very short after also aiccu would be
> > > restarted as both would use the same workaround.
> > 
> > I don't think that this is the right approach,
> 
> It's cheap to say that unless you have a better one. That doesn't mean
> you're not right, though.

I have described one right approach in the following part of the same sentence, so I do not consider it cheap in any way. 

> > unless you would be able to synchronously wait for bind to finish
> > its restart and only then restart aiccu.
> 
> That is why I'm bringing up this issue.

Maybe that synchronous waiting for bind to finish its restart would be too complicated. In that case there is still the possibility of changing aiccu to properly detect at least the DNS outage (or, in the ideal situation, any transient outage) and not exit in that case (retry instead). Retrying in aiccu should probably be configurable to avoid undesirable side effects, because I can imagine that in some cases exiting might be preferred.

> Yet I do have to think about IPv4-only, IPv6-only and dualprotocol networks
> when looking for solutions that actually make sense and don't bring a bunch
> of new problems.
> 
> But I admit that with aiccu we're in a situation that might justify quick
> fixes to just make it work.

I believe that with aiccu most users (if not all) run dual-stack networks, because they need IPv4 connectivity for aiccu to work (maybe with NAT) and with aiccu providing IPv6 that naturally leads to dual-stack.

I think that IPv4-only is not a valid option here. Sure, I can choose not to distribute IPv6 from aiccu to my IPv4-only LAN, but then there are no problems anyway.

I can imagine IPv6-only networks with aiccu, but because IPv6 is so complicated in this case, I would leave it to the administrators to configure it their preferred way. The idea of forcing a single Red Hat configuration would probably not work.

Comment 93 Pavel Šimerda (pavlix) 2013-03-21 18:14:33 UTC

(In reply to comment #92)
> (In reply to comment #91)
> > It's cheap to say that unless you have a better one. That doesn't mean
> > you're not right, though.
> 
> I have described one right approach in the following part of the same
> sentence, so I do not consider it cheap in any way. 

Then either you weren't clear enough or I wasn't bright enough.

> > > unless you would be able to synchronously wait for bind to finish
> > > its restart and only then restart aiccu.
> > 
> > That is why I'm bringing up this issue.
> 
> Maybe that synchronous waiting for bind to finish its restart would be too
> complicated, in that case there still is the possibility to change aiccu to
> properly detect at least the DNS outage (or in the ideal situation any
> transient outage) and not exit in that case (retry instead).

Upstream first. I will be more than happy if you can persuade upstream to accept such a modification. Otherwise I'm going to be very careful admitting Fedora-specific patches to solve general problems.

> Retrying in
> aiccu should probably be configurable to avoid undesirable side effects,
> because I can imagine that in some cases exiting might be preferred.

If you are willing to work on that, I will be more than happy and I wish you best luck persuading upstream to accept changes like that.

> > Yet I do have to think about IPv4-only, IPv6-only and dualprotocol networks
> > when looking for solutions that actually make sense and don't bring a bunch
> > of new problems.
> > 
> > But I admit that with aiccu we're in a situation that might justify quick
> > fixes to just make it work.
> 
> I believe that with aiccu most users (if not all) run dual-stack networks,

That doesn't constitute a reason to forget about the users that run IPv6-only networks with aiccu. And I can assure you that there are users that run IPv6-only networks behind aiccu.

> because they need IPv4 connectivity for aiccu to work (maybe with NAT) and
> with aiccu providing IPv6 that naturally leads to dual-stack.

I'm afraid you are forgetting that we were talking about the local network, not the router running aiccu. We were talking about potentially disabling radvd for local network when aiccu cannot connect.

But this discussion doesn't help much anyway, as we shouldn't force users and especially IPv6 experimenters and early adopters (for whom aiccu was created) to our own use cases.

> I think that IPv4-only is not a valid option here.

And you are wrong for the reasons stated above.

> Sure, I can choose not to
> distribute IPv6 from aiccu to my IPv4-only LAN, but then there are no
> problems anyway.

As long as it is a matter of choice, not force.

> I can imagine IPv6-only networks with aiccu, but because IPv6 is so
> complicated in this case,

It's not complicated at all.

> I would leave it to the administrators to configure it their preferred way. 

Agreed. With sane defaults, which for radvd are the current defaults (running it unconditionally).

> The idea of forcing the only RedHat configuration would probably not work.

There is no red hat configuration that you could even consider.

Comment 94 Josh Reynolds 2013-03-22 01:23:13 UTC

(In reply to comment #91)
> (In reply to comment #87)
> > I supposed so, but I was not able to find any NetworkManager dispatcher
> > scripts for this in my Fedora 18.
> 
> # rpm -qf /etc/NetworkManager/dispatcher.d/13-named 
> bind-9.9.2-8.P1.fc18.x86_64
> 
[snip]
> > unless you would be able to synchronously wait for bind to finish
> > its restart and only then restart aiccu.
> 
> That is why I'm bringing up this issue.
> 
Well, the script we're considering using begins with 15, so it would be run after 13 (the named script from bind above) when the interface comes up, which I assume would address this concern.

> (In reply to comment #89)
> > Can
> > we include that as part of the package ASAP since it gets old having to
> > manually restart the aiccu service on boot as is.
> 
> Definitely. Did you test the dispatcher script you're attaching?
> 
Nope, like I said in comment #90. :-| Although it was just a permissions issue, so when I changed them to match the other scripts in the dispatcher.d dir it ran fine. Now I'm attaching another revision of the script because, when we use this script, the service should *not* be enabled (to load on boot), nor should the script test that it is enabled before starting it. As is, it will only load too early (the current behavior) or twice. I did test my latest revision and it works as it should.

> > In-between each of those 2 solutions would be to use
> > NetworkManager-wait-online.service (uses nm-online) but bug 837793 is
> > holding that up for now.
> 
> I consider NetworkManager-wait-online.service a much uglier hack then the
> dispatcher scripts.

What?!? Fedora is trying its best to migrate from SysV init to systemd and that is the recommended way to handle this situation as per:
http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget
Otherwise, why do we even have it set up to run as a systemd service? We might as well remove all the systemd-related things if this isn't the goal. I/we can explain further if needed....

Also, I'm not sure whether all Fedora derivatives or spins always have NetworkManager installed, so we might want to consider adding it as a dependency for now when repackaging.

Comment 95 Josh Reynolds 2013-03-22 01:27:46 UTC

Created attachment 714220 [details]
Update2: NetworkManager dispatcher script to handle aiccu

Second update to the original NM script, so that it does not stop the service and does not require the service to be "enabled".
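
For readers without access to the attachment, the behavior described here (start aiccu when an interface comes up, never stop it, and do not check whether the unit is enabled) can be sketched roughly as follows. This is not the attached script. NetworkManager does pass the interface name and the action as the first two arguments to dispatcher scripts, but the names `command_for_event` and `handle_event`, the `is_active` and `run` parameters, and the choice of `systemctl start aiccu.service` are assumptions made for illustration.

```python
import subprocess

# Assumed unit name; adjust to whatever the package actually installs.
UNIT = "aiccu.service"


def command_for_event(action, service_active):
    """Return the command to run for a dispatcher event, or None to do nothing.

    Only an "up" event triggers a start. "down" and every other action are
    ignored, so an already running aiccu is never stopped or started twice.
    """
    if action == "up" and not service_active:
        return ["systemctl", "start", UNIT]
    return None


def handle_event(argv, is_active=None, run=subprocess.call):
    """Process one dispatcher call; argv is [script, interface, action]."""
    if len(argv) < 3:
        return 0
    if is_active is None:
        # "systemctl is-active --quiet" exits with 0 when the unit is running.
        def is_active():
            return subprocess.call(["systemctl", "is-active", "--quiet", UNIT]) == 0
    command = command_for_event(argv[2], is_active())
    return run(command) if command is not None else 0
```

A real dispatcher script would call `handle_event(sys.argv)` and exit with its return value. Note that this sketch deliberately ignores whether the unit is enabled, matching the description above.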

Comment 96 Pavel Šimerda (pavlix) 2013-03-22 09:19:08 UTC

Thanks, Josh.

Unfortunately, I don't think we should start aiccu when the administrator didn't explicitly enable it. On the other hand, we might want aiccu to wait for NetworkManager on boot. I'm not saying this is easy: we struggle to solve this problem with NetworkManager, and I heard that rsyslog had similar problems with systemd.

Could you please post what the status is from systemd's perspective when aiccu goes down because of no connectivity? If we could detect that specific condition via systemd, that would be helpful.

Cheers,

Pavel

Comment 97 Josh Reynolds 2013-03-30 03:07:51 UTC

Well, my assumption was that, with bug 837793 open and other comments here, it would affect this situation, but it actually doesn't, as I just found out by simply enabling NetworkManager-wait-online.service and rebooting. So this whole discussion about there being a bug in aiccu was in vain, due to our lack of knowledge of how to use systemd and NetworkManager together to delay aiccu's start until after the network is up, as upstream asked for in their second reply.

So again, we don't need to change anything; we just need to direct users of aiccu to enable NetworkManager-wait-online.service. FYI, it starts at about the same time the NM dispatcher scripts do. The only catch is that it could delay startup by 30 seconds, so it isn't enabled by default.

Now the script could still be used along with disabling aiccu from starting automatically (so doesn't load twice) if someone wants to hold on to the old style of things.

This discussion does bring up a great example of the need to get the word out about using NetworkManager-wait-online.service more. I'm not sure whether that means adding a README or something similar to each package that depends on network.target, or requiring nm-online for those that do. There are plenty of other similar packages affected (see bug 744399). NetworkManager-wait-online.service was added in 2011 with bug 692008 and a mailing list discussion (http://lists.freedesktop.org/archives/systemd-devel/2011-March/001692.html), so it should be better known and used by now....

One last thing, for upstream or those brave enough to edit SixXS' code: someday aiccu should be revised to keep trying to connect to the TIC server, but to quit if login or other things fail at start (like our edited patch). Some year it could also be rewritten as what systemd calls a new-style daemon.
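
The retry behavior suggested above (keep retrying while the TIC server is unreachable, but give up at once when the server rejects the login) can be sketched as follows. This is neither aiccu code nor one of the patches attached to this bug. The exception classes, the `connect` callable and the delay values are hypothetical and only illustrate the split between transient and fatal failures.

```python
import time


class TransientError(Exception):
    """Hypothetical: the TIC server could not be reached (no network, DNS failure)."""


class FatalError(Exception):
    """Hypothetical: the server answered but rejected us (bad login and similar)."""


def connect_with_backoff(connect, initial_delay=10.0, max_delay=600.0,
                         sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each transient failure.

    FatalError is deliberately not caught, so it reaches the caller
    immediately and the daemon can exit instead of retrying forever.
    """
    delay = initial_delay
    while True:
        try:
            return connect()
        except TransientError:
            sleep(delay)
            # Cap the delay so a long outage does not push retries out indefinitely.
            delay = min(delay * 2, max_delay)
```

Capping the delay bounds how often the server is contacted during a long outage, while a failed login still stops the daemon on the first attempt.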

Comment 98 Pavel Šimerda (pavlix) 2013-03-30 14:08:51 UTC

(In reply to comment #97)
> Well my assumption was that with bug 837793 open & other comments here that
> it would affect this situation but it actually doesn't as I just found out
> with simply enabling NetworkManager-wait-online.service and rebooting.

NetworkManager-wait-online is a hack for a very specific situation.

> So this whole discussion about there being a bug in aiccue was in vain

I don't agree.

> due to
> our lack of knowledge of how to use systemd & NetworkManager together to
> delay aiccu to start until after the network is up, like upstream asked for
> in their second reply.

I don't see them asking for that in their second reply.

> So again we don't need to change anything

So you propose to leave this unfixed? 

> and just need to direct users of
> aiccu to enable the NetworkManager-wait-online.service.

Looking at the upstream NetworkManager tree, NetworkManager-wait-online.service is automatically enabled by enabling NetworkManager.service.

> FYI, it starts at about the same time the NM dispatcher scripts do.

It is still an ugly hack that doesn't work in general. You are IMO specifically testing it in a situation where you have connectivity within 30 seconds after reboot. It won't work otherwise.

> Only catch is it could delay startup 30 seconds so isn't enabled by default.

It is specifically designed to delay startup of network services up to 30 seconds when there's no connectivity. And it will probably get enabled when you enable NetworkManager by hand. This looks like a bug in NetworkManager or at least an inconsistency.

> Now the script could still be used along with disabling aiccu from starting
> automatically (so doesn't load twice) if someone wants to hold on to the old
> style of things.

I think the latest version of the script is misguided from the beginning and isn't suitable for distribution, as it circumvents the systemd tools for enabling/disabling services.

> This discussion does bring up a great example of the need to get the word
> out about using NetworkManager-wait-online.service more.

Or a great example of giving more value to partially working hacks as opposed to working solutions.

> Not sure if adding
> a README or something to each package that depends on network.target or
> forcing requiring nm-online for those that do.

As nm-online is just a hack to slightly delay the start of network services, it is up to the packagers to choose whether the package needs such a hack.

> There's plenty of other
> similar packages affected (see bug 744399).

Please reference bugs by URL :).

> NetworkManager-wait-online.service was added in 2011 with bug 692008 &
> mailing list discussion
> (http://lists.freedesktop.org/archives/systemd-devel/2011-March/001692.html)
> so should be better known & used by now....

It is good enough as a quick hack to match the previous limited network-scripts behavior. The less it is used, the better.

> Last thing is for upstream or those brave enough to edit SixXS' code,
> someday aiccu should be revised to keep trying to connect to the TIC server
> but quit if login or other things fail at start (like our edited patch).

Some day aiccu could be properly integrated to NetworkManager e.g. as a specialized VPN plugin. Then the administrator could even bind it to specific physical connections where he wants to use it.

On top of the VPN plugin system there is also a VPN GUI plugin system so it could be seamlessly integrated to the network configuration. After all the main use for aiccu is on laptops and desktops behind NAT and there specifically nm-wait-online fails on all levels.

> Also someyear could be rewritten as what systemd calls a new-style daemon.

As systemd doesn't control network connections, it makes more sense to integrate with NetworkManager than with systemd. And dbus-activatable daemons don't match the original syntax of service start/stop. I used to think that dbus activation is a great idea but I'm no longer convinced about it.

For example see https://bugzilla.redhat.com/show_bug.cgi?id=815243

Comment 99 Josh Reynolds 2013-03-31 21:32:01 UTC

(In reply to comment #98)
> (In reply to comment #97)
> > So this whole discussion about there being a bug in aiccue was in vain
> 
> I don't agree.
> 
> > due to
> > our lack of knowledge of how to use systemd & NetworkManager together to
> > delay aiccu to start until after the network is up, like upstream asked for
> > in their second reply.
> 
> I don't see them asking for that in their second reply.
> 

OK how about:
(In reply to comment #14)
> (In reply to comment #12)
> > restarting in 10 seconds seems reasonable, because the
> > network may be up by then.
> 
> What about you start the daemon AFTER network connectivity is there and the
> time is properly synced.
> 
> This is what is done with other VPN tools too.

(In reply to comment #98)
> (In reply to comment #97)
> > So again we don't need to change anything
> 
> So you propose to leave this unfixed? 
> 

No, I was saying that no code needs to be changed, since just enabling NetworkManager-wait-online.service seems to be an acceptable fix for now. Upstream doesn't seem to be okay with us editing their code, nor up for doing it themselves, even though it's obvious that it's not going to kill their server if we can't even reach it. The NM dispatcher script is still better than nothing, but it is probably best left for people to copy into place themselves (rather than included in the package), since there's only room for a few scripts without renaming the existing default dispatcher scripts.

> > and just need to direct users of
> > aiccu to enable the NetworkManager-wait-online.service.
> 
> Looking at the upstream NetworkManager tree,
> NetworkManager-wait-online.service is automatically enabled by enabling
> NetworkManager.service.
> 

Well, I started with a fresh F17 install, then upgraded to F18, and had to manually enable NetworkManager-wait-online.service for the first time the other day. Elsewhere I've seen this listed as intentional behavior, although what I saw could be out of date if someone has since decided that future releases should have it enabled by default.

> > FYI, it starts at about the same time the NM dispatcher scripts do.
> 
> It is still and ugly hack that doesn't work in general. You are IMO
> specifically testing it in a situation where you have connectivity in 30
> seconds after reboot. It won't work otherwise.
> 

Huh? You keep saying it's a horrible hack without explaining why or offering a better solution. It's clear in the man page for nm-online that it waits "until NetworkManager reports an active connection, or specified timeout expires." Looking at the source shows that it uses dbus to check with NM, so it doesn't just sit and wait for 30 seconds needlessly. The advantage of using it is that all systemd service unit files that have network.target as a dependency will wait to start until nm-online reports that there is a connection or the timeout expires (like this one). If it is not used, NetworkManager starts at exactly the same time as all the other services that are waiting for network.target. Until daemons/services are redesigned to not depend on network.target, this seems best.
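
The "wait until connected or until the timeout expires" behavior described here can be modeled with a small sketch. This is not how nm-online is implemented: it talks to NetworkManager over D-Bus, whereas this illustration simply polls a caller-supplied `is_connected` function. The function name, the parameters and the default values are assumptions chosen to mirror the 30-second figure mentioned in this thread.

```python
import time


def wait_until_online(is_connected, timeout=30.0, interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Return True as soon as is_connected() is true, False once timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_connected():
            # Connectivity is there: return at once, without using up the timeout.
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Timed out: the caller proceeds without a connection.
            return False
        sleep(min(interval, remaining))
```

The sketch returns early when a connection appears, so it does not needlessly wait the full 30 seconds. Once it has returned False, nothing checks again, so a service started at that point still sees no connectivity.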

> > Only catch is it could delay startup 30 seconds so isn't enabled by default.
> 
> It is specifically designed to delay startup of network services up to 30
> seconds when there's no connectivity. And it will probably get enabled when
> you enable NetworkManager by hand. This looks like a bug in NetworkManager
> or at least an inconsistency.
> 

One could easily use a replacement service unit file to specify 45 or 60 seconds for the timeout. Also, what specifically looks to be a bug?

> > Now the script could still be used along with disabling aiccu from starting
> > automatically (so doesn't load twice) if someone wants to hold on to the old
> > style of things.
> 
> I think the latest version of the script is misguided from the beginning and
> isn't suitable for distribution, as it circumvents the systemd tools for
> enabling/disabling services.

You are contradicting yourself, sir:
(In reply to comment #77)
> (In reply to comment #76)
> > (In reply to comment #75)
> > The dispatcher script has one other disadvantage as far as Jereon is
> > concerned. It shuts down AICCU when connectivity is lost. This is apparently
> > not the desired behavior as it's suppose to handle local IP changes and the
> > like just fine after it's retrieved the setup information from the TIC
> > server.
> 
> So my proposal is to use the dispatcher script by default and let the user
> decide whether he wants to modify it or remove it.

Although it's been mentioned more times than I want to quote, I'm getting the feeling you still believe that AICCU needs to be restarted if the connection is lost. The main issue with aiccu is that it fails to load if a connection isn't there /upon startup/, but once it has loaded successfully the first time it will maintain the IPv6 tunnel despite network interruptions.

(In reply to comment #98)
> (In reply to comment #97)
> 
> > This discussion does bring up a great example of the need to get the word
> > out about using NetworkManager-wait-online.service more.
> 
> Or a great example of giving more value to partially working hacks as
> opposed to working solutions.
> 

Again, please share what working solution you have in mind since you're thoroughly confusing me on your viewpoints of this.

> > Last thing is for upstream or those brave enough to edit SixXS' code,
> > someday aiccu should be revised to keep trying to connect to the TIC server
> > but quit if login or other things fail at start (like our edited patch).
> 
> Some day aiccu could be properly integrated to NetworkManager e.g. as a
> specialized VPN plugin. Then the administrator could even bind it to
> specific physical connections where he wants to use it.
> 
> On top of the VPN plugin system there is also a VPN GUI plugin system so it
> could be seamlessly integrated to the network configuration. After all the
> main use for aiccu is on laptops and desktops behind NAT and there
> specifically nm-wait-online fails on all levels.
> 

Yeah, I asked dcbw about adding this and other transitional IPv6 methods quite a while ago, but I'm sure the focus is more on making sure native IPv6 works properly.

> > Also someyear could be rewritten as what systemd calls a new-style daemon.
> 
> As systemd doesn't control network connections, it makes more sense to
> integrate with NetworkManager than with systemd. And dbus-activatable
> daemons don't match the original syntax of service start/stop. I used to
> think that dbus activation is a great idea but I'm no longer convinced about
> it.
> 
> For example see https://bugzilla.redhat.com/show_bug.cgi?id=815243

I agree that integration with NetworkManager does make more sense than with systemd for network dependent services. I don't know enough to chime in on the dbus angle.

Either way thanks for helping out with leading this! I think we're getting closer now.

Comment 100 Rolf Fokkens 2013-04-01 19:34:01 UTC

While the discussion goes on I upgraded to F18, after which aiccu no longer started during boot. So I reapplied comment 62, and now all works fine again.

Regarding integration with NetworkManager (previous comments) I'd like to add that I'm not using NetworkManager because bridges are not supported (or has that changed?), so for non-NetworkManager users the suggestion in comment 62 may only work (for some users).

Comment 101 Pavel Šimerda (pavlix) 2013-04-01 22:10:46 UTC

(In reply to comment #100)
> While the discussion goes on I upgraded to F18, after which aiccu no longer
> started during boot. So I reapplied comment 62, and now all works fine again.

Thank you.

> Regarding integration with NetworkManager (previous comments) I'd like to
> add that I'm not using NetworkManager because bridges are not supported (or
> has that changed?),

It did change with the 0.9.8 release but the bridging support is not yet perfect. I'm maintaining a wiki page for that but if it doesn't work for you, please contact me and I'll give you more information about what's going on:

https://fedoraproject.org/wiki/Networking/Bridging#The_.27keyfile.27_way

> so for non-NetworkManager users the suggestion in
> comment 62 may only work (for some users).

But then maybe we should add all nameservers that could possibly be used as local recursors. I'm curious whether it is just a coincidence that bind is already able to resolve when the next service is started, or whether it is by design.

Cheers,

Pavel

Comment 102 Josh Reynolds 2013-04-02 01:52:26 UTC

Well, after I had time to test what happens with no Internet on boot, I see what Pavel means: nm-wait-online only helps if connectivity arrives within the timeout period, and it doesn't reload things later on like the NM dispatcher scripts would, so I'm changing back to the dispatcher script for now. Maybe we can talk upstream into considering the revised patch, or just patch it ourselves.

Comment 103 Fedora End Of Life 2013-07-04 06:28:30 UTC

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 104 Bill Smith 2013-10-21 00:26:42 UTC

I read through this entire thread and saw the same entries in /var/log/messages that were explained in Comment 62:

Oct 20 19:53:38 citadel aiccu: Couldn't resolve host tic.sixxs.net, service 3874
Oct 20 19:53:38 citadel aiccu: Couldn't connect to the TIC server tic.sixxs.net
Oct 20 19:53:38 citadel aiccu: Couldn't retrieve first tunnel for the above reason, aborting
Oct 20 19:53:38 citadel systemd[1]: aiccu.service: control process exited, code=exited status=255
Oct 20 19:53:38 citadel systemd[1]: Unit aiccu.service entered failed state.


I tried the fix offered in Comment 62 and found no joy.


I read and tried the fix in Comment 97 and was pleased to see that it worked just fine:

Oct 20 20:10:14 citadel aiccu: Succesfully retrieved tunnel information for T#####
Oct 20 20:10:14 citadel aiccu: AICCU running as PID 1270
Oct 20 20:10:15 citadel aiccu: [AYIYA-start] : Anything in Anything (draft-02)
Oct 20 20:10:15 citadel aiccu: [AYIYA-tun->tundev] : (Socket to TUN) started


This is what I'm running:

$ cat /etc/fedora-release 
Fedora release 19 (Schrödinger’s Cat)
$ rpm -qa | grep aiccu
aiccu-2007.01.15-16.fc19.x86_64
$

My RPMs are updated regularly.

My need for AICCU is simple: my ISP, for reasons that they fail to explain to me, has no plans to offer IPv6 to my neighborhood.  I therefore must provide my own connection via tunnel.  SIXXS provides this nicely.  :)

I appreciate the efforts on all sides in providing a solution to this matter.

Comment 105 Pavel Šimerda (pavlix) 2014-03-18 10:02:31 UTC

I'm posting an update because there hasn't been one for a long time.

The problem that aiccu depends on network connectivity instead of waiting for it is still there. As a system service, it should never fail and should always be ready to pick up. The precise details on how it should react to specific events aren't set in stone and there is a disagreement between the needs of people using aiccu as a system service and the aiccu upstream who prefer the use case of running aiccu more as a driver of one specific tunnel connection.

Integration points with systemd and NetworkManager are also a bit unclear. For NetworkManager, the best way to integrate aiccu would be to have a connection plugin for aiccu. For non-NetworkManager folks, a separate daemon would work as well, as would a modification/fork of aiccu working as a full-fledged system service with some IPC to start/stop the tunnel.

I don't have the motivation to write up any of the possible implementations, as I'm using aiccu only for testing purposes and only occasionally. But I'll be happy to help if someone takes that task.

(In reply to Bill Smith from comment #104)
> I read through this entire thread and saw the same entries in
> /var/log/messages that were explained in Comment 62:
>
> I tried the fix offered in Comment 62 and found no joy.

In general, running a recursive nameserver should be neither helpful nor harmful.

> I read and tried the fix in Comment 97 and was pleased to see that it worked
> just fine:

Waiting for nm-online fixes at least one very specific and very common use case. Do you suggest any modifications to the current aiccu package? There are other packages that make use of waiting for nm-online. It would be possible to contact someone from NetworkManager for clarification. My activity in NetworkManager is very limited right now.

Comment 106 Pavel Šimerda (pavlix) 2014-04-01 06:38:27 UTC

So finally there's some news for the lack of connectivity at startup. I discussed it with Brno systemd developers and it would be theoretically possible to use nm-online directly in the ExecStartPre. The problem is that nm-online itself doesn't check whether NetworkManager is about to be started or not and waits unconditionally. A solution also useful to non-NetworkManager setups would have to check that.

Currently I see basically two possibilities:

1) Close this bug and let people enable NetworkManager-wait-online.service. Wiki could be used to document it.

2) Add a more sophisticated script capable of waiting for NetworkManager only if it's started or scheduled to be started.

Comment 107 Pavel Šimerda (pavlix) 2014-04-01 11:19:39 UTC

And... after another discussion...

3) Fix the NetworkManager-wait-online.service so that nm-online is only being run when NetworkManager is running or scheduled to run. There's currently no known way to do that but it would help many other services, not just aiccu.

Comment 108 Pavel Šimerda (pavlix) 2014-04-23 15:43:52 UTC

(In reply to Pavel Šimerda (pavlix) from comment #107)
> And... after another discussion...
> 
> 3) Fix the NetworkManager-wait-online.service so that nm-online is only
> being run when NetworkManager is running or scheduled to run. There's
> currently no known way to do that but it would help many other services, not
> just aiccu.

What if NetworkManager just included Wants=NetworkManager-wait-online?

Comment 109 Pavel Šimerda (pavlix) 2014-05-02 12:03:08 UTC

I'm going to modify the aiccu service file so that it follows the current systemd documentation and also maintains backwards compatibility in case it's used with an older systemd.

diff --git a/aiccu.service b/aiccu.service
index cef6a7c..0dab022 100644
--- a/aiccu.service
+++ b/aiccu.service
@@ -1,6 +1,7 @@
 [Unit]
 Description=AICCU (Automatic IPv6 Connectivity Configuration Utility)
-After=syslog.target network.target
+Wants=network.target network-online.target
+After=network.target network-online.target
 
 [Service]
 Type=forking

This doesn't currently help any of the users as the issue is in NetworkManager service files and/or systemd itself, not aiccu [talking about the specific issue of boot up ordering, not other aiccu integration problems].

As we cannot solve the startup in aiccu and upstream is not willing to work on the other related issues, I'm closing this bug as WONTFIX. Anyone who is still concerned about the issues is welcome to participate in the following:

1) NetworkManager service files should be updated to implement network-online.target properly. There's an upstream bugzilla ticket for it.

https://bugzilla.gnome.org/show_bug.cgi?id=728965

Feel free to start a Fedora bug ticket for it if interested, as I'm *not* doing that.

2) Systemd should be updated to support initscripts properly. This doesn't affect Aiccu on Fedora directly but can delay #1. This is in their TODO list.

http://cgit.freedesktop.org/systemd/systemd/commit/?id=d20850cbf4545715340580c179cf316005d53905

3a) Work with Aiccu upstream about any issues related to dynamically acquiring or losing connectivity.

3b) Submit acceptable patches in a new Fedora bug report (as those would not be directly related to the reported issue).

Thanks everyone for participating and sorry for not reaching a better solution. The generally accepted workaround is:

systemctl enable NetworkManager-wait-online.service

Comment 110 Jeroen Massar 2014-05-02 14:29:17 UTC

> 3a) Work with Aiccu upstream about any issues related to dynamically acquiring or losing connectivity.

Even though the first letter in AICCU stands for Automatic, that does not mean it can automatically solve all connectivity issues.

If the user has a setup that is broken, the result will be broken. In the same way, when the user decides to start AICCU when there is no connectivity, AICCU will log an error and exit.

As such "dynamically acquiring ... connectivity" is not a possibility, as AICCU can never know when the user wants to gain that connectivity and if it is in a broken state it is unable to repair that state as it is possibly intended that that state is like that.


As for the "or losing connectivity" part, which also came up earlier in this thread: I have yet to receive a single email/ticket that actually contains details about what people mean by this. By details I mean things like address lists, routing tables, interfaces, tcpdumps and other such things that could actually help in diagnosing the issue, let alone replicating it.

Saying "it crashes" is not helpful, as then I'll ask again for the coredump so that I can actually look at that "crash".


As such, as asked above: please actually provide details. kthx.

Comment 111 Pavel Šimerda (pavlix) 2014-05-02 16:44:46 UTC

(In reply to Jeroen Massar from comment #110)
> If the user has a setup that is broken,

I'm afraid that in 2014 wireless networking with laptops and other mobile devices is a standard, not a broken configuration. Linux distributions now run either NetworkManager or a similar tool that allows the user to connect to new networks at any time, automatically or manually.

Software that *assumes* that devices *only* acquire connectivity at boot time cannot be considered ready for use with contemporary technology on contemporary Linux distributions.

> when the user decides to start AICCU

In contemporary Linux distributions, system services are typically not started from a user session. The same goes for automated networking tools like NetworkManager, dhclient, wpa_supplicant and, in an ideal case, aiccu.

Services started by the user are typical for developers' environment, not for those of end users.

I already described some possibilities in my previous comments, but to solve those integration issues, aiccu would basically need to either (a) be driven by a tool like NetworkManager (this solution is common for VPNs) or (b) be able to run as a regular service and wait for network connectivity itself.

> As such "dynamically acquiring ... connectivity" is not a possibility,

Steps to reproduce:

1) Boot a distribution with networking powered by NetworkManager.

2) Check that you have no connectivity.

3) Request that NM connects you to a wireless network.

4) Check that you now have connectivity.

Debunked.

> As such, as asked above: please actually provide details. kthx.

The very use case described above shows where integration between aiccu and the rest of the system doesn't work as expected. So far upstream didn't express any interest in improving that.

Comment 112 Jeroen Massar 2014-05-02 21:41:11 UTC

(In reply to Pavel Šimerda (pavlix) from comment #111)
> (In reply to Jeroen Massar from comment #110)
> > If the user has a setup that is broken,
> 
> I'm afraid that in 2014 wireless networking with laptops and other mobile
> devices is a standard, not a broken configuration.

Did I say that that kind of setup (wireless etc) is a broken configuration?

Please note that this is the first time in this ticket that wireless is even mentioned (not counting the two mentions of 'wifi' as an example network)

> Linux distributions now run either NetworkManager or a similar tool that
> allows the user to connect to new networks at any time, automatically or manually.

Thus what you are ACTUALLY complaining about is that you changed the networking paradigm on your platform and that you would love to see Network Manager support.


> A software that *assumes* that devices *only* acquire connectivity at boot
> time cannot be cosidered ready for use with contemporary technology on
> contemporary linux distributions.

AICCU does not assume that. You can actually start it at any point later if you want. (just do not do that in a repeating fashion that hammers on the external service that you are using...)

> > when the user decides to start AICCU
> 
> In contemporary linux distributions, system services are typically not being
> started from a user session. And automated networking tools like
> NetworkManager, dhclient, wpa_supplicant and, in an ideal case, aiccu.

You mean that there are daemons that start these things, where that daemon has an interface that allows a non-0-uid user to manage those services through that interface.

That is the same as letting the user run 'sudo aiccu start' and setting the correct permissions (eg a group called 'aiccu' where the user has to be in) in the sudoers(.d/aiccu) file.

Nothing "contemporary" about that, that is how services have been managed for decades.

But for the fun of it, lets check nm-openvpn-service.conf:
------
<busconfig>
        <policy user="root">
                <allow own="org.freedesktop.NetworkManager.openvpn"/>
                <allow send_destination="org.freedesktop.NetworkManager.openvpn"/>
        </policy>
        <policy context="default">
                <deny own="org.freedesktop.NetworkManager.openvpn"/>
                <deny send_destination="org.freedesktop.NetworkManager.openvpn"/>
        </policy>
</busconfig>
-----------

Righty, thus a user who knows the root password can communicate with that service.
Gee, exactly what I described above.

What a lot of undocumented code in there.

> Services started by the user are typical for developers' environment, not
> for those of end users.

How are end-users different from developers?

I am a developer and I run everything as my own uid when developing unless it is a tool that needs special privileges at which point I use sudo and friends.

End-users use their systems in the same way; the sudo bit just happens automatically through some GUI magic (which calls a daemon with more privs and/or uses a simple system(sudo whatever) to do the magic).

> I already described some possibilities in my previous commands but to solve
> those integration issues, aiccu would basically need to either (a) be driven
> by a tool like NetworkManager (this solution is common for VPNs) or (b) be
> able to run as a regular service and wait for network connectivity itself.

Then the solution would be obviously a) would it not?

This is apparently what OpenVPN does, or actually how Network Manager adds support for OpenVPN, as no actual modifications to OpenVPN were made to add support for this.

> > As such "dynamically acquiring ... connectivity" is not a possibility,

You skipped the rest of the sentence, let alone the rest of the question. Again, "wireless" was not even mentioned before your previous comment.

> Debunked.

(You debunked that you don't read the whole sentence? :) Sorry, no fish.)

> > As such, as asked above: please actually provide details. kthx.
> 
> The very use case described above shows where integration between aiccu and
> the rest of the system doesn't work as expected. So far upstream didn't
> express any interest in improving that.

Network Manager is something introduced by you, not by the 'upstream'.

Now, if there was a simple feature request for Network Manager support, instead of a huge thread about how AICCU "crashes" and other such things (without ever providing any kind of logs or other details), that would be quite a different thing, would it not?


I still see not a single routing table, no addresses, no interfaces, no tcpdump (and please, use the underlying IPv4 interface, not the tunnelled interface...). And this ticket really contains "crash" as a reason several times.


PS: is "contemporary" on the "use this word today" list somewhere?

-----

As an alternative one could make a simple dispatch script based on:
https://wiki.archlinux.org/index.php/NetworkManager#Use_dispatcher_to_connect_to_a_VPN_after_a_network-connection_is_established
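A minimal sketch of such a dispatcher hook, written in Python rather than shell. It assumes NetworkManager's convention of calling dispatcher scripts with the interface name and the action as the first two arguments; `aiccu start` is the command used in this thread, while the function names and the choice to react only to "up" events are hypothetical. It does not check whether aiccu is already running.

```python
#!/usr/bin/env python3
"""Hypothetical NetworkManager dispatcher hook: start aiccu when a link comes up."""
import subprocess
import sys

# Command as used in this thread; finding the binary is left to PATH.
AICCU_START = ["aiccu", "start"]


def command_for(interface, action):
    """Return the command to run for a dispatcher event, or None to do nothing."""
    # Ignore loopback and every event that is not a connection coming up.
    if interface == "lo" or action != "up":
        return None
    return AICCU_START


def main(argv):
    # NetworkManager invokes dispatcher scripts as: <script> <interface> <action>
    if len(argv) < 3:
        return 0
    command = command_for(argv[1], argv[2])
    if command is None:
        return 0
    return subprocess.call(command)


# Only act when called with dispatcher-style arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```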

The big big problem with that is that there is no way for AICCU to tell the user "hey your setup is broken (time off, no connectivity etc)", as AICCU does not know where the user's eyes are (CLI? GUI? GNOME? KDE?).

As such, the user will never figure out WHY it is broken, the user will thus keep on "turning it on". AICCU keeps on restarting, and reporting the error, which ends up in syslog, which the user does not read as that is not where their eyes are...

See where the problem leads?

Comment 113 Pavel Šimerda (pavlix) 2014-05-05 07:56:06 UTC

Jeroen,

you could simply say you're not interested in integration of aiccu into the current versions of Fedora and other distributions as I describe it. You can certainly complain about various changes in distributions including adopting NetworkManager or any other software but that doesn't change much.

I consider this bug report a pretty good resource of how users want to use aiccu. They're happy they have a semi-automated solution to connect to IPv6 via tunnels and they would welcome an improvement to a fully automated solution that, once configured, automatically piggybacks on any running (IPv4-only) connection.

Upstream has chosen to ignore it. I, as the maintainer, did my best to fix the actual reported bug on the aiccu side, I'm in contact with systemd upstream folks about a system-wide fix, and I closed it as WONTFIX with a workaround specified.

For the related issues, I'm going to close any new bugs as WONTFIX unless they include a decent patch or there's a clear indication that upstream changed their mind.

Cheers,

Pavel

Comment 114 Jeroen Massar 2014-05-05 09:17:59 UTC

> You could simply say you're not interested in integration of aiccu into the
> current versions of Fedora and other distributions as I describe it. You can
> certainly complain about various changes in distributions including adopting
> NetworkManager or any other software but that doesn't change much.

I thought that I quite clearly made the point that you are asking for a new feature, "Network Manager integration", as that is your problem; why you are stating the above is thus unclear to me.

> I consider this bug report a pretty good resource of how users want to use aiccu.

s/bug report/feature request/ right?

I only see people asking for the ability to start it later, when they actually have IPv4 connectivity. That ability is already there, for years, you just call 'aiccu start' and it works.

As for "automatically starting it after some random entity (be that network manager or the user themselves) has established connectivity", it needs to be made possible to integrate it properly and provide integration so that the user actually can see what is wrong. Obviously nobody reads log files (though a few people actually did check them in this thread).

See also the last section of my previous mail. The network manager 'feature' can be done easily that way, but the user will never be properly notified of any issues that might be there.

Note also that some people stated that just enabling the waitonline option (which you consider a 'hack', which it is) makes things work for them and they even state it is not an AICCU issue, but a NM issue.


> They're happy they have a semi-automated solution to connect to IPv6 via
> tunnels and they would welcome an improvement to a fully automated solution
> that, once configured, automatically piggybacks on any running (IPv4-only)
> connection.

As I described, it is impossible to always automatically set things up. There are various situations where the network does not allow it (policy, but also forced firewalling), next to the situations where there is already native IPv6 available (which might be broken, e.g. when a rogue router does RAs but does not have connectivity, which btw might be on purpose), just to name a few issues.

As such, "fully automatic" cannot work and AICCU has to error, which is what it does already. That is the only way the user can notice and can resolve that problem. AICCU is unfortunately not able to perform magic and nor am I going to teach it to override a decision by either the administrator or the policy of a network.

And in the end, without the ability to show a message properly to the user and the user not reading the logs (which they should), they will just see it as broken.

> Upstream has chosen to ignore it.

With the large number of replies I have been giving here, how exactly have I ignored what?

I have been asking several times for actual details of the multiple different "problems" mentioned in this ticket. I have not seen a single proper reply (again: core dumps, as crashes are mentioned, routing tables, tcpdumps etc)

> I, as the maintainer, did my best to fix the actual reported bug on the aiccu
> side, I'm in contact with systemd upstream folks about a system-wide fix, and I
> closed it as WONTFIX with a workaround specified.

Here you are claiming there is no bug with AICCU per se but with the "integration" that you did and that that problem is actually caused by something else...

What am I supposed to do about this?

> For the related issues, I'm going to close any new bugs as WONTFIX unless they
> include a decent patch or there's a clear indication that upstream changed
> their mind.

Instead of asking for random patches, maybe do what I have been doing several times already: ask for details about the problem...

Actually knowing what a problem is and what causes it is really the best way to solve a problem properly.

And no, I will not change my mind about accepting random patches with no background whatsoever that only cause problems (as seen in this very thread...)
At least you removed that "patch" and thus did not cause many users to DoS our servers.

Comment 115 Leoš Bitto 2014-05-05 11:25:57 UTC

(In reply to Jeroen Massar from comment #114)
> 
> I only see people asking for the ability to start it later, when they
> actually have IPv4 connectivity. That ability is already there, for years,
> you just call 'aiccu start' and it works.
> 

Unfortunately it does not work when there is a temporary DNS outage - then AICCU fails to resolve the IPv4 address of tic.sixxs.net and stops.

> 
> As such, "fully automatic" cannot work and AICCU has to error, which is what
> it does already. That is the only way the user can notice and can resolve
> that problem. AICCU is unfortunately not able to perform magic and nor am I
> going to teach it to override a decision by either the administrator or the
> policy of a network.
> 

I would appreciate if AICCU could perform at least this bit of magic: Clearly distinguish (maybe by using a different exit status) when it exits due to a temporary outage (and so it is safe to restart it automatically) and when it exits due to a permanent failure (and so it must not be restarted automatically).

Comment 116 Jeroen Massar 2014-05-05 11:42:20 UTC

Leos: thank you for your proper question instead of just poking into eyes.


> Unfortunately it does not work when there is a temporary DNS outage - then AICCU
> fails to resolve the IPv4 address of tic.sixxs.net and stops.

AICCU cannot know that that situation is temporary, let alone that it is because there is a "DNS outage", it can be a lot of other situations.

AICCU thus logs and exits. When the problem has been resolved, you can manually try again.

> I would appreciate if AICCU could perform at least this bit of magic:
> Clearly distinguish (maybe by using a different exit status) when it
> exits due to a temporary outage

But how can it know the problem is "temporary"?

As per your "DNS outage" example, how would AICCU know that "DNS" is out? It does not do the DNS request itself, it just asks the resolver library to answer. Now, you could say "well, if it does not resolve, it could be temporary"; ever heard of RPZ? Indeed, it might be an administrator who blocked it. Or even a better example for home networks: the little DNS resolver in your DSL/Cable-box/Wifi-thing/NAT-box is broken and does not handle AAAA, (large) TXT records, multiple A records[1]: AICCU does not get any response; is that a "temporary" issue, should AICCU keep on querying the DNS box till it gets an answer that satisfies some random requirement?

How can it differentiate between these states?

[1] we published ~20-ish A records once for tic.sixxs.net and quite a few clients suddenly fell over; this while it is totally valid for DNS, just requires TCP on the DNS level...
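For reference, a small sketch of what a program that "just asks the resolver library" actually gets back. It uses Python's `socket.getaddrinfo` wrapper around the C library call; the function names and the hint strings are made up for illustration, and port 3874 is the TIC service port from the log message quoted in this thread. The `EAI_AGAIN` code is only the resolver's own hint: a blocked name or a broken home-router resolver may surface under either branch, so the hint alone cannot prove that a failure is temporary.

```python
import socket

# Error codes that the resolver library itself labels as possibly transient.
TRANSIENT_EAI = {socket.EAI_AGAIN}


def resolver_hint(error):
    """Classify a socket.gaierror purely by what the resolver reported."""
    if error.errno in TRANSIENT_EAI:
        return "resolver says: try again later"
    return "resolver says: no usable answer"


def lookup(host, port):
    """Return (IPv4 addresses, hint); hint is None when resolution succeeded."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as error:
        return [], resolver_hint(error)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return [info[4][0] for info in infos], None
```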

> (and so it is safe to restart it
> automatically) and when it exits due to a permanent failure (and so
> it must not be restarted automatically).

If AICCU would magically be able to determine that a state is 'temporary', then it could, as you probably want, simply wait a bit for the temporary problem to be over.

For that matter, as you obviously want an exit, and then restart it, how would the external tool know that the temporary problem is over and then restart AICCU?

As AICCU cannot know that such a situation (eg "DNS outage") is "temporary" let alone how long such a "temporary" situation will last, there is no way to determine how long to wait.


Hence: AICCU logs an error and exits. This so that the user knows there is a problem (as obviously it did not start, and is not running anymore) and the user can investigate and resolve the problem and then try again.

Comment 117 Leoš Bitto 2014-05-05 12:29:50 UTC

(In reply to Jeroen Massar from comment #116)
>
> Hence: AICCU logs an error and exits. This so that the user knows there is a
> problem (as obviously it did not start, and is not running anymore) and the
> user can investigate and resolve the problem and then try again.
>

I am trying to configure AICCU to work automatically, with as little user intervention as possible. It seems that this setup is unsupported, because AICCU relies on the user investigating every problem when it exits. Could we improve this, please?

I know that I must not restart AICCU automatically to avoid hammering on tic.sixxs.net. So far the biggest problems for me are network outages when AICCU starts - they lead to DNS failure which leads to AICCU logging "Couldn't resolve host tic.sixxs.net, service 3874" and stopping. So, if you do not distinguish which outages are temporary and which are permanent, could we do it another way? Let AICCU use its exit status to inform whether it exits before it was able to open the TCP socket to tic.sixxs.net and so it should be safe to restart automatically (after some time), because there probably is just some temporary network outage. In every other case (after the TCP connection to tic.sixxs.net was established) automatic restarts must be avoided, because that could lead to overloading tic.sixxs.net, and the investigation of the situation has to be done by the user.
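The proposal above can be sketched as follows. This is an illustration only, assuming made-up exit status values and a made-up `run_once` helper; aiccu defines nothing of the sort. The attempt is split at the point where the TCP connection to the TIC server is established, and the exit status records on which side of that point it failed.

```python
import socket

# Hypothetical exit statuses; aiccu does not define these.
EXIT_OK = 0
EXIT_AFTER_CONNECT = 1    # the server was reached: never restart automatically
EXIT_BEFORE_CONNECT = 75  # no TCP connection was established


def run_once(host, port, session, connect=socket.create_connection):
    """Run one attempt and map the stage at which it failed to an exit status."""
    try:
        # Name resolution and the TCP handshake both happen inside connect().
        sock = connect((host, port), timeout=30)
    except OSError:
        # Covers resolver errors, timeouts, refused and unreachable hosts.
        return EXIT_BEFORE_CONNECT
    try:
        # `session` stands in for the protocol exchange with the server.
        session(sock)
    except Exception:
        return EXIT_AFTER_CONNECT
    finally:
        sock.close()
    return EXIT_OK
```

The `connect` parameter exists so the staging logic can be exercised without a network; a real client would simply use the default.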

Comment 118 Jeroen Massar 2014-05-05 13:10:12 UTC

> So far the biggest problems for me are network outages when AICCU starts
> - they lead to DNS failure which leads to AICCU logging
> "Couldn't resolve host tic.sixxs.net, service 3874" and stopping.

How could AICCU resolve this problem? See previous reply, it cannot solve this.

Keeping on restarting it, or even just retrying, will not make your DNS work.

The big question is: what is broken in your network that your DNS resolving does not work? Fix that.


> Let AICCU use its exit status to inform whether it exits before it was able
> to open the TCP socket to tic.sixxs.net and so it should be safe to restart
> automatically (after some time)

As in the previous reply, even if AICCU could determine that it is a 'temporary' issue, it could do that, but how long would it need to wait and should it inform the user? Should it keep on trying forever, what if the problem 'resolves itself' just after it ended trying and gave up?

There is no way to know when an external problem is resolved. The user needs to resolve these problems, they can then start it again.

As an example of a service that reaches out: Does VLC keep on hammering on your DNS and on your server when the connection to an Internet Radio station cannot be made? Indeed, it gives up on that entry, do you notice? yes, as there is no sound. For AICCU though, you will not notice except "it is broken as it does not work", till you read the log and realize what is broken.

Or as another example: does your web-browser retry to contact a website? Nope, it will tell you that either DNS resolution failed or that the connection could not be made and it gives up.

Unfortunately there is no way for AICCU to show a message to the user that easily, as it is unknown whether the user has a GUI, is in a CLI, or is using SSH or one of a myriad of other methods to use it. It thus just logs and exits and lets the user look at it.

> In every other case (after the TCP connection to tic.sixxs.net was
> established) automatic restarts must be avoided, because that could lead
> to overloading tic.sixxs.net, and the investigation of the situation has
> to be done by the user.

Note that the "do not restart" rule is not only there to avoid overloading the TIC server (which is handling about 100 connections/s, of which only about 1 is valid, even though one should never restart); it is also simply because restarting does not resolve the actual problem, which AICCU nicely logs.

Fixing the actual problem you have is the best thing to do: Fix your DNS!

Comment 119 Pavel Šimerda (pavlix) 2014-05-05 13:35:50 UTC

Hi Leoš,

thank you for trying.

Pavel

Comment 120 Leoš Bitto 2014-05-05 13:46:58 UTC

(In reply to Jeroen Massar from comment #118)
> > So far the biggest problems for me are network outages when AICCU starts
> > - they lead to DNS failure which leads to AICCU logging
> > "Couldn't resolve host tic.sixxs.net, service 3874" and stopping.
> 
> How could AICCU resolve this problem? See previous reply, it cannot solve
> this.

I am not asking AICCU to resolve this problem.

> 
> Keeping on restarting it, or even just retrying, will not make your DNS work.
> 
> The big question is: what is broken in your network that your DNS resolving
> does not work? Fix that.
> 

Sure I will get this fixed. But I do not want to be forced to manually restart every service which did not start because DNS did not work properly - I want to automate this.

> 
> > Let AICCU use its exit status to inform whether it exits before it was able
> > to open the TCP socket to tic.sixxs.net and so it should be safe to restart
> > automatically (after some time)
> 
> As in the previous reply, even if AICCU could determine that it is a
> 'temporary' issue, it could do that, but how long would it need to wait and
> should it inform the user? Should it keep on trying forever, what if the
> problem 'resolves itself' just after it ended trying and gave up?

I am not asking AICCU to do any of this.

> 
> There is no way to know when an external problem is resolved. 
>

The easy way seems to be to automate restarting AICCU until it connects properly, however I would like to do it safely to avoid overloading tic.sixxs.net. All I need from AICCU is to let me know when it is safe to restart it, its current approach "never restart automatically" is what I would like to improve.

>
> The user needs to resolve these problems, they can then start it again.
>

In the current state - yes. But I would like to improve this to enable automatic resolving of at least some problems (temporary network outages).

> As an example of a service that reaches out: Does VLC keep on hammering on
> your DNS and on your server when the connection to an Internet Radio station
> cannot be made? Indeed, it gives up on that entry, do you notice? yes, as
> there is no sound. For AICCU though, you will not notice except "it is
> broken as it does not work", till you read the log and realize what is
> broken.
> 
> Or as another example: does your web-browser retry to contact a website?
> Nope, it will tell you that either DNS resolution failed or that the
> connection could not be made and it gives up.
> 
> Unfortunately there is no way for AICCU to show a message to the user that
> easily, as it is unknown whether the user has a GUI, is in a CLI, or is using
> SSH or one of a myriad of other methods to use it. It thus just logs and
> exits and lets the user look at it.

VLC and web-browser are not the same thing as AICCU, for one very important difference: they have GUI which the user uses. AICCU on the other hand can run without any UI, on a machine with no monitor and keyboard and most importantly with no user watching it - just an administrator investigating it when it does not work. I would like to minimize the administrator's work by safely automating AICCU restarts in the most common situations which lead to AICCU stopped - temporary network outages.

> > In every other case (after the TCP connection to tic.sixxs.net was
> > established) automatic restarts must be avoided, because that could lead
> > to overloading tic.sixxs.net, and the investigation of the situation has
> > to be done by the user.
> 
> Note that the "do not restart" rule is not only there to avoid overloading
> the TIC server (which is handling about 100 connections/s, of which only
> about 1 is valid, even though one should never restart); it is also simply
> because restarting does not resolve the actual problem, which AICCU nicely
> logs.
> 
> Fixing the actual problem you have is the best thing to do: Fix your DNS!

It is not DNS, it is my network connection which is not 100% reliable. I am not able to get 100% reliable network connection. I would like to automate the handling of the network outages and all I need from AICCU is to provide information (I have suggested special exit status) whether it is safe to restart it (because it was not able to connect to tic.sixxs.net and therefore it will not overload it).
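What the consuming side of such an exit status might look like, as a sketch under stated assumptions: the status value 75, the function names, and the delay values are all hypothetical, and `run` stands for whatever starts the client and returns its exit status. Restarts happen only on the "safe" status and are spaced out with a capped exponential backoff.

```python
import time

SAFE_TO_RESTART = 75  # hypothetical status meaning "never connected to the server"


def backoff_delays(first=60, cap=3600):
    """Yield retry delays in seconds, doubling each time up to the cap."""
    delay = first
    while True:
        yield delay
        delay = min(delay * 2, cap)


def supervise(run, sleep=time.sleep, max_attempts=10):
    """Re-run `run` only while it reports the 'safe to restart' status."""
    delays = backoff_delays()
    status = run()
    for _ in range(max_attempts - 1):
        if status != SAFE_TO_RESTART:
            # Success or any other failure: stop and leave it to a human.
            break
        sleep(next(delays))
        status = run()
    return status
```

Any other non-zero status ends the loop immediately, so a failure after reaching the server is never retried automatically.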

Comment 121 Jeroen Massar 2014-05-05 14:16:50 UTC

Pavel wrote:
> thank you for trying.

Pavel, if you have nothing to add, just don't, it just adds more nonsense to this ticket.

Also, can I suggest that you talk to the Red Hat HR department and request an "interaction with developers" training. Passive insults do not fly in this world and will only do you more harm than good.

When you actually have a valid problem description with routing tables, tcpdumps etc, you might continue to attempt this kind of wording.

Leos wrote:
> I am not asking AICCU to resolve this problem.

Then what would resolve the problem and how would AICCU know about it?

> Sure I will get this fixed. But I do not want to be forced to manually
> restart every service which did not start because DNS did not work properly
> - I want to automate this.

Understandable. But in what way could AICCU be informed that suddenly everything is fine? Do you expect every service that requires similar connectivity to just keep on retrying all the time while the reasons why things are failing are unknown?

> The easy way seems to be to automate restarting AICCU until it connects
> properly,

There is no need for restarts when your network (DNS, connectivity, etc) is working.

> however I would like to do it safely to avoid overloading
> tic.sixxs.net.

Just do not restart it, then you do not cause that problem.

> All I need from AICCU is to let me know when it is safe to
> restart it, its current approach "never restart automatically" is what I
> would like to improve.

You did read the part where I stated that AICCU does not know right?

Hence, if it does not know it can not tell you "restart it then".

You, as a user, can start it manually though as you do know these factors and you are aware that something has been resolved or not.

> But I would like to improve this to enable automatic resolving of at least
> some problems (temporary network outages).

If you are aware that there is a 'temporary network outage', then why do you try to start AICCU? Why not start it when you know that that problem is not there?

As you are talking about 'users', are these users using the CLI (in that case a "sudo aiccu start" works great) or a GUI, in which case they might just need a GUI element that does a similar thing?

> VLC and web-browser are not the same thing as AICCU, for one very important
> difference: they have GUI which the user uses.

Correct. Would it then maybe help to have a UI for AICCU that can be used by the user?

Note though, that will still not resolve the problem or diagnose the issue in any way.

> I would like to minimize the administrator's work by safely automating
> AICCU restarts in the most common situations which lead to AICCU stopped
> - temporary network outages.

Are you aware that once AICCU is running, a "temporary network outage" does not affect it? And that it keeps on running and working fine?

Hence, once you have proper working connectivity, you only need to start it once.

No restarts are needed.

> It is not DNS, it is my network connection which is not 100% reliable.

(You described that your DNS was not working).

Anyway, if your connectivity is not "100% reliable" then resolve that by arranging better connectivity.

> I am not able to get 100% reliable network connection.

Why not? What kind of problems does your network have?

Please note that that will affect ALL your network using applications, not only AICCU.

> I would like to automate the handling of the network outages

As stated, if you have a network outage when you start AICCU it will indeed properly exit noting what the problem is. If it is running it will keep on running.

> and all I need from AICCU is to provide information (I have suggested special
> exit status) whether it is safe to restart it (because it was not able to
> connect to tic.sixxs.net and therefore it will not overload it).

As stated above, it does not know when it is "safe" as it does not know what the status of your connection, or that of the network, is and what parts of that are causing the problem. It can only report that there is a problem, not that it is resolved or how easy it is to fix it.

Comment 122 Pavel Šimerda (pavlix) 2014-05-05 14:22:19 UTC

(In reply to Leoš Bitto from comment #120)
> All I need from AICCU is to let me know when it is safe to
> restart it, its current approach "never restart automatically" is what I
> would like to improve.

That would indeed be a great first step if we could easily distinguish whether aiccu was denied by the service provider or whether it didn't even get there (because of local conditions). Even nicer would be an option to keep aiccu running and be able to notify it when it's suitable to retry it.

Comment 123 Pavel Šimerda (pavlix) 2014-05-05 14:36:26 UTC

(In reply to Jeroen Massar from comment #121)
> Also, can I suggest that you talk to the Red Hat HR department and request an
> "interaction with developers" training. Passive insults do not fly in this
> world and will only do you more harm than good.

http://fedoraproject.org/code-of-conduct

While you deserve respect as upstream developer, you are still in the Fedora project's bugzilla and you are expected to refrain from personal attacks.

> When you actually have a valid problem description with routing tables,
> tcpdumps etc, you might continue to attempt this kind of wording.

I believe the issues have been described multiple times in a way that is both understandable and reproducible for anyone.

Comment 124 Jeroen Massar 2014-05-05 14:47:04 UTC

> Even nicer would be an option to keep aiccu running and be able to notify it when it's suitable to retry it.

Even if that is in place, what would notify this?

What do you mean with "suitable"?

How do you determine this?

(adding spacing, maybe then you realize I am asking multiple questions every time, which are being skipped over)

> While you deserve respect as upstream developer, you are still in
> the Fedora project's bugzilla and you are expected to refrain from
> personal attacks.

I agree that you should refrain from personal attacks.
Please re-read the statements you have been making and stop doing so.

Be happy that I have a very thick skin and that I continue interacting in this thread to get to the bottom of the problem and get a result out of it that actually solves any real problems that might exist.

> I believe the issues have been described multiple times in a way that is both
> understandable and reproducible for anyone.

I am very sure that is not the case. Note that a few replies up there was a "DNS issue" which then changed into a "not 100% reliable network" issue. The problem set has not even been properly defined after so many replies, hence how can you make that statement?

It seems most commenters do not even understand that you only have to start the tool once (that is -1- time) as then it has the details it needs and can keep on running.

I am asking for more details with good reasons.

Comment 125 Leoš Bitto 2014-05-05 14:50:21 UTC

(In reply to Jeroen Massar from comment #124)

I am not going to quote now, because that leads to long messages which do not seem to get us anywhere.

My setup is AICCU running on a router with Linux and users in LAN which have no shell access to this router. The users have no Linux skills and both them and me are not interested in any Linux training. The router has IPv4-only internet connection which is wireless (sorry, no wired internet connection is feasible at this place) and so I cannot guarantee 100% reliability. The only IPv6 connectivity is via one AICCU process running on the router, for both the router and the computers in the LAN.

All I would like to get from AICCU is to set a special exit status which would mean "it is safe to automatically restart me". This special exit status should be set only when AICCU exits before it was able to open the TCP socket to the TIC server. The idea is that when the TCP connection to the TIC server was not established, restarting AICCU would not lead to overloading the TIC server, which I understand is the main reason why automatic restarts of AICCU are forbidden.

Comment 126 Jeroen Massar 2014-05-05 15:04:08 UTC

> My setup is AICCU running on a router with Linux [..] computers in the LAN.

Thus a normal setup that is used by literally tens of thousands of users.

Well, except that those users/nobody apparently are not able to do anything with that box when it is "broken". For whatever value of "broken" that is.

(and most people have quite stable connectivity, even over wireless, but that is a big aside)

> The idea is that when the TCP connection to the TIC server was not established, restarting AICCU would not lead to overloading the TIC server,

So, to come back to the big point: Why do you need to restart anything?

Your IPv4 connectivity is enabled once. Then you start AICCU. It keeps on running, even if you disconnect.

What is the problem?

> which I understand is the main reason why automatic restarts of AICCU are forbidden.

Yes, because if that was not happening then we would not have to write articles like:

https://www.sixxs.net/news/2013/#tichammeringcontinuespleasecon-0722
https://www.sixxs.net/faq/aiccu/?faq=tic

and many others. See also the README mentioned in previous replies and other such pieces of text.

Comment 127 Pavel Šimerda (pavlix) 2014-05-05 15:40:53 UTC

(In reply to Jeroen Massar from comment #124)
> > Even nicer would be an option to keep aiccu running and be able to notify it when it's suitable to retry it.
> 
> Even if that is in place, what would notify this?

There typically is a component in the system that keeps track of configuration. Often it is a consolidated network configuration service like NetworkManager, connman, netifd or systemd-networkd or a specialized system component that handles the DNS configuration like dnssec-triggerd.

> What do you mean with "suitable"?

Basically when network has just been (re)configured.

> How do you determine this?

In a typical situation NetworkManager (or an alternative) keeps track of all configuration changes from kernel through DNS to VPNs and therefore has this information already.
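A sketch of the "keep running and be notified" idea, under loud assumptions: the use of SIGUSR1 as the notification, all function names, and the `attempt` callable are hypothetical, and aiccu offers no such interface. The process makes one attempt, then sleeps until an external component (for example a network manager hook sending the signal) says the configuration changed, and only then tries again. Unix only, since it relies on SIGUSR1.

```python
import signal
import threading

# Set whenever something outside the process reports "the network changed".
network_changed = threading.Event()


def on_notify(signum, frame):
    """Signal handler: just record that a notification arrived."""
    network_changed.set()


def install_handler(signum=signal.SIGUSR1):
    """Hypothetical choice of signal for the notification."""
    signal.signal(signum, on_notify)


def wait_and_retry(attempt, wait=network_changed.wait, max_rounds=None):
    """Call `attempt`, then again after each notification, until it succeeds."""
    rounds = 0
    while not attempt():
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            return False
        # Block without polling until a notification arrives.
        wait()
        # Clear before the next attempt so a notification that arrives
        # during the attempt is not lost.
        network_changed.clear()
    return True
```

Between notifications the process generates no traffic at all, which is the difference from a timed restart loop.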

> (adding spacing, maybe then you realize I am asking multiple questions every
> time, which are being skipped over)
> 
> > While you deserve respect as upstream developer, you are still in
> > the Fedora project's bugzilla and you are expected to refrain from
> > personal attacks.
> 
> I agree that you should refrain from personal attacks.
> Please re-read the statements you have been making and stop doing so.
> 
> Be happy that I have a very thick skin and that I continue interacting in
> this thread to get to the bottom of the problem and get a result out of it
> that actually solves any real problems that might exist.

I have yet to see any actual positive results. I didn't take the Fedora package to be happy about your presence in a bugzilla assigned to me but to improve the situation.

> > I believe the issues have been described multiple times in a way that is both
> > understandable and reproducible for anyone.
> 
> I am very sure that is not the case.

If I knew upstream was actually interested, I could put it on my TODO list and eventually come up with some real testing and a thorough description of how aiccu could be properly integrated with a Linux system using systemd and NetworkManager (or similar tools). I'm already trying to improve the situation on the systemd side regarding the boot up which will fix the original bug (as in description). But I hope you don't expect me to devote my free time to something I will be only insulted for.

> Note that a few replies up there was a
> "DNS issue" which then changed into a "not 100% reliable network" issue.

Aiccu itself doesn't need to care about the specifics of the network issue. Instead, this is up to the network service to handle that and notify aiccu that we are ready for a new attempt.

> The
> problem set has not even been properly defined after so many replies, hence
> how can you make that statement?

I hope you don't want a specific network issue description when we are talking about a general case of not fully configured networking at the time of aiccu start. The discussion includes a number of examples already and you already confirmed that you see this as an "integration RFE".

> It seems most commenters do not even understand that you only have to start
> the tool once (that is -1- time) as then it has the details it needs and can
> keep on running.

I see. I can't say which commenters don't understand it and which simply don't understand why it's important. And I believe the systemd fix that I proposed to systemd developers will fix almost all cases experienced by the users. As you already noted, we are talking about improving integration here.

> I am asking for more details with good reasons.

As I consider this bug report effectively dead, because the specific issue originally described must be fixed in systemd and has a workaround, is there a proper channel to keep information about possible further improvements? I'd say a wiki might be better than bugzilla or similar tools so we can converge to some reasonable description? Are you actually interested?

Comment 128 Jeroen Massar 2014-05-05 16:25:50 UTC

> Basically when network has just been (re)configured.

Does that also check that DNS works, and NTP is correctly synced? Amongst a few situations.

And what if that could notify AICCU, what does it do when AICCU (in its current form) exits again, do you keep on restarting it or?

Or do you expect that AICCU then supports that notification mechanism?
(which I mentioned various replies above back as a possible feature request)

> I have yet to see any actual positive results. I didn't take the Fedora
> package to be happy about your presence in a bugzilla assigned to me but
> to improve the situation.

Thank you for clearly stating that you are not happy that I actually respond to the insults you are giving me.

Noticed that hint about talking to HR about a certain training, please do use it.

> If I knew upstream is actually interested

Why are you so much repeating this?

Let me state it again in big letters this time: I AM INTERESTED IN FIXING ACTUAL PROBLEMS.

If I were not, then I would not be replying to this thread and wasting my time in doing so.

> I'm already trying to improve the situation on the systemd side regarding
> the boot up which will fix the original bug (as in description). 

You mean the description where the filer of the bug stated himself that the "solution can generate a DoS" ?

Please instead of stating things like the above actually state what you are going to do. Maybe a new ticket would be a good restart of all this nonsense? So that the information is in a single place?

> But I hope you don't expect me to devote my free time to something I will be only insulted for.

You are insulted? You mean that you are insulting me for having bugs in my tool which your organisation added and then requiring ME to do extra work in MY free time!?

Note that SixXS is just one of those fun hobby projects done fully in one's spare time that continues to be a huge time sink just because people cannot read and think they know better how to do things instead of asking the people who should know how something works.

Or should I start listing the bug reports I filed against various distributions that wasted my time by adding "automatic restarts" even though we have a README which clearly states not to and to please contact info. Please note: nobody ever did, we always have to contact people to resolve the problem.

But yes, your free time is important too of course.


> Aiccu itself doesn't need to care about the specifics of the network issue.
> Instead, this is up to the network service to handle that and notify aiccu
> that we are ready for a new attempt.

Thank you, thus there is sanity! That actually sounds quite reasonable.
Please tell more about this, tickets, links? Things we could do to facilitate this?


> The discussion includes a number of examples already and you already
> confirmed that you see this as an "integration RFE".

Aha, now I get it. You are ignoring the fact that this thread claims several problems (possibly unrelated, as no data, thus hard to tell) and solely focus on the issue of the-first-time-one-gets-connectivity.

> And I believe the systemd fix that I proposed to systemd developers will
> fix almost all cases experienced by the users.

You mean the item you put on their TODO list last week even though this ticket is from 2 years ago? Which will get no attention because they do not have a use case for it?

You might, in that ticket, actually point out WHY you want it, oh and like, involve the developer of the tools that this is actually for so that they can provide feedback.


> As you already noted, we are talking about improving integration here.

That is because there is a new thing in town that apparently does not support existing tools.
If that support needs to be added and it makes people happy, great.


>> I am asking for more details with good reasons.
>
> As I consider this bug report effectively dead, because the
> specific issue originally described must be fixed in systemd
> and has a workaround, is there a proper channel to keep information
> about possible further improvements? I'd say a wiki might be
> better than bugzilla or similar tools so we can converge to some
> reasonable description? 

So you are saying that you do not have a description for the problem, even though according to you the problem is dead and the problem has been repeated over and over?

It depends completely on which issue you are talking about. If you are talking about THIS issue (Red Hat added NM, then requires AICCU to support it, considering it a bug there), either using this ticket, which already discusses it to death, or creating a new clean ticket is the best way to go. (hey, already noticed that before...).

Wikis are not the right tool for bug reporting, the complete timeline will be gone.
If they were the right tool, bugzilla and various other tools would not be used anymore.

> Are you actually interested?

Again that question? Really? Why?

It is amazing how you try to point me to the code of conduct but are failing at every single point, let alone the spirit, yourself.

Comment 129 Leoš Bitto 2014-05-05 20:10:50 UTC

(In reply to Jeroen Massar from comment #126)
> 
> > The idea is that when the TCP connection to the TIC server was not established, restarting AICCU would not lead to overloading the TIC server,
> 
> So, to come back to the big point: Why are you needing to restart anything?

Because it stopped after it had been started (due to failed communication with the TIC server). There are many causes that can lead to failed communication of AICCU with the TIC server - my connection to my ISP, connection of my ISP to its upstream ISP, connection of the TIC server to its ISP - I absolutely cannot ensure that this all will work fine.

> 
> Your IPv4 connectivity is enabled once. Then you start AICCU. It keeps on
> running, even if you disconnect.
> 

As long as AICCU starts and is able to communicate with the TIC server, everything is fine - there is no need to discuss this case. What is the case which I am concerned about is what happens when AICCU is not able to reach the TIC server when it starts. My idea is that AICCU should inform about this case, a special exit status seems appropriate to me (I am trying to avoid parsing logs).

>
> What is the problem?
> 

How can I verify that the TIC server is reachable before I start AICCU (and stays reachable even after AICCU starts)? Even my configured network interface and reachable DNS server does not guarantee this and I am afraid that this approach would not lead to any perfect solution, so let's focus on another question: How can I reliably find out that AICCU stopped due to unreachable TIC server, to be able to properly automatically handle this special situation (perform controlled restarting of AICCU)?
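The exit-status idea can be sketched from the supervisor's side as follows. This is only an illustration under stated assumptions: aiccu does not define such an exit status today, the constant `EXIT_TIC_UNREACHABLE` and the function `next_action` are hypothetical names, and the value 75 is merely borrowed from `EX_TEMPFAIL` in sysexits.h. Only the command `/usr/sbin/aiccu start` comes from the packaged service file.

```python
import subprocess

# Hypothetical: aiccu has no such exit status today. The value 75 is
# borrowed from EX_TEMPFAIL in sysexits.h purely for illustration.
EXIT_TIC_UNREACHABLE = 75


def next_action(exit_status):
    """Map an aiccu exit status to what a supervisor could do next."""
    if exit_status == 0:
        return "running"        # tunnel details fetched, nothing to do
    if exit_status == EXIT_TIC_UNREACHABLE:
        return "retry-later"    # TIC was never reached, so a slow retry is possible
    return "needs-admin"        # anything else should be looked at by a human


def start_aiccu(command=("/usr/sbin/aiccu", "start")):
    """Run aiccu once and classify the result without parsing any logs."""
    completed = subprocess.run(list(command))
    return next_action(completed.returncode)
```

The point of the sketch is that the decision depends on a single integer rather than on log text, which may change between versions.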

Comment 130 Jeroen Massar 2014-05-06 07:25:31 UTC

> I absolutely cannot ensure that this all will work fine.

Indeed. And there definitely is nothing AICCU can do about this.

Even Network Manager and many other similar tools cannot detect those failures let alone when they are resolved.

Hence why AICCU gives up and lets the user try again later.

> As long as AICCU starts and is able to communicate with the TIC server,
> everything is fine - there is no need to discuss this case.

Afaik nobody was discussing the situation where everything works?

> What is the case which I am concerned about is what happens when AICCU
> is not able to reach the TIC server when it starts. My idea is that AICCU should inform about this case, a special exit status seems appropriate to me (I am trying to avoid parsing logs).

I'll ask again: how can AICCU know the difference between those failures?

If DNS lookups fail it can mean many many things. See your own statement in your reply where you cannot ensure that it all will work fine.

> How can I verify that the TIC server is reachable before I start AICCU (and stays reachable even after AICCU starts)?

You can't.

> Even my configured network interface and reachable DNS server does not
> guarantee this and I am afraid that this approach would not lead to any
> perfect solution

That is what I have been saying all along. Hence why restarts will not resolve the problem.

>  How can I reliably find out that AICCU stopped due to unreachable TIC server,

When it failed it exited and logged a message.

> to be able to properly automatically handle this special situation (perform controlled restarting of AICCU)?

Why this focus on restarting the whole tool?

Did you see comment #79? I thought I was quite clear that that does not solve any of your problems, see also the rest of this discussion and your own notes that failures can happen anywhere in the network.

It does not fix your network, it does not fix your problem.

You are just going to be automatically restarting something which will cause packets to be sent over and over to a variety of destinations that cannot improve your situation.

Comment 131 Leoš Bitto 2014-05-06 08:16:21 UTC

(In reply to Jeroen Massar from comment #130)
> 
> > What is the case which I am concerned about is what happens when AICCU
> > is not able to reach the TIC server when it starts. My idea is that AICCU should inform about this case, a special exit status seems appropriate to me (I am trying to avoid parsing logs).
> 
> I'll ask again: how can AICCU know the difference between those failures?
> 

AICCU knows whether it succeeded opening the TCP socket to the TIC server or not. The main difference is that if anything failed before successful opening of the TCP socket to the TIC server, automatic restarting is possible, because it would not overload the TIC server.
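A minimal sketch of that distinction, assuming the TIC server is `tic.sixxs.net` on TCP port 3874 (both values are assumptions here, as are the names `probe_tic` and the three result strings). It only reports how far the attempt got, not why it failed:

```python
import socket

TIC_HOST = "tic.sixxs.net"  # assumption: the TIC server name in use
TIC_PORT = 3874             # assumption: the TCP port TIC listens on


def probe_tic(host=TIC_HOST, port=TIC_PORT, timeout=10.0,
              resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Report how far a connection attempt got, not why it failed.

    Returns "dns-failed", "connect-failed" or "connected". The resolve and
    connect parameters exist only so the function can be exercised offline.
    """
    try:
        resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failed"        # the TIC server was never contacted
    try:
        sock = connect((host, port), timeout)
    except OSError:
        return "connect-failed"    # no TCP session was established
    sock.close()
    return "connected"             # from here on the TIC server sees the client
```

In the first two outcomes no TCP session to the TIC server exists, which is the case described above; the sketch says nothing about load on other components such as the resolver.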

> 
> > How can I verify that the TIC server is reachable before I start AICCU (and stays reachable even after AICCU starts)?
> 
> You can't.
> 

I have suspected this. So I do not know when to start AICCU, therefore I want to implement proper automatic restarting because I am not happy with the situation when administrator has to solve this manually.

> 
> >  How can I reliably find out that AICCU stopped due to unreachable TIC server,
> 
> When it failed it exited and logged a message.
> 

It would be nice if I would not have to parse log messages, that is not a very stable solution. For automated processing the proper exit status would be much easier to use.

> 
> > to be able to properly automatically handle this special situation (perform controlled restarting of AICCU)?
> 
> Why this focus on restarting the whole tool?
> 
> Did you see comment #79? I thought I was quite clear that that does not
> solve any of your problems, see also the rest of this discussion and your
> own notes that failures can happen anywhere in the network.
> 

Just because failures can happen anywhere in the network it does not mean that we should resort to humans solving them every time - or do we?

> 
> It does not fix your network, it does not fix your problem.
> 

Yes, it does not fix my network. But it _does_ fix my problem, because my problem is that AICCU stays turned off after it stops due to a temporary network outage when starting.

> 
> You are just going to be automatically restarting something which will cause
> packets to be sent over and over to a variety of destinations that cannot
> improve your situation.
> 

It definitely _does_ improve my situation, because it enables AICCU to start working automatically soon after the network connection becomes available, which is much better than administrator having to solve this manually by reading logs and reacting to them.

Comment 132 Jeroen Massar 2014-05-06 08:32:46 UTC

> AICCU knows whether it succeeded opening the TCP socket to the TIC server or not.

But not why that failed.

> The main difference is that if anything failed before successful opening of the TCP socket to the TIC server, automatic restarting is possible, because it would not overload the TIC server.

It would indeed not overload the TIC server, it would just overload another component in the network stack, eg that DNS server that was unable to answer.

(note that the overload would just happen if you keep on hammering a lot of course or with a lot of clients)

> > You can't.
> 
> I have suspected this. So I do not know when to start AICCU, therefore I want
> to implement proper automatic restarting because I am not happy with the
> situation when administrator has to solve this manually.

I understand that that is what you want to solve, but when the network is unreliable or there are other situations which cannot be resolved, there is nothing that AICCU can do to solve that. Keeping on restarting does not solve it. Somebody needs to solve the actual problem at hand.

> It would be nice if I would not have to parse log messages, that is not a very stable solution. For automated processing the proper exit status would be much easier to use.

So what you are actually stating is, that even though there are huge warnings about not doing automatic restarts, you are still going to do them?

Please, go abuse somebody else's network.

> Just because failures can happen anywhere in the network it does not mean
> that we should resort to humans solving them every time - or do we?

Until we have AI that can determine and resolve the problem, that is what will need to happen unfortunately.

What do you do when your unreliable wireless connection goes down or breaks in any way? Do you just keep on hammering on your wireless provider by setting up continuous connections (well, associations) to it even though maybe it is simply because the network is really overloaded or the AP is gone?

>  But it _does_ fix my problem, because my problem is that AICCU stays turned off after it stops due to a temporary network outage when starting.

No, it does not fix your problem, it works around your problem.

> It definitely _does_ improve my situation, because it enables AICCU to start
> working automatically soon after the network connection becomes available,

"soon", thus you really suggest hammering on your local network a lot, flooding it with packets till the situation "automatically resolves itself"?

Ever heard people joke about Windows, that one has to restart it all the time? Indeed, non-Windows people are typically so proud that they are not doing that. Thus why are you suggesting that behaviour?

> which is much better than administrator having to solve this manually by reading logs and reacting to them.

That administrator should already have noticed that your network is unreliable and that he should resolve that instead.

Comment 133 Leoš Bitto 2014-05-06 09:17:58 UTC

(In reply to Jeroen Massar from comment #132)
> > AICCU knows whether it succeeded opening the TCP socket to the TIC server or not.
> 
> But not why that failed.
> 

Even plain information that it failed, without exact reason, is valuable.

> > The main difference is that if anything failed before successful opening of the TCP socket to the TIC server, automatic restarting is possible, because it would not overload the TIC server.
> 
> It would indeed not overload the TIC server, it would just overload another
> component in the network stack, eg that DNS server that was unable to answer.
> 
> (note that the overload would just happen if you keep on hammering a lot of
> course or with a lot of clients)
> 

I am going to make about one restart per minute, that is not going to overload anything, even when multiplied by number of clients in the same part of the network doing the same.
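To put numbers on the retry rate, here is a small arithmetic sketch. The function name `attempts_per_day` and the backoff parameters (doubling from 60 seconds, capped at one hour) are assumptions chosen for illustration; they are not taken from aiccu or from the patches attached to this bug.

```python
def attempts_per_day(first_delay, factor=1.0, max_delay=None, day=86400):
    """Count the start attempts one client makes in a day of continuous failure.

    first_delay is the wait before the first retry in seconds; every later
    wait is multiplied by factor and optionally capped at max_delay.
    """
    count = 0
    elapsed = 0.0
    delay = float(first_delay)
    while True:
        elapsed += delay
        if elapsed > day:
            return count
        count += 1
        delay *= factor
        if max_delay is not None:
            delay = min(delay, max_delay)


fixed = attempts_per_day(60)                                # one restart per minute
backoff = attempts_per_day(60, factor=2.0, max_delay=3600)  # doubling, capped at 1 hour
print(fixed, backoff)  # prints "1440 28"
```

A fixed one-minute interval gives 1440 attempts per client per day, while the capped doubling schedule gives 28.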

> > It would be nice if I would not have to parse log messages, that is not a very stable solution. For automated processing the proper exit status would be much easier to use.
> 
> So what you are actually stating is, that even though there are huge
> warnings about not doing automatic restarts, you are still going to do them?
> 
> Please, go abuse somebody else's network.
> 

I do plan the restarts in a responsible way - avoiding overloading the TIC server, and avoiding overloading the local network elements as well. I would prefer to call it "use" instead of "abuse".

> 
> > Just because failures can happen anywhere in the network it does not mean
> > that we should resort to humans solving them every time - or do we?
> 
> Until we have AI that can determine and resolve the problem, that is what
> will need to happen unfortunately.
> 
> What do you do when your unreliable wireless connection goes down or breaks
> in any way? Do you just keep on hammering on your wireless provider by
> setting up continuous connections (well, associations) to it even though
> maybe it is simply because the network is really overloaded or the AP is gone?
> 

Why would I do any kind of hammering? Slow restarts are harmless, they are about the same as normal use of the network.

> >  But it _does_ fix my problem, because my problem is that AICCU stays turned off after it stops due to a temporary network outage when starting.
> 
> No, it does not fix your problem, it works around your problem.
> 

Which is still good enough because it achieves my goals: shortening IPv6 outages and avoiding the need of any human administrative intervention.

> 
> > It definitely _does_ improve my situation, because it enables AICCU to start
> > working automatically soon after the network connection becomes available,
> 
> "soon", thus you really suggest hammering on your local network a lot,
> flooding it with packets till the situation "automatically resolves itself"?

There are two different words: "soon" and "immediately". I have used "soon" instead of "immediately", so I think that it is not appropriate to react with words like "flooding" or "hammering".

Comment 134 Pavel Šimerda (pavlix) 2014-05-06 09:55:23 UTC

Jeroen, 

In your answer to Leoš you seem to insist that any dynamically handled networking is the wrong anyway. Why should I devote my free time to explaining issues and ideas in that area? You also seem to enjoy insulting anyone who doesn't 100% agree with you. You finally achieved to drive away my interest in the discussion.

As for systemd/NetworkManager issues, links to all the resources are already in this bug report.

Have a nice day,

Pavel

(In reply to Leoš Bitto from comment #133)
> I am going to make about one restart per minute, that is not going to
> overload anything, even when multiplied by number of clients in the same
> part of the network doing the same.
> 
> ...
>
> I do plan the restarts in a responsible way - avoiding to overload the TIC
> server, and avoiding overloading the local network elements as well. I would
> prefer to call it "use" instead of "abuse".

Please note that this will still violate the SixXS usage policy (or whatever it's called). Please avoid that, get native IPv6 or switch to a service that allows that. If your issue happens during the boot up period, feel free to start a bug report with the 'NetworkManager' component and put me on Cc.

Also please don't continue to discuss usage of SixXS infrastructure in violation with their policies in Fedora bugzilla. If you believe the policies should be changed, please contact SixXS directly.

Thank you,

Pavel

Comment 135 Jeroen Massar 2014-05-06 11:27:37 UTC

> Even plain information that it failed, without exact reason, is valuable.

That is exactly what it is already providing.

But as you stated, your user/admin won't read it anyway.

Hence why you want to automate it. But as there is no way to know what the problem is, you will just keep on guessing instead of actually solving the problem at hand.

> I am going to make about one restart per minute, that is not going to
> overload anything, even when multiplied by number of clients in the same
> part of the network doing the same.
[..]
>  Slow restarts are harmless, they are about the same as normal use of the network.

That is exactly what the people who added automatic restarts to AICCU before thought: "Oh, I am smart, I know that this is okay, once per minute does not hurt".

And then we ended up with clients who connected a million times per day and where the person who runs it does not care and will never resolve it. Fortunately one can tarpit IP addresses.

> Which is still good enough because it achieves my goals: shortening IPv6
> outages and avoiding the need of any human administrative intervention.

The only "outage" you will have is when you have restarted the machine and then have no connectivity and then try to start AICCU.

When AICCU is running (thus after it received its details from TIC) it does not exit, even if your addresses change etc.

You only need to start AICCU once, there is no need for restarts.

Pavel wrote:
> In your answer to Leoš you seem to insist that any dynamically handled networking is the wrong anyway.

Your sentence does not make sense, you likely wanted to add some extra words.

Trying to parse it though, the primary thing (see also above) is that there is a mismatch between your idea/perception of "dynamically handled networking" and:
 A> "I have no connectivity when I start a tool that requires a working network"
and:
 B> "I lost connectivity while the application was running"

AICCU handles (or well, the protocols heartbeat&AYIYA handle, AICCU is the implementation) case B>. It does not handle case A>; as I have been trying to explain (but clearly am missing the point somewhere to be able to explain that to people), it cannot do that.

> Why should I devote my free time to explaining issues and ideas in that area?

Nobody is asking you to "devote your free time". You took an interest yourself by commenting in this bug, nobody is forcing or requiring that from you (unless it is a goal by Redhat to do Fedora work, but that is not likely is it?)

> You also seem to enjoy insulting anyone who doesn't 100% agree with you.

Is it possible for you that instead of trying to go the "You are insulting people" route, you could come up with actual technical responses? That would be a great start instead of just pointing at me as the black sheep every single comment you make. I'll for the last time suggest you talk to your HR department about that training. And no, I am not going to take the high route in just ignoring all those comments you are making, even though I should just do that.

> As for systemd/NetworkManager issues, links to all the resources are already in this bug report.

You mean the links that you characterised as hacks that should not be used?
Or do you mean the bugs that are resolved already by just closing them?

> If your issue happens during the boot up period, feel free to start a bug report with the 'NetworkManager' component and put me on Cc.

Isn't that EXACTLY what this bug (and Leos's case) is about?
Did you even read it or were you just too busy with trying to insult me and steer the discussion away?

> Also please don't continue to discuss usage of SixXS infrastructure
> in violation with their policies in Fedora bugzilla
> If you believe the policies should be changed, please contact SixXS directly.

I have not seen a single discussion of anybody in this thread noting that the "policy" to not restart AICCU and overload the TIC server. Everybody seems to agree that that is a standard way of protecting services from abuse.

The only person bringing it up is you. Is that just another way for you to attempt to insult the SixXS service?

I am sorry, but really, if you just want to insult people or the services they provide in their free time, just stop commenting.

We are trying to figure out how to solve a problem that somebody has with a package that Fedora is distributing, which is why it is in this bug tracker.

Comment 136 Pavel Šimerda (pavlix) 2014-05-06 12:08:42 UTC

(In reply to Pavel Šimerda (pavlix) from comment #109)
> I'm going to modify the aiccu service file so that it follows the current
> systemd documentation and also maintains backwards compatibility in case
> it's used with an older systemd.
> 
> diff --git a/aiccu.service b/aiccu.service
> index cef6a7c..0dab022 100644
> --- a/aiccu.service
> +++ b/aiccu.service
> @@ -1,6 +1,7 @@
>  [Unit]
>  Description=AICCU (Automatic IPv6 Connectivity Configuration Utility)
> -After=syslog.target network.target
> +Wants=network.target network-online.target
> +After=network.target network-online.target
>  
>  [Service]
>  Type=forking
> 
> This doesn't currently help any of the users as the issue is in
> NetworkManager service files and/or systemd itself, not aiccu [talking about
> the specific issue of boot up ordering, not other aiccu integration
> problems].
> 
> As we cannot solve the startup in aiccu and upstream is not willing to work
> on the other related issues, I'm closing this bug as WONTFIX. Anyone who is
> still concerned about the issues is welcome to participate in the following:
> 
> 1) NetworkManager service files should be updated to implement
> network-online.target properly.

I just realized that this is already handled in Fedora, just not in upstream. Therefore the change above that has been made to the rawhide aiccu package is a solution itself. Changing the bug status accordingly.

Anyone is welcome to test the rawhide package, I'm soon going to release an update for current releases of Fedora after some more testing. The preferred way to test it is by rebooting, otherwise you at least need to stop the network-online.target first.

> There's an upstream bugzilla ticket for it.
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=728965

I'm working with NetworkManager folks to upstream Fedora's fix.

> Feel free to start a Fedora bug ticket for it if interested, as I'm *not*
> doing that.

This no longer applies as Fedora already included this fix.

> 2) Systemd should be updated to support initscripts properly. This doesn't
> affect Aiccu on Fedora directly but can delay #1. This is in their TODO list.
> 
> http://cgit.freedesktop.org/systemd/systemd/commit/
> ?id=d20850cbf4545715340580c179cf316005d53905

This still applies and anyone using LSB initscripts with systemd is welcome to continue using the workaround below.

> systemctl enable NetworkManager-wait-online.service

Comment 137 Pavel Šimerda (pavlix) 2014-05-06 12:40:51 UTC

For anyone who wants to quickly test it without installing a new version, the contents of aiccu.service follow:

[Unit]
Description=AICCU (Automatic IPv6 Connectivity Configuration Utility)
Wants=network.target network-online.target
After=network.target network-online.target

[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/aiccu
ExecStart=/usr/sbin/aiccu start $OPTIONS
ExecStop=/usr/sbin/aiccu stop
PIDFile=/run/aiccu.pid

[Install]
WantedBy=multi-user.target

Comment 138 Frank Ansari 2014-09-07 15:52:37 UTC

I admit I did not read all of the stuff here because this is really a lot.
Only thing I want to say is that I have problems with aiccu startup since I use it.

The above solution (Comment 137) did not work for me (I am using Fedora 20).

The reason why aiccu will not start correctly is that aiccu depends on the time. So chronyd must run and has to be in a synchronized state before aiccu should be allowed to start.

"Once upon a time" I wrote a ruby script for this. This is only a workaround and may not work 100% reliably but in most cases after restarting the system aiccu is in a working state. So it is at least more than nothing but of course it should be fixed in a professional way with systemd.

Put this script in the "ExecStart" (see Comment 137) and it works at least better than before.
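The general shape of such a wait can be sketched in Python. This is not the attached Ruby script; it is a minimal illustration that assumes `chronyc tracking` is available and prints a "Leap status" line reading "Not synchronised" until the clock is synced. The names `is_synchronised`, `read_tracking` and `wait_for_sync` are made up for this sketch.

```python
import subprocess
import time


def is_synchronised(tracking_output):
    """Return True if `chronyc tracking` output reports a synchronised clock."""
    for line in tracking_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Leap status":
            return value.strip() != "Not synchronised"
    return False  # no "Leap status" line: treat as not synchronised


def read_tracking():
    """Run `chronyc tracking`; return an empty string if it cannot be run."""
    try:
        result = subprocess.run(["chronyc", "tracking"],
                                capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


def wait_for_sync(max_tries=30, interval=2.0, read=read_tracking, sleep=time.sleep):
    """Poll until chrony reports sync or max_tries is used up."""
    for attempt in range(max_tries):
        if is_synchronised(read()):
            return True
        if attempt + 1 < max_tries:
            sleep(interval)
    return False
```

chronyc also has a built-in `waitsync` command that may make a separate script unnecessary.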

Comment 139 Frank Ansari 2014-09-07 15:55:11 UTC

Created attachment 935170 [details]
startup script for aiccu waiting for chrony sync

Use for ExecStart in /usr/lib/systemd/system/aiccu.service

Comment 140 Frank Ansari 2014-09-07 15:57:40 UTC

Comment on attachment 935170 [details]
startup script for aiccu waiting for chrony sync

Don't forget to create a directory /var/log/scripts if not already available.

Comment 141 Pavel Šimerda (pavlix) 2014-09-08 08:05:28 UTC

(In reply to Frank Ansari from comment #138)
> I admit I did not read all of the stuff here because this is really a lot.
> Only thing I want to say is that I have problems with aiccu startup since I
> use it.
> 
> The above solution (Comment 137) did not work for me (I am using Fedora 20).

Updated Fedora 20 handles boot time network connectivity waiting correctly, so this particular bug is over.

> The reason why aiccu will not start correctly is that aiccu depends on the
> time. So chronyd must run and has to be in a synchronized state before aiccu
> should be allowed to start.

Non sequitur. Probably most machines that run Fedora 20 have a hardware clock and therefore don't need to wait for clock synchronization. For machines that don't, it might be reasonable to ask systemd to provide a target to wait for that would mean an attempt to synchronize time has been already made, either successful or unsuccessful (like network-online.target).

> "Once upon a time" I wrote a ruby script for this. This is only a workaround
> and may not work 100% reliable but in most cases after restarting the system
> aiccu is in a working state. So it is at least more than nothing but of
> course it should be fixed in professional way with systemd.

Also AFAIK systemd folks have their own synchronization code.

Comment 142 Michal Schmidt 2014-09-08 10:48:21 UTC

(In reply to Pavel Šimerda (pavlix) from comment #141)
> it might be reasonable to ask systemd to provide a target to wait for
> that would mean an attempt to synchronize time has been already made, either
> successful or unsuccessful (like network-online.target).

Looks like it already exists: time-sync.target
For its semantics see "systemctl help time-sync.target" (aka. "man systemd.special").

Comment 143 Pavel Šimerda (pavlix) 2014-09-08 15:26:25 UTC

(In reply to Michal Schmidt from comment #142)
> (In reply to Pavel Šimerda (pavlix) from comment #141)
> > it might be reasonable to ask systemd to provide a target to wait for
> > that would mean an attempt to synchronize time has been already made, either
> > successful or unsuccessful (like network-online.target).
> 
> Looks like it already exists: time-sync.target
> For its semantics see "systemctl help time-sync.target" (aka. "man
> systemd.special").

Thanks. Pushed the change to rawhide and started a build. Does this target actually work in any Fedora <22?
