diff options
author | Michel Stam <m.stam@fugro.nl> | 2014-10-02 11:39:16 +0000 |
---|---|---|
committer | Steven Barth <steven@midlink.org> | 2014-10-16 21:17:41 +0200 |
commit | fd7d5b43d85b2e3bb2eee98644bca833c6202048 (patch) | |
tree | 7f668ac05bfd1d881e4a147147d4e07059ae2b34 | |
parent | 4210b3247c8afd9981a4534bb3cba9e30959b611 (diff) |
proto-shell: fix for not handling switch from DHCP to static race
When a shell script call is finished, proto_shell_task_finish( ) is
called to monitor processes, and determine the next interface state.
When the interface is brought up after a reconfiguration from dhcp
to static, it will first try to (erroneously?) reconfigure the
interface for DHCP. Upon doing this, it realises the mistake and
kills off the script by setting the state to S_SETUP_ABORT. This is
done by the proto_shell_handler. When this happens. the scripts
have 1 second to finish.
When this happens, S_SETUP_ABORT in proto_shell_task_finish( )
should issue a 'teardown' event to the shell script to deconfigure
the interface. It is here that things go wrong.
Shell scripts do not execute commands themselves, they should
finish as quick as possible. This is very race condition sensitive,
though; Instead of executing commands, they post messages to
execute commands. It is therefore possible that when the script
finishes, there's still commands to execute.
The dhcp protocol handler script, one of the scripts involved,
notifies netifd of changes by (indirectly) calling
proto_shell_update_link( ).
Once every so often, the dhcp script will not be finished in time,
and proto_shell_task_finish( ) cannot immediately continue,
because (in this case) the proto_task is still pending.
If this happens, the proto_shell_task_finish( ) will wait, but if
the proto_shell_update_link( ) notification is then received, it
will set the statemachine to idle, thus breaking the
S_SETUP_ABORT. Furthermore, an event is generated to indicate that
the network interface should be set to UP, rather than DOWN.
This confuses netifd, and the result is a stuck process that does
not respond to UCI calls anymore.
Note that a similar situation happens in the S_TEARDOWN state in
proto_shell_task_finish( ).
The fix, although a bit ugly, is to prevent the UP event from
being sent, and not to reset the state machine to idle in
proto_shell_update_link( ).
Signed-off-by: Michel Stam <m.stam@fugro.nl>
-rw-r--r-- | proto-shell.c | 8 |
1 files changed, 5 insertions, 3 deletions
diff --git a/proto-shell.c b/proto-shell.c index 7c23caa..0131e19 100644 --- a/proto-shell.c +++ b/proto-shell.c @@ -506,9 +506,11 @@ proto_shell_update_link(struct proto_shell_state *state, struct blob_attr *data, interface_update_complete(state->proto.iface); - if (!keep) - state->proto.proto_event(&state->proto, IFPEV_UP); - state->sm = S_IDLE; + if ((state->sm != S_SETUP_ABORT) && (state->sm != S_TEARDOWN)) { + if (!keep) + state->proto.proto_event(&state->proto, IFPEV_UP); + state->sm = S_IDLE; + } return 0; } |