% Commit e841f14a95, Tim Schubert, 2018-09-21 11:45:21 +02:00
\acresetall
\chapter{Conclusion}
% - evaluate effect of transient node resets + hardened implementation
% - create tools for setting up + measuring + analysing experiments
% - \fitlab
% - stable topology
% --> comparable
% - transient node resets are a thing
% - if reset: changes ++
% - less with hardening
% - if no reset: less good --> depends on use-case
% - convergence time
% - restoring state takes time
% - restoring wrong state even more --> validation helps
% - performance: delay, packet loss
% - hardened implementation loses fewer packets when used without restore UID
% - but both worse without single node reset
The topic of this work has been to evaluate the effect of single node resets
and the possibly resulting transient node failures on a \ac{WSN} which uses
\ac{RPL}. The evaluation has been performed using \fitlab. For this, software
has been created to automate the experiments using the existing infrastructure
provided by \fitlab and to analyze the data obtained from the experiments. For
running the hardened implementation in \fitlab, the implementation has been
ported to a newer version of \emph{Contiki} that has device support for the
sensor nodes.
Each experiment has been performed in six different phases, where the first pair
(N and R) uses the default implementation of \ac{RPL} in \emph{Contiki}, the
second pair (H and HR) uses the hardened implementation \cite{mueller2017} and
the third pair (HS and HSR) uses the hardened implementation that also checks
the validity of the restored routing information.
First, the \ac{DAG} that results from each experiment has been compared and
checked for variances. It has been discovered that, using \fitlab, it is
possible to reliably create a tree where a certain node (e.g. the node to be
reset during R, HR and HSR) joins the \ac{DAG} at a certain rank. This means
that comparable results can be obtained for the effect of single node resets.
Next, the instability of the \ac{DAG} has been evaluated by counting the number
of changes that occur during each phase. It has been observed that a single reset
will trigger a large number of changes in the \ac{DAG}, which results in a
higher energy consumption. If such resets occur, both versions of the hardened
implementation reduce the number of changes to the \ac{DAG}, but they increase
the number of changes if no resets occur, in which case the default
implementation performs better.
From observing the network convergence times of \ac{RPL} for each
implementation with and without single node resets, it can be concluded that
restoring the state takes time, during which arriving packets will be lost. The
hardened implementation without the validity checks enabled performs best at
reducing the time during which packets are dropped while the node recovers from
its reset. It has been discovered that the validity checks that involve the
exchange of \ac{DIO} messages take longer than recreating the state without
restoring it from persistent memory.
The final part of the evaluation looked at the message overhead generated by
each implementation, for a network with and without single node resets. Here
it has been observed that, for a scenario without single node resets, the
default implementation creates the lowest overhead, while the hardened
implementations create a large overhead. For a scenario where single node
resets occur, both of the hardened implementations manage to achieve a smaller
message overhead than the default implementation.
In general, it can be concluded that the energy consumption of the hardened
implementation is significantly higher than for the default implementation if no
single node resets occur. If single node resets occur, the hardened
implementation will only have a small advantage over the default implementation
in terms of energy consumption.

\chapter{Evaluation}
\label{chap:evaluation}
This chapter summarizes the results of the experiments. At the center of the
evaluation stands the comparison of the energy consumption of the default
\emph{Contiki}-\ac{RPL} and the version with added persistence of routing
information. Another important point is how the network topology behaves in the
different configurations and how the network reacts to resetting nodes. For this,
the power consumption of the individual nodes and of the complete network, the
stability of the \ac{DAG} and the performance of the network are evaluated.
\section{Firmware Configurations}
\label{sec:confs}
For the firmware, three different modes have been evaluated. The first mode uses
the default \emph{Contiki}-\ac{RPL} (N), the second uses the hardened
implementation with only the storing of persistent state enabled (H), and in the
third mode the sanity of the routing information is additionally checked using
\acp{UID} and the clock values of the neighboring nodes (HS). Each mode is
tested for a network where a reset occurs (R) and for a network where no resets
occur. From this, six different variants of the experiment result, which are
shown in \autoref{tab:variants}.
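For scripting the experiment orchestration, the six variants can be expressed as a small lookup table. The following Python sketch is purely illustrative (the names and flags mirror \autoref{tab:variants}; they are not taken from the actual \fitlab tooling):

```python
# Hypothetical encoding of the six experiment variants; flags mirror the
# "Experiment configurations" table, not any real orchestration tool.
VARIANTS = {
    "N":   {"hardened": False, "sanity": False, "resets": False},
    "R":   {"hardened": False, "sanity": False, "resets": True},
    "H":   {"hardened": True,  "sanity": False, "resets": False},
    "HR":  {"hardened": True,  "sanity": False, "resets": True},
    "HS":  {"hardened": True,  "sanity": True,  "resets": False},
    "HSR": {"hardened": True,  "sanity": True,  "resets": True},
}

def firmware_for(variant):
    """Map a variant to one of the three firmware modes (N, H, HS)."""
    v = VARIANTS[variant]
    if not v["hardened"]:
        return "N"
    return "HS" if v["sanity"] else "H"
```

Each pair of phases (with and without resets) thus shares one of only three firmware images, which keeps the number of builds small.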
\begin{table}[h]
\centering
\caption{Experiment configurations}
\begin{tabular}{r c c c}
\toprule
Test run & Hardened & Sanity & Resets \\
\midrule
N & & & \\
R & & & X \\
H & X & & \\
HR & X & & X \\
HS & X & X & \\
HSR & X & X & X \\
\bottomrule
\end{tabular}
\label{tab:variants}
\end{table}
\section{Influence of Environmental Factors}
When evaluating the data obtained from \fitlab, some environmental factors that
have an influence on wireless transmissions have to be taken into account.
\subsection{Radio Interference}
The radio of the \emph{M3} node uses the 2.4 GHz \ac{ISM} band. One problem
with this is that this shared spectrum suffers from interference with other
users, since it is widely used for other network applications (e.g. \emph{WiFi}).
This means that interference with surrounding devices in the building and with
other experiments in the testbed is a legitimate concern. As long as other devices
use some form of \ac{MAC} that is compatible with the one used in IEEE 802.15.4,
this is less of a problem, since different senders are able to
coordinate their transmissions to some degree. Because the complete networking
stack the \ac{ON} uses can be user-defined, as is the case with all software
running on the \ac{ON}, this may not always be given.
In the case of previous experiments by Müller et al. \cite{mueller2017}, this
problem was addressed by only running the experiments at night, when
other senders are less active. In the case of \fitlab, where experiments
may run unsupervised, this is likely not a valid strategy. Another mitigation that
has been performed is to select channel 16 of the 2.4 GHz spectrum in use for
IEEE 802.15.4 and to capture the traffic that can be seen on this channel. Also,
the \ac{API} allows monitoring whether other experiments are scheduled during an
experiment, so that a more appropriate time can be selected instead.
Since we use almost all nodes that are available for experiments and not down
for maintenance, other experiments are effectively prevented from interfering
with ours. Selecting channel 16, in this case, has also proven to be effective in
limiting interference from other networks.
\subsection{Signal Propagation}
As can be seen in \autoref{fig:testbed}, the test network is located on one
floor of a building with multiple floors. The larger part of the
building on the right is separated from the smaller part of the building on the
left in some places by a wall, and there are a few pillars between some of the
test nodes. All of these present obstacles for the propagation of the signal.
As for the influence they have on the experiment, it can be noted that the
positions of all obstacles remain constant during and between each experiment.
However, the physical topology (e.g. which nodes are neighboring each other)
changes compared to a strictly linear topology as suggested by the map.
\section{Topology of the RPL Network}
The topology of the network does have a large influence on how much the network
is affected by a resetting node, as was previously shown by Kulau et al. \cite{kulau2017energy}.
For the different runs of the experiment to remain comparable, it is necessary
that under the same conditions the network obtains a similar topology.
\subsection{Ideal Network Topologies}
As previously established, the effect resetting nodes have on the network
depends on the position and role of the node inside the network.
In a tree-like topology as depicted in \autoref{fig:treetop}, nodes have at most
one alternative parent and the entire sub-tree underneath the resetting node
will be affected as they lose their default route to the root node.
For a mesh-like topology, such as in \autoref{fig:meshstar}, the network may be
able to quickly recover after the nodes have selected one of their alternative
parents.
While these topologies serve the purpose of illustrating which factors play into
the behavior of a recovering network, it is not easily possible to recreate such
topologies in a real-world scenario, since the topology of the network is limited
by the different configurations \fitlab offers.
\begin{figure}
\centering
\begin{tikzpicture}[<-,>=stealth', level/.style={sibling distance = 5cm/#1, level distance = 1.5cm}, font = \small, every node/.style={circle,draw}]
\node {1}
child { node {2}
child { node {5}
child { node {11} }
child { node {12} }
}
child { node {6}
child { node {12} }
child { node {13} }
}
}
child { node {3}
child { node {7}
}
child { node {8}
child { node {14} }
child { node {15} }
}
child { node {9}
}
}
child { node {4}
child { node {10}
child { node {16} }
child { node {17} }
child { node {18} }
}
child { node {11}
child { node {19} }
}
};
\end{tikzpicture}
\caption{Tree topology}
\label{fig:treetop}
\end{figure}
% TODO build in \latex \tikz
\begin{figure}
\centering
\includegraphics[width=.5\textwidth]{../images/sim_star_new.pdf}
\caption{Star mesh topology \cite{mueller2017}}
\label{fig:meshstar}
\end{figure}
\subsection{Measured Topology}
The topology resulting from each experiment in \fitlab converges upon a very
similar \ac{DAG}. An example of such a \ac{DAG} is shown in
\autoref{fig:dagexample}. One thing that can be observed is that the resulting
tree has up to six layers. It should be noted that most nodes are at an equal
distance to one another, as displayed in \autoref{fig:testbed}, and the links
can be assumed to be of similar quality.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{../images/dag.pdf}
\caption{An example of a DAG generated for the configuration used in the evaluation}
\label{fig:dagexample}
\end{figure}
\subsubsection{Relationship to the Physical Topology}
One common property of the \acp{DAG} is that node 159 is the root of the largest
sub-tree. When comparing the nodes of this sub-tree to their positions on the
map, one noticeable property of this sub-tree is that it even contains nodes
that are physically closer to nodes from other sub-trees. In the depicted
tree, an example would be node 196 joining the sub-tree of 159 instead of
acquiring the directly neighboring node 155 as a parent. When compared to the
shape of the surrounding building, these two nodes are divided by two
outside-facing walls, while the path across 159, 200 and 224 is only obstructed
by a dry-wall, which presents less of an obstacle to the radio signal.
\subsubsection{Selecting a Node to Reset}
Initial evaluations of the resulting topologies give an indication of which node
needs to be reset to have a measurable effect on the network. Node 200 has been
selected to be reset at a single random time during the phases with resets,
R, HR and HSR. The reasons for this are that node 200 has a sub-tree that is
large enough in most runs of the experiment to affect enough nodes, and that it
is the node most frequently selected as a preferred parent when connecting the
two halves of the building.
\subsection{Route Stability}
The stability of the network is determined by how stable the conditions are on
which the routing protocol bases its decisions. For a network where resets
occur, these conditions will change upon the reset of a node and the routing
protocol reacts to this situation. Therefore, the number of changes of the
routing decisions is a measure of the stability of the network created by the
routing protocol.
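The change count described above can be extracted from a time-ordered log of observed default parents. The following Python sketch illustrates one way to do this; the log format is an assumption for illustration, not the format of the actual analysis tooling:

```python
# Illustrative sketch: count default-route changes per phase from a
# time-ordered log of (node_id, default_parent) observations.
from collections import defaultdict

def count_route_changes(observations):
    """observations: iterable of (node_id, parent_id) tuples in time order."""
    last = {}                       # last known default parent per node
    changes = defaultdict(int)      # change counter per node
    for node, parent in observations:
        if node in last and last[node] != parent:
            changes[node] += 1
        last[node] = parent
    return sum(changes.values())

log = [(1, 5), (2, 5), (1, 5), (1, 7), (2, 6), (1, 5)]
count_route_changes(log)  # node 1 changes twice, node 2 once -> 3
```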
\begin{figure}
\centering
\includegraphics[height=.3\textheight]{../images/stability.pdf}
\caption{Number of changes of the default route of any node during a phase}
\label{fig:stability}
\end{figure}
\autoref{fig:stability} shows the number of changes of the default route of any
node during each phase of an experiment. The default implementation in phase N
causes the fewest changes if no single node resets occur. The hardened
implementations of H and HS, on the other hand, lead to more changes. This may be
due to the processing involved in restoring, and then invalidating, a stale
state from persistent memory.
The number of changes for the hardened implementations during a phase with
single node resets is smaller than for the default implementation. This means
that the hardened implementations recover more easily from single node resets
than the default implementation does.
\subsubsection{Relation to Location}
\autoref{fig:hmroutes} shows a heat-map of the distribution of default routes
during each phase. Each cell results from the number of times the route has been
selected at the end of an interval of 10 seconds during the phase of an
experiment. This number was then normalized by the length of the phase
during the experiment, since the length of a phase may vary due to variations in
how fast the testbed reacts to instructions given by the orchestration
component. Thus, routes that are selected more often are shown in a darker hue,
while routes that are rarely selected are shown in a lighter hue.
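The normalization step can be sketched as follows; the data layout is an assumption made for illustration (per-route selection counts keyed by node and parent), not the actual analysis code:

```python
# Sketch of the normalization described above: raw counts of how often each
# (node, parent) route was selected at 10 s sampling points are divided by the
# phase length, so phases of different durations stay comparable.
def normalize_route_counts(counts, phase_length_s):
    """counts: {(node, parent): times_selected}; returns selections per second."""
    return {route: n / phase_length_s for route, n in counts.items()}

raw = {(196, 159): 30, (196, 155): 6}       # invented example counts
normalize_route_counts(raw, 360.0)          # -> per-second rates, ~0.083 / ~0.017
```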
\begin{figure}
\centering
\includegraphics[width=\textwidth]{../images/routes.pdf}
\caption{Heat-map of the normalized choice of default routes for the different
phases}
\label{fig:hmroutes}
\end{figure}
The most noticeable thing about these maps is that the distribution of
routes between the different phases does not vary much, in that the
maps are almost identical. One thing to conclude from this is that the choice
of the \ac{DAG} is not altered in a relevant way by adding the persistence mode
in phase H or by additionally validating \acp{UID} in phase HS. This means that
there is no benefit to enabling the hardened implementation with regard to the
choice of an optimal \ac{DAG}.
Another thing to be noted is that, when viewing each row of the map, some
nodes show a distribution of routes which is very dense for a small number of
different neighbors, while others do not have such routes. When compared to the
topology of the \ac{DAG}, the nodes that offer more stable routes are generally
inner nodes of the tree (i.e. not leaves). Nodes that tend to change
routes more often join the tree as leaf nodes.
The comparison of the phases that have resets with their counterparts yields that
there are more changes between the N and R phases than between the H and HR
and the HS and HSR phases. From this it can be interpreted that more route
changes may have occurred during the reset of the node in phase R. This would
mean that the network is in a more unstable state during R than during HR and HSR.
%\subsubsection{Relation to Rank and Number of Neighbors}
%
%\autoref{fig:rankvsneighvschanges} shows the pairwise relationship of the rank
%of a node, its number of neighbors and the number of changes of its preferred
%parent selection.
%
%% TODO create new figure
%
%%\begin{figure}
%% \centering
%% \includegraphics[width=\textwidth]{../images/changes.pdf}
%% \caption{Rank, default parent changes, number of neighbors}
%% \label{fig:rankvsneighvschanges}
%%\end{figure}
%
%With increased rank, the number of changes of the preferred parent increases.
%This may cause an increase in energy consumption. One possible explanation may
%be that the possible number of nodes that may fail along the path to the sink is
%increased when a node has a higher rank. This is supported by the fact that the
%increase in changes is higher for the phases that include resetting a node. As
%such it can be expected to see a noticeable increase in power consumption for
%these phases (see \autoref{fig:consum-rank.pdf}).
\subsection{Convergence Time}
When considering the resulting routing topology, the time it takes the network
to converge upon one topology is also of interest. The longer a single node in
the network takes to acquire a preferred parent, the more routing messages have
to be transmitted.
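One plausible way to compute this convergence time is to take the instant at which the last node first acquires a preferred parent, relative to the phase start. The sketch below is illustrative; the data structures are assumptions, not the actual evaluation scripts:

```python
# Illustrative sketch: network convergence time as the time until the last
# node has first acquired a preferred parent.
def convergence_time(phase_start, first_parent_acquired):
    """first_parent_acquired: {node_id: timestamp of first preferred parent}."""
    return max(first_parent_acquired.values()) - phase_start

acquired = {1: 2.0, 2: 3.5, 3: 9.1}   # invented timestamps in seconds
convergence_time(0.0, acquired)       # -> 9.1
```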
\autoref{fig:convtime} shows the convergence time of the network for each phase.
The phases without resets are grouped on the left, and the phases with resets
are displayed on the right.
\begin{figure}
\centering
\includegraphics[height=0.3\textheight]{../images/convergence.pdf}
\caption{Network convergence time for each phase}
\label{fig:convtime}
\end{figure}
It is noticeable that both the H and HS phases, which use the hardened
implementation, generally have longer convergence times than the default
implementation of \ac{RPL}. In the case of the restoration of the routing state
from persistent memory, as in phases H and HS, all previously recorded \ac{DIO}
messages are replayed to the \ac{RPL} module of \emph{Contiki}. This implies
that the time it takes to process these messages and apply changes to the saved
\ac{DAG} and routing table adds to the time it takes to choose a preferred
parent and therefore lengthens the network formation time. It can be assumed
that much of this time is spent writing and reading this data from the
persistent memory.
In the case of the HS and HSR phases, additional \ac{DIO} messages are
exchanged to verify the stored routing information and messages before restoring
them. These messages contain \acp{UID} to identify the information and measure
the freshness of the information using the clock the implementation keeps for
the local routing information. Presumably, the time it takes to exchange these
messages further adds to the delay until a suitable preferred parent is selected
for each node.
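The restore decision described above can be summarized as: restore only if a neighbor confirms the recorded \ac{UID} and the stored state is fresh enough according to the routing clock. The following Python sketch is a hedged illustration of that logic; all names and the freshness threshold are assumptions, not the actual \emph{Contiki} implementation:

```python
# Hedged sketch of the validity check: stored routing state is restored only
# if the neighbor's UID matches and the state is sufficiently fresh. The
# threshold value is invented for illustration.
FRESHNESS_LIMIT = 8  # assumed maximum acceptable routing-clock difference

def should_restore(stored_uid, neighbor_uid, stored_clock, neighbor_clock):
    if stored_uid != neighbor_uid:
        return False  # neighbor rebooted or state belongs to a stale DAG
    # state is stale if the neighbor's clock has advanced too far
    return abs(neighbor_clock - stored_clock) <= FRESHNESS_LIMIT

should_restore(0x2A, 0x2A, 100, 104)  # -> True  (UID matches, fresh)
should_restore(0x2A, 0x2B, 100, 104)  # -> False (UID mismatch)
```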
Another remarkable point is how much the convergence time varies for the
H and HS phases compared to the phases without the hardened implementation,
regardless of whether there are resets or not. As a consequence, the network
forms in a more reliable manner with the default implementation.
An interesting observation about the difference in convergence time between the
phases with resets and those without is that the convergence time is generally
shorter for a network with resets, although the inverse would be expected. The
exact reason for this can only be speculated upon. If a reset occurs during the
initial formation of the \ac{DAG} during a phase, the resetting node might not
partake in the formation of the network. This would mean that fewer alternative
paths exist inside the network from which to choose, which might lead to the
network converging faster.
% TODO network convergence time derectly after reset
\section{Energy Consumption}
% TODO error from measuring 2nd phases --> indirect comparison by # messages ...
This section discusses the energy consumption of the test
network and how it changes based on the implementation in use and whether a single
node reset occurs. \autoref{fig:consphases} shows the total energy consumption
of the network during the different phases. Nodes 200 and 157 have been excluded
since they act as the resetting node and the root node.
\begin{figure}
\centering
\includegraphics[height=0.3\textheight]{../images/consumption-phases.pdf}
\caption{Total consumption except nodes 200 and 157}
\label{fig:consphases}
\end{figure}
For a network in which no resets occur, the consumption of the default
implementation (N) is significantly lower than for both versions of the hardened
implementation (H, HS). One possible factor in this may be the energy spent on
writing the persistent state to the flash memory. Another may be the additional
computing time spent on processing the restored state and, in the case of the HS
phase, the exchange of \ac{DIO} messages.
This effect is amortized when recovering from a reset in the phases HR and HSR,
where the default implementation uses more energy than the hardened version (HR).
In some instances, the extended hardened implementation (HSR) uses less power than
the default implementation, but the mean consumption is higher for HSR.
This means that the additional exchange of messages to verify the state stored
in persistent memory consumes more energy on average than the restoring of the
persistent state saves.
\subsubsection{Constant Error Between Consecutive Phases}
The comparison of the phases in which no resets occur (N, H, HS) with the
phases with resets (R, HR, HSR) yields that a smaller energy consumption is
measured for the phases with resets.
For each firmware, the phases with and without resets run consecutively. For
each series of measurements for each individual node, the power consumption of
the second phase is smaller over the complete duration of the phase. This leads
to the conclusion that this behavior is not triggered by the single node reset,
but rather caused by an external factor.
For this reason, the measured values of the energy consumption of the
individual phases are only valid for comparison between phases that either have
resets or do not. For the comparison of the phases that use the same firmware
(e.g. N and R) other variables can be used, such as the number of protocol
messages and the number of changes of the preferred parent.
%TODO measurement sequence
\subsection{Consumption of the DAG Root and Resetting Node}
\autoref{fig:cmpsinkreset} shows the energy consumption of the sink node and the
resetting node. Node 123 is shown for comparison, as it is close to the average
consumption of all other nodes.
\begin{figure}
\centering
\includegraphics[height=0.3\textheight]{../images/consumption-hosts.pdf}
\caption{Energy consumption for the sink node and the resetting node}
\label{fig:cmpsinkreset}
\end{figure}
When viewed separately, the energy consumption of these nodes varies widely from
that of the other nodes and from each other, as seen in \autoref{fig:cmpsinkreset}.
In the case of the sink node, this is because, to minimize packet loss, its radio
is forced to the \ac{RX} state, and it acts as the \ac{UDP} sink and thus has to
process many packets. The resetting node can be expected to consume much less
power while it is resetting, since a restart involves power-cycling the node.
Thus, for the phases with resets (R, HR, HSR), the consumption is lower for node
200, while node 157 consumes significantly more energy than the average node.
Such a node would typically be powered from the power grid.
%\subsection{Relation to Rank}
%
%% TODO
%
%As can be viewed from \autoref{fig:relconsum}, the energy consumption increases
%with the rank for HS and HSR. For the N and H, this relationship is inverted.
%
%\begin{figure}
% \centering
% \includegraphics[height=0.4\textheight]{../images/consumption-regress.pdf}
% \caption{Relation of rank and consumption of a node}
% \label{fig:relconsum}
%\end{figure}
\subsection{Relation to Position inside the Testbed}
\autoref{fig:posenergy} displays the positions of the nodes inside the testbed.
Every cell shows a color representing the relative energy consumption associated with
that node. Lighter colors represent a lower energy consumption, whereas darker
colors represent a higher energy consumption.
\begin{figure}
\centering
\includegraphics[width=0.4\textheight]{../images/consumption-nodes.pdf}
\caption{Energy consumption of nodes arranged by their position}
\label{fig:posenergy}
\end{figure}
One noticeable thing about this distribution is that node 87 has the highest
energy consumption in all phases. When looking at the \ac{DAG}, this node is
mostly located at a lower rank. Nodes that are physically located closer to the
root node 157 tend to have a lower energy consumption than surrounding nodes.
A wall separates the nodes 192, 194 and 196 from 157, which coincides with these
nodes having a higher energy consumption.
\section{Network Performance}
In this section, the network performance and the control overhead for the
different phases is evaluated.
\subsection{End-to-End Performance}
\autoref{fig:perf} shows the average end-to-end delay for all nodes during each
phase of the experiment. In phase H, shorter delays are possible, but at the
same time the distribution varies more. With the added sanity checks in phase
HS, the distribution is more focused around 2 ms. The default implementation in
N lies somewhere in between the two.
%\begin{figure}
% \centering
% \subfloat[delay]{{\includegraphics[width=.5\textwidth]{../images/performance-delay.pdf}}}%
% \qquad
% \subfloat[jitter]{{\includegraphics[width=0.5\textwidth]{../images/performance-jitter.pdf}}}%
% \subfloat[loss]{{\includegraphics[width=0.5\textwidth]{../images/performance-loss.pdf}}}%
% \caption{Delay, jitter, loss for each phase}
% \label{fig:perf}
%\end{figure}
\begin{figure}
\centering
\subfloat[delay]{{\includegraphics[width=.5\textwidth]{../images/performance-delay.pdf}}}%
\subfloat[loss]{{\includegraphics[width=.5\textwidth]{../images/performance-loss.pdf}}}%
  \caption{End-to-end delay and packet loss}
\label{fig:perf}
\end{figure}
The packet loss during each phase is displayed in \autoref{fig:perf}. For a
scenario without single node resets, the default implementation fares best,
while in a scenario with single node resets the hardened version without the use
of \acp{UID} loses the fewest packets. If the sanity checking of the persistent
state is additionally enabled, the most packets are lost.
One possible explanation for this is that if the persistent state is directly
restored, this state is in most cases sufficient for the forwarding of newly
arriving packets. If the node must first validate the saved state, it loses
time during which arriving packets may be dropped. This would suggest that the
validation of the saved state is actually slower than the default method of
recovery.
\subsection{Control Overhead}
The number of messages that need to be emitted during the repair operations of
the \ac{DAG} determines the utilization of the radio of a node. It is to be
expected that a large part of the energy consumption of each node is
determined by the number of messages it emits. Thus, when evaluating the
efficiency of the different implementations and the impact of single node
resets, the overhead of messages transmitted by each implementation
serves as an important measure.
\autoref{fig:overhead} shows the overhead created by control messages during
each phase, by message type. For each type, the default implementation
creates the fewest additional messages, while the number of messages
is the highest for the implementation used in HS. This may be attributable to
the higher number of messages exchanged during the validation process. The
larger overhead created during phase H is likely due to an old state
being restored from previous runs and then invalidated.
\begin{figure}
\centering
\includegraphics[height=0.3\textheight]{../images/performance-overhead.pdf}
\caption{Overhead created by \ac{RPL} messages}
\label{fig:overhead}
\end{figure}
For the phase R, the overhead is higher than for the phase N, where no resets
occur. At the same time, the inverse is true for the phases with the hardened
implementations. Here the effect of an old state being restored and then
invalidated is later canceled out when the state is restored after the node
reset occurred, and actually less overhead is created than for the default
implementation.
It should also be noted that there is no significant difference for the overhead
during the HR and HSR phases, which means that the implementation used in HS and
HSR does not offer a clear benefit over the implementation used in H and HR in
terms of message overhead.
\subsubsection{Consumption}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{../images/performance-consumption.pdf}
\caption{Relationship between overhead and energy consumption}
\label{fig:overconsum}
\end{figure}
As can be seen from \autoref{fig:overconsum}, the number of control messages
correlates with the observed consumption. For a larger overhead, the total
consumption increases proportionally.
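Such a near-linear relationship can be quantified with the Pearson correlation coefficient. The sketch below is illustrative; the per-node overhead and energy values are invented for demonstration and are not measured data:

```python
# Illustrative sketch: Pearson correlation between per-node control-message
# overhead and energy consumption. All data values below are invented.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

overhead = [120, 150, 180, 210, 260]   # messages per node (invented)
energy = [3.1, 3.6, 4.2, 4.8, 5.9]     # joules per node (invented)
pearson(overhead, energy)              # close to 1 for a near-linear relation
```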

% thesis/chapters/fitlab.tex
\chapter{FIT IoT-LAB}
\label{chap:hardware}
\fitlab \cite{adjih2015fit} is part of an open testbed for \acp{WSN} and is
composed of more than 2000 nodes. It is part of a larger federation of \ac{WSN}
testbeds called \emph{OneLab} \cite{baron2015onelab}, which also includes
\emph{CorteXlab}\footnote{\url{http://www.cortexlab.fr/}},
\emph{NITLab6}\footnote{\url{http://nitlab.inf.uth.gr/}}, \emph{FIT
NITOS-Lab}\footnote{\url{http://fit-nitos.fr/}}, \emph{PlanetLab
Europe}\footnote{\url{http://planet-lab.eu/}}, \emph{FUSECO
Playground}\footnote{\url{http://fuseco-playground.org/}} and
\emph{w-iLab.t}\footnote{\url{http://ilabt.iminds.be/wilabt}}. \fitlab provides
a large-scale test network for educational, scientific and industrial purposes.
It can be used to obtain reproducible results, since experiments
can run fully automated and all hardware and software is freely available under
open source licenses.
\section{Sensor Nodes}
\label{sec:architecture}
\usetikzlibrary{shapes,shapes.misc,positioning,circuits.ee.IEC}
\tikzstyle{system}=[shape=rounded rectangle,fill=tubsBlueLight20,text centered,draw]
\tikzstyle{sensor}=[rectangle,fill=tubsOrangeLight20,text centered,draw]
\tikzstyle{processor}=[rectangle,fill=tubsBlueLight20,text centered,draw]
\tikzstyle{connector}=[rectangle,fill=tubsGreenLight20,text centered,draw]
\tikzstyle{memory}=[rectangle,fill=tubsGreenLight20,text centered,draw]
\tikzstyle{radio}=[rectangle,fill=tubsBlue20,text centered,draw]
\tikzstyle{bus}=[<->,color=black,draw]
\tikzstyle{buslabel}=[near end,color=black,font=\tiny,auto]
\tikzstyle{vcc}=[->,color=tubsRed,font=\tiny,draw,text=black]
Each node in the testbed is itself assembled from three individual nodes: the
\ac{ON}, the \ac{GW}, and the \ac{CN} (see also \autoref{fig:fitnode}). The
\ac{ON}, which in this setup is the actual sensor node, is controlled and
programmed by the \ac{CN} via the \emph{Open Node Connector}. In the case of the
\emph{M3} (see \autoref{subsec:m3}) \ac{ON}, the \ac{GW} and \ac{CN} are both
implemented on the same node, called the \emph{Host Node}.
\begin{figure}[h]
\centering
\begin{tikzpicture}[node/.style={circle,draw}]
\begin{scope}[node distance=2cm]
\node (a) {ON};
\node (b) [right=of a] {CN};
\node (c) [right=of b] {GW};
\end{scope}
\begin{scope}[<->,>=stealth',auto]
\path (a) edge[bend left] (b)
(b) edge (c);
\end{scope}
\node[draw,dotted,fit=(b)(c)](group){};
\draw[line width=1pt,black,decorate,decoration={amplitude=7pt,brace,mirror}]
(group.south west) -- (group.south east);
\node[below=of group,anchor=center]{Host Node};
\end{tikzpicture}
\caption{Architecture of a \fitlab node in the testbed}
\label{fig:fitnode}
\end{figure}
\subsection{The Open Node}
The \ac{ON} is based on either the
\emph{WSN430}\footnote{\url{https://www.iot-lab.info/hardware/wsn430/}}, the
\emph{M3}\footnote{\url{https://www.iot-lab.info/hardware/m3/}} or the
\emph{A8}\footnote{\url{https://www.iot-lab.info/hardware/a8/}} microprocessor,
and a different number of nodes of each type is available depending on the test
site. All nodes except the \emph{A8} can run \emph{RIOT}
\cite{baccelli2013riot}, \emph{OpenWSN} \cite{watteyne2012openwsn},
\emph{FreeRTOS}\footnote{\url{http://www.freertos.org/}} and \emph{Contiki};
the \emph{A8} only supports Linux. In addition, the \emph{WSN430}
also has support for \emph{TinyOS} \cite{levis2005tinyos}. Some of the nodes are
mobile and can be configured to move on specified paths through the network.
Their movements can be tracked using \ac{GPS} and are available to the user.
In this evaluation, the \emph{M3} node has been selected for use as the \ac{ON}
(see \autoref{subsubsec:reasonsm3}).
\subsection{The M3 Node}
\label{subsec:m3}
All experiments in this work have been performed using the \emph{M3} open node.
A simplified block diagram of this node can be seen in
\autoref{fig:m3node}\footnote{\url{https://github.com/iot-lab/iot-lab/wiki/Hardware_M3-node}}.
None of the external peripherals are used in the experiment and can therefore be
disabled to conserve energy.
\begin{figure}
\centering
\begin{tikzpicture}
\node[processor] (cpu)[minimum width=50,minimum height=50] {M3}; %STM32F103RKEY
\node[memory] (flash) [below of=cpu,below=0.5] {Flash}; %N25Q128A
\draw[bus,transform canvas={xshift=-10}] (flash) edge node[buslabel] {GPIO} (cpu);
\draw[bus,transform canvas={xshift=10}] (flash) edge node[buslabel] {SPI} (cpu);
\node[radio] (wireless) [right of=cpu,above of=cpu,right=0.4] {Radio}; %AT86RF231
\path[bus] (wireless.240) |- node[buslabel,above] {GPIO} (cpu.10);
\path[bus] (wireless) |- node[buslabel] {SPI} (cpu);
\node[system] (usbb) [above of=cpu,above=1] {USB Bridge};
\path[bus] (usbb) edge node[buslabel] {JTAG} (cpu);
\path[bus,transform canvas={xshift= 20}] (usbb) edge node[buslabel] {UART} (cpu);
\node[connector] (oc) [above of=usbb,above=1,minimum width=100] {Open Node Connector};
\path[bus] (usbb) edge node[buslabel]{USB} (oc) ;
\path[bus] (oc.190) |- node[buslabel]{GPIO} (cpu.160) ;
\node[connector] (usb) [right of=oc,below of=oc,right=0.5] {USB};
\path[bus] (oc) |- (usb);
\node[processor] (pm) [below of=usb] {Power};
\node[sensor] (light) [minimum width=80,below left of=oc,left=1.5] {Light Sensor}; %ISL29020
\node[sensor] (pressure)[minimum width=80,below of=light] {Pressure Sensor}; %LPS331AP
\node[sensor] (gyro) [minimum width=80,below of=pressure] {Gyroscope}; %L3G4200D
\node[sensor] (magneto) [minimum width=80,below of=gyro] {Magnetometer}; %LSM303DLHC
\path[bus] (oc.188) |- (light);
\path[bus] (oc.188) |- (gyro);
\path[bus] (oc.188) |- (magneto);
\path[bus] (oc.188) |- (pressure);
\path[bus] (oc.188) |- node[buslabel,below] {I2C} (cpu.190);
\path[bus] (gyro) -| node[buslabel,below] {GPIO} (cpu.120);
\path[bus] (magneto.10)-| (cpu.120);
\path[vcc] (pm.210) |- node[below] {+3.3 V} (usbb);
\path[vcc,orange,text=black] (pm) -- node[auto] {+3.3 V mon.} (wireless);
\path[vcc] (usb) -- node[left] {+5} (pm);
\path[vcc] (oc.340) |- node[near start,auto] {+3.3} (pm.165);
\path[vcc,orange,text=black] (oc.330) |- node[near start,left] {+5} (pm.175);
\path[vcc,orange,text=black] (pm) |- (1,1.8) -| (cpu.70);
\path[vcc,orange,text=black] (pm) |- (1,1.8) -- (0.5,1.8);
\node[rounded rectangle,right of=pm,right=0.1] (gnd) {GND};
\path[vcc,<->] (pm) -- (gnd);
\end{tikzpicture}
\caption{Simplified block diagram of the M3 node}
\label{fig:m3node}
\end{figure}
% better than wsn430: more diversity, closer to real world deployments in modern iot
\subsubsection{CPU}
The \emph{M3} node features a 32-bit ARM Cortex-M3 based CPU (STM32F103REY)
\cite{stm32f103re} with a maximum clock frequency of 72 MHz. The external clock
runs the \ac{CPU} at this maximum clock frequency, which is to be considered
when evaluating the energy consumption. With all peripherals enabled, this
results in a maximum of 70 mA at 3.3 V \ac{VCC}.
The \ac{CPU} has support for low power modes. When in the \emph{Sleep} mode, the
\emph{CPU} is stopped and all peripherals continue working, in the \emph{Stop}
mode all clocks are stopped and only the contents of \ac{SRAM} and \ac{CPU}
registers are preserved. When in the \emph{Standby} mode the contents of the
registers and \emph{SRAM} are not preserved and the \emph{CPU} can only be woken
up by alarms from the \ac{RTC} or an external reset.
The \ac{JTAG} and \ac{UART} are bridged over \ac{USB} and exposed through the
\emph{Open Node Connector}.
\subsubsection{Flash Memory}
The external \emph{N25Q128} \cite{N25Q128} NOR flash is connected to the
\ac{SPI} of the \ac{CPU}, has a storage capacity of 128 Mbit (16 MByte) and
supports a maximum clock frequency of 100 MHz. The \ac{CPU} drives the \ac{SPI}
at a maximum of 18 MHz, which results in a maximum read / write speed of
18 Mb/s. The current consumption for page-sized read / write access is 20 mA.
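The figures above allow a rough estimate of the cost of persisting routing
state to the flash. The following sketch combines the 18 Mb/s bus speed and the
20 mA current draw quoted above; the state size is a hypothetical example, and
page-program latency is ignored, so the result is a lower bound:

```python
def flash_write_cost(state_bytes, bus_mbit_s=18, current_a=0.020, vcc=3.3):
    """Estimate transfer time (s) and energy (J) for writing state
    over the SPI bus, using the bus speed and current draw given above."""
    seconds = (state_bytes * 8) / (bus_mbit_s * 1e6)
    joules = current_a * vcc * seconds
    return seconds, joules

# One 256-byte page of hypothetical RPL state.
t, e = flash_write_cost(256)
```

Even for small state sizes the dominant cost is thus the current drawn while
the flash is active, not the transfer time itself.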
\subsubsection{Wireless Radio Transceiver}
The \emph{M3} has an ATMEL AT86RF231 2.4 GHz radio \cite{AT86RF231}, which is
connected through \ac{SPI} and \ac{GPIO} and is supplied with 3.3 V. It consumes
up to 14 mA when transmitting with the maximum \ac{TX} gain of 3 dBm. For this
work, the \ac{TX} gain has been set to -7 dBm and the \ac{RX} threshold has been
set to -60 dBm (see \autoref{subsec:rssi}). With these settings, the transceiver
needs less than 11 mA when transmitting and 10 mA while receiving. The
\emph{6LoWPAN} network stack \cite{rfc4944} of \emph{Contiki} uses the
IEEE 802.15.4 support of the transceiver.
\subsubsection{Rationale for Choosing the M3 Node}
\label{subsubsec:reasonsm3}
Instead of the \emph{WSN430}, the \emph{M3} node has been selected for the
evaluation. While the \emph{WSN430} has the advantage of having almost the same
processor (\emph{MSP430}) as the \emph{Z1} node, driver support for the
\emph{WSN430} peripherals, such as the NOR flash, is based on an outdated
version of \emph{Contiki}. Moreover, only a very limited number of nodes of this
type was available at the time of writing, and not in the linear topology
desirable for the evaluation (see \autoref{sec:ptopo}).
While the drivers for the \emph{M3} do not support energy estimations using
\emph{Energest} \cite{dunkels2011powertrace}, in the case of the \emph{M3} node
the \emph{Host Node} can record the voltage, current and power consumption of
the \ac{ON} in real time. In doing so, even more accurate and realistic
measurements of the energy consumption are available compared to the profiling
done using \emph{Powertrace}. The \emph{WSN430}, by contrast, is not controlled
by a \emph{Host Node}, so not all variables that would be desirable for the
evaluation, such as \ac{PCAP} files, can be recorded with it.
Another reason for selecting the \emph{M3} node instead of the \emph{WSN430} was
that its more powerful processor, as will likely be found in most modern
\ac{IoT} applications, better models the energy consumption of these use-cases.
\subsubsection{Considerations for Power Consumption}
One thing to keep in mind when measuring the power consumption at the \ac{ON} is
that only the consumption of the components that can be controlled by the
firmware is monitored. For this, the power management unit monitors the voltage
and current of a separate output besides the one supplying the \ac{USB} bridge.
The power management unit can be supplied either through the Open Node Connector
(+3.3 V) or directly via \ac{USB} (+5 V).
The core components on which the experiments in this work have an effect in
terms of power consumption are the flash memory (because of the persistence
layer for \ac{RPL}), the \ac{CPU} (because restoring the state takes a certain
number of cycles to complete) and the radio (as additional or fewer routing
messages may be sent in the individual modes of the hardened implementation).
The peripheral components, such as the different sensors, are not of interest
for the evaluation, since their consumption does not differ depending on whether
resets occur or the hardened implementation is being used.
\begin{table}
\centering
\caption{Consumption estimates for monitored components}
\begin{tabular}{lr}
\toprule
Component & Current Consumption (3.3 V) \\
\midrule
CPU & 70 mA \\
Radio & SLEEP 20 $\mu$A | OFF 0.4 mA | RX 10.3 mA | TX 10 mA\\
Flash & 20 mA \\
\bottomrule
\end{tabular}
\label{tab:consum}
\end{table}
\autoref{tab:consum} shows the different consumption estimates as they were
obtained from the data sheets of the components. It should be noted that the
consumption of the \ac{CPU} and the wireless radio could be further reduced by
lowering the clock rate the \ac{CPU} is initialized with and by reducing the
transmission power of the radio. Both can be configured in software. For the
evaluation, it was sufficient to keep these values, since the goal is a relative
comparison between networks with and without resets.
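For such a relative comparison, a back-of-the-envelope model that combines the
current figures from \autoref{tab:consum} with per-component duty cycles is
already instructive. A minimal sketch; the duty-cycle values are hypothetical
placeholders for illustration, not measurements from the testbed:

```python
# Current draw in amperes at 3.3 V, taken from the consumption table.
CURRENT_A = {"cpu": 0.070, "radio_rx": 0.0103, "radio_tx": 0.010,
             "flash": 0.020}
VCC = 3.3  # supply voltage in volts

def avg_power_w(duty_cycles):
    """Average power in watts, given the fraction of time each
    component is active (components not listed are assumed off)."""
    return sum(CURRENT_A[c] * VCC * d for c, d in duty_cycles.items())

# Hypothetical duty cycles for illustration only.
p = avg_power_w({"cpu": 0.05, "radio_rx": 0.01,
                 "radio_tx": 0.005, "flash": 0.001})
```

A hardened implementation that writes state to flash trades a small increase in
the flash term against a reduction in the radio terms after a reset.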
\subsection{Host Node}
For the \emph{M3} and \emph{A8} nodes, the \ac{HN} serves as both the \ac{GW}
and the \ac{CN}. Each \ac{HN} is directly connected to one or more \ac{ON} using
the \emph{Open Node Connector} and serves the purpose of controlling and
recording the experiment on the \ac{ON}. A simplified block diagram of the
\emph{Host Node} is shown in \autoref{fig:hnblock}.
The \ac{CN} is used for starting and stopping the \ac{ON}, flashing the
firmware, providing a remote \ac{JTAG} debugger via the \ac{USB} bridge on the
\ac{ON}, and powering the \ac{ON} from battery or from an external power supply.
It can also monitor the power consumption and record the \ac{RSSI} at the
\ac{ON}. As an alternative to monitoring the \ac{RSSI}, the \ac{CN} can record
\acp{PCAP} of the network at the \ac{ON}. For this it has its own radio, similar
to the one on the \emph{M3} \ac{ON}. It can also be used for recording sensor
data from the peripherals of the \ac{ON} using \ac{I$^2$C}.
The \ac{GW} part of the \ac{HN} features a more powerful \emph{A8} application
processor and is connected to the site server using a wired \emph{Ethernet}
connection. It runs \emph{Linux} and provides the \ac{HN} with network access
for applications such as running a remote debugger, flashing the firmware and
starting and stopping the node. It also stores the measured data from the
\ac{CN}, which is then periodically fetched by the site server.
Both the \ac{CT} and the \ac{AM} of the power management unit are configurable
in the profile of the experiment, so that a range of different measurement
periods (\ac{PM}) can be used for sampling the consumption:
\begin{equation}
PM = CT \times AM \times 2
\end{equation}
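Under this relation, the sampling period grows linearly with both settings. A
small helper illustrating it; the example values are hypothetical, not the
complete option lists of the power management unit, and the factor 2 is read
here as one conversion each for voltage and current per sample:

```python
def measurement_period_s(ct_s, am):
    """PM = CT * AM * 2: conversion time times averaging count,
    doubled because voltage and current are each converted once."""
    return ct_s * am * 2

# e.g. a 140 microsecond conversion time averaged over 16 conversions
pm = measurement_period_s(140e-6, 16)
```

A longer conversion time or higher averaging count thus smooths the samples at
the cost of temporal resolution of the consumption trace.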
\begin{figure}
\pgfdeclarelayer{background}
\pgfdeclarelayer{foreground}
\pgfsetlayers{background,main,foreground}
\centering
\begin{tikzpicture}
% CN part
\node[processor] (cpu)[minimum width=50,minimum height=50] {M3};
\node[radio] (wireless) [right of=cpu,right=0.5] {Radio};
\node[system] (bridge) [below of=cpu,below=0.5] {USB Bridge};
\path[bus] (bridge.120) -- node [buslabel,left] {JTAG} (cpu.260);
\path[bus] (bridge.60) -- node [buslabel,right] {SPI} (cpu.280);
\path[bus] (wireless.north west) -- node [buslabel,above] {SPI} (cpu.19);
\path[bus] (wireless.south west) -- node [buslabel,below] {GPIO} (cpu.-19);
\begin{pgfonlayer}{background}
\node [draw,fit=(cpu)(wireless)(bridge),label=above:\tiny{Control Node},fill=tubsGray20] (cn) {};
\end{pgfonlayer}
\node[processor] (a8) [minimum width=100,minimum height=100,left of=cn,left=4,above of=cpu] (a8) {A8};
\node[system] (eth) [below of=a8,below=1] {ETH SW};
\node[connector] (eth0) [below of=eth,right of=eth,below=0.001] {ETH};
\node[connector] (eth1) [below of=eth,below=0.001] {ETH};
\node[connector] (eth2) [below of=eth,left of=eth,below=0.001] {ETH};
\path[bus] (eth) -- node[buslabel,above] {ETH} (a8);
\path[bus] (eth1) -- (eth);
\path[bus] (eth0) |- (eth);
\path[bus] (eth2) |- (eth);
\node[system] (uhub) [above of=a8,above=2,left of=a8] {USB Hub};
\path[bus] (uhub) -- node[buslabel,below] {USB} (a8.120);
\node[connector] (usb0) [above of=uhub] {USB};
\node[connector] (usb1) [above of=uhub,left of=uhub] {USB};
\node[connector] (usb2) [above of=uhub,right of=uhub] {USB};
\path[bus] (usb2) |- (uhub);
\path[bus] (usb0) -- (uhub);
\path[bus] (usb1) |- (uhub);
\node[system] (ubridge2) [right of=uhub,right=2,above of=a8,above=1] {USB Bridge};
\path[bus] (ubridge2) -- node[buslabel,below] {UART} (a8.60);
\node[connector] (onc) [minimum width=200,right of=usb2,right=0.5] {Open Node Connector};
\node[connector] (pwr) [right of=onc,right=3] {Power};
\node[processor] (pwrmgnt) [below of=pwr,below=1] {Power Mgnt};
\node[processor] (cmsr) [below of=pwrmgnt,left of=pwrmgnt,left=2] {Current Msr};
\path[bus] (ubridge2) |- node[buslabel] {USB} (onc.south west);
\path[bus] (eth.north east) -| node[buslabel] {ETH} (onc.187);
\path[bus] (bridge.north west) -| node[buslabel] {UART} (onc.190);
\path[bus] (cpu.160) -| node[buslabel,left] {GPIO} (onc.192);
\path[bus] (cpu.north west) -| node[buslabel,right] {I2C} (onc.194);
\path[vcc,<-] (usb2.south east) |- (3.55,4) -- (pwrmgnt.north west);
\node[label=above left:\tiny{5V}] at (3.55,4) {};
\path[vcc] (3.55,4) -| (cmsr.100);
\path[vcc] (pwrmgnt.north west) -- (onc.south east);
\path[vcc] (pwr) -- node[buslabel,right] {+3.3V to 5V} (pwrmgnt);
\path[vcc,<->] (cmsr.north east) |- (pwrmgnt);
\path[vcc] (pwrmgnt) |- node[buslabel] {+48V} (4.5,-2.4) -| (eth0.east);
\path[vcc,orange] (pwrmgnt.160) |- (onc.east);
\path[vcc,orange] (pwrmgnt.160) |- node[buslabel,below] {+3.3V} (1,3.5) -- (cmsr.28);
\node[draw,rounded rectangle] (gnd) at (2.8,2) {GND};
\path[vcc] (gnd) |- (pwrmgnt);
\end{tikzpicture}
\caption{Simplified block diagram of the Host Node}
\label{fig:hnblock}
\end{figure}
\section{Physical Topology of the Testbed}
\label{sec:ptopo}
Previous studies have drawn attention to the influence that the physical
topology of a network has on the routing topology. In some cases, this influence
can be used to affect the routing decisions of \ac{RPL} and to reliably create
certain \acp{DAG}. Nevertheless, with such a large test network of linearly
distributed nodes, we can study the full range of possible outcomes resulting
from \ac{RPL} being applied to it.
For the evaluation, a large network of 44 nodes that are approximately linearly
distributed inside the test lab has been selected. A map of the test network at
\emph{Inria}\footnote{\url{https://www.iot-lab.info/deployment/lille/}} in
France is shown in \autoref{fig:testbed}. The highlighted nodes are the ones
selected for our experiment; the grey nodes are mounted at the ceiling on a
1.2 m $\times$ 1.2 m grid at a height of 2.5 m, the yellow nodes are attached to
vertical poles at heights of 2.4 m, 1.5 m and 0.6 m, and the green square
represents the position of the server cabinet. Additionally, the floor plan is
shown as an outline. Node \emph{157} has been statically selected as the root of
the \ac{DAG}.
\tikzstyle{cnode}=[circle,fill=tubsLightOrange100,text centered,font=\tiny,fill
opacity=0.5,draw opacity=0.5,text opacity=1.0]
\tikzstyle{snode}=[circle,fill=tubsGray20,text centered,font=\tiny,fill
opacity=0.2,draw opacity=0.2,text opacity=1.0]
\tikzstyle{pnode}=[circle split,draw,text centered,fill=tubsLightOrange20,draw opacity=0.1,text opacity=1.0,fill opacity=0.1,font=\tiny]
\newcommand{\fitnode}[3]{%
\ifthenelse
{#3 = 47 \OR #3 = 49 \OR #3 = 51 \OR #3 = 53 \OR #3 = 57 \OR #3 = 59 \OR #3 = 83 \OR #3 = 85 \OR #3 = 87 \OR #3 = 89 \OR #3 = 91 \OR #3 = 93 \OR #3 = 95 \OR #3 = 123 \OR #3 = 127 \OR #3 = 131 \OR #3 = 133 \OR #3 = 151 \OR #3 = 153 \OR #3 = 155 \OR #3 = 157 \OR #3 = 159 \OR #3 = 161 \OR #3 = 192 \OR #3 = 194 \OR #3 = 196 \OR #3 = 198 \OR #3 = 200 \OR #3 = 202 \OR #3 = 204 \OR #3 = 218 \OR #3 = 220 \OR #3 = 222 \OR #3 = 224 \OR #3 = 226 \OR #3 = 228 \OR #3 = 230 \OR #3 = 244 \OR #3 = 246 \OR #3 = 248 \OR #3 = 250 \OR #3 = 252 \OR #3 = 254 \OR #3 = 256}%\isin{47}{ \OR #3 = 47 \OR #3 = 49 \OR #3 = 51 \OR #3 = 53 \OR #3 = 57 \OR #3 = 59 \OR #3 = 83 \OR #3 = 85 \OR #3 = 87 \OR #3 = 89 \OR #3 = 91 \OR #3 = 93 \OR #3 = 95 \OR #3 = 123 \OR #3 = 127 \OR #3 = 131 \OR #3 = 133 \OR #3 = 151 \OR #3 = 153 \OR #3 = 155 \OR #3 = 157 \OR #3 = 159 \OR #3 = 161 \OR #3 = 192 \OR #3 = 194 \OR #3 = 196 \OR #3 = 198 \OR #3 = 200 \OR #3 = 202 \OR #3 = 204 \OR #3 = 218 \OR #3 = 220 \OR #3 = 222 \OR #3 = 224 \OR #3 = 226 \OR #3 = 228 \OR #3 = 230 \OR #3 = 244 \OR #3 = 246 \OR #3 = 248 \OR #3 = 250 \OR #3 = 252 \OR #3 = 254 \OR #3 = 256}}
{\node at (#1,#2) [cnode] {#3};}
{\node at (#1,#2) [snode] {#3};}
}
\begin{figure}
\centering
\usetikzlibrary{shapes}
\begin{tikzpicture}
\draw [black!20] (-0.3,13) -- (7.5,13) -- (7.5,4.8) -- (8.7,4.8) -- (8.7,0.8) -- (4.2,0.8) -- (4.2,-0.6) -- (-0.3,-0.6) -- cycle;
\draw [tubsBlack!20] (8.7,13) -- (13.2,13) -- (13.2,-0.6) -- (8.7,-0.6) -- cycle;
\fill [tubsLightGreen] (4.7,12.4) rectangle (5.3,12.9);
\fill [tubsBlack!20] (-0.3,9.2) rectangle (-0.2,11.9);
\fill [tubsBlack!20] (-0.3,0.2) rectangle (-0.2,2.6);
\fill [tubsBlack!20] (0,13) rectangle (2.2,13.1);
\draw [tubsBlack!20] (-0.4,3.95) rectangle (-0.3,4.1);
\draw [tubsBlack!20] (-0.4,7.95) rectangle (-0.3,8.10);
\draw [tubsBlack!20] (4.05,3.75) rectangle (4.2,3.9);
\draw [tubsBlack!20] (4.05,8.05) rectangle (4.2,8.2);
\draw [tubsBlack!20] (7.40,8.05) rectangle (7.55,8.2);
\draw [tubsBlack!20] (8.65,3.75) rectangle (8.80,3.9);
\foreach \x in {0,1,...,4} {
\foreach \y in {0,1,...,12} {
\pgfmathparse{int(29+\x*18+(12-\y))}
\edef\p{\pgfmathresult}
\fitnode{\x}{\y}{\p}
}
}
\foreach \x in {0,1,...,2} {
\foreach \y in {1,2,...,12} {
\pgfmathparse{int(123+\x*14+(12-\y))}
\edef\p{\pgfmathresult}
\fitnode{\x+5}{\y}{\p}
}
}
\foreach \y in {1,2,...,7} {
\pgfmathparse{int(190-\y)}
\edef\p{\pgfmathresult}
\fitnode{8}{\y}{\p}
}
\foreach \y in {9,10,...,12} {
\pgfmathparse{int(191-\y)}
\edef\p{\pgfmathresult}
\fitnode{8}{\y}{\p}
}
\foreach \x in {0,1,...,4} {
\foreach \y in {0,1,...,12} {
\pgfmathparse{int(256-(4-\x)*13-\y)}
\edef\p{\pgfmathresult}
\fitnode{\x+9}{\y}{\p}
}
}
\foreach \y in {0,2,...,24} {
\pgfmathparse{int(25-\y)}
\edef\p{\pgfmathresult}
\pgfmathparse{int(\p+1)}
\edef\q{\pgfmathresult}
\node at (-0.3,\y/2) [pnode] {\p \nodepart{lower} \q};
}
\node at (0,13) [pnode] {27 \nodepart{lower} 28};
\node at (1,13) [pnode] {45 \nodepart{lower} 46};
\node at (2,13) [pnode] {63 \nodepart{lower} 64};
\node at (3,13) [pnode] {81 \nodepart{lower} 82};
\node at (4,13) [pnode] {99 \nodepart{lower} 100};
\node at (0,-1) [pnode] {42,43 \nodepart{lower} 44};
\node at (1,-1) [pnode] {60,61 \nodepart{lower} 62};
\node at (2,-1) [pnode] {78,79 \nodepart{lower} 80};
\node at (3,-1) [pnode] {96,97 \nodepart{lower} 98};
\node at (4,-1) [pnode] {114,115 \nodepart{lower} 116};
\node at (4.3,0.8) [pnode] {\nodepart{lower} 121,122};
\node at (5,0.8) [pnode] {\nodepart{lower} 135,136};
\node at (6,0.8) [pnode] {\nodepart{lower} 149,150};
\node at (7,0.8) [pnode] {\nodepart{lower} 163,164};
\node at (8,0.8) [pnode] {\nodepart{lower} 190,191};
\foreach \y in {0,2,4,6,8} {
\pgfmathparse{int(177-\y)}
\edef\p{\pgfmathresult}
\pgfmathparse{int(\p+1)}
\edef\q{\pgfmathresult}
\node at (7.5,\y/2+5) [pnode] {\p \nodepart{lower} \q};
}
\foreach \y in {12,14} {
\pgfmathparse{int(179-\y)}
\edef\p{\pgfmathresult}
\pgfmathparse{int(\p+1)}
\edef\q{\pgfmathresult}
\node at (7.5,\y/2+5) [pnode] {\p \nodepart{lower} \q};
}
\node at (4.2,3.9) [pnode] {\nodepart{lower} 119,120};
\node at (4.2,8.1) [pnode] {117,118};
\end{tikzpicture}
\caption{Positions of nodes inside the Lille testbed}
\label{fig:testbed}
\end{figure}
\subsection{Radio Distances and RSSI}
\label{subsec:rssi}
Another important consideration when selecting the number of nodes has been the
distance between the nodes. During preliminary testing of the network formation,
it has been observed that the network quickly broke down after a larger number
of nodes had been started. By default, the \ac{RX} threshold and the \ac{TX}
power of each node's radio transceiver are configured to values at which all
nodes are within radio distance of each other. When the network starts up, each
node attempts to announce its presence to its neighbors using \ac{ICMPv6}
messages. With the selected number of nodes this causes a sufficient number of
collisions for the back-off procedure of \emph{ContikiMAC} to fail, so that no
messages can be transmitted successfully.
Several remedies to such a problem are conceivable, including increasing the
distance between the nodes, reducing the number of nodes, or reducing the
\ac{TX} power and increasing the \ac{RX} threshold. Since the positions of the
nodes are fixed and the area of the test lab is limited, increasing the distance
between the nodes is not an option. Reducing the number of nodes would decrease
the size of the resulting routing topology, which is not desirable either. In
this case, it has been sufficient to follow the recommended
method\footnote{\url{https://github.com/iot-lab/iot-lab/wiki/Limit-nodes-connectivity}}
for limiting node connectivity and to reduce the transmission gain from 0 dBm
to -3 dBm and increase the sense threshold from -101 dBm to -60 dBm.
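The effect of lowering the \ac{TX} gain and raising the \ac{RX} threshold can be
illustrated with a simple log-distance path-loss model. The model, its 1 m
reference loss, and its exponent are illustrative assumptions, not measurements
from the testbed:

```python
import math

def rx_power_dbm(tx_dbm, distance_m, pl0_db=40.0, exponent=3.0):
    """Received power under a log-distance path-loss model.
    pl0_db is the assumed loss at 1 m; exponent models indoor fading."""
    return tx_dbm - (pl0_db + 10 * exponent * math.log10(distance_m))

def in_range(tx_dbm, rx_threshold_dbm, distance_m):
    """A link exists if the received power clears the sense threshold."""
    return rx_power_dbm(tx_dbm, distance_m) >= rx_threshold_dbm

# With -3 dBm TX gain and a -60 dBm threshold, nearby nodes stay
# reachable while distant ones fall below the raised threshold.
near = in_range(-3, -60, 2.0)
far = in_range(-3, -60, 20.0)
```

Raising the threshold therefore shrinks each node's neighborhood, which is
exactly the mechanism used here to obtain a multi-hop topology.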
\subsection{Maintenance Status of Nodes}
The status of all nodes inside the test lab can be viewed using the \ac{API} of
\fitlab. When observing the set of available nodes, a significant number of
nodes, about half of them, was not available for experimentation and was marked
as \emph{down}.
\section{Accessing the Testbed}
\fitlab offers programmable interfaces for setting up and controlling
experiments and for obtaining the results. These interfaces are accessible over
the Internet through the \ac{REST} \ac{API}. Measurement results can be obtained
through \ac{SSH} from a server that is located at each site. The different
interfaces are depicted in \autoref{fig:access}.
\begin{figure}
\centering
\usetikzlibrary{fit}
\begin{tikzpicture}
\node (U) at (0,0) [shape=circle,draw,fill=tubsLightGreen20] {User};
\node (A) at (3,0) [fill=tubsLightOrange20,shape=rounded rectangle,draw] {API};
\node (S) at (6,0) [fill=tubsLightOrange20,shape=rectangle,draw] {Site Server};
\path (U) edge [arrows=<->,auto,swap] node {REST} (A);
\path (U) edge [arrows=<->,bend left] node [auto,swap] {SSH} (S);
\path (A) edge [arrows=<->] (S);
\node (n1) at (8.5,-1) [fill=tubsLightBlue20,shape=ellipse,draw] {HN};
\node (n2) at (8.5,0) [fill=tubsLightBlue20,shape=ellipse,draw] {HN};
\node (n3) at (8.5,1) [fill=tubsLightBlue20,shape=ellipse,draw] {HN};
\foreach \n in {n1,n2,n3} \path (\n) edge [arrows=<->,auto,swap] (S);
\path (A) edge [arrows=<->] (S);
\begin{pgfonlayer}{background}
\node [label=above:Intranet,fill opacity=0.2,shape=rounded rectangle,fill=tubsLightViolet80,fit=(n1)(n2)(n3)(S)] {};
\node [label=below left:Internet,fill opacity=0.2,shape=rounded rectangle,fill=tubsLightBlue80,fit=(U)(A)(S)] {};
\end{pgfonlayer}
\end{tikzpicture}
\caption{Accessing the testbed}
\label{fig:access}
\end{figure}
\subsection{REST API}
The \ac{REST} \ac{API} is provided by an \ac{HTTP} server that can access all
site servers and, as such, can send commands to and query every \ac{HN} at each
site. This section describes the methods of the \ac{API} that have been used in
the experiments.
The \ac{API} for controlling experiments supports the creation and deletion of
experiments. All previous experiments can be queried for a description of their
settings and resources, and the state of each experiment can be monitored. The
configuration of an experiment includes a unique number by which the experiment
is identified, as well as a list of resources (e.g. firmwares) and nodes. Each
experiment also includes an association of each node with a firmware file and a
profile.
Profiles are managed independently of experiments, although a different profile
can be selected for each experiment. Profiles describe how the \ac{HN} of a node
will be configured during the experiment. These settings include whether the
node will be powered from battery or from an external power supply, the settings
for the current measurement unit, and the settings for monitoring the radio of
the \ac{CN}.
New firmware files can be uploaded through the \ac{API} and flashed to either a
selection of nodes or all nodes associated with the experiment. This way it is
possible to assign different firmwares for the \ac{DAG} root and all other
nodes.
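As an illustration of how such an experiment submission could be assembled
against a \ac{REST} interface, the following sketch uses only Python's standard
library. The endpoint URL, the payload fields, and their names are hypothetical
placeholders chosen for this example, not the documented \fitlab \ac{API}:

```python
import json
from urllib import request

# Hypothetical endpoint -- a placeholder, not the real API URL.
API = "https://api.example.org/experiments"

def build_submit_request(name, duration_min, nodes, firmware, profile):
    """Assemble an experiment-submission request in which every node is
    associated with a firmware file and a monitoring profile, as the
    section above describes. The payload shape is illustrative."""
    payload = {
        "name": name,
        "duration": duration_min,
        "nodes": [{"archi": "m3", "id": n,
                   "firmware": firmware, "profile": profile}
                  for n in nodes],
    }
    body = json.dumps(payload).encode()
    return request.Request(API, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")

# e.g. a two-node experiment rooted at node 157 (names are examples)
req = build_submit_request("rpl-reset", 120, [157, 159],
                           "node.elf", "power")
```

The request object is only assembled here, not sent; in practice the response
would carry the unique experiment number used for all later queries.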
\subsection{Site Server}
At each site, a central server takes care of coordinating all nodes
participating in an experiment. It is accessed by the \ac{API} for executing the
functions exposed through the \ac{REST} interface. Besides through the \ac{API},
the site server can also be accessed directly using \ac{SSH}. Both the server
and each \ac{HN} are part of the same \emph{Ethernet} network and communicate
using \ac{TCP}/\ac{IP}.
Measurement data is periodically collected from the different \acp{HN} and
stored inside the home directory of the user running the associated experiment.
This data can then be transferred or examined by the user by executing commands
on the server.
\chapter{Introduction}
In recent years, \acp{WSN} have been widely adopted as a means for applications
such as industrial monitoring \cite{ding2010sensing} \cite{rfc5673}, wild-life
tracking \cite{cassens2017automated} and public infrastructure \cite{rfc5867}.
One significant trend is the rise of networked embedded devices that contain
sensors and actuators for performing dedicated tasks, which is often referred to
as the \acf{IoT}.
Many of these applications have in common that they need some means of
transporting commands to these embedded devices and, in turn, of receiving
sensor data from the devices. Such a transport channel is often realized as a
wireless mesh network, since its deployment is comparatively convenient and cost
efficient compared to wired networks. At the same time, such devices will often
be battery powered, since at the deployment site a connection to the power grid
might not be available with reasonable effort. Together, these restrictions
impose particular constraints on the software that creates the mesh routing.
When nodes are supplied by a battery, the maintenance interval of the node
itself depends on how efficiently the node manages its limited amount of energy.
In extreme cases (e.g. where the sink depletes its battery first) the total
lifetime of the network may even depend on the minimum lifetime of any node in
the network. For routing in \ac{WSN} the amount of energy it takes to establish
a routing topology within the network of devices is largely due to the time the
radio is powered for transmitting messages, which is proportional to the number
and sizes of messages that need to be exchanged to create and maintain the
routing topology. The amount of energy used also depends on the efficiency of
the resulting routing topology when transmitting messages through the network.
There is often a trade-off between the energy consumption of a network and the
performance in terms of network latency and throughput. For this, different
metrics can be applied and have to be carefully selected based on the conditions
the network operates in.
A problem that often occurs, especially under harsh environmental conditions
(e.g. outdoor deployments, wildlife monitoring) or with faulty software, is
caused by single or repeated node restarts. When a node resets, it has to
reconfigure its routing information by exchanging messages with surrounding
nodes, which makes up a great part of the energy costs associated with
\acp{WSN}. Such restarts do not only affect the specific resetting node, but
also nodes that depend on the resetting node inside the routing topology.
Previous work has shown that this behavior of transitively failing nodes can
have a large impact on the energy efficiency of \acp{WSN} using \ac{RPL}
\cite{kulau2017energy}, \cite{mueller2017}.
The \ac{RPL} is a protocol for routing messages in wireless mesh networks
and has explicitly been designed for use with networks of nodes with low power
consumption and lossy wireless links, as is typical for many deployment
scenarios. It has become the de-facto standard routing protocol for wireless
sensor networks and, as such, its behavior in conditions where transient node
failures occur is of great interest to this work.
One aspect of the effectiveness of a mesh routing protocol is how it deals with
such transient node failures. It has to quickly detect these failures, decide
how to invalidate the routes using the failed node and create and announce
alternative routes in the network.
In subsequent work \cite{mueller2017}, a hardened implementation of \ac{RPL} for
the \emph{Contiki} operating system has been developed that manages to reduce
the influence of node restarts by restoring a previous state stored in the flash
memory of the node. This hardened implementation has previously been evaluated
both in simulations and within a limited test network.
The \fitlab is a shared \ac{IoT} test network that features over 2000 nodes and
provides an interface for the remote configuration and scheduling of
experiments. Measurement data can be obtained for a variety of variables including
\ac{RSSI}, \acp{PCAP}, serial interface output, and event logs for each node.
The goal of this work is to further verify the findings concerning the effect of
node restarts on the performance of \ac{RPL} and to evaluate, on a larger scale,
the effectiveness of the hardened implementation at reducing the energy impact
of transient node failures. In addition to the parameters needed for the
verification of previous work \cite{mueller2017}, some further variables have
been recorded (see \autoref{tab:params}) and evaluated that allow conclusions
about the performance of \ac{RPL}.
In \autoref{chap:relwork} an introduction to the mechanisms of \ac{RPL} is given
and other work discussing the protocol performance of \ac{RPL} is presented
including an analysis of the effect of transient node failures in \ac{WSN}. From
the literature, different attempts for hardening \ac{RPL} against various forms
of attacks on the protocol and other extensions that aim to add security and
better performance are presented.
The topology of the test network and the hardware used for the sensor nodes are
described in \autoref{chap:hardware}, while \autoref{chap:setup} describes the
different software components, including how the data for the individual
variables is recorded, aggregated and analyzed.
\autoref{chap:evaluation} describes the different configurations in which the
experiments were run and how parameters were controlled, and presents the
analysis of the network topology, the network performance and the energy
consumption.
\chapter{Related Work} % English: Related Work
\label{chap:relwork}
Much work has been done evaluating \ac{RPL} and its repair process.
In the following, a brief introduction to \ac{RPL} will be given, then previous research concerning the general performance of the protocol will be presented.
After this, an overview of the sources of unreliability in \ac{WSN} follows and resetting nodes as a factor for network disruption are considered in more detail, before work is presented discussing the effects of the resulting transient node failures in more detail.
From the literature, extensions to \ac{RPL}, that may help to improve protocol reliability and network lifetime, will be presented, including optimizing the \ac{OF} for network lifetime, improving the formation of network paths, implementing fairer broadcast suppression, using intrusion detection systems, adding trust and authenticity and storing routing information persistently.
\section{Introduction to RPL}
\ac{RPL} as defined in \cite{rfc6550} is a routing protocol for \acp{LLN} that provides energy efficient networking for resource-constrained devices in networks where interconnects are expected to be bidirectional, but may have low data rates, high error rates and be unstable.
In \ac{RPL}, nodes self-organize to form a \ac{DODAG}, where the node with the lowest rank is both the destination and the root of the \ac{DAG}.
Such a \ac{DODAG} is displayed in \autoref{fig:dodag}.
The bootstrapping process defines how nodes may join the network by selecting a parent and how to globally or locally repair the network when necessary.
Each node emits \ac{DIO} messages targeted at all nodes in transmission range.
These messages advertise the presence of a node, its affiliation with an existing \ac{DODAG}, the current routing costs and related metrics.
A joining node may receive these messages and select a parent in the \ac{DODAG}
based on the received rank and routing costs, but must not select a node with a rank greater than its own.
The separation of the route metric from the forwarding process is an important characteristic of \ac{RPL}, as the function that is used to calculate the route metric, the \ac{OF}, can be exchanged to form \acp{DODAG} based on different characteristics.
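The rank-based parent selection sketched above can be illustrated as follows (a simplified model with illustrative names and values, not the \emph{ContikiRPL} implementation):

```python
# Simplified sketch of RPL parent selection from received DIO messages.
# Illustrative model only -- not the ContikiRPL implementation.

def select_parent(own_rank, dios, link_cost):
    """Pick the neighbor that yields the lowest resulting rank.

    dios: list of (neighbor_id, advertised_rank) pairs from DIO messages.
    link_cost: neighbor_id -> cost of the link (the OF-provided metric).
    """
    best = None
    for neighbor, rank in dios:
        # Loop avoidance: never attach to a node deeper in the DODAG.
        if own_rank is not None and rank >= own_rank:
            continue
        candidate_rank = rank + link_cost[neighbor]
        if best is None or candidate_rank < best[1]:
            best = (neighbor, candidate_rank)
    return best  # (preferred parent, resulting rank) or None

dios = [("a", 256), ("b", 512), ("c", 768)]
cost = {"a": 512, "b": 128, "c": 128}
print(select_parent(None, dios, cost))  # ('b', 640)
```

Here \texttt{link\_cost} stands in for whatever metric the configured \ac{OF} provides, e.g. \ac{ETX}.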
\begin{figure}[h]
\centering
\begin{tikzpicture}[edge from parent/.style={draw,latex-},
every node/.style={circle,draw},level/.style={sibling distance=90mm/#1}]
\node (s) {$r_{0,0}$}
child { node {$r_{1,1}$}
child { node (d) {$r_{2,1}$}}
child { node (a) {$r_{2,2}$}
child { node (f) {$r_{3,1}$} }
}
}
child { node (c) {$r_{1,2}$}
child { node {$r_{2,3}$}
child { node (b) {$r_{3,2}$} }
child { node (e) {$r_{3,3}$} }
}
};
\path (s) edge[dashed] (a);
\path (s) edge[dashed] (b);
\path (c) edge[dashed] (b);
\path (a) edge[dashed] (d);
\path (c) edge[dashed] (e);
\path (d) edge[dashed] (f);
\path (f) edge[dashed] (b);
\end{tikzpicture}
\caption{\acf{DODAG}}
\label{fig:dodag}
\end{figure}
\section{Protocol Performance}
One of the main goals of this work is to evaluate the performance of \ac{RPL} when dealing with transient node failures.
Much research has already been done concerning the general performance of \ac{RPL}.
An extensive survey of energy-efficient routing protocols in \acp{WSN} is presented in \cite{pantazis2013energy}.
They classify protocols into four main categories according to what their forwarding decisions are based on: network structure, communication mode, topology based and reliable routing.
\ac{RPL} is listed as a reliable routing protocol and further subcategorized as a multipath-based protocol.
As an advantage they list low energy consumption, as a drawback that it only supports unicast traffic.
The scalability is rated as good, as are mobility and robustness.
The effectiveness and performance of \ac{RPL} has been evaluated in many publications \cite{rfc6687,accettura2011performance,korte2012study,ali2012performance,banh2015performance}.
In \cite{rfc6687}, the authors studied \ac{RPL} both in test networks of varying sizes and in simulations.
For the simulations, the link failure model and the network topology have been derived from measurements gathered from real-life deployments.
Simulations were performed using the \emph{OMNet++} \cite{varga2008overview} network simulator.
The authors measured path quality, routing table size, delay bounds, control overhead and connectivity loss.
The study does not directly consider network lifetime and energy consumption as metrics, but results were obtained pertaining to the scalability and performance under realistic scenarios.
It has been found that \ac{RPL} scales well to very large topologies, provides near-optimum path quality and negligible control overhead, and meets the desired delay and convergence requirements for the given scenarios.
They also find that with \ac{RPL} it is possible to trade off routing stability against control overhead and thereby increase network lifetime.
A detailed study of \ac{RPL} with a range of different parameter settings, as well as a comparison of OF0 \cite{rfc6552}, which tries to optimize for connectivity, and the \ac{ETX}-based \acp{OF}, can be found in \cite{ali2012performance}.
Observations were made using the \emph{Cooja} \cite{osterlind2006cross} network simulator for \emph{Contiki}; energy usage has been measured using \emph{Powertrace} \cite{dunkels2011powertrace}.
The results hint at the importance of the settings of the Trickle timer for resource utilisation and the measured parameters.
The \emph{ContikiMAC} \cite{dunkels2011contikimac} duty cycling is also found to be beneficial to energy efficiency.
The findings also include results concerning network latency, delivery ratio, convergence time, control overhead and energy consumption, using OF0 and \ac{ETX} respectively.
The tendency of node failures to be frequent and often transient is remarked upon at one point, but the effects were not studied explicitly.
In \cite{korte2012study} the \ac{RPL} repair process is studied in more detail.
They evaluate a limited test network using the \emph{Contiki} \ac{RPL} implementation and study the effect of node failures on the formation of the \ac{DODAG} before it undergoes local or global repair.
Here, the duration of the repair process and its individual steps are recorded and evaluated, as well as the results.
They were also able to confirm that the repaired \ac{DODAG} still matches the optimal \ac{DODAG} created by the \ac{OF}.
Although \cite{korte2012study} did not consider the additional energy usage from node failure, they find that most of the time it takes to recover from failure is spent on detecting a failed node, which may be useful when optimising the energy efficiency of the recovery process.
They also hint at the ability of \ac{RPL} to make use of an \ac{OF} that uses the remaining energy of a node as a metric.
This may help to balance energy consumption inside the network and thereby increase overall network lifetime.
While multiple other studies find that simulations of \ac{RPL} and experimental
results are largely consistent, Korte et al. \cite{korte2012study} measured a noticeable difference when comparing experimental results to simulations made in the network simulator \emph{NS2}\footnote{\url{https://www.isi.edu/nsnam/ns/}}.
They also remark that, for some situations, the time at which certain protocol messages must be sent is underspecified in \ac{RPL}.
Depending on implementation details, this may lead to the creation of loops, which makes it necessary to initiate a global repair of the \ac{DODAG}.
Additionally, the \ac{NUD} employed in \ac{RPL} may not be indicative of a loss of connectivity if higher-layer protocols that operate across paths longer than one hop, like \ac{TCP}, are used as an indication of connectivity.
When it comes to the effects of failing nodes, most research focuses on the performance of the local and global repair processes, but either it does not consider the possibility of transient node failure, or neglects the effects on the network lifetime from the resulting increased energy usage.
\section{Sources of Unreliability in WSN}
When evaluating networks with failing nodes, it is important to differentiate
the underlying causes of the failures, since failures with different causes may
exhibit different behavior concerning the frequency, duration, and intensity of
the fault. As an example, while a programming error for an edge case may result
in only a single node restart, failures from overheating may result in many
consecutive restarts \cite{boano2013hot,boano2010impact}.
The sources of unreliability in \ac{WSN} can be classified into those where an active attacker is involved and those where unreliable behavior is caused by passive environmental conditions.
%In energy constrained devices such as wireless sensor nodes, duty cycling is an important mechanism for conserving energy.
%One example of this is \emph{ContikiMAC} \cite{dunkels2011contikimac}, where the radio is only enabled for certain intervals of time.
When undervolting a sensor node, components of the nodes are run outside of their specified voltage range.
While this presents an interesting opportunity to increase the lifetime of \acp{WSN}, it may also increase the error rate of components and therefore may cause unpredictable behavior or even temporary node failure \cite{kulau2015undervolting}.
An implementation of undervolting for \acp{WSN} is presented in \cite{kulau2016idealvolting}.
They use supervised learning to adapt the voltage levels for individual nodes based on clock speed, temperature variations and differences in the manufacturing process.
This made it possible to prolong the lifetime of the network by more than 40\%.
Besides undervolting, there are many other factors in \ac{WSN} that may cause temporary node failure, such as temperature variations \cite{boano2010impact,boano2013hot,reynolds1974thermally}, programming errors and faulty components.
An overview of common active attacks against \acp{WSN} is presented by Karlof et al.~\cite{karlof2003secure}.
Surveyed modes of attack include: spoofed, altered, or replayed routing information, selective forwarding, sinkhole attacks, Sybil attacks, wormholes, HELLO flood attacks and acknowledgement spoofing.
As an ad-hoc, hierarchical routing protocol, \ac{RPL} is generally vulnerable to all of the described attacks.
\section{Effects of Transient Node Failures}
One attack vector not explicitly mentioned in \cite{karlof2003secure} is based on repeatedly restarting nodes.
Depending on the topology of the network, a single restarting node may cause transient node failures in other parts of the network and significantly increase the overall energy consumption of the network \cite{kulau2017energy}.
This may also be exploited by an active attacker.
Attacker-controlled nodes integrate with the network, possibly using wormholing, in such a way that as many paths as possible include these nodes as their parents.
The nodes then fail for a short time and subsequently restart.
By coordinating the timing and spacing of the restarts, an attacker repeatedly forces the network to repair itself.
As this behavior may also be exhibited by malfunctioning nodes, it may also be
triggered accidentally.
In \cite{kulau2017energy}, the energy impact of single node restarts when using \ac{RPL} is studied in detail.
Experiments were done using the \emph{Cooja} network simulator and then compared to a reference simulation without resetting nodes.
Both the effect of single node restarts and multiple node restarts were investigated on a binary tree topology and for a meshed network, where each node can have more than one parent.
They discovered that a single node restart leads to an increased energy consumption of up to 20\% for the restarting node and its direct neighbors.
To remedy the effect of passive node failure, they suggest optimising \ac{RPL} parameters and keeping persistent information across node restarts, while, in the case of an active attacker, an \ac{IDS} would be more applicable.
\section{Extensions to RPL}
While \ac{RPL} is comparatively easy to implement, it has some weaknesses when it comes to mobility, energy consumption and packet delivery rates.
Some research that was done to extend and improve the protocol is presented here.
\subsection{Objective Functions}
An \ac{OF} is the function by which \ac{RPL} selects a parent in the \ac{DODAG} based on a metric like end-to-end delay, energy usage or delivery probability.
Since, in \ac{RPL}, the choice of \ac{OF} is independent of the forwarding mechanism, it is possible to substitute an \ac{OF} that produces a \ac{DODAG} that will be less affected by certain types of failure conditions.
A network can even have multiple \acp{DODAG}, each of which can be optimized for a different use case.
Kamgueu et al. \cite{kamgueu2013energy} implemented an \ac{OF} that uses the remaining energy of a candidate parent as a metric.
This way it is possible to create a \ac{DODAG} that distributes energy usage within the network more evenly and therefore increases network lifetime.
As opposed to computing the total energy level of a path, the cost of a path is the minimum energy level of any node on the path.
The \ac{OF} is evaluated using the \emph{Cooja} simulator for a network of 20 nodes.
They were able to increase network lifetime by around 14\% compared to a network using the \ac{ETX}-based \ac{OF}.
At the same time, the energy-based \ac{OF} achieved around 3\% worse delivery ratio compared to the \ac{ETX}-based \ac{OF}.
They note that future work would be needed to combine \ac{ETX} and energy-based \ac{OF}, to obtain both long network lifetime and a stable network.
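The difference between the additive \ac{ETX} metric and the bottleneck-style energy metric of \cite{kamgueu2013energy} can be illustrated as follows (the values are made up):

```python
# Sketch of the two path metrics discussed above (illustrative values).

def etx_path_cost(etx_per_link):
    # ETX is additive: expected number of transmissions along the whole path.
    return sum(etx_per_link)

def energy_path_cost(energy_per_node):
    # The energy metric is a bottleneck metric: a path is only as good
    # as the node with the least remaining energy on it.
    return min(energy_per_node)

path_a = {"etx": [1.2, 1.1, 1.3], "energy": [90, 15, 80]}  # weak middle node
path_b = {"etx": [1.5, 1.6, 1.4], "energy": [60, 55, 70]}

# ETX prefers path A, the energy metric prefers path B:
print(round(etx_path_cost(path_a["etx"]), 2), round(etx_path_cost(path_b["etx"]), 2))  # 3.6 4.5
print(energy_path_cost(path_a["energy"]), energy_path_cost(path_b["energy"]))          # 15 55
```

This makes the trade-off noted above concrete: the energy metric avoids draining the weak middle node of path A at the cost of a longer (higher-\ac{ETX}) route.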
\subsection{Coronal RPL}
In \cite{gaddour2014co}, \ac{Co-RPL} is proposed as an extension to \ac{RPL} and evaluated.
\ac{Co-RPL} makes use of the Corona mechanism \cite{olariu2006design} to assist in the selection of a parent router and includes a procedure for reducing packet loss in case of a failing parent node.
It has been found that, in specific scenarios, \ac{Co-RPL} reduces end-to-end delay by up to 2.5 seconds, packet loss by up to 45\% and energy consumption by up to 50\%.
\subsection{Trickle Timer}
The time and therefore energy needed for a failed node to re-join the network is also influenced by the behavior of its \emph{Trickle} timer \cite{levis2004trickle}.
In \ac{RPL}, such a timer regulates, based on the number of messages received during a sensing interval, whether the sender may send messages after that interval.
Since the behavior of the \emph{Trickle} timer in networks of more than one node is inherently non-deterministic, the share of sending time each node gets may be unfair \cite{vallati2013trickle}.
This in turn can result in less than optimal route selections when sensing for possible parents during the bootstrapping process.
\emph{Trickle-F} \cite{vallati2013trickle} is an attempt at adding fair broadcast suppression to the \emph{Trickle} algorithm.
Evaluations have shown its validity and that it is possible to obtain more efficient routes with the same power consumption as with the original algorithm.
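The suppression decision of \emph{Trickle} \cite{levis2004trickle} can be sketched as follows (a simplified model: the random transmission point within each interval is omitted, and the names are illustrative):

```python
# Simplified sketch of Trickle's per-interval suppression logic.
# Only the counter/redundancy decision is modeled; the randomized
# transmission time within each interval is omitted.

class Trickle:
    def __init__(self, i_min, i_max, k):
        self.i_min, self.i_max, self.k = i_min, i_max, k
        self.interval = i_min  # current interval length
        self.counter = 0       # consistent messages heard this interval

    def hear_consistent(self):
        self.counter += 1

    def end_of_interval(self):
        """Return True if the node may transmit, then start the next interval."""
        may_send = self.counter < self.k   # suppress if enough neighbors spoke
        self.interval = min(self.interval * 2, self.i_max)  # exponential back-off
        self.counter = 0
        return may_send

t = Trickle(i_min=1, i_max=16, k=2)
t.hear_consistent(); t.hear_consistent(); t.hear_consistent()
print(t.end_of_interval())  # False: 3 consistent messages >= k, suppressed
print(t.end_of_interval())  # True: quiet interval, 0 < k
print(t.interval)           # 4: interval doubled twice, capped at i_max
```

The unfairness addressed by \emph{Trickle-F} arises in the suppressed branch: which nodes get counted out depends on non-deterministic message timing.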
\subsection{Intrusion Detection Systems}
As a method for recognizing and preventing large scale attacks on \ac{WSN}, different \acfp{IDS} implementations have been discussed in the literature \cite{le2011specification,raza2013svelte,kasinathan2013denial}.
These approaches have some considerable disadvantages for \ac{WSN}.
First, an \ac{IDS} is most effective if all information is available at a central location.
This requires considerable traffic flow from each node to a central sink node, which consumes additional energy and therefore reduces network lifetime.
Additionally, nodes closer to the sink node will see more traffic than nodes closer to a leaf of the \ac{DODAG}, which again reduces total network lifetime.
For the node that processes the collected data, a connection to the power mains and additional storage and processing capabilities may be required.
As a consequence of misbehavior, nodes may be prohibited from accessing the network in certain ways.
This in turn requires that other nodes can be provisioned with rules that facilitate such penalties, which, depending on the network state, may not always be given and could be prevented by an active attacker using blackholing attacks.
In this case a distributed algorithm would appear to be more promising.
\subsection{Authentication and Trust}
In \cite{kantert2016combining} an approach for combining trust and \ac{ETX} is demonstrated, that improves the robustness of \ac{WSN} against unreliable or intentionally malicious nodes.
This technique also has been shown to reduce the impact of nodes that repeatedly employ selective forwarding.
Since a repeatedly failing node may also be interpreted as a node that selectively drops packets, it is possible that this will also be detected by this method.
One problem of security in \ac{WSN} is that, because of the limited capabilities of the nodes, message authenticity is often not implemented, which makes the network susceptible to spoofed, altered or replayed routing information \cite{karlof2003secure}.
If the network were protected against spoofed messages, it would be considerably more difficult for an active attacker to impersonate nodes or create virtual nodes that take part in attacks on the network.
An implementation of message authenticity and protection against replay attacks can be found in \cite{perazzo2017implementation}.
The authors show that their protection against replay attacks has a considerable negative impact on network formation time, while the message authenticity and encryption only had a modest impact on performance.
\subsection{Persistent Routing Information}
Another promising approach for hardening \ac{RPL} against transient node failures is to reduce the time the bootstrapping process takes by saving some of the state of the \ac{RPL} implementation between node restarts and restoring it after the node has failed.
An implementation of this approach has been created for \emph{Contiki\ac{RPL}} \cite{mueller2017}.
Multiple new problems arise from this approach: The implementation has to guarantee that the saved state remains consistent, even if the node fails while still editing the saved state, and the node needs to be able to decide if a restored state still remains valid.
To solve the problem of data integrity, the implementation constructs a checksum for the stored data and stores it along with the \ac{RPL} state.
On each node, the implementation keeps a clock that describes the recentness of the saved information.
From the clock, the \ac{DODAG} ID, the instance ID and the version number of the \ac{DODAG}, a \ac{UID} is computed and sent alongside other information as part of the \ac{DIO} messages.
Joining nodes receive these \acp{UID} and can use them to decide whether the state of the surrounding network has diverged from the state they had stored before.
Another issue is that the write operations cause additional energy usage.
This issue has been addressed by reducing the number and frequency of writes by directly accessing the device driver instead of relying on the file system.
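The integrity and recentness mechanism described above can be sketched as follows (the field layout, checksum choice and \ac{UID} construction are illustrative assumptions, not the actual on-flash format of \cite{mueller2017}):

```python
import struct
import zlib

# Sketch of storing RPL state with an integrity checksum and a UID that
# encodes recentness. Hypothetical layout, not the actual implementation.

def pack_state(clock, dodag_id, instance_id, version, routing_blob):
    header = struct.pack("!I16sBB", clock, dodag_id, instance_id, version)
    payload = header + routing_blob
    checksum = zlib.crc32(payload)          # integrity check over all fields
    return struct.pack("!I", checksum) + payload

def unpack_state(data):
    (checksum,) = struct.unpack("!I", data[:4])
    payload = data[4:]
    if zlib.crc32(payload) != checksum:
        return None  # corrupted: erase the state and fall back to a cold start
    clock, dodag_id, instance_id, version = struct.unpack("!I16sBB", payload[:22])
    return clock, dodag_id, instance_id, version, payload[22:]

def uid(clock, dodag_id, instance_id, version):
    # UID advertised in DIO messages so neighbors can judge whether a
    # restored state has diverged from the current network state.
    return zlib.crc32(struct.pack("!I16sBB", clock, dodag_id, instance_id, version))

blob = pack_state(7, b"\x01" * 16, 0, 3, b"routes...")
assert unpack_state(blob) is not None
assert unpack_state(blob[:-1] + b"\x00") is None  # a flipped byte is detected
```

A CRC32 stands in here for whichever checksum the implementation actually uses; the point is that both corruption (checksum mismatch) and staleness (\ac{UID} mismatch) lead to discarding the stored state.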
The evaluation is done using simulations in the \emph{Cooja} network simulator.
Only two topologies were evaluated, a binary tree topology and a meshed star topology, similar to \cite{kulau2017energy}.
The energy overhead of the hardened implementation was measured and compared against the same network using the default \emph{Contiki\ac{RPL}} and without using the \ac{UID}.
Networks without failing nodes and with a repeatedly failing node have been simulated, as well as multiple clock intervals.
The simulation has been validated within a very limited test network of seven nodes at \ac{IBR}.
The test nodes were \emph{Zolertia Z1} sensor nodes that were programmable using \emph{Raspberry Pi} \ac{SoC} computers.
Except for the size, this setup is in many ways similar to the setup used by \fitlab sensor nodes.
In contrast to the simulations, link quality was below 100\% because of interference.
Similar to \cite{ali2012performance}, energy measurements were done using \emph{Powertrace}.
The evaluation has shown a maximum of 0.5\% energy overhead compared to the default implementation, and during individual or multiple node restarts the additional energy usage was reduced by 55\% to 70\%.

\chapter{Measurement Setup}
\label{chap:setup}
This chapter describes the setup used for configuring and controlling the
experiments inside the lab. Furthermore, it describes how the measurement data
is collected, processed and stored. The general software architecture is shown in
\autoref{fig:components}. The components are distributed across several
different platforms.
Measurement data for multiple parameters were collected from the network. The
variables and their data sources are listed in \autoref{tab:params}. The packet
captures can be substituted with entries on the serial output. For assessing
the energy consumption of an implementation, a \ac{PM} rate of one sample per
second is sufficient.
\begin{table}[h]
\centering
\caption{Measured variables at each node}
\begin{tabular}{ll}
\toprule
Variable & Source \\
\midrule
Network latency & PCAP, serial output \\
Delivery ratio & PCAP, serial output \\
Control overhead & PCAP, serial output \\
Number of DIO, DAO & PCAP, serial output \\
DODAG state & serial output \\
Routing table & serial output \\
Node reset times & HN event log \\
RSSI & HN radio receiver \\
Energy consumption & HN power monitor \\
\bottomrule
\end{tabular}
\label{tab:params}
\end{table}
When evaluating such a large set of variables from multiple different sources,
the amount of generated data has to be considered carefully, as does the
question whether enough processing power and storage space are available.
\autoref{tab:amountdata} shows the estimated amount of data generated for each
variable. For 50 runs, this amounts to roughly 500 MByte of raw data, which can
be compressed further using text compression, since the logs, especially the
serial log, contain many repeating strings.
\begin{table}
\centering
\caption{Estimated amount of data generated by an experiment with 44 nodes,
lasting 600 seconds}
\begin{tabular}{lrr}
\toprule
Source & Single (bytes) & Total (KBytes)\\
\midrule
Serial output & 300 & 7920 \\
Event log & 40 & 1056 \\
Consumption log & 46 & 1214 \\
RSSI log & 46 & 1214 \\
\midrule
Total & & 11404 \\
\bottomrule
\end{tabular}
\label{tab:amountdata}
\end{table}
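The totals in \autoref{tab:amountdata} can be reproduced under the assumption that the `Single' column gives bytes per node per second:

```python
# Back-of-the-envelope check of the data volume estimates, assuming the
# "Single" column of the table is bytes per node per second.
nodes, seconds, runs = 44, 600, 50
per_second = {"serial": 300, "events": 40, "consumption": 46, "rssi": 46}

totals_kb = {k: v * nodes * seconds / 1000 for k, v in per_second.items()}
print(totals_kb["serial"])             # 7920.0 KB, matching the table
print(sum(totals_kb.values()))         # 11404.8 KB per 600 s run
print(sum(totals_kb.values()) * runs / 1000)  # ~570 MB of raw data for 50 runs
```

The 50-run total of roughly 570 MB is consistent with the order-of-magnitude figure of 500 MByte quoted in the text.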
\section{Testlab Nodes}
As previously mentioned in \autoref{sec:architecture}, each node inside the
testlab consists of the \ac{ON}, which runs the firmware supplied by the user,
and the \ac{HN}, which controls and monitors the \ac{ON}. In this section, the
software running on the node and how data is collected from it are further
described.
\subsection{Open Node}
Each \ac{ON} inside the test lab runs a version of the \emph{Contiki} operating
system. The node at the root of the \ac{DAG} is programmed to act as a sink for
the \ac{UDP} traffic that is periodically emitted by the other nodes, which act
as sources. Each such packet contains a sequence number. The reception of these
packets at the sink and their emission at the source are logged to the serial
output of the node. This makes it possible to detect whether a packet has been
received by comparing the log of the sink with that of the respective sender.
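Matching sequence numbers between the source and sink logs can be sketched as follows (the log line format and node names are made up for illustration; the real serial output differs):

```python
import re

# Sketch: compute per-node delivery ratios by matching sequence numbers
# between source and sink serial logs. The log line format is made up.
SEND = re.compile(r"(?P<node>m3-\d+);send seq=(?P<seq>\d+)")
RECV = re.compile(r"sink;recv from=(?P<node>m3-\d+) seq=(?P<seq>\d+)")

def delivery_ratio(source_log, sink_log):
    sent, received = {}, {}
    for line in source_log:
        if (m := SEND.search(line)):
            sent.setdefault(m["node"], set()).add(int(m["seq"]))
    for line in sink_log:
        if (m := RECV.search(line)):
            received.setdefault(m["node"], set()).add(int(m["seq"]))
    # Fraction of each node's sent sequence numbers that arrived at the sink.
    return {node: len(received.get(node, set()) & seqs) / len(seqs)
            for node, seqs in sent.items()}

src = ["m3-2;send seq=1", "m3-2;send seq=2", "m3-2;send seq=3", "m3-3;send seq=1"]
snk = ["sink;recv from=m3-2 seq=1", "sink;recv from=m3-2 seq=3"]
print(delivery_ratio(src, snk))  # m3-2: 2 of 3 delivered, m3-3: 0 of 1
```

Intersecting sets of sequence numbers also tolerates duplicated and reordered log lines, which occur in practice on aggregated serial streams.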
Further entries inside the log on the serial output from each node include the
number of emitted \ac{DIO} and \ac{DAO} messages, changes of the routing table and changes
of the preferred parent.
Different versions of the firmware have been produced for the \ac{DAG} root and
all other nodes in the network. The firmware of the root creates a new
\ac{DAG} with the node itself at the root. All other nodes will attempt to join
this \ac{DAG}.
\subsection{Porting the Firmware}
The firmware supplied to the \ac{ON} comes in different versions which each have
different features enabled. The default configuration of \emph{Contiki} 2.7 with
added device support for the \emph{M3} node is provided by \fitlab. The hardened
implementation has been developed based on an older version of \emph{Contiki}
that did not yet include device support for the \emph{M3} node. In preparation
for the evaluation, the hardened implementation has been ported to this newer
version of \emph{Contiki}; some changes and additions had to be made to account
for changed software interfaces and device-specific differences.
Some build options specific to the \emph{Z1} were required for the hardened
implementation to function. These options have been added as necessary.
A missing consistency check when restoring the routing state from flash memory
has been discovered; a check has since been added that erases the state from the
flash if an inconsistency is discovered.
When testing the hardened implementation, it came to attention that a restart of
the \ac{DAG} root will result in it becoming unable to reconfigure itself as the
\ac{DAG} root, leading to the network being unusable until the state is purged
from the flash memory of the root node. This is caused by the root node
restoring the previously recorded \ac{DIO} messages from its flash and replaying
them into the \ac{RPL} module before it configures itself as the root of a
\ac{DAG}. This leads to it joining an existing \ac{DAG} discovered from the
\ac{DIO} messages, namely the same \ac{DAG} that the node was previously the
root of. The node joins this \ac{DAG} and selects one of its nodes as a
preferred parent, setting its own rank to a rank larger than that of its parent.
It is likely that the new parent node had previously selected the former root
node as its parent. In this case, the new parent node discovers that its own
parent has a larger rank than itself and drops the former root node as its
parent. Thus the tree loses its root and becomes unusable, but also unable to
recover, since the root node believes it is already part of a \ac{DAG} and will
not create a new \ac{DAG} with itself as the root node.
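The failure mode described above, and the obvious guard against it, can be modeled in a few lines (an illustrative model with made-up rank constants, not the actual \emph{Contiki} code):

```python
# Illustrative model of the root-restart failure mode described above.
# Rank constants and node names are made up; this is not ContikiRPL code.

ROOT_RANK = 256

def configure_after_restart(restored_dios, is_root):
    """Decide rank and parent after a restart, given replayed DIO messages."""
    if is_root:
        # Guard: the root must ignore replayed DIOs and recreate its own DAG;
        # otherwise it re-joins the DAG it was previously the root of.
        return {"rank": ROOT_RANK, "parent": None}
    best = min(restored_dios, key=lambda d: d["rank"], default=None)
    if best is None:
        return None  # no known neighbors yet
    # Join below the best candidate with a strictly larger rank.
    return {"rank": best["rank"] + 256, "parent": best["node"]}

# Replayed DIOs originating from the root's own former children:
dios = [{"node": "m3-7", "rank": 512}, {"node": "m3-9", "rank": 768}]
print(configure_after_restart(dios, is_root=True))   # root stays the root
print(configure_after_restart(dios, is_root=False))  # a leaf joins below m3-7
```

Without the \texttt{is\_root} branch, the restarted root would take the second path and attach below one of its former children, producing exactly the rootless, unrecoverable tree described above.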
\subsection{Host Node}
The \ac{HN} collects the log output from the \ac{ON} and offers it to
connecting clients via a \ac{TCP} network socket. The \ac{HN} can also record
\acp{PCAP} in the direct vicinity of the \ac{ON} and forward them in a manner
similar to the serial output. Alternatively, the \ac{HN} can record the local
\ac{RSSI}. For each event, including start, stop and reset of the attached
\ac{ON}, a corresponding entry is appended to an event log. The \ac{HN} can also
record accurate values for the power consumption, voltage and current from the
power management unit of the \ac{ON}.
For the purpose of measuring the energy consumption of a node in
reaction to a transient node failure, a high \ac{AM} can be selected, since
\ac{RPL} takes some time to react to the change in the network and the state
will therefore persist for multiple seconds. This way it is possible to reduce
the noise component and acquire more accurate data.
The \ac{HN} stores both the consumption data and the \ac{RSSI} in the
\ac{OML}. These log files are stored locally and periodically collected and
stored by the site server.
\section{Site Server}
At each site, a shared server running a multi-user system (\emph{Linux}) is
provided. The server periodically queries the events, \ac{RSSI} and consumption
of each \ac{HN} in the form of log files and stores them inside the home
directory of the user who owns the experiment. The forwarded serial output of
the nodes and the output of the network sniffer are not stored directly on the
shared server, because a large amount of data might be generated this way.
Instead, scripts are provided that aggregate the serial output and the stream of
network packets into two aggregate streams of messages. These streams can then
be forwarded to the local computer and stored there for later analysis.
\section{Local Computer}
The local computer submits the experiment to the \ac{API}, controls the execution
of the experiment and collects the results from the site server. The data is
stored, processed and later displayed to the user.
\subsection{Orchestration Scripts}
For the purpose of configuring the experiment through the \ac{API}, an
extendable set of scripts has been created. The \texttt{run} script submits the
configuration of the experiment to the \ac{API}, monitors the state of the
experiment, sets up forwardings for the aggregated serial output and packet
captures and loads and calls into hook functions that can be specified per
experiment. The control flow of the script is shown in \autoref{fig:orchest}.
\begin{figure}[h]
\centering
\begin{tikzpicture}
\tikzumlset{fill class=tubsLightBlue20, fill object=tubsLightBlue20, fill component=tubsLightBlue40}
\begin{umlseqdiag}
\umlobject[class=script]{run}
\begin{umlcallself}[op=pre]{run}
\end{umlcallself}
\begin{umlcallself}[op=wait running]{run}
\end{umlcallself}
\umlobject[class=hook]{during}
\begin{umlcall}[op=subshell,type=asynchron]{run}{during}
\umlobject[class=Subshell]{track}
\begin{umlcall}[op=subshell,type=asynchron]{run}{track}
\begin{umlcallself}[op=serial aggregator,type=asynchron]{track}
\begin{umlcallself}[op=sniffer aggregator,type=asynchron]{track}
\end{umlcallself}
\end{umlcallself}
\end{umlcall}
\end{umlcall}
\umlobject[class=hook]{post}
\begin{umlcallself}[op=wait terminated]{run}
\end{umlcallself}
\begin{umlcall}{run}{post}
\end{umlcall}
\begin{umlcallself}[op=save results]{run}
\end{umlcallself}
\end{umlseqdiag}
\end{tikzpicture}
\caption{A Sequence diagram of the orchestration scripts}
\label{fig:orchest}
\end{figure}
First, the script calls the \texttt{pre} hook function from a script file
associated with the experiment; this function can be used for setting up the
environment at the local computer with settings specific to the experiment. Then
the script sends the configuration of the experiment together with the
associated firmware files to the \ac{REST}-\ac{API}, waits for the experiment
to be started on the test lab and stores the ID announced by the \ac{API}.
After the \ac{API} has marked the experiment as \texttt{Running}, the script
proceeds by calling the \texttt{during} function of the experiment. This
function can be used for manipulating the experiment during its execution. In
the case of the experiments run in the evaluation, this function takes care of
resetting a node at a random time during a phase with resets and of flashing the
different versions of the firmware at the required time.
After the \texttt{during} function completes, the orchestration script copies
any further results from the site server to the local computer.
\subsection{Aggregation and Storage}
For the storage and analysis of the measurement data, the local computer connects
to the shared server. The aggregated serial output and \acp{PCAP} are
transported to the local computer over an \ac{SSH} connection and each is stored
in a separate file. The local computer also collects the log files containing
the events, energy consumption and \ac{RSSI} from the shared server and stores
them for later analysis.
After all data of an experiment has been collected, the contents of the
different files are pre-processed according to the criteria specified for the
evaluation, and the data is then stored in a
\emph{SQLite3} database for later analysis. The main reason for pre-processing
the data is to be able to perform the analysis within adequate computation time.
The pre-processed data is then fetched from the database using \ac{SQL}. Numeric
processing of the data is done using
\emph{SciPy}\footnote{\url{https://www.scipy.org/}}, a software library for
scientific computation that is mostly written in the programming language
\emph{Python} but binds to functions written in lower-level programming
languages that contain optimized code for certain tasks. The results of the
analysis are then displayed using \emph{Matplotlib}, a \emph{Python} library for
displaying structured data sets.
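The storage and query stage can be sketched as follows (the schema, node names and values are illustrative, not those used in the evaluation):

```python
import sqlite3

# Sketch of the storage/analysis split described above: pre-processed
# records go into SQLite, and the analysis pulls aggregates back out via
# SQL. The schema and values are illustrative only.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE consumption (
                run INTEGER, node TEXT, t REAL, watts REAL)""")
rows = [(1, "m3-2", 0.0, 0.151), (1, "m3-2", 1.0, 0.149),
        (1, "m3-3", 0.0, 0.162), (1, "m3-3", 1.0, 0.166)]
db.executemany("INSERT INTO consumption VALUES (?, ?, ?, ?)", rows)

# Mean power per node for run 1; results like these feed into SciPy/Matplotlib.
query = """SELECT node, AVG(watts) FROM consumption
           WHERE run = 1 GROUP BY node ORDER BY node"""
for node, avg in db.execute(query):
    print(node, round(avg, 3))
```

Pushing the aggregation into \ac{SQL} keeps only small result sets in \emph{Python}, which is what makes the analysis feasible within adequate computation time.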
\begin{landscape}
\begin{figure}
\centering
\begin{tikzpicture}
\tikzumlset{fill component=tubsLightBlue20, fill port=tubsLightBlue40}
\begin{umlcomponent}[x=-0.5,y=15,fill=white]{Testlab Node}
\end{umlcomponent}
\begin{umlcomponent}[x=3.5,y=15,fill=white]{Testlab Node}
\end{umlcomponent}
\begin{umlcomponent}[x=7.5,y=15,fill=white]{Testlab Node}
\end{umlcomponent}
\begin{umlcomponent}[x=-1,y=14,fill=white]{Testlab Node}
\end{umlcomponent}
\begin{umlcomponent}[x=3,y=14,fill=white]{Testlab Node}
\end{umlcomponent}
\begin{umlcomponent}[x=7,y=14,fill=white]{Testlab Node}
\end{umlcomponent}
\begin{umlcomponent}[y=1]{Testlab Node}
\begin{umlcomponent}{Open Node}
\umlbasiccomponent{UDP Sink / Source}
\umlbasiccomponent[y=3]{Logger}
\umlbasiccomponent[y=6]{Powertrace}
\end{umlcomponent}
\begin{umlcomponent}[x=6]{Host Node}
\umlbasiccomponent{Consumption}
\umlbasiccomponent[y=2.5]{RSSI}
\umlbasiccomponent[y=5]{Event Log}
\umlbasiccomponent[y=7.5]{Sniffer}
\umlbasiccomponent[y=10]{Forwarded Serial}
\end{umlcomponent}
\end{umlcomponent}
\begin{umlcomponent}[y=1,x=14.5]{Shared Server}
\umlbasiccomponent[y=0]{API}
\umlbasiccomponent[y=2.5]{Log files}
\umlbasiccomponent[y=5]{Sniffer Aggregator}
\umlbasiccomponent[y=7.5]{Serial Aggregator}
\end{umlcomponent}
\begin{umlcomponent}[y=1,x=20,fill=tubsLightGreen20]{Local Computer}
\umlbasiccomponent[y=12]{Orchestration}
\umlbasiccomponent[y=8]{Database}
\umlbasiccomponent[y=4]{Analysis}
\begin{umlcomponent}[y=0]{Presentation}
\umlactor[scale=0.5]{User}
\end{umlcomponent}
\umlassemblyconnector[interface=Parser]{Database-north-port}{Orchestration-south-port}
\umlassemblyconnector[interface=SQL]{Analysis-north-port}{Database-south-port}
\umlassemblyconnector[interface=Python]{Presentation-north-port}{Analysis-south-port}
\end{umlcomponent}
\umlbasiccomponent[x=14.5,y=14,fill=tubsLightBlue20]{API Server}
\umlassemblyconnector[interface=REST,with port]{Orchestration-west-port}{API Server}
\umlassemblyconnector[interface=SSH,with port]{API Server-south-port}{Shared Server-north-port}
\umlassemblyconnector[interface=SSH,with port]{Orchestration}{Shared Server}
\umlassemblyconnector[interface=ON Con,with port]{Host Node}{Open Node}
%\umlHVHassemblyconnector[interface=Configuration,with port]{API}{Host Node}
\umlHVHassemblyconnector[with port,arm2=+2cm]{Log files}{Consumption}
\umlassemblyconnector[interface=Log Collection,with port]{Log files}{RSSI}
\umlHVHassemblyconnector[with port,arm2=+2cm]{Log files}{Event Log}
\umlassemblyconnector[interface=TCP,with port]{Serial Aggregator}{Forwarded Serial}
\umlassemblyconnector[interface=TCP,with port]{Sniffer Aggregator}{Sniffer}
\end{tikzpicture}
\caption{Software components used in the evaluation}
\label{fig:components}
\end{figure}
\end{landscape}