Estimation of a Growth Factor to Achieve Scalable ad hoc Networks

Introduction: This study focuses on determining the conditions that guarantee scalability in a hierarchical architecture based on the resources available in the different layers of the network. Methods: A model based on the truncated geometric distribution was proposed to characterize the available resources in an ad hoc network. From this characterization, the concept of the growth factor φ was introduced as a constant value that represents the appropriate relationship between the available resources in two successive layers of the network to ensure scalability. The ns-3 software was used to develop the simulation scenarios through which the growth factor φ was estimated for ad hoc networks with two- and three-layer hierarchical architectures. Results: The results showed that the growth of resources among the layers in an ad hoc network can be expressed as a linear model. Moreover, a relation was found between the estimated value of φ and the golden ratio, which may inspire the design of artificial systems based on attributes found in nature.


Introduction
One of the most desirable properties of ad hoc networks is the ability to increase their size, receive new nodes and configure new applications without affecting the quality of the services. This property, known as scalability, is one of the main challenges in protocol design and is required to achieve ad hoc networks with high deployability.
In general terms, ad hoc networks may be configured into flat or hierarchical architectures. Because flat architectures are not scalable, this study focuses on determining the conditions that guarantee the scalability of a hierarchical architecture based on the resource availability in the different layers of the network. In hierarchical architectures, the upper layers must be able to support the additional workload that comes from serving as intermediaries in network communications. As a consequence, the following question arises: what is the appropriate relationship between the available resources in two successive layers of the network to ensure scalability? To answer that question, this paper proposes the concept of the growth factor.
To do this, we present a model based on the truncated geometric distribution to characterize the proportion of resources in a node compared to the overall resources in the network. This characterization allows the estimation of the growth factor, which is a constant value that represents the resources needed in the upper layers to achieve scalability. This paper can be considered an extended version of the work published in [1], [2]. We extend our previous work by verifying the scalability of two- and three-layer hierarchical architectures and showing their relation to the golden ratio.
This paper is organized as follows: in section 1, we briefly introduce some basic concepts related to the scalability property in ad hoc networks. In subsection 1.3, we describe the probabilistic model based on the truncated geometric distribution. In section 2, we present the estimation process of the growth factor through simulation, together with the statistical analysis of the results. Finally, in section 3, we draw conclusions and identify future work.

Ad hoc networks: an overview
Ad hoc networks are self-organized computing systems that consist of a set of nodes that communicate with each other through wireless connections and do not depend on a preexisting infrastructure to operate [3], [4]. One of the most important properties of ad hoc networks is their architecture. This property is directly related to scalability and describes the configuration of the nodes, the functional organization and the protocols necessary for network operation [5], [6]. As shown in Figure 1, there are two types of architecture in ad hoc networks: flat and hierarchical [4], [7]. In flat architectures, all the nodes behave independently and have to perform routing and packet forwarding without any external control. This type of architecture has scalability problems because the size of the routing tables is proportional to the number of nodes, and the traffic capacity available to each node consequently decreases as $1/\sqrt{n}$ as the number of nodes $n$ increases. This result is explained in detail in the work of Gupta and Kumar [8], which, to the best of the authors' knowledge, remains the most important work related to scalability in ad hoc networks.
On the other hand, hierarchical architectures are generated by clustering algorithms that create groups of adjacent nodes, aiming to address the needs of the network (routing, reducing energy consumption and enhancing cooperation) and to decrease the workload of the nodes [9]. This type of configuration arises as an alternative that attempts to overcome the scalability problems of flat architectures, focusing on the availability of resources in the network layers and the tasks that they have to perform [10].

Related works
The deployment of scalable ad hoc networks entails the evaluation of the relationship among the network architecture, the available resources and the task assignment [11], [1]. Some works that describe the relationship between these properties and scalability in ad hoc networks are presented below.
In the case of flat architectures, there are studies attempting to improve the limits presented by Gupta and Kumar [8]. In different works [12]-[14], the mobility and cooperation between nodes are used as a way to improve the scalability of the network without establishing limits on communication delays. The work in [11] studied the scalability of military networks using self-similar traffic patterns, showing better performance due to the use of traffic patterns related to a specific operational context. It is important to mention that this improvement is evaluated through the implementation process. In works such as [1], [15] and [14], the influence of the network architecture on scalability is described. Although these works report better performance, it is not possible to generalize the results because the proposed solutions depend on the operational context [9]. Consequently, the theoretical limit proposed by Gupta and Kumar is still a challenge in the design of protocols for flat architectures [15].
In a hierarchical architecture, one of the most important challenges is to achieve a suitable relation between the resource distribution and the workload over the network layers [4], [16]. Recently, many investigations have tried to identify the main aspects that affect the operation of an ad hoc network, from the routing protocols [9], [11], [17] to the resource distribution [1], [17] and the election of the clusterhead [9]. In this sense, the proposed solutions for both flat and hierarchical architectures depend on the operational context and respond to only a particular purpose, which is why it is not possible to make generalizations about them [4]. Among the studies related to resource distribution, we can find clustering algorithms based on metric combination [11], [2], [18] as a way to characterize the available resources in the nodes. This type of algorithm inspired part of the model proposed in this research.

Stochastic model for the characterization of a cluster and its resources
The model presented below is based on the truncated geometric distribution, and its purpose is to characterize the available resources in the nodes of an ad hoc network. For that reason, it is necessary to first introduce some definitions regarding probability.

Truncated geometric distribution
Let $X \sim G(p)$ be a random variable with a geometric distribution [19] whose probability density function is shown in equation (1), with parameter $p \in (0,1)$ and domain $x \in \mathbb{N} = \{1, 2, 3, \ldots\}$:
$$f_X(x) = p(1-p)^{x-1} \qquad (1)$$
For the purposes of this investigation, it is necessary to restrict the domain of the geometric random variable to a subset $A \subseteq \mathbb{N}$ and thus adjust the probabilistic structure of the random variable to this subset. Therefore, if $Y$ represents a random variable with a geometric probability distribution with parameter $p$ but truncated to the subset $A$, then:
$$f_Y(y) = \frac{f_X(y)\, I_A(y)}{P(A)} \qquad (2)$$
where $I_A$ is the indicator function of the set $A$, and $P(A) = \sum_{x \in A} f_X(x)$. Finally, it is possible to obtain the probability density function of a random variable with a truncated geometric distribution, as shown in equation (3):
$$f_Y(y) = \frac{p(1-p)^{y-1}}{\sum_{x \in A} p(1-p)^{x-1}}\, I_A(y) \qquad (3)$$
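As a minimal illustration of equation (3), the following Python sketch evaluates the truncated geometric probabilities for a finite support. The function name `truncated_geometric_pmf` and the choice $A = \{1, \ldots, 5\}$ with $p = 0.7$ are assumptions made only for this example and are not taken from the simulation code of the study.

```python
def truncated_geometric_pmf(p, support):
    """Return {y: f_Y(y)} for a geometric(p) variable truncated to `support`.

    `support` is assumed to be a collection of positive integers (the set A).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    # Untruncated geometric mass f_X(x) = p * (1 - p)**(x - 1) on each kept point.
    weights = {x: p * (1.0 - p) ** (x - 1) for x in support}
    total = sum(weights.values())  # P(A), the normalizing constant
    return {x: w / total for x, w in weights.items()}


# Illustrative values only: a support of five points and p = 0.7.
pmf = truncated_geometric_pmf(0.7, range(1, 6))
for y, prob in pmf.items():
    print(f"y = {y}: f_Y(y) = {prob:.4f}")
```

Dividing by $P(A)$ is what makes the probabilities over the truncated support add up to one.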

Characterization of a cluster and its resources from the truncated geometric distribution
Traditionally, the geometric family has been used as a model to represent the number of Bernoulli trials before the first success. However, in this study the geometric family is proposed as a model to describe the percentage of the total resources of a cluster (100%) that is held by a node [2]. Therefore, it is convenient to model the resource percentage of a node through a set of random variables of the truncated geometric distribution, keeping in mind that a cluster is formed by a finite number of nodes. Then, if a cluster $Y$ is assumed to have a truncated geometric distribution, the percentage of resources of the $y$-th node will be determined by equation (3).
Because an ad hoc network can be formed by more than one cluster, it is necessary to define a set of random variables belonging to the truncated geometric family that enables modeling all the clusters in the network. For this purpose, $\{Y_i\}$, with $i \in \mathbb{N}$, is defined as a set of discrete random variables with probability density functions determined by equation (4), with parameter $p_i$ and truncated to $A_i \subseteq \mathbb{N}$, which represents the nodes in the cluster $Y_i$:
$$f_{Y_i}(y_i) = \frac{p_i(1-p_i)^{y_i-1}}{\sum_{x \in A_i} p_i(1-p_i)^{x-1}}\, I_{A_i}(y_i) \qquad (4)$$
To characterize the available resources in the nodes of a cluster, the following aspects must be considered:
• Each cluster $Y_i$ will be associated with the probability density function $f_{Y_i}(y_i)$.
• The nodes in the cluster will be represented by the value $y_i$ in $f_{Y_i}(y_i)$.
• The set of nodes in a cluster will be represented by the set $A_i$, whose cardinality $|A_i| = N_i$ is the total number of nodes in the cluster.
• The probability value obtained from $f_{Y_i}(y_i)$ for a node $y_i$ represents the percentage of available resources of that node compared to the total resources of the cluster.
• It is possible to verify the level of heterogeneity in the resource distribution of a cluster by tracking changes in the parameter $p$.
Figures 2 and 3 show an example of the characterization of clusters $Y_1$ and $Y_2$ using the proposed model [2]. Cluster $Y_1$ has $N_1 = 5$ nodes, and cluster $Y_2$ has $N_2 = 3$ nodes. The domain elements of both random variables represent the nodes in each cluster.
Source: authors' own elaboration
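To see how tracking the parameter $p$ reveals the level of heterogeneity, the sketch below computes the resource shares of two clusters with the sizes used in the example ($N_1 = 5$ and $N_2 = 3$). The values of $p$ (0.9 and 0.05) and the helper names are hypothetical choices for illustration; they are not the values behind Figures 2 and 3.

```python
def resource_shares(p, n_nodes):
    """Share of cluster resources per node under a geometric(p) law truncated to 1..n_nodes."""
    weights = [p * (1.0 - p) ** (y - 1) for y in range(1, n_nodes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def spread(shares):
    """Gap between the richest and the poorest node, used here as a crude heterogeneity index."""
    return max(shares) - min(shares)


shares_y1 = resource_shares(0.9, 5)   # hypothetical p close to 1
shares_y2 = resource_shares(0.05, 3)  # hypothetical p close to 0

print("Y1 shares:", [round(s, 3) for s in shares_y1], "spread:", round(spread(shares_y1), 3))
print("Y2 shares:", [round(s, 3) for s in shares_y2], "spread:", round(spread(shares_y2), 3))
```

With $p$ close to 1 almost all of the resources concentrate in the first node, whereas with $p$ close to 0 the shares are nearly equal.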

Expected value of resources in a node
As shown, the value $y_i$ in $f_{Y_i}(y_i)$ represents a node in a cluster. However, it is necessary to obtain a value that represents the average of the most significant computational resources in a node. To obtain this value, the function $h: \mathbb{R}^n \to \mathbb{R}^+$ is proposed. Several possibilities can be explored for $h(\cdot)$, but in this research we propose a model based on clustering algorithms through metric combination [17]. The features of $h(\cdot)$ are explained below:
1. Define the value $n$ as the number of parameters to be evaluated.
2. Obtain the values $(m_{i1}, m_{i2}, \ldots, m_{in})$ that represent the value of all resources in the $i$-th node in the network, where $m_{i1}$ is the value of resource 1 and so on.
3. Calculate the expected value of resources in the $i$-th node as shown below:
$$h(m_{i1}, \ldots, m_{in}) = \sum_{j=1}^{n} a_j\, m_{ij} \qquad (5)$$
The values of $a_j$ can be interpreted as normalization constants with $\sum_{j=1}^{n} a_j = 1$, which allows us to give more relevance to some resources based on the operational context of the network.
4. The expected value of the computational resources in a node $y_i$ will be represented by $h(m_{ij})$.
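The metric combination described in the steps above can be sketched in Python as follows, assuming that $h$ is a weighted sum of the $n$ metrics with weights that add up to one. The function name, the three metrics and their weights are hypothetical and serve only to illustrate the computation.

```python
def expected_node_resources(metrics, weights):
    """Weighted combination of node metrics, with weights acting as normalization constants."""
    if len(metrics) != len(weights):
        raise ValueError("one weight is needed per metric")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must add up to 1")
    return sum(a * m for a, m in zip(weights, metrics))


# Hypothetical node with n = 3 metrics, each already scaled to [0, 1].
metrics = [0.8, 0.5, 0.2]
weights = [0.5, 0.3, 0.2]  # more relevance is given to the first resource
score = expected_node_resources(metrics, weights)
print(f"h = {score:.2f}")
```

Changing the weights is the mechanism by which the operational context of the network can emphasize one resource over another.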

Expected value of resources in a cluster
It is possible to use $h(\cdot)$ to calculate the expected value of resources in a cluster, $W_{Y_i}$, as a linear combination of the resources of the nodes [9], [17], as shown in equation (6), where $W_{Y_i}$ represents the average value of available resources in the cluster $Y_i$. It is important to note that the expected value of the random variable $Y_i$ is not equal to the value of $W_{Y_i}$; this is due to the nature of the proposed model. If it were necessary to obtain the expected value $E[h(Y_i)]$, this would represent the available resources in any node of the cluster $Y_i$.

Characterization example
Initially, for the development of this investigation, only one parameter per node will be considered to determine its level of resources. This parameter is the average traffic generated by a node $y_i$ [20], which can be calculated as:
$$B_{y_i} = A_{y_i}\, D_{y_i} \qquad (8)$$
where $A_{y_i}$ is defined as the portion of time in which the node $y_i$ is sending data during the communication process. For example, if the node $y_i$ sends data for 30 seconds of a 60-second communication process, the value of $A_{y_i}$ will be 0.5. The variable $D_{y_i}$ is defined as the average rate of information sent by a node. If a node has a transmission rate $D_{y_i} = 512$ kbps and a value of $A_{y_i} = 0.5$, then the average traffic will be $B_{y_i} = (512 \text{ kbps})(0.5) = 256$ kbps. This concept can be implemented using the on-off model [21], which allows us to describe the traffic pattern of a node in the network.
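The arithmetic of the average traffic can be reproduced with a short Python sketch. The function name is hypothetical, and the 60-second total duration is an assumption implied by the activity value of 0.5 in the example.

```python
def average_traffic_kbps(active_seconds, total_seconds, rate_kbps):
    """Average traffic B = A * D, where A is the fraction of time the node is sending."""
    if total_seconds <= 0 or not 0 <= active_seconds <= total_seconds:
        raise ValueError("active time must lie within the total time")
    activity = active_seconds / total_seconds  # A: portion of time spent sending
    return activity * rate_kbps                # B: average traffic in kbps


# Values from the example in the text: 30 s active, 512 kbps transmission rate.
b = average_traffic_kbps(active_seconds=30, total_seconds=60, rate_kbps=512)
print(f"B = {b} kbps")
```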
To give an example of the characterization process based on the proposed model, the cluster shown in Figure 2 can be used. The steps of the previous sections are followed below.
1. Define the number of parameters to be evaluated: as mentioned before, for this study the average traffic of a node is the only parameter for the function. Therefore, $n = 1$.
2. Obtain the values that represent the resources in the nodes: the values of the average traffic are taken from Table 1.
3. Calculate the expected value of resources in the nodes: using the definition of the average traffic generated by a node presented in equation (8), it is possible to calculate the expected value of resources in cluster $Y_i$. It is important to note that for this particular example, $h(y_i) = B_{y_i}$. Table 1 shows both the results obtained for each node and the value of $W_{Y_i}$.
Source: authors' own elaboration
After obtaining the expected value of resources in $Y_1$, the method of moments is used to estimate the parameter $p$ of the truncated geometric distribution that models the cluster $Y_1$. In this example, a value of $p = 0.7$ is obtained. A detailed example of how the method of moments works for the truncated geometric family can be found in [19], [20], [22].
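The text defers the details of the method of moments to the cited references. As a rough sketch only, the Python code below assumes that the first moment of the truncated geometric distribution over the support $\{1, \ldots, N\}$ is matched to an observed mean and that the resulting equation is solved numerically by bisection. The function names, the bisection approach and the synthetic target are assumptions and may differ from the procedure actually used in the study.

```python
def truncated_mean(p, n_nodes):
    """Mean node index under a geometric(p) law truncated to {1, ..., n_nodes}."""
    ys = range(1, n_nodes + 1)
    weights = [p * (1.0 - p) ** (y - 1) for y in ys]
    return sum(y * w for y, w in zip(ys, weights)) / sum(weights)


def estimate_p(target_mean, n_nodes, tol=1e-10):
    """Solve truncated_mean(p, n_nodes) = target_mean for p by bisection."""
    if not 1.0 < target_mean < (n_nodes + 1) / 2.0:
        raise ValueError("target mean must lie between 1 and (N + 1) / 2")
    low, high = 1e-9, 1.0 - 1e-9
    while high - low > tol:
        mid = (low + high) / 2.0
        # The model mean decreases as p grows, so a model mean that is still
        # too large means the solution lies at a larger p.
        if truncated_mean(mid, n_nodes) > target_mean:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Synthetic check: build a target mean from a known p, then recover it.
observed_mean = truncated_mean(0.7, 5)
p_hat = estimate_p(observed_mean, 5)
print(f"estimated p = {p_hat:.4f}")
```

The round trip with a known parameter only verifies the numerical solver; with real data the target mean would come from the measured resources of the cluster.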

The growth factor j
It is reasonable to suppose that when new layers are added to an ad hoc network under a hierarchical architecture, the available resources in the upper layers must increase to maintain the service quality. Formally, if $\mu_i$ represents 100% of the resources in the $i$-th layer of the network, the resources in the next layer can be expressed as:
$$\mu_{i+1} = g(\mu_i)$$
where $g: \mathbb{R} \to \mathbb{R}^+$ is an increasing function that gives the amount by which the resources in layer $i + 1$ must increase with regard to the resources in layer $i$ to maintain the quality of the network services. It is possible to explore several possibilities for $g(\cdot)$, but in this study a proportional growth is assumed, as shown in equation (9).
There are two extreme scenarios in the relationship between the levels of available resources of two successive layers in the network. The first one is when the upper layers have unlimited resources. In this case, the difference between the resources in both layers will be large enough to avoid problems with the quality of the services. The second one is when both layers have the same resources. Then, it is possible that the upper layers will have problems supporting the additional work, as they serve as intermediaries in the communications of the network. Thus, it is reasonable to suppose that the growth factor φ should be a value between these two scenarios.
As shown, keeping a minimum level of quality of service across two successive layers of the network implies that the expected value of available resources in layer $i + 1$ has to increase as a function of the available resources in layer $i$. This relationship can be expressed as:
$$\mu_{i+1} = \varphi\, \mu_i \qquad (9)$$
To find a method to estimate the value of φ, and given that a layer of the network can be formed by more than one cluster, it is necessary to use the following notation: let $Y_i^{(j)}$ be the $j$-th cluster in the $i$-th layer of the network. Each cluster is associated with the probability density function $f_{Y_i^{(j)}}$ of the truncated geometric family, which enables the characterization of the distribution of the available resources in the cluster. Figure 4 shows a two-layer hierarchical architecture with the proposed notation:
• The first layer, $i = 1$, is formed by the clusters $Y_1^{(j)}$ with $j \in \{1, 2, 3\}$.
• The second layer, $i = 2$, is formed by the cluster $Y_2^{(1)}$.
Source: authors' own elaboration
Therefore, using the mentioned notation and the topology presented in Figure 3, it is possible to calculate the minimum value of the necessary resources in the network's second layer to maintain a minimum level of quality of service. The steps are shown below:
• First, it is necessary to calculate $W_{Y_1^{(1)}}$, $W_{Y_1^{(2)}}$ and $W_{Y_1^{(3)}}$. These values represent the expected value of resources in the clusters $Y_1^{(1)}$, $Y_1^{(2)}$ and $Y_1^{(3)}$.
• The total expected resources in the first layer, $\mu_1$, can be obtained according to equation (6).
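The steps above can be sketched in Python under two assumptions that go beyond what the text states: that the total resources of the first layer are the sum of the expected resources of its clusters, and that the second layer then needs φ times that total under the proportional growth model. The cluster values, the illustrative value of φ and the function name are hypothetical.

```python
def required_upper_layer_resources(cluster_resources, phi):
    """Return (lower-layer total, required upper-layer total) under proportional growth."""
    if phi <= 0:
        raise ValueError("the growth factor must be positive")
    mu_lower = sum(cluster_resources)  # assumed aggregation of the cluster values
    mu_upper = phi * mu_lower          # proportional growth between successive layers
    return mu_lower, mu_upper


# Hypothetical expected resources (kbps) of three first-layer clusters.
w_first_layer = [256.0, 384.0, 128.0]
mu_1, mu_2 = required_upper_layer_resources(w_first_layer, phi=1.5)
print(f"first layer: {mu_1} kbps, second layer needs at least: {mu_2} kbps")
```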

Estimation of the growth factor j
The simulation scenarios proposed in this section were developed in the network simulator ns-3 [2] and were designed to estimate values of φ in two- and three-layer hierarchical architectures. For all scenarios shown in this section, the following configuration was used:
1. A packet loss of less than 1% was established as the performance measure.
2. The operation of the cluster during the simulation was assumed to be stable. Thus, there were no changes in the network architecture or in the number of nodes in a cluster.
3. The stochastic model based on the truncated geometric distribution proposed in section 1.3 was used to determine the level of resources in each cluster of the network.
4. After finding a level of resources that satisfied the performance condition of the network, the growth factor was calculated as $\varphi = \mu_2 / \mu_1$.
5. Finally, the mean of φ, the standard deviation and a confidence interval with a significance level of 5% were calculated using the t-distribution (chosen because of the sample size).
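The last step of the configuration above can be sketched in Python using only the standard library. The sample of growth-factor estimates is hypothetical, the function name is an assumption, and the critical values are the standard two-sided 95% quantiles of Student's t distribution, tabulated here for small samples only.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def t_confidence_interval(samples):
    """Return (mean, lower, upper) of a 95% t-based confidence interval for the mean."""
    n = len(samples)
    dof = n - 1
    if dof not in T_CRIT_95:
        raise ValueError("this table only covers samples of size 2 to 11")
    centre = mean(samples)
    half_width = T_CRIT_95[dof] * stdev(samples) / sqrt(n)
    return centre, centre - half_width, centre + half_width


# Hypothetical growth-factor estimates from three simulation scenarios.
phi_estimates = [1.48, 1.55, 1.61]
phi_bar, lower, upper = t_confidence_interval(phi_estimates)
print(f"mean = {phi_bar:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only a few scenarios the critical value is large, so the interval is wide; this is the reason the t-distribution is preferred over the normal approximation for small samples.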

Estimation of the j constant in a two-layer hierarchical architecture
The three simulation scenarios shared the following configuration parameters: a geographical space of 500 m × 500 m, a Poisson traffic model [23], the OLSR routing algorithm [19], UDP as the data transport protocol and the Random Waypoint mobility model with a velocity uniformly distributed between 0 and 1 m/s, so that the nodes were in motion at all times. The configuration parameters are listed in Table 2.

Table 2. Simulation parameters
Geographical space: 500 m × 500 m
Source: authors' own elaboration
To estimate φ, it is necessary to characterize the available resources in the first layer of the network. To do this, the model based on the truncated geometric distribution presented in the previous section was used. The simulation scenarios were configured in a similar way to the architecture presented in Figure 3. The scenarios differed in the number of nodes per cluster and the total number of clusters in the network. The description of the architectures and the simulation results for the estimation of φ are presented in Table 3.
Table 3. φ estimation results - Two-layer architectures (columns: Description, $\hat{\varphi}$, CI with $\alpha = 5\%$). Source: authors' own elaboration
These results suggest that, for a two-layer architecture, the value of φ is approximately 1.5; that is, the second layer needs approximately 1.5 times the level of available resources in the first layer of the network. It is important to mention that these partial results were used to verify the behavior of the proposed model and the configuration of ns-3. An exhaustive analysis of three-layer hierarchical architectures is presented in the next section.

Estimation of the j constant in a three-layer hierarchical architecture
The following aspects were considered for the estimation of φ in the three proposed simulation scenarios:
1. To achieve a three-layer hierarchical architecture, a configuration similar to that shown in Figure 3 was employed. A third layer was added to serve as an intermediary in the communication between two networks with two-layer hierarchical architectures.
2. The general configuration parameters for the network operation are shown in Table 2.
3. Bandwidths of 1, 2, 6, 9, 12, 18 and 24 Mbps were established as the available resource in the second layer of the network. After that, 20 simulations were performed for each bandwidth, and the results were analyzed through the weak law of large numbers [19], [24] to determine the mean amount of resources in the third layer of the network.
4. Finally, a linear regression through the origin was fitted. The coefficient of determination and a confidence interval for φ with a 5% significance level were calculated to validate the proposed model.
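The regression through the origin mentioned in the last step can be sketched in Python as follows. The second-layer bandwidths are those listed above, but the third-layer values are hypothetical placeholders rather than simulation results, and the coefficient of determination uses the uncentered total sum of squares, which is one common convention for models without an intercept.

```python
def fit_through_origin(x, y):
    """Least-squares slope of y = slope * x and the uncentered coefficient of determination."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    slope = sxy / sxx
    ss_res = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum(b * b for b in y)  # uncentered total, since the model has no intercept
    return slope, 1.0 - ss_res / ss_tot


# Second-layer bandwidths from the text; third-layer means are hypothetical.
bandwidth_layer2 = [1, 2, 6, 9, 12, 18, 24]
resources_layer3 = [1.7, 3.1, 9.8, 14.4, 19.6, 29.0, 38.9]

phi_hat, r_squared = fit_through_origin(bandwidth_layer2, resources_layer3)
print(f"estimated growth factor = {phi_hat:.3f}, R^2 = {r_squared:.4f}")
```

Forcing the line through the origin reflects the proportional growth assumption: no resources in the lower layer imply no additional resources in the upper one.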

Estimation process - Scenario 1
This scenario had 4 clusters with 6 nodes in the first layer and 2 clusters with 2 nodes in the second layer. Table 4 and Figure 5 show the results obtained in the simulation process, as well as a 95% confidence interval for the values of $\mu_1$.
Source: authors' own elaboration

Estimation process - Scenario 2
This scenario had 10 clusters with 3 nodes in the first layer and 8 clusters with 6 nodes in the second layer. Table 5 and Figure 6 show the results obtained in the simulation process, as well as a 95% confidence interval for the values of $\mu_1$.

Estimation process - Scenario 3
This scenario had 8 clusters with 6 nodes in the first layer and 3 clusters with 4 nodes in the second layer. Table 6 and Figure 7 show the results obtained in the simulation process, as well as a 95% confidence interval for the values of $\mu_1$.

Source: authors' own elaboration
Table 7 shows the estimated values of φ for the proposed simulation scenarios. In all three cases, the good fit of the obtained data shows that the proposed model describes the behavior of an ad hoc network under the conditions established in this study.
The growth factor concept verifies the importance of resource distribution among the layers in an ad hoc network. This model can be used to stabilize the performance of a cluster under dynamic conditions. For instance, if the level of resources in the bottom layers changes during the operation of the network, the value of the growth factor indicates by how much the resources of the upper layer need to increase to maintain the performance of the network.

The φ estimated value and the golden ratio
The Golden Ratio is an irrational number discovered in ancient times. The number is not as famous for its usefulness in arithmetic as for its presence in nature, art and geometry and for its relation with the concept of beauty. It was believed that the aesthetic character of objects lay in the expression of this proportion. It is well known as an implicit relation in nature that can be observed in plant growth, in the distribution of leaves on a tree or in the growth of a snail shell.
Its constant presence in nature as a relation between two amounts suggests considering the Golden Ratio as a possible value for the growth factor. If the golden ratio works in natural systems, why should it be excluded from artificial systems?
Formally, let $\varphi_i = \mu_{i+1} / \mu_i$, with $i = 1, 2$, be the growth factor for the $i$-th layer of the network under the relation proposed in equation (9). Thus, keeping in mind the obtained results, it is possible to define a random variable $X_{ij}$ as the $j$-th estimation of $\varphi_i$ for the $i$-th layer of the network, where $X_{ij} \sim N(\varphi_i, \sigma_i^2)$ and $\sigma_i^2$ is an unknown value. In this sense, and considering the possibility that the growth factor can be related to the Golden Ratio, the following hypotheses were proposed:
$$H_0: \varphi_i = \frac{1+\sqrt{5}}{2} \qquad \text{versus} \qquad H_1: \varphi_i \neq \frac{1+\sqrt{5}}{2}$$
Confidence intervals of the form $\bar{X}_i \pm t_{(\alpha/2,\, n-1)}\, S_i / \sqrt{n}$ with $\alpha = 5\%$, constructed from Student's t distribution [22], were then used for each of the simulation scenarios proposed in this research, as shown in Table 7. These results indicate a lack of evidence for rejecting $H_0$. Therefore, it is possible to consider that the value $\frac{1+\sqrt{5}}{2}$ corresponds to the growth factor proposed in equation (9), with $\alpha = 5\%$. This value opens the possibility of determining the best relationship between the level of resources in two successive layers of the ad hoc network.
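The test of the null hypothesis that the growth factor equals the Golden Ratio can be sketched in Python with a one-sample t statistic. The five estimates below are hypothetical and do not reproduce Table 7; the function name is an assumption, and the critical value 2.776 is the standard two-sided 5% quantile of Student's t distribution for 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

GOLDEN_RATIO = (1 + sqrt(5)) / 2


def t_statistic(samples, hypothesized_mean):
    """One-sample t statistic for H0: the population mean equals hypothesized_mean."""
    n = len(samples)
    standard_error = stdev(samples) / sqrt(n)
    return (mean(samples) - hypothesized_mean) / standard_error


# Hypothetical estimates of the growth factor for one layer (n = 5).
estimates = [1.58, 1.66, 1.60, 1.64, 1.62]
t_obs = t_statistic(estimates, GOLDEN_RATIO)

T_CRIT = 2.776  # two-sided 5% critical value, 4 degrees of freedom
reject_h0 = abs(t_obs) > T_CRIT
print(f"t = {t_obs:.3f}, reject H0: {reject_h0}")
```

Failing to reject the null hypothesis, as in this illustrative sample, means that the data are compatible with the Golden Ratio; it does not prove that the growth factor equals it.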

Conclusions
The massive use of wireless devices makes it necessary to develop mechanisms to handle large numbers of nodes without losing quality in the services offered by communication networks. Achieving scalability in ad hoc networks is a challenge in the design of protocols, and it is a necessary condition to obtain ad hoc networks with high deployability.
A model based on the truncated geometric distribution was proposed to characterize the level of resources in an ad hoc network under a hierarchical architecture. This model makes it possible to determine the level of heterogeneity in a cluster and to classify the nodes in terms of their resources.
A growth factor was proposed as a constant value that represents the suitable relationship between the levels of resources in two successive layers of the network. This relationship can be modeled as a proportional growth between the resources and the performance metrics of the network. This result can be useful to complement clustering algorithms based on metric combination and to help maintain the cluster in a stable condition when there are frequent changes in the amount of resources in the network.
The growth factor φ was estimated through simulation for hierarchical architectures of two and three layers, finding a value of $\varphi = \frac{1+\sqrt{5}}{2}$ with a confidence level of 95%. A similarity between the growth factor and the Golden Ratio was shown as a way to inspire the design process of artificial systems based on attributes found in nature. The determination coefficients obtained from the simulation scenarios show a good fit of the results to the proposed model. This investigation can be considered an extended version of the work published in [1], [2]. The main contributions of this paper were the verification of the value of the growth factor for two-layer hierarchical architectures and the exploration of its behavior for three-layer hierarchical architectures. Additionally, we showed through simulation results that it is possible to model the relationship between two successive layers of the network as a linear model.
In future work, we expect to explore the behavior of the growth factor in an ad hoc network with self-similar traffic. This type of traffic appears in wireless networks with different services. We want to explore how the value of the growth factor changes when the ad hoc network supports different traffic flows, each one with different performance metrics.

Figure 1. Flat and hierarchical architectures
In Figure 2, the cluster represented by $Y_1$ shows nodes with different computational capacities (laptops, smartphones and tablets) and thus a heterogeneous resource distribution in the network. According to the proposed model, clusters like these are modeled by values of $p$ close to 1. On the other hand, clusters like $Y_2$, which have a homogeneous resource distribution, are modeled by values of $p$ close to 0.

Table 1. Characterization example for cluster $Y_1$

Table 4. Simulation results for the three-layer hierarchical architecture - Scenario 1

Table 5. Simulation results for the three-layer hierarchical architecture - Scenario 2

Table 6. Simulation results for the three-layer hierarchical architecture - Scenario 3

Table 7. φ estimation results - Three-layer architecture. Source: authors' own elaboration