Abstract:
Interconnection networks plays an important role for the performance of
modern high performance computing systems. It consists of a series of
nodes and links. Each node interact with each other for communications
through links. The interconnection network is a requirement of any paral-
lel computer because parallel system shows high performance by providing
reliable and quick communication over the networks. In this context, com-
ponents failures have an extremely high impact because most of the routing
algorithms have not been designed to tolerate faults. Since, one link and/or
node failure may halt the entire computing system and stop the scientific
applications running on them.
In this thesis, we present fault-tolerant routing algorithms based on adap-
tive protocols. Adaptive routing protocols can use alternative paths between
communicating nodes. Multipath networks and adaptive routing protocols
dynamically adapt to network conditions, thus capable of serving intercon-
nection networks affected by a large number of node/link failures. Three
contributions are presented throughout this thesis, namely: fault-tolerant
distributed node-to-node routing, fault-tolerant node-to-set disjoint-path
routing, and reliable broadcasting via independent spanning trees.
The aim of this thesis is to study, implement and evaluate fault-tolerant
routing algorithms for the hypercube interconnection network. We have
addressed and designed fault tolerant routing algorithms in the presence of
high number of node and/or link faults. We addressed this issue by de-
signing adaptive routing protocols for hypercube interconnection networks.
This technique addresses network latency and bandwidth utilization for
parallel architectures. Adaptive routing algorithms exploit gains of path
redundancy in n-cube.
The first contribution of this thesis is the adaptive fault-tolerant routing
algorithm for hypercube topology. This algorithm has been designed in
such a way that they can use alternative path available in hypercube topol-
ogy, making more efficient use of network bandwidth and allowing hyper-
cube networks to perform in the presence of large number of faults. The
proposed algorithm is a simple uniform distributed algorithm that can tol-
erate a large number of process failures, while delivering all n messages
over optimal-length disjoint paths. However, no distributed algorithm uses
acknowledgement messages (acks) for fault tolerance. So, for dealing the
faults, acknowledgement messages (acks) are included in the proposed al-
gorithm for routing messages over node-disjoint paths in the hypercube
network. Simulation results confirm that the proposed node-to-node rout-
ing algorithm provides an average of 10% improvement in the performance
of hypercube network in comparison with the previously proposed routing
algorithms–depth first search algorithm and unsafety vectors algorithm.
The second contribution is the node-to-set node-disjoint routing algorithm
for the hypercube networks with faulty nodes. This algorithm has been de-
signed to the problems of the disjoint shortest paths routing. The proposed
algorithm can tolerate maximum n− 1 faulty nodes, where n is the dimen-
sion of the hypercube. The proposed NoSeRo algorithm used the subcube
property of the n-dimensional hypercube. It adapts divide-and-conquer ap-
proach to take full advantage of the regularity of the hypercube. Hence,
proposed algorithm generates fault-free node-disjoint paths in a faulty en-
vironment. The proposed fault tolerant routing algorithm for faulty hy-
percube networks which finds n disjoint paths from source process s to n
destination processes in n-dimensional hypercube in O(n2) time with opti-
mal path lengths at most n + f + 1, where n is the number of destination
node and f is the number of faulty nodes. Then simulation results showed
that the proposed algorithm reduce the average path length by about 20%
in comparison of Bossard’s algorithm in 8-dimensional hypercube (H8).
The third contribution is the reliable data broadcasting scheme by gener-
ating Independent Spanning Trees (ISTs) on hypercubes. The proposed
scheme can be useful for secure message transmission. Using n-IST-based
broadcasting from same root r on hypercube network (N = 2n) provides n-
degree fault tolerance. The proposed algorithm can be easily implemented
in parallel or distributed systems. Using ISTs one can enhance the fault-
tolerance, bandwidth, and security. In this chapter, we study the existence
and construction of n ISTs rooted at an arbitrary vertex in Hn(n 1). A
parallel algorithm with the time complexity O(n) is proposed to construct
n ISTs on Hn, where n 1.
In this thesis, we have designed, implemented and evaluated the different
algorithms for fault-tolerant routing. The methodologies of all the algo-
rithms are based on existing theories, knowledge and observations. The
algorithms proposed in this thesis were developed on a theoretical basis
and were implemented practically. With the help of relevant books and
related research papers, we have focussed on the design, implementation,
and evaluation of similar algorithms with efficient complexity. We have
analysed the effectiveness of all the proposed fault-tolerant routing algo-
rithms through simulation. For this, we have developed simulation models
for experimental evaluation of our propositions.
All of the proposals made in this thesis are suitable (without hardware mod-
ification) to be implemented on currently personal computing systems and
all the proposed algorithms are able to tolerate dynamically a reasonable
number of faults.