Redundancy Schemes for High Availability Computer Clusters
Abstract
The primary goal of computer clusters is to improve computing performances by taking advantage of the parallelism they intrinsically provide. Moreover, their use of redundant hardware components enables them to offer high availability services. In this paper, we present an analytical model for analyzing redundancy schemes and their impact on the cluster’s overall performance. Furthermore, several cluster redundancy techniques are analyzed with an emphasis on hardware and data redundancy, from which we derive an applicable redundancy scheme design. Also, our solution provides a disaster recovery mechanism that improves the cluster’s availability. In the case of data redundancy, we present improvements to the replication and parity data replication techniques for which we investigate the availability of the cluster under several scenarios that take into account, among other things, the number of replicated nodes, the number of CPUs that hold parity data and the relation between primary and replicated data. For this purpose, we developed a simulator that analyzes the impact of a redundancy scheme on the processing rate of the cluster. We also studied the performance of two well-known schemes according to the usage rate of the CPUs. We found that two important aspects influencing the performance of a transaction-oriented cluster were the cluster’s failover and data redundancy schemes. We simulated several data redundancy schemes and found that data replication offered higher cluster availability than the parity model.
DOI: https://doi.org/10.3844/jcssp.2006.33.47
Copyright: © 2006 Christian K. Bassek, Samuel Pierre and Alejandro Quintero. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
- 3,542 Views
- 2,598 Downloads
- 4 Citations
Download
Keywords
- Computer cluster
- high availability
- redundancy scheme
- performance evaluation
- fault tolerance