Figures xv
Tables xix
Equations xxi
Preface and Acknowledgments xxiii
Audience xxiv
Organization xxiv
Acknowledgments xxvi
PART 1 BASICS 1
1 SERVICE, RISK, AND BUSINESS CONTINUITY 3
1.1 Service Criticality and Availability Expectations 3
1.2 The Eight-Ingredient Model 4
1.3 Catastrophic Failures and Geographic Redundancy 7
1.4 Geographically Separated Recovery Site 11
1.5 Managing Risk 12
1.6 Business Continuity Planning 14
1.7 Disaster Recovery Planning 15
1.8 Human Factors 17
1.9 Recovery Objectives 17
1.10 Disaster Recovery Strategies 18
2 SERVICE AVAILABILITY AND SERVICE RELIABILITY 20
2.1 Availability and Reliability 20
2.2 Measuring Service Availability 25
2.3 Measuring Service Reliability 33
PART 2 MODELING AND ANALYSIS OF REDUNDANCY 35
3 UNDERSTANDING REDUNDANCY 37
3.1 Types of Redundancy 37
3.2 Modeling Availability of Internal Redundancy 44
3.3 Evaluating High-Availability Mechanisms 52
4 OVERVIEW OF EXTERNAL REDUNDANCY 59
4.1 Generic External Redundancy Model 59
4.2 Technical Distinctions between Georedundancy and Co-Located Redundancy 74
4.3 Manual Graceful Switchover and Switchback 75
5 EXTERNAL REDUNDANCY STRATEGY OPTIONS 77
5.1 Redundancy Strategies 77
5.2 Data Recovery Strategies 79
5.3 External Recovery Strategies 80
5.4 Manually Controlled Recovery 81
5.5 System-Driven Recovery 83
5.6 Client-Initiated Recovery 85
6 MODELING SERVICE AVAILABILITY WITH EXTERNAL SYSTEM REDUNDANCY 98
6.1 The Simplistic Answer 98
6.2 Framing Service Availability of Standalone Systems 99
6.3 Generic Markov Availability Model of Georedundant Recovery 103
6.4 Solving the Generic Georedundancy Model 115
6.5 Practical Modeling of Georedundancy 121
6.6 Estimating Availability Benefit for Planned Activities 130
6.7 Estimating Availability Benefit for Disasters 131
7 UNDERSTANDING RECOVERY TIMING PARAMETERS 133
7.1 Detecting Implicit Failures 134
7.2 Understanding and Optimizing RTO 141
8 CASE STUDY OF CLIENT-INITIATED RECOVERY 147
8.1 Overview of DNS 147
8.2 Mapping DNS onto Practical Client-Initiated Recovery Model 148
8.3 Estimating Input Parameters 154
8.4 Predicted Results 165
8.5 Discussion of Predicted Results 172
9 SOLUTION AND CLUSTER RECOVERY 174
9.1 Understanding Solutions 174
9.2 Estimating Solution Availability 177
9.3 Cluster versus Element Recovery 179
9.4 Element Failure and Cluster Recovery Case Study 182
9.5 Comparing Element and Cluster Recovery 186
9.6 Modeling Cluster Recovery 187
PART 3 RECOMMENDATIONS 201
10 GEOREDUNDANCY STRATEGY 203
10.1 Why Support Multiple Sites? 203
10.2 Recovery Realms 204
10.3 Recovery Strategies 206
10.4 Limp-Along Architectures 207
10.5 Site Redundancy Options 208
10.6 Virtualization, Cloud Computing, and Standby Sites 216
10.7 Recommended Design Methodology 217
11 MAXIMIZING SERVICE AVAILABILITY VIA GEOREDUNDANCY 219
11.1 Theoretically Optimal External Redundancy 219
11.2 Practically Optimal Recovery Strategies 220
11.3 Other Considerations 228
12 GEOREDUNDANCY REQUIREMENTS 230
12.1 Internal Redundancy Requirements 230
12.2 External Redundancy Requirements 233
12.3 Manually Controlled Redundancy Requirements 235
12.4 Automatic External Recovery Requirements 237
12.5 Operational Requirements 242
13 GEOREDUNDANCY TESTING 243
13.1 Georedundancy Testing Strategy 243
13.2 Test Cases for External Redundancy 246
13.3 Verifying Georedundancy Requirements 247
13.4 Summary 254
14 SOLUTION GEOREDUNDANCY CASE STUDY 256
14.1 The Hypothetical Solution 256
14.2 Standalone Solution Analysis 259
14.3 Georedundant Solution Analysis 263
14.4 Availability of the Georedundant Solution 269
14.5 Requirements of Hypothetical Solution 269
14.6 Testing of Hypothetical Solution 277
Summary 285
Appendix: Markov Modeling of Service Availability 292
Acronyms 296
References 298
About the Authors 300
Index 302

Eric Bauer is Reliability Engineering Manager in the IMS Solutions Organization of Alcatel-Lucent, where he focuses on reliability of Alcatel-Lucent's IMS solution and the network elements that comprise the IMS solution. He has written Design for Reliability: Information and Computer-Based Systems and Practical System Reliability.

Randee Adams is a Consulting Member of Technical Staff in the Applications Group of Alcatel-Lucent. Currently, she is focusing on reliability for Alcatel-Lucent's software applications.

Daniel Eustace is a Distinguished Member of Technical Staff in the IMS Solutions Organization of Alcatel-Lucent. Currently, he is a solution architect focusing on reliability, key quality indicators, geographical redundancy, and call processing.