Every company that does business over the web, over a WAN, or through a data center must have a plan to protect connectivity and assets in case any data center components fail. The key is having a good business continuity and disaster recovery (BC/DR) solution in place. Over the last few months I’ve been working on a solution for architecting the network for disaster avoidance and recovery, and I delivered a webinar about it just days before Hurricane Sandy landed on the East Coast. It’s not often that a topic is so relevant. As I watched the news a couple of days later, it was shocking to see what had happened to people’s neighborhoods. My first reaction was to hope that rescue efforts were underway and that people would be safe. As a few days went by, I wondered how people were doing with getting their businesses up and running again. I wrote about the need for good disaster planning in a blog post in early September, see link.
What We Have Learned From Our Conversations

At Juniper we have talked with many organizations about their networks and their business continuity situation, and I’d like to share some of what we learned. Many organizations tell us that they have grown organically as well as through acquisitions, and as a result they are concerned about inconsistent IT management policies. They have inherited a range of applications from the organizations they acquired, and since they are often in high-growth businesses that add applications for special projects, they see server and application sprawl. Often a new CIO will initiate a series of checks on the infrastructure to see how policies can be normalized and IT management streamlined. Sometimes the news of a natural disaster prompts a review of the BC/DR plan.
Challenges Confronting the Organization

What organizations often find is that they are confronted with a number of challenges. They might build infrastructure without clearly identifying their application needs, which results in poorly defined application SLAs instead of strict requirements with metrics. Many times they deploy infrastructure in an ad hoc manner without consistent policies. The result is many failure points, as well as difficulty managing and provisioning the network. Poor link utilization, with links that are frequently idle, is another consequence. They often have a distributed authentication, authorization, and enforcement infrastructure. The result is complex firewall policies that prevent user-specific enforcement and that are deployed based on local data center IT policies rather than global policies. These inconsistent policies for users and application access result in security holes.
Many organizations don’t have automated backup and instead rely on manual backup and configuration synchronization. In the event of a failure they manually restore configurations between the two data centers. The result is inconsistent states, which affect the user experience, since policies can be out of sync due to the time it takes to restore them. Legacy applications are often impacted by outages because their hard-coded IP addresses prevent them from being replicated and made to work in new locations. Since data is coming from locations such as branch offices and call centers, traffic can vary greatly, and this will cause congestion in the case of a link failure. And because traffic might not be prioritized based on application relevance, lower-priority applications can impact the performance of critical applications.
Requirements for New BC/DR Architecture

After we reviewed the challenges and their impact on the business, we considered the requirements for business continuity and disaster recovery. We found that organizations realize they need to improve their customers’ experience when using applications. Infrastructure needs to be configured with centralized policy administration and enforcement to eliminate inconsistencies and keep policies synchronized. Organizations also realize that deploying every application in an active-active setup is expensive, so they want to deploy only the most critical applications in active-active mode. To ensure application performance and replication over the WAN, the infrastructure needs to scale dynamically to accommodate varying amounts of bandwidth, especially since bandwidth demands increase greatly during a disaster due to backup activity and user failover to new locations. Finally, the BC/DR plan needs to allow remote workers and partners to collaborate securely and easily even when they can’t reach their traditional collaboration tools.
Juniper’s Recommended Six-Step Approach

To address the need for a robust data center and network solution, Juniper has created a set of six steps to improve your network. These are:
1. Analyze application workflows to ensure proper prioritization of application availability requirements
2. Simplify and centralize the network architecture to minimize the number of failure points
3. Synchronize data to ensure that applications are available in the backup location
4. Monitor network performance to ensure application performance
5. Build network resiliency on the WAN so that it can fail over rapidly to minimize data loss
6. Redirect users rapidly and securely to the new destination to enable user access
Why These Steps Are Important
Analyze Application Workflows - Determining application priorities ensures that applications can be categorized for replication, while understanding application dependencies ensures that they are available in the backup location. Identifying user access privileges ensures that those privileges are retained at the backup location.
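As a simple illustration of categorizing applications for replication, here is a minimal Python sketch that maps an application’s recovery time objective (RTO) to a replication strategy. The `App` class, tier names, and thresholds are hypothetical, chosen only to show the idea of prioritizing the most critical applications for active-active deployment:

```python
from dataclasses import dataclass, field

@dataclass
class App:
    name: str
    rto_minutes: int                  # recovery time objective: tolerable downtime
    depends_on: list = field(default_factory=list)

def replication_tier(app: App) -> str:
    """Map an application's RTO to a (hypothetical) replication strategy."""
    if app.rto_minutes <= 15:
        return "active-active"        # most critical: run live in both data centers
    if app.rto_minutes <= 240:
        return "async-replication"    # replicate data, fail over on demand
    return "backup-restore"           # restore from backups after a disaster
```

The `depends_on` field is a reminder that an application’s tier should be at least as high as that of anything it depends on.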
Network Architecture Simplification - Reducing the number of devices reduces the number of potential failure points and simplifies provisioning while centralizing control. Consistent policies across data centers mean that users get consistent access to application resources, and consistent policy administration improves security.
Synchronize Data Migration – To ensure availability you need to replicate data cost effectively and reliably from the primary to the backup location under changing loads. This requires interoperability with your existing equipment and the resiliency of carrier-class technology that is designed to rapidly scale to meet your needs.
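To make the replication idea concrete, here is a minimal Python sketch of checksum-based incremental synchronization from a primary to a backup location: only files whose contents differ are copied. The function names are hypothetical, and a real deployment would use dedicated storage replication rather than file copies:

```python
import hashlib
import shutil
from pathlib import Path

def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def incremental_sync(primary: Path, backup: Path) -> list:
    """Copy only changed files to the backup; return the relative paths copied."""
    copied = []
    for src in primary.rglob("*"):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(primary)
        if not dst.exists() or file_digest(src) != file_digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(str(src.relative_to(primary)))
    return copied
```

Because unchanged files are skipped, repeated runs under steady load transfer little data, which is the property that makes replication affordable over a WAN.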
Monitoring Network Performance - Cost-effective tools are needed to monitor network usage so that you can prioritize your business-critical applications. Understanding network behavior and application usage lets you provision your network proactively and keep costs down.
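As a rough illustration of usage monitoring, the sketch below computes link utilization from two successive interface octet-counter readings (the kind of data an SNMP poller returns) and flags links above a threshold. The function names and the 80 percent default threshold are assumptions for illustration:

```python
def utilization_pct(prev_octets: int, curr_octets: int,
                    interval_s: float, link_bps: int) -> float:
    """Percent utilization from two successive interface octet counters."""
    bits_sent = (curr_octets - prev_octets) * 8
    return 100.0 * bits_sent / (interval_s * link_bps)

def congested_links(samples: dict, interval_s: float, link_bps: int,
                    threshold_pct: float = 80.0) -> list:
    """Return names of links whose utilization exceeds the threshold.

    samples maps link name -> (previous_octets, current_octets).
    """
    return [name for name, (prev, curr) in samples.items()
            if utilization_pct(prev, curr, interval_s, link_bps) > threshold_pct]
```

For example, a 1 Gb/s link that moved 1.125 GB in a 10-second polling interval is running at 90 percent utilization and would be flagged for attention before it affects critical applications.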
Build Network Resiliency - Building resiliency minimizes the data loss that can result from network failures. Traffic between data centers should be routed through the least congested paths to minimize delay, and optimal routing should account for dynamic changes in traffic loads, minimizing the impact of congestion or failure.
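One way to picture congestion-aware path selection is Dijkstra’s algorithm with link costs that grow as utilization rises, so traffic prefers less congested paths. The graph format and cost formula below are illustrative assumptions, not how any particular routing protocol computes paths:

```python
import heapq

def least_congested_path(graph: dict, src: str, dst: str) -> list:
    """Dijkstra's shortest path, with link cost = delay scaled by congestion.

    graph maps node -> [(neighbor, base_delay_ms, utilization_0_to_1), ...].
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == dst:
            break
        for nbr, delay, util in graph.get(node, []):
            # Penalize congested links: cost grows sharply as utilization nears 1.
            cost = delay / max(1.0 - util, 0.01)
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return list(reversed(path))
```

With this cost function, a nominally faster link that is 90 percent utilized loses to a slower but idle alternative, which is the behavior you want when a failure shifts load between data centers.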
Rapidly Redirect Users - You need a way to make sure that your legacy applications can be moved in the event of a disaster. Your remote users and partners need to connect securely with minimal delay even during peak network loads, and you need secure connectivity and access to collaboration applications.
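User redirection is often handled with DNS-style failover: resolve a service name to the highest-priority site that passes its health check. The sketch below is a toy resolver; the priority scheme, addresses, and health map are hypothetical:

```python
def resolve_service(sites: list, health: dict) -> str:
    """Return the address of the highest-priority healthy site.

    sites: list of (priority, address) pairs, where a lower priority value wins;
    health: address -> bool, as reported by an out-of-band health checker.
    """
    for _, addr in sorted(sites):
        if health.get(addr, False):
            return addr
    raise RuntimeError("no healthy site available")
```

When the primary data center fails its health check, new lookups transparently return the backup site’s address, so users are redirected without reconfiguring their clients.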