This is a guest blog post. Views expressed in this post are original thoughts posted by Glen Kemp, Solutions Consultant at SecureData Europe. These views are his own and in no way do they represent the views of the company he works for.
A customer recently asked me to provide a proposal for a consolidated data centre with some fairly fixed requirements. It had to fit in a fixed budget, a fixed amount of rack space and host a fixed number of servers.
It was the same week that Juniper Networks launched the EX4550 Ethernet Switch. For me, this represented the last piece of the puzzle in terms of what data centre networks will increasingly look like. So when my customer said, “Design me a data centre; I’ll need it operational within a month”, what did I propose? Well, for a start, a shopping list much shorter and a timescale much less pessimistic than you’d think.
Breaking it down
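The modern data centre can be broken down into five basic components: the core fabric, the compute platform, the storage, the security layer and the utility (power, cooling and rack space).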
Starting with a clean sheet of paper, how do we build a new data centre from these five basic components? If we select a single vendor and technology for each component, we can standardise upon off-the-shelf components. This reduces the quantity and classes of physical hardware needed; in turn, this makes redundancy simpler and leaves less equipment idling. Here’s the design I put together:
At the heart there has to be a core fabric; in our case, an Ethernet switch. Whilst Ethernet is an imperfect transmission medium, it’s fast, open and very easy to scale. The EX4550 is an ideal network core as it allows us to collapse the classic access, aggregation and core layers into a single pair of switches. A single rack unit can directly connect our compute platform to our storage at wire rate whilst handling full layer 3 routing. Each switch provides a frankly mind-boggling 480Gbps of bandwidth, and the pair is linked via a 256Gbps Virtual Chassis backplane. Whilst I can quote speeds and feeds till I’m blue in the face, what makes this truly impressive is the list price: $19,000. This next-generation switch, with its full layer 3 routing features, provides ample capacity for our “east-west” server-to-server and server-to-storage traffic.
The Compute Platform
A quad-socket Intel Xeon E7 server can provide 40 processing cores in 2u of rack space. This provides a platform to host a very large number of virtualised servers in a hyper-dense environment. The servers themselves would essentially be diskless, save for an industrial flash drive to boot the hypervisor. To connect the platform to the fabric, 10GbE direct attach copper (DAC) cables provide inexpensive, straightforward connectivity without having to mess around with expensive GBICs and delicate fibre. Four connections are all that’s required to link each server to the fabric: two dedicated to server traffic and two to storage I/O. Capacity planning is obviously critical, but the partnership of new processor and RAM technologies such as Intel Xeon E7 and LRDIMMs makes this kind of server density not only feasible, but practical. As few as four servers would provide 160 cores (320 hardware threads, assuming Hyper-Threading is enabled) in less than a quarter of a standard 42u rack. A slightly less avant-garde approach would be to use dual-socket, 1u servers with slightly fewer cores. This would provide greater redundancy and lessen the impact of a single server failure, at the cost of increasing the number of physical connections.
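The density figures above are easy to sanity-check. The following is a rough Python sketch, not a capacity-planning tool: the per-server numbers come from the text, while `THREADS_PER_CORE = 2` is an assumption (Hyper-Threading enabled) and the function name is purely illustrative.

```python
# Back-of-envelope footprint for the compute platform.
CORES_PER_SERVER = 40       # quad-socket Xeon E7, as quoted in the text
THREADS_PER_CORE = 2        # ASSUMPTION: Hyper-Threading enabled
RACK_UNITS_PER_SERVER = 2   # 2u chassis, as quoted in the text
LINKS_PER_SERVER = 4        # two for server traffic, two for storage I/O
LINK_SPEED_GBPS = 10        # 10GbE DAC links
RACK_HEIGHT_U = 42          # standard rack


def compute_footprint(servers: int) -> dict:
    """Return cores, threads, rack usage and fabric links for N servers."""
    rack_units = servers * RACK_UNITS_PER_SERVER
    links = servers * LINKS_PER_SERVER
    return {
        "cores": servers * CORES_PER_SERVER,
        "threads": servers * CORES_PER_SERVER * THREADS_PER_CORE,
        "rack_units": rack_units,
        "rack_share": rack_units / RACK_HEIGHT_U,
        "links": links,
        "edge_bandwidth_gbps": links * LINK_SPEED_GBPS,
    }


footprint = compute_footprint(4)
for name, value in footprint.items():
    print(f"{name}: {value}")
```

For four servers this gives 160 cores in 8u, with 16 fabric links carrying at most 160Gbps between them, which is comfortably inside what a pair of switches can absorb.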
Separating the storage from the compute platform is commonplace; however, there are different ways of achieving this. The iSCSI standard is commonly deployed and supported by many vendors. It allows flexibility of deployment without proprietary lock-in of the storage adaptor, transit switch or disk array. By again utilising “off-the-shelf” technology, expensive cards for the servers and dedicated Fibre Channel switches can be avoided. Whilst Fibre Channel probably represents the highest-performance solution for disk access, it doesn’t offer the value for money or flexibility that an iSCSI solution would. Given the need for a high-performance Ethernet switch for the server interconnect, utilising iSCSI to connect the compute platform to the storage represents a simple, elegant solution.
The Security Layer
By virtualising most, if not all, of the application servers into the compute platform, we create a new problem. Most virtualised environments place the servers on the same layer 2 subnet. Whilst this is “easy”, it doesn’t provide a straightforward way to police server-to-server network traffic or provide in-line intrusion detection. Artificially breaking a virtualised network into subnets is painful; routing that traffic via a firewall (either physical or virtual) almost certainly creates a performance bottleneck and definitely increases the complexity. A more practical solution is to use a network security tool designed for the job at hand: a virtual security gateway integrated into the hypervisor, such as Juniper’s vGW. The traditional approach of a “firewall in a virtual machine” is not a particularly efficient method of forwarding traffic; it forces a virtualised environment to behave like a physical one, a square peg in a round hole. By hooking into the VMware native APIs, vGW can directly intercept the network I/O of each virtualised machine as it heads down the network stack, making the forwarding decision at the point of egress rather than taking a traditional “default gateway” approach. This allows much higher performance than forcing the traffic to an external firewall. vGW acts as a transparent layer 2 bridge between the guest and the virtual switch. This approach is also more cost-effective than purchasing a dedicated firewall cluster capable of handling inter-server communication. Ultimately, this means layer 3 routing decisions can be handled at layer 3 by a device designed to handle such high throughputs: the Juniper EX4550 switch.
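To make the idea of policing traffic at the point of egress a little more concrete, here is a toy Python model. It is emphatically not vGW’s API or configuration syntax; the `Rule` class, the `is_allowed` function and the addresses are all hypothetical. It only illustrates that two guests on the same layer 2 subnet can be policed per flow without re-subnetting or hair-pinning traffic through a default gateway.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """Hypothetical per-flow policy entry (not a vGW object)."""
    source: str        # CIDR the sender must fall within
    destination: str   # CIDR the receiver must fall within
    port: int          # destination port
    allow: bool        # verdict if the rule matches


def is_allowed(rules, src, dst, port, default=False):
    """First matching rule wins; unmatched traffic gets the default verdict."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.source)
                and ip_address(dst) in ip_network(rule.destination)
                and port == rule.port):
            return rule.allow
    return default


# All three guests share 10.0.0.0/24, yet each flow is judged individually.
rules = [
    Rule("10.0.0.10/32", "10.0.0.20/32", 3306, True),   # web server -> database
    Rule("10.0.0.0/24", "10.0.0.20/32", 3306, False),   # nobody else -> database
]

print(is_allowed(rules, "10.0.0.10", "10.0.0.20", 3306))  # web server to database
print(is_allowed(rules, "10.0.0.11", "10.0.0.20", 3306))  # another guest, same subnet
```

The point of the sketch is where the decision is made: the check runs against each guest’s own traffic as it leaves the virtual NIC, so the physical switch never has to carry a flow that was always going to be dropped.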
vGW Kernel Interaction (from the vGW documentation)
Time to Live
One of the benefits of this methodology is that the data centre build-out time is significantly reduced. Fewer physical devices mean less network cabling, less power and less cooling. The initial “stand up” of a new data centre deployment is perhaps the slowest and most painful part. Once the basic connectivity is addressed, the application, network and security teams can start their individual tasks. If there is less to “stand up”, the “time to live” can be radically reduced and project life cycles shortened.
The amazing thing about this design is that there is very little actual equipment to deploy. In fact, the total rack budget is tiny:
2x Juniper Networks EX4550 switches = 2u
4x Dell PowerEdge™ R820 servers = 8u
2x Dell PowerVault™ MD3600i 10GbE iSCSI arrays with 12x 600GB 15K SAS drives = 4u
Total rack space required = 14u
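The rack budget is simple enough to check by hand, but a few lines of Python make it easy to re-run when the shopping list changes. This is only a sketch: the quantities and rack heights are those listed above, and the variable names are illustrative.

```python
RACK_HEIGHT_U = 42  # standard full-height rack

# (item, quantity, rack units per item), taken from the list above
bill_of_materials = [
    ("Juniper EX4550 switch", 2, 1),
    ("Dell PowerEdge server", 4, 2),
    ("Dell PowerVault MD3600i array", 2, 2),
]

total_u = sum(quantity * units for _, quantity, units in bill_of_materials)
rack_share = total_u / RACK_HEIGHT_U

for item, quantity, units in bill_of_materials:
    print(f"{quantity}x {item}: {quantity * units}u")
print(f"Total: {total_u}u ({rack_share:.0%} of a {RACK_HEIGHT_U}u rack)")
```

In other words, the whole design occupies a third of a single rack.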
This design is obviously created with a fixed capacity in mind, but it can easily be scaled upwards to accommodate more physical and logical servers. Because the EX4550 is designed as a “top of rack” switch, up to 10 switches can be combined in the same Virtual Chassis should you need to support dozens if not hundreds of physical hosts.
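How far that scales depends mostly on port count. The sketch below estimates the number of hosts a Virtual Chassis could attach; the limit of 10 switches comes from the text, but the figure of 32 usable 10GbE ports per switch is an assumption that should be checked against the datasheet and any expansion modules fitted. The function name and the reserved-port parameter are illustrative.

```python
MAX_VC_MEMBERS = 10     # Virtual Chassis limit quoted in the text
PORTS_PER_SWITCH = 32   # ASSUMPTION: usable 10GbE ports per switch


def max_hosts(switches: int, ports_per_switch: int,
              links_per_host: int, reserved_ports: int = 0) -> int:
    """Hosts that fit once storage, uplink and similar ports are set aside."""
    if switches > MAX_VC_MEMBERS:
        raise ValueError("more switches than a single Virtual Chassis allows")
    usable_ports = switches * ports_per_switch - reserved_ports
    return max(usable_ports, 0) // links_per_host


# Starting pair, with eight ports held back for storage arrays and uplinks
print(max_hosts(2, PORTS_PER_SWITCH, links_per_host=4, reserved_ports=8))
# A full Virtual Chassis, with hosts using four links or two links each
print(max_hosts(10, PORTS_PER_SWITCH, links_per_host=4))
print(max_hosts(10, PORTS_PER_SWITCH, links_per_host=2))
```

Under these assumptions, hosts with four links each land in the “dozens” range, while reaching the hundreds would need fewer links per host or more ports per switch.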
Ultimately, by consolidating the core of the infrastructure, you end up with a much simpler deployment. This means you save in real terms on the fifth important part of the data centre, the bit which costs you money and on which you never see any return: the utility. Fewer devices mean less power draw, less rack space to rent, fewer network changes to make and fewer mechanical components to fail. The upshot is that the new network is physically and logically smaller than the one it will replace. It will cost less to maintain and be faster to “stand up”. Anyone who has a pricing model based upon physical hosts deployed or number of rack units powered is very shortly going to run into a series of very big problems.
So, what aspects have I missed out? Where is the Achilles heel in my plan? What other innovations are out there which I’ve not mentioned? I’d be pleased to hear about them in the comments section below.