Authored by Mr. Manish Gupta, Vice President and GM, Infrastructure Solutions Group, Dell Technologies
There’s an old saying: you can’t manage what you can’t measure. That certainly applies to Dell Digital’s new infrastructure capacity planning and forecasting effort, which helps us keep pace with record organic growth and new business demand for IT services inside Dell.
In the fall of 2020, we began building a team to address the challenge that Dell Digital, Dell’s IT organization, needed better forecasting guidance to stay ahead of our growing infrastructure demand.
Our organic growth of existing IT systems, which normally runs around 8 to 10 percent year-over-year, had exceeded 41 percent. We also faced rapid new internal business growth from new products and an internal self-service catalog.
However, our data on existing and projected IT resource needs was spread out in separate locations across our organization with no cohesive measurement standards or analysis strategy. We relied on manual processes and spreadsheets to perform infrastructure capacity planning and forecasting.
We built a capacity forecasting team of data scientists and analysts to review data spanning our current systems, organic growth trends, business demands and future resource insights. The team not only aggregated historic, current and projected resource data from different sources into a central data repository but also created a uniform measurement model to provide better clarity for data center forecasts and transparency for users.
The result is an automated planning and forecasting model that enables Dell Digital to maintain six months of on-demand capacity and lets us keep an 18-month rolling forecast for our supply chain.
Forecasting with T-shirt Sizes
We started creating our capacity forecasting model by looking at what our systems were doing and what we predicted they’d do down the road. We looked at organic growth over time plus internal business growth and worked with our business segments to understand what they plan to do.
The next step was converting that data into a demand forecast so that we could signal increases or decreases in capacity requirements both to manufacturing and to our interlock teams, including those that manage our facilities, power and rack space. We also aimed to tell our manufacturers what demand was projected to be 18 months in advance.
As we mapped our forecasting strategy, we decided we needed a standard measurement of our infrastructure use to track current and future capacity based on how users consume IT resources rather than on individual infrastructure components. We chose an increasingly popular and friendly measurement technique based on T-shirt sizes. T-shirt sizing is a capacity planning tool in which you assign each project or task a T-shirt size—from extra small to double extra-large (XXL)—to represent its scope or scale.
For example, an extra small T-shirt might be a sandbox or proof-of-concept environment. An extra-large T-shirt would be a full-scale, full-size production series of databases for a major project.
This planning measurement approach lets us tie our forecasting strategy to how our team members consume our IT infrastructure through our self-service Dell Digital Cloud Portal. Since users are consuming our products in T-shirt sizes, it makes sense to plan our capacity in T-shirt sizes. We then take that data and use AI and ML algorithms to help us spot trends and create forecasts.
Each size takes up a defined space and has an associated cost, which is essential to staying within budget from a CapEx perspective.
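To make the idea of a defined footprint and cost per size concrete, here is a minimal Python sketch. Everything in it is illustrative: the resource figures, the monthly costs and the names `TShirtSize`, `CATALOG` and `total_footprint` are assumptions made for this example, not Dell Digital's actual catalog or tooling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TShirtSize:
    """One catalog entry: a fixed resource footprint plus a cost."""
    name: str
    vcpus: int
    memory_gb: int
    storage_gb: int
    monthly_cost: float  # hypothetical internal cost per unit


# Hypothetical catalog: each size simply doubles the one before it.
CATALOG: Dict[str, TShirtSize] = {
    "XS": TShirtSize("XS", 2, 4, 50, 40.0),
    "S": TShirtSize("S", 4, 8, 100, 80.0),
    "M": TShirtSize("M", 8, 16, 200, 160.0),
    "L": TShirtSize("L", 16, 32, 400, 320.0),
    "XL": TShirtSize("XL", 32, 64, 800, 640.0),
    "XXL": TShirtSize("XXL", 64, 128, 1600, 1280.0),
}


def total_footprint(demand: Dict[str, int]) -> Dict[str, float]:
    """Sum the resources and cost implied by a count of T-shirts per size."""
    totals = {"vcpus": 0, "memory_gb": 0, "storage_gb": 0, "monthly_cost": 0.0}
    for size_name, count in demand.items():
        size = CATALOG[size_name]  # raises KeyError for an unknown size
        totals["vcpus"] += size.vcpus * count
        totals["memory_gb"] += size.memory_gb * count
        totals["storage_gb"] += size.storage_gb * count
        totals["monthly_cost"] += size.monthly_cost * count
    return totals


# Example: a business group asks for 10 XS, 5 M and 2 XL T-shirts.
request = {"XS": 10, "M": 5, "XL": 2}
footprint = total_footprint(request)
```

Because every request reduces to a count per size, the same function can answer both the budget question (total cost) and the capacity question (total compute, memory and storage).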
By using T-shirt sizes, we can be more transparent with users about what they consume and more precise in forecasting capacity needs in our data centers. We work with internal business groups to determine how many T-shirts of each size they need and help them understand what they can get for their money.
A T-shirt could be any number of infrastructure products including virtual machines, containers or even functions as a service. Each of our business segments has unique ownership of our internal applications, and once we understand the behaviors of those applications, how they break down into T-shirt sizes, we can start to better understand the future.
We now standardize our multicloud experiences by leveraging T-shirt sizing across Dell Digital to enable a consistent experience spanning our private cloud and our public cloud offerings.
Sizing Up the Big Picture
Using our T-shirt model, we took quarterly and yearly historical trends, converted them into the number of T-shirts we consume in each of our 27 data centers and scaled up from there.
In the process, we’re looking at everything from how much power and space we have to how many racks we need and how much power must go to each rack for our substrate design.
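As a rough illustration of turning historical T-shirt counts into a forward projection, the sketch below fits a straight line to quarterly counts and extends it six quarters (18 months) ahead. This is a deliberately simple stand-in: the article does not describe the actual AI and ML models in use, and the sample data and function names (`fit_linear_trend`, `forecast`) are hypothetical.

```python
from typing import List, Tuple


def fit_linear_trend(history: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through (0, y0), (1, y1), ...

    Returns (slope, intercept).
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two historical points")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(history: List[float], periods: int) -> List[float]:
    """Extend the fitted trend line `periods` steps past the history."""
    slope, intercept = fit_linear_trend(history)
    start = len(history)
    return [intercept + slope * (start + i) for i in range(periods)]


# Hypothetical quarterly counts of one T-shirt size in one data center.
quarterly_counts = [100, 110, 120, 130]
next_six_quarters = forecast(quarterly_counts, 6)  # six quarters = 18 months
```

Running the same projection per size and per data center, then summing, gives a fleet-wide view. A real model would also need to account for seasonality and step changes from new business demand, which a straight line cannot capture.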
We’ve created a 15-year model around how we think we need to grow and scale our data centers. This includes new data centers coming online, older data centers spinning down and migration models for those transitions.
Overall, we’re looking at our capacity and data center strategy from a big-picture perspective over five, 10 and 15 years.
At the very outset of our team’s forecasting work, we discovered and addressed a critical IT resource need. The data was clearly telling us a story. Analyzing data from multiple sources and looking beyond our previous incremental growth assessment, we realized that we only had about an 18-month runway before we would be constrained in our current data centers.
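A runway estimate of this kind can be approximated with simple compound-growth arithmetic. The sketch below assumes usage grows at a constant annual rate and solves for the month in which it reaches installed capacity. The function name `months_of_runway` and the usage and capacity figures are hypothetical; the 41 percent rate is borrowed from the organic growth figure quoted earlier, and the article does not say that this was the calculation the team used.

```python
import math


def months_of_runway(used: float, capacity: float, annual_growth: float) -> float:
    """Months until `used` reaches `capacity` under compound growth.

    Solves used * (1 + annual_growth) ** (m / 12) == capacity for m.
    """
    if used <= 0 or capacity <= 0:
        raise ValueError("used and capacity must be positive")
    if used >= capacity:
        return 0.0  # already constrained
    if annual_growth <= 0:
        return math.inf  # flat or shrinking usage never hits the limit
    return 12 * math.log(capacity / used) / math.log(1 + annual_growth)


# Hypothetical example: 600 of 1,000 units in use, growing 41 percent a year.
runway = months_of_runway(used=600, capacity=1000, annual_growth=0.41)
```

With these illustrative inputs the result lands just under 18 months, which shows how quickly a growth rate far above the historical 8 to 10 percent shortens the window for adding capacity.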
Armed with this new data, Dell Digital was able to spin up two new data centers within 90 days to meet our urgent demand for storage and compute capacity.
Since then, our planning and forecasting strategy has vastly improved our visibility into IT capacity needs, with a five- to 10-year data center strategy, an 18-month supply chain projection and better consumption insights and metrics for our users.
Our planning goal going forward is one that any modern IT organization needs to achieve. We are seeking to balance our facilities, network and power usage and make sure that we’re building in resiliency plans. Capacity planning is not just looking at application requirements but looking at the overall health of the environment. And that could be everything from space, power and cooling to racks, software-defined storage, compute and networking.