
Facility Operations Maturity Model for Data Centers

White Paper 197

Revision 1

by Jennifer Schafer and Patrick Donovan

Executive summary

An operations & maintenance (O&M) program determines to a large degree how well a data center lives up to its design intent. The comprehensive data center facility operations maturity model (FOMM) presented in this paper is a useful method for determining how effective that program is and what might be lacking, and for benchmarking performance to drive continuous improvement throughout the life cycle of the facility. This understanding enables ongoing, concrete actions that make the data center safer, more reliable, and operationally more efficient.

NOTE: The complete FOMM is embedded in the Resources page at the end of this paper.

Introduction

Figure 1 shows the various phases of the data center life cycle. The primary focus of a facility operations team is naturally the “Operate” phase. However, involving the facilities team in the early planning, design, and commissioning phases is also important. Their detailed, practical knowledge of operations and maintenance helps avoid poor design and construction choices that might otherwise compromise performance, efficiency, and/or availability once the data center becomes operational.

Figure 1: Assessing performance and O&M maturity are key tasks within the data center life cycle

To learn more about the benefits of including facility operations teams in earlier phases of the life cycle, see The Green Grid’s White Paper 52, An Integrated Approach to Operational Efficiency and Reliability.

As described in White Paper 196, Essential Elements of Data Center Facility Operations, it is important to monitor, measure, and report on the performance of the data center so that performance, efficiency, and resource-related problems can be avoided or, at least, identified early. Besides preventing problems, assessments are necessary to benchmark performance, determine whether changes are needed, and identify the specific steps required to reach the next desired performance or maturity level. The maturity model presented in this paper offers a framework for assessing the completeness and thoroughness of an O&M program. Ideally, an organization would perform the first assessment during commissioning for a new data center, or as soon as possible for an existing one. Next, results should be compared against the data center’s goals for criticality, efficiency, and budget. Gaps should be identified and decisions made as to whether the program needs to change. Once the level of maturity has been benchmarked in this way, periodic assessments using the model should be conducted at regular intervals (perhaps annually) or whenever there is a major change in personnel, process, budget, or goals for the facility that might warrant a significant change in the O&M program.

How the model works

The Schneider Electric data center facility operations maturity model (FOMM) proposed in this paper has a form and function based on the IT Governance Institute’s maturity model structure. The model is built around 7 core disciplines (see Figure 2). Each discipline has several operations-related elements associated with it, and each element is further divided into several sub-elements. Each sub-element is graded on a scale of “1” to “5” (see Figure 3), with “1” being least mature and “5” being the most developed. For each sub-element, each of the five maturity levels is defined in terms of the specific criteria needed to achieve that particular score. The scoring criteria and the model they support have been tested and vetted with real data centers and their owners. The criteria represent a realistic view of the spectrum and depth of O&M program elements that owners have in place today, ranging from poorly managed data centers to highly evolved, forward-thinking data centers with proactive, measurable programs.

Figure 2: The FOMM is divided into 7 disciplines that are further divided into elements and sub-elements. This image shows the 7 disciplines and their 26 elements only.
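
To make the hierarchy concrete, the following sketch (in Python, purely illustrative; the discipline, element, and sub-element names are placeholders rather than the model’s actual 7 disciplines or 26 elements) shows one way to represent disciplines, elements, and graded sub-elements:

from dataclasses import dataclass, field

@dataclass
class SubElement:
    name: str
    score: int  # 1 (least mature) through 5 (most developed)

    def __post_init__(self):
        if not 1 <= self.score <= 5:
            raise ValueError("FOMM sub-elements are graded from 1 to 5")

@dataclass
class Element:
    name: str
    sub_elements: list = field(default_factory=list)

    def average_score(self) -> float:
        # One simple way to roll sub-element grades up to the element level.
        return sum(s.score for s in self.sub_elements) / len(self.sub_elements)

@dataclass
class Discipline:
    name: str
    elements: list = field(default_factory=list)

# Placeholder data for illustration only.
discipline = Discipline("Maintenance Management", [
    Element("Preventive Maintenance", [
        SubElement("PM scheduling", 3),
        SubElement("PM procedure documentation", 2),
    ]),
])
for element in discipline.elements:
    print(f"{element.name}: average maturity {element.average_score():.1f}")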

Maturity level characteristics

To further clarify the meaning of, and differences between, the maturity levels shown in Figure 3, the following characteristics are provided (a code sketch after the Level 5 list recaps them as data):

Level 1: Initial / ad hoc

  • No awareness of the importance of issues related to the activity.
  • No documentation exists.
  • No monitoring is performed.
  • No activity improvement actions take place.
  • No training is taking place on the activity.

Figure 3: Each of the elements is graded on a scale of 1 to 5 with 1 representing the lowest level of operational maturity and 5 being the highest level.

Level 2: Repeatable, but intuitive

  • Some awareness of the importance of issues related to the activity.
  • No documentation exists.
  • No monitoring is performed.
  • No activity improvement actions take place.
  • No formal training is taking place on the activity.

Level 3: Defined process

  • Affected personnel are trained in the means and goals of the activity.
  • Documentation is present.
  • No monitoring is performed.
  • No activity improvement actions take place.
  • Formal training has been developed for the activity.

Level 4: Managed and measurable

  • Affected personnel are trained in the means and goals of the activity.
  • Documentation is present.
  • Monitoring is performed.
  • The activity is under constant improvement.
  • Formal training on the activity is being routinely performed and tracked.
  • Automated tools are employed, but in a limited and fragmented way.

Level 5: Optimized

  • Affected personnel are trained in the means and goals of the activity.
  • Documentation is present.
  • Monitoring is performed.
  • The activity is under constant improvement.
  • Formal training on the activity is being routinely performed and tracked.
  • Automated tools are employed in an integrated way, to improve quality and effectiveness of the activity. 
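
The five level definitions above lend themselves to a tabular representation. The sketch below encodes each level’s characteristics as data, which could drive an assessment checklist; the boolean fields paraphrase the bullets and are a simplification, not the model’s full per-sub-element scoring criteria:

MATURITY_LEVELS = {
    1: dict(name="Initial / ad hoc",
            awareness=False, documented=False, monitored=False,
            improving=False, formal_training=False, automated_tools=None),
    2: dict(name="Repeatable, but intuitive",
            awareness=True, documented=False, monitored=False,
            improving=False, formal_training=False, automated_tools=None),
    3: dict(name="Defined process",
            awareness=True, documented=True, monitored=False,
            improving=False, formal_training=True, automated_tools=None),
    4: dict(name="Managed and measurable",
            awareness=True, documented=True, monitored=True,
            improving=True, formal_training=True,
            automated_tools="limited and fragmented"),
    5: dict(name="Optimized",
            awareness=True, documented=True, monitored=True,
            improving=True, formal_training=True,
            automated_tools="integrated"),
}

def describe(level: int) -> str:
    # List the characteristics present at a given maturity level.
    traits = MATURITY_LEVELS[level]
    present = ", ".join(k for k, v in traits.items() if v and k != "name")
    return f"Level {level} ({traits['name']}): {present}"

print(describe(4))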

 

Scoring and goal setting

The maturity model embedded in this paper does not provide a form or describe a specific method for tallying and reporting the grades for all the sub-elements. However, Figures 4, 5, and 6 show examples of useful methods used by Schneider Electric for scoring and reporting elements. Figure 4 shows a way to visually compare an element’s present maturity scores for each of its sub-elements against the organization’s goals for each sub-element.

Figure 4: Example of how to depict a sub-element’s present level of maturity score; the colors indicate to what degree the score meets goals.
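
A comparison like Figure 4 reduces to a simple gap calculation per sub-element. In the sketch below, the color thresholds (green when the goal is met, yellow when one level short, red otherwise) and the sub-element names are assumptions for illustration; the paper does not define them:

def gap_status(score: int, goal: int) -> str:
    # Assumed thresholds: these are not specified by the FOMM itself.
    if score >= goal:
        return "green"   # meets or exceeds the goal
    if goal - score == 1:
        return "yellow"  # one maturity level short
    return "red"         # two or more levels short

# Hypothetical sub-element scores and goals, for illustration only.
assessment = [
    ("Change management procedure", 2, 4),
    ("Incident response drills", 4, 4),
    ("Vendor escalation contacts", 3, 4),
]
for name, score, goal in assessment:
    print(f"{name}: score {score}, goal {goal} -> {gap_status(score, goal)}")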

Figure 5 shows a unique score graphic called a “Risk Identification Chart,” which shows the level of risk (i.e., threat of system disruption; 100% represents highest risk) by line of inquiry. That is, every element in the model has sub-elements related to one of three “lines of inquiry”: process, awareness & training, and implementation in the field (of whatever task, knowledge, resources, etc. are required for that element to be in place). The scores for the sub-elements are then grouped and divided based on these three lines of inquiry. These lines of inquiry represent three key focus areas of any highly reliable and mature data center facility operations team. Knowing which of the three areas poses the greatest risk to the facility helps organizations more quickly identify the type and amount of resources needed to make corrections. Immediate corrective action plans should be developed to address any element with risk levels at 60% or above.
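
As a rough illustration of the Risk Identification Chart, the sketch below groups one element’s sub-element grades by line of inquiry and converts each group’s average maturity into a risk percentage. The linear conversion, risk = (5 - average grade) / 4 x 100, is an assumed mapping rather than Schneider Electric’s formula; only the 60% action threshold comes from the paper:

from statistics import mean

def risk_by_line(scores: dict) -> dict:
    """scores maps each line of inquiry to its sub-element grades (1-5)."""
    # Assumed linear mapping: an average grade of 5 gives 0% risk and an
    # average of 1 gives 100%. The paper does not publish its formula.
    return {line: (5 - mean(vals)) / 4 * 100 for line, vals in scores.items()}

# Hypothetical grades for one element, grouped by line of inquiry.
element_scores = {
    "process": [2, 3],
    "awareness & training": [1, 2],
    "implementation in the field": [3, 4],
}
for line, risk in risk_by_line(element_scores).items():
    flag = "  <-- develop a corrective action plan" if risk >= 60 else ""
    print(f"{line}: {risk:.0f}% risk{flag}")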

Figure 6 shows a method for taking sub-elements that are deemed to have unacceptable scores and ranking them based on how easy they are to improve (or implement) vs. their impact on operations (if corrected). This is an effective way to help organizations prioritize “where to go from here” based on FOMM goals, business objectives, time, and available resources. “Quick wins” can be easily identified and separated from items that fit longer-term, strategic objectives that might require significant changes in staff competencies and behaviors. Baselining the current implementation of the O&M program against the organization’s desired levels should then lead to a concrete action plan with defined goals and owners.

Figure 6: Example illustration of how to rank elements in terms of their cost/ease of implementation vs. their impact on operation
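
The ranking in Figure 6 can be approximated with a simple sort. In the sketch below, the 1-to-5 ease and impact scales, the example items, and the “quick win” rule are illustrative assumptions rather than definitions from the paper:

items = [
    # (sub-element needing improvement, ease of improvement 1-5, impact if corrected 1-5)
    ("Label all breaker panels", 5, 3),
    ("Develop emergency operating procedures", 2, 5),
    ("Institute shift turnover checklist", 4, 4),
]

# Sort so the easiest, highest-impact improvements come first.
ranked = sorted(items, key=lambda it: (it[1] + it[2], it[1]), reverse=True)
for name, ease, impact in ranked:
    tag = "quick win" if ease >= 4 and impact >= 3 else "strategic"
    print(f"{name}: ease {ease}, impact {impact} ({tag})")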

Who should perform the assessment?

It is important for the person or team who conducts the assessment to be objective and thorough, with an “eye” for detail. Accurately determining to what degree a sub-element exists, and how consistently it is used and maintained for the facility, can be a challenge. Organizations with little O&M experience may also have difficulty determining the best path forward once the initial score baseline has been established. While the model’s five defined levels of maturity for each sub-element are specifically designed to help guide “where to go next,” some may not know the most effective steps to get there.

Those who determine they lack the required time, expertise, or objectivity would be best served by hiring a third-party service provider with good facility operations experience. A third party is more likely to play an independent and objective role in the process, having no investment in the way things “have always been done.” There is also value in having a “new set of eyes” judge the program; a fresh viewpoint might yield more insightful and actionable analysis. Experienced service vendors offer the benefit of knowledge gained through the repeated performance of data center assessments throughout the industry. Broad experience makes the third party more efficient and capable. This knowledge makes it possible, for example, to show a customer how its O&M program compares to peers or other data centers with similar business requirements. Beyond performing the assessment and helping to set goals, experienced third parties can also be effective at providing implementation oversight, which might lead to a faster return on investment, especially when resources are already constrained.

Conclusion

Preventing or reducing the impact of human error and system failures, as well as managing the facility efficiently, all require an effective and well-maintained O&M program. Ensuring such a program exists and persists over time requires periodic reviews and effort to reconcile assessment results with business objectives. With an orientation towards reducing risk, the Facility Operations Maturity Model presented in and attached to this paper is a useful framework for evaluating and grading an existing program. Use of this assessment tool will enable teams to thoroughly understand their program, including:

  • Whether and to what degree the facility is in compliance with statutory regulations and safety requirements
  • How responsive and capable staff is at handling and mitigating critical events and emergencies
  • The level of risk of system interruption from day-to-day operations and maintenance activities
  • Levels of staff knowledge and capabilities

Note also that the grading and assessment of results is best done by an experienced, unbiased assessor.

About the authors

Jennifer Schafer is the Manager of Products and Solutions Services at Schneider Electric. Jennifer is responsible for the growth and development of the critical facility assessment business, primarily focusing on operational and maintenance programs, practices, documentation, and support systems. Jennifer has more than 18 years of experience in the mission critical industry and holds a BSAST in Nuclear Engineering Technology from Thomas Edison State College. Prior to Schneider Electric, and after 9 years of service in the U.S. Navy as a nuclear-trained Electronics Technician, Jennifer spent 5 years in various data center roles, rising from technical operations to Critical Facility Manager. She also spent 4 years as a Field Service Engineer specializing in industrial power distribution systems, including switchgear and UPS systems.

Patrick Donovan is a Senior Research Analyst for the Data Center Science Center at Schneider Electric. He has over 18 years of experience developing and supporting critical power and cooling systems for Schneider Electric’s IT Business unit including several award- winning power protection, efficiency and availability solutions. An author of numerous white papers, industry articles, and technology assessments, Patrick's research on data center physical infrastructure technologies and markets offers guidance and advice on best practices for planning, designing, and operating data center facilities.