By: Henrik Hasenkamp
Decentralization and learning algorithms will play an increasingly important role in future data centers.
Growing complexity and data volumes characterize more and more data centers. Current trends already show how to meet the associated requirements: with a new degree of decentralization, and with machine learning for more automation and proactive management.
Whether companies pack all IT resources into their own data centers or turn to cloud and similar infrastructure services depends on countless factors. So many services of this kind are now available that a suitable infrastructure variant exists for every application - if required as a software service, with or without management tools, based on a pay-per-use cloud or on dedicated resources. This almost inevitably results in "infrastructure fragments", i.e. hybrid infrastructures. The resource requirements of individual applications or specialist departments are simply too different - especially when data-intensive applications such as Internet of Things (IoT) or data analytics are added.
Management tools that use dashboards to provide a central overview of the resources in use, or that support alarms and semi-automated processes, can help in managing hybrid infrastructures. So far, these software solutions have been used primarily for monitoring and reactive resource management. That is an important but very time-consuming step towards managing distributed IT infrastructures today and reacting promptly to additional requirements. The cloud provider Gridscale believes that the scope of the management tools, and the remit of the administrators who use them, will expand significantly in the future, because optimal use of resources will require even more flexibility. Even now, it is often of secondary importance where the IT resources come from - whether from a public or private cloud, whether shared or dedicated. The main thing is that they are available when they are needed and offer the required performance and security at an attractive price.
The idea that data centers are designed around the available infrastructure is outdated. Applications and data are what is critical to the business. Management software can nevertheless evaluate the infrastructure for them: the system checks its own resources and cloud services against costs and framework conditions such as geographical availability or ecological requirements. Whether these cloud services will in future have to come solely from specialized service providers is not clear. Even companies that at times do not fully use their private or booked public cloud resources could offer them in a kind of resource pool. Initial approaches towards cloud marketplaces already exist, but they have always been based on one service provider putting together different offers. In the future, a self-managing resource pool could emerge that is not tied to a single vendor but makes free resources available, wherever they come from.
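How such an evaluation might look can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular management tool works: the `Offer` fields, the offer names, the prices and the `pick_offer` function are all hypothetical. The sketch filters offers from different sources by capacity, region and an ecological requirement, then picks the cheapest one that remains.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """A hypothetical resource offer; all fields are illustrative."""
    name: str
    source: str            # e.g. "private", "public" or "pool"
    region: str
    price_per_hour: float  # cost in an arbitrary currency unit
    renewable_energy: bool
    free_cores: int


def pick_offer(offers, cores_needed, allowed_regions,
               require_renewable=False) -> Optional[Offer]:
    """Return the cheapest offer that satisfies all constraints, or None."""
    candidates = [
        o for o in offers
        if o.free_cores >= cores_needed
        and o.region in allowed_regions
        and (o.renewable_energy or not require_renewable)
    ]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


# Made-up example data: own resources, a public cloud and a shared pool.
offers = [
    Offer("own-dc", "private", "eu-central", 0.12, False, 8),
    Offer("public-a", "public", "eu-central", 0.09, True, 64),
    Offer("pool-x", "pool", "us-east", 0.05, True, 32),
    Offer("pool-y", "pool", "eu-central", 0.07, True, 4),
]

best = pick_offer(offers, cores_needed=8,
                  allowed_regions={"eu-central"}, require_renewable=True)
print(best.name if best else "no matching offer")
```

Where the resources come from does not influence the decision here; the `source` field is carried along purely for information, which mirrors the point that origin is of secondary importance as long as the constraints are met.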
Predictive Maintenance was just the beginning
If resources are to be made available in such a flexible way, the degree of automation of data center services must increase overall. In other words: reactive alerts, e.g. when hard disks have failed or websites are no longer reachable, hardly help the administrator. Rather, the data center, or the software that manages it, should find an optimal solution on its own.
This goes far beyond the predictive maintenance approaches known up to now: whereas telemetry data from hardware devices was previously used to determine the optimum replacement time before a failure, in the future the system will automatically initiate measures based on learned dependencies. In addition to maintenance processes, infrastructure and even business process optimizations can also be implemented. A ransomware attack, for example, causes an unusually high I/O load, which the system can recognize as a danger if it has been taught beforehand that this is an anomaly. The system learns what deviates from normal day-to-day operation from predefined features - the criteria that matter in the specific application - and from large amounts of runtime data from data centers. Anomalies do not always have to be attacks. The approach is equally about automated, immediate scaling or migration, so that workloads can be relocated without the intervention of the admins, for example when other cloud services are cheaper at that moment.
The challenge lies in the machine learning process itself. A system does not become more intelligent simply by being told what "abnormal" behavior is. A high I/O rate is not necessarily the result of a ransomware attack. Large volumes of telemetry and other environmental data, sometimes even from external or non-infrastructure sources, must be evaluated and put into the right context. The more example data an algorithm can learn from and the more features are defined, the better the results - but also the more complex and computationally intensive the learning.
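Why context matters can be shown with a deliberately simple sketch. It is not a production-grade learning method, only a per-context z-score baseline; the context labels (`office_hours`, `backup_window`), the I/O figures and the threshold of 3 are assumptions made up for illustration. The same I/O reading is treated as normal in one context and as an anomaly in another.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_baselines(samples):
    """Learn mean and standard deviation of the I/O load per context.

    `samples` is an iterable of (context, io_ops_per_second) pairs taken
    from normal operation. The context labels are hypothetical features.
    Each context needs at least two samples for stdev() to work.
    """
    by_context = defaultdict(list)
    for context, io_ops in samples:
        by_context[context].append(io_ops)
    return {ctx: (mean(vals), stdev(vals)) for ctx, vals in by_context.items()}


def is_anomalous(baselines, context, io_ops, threshold=3.0):
    """Flag a reading whose z-score within its context exceeds the threshold.

    Assumes the learned standard deviation is non-zero.
    """
    mu, sigma = baselines[context]
    return abs(io_ops - mu) / sigma > threshold


# Made-up runtime data from normal operation.
history = [
    ("office_hours", 900), ("office_hours", 1000), ("office_hours", 1100),
    ("office_hours", 950), ("office_hours", 1050),
    ("backup_window", 7800), ("backup_window", 8000), ("backup_window", 8200),
    ("backup_window", 7900), ("backup_window", 8100),
]
baselines = learn_baselines(history)

# The same reading is judged differently depending on its context.
print(is_anomalous(baselines, "backup_window", 8100))
print(is_anomalous(baselines, "office_hours", 8100))
```

With this data, a reading of 8,100 I/O operations per second is unremarkable during the backup window but far outside the learned range during office hours. Even then it is only an anomaly, not proof of an attack. Every additional feature refines the picture in the same way, at the price of more data to collect and more baselines to compute.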
The goal is to increase the level of automation in data centers while simultaneously improving service quality. This appears to be absolutely necessary, since the increasingly complex IT infrastructure requirements can already hardly be managed manually. From the point of view of those responsible, the resource question should soon no longer require a fixed answer. Instead, the performance, availability and security needs of each application are defined, and an intelligent algorithm then selects the optimal infrastructure solution. It also takes into account interactions with other infrastructure factors - such as price/performance or empirical values.
The original article in German can be found here.