ICT infrastructure components have different performance properties and characteristics and therefore behave differently within the operational environment. The analysis of their performance and the subsequent tuning of their operation must recognise and account for these differences. The Capacity Management process is ultimately responsible for performance management; therefore, any actions taken by Operations in this area should always be in conjunction with, if not under the direction of, Capacity Management. When different elements near their saturation point, they each behave in a different way. A processor, for instance, cannot run at half throttle. When a system resource monitor indicates a 50% load on a processor, this means that half of the instructions being processed are program instructions, from either applications or the operating system itself, and the other half are so-called NOP (no-operation) instructions. Added together, these always amount to a 100% processor load, or full throttle: the CPU is always fully on, never half on.
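The 50% reading described above can be pictured as a simple ratio. This is a minimal sketch with hypothetical instruction counts, not a real monitoring tool:

```python
# Processor load expressed as the fraction of non-NOP instructions.
# The counts below are hypothetical figures for one sampling interval.
program_instructions = 5_000_000  # application and operating-system instructions
nop_instructions = 5_000_000      # idle "filler" instructions the CPU executes anyway

total = program_instructions + nop_instructions
load = 100 * program_instructions / total
print(f"processor load: {load:.0f}%")  # the CPU itself ran flat out the whole time
```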
The reason that a monitor tool will indicate the percentage of program instructions being processed by the processor is that 100% often means that more than 100% is required. In this case, there will be a queue of program instructions waiting for the processor, which means a shortage of processing power. Depending on the specific situation, appropriate measures must be taken. A processor load of 90% may be nothing to worry about: every instruction will be executed at the same speed as when the processor load is 5%. The most common cause of a 100% processor load, and thus very likely a more-than-100% processor load, is a poorly written application. An application might enter what is known as a loop: the continuous processing of a small number of program instructions, which can only be interrupted if a specific condition is met. The application code, however, specifies this condition in such a manner that it can never be met.
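The queueing effect of sustained over-demand can be sketched with hypothetical per-interval figures: as long as the offered load stays at or below capacity nothing accumulates, but once it exceeds capacity, a backlog of waiting work builds up and keeps growing.

```python
# Each tick the processor can execute `capacity` units of work;
# anything beyond that joins the queue (figures are hypothetical).
capacity = 100
queue = 0
for offered in [90, 95, 100, 110, 120]:  # offered load per tick
    queue = max(0, queue + offered - capacity)
    print(f"offered {offered:>3} -> backlog {queue}")
```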
The constant execution of the instructions in the loop will cause the system monitor tool to display a processor load of 100%, because all the NOP instructions now make way for the loop instructions. Depending on the processor-sharing mechanism, this could also adversely affect other programs executing concurrently on the same processor: they will slow down or come to a halt. This, of course, requires action from the Operations staff. The course of action is to kill the process that is running the program with the loop instructions and try to force that process to print out a listing of the contents of its memory space, so it can be analysed to pinpoint the offending instructions. There are a large number of tools and techniques to aid in writing applications that prevent these errors, but the deciding factor is still human, and that is where many of these logical errors originate.

When memory nears its saturation point, all activity on the system will experience delays. The reason for this delay is that most of the memory available is in the form of virtual memory. This is an imitation of internal memory on a slower and cheaper medium, often hard disk, transparent to the system, but an imitation nonetheless. The more demand there is for memory, the more apparent the drawbacks of virtual memory become. The main drawback is speed: the difference in speed between solid-state memory and a hard disk is in the range of one or two orders of magnitude.
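The never-satisfiable exit condition described above is easy to write by accident. A classic example is testing a floating-point accumulator for exact equality; in this sketch a guard counter is added purely so that the demonstration terminates, where the real bug would spin forever:

```python
# Buggy exit condition: x never compares exactly equal to 1.0,
# because 0.1 has no exact binary floating-point representation.
x = 0.0
iterations = 0
while x != 1.0 and iterations < 1_000_000:  # guard added so this sketch halts
    x += 0.1
    iterations += 1
print(iterations)  # the guard, not the exit condition, stopped the loop
```

Without the guard, this process would consume 100% of its processor share indefinitely.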
There are a number of techniques that can be used to lessen this speed penalty:
The law of diminishing returns
Consider a configuration that is running at 80% of its maximum performance. The first tuning effort may gain half of the unused performance potential, bringing it to 90%. In turn, the second tuning effort may gain half of the remaining potential, bringing it to 95%, and the third tuning effort half of what is then left, bringing it to 97.5%. Each successive effort costs as much as the last but yields a smaller absolute improvement.
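Under the assumption stated above, that each tuning effort recovers half of the remaining headroom, the diminishing gains can be tabulated directly:

```python
# Law of diminishing returns: each tuning effort gains half of the
# unused performance potential (the assumption from the example above).
performance = 80.0  # percent of maximum performance
for effort in range(1, 4):
    gain = (100.0 - performance) / 2  # half of the remaining headroom
    performance += gain
    print(f"effort {effort}: +{gain}% -> {performance}% of maximum")
```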
Provided that the design, manufacture, and deployment of any configuration within the ICT infrastructure are properly carried out, most out-of-the-box configurations are reasonable. If there are, however, specific requirements or demands, the parameter set of the MOs involved will need to be adapted. It is imperative that a solid understanding of the workings and required functionality of the MOs, and of the way the parameter settings influence their functionality, is established before any changes to the parameter set are implemented. Typically, the manufacturer sets the parameters of each type of MO to values that satisfy most of the demands made on its functionality. This will be satisfactory most of the time. However, there often remains ‘room for improvement’. This opportunity arises because of the experience gained over time with a particular MO, because of changes in other parts of the infrastructure, or because of changes in the way the MO is utilised. The ‘need for improvement’ is often a consequence of volume growth, although it could also be the result of poor design or poorly controlled changes.
Tuning is the activity undertaken to progress from merely using a resource to using that resource optimally. This is where performance tuning comes in: by setting parameter values, it is often possible to improve the performance of a MO. Note that an improvement does not necessarily mean more of this or more of that (or even less of this or less of that, for that matter). Tuning a MO or set of MOs from ‘okay’ to ‘optimal’ is just as much an improvement as any other. Improvements, essentially, are either quantitative or qualitative. For example, enlarging the buffers for a network interface card can result in higher throughput and thus a quantitative improvement. There are a number of manufacturers offering ‘tuning’ or ‘performance’ tools. These tools are often platform specific and are sometimes provided by the platform manufacturer.
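As a concrete sketch of tuning by parameter setting, the receive buffer of a network socket can be enlarged through a standard socket option. Whether the kernel grants the full request depends on system-wide limits, so the granted value is read back rather than assumed; the 1 MiB figure is purely illustrative:

```python
import socket

# Enlarging a socket receive buffer: a quantitative tuning sketch.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # request 1 MiB
tuned_rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.close()
print(default_rcvbuf, tuned_rcvbuf)  # the kernel may cap or adjust the request
```

This also illustrates why understanding the MO matters: the parameter interacts with system-wide limits, so setting it blindly may have no effect at all.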
A well-known pitfall of tuning is the ‘sub-optimization pitfall’: all attention and effort are put into the ultimate performance of one or two MOs. Apart from being an uneconomical use of time and resources, this is also often counterproductive: alleviating one bottleneck only exposes the next bottleneck in the sequence. Bottlenecks are an inherent part of any ICT infrastructure; indeed, the infrastructure can be considered to consist of a series of bottlenecks, as illustrated in figure 1.
Figure 1: The bottleneck effect
The way in which these bottlenecks hinder the normal operation of the ICT infrastructure is, of course, dependent on the demand placed on the infrastructure by the ICT services that use it. It is also dependent upon the speed and throughput of the ‘narrowest’ bottleneck. Even though subsequent bottlenecks may be wider, the end-to-end performance will still be constrained by the slowest link. Also, even though the speed and throughput of the constraining bottleneck may be doubled, this benefit may not be fully realised, because another slow bottleneck on the end-to-end route of the service traffic may now become the limiting factor. The effects of ‘sub-optimization’ can only be overcome by considering the end-to-end picture of the overall infrastructure.
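The end-to-end constraint can be modelled as the minimum capacity along the chain of components; the component names and capacities here are purely illustrative:

```python
# End-to-end throughput of a chain of components is set by the
# narrowest bottleneck (hypothetical capacities in Mbit/s).
capacities = {"disk": 400, "system_bus": 800, "nic": 100, "wan_link": 50}
print(min(capacities.values()))  # 50: the WAN link constrains the whole chain

# Doubling the narrowest link helps only until the next bottleneck binds:
capacities["wan_link"] *= 2
print(min(capacities.values()))  # 100: now the NIC is the limiting factor
```

This is the sub-optimization pitfall in miniature: doubling the WAN link bought only as much as the next-narrowest component allowed.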