Play by Play: Effective Memory Management
By Brian Hixon, Daniel Martin, Rob Moore, Greg Schaefer, and Richard Wifall
Gamasutra, August 2, 2002
URL: http://www.gamasutra.com/features/20020802/hixon_01.htm
Back before Al Gore invented the Internet, back when 64KB was more memory than any computer would ever need, there was a time when memory managers didn’t exist. But gradually, new computer systems came out with larger amounts of memory (Figure 1). Game designers discovered ways to eat up RAM faster than any system could dish it out. It became obvious that managing all this data with a simple static memory map was too much work. Programmers needed a way to create memory maps that could change dynamically at run time depending on the state of the program. Thus, the memory manager was born.
For an excellent introduction to how a basic memory manager works, check out www.memorymanagement.org/articles/begin.html. This article will assume that basic knowledge and focus instead on things to think about when writing your own memory manager, and point out some of the problems you may encounter in using one. This discussion is based on our experiences in writing and rewriting the memory manager for Madden NFL 97 through Madden NFL 2002. This article is also slanted toward the C language as written for console game machines. The same principles apply to C++ and Windows programming, but the details are likely different.
Growing Your Own
So you know that memory managers are great, but what’s wrong with just using malloc and free? They are part of the standard C
library, so they should be the best, right? One major problem is that you don’t
really know what they are doing. If you write your own memory manager, you can
generate statistics, add debug code, and add advanced features that might not be
found in the default memory manager — features like handles, heap compaction,
and virtual memory. The ultimate weapon of game programmers is context. You know
the needs of your game and can tailor the memory manager accordingly. Developing
a successful memory manager requires addressing five different issues: ease of
use, performance, memory overhead, fragmentation control, and debugging
support. Unfortunately, these attributes are all interconnected, and optimizing
one can result in substandard results for another. The memory manager designer
must govern a delicate balancing act.
At a lower level, memory manager design also requires paying attention to platform-specific requirements. In addition, it may be possible to utilize hardware support to assist the memory manager.
Ease of Use
With
respect to a memory manager, the ease of use consideration really comes down to
a single question: Should the memory manager support pointers, handles, or both?
When designing a memory manager and dealing with the problem of fragmentation,
the use of handles can be very appealing. Unfortunately, while handles provide a
straightforward solution to the fragmentation problem, they are much more
difficult to use than pointers. A memory manager that supports only handles is
essentially pushing its internal complexity back onto the user.
While
supporting both handles and pointers is possible, the resulting memory manager
is more complicated than one that supports a single method. Madden used
to support both handles and pointers until an analysis showed that pointers
were being used 99 percent of the time. Not surprisingly, when given a choice,
programmers used pointers, since they were the easiest solution. Therefore, we
simplified the latest Madden memory manager by removing handle support
and optimizing the pointer support.
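To make the distinction concrete, the sketch below contrasts a pointer-style interface with a handle-style one. The function and type names (MemAlloc, MemHandleAlloc, MemLock, and so on) are illustrative assumptions, not the actual Madden API.

```c
/* Hypothetical interfaces; names are illustrative, not the Madden API. */
#include <stddef.h>

/* Pointer-based: the caller gets a stable address it can keep. */
void *MemAlloc(size_t size);
void  MemFree(void *ptr);

/* Handle-based: the caller gets an opaque token and must lock it to
 * obtain a (temporary) address, which lets the manager move the
 * underlying block during heap compaction. */
typedef struct MemHandle_s *MemHandle;

MemHandle MemHandleAlloc(size_t size);
void     *MemLock(MemHandle h);    /* pin the block, return its address  */
void      MemUnlock(MemHandle h);  /* the address may become invalid again */
void      MemHandleFree(MemHandle h);
```

The extra lock/unlock discipline is exactly the complexity that gets pushed onto the user when a memory manager supports only handles.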
Performance
Performance must be
addressed both in terms of speed and consistency. A memory manager that is fast
most of the time but slows down dramatically at unpredictable times is
unacceptable in a gaming environment. Therefore it is important to understand
the issues that contribute to performance. From the memory manager user’s point
of view, two operations will impact the game: allocations and recycling.
Allocation performance is determined by allocation policy, free list
management, and the use of garbage collection. The most popular allocation
policies are best fit, first fit, and next fit. By organizing the free blocks as
a linked list, best fit has consistent O(n) performance, while first fit
and next fit have worst-case O(n) and average-case O(n/2). By
organizing the free blocks as a size-sorted binary tree, best fit has worst-case
O(n) and average-case O(log n). By organizing the free
blocks as a balanced binary tree, best fit has consistent O(log n)
performance. Garbage collection applies only to memory managers with handle
support, and generally involves a fairly significant performance penalty when it
occurs, as it essentially makes a copy of the entire memory heap during
compaction.
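To make the cost differences concrete, here is a minimal first-fit scan over a singly linked free list. The FreeBlock structure and function name are assumptions for the sketch, not taken from the Madden code.

```c
#include <stddef.h>

/* Hypothetical free-list node; a real header would carry more state. */
typedef struct FreeBlock {
    size_t            size;
    struct FreeBlock *next;
} FreeBlock;

/* First fit: walk the list and take the first block that is large enough.
 * Worst case O(n). Best fit over the same list must always visit every
 * block to be sure it found the tightest match, hence consistent O(n). */
FreeBlock *first_fit(FreeBlock *free_list, size_t size)
{
    for (FreeBlock *b = free_list; b != NULL; b = b->next) {
        if (b->size >= size)
            return b;
    }
    return NULL; /* allocation failure */
}
```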
Figure 1: Main memory growth for console game machines.
Recycling
performance is based on free list management and the use of free block
coalescing. Free block coalescing requires the ability to locate the blocks
immediately preceding and following any arbitrary block. Some memory
structures, such as boundary tag (see Donald Knuth’s Fundamental
Algorithms under References for more information on the boundary tag), allow
this in O(1) time, while those that don’t require an O(n) scan of the
free block list. The current Madden memory manager uses boundary tags to
allow O(1) access to previous/subsequent blocks and organizes the free list as
an unbalanced binary tree. The result is allocation and recycling performance
both in the O(log n) to O(n) range.
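The sketch below shows one way a 16-byte boundary-tag header can give O(1) access to a block's physical neighbors. The field names and layout are assumptions for illustration, not the published Madden structures.

```c
#include <stdint.h>

/* Hypothetical 16-byte boundary-tag header; fields are illustrative. */
typedef struct BlockHeader {
    uint32_t prev_size;   /* size of the block immediately before this one */
    uint32_t size;        /* size of this block, including the header      */
    uint32_t flags;       /* e.g. allocated/free bit                       */
    uint32_t pad;         /* keep the header at 16 bytes                   */
} BlockHeader;

/* O(1) navigation to the physically adjacent blocks. */
static BlockHeader *next_block(BlockHeader *b)
{
    return (BlockHeader *)((uint8_t *)b + b->size);
}

static BlockHeader *prev_block(BlockHeader *b)
{
    return (BlockHeader *)((uint8_t *)b - b->prev_size);
}
```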
Memory Overhead
Memory overhead is
the cost we have to pay to manage memory, since each allocation will cost some
memory. Memory overhead is an important consideration, especially if the memory
manager will be handling a large number of objects. The first decision is
whether the memory map state should be stored internally or externally to the
memory being managed. If it is stored internally, then the maximum number of
allocated blocks does not need to be known in advance, but memory that is not
directly CPU-addressable cannot be managed. If it is stored externally, you must
know the maximum allocated and free blocks in advance and set aside memory for
this state, but address spaces that are not directly CPU-addressable can be
managed.
Madden previously used external storage for the memory
state, but this required additional overhead because it was impossible to
predict accurately the maximum number of allocated and free blocks. We had to
include a “safety factor,” which turned directly into wasted memory. The new
memory manager uses internal storage as shown in Figure 2, thus avoiding the
entire issue. All allocations are 16-byte aligned and each block has 16-byte
overhead. By limiting allocations to a minimum of 16 bytes, every block is
guaranteed to have this much storage available. Therefore, when an allocated
block is released, those 16 bytes can be used to organize the block into the
free list.
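A sketch of that arrangement follows: requests are rounded up to the 16-byte granularity with a minimum size of 16 bytes, and a freed block's own payload is reused to hold its free-list links. The macro and structure names are assumptions, not the published Madden layout.

```c
#include <stddef.h>

/* Round a request up to the 16-byte allocation granularity. */
#define ALIGN16(n)  (((n) + 15u) & ~(size_t)15u)
#define MIN_ALLOC   16u

static size_t adjust_request(size_t n)
{
    if (n < MIN_ALLOC)
        n = MIN_ALLOC;          /* guarantee room for free-list links */
    return ALIGN16(n);
}

/* When a block is released, its (at least) 16 payload bytes can hold the
 * links that place it into the free structure, so the free list costs no
 * memory beyond the per-block header. Hypothetical link layout. */
typedef struct FreeLinks {
    struct FreeLinks *left;     /* e.g. children in a size-sorted tree */
    struct FreeLinks *right;
} FreeLinks;
```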
It is worthwhile to digress slightly and consider the
management of non-CPU-addressable memory. Because consoles are designed for low
cost and high performance, they sometimes incorporate special-purpose memory
that is not directly CPU-addressable. For example, the Gamecube has 16MB of
auxiliary memory (ARAM) that supports only DMA access. A memory manager that
stores its state internal to the memory it is managing cannot be used in such
cases, while a memory manager that stores its state externally can. While it may
seem appealing to use external state storage in order to support all kinds of
memory, our experience with Madden has shown this to be a mistake. Memory
that is not directly CPU-addressable is normally used only for special types of
objects, due to the complexity of using the memory, and often contains only a
small number of objects. Therefore, Madden now uses a single,
general-purpose internal storage memory manager for all CPU-addressable memory
and an additional, customized external storage memory manager for any
special-purpose case, such as the Gamecube’s ARAM.
Fragmentation Control
Figure 2: Memory structure.
Memory
fragmentation is a condition that occurs over time as memory is allocated and
released and isolated free blocks form. This is not usually an immediate
problem, but as more fragments form and they get smaller, the opportunity for an
allocation failure due to fragmentation increases. Eventually, if fragmentation
gets severe enough, an allocation may fail because no single free block is large
enough to accommodate the request even though the total amount of free memory is
sufficient (Figure 3). Therefore, a good memory manager needs to either take steps to limit
the amount of fragmentation that occurs or be able to consolidate fragmentation
periodically through garbage collection. An allocation failure is often a lethal
error for game code and must be avoided at all costs. While it is generally
fairly easy to determine the maximum memory required by an application,
fragmentation can make such a calculation meaningless. Allocation policy, free
block coalescing and multiple heaps all play a part in minimizing fragmentation.
Free memory coalescing is a straightforward technique for limiting
fragmentation that attempts to merge a newly released memory block with its
neighbors. If the block immediately preceding or following the newly released
block is also free, merging the blocks together results in a single, larger free
block. This particular technique is almost mandatory, as without it,
fragmentation occurs very quickly. In addition, this limits the size of the free
list to the minimum number of blocks, which generally has a positive performance
impact on allocation. Since this technique has no impact on how the memory
manager is used externally, it is incorporated into most memory manager
designs.
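A sketch of coalescing on release, building on the hypothetical BlockHeader, next_block, and prev_block sketched earlier (again an illustration, not the actual Madden code): when a block is freed, its physical neighbors are checked and merged if they are also free.

```c
/* Assumed free-list helpers, declared here for the sketch. */
void remove_from_free_list(BlockHeader *b);
void add_to_free_list(BlockHeader *b);

enum { BLOCK_FREE = 1 };

void release_block(BlockHeader *b)
{
    BlockHeader *next = next_block(b);
    BlockHeader *prev = prev_block(b);

    b->flags |= BLOCK_FREE;

    if (next->flags & BLOCK_FREE) {        /* merge with the block after  */
        remove_from_free_list(next);
        b->size += next->size;
    }
    if (prev->flags & BLOCK_FREE) {        /* merge with the block before */
        remove_from_free_list(prev);
        prev->size += b->size;
        b = prev;
    }
    next_block(b)->prev_size = b->size;    /* keep the boundary tags in sync */
    add_to_free_list(b);
    /* Checks for the first/last block of the heap are omitted for brevity. */
}
```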
The choice of allocation algorithm has a definite impact on
fragmentation. Unfortunately, because the size and timing of allocations and
releases vary for every application and are often different from run to run of
an application, it is impossible to say that a particular algorithm is always
better or worse than another. However, in general, it has been found that “best
fit” algorithms tend to exhibit the least fragmentation. See Johnstone and
Wilson’s “The Memory Fragmentation Problem: Solved?” in References for more
information on the impact of allocation techniques on
fragmentation.
Figure 3: Although 128 free bytes are available, the largest possible allocation is only 64 bytes due to fragmentation.
A technique that
requires more involvement from the memory manager user is to allocate blocks
with similar lifetimes close to each other. If the lifetime of a memory block is
known when it is being allocated, the allocation algorithm can attempt to place
it by other blocks with a similar lifetime. When these blocks are released, they
will be merged into a single large free block. The problem is that this requires
the caller to specify both the lifetime of each block as well as the total size
of the group of blocks of similar lifetime. As a result, a simpler version of
this technique is usually implemented through the use of multiple heaps. By
allocating blocks with similar lifetimes within their own heap, a similar effect
is achieved, though there are generally practical limitations on the number of
heaps that can be effectively utilized.
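A minimal sketch of the multiple-heap idea (the heap_create/heap_alloc names are placeholders, not a real console SDK or the Madden API): objects that live exactly as long as a level come from a dedicated level heap, so releasing them cannot fragment the persistent heap.

```c
#include <stddef.h>

/* Hypothetical multi-heap interface; names are illustrative only. */
typedef struct Heap Heap;

Heap *heap_create(void *base, size_t size);  /* carve a heap from raw memory  */
void *heap_alloc(Heap *h, size_t size);
void  heap_free(Heap *h, void *ptr);
void  heap_destroy(Heap *h);                 /* releases every block at once  */

/* Usage sketch: level-lifetime allocations are grouped in one heap. */
void load_level(Heap *level_heap)
{
    void *nav_mesh = heap_alloc(level_heap, 64 * 1024);
    void *crowd    = heap_alloc(level_heap, 32 * 1024);
    (void)nav_mesh; (void)crowd;
}
```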
Debugging Support
One of the main benefits of writing a memory manager is the
ability to add capabilities that are not provided in the supplied memory manager.
By adding some debugging features you can make sure that you are managing the
memory rather than stepping in the heap. To avoid stepping in the heap, you’ll
need to be able to see the heap. Adding information and tools to help visualize
memory usage can be a big help when managing memory. This can be done in a
number of ways.
The debugging
techniques used by the Madden memory manager include the ability to audit
memory usage, collect usage statistics, check for consistency, and follow the
memory map from a debugger. Auditing is really nothing more than being able to
mark a group of blocks when they are allocated and later check to see if they
were all released. Usage statistics, such as maximum number of allocated and
free blocks, as well as maximum allocated memory, are valuable for evaluating
the memory utilization of the application.
Consistency checking goes
through the memory map and makes sure there are no detectable errors in it. This
means verifying block sizes, checking boundary tags, ensuring matching block
counts, and so on. By performing consistency checking prior to every allocation
and release (in a debug build), certain forms of memory corruption can be
detected quickly. Also, the memory map contains ASCII debugging information that
can be viewed from within the debugger. Prefixing every allocated block with a
text string indicating the module and line number of the caller that allocated
the memory greatly assists when debugging after a crash. Naming the memory
blocks is a necessity. Imagine going to a store where every item was in a plain
box that only had its UPC code and box size printed on it.
Finding the
item you wanted to buy would be a big challenge. Likewise, if you have a list of
memory blocks that only contains their address and size, locating memory blocks
is going to be difficult unless you give those memory blocks names. Using the
source file name and line number where the allocation occurred would be a good
start. In C, this can be accomplished quite easily through the use of macros and
the C preprocessor. Now you can recognize your blocks in a crowd, but it sure
would be nice to know who they hang out with. By adding a group label or flags,
you can group your allocations at the conceptual level rather than being limited
to grouping your blocks by source file. This way you can know that a memory
block that was allocated is really being used by your main character, even
though the actual memory allocation occurred in your texture
library.
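Here is one way to capture the file, line, and a group label with the preprocessor. The mem_alloc_named entry point and macro names are invented for the sketch, not the actual Madden functions.

```c
#include <stddef.h>

/* Hypothetical tagged-allocation entry point; not the Madden API. */
void *mem_alloc_named(size_t size, const char *name, const char *group);

/* Stringize the line number so "file.c:225" can be stored in the block header. */
#define STR2(x) #x
#define STR(x)  STR2(x)

/* Every call site is tagged with its source location automatically;
 * the group label is supplied by the caller at the conceptual level. */
#define MEM_ALLOC(size, group) \
    mem_alloc_named((size), __FILE__ ":" STR(__LINE__), (group))

/* Example: the texture library allocating on behalf of the main character. */
/* void *skin = MEM_ALLOC(256 * 1024, "MainCharacter"); */
```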
Figure 4 shows an example memory dump as seen by a (slightly
drunk) debugger. Including debugging information in textual form within the
memory map allows you to make sense of it from the debugger. For example, if
you had an invalid address access at 0x007cfea4, you could find that address in
the debugger and page around it to see that it occurred in the Joystick Driver
library and that the memory in question was allocated by pJoyDrvr.c at line 225.
Figure 4: Memory blocks with debug information as viewed in the debugger.
With all this information available, you will need to find ways to view that information that can help you when managing your game. A linear list of memory blocks can be useful for spotting potential memory fragmentation, while a list of memory blocks sorted by name can be useful when you have memory leaks.
If you do find a
memory block that has leaked, you will know from its name exactly where the
allocation occurred. With group labels you can print out compact memory maps
that show how each conceptual module in your code is using resources. By
tracking this throughout the project, you can easily spot modules that are using
more or less memory than you had budgeted in your original design. You can also
create functions to check whether groups of memory allocations have been freed.
This can help prevent memory leaks if you know that in certain situations some
groups have no allocations.
Keeping track of worst-case situations is
also important. Have the memory manager save the lowest amount of free memory it
has ever encountered (update this every time an allocation occurs). If you are
using some sort of node system to keep track of your memory blocks, keep track
of the highest number of nodes that the memory manager has ever used so that you
know if you are close to running out of memory blocks. Sentinels added to the
beginning and end of memory allocations can help detect overflow situations that
would normally corrupt other data in memory. If the sentinels don’t check
correctly when the memory is freed, then some code has been bad.
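A sketch of the sentinel idea (the guard value and helper names are assumptions for illustration): a known word is written just before and just after the payload at allocation time and verified when the block is freed.

```c
#include <stdint.h>
#include <assert.h>

#define SENTINEL 0xDEADBEEFu   /* hypothetical guard value */

/* Write guard words on either side of the payload at allocation time. */
static void stamp_sentinels(uint32_t *head_guard, uint32_t *tail_guard)
{
    *head_guard = SENTINEL;
    *tail_guard = SENTINEL;
}

/* Check them when the block is freed; a mismatch means some code wrote
 * outside the bounds of its allocation. */
static void check_sentinels(const uint32_t *head_guard, const uint32_t *tail_guard)
{
    assert(*head_guard == SENTINEL && "underflow: bytes before the block were overwritten");
    assert(*tail_guard == SENTINEL && "overflow: bytes after the block were overwritten");
}
```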
Filling
memory with recognizable patterns can be extremely useful. We use a different
pattern for unallocated, allocated, and freed memory. This way, we can tell in
the debugger at a glance what the situation of a particular piece of memory is.
When someone forgets to initialize memory they allocated properly, the
“allocated” pattern shows up.
You can also have functions that scan
free/unallocated memory and make sure that it all still matches the prespecified
patterns. If it doesn’t match, some code out there is incorrectly writing to
memory locations that it doesn’t own. Finally, make sure that you set up an
extra heap for debug memory and put all this extra debug information there. You
want your game memory to be as similar as possible between a debug and final
build.
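For example (the byte values below are arbitrary choices for the sketch, not the actual Madden patterns):

```c
#include <stddef.h>

/* Hypothetical fill patterns; any recognizable bytes will do. */
#define FILL_UNALLOCATED 0xAA   /* never handed out                   */
#define FILL_ALLOCATED   0xCD   /* handed out but not yet initialized */
#define FILL_FREED       0xDD   /* released back to the manager       */

/* Verify that a region still carries its expected fill pattern;
 * a mismatch means some code wrote to memory it does not own. */
static int pattern_intact(const unsigned char *p, size_t n, unsigned char fill)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != fill)
            return 0;
    }
    return 1;
}
```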
Platform Specifics
A memory manager presents a logical view of memory in the same
way that a file system provides a logical view of a disk drive. Most often,
memory managers are concerned with managing dynamic RAM of some sort. Some
console makers like to make things more interesting by providing a relatively
large main memory but also scattering other smaller chunks of RAM around the
system.
The memory manager
allows us to abstract away the physical details of the memory subsystem and deal
instead with a nice, logical model of memory. For example, on the PS2 we don’t
necessarily need to know that the first megabyte of RAM is reserved for the
operating system. It’s enough that the memory manager knows. By abstracting away
some of the details of the physical memory system, our game can become more
platform independent.
Most console hardware has alignment requirements
that, not so surprisingly, differ from platform to platform. Many platforms
require anywhere from 4- to 64-byte alignment for buffers in graphics rendering
or file IO. Each type of hardware might need the memory manager to be tweaked to
better fit the needs and abilities of the platform. Often this information can
be passed to the memory manager using a heap setup structure.
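A sketch of such a setup structure (the field names are invented for illustration, not a particular console SDK):

```c
#include <stddef.h>

/* Hypothetical per-platform heap configuration. */
typedef struct HeapSetup {
    void  *base;        /* start of the memory this heap will manage          */
    size_t size;        /* size of that region in bytes                        */
    size_t alignment;   /* e.g. 64 for DMA/rendering buffers, 16 otherwise     */
    size_t min_alloc;   /* smallest request the heap will accept               */
} HeapSetup;
```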
Finally,
you should be wary of third-party libraries that may use malloc and free,
effectively bypassing your memory manager’s interface. The printf function in
the PS2 standard C library uses malloc and free; our solution was to write our
own printf function.
Hardware Support
On most modern computers, the issue of fragmentation has been
greatly reduced by the use of a hardware-based paged memory management unit (PMMU).
Obviously, the fact that virtual memory provides an application with
lots of addressable memory means that even an inefficient memory manager can be
used. However, the more interesting point is that the PMMU without any secondary
storage can dramatically help with fragmentation. The PMMU takes a very large
address space (larger than the physical RAM it represents) and maps it onto
physical memory. Obviously, this mapping is not one-to-one, but rather it maps a
subset of the memory space onto a “working set” of memory pages.
The key
impact of using a PMMU in terms of fragmentation is that when a memory block is
released, any portion of that block completely spanning one or more PMMU pages
can be remapped by the PMMU. The result is actually two forms of fragmentation:
address-space fragmentation and memory fragmentation. While this effect might
seem to make a bad problem worse, it actually simplifies things. Because the
PMMU provides a large address space, the address-space fragmentation can be
largely ignored. Instead, the allocation algorithm concentrates on minimizing
memory fragmentation by first attempting to utilize the unused portions of
allocated memory pages (pages that contain some allocation but are not
completely full) before remapping unused memory pages back into
memory.
Managing Your Memory Manager
Memory managers are like most real-life managers. You have to
keep them under control, or else they tend to wander off without adequate
information and make strange decisions. Your memory manager sometimes needs your
help to stay on the right track too. Let’s look at some common problems that can
occur and how to work around them.
Fragged again. One of the major side
effects of hand grenades and memory managers is fragmentation. Previously, we
discussed fragmentation in the context of designing the memory manager to avoid
or reduce the effects of fragmentation. However, there are also some
application- level techniques that can reduce fragmentation.
The use of
static allocation (memory defined within a module with storage allocated within
the application image) avoids the memory manager completely and thus avoids
fragmentation. Of course, this usage really only makes sense when a single
object will be represented and when the lifetime of that object is approximately
the duration of the entire application. In such cases, static allocation can
provide a benefit by limiting utilization of the dynamic memory manager to those
memory objects that are truly dynamic.
Another strategy that relies
entirely on the memory manager user is to perform submanagement of memory based
on specific program knowledge. For example, if a module knows that it needs to
allocate X objects of size Y, it may be far more efficient for the user to
allocate a single block of X * Y bytes and perform its own management within
that larger block. By reducing the number of blocks that the memory manager has
to deal with, the user has generally made the memory manager's job easier. There is, of course, a caveat. Depending
on the amount of fragmentation and the size of X * Y, it is possible that the
application could find itself in the situation where an allocation of X * Y
fails due to fragmentation, whereas X allocations of Y would have succeeded. We
also try to discourage this practice when possible, as there is code-maintenance
overhead.
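A sketch of this kind of submanagement as a simple fixed-size pool (illustrative code; count and object_size correspond to X and Y, and malloc stands in for your own allocator):

```c
#include <stddef.h>
#include <stdlib.h>

/* One allocation of count * object_size bytes, sub-managed as a
 * free list of fixed-size slots. Illustrative only. */
typedef struct Pool {
    void  *storage;    /* the single large block from the memory manager  */
    void  *free_head;  /* singly linked list threaded through free slots  */
    size_t object_size;
} Pool;

static int pool_init(Pool *p, size_t count, size_t object_size)
{
    if (object_size < sizeof(void *))
        object_size = sizeof(void *);          /* room for the free link */
    p->storage = malloc(count * object_size);  /* stands in for MemAlloc */
    if (p->storage == NULL)
        return 0;
    p->object_size = object_size;
    p->free_head = NULL;
    for (size_t i = 0; i < count; i++) {       /* thread the free list */
        void *slot = (char *)p->storage + i * object_size;
        *(void **)slot = p->free_head;
        p->free_head = slot;
    }
    return 1;
}

static void *pool_alloc(Pool *p)
{
    void *slot = p->free_head;
    if (slot != NULL)
        p->free_head = *(void **)slot;
    return slot;
}

static void pool_free(Pool *p, void *slot)
{
    *(void **)slot = p->free_head;
    p->free_head = slot;
}
```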
One way to help avoid memory fragmentation is to always free
memory in the opposite order from which it was allocated. In fact, if you were
always able to allocate and release in such an order, you would never have
memory fragmentation. Realistically, it’s not possible to follow this practice
all the time, but doing it as much as possible will help. Memory fragmentation
is going to occur, and at some point you will probably run into a situation
where it causes a problem in your game. You might have fragmentation that is
occurring in one part of your game that is causing a memory allocation to fail
in a totally unrelated area. Fragmentation might even manifest itself as a
situation where your game will fail only when arriving at part of your game
through a specific path (that causes the fragmentation). If you don’t have some
advanced form of garbage collection, you are going to have to use other, more
crude methods to limit this problem. One possibility is to change the code that
is causing the fragmentation to use a different allocator so that it doesn’t
cause fragmentation. A common way to accomplish this is to have an allocator
that allocates from the other end of memory.
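A sketch of the "allocate from the other end" idea as a two-ended bump region (illustrative only; a real allocator would need per-block headers so individual blocks can be freed):

```c
#include <stddef.h>

/* Hypothetical two-ended region: long-lived data grows from the bottom,
 * short-lived or fragmentation-prone data grows from the top. */
typedef struct Region {
    char *lo;   /* next free byte from the bottom         */
    char *hi;   /* one past the last free byte at the top */
} Region;

static void *alloc_low(Region *r, size_t n)
{
    if ((size_t)(r->hi - r->lo) < n)
        return NULL;
    void *p = r->lo;
    r->lo += n;
    return p;
}

static void *alloc_high(Region *r, size_t n)
{
    if ((size_t)(r->hi - r->lo) < n)
        return NULL;
    r->hi -= n;
    return r->hi;
}
```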
Depending on your game,
fragmentation can become more problematic over time (especially if your
allocations don’t have close locality of lifespan). You can use brute force to
minimize these effects, such as shutting down and restarting code modules
between levels as a brute-force garbage collection technique.
Release builds. When running a release build, there isn’t any debugging information in
the game, as it will consume extra memory. But you still need a way to know
where the game runs out of memory. For Madden, we assume the game will fit in memory. If
the game does run out, it will crash and optionally dump some memory info. With
a special build, we display some partial information about the memory manager
and the game state so that we can determine if it ran out of memory because of
fragmentation or other reasons.
Getting on the Field
When talking about memory management, programmers often resort
to words more often associated with cow pastures than games. Terms like heaps,
stacks, and chunks are thrown around like flies buzzing around you-know-what. To
see how important a memory manager is to a game, you have to get past the
abstract poo. A good memory manager allows you to have more animations, more
characters, more textures, more sounds — in short, more of everything that your
game-playing customers love.
In this article we have described some of
the issues that may come up in writing and using your own memory manager. After
years of writing and rewriting memory managers for the Madden series, one
piece of advice that bears highlighting is simply to make sure you schedule
adequate design time on this very important piece of your system; you’ll be glad
you did.
References
The Memory Management Reference Beginner’s Guide: www.memorymanagement.org/articles/begin.html
Johnstone,
Mark S., and Paul R. Wilson. “The Memory Fragmentation Problem: Solved?” In
Proceedings of the International Symposium on Memory Management. ACM Press, 1998.
pp. 26–36. www.cs.utexas.edu/users/wilson/papers/fragsolved.pdf
Knuth,
Donald E. The Art of Computer Programming, vol. 1, Fundamental
Algorithms. Reading, Mass.: Addison-Wesley, 1973. pp. 435–444.
Copyright © 2003 CMP Media Inc. All rights reserved.