Memory and garbage collection



Most programmers have the same few objections when they hear about .Net. "What makes this any better than Java?"

"Oh, language neutrality. Yes, that makes sense."

"But, what about that JIT overhead?"

"No, I guess that doesn’t sound that bad."

But then there’s the Big One. It’s usually not even phrased as a question: "Garbage collection sucks."

The only possible answer to that is No, it really doesn’t suck. Garbage collection actually has a lot of nice features:

Allocation is fast. The system is just advancing a pointer, not manipulating a linked list.

Consecutive allocations are adjacent, not scattered all over the heap, which helps cache performance.

Your code is smaller, simpler, and more reliable, because you never have to worry about who owns a block and because you never have to free the memory you allocate.

You never have memory leaks. You never have data structures that refer to memory that’s been freed.

These are four rather impressive advantages. Reference counting (like Delphi’s strings, dynamic arrays, and interfaces use) offers the same no-need-to-free simplicity and safety, but you pay for it with the overhead of maintaining the reference counts and reference counting can’t handle circular references. (That is, if A refers to B, and B refers to A, neither reference count will ever go to 0.)

Garbage Collection Speed

You may be thinking that it doesn’t matter how garbage collection can help you if it means your program might lock up for several seconds anytime it gets asked to do something. And you’d be right that did suck, back in the ’70’s and ’80’s on Lisp machines and such.

Microsoft did a lot of hard work, and their garbage collection doesn’t suck. A full garbage collection one that scavenges all freed memory and leaves all the free memory as a single contiguous chunk – takes less time than a page fault. Which you typically don’t even notice.

Garbage collection can be so fast because memory life spans are distributed according to a power law. Most memory is freed quite soon after it’s allocated. Most of what’s left is freed within seconds or minutes. And most of what lasts longer than that lasts until the program shuts down.

So, the CLR has a three generation garbage collector. When the system has done ‘enough’ allocations (by default, this is tied to the size of the CPU’s Level 2 cache), it does a generation 0 garbage collect. This looks at the most recently allocated blocks, and finds the ones that are still in use. The system only has to pay attention to the blocks that aren’t garbage. These get moved down to the bottom of the partition and promoted to generation 1, which means that the next generation 0 collection won’t look at them. Once all the current data has been moved to the bottom of the partition, what’s left is free memory.


 When you’ve done ‘enough’ more allocation or a generation 0 collection can’t make enough room – the system does a generation 1 collection, which finds all the blocks that have become garbage since being promoted to generation 1. All survivors are moved and marked as generation 2, and won’t be touched again until a generation 1 garbage collection can’t make enough room. A generation 2 garbage collection just moves the surviving blocks down; it does not promote them to generation 3.

As you can see, this three generation garbage collection minimizes the time the system spends repeatedly noticing that a long-lived object is still alive. This in turn decimates the number of times a long-lived block gets moved. The idea of generations also saves time in a more subtle way. The way the system detects that an object is still live is to walk every reference from a set of "roots" on down. (It can do this because it has type data for every structure in the system. It knows every field of every structure.) This walk can stop as soon as it reaches an object that is a higher generation than the garbage collection: e.g., every reference in a generation 1 object is to a generation 1 or 2 object, which a generation 0 sweep doesn’t care about.


Since the garbage collector can find all active references to any allocated object, the runtime doesn’t need to track reference counts for strings, dynamic arrays, and interfaces. Not tracking reference counts can save a lot of time, especially with routines that pass their string parameters on to lower-level routines.

One thing that reference counting does do better than garbage collection is resource protection. That file will get closed, that visual cue will get restored, at the moment when your interface variable goes out of scope and the object is freed. With garbage collection, you can have a finalization routine that gets called when the block is scavenged, but you have no control over when it happens. This means that a whole class of "failsafe" Delphi techniques that rely on interface finalization are invalid under .Net.

Weak references

One final nice point about garbage collection is that it lets you have weak references, just like Java does. A weak reference is a reference to a bit of data that you can regenerate, if you have to, but that you’d like to keep around, if possible, because regeneration is expensive. This is useful for things like a browser cache, or relatively infrequently used singleton objects like the system Printer or Clipboard.

When you need the data again, you can examine the weak reference’s Target property, which will either contain a valid reference or Nil. If the Target is Nil, that means the memory has been garbage collected. If the Target is not nil, you now have a normal (strong) reference, which will keep the data from being garbage collected just like any other normal reference does.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s