I recently had the opportunity to spend a lot of time with .Net and COM. The application’s architecture is a classic ASP web UI that calls a C# .Net 1.1 DLL, which in turn wraps a COM C++ DLL containing the business logic; the C# layer is there to abstract away some of the complexity. The issue started simply: a user loads everything up and logs out, and then the next user causes the web server to crash, sometimes with a DB driver error.
The issue was reported in a way that could be its own blog post on how users can get a bug report wrong. I began by investigating the DB driver. However, all signs indicated it was the right one.
Some more research isolated the exact sequence of actions. At first, it seemed random. As described above, the app has three layers: C++, C#, and ASP. In most of the app, the middle layer is written in VB6 rather than C#. When you log in, work, and log out in the VB6 portions, nothing happens. When you log in, work, and log out in the C# portion and then go back to VB6, nothing happens. But when you log in, work in C#, and log out, the next user to log in causes IIS to fail. This is odd: if the error was in the C# code, why did jumping through VB6 code fix it? It seems like the cause is C# or C++, but where?
So, the next step was to fire up the VC6 debugger, attach to inetinfo, and hope to catch an exception in C++. C# has much less code to check, but all of the code is suspect. C++ could be killing IIS, but since you’re logging out, the user might not get an error. This took a lot of time and got into aspects of COM I hope never to see again. Exceptions were happening as expected, but the errors made no sense. There were three areas: two in user-written code that gets run all the time, and one in computer-generated code that is not available to the programmer. I fixed one of the user code areas; the other was not changeable. The stack is worth noting too, because it gave some really odd readings. The debugger would stop on a line of code and, if the variables were readable, the address of the object might be 0x0. That’s the address of null, and it’s not possible to have an object there, much less one with properties.
It seemed like there was a bug in our Standard Template Library (STL). Windows had a report for a fixed defect with somewhat similar properties. For the moment I set this idea aside. Changing STL is very expensive and the functions that errored are very basic. A real STL bug would have shown up years ago.
The error I was able to fix in the C++ code was a clue. I could track it back to a line of C# code. So, I tried modifying the C# without changing the purpose of the code. The first attempt was splitting a one-line call into two lines, as shown below, and it was successful. I could undo my C++ change and the two lines in C# would still keep the error from occurring.
MyObject myObject = objSession.objectManager.GetNewMyObject();
to
ObjectManager objectManager = objSession.objectManager;
MyObject myObject = objectManager.GetNewMyObject();
This didn’t particularly make sense. I was happy it worked, but the advice I got was that it shouldn’t matter. However, we were getting into the interface between Microsoft’s old COM and Microsoft’s new .Net. In the end I understood exactly why this worked.
After some research into COM/.Net, related memory leaks, attempts to inspect the contents of the garbage collector, and some well-placed debug messages, I firmly believed that the cause of my defect was unreleased parts of the C++ COM app held by .Net’s garbage collector, which left memory in a bad state the next time the application ran. In the above code, objSession is released, but objectManager or myObject is not. .Net tells Windows it still has a reference to those objects. When you run the VB6 part of the application, the C# code is destroyed completely and the COM (C++) on COM (VB6) action works out the inconsistencies.
Now for the .Net wackiness. I found out to my surprise that you might have to tell .Net to release/destroy a COM object more times than you created it. For example, the function GetDependencyArray might create many more myObjects in memory. If you destroy only one the others may or may not cause a memory leak.
ObjectManager objectManager = objSession.objectManager;
MyObject myObject = objectManager.GetNewMyObject();
System.Array ar = myObject.GetDependencyArray();
If you dig a little deeper you’ll see that .Net releasing a COM object is not like anything else in another language. To destroy a COM object you release it, which signals to the Garbage Collector (GC) that it can reuse that memory when it wants to. You’re done with it. The call for the above example would be:
int count = Marshal.ReleaseComObject(myObject);
myObject = null;
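The integer that is returned is the remaining reference count on the wrapper that .Net holds for the object, the runtime callable wrapper (RCW). In my case it behaved like a tally of how many more COM objects of the same rough type were left in memory. Setting myObject to null keeps you from accidentally trying to use the object and getting an error about how the RCW has already been released.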
Since you may have multiple instances that you’re unaware of, MS suggests running this in a loop until the count is 0. In .Net 2.0, they have a function to run the loop for you, FinalReleaseComObject.
int count; // declared outside the loop so the while condition can see it
do {
    count = Marshal.ReleaseComObject(myObject);
} while (count > 0);
myObject = null;
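The loop above can be wrapped in a small helper so it isn’t repeated everywhere. This is only a sketch in current C# (it uses generics and delegates that .Net 1.1 didn’t have). The names ComRelease and ReleaseFully are hypothetical, and the release delegate is an assumption I added so the loop can be exercised without a real COM object; the one-argument overload passes in Marshal.ReleaseComObject.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of .Net: wraps the "release until 0" loop.
public static class ComRelease
{
    // Calls release(obj) until it reports a count of 0 or less.
    // Returns how many release calls were made (0 for a null reference).
    public static int ReleaseFully(object obj, Func<object, int> release)
    {
        if (obj == null) return 0;

        int calls = 0;
        int count;
        do
        {
            count = release(obj);
            calls++;
        } while (count > 0);

        return calls;
    }

    // Same loop, using the real Marshal.ReleaseComObject.
    // Only valid for actual COM wrappers; other objects make it throw.
    public static int ReleaseFully(object comObject)
    {
        return ReleaseFully(comObject, Marshal.ReleaseComObject);
    }
}
```

A call site would look like ComRelease.ReleaseFully(myObject); followed by myObject = null;. A return value greater than 1 is a hint that something else was still holding the same wrapper.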
The weird part is that, as far as I could tell, ReleaseComObject only cares about the type of myObject. This is a problem, because it provides a global destruction space for all COM objects. Any part of the application can repeatedly call ReleaseComObject and destroy the COM objects of that same type used by the rest of the program. In most languages you take the memory address of the object and release that. Once the object/address is gone, it’s gone. You can’t do it twice, and you certainly can’t destroy another object just because it has the same type.
It’s hard for me not to get really pissed off and rant for the rest of the blog. In my opinion MS fucked this up and didn’t look back. Here’s why and the cause of my month long trek to defect resolution.
I read Sam Gentile’s blog, the .Net team blog, and a few programmer websites. They all said the same thing: run ReleaseComObject in a loop, review your program to keep from wiping out too many objects, and next time don’t call COM from .Net, because it’s not designed to do that very well. You should use all .Net. After all, anything else is unmanaged code.
My requirements were to look for every single line that created a COM object and provide a destructor. It was not unreasonable for two or more parts of the application to have a reference to the same object. And it’s impossible for one part to know about another’s reference, so nothing stops it from releasing the object and causing a runtime error elsewhere. When the .Net code closes, all COM objects everywhere must be released. This solves my original defect. I never did figure out where in C# it was causing the C++ errors.
This is my solution, and it’s unique so far as I’ve looked on the web. First, you need to record the reference count of every type of COM object that’s released. Any that aren’t 0 are suspect. Next, you need to create an object cache for the COM objects. Remember, you only need one cached object for each type of COM object. You can use Reflection to get the types. Store that and your reference count. I also store the variable name for debug tracking purposes. You need two functions. The first releases once, with no loop; most times only one call is needed. If the reference count is still greater than 0, the object is saved in the cache. The other function is a FinalRelease. The name is misleading, because FinalRelease always leaves one object in memory. When your program exits, it’s time to release everything completely: go through the object cache and release each entry until its count is 0. I also have it spit out a debug message when the count is not 0. Sometimes the object cacher releases everything, then the normal destructor runs and drops the reference count to -1.
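The cache described above could be sketched roughly like this. It is not my production code, just an illustration in current C#: the names ComObjectCache, ReleaseOnce and ReleaseAll are hypothetical, keying on GetType() stands in for the Reflection lookup, and the release delegate is an assumption added so the logic can be tried without real COM objects (the default constructor uses Marshal.ReleaseComObject).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical sketch of the per-type COM object cache.
public sealed class ComObjectCache
{
    private sealed class Entry
    {
        public object Instance;  // one cached object per type
        public int LastCount;    // count reported by the last release
        public string Name;      // variable name, for debug messages only
    }

    private readonly Dictionary<Type, Entry> entries = new Dictionary<Type, Entry>();
    private readonly Func<object, int> release;

    public ComObjectCache() : this(Marshal.ReleaseComObject) { }

    public ComObjectCache(Func<object, int> release) { this.release = release; }

    public int Count { get { return entries.Count; } }

    // Releases once, no loop. If the count is still above 0, the object
    // is remembered so ReleaseAll can finish the job at shutdown.
    public int ReleaseOnce(object obj, string name)
    {
        if (obj == null) return 0;

        int count = release(obj);
        Type key = obj.GetType();
        Entry existing;
        if (count > 0)
            entries[key] = new Entry { Instance = obj, LastCount = count, Name = name };
        else if (entries.TryGetValue(key, out existing) && ReferenceEquals(existing.Instance, obj))
            entries.Remove(key);
        return count;
    }

    // Call at exit: releases every cached object until its count is 0.
    // Returns the total number of release calls made.
    public int ReleaseAll()
    {
        int calls = 0;
        foreach (Entry entry in entries.Values)
        {
            Console.Error.WriteLine("Leftover COM reference: {0} ({1}), count {2}",
                entry.Name, entry.Instance.GetType().Name, entry.LastCount);
            int count;
            do
            {
                count = release(entry.Instance);
                calls++;
            } while (count > 0);
        }
        entries.Clear();
        return calls;
    }
}
```

In use, every place that used to call ReleaseComObject directly would call cache.ReleaseOnce(myObject, "myObject") instead, and the shutdown path would call cache.ReleaseAll() once.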
The things that would have made this a lot easier would have been very easy for MS to do in .Net 2.0. After all, they had four years.
- Make the reference count a public variable.
We all know it’s there, but the only time you can see it is when you release a COM object. There is no reason not to provide an interface. Then you could record the count when your function starts and release down to that count when your function stops. SIMPLE solution. Private to public, wow.
- Provide an object cache mechanism like I just created.
I’m no genius, so why didn’t MS figure this out? Certainly, a layer to deal specifically with this type of COM/.Net interface is to be expected. What’s with the ability to release all objects globally? OMG, are you kidding me? What part of the first day of programming in the first class of CS did they not understand?
- Interops should make destructors that call ReleaseComObject.
One of the solutions I really liked, but didn’t have time to implement, was to provide a layer over the .Net Interop to the COM object. For each object you would add a destructor that calls ReleaseComObject. Now, you want to be careful not to release all COM objects, but with the first two items in place that should be easier. Why am I required to do this? Can’t the Interop? After all, it’s solely created to be the interface between .Net and COM.
- Release COM objects used by a .Net DLL when .Net is destroyed.
The root problem wouldn’t have existed if .Net kept track, like I had to do, of which .Net app owned which COM object. When that .Net app is cleaned up, so should all its COM references. Maybe technically difficult, but it seems obvious that it should be done. What’s a Garbage Collector that doesn’t collect garbage?
- Release a COM object by giving only the type.
At one point, I thought, “there’s no need for a cache, I’ll just make a COM reference to what I know is there and release them all.” After all, ReleaseComObject doesn’t need a real COM object. It just uses the type. So, why not let us pass in a type or type string and release those objects? MS, don’t abstract in complexity under the guise of simplicity.
- ReleaseComObject sets the object to null.
Why do I always have to have a second line to set the object to null after I released it? It’s dumb and makes my code verbose. Shouldn’t the object ipso facto be set to null, since I just released it? MS, why do you leave it open to cause an error when something else tries to call the released, un-nulled object?
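Two of the wishes above, release-and-null in one call and a layer over the Interop that cleans up after itself, can at least be approximated in your own code. The sketch below is an illustration only, in current C#. ComCleanup, ReleaseAndNull and ComHandle are hypothetical names, the release delegate is an assumption added so it can be tried without COM, and I’ve swapped the destructor idea for IDisposable so the release happens at a known point rather than whenever the GC gets around to it.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helpers, not part of .Net.
public static class ComCleanup
{
    // Nulls the caller's variable and releases the object once.
    // Returns the count reported by release, or 0 if it was already null.
    public static int ReleaseAndNull<T>(ref T reference, Func<object, int> release)
        where T : class
    {
        T target = reference;
        reference = null;
        return target == null ? 0 : release(target);
    }

    // Same thing, using the real Marshal.ReleaseComObject.
    public static int ReleaseAndNull<T>(ref T reference) where T : class
    {
        return ReleaseAndNull<T>(ref reference, Marshal.ReleaseComObject);
    }
}

// Thin wrapper over an interop object: releases it once on Dispose.
public sealed class ComHandle<T> : IDisposable where T : class
{
    private T target;
    private readonly Func<object, int> release;

    public ComHandle(T target) : this(target, Marshal.ReleaseComObject) { }

    public ComHandle(T target, Func<object, int> release)
    {
        this.target = target;
        this.release = release;
    }

    public T Value
    {
        get
        {
            if (target == null) throw new ObjectDisposedException("ComHandle");
            return target;
        }
    }

    // Safe to call more than once; only the first call releases.
    public void Dispose()
    {
        ComCleanup.ReleaseAndNull(ref target, release);
    }
}
```

With this, the two-line release becomes ComCleanup.ReleaseAndNull(ref myObject);, and a using block around new ComHandle<MyObject>(objectManager.GetNewMyObject()) releases the object when the block ends. It still releases only once, so it does nothing about extra references held elsewhere.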
MS knew .Net and COM would have to work together. After all, COM has been around for a decade or more by now, and thousands, maybe millions, of apps, large and small, are written using it. MS says you can call COM from .Net, but as this experience outlines, it’s not easy and it’s buggy. There should be a range of functions in .Net that tap into COM or are available to COM. Since I haven’t written the COM that my .Net app is calling, I don’t know how I could write it better for .Net. Though you should be able to, just like you had to tweak C++ for VB6.
My perspective is that of a VB6 programmer. It’s not complicated, you can write very quickly, programs are a little slow, but mostly it just works. I have no idea what managed or unmanaged code is and I don’t care. My job isn’t space shuttle software or real-time nuclear reactor control. Just make it work.
Why is it still this bad after 5-6 years? My guess is this interface deal was a hard task, and it was not given the top priority it should have had. Plenty of programmers would rather work on new and shiny than old and reliable. I’m sure it’s on the list; MS isn’t stupid, nor do they hire stupid people. Based on experience, these issues will be resolved after the switch to .Net from COM has hit the tipping point. Another 5-6 years, at around .Net 5.
I’m just pissed I had to clean up their crap knowing they will fix it themselves eventually and more efficiently than I could.