Wenfeng's Blog: April 2005

Thursday, April 28, 2005

Method Slot Table and MethodDesc

Embedded within the MethodTable is a table of slots that point to the respective method descriptors (MethodDesc), enabling the behavior of the type.

The Method Slot Table is created based on the linearized list of implementation methods laid out in the following order: Inherited virtuals, Introduced virtuals, Instance Methods, and Static Methods. The ClassLoader walks through the metadata of the current class, parent class, and interfaces, and creates the method table.

Method Descriptor (MethodDesc) is an encapsulation of method implementation as the CLR knows it. A MethodDesc is generated as a part of the class loading process and initially points to Intermediate Language (IL). Each MethodDesc is padded with a PreJitStub, which is responsible for triggering JIT compilation.

The method table slot entry actually points to the stub instead of the actual MethodDesc data structure. This is at a negative offset of 5 bytes from the actual MethodDesc and is part of the 8-byte padding every method inherits. The 5 bytes contain instructions for a call to the PreJitStub routine. Upon the first invocation, a call to the JIT compilation routine is made. After the compilation is complete, the 5 bytes containing the call instruction will be overwritten with an unconditional jump to the JIT-compiled x86 code.

CodeOrIL before JIT compilation contains the Relative Virtual Address (RVA) of the method implementation in IL. This field is flagged to indicate that it is IL. The CLR updates this field with the address of the JITed code after on-demand compilation.
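The on-demand compilation described above can be poked at from managed code. The following is only a sketch, assuming .NET 2.0 or later (RuntimeHelpers.PrepareMethod does not exist in 1.1); JitProbe and Add are made-up names for illustration. It asks the runtime to compile a method before its first call and then reads back the entry point the runtime reports. Depending on the runtime version, that address may be a stub rather than the final native code, so treat it as an opaque value.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class JitProbe
{
    // Hypothetical sample method; any ordinary non-generic method would do.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Asks the runtime to compile Add ahead of its first call and returns
    // the entry point the runtime reports for it.
    public static IntPtr PrepareAndGetEntryPoint()
    {
        MethodInfo method = typeof(JitProbe).GetMethod("Add");
        RuntimeMethodHandle handle = method.MethodHandle;

        // Forces compilation now instead of on first invocation.
        RuntimeHelpers.PrepareMethod(handle);

        // Opaque address; may point at a stub on some runtimes.
        return handle.GetFunctionPointer();
    }
}
```

Calling JitProbe.PrepareAndGetEntryPoint() and printing the result in hex gives an address that can then be passed to !u in SOS.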

Again, all the notes above are taken from the article JIT and Run: Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects by Hanu Kommalapati and Tom Christian.

A SOS Extension Issue in VS 2005 Beta 2

I'm using the SOS extension in VS 2005 Beta 2. One issue that I found is that the outputs from some SOS commands such as !DumpHeap -type or !DumpObj always display data and many messages twice. That's annoying.

Wednesday, April 27, 2005

ObjectInstance

We'll use the term ObjectInstance for the data structure located at the address pointed to by the object reference.

Before an object instance is created, the CLR looks up the loaded types, loads the type if not found, obtains the MethodTable address, creates the object instance, and populates the object instance with the TypeHandle value. The JIT compiler-generated code uses TypeHandle to locate the MethodTable for method dispatching. The CLR uses TypeHandle whenever it has to backtrack to the loaded type through MethodTable.

A typical object instance layout is as follows.

An index (a 1-based syncblk number, DWORD) into a SyncTableEntry table. For most object instances, there will be no storage allocated for the actual SyncBlock and the syncblk number will be zero. This will change when the execution thread hits statements like lock(obj) or obj.GetHashCode.
The TypeHandle that points to the MethodTable of the corresponding type.
A variable list of instance fields. The lexical sequence of member variables in the source code is not maintained in memory by default.
String literals

An object can be referenced from stack-based local variables, handle tables in the interop or P/Invoke scenarios, from registers (the this pointer and method arguments while executing a method), or from the finalizer queue for objects having finalizer methods. The OBJECTREF does not point to the beginning of the ObjectInstance but to a location one DWORD (4 bytes) into it.

ObjSize (SOS command) will not include the memory taken up by the syncblk infrastructure. Also, in the .NET Framework 1.1, the CLR is not aware of the memory taken up by any unmanaged resources like GDI objects, COM objects, file handles, and so on.

Again, all the notes above are taken from the article JIT and Run: Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects by Hanu Kommalapati and Tom Christian.

Richard Grimes on EventLog

The following are comments made by Richard Grimes on Dan Fernandez's Blog.

In the past I have written a lot of C++ code to use and manipulate the NT event log. The API was inherited from OS/2 and it is a little arcane and was in need of a replacement. I wrote several articles about programming the event log using the Win32 API and one of these was in the MSDN library. Essentially, the application provides a resource only DLL with format strings that have placeholders. These format strings are localised. The DLL is registered on the machine that reads the messages. The application merely has to provide the ID of the format string and the strings for the placeholders. The advantage of this mechanism is that the event log files are kept small and that *localisation is performed by the reader*.

The usual configuration is that when an event log file fills up it overwrites old messages, so it is important to make sure that the relevant messages in the log are read before they are overwritten. If the messages in the event log are large then you have less chance of doing this.

However, the localisation aspect is the most important. I worked on the error reporting in a distributed application used at 500 sites (something like 5000 machines) in several countries across Europe. Using the event log a machine in France, for example, could log a message and the user could read it in French, when the event log files were sent to the support centre in England the event messages could be read in English. How elegant is that?

Now let's look at how System.Diagnostics.EventLog does this. Well it provides a single resource DLL *for all applications*. This DLL has 65,000 format strings that look like this: %s, that is there is just one placeholder so that the application has to provide the entire string. This means that the messages are long (which brings up the issue of event log files filling up) and it means that localisation has to be performed *by the application*. Note that no crystal ball is provided. The application has to guess the culture of the reader, and by default the current locale of the application is used. So in my distributed application that would mean that the messages would be reported in French, and I would not be able to read them in English. The application could choose 'culture neutral' strings (i.e. US English) but that would mean that the French users would not see the message in their language.

The .NET EventLog class does not use the event log the way that it is designed to work. FYI it works in *exactly* the same way as the equivalent class in VB6, which makes me suspicious as to whether the C++ code used in the VB6 runtime was converted to C# and recompiled. I filed a bug about this in the beta and got a dismissive response from the developer stating that people have large hard discs and so could use large event log files. This showed a lack of understanding about the event log. In Whidbey, the EventLog class has been extended to allow you to use a message format DLL, but the damage has already been done.

Type Fundamentals

The CLR version of a type declaration consists of a MethodTable and an EEClass. If you use the SOS command !Name2EE, you will get the addresses of both the EEClass and the MethodTable for a type. Besides !Name2EE, you can also get both addresses using the !DumpObj command.

EEClass comes to life before the MethodTable is created. In fact, EEClass and MethodTable are logically one data structure (together they represent a single type), and were split based on frequency of use. Fields that get used a lot are in MethodTable, while fields that get used infrequently are in EEClass. Thus information needed to JIT compile functions (like names, fields, and offsets) ends up in EEClass, while information needed at run time (like vtable slots and GC information) is in MethodTable.

MethodTable and EEClass are typically allocated on the domain-specific loader heaps. Byte[] is a special case; the MethodTable and the EEClass are allocated on the loader heaps of the SharedDomain.

EEClass

There will be one EEClass for each type loaded into an AppDomain. This includes interface, class, abstract class, array, and struct. The CLR class loader creates EEClass from the metadata before MethodTable is laid out.

Each EEClass is a node of a tree tracked by the execution engine. CLR uses this tree to navigate through the EEClass structures for purposes including class loading, MethodTable layout, type verification, and type casting.

EEClass has a circular reference to MethodTable. EEClass is allocated on the LowFrequencyHeap of the AppDomain so that the operating system can better perform page management of memory, thereby reducing the working set.

EEClass has three fields to manage the node relationships between loaded types: ParentClass, SiblingChain, and ChildrenChain.
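The EEClass fields themselves are not exposed to managed code, but the parent link they track has a managed counterpart in Type.BaseType. The sketch below is an illustration of walking that parent chain through reflection, not a view of the EEClass tree itself; TypeChain and ParentChain are made-up names.

```csharp
using System;
using System.Collections.Generic;

public static class TypeChain
{
    // Walks from the given type up to System.Object via Type.BaseType,
    // collecting the simple name of each type along the way.
    public static List<string> ParentChain(Type type)
    {
        List<string> names = new List<string>();
        for (Type t = type; t != null; t = t.BaseType)
        {
            names.Add(t.Name);
        }
        return names;
    }
}
```

For example, TypeChain.ParentChain(typeof(ArgumentNullException)) yields ArgumentNullException, ArgumentException, SystemException, Exception, Object. Sibling and child relationships have no direct reflection equivalent.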

MethodTable

There will be one MethodTable for each declared type and all the object instances of the same type will point to the same MethodTable. This will contain information about the kind of type (interface, abstract class, concrete class, COM Wrapper, and proxy), the number of interfaces implemented, the interface map for method dispatch, the number of slots in the method table, and a table of slots that point to the implementations.

A pointer to the MethodTable can be acquired even in managed code through the Type.TypeHandle property, which returns a RuntimeTypeHandle. TypeHandle, which is contained in the ObjectInstance, points to an offset from the beginning of the MethodTable. This offset is 12 bytes by default and contains GC information.
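A small sketch of reading that handle from managed code follows. It relies on the Type.TypeHandle property and RuntimeTypeHandle.Value; TypeHandleDemo and its methods are made-up names. The value should be treated as opaque, and the sketch only demonstrates that all instances of one type report the same handle.

```csharp
using System;

public static class TypeHandleDemo
{
    // Returns the runtime handle value for the exact type of the instance.
    public static IntPtr HandleOf(object instance)
    {
        return instance.GetType().TypeHandle.Value;
    }

    // True when both objects are instances of the same runtime type.
    public static bool SameHandle(object a, object b)
    {
        return HandleOf(a) == HandleOf(b);
    }
}
```

TypeHandleDemo.SameHandle("a", "b") is true because both strings share one type, while TypeHandleDemo.SameHandle("a", 1) is false. Printing HandleOf(obj) in hex gives a value to compare against what !DumpObj reports for the same object.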

MethodTable is allocated on the HighFrequencyHeap of the AppDomain. One important data structure MethodTable points to is EEClass.

All the notes above are taken from the article JIT and Run: Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects by Hanu Kommalapati and Tom Christian. I do understand the meaning of "loading a type" in CLR much better after reading the article.

Tuesday, April 26, 2005

SOS Commands

I started to use SOS commands one year ago while reading Production Debugging for .NET Framework Applications. I also used SOS commands to investigate my own apps ten months ago. I was quite familiar with SOS commands then, but I have almost completely forgotten them now. I'd like to make a list of commonly used SOS commands so that I can refer to it later. The commands listed below are stolen from both Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects and Production Debugging for .NET Framework Applications.

Before you load SOS into the process, enable unmanaged code debugging from the project properties in Visual Studio .NET. Add the directory in which SOS.dll is located to the PATH environment variable.

Now you can start your VS debugging process. Then open Debug > Windows > Immediate. In the Immediate window, you can execute SOS commands.

To load SOS.dll (you must do this before you execute any SOS commands):
.load sos.dll

Use !help to get a list of debugger commands.

!DumpDomain

// DumpHeap lists all the heap contents, or all the instances of a particular type
!DumpHeap -type SimpleClass

// DumpObj dumps the contents of an instance
!DumpObj 0x00a8197c

// ObjSize dumps the space taken up by the instance
// ObjSize will not include the memory taken up by the syncblk infrastructure
// In the .NET Framework 1.1, the CLR is not aware of the memory taken up by any
// unmanaged resources like GDI objects, COM objects, file handles, and so on
!ObjSize 0x00a8197c

// disassemble an address
!u 0x00955263

// dump a method table
!DumpMT -MD 0x9552a0

// dump a method descriptor
!DumpMD 0x00955268

// find the addresses of EEClass and MethodTable for MyClass
// The first argument to Name2EE is the module name
// that can be obtained from DumpDomain command.
!Name2EE C:\Working\test\ClrInternals\Sample1.exe MyClass

!DumpClass 02ca3508

// examine the GC heap size
!eeheap -gc

!dumpheap -stat

!gcroot 17b90018

Monday, April 25, 2005

Compatibility Issues From IIS 5.0 To IIS 6.0

Our company has an application (CB/VSV) working on Win2K/IIS 5.0. Recently I was assigned the task of investigating CB/VSV compatibility issues on Win2K3/IIS 6.0. Here are a couple of issues I found:

The default installation of IIS on Windows Server 2003 doesn’t include support for ASP.NET. If your apps are .NET applications, you must check the ASP.NET option. Both the ASP.NET subcomponent and the IIS subcomponent are components of the Application Server component. After installing the ASP.NET component, one Web Service Extension named “ASP.NET v1.1.4322” is added to the list of Web Service Extensions and is allowed by default.
All other Web Service Extensions are prohibited by default. The URL of the login page of our app is "http://localhost/portal/bin/home.dll", and IIS 6.0 doesn’t know by default which executable maps to home.dll. So I had to allow the “All Unknown ISAPI Extensions” extension in IIS 6.0.

MSDN provides samples on Using System.DirectoryServices to Configure IIS. I couldn't find documentation at MSDN about the names of the properties of a DirectoryEntry object representing a virtual directory. My approach is to enumerate the properties of a virtual directory and save the data in a file. Then I manually change the settings that I’m interested in, enumerate the properties again, and save them in another file. Finally I compare the two files using WinDiff to figure out the names of the properties that I’m interested in.
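The comparison step can also be done in code instead of WinDiff. The following is a minimal sketch, assuming each snapshot has already been captured as a name-to-value dictionary (for instance by walking the Properties collection of the DirectoryEntry and turning each value into a string); PropertyDiff and ChangedNames are made-up names, and generic collections require .NET 2.0 or later.

```csharp
using System;
using System.Collections.Generic;

public static class PropertyDiff
{
    // Returns, in sorted order, the property names whose values differ
    // between the two snapshots, including names present in only one.
    public static List<string> ChangedNames(
        IDictionary<string, string> before,
        IDictionary<string, string> after)
    {
        List<string> changed = new List<string>();

        // Names that disappeared or whose value changed.
        foreach (KeyValuePair<string, string> pair in before)
        {
            string newValue;
            if (!after.TryGetValue(pair.Key, out newValue) || newValue != pair.Value)
            {
                changed.Add(pair.Key);
            }
        }

        // Names that only exist in the second snapshot.
        foreach (string name in after.Keys)
        {
            if (!before.ContainsKey(name))
            {
                changed.Add(name);
            }
        }

        changed.Sort(StringComparer.Ordinal);
        return changed;
    }
}
```

Taking one snapshot before changing a setting in the IIS manager and one after, the returned list narrows down which property names back that setting.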

Sunday, April 24, 2005

Domains Created by the CLR Bootstrap

I'm reading the article JIT and Run: Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects by Hanu Kommalapati and Tom Christian. It's an excellent article. It goes deep into the internals and is worth reading thoroughly three times. This post is my first set of notes on this article.

Before the CLR executes the first line of the managed code, it creates three application domains: the System Domain, the Shared Domain, and the Default AppDomain. The first two are singleton and can only be created through the CLR bootstrapping process facilitated by the shim—mscoree.dll and mscorwks.dll (or mscorsvr.dll for multiprocessor systems). The third is an instance of the AppDomain class that is the only named domain. Additional domains can be created from within managed code using the AppDomain.CreateDomain method or from unmanaged hosting code using the ICORRuntimeHost interface.
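From managed code, the default domain and additional domains look like this. It is only a sketch: DomainInfo and its methods are made-up names, IsDefaultAppDomain assumes .NET 2.0 or later, and CreateAndUnload assumes the .NET Framework, where creating and unloading extra domains is supported.

```csharp
using System;

public static class DomainInfo
{
    // Describes the domain the calling code is running in.
    public static string Describe()
    {
        AppDomain current = AppDomain.CurrentDomain;
        return current.FriendlyName + " (default: " + current.IsDefaultAppDomain() + ")";
    }

    // Creates an additional domain, reads back its name, and unloads it.
    // Assumes a runtime that supports multiple AppDomains.
    public static string CreateAndUnload(string name)
    {
        AppDomain extra = AppDomain.CreateDomain(name);
        try
        {
            return extra.FriendlyName;
        }
        finally
        {
            AppDomain.Unload(extra);
        }
    }
}
```

Running !DumpDomain in SOS while an extra domain is alive lists it next to the System Domain, the Shared Domain, and the default AppDomain.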

System Domain

The SystemDomain is responsible for creating and initializing the SharedDomain and the default AppDomain. It loads the system library mscorlib.dll into SharedDomain. It also keeps process-wide string literals interned implicitly or explicitly.
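Explicit interning is what String.Intern does. The sketch below (InternDemo is a made-up name) builds two separate string objects with the same content at run time and shows that interning both yields a single shared instance.

```csharp
using System;

public static class InternDemo
{
    // Interns two independently built strings with equal content and
    // reports whether they resolve to the same object reference.
    public static bool SharesInternedInstance()
    {
        string first = new string('a', 3);   // a fresh object holding "aaa"
        string second = new string('a', 3);  // another fresh object, same content

        string internedFirst = String.Intern(first);
        string internedSecond = String.Intern(second);

        return Object.ReferenceEquals(internedFirst, internedSecond);
    }
}
```

String literals in source code are interned implicitly, so no call is needed for them.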

SystemDomain is also responsible for generating process-wide interface IDs, which are used in creating InterfaceVtableMaps in each AppDomain. SystemDomain keeps track of all the domains in the process and implements functionality for loading and unloading the AppDomains.

SharedDomain

All of the domain-neutral code is loaded into SharedDomain. Mscorlib is automatically loaded into SharedDomain. Fundamental types from the System namespace like Object, ValueType, Array, Enum, String, and Delegate get preloaded into this domain during the CLR bootstrapping process. User code can also be loaded into this domain, using LoaderOptimization attributes specified by the CLR hosting app while calling CorBindToRuntimeEx. SharedDomain also manages an assembly map indexed by the base address, which acts as a lookup table for managing shared dependencies of assemblies being loaded into DefaultDomain and of other AppDomains created in managed code.

DefaultDomain

DefaultDomain is an instance of AppDomain within which application code is typically executed. Each AppDomain has its own SecurityDescriptor, SecurityContext, and DefaultContext, as well as its own loader heaps (High-Frequency Heap, Low-Frequency Heap, and Stub Heap), Handle Tables (Handle Table, Large Object Heap Handle Table), Interface Vtable Map Manager, and Assembly Cache.

The default AppDomain can't be unloaded and hence the code lives until the CLR is shut down.

LoaderHeaps

Each application domain has its own loader heaps. LoaderHeaps are meant for loading various runtime CLR artifacts and optimization artifacts that live for the lifetime of the domain. These heaps grow by predictable chunks to minimize fragmentation. The GC Heap hosts object instances while LoaderHeaps hold together the type system. Frequently accessed artifacts like MethodTables, MethodDescs, FieldDescs, and Interface Maps get allocated on a HighFrequencyHeap, while less frequently accessed data structures, such as EEClass and ClassLoader and its lookup tables, get allocated on a LowFrequencyHeap. The StubHeap hosts stubs that facilitate code access security (CAS), COM wrapper calls, and P/Invoke.

Mscorlib.dll is loaded into the SharedDomain but it is also listed against the SystemDomain. The SystemDomain and the SharedDomain use the same ClassLoader, while the Default AppDomain uses its own.

The HighFrequencyHeap initial reserve size is 32KB and its commit size is 4KB. LowFrequencyHeap and StubHeaps are initially reserved with 8KB and committed at 4KB. Each domain has an InterfaceVtableMap that is created on its own LoaderHeap during the domain initialization phase. The IVMap heap is reserved at 4KB and is committed at 4KB initially.

Global Heaps

There are four global heaps outside all the application domains: Process Heap, JIT Code Heap, GC Heap, and LOH Heap. The just-in-time (JIT) compiler generates x86 instructions and stores them on the JIT Code Heap.

Friday, April 15, 2005

Gordon Moore: Software is too complex

Gordon Moore's comments:

Graphical user interfaces are becoming too complex for people to make effective use of the underlying power of their computer.
The capability of computers keeps growing and the number of applications running keeps increasing, but the people building the interface keep growing the complexity of that.
He added that he would like a much simpler interface "but I don't know what it would look like."

Read the full story in Gordon Moore's own words here. My notes are from Gordon Moore: Software is too complex - OSNews.com.

Tuesday, April 05, 2005

Signed Messages in WSE

Signing a message in WSE is easy. The following code demonstrates that.

// Namespaces assumed for WSE 2.0
using Microsoft.Web.Services2;
using Microsoft.Web.Services2.Security;
using Microsoft.Web.Services2.Security.Tokens;
using Microsoft.Web.Services2.Security.X509;

public string InvokeHelloWorld()
{
    String sto = X509CertificateStore.MyStore;

    // Open the certificate store
    X509CertificateStore store = X509CertificateStore.CurrentUserStore(sto);
    store.OpenRead();

    // Find the certificate you want to use
    String certname = System.Configuration.ConfigurationSettings.AppSettings["CertificateName"];
    X509CertificateCollection certcoll = store.FindCertificateBySubjectString(certname);

    if (certcoll.Count == 0)
    {
        // No matching certificate: handle this
        return null;
    }
    else
    {
        X509Certificate cert = certcoll[0];
        DemoServiceWse svc = new DemoServiceWse();
        SoapContext ctx = svc.RequestSoapContext;

        // Use the certificate to sign the message
        SecurityToken tok = new X509SecurityToken(cert);
        ctx.Security.Tokens.Add(tok);
        ctx.Security.Elements.Add(new MessageSignature(tok));

        // Invoke the web service
        return svc.HelloWorld();
    }
}

The code is taken from Ingo Rammer's article Using Role-Based Security with Web Services Enhancements 2.0 with some modifications. The error handling code is also omitted.

One thing I want to point out: instead of inheriting from System.Web.Services.Protocols.SoapHttpClientProtocol as non-WSE proxies do, WSE proxies extend Microsoft.Web.Services2.WebServicesClientProtocol (the WSE 2.0 namespace), which adds a number of properties such as RequestSoapContext. So DemoServiceWse is a WSE proxy.

As you can see from the code, the top-level syntax that WSE uses for security-related tasks such as authentication and signing a message is really easy. It all lives in SoapContext.Security. You first obtain a request or response SoapContext object, which gives you SoapContext.Security, a Security object. A Security object maintains two strongly typed collections, one of security tokens and one of security elements, plus a Timestamp property. I will discuss the relationships among tokens, elements, and SOAP headers in forthcoming posts.

Sunday, April 03, 2005

Enhanced Security Configuration for Internet Explorer

Several changes were made to the default settings in Microsoft Internet Explorer 6 for Microsoft Windows Server 2003. The following are the changes to URL security zones.

Default security template for the Internet Zone is adjusted from Medium to High
Default security template for the Trusted sites is adjusted from Low to Medium

In Win2K3, automatic detection of intranet sites is disabled. ActiveX controls, script, and the Microsoft virtual machine (Microsoft VM) cannot be used from any Internet Web site. Additionally, files cannot be downloaded from these sites.

Intranet vs. Trusted sites

Default security template for Intranet (in Win2K3) is Medium-low; Intranet uses the LocalIntranet permission set.
Default security template for Trusted sites (in Win2K3) is Medium; Trusted sites use the Internet permission set.

The Medium-low security template allows NTLM credentials to be sent to sites that request them.

The order of zones from high security to low security in Win2K3 is: Restricted sites, Internet, Trusted sites, and Intranet. The order of zones from high security to low security in Win2K is: Restricted sites, Internet, Intranet, and Trusted sites.
