An introduction to the inner workings of the .NET Framework: let’s see how the CLR creates objects

by admin

Here is a translation of Hanu Kommalapati and Tom Christian’s article on the inner workings of .NET. An alternative translation is available from Microsoft.
This article discusses:

  • SystemDomain, SharedDomain, and DefaultDomain
  • Object layout and other memory specifics
  • Method table layout
  • Method dispatching

Technologies used: .NET Framework, C#

Contents

  1. Domains created by the CLR loader
  2. System domain
  3. SharedDomain
  4. Default domain
  5. Loader heaps
  6. Basics of types
  7. Object instance
  8. MethodTable
  9. Base instance size
  10. Table of method slots
  11. Method Descriptor
  12. Interface vtable map and interface map
  13. Virtual dispatch
  14. Static variables
  15. EEClass
  16. Conclusion

The common language runtime (CLR) is becoming (or has become) the main infrastructure for building applications on Windows, so a deep understanding of its inner workings will help you build efficient, industrial-strength applications.
In this article we will explore CLR internals, including object instance layout, method table layout, method dispatching, interface-based dispatching, and various data structures.
We will use very simple C# snippets; any implied use of programming language syntax is C#. Some of the data structures and algorithms discussed will change in future versions of the Microsoft® .NET Framework, but the conceptual framework will remain the same. We will use the Visual Studio® .NET 2003 debugger and the Son of Strike (SOS) debugger extension to view the data structures discussed in this article. SOS understands the CLR’s internal data structures and lets you view and save the information of interest.
See the "Son of Strike" sidebar for how to load SOS.dll into the Visual Studio .NET 2003 debugger process.
Throughout this article we will refer to the corresponding classes in the Shared Source CLI (SSCLI) implementation.
The table in Figure 1 will help you find your way through the megabytes of SSCLI source code when searching for the relevant structures.
Figure 1 SSCLI Links

Component SSCLI Path
AppDomain /sscli/clr/src/vm/appdomain.hpp
AppDomainStringLiteralMap /sscli/clr/src/vm/stringliteralmap.h
BaseDomain /sscli/clr/src/vm/appdomain.hpp
ClassLoader /sscli/clr/src/vm/clsload.hpp
EEClass /sscli/clr/src/vm/class.h
FieldDescs /sscli/clr/src/vm/field.h
GCHeap /sscli/clr/src/vm/gc.h
GlobalStringLiteralMap /sscli/clr/src/vm/stringliteralmap.h
HandleTable /sscli/clr/src/vm/handletable.h
InterfaceVTableMapMgr /sscli/clr/src/vm/appdomain.hpp
Large Object Heap /sscli/clr/src/vm/gc.h
LayoutKind /sscli/clr/src/bcl/system/runtime/interopservices/layoutkind.cs
LoaderHeaps /sscli/clr/src/inc/utilcode.h
MethodDescs /sscli/clr/src/vm/method.hpp
MethodTables /sscli/clr/src/vm/class.h
OBJECTREF /sscli/clr/src/vm/typehandle.h
SecurityContext /sscli/clr/src/vm/security.h
SecurityDescriptor /sscli/clr/src/vm/security.h
SharedDomain /sscli/clr/src/vm/appdomain.hpp
StructLayoutAttribute /sscli/clr/src/bcl/system/runtime/interopservices/attributes.cs
SyncTableEntry /sscli/clr/src/vm/syncblk.h
System namespace /sscli/clr/src/bcl/system
SystemDomain /sscli/clr/src/vm/appdomain.hpp
TypeHandle /sscli/clr/src/vm/typehandle.h

One point worth noting before we go any further: the information in this article is valid only for the .NET Framework 1.1 (which is also mostly the same as Shared Source CLI 1.0, with some notable exceptions in various interoperability scenarios) running on the x86 platform. This information will change in later versions of the .NET Framework, so please do not build your applications to depend on these internal structures.

Domains created by the CLR loader

Before the first line of managed code runs, three application domains are created. Two of them are not accessible from managed code and are not even visible to the CLR host. They can be created only by the CLR bootstrap process, carried out by the shim (mscoree.dll) and mscorwks.dll (or mscorsvr.dll on multiprocessor systems). As Figure 2 shows, these are the SystemDomain and the SharedDomain, each of which exists only once per process. The third domain is the default domain, an instance of AppDomain and the only one of the three that has a name. For a simple CLR host such as a console application, the default domain’s name is the name of the executable image. Additional domains can be created from managed code with the AppDomain.CreateDomain method, or from unmanaged hosting code through the ICorRuntimeHost interface.
Complex hosts such as ASP.NET create multiple domains, based on the number of applications in the Web site being served.
Figure 2: Domains created by the CLR loader
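The default domain described above is visible from managed code through the AppDomain class. The following is a minimal sketch (the DomainInfo class is hypothetical) that reports the current domain’s name and tries to create and unload a second domain with AppDomain.CreateDomain. It assumes a C# 6 or later compiler; on .NET Core and .NET 5+ multiple AppDomains are not supported and CreateDomain throws PlatformNotSupportedException, which the sketch catches.

```csharp
using System;

// Hypothetical helper: inspects the current AppDomain and probes whether
// additional domains can be created on the running runtime.
static class DomainInfo
{
    // Friendly name of the domain running this code
    // (for a .NET Framework console host, the executable name).
    public static string CurrentDomainName() => AppDomain.CurrentDomain.FriendlyName;

    // True when the calling code runs in the process's default domain.
    public static bool InDefaultDomain() => AppDomain.CurrentDomain.IsDefaultAppDomain();

    // Creates and immediately unloads a second domain; returns false when
    // the runtime does not support multiple AppDomains (.NET Core / .NET 5+).
    public static bool TryCreateExtraDomain(string name)
    {
        try
        {
            AppDomain extra = AppDomain.CreateDomain(name);
            AppDomain.Unload(extra);
            return true;
        }
        catch (PlatformNotSupportedException)
        {
            return false;
        }
    }
}
```

On the .NET Framework the friendly name of a console application’s default domain is the executable name (for example, Sample1.exe); other runtimes may report a different, but non-empty, name.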

System domain

The SystemDomain is responsible for creating and initializing the SharedDomain and the default AppDomain. It loads the system library, mscorlib.dll, into the SharedDomain.
The SystemDomain also keeps process-wide string literals, interned implicitly or explicitly.
String interning is an optimization that is a bit heavy-handed in the .NET Framework 1.1, because the CLR does not give assemblies the opportunity to opt out of it. In return, memory is saved by keeping a single instance of each string literal across all the application domains.
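Interning can be observed from managed code with string.Intern and reference comparisons. The sketch below (InternDemo is a hypothetical helper) assumes a runtime that interns string literals, which current .NET runtimes do by default: two identical literals share one instance, a string built at run time is a separate object, and string.Intern returns the pooled instance.

```csharp
using System;
using System.Text;

// Hypothetical helper: compares references to equal strings obtained in different ways.
static class InternDemo
{
    public static (bool literalsShared, bool builtShared, bool internedShared) Check()
    {
        string literalA = "MyString";
        string literalB = "MyString";

        // Build an equal string at run time so the compiler cannot fold it into a literal.
        string built = new StringBuilder("My").Append("String").ToString();

        return (
            object.ReferenceEquals(literalA, literalB),               // same interned literal
            object.ReferenceEquals(literalA, built),                  // separate heap object
            object.ReferenceEquals(literalA, string.Intern(built)));  // Intern returns the pooled instance
    }
}
```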
The SystemDomain is also responsible for generating process-wide interface IDs, which are used to create the interface maps (InterfaceVtableMaps) in each AppDomain.
The SystemDomain keeps track of all the domains in the process and implements the functionality for loading and unloading AppDomains.

SharedDomain

All domain-neutral code is loaded into the SharedDomain. Mscorlib, the system library, is needed by user code in all AppDomains, so it is automatically loaded into the SharedDomain. Fundamental types from the System namespace, such as Object, ValueType, Array, Enum, String, and Delegate, are preloaded into this domain during the CLR bootstrap process. User code can also be loaded into this domain if the CLR host application specifies LoaderOptimization flags when calling CorBindToRuntimeEx. A console application can load code into the SharedDomain by applying System.LoaderOptimizationAttribute to its Main method. The SharedDomain also maintains an assembly map indexed by base address, which acts as a lookup table for managing the shared dependencies of assemblies loaded into the default domain and into other AppDomains created in managed code. The default domain is where non-shared user code is loaded.

Default domain

The default domain is an instance of AppDomain within which application code typically executes. While some applications require additional AppDomains to be created at run time (such as applications with plug-in architectures or applications that generate a significant amount of code at run time), most applications create a single domain during their lifetime. All code that executes in this domain is context-bound at the domain level. If an application has multiple AppDomains, any cross-domain access goes through .NET Remoting proxies. Additional intra-domain context boundaries can be created using types that inherit from System.ContextBoundObject.
Each AppDomain has its own SecurityDescriptor, SecurityContext, and DefaultContext, as well as its own loader heaps (High-Frequency Heap, Low-Frequency Heap, and Stub Heap), handle tables (Handle Table, Large Object Heap Handle Table), Interface Vtable Map Manager, and Assembly Cache.

Loader heaps

LoaderHeaps are meant for loading various runtime CLR artifacts and optimization artifacts that live for the lifetime of the domain. These heaps grow in predictable chunks to minimize fragmentation. Loader heaps differ from the garbage collector (GC) heap (or the set of heaps on symmetric multiprocessor (SMP) machines) in that the GC heap hosts object instances, while the loader heaps hold the type system together. Frequently accessed structures such as MethodTables, method descriptors (MethodDescs), field descriptors (FieldDescs), and interface maps are allocated on the HighFrequencyHeap. Less frequently accessed structures such as EEClass and ClassLoader, along with their lookup tables, are allocated on the LowFrequencyHeap. The StubHeap hosts stubs that support code access security (CAS), COM wrapper calls, and P/Invoke.
Having looked at the domains and loader heaps at a high level, let’s look at their physical details more closely in the context of the simple application in Figure 3. We stopped the program at "mc.Method1();" and dumped the domains with the DumpDomain command of the SOS debugger extension. Here is the output:

```
!DumpDomain
System Domain: 793e9d58, LowFrequencyHeap: 793e9dbc, HighFrequencyHeap: 793e9e14, StubHeap: 793e9e6c, Assembly: 0015aa68 [mscorlib], ClassLoader: 0015ab40
Shared Domain: 793eb278, LowFrequencyHeap: 793eb2dc, HighFrequencyHeap: 793eb334, StubHeap: 793eb38c, Assembly: 0015aa68 [mscorlib], ClassLoader: 0015ab40
Domain 1: 149100, LowFrequencyHeap: 00149164, HighFrequencyHeap: 001491bc, StubHeap: 00149214, Name: Sample1.exe, Assembly: 00164938 [Sample1], ClassLoader: 00164a78
```

Figure 3 Sample1.exe

```csharp
using System;

public interface MyInterface1
{
    void Method1();
    void Method2();
}

public interface MyInterface2
{
    void Method2();
    void Method3();
}

class MyClass : MyInterface1, MyInterface2
{
    public static string str = "MyString";
    public static uint ui = 0xAAAAAAAA;
    public void Method1() { Console.WriteLine("Method1"); }
    public void Method2() { Console.WriteLine("Method2"); }
    public virtual void Method3() { Console.WriteLine("Method3"); }
}

class Program
{
    static void Main()
    {
        MyClass mc = new MyClass();
        MyInterface1 mi1 = mc;
        MyInterface2 mi2 = mc;

        int i = MyClass.str.Length;
        uint j = MyClass.ui;

        mc.Method1();
        mi1.Method1();
        mi1.Method2();
        mi2.Method2();
        mi2.Method3();
        mc.Method3();
    }
}
```

Our console application, Sample1.exe, is loaded into an AppDomain named "Sample1.exe". Mscorlib.dll is loaded into the SharedDomain, but it is also listed against the SystemDomain because it is the core system library. A HighFrequencyHeap, a LowFrequencyHeap, and a StubHeap are allocated in each domain. The SystemDomain and the SharedDomain use the same ClassLoader, while the default AppDomain uses its own.
The output does not show the reserved and committed sizes of the loader heaps. The HighFrequencyHeap initially reserves 32KB and commits 4KB.
The LowFrequencyHeap and the StubHeap initially reserve 8KB each and commit 4KB.
Also not shown in the output is the InterfaceVtableMap heap (hereafter IVMap). Each domain has an IVMap, which is created on its own loader heap during the domain’s initialization phase. The IVMap heap initially reserves 4KB and commits 4KB. We will discuss the significance of the IVMap when we explore the type layout in later sections.
Figure 2 shows the default Process Heap, the JIT Code Heap, the GC Heap for small objects (SOH), and the Large Object Heap (LOH, for objects of 85000 bytes or more) to illustrate the semantic difference between them and the loader heaps. The just-in-time (JIT) compiler generates x86 instructions and stores them on the JIT Code Heap. The GC Heap and the Large Object Heap are the garbage-collected heaps on which managed objects are allocated.

Basics of types

A type is the fundamental unit of programming in .NET. In C#, a type can be declared using the class, struct, and interface keywords. Most types are created explicitly by the programmer; however, in special interoperability cases and in remote object invocation (.NET Remoting) scenarios, the .NET CLR generates types implicitly. These generated types include COM Callable Wrappers, Runtime Callable Wrappers, and transparent proxies.
We will explore .NET type fundamentals starting from a stack frame that contains an object reference (typically, the stack is one of the places where an object instance begins its life).
The code in Figure 4 contains a simple program with a console entry point that calls a static method.
Create constructs an instance of type SmallClass, which contains a byte array used to demonstrate allocating an object instance on the Large Object Heap (LOH). The code is trivial, but it will serve our discussion.
Figure 4 Large and small objects

```csharp
using System;

class SmallClass
{
    private byte[] _largeObj;

    public SmallClass(int size)
    {
        _largeObj = new byte[size];
        _largeObj[0] = 0xAA;
        _largeObj[1] = 0xBB;
        _largeObj[2] = 0xCC;
    }

    public byte[] LargeObj
    {
        get { return this._largeObj; }
    }
}

class SimpleProgram
{
    static void Main(string[] args)
    {
        SmallClass smallObj = SimpleProgram.Create(84930, 10, 15, 20, 25);
        return;
    }

    static SmallClass Create(int size1, int size2, int size3, int size4, int size5)
    {
        int objSize = size1 + size2 + size3 + size4 + size5;
        SmallClass smallObj = new SmallClass(objSize);
        return smallObj;
    }
}
```

Figure 5 shows a snapshot of a typical fastcall stack frame stopped at a breakpoint on the "return smallObj;" line in the Create method. (Fastcall is the .NET calling convention in which arguments are passed in registers when possible, and the remaining arguments are passed on the stack from right to left and popped later by the called function.)
The value-type local variable objSize is inlined directly in the stack frame. Reference-type variables such as smallObj are stored as a fixed-size value (a 4-byte DWORD) on the stack and contain the address of an object instance allocated on the GC Heap.
In traditional C++ this would be an object pointer; in the managed world it is an object reference. Either way, it contains the address of an object instance. We will use the term ObjectInstance for the data structure located at the address held in the object reference.
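The difference between a value-type local stored inline and a reference-type local that holds only an address shows up in copy semantics. This minimal sketch uses hypothetical PointValue, PointRef, and StackDemo types; the width of a reference slot comes from IntPtr.Size, which is 4 on x86 (as in this article) and 8 in a 64-bit process.

```csharp
using System;

struct PointValue { public int X; }   // value type: data stored inline
class PointRef { public int X; }      // reference type: local holds an address

static class StackDemo
{
    // Size in bytes of an object reference in the current process.
    public static int ReferenceSlotSize() => IntPtr.Size;

    public static (int valueCopyX, int refCopyX) CopyAndMutate()
    {
        var v1 = new PointValue { X = 1 };
        var v2 = v1;          // copies the struct's bytes
        v2.X = 42;            // v1 is unaffected

        var r1 = new PointRef { X = 1 };
        var r2 = r1;          // copies only the reference (address)
        r2.X = 42;            // r1 sees the change: same instance

        return (v1.X, r1.X);
    }
}
```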
Figure 5. SimpleProgram stack and heaps
The smallObj instance on the GC Heap contains a Byte[] reference, _largeObj, to an array of 85000 bytes (note that the figure shows 85016 bytes, which is the actual storage size). The CLR treats objects of 85000 bytes or larger differently from smaller ones: large objects are allocated on the Large Object Heap (LOH), while smaller objects are created on the GC Heap, which optimizes object allocation and garbage collection. The LOH is not compacted, whereas the GC Heap is compacted whenever a garbage collection occurs. Moreover, the LOH is collected only during full garbage collections.
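The 85000-byte threshold can be probed from managed code: GC.GetGeneration typically reports LOH objects as the oldest generation (GC.MaxGeneration), while freshly allocated small objects start in generation 0. A minimal sketch, assuming the default LOH threshold has not been changed through runtime configuration; LohDemo is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: allocates a byte array and asks the GC which
// generation it belongs to. The LOH threshold applies to the whole object
// (array header included), so byte[85000], the array Figure 4 ends up
// allocating, lands on the LOH.
static class LohDemo
{
    public static int GenerationOf(int byteCount) => GC.GetGeneration(new byte[byteCount]);
}
```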
The smallObj instance contains a TypeHandle that points to the MethodTable of the corresponding type. There is one MethodTable for each declared type, and all instances of the same type point to the same MethodTable. The MethodTable holds information about the kind of type (interface, abstract class, concrete class, COM wrapper, or proxy), the number of interfaces implemented, the interface map for method dispatch, the number of slots in the method table, and a table of slots that point to the method implementations.
One important data structure the MethodTable points to is the EEClass. The CLR class loader creates the EEClass from metadata before the MethodTable is laid out. In Figure 5, SmallClass’s MethodTable points to its EEClass. These structures point to their modules and assemblies. The MethodTable and EEClass are typically allocated on the domain-specific loader heaps. Byte[] is a special case: its MethodTable and EEClass are allocated on the loader heaps of the SharedDomain. Loader heaps are AppDomain-specific, and any data structure mentioned here, once loaded, stays until the AppDomain is unloaded. The default AppDomain cannot be unloaded, so its code lives until the CLR shuts down.

Object instance

As mentioned earlier, all instances of value types are either inlined on the thread stack or inlined on the GC Heap. All reference types are created on the GC Heap or the Large Object Heap (LOH). Figure 6 shows a typical object instance layout. An object can be referenced from stack-based local variables, from handle tables in interop and P/Invoke scenarios, from registers (the this pointer and method arguments while a method is executing), or from the finalizer queue for objects that have finalizer methods. The OBJECTREF does not point to the beginning of the object instance but to a DWORD offset (4 bytes) from the beginning. That first DWORD is called the object header, and it holds an index (a 1-based syncblk number) into the SyncTableEntry table. Because the chaining is through an index, the CLR can move the table around in memory when it needs to grow.
The SyncTableEntry maintains a weak reference back to the object so that the CLR can track sync block ownership. Weak references let the GC collect the object when no other strong references exist. The SyncTableEntry also stores a pointer to the SyncBlock, which holds useful information that most object instances rarely need: the object’s lock, its hash code, any thunking data, and its AppDomain index. For most object instances no SyncBlock is allocated, and the syncblk number is zero. This changes when the executing thread hits statements such as lock(obj) or obj.GetHashCode(), as shown here:

```csharp
SmallClass obj = new SmallClass(16); // SmallClass from Figure 4; do some work here
lock (obj) { /* Do some synchronized work here */ }
obj.GetHashCode();
```

Figure 6. Representation of an object instance
In this code, obj starts with a syncblk number of zero (no syncblk) in the SyncTableEntry table. The lock statement causes the CLR to create a syncblk entry and update the header with the corresponding number. Because the C# lock keyword expands to a try-finally block that uses the Monitor class, a Monitor object is created on the SyncBlock for synchronization. The call to GetHashCode() populates the Hashcode field in the SyncBlock with the object’s hash code.
The SyncBlock contains other fields used in COM interop and in marshaling delegates to unmanaged code, but they are not relevant to typical object usage.
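The lock and GetHashCode behavior described above can be exercised with the Monitor class and RuntimeHelpers.GetHashCode. This sketch (Resource and SyncDemo are hypothetical) writes out roughly what the C# compiler generates for lock(obj) and checks that the identity hash is stable; it cannot show where the runtime actually stores the lock or hash state, which is an internal detail that differs between runtime versions.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

class Resource { }   // does not override GetHashCode, so it uses the identity hash

static class SyncDemo
{
    // Roughly what `lock (obj) { ... }` expands to: Enter/Exit in try/finally.
    public static bool HeldInsideManualLock(object obj)
    {
        bool taken = false;
        try
        {
            Monitor.Enter(obj, ref taken);
            return Monitor.IsEntered(obj);   // evaluated before finally runs: true
        }
        finally
        {
            if (taken) Monitor.Exit(obj);
        }
    }

    // The identity hash stays the same for the object's lifetime and matches
    // RuntimeHelpers.GetHashCode when GetHashCode is not overridden.
    public static bool HashIsStable(object obj) =>
        obj.GetHashCode() == obj.GetHashCode()
        && obj.GetHashCode() == RuntimeHelpers.GetHashCode(obj);
}
```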
The TypeHandle follows the syncblk number in the object instance. To keep the discussion flowing, I will discuss the TypeHandle after elaborating on the instance variables. A variable-length list of instance fields follows the TypeHandle. By default, instance fields are packed so that memory is used efficiently and padding for alignment is minimal. The code in Figure 7 shows a SimpleClass with a set of instance variables of varying sizes.
Figure 7 SimpleClass with instance variables

```csharp
class SimpleClass
{
    private byte b1 = 1;               // 1 byte
    private byte b2 = 2;               // 1 byte
    private byte b3 = 3;               // 1 byte
    private byte b4 = 4;               // 1 byte
    private char c1 = 'A';             // 2 bytes
    private char c2 = 'B';             // 2 bytes
    private short s1 = 11;             // 2 bytes
    private short s2 = 12;             // 2 bytes
    private int i1 = 21;               // 4 bytes
    private long l1 = 31;              // 8 bytes
    private string str = "MyString";   // 4 bytes (only OBJECTREF)
    // Total instance variable size = 28 bytes

    static void Main()
    {
        SimpleClass simpleObj = new SimpleClass();
        return;
    }
}
```

Figure 8 shows an instance of a SimpleClass object in the Visual Studio debugger’s memory window. We set a breakpoint on the return statement in Figure 7 and used the address of simpleObj, held in the ECX register, to display the object instance in the memory window. The first 4-byte block is the syncblk number. Because the instance is not used in any code that requires synchronization (and GetHashCode is never called), this field is set to 0. The object reference stored in the stack variable points to the 4 bytes starting at offset 4. The byte variables b1, b2, b3, and b4 are placed side by side, and so are the two short variables s1 and s2. The string variable str is a 4-byte OBJECTREF pointing to the actual string instance on the GC Heap. String is a special type in that all string literals containing the same text point to the same instance in a global string table; this is done at assembly load time. This process is called string interning and is designed to optimize memory usage. As noted earlier, in the .NET Framework 1.1 an assembly cannot opt out of interning, although future versions of the CLR may provide this capability.
Figure 8: Debug window showing an object instance in memory
Thus, by default, the lexical order of the member variables in source code is not preserved in memory. In interop scenarios where the lexical order has to be carried into memory, the StructLayoutAttribute can be used; it takes a LayoutKind enumeration value as its argument. LayoutKind.Sequential preserves the lexical order for marshaled data. In the .NET Framework 1.1 this has no effect on the managed layout (in the .NET Framework 2.0 it does). In interop scenarios where you really need extra padding and explicit control over the field order, LayoutKind.Explicit can be combined with the FieldOffset attribute at the field level. Having looked at the raw memory contents, let’s use the SOS debugger to look at the object instance. One useful command is DumpHeap, which lists the contents of the heap, including all instances of a specified type. Instead of relying on registers, DumpHeap can show the address of the object we just created:

```
!DumpHeap -type SimpleClass
Loaded Son of Strike data table version 5 from "C:/WINDOWS/Microsoft.NET/Framework/v1.1.4322/mscorwks.dll"
 Address       MT     Size
00a8197c 00955124       36
Last good object: 00a819a0
total 1 objects
Statistics:
      MT    Count TotalSize Class Name
  955124        1        36 SimpleClass
```

The total size of the object is 36 bytes. No matter how large the string is, a SimpleClass instance contains only a DWORD OBJECTREF to it. The SimpleClass instance variables take up only 28 bytes; the remaining 8 bytes are the TypeHandle (4 bytes) and the syncblk number (4 bytes). Now that we have the address of the simpleObj instance, let’s dump its contents using the DumpObj command:

```
!DumpObj 0x00a8197c
Name: SimpleClass
MethodTable 0x00955124
EEClass 0x02ca33b0
Size 36(0x24) bytes
FieldDesc*: 00955064
      MT    Field   Offset         Type     Attr    Value Name
00955124  400000a        4 System.Int64 instance       31 l1
00955124  400000b        c        CLASS instance 00a819a0 str
<< some fields omitted from the display for brevity >>
00955124  4000003       1e  System.Byte instance        3 b3
00955124  4000004       1f  System.Byte instance        4 b4
```

As noted, the default layout the C# compiler generates for classes is LayoutKind.Auto (structs get LayoutKind.Sequential), so the class loader rearranges the instance fields to minimize padding. We can use ObjSize to dump the object graph, which includes the space taken up by the string instance that str refers to. Here is the output:
```
!ObjSize 0x00a8197c
sizeof(00a8197c) = 72 ( 0x48) bytes (SimpleClass)
```
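The layout rules discussed in this section (Auto for classes, Sequential for structs, Explicit with FieldOffset) can be checked from managed code. The sketch below uses hypothetical types; Marshal.SizeOf and Marshal.OffsetOf report the marshaled (unmanaged) layout, not the managed layout the class loader chooses, and the LowByteOf result assumes a little-endian machine such as x86.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct SeqPadded { public byte B; public int I; }   // 3 padding bytes after B

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct SeqPacked { public byte B; public int I; }   // no padding

[StructLayout(LayoutKind.Explicit)]
struct Overlay                                       // both fields start at offset 0 (a C-style union)
{
    [FieldOffset(0)] public int Whole;
    [FieldOffset(0)] public byte LowByte;
}

class PlainClass { public int X; }                   // no attribute on a class: Auto
struct PlainStruct { public int X; }                 // no attribute on a struct: Sequential

static class LayoutDemo
{
    public static int SizeOfPadded() => Marshal.SizeOf<SeqPadded>();
    public static int SizeOfPacked() => Marshal.SizeOf<SeqPacked>();
    public static int OffsetOfI() => (int)Marshal.OffsetOf<SeqPadded>(nameof(SeqPadded.I));

    // Writes the int and reads back the byte sharing its first address.
    public static byte LowByteOf(int value) => new Overlay { Whole = value }.LowByte;

    // Layout kind the compiler recorded in metadata for a type.
    public static LayoutKind DefaultKind(Type t) => t.StructLayoutAttribute.Value;
}
```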
Son of Strike
SOS is the debugger extension used in this article to display the contents of CLR data structures. It is installed with the .NET Framework and is located in %windir%\Microsoft.NET\Framework\v1.1.4322. Before loading SOS into the process, enable unmanaged debugging in the project properties in Visual Studio .NET, and add the directory that contains SOS.dll to the PATH environment variable. To load SOS while stopped at a breakpoint, open Debug | Windows | Immediate and execute .load sos.dll in the Immediate window. Use !help to get a list of debugger commands. For more information about SOS, see the Bugslayer column in MSDN Magazine.
If you subtract the size of the SimpleClass instance (36 bytes) from the total size of the object graph (72 bytes), you get the size of str, 36 bytes. Let’s verify this by dumping the str instance. Here is the output:

```
!DumpObj 0x00a819a0
Name: System.String
MethodTable 0x009742d8
EEClass 0x02c4c6c4
Size 36(0x24) bytes
```

If you add the size of the str instance (36 bytes) to the size of the SimpleClass instance (36 bytes), you get a total of 72 bytes, which matches the ObjSize output. Note that ObjSize does not include the memory taken up by the syncblk infrastructure. Also, with the .NET Framework 1.1, the CLR is not aware of memory used by unmanaged resources such as GDI objects, COM objects, and file handles, so this command does not report them.
TypeHandle, a pointer to the MethodTable, is located right after the syncblk number. Before creating an object instance, the CLR looks up the loaded types, loads the type if it is not found, obtains the MethodTable address, creates the object instance, and stores that address in the instance’s TypeHandle. JIT-compiled code uses the TypeHandle to locate the MethodTable for method dispatching. The CLR also uses the TypeHandle whenever it needs to get from an object back to its loaded type through the MethodTable.

MethodTable

Each class and interface, when loaded into an AppDomain, is represented in memory by a MethodTable data structure. This is the result of class-loading activity that takes place before the first instance of the object is ever created. While the ObjectInstance represents state, the MethodTable represents behavior. The MethodTable ties the object instance, through the EEClass, to the memory-mapped metadata structures generated by the language compiler. The information in the MethodTable and the data structures hanging off it can be accessed from managed code through System.Type. A pointer to the MethodTable can be obtained even in managed code through the Type.TypeHandle property, which returns a RuntimeTypeHandle. The TypeHandle contained in the ObjectInstance points to an offset from the beginning of the MethodTable. This offset is 12 bytes by default, and those first 12 bytes contain GC information, which we will not discuss here.
Figure 9 shows a typical MethodTable layout. We will cover some of its important fields; refer to the figure for a more complete list. Let’s start with the Base Instance Size, since it correlates directly with the runtime memory profile.
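Managed code can observe the one-MethodTable-per-type relationship through RuntimeTypeHandle. In this minimal sketch (HandleSample and TypeHandleDemo are hypothetical), two instances of the same class report the same handle value, which on the CLR is the MethodTable address; the value differs from run to run and should be treated as opaque.

```csharp
using System;

class HandleSample { }

static class TypeHandleDemo
{
    // Handle of the type an object instance points to (via its TypeHandle).
    public static IntPtr HandleOfInstance(object obj) => Type.GetTypeHandle(obj).Value;

    // Handle obtained from the Type object itself.
    public static IntPtr HandleOfType(Type t) => t.TypeHandle.Value;
}
```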
Figure 9 Method Table View

Base instance size

The base instance size is the size of the object as computed by the class loader from the field declarations in the code. As discussed earlier, the current GC implementation needs an object instance of at least 12 bytes, so a class that defines no instance fields carries an overhead of 4 bytes.
The remaining 8 bytes are taken up by the object header (which may contain a syncblk number) and the TypeHandle. Again, the size of the object can be influenced by a StructLayoutAttribute.
Let’s look at a memory snapshot (the Visual Studio .NET 2003 memory window) of the MethodTable for MyClass from Figure 3 (MyClass with two interfaces) and compare it with the SOS output. In Figure 9, the object size is located at a 4-byte offset and has a value of 12 (0x0000000C) bytes. Here is the DumpHeap output from SOS:

```
!DumpHeap -type MyClass
 Address       MT     Size
00a819ac 009552a0       12
total 1 objects
Statistics:
      MT    Count TotalSize Class Name
  9552a0        1        12 MyClass
```
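The size arithmetic in this section can be written out as a small calculation. This sketch hard-codes the article’s x86 assumptions (4-byte pointers, a 4-byte object header, a 4-byte TypeHandle, and a 12-byte minimum object size) and ignores padding; BaseSizeX86 is a hypothetical helper, and a 64-bit runtime would produce different numbers.

```csharp
using System;

static class BaseSizeX86
{
    const int PointerSize = 4;     // x86 assumption from the article
    const int MinObjectSize = 12;  // minimum object size required by the GC

    // Instance fields of SimpleClass (Figure 7): 4 bytes, 2 chars, 2 shorts,
    // an int, a long, and a string reference (an OBJECTREF).
    public static int SimpleClassFieldBytes() =>
        4 * sizeof(byte) + 2 * sizeof(char) + 2 * sizeof(short)
        + sizeof(int) + sizeof(long) + PointerSize;

    // Object header + TypeHandle + instance fields, rounded up to the minimum.
    public static int BaseInstanceSize(int instanceFieldBytes) =>
        Math.Max(MinObjectSize, PointerSize /* header */ + PointerSize /* TypeHandle */ + instanceFieldBytes);
}
```

For SimpleClass this reproduces the 28 bytes of fields and the 36-byte total reported by DumpHeap; for MyClass, which has only static fields, the 8 bytes of header and TypeHandle are rounded up to the 12-byte minimum shown above.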

Table of method slots

Embedded within the MethodTable is a table of slots that point to the respective method descriptors (MethodDescs), which enable the behavior of the type. The method slot table is created from the linear list of method declarations, laid out in this order: inherited virtual methods, newly declared virtual methods, instance methods, and static methods. The class loader walks through the metadata of the current class, its parent class, and its interfaces, and builds the method table. In the layout process, overridden virtual methods are replaced, hidden parent-class methods are replaced, new slots are created, and slots are duplicated as necessary. Slot duplication is needed to create the illusion that each interface has its own mini vtable; however, the duplicated slots point to the same physical implementation.
MyClass has three instance methods, a class constructor (.cctor), and an object constructor (.ctor). The C# compiler automatically generates the object constructor for every class that does not define a constructor explicitly. The class constructor is generated by the compiler when static variables are defined and initialized. Figure 10 shows the layout of the method table for MyClass. The layout shows 10 methods because the Method2 slot is duplicated for the IVMap, which will be covered next. Figure 11 shows an edited SOS dump of the MyClass method table.
Figure 10 Representation of the MyClass method table
Figure 11 SOS dump of method table for MyClass

```
!DumpMT -MD 0x9552a0
Entry    MethodDesc Return Type Name
0097203b 00972040   String      System.Object.ToString()
009720fb 00972100   Boolean     System.Object.Equals(Object)
00972113 00972118   I4          System.Object.GetHashCode()
0097207b 00972080   Void        System.Object.Finalize()
00955253 00955258   Void        MyClass.Method1()
00955263 00955268   Void        MyClass.Method2()
00955263 00955268   Void        MyClass.Method2()
00955273 00955278   Void        MyClass.Method3()
00955283 00955288   Void        MyClass..cctor()
00955293 00955298   Void        MyClass..ctor()
```

The first four methods of any type are always ToString, Equals, GetHashCode, and Finalize; these are virtual methods inherited from System.Object. The Method2 slot is duplicated, but both entries point to the same method descriptor. Explicitly coded .cctor and .ctor methods are grouped with the static and instance methods, respectively.
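Reflection exposes the interface-to-implementation mapping that the duplicated slots represent. The sketch below re-declares trimmed versions of the Figure 3 types (method bodies left empty) and uses Type.GetInterfaceMap to confirm that Method2 from both interfaces resolves to the single MyClass.Method2; InterfaceMapDemo is a hypothetical helper, and reflection reports the logical mapping, not the physical slot layout.

```csharp
using System;
using System.Reflection;

public interface MyInterface1 { void Method1(); void Method2(); }
public interface MyInterface2 { void Method2(); void Method3(); }

// Trimmed copy of MyClass from Figure 3 (bodies omitted).
public class MyClass : MyInterface1, MyInterface2
{
    public void Method1() { }
    public void Method2() { }
    public virtual void Method3() { }
}

static class InterfaceMapDemo
{
    // Returns the class method that implements interfaceType.methodName.
    public static MethodInfo Implementation(Type classType, Type interfaceType, string methodName)
    {
        InterfaceMapping map = classType.GetInterfaceMap(interfaceType);
        for (int i = 0; i < map.InterfaceMethods.Length; i++)
        {
            // InterfaceMethods[i] is implemented by TargetMethods[i].
            if (map.InterfaceMethods[i].Name == methodName)
                return map.TargetMethods[i];
        }
        throw new ArgumentException($"{methodName} not found on {interfaceType.Name}");
    }
}
```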

Method Descriptor

A MethodDesc is an encapsulation of a method implementation as the CLR understands it. Besides managed implementations, there are several kinds of method descriptors that facilitate calls to various interop implementations. In this article we will look only at the managed MethodDesc, in the context of the code in Figure 3. A MethodDesc is generated as part of the class loading process and initially points to Intermediate Language (IL). Each MethodDesc is padded with a PreJitStub, which is responsible for triggering JIT compilation. Figure 12 shows a typical layout. The method table slot entry actually points to this stub rather than to the MethodDesc data structure itself. The stub sits at a negative offset of 5 bytes from the real MethodDesc and is part of the 8-byte padding every method inherits. These 5 bytes contain an instruction that calls the PreJitStub routine. The 5-byte offset can be seen in the SOS DumpMT output for MyClass (Figure 11): the MethodDesc is always 5 bytes after the location pointed to by the method slot table entry. On the first invocation, the JIT compilation routine is called. Once compilation is complete, the 5 bytes holding the call instruction are overwritten with an unconditional jump to the JIT-compiled x86 code.
Figure 12 Method descriptor
Disassembling the code pointed to by the entry in the method slot table in Figure 12 will show the call to PreJitStub. Here is an abbreviated output of the disassembly before JIT compilation for Method2:

!u 0x00955263
Unmanaged code
00955263 call 003C3538        ;call to PreJitStub, which triggers JIT compilation of Method2()
00955268 add eax, 68040000h   ;ignore this and the rest,
                              ;!u disassembles the MethodDesc data as if it were code

Now let's run the method and disassemble the same address:

!u 0x00955263
Unmanaged code
00955263 jmp 02C633E8         ;jump to the jitted Method2()
00955268 add eax, 0E8040000h  ;ignore this and the rest,
                              ;!u disassembles the MethodDesc data as if it were code

Only the first 5 bytes at this address are code; the rest is the data of Method2's MethodDesc. The !u command is unaware of this and produces meaningless disassembly, so you can ignore everything after the first 5 bytes.
Before JIT compilation, CodeOrIL contains the relative virtual address (RVA) of the method's IL implementation, and the field is flagged to indicate that it holds IL. After on-demand compilation, the CLR updates this field with the address of the JIT-compiled code. Let's pick one of the listed methods and dump its MethodDesc with the DumpMD command before and after JIT compilation:
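The stub-then-jump behavior can be modeled in C# with delegates. This is a hypothetical simulation, not CLR code: `SlotTable`, `InstallStub`, and the `compile` callback are invented names, a delegate stands in for the 5-byte call instruction, and overwriting the array element plays the role of patching that call into a jmp.

```csharp
using System;

// Hypothetical model of the PreJitStub mechanism: a slot first holds a stub
// that "compiles" the method, patches the slot, and forwards the call.
// Delegates stand in for machine code; nothing here touches the real CLR.
public sealed class SlotTable
{
    private readonly Func<string>[] _slots;

    // Number of times the stub path (the "JIT compiler") ran.
    public int CompileCount { get; private set; }

    public SlotTable(int slotCount) => _slots = new Func<string>[slotCount];

    // Install a stub in the slot; `compile` plays the role of the JIT compiler.
    public void InstallStub(int slot, Func<Func<string>> compile)
    {
        _slots[slot] = () =>
        {
            CompileCount++;
            Func<string> compiled = compile();
            _slots[slot] = compiled; // backpatch: later calls bypass the stub
            return compiled();
        };
    }

    public string Call(int slot) => _slots[slot]();
}
```

For example, after `table.InstallStub(5, () => () => "Method2 (jitted)")`, the first `table.Call(5)` goes through the stub, and every later call reaches the compiled delegate directly, so `CompileCount` stays at 1.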

!DumpMD 0x00955268
Method Name : [DEFAULT] [hasThis] Void MyClass.Method2()
MethodTable 9552a0
Module: 164008
mdToken: 06000006
Flags : 400
IL RVA : 00002068

After compilation, the MethodDesc looks like this:

!DumpMD 0x00955268
Method Name : [DEFAULT] [hasThis] Void MyClass.Method2()
MethodTable 9552a0
Module: 164008
mdToken: 06000006
Flags : 400
Method VA : 02c633e8
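The two dumps differ only in the CodeOrIL line. A hypothetical C# model of that field, a value plus a flag saying whether it holds an IL RVA or a native address, is sketched below. `CodeOrIL`, `FromILRva`, and `SetJittedAddress` are illustrative names, the real field's bit encoding is not modeled, and the numbers in the usage note are taken from the two dumps.

```csharp
using System;

// Hypothetical model of the CodeOrIL field: one value plus a flag that
// says whether the value is an IL RVA or the address of jitted code.
public struct CodeOrIL
{
    public uint Value { get; private set; }
    public bool IsIL { get; private set; }

    // Before JIT compilation the field holds the IL RVA, flagged as IL.
    public static CodeOrIL FromILRva(uint rva) => new CodeOrIL { Value = rva, IsIL = true };

    // After on-demand JIT compilation the field is overwritten in place.
    public void SetJittedAddress(uint address)
    {
        Value = address;
        IsIL = false;
    }

    // Formatted to resemble the last line of the DumpMD output.
    public override string ToString() =>
        IsIL ? $"IL RVA : {Value:x8}" : $"Method VA : {Value:x8}";
}
```

`CodeOrIL.FromILRva(0x2068)` prints as `IL RVA : 00002068`; after `SetJittedAddress(0x02c633e8)` the same variable prints as `Method VA : 02c633e8`.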

The flags field in the method descriptor is encoded to store information about the type of method, such as static, instance, interface method, or COM implementation.
Let's look at another complex aspect of the method table: interface implementation. Interfaces are made to look simple to the managed environment by absorbing all the complexity into the layout process. Next, we will look at how interfaces are laid out and how interface-based method dispatch really works.

IVMap and Interface Map

At offset 12 in the method table is an important pointer, IVMap. As shown in Figure 9, IVMap points to an application domain-level mapping table that is indexed by a process-level interface ID. Each interface implementation has an entry in the IVMap; if MyInterface1 is implemented by two classes, there will be two entries in the IVMap table. The entry points back to the beginning of the sub-table embedded in the MyClass method table (MethodTable), as shown in Figure 9. This is the reference through which interface methods are dispatched. The IVMap is created from the interface map information embedded in the method table, and the interface map itself is created from the class metadata while the method table is being laid out. Once the type is loaded, only the IVMap is used for method dispatch.
The field at offset 28 in the method table (Interface Map) points to the InterfaceInfo entries embedded within the method table. In our case there are two entries, one for each of the two interfaces implemented by MyClass. The first 4 bytes of the first InterfaceInfo entry point to the TypeHandle of MyInterface1 (see Figure 9 and Figure 10). The next WORD (2 bytes) holds flags (0 means inherited from the parent class, 1 means implemented in the current class). The WORD right after the flags is the start slot, which the class loader uses to lay out the interface implementation sub-table. For MyInterface1 the value is 4, which means that slots 5 and 6 (counting from 1) point to the implementation. For MyInterface2 the value is 6, so slots 7 and 8 point to the implementation. The class loader duplicates slots where necessary to create the illusion that each interface gets its own implementation, even though the duplicates physically map to the same method descriptor. In MyClass, MyInterface1.Method2 and MyInterface2.Method2 point to the same implementation.
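Reflection does not expose the IVMap or the duplicated slots, but `Type.GetInterfaceMap` reports which class method implements each interface method, which is enough to confirm the last point from C#. The interfaces and `MyClass` below are a minimal reconstruction shaped like the example in the text (an assumption, not the article's own figure), and `InterfaceMapDemo` is a hypothetical helper.

```csharp
using System;
using System.Reflection;

// Hypothetical types shaped like the MyClass example in the text.
public interface MyInterface1 { void Method1(); void Method2(); }
public interface MyInterface2 { void Method2(); void Method3(); }

public class MyClass : MyInterface1, MyInterface2
{
    public void Method1() { }
    public void Method2() { }   // implements both MyInterface1.Method2 and MyInterface2.Method2
    public virtual void Method3() { }
}

public static class InterfaceMapDemo
{
    // Returns the MyClass method that implements `methodName` of `iface`,
    // as reported by the runtime's interface mapping.
    public static MethodInfo ImplementationOf(Type iface, string methodName)
    {
        InterfaceMapping map = typeof(MyClass).GetInterfaceMap(iface);
        for (int i = 0; i < map.InterfaceMethods.Length; i++)
        {
            if (map.InterfaceMethods[i].Name == methodName)
                return map.TargetMethods[i];
        }
        throw new ArgumentException($"{iface.Name} has no method {methodName}");
    }
}
```

Both `ImplementationOf(typeof(MyInterface1), "Method2")` and `ImplementationOf(typeof(MyInterface2), "Method2")` return `MyClass.Method2`, with the same metadata token, matching the two slots that share one method descriptor.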
Interface method dispatch goes through the IVMap, while direct method dispatch goes through the MethodDesc address stored in the corresponding slot. As noted earlier, the .NET Framework uses the fastcall calling convention: the first two arguments are typically passed in the ECX and EDX registers when possible. The first argument of an instance method is always the "this" pointer, which is passed in ECX, as shown by the "mov ecx, esi" instruction:

mi1.Method1();
mov ecx, edi                   ;move "this" pointer into ecx
mov eax, dword ptr [ecx]       ;move "TypeHandle" into eax
mov eax, dword ptr [eax+0Ch]   ;move IVMap address into eax at offset 12
mov eax, dword ptr [eax+30h]   ;move the ifc impl start slot into eax
call dword ptr [eax]           ;call Method1

mc.Method1();
mov ecx, esi                   ;move "this" pointer into ecx
cmp dword ptr [ecx], ecx       ;compare and set flags
call dword ptr ds:[009552D8h]  ;directly call Method1

This disassembly shows that direct calls to the instance methods of MyClass do not use an offset: the JIT compiler writes the address of the MethodDesc directly into the code. Interface-based dispatch goes through the IVMap and needs a few more instructions than direct dispatch: one to fetch the IVMap address and another to fetch the start slot of the interface implementation within the method slot table. Casting an object instance to an interface, on the other hand, merely copies the pointer to the target variable. In Figure 2, the statement "mi1 = mc;" uses a single instruction to copy the OBJECTREF in mc to mi1.
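The difference between the two call paths can be sketched as a small C# model. Everything here is hypothetical: delegates stand in for slot entries, a `Dictionary` stands in for the IVMap, and the interface IDs are made up. Only the slot layout and the start slots 4 and 6 come from the DumpMT output and the interface map described in the text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the two dispatch paths. Slots hold delegates instead
// of MethodDesc pointers; the IVMap maps an interface id to the start slot
// of that interface's sub-table within the slot array.
public sealed class ModelMethodTable
{
    public Func<string>[] Slots { get; }
    public Dictionary<int, int> IVMap { get; } = new Dictionary<int, int>();

    public ModelMethodTable(Func<string>[] slots) => Slots = slots;

    // Direct call: the slot is known at JIT time, so no lookup is needed.
    public string CallDirect(int slot) => Slots[slot]();

    // Interface call: one extra step to find the sub-table start slot.
    public string CallViaInterface(int interfaceId, int methodIndex) =>
        Slots[IVMap[interfaceId] + methodIndex]();
}

public static class MyClassLayout
{
    public const int MyInterface1 = 1; // hypothetical interface ids
    public const int MyInterface2 = 2;

    // Slot layout from the DumpMT output: four System.Object virtuals,
    // then Method1, Method2, a duplicate Method2 slot, and Method3.
    public static ModelMethodTable Build()
    {
        Func<string> obj = () => "System.Object virtual";
        Func<string> m1 = () => "MyClass.Method1";
        Func<string> m2 = () => "MyClass.Method2";
        Func<string> m3 = () => "MyClass.Method3";
        var mt = new ModelMethodTable(new[] { obj, obj, obj, obj, m1, m2, m2, m3 });
        mt.IVMap[MyInterface1] = 4; // start slot reported for MyInterface1
        mt.IVMap[MyInterface2] = 6; // start slot reported for MyInterface2
        return mt;
    }
}
```

With this layout, `CallViaInterface(MyInterface2, 1)` resolves to slot 6 + 1 = 7 and reaches Method3, while `CallDirect(4)` reaches Method1 without consulting the IVMap.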

Virtual dispatch

Now let's look at virtual dispatch and compare it with direct and interface-based dispatch. Here is the disassembly of the virtual call to MyClass.Method3 from Figure 3:

mc.Method3();
mov ecx, esi               ;move "this" pointer into ecx
mov eax, dword ptr [ecx]   ;acquire the MethodTable address
call dword ptr [eax+44h]   ;dispatch to the method at offset 0x44

Virtual dispatch always goes through a fixed slot number, regardless of which method table pointer is involved in a given class (type) implementation hierarchy. While laying out the method table, the class loader replaces the parent implementation with the overriding child implementation. As a result, method calls coded against the parent object are dispatched to the child object's implementation. The disassembly shows that dispatch occurs through slot number 8, matching both the debugger memory window (Figure 10) and the DumpMT output.
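The fixed-slot rule can be illustrated with a hypothetical C# sketch in which a child's vtable is a copy of the parent's with the overridden entries replaced. `VtableLayout`, `BuildChild`, and `Dispatch` are invented names; index 7 corresponds to slot number 8 when counting from 1, which is the slot Method3 occupies in the DumpMT output.

```csharp
using System;

// Hypothetical sketch of laying out a child's vtable: copy the parent's
// slots, then overwrite the slots the child overrides. A call site compiled
// against slot N keeps working for every class in the hierarchy.
public static class VtableLayout
{
    public static Func<string>[] BuildChild(Func<string>[] parentSlots,
                                            params (int Slot, Func<string> Impl)[] overrides)
    {
        var child = (Func<string>[])parentSlots.Clone();
        foreach (var (slot, impl) in overrides)
            child[slot] = impl;
        return child;
    }

    // The spirit of "call dword ptr [eax+44h]": index a fixed slot in
    // whichever table the object's header points to.
    public static string Dispatch(Func<string>[] vtable, int slot) => vtable[slot]();
}
```

The design point is that the call site never changes: `Dispatch(table, 7)` reaches MyClass.Method3 through the parent's table and the overriding method through a child's table, and slots that are not overridden keep the parent's implementation.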

Static variables

Static variables are an important part of the method table data structure. They are allocated as part of the method table, right after the method slot array. All primitive static types are stored inline, while static value types such as structs, as well as static reference types, are referred to through OBJECTREFs created in handle tables. The OBJECTREF in the method table refers to an OBJECTREF in the application domain's handle table, which in turn refers to the object instance created on the heap. Once created, the OBJECTREF in the handle table keeps the object instance on the heap alive until the application domain is unloaded. In Figure 9, the static string variable str points to the OBJECTREF in the handle table, which points to MyString on the garbage-collected heap.
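The handle-table indirection itself cannot be observed from C#, but its effect can: an object referenced by a static field survives garbage collection. The sketch below is a minimal demonstration under that assumption. `MyClass` mirrors the str and ui statics of the example, and `StaticRootDemo` is a hypothetical helper. The allocation happens in a separate, non-inlined method so that no local variable in the caller keeps the object alive.

```csharp
using System;
using System.Runtime.CompilerServices;

public class MyClass
{
    public static string str;            // reference-type static: target stays reachable
    public static uint ui = 0xAAAAAAAA;  // primitive static, stored inline per the text
}

public static class StaticRootDemo
{
    // Allocate in a separate, non-inlined method: afterwards only the static
    // field (strong) and the WeakReference (weak) refer to the new string.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Allocate()
    {
        var instance = new string('x', 16);
        MyClass.str = instance;
        return new WeakReference(instance);
    }

    // True if the object is still alive after full collections.
    public static bool SurvivesCollection()
    {
        WeakReference weak = Allocate();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

`SurvivesCollection()` returns true because the static field keeps the string reachable; the weak reference alone would not.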

EEClass

EEClass comes into existence before the method table is created and, together with the method table, is the CLR's version of a type declaration. In fact, EEClass and the method table are logically one data structure (together they represent a single type) that was split based on frequency of use. Frequently used fields live in the method table, and rarely used fields live in EEClass. So the information needed to JIT-compile functions (such as names, fields, and offsets) is in EEClass, while the runtime data (such as vtable slots and garbage collector information) is in the method table.
One EEClass is created for each type loaded into the application domain, including interfaces, classes, abstract classes, arrays, and structs. Each EEClass is a node in a tree tracked by the execution engine. The CLR uses this network to navigate through EEClass structures for purposes such as class loading, method table layout, type checking, and type conversions. The child-to-parent relationship between EEClasses is established from the inheritance hierarchy, whereas the parent-to-child relationship is established from a combination of the inheritance hierarchy and the class loading sequence. New EEClass nodes are added, node relationships are patched, and new relationships are established as managed code executes. There are also horizontal relationships between sibling EEClasses in the network. EEClass has three fields for managing the relationships between loaded types: ParentClass, SiblingChain, and ChildChain. See Figure 13 for a schematic of EEClass in the context of the MyClass class from Figure 4.
Figure 13 shows only a few of the fields relevant to this discussion. Because we omitted some fields, we did not show offsets in this figure. EEClass has a circular reference to the method table. EEClass also points to the MethodDesc chunks allocated on the application domain's high-frequency heap. A reference to the list of FieldDesc objects allocated on the process heap provides field layout information during method table construction. EEClass itself is allocated on the application domain's low-frequency heap, so that the operating system can manage memory pages more effectively and the working set is reduced.
Figure 13 EEClass representation
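The three link fields can be modeled as a classic first-child/next-sibling tree. The C# sketch below is hypothetical: `EEClassNode` and `LoadChild` are invented names, and pushing each newly loaded class onto the head of its parent's ChildChain is an assumption chosen so that sibling order reflects load order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the three EEClass link fields named in the text.
public sealed class EEClassNode
{
    public string Name { get; }
    public EEClassNode ParentClass { get; }
    public EEClassNode ChildChain { get; private set; }   // first child
    public EEClassNode SiblingChain { get; private set; } // next sibling

    private EEClassNode(string name, EEClassNode parent)
    {
        Name = name;
        ParentClass = parent;
    }

    public static EEClassNode Root(string name) => new EEClassNode(name, null);

    // Called when a type is loaded: link it under its parent. The new node
    // becomes the head of the parent's ChildChain (an assumption).
    public EEClassNode LoadChild(string name)
    {
        var node = new EEClassNode(name, this) { SiblingChain = ChildChain };
        ChildChain = node;
        return node;
    }

    // Walk the parent-to-child link, then the horizontal sibling links.
    public IEnumerable<string> ChildNames()
    {
        for (var c = ChildChain; c != null; c = c.SiblingChain)
            yield return c.Name;
    }
}
```

If Program is loaded first and MyClass second, both under System.Object, the model gives MyClass a SiblingChain pointing to Program, echoing the sibling relationship discussed with the DumpClass output.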
The other fields shown in Figure 13 are secondary and do not need to be explained in the context of MyClass (Figure 3). Let’s look at the actual physical memory by creating an EEClass dump using SOS. Let’s run the program from Figure 3 after setting a breakpoint on the line mc.Method1. First, let’s get the EEClass address for MyClass using the Name2EE command:

!Name2EE C:/Working/test/ClrInternals/Sample1.exe MyClass
MethodTable: 009552a0
EEClass: 02ca3508
Name: MyClass

The first argument of the Name2EE command is the module name, which can be obtained from the DumpDomain command. Now that we know the EEClass address, we can dump the EEClass itself:

!DumpClass 02ca3508
Class Name : MyClass, mdToken : 02000004, Parent Class : 02c4c3e4
ClassLoader : 00163ad8, Method Table : 009552a0, Vtable Slots : 8
Total Method Slots : a, NumInstanceFields: 0, NumStaticFields: 2,
FieldDesc*: 00955224
MT        Field    Offset  Type           Attr    Value     Name
009552a0  4000001  2c      CLASS          static  00a8198c  str
009552a0  4000002  30      System.UInt32  static  aaaaaaaa  ui

Figure 13 and the DumpClass output look essentially the same. The metadata token (mdToken) is the index of MyClass in the memory-mapped metadata tables of the module's PE file, and the parent class points to System.Object. The sibling chain (Figure 13) shows that MyClass was loaded as a result of loading the Program class.
MyClass has eight vtable slots (methods that can be dispatched virtually). Although Method1 and Method2 are not virtual, they are treated as virtual methods when dispatched through interfaces, so they are added to the list. Add .cctor and .ctor, and you get 10 (0xA) methods in total. The class has two static fields, listed at the end of the output. MyClass has no instance fields. The remaining fields are self-explanatory.
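Part of this accounting can be checked from C# with reflection. The C# compiler emits a method that implicitly implements an interface member, but is not declared virtual, as virtual and final: `MethodInfo.IsVirtual` is true for Method1 and Method2, while `IsFinal` marks them as not overridable. Method3 is virtual and not final. The static field initializers make the compiler emit a .cctor, and the parameterless constructor is the .ctor. The types are a hypothetical reconstruction shaped like the example, and `SlotCountDemo` is an illustrative helper; reflection does not report vtable slot counts, so the totals of 8 and 10 are not verified by this sketch.

```csharp
using System;
using System.Reflection;

// Hypothetical types shaped like the MyClass example in the text.
public interface MyInterface1 { void Method1(); void Method2(); }
public interface MyInterface2 { void Method2(); void Method3(); }

public class MyClass : MyInterface1, MyInterface2
{
    public static string str = "MyString";  // initializers cause a .cctor to be emitted
    public static uint ui = 0xAAAAAAAA;
    public void Method1() { }
    public void Method2() { }
    public virtual void Method3() { }
}

public static class SlotCountDemo
{
    public static MethodInfo Get(string name) => typeof(MyClass).GetMethod(name);

    // Virtual for interface dispatch, but sealed against overriding.
    public static bool IsSealedVirtual(MethodInfo m) => m.IsVirtual && m.IsFinal;

    // .cctor shows up as TypeInitializer, .ctor as the parameterless constructor.
    public static bool HasStaticAndInstanceConstructors() =>
        typeof(MyClass).TypeInitializer != null &&
        typeof(MyClass).GetConstructor(Type.EmptyTypes) != null;
}
```

Together with the four System.Object virtuals and the duplicated Method2 slot, this matches the DumpClass figures: 4 + 4 = 8 vtable slots, plus .cctor and .ctor for 10 (0xA) method slots.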

Conclusion

Our tour of some of the most important internal components of the CLR ends here. Obviously, much remains uncovered and many topics deserve more depth, but we hope this gives you an idea of how things work. Much of the information presented here is likely to change in future releases of the CLR and the .NET Framework. Still, even if the data structures discussed in this article change, the concepts should remain the same.
