Writing a .net debugger (part 3) – symbol and source files

In this part I will show you how to load module debugging symbols (PDB files) into the debugger and how to bind them to source files. This can’t be done without diving into process, thread and module internals, so we will examine these structures as well.

After the last part (part 2), our small debugger, mindbg, attaches to the appdomains and receives events from the debuggee. Before we start dealing with symbols and sources, I will quickly explain what changes I made to the already implemented logic.

I created a new base class for all debuggee events:

/// <summary>
/// A base class for all debugging events.
/// </summary>
public class CorEventArgs
{
    private readonly CorController controller;

    /// <summary>
    /// Initializes the event instance.
    /// </summary>
    /// <param name="controller">Controller of the debugging process.</param>
    public CorEventArgs(CorController controller)
    {
        this.controller = controller;
    }

    /// <summary>
    /// Gets the controller.
    /// </summary>
    /// <value>The controller.</value>
    public CorController Controller { get { return this.controller;  } }

    /// <summary>
    /// Gets or sets a value indicating whether debugging process should continue.
    /// </summary>
    /// <value><c>true</c> if continue; otherwise, <c>false</c>.</value>
    public bool Continue { get; set; }
}

All events are now dispatched to the process they belong to. As an example, take a look at the Breakpoint event handler in CorDebugger:

void ICorDebugManagedCallback.Breakpoint(ICorDebugAppDomain pAppDomain, ICorDebugThread pThread, ICorDebugBreakpoint pBreakpoint)
{
    var ev = new CorBreakpointEventArgs(new CorAppDomain(pAppDomain, p_options), 
                                        new CorThread(pThread), 
                                        new CorFunctionBreakpoint(
                                               (ICorDebugFunctionBreakpoint)pBreakpoint));
    
    GetOwner(ev.Controller).DispatchEvent(ev);
    
    FinishEvent(ev);
}

The DispatchEvent method is implemented in CorProcess. For each event type we are interested in, there is an overloaded version of this method, for example:

/// <summary>
/// Handler for CorBreakpoint event.
/// </summary>
public delegate void CorBreakpointEventHandler(CorBreakpointEventArgs ev);

/// <summary>
/// Occurs when breakpoint is hit.
/// </summary>
public event CorBreakpointEventHandler OnBreakpoint;

internal void DispatchEvent(CorBreakpointEventArgs ev)
{
    // stops executing by default (further handlers may change this)
    ev.Continue = false;

    // calls external handlers (if any are subscribed)
    var handler = OnBreakpoint;
    if (handler != null)
        handler(ev);
}
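The dispatch pattern above (the process sets a default, external handlers may override it, and the callback then acts on ev.Continue) can be tried out in isolation. The sketch below is not mindbg code: DemoEventArgs, DemoProcess and DemoCallback are hypothetical stand-ins for CorEventArgs, CorProcess and the managed callback. It also shows why a null check matters when raising the event: invoking a delegate directly when nobody has subscribed throws a NullReferenceException.

```csharp
using System;

// Hypothetical stand-in for CorEventArgs: carries the "continue" decision.
public class DemoEventArgs : EventArgs
{
    public bool Continue { get; set; }
}

// Hypothetical stand-in for CorProcess: owns the event and the default policy.
public class DemoProcess
{
    public event Action<DemoEventArgs> OnBreakpoint;

    public void DispatchBreakpoint(DemoEventArgs ev)
    {
        // Stop by default; subscribers may flip this to true.
        ev.Continue = false;

        // The null-conditional call is safe when nobody has subscribed.
        OnBreakpoint?.Invoke(ev);
    }
}

// Hypothetical stand-in for the end of the managed callback (FinishEvent).
public static class DemoCallback
{
    // Returns what the debugger would do after dispatching the event.
    public static string Finish(DemoProcess process)
    {
        var ev = new DemoEventArgs();
        process.DispatchBreakpoint(ev);
        return ev.Continue ? "continue" : "stop";
    }
}
```

With no subscribers the demo reports "stop", matching the default set in DispatchEvent; once a handler sets Continue = true it reports "continue".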

We also want the debugger to stop at the Main method of the executable module, so we create a function breakpoint in the ModuleLoad event handler (breakpoints will be covered in more detail in the next part of the series):

internal void DispatchEvent(CorModuleLoadEventArgs ev)
{
    if (!p_options.IsAttaching)
    {
        var symreader = ev.Module.GetSymbolReader();
        if (symreader != null)
        {
            // we will set breakpoint on the user entry code
            // when debugger creates the debuggee process
            Int32 token = symreader.UserEntryPoint.GetToken();
            if (token != 0)
            {
                // FIXME should be better written (control over this breakpoint)
                CorFunction func = ev.Module.GetFunctionFromToken(token);
                CorBreakpoint breakpoint = func.CreateBreakpoint();
                breakpoint.Activate(true);
            }
        }
    }
    ev.Continue = true;
}
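The value returned by UserEntryPoint.GetToken() is a metadata token: its high byte identifies a metadata table and the remaining 24 bits hold a row number (RID) in that table. Method definitions live in table 0x06, so a method token has the form 0x06xxxxxx, while the handler above treats 0 as "no entry point". A small sketch that decodes tokens, using ordinary reflection on the running assembly rather than ICorDebug; MetadataTokenDemo and its members are hypothetical names:

```csharp
using System;
using System.Reflection;

public static class MetadataTokenDemo
{
    // Table index 0x06 is the MethodDef table in ECMA-335 metadata.
    public const int MethodDefTable = 0x06;

    // Splits a token into its table (high byte) and row id (low 24 bits).
    public static (int Table, int Rid) Decode(int token)
    {
        return ((token >> 24) & 0xFF, token & 0x00FFFFFF);
    }

    // Token of a public static method of this class, obtained via reflection.
    public static int TokenOf(string methodName)
    {
        MethodInfo m = typeof(MetadataTokenDemo).GetMethod(
            methodName, BindingFlags.Public | BindingFlags.Static)
            ?? throw new ArgumentException("No such method: " + methodName);
        return m.MetadataToken;
    }
}
```

For example, Decode(0x06000001) yields table 6 (MethodDef) and row 1, and the token of any method defined in the running assembly decodes to the MethodDef table as well.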

That’s all about events. I also made some minor changes in other parts of the application, but I don’t think they are important enough to mention in this post :). So let’s focus on the main topic.

I want to display the source code at the location where the breakpoint was hit, so first let’s subscribe to the breakpoint event of the newly created process:

var debugger = DebuggingFacility.CreateDebuggerForExecutable(args[0]);
var process = debugger.CreateProcess(args[0]);

process.OnBreakpoint += new MinDbg.CorDebug.CorProcess.CorBreakpointEventHandler(process_OnBreakpoint);

The handler code is as follows:

static void process_OnBreakpoint(MinDbg.CorDebug.CorBreakpointEventArgs ev)
{
    Console.WriteLine("Breakpoint hit.");
    
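    var source = ev.Thread.GetCurrentSourcePosition();

    // no symbols or no IL mapping for this location
    if (source == null)
    {
        Console.WriteLine("No source information available.");
        return;
    }

    DisplayCurrentSourceCode(source);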
}

Two methods here may look mysterious: CorThread.GetCurrentSourcePosition and DisplayCurrentSourceCode. Let’s start with GetCurrentSourcePosition. When a thread executes application code, it uses a stack to store functions’ local variables, arguments and return addresses. Each stack frame is therefore associated with the function that is currently using it. The most recent frame is the active frame, and we can retrieve it with the ICorDebugThread.GetActiveFrame method:

public CorFrame GetActiveFrame()
{
    ICorDebugFrame coframe;
    p_cothread.GetActiveFrame(out coframe);
    // the thread may have no active managed frame
    return coframe == null ? null : new CorFrame(coframe, s_options);
}

and use it to get the current source position:

public CorSourcePosition GetCurrentSourcePosition()
{
    CorFrame frame = GetActiveFrame();
    return frame == null ? null : frame.GetSourcePosition();
}

The active CorFrame gives us access to the function associated with it, and from there to the source position:

/// <summary>
/// Gets the currently executing function.
/// </summary>
/// <returns>The function associated with this frame, or null if there is none.</returns>
public CorFunction GetFunction()
{
    ICorDebugFunction cofunc;
    p_coframe.GetFunction(out cofunc);
    return cofunc == null ? null : new CorFunction(cofunc, s_options);
}

/// <summary>
/// Gets the source position.
/// </summary>
/// <returns>The source position, or null if it cannot be determined.</returns>
public CorSourcePosition GetSourcePosition()
{
    // only IL frames expose an IL offset (GetIP is defined on ICorDebugILFrame)
    var frame = p_coframe as ICorDebugILFrame;
    if (frame == null)
        return null;

    UInt32 ip;
    CorDebugMappingResult mappingResult;

    frame.GetIP(out ip, out mappingResult);

    if (mappingResult == CorDebugMappingResult.MAPPING_NO_INFO ||
        mappingResult == CorDebugMappingResult.MAPPING_UNMAPPED_ADDRESS)
        return null;

    CorFunction function = GetFunction();
    return function == null ? null : function.GetSourcePositionFromIP((Int32)ip);
}

The ip variable holds the instruction pointer, which (according to MSDN) is the stack frame’s offset into the function’s Microsoft intermediate language (MSIL) code. In other words, ip points to the code that is currently being executed. The question now is how to bind this instruction pointer to the actual source code line stored in a physical file. This is where symbol files come into play. Symbol files (PDB files) can be thought of as translators from binary code to human-readable source code. Unfortunately, the logic behind symbol files is quite complex, and explaining it thoroughly would take a lot of space (which might actually be a good subject for a few further posts :)). For now, let’s assume that symbol files provide us with the source file path and the line and column coordinates corresponding to our instruction pointer value.

I tried to implement the symbol readers and binders on my own, but this subject overwhelmed me, so I finally imported all symbol classes and interfaces from the MDbg source code. I will therefore just show you how to use these classes; anyone who is not satisfied with that may look at and analyze the contents of the mindbg\Symbols folder.
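The debugger performs the instruction-pointer-to-source mapping out of process, through ICorDebug and the symbol reader. For an in-process analogy, System.Diagnostics.StackFrame exposes the method of a frame, its IL offset and, when file information is requested and a PDB is available, the file and line that the runtime mapped that offset to. This is only an illustration of the concept, not what mindbg does; ActiveFrameDemo is a hypothetical name.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class ActiveFrameDemo
{
    // Inspects the innermost (active) frame, i.e. this method itself.
    // NoInlining keeps the method as a real frame on the stack.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string DescribeActiveFrame()
    {
        var frame = new StackFrame(0, true);    // 0 = innermost frame; true = read file/line from the PDB
        string name = frame.GetMethod()?.Name;  // method associated with the frame
        int ilOffset = frame.GetILOffset();     // IL offset, or StackFrame.OFFSET_UNKNOWN
        string file = frame.GetFileName();      // null when no symbols are available
        int line = frame.GetFileLineNumber();   // 0 when unknown
        return $"{name} IL_{ilOffset:X4} {file ?? "<no symbols>"}:{line}";
    }
}
```

Frame 0 plays the role of the active frame here. Without a PDB next to the assembly the file name is null and the line is 0, which roughly corresponds to GetSourcePosition returning null in the debugger.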

Each module (CorModule instance) has its own instance of the SymReader class, created with the help of SymbolBinder:

public ISymbolReader GetSymbolReader()
{
    if (!p_isSymbolReaderInitialized)
    {
        p_isSymbolReaderInitialized = true;
        p_symbolReader = ((ISymbolBinder2)GetSymbolBinder()).GetReaderForFile(
                                GetMetadataInterface<IMetadataImport>(),
                                GetName(),
                                s_options.SymbolPath);
    }
    return p_symbolReader;
}
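Note that GetSymbolReader uses a separate p_isSymbolReaderInitialized flag instead of testing p_symbolReader for null. If no PDB can be found, the reader presumably ends up null (the ModuleLoad handler above checks for that), and the flag makes sure the symbol search is not repeated on every call. System.Lazy<T> provides the same "run once, cache the result, null included" behaviour; the hypothetical CachedSymbolLookup class below illustrates it with a loader that simply counts its invocations.

```csharp
using System;

// Hypothetical holder illustrating "initialize once, even if the result is null".
public class CachedSymbolLookup<T> where T : class
{
    private readonly Lazy<T> reader;

    public int LoadAttempts { get; private set; }

    public CachedSymbolLookup(Func<T> loadReader)
    {
        // Lazy<T> runs the factory once and caches whatever it returns, null included.
        reader = new Lazy<T>(() =>
        {
            LoadAttempts++;
            return loadReader();
        });
    }

    public T GetReader() => reader.Value;
}
```

Calling GetReader twice on a lookup whose loader returns null (simulating a module without a PDB) invokes the loader only once.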

Going back to the CorFrame.GetSourcePosition method, you might have noticed that it ends by calling the GetSourcePositionFromIP method of the CorFunction instance associated with the frame. Let’s now load the source information for this function from the symbol files:

// Initializes all private symbol variables
private void SetupSymbolInformation()
{
    if (p_symbolsInitialized)
        return;

    p_symbolsInitialized = true;
    CorModule module = GetModule();
    ISymbolReader symreader = module.GetSymbolReader();
    p_hasSymbols = symreader != null;
    if (p_hasSymbols)
    {
        ISymbolMethod sm = null;
        sm = symreader.GetMethod(new SymbolToken((Int32)GetToken())); // FIXME add version
        if (sm == null)
        {
            p_hasSymbols = false;
            return;
        }
        p_symMethod = sm;
        p_SPcount = p_symMethod.SequencePointCount;
        p_SPoffsets = new Int32[p_SPcount];
        p_SPdocuments = new ISymbolDocument[p_SPcount];
        p_SPstartLines = new Int32[p_SPcount];
        p_SPendLines = new Int32[p_SPcount];
        p_SPstartColumns = new Int32[p_SPcount];
        p_SPendColumns = new Int32[p_SPcount];

        p_symMethod.GetSequencePoints(p_SPoffsets, p_SPdocuments, p_SPstartLines,
                                        p_SPstartColumns, p_SPendLines, p_SPendColumns);
    }
}
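SetupSymbolInformation stores the sequence points in parallel arrays. Since each sequence point marks the IL offset where a range of instructions begins, the point that contains a given ip is the last one whose offset is less than or equal to ip. Assuming the offsets are sorted in ascending order and distinct (the linear scan in GetSourcePositionFromIP relies on the ordering as well), Array.BinarySearch can find that point in logarithmic time. A minimal sketch with a hypothetical helper:

```csharp
using System;

public static class SequencePointLookup
{
    // Returns the index of the sequence point containing ip,
    // i.e. the last offset <= ip, or -1 if ip precedes the first point.
    public static int FindIndex(int[] offsets, int ip)
    {
        int i = Array.BinarySearch(offsets, ip);
        if (i >= 0)
            return i; // exact match: ip starts a sequence point

        // ~i is the index of the first offset greater than ip,
        // so the containing sequence point is the one just before it.
        return ~i - 1;
    }
}
```

A negative result corresponds to the p_SPoffsets[0] <= ip check failing in the original method.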

As you can see, our function is represented in the Symbol API as a SymMethod (ISymbolMethod), which contains a collection of sequence points. Each sequence point is defined by an IL offset, a source document (which gives us the file path), start and end line numbers, and start and end column indexes. The IL offset is the value that interests us most, because it can be compared directly with the ip variable (which holds the instruction pointer value). So finally we are ready to implement the CorFunction.GetSourcePositionFromIP method:

public CorSourcePosition GetSourcePositionFromIP(Int32 ip)
{
    SetupSymbolInformation();
    if (!p_hasSymbols)
        return null;

    if (p_SPcount > 0 && p_SPoffsets[0] <= ip)
    {
        Int32 i;
        // find a sequence point that the given instruction
        // pointer belongs to
        for (i = 0; i < p_SPcount; i++)
        {
            if (p_SPoffsets[i] >= ip)
                break;
        }

        // ip does not belong to any sequence point
        if (i == p_SPcount || p_SPoffsets[i] != ip)
            i--;

        CorSourcePosition sp = null;
        if (p_SPstartLines[i] == SpecialSequencePoint)
        {
            // special type of sequence point
            // it indicates that the source code 
            // for this part is hidden from the debugger

            // search backward for the last known line 
            // which is not a special sequence point
            Int32 noSpecialSequencePointInd = i;
            while (--noSpecialSequencePointInd >= 0)
                if (p_SPstartLines[noSpecialSequencePointInd] != SpecialSequencePoint)
                    break;

            if (noSpecialSequencePointInd < 0)
            {
                // if not found in backward search
                // search forward for the first known line
                // which is not a special sequence point
                noSpecialSequencePointInd = i;
                while (++noSpecialSequencePointInd < p_SPcount)
                    if (p_SPstartLines[noSpecialSequencePointInd] != SpecialSequencePoint)
                        break;
            }

            Debug.Assert(noSpecialSequencePointInd >= 0);
            if (noSpecialSequencePointInd < p_SPcount)
            {
                sp = new CorSourcePosition(true,
                                           p_SPdocuments[noSpecialSequencePointInd].URL,
                                           p_SPstartLines[noSpecialSequencePointInd],
                                           p_SPendLines[noSpecialSequencePointInd],
                                           p_SPstartColumns[noSpecialSequencePointInd],
                                           p_SPendColumns[noSpecialSequencePointInd]);
            }
        }
        else
        {
            sp = new CorSourcePosition(false, p_SPdocuments[i].URL, p_SPstartLines[i], p_SPendLines[i],
                                        p_SPstartColumns[i], p_SPendColumns[i]);
        }
        return sp;
    }
    return null;
}
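Hidden sequence points are marked in PDB files with the special line number 0xFEEFEE, which is presumably the value of the SpecialSequencePoint constant used above. The fallback logic (search backward for a visible line, then forward) can be pulled out into a small helper that is easy to test; HiddenSequencePoints.ResolveVisible below is a hypothetical sketch that works only on the start-line array.

```csharp
public static class HiddenSequencePoints
{
    // Line number that PDBs use to mark a hidden sequence point.
    public const int SpecialSequencePoint = 0xFEEFEE;

    // Returns the index of the nearest visible sequence point for index i:
    // i itself if visible, else the closest earlier one, else the closest
    // later one; -1 if every sequence point is hidden.
    public static int ResolveVisible(int[] startLines, int i)
    {
        if (startLines[i] != SpecialSequencePoint)
            return i;

        for (int j = i - 1; j >= 0; j--)
            if (startLines[j] != SpecialSequencePoint)
                return j;

        for (int j = i + 1; j < startLines.Length; j++)
            if (startLines[j] != SpecialSequencePoint)
                return j;

        return -1;
    }
}
```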

The second mysterious method from the beginning of the post, DisplayCurrentSourceCode, looks as follows:

static void DisplayCurrentSourceCode(CorSourcePosition source)
{
    SourceFileReader sourceReader = new SourceFileReader(source.Path);

    // Print the lines covered by the sequence point
    Debug.Assert(source.StartLine < sourceReader.LineCount && source.EndLine < sourceReader.LineCount);
    if (source.StartLine >= sourceReader.LineCount ||
        source.EndLine >= sourceReader.LineCount)
        return;

    for (Int32 i = source.StartLine; i <= source.EndLine; i++)
    {
        String line = sourceReader[i];
        bool highlighting = false;

        // for each line highlight the code covered by the sequence point
        for (Int32 col = 0; col < line.Length; col++)
        {
            if (source.EndColumn == 0 || (col >= source.StartColumn - 1 && col <= source.EndColumn))
            {
                // highlight
                if (!highlighting)
                {
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    highlighting = true;
                }
            }
            else if (highlighting)
            {
                // back to normal display
                Console.ForegroundColor = ConsoleColor.Gray;
                highlighting = false;
            }
            Console.Write(line[col]);
        }

        // restore the default color and finish the line
        Console.ResetColor();
        Console.WriteLine();
    }
}
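The code of SourceFileReader is not shown in the post. A minimal, hypothetical reader in the same spirit could look like the sketch below. It assumes 1-based line numbers, matching the line numbers stored in sequence points; the real mindbg class may index lines differently, so the bounds checks in DisplayCurrentSourceCode should follow whichever convention the reader actually uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical reader: loads a source file once and serves lines by number.
public class SimpleSourceFileReader
{
    private readonly List<string> lines;

    public SimpleSourceFileReader(string path)
    {
        // Read the whole file at once; each element is one line without its terminator.
        lines = new List<string>(File.ReadAllLines(path));
    }

    public int LineCount => lines.Count;

    // 1-based indexer, matching the line numbers stored in sequence points.
    public string this[int lineNumber]
    {
        get
        {
            if (lineNumber < 1 || lineNumber > lines.Count)
                throw new ArgumentOutOfRangeException(nameof(lineNumber));
            return lines[lineNumber - 1];
        }
    }
}
```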

The SourceFileReader class is just a simple text file reader that reads the whole file at once and stores all lines in a collection of strings. What’s the final result? Have a look:

There is a lot more to say about symbols and source files. I hope that in further posts I will show you how to download symbols from a symbol store and source files from repositories. As usual, the source code for this post can be found at mindbg.codeplex.com (revision 55200).

3 thoughts on “Writing a .net debugger (part 3) – symbol and source files”

  1. joymon February 27, 2012 / 11:19

    I am trying to write a debugger using the Microsoft-provided MDbgCore.dll. Below is what I am trying to achieve.

    – From a list of break points (xml file) I set the break points in the application.
    – Upon breakpoint notification I am just printing the function name.
    – I am using the StopReason method instead of the event CorProcess.OnBreakpoint
    – This is because I am not able to get the function name in the event handler.

    But upon hitting a breakpoint I am getting a string value “The stop reasonUnexpected raw breakpoint hit” instead of the StopReason object.

    Can you help me resolve this? Or do you have any suggestions on how to get the function name in the OnBreakpoint event?

    Thanks in advance

    • Sebastian Solnica March 5, 2012 / 14:25

      Hi joymon,

      Sorry for the delay in my response. The breakpoint in the Main method is set automatically when you start debugging an application, unless you modified the DebuggingFacility code. Could you specify how and when you set breakpoints? Is it on debugger startup? Also, it is possible to get the function name in the OnBreakpoint event, but this functionality is not yet implemented in mindbg 😦 You may want to try the MDbg debugger (http://www.microsoft.com/download/en/details.aspx?id=2282), which will allow you to walk the call stack on a breakpoint and check the current function name.
