While working on a new version of wtrace, I am analyzing the PerfView source code to learn how its various features work internally. One of such features is the call stack resolution for ETW events. This post will show you how to use the TraceEvent library to decode call stacks, and, as an exercise, we will write a sampling process profiler. Before we start, remember to set DisablePagingExecutive to 1. That is a requirement to make call stacks work for ETW sessions.
❗ ❗ ❗ Visit wtrace.net to receive updates on wtrace and my other troubleshooting tools. ❗ ❗ ❗
This month marks ten years since I started this blog 🥂🥂🥂. On this occasion, I would like to thank you for being my reader! Let’s celebrate with a new post on ETW 🙂
Empty paths issue in the wtrace output has been bugging me for quite some time. As I started working on a new wtrace release (coming soon!), there came the right moment to fix it. I’ve seen other people struggling with this problem too, so I thought that maybe it’s worth a blog post 🙂 Wtrace uses the TraceEvent library to interact with the ETW API, and in this post, I will use this library as well. Note that this issue affects only the real-time ETW sessions.
While analyzing the PerfView source code, I stumbled upon an interesting README file in the src/OSExtensions folder:
// The OSExtensions.DLL is a DLL that contains a small number of extensions
// to the operating system that allow it to do certain ETW operations.
// However this DLL is implemented using private OS APIs, and as such should
// really be considered part of the operating system (until such time as
// the OS provide the functionality in public APIs).
// To discourage taking dependencies on these internal details we do not
// provide the source code for this DLL in the open source repo.
// IF YOU SIMPLY WANT TO BUILD PERFIVEW YOU DO NOT NEED TO BUILD OSExtensions!
// A binary copy of this DLL is included in the TraceEvent\OSExtensions.
// However we don't want this source code to be lost. So we check it in
// with the rest of the code but in an encrypted form for only those few
// OS developers who may need to update this interface. These people
// should have access to the password needed to unencrpt the file.
// As part of the build process for OSExtension.dll, we run the command 'syncEncrypted.exe'.
// This command keeps a encrypted and unencrypted version of a a file in sync.
// Currently it is run on this pair
// OSExtensions.cs <--> OSExtesions.cs.crypt
// Using a password file 'password.txt'
// Thus if the password.txt exists and OSExtesions.cs.crypt exist, it will
// unencrypt it to OSExtesions.cs. If OSExtesions.cs is newer, it will
// be reencrypted to OSExtesions.cs.crypt.
Hmm, private OS APIs seem pretty exciting, right? A simple way to check these APIs would be to disassemble the OSExtensions.dll (for example, with dnSpy). But this method would not show us comments. And for internal APIs, they might contain valuable information. So let’s see if we can do better.
Recently while examining a slow request issue (I have a plan to describe this investigation in a seperate post) it came to me that every time I open the Thread Time view it takes a moment to understand what this view actually contains. I decided to write this post for me and for any of you who share the same feeling about this window :).
Today I would like to share with you an interesting (I hope) diagnostics case in one of our system services. The IngestService (that is its name) was not starting properly for the first time – it was being killed because of exceeding the default 30s timeout. But the second try was always successful. No exception was thrown and no logs could be found in the event logs. It’s a situation when ETW traces might shed some light on what’s going on. As it was a .NET service I used PerfView to record the trace file. An important checkbox to select when diagnosing thread wait times is the Thread Time box:
After collecting the traces on production, I merged them and copied to my developer machine.