Props to a post I happened to see on reddit around the time I was ready to publish this: https://www.reddit.com/r/Windows10/comments/9d7uqt/yo/

One Method of Simplifying DMP File Analysis

Some time ago we implemented a new VMware Horizon View based VDI  environment and for a while everything seemed to be going well.  Over time customers would periodically call to report that their VDI’s appeared to reboot and in reviewing the machines we noticed some unusual patterns.  After digging around a bit we found cause to be concerned because a significant number of VDI’s were BSOD’ing regularly.

Investigation

Every time a customer reported that their VDI seemed to spontaneously restart I’d check the Event Logs to confirm the restart was due to a bugcheck and not due to Windows Updates.  Once confirmed I’d check for memory dumps then analyze them.  After a while more customers became vocal about the stability of their VDI and the process I was following became tedious:

  1. Check Event Viewer to validate bugcheck vs other process initiated restarts
  2. Check C:\Windows for a MEMORY.dmp
  3. Check C:\Windows\minidump for *.dmp’s
  4. Move the .dmp files to a staging area
  5. Analyze each file with WinDBG

Too many clicks and keyboard action if you ask me. (^_^)

DMP File Analysis Simplification

After evaluating a number of VDI’s I found that in nearly every case the VDI did indeed experience a bugcheck-initiated restart so I stopped performing step 1.  And truthfully, a machine shouldn’t have any .DMP files so even if one was present I wanted to know about it.  With that out of the way I decided to automate the rest of the process in a manner that was good enough for me via a PowerShell script.

Check for & Pull .DMP File(s)

The first function in the script would check key areas for .DMP files, move them to a storage location then populate an array with details on whether or not a .DMP was found.


# centralized bsod collection location
[string]$StorageLocation = '\\Server01\Playground$\CrashDumpStorage'

# store the details of the crash files found
[System.Collections.ArrayList]$CrashDumpDetails = @()

# name of the machine that BSODd
[string]$BSODComputer = 'BSODSystem01'

# path to remote machine
[string]$RemotePath = '\\' + $BSODComputer + '\C$'

# path to data storage location
[string]$MoveDestination = Join-Path $StorageLocation $BSODComputer

# check for memory.dmp
[bool]$MemoryDMP = $false
$MemoryDMPDate = ''
if(Test-Path -Path "$RemotePath\Windows\Memory.dmp" -PathType Leaf -ErrorAction Stop)
    {
        if(!(Test-Path -Path $MoveDestination -PathType Container -ErrorAction Stop))
            {
                New-Item -Path $MoveDestination -ItemType directory -ErrorAction Stop | Out-Null
            }

        [bool]$MemoryDMP = $true

        $MemoryDMPDate = (Get-ChildItem -Path "$RemotePath\Windows\Memory.dmp").LastWriteTime
        Move-Item -Path "$RemotePath\Windows\Memory.dmp" -Destination $MoveDestination -Verbose -Force -ErrorAction Stop
    }

# check for mini dumps
[bool]$MiniDMP = $false
$MiniDMPCount = 0
if(Test-Path -Path "$RemotePath\Windows\Minidump" -PathType Container)
    {
        $MiniDMPCount = (Get-ChildItem -Path "$RemotePath\Windows\Minidump" -ErrorAction Stop | Measure-Object).Count
        if($MiniDMPCount -gt 0)
            {
                if(!(Test-Path -Path $MoveDestination -PathType Container -ErrorAction Stop))
                    {
                        New-Item -Path $MoveDestination -ItemType directory -ErrorAction Stop | Out-Null
                    }

                [bool]$MiniDMP = $true
                # move the five oldest minidumps to the storage location
                Get-ChildItem -Path "$RemotePath\Windows\Minidump" | Sort-Object -Property LastWriteTime | Select-Object -First 5 | ForEach-Object { Move-Item -Path $_.FullName -Destination $MoveDestination -Verbose -Force -ErrorAction Stop }
            }
    }

# add details to arraylist (Add() appends in place instead of rebuilding the collection with +=)
[void]$CrashDumpDetails.Add([pscustomobject]@{
    ComputerName = $BSODComputer;
    DumpFileDir = $MoveDestination;
    MemoryDMP = $MemoryDMP;
    MemoryDMPDate = $MemoryDMPDate;
    MiniDMP = $MiniDMP;
    MiniDMPCount = $MiniDMPCount;
})

I would store the output of the function in a variable so the operator (typically me) can see the stats:


$GetCrashDumpFiles | Format-Table -AutoSize

ComputerName    Status DumpFileDir                            MemoryDMP MemoryDMPDate        MiniDMP MiniDMPCount OtherDMP OtherDMPCount
------------    ------ -----------                            --------- -------------        ------- ------------ -------- -------------
BSODMACHINE001  OK     \\Server01\Playground$\BSODMACHINE001      False                        False            0    False             0
BSODMACHINE002  OK     \\Server01\Playground$\BSODMACHINE002       True 9/4/2018 10:35:02 AM    True            2    False             0

Analyzing the .DMP Files

Now that all the .DMP files are in one place I can analyze them with cdb.exe from the Debugging Tools for Windows.

I settled on using a run script file instead of hard-coding the command in the function, which lets me point to a specific file depending on the operation.  I had a few ‘analysis command files’ (aka run script files) with various commands within, but my core file contains only !analyze -v, which is sufficient for my day-to-day tasks.
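As a rough illustration of the run script file idea, the sketch below writes a core analysis file containing only !analyze -v and builds the matching -c argument for cdb.  The scratch folder under the temp directory is an assumption standing in for the share used in this post, and $CdbCommand is just an illustrative variable name; the $$<file;Q form is the same one the analysis loop in this post passes to cdb.

```powershell
# Assumption: a scratch folder under the temp directory stands in for the real share
$ScratchDir = Join-Path ([System.IO.Path]::GetTempPath()) 'AutomaticAnalysis'
if(!(Test-Path -Path $ScratchDir -PathType Container))
    {
        New-Item -Path $ScratchDir -ItemType Directory -ErrorAction Stop | Out-Null
    }

# the core run script file holds a single debugger command
[string]$AnalysisCommandsFile = Join-Path $ScratchDir 'Core.Analysis.1.txt'
Set-Content -Path $AnalysisCommandsFile -Value '!analyze -v' -Encoding Ascii

# $$< tells the debugger to read its commands from the file; Q quits when they finish
# single quotes keep PowerShell from trying to expand the dollar signs
[string]$CdbCommand = '$$<' + $AnalysisCommandsFile + ';Q'
$CdbCommand
```

Pointing at a different set of commands then only means changing $AnalysisCommandsFile; the cdb call itself stays the same.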

From there it was just a matter of looping through the directories for .DMP files, running cdb against them and storing the output in a text file for review.


# hold the cdb analysis results
[System.Collections.ArrayList]$AnalysisDetails = @()

# run script file / analysis command file
[string]$AnalysisCommandsFile = "\\Server01\Playground$\DebuggingTools\AutomaticAnalysis\Core.Analysis.1.txt"

# path to cdb.exe
[string]$CDBEXE = '\\Server01\Playground$\DebuggingTools\10\x64\cdb.exe'

# loop through each dmp file and analyze it
foreach($DMPFile in $(get-childitem -path '\\Server01\Playground$\BSODMACHINE002' -Filter *.dmp))
    {
        # generate a log for each dmp file named after both the dump file and the run script file / analysis command file used
        $Log = (Split-Path -Path $DMPFile.FullName -Parent) + "\" + (Split-Path -Path $DMPFile.FullName -Leaf) + '_' + (Split-Path -Path $AnalysisCommandsFile -Leaf).ToString().Replace('.txt','.log')

        # run cdb against the .dmp file using the run script file / analysis command file
        $ServiceTiming = Measure-Command { &$CDBEXE -z "$($DMPFile.FullName)" -c "`$`$<$AnalysisCommandsFile;Q" | Tee-Object -FilePath $Log }

        # capture the exit code
        $CDBExitCode = $LASTEXITCODE

        # populate the analysis results into the array (Add() appends in place instead of rebuilding it with +=)
        [void]$AnalysisDetails.Add([pscustomobject]@{
            DumpFile = $DMPFile.FullName;
            Date = $DMPFile.LastWriteTime;
            CDBStatus = $CDBExitCode;
            AnalysisDuration = $ServiceTiming;
        })

        Remove-Variable Log,ServiceTiming,CDBExitCode -ErrorAction SilentlyContinue
    }

And you'll end up with a log file for each .DMP file in the directory.

The output again provides some potentially useful stats:


$AnalysisDetails | Sort -Descending -Property Date | Format-Table -AutoSize

DumpFile                                                  Date                 CDBStatus AnalysisDuration
--------                                                  ----                 --------- ----------------
\\Server01\Playground$\BSODMACHINE002\090418-15656-01.dmp 9/4/2018 10:35:25 AM         0 00:00:41.0912218
\\Server01\Playground$\BSODMACHINE002\Memory.dmp          9/4/2018 10:35:02 AM         0 00:00:11.0954740
\\Server01\Playground$\BSODMACHINE002\070218-10000-01.dmp 7/2/2018 9:49:21 AM          0 00:00:29.8022249

And although this is pretty helpful as is, you’re left to open each .log file to see what’s going on.  But that’s no fun.

[Try to] Determine Probable Cause

The bugcheck analysis typically lists the file that was likely the cause of the BSOD right at the top of the log and since the formatting is consistent, we can just extract that from the file.

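foreach($LogFile in $(Get-ChildItem -Path '\\Server01\Playground$\BSODMACHINE002' -Filter *.log))
    {
        # take only the first match; a log without the line would otherwise throw when calling a method on $null
        $CauseMatch = Select-String -Path $LogFile.FullName -Pattern 'Probably caused by\s*:\s*(.+)$' | Select-Object -First 1
        $ProbablyCausedBy = if($CauseMatch) { $CauseMatch.Matches[0].Groups[1].Value.Trim() } else { 'Unknown' }
        Write-Host "$($LogFile.FullName) BSOD probably caused by: $ProbablyCausedBy"
        Remove-Variable CauseMatch,ProbablyCausedBy -ErrorAction SilentlyContinue
    }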

That output looks like this:


\\Server01\Playground$\BSODMACHINE002\070218-10000-01.dmp_Core.Analysis.1.log BSOD probably caused by: vm3dmp.sys ( vm3dmp+25900 )
\\Server01\Playground$\BSODMACHINE002\090418-15656-01.dmp_Core.Analysis.1.log BSOD probably caused by: vm3dmp.sys ( vm3dmp+25760 )
\\Server01\Playground$\BSODMACHINE002\Memory.dmp_Core.Analysis.1.log BSOD probably caused by: vm3dmp.sys ( vm3dmp+25760 )
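Once more than a handful of machines are involved, it can help to count how often each module gets blamed rather than read the lines one by one.  The following is only a sketch of that idea: Get-ProbableCause is a hypothetical helper name (it is not part of the original script), and the demo logs are throwaway files in the temp directory whose contents mimic the "Probably caused by" lines shown above.

```powershell
# Hypothetical helper: returns one object per log with the module named on the "Probably caused by" line
function Get-ProbableCause
    {
        param([Parameter(Mandatory)][string]$LogDirectory)

        foreach($LogFile in Get-ChildItem -Path $LogDirectory -Filter *.log)
            {
                # first matching line only; capture the module name that follows the colon
                $CauseMatch = Select-String -Path $LogFile.FullName -Pattern 'Probably caused by\s*:\s*(\S+)' | Select-Object -First 1
                $Module = 'Unknown'
                if($CauseMatch) { $Module = $CauseMatch.Matches[0].Groups[1].Value }

                [pscustomobject]@{
                    LogFile = $LogFile.Name
                    Module = $Module
                }
            }
    }

# demo data (assumption): three throwaway logs, one of them without a verdict line
$DemoDir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -Path $DemoDir -ItemType Directory | Out-Null
Set-Content -Path (Join-Path $DemoDir 'A.dmp_Core.Analysis.1.log') -Value 'Probably caused by : vm3dmp.sys ( vm3dmp+25900 )'
Set-Content -Path (Join-Path $DemoDir 'B.dmp_Core.Analysis.1.log') -Value 'Probably caused by : vm3dmp.sys ( vm3dmp+25760 )'
Set-Content -Path (Join-Path $DemoDir 'C.dmp_Core.Analysis.1.log') -Value 'no verdict in this log'

# group by module so the most frequently blamed one floats to the top
$CauseSummary = Get-ProbableCause -LogDirectory $DemoDir | Group-Object -Property Module | Sort-Object -Property Count -Descending
$CauseSummary | Select-Object Count, Name | Format-Table -AutoSize
```

Grouping on the module name alone (and ignoring the offset in parentheses) is a deliberate choice here, since the offsets differ between crashes even when the same driver is at fault.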

Putting it All Together

With this done, I can now throw an array of machines at the script to kick off the whole process, walk away and come back to some meaningful information.

After much analysis, web scouring, testing and opening a case with VMware, we confirmed that the root cause of our VDI’s BSOD’ing was a flaw [bug] in our version of VMware Tools, 10.1.15.  Basically, when someone reconnected to their session and the display parameters had changed, the machine would BSOD.  For example, someone might start the VDI session on their home machine at a resolution of 1024×768, then come into the office and resume the session at 1920×1080; a difference in DPI may also have been a trigger.  The customer would never see a BSOD because of the nature of the VDI client – it would just look like a [spontaneous] restart.

The fix was to upgrade to VMware Tools 10.2.x, but VDA 7.2.x wasn’t compatible with VMware Tools 10.2.x, which meant also upgrading the VDA to 7.4.x.
The correct remediation procedure looks like this:

  • Uninstall VDA 7.2.x
  • Uninstall VMware Tools 10.1.x
  • Reboot
  • Install VMware Tools 10.2.x
  • Reboot
  • Install VDA 7.4.x
  • Reboot

If you’re using AppVolumes or a GPU then you’ll need to modify this accordingly because order of operations is key!

Because this process required multiple restarts, and more importantly killed one’s ability to reconnect to the machine, I created a Task Sequence to handle this.  Once the VDI’s were updated, they were rock solid.

The full .DMP analysis PowerShell script can be found here.

In Conclusion

Although this script may not be perfect, it meets my need to simplify what would normally be a manual process, allowing me to focus more on providing great, timely customer service and just ‘getting it done’ as efficiently as possible.  In fact, I’ve also used this script to analyze .DMPs on physical assets, which made that process easy too.  In a future version I’d like to intelligently handle processed .DMP files and logs so we’re not constantly analyzing the same set of files (e.g.: after analysis, move them into a ‘Processed’ directory or something).
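For what it’s worth, that ‘Processed’ idea could look something like the sketch below.  This is not in the current script: Move-ProcessedDump and its parameters are hypothetical names, and it assumes a dump counts as processed once a log named after it (the DumpName_CommandFile.log pattern used earlier) sits next to it.

```powershell
# Hypothetical helper: moves each analyzed dump and its log(s) into a 'Processed' subdirectory
function Move-ProcessedDump
    {
        param(
            [Parameter(Mandatory)][string]$DumpDirectory,
            [string]$ProcessedName = 'Processed'
        )

        $ProcessedDir = Join-Path $DumpDirectory $ProcessedName
        if(!(Test-Path -Path $ProcessedDir -PathType Container))
            {
                New-Item -Path $ProcessedDir -ItemType Directory -ErrorAction Stop | Out-Null
            }

        foreach($Dump in Get-ChildItem -Path $DumpDirectory -Filter *.dmp -File)
            {
                # assumption: a log starting with "<dump name>_" means the dump was already analyzed
                $Logs = Get-ChildItem -Path $DumpDirectory -Filter "$($Dump.Name)_*.log" -File
                if($Logs)
                    {
                        $Logs | Move-Item -Destination $ProcessedDir -Force -ErrorAction Stop
                        Move-Item -LiteralPath $Dump.FullName -Destination $ProcessedDir -Force -ErrorAction Stop

                        # emit the name of each dump that was moved
                        $Dump.Name
                    }
            }
    }
```

Running Move-ProcessedDump -DumpDirectory against a machine’s storage folder after the analysis loop would leave only dumps that have not been analyzed yet in the top-level directory, so the next pass skips everything already handled.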

I may have been able to simplify this a bit by using existing products, like Nir Sofer’s BlueScreenView or something else I’m not yet aware of.  But there’s often a challenge in introducing new software into an environment, and many NirSoft utilities are classified as PUA’s so they’re killed with extreme prejudice.  So, I try to stick with the tools Microsoft provides to get the job done when it makes sense to do so.

Remember, this is not the way, it’s just a way.

Good Providence to you!
