Static Code Analysis of Unreal Engine 4
Read along to learn what Greschishchev discovered.
Time-Lapse Analysis of Unreal Engine 4
I was inspired to run an analysis of the popular game engine. Here's the entire story, from install to analysis to results.
With every release, the Klocwork R&D department is given the opportunity to participate in a Hackathon. The four-day Hackathon is dedicated to any project as long as it’s loosely related to Klocwork.
In the same week, participants present their ‘hack’ to their peers. The open format of the Hackathon gives R&D an opportunity to showcase features that might not otherwise be part of our product roadmapping exercise.
I won Klocwork’s first Hackathon and, having held the highly coveted Hacksaw trophy for a few sprints, I had a title to defend.
Inspired by PVS-Studio’s blog, “How the PVS-Studio Team Improved Unreal Engine’s Code,” I decided to take on the challenge of running Klocwork on one of the leading game engines, Unreal Engine 4.
There were many questions my static analysis nerd brain wanted answered, like:
- Why did PVS-Studio require a team and 17 days to perform analysis?
- Can our static analyzer even analyze this?
- Why was the number of real PVS-Studio bugs so low for such a large project?
- Why were edits made “because of the need to please the analyzer”?
There was only one (big) problem… I only had two workdays to do this. After realizing the game engine's codebase is over 2 million lines of code, I immediately ran to the documentation department, stole their video camera, and started my Hackathon time lapse.
Static Analysis in the Game Development Industry
Before we get into the deep technical stuff, I’d like to provide some context to this project. The Klocwork static analyzer is quite experienced. Not only in its age, but in its evolution. The analyzer evolved by consuming trillions of lines of code from mission-critical software.
In my (relatively) short time here, I’ve witnessed static analysis of NASA code destined for moons, military code for redacted, and banking software which ensures the rounding scam in Office Space doesn’t happen again. All of this software follows a common pattern: legacy code for legacy standards, bare to the metal compilers with concise feature sets, and source mostly written with static analysis compliance in mind. You know what they say, you are what you eat.
Unlike the mission-critical software industries, one can make the argument that the game development industry is the opposite: There is incredibly little legacy code since game engines are rewritten from the ground up at least once a decade. The compilers (and developers) abuse every optimization trick available for their target platform. Code is written with no regard for static analysis compliance (as it should be). It is a “we ship with bugs” industry.
A game industry developer’s mentality towards static analysis results is also the opposite. A medical device manufacturer would gladly sort through a 99% false positive rate to find the 1% of true issues, while our friends at Nintendo and Capcom would simply uninstall our tool if we fed their developers noise all day.
The good news for the static analysis vendors is that the fundamentals of static analysis are common to all industries. Build some abstract syntax trees (AST) and run some xpath-like checkers. Build some axioms via path analysis and run some more source and sink checkers.
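To make the "AST plus pattern checker" idea concrete, here is a toy sketch. It is not how Klocwork's checkers are implemented (the post does not show that); the `Node` type, the node kinds, and `findAlwaysTrueComparisons` are all hypothetical names invented for illustration. The checker walks a tiny hand-built tree and reports comparisons of the form `unsignedVar >= 0`.

```cpp
#include <string>
#include <vector>

// Toy AST node (hypothetical): a kind such as "BinaryOp", "Var" or
// "IntLiteral", some text (operator, name or literal), and child nodes.
struct Node {
    std::string kind;
    std::string text;
    bool isUnsigned;              // only meaningful for "Var" nodes
    std::vector<Node> children;
};

// Pattern checker: report every `unsignedVar >= 0`, which is always true.
void findAlwaysTrueComparisons(const Node& n, std::vector<std::string>& out) {
    if (n.kind == "BinaryOp" && n.text == ">=" && n.children.size() == 2) {
        const Node& lhs = n.children[0];
        const Node& rhs = n.children[1];
        if (lhs.kind == "Var" && lhs.isUnsigned &&
            rhs.kind == "IntLiteral" && rhs.text == "0") {
            out.push_back(lhs.text + " >= 0 is always true");
        }
    }
    // Recurse so nested expressions are checked as well.
    for (const Node& child : n.children) {
        findAlwaysTrueComparisons(child, out);
    }
}

// A hand-built tree standing in for parsed source: one comparison on an
// unsigned variable (should be reported) and one on a signed variable.
Node sampleTree() {
    Node index{"Var", "index", true, {}};
    Node delta{"Var", "delta", false, {}};
    Node zero{"IntLiteral", "0", false, {}};
    Node cmpUnsigned{"BinaryOp", ">=", false, {index, zero}};
    Node cmpSigned{"BinaryOp", ">=", false, {delta, zero}};
    return Node{"Block", "", false, {cmpUnsigned, cmpSigned}};
}
```

Running the checker over `sampleTree()` reports only the comparison on `index`. A real analyzer gets its tree and type information from a compiler front end, and path-based source and sink checkers need considerably more machinery than this.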
What does differentiate the industries is the complexity and the abstraction of the code analyzed. Good luck trying to statically track memory allocation and deallocation in EA’s proprietary standard type libraries (EASTL) in Frostbite 3 out-of-the-box. I assure you this is a non-issue over at TD Bank and their C++98 compliant Point of Sale (POS).
What I’m trying to get at is that analysis of software in the game development industry is really hard. This is the industry pushing the boundaries of compilers and their feature sets (clang, C++11/14/17). This is the industry where functional testing over unit testing is the standard. This is the industry that expects our Visual Studio 2015 integration to be available before Microsoft has even released the IDE to the public.
I believe game development is the industry through which static analysis should judge its success because the rest is relatively easy.
3 Steps to Analyze Unreal Engine 4
Step 1: Get the Thing to Build
Fortunately for me, the game engine's “Getting Started” documentation is excellent. After signing up over at unrealengine.com and checking out their GitHub project I was good to go. Unfortunately, the internet speeds in Canadia are not the greatest and I burnt a good chunk of valuable time waiting for git checkout. After the checkout, while waiting for the “Development Editor” configuration to compile via MSBuild 2013, I took some time to poke around the game engine's code.
The first thing I hunted for was evidence of conditional compilation and assert macros in the source code. There’s a good chance the “Development Editor” configuration in Visual Studio conditionally disabled a lot of asserts via the preprocessor, yet the disabled code may still contain valuable information for the static analyzer to consume. The compilation of the game engine took about 30 minutes and during that time I managed to find a few assert macros that I wanted our analyzer to retain, such as “check”, “verify”, etc. I notepad-ed them into a text file for use later.
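A small sketch of why stripped asserts matter to an analyzer. The macro below is a hypothetical stand-in, not Unreal's actual definition of `check` or `verify`, and `SKETCH_ENABLE_CHECKS`, `Actor`, and `readHealth` are made-up names. When the macro expands to nothing, the preprocessed source no longer says that the pointer is expected to be non-null, so an analyzer that only sees the preprocessed code loses that hint.

```cpp
#include <cstdlib>

// Hypothetical build flag: assume a configuration that strips asserts.
#ifndef SKETCH_ENABLE_CHECKS
#define SKETCH_ENABLE_CHECKS 0
#endif

#if SKETCH_ENABLE_CHECKS
// Assert enabled: the condition is visible in the preprocessed source.
#define SKETCH_CHECK(expr) do { if (!(expr)) std::abort(); } while (0)
#else
// Assert disabled: the condition disappears after preprocessing.
#define SKETCH_CHECK(expr) do { } while (0)
#endif

struct Actor { int health; };

int readHealth(const Actor* actor) {
    // With checks enabled, this line documents (and enforces) that `actor`
    // is assumed non-null. With checks stripped, it expands to an empty
    // statement and the assumption is invisible to tools downstream.
    SKETCH_CHECK(actor != nullptr);
    return actor->health;
}
```

This is why it pays to collect the assert macro names up front: the analyzer can then be told to treat them as meaningful even in a configuration that compiles them out.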
Now that I verified that the game engine compiles without errors on my platform, I was ready to start the static analysis process.
Step 2: The Static Analyzer Needs to Know What to Analyze
Klocwork coined the term “build specification”. The build specification is a human-readable text file providing the analyzer with information on how the project’s compiler interprets files in your project. It records information such as the sizeof() of standard types, macros predefined by the preprocessor, “compiler features” that are actually bugs (thanks Microsoft), etc.
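To illustrate the kind of compiler-dependent facts involved, here is a minimal sketch that collects a few of them from whatever compiler builds it. It says nothing about the actual build specification file format, which is not shown in this post; `compilerFacts` is a hypothetical helper.

```cpp
#include <climits>
#include <cstddef>
#include <map>
#include <string>

// Gather a handful of facts that differ between compilers and targets.
std::map<std::string, std::size_t> compilerFacts() {
    std::map<std::string, std::size_t> facts;
    facts["sizeof(int)"] = sizeof(int);
    // 4 with MSVC on 64-bit Windows, 8 on typical 64-bit Linux toolchains.
    facts["sizeof(long)"] = sizeof(long);
    facts["sizeof(void*)"] = sizeof(void*);
    facts["sizeof(wchar_t)"] = sizeof(wchar_t);
    facts["CHAR_BIT"] = CHAR_BIT;
    // Predefined macros identify the compiler and its version.
#ifdef _MSC_VER
    facts["_MSC_VER"] = _MSC_VER;
#endif
#ifdef __clang_major__
    facts["__clang_major__"] = __clang_major__;
#endif
    return facts;
}
```

An analyzer that assumed the wrong value for any of these would evaluate `#if` branches, overflow conditions, and struct layouts differently from the real compiler, which is why they need to be captured per project.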
Klocwork ships with a collection of out-of-the-box tools to generate the build specification, such as Visual Studio project parsers, compiler wrappers, etc. I decided to go with our Swiss army knife, a utility called “kwinject”. On the command line, instead of calling msbuild ue4.sln /t:rebuild, I called kwinject msbuild ue4.sln /t:rebuild to capture compilation information and populate the build specification.
I’m guessing this step is similar to the CLMonitor.exe utility that PVS-Studio had to write and, guessing again, this utility is only able to analyze the game engine's build configurations that target the cl.exe compiler (Windows and Xbox). Our build specification generation tools have the additional advantage of capturing C# and Java compilation from more than 200 different compilers, such as Sony’s ORBIS Clang PS4 compiler or Nintendo’s wacky ARM compilers.
Step 3: Start the Source Code Analysis
With 75% of my time remaining (1 minute, 30 seconds into the 5-minute time lapse video above) I moved all the tuning sliders to “Please, destroy my machine” and started the static analysis engine. My 8-core CPU was pinned to 100% for the next few hours as the analysis baseline was built. As soon as the analysis results started streaming in, I began reviewing them.
Reviewing the Unreal Engine 4 Results
The game engine's source contained significantly more C# code than I expected. Given the short time frame, I decided to defer reviewing the C# analysis results to a later time and focus my efforts on C/C++ issues.
There were 1,311 C/C++ issues detected and, by the end of the Hackathon, I managed to review 302 of them. This was my first exposure to Unreal Engine source, and reviewing the source of something you’re not a domain expert in is quite a challenge. Out of the 302 issues reviewed, 213 appeared to be legitimate findings of various severities. Of those 213 legitimate findings, I believe only 39 were severe and conclusive enough to warrant immediate attention. Extrapolating from my findings, I suspected there were ~169 issues in the game engine's codebase that were severe enough to warrant attention.
Here are a few issues that I found interesting (and concise enough) for this blog post.
- Oculus’ low persistence mode check in HeadMountedDisplayCommon has a NULL pointer dereference.
- Comparison of unsigned value against 0 is always true.
- BlueprintNodeHelpers not so helpful.
- Suspicious NULL checks related to replicated movement in CharacterMovementComponent.
- Suspicious handling of streaming media types in Windows Movie Player plugin.
- Suspicious NULL check in Velocity Render’s HasVelocity.
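For readers unfamiliar with these issue classes, here is a hedged sketch of two of the patterns named in the list above. This is not the Unreal Engine code behind the findings; `Display`, `isLowPersistence`, and `sumReversed` are hypothetical names. The buggy forms appear only in comments so the sketch stays safe to run.

```cpp
#include <cstddef>
#include <vector>

struct Display { bool lowPersistence; };

// Pattern 1: a pointer is dereferenced before it is tested.
//     bool on = display->lowPersistence;      // dereference first...
//     if (display == nullptr) return false;   // ...check comes too late
// The check implies the pointer may be null, so the earlier dereference
// is suspicious. Testing first removes the problem:
bool isLowPersistence(const Display* display) {
    if (display == nullptr) return false;
    return display->lowPersistence;
}

// Pattern 2: comparing an unsigned value against 0.
//     for (std::size_t i = n - 1; i >= 0; --i) { ... }
// `i >= 0` is always true for an unsigned `i`: instead of going negative,
// `i` wraps around to a huge value, so the condition never ends the loop.
// One common fix is to decrement inside the test:
int sumReversed(const std::vector<int>& values) {
    int sum = 0;
    for (std::size_t i = values.size(); i-- > 0; ) {
        sum += values[i];   // visits the last element first, index 0 last
    }
    return sum;
}
```

Both patterns compile cleanly, which is part of why they survive in large codebases until an analyzer or a crash points them out.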
How Unreal Engine 4 Stood Up to Static Analysis
Klocwork analyzed 2,292,918 lines of code from the game engine and detected 1,311 C/C++ issues of varying severities. Out of 1,311 detected issues, 302 were reviewed by a human. Of those reviewed, 213 appeared to be legitimate findings.
From the 213 legitimate findings, only 39 were believed to be severe and conclusive enough to warrant immediate attention. Extrapolating from the human-reviewed sample, it is suspected that there are ~169 issues in the game engine's source code which are severe enough to warrant immediate attention.
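The ~169 figure is a simple proportion: the share of severe findings in the reviewed sample, scaled to all detected issues. The sketch below reproduces that arithmetic and adds an issues-per-thousand-lines figure from the numbers above. The function names are hypothetical, and the extrapolation assumes the 302 reviewed issues are representative of all 1,311, which the post does not establish.

```cpp
#include <cmath>

// Scale a count observed in a reviewed sample up to the full set of
// detected issues (assumes the sample is representative).
double extrapolate(int hitsInSample, int sampleSize, int totalDetected) {
    return static_cast<double>(hitsInSample) / sampleSize * totalDetected;
}

// Issues per thousand lines of code (KLOC).
double issuesPerKloc(double issues, long linesOfCode) {
    return issues / (linesOfCode / 1000.0);
}
```

With the numbers from this analysis, `extrapolate(39, 302, 1311)` gives about 169.3, and `issuesPerKloc(1311, 2292918)` gives about 0.57 detected issues per thousand lines.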
Comparing issue density in the game engine to other codebases in the game development industry, I found the game engine’s code quality to be in a league of its own, with an issue density approaching that of large mission-critical software projects. I would guess that Unreal Engine 4’s code quality success can be attributed to two factors: static analysis by PVS-Studio and Unreal’s open source initiative.
Given that, I was not able to fully understand the analysis time invested by PVS-Studio. The more than 1,800 issues PVS-Studio detected in their most recent analysis is an incredible number of issues to review, let alone commit fixes for in a span of 17 days.
Their statement, “the number of real bugs detected in the code is very small for such a large project,” makes it safe to assume that the PVS-Studio analysis of Unreal Engine 4 was a battle of citing false positives, making edits “because of the need to please the analyzer,” and not coding bug fixes. Regardless of PVS-Studio’s false positive rate, ~169 high-severity issues were missed.
We hope you enjoyed this breakdown of how a leading game development platform stacked up against static code analysis. This project exemplifies the power of static code analysis tools for not only supporting quality code, but also increasing the velocity of static code testing. If you’d like to take Klocwork for a spin, we offer a free trial.