October 16, 2015

Tales from the Field: Taming Transparent Huge Pages on Linux

Helix Core

The story you are about to read is true. Only the names have been changed to protect the innocent.

Just the Facts, Ma'am

Recently, several customers with large Helix user bases experienced unresponsive servers and hung processes on multi-core Linux machines with large physical memory. The servers were adequately provisioned for the loads they were receiving, yet behaved as if they were overwhelmed. After much troubleshooting, research, and investigation, we determined that the culprit was the Transparent Huge Page (THP) feature of certain versions and distributions of Linux.

Read on to see if you may also be affected.

The Symptoms

In one customer case study, processes abruptly began spinning: one process in particular, khugepaged, was consuming most of the CPU cycles, and other processes on the system could barely get CPU time to finish their tasks[1]. For example:

  • A yum process got stuck after dependency checking
  • A “ps” command without any switches barely worked
  • A “ps -auxfw” command hung outright

The Troubleshooting Steps

The nightly builds kicked off at 2 a.m., just as developers across the globe, during their own work hours, started synchronizing large amounts of code. The commands failed to respond and appeared to hang. Unsure whether it was the Helix engine or something else, the administrator called us for assistance in finding the root cause.

While combing the Helix logs for clues, we found many 'usage' tracking entries with unexpectedly high system CPU time, such as:

--- usage 180+104505us 3264+80io 0+0net 15460k 0pf

These numbers are the calculated deltas between calls to getrusage(). In this case, the user CPU time was small (180ms) but the system CPU time was very large (104,505ms). This indicated that the Helix Engine was doing only a small amount of work, but that work required an enormous amount of system overhead. This pointed to a deeper issue within the OS rather than the Helix Engine itself.
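If you are chasing similar entries in your own logs, the deltas can be pulled apart with a short script. A minimal sketch, assuming only the usage line format shown above:

```shell
# Split the "user+system" CPU field (in milliseconds) out of a usage line.
line='--- usage 180+104505us 3264+80io 0+0net 15460k 0pf'
echo "$line" | awk '{
  sub(/us$/, "", $3)    # drop the "us" suffix from e.g. 180+104505us
  split($3, t, "+")     # t[1] = user ms, t[2] = system ms
  printf "user=%sms system=%sms\n", t[1], t[2]
}'
```

For the sample line above, this prints `user=180ms system=104505ms`, making the user/system imbalance easy to spot when scanning many entries.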

The system logs contained distinct madvise() system call entries along with isolate_freepages() calls. The vmstat output showed the following counters being consistently incremented:

  • thp_fault_alloc
  • thp_collapse_alloc
  • thp_fault_fallback 
  • thp_collapse_alloc_failed
  • thp_zero_page_alloc 
  • thp_zero_page_alloc_failed 

This indicated that the system calls had effectively led the Linux kernel to allocate Transparent Huge Pages. The kernel was in fact scanning memory at an astonishing frequency, collapsing regions into huge pages as soon as they aligned with the huge page size. This can easily waste memory; for example, a 2MB mapping that only ever touches one byte still pins 2MB of memory, instead of a single 4KB page.
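If you suspect the same behavior, the counters above can be read directly from /proc/vmstat. A small sketch (the vmstat_file override is only there so the snippet can be exercised against a sample file; counters a given kernel does not expose simply print empty):

```shell
# Print the current values of the THP counters named above.
vmstat_file=${vmstat_file:-/proc/vmstat}
thp_counter() { awk -v k="$1" '$1 == k { print $2 }' "$vmstat_file"; }

for c in thp_fault_alloc thp_collapse_alloc thp_fault_fallback \
         thp_collapse_alloc_failed thp_zero_page_alloc thp_zero_page_alloc_failed; do
  echo "$c=$(thp_counter "$c")"
done
```

Sampling these twice, a few seconds apart, shows whether the kernel is still actively faulting in and collapsing huge pages; AnonHugePages in /proc/meminfo shows how much anonymous memory is currently huge-page backed.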

The next natural course of action was to investigate the Transparent Huge Page configuration[2]:

# cat /sys/kernel/mm/transparent_hugepage/enabled was set to “madvise”

# cat /sys/kernel/mm/transparent_hugepage/defrag was set to “madvise”

# cat /sys/kernel/mm/transparent_hugepage/use_zero_page was set to “1”

# cat /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs was set to “0”

# cat /sys/kernel/mm/transparent_hugepage/khugepaged/defrag was set to “1”

Note that:

  • The “enabled” parameter turned on huge page allocation only for regions marked with madvise(MADV_HUGEPAGE).
  • The “defrag” parameter likewise restricted defragmentation effort to madvise() regions.
  • The “use_zero_page” parameter made the kernel serve read faults with a huge zero page.
  • The “scan_sleep_millisecs” parameter was set to zero, so khugepaged slept zero milliseconds between passes and scanned memory relentlessly, which resulted in khugepaged consuming extremely high CPU cycles.
  • The khugepaged “defrag” parameter indicated that the Transparent Huge Page daemon was allowed to defragment memory in the background.
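The five files can be audited in one pass. A minimal sketch, assuming the RHEL-style sysfs layout shown above (the thp_base override exists only so the loop can be tried against a fake directory tree; files a kernel lacks are skipped):

```shell
# Dump the THP tunables discussed above in a single pass.
thp_base=${thp_base:-/sys/kernel/mm/transparent_hugepage}
show_thp_settings() {
  for f in enabled defrag use_zero_page \
           khugepaged/scan_sleep_millisecs khugepaged/defrag; do
    if [ -r "$thp_base/$f" ]; then
      printf '%s = %s\n' "$f" "$(cat "$thp_base/$f")"
    fi
  done
}
show_thp_settings
```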

This showed that THP was the likely cause of the excessive CPU utilization. Disabling Transparent Huge Page allocation was an option, so we tried it; after executing the following commands, we saw the system return to life[3]:

# echo never >/sys/kernel/mm/transparent_hugepage/enabled

# echo never >/sys/kernel/mm/transparent_hugepage/defrag

# echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page

# echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
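Note that these echo commands do not survive a reboot. One common way to make the change permanent, assuming a GRUB2-based distro (file paths vary), is to pass the setting on the kernel command line; alternatively, re-run the echo commands from a boot script such as /etc/rc.local:

```shell
# Option 1: kernel command line (persists across reboots).
#   In /etc/default/grub, append to GRUB_CMDLINE_LINUX:
#     GRUB_CMDLINE_LINUX="... transparent_hugepage=never"
#   then regenerate the config, e.g.:
#     grub2-mkconfig -o /boot/grub2/grub.cfg

# Option 2: re-apply the settings at boot, e.g. from /etc/rc.local:
echo never >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/defrag
```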

The Outcome

After troubleshooting this specific instance, we determined that some Linux kernel versions, such as 3.5.4, 3.5.5, 3.6.6, and 3.6.7, exhibited similar symptoms[4]. We also learned that the Transparent Huge Page feature can affect any application; in this case, it just happened to be the Helix Engine.

The Conclusion

Use the Transparent Huge Page feature wisely, or disable it completely if performance degradation is consistently observed. Disabling THP has no adverse effect on the performance of the Helix Engine; in fact, the Helix Engine benefits from being able to rely on the OS for predictable task prioritization, efficient memory allocation, and reasonable user and system CPU usage.

Here is the reference to the THP kernel bug in the Red Hat Linux distribution: https://bugzilla.redhat.com/show_bug.cgi?id=879801.

[1] Not all Linux distributions will show the khugepaged process in the top output, yet the THP issue may still be a factor.

[2] This can be Linux distribution specific; please consult the documentation for your particular distro.

[3] More information can be found here.

[4] These symptoms may not be limited to the Linux Kernel versions that are listed here; THP was first enabled by default in Linux Kernel 2.6.32 on popular distributions.