Benchmarks: Excellent Power Efficiency With 5th Gen AMD EPYC Using amd-pstate & Power Profiles

The AMD EPYC 9005 “Turin” processors that launched last year offer excellent performance and power efficiency out-of-the-box. For those pursuing maximum power efficiency and the best configuration for performance-per-Watt, AMD EPYC BIOS tunables as well as recent Linux kernel driver improvements can help drive even greater efficiency. Today’s article looks at how the AMD P-State driver and its options on recent kernel versions, as well as the Power Profile Selection BIOS option, affect 5th Gen EPYC performance and power efficiency.

AMD Volcano CPU cooling for EPYC Turin

As covered in several Phoronix articles over the past few months, with the Linux 6.13 kernel and newer the AMD P-State driver is used by default on EPYC 9005/Turin processors and future AMD server processors, assuming the server platform/motherboard supports ACPI Collaborative Processor Performance Control (CPPC). Compared to the generic ACPI CPUFreq CPU frequency scaling driver, the AMD P-State driver is able to make more informed frequency/power selections. Paired with Energy Performance Preference (EPP) hints that can be set by the user or server administrator, it offers much greater power/performance control than ACPI CPUFreq, which is used by default on pre-6.13 kernels and on prior generation AMD EPYC servers.

AMD Volcano server

Those who have not read the prior Phoronix articles on AMD P-State for EPYC servers can find all of the kernel driver documentation on kernel.org. Simply put, the transition from ACPI CPUFreq to AMD P-State can allow for greater performance and power efficiency when moving to a Linux 6.13+ kernel. Adjusting the EPP value communicates your preference for performance or power efficiency.
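
For those wanting to check which driver, governor, and EPP hint a system is actually using, here is a minimal Python sketch. It assumes the standard Linux cpufreq sysfs layout under /sys/devices/system/cpu; the function name and the sysfs_root parameter are just for illustration and not part of any tool used in this article.

```python
from pathlib import Path

# Standard cpufreq policy attributes; the EPP file is only present when the
# active driver supports Energy Performance Preference hints.
CPUFREQ_FILES = ("scaling_driver", "scaling_governor", "energy_performance_preference")


def read_cpufreq_settings(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the cpufreq driver, governor, and EPP hint for one CPU.

    Attributes that are not exposed (e.g. EPP under acpi-cpufreq) are
    reported as None rather than raising an error.
    """
    policy_dir = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    settings = {}
    for name in CPUFREQ_FILES:
        path = policy_dir / name
        settings[name] = path.read_text().strip() if path.is_file() else None
    return settings


if __name__ == "__main__":
    for key, value in read_cpufreq_settings().items():
        print(f"{key}: {value}")
```

On a Linux 6.13+ system with an EPYC 9005 processor and CPPC support, the driver reported here should be the AMD P-State one rather than acpi-cpufreq, though the exact strings depend on the kernel and driver mode.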

AMD Power Profile Selection

In addition to the ACPI EPP value, the “Power Profile Selection” can be set via the server BIOS to go from the default high performance mode to an efficiency mode, maximum I/O performance, balanced memory performance, balanced core performance, or balanced core memory performance mode. Alternatively, the Power Profile Selection can be adjusted at run-time using the AMD HSMP/APML software. For those really wanting to optimize for power efficiency without compromising much performance, the “Balanced Memory Performance” mode is recommended for most workloads: it constrains the memory/fabric/xGMI performance to the workload’s bandwidth and latency needs while using the remaining TDP to maximize the CPU core frequency.

AMD Balanced Memory mode

For this article I tested the following CPU frequency scaling governor and driver modes, EPP values, and Power Profile Selections to demonstrate the impact of these changes on AMD EPYC 9005 power and performance:

acpi-cpufreq performance [Old Default] - The default on pre-6.13 kernels where ACPI CPUFreq is used with the performance governor and default Power Profile.

amd-pstate performance, EPP performance [New Default] - The “new” default on Linux 6.13+ for those using an AMD EPYC 9005 series server that supports ACPI CPPC. The AMD P-State driver with performance CPU frequency governor and EPP “performance” preference and the default (high performance) Power Profile selection.

amd-pstate powersave, EPP power - The AMD P-State driver as above but switching to the powersave governor and a “power” Energy Performance Preference bias.

amd-pstate powersave, EPP power, Balanced Memory - Switching over to the “Balanced Memory” Power Profile selection and with EPP power setting and using the powersave governor.

amd-pstate powersave, EPP performance, Balanced Memory - Switching over to the “Balanced Memory” Power Profile selection and with EPP performance setting and using the powersave governor. This is one of the recommended configurations by AMD engineers for those wanting to maximize their power efficiency on modern Linux servers. ([R] for recommended on graphs.)

amd-pstate powersave, EPP performance - Using the default High Performance Power Profile while switching to the powersave governor and performance EPP preference.

acpi-cpufreq schedutil - The “old” ACPI CPUFreq driver as configured on Linux distributions that default to the scheduler utilization governor “Schedutil” rather than the performance governor.
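
The configurations above can be summarized as data, together with a rough sketch of how the governor and EPP hint could be applied through sysfs. This is only an illustration under some assumptions: the TESTED_CONFIGS table and the apply_governor_and_epp helper are hypothetical names, the Power Profile for the schedutil run is assumed to be the default since it is not stated above, and writing to sysfs requires root. The driver itself (ACPI CPUFreq vs. AMD P-State) and the BIOS Power Profile Selection cannot be changed this way.

```python
from pathlib import Path

# label: (driver, governor, EPP hint or None, BIOS Power Profile Selection)
TESTED_CONFIGS = {
    "acpi-cpufreq performance [Old Default]": ("acpi-cpufreq", "performance", None, "High Performance"),
    "amd-pstate performance, EPP performance [New Default]": ("amd-pstate", "performance", "performance", "High Performance"),
    "amd-pstate powersave, EPP power": ("amd-pstate", "powersave", "power", "High Performance"),
    "amd-pstate powersave, EPP power, Balanced Memory": ("amd-pstate", "powersave", "power", "Balanced Memory"),
    "amd-pstate powersave, EPP performance, Balanced Memory [R]": ("amd-pstate", "powersave", "performance", "Balanced Memory"),
    "amd-pstate powersave, EPP performance": ("amd-pstate", "powersave", "performance", "High Performance"),
    # Power Profile assumed to be the default here; it is not stated in the list.
    "acpi-cpufreq schedutil": ("acpi-cpufreq", "schedutil", None, "High Performance"),
}


def apply_governor_and_epp(governor, epp=None, sysfs_root="/sys/devices/system/cpu"):
    """Write the governor (and optionally the EPP hint) for every CPU policy.

    Returns the number of per-CPU cpufreq directories that were updated.
    """
    updated = 0
    # "cpu[0-9]*" skips sibling directories such as cpufreq/ and cpuidle/.
    for policy_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        (policy_dir / "scaling_governor").write_text(governor)
        if epp is not None:
            (policy_dir / "energy_performance_preference").write_text(epp)
        updated += 1
    return updated
```

For example, the recommended [R] configuration would correspond to apply_governor_and_epp("powersave", "performance") on a system already booted with the AMD P-State driver and with Balanced Memory selected in the BIOS.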

All of this testing was done on the AMD Volcano reference server platform using two AMD EPYC 9755 processors. These 128-core Zen 5 server processors were running at their default speeds and with other defaults except where otherwise noted. (The difference in clock speed shown in the system table comes down to sysfs reporting the clock speed differently depending upon whether ACPI CPUFreq or AMD P-State was utilized.)

AMD EPYC 9755 Linux Power Comparison Benchmarks

Dozens of different benchmarks were carried out across these different power/performance configurations for this dual AMD EPYC 9755 server running Ubuntu 24.10 with the Linux 6.13 stable kernel. The CPU power consumption, thermals, and peak core frequency were monitored while each benchmark ran.
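
As a rough illustration of how an average CPU power figure can be derived, here is a small Python sketch that turns two readings of a cumulative energy counter into Watts. It is not the monitoring code used for these benchmarks. It assumes a counter in microjoules such as the energy_uj file that the Linux powercap interface provides where the platform exposes one, and the function name is hypothetical.

```python
def average_power_watts(energy_start_uj, energy_end_uj, elapsed_s, max_range_uj=None):
    """Average power in Watts between two cumulative energy readings.

    energy_*_uj are counter values in microjoules, elapsed_s is the time
    between the readings in seconds. If the counter wrapped around in
    between, max_range_uj (the counter's maximum value) is needed to
    recover the approximate energy used.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = energy_end_uj - energy_start_uj
    if delta_uj < 0:
        if max_range_uj is None:
            raise ValueError("counter wrapped but max_range_uj was not given")
        delta_uj += max_range_uj  # approximate correction for a single wrap
    return delta_uj / 1e6 / elapsed_s
```

Summing the result over both CPU packages would give a combined 2P figure comparable in spirit to the numbers quoted in this article.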

ACES DGEMM benchmark with settings of Sustained Floating-Point Rate. amd-pstate powersave, EPP performance was the fastest.

In a number of the workloads tested on this dual socket EPYC 9755 server, the raw performance was minimally changed when applying some of the power efficiency optimizations…

But sure enough, when applying the “Balanced Memory” Power Profile mode from the BIOS, there was a clear difference in the CPU power consumption. The combined CPU power consumption of the dual EPYC 9755 128-core processors dropped from a peak of 560~577 Watts down to 535 Watts. The average EPYC 9755 2P power consumption in the recommended balanced memory mode was 95% of the power use of the default mode.

The result is a nice boost to the performance-per-Watt in the ACES DGEMM benchmark when running in the balanced memory mode: some power savings without costing much in terms of performance.

Xcompact3d Incompact3d benchmark with settings of Input: X3D-benchmarking input.i3d. amd-pstate powersave, EPP performance, Balanced Memory [R] was the fastest.

With the Incompact3D HPC benchmark, where the CPUs were running fully utilized, there wasn’t any major difference in performance, but the Balanced Memory mode brought the combined EPYC 9755 2P power consumption down to 797~815 Watts compared to an average of 828 Watts at the defaults. So again there are some energy efficiency advantages without being really detrimental to the EPYC server performance.

DaCapo Benchmark benchmark with settings of Java Test: Avrora AVR Simulation Framework. amd-pstate powersave, EPP performance was the fastest.

The OpenJDK/Java workloads were much more susceptible to driver/governor/power-profile changes. With the Avrora simulation framework there was some nice performance when using the AMD P-State driver, better than the former default with ACPI CPUFreq. Even the balanced memory mode of operation made a good showing for raw performance.

The Balanced Memory mode impact was most evident when looking at the combined EPYC 9755 CPU power consumption.

Running the amd-pstate powersave configuration with the EPP performance setting and Balanced Memory mode led to 1.19x the energy efficiency of the defaults, or 1.44x compared to the ACPI CPUFreq performance default formerly used on older kernels/distributions.

DaCapo Benchmark benchmark with settings of Java Test: Apache Tomcat. amd-pstate performance, EPP performance [Default] was the fastest.

DaCapo Benchmark benchmark with settings of Java Test: Tradebeans. amd-pstate performance, EPP performance [Default] was the fastest.

DaCapo Benchmark benchmark with settings of Java Test: Tradesoap. amd-pstate performance, EPP performance [Default] was the fastest.

Across a variety of Java workloads, these power tuning optimizations helped significantly in improving the energy efficiency of this 5th Gen AMD EPYC server without costing too much raw performance, and oftentimes the recommended mode was still faster than the former ACPI CPUFreq default.

DaCapo Benchmark benchmark with settings of Java Test: Apache Kafka. amd-pstate performance, EPP performance [Default] was the fastest.

The impact of the Balanced Memory Power Profile was one of my main takeaways from this testing: there is a clear benefit for those wanting their AMD EPYC 9005 servers to deliver great performance while running with optimal power use.

Renaissance benchmark with settings of Test: Apache Spark PageRank. amd-pstate powersave, EPP power was the fastest.

Apache Cassandra benchmark with settings of Test: Writes. amd-pstate powersave, EPP performance was the fastest.

The AMD P-State CPU frequency scaling driver continues to prove to be a very practical improvement for AMD EPYC 9005 servers whether you are after the best performance or optimal power efficiency.

SVT-AV1 benchmark with settings of Encoder Mode: Preset 13, Input: Bosphorus 4K. amd-pstate powersave, EPP performance was the fastest.

AMD Ryzen systems with ACPI CPPC support have been defaulting to AMD P-State for a number of kernel releases, and it’s great to see this change now materialize for AMD EPYC servers moving forward.

SVT-AV1 benchmark with settings of Encoder Mode: Preset 5, Input: Bosphorus 4K. amd-pstate performance, EPP performance [Default] was the fastest.

Kvazaar benchmark with settings of Video Input: Bosphorus 4K, Video Preset: Ultra Fast. amd-pstate powersave, EPP performance was the fastest.

For multi-threaded video encoding workloads there were nice improvements to power efficiency in the balanced memory mode without significantly hindering the raw performance.

Timed Node.js Compilation benchmark with settings of Time To Compile. amd-pstate powersave, EPP performance was the fastest.

Timed Eigen Compilation benchmark with settings of Time To Compile. acpi-cpufreq performance [Old Default] was the fastest.

Timed LLVM Compilation benchmark with settings of Build System: Ninja. amd-pstate powersave, EPP performance, Balanced Memory [R] was the fastest.

For code compilation workloads like compiling the large Node.js codebase, LLVM, or Eigen the efficiency benefits of the Balanced Memory Power Profile remained clear.

ASTC Encoder benchmark with settings of Preset: Very Thorough. amd-pstate performance, EPP performance [Default] was the fastest.

The increased efficiency from the Balanced Memory mode and AMD P-State tuning was also clear for other workloads like ASTC texture compression.

Geometric Mean Of All Test Results benchmark with settings of Result Composite, AMD EPYC 9755 Linux Power Comparison Benchmarks. amd-pstate performance, EPP performance [Default] was the fastest.

When taking the geometric mean of the raw performance across 50+ different workloads, the modern default on Linux 6.13+ of AMD P-State performance was about 2% faster overall than the prior default of ACPI CPUFreq. That is a small uplift overall and not bad at all considering all of the original AMD EPYC 9005 series benchmarking was done with the ACPI CPUFreq driver. When tuning for efficiency with the Balanced Memory Power Profile and powersave governor with EPP performance bias, there was still 97.8% of the performance of the new default configuration.
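
For readers unfamiliar with how a geometric mean summarizes many benchmarks, here is a minimal Python sketch. It assumes every result is a higher-is-better score (a lower-is-better result such as compile time would need to be inverted first), and the function name and sample numbers are made up for illustration rather than taken from this article’s data.

```python
from statistics import geometric_mean


def relative_geomean(candidate, baseline):
    """Geometric mean of per-benchmark score ratios, candidate vs. baseline.

    Both arguments map benchmark names to higher-is-better scores.
    A result of 1.02 would mean the candidate is about 2% faster overall.
    """
    ratios = [candidate[name] / baseline[name] for name in baseline]
    return geometric_mean(ratios)


if __name__ == "__main__":
    # Hypothetical scores, only to show the calculation.
    baseline = {"encode_fps": 100.0, "java_ops": 2000.0, "hpc_gflops": 50.0}
    candidate = {"encode_fps": 103.0, "java_ops": 2100.0, "hpc_gflops": 49.5}
    print(f"{relative_geomean(candidate, baseline):.3f}x")
```

Because each benchmark contributes as a ratio, no single workload with large absolute numbers dominates the overall figure.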

CPU Power Consumption Monitoring Overview benchmark with settings of Accumulated CPU Power Consumption Monitoring.

The efficiency tuning that achieves 97.8% of the performance of the AMD P-State performance default becomes very interesting when looking at the combined AMD EPYC 9755 CPU power consumption. That tuned configuration put the EPYC 9755 2P power consumption at 92% of the power of the AMD P-State default, or 87.8% of the power consumption on average of the ACPI CPUFreq performance default. That equates to a very nice improvement in EPYC Turin efficiency without losing much on the raw performance side.
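
To put those percentages together, here is a back-of-the-envelope Python calculation of relative performance-per-Watt. It is only a rough estimate that combines the geometric mean performance figures with the average power figures quoted in this article, not a measured perf-per-Watt result, and the helper name is hypothetical.

```python
def relative_efficiency(relative_performance, relative_power):
    """Performance-per-Watt relative to a baseline, given both ratios."""
    return relative_performance / relative_power


# Tuned config vs. the AMD P-State default: 97.8% of the performance
# at 92% of the power.
tuned_vs_new_default = relative_efficiency(0.978, 0.92)

# Tuned config vs. the old ACPI CPUFreq default: the new default was about
# 2% faster than the old one, so tuned performance is roughly 0.978 * 1.02
# of the old default, at 87.8% of its power.
tuned_vs_old_default = relative_efficiency(0.978 * 1.02, 0.878)

if __name__ == "__main__":
    print(f"vs. AMD P-State default: {tuned_vs_new_default:.3f}x")
    print(f"vs. ACPI CPUFreq default: {tuned_vs_old_default:.3f}x")
```

By this estimate the tuned configuration works out to about 1.06x the performance-per-Watt of the AMD P-State default and about 1.14x that of the ACPI CPUFreq performance default.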

CPU Temp Monitoring Overview benchmark with settings of Accumulated CPU Temp Monitoring.

The CPU power reduction also equated to slightly lower CPU core temperatures.

CPU Peak Freq Monitoring Overview benchmark with settings of Accumulated CPU Peak Freq Monitoring.

Lastly, here is a look at the peak CPU clock frequency observed during the entire duration of the testing.

For those wanting to maximize the efficiency of AMD EPYC 9005 series servers, the Balanced Memory Power Profile is quite interesting when paired with the new AMD P-State driver default. It’s certainly worth exploring and evaluating for your particular workloads if you are really concerned about achieving peak performance-per-Watt without sacrificing much performance. The results of the ACPI CPUFreq vs. AMD P-State drivers also continue to look very good for those running an up-to-date kernel in production. Further benchmarks looking at these tunables on more AMD EPYC 9005 series hardware will be coming up in future articles on Phoronix.