Benchmark methodology
Novabench benchmarks produce scores that are comparable across Windows, macOS, and Linux, and across different hardware configurations.
Design principles
Novabench combines synthetic throughput tests with workloads that mirror common application patterns.
Synthetic tests such as raw memory bandwidth, GPU compute, and CPU scalar and SIMD operations measure the peak performance capability of each component under controlled conditions. These tests are sensitive to hardware differences and are useful for isolating the performance of a specific subsystem.
Other tests mix compute and data access patterns that reflect how software uses hardware in practice. The CPU test includes hash and compression operations alongside arithmetic workloads. The GPU test renders a 3D scene with geometry, textures, lighting, and shaders. The storage test uses both sequential and random access patterns that mirror typical file operations.
Together, these tests produce scores that reflect both raw hardware capability and the performance users experience in practice.
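As an illustration, here is a minimal sketch of the two workload styles in Python. The kernels (`synthetic_ops`, `mixed_ops`) are hypothetical stand-ins, not Novabench's actual test code:

```python
import hashlib
import zlib

def synthetic_ops(n: int) -> float:
    """Pure arithmetic loop: isolates scalar CPU throughput."""
    acc = 0.0
    for i in range(n):
        acc += (i * 3.0 + 1.5) / (i + 1)
    return acc

def mixed_ops(data: bytes, rounds: int) -> bytes:
    """Hash and compression mix: closer to how applications use the CPU."""
    for _ in range(rounds):
        data = hashlib.sha256(data).digest() + zlib.compress(data)[:32]
    return data
```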
Fixed-duration tests
Each benchmark workload runs for a fixed duration rather than a fixed amount of work. Novabench measures how much work the hardware completes in that time. This approach scales naturally across hardware generations and leads to predictable test times.
A pre-test warmup period runs before every test to calibrate the workload and reduce variance between runs.
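A minimal sketch of the warmup plus fixed-duration pattern; `run_fixed_duration` and the 0.5-second warmup are illustrative assumptions, not Novabench's actual parameters:

```python
import time

def run_fixed_duration(workload, duration_s: float, warmup_s: float = 0.5) -> int:
    """Return the number of work units completed in duration_s seconds."""
    # Warmup: lets caches and clock states settle before timing begins.
    end = time.perf_counter() + warmup_s
    while time.perf_counter() < end:
        workload()

    # Timed window: count completed units of work, not elapsed time per unit.
    completed = 0
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        workload()
        completed += 1
    return completed
```

Scoring work completed per fixed time means faster hardware simply finishes more units; the test never runs longer on slow hardware.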
Thermal gaps
A short gap is introduced between tests to give the hardware time to cool down. The gap length was chosen empirically to reduce the impact of thermal throttling on tests that run later in the sequence.
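A sketch of how such a gap might be inserted into a test sequence; the 5-second value is a placeholder, since the real gap length is tuned empirically:

```python
import time

def run_sequence(tests, gap_s: float = 5.0) -> list:
    """Run each test in order, pausing between tests to let hardware cool."""
    results = []
    for i, test in enumerate(tests):
        results.append(test())
        if i < len(tests) - 1:
            time.sleep(gap_s)  # cooling gap before the next test
    return results
```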
Cross-platform comparability
A Novabench Score of 2,000 is designed to mean the same thing whether it was produced on a Windows desktop, a MacBook, or a Linux workstation.
The computational work performed by each test is the same across platforms. Novabench runs functionally equivalent workloads on all target operating systems and architectures.
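One way to keep the work identical everywhere is to generate inputs deterministically, so every platform executes the same operations on the same data. A minimal illustration of the principle (not Novabench's implementation):

```python
import random

def make_workload(seed: int = 42):
    """Build a workload whose inputs are identical on every platform."""
    rng = random.Random(seed)  # same seed -> same input stream on
    data = [rng.random() for _ in range(10_000)]  # Windows, macOS, and Linux

    def workload():
        return sum(x * x for x in data)

    return workload
```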
Multi-iteration runs
A single iteration is designed to measure system capabilities with minimal variance, but multi-iteration runs can be enabled for testing that requires more precision, at the expense of longer test times.
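A sketch of how multi-iteration aggregation might work; the use of the median here is an assumption for illustration, not Novabench's documented aggregation:

```python
import statistics

def run_iterations(test, iterations: int = 3) -> float:
    """Repeat a test and aggregate, trading run time for precision."""
    scores = [test() for _ in range(iterations)]
    return statistics.median(scores)  # robust to a single outlier run
```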
Sensor data collection
On Plus, Novabench collects sensor data during the benchmark. Sensor readings are captured at regular intervals throughout each test and stored alongside the results.
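A sketch of interval-based sampling; `read_sensors` stands in for platform-specific sensor APIs, and the 1-second interval is an assumption:

```python
import threading
import time

def sample_sensors(read_sensors, stop: threading.Event,
                   samples: list, interval_s: float = 1.0) -> None:
    """Append a timestamped sensor reading every interval_s seconds."""
    while not stop.is_set():
        samples.append((time.time(), read_sensors()))
        stop.wait(interval_s)

# Usage: start the sampler, run the test, then stop and store the samples.
samples: list = []
stop = threading.Event()
sampler = threading.Thread(
    target=sample_sensors,
    args=(lambda: {"cpu_temp_c": 0.0}, stop, samples),  # dummy reader
)
sampler.start()
# ... run a benchmark test here ...
stop.set()
sampler.join()
```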
What sensors are collected
| Sensor | Description |
|---|---|
| Temperature | CPU and GPU core temperatures in degrees Celsius |
| Power draw | Watts consumed by the CPU and GPU during each test phase |
| Clock speed | Processor and GPU clock frequencies in MHz or GHz |
How sensor data helps
Sensor data transforms a benchmark score from a single number into a diagnostic tool:
- Temperature trends: a score that drops across iterations, combined with rising temperatures, confirms thermal throttling (see the sketch after this list). The sensor chart shows exactly when throttling begins.
- Power delivery: unusually low power draw during a GPU test might indicate a power supply issue or a power-saving mode limiting performance.
- Clock speed stability: a processor that maintains its boost clock throughout the test is delivering its full rated performance. Fluctuating clock speeds suggest thermal or power constraints.
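As an example, the throttling signal from the first point can be detected mechanically. The thresholds below are illustrative assumptions, not Novabench's detection logic:

```python
def looks_throttled(scores: list[float], temps_c: list[float]) -> bool:
    """True if scores trend down while temperatures trend up."""
    if len(scores) < 2 or len(temps_c) < 2:
        return False
    score_drop = (scores[0] - scores[-1]) / scores[0]
    temp_rise = temps_c[-1] - temps_c[0]
    return score_drop > 0.05 and temp_rise > 10.0  # >5% drop, >10 °C rise

# e.g. looks_throttled([2100, 2000, 1890], [62, 78, 91]) -> True
```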
Test validation and result integrity
Novabench includes several mechanisms to ensure result integrity:
- Hardware detection: Novabench identifies your hardware (CPU model, GPU model, RAM configuration, drive type) and records it with each result. This enables accurate comparisons against identical configurations.
- Failure handling: if a test encounters an error (for example, a GPU driver crash during the 3D test), the result is flagged. Flagged results are excluded from comparison data.
- Skipped test handling: tests that do not apply to your system (for example, the NPU test on a system without an NPU) are cleanly omitted without affecting other scores.
- Test isolation: each component benchmark runs independently, so a failure in one test does not affect the others (sketched below).
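A minimal sketch of the isolation-and-flagging pattern; the result structure is illustrative, since Novabench's internal representation is not public:

```python
def run_isolated(tests: dict) -> dict:
    """Run each test independently, flagging any that fail."""
    results = {}
    for name, test in tests.items():
        try:
            results[name] = {"score": test(), "flagged": False}
        except Exception as exc:
            # A crash in one test (e.g. a GPU driver fault) flags only
            # that result; the remaining tests still run.
            results[name] = {"score": None, "flagged": True, "error": str(exc)}
    return results
```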
Best practices for reliable results
Follow these guidelines to get the most accurate and repeatable benchmark results:
- Close background applications: other applications can compete with the benchmark for system resources. Close them before benchmarking.
- Plug in the power adapter: on laptops, battery power typically limits processor and GPU speed. Benchmark on AC power for full performance.
- Set the power plan: use Balanced or High Performance (Windows) or disable Low Power Mode (macOS) to ensure the system is not artificially limiting performance.
- Let the system cool: if you just finished an intensive workload, wait a few minutes for temperatures to return to idle before benchmarking. Elevated starting temperatures reduce the available thermal headroom.
Related pages
- Understanding your scores: what scores mean and how to interpret them
- Comparing results: using histograms, percentile rankings, and Explain to put scores in context
- Stress test: validating system stability under sustained load
