Puget Systems’ Perspective on Intel CPU Instability Issues

You may have heard about instability issues with Intel Core 13th and 14th Gen desktop processors. The issue has attracted more attention as time goes on. I am posting to share what we’ve experienced here at Puget Systems and what we’re doing about it.

The issue gained attention over the past three months, led by GamersNexus, Level1Techs, and others. Details were vague at first because the issue tended to surface only over time. Early on it looked like it was related to motherboard power management, making it uncertain whether it was a motherboard issue, an Intel issue, or both. As time went on and more information and speculation developed, the issue became particularly alarming because it seemed to represent a physical degradation of the processor, which is not recoverable. The concern is not only with the nature of the instability, but also with the incidence rate. Some game development studios and cloud gaming providers have come forward with concerning failure rates upwards of 50%.

Last week, Intel posted an official statement, in which they shared that elevated voltage requests to the processor were a significant factor, and that a microcode patch would be delivered once validated, with mid-August as the target release date. Intel did not officially address the physical degradation issue, but the general community consensus is that the microcode update is expected to prevent, but not reverse, that degradation.

How Puget Systems is Unique

At Puget Systems, we HAVE seen the issue, but our experience has been much more muted in terms of timeline and failure rate. In order to answer why, I have to give a little bit of history.

Going all the way back to 2017, with the Intel 8700K processor, we published an article titled Why Do Hardware Reviewers Get Different Benchmark Results? which helped call attention to the fact that motherboards were shipping with “Multicore Enhancement” enabled, which set the CPU “All Core Turbo” to be equal to the “Single Core Turbo” frequency. This essentially was overclocking the CPU, by pushing it past official Intel specifications, and had negative effects on stability and temperatures. At Puget Systems, we have always valued stability first and we actively made the choice to follow Intel specifications. Behind the scenes, this meant encouraging Intel to make those specifications public on Intel ARK and pushing motherboard ODMs to follow Intel guidance as their default settings. JayzTwoCents helped drive public awareness of the issue, and for a short time it appeared that things were back on track.

Since that time, our stance at Puget Systems has been to mistrust the default settings on any motherboard. Instead, we commit internally to test and apply BIOS settings — especially power settings — according to our own best practices, with an emphasis on following Intel and AMD guidelines. With Intel Core CPUs in particular, we pay close attention to voltage levels and time durations at which those levels are sustained. This has been especially challenging when those guidelines are difficult to find and when motherboard makers brand features with their own unique naming.

Nevertheless, we kept that approach with confidence because of the large amount of real-world testing we do here. We’ve even developed our own suite of PugetBench Benchmarks, whose goal is to test real-world scenarios, guided by years of experience and learning through our customers and partners. Our approach has always led us to be conservative with our power settings, especially since we have shown the real-world performance impact to be small, in the 1-2% range.

Puget Systems Intel Core Failure Rates

So, with that understanding of WHY we may be seeing things differently than others in the industry — what ARE we seeing here at Puget Systems?

Even though failure rates (as a percentage) are the most consequential, I think showing the absolute number of failures illustrates our experience best. I decided to go back all the way to the launch of Intel Core 10th Gen to give some historical perspective. Starting with 10th Gen, we have only sold the top 2 SKUs (XX700K and XX900K) in volume, which gives us a nice clean set of data.

Looking at that chart, you’ll notice a few things. First, your attention undoubtedly is drawn to the recent spike of failures with Intel Core 14th Gen. Second, you can see that Intel Core 11th Gen CPUs had a failure rate at nearly the same level, even though, as far as I can recall, it didn’t get as much press at the time. Third, I’ll draw your attention to a steady and elevated failure rate on 13th Gen processors.

I can also plot this same data, but instead of coloring it by CPU generation, I’ll color it based on whether we caught the issue on our production floor (shop failure) or whether the issue made it out to the customer (field failure). Obviously, a field failure is a dramatically more severe problem because it now impacts our customer experience.

The most concerning part of all of this to us here at Puget Systems is the rise in the number of failures in the field, which has not been this high since 11th Gen. We’re seeing ALL of these failures happen after 6 months, which means we do expect elevated failure rates to continue for the foreseeable future and possibly even after Intel issues the microcode patch.

Based on this information, we are definitely experiencing CPU failures higher than our historical average, especially with 14th Gen. We have enough data to know that we don’t have an acute problem on the horizon with 13th Gen; it is more of a slow burn. We do expect an elevated failure rate on 14th Gen while Intel finishes finding a root cause and issuing a microcode update. While the number of failures we are experiencing is definitely higher than our historical average, it is difficult to classify 5-7 failures a month in the field as a huge issue, and it is definitely a lower rate of failure than we are hearing about from others in the industry. The recent spike in 14th Gen failure rates stands out mostly because of how incredibly low historical CPU failure rates tend to be.

We believe that our commitment to internally developed power settings is why we have been much less impacted than others by these Intel stability issues. This is shaping our approach over the coming months.

Failure Rates in Context

Everything I’ve shown you so far is our raw number of failures, but what matters most is failure rate percentages. Let’s look at total failure rates in the context of multiple generations and with comparison to AMD Ryzen CPUs.

You can see that in context, the Intel Core 13th and 14th Gen processors do have an elevated failure rate but not at a show-stopper level. The concern for the future reliability of those CPUs is much more the issue at hand, rather than the failure rates we are seeing today. If it is true that the 14th Gen CPUs will continue to have increasing failures over time, this could end up being a much bigger problem as time goes by and is something we will, of course, be keeping a close eye on. 14th Gen isn’t as rock solid as Intel’s 10th or 12th Gen processors, but at least for us, it isn’t yet at critical levels.

Based on the failure rate data we currently have, it is interesting to see that 14th Gen is still nowhere near the failure rates of the Intel Core 11th Gen processors back in 2021 and also substantially lower than AMD Ryzen 5000 (both in terms of shop and field failures) or Ryzen 7000 (in terms of shop failures, if not field). We aren’t including AMD here to try to deflect from the issues Intel is currently experiencing but rather to put into context why we have not yet adjusted our Intel vs. AMD strategy in our workstations.
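To make the difference between raw failure counts and failure rate percentages concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the GenerationStats class, the failure_rates helper, the generation names, and all of the numbers are made up and are not Puget Systems data. It only shows the arithmetic: a generation with more total failures can still have a lower failure rate if far more units were sold.

```python
from dataclasses import dataclass


@dataclass
class GenerationStats:
    """Units sold and failures for one CPU generation (hypothetical data)."""
    name: str
    units_sold: int
    shop_failures: int   # caught on the production floor
    field_failures: int  # surfaced after delivery to the customer


def failure_rates(stats: GenerationStats) -> dict[str, float]:
    """Return shop, field, and total failure rates as percentages of units sold."""
    if stats.units_sold <= 0:
        raise ValueError("units_sold must be positive")
    shop = 100.0 * stats.shop_failures / stats.units_sold
    field = 100.0 * stats.field_failures / stats.units_sold
    return {"shop": shop, "field": field, "total": shop + field}


# Made-up numbers for illustration only; these are NOT Puget Systems data.
examples = [
    GenerationStats("Gen A", units_sold=1000, shop_failures=5, field_failures=5),
    GenerationStats("Gen B", units_sold=4000, shop_failures=10, field_failures=10),
]

for gen in examples:
    rates = failure_rates(gen)
    total_failures = gen.shop_failures + gen.field_failures
    print(f"{gen.name}: {total_failures} failures, {rates['total']:.2f}% total rate")
```

With these invented inputs, "Gen B" has twice as many failures as "Gen A" but half the failure rate, which is why the percentage view matters more than the raw counts when comparing generations with different sales volumes.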

Our Plan of Action

Even if we are not seeing the same level of failures as others, this is a real issue that we are addressing internally. But what exactly are we doing about it?

  1. In the majority of cases, we are staying the course for now. Various BIOS updates have been launched by motherboard manufacturers to provide more conservative power settings, but in our opinion, they don’t quite hit the mark. They are either too conservative in some places (leading to unacceptable loss in performance) or they are not conservative enough. We trust our internally developed settings more. We also are concerned with the rise in failure rate, but it is not at a level of severity that changes our CPU recommendations for our customer workflows.
  2. We will immediately validate the Intel microcode update when it is released. We will start with internal testing for stability and performance. If it passes that testing, we will begin using it on our shipping configurations as soon as possible.
  3. We will contact all our affected customers to provide the Intel microcode update. We will do this after gaining some internal experience and confidence with the update, and after developing detailed guides on how to install it while preserving our recommended BIOS settings.
  4. We are extending our warranty to 3 years for all customers affected by this issue, regardless of warranty purchased. With a Puget Systems PC, you should be able to count on it working for you. If we no longer have supply of 13th or 14th Gen processors, we’ll upgrade you to a more current generation.

We’ll all stay tuned together for Intel to release their microcode update in August. In the meantime, if you have any questions or concerns, please reach out to me directly or to our support team!