Processors, Threads, and Processes: Is the PC in for a multi-core future?

For decades, every processor had a single core and ran a single thread. It took a long time for the first dual-core CPUs to appear, but today you can have 8, 12, 16 or more cores in a home computer, and modern PC processors work on many threads at the same time, all thanks to advances in chip design and manufacturing. But what are threads, and why does it matter that a CPU can handle more than one? This article answers these and other questions.

What is a thread?

Simply put, a processor thread is the shortest sequence of instructions needed to complete a computational task. The list may be very short, or it may be enormous. What determines this is the process the thread belongs to.

So now we have a new question to answer (i.e. what is a process?), but luckily it’s just as easy to solve. If you are using Windows on your computer, press the Windows key and X and select Task Manager from the list that appears.

By default, it opens in the Processes tab and you should see a long list of processes currently running on your computer. Some of them will be stand-alone programs that run on their own without user input.

Others will be applications that you control directly. Some of these can spawn additional background processes: tasks performed behind the scenes at the direction of the main program.

If you switch to the Performance tab in Task Manager and then select the CPU section, you will see how many processes are currently running as well as the total number of active threads.
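
The relationship between one process and its threads can also be seen from inside a program. The following is a minimal Python sketch, not something taken from Task Manager: the helper names `idle_worker` and `spawn_threads` are invented for illustration. It starts four extra threads inside a single process and counts them before, during, and after.

```python
import os
import threading

def idle_worker(stop_event):
    # Each thread simply waits until it is told to stop.
    stop_event.wait()

def spawn_threads(count):
    # Start `count` threads that all belong to the current process.
    stop_event = threading.Event()
    threads = [threading.Thread(target=idle_worker, args=(stop_event,))
               for _ in range(count)]
    for t in threads:
        t.start()
    return stop_event, threads

before = threading.active_count()      # threads alive before we add any
stop_event, threads = spawn_threads(4)
during = threading.active_count()      # four more than before
stop_event.set()                       # let the workers finish
for t in threads:
    t.join()
after = threading.active_count()       # back to the starting count

print(f"process {os.getpid()}: {before} -> {during} -> {after} threads")
```

The process ID stays the same throughout; only the number of threads inside that process changes.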

Every time a process wants to access a file, whether in RAM or on storage, a handle (a file descriptor) is created. Each handle is unique to the process that created it, so a single file can have multiple handles.
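
As a rough illustration of one file having several handles, the Python sketch below opens the same temporary file twice and gets two distinct descriptors back. The helper name `open_twice` is hypothetical, and the example uses a throwaway file rather than anything from the article.

```python
import os
import tempfile

def open_twice(path):
    # Opening the same file twice yields two independent descriptors.
    fd_a = os.open(path, os.O_RDONLY)
    fd_b = os.open(path, os.O_RDONLY)
    return fd_a, fd_b

# Create a small throwaway file to open.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

fd_a, fd_b = open_twice(path)
distinct = fd_a != fd_b          # two handles, one file
first = os.read(fd_a, 5)         # each handle keeps its own read position,
second = os.read(fd_b, 5)        # so both reads start at the beginning

os.close(fd_a)
os.close(fd_b)
os.remove(path)

print(distinct, first, second)
```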

Returning to threads, Task Manager doesn’t say much about them – for example, the number of threads associated with each process is not shown. Luckily, Microsoft has another program called Process Explorer to help with this.

Here we can see a much more detailed overview of the various processes and their threads.

Note how some programs create relatively few threads (for example, the Corsair iCUE plugin host has only one), while others, such as System, have hundreds.

It is the operating system that generates most of these threads. The OS then proceeds to create and manage them on its own.

CPU

The final destination for any thread is the central processing unit (CPU). This device takes a list of instructions, translates them into a “language” that it understands, and then performs the assigned tasks.

Deep inside the processor, special hardware stores threads for analysis, and then sorts their list of instructions to best match what the processor is doing at the time.

Even on older Pentium processors, thread instructions can be slightly reordered to maximize performance. Modern CPUs contain extremely complex thread management tools due to the sheer number of threads they have to manage.

If a thread contains a sequence of “if…then…else” instructions, the branch prediction circuitry guesses the most likely outcome. Based on that guess, the CPU fetches from its instruction store and begins executing the instructions that outcome requires.

If the prediction is correct, a significant amount of time is saved, because the CPU does not have to wait for the condition to be resolved before carrying on. If not, the speculative work is thrown away, which is why processor designers work so hard on prediction accuracy. A modern processor also decides for itself which thread's instructions most need processing at any given moment.
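
To give a feel for how such guessing can work, here is a sketch of a two-bit saturating counter, a classic textbook prediction scheme. It is an assumption chosen for illustration, not the design of any particular CPU named in this article, and the function name `simulate_two_bit_predictor` is invented.

```python
def simulate_two_bit_predictor(outcomes):
    """Count correct predictions for a list of branch outcomes (True = taken)."""
    counter = 2   # 0 or 1 predicts "not taken", 2 or 3 predicts "taken"
    correct = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            correct += 1
        # Nudge the counter toward what actually happened, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct

# A loop branch: taken nine times, then not taken once, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
hits = simulate_two_bit_predictor(loop_branch)
print(f"{hits} of {len(loop_branch)} predictions correct")
```

In this toy run the predictor only misses the final iteration of each loop, which is why regular, repetitive branches are cheap and erratic ones are costly.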

Intel server processors of the first half of the 90s

The CPUs of the 1990s, whether desktop or server, had only one core and so could run only one thread at a time, although they could execute multiple instructions at once (a design known as superscalar).

High-end servers and workstations have to deal with a huge number of threads, and Pentium-era machines typically used two CPUs to handle the workload, just as modern servers still put multiple processors on one motherboard. The idea of a single processor handling multiple threads at the same time, however, has been around for a long time.

The idea of a CPU core executing instructions from more than one thread at once, known as simultaneous multithreading (SMT), had to wait. It took a long time before hardware was capable enough to implement it.

The Intel Northwood architecture brought multithreading to the masses. 1 core, 2 threads in Intel Pentium 4

That happened in 2002, when Intel released a new version of the Pentium 4 processor. It was the first desktop CPU with full SMT support, a feature Intel dubbed Hyper-Threading Technology. Today's multithreaded processors are its heirs.

Multitasking

So how does one processor core handle two threads at the same time? Think of the processor as a complex factory with several stages: receiving and organizing raw materials (i.e. data), then sorting orders (threads) and breaking them down into many smaller tasks.

In the same way that a high-volume car production line will work with different parts, one or two at a time, the CPU must perform different tasks in a given sequence in order to execute a given set of instructions.

This is how a pipeline works, and its various stages will not always be busy. Some data has to wait for a while until earlier steps have finished.

This is where SMT (Simultaneous multithreading) comes into play. Hardware designed to keep track of the state of each part of the “pipeline” is used to determine if another thread can use idle stages without stopping the current thread.
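
The benefit of filling idle stages can be shown with a deliberately simplified toy model. In the Python sketch below, each thread is a string in which `X` means "needs the single execution unit this cycle" and `-` means "stalled, waiting for data". Real hardware is far more complex; the model and the function name `cycles_needed` are assumptions made purely for illustration.

```python
def cycles_needed(threads):
    """Cycles to finish all threads when they share one execution unit."""
    positions = [0] * len(threads)
    cycles = 0
    while any(pos < len(t) for pos, t in zip(positions, threads)):
        unit_free = True
        for i, t in enumerate(threads):
            if positions[i] >= len(t):
                continue                  # this thread has already finished
            if t[positions[i]] == "-":
                positions[i] += 1         # a stall cycle passes without the unit
            elif unit_free:
                positions[i] += 1         # this thread gets the unit this cycle
                unit_free = False
            # otherwise the thread waits for the unit and retries next cycle
        cycles += 1
    return cycles

thread_a = "X--X--X"
thread_b = "X--X--X"

one_after_another = cycles_needed([thread_a]) + cycles_needed([thread_b])
interleaved = cycles_needed([thread_a, thread_b])
print(one_after_another, interleaved)
```

In this toy case, running the threads back to back takes 14 cycles, while letting the second thread use the first one's stall cycles finishes both in 8.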

The fact that desktop processors were multi-threaded long before they were multi-core shows that SMT is much easier to implement than additional cores. In the case of the Intel Northwood architecture, less than 5% of the entire die was used to manage two threads.

The CPU cores that support SMT are organized so that they appear to the operating system as separate logical cores. Physically, they use the same resources, but operate independently.
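
Software sees these logical cores directly. As a small sketch, Python's standard `os.cpu_count()` reports the number of logical CPUs the operating system exposes. The standard library does not report physical cores, so the helper `threads_per_core` below is hypothetical and takes the physical count as an input you supply yourself.

```python
import os

def threads_per_core(logical, physical):
    # With SMT enabled, logical CPUs are a whole multiple of physical cores.
    return logical // physical

logical = os.cpu_count()   # logical CPUs seen by the OS (may be None)
print("logical CPUs:", logical)

# Example with assumed numbers: an 8-core CPU showing 16 logical cores.
print("threads per core:", threads_per_core(16, 8))
```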

Desktop CPUs handle no more than two threads per core because their pipelines are relatively short and simple, and the designers' analysis presumably shows that two is the optimal limit. That is why home computers still do not come with, say, 8 cores and 24 threads.

CPU IBM Power10 – 15 SMT8 cores

On the opposite end of the spectrum, huge server processors like the Intel Xeon Phi chips or the latest IBM POWER processors handle 4 and 8 threads per core, respectively. This is because their cores contain many pipelines with shared resources.

These different approaches to CPU design arise from the very different workloads the chips have to deal with.

CPUs are not the only chips in a computer that have to deal with a large number of threads. There is one chip with a very specific role that handles thousands of threads at the same time.

Video cards

When it comes to sheer numbers, CPUs lose out completely to graphics cards. GPUs are physically larger, have many more transistors, consume more power, and process many more threads than any server processor.

An entry-level graphics card processes data faster than a modern 32-core AMD Ryzen processor

Take, for example, the AMD Radeon RX 6800 graphics card with the Navi 21 chip. This processor consists of 60 compute units (CUs), each of which must process 64 separate threads simultaneously. That’s 3840 threads!

So how does a GPU handle much larger tasks than a CPU?

Each CU has two SIMD (single instruction, multiple data) units, each of which can work on 32 separate data elements at the same time. The elements can all come from different threads, but the catch is that the unit must execute the same instruction for every one of them.
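
The arithmetic behind the thread count, and the "same instruction, many data elements" idea, can be sketched in a few lines of Python. The constants come from the figures quoted above; the function `simd_execute` is an invented stand-in that models the concept only, not real GPU code.

```python
COMPUTE_UNITS = 60   # CUs in the Radeon RX 6800 (figure from the article)
SIMD_PER_CU = 2      # SIMD units per CU
LANES = 32           # data elements each SIMD unit works on at once

threads_per_cu = SIMD_PER_CU * LANES                 # 64
threads_in_flight = COMPUTE_UNITS * threads_per_cu   # 3840

def simd_execute(instruction, lanes):
    """Apply one instruction to every lane's data element."""
    return [instruction(value) for value in lanes]

lane_data = list(range(LANES))
doubled = simd_execute(lambda x: x * 2, lane_data)   # same operation on every lane

print(threads_per_cu, threads_in_flight, doubled[:4])
```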

This is the key difference from a CPU: a desktop processor core handles no more than two threads, but their instructions can be completely different and come from completely unrelated processes.

GPUs are designed to do the same things over and over again, usually for the same program (technically known as a kernel, but we’ll leave that aside), with all of the work done in parallel.

As with the IBM POWER10, an enterprise server-only processor, the graphics adapter is built for a very specialized task.

The largest modern games with their complex 3D images require an incredible amount of mathematical calculations in just a few milliseconds. And this requires a huge number of threads.

Need more threads

If you take a look at any CPU review, you’ll almost always see two results from Cinebench, a test that runs a demanding CPU-based rendering task.

One result is for a test that uses only one thread, while the other uses as many threads as the CPU can handle. The score for the latter is always much higher than for the single-threaded test. Why is that?

Cinebench renders 3D graphics much as a game does, except that it produces a single, highly detailed frame. If you remember how GPUs execute many threads in parallel to create 3D graphics, it becomes obvious why processors with many cores, especially with SMT, get through the workload so quickly. This is one of the few scenarios where every CPU core and thread can be put to use.
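
Rendering parallelizes so well because each part of the image can be computed independently. The Python sketch below shows the structure only: a toy "image" is split into rows that are handed to a pool of worker threads. The `shade_pixel` formula is a made-up stand-in for real rendering maths, this is not how Cinebench itself is implemented, and in standard CPython the global interpreter lock means this particular code will not actually run faster with more workers.

```python
import os
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 32

def shade_pixel(x, y):
    # Stand-in for the real per-pixel rendering calculation.
    return (x * x + y * y) % 256

def render_row(y):
    # Each row depends only on its own coordinates, so rows are independent.
    return [shade_pixel(x, y) for x in range(WIDTH)]

def render(workers):
    # Hand the rows out to `workers` threads; map() keeps them in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))

single = render(1)                      # one thread does every row
multi = render(os.cpu_count() or 1)     # one worker per logical core

print(single == multi)
```

The two results are identical, which is the point: the work divides cleanly, so the only thing that changes with more threads is how long it takes.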

Unfortunately, adding more cores just makes the processor bigger and therefore more expensive, so it would seem that SMT will always be a good technology. However, much depends on the situation.

For example, when the AMD Ryzen 9 3900X (a 12-core, 24-thread CPU) was tested in 36 different games with SMT enabled and disabled, the results varied. Some games ran 10-16% faster with SMT enabled, while others ran 10-12% slower.

The average difference, however, was only 1%, so it’s certainly not a case where SMT should always be disabled during gaming. But this raises a few more questions.

First, why does a game run 12% slower if the CPU cores are processing two threads at the same time? The key phrase here is “resource contention”.

The more threads a CPU can handle, the more important the processor's caching system becomes. This becomes apparent when examining processors whose L3 cache size is fixed regardless of how many cores are enabled.

The more cores and threads a chip has, the more cache requests the system has to process. And that brings us to the next question: is this why most games can’t handle a lot of threads/cores?

Why don’t games use many threads?

Let’s go back to Process Explorer and look at a few games, namely Cyberpunk 2077, Spider-Man Remastered and Shadow of the Tomb Raider. All three were developed for PC and consoles, so you can expect them to use 4 to 8 threads.

At first glance, the games do use a lot of threads. That seems impossible, given that the processor in the computer running them supports a maximum of 8 threads. But if we dig deeper into each process's threads, we get a much clearer picture. Let’s take a look at Shadow of the Tomb Raider.

Below we can see that the vast majority of these threads take almost no CPU execution time (second column, displayed in seconds). Although the process and OS have created over a hundred threads, most of them run too fast to even register.

The cycles delta is the number of CPU cycles a thread has consumed since the display last refreshed, and in the case of this game it is dominated by just two threads. The others, however, still make use of all the available processor cores.

The number of cycles may seem like a ridiculous number, but if the processor has a clock speed of, say, 4.5 GHz, then one cycle takes only 0.22 nanoseconds. So 1.3 billion cycles corresponds to just under 300 milliseconds.
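
The conversion from cycles to time is easy to check. This short Python sketch reproduces the arithmetic using the 4.5 GHz clock and the 1.3 billion cycles quoted above; the function name `cycles_to_ms` is just an illustrative helper.

```python
def cycles_to_ms(cycles, clock_hz):
    # Time = cycles / frequency, converted from seconds to milliseconds.
    return cycles / clock_hz * 1000

CLOCK_HZ = 4.5e9                          # 4.5 GHz
cycle_time_ns = 1 / CLOCK_HZ * 1e9        # duration of one cycle in nanoseconds
elapsed_ms = cycles_to_ms(1.3e9, CLOCK_HZ)

print(f"one cycle: {cycle_time_ns:.2f} ns")
print(f"1.3 billion cycles: {elapsed_ms:.0f} ms")
```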

Of course, not all games can do this, and the older the game, the fewer threads it uses. If we look at the original Call of Duty from 2003, we see a very different picture.

All the games of that era were like that: just one thread for everything. At that time processors had only one core, and relatively few of them supported SMT.

While the Call of Duty process needs only a single thread, Shadow of the Tomb Raider is properly multi-threaded and runs as many threads at once as the CPU supports.

Initially, hardware was ahead of software when it came to taking full advantage of all the cores on offer (with or without SMT), and we had to wait several years before games became fully multi-threaded.

Now that the latest consoles have an 8-core processor with 2-way SMT, future games will certainly be more heavily threaded.

Multithreading – the future?

A user can get a desktop PC with a CPU that can handle 32 threads (AMD Ryzen 9 7950X) and a GPU that can handle 4096 threads (Nvidia GeForce RTX 4090).

This hardware is of course at the forefront of technology, cost and power and certainly not representative of what most computers have to offer. But about 10 years ago, the picture was very different.

The best processors supported 8 threads via SMT, but the average PC usually had to make do with about 4 threads. You can now get processors under $100 that perform just like the best chips from 7 years ago.

4 cores, 8 threads, less than $100 – The Intel Core i3-10100 shows about the same level of performance as the Intel Core i7-7700.

We can thank AMD for this, as they were the first to offer many cores/threads at an affordable price. The AM4 platform has revolutionized the home PC world. And today, both manufacturers regularly fight over who can offer the most cores/threads per dollar.

We are at a stage where new games take almost full advantage of all the processing power available to them, provided they are not limited by the GPU.

Conclusion

So what’s next? If we could fast forward a decade, would we see the average gamer using a 128-thread processor? It’s possible, but very unlikely, simply because most software is still limited by the performance of a single core. Improving single-core performance matters just as much as adding cores. Professional graphics content creators, however, are already using powerful many-core processors and expensive graphics cards for their work.

Source: techspot
