Posted: June 5th, 2001
Written by: Tuan "Solace" Nguyen
Here are the features for SmartMP:
- Dual point-to-point, high-speed 266MHz Athlon system buses
  - Dedicated bus per CPU delivers better performance
  - Designed to provide up to 4.2GB/sec of bus bandwidth in a 2P system
  - Transaction-based protocol allows CPUs to continue working while outstanding data requests are being filled
- Optimized "Modified Owner Exclusive Shared Invalid", or MOESI, Cache Coherency Protocol
  - Keeps track of data in CPU caches
  - Identifies when data from one CPU is needed by another, and when data is shared between CPUs
  - Effectively reduces memory traffic, increasing available bandwidth
- Innovative "Snoop" buses
  - High-speed inter-processor communication bus for CPU snooping
  - Transfers between CPU cache, through the Data Bus, reduce memory bandwidth requirements for shared data
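As a rough sanity check on the 4.2GB/sec figure in the list above, the arithmetic can be sketched in a few lines of Python. The 64-bit (8-byte) bus width is an assumption that the feature list does not state, and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted bus bandwidth.
# ASSUMPTION (not in the feature list): each Athlon system bus is 64 bits wide.
BUS_RATE_MHZ = 266       # effective bus rate from the feature list
BUS_WIDTH_BYTES = 8      # assumed 64-bit data path
NUM_CPUS = 2             # a 2P system with one dedicated bus per CPU


def bus_bandwidth_gb_per_sec(rate_mhz, width_bytes):
    """Peak bandwidth of a single bus in GB/sec (1 GB = 10**9 bytes)."""
    return rate_mhz * 1_000_000 * width_bytes / 1_000_000_000


per_bus = bus_bandwidth_gb_per_sec(BUS_RATE_MHZ, BUS_WIDTH_BYTES)
total = per_bus * NUM_CPUS
print(f"per bus: {per_bus:.2f} GB/sec, both buses: {total:.2f} GB/sec")
```

Under that assumption each bus peaks at a little over 2.1GB/sec, so two dedicated buses land close to the quoted 4.2GB/sec. The small difference is presumably rounding in the quoted number.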
What does all this stuff mean? First, let's talk about Cache Coherency.
Cache Coherency lets one processor in the system fetch data into its local cache by snooping the other processor's cache. For example, if the first CPU has data in its cache that the second requires, the second can call on the first to send the requested data through the core logic and into the second CPU's cache. What's so special about this? Traditionally, the data had to be sent out from the first CPU's cache, into the core logic, out to main memory, back into the core logic and into the second CPU's cache. That method uses precious memory bandwidth.
With Cache Coherency, each CPU has its own "snoop" bus that allows it to constantly check for data in main memory and in the other CPU's cache. If something is needed, the data is simply transferred from one cache into the core logic and on into the other processor's cache, using no main memory bandwidth at all. This reduces memory latency and frees up headroom for other transactions. It is made possible by the Athlon processor's point-to-point (PTP) bus protocol.
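To make the difference between the two data paths concrete, here is a toy Python model that counts main-memory accesses when one CPU reads data sitting in the other CPU's cache. It is only a sketch of the idea described above, not AMD's actual protocol: the `ToySystem` class, the `shared_read_cost` helper and the address used are all hypothetical, and the MOESI states are left out entirely.

```python
class ToySystem:
    """Two CPU caches plus main memory; counts main-memory accesses."""

    def __init__(self, snooping):
        self.snooping = snooping      # True: cache-to-cache transfers allowed
        self.memory = {}              # address -> value
        self.caches = [{}, {}]        # one dict per CPU
        self.memory_accesses = 0

    def write(self, cpu, addr, value):
        # The new value lives only in the writer's cache for now.
        self.caches[cpu][addr] = value
        # Any stale copy in the other CPU's cache is dropped.
        self.caches[1 - cpu].pop(addr, None)

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr in cache:
            return cache[addr]        # local hit, no bus traffic
        other = self.caches[1 - cpu]
        if addr in other:
            if self.snooping:
                # Cache -> core logic -> other cache; memory is not touched.
                cache[addr] = other[addr]
            else:
                # Traditional path: write back to memory, then read it again.
                self.memory[addr] = other[addr]
                self.memory_accesses += 1
                cache[addr] = self.memory[addr]
                self.memory_accesses += 1
            return cache[addr]
        # Nobody has it cached: ordinary fetch from main memory.
        self.memory_accesses += 1
        cache[addr] = self.memory.get(addr, 0)
        return cache[addr]


def shared_read_cost(snooping):
    """CPU 0 writes a value, CPU 1 reads it; returns (value, memory accesses)."""
    system = ToySystem(snooping)
    system.write(0, 0x100, 42)
    value = system.read(1, 0x100)
    return value, system.memory_accesses


print("traditional:", shared_read_cost(False))
print("snooping:   ", shared_read_cost(True))
```

In this model the traditional path costs two trips to main memory for one shared value, while the snooped transfer costs none, which is the bandwidth saving the article is describing.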
SmartMP's MOESI protocol also enables instructions to be executed even when not all of the required data is available in cache. The Athlon MP can fetch data while execution is taking place. This was previously not possible using conventional cache coherency protocols.
How does a system take advantage of multiple processors? It must run parallel code. A parallel program has instructions that can be executed at the same time as other instructions. For example, if the program wants to execute instruction C, but C requires the result of A, then C can't execute at the same time as A: C is dependent on the result of A. A program whose code is full of such dependencies will not be able to benefit from multiple processors.
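The dependency between A and C can be sketched with Python's standard `concurrent.futures` module. The three functions are placeholders invented for this example, and B is an extra, independent piece of work added so there is something that can overlap with A. Note that Python threads do not truly run CPU-bound work in parallel in the standard interpreter, so treat this as an illustration of the ordering constraint rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def instruction_a():
    return 2 + 3


def instruction_b():
    # Independent of A and C, so it is free to run alongside A.
    return 10 * 4


def instruction_c(a_result):
    # Needs A's result, so it cannot start until A has finished.
    return a_result * 2


def run():
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(instruction_a)   # A and B are submitted together
        future_b = pool.submit(instruction_b)
        a = future_a.result()                   # C has to wait here for A
        c = instruction_c(a)
        b = future_b.result()
    return a, b, c


print(run())
```

A and B can be handed to two workers at once, but C is stuck behind A no matter how many processors are available. A program made mostly of C-style steps gains little from a second CPU.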
Besides running highly parallel programs, which are also more difficult to program well, an MP system lets you run more programs simultaneously or load up more instances of the same program, like Winamp. With the right operating system, program threads will be distributed among the available processors so that no single processor takes a heavy beating. Achieving a perfect 50/50 balance is going to be extremely difficult, since there aren't many consumer-level applications and games that are MP aware. Still, having an OS that natively supports MP can bring you closer to that goal.
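As a very rough picture of what "distributing threads among available processors" means, the sketch below hands each thread to whichever CPU currently has the least work. This is a toy greedy balancer written for illustration, not how any particular operating system schedules threads, and the thread names and load numbers are invented.

```python
import heapq


def distribute(thread_costs, num_cpus=2):
    """Assign each thread to the currently least-loaded CPU, heaviest first.

    thread_costs maps a thread name to a made-up CPU cost.
    Returns (assignment, loads), both keyed by CPU index.
    """
    heap = [(0, cpu) for cpu in range(num_cpus)]   # (current load, cpu index)
    heapq.heapify(heap)
    assignment = {cpu: [] for cpu in range(num_cpus)}
    by_cost = sorted(thread_costs.items(), key=lambda item: -item[1])
    for name, cost in by_cost:
        load, cpu = heapq.heappop(heap)            # least-loaded CPU so far
        assignment[cpu].append(name)
        heapq.heappush(heap, (load + cost, cpu))
    loads = {cpu: sum(thread_costs[n] for n in names)
             for cpu, names in assignment.items()}
    return assignment, loads


# Hypothetical workload: the numbers are arbitrary units of CPU demand.
threads = {"winamp": 10, "game": 60, "browser": 20, "encoder": 45}
assignment, loads = distribute(threads)
print(assignment)
print(loads)
```

Even with four separate threads to place, the two CPUs end up close to balanced rather than exactly 50/50, and a single heavy thread that is not MP aware could not be split at all.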