Processors may be interconnected using buses, crossbar switches or on-chip mesh networks.
When SMP is built on buses or crossbar switches, scalability is limited mainly by the bandwidth and power consumption of the interconnect linking the processors, the memory, and the disk arrays.
Mesh architectures avoid these bottlenecks and provide nearly linear scalability to much higher processor counts, at the cost of programmability.
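The contrast between a shared interconnect and a mesh can be illustrated with a toy model. It assumes an idealized shared bus whose fixed bandwidth is split evenly among all processors, and a k × k 2D mesh in which every link can carry traffic at once. The bandwidth figures are hypothetical units, and contention, routing, and traffic patterns are ignored, so this is a rough sketch rather than a performance model.

```python
import math


def bus_bandwidth_per_processor(n_procs: int, bus_bw: float) -> float:
    """Idealized shared bus: every processor gets an equal slice of one fixed bandwidth."""
    return bus_bw / n_procs


def mesh_link_count(n_procs: int) -> int:
    """Number of links in a k x k 2D mesh (n_procs must be a perfect square)."""
    k = math.isqrt(n_procs)
    if k * k != n_procs:
        raise ValueError("n_procs must be a perfect square for a k x k mesh")
    # k rows with (k - 1) horizontal links each, plus k columns with (k - 1) vertical links each.
    return 2 * k * (k - 1)


def mesh_aggregate_bandwidth(n_procs: int, link_bw: float) -> float:
    """Total link bandwidth of the mesh, assuming every link is usable simultaneously."""
    return mesh_link_count(n_procs) * link_bw


if __name__ == "__main__":
    BUS_BW = 100.0   # hypothetical bus bandwidth (arbitrary units)
    LINK_BW = 1.0    # hypothetical per-link mesh bandwidth (arbitrary units)
    for n in (4, 16, 64, 256):
        print(
            f"{n:4d} procs: bus per proc = {bus_bandwidth_per_processor(n, BUS_BW):6.2f}, "
            f"mesh aggregate = {mesh_aggregate_bandwidth(n, LINK_BW):6.1f}"
        )
```

In this model, each processor's share of the bus falls as 1/N. The mesh's link count, 2k(k − 1), approaches 2N as N grows, so aggregate bandwidth scales nearly linearly with processor count. The price, which this sketch does not capture, is that software must take the topology into account.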
Non-uniform memory access (NUMA) is a memory design used in multiprocessing in which the memory access time depends on the memory location relative to the processor.
Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors).
The benefits of NUMA are limited to particular workloads, notably on servers, where data is often strongly associated with particular tasks or users.[1]
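One way to see why NUMA benefits depend on the workload is to look at expected memory latency as a weighted average of local and remote accesses. The sketch below assumes a single local latency and a single remote latency. The 100 ns and 200 ns figures are hypothetical and chosen only for illustration. Real systems can have several distance tiers, and their latencies vary.

```python
def average_access_latency(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Expected latency when `local_fraction` of accesses hit the processor's local memory.

    Simplified two-tier model: every non-local access is assumed to cost `remote_ns`.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    LOCAL_NS, REMOTE_NS = 100.0, 200.0  # hypothetical latencies, for illustration only
    # Data strongly tied to the task running on this node: most accesses stay local.
    print(average_access_latency(0.9, LOCAL_NS, REMOTE_NS))
    # Data spread evenly across 4 nodes: only about a quarter of accesses are local.
    print(average_access_latency(0.25, LOCAL_NS, REMOTE_NS))
```

With these assumed numbers, a workload whose data sits mostly on its own node averages about 110 ns per access. A workload whose data is spread evenly over four nodes averages 175 ns, so most of NUMA's advantage is lost.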