Intel manual, model ARCHITECTURE IA-32

Manufacturer: Intel
File size: 1.79 MB
File name: 0e889a63-515d-46c3-b9e4-3c91e6e61019.pdf
Language: en



Manual summary


False sharing applies to data used by one thread that happens to reside on the same cache line as different data used by another thread. These situations can also incur a performance delay, depending on the topology of the logical processors/cores in the platform.

An example of false sharing in a multithreaded environment using processors based on Intel NetBurst microarchitecture is when thread-private data and a thread synchronization variable are located within the line size boundary (64 bytes) or sector boundary (128 bytes). When one thread modifies the synchronization variable, the "dirty" cache line must be written out to memory and updated for each physical processor sharing the bus. Subsequently, data is fetched into each target processor 128 bytes at a time, causing previously cached data to be evicted from its cache on each target processor.

False sharing can incur a performance penalty when the threads run on logical processors that reside on different physical processors. For processors that support Hyper-Threading Technology, false sharing incurs a performance penalty when two threads run on different cores, on different physical processors, or on two logical processors in the same physical processor package. In the first two cases, the penalty is due to cache evictions to maintain cache coherency. In the last case, the penalty is due to memory order machine clear conditions.

False sharing is not expected to have a performance impact with a single Intel Core Duo processor.

Multi-Core and Hyper-Threading Technology 7

When a common block of parameters is passed from a parent thread to several worker threads, it is desirable for each worker thread to create a private copy of frequently accessed data in the parameter block.
Placement of Shared Synchronization Variable

On processors based on Intel NetBurst microarchitecture, bus reads typically fetch 128 bytes into a cache, so the optimal spacing to minimize eviction of cached data is 128 bytes. To prevent false sharing, synchronization variables and system objects (such as a critical section) should be allocated to reside alone in a 128-byte region and aligned to a 128-byte boundary. Example 7-6 shows a way to minimize the bus traffic required to maintain cache coherency in MP systems. This technique is also applicable to MP systems using IA-32 processors with or without Hyper-Threading Technology.

On Pentium M, Intel Core Solo and Intel Core Duo processors, a synchronization variable should be placed alone in a separate cache line to avoid false sharing. Software must not allow a synchronization variable to span a page boundary.

User/Source Coding Rule 25. (M impact, ML generality) Place each synchronization variable alone, separated by 128 bytes or in a separate cache line.

User/Source Coding Rule 26. (H impact, L generality) Do not place any spin lock variable so that it spans a cache line boundary.

At the code level, false sharing is a special concern in the following cases:

• Global data variables and static data variables that are placed in the same cache line and are written by different threads.
• Objects allocated dynamically by different threads may share cache lines. Make sure that the variables used locally by one thread are allocated in a manner that prevents sharing the cache line with other threads.

IA-32 Intel® Architecture Optimization

Example 7-6 Placement of Synchronization and Regular Variables

    int regVar;
    int padding[32];                 /* 32 ints = 128 bytes, assuming a 4-byte int */
    int SynVar[32*NUM_SYNC_VARS];    /* one 128-byte slot per synchronization variable */
    int AnotherVar;

Another technique to enforce alignment of synchronization variables and to avoid a cache line being shared is to use compiler directives when declaring data structures.
Example 7-7 Declaring Synchronization Variables without Sharing a Cache Line

    __declspec(align(64)) unsigned __int64 sum;
    struct sync_struct {…};
    __declspec(align(64)) struct sync_struct sync_var;

Other techniques that prevent false sharing include:

• Organize variables of different types in data structures (because the layout that compilers give to data variables might be different from their placement in the source code).
• When each thread needs to use its own copy of a set of variables, declare the variables with:
  - the directive threadprivate, when using OpenMP
  - the modifier __declspec(thread), when using the Microsoft compiler
• In managed environments that provide automatic object allocation, the object allocators and garbage collectors are responsible for the layout of the objects in memory so that false sharing through two objects does not happen.
• Provide classes such that only one thread writes to each object field and to nearby object fields, in order to avoid false sharing.

One should not equate the recommendations discussed in this section with favoring a sparsely populated data layout. The data-layout recommendations should be adopted only when necessary, avoiding unnecessary bloat in the size of the working set....

