Cache
In computer science, a cache is an easily-accessed memory used to store a subset of a larger pool of data that is expensive or slow to either fetch or compute. Repeated accesses to the data can reference the cache rather than refetching or recomputing, so that the perceived average access cost or time is lower.The reason caches work at all is that many access patterns in typical computer applications have locality of reference. There are several sorts of locality, but we mainly mean that often the same data is accessed frequently or with accesses that are close together in time, or that data near to each other are accessed close together in time.
| Table of contents |
|
2 Cache hierarchy in a modern computer 3 Other caches 4 Cache management 5 Related topics 6 External links |
The term "cache" has recently become somewhat synonymous with a particular cache, the cache used by the CPU to avoid the expensive process of gather data from main memory. As the performance of modern CPU designs grew in relation to the main memory, larger and more complex caching systems have been responsible for much of the recent gains in computing performance. Generally the only difference between "standard" and "server" CPU designs (for instance, the Pentium 4 vs. Xeon) is the amount of cache included in the design.
Most CPUs include a small number registers that can be considered the fastest cache, although in most discussions registers are treated differently. The next step is a Level 1 cache (or L1, or primary cache) which is physically located on the same chip as the CPU, and can be accessed at very low cost. Memory that can "keep up" with a modern CPU is very expensive however, so Level 1 caches tend to be small.
The next step is the Level 2 cache (or L2, or second level cache), which is located on separate chips, but connected to the CPU via a private high-speed interface of some sort. L2 caches typically operate at the same speed as the CPU, or some small divider (on-half speed). "Lower speed" L2 caches are sometimes referred to as Level 3, although this terminology is not widely used.
When accessing data, circuitry inside the CPU always looks in the cache for the data it needs -- as far as it is concerned there is no memory hierarchy, the memory management unit makes it invisible. If the data can be found already loaded into the cache, a condition known as a cache hit, the CPU can continue processing quickly. However if the data is not already in the cache, a cache miss, the CPU must wait while the data is fetched from main memory.
Cache misses are very expensive, if the CPU actually stops and waits for the data to become available, this might be the equivalent of hundreds of operations that it could have completed in this time. Modern CPU designs attempt to reduce this "hit" by running other operations while it waits, and a newer technique, simultaneous multithreading (or HyperThreading in Intel's terminology), allows another program to take control of the CPU while it waits for the data to become available.
Typical computers have a hierarchy of caches. The hard disk, as well as storing local files, may be used to cache copies of files from tape drives, remote file servers, or (most commonly) web pages. Some of the main memory of the computer is generally used to cache portions of the hard drive. Portions of main memory are cached in a high-speed CPU memory cache, which itself may have one, two, or even three levels. The CPU will keep some of the data it operates on in a register file which some people view as software controlled cache.
Each of these levels of cache are smaller, faster, and more expensive per byte that the ones below. The large number of caches in the hierarchy reflects the incredible range of speeds and capacities of modern computers. A web page access may take ten seconds to load one of billions of possible pages. A disk drive access may take ten milliseconds to load a few kilobytes from the hundreds of gigabytes on the disk platters. Main memories, which are commonly made of DRAM, take 100 nanoseconds to load tens of bytes from a few hundred megabytes. Large CPU caches may take 10
nanoseconds to load ten bytes from a cache of perhaps a megabyte; small CPU caches may take 1 nanosecond to load the same ten bytes from a cache of tens of kilobytes. Finally, registers are accessed in fractions of a nanosecond, from the tens of registers in a typical register file.
The CPU caches are generally managed entirely by hardware. Other
caches are managed by a variety of software. The cache of disk
sectors in main memory is usually managed by the operating system
kernel. The BIND DNS daemon caches a mapping of domain names to IP addresses, as does a resolver library.
A cache of recently visited web pages can be managed by your Web browser. Some browsers are configured to use an external proxy web cache, a server program through which all web requests are routed so that it can cache frequently accessed pages for everyone in an organization. Many ISPs use proxy caches to save bandwidth on frequently-accessed web pages.
The search engine Google keeps a cached copy of each page it
examines on the web. These copies are used by the Google indexing
software, but they are also made available to Google users, in case
the original page is unavailable. If you click on the "Cached" link
in a Google search result, you will see the web page as it looked when Google indexed it.
The CPU register file, if thought of as a cache, is managed primarily
by the compiler, and to some extent by the operating system kernel.
There are basically three problems that must be addressed while the
cache is in use. If the strategies for addressing these problems are
simple enough, and speed crucial enough, these policies can be
implemented in hardware, otherwise they are implemented in software.
The cache is inevitably too small to hold all the data referenced.
Some data must be evicted when new data is brought in. The cache will
perform best if the data evicted is that which is least likely to be
referenced again. Heuristics which attempt to predict and conform to
the future access pattern are known as the replacement policy.
One popular replacement policy, LRU, is to replace the least recently
used datum with the incoming datum.
The data set being cached may be changed by other entities, in which
case the copy in the cache may become out-of-date or stale. Or,
the computer may attempt to update the data in the cache itself,
causing cached copies of the data in other computers (or other caches
within the same computer) to become stale. Communication protocols
between the cache managers which keep the data set consistent are
known as the coherency protocol.
If the computer updates the data in the cache, that update must
eventually be propagated to the long-term storage point of the data,
and eventually to any other users of that data. The update
propagation timing is set by the write policy:
Caching for reading access only is more common than for writing when
operating over networks, because the coherency protocol may become
exceedingly complicated if communication is not reliable.
For instance, web page caches and client-side network file system
caches are typically read-only specifically to keep the network
protocol simple and reliable.
CPU caching
Cache hierarchy in a modern computer
Other caches
Cache management
There are intermediate policies as well. The cache may be
write-through, but the writes may be held in a store data queue
temporarily, usually so that multiple stores can be processed together
(which can reduce bus turnarounds and so improve bus utilization).Related topics
External links