We implemented several cache hierarchies with some modifications and compared LLC replacement policies across a range of graph workloads: Breadth-First Search (BFS), Connected Components (CC), PageRank (PR), Single-Source Shortest Path (SSSP), and Betweenness Centrality (BC).
There are three common cache hierarchies: inclusive, exclusive, and non-inclusive. The ChampSim repository we used implements the non-inclusive hierarchy; we added exclusive, l3exclusive, and l3inclusive. l3exclusive and l3inclusive are minor modifications of exclusive and inclusive that relax their strictness while keeping their benefits: the modifications reduce the writeback overhead of the exclusive hierarchy while retaining its low L1 miss latency. Implementation details of these hierarchies follow. We referred to *Evaluation of Cache Inclusion Policies in Cache Management*, a thesis by Luna Backes Drault, while implementing the policies.
Here we describe how each implemented hierarchy differs from the non-inclusive policy.

Exclusive (a short sketch of this flow follows the list):

- **In handle_fill:** If we are at the LLC, it is bypassed and the data is sent directly to the upper-level cache. For L2C, L1I, or L1D, the victim is sent to the LLC using a writeback packet.
- **In handle_writeback:** If a writeback packet is received at L2, the victim is copied back to L3.
- **In handle_read:** On a read hit in the LLC, the data is returned to the upper-level cache and the entry is invalidated from the LLC.
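A minimal runnable sketch of this flow is below. `ToyCache`, its unbounded line set, and the one-victim-per-fill policy are illustrative stand-ins, not ChampSim's actual PACKET/BLOCK machinery:

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_set>

// Toy model of the exclusive fill/read flow above. Names and structure are
// illustrative; ChampSim's real handle_fill/handle_read operate on PACKET
// queues and per-set BLOCK arrays with far more state. Capacity pressure is
// modeled crudely by evicting one arbitrary line on every fill.
struct ToyCache {
    bool is_llc = false;
    std::unordered_set<uint64_t> lines;  // block addresses currently resident

    // Fill path: the LLC is bypassed (it never allocates on a fill); any
    // other level installs the line and hands its victim to the LLC as a
    // writeback, which is what keeps the hierarchy exclusive.
    void handle_fill(uint64_t addr, ToyCache* llc) {
        if (is_llc) return;  // bypass: data goes straight to the requester
        if (auto victim = pick_victim()) {
            lines.erase(*victim);
            llc->lines.insert(*victim);  // victim moves down into the LLC
        }
        lines.insert(addr);
    }

    // Read path: a hit in the LLC returns the line upward and invalidates
    // the local copy, so a line lives in exactly one level at a time.
    bool handle_read(uint64_t addr, ToyCache* upper) {
        if (lines.count(addr) == 0) return false;  // miss at this level
        if (is_llc) {
            lines.erase(addr);               // invalidate on LLC read hit
            upper->handle_fill(addr, this);  // hand the data to the upper level
        }
        return true;
    }

    std::optional<uint64_t> pick_victim() const {
        if (lines.empty()) return std::nullopt;
        return *lines.begin();  // arbitrary victim; real caches use a policy
    }
};

int main() {
    ToyCache l2, llc;
    llc.is_llc = true;
    l2.handle_fill(0x40, &llc);  // fill from DRAM lands in L2, not the LLC
    llc.lines.insert(0x80);      // pretend 0x80 was evicted from L2 earlier
    llc.handle_read(0x80, &l2);  // LLC hit: promote to L2 and invalidate here
    std::cout << "0x80 in L2: " << l2.lines.count(0x80)
              << ", in LLC: " << llc.lines.count(0x80) << '\n';
}
```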
l3inclusive (a sketch of the eviction paths follows the list):

- **Abstraction:** The LLC is a superset of the L1 and L2 caches. No such guarantee is provided for the relation between L1 and L2.
- **Deletion from LLC:** Data requested from DRAM is received in the LLC, replacing the victim cache line. That victim line is simultaneously deleted from the L1 and L2 caches. If the victim line is dirty in L1 or L2, a writeback packet is sent to DRAM accordingly; since the latest copy of the line resides in the highest cache level holding it, that copy is sent as the writeback packet.
- **Deletion from L1/L2:** If the cache line is clean, we simply invalidate it. If the line is dirty in either cache, we send an appropriate writeback message to the LLC.
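A minimal sketch of these two eviction paths is below; `Level` and the two `evict_*` functions are illustrative names rather than ChampSim's real interfaces:

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Toy model of the l3inclusive eviction rules above. Each level tracks its
// resident lines and a dirty bit per line.
struct Level {
    std::unordered_map<uint64_t, bool> lines;  // block address -> dirty bit

    bool dirty(uint64_t a) const {
        auto it = lines.find(a);
        return it != lines.end() && it->second;
    }
    void drop(uint64_t a) { lines.erase(a); }
};

// Evicting a victim from the LLC must also remove it from L1 and L2 to keep
// the LLC a superset (back-invalidation). If any copy is dirty, the line is
// written back to DRAM; the copy in the highest level holding it dirty is
// the newest, so that is the one conceptually sent.
void evict_from_llc(Level& l1, Level& l2, Level& llc, uint64_t victim) {
    if (l1.dirty(victim) || l2.dirty(victim) || llc.dirty(victim))
        std::cout << "writeback 0x" << std::hex << victim << " to DRAM\n";
    l1.drop(victim);  // back-invalidate the upper levels
    l2.drop(victim);
    llc.drop(victim);
}

// Evicting from L1/L2 is simpler: a clean line is silently invalidated, a
// dirty line is written back to the LLC (which, being inclusive of L1/L2,
// already holds the line and just becomes dirty).
void evict_from_upper(Level& upper, Level& llc, uint64_t victim) {
    if (upper.dirty(victim))
        llc.lines[victim] = true;  // writeback marks the LLC copy dirty
    upper.drop(victim);
}

int main() {
    Level l1, l2, llc;
    l1.lines[0x40] = true;  // dirty in L1
    l2.lines[0x40] = false;
    llc.lines[0x40] = false;
    evict_from_llc(l1, l2, llc, 0x40);  // triggers writeback + back-invalidation
    evict_from_upper(l2, llc, 0x80);    // clean/absent line: silent invalidate
}
```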
l3exclusive (identical to exclusive, except that only L2C victims are written back to the LLC on fills):

- **In handle_fill:** If we are at the LLC, it is bypassed and the data is sent directly to the upper-level cache. For L2C, the victim is sent to the LLC using a writeback packet.
- **In handle_writeback:** If a writeback packet is received at L2, the victim is copied back to L3.
- **In handle_read:** On a read hit in the LLC, the data is returned to the upper-level cache and the entry is invalidated from the LLC.
Below is a summary of the cache hierarchy with the highest IPC improvement for each trace type. This data is for the LRU replacement policy.
Trace Type | Cache Hierarchy | Percentage Improvement |
---|---|---|
BC | l3inclusive, l3exclusive | 4-6 |
BFS | l3inclusive | 2-4 |
CC | l3exclusive | 6-8 |
PR | l3exclusive | 10-12 |
SSSP | l3exclusive | 4-6 |
Here are graphs of the percentage improvement of these cache hierarchies over the non-inclusive hierarchy, for the LRU replacement policy.
Besides srrip, ship, drrip, and lru, which were already present in the ChampSim repository, we also added the hawkeye replacement policy (reference).
In addition, we implemented LFU and FIFO, as well as lfru, a hybrid of LFU and LRU. We maintain both an LFU score and an LRU score (0 being least recent), then take a weighted combination of the two to pick the victim. This keeps frequently used addresses from being evicted, while still giving weight to lines that were accessed very recently and therefore have a low frequency count. A sketch of the victim selection follows.
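Here is a minimal sketch of that victim selection, assuming a per-way access counter for the LFU score and a recency stack position for the LRU score; the weight `alpha` and the data layout are illustrative, not the exact implementation:

```cpp
#include <cstdint>
#include <iostream>
#include <limits>
#include <vector>

// One candidate line in a set: how often and how recently it was used.
struct Way {
    uint64_t freq = 0;     // LFU score: number of accesses
    uint64_t recency = 0;  // LRU score: 0 = least recently used
};

// Weighted LFU+LRU victim selection: the line minimizing the combined score
// is evicted. alpha is an illustrative tuning knob; higher alpha protects
// frequently used lines, alpha = 0 degenerates to plain LRU.
uint32_t find_victim(const std::vector<Way>& set, double alpha = 0.5) {
    uint32_t victim = 0;
    double best = std::numeric_limits<double>::max();
    for (uint32_t w = 0; w < set.size(); ++w) {
        double score = alpha * set[w].freq + (1.0 - alpha) * set[w].recency;
        if (score < best) { best = score; victim = w; }
    }
    return victim;
}

// On a hit or fill: bump the frequency and move the line to most-recent,
// shifting down every line that was more recent than it.
void update(std::vector<Way>& set, uint32_t hit_way) {
    uint64_t old = set[hit_way].recency;
    for (auto& w : set)
        if (w.recency > old) --w.recency;
    set[hit_way].recency = set.size() - 1;
    ++set[hit_way].freq;
}

int main() {
    std::vector<Way> set(4);
    for (uint32_t i = 0; i < 4; ++i) { set[i].recency = i; set[i].freq = 1; }
    update(set, 0);  // way 0 becomes most recent and most frequent
    std::cout << "victim way: " << find_victim(set) << '\n';  // evicts way 1
}
```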
We compared the different cache replacement policies across traces and cache hierarchies. The IPC for BFS traces does not vary significantly across replacement policies, and it is only slightly affected by the cache hierarchy.
l3inclusive does not show significant performance deviation when changing replacement policies.
The IPC of l3exclusive varies considerably with the replacement policy. Here is the order:
Hawkeye
Here is a graph of this comparison.
More graphs for other traces are present in the file plot.ipynb
We ran simulations for various LLC sizes with varying cache associativity on a BFS trace. For these simulations, we used the graph from this website to obtain latency values for the different cache sizes.
Here is the graph for the comparison.
We can see that the effect of the number of ways is negligible, which indicates that conflict misses are not a large fraction of misses. The size of the LLC, however, had a significant effect on IPC.
When the size was increased from 2 MB to 8 MB, the number of hits did not increase much but the access latency did. As a result, the average miss latency seen by L1 increased and IPC decreased.
On the other hand, a 32 MB cache decreased the average miss latency significantly: the number of LLC misses dropped roughly 8x. This decrease in misses more than offset the increase in access latency and led to an overall better IPC.
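A simplified way to see this tradeoff (ignoring memory-level parallelism and overlap effects) is through the average latency of an LLC access:

$$\text{AMAT}_{\text{LLC}} = t_{\text{LLC}} + m_{\text{LLC}} \times t_{\text{DRAM}}$$

where $t_{\text{LLC}}$ is the LLC access latency (which grows with cache size), $m_{\text{LLC}}$ is the LLC miss rate, and $t_{\text{DRAM}}$ is the DRAM access latency. Going from 2 MB to 8 MB raises $t_{\text{LLC}}$ while $m_{\text{LLC}}$ barely falls, so AMAT goes up; at 32 MB the roughly 8x drop in misses shrinks the $m_{\text{LLC}} \times t_{\text{DRAM}}$ term enough to more than offset the larger $t_{\text{LLC}}$, and AMAT goes down.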
Roll Number | Name |
---|---|
210050026 | Atishay Jain |
210050038 | Chaitanya Aggarwal |
210050085 | Khushang Singla |