ACM Symp. on Applied Computing,
pgs. 913--917,
2002
This paper presents a new ILP processor architecture called Compressed VLIW (CVLIW). The CVLIW processor constructs a sequence of long instructions by removing nearly all NOPs (No OPerations) and LNOPs (Long NOPs) from VLIW code. The CVLIW processor individually schedules each instruction within long instructions using functional unit and dynamic scheduler pairs. Every dynamic scheduler in the CVLIW processor individually checks for data dependencies and resource collisions while scheduling each instruction. In this paper, we simulate the architecture and show that the CVLIW processor performs better than the VLIW processor for a wide range of cache sizes and across various numerical benchmark applications. These performance gains of the CVLIW processor result from individual instruction scheduling and size reduction of object code. Even though we assume a cache with a zero miss rate, the CVLIW's performance is still 9%~15% higher than that of the VLIW processor regardless of benchmark applications.