Jiang Lin
Ph.D. Student in Computer Engineering
Iowa State University
34 Schilletter Village B
Ames, IA 50010
Office: Durham 372
Phone: (515) 294-8936
E-mail: linj@iastate.edu

I am a Ph.D. student in the Department of Electrical and Computer Engineering at Iowa State University.
My adviser is Prof. Zhao Zhang.

I will join the IBM Austin Research Lab in August 2008 as a postdoctoral researcher.

Research Interests (research statement)

Computer System Architecture and Its Interaction with Operating Systems

Education

Ph.D. (expected), August 2008
Department of Electrical and Computer Engineering
Iowa State University
Ames, IA
B.S., June 1998
College of Computer Science and Technology
Huazhong University of Science & Technology
Wuhan, China
Publications
MICRO'08 "Mini-Rank: Adaptive DRAM Architecture for Improving Memory Power Efficiency", [Paper]
Hongzhong Zheng, Jiang Lin, Zhao Zhang, Eugene Gorbatov, Howard David and Zhichun Zhu
2008 International Symposium on Microarchitecture,
Lake Como, Italy, November 2008.
ICPP'08 "Memory Access Scheduling Schemes for Systems with Multi-Core Processors", [Paper]
Hongzhong Zheng, Jiang Lin, Zhao Zhang, and Zhichun Zhu
2008 International Conference on Parallel Processing,
Portland, Oregon, September 2008.
SIGMETRICS'08 "Software Thermal Management of DRAM Memory for Multicore Systems", [Paper] [Talk]
Jiang Lin, Hongzhong Zheng, Zhichun Zhu, Eugene Gorbatov, Howard David and Zhao Zhang
2008 International Conference on Measurement and Modeling of Computer Systems,
Annapolis, Maryland, June 2008.
HPCA'08 "Gaining Insights into Multi-Core Cache Partitioning: Bridging the Gap between Simulation and Real Systems", [Paper] [Talk]
Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang, and P. Sadayappan,
14th International Symposium on High-Performance Computer Architecture,
Salt Lake City, UT, February 2008.
ISCA'07 "Thermal Modeling and Management of DRAM Memory Systems", [Paper] [Talk]
Jiang Lin, Hongzhong Zheng, Zhichun Zhu, Howard David, and Zhao Zhang,
34th International Symposium on Computer Architecture,
San Diego, CA, June 2007.
ISPASS'07 "DRAM-Level Prefetching for Fully-Buffered DIMM: Design, Performance and Power Saving", [Paper] [Talk]
Jiang Lin, Hongzhong Zheng, Zhao Zhang, Zhichun Zhu, and Howard David,
2007 International Symposium on Performance Analysis of Systems and Software,
San Jose, CA, April 2007.
MASCOTS'05 "Towards Pairing Java Applications on SMT Processors", [Paper]
Wei Huang, Jiang Lin, Zhao Zhang, and J. Morris Chang,
2005 International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems,
Atlanta, GA, October 2005.
ISPASS'05 "Performance Characterization of Java Applications on SMT Processors", [Paper]
Wei Huang, Jiang Lin, Zhao Zhang, and J. Morris Chang,
2005 International Symposium on Performance Analysis of Systems and Software,
Austin, TX, March 2005.
Research Projects
Leveraging Page Coloring for Managing Multicore Shared Cache in OS
(with The Ohio State University)

Page recolor procedure

Cache partitioning and sharing are critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation, which often has several limitations, such as excessive simulation time, the absence of OS activity, and proneness to simulation inaccuracy. To address these issues, we have taken an efficient software approach that supports both static and dynamic cache partitioning in the OS through memory address mapping.
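The address-mapping idea can be illustrated with a minimal Python sketch of page coloring. It is not the OS implementation from this project: the cache geometry, the names `page_color` and `pick_frame`, and the first-fit frame search are all assumptions chosen for illustration.

```python
# Illustrative parameters only (assumptions, not the evaluated machines' configuration).
PAGE_SIZE = 4096                  # bytes per physical page
CACHE_SIZE = 4 * 1024 * 1024      # shared cache capacity in bytes
ASSOCIATIVITY = 16                # ways per cache set

# Physical pages with the same color map to the same group of cache sets.
NUM_COLORS = CACHE_SIZE // (ASSOCIATIVITY * PAGE_SIZE)


def page_color(phys_addr):
    """Return the cache color of the physical page that contains phys_addr."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS


def pick_frame(free_frames, allowed_colors):
    """Return the first free frame number whose color is allowed, or None.

    Restricting a process to a subset of colors confines its data to the
    matching slice of the shared cache (hypothetical first-fit policy).
    """
    for frame in free_frames:
        if frame % NUM_COLORS in allowed_colors:
            return frame
    return None
```

Under these assumed parameters there are 64 colors, so a static two-way partition could, for example, give one program colors 0 to 31 and the other colors 32 to 63. Dynamic partitioning would additionally require moving pages to frames of a different color when the allocation changes.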

In our HPCA'08 paper, we comprehensively evaluated several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS). Our software approach makes it possible to run the SPEC CPU2006 benchmark suite to completion. Besides confirming important conclusions from previous work, we gain several insights from whole-program executions that are infeasible to obtain from simulation. For example, for certain workloads, giving up some cache space in one program to help another may improve the performance of both programs because it reduces contention for memory bandwidth. Our evaluation of previously proposed fairness metrics also differs significantly from that of a simulation-based study.

Ongoing and future work is planned along several directions. First, we will refine our system implementation to further reduce the overhead of dynamic cache partitioning. Second, we plan to make our software layer available to the architecture community by adding an easy-to-use interface. Third, our software gives us the ability to control data locations in the shared cache. With a well-defined cache partitioning interface, we are conducting cache partitioning research at the compiler level for both multiprogrammed and multithreaded applications.

Power and Thermal Modeling and Management of DRAM Systems
(with University of Illinois at Chicago and Intel Corporation)

Thermal Modeling of Fully Buffered DIMM (FB-DIMM)



Thermal Zone

With increasing speed and power density, high-performance memories, including FB-DIMM (Fully Buffered DIMM) and DDR2 DRAM, are beginning to require dynamic thermal management (DTM), as processors and hard drives already do. DTM for memory is different, however, in that it should take processor performance and power consumption into consideration. Existing schemes have ignored this.

In our ISCA'07 paper, we investigate a new approach that controls memory thermal issues at the source of memory activity: the processor. Compared with shutting down memory abruptly, it smooths program execution and therefore improves overall system performance and power efficiency. For multicore systems, we propose two schemes called adaptive core gating and coordinated DVFS. The first scheme activates clock gating on selected processor cores, and the second scales down the frequency and voltage levels of processor cores when the memory is about to overheat. Both successfully control memory activity and handle thermal emergencies. More importantly, they improve performance significantly under the given thermal envelope. Our simulation results show that adaptive core gating improves performance by up to 23.3% (16.3% on average) on a four-core system with FB-DIMM when compared with DRAM thermal shutdown, and coordinated DVFS with control-theoretic methods improves performance by up to 18.5% (8.3% on average).
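As a rough illustration of the core-gating idea, the Python sketch below maps a memory temperature reading to the number of cores left running. It is only a sketch in the spirit of adaptive core gating: the function name `active_cores`, the two temperature thresholds, and the linear mapping between them are assumptions made here and are not taken from the paper.

```python
def active_cores(temp_c, total_cores, safe_c=100.0, critical_c=110.0):
    """Return how many cores stay ungated for a given memory temperature.

    Hypothetical policy: below safe_c every core runs; at or above
    critical_c only one core runs; in between, the count shrinks linearly.
    """
    if temp_c < safe_c:
        return total_cores
    if temp_c >= critical_c:
        return 1
    # Fraction of the safe-to-critical range that is still available.
    fraction_left = (critical_c - temp_c) / (critical_c - safe_c)
    return max(1, round(fraction_left * total_cores))
```

Gating cores lowers the rate of memory requests gradually, which is the contrast with abrupt DRAM thermal shutdown described above. A coordinated DVFS variant would return a frequency and voltage level instead of a core count.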

In our SIGMETRICS'08 paper, we implemented and improved the two proposed DTM schemes, adaptive core gating (DTM-ACG) and coordinated DVFS (DTM-CDVFS), on two multicore systems running Linux: a Dell PowerEdge 1950 server and an Intel SR1500AL server testbed. We carried out comprehensive experiments and detailed analysis of performance, memory temperature, and system power and energy consumption. The results first confirm that a system-level memory DTM policy may significantly improve system performance and power efficiency. DTM-ACG shows performance improvements comparable to those reported in simulation: the average improvements are 11.7% and 6.7% on the PowerEdge 1950 and the SR1500AL, respectively. We also have surprising findings that reveal a weakness of the previous study: CPU heat dissipation and its impact on DRAM, which were ignored, are significant factors. We observed that DTM-CDVFS performs much better than reported in the simulation-based study, with average improvements of 9.7% and 13.2% on the two machines, respectively. It also significantly reduces processor power by 15.5% and energy by 22.7% on average.

DRAM-Level Prefetching for Fully-Buffered DIMM
(with University of Illinois at Chicago and Intel Corporation)

Structure of Fully Buffered DIMM (FB-DIMM)

In our ISPASS'07 paper, we studied DRAM-level prefetching for the fully buffered DIMM (FB-DIMM) designed for multi-core processors. FB-DIMM has a unique two-level interconnect structure: FB-DIMM channels at the first level connect the memory controller and the Advanced Memory Buffers (AMBs), and DDR2 buses at the second level connect the AMBs with the DRAM chips. We propose an AMB prefetching method that prefetches memory blocks from the DRAM chips to the AMBs. It uses the redundant bandwidth between the DRAM chips and the AMBs but does not consume the crucial channel bandwidth. The proposed method fetches K memory blocks, each the size of an L2 cache block, around the demanded block, where K is a small value ranging from two to eight. The method may also reduce DRAM power consumption by merging some DRAM precharges and activations. Our cycle-accurate simulation shows an average performance improvement of 16% for single-core and multi-core workloads constructed from memory-intensive SPEC2000 programs with software cache prefetching enabled, and no workload suffers a slowdown. We found that the performance gain comes from the reduction of idle memory latency and the improvement of channel bandwidth utilization. We also found that there is only a small overlap between the performance gains from AMB prefetching and those from software cache prefetching. The estimated power saving averages 15%.
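The "K blocks around the demanded block" selection can be sketched in a few lines of Python. This is an illustration under assumptions rather than the evaluated design: it assumes the K blocks form a K-block-aligned region that contains the demanded block, uses a 64-byte block size, and the name `prefetch_region` is hypothetical.

```python
def prefetch_region(demand_addr, k=4, block_size=64):
    """Return the addresses of the K aligned blocks around demand_addr.

    Assumption: the region is aligned to k * block_size bytes and includes
    the demanded block itself; the other blocks are prefetch candidates
    that would be held in the AMB.
    """
    region_size = k * block_size
    region_base = (demand_addr // region_size) * region_size
    return [region_base + i * block_size for i in range(k)]


def demanded_block(demand_addr, block_size=64):
    """Return the block-aligned address of the demanded block."""
    return (demand_addr // block_size) * block_size
```

For example, with K = 4 and 64-byte blocks, a demand access to address 4168 falls in block 4160, and the region covers blocks 4096, 4160, 4224, and 4288. A later access that hits one of the other three blocks could then be served from the AMB without another DRAM access.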

Performance Evaluation and Optimization of Java Applications on SMT Processors
(with CSL at Iowa State University)

Job Pairing

As Java emerges as one of the major programming languages in software development, studying how Java applications behave on recent SMT processors is of great interest.

In our ISPASS'05 paper, we characterize the performance of Java applications on an Intel Pentium 4 processor with Hyper-Threading. Using the hardware performance counters, we quantitatively evaluate micro-architecture metrics while running various types of Java applications.

In our MASCOTS'05 paper, we further investigate various issues in pairing Java applications for multi-threaded execution on an Intel Pentium 4 processor. We propose a statistical model to analyze the collected data. This novel approach reveals that the trace cache is the major factor determining pairing performance.