Webpage of Girish
(Updated on May 1, 2011)
B (P.O. Box: 116200,
339), Larsen Hall
Tel (O): (352) 392-8705
Email: GIRISHVS AT UFL DOT EDU (all lower case)
I am a PhD graduate from the Department of Electrical and Computer Engineering, University of Florida, Gainesville. I am an alumnus of the Advanced Computing and Information Systems Lab, where I was advised by Dr. Renato Figueiredo.
I completed my Bachelor of Engineering at PSG College of Technology (2003) and my Master of Science at the University of Florida.
I am currently a member of the Hybrid Computing Group at Intel Labs, Santa Clara, California.
During my PhD, I conducted research in computer architecture, virtualization, and full-system modeling and simulation. I have also worked on designing fault-tolerant nanoscale memories. Below is a list of the significant projects I have worked on (unless mentioned otherwise, I am the sole architect and developer on these projects).
A simulation framework for the analysis of TLB behavior
To understand the performance impact of the TLB and to evaluate the gains from various architectural modifications, a full-system framework is needed that models the x86 ISA at the micro-operation level and supports customizable tagged TLB models. I developed such a simulation framework using Simics and FeS2, adding full-fledged tagged TLB models on top of it. This is the first academic simulator capable of simulating consolidated virtualized workloads in a full-system framework.
This simulation framework was the focus of an interview by Jakob Engblom of Wind River Simics; the interview is available both online and as a PDF.
TMT - A Tag Management Framework for Translation Lookaside Buffers
The Tag Manager Table (TMT) is a low-latency, software-transparent hardware solution for generating and managing tags for hardware-managed TLBs (as on x86), with the goal of improving the performance of virtualized workloads. Unlike VM-specific tags (ASIDs), the TMT tags TLB entries with process-specific identifiers, and therefore avoids TLB flushes during both inter-VM and intra-VM context switches. Simulation results show that TMT-generated tags reduce the number of TLB flushes by about 90% compared to an untagged TLB. For 8-way set-associative TLBs, this reduces TLB miss rates by 65% to 90% and improves the IPC of virtualized workloads by 4.5% to 25%.
In addition, the TMT can be used seamlessly with non-virtualized workloads, where it eliminates 80% to 90% of the TLB-induced performance slowdown. The TMT can also serve as a generic tagging framework, for example to enable shared last-level TLBs or to tag I/O TLBs.
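The reason tags avoid flushes can be seen in a toy model. The following is a minimal Python sketch, not the TMT hardware: the class and function names, the LRU replacement, and the use of a plain integer as the per-process tag are all assumptions for illustration, and the miss counts it prints are for this synthetic access pattern only, not the published results. With `tagged=False`, every context switch flushes the TLB; with `tagged=True`, entries belonging to other processes survive the switch.

```python
from collections import OrderedDict


class SketchTLB:
    """Hypothetical set-associative TLB with optional per-process tags (LRU replacement)."""

    def __init__(self, num_sets=16, ways=8, tagged=True):
        self.num_sets = num_sets
        self.ways = ways
        self.tagged = tagged
        # One OrderedDict per set; iteration order is LRU order (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.current_tag = 0
        self.hits = self.misses = self.flushes = 0

    def context_switch(self, process_tag):
        """Switch address spaces; an untagged TLB must flush on every switch."""
        self.current_tag = process_tag
        if not self.tagged:
            for s in self.sets:
                s.clear()
            self.flushes += 1

    def lookup(self, vpn):
        """Return True on a hit; on a miss, install the entry (page walk not modeled)."""
        s = self.sets[vpn % self.num_sets]
        key = (self.current_tag if self.tagged else None, vpn)
        if key in s:
            s.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used way
        s[key] = True
        return False


def run(tagged, rounds=10, pages=32):
    """Two processes alternate, each touching the same range of virtual pages."""
    tlb = SketchTLB(tagged=tagged)
    for _ in range(rounds):
        for proc in (1, 2):
            tlb.context_switch(proc)
            for vpn in range(pages):
                tlb.lookup(vpn)
    return tlb


tagged, untagged = run(True), run(False)
print(tagged.misses, tagged.flushes)      # 64 0
print(untagged.misses, untagged.flushes)  # 640 20
```

In the tagged case only the cold misses remain, because both processes' working sets fit in the 8 ways of each set; the untagged TLB refills from scratch after each of its 20 flushes.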
CShare TLB - Controlled Sharing of Tagged Translation Lookaside Buffers
With the emergence of TLB tags, which allow the TLB to cache entries from multiple address spaces at once, the TLB has become a performance-critical shared platform resource. The CShare TLB architecture uses a TLB Sharing Table (TST) together with the TMT to isolate the TLB behavior of virtualized workloads from one another. By controlling how much of the shared TLB space each VM may use, the CShare TLB can also increase the overall IPC of consolidated workloads or selectively improve the IPC of one component workload. Such TLB usage restrictions improve the performance of a high-priority workload by a further 1.4X over using the TMT without controlled sharing. The CShare TLB supports both static and dynamic usage control policies.
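One way to picture controlled sharing is a per-VM cap on the number of entries each VM may hold in the shared TLB. The Python sketch below assumes a static policy and a fully associative TLB with LRU replacement; the `QuotaTLB` class, the `quota` dictionary, and the rule that a VM at its cap evicts its own LRU entry are illustrative assumptions, not the published TST design.

```python
from collections import OrderedDict


class QuotaTLB:
    """Hypothetical shared, fully associative TLB with static per-VM entry quotas."""

    def __init__(self, capacity, quota):
        self.capacity = capacity
        self.quota = quota              # VM id -> max entries that VM may hold (assumed >= 1)
        self.entries = OrderedDict()    # (vm, vpn) -> True, in LRU order (oldest first)
        self.usage = {vm: 0 for vm in quota}

    def _evict_lru(self, vm=None):
        """Evict the least recently used entry, restricted to `vm` if given."""
        for key in self.entries:
            if vm is None or key[0] == vm:
                del self.entries[key]   # safe: iteration stops right after the delete
                self.usage[key[0]] -= 1
                return

    def access(self, vm, vpn):
        """Return True on a hit; on a miss, install the entry subject to the VM's quota."""
        key = (vm, vpn)
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        if self.usage[vm] >= self.quota[vm]:
            self._evict_lru(vm)         # VM at its cap: replace one of its own entries
        elif len(self.entries) >= self.capacity:
            self._evict_lru()           # TLB full: evict the global LRU entry
        self.entries[key] = True
        self.usage[vm] += 1
        return False


# A high-priority VM reuses 48 pages; a low-priority VM streams through 100 pages.
tlb = QuotaTLB(capacity=64, quota={"hi": 64, "lo": 16})
for _ in range(5):
    for vpn in range(48):
        tlb.access("hi", vpn)
    for vpn in range(100):
        tlb.access("lo", vpn)
print(tlb.usage)  # {'hi': 48, 'lo': 16}
```

With the cap in place, the streaming VM only recycles its own 16 entries, so the high-priority VM's 48 entries stay resident. With `quota={"hi": 64, "lo": 64}` instead, this access pattern would let the streaming VM evict the high-priority VM's entries every round.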
I am part of the team working on the NSF ARCHER project. I am responsible for customizing the Archer VM disk by populating it with computer architecture tools such as SESC, PTLSim, FeS2, Pin and Simics, and for producing manuals on using these tools on the Archer infrastructure.
Crossbar nanoscale memories, consisting of nonvolatile memory elements at the crosspoints of a grid of nanowires, are touted as the next paradigm in high-density memory design. When such memories are formed by methods like stochastic selection, their nanowire address space is far larger than the log2(N)-bit address space needed to address N wires. Hence, a mapper is needed that translates log2(N)-bit addresses into the actual nanowire addresses and interfaces the memory to the microscale circuitry. NanoMAP is a mapper that takes a hierarchical approach: the nth stage maps from the log2(N)-bit space to the (n-1)th stage, and the first stage maps from the second stage to the nanoscale memory. With a good choice of the number of stages, the size of the total interface module can be brought down to 3% to 5% of the size of a one-stage microscale mapper.
Moreover, in addition to the usual constraints of area, power and speed, reliability has emerged as an important limitation in these memories. NanoMAP therefore also includes an analytical model for the reliability of a nanoscale memory. The model considers unclustered hard defects, with ECC plus reconfiguration using spares as the fault-tolerance mechanism, and it is used to study how reliability varies with redundancy allocation.
Reconfigurable Computing Cluster
I developed a communication mechanism among RC boards in a cluster using MPI and architected a prototype of such a cluster. I adapted parallel cluster benchmarks and developed performance metrics for measuring RC cluster systems.
Other Class Projects
A selection of the projects I have done for my coursework:
- Study of the Cache Access Patterns for SPEC2000 Benchmarks
- DANG: A correlating branch predictor/branch target buffer
- Implementation and comparison of routing protocols in Mobile Ad hoc Networks
- POpALU – A Power ALU for embedded controllers
.; Figueiredo, R.J., "A Nanoscale Memory Interface Scheme based on Hierarchical Memory Mapping," Sixth IEEE Conference on Nanotechnology (Nano '06), June 2006
Venkatasubramanian, G.; Boykin, P.O.; Figueiredo, R.J., "Design of high-yield defect-tolerant self-assembled nanoscale memories," IEEE International Symposium on Nanoscale Architectures (Nanoarch '07), Oct. 2007 (Acceptance Rate: ~33%)
Venkatasubramanian, G.; Figueiredo, R.J.; Illikkal, R.; Newell, D., "A Simulation Analysis of Shared TLBs with Tag Based Partitioning in Multicore Virtualized Environments," Workshop on Managed Multi-Core Systems (MMCS '09) (in conjunction with ASPLOS 2009), Mar. 2009
Venkatasubramanian, G.; Figueiredo, R.J.; Illikkal, R.; Newell, D., "TMT - A TLB Tag Management Framework for Virtualized Platforms," 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2009), Oct. 2009 (Acceptance Rate: 35%)
Wolinsky, D.; Liu, Y.; St Juste, P.; Venkatasubramanian, G.; Figueiredo, R.J., "On the Design of Scalable, Self-Configuring Virtual Networks," The 21st International Conference for High Performance Computing, Networking, Storage and Analysis (SC '09), Nov. 2009 (Acceptance Rate: 22%)
Venkatasubramanian, G.; Figueiredo, R.J.; Illikkal, R.; Newell, D., "A Simulation Framework for the Analysis of TLB Behavior in Virtualized Environments," 18th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2010), Aug. 2010 (Acceptance Rate: 16%)
Venkatasubramanian, G.; Wolinsky, D.; Figueiredo, R.J., "Towards Collaborative Research and Education in Computer Architecture with the Archer System," 18th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2010), Aug. 2010
Venkatasubramanian, G.; Figueiredo, R.J.; Illikkal, R.; Newell, D., "TMT - A TLB Tag Management Framework for Virtualized Platforms," (Invited) International Journal of Parallel Programming (IJPP)
Venkatasubramanian, G.; Figueiredo, R.J.; Illikkal, R., "On the Performance of Tagged Translation Lookaside Buffers: A Simulation-Driven Analysis," 19th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2011), July 2011 (Selected as one of the top 10 papers in the conference)
Venkatasubramanian, G., "Tag management architecture and policies for hardware-managed translation lookaside buffers in virtualized platforms," Ph.D. Dissertation, University of Florida, 2011 (published by ProQuest)
Graduate Intern at Systems Technology Lab (CTG), Intel Corporation, Hillsboro, Oregon (August to November 2007)
Worked on the Jungle Project. Jungle is a lower-cost, lower-power solution for large-scale systems that scalably integrates CPU, memory and I/O into domains. It incorporates JungleVine, a protocol for managing coherence within a domain and supporting inter-domain communication.
Developed JV, a cycle-accurate performance model (in C++) of the cache-coherent NUMA (CC-NUMA) memory subsystem, including multilevel caches, for Jungle.
Integrated JungleVine with SoftSDV, an Intel proprietary full-system simulator.
Selected workloads and adapted them to run on SoftSDV.
Quantified performance advantages of Jungle for these representative workloads.
Member of Technical Staff Intern at Performance Group, VMware Corporation, Palo Alto, California (September to November 2009)
Developed NetPAn, a tool for network performance testing and analysis
NetPAn can generate reproducible traffic patterns, such as rate-controlled traffic, which was not possible with netperf.
Developed a front end that simplifies setting up complex test profiles (multiple tests between multiple nodes, started in a synchronized or staggered fashion).
NetPAn also automates log retrieval, analysis and visualization of results (e.g., plots of latency and jitter) for easier interpretation.
Evaluated latency, throughput and jitter under various UDP traffic patterns and message sizes, and studied how these metrics vary as the number of VMs is scaled up.
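Rate-controlled traffic amounts to scheduling each send at a fixed interval derived from the target rate, and jitter can be summarized from measured per-packet transit times. The Python sketch below is not NetPAn's code: the function names are hypothetical, and it uses the RFC 3550 interarrival-jitter estimator as one common definition, since the metric NetPAn itself uses is not specified here.

```python
def send_schedule(rate_bps, msg_bytes, count, start=0.0):
    """Send timestamps (s) for constant-bit-rate traffic: one message every msg_bits/rate seconds."""
    interval = msg_bytes * 8 / rate_bps
    return [start + i * interval for i in range(count)]


def interarrival_jitter(transit_times):
    """RFC 3550-style running jitter estimate from per-packet transit times.

    For consecutive packets, D is the difference of their transit times,
    and the estimate is updated as J += (|D| - J) / 16.
    """
    jitter = 0.0
    for prev, cur in zip(transit_times, transit_times[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0
    return jitter


# 1000-byte UDP messages at 8 Mbit/s give one send every 1 ms.
print(send_schedule(8_000_000, 1000, 4))
print(interarrival_jitter([10.0, 10.0, 10.0]))  # 0.0 (constant transit time, no jitter)
print(interarrival_jitter([10.0, 12.0, 10.0, 12.0]))
```

Using transit-time differences means a constant clock offset between sender and receiver cancels out, which is why this estimator is popular for one-way measurements.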
Graduate Intern at Intel Labs (IPR/SPA), Intel Corporation, Hillsboro, Oregon (September to November 2010)
Worked on simulating driverless interfacing of Virtual Memory Aware (VMA) accelerators and IP blocks for SoC platforms.
In VMA accelerators, to avoid the user-kernel copy overhead of the device-driver model, the user-space buffer containing the data is offloaded directly to the accelerator. This requires an MMU on the device side (the IPMMU) to translate the offloaded virtual addresses to physical addresses. Modeled this IPMMU in a Core i7 Tylersburg platform running Linux 2.6.28 using the Simics full-system simulator. Also modeled the IPMMU TLB and kept it synchronized in hardware with the core TLB.
Modeled an accelerator with fine-grained offload functionality (sum of products, RGBA-to-grayscale pixel conversion) as a PCI device. Wrote a device driver to interface this accelerator to the simulated platform (for comparison with VMA mode). Made the accelerator VMA-capable by offloading the user-space buffer, together with the application's context ID, directly to the accelerator. Modified the PCI transaction definition to carry this context ID as part of the PCI memory transaction. The IPMMU uses this context ID to obtain the user application's context (CR3) and perform the TLB lookup/page walk.
Wrote an IPMMU page fault handler to handle page fault exceptions arising from the IPMMU page walk and integrated this handler with the Linux 2.6.28 kernel.
Developed a very thin API library for user applications to offload tasks to the VMA accelerators. Tested the hardware models and software stack with a test application that converts the 512x512, 32 bpp Lena image to grayscale. The run encountered 256 IPMMU page faults (due to lazy allocation of the 1 MB destination user-space buffer), all of which were handled successfully.
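The 256-fault figure follows from the buffer size: 512 × 512 pixels × 4 bytes (32 bpp) = 1 MiB, which spans 256 pages of 4 KiB. The Python sketch below walks through the translation path described above under simplifying assumptions: 4 KiB pages, a per-context dictionary standing in for the page tables reached via the application's CR3, an unbounded IPMMU TLB without core-TLB synchronization, and a fault handler that simply allocates a frame. All names and the buffer address are hypothetical; this is not the Simics model.

```python
PAGE_SIZE = 4096  # assumed 4 KiB x86 pages


class SketchIPMMU:
    """Hypothetical IPMMU model: per-context page tables, a TLB, and lazy allocation."""

    def __init__(self):
        self.page_tables = {}  # context id -> {vpn: pfn}; stands in for tables rooted at CR3
        self.tlb = {}          # (context id, vpn) -> pfn
        self.next_pfn = 0
        self.page_faults = 0

    def handle_fault(self, ctx, vpn):
        """Stand-in for the OS fault handler: allocate a frame on first touch."""
        self.page_faults += 1
        self.page_tables[ctx][vpn] = self.next_pfn
        self.next_pfn += 1

    def translate(self, ctx, vaddr):
        """Translate a virtual address for the given context id to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        key = (ctx, vpn)
        if key not in self.tlb:                     # IPMMU TLB miss: do a page walk
            table = self.page_tables.setdefault(ctx, {})
            if vpn not in table:                    # no mapping yet: page fault
                self.handle_fault(ctx, vpn)
            self.tlb[key] = table[vpn]
        return self.tlb[key] * PAGE_SIZE + offset


WIDTH = HEIGHT = 512
BYTES_PER_PIXEL = 4           # 32 bpp
dst_base = 0x7F0000000000     # hypothetical page-aligned user-space buffer address
mmu = SketchIPMMU()
for i in range(WIDTH * HEIGHT):
    mmu.translate(ctx=1, vaddr=dst_base + i * BYTES_PER_PIXEL)
print(mmu.page_faults)  # 256 = (512 * 512 * 4) / 4096
```

Only the first write to each page faults; every later write to the same page hits in the IPMMU TLB.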
I have a 4.0 GPA in the graduate-level courses I have taken, which include:
- Computer Architecture, Parallel Computer Architecture, Virtual Computers
- Fault Tolerant Computer Architecture, Nanocomputing
- VLSI Circuits and Technology, Advanced VLSI Design, Embedded Systems Design
Awards and Honors
- “Student Travel Grant for 2009” awarded by the Student Government, University of Florida, Gainesville for the 21st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2009)
International Student” awarded by the University of Florida International Center, University of Florida, Gainesville - 2006 (1 of 63 students selected from all international students at the University of Florida)
- “Certificate of Achievement for Outstanding Academic Performance” awarded by the University of Florida International Center, University of Florida, Gainesville - 2004, 2005, 2008 and 2011
- “Dean’s Letter of Commendation for Outstanding Academic Performance” awarded by PSG College of Technology, India - July 2001 and
Outgoing Student” awarded by GRG Matriculation and Higher
School, Coimbatore, India