I am a PhD student in the Department of Electrical and Computer Engineering, University of Florida, Gainesville. I work in the Advanced Computing and Information System Lab. My advisor is Dr. Renato Figueiredo. I completed my Bachelor of Engineering from P.S.G College of Technology, Coimbatore, India (2003) and my Master of Science from
Resume: PDF format
My research activities are in the areas of virtualization, architectural modeling, simulation and system design. I have also worked on designing fault-tolerant nanoscale memories. Here is a list of the significant projects I have worked on or am currently working on. (Unless mentioned otherwise, I am the sole architect and developer on these projects.)
TMT - A Tag Management Framework for TLBs
The Tag Manager Table (TMT) is a low-latency hardware solution for generating and managing tags for the x86 Translation Lookaside Buffer (TLB), aimed at improving performance on virtualized platforms. Unlike domain-specific tags (ASIDs), the TMT uses a function of the CR3 value to tag entries in the TLB, thereby avoiding TLB flushes during both inter-VM and intra-VM context switches. Simulation-based results show that CR3-based tags can reduce the number of flushes by about 90% compared to no tags and by about 75% compared to domain-specific tags. For 8-way TLBs, the average reduction across a variety of workloads is 35.82% for a 512-entry DTLB and 52.73% for a 256-entry ITLB. Moreover, this reduction is achieved using a 3-bit tag and an 8-entry TMT. Currently, a timing model for the TLB and the TMT is being developed to quantify the benefits of the TMT in terms of performance-based metrics. The impact of tagging on workloads with varying memory footprints and flush rates is also being investigated.
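The idea of tagging by CR3 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the actual hardware design: the class name `TagManagerTable`, the LRU replacement policy, and the use of the raw CR3 value as the lookup key (rather than a function of it) are all hypothetical choices made for illustration. The only parameter taken from the description above is the table size of 8 entries, which corresponds to a 3-bit tag.

```python
from collections import OrderedDict


class TagManagerTable:
    """Toy model of a table that maps CR3 values to small TLB tags.

    Assumptions (not from the original design): LRU replacement, and the
    raw CR3 value is used directly as the lookup key.
    """

    def __init__(self, num_entries=8):
        # 8 entries means tags 0..7, which fit in 3 bits.
        self.cr3_to_tag = OrderedDict()  # least recently used entry first
        self.free_tags = list(range(num_entries))
        self.flushes = 0  # number of times a tag had to be recycled

    def switch_to(self, cr3):
        """Return the tag for cr3, recycling a tag only if the table is full."""
        if cr3 in self.cr3_to_tag:
            # Known address space: reuse its tag, no flush needed.
            self.cr3_to_tag.move_to_end(cr3)
            return self.cr3_to_tag[cr3]
        if self.free_tags:
            tag = self.free_tags.pop(0)
        else:
            # Table full: evict the least recently used CR3 and reuse its tag.
            # TLB entries carrying that tag would have to be invalidated.
            _, tag = self.cr3_to_tag.popitem(last=False)
            self.flushes += 1
        self.cr3_to_tag[cr3] = tag
        return tag


def count_flushes(cr3_trace, num_entries=8):
    """Replay a sequence of CR3 values and count the tag recycles."""
    tmt = TagManagerTable(num_entries)
    for cr3 in cr3_trace:
        tmt.switch_to(cr3)
    return tmt.flushes
```

In this model, an untagged TLB would flush on every context switch, whereas the table above only forces an invalidation when more address spaces are active than there are tags. For example, `count_flushes([0x1000, 0x2000, 0x3000, 0x4000] * 25)` reports no flushes over 100 switches.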
 
CShare TLB - Sharing in TLBs
With the emergence of tagged Translation Lookaside Buffers that can retain the entries of multiple processes, the TLB has become an important resource shared between processes. With the cost of TLB misses increasing, it is imperative to understand and control how the TLB is shared by various processes. Moreover, avoiding the TLB flushes that occur because of changes in the page table becomes important, especially given the high-latency remote shootdown process involving Inter-Processor Interrupts. CShare TLB is an effort to use the TMT to store metadata in addition to CR3-based tags and to use this metadata to control the sharing of the TLB. Currently, the effect of sharing, reflected as a reduction in clocks per instruction (CPI), is being studied for various typical server workloads. The possibility of incorporating coherence among TLBs to avoid remote TLB shootdowns is also being investigated.
 
Archer
I am a part of the team working on the NSF ARCHER project. I am responsible for customizing the Archer VM disk by populating it with architectural simulation tools like SESC, PTLSim, FeS2, Pin and Simics and for producing manuals for using these tools on the Archer infrastructure.
 
NanoMAP
Crossbar nanoscale memories, consisting of non-volatile memory elements at the crosspoints of a grid of nanowires, are touted as the next paradigm in high-density memory design. Such memories, formed by methods like stochastic selection, have an address space which is far greater than the log2(N) address space (N wires need a log2(N)-bit address). Hence, a mapper is needed which translates from the log2(N) address space to the actual addresses and interfaces the memory to the microscale circuitry. NanoMAP is a mapper which uses a hierarchical approach, where the nth stage maps from the log2(N) space to the (n-1)th stage and the 1st stage maps from the 2nd stage to the nanoscale memory. With a good choice of the number of stages, the size of the total interface module can be brought down to 3% to 5% of the size of a one-stage microscale mapper.
 
Moreover, apart from the usual constraints of area, power and speed, reliability has emerged as one of the important limitations in these memories. NanoMAP also includes an analytical model for the reliability of a nanoscale memory. The defects considered are unclustered hard defects, and the fault tolerance mechanism is ECC with reconfiguration using spares. Using this model, the variation of reliability with redundancy allocation is also studied.
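To give a feel for how reliability can vary with redundancy allocation, here is a minimal sketch of a textbook binomial model. It is not the NanoMAP model itself, and everything in it is an assumption made for illustration: each bit is defective independently with probability `p_defect` (one way to read "unclustered"), a word survives if its ECC can correct all of its defective bits, and the memory works if enough good words remain after spare words are swapped in. The function names and parameters are hypothetical.

```python
from math import comb


def word_ok_prob(n_bits, t_correctable, p_defect):
    """Probability that a word of n_bits has at most t_correctable bad bits.

    Assumes independent (unclustered) hard defects with probability p_defect
    per bit and an ECC that corrects up to t_correctable bits per word.
    """
    return sum(
        comb(n_bits, k) * p_defect**k * (1 - p_defect) ** (n_bits - k)
        for k in range(t_correctable + 1)
    )


def memory_ok_prob(words, spares, n_bits, t_correctable, p_defect):
    """Probability that at least `words` of the words + spares are usable.

    Assumes any spare word can replace any failed word (reconfiguration).
    """
    q = word_ok_prob(n_bits, t_correctable, p_defect)
    total = words + spares
    return sum(
        comb(total, good) * q**good * (1 - q) ** (total - good)
        for good in range(words, total + 1)
    )
```

Sweeping `spares` and `t_correctable` for a fixed `p_defect` with `memory_ok_prob` shows the kind of trade-off between ECC strength and spare allocation that the analytical model is meant to explore.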
 
Reconfigurable Computing Cluster
I developed a communication mechanism among RC boards in a cluster using MPI and architected a prototype of such a cluster. I adapted parallel cluster benchmarks and developed performance metrics for measuring RC cluster systems.
 
Other Class Projects
A selection of the projects I have done for my coursework. Details of these projects can be found in my resume.
Graduate Intern at Systems Technology Lab (CTG), Intel Corporation, Hillsboro, Oregon (August to November 2007)
Member of Technical Staff Intern at Performance Group, VMware Corporation, Palo Alto, California (September to November 2007)
I have a 4.0 GPA in the graduate-level courses that I have taken. Some of these are