Sky Computing on FutureGrid and Grid’5000 (poster)

Title: Sky Computing on FutureGrid and Grid’5000 (poster)
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Riteau, P, Tsugawa, M, Matsunaga, A, Fortes, JAB, Keahey, K
Conference Name: TeraGrid'10
Conference Location: Pittsburgh, PA
Abstract: Sky computing is an emerging computing model in which resources from multiple cloud providers are leveraged to create large-scale distributed infrastructures. These infrastructures provide resources to execute computations requiring large computational power, such as scientific software. Establishing a sky computing system is challenging because providers differ in hardware, resource management, and connectivity. Furthermore, scalability, balanced distribution of computation, and measures to recover from faults are essential for applications to achieve good performance. This work shows how resources from two experimental projects, the FutureGrid experimental testbed in the United States and Grid'5000, an infrastructure for large-scale parallel and distributed computing research composed of 9 sites in France, can be combined and used to support large-scale distributed experiments. This showcases not only the capabilities of the experimental platforms, but also their emerging collaboration. Several open source technologies are integrated to address these challenges. Xen machine virtualization is used to minimize platform (hardware and operating system stack) differences. Nimbus, which offers VM provisioning and contextualization services and turns a cluster into an Infrastructure-as-a-Service cloud, is used for resource and VM management. By deploying Nimbus on the FutureGrid and Grid’5000 platforms, we provide an identical interface for requesting resources on these different testbeds, which makes interoperability possible. We also leverage the contextualization services offered by Nimbus to automatically configure the provisioned virtual machines. In our context, contextualization enables new resources to join the virtual cluster without any manual intervention.
Commercial clouds and scientific testbeds limit the network connectivity of virtual machines, which makes the all-to-all communication required by many scientific applications impossible. ViNe, a virtual network based on an IP overlay, enables all-to-all communication between the virtual machines of a virtual cluster spread across multiple clouds. In the context of scientific testbeds such as FutureGrid and Grid'5000, it allows us to connect the two testbeds with minimal intrusion into their security policies. Additionally, we use Hadoop for parallel, fault-tolerant execution of a popular embarrassingly parallel bioinformatics application (BLAST). In particular, we leverage the dynamic cluster extension feature of Hadoop to let resources from Grid’5000 merge with resources from FutureGrid into a single virtual cluster while computation is in progress. After extension, Map and Reduce tasks are distributed among all resources, speeding up the computation. Finally, to accelerate the provisioning of additional Hadoop workers (deployed as VMs), an extension to Nimbus that takes advantage of Xen copy-on-write image capabilities has been developed. The extension decreases the VM instantiation time from minutes to just a few seconds. The elasticity of this approach was showcased in a demo presented at OGF-29. The demo includes elements from the "Scaling-out CloudBLAST: Combining Technologies to BLAST on the Sky" demo performed at CCGrid2010 and from "Deployment of Nimbus Clouds on Grid'5000", the entry that won the Grid'5000 Large Scale Deployment Challenge at the 2010 Grid'5000 Spring School.