Automatic enablement, coordination and resource usage prediction of unmodified applications on clouds

Title: Automatic enablement, coordination and resource usage prediction of unmodified applications on clouds
Publication Type: Thesis
Year of Publication: 2010
Authors: Matsunaga, A
Secondary Authors: Fortes, JAB
Academic Department: Electrical and Computer Engineering Department
Number of Pages: 138
University: University of Florida
ISBN Number: 9781124131443
Keywords: Applied sciences, cloud computing, machine learning, MapReduce, Resource usage prediction, virtualization, Web service
Abstract: As computing paradigms evolve and application demands grow, users face challenges in efficiently accessing, using and coordinating the large number of heterogeneous resources of complex computing systems. In the context of cloud computing, this work proposes methods to automate time-consuming processes currently performed by end users or developers, namely, the enabling of applications as services, the scaling-out of applications, and the estimation of resource consumption by applications. First, offering application services to consumers requires significant effort by cloud providers, since many existing applications that are useful to consumers are not implemented as Web Services. To speed up the creation of application services, this dissertation describes an approach to the automatic enablement of existing text-based applications with a command-line interface, called Command-Line Application Wrapper Service (CLAWS). Compared to other application-wrapping approaches, CLAWS exposes a simpler interface to users, completely hiding the complexities of understanding, developing and deploying access-controlled Web Services. CLAWS is motivated by, and effective for, the important case of interactive applications, which has not been considered by other approaches. The use of CLAWS is evaluated in the context of a transnational digital government project, where CLAWS greatly facilitated the integration of translation and conversation applications into an information sharing system. Second, in order to address the need to scale computationally intensive applications on clouds, this dissertation presents an end-to-end approach and best practices to run such large applications on multiple clouds.
The techniques were implemented in CloudBLAST, which combines the MapReduce model for coordinating the parallel execution of unmodified applications, machine virtualization for encapsulating applications and their execution environments, network virtualization for connecting resources behind firewalls/NATs, and cloud management services for coordinating the deployment of a virtual cluster on demand. Experiments with CloudBLAST in cloud testbeds have demonstrated good scalability when executing a bioinformatics application (BLAST), even when using cloud resources across wide-area networks and in the presence of machine and network virtualization. Third, a useful piece of information for scheduling jobs, typically not available, is the extent to which applications will use available resources once executed on heterogeneous clouds. This dissertation comparatively assesses the suitability of several machine learning techniques for predicting linear and non-linear spatiotemporal utilization of resources, taking into account application- and system-specific attributes. This work also extends an existing classification tree algorithm, called Predicting Query Runtime (PQR), to the regression problem by selecting the best regression method for each collection of data on the leaves. When compared to other regression algorithms (linear, k-nearest neighbor, decision table, radial basis function network, and support vector machine), the new method (PQR2) yields the best mean percentage error when predicting execution time, memory and disk consumption for two bioinformatics applications, BLAST and RAxML, deployed in scenarios that differ in system and usage.
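
Illustrative note (not part of the thesis record): the abstract describes PQR2 as choosing the best regression method for the data collected at each tree leaf. The Python sketch below shows only that per-leaf selection step, under assumptions of our own: a single numeric feature, three toy candidate regressors (mean, one-variable least squares, nearest neighbor), leave-one-out evaluation, and "mean percentage error" taken to be the mean absolute percentage error. All function names are hypothetical, and the thesis's actual tree construction, candidate set and error definition may differ.

```python
from statistics import mean


def mean_percentage_error(actual, predicted):
    """Mean absolute percentage error in percent (assumed definition).

    Requires every actual value to be non-zero.
    """
    return 100.0 * mean(abs(p - a) / abs(a) for a, p in zip(actual, predicted))


def fit_mean(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate 2: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """Candidate 3: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Hypothetical candidate set; the thesis compares a different, larger set.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}


def leave_one_out_error(fit, xs, ys):
    """Score a candidate by predicting each point from all the others."""
    predictions = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(model(xs[i]))
    return mean_percentage_error(ys, predictions)


def select_leaf_model(xs, ys, candidates=CANDIDATES):
    """Return (name, fitted model, errors) for the lowest-error candidate."""
    errors = {name: leave_one_out_error(fit, xs, ys)
              for name, fit in candidates.items()}
    best = min(errors, key=errors.get)
    return best, candidates[best](xs, ys), errors


if __name__ == "__main__":
    # Made-up leaf data: input size versus execution time in seconds.
    sizes = [1, 2, 3, 4, 5]
    seconds = [10, 20, 30, 40, 50]
    name, model, errors = select_leaf_model(sizes, seconds)
    print(name, model(6), errors)
```

In a full tree, a step like this would run once per leaf, so different regions of the input space can end up with different regressors.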