Leaving aside the system database and associated servers, resource requirements could easily be established in terms of physical CPU capability, memory, storage and I/O. Furthermore, this could be achieved for the different anticipated usage levels, accommodating the desired 2X and 3X expansion. The database provision was more problematic but, based upon past infrastructure expansion decisions, server specifications were produced in which the technical team had confidence, albeit with a possible requirement for further query optimisation and minor index/table restructuring. Ultimately, there was a database sharding plan which provided sufficient insurance to allow a high level of confidence in the resultant specification. The existing system maintained a high level of redundancy in order to provide for both 50% expansion and disaster recovery. The requirements specifications made similar provisions for the three target volumes. Certain assumptions were made regarding the stability of third-party APIs and the ratios and relative activities of the three user categories.
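As an illustration of the headroom arithmetic involved, the following minimal Python sketch scales a set of baseline resource figures by the anticipated growth factors and a 50% redundancy margin. The baseline values are hypothetical rather than the company's actual measurements.

```python
# Illustrative sizing arithmetic only; the baseline figures are hypothetical,
# not the study company's actual measurements.
BASELINE = {"vcpus": 16, "ram_gb": 64, "storage_tb": 2.0, "iops": 5000}
REDUNDANCY_MARGIN = 1.5  # 50% headroom for expansion and disaster recovery

def required_capacity(baseline, growth_factor, margin=REDUNDANCY_MARGIN):
    """Scale each resource by the anticipated growth factor plus redundancy margin."""
    return {k: v * growth_factor * margin for k, v in baseline.items()}

for growth in (1, 2, 3):  # current load, 2X and 3X expansion targets
    print(growth, required_capacity(BASELINE, growth))
```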
c) Comparing Costs/Benefits of Selected Suppliers
In order to address this problem, it was first necessary to identify potential suppliers. There is now a huge range of cloud providers and aggregators, and an objective means of determining the likely best candidates was sought. This proved to be a far more difficult problem than anticipated. Gartner is usually regarded as a reliable source of information within the industry, with assessments based upon routinely conducted surveys. However, there was imperfect correspondence between Gartner’s results (Gartner, 2012) and those of other surveys. There have been some attempts to assess cloud providers more objectively and empirically. For example, Compuware (http://www.compuware.com/) provides a service which continuously monitors a reference application running in each of the ‘major’ cloud service providers worldwide (Molyneaux, 2009). Results are available as availability and response-time statistics updated in real time. These revealed that at least one provider had failed to meet its SLA targets within the previous week and also suggested considerable response-time variability, both within and between providers. The company considered the Compuware tool to be the most objective measure and used it to select an initial list of potential providers.
Based upon the analysis performed with the Compuware tool, the following providers were identified as those providing the best response times in the seven days prior to the analysis: UMBEE, ElasticHosts, BlueSquare, Qube, Netcetera, Rackspace UK, BT Global Services, VOXEL (EU Netherlands), Windows AZURE, LUNACloud (EU France), Dimension Data (EU Netherlands), Amazon EC2 (EU Ireland) and CloudSigma (EU Switzerland). Even within this group the Compuware data showed considerable variation, with the worst average response time being some 300% of the fastest for the simple HTTP-based reference application accessed from the London spine. The study company was UK-based, with customer access almost exclusively from the UK.
In gathering information on all aspects of the services of each provider, their online promotions were used, an approach also adopted by the University of Surrey in its Fair Benchmarking for Cloud Computing Systems study (Gillam, 2013). At this stage, consultations with sales and/or technical staff were considered likely to be too time consuming. The information was gathered over a one-month period during early 2013. It is possible that commercial and/or technical changes occurring during this period are not reflected in the subsequent narrative.
There has been some improvement in the transparency of charges made by the major hosting providers over the past few years. Most employ some form of interactive cost calculator (and these were used for this study). However, there were variations in practice, with some companies, for example, having a fixed inclusive element (say, for outgoing bandwidth) and others having only a unit charge. Some companies had a ‘base system’ with an associated cost, to which unit-priced resources could be added. All quoted hourly rates for incremental resource units. Some companies continued to represent systems in terms of relative capabilities (x-small, small, medium, large, x-large, etc.), often with reference to a ‘base’ machine. For example, Amazon defines an ECU with reference to a 2006 Xeon running at 1.7 GHz, which it further equates to a 2007 Opteron or Xeon processor running at between 1.0 and 1.2 GHz (note the introduction of ambiguity). These variations made exact comparisons extremely difficult.
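To illustrate the normalisation problem, the sketch below reduces two hypothetical tariff structures (a ‘base system plus inclusive bandwidth’ model and a purely unit-priced model) to a common monthly figure. All rates, allowances and instance counts are invented for illustration and do not correspond to any real provider.

```python
# Hypothetical tariffs illustrating how quotes were normalised to a common
# monthly figure; none of these numbers belong to a real provider.
HOURS_PER_MONTH = 730

def monthly_cost(base_monthly, hourly_instance_rate, instances,
                 included_bandwidth_gb, bandwidth_gb, bandwidth_rate_per_gb):
    """Normalise a 'base system plus unit-priced resources' quote to a monthly cost."""
    compute = base_monthly + hourly_instance_rate * instances * HOURS_PER_MONTH
    chargeable_gb = max(0, bandwidth_gb - included_bandwidth_gb)
    return compute + chargeable_gb * bandwidth_rate_per_gb

# Provider with a fixed inclusive bandwidth element versus one charging per GB only.
print(monthly_cost(base_monthly=50, hourly_instance_rate=0.12, instances=4,
                   included_bandwidth_gb=500, bandwidth_gb=800, bandwidth_rate_per_gb=0.10))
print(monthly_cost(base_monthly=0, hourly_instance_rate=0.15, instances=4,
                   included_bandwidth_gb=0, bandwidth_gb=800, bandwidth_rate_per_gb=0.08))
```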
The company, with limited time, produced a best estimate of the costs to meet the resource requirements for each of the above providers. The figures are not presented here — they are undoubtedly inaccurate and would likely be unfair to one or more providers. However, they did represent the best efforts of the company to perform a task whose complexity was magnified by the non-uniform promotional strategies of cloud providers. The company further considered the complexities that might be involved in migration and the capabilities of each provider to expand and contract resource provisioning with the finest level of granularity possible. Security, data integrity, data privacy and other regulatory issues were also concerns.
Provisioning costs across the providers varied considerably: the highest estimate was more than double the lowest, assuming provisioning based upon peak demand within a 24-hour cycle. However, even the highest estimate was some 20% less than the ‘in house’ cost based on a three-year capital depreciation cycle. The cost of software licences was included, and an assumption was made that staffing would continue at existing levels and cost. At face value, the case for cloud migration thus seemed compelling.
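A simplified illustration of the comparison being made, with invented figures rather than the company's actual estimates, might look as follows: the in-house figure spreads capital cost over a straight-line three-year depreciation cycle, while the cloud figure assumes provisioning held at the peak of the 24-hour cycle.

```python
# Illustrative comparison only; the monetary values are invented and are not
# the study company's estimates.
def in_house_monthly(capital_cost, depreciation_years, monthly_running_costs):
    """Capital cost spread over a straight-line depreciation cycle plus running costs."""
    return capital_cost / (depreciation_years * 12) + monthly_running_costs

def cloud_monthly(peak_instances, hourly_rate, hours_per_month=730):
    """Cost if provisioning is held at the peak demand within the 24-hour cycle."""
    return peak_instances * hourly_rate * hours_per_month

in_house = in_house_monthly(capital_cost=90000, depreciation_years=3, monthly_running_costs=1500)
cloud = cloud_monthly(peak_instances=10, hourly_rate=0.40)
print(f"in-house {in_house:.0f}/month vs cloud at peak {cloud:.0f}/month")
```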
However, the complexities of migration had to be considered, together with possible effects on SLAs and other issues. The most attractive solution, based upon both overt cost and the granularity of elasticity, was Amazon EC2. The ability to provision instances ‘in minutes’ rather than ‘hours’ was particularly attractive, and the company therefore prioritised EC2 in its further evaluation. However, once the available literature had been reviewed and the development time necessary to overcome perceived constraints in the Amazon offering had been costed in (notably persistence problems, MySQL hosting limitations on EBS (Elastic Block Store) and the necessity to use software-based RAID), EC2 was excluded. Observationally, this may have been misguided, but the business sought certainty and, from the perspective of the CEO and based upon the published information available, migration to EC2 carried unacceptable risks. Limited literature searches supported the perception that EBS I/O performance was too variable, particularly with live replication of the master database; see, for example, Robertson (2011).
Testing
Two further providers were selected for consideration, having met the ‘minimal development cost’ constraint and the regulatory requirements regarding security and data privacy, and having fallen at the lower end of the cost estimates. We will refer to these as CPA and CPB. Initial literature searches suggested that, for both providers, performance variation was less than 5%, which could be ‘budgeted in’ if necessary. Testing was destined to be difficult given the asynchronous nature of the applications and the involvement of third-party services. However, the timings involved for such services were known and could therefore be simulated.
Images were created for the two selected services. This was not trivial, given the need for simulation, but the VMware-based virtualisation already employed by the company provided an advanced starting point. The critical latencies in the application centred on a number of complex queries. The database comprised some 90 tables, and certain critical sessions could involve staged queries across between 15 and 20 tables. These were therefore targeted for testing, together with more mundane but high-volume transactional sessions. At this stage the MySQL server was not migrated, because live replication to a separate provider was a resilience requirement and I/O capability was therefore crucial.
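The following sketch suggests how a third-party service with known timings might be stubbed out for testing; the service names and latency figures are hypothetical, and an asynchronous calling style is assumed.

```python
# Minimal sketch of a third-party service stub with a known, configurable latency;
# the endpoint names and timings are hypothetical.
import asyncio
import random

async def third_party_stub(name, mean_latency_s, jitter_s=0.05):
    """Simulate a third-party API call whose typical timing is already known."""
    await asyncio.sleep(max(0.0, random.gauss(mean_latency_s, jitter_s)))
    return {"service": name, "status": "ok"}

async def staged_session():
    # Fire the simulated external calls concurrently, as the live application would.
    return await asyncio.gather(
        third_party_stub("payment_gateway", 0.35),
        third_party_stub("address_lookup", 0.20),
    )

print(asyncio.run(staged_session()))
```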
During the first eight hours of testing, both providers were failing to meet targeted response times with load levels at 50% of peak. Furthermore, there was high variability in performance (as measured by response times), which was most marked at low instance levels. One of the providers reported a problem during testing and gave notification that a machine reboot would be necessary. Both providers were contacted through online messaging. CPA looked at the configuration and reported nothing amiss, although it did admit to ‘temporary high volumes of traffic’. CPB insisted that a support ticket be raised, which would be dealt with within 24 hours. The experience produced such a lack of confidence in both providers that the testing (which was costly) was abandoned.
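A minimal sketch of the kind of response-time summary used to judge such runs is shown below. The sample values are invented; a high coefficient of variation corresponds to the variability observed at low instance levels.

```python
# Sketch of simple variability measures for a test run; the sample response
# times below are invented for illustration.
import statistics

def summarise(response_times_ms):
    """Mean, approximate 95th percentile and coefficient of variation of response times."""
    mean = statistics.mean(response_times_ms)
    p95 = sorted(response_times_ms)[int(0.95 * len(response_times_ms)) - 1]
    cov = statistics.stdev(response_times_ms) / mean
    return {"mean_ms": round(mean, 1), "p95_ms": p95, "cov": round(cov, 2)}

samples = [420, 380, 950, 410, 1800, 430, 390, 2200, 405, 415]
print(summarise(samples))  # a high cov flags the variability seen at low instance counts
```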
Lessons Learnt
The evaluation of services offered by competing cloud service providers is not trivial. Marketing information published by providers highlights potential benefits but offers little information which might assist potential customers in exercising informed judgements. Whilst all providers offer uptime SLAs (often merely guaranteeing not to charge for hours which fall below the SLA figure), none give any performance guarantees for CPU and I/O. There is a range of comparative studies available, for example (Cloud Spectator, 2013), but they are often associated with one or more providers, or are based upon a single reference application/test suite which may inadequately represent the optimal ratio of resource allocation suited to a target application. With the exception of Amazon’s EC2, the granularity of resource allocation in all of the providers considered is not only coarse but generally also requires manual intervention, additional configuration input and significant time. Most of the research into ensuring elasticity of response is largely to the benefit of providers rather than consumers of cloud services; see, for example, (Bennani and Menascé, 2005) and (Menascé and Ngo, 2009).
Resource allocation in the physical world is problematic enough and is often responsive — in the sense that it is a reaction to a perceived impending problem. For example, a lengthening of response latency produced by long query times might be addressed by a combination of query optimisation, additional indices, expansion of database server RAM, increased SSD storage and bandwidth expansion. In the virtual world, where the performance of these elements (or their substitutes) is ambiguous, the problem is exacerbated. In deciding whether to migrate to the cloud, organisations are in forward-planning rather than responsive mode and will therefore be more cautious and less experimental.
In selecting infrastructure and platforms in the physical world, organisations have the benefit of an extensive range of benchmarking services. These are particularly well established in high-performance computing, for example the HPC Challenge Benchmark Suite (Luszczek, 2006). The need for cloud benchmarking is well established (Luszczek, 2011), and the Transaction Processing Performance Council is working on a new benchmark for assessing transaction-based applications on cloud infrastructure and platforms (Nambiar, 2012). Although there have been some experimental tools (Calheiros, 2010), these are complex and are beyond the reach of most SMEs with limited human resources.
In the embedded computing world, the characterisation of applications for optimal hardware configuration is reasonably well established (Maatta, 2009). The problem of doing so for web-based applications differs little conceptually. What we need as a starting point is a means of characterising applications and their usage in terms of the optimal combinations of elements for differing scales of processing, subject to automatically identified upper (infrastructure or platform) limits. With adequate benchmarking of cloud IaaS and PaaS services, coupled with such automated characterisation of web applications, we would have the possibility of tools permitting automatic configuration of cloud resources, which would not only assist migrators but also serve as a basis for vendor-provided resource planning and performance guarantees.
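As a toy illustration only of how such a tool might proceed, the sketch below maps a hypothetical per-session resource profile to a candidate configuration capped by provider-declared limits. The profile keys, ratios and limits are all assumptions introduced for illustration.

```python
# Toy illustration of the idea only: map a characterised application profile to a
# candidate resource configuration, capped by provider-declared limits. The profile
# keys, ratios and limits are hypothetical.
PROVIDER_LIMITS = {"vcpus": 32, "ram_gb": 244, "iops": 20000}

def configure(profile, target_sessions_per_min):
    """Derive a resource request from per-session resource ratios, respecting limits."""
    request = {
        "vcpus": profile["cpu_ms_per_session"] * target_sessions_per_min / 60000,
        "ram_gb": profile["ram_mb_per_session"] * target_sessions_per_min / 1024,
        "iops": profile["io_ops_per_session"] * target_sessions_per_min / 60,
    }
    return {k: min(round(v, 1), PROVIDER_LIMITS[k]) for k, v in request.items()}

profile = {"cpu_ms_per_session": 120, "ram_mb_per_session": 8, "io_ops_per_session": 40}
print(configure(profile, target_sessions_per_min=3000))
```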
Conclusion
At present, the development effort required to evaluate cloud service providers for SMEs with established web-based applications is too high. Furthermore, on the basis of the evidence that we have reviewed (over a period of 12 months), large-volume applications with large numbers of machine instances are assured less variability in performance levels, and smaller organisations are consequently disadvantaged. Variability in performance, coupled with a lack of clarity in instance performance specification and realisation, makes it extremely difficult to plan resource usage. The level of (timely) granularity in resource responsiveness is not sufficiently fine to allow smaller companies to benefit significantly from cloud economies. Much of the work on improving resource elasticity is for the benefit of service providers rather than consumers, who continue to be offered rather quaint ‘compute units’ and (with the exception of EC2) impractical methods of adjusting resource allocation. More work needs to be done on the automatic characterisation of applications, and thereby the automatic determination of cloud resource allocation, with associated performance guarantees within defined limits.

For the company concerned in the study, the revelation that resource usage was concentrated in an eight-hour block offered possibilities of far clearer benefit than cloud migration. It sought an agreement with a co-located company dealing with consumer sales in Asia. Perhaps it is time to add the term Cooperative Cloud to the ever-expanding cloud glossary. We are presently engaged in work on the characterisation of Web 2.0 applications to permit the automatic generation of reference models for comparative evaluation across cloud service providers.
Note
1. Earlier collaboration had attracted two funding awards: i) "Digital-media Authenticated Electronic Disclosure Application System", UWSP Advantage Proof of Concept (POC), 2009-10; ii) "A Practical Framework for the Development and Evaluation of Multi-factored Authentication Schemes for Secure Distributed Systems", EPSRC/UK CASE Studentship Award (EP/H501320/1), 2009-2012.
References
1. Bennani, M.N. and Menascé, D.A. (2005), ‘Resource Allocation for Autonomic Data Centers Using Analytic Performance Models’, Proceedings of the 2005 IEEE International Conference on Autonomic Computing, Seattle, WA.
2. Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A.F. and Buyya, R. (2010), ‘CloudSim: A Toolkit for Modeling and Simulation of Cloud Computing Environments and Evaluation of Resource Provisioning Algorithms’, Software: Practice and Experience, 41(1), 2011, pp. 23-50, published online 24 August 2010 in Wiley Online Library (wileyonlinelibrary.com), DOI: 10.1002/spe.995.
3. Cloud Spectator (2013), ‘Cloud Server Performance: A Comparative Analysis of 5 Large Cloud IaaS Providers’, online, Cloud Spectator, last accessed 02/08/2013, available at http://www.cloudspectator.com/wp-content/uploads/2013/06/Cloud-Computing-Performance-A-Comparative-Analysis-of-5-Large-Cloud-IaaS-Providers.pdf.
4. Gartner (2012), ‘Critical Capabilities for Public Cloud Storage Services’, online, Gartner, corrected 10/01/2013, last accessed 02/08/2013, http://www.gartner.com/technology/reprints.do?id=1-1D9C6ZM&ct=121216&st=sg.
5. Gillam, L., Li, B., O’Loughlin, J. and Tomar, A.P.S. (2013), ‘Fair Benchmarking for Cloud Computing Systems’, Journal of Cloud Computing: Advances, Systems and Applications, 2:6, 7 March 2013.
6. Maatta, S., Indrusiak, L.S., Ost, L., Moller, L., Nurmi, J., Glesner, M. and Moraes, F. (2009), ‘Characterising Embedded Applications Using a UML Profile’, Proceedings of the 11th International Conference on System-on-Chip (SOC’09), IEEE Press, Piscataway, NJ, USA, pp. 172-175.
7. Microsoft (2010), ‘SMB Cloud Adoption Study Dec 2010’, online, Microsoft, last accessed 02/09/2013, http://www.microsoft.com/en-us/news/presskits/telecom/docs/smbstudy_032011.pdf.
8. Molyneaux, I. (2009), The Art of Application Performance Testing: Help for Programmers and Quality Assurance, 1st edition, O’Reilly Media.
9. Robertson, K. (2011), ‘Our Pain Points with EC2 and How Our Move Solved Them’, available online at InvalidLogic, last accessed 15/08/2013, http://invalidlogic.com/2011/02/16/our-pain-points-with-ec2/.
10. Menascé, D.A. and Ngo, P. (2009), ‘Understanding Cloud Computing: Experimentation and Capacity Planning’, Proceedings of the 2009 Computer Measurement Group Conference, Dallas, Texas, December 2009.
11. Luszczek, P., Bailey, D., Dongarra, J., Kepner, J., Lucas, R., Rabenseifner, R. and Takahashi, D. (2006), ‘The HPC Challenge (HPCC) Benchmark Suite’, SC06 Conference Tutorial, ACM/IEEE SC2006 Conference on High Performance Networking and Computing, Tampa, Florida, 12 November 2006.
12. Luszczek, P., Meek, E., Moore, S., Terpstra, D., Weaver, V. and Dongarra, J. (2011), ‘Evaluation of the HPC Challenge Benchmarks in Virtualized Environments’, Proceedings of the 6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, 30 August 2011.
13. Nambiar, R. et al. (2012), ‘TPC Benchmark Roadmap 2012’, in Selected Topics in Performance Evaluation and Benchmarking, Lecture Notes in Computer Science, Vol. 7755, Springer, 2013, pp. 1-20.