Stop Throwing Money Away on Cloud

How can you control the costs of deploying services in the cloud?

While I will unpack this a bit below, the answer, in a word, is Automation.

We’ve seen cloud systems not just cost more than they need to, but even fail, because they were manually driven and did not provide the promised value. Now, I recognize that not every aspect of a cloud system (in development) will be automated from the start. However, getting them automated as quickly as possible saves not just on actual spending on cloud resources, but also on employee time (Dev and Ops) spent on repetitive tasks.

Why pay for what you aren’t using? Cloud resources follow a pay-for-what-you-use model. In most cases, this means that you are paying for the resources you are taking up (and that are thus not available to other customers), whether or not you are actively using them. Think of a virtual server (VM) that is turned on but is not doing anything for you. If you delete the resources you no longer need, you stop paying for the excess.

Know your costs
Most public cloud vendors have different pricing models for shared or reserved resources and many offer spot pricing (which can be a great deal or cost more than a shared/reserved instance depending on demand). For long-running processes, it will most likely make sense to run a reserved instance, but if you are spinning up a system for a test, looking at the spot vs shared cost before deploying could save you some money.
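To make the spot vs shared comparison concrete, here is a minimal sketch of the arithmetic. The `PriceQuote` type and `cheapest_option` function are names I made up for illustration, and the hourly rates you feed in would come from your vendor's current price list; nothing here calls a real pricing API.

```python
from dataclasses import dataclass


@dataclass
class PriceQuote:
    """An hourly price for one instance size under one pricing model.

    Hypothetical helper type: the model label is free text and the rate
    is whatever your vendor currently quotes.
    """
    model: str          # e.g. "spot", "shared", "reserved"
    hourly_rate: float  # cost per hour in your billing currency


def cheapest_option(quotes, hours):
    """Return (quote, total_cost) for the lowest total cost over `hours`."""
    if not quotes:
        raise ValueError("need at least one quote to compare")
    best = min(quotes, key=lambda q: q.hourly_rate * hours)
    return best, best.hourly_rate * hours
```

For a short test run, you would pass in the quotes for the instance size you plan to launch along with the expected run time in hours. Since spot prices move with demand, the comparison is only as good as the rates you looked up just before deploying.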

Start small and size up when you (really) need to. Some (most?) people get a bigger machine than they need. It’s better to see if a small machine does the job before jumping to the next size up. Why pay for four cores when one will do?

Manage your resource pool correctly
Resources should be available to users on demand and users should be expected to release resources that are not being actively used.

Some organizations run clouds like a virtual managed hosting system: users have to request resources (VMs, storage, load balancers, etc.) from IT, and this process often takes weeks if not months. This model tends to get users treating what should be ephemeral resources, like VMs, more like traditional hardware. If it takes you two months to get a VM, why would you ever return it?

In the same vein, if the creation of resources (spinning up a VM and configuring it for use) takes too long (what that means will vary by dev) there may well be a tendency to keep a running instance up, rather than deleting it and launching a new one when actually needed.

To solve this, resources need to be easy for users to get and easy to provision. Allocating resource limits on a per-user or per-project (aka tenant) basis should allow on-demand provisioning while controlling potentially excessive or wasteful usage. Automating your system deployment (via a simple script or a provisioning tool, e.g. Ansible, Puppet, Chef) should remove the pain of setting up an environment, dev or test, and remove the barrier to releasing unused resources.
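As a rough illustration of how a per-project limit can gate on-demand provisioning, here is a small sketch. The `Quota` and `Usage` types and the `can_provision` function are hypothetical; a real cloud platform enforces quotas for you, so treat this as a picture of the logic rather than anyone's actual API.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Upper limits for one project (tenant). Field names are illustrative."""
    max_vms: int
    max_vcpus: int


@dataclass
class Usage:
    """What the project is currently running."""
    vms: int = 0
    vcpus: int = 0


def can_provision(quota, usage, vcpus_requested):
    """Allow one more VM only if it keeps the project within both limits."""
    fits_vm_count = usage.vms + 1 <= quota.max_vms
    fits_vcpus = usage.vcpus + vcpus_requested <= quota.max_vcpus
    return fits_vm_count and fits_vcpus
```

The point of the design is that a user inside the limit gets a resource immediately, with no ticket to IT, while a user at the limit has a clear incentive to release something they are not using.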

Keep Dev processes efficient
If you are able to automate system deployment, it saves developers’ time by launching an environment when it’s needed (no need to wait on manual deployment steps), and that system can be deployed as often as needed. An automated deployment also ensures that the system being launched is consistent, so that bugs being tracked during development end up being due to code changes rather than errors in deployment (nothing is quite as frustrating as tracking down a “bug” that was a typo in a manual system deployment).

Truly embracing DevOps, with automation used appropriately for continuous integration / continuous deployment, takes this a step further, as it leverages the on-demand cloud model for efficient resource usage. Being able to automatically run tests on an environment that is automatically built for that test is an extremely powerful tool. As above, the automated build process avoids potential manual blunders and ensures that tests are run on a reliable system, so you can feel confident in deploying the final code to production, which can also be automated!

Monitor your environment(s)
Monitoring really ties into the first two points. It is critical to know how efficiently your cloud environment is working so that your Ops engineers can tune that system. Tying monitoring into automation will allow your production system to scale up when demand increases (or an existing resource fails) and, from a cost-savings perspective, to scale down when resources are underutilized. In a dev environment, monitoring resource usage can also identify systems that are underutilized, so they can be returned to the resource pool for others to use, or so that you at least stop being billed for them. This process can also be automated so that your ops team is not manually checking utilization and returning idle resources.
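Here is a minimal sketch of what tying monitoring into automation might look like. The function names (`find_idle`, `scaling_decision`) and the threshold values are assumptions of mine, picked only for illustration; in practice the samples would come from your monitoring system and the thresholds from your own tuning.

```python
def find_idle(cpu_samples_by_vm, idle_threshold=5.0):
    """Return names of VMs whose CPU stayed below the threshold in every sample.

    cpu_samples_by_vm maps a VM name to a list of CPU utilization
    percentages. The 5% default is an illustrative value, not a recommendation.
    """
    idle = []
    for name, samples in cpu_samples_by_vm.items():
        # A VM with no samples is skipped: no data is not proof of idleness.
        if samples and max(samples) < idle_threshold:
            idle.append(name)
    return sorted(idle)


def scaling_decision(avg_utilization, low=30.0, high=75.0):
    """Map average utilization (percent) to a simple scale action.

    The low/high bounds are hypothetical defaults.
    """
    if avg_utilization > high:
        return "scale_up"
    if avg_utilization < low:
        return "scale_down"
    return "hold"
```

A scheduled job could run `find_idle` over the dev environment and notify owners (or reclaim the VMs), while `scaling_decision` stands in for the kind of rule an autoscaler applies to production.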

Build “Cloud Native” applications
I’ve mostly referenced virtual machines in the content above, which can be totally appropriate for a cloud-deployed application (as can bare metal servers). Still, consider newer cloud-native tools such as the current default of Linux containers (e.g. Docker) or the newer serverless approaches (e.g. AWS Lambda), which use resources much more efficiently (in most cases). This can translate to cost savings due to a smaller footprint and faster response to changing application needs. As a bonus, the rapid response times possible with containers and serverless are readily automated (and monitored) with integrated tools (e.g. Kubernetes, Prometheus).

The microservices architectural model effectively breaks apart an app into individual functions (services) that can scale independently. Again, being able to scale only the bits of an application you need translates into more efficient use of the resources you end up using and paying for.

Private or Public or Hybrid Cloud?
If you are launching your startup and don’t have access to your own hardware, a public cloud service makes a lot of sense. You can literally pay just for the services you need and don’t need to take on the operational overhead that maintaining, managing, and operating your own hardware entails. As your system starts to scale, however, there is a tipping point where owning (and depreciating) hardware, even with the staffing costs of a dedicated operations team, becomes the better value. The TL;DR: if you are running the equivalent of about 500 VMs in full-time use, you may want your own system. (You can dive into a full article on the cost tipping point here: Do you need a Public or a Private Cloud – the Rent or Buy question.)
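The rent-or-buy tipping point can be pictured with a very simple model, sketched below. This is my own simplification and not the calculation from the linked article: it assumes public cloud cost grows linearly per VM, while a private cloud has a fixed monthly cost (hardware depreciation, operations staff) plus a smaller per-VM cost. The function name and all parameters are hypothetical.

```python
import math


def break_even_vm_count(public_cost_per_vm, private_fixed_cost, private_cost_per_vm):
    """Smallest VM count at which private is no more expensive per month.

    Assumed model (a simplification):
      public  = n * public_cost_per_vm
      private = private_fixed_cost + n * private_cost_per_vm
    Returns None if the private option never catches up.
    """
    saving_per_vm = public_cost_per_vm - private_cost_per_vm
    if saving_per_vm <= 0:
        return None
    return math.ceil(private_fixed_cost / saving_per_vm)
```

Plug in your own monthly figures to see roughly where your tipping point sits. Real comparisons have more moving parts (utilization, discounts, data transfer), so treat the result as a starting point for the conversation, not an answer.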

Takeaway
So while there are a lot of pieces in a cloud-based development system where you can find cost savings, where you can actually realize them will depend on where your business is along the transition to cloud (or cloud-native) development and service deployment. (This is why helping companies transition sensibly (and cost-effectively) is the primary focus of the company I work for: it _is_ complicated to map out a clear path.) Still, in general, you want to have awareness of your system (monitoring) to identify places where you can reduce resource use, find the most efficient means of developing and deploying your application (containers/container orchestration, serverless, etc.), and automate the processes that manage the system (automation) so that only the pieces that are actually needed are deployed and running.

Thanks for reading, and I hope this was helpful. Have other questions? Write me a comment on this post!

If you are looking for a next step in understanding this topic, here is a solid presentation by a cloud computing veteran (who is Kumulus Tech’s CTO) presenting at the SF Bay Cloud Native Open Infra meetup.