Better Cloud Self-Service with a Little Change

"Cloud self-service is easy... until it isn't" with a picture of chaotic fire in the background.

With the rise of the cloud and DevOps, specifically Infrastructure as Code (IaC), people have been wanting and building cloud self-service. It is a great idea. It allows developers, architects, testers, and almost anyone else to create a service or application. Developers and architects can try new services in their applications, testers can create whole applications for testing, and others can use it for demos or other activities. These are just a few of the things you can do with a cloud self-service solution. This is a great capability! However, it can become a problem depending on how the services in the cloud self-service system are supported.

Not Like Cloud Provider Services

People see cloud self-service as just like the services from the cloud provider, or better. Instead of being limited to what the cloud provider offers, they can put anything in the cloud self-service system. This means they can have their favorite open source software ready to go with a simple click of a mouse button. In addition, they can launch any of the company’s whole applications. However, if not done properly, these on-demand services and applications are nothing like the services from a cloud provider.

If you just implement the cloud self-service catalog and do not think about the implications, then you are going to fall into a trap. Just providing an “easy button” attached to a set of Infrastructure as Code (IaC) tools is not the complete solution. Basically, this scenario allows people to quickly build something they have no deep knowledge of. What do they do to troubleshoot it when something fails? How do they handle maintenance and versioning? Can they mimic production for realistic load testing? This becomes a bigger problem when dealing with whole applications and large datasets. Just building the service or application is not good enough. Instead, take a lesson from cloud providers for a better solution.

Shared Responsibility Model

Look at AWS, which has three key types of services:

  • Compute – AWS manages the physical and virtual layer, customer handles OS and above
  • Managed Services – AWS is responsible for most of the service, customer does light administration and uses it
  • Serverless – AWS provides it and administrates it, customer just uses it

Each has its own shared responsibility model, and this is key to understanding how to support services in your cloud self-service system. If you provide a cloud self-service solution with no shared responsibility, then it is the consumer who is responsible for it all. This leads to larger development teams, because you need more people to deal with the extra responsibilities. Shared responsibilities are key to smaller, faster teams.
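The three service types above can be written down as a small responsibility matrix. This is only a sketch: the layer names and the `Owner` values are my own simplification of the bullets above, not an official AWS definition.

```python
from enum import Enum


class Owner(Enum):
    PROVIDER = "provider"
    SHARED = "shared"
    CONSUMER = "consumer"


# Illustrative layers, ordered from the hardware up (an assumption for this sketch).
LAYERS = ("physical", "virtual", "operating_system", "administration", "usage")

RESPONSIBILITY = {
    "compute": {
        "physical": Owner.PROVIDER,
        "virtual": Owner.PROVIDER,
        "operating_system": Owner.CONSUMER,
        "administration": Owner.CONSUMER,
        "usage": Owner.CONSUMER,
    },
    "managed_service": {
        "physical": Owner.PROVIDER,
        "virtual": Owner.PROVIDER,
        "operating_system": Owner.PROVIDER,
        "administration": Owner.SHARED,  # consumer does light administration
        "usage": Owner.CONSUMER,
    },
    "serverless": {
        "physical": Owner.PROVIDER,
        "virtual": Owner.PROVIDER,
        "operating_system": Owner.PROVIDER,
        "administration": Owner.PROVIDER,
        "usage": Owner.CONSUMER,
    },
}


def consumer_duties(service_type):
    """Return the layers the consumer owns or shares for a service type."""
    owners = RESPONSIBILITY[service_type]
    return [layer for layer in LAYERS if owners[layer] is not Owner.PROVIDER]


for service_type in RESPONSIBILITY:
    print(service_type, "->", consumer_duties(service_type))
```

The shorter the list for a service type, the less the consuming team has to staff for.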

Shared responsibility shrinks your teams and allows the service teams to become experts in their services. This means they can take on more responsibility as they become experts in providing the service. Application development teams can leverage this expertise, so they only have to become experts in what they build. If there is no shared responsibility, then the application team needs to be an expert in all aspects. This tends to create larger teams to deal with all aspects of a service.

The Lifecycle of a Service

At a high level, there are three main phases in the lifecycle of a service:

  • Build the service
  • Operate & maintain the service
  • Destroy the service

Building a service today with IaC tools is easy, especially when someone has created the service template.  Once the service is available, you are in the operate & maintain phase.  This phase becomes more critical as the service remains operational, as we will see later.  Once the service is no longer needed, then it is destroyed and the resources are recovered using the same IaC tools used to create it.  Out of all the service lifecycle phases, the operate & maintain phase is the most important.
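The three phases can be modeled as a tiny state machine. The sketch below is an assumption about how a catalog might track an instance; the `ServiceInstance` class and its method names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    BUILD = "build"
    OPERATE_AND_MAINTAIN = "operate & maintain"
    DESTROY = "destroy"


# Each phase may only move forward to the next one.
NEXT_PHASE = {
    Phase.BUILD: Phase.OPERATE_AND_MAINTAIN,
    Phase.OPERATE_AND_MAINTAIN: Phase.DESTROY,
}


class ServiceInstance:
    """Hypothetical record of one service created from the catalog."""

    def __init__(self, name):
        self.name = name
        self.phase = Phase.BUILD

    def advance(self):
        """Move to the next lifecycle phase, or fail if already destroyed."""
        if self.phase not in NEXT_PHASE:
            raise ValueError(f"{self.name} is already destroyed")
        self.phase = NEXT_PHASE[self.phase]
        return self.phase


db = ServiceInstance("test-db")
db.advance()  # the build finished, the service is now operated and maintained
print(db.phase.value)
```

An instance passes through build and destroy once each, while it can sit in operate & maintain for as long as the service is needed.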

Breaking Down the Operate & Maintain Phase

There are a number of possible activities in the operate & maintain phase, like:

  • Version updates to software and OS
  • Backup and recovery
  • Performance enhancements
  • Troubleshooting
  • Large data loads

This list highlights some of the key activities involved during the operate & maintain phase of a service. These can be very big tasks, especially if there is no shared responsibility in place. In a non-shared responsibility scenario, you as a service consumer have to do it all! In a shared responsibility model, by contrast, more is put on the service team, who are the service experts.

The top four activities in the operate & maintain list above should be handled mostly by the service team.  In most cases, they should own it 100%, but this will vary from one service to the next.  However, the service documentation should call out in detail the shared responsibilities and what the service will do and what is left for the service consumer.  Beyond documentation, a service needs a builder API.

API to the Rescue

Instead of offering up a service via an IaC tool template, which only builds the systems making up the service, have an API offer up the service. The API builds the service’s systems, but it also enforces the service’s shared responsibilities. This is a big shift from just getting a service where the consumer operates and manages everything. However, the API can allow the consumer to opt in and out of certain shared responsibilities depending on what the service team implements.
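As a rough sketch of the idea, a builder API could record who owns each operate & maintain activity at build time and reject opt-outs the service team does not allow. Every name here (`ServiceRequest`, `build_service`, the activity labels, and which activities are optional) is hypothetical.

```python
from dataclasses import dataclass

# Activities the service team owns by default (assumed for illustration).
SERVICE_TEAM_DEFAULTS = frozenset(
    {"version_updates", "backup_and_recovery", "performance", "troubleshooting"}
)
# Activities the service team lets a consumer take over (also an assumption).
OPTIONAL = frozenset({"backup_and_recovery", "performance"})
# Activities that always stay with the consumer in this sketch.
CONSUMER_ALWAYS = frozenset({"large_data_loads"})


@dataclass(frozen=True)
class ServiceRequest:
    name: str
    opt_out: frozenset = frozenset()


def build_service(request):
    """Validate the request and return the agreed split of responsibilities."""
    not_allowed = request.opt_out - OPTIONAL
    if not_allowed:
        raise ValueError(f"cannot opt out of: {sorted(not_allowed)}")
    # A real implementation would also run the IaC tooling here.
    return {
        "name": request.name,
        "service_team": sorted(SERVICE_TEAM_DEFAULTS - request.opt_out),
        "consumer": sorted(CONSUMER_ALWAYS | request.opt_out),
    }


print(build_service(ServiceRequest("demo-db", frozenset({"backup_and_recovery"}))))
```

The point of the design is that the split of responsibilities is decided and recorded when the service is requested, not discovered when something breaks.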

When providing a service to an organization, there are a number of scenarios for how a service will be used. Not all service instances will be production-grade, so the service API needs to allow for variation in performance, scale, costs, and shared responsibilities, to name a few. AWS has these built into its APIs. Let’s take a look.

AWS Example of Builder API Supporting Operate & Maintain

One example of AWS building a number of these aspects into its service APIs is the Relational Database Service (RDS). Building a MySQL database in AWS RDS involves more than just clicking a button. The service consumer can edit the following:

  • Production-level or not
  • DB version
  • Cloud instance size
  • Multi-availability zone deployment
  • Storage type and size
  • Network and security settings
  • Database options
  • Backup
  • Monitoring
  • Maintenance preferences

As one can see, the API allows the service consumer a great amount of flexibility beyond just building a service. The AWS API for a MySQL database allows the service consumer to make a number of decisions, including ones about maintenance activities. All of these choices are part of the shared responsibility model and are enforced in the API. When building services, it is best to have a similar API. Now, let’s look at how to apply this to our “large data loads” operation activity from above.
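To make the list above concrete, here is a sketch that assembles request parameters for the RDS CreateDBInstance operation. The keys are real parameter names of that operation, but the helper function, the `production` flag, and the specific sizes and windows are illustrative assumptions. Credentials and network settings are left out, and nothing is sent to AWS.

```python
def mysql_instance_params(identifier, engine_version, production):
    """Build keyword arguments for RDS CreateDBInstance (illustrative values)."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "EngineVersion": engine_version,  # DB version chosen by the consumer
        "DBInstanceClass": "db.m5.large" if production else "db.t3.micro",
        "MultiAZ": production,  # multi-availability zone deployment
        "StorageType": "gp3",
        "AllocatedStorage": 100 if production else 20,  # size in GiB
        "PubliclyAccessible": False,
        "BackupRetentionPeriod": 7 if production else 1,  # days of automated backups
        # Enhanced monitoring is off here; a non-zero interval
        # also requires a MonitoringRoleArn.
        "MonitoringInterval": 0,
        "AutoMinorVersionUpgrade": True,  # maintenance preference
        "PreferredMaintenanceWindow": "sun:03:00-sun:04:00",  # UTC
    }


params = mysql_instance_params("loadtest-db", "8.0", production=False)
print(params["DBInstanceClass"], params["MultiAZ"])
```

With credentials added, a dictionary like this could be passed to `boto3.client("rds").create_db_instance(**params)`. The backup, upgrade, and maintenance-window keys are where the consumer tells AWS how much of the operate & maintain work it should take on.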

Handling a Large Dataset with a Service API

The operate & maintain activity list above mostly focuses on maintenance, with only one operate activity: the large data load. A large data load can be a time-consuming activity, especially when you want to do realistic load or performance testing. Imagine loading a production-size dataset into an enterprise-scale system. How many hours do you think it will take? Should the service consumer need to do this every time? Is there a better way to handle this scenario and similar situations?

If you have a service where a common use case is to load large datasets, then the service team may be able to help its consumers.  The service team can look at a number of solutions to address this issue.  Here is a short list of possible options:

  • Export data in a native format to expedite future loads
  • Snapshot the database for later use, data and cloud instances, then shut it down
  • Leave the server up, but move the storage to a lower-tier

These are some options available to support a specific operations scenario.  These can be built into a service API for the consumer.  It comes down to what the service consumer needs.  This requires interaction between the service teams and consumers.  In addition, service teams should have quality usage data available to see how people are using their services, so they can improve the service over time.
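One way to expose the three options listed above is as a single API operation that takes a strategy. This is a sketch under my own assumptions: the `ParkStrategy` enum, the `park_plan` function, and the step wording are hypothetical, and a real service team would replace the step strings with actual automation.

```python
from enum import Enum


class ParkStrategy(Enum):
    EXPORT_NATIVE = "export_native"
    SNAPSHOT_AND_STOP = "snapshot_and_stop"
    LOWER_TIER_STORAGE = "lower_tier_storage"


# Ordered steps the service team would automate for each strategy.
PLANS = {
    ParkStrategy.EXPORT_NATIVE: [
        "export data in the database's native format",
        "store the export for future loads",
    ],
    ParkStrategy.SNAPSHOT_AND_STOP: [
        "snapshot the data and the cloud instances",
        "shut the service down",
    ],
    ParkStrategy.LOWER_TIER_STORAGE: [
        "leave the server running",
        "move the storage to a lower tier",
    ],
}


def park_plan(strategy):
    """Return a copy of the steps for keeping a loaded dataset for later reuse."""
    return list(PLANS[strategy])


for step in park_plan(ParkStrategy.SNAPSHOT_AND_STOP):
    print("-", step)
```

The consumer picks the strategy that fits how soon the data is needed again, and the service team owns how each step is carried out.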

Final Thoughts on Cloud Self-Service

People hear about cloud self-service and think it is the answer to people not using the cloud. However, on its own it requires people to take full ownership of a service or application, down to every nut and bolt. This can be overwhelming, especially when something goes wrong. By backing these services with dedicated teams and builder APIs, cloud self-service can be much easier with shared responsibilities.

If you like this concept, then you may like the concept this is derived from, called Commercial-Grade Digital Services (CGDS).  You can learn more about CGDS at http://cgds.rocks/