DevProd Team - Core Vision: Clowder

This is the first in a series of posts that will explain the strategic goals of the Clouddot DevProd team, their key responsibilities, and how those responsibilities fit into their strategic goals. This post will focus on the rationale behind the creation of Clowder.

Strategic Goals

The DevProd team’s chief concern is to improve developer productivity by focusing on two strategic goals:

  • Experience: Improving developers' experience of operating inside a Kubernetes-based platform such as Clouddot, e.g. debugging, maintaining deployment resources, observability, logging.
  • Scalability: Investigating and documenting best practices and procedures that ensure consistent DevOps behaviours, and automating as many of these best practices as possible.

These goals are closely related but are worth calling out separately. Scalability could roll up under developer experience: a more scalable organization takes less effort to get the same amount of work done, which would be considered a better experience. But it’s worth calling out scalability because it’s such a key strategy for achieving a better developer experience, particularly as the Clouddot platform continues to grow rapidly.

Also, scalability is just describing productivity from a solution-based perspective: a solution is considered scalable if it makes the individuals in the organization more productive (which leads to a better overall experience).

Lastly, developers' productivity is of course a measurement of time. How long does it take for a developer to, e.g., onboard a new app onto the Clouddot platform, push a changeset to production, or integrate Redis into their app? The work Clowder automates saves significant time for both developers and SRE.

There is a lot of room to improve Clouddot’s experience and scalability, and Clowder is a key component in moving that experience forward.

Today’s Complex Environments

It is well known that the complexity and configurability of today’s environments can require many hours of research and experimentation in order to achieve a minimum viable deployment. From the deceptively simple task of building a lightweight, secure, and robust image to the more advanced stages of operations, such as pod scaling and monitoring, the knowledge required to excel in this landscape is vast.

Teams often have little time for researching solutions to a particular operational problem, and rightly so. A team should be focused on the development and stabilization of features for their application in order to provide business value as quickly as possible. What often transpires as a result of having no pre-set standard and a tight application-centric focus is a number of different, sometimes divergent, solutions to the same problem. This can make it very difficult to achieve a consistent standard in operations across the platform. This consistency is key to building a scalable platform: consistent operations are more automatable, more understandable, and more easily documented.

As an example, take the simple case of documentation tooling. It could be expected that combining documentation from multiple projects into a single place with a consistent look and feel would be rather straightforward to achieve. Yet the variety of tooling utilized across the ecosystem has resulted in a non-trivial effort, consisting of converters and other scripts to try to reach a singular portal for documentation.

Curated Toolchains for Everyone

The DevProd team works to curate the set of tools for developing and operating applications on the Clouddot platform. Because they operate as a team outside of any one application, the DevProd team can take a wider view of the requirements and offer solutions that will provide the biggest net benefit to the entire platform.

The end result may not be the optimal solution for each individual team, but keeping an approach consistent across the platform at the cost of, e.g., a 5% performance penalty is usually worth the added scalability.

Codifying the Operational Logic

Whilst these best practices and tools can be provided to developers as documents and scripts, the onus is then once again on the developers to understand and adopt them. This consumes already precious resources and is open to errors resulting from input mistakes (PEBKAC), misinterpretations of specifications, and intentional or unintentional non-compliance.

The DevProd team chose to minimise this risk with Clowder. By codifying these practices into an operator and promoting convention over configuration, Clowder can automatically configure applications to align with best practices and standards, while at the same time reducing the developer knowledge necessary to maintain an optimal environment.

Operational Benefits

The scalability benefits Clowder brings also extend to the front-line support teams (i.e. SRE) dealing with operations. These teams traditionally spend a portion of their time orienting themselves to each app they are responsible for. This process is time-consuming and leaves room for human error, potentially leading to incorrect assumptions being made about application configuration or process flow.

With the responsibility of building and deploying Kubernetes resources delegated to Clowder, certain variables that were previously left up to personal preference, for example service port numbers, can be assumed for every single application, eliminating costly ambiguity when operations teams are trying to understand an application’s entrypoints.
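The idea of assuming values by convention can be illustrated with a minimal sketch. It is not Clowder's implementation: the port numbers, the `Endpoints` type, the `endpoints_for` helper, and the hostname pattern are all hypothetical, chosen only to show how a platform-wide convention lets tooling (or an on-call engineer) derive an app's entry points without consulting per-app configuration.

```python
from dataclasses import dataclass

# Hypothetical platform-wide conventions; the values are illustrative only
# and are not taken from the Clowder documentation.
WEB_PORT = 8000
METRICS_PORT = 9000


@dataclass(frozen=True)
class Endpoints:
    """Entry points that can be derived without reading per-app config."""
    web: str
    metrics: str


def endpoints_for(app_name: str, namespace: str) -> Endpoints:
    """Derive an app's entry points purely from convention.

    The hostname pattern below is an assumption made for this sketch.
    """
    host = f"{app_name}.{namespace}.svc"
    return Endpoints(
        web=f"http://{host}:{WEB_PORT}",
        metrics=f"http://{host}:{METRICS_PORT}/metrics",
    )


if __name__ == "__main__":
    # The same call works for any app, because no app picks its own ports.
    print(endpoints_for("inventory", "prod"))
```

Because every value comes from the convention rather than from the app, the same lookup works for any application on the platform.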

Clowder gives operations teams (and other interested devs) a single point of entry to understand an application, with all relevant information contained in a single document (the ClowdApp). Knowledge gained from understanding one app is often transferable to another, decreasing problem investigation time and increasing platform operational awareness.

As both operations and developers gain an understanding of Clowder and the deployment models used, there is also a shared language they can use to describe an app’s operation, fostering tighter cohesion between the two.

Insulation from Environmental and Requirement Changes

By reducing developers’ interactions with environments to a single configuration document and letting Clowder deploy and configure their applications and prerequisite services, developers are automatically insulated from the changes that occur in ever-evolving operational environments. These changes will instead be implemented inside Clowder and automatically applied at the correct time.

As an example, the arduous task of complying with FedRAMP specifications could well be reduced by, say, 30% by having an application be managed by Clowder. Since Clowder can ensure that certain requirements of FedRAMP are implicitly met, developer teams are left with a reduced workload. As and when these requirements are reviewed and updated, Clowder can be modified to reflect those changes, insulating the development teams from any further effort.

Devs also benefit from configuring their apps using a single JSON document mounted in their app’s container. Devs can depend on this document being consistent regardless of the operational variables outside the application, e.g. using an RDS database versus a local DB, or using Redis versus ElastiCache. (Side note: mounting files into a container is a more secure way to provide sensitive information to apps than environment variables, since environment variables are so often dumped to logs.)
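As a rough sketch of what consuming such a document might look like from the app's side, the following reads a mounted JSON file and extracts database settings. The field names (`database`, `hostname`, `port`, `name`), the `DatabaseConfig` type, and both helper functions are assumptions made for illustration; the actual schema and mount path should be taken from the Clowder documentation.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    # Field names are assumed for this sketch, not taken from Clowder's schema.
    hostname: str
    port: int
    name: str


def parse_database_config(text: str) -> DatabaseConfig:
    """Extract database settings from the JSON config document."""
    db = json.loads(text)["database"]
    return DatabaseConfig(
        hostname=db["hostname"],
        port=int(db["port"]),
        name=db["name"],
    )


def load_database_config(path: str) -> DatabaseConfig:
    """Read the mounted config file; the path is supplied by the caller."""
    with open(path, encoding="utf-8") as fh:
        return parse_database_config(fh.read())
```

The app code is identical whether the document describes an RDS instance or a local database: only the values in the mounted file change, not the shape the app reads.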

As the number of these changes continues to grow, with the advent of requirements like service meshes, advanced autoscaling configurations, and canary deployments, developers onboarding to Clowder will be able to reduce time spent on researching operational resources and instead spend more time delivering business value by leveraging the reasonable default configurations that Clowder provides.

Read more about Clowder here