    May 29, 2024

    You build it, you own it: Lessons learned from the trenches of cloud modernization CI/CD

    At one of my clients a few years back, the business decided that we no longer wanted to build and maintain our systems on-premises, and AWS was the chosen cloud provider to facilitate this goal. This decision was easy to understand when dealing with systems supporting "visible" products that often cater to external end users, directly impact ROI, and frequently get mentioned by senior management. But what about the less visible systems, such as continuous integration and code deployment (CI/CD) pipelines? I’ve learned quite a bit in the process.  

     

    My first foray into the Cloud was a lift-and-shift, where all on-prem technical assets were moved to or re-created in the Cloud. All servers were created in AWS, mimicking the specs of the on-prem systems as closely as possible. All staff received laptops that connected to each user's virtual AWS workstation. Developers and testers got replicas of their on-prem development, QA, and UAT environments. One of the systems was particularly massive, and there were some technical hurdles with replicating the larger databases in AWS that were resolved quickly. Ultimately, the lift-and-shift project was quite successful and delivered on time and within budget. So far, so good! 

     

    The next initiative was to modernize our systems to make them more Cloud native, particularly by taking advantage of serverless technologies. We quickly found that the existing CI/CD tools were insufficient, and a new DevOps team was spun up to build a system that would build and deploy the modernized products. The mandate was that every system be built and/or provisioned through automation by writing and executing infrastructure as code (IaC) - no manual work allowed.

     

    As if out of thin air, several projects containing thousands of lines of code sprang to life. Initially, this code base was generic and opinionated: it handled the standard cases, provided the resources it created followed certain naming conventions. Provisioning databases and APIs seemed straightforward. But as more environments were added to support user acceptance, beta, and automated testing, the code base grew and became more and more custom. It became littered with conditional statements that checked which environment was being built in order to assign the appropriate security policies and decide which Virtual Private Cloud (VPC) to provision a system in. Developers took different approaches to provisioning certain assets, making the code less cohesive and reusable. The CIO remained content with our progress, seeing this parallel code base as a necessary evil.
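    A minimal, purely illustrative sketch of how that per-environment branching tends to look (every environment name, VPC ID, and policy name below is hypothetical, not from the actual code base). Each new environment or exception adds another branch, and the branches multiply across every resource type:

```python
# Hypothetical example of environment-conditional IaC logic.
# Every name here is invented for illustration.
def select_network_config(environment: str) -> dict:
    """Pick a VPC and security policy based on the target environment."""
    if environment in ("dev", "qa"):
        return {"vpc": "vpc-nonprod", "policy": "permissive-dev-policy"}
    elif environment in ("uat", "beta"):
        return {"vpc": "vpc-preprod", "policy": "restricted-test-policy"}
    elif environment == "prod":
        return {"vpc": "vpc-prod", "policy": "least-privilege-policy"}
    else:
        # Every environment not anticipated here forces another edit.
        raise ValueError(f"Unsupported environment: {environment}")
```

    The trouble is that this decision logic gets duplicated, with slight variations, wherever a resource needs a network or a policy - which is exactly how the code base stopped being cohesive and reusable.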

     

    Then, one day, a new type of system needed to be provisioned whose technical stack was not supported by the CI/CD code base. There were complicated dependencies dictating which resources had to be built before others could be provisioned. Thousands more lines of code were added to support this system, with more custom logic and conditional statements. The number of bugs introduced grew exponentially, forcing the team to shift its focus to solely supporting the CI/CD code base, which was now a full-fledged product. It appeared that if the team could just get through the existing requirements, technical hurdles, and bugs, the CI/CD product would get "there." But there was no "there" in this case. Like any other large product, it required dedicated resources to enhance, monitor, and maintain (also known as you build it, you own it!). That is not to say that internal infrastructure projects are unimportant, but this code base now consumes a great deal of resource time - a costly investment. It has shifted the focus from other core business efforts in an environment where frequent releases do not generate significant incremental business value. Suffice it to say that the CIO was not content with this outcome.
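    The build-order problem described above is, at its core, a topological sort over a resource dependency graph. A small sketch using Python's standard-library graphlib, with an invented resource graph (the resource names are hypothetical, not the client's actual stack):

```python
from graphlib import TopologicalSorter

# Hypothetical resource dependency graph: each resource maps to the
# set of resources that must exist before it can be provisioned.
dependencies = {
    "api_gateway": {"lambda_fn"},
    "lambda_fn": {"iam_role", "database"},
    "database": {"vpc"},
    "iam_role": set(),
    "vpc": set(),
}

# static_order() yields a valid provisioning order, dependencies first,
# and raises CycleError if the graph contains a cycle.
provision_order = list(TopologicalSorter(dependencies).static_order())
```

    Encoding dependencies declaratively like this, instead of hand-ordering them with conditionals, is one way the custom logic described above could have stayed smaller.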

     

    As I look back on this experience, here are a few lessons I learned that may be helpful to you and your organization as you tackle a similar challenge: 

     

    1. Delaying infrastructure automation until there is an in-depth understanding of the technical resources your product portfolio requires will result in a more robust and maintainable CI/CD product.
    2. Be careful when asking your DevOps teams to scramble to produce a quick fix to support a new system that does not exactly fit into the existing stack. Instead, wait a bit and manually deploy. You may eventually discover that you don’t get the return on the investment required to automate the creation of certain resources.
    3. Beware of technical debt creeping into backend infrastructure systems - it will increase maintenance costs and inflate your total cost of ownership.
    4. Avoid writing custom infrastructure code like the plague. Whenever possible, use a third-party software-as-a-service (SaaS) provider; its cost will typically be far less than the cost of custom development and maintenance.

     

    As it has always been in software, only more so in the cloud - you built it, now you own it. Good luck out there. If you want to chat about some of the challenges you’re having, we’d love to hear from you!
