Australian collaborationware firm Atlassian has revealed it’s spent four years trying to reduce dangerous internal dependencies. While it has rebuilt its PaaS, it still has issues, but thinks they’re now manageable.

As explained in a Tuesday post by Senior Engineering Manager Andrew Ross, “Atlassian runs a large service-based platform with hundreds of different services, most deployed by our custom orchestration system, ‘Micros’.”

Micros handles over 2,000 services and 5,000-plus daily deploys, and works on over 40,000 DynamoDB tables and 80,000-plus Amazon Relational Database Service (RDS) tables. It also manages three million Lambda functions.

Another piece of Atlassian’s infrastructure is a private Docker registry called “Artifactory.”

In 2021, Atlassian deployed Artifactory using Micros, and the Micros platform depended on Artifactory at deployment and runtime. That circular dependency meant a failure in either of the tools would make it impossible to recover the other.
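
A loop like this can be found mechanically once dependencies are written down as a graph. The following is a minimal sketch, not Atlassian tooling: the function name `find_cycle` and the two-node graph are assumptions made for illustration, and the approach is an ordinary depth-first search.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {node: WHITE for node in deps}
    for targets in deps.values():
        for target in targets:
            colour.setdefault(target, WHITE)
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in deps.get(node, []):
            if colour[nxt] == GREY:
                # nxt is already on the current path, so we have looped back.
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(colour):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical graph mirroring the 2021 situation: each tool needs the other.
before = {"micros": ["artifactory"], "artifactory": ["micros"]}
# Hypothetical graph in which the registry no longer depends on Micros.
after = {"micros": ["artifactory"], "artifactory": ["kubernetes"]}

cycle_before = find_cycle(before)
cycle_after = find_cycle(after)
```

In this sketch `cycle_before` holds the loop between the two tools, while `cycle_after` is `None` because the back edge is gone.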

And that’s trouble for Atlassian, given it’s a SaaS shop and, at the time it started to tackle dependencies, was about to shift customers from on-prem products to the cloud.

Atlassian's dependency analysis for a subset of its platform

The company created a Continuous PaaS Recovery (CPR) project to address as many dependencies as it could.

As that project progressed, Atlassian realized it couldn’t remove all dependencies “due to their volume and complexity.” It therefore prioritized unpicking dependency tangles that made it hard to recover services.

In 2023, the company staged a tabletop disaster recovery exercise that simulated 6.5 days of recovery efforts, to help staff understand and identify risks.

Ross’s post illustrates the results of that exercise with the images below, which show recovered services in green, and unrecovered services that have dependency tangles in pink. In the “before” shot, at left, three services were alive. In the “after” shot, dozens of services remained down due to dependencies.

The results of Atlassian's tabletop DR exercise

Atlassian has now re-architected its platform into what Ross described as a “layer cake.”

“We decided to divide the cloud infrastructure into layers, with the lowest layers having the fewest dependencies and upper layers having many dependencies,” he wrote. This new cake isn’t free of dependencies because Atlassian doesn’t think it’s possible or practical to eliminate all of them. Instead, the company has learned to live with them using the following principles:

  • A component in layer (N) can only have hard dependencies on lower layers (N → N-1 = Good).
  • No hard dependencies on the same layer (N → N = Bad).
  • No hard dependencies on higher layers (N → N+1 = Bad).
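
The three rules above boil down to one comparison per hard dependency, which makes them easy to check automatically. The sketch below is a guess at how such a check might look, not a description of Atlassian's tooling: the function `layer_violations`, the component names, and the layer numbers are all hypothetical, and it assumes "lower layers" means any layer with a smaller number, not only N-1.

```python
from typing import Dict, List, Tuple


def layer_violations(
    layers: Dict[str, int],
    hard_deps: List[Tuple[str, str]],
) -> List[str]:
    """Report every hard dependency that does not point to a lower layer."""
    violations = []
    for src, dst in hard_deps:
        if layers[dst] == layers[src]:
            kind = "same layer"      # N -> N is not allowed
        elif layers[dst] > layers[src]:
            kind = "higher layer"    # N -> N+1 is not allowed
        else:
            continue                 # N -> lower layer is fine
        violations.append(
            f"{src} (layer {layers[src]}) -> {dst} (layer {layers[dst]}): {kind}"
        )
    return violations


# Hypothetical layer assignments; 0 is the bottom of the cake.
layers = {"network": 0, "registry": 1, "deployer": 2, "app": 3, "metrics": 3}

# Hypothetical hard dependencies, written as (dependent, dependency).
hard_deps = [
    ("registry", "network"),   # allowed: points down
    ("app", "deployer"),       # allowed: points down
    ("app", "metrics"),        # not allowed: same layer
    ("registry", "deployer"),  # not allowed: points up
]

problems = layer_violations(layers, hard_deps)
```

A check of this kind could run whenever a dependency list changes, so that a new upward or sideways edge is caught before it becomes a recovery problem.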

The company has also migrated Artifactory from Micros to Kubernetes, eliminating a critical circular dependency, and built a new low-dependency provisioning system called Atlassian Platform Deployer (APD) that uses AWS CloudFormation as its deployment orchestration engine.

APD helped the company to create and deploy its recently announced Government Cloud. After many further adventures, Atlassian migrated Micros itself to APD.

The company still has internal circular dependencies but has eliminated many of them and feels it now operates a more reliable platform that’s easier to recover.

It needs to, because Atlassian recently announced a plan to ditch its on-prem products and move all customers to its cloud. And those customers might rightly be wary of that move, given that circular dependencies were big factors in recent outages at Cloudflare and AWS. ®

