In my two-part series I published a reference architecture well suited to digital transformations. Measuring the success of an architecture is tricky, but there are aspects you can evaluate against published best practices and guidance. One of my favorites is the AWS Well-Architected Framework. You can use it to assess whether your architecture is set up to handle whatever you and your business need.
So what is it?
The Well-Architected Framework has been developed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. Based on five pillars — operational excellence, security, reliability, performance efficiency, and cost optimization — the Framework provides a consistent approach for customers and partners to evaluate architectures, and implement designs that will scale over time.
The AWS Well-Architected Tool is now available. The user guide can be found here.
So now that you have the overview, let's try to apply it to my reference architecture. We will go into each pillar and see how I have applied that lens.
The operational excellence pillar includes the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures. Factors like business KPIs and customer insights are important for knowing how to keep your applications healthy and reliable. To maintain application health I rely on system metrics from CloudWatch, X-Ray, and APM tools: memory utilization, CPU, external/internal latency, HTTP errors, and so on. In a production-like environment, you can go by incident reports or a ticket dashboard to see how applications have fared over time.
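As a sketch of what "going by the metrics" can look like, here are two tiny helpers for summarizing CloudWatch-style samples into health signals. The helper names, sample values, and the nearest-rank percentile method are my own illustrative choices, not an AWS API:

```python
# Illustrative helpers (my own, not boto3) for turning raw metric samples
# into the health signals mentioned above: tail latency and error rate.

def p95(values):
    """95th-percentile sample, using the nearest-rank method."""
    ordered = sorted(values)
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[idx]

def error_rate(http_5xx, total_requests):
    """Fraction of requests that failed with HTTP 5xx errors."""
    return http_5xx / total_requests if total_requests else 0.0

# Pretend these came from CloudWatch/X-Ray for one application over an hour.
latencies_ms = [120, 95, 310, 150, 101, 98, 450, 130, 99, 105]
print(p95(latencies_ms))
print(error_rate(12, 4800))
```

Feeding a dashboard from summaries like these, rather than raw datapoints, is usually what makes the "is the app healthy?" question answerable at a glance.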
The security pillar includes the ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies. It is crucial for the success of the business outcomes. I go by the minimalist principle: everything is denied by default and allowed only as needed for infrastructure protection. I use AWS IAM roles and policies to grant services the minimum access they need. With AppSync and API Gateway, I used Cognito user pools and a Lambda authorizer to provide secure access from mobile and web clients. To protect sensitive data, the application owns encryption and masking as required, on top of the encryption features AWS provides. Encryption at rest and in transit is available in most services; for example, AWS DynamoDB defaults to AES-256 encryption at rest, and you can secure the connection with an IAM role.
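The deny-by-default principle shows up concretely in how narrow an IAM policy can be. Here is a minimal example policy, built as a Python dict; the table name, account ID, and region in the ARN are made-up placeholders:

```python
import json

# Minimal least-privilege IAM policy sketch: only the two DynamoDB actions
# this service actually needs, scoped to a single (illustrative) table ARN.
# Everything not listed here stays denied by default.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Attaching a policy like this to the Lambda's execution role means a compromised function still cannot scan other tables or touch other services.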
The reliability pillar, as the name suggests, is about the system's ability to recover from disruptions at the infrastructure and service level. To achieve this I look at the limits set for the applications and the HA infrastructure supporting them. Serverless doesn't need autoscaling the way other compute does, but it still has concurrency limits, and if you don't know those limitations it's doomsday.
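To make the concurrency-limit point concrete, here is a small sketch (my own helper, not an AWS API) that checks whether per-function reserved concurrency still leaves headroom under the account-wide Lambda concurrent-execution limit. The function names and allocations are invented; the floor of 100 reflects AWS's requirement that at least 100 unreserved concurrent executions remain:

```python
# Sketch: verify reserved-concurrency allocations leave headroom under the
# account-wide Lambda limit (1,000 concurrent executions by default).
# AWS requires at least 100 executions to stay unreserved.

def has_headroom(account_limit, reserved, min_unreserved=100):
    """True if the unreserved pool stays at or above min_unreserved."""
    return account_limit - sum(reserved.values()) >= min_unreserved

# Hypothetical per-function reserved concurrency for one account.
reserved = {"order-api": 300, "payment-worker": 200, "report-batch": 350}
print(has_headroom(1000, reserved))
```

Running a check like this in CI before deploying new reserved-concurrency settings is one cheap way to avoid the "doomsday" surprise in production.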
I always set up DDoS protection (for production) and throttling, so that under unexpectedly high traffic the application can reject requests and still function. I also try to put every requirement through the lens of event-driven architecture, since distributed transactions are hard to make reliable. If I must have distributed transactions, I keep the approach fail fast and fail over. For event-driven architecture too, I keep deletes manual and set up DLQs to preserve data reliability.
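The DLQ setup is worth showing. For SQS, a redrive policy says: after `maxReceiveCount` failed receives, move the message to a dead-letter queue instead of losing it. The queue name and ARN below are illustrative; SQS expects the `RedrivePolicy` attribute value as a JSON string:

```python
import json

# Sketch of an SQS redrive policy: messages that fail processing
# maxReceiveCount times land in the dead-letter queue for manual inspection,
# which matches the "keep deletes manual" stance above.
redrive_policy = {
    "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
    "maxReceiveCount": 5,
}

# SQS queue attributes take the redrive policy as a JSON-encoded string.
attributes = {"RedrivePolicy": json.dumps(redrive_policy)}
print(attributes["RedrivePolicy"])
```

With this in place, a poison message degrades into a DLQ item to triage later rather than an infinite retry loop or silent data loss.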
The performance efficiency pillar focuses on the efficient use of computing resources to meet requirements and the maintenance of that efficiency as demand changes and technologies evolve. With the cloud we don't have to do traditional sizing and capacity planning to the nth degree, but initial compute size, memory, and so on still need to be set. I want to keep learning, and that's where operational analytics helps me right-size over time: I check Lambda invocations in CloudWatch to see how much memory and time they consume over a period, then tune accordingly. The biggest drawback I have seen is that people treat the cloud as infinite capacity and, instead of fixing bad code, simply latch on to larger resources. I firmly believe code optimization remains mandatory in the cloud-native world. Lastly, as mentioned in the reliability pillar, putting the event-driven lens on my use cases helped me solve a lot of performance problems.
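The right-sizing loop can be sketched in a few lines. CloudWatch Logs reports a "Max Memory Used" value per Lambda invocation; a simple (and admittedly naive) rule is to take the observed peak plus some headroom. The helper, sample values, and 20% headroom figure are my own assumptions:

```python
# Illustrative right-sizing helper: pick a Lambda memory setting from observed
# "Max Memory Used" values, with headroom for spikes. The 20% headroom and
# the 128 MB floor (Lambda's minimum) keep the recommendation sane.

def recommend_memory_mb(max_used_samples, headroom=0.2, floor=128):
    peak = max(max_used_samples)
    return max(floor, int(peak * (1 + headroom)))

# Pretend these are Max Memory Used readings (MB) from recent invocations.
samples_mb = [140, 155, 162, 149, 171]
print(recommend_memory_mb(samples_mb))
```

Re-running a calculation like this periodically, rather than guessing once at design time, is the "learn from operational analytics" habit in practice. Note that in Lambda, memory also scales allocated CPU, so under-sizing can cost time as well as reliability.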
Cost optimization is a no-brainer. I don't want my higher-ups coming to me after go-live with a bill to explain. Kidding aside, I always set an ROI for a use case, and that is what this pillar is: the continual process of refining and improving a system over its entire lifecycle. In my use case, I used serverless components, which are easier to manage in terms of correct resource allocation. Almost no sizing is required when architecting, and the services scale on demand, but as mentioned in the pillars above, monitoring and optimizing memory and execution time still saves cost.
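A back-of-envelope estimate shows why memory and duration tuning translate directly into the bill: Lambda charges per GB-second of compute plus per request. The prices below are illustrative sample values and the workload numbers are invented; check current AWS pricing before relying on them:

```python
# Back-of-envelope Lambda cost sketch. Prices are illustrative samples,
# not authoritative -- consult the current AWS pricing page.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20

def monthly_cost(invocations, avg_duration_s, memory_mb):
    """Estimated monthly Lambda cost: compute (GB-seconds) + request charges."""
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    compute = gb_seconds * PRICE_PER_GB_SECOND
    requests = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return round(compute + requests, 2)

# Hypothetical workload: 5M invocations/month, 300 ms average, 256 MB.
print(monthly_cost(5_000_000, 0.3, 256))
```

Because cost is linear in both memory and duration here, halving either one roughly halves the compute portion of the bill, which is exactly why the tuning from the performance pillar pays for itself.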
Connect with Me
If you are interested in a conversation or a chat, please reach me on my LinkedIn.