As a Senior Technical Product Manager (hands-on with day-to-day coding, contributing to features as an IC) working on a large-scale distributed microservice architecture for an ecommerce platform, I was faced with the challenge of reducing our AWS bill, which was exceeding $7,000 per month.
After conducting a thorough analysis of our infrastructure and codebase, I implemented several measures that brought costs down to under $2,000 per month.
Here are the steps I took:
MySQL queries can be a major bottleneck in application performance, leading to slow response times and increased resource utilization. By identifying and optimizing slow queries, you can significantly improve application performance and reduce AWS costs.
To identify slow MySQL queries, you can use a variety of tools, including Laravel Telescope, MySQL’s slow query log, and performance monitoring tools like New Relic or Datadog.
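If you are not on Laravel, MySQL's slow query log is a good starting point; it can be enabled at runtime (the one-second threshold and log path here are just examples):

```sql
-- Log every statement that takes longer than 1 second.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
```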
For my applications, I used Laravel Telescope, since they run on Laravel. Telescope let me focus my optimization efforts on the parts of the application that would yield the most significant performance gains.
The EXPLAIN statement is a powerful tool for analyzing the performance of MySQL queries. It shows how MySQL will execute a given query: which indexes are considered and used, how tables are joined, how many rows must be examined, and how data is sorted. By running our slow queries through EXPLAIN, I was able to pinpoint the parts that needed optimization, such as missing indexes or inefficient join operations.
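For instance (the table and column names here are hypothetical), prefixing a slow query with EXPLAIN shows whether an index is being used:

```sql
-- Analyze how MySQL will execute the query; no rows are actually fetched.
EXPLAIN SELECT id, name, price
FROM products
WHERE category_id = 42
ORDER BY created_at DESC;

-- Key things to check in the output:
--   type: "ALL" means a full table scan; "ref"/"range" means an index is used
--   key:  which index, if any, was chosen
--   rows: MySQL's estimate of how many rows it must examine
```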
Indexes are an essential tool for optimizing database performance. By adding indexes to the columns used in WHERE clauses, I was able to dramatically speed up the execution of our MySQL queries.
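Using the hypothetical products table again, an index on a frequently filtered column can be added like this:

```sql
-- Add a secondary index on the sku column of the products table.
ALTER TABLE products ADD INDEX idx_products_sku (sku);

-- Equivalent standalone form:
-- CREATE INDEX idx_products_sku ON products (sku);
```

Indexes speed up reads at the cost of extra writes and storage, so it's worth indexing only the columns that queries actually filter or join on.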
By creating indexes on frequently queried columns, you can significantly improve query performance and reduce AWS costs.
When working with alphanumeric string columns, it’s important to use proper indexing and string matching techniques to improve query performance. One common mistake is to use the wrong data type when querying the column, which can lead to poor performance and increased resource utilization.
In the case of an alphanumeric string column like sku, a query that quotes the value as a string (i.e., WHERE sku = '1882900') matches the index and performs well. MySQL can use the B-tree index to do an efficient string comparison and quickly find the rows that match the query condition.
However, if the query passes a bare numeric value (i.e., WHERE sku = 1882900), MySQL treats the value as a number rather than a string and must implicitly cast every sku value before comparing. That prevents it from using the index, forcing a full table scan, which can be slow and resource-intensive.
To avoid this issue, always quote values when querying string columns so the literal's type matches the column's type. Keeping queries on the index reduces resource utilization and leads to a more efficient, cost-effective database infrastructure.
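Side by side, using the hypothetical products table, the two forms look like this:

```sql
-- String literal: matches the B-tree index on sku, resolved by index lookup.
SELECT * FROM products WHERE sku = '1882900';

-- Numeric literal: MySQL must cast every sku value to a number before
-- comparing, so the index cannot be used and a full table scan results.
SELECT * FROM products WHERE sku = 1882900;
```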
One way to optimize AWS Lambda costs is to tighten function timeouts. AWS charges for the actual duration of each invocation (historically rounded up to the nearest 100ms, now metered per millisecond), so a healthy invocation costs the same regardless of the timeout. What the timeout caps is the worst case: an invocation stuck waiting on a slow downstream dependency keeps running, and being billed, until the timeout fires.
For example, if a function normally completes in two seconds but occasionally hangs on a slow query, lowering its timeout from 60 seconds to 30 seconds halves the billed duration of every stuck invocation. Across many functions and many invocations, that adds up.
Since our queries were already optimized, our functions no longer needed generous timeouts, so tightening them was an easy win; the right value will be different for every workload.
API Gateway caching is a powerful tool for reducing the load on your backend by caching the responses to frequently accessed API endpoints. By enabling caching for our most frequently accessed endpoints, we were able to reduce the number of requests hitting our backend, resulting in lower AWS costs.
After optimizing our MySQL queries and adding proper indexes, we were able to reduce the size of our RDS instances, which lowered our monthly costs. It’s important to monitor the performance of your RDS instances and adjust their sizes accordingly to avoid overprovisioning and incurring unnecessary costs.
CloudWatch logs can quickly accumulate, resulting in high storage costs. By reducing the retention period for CloudWatch logs, we were able to save on storage costs. It’s important to balance the need for log retention with the cost of storage and adjust the retention period accordingly.
CloudFront can also cache API responses at the edge, closer to your users. By putting CloudFront in front of our most frequently accessed APIs, we lowered both latency and our AWS costs. However, it's important to consider the cacheability of your API responses (authentication, personalization, freshness requirements) and adjust the caching strategy accordingly.
An ecommerce search service can involve generating filters, synonyms, fuzzy matching, aliasing, and masking. Each of these tasks queries the database for the information it needs, which can mean a large number of queries being executed for every single search.
To optimize the search service, it’s important to reduce the number of queries being executed while still maintaining the necessary functionality. This can be achieved through techniques such as caching and optimization of the search algorithm.
For example, instead of querying the database for every filter and synonym needed for a search query, the service can pre-cache commonly used filters and synonyms, and only query the database for those that are not cached. Additionally, the search algorithm can be optimized to reduce the number of queries needed to generate the search results.
By reducing the number of queries needed for each search, the search service can be made more efficient, leading to faster search results and lower resource utilization. This can ultimately result in cost savings and improved user experience for customers.
Another way to optimize your database performance is by selecting only the columns that you actually need in your queries, rather than using the wildcard (*) to select all columns. This can reduce the amount of data that needs to be retrieved from the database and improve query performance.
For example, instead of using the query SELECT * FROM products, you could use SELECT id, name, price FROM products to retrieve only the columns that are needed (the table and column names here are illustrative).
When running a staging environment, it’s important to conserve resources and minimize costs when the environment is not being used. One way to do this is by turning off the staging environment during non-working hours and weekends, when developers are not actively working on it.
For example, you could use AWS Lambda functions to automatically start and stop the staging environment based on a schedule. This can help reduce costs by only running the environment when it’s needed.
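The scheduling decision itself is simple. Here is a sketch of the check such a start/stop Lambda might perform (the working hours are an assumption, and the actual starting and stopping of instances would go through the AWS SDK):

```python
from datetime import datetime

# Hypothetical working hours: Monday-Friday, 08:00-20:00.
WORK_START_HOUR = 8
WORK_END_HOUR = 20

def staging_should_run(now: datetime) -> bool:
    """Return True if the staging environment should be up at `now`."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return WORK_START_HOUR <= now.hour < WORK_END_HOUR

# A scheduled Lambda could call this and then start or stop the
# environment's instances via the AWS SDK accordingly.
```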
When designing your database schema, it's important to keep the number of table joins required by your hottest queries in check, since each join adds complexity and can slow query performance. Normalization reduces redundancy by breaking data into smaller, related tables, but it also adds joins; for read-heavy query paths, it can be worth selectively denormalizing (duplicating a few frequently read columns) so those queries avoid expensive joins.