The Challenge

The customer is one of the largest healthcare providers in the United States. The customer had become increasingly dissatisfied with a costly Teradata platform that hindered its ability to meet reporting and business goals related to performance and scalability.
In addition to freezing transaction IDs to prevent wraparound, autovacuum also removes dead tuples to reclaim space. For databases with a high volume of write operations, it is recommended that you tune autovacuum to run frequently.
Doing this helps you avoid the accumulation of dead tuples that bloat tables and indexes. In this post, I use a case study to demonstrate how to monitor and tune the autovacuum process in such circumstances.

What is a dead tuple?

When a row is updated, a new version of the row, known as a tuple, is created and inserted into the table.
The old version of the row, referred to as a dead tuple, is not physically removed but is marked as invisible to future transactions. Because every row can have multiple versions, PostgreSQL stores visibility information inside each tuple to help determine whether that tuple is visible to a given transaction or query based on its isolation level.
A dead tuple might still be visible to older transactions that are still open.
If a dead tuple is not visible to any transaction, the vacuum process can remove it by marking its space as available for future reuse.
You can find a good explanation of vacuuming for recovering space in the PostgreSQL documentation.

The importance of removing dead tuples is twofold. Dead tuples not only decrease space utilization, but they can also lead to database performance issues. When a table has a large number of dead tuples, its size grows far beyond what it actually needs, a condition usually called bloat.
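One way to watch for bloat is to track the share of dead tuples per table. The sketch below assumes the counters come from the `pg_stat_user_tables` view (`n_live_tup`, `n_dead_tup`); the table names and numbers are hypothetical, and the query is shown only as a string because connection details depend on your setup.

```python
# Query against the standard statistics view; run it with the client of your choice.
DEAD_TUPLE_SQL = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
"""


def dead_ratio(n_live_tup: int, n_dead_tup: int) -> float:
    """Return the fraction of a table's tuples that are dead (0.0 for an empty table)."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0


# Hypothetical rows shaped like (relname, n_live_tup, n_dead_tup).
sample_stats = [("orders", 900, 100), ("audit_log", 0, 0)]
for relname, live, dead in sample_stats:
    print(f"{relname}: {dead_ratio(live, dead):.1%} dead tuples")
```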
The following Amazon CloudWatch graph shows an example of the problems that I encountered. It can be seen as a summary of my tuning experience. During the first few weeks after migration, several databases experienced up to 25, Read IOPS spikes even though there was no increase in load.
I noticed two problems with the autovacuum sessions. First, the default three autovacuum sessions had been running for a long time while vacuuming tables. With the threshold in effect, a table became eligible to be vacuumed only when more than 10 percent of its tuples were dead. However, many of my tables were big, with hundreds of millions of rows.
When those tables reached this 10 percent threshold, their dead tuples had already grown into millions.
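To make the arithmetic concrete, here is a rough sketch of how many dead tuples accumulate before a table becomes eligible. It assumes the formula from the PostgreSQL documentation (base threshold plus scale factor times the number of tuples), a scale factor of 0.1 to match the 10 percent mentioned above, and PostgreSQL's default base threshold of 50; the row counts are illustrative.

```python
def autovacuum_trigger_point(row_count: int,
                             scale_factor: float = 0.1,   # assumption: 10 percent, as in the text
                             base_threshold: int = 50) -> float:
    """Dead tuples needed before autovacuum considers the table."""
    return base_threshold + scale_factor * row_count


for rows in (10_000, 1_000_000, 300_000_000):
    needed = autovacuum_trigger_point(rows)
    print(f"{rows:>12,} rows: eligible after about {needed:,.0f} dead tuples")
```

With a fixed percentage, the trigger point scales with table size, which is why the largest tables carried millions of dead tuples before autovacuum even started on them.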
In addition, with the default of three autovacuum sessions, at most three tables can be vacuumed concurrently at any given time.
When those three autovacuum sessions were all occupied, other tables had to wait for their turn to be vacuumed while their dead tuples kept growing. This turned into an unhealthy cycle. Second, on the table that had the autovacuum session running for the longest time, I also found another session that had queried it and was stuck in the idle in transaction state.
When autovacuum tried to remove dead tuples on the table involved, it noticed that they were still visible to open transactions and could not remove them. Autovacuum was essentially being blocked. As shown in the table stats, many tables were bloated, and their dead tuples had grown tremendously.
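Sessions that block autovacuum this way can be spotted in `pg_stat_activity`. The following is a minimal sketch, not the procedure used in this case study: the query is given as a string, and the helper filters rows that are assumed to have been fetched already as dictionaries. The five-minute cutoff and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta

# Lists sessions holding a transaction open without doing any work.
IDLE_IN_TXN_SQL = """
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
"""


def long_idle_transactions(rows, now, max_age=timedelta(minutes=5)):
    """Return rows that have been idle in transaction for longer than max_age."""
    return [
        r for r in rows
        if r["state"] == "idle in transaction"
        and r["xact_start"] is not None
        and now - r["xact_start"] > max_age
    ]


# Hypothetical snapshot of pg_stat_activity.
now = datetime(2024, 1, 1, 12, 0, 0)
snapshot = [
    {"pid": 101, "state": "idle in transaction", "xact_start": datetime(2024, 1, 1, 9, 0, 0)},
    {"pid": 102, "state": "active", "xact_start": datetime(2024, 1, 1, 11, 59, 0)},
    {"pid": 103, "state": "idle in transaction", "xact_start": datetime(2024, 1, 1, 11, 58, 0)},
]
for row in long_idle_transactions(snapshot, now):
    print(f"pid {row['pid']} has held its transaction open for {now - row['xact_start']}")
```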
This became the root cause of the IOPS spikes. Many autovacuum parameters are available, and you can use them flexibly. Some can be changed dynamically without bouncing the Amazon RDS instance. Some can be set either at the database level or at the table level. My tuning efforts focused on the parameters that helped me solve the two problems identified earlier.
The default value of this parameter is 0. The following formula calculates the autovacuum threshold for a table:

vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples

The smaller the value of this parameter, the fewer dead tuples autovacuum has to work on each time.
For small tables, there may be a concern that autovacuum runs unnecessarily frequently and incurs overhead. If your tables have various sizes or different write patterns, I recommend that you set this parameter with different values at the table level, instead of one value at the database level.
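As a rough illustration of table-level tuning, the sketch below picks a scale factor per table so that large tables are vacuumed after roughly a fixed number of dead tuples, while small tables keep the instance-wide value. The one-million target, the 0.1 fallback, and the table names are assumptions for the example; `autovacuum_vacuum_scale_factor` is the PostgreSQL storage parameter being set, and the base threshold is ignored for simplicity.

```python
DEFAULT_SCALE_FACTOR = 0.1  # assumption: the 10 percent threshold described in the text


def scale_factor_for(row_count: int, target_dead_tuples: int = 1_000_000) -> float:
    """Scale factor that triggers autovacuum at about target_dead_tuples, capped at the default."""
    if row_count <= 0:
        return DEFAULT_SCALE_FACTOR
    return min(DEFAULT_SCALE_FACTOR, target_dead_tuples / row_count)


def alter_statement(table: str, row_count: int) -> str:
    """Build the table-level setting as a SQL statement (table name is not escaped here)."""
    factor = scale_factor_for(row_count)
    return f"ALTER TABLE {table} SET (autovacuum_vacuum_scale_factor = {factor:.4f});"


# Hypothetical tables and row counts.
tables = {"big_events": 300_000_000, "small_lookup": 10_000}
for name, rows in tables.items():
    print(alter_statement(name, rows))
```

Capping at the default keeps small tables from being vacuumed more often than they are today, which addresses the overhead concern above.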
Problem 1 also indicated that running three default autovacuum sessions concurrently was not quick enough to traverse all the tables that met the autovacuum threshold.
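The queuing effect can be sketched with a toy model in which each eligible table takes a fixed time to vacuum and a free session always picks up the next table. The durations are made up, and the model deliberately ignores that autovacuum sessions share the cost-based I/O limit, so real gains from adding sessions are usually smaller than this suggests.

```python
import heapq


def total_vacuum_time(durations, workers):
    """Time until every table is vacuumed when `workers` sessions process the queue in order."""
    free_at = [0.0] * workers          # time at which each session becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available session takes the next table
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Hypothetical vacuum durations in minutes for six eligible tables.
queue = [120, 120, 120, 60, 60, 60]
for sessions in (3, 6):
    print(f"{sessions} sessions: all tables done after {total_vacuum_time(queue, sessions):.0f} minutes")
```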
On the one hand, the autovacuum sessions are empowered to get the job done in an optimal way if they have enough system resource allocation. On the other hand, you want to put a limit on their system resource consumption so that their performance impact can be predictable.
Either of these two parameters sets the maximum amount of memory that each autovacuum session can use.

ABOUT

The mission of the National Center for Case Study Teaching in Science (NCCSTS) is to promote the nationwide application of active learning techniques to the teaching of science, with a particular emphasis on case studies and problem-based learning.
Team Case Study

DATABASE DESIGN

Primary Keys and Attributes

The primary keys will include Doctor ID, Patient ID, Procedure ID, and Appointment ID.

Data types for each entity

Patients entity: patient's first and last names, name of kin, date of birth, postal address, social security number, sex, contact number.
This article provides a case study of this approach on a commercial distributed database product. A benchmark for distributed databases for decision support, known as D 3 S, is introduced and used in the case study. redBus, an Indian company providing a Software as a Service (SaaS) application for bus operators in addition to selling bus tickets on its own and third-party websites, migrated its operations completely to AWS.
The company uses Amazon EC2, Elastic Load Balancing, Amazon RDS, Amazon S3, Amazon EBS, and Amazon CloudWatch. Application of a Self-Controlled Case Series Study to a Database Study in Children application of DPC hospitals.
The number of DPC-introduced hospitals is expected to continually increase.
The self-controlled case series method is an appealing. Microsoft Azure Stack is an extension of Azure—bringing the agility and innovation of cloud computing to your on-premises environment and enabling the only hybrid cloud that allows you to build and deploy hybrid applications anywhere.