What is denormalization in databases?

When should you denormalize?


Denormalize for OLAP workloads; normalize for OLTP (from the article linked in the Denormalization section).

Databases dedicated to online transaction processing (OLTP) are typically more normalized than databases dedicated to online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions, e.g. updating a sales record at a supermarket checkout. Every transaction is expected to leave the database in a consistent state. In contrast, databases dedicated to OLAP operations are primarily read-mostly databases. OLAP applications typically extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data can facilitate business intelligence applications. In particular, dimension tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during extract, transform, load (ETL) processing, and users should not see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. In many cases, the need for denormalization has diminished as computers and RDBMS software have become more powerful. However, since data volumes have generally grown along with hardware and software performance, OLAP databases often still use denormalized schemas.
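As a small sketch of what a denormalized dimension table looks like (using SQLite from Python; all table, column, and value names here are illustrative, not from any cited source), category attributes are repeated on every product row so that analytical queries need no join at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized dimension table of a star schema: category attributes are
# repeated on every product row instead of living in a separate
# (snowflaked) category table.
conn.execute("""
    CREATE TABLE dim_product (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT NOT NULL,
        category_name TEXT NOT NULL,   -- redundant: repeated per product
        category_mgr  TEXT NOT NULL    -- redundant: depends on the category only
    )
""")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?, ?)",
    [
        (1, "Espresso beans", "Coffee", "Alice"),
        (2, "Filter beans",   "Coffee", "Alice"),  # 'Coffee' data repeated
        (3, "Green tea",      "Tea",    "Bob"),
    ],
)

# Analytical queries can group by category without joining a category table.
rows = conn.execute(
    "SELECT category_name, COUNT(*) FROM dim_product "
    "GROUP BY category_name ORDER BY category_name"
).fetchall()
print(rows)  # [('Coffee', 2), ('Tea', 1)]
```

The redundancy (`Coffee`/`Alice` appearing twice) is exactly what the ETL process has to keep consistent; the snowflake alternative would move `category_name` and `category_mgr` into their own table.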

Denormalization is also used to improve performance on smaller computers, such as computerized cash registers and mobile devices, which may use the data only for reference purposes (e.g. price lookups). Denormalization can also be used when a platform (e.g. Palm) has no RDBMS, or when no changes to the data are needed and a quick response is critical.

Normalize until it hurts, denormalize until it works (i.e., performance becomes acceptable) :)

A potentially sane reason for using controlled denormalization is that it lets you apply an integrity constraint to the data that would otherwise not be possible. Most SQL DBMSs have very limited support for multi-table constraints. In SQL, the only effective way to implement certain constraints is sometimes to ensure that the attributes involved in the constraint all live in the same table, even when normalization dictates that they belong in separate tables.

Controlled denormalization means implementing mechanisms that guarantee that the redundant data cannot become inconsistent. The cost of these additional controls, and the risk of inconsistent data, must be weighed when deciding whether to denormalize.

Another common reason for denormalization is to allow a change in storage structures or some other physical optimization that the DBMS would otherwise not permit. According to the principle of physical data independence, a DBMS should be able to configure its internal storage structures without unnecessarily changing the logical representation of the data in the database. Unfortunately, many DBMSs severely limit the physical implementation options available for a given database schema. They tend to compromise physical data independence by supporting only a sub-optimal implementation of the desired logical model.

It should be obvious, but it has to be said: in all cases, performance is determined only by the physical implementation characteristics: internal data structures, files, indexing, hardware, and so on. Normalization and denormalization, in themselves, have nothing to do with performance or storage optimization.

If you access calculated data frequently, denormalize it, as suggested in the answers to this question. The cost of storing and maintaining the calculated data is often lower than the cost of recalculating it, if your load profile is read-heavy.

I routinely denormalize so that I can enforce data integrity with constraints. An example is a question recently asked on this site: I replicate a column in another table so that I can use a CHECK constraint to compare it against another column. Another example of this technique is in my blog post.

You cannot use CHECK constraints to compare columns in different rows or in different tables, unless you wrap those comparisons in scalar UDFs called from a CHECK constraint. What if you actually need to compare columns in different rows or in different tables to enforce a business rule? Suppose you know a doctor's working hours and want to make sure all appointments fall within those hours. You could, of course, use a trigger or a stored procedure to implement this business rule, but neither a trigger nor a stored procedure can give you a 100% guarantee that all of your data is clean: someone can disable or drop your trigger, enter some bad data, and then re-enable or re-create the trigger. Likewise, someone can modify your table directly, bypassing stored procedures.

Let me show you how to implement this business rule using only FK and CHECK constraints. This guarantees that all data satisfies the business rule, as long as all the constraints are trusted.
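The general shape of the technique can be sketched as follows (this is my own illustration in SQLite, not the answer's original code; all table, column, and value names are assumptions). The appointment row carries redundant copies of the doctor's working hours; a composite FK guarantees those copies match the doctor's real hours, and a single-table CHECK can then compare columns of the same row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Working hours stored as minutes since midnight; names are illustrative.
conn.execute("""
    CREATE TABLE doctors (
        doctor_id  INTEGER PRIMARY KEY,
        work_start INTEGER NOT NULL,
        work_end   INTEGER NOT NULL,
        UNIQUE (doctor_id, work_start, work_end)  -- target of the composite FK
    )
""")

# work_start/work_end are denormalized copies. The composite FK guarantees the
# copies match the doctor's real hours; the CHECK then enforces the rule
# entirely within one row.
conn.execute("""
    CREATE TABLE appointments (
        appt_id    INTEGER PRIMARY KEY,
        doctor_id  INTEGER NOT NULL,
        work_start INTEGER NOT NULL,
        work_end   INTEGER NOT NULL,
        appt_start INTEGER NOT NULL,
        appt_end   INTEGER NOT NULL,
        CHECK (appt_start < appt_end
               AND appt_start >= work_start
               AND appt_end   <= work_end),
        FOREIGN KEY (doctor_id, work_start, work_end)
            REFERENCES doctors (doctor_id, work_start, work_end)
    )
""")

conn.execute("INSERT INTO doctors VALUES (1, 540, 1020)")  # 09:00-17:00
conn.execute("INSERT INTO appointments VALUES (1, 1, 540, 1020, 600, 660)")  # OK

try:
    # 18:00-19:00 is outside working hours: rejected by the CHECK constraint.
    conn.execute("INSERT INTO appointments VALUES (2, 1, 540, 1020, 1080, 1140)")
except sqlite3.IntegrityError as e:
    print("rejected by CHECK:", e)

try:
    # Lying about the working-hours copies is rejected by the composite FK.
    conn.execute("INSERT INTO appointments VALUES (3, 1, 0, 1440, 1080, 1140)")
except sqlite3.IntegrityError as e:
    print("rejected by FK:", e)
```

Note the trade-off from the "controlled denormalization" discussion above: changing a doctor's hours now also requires updating the redundant copies in `appointments` (e.g. via ON UPDATE CASCADE).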

Another example is a way to ensure that time periods have no gaps or overlaps.
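One possible sketch of that gaps-and-overlaps technique (again my own SQLite illustration under stated assumptions, not the answer's code; it assumes a single chain of periods, since nothing stops several independent chains each starting with a NULL predecessor): each period redundantly stores its predecessor's end, a CHECK forces that copy to equal the period's own start, and an FK forces it to exist as some other period's end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# prev_stop is denormalized: it duplicates this row's start, but the FK makes
# it point at an existing period's stop, so every period begins exactly where
# another one ends. UNIQUE(prev_stop) means no two periods share a
# predecessor, and UNIQUE start/stop rule out duplicate boundaries.
conn.execute("""
    CREATE TABLE periods (
        start     INTEGER PRIMARY KEY,
        stop      INTEGER NOT NULL UNIQUE,
        prev_stop INTEGER UNIQUE,            -- NULL only for the first period
        CHECK (start < stop),
        CHECK (prev_stop IS NULL OR prev_stop = start),
        FOREIGN KEY (prev_stop) REFERENCES periods (stop)
    )
""")

conn.execute("INSERT INTO periods VALUES (0, 60, NULL)")   # first period
conn.execute("INSERT INTO periods VALUES (60, 120, 60)")   # starts where 0-60 ends

try:
    # A gap: nothing ends at 130, so the FK rejects this period.
    conn.execute("INSERT INTO periods VALUES (130, 200, 130)")
except sqlite3.IntegrityError as e:
    print("gap rejected:", e)

try:
    # An overlap: a second period starting at 60 duplicates an existing start,
    # and any start that is not an existing stop would fail the FK instead.
    conn.execute("INSERT INTO periods VALUES (60, 200, 60)")
except sqlite3.IntegrityError as e:
    print("overlap rejected:", e)
```

Deleting a period from the middle of the chain is likewise blocked, because its successor's FK still references the deleted row's `stop`.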
