How can we fix a deadlock

8.5 Deadlock handling

8.5.4 Deadlock detection

With deadlock detection, all waiting relationships for active transactions are explicitly included in one Waiting graph (wait-for graph) and deadlocks detected by cycle search in this graph. In contrast to the serialization or dependency graph, the wait graph only takes into account active transactions that are involved in a lock conflict. A deadlock is resolved by resetting one or more of the transactions involved in the cycle. The deadlock detection is the most complex compared to avoidance or timeout strategies, but it gets by with the fewest resets. In quantitative investigations [ACM87], this resulted in the best performance for centralized DBS, where deadlock detection can also be implemented relatively easily [ACD83, Ji88].

Example 8-16

Fig. 8-10 shows an example of a waiting graph. The edges between the transactions are marked with the object for which the conflict occurred. For the lock request from T2 to object O1 a conflict has arisen with two transactions (e.g. both of which have a read lock). In the scenario shown, there are two deadlocks:
T1 -> T2 -> T3 -> T4 -> T1 as well as T1 -> T2 -> T5 -> T1. Resetting T2 would resolve both deadlocks.
The cycle search can be carried out for every lock conflict (continuous deadlock detection) or at periodic intervals. In the former case, a deadlock is resolved at the earliest possible point in time; In addition, the search is simplified, as i.a. only part of the graph needs to be considered. With the periodic cycle search, the entire graph must always be searched, but the effort can be limited by executing it less frequently. However, there is then again a delayed resolution of deadlocks similar to a timeout approach. When selecting the "victims", various criteria can be used, e.g. minimizing the loss of work or simplicity in determining the victim. If deadlock detection is carried out immediately in the event of a lock conflict, a new deadlock can be bypassed simply by resetting the conflicted transaction. However, this is no longer possible in the distributed case.

The deadlock detection in distributed DBS is far more complex than in centralized DBS, since communication processes for the notification of waiting relationships are necessary. In addition, it is difficult to implement correct deadlock detection at all, since each node only knows the current waiting situation with regard to local transactions. Waiting information regarding external transactions, on the other hand, is usually due to time delays in the transmission. outdated. For many proposed procedures it could be shown that they recognize "wrong" deadlocks (phantom deadlocks) and thus reset transactions without a deadlock being present [BHG87]. In many cases, global deadlocks are also recognized multiple times, which also leads to unnecessary resets.

In the following we discuss different approaches to global deadlock detection, which can be done either centrally or distributed.

Central deadlock detection

A global waiting graph is kept in a central node and searched for cycles. In order to limit the communication overhead, each computer sends the waiting relationships it creates to the central node at periodic intervals (message bundling). This completes its waiting graph and starts the cycle search. Since the global waiting graph due to the message delays generally is not up-to-date, deadlocks are not only discovered late, phantom deadlocks can also be recognized, causing unnecessary resets. The special role of the computer responsible for deadlock detection causes similar problems with regard to availability and computer autonomy as with a central locking procedure. After a failure of the central node, however, someone else could easily take over its function or switch to a timeout procedure.

A central deadlock detection was carried out in Distributed Ingres [St79].

Distributed deadlock detection

Distributed deadlock detection avoids dependencies on a central node, but is much more difficult to implement. In particular, multiple detection of global deadlocks can occur and waiting information may have to be propagated across several computers until a global deadlock is detected. An overview of the numerous proposed procedures can be found in [El86, Kn87]. We limit ourselves here to the description of the Obermarck procedure implemented in the R * system [Ob82, MLO86]. It has the great advantage that only one message is required to resolve global deadlocks in which two computers are involved. This is the most important type of global deadlock, as most deadlocks (> 90%) typically only have cycle length 2 [GHOK81]. A central deadlock detection would have required more effort to transfer the waiting relationships.

The Obermarck method is based on the assumption that a special process ("deadlock detector") is tasked with deadlock detection in each computer and communicates with the corresponding processes on the other computers. Each of these processes keeps a local waiting graph in which all waiting relationships for local and external transactions with regard to the objects managed on this computer are recorded. This allows local deadlocks to be recognized immediately and without communication. A special EXTERNAL node is managed in the waiting graph, which identifies waiting relationships for sub-transactions on other computers. For an external sub-transaction Tj there is always a waiting relationship EXTERNAL -> Tj added, since other computers may have been locked by Tj is being serviced. A waiting relationship Ti -> EXTERNAL will also be introduced as soon as transaction Ti an external sub-transaction starts because it may conflict. A more potential Global deadlock is indicated by a cycle in the local waiting graph in which the EXTERNAL node is involved:

EXTERNAL -> T1 -> T2 -> ... -> Tn -> EXTERNAL

In order to determine whether there is actually a global deadlock, the waiting information is forwarded to the computer on which transaction Tn the sub-transaction had started. After receiving such waiting information, the responsible process completes its waiting graph and then carries out a cycle search. To detect a global deadlock, it may be necessary to forward the extended waiting information to another computer again. If a complete cycle is found, one of the transactions involved is reset. If possible, a local transaction should be selected in order to resolve the deadlock as quickly as possible [MLO86].

One problem with the method outlined is that in the event of a global deadlock there is a cycle with the EXTERNAL node on each of the computers involved. If each of these computers now forwards its waiting information for global deadlock detection as described above, this leads to an unnecessarily high communication effort and multiple detection of a global deadlock. To alleviate this problem, each transaction T is in turn given a globally unique time stamp ts (T) given. In the above situation, the waiting information is only forwarded if ts (Tn) < ts (T1) applies. In the event of a global deadlock with two computers, this ensures that only one of the two computers forwards the waiting information. If there are more than two computers, however, it is still possible for the waiting information to be passed on from more than one computer.

Example 8-17

For the deadlock shown in Fig. 8-8 (example 8-11), the following cycle is created in computer R1
in R2 on the other hand

If ts (T1) < ts (T2) applies, then only R1 sends its waiting information to R2, but not the other way around. In R2 the waiting graph is completed and the cycle
T1 -> T2 -> T1
discovered. The deadlock is resolved by resetting the local transaction T2.

The Obermarck algorithm requires a maximum of N * (N-1) / 2 messages for one get over N Machine spanning global deadlock. For N = 2 there is only one message. However, "false" deadlocks can be detected, as the waiting graphs of the individual computers are generally used. reflect different states of change [El86].

Central deadlock detection
Distributed deadlock detection