| Hans Boehm | 891cb88 | 2020-07-31 12:06:58 -0700 | [diff] [blame] | 1 | Mechanisms for Coordination Between Garbage Collector and Mutator |
| 2 | ----------------------------------------------------------------- |
| 3 | |
| 4 | Most garbage collection work can proceed concurrently with the client or |
| 5 | mutator Java threads. But in certain places, for example while tracing from |
| 6 | thread stacks, the garbage collector needs to ensure that Java data processed |
| 7 | by the collector is consistent and complete. At these points, the mutators |
| 8 | should not hold references to the heap that are invisible to the garbage |
| 9 | collector. And they should not be modifying the data that is visible to the |
| 10 | collector. |
| 11 | |
| 12 | Logically, the collector and mutator share a reader-writer lock on the Java |
| 13 | heap and associated data structures. Mutators hold the lock in reader or shared mode |
| 14 | while running Java code or touching heap-related data structures. The collector |
| 15 | holds the lock in writer or exclusive mode while it needs the heap data |
| 16 | structures to be stable. However, this reader-writer lock has a very customized |
| 17 | implementation that also provides additional facilities, such as the ability |
| 18 | to exclude only a single thread, so that we can specifically examine its heap |
| 19 | references. |
| 20 | |
| 21 | In order to ensure consistency of the Java data, the compiler inserts "suspend |
| 22 | points", sometimes also called "safe points" into the code. These allow a thread |
| 23 | to respond to external requests. |
| 24 | |
| 25 | Whenever a thread is runnable, i.e. whenever a thread logically holds the |
| 26 | mutator lock in shared mode, it is expected to regularly execute such a suspend |
| 27 | point, and check for pending requests. They are currently implemented by |
| 28 | setting a flag in the thread structure[^1], which is then explicitly tested by the |
| 29 | compiler-generated code. |
| 30 | |
| 31 | A thread responds to suspend requests only when it is "runnable", i.e. logically |
| 32 | running Java code. When it runs native code, or is blocked in a kernel call, it |
| 33 | logically releases the mutator lock. When the garbage collector needs mutator |
| 34 | cooperation, and the thread is not runnable, it is assured that the mutator is |
| 35 | not touching Java data, and hence the collector can safely perform the required |
| 36 | action itself, on the mutator thread's behalf. |
| 37 | |
| 38 | Normally, when a thread makes a JNI call, it is not considered runnable while |
| 39 | executing native code. This makes the transitions to and from running native JNI |
| 40 | code somewhat expensive (see below). But these transitions are necessary to |
| 41 | ensure that such code, which does not execute "suspend points", and can thus not |
| 42 | cooperate with the GC, doesn't delay GC completion. `@FastNative` and |
| 43 | `@CriticalNative` calls avoid these transitions, instead allowing the thread to |
| 44 | remain "runnable", at the expense of potentially delaying GC operations for the |
| 45 | duration of the call. |
| 46 | |
| 47 | Although we say that a thread is "suspended" when it is not running Java code, |
| 48 | it may in fact still be running native code and touching data structures that |
| 49 | are not considered "Java data". This distinction can be a fine line. For |
| 50 | example, a Java thread blocked on a Java monitor will normally be "suspended" |
| 51 | and blocked on a mutex contained in the monitor data structure. But it may wake |
| 52 | up for reasons beyond ARTs control, which will normally result in touching the |
| 53 | mutex. The monitor code must be quite careful to ensure that this does not cause |
| 54 | problems, especially if the ART runtime was shut down in the interim and the |
| 55 | monitor data structure has been reclaimed. |
| 56 | |
| 57 | Calls to change thread state |
| 58 | ---------------------------- |
| 59 | |
| 60 | When a thread changes between running Java and native code, it has to |
| 61 | correspondingly change its state between "runnable" and one of several |
| 62 | other states, all of which are considered to be "suspended" for our purposes. |
| 63 | When a Java thread starts to execute native code, and may thus not respond |
| 64 | promptly to suspend requests, it will normally create an object of type |
| 65 | `ScopedThreadSuspension`. `ScopedThreadSuspension`'s constructor changes state to |
| 66 | the "suspended" state given as an argument, logically releasing the mutator lock |
| 67 | and promising to no longer touch Java data structures. It also handles any |
| 68 | pending suspension requests that slid in just before it changed state. |
| 69 | |
| 70 | Conversely, `ScopedThreadSuspension`'s destructor waits until the GC has finished |
| 71 | any actions it is currently performing on the thread's behalf and effectively |
| 72 | released the mutator exclusive lock, and then returns to runnable state, |
| 73 | re-acquiring the mutator lock. |
| 74 | |
| 75 | Occasionally a thread running native code needs to temporarily again access Java |
| 76 | data structures, performing the above transitions in the opposite order. |
| 77 | `ScopedObjectAccess` is a similar RAII object whose constructor and destructor |
| 78 | perform those transitions in the reverse order from `ScopedThreadSuspension`. |
| 79 | |
| 80 | Mutator lock implementation |
| 81 | --------------------------- |
| 82 | |
| 83 | The mutator lock is not implemented as a conventional mutex. But it plays by the |
| 84 | rules of our normal static thread-safety analysis. Thus a function that is |
| 85 | expected to be called in runnable state, with the ability to access Java data, |
| 86 | should be annotated with `REQUIRES_SHARED(Locks::mutator_lock_)`. |
| 87 | |
| 88 | There is an explicit `mutator_lock_` object, of type `MutatorMutex`. `MutatorMutex` is |
| 89 | seemingly a minor refinement of `ReaderWriterMutex`, but it is used entirely |
| 90 | differently. It is acquired explicitly by clients that need to hold it |
| 91 | exclusively, and in a small number of cases, it is acquired in shared mode, e.g. |
| 92 | via `SharedTryLock()`, or by the GC itself. However, more commonly |
| 93 | `MutatorMutex::TransitionFromSuspendedToRunnable()`, is used to logically acquire |
| 94 | the mutator mutex, e.g. as part of `ScopedObjectAccess` construction. |
| 95 | |
| 96 | `TransitionFromSuspendedToRunnable()` does not physically acquire the |
| 97 | `ReaderWriterMutex` in shared mode. Thus any thread acquiring the lock in exclusive mode |
| 98 | must, in addition, explicitly arrange for mutator threads to be suspended via the |
| 99 | thread suspension mechanism, and then make them runnable again on release. |
| 100 | |
| 101 | Logically the mutator lock is held in shared/reader mode if ***either*** the |
| 102 | underlying reader-writer lock is held in shared mode, ***or*** if a mutator is in |
| 103 | runnable state. |
| 104 | |
| 105 | Suspension and checkpoint API |
| 106 | ----------------------------- |
| 107 | |
| 108 | Suspend point checks enable three kinds of communication with mutator threads: |
| 109 | |
| 110 | **Checkpoints** |
| 111 | : Checkpoint requests are used to get a thread to perform an action |
| 112 | on our behalf. `RequestCheckpoint()` asks a specific thread to execute the closure |
| 113 | supplied as an argument at its leisure. `RequestSynchronousCheckpoint()` in |
| 114 | addition waits for the thread to complete running the closure, and handles |
| 115 | suspended threads by running the closure on their behalf. In addition to these |
| 116 | functions provided by `Thread`, `ThreadList` provides the `RunCheckpoint()` function |
| 117 | that runs a checkpoint function on behalf of each thread, either by using |
| 118 | `RequestCheckpoint()` to run it inside a running thread, or by ensuring that a |
| 119 | suspended thread stays suspended, and then running the function on its behalf. |
| 120 | `RunCheckpoint()` does not wait for completion of the function calls triggered by |
| 121 | the resulting `RequestCheckpoint()` invocations. |
| 122 | |
| Hans Boehm | 0529cfa | 2021-08-16 16:50:28 -0700 | [diff] [blame] | 123 | **Empty checkpoints** |
| Hans Boehm | 891cb88 | 2020-07-31 12:06:58 -0700 | [diff] [blame] | 124 | : ThreadList provides `RunEmptyCheckpoint()`, which waits until |
| 125 | all threads have either passed a suspend point, or have been suspended. This |
| 126 | ensures that no thread is still executing Java code inside the same |
| 127 | suspend-point-delimited code interval it was executing before the call. For |
| 128 | example, a read-barrier started before a `RunEmptyCheckpoint()` call will have |
| 129 | finished before the call returns. |
| 130 | |
| 131 | **Thread suspension** |
| 132 | : ThreadList provides a number of `SuspendThread...()` calls and |
| 133 | a `SuspendAll()` call to suspend one or all threads until they are resumed by |
| 134 | `Resume()` or `ResumeAll()`. The `Suspend...` calls guarantee that the target |
| 135 | thread(s) are suspended (again, only in the sense of not running Java code) |
| 136 | when the call returns. |
| 137 | |
| Hans Boehm | 0529cfa | 2021-08-16 16:50:28 -0700 | [diff] [blame] | 138 | Deadlock freedom |
| 139 | ---------------- |
| 140 | |
| 141 | It is easy to deadlock while attempting to run checkpoints, or suspending |
| 142 | threads. In particular, we need to avoid situations in which we cannot suspend |
| 143 | a thread because it is blocked, directly, or indirectly, on the GC completing |
| 144 | its task. Deadlocks are avoided as follows: |
| 145 | |
| 146 | **Mutator lock ordering** |
| 147 | The mutator lock participates in the normal ART lock ordering hierarchy, as though it |
| 148 | were a regular lock. See `base/locks.h` for the hierarchy. In particular, only |
| 149 | locks at or below level `kPostMutatorTopLockLevel` may be acquired after |
| 150 | acquiring the mutator lock, e.g. inside the scope of a `ScopedObjectAccess`. |
| 151 | Similarly only locks at level strictly above `kMutatatorLock`may be held while |
| 152 | acquiring the mutator lock, e.g. either by starting a `ScopedObjectAccess`, or |
| 153 | ending a `ScopedThreadSuspension`. |
| 154 | |
| 155 | This ensures that code that uses purely mutexes and threads state changes cannot |
| 156 | deadlock: Since we always wait on a lower-level lock, the holder of the |
| 157 | lowest-level lock can always progress. An attempt to initiate a checkpoint or to |
| 158 | suspend another thread must also be treated as an acquisition of the mutator |
| 159 | lock: A thread that is waiting for a lock before it can respond to the request |
| 160 | is itself holding the mutator lock, and can only be blocked on lower-level |
| 161 | locks. And acquisition of those can never depend on acquiring the mutator |
| 162 | lock. |
| 163 | |
| Hans Boehm | f68f918 | 2021-08-26 12:05:43 -0700 | [diff] [blame] | 164 | **Checkpoints** |
| 165 | Running a checkpoint in a thread requires suspending that thread for the |
| 166 | duration of the checkpoint, or running the checkpoint on the threads behalf |
| 167 | while that thread is blocked from executing Java code. In the former case, the |
| 168 | checkpoint code is run from `CheckSuspend`, which requires the mutator lock, |
| 169 | so checkpoint code may only acquire mutexes at or below level |
| 170 | `kPostMutatorTopLockLevel`. But that is not sufficient. |
| 171 | |
| 172 | No matter whether the checkpoint is run in the target thread, or on its behalf, |
| 173 | the target thread is effectively suspended and prevented from running Java code. |
| 174 | However the target may hold arbitrary Java monitors, which it can no longer |
| 175 | release. This may also prevent higher level mutexes from getting released. Thus |
| 176 | checkpoint code should only acquire mutexes at level `kPostMonitorLock` or |
| 177 | below. |
| 178 | |
| 179 | |
| Hans Boehm | 0529cfa | 2021-08-16 16:50:28 -0700 | [diff] [blame] | 180 | **Waiting** |
| 181 | This becomes much more problematic when we wait for something other than a lock. |
| 182 | Waiting for something that may depend on the GC, while holding the mutator lock, |
| 183 | can potentially lead to deadlock, since it will prevent the waiting thread from |
| 184 | participating in GC checkpoints. Waiting while holding a lower-level lock like |
| 185 | `thread_list_lock_` is similarly unsafe in general, since a runnable thread may |
| 186 | not respond to checkpoints until it acquires `thread_list_lock_`. In general, |
| 187 | waiting for a condition variable while holding an unrelated lock is problematic, |
| 188 | and these are specific instances of that general problem. |
| 189 | |
| 190 | We do currently provide `WaitHoldingLocks`, and it is sometimes used with |
| 191 | low-level locks held. But such code must somehow ensure that such waits |
| 192 | eventually terminate without deadlock. |
| 193 | |
| 194 | One common use of WaitHoldingLocks is to wait for weak reference processing. |
| 195 | Special rules apply to avoid deadlocks in this case: Such waits must start after |
| 196 | weak reference processing is disabled; the GC may not issue further nonempty |
| 197 | checkpoints or suspend requests until weak reference processing has been |
| 198 | reenabled, and threads have been notified. Thus the waiting thread's inability |
| 199 | to respond to nonempty checkpoints and suspend requests cannot directly block |
| 200 | the GC. Non-GC checkpoint or suspend requests that target a thread waiting on |
| 201 | reference processing will block until reference processing completes. |
| 202 | |
| 203 | Consider a case in which thread W1 waits on reference processing, while holding |
| 204 | a low-level mutex M. Thread W2 holds the mutator lock and waits on M. We avoid a |
| 205 | situation in which the GC needs to suspend or checkpoint W2 by briefly stopping |
| 206 | the world to disable weak reference access. During the stop-the-world phase, W1 |
| 207 | cannot yet be waiting for weak-reference access. Thus there is no danger of |
| 208 | deadlock while entering this phase. After this phase, there is no need for W2 to |
| 209 | suspend or execute a nonempty checkpoint. If we replaced the stop-the-world |
| 210 | phase by a checkpoint, W2 could receive the checkpoint request too late, and be |
| 211 | unable to respond. |
| 212 | |
| 213 | Empty checkpoints can continue to occur during reference processing. Reference |
| 214 | processing wait loops explicitly handle empty checkpoints, and an empty |
| 215 | checkpoint request notifies the condition variable used to wait for reference |
| 216 | processing, after acquiring `reference_processor_lock_`. This means that empty |
| 217 | checkpoints do not preclude client threads from being in the middle of an |
| 218 | operation that involves a weak reference access, while nonempty checkpoints do. |
| 219 | |
| 220 | |
| Hans Boehm | 891cb88 | 2020-07-31 12:06:58 -0700 | [diff] [blame] | 221 | [^1]: Some comments in the code refer to a not-yet-really-implemented scheme in |
| 222 | which the compiler-generated code would load through the address at |
| 223 | `tlsPtr_.suspend_trigger`. A thread suspension is requested by setting this to |
| 224 | null, triggering a `SIGSEGV`, causing that thread to check for GC cooperation |
| 225 | requests. The real mechanism instead sets an appropriate `ThreadFlag` entry to |
| 226 | request suspension or a checkpoint. Note that the actual checkpoint function |
| 227 | value is set, along with the flag, while holding `suspend_count_lock_`. If the |
| 228 | target thread notices that a checkpoint is requested, it then acquires |
| 229 | the `suspend_count_lock_` to read the checkpoint function. |