Blame - runtime/mutator_gc_coord.md - art

blob: b3635c558a25f491a446cf74757802fe65f37c85 [file] [log] [blame] [view]

Hans Boehm	891cb88	2020-07-31 12:06:58 -0700	[diff] [blame]	1	Mechanisms for Coordination Between Garbage Collector and Mutator
				2	-----------------------------------------------------------------
				3
				4	Most garbage collection work can proceed concurrently with the client or
				5	mutator Java threads. But in certain places, for example while tracing from
				6	thread stacks, the garbage collector needs to ensure that Java data processed
				7	by the collector is consistent and complete. At these points, the mutators
				8	should not hold references to the heap that are invisible to the garbage
				9	collector. And they should not be modifying the data that is visible to the
				10	collector.
				11
				12	Logically, the collector and mutator share a reader-writer lock on the Java
				13	heap and associated data structures. Mutators hold the lock in reader or shared mode
				14	while running Java code or touching heap-related data structures. The collector
				15	holds the lock in writer or exclusive mode while it needs the heap data
				16	structures to be stable. However, this reader-writer lock has a very customized
				17	implementation that also provides additional facilities, such as the ability
				18	to exclude only a single thread, so that we can specifically examine its heap
				19	references.
				20
				21	In order to ensure consistency of the Java data, the compiler inserts "suspend
				22	points", sometimes also called "safe points" into the code. These allow a thread
				23	to respond to external requests.
				24
				25	Whenever a thread is runnable, i.e. whenever a thread logically holds the
				26	mutator lock in shared mode, it is expected to regularly execute such a suspend
				27	point, and check for pending requests. They are currently implemented by
				28	setting a flag in the thread structure[^1], which is then explicitly tested by the
				29	compiler-generated code.
				30
				31	A thread responds to suspend requests only when it is "runnable", i.e. logically
				32	running Java code. When it runs native code, or is blocked in a kernel call, it
				33	logically releases the mutator lock. When the garbage collector needs mutator
				34	cooperation, and the thread is not runnable, it is assured that the mutator is
				35	not touching Java data, and hence the collector can safely perform the required
				36	action itself, on the mutator thread's behalf.
				37
				38	Normally, when a thread makes a JNI call, it is not considered runnable while
				39	executing native code. This makes the transitions to and from running native JNI
				40	code somewhat expensive (see below). But these transitions are necessary to
				41	ensure that such code, which does not execute "suspend points", and can thus not
				42	cooperate with the GC, doesn't delay GC completion. `@FastNative` and
				43	`@CriticalNative` calls avoid these transitions, instead allowing the thread to
				44	remain "runnable", at the expense of potentially delaying GC operations for the
				45	duration of the call.
				46
				47	Although we say that a thread is "suspended" when it is not running Java code,
				48	it may in fact still be running native code and touching data structures that
				49	are not considered "Java data". This distinction can be a fine line. For
				50	example, a Java thread blocked on a Java monitor will normally be "suspended"
				51	and blocked on a mutex contained in the monitor data structure. But it may wake
				52	up for reasons beyond ARTs control, which will normally result in touching the
				53	mutex. The monitor code must be quite careful to ensure that this does not cause
				54	problems, especially if the ART runtime was shut down in the interim and the
				55	monitor data structure has been reclaimed.
				56
				57	Calls to change thread state
				58	----------------------------
				59
				60	When a thread changes between running Java and native code, it has to
				61	correspondingly change its state between "runnable" and one of several
				62	other states, all of which are considered to be "suspended" for our purposes.
				63	When a Java thread starts to execute native code, and may thus not respond
				64	promptly to suspend requests, it will normally create an object of type
				65	`ScopedThreadSuspension`. `ScopedThreadSuspension`'s constructor changes state to
				66	the "suspended" state given as an argument, logically releasing the mutator lock
				67	and promising to no longer touch Java data structures. It also handles any
				68	pending suspension requests that slid in just before it changed state.
				69
				70	Conversely, `ScopedThreadSuspension`'s destructor waits until the GC has finished
				71	any actions it is currently performing on the thread's behalf and effectively
				72	released the mutator exclusive lock, and then returns to runnable state,
				73	re-acquiring the mutator lock.
				74
				75	Occasionally a thread running native code needs to temporarily again access Java
				76	data structures, performing the above transitions in the opposite order.
				77	`ScopedObjectAccess` is a similar RAII object whose constructor and destructor
				78	perform those transitions in the reverse order from `ScopedThreadSuspension`.
				79
				80	Mutator lock implementation
				81	---------------------------
				82
				83	The mutator lock is not implemented as a conventional mutex. But it plays by the
				84	rules of our normal static thread-safety analysis. Thus a function that is
				85	expected to be called in runnable state, with the ability to access Java data,
				86	should be annotated with `REQUIRES_SHARED(Locks::mutator_lock_)`.
				87
				88	There is an explicit `mutator_lock_` object, of type `MutatorMutex`. `MutatorMutex` is
				89	seemingly a minor refinement of `ReaderWriterMutex`, but it is used entirely
				90	differently. It is acquired explicitly by clients that need to hold it
				91	exclusively, and in a small number of cases, it is acquired in shared mode, e.g.
				92	via `SharedTryLock()`, or by the GC itself. However, more commonly
				93	`MutatorMutex::TransitionFromSuspendedToRunnable()`, is used to logically acquire
				94	the mutator mutex, e.g. as part of `ScopedObjectAccess` construction.
				95
				96	`TransitionFromSuspendedToRunnable()` does not physically acquire the
				97	`ReaderWriterMutex` in shared mode. Thus any thread acquiring the lock in exclusive mode
				98	must, in addition, explicitly arrange for mutator threads to be suspended via the
				99	thread suspension mechanism, and then make them runnable again on release.
				100
				101	Logically the mutator lock is held in shared/reader mode if *either* the
				102	underlying reader-writer lock is held in shared mode, *or* if a mutator is in
				103	runnable state.
				104
				105	Suspension and checkpoint API
				106	-----------------------------
				107
				108	Suspend point checks enable three kinds of communication with mutator threads:
				109
				110	Checkpoints
				111	: Checkpoint requests are used to get a thread to perform an action
				112	on our behalf. `RequestCheckpoint()` asks a specific thread to execute the closure
				113	supplied as an argument at its leisure. `RequestSynchronousCheckpoint()` in
				114	addition waits for the thread to complete running the closure, and handles
				115	suspended threads by running the closure on their behalf. In addition to these
				116	functions provided by `Thread`, `ThreadList` provides the `RunCheckpoint()` function
				117	that runs a checkpoint function on behalf of each thread, either by using
				118	`RequestCheckpoint()` to run it inside a running thread, or by ensuring that a
				119	suspended thread stays suspended, and then running the function on its behalf.
				120	`RunCheckpoint()` does not wait for completion of the function calls triggered by
				121	the resulting `RequestCheckpoint()` invocations.
				122
Hans Boehm	0529cfa	2021-08-16 16:50:28 -0700	[diff] [blame]	123	Empty checkpoints
Hans Boehm	891cb88	2020-07-31 12:06:58 -0700	[diff] [blame]	124	: ThreadList provides `RunEmptyCheckpoint()`, which waits until
				125	all threads have either passed a suspend point, or have been suspended. This
				126	ensures that no thread is still executing Java code inside the same
				127	suspend-point-delimited code interval it was executing before the call. For
				128	example, a read-barrier started before a `RunEmptyCheckpoint()` call will have
				129	finished before the call returns.
				130
				131	Thread suspension
				132	: ThreadList provides a number of `SuspendThread...()` calls and
				133	a `SuspendAll()` call to suspend one or all threads until they are resumed by
				134	`Resume()` or `ResumeAll()`. The `Suspend...` calls guarantee that the target
				135	thread(s) are suspended (again, only in the sense of not running Java code)
				136	when the call returns.
				137
Hans Boehm	0529cfa	2021-08-16 16:50:28 -0700	[diff] [blame]	138	Deadlock freedom
				139	----------------
				140
				141	It is easy to deadlock while attempting to run checkpoints, or suspending
				142	threads. In particular, we need to avoid situations in which we cannot suspend
				143	a thread because it is blocked, directly, or indirectly, on the GC completing
				144	its task. Deadlocks are avoided as follows:
				145
				146	Mutator lock ordering
				147	The mutator lock participates in the normal ART lock ordering hierarchy, as though it
				148	were a regular lock. See `base/locks.h` for the hierarchy. In particular, only
				149	locks at or below level `kPostMutatorTopLockLevel` may be acquired after
				150	acquiring the mutator lock, e.g. inside the scope of a `ScopedObjectAccess`.
				151	Similarly only locks at level strictly above `kMutatatorLock`may be held while
				152	acquiring the mutator lock, e.g. either by starting a `ScopedObjectAccess`, or
				153	ending a `ScopedThreadSuspension`.
				154
				155	This ensures that code that uses purely mutexes and threads state changes cannot
				156	deadlock: Since we always wait on a lower-level lock, the holder of the
				157	lowest-level lock can always progress. An attempt to initiate a checkpoint or to
				158	suspend another thread must also be treated as an acquisition of the mutator
				159	lock: A thread that is waiting for a lock before it can respond to the request
				160	is itself holding the mutator lock, and can only be blocked on lower-level
				161	locks. And acquisition of those can never depend on acquiring the mutator
				162	lock.
				163
Hans Boehm	f68f918	2021-08-26 12:05:43 -0700	[diff] [blame]	164	Checkpoints
				165	Running a checkpoint in a thread requires suspending that thread for the
				166	duration of the checkpoint, or running the checkpoint on the threads behalf
				167	while that thread is blocked from executing Java code. In the former case, the
				168	checkpoint code is run from `CheckSuspend`, which requires the mutator lock,
				169	so checkpoint code may only acquire mutexes at or below level
				170	`kPostMutatorTopLockLevel`. But that is not sufficient.
				171
				172	No matter whether the checkpoint is run in the target thread, or on its behalf,
				173	the target thread is effectively suspended and prevented from running Java code.
				174	However the target may hold arbitrary Java monitors, which it can no longer
				175	release. This may also prevent higher level mutexes from getting released. Thus
				176	checkpoint code should only acquire mutexes at level `kPostMonitorLock` or
				177	below.
				178
				179
Hans Boehm	0529cfa	2021-08-16 16:50:28 -0700	[diff] [blame]	180	Waiting
				181	This becomes much more problematic when we wait for something other than a lock.
				182	Waiting for something that may depend on the GC, while holding the mutator lock,
				183	can potentially lead to deadlock, since it will prevent the waiting thread from
				184	participating in GC checkpoints. Waiting while holding a lower-level lock like
				185	`thread_list_lock_` is similarly unsafe in general, since a runnable thread may
				186	not respond to checkpoints until it acquires `thread_list_lock_`. In general,
				187	waiting for a condition variable while holding an unrelated lock is problematic,
				188	and these are specific instances of that general problem.
				189
				190	We do currently provide `WaitHoldingLocks`, and it is sometimes used with
				191	low-level locks held. But such code must somehow ensure that such waits
				192	eventually terminate without deadlock.
				193
				194	One common use of WaitHoldingLocks is to wait for weak reference processing.
				195	Special rules apply to avoid deadlocks in this case: Such waits must start after
				196	weak reference processing is disabled; the GC may not issue further nonempty
				197	checkpoints or suspend requests until weak reference processing has been
				198	reenabled, and threads have been notified. Thus the waiting thread's inability
				199	to respond to nonempty checkpoints and suspend requests cannot directly block
				200	the GC. Non-GC checkpoint or suspend requests that target a thread waiting on
				201	reference processing will block until reference processing completes.
				202
				203	Consider a case in which thread W1 waits on reference processing, while holding
				204	a low-level mutex M. Thread W2 holds the mutator lock and waits on M. We avoid a
				205	situation in which the GC needs to suspend or checkpoint W2 by briefly stopping
				206	the world to disable weak reference access. During the stop-the-world phase, W1
				207	cannot yet be waiting for weak-reference access. Thus there is no danger of
				208	deadlock while entering this phase. After this phase, there is no need for W2 to
				209	suspend or execute a nonempty checkpoint. If we replaced the stop-the-world
				210	phase by a checkpoint, W2 could receive the checkpoint request too late, and be
				211	unable to respond.
				212
				213	Empty checkpoints can continue to occur during reference processing. Reference
				214	processing wait loops explicitly handle empty checkpoints, and an empty
				215	checkpoint request notifies the condition variable used to wait for reference
				216	processing, after acquiring `reference_processor_lock_`. This means that empty
				217	checkpoints do not preclude client threads from being in the middle of an
				218	operation that involves a weak reference access, while nonempty checkpoints do.
				219
				220
Hans Boehm	891cb88	2020-07-31 12:06:58 -0700	[diff] [blame]	221	[^1]: Some comments in the code refer to a not-yet-really-implemented scheme in
				222	which the compiler-generated code would load through the address at
				223	`tlsPtr_.suspend_trigger`. A thread suspension is requested by setting this to
				224	null, triggering a `SIGSEGV`, causing that thread to check for GC cooperation
				225	requests. The real mechanism instead sets an appropriate `ThreadFlag` entry to
				226	request suspension or a checkpoint. Note that the actual checkpoint function
				227	value is set, along with the flag, while holding `suspend_count_lock_`. If the
				228	target thread notices that a checkpoint is requested, it then acquires
				229	the `suspend_count_lock_` to read the checkpoint function.