Computer Science / Reading Notes · December 19, 2021

Java Concurrency in Practice: Reading Notes

I. Introduction

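Key concerns in Java multithreaded programming:

  • Safety
    Data shared between threads must be correctly synchronized.
  • Liveness
    Avoid deadlock, livelock, and starvation.
  • Performance
    A well-designed multithreaded program can perform dramatically better than a poorly designed one.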

II. Thread Safety

III. Sharing Objects

1. Visibility

A few points worth noting:

  • Locking is not just about mutual exclusion; it is also about memory visibility. To ensure that all threads see the most up-to-date values of shared mutable variables, the reading and writing threads must synchronize on a common lock.
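  • Both the setter and the getter must be synchronized, otherwise the getter may return a stale value. Only synchronization on a common lock establishes the happens-before relationship between a set and a later get.
  • Out-of-thin-air safety
    A read of an unsynchronized variable may return a stale value, but it is still a value that some thread actually wrote, never an unrelated one. This guarantee does not cover 64-bit numeric variables (double and long) that are not declared volatile, because the JVM is allowed to treat a 64-bit read or write as two separate 32-bit operations. If you read the first 32 bits of a double, another thread then writes the variable, and you read the remaining 32 bits, the combined result is not a stale value but a value that was never written at all.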
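The two points above can be illustrated with a minimal sketch. The class names SynchronizedLongHolder and VolatileLongHolder are hypothetical and chosen only for this example. The first guards both accessors with the same lock; the second relies on volatile, which gives visibility and also makes 64-bit reads and writes atomic.

```java
// Hypothetical holder: both accessors synchronize on "this", so a set() by
// one thread happens-before a later get() by another thread.
final class SynchronizedLongHolder {
    private long value; // guarded by "this"

    public synchronized long get() {
        return value;
    }

    public synchronized void set(long value) {
        this.value = value;
    }
}

// Hypothetical holder: volatile is enough when the only requirement is
// visibility of a single variable. Reads and writes of a volatile long are
// atomic, so a reader can never see half of one write and half of another.
final class VolatileLongHolder {
    private volatile long value;

    public long get() {
        return value;
    }

    public void set(long value) {
        this.value = value;
    }
}
```

Note that volatile does not make compound actions such as value++ atomic; those still need a lock or an atomic variable.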

2. Publication and escape

Publishing an object means making it available to code outside of its current scope, such as by storing a reference to it where other code can find it, returning it from a nonprivate method, or passing it to a method in another class.
In short: an object is published as soon as code outside its current scope can reach it.

An object that is published when it should not have been is said to have escaped.
In other words, escape is unintended publication.

Publishing an object also publishes any objects referred to by its nonprivate fields. More generally, any object that is reachable from a published object by following some chain of nonprivate field references and method calls has also been published.
Publication is transitive: whatever can be reached from a published object through its nonprivate fields or methods is published too.

From the perspective of a class C, an alien method is one whose behavior is not fully specified by C. This includes methods in other classes as well as overrideable methods (neither private nor final) in C itself. Passing an object to an alien method must also be considered publishing that object.
Note that "overrideable" here means neither private nor final: a subclass may replace such a method, so C cannot know what it will do with the object it receives.

Once an object escapes, you have to assume that another class or thread may, maliciously or carelessly, misuse it.
Once an object has been passed to an alien method, you can no longer control what is done with it.

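If an instance of an anonymous inner class is published, the enclosing object is published as well, because the inner-class instance implicitly holds a reference to the enclosing instance (the outer this).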

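An object is in a predictable, consistent state only after its constructor returns. Publishing an object from within its own constructor therefore publishes an incompletely constructed object, even if the publication is the last statement in the constructor. When the this reference escapes during construction in this way, the object is considered not properly constructed. ThisEscape below is an example: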

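// EventSource, EventListener, and Event are application types assumed to exist elsewhere.
public class ThisEscape {
	public ThisEscape(EventSource source) {
		// The anonymous listener implicitly captures ThisEscape.this, so
		// registering it here lets this escape before the constructor returns.
		source.registerListener(
			new EventListener() {
				public void onEvent(Event e) {
					doSomething(e);
				}
			});
	}

	// Placeholder for application-specific handling (not shown in the original listing).
	void doSomething(Event e) { }
}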

Do not allow the this reference to escape during construction.

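A common mistake of this kind is starting a thread from a constructor. Creating the thread in the constructor is fine, but it is best not to start it there: the new thread can see the this reference (explicitly, or implicitly when the Thread or Runnable is an inner class) while the object is still only partially constructed, and may therefore observe incomplete state.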

To get the same behavior as the code above without letting the this reference escape, use a private constructor and a public factory method:

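// EventSource, EventListener, and Event are application types assumed to exist elsewhere.
public class SafeListener {
	private final EventListener listener;

	// Private constructor: creates the listener but does not publish it.
	private SafeListener() {
		listener = new EventListener() {
			public void onEvent(Event e) {
				doSomething(e);
			}
		};
	}

	// Factory method: registers the listener only after construction has completed.
	public static SafeListener newInstance(EventSource source) {
		SafeListener safe = new SafeListener();
		source.registerListener(safe.listener);
		return safe;
	}

	// Placeholder for application-specific handling (not shown in the original listing).
	void doSomething(Event e) { }
}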
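The listings above depend on event types that the notes do not define. The following self-contained sketch fills them in with hypothetical stand-ins (Event, EventListener, EventSource) and a hypothetical RecordingListener class, only to show the private-constructor-plus-factory idiom running end to end. It is an illustration under those assumptions, not the book's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-ins for the event types used in the listings.
final class Event {
    final String name;
    Event(String name) { this.name = name; }
}

interface EventListener {
    void onEvent(Event e);
}

final class EventSource {
    private final List<EventListener> listeners = new ArrayList<>(); // guarded by "this"

    synchronized void registerListener(EventListener l) { listeners.add(l); }

    synchronized void fire(Event e) {
        for (EventListener l : listeners) {
            l.onEvent(e);
        }
    }
}

// Hypothetical listener that records the names of the events it receives.
final class RecordingListener {
    private final EventListener listener;
    private final List<String> seen = new ArrayList<>(); // guarded by "this"

    // The constructor only builds state; nothing is handed to other code here.
    private RecordingListener() {
        listener = new EventListener() {
            @Override
            public void onEvent(Event e) {
                doSomething(e);
            }
        };
    }

    // Registration happens after the constructor has returned, so the
    // source never sees a partially constructed RecordingListener.
    static RecordingListener newInstance(EventSource source) {
        RecordingListener safe = new RecordingListener();
        source.registerListener(safe.listener);
        return safe;
    }

    private synchronized void doSomething(Event e) { seen.add(e.name); }

    // Returns a copy so callers cannot touch the guarded list.
    synchronized List<String> seen() { return new ArrayList<>(seen); }
}
```

A caller would write RecordingListener.newInstance(source) and can never obtain an instance that is registered but not yet fully constructed.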

3. Thread confinement

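Restrict an object so that only one thread ever accesses it. The language does not enforce this; it is the developer's responsibility.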

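  • Ad-hoc thread confinement
    Confinement that is maintained entirely by the implementation, with no help from language features.
  • Stack confinement
    An object that is reachable only through local variables is confined to the thread that owns those variables.
  • ThreadLocal
    Thread-local variables are often used to prevent sharing in designs based on mutable Singletons or global variables.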
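As a small illustration of the ThreadLocal approach, the sketch below keeps one counter per thread. The class name PerThreadCounter and its methods are hypothetical. Each thread that calls increment() gets its own int[] cell, so no synchronization is needed on the counter itself.

```java
// Hypothetical example: a counter whose state is confined to each thread.
final class PerThreadCounter {
    // withInitial supplies a fresh cell the first time each thread calls get().
    private static final ThreadLocal<int[]> COUNT =
            ThreadLocal.withInitial(() -> new int[1]);

    // Increments and returns the calling thread's own count.
    static int increment() {
        return ++COUNT.get()[0];
    }

    // Calls increment() the given number of times on a fresh thread and
    // returns the count that thread ended up with.
    static int countOnNewThread(int times) {
        int[] result = new int[1];
        Thread t = new Thread(() -> {
            for (int i = 0; i < times; i++) {
                result[0] = increment();
            }
        });
        t.start();
        try {
            t.join(); // join makes the worker's write to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }
}
```

Counts made on the new thread do not affect the caller's count, which is the point of the confinement.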

4. Immutability

Immutable objects are always thread-safe.

Immutability is not the same as declaring every field final: a final field can hold a reference to a mutable object, and if that object is modified, the state of the enclosing object changes too.

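An object is immutable if:

  • its state cannot be modified after construction;
  • all its fields are final (String is a notable exception: it is immutable even though it caches its hash code in a non-final field); and
  • it is properly constructed (the this reference does not escape during construction).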
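A minimal sketch of a class that meets all three conditions, including the case where a final field refers to a collection. The class name ImmutableRoster is hypothetical. The defensive copy is what prevents the caller's mutable list from becoming part of this object's state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable class: final field, no mutators, this does not escape.
final class ImmutableRoster {
    private final List<String> names;

    ImmutableRoster(List<String> names) {
        // Copy first, then wrap: later changes to the caller's list cannot
        // reach this object, and the wrapper rejects modification attempts.
        this.names = Collections.unmodifiableList(new ArrayList<>(names));
    }

    // Safe to hand out: the returned view cannot be modified.
    List<String> names() {
        return names;
    }

    boolean contains(String name) {
        return names.contains(name);
    }
}
```

Declaring the field final without the copy would not be enough, since the caller could still modify the list it passed in.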

5. Safe Publication

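  • In code that runs on multiple threads, even simply publishing an object requires synchronization.
  • Immutable objects, however, get special treatment from the Java Memory Model
    The Java Memory Model offers a special initialization safety guarantee for immutable objects: a properly constructed immutable object can be accessed safely even if it was not published safely. Immutable objects can be used safely by any thread without additional synchronization, even when synchronization is not used to publish them.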
    This guarantee extends to the values of all final fields of properly constructed objects; final fields can be safely accessed without additional synchronization. However, if final fields refer to mutable objects, synchronization is still required to access the state of the objects they refer to.
  • Safe publication idioms
    To publish an object safely, both the reference to the object and the object’s state must be made visible to other threads at the same time. A properly constructed object can be safely published by:
    • Initializing an object reference from a static initializer;
    • Storing a reference to it into a volatile field or AtomicReference;
    • Storing a reference to it into a final field of a properly constructed object; or
    • Storing a reference to it into a field that is properly guarded by a lock.
  • Effectively immutable objects
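    An object that is technically mutable but whose state is never modified after publication is effectively immutable. Such an object only needs to be safely published (its state is already settled by the time the reference becomes visible to other threads); no further synchronization is needed to access it.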
  • Mutable objects
    The publication requirements depend on the object's mutability:
    • Immutable objects can be published through any mechanism;
    • Effectively immutable objects must be safely published;
    • Mutable objects must be safely published, and must be either thread-safe or guarded by a lock.
  • Sharing objects safely
    The most useful policies for using and sharing objects in a concurrent program are:
    • Thread-confined: A thread-confined object is owned exclusively by and confined to one thread, and can be modified by its owning thread.
    • Shared read-only: A shared read-only object can be accessed concurrently by multiple threads without additional synchronization, but cannot be modified by any thread. Shared read-only objects include immutable and effectively immutable objects.
    • Shared thread-safe: A thread-safe object performs synchronization internally, so multiple threads can freely access it through its public interface without further synchronization.
    • Guarded: A guarded object can be accessed only with a specific lock held. Guarded objects include those that are encapsulated within other thread-safe objects and published objects that are known to be guarded by a specific lock.
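The safe publication idioms listed in this section can be sketched as follows. Settings and Publishers are hypothetical names used only for illustration. Settings has a non-final field on purpose, so it is only effectively immutable and therefore depends on being published safely. The sketch covers the static initializer, volatile field, AtomicReference, and lock-guarded field idioms; the final-field idiom is omitted.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical effectively immutable object: never modified after
// construction, but its field is not final.
final class Settings {
    private int timeoutMillis;

    Settings(int timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    int timeoutMillis() { return timeoutMillis; }
}

// Hypothetical class showing several ways to publish a Settings safely.
final class Publishers {
    // Idiom: static initializer. Class initialization is synchronized by the JVM.
    static final Settings DEFAULTS = new Settings(1000);

    // Idiom: volatile field.
    private volatile Settings current = DEFAULTS;

    // Idiom: AtomicReference.
    private final AtomicReference<Settings> latest = new AtomicReference<>(DEFAULTS);

    // Idiom: field guarded by a lock (here, the intrinsic lock on "this").
    private Settings guarded = DEFAULTS;

    void publish(Settings s) {
        current = s;
        latest.set(s);
        synchronized (this) {
            guarded = s;
        }
    }

    Settings viaVolatile() { return current; }

    Settings viaAtomic() { return latest.get(); }

    synchronized Settings viaLock() { return guarded; }
}
```

A reader that obtains the reference through any of these paths also sees the fully initialized timeoutMillis value, which a plain non-volatile field would not guarantee.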

IV. Composing Objects

4.1 Designing a thread-safe class

The design process for a thread-safe class should include these three basic elements:

  • Identify the variables that form the object’s state;
  • Identify the invariants that constrain the state variables;
  • Establish a policy for managing concurrent access to the object’s state. (The synchronization policy describes how immutability, thread confinement, and locking are combined to keep the class thread-safe; document it so the code is easier to analyze and maintain.)

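On the second point, invariants:
In essence, the object must always hold a valid, sensible value and must never be assigned an invalid one.
This breaks down into two requirements:

  1. If an operation passes through intermediate states in which the object is invalid (for example, the object has several fields and the operation has updated only some of them so far), make the operation atomic so that no other thread can observe those intermediate states.
  2. If some values are not allowed, or some logic determines which values the object may take, encapsulate all modifications of the state so that nobody can set it to an arbitrary, invalid value.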

You cannot ensure thread safety without understanding an object’s invariants and postconditions. Constraints on the valid values or state transitions for state variables can create atomicity and encapsulation requirements.

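postconditions: conditions the object's state must satisfy after an operation modifies it;
preconditions: conditions the object's state must satisfy before an operation may modify it;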
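As an illustration of both requirements, the sketch below guards a two-field invariant (lower <= upper). The class name SafeRange is hypothetical. Every check-then-update runs under one lock, so the check and the write are atomic, and the fields are private, so the invariant cannot be bypassed.

```java
// Hypothetical class with the invariant lower <= upper.
final class SafeRange {
    private int lower; // guarded by "this"
    private int upper; // guarded by "this"

    SafeRange(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower > upper");
        }
        this.lower = lower;
        this.upper = upper;
    }

    // The check and the assignment happen under the same lock, so no other
    // thread can change upper between them.
    synchronized void setLower(int value) {
        if (value > upper) {
            throw new IllegalArgumentException("lower > upper");
        }
        lower = value;
    }

    synchronized void setUpper(int value) {
        if (value < lower) {
            throw new IllegalArgumentException("upper < lower");
        }
        upper = value;
    }

    // Reads both fields under the lock, so it never sees a half-updated range.
    synchronized boolean contains(int value) {
        return value >= lower && value <= upper;
    }
}
```

Replacing the two ints with two independent atomic variables would not be enough here, because the invariant relates the two values to each other.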

4.2 Instance confinement

4.3 Delegating thread safety

If a class is composed of multiple independent thread-safe state variables and has no operations that have any invalid state transitions, then it can delegate thread safety to the underlying state variables.

If a state variable is thread-safe, does not participate in any invariants that constrain its value, and has no prohibited state transitions for any of its operations, then it can safely be published.
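A minimal sketch of delegation, assuming two counters that are independent of each other (no invariant links them). The class name RequestStats and the helper recordInParallel are hypothetical. The class adds no locking of its own; its thread safety comes entirely from the two AtomicLong fields.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class that delegates thread safety to its state variables.
final class RequestStats {
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong errors = new AtomicLong();

    void recordHit() { hits.incrementAndGet(); }

    void recordError() { errors.incrementAndGet(); }

    long hits() { return hits.get(); }

    long errors() { return errors.get(); }

    // Demo helper: each thread records perThread hits and an error on every
    // even-numbered iteration, then the totals are returned.
    static RequestStats recordInParallel(int threads, int perThread) {
        RequestStats stats = new RequestStats();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    stats.recordHit();
                    if (j % 2 == 0) {
                        stats.recordError();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats;
    }
}
```

If the class needed to guarantee something like errors <= hits at every instant, delegation would no longer suffice and the two updates would have to be made atomic together.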

4.4 Adding functionality to existing thread-safe classes

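Adding new operations to an existing thread-safe class through inheritance is not a good idea: you must understand exactly how the original class achieves thread safety, the class may come without source code or documentation describing its synchronization policy, and a later version may change that policy. A better approach is composition: hold the original thread-safe class as a member variable and implement a separate thread-safe class (a wrapper or helper) around it.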
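The composition approach might look like the sketch below, which adds a put-if-absent operation to a list. The class name PutIfAbsentList is hypothetical. It uses its own intrinsic lock for every operation, so it does not depend on how (or whether) the wrapped list synchronizes. It assumes that, once wrapped, the underlying list is accessed only through the wrapper.

```java
import java.util.List;

// Hypothetical wrapper that adds an atomic put-if-absent via composition.
final class PutIfAbsentList<T> {
    // Assumption: after construction nobody else uses this list directly.
    private final List<T> list; // guarded by "this"

    PutIfAbsentList(List<T> list) {
        this.list = list;
    }

    // The contains check and the add run under one lock, so two threads
    // cannot both find the element absent and both add it.
    synchronized boolean putIfAbsent(T x) {
        boolean absent = !list.contains(x);
        if (absent) {
            list.add(x);
        }
        return absent;
    }

    synchronized boolean contains(T x) {
        return list.contains(x);
    }

    synchronized int size() {
        return list.size();
    }
}
```

The extra layer of locking costs a little performance, but the wrapper stays correct even if the wrapped class changes its own locking in a later version.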

4.5 Documenting synchronization policies

Document a class’s thread safety guarantees for its clients; document its synchronization policy for its maintainers.

A useful technique is the @GuardedBy annotation, which tells readers which lock must be held when accessing a given variable.

V. Building Blocks

5.1 Synchronized collections

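  • The synchronized collections are Vector, Hashtable, and the wrappers created by the Collections.synchronizedXxx factory methods. The individual methods of these collections are thread-safe, but compound actions (operations made of several steps, such as iterating over the collection) still need manual synchronization (client-side locking).
  • Iterators and ConcurrentModificationException
    For the synchronized collections, once an iterator has been obtained it checks, as the iteration proceeds, whether the collection has been modified since iteration began (the check is best-effort: a modification may go undetected). If a modification is detected, the iterator throws ConcurrentModificationException. To avoid this, either hold the collection's lock for the whole iteration, or clone the collection to get a thread-confined copy and iterate over the copy.
  • Hidden Iterators
    Some operations iterate over the collection implicitly, for example the collection's toString. These can throw ConcurrentModificationException as well, so take particular care with them.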
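Both ways of iterating safely can be sketched as follows, assuming the list was created with Collections.synchronizedList, whose documentation requires callers to synchronize on the returned list while traversing it. The class name SyncListIteration is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for iterating over a Collections.synchronizedList.
final class SyncListIteration {
    // Option 1: client-side locking. Holding the list's own lock for the
    // whole loop keeps other threads from modifying it mid-iteration.
    static int sumLocked(List<Integer> syncList) {
        int sum = 0;
        synchronized (syncList) {
            for (int n : syncList) {
                sum += n;
            }
        }
        return sum;
    }

    // Option 2: copy under the lock, then iterate over the private copy.
    // The lock is held only for the copy, not for the work done per element.
    static int sumSnapshot(List<Integer> syncList) {
        List<Integer> copy;
        synchronized (syncList) {
            copy = new ArrayList<>(syncList);
        }
        int sum = 0;
        for (int n : copy) {
            sum += n;
        }
        return sum;
    }
}
```

Option 1 blocks writers for the duration of the loop; option 2 trades that for the cost of the copy, which is usually worthwhile when the per-element work is slow.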

5.2 Concurrent collections

Java 5.0 introduced the concurrent collections, which offer higher throughput than the earlier synchronized collections.

ConcurrentHashMap

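  1. The higher throughput comes from an improved locking strategy: the synchronized collections guard every method with one common lock, whereas ConcurrentHashMap uses a finer-grained mechanism called lock striping.
  2. The iterators of the concurrent collections are weakly consistent rather than fail-fast like those of the synchronized collections. The collection may be modified while it is being traversed: the iterator does not throw ConcurrentModificationException, traverses the elements as they existed when it was created, and may (but is not guaranteed to) reflect modifications made afterwards.
  3. Operations such as size and isEmpty are weakened to return estimates, because the designers considered get, put, containsKey, and remove the more important operations for a concurrent collection.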

In most cases (unless you really need exclusive access to the whole map), replacing synchronizedMap or Hashtable with ConcurrentHashMap gives better performance and scalability.
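Because a ConcurrentHashMap cannot be locked as a whole by clients, compound actions have to go through its atomic methods instead of client-side locking. A minimal sketch, with the hypothetical class name WordCounter, uses merge to do an atomic read-modify-write per key:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical word counter built on ConcurrentHashMap's atomic methods.
final class WordCounter {
    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    // merge inserts 1 if the key is absent, otherwise adds 1 to the current
    // value, as a single atomic operation on that key.
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Demo helper: several threads add the same word concurrently.
    static WordCounter countInParallel(int threads, int perThread, String word) {
        WordCounter counter = new WordCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(word);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```

Writing the same update as a get followed by a put would lose increments under contention, since another thread could update the key between the two calls.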

CopyOnWriteArrayList

Blocking queues and the producer-consumer pattern

Executor implementations such as ThreadPoolExecutor hold their pending tasks in a BlockingQueue.

Bounded queues are a powerful resource management tool for building reliable applications: they make your program more robust to overload by throttling activities that threaten to produce more work than can be handled.

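Serial thread confinement: a form of thread confinement in which an object is owned by only one thread at a time. Ownership can be handed from one thread to the next (for example, through a blocking queue), after which the previous owner no longer touches the object.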
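A minimal producer-consumer sketch with a bounded queue. The class name BoundedPipeline, the method sumThroughQueue, and the POISON end-of-stream marker are hypothetical choices for this example. put blocks while the queue is full, which is the throttling effect described above, and each item is touched by the producer only before the handoff and by the consumer only after it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pipeline: one producer, one consumer, one bounded queue.
final class BoundedPipeline {
    private static final int POISON = -1; // end-of-stream marker, never a real item

    // Sends 1..items through a queue of the given capacity and returns the
    // sum computed by the consumer.
    static long sumThroughQueue(int items, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        long[] total = new long[1];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                // take blocks while the queue is empty
                for (int n = queue.take(); n != POISON; n = queue.take()) {
                    total[0] += n;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // join makes the consumer's total visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total[0];
    }
}
```

With a small capacity the producer can never get more than a few items ahead of the consumer, no matter how many items are sent in total.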