Java Concurrency: Causes of Deadlock and How to Resolve It

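What Is a Deadlock

A deadlock is a state in which two or more processes (or, in Java, threads) block one another during execution, either because they compete for resources or because they wait on each other's communication. Without outside intervention, none of them can make progress.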

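Causes of Deadlock

Exclusive lock: an exclusive lock can be held by only one thread at a time. Other threads that try to acquire it wait in the synchronization queue, and a successor can acquire the lock only after the holder releases it.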

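The four commonly cited conditions for deadlock are:

Mutual exclusion -> a property of exclusive locks.

Hold and wait -> a property of exclusive locks: a thread that tries to acquire a lock does not release the locks it already holds.

No preemption -> a property of exclusive locks.

Circular wait -> the only deadlock condition you actually need to memorize.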

The following code produces a deadlock:

public class DeadLockTest {

    public static void main(String[] args) {
        String lockA = "A";
        String lockB = "B";
        // Thread 1: takes lockA, sleeps 100 ms, then tries to take lockB.
        new Thread(() -> {
            synchronized (lockA) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                synchronized (lockB) {
                    System.out.println("this is b");
                }
            }
        }).start();
        // Thread 2: takes lockB, then tries to take lockA (the opposite order).
        new Thread(() -> {
            synchronized (lockB) {
                synchronized (lockA) {
                    System.out.println("this is a");
                }
            }
        }).start();
    }
}

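Thread 1 acquires lock A first and holds it for 100 ms. During that time, thread 2 acquires lock B and then tries to acquire lock A. Because A is held by thread 1, thread 2 blocks waiting for lock A. When thread 1 wakes up and tries to acquire lock B, B is still held by thread 2, so thread 1 blocks waiting for lock B. Both threads are now waiting, and neither will release the lock it already holds, so the program is deadlocked.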
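
For contrast, here is a minimal sketch of the same two-thread scenario in which both threads take the locks in the same order (A, then B), so no circular wait can form. The class name OrderedLockDemo, the runBoth helper, and the dedicated lock objects are assumptions made for this illustration; they are not part of the original program.

```java
import java.util.concurrent.atomic.AtomicInteger;

class OrderedLockDemo {
    // Dedicated lock objects (hypothetical names). Plain Objects are used instead of
    // String literals because literals are interned and could be shared with other code.
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    /** Runs two threads that both lock A before B and returns how many of them finished. */
    public static int runBoth() {
        AtomicInteger finished = new AtomicInteger();
        Runnable task = () -> {
            synchronized (LOCK_A) {        // every thread takes A first
                pause(100);
                synchronized (LOCK_B) {    // and B second
                    finished.incrementAndGet();
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            // Bounded joins so that the caller never hangs.
            t1.join(5_000);
            t2.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished.get();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads finished: " + runBoth());
    }
}
```

Whichever thread gets LOCK_A first runs to completion before the other can start, so both threads finish instead of blocking each other.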

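How to Reduce the Likelihood of Deadlock

  1. Breaking mutual exclusion means allowing several processes to access a resource at the same time. Whether this is possible depends on the actual scenario, and it is rarely practical.
  2. Breaking the no-preemption condition means allowing a process to forcibly take resources from their current holder. A simpler variant is that a process holding resources may not request more: it must release what it holds before making a new request. Suitable scenarios for this are also hard to find.
  3. Require a process to obtain all the resources it needs before it runs; otherwise it may not enter the ready state. This looks useful, but it can reduce resource utilization and concurrency.
  4. Avoid cycles in resource requests: number the resources in advance and always acquire them in numeric order. Compared with the approaches above, this gives better resource utilization and system throughput, but it adds system overhead and increases how long a process holds its resources.

Common ways to avoid deadlock in Java code:

  • Avoid having one thread acquire multiple locks at the same time.
  • Avoid having one thread hold multiple resources inside a lock; try to have each lock guard only one resource.
  • Try timed locks: use lock.tryLock(timeout, unit) instead of the intrinsic lock mechanism.
  • For database locks, locking and unlocking must happen on the same database connection; otherwise the unlock can fail.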
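
The timed-lock suggestion can be sketched as follows. This is only an illustration under stated assumptions: the class TryLockDemo, the helper withBothLocks, and the timeout values are hypothetical names and numbers chosen for the example. The only library calls used are ReentrantLock.tryLock(long, TimeUnit) and unlock() from java.util.concurrent.locks.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TryLockDemo {
    private static final ReentrantLock LOCK_A = new ReentrantLock();
    private static final ReentrantLock LOCK_B = new ReentrantLock();

    /**
     * Tries to take 'first' and then 'second', waiting at most timeoutMs for each.
     * Returns true if the action ran, false if a lock could not be obtained in time.
     */
    public static boolean withBothLocks(ReentrantLock first, ReentrantLock second,
                                        long timeoutMs, Runnable action) {
        try {
            if (!first.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false;
            }
            try {
                if (!second.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return false;          // give up; the outer finally releases 'first'
                }
                try {
                    action.run();
                    return true;
                } finally {
                    second.unlock();
                }
            } finally {
                first.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = withBothLocks(LOCK_A, LOCK_B, 200,
                () -> System.out.println("got both locks"));
        System.out.println("completed: " + ok);
    }
}
```

Because a thread that times out releases the lock it already holds, the hold-and-wait condition no longer lasts indefinitely. A caller that gets false would typically back off and retry, which this sketch leaves out.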

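When a Deadlock Occurs

Detecting Deadlock

  1. Use the jps tool

    jps is a small tool shipped with the JDK that lists the current Java processes. The name can be read as an abbreviation of Java Virtual Machine Process Status Tool. It is simple and practical.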

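    Command format: jps [options] [hostid]

    [options]:
    -q: output only the VM identifier, without the class name, JAR name, or arguments passed to the main method
    -m: output the arguments passed to the main method
    -l: output the full package name of the application's main class, or the full path of the application's JAR file
    -v: output the arguments passed to the JVM
    -V: output the arguments passed to the JVM through the flags file (the .hotspotrc file or the file specified by -XX:Flags=)
    -Joption: pass an option to the JVM, for example -J-Xms512m

    [hostid]:

    [protocol:][[//]hostname][:port][/servername]

    Output format:
    lvmid [[classname|JARfilename|"Unknown"][arg*][jvmarg*]]

    Here we run jps -l to find the process we are interested in: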

    D:\projects\my\demo>jps -l
    13696 com.caucho.server.resin.Resin
    14708 org.jetbrains.jps.cmdline.Launcher
    13168 com.caucho.server.resin.Resin
    14208 org.jetbrains.jps.cmdline.Launcher
    14240 org.jetbrains.idea.maven.server.RemoteMavenServer
    16412 com.skywater.demo.thread.art.one.DeadLockTest
    12620 org.jetbrains.jps.cmdline.Launcher
    12552 org.jetbrains.jps.cmdline.Launcher
    7560 com.caucho.server.resin.Resin
    4360 org.jetbrains.kotlin.daemon.KotlinCompileDaemon
    2424 org.codehaus.plexus.classworlds.launcher.Launcher
    15560
    15424 sun.tools.jps.Jps

    com.skywater.demo.thread.art.one.DeadLockTest corresponds to process ID 16412.

  2. Use jstack

    jstack is a stack trace tool that ships with the JDK.

    D:\projects\my\demo>jstack -h
    Usage:
    jstack [-l] <pid>
    (to connect to running process)
    jstack -F [-m] [-l] <pid>
    (to connect to a hung process)
    jstack [-m] [-l] <executable> <core>
    (to connect to a core file)
    jstack [-m] [-l] [server_id@]<remote server IP or hostname>
    (to connect to a remote debug server)

    Options:
    -F to force a thread dump. Use when jstack <pid> does not respond (process is hung)
    -m to print both java and native frames (mixed mode)
    -l long listing. Prints additional information about locks
    -h or -help to print this help message

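    jstack prints the Java stack traces for a given Java process ID, core file, or remote debug server. If the target process runs on a 64-bit VM, you may need to specify the -J-d64 option. On Windows, jstack supports only this form: jstack [-l] pid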

    D:\projects\my\demo>jstack 16412
    2020-06-08 16:44:04
    Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.112-b15 mixed mode):

    "DestroyJavaVM" #13 prio=5 os_prio=0 tid=0x000000000215a800 nid=0x3324 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "Thread-1" #12 prio=5 os_prio=0 tid=0x000000001e3b0800 nid=0x22c0 waiting for monitor entry [0x000000001ecbf000]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at com.skywater.demo.thread.art.one.DeadLockTest.lambda$main$1(DeadLockTest.java:41)
    - waiting to lock <0x000000076b8a9318> (a java.lang.String)
    - locked <0x000000076b8a9348> (a java.lang.String)
    at com.skywater.demo.thread.art.one.DeadLockTest$$Lambda$2/812265671.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:745)

    "Thread-0" #11 prio=5 os_prio=0 tid=0x000000001e3b0000 nid=0x43f4 waiting for monitor entry [0x000000001eb6f000]
    java.lang.Thread.State: BLOCKED (on object monitor)
    at com.skywater.demo.thread.art.one.DeadLockTest.lambda$main$0(DeadLockTest.java:29)
    - waiting to lock <0x000000076b8a9348> (a java.lang.String)
    - locked <0x000000076b8a9318> (a java.lang.String)
    at com.skywater.demo.thread.art.one.DeadLockTest$$Lambda$1/94438417.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:745)

    "Service Thread" #10 daemon prio=9 os_prio=0 tid=0x000000001d4c0000 nid=0x3844 runnable [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "C1 CompilerThread2" #9 daemon prio=9 os_prio=2 tid=0x000000001d421000 nid=0xc90 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "C2 CompilerThread1" #8 daemon prio=9 os_prio=2 tid=0x000000001d434000 nid=0x3d14 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "C2 CompilerThread0" #7 daemon prio=9 os_prio=2 tid=0x000000001d433800 nid=0x3120 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "Monitor Ctrl-Break" #6 daemon prio=5 os_prio=0 tid=0x000000001d432000 nid=0x2898 runnable [0x000000001d8ce000]
    java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    - locked <0x000000076b64f008> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    - locked <0x000000076b64f008> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:61)

    "Attach Listener" #5 daemon prio=5 os_prio=2 tid=0x000000001d10a800 nid=0x3618 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" #4 daemon prio=9 os_prio=2 tid=0x000000001be1f000 nid=0xc48 runnable [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" #3 daemon prio=8 os_prio=1 tid=0x000000001bdf7000 nid=0x25fc in Object.wait() [0x000000001cf8e000]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000076b388e98> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    - locked <0x000000076b388e98> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

    "Reference Handler" #2 daemon prio=10 os_prio=2 tid=0x000000001bdb5800 nid=0x45a8 in Object.wait() [0x000000001d0fe000]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000076b386b40> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    - locked <0x000000076b386b40> (a java.lang.ref.Reference$Lock)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

    "VM Thread" os_prio=2 tid=0x000000001bdae000 nid=0x4018 runnable

    "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x0000000002170000 nid=0x18b4 runnable

    "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x0000000002171800 nid=0x331c runnable

    "GC task thread#2 (ParallelGC)" os_prio=0 tid=0x0000000002173000 nid=0x4154 runnable

    "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x0000000002174800 nid=0x3888 runnable

    "VM Periodic Task Thread" os_prio=2 tid=0x000000001e14a000 nid=0x31bc waiting on condition

    JNI global references: 335


    Found one Java-level deadlock:
    =============================
    "Thread-1":
    waiting to lock monitor 0x000000001be02208 (object 0x000000076b8a9318, a java.lang.String),
    which is held by "Thread-0"
    "Thread-0":
    waiting to lock monitor 0x000000001be036a8 (object 0x000000076b8a9348, a java.lang.String),
    which is held by "Thread-1"

    Java stack information for the threads listed above:
    ===================================================
    "Thread-1":
    at com.skywater.demo.thread.art.one.DeadLockTest.lambda$main$1(DeadLockTest.java:41)
    - waiting to lock <0x000000076b8a9318> (a java.lang.String)
    - locked <0x000000076b8a9348> (a java.lang.String)
    at com.skywater.demo.thread.art.one.DeadLockTest$$Lambda$2/812265671.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:745)
    "Thread-0":
    at com.skywater.demo.thread.art.one.DeadLockTest.lambda$main$0(DeadLockTest.java:29)
    - waiting to lock <0x000000076b8a9348> (a java.lang.String)
    - locked <0x000000076b8a9318> (a java.lang.String)
    at com.skywater.demo.thread.art.one.DeadLockTest$$Lambda$1/94438417.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:745)

    Found 1 deadlock.

    The dump ends with Found 1 deadlock., which shows that one deadlock was detected.
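
The same check can also be made from inside the JVM. The sketch below is an addition to the jps/jstack workflow above, not something taken from the original text: it uses the standard ThreadMXBean.findDeadlockedThreads() call from java.lang.management, while the class name DeadlockProbe and the method describeDeadlocks are hypothetical names chosen for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class DeadlockProbe {
    /** Returns one description per deadlocked thread, or an empty array if none is found. */
    public static String[] describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads();   // null when no deadlock is detected
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = bean.getThreadInfo(ids);
        String[] lines = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            ThreadInfo info = infos[i];
            // An entry is null if the thread ended between the two calls.
            lines[i] = (info == null)
                    ? "(thread no longer alive)"
                    : info.getThreadName() + " is waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName();
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] report = describeDeadlocks();
        if (report.length == 0) {
            System.out.println("no deadlock detected");
        }
        for (String line : report) {
            System.out.println(line);
        }
    }
}
```

A monitoring thread could call describeDeadlocks() periodically and log the result. In a JVM with no deadlocked threads it returns an empty array.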

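Recovering from Deadlock

Once deadlock detection has found a deadlock, it has to be removed so that the system can recover from the deadlocked state.

  1. The simplest and most common approach is to restart the system. The cost is high: all computation completed so far is lost, both for the processes involved in the deadlock and for those that were not.
  2. Terminate processes and reclaim resources. Terminating the processes involved in the deadlock and taking back their resources breaks the deadlock. There are two variants: terminate all participating processes at once and reclaim everything, or terminate them one at a time and reclaim resources gradually. When terminating one at a time, victims should be chosen by some rule so that the cheapest processes go first, for example by process priority, by the cost of the work a process has already done, or by the cost of external jobs that depend on it.
  3. Roll back processes: return the participating processes to a point before the deadlock occurred and resume from there, in the hope that the deadlock does not recur. This is the most attractive option in theory, but the overhead is very high: the system needs a stack-like mechanism that records every step of each process so that it can be rolled back later, and sometimes that is not feasible.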
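
The list above is phrased in terms of operating-system processes. Within a single JVM, a rough analog of option 2 is possible only when threads block in an interruptible way: a thread blocked on synchronized does not respond to interruption, whereas one blocked in ReentrantLock.lockInterruptibly() does. The sketch below builds a two-thread deadlock on purpose and then interrupts one participant so that it backs out and releases its lock. The class InterruptRecoveryDemo and its method names are hypothetical, and choosing the second thread as the victim is an arbitrary assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class InterruptRecoveryDemo {
    /** Creates a deadlock, interrupts one participant, and reports whether both threads exit. */
    public static boolean deadlockThenRecover() {
        ReentrantLock lockA = new ReentrantLock();
        ReentrantLock lockB = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirst));
        Thread t2 = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirst));
        t1.start();
        t2.start();
        try {
            bothHoldFirst.await();   // each thread now holds its first lock
            Thread.sleep(100);       // give them time to block on the second lock
            t2.interrupt();          // the victim: it backs out and releases lockB
            t1.join(5_000);
            t2.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t1.isAlive() && !t2.isAlive();
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch latch) {
        try {
            first.lockInterruptibly();
            try {
                latch.countDown();
                latch.await();               // wait until the other thread holds its first lock
                second.lockInterruptibly();  // circular wait: blocks until the victim gives up
                second.unlock();
            } finally {
                first.unlock();
            }
        } catch (InterruptedException e) {
            // Victim path: the finally block above has already released the first lock.
        }
    }

    public static void main(String[] args) {
        System.out.println("recovered: " + deadlockThenRecover());
    }
}
```

The interrupted thread loses whatever work it was doing under the lock, which mirrors the cost of terminating a process described in option 2.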
