Log4j2 在启用 JDBC+Async Appender 情况下死锁

issue url

issue: druid issue #5414

现象

发生系统无法响应情况

分析

观察 javacorelog4j2.xml 配置,发现存在通过日志直接插数据库配置。

死锁

  1. Log4j2-TF-3-AsyncLoggerConfig-2 线程(下称线程A)要消费 queue 写数据库,需要数据库连接。数据库连接都被用光或过期了,所以需要调用 Druid-ConnectionPool-Create--1146771917 线程(下称线程B)创建连接。这里存在线程A等待线程B

    3XMTHREADINFO      "Log4j2-TF-3-AsyncLoggerConfig-2" J9VMThread:0x0000000003EB5800, omrthread_t:0x00007F9E58A39D30, java/lang/Thread:0x000000068845D290, state:P, prio=5
    3XMJAVALTHREAD (java/lang/Thread getId:0x8C, isDaemon:true)
    3XMTHREADINFO1 (native thread ID:0x1B344, native priority:0x5, native policy:UNKNOWN, vmstate:P, vm thread flags:0x00020081)
    3XMTHREADINFO2 (native stack address range from:0x00007F9EB8E35000, to:0x00007F9EB8E76000, size:0x41000)
    3XMCPUTIME CPU usage total: 4147.834012773 secs, current category="Application"
    3XMTHREADBLOCK Parked on: java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x000000069AF44CF0 Owned by: <unknown>
    3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
    3XMTHREADINFO3 Java callstack:
    4XESTACKTRACE at sun/misc/Unsafe.park(Native Method)
    4XESTACKTRACE at java/util/concurrent/locks/LockSupport.park(LockSupport.java:186(Compiled Code))
    4XESTACKTRACE at java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2050(Compiled Code))
    4XESTACKTRACE at com/alibaba/druid/pool/DruidDataSource.takeLast(DruidDataSource.java:2218)
    4XESTACKTRACE at com/alibaba/druid/pool/DruidDataSource.getConnectionInternal(DruidDataSource.java:1690)
    4XESTACKTRACE at com/alibaba/druid/pool/DruidDataSource.getConnectionDirect(DruidDataSource.java:1427)
    4XESTACKTRACE at com/alibaba/druid/pool/DruidDataSource.getConnection(DruidDataSource.java:1407)
    4XESTACKTRACE at com/alibaba/druid/pool/DruidDataSource.getConnection(DruidDataSource.java:1397)
    4XESTACKTRACE at com/alibaba/druid/pool/DruidDataSource.getConnection(DruidDataSource.java:100)
    4XESTACKTRACE at xxx/xxx/xxx/ConnectionFactoryConfig.getDatabaseConnection(ConnectionFactoryConfig.java:52)
    4XESTACKTRACE at sun/reflect/NativeMethodAccessorImpl.invoke0(Native Method)
  2. 线程B在运行过程中需要输出日志,被 NioProcessor-1(下称线程C)阻塞

    3XMTHREADINFO      "Druid-ConnectionPool-Create--1146771917" J9VMThread:0x0000000007CBFC00, omrthread_t:0x00007F9E58866C28, java/lang/Thread:0x000000069AF57EB0, state:B, prio=5
    3XMJAVALTHREAD (java/lang/Thread getId:0xA94E, isDaemon:true)
    3XMTHREADINFO1 (native thread ID:0x11981, native priority:0x5, native policy:UNKNOWN, vmstate:B, vm thread flags:0x00000281)
    3XMTHREADINFO2 (native stack address range from:0x00007F9DA8CB4000, to:0x00007F9DA8CF5000, size:0x41000)
    3XMCPUTIME CPU usage total: 13.888869677 secs, current category="Application"
    3XMTHREADBLOCK Blocked on: java/lang/Object@0x00000006884A0560 Owned by: "NioProcessor-1" (J9VMThread:0x0000000007C39700, java/lang/Thread:0x00000006B5BBAE18)
    3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
    3XMTHREADINFO3 Java callstack:
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfigDisruptor.enqueue(AsyncLoggerConfigDisruptor.java:356)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfigDisruptor.enqueueEvent(AsyncLoggerConfigDisruptor.java:311)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.logInBackgroundThread(AsyncLoggerConfig.java:158)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/EventRoute$1.logMessage(EventRoute.java:46)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.handleQueueFull(AsyncLoggerConfig.java:148)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.logToAsyncDelegate(AsyncLoggerConfig.java:135(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.log(AsyncLoggerConfig.java:115(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/config/LoggerConfig.log(LoggerConfig.java:388(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/config/AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/Logger.logMessage(Logger.java:153(Compiled Code))
    4XESTACKTRACE at org/apache/logging/slf4j/Log4jLogger.log(Log4jLogger.java:376(Compiled Code))
    4XESTACKTRACE at com/alibaba/druid/support/logging/SLF4JImpl.error(SLF4JImpl.java:54(Compiled Code))
    4XESTACKTRACE at com/alibaba/druid/pool/DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2840)
  3. 线程C持有锁,也要写日志,但 queue 满了,必须等待

    3XMTHREADINFO      "NioProcessor-1" J9VMThread:0x0000000007C39700, omrthread_t:0x00007F9E940D1308, java/lang/Thread:0x00000006B5BBAE18, state:R, prio=5
    3XMJAVALTHREAD (java/lang/Thread getId:0xAE4A, isDaemon:false)
    3XMTHREADINFO1 (native thread ID:0xE08F, native priority:0x5, native policy:UNKNOWN, vmstate:CW, vm thread flags:0x00000081)
    3XMTHREADINFO2 (native stack address range from:0x00007F9DA920E000, to:0x00007F9DA924F000, size:0x41000)
    3XMCPUTIME CPU usage total: 83.480989527 secs, current category="Application"
    3XMHEAPALLOC Heap bytes allocated since last GC cycle=0 (0x0)
    3XMTHREADINFO3 Java callstack:
    4XESTACKTRACE at sun/misc/Unsafe.park(Native Method)
    4XESTACKTRACE at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:349(Compiled Code))
    4XESTACKTRACE at com/lmax/disruptor/MultiProducerSequencer.next(MultiProducerSequencer.java:137(Compiled Code))
    4XESTACKTRACE at com/lmax/disruptor/MultiProducerSequencer.next(MultiProducerSequencer.java:105)
    4XESTACKTRACE at com/lmax/disruptor/RingBuffer.publishEvent(RingBuffer.java:524)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfigDisruptor.enqueue(AsyncLoggerConfigDisruptor.java:357)
    5XESTACKTRACE (entered lock: java/lang/Object@0x00000006884A0560, entry count: 1)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfigDisruptor.enqueueEvent(AsyncLoggerConfigDisruptor.java:311)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.logInBackgroundThread(AsyncLoggerConfig.java:158)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/EventRoute$1.logMessage(EventRoute.java:46)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.handleQueueFull(AsyncLoggerConfig.java:148)
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.logToAsyncDelegate(AsyncLoggerConfig.java:135(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/async/AsyncLoggerConfig.log(AsyncLoggerConfig.java:115(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/config/LoggerConfig.log(LoggerConfig.java:388(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/config/AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63(Compiled Code))
    4XESTACKTRACE at org/apache/logging/log4j/core/Logger.logMessage(Logger.java:153(Compiled Code))
    4XESTACKTRACE at org/apache/log4j/Category.maybeLog(Category.java:533(Compiled Code))
    4XESTACKTRACE at org/apache/log4j/Category.debug(Category.java:251(Compiled Code))

由此构成由三个线程死锁

解决

对数据库日志由异步改为同步

备注

需要非常巧合的场合才能出现这种死锁。我尝试构建最小错误集,没有构建出来。
发布在这里,对大家做个提醒。

出现这个死锁的原因是 Druid 创建连接是通过额外线程创建的,而额外线程也需要写日志,这就构成了死锁的潜在可能。最终触发。