
Deadlock in IvasModeRunner.py

Commit SHA to replicate: ca31cb52 (for example)

During development of the CI system, and specifically the smoke tests, we found that IvasModeRunner.py can end up in a deadlock. In previous use this was probably very rare, but the short inputs of the smoke tests combined with high concurrency (over 50 threads) on some runners have exposed it.

The problem can be replicated at least on Ubuntu 18.04 with Python 3.8.10, using the following edited command from smoke_test.sh:

./scripts/runIvasCodec.py -p ./scripts/config/ci_linux.json -m $list -U 1 -t 200 | tee smoke_test_output.txt

Increasing the thread count above 200 would probably trigger the deadlock even more reliably, but 200 already ended in a deadlock almost every time on our platform. With our default thread count of 16, it took somewhere between 50 and 100 runs to hit the deadlock.
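Since the default thread count needs many runs before the hang shows up, a small wrapper that reruns a command until it exceeds a time limit can help. This is only a sketch: the function name run_until_hang, the timeout and the run count are made up for illustration, and the stand-in commands below replace the real runIvasCodec.py invocation.

```python
import subprocess
import sys


def run_until_hang(cmd, timeout_s, max_runs):
    """Run cmd repeatedly; return the 1-based index of the first run that
    exceeds timeout_s seconds, or None if every run finishes in time."""
    for run in range(1, max_runs + 1):
        try:
            # Output is discarded so that a full pipe cannot stall the child.
            subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this.
            return run
    return None


if __name__ == "__main__":
    # Stand-in commands; replace with the real command as an argument list.
    quick = [sys.executable, "-c", "pass"]
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_until_hang(quick, timeout_s=60, max_runs=3))
    print(run_until_hang(stuck, timeout_s=1, max_runs=3))
```

Note that subprocess.run() kills the timed-out process, so this only detects the hang. To attach GDB, the hung process has to be left running instead.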

The problem seems to be limited to the normal encoder-decoder run; a run with the --decoder_only option did not deadlock.

A GDB trace taken during the deadlock gives this:

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File "/usr/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 236, in shutdown
    t.join()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 644, in __exit__
    self.shutdown(wait=True)
  File "/home/gitlab-runner/debugtemp/ivas-pc-repo/scripts/pyivastest/IvasModeRunner.py", line 2632, in run_enc_dec_threads
  File "/home/gitlab-runner/debugtemp/ivas-pc-repo/scripts/pyivastest/IvasModeRunner.py", line 1513, in run
    self.run_enc_dec_threads()
  File "./scripts/runIvasCodec.py", line 908, in run
  File "./scripts/runIvasCodec.py", line 661, in <module>

and interrupting the process gives this trace:

  File "./scripts/runIvasCodec.py", line 149, in <module>
    script.run()
  File "./scripts/runIvasCodec.py", line 140, in run
    runner.run()
  File "/home/gitlab-runner/debugtemp/ivas-pc-repo/scripts/pyivastest/IvasModeRunner.py", line 1513, in run
    self.run_enc_dec_threads()
  File "/home/gitlab-runner/debugtemp/ivas-pc-repo/scripts/pyivastest/IvasModeRunner.py", line 1608, in run_enc_dec_threads
    self.dec_queue["condition"].release()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 644, in __exit__
    self.shutdown(wait=True)
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 236, in shutdown
    t.join()
  File "/usr/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
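Both traces show only the main thread waiting in join(); they do not show where the worker threads are stuck. One way to see that, assuming the script can be edited for debugging, is the standard-library faulthandler module, which can dump the stack of every Python thread. The helper name dump_all_thread_stacks and the blocked_worker stand-in are hypothetical and not part of IvasModeRunner.py.

```python
import faulthandler
import tempfile
import threading


def dump_all_thread_stacks():
    """Return the current stack of every Python thread as text."""
    # faulthandler writes to a file descriptor, so a real file is needed.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


def blocked_worker(started, release):
    """Stand-in for a worker thread that is stuck waiting on something."""
    started.set()
    release.wait()


started, release = threading.Event(), threading.Event()
thread = threading.Thread(target=blocked_worker, args=(started, release))
thread.start()
started.wait()  # make sure the worker is inside blocked_worker

report = dump_all_thread_stacks()

release.set()  # let the stand-in worker finish
thread.join()
print(report)
```

On Unix, faulthandler.register(signal.SIGUSR1, all_threads=True) installs the same dump as a signal handler, so a hung process can be asked for its thread stacks from outside.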

Looking at the code, the deadlock appears to be related to the lock self.dec_queue["condition"], which is acquired at line 1596 of IvasModeRunner.py, and the break at line 1600 of the same file. My guess is that the lock should at least be released before that break. However, when I tested adding the release, the progress output was wrong, so more work is needed to fix this properly.
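To illustrate the suspected pattern, here is a minimal sketch of worker threads sharing a condition inside a ThreadPoolExecutor. It is not the actual IvasModeRunner.py code: run_pipeline, the queue dictionary layout and the worker function are assumptions made for illustration. Using the condition as a context manager releases the lock on every exit path, including break. A bare acquire() followed by break would leave the lock held, the remaining workers would block on it, and the join() in shutdown(wait=True) would never return.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(n_items, n_workers=4):
    """Feed n_items through worker threads; return how many were processed."""
    # Hypothetical stand-in for self.dec_queue: a condition plus shared state.
    queue = {"condition": threading.Condition(), "items": [], "done": False}
    processed = []
    processed_lock = threading.Lock()

    def worker():
        while True:
            # "with" releases the lock on every exit path, including break.
            with queue["condition"]:
                while not queue["items"] and not queue["done"]:
                    queue["condition"].wait()
                if not queue["items"]:
                    break  # no work left; the lock is released on the way out
                item = queue["items"].pop()
            # The item is handled outside the condition lock.
            with processed_lock:
                processed.append(item)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_workers):
            pool.submit(worker)
        for i in range(n_items):
            with queue["condition"]:
                queue["items"].append(i)
                queue["condition"].notify()
        with queue["condition"]:
            queue["done"] = True
            queue["condition"].notify_all()
    # Leaving the executor block calls shutdown(wait=True), which joins the workers.
    return len(processed)


if __name__ == "__main__":
    print(run_pipeline(100, n_workers=8))
```

This sketch only covers the lock handling; it does not model the progress reporting, which is where the simple release went wrong in testing.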

Edited by Tapani Pihlajakuja