Skip to content

Repeated calls to auto_close_old_connections() in a single request can cause a MySQLdb.OperationalError #1793

@marktriggs

Description

@marktriggs

Expected behaviour

Within a single request, a connection to MySQL shouldn't be closed while still in use.

Current behaviour

We recently hit an issue where MCPServer unexpectedly threw an exception during processing, reporting MySQLdb.OperationalError: (2006, '') (which is "mysql server has gone away"). Usually this happens when MySQL gets restarted or the underlying connection is closed for some other reason, but in this case there was no system-level reason for this to have happened.

After some digging, I eventually worked out that the culprit was auto_close_old_connections() and a pretty large dose of bad luck. The root cause is auto_close_old_connections() being called more than once during the same request, when the request happens to start just before the underlying database connection is due to be expired by Django's CONN_MAX_AGE setting (which defaults to 1 hour).

In my case, I think the sequence of events was:

  • A request starts, which calls auto_close_old_connections() up front. No problem there, because the connection is a mere 59 minutes and 59 seconds old.
  • In lib/server/jobs/client.py, we find ourselves in submit_tasks in a loop that is iterating over the self.package.files result set.
  • Within the body of that loop, we call self.task_backend.submit_task, which ultimately calls the submit method of server/tasks/backends/gearman_backend.py. In turn, that method calls Task.bulk_log, which is decorated with @auto_close_old_connections()
  • Now our underlying database connection is over an hour old, and auto_close_old_connections() closes it.
  • But eventually we return back to our loop that is still iterating over self.package.files. It finishes successfully and attempts to close the underlying result set, but finds that its database connection has gone away.

The stack trace we see when that happens:

Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/usr/lib/archivematica/MCPServer/server/queues.py", line 220, in _job_completed_callback
    next_job = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/lib/archivematica/MCPServer/server/jobs/client.py", line 122, in run
    self.submit_tasks()
  File "/usr/lib/archivematica/MCPServer/server/jobs/client.py", line 225, in submit_tasks
    for file_replacements in self.package.files(
  File "/usr/lib/archivematica/MCPServer/server/packages.py", line 663, in files
    for file_obj in queryset.iterator():
  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.10/site-packages/django/db/models/query.py", line 516, in _iterator
    yield from iterable
  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.10/site-packages/django/db/models/query.py", line 121, in __iter__
    for row in compiler.results_iter(results):
  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1496, in apply_converters
    for row in map(list, rows):
  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 2096, in cursor_iter
    cursor.close()
  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.10/site-packages/MySQLdb/cursors.py", line 103, in close
    self._discard()
  File "/usr/share/archivematica/virtualenvs/archivematica/lib/python3.10/site-packages/MySQLdb/cursors.py", line 95, in _discard
    while con.next_result() == 0:  # -1 means no more data.
MySQLdb.OperationalError: (2006, '')

There are a couple of other issues that might not be exactly the same bug, but have the same kind of shape:
#1758 #1750

Steps to reproduce

It's difficult to reproduce in the wild because you do have to get very unlucky :) In my development setup, I was able to replicate the issue by:

  • Changing my CONN_MAX_AGE setting to 1 to get it turning over connections after 1 second
  • Adding a sleep statement to the body of lib/server/jobs/client.py's submit_tasks to slow down the loop enough to make hitting the bug likely.
  • Submitting a transfer as normal

Your environment (version of Archivematica, operating system, other relevant details)

I'm running 1:1.17.0-1~22.04 under Ubuntu Jammy (Ubuntu 22.04.5 LTS)


For Artefactual use:

Before you close this issue, you must check off the following:

  • All pull requests related to this issue are properly linked
  • All pull requests related to this issue have been merged
  • A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
  • Documentation regarding this issue has been written and merged (if applicable)
  • Details about this issue have been added to the release notes (if applicable)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: bugA flaw in the code that causes the software to produce an incorrect or unexpected result.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions