I've been chasing this for a year or so (across various versions of Python and futures—this time 2.7.6 and 3.0.3) and finally went through the rigamarole of settings up the Python gdb tools to get some decent tracebacks out of it. In short, during large jobs with thousands of tasks, execution sometimes hangs. It runs for about an hour, getting somewhere between 11-17% done in the current reproduction; conveniently, I have a progress bar. The variation makes me think it's some kind of timing bug. The CPU use slowly falls down to 0 as the worker processes complete and no new ones are scheduled to replace them. I end up with a process table like this:
585 ? 00:01:58 dxr
605 ? 01:03:30 dxr <defunct>
606 ? 00:22:48 dxr <defunct>
607 ? 00:01:50 dxr <defunct>
609 ? 00:17:43 dxr <defunct>
The defunct processes are the workers. Adding -L
, we can see the threads futures spins up to coordinate the work distribution:
585 585 ? 00:00:39 dxr
585 603 ? 00:00:00 dxr
585 604 ? 00:00:00 dxr
585 608 ? 00:01:16 dxr
605 605 ? 01:03:30 dxr <defunct>
606 606 ? 00:22:48 dxr <defunct>
607 607 ? 00:01:50 dxr <defunct>
609 609 ? 00:17:43 dxr <defunct>
I don't know why there are only 3 of them, when my process pool is of size 4. Maybe that's a clue?
The Python traceback, from attaching with gdb and using its Python tools, looks like this:
(gdb) py-bt
#5 Frame 0x23c2e90, for file /usr/lib/python2.7/threading.py, line 339, in wait (self=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f2716b74ea0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f2716b74ea0>, _Condition__waiters=[<thread.lock at remote 0x7f2716b74e00>], release=<built-in method release of thread.lock object at remote 0x7f2716b74ea0>) at remote 0x7f2716b5c920>, timeout=None, waiter=<thread.lock at remote 0x7f2716b74e00>, saved_state=None)
waiter.acquire()
#9 Frame 0x7f272708f460, for file /usr/lib/python2.7/threading.py, line 620, in wait (self=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f2716b74ea0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f2716b74ea0>, _Condition__waiters=[<thread.lock at remote 0x7f2716b74e00>], release=<built-in method release of thread.lock object at remote 0x7f2716b74ea0>) at remote 0x7f2716b5c920>) at remote 0x7f2716b5c840>, timeout=None)
self.__cond.wait(timeout)
#13 Frame 0x24189e0, for file /code/dbg-venv/local/lib/python2.7/site-packages/concurrent/futures/_base.py, line 217, in as_completed (fs=set([<...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <.....(truncated)
waiter.event.wait(wait_timeout)
#19 Frame 0x2424df0, for file /code/dbg-venv/local/lib/python2.7/site-packages/click/_termui_impl.py, line 240, in next (self=<ProgressBar(color=None, pos=47, length_known=True, max_width=62, file=<_NonClosingTextIOWrapper(_stream=<_FixupStream(_stream=<file at remote 0x7f272f2751c0>) at remote 0x7f27270a6fb0>) at remote 0x7f272b1edc60>, is_hidden=False, avg=[<float at remote 0x25d2558>, <float at remote 0x25d2530>, <float at remote 0x25d2508>, <float at remote 0x25d24e0>, <float at remote 0x25d24b8>, <float at remote 0x25d25a8>, <float at remote 0x25d25d0>], last_eta=<float at remote 0x25d2468>, width=36, info_sep=' ', bar_template='%(label)-18s [%(bar)s] %(info)s', label='Indexing files', empty_char='-', start=<float at remote 0x2360e88>, entered=True, item_show_func=None, autowidth=False, show_percent=None, show_pos=False, finished=False, fill_char='#', eta_known=True, show_eta=False, iter=<generator at remote 0x7f2716b4cb60>, length=273, current_item=<Future(_exception=None, _result=None, _condition=<_Condit...(truncated)
rv = next(self.iter)
#27 Frame 0x24187a0, for file /home/dxr/dxr/dxr/build.py, line 325, in show_progress (futures=[<Future(_exception=None, _result=None, _condition=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f272bb3bc20>, _RLock__count=0) at remote 0x7f27270a6f40>, acquire=<instancemethod at remote 0x7f27270ee9e0>, _is_owned=<instancemethod at remote 0x7f27270ee4e0>, _release_save=<instancemethod at remote 0x7f27270ee8e0>, release=<instancemethod at remote 0x7f27270ee5e0>, _acquire_restore=<instancemethod at remote 0x7f27270eeae0>, _Verbose__verbose=False, _Condition__waiters=[]) at remote 0x7f272f13d4c0>, _state='RUNNING', _traceback=None, _waiters=[<_AsCompletedWaiter(lock=<thread.lock at remote 0x7f2716b74f40>, event=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f2716b74ea0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f2716b74ea0...(truncated)
for future in bar:
#30 Frame 0x2411980, for file /home/dxr/dxr/dxr/build.py, line 638, in index_files (tree=<TreeConfig(_section={'ignore_filenames': ['.hg', '.git', 'CVS', '.svn', '.bzr', '.deps', '.libs', '.DS_Store', '.nfs*', '*~', '._*'], 'build_command': 'cd $source_folder && ./mach clobber && make -f client.mk build MOZ_OBJDIR=$object_folder MOZ_MAKE_FLAGS="-s -j$jobs"', 'description': '', 'core': {}, 'enabled_plugins': [<Plugin(tree_to_index=<type at remote 0x2325170>, file_to_skim=None, filters=[<type at remote 0x232e220>, <type at remote 0x2324270>, <type at remote 0x2323e80>, <type at remote 0x2324d80>, <type at remote 0x233b890>, <type at remote 0x2324990>], analyzers={'tokenizer': {'trigram_tokenizer': {'max_gram': 3, 'type': 'nGram', 'min_gram': 3}}, 'analyzer': {'trigramalyzer': {'type': 'custom', 'tokenizer': 'trigram_tokenizer'}, 'lowercase': {'filter': ['lowercase'], 'type': 'custom', 'tokenizer': 'keyword'}, 'trigramalyzer_lower': {'filter': ['lowercase'], 'type': 'custom', 'tokenizer': 'trigram_tokenizer'}}}, ref...(truncated)
for future in show_progress(futures, 'Indexing files'):
#33 Frame 0x231ea40, for file /home/dxr/dxr/dxr/build.py, line 275, in index_tree (tree=<TreeConfig(_section={'ignore_filenames': ['.hg', '.git', 'CVS', '.svn', '.bzr', '.deps', '.libs', '.DS_Store', '.nfs*', '*~', '._*'], 'build_command': 'cd $source_folder && ./mach clobber && make -f client.mk build MOZ_OBJDIR=$object_folder MOZ_MAKE_FLAGS="-s -j$jobs"', 'description': '', 'core': {}, 'enabled_plugins': [<Plugin(tree_to_index=<type at remote 0x2325170>, file_to_skim=None, filters=[<type at remote 0x232e220>, <type at remote 0x2324270>, <type at remote 0x2323e80>, <type at remote 0x2324d80>, <type at remote 0x233b890>, <type at remote 0x2324990>], analyzers={'tokenizer': {'trigram_tokenizer': {'max_gram': 3, 'type': 'nGram', 'min_gram': 3}}, 'analyzer': {'trigramalyzer': {'type': 'custom', 'tokenizer': 'trigram_tokenizer'}, 'lowercase': {'filter': ['lowercase'], 'type': 'custom', 'tokenizer': 'keyword'}, 'trigramalyzer_lower': {'filter': ['lowercase'], 'type': 'custom', 'tokenizer': 'trigram_tokenizer'}}}, refs...(truncated)
index_files(tree, tree_indexers, index, pool, es)
#37 Frame 0x23ea910, for file /home/dxr/dxr/dxr/build.py, line 64, in index_and_deploy_tree (tree=<TreeConfig(_section={'ignore_filenames': ['.hg', '.git', 'CVS', '.svn', '.bzr', '.deps', '.libs', '.DS_Store', '.nfs*', '*~', '._*'], 'build_command': 'cd $source_folder && ./mach clobber && make -f client.mk build MOZ_OBJDIR=$object_folder MOZ_MAKE_FLAGS="-s -j$jobs"', 'description': '', 'core': {}, 'enabled_plugins': [<Plugin(tree_to_index=<type at remote 0x2325170>, file_to_skim=None, filters=[<type at remote 0x232e220>, <type at remote 0x2324270>, <type at remote 0x2323e80>, <type at remote 0x2324d80>, <type at remote 0x233b890>, <type at remote 0x2324990>], analyzers={'tokenizer': {'trigram_tokenizer': {'max_gram': 3, 'type': 'nGram', 'min_gram': 3}}, 'analyzer': {'trigramalyzer': {'type': 'custom', 'tokenizer': 'trigram_tokenizer'}, 'lowercase': {'filter': ['lowercase'], 'type': 'custom', 'tokenizer': 'keyword'}, 'trigramalyzer_lower': {'filter': ['lowercase'], 'type': 'custom', 'tokenizer': 'trigram_tokenizer...(truncated)
index_name = index_tree(tree, es, verbose=verbose)
#41 Frame 0x231f140, for file /home/dxr/dxr/dxr/cli/index.py, line 26, in index (config=<Config(_section={'es_hosts': 'http://12---Type <return> to continue, or q <return> to quit---
7.0.0.1:9200/', 'default_tree': 'mozilla-central', 'max_thumbnail_size': 20000, 'es_alias': 'dxr_{format}_{tree}', 'es_catalog_index': 'dxr_catalog', 'workers': 4, 'log_folder': '/code/dxr-logs-{tree}', 'es_indexing_timeout': 60, 'temp_folder': '/code/dxr-temp-{tree}', 'google_analytics_key': '', 'es_refresh_interval': 60, 'es_catalog_replicas': 1, 'www_root': '', 'es_index': 'dxr_{format}_{tree}_{unique}', 'generated_date': 'Fri, 05 Feb 2016 18:27:38 +0000', 'skip_stages': ['build']}, trees=<OrderedDict(_OrderedDict__map={'mozilla-central': ['mozilla-central', [None, [...], [...]], [...]]}, _OrderedDict__end=[...]) at remote 0x7f272b1ae5c0>) at remote 0x7f272b5e9300>, verbose=False, tree_names=(), tree=<TreeConfig(_section={'ignore_filenames': ['.hg', '.git', 'CVS', '.svn', '.bzr', '.deps', '.libs', '.DS_Store', '.nfs*', '*~', '._*'], 'build_command': 'cd $source_folder && ./mach clobbe...(truncated)
...
Here's the calling code.
Here's the C traceback as well, in case it's helpful:
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x000000000056e2a4 in PyThread_acquire_lock (lock=0x22b12e0, waitflag=1) at ../Python/thread_pthread.h:324
#2 0x000000000060fba9 in lock_PyThread_acquire_lock (self=0x7f2716b74e00, args=()) at ../Modules/threadmodule.c:52
#3 0x00000000004877aa in PyCFunction_Call (func=<built-in method acquire of thread.lock object at remote 0x7f2716b74e00>,
arg=(), kw=0x0) at ../Objects/methodobject.c:81
#4 0x00000000005273b4 in call_function (pp_stack=0x7ffcb97d7450, oparg=0) at ../Python/ceval.c:4020
#5 0x00000000005222e1 in PyEval_EvalFrameEx (
f=Frame 0x23c2e90, for file /usr/lib/python2.7/threading.py, line 339, in wait (self=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f2716b74ea0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f2716b74ea0>, _Condition__waiters=[<thread.lock at remote 0x7f2716b74e00>], release=<built-in method release of thread.lock object at remote 0x7f2716b74ea0>) at remote 0x7f2716b5c920>, timeout=None, waiter=<thread.lock at remote 0x7f2716b74e00>, saved_state=None), throwflag=0) at ../Python/ceval.c:2666
#6 0x0000000000524b9a in PyEval_EvalCodeEx (co=0x7f272ce09720,
globals={'current_thread': <function at remote 0x7f272ce27060>, '_BoundedSemaphore': <type at remote 0x1f88210>, 'currentThread': <function at remote 0x7f272ce27060>, '_Timer': <type at remote 0x1f8a6a0>, '_format_exc': <function at remote 0x7f272ce16c30>, 'Semaphore': <function at remote 0x7f272ce22a38>, '_deque': <type at remote 0x8c7c60>, 'activeCount': <function at remote 0x7f272ce27300>, '_profile_hook': None, '_sleep': <built-in function sleep>, '_trace_hook': None, 'ThreadError': <type at remote 0x1df77d0>, '_enumerate': <function at remote 0x7f272ce273a8>, '_start_new_thread': <built-in function start_new_thread>, 'BoundedSemaphore': <function at remote 0x7f272ce241b0>, '_shutdown': <instancemethod at remote 0x7f272d2048e0>, '__all__': ['activeCount', 'active_count', 'Condition', 'currentThread', 'current_thread', 'enumerate', 'Event', 'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Thread', 'Timer', 'setprofile', 'settrace', 'local', 'stack_size'], '_Event': <type at remote 0x1f88930>, 'active_count': <fu...(truncated), locals=0x0, args=0x7f272708f5f8, argcount=2, kws=0x7f272708f608, kwcount=0, defs=0x7f272ce23168,
defcount=1, closure=0x0) at ../Python/ceval.c:3252
#7 0x000000000052799e in fast_function (func=<function at remote 0x7f272ce22f78>, pp_stack=0x7ffcb97d78f0, n=2, na=2, nk=0)
at ../Python/ceval.c:4116
#8 0x0000000000527588 in call_function (pp_stack=0x7ffcb97d78f0, oparg=1) at ../Python/ceval.c:4041
#9 0x00000000005222e1 in PyEval_EvalFrameEx (
f=Frame 0x7f272708f460, for file /usr/lib/python2.7/threading.py, line 620, in wait (self=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f2716b74ea0>, acquire=<built-in method acquire of thread.lock object at remote 0x7f2716b74ea0>, _Condition__waiters=[<thread.lock at remote 0x7f2716b74e00>], release=<built-in method release of thread.lock object at remote 0x7f2716b74ea0>) at remote 0x7f2716b5c920>) at remote 0x7f2716b5c840>, timeout=None), throwflag=0) at ../Python/ceval.c:2666
#10 0x0000000000524b9a in PyEval_EvalCodeEx (co=0x7f272ce0e5c0,
globals={'current_thread': <function at remote 0x7f272ce27060>, '_BoundedSemaphore': <type at remote 0x1f88210>, 'currentThread': <function at remote 0x7f272ce27060>, '_Timer': <type at remote 0x1f8a6a0>, '_format_exc': <function at remote 0x7f272ce16c30>, 'Semaphore': <function at remote 0x7f272ce22a38>, '_deque': <type at remote 0x8c7c60>, 'activeCount': <function at remote 0x7f272ce27300>, '_profile_hook': None, '_sleep': <built-in function sleep>, '_trace_hook': None, 'ThreadError': <type at remote 0x1df77d0>, '_enumerate': <function at remote 0x7f272ce273a8>, '_start_new_thread': <built-in function start_new_thread>, 'BoundedSemaphore': <function at remote 0x7f272ce241b0>, '_shutdown': <instancemethod at remote 0x7f272d2048e0>, '__all__': ['activeCount', 'active_count', 'Condition', 'currentThread', 'current_thread', 'enumerate', 'Event', 'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Thread', 'Timer', 'setprofile', 'settrace', 'local', 'stack_size'], '_Event': <type at remote 0x1f88930>, 'active_count': <fu...(truncated), locals=0x0, args=0x2418bb0, argcount=2, kws=0x2418bc0, kwcount=0, defs=0x7f272ce23478, defcount=1,
closure=0x0) at ../Python/ceval.c:3252
#11 0x000000000052799e in fast_function (func=<function at remote 0x7f272ce24ae0>, pp_stack=0x7ffcb97d7d90, n=2, na=2, nk=0)
at ../Python/ceval.c:4116
#12 0x0000000000527588 in call_function (pp_stack=0x7ffcb97d7d90, oparg=1) at ../Python/ceval.c:4041
#13 0x00000000005222e1 in PyEval_EvalFrameEx (
f=Frame 0x24189e0, for file /code/dbg-venv/local/lib/python2.7/site-packages/concurrent/futures/_base.py, line 217, in as_completed (fs=set([<...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <...>, <.....(truncated), throwflag=0) at ../Python/ceval.c:2666
#14 0x0000000000452c14 in gen_send_ex (gen=0x7f2716b4cb60, arg=0x0, exc=0) at ../Objects/genobject.c:85
#15 0x0000000000453581 in gen_iternext (gen=0x7f2716b4cb60) at ../Objects/genobject.c:283
#16 0x0000000000513dce in builtin_next (self=0x0, args=(<generator at remote 0x7f2716b4cb60>,)) at ../Python/bltinmodule.c:1107
#17 0x00000000004877aa in PyCFunction_Call (func=<built-in function next>, arg=(<generator at remote 0x7f2716b4cb60>,), kw=0x0)
at ../Objects/methodobject.c:81
#18 0x00000000005273b4 in call_function (pp_stack=0x7ffcb97d8120, oparg=1) at ../Python/ceval.c:4020
#19 0x00000000005222e1 in PyEval_EvalFrameEx (
f=Frame 0x2424df0, for file /code/dbg-venv/local/lib/python2.7/site-packages/click/_termui_impl.py, line 240, in next (self=<ProgressBar(color=None, pos=47, length_known=True, max_width=62, file=<_NonClosingTextIOWrapper(_stream=<_FixupStream(_stream=<file at remote 0x7f272f2751c0>) at remote 0x7f27270a6fb0>) at remote 0x7f272b1edc60>, is_hidden=False, avg=[<float at remote 0x25d2558>, <float at remote 0x25d2530>, <float at remote 0x25d2508>, <float at remote 0x25d24e0>, <float at remote 0x25d24b8>, <float at remote 0x25d25a8>, <float at remote 0x25d25d0>], last_eta=<float at remote 0x25d2468>, width=36, info_sep=' ', bar_template='%(label)-18s [%(bar)s] %(info)s', label='Indexing files', empty_char='-', start=<float at remote 0x2360e88>, entered=True, item_show_func=None, autowidth=False, show_percent=None, show_pos=False, finished=False, fill_char='#', eta_known=True, show_eta=False, iter=<generator at remote 0x7f2716b4cb60>, length=273, current_item=<Future(_exception=None, _result=None, _condition=<_Condit...(truncated), throwflag=0) at ../Python/ceval.c:2666
...
Let me know if I can supply any more information. I'm also not sure if this is more properly filed with upstream, as my codebase isn't Python 3 clean. Thank you!