Architecture Decisions
A running log of significant architecture decisions made during HPyX development. Entries are listed newest-first within each phase.
Phase 3 — Benchmarks + Diagnostics (2026-05-04)
2026-05-04: Tracing ring buffer uses mutex + vector, not lock-free (Implemented)
- Decision: `tracing.cpp` stores events in a `std::vector<TraceEvent>` protected by a `std::mutex` rather than a lock-free ring buffer. A global `std::atomic<bool> g_enabled` provides a near-zero-cost fast path when tracing is off (one `memory_order_acquire` load per task dispatch).
- Why: Lock-free ring buffers are subtle to implement correctly, and the expected event rate (one event per HPyX `async_` call) does not justify the complexity. The fast path, where `is_enabled()` returns false, costs a single atomic load and a branch; the slow path (tracing on) holds the mutex for the duration of one `push_back`. Benchmark measurements showed no measurable mutex contention at realistic task submission rates. A true SPSC or MPSC ring buffer is deferred until profiling shows the mutex is a bottleneck.
- Result: `src/_core/tracing.cpp` is ~50 lines; `drain()` swaps the buffer under the lock so the drain thread holds the mutex for O(1) time regardless of queue depth. Tests in `tests/test_debug.py` verify JSONL output, double-enable rejection, idempotent disable, and env-var fallback.
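The swap-under-lock drain can be illustrated with a small Python analogue. This is a sketch of the pattern only, not the C++ in `tracing.cpp`; the `TraceBuffer` class and its method names are hypothetical.

```python
import threading


class TraceBuffer:
    """Hypothetical Python analogue of a mutex-protected event vector."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []
        self.enabled = False  # stands in for the atomic g_enabled flag

    def record(self, event):
        # Fast path: one flag check and a branch when tracing is off.
        if not self.enabled:
            return
        # Slow path: hold the lock only for a single append.
        with self._lock:
            self._events.append(event)

    def drain(self):
        # Swap in an empty list under the lock, so the lock is held for
        # O(1) time no matter how many events are queued.
        with self._lock:
            drained, self._events = self._events, []
        return drained


buf = TraceBuffer()
buf.record({"name": "ignored"})  # dropped: tracing is off
buf.enabled = True
buf.record({"name": "task-a"})
buf.record({"name": "task-b"})
first = buf.drain()   # both recorded events, in order
second = buf.drain()  # empty: the buffer was swapped out
```

The caller iterates over the drained list after the lock is released, so slow consumers never block producers.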
2026-05-04: Task name captured before GIL release in hpx::async lambda (Implemented)
- Decision: When tracing is enabled, `src/_core/futures.cpp` reads `fn.__qualname__` (falling back to `__name__`, then `"<anonymous>"`) before calling `nb::gil_scoped_release`. The resulting `std::string task_name` is captured by value in the HPX async lambda.
- Why: HPX continuations run on worker threads that do not hold the GIL. Accessing `fn.__qualname__` after GIL release would be a data race against the Python interpreter. Capturing before release is the only safe option. The attribute read is gated on `hpyx::tracing::is_enabled()` (the fast-path atomic), so it adds no overhead beyond that atomic load when tracing is off.
- Result: `TraceEvent::name` is always a valid `std::string` on the worker thread. The GIL is released promptly before HPX task submission; the name capture adds at most two `PyObject_GetAttrString` calls per task, which is negligible compared to the HPX thread-hop overhead.
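The fallback chain for the task name is easy to state in Python. The real lookup happens in C++ before the GIL is released; the `task_name` helper and the two demo classes below are hypothetical and only illustrate the order of the fallbacks.

```python
def task_name(fn):
    """Resolve a display name: __qualname__, then __name__, then a placeholder."""
    for attr in ("__qualname__", "__name__"):
        name = getattr(fn, attr, None)
        if isinstance(name, str) and name:
            return name
    return "<anonymous>"


class Worker:
    def run(self):
        return None


class CallableObject:
    """Instances are callable but expose neither __qualname__ nor __name__."""

    def __call__(self):
        return None


method_name = task_name(Worker.run)        # qualified name of the method
builtin_name = task_name(len)              # built-ins carry a name too
fallback_name = task_name(CallableObject())  # falls through to the placeholder
```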
2026-05-04: Python drain thread pulls events every 100 ms, final flush on stop (Implemented)
- Decision: `hpyx.debug.enable_tracing` spawns a daemon `threading.Thread` that calls `_core.tracing.drain()` in a loop with `stop_event.wait(timeout=0.1)`. `disable_tracing` sets the stop event, joins the thread (5 s timeout), then performs a final `drain()`.
- Why: The alternative, flushing from the HPX worker thread, would require file I/O on a thread that does not hold the GIL and where blocking is undesirable. Separating the drain into a Python daemon thread keeps the HPX worker path fast (one `push_back` under mutex) and the I/O path simple (pure Python, GIL held for each `json.dumps`). The 100 ms poll interval is a reasonable latency/overhead trade-off for diagnostic use.
- Result: `src/hpyx/debug.py` is self-contained: no new C extensions, no background HPX continuations. The daemon flag ensures the thread does not prevent interpreter shutdown. `tests/test_debug.py::test_jsonl_output` verifies events appear in the JSONL file after `disable_tracing()`.
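A minimal sketch of the poll-then-final-flush loop, assuming a generic `drain` callable in place of `_core.tracing.drain()` and an in-memory sink in place of the JSONL file. The `start_drain_thread` helper and the demo queue are hypothetical.

```python
import io
import json
import threading


def start_drain_thread(drain, sink, interval=0.1):
    """Start a daemon thread that writes drained events to `sink` as JSONL.

    `drain` is any zero-argument callable returning a list of dict events.
    Returns a `stop()` function that joins the thread and flushes once more.
    """
    stop_event = threading.Event()

    def write_batch():
        for event in drain():
            sink.write(json.dumps(event) + "\n")

    def loop():
        # wait() returns True once the stop event is set, ending the loop.
        while not stop_event.wait(timeout=interval):
            write_batch()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()

    def stop():
        stop_event.set()
        thread.join(timeout=5)
        write_batch()  # final flush for events recorded after the last poll

    return stop


# Usage with an in-memory queue standing in for the C++ buffer.
pending = [{"name": "task-a"}, {"name": "task-b"}]
pending_lock = threading.Lock()


def drain_pending():
    with pending_lock:
        batch = pending[:]
        pending.clear()
    return batch


sink = io.StringIO()
stop = start_drain_thread(drain_pending, sink, interval=0.01)
stop()
lines = sink.getvalue().splitlines()
```

The final `write_batch()` in `stop()` is what guarantees that events recorded between the last poll and the stop request still reach the sink.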
2026-05-04: Cold-start benchmark isolated via subprocess, not session fixture opt-out (Implemented)
- Decision: `benchmarks/test_bench_cold_start.py` measures `hpyx.init()` + `hpyx.shutdown()` latency by launching a fresh `sys.executable -c` subprocess for each trial. It does not use the session-scoped `hpx_runtime` fixture.
- Why: HPX cannot restart within a process (the HPX runtime is a singleton; calling `hpx::init` twice in the same process is undefined behavior). The session-scoped `hpx_runtime` fixture in `conftest.py` hides the init cost from every other benchmark file, which is intentional (we don't want benchmark variance from cold init). The cold-start file is the exception: it explicitly measures what the fixture hides. Subprocess isolation is the only correct way to get a clean HPX init.
- Result: `benchmark.pedantic(_run_cold_start_subprocess, rounds=5, iterations=1)` gives five independent cold-start samples. The subprocess overhead is small relative to HPX runtime init (~200–500 ms typical). Noted as a caveat in `benchmarks/README.md`.
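The subprocess-per-trial idea can be sketched with the standard library alone. The `time_fresh_interpreter` helper is hypothetical, and the payload here is a placeholder (`pass`) so the example runs without HPyX installed.

```python
import subprocess
import sys
import time


def time_fresh_interpreter(code):
    """Return wall-clock seconds to run `code` in a brand-new interpreter."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start


# Placeholder payload. A real cold-start trial would instead pass code that
# initializes and shuts down the runtime (the hpyx.init() / hpyx.shutdown()
# pair named above).
samples = [time_fresh_interpreter("pass") for _ in range(3)]
```

Each sample includes interpreter startup, so it is an upper bound on the cost of whatever the payload does.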
2026-05-04: Thread-scaling parametrizes workload width, not os_threads (Implemented)
- Decision: `benchmarks/test_bench_thread_scaling.py` parametrizes `work` (number of parallel iterations: `[1_000, 10_000, 100_000, 1_000_000]`) with a fixed session `os_threads=4`, rather than parametrizing `os_threads` directly.
- Why: HPX's `os_threads` count is set at runtime init and cannot be changed without restarting the process. Parametrizing `os_threads` within a pytest session would require subprocess isolation for each value (as `test_bench_cold_start.py` does), making the test slow and complex. Parametrizing workload width instead shows how throughput scales with work at a fixed thread count, which is a useful and fast measurement. A true `os_threads` sweep is explicitly deferred to a future subprocess-harness approach, noted in the file's module docstring.
- Result: `test_for_loop_scaling_workload` runs four workload widths without restarting HPX; pytest-benchmark groups them under `"thread_scaling"`. The module docstring explains the limitation so future contributors understand why `os_threads` parametrization is absent.
2026-05-04: Seven-rule authoring contract enforced through fixtures, not linting (Implemented)
- Decision: The benchmark authoring contract (see `benchmarks/README.md`) is enforced by shared fixtures in `benchmarks/conftest.py` rather than a custom pytest plugin or linting rule. The seven rules are:
  1. Setup never timed (`benchmark.pedantic` or session `hpx_runtime`).
  2. Three size orders: `[1_000, 100_000, 10_000_000]`.
  3. Three baselines per HPyX benchmark (NumPy, pure-Python, `ThreadPoolExecutor`).
  4. Module-level `pytestmark = pytest.mark.benchmark(group="<topic>")`.
  5. Minimize Python overhead unless measuring it (documented in docstring).
  6. Thread-scaling via `@pytest.mark.parametrize("hpx_threads", [...], indirect=True)`.
  7. Free-threading gating via `@requires_free_threading`.
- Why: A linting rule would require a custom plugin that must be maintained and could produce false positives. Fixture enforcement means violations fail at collection time with a clear error. Because the `hpx_runtime` fixture is session-scoped, violating Rule 1 (constructing `HPXRuntime()` inside a timed callable) either crashes or produces inflated numbers, which is a self-evident signal. `requires_free_threading` is a fixture that skips the test on non-free-threaded builds, preventing phantom benchmark numbers on 3.12.
- Result: `benchmarks/conftest.py` provides `pin_cpu`, `seed_rng`, `no_gc`, `hpx_runtime`, `hpx_threads`, `requires_free_threading`, and `env_sanity_check`. All eight benchmark files (`parallel`, `kernels`, `executor`, `futures`, `aio`, `thread_scaling`, `free_threading`, `cold_start`) follow the contract and demonstrate the fixture patterns.
2026-05-04: pin_cpu is a session-autouse fixture, no-op on non-Linux (Implemented)
- Decision: `benchmarks/conftest.py::pin_cpu` uses `os.sched_setaffinity(0, {0})` at session start to pin the benchmark process to CPU 0 on Linux. On macOS and Windows the fixture yields immediately without doing anything.
- Why: `os.sched_setaffinity` is Linux-only (macOS uses `thread_policy_set` and Windows uses `SetThreadAffinityMask`, both more complex). CPU affinity significantly reduces timing variance in microbenchmarks. The no-op on non-Linux is documented in `benchmarks/README.md`'s caveats section ("macOS benchmarks are noisier than Linux; `pin_cpu` is a no-op on macOS.").
- Result: Benchmark variance is noticeably lower in Linux CI runs. The `autouse=True, scope="session"` configuration means the fixture applies once at the start of the benchmark session without any per-test decoration.
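The platform check behind the no-op can be sketched as a plain context manager. This is not the `pin_cpu` fixture itself: the `pinned_to_one_cpu` name is hypothetical, the sketch picks the lowest CPU the process is allowed to use instead of hard-coding CPU 0, and it restores the original affinity on exit, which a session-scoped fixture does not need to do.

```python
import contextlib
import os


@contextlib.contextmanager
def pinned_to_one_cpu():
    """Pin the process to a single CPU on Linux; do nothing elsewhere."""
    if not hasattr(os, "sched_setaffinity"):
        yield None  # macOS, Windows: no-op
        return
    original = os.sched_getaffinity(0)
    target = min(original)  # assumption: lowest CPU this process may use
    try:
        os.sched_setaffinity(0, {target})
    except OSError:
        yield None  # pinning not permitted in this environment
        return
    try:
        yield target
    finally:
        os.sched_setaffinity(0, original)


with pinned_to_one_cpu() as cpu:
    # Either pinning was skipped, or the affinity mask is exactly one CPU.
    affinity_ok = cpu is None or os.sched_getaffinity(0) == {cpu}
```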
Phase 1 — Futures, Executor, asyncio Bridge (2026-04-24)
2026-04-27: Phase 1 acceptance criteria — all green (Verified)
End-to-end verification of the Phase 1 deliverables specified in epic #116.
Test suite: `pixi run test` reports 130 passed, 2 skipped, 1 xfailed. The two skips are runtime-isolation tests in `tests/test_runtime.py` that intentionally leave the runtime stopped (run them with `pixi run -e test-py313t pytest tests/test_runtime.py -m skip_after_shutdown`). The xfail is `test_get_worker_thread_id_from_hpx_thread_is_valid` (deferred to Plan 2 alongside the parallel-algorithm bindings).
Acceptance criteria from #116:
| Criterion | Status |
|---|---|
| `hpyx.async_(fn, x, y)` runs `fn` on an HPX worker under `launch::async` | ✅ verified by `tests/test_futures.py::test_async_submit_runs_on_hpx_worker` |
| `dask.compute(arr.sum(), scheduler=hpyx.HPXExecutor())` completes for a non-trivial graph | ✅ verified by `tests/test_dask_integration.py` (4 patterns) |
| `await hpyx.async_(fn, x, y)` works inside `asyncio.run(main())` | ✅ verified by `tests/test_aio.py::test_await_future` and `test_await_does_not_block_event_loop` |
| Full test suite passes on free-threaded 3.13t | ✅ 130 passed under `test-py313t` |
| `HPYX_ASYNC_MODE=deferred` rollback flag works | ✅ smoke-tested: `python -c "import hpyx; print(hpyx.async_(lambda: 'from-deferred').result())"` returns `from-deferred` |
Free-threaded scaling smoke test (verifies real concurrency, not GIL-serialization): 20 × `time.sleep(0.1)` submitted via `HPXExecutor` with `os_threads=4` completes in ~0.2–0.5s (serial would be 2.0s). Reproduce with:
```bash
pixi run -e test-py313t python -c "
import time, hpyx
N = 20
start = time.perf_counter()
with hpyx.HPXExecutor() as ex:
    futs = [ex.submit(time.sleep, 0.1) for _ in range(N)]
    for f in futs: f.result()
print(f'{N}x0.1s in {time.perf_counter() - start:.2f}s')
"
```
Out of scope for Phase 1 (deferred per epic): parallel algorithms (Plan 2), C++ kernels (Plan 2), `hpyx.execution` policy module (Plan 2), benchmark harness + CI gating (Plan 3+), full `enable_tracing` JSONL output (Plan 4), distributed/multi-locality (v2).
2026-04-27: Test dependency is dask-core, not dask — free-threading constraint (Implemented)
- Decision: `pixi.toml` adds `dask-core >=2024.10.0` (and `numpy >=1.26`) under `[feature.test.dependencies]`. The full `dask` metapackage is not added.
- Why: The free-threaded `test-py313t` environment cannot install the full `dask` metapackage because `dask` pulls in `distributed`, which depends on `tornado`, which does not yet ship a `cp313t` build on conda-forge. The `dask-core` noarch package provides `dask.array`, `dask.delayed`, `dask.base.get_scheduler`, and the entire scheduler-resolution code path that HPyX needs to validate the `dask.compute(scheduler=HPXExecutor())` integration. Dropping `distributed` is acceptable for v1 because HPyX is single-process by design (multi-locality / parcelport ships in v2).
- Result: `tests/test_dask_integration.py` runs cleanly under `test-py313t` with all four smoke tests passing (`da.array.sum`, chunked matmul, `dask.delayed` chain, multi-stage reductions). `dask.distributed` integration is explicitly out of scope for v1; this is documented in the dask integration section of the usage guide. Will revisit when upstream `tornado` ships a free-threading build.
2026-04-27: Dask integration smoke test pinned at the executor boundary (Implemented)
- Decision: `tests/test_dask_integration.py` exercises four code paths: `da.arange(...).sum().compute(scheduler=ex)`, chunked 64×64 matmul against a numpy reference, `dask.delayed` graph compilation, and multi-stage reductions (`mean`, `var`). All four use the `with hpyx.HPXExecutor() as ex:` pattern and pass `scheduler=ex` to `.compute()`. No HPyX-side adapter or dask-side patch is required.
- Why: The Phase 1 epic (#116) lists `dask.compute(arr.sum(), scheduler=hpyx.HPXExecutor())` as a top-level acceptance criterion. Pinning the integration with a smoke test (rather than a benchmark or stress test) catches regressions cheaply: any change to `HPXExecutor`'s submit/map/shutdown surface that breaks dask's scheduler resolution will fail these tests in CI before reaching users. The four flavors are deliberately diverse (array reductions, dense linear algebra, lazy graphs, and back-to-back compute calls), so a regression in any one of them shows up as a specific test failure rather than a generic "dask doesn't work."
- Result: All 4 tests pass; the dask integration story for v1 is now contractually pinned. Future executor changes that break the `concurrent.futures.Executor` interface will fail at least one of these tests.
2026-04-27: hpyx.Future inherits from concurrent.futures.Future (Implemented)
- Decision: `hpyx.futures._future.Future` is a real subclass of `concurrent.futures.Future`. The original Task 5 implementation used composition with duck-typing; the inheritance change went in to support `asyncio.wrap_future` and `loop.run_in_executor`, which both perform `isinstance(fut, concurrent.futures.Future)` checks before threading state through their internals.
- Why: The Phase 1 spec (§4.4 acceptance criteria) calls out `asyncio.wrap_future(hpyx_future)` and `loop.run_in_executor(HPXExecutor(), fn, ...)` as required. Both stdlib helpers reject duck-typed futures with `AssertionError: concurrent.futures.Future is expected`. There is no `register()` mechanism on `concurrent.futures.Future` (it is a regular class, not an `ABCMeta` instance), so virtual subclassing is not an option. Direct inheritance is the only way to satisfy the `isinstance` contract while keeping our custom `result`/`exception`/`done`/etc. semantics. The base class's internal state machine (`_state`, `_condition`, `_waiters`, `_done_callbacks`) becomes dead weight that we synchronize with our `_hpx` future via a private helper (see the next ADR).
- Result: `__slots__ = ()` documents that we add no slots beyond what the base class allocates (which has no `__slots__` of its own, so memory savings are not on the table). `super().__init__()` runs in our `__init__`, every method we expose is overridden to delegate to `_hpx`, and `isinstance(hpyx.async_(fn), concurrent.futures.Future)` is now `True`. `tests/test_aio.py::test_wrap_future_works` and `test_run_in_executor_with_HPXExecutor` confirm the integration.
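The `isinstance` requirement can be reproduced with the standard library alone. The sketch below uses two hypothetical stand-in classes (`DuckFuture`, `SubclassedFuture`), not the HPyX types, to show `asyncio.wrap_future` rejecting a duck-typed object and accepting a real subclass.

```python
import asyncio
import concurrent.futures


class DuckFuture:
    """Looks like a future but does not inherit from the stdlib class."""

    def done(self):
        return True

    def result(self, timeout=None):
        return 42

    def add_done_callback(self, fn):
        fn(self)


class SubclassedFuture(concurrent.futures.Future):
    """Passes the isinstance check because it is a real subclass."""


async def main():
    try:
        asyncio.wrap_future(DuckFuture())
        duck_rejected = False
    except (AssertionError, TypeError):
        # AssertionError normally; TypeError if asserts are stripped (-O).
        duck_rejected = True

    fut = SubclassedFuture()
    fut.set_result(42)
    value = await asyncio.wrap_future(fut)
    return duck_rejected, value


duck_rejected, value = asyncio.run(main())
```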
2026-04-27: Mirror inherited _state so concurrent.futures.wait / as_completed work (Implemented)
- Decision: Whenever the underlying `_hpx` future settles, we eagerly call `concurrent.futures.Future.set_result(self, value)` (or `set_exception`) on `self` to flip the inherited `_state` from `PENDING` to `FINISHED`. The sync is gated by an instance `_base_state_lock` and `_base_state_synced` flag so the operation runs at most once per Future. We register the eager sync in `__init__` (one-shot continuation on the C++ `_hpx`) and we also call it from the `_drain` callback path so user callbacks observe a settled base state.
- Why: `concurrent.futures.wait` and `as_completed` do not call `add_done_callback` on subclassed futures; they read `_state` and `_waiters` directly through the base class's `_AcquireFutures` context manager. Without an eager mirror, the base state stayed `PENDING` forever and `wait([fut])` returned `not_done={fut}` even after `fut.result()` succeeded. Putting the sync in `__init__` rather than only in `_drain` ensures interop works regardless of whether the user calls `add_done_callback` (they don't, when calling `concurrent.futures.wait`). The lock + flag prevents the eager and `_drain` paths from racing into `InvalidStateError: FINISHED`.
- Result: `tests/test_aio.py::test_hpyx_Future_visible_in_concurrent_futures_wait` and `test_hpyx_Future_visible_in_concurrent_futures_as_completed` confirm interop. The cost per Future is one extra HPX continuation registration plus one extra `_hpx.result()` invocation when the future settles. That is small but not free; revisit if profiling shows it dominates fan-out workloads.
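A minimal sketch of the at-most-once mirror, assuming a plain value in place of the `_hpx` result. The `MirroredFuture` class and `_sync_base_state` method are hypothetical; only the lock-plus-flag pattern and the unbound base-class call are taken from the text above.

```python
import concurrent.futures
import threading


class MirroredFuture(concurrent.futures.Future):
    """Hypothetical stand-in: mirrors an external result into the base state."""

    def __init__(self):
        super().__init__()
        self._base_state_lock = threading.Lock()
        self._base_state_synced = False

    def _sync_base_state(self, value):
        # The lock + flag make the sync idempotent, so two code paths can
        # both call it without tripping InvalidStateError.
        with self._base_state_lock:
            if self._base_state_synced:
                return False
            self._base_state_synced = True
        concurrent.futures.Future.set_result(self, value)
        return True


fut = MirroredFuture()
before = concurrent.futures.wait([fut], timeout=0)  # base state still PENDING
first_sync = fut._sync_base_state(42)    # flips the base state
second_sync = fut._sync_base_state(42)   # no-op on the second call
after = concurrent.futures.wait([fut], timeout=0)   # now visible as done
```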
2026-04-27: Override set_result / set_exception / set_running_or_notify_cancel to raise (Implemented)
- Decision: The three "setter" methods inherited from `concurrent.futures.Future` are overridden to raise `RuntimeError("hpyx.Future state is set by the HPX runtime; do not call <method> directly")`. The internal sync helper (previous ADR) calls the unbound base methods directly (`concurrent.futures.Future.set_result(self, value)`) to bypass our own override.
- Why: Without these overrides, user code can call `fut.set_result('foo')` on an in-flight HPyX future, succeed silently, and corrupt the inherited `_state` to `FINISHED` while `_hpx` keeps running and eventually returns the real value. The reviewer demonstrated this directly: `fut.set_result('user_value'); fut.result()` returned `42` (the real result) while `fut._state` was `'FINISHED'` and `fut.done()` returned `False`. That is three different views of the same Future, none of them correct. Raising `RuntimeError` on the public API forces the divergence into a loud failure mode.
- Result: `tests/test_aio.py::test_set_result_raises`, `test_set_exception_raises`, and `test_set_running_or_notify_cancel_raises` confirm the public methods reject user calls. The internal eager-sync continues to work because it calls `concurrent.futures.Future.set_result(self, value)` (explicit unbound method invocation), which bypasses MRO lookup and never hits our overridden raise.
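The override-plus-unbound-call pattern can be shown on a toy subclass. `RuntimeOwnedFuture` and `_settle` are hypothetical names; the error text follows the message quoted above.

```python
import concurrent.futures


class RuntimeOwnedFuture(concurrent.futures.Future):
    """Hypothetical stand-in whose public setter is disabled."""

    def set_result(self, result):
        raise RuntimeError(
            "hpyx.Future state is set by the HPX runtime; "
            "do not call set_result directly"
        )

    def _settle(self, value):
        # Calling the base-class function explicitly skips the override above.
        concurrent.futures.Future.set_result(self, value)


fut = RuntimeOwnedFuture()
try:
    fut.set_result("user_value")
    public_call_rejected = False
except RuntimeError:
    public_call_rejected = True

fut._settle(7)          # the internal path still works
settled_value = fut.result()
```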
2026-04-27: Future.__await__ bridges via loop.create_future() + call_soon_threadsafe (Implemented)
- Decision: `hpyx.Future.__await__` lazy-imports `hpyx.aio._future_await`. That coroutine creates an `asyncio.Future` on the running event loop, registers an `_on_done` callback on the hpyx Future via `add_done_callback`, and posts the result/exception back via `loop.call_soon_threadsafe` from the HPX worker thread that completes the future. The asyncio Future is what the user's `await` resumes on.
- Why: HPX continuations fire on worker threads, not on the asyncio event loop's thread. The only documented thread-safe primitive for waking up the loop from a foreign thread is `loop.call_soon_threadsafe`. The bridge is intentionally minimal: no thread pool, no synchronization queue, just one `add_done_callback` registration and one `call_soon_threadsafe` invocation per `await`. The `if not aio_fut.done():` guard runs inside the lambda passed to `call_soon_threadsafe`, so the check executes on the loop thread under loop ownership: no race, no missing wakeup.
- Result: `tests/test_aio.py::test_await_does_not_block_event_loop` registers an asyncio counter task that increments while a 100 ms HPX `time.sleep` is in flight; the counter reliably runs ≥50 iterations, proving the loop is not starved. Direct `await`, exception propagation, already-done futures, and `asyncio.gather` over multiple HPyX futures all work.
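The bridge can be sketched with a plain `concurrent.futures.Future` completed from a timer thread, standing in for an HPyX future completed on an HPX worker. The `bridge_await` name is hypothetical and cancellation handling is omitted for brevity.

```python
import asyncio
import concurrent.futures
import threading


async def bridge_await(source):
    """Await a thread-completed future without blocking the event loop."""
    loop = asyncio.get_running_loop()
    aio_fut = loop.create_future()

    def on_done(done_fut):
        # Runs on whichever thread completed `source`.
        def post():
            # Runs on the loop thread, so the done() check cannot race.
            if aio_fut.done():
                return
            exc = done_fut.exception()
            if exc is not None:
                aio_fut.set_exception(exc)
            else:
                aio_fut.set_result(done_fut.result())

        loop.call_soon_threadsafe(post)

    source.add_done_callback(on_done)
    return await aio_fut


async def main():
    source = concurrent.futures.Future()
    # A timer thread stands in for the worker that completes the future.
    threading.Timer(0.05, source.set_result, args=("done",)).start()
    return await bridge_await(source)


result = asyncio.run(main())
```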
2026-04-27: Loop-closed posts logged at WARNING and dropped silently (Implemented)
- Decision: When `loop.call_soon_threadsafe` raises `RuntimeError` (because the event loop has been closed before the HPX future completed), `hpyx.aio` catches the exception, logs at `WARNING` on the `hpyx.aio` logger, and drops the result/exception silently. The HPX worker thread does not re-raise.
- Why: Re-raising on a worker thread would tear down the HPX runtime, a pathological response to a benign user error (closing the loop with pending work). Spec §5.2 explicitly classifies this as a WARNING-level event so users who configured logging see a message but unconfigured users (the common case) see nothing. We initially shipped at `DEBUG`, but the reviewer flagged that as functionally invisible (DEBUG is off by default), and we upgraded to WARNING per spec.
- Result: Verified manually: a coroutine that creates an HPX future and lets `asyncio.run` exit before the future completes does not crash; a `WARNING:hpyx.aio:hpyx.aio: dropping result; event loop is closed` line appears when the user enables warning-level logging.
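A sketch of the catch-log-drop behavior, using a hypothetical `post_to_loop` helper. It relies only on the documented fact that `call_soon_threadsafe` raises `RuntimeError` on a closed loop; the log message is shortened from the one quoted above.

```python
import asyncio
import logging

logger = logging.getLogger("hpyx.aio")


def post_to_loop(loop, callback, *args):
    """Schedule `callback` on `loop`; log and drop if the loop is closed."""
    try:
        loop.call_soon_threadsafe(callback, *args)
    except RuntimeError:
        logger.warning("dropping result; event loop is closed")
        return False
    return True


closed_loop = asyncio.new_event_loop()
closed_loop.close()
dropped = post_to_loop(closed_loop, print, "never runs")  # logged, not raised

open_loop = asyncio.new_event_loop()
accepted = post_to_loop(open_loop, lambda: None)  # scheduled normally
open_loop.close()
```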
2026-04-27: await_all / await_any are async wrappers around when_all / when_any (Implemented)
- Decision: `hpyx.aio.await_all(*futures)` and `hpyx.aio.await_any(*futures)` are `async def` functions that lazy-import `hpyx.futures.when_all`/`when_any`, build the combined Future, and `await` it. The combined Future's `__await__` does the loop-bridge work via `_future_await`.
- Why: Users writing async functions want a single-call pattern (`await hpyx.aio.await_all(f1, f2, f3)`) rather than `await hpyx.when_all(f1, f2, f3)` (which also works, but reads less idiomatically inside `async def`). The lazy `from hpyx.futures import when_all` avoids the `aio` ↔ `futures` import cycle that would otherwise trigger when both modules need each other. The `async def` wrapping costs one extra coroutine frame per call, which the reviewer flagged as unnecessary; we kept the async signature for API clarity (the helpers are documented as awaitables, not Future-returners) and accept the negligible per-call overhead.
- Result: `tests/test_aio.py::test_aio_await_all`, `test_aio_await_all_propagates_first_failure`, and `test_aio_await_any` cover the happy path, exception short-circuit, and the index-and-list return shape. Exception semantics match `when_all` (first-to-fail wins; siblings finish but the chain raises the first), explicitly diverging from `asyncio.gather`'s `return_exceptions=True` mode, which collects exceptions into the result list instead of raising.
2026-04-27: hpyx.aio imports Future unconditionally for runtime introspection (Implemented)
- Decision: `src/hpyx/aio.py` does `from hpyx.futures._future import Future` at module level (not under `TYPE_CHECKING`). The annotations on `_future_await`, `await_all`, and `await_any` use `Future` directly rather than the `"Future"` string forward-ref.
- Why: With the gated import, `typing.get_type_hints(hpyx.aio.await_all)` raised `NameError: name 'Future' is not defined` because the runtime resolution couldn't find the symbol. This breaks Sphinx autodoc with `autodoc_typehints = "description"`, FastAPI parameter introspection, and any tool that reflects on signatures. Importing `Future` from `_future` (the leaf module, not the `hpyx.futures` package init) keeps the import cycle benign: `_future.py` does not import `hpyx.aio` at module load (`__await__` uses a function-local import), so `aio.py` → `_future.py` is safe.
- Result: `typing.get_type_hints(hpyx.aio._future_await)` now resolves cleanly. The `from __future__ import annotations` directive at the top of `aio.py` keeps annotations lazy, so the new top-level import has no measurable startup cost.
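The failure mode is reproducible with any import gated on `TYPE_CHECKING`. The sketch below uses `decimal.Decimal` and `fractions.Fraction` as arbitrary stand-ins for `Future`; string annotations play the role of the lazy annotations in `aio.py`.

```python
import typing
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from decimal import Decimal  # visible to type checkers only

from fractions import Fraction  # real import, visible at runtime


def gated(x: "Decimal") -> "Decimal":
    return x


def ungated(x: "Fraction") -> "Fraction":
    return x


try:
    typing.get_type_hints(gated)
    gated_resolves = True
except NameError:
    # 'Decimal' was never bound at runtime, so resolution fails.
    gated_resolves = False

ungated_hints = typing.get_type_hints(ungated)
```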
2026-04-27: HPXExecutor is a true concurrent.futures.Executor subclass (Implemented)
- Decision: `hpyx.HPXExecutor` inherits from `concurrent.futures.Executor` and implements `submit`, `map`, and `shutdown` directly. `submit(fn, /, *args, **kwargs)` returns `hpyx.async_(fn, *args, **kwargs)`. `__enter__`/`__exit__` are inherited from the stdlib base.
- Why: The dask integration story (`dask.compute(arr.sum(), scheduler=HPXExecutor())`) only works if `isinstance(ex, concurrent.futures.Executor)` is true, because dask's scheduler resolution checks for an `Executor` instance rather than duck-typing the interface. Subclassing the stdlib base gives that for free, plus `loop.run_in_executor`, `asyncio.wrap_future`, and any third-party library that already targets the stdlib. The previous v0.x executor inherited from `Executor` but its `submit` was broken (referenced an unbound `hpx_async_set_result`); the rewrite makes that contract real.
- Result: `tests/test_executor.py` confirms `issubclass(hpyx.HPXExecutor, concurrent.futures.Executor)`, basic `submit`/`map`/`shutdown` semantics, args/kwargs forwarding, and exception propagation. The dask smoke test ships in a follow-up task.
2026-04-27: Per-handle shutdown(); atexit owns process-level HPX teardown (Implemented)
- Decision: `HPXExecutor.shutdown()` sets `self._closed = True` and returns. It does not call `_runtime.shutdown()` and does not stop the HPX runtime. Subsequent `submit`/`map` on the same handle raise `RuntimeError("cannot schedule new futures after shutdown")`. Other live `HPXExecutor` handles continue to work. The HPX runtime itself only stops when the `atexit` handler fires at process exit.
- Why: HPX is a process-global singleton: it cannot host multiple runtimes per process and cannot restart after a stop. Tying executor lifetime to runtime lifetime would mean a single `with HPXExecutor():` block ends the runtime forever, which is hostile to scripts that want multiple sequential `with` blocks. Decoupling per-handle shutdown from process-level teardown matches user mental models from `concurrent.futures.ThreadPoolExecutor` (where a shutdown thread pool's threads also disappear, but the process keeps running) while respecting HPX's hard restart constraint.
- Result: `test_separate_handles_independent_shutdown` and `test_context_manager_shuts_down` confirm the per-handle semantics. `test_submit_after_shutdown_raises` and `test_map_after_shutdown_raises` confirm the post-shutdown error path. The error message intentionally matches the stdlib `ThreadPoolExecutor` exactly.
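Per-handle shutdown over a shared backend can be sketched with a module-level `ThreadPoolExecutor` standing in for the process-global HPX runtime. The `Handle` class is hypothetical; the error message is the one quoted above.

```python
import concurrent.futures
import threading

# Stand-in for the process-global runtime; it outlives every handle.
_shared_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)


class Handle(concurrent.futures.Executor):
    """Hypothetical executor handle whose shutdown only closes itself."""

    def __init__(self):
        self._closed = False
        self._lock = threading.Lock()

    def submit(self, fn, /, *args, **kwargs):
        with self._lock:
            if self._closed:
                raise RuntimeError("cannot schedule new futures after shutdown")
            return _shared_pool.submit(fn, *args, **kwargs)

    def shutdown(self, wait=True, *, cancel_futures=False):
        # Only this handle closes; the shared pool keeps running.
        with self._lock:
            self._closed = True


first, second = Handle(), Handle()
with first:
    a = first.submit(pow, 2, 5).result()
b = second.submit(pow, 3, 2).result()  # unaffected by first's shutdown
try:
    first.submit(pow, 2, 2)
    closed_error = None
except RuntimeError as exc:
    closed_error = str(exc)
```

The inherited `__exit__` calls `shutdown(wait=True)`, so leaving the `with` block closes only `first`.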
2026-04-27: max_workers is advisory; mismatches with the running runtime warn instead of erroring (Implemented)
- Decision: `HPXExecutor(max_workers=N)` does one of three things: (1) if the runtime is not yet started, seeds `_runtime.ensure_started(os_threads=N)`; (2) if the runtime is started with `os_threads=N` already, no-op; (3) if the runtime is started with a different `os_threads`, emit a `UserWarning` and use the existing pool unchanged. `max_workers=None` always just calls `ensure_started()` with no thread override.
- Why: HPX's worker pool is process-global and cannot be reconfigured after start. A strict implementation that raised on mismatch would break legitimate use cases like "library X spins up an executor with `max_workers=8`, then library Y constructs a second one with `max_workers=4`". Both libraries should keep working, and only one of them gets to seed the pool. The warning surfaces the conflict so the user can correct it (typically by initializing the runtime explicitly with `hpyx.init(os_threads=...)` before either library imports), while still letting both libraries run.
- Result: `test_max_workers_warning_when_mismatched` confirms the warning fires; `test_max_workers_matches_runtime_no_warning` confirms there's no warning when values match. The warning message names the actual running thread count and explains why HPX can't be reconfigured.
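The three-way decision can be sketched as a small function over a stand-in for the runtime state. `resolve_workers`, `_state`, and the warning text are hypothetical; only the seed / no-op / warn branching comes from the text above.

```python
import os
import warnings

_state = {"os_threads": None}  # stand-in for the process-global runtime


def resolve_workers(max_workers=None):
    """Return the thread count actually in use after an executor request."""
    running = _state["os_threads"]
    if running is None:
        # Case 1: the first caller seeds the pool.
        if max_workers is None:
            max_workers = os.cpu_count() or 1
        _state["os_threads"] = max_workers
    elif max_workers is not None and max_workers != running:
        # Case 3: mismatch; warn and keep the existing pool.
        warnings.warn(
            f"runtime already running with os_threads={running}; "
            f"ignoring max_workers={max_workers}",
            UserWarning,
            stacklevel=2,
        )
    # Case 2 (matching value) falls through with no action.
    return _state["os_threads"]


first = resolve_workers(4)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    second = resolve_workers(8)  # mismatch: warns, pool unchanged
    third = resolve_workers(4)   # match: silent
```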
2026-04-27: _runtime.running_os_threads() public accessor instead of _started_cfg private access (Implemented)
- Decision: `src/hpyx/_runtime.py` exposes `running_os_threads() -> int | None` returning the currently-running runtime's `os_threads`, or `None` if the runtime is not started. `HPXExecutor.__init__` calls this instead of reaching into `_runtime._started_cfg["os_threads"]`.
- Why: Reaching into a leading-underscore module-private dict is a code smell, and the original implementation wrapped it in a broad `try/except Exception  # noqa: BLE001` to defend against schema drift. Exposing a typed accessor (a) eliminates the defensive try/except, (b) gives a clear contract for any future caller that needs the same information (Plan 3 will likely want it), (c) makes the executor-runtime boundary explicit.
- Result: `test_running_os_threads_reflects_session_config` confirms the accessor returns the value passed to `hpyx.init(os_threads=...)`. The executor's `__init__` is two lines shorter and no longer touches private names.
2026-04-27: _closed flag guarded by threading.Lock for free-threaded 3.13t (Implemented)
- Decision: `HPXExecutor` keeps a `threading.Lock` on the instance. Reads of `self._closed` in `submit`/`map` and the write in `shutdown` are all done under the lock.
- Why: Under GIL-mode CPython, single-attribute Python writes are effectively atomic. Under free-threaded 3.13t that is no longer guaranteed for the broader memory model; torn reads are theoretically possible, and stdlib `ThreadPoolExecutor` itself uses an explicit `_shutdown_lock` for the same flag. Explicitly locking matches stdlib behavior and removes a class of TOCTOU races where one thread sees `_closed=False` and submits a task while another thread is in the middle of `shutdown()`.
- Result: The 50-thread cross-thread submit test continues to pass. The lock overhead is negligible (one acquire/release per `submit`/`shutdown` call).
2026-04-27: HPXExecutor.map matches stdlib's silent-truncation zip (Implemented)
- Decision: `HPXExecutor.map(fn, *iterables)` uses bare `zip(*iterables)`, which silently truncates to the shortest input. The earlier draft used `zip(*iterables, strict=True)` (which would raise `ValueError` on length mismatch), but we reverted to match stdlib `Executor.map`.
- Why: Substitutability with `concurrent.futures.ThreadPoolExecutor` is the whole point of the v1 executor. Dask, asyncio, and other consumers may pass iterables with different lengths and expect stdlib semantics. `strict=True` was the safer choice in isolation (catches a footgun) but the wrong choice for a drop-in replacement (changes a documented behavior). We chose substitutability and document the truncation behavior in the user guide.
- Result: `test_map_truncates_to_shortest_iterable` pins the new behavior. `test_map_two_iterables` (lengths matched) continues to work unchanged.
2026-04-27: chunksize accepted but unused; not deprecated, no warning (Implemented)
- Decision: `HPXExecutor.map` accepts a `chunksize: int = 1` keyword for protocol parity but currently ignores it. No warning is emitted; the docstring notes the limitation.
- Why: stdlib's `ThreadPoolExecutor.map` also ignores `chunksize` (only `ProcessPoolExecutor` honors it), so silent ignore is the conservative stdlib-aligned choice. Emitting a warning every time a user passes the parameter would create noise in code that targets `ProcessPoolExecutor.map` and was ported as-is. Real chunk-size tuning lives at the parallel-algorithm layer (`hpyx.parallel.for_loop(par, chunk_size=...)`), which lands in Plan 3.
- Result: `# noqa: ARG002` suppresses lint complaints; the docstring tells users to pre-chunk manually if they need fine-grained control. No test assertion needed (silent no-op matches stdlib).
2026-04-27: Drop legacy hpyx.futures.submit shim outright; no deprecation window (Implemented)
- Decision: `src/hpyx/futures/_submit.py` and `tests/test_submit.py` are deleted. `hpyx.futures.__init__.py` no longer re-exports `submit`. The v0.x `from hpyx.futures import submit; submit(fn, ...).get()` pattern now raises `ImportError` immediately.
- Why: The v0.x `submit` shim was already broken before this rewrite (it called the deferred-only `hpx_async`, returning a future that secretly ran on the calling thread). Keeping it around as a deprecation-warned shim would let user code keep limping along on the broken behavior. Deleting it forces an early visible failure (`ImportError` is louder than a warning) and pushes users to the new `hpyx.async_`/`HPXExecutor` API. The v1.0 release notes call this out as a breaking change; the migration is mechanical (`from hpyx.futures import submit` → `import hpyx`, `submit(fn, ...)` → `hpyx.async_(fn, ...)`, `.get()` → `.result()`).
- Result: `tests/test_submit.py` is gone; coverage moved to `tests/test_executor.py` and `tests/test_futures.py`. `docs/usage.md` examples were rewritten to use the new API in the same commit, so the docs site has no broken examples on merge.
2026-04-24: hpyx.Future is a thin Python shell over _core.futures.HPXFuture (Implemented)
- Decision: `src/hpyx/futures/_future.py::Future` wraps `_core.futures.HPXFuture` in a class with `__slots__ = ("_hpx", "_callbacks", "_callback_lock", "_callbacks_registered")`. Most methods (`result`, `exception`, `done`, `running`, `cancelled`, `cancel`, `share`) are one-line delegations to the C++ object. The wrapper is what users see as `hpyx.Future`.
- Why: Two layers of indirection are needed because the C++ side cannot construct the Python `Future` class without a circular import, and because some semantics (FIFO callback ordering, synchronous fast-path on done futures, structured logging of callback errors, lazy asyncio import) are easier (and cheaper to test) in Python than in nanobind. `__slots__` keeps the per-Future overhead small for fan-out workloads and prevents accidental attribute additions.
- Result: `tests/test_futures.py` covers `isinstance(fut, hpyx.Future)`, `concurrent.futures.Future` protocol attribute conformance, `.then` chains, `add_done_callback` invocation, and `repr()`. The wrapper file is 137 lines.
2026-04-24: Lazy __await__ import keeps asyncio off the import path (Implemented)
- Decision: `Future.__await__` does `from hpyx.aio import _future_await` inside the method body, not at module load time.
- Why: Two problems are solved at once. (1) `hpyx.aio` is created in a later Phase 1 task; the wrapper has to ship in the meantime without an unresolved import. (2) Most users never `await` a Future (they call `.result()`), so loading `asyncio` at import time would impose a cold-start cost on every consumer. Lazy import defers the cost to the first `await fut`, where it is already paid.
- Result: `import hpyx` does not pull `asyncio` into `sys.modules`; `await hpyx.async_(fn)` works the moment `hpyx.aio` lands. Verified via `assert "hpyx.aio" not in sys.modules` after `import hpyx`.
2026-04-24: Python-side FIFO queue for add_done_callback (Implemented)
- Decision: Each Python Future keeps an optional list[Callable] of pending callbacks behind a threading.Lock. The first add_done_callback(fn) call registers exactly one C++ _drain callback; subsequent calls just append to the list. When the underlying HPXFuture fires, _drain snapshots and clears the list (under the lock) and invokes each user callback in insertion order.
- Why: concurrent.futures.Future.add_done_callback documents that callbacks fire in insertion order. The HPX .then chain makes no FIFO guarantee across multiple registrations: calling _hpx.add_done_callback(cb1); _hpx.add_done_callback(cb2) may fire cb2 before cb1 depending on scheduler order. Centralizing the registration in one C++ callback that drains a Python-managed list restores FIFO semantics without touching the C++ side.
- Result: test_add_done_callback_fifo_order registers 5 callbacks and asserts the invocation list equals [0, 1, 2, 3, 4]. The lock makes registration safe across threads on free-threaded 3.13t.
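A rough sketch of the single-registration drain pattern follows. ReverseOrderBackend and FifoFuture are hypothetical names; the backend deliberately fires its callbacks in reverse order to stand in for a scheduler with no FIFO guarantee, and invoking user callbacks outside the lock is an assumption of this sketch.

```python
import threading


class ReverseOrderBackend:
    """Hypothetical backend that fires callbacks in reverse registration order."""

    def __init__(self):
        self._cbs = []

    def add_done_callback(self, cb):
        self._cbs.append(cb)

    def fire(self):
        for cb in reversed(self._cbs):
            cb()


class FifoFuture:
    """Hypothetical wrapper that restores insertion order on top of the backend."""

    def __init__(self, backend):
        self._backend = backend
        self._callbacks = []
        self._lock = threading.Lock()
        self._registered = False

    def add_done_callback(self, fn):
        with self._lock:
            self._callbacks.append(fn)
            first = not self._registered
            self._registered = True
        if first:
            # Exactly one backend registration, however many user callbacks exist.
            self._backend.add_done_callback(self._drain)

    def _drain(self):
        # Snapshot and clear under the lock, then call outside it.
        with self._lock:
            pending, self._callbacks = self._callbacks, []
        for fn in pending:
            fn(self)


backend = ReverseOrderBackend()
fut = FifoFuture(backend)
order = []
for i in range(5):
    fut.add_done_callback(lambda _f, i=i: order.append(i))
backend.fire()
```

Because the backend only ever sees _drain, its own ordering policy no longer matters to user callbacks.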
2026-04-24: Synchronous fast-path when add_done_callback runs on a done Future (Implemented)
- Decision: Future.add_done_callback(fn) checks self.done() first. If the future is already complete, it calls fn(self) synchronously on the calling thread and returns: no C++ registration, no thread switch.
- Why: concurrent.futures.Future runs callbacks synchronously on the calling thread when added to an already-completed future. Without this fast-path, HPyX would dispatch the callback to an HPX worker, which is a different thread than the caller and surprises users who rely on stdlib semantics (sequence-builder patterns, post-completion bookkeeping). The C++ side already has a !fut_.valid() synchronous branch but does not handle the done() case.
- Result: test_add_done_callback_already_done_runs_synchronously registers a callback on a hpyx.ready_future(42) and asserts the callback's thread id equals the caller's. Errors raised in the synchronous path are caught and logged via the same hpyx.futures logger that the async path uses.
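The fast-path can be illustrated with a small stand-in. WorkerDispatchBackend and FastPathFuture are hypothetical; the backend always reports itself as done and would run any registered callback on another thread, which is the behavior the fast-path avoids. The logger name is also an assumption.

```python
import logging
import threading

log = logging.getLogger("sketch.futures")  # hypothetical logger name


class WorkerDispatchBackend:
    """Hypothetical completed backend that runs callbacks on a separate thread."""

    def done(self):
        return True

    def add_done_callback(self, cb):
        worker = threading.Thread(target=cb)
        worker.start()
        worker.join()


class FastPathFuture:
    def __init__(self, backend):
        self._backend = backend

    def add_done_callback(self, fn):
        if self._backend.done():
            # Already complete: run on the caller's thread, skip registration.
            try:
                fn(self)
            except Exception:
                log.exception("done-callback raised")
            return
        self._backend.add_done_callback(lambda: fn(self))


caller_id = threading.get_ident()
seen = []
FastPathFuture(WorkerDispatchBackend()).add_done_callback(
    lambda _f: seen.append(threading.get_ident())
)
```

Without the done() check, the callback in this sketch would record the worker thread's id instead of the caller's.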
2026-04-24: .then(fn) reuses the upstream Future instead of allocating a fresh ready_future (Implemented)
- Decision: Future.then(fn)'s shim closure captures self and calls fn(self). It does NOT wrap the resolved value in a new _core.futures.ready_future to construct a separate Future for fn.
- Why: The upstream self is already a fully-resolved Future by the time the shim runs (that is what triggered the continuation), so passing it directly is semantically identical to building a fresh ready_future(value), and free. Allocating a new HPXFuture plus wrapper per stage costs O(N) extra make_ready_future, INCREF, and Python heap allocations on deep .then chains. Capturing self drops the per-stage cost to one closure.
- Result: test_then_passes_self_not_intermediate chains .then and asserts the captured Future's .result() matches the upstream. No measurable overhead on chain depths up to 100.
2026-04-24: .then(fn) short-circuits on upstream exceptions (Documented)
- Decision: When the upstream Future raises, fn is not invoked. The exception propagates through the .then chain unchanged. The class docstring is explicit about this; users who need success-or-failure dispatch use add_done_callback.
- Why: This matches the C++ side (HPXFuture::then and dataflow_impl both short-circuit on the sentinel exception payload) and matches concurrent.futures.Future (which has no .then, but whose closest analogue, a chain driven by add_done_callback, also propagates exceptions unchanged). The alternative (invoke fn with the failed Future and let it dispatch) was what the original wrapper comment described, but it is not what the C++ binding actually does, and "split the API across two semantics" is worse than "one chain, one rule." We chose to lock the rule and document it clearly.
- Result: test_then_short_circuits_on_upstream_exception confirms the shim is never called when the upstream raises. Docstring on Future.then directs users to add_done_callback for failure handling.
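The two .then rules (pass the upstream Future itself to fn, and skip fn when the upstream failed) can be sketched on top of the standard library. This is an illustration built on concurrent.futures.Future, not the HPyX wrapper; the free function then is a hypothetical helper.

```python
from concurrent.futures import Future


def then(upstream, fn):
    """Hypothetical helper: chain fn after upstream, short-circuiting on failure."""
    downstream = Future()

    def shim(done):
        exc = done.exception()
        if exc is not None:
            # Upstream failed: propagate unchanged and never call fn.
            downstream.set_exception(exc)
            return
        try:
            # fn receives the resolved upstream Future, not a fresh wrapper.
            downstream.set_result(fn(done))
        except Exception as err:
            downstream.set_exception(err)

    upstream.add_done_callback(shim)
    return downstream


calls = []
failed = Future()
chained = then(failed, lambda f: calls.append(f))
boom = ValueError("boom")
failed.set_exception(boom)

ok = Future()
doubled = then(ok, lambda f: f.result() * 2)
ok.set_result(21)
```

In this sketch chained carries the original exception object and calls stays empty, while doubled resolves from the upstream's own result.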
2026-04-24: hpyx.when_any() with empty input raises ValueError instead of hanging (Implemented)
- Decision: The Python when_any(*futures) function raises ValueError("when_any requires at least one input") when called with no arguments. The guard is at the wrapper level, not in C++.
- Why: hpx::when_any on an empty vector returns a future that never resolves. Surfacing that as a ValueError at the Python boundary turns a silent-hang programmer error into a fast, debuggable failure. We chose the wrapper level over the C++ side because the C++ binding is shared with internal callers that may have already filtered the input list, and Python is where the user-facing error lives. when_all([]) returning () (a sensible neutral element) does not need the same guard.
- Result: test_when_any_empty_raises confirms the exception fires. Hang-free behavior validated by the test running to completion in < 0.05s.
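A minimal sketch of the wrapper-level guard, assuming a blocking stand-in built on concurrent.futures.wait rather than the real HPX binding. The (index, futures) return shape mirrors the tuple described for the C++ side; everything else here is illustrative.

```python
from concurrent.futures import FIRST_COMPLETED, Future, wait


def when_any(*futures):
    """Blocking stand-in for the guarded wrapper (not the HPyX implementation)."""
    if not futures:
        # Fail fast instead of waiting on something that can never resolve.
        raise ValueError("when_any requires at least one input")
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    winner = next(i for i, f in enumerate(futures) if f in done)
    return winner, list(futures)


a, b = Future(), Future()
b.set_result("fast")
index, futs = when_any(a, b)
```

Calling when_any() with no arguments raises immediately, which is the point of placing the check before any wait.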
2026-04-24: Implement dataflow via when_all().then() rather than hpx::dataflow (Implemented)
- Decision: The C++ binding for dataflow(fn, inputs, kwargs) calls hpx::when_all(raws).then(continuation) rather than hpx::dataflow(launch, fn, raws). The continuation receives a single hpx::future<std::vector<hpx::shared_future<PyPayload>>>, walks it for sentinel exceptions, and only then builds the *args tuple and invokes fn.
- Why: hpx::dataflow has two relevant overloads: a variadic form that forwards each input as a separate argument, and a range form that forwards a std::vector<future<T>> as one argument. Mixing the two through Python (where N is decided at runtime, but the lambda signature is fixed at compile time) leads to a compile-time pack-vs-vector mismatch. Going through when_all().then() collapses the call site to a single, fixed, range-style continuation; we then unpack the vector inside the lambda where we already need to walk it for sentinel detection. The semantics are identical to hpx::dataflow for our use case (N inputs → call fn) and there is no measurable scheduling difference for Python-typed payloads.
- Result: dataflow_impl in src/_core/futures.cpp is ~50 lines of straight-line code with no template metaprogramming. Tests cover: positive path with 2 and 3 inputs, exception propagation from inputs (first-to-fail short-circuits without invoking fn), exception from fn itself, and kwargs forwarding.
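The order of operations in the continuation (wait for every input, walk the inputs for failures, and only then build the argument tuple and call fn) can be sketched in Python. This is a blocking analogue on concurrent.futures, not the C++ binding; checking inputs in list order is an assumption of the sketch.

```python
from concurrent.futures import Future, wait


def dataflow(fn, inputs, kwargs=None):
    """Blocking analogue of the when_all(...).then(...) shape (illustrative only)."""
    out = Future()
    wait(inputs)  # stand-in for when_all: every input is complete past this point
    for f in inputs:
        exc = f.exception()
        if exc is not None:
            # A failed input short-circuits; fn is never invoked.
            out.set_exception(exc)
            return out
    args = tuple(f.result() for f in inputs)  # runtime-sized *args tuple
    try:
        out.set_result(fn(*args, **(kwargs or {})))
    except Exception as err:
        out.set_exception(err)
    return out


f1, f2, bad = Future(), Future(), Future()
f1.set_result(2)
f2.set_result(3)
bad.set_exception(KeyError("missing"))

total = dataflow(lambda a, b, scale=1: (a + b) * scale, [f1, f2], {"scale": 10})
failed = dataflow(lambda a, b: a + b, [f1, bad])
```

The number of inputs is only known at call time, which is why the unpacking happens inside the continuation rather than in its signature.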
2026-04-24: nb::handle with nb::dict() default for dataflow kwargs, not nb::kwargs (Implemented)
- Decision: The C++ signature is dataflow_impl(nb::callable fn, std::vector<HPXFuture> inputs, nb::handle kwargs), validated at runtime with PyDict_Check, and registered as m.def("dataflow", &dataflow_impl, "fn"_a, "inputs"_a, "kwargs"_a = nb::dict()). Users call it positionally (dataflow(fn, [f1, f2], {"k": v})) or by name (dataflow(fn, [f1, f2], kwargs={"k": v})).
- Why: Nanobind's nb::kwargs cannot coexist with named-argument annotations on the preceding positional parameters. The static_assert in nb_func.h requires nargs_provided == nargs, but nb::kwargs is counted as one of nargs while never accepting an "_a" annotation. The two ways out are (a) drop named annotations entirely, which loses keyword-call ergonomics for fn and inputs, or (b) use nb::handle with a runtime PyDict_Check and pass an empty dict default. Option (b) preserves the explicit dataflow(fn=..., inputs=..., kwargs=...) call shape while keeping nanobind's signature renderer happy.
- Result: Tests pass for positional, keyword, and missing-kwargs paths. The nb::dict() default is shared across calls, but our code only reads from it (Py_INCREFs and forwards to PyObject_Call), so no mutation hazard exists. If a future caller starts mutating it, we will switch to a per-call factory.
2026-04-24: Capture input HPXFuture wrappers in when_any continuation, not reconstruct (Implemented)
- Decision: when_any_impl captures the original std::vector<HPXFuture> in a std::shared_ptr before launching the continuation, then returns a (index, [HPXFuture, ...]) tuple where the list contains the same wrapper instances the caller passed in.
- Why: The result tuple needs to let the caller both retrieve the winner's value (via result()) and inspect the laggards. Reconstructing fresh HPXFuture wrappers from each hpx::shared_future<PyPayload> would lose the per-wrapper cancelled_ and running_ atomic flags that concurrent.futures.Future semantics require. Capturing the original wrappers is also free: HPXFuture is copy-cheap (a shared_future plus two shared_ptr<atomic> flags), and the shared_ptr<vector<HPXFuture>> keeps the wrappers alive until the continuation runs.
- Result: tests/test_futures.py::test_when_any_returns_index_and_futures_list passes. The list-of-futures pattern matches concurrent.futures.wait()'s "done set / not-done set" output shape closely enough that the upcoming Python wrapper can map between them without rebuilding state.
2026-04-24: Store shared_ptr<PyObject> (PyPayload) in the future state, not nb::object (Implemented)
- Decision: The HPX future state holds PyPayload = std::shared_ptr<PyObject> with a custom GIL-acquiring deleter (GILDecref) instead of nb::object. HPXFuture wraps hpx::shared_future<PyPayload>.
- Why: nb::object's destructor calls Py_DECREF unconditionally. HPX may copy/move the future state on a worker thread that does not hold the GIL (during scheduling, when continuations fire, when shared states are reaped). A bare nb::object decrementing without the GIL races with the interpreter and corrupts refcounts. shared_ptr<PyObject> is GIL-safe at every point: the control block uses atomic counters (no GIL needed for copy/move), and the deleter does PyGILState_Ensure() → Py_DECREF → PyGILState_Release() so the actual reference release always happens with the GIL held.
- Result: tests/test_futures.py covers async_submit running on an HPX worker, exception preservation, and the exception() method; all five tests pass. The pattern carries over to then() continuations and add_done_callback for the same reason.
2026-04-24: Box Python exceptions in a sentinel tuple, not std::exception_ptr (Implemented)
- Decision: When the user's callable raises, the lambda catches nb::python_error, calls PyErr_Fetch/PyErr_NormalizeException, and packs the result into a 4-tuple sentinel ("__hpyx_exc__", exc_type, exc_value, exc_tb) stored as a PyPayload. result() and exception() detect the sentinel via is_exc_sentinel() and re-raise via PyErr_SetRaisedException (Python 3.12+).
- Why: nb::python_error carries PyObject* references that are GIL-thread-local. Letting it propagate through std::exception_ptr is unsafe: HPX may rethrow it on a different thread that has no Python interpreter state attached, crashing the process. Boxing the exception into a PyPayload while the original GIL is still held captures owned references that can travel through HPX's machinery and be unboxed on the consumer thread.
- Result: pytest.raises(ValueError, match="boom") works correctly through fut.result(), and fut.exception() returns the original Python exception value with its traceback intact.
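The boxing scheme can be mirrored in pure Python to show the data flow. This is only an analogue of the C++ logic: box_call and unbox are hypothetical helper names, while the sentinel tag and the is_exc_sentinel name are taken from the entry above.

```python
import sys

_SENTINEL = "__hpyx_exc__"


def box_call(fn, *args):
    """Hypothetical producer side: return the value, or a boxed exception."""
    try:
        return fn(*args)
    except Exception:
        exc_type, exc_value, exc_tb = sys.exc_info()
        return (_SENTINEL, exc_type, exc_value, exc_tb)


def is_exc_sentinel(payload):
    return (
        isinstance(payload, tuple)
        and len(payload) == 4
        and payload[0] == _SENTINEL
    )


def unbox(payload):
    """Hypothetical consumer side: re-raise a boxed exception, else pass through."""
    if is_exc_sentinel(payload):
        _, _, exc_value, exc_tb = payload
        raise exc_value.with_traceback(exc_tb)
    return payload


def fail():
    raise ValueError("boom")


boxed = box_call(fail)
plain = box_call(lambda: 7)
```

The payload is an ordinary tuple of owned references, so it can be handed between threads like any other result and only turns back into a raised exception at the consumer.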
2026-04-24: HPYX_ASYNC_MODE env var for launch::async rollback (Implemented)
- Decision: Add async_mode to hpyx.config.DEFAULTS ("async" by default), parsed from HPYX_ASYNC_MODE ("async" or "deferred", case-insensitive). The C++ async_submit reads the env var directly via std::getenv and selects hpx::launch::async (default) or hpx::launch::deferred (rollback).
- Why: Spec risk #1: switching hpx_async from launch::deferred to launch::async is the core correctness fix in v1, but it is also a behavior change that could expose latent GIL or threading bugs in user code. A no-touch rollback flag lets operators flip back to v0.x semantics without rebuilding or downgrading. The deferred path is intentionally preserved (not deleted) so the rollback is real and tested.
- Result: HPYX_ASYNC_MODE=deferred python -c "import hpyx; ..." returns the user's value but runs the callable in the calling thread on .result(). Tested as part of the issue #120 acceptance criteria. Will be removed in a future minor release once the launch::async path is proven in production.
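A small sketch of case-insensitive parsing for the rollback flag. parse_async_mode is a hypothetical helper, and rejecting unknown values with ValueError is an assumption; the entry above does not say how invalid values are handled.

```python
import os

_VALID_MODES = ("async", "deferred")


def parse_async_mode(env=None):
    """Hypothetical parser for HPYX_ASYNC_MODE; defaults to "async"."""
    env = os.environ if env is None else env
    mode = env.get("HPYX_ASYNC_MODE", "async").lower()
    if mode not in _VALID_MODES:
        # Assumption: unknown values are an error rather than a silent fallback.
        raise ValueError(f"HPYX_ASYNC_MODE must be one of {_VALID_MODES}, got {mode!r}")
    return mode
```

Passing a plain dict as env keeps the function testable without touching the process environment.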
2026-04-24: Use nb::handle for args and kwargs parameters in async_submit (Implemented)
- Decision: The C++ binding signature is async_submit(nb::callable fn, nb::handle args, nb::handle kwargs), validated at runtime with PyTuple_Check/PyDict_Check. The C++ parameter names are call_args/call_kwargs (not args/kwargs).
- Why: Nanobind treats the args and kwargs parameter names as a hint to render them as *args/**kwargs in the generated Python signature, even when the type is a concrete nb::tuple/nb::dict. With nb::object or nb::tuple as the type and a positional-named parameter, the rendered signature collapses to (fn, **args), which makes the function impossible to call as async_submit(fn, args_tuple, kwargs_dict). Using nb::handle plus runtime type checks avoids the auto-collection and gives a clean three-positional signature.
- Result: core_futures.async_submit(body, (), {}) works. The Python wrapper in src/hpyx/futures/_submit.py calls it as async_submit(function, args, kwargs) with explicit tuple/dict.
2026-04-24: HPXFuture wraps shared_future, not future, internally (Implemented)
- Decision: Every HPXFuture holds a hpx::shared_future<PyPayload> (not hpx::future<PyPayload>). share() is a no-op that copies the wrapper. async_submit calls .share() on the result of hpx::async before constructing HPXFuture.
- Why: Python's concurrent.futures.Future allows result() to be called multiple times (it caches the value) and allows add_done_callback to be registered after the future has completed; hpyx.Future additionally supports .then() chains. All three need the underlying state to be sharable: hpx::future<T> is a single-consumer move-only handle, whereas hpx::shared_future<T> is a multi-consumer copyable handle. Using shared_future everywhere matches Python semantics and removes a class of move-after-use bugs.
- Result: Multiple .result() calls return the same value. .then() and add_done_callback capture *this by copy without invalidating the original. share() is exposed for API parity but is internally a no-op.
2026-04-24: Pixi/uv archive cache is invalidated by deleting ~/Library/Caches/rattler/cache/uv-cache/archive-v0/*/hpyx/ (Operational)
- Decision: When local rebuilds appear to silently revert the installed _core.cpython-313t-darwin.so to an older version, manually clear the rattler/uv archive cache before reinstalling.
- Why: Pixi declares hpyx as [feature.hpyx.pypi-dependencies]: hpyx = { path = ".", editable = true }. Pixi syncs this through uv, which keeps a content-addressed archive cache at ~/Library/Caches/rattler/cache/uv-cache/archive-v0/. Even with pip install --force-reinstall --no-cache-dir, the cached .so from an earlier successful build can be restored, masking source changes. Symptom: source has async_submit(nb::handle, ...) but compiled binary still shows async_submit_impl(nb::callable, nb::args) strings, and the installed .so timestamp predates the build output.
- Result: The recipe, captured here so future contributors do not lose hours debugging "the build isn't picking up my changes," is: for d in ~/Library/Caches/rattler/cache/uv-cache/archive-v0/*/; do [ -d "${d}hpyx" ] && rm -rf "$d"; done && pip install --no-build-isolation --no-cache-dir --force-reinstall -ve .
Phase 0 — Foundation (2026-04-24)
2026-04-24: Move C++ sources into src/_core/ package (Implemented)
- Decision: Move all flat src/*.cpp / src/*.hpp files into src/_core/ and rename init_hpx.* → runtime.*.
- Why: The old flat layout put bind.cpp, init_hpx.cpp, algorithms.cpp, and futures.cpp all in the same directory as the Python hpyx/ package, with no structural separation between the top-level module glue and the implementation units. As HPyX grows to cover futures, parallel algorithms, and kernels, each gets its own .cpp file under _core/. The new layout gives nanobind a clean home (src/_core/) and makes it obvious that everything under _core/ is compiled C++ while src/hpyx/ is pure Python.
- Result: CMakeLists.txt updated to reference src/_core/*.cpp. No behavior change: 62 existing tests pass unchanged. init_hpx.cpp renamed runtime.cpp to match its actual responsibility.
2026-04-24: Expose runtime API as _core.runtime submodule (Implemented)
- Decision: Register the HPX runtime bindings on a _core.runtime submodule (via register_bindings(nb::module_&)) rather than directly on _core.
- Why: HPyX v1 will expose _core.futures, _core.parallel, and _core.kernels as separate submodules. Keeping runtime on the top-level _core namespace would mean future calls like _core.runtime_start() sit alongside _core.dot1d(), which is confusing. Moving them to _core.runtime.* now avoids a breaking rename later.
- Result: _core.runtime_start, _core.stop_hpx_runtime, etc. are removed from the top-level namespace. Replaced by _core.runtime.runtime_start, _core.runtime.runtime_stop, _core.runtime.runtime_is_running, _core.runtime.num_worker_threads, _core.runtime.get_worker_thread_id, _core.runtime.hpx_version_string. Callers in print_versions.py updated.
2026-04-24: Release GIL during hpx::start construction (Implemented)
- Decision: runtime_start releases the GIL (nb::gil_scoped_release) around the new global_runtime_manager(cfg) call instead of holding it.
- Why: The original init_hpx_runtime held a nb::gil_scoped_acquire around runtime construction. Under GIL-mode Python this was harmless but redundant. Under free-threaded Python 3.13t it is a bug: hpx::start spawns OS threads that eventually call Python-side callbacks; if the GIL is held during the blocking startup CV wait, any new thread that tries to acquire the GIL deadlocks. The fix is to release the GIL before calling hpx::start so worker threads can proceed.
- Result: runtime_start correctly supports both GIL-mode and free-threaded builds. Smoke-tested with sysconfig.get_config_var("Py_GIL_DISABLED") == 1 on Python 3.13t.
2026-04-24: g_stopped flag prevents in-process restart (Implemented)
- Decision: Add an std::atomic<bool> g_stopped flag that is set to true on runtime_stop and causes runtime_start to throw RuntimeError on any subsequent call.
- Why: HPX explicitly prohibits restarting the runtime within the same process (it uses process-global singletons for the scheduler, AGAS, etc.). Without this flag, calling _core.runtime.runtime_start() after a stop would silently fail or crash. The flag surfaces the constraint as a clear Python exception.
- Result: RuntimeError: HPyX runtime has been stopped and cannot restart within this process is raised on any restart attempt. Tested in tests/test_runtime.py::test_shutdown_makes_further_init_raise.
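The one-way stopped flag can be sketched in Python as a small state holder. RuntimeGuard is a hypothetical class standing in for the C++ globals; a lock replaces the atomic flag for simplicity.

```python
import threading


class RuntimeGuard:
    """Hypothetical Python analogue of the g_stopped one-way latch."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = False
        self._stopped = False

    def start(self):
        with self._lock:
            if self._stopped:
                # Once stopped, the latch never resets within this process.
                raise RuntimeError(
                    "HPyX runtime has been stopped and cannot restart "
                    "within this process"
                )
            self.running = True

    def stop(self):
        with self._lock:
            self.running = False
            self._stopped = True


guard = RuntimeGuard()
guard.start()
guard.stop()
```

A second guard.start() after the stop raises RuntimeError instead of failing silently.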
2026-04-24: hpyx._runtime as the single auto-init authority (Implemented)
- Decision: All public Python APIs that need the HPX runtime call hpyx._runtime.ensure_started() rather than calling _core.runtime.runtime_start() directly.
- Why: Without a central authority, every public function would need to independently manage the "has it started yet?" check, the env-var config merge, the conflicting-config error, and the atexit registration. That logic would be duplicated across hpyx.futures, hpyx.parallel, hpyx.debug, etc. _runtime.ensure_started() is the single place where the lifecycle invariants are enforced.
- Result: hpyx.debug, hpyx.runtime.HPXRuntime, and hpyx.__init__.init all delegate to _runtime.ensure_started(). The threading.Lock in _runtime makes it safe for concurrent callers on free-threaded 3.13t.
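A rough sketch of a single auto-init authority, assuming a stubbed backend. _backend_start, the _state dict, and the exact conflicting-config rule are assumptions made for illustration; only the names ensure_started and _atexit_shutdown come from the entries in this log.

```python
import atexit
import threading

_lock = threading.Lock()
_state = {"cfg": None}  # None means "not started yet"
start_calls = []  # records backend starts so the demo can count them


def _backend_start(cfg):
    """Stub standing in for the compiled runtime start call."""
    start_calls.append(cfg)


def _atexit_shutdown():
    """Stub: the real hook would stop the runtime at interpreter exit."""


def ensure_started(cfg=()):
    cfg = tuple(cfg)
    with _lock:
        if _state["cfg"] is None:
            _backend_start(cfg)
            atexit.register(_atexit_shutdown)  # registered once, on first start
            _state["cfg"] = cfg
        elif cfg and cfg != _state["cfg"]:
            # Assumed rule: a later, different explicit config is an error.
            raise RuntimeError("runtime already started with a different config")


threads = [
    threading.Thread(target=ensure_started, args=(("hpx.os_threads=2",),))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent callers produce exactly one backend start because the check and the start happen under the same lock.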
2026-04-24: atexit owns HPX shutdown (Implemented)
- Decision: _runtime.ensure_started() registers _atexit_shutdown with Python's atexit module on first start. HPXRuntime.__exit__ is a no-op. There is no automatic teardown at the end of a with HPXRuntime(): block.
- Why: In v0.x, HPXRuntime.__exit__ called _core.stop_hpx_runtime(). This caused two problems: (1) after the context manager exited, any subsequent HPyX call in the same process would fail because HPX can't restart; (2) users writing scripts with multiple with HPXRuntime(): blocks got silent failures. The atexit approach means the runtime lives for the entire process lifetime, which is what users want in nearly all cases.
- Result: HPXRuntime.__exit__ returns None. Process cleanup happens via atexit. Users who need early shutdown call hpyx.shutdown() explicitly and understand the restart constraint.
2026-04-24: CMakePresets.json with profile preset for native profiling (Implemented)
- Decision: Add a profile preset with RelWithDebInfo + -fno-omit-frame-pointer + IPO off.
- Why: py-spy --native, perf, and memray --native all require C++ frames to be resolvable in the symbol table. A default Release build omits frame pointers (compiler optimization) and enables IPO (inlines away call boundaries), making profiler output unreadable for C++ code. The profile preset keeps optimization level (-O2) while trading a small performance overhead for reliable frame resolution.
- Result: cmake --preset profile produces a build suitable for py-spy record --native -- python my_benchmark.py. Used by scripts/run_bench_local.sh (lands in Plan 4).
2026-04-24: hpyx.config as pure-Python env-var parser (Implemented)
- Decision: hpyx.config is a pure-Python module (from_env() + DEFAULTS), not a C++ binding.
- Why: Config values are only needed at Python startup time (before the first _core.runtime.runtime_start call). There is no need for C++ to know about HPYX_OS_THREADS: Python builds the HPX config strings and passes them as a list[str]. Keeping the config layer in Python makes it easy to test with monkeypatch, import without a compiled extension, and extend without touching C++.
- Result: tests/test_config.py has 15 pure-Python tests with no build dependency. Env-var precedence is validated with monkeypatch.
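A minimal sketch of what a pure-Python config layer of this shape might look like. The contents of DEFAULTS and the to_hpx_args helper are assumptions for illustration; only from_env, DEFAULTS, and HPYX_OS_THREADS are named in the entry above. The hpx.os_threads key is HPX's standard configuration entry for the worker-thread count.

```python
import os

# Assumed default: None means "let HPX decide".
DEFAULTS = {"os_threads": None}


def from_env(env=None):
    """Merge recognised environment variables over DEFAULTS."""
    env = os.environ if env is None else env
    cfg = dict(DEFAULTS)
    raw = env.get("HPYX_OS_THREADS")
    if raw is not None:
        cfg["os_threads"] = int(raw)
    return cfg


def to_hpx_args(cfg):
    """Hypothetical helper: turn the config dict into HPX config strings."""
    args = []
    if cfg["os_threads"] is not None:
        args.append(f"hpx.os_threads={cfg['os_threads']}")
    return args
```

Because nothing here touches the compiled extension, tests can pass a plain dict (or use monkeypatch on os.environ) and assert on the resulting list[str].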
2026-04-24: hpyx.debug.enable_tracing stubbed in Phase 0 (Implemented)
- Decision: enable_tracing and disable_tracing raise NotImplementedError("ships in v1.x (Plan 4)") rather than being absent from the public API.
- Why: Advertising the tracing surface in Phase 0 stabilizes the public API shape so documentation and user code can reference hpyx.debug.enable_tracing(path) consistently. The stub prevents silent AttributeErrors if someone tries to use it early and gives a clear error message explaining when it ships.
- Result: hpyx.debug is importable and documented from Phase 0 onward. Full JSONL-output implementation deferred to Plan 4.