
v0.11.7

Released by @github-actions on 03 Oct 17:02 (commit 6fb049f)

Many important fixes for RTS/DB and the language in general!

Added

  • Bash completion is now part of the Debian packages & brew formula

Changed

  • actondb now defaults the gossip port to RPC port + 1 [#913]
    • The gossip protocol only propagates the RPC port, and parts of the
      implementation have a hard-coded assumption that the gossip port is the
      RPC port + 1.
    • To avoid configuration errors, the gossip port now defaults to RPC port + 1,
      and if a different gossip port is explicitly configured, an error message
      is logged on startup.
    • While this is marked as a change, it could really be considered a fix, as
      any other configuration of the system was invalid anyway.
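A minimal Python sketch of the gossip port rule, for illustration only: the function names `default_gossip_port` and `gossip_port_errors` are hypothetical (not actondb's code) and the port numbers are arbitrary examples. It only derives the expected port and reports a mismatch; the notes do not say which port actondb actually binds when the two differ.

```python
# Hypothetical sketch of the gossip port rule; not actondb's actual code.

def default_gossip_port(rpc_port: int) -> int:
    """Gossip port implied by the hard-coded +1 offset."""
    return rpc_port + 1


def gossip_port_errors(rpc_port: int, configured_gossip_port=None) -> list:
    """Return error messages for a gossip port that breaks the +1 assumption."""
    errors = []
    expected = default_gossip_port(rpc_port)
    if configured_gossip_port is not None and configured_gossip_port != expected:
        errors.append(
            f"gossip port {configured_gossip_port} != RPC port + 1 ({expected}); "
            "peers only learn the RPC port and assume the +1 offset"
        )
    return errors


print(default_gossip_port(32000))        # 32001
print(gossip_port_errors(32000))         # []
print(gossip_port_errors(32000, 40000))  # one error message
```

Deriving the port instead of trusting a second setting means the only valid configuration is also the default one.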

Fixed

  • Fix include path for M1
    • /opt/homebrew/include added to header include path [#892]
    • Actually fixes builds on M1!
    • This only appeared to work before because the M2 machine Acton was tested
      on also had header files in /usr/local/include; on a fresh install, the
      build failed.
  • Fix up-to-date check in compiler for imported modules from stdlib [#890]
  • Fix seed arg parsing in actondb that led to an "Illegal instruction" error
  • Fix nested dict definitions [#869]
    • Nested dicts can now be defined directly
  • Avoid inconsistent view between RTS & DB in certain situations [#788]
    • This could happen if an RTS node was stopped and quickly rejoined, or if a
      transient partition occurred and healed before the gossip round completed.
    • We now wait for the gossip round to complete.
    • This ensures that local actor placement doesn't fail during such events.
  • Fix handling of missed timer events [#907]
    • Circumstances such as suspending the Acton RTS or resuming a system from the
      database could lead to a negative timeout, i.e. sleeping for less than 0
      seconds.
    • The libuv timeout argument is a uint64, so feeding in a negative signed
      integer results in a value like 18446744073709550271, which meant sleeping
      for roughly 584 million years, effectively blocking the RTS timerQ.
    • This is now fixed by treating negative timeouts as 0, so we wake up
      immediately to handle the event, however late we might be.
  • Timer events now wake up WTs after system resumption [#907]
    • Worker Threads (WTs) are created in the NoExist state and should transition
      into Idle once initialized; however, that transition was missing, leading
      to a deadlock.
    • This was usually masked because a WT transitions into Working once it
      carries out some work, and then back into Idle.
    • The wake_wt function, which is called to wake up a WT after a timer event
      is triggered, only wakes up threads that are in the Idle state; if a thread
      is in NoExist, it does nothing.
    • If there is no work, as is the case after resuming the system from the DB,
      WTs stay in the NoExist state, wake_wt does nothing, and the system is
      blocked.
    • WTs now properly transition into Idle.
  • Only communicate with live DB nodes from RTS DB client [#910] [#916]
    • When the RTS communicates with the DB nodes, we broadcast messages to all
      servers we know about. Servers that were down had their socket fd set to 0
      to signal that they were down. However, fd 0 is not invalid; it is stdin,
      so we ended up sending data to stdin, creating lots of garbage output on
      the terminal.
    • An fd of -1 is now used to signal an invalid fd, which prevents similar
      mistakes.
    • The DB node status is now inspected and messages are only sent to live
      servers.
  • Avoid segfault on resuming TCP listener & TCP listener connection [#922]
    • Invalidate fds on actor resumption [#917]
  • Remove remaining trailing newlines from RTS log messages [#926]
  • Remove trailing newlines from DB log messages [#932]
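The gossip fix for inconsistent RTS/DB views (#788) amounts to "do not place actors from a view whose gossip round is still in flight". The Python sketch below illustrates that idea under stated assumptions: `MembershipView`, its methods, and the modulo placement are all hypothetical and are not how the Acton RTS is implemented.

```python
import threading


class MembershipView:
    """Hypothetical membership view, updated by gossip rounds."""

    def __init__(self):
        self._round_done = threading.Event()
        self._members = []
        self._lock = threading.Lock()

    def start_round(self):
        # A node rejoined or a partition healed: the current view is stale.
        self._round_done.clear()

    def complete_round(self, members):
        with self._lock:
            self._members = list(members)
        self._round_done.set()

    def place_actor(self, actor_id: int, timeout: float = 5.0) -> str:
        # Wait for the in-progress gossip round instead of using a stale view.
        if not self._round_done.wait(timeout):
            raise TimeoutError("gossip round did not complete")
        with self._lock:
            # Hypothetical placement rule, only to make the sketch concrete.
            return self._members[actor_id % len(self._members)]


view = MembershipView()
view.start_round()
threading.Timer(0.05, view.complete_round, args=(["rts0", "rts1"],)).start()
print(view.place_actor(3))  # blocks until the round completes, then prints rts1
```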
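The numbers in the missed-timer fix (#907) can be checked directly. libuv's `uv_timer_start` takes its timeout in milliseconds as a `uint64_t`, and 2^64 − 18446744073709550271 = 1345, so the example value corresponds to a timer that was 1345 ms overdue. This Python sketch uses `ctypes.c_uint64` to reproduce the wraparound; the helper names are illustrative, not from the Acton RTS.

```python
import ctypes

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def as_uint64(value: int) -> int:
    """Reinterpret a signed value as uint64, as a C call would."""
    return ctypes.c_uint64(value).value


def clamp_timeout_ms(delta_ms: int) -> int:
    """The fix: an overdue timer gets timeout 0 and fires immediately."""
    return max(delta_ms, 0)


late_by = -1345  # the timer should have fired 1345 ms ago
wrapped = as_uint64(late_by)
print(wrapped)                                              # 18446744073709550271
print(f"{int(wrapped / MS_PER_YEAR / 1e6)} million years")  # 584 million years
print(clamp_timeout_ms(late_by))                            # 0
```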
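The worker thread deadlock from the timer wake-up fix (#907) is a small state machine bug. Below is a Python model of it: the state names mirror the notes (NoExist, Idle, Working), while `WorkerThread`, its `fixed` flag, and this `wake_wt` are hypothetical stand-ins for the real C code.

```python
from enum import Enum, auto


class WTState(Enum):
    NO_EXIST = auto()
    IDLE = auto()
    WORKING = auto()


class WorkerThread:
    def __init__(self, fixed: bool):
        self.state = WTState.NO_EXIST
        self.woken = False
        if fixed:
            # The fix: an initialized WT enters Idle right away.
            self.state = WTState.IDLE

    def run_work(self):
        # Doing work moves a WT through Working and back into Idle,
        # which is why the missing transition was usually masked.
        self.state = WTState.WORKING
        self.state = WTState.IDLE


def wake_wt(wt: WorkerThread) -> bool:
    """Only Idle threads are woken; NoExist threads are silently skipped."""
    if wt.state is WTState.IDLE:
        wt.woken = True
        return True
    return False


resumed = WorkerThread(fixed=False)       # resumed from DB, no work yet
print(wake_wt(resumed))                   # False: the timer event is lost
print(wake_wt(WorkerThread(fixed=True)))  # True
```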
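The DB client fix (#910, #916) combines two rules: -1, not 0, marks "no socket", and only live servers receive broadcasts. A minimal Python sketch follows, assuming a hypothetical `DbServer` record and an injected `send` callback in place of the RTS's real socket writes.

```python
from dataclasses import dataclass

INVALID_FD = -1  # fd 0 is stdin, so it cannot mean "no socket"


@dataclass
class DbServer:
    name: str
    fd: int = INVALID_FD
    live: bool = False


def broadcast(servers, msg, send):
    """Send msg only to live servers that have a valid socket."""
    sent_to = []
    for s in servers:
        if s.live and s.fd != INVALID_FD:
            send(s.fd, msg)
            sent_to.append(s.name)
    return sent_to


servers = [
    DbServer("db0", fd=5, live=True),
    DbServer("db1"),                   # never connected
    DbServer("db2", fd=7, live=False),  # known to be down
]
sent = []
print(broadcast(servers, b"ping", lambda fd, msg: sent.append((fd, msg))))  # ['db0']
```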

Testing / CI

  • Rewritten RTS / DB tests [#925] [#929]
    • More robust event handling that reacts directly when something happens; for
      example, if a DB server segfaults or we see unexpected output, we can abort
      the test
    • Much better combined DB & app output, for easy correlation during failures
    • The test orchestrator is now written in Acton (previously Python); at least
      the async IO callback style makes it easier to react directly to events...
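The callback style described above can be illustrated with a small Python model. The real orchestrator is written in Acton, and `TestMonitor`, its callback, and its list of fatal patterns are hypothetical. Each output line is tagged with its source so DB and app output interleave readably, and the test is flagged for abort as soon as a fatal pattern appears.

```python
class TestMonitor:
    """Hypothetical callback-style monitor for DB and app output."""

    FATAL_PATTERNS = ("Segmentation fault", "Illegal instruction")

    def __init__(self):
        self.combined = []   # merged, source-tagged output
        self.aborted = None  # reason for aborting, if any

    def on_output(self, source: str, line: str):
        # Tag each line with its source for easy correlation on failure.
        self.combined.append(f"[{source}] {line}")
        if self.aborted is None:
            for pattern in self.FATAL_PATTERNS:
                if pattern in line:
                    self.aborted = f"{source}: {pattern}"
                    break


m = TestMonitor()
m.on_output("db0", "listening")
m.on_output("app", "hello")
m.on_output("db1", "Segmentation fault (core dumped)")
print(m.aborted)  # db1: Segmentation fault
```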