createActor capability failures are not easy to diagnose #14

Open
kquick opened this issue Nov 9, 2017 · 3 comments

@kquick
Owner

kquick commented Nov 9, 2017

When a createActor is performed and there is no ActorSystem that can meet the requirements of that Actor, the failure tends to be obscure: the thespian.log internal log will contain something like "Pending Actor create for ActorAddr-(T|:43797) failed (3586): None", and subsequent messages are likely to result in a PoisonMessage response:

From thespianpy#50:
"The message wrapped into the poison message (attribute poisonMessage) is the initialization message sent to the named actor after creation. The attribute details contains the string Child Aborted."

It would be nice if there were a clearer indication of the createActor failure and the corresponding failure reason.

@KorbinianK

I'm not entirely sure whether this fits here, but I am having trouble continuing to process a request when an actor couldn't be created because the AS does not have the required capabilities.

Scenario:
An ActorSystem is asked to build a message dynamically. For that, 2 or 3 actors exist: [A], [B] and [C].
For B, the AS needs the capability "b". If the AS fulfills the requirement, it should return the string "[A][B][C]"; if it doesn't, it should return "[A][C]".
When I start two AS, one with "b" and one without, it works flawlessly. Starting a single AS with "b" also prints the correct string (obviously). But when I start a single AS that cannot create a [B], it produces only a single (correct) string and then hangs. It does try to send a message to A, but the message never arrives. I also checked the TCP dumps and tried UDP instead. Can you point me in the right direction on this? Code and log output below.

Code (actor_test.py)

I start an actor system using the start scripts from the documentation: python3 start.py 1900 and/or python3 start.py 1900 "b"

from thespian.actors import *
from datetime import timedelta
from collections import OrderedDict 
import time
import json
import random
import logging
from logsetup import logcfg 

class Message(object):
    def __init__(self,message,origin,requester):
        self.new = False
        self.requester = requester
        self.origin = origin
        self.sendTo = []
        try:
            msg = json.loads(message)
            self.message = msg['msg']
            if(msg['new']): 
                self.new = True
        except ValueError:
            pass
    def __str__(self): 
        return self.message

    def remove(self,sender):
        try:
            del self.sendTo['actor_test.'+sender.__class__.__name__]
        except:
            print(self.sendTo)

class Handler(ActorTypeDispatcher):
    def __init__(self, *args, **kw):
        super(Handler, self).__init__(*args, **kw)
        parts = {   'actor_test.A':None,
                    'actor_test.B':None,
                    'actor_test.C':None,
                    }
        self.parts = OrderedDict(parts)

    def receiveMsg_str(self, message, sender):
        logging.info('Handler received String: %s', message)
        msg = Message(message,sender,self.myAddress)
        if msg.new:
            for each in self.parts:
                if not self.parts[each]:
                    logging.info("Creating %s",each)
                    actor = self.createActor(each)
                    self.parts[each] = actor
                else:
                    logging.info("exists %s",self.parts[each])
            msg.sendTo = self.parts
            logging.info('sending %s to: %s',msg, self.parts[list(self.parts.keys())[0]])       
            self.send(self.parts[list(self.parts.keys())[0]],msg)
        else:
            pass

    def receiveMsg_Message(self,message,sender):
        logging.info('Handler received Message Object with: %s', message.message)
        if(len(message.sendTo) > 0):
            nextTo = message.sendTo[list(message.sendTo.keys())[0]]
            if nextTo is None:
                logging.error("%s does not exist", list(message.sendTo.keys())[0])
                del(message.sendTo[list(message.sendTo.keys())[0]])
                nextTo = message.sendTo[list(message.sendTo.keys())[0]]
            self.send(nextTo,message)
        else:
            self.send(message.origin,message)

    def receiveMsg_ChildActorExited(self, message, sender):
        print("child exited:",message.childAddress)
        for ename in self.parts:
            if self.parts[ename] == message.childAddress:
                self.parts[ename] = None
                return

class Abstract(ActorTypeDispatcher):
    def receiveMsg_Message(self,message,sender):
        logging.info("Recieved message")
        self.add(message)
    def add(self,message):
        return ""

@requireCapability('b')
class B(Abstract):
    def add(self,message):
        logging.info("B called")
        message.message += "[B]"
        message.remove(self)
        self.send(message.requester,message)

class C(Abstract):
    def add(self,message):
        logging.info("C called")
        message.message += "[C]"
        message.remove(self)
        self.send(message.requester,message)
        return 

class A(Abstract):
    def add(self,message):
        logging.info("A called")
        message.message += "[A]"
        message.remove(self)
        self.send(message.requester,message)
        return

if __name__ == "__main__":
    import sys
    asys = ActorSystem('multiprocTCPBase')
    handler = asys.createActor('actor_test.Handler')
    for i in range(10):
        answer = asys.ask(handler, json.dumps({"msg":"Test","new":True}))
        print(answer)
        time.sleep(2)
        pass
    asys.tell(handler, ActorExitRequest())
    sys.exit(0)

Logging output (with B)

INFO:root:Handler received String: {"msg": "Test", "new": true}
INFO:root:Creating actor_test.A
INFO:root:Creating actor_test.B
INFO:root:Creating actor_test.C
INFO:root:sending Test to: ActorAddr-LocalAddr.0
INFO:root:Recieved message
INFO:root:Handler received Message Object with: Test[A]
INFO:root:A called
INFO:root:Recieved message
INFO:root:Handler received Message Object with: Test[A][B]
INFO:root:B called
INFO:root:Recieved message
INFO:root:Handler received Message Object with: Test[A][B][C]
INFO:root:C called
--> Test[A][B][C]

INFO:root:Handler received String: {"msg": "Test", "new": true}
INFO:root:Recieved message
INFO:root:exists ActorAddr-0-ActorAddr-(T|:37699)
INFO:root:A called
INFO:root:exists ActorAddr-1-ActorAddr-(T|:36929)
INFO:root:Recieved message
INFO:root:exists ActorAddr-2-ActorAddr-(T|:34455)
INFO:root:B called
INFO:root:sending Test to: ActorAddr-0-ActorAddr-(T|:37699)
INFO:root:Recieved message
INFO:root:Handler received Message Object with: Test[A]
INFO:root:C called
INFO:root:Handler received Message Object with: Test[A][B]
--> Test[A][B][C]

.....etc

Logging output (without B)

INFO:root:Handler received String: {"msg": "Test", "new": true}
INFO:root:Creating actor_test.A
child exited: ActorAddr-LocalAddr.1
INFO:root:Creating actor_test.B
INFO:root:Creating actor_test.C
INFO:root:sending Test to: ActorAddr-LocalAddr.0
ERROR:actor_test.Handler:Pending Actor create for ActorAddr-(T|:36909) failed (3586): None
INFO:root:Recieved message
INFO:root:A called
INFO:root:Handler received Message Object with: Test[A]
ERROR:root:actor_test.B does not exist
INFO:root:Recieved message
INFO:root:Handler received Message Object with: Test[A][C]
INFO:root:C called
--> Test[A][C]

INFO:root:Handler received String: {"msg": "Test", "new": true}
INFO:root:exists ActorAddr-0-ActorAddr-(T|:36509)
child exited: ActorAddr-LocalAddr.3
INFO:root:Creating actor_test.B
INFO:root:exists ActorAddr-2-ActorAddr-(T|:33733)
INFO:root:sending Test to: ActorAddr-0-ActorAddr-(T|:36509)
ERROR:actor_test.Handler:Pending Actor create for ActorAddr-(T|:36909) failed (3586): None

@kquick
Owner Author

kquick commented Nov 15, 2018

Hi @KorbinianK,

Your situation is related in that it's hard to supply the information that would help you diagnose this directly, and it's also having some subtle effects.

  • The Handler Actor is being told about a problem, but you are using the ActorTypeDispatcher and do not have a handler for unrecognized messages, so the problem message is getting discarded. It should probably be stated more strongly in the documentation, but I would recommend always providing a handler for unrecognized messages. For your Handler, you could add:
    def receiveUnrecognizedMessage(self, message, sender):
        logging.warning("Handler got unknown message from %s: %s (%s)", sender, message, type(message))
  • The above will tell you that you are getting a PoisonMessage (https://thespianpy.com/doc/using.html#hH-407c4c79-2a05-442d-b6e8-5bf7c2f2d068). When the creation of the [B] actor fails, any pending messages for that Actor get wrapped in a PoisonMessage and returned with a "Child Aborted" indication. Using this, you can fix your example by adding a PoisonMessage handler to the Handler actor:
    def receiveMsg_PoisonMessage(self, poisonmsg, sender):
        origmsg = poisonmsg.poisonMessage
        if isinstance(origmsg, Message):
            logging.warning('Message returned as poison: %s', poisonmsg)
            self.send(self.myAddress, origmsg)   # fake a response from the actual target
        else:
            logging.error("Unrecognized poisonmessage from %s: %s (%s)", sender, origmsg, type(origmsg))
  • The above will help your example run when there is no ActorSystem with the b capability, but you are probably still wondering why it seemed to work for the first iteration and then hang on the second iteration; this is where some subtle effects come into play:
    • The createActor is asynchronous, so in the initial run of Handler, it requests that three actors be created and it gets a "Local" address for each of them; the Local address indicates that the actual address is pending, but won't be known until the actor creation completes.
    • It is not possible to send a message with a Local address in it. When the Handler tries to do so, the message is queued internally within the Handler until the actor creation completes and the Local addresses can be resolved to actual addresses which other actors could receive and use.
    • In the first run, the actor creations effectively complete in the order they were issued, so the failure to create [B] is observed by the Handler's receiveMsg_ChildActorExited handler, which sets the corresponding parts value to None. Because Python uses references to dictionaries, msg.sendTo is still the same dictionary object as self.parts for the msg queued by receiveMsg_str's self.send(self.parts[tgt],msg) line, so the queued message sees that update.
    • When [C] is finally created, the Handler can now send the first Message to [A], and the message.sendTo is essentially {'a': addr_A, 'b': None, 'c': addr_C }. When [A] sends it back, the None entry for [B] is skipped, the message goes to [C], and ultimately you print Test[A][C] as expected. Notably, message.sendTo does not contain an address for (the failed) [B], so there is no attempt to send a message there, and therefore no PoisonMessage response.
    • Then the second iteration runs, and only [B] is None, so a single createActor for [B] is attempted, and then the send is attempted, but the [B] address in the sendTo for the message is still Local, so it is queued internally for the Handler.
    • The createActor for [B] fails, and the Handler is notified. This time the pending outgoing message contains a Local address for [B] instead of None, so when it is discovered that it is impossible to create an address for [B], the message cannot be delivered; it is wrapped in the PoisonMessage and handled as described above. Without a PoisonMessage handler (or an unrecognized message handler) for this message, it was just being dropped, and that is why it seemed as if your example simply hung.
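
The dictionary-sharing effect described above can be reproduced without Thespian. The following is only a minimal sketch: the placeholder strings stand in for real actor addresses, and the stripped-down Message class is a hypothetical stand-in for the one in actor_test.py. It shows that a message whose sendTo is assigned from the parts dictionary sees later updates to that dictionary, while a message given its own copy does not.

```python
from collections import OrderedDict

# Placeholder strings stand in for real Thespian addresses (assumption
# made for illustration; no actor system is involved here).
parts = OrderedDict([
    ("actor_test.A", "addr_A"),
    ("actor_test.B", "local_addr_B"),  # creation still pending
    ("actor_test.C", "addr_C"),
])


class Message(object):
    """Hypothetical, stripped-down stand-in for the Message class in actor_test.py."""

    def __init__(self, text):
        self.message = text
        self.sendTo = {}


# What the Handler does today: the message shares the Handler's dictionary.
shared = Message("Test")
shared.sendTo = parts

# Alternative: give the message its own snapshot of the routing table.
copied = Message("Test")
copied.sendTo = OrderedDict(parts)

# Simulate what receiveMsg_ChildActorExited does when creating [B] fails.
parts["actor_test.B"] = None

shared_b = shared.sendTo["actor_test.B"]
copied_b = copied.sendTo["actor_test.B"]

print(shared_b)  # None: the shared dictionary reflects the update
print(copied_b)  # local_addr_B: the snapshot still holds the old entry
```

Whether sharing or copying is preferable depends on the design: sharing is what let the first iteration skip [B], whereas a snapshot would make each message's routing independent of later changes in the Handler.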

Hopefully this helps and will be useful in your actual implementation. If you have more problems, or if I haven't explained the above very clearly, please let me know.

-Kevin

@KorbinianK

Thanks a lot again for the very detailed and useful explanation. It cleared up a few things and helped a great deal. I'm working towards a boilerplate handler and message for my implementation, and catching those messages will certainly be useful 👍 So glad this project still receives support from the developer; I'm having lots of fun with the actor system.
