
gocircuit channel's performance #19

Open · briantk1988 opened this issue Mar 4, 2015 · 25 comments
@briantk1988

I am thinking of using gocircuit's channel to build a message queue. Is the channel fast enough for this kind of thing? Thanks.

@petar (Member) commented Mar 4, 2015

The channel sends data across TCP pretty much unmodified. There is one small RPC protocol frame around each transmission. That's the only overhead. I don't know how efficient it is, specifically, but I expect it to be quite reasonable.
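For reference, here is roughly what pushing data through a circuit channel looks like from the Go client (the dial address, host ID, and anchor name below are made up, and the method shapes follow the gocircuit client package, so verify them against your version):

    package main

    import (
        "io"
        "strings"

        "github.com/gocircuit/circuit/client"
    )

    func main() {
        // Dial into the circuit; the address and nil auth key are illustrative.
        c := client.Dial("circuit://10.0.0.7:11022", nil)

        // Make a channel element with buffer capacity 3 at a made-up anchor.
        a := c.Walk([]string{"X88550014d4c82e4d", "queue"})
        ch, err := a.MakeChan(3)
        if err != nil {
            panic(err)
        }

        // Each Send returns a writer; the bytes travel over TCP wrapped in one RPC frame.
        w, err := ch.Send()
        if err != nil {
            panic(err)
        }
        io.Copy(w, strings.NewReader("hello"))
        w.Close()
    }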

@briantk1988 (Author)

I have two side questions:

  • What is the best way to use go's rpc in combination with gocircuit?
  • Do you think gocircuit is stable enough for production?

Thanks.

@petar (Member) commented Mar 5, 2015

Go's RPC and the circuit are unrelated.

The circuit hasn't had bug reports in a long while, so I believe it is production ready.


@briantk1988 (Author)

Since the circuit is a service-discovery framework, I think it's natural to ask how to use the two together. Since they are separate, I guess the only way is to get the IP address from the circuit (is this possible?) and then call RPC as usual. Is that correct?

Please correct me if I'm wrong (I'm very new to go).

Thanks.

@petar (Member) commented Mar 7, 2015

Soon there will be a tutorial on the site that answers your questions.

For now, see if this helps:

Say you want to launch a server process on host1 and a client process (that connects to the server) on host2. (These would be your RPC applications.)

Since you are the party launching both processes, you can arrange for them to find each other in many different ways.

For instance, when you start the server process you can remember, or just set, what IP address it listens on. Then, when you start the client process, you can simply tell it on startup where the server is.

If you are wondering how to discover the IP address of a particular circuit host: use the circuit to run a shell command on the target host which prints out its IP address. The shell command depends on the underlying system on your hosts. For example, if the host is OS X you could use:

ifconfig en0 | grep 'inet ' | awk '{print $2}'

Cheers

Petar
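In Go terms, that trick could look roughly like this, using the client package's MakeProc to run the command on the target host (the host ID and anchor name are made up; the API shape follows the gocircuit tutorials, so check the exact signatures):

    package main

    import (
        "fmt"
        "io/ioutil"

        "github.com/gocircuit/circuit/client"
    )

    func main() {
        c := client.Dial("circuit://10.0.0.7:11022", nil) // illustrative address

        // Run the IP-printing command on the target host, under a made-up anchor name.
        a := c.Walk([]string{"X88550014d4c82e4d", "whatismyip"})
        proc, err := a.MakeProc(client.Cmd{
            Path: "/bin/sh",
            Args: []string{"-c", "ifconfig en0 | grep 'inet ' | awk '{print $2}'"},
        })
        if err != nil {
            panic(err)
        }
        proc.Stdin().Close() // no input; closing stdin lets the process proceed

        out, _ := ioutil.ReadAll(proc.Stdout())
        proc.Wait() // reap the process
        fmt.Printf("host IP: %s", out)
    }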


@briantk1988 (Author)

This is very helpful! Thank you.

There's one thing that confuses me a bit in the phrase "when you start the server process you can remember, or just set, what IP address it listens on." To my understanding, most computers have one IP address, which is automatically also the IP address of the circuit server. However, when you start a circuit server, you always indicate the IP address (instead of just the port, for example). What is the reason behind this?

Thanks.

@petar (Member) commented Mar 7, 2015

In some datacenters, computers can have more than one IP.

If you are playing at home, you can just use 127.0.0.1.
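(For example, the daemon's listen address is given explicitly at startup; the -a flag below is from the circuit README of that era, so double-check it against your version:)

    circuit start -a 127.0.0.1:11022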


@briantk1988 (Author)

Thanks. I suppose that the IP address of each circuit host is stored somewhere. So, is it possible to retrieve it via some circuit API instead of running the shell command?

Another comment/question: currently, via the join/leave channel, we can easily detect when a new host has been added to the circuit. Is it possible to have the same thing for other objects inside the anchor path (processes, channels, etc.)? This would be quite convenient and useful!

@maymounkov

On 7 March 2015 at 23:04, briantk1988 [email protected] wrote:

> Thanks. I suppose that the IP address of each circuit host is stored somewhere. So, is it possible to retrieve it via some circuit API instead of running the shell command?

Yes. On the command line, "circuit peek /X1234" will give you a JSON object containing the circuit address. There is a corresponding Peek() method in the Go API.

> Another comment/question: currently, via the join/leave channel, we can easily detect when a new host has been added to the circuit. Is it possible to have the same thing for other objects inside the anchor path (processes, channels, etc.)?

It is possible to add this functionality. I would need a concrete use case to understand its generality, though.
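For the record, the command-line form looks like this (the host ID here is made up, and the exact JSON fields depend on the circuit version):

    circuit peek /X88550014d4c82e4d

The output is a JSON object describing the host, including its circuit://... address.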

@briantk1988 (Author)

Thanks for informing me about the Peek() API.

Re join/leave: join/leave for hosts facilitates host discovery, while join/leave for general elements in the anchor path would help facilitate service discovery. Of course, you can always do ls to list everything, and then do whatever you need with the (possibly huge) list whenever you want to see what has been added. But even for hosts, having a dedicated channel is already much more convenient (and probably that is why you implemented it for hosts). Moreover, in general, elements in the anchor path (representing services) have an undetermined life span, which might have nothing to do with the life span of their host.

Thanks.

@petar (Member) commented Mar 12, 2015

See inline:

On 9 March 2015 at 09:54, briantk1988 [email protected] wrote:

> Re join/leave: join/leave for hosts facilitates host discovery, while join/leave for general elements in the anchor path would help facilitate service discovery.

There is a philosophical issue here:

Hosts have to be discovered by the user software, because the software is unaware when the technician adds a new host to the system.

General elements are created by the user software, so why would the user software need to discover them if it was the reason for their creation in the first place? Shouldn't the software know where it places its things?

> Of course, you can always do ls to list everything, and then do whatever you need with the (possibly huge) list whenever you want to see what has been added.

Yes. Doing ls on everything is not expensive at all, even for gigantic clusters. Furthermore, a cloud application which is well designed from scratch will rarely need to "discover" things.

I generally think that the style of cloud programming where "discovering" services is a common operation is entirely flawed:

In a simple single-process program, does one ever lose objects one created oneself and need discovery mechanisms? Not really. So why would that be necessary in distributed programming?

> But even for hosts, having a dedicated channel is already much more convenient (and probably that is why you implemented it for hosts).

This was not added for convenience alone. The event "host added" is caused by an external party, and it is asynchronous from the point of view of the software. This is why notifications are a clean design choice.

Creation of processes, on the other hand, is not caused by external parties. It is caused by the software itself. So it is not clear why the software needs to notify itself that it did something.

Death of processes, on the other hand, is not caused by the software. But you can be notified of the death event of a process, using the specialized Wait command (in both the Go and the command-line API).

> Moreover, in general, elements in the anchor path (representing services) have an undetermined life span, which might have nothing to do with the life span of their host.

Same thing: the death of a process can be caught using the Wait command.

Cheers
Petar
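As a concrete illustration of catching a process death with Wait, a rough Go sketch (the anchor names and binary path are made up; the Wait signature follows the gocircuit client package, so treat it as an assumption to verify):

    import (
        "fmt"

        "github.com/gocircuit/circuit/client"
    )

    // waitForExit starts a server process at a made-up anchor and blocks on Wait.
    func waitForExit(c *client.Client) {
        a := c.Walk([]string{"X88550014d4c82e4d", "myservice"})
        proc, err := a.MakeProc(client.Cmd{Path: "/usr/bin/myserver"})
        if err != nil {
            panic(err)
        }
        proc.Stdin().Close() // no input for the server

        stat, err := proc.Wait() // blocks until the process dies
        if err != nil {
            panic(err) // e.g. the anchor was scrubbed concurrently
        }
        fmt.Println("process exited with:", stat.Exit)
    }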

@briantk1988 (Author)

Thanks for the very detailed response!

I guess I have something different in mind, which I will try to explain below. I would love to hear your opinion. And again, since I'm completely new to this area, your comments are truly appreciated, since I can learn a lot from them.

Re software/external-party-created general elements: suppose I have a cloud app that has two types of servers, say front and back. To start my app, I would run gocircuit on each node, join all of them to the circuit, and then run a particular process (say, a program written in Go using the gocircuit API) depending on whether the server is front or back. In this setting, the starting of each process is done by an external party (namely me, not the software), so it's asynchronous with respect to the software itself.

The first issue to address is how each server knows who is who in order to communicate (e.g., a front server might need to pick one back server for a certain task). One way to solve it is to add an extra name to the path for recognition purposes, with communication done via gocircuit's channels. When a server needs to communicate, it can do an ls and filter out the desired member(s). Alternatively, if there were a join/leave event for the creation/destruction of general elements in the anchor path, then the process running on each node could record it directly and use it later for communication.

Re performance of ls: someone as new to the field as me probably shouldn't try to create a huge system. But just for the fun of it, let me assume that I actually have a huge system, say a couple of million nodes. Then, if I use ls and filter out the desired node each time one node needs to communicate with another, that's quite an overhead. This overhead would be largely reduced if there were a join/leave system for general elements.

Question 1: maybe there's a better way to use ls in this scenario?

Question 2: more importantly, you seem to have something completely different in mind, since you said that a well-designed cloud app shouldn't be too dependent on discovery. I would love to hear how to redesign what I described above into something better.

Thanks.

@petar (Member) commented Mar 16, 2015

Inline:

On 12 March 2015 at 22:57, briantk1988 [email protected] wrote:

> Re software/external-party-created general elements: suppose I have a cloud app that has two types of servers, say front and back. [...] Alternatively, if there were a join/leave event for the creation/destruction of general elements in the anchor path, then the process running on each node could record it directly and use it later for communication.

I just put the first draft of the new circuit site online. Go to http://gocircuit.org and scroll to the bottom. There is a tutorial on creating a cloud app with two servers. See if this answers your question.

> Question 1: maybe there's a better way to use ls in this scenario?
>
> Question 2: more importantly, you seem to have something completely different in mind, since you said that a well-designed cloud app shouldn't be too dependent on discovery. I would love to hear how to redesign what I described above into something better.

I have a programming technique in mind which won't fit in an email. I plan to write documentation on it later, once I finish the more basic docs.

I can give you a short hint: the idea is that you make multiple circuit client applications; let's call them agents. Each of them is responsible for monitoring only a subset of hosts, so it has to ls only those hosts. A higher-level logic may assign new hosts to agents and create new agents as necessary. The same higher-level logic would also monitor the agents: if an agent dies, it creates a new one and reassigns the dead agent's duties to the new one.

P
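A rough Go sketch of that agent idea, under stated assumptions (View() as the listing call follows the gocircuit client API; the host-assignment bookkeeping and all names are made up):

    import (
        "fmt"

        "github.com/gocircuit/circuit/client"
    )

    // pollHosts is a hypothetical agent loop: it lists (ls) only the hosts
    // assigned to this agent and reports anchors it has not seen before.
    func pollHosts(c *client.Client, assigned []string, seen map[string]bool) {
        for _, host := range assigned {
            for name := range c.Walk([]string{host}).View() { // View() = ls on that host
                key := host + "/" + name
                if !seen[key] {
                    seen[key] = true
                    fmt.Println("new element:", key) // react to the new service here
                }
            }
        }
    }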

@briantk1988 (Author)

Thanks for the reply!

About the programming technique you have in mind: as far as I understand, the agents act as admins for the orchestration of the cloud. What I have in mind is essentially self-orchestration, which seems to be one of the goals of gocircuit. In this paradigm, each circuit host would be notified whenever a new service (process/channel etc.) or host becomes available, and each would use this info to keep a list of the services/hosts it wants to talk to. Of course, having channels for general elements' join/leave is not strictly necessary for this purpose, but it would be quite convenient and efficient.

What do you think?
Thanks!

@briantk1988 (Author)

Another quick comment: the tutorial on gocircuit.org seems geared toward using gocircuit to write an orchestration tool, while my comments above are about embedding gocircuit into the program logic.

@petar (Member) commented Mar 18, 2015

I think, more generally, you are asking for a general broadcasting mechanism, which can broadcast on different topics, and where every topic has a source host to which the new messages are pushed.

You would have something like:

    circuit make_topic /X123/topic_a
    circuit make_subscription /X789/subs_a /X123/topic_a
    circuit send /X123/topic_a MSG
    circuit recv /X789/subs_a

A broadcasting facility would be useful, and I plan to implement one eventually.

P


@ghost commented Mar 19, 2015

The new docs look great. Thanks, Petar.

I need to hook a message queue in. I have been using Kafka.

Any advice?

About self-orchestration: there is a cloud orchestration system that does this, Coopr by Cask. Its "how it works" page explains it. It's using set theory.

AST / CFG as compute graph: any plans for a reflection-based GUI tool? I developed a web GUI for parametric flow control, and I could hook into the circuit if an API for this were there. Maybe I missed it in the current API.

@briantk1988 (Author)

@petar A general mechanism for message broadcasting would be fantastic! A health check for a general element could then be implemented on top of that.

@gedw99 Thanks for letting me know about Coopr; I'll take a look at it. Also, by "hooking a message queue in," do you mean putting Kafka into the circuit, or implementing a message queue using the circuit?

@ghost commented Mar 19, 2015

Thank you.

Well, Kafka is very full-featured for fan-in/fan-out topologies, so it's probably a fair bit of work. But there is a very good message queue written in Go, with excellent performance and memory management, that might not need too much extra work to meet the various needs. For assessing the full picture of what a message queue needs to fulfil in this domain, I would recommend reading the excellent Samza docs. Kafka and Samza are linked inventions, if I recall correctly. When you read the two together, it's like a glove fitting a hand!

The message queue is https://github.com/bitly/nsq

Wondering what you think about the CFG reflection idea. I am a bit attached to it. It would allow the developer, as well as a user (think ETL and business reporting), to see the whole big picture, or to pick a start point and see just that flow. It would also provide a very nice tracing tool for metrics like back pressure. I would add that the next logical win would be for the system to detect back pressure and deploy more compute instances; in that case Kafka automatically adapts to the extra instances. After that, you could use particle swarm optimisation to train/tune the system and evolve toward the fastest perf, and/or the highest perf per watt. That's damn exciting.

@maymounkov

Inline:

On 19 March 2015 at 02:28, Ged [email protected] wrote:

> I need to hook a message queue in. I have been using Kafka.
>
> Any advice?

There's NSQ.io as well.

> AST / CFG as compute graph: any plans for a reflection-based GUI tool? I developed a web GUI for parametric flow control, and I could hook into the circuit if an API for this were there. Maybe I missed it in the current API.

I don't plan to make a GUI myself in the short term. The Go API should suffice for building a GUI, though.

@maymounkov

The tutorial just shows how to use the Go API. You can use the API from within your services as well, to start new services and so on.

@petar (Member) commented Mar 24, 2015

You could also build a DIY broadcasting service by implementing a broadcast node agent, and then using the circuit to run the agent everywhere and connect all the agents in a distribution tree.

This is akin to using the circuit to install something like etcd or NSQ.

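A minimal sketch of that deployment step, assuming the gocircuit client API and an agent binary already present on each host (the binary path, flag, port, and anchor name are entirely made up):

    import (
        "log"

        "github.com/gocircuit/circuit/client"
    )

    // deployAgents starts a hypothetical bcast-agent binary on every circuit host.
    func deployAgents(c *client.Client) {
        for hostName, host := range c.View() { // the host anchors /X...
            a := host.Walk([]string{"bcast-agent"})
            proc, err := a.MakeProc(client.Cmd{
                Path: "/usr/local/bin/bcast-agent", // assumed pre-installed on each host
                Args: []string{"-listen", ":7077"}, // made-up flag and port
            })
            if err != nil {
                log.Printf("could not start agent on %s: %v", hostName, err)
                continue
            }
            proc.Stdin().Close()
            // Connecting the agents into a distribution tree is the agents' own job.
        }
    }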

@briantk1988 (Author)

Hi @petar, I just browsed the source code. There seems to be some sort of pub/sub already set up, which is used to implement the join/leave for hosts. How much work would be needed to implement the subscription feature you mentioned above? It seems like not too much (though I don't understand the code very well yet).

In circuit/anchor/x.go, there's a function

    func (y YTerminal) Make(kind string, arg interface{}) (yelm interface{}, err error)

where you make some (RPC?) call

    r := y.X.Call("Make", kind, arg)

but I can't seem to find where "Make" is actually implemented. Could you point me to the right place? Also, is there any reason why you don't use Go's standard rpc library?

Thanks.

@petar (Member) commented Mar 29, 2015

On 28 March 2015 at 22:54, briantk1988 [email protected] wrote:

> Hi @petar, I just browsed the source code. There seems to be some sort of pub/sub already set up, which is used to implement the join/leave for hosts. How much work would be needed to implement the subscription feature you mentioned above? It seems like not too much (though I don't understand the code very well yet).

The join/leave system broadcasts all host events to all hosts, which is OK because these events are rare. But if you want pub/sub for all types of events, you would need to implement something more efficient that only sends events to the endpoints that are subscribed. So this needs a fresh implementation.

> In circuit/anchor/x.go, there's a function
>
>     func (y YTerminal) Make(kind string, arg interface{}) (yelm interface{}, err error)
>
> where you make some (RPC?) call
>
>     r := y.X.Call("Make", kind, arg)
>
> but I can't seem to find where "Make" is actually implemented. Could you point me to the right place? Also, is there any reason why you don't use Go's standard rpc library?

That's implemented in circuit/anchor/term.go.

My RPC library is substantially more powerful than the default one. Using my library you can pass interfaces (pointers) to other processes, for instance, and GC works correctly across workers. "My RPC library" is actually a separate "product". You can find just the RPC library here: github.com/gocircuit/core

It needs a documentation update. I'll try to do that this week.

P
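For reference, the existing host-level join/leave looks roughly like this through the Go client (MakeOnJoin is the subscription element per the gocircuit client package; the Consume return shape is an assumption worth checking):

    import (
        "fmt"

        "github.com/gocircuit/circuit/client"
    )

    // watchJoins subscribes to host-join events at a made-up anchor.
    func watchJoins(c *client.Client) {
        a := c.Walk([]string{"X88550014d4c82e4d", "onjoin"})
        sub, err := a.MakeOnJoin()
        if err != nil {
            panic(err)
        }
        for {
            v, ok := sub.Consume() // blocks; ok is false once the subscription closes
            if !ok {
                break
            }
            fmt.Println("host joined:", v)
        }
    }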

@briantk1988 (Author)

Thanks. Your RPC library sounds great! From what you said, it probably wouldn't take much work to add join/leave for general elements in the anchor path.
