JGroups - A Toolkit for Reliable Multicast Communication


Frequently Asked Questions

  1. Does JChannel have a limit on the size of messages sent over it ?
  2. Is it time consuming to port all of our code from JChannel to EnsChannel?
  3. I receive an error message when starting EnsChannel 'Some IO garbage at ensemble outboard startup'. What does it mean ?
  4. When 2 EnsChannels start up, they do not seem to 'find' each other (they don't form a process group).
  5. How can I tunnel a firewall ?
  6. What happens if I pull the plug on 2 members at the same time ?
  7. Why do Windows (Win2000/WinXP) members behave weird when I pull the plug ?
  8. Why does my process bind to 127.0.0.1 (loopback interface) instead of the regular interface ?

Q: Does JChannel have a limit on the size of messages sent over it ?

  • We are using JChannel. When reading images (*.gif), JChannel seems to have a size limit. We have no problem showing tiny images, but we cannot show large ones. The error message says that UDP has a limited size.
  • I have a question concerning Java Groups. I am using JChannel to send packets, using channel.receive(0), etc. Have you ever experienced massive loss of packets? From what I understand, Java Groups runs on top of UDP, which is really unreliable. That means we will experience loss of packets. Thanks.

A:

UDP limits the size of a single datagram, which is the cause of the problem encountered. To overcome it, use the FRAG layer, which fragments larger messages into smaller ones and reassembles them on the receiver side. FRAG is simply placed on top of the UDP layer in the protocol stack. The default protocol stack for JGroups already uses FRAG; have a look at default.xml.
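The idea of fragmenting and reassembling can be illustrated with a minimal sketch. This is not the JGroups FRAG implementation: the class name, the method names and the 8192-byte fragment size are assumptions chosen for illustration. It also ignores what a real layer has to handle, such as headers, reordering and lost fragments.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: NOT the JGroups FRAG layer, just the basic idea.
class FragSketch {

    // Split a payload into chunks of at most fragSize bytes.
    static List<byte[]> fragment(byte[] payload, int fragSize) {
        if (fragSize <= 0) {
            throw new IllegalArgumentException("fragSize must be positive");
        }
        List<byte[]> fragments = new ArrayList<>();
        for (int off = 0; off < payload.length; off += fragSize) {
            int end = Math.min(off + fragSize, payload.length);
            fragments.add(Arrays.copyOfRange(payload, off, end));
        }
        return fragments;
    }

    // Concatenate the fragments, in order, to rebuild the payload.
    static byte[] defragment(List<byte[]> fragments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] f : fragments) {
            out.write(f, 0, f.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] image = new byte[200_000];            // stand-in for a large GIF
        List<byte[]> parts = fragment(image, 8192);  // hypothetical fragment size
        byte[] rebuilt = defragment(parts);
        System.out.println(parts.size() + " fragments, intact="
                + Arrays.equals(image, rebuilt));
    }
}
```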

Q: Is it time consuming to port all of our code from JChannel to EnsChannel?

A:

Absolutely not. Applications should be written against the Channel abstract class. An actual implementation might for example be JChannel, EnsChannel or IbusChannel. You can parameterize an application to choose the desired stack subclass when started. Applications may also use instances of each type of stack in the same application.
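A rough sketch of what "parameterizing the application" could look like follows. The types here (SketchChannel and its two subclasses) are hypothetical stand-ins and do not reproduce the real Channel, JChannel or EnsChannel API. Only the pattern is the point: application code depends on the abstract type, and the concrete subclass is picked at startup.

```java
// Hypothetical stand-ins; the real JGroups Channel API is NOT reproduced here.
class ChannelChooser {

    // Pick the concrete channel by name, e.g. from a command-line argument.
    static SketchChannel create(String kind) {
        switch (kind) {
            case "jchannel":
                return new SketchJChannel();
            case "ens":
                return new SketchEnsChannel();
            default:
                throw new IllegalArgumentException("unknown channel kind: " + kind);
        }
    }

    public static void main(String[] args) {
        String kind = args.length > 0 ? args[0] : "jchannel";
        SketchChannel channel = create(kind);  // the rest of the code sees only the abstract type
        System.out.println(channel.send("hello"));
    }
}

abstract class SketchChannel {
    abstract String send(String msg);
}

class SketchJChannel extends SketchChannel {
    String send(String msg) { return "JChannel:" + msg; }
}

class SketchEnsChannel extends SketchChannel {
    String send(String msg) { return "EnsChannel:" + msg; }
}
```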

Q: I receive an error message when starting EnsChannel 'Some IO garbage at ensemble outboard startup'. What does it mean ?

The exact error message is:
Waiting for the outboard process to start
java.net.ConnectException: Connection refused
Some IO garbage at ensemble outboard startup.

A:

When EnsChannel starts up, it spawns the outboard executable and then tries to connect to it via a socket. To ensure that outboard has enough time to start and initialize, EnsChannel waits 2.5 seconds before it tries to connect. There are 2 main problems that cause the above error message:
  1. The outboard executable cannot be found. Make sure it is in the PATH.
  2. 2.5 seconds may be too short for outboard to start up. Therefore EnsChannel cannot connect correctly to outboard via socket. The timeout can be increased by changing file JGroups/Ensemble/Hot_Ensemble.java (look for sleep(2500)).
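As an aside, a fixed sleep is only one way to wait for a slow process. The sketch below shows the general alternative of retrying the connection a few times. It is not what Hot_Ensemble.java does; the class name, the default port and the retry parameters are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative only: a generic connect-with-retry helper, not EnsChannel code.
class ConnectRetrySketch {

    // Try to connect up to maxAttempts times, sleeping delayMillis after each failure.
    static Socket connectWithRetry(String host, int port, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return socket;
            } catch (IOException e) {
                last = e;
                try { socket.close(); } catch (IOException ignored) { }
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last != null ? last : new IOException("no connection attempts made");
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9999;  // hypothetical port
        try (Socket s = connectWithRetry(host, port, 5, 500)) {
            System.out.println("connected to " + s.getRemoteSocketAddress());
        } catch (IOException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```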

Q: When 2 EnsChannels start up, they do not seem to 'find' each other (they don't form a process group).

A:

There is probably no gossip daemon running. Refer to the Ensemble documentation on how to start it. Also, check that the ENS_* environment variables have been set correctly.

Q: How can I tunnel a firewall ?

A:

There are 2 components involved: a gossip daemon and a router.
  1. The gossip daemon is used to register channels, and keep track of channels and groups. Channels periodically register with the gossip daemon. When a registration from a channel hasn't been received for a certain period of time (10 secs), the channel is dropped. New channels query the gossip daemon for the initial membership. The gossip daemon is used when IP multicast is disabled. Otherwise, a new channel would ping a well-known IP mcast address to find the initial membership.
  2. The router is used to tunnel traffic through a firewall using TCP. Your stack has to contain a TUNNEL layer at the bottom, instead of a UDP layer. TUNNEL establishes a TCP connection with JRouter, and sends outgoing packets over that connection, and receives incoming packets.
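The register-and-expire behaviour described for the gossip daemon in item 1 can be pictured with a toy registry. This is a sketch under assumptions, not the GossipServer code: the class and method names are invented, time is passed in explicitly so the example is deterministic, and only the 10-second expiry is taken from the text above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy registry, NOT the JGroups GossipServer.
class GossipRegistrySketch {
    static final long EXPIRY_MILLIS = 10_000;  // "10 secs" from the description

    // group name -> (channel address -> time of last registration)
    private final Map<String, Map<String, Long>> groups = new HashMap<>();

    // Called periodically by each channel to (re-)register itself.
    void register(String group, String channel, long nowMillis) {
        groups.computeIfAbsent(group, g -> new HashMap<>()).put(channel, nowMillis);
    }

    // Called by a new channel to learn the initial membership;
    // registrations older than EXPIRY_MILLIS are dropped first.
    List<String> members(String group, long nowMillis) {
        List<String> live = new ArrayList<>();
        Map<String, Long> registrations = groups.get(group);
        if (registrations == null) {
            return live;
        }
        registrations.values().removeIf(t -> nowMillis - t > EXPIRY_MILLIS);
        live.addAll(registrations.keySet());
        Collections.sort(live);
        return live;
    }

    public static void main(String[] args) {
        GossipRegistrySketch registry = new GossipRegistrySketch();
        registry.register("DrawGroup", "hostA:7000", 0);
        registry.register("DrawGroup", "hostB:7000", 5_000);
        System.out.println(registry.members("DrawGroup", 9_000));
        System.out.println(registry.members("DrawGroup", 12_000));
    }
}
```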
In your case, I would use both the gossip daemon and the router. You would start the components in the following order:
  1. Start gossip daemon: JGroups/JavaStack/GossipServer (starts on port 12001 by default)
  2. Start JRouter: JGroups/JavaStack/JRouter (starts on port 12002 by default)
  3. Create your channel: new JChannel
The channel properties in this case have to be defined as follows:
"TUNNEL(router_host=janet.cs.cornell.edu;router_port=12002):" +
"PING(gossip_host=janet.cs.cornell.edu;gossip_port=12001):FD:GMS";
'janet.cs.cornell.edu' would have to be replaced by the hostname on which you run gossip and JRouter. When starting a new channel, you would see messages in both the gossip server's window, and the JRouter. These messages would tell you what happens.
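To avoid editing the hostname in two places, the property string shown above can be assembled from its parts. The helper below is only a convenience sketch; its class and method names are made up, and it assumes that the gossip daemon and JRouter run on the same host, as in the example.

```java
// Illustrative only: builds the TUNNEL/PING property string from the FAQ answer.
class TunnelPropsSketch {

    // Assumes the gossip daemon and JRouter run on the same host.
    static String tunnelProps(String host, int routerPort, int gossipPort) {
        return "TUNNEL(router_host=" + host + ";router_port=" + routerPort + "):"
             + "PING(gossip_host=" + host + ";gossip_port=" + gossipPort + "):FD:GMS";
    }

    public static void main(String[] args) {
        // 12002 and 12001 are the default ports mentioned in the answer.
        String props = tunnelProps("janet.cs.cornell.edu", 12002, 12001);
        System.out.println(props);
        // The resulting string is what you would use as the channel properties
        // when creating the JChannel.
    }
}
```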

Q: What happens if I pull the plug on 2 members at the same time ?

A:

This should not be a problem as long as there are group members around. If you pull the plug on 2 participants P1 and P2 (neither of them is the coordinator), then the coordinator will mcast 2 new views: V1 excludes P1 and V2 excludes P2 (or P2 and then P1, depending on which member is suspected first). In any case, there will not be a view which excludes P1 and P2 at the same time. A view that excludes several members at once can only happen when you do the following: create a group with ca. 7 members (P0 - P6). Kill P4 and P5 and immediately afterwards make P6 leave regularly (e.g. press 'leave' on Draw). The leave protocol tries to flush all pending mcasts and therefore sends a FLUSH to all members including P6. However, while doing so, it detects that both P4 and P5 have failed. Therefore it excludes them dynamically from the FLUSH destinations, so the FLUSH is only sent to P0-P3 and P6. This results in a view that excludes 3 members at the same time (P4,P5,P6).
If you pull the plug on both the coordinator and a participant, the following happens (depending on who is suspected first): if it is the coordinator, another member will take over. Then the participant is suspected. The new coordinator will then exclude the participant and mcast a new view. If the participant is suspected first, since there is no coordinator to handle this, the SUSPECT events regarding the participant will go unheard. However, the SUSPECT events regarding the coordinator will be handled and a new coordinator will be elected. Only then will the SUSPECT events regarding the participant be handled (by the new coordinator) and the participant will be excluded. It gets a bit trickier if the failed participant is the one who would take over the coordinator role. This just takes a bit longer, but still works correctly.
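The ordering of views described above can be traced with a toy model. It is a sketch under stated assumptions, not the JGroups membership protocol: it assumes that the first member of the current view acts as coordinator (the answer does not say how the new coordinator is chosen), that every suspected member has really failed and appears only once, and that a SUSPECT event about a participant goes unheard while the current coordinator is itself dead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustrative only: a toy model of view changes, NOT the JGroups GMS protocol.
class ViewSketch {

    // Returns the views installed while the suspicions are processed in order.
    // Assumption: the first member of the current view is the coordinator.
    static List<List<String>> installViews(List<String> initialView, List<String> suspects) {
        List<String> view = new ArrayList<>(initialView);
        List<List<String>> installed = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>(suspects);
        Deque<String> unheard = new ArrayDeque<>();
        while (!pending.isEmpty()) {
            String suspect = pending.poll();
            String coordinator = view.get(0);
            boolean coordinatorFailed = suspects.contains(coordinator);
            if (coordinatorFailed && !suspect.equals(coordinator)) {
                unheard.add(suspect);  // no live coordinator to handle this yet
                continue;
            }
            view.remove(suspect);                  // exclude exactly one member
            installed.add(new ArrayList<>(view));  // and install a new view
            // The (possibly new) coordinator now handles the deferred suspicions,
            // in their original order.
            while (!unheard.isEmpty()) {
                pending.addFirst(unheard.pollLast());
            }
        }
        return installed;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("P0", "P1", "P2", "P3");
        // Two participants fail: two views, each excluding one member.
        System.out.println(installViews(group, Arrays.asList("P1", "P2")));
        // Coordinator P0 and participant P2 fail, participant suspected first.
        System.out.println(installViews(group, Arrays.asList("P2", "P0")));
    }
}
```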

Q: Why do Windows (Win2000/WinXP) members behave weird when I pull the plug ?

A:

Some newer Windows systems (confirmed for Win2000 and XP) have a feature called Media Sense, which removes the NIC (similar to unplumbing on Solaris) when it detects that the plug has been pulled. This causes some weird behavior, e.g. a member cannot exclude others because it is no longer able to receive even its own multicasts. A workaround is to disable Media Sense; instructions can be found at Microsoft's web site.

Q: Why does my process bind to 127.0.0.1 (loopback interface) instead of the regular interface ?

A:

This happens if your hostname is in /etc/hosts and shares the line with the localhost entry, like this:
127.0.0.1 localhost.localdomain localhost radioactiveman
In that case the call in UDP.java to get the local address will return 127.0.0.1.
Adding a separate line to /etc/hosts for the host fixes the problem.
(contributed by nate@storeperform.com)
Note that even if you have this line, you can still force a transport to use a specific interface by using the bind_addr parameter of the UDP protocol (in the protocol spec), e.g.
UDP(...;bind_addr=radioactiveman;...):...
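A quick way to check whether a machine is affected is to ask Java what the local host resolves to. The sketch below assumes that the lookup in question behaves like the standard InetAddress.getLocalHost() call; the answer above does not say exactly which call UDP.java makes, and the class name is made up.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative only: checks what the local hostname resolves to.
class LocalAddressCheck {

    static boolean isLoopback(InetAddress address) {
        return address.isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumption: this mirrors the kind of lookup a transport would do.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("local host resolves to " + local.getHostAddress());
        if (isLoopback(local)) {
            System.out.println("warning: hostname maps to the loopback interface; "
                    + "check /etc/hosts or set bind_addr");
        }
    }
}
```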



Copyright © 2002-2008, Bela Ban