Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Communication is not happening between AWS EC2 instances, stuck at find and offer packets stage only. #800

Open
aadarshkt opened this issue Oct 30, 2024 · 13 comments
Labels

Comments

@aadarshkt
Copy link

aadarshkt commented Oct 30, 2024

vSomeip Version

v3.5.1

Boost Version

1.74

Environment

Ubuntu 22.04, x86, AWS instance

Describe the bug

Following vSOMEIP in 10 minutes, to establish communication between two AWS EC2 instances.

Find and offer packets are both visible at client interface, but next step of the communication three-way handshake is not happening. Nor in the client logs SERVICE Available is visible.

Also things that I did -

  1. Used this command to add multicast address to kernel routing table. on both sides - route add -nv dev
  2. Added Multicast address to the Same multicast domain, in which two instances are lying on AWS console.
  3. Multicast offer packets are visible at client side, this proves multicasting is correct.
  4. Inbound and Outbound rules allow all IPv4 packets from all port range.

I have provided Client logs, Server logs, Client.json, Server.json, both src code and also packet capture.

Client IP is 172.31.88.228 and Server IP is 172.31.26.247. (Please check in pcap only offer and find packets from these IPs)

Client Log :
2024-10-30 07:17:47.299023 client [info] Using configuration file: "../client.json".
2024-10-30 07:17:47.299474 client [info] Parsed vsomeip configuration in 0ms
2024-10-30 07:17:47.299539 client [info] Configuration module loaded.
2024-10-30 07:17:47.299582 client [info] Security disabled!
2024-10-30 07:17:47.299713 client [info] Initializing vsomeip (3.5.1) application "client".
2024-10-30 07:17:47.299856 client [info] Instantiating routing manager [Host].
2024-10-30 07:17:47.300890 client [info] create_routing_root: Routing root @ /tmp/vsomeip-0
2024-10-30 07:17:47.301365 client [info] Service Discovery enabled. Trying to load module.
2024-10-30 07:17:47.302839 client [info] Service Discovery module loaded.
2024-10-30 07:17:47.303158 client [info] Application(client, 1313) is initialized (11, 100).
2024-10-30 07:17:47.303280 client [info] REQUEST(1313): [1965.1771:1.1]
2024-10-30 07:17:47.303731 client [info] create_local_server: Listening @ /tmp/vsomeip-1313
2024-10-30 07:17:47.303893 client [info] Starting vsomeip application "client" (1313) using 2 threads I/O nice 0
2024-10-30 07:17:47.304307 client [debug] Thread created. Number of active threads for client : 1
2024-10-30 07:17:47.304652 client [info] Client [1313] routes unicast:172.31.88.228, netmask:255.255.240.0
2024-10-30 07:17:47.304611 client [info] main dispatch thread id from application: 1313 (client) is: 79fb6e200640 TID: 4167
Service [1965.1771] is NOT available.
2024-10-30 07:17:47.305551 client [info] shutdown thread id from application: 1313 (client) is: 79fb6d800640 TID: 4168
2024-10-30 07:17:47.305903 client [info] Watchdog is disabled!
2024-10-30 07:17:47.306281 client [info] io thread id from application: 1313 (client) is: 79fb6f91ab80 TID: 4165
2024-10-30 07:17:47.306899 client [info] Network interface "eth0" state changed: up
2024-10-30 07:17:47.307233 client [info] vSomeIP 3.5.1 | (default)
2024-10-30 07:17:47.307409 client [info] Route "default route (0.0.0.0/0) if: eth0 gw: 172.31.80.1" state changed: up
2024-10-30 07:17:47.311367 client [debug] Joining to multicast group 224.244.224.245 from 172.31.88.228
2024-10-30 07:17:47.311535 client [info] SOME/IP routing ready.
2024-10-30 07:17:47.311787 client [warning] Route "224.244.224.245/32 if: eth0 gw: n/a" state changed: up
2024-10-30 07:17:47.312219 client [info] io thread id from application: 1313 (client) is: 79fb67e00640 TID: 4170
2024-10-30 07:17:47.312834 client [info] udp_server_endpoint_impl: SO_RCVBUF is: 212992 (1703936) local port:30491
2024-10-30 07:17:57.310545 client [info] vSomeIP 3.5.1 | (default)
2024-10-30 07:18:07.311756 client [info] vSomeIP 3.5.1 | (default)
2024-10-30 07:18:17.313237 client [info] vSomeIP 3.5.1 | (default)

Server Logs: (I edited some logging statements)

2024-10-30 07:18:18.410431 server [info] Using configuration file: "/home/ubuntu/vsomeip_server_minimal/server.json".
2024-10-30 07:18:18.410869 server [info] Parsed vsomeip configuration in 0ms
2024-10-30 07:18:18.410931 server [info] Configuration module loaded.
2024-10-30 07:18:18.411008 server [info] Security disabled!
2024-10-30 07:18:18.411039 server [info] Initializing vsomeip (3.5.1) application "server".
2024-10-30 07:18:18.411332 server [info] Instantiating routing manager [Host].
2024-10-30 07:18:18.412252 server [info] create_routing_root: Routing root @ /tmp/vsomeip-0
2024-10-30 07:18:18.413021 server [info] Service Discovery enabled. Trying to load module.
2024-10-30 07:18:18.414867 server [info] Service Discovery module loaded.
2024-10-30 07:18:18.415201 server [info] Application(server, 1212) is initialized (11, 100).
2024-10-30 07:18:18.415788 server [info] create_local_server: Listening @ /tmp/vsomeip-1212
2024-10-30 07:18:18.416007 server [info] OFFER(1212): [1965.1771:1.1] (true)
2024-10-30 07:18:18.416072 server [info] Starting vsomeip application "server" (1212) using 2 threads I/O nice 0
2024-10-30 07:18:18.416404 server [debug] Thread created. Number of active threads for server : 1
2024-10-30 07:18:18.416793 server [info] Client [1212] routes unicast:172.31.26.247, netmask:255.255.240.0
2024-10-30 07:18:18.417170 server [info] main dispatch thread id from application: 1212 (server) is: 7b81b62006c0 TID: 4709
2024-10-30 07:18:18.417881 server [info] Watchdog is disabled!
2024-10-30 07:18:18.417456 server [info] shutdown thread id from application: 1212 (server) is: 7b81b58006c0 TID: 4710
2024-10-30 07:18:18.418874 server [info] io thread id from application: 1212 (server) is: 7b81b79a0b80 TID: 4707
2024-10-30 07:18:18.419116 server [info] io thread id from application: 1212 (server) is: 7b81a7e006c0 TID: 4712
2024-10-30 07:18:18.422682 server [info] Network interface "enX0" state changed: up
2024-10-30 07:18:18.419497 server [info] vSomeIP 3.5.1 | (default)
2024-10-30 07:18:18.423133 server [info] Route "default route (0.0.0.0/0) if: enX0 gw: 172.31.16.1" state changed: up
2024-10-30 07:18:18.423646 server [debug] Joining to multicast group 224.244.224.245 from 172.31.26.247
2024-10-30 07:18:18.424200 server [info] SOME/IP routing ready.
2024-10-30 07:18:18.424486 server [warning] Route "224.244.224.245/32 if: enX0 gw: n/a" state changed: up
2024-10-30 07:18:18.424043 server [info] Hey udp_server_endpoint_implementation called first time
2024-10-30 07:18:18.425452 server [info] Hey udp_server_endpoint function called
2024-10-30 07:18:18.426019 server [info] udp_server_endpoint_impl: SO_RCVBUF is: 212992 (1703936) local port:30491
2024-10-30 07:18:28.424212 server [info] vSomeIP 3.5.1 | (default)
2024-10-30 07:18:38.425042 server [info] vSomeIP 3.5.1 | (default)
2024-10-30 07:18:48.427056 server [info] vSomeIP 3.5.1 | (default)
2024-10-30 07:18:58.427942 server [info] vSomeIP 3.5.1 | (default)

Client JSON :

{
    "unicast" : "172.31.88.228",
    "netmask" : "255.255.240.0",
    "logging" :
    {
        "level" : "debug",
        "console" : "true",
        "file" : { "enable" : "false", "path" : "/tmp/vsomeip.log" },
        "dlt" : "false"
    },
    "applications" : 
    [
        {
            "name" : "client",
            "id" : "0x1313"
        } 
    ],
    "routing" : "client",
    "service-discovery" :
    {
        "enable" : "true",
        "multicast" : "224.244.224.245",
        "port" : "30491",
        "protocol" : "udp",
        "initial_delay_min" : "10",
        "initial_delay_max" : "100",
        "repetitions_base_delay" : "200",
        "repetitions_max" : "3",
        "ttl" : "3",
        "cyclic_offer_delay" : "2000",
        "request_response_delay" : "1500"
    }
    
}

Server JSON :

{
    "unicast" : "172.31.26.247",
    "netmask" : "255.255.240.0",
    "logging" :
    { 
        "level" : "debug",
        "console" : "true",
        "file" : { "enable" : "false", "path" : "/tmp/vsomeip_server.log" },
        "dlt" : "false"
    },
    "applications" : 
    [
        {
            "name" : "server",
            "id" : "0x1212"
        }
    ],
    "services" :
    [
        {
            "service" : "0x1965",
            "instance" : "0x1771",
            "unreliable" : "30509"
        }
    ],
    "routing" : "server",
    "service-discovery" :
    {
        "enable" : "true",
        "multicast" : "224.244.224.245",
        "port" : "30491",
        "protocol" : "udp",
        "initial_delay_min" : "10",
        "initial_delay_max" : "100",
        "repetitions_base_delay" : "200",
        "repetitions_max" : "3",
        "ttl" : "3",
        "cyclic_offer_delay" : "2000",
        "request_response_delay" : "1500"
    }
}

Client Code :

#include <iomanip>
#include <iostream>
#include <sstream>

#include <condition_variable>
#include <thread>
#include <chrono>
#include <unistd.h>

#include <vsomeip/vsomeip.hpp>

#define SAMPLE_SERVICE_ID 0x1965
#define SAMPLE_INSTANCE_ID 0x1771

std::shared_ptr<vsomeip::application> app;

void on_availability(vsomeip::service_t _service, vsomeip::instance_t _instance, bool _is_available) {
    std::cout << "Service ["
            << std::setw(4) << std::setfill('0') << std::hex << _service << "." << _instance
            << "] is " << (_is_available ? "available." : "NOT available.")  << std::endl;
}

int main()
{
    app = vsomeip::runtime::get()->create_application("client");
    app->init();
    app->register_availability_handler(SAMPLE_SERVICE_ID, SAMPLE_INSTANCE_ID, on_availability);
    app->request_service(SAMPLE_SERVICE_ID, SAMPLE_INSTANCE_ID, 1, 1);
    app->start();
}

Server Code :

#include <iomanip>
#include <iostream>
#include <sstream>

#include <vsomeip/vsomeip.hpp>

#define SAMPLE_SERVICE_ID 0x1965
#define SAMPLE_INSTANCE_ID 0x1771

std::shared_ptr<vsomeip::application> app;

int main() {

    app = vsomeip::runtime::get()->create_application("server");
    app->init();
    app->offer_service(SAMPLE_SERVICE_ID, SAMPLE_INSTANCE_ID, 1, 1); // only offers service, not events.
    app->start();
}

vsomeip_minimal_2.zip

Reproduction Steps

Use the code and JSON files with your IPs to reproduce the same behavior on AWS EC2 instance.

Expected behaviour

Three-way handshake should happen with SOME/IP packets visible.

Logs and Screenshots

Already attached.

@aadarshkt aadarshkt added the bug label Oct 30, 2024
@aadarshkt aadarshkt changed the title [BUG]: Communication is not happening between AWS EC2 instances, stuck at three-way handshake. [BUG]: Communication is not happening between AWS EC2 instances, stuck at find and offer packets stage only. Oct 30, 2024
@KareemLMR
Copy link

The fact that only multicast messages are visible between the 2 sides show that there might be an issue with the unicast network configuration. Are you able to ping server from client and client from server. Because as I think from the subnet mask and unicast IPs shared in the logs you attached doesn't belong to the same network and I don't think they will ping each other and unicast traffic from both sides will be lost and won't reach the other side:
Netmask: 11111111.11111111.11110000.00000000
Client IP: 10101100.00011111.01011000.11100100
Server IP: 10101100.00011111.00010000.00000000

@aadarshkt
Copy link
Author

aadarshkt commented Nov 1, 2024

Hi, @KareemLMR,

Both AWS instances are pingable from each other. Also, I guess netmask is correct, as the IP addresses are /20. I have taken netmask from ifconfig command. We can also remove the netmask option, as it is not there in example config file -> vsomeip-tcp-service.json

Netmask image

Currently I am using most minimal C++ code used to check only service available.

@KareemLMR
Copy link

I got you are just trying to test handshake without calling methods or subscription to events for now. I checked the wireshark trace you attached. It shows client is trying to FIND but it didn't see the OFFER from the server for that it repeated the FIND multiple times. This means either the OFFER packets didn't make their way completely to the client or there is something (security layer) blocking them from reaching client socket. So to debug that I wanted to ask you. You are building the vsomeip source code on your own right? not getting it as a binary from somewhere.
If so, you will find a file called service_discovery_impl.cpp in implementation/service_discovery/src. Inside it you will find this disable block of code by #if 0:
image

Remove the #if 0 and endif, then rebuild both client and server and let's check the logs again. If you saw packets being logged then there are no network issues and it is an issue somewhere else, but if you didn't see any packets being logged then it is for sure a network issue which I expect to be the case so let's check.

@KareemLMR
Copy link

I run vsomeip on normal Linux machines before, I didn't run it on AWS EC2 Instances before but when I searched online, I found many docs and websites saying that multicast handling is not supported and to make it work it is not straight forward as normal Linux Machines. Here is one of them:
https://stackoverflow.com/questions/40740997/how-to-multicast-between-different-ec2-instances-using-java-multicastsocket

If you want to check you can either try my test shared above or implement a simple multicast C++ Socket app or simply listen by tcpdump on both sides while running your apps.

@aadarshkt
Copy link
Author

I did the change for service_discovery_impl.cpp file for both client and server. Recompiled it. But no packet data was visible. Same logs are there.
client_side_change

I used a simple multicast socket C++ program. I can send and receive multicast packets with client and server on both different instances. I have attached the tcpdump pcap files for multicast packets. (check with ip.dst == 224.244.224.245)

multicast_server.zip

@KareemLMR
Copy link

Hello @aadarshkt
It is great that we checked that the issue is caused because the OFFER Messages really are not read by the client side so it doesn't know that service is available and also FIND Messages also are not read by the server and that's the cause of the behavior.

For the server client multicast test app you said you are able to send and receive multicast packet. Does that mean you are printing the received messages same as we did on vsomeip src code or you just checked tcpdump trace?
By the way also I checked the pcap logs file but I didn't see packets going from unicast IP to multicast in any side.

@KareemLMR
Copy link

This is the only packet I saw in the trace
image
But anyway as I told you the tcpdump only shows low layer reception of the packet. It is not enough to check that there are packets on tcpdump trace. It is in fact the same behavior that happened with vsomeip. We need to log the received packets on console or in a file same as what we did with vsomeip app.

@aadarshkt
Copy link
Author

Yes, getting in console at application layer.

Server
Client

Client code,

#include <iostream>
#include <cstring>
#include <arpa/inet.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
 
#define MULTICAST_GROUP "224.244.224.245"  // Multicast group IP address
#define MULTICAST_PORT 12345         // Port number for multicast
 
void sendMulticastPacket(const char* message) {
    int sock;
    struct sockaddr_in multicastAddr;
 
    // Create a UDP socket
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("Socket creation failed");
        return;
    }
 
    // Set up the multicast address
    memset(&multicastAddr, 0, sizeof(multicastAddr));
    multicastAddr.sin_family = AF_INET;
    multicastAddr.sin_addr.s_addr = inet_addr(MULTICAST_GROUP);
    multicastAddr.sin_port = htons(MULTICAST_PORT);
 
    // Send the multicast packet
    if (sendto(sock, message, strlen(message), 0, (struct sockaddr*)&multicastAddr, sizeof(multicastAddr)) < 0) {
        perror("Send failed");
    } else {
        std::cout << "Multicast packet sent: " << message << std::endl;
    }
 
    close(sock);
}
 
int main() {
    // Send a test message
    sendMulticastPacket("Hello, multicast world from sender!");
    return 0;
}

Server Code,

#include <iostream>
#include <cstring>
#include <arpa/inet.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
 
#define MULTICAST_GROUP "224.244.224.245"  // Multicast group IP address
#define MULTICAST_PORT 12345         // Port number for multicast
#define BUFFER_SIZE 1024             // Buffer size for received data
 
void receiveMulticastPackets() {
    int sock;
    struct sockaddr_in localAddr;
    struct ip_mreq group;
    char buffer[BUFFER_SIZE];
 
    // Create a UDP socket
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) {
        perror("Socket creation failed");
        return;
    }
 
    // Allow multiple sockets to use the same port
    int reuse = 1;
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, (char*)&reuse, sizeof(reuse)) < 0) {
        perror("Setting SO_REUSEADDR failed");
        close(sock);
        return;
    }
 
    // Bind to the local address
    memset(&localAddr, 0, sizeof(localAddr));
    localAddr.sin_family = AF_INET;
    localAddr.sin_addr.s_addr = htonl(INADDR_ANY);  // Accept any local address
    localAddr.sin_port = htons(MULTICAST_PORT);
 
    if (bind(sock, (struct sockaddr*)&localAddr, sizeof(localAddr)) < 0) {
        perror("Bind failed");
        close(sock);
        return;
    }
 
    // Join the multicast group
    group.imr_multiaddr.s_addr = inet_addr(MULTICAST_GROUP);
    group.imr_interface.s_addr = htonl(INADDR_ANY);
 
    if (setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, (char*)&group, sizeof(group)) < 0) {
        perror("Adding multicast group failed");
        close(sock);
        return;
    }
 
    std::cout << "Listening for multicast packets on group " << MULTICAST_GROUP << "..." << std::endl;
 
    while (true) {
        // Receive multicast packets
        int bytesRead = recvfrom(sock, buffer, BUFFER_SIZE, 0, nullptr, nullptr);
        if (bytesRead < 0) {
            perror("Receive failed");
            break;
        }
 
        buffer[bytesRead] = '\0';
        std::cout << "Received multicast packet: " << buffer << std::endl;
    }
 
    // Leave the multicast group
    setsockopt(sock, IPPROTO_IP, IP_DROP_MEMBERSHIP, (char*)&group, sizeof(group));
    close(sock);
}
 
int main() {
    receiveMulticastPackets();
    return 0;
}

Code Source - Lllama 3.1

@aadarshkt
Copy link
Author

@KareemLMR ,

Also, I noticed that this service_discovery_impl function is not called, as seen in the logs after rebuilding. Is there any way to know what function my code is stuck at what next function should be called. Simply where it is failing.

@KareemLMR
Copy link

Hello @aadarshkt,
There is a way yes to check. You can add logs to these functions:

  1. Inside implementation/endpoints/src: you have this function:
    image
    Add logs here to determine whether it registers callback for multicast messages or not. This is the endpoint of the vsomeip so this can tell us what happens exactly when we receive the multicast packet or if we don't receive at all.

  2. Inside the callback function on_multicast_received in the same file:
    image

  3. Inside the packet processing function on_message_received in the same file uncomment the #if 0 endif same as what we did with SD to log all packets received:
    image

  4. Finally inside implementation/routing/src uncomment #if 0 endif in this function:
    image

It is the one that calls service_discovery module to parse SD messages where we tried to log before.

@KareemLMR
Copy link

One more question I wanted to ask you about:
After building, what do you push?
Remember you need to push libvsomeip3-sd.so.VERSION for any change in service_discovery and libvsomeip3.so.VERSION for any change in routing manager or endpoints but to guarantee your change Push all the dynamic linked libraries out from the build.

@aadarshkt
Copy link
Author

You are talking about the files inside install_folder/lib?

folder_structure

Yes, when I am rebuilding it, I am deleting older install files and these install files are somewhere else apart from build folder of vsomeip.

I am using cmake -DCMAKE_INSTALL_PREFIX:PATH=$YOUR_PATH .. command.

@KareemLMR
Copy link

Ok that's great. Good that you push everything. Try these changes and tell me the result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants