[c++] What is the correct way of reading from a TCP socket in C/C++?

Here's my code:

// Not all headers are relevant to the code snippet.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

char *buffer;
stringstream readStream;
bool readData = true;

while (readData)
{
    cout << "Receiving chunk... ";

    // Read a bit at a time, eventually "end" string will be received.
    bzero(buffer, BUFFER_SIZE);
    int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
    if (readResult < 0)
    {
        THROW_VIMRID_EX("Could not read from socket.");
    }

    // Concatenate the received data to the existing data.
    readStream << buffer;

    // Continue reading while end is not found.
    readData = readStream.str().find("end;") == string::npos;

    cout << "Done (length: " << readStream.str().length() << ")" << endl;
}

It's a little bit of C and C++ as you can tell. The BUFFER_SIZE is 256 - should I just increase the size? If so, what to? Does it matter?

I know that if "end" is not received for whatever reason, this will be an endless loop, which is bad. If you could suggest a better way, please do.

This question is related to c++ c tcp

The answer is


1) Others (especially dirkgently) have noted that buffer needs to be allocated some memory space. For smallish values of N (say, N <= 4096), you can also allocate it on the stack:

#define BUFFER_SIZE 4096
char buffer[BUFFER_SIZE];

This saves you the worry of ensuring that you delete[] the buffer should an exception be thrown.

But remember that stacks are finite in size (so are heaps, but stacks are finiter), so you don't want to put too much there.

2) On a -1 return value, you should not simply return immediately (throwing an exception immediately is even sketchier). There are certain normal conditions that you need to handle if your code is to be anything more than a short homework assignment. For example, errno may be set to EAGAIN if no data is currently available on a non-blocking socket, or to EINTR if a signal interrupted the call. Have a look at the man page for read(2).
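To make point 2 concrete, here is a minimal sketch of a single read that separates the "normal" outcomes from real failures. The ReadStatus enum and the readSome function are hypothetical names invented for this illustration; only read(2) and the errno values come from the system.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result codes, named here only for illustration.
enum class ReadStatus { Data, Closed, WouldBlock, Error };

// Performs one read from fd, retrying if a signal interrupts the call.
// On ReadStatus::Data, *bytesRead holds the number of bytes stored in buf.
ReadStatus readSome(int fd, char *buf, std::size_t capacity, std::size_t *bytesRead)
{
    assert(buf != nullptr && bytesRead != nullptr && capacity > 0);
    *bytesRead = 0;
    for (;;) {
        ssize_t n = read(fd, buf, capacity);
        if (n > 0) {
            *bytesRead = static_cast<std::size_t>(n);
            return ReadStatus::Data;
        }
        if (n == 0)
            return ReadStatus::Closed;      // peer closed the connection
        if (errno == EINTR)
            continue;                       // interrupted by a signal: retry
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return ReadStatus::WouldBlock;  // non-blocking socket, nothing to read yet
        return ReadStatus::Error;           // genuine failure; errno says why
    }
}
```

The caller can then decide what each status means for it: WouldBlock usually means "go back to select() or poll()", while only Error warrants something like the exception in the question.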


Just to add to things from several of the posts above:

read() (at least on my system) returns ssize_t. This is like size_t, except that it is signed. On my system, it's a long, not an int. You might get compiler warnings if you use int, depending on your system, your compiler, and which warnings you have turned on.


This is an article that I always refer to when working with sockets:

THE WORLD OF SELECT()

It will show you how to reliably use 'select()' and contains some other useful links at the bottom for further info on sockets.


For any non-trivial application (i.e. one that must receive and handle different kinds of messages with different lengths), the solution to your particular problem isn't necessarily just a programming solution. It's a convention, i.e. a protocol.

In order to determine how many bytes you should pass to your read call, you should establish a common prefix, or header, that your application receives. That way, when a socket first has reads available, you can make decisions about what to expect.

A binary example might look like this:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>

enum MessageType {
    MESSAGE_FOO,
    MESSAGE_BAR,
};

struct MessageHeader {
    uint32_t type;
    uint32_t length;
};

/**
 * Attempts to continue reading a `socket` until `bytes` number
 * of bytes are read. Returns truthy on success, falsy on failure.
 *
 * Similar to @grieve's ReadXBytes.
 */
int readExpected(int socket, void *destination, size_t bytes)
{
    /*
    * Can't increment a void pointer, as incrementing
    * is done by the width of the pointed-to type -
    * and void doesn't have a width
    *
    * You can in GCC but it's not very portable
    */
    char *destinationBytes = destination;
    while (bytes) {
        ssize_t readBytes = read(socket, destinationBytes, bytes);
        if (readBytes < 1)
            return 0;
        destinationBytes += readBytes;
        bytes -= readBytes;
    }
    return 1;
}

int main(int argc, char **argv)
{
    int selectedFd;

    // use `select` or `poll` to wait on sockets
    // received a message on `selectedFd`, start reading

    char *fooMessage;
    struct {
        uint32_t a;
        uint32_t b;
    } barMessage;

    struct MessageHeader received;
    if (!readExpected (selectedFd, &received, sizeof(received))) {
        // handle error; `received` is not valid, so stop here
        return EXIT_FAILURE;
    }
    // handle network/host byte order differences maybe
    received.type = ntohl(received.type);
    received.length = ntohl(received.length);

    switch (received.type) {
        case MESSAGE_FOO:
            // "foo" sends an ASCII string or something
            // (a real server should also cap received.length before allocating)
            fooMessage = calloc(received.length + 1, 1);
            if (fooMessage && readExpected (selectedFd, fooMessage, received.length))
                puts(fooMessage);
            free(fooMessage);
            break;
        case MESSAGE_BAR:
            // "bar" sends a message of a fixed size
            if (readExpected (selectedFd, &barMessage, sizeof(barMessage))) {
                barMessage.a = ntohl(barMessage.a);
                barMessage.b = ntohl(barMessage.b);
                printf("a + b = %u\n", (unsigned)(barMessage.a + barMessage.b));
            }
            break;
        default:
            puts("Malformed type received");
            // kick the client out probably
    }
}

You can likely already see one disadvantage of using a binary format - for each attribute greater than a char you read, you will have to ensure its byte order is correct using the ntohl or ntohs functions.

An alternative is to use byte-encoded messages, such as simple ASCII or UTF-8 strings, which avoid byte-order issues entirely but require extra effort to parse and validate.

There are two final considerations for network data in C.

The first is that some C types do not have fixed widths. For example, the humble int is only guaranteed to be at least 16 bits wide, and long is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux. Good, portable code should have network data use fixed-width types, like those defined in stdint.h.

The second is struct padding. In a struct whose members have different widths, the compiler may insert padding bytes between members to keep them aligned in memory, which makes the struct faster to use in the program but can produce confusing results when it is sent over the wire as-is.

#include <stdio.h>
#include <stdint.h>

int main()
{
    struct A {
        char a;
        uint32_t b;
    } A;

    printf("sizeof(A): %zu\n", sizeof(A));
}

In this example, the struct's size won't be 1 (char) + 4 (uint32_t) = 5 bytes; it'll be 8:

mharrison@mharrison-KATANA:~$ gcc -o padding padding.c
mharrison@mharrison-KATANA:~$ ./padding 
sizeof(A): 8

This is because 3 bytes are added after char a to make sure uint32_t b is memory-aligned.

So if you write a struct A, then attempt to read a char and a uint32_t on the other side, you'll get char a, and a uint32_t where the first three bytes are garbage and the last byte is the first byte of the actual integer you wrote.

Either document your data format explicitly as C struct types or, better yet, document any padding bytes they might contain.
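One way to sidestep padding altogether is to copy each field into a byte buffer yourself instead of writing the struct's memory directly. The sketch below does that for struct A from the padding example; packA, unpackA and A_WIRE_SIZE are hypothetical names chosen for this illustration, and the 5-byte layout (one byte for a, then b in network byte order) is an assumed wire format, not something the original answer specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <arpa/inet.h>

struct A {
    char a;
    uint32_t b;
};

// Assumed wire layout: 1 byte for a + 4 bytes for b, with no padding.
const std::size_t A_WIRE_SIZE = 5;

// Writes value into out, which must hold at least A_WIRE_SIZE bytes.
void packA(const A &value, unsigned char *out)
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>(value.a);
    uint32_t b = htonl(value.b);         // convert to network byte order
    std::memcpy(out + 1, &b, sizeof b);  // memcpy avoids an unaligned store
}

// Rebuilds an A from A_WIRE_SIZE bytes produced by packA.
A unpackA(const unsigned char *in)
{
    assert(in != nullptr);
    A value;
    value.a = static_cast<char>(in[0]);
    uint32_t b;
    std::memcpy(&b, in + 1, sizeof b);
    value.b = ntohl(b);                  // convert back to host byte order
    return value;
}
```

The sender would write the 5-byte buffer and the receiver would read exactly 5 bytes before calling unpackA, so neither side depends on how its compiler lays out struct A.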


Where are you allocating memory for your buffer? The line where you invoke bzero invokes undefined behavior since buffer does not point to any valid region of memory.

char *buffer = new char[ BUFFER_SIZE ];
// do processing

// don't forget to release
delete[] buffer;

Several pointers:

You need to handle a return value of 0, which tells you that the remote host closed the socket.

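For nonblocking sockets, you also need to check for an error return value (-1) and make sure that errno isn't EAGAIN or EWOULDBLOCK, which is expected when no data is available yet. (EINPROGRESS is the equivalent for a nonblocking connect(), not for read().)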

You definitely need better error handling - you're potentially leaking the buffer pointed to by 'buffer'. Which, I noticed, you don't allocate anywhere in this code snippet.

Someone else made a good point about how your buffer isn't a null terminated C string if your read() fills the entire buffer. That is indeed a problem, and a serious one.

Your buffer size is a bit small, but should work as long as you don't try to read more than 256 bytes, or whatever you allocate for it.

If you're worried about getting into an infinite loop when the remote host sends you a malformed message (a potential denial of service attack) then you should use select() with a timeout on the socket to check for readability, and only read if data is available, and bail out if select() times out.

Something like this might work for you:

fd_set read_set;
struct timeval timeout;

timeout.tv_sec = 60; // Time out after a minute
timeout.tv_usec = 0;

FD_ZERO(&read_set);
FD_SET(socketFileDescriptor, &read_set);

int r=select(socketFileDescriptor+1, &read_set, NULL, NULL, &timeout);

if( r<0 ) {
    // Handle the error
}

if( r==0 ) {
    // Timeout - handle that. You could try waiting again, close the socket...
}

if( r>0 ) {
    // The socket is ready for reading - call read() on it.
}

Depending on the volume of data you expect to receive, the way you scan the entire message repeatedly for the "end;" token is very inefficient. This is better done with a state machine (the states being 'e'->'n'->'d'->';') so that you only look at each incoming character once.
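The state machine described above could be sketched roughly as follows. EndTokenScanner is a hypothetical class written for this illustration; it keeps its state between calls, so the "end;" token is still detected when it is split across two reads.

```cpp
#include <cassert>
#include <cstddef>

// Remembers how many characters of "end;" have been matched so far,
// so each incoming byte is examined exactly once.
class EndTokenScanner {
public:
    // Feeds length bytes; returns true once "end;" has been seen.
    bool feed(const char *data, std::size_t length)
    {
        assert(data != nullptr || length == 0);
        static const char token[] = "end;";
        for (std::size_t i = 0; i < length && matched_ < 4; ++i) {
            if (data[i] == token[matched_])
                ++matched_;                            // advance to the next state
            else
                matched_ = (data[i] == 'e') ? 1 : 0;   // restart, possibly on a new 'e'
        }
        return matched_ == 4;
    }

private:
    std::size_t matched_ = 0;  // current state: 0..4 characters matched
};
```

The simple restart rule is only correct because the four characters of "end;" are all different; a token with repeated characters would need a proper failure table, as in Knuth-Morris-Pratt matching.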

And seriously, you should consider finding a library to do all this for you. It's not easy getting it right.


If you actually create the buffer as per dirk's suggestion, then:

  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);

may completely fill the buffer, possibly overwriting the terminating zero character which you depend on when extracting to a stringstream. You need:

  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE - 1 );
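An alternative that avoids depending on a terminating zero at all is to append exactly the number of bytes that read() reported. The sketch below assumes a blocking socket and does not add a timeout; readUntil is a hypothetical helper name, and the 256-byte buffer simply mirrors the BUFFER_SIZE from the question.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Appends data read from fd to received until terminator shows up.
// Returns true if the terminator was found, false if the peer closed
// the connection or a read error occurred first.
bool readUntil(int fd, const std::string &terminator, std::string &received)
{
    assert(!terminator.empty());
    char buffer[256];
    for (;;) {
        ssize_t n = read(fd, buffer, sizeof buffer);
        if (n < 0) {
            if (errno == EINTR)
                continue;          // interrupted by a signal: retry
            return false;          // real error; errno says why
        }
        if (n == 0)
            return false;          // connection closed before the terminator
        // Only search the part that can contain a match touching the new bytes.
        std::size_t searchFrom = received.size() >= terminator.size()
            ? received.size() - terminator.size() + 1
            : 0;
        // Append exactly n bytes; no null terminator is needed.
        received.append(buffer, static_cast<std::size_t>(n));
        if (received.find(terminator, searchFrom) != std::string::npos)
            return true;
    }
}
```

Because the byte count is passed explicitly, embedded zero bytes in the stream are preserved and the full buffer size can be handed to read().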
