[c] Concatenating strings in C, which method is more efficient?

I came across these two methods to concatenate strings:

Common part:

char* first= "First";
char* second = "Second";
char* both = malloc(strlen(first) + strlen(second) + 2);

Method 1:

strcpy(both, first);
strcat(both, " ");       // or space could have been part of one of the strings
strcat(both, second);

Method 2:

sprintf(both, "%s %s", first, second);

In both cases the content of both would be "First Second".

I would like to know which one is more efficient (I have to perform several concatenation operations), or if you know a better way to do it.

This question is related to c string performance concatenation

The answer is


sprintf() is designed to handle far more than just strings, strcat() is specialist. But I suspect that you are sweating the small stuff. C strings are fundamentally inefficient in ways that make the differences between these two proposed methods insignificant. Read "Back to Basics" by Joel Spolsky for the gory details.

This is an instance where C++ generally performs better than C. For heavy weight string handling using std::string is likely to be more efficient and certainly safer.

[edit]

[2nd edit]Corrected code (too many iterations in C string implementation), timings, and conclusion change accordingly

I was surprised at Andrew Bainbridge's comment that std::string was slower, but he did not post complete code for this test case. I modified his (automating the timing) and added a std::string test. The test was on VC++ 2008 (native code) with default "Release" options (i.e. optimised), Athlon dual core, 2.6GHz. Results:

C string handling = 0.023000 seconds
sprintf           = 0.313000 seconds
std::string       = 0.500000 seconds

So here strcat() is faster by far (your milage may vary depending on compiler and options), despite the inherent inefficiency of the C string convention, and supports my original suggestion that sprintf() carries a lot of baggage not required for this purpose. It remains by far the least readable and safe however, so when performance is not critical, has little merit IMO.

I also tested a std::stringstream implementation, which was far slower again, but for complex string formatting still has merit.

Corrected code follows:

#include <ctime>
#include <cstdio>
#include <cstring>
#include <string>

void a(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000; i++)
    {
        strcpy(both, first);
        strcat(both, " ");
        strcat(both, second);
    }
}

void b(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000; i++)
        sprintf(both, "%s %s", first, second);
}

void c(char *first, char *second, char *both)
{
    std::string first_s(first) ;
    std::string second_s(second) ;
    std::string both_s(second) ;

    for (int i = 0; i != 1000000; i++)
        both_s = first_s + " " + second_s ;
}

int main(void)
{
    char* first= "First";
    char* second = "Second";
    char* both = (char*) malloc((strlen(first) + strlen(second) + 2) * sizeof(char));
    clock_t start ;

    start = clock() ;
    a(first, second, both);
    printf( "C string handling = %f seconds\n", (float)(clock() - start)/CLOCKS_PER_SEC) ;

    start = clock() ;
    b(first, second, both);
    printf( "sprintf           = %f seconds\n", (float)(clock() - start)/CLOCKS_PER_SEC) ;

    start = clock() ;
    c(first, second, both);
    printf( "std::string       = %f seconds\n", (float)(clock() - start)/CLOCKS_PER_SEC) ;

    return 0;
}

Here's some madness for you, I actually went and measured it. Bloody hell, imagine that. I think I got some meaningful results.

I used a dual core P4, running Windows, using mingw gcc 4.4, building with "gcc foo.c -o foo.exe -std=c99 -Wall -O2".

I tested method 1 and method 2 from the original post. Initially kept the malloc outside the benchmark loop. Method 1 was 48 times faster than method 2. Bizarrely, removing -O2 from the build command made the resulting exe 30% faster (haven't investigated why yet).

Then I added a malloc and free inside the loop. That slowed down method 1 by a factor of 4.4. Method 2 slowed down by a factor of 1.1.

So, malloc + strlen + free DO NOT dominate the profile enough to make avoiding sprintf worth while.

Here's the code I used (apart from the loops were implemented with < instead of != but that broke the HTML rendering of this post):

void a(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000 * 48; i++)
    {
        strcpy(both, first);
        strcat(both, " ");
        strcat(both, second);
    }
}

void b(char *first, char *second, char *both)
{
    for (int i = 0; i != 1000000 * 1; i++)
        sprintf(both, "%s %s", first, second);
}

int main(void)
{
    char* first= "First";
    char* second = "Second";
    char* both = (char*) malloc((strlen(first) + strlen(second) + 2) * sizeof(char));

    // Takes 3.7 sec with optimisations, 2.7 sec WITHOUT optimisations!
    a(first, second, both);

    // Takes 3.7 sec with or without optimisations
    //b(first, second, both);

    return 0;
}

They should be pretty much the same. The difference isn't going to matter. I would go with sprintf since it requires less code.


  • strcpy and strcat are much simpler oprations compared to sprintf, which needs to parse the format string
  • strcpy and strcat are small so they will generally be inlined by the compilers, saving even one more extra function call overhead. For example, in llvm strcat will be inlined using a strlen to find copy starting position, followed by a simple store instruction

I don't know that in case two there's any real concatenation done. Printing them back to back doesn't constitute concatenation.

Tell me though, which would be faster:

1) a) copy string A to new buffer b) copy string B to buffer c) copy buffer to output buffer

or

1)copy string A to output buffer b) copy string b to output buffer


Don't worry about efficiency: make your code readable and maintainable. I doubt the difference between these methods is going to matter in your program.


The difference is unlikely to matter:

  • If your strings are small, the malloc will drown out the string concatenations.
  • If your strings are large, the time spent copying the data will drown out the differences between strcat / sprintf.

As other posters have mentioned, this is a premature optimization. Concentrate on algorithm design, and only come back to this if profiling shows it to be a performance problem.

That said... I suspect method 1 will be faster. There is some---admittedly small---overhead to parse the sprintf format-string. And strcat is more likely "inline-able".


size_t lf = strlen(first);
size_t ls = strlen(second);

char *both = (char*) malloc((lf + ls + 2) * sizeof(char));

strcpy(both, first);

both[lf] = ' ';
strcpy(&both[lf+1], second);

Neither is terribly efficient since both methods have to calculate the string length or scan it each time. Instead, since you calculate the strlen()s of the individual strings anyway, put them in variables and then just strncpy() twice.


Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to performance

Why is 2 * (i * i) faster than 2 * i * i in Java? What is the difference between spark.sql.shuffle.partitions and spark.default.parallelism? How to check if a key exists in Json Object and get its value Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? Most efficient way to map function over numpy array The most efficient way to remove first N elements in a list? Fastest way to get the first n elements of a List into an Array Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3? pandas loc vs. iloc vs. at vs. iat? Android Recyclerview vs ListView with Viewholder

Examples related to concatenation

Pandas Merging 101 What does ${} (dollar sign and curly braces) mean in a string in Javascript? Concatenate two NumPy arrays vertically Import multiple csv files into pandas and concatenate into one DataFrame How to concatenate columns in a Postgres SELECT? Concatenate string with field value in MySQL Most efficient way to concatenate strings in JavaScript? How to force a line break on a Javascript concatenated string? How to concatenate two IEnumerable<T> into a new IEnumerable<T>? How to concat two ArrayLists?