[serialization] What is the difference between Serialization and Marshaling?

I know that in terms of several distributed techniques (such as RPC), the term "Marshaling" is used but don't understand how it differs from Serialization. Aren't they both transforming objects into series of bits?

Related:

What is Serialization?

What is Object Marshalling?

This question is related to serialization terminology marshalling rpc

The answer is


Marshalling is usually between relatively closely associated processes; serialization does not necessarily have that expectation. So when marshalling data between processes, for example, you may wish to merely send a REFERENCE to potentially expensive data to recover, whereas with serialization, you would wish to save it all, to properly recreate the object(s) when deserialized.


My understanding of marshalling is different to the other answers.

Serialization:

To Produce or rehydrate a wire-format version of an object graph utilizing a convention.

Marshalling:

To Produce or rehydrate a wire-format version of an object graph by utilizing a mapping file, so that the results can be customized. The tool may start by adhering to a convention, but the important difference is the ability to customize results.

Contract First Development:

Marshalling is important within the context of contract first development.

  • Its possible to make changes to an internal object graph, while keeping the external interface stable over time. This way all of the service subscribers won't have to be modified for every trivial change.
  • Its possible to map the results across different languages. For example from the property name convention of one language ('property_name') to another ('propertyName').

Marshaling refers to converting the signature and parameters of a function into a single byte array. Specifically for the purpose of RPC.

Serialization more often refers to converting an entire object / object tree into a byte array Marshaling will serialize object parameters in order to add them to the message and pass it across the network. *Serialization can also be used for storage to disk.*


My vies is:

Problem: Object belongs to some process(VM) and it's lifetime is the same

Serialisation - transform object state into stream of bytes(JSON, XML...) for saving, sharing, transforming...

Marshalling - contains Serialisation + codebase. Usually it used by Remote procedure call(RPC) -> Java Remote Method Invocation(Java RMI) where you are able to invoke a object's method which is hosted on remote Java processes.

codebase - is a place or URL to class definition where it can be downloaded by ClassLoader. CLASSPATH[About] is as a local codebase

JVM -> Class Loader -> load class definition -> class

Very simple diagram for RMI

Serialisation - state
Marshalling - state + class definition

Official doc


I think that the main difference is that Marshalling supposedly also involves the codebase. In other words, you would not be able to marshal and unmarshal an object into a state-equivalent instance of a different class. .

Serialization just means that you can store the object and reobtain an equivalent state, even if it is an instance of another class.

That being said, they are typically synonyms.


From the Marshalling (computer science) Wikipedia article:

The term "marshal" is considered to be synonymous with "serialize" in the Python standard library1, but the terms are not synonymous in the Java-related RFC 2713:

To "marshal" an object means to record its state and codebase(s) in such a way that when the marshalled object is "unmarshalled", a copy of the original object is obtained, possibly by automatically loading the class definitions of the object. You can marshal any object that is serializable or remote. Marshalling is like serialization, except marshalling also records codebases. Marshalling is different from serialization in that marshalling treats remote objects specially. (RFC 2713)

To "serialize" an object means to convert its state into a byte stream in such a way that the byte stream can be converted back into a copy of the object.

So, marshalling also saves the codebase of an object in the byte stream in addition to its state.


Basics First

Byte Stream - Stream is sequence of data. Input stream - reads data from source. Output stream - writes data to desitnation. Java Byte Streams are used to perform input/output byte by byte(8bits at a time). A byte stream is suitable for processing raw data like binary files. Java Character Streams are used to perform input/output 2 bytes at a time, because Characters are stored using Unicode conventions in Java with 2 bytes for each character. Character stream is useful when we process(read/write) text files.

RMI(Remote Method Invocation) - an API that provides a mechanism to create distributed application in java. The RMI allows an object to invoke methods on an object running in another JVM.


Both Serialization and Marshalling are loosely used as synonyms. Here are few differences.

Serialization - Data members of an object is written to binary form or Byte Stream(and then can be written in file/memory/database etc). No information about data-types can be retained once object data members are written to binary form.

enter image description here

Marshalling - Object is serialized(to byte stream in binary format) with data-type + Codebase attached and then passed Remote Object(RMI). Marshalling will transform the data-type into a predetermined naming convention so that it can be reconstructed with respect to the initial data-type. enter image description here

So Serialization is part of Marshalling.

CodeBase is information that tells the receiver of Object where the implementation of this object can be found. Any program that thinks it might ever pass an object to another program that may not have seen it before must set the codebase, so that the receiver can know where to download the code from, if it doesn't have the code available locally. The receiver will, upon deserializing the object, fetch the codebase from it and load the code from that location. (Copied from @Nasir answer)

Serialization is almost like a stupid memory-dump of the memory used by the object(s), while Marshalling stores information about custom data-types.

In a way, Serialization performs marshalling with implematation of pass-by-value because no information of data-type is passed, just the primitive form is passed to byte stream.

Serialization may have some issues related to big-endian, small-endian if the stream is going from one OS to another if the different OS have different means of representing the same data. On the other hand, marshalling is perfectly fine to migrate between OS because the result is a higher-level representation.


Both do one thing in common - that is serializing an Object. Serialization is used to transfer objects or to store them. But:

  • Serialization: When you serialize an object, only the member data within that object is written to the byte stream; not the code that actually implements the object.
  • Marshalling: Term Marshalling is used when we talk about passing Object to remote objects(RMI). In Marshalling Object is serialized(member data is serialized) + Codebase is attached.

So Serialization is part of Marshalling.

CodeBase is information that tells the receiver of Object where the implementation of this object can be found. Any program that thinks it might ever pass an object to another program that may not have seen it before must set the codebase, so that the receiver can know where to download the code from, if it doesn't have the code available locally. The receiver will, upon deserializing the object, fetch the codebase from it and load the code from that location.


Marshalling is the rule to tell compiler how the data will be represented on another environment/system; For example;

[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;

as you can see two different string values represented as different value types.

Serialization will only convert object content, not representation (will stay same) and obey rules of serialization, (what to export or no). For example, private values will not be serialized, public values yes and object structure will stay same.


Think of them as synonyms, both have a producer that sends stuff over to a consumer... In the end fields of instances are written into a byte stream and the other end foes the reverse ands up with the same instances.

NB - java RMI also contains support for transporting classes that are missing from the recipient...


Here's more specific examples of both:

Serialization Example:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct {
    char value[11];
} SerializedInt32;

SerializedInt32 SerializeInt32(int32_t x) 
{
    SerializedInt32 result;

    itoa(x, result.value, 10);

    return result;
}

int32_t DeserializeInt32(SerializedInt32 x) 
{
    int32_t result;

    result = atoi(x.value);

    return result;
}

int main(int argc, char **argv)
{    
    int x;   
    SerializedInt32 data;
    int32_t result;

    x = -268435455;

    data = SerializeInt32(x);
    result = DeserializeInt32(data);

    printf("x = %s.\n", data.value);

    return result;
}

In serialization, data is flattened in a way that can be stored and unflattened later.

Marshalling Demo:

(MarshalDemoLib.cpp)

#include <iostream>
#include <string>

extern "C"
__declspec(dllexport)
void *StdCoutStdString(void *s)
{
    std::string *str = (std::string *)s;
    std::cout << *str;
}

extern "C"
__declspec(dllexport)
void *MarshalCStringToStdString(char *s)
{
    std::string *str(new std::string(s));

    std::cout << "string was successfully constructed.\n";

    return str;
}

extern "C"
__declspec(dllexport)
void DestroyStdString(void *s)
{
    std::string *str((std::string *)s);
    delete str;

    std::cout << "string was successfully destroyed.\n";
}

(MarshalDemo.c)

#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    void *myStdString;

    LoadLibrary("MarshalDemoLib");

    myStdString = ((void *(*)(char *))GetProcAddress (
        GetModuleHandleA("MarshalDemoLib"),
        "MarshalCStringToStdString"
    ))("Hello, World!\n");

    ((void (*)(void *))GetProcAddress (
        GetModuleHandleA("MarshalDemoLib"),
        "StdCoutStdString"
    ))(myStdString);

    ((void (*)(void *))GetProcAddress (
        GetModuleHandleA("MarshalDemoLib"),
        "DestroyStdString"
    ))(myStdString);    
}

In marshaling, data does not necessarily need to be flattened, but it needs to be transformed to another alternative representation. all casting is marshaling, but not all marshaling is casting.

Marshaling doesn't require dynamic allocation to be involved, it can also just be transformation between structs. For example, you might have a pair, but the function expects the pair's first and second elements to be other way around; you casting/memcpy one pair to another won't do the job because fst and snd will get flipped.

#include <stdio.h>

typedef struct {
    int fst;
    int snd;
} pair1;

typedef struct {
    int snd;
    int fst;
} pair2;

void pair2_dump(pair2 p)
{
    printf("%d %d\n", p.fst, p.snd);
}

pair2 marshal_pair1_to_pair2(pair1 p)
{
    pair2 result;
    result.fst = p.fst;
    result.snd = p.snd;
    return result;
}

pair1 given = {3, 7};

int main(int argc, char **argv)
{    
    pair2_dump(marshal_pair1_to_pair2(given));

    return 0;
}

The concept of marshaling becomes especially important when you start dealing with tagged unions of many types. For example, you might find it difficult to get a JavaScript engine to print a "c string" for you, but you can ask it to print a wrapped c string for you. Or if you want to print a string from JavaScript runtime in a Lua or Python runtime. They are all strings, but often won't get along without marshaling.

An annoyance I had recently was that JScript arrays marshal to C# as "__ComObject", and has no documented way to play with this object. I can find the address of where it is, but I really don't know anything else about it, so the only way to really figure it out is to poke at it in any way possible and hopefully find useful information about it. So it becomes easier to create a new object with a friendlier interface like Scripting.Dictionary, copy the data from the JScript array object into it, and pass that object to C# instead of JScript's default array.

test.js:

var x = new ActiveXObject("Dmitry.YetAnotherTestObject.YetAnotherTestObject");

x.send([1, 2, 3, 4]);

YetAnotherTestObject.cs

using System;
using System.Runtime.InteropServices;

namespace Dmitry.YetAnotherTestObject
{
    [Guid("C612BD9B-74E0-4176-AAB8-C53EB24C2B29"), ComVisible(true)]
    public class YetAnotherTestObject
    {
        public void send(object x)
        {
            System.Console.WriteLine(x.GetType().Name);
        }
    }
}

above prints "__ComObject", which is somewhat of a black box from the point of view of C#.

Another interesting concept is that you might have the understanding how to write code, and a computer that knows how to execute instructions, so as a programmer, you are effectively marshaling the concept of what you want the computer to do from your brain to the program image. If we had good enough marshallers, we could just think of what we want to do/change, and the program would change that way without typing on the keyboard. So, if you could have a way to store all the physical changes in your brain for the few seconds where you really want to write a semicolon, you could marshal that data into a signal to print a semicolon, but that's an extreme.


Marshaling uses Serialization process actually but the major difference is that it in Serialization only data members and object itself get serialized not signatures but in Marshalling Object + code base(its implementation) will also get transformed into bytes.

Marshalling is the process to convert java object to xml objects using JAXB so that it can be used in web services.


Examples related to serialization

laravel Unable to prepare route ... for serialization. Uses Closure TypeError: Object of type 'bytes' is not JSON serializable Best way to save a trained model in PyTorch? Convert Dictionary to JSON in Swift Java: JSON -> Protobuf & back conversion Understanding passport serialize deserialize How to generate serial version UID in Intellij Parcelable encountered IOException writing serializable object getactivity() Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects Cannot deserialize the JSON array (e.g. [1,2,3]) into type ' ' because type requires JSON object (e.g. {"name":"value"}) to deserialize correctly

Examples related to terminology

The differences between initialize, define, declare a variable What is the difference between a web API and a web service? What does "opt" mean (as in the "opt" directory)? Is it an abbreviation? What's the name for hyphen-separated case? What is Bit Masking? What is ADT? (Abstract Data Type) What exactly are iterator, iterable, and iteration? What is a web service endpoint? What is the difference between Cloud, Grid and Cluster? How to explain callbacks in plain english? How are they different from calling one function from another function?

Examples related to marshalling

Convert Python ElementTree to string Is it possible to get all arguments of a function as single object inside that function? What is the difference between Serialization and Marshaling?

Examples related to rpc

REST vs JSON-RPC? What is the difference between Document style and RPC style communication? What is the difference between Java RMI and RPC? Git fails when pushing commit to github What is the current choice for doing RPC in Python? What is the difference between Serialization and Marshaling?