Introduction and use of MessagePack: an efficient binary serialization format

Table of Contents

What is MessagePack

The compression principle of MessagePack

msgpack for C/C++


 

MessagePack is an efficient binary serialization format. Like JSON, it lets you exchange data among multiple languages, but it is faster and smaller. Small integers are encoded in a single byte, and typical short strings require only one extra byte in addition to the string itself.

What is MessagePack

The official msgpack website sums it up in one sentence: "It's like JSON, but fast and small."

In short, its data model is similar to JSON's, but numbers, multi-byte characters, arrays, and other types are heavily optimized: redundant characters are dropped, values are stored in binary, and no extra space is spent on rendering values as text. The official website gives a simple example diagram:

The JSON in the picture is 27 bytes long, but 9 of those bytes (the braces, quotation marks, colons, and so on) carry no data; they exist only to delimit the structure. The figure also shows how msgpack avoids this: the delimiters are omitted, and each type is identified by a specific code. Take A7 in the figure. A7 is 1010 0111 in binary: the high three bits, 101, mark a short string (fixstr), and the low five bits hold its length, here 7. So A7 means that a 7-byte string follows, and the length costs no byte beyond the type byte itself.
Some readers will ask: how are strings longer than 31 bytes (binary 11111) represented? That depends on the compression principle of MessagePack.

 

The compression principle of MessagePack

For the core encoding rules, refer to the official MessagePack specification. In summary:

  1. true, false, and nil : these are the simplest; each is a single byte (0xc3 means true, 0xc2 means false, 0xc0 means nil).
  2. Fixed-length values : numbers have a naturally fixed length, so no length field is needed; one type byte says what follows (for example, 0xca means that a float 32 follows). Integers are compressed further and stored in as few bytes as their value allows: an integer from 0 to 127 fits in the type byte itself, and one below 256 needs only one byte after the type byte.
  3. Variable-length values : strings and binary data (the bin type) add 1, 2, or 4 bytes after the type byte to store the length (strings of up to 31 bytes keep the length in the type byte itself). If the length is below 256, only 1 byte is required. The longest string MessagePack can store is 2^32 - 1 bytes, that is, just under 4 GiB.
  4. Advanced structures : arrays and maps. A map is key-value data; like an array, it carries a count of its items, stored in the type byte for up to 15 items and otherwise in 2 or 4 bytes after it.
  5. Ext structure : represents an application-specific unit of data, that is, a user-defined data type.

Let's take a look at the string format diagram from the official specification:

 

To answer the question above: a string longer than 31 bytes (a length that no longer fits in the 5 bits of a fixstr header) is represented as follows. The byte 0xD9 indicates that the next byte is an 8-bit length, followed by the string itself. For example, the header of a 160-byte string is D9 A0. Still longer strings use 0xDA with a 16-bit length or 0xDB with a 32-bit length.
The Ext extension format deserves a mention here. It is this structure that makes MessagePack complete: custom structures are very common in real data interfaces, and the simple known data types and the advanced structures such as map and array do not always meet the requirements, so an extension format is needed. For example, consider the following interface format:

    {
      "error_no": 0,
      "message": "",
      "result": {
        "data": [
          {
            "datatype": 1,
            "itemdata": {  // There are 45 fields in total
              "sname": "\u5fae\u533b",
              "packageid": "330611",
              "tabs": [
                {
                  "type": 1,
                  "f": "abc"
                }
              ]
            }
          }
        ],
        "hasNextPage": true,
        "dirtag": "soft"
      }
    }

How do we write the sub-data in tabs as a whole into the itemdata structure, and itemdata into its parent structure data? Maps and arrays nest freely, so the plain formats can already express this nesting. Ext is for the case where such a substructure should travel as a single application-defined unit: we define a custom data type, assign it a Type value, and when the parser encounters that type, it decodes the payload according to our custom structure.

 

msgpack for C/C++

https://msgpack.org/#languages


It's like JSON, but smaller and faster.

Overview

MessagePack is an efficient binary serialization format that lets you exchange data among multiple languages, like JSON, but it is faster and smaller. Small integers are encoded in a single byte, and short strings require only one extra byte in addition to the string itself.

Example

In C:

    #include <msgpack.h>
    #include <stdio.h>

    int main(void)
    {
        /* msgpack_sbuffer is a simple buffer implementation. */
        msgpack_sbuffer sbuf;
        msgpack_sbuffer_init(&sbuf);

        /* Serialize values into the buffer using the msgpack_sbuffer_write callback function. */
        msgpack_packer pk;
        msgpack_packer_init(&pk, &sbuf, msgpack_sbuffer_write);

        /* Pack the array [1, true, "example"]. */
        msgpack_pack_array(&pk, 3);
        msgpack_pack_int(&pk, 1);
        msgpack_pack_true(&pk);
        msgpack_pack_str(&pk, 7);
        msgpack_pack_str_body(&pk, "example", 7);

        /* Deserialize the buffer into a msgpack_object instance. */
        /* The deserialized object is valid while the msgpack_zone instance is alive. */
        msgpack_zone mempool;
        msgpack_zone_init(&mempool, 2048);

        msgpack_object deserialized;
        msgpack_unpack(sbuf.data, sbuf.size, NULL, &mempool, &deserialized);

        /* Print the deserialized object. */
        msgpack_object_print(stdout, deserialized);
        puts("");

        msgpack_zone_destroy(&mempool);
        msgpack_sbuffer_destroy(&sbuf);
        return 0;
    }

See QUICKSTART-C.md for more details.

In C++:

    #include <msgpack.hpp>
    #include <string>
    #include <iostream>
    #include <sstream>

    int main(void)
    {
        msgpack::type::tuple<int, bool, std::string> src(1, true, "example");

        // Serialize the object into the buffer.
        // Any class that implements write(const char*, size_t) can be a buffer.
        std::stringstream buffer;
        msgpack::pack(buffer, src);

        // Send the buffer ...
        buffer.seekg(0);

        // Deserialize the buffer into a msgpack::object instance.
        std::string str(buffer.str());
        msgpack::object_handle oh =
            msgpack::unpack(str.data(), str.size());

        // The deserialized object is valid while the msgpack::object_handle instance is alive.
        msgpack::object deserialized = oh.get();

        // msgpack::object supports ostream.
        std::cout << deserialized << std::endl;

        // Convert the msgpack::object instance into the original type.
        // If the type is mismatched, it throws a msgpack::type_error exception.
        msgpack::type::tuple<int, bool, std::string> dst;
        deserialized.convert(dst);

        // Or create a new instance.
        msgpack::type::tuple<int, bool, std::string> dst2 =
            deserialized.as<msgpack::type::tuple<int, bool, std::string> >();
        return 0;
    }

See QUICKSTART-CPP.md for more details.

Usage

C++ header-only library

To use msgpack with C++, just add msgpack-c/include to the include path:

    g++ -I msgpack-c/include your_source_file.cpp

If you want to use the C version of msgpack, you need to build it. You can also install the C and C++ versions of msgpack.

Building and installing

Install from git repository

Using Terminal (CLI)

You will need:

  • gcc >= 4.1.0
  • cmake >= 2.8.0

C and C++03:

    $ git clone https://github.com/msgpack/msgpack-c.git
    $ cd msgpack-c
    $ cmake .
    $ make
    $ sudo make install

If you want to build the C++11 or C++17 version of msgpack instead, run the following commands:

    $ git clone https://github.com/msgpack/msgpack-c.git
    $ cd msgpack-c
    $ cmake -DMSGPACK_CXX[11|17]=ON .
    $ sudo make install

The MSGPACK_CXX[11|17] flags do not affect the installed files; they only switch the test cases. All files are installed under every setting.

When using the C part of msgpack-c, you need to build and link the library. Both static and shared libraries are built by default. If you only want to build static libraries, set BUILD_SHARED_LIBS=OFF for cmake. If you only want to build shared libraries, set `BUILD_SHARED_L

GUI on Windows

Clone the msgpack-c git repository.

    $ git clone https://github.com/msgpack/msgpack-c.git

Or use a GUI git client.

e.g. TortoiseGit: https://code.google.com/p/tortoisegit/

  1. Start the cmake GUI client.
  2. Set the "Where is the source code:" text box and the "Where to build the binaries:" text box.
  3. Click the "Configure" button.
  4. Choose your version of Visual Studio.
  5. Click the "Generate" button.
  6. Open the generated msgpack.sln in Visual Studio.
  7. Build all.

Reference

You can get more information, including tutorials, on the wiki.

Contribute

msgpack-c is developed on GitHub at msgpack/msgpack-c. To report an issue or send a pull request, please use the issue tracker.

Here is a list of great contributors.

License

msgpack-c is licensed under the Boost Software License, Version 1.0. See the LICENSE_1_0.txt file for details.

 
