Unverified Commit ee516614 authored by Qianqian Fang's avatar Qianqian Fang Committed by GitHub
Browse files

Support UBJSON-derived Binary JData (BJData) format (#3336)

* support UBJSON-derived Binary JData (BJData) format

* fix Codacy warning

* partially fix VS compilation errors

* fix additional VS errors

* fix more VS compilation errors

* fix additional warnings and errors for clang and msvc

* add more tests to cover the new bjdata types

* add tests for optimized ndarray, improve coverage, fix clang/gcc warnings

* gcc warn useless conversion but msvc gives an error

* fix ci_test errors

* complete test coverage, fix ci_test errors

* add half precision error test

* fix No newline at end of file error by clang

* simplify endian condition, format unit-bjdata

* remove broken test due to alloc limit

* full coverage, I hope

* move bjdata new markers from default to the same level as ubjson markers

* fix ci errors, add tests for new bjdata switch structure

* make is_bjdata const after using initializer list

* remove the unwanted assert

* move is_bjdata to an optional param to write_ubjson

* pass use_bjdata via output adapter

* revert order to avoid msvc 2015 unreferenced formal param error

* update BJData Spect V1 Draft-2 URL after spec release

* amalgamate code

* code polishing following @gregmarr's feedback

* make use_bjdata a non-default parameter

* fix ci error, remove unwanted param comment

* encode and decode bjdata ndarray in jdata annotations, enable roundtrip tests

* partially fix ci errors, add tests to improve coverage

* polish patch to remove ci errors

* fix a ndarray dim vector condition

* fix clang tidy error

* add sax test cases for ndarray

* add additional sax event tests

* adjust sax event numbering

* fix sax tests

* ndarray can only be used with array containers, discard if used in object

* complete test coverage

* disable [{SHTFNZ in optimized type due to security risks in #2793 and hampered readability

* fix ci error

* move OutputIsLittleEndian from tparam to param to replace use_bjdata

* fix ci clang gcc error

* fix ci static analysis error

* update json_test_data to 3.1.0, enable file-based bjdata unit tests

* fix stack overflow error on msvc 2019 and 2022

* use https link, update sax_parse_error after rebase

* make input_format const and use initializer

* return bool for write_bjdata_ndarray

* test write_bjdata_ndarray return value as boolean

* fix ci error
parent a6ee8bf9
......@@ -32,7 +32,7 @@
- [Implicit conversions](#implicit-conversions)
- [Conversions to/from arbitrary types](#arbitrary-types-conversions)
- [Specializing enum conversion](#specializing-enum-conversion)
- [Binary formats (BSON, CBOR, MessagePack, and UBJSON)](#binary-formats-bson-cbor-messagepack-and-ubjson)
- [Binary formats (BSON, CBOR, MessagePack, UBJSON, and BJData)](#binary-formats-bson-cbor-messagepack-ubjson-and-bjdata)
- [Supported compilers](#supported-compilers)
- [Integration](#integration)
- [CMake](#cmake)
......@@ -961,9 +961,9 @@ Other Important points:
- When using `get<ENUM_TYPE>()`, undefined JSON values will default to the first pair specified in your map. Select this default pair carefully.
- If an enum or JSON value is specified more than once in your map, the first matching occurrence from the top of the map will be returned when converting to or from JSON.
### Binary formats (BSON, CBOR, MessagePack, and UBJSON)
### Binary formats (BSON, CBOR, MessagePack, UBJSON, and BJData)
Though JSON is a ubiquitous data format, it is not a very compact format suitable for data exchange, for instance over a network. Hence, the library supports [BSON](https://bsonspec.org) (Binary JSON), [CBOR](https://cbor.io) (Concise Binary Object Representation), [MessagePack](https://msgpack.org), and [UBJSON](https://ubjson.org) (Universal Binary JSON Specification) to efficiently encode JSON values to byte vectors and to decode such vectors.
Though JSON is a ubiquitous data format, it is not a very compact format suitable for data exchange, for instance over a network. Hence, the library supports [BSON](https://bsonspec.org) (Binary JSON), [CBOR](https://cbor.io) (Concise Binary Object Representation), [MessagePack](https://msgpack.org), [UBJSON](https://ubjson.org) (Universal Binary JSON Specification) and [BJData](https://neurojson.org/bjdata) (Binary JData) to efficiently encode JSON values to byte vectors and to decode such vectors.
```cpp
// create a JSON value
......
......@@ -683,7 +683,7 @@ add_custom_target(ci_infer
add_custom_target(ci_offline_testdata
COMMAND mkdir -p ${PROJECT_BINARY_DIR}/build_offline_testdata/test_data
COMMAND cd ${PROJECT_BINARY_DIR}/build_offline_testdata/test_data && ${GIT_TOOL} clone -c advice.detachedHead=false --branch v3.0.0 https://github.com/nlohmann/json_test_data.git --quiet --depth 1
COMMAND cd ${PROJECT_BINARY_DIR}/build_offline_testdata/test_data && ${GIT_TOOL} clone -c advice.detachedHead=false --branch v3.1.0 https://github.com/nlohmann/json_test_data.git --quiet --depth 1
COMMAND ${CMAKE_COMMAND}
-DCMAKE_BUILD_TYPE=Debug -GNinja
-DJSON_BuildTests=ON -DJSON_FastTests=ON -DJSON_TestDataDirectory=${PROJECT_BINARY_DIR}/build_offline_testdata/test_data/json_test_data
......
set(JSON_TEST_DATA_URL https://github.com/nlohmann/json_test_data)
set(JSON_TEST_DATA_VERSION 3.0.0)
set(JSON_TEST_DATA_VERSION 3.1.0)
# if variable is set, use test data from given directory rather than downloading them
if(JSON_TestDataDirectory)
......
......@@ -23,7 +23,7 @@ namespace nlohmann
namespace detail
{
/// the supported input formats
enum class input_format_t { json, cbor, msgpack, ubjson, bson };
enum class input_format_t { json, cbor, msgpack, ubjson, bson, bjdata };
////////////////////
// input adapters //
......
......@@ -3773,7 +3773,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
auto ia = detail::input_adapter(std::forward<InputType>(i));
return format == input_format_t::json
? parser(std::move(ia), nullptr, true, ignore_comments).sax_parse(sax, strict)
: detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia)).sax_parse(format, sax, strict);
: detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia), format).sax_parse(format, sax, strict);
}
/// @brief generate SAX events
......@@ -3788,7 +3788,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
auto ia = detail::input_adapter(std::move(first), std::move(last));
return format == input_format_t::json
? parser(std::move(ia), nullptr, true, ignore_comments).sax_parse(sax, strict)
: detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia)).sax_parse(format, sax, strict);
: detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia), format).sax_parse(format, sax, strict);
}
/// @brief generate SAX events
......@@ -3809,7 +3809,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
// NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)
? parser(std::move(ia), nullptr, true, ignore_comments).sax_parse(sax, strict)
// NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)
: detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia)).sax_parse(format, sax, strict);
: detail::binary_reader<basic_json, decltype(ia), SAX>(std::move(ia), format).sax_parse(format, sax, strict);
}
#ifndef JSON_NO_IO
/// @brief deserialize from stream
......@@ -3965,6 +3965,33 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
binary_writer<char>(o).write_ubjson(j, use_size, use_type);
}
/// @brief create a BJData serialization of a given JSON value
/// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
static std::vector<std::uint8_t> to_bjdata(const basic_json& j,
const bool use_size = false,
const bool use_type = false)
{
std::vector<std::uint8_t> result;
to_bjdata(j, result, use_size, use_type);
return result;
}
/// @brief create a BJData serialization of a given JSON value
/// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
static void to_bjdata(const basic_json& j, detail::output_adapter<std::uint8_t> o,
const bool use_size = false, const bool use_type = false)
{
binary_writer<std::uint8_t>(o).write_ubjson(j, use_size, use_type, true, true);
}
/// @brief create a BJData serialization of a given JSON value
/// @sa https://json.nlohmann.me/api/basic_json/to_bjdata/
static void to_bjdata(const basic_json& j, detail::output_adapter<char> o,
const bool use_size = false, const bool use_type = false)
{
binary_writer<char>(o).write_ubjson(j, use_size, use_type, true, true);
}
/// @brief create a BSON serialization of a given JSON value
/// @sa https://json.nlohmann.me/api/basic_json/to_bson/
static std::vector<std::uint8_t> to_bson(const basic_json& j)
......@@ -4000,7 +4027,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::forward<InputType>(i));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::cbor).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4016,7 +4043,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::move(first), std::move(last));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::cbor).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4043,7 +4070,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = i.get();
// NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::cbor).sax_parse(input_format_t::cbor, &sdp, strict, tag_handler);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4058,7 +4085,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::forward<InputType>(i));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::msgpack, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::msgpack).sax_parse(input_format_t::msgpack, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4073,7 +4100,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::move(first), std::move(last));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::msgpack, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::msgpack).sax_parse(input_format_t::msgpack, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4097,7 +4124,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = i.get();
// NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::msgpack, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::msgpack).sax_parse(input_format_t::msgpack, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4112,7 +4139,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::forward<InputType>(i));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::ubjson, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::ubjson).sax_parse(input_format_t::ubjson, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4127,7 +4154,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::move(first), std::move(last));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::ubjson, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::ubjson).sax_parse(input_format_t::ubjson, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4151,10 +4178,64 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = i.get();
// NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::ubjson, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::ubjson).sax_parse(input_format_t::ubjson, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
/// @brief create a JSON value from an input in BJData format
/// @sa https://json.nlohmann.me/api/basic_json/from_bjdata/
template<typename InputType>
JSON_HEDLEY_WARN_UNUSED_RESULT
static basic_json from_bjdata(InputType&& i,
const bool strict = true,
const bool allow_exceptions = true)
{
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::forward<InputType>(i));
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::bjdata).sax_parse(input_format_t::bjdata, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
/// @brief create a JSON value from an input in BJData format
/// @sa https://json.nlohmann.me/api/basic_json/from_bjdata/
template<typename IteratorType>
JSON_HEDLEY_WARN_UNUSED_RESULT
static basic_json from_bjdata(IteratorType first, IteratorType last,
const bool strict = true,
const bool allow_exceptions = true)
{
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::move(first), std::move(last));
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::bjdata).sax_parse(input_format_t::bjdata, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
template<typename T>
JSON_HEDLEY_WARN_UNUSED_RESULT
static basic_json from_bjdata(const T* ptr, std::size_t len,
const bool strict = true,
const bool allow_exceptions = true)
{
return from_bjdata(ptr, ptr + len, strict, allow_exceptions);
}
JSON_HEDLEY_WARN_UNUSED_RESULT
static basic_json from_bjdata(detail::span_input_adapter&& i,
const bool strict = true,
const bool allow_exceptions = true)
{
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = i.get();
// NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::bjdata).sax_parse(input_format_t::bjdata, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
/// @brief create a JSON value from an input in BSON format
/// @sa https://json.nlohmann.me/api/basic_json/from_bson/
template<typename InputType>
......@@ -4166,7 +4247,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::forward<InputType>(i));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::bson, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::bson).sax_parse(input_format_t::bson, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4181,7 +4262,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
basic_json result;
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = detail::input_adapter(std::move(first), std::move(last));
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::bson, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::bson).sax_parse(input_format_t::bson, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
......@@ -4205,7 +4286,7 @@ class basic_json // NOLINT(cppcoreguidelines-special-member-functions,hicpp-spec
detail::json_sax_dom_parser<basic_json> sdp(result, allow_exceptions);
auto ia = i.get();
// NOLINTNEXTLINE(hicpp-move-const-arg,performance-move-const-arg)
const bool res = binary_reader<decltype(ia)>(std::move(ia)).sax_parse(input_format_t::bson, &sdp, strict);
const bool res = binary_reader<decltype(ia)>(std::move(ia), input_format_t::bson).sax_parse(input_format_t::bson, &sdp, strict);
return res ? result : basic_json(value_t::discarded);
}
/// @}
......
This diff is collapsed.
......@@ -72,7 +72,7 @@ endif()
if (CMAKE_CXX_COMPILER_ID STREQUAL "MSVC")
# avoid stack overflow, see https://github.com/nlohmann/json/issues/2955
json_test_set_test_options("test-cbor;test-msgpack;test-ubjson" LINK_OPTIONS /STACK:4000000)
json_test_set_test_options("test-cbor;test-msgpack;test-ubjson;test-bjdata" LINK_OPTIONS /STACK:4000000)
endif()
# disable exceptions for test-disabled_exceptions
......
......@@ -10,7 +10,7 @@ CXXFLAGS += -std=c++11
CPPFLAGS += -I ../single_include
FUZZER_ENGINE = src/fuzzer-driver_afl.cpp
FUZZERS = parse_afl_fuzzer parse_bson_fuzzer parse_cbor_fuzzer parse_msgpack_fuzzer parse_ubjson_fuzzer
FUZZERS = parse_afl_fuzzer parse_bson_fuzzer parse_cbor_fuzzer parse_msgpack_fuzzer parse_ubjson_fuzzer parse_bjdata_fuzzer
fuzzers: $(FUZZERS)
parse_afl_fuzzer:
......@@ -27,3 +27,6 @@ parse_msgpack_fuzzer:
parse_ubjson_fuzzer:
$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(FUZZER_ENGINE) src/fuzzer-parse_ubjson.cpp -o $@
parse_bjdata_fuzzer:
$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(FUZZER_ENGINE) src/fuzzer-parse_bjdata.cpp -o $@
/*
__ _____ _____ _____
__| | __| | | | JSON for Modern C++ (fuzz test support)
| | |__ | | | | | | version 3.10.5
|_____|_____|_____|_|___| https://github.com/nlohmann/json
This file implements a parser test suitable for fuzz testing. Given a byte
array data, it performs the following steps:
- j1 = from_bjdata(data)
- vec = to_bjdata(j1)
- j2 = from_bjdata(vec)
- assert(j1 == j2)
- vec2 = to_bjdata(j1, use_size = true, use_type = false)
- j3 = from_bjdata(vec2)
- assert(j1 == j3)
- vec3 = to_bjdata(j1, use_size = true, use_type = true)
- j4 = from_bjdata(vec3)
- assert(j1 == j4)
The provided function `LLVMFuzzerTestOneInput` can be used in different fuzzer
drivers.
Licensed under the MIT License <http://opensource.org/licenses/MIT>.
*/
#include <iostream>
#include <sstream>
#include <nlohmann/json.hpp>
using json = nlohmann::json;
// see http://llvm.org/docs/LibFuzzer.html
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
try
{
// step 1: parse input
std::vector<uint8_t> vec1(data, data + size);
json j1 = json::from_bjdata(vec1);
try
{
// step 2.1: round trip without adding size annotations to container types
std::vector<uint8_t> vec2 = json::to_bjdata(j1, false, false);
// step 2.2: round trip with adding size annotations but without adding type annonations to container types
std::vector<uint8_t> vec3 = json::to_bjdata(j1, true, false);
// step 2.3: round trip with adding size as well as type annotations to container types
std::vector<uint8_t> vec4 = json::to_bjdata(j1, true, true);
// parse serialization
json j2 = json::from_bjdata(vec2);
json j3 = json::from_bjdata(vec3);
json j4 = json::from_bjdata(vec4);
// serializations must match
assert(json::to_bjdata(j2, false, false) == vec2);
assert(json::to_bjdata(j3, true, false) == vec3);
assert(json::to_bjdata(j4, true, true) == vec4);
}
catch (const json::parse_error&)
{
// parsing a BJData serialization must not fail
assert(false);
}
}
catch (const json::parse_error&)
{
// parse errors are ok, because input may be random bytes
}
catch (const json::type_error&)
{
// type errors can occur during parsing, too
}
catch (const json::out_of_range&)
{
// out of range errors may happen if provided sizes are excessive
}
// return 0 - non-zero return values are reserved for future use
return 0;
}
This diff is collapsed.
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment