json start/end position implementation #4517

Merged: 35 commits, Dec 18, 2024

sushshring
Contributor

Abstract

Following discussion 4455, this pull request adds a way to retrieve the start and end positions of nested objects within the JSON input during parsing.

Motivation

We have a service implementation with JSON schema where a field within the nested objects contains the hash value for that object. The service verifies the hash value of each of the nested objects before operating on the rest of the data sent.

For example, consider the following JSON:

{
    "name": "foo",
    "data":
    {
        "type": "typeA",
        "value": 1,
        "details": {
            "nested_type": "nested_typeA",
            "nested_value": 2
        }
    },
    "data_hash": "hashA"
}

Here, data_hash contains the hash of the object "details". To verify the data hash, we need to retrieve the exact substring from which "details" was parsed, including spaces and newlines. Currently, there is no way to do this with the nlohmann/json parser.
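To make the requirement concrete, here is a minimal sketch of what a caller could do once it knows the byte offsets of a nested value. The helper names (raw_slice, toy_hash, slice_matches) are hypothetical and not part of nlohmann/json, and std::hash only stands in for whatever digest the service actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Return the exact bytes of the input between two offsets
// (start inclusive, end exclusive), whitespace and newlines included.
std::string raw_slice(const std::string& input, std::size_t start, std::size_t end)
{
    assert(start <= end && end <= input.size());
    return input.substr(start, end - start);
}

// Stand-in for a real digest; std::hash is NOT a cryptographic hash.
std::size_t toy_hash(const std::string& bytes)
{
    return std::hash<std::string>{}(bytes);
}

// Hypothetical check: hash the raw slice and compare it to an expected value.
bool slice_matches(const std::string& input, std::size_t start, std::size_t end,
                   std::size_t expected)
{
    return toy_hash(raw_slice(input, start, end)) == expected;
}
```

The point of slicing the original input is that re-serializing a parsed value would normalize whitespace, so the bytes (and therefore the hash) would generally differ from what the sender hashed.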

Changes proposed

  • Add two fields to basic_json: size_t start_position and size_t end_position.
  • Add a reference to the lexer in json_sax_parser to retrieve the current position in the input string.
  • Whenever a BasicJsonType is created by the parser, calculate the start and end positions for that object from the original string and store those values.
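The idea of recording positions while the input is consumed can be sketched without the library at all. The following is an illustration only: it is a simplified bracket scanner, not the SAX parser or lexer touched by this pull request, and the names span and container_spans are hypothetical. It assumes well-formed JSON text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Byte range of one container in the input: [start, end).
struct span
{
    std::size_t start;
    std::size_t end;
};

// Scan well-formed JSON text and record the range of every object and array,
// in the order in which they close. Brackets inside strings are ignored.
std::vector<span> container_spans(const std::string& input)
{
    std::vector<span> result;
    std::vector<std::size_t> open;  // start offsets of containers not yet closed
    bool in_string = false;

    for (std::size_t i = 0; i < input.size(); ++i)
    {
        const char c = input[i];
        if (in_string)
        {
            if (c == '\\')
            {
                ++i;  // skip the escaped character
            }
            else if (c == '"')
            {
                in_string = false;
            }
        }
        else if (c == '"')
        {
            in_string = true;
        }
        else if (c == '{' || c == '[')
        {
            open.push_back(i);  // remember where this container starts
        }
        else if (c == '}' || c == ']')
        {
            assert(!open.empty());
            result.push_back({open.back(), i + 1});  // end is one past the bracket
            open.pop_back();
        }
    }
    return result;
}
```

Inner containers close first, so they appear in the result before the containers that enclose them.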

Memory considerations

We also considered storing substrings directly in the output JSON objects and sub-objects. However, because of the memory footprint increase this would cause, we opted to store only two size_t fields per basic_json created.
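A rough way to compare the two options is sketched below. The struct and function names are hypothetical, and the estimate for the substring variant is only a lower bound, since the real cost depends on the standard library's string implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Option A: keep only offsets into the original input.
struct with_positions
{
    std::size_t start_position;
    std::size_t end_position;
};

// Option B: keep a copy of the raw text in every value.
struct with_substring
{
    std::string raw;
};

// Fixed cost of option A per value, independent of the input size.
constexpr std::size_t positions_cost()
{
    return sizeof(with_positions);
}

// Lower bound for option B: at least the string object itself, and at least
// as many bytes as the copied text.
std::size_t substring_cost_lower_bound(const with_substring& v)
{
    return sizeof(std::string) > v.raw.size() ? sizeof(std::string) : v.raw.size();
}
```

With substrings, every nesting level would also copy the text of all values it contains, so the total grows with both input size and depth, while the offsets stay at a fixed cost per value.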

Validation

We have added tests to the class_parser test suite that cover the following cases:

  • Array inside an object
  • Objects inside arrays
  • Doubly nested objects
  • String fields
  • Integer and float fields
  • Float values with insignificant digits
  • Boolean fields
  • Null fields

Since the change affects the sax_parser, for each of these test cases we validate scenarios where no callback is passed, a callback is passed that accepts all fields, and a callback is passed that filters specific fields.


Pull request checklist

Read the Contribution Guidelines for detailed information.

  • Changes are described in the pull request, or an existing issue is referenced.
  • The test suite compiles and runs without error.
  • Code coverage is 100%. Test cases can be added by editing the test suite.
  • The source code is amalgamated; that is, after making changes to the sources in the include/nlohmann directory, run make amalgamate to create the single-header files single_include/nlohmann/json.hpp and single_include/nlohmann/json_fwd.hpp. The whole process is described here.

Please don't

  • The C++11 support varies between different compilers and versions. Please note the list of supported compilers. Some compilers like GCC 4.7 (and earlier), Clang 3.3 (and earlier), or Microsoft Visual Studio 13.0 and earlier are known not to work due to missing or incomplete C++11 support. Please refrain from proposing changes that work around these compilers' limitations with #ifdefs or other means.
  • Specifically, I am aware of compilation problems with Microsoft Visual Studio (there even is an issue label for this kind of bug). I understand that even in 2016, complete C++11 support isn't there yet. But please also understand that I do not want to drop features or uglify the code just to make Microsoft's sub-standard compiler happy. The past has shown that there are ways to express the functionality such that the code compiles with the most recent MSVC - unfortunately, this is not the main objective of the project.
  • Please refrain from proposing changes that would break JSON conformance. If you propose a conformant extension of JSON to be supported by the library, please motivate this extension.
  • Please do not open pull requests that address multiple issues.

@coveralls

coveralls commented Nov 26, 2024

Coverage Status

coverage: 99.639% (+0.005%) from 99.634%
when pulling c4d1091 on sushshring:develop
into 6cb099e on nlohmann:develop.

@nlohmann
Owner

Thanks for the effort!

However, adding two size_t members is a lot of overhead. When we introduced diagnostics, we hid a single pointer behind a preprocessor macro so that not every client would suffer from the overhead. Issues like #4514 show that the memory efficiency is already quite bad.

I am unsure how to continue here.
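The macro approach mentioned here can be sketched as follows. The macro name DEMO_TRACK_POSITIONS and both structs are hypothetical and only illustrate the pattern of compiling extra members in on request; they are not the library's actual macro or layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opt-in switch; define it as 1 before this point to enable tracking.
#ifndef DEMO_TRACK_POSITIONS
    #define DEMO_TRACK_POSITIONS 0
#endif

struct demo_value
{
    double payload = 0.0;
#if DEMO_TRACK_POSITIONS
    // Only present in builds that asked for it.
    std::size_t start_position = 0;
    std::size_t end_position = 0;
#endif
};

// A layout with the two members always present, for comparison.
struct demo_value_always
{
    double payload = 0.0;
    std::size_t start_position = 0;
    std::size_t end_position = 0;
};
```

With the switch left at its default, demo_value carries only its payload, so clients that never ask for positions do not pay for them.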

@nlohmann added the "state: please discuss" label Nov 26, 2024
@gregmarr
Contributor

I wonder if this is something that could be done with the data in the custom base class, so that only those that want to opt in to this behavior could enable it. That does make it a custom class, and not nlohmann::json or nlohmann::ordered_json.
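The opt-in idea can be illustrated with a toy type that accepts a user-chosen base class. This is a sketch only: toy_value, no_extras, and position_extras are hypothetical names, and the real basic_json template has many more parameters.

```cpp
#include <cassert>
#include <cstddef>

// Empty default base: contributes no data members.
struct no_extras {};

// Hypothetical opt-in base carrying the two offsets.
struct position_extras
{
    std::size_t start_position = 0;
    std::size_t end_position = 0;
};

// Toy stand-in for a value type that accepts a user-chosen base class.
template <typename Base = no_extras>
struct toy_value : Base
{
    double payload = 0.0;
};

using plain_value = toy_value<>;                   // default: no position members
using tracked_value = toy_value<position_extras>;  // opt-in: inherits the offsets
```

As the comment notes, the opt-in type is a distinct type from the default one, so code written against the default alias does not automatically accept it.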

@sushshring
Contributor Author

sushshring commented Nov 26, 2024

Taking that advice, I'm gonna add a new class like below to use as a json custom base class

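#include <cstddef>  // std::size_t
#include <string>   // std::string::npos

// Opt-in base class that stores where a value starts and ends in the input.
// std::string::npos means that no position has been recorded.
class json_base_class_with_start_end_markers {
    std::size_t start_position = std::string::npos;
    std::size_t end_position = std::string::npos;

public:
    std::size_t get_start_position() const noexcept
    {
        return start_position;
    }

    std::size_t get_end_position() const noexcept
    {
        return end_position;
    }

    void set_start_position(std::size_t start) noexcept
    {
        start_position = start;
    }

    void set_end_position(std::size_t end) noexcept
    {
        end_position = end;
    }
};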

We will use if (std::is_base_of<json_base_class_with_start_end_markers, BasicJsonType>::value) {} wherever the start_position and end_position setters are called within json_sax.hpp.
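One caveat with a plain runtime if in C++11 is that both branches are still compiled, so a setter call would fail to compile for types that do not derive from the marker class. Tag dispatch is one C++11-compatible way around that. The sketch below is an assumption about how such a check could be written, not the code that was merged; store_positions, tracked_value, and plain_value are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Same shape as the class proposed above, trimmed to what the sketch needs.
class json_base_class_with_start_end_markers
{
    std::size_t start_position = std::string::npos;
    std::size_t end_position = std::string::npos;

  public:
    std::size_t get_start_position() const noexcept { return start_position; }
    std::size_t get_end_position() const noexcept { return end_position; }
    void set_start_position(std::size_t start) noexcept { start_position = start; }
    void set_end_position(std::size_t end) noexcept { end_position = end; }
};

struct tracked_value : json_base_class_with_start_end_markers {};  // hypothetical
struct plain_value {};                                             // hypothetical

// Selected when T derives from the marker class: the setters exist.
template <typename T>
void store_positions(T& value, std::size_t start, std::size_t end, std::true_type)
{
    value.set_start_position(start);
    value.set_end_position(end);
}

// Selected otherwise: nothing to store, and no setter call is instantiated.
template <typename T>
void store_positions(T&, std::size_t, std::size_t, std::false_type) {}

// Entry point: picks one of the overloads above at compile time.
template <typename T>
void store_positions(T& value, std::size_t start, std::size_t end)
{
    store_positions(value, start, end,
                    std::is_base_of<json_base_class_with_start_end_markers, T>{});
}
```

Calling store_positions on a plain_value compiles and does nothing, while the same call on a tracked_value records both offsets.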


🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @sushshring
Please read and follow the Contribution Guidelines.

@sushshring
Contributor Author

> 🔴 Amalgamation check failed! 🔴
>
> The source code has not been amalgamated. @sushshring Please read and follow the Contribution Guidelines.

Not sure why it's saying this; I've run amalgamate on the recent commit.

@nlohmann
Owner

> > 🔴 Amalgamation check failed! 🔴
> >
> > The source code has not been amalgamated. @sushshring Please read and follow the Contribution Guidelines.
>
> Not sure why it's saying this; I've run amalgamate on the recent commit.

The pipeline was quite busy, so maybe this was just delayed.

@nlohmann
Owner

Once the pipeline is through, please check why the coverage went down. Note that you can download an artifact from the coverage job which contains HTML pages showing which lines are not covered. As always, coverage information is a bit fuzzy, and sometimes you see the closing braces of functions in red, which makes no sense. Nonetheless, you should make sure all added code is covered by a test.

@sushshring
Contributor Author

The coverage check is also odd. The commit here 8c67186 has the same coverage, which seemed acceptable to it.

Regardless, I can add one more test that improves the coverage for json_type_t::discarded, but the remaining missing coverage check is for the default switch branch which has an assert(false) since it should never be hit.

@nlohmann
Owner

These lines can be skipped by adding // LCOV_EXCL_LINE.
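As an illustration of that marker, here is a minimal sketch with an unreachable default branch. The enum and function are hypothetical and unrelated to the library's own types; only the LCOV_EXCL_LINE comment is the actual lcov convention.

```cpp
#include <cassert>
#include <string>

enum class kind { object, array, discarded };  // hypothetical enum

std::string kind_name(kind k)
{
    switch (k)
    {
        case kind::object:
            return "object";
        case kind::array:
            return "array";
        case kind::discarded:
            return "discarded";
        default:            // LCOV_EXCL_LINE
            assert(false);  // LCOV_EXCL_LINE
            return "";      // LCOV_EXCL_LINE
    }
}
```

For longer unreachable regions, lcov also recognizes the LCOV_EXCL_START and LCOV_EXCL_STOP pair, which excludes everything between the two comments.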

@sushshring
Contributor Author

@nlohmann looks like CI is all green. Any other blockers before this can be checked in?

@nlohmann removed the "state: please discuss" label Dec 18, 2024
Owner

@nlohmann left a comment

Looks good to me.

@nlohmann added this to the Release 3.11.4 milestone Dec 18, 2024
@nlohmann
Owner

Please update to the latest develop branch which contains a fix for Clang-Tidy, see #4558.

@nlohmann added the "please rebase" label Dec 18, 2024

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @sushshring
Please read and follow the Contribution Guidelines.

@nlohmann merged commit 58f5f25 into nlohmann:develop Dec 18, 2024
123 checks passed
@nlohmann
Owner

Thanks a lot!

@sushshring
Contributor Author

Woot woot 🎉!

@nlohmann removed the "please rebase" label Dec 19, 2024
nlohmann pushed a commit that referenced this pull request Dec 20, 2024
* Add implementation to retrieve start and end positions of json during parse

* Add more unit tests and add start/stop parsing for arrays

* Add raw value for all types

* Add more tests and fix compiler warning

* Amalgamate

* Fix CLang GCC warnings

* Fix error in build

* Style using astyle 3.1

* Fix whitespace changes

* revert

* more whitespace reverts

* Address PR comments

* Fix failing issues

* More whitespace reverts

* Address remaining PR comments

* Address comments

* Switch to using custom base class instead of default basic_json

* Adding a basic using for a json using the new base class. Also address PR comments and fix CI failures

* Address decltype comments

* Diagnostic positions macro (#4)

Co-authored-by: Sush Shringarputale <[email protected]>

* Fix missed include deletion

* Add docs and address other PR comments (#5)

* Add docs and address other PR comments

---------

Co-authored-by: Sush Shringarputale <[email protected]>

* Address new PR comments and fix CI tests for documentation

* Update documentation based on feedback (#6)

---------

Co-authored-by: Sush Shringarputale <[email protected]>

* Address std::size_t and other comments

* Fix new CI issues

* Fix lcov

* Improve lcov case with update to handle_diagnostic_positions call for discarded values

* Fix indentation of LCOV_EXCL_STOP comments

* fix amalgamation astyle issue

---------

Co-authored-by: Sush Shringarputale <[email protected]>