mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [mesos] bmahler commented on issue #356: libprocess: check protobuf (de)serialisation success.
Date Tue, 07 Apr 2020 19:59:41 GMT
bmahler commented on issue #356: libprocess: check protobuf (de)serialisation success.
URL: https://github.com/apache/mesos/pull/356#issuecomment-610590957
 
 
   Took a look at the protobuf code:
   
   > Before the code didn't check at all the return value of
   > Message::SerializeToString, which can fail for various reasons,
   > e.g. out-of-memory, message too large, or invalid UTF-8 string.
   
   As far as I see, `MessageLite::SerializeToString` boils down to [this code](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.cc#L422-L436):
   
   ```
   bool MessageLite::AppendPartialToString(std::string* output) const {
     size_t old_size = output->size();
     size_t byte_size = ByteSizeLong();
     if (byte_size > INT_MAX) {
       GOOGLE_LOG(ERROR) << GetTypeName()
                  << " exceeded maximum protobuf size of 2GB: " << byte_size;
       return false;
     }
   
     STLStringResizeUninitialized(output, old_size + byte_size);
     uint8* start =
         reinterpret_cast<uint8*>(io::mutable_string_data(output) + old_size);
     SerializeToArrayImpl(*this, start, byte_size);
     return true;
   }
   ```
   
   This will only return false for the message being too large? Where do you see out of memory
being handled? It also looks like invalid UTF-8 does not fail serialization, but rather only
logs an error for "proto3" always and logs an error in debug mode for "proto2", see [this
example of a proto3 message](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/any.pb.cc#L214).
   
   On the parsing side, looks like the possible cases for a false return are: `!IsInitialized()`
[[1]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.h#L529)[[2]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.h#L475-L479),
didn't consume entire message [[3]](https://github.com/protocolbuffers/protobuf/blob/v3.11.4/src/google/protobuf/message_lite.cc#L135),
invalid message data (e.g. invalid varint, length of a string / submessage goes out of bounds).
For invalid UTF-8, it looks [pretty complicated](https://github.com/protocolbuffers/protobuf/blob/e667bf6eaaa2fb1ba2987c6538df81f88500d030/src/google/protobuf/compiler/cpp/cpp_helpers.cc#L1034-L1101):
   
   * If the message is "proto3", it seems to always validate UTF-8 by directly calling `WireFormatLite::VerifyUtf8String`
and will then strictly fail parsing if not UTF-8.
   * For "proto2" code, it seems to call [`WireFormat::VerifyUTF8StringNamedField`](https://github.com/protocolbuffers/protobuf/blob/df6e3a2f54ca893635119e98688a47ad85ecca26/src/google/protobuf/wire_format.h#L355-L381)
and **only log** if it's invalid UTF-8, not fail. Also the code seems to be within a `#ifdef
GOOGLE_PROTOBUF_UTF8_VALIDATION_ENABLED` which in turns is [only enabled if in a debug build
i.e. `!NDEBUG`](https://github.com/protocolbuffers/protobuf/blob/e667bf6eaaa2fb1ba2987c6538df81f88500d030/src/google/protobuf/wire_format_lite.h#L54-L57).
I'm a little puzzled at how you saw the logging, is your libprotobuf using debug mode? If
you're using protoc from mesos I wonder if the way it's built in mesos is hitting the same
issue as https://github.com/Homebrew/legacy-homebrew/issues/9279.
   
   Here's a summary of my findings:
   
   **Serialization of invalid UTF-8**:
   * "proto3": serialization will succeed but invalid UTF-8 will be logged.
   * "proto2": serialization will succeed. Invalid UTF-8 will only be logged in debug mode
(NDEBUG not defined).
   
   **De-serialization of invalid UTF-8**:
   * "proto3": de-serialization will fail and it will be logged.
   * "proto2": de-serialization will succeed. Invalid UTF-8 will only be logged in debug mode
(NDEBUG not defined).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message