— TensorFlow, Python, macOS — 2 min read
TensorFlow models are saved in the protocol buffer format, an efficient way of storing and transporting structured data. I first heard of and used protocol buffers when I worked at Popsa, where we used them to speed up network requests between the mobile apps and the backend. I then realised TensorFlow also uses this format to store models, including the weights and metadata. Popsa had a GitHub repo specifically used to design and agree upon the interface between the backend and the mobile applications. gRPC also uses protocol buffers as its format, by default. You can generate the classes to create, serialize and deserialize these objects in your preferred language with the protobuf compiler, protoc.
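As a toy illustration (a hypothetical schema, not Popsa's real one), a message definition might look like this:

```protobuf
// photo.proto: a hypothetical message definition
syntax = "proto3";

message Photo {
  string id = 1;
  int32 width = 2;
  int32 height = 3;
}
```

Running protoc --python_out=. photo.proto generates photo_pb2.py, which you can then use to round-trip objects through the compact binary wire format:

```python
from photo_pb2 import Photo

# Build a message, serialize it to the binary wire format, parse it back.
photo = Photo(id="abc123", width=1024, height=768)
blob = photo.SerializeToString()
restored = Photo.FromString(blob)
print(restored.width)  # 1024
```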
Instead of protocol buffers, TensorFlow Lite uses FlatBuffers. FlatBuffers has many of the benefits of protocol buffers, but without the cost of deserialising the entire data file before you can read anything. For a network request this doesn't matter much: we don't pack that much information into a protocol buffer, a payload might be 100KB, and we can deserialise the whole object very quickly. An ML model, however, might be anywhere between 500KB and 1GB. For example, the CenterNet Object and Keypoints detection model with the Hourglass backbone is ~695MB. It can be important to get some information out of the model without loading the entire model into memory.
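That zero-copy property is easy to demonstrate. Here is a minimal sketch, assuming the tflite package generated later in this post is on the path (the file name is just a placeholder):

```python
import mmap

from tflite.Model import Model  # generated by flatc, see below

# Map the file instead of reading it all in; FlatBuffers accesses
# fields in place, so only the bytes we actually touch are paged in.
with open("./model.tflite", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    model = Model.GetRootAs(mm)
    print(model.Version())  # one field, no full deserialisation
```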
I wanted to read a TensorFlow Lite model (a FlatBuffers file) in Python without using TensorFlow Lite. There are easier ways to do this, but I wanted to use flatc!
flatc

We had protoc for protobufs, but I couldn't find the FlatBuffers equivalent, flatc, anywhere. Unfortunately, the FlatBuffers repo doesn't provide an obvious way of getting the flatc binary so you can generate your class files. You can either download it from a CI job, or build it yourself.
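If you have Homebrew, the flatbuffers formula may be the quickest route; my assumption is that it ships the flatc binary too, so verify with flatc --version after installing:

```
brew install flatbuffers
```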
To build it instead, clone the repo and generate an Xcode project:

```
git clone git@github.com:google/flatbuffers.git
cd flatbuffers
cmake -G "Xcode" -DCMAKE_BUILD_TYPE=Release
```

Open the project (xed .) and build it, or just run xcodebuild clean build to build all the binaries. Find the flatc binary and use it where it is, or cp ./Debug/flatc /usr/local/bin so you can use flatc anywhere.

Next, grab the TensorFlow Lite schema; I found it by just searching for it:

```
wget https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/schema/schema.fbs
```

Then run flatc --python schema.fbs. The code is generated in a new directory: tflite/.

Install flatbuffers and numpy into a virtual environment:

```
python3 -m venv ./venv
. ./venv/bin/activate
pip install --upgrade flatbuffers numpy
```

Now we can read the model:

```python
import flatbuffers
# import numpy # Not used directly, but needed for AsNumpy function calls
from tflite.Model import Model

# Update the path to your model
model_path = "./mobilenet_v2_1.0_224_quantized_1_metadata_1.tflite"
with open(model_path, "rb") as model_file:
    buffer = model_file.read()

model = Model.GetRootAs(buffer)

print(dir(model))

print(model.Metadata(0).Name())

# Read the output tensor and its quantization parameters
output_index = model.Subgraphs(0).Outputs(0)
output_tensor = model.Subgraphs(0).Tensors(output_index)
quantization = output_tensor.Quantization()

print(f"Quantization: {quantization}")
q_min = quantization.MinAsNumpy()
q_max = quantization.MaxAsNumpy()
q_scale = quantization.ScaleAsNumpy()
q_zero_point = quantization.ZeroPointAsNumpy()
print(f"q_min: {q_min}")
print(f"q_max: {q_max}")
print(f"q_scale: {q_scale}")
print(f"q_zero_point: {q_zero_point}")
```

There is an easier way to get the quantization parameters in Python: just load the model and read the input or output details. If quantization parameters are set, they will be in the dictionary.
So let's download a quantized model from tfhub.dev, called mobilenet_v1_0.25_128_quantized.
Then let's prepare the Python environment. TensorFlow now supports Python 3.9, so this is easy.
```
python3 -m venv ./venv
. ./venv/bin/activate
pip install tensorflow
```

```python
import tensorflow as tf

# model_path is the path to the .tflite file downloaded above
interpreter = tf.lite.Interpreter(model_path=model_path)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
```

Open it in Netron, and the quantization parameters are right there:
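You can also print them straight from those detail dictionaries. A minimal sketch (the key names are TensorFlow Lite's; the values depend on the model):

```python
# Each entry is a dict; quantized tensors carry their scale and
# zero point under the 'quantization' key.
scale, zero_point = output_details[0]["quantization"]
print(f"scale: {scale}, zero_point: {zero_point}")

# Recent TensorFlow versions also expose per-axis parameters:
print(output_details[0]["quantization_parameters"])
```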
The formula is: min <= scale * (q - zero_point) <= max. I wrote more about it on Stack Overflow.
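To make that concrete, here's a worked sketch with made-up parameters (not taken from the model above):

```python
# Hypothetical 8-bit quantization parameters
scale = 0.00784314   # roughly 2/255
zero_point = 128

def dequantize(q):
    """Map a quantized integer back to a real value."""
    return scale * (q - zero_point)

def quantize(x):
    """Map a real value to the nearest quantized integer."""
    return int(round(x / scale)) + zero_point

print(dequantize(0))    # min: -1.00392...
print(dequantize(255))  # max:  0.99608...
print(quantize(0.0))    # 128, the zero point
```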
At least now you know where they sit in the TensorFlow Lite model file, which is actually a FlatBuffers file with a .tflite extension.