If a lion could talk, we would not understand him.
— Ludwig Wittgenstein
Let’s discuss the challenges of communication across heterogeneous systems and the need for an abstract notation for such communication.
What are heterogeneous systems?
To understand heterogeneous systems, let’s first define a system. A system has many aspects; its most important identifying characteristics are:
CPU: the architecture and parameters such as endianness and word size.
Operating System: determines the binary format and process structure, and abstracts some of the hardware details away from the next layer.
Process: each operating system has its own view of how a process is run.
Programming Language: the language running inside the process also helps define the system. For example, different languages can give int different sizes, and in some languages the size depends on the word size of the platform’s CPU.
A sample view of a few heterogeneous systems.
This should give a better idea of what I mean by a system. Now let’s turn to why we want a semi-formal definition of these systems.
As long as a system works in a silo, neither communicating with other systems nor sharing its output with them, it doesn’t matter what the system is composed of. But as soon as systems need to communicate, problems start to appear.
For example, consider two systems, one with a little-endian CPU and the other with a big-endian CPU. If endianness is not taken into account, the two interpret the same bytes differently. In the example below, the value 1729 (0x06C1) is laid out in big-endian byte order; if the Intel (little-endian) system doesn’t convert the byte order, it will read the value incorrectly as 49414 (0xC106).
Representation of 1729 in a little-endian vs big-endian memory layout
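The misreading described above can be reproduced in a few lines of Go. This is a minimal sketch, assuming the value travels as a 16-bit unsigned integer; the helper names encodeBigEndian, decodeLittleEndian, and decodeBigEndian are made up for illustration, while the encoding/binary calls come from Go’s standard library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeBigEndian (hypothetical helper) writes v as two bytes,
// most significant byte first, as a big-endian system would.
func encodeBigEndian(v uint16) []byte {
	buf := make([]byte, 2)
	binary.BigEndian.PutUint16(buf, v)
	return buf
}

// decodeLittleEndian (hypothetical helper) interprets the bytes the way a
// little-endian reader would if it did not convert the byte order.
func decodeLittleEndian(buf []byte) uint16 {
	return binary.LittleEndian.Uint16(buf)
}

// decodeBigEndian (hypothetical helper) interprets the bytes in the
// order they were written.
func decodeBigEndian(buf []byte) uint16 {
	return binary.BigEndian.Uint16(buf)
}

func main() {
	wire := encodeBigEndian(1729)

	fmt.Printf("bytes in memory: % x\n", wire)                   // 06 c1
	fmt.Println("read as little-endian:", decodeLittleEndian(wire)) // 49414
	fmt.Println("read as big-endian:", decodeBigEndian(wire))       // 1729
}
```

Both readers see the same two bytes; only the agreed byte order decides whether they mean 1729 or 49414.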
This example illustrates one difficulty in communication between two heterogeneous systems that differ in hardware architecture. Such differences are not limited to hardware; they are present in software as well. Imagine we want to send an int between two processes, one written in Go and the other in Java. We must agree on which corresponding data types to use in each language. In Go we should use int32, because the size of Go’s int depends on the CPU word size (32 bits on 32-bit computers and 64 bits on 64-bit computers). Java has no such problem: the Java specification defines int as always 32 bits long.
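On the Go side, that agreement might look like the sketch below. It assumes the two processes have settled on a 4-byte, big-endian integer; the byte order is an assumption made for this example, and the helper names encodeInt32 and decodeInt32 are hypothetical. strconv.IntSize reports how wide a plain int is on the machine running the code, which is exactly the value we cannot rely on.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strconv"
)

// encodeInt32 (hypothetical helper) serializes v as exactly four bytes in
// big-endian order, regardless of the platform's word size.
func encodeInt32(v int32) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeInt32 (hypothetical helper) reads four big-endian bytes back
// into an int32.
func decodeInt32(b []byte) (int32, error) {
	var v int32
	err := binary.Read(bytes.NewReader(b), binary.BigEndian, &v)
	return v, err
}

func main() {
	// The width of a plain int varies by platform: 32 or 64 bits.
	fmt.Println("width of int on this platform (bits):", strconv.IntSize)

	payload, err := encodeInt32(1729)
	if err != nil {
		panic(err)
	}
	fmt.Printf("int32 payload: % x (%d bytes)\n", payload, len(payload))

	value, err := decodeInt32(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded value:", value)
}
```

Because the payload is always four bytes, a Java process can map it to its 32-bit int without knowing anything about the word size of the machine that produced it.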
We can extend this argument to function signatures: if programs written in two languages need to invoke each other’s functions, they must share a common signature definition.
What is an Interface?
Whenever two such systems communicate, they form an interface. To exchange data across system boundaries successfully, we need to take into account the various factors that define each system.
Otherwise, if two heterogeneous systems talked, they would probably not understand each other.
Hence we need a common language that is not specific to any one system. Such an abstract language is known as an Interface Description Language (IDL). In our case, we have chosen to study gRPC, which uses Protocol Buffers as its default IDL.
Because functions and data are two loosely coupled entities, the gRPC stack leaves us a choice: you can keep using protobuf to define service interfaces while using a data encoder other than Protocol Buffers. More on encoders and Remote Procedure Calls in the upcoming articles.