2023 IoTDB Summit: Christofer Dutz - Universal Data-Acquisition from Industrial Hardware with PLC4X

I'm a bit sad that I'm not able to be with you right now, so I have to do this remotely, but I'll do my best to still get you on board with what currently really excites me.

So I'll be talking about universal data-acquisition from industrial hardware with Apache PLC4X. What does that have to do with IoTDB? I mean, we're at the IoTDB Summit. The thing is, it's great if you can store data, but it's even better if you can actually get data, and I think that is still one of the biggest, mostly unsolved problems in industrial automation. We now know how to store and process huge amounts of data, but accessing that data is still quite a problem.

So what will I be talking about? First of all, I'm gonna give you a little overview of the current state of OT communication. After that, I'll be introducing Apache PLC4X. I would expect that one or the other of you has already heard of it, but just to make sure that everybody has heard of it and everybody likes it as much as I do, I'll be talking a bit about Apache PLC4X. And in the end, I'm going to wrap up the session with my vision of how we can use PLC4X and IoTDB together to solve real industrial problems.

01 Current State of OT Communication

So, current state of OT communication. For those of you who are not that familiar with these topics, OT is sort of like the IT of industrial automation. In the rest of the world, IT is software, computers, and networks. Well, it's the same here: OT is automation hardware, automation networks, and automation software. So in general, we come from the IT world and want to communicate with devices from the OT world, and that's where it starts getting tricky.

If you believe the industry, this isn't actually a problem. Last month we were at the SPS fair in Nuremberg, and if you had a walk around, you would have believed that, because everybody was proclaiming that OPC-UA has solved this problem.


But that's the marketing side of things. If you have a deeper look at reality, things look different. This chart is from a company called HMS; they publish it every year. I've been using their comparison charts in my presentations for the last 6 or 7 years, and in general, they haven't been changing that much.

The one thing you can see is that industrial Ethernet is growing a lot faster than Fieldbus. You can think of industrial Ethernet as all the OT communication that is based on normal network infrastructure, while Fieldbus usually uses some proprietary wire connections.

While industrial Ethernet is growing in comparison with Fieldbuses, there is one thing you should notice: one topic you can't see on this chart is OPC-UA, and that's just because its adoption isn't actually that big. Right now, I would assume that OPC-UA is, among others, counted under “Other Ethernet”, which is not a small portion, but it's sharing that portion with other protocols like MQTT. In PLC4X alone we have loads of other Ethernet-based protocols, so they're all sharing these very few shares down there.


Why is that the case? From my perspective, OPC-UA has lost a lot of trust in the last few years. One reason is it's pretty expensive. You have to replace your hardware with new devices, and usually, these PLCs that support OPC-UA, they're a bit more expensive than the ones that don't. And if you want to use OPC-UA, you usually have to buy a license. I think the last number I had was like 400 Euros per device to enable OPC-UA. It may vary from vendor to vendor, but in general, you have to pay even if it's just for the PLC.

But from my perspective, the worst thing is its really bad performance. You can have a very powerful PLC and still bring it to its knees by just asking too many simple questions at a time. As a comparison: on one Siemens S7 device, we were able to pull 2600 data points every 2 seconds without a significant impact on the load of the PLC. With OPC-UA, you can make it shut down completely by just asking for 200 data points every 2 seconds. Maybe that gives you a bit of an impression of how low-performance it actually is.

Another thing that's been coming up more and more is that it actually introduces new compatibility issues. Not on the communication layers, but on higher levels, because OPC-UA also introduces the concept of a companion spec. It's sort of like a standard specification of, let's say, devices of a certain domain. But here, it feels like there are now fights over whose companion spec is the one to dominate the others, so we just lifted the problem one level higher.

All this causes its adoption to be relatively low compared with all the proclamations they make. And the only serious alternative to it right now, I think, is MQTT, but even MQTT still requires you to retrofit your devices.


So how to solve this problem? You could introduce new PLCs that use OPC-UA and MQTT as gateways, but you still have to pay a bit of money for them. There are a number of protocol conversion gateways. They also usually come with quite steep price tags. You could buy some commercial drivers and write software for communicating directly with the industrial hardware. But the last time I checked the major vendors of commercial drivers, for an S7 driver to run on a Linux node, I think we were at something near 5000 Euros per node. So you can imagine that this price goes up steeply, and it gets very expensive.

Then, in 2017 when I started PLC4X, there were also some open-source drivers, but they were usually badly licensed. Most of them were under a GPL license, which makes it difficult to use them in commercial settings. Most of them were also badly written. In some of them you could really see how they just took C code and converted it to Java without changing anything in the structure. And most of them were, even then, already no longer maintained.


02 Introducing Apache PLC4X

In order to solve that problem, I had the joy and luck to be able to start a new project, and that was to become Apache PLC4X.

What is PLC4X? I think our project statement wraps it up quite nicely. PLC4X is a set of libraries for communicating with industrial programmable logic controllers using a variety of protocols but with a shared API. And especially the last part of that sentence is the one that actually makes quite a difference, the shared API. Because until then, every driver you used more or less defined the structure of your software. We try to build something similar to JDBC or ODBC, where you have a shared API, and you can use that with a variety of different products.


The core concepts of PLC4X are that you write applications and you only use the API module for writing your software. That defines all the data types and the data structures for communicating. Then for each protocol that you want to support, a driver implements that logic, and it completely takes care of mapping the PLC data types of that particular protocol into the standard data types. There are a number of integration modules available that allow you to integrate PLC4X into other projects, such as NiFi, Camel, and StreamPipes, and there is a huge number of other projects available.
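
To make the shared-API idea a bit more concrete, here is a minimal Python sketch. It is not the PLC4X API. All names in it (`Connection`, `FakeModbusConnection`, `DriverManager`, the `fake-modbus://` scheme and the `holding-register:<number>` tag syntax) are made up for illustration, and the "driver" only keeps registers in memory. What it tries to show is the split described above: the application only uses the shared interface, and the driver maps generic values onto the protocol's native types.

```python
from abc import ABC, abstractmethod


class Connection(ABC):
    """Hypothetical shared API: application code only ever sees this interface."""

    @abstractmethod
    def read(self, tag):
        """Return the current value of a tag as a plain Python value."""

    @abstractmethod
    def write(self, tag, value):
        """Write a plain Python value to a tag."""


class FakeModbusConnection(Connection):
    """Stand-in driver: keeps 16-bit registers in memory instead of talking to hardware."""

    def __init__(self, host):
        self.host = host
        self._registers = {}

    @staticmethod
    def _register_number(tag):
        # Hypothetical tag syntax: "holding-register:<number>"
        kind, _, number = tag.partition(":")
        if kind != "holding-register" or not number.isdigit():
            raise ValueError(f"unsupported tag: {tag!r}")
        return int(number)

    def read(self, tag):
        return self._registers.get(self._register_number(tag), 0)

    def write(self, tag, value):
        # The driver maps the generic value onto the protocol's native type,
        # here an unsigned 16-bit register.
        self._registers[self._register_number(tag)] = int(value) & 0xFFFF


class DriverManager:
    """Selects a driver from the scheme of a connection string, in the spirit of JDBC."""

    def __init__(self):
        self._factories = {}

    def register(self, scheme, factory):
        self._factories[scheme] = factory

    def connect(self, url):
        scheme, separator, host = url.partition("://")
        if not separator or scheme not in self._factories:
            raise LookupError(f"no driver registered for {url!r}")
        return self._factories[scheme](host)


manager = DriverManager()
manager.register("fake-modbus", FakeModbusConnection)

connection = manager.connect("fake-modbus://192.168.0.10")
connection.write("holding-register:1", 42)
print(connection.read("holding-register:1"))  # prints 42
```

Swapping the PLC would then mean registering another driver and changing the connection string, while the code that calls `read` and `write` stays the same.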

It's called PLC4X and not PLC4J for a reason: we're not only focusing on writing Java drivers, because we knew that in the automation industry, Java is not, let's say, the dominant language. And generally, understanding a driver is the tricky part, so we built a code generation framework that allows us to write drivers for any language that we write templates for. Right now, the usable versions are in Java, Go and C, while the community is working hard on C#, Python, and even Rust versions of the drivers. The goal of this is to write software that is almost independent of the actual PLC used.


Which operations does PLC4X support? Currently, it supports discovery. Some PLCs, or industrial automation hardware in general, support discovery, and if they do, our Discover module will definitely pick them up. Once you've found a device and connected to it, you sometimes want to know what data this device provides, and that is provided by the Browse API. Now that you know what's on the device, you might want to read it. That's done with the Read API, or you write it with the Write API. Some protocols even support subscriptions, so you can also subscribe to variables, and then you get them in regular intervals or whenever they change.

And one API, the one indicated by the wrench there, is what we are currently working on: a Publish API. That's generally used for Fieldbus protocols such as ProfiNet or EtherCAT, where the application also emits values in certain intervals, but that's still work in progress.


I mentioned PLC4X was built around a number of concepts. One of them is that of a protocol, which wraps and implements the general logic of a protocol. It doesn't handle the transfer, because for that we have the concept of a transport. But the protocol itself handles serialization, deserialization, and the model types for a given protocol. It handles the protocol state and how tag addresses or field addresses are written down. And it takes care of mapping the data types used in the PLC to the PLC4X standard data types.

For those of you who are not familiar with industrial protocols, there's a really wide variety of complexities here. For example, take the Modbus protocol. That's usually the first protocol we implement when writing drivers for a new language, just because it's very, very simple.
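
To give a feeling for just how simple Modbus is, here is a small Python sketch that builds a Modbus/TCP "Read Holding Registers" request (function code 0x03) and parses the matching response by hand. It follows the public Modbus/TCP frame layout and is not taken from the PLC4X code base; the function names are made up for illustration.

```python
import struct


def build_read_holding_registers(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus/TCP 'Read Holding Registers' (function code 0x03) request."""
    if not 1 <= quantity <= 125:
        raise ValueError("quantity must be between 1 and 125 registers")
    # PDU: function code (1 byte), start address (2 bytes), register count (2 bytes)
    pdu = struct.pack(">BHH", 0x03, start_address, quantity)
    # MBAP header: transaction id, protocol id (always 0),
    # length of everything that follows (unit id + PDU), unit id
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu


def parse_read_holding_registers_response(frame):
    """Return the list of 16-bit register values contained in a response frame."""
    _tid, _protocol, _length, _unit, function, byte_count = struct.unpack(
        ">HHHBBB", frame[:9]
    )
    if function & 0x80:
        # In an exception response, the byte after the function code is the exception code.
        raise IOError(f"Modbus exception code {frame[8]}")
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))


request = build_read_holding_registers(
    transaction_id=1, unit_id=1, start_address=0, quantity=10
)
print(request.hex())  # prints 00010000000601030000000a
```

The whole request is 12 bytes, and there is no session or handshake to keep track of beyond the TCP connection itself.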


To give you a little impression of how complex things can be, here's the state graph of a connection using the Beckhoff ADS protocol. On the one side, where the yellow part is, that's the connection establishment. You can see there are a lot of states and conditions where the control flow takes a different path. Then, I think, the red is for reading, the orange is for writing, and there are more for browsing, for subscriptions, and so on. You can see it's quite complex, and in the past you would have had to have this complexity in your program. That's what PLC4X takes away from you completely. You only need to say, "I want to read this variable", and PLC4X takes care of all the magic paths needed for that.


So, which protocols do we support so far? Siemens S7, that was definitely the first. Beckhoff ADS. Modbus we support in both TCP and Serial, so Modbus/TCP, Modbus/ASCII and Modbus/RTU are supported. We support EtherNet/IP, and here even the Logix variant that adds quite a number of additional features that are only available on Allen-Bradley devices. And we support OPC-UA with our own driver implementation. We're working hard on BacNet. KnxNet/IP is quite stable. Firmata is sort of a very simple protocol for use on Arduinos or something like that. We have an implementation for the CAN protocol. Some of you might know that last year or this year, I quickly implemented an IEC-60870 protocol for gas turbines. I'm currently working on ProfiNet. A colleague of mine is working on the C-Bus. Open-Protocol is from Atlas Copco. It was called Torque-Tools in the past. It's a protocol for, let's say, automated wrenches, used in automotive and car manufacturing. I'm also currently working on a Bosch Rexroth CtrlX connector. Emerson DeltaV is a special one, because I no longer have the hardware available to continue working on it, but we were actually the first people to be able to reverse-engineer the Emerson DeltaV protocol. One protocol I've still got on my to-do list, and I really hope that with recent changes it'll be possible soon, is the new Siemens S7 protocol, but it'll probably take another few months or so until I'm able to get started on that.


03 My Vision of PLC4X and IoTDB

As I mentioned, I have loads of visions of how we can use PLC4X and IoTDB together to really have an impact in the industry.

My vision for PLC4X and IoTDB is to build a gateway software using PLC4X that's runnable on small edge computers or even on the PLC itself. It uses PLC4X for communicating with the machines, and it produces data in a format called Sparkplug B. Some of you might have heard of Sparkplug B. It's a pretty new protocol; I think on the website they say they aim to offer plug and play for industrial automation, and it sort of does what OPC-UA is trying to do with the companion specs. With Sparkplug B, we will not only be emitting data, but also the schematics of the data, and the communication between this edge device and IoTDB should be purely based on MQTT. Well, Sparkplug B implies MQTT, but I just wanted to emphasize this. In IoTDB, we would be using a new Sparkplug B adapter that is able to consume Sparkplug B data and insert it directly into IoTDB.


Here is a little schematic of how this would generally look. You can see there are some Sparkplug B-enabled devices, and there will be more and more of them. If you've got one of those, it will be sending to an MQTT broker, and the IoTDB Sparkplug B adapter will just consume that directly. If you've got some legacy hardware, you would be using the Apache PLC4X Edge Gateway, which simply communicates using the other protocols and publishes Sparkplug B messages via MQTT, which are then consumed by the Sparkplug B adapter and inserted into IoTDB.
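
As a rough illustration of what the gateway and the adapter would agree on, here is a small Python sketch of the Sparkplug B topic namespace, which has the form `spBv1.0/<group_id>/<message_type>/<edge_node_id>[/<device_id>]`. The helper names and the example IDs (`plant1`, `plc4x-gateway`, `press-01`) are assumptions made up for this sketch. The Protobuf payload and the host application `STATE` topic are deliberately left out.

```python
NAMESPACE = "spBv1.0"
# Node-level messages have no device ID; device-level messages require one.
NODE_MESSAGE_TYPES = {"NBIRTH", "NDEATH", "NDATA", "NCMD"}
DEVICE_MESSAGE_TYPES = {"DBIRTH", "DDEATH", "DDATA", "DCMD"}


def _check_id(name, value):
    # IDs become MQTT topic levels, so they must not contain separators or wildcards.
    if not value or any(ch in value for ch in "/+#"):
        raise ValueError(f"invalid {name}: {value!r}")


def sparkplug_topic(group_id, message_type, edge_node_id, device_id=None):
    """Build the MQTT topic for a Sparkplug B node or device message."""
    _check_id("group_id", group_id)
    _check_id("edge_node_id", edge_node_id)
    if message_type in NODE_MESSAGE_TYPES:
        if device_id is not None:
            raise ValueError(f"{message_type} must not carry a device_id")
        return "/".join([NAMESPACE, group_id, message_type, edge_node_id])
    if message_type in DEVICE_MESSAGE_TYPES:
        if device_id is None:
            raise ValueError(f"{message_type} requires a device_id")
        _check_id("device_id", device_id)
        return "/".join([NAMESPACE, group_id, message_type, edge_node_id, device_id])
    raise ValueError(f"unknown message type: {message_type!r}")


def parse_sparkplug_topic(topic):
    """Split a topic into its parts, as a consuming adapter would have to."""
    parts = topic.split("/")
    if len(parts) not in (4, 5) or parts[0] != NAMESPACE:
        raise ValueError(f"not a Sparkplug B topic: {topic!r}")
    device_id = parts[4] if len(parts) == 5 else None
    # Re-building the topic validates the message type and the device_id rule.
    sparkplug_topic(parts[1], parts[2], parts[3], device_id)
    return {
        "group_id": parts[1],
        "message_type": parts[2],
        "edge_node_id": parts[3],
        "device_id": device_id,
    }


print(sparkplug_topic("plant1", "DDATA", "plc4x-gateway", "press-01"))
# prints spBv1.0/plant1/DDATA/plc4x-gateway/press-01
```

The birth messages (`NBIRTH`, `DBIRTH`) are where a node or device announces its metrics before sending data, which is the part that makes the data self-describing for a consumer.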


So what's needed in order to do that? First, Apache IoTDB needs to be able to connect to an external MQTT broker. Right now, if we want to use MQTT in IoTDB, we enable an embedded MQTT broker, but I think that's not really scalable in large installations. So one of the things we really need to change pretty soon is to refactor the MQTT adapter that we currently have, so that it no longer uses the embedded broker but becomes a client, and then, based on these changes, to implement a Sparkplug B adapter for IoTDB. Right now our MQTT adapter consumes JSON messages that are formatted in a special format, while Sparkplug B uses a binary payload that is Protobuf-encoded, so this will be a completely different component that is able to consume Sparkplug B communication.


Next thing is, which changes or extensions to Apache PLC4X are needed? On the one hand, I said that we support subscriptions. Right now we support subscriptions only on protocols that actually support them, and that's the Beckhoff ADS protocol, ProfiNet, KnxNet, and BacNet. I think that was all of them. But we always planned to emulate subscriptions on drivers that don't support them. For example, on Modbus, we would be polling a certain value in the background, and depending on the settings of the subscription, we would only be firing an event if the value actually changed.
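
The emulation idea can be sketched in a few lines of Python. This is only an illustration of polling with change detection, not how PLC4X implements or plans to implement it. `PollingSubscription` and its parameters are hypothetical names, and the "PLC" is just a list of simulated register values.

```python
import time


class PollingSubscription:
    """Emulates a change-of-state subscription on top of a plain read operation."""

    def __init__(self, read_value, on_change):
        self._read_value = read_value  # callable that performs one read
        self._on_change = on_change    # callable invoked with each new value
        self._has_value = False
        self._last_value = None

    def poll_once(self):
        """Read once; fire the callback only on the first value or on a change."""
        value = self._read_value()
        changed = not self._has_value or value != self._last_value
        if changed:
            self._has_value = True
            self._last_value = value
            self._on_change(value)
        return changed

    def run(self, interval_seconds, cycles):
        """Poll a fixed number of times, waiting interval_seconds between reads."""
        for _ in range(cycles):
            self.poll_once()
            time.sleep(interval_seconds)


readings = iter([17, 17, 18, 18, 18, 21])  # simulated register values
events = []
subscription = PollingSubscription(lambda: next(readings), events.append)
subscription.run(interval_seconds=0.01, cycles=6)
print(events)  # prints [17, 18, 21]
```

Six reads go to the device, but only three events reach the application, which is the behavior a subscriber would expect from a protocol with native change notifications.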

Another part that is widely used in PLC4X is what we call the scraper. You can think of it as a scheduler that handles various tasks of collecting sets of data in a given interval on a given connection. It badly needs an update to support subscriptions and sometimes even mixtures. One thing that I think we really need is the ability to have a set of variables that are actively read by polling them, but with a trigger for polling them that might be based on a subscription. That's one thing the current implementation doesn't support, and it definitely needs addressing.
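
A minimal Python sketch of that mixture might look like the following. It is an assumption about how such a feature could be shaped, not the scraper's actual design. `ChangeTrigger` stands in for a real subscription, `TriggeredScrapeJob` is a hypothetical job that polls a set of tags whenever the trigger fires, and the tag names are invented.

```python
class ChangeTrigger:
    """Stand-in for a subscription: notifies listeners when the watched value changes."""

    def __init__(self):
        self._listeners = []
        self._last_value = object()  # sentinel that never equals a real value

    def add_listener(self, listener):
        self._listeners.append(listener)

    def update(self, value):
        # Called with every value the subscription delivers (or every polled value).
        if value != self._last_value:
            self._last_value = value
            for listener in self._listeners:
                listener(value)


class TriggeredScrapeJob:
    """Actively reads a fixed set of tags, but only when its trigger fires."""

    def __init__(self, read_tag, tags, sink):
        self._read_tag = read_tag  # callable: tag name -> current value
        self._tags = list(tags)
        self._sink = sink          # callable receiving (trigger_value, sample)

    def run(self, trigger_value):
        sample = {tag: self._read_tag(tag) for tag in self._tags}
        self._sink(trigger_value, sample)


# Simulated PLC memory; a real setup would read through a driver connection.
plc = {"part_counter": 0, "temperature": 20.0, "pressure": 3.1}
collected = []

job = TriggeredScrapeJob(
    plc.__getitem__,
    ["temperature", "pressure"],
    lambda trigger, sample: collected.append((trigger, sample)),
)
trigger = ChangeTrigger()
trigger.add_listener(job.run)

trigger.update(1)           # first part finished: scrape
plc["temperature"] = 25.0
trigger.update(1)           # counter unchanged: no scrape
trigger.update(2)           # second part finished: scrape
print(len(collected))       # prints 2
```

The process values are still read by polling, but the moment of polling is decided by an event, for example a part counter changing, instead of a fixed interval.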

And the last thing: if I'm implementing a Sparkplug B consumer on the IoTDB side, then for this proposed edge gateway I would also need to implement a Sparkplug B emitter. But that should actually be quite a simple task, because there's already an Eclipse project that provides all the necessary infrastructure for that.


With these changes, I think IoTDB can become the central data hub for industrial automation systems. We can use it for implementing software that provides what in the industrial automation field is often called the Unified Namespace. It's a single place that you go to in order to get information about your equipment. It replaces the current hierarchy, where one layer builds on top of the other and every application separately asks for the same bits of information from the industrial hardware. This would change so that these applications no longer connect directly to the industrial hardware, but connect to our system and simply ask IoTDB for the current values.
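
The access pattern can be sketched with a tiny in-memory stand-in in Python. This is not IoTDB and not its client API. The `UnifiedNamespace` class and the example paths are assumptions made up to show the idea that the gateway publishes each value once and every application asks the hub, not the hardware.

```python
class UnifiedNamespace:
    """In-memory stand-in for the central hub: keeps the newest sample per path."""

    def __init__(self):
        self._latest = {}

    def publish(self, path, timestamp_ms, value):
        current = self._latest.get(path)
        # Keep only the newest sample per path; ignore samples that arrive late.
        if current is None or timestamp_ms >= current[0]:
            self._latest[path] = (timestamp_ms, value)

    def latest(self, path):
        """Return (timestamp_ms, value) of the newest sample for a path."""
        return self._latest[path]

    def browse(self, prefix):
        """List all known paths at or below a prefix, like browsing one machine."""
        return sorted(
            path for path in self._latest
            if path == prefix or path.startswith(prefix + ".")
        )


hub = UnifiedNamespace()

# The edge gateway is the only component that talks to the hardware.
hub.publish("root.plant1.press01.temperature", 1000, 71.5)
hub.publish("root.plant1.press01.pressure", 1000, 3.1)
hub.publish("root.plant1.press01.temperature", 2000, 72.0)

# Any number of applications read from the hub without touching the PLC.
print(hub.latest("root.plant1.press01.temperature"))  # prints (2000, 72.0)
print(hub.browse("root.plant1.press01"))
```

The load on the PLC then depends on how often the gateway reads, not on how many applications are interested in the data.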

And as soon as we have that in place, I think IoTDB can become the real central piece for building future custom MES systems. Because let's face it, the MES systems available out there just don't scale. At least I don't know a single one that actually scales across multiple nodes. They're all single-node systems, so what you can do in your factory is usually limited by the size of the server that you're able to purchase, and that doesn't work nicely. I've seen in the past that if you apply modern IoT or modern IT concepts, something like microservices or just plain services, to industrial automation, you can do things a lot better. And that's what I'm really hoping we will be able to change in the future.

I hope I was able to provide you with some interesting ideas and concepts. Thank you for watching and I think I'll be available for questions, I hope. But looking forward to seeing you all and have a great time.



Christofer Dutz

Solution Consulting Expert at Timecho

Board Member at Apache Software Foundation