Industrial Evolution – Helping Engineers to Slash Break-Down Maintenance Time with Gen AI

In-Short

CaveatWisdom

Caveat: If the root cause of a machine breakdown is not identified in time, or proper procedures for fixing the problem safely are not followed in a challenging industrial environment, the result can be heavy production loss and safety risks for maintenance personnel.

Wisdom: Troubleshooting and fixing machines requires expert-level understanding and extensive experience in interpreting machine problems. By leveraging cutting-edge Generative AI and IoT technologies that help identify the root cause and fix problems, breakdown maintenance time can be slashed from days to a few hours.

In-Detail

Let’s understand the complex industrial environment before we discuss the solution. Building and operating any major industry such as Cement, Power, Aluminium Smelter Plants, Water Treatment Plants, etc. involves many engineering disciplines like Process, Civil, Mechanical, Electrical, Instrumentation, Safety, Chemical and Control Systems.

For example, a Water Treatment plant that supplies water to an entire city requires Process and Environmental Engineering to treat water, Chemical Engineering for chlorination and sludge handling, Civil Engineering to design and construct huge Clariflocculators, Aerators, Filter Beds, Sludge tanks and Pipelines, Mechanical Engineering for huge equipment like Mixers, Rotors, Pumping Motors, Valves, gates, etc., Electrical Engineering for powering the equipment, Instrumentation Engineering for measuring water parameters, and Control Systems Engineering for controlling equipment. All the systems are interconnected, depend on each other’s function, and are controlled by SCADA systems.

Dissecting Breakdown Maintenance Time

In a complex industrial environment where multiple systems are interconnected, a problem in any one system can affect the whole process of the plant. Let’s first look at some important activities performed by operation and maintenance engineers during breakdown maintenance when a problem arises.

  1. First, engineers must follow the safety protocols and shut down the affected and interrelated systems that could endanger personnel and equipment.
  2. Coordinate with multiple stakeholders such as production managers, quality control, logistics, leadership and the safety team to plan the maintenance activities.
  3. For troubleshooting, go through many equipment manuals and electrical and control wiring diagrams, and observe and interpret parameters from the SCADA systems.
  4. Following proper safety procedures, dismantle the equipment, replace the faulty parts, fix the wiring, reassemble the equipment and do test runs before returning to production.

When we dissect the total breakdown maintenance time and look at where most of it is spent, we find that the bulk goes into reading through huge equipment manuals and wiring diagrams, coordination efforts and following procedures.

Solution to Slash Breakdown Maintenance Time from Days to Few Hours

The solution involves deploying multiple Artificial Intelligence (AI) agents that can coordinate with each other and assist engineers in troubleshooting the root cause of the problem. These AI agents can analyse data from various sources, such as equipment manuals, electrical and control wiring diagrams, and SCADA systems, to quickly identify the issue. This spares engineers from manually sifting through vast amounts of information and saves valuable time.

Moreover, these AI agents can facilitate better coordination among multiple stakeholders, including production managers, quality control, logistics, leadership, and safety teams. This ensures that maintenance activities are planned and executed efficiently, minimizing downtime and production loss.

By following proper safety procedures, engineers can dismantle the equipment, replace faulty parts, fix wiring, reassemble the equipment, and conduct test runs before resuming production. The AI agents can also provide real-time guidance and support throughout this process, ensuring that all steps are performed correctly and safely.

In the above architecture, AI agents can be developed on Amazon Bedrock using its multi-agent collaboration feature. The industrial knowledge base for the agents can be deployed on a cost-effective Amazon Aurora PostgreSQL database with its vector database add-on. Live data from the factory floor SCADA systems can be ingested into AWS IoT SiteWise, where the agents can read and interpret it along with the knowledge base. Plant engineers can interact with the AI agents from their mobile apps through API Gateway and Lambda functions.
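As a rough illustration of the last hop in that architecture, here is a minimal sketch of a Lambda function behind API Gateway that forwards an engineer's question to a Bedrock agent. It assumes the boto3 `bedrock-agent-runtime` client and its `invoke_agent` call; the agent identifiers, the request fields `question` and `sessionId`, and the function names are placeholders invented for this example, not taken from the repo mentioned in this post.

```python
import json
import uuid

# Hypothetical identifiers: replace with the values of your own Bedrock agent.
AGENT_ID = "AGENT_ID_PLACEHOLDER"
AGENT_ALIAS_ID = "AGENT_ALIAS_ID_PLACEHOLDER"


def ask_agent(client, question, session_id):
    """Send one question to a Bedrock agent and return the full text answer."""
    response = client.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=session_id,
        inputText=question,
    )
    # The answer arrives as an event stream; text is carried in "chunk" events.
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


def lambda_handler(event, context, client=None):
    """Entry point for an API Gateway proxy integration (assumed request shape)."""
    if client is None:
        import boto3  # imported lazily so the module also loads without AWS access
        client = boto3.client("bedrock-agent-runtime")

    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "").strip()
    if not question:
        return {"statusCode": 400, "body": json.dumps({"error": "question is required"})}

    # Reusing the same session id lets the agent keep conversational context.
    session_id = body.get("sessionId") or str(uuid.uuid4())
    answer = ask_agent(client, question, session_id)
    return {"statusCode": 200, "body": json.dumps({"sessionId": session_id, "answer": answer})}
```

Passing the client in as an optional argument is only a convenience for trying the handler locally with a stub; in Lambda the default boto3 client is used.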

In my previous post I discussed industrial protocols from the perspective of IoT and Cloud; there you can find more information on integrating live factory data with the cloud.

In this repo I have demonstrated integrating Gen AI with IoT using a wind energy use case. There you can find code for the Lambda functions, instructions for creating the Gen AI agent, and steps for configuring a simulation OPC server with Kepware.

In summary, the integration of Generative AI and IoT technologies in breakdown maintenance not only enhances the efficiency of troubleshooting but also ensures the safety of personnel and equipment. This innovative approach can revolutionize the way industries handle maintenance, leading to significant time and cost savings.

Understanding Industrial Protocols in the Perspective of IoT and Cloud

In-Short

CaveatWisdom

Caveat: To take advantage of the latest technologies like Generative AI on the Cloud, data is being ingested into the Cloud from many different sources. When it comes to real-time industrial data, it’s important to understand the nature of the data and its flow from its source on the shop floor to its destination in the cloud.

Wisdom: To understand the nature of data and its flow, we need to understand the protocols involved at different levels of data flow, like Modbus, Profibus, EtherCAT, DNP3, OPC, MQTT, etc.

In-Detail

Before we jump into IoT and Cloud, it’s important to understand that sophisticated industrial automation systems, which include many sensors, instruments, actuators, PLCs, SCADA, etc., existed decades before the advent of Cloud and IoT technologies.

History

Industry 1.0 started with the advent of machines powered by steam engines, which replaced tools powered by human labour. This was during the 1760s.

Industry 2.0 started when machines were powered by electricity, which made production more efficient. This was during the 1870s.

Industry 3.0 started when machines were controlled by computers (Programmable Logic Controllers, PLCs) and SCADA (Supervisory Control and Data Acquisition) systems. This was during the 1970s.

Industry 4.0 started with the advent of Cloud and IoT technologies around 2011, which enabled analysing huge amounts of industrial data alongside enterprise data.

Automation in industries is implemented with the help of Sensors, Instruments, PLCs, Actuators, Relays and SCADA systems.

Protocols:

To establish communication between sensors, instruments, PLCs and SCADA systems, and also to support their own products, many major industrial automation companies like Schneider Electric, Siemens, Allen-Bradley, GE, Mitsubishi, etc., have developed industrial protocols like Modbus, Profibus, EtherCAT, DNP3, etc. If we go to any industry like Refineries, Cement Plants, Wind Farms, etc., we find automation systems and instruments working on these protocols.

Modbus: Modbus is a data communication protocol that allows devices to communicate with each other over networks and buses. It can run over serial lines, TCP/IP, and UDP, and the same request and response messages are used regardless of the connection type; only the framing around them changes.
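To make the "same protocol, different connection" point concrete, here is a small sketch that builds one Modbus request (read holding registers, function code 0x03) and wraps it for both a serial RTU link and a TCP connection. It uses only the Python standard library and follows the public Modbus frame layout; the function names are made up for this illustration, and it does not open a connection to any device.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # standard Modbus function code


def read_holding_pdu(start_address, quantity):
    """Protocol data unit: function code, start address, register count."""
    return struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, quantity)


def crc16_modbus(data):
    """CRC-16 used by Modbus RTU (initial value 0xFFFF, reflected polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def rtu_frame(unit_id, pdu):
    """Serial (RTU) framing: unit id + PDU + CRC, with the CRC low byte first."""
    body = bytes([unit_id]) + pdu
    return body + struct.pack("<H", crc16_modbus(body))


def tcp_frame(transaction_id, unit_id, pdu):
    """Modbus TCP framing: MBAP header (transaction, protocol 0, length, unit id) + PDU."""
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return header + pdu


def parse_read_response_pdu(pdu):
    """Decode a successful read response PDU into a list of 16-bit register values."""
    function_code, byte_count = struct.unpack(">BB", pdu[:2])
    if function_code & 0x80:
        raise ValueError(f"Modbus exception code {pdu[1]}")
    return list(struct.unpack(f">{byte_count // 2}H", pdu[2:2 + byte_count]))


pdu = read_holding_pdu(start_address=0, quantity=2)
print(rtu_frame(1, pdu).hex(" "))     # 01 03 00 00 00 02 c4 0b
print(tcp_frame(1, 1, pdu).hex(" "))  # 00 01 00 00 00 06 01 03 00 00 00 02
```

The five PDU bytes `03 00 00 00 02` are identical in both frames; the serial variant adds a unit id and a CRC, while the TCP variant adds the MBAP header and leaves error checking to TCP.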

Profibus: Profibus is a fieldbus communication standard for industrial automation that allows devices like sensors, controllers, and actuators to share process values. It’s a digital network that connects field sensors to control systems. Profibus is used in many industries, including manufacturing, process industries, and factory automation.

Many of these industrial protocols are synchronous in nature, with a client-server architecture. They are designed to operate within the plant network, delivering data with very low latency (sub-millisecond in some cases) for machine operations.

As industrial systems became more and more complex, with equipment from multiple providers in a single plant, major industrial automation companies formed an organization called the OPC Foundation to define a standard protocol, OPC, that is interoperable between the major industrial protocols. Initially, OPC DA stood for “OLE for Process Control Data Access” and was based on Microsoft’s OLE (Object Linking and Embedding) technology, which was used for communication between applications in the Windows ecosystem. OPC DA has since become a legacy protocol (still used in many older plants), and OPC UA has evolved to be interoperable across multiple operating systems. Today the acronym OPC stands for “Open Platform Communications” and UA stands for “Unified Architecture”. You can find more information at the OPC Foundation.

As a first step to ingest data into the cloud, we need to convert the industrial protocols to OPC UA. We can write driver software for that in .NET or Java using the OPC standard from the OPC Foundation, or we can use software from companies that have already done it. There are many providers of OPC software, and open-source implementations are also available. Some major providers are Kepware OPC Server and MatrikonOPC.

The OPC server polls data from the different devices through its industrial protocol drivers and makes that data available to clients over OPC.

These industrial protocols, including OPC, are heavier: the data packet size is larger, and because of their synchronous nature it is difficult to maintain a reliable connection over the internet or to send data to a remote server in another geography. This is the reason the lightweight MQTT protocol was invented by Andy Stanford-Clark (IBM) and Arlen Nipper (then working for Eurotech, Inc.), who authored the first version of the protocol in 1999.

Because of MQTT’s lightweight nature and pub-sub (publish-subscribe) model, it has been widely adopted for IoT (Internet of Things) since the advent of cloud computing.
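To show what the pub-sub model means in practice, here is a toy in-process sketch. It is not an MQTT client or broker: `TinyBroker` and `topic_matches` are names invented for this illustration, and only the topic filter wildcards (`+` for one level, a trailing `#` for everything below) follow MQTT's rules. Details such as QoS, retained messages and `$`-prefixed topics are left out.

```python
def topic_matches(topic_filter, topic):
    """MQTT-style matching: '+' matches one level, a trailing '#' matches the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches the parent level and everything below it
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class TinyBroker:
    """In-process stand-in for a broker (no network, no QoS, no retained messages)."""

    def __init__(self):
        self._subscriptions = []  # list of (topic_filter, callback)

    def subscribe(self, topic_filter, callback):
        self._subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        # The publisher does not know its consumers and does not wait for a reply.
        for topic_filter, callback in self._subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)


broker = TinyBroker()
received = []
broker.subscribe("plant1/+/temperature", lambda t, p: received.append((t, p)))
broker.publish("plant1/pump1/temperature", 71.5)
broker.publish("plant1/pump1/vibration", 0.3)  # nobody subscribed, nothing happens
print(received)  # [('plant1/pump1/temperature', 71.5)]
```

Unlike a client-server poll, the publisher here sends a value once and moves on, and any number of subscribers can be added later without changing the publisher.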

As a second step to ingest data into the cloud, we must convert OPC to MQTT. The IoT SiteWise OPC UA collector does this for us by becoming a client to the OPC server, subscribing to or polling data from the OPC server and then converting it to MQTT. The IoT SiteWise OPC UA collector is a component of AWS IoT Greengrass, an edge runtime that helps build, deploy and manage IoT applications on devices.
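Conceptually, the conversion amounts to taking a value read from an OPC UA node and republishing it as a small message on a topic. The sketch below shows one possible mapping in plain Python. The `OpcReading` fields, the topic scheme and the JSON payload are assumptions made up for this illustration; they are not the format the IoT SiteWise OPC UA collector uses internally.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class OpcReading:
    """One value read from an OPC UA server (simplified, illustrative fields)."""
    node_id: str           # assumed string form, e.g. "ns=2;s=Turbine1.RotorSpeed"
    value: float
    good_quality: bool
    source_time: datetime  # timezone-aware timestamp from the server


def to_mqtt_message(reading, site):
    """Map an OPC UA reading to a (topic, JSON payload) pair."""
    # Take the string identifier after ";s=" and turn its dots into topic levels.
    tag = reading.node_id.split(";s=", 1)[-1]
    topic = f"{site}/" + tag.replace(".", "/")
    payload = json.dumps({
        "value": reading.value,
        "quality": "GOOD" if reading.good_quality else "BAD",
        "timestamp": reading.source_time.astimezone(timezone.utc).isoformat(),
    })
    return topic, payload


reading = OpcReading(
    node_id="ns=2;s=Turbine1.RotorSpeed",
    value=14.2,
    good_quality=True,
    source_time=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
)
topic, payload = to_mqtt_message(reading, site="windfarm1")
print(topic)    # windfarm1/Turbine1/RotorSpeed
print(payload)
```

Carrying the source timestamp and quality flag in the payload matters because the message may reach the cloud later than it was measured.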

Important Points to Note:

Data transitions from synchronous industrial protocols to an asynchronous IoT protocol, so real-time data may arrive in the cloud with some latency.

  1. Key decisions to control the machine should be made at the factory level. This can be achieved by running Lambda functions on IoT Greengrass.
  2. Important data processing for tasks such as predictive and preventive maintenance can be done with ML inference components in IoT Greengrass.
  3. After data is ingested into the cloud, tasks like analytics, visualization and integration with other services such as Generative AI apps and SAP systems can be done in the Cloud.
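The split described in these points can be sketched as follows: the time-critical decision is taken locally, and the reading is only queued for later upload. This is plain Python rather than Greengrass or Lambda code, and the class name, the callback and the vibration trip limit are hypothetical values chosen for the example.

```python
from collections import deque

VIBRATION_TRIP_MM_S = 7.1  # hypothetical trip limit, for illustration only


class EdgeController:
    """Decides locally, then queues telemetry for asynchronous upload."""

    def __init__(self, stop_machine, max_buffered=1000):
        self._stop_machine = stop_machine         # local action, no cloud round trip
        self.outbox = deque(maxlen=max_buffered)  # drained later by a cloud publisher

    def on_reading(self, machine_id, vibration_mm_s):
        tripped = vibration_mm_s >= VIBRATION_TRIP_MM_S
        if tripped:
            self._stop_machine(machine_id)  # time-critical: runs at the edge
        # Not time-critical: the cloud receives this whenever the link allows.
        self.outbox.append(
            {"machine": machine_id, "vibration_mm_s": vibration_mm_s, "tripped": tripped}
        )
        return tripped


stopped = []
controller = EdgeController(stop_machine=stopped.append)
controller.on_reading("pump1", 3.0)
controller.on_reading("pump1", 9.4)
print(stopped)                 # ['pump1']
print(len(controller.outbox))  # 2
```

Even if the connection to the cloud is slow or down, the stop command is issued immediately, while analytics in the cloud simply see the buffered readings later.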