How to implement Advanced Process Control (APC) in a manufacturing facility to improve process capability and product quality
I was recently asked to describe the software components and architecture required to support a 24/7 manufacturing environment. Well, I can tell you this is a very large field of study. I’ve been in manufacturing for over 20 years, starting as a process engineer in a semiconductor fab, moving to semiconductor process equipment manufacturing, and for the last 10 years designing service-oriented process control systems for facilities in the US, Europe, and Asia.
Through this experience I’ve learned firsthand that there are lots of moving pieces in a manufacturing facility, such as inventory management and manufacturing execution systems. Further complicating matters, each manufacturing facility I’ve worked with has had its own version of these systems with lots of custom modifications, each of which posed unique challenges to integration.
So in this post, I will focus on the subset of these components related to process control. I will describe how these individual systems can collaborate to form a comprehensive advanced process control (APC) framework to support improved process capability and quality based on a message-driven architecture.
First of all, I can tell you that the systems and approaches to process control have evolved greatly over the last decade. Many of us have read about the “lights-out” semiconductor fabs that are fully automated, operating with almost no human interaction. When reading these articles or browsing the APC/AEC conference papers, it can seem quite intimidating if your facility has not yet begun the move toward APC, much less full automation.
While those utopian places do exist, the rest of us mortals have to deal with legacy software systems, older process equipment (that may not even have a network card) and a laundry list of technologies and approaches that have been developed over years or decades of continuous system evolution. When someone demands improved process capability on equipment controlled by a 1984 Apple IIe, throwing one’s hands in the air in frustration is not an unreasonable reaction.
The good news is that there is a way to wrangle even the most stubborn pieces of legacy equipment into the 21st century. The key is first designing a flexible architecture for allowing communication between all the various systems that you have. With the right architecture in place, adapter software can be written to get these machines participating in APC.
Before diving in, we’ll need to define the individual systems that typically control a factory. My definitions here are extremely informal to keep things moving along. There are lots of resources online describing these systems, if you’d like more formal definitions.
Statistical Process Control (SPC)
Simply stated, SPC systems use statistics to monitor the quality of products created by a process and send alerts when something abnormal is detected. There are many ways to do this, but one of the best-known tools is the control chart. Control charts are good at showing whether a process is behaving normally and are usually the first step in controlling quality. If you do not have a software system running SPC, this might be some low-hanging fruit to improve your operations.
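To make the control chart idea a little more concrete, here is a minimal Python sketch of an individuals chart with three-sigma limits. The function names and the sample data are hypothetical, and I am assuming the limits can be estimated from a plain sample standard deviation of known-good baseline data. Real SPC packages usually estimate sigma from moving ranges and apply additional run rules.

```python
from statistics import mean, stdev

def control_limits(baseline, n_sigma=3.0):
    """Return (lower, center, upper) limits from in-control baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation (simplifying assumption)
    return center - n_sigma * spread, center, center + n_sigma * spread

def out_of_control(measurements, limits):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, _, upper = limits
    return [(i, x) for i, x in enumerate(measurements) if x < lower or x > upper]

# Hypothetical widget sizes recorded while the process was known to be stable.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)

# New production measurements; the third one is abnormal.
alerts = out_of_control([10.0, 10.1, 11.5, 9.9], limits)
for index, value in alerts:
    print(f"ALERT: measurement {index} = {value} is outside the control limits")
```

In a real facility the alert would be sent as a message to the factory rather than printed.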
Run-to-Run Control (R2R)
R2R control involves the use of measurement data to optimize a particular process step or set of steps. For example, consider a process that produces a widget. After each widget is made, its size is measured. If the size is too large, the process is adjusted so the next part will be targeted to a slightly smaller dimension. This is an example of a feedback control system. R2R control systems generally start off controlling a single process, but as they grow they can begin to collaborate to achieve factory-wide control solutions.
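The widget example can be sketched as a small feedback loop. The code below uses an exponentially weighted moving average (EWMA) to estimate the process offset, which is one common choice for run-to-run control and not the only one. The process model (size = setting + offset), the class name, and all the numbers are assumptions made for illustration.

```python
class EwmaController:
    """Run-to-run controller for a process modeled as: size = setting + offset."""

    def __init__(self, target, weight=0.3, initial_offset=0.0):
        self.target = target
        self.weight = weight          # 0..1, higher reacts faster to new data
        self.offset = initial_offset  # current estimate of the process offset

    def recommend(self):
        """Setting expected to bring the next part onto target."""
        return self.target - self.offset

    def feedback(self, setting, measured_size):
        """Blend the offset observed on the last run into the estimate."""
        observed = measured_size - setting
        self.offset = self.weight * observed + (1 - self.weight) * self.offset

controller = EwmaController(target=10.0)
true_offset = 0.5  # hypothetical offset that the controller does not know about

for run in range(1, 11):
    setting = controller.recommend()
    size = setting + true_offset  # stand-in for making and measuring a part
    controller.feedback(setting, size)
    print(f"run {run}: setting={setting:.3f} size={size:.3f}")
```

The first part comes out too large, so each later setting is nudged smaller until the measured size settles near the target.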
Fault Detection and Classification (FDC)
FDC involves monitoring the behavior of the manufacturing equipment during operation and detecting events that might affect the quality of the product. A simple example might be an application that parses and analyzes the log files produced during a manufacturing run. If an issue is detected such as a power fluctuation, a message is sent to the factory with the details.
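As a rough sketch of the log-file example, the Python below scans a run log for voltage readings outside a tolerance band and builds one message per fault. The log line format, the nominal voltage, the tolerance, and the message fields are all assumptions; every tool vendor will have its own format.

```python
import json
import re

# Assumed log line format: "<timestamp> <part id> voltage=<volts>"
LINE_PATTERN = re.compile(r"^(?P<time>\S+) (?P<part>\S+) voltage=(?P<volts>[\d.]+)$")

def detect_power_faults(log_lines, nominal=120.0, tolerance=5.0):
    """Return one fault message (a dict) per reading outside nominal +/- tolerance."""
    faults = []
    for line in log_lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines this sketch does not understand
        volts = float(match.group("volts"))
        if abs(volts - nominal) > tolerance:
            faults.append({
                "type": "POWER_FLUCTUATION",
                "part": match.group("part"),
                "time": match.group("time"),
                "volts": volts,
            })
    return faults

# Hypothetical log from one manufacturing run.
log = [
    "2024-01-01T08:00:00 ABC voltage=119.8",
    "2024-01-01T08:00:05 ABC voltage=112.3",
    "2024-01-01T08:00:10 ABC voltage=120.1",
]
faults = detect_power_faults(log)
for fault in faults:
    print(json.dumps(fault))  # a real system would send this message to the factory
```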
Advanced Process Control (APC)
APC brings together all of these pieces of software. In a fully realized APC system, each component (SPC, R2R, FDC) can interact with the others to form a comprehensive factory control system.
So now you might say, “This all sounds great but let’s be realistic. I’ve got an Apple IIe running my critical process equipment and the vendor is long since gone. That machine is never going to benefit from all this stuff.”
Don’t be so sure. But before we get to that specific scenario, let’s cover some of the more common obstacles to implementing APC.
Obstacles to achieving the full benefits of APC
– R2R control algorithms are “mixed” within the machine control software
This is by far the most common issue I have seen. The machine that is executing the process also contains the run-to-run algorithm used to compute the process settings (I will refer to this algorithm as f(x) below). This has many downsides.
1. Each identical machine has its own copy of the control algorithm. Algorithms may become out of sync with each other producing unanticipated results.
2. It is impossible to change f(x) without touching the machine code. For a highly automated system, this poses a risk to operations every time f(x) is updated. Those responsible for quality may sometimes push back on making improvements to f(x) because of this risk, even if these changes will improve quality.
3. Changes to f(x) require updating each machine individually. This causes long delays in implementing process improvements. Depending on the number of machines, f(x) improvements may take days to weeks to fully implement. Rolling back a change is equally difficult.
4. Changes to f(x) are done infrequently because of the operational risk and downtime required to update individual machines. In my experience, improving process capability is best achieved by making lots of small changes over time and analyzing the result of each change in isolation. When making a change requires downtime, multiple changes to f(x) are often grouped together. This approach makes analyzing the true impact of these changes difficult.
5. Changes to f(x) are generally “dumbed down” because complex solutions are just too risky to implement in the machine code due to the inability to fully test these solutions offline. This leaves potential process improvements unrealized.
Solution: The algorithms used to control the process must be separated from the machine control code. A common way to do this is to place the control algorithms into a dedicated R2R control system. This system is typically a service that can be called by any piece of equipment on the factory floor and serves as a single point for all run-to-run control decisions. This frees the machine code from the burden of managing algorithms which results in cleaner code that is updated less often.
The design of this R2R system can be tailored to each manufacturing environment but should include several key features:
1. System should establish a communication method that is platform and language independent. XML or JSON may be a good fit for this communication. Sharing data between systems using a common set of database tables is not a good idea. I would be more than happy to expand on this point if someone is interested.
2. System should provide users with the ability to implement new control algorithms as well as the ability to roll back control algorithms. Only by providing the capability to quickly revert to a known good state will a manufacturing organization agree to allow frequent updates to f(x).
3. Control algorithms must be fully testable, offline. This is the most critical piece of a run-to-run control system. Unit tests must be written to handle not only the optimal process conditions but all conditions that might occur in manufacturing. Building a suite of tests that fully exercises the control system is the best way to guard against unexpected behavior. This is the reason a message-driven architecture is useful. Unit tests can be written to send messages to a development R2R system to test its behavior. For example, a unit test might send several good measurements and several bad measurements to the R2R system. After sending this data, the test would then ask the R2R system to provide a process recommendation for the next run. The test would conclude by verifying that the answer is correct, given the information that was provided.
4. R2R control system decisions must be logged and traceable. The data used to make each individual process decision must be available so that manufacturing can easily review any process decision made by the system. Often, these investigations will yield information that will further improve the control algorithm. This makes traceability a very important feature, both for improving the system and for gaining the trust of the operations team.
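The four features above can be sketched together in a few dozen lines of Python. This is a toy under stated assumptions, not a real R2R system: the JSON message fields, the class and function names, the two algorithm versions, and the rule that measurements outside fixed limits are rejected are all hypothetical. The point is that a message-driven service can be exercised by a test that only sends and receives messages.

```python
import json

def full_correction_v1(target, values):
    """Hypothetical algorithm v1: offset the setting by the full mean error."""
    if not values:
        return target
    return target - (sum(values) / len(values) - target)

def half_correction_v2(target, values):
    """Hypothetical algorithm v2: offset the setting by half the mean error."""
    if not values:
        return target
    return target - 0.5 * (sum(values) / len(values) - target)

class R2RService:
    def __init__(self, target, lower, upper):
        self.target, self.lower, self.upper = target, lower, upper
        self.values = []        # measurements accepted for feedback
        self.algorithms = {}    # version name -> function
        self.history = []       # deployed versions, active one last
        self.decision_log = []  # one traceable record per recommendation

    def deploy(self, version, func):
        self.algorithms[version] = func
        self.history.append(version)

    def rollback(self):
        """Revert to the previously active algorithm version, if there is one."""
        if len(self.history) > 1:
            self.history.pop()

    def handle(self, raw_message):
        """Process one JSON request and return a JSON reply."""
        msg = json.loads(raw_message)
        if msg["type"] == "measurement":
            accepted = self.lower <= msg["value"] <= self.upper
            if accepted:
                self.values.append(msg["value"])
            return json.dumps({"type": "ack", "accepted": accepted})
        if msg["type"] == "recommend":
            version = self.history[-1]
            setting = self.algorithms[version](self.target, self.values)
            record = {"version": version, "inputs": list(self.values), "setting": setting}
            self.decision_log.append(record)
            return json.dumps({"type": "recommendation", **record})
        return json.dumps({"type": "error", "reason": "unknown message type"})

def test_bad_measurements_are_ignored():
    service = R2RService(target=10.0, lower=9.0, upper=11.0)
    service.deploy("v1", full_correction_v1)
    for value in (10.2, 10.4, 55.0, -3.0):  # two good and two bad measurements
        service.handle(json.dumps({"type": "measurement", "value": value}))
    reply = json.loads(service.handle(json.dumps({"type": "recommend"})))
    assert abs(reply["setting"] - 9.7) < 1e-9  # mean of the good values is 10.3
    assert service.decision_log[-1]["inputs"] == [10.2, 10.4]
    return service

service = test_bad_measurements_are_ignored()

# Deploy a new algorithm version, then roll back to the known good one.
service.deploy("v2", half_correction_v2)
v2_reply = json.loads(service.handle(json.dumps({"type": "recommend"})))
service.rollback()
v1_reply = json.loads(service.handle(json.dumps({"type": "recommend"})))
print(v2_reply["version"], v1_reply["version"])
```

Because the test talks to the service only through messages, the same test could be pointed at a development R2R system running on another machine without changing its logic.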
– Fault Detection Systems non-existent or very rudimentary
FDC is probably the most overlooked component of an APC system. Many facilities rely solely on inspections, measurements, and SPC systems to catch product defects. Analyzing the information that is available during a process run can be the only protection from certain kinds of quality issues. However, creating a unified FDC system is often a chore because information availability and format vary greatly vendor to vendor. To begin this process, I would suggest making a list of critical tools and then selecting a tool where the run information is most readily available. This way, you are not struggling with FDC architecture at the same time you are struggling with the equipment.
– SPC, Run-to-Run, FDC have no method to communicate with each other or other systems
This is extremely common in the facilities that I have seen. Often, these systems have grown over the years and were developed by different vendors or internal teams. However, to truly achieve a factory-wide APC system, protocols must be developed which allow these systems to communicate. Communication enables the individual components to operate as a whole to improve quality and process capability.
Below are a few scenarios to illustrate how these systems might communicate.
Scenario 1: The FDC system analyzes a log file and determines that part “ABC” experienced a minor power fluctuation during processing. The FDC system sends a message to the R2R system that part “ABC” should not be used for process control feedback. Note that part “ABC” is not necessarily defective; it is simply not representative of a normal processing run, and using it might add noise to the control system, which could increase process variation (sigma).
Scenario 2: The SPC system detects that the measurement for part “DEF” was well outside the 6 sigma limit and determines that it should be scrapped. SPC should be able to tell R2R control that part “DEF” should not be used for process feedback. In this case, SPC may also want to notify other factory systems, such as those that control inventory.
Scenario 3: The maintenance team has taken machine #1 down for preventative maintenance. SPC and R2R systems must be notified that a machine event has occurred. R2R control may decide to delete all previous feedback data and start from default values. SPC may decide to purge past run data and start a new control chart to avoid false alarms.
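The three scenarios can be sketched with a tiny publish/subscribe bus. The bus here is an in-process stand-in for whatever messaging technology a factory actually uses, and the topic names, message fields, and class names are all hypothetical.

```python
from collections import defaultdict

class MessageBus:
    """Tiny in-process stand-in for a real message broker."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

class R2RFeedbackStore:
    """Holds the measurements that R2R control is allowed to use as feedback."""

    def __init__(self, bus):
        self.runs = {}  # machine -> {part: measurement}
        bus.subscribe("part.exclude", self.on_exclude)
        bus.subscribe("machine.maintenance", self.on_maintenance)

    def add_run(self, machine, part, value):
        self.runs.setdefault(machine, {})[part] = value

    def on_exclude(self, message):
        # Scenarios 1 and 2: FDC or SPC flags a part as unfit for feedback.
        for parts in self.runs.values():
            parts.pop(message["part"], None)

    def on_maintenance(self, message):
        # Scenario 3: drop all feedback for the machine and start over.
        self.runs.pop(message["machine"], None)

    def feedback_values(self, machine):
        return list(self.runs.get(machine, {}).values())

bus = MessageBus()
store = R2RFeedbackStore(bus)
store.add_run("machine1", "ABC", 10.4)
store.add_run("machine1", "DEF", 14.9)
store.add_run("machine1", "GHI", 10.1)

bus.publish("part.exclude", {"source": "FDC", "part": "ABC", "reason": "power fluctuation"})
bus.publish("part.exclude", {"source": "SPC", "part": "DEF", "reason": "outside control limits"})
after_exclusions = store.feedback_values("machine1")

bus.publish("machine.maintenance", {"machine": "machine1"})
after_maintenance = store.feedback_values("machine1")
print(after_exclusions, after_maintenance)
```

Neither FDC nor SPC needs to know how R2R stores its data. They only publish a message, and any system that cares, such as inventory, can subscribe to the same topic.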
“Enough already! Apple IIe … remember that? How can I make that work in this APC fantasyland where everything communicates seamlessly working toward factory wide goals of quality and process capability?” Well, it will take a little work but it can be done.
The first thing to do is list out all the key process input variables (KPIVs) that the Apple IIe system controls. These KPIVs are typically power, time, pressure, rotation, etc. These are the things that are contained in the program or recipe selected by the operator to run the tool.
Second, determine all the key process output variables (KPOVs) that are used to determine whether the process is producing good parts. These might be thickness measurements, hole diameters, etc. Basically, whatever is used to determine if the part meets specification.
Next, interview the operators/techs/engineers and determine how they control this equipment in live production. This process usually takes several meetings. Operator 1 might tweak the power if they notice the last 3 parts were on the low end of the specification. Operator 2 may never touch the machine and might simply run the equipment until a specification is violated, then call maintenance. A flow diagram of the logic is useful here.
With this information in hand, you have everything you need to automate this process. You will create the message schemas required to communicate the KPIVs and KPOVs at each point in the process. Once the message schemas are agreed to, each system owner will modify their systems to generate/receive/process these messages.
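A message schema does not have to be complicated. As a minimal sketch, the Python below defines one message carrying KPIVs to the tool and one carrying KPOVs back, and serializes both to JSON. The class names, field names, and units are assumptions for illustration; your own KPIV and KPOV lists would determine the real fields.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RecipeRecommendation:
    """KPIVs sent from APC to the tool before a run (field names are assumed)."""
    machine: str
    part: str
    power_watts: float
    time_seconds: float
    pressure_torr: float

@dataclass
class MeasurementReport:
    """KPOVs sent from metrology to APC after a run (field names are assumed)."""
    machine: str
    part: str
    thickness_nm: float

MESSAGE_TYPES = {cls.__name__: cls for cls in (RecipeRecommendation, MeasurementReport)}

def to_json(message):
    """Serialize a message with a type tag so any receiver can route it."""
    return json.dumps({"type": type(message).__name__, **asdict(message)})

def from_json(raw):
    """Rebuild a message; unknown types or mismatched fields raise an error."""
    data = json.loads(raw)
    cls = MESSAGE_TYPES[data.pop("type")]
    return cls(**data)

sent = RecipeRecommendation("machine1", "ABC", power_watts=250.0,
                            time_seconds=42.0, pressure_torr=1.5)
wire = to_json(sent)
received = from_json(wire)
print(wire)
```

Because the wire format is plain JSON, the sending and receiving systems do not need to share a language or platform; they only need to agree on the fields.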
Finally, a modern computer will be placed next to the existing Apple IIe. (Put it on a cart with a network cable, hang it from a chain, whatever works.) This computer will serve as the user interface and will tell the operator which program/recipe to select on the Apple IIe for each process run, based on APC recommendations. This interface will not only provide a way to get consistent, operator-independent process decisions to the Apple IIe, but it will also allow the operator to tell the factory if a different program/recipe is used. For an engineering experiment, for example, the recommendation by APC may not be appropriate. In these cases, the operator can override the APC values and inform the factory that these parts will be processed differently. This decision can be used by other control systems to mark these parts as “experimental” so that they are excluded from use in production control algorithms.
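The override logic on that operator station can be very small. Here is a sketch, assuming hypothetical recipe names and record fields; requiring a reason for every override is my own assumption, added so that the decision stays traceable.

```python
import json

def build_run_record(part, recommended_recipe, selected_recipe, reason=None):
    """Describe one run, flagging it as experimental if the operator overrode APC."""
    overridden = selected_recipe != recommended_recipe
    if overridden and not reason:
        raise ValueError("an override needs a reason so the decision is traceable")
    return {
        "part": part,
        "recommended_recipe": recommended_recipe,
        "selected_recipe": selected_recipe,
        "experimental": overridden,
        "reason": reason,
    }

def usable_for_feedback(run_record):
    """Production control algorithms should skip experimental parts."""
    return not run_record["experimental"]

normal = build_run_record("ABC", "RECIPE_7", "RECIPE_7")
override = build_run_record("DEF", "RECIPE_7", "RECIPE_9", reason="engineering experiment")
print(json.dumps(override))  # a real station would send this message to the factory
```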
To say I have glossed over some implementation details is an understatement. However, I hope that this post shows the basic steps for bringing even legacy equipment into the modern age of process control.
In summary, if you want to improve the quality and process capability of your manufacturing facility, invest some time evaluating how you are utilizing the systems you use today. Make an inventory of the pieces that you have, decide what pieces you are missing, and then begin thinking about a unified approach which allows these systems to interoperate.
I cannot stress enough that a standardized communication method that is language and platform independent is the key to enabling the benefits of an APC system. For more information, refer to SEMI E133.1 – The Process Control System Standard. This is an XML process control standard I co-authored that is used within the semiconductor industry. If you are in another industry, SEMI E133.1 may help drive discussions about what format is right for you. Unfortunately, the E133.1 standard is owned by SEMI and it is not free. If you’d like to discuss whether it is applicable for your company, feel free to drop me a line.