Smart Data Visualization for High-Throughput Data

Whether you operate high-throughput equipment or only work with its results, visualizing the data efficiently is key to sending a quick and clear message. At Avantium, we operate units that test up to 64 reactors in parallel. The amount of data generated therefore builds up quickly, and conventional plots create more confusion than clarity. Smart visualization figures are needed so that operators and scientists can quickly judge unit performance or interpret data.

How to visualize Trending Data?

Assume that the reaction pressure is a critical process parameter that you want to monitor closely. A common approach would be to create 64 scatter plots (one per reactor) showing the trend of reaction pressure versus time (see Figure 1). Reviewing these 64 individual figures would be a lengthy process, and so many plots could even become confusing. Alternatively, you could make a single scatter plot that includes the data of all 64 reactors and use different colors (or markers) to differentiate them. In this case, the density of the data is so high that one can barely distinguish one reactor from another, and little can be learned about what is happening with this critical process parameter, as shown in the picture.

Figure 1: Reaction pressure as a function of time

A smarter way to visualize trending data

A smart figure allows both operators and scientists to quickly identify whether a process variable is within specifications or whether action is required; for example, a pressure drift could indicate plugging or bypassing of the catalytic bed. The smart plot we have chosen for this case is called a cell plot. At Avantium, we automatically generate this type of figure in JMP from the trending data of our Flowrence software. A typical cell plot representing the reaction pressure as a function of time, for 64 reactors working in parallel, is shown in Figure 2.

In this case, the color represents a continuous scale, with red and blue marking the upper and lower specification limits, respectively. Looking at the graph, you can now easily see that the pressure in all 64 reactors is within the specification limits. In addition, because the color of each reactor varies only slightly over time, one can conclude that the pressure in each reactor is stable. One can also quickly detect that R21-R23 show a higher pressure trend than the other reactors. This fast analysis could be followed by a more detailed review of the process conditions of the reactors showing irregular behavior, for example using Shewhart control charts, but instead of drawing 64 plots we would need to focus on only 3 of them, corresponding to reactors R21 to R23.
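
The idea behind such a cell plot can be sketched in Python with Matplotlib. This is only an illustration, not the JMP/Flowrence workflow used at Avantium: the pressure values, the unit (bar), the specification limits of 19 and 21, and all function names are assumptions made for the example.

```python
import random


def make_pressure_matrix(n_reactors=64, n_times=48, seed=0):
    """Build hypothetical pressure readings (bar), one row per reactor."""
    rng = random.Random(seed)
    rows = []
    for r in range(n_reactors):
        # Assumption: R21-R23 (indices 20-22) run at a slightly higher
        # pressure, mimicking the pattern described in the text.
        base = 20.6 if 20 <= r <= 22 else 20.0
        rows.append([base + rng.gauss(0.0, 0.05) for _ in range(n_times)])
    return rows


def flag_high_reactors(matrix, lower, upper, threshold=0.7):
    """Return labels of reactors whose mean pressure lies high in the spec band."""
    flagged = []
    for index, row in enumerate(matrix):
        mean = sum(row) / len(row)
        # Position in the spec band: 0 = lower limit, 1 = upper limit.
        position = (mean - lower) / (upper - lower)
        if position > threshold:
            flagged.append(f"R{index + 1}")
    return flagged


def plot_cell_plot(matrix, lower, upper, path="cell_plot.png"):
    """Draw reactors (rows) against time (columns), colored by pressure."""
    # Matplotlib is imported here so the helpers above work without it.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 8))
    # vmin/vmax pin the color scale to the spec limits: blue = lower, red = upper.
    image = ax.imshow(matrix, aspect="auto", cmap="coolwarm",
                      vmin=lower, vmax=upper, interpolation="nearest")
    ticks = list(range(0, len(matrix), 4))
    ax.set_yticks(ticks)
    ax.set_yticklabels([f"R{i + 1}" for i in ticks])
    ax.set_xlabel("Time index")
    ax.set_ylabel("Reactor")
    fig.colorbar(image, ax=ax, label="Reaction pressure (bar)")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Calling `plot_cell_plot(make_pressure_matrix(), 19.0, 21.0)` writes the image to disk, and `flag_high_reactors` gives a quick textual list of the reactors that deserve a closer look. Pinning the color scale to the specification limits, rather than to the data range, is what lets a reader judge "within spec" from color alone.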

Figure 2: Cell plot of reaction pressure as a function of time
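
For the detailed follow-up on the few flagged reactors, the limits of a Shewhart individuals chart can be computed in a few lines. The sketch below uses the common moving-range estimate of sigma (average moving range divided by 1.128, the d2 constant for subgroups of two); the function names and the sample values are hypothetical, and limits are assumed to come from a baseline period judged to be stable.

```python
def individuals_chart_limits(values):
    """Return (center, lcl, ucl) for a Shewhart individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / 1.128  # d2 constant for subgroups of size 2
    return center, center - 3 * sigma, center + 3 * sigma


def out_of_control(values, lcl, ucl):
    """Return the indices of observations outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical pressure readings (bar) for one reactor.
baseline = [20.0, 20.1, 19.9, 20.0, 20.1, 19.9, 20.0, 20.1]
new_readings = [20.0, 20.2, 20.5, 20.6]

center, lcl, ucl = individuals_chart_limits(baseline)
print(f"center={center:.3f}, LCL={lcl:.3f}, UCL={ucl:.3f}")
print("out-of-control indices:", out_of_control(new_readings, lcl, ucl))
```

Only the 3-sigma rule is checked here; run rules (for example, several consecutive points on one side of the center line) would be a natural extension for detecting slow drifts.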

Visualize Catalyst Performance Data

In high-throughput catalyst testing programs, the target is usually to evaluate different catalyst formulations and identify the most promising materials in terms of key performance indicators, catalyst synthesis price and/or catalyst synthesis recipes. Within Avantium, we have specialized catalyst-testing services to evaluate a broad range of catalysts in terms of conversion and yield to the desired product while keeping an eye on the final price. In these large screening campaigns, the priority is to quickly identify the catalysts with the highest yield. However, we observe an increasing desire to cluster the data by composition, which helps the customer to better understand the results.

Figure 3: Catalyst Yield vs. Catalyst Price

In Figure 3, one could identify the circled catalysts as the most promising candidates, considering that their price is relatively low and their yield relatively high. However, questions immediately arise: “How many of the ZSM-5-containing catalysts were tested?” or “Are these promising catalysts outliers?”

One way to answer these questions is to transform the data and use a bar chart grouped by active component, as shown in Figure 4. Now one can easily identify which catalysts perform better, based on the Yield/Price ratio, and the color map provides a direct way to rank the materials by cost.
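
A minimal sketch of this transformation in Python, assuming each result is a tuple of catalyst name, active component, yield and price. The records below are invented purely for illustration and are not Avantium data; the function names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical screening results: (catalyst, zeolite, yield in %, relative price).
RESULTS = [
    ("Cat A", "ZSM-5", 60.0, 2.0),
    ("Cat B", "ZSM-5", 45.0, 3.0),
    ("Cat C", "USY", 40.0, 2.0),
    ("Cat D", "USY", 50.0, 4.0),
    ("Cat E", "BETA", 30.0, 3.0),
]


def group_by_yield_per_price(results):
    """Group catalysts by zeolite and rank each group by Yield/Price, best first."""
    groups = defaultdict(list)
    for name, zeolite, yield_pct, price in results:
        groups[zeolite].append((name, yield_pct / price))
    for members in groups.values():
        members.sort(key=lambda item: item[1], reverse=True)
    return dict(groups)


def count_per_group(groups):
    """Answer 'how many catalysts of each class were tested?'."""
    return {zeolite: len(members) for zeolite, members in groups.items()}


grouped = group_by_yield_per_price(RESULTS)
for zeolite, members in grouped.items():
    print(zeolite, members)
print(count_per_group(grouped))
```

The grouped and sorted structure maps directly onto a grouped bar chart: one cluster of bars per zeolite, ordered by Yield/Price, with the price of each catalyst available for the color map.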

A disadvantage is that the x-axis becomes crowded because of the number of catalysts screened in this test, which makes the chart cumbersome to analyze. It is especially ineffective for presenting data or results at management-level meetings. A smart figure would allow a clear and fast interpretation of the data.

Ideally, everyone should be able to look at the plot and instantly judge which catalysts perform better, and answer questions like “Did I select a reasonable number of materials from each class?” and “What is the cost level of each catalyst?”. For this case, we have selected a treemap as the smart visualization.

Figure 4: Bar chart of Catalyst Yield vs. Catalyst Price

Figure 5: Treemap plot of Catalyst Yield vs. Catalyst Price

The figure shows two axes, Yield/Cost versus catalyst, with the color indicating the total cost. However, the technique is now to segregate the catalysts into different areas of the plot according to their main zeolite component, and to use the size of the individual boxes to convey the Yield/Cost ratio. Note that the area of each box is proportional to the normalized Yield/Cost ratio, which allows comparison across catalysts with different zeolites.
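
How box areas end up proportional to a value can be illustrated with a simple "slice-and-dice" layout: one column per zeolite, with boxes stacked inside each column from best to worst. This is a deliberately simple stand-in and not necessarily the layout algorithm JMP uses; the input values, the cost dictionary and the function names are all hypothetical.

```python
def slice_and_dice(groups):
    """Lay out boxes in the unit square with area proportional to value.

    groups maps a group name to a list of (label, value) pairs, value > 0.
    Returns (group, label, x, y, width, height) tuples, origin at lower left.
    """
    total = sum(value for members in groups.values() for _, value in members)
    rects = []
    x = 0.0
    for group, members in groups.items():
        group_total = sum(value for _, value in members)
        width = group_total / total  # column width reflects the group share
        y_top = 1.0
        # The largest value goes on top, so the best catalyst leads each column.
        for label, value in sorted(members, key=lambda m: m[1], reverse=True):
            height = value / group_total
            rects.append((group, label, x, y_top - height, width, height))
            y_top -= height
        x += width
    return rects


def draw_treemap(rects, costs, path="treemap.png"):
    """Draw the layout, coloring each box by its (assumed) normalized cost."""
    # Matplotlib is imported here so the layout function works without it.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from matplotlib.colors import Normalize
    from matplotlib.patches import Rectangle

    norm = Normalize(vmin=min(costs.values()), vmax=max(costs.values()))
    cmap = plt.get_cmap("viridis")
    fig, ax = plt.subplots(figsize=(8, 6))
    for group, label, x, y, width, height in rects:
        ax.add_patch(Rectangle((x, y), width, height, edgecolor="white",
                               facecolor=cmap(norm(costs[label]))))
        ax.text(x + width / 2, y + height / 2, f"{label}\n{group}",
                ha="center", va="center", fontsize=8)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_axis_off()
    fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax,
                 label="Normalized cost")
    fig.savefig(path, dpi=150)
    plt.close(fig)


# Hypothetical normalized Yield/Cost ratios per catalyst, grouped by zeolite.
EXAMPLE = {"ZSM-5": [("Cat A", 3.0), ("Cat B", 1.0)], "USY": [("Cat C", 4.0)]}
for rect in slice_and_dice(EXAMPLE):
    print(rect)
```

Because each box has width `group_total / total` and height `value / group_total`, its area is `value / total`, so boxes remain comparable across zeolite groups. Slice-and-dice can produce thin boxes when a group has many members; squarified layouts trade this simplicity for better aspect ratios.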

It can easily be observed that, within these tests, the number of catalysts containing USY was similar to the combined number containing SAPO-34 and BETA, and that approximately a third of the samples tested contained ZSM-5 as the active zeolite material. Within each of these areas, the best-performing catalyst appears in the upper left corner, and the color indicates the normalized cost.

The results shown in this treemap indicate that the best catalyst is Cat 11 (among the tested materials, of course); that the number of catalysts screened with this active component is significant (the ZSM-5 area is one third of the total screened); and lastly, that a proposal for a follow-up test could include more SAPO-34 and BETA materials to achieve a more even selection of solids. In our busy day-to-day activities, such smart figures simplify the interpretation of results by conveying a clear message in a single picture.

General guidelines for a clear visualization

No matter which smart visualization you choose, the message should be conveyed in a simple and concise form to facilitate communication. Always keep in mind that advanced statistical and modeling tools are still needed to draw quantitative conclusions and identify optimal values for key parameters. In this newsletter, we emphasize the guidelines that must be considered sine qua non:

1. Keep it simple

Whenever possible, make simple figures and avoid saturation with unnecessary information.

2. Know your audience and define the key message

Determine your audience and tailor the graphics to your key message. Consider three main types of audience: managerial, technical or academic, and non-technical. They all have different needs.

3. Use colors effectively

Minimize the use of color whenever possible. If you have trouble selecting colors, remember the color wheel.

4. Use the correct tool

Plots need to be informative and engaging. The correct tool, e.g. Matplotlib, R or JMP, can facilitate the preparation of your story and the visual context around it.