A STUDY ON PCB FAULT DETECTION USING BOARD HISTOGRAM BY THERMOGRAPHY

Satoshi NISHINO, Kenji OHSHIMA

Oyama National College of Technology
771 Nakakuki Oyama-shi Tochigi 323 Japan

Tel: + (81) 285 21 0348
Fax: + (81) 285 21 0359

1. Introduction

Fault detection using thermography has been discussed in previous papers. One method determines whether the temperature of each individual IC package on the board is within a threshold level (Individual IC Package Temperature Method: IITM). However, this method involves the tedious task of constructing a very large database and requires a large amount of memory. This paper tries to overcome these shortcomings by using a unique feature of the board's histogram (histogram method). The results show that the amount of effort needed to perform the histogram method is 1/n that of the IITM, where n is the number of ICs on the PCB. The results also show that the fault detection ability of the histogram method is superior to that of the IITM for large values of n.

2. Principle of fault detection

Fault detection is performed by sampling k identical non-defective boards and computing the standard deviation of each board's histogram (σ₁, σ₂, ..., σₖ). The average value, μ, of this set of standard deviations is computed, and the range μ ± 3σ of the set is taken as the range within which a board is judged non-defective.

If the standard deviation (σ) of a test board falls outside this range, the board is judged defective.

\[ a < σ < b \]
\[ a = μ - 3σ \]
\[ b = μ + 3σ \]  

(1)
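The decision rule of equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the reference values are hypothetical, and the use of the population form of the standard deviation for the spread of the reference set is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev


def acceptance_range(reference_sigmas):
    """Return (a, b) = (mu - 3*s, mu + 3*s) for the sigmas of known-good boards."""
    mu = mean(reference_sigmas)
    # Assumption: population standard deviation (1/N form) of the reference set.
    s = pstdev(reference_sigmas)
    return mu - 3 * s, mu + 3 * s


def is_defective(test_sigma, reference_sigmas):
    """A board is judged defective when its sigma falls outside a < sigma < b."""
    a, b = acceptance_range(reference_sigmas)
    return not (a < test_sigma < b)


# Made-up sigmas of k = 5 non-defective boards, for illustration only.
good_boards = [23.0, 23.5, 22.8, 23.9, 23.3]
a, b = acceptance_range(good_boards)
```

A call such as `is_defective(40.0, good_boards)` then applies the rule to a single test board.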

When the level of an individual pixel is represented by Xi, the standard deviation (σ) is computed as

\[ σ = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - μ)^2} \]  

(2)

μ : Average value
N : Total number of pixels
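Because the method works on the board histogram rather than on a list of pixels, equation (2) can equivalently be evaluated from the histogram counts. The sketch below assumes a histogram stored as a list in which entry `level` holds the number of pixels at that level; the function name is hypothetical.

```python
import math


def histogram_stats(hist):
    """Mean and standard deviation of pixel levels from a level histogram.

    hist[level] is the number of pixels at that level (256 bins for 0..255).
    """
    n = sum(hist)  # N, the total number of pixels
    if n == 0:
        raise ValueError("histogram is empty")
    mu = sum(level * count for level, count in enumerate(hist)) / n
    var = sum(count * (level - mu) ** 2 for level, count in enumerate(hist)) / n
    return mu, math.sqrt(var)


# Toy histogram: three pixels at level 100 and one pixel at level 200.
toy_hist = [0] * 256
toy_hist[100] = 3
toy_hist[200] = 1
toy_mu, toy_sigma = histogram_stats(toy_hist)
```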

Assuming that the pixels of the IC fall in the range a ~ k, equation (2) becomes

\[ \sum_{i=a}^{k} X_i^2 = N(σ^2 + μ^2) - \left(\sum_{i=1}^{a-1} X_i^2 + \sum_{i=k+1}^{N} X_i^2\right) \]

(3)
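The step from equation (2) to equation (3) can be filled in as follows, assuming that the IC's pixels occupy the indices a through k as stated. Expanding the square in equation (2) and using the definition of μ as the mean of the X_i gives

\[ σ^2 = \frac{1}{N} \sum_{i=1}^{N} X_i^2 - 2μ \cdot \frac{1}{N} \sum_{i=1}^{N} X_i + μ^2 = \frac{1}{N} \sum_{i=1}^{N} X_i^2 - μ^2 \]

so that

\[ \sum_{i=1}^{N} X_i^2 = N(σ^2 + μ^2) \]

Splitting the left-hand sum into the index ranges 1 to a-1, a to k, and k+1 to N, and moving the first and last parts to the right-hand side, isolates the contribution of the IC's pixels.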

In equation (3), the value of Xi can be determined by substituting the values of σ = a, b. This area represents the total heat area for one IC within a temperature range of a ~ k degrees. This result can be used to simulate the IC package heat distribution.
temperature is below $X_L$, the fault cannot be detected.

### Simulation process

The first step is to obtain the temperature distribution (histogram) of the whole non-defective board. The histogram of the 40 pin IC (Z80 chip) is then computed and shifted using the procedure listed below, and the standard deviation $\sigma$ is computed. The total number of pixels enclosing the 40 pin IC package is 3473.

1. Subtract the 40 pin IC histogram area from the histogram of the total board.
2. Shift the 40 pin IC histogram to the 0 point and add it to the result in (1).
3. Shift the 40 pin IC histogram to the maximum point (255 level).
4. Stack the area of the 40 pin IC histogram at the maximum point.
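The four steps above can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' program: it assumes 256-bin histograms, that "shifting" means moving the IC histogram so that its lowest occupied level sits at a chosen starting level, and that pixels pushed past the 255 level are stacked at 255. All function names are hypothetical.

```python
import math

LEVELS = 256  # 8-bit thermal image levels, 0..255


def histogram_sigma(hist):
    """Standard deviation of pixel levels, computed from histogram counts."""
    n = sum(hist)
    mu = sum(level * count for level, count in enumerate(hist)) / n
    return math.sqrt(sum(c * (l - mu) ** 2 for l, c in enumerate(hist)) / n)


def shift_ic_histogram(ic_hist, start_level):
    """Move the IC histogram so its lowest occupied level sits at start_level.

    Pixels that would land beyond level 255 are stacked at 255 (steps 3 and 4).
    """
    lowest = next(level for level, count in enumerate(ic_hist) if count > 0)
    shifted = [0] * LEVELS
    for level, count in enumerate(ic_hist):
        if count:
            new_level = min(level - lowest + start_level, LEVELS - 1)
            shifted[new_level] += count
    return shifted


def simulate_sigma(board_hist, ic_hist, start_level):
    """Board sigma after replacing the IC's contribution by a shifted copy."""
    rest = [b - i for b, i in zip(board_hist, ic_hist)]   # step 1
    moved = shift_ic_histogram(ic_hist, start_level)      # steps 2 to 4
    return histogram_sigma([r + m for r, m in zip(rest, moved)])


def sweep(board_hist, ic_hist):
    """Sigma for every starting level from 0 (step 2) up to 255 (step 3)."""
    return [simulate_sigma(board_hist, ic_hist, s) for s in range(LEVELS)]
```

Plotting the output of `sweep` against the starting level would give a curve of the kind described for Fig. 4, under the assumptions stated above.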

The above-mentioned process is illustrated in Fig. 3, with the result shown in Fig. 4. This figure shows that fault detection is impossible if the minimum temperature of the 40 pin IC package is below the 146 level (46.6°C). The heat generated by the IC package can assume many patterns, one of which is shown in Fig. 5.
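The text quotes two level and temperature pairs, the 146 level (46.6°C) and the 255 level (74°C). Both are consistent with a linear mapping in which level 0 corresponds to 10°C, the lower end of the temperature sweep used for Fig. 6. The sketch below is built on that assumption; the mapping itself is inferred here and is not stated in the paper.

```python
T_MIN_C = 10.0   # assumed temperature at level 0 (inferred, not stated)
T_MAX_C = 74.0   # temperature at level 255, as quoted in the text
MAX_LEVEL = 255


def level_to_celsius(level):
    """Convert an 8-bit thermal level to degrees Celsius (assumed linear)."""
    return T_MIN_C + (T_MAX_C - T_MIN_C) * level / MAX_LEVEL


def celsius_to_level(temp_c):
    """Convert degrees Celsius to the nearest 8-bit thermal level."""
    return round((temp_c - T_MIN_C) * MAX_LEVEL / (T_MAX_C - T_MIN_C))
```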

Fig. 6 shows the plot for the two smallest ICs, the often used 74LS00 and the 4049 chip, which also has the lowest temperature on the Z80 board. This figure plots the standard deviation when the temperature of the two chips is varied from 10°C to 74°C. According to this result, fault detection is not possible because the standard deviation stays below the threshold level ($\sigma = 27.1$) over the whole temperature range up to 74°C. However, when the IITM is applied to the same board, faults in every IC package can be detected; thus, the histogram method is inferior to the IITM in this respect. Nevertheless, chapters 8 and 10 will show that the histogram method is superior to the IITM in terms of effort and fault detection rating.

4. IC package area limit

According to the result in the previous chapter, this method is not as effective as the IITM because of its inability to detect faults in 14 and 16 pin IC packages.

Thus, this chapter discusses the heat area limit of IC packages for which fault detection is possible. The simulation is first done by subtracting the 40 pin IC histogram from the total board's histogram. The 40 pin IC histogram's heat levels are then progressively truncated and shifted to the 255 level, and the standard deviation is computed each time.

Consequently, it was determined that if there are 1827 pixels at the 255 level (74°C), the standard deviation of the board histogram becomes 27.1, making it possible to detect a board-level fault. The total number of pixels of the 40 pin IC is 3472, which corresponds to 6.76 cm². The pixel count of 1827 obtained from the simulation corresponds to 3.56 cm². This area is 52% of the 40 pin IC area, which is equivalent to the heat area generated by a 20 pin IC. Thus, when all of the pixels of a 20 pin IC are at or above 74°C, the method presented in this paper can be used to detect faults.
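The area figures in this paragraph follow from a simple proportion, which can be checked with the short calculation below. The numbers are taken from the text; the variable names are illustrative.

```python
IC40_PIXELS = 3472      # pixel count of the 40 pin IC, from the text
IC40_AREA_CM2 = 6.76    # corresponding package area, from the text
HOT_PIXELS = 1827       # pixels at the 255 level needed to reach sigma = 27.1

cm2_per_pixel = IC40_AREA_CM2 / IC40_PIXELS
hot_area_cm2 = HOT_PIXELS * cm2_per_pixel      # area covered by the hot pixels
hot_fraction = HOT_PIXELS / IC40_PIXELS        # share of the 40 pin IC area

print(f"{hot_area_cm2:.2f} cm^2, {hot_fraction:.1%} of the 40 pin IC area")
```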

5. MSI board simulation

Table 1 40 pin IC fault detection temperature.

<table>
<thead>
<tr>
<th>Threshold</th>
<th>Simulation value X_i(T)</th>
</tr>
</thead>
<tbody>
<tr>
<td>19.6</td>
<td>30.2</td>
</tr>
<tr>
<td>27.1</td>
<td>49.8</td>
</tr>
</tbody>
</table>

Fig. 4 Temperature(level) vs standard deviation(σ) Z-80 chip [Z-80 board].

Fig. 5 40 pin IC heat generated pattern for fault detection.

Fig. 1(b) shows an MSI board whose largest IC is the memory chip 2101 and whose smallest IC is the SN7400, with the simulation results shown in Fig. 7 and 8. It can be seen from these figures that fault detection is possible if the temperature of the 2101 and SN7400 ICs is over 30.9°C and 42.3°C, respectively. The cutoff temperature for detecting a fault in the 2101 IC is lower than that for the SN7400 IC because the 2101 IC generates a larger heat area than the SN7400 IC.

In the Z80 board it is not possible to detect a fault in the SN74LS00 IC, but in the MSI board it is possible to detect a fault in the SN7400. This is because, first, the MSI board has a lower chip count and, second, it has no large ICs with many pins. For the same reason, faults can be detected at a lower temperature in the 2101 IC than in the Z80 IC.

6. IC generated heat influence on the board

When an IC on a board becomes defective, it generates a high temperature and conducts heat to the board and the surrounding space. This results in a shift of the board's histogram to a higher level (temperature). If the conduction of heat to the surrounding chips is considered, there will be a larger variation in the board's standard deviation, thus increasing the possibility of fault detection. The algorithm below was used in a simulation to verify this.

1) Obtain the histogram of the PCB with a defective Z80 chip, then obtain the histogram of the same PCB with a non-defective Z80 chip inserted and subtract it from the histogram with the defective Z80 chip. The histogram of the Z80 chip is then subtracted from this result to obtain the heat generated in the surrounding chips due to the defective Z80 chip.

2) Assume that, 20 minutes after power on, the heat generated by the defective Z80 chip has a circular form, as illustrated in Fig. 9. Fig. 10 shows 1/4 of the area representing the heat transmitted to the surrounding area by the defective Z80 chip.

Fig. 6 Heat generated by the 4049 and SN74LS00 chips [Z80 board].

Fig. 7 Temperature(level) vs standard deviation(σ) 2101 chip [MSI board].

Fig. 8 Temperature(level) vs standard deviation(σ) SN74LS00 chip [MSI board].

Fig. 10 shows a quadrant in which the cross-hatched portion represents the heat transmitted by the defective Z80 chip to the surrounding area. The cross-hatched portion is HA = Q - T - O, where Q is the quadrant area, T the triangle area and O the sector area. This gives the following equation relating the area S to C and a:

\[ S = 4 \left( (C + a) \cos^{-1} \frac{C}{C+a} - \frac{1}{2} C \sqrt{2aC - a^2} \right) \]  

(4)

The relationship between a and S given by equation (4) is shown in Table 2 and is approximated by the following Chebyshev approximation:

\[ S = 0.09a + 1.92a^2 - 4.05a^3 + 10a^4 - 15.9a^5 + 15.6a^6 - 8.67a^7 + 2.26a^8 - 0.14a^9 \]  

(5)

Eqn. (5) gives the relationship of a to S. To show the effect of the heat generated by the defective Z80 on the board, it is necessary to convert a to temperature level L (0~255) and S to a number of pixels. The level L is related to a by \( a = L/62.7 \). The number of pixels in the S area can be obtained by simple proportion from the total pixel count of the Z80 chip, the Z80 chip area and the S area computed with Eqn. (5).
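The conversions described here can be sketched as follows. The polynomial coefficients are those of Eqn. (5) and the relation a = L/62.7 is from the text. Using 3472 pixels and 6.76 cm² for the Z80 chip, and treating S as an area in cm², are assumptions made for this sketch; the function names are hypothetical.

```python
# Coefficients of Eqn. (5), lowest power first (a^1 .. a^9).
COEFFS = [0.09, 1.92, -4.05, 10.0, -15.9, 15.6, -8.67, 2.26, -0.14]

Z80_PIXELS = 3472     # total pixel count of the Z80 chip (value quoted in chapter 4)
Z80_AREA_CM2 = 6.76   # Z80 chip area; assumes S is expressed in the same unit


def level_to_a(level):
    """Convert a temperature level L (0..255) to a, using a = L / 62.7."""
    return level / 62.7


def area_s(a):
    """Evaluate the polynomial of Eqn. (5) with Horner's scheme."""
    result = 0.0
    for c in reversed(COEFFS):
        result = (result + c) * a
    return result


def area_to_pixels(s):
    """Convert an area S to a pixel count by simple proportion."""
    return s * Z80_PIXELS / Z80_AREA_CM2
```

Chaining the three functions, `area_to_pixels(area_s(level_to_a(level)))`, gives the pixel count associated with a given level under these assumptions.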

Using the above relationship, the simulation process used in chapter 3 resulted in an improved fault detection as shown in Fig.11. This figure shows a decrease of about 4°C in the temperature needed before a fault can be detected. The results of the above simulation can be assumed to approximate the actual conditions.

7. IC heat pattern fault detection possibility

According to the simulation result in chapter 3, shown in Fig. 4, fault detection of the Z80 chip on the Z80 board was possible only if the chip temperature (minimum temperature) was above 46.6°C and the number of pixels at the 255 level (74°C) was 160 or more. In actual conditions, the heat generation pattern shown in Fig. 5 can take many different forms, but these two conditions must be satisfied.

Simulation results for the Z80 board showed that if the number of pixels at or beyond 74°C is 1827 or more, then fault detection is possible. Using this result, one example of a Z80 chip heat generation pattern is illustrated in Fig. 12. To obtain the heat distribution shown in Fig. 12, it is necessary to compute the radius r, as shown in Fig. 13, using Eqn. (6) with the values of a and C.

\[ S = \frac{C}{2} \sqrt{a^2 + 2aC} + \frac{1}{2} (C+a)^2 \sin^{-1} \frac{C}{C+a} \]

(6)

8. Method Efficiency Comparison

In the IITM, the amount of data needed to perform fault detection is equal to n, the number of ICs on the PCB, while the histogram method achieves the same objective with a single data item, the whole-board temperature histogram. Thus, the amount of effort needed to perform fault detection with the histogram method is only 1/n that of the IITM.

Recently, the number n of ICs on PCBs has increased beyond 100 chips; thus, using the histogram method decreases the amount of effort needed to perform fault detection. By comparison, the amount of effort needed to perform the IITM is n (equal to the number of ICs) while that of the histogram method and the supply current test is 1.

Table 3  Fault detection ability.

<table>
<thead>
<tr>
<th rowspan="2">Method</th>
<th colspan="2">IITM : 3σ</th>
<th rowspan="2">Supply current test: 3σ</th>
<th rowspan="2">Histogram method</th>
</tr>
<tr>
<th>Minimum</th>
<th>Maximum</th>
</tr>
</thead>
<tbody>
<tr>
<td>MSI board</td>
<td>3.8[mA]</td>
<td>21.0[mA]</td>
<td>4.32[mA]</td>
<td></td>
</tr>
<tr>
<td>Z-80 board</td>
<td>1.5[mA]</td>
<td>72.0[mA]</td>
<td>32.0[mA]</td>
<td></td>
</tr>
</tbody>
</table>

Fig.14  IC supply current vs package temperature relation.

Table 4 Fault detection rating.

(a) MSI board

<table>
<thead>
<tr>
<th>Method</th>
<th>Detection current</th>
<th>Work</th>
<th>Detection rating</th>
<th>Detection rating (Depending on n)</th>
</tr>
</thead>
<tbody>
<tr>
<td>IITM</td>
<td>3.80</td>
<td>n</td>
<td>3.8n</td>
<td>76</td>
</tr>
<tr>
<td>Supply current test</td>
<td>4.32</td>
<td>1</td>
<td>4.32</td>
<td>19</td>
</tr>
<tr>
<td>Histogram method</td>
<td>39.0</td>
<td>1</td>
<td>39.0</td>
<td></td>
</tr>
</tbody>
</table>

(b) Z-80 board

<table>
<thead>
<tr>
<th>Method</th>
<th>Detection current</th>
<th>Work</th>
<th>Detection rating</th>
<th>Detection rating (Depending on n)</th>
</tr>
</thead>
<tbody>
<tr>
<td>IITM</td>
<td>1.5</td>
<td>n</td>
<td>1.5n</td>
<td>30</td>
</tr>
<tr>
<td>Supply current test</td>
<td>32.0</td>
<td>1</td>
<td>32.0</td>
<td>143</td>
</tr>
<tr>
<td>Histogram method</td>
<td>112</td>
<td>1</td>
<td>112</td>
<td></td>
</tr>
</tbody>
</table>

9. Fault detection ability

Fig. 14 shows the relationship between supply current and chip temperature for the SN7476, 2101, SN74LS00 and Z80 chips. Using this figure, each method's $3\sigma$ value is converted to its supply current value as shown in Table 3. For both the Z80 and MSI boards, the IITM gives the best result, followed by the supply current test and then the histogram method.

However, when the number of ICs on board the PCB increases, the histogram method is superior to the supply current test because the supply current test's fault detection ability decreases more rapidly than that of the histogram method.

10. Fault Detection Rating

As mentioned before, the amount of effort needed to perform the histogram method is $1/n$ that of the IITM, where $n$ is the total number of ICs. However, this method's fault detection ability is inferior to that of the other two methods, namely the IITM and the supply current test. Thus, in this paper, the fault detection rating is defined as the amount of effort needed to perform fault detection multiplied by the fault detection ability. Table 4 compares the three methods; the method with the lowest fault detection rating is the best.

The histogram method is superior to the individual IC method and supply current test for large values of $n$.
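As a worked illustration of the rating defined above, the sketch below multiplies the detection current by the work for the IITM and the histogram method, using the detection currents listed in Table 4, and finds the smallest n at which the IITM rating exceeds the histogram rating. It covers only this pairwise comparison and does not model any dependence of the detection currents on n; the function names are hypothetical.

```python
import math


def rating(detection_current_ma, work):
    """Fault detection rating = detection current x work (lower is better)."""
    return detection_current_ma * work


def break_even_n(iitm_current_ma, histogram_current_ma):
    """Smallest integer n at which the IITM rating exceeds the histogram rating.

    The IITM rating is iitm_current_ma * n; the histogram rating uses work = 1.
    """
    return math.floor(histogram_current_ma / iitm_current_ma) + 1


# Detection currents in mA, taken from Table 4.
msi_break_even = break_even_n(3.80, 39.0)
z80_break_even = break_even_n(1.5, 112.0)
```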

11. Conclusion

This paper discussed the histogram method, which decreases the amount of effort needed to perform fault detection when compared to the IITM. The amount of effort needed to perform the histogram method is $1/n$ that of the IITM, where $n$ is the total number of ICs. Moreover, it was determined that, when thermography is used to perform both methods, the fault detection ability of the histogram method (expressed in terms of supply current value) is inferior to that of the IITM.

A comparison of the three methods using the fault detection rating defined in this paper showed that the histogram method is superior to the other two methods for large values of $n$ ($n=$ number of ICs on the PCB).

The next step in this research is to determine the effect of board size on the fault detection rating. The effect of having different IC package sizes on one board on the fault detection rating should also be investigated.

(Received Sept. 30, 1996)