# LSI Board Fault Diagnosis using Adaptive Image Restoration to the Thermography

Satoshi NISHINO and Kenji OHSHIMA.

SUMMARY At present, VLSI chips have an increasing number of input/output pins; therefore, standard fault diagnosis methods, which rely on a limited number of accessible input/output pins, have become difficult to apply. This research establishes a new diagnosis concept that does not use input/output pins but instead applies thermography restoration. This paper compares three methods (the standard method, the histogram method, and this method) and confirms that this method is superior to the other two. The system verifies the possibility of fault IC diagnosis by thermography restoration using a neural network trained with a small learning data set (134 images). Using this method, a fault diagnosis rate of 96.43% was obtained. This is because the neural network, which takes the difference image as input and learns IC coordinates (IC position) together with gradation values (IC heat-generation temperature), performed fault IC image restoration satisfactorily. Moreover, adding the addition emphasis process improved the fault diagnosis rate by 1.72% over the method without it.

Key words: fault diagnosis, thermography, LSI board, VLSI

#### 1. Introduction

Due to developments in semiconductor technology, fault diagnosis of circuits integrated into VLSI chips has become increasingly difficult, and standard methods, which must diagnose faults in complex circuits through a limited number of input/output pins, have had limited success. Recently, IC package types such as BGA and CSP, which have their pins arranged on the underside of the package, have become increasingly common. These packages are used in notebook personal computers, cellular phones, and portable electronic equipment such as digital video cameras. Consequently, in-circuit test procedures cannot be applied to these ICs, and board testing becomes difficult for boards that mount LSIs without BIST, boundary scan, or other design-for-testability circuits [5]. Moreover, when faults occur in the test circuits of these ICs themselves, diagnosis becomes impossible. In addition, design for testability increases the chip-area overhead and thus causes a decline in speed.

Thus, a new fault diagnosis concept using thermography, which does not use input/output pins, was tried. This paper uses thermography to obtain the heat image generated by each IC on the board and then performs fault diagnosis at the IC level. This paper also assumes that faults occur only on the user's side, for example surge faults, internal interconnect shorts (e.g., caused by electromigration), or transistor faults, rather than in the manufacturing process [3]. The LSI board used as the diagnostic object is shown in Figure 1. This method uses a neural network whose input is the difference between the normal thermographic image of the LSI board and the thermographic image under diagnosis, both obtained with an infrared camera. After diagnosis, the system shows the result by enclosing the fault ICs in white boxes.



Fig.1 LSI board.

The thermographic image shows the temperature distribution of the LSI board. The thermography used here is an 8-bit grayscale image that represents temperature with 256 gradations. A small gradation value represents a low temperature and appears close to black; conversely, a large gradation value represents a high temperature and appears close to white.
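The paper does not state the temperature span that the 256 gradations cover. As a minimal sketch, assuming the camera maps temperature linearly between a lower limit `t_min` (gradation 0, black) and an upper limit `t_max` (gradation 255, white), the conversion in both directions looks like this; both limits and both function names are hypothetical.

```python
def gradation_to_temperature(g: int, t_min: float, t_max: float) -> float:
    """Map an 8-bit gradation (0..255) linearly onto [t_min, t_max] in deg C.

    ASSUMPTION: linear camera scale; t_min and t_max are hypothetical settings.
    """
    if not 0 <= g <= 255:
        raise ValueError("gradation must be in 0..255")
    return t_min + (t_max - t_min) * g / 255.0


def temperature_to_gradation(t: float, t_min: float, t_max: float) -> int:
    """Inverse mapping, rounded to the nearest level and clipped to 0..255."""
    g = round(255.0 * (t - t_min) / (t_max - t_min))
    return max(0, min(255, g))
```

For example, with an assumed span of 20 to 80 °C, gradation 0 maps to 20 °C and gradation 255 maps to 80 °C; temperatures outside the span saturate to black or white.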

#### 2. Construction of a Neural Network

The block diagram of a neural network is shown in Figure 2.



Fig.2 Processing flow

When a neural network is constructed directly for an image of 320 × 240 pixels, the numbers of inputs and outputs become enormous and the network cannot practically be built. In this work, therefore, the image is first reduced to 1/64 of its area (1/8 in each direction, giving 40 × 30 pixels). The reduced image is then divided into five horizontal strips, and one neural network with 240 inputs and 240 outputs (= 40 × 6) is constructed for each strip. The outputs are afterwards enlarged back to the original size. Each network has an input layer of 240 terminals that receive the gradation (temperature) of individual pixels, a middle (hidden) layer of 24 units, and an output layer of 240 terminals. The input terminals correspond to the 40 × 6 pixels of a strip, and the 240 output units correspond to the same pixel coordinates for reconstruction.
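A minimal sketch of one strip network (240 inputs, 24 hidden units, 240 outputs, sigmoid slope 0.7 as given in Section 2) is shown below. The class name `StripNetwork`, the use of bias terms, and the uniform random initialization are assumptions for illustration, not details from the paper.

```python
import numpy as np

N_IN, N_HIDDEN, N_OUT = 240, 24, 240   # 40 x 6 pixels per strip; 24 hidden units
SLOPE = 0.7                            # sigmoid "inclination" from the text


def sigmoid(x: np.ndarray, slope: float = SLOPE) -> np.ndarray:
    """Sigmoid with an adjustable slope (gain)."""
    return 1.0 / (1.0 + np.exp(-slope * x))


class StripNetwork:
    """Hypothetical 240-24-240 network that restores one 40 x 6 strip.

    ASSUMPTIONS: fully connected layers with biases, small random initial weights.
    """

    def __init__(self, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.uniform(-0.1, 0.1, (N_IN, N_HIDDEN))
        self.b1 = np.zeros(N_HIDDEN)
        self.w2 = rng.uniform(-0.1, 0.1, (N_HIDDEN, N_OUT))
        self.b2 = np.zeros(N_OUT)

    def forward(self, strip: np.ndarray) -> np.ndarray:
        """strip: normalized gradations, shape (6, 40); returns a (6, 40) restored strip."""
        x = strip.reshape(-1)               # 240 inputs in row-major pixel order
        h = sigmoid(x @ self.w1 + self.b1)  # 24 hidden units
        y = sigmoid(h @ self.w2 + self.b2)  # 240 outputs, one per pixel coordinate
        return y.reshape(strip.shape)
```

Because output unit *i* is reshaped back to the same pixel position as input *i*, the network's output can be reassembled directly into an image, which is what allows the restored strips to be merged and enlarged later.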

This reduction and expansion caused no problem in this work, because the restored image is used only to locate the fault ICs. Only their positions have to be restored, so the loss of fine detail caused by reducing and expanding the image is acceptable.

The teaching data for the neural network are binary images of the ideal fault IC extraction, drawn up manually from the difference images. To speed up the decrease and convergence of the learning error, the input to the neural network is normalized by formula (1): the gradation of a pixel divided by the largest gradation, 255. The learning parameters are a learning rate of 1.0 and a sigmoid slope of 0.7. The learning data consist of 134 input images, and learning was iterated 5000 times. Learning took approximately 2 hours, and the mean square error reached 0.005.

$$\text{Input data} = \frac{\text{Gradation of a pixel}}{\text{Largest gradation } (255)}$$
(1)
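The normalization of formula (1) and the stated learning parameters can be illustrated with a small back-propagation loop. This is a sketch only: the paper does not specify batch versus per-sample updates, bias terms, or initialization, so those choices (batch gradient descent on mean square error, biases, uniform random weights) are assumptions, and `normalize` and `train` are hypothetical names.

```python
import numpy as np

LEARNING_RATE = 1.0   # "learning factor" from the text
SLOPE = 0.7           # sigmoid inclination from the text


def normalize(gradations: np.ndarray) -> np.ndarray:
    """Formula (1): pixel gradation divided by the largest gradation, 255."""
    return gradations.astype(np.float64) / 255.0


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-SLOPE * x))


def train(x, t, n_hidden=24, epochs=5000, seed=0):
    """Batch back-propagation minimizing mean square error.

    x, t: arrays of shape (n_samples, 240); t holds the binary teaching images.
    ASSUMPTIONS: batch updates, bias terms, uniform random initialization.
    Returns the weights and the final mean square error.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.uniform(-0.1, 0.1, (x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.uniform(-0.1, 0.1, (n_hidden, t.shape[1]))
    b2 = np.zeros(t.shape[1])
    n = x.shape[0]
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)
        y = sigmoid(h @ w2 + b2)
        # Derivative of the sloped sigmoid is SLOPE * y * (1 - y).
        d_out = (y - t) * SLOPE * y * (1.0 - y)
        d_hid = (d_out @ w2.T) * SLOPE * h * (1.0 - h)
        w2 -= LEARNING_RATE * h.T @ d_out / n
        b2 -= LEARNING_RATE * d_out.mean(axis=0)
        w1 -= LEARNING_RATE * x.T @ d_hid / n
        b1 -= LEARNING_RATE * d_hid.mean(axis=0)
    y = sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)
    return (w1, b1, w2, b2), float(np.mean((y - t) ** 2))
```

With the paper's data, `x` would hold the 134 normalized difference-image strips and `t` the corresponding binary teaching strips; the paper reports a final mean square error of 0.005 after 5000 iterations, which this toy loop is not claimed to reproduce.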

#### 3. Processing Flow

The processing flow is shown in Figure 2, where panels (a) to (h) correspond to steps (2) to (11) below.

- (1) Capture the thermographic images (320 pixels × 240 pixels).
- (2) Compile the mean image (a).
- (3) Capture the image under diagnosis (b).
- (4) Compute the difference image between the normal image and the image under diagnosis, taking the absolute value of the difference (c).
- (5) Reduce the image to 1/64 of its size.
- (6) Divide the reduced image into five parts (40 × 6 pixels per part) (d)(e).
- (7) Apply a neural network to each divided image (f).
- (8) Combine the outputs of the neural networks into one image.
- (9) Expand the image to its original size.
- (10) Output the diagnostic image (g).
- (11) Show the result of the fault diagnosis by enclosing the fault ICs in white boxes (h).
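Steps (4) to (10) can be sketched as a short image pipeline. The paper does not say how the image is reduced or enlarged, so 8 × 8 block averaging and nearest-neighbour enlargement are assumptions; `networks` stands for five hypothetical per-strip models (for example, trained 240-24-240 networks), and step (11), drawing the white boxes, is omitted.

```python
import numpy as np

FACTOR = 8      # 1/64 area reduction = 1/8 per direction: 240 x 320 -> 30 x 40
N_STRIPS = 5    # five strips of 6 rows x 40 columns = 240 pixels each


def difference_image(normal: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Step (4): absolute difference of two 8-bit images."""
    return np.abs(normal.astype(np.int16) - test.astype(np.int16)).astype(np.uint8)


def reduce(img: np.ndarray) -> np.ndarray:
    """Step (5): ASSUMED 8 x 8 block averaging (the method is not named in the text)."""
    h, w = img.shape
    return img.reshape(h // FACTOR, FACTOR, w // FACTOR, FACTOR).mean(axis=(1, 3))


def expand(img: np.ndarray) -> np.ndarray:
    """Step (9): ASSUMED nearest-neighbour enlargement back to the original size."""
    return np.repeat(np.repeat(img, FACTOR, axis=0), FACTOR, axis=1)


def diagnose(normal, test, networks):
    """Steps (4)-(10). `networks`: five hypothetical callables, each mapping a
    (6, 40) array of normalized gradations to a (6, 40) output array."""
    diff = difference_image(normal, test)
    small = reduce(diff) / 255.0                             # formula (1)
    strips = np.split(small, N_STRIPS, axis=0)               # step (6)
    outputs = [net(s) for net, s in zip(networks, strips)]   # step (7)
    merged = np.vstack(outputs)                              # step (8): 30 x 40
    return expand(merged)                                    # steps (9)-(10)
```

Passing identity functions as `networks` turns the pipeline into a plain reduce-and-expand of the difference image, which is a convenient way to check the geometry before plugging in trained networks.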

#### 4. The Learning of the Neural Network

The learning policy of the neural network is to obtain generalization capability, which this method then exploits for diagnosis. To achieve this, the neural network had to learn the relation between IC position on the board and temperature. Generalization capability means that, if the learning data are suitable, the network provides proper output even for inputs far from the learning data (non-learning data).

To achieve this capability, learning data with one or more fault ICs on the board were prepared, and images with fault ICs at random positions on the board were intentionally used as the basis. Images of ICs under different fault conditions were also used as learning data. The neural network was trained on 96 images with one fault IC, 18 images with two fault ICs, and 20 images with three fault ICs, for a total of 134 learning images. Figure 3 shows examples of a learning image (the difference image), a teaching image, and an output image; as shown in **Figure 3**, there is one fault IC in (a), two in (b), and three in (c).



Fig.3 Example of learning and teaching image.

#### 5. Result of Fault Diagnosis

#### 5.1 Fault Diagnosis Rate

This method is evaluated by the fault diagnosis rate defined by formula (2).

Figure 4 shows examples of diagnosis results for non-learning images (images not used for neural network learning).

The non-learning images used for diagnosis comprise 43 images with one fault IC, 10 with two, 10 with three, and 5 with four, for a total of 68 images. The overall diagnosis rates are shown in **Table 1**.

Table 1 Diagnosis rate.

| Number of fault ICs | Fault diagnosis rate [%] |
|---------------------|--------------------------|
| 1                   | 95.46                    |
| 2                   | 94.71                    |
| 3                   | 93.81                    |
| 4                   | 94.29                    |
| Total               | 94.71                    |

Figure 4 shows examples of a test image, difference image, teaching image, output image, and result image. Figure 4(a) is a single-IC fault case. For single faults, each of the 21 ICs on the board is given to the neural network as a fault IC in a teaching image, and the network learns all of them. However, giving all combinations of two to four fault ICs as learning and teaching images would require an enormous number of images. Therefore, a subset of teaching images was selected by trial and error, and diagnosis of the other combinations relies on the generalization ability of the neural network. Examples of this generalization ability are shown in Figure 4.
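The scale of the problem can be checked with a short calculation: with 21 ICs on the board, the number of distinct fault patterns grows as a binomial coefficient in the number of simultaneous faults, while the learning set of Section 4 contains only 134 images.

```python
from math import comb

N_ICS = 21  # ICs on the board (fault IC numbers 1..21)

# Distinct fault patterns for k simultaneous fault ICs.
patterns = {k: comb(N_ICS, k) for k in range(1, 5)}
print(patterns)                 # {1: 21, 2: 210, 3: 1330, 4: 5985}
print(sum(patterns.values()))   # 7546

# Learning images actually used (Section 4): 96 + 18 + 20.
learning_set = {1: 96, 2: 18, 3: 20}
print(sum(learning_set.values()))  # 134
```

Covering every two- to four-fault combination would need at least 7525 distinct patterns (before even varying fault conditions), which is why the paper relies on generalization from a small, selected learning set.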

Figure 4(b) is an example with two fault ICs, and Figure 4(c) is a case with three fault ICs. Panel (3) of Figures 4(b) and 4(c) shows the part of the teaching images that influences the result image; these five teaching images contribute to the correct result images. According to these result images, the neural network correctly diagnosed the two-IC fault of Figure 4(b). Panel (c)-(5) of Figure 4 shows the case with three fault ICs, which are diagnosed from the three teaching images of (c)-(3). These examples in Figure 4 demonstrate the generalization ability that is the merit of the neural network. A four-IC fault and a low-temperature fault are shown in Figure 5. As Figures 4 and 5 show, cases ranging from a single-IC fault to a four-IC fault, as well as a low-temperature fault, were diagnosed correctly owing to the generalization ability of the neural network.



Fig.4 Example of diagnosis.



Fig.5 Four-IC fault and low-temperature fault.

#### 5.2 Discussion of Possible Diagnosis Temperature Ranges

To determine whether IC heat-generation temperature is related to fault diagnosis, the temperature difference between each fault IC in a non-learning image and the normal image is calculated (temperature of the fault IC in the non-learning image minus temperature of the non-fault IC in the normal image).

**Table 2** shows the temperature differences for images with one fault IC. According to this table, faults could be diagnosed over a temperature-difference range of -7.91 to 165.65 [°C]. However, fault IC No.11 could not be diagnosed even though its temperature difference is 82.27 [°C]. Conversely, fault IC No.1 is cooler than in the non-fault state, yet it could be diagnosed because the difference image uses the absolute value of the difference.

Our previous paper [1] demonstrated that, owing to the influence of the ambient temperature, the temperature of a non-fault IC follows a normal distribution. Letting x be the non-fault IC's temperature, $\mu$ the statistical mean, and $\sigma$ the standard deviation, the distribution is expressed as

$$y(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$
(3)

 $\mu$ : the statistical mean  $\sigma$ : the standard deviation

The temperature of a non-fault IC lies within the range $\mu \pm 3\sigma$, which covers about 99.7% of a normal distribution:

$$\mu - 3\sigma \le x \le \mu + 3\sigma$$
(4)
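Equation (3) and condition (4) translate directly into code. The sketch below evaluates the normal density, tests whether a temperature lies in the $\mu \pm 3\sigma$ band, and computes the fraction of a normal population inside that band, $\operatorname{erf}(3/\sqrt{2})$; the function names are illustrative only.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Equation (3): normal density of the non-fault IC temperature."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def within_normal_range(x: float, mu: float, sigma: float, k: float = 3.0) -> bool:
    """Condition (4): True if temperature x lies within mu +/- k * sigma."""
    return abs(x - mu) <= k * sigma


# Fraction of a normal population inside mu +/- 3 sigma.
coverage = math.erf(3 / math.sqrt(2))
print(f"{coverage:.4f}")  # 0.9973
```

For example, with $\mu = 45$ °C and $\sigma = 2$ °C, a reading of 50 °C is inside the band and 52 °C is outside it.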

Table 2 Temperature difference from normal temperature.

| IC No. | Temperature difference [°C] | 3σ [°C] | Result | IC No. | Temperature difference [°C] | 3σ [°C] | Result |
|--------|-----------------------------|---------|--------|--------|-----------------------------|---------|--------|
| 1      | -7.91                       | 3.48    | O      | 12     | -2.21                       | 2.51    | O      |
| 2      | 165.65                      | 2.66    | O      | 13     | -0.32                       | 1.08    | ×      |
| 3      | 13.09                       | 5.86    | O      | 14     | 39.50                       | 2.92    | O      |
| 4      | 71.93                       | 3.74    | O      | 15     | -1.99                       | 2.48    | ×      |
| 5      | 26.47                       | 3.74    | O      | 16     | 34.21                       | 2.92    | O      |
| 6      | -2.13                       | 3.31    | O      | 17     | 17.87                       | 2.08    | ×      |
| 7      | -2.74                       | 4.13    | O      | 18     | 18.36                       | 3.04    | O      |
| 8      | -2.60                       | 4.13    | ×      | 19     | -3.39                       | 2.97    | ×      |
| 9      | 51.40                       | 10.70   | O      | 20     | -1.52                       | 3.35    | ×      |
| 10     | 2.04                        | 5.14    | ×      | 21     | -4.55                       | 2.09    | O      |
| 11     | 82.27                       | 21.10   | ×      |        |                             |         |        |

O: success of the diagnosis; ×: failure of the diagnosis

As Table 2 shows, a fault IC whose temperature lies outside the range $\mu \pm 3\sigma$ can usually be diagnosed. However, an IC outside the $\mu \pm 3\sigma$ range may actually be non-faulty: because each IC is connected to other ICs on the board, when a connected IC breaks down, even a non-fault IC may show a temperature outside range (4). Conversely, depending on the fault condition, even a fault IC may show a temperature inside range (4). It is therefore worthwhile to construct a neural network that learns heat-generation images of fault IC combinations as learning data for fault diagnosis. When range (4) is applied to Table 2, diagnosis mostly succeeds for ICs that lie far outside the range, for example IC No.2 and IC No.14.
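The claim can be checked by tabulating Table 2. The sketch below counts diagnosis successes separately for fault ICs outside and inside the band, treating an IC as outside when the magnitude of its temperature difference exceeds its 3σ value. This assumes the normal-image temperature stands in for $\mu$, which the table layout suggests but the text does not state explicitly.

```python
# Table 2: (IC No., temperature difference [deg C], 3 sigma [deg C], diagnosed?)
TABLE_2 = [
    (1, -7.91, 3.48, True), (2, 165.65, 2.66, True), (3, 13.09, 5.86, True),
    (4, 71.93, 3.74, True), (5, 26.47, 3.74, True), (6, -2.13, 3.31, True),
    (7, -2.74, 4.13, True), (8, -2.60, 4.13, False), (9, 51.40, 10.70, True),
    (10, 2.04, 5.14, False), (11, 82.27, 21.10, False), (12, -2.21, 2.51, True),
    (13, -0.32, 1.08, False), (14, 39.50, 2.92, True), (15, -1.99, 2.48, False),
    (16, 34.21, 2.92, True), (17, 17.87, 2.08, False), (18, 18.36, 3.04, True),
    (19, -3.39, 2.97, False), (20, -1.52, 3.35, False), (21, -4.55, 2.09, True),
]


def summarize(rows):
    """Return (successes, total) for ICs outside and inside the 3-sigma band.

    ASSUMPTION: 'outside' means |temperature difference| > 3 sigma, i.e. the
    normal-image temperature is taken as the mean mu of condition (4).
    """
    stats = {"outside": [0, 0], "inside": [0, 0]}
    for _ic, diff, three_sigma, ok in rows:
        key = "outside" if abs(diff) > three_sigma else "inside"
        stats[key][0] += ok   # bool counts as 0 or 1
        stats[key][1] += 1
    return {k: tuple(v) for k, v in stats.items()}


print(summarize(TABLE_2))  # {'outside': (10, 13), 'inside': (3, 8)}
```

Under this reading, 10 of the 13 fault ICs outside the band were diagnosed, against 3 of the 8 inside it; the failures outside the band (ICs 11, 17 and 19) illustrate why the band alone is not a sufficient criterion.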

For example, as shown in Figure 6, a fault IC (Fault-IC) has fan-in ICs (IC-IN1, 2) and fan-out ICs (IC-OUT1, 2, 3). Even when IC-IN1, 2 and IC-OUT1, 2, 3 are non-faulty, these ICs can take unusual temperature values outside the range $\mu \pm 3\sigma$, because their temperatures are influenced by the Fault-IC. Whether or not their temperatures fall outside the range, however, depends on the condition of the fault IC. Conversely, depending on the fault state, a fault IC itself sometimes shows a temperature inside the normal range $\mu \pm 3\sigma$. Therefore, an IC cannot simply be judged normal because its temperature lies within $\mu \pm 3\sigma$. To handle these phenomena, the neural network is constructed using various fault ICs and normal ICs as its learning data.



Fig.6 Influence of fan in and fan out IC.

#### 6. Improvement Method by Addition Emphasis

To improve this method, a new process called addition emphasis was applied to the test image. The addition threshold level is 100, determined by trial and error. Pixels whose gradation is higher than the threshold are added three times. As a result, the thermographic pixels below the threshold level become the judgment elements of the neural network. Consequently, the contrast between fault and non-fault parts of the board is shown clearly, and the diagnosis rate can be increased. Based on this trial, the procedure was added to the processing flow for all images.
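A minimal sketch of addition emphasis is given below. The text does not define "added three times" precisely, so this sketch assumes it means summing the pixel value three times (multiplying by 3) and clipping the result to the 8-bit maximum of 255; the function and parameter names are hypothetical. The text applies the process to the test image, while Figure 7 places it after the difference image, so the function simply operates on whichever 8-bit image it is given.

```python
import numpy as np

THRESHOLD = 100   # addition threshold from the text (found by trial and error)
REPEATS = 3       # "added three times"


def addition_emphasis(img: np.ndarray, threshold: int = THRESHOLD,
                      repeats: int = REPEATS) -> np.ndarray:
    """Emphasize pixels whose gradation is higher than `threshold`.

    ASSUMPTIONS: 'added three times' is read as value * repeats, and the
    result is clipped to the 8-bit range 0..255.
    """
    out = img.astype(np.int32)          # widen to avoid uint8 overflow
    mask = out > threshold
    out[mask] = out[mask] * repeats
    return np.clip(out, 0, 255).astype(np.uint8)
```

Under this reading, any pixel above 100 becomes at least 303 before clipping, so every emphasized pixel saturates to white (255) and the remaining variation lies in the pixels below the threshold.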

The improvement process by addition emphasis is shown in **Figure 7**. **Table 3** shows the fault diagnosis rate with addition emphasis. Compared with the previous method (Table 1), the total fault diagnosis rate improves by 1.72%.

Table 3 Diagnosis rate (After improvement).

| Number of fault ICs | Fault diagnosis rate [%] |
|---------------------|--------------------------|
| 1                   | 96.68                    |
| 2                   | 99.05                    |
| 3                   | 96.67                    |
| 4                   | 93.33                    |
| Total               | 96.43                    |
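The per-row effect of addition emphasis follows from subtracting Table 1 from Table 3:

```python
# Fault diagnosis rates [%]: Table 1 (before) and Table 3 (after addition emphasis).
before = {"1": 95.46, "2": 94.71, "3": 93.81, "4": 94.29, "Total": 94.71}
after = {"1": 96.68, "2": 99.05, "3": 96.67, "4": 93.33, "Total": 96.43}

gain = {k: round(after[k] - before[k], 2) for k in before}
print(gain)  # {'1': 1.22, '2': 4.34, '3': 2.86, '4': -0.96, 'Total': 1.72}
```

The total gain matches the 1.72% reported above; the largest gain is for two fault ICs, while the four-IC case decreases slightly.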

Fig.7 Improvement process: (a) test image, (b) difference image, (c) additional image, (d) output image (fault ICs: 2, 5, 13).

#### 7. Comparison with Other Methods

This section compares three methods: the standard method [4]-[6], the histogram method, and this method. Various procedures for diagnosing multiple faults in combinational circuits have been studied, but the diagnosis of sequential circuits is difficult [4]. Indeed, even single-fault diagnosis is not established for sequential circuits that lack BIST and boundary scan, and generating test inputs for multiple-fault diagnosis of sequential circuits is even more difficult [5]. Therefore, in this paper, a diagnosis program for a Z-80 chip was written, and fault simulation was used together with trial-and-error exclusion of fault candidates (the standard method).

As a result, a diagnosis rate of 84.60% was obtained for single faults.

Next, to determine which network composition is best, the same faults were diagnosed with the neural network shown in Figure 8. This network has 256 inputs and 21 outputs. The inputs are the gradation values of the difference image, and the outputs represent the fault IC numbers (1 to 21). Each output value lies between 0 and 1: if it is above 0.5, the diagnostic result is a fault; conversely, if it is below 0.5, the diagnostic result is non-fault. The fault diagnosis rate of this method is 87.06%. Even with image enhancement, the neural network of Figure 8 did not achieve a good enough fault diagnosis rate (84.23%). Therefore, by trial and error, the image was enhanced by modifying its histogram (Figure 8, upper side) before fault diagnosis; in this case, the image was emphasized by non-linear table processing, as shown in Figure 8. Applying this method, the fault diagnosis rate was 87.06%.
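The input and output encoding of the histogram method can be sketched as follows. The text says only that the 256 inputs are "the gradation value of the difference image"; this sketch assumes they are the normalized 256-bin gradation histogram, which matches the method's name but is not confirmed in the text. The names `histogram_input` and `decode_outputs` are hypothetical, and the network between them is omitted.

```python
from __future__ import annotations

import numpy as np

N_BINS = 256   # one input per gradation level
N_ICS = 21     # one output per IC on the board


def histogram_input(diff_img: np.ndarray) -> np.ndarray:
    """ASSUMED input encoding: normalized 256-bin histogram of an 8-bit
    difference image. Pixel positions are discarded."""
    counts = np.bincount(diff_img.ravel(), minlength=N_BINS)
    return counts / counts.sum()


def decode_outputs(outputs: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Return the IC numbers (1..21) whose output exceeds 0.5 (judged faulty)."""
    return [i + 1 for i, v in enumerate(outputs) if v > threshold]
```

Rearranging the pixels of an image leaves `histogram_input` unchanged, which illustrates why this encoding cannot tell which IC on the board produced a given temperature.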

The fault diagnosis rates of the three methods are shown in **Figure 9**, confirming that this method is superior to the other two. The histogram method gave a low diagnosis rate for multiple IC faults, for the reason explained below.



Fig.8 Histogram method.

In the histogram method, the gradation values (temperatures) of the whole test image are input to the neural network of Figure 8. Consequently, unlike the neural network of Figure 2, it does not use the positional information of the ICs.



Fig.9 Comparison with three methods.

Positional information is important because it carries the connection information between the ICs on the board: the neural network must learn the influence of a fault IC on the ICs connected to it, as discussed in Section 5 using Figure 6.

#### 8. Conclusion

This research establishes a new diagnosis concept that uses thermography restoration. The system verified the possibility of fault IC diagnosis by thermography restoration using a neural network trained with a small learning data set (134 images), and this method obtained a fault diagnosis rate of 96.43%. Moreover, adding the addition emphasis process improved the fault diagnosis rate by 1.72% over the method without it.

This result is obtained because the neural network, which takes the difference image as input and learns IC coordinates (IC position) together with gradation values (IC heat-generation temperature), performed fault IC image restoration satisfactorily. Another merit of this method is its short diagnosis time of about 0.5 ms, and the fault ICs are easy to recognize because they are enclosed in white boxes. In addition, it can be concluded that the fault diagnosis ability depends on the construction of the neural network.

Recently, the heat generated by a fault has become easier to observe from the outside, because ICs are increasingly mounted as bare chips. The method of this paper is also becoming more important because ICs now adopt multilayer wiring inside the chip to increase circuit density. Hence, methods that observe the heat generated by ICs are becoming more effective.

A next step is to determine how to present a wider variety of fault examples to the neural network as learning data in order to increase the diagnosis rate further.

### References

- [1] S. Nishino, K. Ohshima, "A Study on Fault Detection for IC Board by Thermography," IEICE Trans. Information and Communication, vol.J80-D-I, no.6, pp.514-526, June 1997.
- [2] S. Nishino, K. Ohshima, "Fault Detection for IC Board Using Histogram Thermography," IEICE Trans. Information and Communication, vol.J82-D-I, no.9, pp.1154-1163, Nov. 1999.
- [3] E. A. Amerasekera, D. S. Campbell, Failure Mechanisms in Semiconductor Devices, John Wiley & Sons, 1987.
- [4] John B. Gosling, Simulation in the Design of Digital Electronic Systems, Cambridge University Press, 1993.
- [5] Miron Abramovici, Melvin A. Breuer, Arthur D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
- [6] Alexander Miczo, Digital Logic Testing and Simulation, John Wiley & Sons, 1986.
- [7] S. Nishino, K. Ohshima, "A Study on LSI Board Fault Diagnosis Using Adaptive Image Restoration to the Thermography," Proc. ITC-CSCC 2003 (Korea), vol.2, pp.828-831, July 2003.

[Received Sept. 26, 2003]