
Method and architecture for DRAM defect management and status display

Abstract

The present invention relates to a method and architecture for DRAM defect management and status display. A monitoring program tests the memory at regular intervals. A backup memory page is first assigned to temporarily store the data of the memory page being tested. The data of the memory page being tested is copied to the assigned memory page, and a table of look-aside buffer (TLB) is then established to correlate the memory page being tested with the assigned memory page. Through the TLB, the memory page being tested is relocated to the assigned memory page, and all normal program access to that page is redirected to the assigned memory page. When a defect is found in a memory page being tested, the monitoring program keeps that memory page blocked and redirects all access requests for it to the assigned memory page according to the TLB. The monitoring program also drives an LCD to display information such as testing frequency, full testing report, defects found, memory utilization summary, and actual memory size. Thus, the DRAM maintains normal function and data integrity despite the defect.

TW504692B

Taiwan

Other languages
Chinese
Inventor
Chien-Tzu Hou
Hsiu-Ying Hsu

Worldwide applications
2001 TW

Application TW90109590A events
2002-10-01
Application granted

Description

504692 V. Description of the Invention (1)

The present invention relates to a method and structure for repairing defects in dynamic random access memory (DRAM) and displaying its status, and in particular to a design that redirects a bad or failed memory page in the DRAM to a predetermined backup memory page and displays various information about the memory usage status, so that the memory can keep operating smoothly even when it contains defects.

Technical background:

Over the past 25 years, the demand for the storage capacity of dynamic random access memory (hereinafter referred to as DRAM) has increased by a factor of 10^6. This growth was made possible by the introduction of the capacitor cell, the scaling of capacitors and the introduction of stacked capacitors, and the application of various transistor scaling techniques, which have greatly reduced the size of the DRAM cell and allowed each chip to hold a higher cell density. Unfortunately, along with the increase in density, the processing costs of this miniaturization have also increased rapidly. Another disadvantage of high-density DRAM is that, as the density continues to increase, even a good DRAM is prone to electrical breakdown in use, which accelerates its degradation and reduces the integrity of the data stored in it. This is a critical weakness for high-end server memory, which depends on data integrity.

As far as the stability of DRAM is concerned, its product life cycle (the bathtub curve shown in Figure 1) can be divided into three stages: the initial unstable period (infant mortality), the stable use period (useful life), and the product wear-out period. In the initial unstable period, the DRAM is cut from the wafer, tested, and packaged. Because defects caused by the process (such as impurity deposition) can prevent the memory from being accessed normally, the DRAM must undergo various tests and repairs (such as laser or capacitor repairs) to separate usable good products from those that cannot be repaired. The cost of this testing and repair accounts for a very high proportion of production cost and cannot be reduced to achieve better competitiveness.

The good products obtained after the foregoing steps can operate normally but are still extremely unstable. DRAM manufacturers therefore usually perform burn-in during the initial unstable period, using a high-temperature, high-voltage environment to bring the DRAM into its stable use period early, so that the DRAM purchased by consumers has good working stability. After a period of use, the DRAM gradually enters the product wear-out period because of the DRAM material and the voltage and temperature of the working environment. In this stage the instability of DRAM operation rises, and the system crashes or runs unstably. At present, when users notice these symptoms, they usually replace the DRAM with a new one, and the DRAM thus ends its product life.

In fact, however, a DRAM is divided into multiple memory pages as its basic storage units; that is, the aging of a DRAM shows up as the inability to access data because individual memory pages have aged. At present, most systems use an error correction code (ECC) to detect and correct data access errors. An ECC basically detects errors of up to n bits and corrects errors of up to m bits in the data. For example, a DRAM with a 64-bit bus can use an 8-bit ECC; that is, an ECC is attached to every 64 bits (8 bytes) of data to detect and correct errors, at the price of 8 additional ECC bits alongside the data bits. Lengthening the data by 8 bits increases memory cost by 1/8, so for manufacturers, balancing detection, correction, and cost, an 8-bit ECC is the most appropriate choice. Consequently, the ECC is limited to 2-bit detection and 1-bit correction. Once a single-bit error turns into a double-bit error, it becomes a hard error that cannot be repaired.

To prevent single-bit errors from turning into double-bit errors, current ECC implementations check the data periodically: normal system operation is temporarily suspended and a special routine is run to check the data for errors, and any single-bit error found is repaired immediately. However, the occurrence of a single-bit error means that DRAM operation is unstable and that system execution becomes unstable as well. Although the error is repaired on the spot, there is no guarantee that it will not recur, and it may turn into a double-bit error because of the instability, at which point the DRAM can no longer operate and must be replaced. Moreover, because ECC operation is performed entirely by hardware, the user has no way of knowing the operating status of the DRAM. In that situation the system must be shut down, refitted, and restarted from time to time. In most working environments, however, the system is not allowed to shut down, especially the internal network servers of large enterprises: a shutdown inevitably halts internal work and increases downtime costs and server memory maintenance costs.
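The 8-bit figure for a 64-bit word can be checked with a short calculation. The sketch below assumes a Hamming-style code with one extra overall parity bit (single-error correction, double-error detection); the source does not name the code family, and the function name `sec_ded_check_bits` is hypothetical.

```python
def sec_ded_check_bits(data_bits: int) -> int:
    """Check bits needed to correct 1-bit and detect 2-bit errors.

    Assumes a Hamming code plus one overall parity bit; this is an
    illustrative assumption, not something stated in the patent text.
    """
    r = 0
    # Hamming condition for single-error correction:
    # r check bits must distinguish "no error" plus every bit position.
    while 2 ** r < data_bits + r + 1:
        r += 1
    # One additional overall-parity bit adds double-error detection.
    return r + 1


check_bits = sec_ded_check_bits(64)
overhead = check_bits / 64
print(check_bits, overhead)
```

Under that assumption a 64-bit word needs 8 check bits, which matches the 1/8 cost increase described above.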
For this reason, the main purpose of the present invention is to provide a method and structure for defect repair and status display of dynamic random access memory, in which a supervisory program periodically starts a test and provides real-time testing and memory page repair throughout the three periods of the DRAM life cycle described above. DRAM manufacturers then need not perform any tests during the initial unstable period before the product leaves the factory, which saves the many costs of testing and repair, and the DRAM will not cause a crash because one of its memory pages operates abnormally during system use. This prolongs the life cycle of DRAM products. In particular, a server system that cannot be shut down can maintain normal access operation even after a fault has occurred, which reduces the number of DRAM replacements and the frequency of system shutdowns while preserving a high degree of data integrity.

Accordingly, the present invention reserves a plurality of backup memory pages in the DRAM as temporary storage for memory data during memory page testing. The memory data of the memory page under test is copied into a reserved backup memory page, and a corresponding buffer table (TLB) is then established to indicate the mapping between the memory page under test and the reserved backup memory page. Through the corresponding buffer table, the memory page under test is relocated to the reserved backup memory page, and the supervisory program temporarily freezes (blocks) access to the page under test. When the test finds a defective memory page, the supervisory program keeps the memory page under test frozen, and any access to that memory page is redirected to the reserved backup memory page according to the corresponding buffer table, so that data access operations are served by the backup memory page. The DRAM can thus operate normally, with a high degree of data integrity, whether or not it contains defects.

Another object of the present invention is to drive an LCD through the CPU to display information such as test frequency, complete report, errors found, total memory utilization, and actual memory size, so that users can observe the status of DRAM usage at any time.

Another object of the present invention is to perform an error correction code check procedure through the supervisory program when data is copied to a backup memory page.
If a single-bit or double-bit error occurs, the check procedure records whether the memory page is unstable or irreparable, and the page is checked more intensively thereafter to prevent a single-bit error from turning into a double-bit error.

The structural design and technical principles of the present invention are described in detail below, and the features of the present invention can be further understood by referring to the attached drawings.

As shown in FIG. 2, the present invention can be implemented in hardware or in software. The dynamic random access memory (DRAM) 10 architecture includes: a supervisory program 20, which regularly checks the integrity of the data stored in the DRAM; a timer 30, which supplies the test cycle frequency to the supervisory program; and a display element 40 (in this embodiment an LCD liquid crystal display element, or alternatively direct display on a monitor), which displays various messages related to the DRAM 10.

As shown in the process steps of Figure 3, after each cycle starts, the supervisory program 20 first reserves a backup memory page 12 as the temporary storage address for the data of the memory page 11 to be tested (because the DRAM 10 stores data sequentially in units of memory pages, this is usually the last memory page of the DRAM 10). The memory data of the memory page 11 under test is copied to the reserved backup memory page 12, and a corresponding buffer table (Table of Look-Aside Buffer, TLB) is created to indicate the mapping between the memory page 11 under test and the reserved backup memory page 12. Through the corresponding buffer table, the memory page 11 under test is relocated to the backup memory page 12, so the original access operations of the system are not affected. At the same time, the supervisory program 20 temporarily blocks the memory page under test and starts testing it. In this embodiment, the supervisory program 20 checks the memory page by page.

When no error is found, the data is copied back from the reserved backup memory page 12 to the memory page 11 under test, access to the page is reopened, and testing continues with the next memory page.

In the present invention, the memory page test can be implemented in the following ways:

1. Without error correction code (ECC) checking: the page is tested by an ordinary hardware test, that is, by writing to it and then reading from it in succession to check whether it can be accessed normally. If it cannot, the memory page is in error.

2. With error correction code (ECC) checking: the supervisory program performs the ECC check at the same time as it copies the data to the backup memory page. If a single-bit error occurs, the check procedure records whether the memory page is unstable or cannot be repaired, and checks it more intensively. If the error occurs again, the memory page in which it occurred is frozen to prevent the single-bit error from turning into a double-bit error, and any access to that memory page is redirected to the reserved backup memory page according to the corresponding buffer table to maintain normal access operation.

When a defect (such as the aforementioned electrical breakdown) or an error is found in the memory page 11 under test of the DRAM 10, the supervisory program 20 keeps the memory page 11 under test frozen, and any access to the memory page 11 is redirected to the reserved backup memory page 12 according to the corresponding buffer table. The original backup memory page 12 therefore remains occupied, and to test the next memory page the supervisory program 20 must reserve another backup memory page 12 to temporarily store the data of the next memory page under test.

At the same time, the display element 40 (LCD) displays the test information, such as test frequency, complete report, errors found (such as the number of ECC errors and the numbers of repairable and non-repairable errors), total memory utilization, and actual memory size, so that users can grasp the status of the DRAM 10 in real time. In addition, the display element 40 (LCD) keeps its display unchanged until the next test cycle begins.

Based on the above, the following steps can be summarized:

a. Reserve a backup memory page 12 as the temporary storage address for the data of the memory page 11 under test;
b. After each test cycle starts, copy the memory data of the memory page 11 under test to the aforementioned backup memory page 12;
c. Establish a corresponding buffer table to indicate the mapping between the memory page 11 under test and the reserved backup memory page 12, and relocate the memory page 11 under test to the reserved backup memory page 12 through the corresponding buffer table, so that access is redirected to the backup memory page;
d. Start the test;
e. If no errors are found, copy the memory data of backup memory page 12 back to the memory page 11 under test, reopen access to it, and continue with the test of the next memory page;
f. If an error is found, the supervisory program 20 keeps the memory page 11 under test frozen, and any access to that memory page is redirected to the reserved backup memory page according to the corresponding buffer table to maintain normal access operation;
g. Display the test results or the DRAM usage status through the display element.

In summary, the present invention has the following advantages:

1. DRAM manufacturers do not need to perform any testing after completing packaging. The testing is carried out entirely in the user's system and maintains normal system operation throughout the product's life, eliminating unnecessary testing and repair costs.
2. Even when a server that cannot be shut down encounters an error, the testing and function maintenance of the invention keep DRAM operation normal, and through the LCD display the maintainer can fully grasp the operating status of the DRAM 10, which minimizes downtime costs and server memory maintenance costs.
3. While the error correction code (ECC) check is being performed, the execution efficiency of the system remains normal and is not affected.
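The test cycle summarized in steps a to g can be sketched as a small simulation. This is a toy model under stated assumptions, not the patented implementation: the class `Dram`, the function `run_page_test`, the `remap` dictionary standing in for the corresponding buffer table, and the defect model (a faulty page stores zeros on every write) are all hypothetical names and simplifications introduced here.

```python
class Dram:
    """Toy DRAM model: a list of pages with a remap table (hypothetical)."""

    def __init__(self, num_pages, page_size):
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        self.faulty = set()        # simulated defective pages
        self.remap = {}            # tested page -> backup page
        self.frozen = set()        # pages blocked after a defect was found
        self.free_backups = []     # reserved backup pages

    def _raw_write(self, page, data):
        # Assumed defect model: a faulty page stores zeros instead of data.
        stored = bytes(len(data)) if page in self.faulty else data
        self.pages[page][:] = stored

    def _raw_read(self, page):
        return bytes(self.pages[page])

    # Normal program accesses go through the remap table.
    def write(self, page, data):
        self._raw_write(self.remap.get(page, page), data)

    def read(self, page):
        return self._raw_read(self.remap.get(page, page))


def run_page_test(dram, page, patterns=(0x55, 0xAA)):
    """Test one page; return True if it passed the write/read-back check."""
    backup = dram.free_backups.pop()                # a. reserve a backup page
    dram._raw_write(backup, dram._raw_read(page))   # b. copy data to backup
    dram.remap[page] = backup                       # c. redirect accesses
    size = len(dram.pages[page])
    passed = True
    for value in patterns:                          # d. write, then read back
        pattern = bytes([value]) * size
        dram._raw_write(page, pattern)
        if dram._raw_read(page) != pattern:
            passed = False
            break
    if passed:                                      # e. restore and reopen
        dram._raw_write(page, dram._raw_read(backup))
        del dram.remap[page]
        dram.free_backups.append(backup)
    else:                                           # f. keep page frozen
        dram.frozen.add(page)
    return passed


dram = Dram(num_pages=8, page_size=16)
dram.free_backups = [6, 7]        # last pages reserved as backups
dram.write(2, b"A" * 16)
dram.write(3, b"B" * 16)
dram.faulty.add(2)                # page 2 develops a defect after the write
results = {p: run_page_test(dram, p) for p in range(6)}
print(results)                    # g. report the test results
print(dram.read(2), dram.remap)
```

In this sketch the defective page 2 stays redirected to its backup page, so the data written before the defect appeared is still readable, while one backup page remains free for testing the other pages.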

In summary, the method and structure for defect repair and status display of dynamic random access memory provided by the present invention freezes and repairs damaged memory pages in real time through a supervisory program and at the same time displays the usage status, so that users can keep track of DRAM usage at any time and normal access and high data integrity are maintained despite errors. It is an effective solution to the inability of conventional memory to repair defects and the resulting need to replace the entire memory module, and it meets the requirements for a patent application. We respectfully ask the Bureau to examine the application in detail and grant the patent, to the benefit of the people and the country. The methods, techniques, descriptions, programs, and control methods described above are only preferred embodiments of the present invention; any equivalent change or modification made according to the technology within the scope of the patent application of the present invention, or any equivalent function derived from it, shall still fall within the scope covered by the patent right of the present invention, and the above shall not be used to limit the scope of implementation of the present invention.

Brief description of the drawings:

Figure 1 is the bathtub curve of DRAM;
Figure 2 is a schematic diagram of the memory module structure of the present invention;
Figure 3 is a flowchart of the operation steps of the present invention.

10 DRAM
11 Memory page under test
12 Backup memory page
20 Supervisory program
30 Timer
40 Display element


Claims (1)

  1. A method for maintaining and displaying the function of dynamic random access memory, in which a supervisory program routinely checks the memory data integrity of each memory page of the dynamic random access memory (DRAM) and maintains its normal operation, comprising the following steps: a. reserving a backup memory page as the temporary storage address for the data of the page under test; b. after each test cycle starts, copying the memory data of the memory page under test to the aforementioned backup memory page; c. establishing a corresponding buffer table to indicate the mapping between the memory page under test and the reserved backup memory page, and relocating the memory page under test to the reserved backup memory page through the corresponding buffer table, so that access is redirected to the backup memory page; d. if the test finds no error, copying the memory data of the backup memory page back to the memory page under test, reopening access to it, and continuing with the test of the next memory page; e. if the test finds an error, the supervisory program keeping the memory page under test frozen, and any access to the memory page under test being redirected to the reserved backup memory page according to the corresponding buffer table; f. displaying the test results on a display element.
    2. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the supervisory program tests the memory pages page by page.
    3. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the test cycle of the supervisory program is supplied by a timer.
    4. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the display element is a liquid crystal display (LCD) element, a monitor, or the like.
    5. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the results displayed in step f include test frequency, complete report, errors found, total memory utilization, actual memory size, and other information, so that users can keep abreast of DRAM usage.
    6. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the display content of the display element is maintained as it is until the start of the next test cycle.
    7. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the backup memory page in step e remains occupied; when the next memory page is tested, the supervisory program reserves another backup memory page to temporarily store the data of the next memory page under test, and the corresponding buffer table records both the memory page in which the defect was found and the mapping between the next memory page under test and its reserved backup memory page.
    8. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the memory page test further includes a method without error correction code checking, in which an ordinary hardware test writes to and then reads from the memory page in succession to check whether it can be accessed normally; if it cannot, the memory page is in error.
    9. The method for maintaining and displaying the function of the dynamic random access memory as described in item 1 of the scope of the patent application, wherein the memory page test further includes a method with error correction code checking, in which the supervisory program performs the error correction code check at the same time as it copies the data to the backup memory page; if a single-bit error occurs, the memory page is recorded as unstable, repaired, and checked more intensively; if the same error occurs again, the memory page is frozen to prevent the single-bit error from turning into a double-bit error; if the error disappears, step d as described in item 1 is performed.