Memory Errors

Memory is an electronic storage device, and every electronic storage device can return information different from what was originally stored. Some technologies are more prone to this than others. DRAM, because of the way it stores data, is likely to return occasional errors. It stores ones and zeros as charges on tiny capacitors that must be refreshed continually so that the data is not lost. This is less reliable than the static storage used by SRAM.

Every bit of memory is either a zero or a one, as is standard in a digital system. This in itself helps to eliminate many errors, because slightly distorted values are usually recoverable. For example, in a 5 volt system, a "1" is +5V and a "0" is 0V. If the sensor that reads the memory value sees +4.2V, it knows that this is really a "1", even though the value isn't +5V. Why? Because the only other choice is a "0", and 4.2 is much closer to 5 than to 0. On rare occasions, however, a +5V value might be read as +1.9V and treated as a "0" instead of a "1". When this happens, a memory error has occurred.
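The "nearer level wins" rule can be sketched in a few lines of Python. This is a simplified model that uses the 5 V figures from the text. The names `V_HIGH`, `V_LOW`, and `read_bit` are illustrative only. Real DRAM sense amplifiers compare a cell against a reference rather than computing distances like this.

```python
# Simplified model of reading one bit in the 5 V system described above.
# Illustrative only: real sense amplifiers compare against a reference level.

V_HIGH = 5.0  # nominal voltage for a "1"
V_LOW = 0.0   # nominal voltage for a "0"


def read_bit(sensed_volts: float) -> int:
    """Return 1 if the sensed voltage is closer to V_HIGH than to V_LOW, else 0."""
    return 1 if abs(sensed_volts - V_HIGH) < abs(sensed_volts - V_LOW) else 0


if __name__ == "__main__":
    for volts in (5.0, 4.2, 1.9, 0.3):
        # Prints 1, 1, 0, 0: the +1.9V reading of a stored "1" becomes an error.
        print(f"{volts:.1f} V -> {read_bit(volts)}")
```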

There are two kinds of errors that typically occur in a memory system. The first is called a repeatable or hard error. In this situation, a piece of hardware is broken and consistently returns incorrect results. For example, a bit may be stuck so that it always returns "0", no matter what is written to it. Hard errors usually indicate loose memory modules, blown chips, motherboard defects, or other physical problems. They are relatively easy to diagnose and correct because they are consistent and repeatable.

The second kind of error is called a transient or soft error. This occurs when a bit reads back the wrong value once but subsequently functions correctly. These problems are, understandably, much more difficult to diagnose! They are also, unfortunately, more common. A soft error will usually repeat itself eventually, but that can take anywhere from minutes to years. Soft errors are sometimes caused by memory that is physically bad. At least as often, though, they result from poor-quality motherboards, memory timings that are set too fast, static shocks, or similar problems not directly related to the memory. In addition, stray radioactivity naturally present in materials used in PC systems can cause the occasional soft error. On a system without error detection, transient errors are often written off as operating system bugs or random glitches.
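To make the difference between the two error kinds concrete, here is a minimal Python model of a single byte of memory. The class names are hypothetical. The stuck bit stands in for a hard error. The `upset()` method stands in for a soft error, such as a stray particle strike that flips one stored bit until the location is rewritten.

```python
class StuckAtZeroByte:
    """Hard error: one bit position always reads back as 0."""

    def __init__(self, stuck_bit: int) -> None:
        self.stuck_mask = 1 << stuck_bit
        self.value = 0

    def write(self, value: int) -> None:
        self.value = value & 0xFF

    def read(self) -> int:
        # The broken bit is forced to 0 on every read, so the error repeats.
        return self.value & ~self.stuck_mask & 0xFF


class SoftErrorByte:
    """Soft error: a one-time bit flip that a later write overwrites."""

    def __init__(self) -> None:
        self.value = 0

    def write(self, value: int) -> None:
        self.value = value & 0xFF

    def upset(self, bit: int) -> None:
        # Simulates a transient event flipping one stored bit.
        self.value ^= 1 << bit

    def read(self) -> int:
        return self.value


if __name__ == "__main__":
    hard = StuckAtZeroByte(stuck_bit=3)
    for _ in range(2):
        hard.write(0xFF)
        print(hex(hard.read()))   # 0xf7 both times: the failure repeats

    soft = SoftErrorByte()
    soft.write(0xFF)
    soft.upset(3)
    print(hex(soft.read()))       # 0xf7 once, after the upset
    soft.write(0xFF)
    print(hex(soft.read()))       # 0xff: the location works normally again
```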

The exact rate of errors returned by modern memory is a matter of some debate. It is agreed that the DRAMs used today are far more reliable than those of five to ten years ago. This has been the chief excuse used by system vendors who have dropped error detection support from their PCs. However, some factors make the problem worse in modern systems. First, more memory is being used. Ten years ago the typical system had 1 MB to 4 MB of memory. Today's systems usually have 16 MB to 64 MB, or much more, since RAM prices have fallen dramatically in the last three years. Second, systems today run much faster than they used to; the typical memory bus runs at 3 to 10 times the speed of those in older machines. Finally, the average PC is of much lower quality than its counterpart of 10 years ago. Cheaply thrown-together PCs, made by assembly houses whose only concern is to get the price down and the machine out the door, often use RAM of very marginal quality.

Regardless of how often memory errors occur, they do occur. How much damage they create depends on when they happen and what it is that they get wrong. If you are playing your favorite game and one of the bits controlling the color of the pixel at screen location (520, 277) is inverted from a one to a zero on one screen redraw, who cares, right? However, if you are defragmenting your hard disk and the memory location containing information to be written to the file allocation table is corrupted, it's a whole different ball game...

The only true protection from memory errors is to use some sort of error detection or correction protocol. (Well, that's not totally true. The other form of protection is prevention: buying quality components and not abusing or neglecting your system.) Some protocols, such as parity, can only detect an error in one bit of an eight-bit data byte. Others can detect errors in more than one bit. Still others, such as ECC (error-correcting code), can both detect and correct memory errors seamlessly.
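The simplest detection scheme described here is parity: one extra bit per byte. Parity flags any single-bit error but cannot fix it, and it misses errors that flip two bits. Correction needs more check bits.

The sketch below pairs even parity with a Hamming(7,4) code, a textbook single-error-correcting code, purely as an illustration. It is not what any particular memory controller implements. Real ECC memory typically applies a wider single-error-correct, double-error-detect code across 64-bit words. The function names are hypothetical.

```python
def parity_bit(byte: int) -> int:
    """Even parity: returns the bit that makes the total count of 1s even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


# Hamming(7,4): codeword positions are numbered 1..7; positions 1, 2 and 4
# hold check bits, positions 3, 5, 6 and 7 hold the four data bits.
DATA_POSITIONS = (3, 5, 6, 7)


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit pos-1 holds position pos)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 5, 6, 7
    bits = {1: p1, 2: p2, 3: d[0], 4: p4, 5: d[1], 6: d[2], 7: d[3]}
    return sum(bit << (pos - 1) for pos, bit in bits.items())


def hamming74_decode(code: int) -> tuple[int, int]:
    """Return (corrected data nibble, position of the flipped bit or 0)."""
    # The XOR of the positions of all 1 bits is 0 for a valid codeword and
    # equals the position of the bad bit when exactly one bit has flipped.
    syndrome = 0
    for pos in range(1, 8):
        if (code >> (pos - 1)) & 1:
            syndrome ^= pos
    if syndrome:
        code ^= 1 << (syndrome - 1)  # flip the bad bit back
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= ((code >> (pos - 1)) & 1) << i
    return nibble, syndrome


if __name__ == "__main__":
    byte = 0b1011_0010
    p = parity_bit(byte)
    print(parity_ok(byte ^ 0b0000_0100, p))  # False: one flipped bit is detected
    print(parity_ok(byte ^ 0b0000_0110, p))  # True: two flipped bits slip through

    word = hamming74_encode(0b1011)
    print(hamming74_decode(word ^ (1 << 4)))  # (11, 5): bit at position 5 corrected
```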

Troubleshooting Computer Memory

As electronic devices with no moving parts, memory modules seldom malfunction if they are installed properly. When problems do occur, they may be as obvious as a failed RAM check at boot or as subtle as a few corrupted bits in a datafile. The usual symptom of memory problems is that Windows displays the Blue Screen of Death. Sadly, there are so many other possible causes of a BSOD that it's of little use as a diagnostic aid.

    When Bad Memory Turns Good

    As odd as it sounds, faulty memory is seldom the cause of memory problems. When you experience memory errors, the most likely cause is a marginal, failing, or overloaded power supply. The next most likely cause is system overheating. In particular, if the system works normally when first turned on but develops problems after it's been running for a while, power supply or heat problems are the most likely cause. Only after you have eliminated these possibilities should you consider the possibility that the memory itself is defective.

As a first step in diagnosing memory problems, run Memtest86 (http://www.memtest86.com). Memtest86 is available as executables for DOS, Windows, and Linux, but the most useful form is the bootable ISO image, which can load even on a system with memory problems so severe that Windows or Linux cannot load and run. If you have a Knoppix disk handy, insert that, power up the system, type memtest at the boot prompt, and press Enter. However you get it running, configure Memtest86 to do deep testing and multiple loops. Let it run overnight, and log the results to disk.
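Memtest86's own test suite is far more elaborate, with tests such as moving inversions. The core idea behind a simple pattern test can still be sketched in Python: write a known value to every location, read everything back, and record each mismatch with its address.

The `FaultyMemory` class is a hypothetical stand-in for hardware, with one address that has a bit stuck at 1. A Python program cannot test physical RAM this way, so treat this purely as an illustration of the algorithm.

```python
from typing import Callable, List, Tuple

Mismatch = Tuple[int, int, int]  # (address, expected, actual)


def pattern_test(size: int,
                 write: Callable[[int, int], None],
                 read: Callable[[int], int],
                 patterns: Tuple[int, ...] = (0x00, 0xFF, 0x55, 0xAA)) -> List[Mismatch]:
    """Fill every address with each pattern, then read back and compare."""
    errors: List[Mismatch] = []
    for pattern in patterns:
        for addr in range(size):          # write pass
            write(addr, pattern)
        for addr in range(size):          # verify pass
            actual = read(addr)
            if actual != pattern:
                errors.append((addr, pattern, actual))
    return errors


class FaultyMemory:
    """Hypothetical simulated RAM with one bit stuck at 1 at one address."""

    def __init__(self, size: int, bad_addr: int, stuck_one_mask: int) -> None:
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.stuck_one_mask = stuck_one_mask

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        value = self.cells[addr]
        if addr == self.bad_addr:
            value |= self.stuck_one_mask  # the defective bit always reads as 1
        return value


if __name__ == "__main__":
    ram = FaultyMemory(size=64, bad_addr=42, stuck_one_mask=0x01)
    for addr, expected, actual in pattern_test(64, ram.write, ram.read):
        print(f"address {addr}: expected {expected:#04x}, got {actual:#04x}")
```

The complementary patterns matter. A bit stuck at 1 is invisible to any pattern that already has a 1 in that position. Here 0xFF and 0x55 pass, while 0x00 and 0xAA both expose the fault at address 42.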

When you examine the log, note the addresses where errors occurred. If errors occur reproducibly at the same address or nearby addresses, it's likely that the memory module is defective. If the errors occur at seemingly random addresses, it's more likely that the problem is the power supply or a system temperature that's too high. One possibility, of course, is that the system temperature spikes only when you're gaming or doing graphics work (running the CPU and video card flat out). This effect can make temperature-related component problems difficult to isolate.
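The "same or nearby addresses versus random addresses" rule of thumb can be expressed as a rough heuristic. The sketch below assumes you have already copied the failing addresses out of the Memtest86 log into a list; it does not parse any real log format. The 1 MiB window and the 80% threshold are arbitrary illustrative values, not figures from Memtest86 or from this article.

```python
from collections import Counter
from typing import Iterable


def classify_errors(addresses: Iterable[int],
                    window: int = 0x100000,
                    min_fraction: float = 0.8) -> str:
    """Label failing addresses as clustered or scattered (rough heuristic)."""
    addrs = list(addresses)
    if not addrs:
        return "no errors"
    # Group addresses into fixed-size windows and see how dominant the busiest one is.
    buckets = Counter(addr // window for addr in addrs)
    _, top_count = buckets.most_common(1)[0]
    if top_count / len(addrs) >= min_fraction:
        return "clustered: suspect the memory module"
    return "scattered: suspect power supply or heat"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    print(classify_errors([0x1A2B3C00, 0x1A2B3C04, 0x1A2B3C00, 0x1A2B4000]))
    print(classify_errors([0x00100000, 0x0A000000, 0x20000000, 0x3F000000]))
```

Fixed window boundaries can split two nearby addresses into different buckets. A sliding window would be more robust at the cost of a little more code.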

    The POST Check

    During POST (Power-On Self Test), most systems test the memory. Although the POST memory test is not nearly as exhaustive as running a memory diagnostic utility, it is useful as a "tripwire" test to warn you if severe memory problems occur. Many system BIOSs allow you to disable or abbreviate the POST memory test. We recommend leaving it enabled unless you have so much memory installed that the time required to test it at boot-up is excessive.

If the errors are random, take steps to eliminate the power or heat problem. If the errors occur at reproducible addresses, it's time to start pulling DIMMs. When troubleshooting memory problems, always

    Use standard antistatic precautions. Ground yourself by touching the case frame or power supply before you touch a memory module.
    Remove and reinstall all memory modules to ensure they are seated properly. While you're doing that, it's a good idea to clean the contacts on the memory module. Some people gently rub the contacts with a pencil eraser. We've done that ourselves, but memory manufacturers recommend against it because of possible damage to the contacts. Also, there is always the risk of a fragment from the eraser finding its way into the memory slot, where it can block one or more contacts. Better practice is to use a fresh dollar bill, which has just the right amount of abrasiveness to clean the contacts without damaging them, as shown in Figure 6-7.

The next steps you should take depend on whether you have made any changes to memory recently.

When you have not added memory

If you suspect memory problems but have not added or reconfigured memory (or been inside the case), it's unlikely that the memory itself is causing the problem. Memory does simply die sometimes, and may be killed by electrical surges, but this is uncommon, because the PC power supply itself does a good job of isolating memory and other system components from electrical damage. The most likely problem is a failing power supply. Try one or both of the following:

    If you have another system, install the suspect memory in it. If it runs there, the problem is almost certainly not the memory, but either an inadequate power supply or high temperatures inside the case.
    If you have other memory, install it in the problem system. If it works, you can safely assume that the original memory is defective. More likely, it will also fail, which strongly indicates power supply or heat problems.

If you have neither another system nor additional memory, and if your system has more than one memory module installed, use binary elimination to determine which module is bad. For example, if you have two modules installed, simply remove one module to see if that cures the problem. If you have four identical modules installed, designate them A, B, C, and D. Install only A and B, restart the system, and run the memory tests again. If no problems occur, A and B are known good and the problem must lie with C and/or D. Remove B and substitute C. If no problems occur, you know that D is bad. If the system fails with A and C, you know that C is bad, but you don't know whether D is bad. Substitute D for C and restart the system to determine if D is good.
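The elimination procedure above can be written as a small routine. This is a sketch under several assumptions:

- `passes` is a hypothetical callback standing for "install exactly these modules, restart, and run the memory tests."
- The motherboard boots with any subset of modules. Boards that require matched pairs need a different plan.
- The slots themselves are good.

The manual procedure infers that D is bad when A and C pass. This version instead tests every suspect against a known-good module, so it does not rely on only one module being bad.

```python
from typing import Callable, List, Sequence

# Hypothetical test hook: returns True if the system passes the memory tests
# with exactly these modules installed.
PassFn = Callable[[Sequence[str]], bool]


def find_bad_modules(modules: Sequence[str], passes: PassFn) -> List[str]:
    """Find bad modules by testing halves, then pairing suspects with a good one."""
    half = len(modules) // 2
    first, second = list(modules[:half]), list(modules[half:])
    if first and passes(first):
        known_good, suspects = first, second
    elif second and passes(second):
        known_good, suspects = second, first
    else:
        # Both halves fail (or there is only one module): test each module alone.
        return [m for m in modules if not passes([m])]
    reference = known_good[0]
    # Pair each suspect with a known-good module, as in "remove B and substitute C".
    return [m for m in suspects if not passes([reference, m])]


if __name__ == "__main__":
    bad = {"C"}  # hypothetical ground truth for the simulation

    def simulated_passes(installed: Sequence[str]) -> bool:
        return not any(m in bad for m in installed)

    print(find_bad_modules(["A", "B", "C", "D"], simulated_passes))  # ['C']
```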

    WINDOWS XP IS UNFORGIVING

    Windows 95, 98, 98SE, and ME do not stress memory. If you upgrade to Windows XP or Linux, memory errors may appear on a PC that seemed stable. People often assume that they did something while installing the new OS to cause the errors, but that is seldom true. Such errors almost always indicate a real problem: a marginal power supply, overheating, or defective memory. The problem was there all along, but Windows 9X simply ignored it.

When adding memory

If you experience problems when adding memory, note the following:

    If a DIMM appears not to fit, there's good reason. DIMMs are available in many different and mutually incompatible types. Every DIMM has one or more keying notches whose placement corresponds to protrusions in the memory slot. If the keying notches in the DIMM match the slot protrusions, the DIMM is compatible with that slot and can be seated. If they don't match, the DIMM is the wrong type and is physically prevented from seating in that slot.
    Make sure that the DIMM seats fully in the memory slot and that the retaining arms snap into place to secure the DIMM. A partially seated DIMM may appear to be fully seated, and may even appear to work. Sooner or later (probably sooner), problems will develop with that module.
    Verify that the modules are installed in the proper slots to match one of the supported memory configurations listed in your motherboard manual.
    If the system displays a memory mismatch error the first time you restart, that usually indicates no real problem. Follow the prompts to enter Setup, select Save and Exit, and restart the system. The system should then recognize the new memory. Some systems require these extra steps to update CMOS.
    If the system recognizes a newly installed double-sided module (one with chips on both sides) as only half its actual size, the system may recognize only single-banked or single-sided modules. Some systems limit the total number of "sides" that are recognized, so if you have some existing smaller modules installed, try removing them. The system may then recognize the double-sided modules. If not, return those modules and replace them with single-sided modules.


Diagnosing memory problems on your computer
Applies to Windows 7

If Windows detects possible problems with your computer’s memory, it will prompt you to run the Memory Diagnostics Tool.
Running the Memory Diagnostics Tool

When you receive a notification about a possible memory problem, click the notification to choose between two options for when to run the Memory Diagnostics Tool.


If you choose to restart your computer and run the tool immediately, make sure that you save your work and close all of your running programs. The Memory Diagnostics Tool will run automatically when you restart Windows. It might take several minutes for the tool to finish checking your computer's memory. Once the test is completed, Windows will restart automatically. If the tool detects errors, you should contact your computer manufacturer for information about fixing them. Memory errors usually indicate a problem with the memory chips in your computer or another hardware problem.
Advanced options for running the Memory Diagnostics Tool

We recommend that you let the Memory Diagnostics Tool run automatically. However, advanced users might want to adjust the tool's settings. Here's how:

    When the Memory Diagnostics Tool starts, press F1.

    You can adjust the following settings:

        Test mix. Choose what type of test you want to run: Basic, Standard, or Extended. The choices are described in the tool.

        Cache. Choose the cache setting you want for each test: Default, On, or Off.

        Pass count. Type the number of times you want to repeat the test.

    Press F10 to start the test.

To run the Memory Diagnostics Tool manually

If the Windows Memory Diagnostics tool doesn't run automatically, you can run it manually.

    Open Memory Diagnostics Tool by clicking the Start button, and then clicking Control Panel. In the search box, type Memory, and then click Diagnose your computer's memory problems. Administrator permission is required: if you're prompted for an administrator password or confirmation, type the password or provide confirmation.

    Choose when to run the tool.

Five tips for diagnosing memory problems

Memory problems can be tricky to troubleshoot. But working your way through these diagnostic steps can help you zero in on the cause.

As hardware problems go, memory issues can be among the toughest to diagnose. Occasionally, your computer's BIOS may flat out tell you that memory problems exist. But more often than not, you will have to find the problem on your own. This article offers five tips for diagnosing memory problems on a PC.
1: Look for odd behavior

The first step in diagnosing memory problems is to look for strange behavior, such as lockups and blue screens, that might indicate a problem with the machine's memory. For example, just last week I was attempting to make a configuration change on one of my machines. I was using a tool I've used countless times, but it kept returning error messages that made absolutely no sense. In the end, I discovered that the machine was having some memory problems.

Keep in mind that strange behavior alone does not necessarily point to a memory problem. The symptoms I have outlined can also sometimes be traced to problems with a CPU or a system board or even a malware infection. Even so, paying attention to odd behavior is a good first step in diagnosing a memory problem.

2: Run Memtest86

If you suspect that a machine might have a memory problem, I recommend running a free memory diagnostic tool called Memtest86. Unfortunately, memory diagnostic utilities such as this one are not perfect. Some of the machine's memory must be used to run the tool, and that memory range can't be tested. Furthermore, running a memory diagnostic tool usually requires you to shut down the computer you're testing and run the tool from a boot disk. In spite of these drawbacks, I have had good luck with Memtest86.

3: Listen to the beep codes

One way to diagnose memory problems without opening the computer's case or running specialized diagnostic software is to pay attention to the beep codes when you power up the machine. Since beep codes vary from one manufacturer to another, you'll have to look on the manufacturer's Web site to determine the meanings of any beeps you hear.

For example, some machines make one beep at startup to indicate that the machine is healthy. But some of the computers that use AMI BIOS don't beep at all. If you hear a single beep on such a machine, it doesn't mean that the machine is healthy. It usually indicates a DRAM refresh failure. So be sure you check the documentation for the machine you're diagnosing.

4: Check the BIOS

Sometimes, you may not have to use diagnostic software or listen to beep codes. You may be able to look at the machine's BIOS to see how much memory is reported as being installed. Not every memory failure will cause the BIOS to see less memory, but it does happen. Some BIOSes will even show you how much memory is installed in each slot. If you have such a machine and it suddenly reports less memory than is installed, check the per-slot figures. They quickly tell you which memory module is causing your problem.
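If your BIOS does report memory per slot, the comparison is simple bookkeeping. Note what you installed in each slot, and flag any slot where the BIOS reports less. The slot names and sizes below are hypothetical examples; substitute whatever your BIOS setup screen actually shows.

```python
from typing import Dict, Tuple


def short_slots(installed_mb: Dict[str, int],
                reported_mb: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {slot: (installed, reported)} for slots the BIOS under-reports."""
    return {
        slot: (size, reported_mb.get(slot, 0))
        for slot, size in installed_mb.items()
        if reported_mb.get(slot, 0) < size  # a slot missing from the report counts as 0 MB
    }


if __name__ == "__main__":
    installed = {"DIMM0": 2048, "DIMM1": 2048, "DIMM2": 2048, "DIMM3": 2048}
    reported = {"DIMM0": 2048, "DIMM1": 2048, "DIMM2": 0, "DIMM3": 2048}
    print(short_slots(installed, reported))  # {'DIMM2': (2048, 0)}
```

A slot that reports exactly half its installed size may reflect single-sided versus double-sided recognition limits rather than a failed module.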

5: Use the process of elimination

Once you're relatively sure that a memory problem exists, you have to determine which memory module has gone bad. Occasionally, you might run into a situation in which more than one memory module is bad. If this happens, you can still use the process of elimination to determine where the problem lies, but you will have to test each module individually. In most cases, however, only a single module goes bad at a time.

The first thing I recommend is to reseat all the memory in the system. I've seen quite a few situations over the years in which memory was merely loose, rather than bad. If the problem still exists after reseating, the next step is to begin using the process of elimination to determine which module is bad. Remove one memory module at a time (assuming that the machine does not require memory to be installed in pairs) and test the machine without that module. Through trial and error, you should be able to determine which one of the memory modules is to blame for the problem.
