Memory Technologies

In principle, memory technologies fall into three classes: Static, Dynamic, and Non-volatile.

 

Non-Volatile Memory

What all Non-volatile Memories have in common is that they retain their contents, i.e., the stored data, even if the power supply is switched off.  They are randomly accessible and the memory cannot be changed (there are a few caveats to this statement).

                                               

                                    ROM
                                     |
                     +---------------+---------------+
                     |                               |
                  Bipolar                           MOS
                     |                               |
             +-------+-------+       +------+-------+-------+--------+
             |               |       |      |       |       |        |
         Mask ROMs         PROMs   Mask   PROMs  EPROMs  EEPROMs   Flash
                                   ROMs

 

  ROM or mask-programmable ROM

This is a semiconductor type of read-only memory whose stored information is programmed during the manufacturing process. In essence this type of ROM is an array of possible storage cells with a final layer of metal forming the interconnections that determine which of the cells will hold 0s and which will hold 1s. Although the contents may be read, they may not be written. It is programmed by the manufacturer at the time of production using a specially constructed template. All masked ROMs are therefore very similar, since most of their layers are the same, with the differences only being introduced in the final metal mask layer.

Masked ROMs are typically manufactured in high volumes in order to minimise costs. They can cram a lot of data onto a relatively small chip area; unfortunately, because the information is programmed by the design of a mask which is used during the manufacturing process, it cannot be erased. Also, as the final metal layer is deposited at the chip factory, any repair of a masked ROM requires a long turnaround time. The process of repair consists of identifying the error(s), notifying the chip manufacturer, and altering the mask before finally fabricating new chips. Any detected bug also has the unwanted effect of leaving the user with lots of worthless parts. Most commentators refer to mask-programmable ROMs simply as ROMs.

 

PROM — A programmable read-only memory (PROM) is part of the read-only memory (ROM) family. It is a programmable chip which, once written, cannot be erased and rewritten, i.e. after programming it is read-only. To implant programs, or data, into a PROM a programming machine (called a PROM programmer) is used to apply the correct voltage for the proper time to the appropriate addresses selected by the programmer. As the PROM is simply an array of fusible links, the programming machine essentially blows the various unwanted links within the PROM, leaving the correct data patterns, a process which clearly cannot be reversed. Like the ROM, the PROM is normally used as the component within a computer that carries any permanent instructions the system may require.
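Since programming can only blow fuses, a PROM write can clear bits but never set them again. The following minimal C sketch models that one-way behaviour; the array size and function names are hypothetical, chosen just for illustration.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PROM_WORDS 16

    static uint8_t prom[PROM_WORDS];

    /* A blank PROM ships with every fuse intact: all bits read as 1. */
    static void prom_blank(void) { memset(prom, 0xFF, sizeof prom); }

    /* Programming can only blow fuses, i.e. turn 1s into 0s.  ANDing the
       new value in models this: a bit already blown (0) stays blown, so
       a second write can never restore a 1. */
    static void prom_program(unsigned addr, uint8_t value) {
        prom[addr] &= value;
    }

    int main(void) {
        prom_blank();
        prom_program(0, 0x5A);      /* first write takes effect          */
        prom_program(0, 0xFF);      /* attempted rewrite changes nothing */
        printf("%02X\n", prom[0]);  /* still 5A: programming is one-way  */
        return 0;
    }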

 

EPROM — An erasable programmable read-only memory (EPROM) is a special form of semiconductor read-only memory that can be completely erased by exposure to ultraviolet light. The device is programmed in a similar way to the programmable read-only memory (PROM); however, it does not depend on a permanent fusible link to store information, but instead relies on charges stored on capacitors in the memory array. The capacitors determine the on/off state of transistors, which in turn determine the presence of 1s or 0s in the array.

The EPROM is so arranged that the information programmed into it can be erased, if required, by exposing the top surface of the package to ultraviolet radiation. This brings about an ionizing action within the package, which causes each memory cell to be discharged. EPROMs are easily identified physically by the clear window that covers the chip to admit the ultraviolet light. Once an EPROM has been erased, it can be reprogrammed with the matrix being used again to store new information. The user can then completely erase and reprogram the contents of the memory as many times as desired.

Intel first introduced the EPROM in 1971; however, the storage capacity has increased dramatically with improving IC technology. Current EPROMs can store multiple megabytes of information.

 

EEPROM — An electrically erasable programmable ROM (EEPROM) is closely related to the erasable programmable ROM (EPROM) in that it is programmed in a similar way, but the program is erased not with ultraviolet light but electrically. Erasure of the device is achieved by applying a strong current pulse, which removes the entire program, thus leaving the device ready to be reprogrammed. The voltages necessary to erase the EEPROM can be applied to the device either externally or (more often) from within the host system, thereby allowing systems to be reprogrammed regularly without disturbing the EEPROM chips. In this way electrical erasability does yield certain benefits; however, this comes at the cost of fewer memory cells per chip and lower density than a standard ROM or EPROM.

 

 

Flash Memory

A characteristic of Flash Memories is that individual bytes can be addressed and read out, whereas write and delete processes operate on blocks of addresses at a time. Read access times, currently about 100ns, are about double those of Dynamic Memories. The number of programming and delete cycles is limited to about 100,000. In general, the retention of data is guaranteed over a period of 10 years. Among the various forms of Flash Memories available are SIMM, PC Card (PCMCIA), Compact Flash (CF) Card, Miniature Card (MC), and Solid State Floppy Disc Card (SSFDC). Over and above their exterior appearance, there are two main types of Flash Memory modules:

Linear Flash and ATA Flash. Linear Flash modules have a linear address space and any address can be directly accessed from outside. On the other hand, for the ATA Flash cards address conversion takes place internally, so that the addressing procedure is similar to that of a disk drive, a fact that may for instance simplify driver programming. Examples of the application of Flash modules are mass or program memories in notebooks, network routers, printers, PDAs, and digital cameras.
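The byte-read/block-erase asymmetry described above can be captured in a few lines of C. This is a minimal sketch, assuming a 4KB erase block; the names and sizes are illustrative, not taken from any real part.

    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE  4096                 /* assumed erase-block size */
    #define NUM_BLOCKS  256
    static uint8_t flash[NUM_BLOCKS * BLOCK_SIZE];

    /* Reads are random access: any single byte can be fetched directly. */
    uint8_t flash_read(uint32_t addr) { return flash[addr]; }

    /* Erasure works only on whole blocks, returning every byte to 0xFF. */
    void flash_erase_block(uint32_t block) {
        memset(&flash[block * BLOCK_SIZE], 0xFF, BLOCK_SIZE);
    }

    /* As with a PROM, programming can only clear bits, so rewriting a
       byte arbitrarily means first erasing the whole block it sits in. */
    void flash_program(uint32_t addr, uint8_t value) {
        flash[addr] &= value;
    }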

 

 

 

 

 

RAM (Random Access Memory)

Static RAM

This memory is based on transistor technology and does not require refreshing. It is random access and volatile, i.e. it loses its data if the power is removed. It consumes more power (thus generates more heat) than the dynamic type, and is significantly faster. It is often used in high-speed computers or as cache memory. Another disadvantage is that the technology uses more silicon space per storage cell than dynamic memory, thus chip capacities are a lot less than those of dynamic chips. Access times of less than 15ns are currently available.

 

 

 

 

 Dynamic Random Access Memory (DRAM)

 

Basic DRAM operation

 

In order to store lots of small things we can divide the storage space up into small bins and stick one item in each bin. If we are ever going to be able to retrieve a specific item, we need an organisational scheme to order the storage space. Sticking a unique, numerical address on each bin is the normal approach. The addresses will start at some number and increment by one for each bin. If we wanted to search the entire storage space, we'd start with the lowest address and step through each successive one until we reached the highest address.

Now, once we've got the storage space organised properly, we'll need a way to get the items into and out of it. For RAM storage, the data bus is what allows us to move stuff into and out of storage. And of course, since the storage space is organised, we need a way to tell the RAM exactly which location contains the exact data that we need; this is the job of the address bus. To the CPU, the RAM looks like one long, thin line of storage cells, each with a unique address. If the CPU wants a piece of data from RAM, it first places the address of the location on the address bus. It then waits a few cycles and listens on the data bus for the requested information to show up. 
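A few lines of C capture the CPU's flat view of memory: one array of byte-sized bins, indexed by a unique address. This is just an illustrative model, not real hardware behaviour.

    #include <stdint.h>

    #define MEM_SIZE 4096           /* 4096 one-byte bins                */
    static uint8_t storage[MEM_SIZE];

    /* The "address bus" is the bin number the caller supplies; the
       "data bus" is the byte that travels in or out of that bin.       */
    uint8_t mem_read(uint16_t address)             { return storage[address]; }
    void    mem_write(uint16_t address, uint8_t d) { storage[address] = d; }

    /* Scanning the whole space means stepping from the lowest address
       to the highest, one bin at a time.                               */
    uint32_t mem_sum(void) {
        uint32_t sum = 0;
        for (uint32_t a = 0; a < MEM_SIZE; a++) sum += mem_read((uint16_t)a);
        return sum;
    }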

 

 

           

 

Figure 1 Simple Model of DRAM

 

 

 

 

The round dots in the middle are memory cells, and each one is hooked into a unique address line. The address decoder takes the address off the address bus and identifies which cell the address is referring to. It then activates that cell, and the data in it drops down into the data interface, where it's placed on the data bus and sent back to the CPU. The CPU sees those cells as a row of addressable storage spaces that hold 1 byte each, so it understands memory as a row of bytes. The CPU usually grabs data in 32-bit or 64-bit chunks, depending on the width of the data bus. So if the data bus is 64 bits wide and the CPU needs one particular byte, it'll go ahead and grab the byte it needs along with the 7 bytes that are next to it. It grabs 8 bytes at a time because:

 

a) it wants to fill up the entire data bus with data every time it makes a request and

b) it'll probably end up needing those other 7 bytes shortly.

 

Instead of trying to design and manufacture a chip that's long enough to accommodate

a few million bytes in a row, a better, less expensive way to approach the problem is to organise the cells in a grid of individual bits and split up the address into rows and columns, which you can use to locate the individual bit that you need. This way, if you wanted to store, say, 1024 bits, you can use a 32 x 32 grid to do so. Obviously, a 32 x 32

grid is a much more compact design than a single row of 1024 bits.  RAM chips don't store whole bytes, but rather they store individual bits in a grid, which you can address one bit at a time.

 

When the CPU requests an individual bit it would place an address in the form of a string of 22 binary digits (for the x86) on the address bus. The RAM interface would then break that string of numbers in half, and use one half as an 11-digit row address and one half as an 11-digit column address. The row decoder would decode the row address and activate the proper row line so that all the cells on that row become active. Then the column decoder would decode the column address and activate the proper column line, selecting which particular cell on the active row is going to have its data sent back out over the data bus by the data interface. Also, note that the grid does not have to be square; in real life it's usually a rectangle where the number of rows is less than the number of columns.
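The row/column split described above is just bit slicing. A short C sketch, assuming the 22-bit address and 11/11 split used in the text:

    #include <stdint.h>
    #include <stdio.h>

    #define ROW_BITS 11
    #define COL_MASK ((1u << ROW_BITS) - 1)    /* low 11 bits */

    int main(void) {
        uint32_t addr = 0x2AB4F;               /* some 22-bit cell address     */
        uint32_t row  = addr >> ROW_BITS;      /* high half selects the row    */
        uint32_t col  = addr &  COL_MASK;      /* low half selects the column  */
        printf("addr=%05X -> row=%u, col=%u\n", addr, row, col);
        return 0;
    }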

 

 

                       

Figure 2 Row and Column Addressing

 

The cells are actually comprised of capacitors and are addressed via row and column decoders, which in turn receive their signals from the RAS and CAS clock generators. In order to minimise the package size, the row and column addresses are multiplexed into row and column address buffers. For example, if there are 11 address lines, there will be 11 row and 11 column address buffers. Sense amplifiers ('sense amps') are connected to each column and provide the read and restore operations of the chip.

 

 

DRAM Read

 

1) The row address is placed on the address pins via the address bus.

2) The /RAS pin is activated, which places the row address onto the Row Address Latch.

3) The Row Address Decoder selects the proper row to be sent to the sense amps.

4) The Write Enable (not pictured) is deactivated, so the DRAM knows that it's not being written to.

5) The column address is placed on the address pins via the address bus.

6) The /CAS pin is activated, which places the column address on the Column Address Latch.

7) The /CAS pin also serves as the Output Enable, so once the /CAS signal has stabilised the sense amps place the data from the selected row and column on the Data Out pin so that it can travel the data bus back out into the system.

8) /RAS and /CAS are both deactivated so that the cycle can begin again.

 

 

 

 

 

                       

                        Figure 3 DRAM Read

 

 

 

One of the problems with DRAM cells is that they leak their charges out over time, so that charge has to be refreshed if the DRAM is actually going to be useful as a data storage device. Reading from or writing to a DRAM cell refreshes its charge, so the most common way of refreshing a DRAM is to read periodically from each cell. This isn't quite as bad as it sounds, for a couple of reasons. First, you can sort of cheat by only activating each row using /RAS, which is how refreshing is normally done. Second, the DRAM controller takes care of scheduling the refreshes and making sure that they don't interfere with regular reads and writes. So to keep the data in a DRAM chip from leaking away, the DRAM controller periodically sweeps through all of the rows by cycling /RAS repeatedly and placing a series of row addresses on the address bus.

A RAM grid is usually organised as a rectangle rather than a perfect square. With DRAMs, it is advantageous to have fewer rows and more columns, because the fewer rows you have, the less time it takes to refresh all the rows.
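The /RAS-only refresh sweep just described boils down to a loop over row addresses. A minimal sketch in C, with stub functions standing in for the controller's pin wiggling (the names are hypothetical):

    #include <stdint.h>

    #define NUM_ROWS 2048   /* e.g. an 11-bit row address */

    static void place_row_address(uint16_t row) { (void)row; /* drive address pins */ }
    static void pulse_ras(void)                 { /* strobe /RAS low, then high */ }

    /* /RAS-only refresh: sweep every row, activating each one so its
       cells are read and rewritten by the sense amps.  No column
       address and no /CAS are needed just to refresh.                */
    void refresh_all_rows(void) {
        for (uint16_t row = 0; row < NUM_ROWS; row++) {
            place_row_address(row);
            pulse_ras();
        }
    }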

Even though the DRAM controller handles all the refreshes and tries to schedule them for maximum performance, having to go through and refresh each row every few milliseconds can seriously get in the way of reads and writes and thus impact the performance of DRAM. EDO, Fast Page, and the various other flavours of

DRAM are mainly distinguished by the ways in which they try to get around this

potential bottleneck.

 

Each of the cells in an SRAM or DRAM chip stores only a 1 or a 0. Also, the early DRAM and SRAM chips only had one Data In and one Data Out pin apiece. Now, the CPU actually sees main memory as a long row of 1-byte cells, not 1-bit cells.

Therefore, to store a complete byte, just stack eight 1-bit RAM chips together and have each chip store one bit of the final byte. This involves feeding the same address to all eight chips, and having each chip's 1-bit output go to one line of the data bus. The following diagram should help you visualise the layout. (To save space, I used a 4-bit configuration, but it should be easy to see how you can extend this to eight bits by just adding four more chips and four more data bus lines. Just pretend that the picture below is twice as wide, and that there are eight chips on the module instead of four.)

 

 

Figure 4 DRAM Organisation

 

 

 

Sticking those eight chips on one printed circuit board (PCB) with a common address and data bus would make an 8-bit RAM module. To the CPU, this single-module configuration would look just like one big RAM chip that, instead of holding just one bit in each cell, holds eight.

In the above picture, we will assume the address bus is 22 bits wide and the data bus is 8 bits wide. This means that each single chip in the module holds 2^22 = 4,194,304 bits. When the eight chips are put together on the module, with each of their 1-bit outputs connected to a single line of the 8-bit data bus, the module appears to the CPU to hold 4194304 cells of 8 bits (1 byte) each (or as a 4MB chip). So the CPU asks the module for data in 1-byte chunks from one of the 4194304 virtual 8-bit locations. In RAM notation, we say that this 4MB module is a 4194304 x 8 module (or alternatively, a 4M x 8 module. Note that the M in 4M is not equal to MB or megabyte, but to Mb or megabit.)

The now-obsolete SIMM on which I've based the 4M x 8 module discussed above is the TM4100GAD8, from Texas Instruments. The TM4100GAD8 is a 4MB, 30-pin SIMM that's organised as eight 4M x 1 DRAM chips. Here's some info from the datasheet:

 

 

                       

                        Figure 5 SIMM Module

 

 

 

 

The CPU likes to fill up its entire 32-bit (i486) or 64-bit (Pentium and up) data bus when it fetches data. So the problem arises of how to fill up that 32-bit or 64-bit data bus. Well, the answer is, unsurprisingly, more of the same. Except this time, instead of stacking the outputs of multiple chips together on one module, we stack the outputs of multiple modules together into one RAM bank. Figure 6 shows you one bank of four 8-bit modules. Assume that each chip in each module is a 4194304 x 1 chip, making each module a 4194304 x 8 (4MB) module. The resulting bank, with the 8-bit data buses from each module combined, gives a bus width of 32 bits.

 

 

           

            Figure 6 RAM Bank

 

 

The 16MB of memory that the above bank represents is broken up between the modules so that each module stores every fourth byte. So, module 1 stores byte 1, module 2 stores byte 2, module 3 stores byte 3, module 4 stores byte 4, module 1 stores byte 5, module 2 stores byte 6, and so on up to byte 16,777,216. This is done so that when the CPU needs a particular byte, it can not only grab the byte it needs but it can also put the rest of the adjacent bytes on the data bus, too, and bring them all in at the same time.
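This interleaving is just modular arithmetic. A small C sketch (counting byte addresses from 0 rather than 1, as hardware would):

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_MODULES 4

    int main(void) {
        /* Consecutive byte addresses rotate through the four modules,
           so a 32-bit fetch pulls one byte from each module at once. */
        for (uint32_t byte_addr = 0; byte_addr < 8; byte_addr++) {
            uint32_t module = byte_addr % NUM_MODULES;  /* which SIMM       */
            uint32_t offset = byte_addr / NUM_MODULES;  /* cell within SIMM */
            printf("byte %u -> module %u, offset %u\n",
                   (unsigned)byte_addr, (unsigned)module, (unsigned)offset);
        }
        return 0;
    }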

To add memory to a system like this, you can do one of two things. The first option would be to increase the size of the bank by increasing the size of each individual module by the same amount. Say you wanted 32MB of memory; you'd increase the amount of storage on each module from 4MB to 8MB. The other option would be to add more banks. The example above shows what a RAM bank on some i486 systems would actually have looked like, with each of the modules being a 30-pin, single-sided SIMM.

 

A SIMM (Single In-line Memory Module) is a basic DRAM packaging type that

fits most older systems. These SIMMs come in sizes from a few kilobytes (for really ancient SIMMs) to 16MB, and can be either single-sided (with RAM chips on one

side only, like the TM4100GAD8) or double-sided (with RAM chips on both sides).

A 30-pin SIMM, like our TM4100GAD8, can only spit out 8 bits of data at a time.

As you can see from the datasheet for the TM4100GAD8, after you put in those 8

data pins, address pins, and control pins, you just don't have too much room for

anything else. (Actually, the TM4100GAD8 had some unconnected pins, which you

could convert to address pins if you needed to put more memory on the SIMM. But

there still isn't enough room to make the data bus wider.)

The more advanced SIMM flavour is the 72-pin SIMM. These SIMMs not only come in larger sizes, but they have wider data buses as well. A 72-pin SIMM spits out 32 bits of data at a time, which means that a system based on 72-pin SIMMs can use one SIMM per bank, because the output of each SIMM is enough to fill up the entire data bus.

And then there's the DIMM, which has 168 pins and a data bus width of 64 bits.

 

Modern high density DRAM chips have more than one Data In or Data Out pin and can in fact have 4, 8, 16, 32 or even 64 data pins per chip.

When a DRAM chip has 8 Data pins, this means that it has grouped its cells internally into 8-bit chunks and it has interface circuitry that allows you to address its data 8 bits at a time, instead of one bit at a time like with the older devices. Putting 8 data pins on a single DRAM chip can make your life easier. For instance, if instead of the 4M x 1-bit chips that the TM4100GAD8 uses you used 1M x 8-bit chips, you could create a 4MB, 32-bit wide module with only 4 chips. Reducing the number of chips down to four reduces the power consumption, and makes the module easier to manufacture.

Figure 7 shows a 4MB SIMM that's organised as four, 1M x 8-bit DRAM chips.

 

 

           

            Figure 7

 

 

While 8 bits worth of data pins in a DRAM bank actually makes the memory organisation of a single SIMM a bit simpler and easier to understand, putting 16 or more bits worth of data pins on a single chip can actually make things more confusing.

The module in Figure 7 is the Texas Instruments TM124BBJ32F. The TM124BBJ32F is a 4MB, 32-bit wide DRAM module, which has only two RAM chips on it. This means that each chip is 16 bits wide and holds 2MB. Externally, however, to the system as a whole, the module appears to be made up of four 1M x 8-bit DRAM chips. Each of those 1M x 16-bit chips is almost like a mini DRAM module, with an upper and a lower 1M x 8-bit half, where each half has its own /CAS and /RAS signals.

 

Since asynchronous DRAM doesn't operate based on any kind of common system clock pulse that it shares with the CPU, the timings of the control signals, addresses and data have to be consciously taken into account. The read itself follows the same eight steps listed above under DRAM Read.

 


 

There are two main types of delays that we have to take into account. The first type includes the delays that have to take place between successive DRAM reads. You can't just fire off a read and then fire off another one immediately afterwards. Since a DRAM read involves charging and recharging capacitors, and various control signals have to propagate hither and thither so that the chip will know what it's doing, you have to stick some space in between reads so that all the signals can settle back down and the capacitors can recharge.

Of this first type of in-between-reads delay, there's only one that's going to concern us really, and that's the /RAS and /CAS precharge delay. After /RAS has been active and

you deactivate it, you've got to give it some time to charge back up before you can activate it again. Figure 8 should help you visualise this.

 

 

 

           

            Figure 8 Asynchronous DRAM timing

 

 

 

The same goes for the /CAS signal as well, and in fact to visualise the /CAS precharge delay just look at the above picture and replace the term RAS with CAS.

If you think about these /RAS and /CAS precharge delays in light of the list of DRAM read steps, it should be obvious that this rest period limits the number of reads you can do in a given period of time. Specifically, step 8 dictates that you've got to deactivate /RAS and /CAS at the end of each cycle, so the fact that after you deactivate them you've got to wait for them to precharge before you can use them again means you have to wait a while in between reads (or writes, or refreshes, for that matter).

 

This precharge time in between reads isn't the only thing that limits DRAM operations either. The other type of delay that concerns us is internal to a specific read. Just like the in-between-reads delay is associated with deactivating /RAS and /CAS, the inside-the-read delay is associated with activating /RAS and /CAS. For instance, the row access time (tRAC) is the minimum amount of time you have to wait between the moment you activate /RAS and the moment the data you want can appear on the data bus. Likewise, the column access time (tCAC) is the minimum delay between the moment you activate /CAS and the moment the data can appear on the data bus.

Think of tRAC and tCAC as the amount of time it takes the chip to fill an order you just placed at the drive-in window. You place your order (the row and column address of the data you want), and it has to go and fetch the data for you so it can place it on the data pins. Figure 9 should help you visualise how the two types of delays work.

 

 

 

           

            Figure 9 Row Access Time

 

 

Figure 10 shows both types of delay in action in a series of DRAM reads.

 

Figure 10 Complete DRAM timing diagram

 

 

Latency

 

There are two important types of latency ratings for DRAMs: access time and cycle

time, where access time is related to the second type of delays we talked about (those internal to the read cycle) and cycle time is related to the first (those in between read cycles). Both ratings are given in nanoseconds.

 

For asynchronous DRAM chips, the access time describes the amount of time it takes between when you drop that row address on the address pins and when you can expect the data to show up at the data pins. Going back to our drive-in analogy, the access time is the time between when you place your order and when your food shows up at the window. So a DIMM with a 60ns latency takes at least 60ns to get your data to you after you've placed the row address (which is of course followed by the column address) on the pins.

 

Cycle time is the amount of time you have to wait in between successive read operations. Minimising both cycle time and access time is what the next two flavours of DRAM we'll cover are all about.

 

When buying DRAM, the latency rating that you see most often is the access time. (We'll see why this is the case in a moment). The lower the access time the higher the bus speed at which you can use it. This makes sense, because higher bus speeds mean shorter processor cycles, and if the processor's cycles are short and the DRAM's latency is long, then the processor has to sit idle for more cycles. Or, to put it another way, it's more of a disaster for a fast, 1GHz PIII to have to sit around waiting on a 70ns memory access than it is for a 400MHz PII, because the faster PIII could be doing substantially more work in that 70ns than could the slower PII. So 70ns is a bigger waste of time for a processor that moves faster than it is for a processor that moves slower. And since the processor speed is a multiple of the bus speed... well, you get the picture.

If the CPU actually has to sit around and wait on some really slow DRAM to get back to it with data, you have to include wait states in its operation. Wait states are just what they sound like: they're predefined periods during which the CPU has to take time out to wait on memory. The slower the memory you're using (or the faster the CPU), the more wait states you have to insert. Wait states eat up performance, and are a Bad Thing.
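To put rough numbers on the PII/PIII comparison above (the arithmetic, not the exact wait-state mechanics, is the point):

    #include <stdio.h>

    int main(void) {
        double access_ns  = 70.0;                /* memory access time    */
        double clock_hz[] = { 400e6, 1000e6 };   /* 400MHz PII, 1GHz PIII */
        for (int i = 0; i < 2; i++) {
            double cycle_ns = 1e9 / clock_hz[i];
            printf("%5.0fMHz CPU: ~%.0f cycles idle per 70ns access\n",
                   clock_hz[i] / 1e6, access_ns / cycle_ns);
        }
        return 0;                                /* ~28 versus ~70 cycles */
    }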

 

 

FPM DRAM

Fast Page Mode DRAM is so called because it squirts out data in 4-word bursts (a word is whatever the default memory chunk size is for the DRAM, usually a byte), where the four words in each burst all come from the same row, or page. For the read that fetches the first word of that four-word burst, everything happens like a normal read: the row address is put on the address pins, /RAS goes active, the column address is put on the address pins, /CAS goes active, etc. It's the next three successive reads that look kind of strange. At the end of that initial read, instead of deactivating /RAS and then reactivating it to take the next row address, the controller just leaves /RAS active for the next three reads. Why? Since the four words all come from the same row but different columns, there's no need to keep sending in the same row address. The controller just leaves /RAS active so that to get the next three words all it has to do is send in three column addresses.

 

To sum up, you feed the FPM DRAM the row and column addresses of the initial word you want, and then you can quickly grab three more words on that same row by simply feeding it three column addresses and pumping /CAS three times for each new column. Here's a diagram that'll show you what's going on.

 

Figure 11 FPM Timing

 

 

 

As you can see from Figure 11, FPM is faster than a regular read because it takes the delays associated with both /RAS (tRAC and the /RAS precharge) and the row address out of the equation for three of the four reads. All you have to deal with are /CAS-related delays for those last three reads, which makes for less overhead and faster access and cycle times. The first read takes a larger number of CPU cycles to complete (say, 6), and the next three take a smaller number of cycles (say, 3). For an FPM DRAM where the initial read takes 6 cycles and the successive three reads take 3 cycles, we'd label it a 6-3-3-3 DRAM. I'm sure you've seen this x-y-y-y notation before. It's commonly used to describe latency in terms of bus clock cycles for both asynchronous DRAM and synchronous DRAM (SDRAM).
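At the pin level, the burst looks something like the following C sketch. The stub functions are hypothetical stand-ins for the memory controller's signal drivers:

    #include <stdint.h>

    static void assert_ras(void)          { /* drive /RAS low  */ }
    static void deassert_ras(void)        { /* drive /RAS high */ }
    static void strobe_cas(void)          { /* pulse /CAS      */ }
    static void place_address(uint16_t a) { (void)a; }
    static uint8_t read_data_pins(void)   { return 0; }

    /* Fast Page Mode burst: one row address, then four column strobes.
       /RAS stays active for the whole burst, so only the first access
       pays the row-related delays (tRAC and the /RAS precharge).      */
    void fpm_burst_read(uint16_t row, const uint16_t col[4], uint8_t out[4]) {
        place_address(row);
        assert_ras();                   /* latch the row: the page is open */
        for (int i = 0; i < 4; i++) {
            place_address(col[i]);
            strobe_cas();               /* latch column i                  */
            out[i] = read_data_pins();  /* word i appears on the data pins */
        }
        deassert_ras();                 /* close the page and precharge    */
    }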

One important thing to notice in the FPM DRAM diagram is that you can't latch the column address for the next read until the data from the previous read is gone. Notice that the Column 2 block doesn't overlap with the Data 1 block, nor does the Column 3 block overlap with the Data 2 block, and so on. The output for one read has to be completely finished before the next read can be started by placing the column address on the bus, so there's a small delay imposed, as depicted in Figure 12.

 

                                   

                                    Figure 12

 

 

EDO RAM

 

The reason I pointed out that FPM DRAM can't start a read until the previous read's data is gone is that this isn't true for Extended Data Out DRAM. With EDO DRAM, you can hold output data on the pins for longer, even if it means that the data from one read

is on the pins at the same time that you're latching in the column address for the next read.

 

 

Figure 13

 

 

 

It's like we took the FPM diagram and crammed everything in closer so that the column address blocks overlap with the data blocks from the previous read. This

ability to pipeline reads by having one start before the other is finished gives EDO a significant performance increase over its predecessor, FPM DRAM. When EDO first came out, there were claims of anywhere from 20% to 40% better performance.

 

Since EDO can put out data faster than FPM, it can be used with faster bus speeds. With EDO, you could increase the bus speed up to 66MHz without having to insert wait states. Some EDO RAMs even come with 5-2-2-2 latencies at 66MHz.

 

SDRAM

 

SDRAM is a different beast from the previous flavours of asynchronous DRAM that have been discussed so far. It's based on mostly the same basic principles (cells arranged in a grid, RAS and CAS, etc.), but the way that it's organised and controlled is quite a bit different.

DRAM is organised into memory banks, where each bank fills up the entire data bus. With SIMMs, you have to put multiple SIMMs in a bank in order to fill up an entire 32- or 64-bit data bus. DIMMs have more pins, so a single DIMM puts out enough data to fill up the entire bus, meaning you usually only need one DIMM per bank. SDRAM takes banks a step further by having multiple banks on a single DIMM. It does this not in order to fill up the entire memory bus, but because having more than one bank can significantly enhance performance.

 

Recall from the preceding DRAM discussion that /RAS and /CAS have to precharge for a little while before they can be used again after being deactivated. In an SDRAM module with two banks, you can have one bank busy precharging while the other bank is being used. Then, when you want to read a row from the other bank, it's already precharged and ready to go, so you can start using it instantly without having to wait. To accommodate this functionality, SDRAMs include a way for you to tell a particular bank to start precharging so that when you're ready to use it you don't have to wait. A two-bank SDRAM also has to include a pin, called BA0, that lets you select between banks, so you'll know which bank you're precharging, reading, or writing. When BA0 is low, Bank 0 is selected, and when it's high, Bank 1.

 

SDRAM control

 

Not only does an SDRAM's organisation into banks distinguish it from other types of DRAMs, but so does the way it's controlled. Since asynchronous DRAM doesn't share any sort of common clock signal with the CPU and chipset, the chipset has to manipulate the DRAM's control pins based on all sorts of timing considerations. SDRAM, however, shares the bus clock with the CPU and chipset, so the chipset can place commands (certain predefined combinations of signals) on its control pins on the rising clock edge.

 

SDRAM has five primary control pins on which commands can be placed by feeding them high or low signals in sync with the clock pulse. These signals are:

CKE (clock enable)
/CS (chip select)
/RAS (row address strobe)
/CAS (column address strobe)
/WE (write enable)

The following chart shows you which combinations of the above signals translate into which commands (CKE is held high throughout). An "H" indicates a "high" signal, an "L" indicates a "low" signal, and an "X" indicates that it doesn't matter what kind of signal you put on the pins. Remember that if the signal has a "/" in front of it (or a "#" behind it), then L means it's activated.

Command               /CS   /RAS   /CAS   /WE
COMMAND INHIBIT        H     X      X      X
NO OPERATION           L     H      H      H
ACTIVATE               L     L      H      H
READ                   L     H      L      H
WRITE                  L     H      L      L
BURST TERMINATE        L     H      H      L
PRECHARGE              L     L      H      L
AUTO REFRESH           L     L      L      H
LOAD MODE REGISTER     L     L      L      L

COMMAND INHIBIT and NO OPERATION 

These two commands are basically just the two different states of the Chip Select signal. As you'll recall, you use the /CS signal to tell the individual SDRAM chip one of two things: "hey, I'm talking to you here. Listen up"; or "Go back to sleep... this conversation doesn't involve you". When /CS is inactive (COMMAND INHIBIT), the SDRAM is in the latter state and won't respond to any commands placed on its pins. /CS has to be active for the chip to respond to any commands.

 NO OPERATION is a command that just activates the chip using /CS and then tells it to do nothing. Why would you want to issue a specific command to the chip that tells it to do nothing? We'll find out when we talk about CAS latency.
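The chart above, together with the /CS behaviour just described, can be cross-checked with a small C decoder. The bit encoding (bit set = electrically high) and the names are mine, for illustration only:

    #include <stdint.h>

    enum { PIN_CS = 1 << 3, PIN_RAS = 1 << 2, PIN_CAS = 1 << 1, PIN_WE = 1 << 0 };

    typedef enum {
        CMD_INHIBIT, CMD_NOP, CMD_ACTIVATE, CMD_READ, CMD_WRITE,
        CMD_BURST_TERMINATE, CMD_PRECHARGE, CMD_AUTO_REFRESH, CMD_LOAD_MODE
    } sdram_cmd;

    /* Decode the /CS, /RAS, /CAS, /WE levels into a command, following
       the chart above.  The signals are active-low, so a set bit (a
       high level) means that signal is NOT asserted.                   */
    sdram_cmd decode(uint8_t pins) {
        if (pins & PIN_CS) return CMD_INHIBIT;         /* /CS high: chip ignores all */
        switch (pins & (PIN_RAS | PIN_CAS | PIN_WE)) { /* order: /RAS /CAS /WE       */
        case PIN_RAS | PIN_CAS | PIN_WE: return CMD_NOP;             /* H H H */
        case PIN_CAS | PIN_WE:           return CMD_ACTIVATE;        /* L H H */
        case PIN_RAS | PIN_WE:           return CMD_READ;            /* H L H */
        case PIN_RAS:                    return CMD_WRITE;           /* H L L */
        case PIN_RAS | PIN_CAS:          return CMD_BURST_TERMINATE; /* H H L */
        case PIN_CAS:                    return CMD_PRECHARGE;       /* L H L */
        case PIN_WE:                     return CMD_AUTO_REFRESH;    /* L L H */
        default:                         return CMD_LOAD_MODE;       /* L L L */
        }
    }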

 

ACTIVATE, READ, and WRITE

These three commands are the ones you really need to know to do a basic READ or WRITE. You use ACTIVATE to select a particular bank and activate a particular row, READ to read a particular column, and WRITE to write to a particular bank and column. It's pretty straightforward, actually, much more so than FPM or EDO DRAM reads, in fact. Figure 14 shows a typical SDRAM READ.

 

Figure 14

 

 

Here are the steps you go through in the diagram above, broken down by clock cycle.

 

Clock 1: ACTIVATE the row by turning on /CS and /RAS. When you do this, place the proper row address on the address bus so the chip will know which row you want to ACTIVATE.

 

Clock 3: READ the column you want from the row you've ACTIVATED by turning on /CAS while placing the column's address on the address bus.

 

Clocks 5-10: The data from the row and column that you gave the chip goes out onto the Data Bus, followed by a BURST of other columns, the order of which depends on which BURST MODE you've set.

 

There's not much to a basic READ. Having commands synchronised to the bus clock really simplifies working with SDRAM.
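Reusing the sdram_cmd names from the decode sketch above, the same READ can be written down as a little command table, one entry per rising clock edge. The addresses are made up, and a CAS latency of 2 with NOP filler is assumed:

    /* Clock-by-clock sketch of the READ in Figure 14, assuming a CAS
       latency of 2.  Each entry is what the chipset drives onto the
       command and address pins at one rising clock edge.              */
    typedef struct { sdram_cmd cmd; unsigned addr; } clock_edge;

    static const clock_edge read_sequence[] = {
        { CMD_ACTIVATE, 0x2A5 }, /* clock 1: open row 0x2A5 in the bank */
        { CMD_NOP,      0     }, /* clock 2: RAS-to-CAS delay elapsing  */
        { CMD_READ,     0x09C }, /* clock 3: latch column 0x09C         */
        { CMD_NOP,      0     }, /* clock 4: CAS latency of 2 elapsing  */
        /* clocks 5 onward: the burst appears on the data pins          */
    };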

 

 

 

 

LOAD MODE REGISTER

While asynchronous DRAMs like EDO and FPM are designed to allow you to burst data onto the bus by keeping a row active and selecting only columns, SDRAM takes this a step further by being so wholly burst oriented that you can actually program a chip to spit out data bursts in predefined sequences. For instance, you can program an SDRAM so that every time you feed it a row and column address, it automatically spits out a burst of eight columns in the following order, where 0 is the column you sent it: 5-6-7-0-1-2-3-4 (or any other sequential order). Or, you could have it burst four columns onto the bus in the order 1-0-3-2. (Again, any sequence will do here.) Or, you could just have it burst the entire page. How can you program it to do this? By LOADing the SDRAM's mode register with the proper configuration information.

 

The mode register holds a 12-bit value that the SDRAM looks at in order to determine how many columns it should BURST and in what order it should BURST them. To program the mode register, you issue the LOAD MODE REGISTER command by activating /CS, /RAS, /CAS, and /WE while placing the proper op-code on the address and BA0 pins. The left side of Figure 15 shows you the format of the op-code, along with what each of the bits stands for. The chart on the right tells you the burst sequences that the BURST bits designate.

 

Figure 15

 

 

In Figure 15 you should notice that the mode register is used to set the CAS latency. So if you ever wondered how it is that you can set the CAS timing in your system's BIOS, and have that affect how your SDRAM operates, now you know. The BIOS just initialises the mode register on boot-up with the proper CAS latency.
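A sketch of composing such an op-code in C. The bit layout follows common SDRAM datasheets (bits 2:0 burst length, bit 3 burst type, bits 6:4 CAS latency); treat it as illustrative and check the datasheet for any particular part:

    #include <stdint.h>

    /* Compose a mode-register op-code.  Layout assumed here:
       bits 2:0 burst length code (0=1, 1=2, 2=4, 3=8 words),
       bit  3   burst type (0 = sequential, 1 = interleaved),
       bits 6:4 CAS latency (1, 2 or 3 clocks).                      */
    uint16_t mode_opcode(unsigned burst_len_code,
                         unsigned interleaved,
                         unsigned cas_latency) {
        return (uint16_t)((burst_len_code & 0x7)
                        | ((interleaved   & 0x1) << 3)
                        | ((cas_latency   & 0x7) << 4));
    }

    /* Example: an 8-word sequential burst with CAS latency 2 -> 0x023. */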

 

 

 

BURST TERMINATE, PRECHARGE,  and AUTO REFRESH

These three commands aren't really related, but I'm grouping them together because they're all very simple. You issue a BURST TERMINATE command to stop the burst that's currently in progress. We've already discussed the PRECHARGE command, which is used to precharge a bank so that it's ready to go when you need it. AUTO REFRESH uses the SDRAM's own internal counter to refresh its rows, so that you don't have to stick multiple row addresses on the address bus to refresh multiple rows.

 

Write Enable/Output Enable, Write Inhibit/ Output Inhibit

These commands amount to nothing more than manipulating the DQM line, a line that controls the state of the data pins. The DQM line can tell the data pins to either explicitly refuse whatever input or output you're trying to give them (Write Inhibit/Output Inhibit) or to explicitly accept whatever input or output you're trying to give them (Write Enable/Output Enable).

 

SDRAM CAS timing

The last aspect of SDRAM that bears looking at is CAS latency. As most of you probably know from buying your own SDRAM, SDRAM comes in CAS 1, CAS 2, and CAS 3 flavours. Which flavour you should buy depends on the bus speed at which you're going to run it. This section will tell you what CAS latency values actually mean, and why they relate to bus speed.

The Micron datasheet for the MT48LC4M4A1/A2 has about as good a definition of CAS latency as you can get. The CAS latency is the delay, in clock cycles, between the registration of a READ command and the availability of the first piece of output data. The latency can be set to 1, 2 or 3 clocks. If a READ command is registered at clock edge n, and the latency is m clocks, the data will be available by clock edge n+m.   

Below are 3 diagrams, also from the Micron datasheet, to help you visualise what's going on.

 

 

                       

                        Figure 16

 

 

 

 

 

 

 

 

 

                       

                        Figure 17

 

 

 

           

            Figure 18

 

 

And now you see what those mysterious NO OPERATION (NOP) commands were for. The SDRAM inserts NOPs to delay the output of a READ so that the data shows up with the proper latency.    

At this point you're probably thinking, "why would you want to insert NOPs and delay the READ's output? It seems like you'd always want to speed up a READ, not delay it, right?" Well, not exactly.

Remember from our DRAM discussion how I said that if the DRAM is really slow and the CPU is really fast, the CPU has to insert wait states into its operation so that it can wait for a DRAM read to finish? Remember how I also said that slower CPUs, since they do less work in the same amount of time, don't really have to use these wait states because they're not going that much faster compared to the memory? Well, the same sort of principles hold true for SDRAM, but the difference is that the wait states happen on the RAM chip, and not on the CPU.   

If the bus speed is really high, then those clock pulses are flying by the SDRAM really fast and pushing it really hard. (Think of Richard Simmons speeding up the pace of the workout.) Just because the clock pulses have sped up, however, doesn't mean that the SDRAM can actually operate that much faster. All those delays that were internal to a DRAM read and that made its access time (the time between when you request some data and when it shows up) higher are still there with SDRAM. It's just that SDRAM has to be synchronised with the bus clock, so if its access time is really long and the bus clock pulses are coming really fast, you've got to wait a few more clock pulses for that data to show up than you would if the access time were shorter and/or the bus clock were slower.

The advantage of moving this thumb twiddling from the processor to the SDRAM module should be obvious. Instead of having to sit around and wait for a READ to finish, the processor can fire off the READ command and then turn its attention to other things until the output data shows up. This keeps the CPU from having to waste valuable processing time, thereby increasing overall system performance.   

To see the relationship between CAS latency and bus speed in a real-world setting, let's take a look at the CAS latency chart for the SDRAM we've been studying. You'll notice that the higher the bus speed goes, the higher the CAS latency has to be. Think of this SDRAM as old and out of shape. (It is actually obsolete.) If the bus clock picks up the pace, the SDRAM just takes a larger number of beats to do what it's got to do, because it can't keep up with the faster clock.

 

 

                       

                        Figure 19

 

 

If the SDRAM we're using as an example were in better shape (read, "not obsolete and a bit more modern"), those MHz numbers would look much higher all around; it would be capable of operating at higher frequencies than 33MHz with a CAS latency of 1.

 

 

CAS latency in depth

For the truly hardcore who want an even deeper understanding of SDRAM, knowing the issues that give rise to the above chart really helps. In the diagram that I did of the SDRAM READ, you probably noticed that I didn't include any of those grey delay bars. I just lined things up along the rising edges of the clock and left it at that. Well, the delays that we talked about with asynchronous DRAM (tRAC and tCAC) are still a factor with SDRAM. In particular, tCAC (column access time), which is the minimum amount of time between the moment you latch in the column address and the moment the data shows up at the output pins, is a direct factor in determining what CAS latency you need for what bus speed. If tCAC is slow and the bus speed is fast, then it'll take more clocks for tCAC to elapse and for the output data to show up. I've redone the first SDRAM READ picture to show the effects of the tCAC and tRAC delays.

 

 

 

Figure 20

 

 

Looking at Figure 20, just imagine if we compressed about 15 clock pulses into the same amount of space as the 10 pulses shown, while leaving the tRAC and tCAC delays the same length. Obviously, there would be more clock pulses in between the time that the column address was latched onto the bus and the time that the first data showed up on the bus.

Even though Figure 20 probably implies otherwise, it's actually just tCAC that's the limiting factor in CAS latency; by the time the first piece of data shows up, tRAC has long since expired. Since tCAC is the limiting factor in the latency, you have to be able to multiply the clock period by the CAS latency and have the result be greater than or equal to tCAC. So the equation goes: tCLK x CAS latency >= tCAC. If you look at that equation, you can see that as tCLK gets shorter, the CAS latency has to get higher to compensate. This relationship yields the CAS latency chart.
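In other words, the required latency is tCAC divided by the clock period, rounded up. A quick C check, with an assumed tCAC of 20ns:

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double tCAC_ns   = 20.0;                     /* assumed column access time */
        double bus_mhz[] = { 33.0, 66.0, 100.0 };
        for (int i = 0; i < 3; i++) {
            double tCLK_ns = 1000.0 / bus_mhz[i];    /* clock period in ns */
            int latency = (int)ceil(tCAC_ns / tCLK_ns);
            printf("%5.0fMHz bus: tCLK = %5.2fns -> CAS latency %d\n",
                   bus_mhz[i], tCLK_ns, latency);
        }
        return 0;   /* prints latency 1 at 33MHz, 2 at 66MHz and 100MHz */
    }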

 

SDRAM ratings: PC66, PC100, and PC133

When it comes to rating SDRAM for sale, tCAC isn't the only delay that matters. Usually, when SDRAM is marked for sale, it's marked with three numbers of the format x-y-z. For instance, you'll see some SDRAM marked as 3-2-2. This is not the same as the x-y-y-y marking for asynchronous DRAM. The x-y-y-y marking, as you recall, gives you the access time (in clock cycles) followed by the latencies for the individual data bursts. For SDRAM, the three numbers signify, in this order: the CAS latency, the RAS-to-CAS delay, and the RAS precharge time. We've already discussed the importance of the last two items in the DRAM section, so I won't recap that. Suffice it to say that with these three numbers and a MHz speed rating, you can pretty accurately describe the performance of an SDRAM.

Speaking of MHz speed ratings, SDRAM is also rated by MHz instead of in nanoseconds like regular DRAM. This is so you can get some feel for what type of bus speed it's supposed to go with. As with most things that are rated in MHz, there's usually some headroom in the rating: SDRAMs rated at 66MHz are normally used with a 66MHz bus. Hence, such SDRAM is used in DIMMs that are labelled as PC66. PC100 and PC133 DIMMs likewise contain SDRAM chips that are rated high enough to where they're supposed to work reliably with 100MHz and 133MHz buses, respectively.

 

 

 

DDR DRAM

DDR DRAM is basically just a more advanced flavour of SDRAM, with an added twist at the data pins. SDRAM transfers its commands, addresses, and data on the rising edge of the clock. Like regular SDRAM, DDR DRAM transfers its commands and addresses on the rising edge of the clock, but unlike SDRAM it contains special circuitry behind its data pins that allows it to transfer data on both the rising and falling edges of the clock. So DDR can transfer two data words per clock cycle, as opposed to SDRAM's one word per clock cycle, effectively doubling the speed at which it can be read from or written to under optimal circumstances. Thus the "DDR" in DDR DRAM stands for "Double Data Rate", a name that it gets from this ability to transfer twice the data per clock as an SDRAM.
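The effect on peak bandwidth is simple arithmetic; a quick C illustration for a 64-bit bus at 100MHz:

    #include <stdio.h>

    int main(void) {
        double bus_mhz   = 100.0;
        double bus_bytes = 64.0 / 8.0;   /* 64-bit data bus = 8 bytes wide */
        printf("SDRAM (one word per clock):  %4.0f MB/s\n",
               bus_mhz * bus_bytes);
        printf("DDR   (two words per clock): %4.0f MB/s\n",
               bus_mhz * bus_bytes * 2.0);
        return 0;   /* 800 MB/s versus 1600 MB/s peak */
    }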

Now let's take a look at two block diagrams from two Micron DRAM datasheets. The one on the top is a DDR DRAM and the one on the bottom is a regular SDRAM.

Figure 21

 

Figure 22

 

 

As you can see from the two diagrams, there's a patch of extra logic between the DRAM array and the data pins that produces a data output strobe, called DQS, which syncs the data output to the external clock and allows DDR to transfer the results of a read on both clock edges. For writes, there's a corresponding DQS signal that must be generated by the chipset's memory interface in order to sync the write data to both edges of the clock. This strobe circuitry increases the die size a little, but the increase is fairly negligible.

Since DDR DRAM is an evolution of SDRAM, its overall approach to providing memory bandwidth is pretty much the same, aside from the fact that it transfers two words of data per clock. To recap, SDRAM (and now DDR) provides a wide, 64-bit data path from each DIMM, and has multiple banks of memory on each DIMM that can feed into that data path. Figure 23 shows a memory subsystem with 3 DIMMs, each of which contains four banks of memory.

 

 

           

            Figure 23 DDR

 

 

The little blue things are 64-bit bundles of data that the banks spit out and send back to the CPU during a read operation. Each bundle is composed of four 16-bit chunks in this picture. The red arrows show data flowing out of the DIMMs and down the main data bus, to which each of the DIMMs is connected. For write operations, just reverse the direction of the red arrows to show the data flowing into the banks.

 

RAM Banks and performance

The advantage to having multiple banks on one DIMM is that each bank can have a row (or "page") active and waiting to spit out data. Remember that only one row at a time in an individual bank can be active, and whenever you need data from a row other than the active one:

1) you've got to precharge the new row,

2) close out the active row, and then

3) open the new row for reading.

All of this stuff that's involved in switching rows eats up valuable time, so it's best to keep a particular row active as long as possible. Plus, when a row is active, you can strobe column addresses to it without having to repeat the row address, which allows you to burst data from those columns onto the bus, one column after another. So the more rows of memory that a system can have open at once, the quicker the memory can get data to the CPU whenever it asks for it. And since only one row per bank can be active at a time, having more open rows means having more banks.

Another way to think about the need for multiple banks is to think of it all in terms of keeping the data bus full. If a 64-bit wide data bus is being run at 100MHz, then it has a lot of clock pulses flying by on it every second. Since two words of data can ride on a single clock pulse (one word on the rising edge and one word on the falling edge), the bus has a lot of potential clock pulses open every second that are available to carry pairs of 64-bit data chunks. So think of each clock pulse as having two empty slots that need to be filled, and think of the 100MHz bus as a conveyor that carries those pairs of slots by at a good clip. Now, if there were only one bank of memory, then there could be only one row open on each DIMM. If you had a system with only 1 DIMM then that one row would have to put out enough data every second to fill up all of those slots. Since memory accesses often move around from row to row, that's not likely to happen. That one bank would have to keep switching rows, and since switching rows takes time, a lot of those clock pulses would have to fly by empty-handed while the bank is doing its switching.

More banks give you more open rows, so that each of those rows can do its part to fill up those clock pulses as they fly by. And if one bank needs to switch rows, another bank can (hopefully) pitch in with some data from its own active row and fill up those clock pulses itself, so that none of the pulses get wasted while that other bank is taking care of its business. So you can see that having the ability to keep multiple banks, and thus multiple rows, open is essential to keeping a high-bandwidth data bus full. Note that regular PC133 SDRAM sports multiple banks per chip, but the higher bandwidth and double data rate of DDR DRAM makes the need for multiple banks even more critical.
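A bank-state tracker makes the row-hit/row-miss cost difference concrete. This C sketch is purely illustrative; the cycle counts are assumed, not taken from any datasheet:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BANKS 4

    /* One open row per bank: a row hit can burst immediately, while a
       row miss pays the precharge/activate penalty for that bank only,
       leaving the other banks' open rows untouched.                   */
    typedef struct { bool open; uint16_t row; } bank_state;
    static bank_state banks[NUM_BANKS];

    /* Returns the cost, in assumed clock units, of reaching a row.    */
    unsigned access_row(unsigned bank, uint16_t row) {
        bank_state *b = &banks[bank];
        if (b->open && b->row == row)
            return 1;        /* row hit: just strobe a column address  */
        b->open = true;      /* row miss: precharge, then activate     */
        b->row  = row;
        return 3 + 3 + 1;    /* assumed precharge + activate + read    */
    }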

 

On an added note, this discussion illustrates why two, 512MB DIMMs of SDRAM will outperform a single, 1GB DIMM. Since each DIMM can have up to four banks, regardless of its size, spreading your memory out among multiple DIMMs offers better performance because of the increased number of banks.

 

 

 
