posted
As the ISA and PCI ROMs get bigger, the code to generate them must get more sophisticated. Here are some snippets showing code evolution that will help condense larger ROMs.
The first snippet of code is a somewhat standard method of setting memory timings, one-at-a-time. This is very easy to write, troubleshoot and understand:
code:
==============================Standard==================================
mov eax, 8000C290h      ; DRAM Configuration Low Address
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax, 0F1FFFFFFh     ; Bypass Max
or eax, 00E000000h      ; 7x
out dx, eax             ; send new data

mov eax, 8000C290h      ; DRAM Configuration Low Address
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax, 0FFFF3FFFh     ; Read/Write Queue Bypass
or eax, 00000C000h      ; 16x
out dx, eax             ; send new data
The next snippet shows how we can combine some code and only address the DRAM 32 bit register once:
code:
============================Combined====================================
mov eax, 8000C290h      ; DRAM Configuration Low Address
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax, 0F1FFFFFFh     ; Bypass Max
or eax, 00E000000h      ; 7x
and eax, 0FFFF3FFFh     ; Read/Write Queue Bypass
or eax, 00000C000h      ; 16x
out dx, eax             ; send new data
The next logical progression is to combine the dwords that mask the data and combine the dword data. I call this the "dword method". Note that this code performs the same function as the 2 above, but is much smaller:
code:
=============================DWord======================================
mov eax, 8000C290h      ; DRAM Configuration Low Address
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax, 0F1FF3FFFh     ; Bypass Max & R/W Queue Bypass
or eax, 00E00C000h      ; 7x & 16x
out dx, eax             ; send new data
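The merged masks can be checked off the target before flashing. What follows is only a minimal Python sketch and not part of the ROM code; the helper names `rmw` and `combine` are made up for illustration. It models the and/or pair as a read-modify-write on a 32-bit value and folds the two separate steps into one pair.

```python
MASK32 = 0xFFFFFFFF

def rmw(reg, and_mask, or_value):
    """Model 'and eax, and_mask' followed by 'or eax, or_value' on a dword."""
    return ((reg & and_mask) | or_value) & MASK32

def combine(steps):
    """Fold several (and_mask, or_value) pairs into one pair.

    Valid when no later and_mask clears bits set by an earlier or_value,
    which holds when each step touches a different bit field.
    """
    and_all, or_all = MASK32, 0
    for and_mask, or_value in steps:
        and_all &= and_mask
        or_all |= or_value
    return and_all, or_all

# Values from the snippets above (DRAM Configuration Low).
steps = [(0xF1FFFFFF, 0x0E000000),   # Bypass Max = 7x
         (0xFFFF3FFF, 0x0000C000)]   # Read/Write Queue Bypass = 16x

and_all, or_all = combine(steps)
print(f"and eax, 0{and_all:08X}h")   # and eax, 0F1FF3FFFh
print(f"or  eax, 0{or_all:08X}h")    # or  eax, 00E00C000h
```

The same fold should work for any number of settings in one register, as long as the bit fields do not overlap.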
Here's a dword code snippet to set 21 memory timings. Each DRAM register is read only once and written to only once:
code:
============================21 Timings====================================
mov eax,08000C288h      ; DRAM Timing Low address
mov ebx,000EDC224h      ; register data for: Cas, Trcd, Trp, Tras,
                        ; Trc, Trrd, Twr
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0FF000CC8h      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data

mov eax,08000C28Ch      ; DRAM Timing High address
mov ebx,06DB20300h      ; register data for: Twtr, Tref, Trfc
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0000CFCFFh      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data

mov eax,08000C290h      ; DRAM Configuration Low address
mov ebx,000000080h      ; register data for: Drive Strength Normal
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0FFFFFF7Fh      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data

mov eax,08000C294h      ; DRAM Configuration High address
mov ebx,00F100090h      ; register data for: Async, CRC, QBypass
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0F0EFFF0Fh      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data

mov eax,08000C2A0h      ; DRAM Miscellaneous address
mov ebx,00000012Ch      ; register data for: Idle Cycle,
                        ; Dynamic Idle, Queue Bypass, Trfc0-3
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0FFFFFE13h      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data
The next snippet I call the "Indexing Method"; it has been used in the addition of new Items. A CMOS register is read, and that data is modified into the correct data to send to a memory register as a dword. The following code is in general format:
code:
ASL:
;========================== Setup DRAM Config High Address =======================
mov ebx, 0h             ;clear
mov ecx, 0h             ;clear
;=============================== Async Latency ===================================
mov eax, 0h             ;clear
mov al, 21h             ;set index-21h
out 72h, al             ;send index thru port
in al, 73h              ;fetch data
and al, 0Fh             ;mask 0000.1111
mov cl, al              ;save for later

mov eax,08000C294h      ;DRAM Config High Address
mov dx,0CF8h            ;set port address
out dx,eax              ;send address thru port
mov dx,0CFCh            ;set port data
in eax,dx               ;fetch dword at DRAM Config High Address

mov ebx, eax            ;save dword in case no change
cmp cl, 00h             ;see if AUTO was selected
je NEXT                 ;jump, AUTO selected

mov eax, 0              ;clear
mov al, cl              ;move index to eax
                        ;mask 0Fh
add eax, 04h            ;add 04h to eax - 0000.0001+0000.0100 = 0000.0101
sal eax, 04h            ;shift left 4 bits - 0101.0000<----0000.0101
                        ;mask F0h
;add eax, 40h           ;add 40h to eax 0001.0000+0100.0000 = 0101.0000
;sal eax, 04h           ;shift left 4 bits - not needed
;============================== READ-MODIFY-WRITE ================================

and ebx,0FFFFFF0Fh      ;Mask Async Latency Bit
or eax,ebx              ;add Async Latency bits to dword
out dx,eax              ;Send dword at DRAM Config High Address
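The arithmetic of the Indexing Method can be traced by hand or with a few lines of script. This is a minimal Python sketch under assumptions: the function name `apply_async_latency` and the sample register value are made up, and the AUTO case is taken to leave the register untouched, since the NEXT label is not shown above.

```python
ASYNC_LATENCY_MASK = 0xFFFFFF0F   # clears bits 7:4, as in 'and ebx,0FFFFFF0Fh'

def apply_async_latency(reg, cmos_byte):
    """Return the dword the routine above would leave in the register.

    reg       -- dword read from DRAM Config High (08000C294h)
    cmos_byte -- byte read from CMOS index 21h
    """
    index = cmos_byte & 0x0F          # 'and al, 0Fh'
    if index == 0:                    # AUTO: the asm jumps to NEXT before its write
        return reg
    field = (index + 4) << 4          # 'add eax, 04h' then 'sal eax, 04h'
    return (reg & ASYNC_LATENCY_MASK) | field

# Sample register value; made up for the example.
print(hex(apply_async_latency(0x0F100090, 0x01)))   # 0xf100050
```

With this arithmetic, a CMOS index above 0Bh would carry out of the 4-bit field into bit 8, so the setup item presumably limits the index range.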
Edit: As I get more up to speed in Assembly, I'm noticing things I should not have done. The one for this topic is to use the instruction "shr" instead of "sar" when shifting right. "sar" copies the left-most bit back in, as it could be a sign bit, while "shr" shifts in zeros.
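For anyone who wants to see the difference between the two right shifts on a register dword, here is a small Python model. The function names `shr32` and `sar32` and the sample value are made up for illustration, and the shift count is assumed to be between 0 and 31.

```python
def shr32(value, count):
    """Logical shift right: zeros enter from the left (x86 SHR)."""
    return (value & 0xFFFFFFFF) >> count

def sar32(value, count):
    """Arithmetic shift right: the sign bit is copied in from the left (x86 SAR)."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative signed dword
    return (value >> count) & 0xFFFFFFFF

reg = 0x80000C00                  # made-up dword with bit 31 set
print(f"{shr32(reg, 4):08X}")     # 080000C0
print(f"{sar32(reg, 4):08X}")     # F80000C0
```

The two only differ when bit 31 is set, which is easy to hit with chipset registers because the upper bits are ordinary setting bits and not a sign.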
-------------------- Too Many Computers,... Too Little Time ..... Com'on ???..! Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
posted
Hi Polygon, Changing too many values in the memory controller registers at once is quite dangerous. Indeed, it improves the code compression in the module that you write, but it can introduce a subtle bug that causes instability in the system. Please read the critical update section, i.e. the "A Very Subtle Bug and Its Patch" section, in this article. FYI, the link to the section at the beginning of the article is broken and I haven't had time to fix it. Just click the nearest section link to get to the section as fast as possible.
This is the important summary:
quote:This new patch only initializes one register at a time and gives enough "CPU clock-cycle" to the PCI bus intensive routine. Personally, I think that to appropriately initialize a PCI chipset it's not enough just by relaxing the read-write timing, but more importantly we have to initialize only one register at a time in order to minimize the "sudden-load" in the chipset. This is especially true for performance-related registers within the chipset. In my tests for this new patch, I placed the call to the patch in a few places within the POST-jump-table an everyone of them work flawlessly as expected. The testing has been carried out more than 100 boot-reboot cycle for each variant
Cheers,
Pinczakko
-------------------- -- Human knowledge belongs to the world -- Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
posted
OK, here's a question. In the following code, the DRAM Timing Low address is addressed 2 times, both times using the same port address and the same data address. But doesn't dx still retain the correct port data, so that the port data does not need to be entered a 2nd time? Or do those extra clock cycles help with the subtle bug issue?
code:
mov eax,08000C288h      ; DRAM Timing Low address
mov ebx,000200000h      ; copy register data for Twr: 5T
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0FFCFFFFFh      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data

mov eax,08000C288h      ; DRAM Timing Low address
mov ebx,000800000h      ; copy register data for Trrd: 4T
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0FF3FFFFFh      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data
It seems like the code could be reduced by the lines I have commented out. Yet it still writes to 1 register at a time:
code:
mov eax,08000C288h      ; DRAM Timing Low address
mov ebx,000200000h      ; copy register data for Twr: 5T
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax,0FFCFFFFFh      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data

;mov eax,08000C288h     ; DRAM Timing Low address
;mov ebx,000800000h     ; copy register data for Trrd: 4T
;mov dx,0CF8h           ; set port address
;out dx,eax             ; send address through the port
;mov dx,0CFCh           ; set port data

in eax,dx               ; fetch data
and eax,0FF3FFFFFh      ; set data byte to zero
or eax,ebx              ; increase data by new setting
out dx,eax              ; send data through port data
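One way to compare the shortened version with the full one off the target is to model the register arithmetic. The following is a minimal Python sketch; the helper `rmw` and the starting register value are made up for illustration. In the shortened code nothing changes dx, so it still holds 0CFCh, but with `mov ebx,000800000h` commented out, ebx still carries the Twr value when the second `or eax,ebx` runs.

```python
def rmw(reg, and_mask, or_value):
    """Model the and/or read-modify-write on a 32-bit register value."""
    return ((reg & and_mask) | or_value) & 0xFFFFFFFF

TWR_MASK,  TWR_5T  = 0xFFCFFFFF, 0x00200000
TRRD_MASK, TRRD_4T = 0xFF3FFFFF, 0x00800000

start = 0x00000000                      # made-up initial register contents

# Full version: ebx is reloaded before the second pass.
full = rmw(rmw(start, TWR_MASK, TWR_5T), TRRD_MASK, TRRD_4T)

# Shortened version: 'mov ebx,000800000h' is commented out,
# so the second 'or eax,ebx' still uses the Twr value.
ebx = TWR_5T
short = rmw(rmw(start, TWR_MASK, ebx), TRRD_MASK, ebx)

print(f"{full:08X}")    # 00A00000
print(f"{short:08X}")   # 00200000
```

In this model the shortened version clears the Trrd field and never sets it, so keeping the `mov ebx` line while dropping the other four looks like the safer reduction.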
-------------------- Too Many Computers,... Too Little Time ..... Com'on ???..! Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
posted
Might sound stupid, but is your code not too fast? I don't see extra timing instructions such as jmp short $+2 or similar. I'm under the impression that those ports are not amused by too-tight coding because they need a couple of clocks to recuperate or set the return value. I might be totally off track here, but I seem to miss it in the code, unless I'm missing something or worrying about a thing that isn't an issue (on that board).
example on what I mean:
The circumstances mentioned in the article Pinczakko did are valid and, as is always the case with his articles, prove to be interesting in that the timing of the timing setup is essential. This matters when one wants to edit the memory settings at such a late stage, as is the case with ISA Option ROMs, which are IMHO often if not always executed after initialisation and almost complete execution of the System Code of the BIOS, and which write to registers in the Northbridge (e.g. on AMD these are CPU MSRs, and the AMD docs clearly state that a lot of those values should only be initialised at early POST).
I saw in the AMI code that most RD/WRMSRs, especially for the CPU/chipset/memory init, are executed in cache, hence the speed problem and the problems with reading/writing those low-level registers at a later stage (e.g. WRMSR takes somewhere close to 19 clocks to execute, more than enough time to put the system in a completely unstable state). Perhaps this is not unlike the setup of the System Clock on the MCP55, which works perfectly at an early stage, yet totally fails when done the way the BIOS code does it at a(ny) later stage.
Perhaps your code should be more dependent on the already existing setup value, with your value AND/OR-ed into it on a byte level after reading out the previous value, instead of a 'clean' mov?
That way some unexpected or un-documented values aren't overwritten....
posted
None of the code posted in the entire BIOS Workshop has jmp short $+2 or similar instructions, so I don't know how to answer your inquiries. All the varieties of code I posted above are in working BIOSes and so far exhibit no issues. That is not to say that there may not be race conditions waiting to rear their ugly heads.
When I started writing Option ROMs over a year ago, the first thing I did was disassemble working code in posted BIOSes that were written by tictac, sideefect, and a few other experienced BIOS modders. None have jmp short $+2 or similar instructions.
In the original topic Building an AWARD ISA Option ROM Discussion Thread, where tictac and Pinczakko taught us how to write ISA Option ROMs, jmp short $+2 or similar instructions were never mentioned.
Since that time, probably 50 mod BIOSes have been posted using varieties of the above code. No issue has ever been reported....
-------------------- Too Many Computers,... Too Little Time ..... Com'on ???..! Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
posted
That is probably a good question for Pinczakko, by PM as he may not see this thread...
I honestly don't know if it's needed or not, as my experience writing code is very, very limited. And just because it works doesn't necessarily mean it is "safe".
-------------------- Too Many Computers,... Too Little Time ..... Com'on ???..! Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
posted
I personally think that it's important to include an instruction that "relaxes" the timing of the register setup routine, such as:
code:
jmp $+2
The main point is to "wait" for the PCI bus (PCIe or HyperTransport in your particular case) to be ready before sending the data to initialize the chipset. Moreover, it's important to only set/reset the bit(s) that we know or are experimenting with and to leave the rest unaltered, as changing them can harm the stability of the system as well.
Anyway, back then I intended to experiment with assembly instructions that would "serialize", i.e. "flush" the cache, before executing any PCI register related activity in the code. Nonetheless, I haven't had the time until now; maybe you guys can experiment with it ;-).
One more thing we must be aware of is that certain PCI registers are "one time set" only, i.e. you can only initialize them once and cannot reinitialize them later. In Intel chipset documents, some registers are mentioned to be of that type. In a case like this, the only option we have is to patch the system BIOS, because an Option ROM simply wouldn't affect the register.
A little off-topic, but I want to let you know that I've got an AMD690G mobo right now and new experiments should come from me ;-). Looking forward to experimenting with RAM related initialization.
That's all for now guys. Keep up the good work. I'll be joining soon
-------------------- -- Human knowledge belongs to the world -- Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
So in the following code, where would the jmp short $+2 instruction(s) be placed:
code:
==============================Standard==================================
mov eax, 8000C290h      ; DRAM Configuration Low Address
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax, 0F1FFFFFFh     ; Bypass Max
or eax, 00E000000h      ; 7x
out dx, eax             ; send new data

mov eax, 8000C290h      ; DRAM Configuration Low Address
mov dx,0CF8h            ; set port address
out dx,eax              ; send address through the port
mov dx,0CFCh            ; set port data
in eax,dx               ; fetch data
and eax, 0FFFF3FFFh     ; Read/Write Queue Bypass
or eax, 00000C000h      ; 16x
out dx, eax             ; send new data
-------------------- Too Many Computers,... Too Little Time ..... Com'on ???..! Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
quote: Perhaps your code should be more dependant on the already existing setup value, and your value on a byte level AND/OR-ed into it after reading out the previous value, instead of a 'clean' mov ?
I am reading the existing value and changing only the bits of interest. When a single value in the memory timing is changed, are you saying to write only to that single register and not the whole 32-bit register, which is made up of eight 4-bit registers?
As some "timings" straddle 2 registers, and those registers contain either some or all bits for other timings, I don't know how to not re-write those bits that are not of interest when they are in the registers of interest.
Again, from the very beginning, all code generated by tictac and Pinczakko has written to the 32 bits even if only 1 setting is changed....
-------------------- Too Many Computers,... Too Little Time ..... Com'on ???..! Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
You're right that my code was writing 32-bit register(s) at once ;-). I thought it was just fine back then.
However, I realize that it's not a recommended practice after I found out about the thing in my "Critical Update" in the article linked above.
Better to code a procedure that writes 8 bits at a time, with parameters consisting of the 8-bit value and the address of the intended register. Maybe Twobombs can help with this for the time being because I'm still too busy. I'll drop by tomorrow, if I have a chance.
Anyway, as a hint, you can modify the code in that article. It should be pretty easy ;-).
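To move from the dword method to byte-at-a-time writes, the existing 32-bit and/or values can be cut into per-byte pairs. This is only a Python sketch for working out the constants off the target; the function name `split_into_byte_writes` is made up, and bytes whose mask is FFh with nothing to OR in are assumed not to need a write at all.

```python
def split_into_byte_writes(reg_offset, and_mask, or_value):
    """Break a 32-bit and/or pair into per-byte and/or pairs.

    Returns a list of (byte_offset, and8, or8) for each byte that the
    dword operation would actually modify; untouched bytes are skipped.
    """
    writes = []
    for i in range(4):                       # byte 0 is the least significant
        and8 = (and_mask >> (8 * i)) & 0xFF
        or8 = (or_value >> (8 * i)) & 0xFF
        if and8 != 0xFF or or8 != 0x00:      # this byte has bits to change
            writes.append((reg_offset + i, and8, or8))
    return writes

# DRAM Configuration Low (offset 90h) with the combined dword values.
for offset, and8, or8 in split_into_byte_writes(0x90, 0xF1FF3FFF, 0x0E00C000):
    print(f"offset {offset:02X}h: and al, 0{and8:02X}h / or al, 0{or8:02X}h")
```

For the DRAM Configuration Low values this gives two byte writes, at offsets 91h and 93h, and leaves offsets 90h and 92h alone.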
(-- Pasted below for your convenience and cross documenting --)
Read_PCI_Bus0_Byte:
mov ax, 8000h
shl eax, 10h
mov ax, cx
and al, 0FCh
mov dx, 0CF8h
out dx, eax
;<-- you can put the "relaxing" jump here -->
mov dl, 0FCh ; '?'
mov al, cl
and al, 3
add dl, al
in al, dx
;<-- you can put the "relaxing" jump here -->
retn

Write_PCI_Bus0_Byte:
xchg ax, cx
shl ecx, 10h
xchg ax, cx
mov ax, 8000h
shl eax, 10h
mov ax, cx
and al, 0FCh
mov dx, 0CF8h
out dx, eax
;<-- you can put the "relaxing" jump here -->
add dl, 4
or dl, cl
mov eax, ecx
shr eax, 10h
out dx, al
;<-- you can put the "relaxing" jump here -->
retn
;------------------------------ file: mem_optimize.asm -----------------------------------
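The address math in the two routines above can be followed with a short model. This is a Python sketch, not code from the article; the function name `pci_byte_access` is made up, and cx is assumed to hold device/function/register in the usual layout with bus 0 implied.

```python
def pci_byte_access(cx):
    """Return (config_address, data_port) as computed by the routines above.

    cx -- 16-bit word: device in bits 15:11, function in bits 10:8,
          register offset in bits 7:0 (bus 0 is implied).
    """
    # mov ax,8000h / shl eax,10h / mov ax,cx / and al,0FCh
    config_address = 0x80000000 | (cx & 0xFFFC)
    # The low two bits of the offset select the byte lane at 0CFCh..0CFFh.
    data_port = 0x0CFC + (cx & 3)
    return config_address, data_port

addr, port = pci_byte_access(0xC291)          # second byte of DRAM Configuration Low
print(f"{addr:08X}h -> port {port:04X}h")     # 8000C290h -> port 0CFDh
```

So a byte write still sends the dword-aligned address to 0CF8h; only the data port changes with the byte offset.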
-------------------- -- Human knowledge belongs to the world -- Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
posted
I forgot to say that it's highly recommended to disable interrupts with the
code:
cli
instruction before your PCI-related code and re-enable interrupts with
code:
sti
after you're done with it. This is assuming you're still coding an expansion ROM. If you make a system BIOS patch, you have to look at the current system state from the code surrounding your patch
-------------------- -- Human knowledge belongs to the world -- Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
posted
I believe it to be very good practice to disable interrupts when possible, as it greatly improves system stability while writing those system registers! Once again applause from me *clap clap clap*. I forgot about that important nugget of info; the last time I worked with that type of code the 16 MHz 68040 was regarded as 'hot', and Hblank programming was *in*.
As we're programming chipsets that often don't have all of their features documented, or have some reserved bits that spring into action when a certain condition is met, I believe it's wise not to touch variables that aren't necessary to touch. I see that way of programming a lot in BIOSes, as those guys also don't know what's going to happen further down the road(map).
A register access, such as a typical out/in pair, should be separated by wait cycles. There are several ways to do that; one of them is jmp $+2, but there are others with different characteristics. The picture below sets up some of the PCI devices on the MCP55Pro southbridge through SMBus calls; you can clearly see the jmp short $+2. A lot of links on the internet quote OUT 0EBh, AL as a wait cycle as well, which does make some sense to me, but I haven't really seen the whole idea or truth behind it.
The value retrieved from the in/out command should preferably be AND-ed at the smallest byte size possible, and in a way that leaves the state of the unused/reserved bits intact. Example is
where values are AND-ed/bit-tested in that way to function as a 'filter' for those reserved values, extracting only the value necessary. In the example the MSR Max_FID (frequency ID) is read and written to through RD/WRMSR.
So I think that, when it's 'legal', disabling interrupts (for a while) is a 'good' thing, yet it makes your code even faster, making the wait cycles even more important. ___________
On a side note, I would like to wish Pinczakko a lot of fun dissecting the AM2 board. It should prove to be a lot of fun; I think you'll like the way AMD has set up the Northbridge as part of the CPU.