» The New Rebels Haven Forum!! » General » The BIOS Workshop » Assembly Source Code Evolution (Page 1)

  This topic comprises 4 pages: 1  2  3  4   
Author Topic: Assembly Source Code Evolution
Polygon
Admin
Member # 1

As the ISA and PCI ROMs get bigger, the code that generates them must get more sophisticated. Here are some snippets showing how the code has evolved; the later versions help condense larger ROMs.

The first snippet is a fairly standard method of setting memory timings one at a time. It is very easy to write, troubleshoot and understand:
code:
 ==============================Standard==================================
mov eax, 8000C290h ; DRAM Configuration Low Address
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax, 0F1FFFFFFh ; Bypass Max
or eax, 00E000000h ; 7x
out dx, eax ; send new data

mov eax, 8000C290h ; DRAM Configuration Low Address
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax, 0FFFF3FFFh ; Read/Write Queue Bypass
or eax, 00000C000h ; 16x
out dx, eax ; send new data
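
For readers new to the 0CF8h/0CFCh mechanism, here is a minimal Python sketch (a model of the arithmetic only, not ROM code) of how the address dword written to port 0CF8h breaks down. It assumes the standard PCI configuration mechanism #1 bit layout; the function name decode_config_address is hypothetical.

```python
def decode_config_address(addr):
    """Split a CONFIG_ADDRESS dword (written to port 0CF8h) into its fields."""
    return {
        "enable": (addr >> 31) & 0x1,     # bit 31: enable configuration access
        "bus": (addr >> 16) & 0xFF,       # bits 23:16
        "device": (addr >> 11) & 0x1F,    # bits 15:11
        "function": (addr >> 8) & 0x7,    # bits 10:8
        "register": addr & 0xFC,          # bits 7:2, dword-aligned offset
    }


fields = decode_config_address(0x8000C290)
for name, value in fields.items():
    print(f"{name:9s} = {value:#x}")
```

Under that layout, 8000C290h selects bus 0, device 18h, function 2, register 90h.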

The next snippet shows how we can combine some code and address the 32-bit DRAM register only once:
code:
============================Combined====================================
mov eax, 8000C290h ; DRAM Configuration Low Address
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax, 0F1FFFFFFh ; Bypass Max
or eax, 00E000000h ; 7x
and eax, 0FFFF3FFFh ; Read/Write Queue Bypass
or eax, 00000C000h ; 16x
out dx, eax ; send new data

The next logical progression is to merge the AND masks into a single dword and the OR data into a single dword. I call this the "dword method". Note that this code performs the same function as the two snippets above, but is much smaller:
code:
 =============================DWord======================================
mov eax, 8000C290h ; DRAM Configuration Low Address
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax, 0F1FF3FFFh ; Bypass Max & R/W Queue Bypass
or eax, 00E00C000h ; 7x & 16x
out dx, eax ; send new data
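
The merged constants can be checked off-line. The following Python sketch (an illustration only, with hypothetical function names) rebuilds the combined AND mask and OR data from the two separate pairs and confirms that the one-step version gives the same result as the two-step version on a few sample register values.

```python
def sequential(value):
    """Two separate read-modify-write steps, as in the Combined snippet."""
    value = (value & 0xF1FFFFFF) | 0x0E000000   # Bypass Max: 7x
    value = (value & 0xFFFF3FFF) | 0x0000C000   # Read/Write Queue Bypass: 16x
    return value & 0xFFFFFFFF


def combined(value):
    """One step with the merged constants, as in the DWord snippet."""
    return ((value & 0xF1FF3FFF) | 0x0E00C000) & 0xFFFFFFFF


combined_and = 0xF1FFFFFF & 0xFFFF3FFF   # AND the masks together
combined_or = 0x0E000000 | 0x0000C000    # OR the data together
samples = [0x00000000, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF]
all_match = all(sequential(v) == combined(v) for v in samples)
print(f"{combined_and:08X} {combined_or:08X} {all_match}")
```

The merge is only safe because the two fields do not overlap: the second mask does not clear any bit that the first OR sets.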


Here's a dword code snippet to set 21 memory timings. Each DRAM register is read only once and written to only once:
code:
============================21 Timings====================================
mov eax,08000C288h ; DRAM Timing Low address
mov ebx,000EDC224h ; register data for: Cas, Trcd, Trp, Tras,
; Trc, Trrd, Twr
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0FF000CC8h ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data

mov eax,08000C28Ch ; DRAM Timing High address
mov ebx,06DB20300h ; register data for: Twtr, Tref, Trfc
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0000CFCFFh ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data

mov eax,08000C290h ; DRAM Configuration Low address
mov ebx,000000080h ; register data for: Drive Strength Normal
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0FFFFFF7Fh ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data

mov eax,08000C294h ; DRAM Configuration High address
mov ebx,00F100090h ; register data for: Async, CRC, QBypass
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0F0EFFF0Fh ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data

mov eax,08000C2A0h ; DRAM Miscellaneous address
mov ebx,00000012Ch ; register data for: Idle Cycle,
; Dynamic Idle, Queue Bypass, Trfc0-3
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0FFFFFE13h ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data
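
With this many constants it is easy to set a data bit that the mask does not clear. A small Python sketch (an off-line sanity check only; the table and helper names are hypothetical) that verifies every OR value above falls entirely inside the bits zeroed by its AND mask:

```python
# (AND mask, OR data) pairs copied from the snippet above
PAIRS = {
    "DRAM Timing Low":         (0xFF000CC8, 0x00EDC224),
    "DRAM Timing High":        (0x000CFCFF, 0x6DB20300),
    "DRAM Configuration Low":  (0xFFFFFF7F, 0x00000080),
    "DRAM Configuration High": (0xF0EFFF0F, 0x0F100090),
    "DRAM Miscellaneous":      (0xFFFFFE13, 0x0000012C),
}


def stray_bits(and_mask, or_data):
    """Data bits that land on bits the mask preserves (should be 0)."""
    return or_data & and_mask


report = {name: stray_bits(m, d) for name, (m, d) in PAIRS.items()}
for name, stray in report.items():
    print(f"{name:24s} stray bits: {stray:08X}")
```

A non-zero result would mean the OR can only set, never clear, that bit, so the field might not end up at the intended value.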

I call the next snippet the "Indexing Method"; it has been used when adding new items. A CMOS register is read, and that data is converted into the value that is written to a memory register as part of a dword. The following code is in general format:

code:
 ASL:
;========================== Setup DRAM Config High Address =======================
mov ebx, 0h ;clear
mov ecx, 0h ;clear
;=============================== Async Latency ===================================
mov eax, 0h ;clear
mov al, 21h ;set index-21h
out 72h, al ;send index thru port
in al, 73h ;fetch data
and al, 0Fh ;mask 0000.1111
mov cl, al ;save for later

mov eax,08000C294h ;DRAM Config High Address
mov dx,0CF8h ;set port address
out dx,eax ;send address thru port
mov dx,0CFCh ;set port data
in eax,dx ;fetch dword at DRAM Config High Address

mov ebx, eax ;save dword in case no change
cmp cl, 00h ;see if AUTO was selected
je NEXT ;jump, AUTO selected

mov eax, 0 ;clear
mov al, cl ;move index to eax
;mask 0Fh
add eax, 04h ;add 04h to eax - 0000.0001+0000.0100 = 0000.0101
sal eax, 04h ;shift left 4 bits - 0101.0000<----0000.0101
;mask F0h
;add eax, 40h ;add 40h to eax 0001.0000+0100.0000 = 0101.0000
;sal eax, 04h ;shift left 4 bits - not needed
;============================== READ-MODIFY-WRITE ================================

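and ebx,0FFFFFF0Fh ;clear the Async Latency field (bits 7:4)
or eax,ebx ;merge the new Async Latency bits into the dword
out dx,eax ;write the dword back to DRAM Config High Address

NEXT: ;AUTO selected: register left unchanged

; Settings: Auto, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (indexes 00h-0Bh); CMOS mask 0Fh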
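
The index arithmetic can be modelled in a few lines of Python. This is a sketch of the calculation only, not ROM code; the function name is hypothetical, and the range check for indexes above 0Bh is an added assumption that is not in the assembly above.

```python
def async_latency_dword(current, cmos_byte):
    """Return the dword to write back, given the register value and CMOS byte."""
    index = cmos_byte & 0x0F              # and al, 0Fh
    if index == 0:                        # AUTO: leave the register alone
        return current
    if index > 0x0B:                      # assumption: treat unused indexes as AUTO,
        return current                    # otherwise (index + 4) << 4 spills into bit 8
    field = (index + 4) << 4              # add eax, 04h / sal eax, 04h
    return (current & 0xFFFFFF0F) | field # clear bits 7:4, then merge


auto = async_latency_dword(0x12345678, 0x00)
lowest = async_latency_dword(0x12345678, 0x01)    # setting "5"
highest = async_latency_dword(0x12345678, 0x0B)   # setting "15"
print(f"{auto:08X} {lowest:08X} {highest:08X}")
```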

Edit: As I get more up to speed in Assembly, I'm noticing things I should not have done. The one for this topic is to use the instruction "shr" instead of "sar" when shifting right. "sar" preserves the left-most bit because it treats it as a sign bit, so it can shift 1s into the top of the value.
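
The difference between the two right shifts is easy to see on a value with the top bit set. A minimal Python model of the 32-bit behaviour (the helper names shr32 and sar32 are hypothetical):

```python
def shr32(value, count):
    """Logical shift right: zeros are shifted in from the left."""
    return (value & 0xFFFFFFFF) >> count


def sar32(value, count):
    """Arithmetic shift right: copies of the sign bit are shifted in."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:        # reinterpret as a negative 32-bit number
        value -= 1 << 32
    return (value >> count) & 0xFFFFFFFF


print(f"shr: {shr32(0x80000000, 4):08X}")   # top bits become 0
print(f"sar: {sar32(0x80000000, 4):08X}")   # top bits become 1
```

When the top bit is clear, both give the same result, which is why the mistake can go unnoticed for a long time.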

--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Pinczakko
Honorary Member
Member # 699

Hi Polygon,
Changing too many values in the memory controller registers at once is quite dangerous. It does improve the code compression in the module that you write, but it can introduce a subtle bug that causes instability in the system. Please read the critical update section, i.e. the "A Very Subtle Bug and Its Patch" section, in this article. FYI, the link to that section at the beginning of the article is broken and I haven't had time to fix it. Just click the nearest section link to get to the section as fast as possible.

This is the important summary:

quote:
This new patch only initializes one register at a time and gives enough "CPU clock-cycle" to the PCI bus intensive routine. Personally, I think that to appropriately initialize a PCI chipset it's not enough just to relax the read-write timing; more importantly, we have to initialize only one register at a time in order to minimize the "sudden-load" in the chipset. This is especially true for performance-related registers within the chipset. In my tests for this new patch, I placed the call to the patch in a few places within the POST-jump-table and every one of them works flawlessly as expected. The testing has been carried out for more than 100 boot-reboot cycles for each variant.
Cheers,

Pinczakko

--------------------
-- Human knowledge belongs to the world --

Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
Polygon
Admin
Member # 1

OK, thanks! I'll check it out. [Smile]

Edit: Yes, I remember reading that several times, but I didn't remember it a year later [Big Grin]

--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Polygon
Admin
Member # 1

OK, here's a question. In the following code, the DRAM Timing Low register is addressed twice, both times using the same address port and the same data port. Doesn't the chipset still hold the address, and dx still hold the data port, so that neither needs to be set up a second time? Or do those extra clock cycles help with the subtle bug issue?


code:
mov eax,08000C288h ; DRAM Timing Low address
mov ebx,000200000h ; copy register data for Twr: 5T
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0FFCFFFFFh ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data

mov eax,08000C288h ; DRAM Timing Low address
mov ebx,000800000h ; copy register data for Trrd: 4T
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0FF3FFFFFh ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data

It seems like the code could be reduced by the lines I have commented out. Yet it still writes to 1 register at a time:

code:
mov eax,08000C288h ; DRAM Timing Low address
mov ebx,000200000h ; copy register data for Twr: 5T
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax,0FFCFFFFFh ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data

;mov eax,08000C288h ; DRAM Timing Low address
mov ebx,000800000h ; copy register data for Trrd: 4T (must stay: ebx still holds the Twr data)
;mov dx,0CF8h ; set port address
;out dx,eax ; send address through the port
;mov dx,0CFCh ; set port data

in eax,dx ; fetch data
and eax,0FF3FFFFFh ; set data byte to zero
or eax,ebx ; increase data by new setting
out dx,eax ; send data through port data
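
The question can be explored with a toy model. The Python sketch below assumes the standard PCI configuration mechanism #1 behaviour, where the address written to 0CF8h stays latched until something writes that port again; it also assumes nothing else (an interrupt handler, for example) touches 0CF8h between the two steps. The class and function names are hypothetical, and the model says nothing about timing or the subtle bug issue.

```python
class ToyConfigSpace:
    """Toy model of the 0CF8h (address) / 0CFCh (data) port pair."""

    def __init__(self, initial):
        self.address = 0              # last dword written to port 0CF8h
        self.regs = dict(initial)     # config address -> 32-bit value

    def out32(self, port, value):
        if port == 0xCF8:
            self.address = value & 0xFFFFFFFF
        elif port == 0xCFC:
            self.regs[self.address] = value & 0xFFFFFFFF

    def in32(self, port):
        if port == 0xCF8:
            return self.address
        if port == 0xCFC:
            return self.regs.get(self.address, 0xFFFFFFFF)
        return 0xFFFFFFFF


def rmw(bus, and_mask, or_data):
    """in eax,dx / and / or / out dx,eax with dx = 0CFCh."""
    value = bus.in32(0xCFC)
    bus.out32(0xCFC, (value & and_mask) | or_data)


bus = ToyConfigSpace({0x8000C288: 0x00FFFFFF})
bus.out32(0xCF8, 0x8000C288)          # select DRAM Timing Low once
rmw(bus, 0xFFCFFFFF, 0x00200000)      # Twr: 5T
rmw(bus, 0xFF3FFFFF, 0x00800000)      # Trrd: 4T, address not re-sent
result = bus.regs[0x8000C288]
print(f"{result:08X}")
```

In the model both fields end up set even though the address is sent only once. Note that each step still needs its own data value, so the second data constant has to be loaded again.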



--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Twobombs
Senior Member
Member # 3355

This might sound stupid, but isn't your code too fast? I don't see extra timing instructions such as jmp short $+2 or similar. I'm under the impression that those ports are not amused by too-tight coding because they need a couple of clock cycles to recuperate or set the return value. I might be totally off track here, or worrying about something that isn't an issue (on that board), but I don't see it in the code.

Example of what I mean: [image not preserved]

The circumstances mentioned in Pinczakko's article are valid and, as is always the case with his articles, interesting: they show that the timing of the timing setup is essential. This matters when one wants to edit the memory settings at such a late stage, as is the case with ISA Option ROMs, which IMHO are often if not always executed after initialisation and almost complete execution of the System Code of the BIOS, and when the registers being written belong to the Northbridge (e.g. on AMD these are CPU MSRs, and the AMD docs clearly state that a lot of those values should only be initialised at early POST). [image not preserved]

I saw in the AMI code that most RDMSR/WRMSR instructions, especially those for the CPU/chipset/memory init, are executed in cache, hence the speed problem and the problems with reading/writing those low-level registers at a later stage. (E.g. WRMSR takes somewhere close to 19 clocks to execute, more than enough time to put the system in a completely unstable state.) This is perhaps not unlike the setup of the System Clock on the MCP55, which works perfectly at an early stage, yet totally fails when done the way the BIOS code does it at a(ny) later stage.

Perhaps your code should be more dependent on the already existing setup value, and your value on a byte level AND/OR-ed into it after reading out the previous value, instead of a 'clean' mov?

That way some unexpected or un-documented values aren't overwritten....

Edit: displayed pics.

[ June 04, 2008, 11:57 AM: Message edited by: Polygon ]

--------------------
!gesa 3300 diff-transplant -> done now working on CPU module patch

Posts: 218 | From: NL | Registered: May 2007
Polygon
Admin
Member # 1

None of the code posted in the entire BIOS Workshop has jmp short $+2 or similar instructions, so I don't know how to answer your inquiries. All the varieties of code I posted above are in working BIOSes and so far exhibit no issues. That's not to say there may not be race conditions waiting to rear their ugly heads.

When I started writing Option ROMs over a year ago, the first thing I did was disassemble working code in posted BIOSes that was written by tictac, sideefect, and a few other experienced BIOS modders. None have jmp short $+2 or similar instructions.

In the original topic, Building an AWARD ISA Option ROM Discussion Thread, where tictac and Pinczakko taught us how to write ISA Option ROMs, jmp short $+2 or similar instructions were never mentioned.

Since that time, probably 50 mod BIOSes have been posted using varieties of the above code. No issue has ever been reported....

--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Twobombs
Senior Member
Member # 3355

OK, if that works it's fine.

--------------------
!gesa 3300 diff-transplant -> done now working on CPU module patch

Posts: 218 | From: NL | Registered: May 2007
Polygon
Admin
Member # 1

That is probably a good question for Pinczakko, by PM as he may not see this thread...

I honestly don't know if it's needed or not, as my experience writing code is very, very limited. And just because it works doesn't necessarily mean it is "safe" [Smile]

--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Pinczakko
Honorary Member
Member # 699

I personally think that it's important to include an instruction that "relaxes" the timing of the register setup routine, such as:
code:
jmp $+2

The main point is to "wait" for the PCI bus (PCIe or HyperTransport in your particular case) to be ready before sending the data to initialize the chipset. Moreover, it's important to only set/reset the bit(s) that we know or are experimenting with and leave the rest unaltered, as changing them can harm the stability of the system as well.

Anyway, back then I intended to experiment with assembly instructions that would "serialize", i.e. "flush" the cache, before executing any PCI register related activity in the code. Nonetheless, I haven't had the time so far; maybe you guys can experiment with it ;-).

One more thing we must be aware of is that certain PCI registers are "one time set" only, i.e. you can only initialize them once and cannot reinitialize them later. In Intel chipset documents, some registers are mentioned to be of that type. In a case like this, the only option we have is to patch the system BIOS, because an Option ROM simply wouldn't affect the register.

A little off-topic, but I want to let you know that I've got an AMD690G mobo right now, and new experiments should come from me ;-). Looking forward to experimenting with RAM related initialization.

That's all for now guys. Keep up the good work. I'll be joining soon [Big Grin]

--------------------
-- Human knowledge belongs to the world --

Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
Polygon
Admin
Member # 1

OK, [Smile]

So in the following code, where would the jmp short $+2 instruction(s) be placed:

code:
 ==============================Standard==================================
mov eax, 8000C290h ; DRAM Configuration Low Address
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax, 0F1FFFFFFh ; Bypass Max
or eax, 00E000000h ; 7x
out dx, eax ; send new data

mov eax, 8000C290h ; DRAM Configuration Low Address
mov dx,0CF8h ; set port address
out dx,eax ; send address through the port
mov dx,0CFCh ; set port data
in eax,dx ; fetch data
and eax, 0FFFF3FFFh ; Read/Write Queue Bypass
or eax, 00000C000h ; 16x
out dx, eax ; send new data



--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Polygon
Admin
Member # 1

Originally posted by twobombs:
quote:
Perhaps your code should be more dependent on the already existing setup value, and your value on a byte level AND/OR-ed into it after reading out the previous value, instead of a 'clean' mov?
I am reading the existing value and changing only the bits of interest. When a single value in the memory timing is changed, are you saying to write only to that single register and not to the whole 32-bit register, which is made up of eight 4-bit registers?

As some "timings" straddle 2 registers, and those registers contain either some or all bits for other timings, I don't know how to avoid re-writing the bits that are not of interest when they sit in the registers of interest.

Again, from the very beginning, all code generated by tictac and Pinczakko has written all 32 bits even if only 1 setting is changed....

--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Pinczakko
Honorary Member
Member # 699

Hi Polygon,

You're right that my code was writing the 32-bit register(s) at once ;-). I thought it was just fine back then.

However, I realized that it's not a recommended practice after I found out about the issue described in the "Critical Update" of the article linked above.

It's better to code a procedure that writes 8 bits at a time, with parameters consisting of the 8-bit value and the address of the intended register. Maybe Twobombs can help with this for the time being because I'm still too busy. I'll drop by tomorrow if I have a chance.

Anyway, as a hint, you can modify the code in that article. It should be pretty easy ;-).

(-- Pasted below for your convenience and cross documenting --)

code:
;------------------------------ file: mem_optimize.asm -----------------------------------
use16

start:
pushf
cli


mov cx, 0x50 ;patch the ioq register of the chipset
call Read_PCI_Bus0_Byte
or al, 0x80
mov cx, 0x50
call Write_PCI_Bus0_Byte

mov cx, 0x64 ;DRAM Bank 0/1 Interleave = 4-way
call Read_PCI_Bus0_Byte
or al, 2
mov cx, 0x64
call Write_PCI_Bus0_Byte

mov cx, 0x65 ;DRAM Bank 2/3 Interleave = 4-way
call Read_PCI_Bus0_Byte
or al, 2
mov cx, 0x65
call Write_PCI_Bus0_Byte

mov cx, 0x66 ;DRAM Bank 4/5 Interleave = 4-way
call Read_PCI_Bus0_Byte
or al, 2
mov cx, 0x66
call Write_PCI_Bus0_Byte

mov cx, 0x67 ;DRAM Bank 6/7 Interleave = 4-way
call Read_PCI_Bus0_Byte
or al, 2
mov cx, 0x67
call Write_PCI_Bus0_Byte

mov cx, 0x68 ;Allow pages of different banks to be active simultaneously
call Read_PCI_Bus0_Byte
or al, 0x44
mov cx, 0x68
call Write_PCI_Bus0_Byte

mov cx, 0x69 ;Fast DRAM Precharge for Different Bank
call Read_PCI_Bus0_Byte
or al, 0x8
mov cx, 0x69
call Write_PCI_Bus0_Byte

mov cx, 0x6C ;Activate Fast TLB lookup
call Read_PCI_Bus0_Byte
or al, 0x8
mov cx, 0x6C
call Write_PCI_Bus0_Byte


popf

clc ;indicate that this POST routine successful
retn ;return near to the header of the rom file


;-- Read_PCI_Bus0_Byte --
;in: cx = dev_func_offset_addr
;out: al = reg_value

Read_PCI_Bus0_Byte:
mov ax, 8000h
shl eax, 10h
mov ax, cx
and al, 0FCh
mov dx, 0CF8h
out dx, eax
;<-- you can put the "relaxing" jump here -->
mov dl, 0FCh ;dx = 0CFCh (data port base)
mov al, cl
and al, 3
add dl, al
in al, dx
;<-- you can put the "relaxing" jump here -->
retn


;-- Write_PCI_Bus0_Byte --
;in: cx = dev_func_offset addr
;al = reg_value to write

Write_PCI_Bus0_Byte:
xchg ax, cx
shl ecx, 10h
xchg ax, cx
mov ax, 8000h
shl eax, 10h
mov ax, cx
and al, 0FCh
mov dx, 0CF8h
out dx, eax
;<-- you can put the "relaxing" jump here -->
add dl, 4
or dl, cl
mov eax, ecx
shr eax, 10h
out dx, al
;<-- you can put the "relaxing" jump here -->
retn
;------------------------------ file: mem_optimize.asm -----------------------------------
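
To make the byte addressing in the two procedures easier to follow, here is a small Python sketch of the arithmetic they perform (a model only, not ROM code; the function name byte_access is hypothetical). The dword-aligned part of the offset goes into the address written to 0CF8h, and the low two bits select which of the four data ports, 0CFCh to 0CFFh, carries the byte.

```python
def byte_access(dev_func_offset):
    """Return (dword for port 0CF8h, data port) for a bus 0 byte access."""
    # mov ax,8000h / shl eax,10h / mov ax,cx / and al,0FCh
    config_address = 0x80000000 | (dev_func_offset & 0xFFFC)
    # data port = 0CFCh plus the byte lane (offset bits 1:0)
    data_port = 0xCFC + (dev_func_offset & 0x3)
    return config_address, data_port


for offset in (0x64, 0x65, 0x66, 0x67, 0x69):
    address, port = byte_access(offset)
    print(f"offset {offset:02X}h -> address {address:08X}h, data port {port:04X}h")
```

Offsets 64h to 67h therefore share one address dword and differ only in the data port used.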



--------------------
-- Human knowledge belongs to the world --

Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
Pinczakko
Honorary Member
Member # 699

I forgot to say that it's highly recommended to disable the interrupt with
code:
cli

instruction before your PCI related code and enable it with
code:
sti

after you're done with it. This is assuming you're still coding an expansion ROM. If you make a system BIOS patch, you have to look at the current system state from the code surrounding your patch [Wink]

--------------------
-- Human knowledge belongs to the world --

Posts: 133 | From: Taka Bonerate National Park, Indonesia | Registered: Jun 2004
Polygon
Admin
Member # 1

OK, thanks! I'll look that over... [Smile]

--------------------
Too Many Computers,... Too Little Time .....  - Com'on ???..!

Posts: 27985 | From: Fire Island, NY | Registered: Feb 2003
Twobombs
Senior Member
Member # 3355

I believe it to be very good practice to disable interrupts when possible, as it greatly improves system stability while writing those system registers! Once again applause from me *clap clap clap*. I forgot about that important nugget of info; the last time I worked with that type of code the 16 MHz 68040 was regarded as 'hot', and Hblank programming was *in*.

As we're programming chipsets that often don't have all of their features documented, or that have reserved bits which spring into action when a certain condition is met, I believe it's wise not to touch variables that don't need touching. I see that way of programming a lot in BIOSes, as those guys also don't know what's going to happen further down the road(map).

An access to a register, such as a typical out/in pair, should be separated by wait cycles. There are several ways to get those. One of them is jmp $+2, but there are others with different characteristics. The picture below sets up some of the PCI devices on the MCP55Pro southbridge through SMBus calls, and you can clearly see the jmp short $+2. A lot of links on the internet also cite OUT 0EBh, AL as a wait cycle, which makes some sense to me, but I haven't really seen the whole idea or the truth behind it.
[image not preserved]

The value retrieved by the in/out command should preferably be AND-ed using the smallest operand size possible, and in a way that leaves the state of the unused/reserved bits intact. An example is
[image not preserved]

where values are AND-ed/bit-tested in that way to function as a 'filter' for those reserved values, extracting only the value necessary. In the example, the MSR Max_FID (frequency ID) is read and written through RDMSR/WRMSR.

So I think that, when it's 'legal', disabling interrupts (for a while) is a 'good' thing, yet it makes your code even faster, which makes the wait cycles even more important.
___________

On a side note, I would like to wish Pinczakko a lot of fun dissecting the AM2 board. It should prove to be a lot of fun; I think you'll like the way AMD has set up the Northbridge as part of the CPU.


Edit: Host pics locally

[ September 23, 2007, 06:42 AM: Message edited by: Polygon ]

--------------------
!gesa 3300 diff-transplant -> done now working on CPU module patch

Posts: 218 | From: NL | Registered: May 2007

Copyright© 2002-2013 Lejabeach Dot Com, Inc.
