[1;30mLink: https://www.hanshq.net/fire.html[0m
Many years ago, sometime in the nineties, my family was having dinner at
the neighbours', and their son showed me some cool DOS programs on the
computer. One of the programs was a demo that drew animated flames on the
screen. I was amazed! Asking what language it was written in, I was told it
was Pascal.
Until then I had only programmed in QBasic, but if one could make
fire with Pascal, I was determined to learn it.
My uncle supplied me with his university text-book on the language, and I
stepped to it. Unfortunately, the book turned out to be extremely thin on
the subject of making fire. Also it was mainly concerned with programming
the PDP-10 as opposed to the IBM PC that I was using. And so, I never
learned the skill.
(There's a lesson here about the social aspects of programming: I could
have asked more and much better questions of our neighbour.)
I've wanted to revisit this for years. Having acquired better programming,
English, and web search skills, it's time to fill this gap in my education.
This post contains a walk-through of the classic MS-DOS firedemo, a port of
it to SDL, and an implementation of the fire effect that runs on bare-metal
PCs.
[31mFiredemo[0m
According to the internet, Javier "Jare" Arévalo's firedemo from 1993 was
the first implementation of this effect. He wrote a blog post about it for
the 20th anniversary, which includes a version in Javascript (source on
GitHub).
When I asked about the firedemo, here's what he told me:
[33m It began when we bought a 80387 math coprocessor and, to enjoy it,[0m
[33mplayed a lot with a famous fractal generator called Fractint. Then I wanted[0m
[33mto make a kind of plasma style fractal, but animated in a more complex way[0m
[33mthan the color rotation typical of the time. I just started writing some[0m
[33mcode without thinking much. A few bugs later, I had something that looked[0m
[33mlike small blue explosions that quickly faded to black. Tweaking the code[0m
[33mrather than fixing the bugs got me the fire effect. We did realize how and[0m
[33mwhy it looked like a fire, and JCAB's implementation in Inconexia was fully[0m
[33mintentional and correct, but I never sat down to truly understand all the[0m
[33msubtle bits in the original (where did the initial white explosion come[0m
[33mfrom? Why was there no apparent "real" random number generator, yet it[0m
[33mlooked random?) until I recreated it in Javascript. As far as I can tell,[0m
[33mthe Javascript version is pixel perfect, it shows the exact same animation[0m
[33mas the original did.[0m
(FractInt still exists and has been ported to Linux. Jare released a plasma
effect demo, iris, a few days after the fire demo. Inconexia (YouTube) uses
the fire effect in the final scene.)
(I'm not sure whether this is the program I saw that night long ago, or if
I saw one of the many other implementations that followed. Kirk A. Baum has
collected some of them in firecode.zip, including a version that is indeed
written in Pascal called Flames by Mark D. Mackey.)
Jare's firedemo.zip comes with source code in assembly. Let's dissect it to
see how it works:
Data
[1;30m; ------------------------------ FIRE.ASM ------------------------------[0m
; Bye Jare of VangeliSTeam. Want more comments? Write'em. O:-)
[1m        .MODEL SMALL[0m
[1m        .STACK 400[0m
[1m        DOSSEG[0m
[1m        LOCALS[0m
This syntax suggests the code is written for Borland's Turbo Assembler. (I
suppose this write-up serves as an answer to the call for more comments.)
.DATA
[1mFirePal LABEL BYTE ; Fire palette, colors 0-63 ------------[0m
[1m        DB       0,  0,  0,  0,  1,  1,  0,  4,  5,  0,  7,  9[0m
[1m        DB       0,  8, 11,  0,  9, 12, 15,  6,  8, 25,  4,  4[0m
[1m        DB      33,  3,  3, 40,  2,  2, 48,  2,  2, 55,  1,  1[0m
[1m        DB      63,  0,  0, 63,  0,  0, 63,  3,  0, 63,  7,  0[0m
[1m        DB      63, 10,  0, 63, 13,  0, 63, 16,  0, 63, 20,  0[0m
[1m        DB      63, 23,  0, 63, 26,  0, 63, 29,  0, 63, 33,  0[0m
[1m        DB      63, 36,  0, 63, 39,  0, 63, 39,  0, 63, 40,  0[0m
[1m        DB      63, 40,  0, 63, 41,  0, 63, 42,  0, 63, 42,  0[0m
[1m        DB      63, 43,  0, 63, 44,  0, 63, 44,  0, 63, 45,  0[0m
[1m        DB      63, 45,  0, 63, 46,  0, 63, 47,  0, 63, 47,  0[0m
[1m        DB      63, 48,  0, 63, 49,  0, 63, 49,  0, 63, 50,  0[0m
[1m        DB      63, 51,  0, 63, 51,  0, 63, 52,  0, 63, 53,  0[0m
[1m        DB      63, 53,  0, 63, 54,  0, 63, 55,  0, 63, 55,  0[0m
[1m        DB      63, 56,  0, 63, 57,  0, 63, 57,  0, 63, 58,  0[0m
[1m        DB      63, 58,  0, 63, 59,  0, 63, 60,  0, 63, 60,  0[0m
[1m        DB      63, 61,  0, 63, 62,  0, 63, 62,  0, 63, 63,  0[0m
FirePal contains the first 64 colours of the palette that will be used,
stored as (Red,Green,Blue) byte triplets with each value between 0 and 63.
The remaining 192 colours of the 256-colour palette are all white and will
get set separately when programming the VGA palette.
[1mByeMsg  DB 'FIRE was coded bye Jare of VangeliSTeam, 9-10/5/93', 13, 10[0m
[1m        DB 'Sayonara', 13, 10, 10[0m
[1m        DB 'ELYSIUM music composed by Jester of Sanity (an Amiga demo group, I think)', 13, 10[0m
[1m        DB 'The music system you''ve just been listening is the VangeliSTracker 1.2b', 13, 10[0m
[1m        DB 'VangeliSTracker is Freeware (no money required), and distributed in source code', 13, 10[0m
[1m        DB 'If you haven''t got your copy of the VangeliSTracker, please go to your', 13, 10[0m
[1m        DB 'nearest BBS and get it NOW', 13, 10[0m
[1m        DB 'Also, don''t forget that YOU can join the VangeliSTeam. Contact the', 13, 10[0m
[1m        DB 'VangeliSTeam in the following addresses: ', 13, 10, 10[0m
[1m        DB ' Mail: VangeliSTeam ³ This demo is dedicated to', 13, 10[0m
[1m        DB ' Juan Carlos Arévalo Baeza ³ Mark J. Cox', 13, 10[0m
[1m        DB ' Apdo. de Correos 156.405 ³ and', 13, 10[0m
[1m        DB ' 28080 - Madrid (Spain) ³ Michael Abrash', 13, 10[0m
[1m        DB ' Internet: jarevalo@moises.ls.fi.upm.es ³ At last, the PC showed good', 13, 10[0m
[1m        DB ' Fidonet: 2:341/27.16, 2:341/15.16, 2:341/9.21 ³ for something.', 13, 10, 10[0m
[1m        DB 'Greetings to all demo groups and MOD dudes around.', 13, 10[0m
[1m        DB '$'[0m
ByeMsg contains the text that's printed before exiting the program. 13
and 10 are the ASCII character codes for carriage return and newline,
respectively. The dollar sign signals the end of the string.
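To make the '$' convention concrete, here is a minimal C sketch that measures such a string the way a DOS-style print routine would walk it. The helper name dollar_strlen is made up for illustration and is not part of FIRE.ASM.

```c
#include <stddef.h>

/* Length of a '$'-terminated string, not counting the terminator.
   Hypothetical helper for illustration only. The caller must make sure
   the string really contains a '$', just as a DOS program must. */
static size_t dollar_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '$') {
        n++;
    }
    return n;
}
```

For example, the string "Sayonara\r\n$" has a printable length of ten bytes: eight letters plus the carriage return and newline.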
[1m UDATASEG[0m
[1mImagen DB 80*50 DUP (?)[0m
[1mImagen2 DB 80*50 DUP (?)[0m
These two 4,000-byte (80*50) uninitialized arrays will be used for storing the
intensity of the fire in each pixel. One array is for the current frame,
and the other for the previous one.
Setting up the VGA
[1m        .CODE[0m
[1m        STARTUP[0m
[1m        CLD[0m
[1m        MOV     AX,13h[0m
[1m        INT     10h[0m
[1m        CLI[0m
CLD clears the direction flag (the DF bit in FLAGS), which means the index
registers SI and DI get incremented (as opposed to decremented) after each
string operation, such as LODS and STOS below.
INT 10h raises an interrupt that's handled by the VGA BIOS, like a system
call. The contents of the AH register (the high 8 bits of AX) specify the
function, in our case 0 which means "Set video mode", and AL specifies
which mode to set, in our case Mode 13h. This mode has a resolution of
320x200 pixels and 256 colors, specified with one byte per pixel in a
linear address space starting at A000:0000. The BIOS configures this mode
by writing specific values to the many VGA registers that control exactly
how the contents of the VGA memory, the frame-buffer, is to be accessed and
drawn onto the screen.
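The linear layout of standard Mode 13h can be summed up in one line of arithmetic. The sketch below uses made-up names (mode13h_offset and the two constants) and only computes the offset; it does not touch video memory.

```c
#include <stdint.h>

enum { MODE13H_WIDTH = 320, MODE13H_HEIGHT = 200 };

/* Byte offset of pixel (x, y) from the start of the frame-buffer at
   A000:0000 in standard (chained) Mode 13h: one byte per pixel, rows
   stored one after another. */
static uint16_t mode13h_offset(unsigned x, unsigned y)
{
    return (uint16_t)(y * MODE13H_WIDTH + x);
}
```

The last pixel, (319, 199), lands at offset 63,999, so the whole screen fits in a single 64 KB segment.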
The CLI instruction disables interrupts. This is in order to protect the
code that writes directly to VGA registers below. The code will perform
OUT operations in a certain order, and could break if an interrupt handler
performed other I/O operations in the middle.
[1m        MOV     DX,3c4h[0m
[1m        MOV     AX,604h                 ; "Unchain my heart". And my VGA...[0m
[1m        OUT     DX,AX[0m
[1m        MOV     AX,0F02h                ; All planes[0m
[1m        OUT     DX,AX[0m
Now the code starts to tweak the VGA registers, moving away from the
standard Mode 13h. 3c4h, loaded into DX, is the I/O port number of the VGA
Sequence Controller Index register (See The Graphics Programming Black Book
Chapter 23 and FreeVGA).
By doing OUT DX,AX, the code writes the 16-bit value in AX to the port,
which is effectively the same as writing the 8-bit value in AL to 3c4h
(Sequence Controller Index Register) and the 8-bit value in AH to 3c5h
(Sequence Controller Data Register). The Index Register selects an internal
Sequence Controller register, and the Data Register provides the value to
write into it.
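As a small sketch of that word-sized write, the C function below splits a 16-bit value the same way: the low byte (AL) is the index and the high byte (AH) is the data. The struct and function names are invented for this example.

```c
#include <stdint.h>

/* The two byte-sized writes hidden inside one OUT DX,AX. */
struct vga_indexed_write {
    uint8_t index; /* goes to the Index Register (port DX)    */
    uint8_t data;  /* goes to the Data Register (port DX + 1) */
};

static struct vga_indexed_write split_word_write(uint16_t ax)
{
    struct vga_indexed_write w;
    w.index = (uint8_t)(ax & 0xFF); /* AL */
    w.data = (uint8_t)(ax >> 8);    /* AH */
    return w;
}
```

With AX = 604h this gives index 04h and data 06h, and with AX = 0F02h it gives index 02h and data 0Fh.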
In our case, the code is writing 06h to register index 04h, which is the
Sequencer Memory Mode Register. This disables the Chain 4 bit which is
otherwise set in mode 13h. This is what the "Unchain" comment refers to:
turning off Chain-4 addressing mode and entering normal mode.
The VGA RAM is split into four different "planes", which were often
implemented by four different memory chips on the circuit board. One
reason was to solve the frame-buffer memory-access problem: to output 70
high-resolution frames per second, the VGA's CRT controller would need to
read bytes at a higher rate than was feasible for a byte-addressed DRAM
chip at the time. But with the frame-buffer split into four planes, stored
in four chips, the CRT controller could read four bytes in parallel at a
time, enough to keep up with the CRT refresh rate.
Chain 4 is a mode for addressing the four memory planes. When enabled,
it uses the two least significant bits of the address to select which
plane to read or write to (and leaves those two bits clear when addressing
inside the plane, if I understand correctly), allowing linear addressing
of the four planes "chained together". For example, writes to A000:0004,
A000:0005, and A000:0006 in Chain 4 mode would end up at address 4 in plane
0, 1, and 2 respectively.
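The address split can be written down in a few lines of C. This is only a sketch of the mapping as described above (including the assumption that the two low bits are left clear inside the plane); the names are hypothetical.

```c
#include <stdint.h>

struct plane_addr {
    unsigned plane;  /* which of the four planes is selected */
    uint16_t offset; /* address used inside that plane       */
};

/* Decode a CPU offset into the A000 segment under Chain 4 addressing. */
static struct plane_addr chain4_decode(uint16_t cpu_offset)
{
    struct plane_addr a;
    a.plane = cpu_offset & 3u;               /* two low bits pick the plane   */
    a.offset = (uint16_t)(cpu_offset & ~3u); /* low bits cleared in the plane */
    return a;
}
```

Decoding offsets 4, 5 and 6 yields planes 0, 1 and 2, each at address 4, matching the example in the text.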
With Chain 4 disabled, the programmer has to explicitly select which
plane(s) to access by setting the VGA Sequence Controller's Map Mask
Register (index 02h). The write of 0Fh to that register enables writes to
all four planes at once, hence the "All planes" comment. This means that
each byte written to the framebuffer will get written to all four planes at
that address, effectively appearing as four consecutive identical pixels.
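To illustrate the effect of the Map Mask, here is a toy model of unchained video memory in C. It is a simplification under stated assumptions: four plain byte arrays stand in for the planes, and planar_write and its types are invented names, not a real VGA API.

```c
#include <stddef.h>
#include <stdint.h>

enum { NUM_PLANES = 4, PLANE_SIZE = 80 * 50 };

/* Toy model: one byte array per plane. */
struct planar_vram {
    uint8_t plane[NUM_PLANES][PLANE_SIZE];
};

/* Write one byte at `offset` to every plane whose bit is set in map_mask. */
static void planar_write(struct planar_vram *v, size_t offset,
                         uint8_t value, uint8_t map_mask)
{
    for (int p = 0; p < NUM_PLANES; p++) {
        if (map_mask & (1u << p)) {
            v->plane[p][offset] = value;
        }
    }
}
```

With a mask of 0Fh, a single write fills the same address in all four planes, which is what turns one byte into four identical pixels on screen.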
[1m        MOV     DX,3D4h[0m
[1m        MOV     AX,14h                  ; Disable dword mode[0m
[1m        OUT     DX,AX[0m
[1m        MOV     AX,0E317h               ; Enable byte mode.[0m
[1m        OUT     DX,AX[0m
The VGA Sequence Controller controls how the frame-buffer is accessed
from the CPU, but it's the CRT Controller that decides how to access the
frame-buffer when scanning it to produce the video signal. 3D4h addresses
the CRT Controller's Index Register (immediately followed by the Data
Register). Writing 0014h to that port sets the Underline Location Register
to zero, clearing the DW and DIV4 bits which enabled the double-word
addressing mode that is normally used for scanning when Chain-4 is enabled.
The write of E3h (the leading 0 in 0E317h is required for the assembler to
recognize it as a number) to index 17h sets the Byte Mode bit in the CRTC
Mode Control Register.
If I understand correctly, the reason for scanning to be done with
double-word addressing in mode 13h is that Chain-4 clears the lower two
address bits when writing into a plane. This means that after the scanner
has read a value from each plane, it needs to increment the address by four
(the size in bytes of a 32-bit "double word") to get to the next set of
values.
[1m        MOV     AL,9[0m
[1m        OUT     DX,AL[0m
[1m        INC     DX[0m
[1m        IN      AL,DX[0m
[1m        AND     AL,0E0h                 ; Duplicate each scan 8 times.[0m
[1m        ADD     AL,7[0m
[1m        OUT     DX,AL[0m
The first two instructions above write 09h to the CRT Controller Index
Register, which is the index of the Maximum Scan Line Register. Then DX is
incremented to address the port of the CRT Controller Data Register, after
which a byte is read, masked to keep only its top three bits, increased by 7,
and written back. This sets the Maximum Scan Line field of the register to 7,
which means each scan line will be repeated eight (7+1) times.
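The read-modify-write on that register boils down to the following C sketch. The function name is made up, and the register values in the example are arbitrary inputs rather than something read from real hardware.

```c
#include <stdint.h>

/* Keep the top three bits of the Maximum Scan Line Register and set the
   five-bit Maximum Scan Line field so each line is shown `repeat` times.
   Mirrors AND AL,0E0h followed by ADD AL,7 when repeat is 8. */
static uint8_t set_max_scan_line(uint8_t reg, uint8_t repeat)
{
    return (uint8_t)((reg & 0xE0) | ((repeat - 1) & 0x1F));
}
```

Because the low five bits are zero after the mask, adding 7 and OR-ing in 7 give the same result; an input of 41h becomes 47h.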
Regular mode 13h produces 400 scan lines, with each scan line repeated
twice for a vertical resolution of 200 pixels. With the operation above,
the vertical resolution becomes 50 pixels instead. Mode 13h has a
horizontal resolution of 320 pixels, but with our "unchaining" and writing
to all four planes at once above, we now have a horizontal resolution of 80
pixels instead. In summary, these operations have changed from the standard
256-color 320-by-200 pixel mode 13h to a custom 256-color 80-by-50 mode.
Why is this lower resolution desirable? Aren't the pixels chunky enough
in 320x200 mode? The reason was probably to make the program run faster.
Computing the values for 80x50 pixels is much less work than for 320x200,
so the lower resolution allows for producing more frames per second on a
slow machine.
[1m        MOV     DX,3c8h                 ; Setup palette.[0m
[1m        XOR     AL,AL[0m
[1m        OUT     DX,AL[0m
[1m        INC     DX[0m
[1m        MOV     CX,64*3[0m
[1m        MOV     SI,OFFSET FirePal       ; Prestored...[0m
[1m@@pl1:  LODSB[0m
[1m        OUT     DX,AL[0m
[1m        LOOP    @@pl1[0m
The DAC (Digital-to-Analog Converter) is the part of the video adapter
responsible for converting the bits coming out of memory to an analog video
signal that can be fed to a monitor. It contains 256 registers, mapping
each possible byte value to an 18-bit color representation: 6 bits for red,
green, and blue intensity, respectively. (The VGA also has something called
the Palette RAM, which is different and used for EGA compatibility.)
To program the DAC, our program first writes a zero to 3c8h, the DAC
Address Write Mode Register, signalling that it wishes to set the value of
DAC register zero. It then writes repeatedly to port 3c9h, the DAC Data
Register, three byte-sized writes for each of the 64 colours in FirePal
(LODSB reads a byte from DS:SI and then increments SI; LOOP decrements CX
and jumps to the label as long as CX has not reached zero).
[1m        MOV     AL,63[0m
[1m        MOV     CX,192*3                ; And white heat.[0m
[1m@@pl2:  OUT     DX,AL[0m
[1m        LOOP    @@pl2[0m
The code above fills the remaining 192 DAC registers with "white heat":
all-white (red, green and blue all 63) color values.
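The way the DAC steps through its registers can be modelled in a few lines of C. This is a simplified sketch, not a hardware-accurate emulation: the struct dac type and both functions are invented here, and the model only covers the write path used by the demo.

```c
#include <stdint.h>

/* Toy model of the DAC: 256 registers of three 6-bit components. */
struct dac {
    uint8_t rgb[256][3];
    unsigned reg;  /* register currently being written  */
    unsigned comp; /* 0 = red, 1 = green, 2 = blue      */
};

/* Models a write to port 3c8h: pick the register to start at. */
static void dac_set_write_index(struct dac *d, uint8_t index)
{
    d->reg = index;
    d->comp = 0;
}

/* Models a write to port 3c9h: store one component and, after every
   third write, move on to the next register automatically. */
static void dac_write_data(struct dac *d, uint8_t value)
{
    d->rgb[d->reg][d->comp] = value & 0x3F;
    if (++d->comp == 3) {
        d->comp = 0;
        d->reg = (d->reg + 1) & 0xFF;
    }
}
```

Writing 64*3 palette bytes followed by 192*3 bytes of 63 therefore fills all 256 registers without ever touching the index port again.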
[1m STI[0m
STI turns interrupts back on, now that the code for setting up the VGA is
done.
Main Loop
[1m        MOV     AX,DS[0m
[1m        MOV     ES,AX[0m
[1m        MOV     DI,OFFSET Imagen        ; Cleanup both Images.[0m
[1m        MOV     CX,80*50[0m
[1m        XOR     AX,AX[0m
[1m        REP STOSW[0m
Before we enter the main loop, the code above clears the Imagen and Imagen2
arrays using REP STOSW, which performs a word-sized write (of AX, which is
zero) to ES:DI, increments DI by two, and repeats 4,000 (CX) times. Using
word-sized writes means the code writes 8,000 bytes in total, clearing both
arrays.
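A rough C equivalent of REP STOSW shows why 80*50 word writes cover both arrays. The sketch assumes, as the declarations suggest, that Imagen2 sits directly after Imagen in memory; rep_stosw is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulate REP STOSW with the direction flag clear: store the 16-bit
   value `ax` at dst, low byte first, `cx` times in a row. */
static void rep_stosw(uint8_t *dst, uint16_t ax, size_t cx)
{
    while (cx--) {
        *dst++ = (uint8_t)(ax & 0xFF);
        *dst++ = (uint8_t)(ax >> 8);
    }
}
```

Calling it with a count of 80*50 touches 2 * 4,000 = 8,000 bytes, which is the combined size of the two image buffers.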
[1mMainLoop:[0m
[1m        MOV     DX,3DAh                 ; Retrace sync.[0m
[1m@@vs1:  IN      AL,DX[0m
[1m        TEST    AL,8[0m
[1m        JZ      @@vs1[0m
[1m@@vs2:  IN      AL,DX[0m
[1m        TEST    AL,8[0m
[1m        JNZ     @@vs2[0m
The main loop starts by reading from the 3DAh I/O port, which is the VGA's
Input Status #1 Register, and checking the VRetrace bit. It loops first
while the bit is zero and then while it's one, effectively waiting for it
to go from one to zero, thus synchronizing the loop with the VGA refresh
cycle.
[1m        PUSH    DS[0m
[1m        POP     ES[0m
[1m        MOV     SI,81+OFFSET Imagen     ; Funny things start here. 8-P[0m
[1m        MOV     DI,81+OFFSET Imagen2[0m
[1m        MOV     CX,48*80-2[0m
[1m        XOR     BH,BH[0m
Treating Imagen and Imagen2 as 80-by-50 two-dimensional arrays (matching
the screen resolution), SI and DI are set up to point to the second
element on the second row (counting from the top-left corner) of Imagen
and Imagen2, respectively. CX will be used for the loop count, and BH is
cleared to be used as a zero below.
[1m@@lp:   XOR     AX,AX[0m
[1m        ADD     AL,-1[SI][0m
[1m        ADC     AH,BH[0m
[1m        ADD     AL,-80[SI][0m
[1m        ADC     AH,BH[0m
[1m        ADD     AL,-79[SI][0m
[1m        ADC     AH,BH[0m
[1m        ADD     AL,-81[SI][0m
[1m        ADC     AH,BH[0m
[1m        ADD     AL,1[SI][0m
[1m        ADC     AH,BH[0m
[1m        ADD     AL,80[SI][0m
[1m        ADC     AH,BH[0m
[1m        ADD     AL,79[SI][0m
[1m        ADC     AH,BH[0m
[1m        ADD     AL,81[SI][0m
[1m        ADC     AH,BH[0m
The code above sums together the values of all eight pixels neighbouring SI
in Imagen into AX (-1[SI] is the pixel to the left of SI, -80[SI] is the
pixel just above, etc.). First the low bits are added to AL, then any carry
bit is added to AH using ADC.
It is because the code accesses neighbours of SI that it was set up to
start at the second element of the second row in Imagen, and why the loop
count in CX was chosen so the process will stop after the second-last
element of the second-last row.
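The ADD/ADC pairs amount to a 16-bit sum built from 8-bit additions. The C sketch below imitates that byte-by-byte arithmetic; the struct and function names are invented, and the caller is assumed to pass a pointer that has valid neighbours on all sides.

```c
#include <stdint.h>

struct ax_reg {
    uint8_t al, ah;
};

/* ADD AL,v followed by ADC AH,0: add into the low byte and let the
   carry (0 or 1) ripple into the high byte. */
static void add_with_carry(struct ax_reg *ax, uint8_t v)
{
    unsigned t = (unsigned)ax->al + v;
    ax->al = (uint8_t)t;
    ax->ah = (uint8_t)(ax->ah + (t >> 8));
}

/* Sum the eight neighbours of *p, visiting them in the same order as the
   assembly does. `width` is the row length (80 in the demo). */
static uint16_t sum_neighbours(const uint8_t *p, int width)
{
    const int offs[8] = { -1, -width, -(width - 1), -(width + 1),
                          1, width, width - 1, width + 1 };
    struct ax_reg ax = { 0, 0 };
    for (int k = 0; k < 8; k++) {
        add_with_carry(&ax, p[offs[k]]);
    }
    return (uint16_t)((ax.ah << 8) | ax.al);
}
```

The largest possible result is 8 * 255 = 2,040, so the high byte never needs more than three bits.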
[1m ROR AX,1[0m
[1m ROR AX,1[0m
[1m ROR AX,1[0m
Rotating the bits in AX three steps to the right leaves AL containing the
previous sum divided by eight, in other words it contains the average of
the eight values surrounding SI. This is the core idea in the fire effect:
computing the "heat" of each pixel as an average of its neighbours.
[1m        TEST    AH,60h                  ; Wanna know why 60h? Me too.[0m
[1m        JNZ     @@nx                    ; This is pure experience.[0m
After the ROR instructions, the three least significant bits of the sum of
neighbours have ended up as the three highest bits of AH. This means that
the TEST instruction effectively checks whether the two low bits of the
sum were set. If they were not, we fall through to the code below. As the
comment suggests, this was probably chosen somewhat randomly.
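The rotate-and-test trick is easier to follow in C. The sketch below, with made-up names, rotates a 16-bit sum right by three and then inspects the same bits as TEST AH,60h; it assumes the sum is at most 2,040, as it is for eight byte-sized neighbours.

```c
#include <stdint.h>

/* Rotate a 16-bit value right by n bits. */
static uint16_t ror16(uint16_t x, unsigned n)
{
    n &= 15;
    return (uint16_t)((x >> n) | ((uint32_t)x << ((16 - n) & 15)));
}

struct avg_result {
    uint8_t avg;        /* AL after the three rotates: sum / 8         */
    int low_bits_clear; /* 1 if TEST AH,60h would set the zero flag    */
};

static struct avg_result average_and_test(uint16_t sum)
{
    const uint16_t ax = ror16(sum, 3);
    struct avg_result r;
    r.avg = (uint8_t)(ax & 0xFF);
    /* Bits 0 and 1 of the sum are now bits 5 and 6 of AH. */
    r.low_bits_clear = ((ax >> 8) & 0x60) == 0;
    return r;
}
```

For a sum of 1,003 the average is 125 and both low bits are set, so the pixel is not cooled; for 1,004 the low two bits are clear and the code falls through to the cooling step.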
[1m        CMP     DI,46*80+OFFSET Imagen2 ; And this was a bug.[0m
[1m        JNC     @@dec                   ; This one's by my cat.[0m
[1m        OR      AL,AL                   ; My dog coded here too.[0m
[1m        JZ      @@nx                    ; I helped my sister with this one.[0m
[1m@@dec:  DEC     AL                      ; Yeah! Cool a bit, please.[0m
The code above checks whether DI is past the first 46 rows of Imagen2, and
if so jumps straight to @@dec. Otherwise, the code checks whether AL is
greater than zero, and only proceeds to @@dec if so.
All this is effectively to decide whether to decrement AL, thereby
"cooling" that pixel. If no cooling occurred, the screen would eventually
fill with a single colour. Instead, the code cools pixels given the
semi-random condition that the two low bits of the neighbour sum are zero
(so roughly 25% of the time).
If AL is already zero however, decrementing doesn't "cool" it, but rather
"re-ignites" it since the value wraps around to 255. The code only allows
this for pixels in the lower four rows, which is how it "feeds the fire"
from below. Note that when the program starts, all pixels are initially
zero, so the low bits of the sum will be zero, and all pixels on the lower
rows will ignite, causing the initial burst of flame.
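The whole cooling decision, including the wrap-around that re-ignites pixels, fits in one small C function. This is a sketch with invented names; `index` is the pixel's position in the 80-by-50 buffer, standing in for the DI comparison in the assembly.

```c
#include <stdint.h>

enum { FIRE_WIDTH = 80, IGNITE_ROW = 46 };

/* Decide the final value of a pixel given the neighbour average, the raw
   neighbour sum, and the pixel's index in the buffer. */
static uint8_t cool_pixel(uint8_t avg, uint16_t sum, int index)
{
    if ((sum & 3) != 0) {
        return avg; /* low bits set: leave the pixel alone */
    }
    if (index >= IGNITE_ROW * FIRE_WIDTH || avg != 0) {
        avg--; /* an unsigned 8-bit 0 wraps around to 255 */
    }
    return avg;
}
```

A zero pixel in the bottom rows with a zero neighbour sum comes out as 255, while the same pixel higher up stays at zero, which is where the initial burst of flame comes from.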
[1m@@nx:   INC     SI[0m
[1m        STOSB[0m
[1m        LOOP    @@lp                    ; New image stored in Imagen2.[0m
With the final value of AL computed, STOSB writes it to the address pointed
to by DI which it also increments. SI is also incremented, and the loop
repeats with the next pixel.
[1m        MOV     SI,80+OFFSET Imagen2    ; Scrolling copy. :-)[0m
[1m        MOV     DI,OFFSET Imagen[0m
[1m        MOV     CX,40*48[0m
[1m        REP MOVSW[0m
With all the new pixel values in Imagen2, the program now copies them back
to Imagen for next time. By starting the source pointer (SI) 80 bytes into
the array, the copy effectively scrolls the contents up one line. The
actual copying is done with REP MOVSW which performs 40*48 (CX) word-sized
moves from DS:SI to ES:DI, incrementing SI and DI after each one. Only 40
moves are needed per line because they are word-sized, and only 48 lines
are copied because the top line is discarded (by starting at offset 80) and
the bottom line is all zeros.
[1m        MOV     SI,80*43+OFFSET Imagen2 ; Get rid of some ashes.[0m
[1m        MOV     CX,6*80[0m
[1m        MOV     AH,22[0m
[1m@@rcl:  MOV     AL,[SI][0m
[1m        CMP     AL,15[0m
[1m        JNC     @@rcn[0m
[1m        SUB     AL,AH[0m
[1m        NEG     AL[0m
[1m        MOV     [SI],AL[0m
[1m@@rcn:  INC     SI[0m
[1m        LOOP    @@rcl[0m
By "ashes", the code means pixels with low heat values. Such pixels look
a bit unsightly in the bottom lines, so to smooth things over, the code
above loops over six lines near the bottom (rows 43 to 48; the very last
row is skipped), looking for pixels with values lower than 15. For such
pixels, the code subtracts 22 (AH) and negates the result (effectively
computing 22 minus the pixel value), which brightens them up a bit.
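The SUB/NEG pair relies on 8-bit wrap-around, which the following C sketch spells out. The function name is hypothetical; the constants 15 and 22 are the ones from the assembly.

```c
#include <stdint.h>

/* Brighten a dim pixel the way the SUB AL,AH / NEG AL pair does. */
static uint8_t brighten_ash(uint8_t al)
{
    const uint8_t ah = 22;
    if (al >= 15) {
        return al;           /* CMP AL,15 / JNC: bright enough already */
    }
    al = (uint8_t)(al - ah); /* SUB AL,AH: wraps below zero, e.g. 0 -> 234 */
    al = (uint8_t)(0 - al);  /* NEG AL: two's complement, 234 -> 22 */
    return al;
}
```

The result is 22 minus the original value, so pixels in the range 0 to 14 end up between 22 and 8.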
[1m        MOV     SI,80+OFFSET Imagen2    ; And show it.[0m
[1m        MOV     DI,0[0m
[1m        MOV     AX,0A000h[0m
[1m        MOV     ES,AX[0m
[1m        MOV     CX,40*48[0m
[1m        REP MOVSW[0m
With all the pixel values ready in Imagen2, the program copies them over
to the 80x50 linearly addressed framebuffer at A000:0000 using the same
"scrolling copy" technique as before. The frame will be displayed the next
time the monitor refreshes.
[1m        MOV     AH,1[0m
[1m        INT     16h[0m
[1m        JNZ     Bye[0m
[1m        JMP     MainLoop[0m
After the frame has been copied to the graphics memory, the code invokes
Int 16/AH=01h to check whether there's a keystroke in the keyboard buffer.
If there's not, the MainLoop continues, otherwise it jumps to the code
below.
Epilogue
[1mBye:[0m
[1m        XOR     AH,AH[0m
[1m        INT     16h[0m
[1m        MOV     AX,3[0m
[1m        INT     10h[0m
[1m        MOV     DX,OFFSET ByeMsg[0m
[1m        MOV     AH,9[0m
[1m        INT     21h[0m
First, Int 16/AH=00h is invoked to retrieve the keystroke from the keyboard
buffer (the result, in AX, is ignored). Then Int 10/AH=00h is used to reset
the video mode back to 03h, which is the regular 80x25 16-color text mode.
Finally, Int 21/AH=09h is used to write the goodbye message to the screen.
[1m        MOV     AX,4C00h[0m
[1m        INT     21h[0m
[1m        END[0m
[1;30m; ------------------------------ End of FIRE.ASM ---------------------------[0m
At the very end, Int 21/AH=4Ch terminates the program.
That's it: 200 lines of assembly and the rest is history.
[Figure: The original firedemo running in DOSBox]
Firedemo in SDL
After reading through the original firedemo code above, I wanted to
re-implement it to run on modern operating systems. In the Othello project,
we did some graphical programming by using the native libraries (Xlib,
Win32 GDI, Cocoa, etc.), but in this case we're not trying to build a
graphical user interface, we just want to paint pixels on the screen.
One popular cross-platform library for doing that, often used in games
programming, is SDL (Simple DirectMedia Layer).
The code below (available in fire.c) is a pixel-perfect port of the
firedemo to SDL2. (It mainly follows this guidance from the SDL2 migration
guide.) Hopefully it's a little easier to read than the assembly version.
[1;36m#include <SDL.h>[0m
[1;36m#include <stdbool.h>[0m
[1;36m#include <stdio.h>[0m
[1;36m#define WIDTH 80[0m
[1;36m#define HEIGHT 50[0m
[1;36m#define WIN_WIDTH 640[0m
[1;36m#define WIN_HEIGHT 400[0m
[1;36m#define FPS 30[0m
[1;36mstatic const uint32_t palette[256] = {[0m
[1;36m/* Jare's original FirePal. */[0m
[1;36m#define C(r,g,b) ((((r) * 4) << 16) | ((g) * 4 << 8) | ((b) * 4))[0m
[1;36mC( 0, 0, 0), C( 0, 1, 1), C( 0, 4, 5), C( 0, 7, 9),[0m
[1;36mC( 0, 8, 11), C( 0, 9, 12), C(15, 6, 8), C(25, 4, 4),[0m
[1;36mC(33, 3, 3), C(40, 2, 2), C(48, 2, 2), C(55, 1, 1),[0m
[1;36mC(63, 0, 0), C(63, 0, 0), C(63, 3, 0), C(63, 7, 0),[0m
[1;36mC(63, 10, 0), C(63, 13, 0), C(63, 16, 0), C(63, 20, 0),[0m
[1;36mC(63, 23, 0), C(63, 26, 0), C(63, 29, 0), C(63, 33, 0),[0m
[1;36mC(63, 36, 0), C(63, 39, 0), C(63, 39, 0), C(63, 40, 0),[0m
[1;36mC(63, 40, 0), C(63, 41, 0), C(63, 42, 0), C(63, 42, 0),[0m
[1;36mC(63, 43, 0), C(63, 44, 0), C(63, 44, 0), C(63, 45, 0),[0m
[1;36mC(63, 45, 0), C(63, 46, 0), C(63, 47, 0), C(63, 47, 0),[0m
[1;36mC(63, 48, 0), C(63, 49, 0), C(63, 49, 0), C(63, 50, 0),[0m
[1;36mC(63, 51, 0), C(63, 51, 0), C(63, 52, 0), C(63, 53, 0),[0m
[1;36mC(63, 53, 0), C(63, 54, 0), C(63, 55, 0), C(63, 55, 0),[0m
[1;36mC(63, 56, 0), C(63, 57, 0), C(63, 57, 0), C(63, 58, 0),[0m
[1;36mC(63, 58, 0), C(63, 59, 0), C(63, 60, 0), C(63, 60, 0),[0m
[1;36mC(63, 61, 0), C(63, 62, 0), C(63, 62, 0), C(63, 63, 0),[0m
[1;36m/* Followed by "white heat". */[0m
[1;36m#define W C(63,63,63)[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W,[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W,[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W,[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W,[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W,[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W,[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W,[0m
[1;36mW, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W[0m
[1;36m#undef W[0m
[1;36m#undef C[0m
[1;36m};[0m
[1;36mstatic uint8_t fire[WIDTH * HEIGHT];[0m
[1;36mstatic uint8_t prev_fire[WIDTH * HEIGHT];[0m
[1;36mstatic uint32_t framebuf[WIDTH * HEIGHT];[0m
[1;36mint main()[0m
[1;36m{[0m
[1;36m SDL_Window *window;[0m
[1;36m SDL_Renderer *renderer;[0m
[1;36m SDL_Texture *texture;[0m
[1;36m SDL_Event event;[0m
[1;36m int i;[0m
[1;36m uint32_t sum;[0m
[1;36m uint8_t avg;[0m
[1;36m bool full_screen = false;[0m
[1;36m bool keep_running = true;[0m
[1;36m if (SDL_Init(SDL_INIT_VIDEO) < 0) {[0m
[1;36m fprintf(stderr, "Failed SDL_Init: %s\n", SDL_GetError());[0m
[1;36m return 1;[0m
[1;36m }[0m
[1;36m window = SDL_CreateWindow("SDL2 firedemo (www.hanshq.net/fire.html)",[0m
[1;36m SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED,[0m
[1;36m WIN_WIDTH, WIN_HEIGHT,[0m
[1;36m SDL_WINDOW_SHOWN | SDL_WINDOW_RESIZABLE);[0m
[1;36m if (window == NULL) {[0m
[1;36m fprintf(stderr, "Failed CreateWindow: %s\n", SDL_GetError());[0m
[1;36m return 1;[0m
[1;36m }[0m
[1;36m renderer = SDL_CreateRenderer(window, -1, 0);[0m
[1;36m if (renderer == NULL) {[0m
[1;36m fprintf(stderr, "Failed CreateRenderer: %s\n", SDL_GetError());[0m
[1;36m return 1;[0m
[1;36m }[0m
[1;36m texture = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_ARGB8888,[0m
[1;36m SDL_TEXTUREACCESS_STREAMING, WIDTH, HEIGHT);[0m
[1;36m if (texture == NULL) {[0m
[1;36m fprintf(stderr, "Failed CreateTexture: %s\n", SDL_GetError());[0m
[1;36m return 1;[0m
[1;36m }[0m
[1;36m while (keep_running) {[0m
[1;36m while (SDL_PollEvent(&event)) {[0m
[1;36m if (event.type == SDL_QUIT) {[0m
[1;36m keep_running = false;[0m
[1;36m } else if (event.type == SDL_KEYDOWN) {[0m
[1;36m if (event.key.keysym.sym == SDLK_f) {[0m
[1;36m full_screen = !full_screen;[0m
[1;36m SDL_SetWindowFullscreen(window,[0m
[1;36m full_screen ?[0m
[1;36m SDL_WINDOW_FULLSCREEN_DESKTOP :[0m
[1;36m 0);[0m
[1;36m } else if (event.key.keysym.sym == SDLK_q) {[0m
[1;36m keep_running = false;[0m
[1;36m }[0m
[1;36m }[0m
[1;36m }[0m
[1;36m for (i = WIDTH + 1; i < (HEIGHT - 1) * WIDTH - 1; i++) {[0m
[1;36m /* Average the eight neighbours. */[0m
[1;36m sum = prev_fire[i - WIDTH - 1] +[0m
[1;36m prev_fire[i - WIDTH ] +[0m
[1;36m prev_fire[i - WIDTH + 1] +[0m
[1;36m prev_fire[i - 1] +[0m
[1;36m prev_fire[i + 1] +[0m
[1;36m prev_fire[i + WIDTH - 1] +[0m
[1;36m prev_fire[i + WIDTH ] +[0m
[1;36m prev_fire[i + WIDTH + 1];[0m
[1;36m avg = (uint8_t)(sum / 8);[0m
[1;36m /* "Cool" the pixel if the two bottom bits of the[0m
[1;36m sum are clear (somewhat random). For the bottom[0m
[1;36m rows, cooling can overflow, causing "sparks". */[0m
[1;36m if (!(sum & 3) &&[0m
[1;36m (avg > 0 || i >= (HEIGHT - 4) * WIDTH)) {[0m
[1;36m avg--;[0m
[1;36m }[0m
[1;36m fire[i] = avg;[0m
[1;36m }[0m
[1;36m /* Copy back and scroll up one row.[0m
[1;36m The bottom row is all zeros, so it can be skipped. */[0m
[1;36m for (i = 0; i < (HEIGHT - 2) * WIDTH; i++) {[0m
[1;36m prev_fire[i] = fire[i + WIDTH];[0m
[1;36m }[0m
[1;36m /* Remove dark pixels from the bottom rows (except again the[0m
[1;36m bottom row which is all zeros). */[0m
[1;36m for (i = (HEIGHT - 7) * WIDTH; i < (HEIGHT - 1) * WIDTH; i++) {[0m
[1;36m if (fire[i] < 15) {[0m
[1;36m fire[i] = 22 - fire[i];[0m
[1;36m }[0m
[1;36m }[0m
[1;36m /* Copy to framebuffer and map to RGBA, scrolling up one row. */[0m
[1;36m for (i = 0; i < (HEIGHT - 2) * WIDTH; i++) {[0m
[1;36m framebuf[i] = palette[fire[i + WIDTH]];[0m
[1;36m }[0m
[1;36m /* Update the texture and render it. */[0m
[1;36m SDL_UpdateTexture(texture, NULL, framebuf,[0m
[1;36m WIDTH * sizeof(framebuf[0]));[0m
[1;36m SDL_RenderClear(renderer);[0m
[1;36m SDL_RenderCopy(renderer, texture, NULL, NULL);[0m
[1;36m SDL_RenderPresent(renderer);[0m
[1;36m SDL_Delay(1000 / FPS);[0m
[1;36m }[0m
[1;36m SDL_DestroyTexture(texture);[0m
[1;36m SDL_DestroyRenderer(renderer);[0m
[1;36m SDL_DestroyWindow(window);[0m
[1;36m SDL_Quit();[0m
[1;36m return 0;[0m
[1;36m}[0m
To build and run the program on Debian GNU/Linux (or Ubuntu):
[1;32m$[37m sudo apt-get install libsdl2-dev[0m
[1;32m$[37m gcc -O3 -o fire `sdl2-config --cflags --libs` fire.c[0m
[1;32m$[37m ./fire[0m
To install SDL2 from MacPorts and build on Mac:
[1;32m$[0;37;40m [1msudo port install libsdl2[0m
[1;32m$[0;37;40m [1mclang -O3 -o fire `sdl2-config --cflags --libs` fire.c[0m
[1;32m$[0;37;40m [1m./fire[0m
To build on Windows, download the latest "Visual C++ 32/64-bit" development
library from the SDL 2.0 download page (currently the latest version is
SDL2-devel-2.0.7-VC.zip). Extract that somewhere (I used C:\), and build in
a Visual Studio Developer Command Prompt:
cl /Ox /DSDL_MAIN_HANDLED /Ic:\SDL2-2.0.7\include c:\SDL2-2.0.7\lib\x86\SDL2.lib fire.c
copy c:\SDL2-2.0.7\lib\x86\SDL2.dll .
fire.exe
The /DSDL_MAIN_HANDLED flag is to prevent SDL from replacing the main
function. The copy is to make sure the SDL2.dll can be found when running
the program.
(The program may not work in VirtualBox if video acceleration is not set
up correctly. In that case, pass SDL_RENDERER_SOFTWARE instead of the 0
argument in the call to SDL_CreateRenderer.)
[1;31mA New Fire Demo for DOS[0m
I don't think the firedemo above was actually the program I saw that
evening in the nineties. The way I remember it, the flames were just along
the bottom of the screen. What I remember resembles much more what's
described in Lode Vandevenne's Fire Effect tutorial.
One important difference in how that tutorial creates the fire is that it
only averages pixel values from rows below the current one. This means the
computation can be performed on a single buffer, in other words, there is
no need to have separate buffers for the current and previous frame.
That makes things easier, and since the fire is located mostly along the
bottom of the screen, it should be no problem running this in 320x200
resolution, even on a slow machine.
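The single-buffer idea can be sketched in C as follows. This is an illustration rather than the tutorial's or my exact code: it works on any small grid, treats reads outside the buffer as cold (an assumption of the sketch), and uses the 15/64 scaling factor that the assembly listing further down also uses. The names heat_at and propagate are made up.

```c
#include <stdint.h>

/* Read a pixel, treating anything outside the buffer as cold. */
static unsigned heat_at(const uint8_t *fire, int n, int i)
{
    return (i >= 0 && i < n) ? fire[i] : 0u;
}

/* Update every row except the bottom one in place, walking backwards from
   the last pixel of the second-last row to pixel 0. Each pixel only reads
   rows below it, which have already been updated for this frame, so no
   second buffer is needed. Neighbours are taken by linear index, so they
   wrap across row edges. */
static void propagate(uint8_t *fire, int width, int height)
{
    const int n = width * height;
    for (int i = n - width - 1; i >= 0; i--) {
        const int below = i + width;
        const unsigned sum = heat_at(fire, n, below) +
                             heat_at(fire, n, below - 1) +
                             heat_at(fire, n, below + 1) +
                             heat_at(fire, n, below + width);
        /* Slightly less than a true average (16/64), so heat fades. */
        fire[i] = (uint8_t)((sum * 15u) >> 6);
    }
}
```

Seeding the bottom row with fresh values each frame and calling propagate once per frame is then enough to keep the flames moving.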
I've used this technique to make a little fire demo of my own (fire.asm):
[1;36m org 0x100 ; For .com file.[0m
[1;36msection .text[0m
[1;36mstart:[0m
[1;36m ; Enter mode 13h: 320x200, 1 byte (256 colors) per pixel.[0m
[1;36m mov ax, 0x13[0m
[1;36m int 0x10[0m
[1;36m ; Make sure es and ds point to our segment (cs).[0m
[1;36m push cs[0m
[1;36m push cs[0m
[1;36m pop ds[0m
[1;36m pop es[0m
[1;36m ; Write string.[0m
[1;36m mov ax, 0x1300 ; ah=13h, al=write mode[0m
[1;36m mov bx, 0xf ; bh=page number (0), bl=attribute (white)[0m
[1;36m mov cx, (msg_end - msg) ; cx=length[0m
[1;36m mov dx, ((10 << 8) + (40 / 2 - (msg_end - msg) / 2)) ; dh=row, dl=column[0m
[1;36m mov bp, msg ; es:bp=string address[0m
[1;36m int 0x10[0m
[1;36m ; Set up the palette.[0m
[1;36m ; Jare's original FirePal:[0m
[1;36m cli ; No interrupts while we do this, please.[0m
[1;36m mov dx, 0x3c8 ; DAC Address Write Mode Register[0m
[1;36m xor al, al[0m
[1;36m out dx, al ; Start setting DAC register 0[0m
[1;36m inc dx ; DAC Data Register[0m
[1;36m mov cx, (firepal_end - firepal)[0m
[1;36m mov si, firepal[0m
[1;36msetpal1:[0m
[1;36m lodsb[0m
[1;36m out dx, al ; Set DAC register (3 byte writes per register)[0m
[1;36m loop setpal1[0m
[1;36m mov al, 63[0m
[1;36m mov cx, (256 * 3 - (firepal_end - firepal))[0m
[1;36msetpal2:[0m
[1;36m out dx, al ; Set remaining registers to "white heat".[0m
[1;36m loop setpal2[0m
[1;36m sti ; Re-enable interrupts.[0m
[1;36m ; A buffer at offset 0x1000 from our segment will be used for preparing[0m
[1;36m ; the frames. Copy the current framebuffer (the text) there.[0m
[1;36m push 0xa000[0m
[1;36m pop ds[0m
[1;36m push cs[0m
[1;36m pop ax[0m
[1;36m add ax, 0x1000[0m
[1;36m mov es, ax[0m
[1;36m xor si, si[0m
[1;36m xor di, di[0m
[1;36m mov cx, (320 * 200 / 2)[0m
[1;36m cld[0m
[1;36m rep movsw ; Copy two bytes at a time.[0m
[1;36m push es[0m
[1;36m pop ds[0m
[1;36mmainloop:[0m
[1;36m ; On entry to the loop, es and ds should point to the scratch buffer.[0m
[1;36m ; Since we'll be working "backwards" through the framebuffer, set the[0m
[1;36m ; direction flag, meaning stosb etc. will decrement the index registers.[0m
[1;36m std[0m
[1;36m ; Let di point to the pixel to be written.[0m
[1;36m mov di, (320 * 200 - 1)[0m
[1;36m ; Write random values to the bottom row.[0m
[1;36m ; For random numbers, use "x = 181 * x + 359" from[0m
[1;36m ; Tom Dickens "Random Number Generator for Microcontrollers"[0m
[1;36m ; http://home.earthlink.net/~tdickens/68hc11/random/68hc11random.html[0m
[1;36m mov cx, 320[0m
[1;36m xchg bp, ax ; Fetch the seed from bp.[0m
[1;36mbottomrow:[0m
[1;36m imul ax, 181[0m
[1;36m add ax, 359[0m
[1;36m xchg al, ah ; It's the high 8 bits that are random.[0m
[1;36m stosb[0m
[1;36m xchg ah, al[0m
[1;36m loop bottomrow[0m
[1;36m xchg ax, bp ; Store the seed in bp for next time.[0m
[1;36m ; For the next 50 rows, propagate the fire upwards.[0m
[1;36m mov cx, (320 * 50)[0m
[1;36m mov si, di[0m
[1;36m add si, 320 ; si points at the pixel below di.[0m
[1;36mpropagate:[0m
[1;36m ; Add the pixel below, below-left, below-right and two steps below.[0m
[1;36m xor ax, ax[0m
[1;36m mov al, [si][0m
[1;36m add al, [si - 1][0m
[1;36m adc ah, 0[0m
[1;36m add al, [si + 1][0m
[1;36m adc ah, 0[0m
[1;36m add al, [si + 320][0m
[1;36m adc ah, 0[0m
[1;36m imul ax, 15[0m
[1;36m shr ax, 6 ; Compute floor(sum * 15 / 64), averaging and cooling.[0m
[1;36m stosb[0m
[1;36m dec si[0m
[1;36m loop propagate[0m
[1;36m ; Mirror some of the fire onto the text.[0m
[1;36m mov dx, 15 ; Loop count, decrementing.[0m
[1;36m mov di, (90 * 320) ; Destination pixel.[0m
[1;36m mov si, (178 * 320) ; Source pixel.[0m
[1;36mmirrorouter:[0m
[1;36m mov cx, 320 ; Loop over each pixel in the row.[0m
[1;36mmirrorinner:[0m
[1;36m mov al, [di] ; Load destination pixel.[0m
[1;36m test al, al ; Check if it's zero.[0m
[1;36m lodsb ; Load the source pixel into al.[0m
[1;36m jnz mirrorwrite ; For non-zero destination pixel, don't zero al.[0m
[1;36m xor al, al[0m
[1;36mmirrorwrite:[0m
[1;36m stosb ; Write al to the destination pixel.[0m
[1;36m loop mirrorinner[0m
[1;36m add si, 640 ; Bump si to the row below the one just processed.[0m
[1;36m dec dx[0m
[1;36m jnz mirrorouter[0m
[1;36m ; Sleep for one system clock tick (about 1/18.2 s).[0m
[1;36m xor ax, ax[0m
[1;36m int 0x1a ; Returns the number of clock ticks in cx:dx.[0m
[1;36m mov bx, dx[0m
[1;36msleeploop:[0m
[1;36m xor ax, ax[0m
[1;36m int 0x1a[0m
[1;36m cmp dx, bx[0m
[1;36m je sleeploop[0m
[1;36m ; Copy from the scratch buffer to the framebuffer.[0m
[1;36m cld[0m
[1;36m push 0xa000[0m
[1;36m pop es[0m
[1;36m mov cx, (320 * (200 - 3) / 2)[0m
[1;36m xor si, si[0m
[1;36m mov di, (320 * 3) ; Scroll down three rows to avoid noisy pixels.[0m
[1;36m rep movsw[0m
[1;36m ; Restore es to point to the scratch buffer.[0m
[1;36m push ds[0m
[1;36m pop es[0m
[1;36m ; Check for key press.[0m
[1;36m mov ah, 1[0m
[1;36m int 0x16[0m
[1;36m jz mainloop[0m
[1;36mdone:[0m
[1;36m ; Fetch key from buffer.[0m
[1;36m xor ah, ah[0m
[1;36m int 0x16[0m
[1;36m ; Return to mode 3.[0m
[1;36m mov ax, 0x3[0m
[1;36m int 0x10[0m
[1;36m ; Exit with code 0.[0m
[1;36m mov ax, 0x4c00[0m
[1;36m int 0x21[0m
[1;36m; Data.[0m
[1;36mmsg: db 'www.hanshq.net/fire.html'[0m
[1;36mmsg_end:[0m
[1;36mfirepal:[0m
[1;36m db 0, 0, 0, 0, 1, 1, 0, 4, 5, 0, 7, 9[0m
[1;36m db 0, 8, 11, 0, 9, 12, 15, 6, 8, 25, 4, 4[0m
[1;36m db 33, 3, 3, 40, 2, 2, 48, 2, 2, 55, 1, 1[0m
[1;36m db 63, 0, 0, 63, 0, 0, 63, 3, 0, 63, 7, 0[0m
[1;36m db 63, 10, 0, 63, 13, 0, 63, 16, 0, 63, 20, 0[0m
[1;36m db 63, 23, 0, 63, 26, 0, 63, 29, 0, 63, 33, 0[0m
[1;36m db 63, 36, 0, 63, 39, 0, 63, 39, 0, 63, 40, 0[0m
[1;36m db 63, 40, 0, 63, 41, 0, 63, 42, 0, 63, 42, 0[0m
[1;36m db 63, 43, 0, 63, 44, 0, 63, 44, 0, 63, 45, 0[0m
[1;36m db 63, 45, 0, 63, 46, 0, 63, 47, 0, 63, 47, 0[0m
[1;36m db 63, 48, 0, 63, 49, 0, 63, 49, 0, 63, 50, 0[0m
[1;36m db 63, 51, 0, 63, 51, 0, 63, 52, 0, 63, 53, 0[0m
[1;36m db 63, 53, 0, 63, 54, 0, 63, 55, 0, 63, 55, 0[0m
[1;36m db 63, 56, 0, 63, 57, 0, 63, 57, 0, 63, 58, 0[0m
[1;36m db 63, 58, 0, 63, 59, 0, 63, 60, 0, 63, 60, 0[0m
[1;36m db 63, 61, 0, 63, 62, 0, 63, 62, 0, 63, 63, 0[0m
[1;36mfirepal_end:[0m
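For readers who find the assembly hard to follow, the per-frame work in the listing above (a random bottom row, then upward propagation with cooling) can be modelled in Python. This is only a sketch: the names next_seed, average_and_cool and fire_frame are made up for illustration, and the palette, mirroring and timing are left out. The buffer is padded to a full 64 KiB segment on the assumption that reads past the last pixel row return zeros, whereas the listing does not appear to initialize that memory.

```python
WIDTH, HEIGHT = 320, 200
FIRE_ROWS = 50           # rows recomputed above the random bottom row
SEGMENT_SIZE = 0x10000   # assumption: scratch buffer modelled as a zeroed 64 KiB segment


def next_seed(seed):
    """x = 181 * x + 359, truncated to 16 bits like the ax register."""
    return (181 * seed + 359) & 0xFFFF


def average_and_cool(total):
    """floor(total * 15 / 64): a four-pixel average scaled by 15/16."""
    return (total * 15) >> 6


def fire_frame(buf, seed):
    """Update one frame in buf (a bytearray of SEGMENT_SIZE bytes) in place.

    Returns the seed to pass in for the next frame.
    """
    di = WIDTH * HEIGHT - 1              # last pixel of the bottom row
    for _ in range(WIDTH):               # bottom row, right to left
        seed = next_seed(seed)
        buf[di] = seed >> 8              # only the high byte is stored
        di -= 1
    si = di + WIDTH                      # the pixel directly below di
    for _ in range(WIDTH * FIRE_ROWS):   # walk backwards through 50 rows
        total = buf[si] + buf[si - 1] + buf[si + 1] + buf[si + WIDTH]
        buf[di] = average_and_cool(total)
        di -= 1
        si -= 1
    return seed


if __name__ == "__main__":
    frame = bytearray(SEGMENT_SIZE)
    seed = 0
    for _ in range(10):
        seed = fire_frame(frame, seed)
    print("hottest pixel above the bottom row:",
          max(frame[:WIDTH * (HEIGHT - 1)]))
```

Because 15/64 is slightly less than 1/4, every computed pixel is a little cooler than the average of the pixels below it, which is what makes the flames fade as they rise.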
To assemble the program and run it with Dosbox on Linux:
[1;32m$[37m sudo apt-get install nasm dosbox[0m
[1;32m$[37m nasm fire.asm -fbin -o fire.com[0m
[1;32m$[37m dosbox fire.com[0m
(fire.com can also be downloaded here.)
On Mac:
[1;32m$[37m sudo port install nasm dosbox[0m
[1;32m$[37m nasm fire.asm -fbin -o fire.com[0m
[1;32m$[37m dosbox fire.com[0m
For Windows, the idea is the same, but you have to download the programs
from the nasm and Dosbox web sites manually.
(Image: My fire demo as an animated GIF.)
[31mRunning on Bare Metal[0m
While the fire.com demo above runs under MS-DOS, the program doesn't
actually use DOS for anything. In fact, it's not so much a DOS program as
an IBM PC-compatible program: it's just 16-bit x86 code, some BIOS calls
and fiddling with the VGA.
The exciting thing is that while PCs have gotten much faster and more
capable in the last 20 years, the old stuff is still there. It should
be possible to run my program on a modern PC, without the help of any
operating system.
Running a program without an operating system is sometimes referred to as
running on bare metal. This is most common in embedded systems, but it's
possible on PCs as well.
When a PC starts, it first performs power-on self tests (POST), and then
proceeds to load the operating system. Typically it loads it from the hard
drive, but it can also boot from other devices such as a CD-ROM, USB stick
or floppy disk.
The way a PC traditionally decides if it can boot from some medium is by
reading the first sector (512 bytes) of it and checking whether that ends
with the two magic bytes 0x55 0xAA, the Master Boot Record boot signature.
If so, it loads that sector into memory at address 0000:7c00 and runs it.
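The check described above, and the meaning of the 0000:7c00 address, can be sketched in Python. The helper names looks_bootable and linear_address are hypothetical and only illustrate the rule; real firmware may apply further checks that are not modelled here.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = bytes([0x55, 0xAA])


def looks_bootable(sector):
    """True if a 512-byte first sector ends with the two magic bytes."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


def linear_address(segment, offset):
    """Real-mode segment:offset translation: segment * 16 + offset."""
    return (segment << 4) + offset


if __name__ == "__main__":
    blank = bytes(SECTOR_SIZE)
    signed = bytes(SECTOR_SIZE - 2) + BOOT_SIGNATURE
    print(looks_bootable(blank), looks_bootable(signed))
    print(hex(linear_address(0x0000, 0x7C00)))
```

Note that different segment:offset pairs can name the same byte: 07c0:0000 translates to the same linear address as 0000:7c00.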
Luckily, our program fits in well under 512 bytes, so to make it run as
a Master Boot Record, we just have to make it expect to be loaded at
0000:7c00:
[1morg 0x7c00[0m
and insert padding and the magic bytes at the end:
[1m times (510 - ($ - $$)) db 0 [0;37;40m; Pad to 510 bytes.[0m
[1m db 0x55 [0;37;40m; MBR boot signature.[0m
[1m db 0xaa[0m
We assemble it as before:
[1;32m$ [37mnasm fire.asm -fbin -o fire.img[0m
and end up with fire.img which contains our program and functions as a
Master Boot Record.
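What the padding and signature lines do to the output file can be mimicked in a few lines of Python. This is a sketch with a made-up helper name (make_boot_sector), not part of the original build. It only pads and signs an already assembled flat binary, and it does not replace the org directive, which changes how the code itself is assembled.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"


def make_boot_sector(code):
    """Zero-pad code to 510 bytes and append the boot signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > room:
        raise ValueError("code does not fit in a single boot sector")
    return code + bytes(room - len(code)) + BOOT_SIGNATURE


if __name__ == "__main__":
    # Stand-in payload for illustration: EB FE is "jmp $", an endless loop.
    sector = make_boot_sector(b"\xeb\xfe")
    print(len(sector), sector[-2:].hex())
```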
An easy way to test this is with VirtualBox. Configure a new virtual
machine, load the .img file as a virtual floppy disk, start the machine and
watch it boot into the fire demo.
To create a bootable USB stick with our demo from a Linux machine, insert a
USB stick and check dmesg to see which device name it gets assigned (sdc in
the output below):
[1;32m$[0;37;40m [1mdmesg[0m
..
[23722.398774] usb-storage 3-1.2:1.0: USB Mass Storage device detected
[23722.400366] scsi7 : usb-storage 3-1.2:1.0
[23723.402196] scsi 7:0:0:0: Direct-Access USB DISK 2.0
[23723.402883] sd 7:0:0:0: Attached scsi generic sg4 type 0
[23726.611204] sd 7:0:0:0: [sdc] 15138816 512-byte logical blocks: (7.75 GB/7.21 GiB)
[23726.613778] sd 7:0:0:0: [sdc] Write Protect is off
[23726.613783] sd 7:0:0:0: [sdc] Mode Sense: 23 00 00 00
[23726.615824] sd 7:0:0:0: [sdc] No Caching mode page found
[23726.615829] sd 7:0:0:0: [sdc] Assuming drive cache: write through
[23726.629461] sdc: sdc1
[23726.638104] sd 7:0:0:0: [sdc] Attached SCSI removable disk
[1;33mNote: don't try this at home if you don't know what you're doing. Also[0m
[1;33mdon't try it at work.[0m
To write the image to the USB stick (this will effectively delete all
existing data on the USB device; make sure you have the right device name
and that there is nothing important on the stick):
[1;32m$[0;37;40m [1msudo dd if=fire.img of=/dev/sdc[0m
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000129039 s, 4.0 MB/s
Restart the computer, boot from the USB stick (you might have to enter a
BIOS menu to select boot device) and watch it run on a modern computer just
like it would twenty years ago!
(Image: Boot menu.)
(Image: Firedemo after booting.)
[31mFurther Reading[0m
Michael Abrash's Graphics Programming Black Book is full of
information about the VGA, including techniques like "unchaining". The
full text (web friendly version) is available online.
Fabien Sanglard's Game Engine Black Book: Wolfenstein 3D has excellent
explanations of the PC hardware of the early nineties and provided
significant inspiration for this post.