Basic Os
"Hello World"!!!!
Assembler is a low-level language. It consists of a list of instructions that are in no way comparable to
anything you might know from C, Basic or Pascal. The AVR has about 120 of them, depending on the type,
its peripherals and its size. You can download the instruction set from Atmel's website and print out the
summary (a list of all instructions and what type of operands they have). If you download it, I suggest
printing out pages 1 and 10 to 15. That's not too much and yet is everything you need for a start. Here's a
link to it. The document is about 150 pages in total.
Let's have a look at probably the easiest program possible:
main:
rjmp main
"Main" (the first line) is not translated by the assembler, but used as a label. This label replaces an address
in the AVR's code space (FLASH memory). At exactly this address the next instruction (rjmp main) is
placed. When the instruction is executed, the cpu will jump to "main" again. The jump will be repeated over
and over, resulting in an infinite loop.
After power-up, or when a reset has occurred, the micro will always start program execution from address
0x0000. The first words in code space are the "Interrupt Vector Table". The AVR has internal peripherals,
like timers, a UART or an analog-to-digital converter. These can generate interrupts which will stop normal
code execution in order to react to certain events as fast as possible. This is good to know.
See Architecture -> Interrupts for more!
The interrupt vector table can be used by you to tell the micro what it has to do when a specific interrupt
occurs. The normal AVRs have space for one instruction per interrupt vector (an rjmp for example). This
instruction will be executed when the interrupt occurs (There's more to tell you about this, but not now...)
The first interrupt vector is the "reset vector". It contains the instruction the cpu should execute when a
reset occurs. We will use it to jump to our program we already had above:
.org 0x0000
rjmp main
main:
rjmp main
Assuming that our AVR is running at 4 MHz (4 Million clock cycles per second), how long does all this
take? AVRs are pretty fast - most instructions can be executed in one or two clock cycles. Some
instructions need a bit more time, but these are not important now. As the external clock is not divided
internally (some other microcontrollers do that, like the HC11 from Motorola, but that's an old one), the
two clock cycles an rjmp needs take 0.0000005 seconds (0.5 microseconds) at 4 MHz. Pretty fast!
"main" itself only needs 0.0000005 seconds per round, as it only consists of a single rjmp. Right now, our
main program doesn't actually DO anything.
The first thing I did when I started on AVRs was making an LED flash. LEDs can be connected to the
AVR's I/O ports (Architecture -> I/O Ports). These can be set to be input or output individually for each pin
and you can also enable an internal pull-up resistor if the port pin is set to input. Each I/O port has three
registers you can work with: The port's data register (for example PortB), the Data Direction Register (for
example DDRB) and the Pin register (for example PinB). For configuring a pin as an output pin, set its
corresponding bit in the data direction register. The output value (0 or 1) can then be set in the Port Data
register.
If you have an STK500, get yourself an ATmega8 in a DIP package and a 4 MHz crystal. The micro can be
plugged into the green target socket named "SCKT3200A2". Connect the 6-pin cable from "ISP6PIN" to
the ISP header for the green socket. That's the one named "SPROG2". The default jumper setting for the
oscillator system is the software oscillator. We want to use the crystal oscillator instead. Set the "OSCSEL"
jumper to close pins 2 and 3 (pin 1 is marked with a "1"). "Vtarget", "AREF", "RESET" and "XTAL1" should
be closed. Of course, the mega8 should be the only micro plugged into the STK. Do not insert more than
one AVR at a time! Now take one of the two wire-cables and use it to connect "PB3" (that's one of the
"PORTB" pins) to "LED0" (that's one of the "LEDS" pins). The crystal belongs into the crystal socket on the
STK500.
The STK500 LEDs are connected to the AVR via some extra components. If you want, you can take a look
at the LED circuit in the STK500 documentation (it's included in the AVR Studio help file and also came
with the STK). The most important fact is that the LED is connected to be "active low", which means that if
PortB.3 is low now (we connected it to one of the LEDs), the LED will be ON.
If you don't have an STK, connect the LED to PortB.3 via a current limiting resistor (about 470 Ohms is
OK, the value is not critical). Connect the resistor to the port pin and the LED's cathode, the anode goes to
Vcc. This will result in the same "active low" behavior.
Let's now change our program a bit so that it configures all PortB pins as output pins. After a reset, all Data
bits are set to zero, so the LED should be ON when the program is executed:
.org 0x0000
rjmp main
main:
ldi r16, 0xFF
out DDRB, r16
loop:
rjmp loop
The new loop was inserted so that we can set the Data Direction bits to our needs and then loop without
doing that again. We could, however, include the load and store instructions (ldi and out) in the loop. It
wouldn't hurt, but the micro would configure the Ports to the same value over and over again.
If you haven't done it already, download AVR Studio (I prefer version 3.5, not 4!) from Atmel's website and
install it. Now read "Creating a Project" in the AVR Studio Section and create a new project, choose your
favourite name and take into account that the LED will flash when it's finished :-)
It's now time to start a new page (it's already long enough). Our code will grow a bit...it might be useful to
have a calculator for all the timing stuff...
On the first page I told you how a simple program works, how to create a project and some details about
the AVR. Now it's time to have a look at what we want to do and how it can be done.
We want to make an LED flash. Basically we want to switch it on and off in a loop. The LED is connected
to PortB.3, one of the AVR's I/O pins we configured as an output.
Single bits in the I/O space can be cleared and set with the cbi (clear bit in I/O) and sbi (set bit in I/O)
instructions. Unfortunately, these don't work for all I/O registers. Open the AVR instruction set (the
complete one, not just the 7 pages you printed out) and look for cbi: It can only clear bits in I/O registers
0-31 (Hex: 0x00...0x1F). PortB is in this range (0x18), so we can use cbi (same for sbi).
As the LED is ON when PortB.3 is low, the LED can be switched on with cbi and off with sbi. Let's add that
to the loop:
.org 0x0000
rjmp main
main:
ldi r16, 0xFF
out DDRB, r16
loop:
sbi PortB, 3     ; switch it off
cbi PortB, 3     ; switch it on
rjmp loop        ; jump to loop
The reason why we first switch it off is that it was already on when entering the loop for the first time: After
reset, all port data bits are zero. By setting bit 3 in DDRB we configured portB.3 as an output and switched
the LED on.
You could now program your mega8 with this code, but you would only see the LED being on all the time.
The loop switches the LED off (sbi PortB, 3), which takes 2 clock cycles on the mega8. Then, after
0.0000005 seconds (again 2 clock cycles), the LED is on again (cbi PortB, 3). The rjmp also takes 2 clock
cycles (0.0000005 seconds). That's a bit too fast for the eye. We need to waste some time between
switching the LED on and off (let's say 0.5 seconds).
0.5 seconds at 4 MHz equals 2,000,000 clock cycles. Generating such a long delay requires either a timer
(which would use interrupts) or delay code that just takes lots of time for execution while occupying only
small space. This example will use a delay loop.
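The clock arithmetic used so far is easy to check on a PC. Here is a minimal sketch in Python (not AVR code; the names F_CPU, cycles_for and seconds_for are made up for this illustration) that converts between time and clock cycles, assuming the 4 MHz clock from the text:

```python
# Assumption: the 4 MHz crystal used in the text, no internal clock divider.
F_CPU = 4_000_000  # clock frequency in Hz

def cycles_for(seconds, f_cpu=F_CPU):
    """Number of clock cycles that elapse in the given time."""
    return round(seconds * f_cpu)

def seconds_for(cycles, f_cpu=F_CPU):
    """Time taken by the given number of clock cycles."""
    return cycles / f_cpu

print(cycles_for(0.5))   # cycles we have to waste for half a second
print(seconds_for(2))    # duration of a two-cycle instruction such as rjmp
```

With a different crystal only F_CPU changes; the cycle counts of the instructions stay the same.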
Keeping track of how many times the loop has been executed is done with a counter. As the AVR is an 8-bit
microcontroller, the registers can only hold the values 0 to 255. That's far less than the 2 million clock
cycles we need to wait between toggling the LED output, but we'll see how far we can get with that and
some tricks... The delay loop will be in a separate subroutine we can call in order to wait for half a second.
The AVR Assembler -> jump and subroutine call pages might be worth looking at now.
Registers can be used in pairs as well, allowing to work with values from 0 to 65535. The following piece of
code clears registers 24 and 25 and increments them in a loop until they overflow to zero again. When that
condition occurs, the loop doesn't go around again:
clr r24            ; clear register 24
clr r25            ; clear register 25
delay_loop:        ; the loop label
adiw r24, 1        ; "add immediate to word": r24:r25 are incremented
brne delay_loop    ; if no overflow ("branch if not equal"), go back to "delay_loop"
This little loop takes a lot of time: clr needs 1 cycle, adiw needs two cycles and brne needs 2 cycles if the
branch is done and 1 otherwise. Every time the registers don't overflow the loop takes adiw(2) + brne(2) =
4 cycles. This is done 0xFFFF times before the overflow occurs. The next time the loop only needs 3
cycles, because no branch is done. This adds up to 4*0xFFFF(looping) + 3(overflow) + 2(clr) = 262145
cycles. This is still not enough: 2,000,000/262,145 ~ 7.63
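If you don't trust the counting, the loop can be stepped through on a PC. The following is a rough Python model (not AVR code; inner_loop_cycles is a made-up name) that counts the 16-bit register pair up until it overflows and adds up the cycle costs quoted above:

```python
def inner_loop_cycles(start=0):
    """Count the clock cycles of: clr r24 / clr r25 / adiw r24,1 / brne."""
    cycles = 2                             # the two clr instructions, 1 cycle each
    counter = start & 0xFFFF               # r24:r25 seen as one 16-bit value
    while True:
        counter = (counter + 1) & 0xFFFF   # adiw r24, 1
        cycles += 2
        if counter != 0:                   # brne taken: back to delay_loop
            cycles += 2
        else:                              # overflow to zero: brne falls through
            cycles += 1
            return cycles

print(inner_loop_cycles())
```

The start parameter is only there so that other initial values can be tried; the loop in the text starts at 0.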
We need to tweak the loop a bit and also create a loop "around" it which will contain our 262,145 cycle loop.
For fine-tuning the inner loop we need to change the clr instructions to ldi so that we can use a different
start value than 0. The "outer" loop will be down-counting from 8 to zero using r16. This is how the delay
code looks now:
ldi r16, 8
outer_loop:
ldi r24, 0
ldi r25, 0
delay_loop:
adiw r24, 1
brne delay_loop
dec r16
brne outer_loop
Again, some calculations: The inner loop is now treated like one BIG instruction needing 262,145 clock
cycles (this includes the two ldi instructions that replaced the clr instructions). ldi needs 1 clock cycle,
dec also needs 1 clock cycle and brne needs 1 or 2 cycles (see above).
One round of the outer loop needs 262,145 (inner loop) + 1 (dec) + 2 (brne) = 262,148 cycles. Eight rounds
need 8 * 262,148 = 2,097,184 cycles, plus the initial ldi = 2,097,185. Wait. Subtract one because the last
brne didn't result in a branch, so it needs 2,097,184 cycles. This is more like what we want, but 97,184
cycles too long. This is where the fine-tuning comes in - we need to change the initial value of r24:r25.
The outer loop is executed 8 times and includes the "big-inner-loop-instruction". We have to subtract some
cycles from the inner loop: 97,184 / 8 = 12,148 cycles per inner loop. This is how much shorter each inner
loop has to be. Every iteration of the inner loop takes 4 cycles (the last one takes 3, but that one is
still executed), so let's divide those 12,148 by 4. That's exactly 3037 fewer iterations. As the counter
counts up until it overflows, starting at 3037 instead of 0 skips exactly these iterations. This is our new
initialisation value for r24:r25!
Now, if you want, do all those calculations again: The result is 2,000,000 clock cycles! Now just put this
into a separate routine and call it from the main LED flashing loop:
.org 0x0000
rjmp main
main:
ldi r16, low(RAMEND)
out SPL, r16
ldi r16, high(RAMEND)
out SPH, r16
ldi r16, 0xFF
out DDRB, r16
loop:
sbi PortB, 3
rcall delay_05
cbi PortB, 3
rcall delay_05
rjmp loop
delay_05:
ldi r16, 8
outer_loop:
ldi r24, low(3037)
ldi r25, high(3037)
delay_loop:
adiw r24, 1
brne delay_loop
dec r16
brne outer_loop
ret
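The cycle count of the delay_05 body can also be written as a formula. This is a small Python sketch (not AVR code; delay_cycles is a made-up name) using the per-instruction cycle counts given in the text. It leaves out the rcall and ret, which add a few more cycles per call:

```python
def delay_cycles(start, outer=8):
    """Cycles from 'ldi r16, 8' up to (not including) 'ret'."""
    inner_iterations = 0x10000 - start   # adiw executions until r24:r25 wraps to 0
    inner = 4 * inner_iterations - 1     # adiw(2) + brne(2), the last brne needs only 1
    per_round = 2 + inner + 1 + 2        # 2x ldi + inner loop + dec + brne (taken)
    return 1 + outer * per_round - 1     # initial ldi, last brne not taken

print(delay_cycles(0))      # start value 0, as in the first version
print(delay_cycles(3037))   # fine-tuned start value
```

Trying other start values and outer counts this way is quicker than re-doing the sums by hand when the clock frequency or the wanted delay changes.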
For trying this in the simulator or assembling it, also add this line at the beginning of the text:
.include "c:\avr_studio_working_directory\appnotes\m8def.inc"
This will include the ATmega8 def file from Atmel which includes things like the PortB address definition.
Without this file the assembler will spit out error warnings!
Assembler Basics
Assembler is a low-level language which does not know any C-like commands like for{;;} or while{}.
Assembler instructions are small, for example out PortD, r15 writes the contents of register 15 (which in an
AVR can hold one byte) to PortD (which is 8 I/O lines handled as one I/O register).
Other assembler instructions only work on registers and cannot access I/O registers or SRAM.
"inc r15" is one of them. It increments the value register 15 holds by one. This is useful for loops (like
for{;;}).
Almost every instruction leaves certain bits in the Status Register set or cleared based on the instruction's
result. These bits can be used by branch instructions or arithmetic instructions in order to perform correctly
(branch/don't branch, increment result etc).
Branch instructions jump to a specific code address (or code line) if the microcontroller is in a specific state
or just go on with the next code line if this state is not present. If the counting variable in a loop has not
reached the desired value, they can let the mcu repeat the loop.
Here is a small example code snippet showing how arithmetic, I/O and branch instructions work together:
ldi r16, 0
for_loop:
inc r16
out PortD, r16
cpi r16, 10
brlo for_loop
In the loop the counter (r16) is increased in every iteration and written to PortD. When it reaches 10, brlo
will not jump to the beginning of the loop and execution goes on with the next instruction. This small example gives a good
impression of how small the steps you can take in assembler are. It can't get smaller, as this is what the
mcu does. None of the instructions will be split into smaller ones by the assembler. With the comments in
mind have a look at the AVR instruction set and also have a look at other instructions of the same type.
Assembler is very sensitive to programming errors. Try the above example with the increase instruction
and the compare instruction swapped. What happens? The first value on PortD is 0, the last one is 9.
Now have a look at the "Flow Charts" section and try to write a flow chart of the code above. You'll see that
flow charts make code less "cryptic" and more readable. And keep them up to date every time you make a big
enhancement to your code so that you can still read it after two weeks. Comments are also very important,
especially if you can't make a flow chart every time your code changes. In assembler, a comment can be
written for almost every line of code, especially when tricks are used.
Flow Charts
Flow charts are a graphical representation of code, Program states or even SRAM contents, if used in a
creative way. Once you know how to use them for code you'll quickly develop your own style to create flow
charts for almost anything.
When you think about implementing a special algorithm or peripheral driver it might be better to have a
flow chart already done before you start hacking code. That will save lots of time. Trust me. I know. If you
have code that is not sufficiently commented or just BIG, analyse it by making up a flow chart. Very often
that helps, especially when you got it from the web.
Especially when writing code in assembler they are a great help, because assembler instructions are not
always self-explanatory and even well-structured code will get hard to read once it has grown to a certain
size.
Here is a small example flow chart:
MCU Status
The microcontroller operates based on the Status Register (SREG) and other internal registers or
components. Most important is the Status Register which holds information on the last instruction and its
result and Interrupt enable status.
The SREG holds 8 Flags:
bit:   Name:
0      C (Carry bit)
1      Z (Zero Flag)
2      N (Negative Flag)
3      V (Overflow Flag)
4      S (Sign Flag)
5      H (Half Carry bit)
6      T (Bit store Flag)
7      I (Global Interrupt enable Flag)
The Carry flag is used for shift and rotate instructions. See the Basic Maths section for this. If the Global
Interrupt Flag is set, the mcu will perform the Interrupt Service Routine corresponding to the interrupt that
occurred. Detailed knowledge about the other flags is not essential - most of the compare and branch
instructions can be used without looking at the flags in detail.
Just remember: They are important for mathematical operations, and changing them between calculating a
value and comparing it with something else might be fatal. That's why Interrupt Service routines should
preserve the SREG and any other registers they use (unless these registers are unused in normal code).
An interrupt might occur between comparing two values with each other and a following branch - the ISR
might change status flags and corrupt the flags the branch relies on.
When a register is rotated left (rol), the MSB is shifted into the Carry Bit, bit 6 goes to bit 7, bit 5 to bit 6
and so on and bit 0 is replaced by the old Carry Bit.
C <- b7 <------ b0 <- C (Rotate left, rol)
C -> b7 ------> b0 -> C (Rotate right,ror)
When a register is shifted, the bit shifted out replaces the Carry Bit, the bit shifted in is 0.
C <- b7 <------ b0 <- 0 (Shift left, lsl)
0 -> b7 ------> b0 -> C (Shift right, lsr)
Another shift operation is asr (arithmetic shift right), which works like lsr, but bit 7 remains unchanged. The
rest is shifted right and the Carry bit is replaced by bit 0. This effectively divides a signed number by 2 (bit
7 holds the sign) and the Carry Bit can be used to round the result.
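The bit movements described above can be tried out on a PC. Here is a minimal Python model of the five instructions (not AVR code; each function takes an 8-bit value, rol and ror also take the old Carry bit, and each returns the new value together with the new Carry bit):

```python
def rol(value, carry):
    """Rotate left through Carry: old Carry enters bit 0, bit 7 becomes Carry."""
    return ((value << 1) | carry) & 0xFF, (value >> 7) & 1

def ror(value, carry):
    """Rotate right through Carry: old Carry enters bit 7, bit 0 becomes Carry."""
    return (value >> 1) | (carry << 7), value & 1

def lsl(value):
    """Shift left: 0 enters bit 0, bit 7 becomes Carry."""
    return (value << 1) & 0xFF, (value >> 7) & 1

def lsr(value):
    """Shift right: 0 enters bit 7, bit 0 becomes Carry."""
    return value >> 1, value & 1

def asr(value):
    """Arithmetic shift right: bit 7 is kept, bit 0 becomes Carry."""
    return (value >> 1) | (value & 0x80), value & 1

print(rol(0b10000001, 0))
print(asr(0xFE))   # 0xFE is -2 as a signed byte
```

asr(0xFE) gives 0xFF, which is -1 as a signed byte, so the division by two keeps the sign.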
Bit Manipulation
cbr and sbr clear or set one or multiple bits in a register. These instructions only work on registers r16 to
r31. They do not use single bits as an argument, but masks which can contain multiple bits:
sbr r16, (1<<5)+(1<<3)
cbr r16, 0x03
See the instruction set summary or AVRStudio assembler help for details on Status Register flags changed
by this instruction. You can for example use breq after this instruction when the result is zero.
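What the masks do is plain bit logic: sbr ORs the mask into the register, cbr ANDs the register with the inverted mask. A tiny Python model (not AVR code, register values are just numbers from 0 to 255):

```python
def sbr(reg, mask):
    """Set every bit of reg that is 1 in mask."""
    return (reg | mask) & 0xFF

def cbr(reg, mask):
    """Clear every bit of reg that is 1 in mask."""
    return reg & ~mask & 0xFF

r16 = sbr(0x03, (1 << 5) + (1 << 3))   # sets bits 5 and 3
print(hex(r16))
r16 = cbr(r16, 0x03)                   # clears bits 1 and 0
print(hex(r16))
```

The start value 0x03 is an arbitrary choice for this illustration.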
See instruction set for logical instructions like and, andi, or, ori, eor, com and so on. These are pretty
simple to understand.
add is easy to understand. The two registers are added, the result is stored in the first register and the
appropriate flags in the Status register are set.
adc is a bit fancier. It also adds the Carry flag left by the previous operation, so the result is one higher
if Carry was set. Good for multiple-byte operations (see Advanced Assembler section).
adiw is the only add instruction which takes a constant (0 to 63) as an argument. It is given the low byte of
one of the upper register pairs (24, 26, 28, 30) and adds the constant to the whole pair.
Subtracting can be done with some more instructions:
sub r16, r17
sbc r16, r2
sbiw ZL, 5
subi r16, 30
sbci r16, 4
The subtract instructions work like the add instructions, so I won't explain them in detail. The "subtract
immediate" (subi) instruction can be used to make up an "addi" instruction
(this can also be used to make an "addi" macro):
subi r16, -5     ; r16 = r16 + 5
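Why this works: the constant is stored as one byte, so -5 becomes 0xFB (251), and subtracting 251 from an 8-bit value gives the same result as adding 5 once the result is cut down to 8 bits. A short Python model (not AVR code; the function is named after the instruction only for readability):

```python
def subi(reg, const):
    """8-bit 'subtract immediate'; a negative constant is encoded as one byte."""
    k = const & 0xFF            # -5 is encoded as 0xFB
    return (reg - k) & 0xFF     # result wraps around at 8 bits

print(subi(10, -5))     # 10 + 5
print(subi(250, -10))   # 250 + 10 wraps around past 255
```

Note that the flags left behind are those of a subtraction, so Carry does not behave like it would after a real add.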
Multiplying
Multiplication is a bit more difficult than adding or subtracting. Classic AVRs (AT90S Series) don't support
the mul instruction, ATmegas have 6 different multiply instructions (multiply, multiply signed with unsigned,
multiply signed, fractional mul, fractional mul signed, fractional mul signed with unsigned).
Classic AVRs need some extra coding to perform multiplications, here is some pseudo-code:
for r16 = 1 to multiplier do
result = result + multiplicand
r16 = r16 + 1
repeat
This is more a loop adding than a multiplication, but it will do the job. In most cases the result will be
16 bits wide, so have a look at the Advanced Assembler Section as well if you're not familiar with 16-bit
operations.
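The same idea as a runnable Python sketch (a PC-side model, not AVR code; mul_by_adding is a made-up name and the result is cut to 16 bits as an assumption about the register pair holding it):

```python
def mul_by_adding(multiplicand, multiplier):
    """Multiply two 8-bit values by repeated addition into a 16-bit result."""
    result = 0
    for _ in range(multiplier):                    # one addition per count
        result = (result + multiplicand) & 0xFFFF  # 16-bit add, like add/adc on a pair
    return result

print(mul_by_adding(200, 150))
```

The run time grows with the multiplier, which is why this is only a stopgap for AVRs without a hardware multiplier.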
As ATmegas have the mul instruction, they don't need loops of 16-bit additions to perform a
multiplication. The 16-bit result of the mul instructions is always returned in r0:r1 (low byte in r0, high
byte in r1).
mul r16, r17
muls r16, r17
mulsu r16, r17
As I mentioned above the megas also have fractional multiply instructions, but these are more advanced
and I never used them. So I can't tell you how they work. And I won't.
When the 8 bit range is not enough, multiple bytes are needed to hold a value. Performing mathematical
operations on them requires more work than maths on single bytes. Most of these operations can be written
as a macro and then be used just like normal instructions. The Carry Bit (SREG) is the reason for all this to
work - if an add instruction resulted in a number that is greater than 255, the carry bit is set. It can then be
used by adc (add with carry) to increment the high byte (16-bit example):
ldi r16, 1       ; r16:r17 = 1
ldi r17, 0
ldi r18, 255     ; r18:r19 = 255
ldi r19, 0
add r16, r18
add r17, r19
The result of the operation above is 0 because the carry of the low byte add was not used when adding the
high bytes. With adc for the high bytes the carry is taken into account:
add r16, r18     ; add low bytes (= 256 => r16 = 0)
adc r17, r19     ; add high byte with carry
                 ; (= 0 + 1 (from carry) = 1)
                 ; => r16:r17 = 256
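The difference between add and adc for the high byte can be replayed on a PC. A minimal Python model (not AVR code; add8 is a made-up helper that returns the 8-bit sum and the resulting Carry bit):

```python
def add8(a, b, carry_in=0):
    """Add two bytes plus an optional carry; returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

low, carry = add8(1, 255)          # add r16, r18  -> low byte 0, Carry set
high_add, _ = add8(0, 0)           # add r17, r19  -> Carry ignored
high_adc, _ = add8(0, 0, carry)    # adc r17, r19  -> Carry added in

print(high_add * 256 + low)        # result when add is used for the high bytes
print(high_adc * 256 + low)        # result when adc is used for the high bytes
```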
Subtracting words from each other works just like adding: Subtract (sub) the low bytes from each other,
then subtract the high bytes with carry (sbc). This also works with subi (subtract immediate) and sbci
(subtract with carry immediate). A close look at the instruction set reveals some useful compare
instructions as well! If a normal compare instruction (cp) finds the first operand to be lower than the
second, the carry bit is set, just as a subtraction would set it. This can then be taken into account by
cpc (compare with carry) to compare the high bytes of two words. So what about 32 bit values? It's the
same, but with 4 bytes:
; r16..r19 = 0x00000100
; r20..r23 = 0x002000FF
add r16, r20     ; add bytes0
adc r17, r21     ; add bytes1 with carry
adc r18, r22     ; add bytes2 with carry
adc r19, r23     ; add bytes3 with carry
                 ; result:
                 ; r16..r19 = 0x002001FF
cp r16, r20      ; perform 32-bit compare
cpc r17, r21
cpc r18, r22
cpc r19, r23     ; (result: greater than)
brne not_eq      ; jump to "not equal"-code
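How the cp/cpc chain ends up with usable flags can be modelled in a few lines of Python (not AVR code; to_bytes_le and cp_cpc are made-up helpers, and only the Z and C flags are modelled). The first operand is the sum of the two example values:

```python
def to_bytes_le(value, size=4):
    """Split a value into bytes, lowest byte first (like r16..r19)."""
    return list(value.to_bytes(size, "little"))

def cp_cpc(a, b):
    """Model cp on the low bytes followed by cpc on the rest; returns (Z, C)."""
    zero, carry = 1, 0
    for x, y in zip(to_bytes_le(a), to_bytes_le(b)):
        diff = x - y - carry           # cp for the first byte (carry is 0), cpc afterwards
        carry = 1 if diff < 0 else 0   # borrow is handed on to the next byte
        if (diff & 0xFF) != 0:
            zero = 0                   # Z survives only if every byte was equal
    return zero, carry

total = 0x00000100 + 0x002000FF
print(hex(total), cp_cpc(total, 0x002000FF))
```

Z = 0 means "not equal" (brne branches), and C = 0 means the first value is not lower than the second, so together they say "greater than".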
Multiply and divide operations are not as easy as adding and subtracting. The bigger AVRs have a
hardware multiplier, but dividing values still has to be done in software. For those AVRs without a HW mul,
you'll have to write a software multiply routine (which is not difficult, I will add one here). If you need
multiply and divide operations, see The Atmel Appnote page and look for AVR200, AVR201 and AVR202.
The advantage of rjmp over jmp is that rjmp only needs 1 word of code space, while jmp needs 2 words.
Example:
rjmp go_here
ijmp
"Indirect Jump" to (Z). This instruction performs a jump to the address pointed to by the Z index register
pair. As Z is 16 bits wide, ijmp allows jumps within the lower 64k words range of code space (big enough
for a mega128). This instruction is especially cool for jumping to calculated addresses, or addresses from
a lookup table. Of course, special care has to be taken when setting up Z. Example:
ldi ZL, low(go_there)
ldi ZH, high(go_there)
ijmp
jmp
"Jump". While rjmp is limited to +/- 2k words, jmp can be used to jump anywhere within the code space.
The address operand of jmp can be as big as 22 bits, resulting in jumps of up to 4M words. The
disadvantage over rjmp is that jmp needs 2 words of code space, while rjmp needs just one word.
Example:
jmp go_far
Subroutine Calls
The AVR also has various subroutine call instructions. These are now described in the order they appear in
the instruction set summary. IMPORTANT: Subroutine calls require a proper stack setup and use of the
return instructions (which are described at the end of this page). For more about the stack,
read Architecture -> The Stack. That section also provides some more information on subroutines.
rcall
"Relative Call Subroutine". Just as rjmp, rcall can reach addresses within +/- 2k words. When rcall is
executed, the return address is pushed onto the stack. It needs 1 word of program space. Example:
rcall my_subroutine
icall
"Indirect Call to (Z)". This instruction works similar to ijmp, but as a subroutine call. The subroutine pointed
to by the Z index register pair is called. As Z is 16 bits wide, the lower 64k words of code space can be
addressed. The return address is pushed onto the stack. icall needs one word of code space. Example:
ldi ZL, low(my_subroutine)
ldi ZH, high(my_subroutine)
icall
call
"Call Subroutine". This instruction can reach the lower 64k words of code space (enough for the biggest
AVR, the mega128). It works just like rcall (regarding the stack) and needs 2 words of code space.
Example:
call my_subroutine
The Return Instructions ret And reti
These instructions have to be placed at the end of any subroutine or interrupt service routine (ISR). The return
address is popped from the stack and program execution goes on from there. This is what ret does.
reti is used after ISRs. Basically it works like ret, but it also sets the I Flag (Global Interrupt Enable Flag) in
the status register. When an ISR is entered, this bit is cleared by hardware.
Indirect Calls/Jumps
Indirect calls or jumps are needed when a computed value determines where the CPU has to proceed
executing code. They are fairly easy to understand.
Indirect calls/jumps don't use a constant address as a target, but have the Z index register pair as an
argument instead. As the program memory is organized in 16-bit words, the 16 bits of Z are enough to
address all of a 128kbyte device such as the mega128, so no eijmp or eicall is needed there. lpm is
different: it uses byte addresses, which is why elpm exists to reach the whole program space of such a
device. While the label for setting up Z for lpm needs to be multiplied by two (to have byte addresses),
this doesn't have to be done for ijmp/icall.
Example:
ldi ZL, low(led_on)
ldi ZH, high(led_on)
icall
led_on:
ldi r16, 0b11111110
out PortA, r16
ret
Indirect jumps/calls can also be used to make big case structures faster: If 20 different cases can occur
and the case we have is only matched at the end of all checks, it takes longer to be processed than the
first one, as 19 checks have already been done before.
If the value to be checked for different case values is used to perform an indirect jump or call, life is easier,
faster and more effective regarding code space usage:
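The value we want to process is turned into a table index (the table starts with the entry for the value
one, so one is subtracted), multiplied by the number of words a jmp needs (which is two) and then
added to the base address of our table. The following interrupt routine loads r16 with the current UDR data
and calls the appropriate subroutine:
in r16, UDR              ; get the received value
dec r16                  ; the first table entry belongs to the value 1
lsl r16                  ; two words per jmp
clr r17
ldi ZL, low(case_table)
ldi ZH, high(case_table)
add ZL, r16              ; Z = table base + offset
adc ZH, r17
icall
reti
case_table:
jmp UDR_is_one
jmp UDR_is_two
jmp UDR_is_three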
One fundamental thing is not shown here: The ISR has to check for values not handled by the table before
the icall is done. The table as it is above does not have entries for the values zero and 4 to 255, so these
can result in an error! Also watch out for the table cell size: A jmp needs two words and is slower than an
rjmp, but an rjmp (which needs one word only) must be used together with a nop so that every table cell
stays two words wide (the nop should follow the rjmp for speed reasons). With rjmp entries and no nop, the
doubled index no longer matches the table cells and the wrong entry (or code behind the table) is executed.
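The missing range check is easier to see in a high-level model. This Python sketch (not AVR code; the handler functions and their return strings are placeholders named after the table labels) does the same table lookup and refuses values the table has no entry for:

```python
def udr_is_one():
    return "one"

def udr_is_two():
    return "two"

def udr_is_three():
    return "three"

# One entry per handled value; the first entry belongs to the value 1.
CASE_TABLE = [udr_is_one, udr_is_two, udr_is_three]

def dispatch(value):
    """Call the handler for value, or return None if the table has no entry."""
    index = value - 1
    if not 0 <= index < len(CASE_TABLE):   # the check the ISR must do before icall
        return None
    return CASE_TABLE[index]()

print(dispatch(2))
print(dispatch(0))
```

In assembler the same check would be a compare and a conditional branch before Z is set up.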
Indirect addressing is also used by operating systems to let tasks install themselves into interrupt jump
tables: The operating system cares for the interrupt being serviced, the tasks leave their own ISR address
in a table and the operating system can call each routine. Such a table could be an array of addresses in
SRAM at an address known when the code is written:
.org 0x0000
rjmp reset
reset:
lds ZL, reset_table
lds ZH, reset_table + 1
icall
.dseg
reset_table: .byte 8
.cseg
ldi XL, low(my_reset_ISR)
ldi XH, high(my_reset_ISR)
sts reset_table, XL
sts reset_table + 1, XH
Of course, the task needs some more information: Which table position is free (not used by other tasks)
and what information is provided by the OS and so on, but that's not the problem now, this page is just
meant to illustrate how indirect jumps and calls work.
Conditional Branches
Conditional branches are branches based on the micro's Status Register. If the result of a previous
operation left a status (for example "Zero"), this can be used to jump to code handling this result. Loops
(for, while...) make use of this.
Any add, subtract, increment, decrement or logic instruction for example leaves a status that can be used
for almost any branch instruction the AVR offers. There are as well some tests which set status flags based
on their arguments. Basically they are just a subtraction: Comparing two numbers to each other is done by
subtracting one from the other. The result of a - b can be negative (b > a), positive (b < a) and zero (b = a).
This information is stored in the status register. When two numbers are added to each other, it can happen
that the 8-bit result is greater than 255 and therefore "rolls over". In this case, the carry bit in SREG is set.
Some examples:
subi r16, 5           ; r16 = r16 - 5
breq r16_is_0         ; r16 was 5, handle that
brlo r16_is_lower0    ; r16 was lower than 5
r16_is_greater5:      ; r16 was higher than 5
Here is a list of simple tests which can also be used for branch instructions (exception: cpse - this
instruction performs a compare and skips the next instruction if equal):
Valid SREG flags after test:
(<> = set or cleared depending on the result, 0 = always cleared, - = not changed)
instruction  arg 1  arg 2  "action"        I   T   H   S   V   N   Z   C
cpi          reg    const  reg - const     -   -   <>  <>  <>  <>  <>  <>
cp           reg    reg    reg - reg       -   -   <>  <>  <>  <>  <>  <>
cpc          reg    reg    reg - reg - C   -   -   <>  <>  <>  <>  <>  <>
tst          reg    -      reg AND reg     -   -   -   <>  0   <>  <>  -
cpse         reg    reg    reg - reg       -   -   -   -   -   -   -   -
If you want more info on which results change which SREG flag, see the AVRStudio assembler help. Here
is a list of a few branch instructions and what they do based on the flags:
breq    ; branch if equal
brne    ; branch if not equal
brsh    ; branch if same or higher
brlo    ; branch if lower
brmi    ; branch if minus
brpl    ; branch if plus
brge    ; branch if greater than or equal (signed)
brlt    ; branch if less than (signed)
AVR assembler has more branches which test the interrupt flag or single other status flags. If you need
one of them, see the AVRStudio assembler help. Branch instructions leave the flags they test untouched,
so the code branched to or the code following the branch can use them without restriction.
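To see how the unsigned branches follow from the flags, here is a small Python model (not AVR code; cp and branches_taken are made-up helpers, and only the Z and C flags of an unsigned 8-bit compare are modelled):

```python
def cp(a, b):
    """Flags after comparing two unsigned bytes (a - b): returns (Z, C)."""
    zero = int(((a - b) & 0xFF) == 0)   # Z: result is zero, the values are equal
    carry = int(a < b)                  # C: a borrow was needed, a is lower than b
    return zero, carry

def branches_taken(a, b):
    """Which of the unsigned branch instructions would branch after cp a, b."""
    zero, carry = cp(a, b)
    return {
        "breq": zero == 1,
        "brne": zero == 0,
        "brlo": carry == 1,
        "brsh": carry == 0,
    }

print(branches_taken(3, 5))
print(branches_taken(5, 5))
```

The signed branches (brge, brlt) and brmi/brpl depend on the N, V and S flags, which this sketch leaves out.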
Case Structures
Quite often, for example when receiving command values via the UART, it is necessary to build up a case
structure to determine which function needs to be called. The case structure compares a value to various
case values. As the branch instructions don't change any flags, this can be implemented straightforwardly:
in r16, UDR
cpi r16, 0
breq case_0
cpi r16, 1
breq case_1
cpi r16, 2
breq case_2
After all the case tests you can write the "default" code that will be executed if none of the tests results
equal. Case structures don't necessarily test for single values, but can also test for values within a specific
range, or compare strings to each other. Here is a value range example:
in r16, UDR
chk_case_03:
cpi r16, 4
brlo case_03
cpi r16, 20
brlo case_419
default:
; (default code)
The second example could also use brlt (branch if less than) if signed numbers are used. Advanced users
can write compare routines for any data structure they want. If these return usable flags in SREG,
conditional branch instructions can of course be used then.
For Loops
I don't think I need to explain how a for loop works, but in assembler we need to take care of the counting
register, which we wouldn't need to do in C or Pascal. For loops can work in many different ways. Some
are more code efficient, some are more flexible.
The flexible version counts from zero up to the required number of iterations. It is possible to use the
counting register to address for example array elements.
ldi r16, 0
loop:
out PortB, r16
inc r16
cpi r16, 10
brne loop
ldi r16, 0
loop1:
inc r16
out PortB, r16
cpi r16, 10
brne loop1
What is the difference between the two example loops? The first loop increments the counter after writing
the counter value to PortB. So the values we can see on that port are 0..9. The second loop increments
the counter before writing it to PortB. We can see the values 1..10. So whenever you plan to use the
counter register within the loop (for whatever you can think of) remember to check where the counter has
to be incremented.
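The two orderings can be compared on a PC. This is a Python model of both loops (not AVR code; the function names are made up, and the list stands in for the values that would appear on PortB):

```python
def write_then_increment(limit=10):
    """First loop: out PortB, r16 comes before inc r16."""
    r16, seen = 0, []
    while True:
        seen.append(r16)     # out PortB, r16
        r16 += 1             # inc r16
        if r16 == limit:     # cpi r16, 10 / brne loop falls through
            return seen

def increment_then_write(limit=10):
    """Second loop: inc r16 comes before out PortB, r16."""
    r16, seen = 0, []
    while True:
        r16 += 1             # inc r16
        seen.append(r16)     # out PortB, r16
        if r16 == limit:     # cpi r16, 10 / brne loop1 falls through
            return seen

print(write_then_increment())
print(increment_then_write())
```

Both loops run ten times; only the values seen inside the loop differ.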
If the counter value is only important for counting purposes (not used from within the loop), you can use a
decrementing version:
ldi r16, 10
loop2:
(insert loop code)
dec r16
brne loop2
You might have noticed that there is no compare instruction. dec leaves the status flags set in such a
way that we can use brne (or breq) to determine whether the result was zero or not. That saves 1 word of
program space compared to the up-counting version of a for-loop.
While Loops
Just as for loops, while loops come in different variations. This time, we don't have to care about a counter
register.
The while()...do{}-loop checks if a certain test result is true and performs the loop instructions.
while1:
in R16, PinD
cpi r16, 1
brne while1_end
rcall port_is_1
rjmp while1
while1_end:
This type of while-loop only executes the loop instructions if the condition is true. If the condition is
false at the first test, they are never executed.
The do{}...while()-loop executes the loop instruction at least once:
while2:
rcall port_is_1
in r16, PinD
cpi r16, 1
breq while2
while2_end:
These two examples also demonstrate two different branch instructions used for (basically) the same
thing. You'll easily find that out by yourself. Little helper: Conditional Branches.
.macro ldi16
ldi @0, low(@2)
ldi @1, high(@2)
.endmacro
Above, I wrote that arguments are replaced during assembly. The following should make it clear:
ldi16 r16, r17, 1024
; is assembled to:
ldi r16, 0
ldi r17, 0x04
As I said, macros can also be used to replace 16-bit calculations. This is one example (along with ldi16):
.macro addi
subi @0, -(@1)
.endmacro
.macro addi16
subi @0, low(-@2)
sbci @1, high(-@2)
.endmacro
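To see why subtracting the negated constant works as an addition, it can help to trace one call by hand. The following is only a sketch: the register pair r25:r24 and the constant 1000 are picked for illustration, and the macro definition is repeated so that the snippet stands on its own.

```asm
.macro addi16
subi @0, low(-@2)
sbci @1, high(-@2)
.endmacro

addi16 r24, r25, 1000   ; add 1000 (0x03E8) to r25:r24
; -1000 is 0xFC18 as a 16-bit value, so the call above is assembled to:
;   subi r24, 0x18      ; low(-1000): subtracting 0x18 equals adding 0xE8 (mod 256)
;   sbci r25, 0xFC      ; high(-1000): subtracting 0xFC and the borrow adds 0x03 plus the carry
```

The borrow produced by subi takes over the role of the carry between the two bytes, which is why sbci (and not subi) has to be used for the high byte.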
Macros can of course be more complex, take more arguments and crash the assembler. If too many
macros are defined in one file, the last ones can't be found. I've had this with more than 7 I think. Just split
them into more files, that helps sometimes. Or just don't be that lazy and write the code yourself...
.org 0x0100
.db 128
.db low(1000)
.db 128, low(1000)
Strings can of course be placed in program memory with only one .db directive:
.db "Hello World"
This will fill 6 words: the string has 11 characters, so a 0 will be added by the assembler to fill the last
word. If your string processing routine looks for 0 terminated strings, this is no problem, as the 0 is
already there. If the string (now 12 characters) is
.db "Hello World!"
no 0 will be added, so
.db "Hello World!", 0
is better.
.dw
"Data Word"; .dw works just like .db, but will use one word for every value.
.org
.org can be used to set the program counter to a specific value.
.org 0x01 is the Interrupt Vector for external interrupt 0 in devices with 1-word interrupt tables. The
mega128 has two words for each interrupt, so for setting the program counter to the external interrupt 0
you have to use .org 0x02 in this case.
Syntax:
.org location (location is the word address of where the following instructions/data tables are to be placed)
SRAM Directives
.byte
Reserves a given number of bytes of SRAM space for a label. This might sound a bit complicated, but the
syntax example will make it clear... This directive is only allowed in data segments (see .dseg).
Syntax:
.byte size
array_5: .byte 5
my_word: .byte 2
.dseg
"Data Segment"; Tells the assembler that the following text is meant to be used for setting up the SRAM. To
switch to code again, use .cseg.
.org
Use this directive to set the SRAM location counter to a specific value within a .dseg. (see .org in "Program
Memory Directives"). Together with .byte you can define SRAM locations at a specific address with a
specific size.
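As a sketch of how .dseg, .org and .byte work together, the following reserves two variables starting at 0x60. That address is an assumption (it is the first SRAM address on many classic AVRs, check your device), and the labels hour and rx_buffer are made up for this example.

```asm
.dseg                   ; switch to the data segment (SRAM)
.org 0x0060             ; assumed first SRAM address, device dependent
hour:      .byte 1      ; one byte at 0x0060
rx_buffer: .byte 16     ; 16 bytes at 0x0061..0x0070

.cseg                   ; back to the code segment
lds r16, hour           ; the labels can now be used as SRAM addresses
```

Nothing is written to SRAM by these directives. They only make the assembler count addresses, so the variables still have to be initialised by your code.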
EEPROM Directives
The EEPROM Directives work just like the directives for program memory and SRAM. I won't go into detail
here. As EEPROM values can be downloaded to EEPROM to be stored there, the .db and .dw directives
can be used for storing calibration values in EEPROM during programming.
.db
.dw
.eseg
.org
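A minimal sketch of the calibration value idea could look like this. The labels and values are hypothetical and only show the directives in use.

```asm
.eseg                   ; switch to the EEPROM segment
.org 0x0000             ; start at the first EEPROM address
cal_offset: .db 0x12    ; hypothetical one-byte calibration value
cal_gain:   .dw 1000    ; hypothetical 16-bit calibration value

.cseg                   ; back to the code segment
```

The labels cal_offset and cal_gain then hold EEPROM addresses, which your EEPROM read routine can use at runtime.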
Register and Constant Directives
.def
"Define (register)"; With this directive you can assign names to registers.
Syntax:
.def name = register
Example:
.def temp = r16
.equ
This directive assigns a name to a constant value which can't be changed later:
.equ max_byte = 255
.set
This works similarly to .equ, but the value of the label can be changed later:
.set counter = 1
.set counter = 2
can occur in the same piece of code and they're each valid until a new .set is found, so .set counter = 1 is
overridden by .set counter = 2.
Coding Directives
.endm / .endmacro
"End Macro"; This tells the assembler that a macro previously started ends here. Only use after you've
also started a macro with .macro :-)
.macro
This will start a piece of macro code. See Assembler -> Macros for examples and usage suggestions.
.include
Including files (for example the part specific definition files for each AVR) makes code more readable and
gives you the possibility to split code into separate files.
Syntax:
.include path
Example:
.include "c:\program files\avr studio\assembler\8515def.inc"
.include "\drv_routines\lcd.inc"
Assembler Output Directives
.device
This directive tells the assembler which AVR this code is for and only has effect on the AVR Studio
Simulator settings and does not affect the way your code will run on the actual device. Possible arguments
(device codes) are (list not complete, but you'll get the picture....):
AT90S1200
AT90S2313
ATmega8535
ATmega128
ATtiny11
ATtiny26
Syntax:
.device devicecode
.exit
Tells the assembler to stop assembling the current file. WHAT FOR????? Well, if include files contain text at
the end (explanations of routines, constants and so on), the .exit directive can be used to let the assembler
proceed with the file in which the .include directive occurred without any warnings or errors caused by the
text.
Example:
.equ byte_max = 255
.equ clock = 8000000
.exit
The maximum value a byte can hold is 255 and the device is clocked at 8 MHz
.list
The assembler by default creates a listfile (a combination of source code, opcodes, constants and so on).
Together with .nolist you can specify which parts of your file are to be shown in the listfile.
.listmac
This directive will turn macro expansion in the listfile on. By default, you'll only see which macro is called
and which arguments are used. As it can be useful to see what's going on (for debugging purposes), it's
possible to get expanded macros.
.nolist
Turns listfile generation off (see .list)
Assembler Expressions
The AVR assembler supports many expressions for calculating constants or manipulating other values to
suit your needs. I don't want to cut n paste the AVR Studio assembler help file (as I did with the directives
part.......), so I'll just give you some pointers.
The possible operands are labels (addresses in Flash, SRAM or EEPROM), variables defined by the .SET
directive (see above), constants defined by the .EQU directive (also see above), integer constants
(decimal, hexadecimal, binary or octal) and the program counter (PC).
How to use the labels, variables, constants and integer values should be clear. The PC is interesting
and can be quite handy. Many loops only consist of one instruction and thinking of a label for all those little
things can be nasty, especially if the label you just want to use is already in use in some include file. The
code ends up separated by ugly labels nobody can understand. Using the program counter is much better
(That's my opinion and this has been subject to heavy discussions...) for those loops (add a comment to
be on the safe side). Here's an example:
ldi r16, 0
inc r16
out PortD, r16
cpi r16, 10
brne PC - 3
When the PC is used for calculations, the current PC value is used. That means that in the example above
the location of the branch instruction is used for the calculation (not the location afterwards). The
instructions we want to include in the loop are inc, out and cpi and each occupies one word of program
memory. So we need to jump back over those three words: PC - 3. Always verify your code by simulating it!
Conversions...
As you read through these pages you might want to have a look at an ascii table. You can find one in the
Banner frame ("quick links").
Conversions are very important for user interaction: If a register has the value 0x30 its corresponding ascii
character is '0'. But if you want it to be displayed as '48' (0x30 is 48), you need to convert the number. In
the case of '0' this is not that important, but 0xFF is displayed as a block. And '48' is better than '0' if you're
displaying a temperature...
Some protocols, such as Ymodem, also use strings of values we have to convert first before we can
perform calculations on them: Ymodem sends a file size of 512 bytes as '512'. An AVR has to convert this
from ascii coded decimal to 16bit int first before it knows what '512' means.
Some number formats you should have in mind when doing calculations:
128
'128'
0x30
0b11001010
'11001010'
It's up to you which number format you use for a specific task. Ascii coded hex is quite often used for
debugging purposes, because the numbers are all of the same size (number of characters needed) and
because the conversion always takes the same number of cpu cycles and doesn't require much space. Ascii
coded decimal is better for things like temperatures or rpm of a motor. Ascii coded binary is good for
displaying flag registers (SREG, Interrupt flag registers and so on).
I'll show you ways to convert numbers in both directions: From int to something you can display and back
(remember the Ymodem example).
Decimal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Hex:     0 1 2 3 4 5 6 7 8 9 A  B  C  D  E  F
If an 8-bit number is sent or printed as ASCII Coded Hex, the number is split into high nibble and low
nibble (in the case of 0x20 these are 2 and 0). Then the nibbles are converted to their ASCII
representation: 0x32 for 2 and 0x30 for 0. These values can be printed on screen. The values from the
table above can not be printed on the screen: In the ASCII table these are either not defined or control
characters. The value 0x0A won't be displayed as 'A'.
Binary format:
The binary format should be quite clear: 0b00001000 is equal to 8, 0b00011000 is equal to 24. Easy.
When a number comes as ASCII coded binary, the 0s and 1s are sent as their ASCII representation, 0x30
and 0x31, and thus have to be converted before they are "real" bits. The binary format also requires bit
shifting for the conversion.
Binary Coded Decimal (BCD) format:
Binary coded decimal is very handy for storing two digits (0..9) in one byte without much coding. The digits
are directly written to a byte nibble.
0x22 means that the low nibble contains the number 2 and the high nibble contains the number 2 as well.
A consequence of this is that a byte can only hold values in the range of 0 to 99: The values 10 to 15 (A to F
in Hex format) are not allowed in a nibble in BCD format.
This format can for example be written to a port which has a 7447 connected to it. This IC is a 7-segment
LED driver which converts this format so that the segments of the LED display show the number of the
nibble.
ASCII table ($0x and $1x are control characters):
      $0x         $1x         $2x         $3x         $4x         $5x
      dec  char   dec  char   dec  char   dec  char   dec  char   dec  char
$x0   000  NUL    016  DLE    032  spc    048  0      064  @      080  P
$x1   001  SOH    017  DC1    033  !      049  1      065  A      081  Q
$x2   002  STX    018  DC2    034  "      050  2      066  B      082  R
$x3   003  ETX    019  DC3    035  #      051  3      067  C      083  S
$x4   004  EOT    020  DC4    036  $      052  4      068  D      084  T
$x5   005  ENQ    021  NAK    037  %      053  5      069  E      085  U
$x6   006  ACK    022  SYN    038  &      054  6      070  F      086  V
$x7   007  BEL    023  ETB    039  '      055  7      071  G      087  W
$x8   008  BS     024  CAN    040  (      056  8      072  H      088  X
$x9   009  HT     025  EM     041  )      057  9      073  I      089  Y
$xA   010  LF     026  SUB    042  *      058  :      074  J      090  Z
$xB   011  VT     027  ESC    043  +      059  ;      075  K      091  [
$xC   012  FF     028  FS     044  ,      060  <      076  L      092  \
$xD   013  CR     029  GS     045  -      061  =      077  M      093  ]
$xE   014  SO     030  RS     046  .      062  >      078  N      094  ^
$xF   015  SI     031  US     047  /      063  ?      079  O      095  _
Hundreds: 2 x 100 = 200
Tens:     3 x 10  = 30
Ones:     3 x 1   = 3
These add up to 233. The number consists of single digits which specify the number of hundreds, tens,
and ones we add together.
To get these results, we need to divide the number; first by 100, then by 10 and then by 1. As the AVR
doesn't have a divide instruction, this has to be done manually:
Divide by 100:
- copy the number into a temporary register
- compare the number with 100
- if greater or equal, increase the hundreds count and subtract 100 from the temporary register
- go to the compare again
When this is done, the number in the temporary register is lower than 100. Now we can proceed with 10s
and 1s. Instead of dividing it by 1 we can just copy the remaining number to the register that holds the
ones.
Unfortunately, this is not enough to convert a number to decimal coded ASCII. In an ASCII table we can
see that '0' is 0x30. So we add 0x30 to the single digits (hundreds, tens, ones) and can now print it on the
screen (via UART, USB, LCD interface, whatever).
It's now also possible to reformat the number, delete characters we don't need (print a space instead of 0
hundreds if the number was lower than 100) or add additional characters in between.
Here's a flow chart of how the conversion can be done:
It should be pretty easy for you to write the code for this yourself.
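If you want something to compare your own version with, here is one possible sketch of the 8-bit conversion described above. The register assignment (input in r16, digits in r17..r19) and the label names are my own choices for this example. The digit counters start at '0' instead of 0, which folds the "add 0x30" step into the counting.

```asm
; Input:  r16 = number to convert (0..255), left unchanged
; Output: r17 = hundreds, r18 = tens, r19 = ones (all as ASCII characters)
byte_to_dec:
    mov  r19, r16           ; work on a temporary copy
    ldi  r17, '0'           ; hundreds digit starts at ASCII '0'
dec_hundreds:
    cpi  r19, 100           ; 100 or more left?
    brlo dec_tens_init      ; no: hundreds are done
    subi r19, 100           ; yes: subtract 100...
    inc  r17                ; ...and count it
    rjmp dec_hundreds
dec_tens_init:
    ldi  r18, '0'           ; tens digit starts at ASCII '0'
dec_tens:
    cpi  r19, 10            ; 10 or more left?
    brlo dec_ones           ; no: tens are done
    subi r19, 10
    inc  r18
    rjmp dec_tens
dec_ones:
    subi r19, -0x30         ; the remainder is the ones digit, add '0' (0x30)
    ret
```

For the value 233 this leaves '2' in r17 and '3' in both r18 and r19. Replacing leading zeros by spaces would be a small extra compare after the routine.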
Doing this with a 16-bit number is just the same, but with 5 digits and 16-bit compares. The code space
needed (as well as cpu time) is 40% bigger. If you have a lot of free program space, you can build up a
case-like structure to do the conversion: If the number is 200 or greater, the hundreds counter is loaded
with 2 and 200 is subtracted from the original number. This is faster but requires more space. It's up to
you.
The reason why you have to swap the nibbles in reg A is that the register holding the high nibble should
have a value between 0x00 and 0x0F (15). If we didn't swap the nibbles, their value would be 0x00..0xF0
which we can't convert to ASCII.
Now we have two nibbles, each in a separate register, which are between 0x00 (0) and 0x0F (15). These
must now be converted into their ASCII representation: For 0 it's '0', for 10 it's 'A' and for 15 it's 'F'. This can
be done with a lookup table or by using a case structure.
A lookup table for this would have the ASCII values of the possible nibble values at the nibble positions:
Table Position: 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Nibble value:   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
ASCII:          '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' 'A' 'B' 'C' 'D' 'E' 'F'
The conversion only consists of replacing the register value by a corresponding table value (which is the
overall concept of a lookup table).
When this is done and the number we had was 128, we now have reg A holding '8' and reg B holding '0'
because 128 is 0x80.
Converting 16-bit numbers from int to ASCII coded hex is not much harder. The result needs 4 registers
and we need to convert two ints without having to use any 16-bit instructions:
1024 (hex 0x0400) converts to '0' and '4' for the high byte and '0' and '0' for the low byte.
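The 8-bit conversion can be sketched without a lookup table by using a small case structure instead: digits 0..9 just get 0x30 added, and the values 10..15 get another 7 added to reach 'A'..'F'. The register assignment and label names below are assumptions made for this example, and the stack has to be initialised because rcall is used.

```asm
; Input:  r16 = byte to convert, left unchanged
; Output: r17 = ASCII character of the high nibble
;         r18 = ASCII character of the low nibble
byte_to_hex:
    mov  r18, r16
    swap r18                ; high nibble is now in bits 3..0
    rcall nibble_to_hex
    mov  r17, r18           ; keep the high nibble character
    mov  r18, r16
    rcall nibble_to_hex     ; now the low nibble
    ret

; converts the low nibble of r18 to '0'..'9' or 'A'..'F'
nibble_to_hex:
    andi r18, 0x0F          ; mask out the other nibble
    subi r18, -0x30         ; add '0'
    cpi  r18, 0x3A          ; beyond '9'?
    brlo nibble_done        ; no: it's a digit, done
    subi r18, -7            ; yes: skip the 7 characters between '9' and 'A'
nibble_done:
    ret
```

With 128 (0x80) in r16 the routine returns '8' in r17 and '0' in r18. For a 16-bit number the routine is simply called twice, once for each byte.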
itoacb_loop:
ldi result, '0'
ror value
brcc is_0
inc result
is_0:
rcall print_result
dec r16
brne itoacb_loop
Example: 45->BCD
45 divided by 10 is 4; 5 remains. 4 is stored in the result register and the nibbles are swapped. The result
is now 0x40. Now 5 is added and the result is 0x45, which is exactly the result we want.
Solving the problem that occurs when converting numbers >99 is up to you. A second byte is needed.
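The 45 -> BCD example can be sketched as follows for numbers from 0 to 99. The registers and labels are chosen for this example only, and the division by 10 is done by repeated subtraction as in the decimal conversion.

```asm
; Input:  r16 = number (0..99), holds the ones digit afterwards
; Output: r17 = BCD value (tens in the high nibble, ones in the low nibble)
bin_to_bcd:
    clr  r17                ; tens counter
bcd_tens:
    cpi  r16, 10            ; 10 or more left?
    brlo bcd_ones           ; no: r16 holds the ones digit
    subi r16, 10
    inc  r17                ; count the tens
    rjmp bcd_tens
bcd_ones:
    swap r17                ; move the tens to the high nibble (4 becomes 0x40)
    add  r17, r16           ; add the ones (0x40 + 5 = 0x45)
    ret
```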
This conversion is done for the two nibble characters. These are then combined in one byte:
Maybe the nibbles have to be swapped again, depending on how the two nibbles were sent: If the high
nibble was sent first, the received byte can be left as it is: The high nibble was added, the nibbles were
swapped and then the low nibble was added.
Consequently, if the low nibble is sent first, the nibbles have to be swapped again.
I'll only discuss the case of the string being sent most significant bit first here.
What is to be done? The bit is received, converted to a 'real' bit (1 or 0) and then shifted into the result.
rcall receive_character
tst character
breq end_of_string
clc
subi character, '0'
lsr character
rol result
What happens during the shift operations? These are most important for the conversion, as they combine
the result with the converted character.
The lsr instruction shifts the byte one place to the right. The bit that is shifted out is placed in the carry bit
and the most significant bit is replaced by 0.
The carry bit is then rotated (rol) into the result: The most significant bit of the result is placed in the carry
bit and the least significant bit is replaced by the old carry bit which we got from the character.
But how can we determine when to begin a new byte? We don't know the exact string length by now and
have to start a new byte after 8 bits have been received and shifted into the result. The code above has to
be altered so that it handles the highest possible number of bits that might be sent by the source of the
string.
This can be done with multiple rol instructions. One for each byte, beginning with the least significant one:
rol result_0
rol result_1
rol result_2
rol result_3
Now the bit from the character is shifted into the 32-bit result. Of course, the result can also be left 16 bits
wide.
Make sure that the result is initialised to 0 before starting to shift in bits: Some bits might not be overwritten
by the bits from the string!
When storing data to/loading data from a direct address you know exactly where the data is stored. When
using address 0x60 to store a byte value (let's call it "hour"), you can use the sts and lds instructions to
handle that data:
lds r16, hour
That's pretty simple. r16 is loaded with the SRAM contents at address 0x60. Same procedure for sts:
sts hour, r16
Indirect addressing is done similarly to using pointers in C or Pascal: The Index Register Pairs (r26:r27 are
called X, r28:r29 Y and r30:r31 Z) can be used to point at the AVR address space. If X (r26:r27) holds the
value 0x60, it will point to "hour" and can be used to handle that value. This is what the indirect addressing
instructions are made for:
ldi XL, 0x60
ldi XH, 0x00
ld r16, X
Indirect loading of a register is pretty useless without indirect storing, so there's also a store instruction
which is used just like sts but with X, Y or Z instead of a direct address. It's called st.
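A short sketch of the store direction, using the same address 0x60 ("hour") as above. The value 12 is just an example.

```asm
ldi XL, 0x60            ; let X point at "hour" (address 0x0060)
ldi XH, 0x00
ldi r16, 12             ; example value to store
st  X, r16              ; SRAM address 0x0060 now holds 12
```

X is not changed by st X, so a following ld r16, X reads the same location again.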
It's now time to explain the address space in a bit more detail before proceeding with more advanced
load/store instructions.
The AVR address space consists of 3 major regions: The register file (r0..r31), I/O registers (Timers, UART
and so on) and internal SRAM. Here is a diagram showing how it's organised:
In this diagram you can see why the first SRAM address is 0x60. The AVR registers and I/O registers are
also located in the data space and occupy the low addresses. This has an advantage: You can access the
I/O registers via the index register pairs X, Y, and Z as well. The only thing to remember is that the I/O
addresses you can use with in and out don't work in this case, as in and out work with 0x00 to 0x3F (see
left column). In this case you must use 0x20 to 0x5F instead (see Address Space column). The working
registers can also be accessed using indirect addressing. The code below demonstrates the difference
between addressing I/O registers indirectly and writing to them using out.
ldi XL, 0x3B
ldi XH, 0x00
ld r16, X
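The difference between the two ways of writing to an I/O register can be sketched like this. It assumes a classic AVR whose PORTB lies in the 0x00..0x3F I/O range and that the device definition file is included (8515def.inc is only an example), so that the name PORTB is known to the assembler.

```asm
.include "8515def.inc"      ; assumption: any def file that defines PORTB will do

ldi r16, 0xFF
out PORTB, r16              ; direct access, uses the I/O address (0x00..0x3F)

ldi XL, low(PORTB + 0x20)   ; the same register in the data space:
ldi XH, high(PORTB + 0x20)  ; I/O address + 0x20
st  X, r16                  ; indirect access, writes to the same port register
```

Both writes end up in the same register. out is shorter and faster, while the indirect version lets the code decide at runtime which I/O register is accessed.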
The assembler will store the table in Flash. This will use up 4 words: 6 characters + 1 zero. That's 7 bytes.
The last one will be padded with another zero so that only whole words are used: 8 bytes. For getting the
BYTE address of the table, its address has to be multiplied by 2. Let's load the table's address into the
index register pair Z:
ldi ZL, low(2*string)
ldi ZH, high(2*string)
string:
.db "Hello!", 0
Lookup Tables
Lookup tables are an easy way to convert numbers to other formats and they also provide a way to make
calculations faster by providing basic values. A sine table could look like this (see second example below
for an AVR assembler version of this):
Angle:       10  20  30  40  50  60  70  80  90
Sine (*100): 17  34  50  64  77  87  94  98  100
This table is pretty rough and won't give you a very good result if you really need to calculate values based
on a sine. On the other hand this table would only use 8 bits for the lookup value and doesn't even really
use the 8 bit range as it could: By multiplying the sine of an angle by 200 we would still move within the 8
bit range while having a higher precision: The sine of 60 * 200 is 173, not 174 as the table above would
give us (when multiplied by two). With 16 bits the precision is fairly good, as we can even multiply the sine
by 40,000 for storing it in a table! When the values have to be used however we still need to keep in mind
that the AVR needs them for calculations: A multiplication by 2 or 4 is good as we can just shift the result
right one or two places for dividing it again. But that's not the problem for now....
Lookup tables are usually stored in program memory using the .db or .dw directives and have their own label
for addressing:
sine_table:
.db 0, 17, 34, 50, 64, 77, 87, 94, 98, 100
The label is used to have an index register pair point at the table. lpm always reads from the address in Z,
so Z has to be used here. Assuming the angle we need the sine of is 40, the following code returns the
sine of 40 * 100 in r0:
ldi ZL, low(2*sine_table)
ldi ZH, high(2*sine_table)
ldi r16, 4
ldi r17, 0
add ZL, r16
adc ZH, r17
lpm
This example was fairly simple, but it shows how to get a value from a table we first made up in program
memory. If multiple values are to be read from the table, the AVR has two powerful instructions for us: adiw
(add immediate to word) and sbiw (subtract immediate from word). These only take the lower register of a
word as an argument and can only operate on r24, r26, r28 and r30 (which includes X, Y and Z). The
advantage of these instructions over normal 16-bit additions and subtractions is that they don't need any
registers for holding the add/sub value. The example above needs those registers, as the value we want to
add is not known at the time when we're writing the code.
On some AVRs lpm can also load the program memory contents to a register different than r0 and can
post-increment Z. Possible ways of using lpm are:
lpm
lpm r16, Z
lpm r16, Z+ (while r16 can in both cases be replaced by any other register)
This makes usage of adiw or sbiw unnecessary and saves code space on the devices which support the
lpm rr, Z+ instruction. The AT90S1200 doesn't have lpm at all, the 2313 only supports a bare lpm. Look at
your device's specific instruction set for details.
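On a device that supports the post-increment form, reading a whole table could be sketched like this. The loop sends every entry of the sine table to PortB, which is just a stand-in for whatever your application does with the values. The register choice, the label names and the def file (needed for the port name, m8535def.inc is only an example) are assumptions.

```asm
.include "m8535def.inc"     ; assumption: a device with lpm Rd, Z+

ldi ZL, low(2*sine_table)   ; Z = byte address of the table
ldi ZH, high(2*sine_table)
ldi r17, 10                 ; number of table entries
read_table:
lpm r16, Z+                 ; load one entry, then let Z point to the next one
out PortB, r16              ; do something with the value
dec r17
brne read_table             ; repeat for all 10 entries
done:
rjmp done                   ; stop here so that the table is not executed as code

sine_table:
.db 0, 17, 34, 50, 64, 77, 87, 94, 98, 100
```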
Strings can be stored in Flash using the .db directive. At runtime, they can be loaded from Flash by using
the lpm (Load Program Memory) instruction in order to work with them. There are many ways to write
routines to transfer them to SRAM, and some of them are really cool. Basically, you need to write a loop
that loads the character from Flash and processes it (by transferring it to SRAM or something else
depending on the application) until the terminating character is reached (usually zero).
The following example uses a routine to move a string from Flash (Z) to a known memory location (Y) and
returns a pointer (Y) to the string. The address of the string in SRAM shall be 0x100 after the transfer. The
terminating character (zero) shall not be cut off the string.
transfer_string:            ; this is the label we call to transfer a string from Flash (Z) to SRAM (Y)
push YL                     ; first, we save the SRAM pointer so that we can return it again
push YH
clr r1                      ; as r0 doesn't support cpi, we need a zero register for the compare, which is r1
transfer_loop:              ; this is a do...while loop:
lpm                         ; load character from Flash to r0
adiw ZL, 1                  ; increment Flash pointer
st Y+, r0                   ; store in SRAM and increment Y
cp r0, r1                   ; check if terminator reached
brne transfer_loop          ; if not, go back to the loop
pop YH                      ; restore SRAM pointer
pop YL
ret                         ; and return.

; Usage: this is how you use this routine:
ldi YL, low(0x100)          ; load Y with destination address
ldi YH, high(0x100)
ldi ZL, low(2*mystring)     ; load Z with source address
ldi ZH, high(2*mystring)
rcall transfer_string       ; call the routine

mystring:                   ; this is our string in flash:
.db "Hello", 0              ; "Hello" with terminating zero
Not all AVRs support lpm rd, Z+ (load program memory to register and post-increment Z), so I used lpm
together with a separate add immediate to word (adiw). Check the device specific datasheet of your AVR
to see if it supports lpm rd, Z+.
Making it More Elegant
There are of course more elegant ways to implement this. A design note (#043) on www.avrfreaks.net by
Kelly Small (Here's the direct link) shows how to use the stack to get the source address of the string. The
string is stored in Flash directly after the routine call instruction:
rcall transfer_string
.db "Hello", 0
The cool thing about this is that the string can be found where it is used by the code. The example above
would most probably use a string that is stored together with some other strings that might be a few pages
away.
When the routine is called, the rcall instruction will store the address of the NEXT instruction (in this case
the string). Therefore, the address of the string can just be popped off the stack. When the string has been
processed and Z points at the next data byte after the zero, this will be the next address after the string,
where normal code can follow (read the design note for a better explanation). So Z can be pushed onto the
stack and ret will return to it.
rcall stores the low byte of the return address first, so popping has to be done in high byte, then low byte
order. As the return address is a word address, it has to be multiplied by 2 to get a byte address for lpm,
which can be done by shifting the address left once. For devices with 128KB of program memory, elpm
has to be used then (together with rampz).
Messing around with return addresses is quite dangerous and has to be done carefully to avoid serious
errors. The design note mentioned above does not mention why it's safe, so I'll explain that here. First of
all, the routine has to be explained though:
The return address is pushed onto the stack by the call instruction. The routine pops it from the stack and
multiplies it by two to make it a byte address:
pop ZH
pop ZL
lsl ZL
rol ZH
Z now points at the first character of the string, which can now be processed by the process_string routine
in a do...while loop that post-increments Z after every load-'n-process. The loop in this example is slightly
different from the one above (uses r16). process_string uses the data in r16 to process it (send to UART,
LCD, whatever).
read_string:
lpm
mov r16, r0
adiw ZL, 1
rcall process_string
cpi r16, 0
brne read_string
Once the string is processed, the routine has to finish its job by translating the address Z is pointing at to a
word address again. Then the return address has to be pushed onto the stack again (for the ret
instruction). BUT: What if the string had an even number of characters? Then the overall length of the
string including the terminating zero is odd, so that Z still points to the string itself. The following two
drawings assume that the instruction following the string is "inc r16":
The second zero in the second example is added by the assembler (padding to achieve even number of
bytes in the .db directive). As the routine stops upon reaching the first zero, Z will point to a zero in the
second example and not at the increment instruction.
After the string has been processed, the routine will (as written above) push the return address onto the
stack again. If Z is now shifted one place right (division by 2 to convert the byte address to a word address
for ret), it will (second example) point to the first zero again and this word (two zeros) will be executed after
the return, but that's not too bad as the opcode 0x0000 is a nop. So any string with an even number of
characters plus the terminating zero will result in an additional nop being executed.
Therefore, the routine can end with
lsr ZH
ror ZL
push ZL
push ZH
ret
The three code snippets pasted together work without problems, just process_string has to be added by
you. Here's a working example with comments: flash_string.asm
This example stores the string at the memory address Y points to and works well in the simulator.
Copying data from one memory location to the other can cause serious headaches. When only one byte is
copied, that's no problem. Assuming the source pointer is X and the destination pointer is Y, we can just
load a register from X and save it to Y:
ld r16, X
st Y, r16
When more than one byte (e.g. a string) has to be copied/moved, things can get more difficult. If the
source and destination memory areas don't overlap, the copy routine is still fairly simple. The following
routine copies from X to Y, while the number of bytes to be copied is in r16. To ensure that even 0 bytes
can be copied, the loop first checks for a zero counter, then copies and decrements the counter. When
everything is finished, the number of bytes copied is subtracted from both pointers so that they each point
to the first byte of the data block again:
copy_mem:
mov r17, r16
copy_mem_loop:
tst r17
breq end_copy_mem
ld r18, X+
st Y+, r18
dec r17
rjmp copy_mem_loop
end_copy_mem:
sub XL, r16
sbci XH, 0
sub YL, r16
sbci YH, 0
ret
A disadvantage of the above is that a zero-copy call needs more cycles than actually needed, because it
subtracts 0 from both pointers. With an extra label just before the return instruction and an extra test
before copy_mem_loop this can be improved at the cost of code space.
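One way the improvement could look is sketched below. The label copy_mem_ret is my own name for the "extra label". As the zero test is now done once before the loop, this sketch also turns the loop into a do...while loop, which is a small change on top of what is described above.

```asm
; copies r16 bytes from X to Y (blocks must not overlap), X and Y are preserved
copy_mem:
    tst  r16                ; anything to copy at all?
    breq copy_mem_ret       ; no: return right away, the pointers are untouched
    mov  r17, r16           ; working copy of the byte count
copy_mem_loop:
    ld   r18, X+            ; load from source
    st   Y+, r18            ; store at destination
    dec  r17
    brne copy_mem_loop      ; repeat until all bytes are copied
    sub  XL, r16            ; let both pointers point to the
    sbci XH, 0              ; first byte of their block again
    sub  YL, r16
    sbci YH, 0
copy_mem_ret:
    ret
```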
The Problem...
...is that during runtime it can happen that the source space and the destination space overlap. Assume
that an array of 10 bytes has to be moved, the source being at 0x65 and the destination being at 0x60.
That's no problem:
"A" is copied to 0x60 and so on without problems because only the addresses from 0x65 upwards overlap.
Where we before had "A" the routine will now copy the "F" from address 0x6A and "A" will be overwritten.
That's not bad because we already copied it to its destination.
Now swap source and destination: The array has to be copied from 0x60 to 0x65. What happens? The
result is "ABCDEABCDE" - why?
Remember that the copy routine uses load - post-increment(source) - store - post-increment(destination).
Here's a diagram of the memory contents before copying the block:
When "A" is copied from 0x60 to 0x65, "F" (which we want to copy later) will be overwritten. The result is
that at the address where the "F" was, an "A" will be read and stored at the "F" destination address.
I've made up an example asm file which can be simulated in AVR Studio. Before running it please put a
10-byte array into sram beginning at address 0x65. The code will copy it to 0x60 and then back to 0x65
which will result in the error described above.
A mem copy routine that is supposed to correctly copy the data has to be able to handle the two cases
above: First, it has to use post-increment addressing if the two blocks overlap and the source is at a higher
address than the destination (first diagram). Second, it can happen that they overlap and the source is at a
lower address. Then, pre-decrement addressing has to be used (second diagram).
Testing for all these cases requires more code and time than necessary - the only information required is if
the source address is higher or lower than the destination address: If the source address is lower,
pre-decrement addressing has to be used. If the two blocks overlap we're on the safe side then. Otherwise
(source address >= destination address; first diagram) post-increment addressing can be used. Here's the
complete one:
copy_mem:
ldi r18, 0                  ; create zero register
mov r17, r16                ; save block size
cp XL, YL                   ; source >= destination?
cpc XH, YH
brsh copy_mem_inc_loop      ; if so, use incrementing addressing
add XL, r17                 ; for pre-decrement addressing, we have to copy from top to bottom
adc XH, r18                 ; so we have to add the block size to the two pointers
add YL, r17
adc YH, r18
copy_mem_dec_loop:          ; here's the loop:
tst r17                     ; first, check if the copy is done
breq end_copy_mem_dec       ; if so, return
ld r18, -X                  ; load from source
st -Y, r18                  ; store at destination
dec r17                     ; and decrement the to-be-done counter
rjmp copy_mem_dec_loop      ; loop
end_copy_mem_dec:           ; when the whole thing is finished, we can just return, because the
ret                         ; pointers were pre-decremented to their original value
copy_mem_inc_loop:          ; in case the source address is not lower than the destination address,
                            ; we can use post-inc
tst r17                     ; again: the copy-done-check
breq end_copy_mem_inc
ld r18, X+                  ; get data...
st Y+, r18                  ; ...store it again
dec r17                     ; and decrement the counter
rjmp copy_mem_inc_loop      ; loop
end_copy_mem_inc:           ; when the copy has been finished, we have to subtract the block size
sub XL, r16                 ; from the two pointers again, because this time we used post-inc
sbci XH, 0
sub YL, r16
sbci YH, 0
ret                         ; return.
As always, there are ways to make this routine faster. A 2313, for example, only has 128 bytes of SRAM,
which can be addressed with 8 bits, so all 16-bit calculations can be converted to 8-bit. If only even numbers
of bytes have to be copied, two bytes can be copied per loop run and two subtracted from the loop counter
each time (load-store-load-store -> loop).
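The second idea could be sketched like this. It is only an illustration for the incrementing case (source address higher than destination, or no overlap at all); the label names are made up for this example, and X, Y and r16 are used as in copy_mem. The pointers are not restored here.

```asm
; copy_mem_even: X = source, Y = destination, r16 = block size (must be even)
copy_mem_even:
mov r17, r16              ; work on a copy of the block size
copy_even_loop:
tst r17                   ; all done?
breq copy_even_done
ld r18, X+                ; first byte
st Y+, r18
ld r18, X+                ; second byte
st Y+, r18
subi r17, 2               ; two bytes less to copy
rjmp copy_even_loop
copy_even_done:
ret
```

The loop overhead (tst, breq, subi, rjmp) is now paid once per two bytes instead of once per byte.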
The Stack
The Stack is used by the CPU to store return addresses from subroutines.
Imagine you can't remember where you just left. You'd have to write down where you left and, if you're
visiting several locations, put the notes onto a stack. Your stack pointer tells you where that stack is. A
microcontroller is just doing that - when a subroutine is called, it leaves the place in flash where it was just
working and saves the return address on the stack.
The Stack needs a stack pointer (SP) and space in SRAM (the stack pointer must point above the first
SRAM address). When a return address is stored, the SP is post-decremented (!!!!!!). In other words: The
stack is growing towards smaller SRAM addresses. The biggest stack possible is initialised to RAMEND. It
can then grow all the way down to the first SRAM address.
Here's a table/diagram/figure/whatever of how the stack is changed by rcall and ret.
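.org 0x00
ldi r16, low(RAMEND)      ; ldi can't write to an I/O register directly,
out SPL, r16              ; so the stack pointer is loaded via r16
ldi r16, high(RAMEND)
out SPH, r16
rcall subrtn_1            ; at address 0x0004, return address is 0x0005
.org 0x100
subrtn_1:
rcall subrtn_2            ; at address 0x0100, return address is 0x0101
ret
.org 0x140
subrtn_2:
ret

Stack before init:
          Stack value   SP value
layer 0:  -------       SP = ???
layer 1:  -------
layer 2:  -------

Stack after init:
          Stack value   SP value
layer 0:  -------       <- SP
layer 1:  -------
layer 2:  -------

After rcall subrtn_1 (return address stored, SP moved down one layer):
          Stack value   SP value
layer 0:  0x0005
layer 1:  -------       <- SP
layer 2:  -------

After rcall subrtn_2 (second return address stored, SP moved down one more layer):
          Stack value   SP value
layer 0:  0x0005
layer 1:  0x0101
layer 2:  -------       <- SP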
When the return is executed, the return address is popped from the stack and the SP is incremented. In
the example, when returning from subrtn_2, the micro jumps to 0x101 (the ret instruction in subrtn_1) and
the Stack Pointer points to stack layer 1 again. I didn't make a table for that as it should be easy to
understand now.
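The stack can also be used to pass arguments to subroutines using push and pop. Keep in mind that rcall
pushes the return address on top of the arguments, so the subroutine has to take the return address off the
stack first and put it back before ret. If a subroutine has a 16-bit argument, passing it would look like this:
push r16                  ; argument, low byte
push r17                  ; argument, high byte
rcall set_TCNT1
; (execution continues here after the ret)
set_TCNT1:
pop r19                   ; return address, high byte (it was pushed last)
pop r18                   ; return address, low byte
pop r17                   ; argument, high byte
pop r16                   ; argument, low byte
out TCNT1H, r17           ; 16-bit timer register: write the high byte first
out TCNT1L, r16
push r18                  ; put the return address back, low byte first
push r19
ret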
It's important to keep the push and pop instructions balanced. If a value is pushed on the stack as an
argument followed by a subroutine call, the next ret can result in unexpected behavior if the subroutine
popped too many or no arguments at all. One push, one pop. This bug is often hard to find.
Why can't the subroutine just use r16:r17 instead of the stack as a base for passing arguments? Good
question. By using the stack, you can use any register to push the value on the stack. You're not limited to
r16 and r17. You can also push an argument and then use the registers to calculate the next one (file
systems for example need lots of registers for calculations). You can also use a heap to pass arguments.
This has the advantage that you can't mess up your return addresses.
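Passing the argument through a fixed SRAM location instead of the stack could look roughly like this. It's just a sketch: arg_buf and set_TCNT1_mem are names invented for this example, and it assumes an AVR that has the lds/sts instructions (the 2313 does).

```asm
.dseg
arg_buf: .byte 2          ; hypothetical 2-byte argument buffer in SRAM
.cseg
sts arg_buf, r16          ; argument, low byte
sts arg_buf+1, r17        ; argument, high byte
rcall set_TCNT1_mem
; (execution continues here after the ret)
set_TCNT1_mem:
lds r17, arg_buf+1        ; fetch the argument again
lds r16, arg_buf
out TCNT1H, r17           ; 16-bit timer register: write the high byte first
out TCNT1L, r16
ret                       ; the stack only ever held the return address
```

The price is two bytes of SRAM that are reserved all the time, and the routine must not be called from an ISR while the main program is using the buffer.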
Let's take a closer look at how the return address is stored on the stack by simulating it in AVR Studio. I've
not included images of this in order to save space, but it's quite simple. This is the code for finding out how
return addresses are pushed on the stack:
;(include 2313def.inc)
.org 0x0000
rjmp reset                ; reset interrupt vector
reset:                    ; initialisation:
ldi r16, low(RAMEND)      ; stack pointer to RAMEND
out SPL, r16
rcall dummy               ; this will push 0x0004 on the stack (note 1)
.org 0x0123
dummy:                    ; first dummy routine
rcall dummy2              ; address on stack: 0x0124 [Break Point]
ret                       ; the ret is at address 0x0124
dummy2:                   ; second dummy routine
ret                       ; [Break point]
note 1: rcall dummy will push 0x0004 on the stack because there are 3 instructions before it that use one
word of code space each (rjmp; ldi; out; + rcall) so the next address after the subroutine call instruction is
0x0004.
The simulator is set up as follows: 2313 @ 1MHz, one memory window (Data) for viewing SRAM contents.
Now run the code. After the first break the SRAM will hold 0x04 at address 0xDF and 0x00 at address
0xDE. That means that the low byte of the address (which is 0x04) is at the higher address.
After the second rcall (second break) the return address to dummy's ret is also pushed on the stack: 0x24
at address 0xDD and 0x01 at address 0xDC.
The low address byte is pushed first, as the simulation shows. If you wanted to do calculations on that
address, you'd have to pop the high byte first. Beware: Messing with the stack is not easy and should be
done with caution!
Subroutines
Subroutines are code segments you can call and return from. That's cool, because you can reuse the code
from every point in your program without wasting program space. For subroutines to work, some
preparation is needed.
In order to know where to store and find return addresses (where to go on when returning from a
subroutine), we need to set up the Stack Pointer (SP). When a return address is stored, the SP is set to the
location before the stored address, so setting the SP to the last SRAM location (RAMEND) for initialisation
is (in most cases) best (see upper part of image). In the lower part of the image you can see how the
address is stored. If external memory is connected, the SP should be set to the last internal address for
speed reasons (accessing external SRAM takes longer).
ldi r16, low(RAMEND)
out SPL, r16
ldi r16, high(RAMEND)
out SPH, r16
RAMEND is defined in the micro's include file you get with AVRStudio and is equal to the last available
internal SRAM address.
A subroutine begins with a label which is the subroutine's name. The following example routine writes the
value of r16 to PortA and then returns:
out_portA:
out PortA, r16
ret
main:
rcall out_PortA
rjmp main
In the example above rcall is used. This instruction jumps to a relative address and is 2 bytes long and
needs 3 cycles for execution. The disadvantage is that the subroutine has to be located at +/- 2k words.
Another possible instruction is call. This instruction jumps to an absolute address and therefore needs
more code space: 4 bytes, 4 cycles. It can reach the whole code space, which is important in devices with
more than 8kB of program space. The 8k AVRs only need rjmp and rcall, as all addresses can be reached
with +/- 2k word jumps.
In the Advanced Assembler section you will find an introduction to icall. It uses an address stored in the Z
register pair to call a subroutine.
Interrupts
Interrupts, as the name suggests, interrupt the normal program flow. When an interrupt occurs, the CPU
jumps to the corresponding interrupt vector and executes the code at that address. As the interrupt vectors
are only one word long each (classic AVRs; two words for some megas), you'd usually put a jump instruction
there which goes to an Interrupt Service Routine.
The Interrupt vectors start at address 0x0000. The very first one (at 0x0000) is the reset vector. When a
reset (internal or external) occurs, this is where the program counter will be set to. That's why almost all
programs begin with
.org 0x0000
rjmp reset
;(maybe something
;in between...)
reset:
...
Other interrupt vectors will follow the reset interrupt vector. The first ones are the external interrupt lines
(INT0, INT1 and so on), then there's timers, UART and other peripherals. Every AVR datasheet has an
"Interrupts" section somewhere which will include a list of the available interrupts and their vector
addresses. If the table is not entirely filled, you can use single .org statements to set the program counter
of the assembler to the right interrupt vector address instead of filling up the table with other useless code.
Here are two examples for the 8515 doing the same thing:
.org 0x0000
rjmp reset
rjmp Ext_Int0
rjmp Ext_Int1
reti
reti
reti
reti
reti
reti
rjmp UART_RxC
.org 0x0000
rjmp reset
rjmp Ext_Int0
rjmp Ext_Int1
.org 0x0009
rjmp UART_RxC
So why do some people use the first version? The second one is shorter and, if many interrupt sources are
available (have a look at the mega128!), easier to read if only a few are used.
The first one is safer. If an interrupt occurs (by error) that has no instruction at its vector address, the
next valid one will be called. So if in the second table the SPI transfer complete interrupt occurs for some
unknown reason, the UART_RxC ISR is called. Not good.
Interrupts can occur at any time (unless the Interrupt Enable Bit in the SREG is cleared). Consequently
they can also occur while the code is just doing some calculations. These calculations change flags in the
status register which are used for the next step of the calculation, or some branch. If the ISR is also
changing flags in SREG (for example by testing a register for zero) it can corrupt the calculation that is
taking place in the normal application. That's why ISRs should take some precautionary steps: an ISR
should save SREG (and every register it uses) on entry and restore it before returning.
If interrupts are not wanted during a particular code segment (when doing time critical stuff or calculations),
just clear the Global Interrupt Enable Bit (GIE bit) in SREG.
When an ISR is called, the GIE bit is cleared, so that no int can interrupt the ISR. ISRs should return with
reti instead of ret, as reti re-enables the GIE bit automatically.
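A minimal sketch of both precautions follows. The label my_isr is a placeholder, and r16 is simply the scratch register chosen for this example.

```asm
my_isr:
push r16                  ; r16 is used as scratch register below
in r16, SREG
push r16                  ; save the flags of the interrupted code
; ... ISR work that may change r16 and SREG ...
pop r16
out SREG, r16             ; restore the flags
pop r16
reti                      ; return and re-enable interrupts

; somewhere in the normal program:
cli                       ; no interrupts from here on
; ... time critical code ...
sei                       ; interrupts allowed again
```

Note that the sei at the end switches interrupts on even if they were off before the cli. If that matters, SREG can be saved before the cli and written back afterwards instead.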
I/O Ports
The AVR I/O Ports are pretty simple to understand. They have a Port register, a Data Direction register and
a Pin register. These are part of every I/O port in every AVR. Here's a drawing of their basic functionality:
As you can see, there's an internal pull-up for every pin. It can be activated by setting the DDR bit of the
pin to 0 and the Port bit to 1. A cleared DDR bit means that the pin is an input pin. So the pin is
disconnected from the Port register (see the driver in the drawing?) and the pin is floating. In this case the
Port bit controls the pull-up. I have not drawn boxes for writing/reading the data direction bit, because it
would only make the drawing more complex.
I made the drawing not only to describe the pull-up, but also to explain a mistake many people make, even
experienced programmers, just because it doesn't "hit the eye":
When the actual state of a port pin is needed (which is not necessarily the Port bit value), often the Port bit
is read instead of the Pin bit (by mistake). The Pin bit is directly connected to the physical pin. The Port bit
can be disconnected from the pin via the data direction register. So if you have problems with your I/O
code, check for this mistake first.
Here's a table with the possible Port/DDR combinations and what they do to the pin:
              Port bit = 1   Port bit = 0
DDR bit = 1   High           Low
DDR bit = 0   pull-up        floating
Reading from/writing to the ports can be done bit-wise or byte-wise (whole port), on Pin, Port, and Data
Direction registers.
The drawing above is just a simple one. As many Port pins have special functions, their values are also
controlled by the internal peripherals, like the pins of the UART or SPI. These are more complex and can
be looked at in the datasheets.
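Here is a small sketch of an input pin with pull-up that is read the right way, via the Pin register. It assumes a device with a Port B and the usual DDRB/PORTB/PINB names from its def.inc file; the labels are made up for this example.

```asm
check_pin:
cbi DDRB, 0               ; DDR bit = 0: PB0 is an input
sbi PORTB, 0              ; Port bit = 1: pull-up switched on
nop                       ; one cycle for the input to settle (harmless if not needed)
sbic PINB, 0              ; read the PIN bit, not the Port bit: skip the rjmp if the pin is low
rjmp pin_high
pin_low:
; ... something external pulls the pin low ...
ret
pin_high:
; ... the pin is high (pulled up) ...
ret
```

Reading PORTB here instead of PINB would always return the 1 that was written to switch the pull-up on, no matter what the pin is doing.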
I/O Instructions
The simplest I/O instructions are in and out.
in reads the value of an I/O Port or internal peripheral register (Timers, UART and so on) into a register.
Timers
The AVR has different Timer types. Not all AVRs have all Timers, so look at the datasheet of your AVR
before trying to use a timer it doesn't have... This description is based on the AT90S2313.
I will only describe the "simple" timer modes all timers have. Some AVRs have special timers which
support many more modes than the ones described here, but they are also a bit more difficult to handle,
and as this is a beginners' site, I will not explain them here.
The timers basically only count clock cycles. The timer clock can be equal to the system clock (from the
crystal or whatever clocking option is used) or it can be slowed down by the prescaler first. When using the
prescaler you can achieve greater timer values, while precision goes down.
The prescaler can be set to 8, 64, 256 or 1024 compared to the system clock. An AVR at 8 MHz with a
timer prescaler of 1024 can count (when using a 16-bit timer) (0xFFFF + 1) * 1024 clock cycles = 67108864
clock cycles, which is 8.388608 seconds. As the prescaler increments the timer every 1024 clock cycles, the
resolution is 1024 clock cycles as well: 1024 clock cycles = 0.000128 seconds, compared to a resolution of
0.000000125 seconds (0.125 microseconds) and a range of 0.008192 seconds without prescaler. It's also
possible to use an external pin for the timer clock or stop the timer via the prescaler.
The timers are realized as up-counters. Here's a diagram of the basic timer hardware. Don't panic, I'll
explain the registers below.
The 8-bit Timer:
The 8-bit timer is pretty simple: The timer clock (from System Clock, prescaled System Clock or External
Pin T0) counts up the Timer/Counter Register (TCNT0). When it rolls over (0xFF -> 0x00) the
Timer/Counter 0 Overflow Flag (TOV0 in TIFR) is set. If the corresponding bit in TIMSK (Timer
Interrupt Mask Register) is set (in this case the bit is named "TOIE0") and global Interrupts are enabled,
the micro will jump to the corresponding interrupt vector (in the 2313 this is vector number 7).
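Put into code, a Timer 0 overflow setup for the 2313 might look like the sketch below. It assumes 2313def.inc is included; vector number 7 of the 2313 sits at address 0x0006, and the label timer0_ovf is made up for this example.

```asm
.org 0x0000
rjmp reset
.org 0x0006               ; Timer/Counter 0 overflow vector on the 2313
rjmp timer0_ovf
reset:
ldi r16, low(RAMEND)      ; stack pointer to RAMEND
out SPL, r16
ldi r16, 0b00000101       ; CS02..CS00 = 101: system clock / 1024
out TCCR0, r16
ldi r16, 0b00000010       ; TOIE0 (bit 1 of TIMSK)
out TIMSK, r16
sei                       ; global interrupt enable
loop:
rjmp loop
timer0_ovf:
; nothing in here changes SREG or a register, so nothing has to be saved
reti
```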
The 16-bit Timer:
The 16-bit timer is a little more complex, as it has more modes of operation.
Register Overview:
TCCR1A
Bit 7                                             Bit 0
COM1A1  COM1A0  ---  ---  ---  ---  PWM11  PWM10
COM1A1/COM1A0: With these bits you can connect the OC1 Pin to the Timer and generate pulses based on
the timer. It's further described below.
PWM11/PWM10: Pulse Width Modulator select bits; These bits select if Timer1 is a PWM and its
resolution from 8 to 10 bits:
PWM11  PWM10  PWM Mode
0      0      PWM operation disabled
0      1      Timer/Counter 1 is an 8-bit PWM
1      0      Timer/Counter 1 is a 9-bit PWM
1      1      Timer/Counter 1 is a 10-bit PWM
TCCR1B
Bit 7                                           Bit 0
ICNC1  ICES1  ---  ---  CTC1  CS12  CS11  CS10
ICNC1: Input Capture Noise Canceler; If set, the Noise Canceler on the ICP pin is activated. It will trigger
the input capture after 4 equal samples. The edge to be triggered on is selected by the ICES1 bit.
ICES1: Input Capture Edge Select;
When cleared, the contents of TCNT1 are transferred to ICR (Input Capture Register) on the falling edge
of the ICP pin.
If set, the contents of TCNT1 are transferred on the rising edge of the ICP pin.
CTC1: Clear Timer/Counter 1 on Compare Match; If set, the TCNT1 register is cleared on compare match.
Use this bit to create repeated Interrupts after a certain time, e.g. to handle button debouncing or other
frequently occurring events. If Timer 1 is also used in normal mode, remember to clear this bit when leaving
compare match mode. Otherwise the timer will never overflow and the timing is corrupted.
CS12..10: Clock Select bits; These three bits control the prescaler of timer/counter 1 and the connection to
an external clock on Pin T1.
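CS12  CS11  CS10  Mode Description
0     0     0     Stop Timer/Counter 1
0     0     1     clock not divided (no prescaler)
0     1     0     divide clock by 8
0     1     1     divide clock by 64
1     0     0     divide clock by 256
1     0     1     divide clock by 1024
1     1     0     external clock on Pin T1, falling edge
1     1     1     external clock on Pin T1, rising edge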
OCR1
The Output Compare register can be used to generate an Interrupt after the number of clock ticks written
to it. It is permanently compared to TCNT1. When both match, the compare match interrupt is triggered. If
the time between interrupts is supposed to be equal every time, the CTC bit has to be set (TCCR1B). It is
a 16-bit register (see note at the beginning of the register section).
ICR1
The Input Capture register can be used to measure the time between pulses on the external ICP pin (Input
Capture Pin). How this pin is connected to ICR is set with the ICNC and ICES bits in TCCR1B. When the
edge selected is detected on the ICP, the contents of TCNT1 are transferred to the ICR and an interrupt is
triggered.
TIMSK and TIFR
The Timer Interrupt Mask Register (TIMSK) and Timer Interrupt Flag (TIFR) Register are used to control
which interrupts are "valid" by setting their bits in TIMSK and to determine which interrupts are currently
pending (TIFR).
TIMSK
Bit 7                                           Bit 0
TOIE1  OCIE1A  ---  ---  TICIE1  ---  TOIE0  ---
TOIE1: Timer Overflow Interrupt Enable (Timer 1); If this bit is set and if global interrupts are enabled, the
micro will jump to the Timer Overflow 1 interrupt vector upon Timer 1 Overflow.
OCIE1A: Output Compare Interrupt Enable 1 A; If set and if global Interrupts are enabled, the micro will
jump to the Output Compare A Interrupt vector upon compare match.
TICIE1: Timer 1 Input Capture Interrupt Enable; If set and if global Interrupts are enabled, the micro will
jump to the Input Capture Interrupt vector upon an Input Capture event.
TOIE0: Timer Overflow Interrupt Enable (Timer 0); Same as TOIE1, but for the 8-bit Timer 0.
TIFR is not really necessary for controlling and using the timers. It holds the Timer Interrupt Flags
corresponding to their enable bits in TIMSK. If an Interrupt is not enabled, your code can check TIFR to
determine whether an interrupt has occurred and clear the interrupt flags. Clearing the interrupt flags is
usually done by writing a logical 1 to them (see datasheet).
Timer Modes
Normal Mode:
In normal mode, TCNT1 counts up and triggers the Timer/Counter 1 Overflow interrupt when it rolls over
from 0xFFFF to 0x0000. Quite often, beginners assume that they can just load the desired number of clock
ticks into TCNT1 and wait for the interrupt (that's what I did...). This would be true if the timer counted
downwards, but as it counts upwards, you have to load 0x0000 - (timer value) into TCNT1. Assuming a
system clock of 8 MHz and a desired timer of 1 second, you need 8 Million System clock cycles. As this is
too big for the 16-bit range of the timer, set the prescaler to 1024 (256 is possible as well).
8,000,000/1024 = 7812.5 ~ 7813
0x0000 - 7813 = 57723 <- Value for TCNT1 which will result in an overflow after 1 second (1.000064
seconds as we rounded up before)
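The calculation above can also be left to the assembler. This is a sketch only: F_CPU, PRESCALER, TICKS and RELOAD are symbol names invented for this example, and 2313def.inc is assumed to be included for the register names.

```asm
.equ F_CPU     = 8000000                            ; system clock in Hz (assumed)
.equ PRESCALER = 1024
.equ TICKS     = (F_CPU + PRESCALER/2) / PRESCALER  ; 7812.5 rounded to 7813
.equ RELOAD    = 65536 - TICKS                      ; 57723
ldi r16, high(RELOAD)     ; 16-bit register: write the high byte first
out TCNT1H, r16
ldi r16, low(RELOAD)
out TCNT1L, r16
```

After the overflow the timer starts again at 0x0000, so if the interrupt is supposed to come every second, the overflow ISR has to write RELOAD to TCNT1 again.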
So we now know the value we have to write to the TCNT1 register. So? What else? This is not enough to
trigger the interrupt after one second. We also have to enable the corresponding interrupt and the global
interrupt enable bit. Here's a flow chart of what happens:
The flow chart should show an arrow from CTC1 set? "Yes" to set OCF instead of a line, but somehow the
Flow Charting Program didn't think that was a good idea. Bad luck.
Let's discuss a small example: We want the Timer to fire an int every 10ms. At 8 MHz that's 80,000 clock
cycles, which is out of the 16-bit range, so we need a prescaler.
In this case, 8 is enough. Don't use more, as that would just pull down accuracy. With a prescaler of 8, we
need to count up to 10,000. As the value of TCNT1 is permanently compared to OCR1 and TCNT1 is
up-counting, the value we need to write to OCR1 is actually 10,000 and not 0x0000-10,000, as it would be
when using the timer in normal mode.
Also, we need to set CTC1: If we didn't, the timer would keep on counting after reaching 10,000, roll over
and then fire the next int when reaching 10,000 again, which would then occur after (0xFFFF + 1) * 8 clock
cycles. That's after 0.065536 seconds. Not after 10ms. If CTC1 is set, TCNT1 is cleared after compare
match, so it will count from 0 to 10,000 again without rolling over first.
What is to be done when those 10ms Interrupts occur? That depends on the application. If code is to be
executed, the corresponding interrupt enable bit has to be set, in this case it is OCIE1A in TIMSK. Also
check that global interrupts are enabled.
If the OC1 (Output Compare 1) pin is to be used, specify the mode in TCCR1A. You can set, clear or
toggle the pin. If you decide that you want to toggle it, think about your timing twice: If you want a normal
pulse which occurs every 10ms, the timer cycle must be 5ms: 5ms -> toggle on -> 5ms -> toggle off. With
the 10ms example above and OC1 set up to be toggled, the pulse would have a cycle time of 20ms.
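For the interrupt variant of the 10ms example, the setup could be sketched as follows. It assumes an AT90S2313 at 8 MHz with 2313def.inc included; the Timer 1 compare match vector of the 2313 is at address 0x0004, and the label every_10ms is made up for this example.

```asm
.org 0x0000
rjmp reset
.org 0x0004               ; Timer 1 compare match vector on the 2313
rjmp every_10ms
reset:
ldi r16, low(RAMEND)      ; stack pointer to RAMEND
out SPL, r16
ldi r16, high(10000)      ; compare value, high byte first
out OCR1AH, r16
ldi r16, low(10000)
out OCR1AL, r16
ldi r16, 0b00001010       ; CTC1 (bit 3) set, CS12..CS10 = 010: clock / 8
out TCCR1B, r16
ldi r16, 0b01000000       ; OCIE1A (bit 6 of TIMSK)
out TIMSK, r16
sei                       ; global interrupt enable
loop:
rjmp loop
every_10ms:
; ... the work to be done every 10ms goes here (save SREG if it gets changed) ...
reti
```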
Input Capture Mode
The Input Capture Mode can be used to measure the time between two edges on the ICP pin (Input
Capture Pin). Some external circuits make pulses which can be used in just that way. Or you can measure
the rpm of a motor with it. You can either set it up to measure time between rising or falling edges on the
pin. So if you change this setting within the ISR you can measure the length of a pulse. Combine these two
methods and you have completely analysed a pulse. How does it work?
Here's a flow chart of its basic functionality:
You see that it's actually pretty simple. I left out the low level stuff such as Interrupt validation
(enabled/global enable), as you should understand that by now. The contents of TCNT1 are transferred to
ICR1 when the selected edge occurs on the Input Capture Pin and an ISR can be called in order to clear
TCNT1 or set it to a specific value. The ISR can also change the edge which is used to generate the next
interrupt.
You can measure the length of a pulse if you change the edge select bit from within the ISR. This can be
done the following way:
- Set the ICES (Input Capture Edge Select) bit to 1 (detect rising edge)
- When the ISR occurs, set TCNT1 to zero and clear ICES (set it to 0) to detect the falling edge
When the next ISR is called, the pin changed from high to low. ICR1 now contains the number of
(prescaled) cycles the pin was high. If the ISR again sets the edge to be detected to rising (ICES=1), the
low pulse time is measured. Now we have the high time AND the low time: We can calculate the total cycle
time and the duty cycle.
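An ISR following these steps could be sketched like this. The labels and the registers used for the results (r5:r4 for the high time, r7:r6 for the low time) are arbitrary choices for this example; ICES1 is bit 6 of TCCR1B. As TCNT1 is cleared a few cycles after the edge, the measured times come out slightly short.

```asm
icp_isr:
push r16
in r16, SREG
push r16
clr r16
out TCNT1H, r16           ; restart counting from zero (high byte first)
out TCNT1L, r16
in r16, TCCR1B
sbrs r16, 6               ; skip the rjmp if ICES1 is set: a rising edge was captured
rjmp icp_falling
icp_rising:
in r6, ICR1L              ; the pin was low until now: low time to r7:r6
in r7, ICR1H              ; (16-bit register: read the low byte first)
andi r16, 0b10111111      ; clear ICES1: capture the falling edge next
rjmp icp_done
icp_falling:
in r4, ICR1L              ; the pin was high until now: high time to r5:r4
in r5, ICR1H
ori r16, 0b01000000       ; set ICES1: capture the rising edge next
icp_done:
out TCCR1B, r16
pop r16
out SREG, r16
pop r16
reti
```

The very first low time is not meaningful, because the timer was not cleared on a falling edge before.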
It's also possible to connect the Analog Comparator to the input capture trigger line. That means that you
can use the Analog Comparator output to measure analog signal frequencies or other data sources which
need an analog comparator for timing analysis. See the Analog Comparator page for more.
PWM Mode
The Pulse Width Modulator (PWM) Mode of the 16-bit timer is the most complex one of the timer modes
available. That's why it's down here.
The PWM can be set up to have a resolution of either 8, 9 or 10 bits. The resolution has a direct effect on
the PWM frequency (how often the PWM cycle repeats) and is selected via the PWM11 and PWM10 bits
in TCCR1A. Here's a table showing how the resolution select bits act. Right now the TOP value might
confuse you but you'll see what it's there for. The PWM Frequency column shows the PWM frequency in
relation to the timer clock (which can be prescaled) and NOT the system clock.
PWM11  PWM10  Resolution             TOP-value  PWM Frequency
0      0      PWM function disabled
0      1      8 bits                 $00FF      fclock/510
1      0      9 bits                 $01FF      fclock/1022
1      1      10 bits                $03FF      fclock/2046
To understand the next possible PWM settings, I should explain how the PWM mode works. The PWM is
an enhanced Output Compare Mode. In this mode, the timer can also count down, as opposed to the other
modes which only use an up-counting timer. In PWM mode, the timer counts up until it reaches the TOP
value (which is also the resolution of the timer and has effect on the frequency) and then counts down to
zero again. One PWM cycle therefore takes 2 * TOP timer clock cycles, which is where the frequencies in
the table come from (2 * 255 = 510 for 8 bits).
When the TCNT1 contents are equal to the OCR1 value, the corresponding output pin is set or cleared,
depending on the selected PWM mode: You can select a normal and an inverted PWM. This is selected
with the COM1A1 and COM1A0 bits (TCCR1A register). The possible settings are:
COM1A1  COM1A0  Effect:
0       0       PWM disabled
0       1       PWM disabled
1       0       Non-inverting PWM
1       1       inverting PWM
Non-inverted PWM means that the Output Compare Pin is CLEARED when the timer is up-counting and
reaches the OCR1 value. When the timer reaches the TOP value, it switches to down-counting and the
Output Compare Pin is SET when the timer value matches the OCR1 value.
Inverted PWM is, of course, the opposite: The Output Compare Pin is set upon an up-counting match and
cleared when the down-counting timer matches the OCR1 value. Here are two diagrams showing what this
looks like:
The reason why you can select between inverting and non-inverting pwm is that some external hardware
might need an active-low pwm signal. Having the option to invert the PWM signal in hardware saves code
space and processing time.
The PWM is also glitch-free. A glitch can occur when the OCR1 value is changed: Imagine the PWM
counting down to 0. After the pin was set, the OCR1 value is changed to some other value. The next pulse
has an undefined length because only the second half of the pulse had the specified new length. That's
why the PWM automatically writes the new value of OCR1 upon reaching the TOP value and therefore
prevents glitches.
Typical applications for the PWM are motor speed controlling, driving LEDs at variable brightness and so
on. Make sure you have appropriate drivers and protection circuitry if you're using motors!
Some Examples...
Some simple examples can also be found here...
- Setting up a timer
- Flashing LED using the Timer Overflow Interrupt and the Output compare mode
- Pulse Width Modulated LED demo with two timers
Setting up a Timer
Setting up a timer is pretty simple - once you know how it basically works. Once you've set up a timer
successfully you can also use the other modes without much learning as all timer modes are based on the
same principles.
Right now, we just want to let an LED light up for 1 second after reset. What do we need for that? The best
way is to set up a timer (Timer1, the 16 bit timer), switch the LED on and wait. As the timer overflow has
its own interrupt, we can write an ISR that switches the LED off again.
First, some stuff that has to be prepared/kept in mind. The following is assumed:
- The program is running on an AT90S2313, the LED is connected to PortB.4 (this pin doesn't have any
special functions), cathode connected to PortB.4 via a current limiting resistor and the anode connected to
Vcc. That means that the LED is ON when the port pin is LOW.
- The micro is running at 4MHz
- How do we get the timer to overflow after 1 second? 1 second means 4 Million cycles, so we need a big
prescaler: 1024 seems to be good. 4,000,000 / 1024 = 3906.25; so after 3906 timer clock cycles the timer
has to overflow. As the timers count UP and then overflow from $FFFF to 0 (that's when the ISR is called),
we have to load TCNT1 with -3906 (=0xF0BE)
- The interrupt vector for Timer1 overflow is at address 0x0005
Here's the code: (don't forget to include 2313def.inc!!!)
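.org 0x0000
rjmp reset                ; reset interrupt vector
.org 0x0005
rjmp led_off              ; Timer 1 overflow interrupt vector
reset:
ldi r16, low(RAMEND)      ; stack pointer to RAMEND
out SPL, r16
ldi r16, high(0xF0BE)     ; load TCNT1 with -3906, high byte first
out TCNT1H, r16
ldi r16, low(0xF0BE)
out TCNT1L, r16
ldi r16, 0b00000101       ; prescaler 1024: this starts the timer
out TCCR1B, r16
ldi r16, 0b10000000       ; TOIE1: enable the Timer 1 overflow interrupt
out TIMSK, r16
sei                       ; global interrupt enable
sbi DDRB, 4               ; PortB.4 is an output
cbi PortB, 4              ; pin low: LED on
loop:
rjmp loop                 ; wait for the interrupt
led_off:
push r16                  ; save r16 and SREG
in r16, SREG
push r16
ldi r16, 0                ; stop Timer 1
out TCCR1B, r16
sbi PortB, 4              ; pin high: LED off
pop r16                   ; restore SREG and r16
out SREG, r16
pop r16
reti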
Simulating this code in AVR Studio showed that the LED is turned off after 3999759 cycles. When
changing the timer value to 0xF0BD the simulator turns the LED off after 4000783 cycles (3999759 +
1024).
This is not the fastest code you can write for this specific problem. As the micro isn't doing anything during
the loop, the ISR doesn't need to preserve any register or the SREG, but I included this anyway to remind
you of that important step.
Now it's up to you to alter this code to use a prescaler of 256 or turn the LED off after some other time
interval. Then the TCCR1B and TCNT1 values change. You can also connect the LED to PortB.3 (Output
Compare Pin) and use the Output Compare mode of Timer 1 to make the LED flash! It's now just a matter
of reading the datasheet or the AVR Architecture -> Timers page (see "modes").
loop:
rjmp loop
Timer1_ovf:
in r16, SREG
push r16
ldi r16, high(timer_value)
out TCNT1H, r16
ldi r16, low(timer_value)
out TCNT1L, r16
in r16, PortB
ldi r17, 0b00001000
eor r16, r17
out PortB, r16
pop r16
out SREG, r16
reti
When simulating this in AVR Studio the ISR toggles the Port pin every 2,000,000 clock cycles.
There's an even better way to make an LED flash! Timer 1 has a mode called "Output Compare Mode".
You can read about this on the AVR Architecture -> Timers page or in the datasheet. On the 2313, PortB.3
is also the Output Compare pin, that's why I chose it for this example. I'll not go into detail, just give you
some code and explain what it does.
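.equ timer_value = 31250
.org 0x0000
rjmp reset
reset:
ldi r16, low(RAMEND)        ; stack pointer to RAMEND
out SPL, r16
ldi r16, 0b01000000         ; set OCIE1A (only has an effect once sei is executed)
out TIMSK, r16
ldi r16, high(timer_value)  ; compare value, high byte first
out OCR1AH, r16
ldi r16, low(timer_value)
out OCR1AL, r16
ldi r16, 0b01000000         ; COM1A0 set: toggle OC1 on compare match
out TCCR1A, r16
ldi r16, 0b00001011         ; CTC1 set (clear TCNT1 on compare match), prescaler 64
out TCCR1B, r16
sbi DDRB, 3                 ; OC1 pin (PortB.3) is an output
sbi PortB, 3
loop:
rjmp loop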
This time, as the timer value is compared to the Output Compare value, we use 31250 to create the
required timing of 0.5 seconds (31250 * 64 = 2,000,000 clock cycles). The timer counts up, and when it
reaches 31250, the Output Compare Match occurs. Then, the OC1 pin is toggled (we told the AVR to do so
via TCCR1A). We need no ISR, as the LED is toggled by hardware. It's also possible to use the ISR from
the first example. You then need to delete the lines which set up TCCR1A (these connect the timer to the
OC pin), set up the interrupt vector at 0x0004 (the Timer 1 compare match vector of the 2313) and enable
interrupts with sei.
This example uses the Timer 1 PWM mode to make an LED sweep through different brightness levels.
The LED shall reach its full brightness after 1 second. After another second it shall be off again and so on.
The PWM resolution is 8 bits. A second timer (Timer 0) is used to update the pwm value 256 times per
second. Therefore the whole pwm range will be gone through once per second. First second up-counting,
second one down-counting and so on.
The LED is connected to the PWM output pin OC1 with anode to Vcc and cathode via a current limiting
resistor to the output pin. That means that the LED is switched on when the output pin is low (if you have
an STK500 just connect one of the LEDs to the output compare pin) and we need an inverted PWM.
We're using two timers and a flag register for this. The flag register does nothing but signal if the LED is
currently getting brighter or not. Let's choose r2 for this. If it is cleared (=0), the LED is getting less power
over time (OC1 value is decreasing). If it is set (=0xFF), the LED is getting brighter (OC1 value is
increasing).
All this can be done interrupt driven. After setting up the cpu and the timers nothing needs to be done:
- for 28800 cycles we'll need a prescaler of 1024 or 256 (28800 / 1024 = 28.125; 28800 / 256 = 112.5;
28800 / 64 = 450 which is out of 8-bit range). Let's choose 256.
- As the timer is up-counting, we need to set TCNT0 to (256 - 113 =) 143 every time the Timer 0 overflow
ISR is called. Unfortunately, timer 0 does not support an output compare mode.
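A Timer 0 overflow ISR along these lines could be sketched as follows. This is not the original example code, just an illustration under the assumptions made in the text: r2 is the direction flag (0 = PWM value decreasing, 0xFF = increasing), the PWM value lives in OCR1A, and the label names are made up.

```asm
timer0_ovf:
push r16
in r16, SREG
push r16
push r17
ldi r16, 143              ; reload: the next overflow comes after 113 timer ticks
out TCNT0, r16
in r16, OCR1AL            ; current 8-bit PWM value
tst r2
breq pwm_count_down       ; flag cleared: the value is decreasing
cpi r16, 0xFF             ; flag set: already at the upper end?
breq pwm_turn_down
inc r16
rjmp pwm_store
pwm_turn_down:
clr r2                    ; count down from now on
rjmp pwm_store
pwm_count_down:
cpi r16, 0                ; already at the lower end?
breq pwm_turn_up
dec r16
rjmp pwm_store
pwm_turn_up:
com r2                    ; r2 = 0xFF: count up from now on
pwm_store:
ldi r17, 0
out OCR1AH, r17           ; 16-bit register: write the high byte first
out OCR1AL, r16
pop r17
pop r16
out SREG, r16
pop r16
reti
```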
Timer 1 is responsible for generating the PWM output for the LED.
- For simple PWM output we don't need a prescaler.
- To enable inverted PWM operation of timer 1, we need to set COM1A1 and COM1A0 in TCCR1A.
- The resolution shall be 8 bits. For an LED this doesn't really matter, we could also choose a higher
resolution. But 8 bits require less calculations at runtime. That means that the PWM10 bit in TCCR1A has
to be set, while PWM11 is cleared.
TCCR1B has to be set to 1 for enabling the timer 1 clock.
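Taken together, the Timer 1 settings listed above could be written like this. It's a sketch only, assuming a 2313 (where OC1 is PortB.3) with 2313def.inc included; the start value of 0 for OCR1A is an arbitrary choice.

```asm
ldi r16, 0b11000001       ; COM1A1 + COM1A0: inverted PWM, PWM10: 8-bit resolution
out TCCR1A, r16
ldi r16, 0b00000001       ; TCCR1B = 1: timer 1 runs from the system clock, no prescaler
out TCCR1B, r16
ldi r16, 0                ; start with a PWM value of 0
out OCR1AH, r16           ; 16-bit register: write the high byte first
out OCR1AL, r16
sbi DDRB, 3               ; the OC1 pin has to be an output
```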
Remember to include 2313def.inc from a path that works on your system. With the one from my example it
most probably won't work. Here's the code, with enough comments to make everything clear.
The problem with this is the following: Though the Master transmits the slave address (see line five), it is
doing that in Master Receiver Mode, because the read bit is set. This is important, because the status codes
returned in Master Receiver mode are not in the same table as those in Master Transmitter mode in the
datasheet! Both tables should be printed out to have them ready for programming, as all TWI operations
should be initiated only if the status codes were those that had been expected.
One more thing though: The short TWI action list above mentions ACK and NACK. These are transmitted
as a 9th data bit and indicate whether the device receiving data accepted the data transfer or address.
More on that in the Addressing and Data Transfer parts of this page.
Bus Hardware
All devices connected to the bus must be capable of driving the bus lines SCL (clock) and SDA (data).
That's why the bus is externally pulled up by resistors. The devices connected to the bus only pull it low.
The following figure (Figure 68 from the mega8 datasheet) shows how devices are connected to the bus:
...pretty simple actually. In fact, this is just about everything you'll need for a start. If you're missing the
master in the figure, remember that all devices can be the master if programmed to. Usually there will be
just one master (your AVR), which might be device 2 or 6 or 120 or n. It doesn't matter. The value of the
Pull-Up resistors depends on your bus capacitance (HAHA I know) and can be calculated with a formula
in the datasheet (TWI Characteristics). 4K7 works.
When idle, the bus lines are high (pulled high by the resistors). When SCL is high, SDA must not change
except for Start and Stop. Data on SDA can change while SCL is low and must be valid when SCL goes
high. SCL is pulsed high to clock in the data. SCL is ALWAYS controlled by the master. Both master and
slaves can control SDA.
Start and Stop Conditions
Before any address or data transmission takes place, the master generates a Start condition. This is done
by taking SDA low while SCL remains high. After a transmission is complete, the master has to generate a
stop condition. This is done by taking SDA high while SCL is high. Again, a figure from the datasheet:
Oh, the repeated start... In multi-master systems it can happen that when a master generates a stop
condition, another master will take control over the bus, though the first master actually wanted to transfer
some more data to a different device than before (more on this is in the datasheet). This doesn't happen if
a repeated start is generated. It works just like a normal start, but the current master remains master.
Addressing the Slaves
After a start condition has been generated by the master, it has to send the address of the slave it wants
to address. The slave address byte consists of a 7-bit slave address plus a transfer direction bit (R/W; 0 for
writing, 1 for reading). Again, from the datasheet:
The main part of the slave address is in the high nibble (bits 4..7). A 24C16 EEPROM for example has the
slave address 0xA0/0xA1 (for write and read, respectively) and uses the lower three bits (1..3) for page
addressing. The lowermost bit (bit 0) is the R/W bit as explained.
If a slave recognizes its own address, it will pull SDA low in the 9th SCL cycle. This is called an ACK
(acknowledge pulse), and is also used for verifying data transfers. If no slave (with the right address) is
present (or if the slave doesn't want to ACK or if it's busy), SDA will stay high during the 9th SCL cycle
(Pull-Up!). That would be a NACK (Not ACK).
A special case of slave addressing is the "General Call". A general call is done by addressing slave 0x00
(write). It can be used for all sorts of stuff depending on the slaves.
Data Transfer
Data transfers work just like address transfers, but they can be done in both directions (address transfers
always go from the master to the slave). They are as well terminated by an ACK/NACK. The ACK or NACK
is generated by the receiving device. This can be the master or the slave, depending on the transfer
direction (depends on the previous address + R/W transfer!). Multiple data transfers can be done after
transmitting the slave address (for EEPROM page writes, or for reading multiple slave status registers for
example). When a master reads data from a slave, it has to generate ACKs after every byte received and
a NACK after the last byte. The data transfer(s) is/are followed by a Stop condition or a repeated start. A
figure shouldn't be necessary here (similar to the address transfer).
How It's Done
It's time to show some example code! First, some important notes:
- The TWI operates based on the TWINT (TWI Interrupt) Flag in TWCR. This flag is set when an operation
has been finished by the TWI hardware. While it is set, no new operation can be started. It is cleared by
writing a 1 to it. As every TWI operation is started by setting the appropriate flags in TWCR, TWINT has to
be written as well for EVERY operation.
- The TWI Enable bit (TWEN) is also located in TWCR and also has to be written to one for starting an
operation.
Here's a TWCR description:
Bit 7   Bit 6   Bit 5   Bit 4   Bit 3   Bit 2   Bit 1   Bit 0
TWINT   TWEA    TWSTA   TWSTO   TWWC    TWEN    ---     TWIE
Bit 7 - TWINT: As described above; This is the TWI Interrupt Flag. It is set when the TWI finishes ANY bus
operation and has to be cleared (by writing a 1 to it) before a new operation can be started.
Bit 6 - TWEA: TWI Enable Acknowledge; When the device receives data (as slave or as master), this bit
has to be set if the next incoming byte should be ACKed and cleared for a NACK.
Bit 5 - TWSTA: TWI Start; When a master has to generate a start condition, write this bit 1 together with
TWEN and TWINT. The TWI hardware will generate a start condition and return the appropriate status
code.
Bit 4 - TWSTO: TWI Stop; Similar to TWSTA, but generates a Stop condition on the bus. TWINT is not set
after generating a Stop condition.
Bit 3 - TWWC: TWI Write Collision; Set by the TWI hardware when writing to the TWI Data Register TWDR
while TWINT is high.
Bit 2 - TWEN: Any bus operation only takes place when TWEN is written to 1 when accessing TWCR.
Bit 0 - TWIE: TWI Interrupt Enable; If this bit is set, the CPU will jump to the TWI interrupt vector when a TWI
interrupt occurs.
As all TWI operations are determined by the value written to TWCR, they're all similar. Here's the usual
structure:
ldi r16, (1<<TWINT)+(1<<TWEN)+(1<<TWSTA)
out TWCR, r16
This will generate a Start condition. After that, you might want to wait for TWINT to be set and then check the
TWI status register (TWSR) to see if everything is right:
TWI_wait:
in r16, TWCR
sbrs r16, TWINT
rjmp TWI_wait
in r16, TWSR
andi r16, 0xF8
cpi r16, 0x08
brne TWI_error
HA! Something I didn't tell you above: TWSR also contains the clock prescaler bits (TWSR:0..1). These
have to be masked away for checking the status value. More on the TWI clock rate below. What this piece of
code does is:
- Wait for TWINT to be set (after generating the start condition above)
- get the status value
- mask away the prescaler bits
- compare the status value to the status value expected. The expected status values are in 4 tables in the
datasheet. The first two tables are for master transmitter and receiver mode. Print out the tables for
programming! You'll need them.
- If the TWI status is not as expected, jump to TWI_error. This can occur for example if a master that is
NOT the current bus master tries to control the bus (-> datasheet!)
Sending data or an address is similar, but you'll have to load the address/data into TWDR first. Assuming a
start condition has just been generated, this piece of code will send slave address 0xA1 (24C16 EEPROM
read):
ldi r16, 0xA1
out TWDR, r16
ldi r16, (1<<TWINT)+(1<<TWEN)
out TWCR, r16
Now the same wait-and-check procedure as above will follow. As the read bit is set, the status code expected
is 0x40 (SLA+R transmitted, ACK received; see the master receiver mode status code table). Again: it's VERY
important to be absolutely sure what's happening on the bus so that you check for the correct status value!
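To round this off, here is a sketch of how the next step might look: receiving a single byte from the slave, answering it with a NACK (it is the last byte) and generating a Stop condition. It assumes that SLA+R has just been sent and acknowledged; the label names and the use of r17 for the received byte are made up for this example, and the error handler is only a placeholder.

```asm
; Sketch only: read one (last) byte in master receiver mode, then stop.
; Assumes SLA+R was just transmitted and ACKed. Label names and r17 are hypothetical.
TWI_read_last:
    ldi  r16, (1<<TWINT)+(1<<TWEN)          ; TWEA not set: NACK after this byte
    out  TWCR, r16
TWI_wait_data:
    in   r16, TWCR
    sbrs r16, TWINT                         ; wait until the byte has arrived
    rjmp TWI_wait_data
    in   r16, TWSR
    andi r16, 0xF8                          ; mask away the prescaler bits
    cpi  r16, 0x58                          ; data received, NACK returned
    brne TWI_error
    in   r17, TWDR                          ; fetch the received byte
    ldi  r16, (1<<TWINT)+(1<<TWEN)+(1<<TWSTO)
    out  TWCR, r16                          ; generate the Stop condition
    ret                                     ; TWINT is not set after a Stop, so don't wait for it

TWI_error:
    rjmp TWI_error                          ; placeholder: put your error handling here
```

For reading several bytes, TWEA would be set for all bytes but the last one, and the expected status for those is 0x50 (data received, ACK returned).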
TWI Clock Speed
The TWI clock speed is usually 100kHz or 400kHz. It is set by writing proper prescaler and clock rate
values to TWSR (bits 0 and 1: prescaler) and TWBR (TWI bit rate register). The formula for the resulting
TWI clock speed is:
CPU_clock/(16 + 2*TWBR*(4^prescaler))
At 8 MHz, a prescaler of 0 (4^0 = 1) and TWBR = 32 will result in the clock speed being 100kHz. The
mega8 datasheet says that TWBR values <10 should not be used.
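The formula can be left to the assembler. The following sketch solves it for TWBR with a prescaler of 4^0 = 1 and writes the result to the TWI registers. The symbol names F_CPU, F_SCL and TWBR_VAL as well as the label TWI_init are just names chosen for this example.

```asm
; Sketch only: TWI clock setup for a prescaler of 1 (TWPS1:0 = 0).
; F_CPU, F_SCL, TWBR_VAL and TWI_init are hypothetical names.
.equ F_CPU    = 8000000                     ; CPU clock in Hz
.equ F_SCL    = 100000                      ; desired TWI clock in Hz
.equ TWBR_VAL = (F_CPU / F_SCL - 16) / 2    ; = 32 for the values above

TWI_init:
    ldi  r16, TWBR_VAL
    out  TWBR, r16                          ; bit rate register
    ldi  r16, 0
    out  TWSR, r16                          ; prescaler bits = 0 -> 4^0 = 1
    ret
```

The assembler calculates with integers only, so it is worth checking the resulting value by hand for other clock combinations.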
The EEPROM
EEPROM (Electrically Erasable Programmable Read Only Memory) is one of the three memory types of
AVRs (the other are the Flash memory and SRAM). EEPROM is able to retain its contents when there is
no supply voltage. You can also change the EEPROM contents at runtime, so EEPROM is useful for storing
information like calibration values, ID numbers, etc.
Most AVRs have some amount of EEPROM (the exceptions are ATtiny11 and ATtiny28). You must check
the corresponding datasheet to know the exact amount of memory of your particular device.
To write in the EEPROM, you need to specify the data you want to write and the address at which you
want to write this data. In order to prevent unintentional EEPROM writes (for instance, during power supply
power up/down), a specific write procedure must be followed. The write process is not instantaneous: it
takes between 2.5 and 4 ms. For this reason, your software must check if the EEPROM is ready to write a
new byte (maybe a previous write operation is not finished yet).
The address of the byte you want to write is specified in the EEPROM Address Register (EEAR). If the
AVR you are using has more than 256 bytes, the EEAR register is divided in the EEARH and EEARL
registers. The EEPROM Data Register (EEDR) contains the data you want to store.
The EEPROM Control Register (EECR) is used to control the operation of the EEPROM. It has three bits :
EEMWE, EEWE and EERE. The EERE (EEPROM Read Enable) bit is used to read the EEPROM and is
discussed later. In order to issue an EEPROM write, you must first set the EEMWE (EEPROM Master
Write Enable) bit, and then set the EEWE (EEPROM write enable) bit. If you don't set EEMWE first, setting
EEWE will have no effect. The EEWE bit is also used to know if the EEPROM is ready to write a new byte.
While the EEPROM is busy, EEWE is set to one, and is cleared by hardware when the EEPROM is ready.
So, your program can poll this bit and wait until it is cleared before writing the next byte.
The following is a code snippet for writing the data 0xAA in address 0x10 :
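    cli                     ; disable interrupts
EEPROM_write:
    sbic EECR, EEWE         ; wait until a previous write has finished
    rjmp EEPROM_write
    ldi r16, 0x10
    out EEAR, r16           ; address 0x10
    ldi r16, 0xAA
    out EEDR, r16           ; data 0xAA
    sbi EECR, EEMWE         ; set the master write enable first...
    sbi EECR, EEWE          ; ...then start the write
    sei                     ; enable interrupts again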
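For AVRs with more than 256 bytes of EEPROM the same procedure works with the split address register. Here is a sketch of a reusable routine; it assumes a device that has EEARH and EEARL (an ATmega8, for example), and the register assignment (address in r18:r17, data in r16, r19 as scratch) is an arbitrary choice for this example. Instead of disabling interrupts during the whole wait loop, it only protects the EEMWE/EEWE sequence, which the datasheet requires to happen within a few clock cycles.

```asm
; Sketch only: write r16 to EEPROM address r18:r17 (high:low).
; Assumes a device with EEARH/EEARL; register choice and label name are hypothetical.
EEPROM_write16:
    sbic EECR, EEWE         ; wait until a previous write has finished
    rjmp EEPROM_write16
    out  EEARH, r18         ; address high byte
    out  EEARL, r17         ; address low byte
    out  EEDR, r16          ; data
    in   r19, SREG          ; remember whether interrupts were enabled
    cli                     ; no interrupt between the next two instructions
    sbi  EECR, EEMWE        ; master write enable first...
    sbi  EECR, EEWE         ; ...then start the write
    out  SREG, r19          ; restore the previous interrupt state
    ret
```

Restoring SREG instead of using sei means the routine can also be called from code that runs with interrupts disabled.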
To read data from the EEPROM, you must first check that the EEPROM is not busy by polling the EEWE
bit, then you set the EEAR register with the address you want to read, and then set the EERE bit in the
EECR register. After that, the requested data is found in the EEDR register.
The following is a code snippet for reading the data stored in address 0x10. The read data is stored in r16.
EEPROM_read:
sbic EECR, EEWE
rjmp EEPROM_read
ldi r16, 0x10
out EEAR, r16
sbi EECR, EERE
in r16, EEDR
Quite often people report problems reading the data at EEPROM address 0. The data is corrupted or
appears not to be written correctly after a reset. The cause is the power supply: If the AVR does not have
enough power to run properly (during times of low supply voltage) it can execute unexpected instructions and
corrupt the first EEPROM address. You either need a good reset circuit that holds the AVR in reset whenever
the supply voltage is too low, or you just don't use address 0.
Bus Description
Before you can successfully communicate through the SPI, both the Master and Slave must agree on
some clock signal settings. Details on how to configure this in the AVR will be discussed later.
Please note that not all AVRs have an SPI (you must check the particular datasheet). If your AVR doesn't
have an SPI, you still can implement it in software (the details are not discussed here).
In an AVR, four signals (pins) are used for the SPI: MISO, MOSI, SCK and SS' (SS' means SS
complemented). Here is a brief description of the function of each signal:
MISO (Master In Slave Out): the input of the Master's shift register, and the output of the Slave's shift
register.
MOSI (Master Out Slave In): the output of the Master's shift register, and the input of the Slave's shift
register.
SCK (Serial Clock): In the Master, this is the output of the clock generator. In the Slave, it is the input clock
signal.
SS' (Slave Select): Since in an SPI setup you can have several slaves at the same time, you need a way
to select which Slave you want to communicate to. This is what SS' is used for. If SS' is held in a high
state, all Slave SPI pins are normal inputs, and will not receive incoming SPI data. On the other hand, if
SS' is held in a low state, the SPI is activated. The software of the Master must control the SS'-line of each
Slave.
If the SPI-device is configured as a Master, the behavior of the SS' pin depends on the configured data
direction of the pin. If SS' is configured as an output, the pin does not affect the SPI. If SS' is configured as
an input, it must be held high to ensure Master SPI operation. If the SS' pin is driven low, the SPI system
interprets this as another Master selecting the SPI as a Slave and starting to send data to it. Having two
SPI Masters is quite unusual, so the details of how to manage this are not discussed here (if you are
curious, read the datasheet). So, if you want to keep your life simple, configure the Master's SS' pin as an
output.
The following figures show a typical setup used with SPI:
A word of caution about the SPI pin names. MISO, MOSI, SCK and SS' are the names used by AVRs.
Other devices may use a different set of names. You must check the data sheet of the particular device
you are using to get them right.
What are the data directions of the SPI pins? It depends on the particular pin and on whether the SPI is set
as a Master or Slave. In general, there are two possibilities. A pin is configured as an input regardless of
the setting of the Data Direction Register of the port, or the pin must be configured by the user according
to its function. The following table summarizes this:
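Pin     Direction (Master SPI)   Direction (Slave SPI)
MOSI    User defined             Input
MISO    Input                    User defined
SCK     User defined             Input
SS'     User defined             Input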
Registers
[SPCR] [SPSR] [SPDR]
SPCR
(SPI Control Register)
Bit 7   Bit 6   Bit 5   Bit 4   Bit 3   Bit 2   Bit 1   Bit 0
SPIE    SPE     DORD    MSTR    CPOL    CPHA    SPR1    SPR0
SPIE (SPI Interrupt Enable) bit: Set SPIE to one if you want the SPI interrupt to be executed when a serial
transfer is completed.
SPE (SPI Enable) bit: If you want to use the SPI, you must set this bit.
DORD (Data Order) bit: You can choose in which order the data will be transmitted. Set DORD to one to
send the least significant bit (LSB) first. Set DORD to zero to send the most significant bit (MSB) first.
MSTR (Master/Slave Select) bit: Set MSTR to configure the AVR as a Master SPI device. Clear MSTR to
configure it as a Slave.
CPOL (Clock Polarity) and CPHA (Clock Phase) bits: As stated previously, Master and Slave must agree
on how to interpret the clock signal. The first thing to do is to configure which logic level the clock will be in
when the SPI is idle. If CPOL is set to one, SCK is high when idle, and if CPOL is set to zero, SCK is low
when idle. The second thing is to configure during which clock transition the data will be sampled. Set
CPHA to sample the data on the trailing (last) edge, and clear CPHA to sample the data on the leading
(first) edge.
So, there are four different ways of configuring the clock generation, which are known as 'SPI modes'. The
following table summarizes the four SPI modes.
SPI Mode   CPOL   CPHA   Sample
0          0      0      Leading (Rising) Edge
1          0      1      Trailing (Falling) Edge
2          1      0      Leading (Falling) Edge
3          1      1      Trailing (Rising) Edge
The following image shows figure 76 and 77 from the mega128 datasheet:
SPR1 and SPR0 (SPI Clock Rate Select) bits: The SPR bits configure the frequency of the clock signal.
Since the Slave reads the clock from an input pin, the SPR bits have no effect on the Slave. The frequency
of the SPI clock is related to the frequency of the AVR oscillator. The faster the SPI clock signal is, the
faster the data transfer will be. However, you must respect the maximum clock frequency specified by the
Slave (as usual, read the datasheet). The following table summarizes the relationship between the SCK
frequency and the SPR bits:
SPR1   SPR0   SCK frequency
0      0      fosc/4
0      1      fosc/16
1      0      fosc/64
1      1      fosc/128
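Instead of working out the bit pattern by hand, the SPCR value can be put together from the bit names in the device's include file. The sketch below assumes those names (SPE, MSTR, CPOL, CPHA, SPR0) are defined there, which is the case for the usual Atmel def.inc files, and configures a Master in SPI mode 3 with SCK = fosc/16:

```asm
; Sketch only: Master, MSB first (DORD = 0), SPI mode 3, SCK = fosc/16.
; Assumes the bit names are defined in the device's def.inc file.
    ldi  r16, (1<<SPE)+(1<<MSTR)+(1<<CPOL)+(1<<CPHA)+(1<<SPR0)
    out  SPCR, r16          ; same as writing 0x5D
```

Written this way, a change of the SPI mode or clock rate only means adding or removing a bit name.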
SPSR
(SPI Status Register)
Bit 7   Bit 6   Bit 5   Bit 4   Bit 3   Bit 2   Bit 1   Bit 0
SPIF    WCOL    ---     ---     ---     ---     ---     (SPI2x)
SPIF (SPI Interrupt Flag) bit: This is a read only bit. It is set by hardware when a serial transfer is
complete. SPIF is cleared by hardware when the SPI interrupt handling vector is executed, or by first reading
SPSR with SPIF set and then accessing the SPDR register.
WCOL (Write Collision Flag) bit: This is a read only bit. The WCOL bit is set if the SPDR register is written
to during a data transfer. The WCOL bit (and the SPIF bit) are cleared by first reading the SPI Status
Register with WCOL set, and then accessing the SPI Data Register.
SPI2x (Double SPI Speed) bit: This feature is not implemented in all AVRs (check the particular data
sheet). When this bit is set to one, the SPI speed will be doubled when the SPI is in Master mode.
SPDR
(SPI Data Register)
The SPI Data Register is a read/write register used for data transfer between the Register File and the SPI
Shift Register. Writing to the register initiates data transmission. Reading the register causes the Shift
Register receive buffer to be read.
Finally, here is a code snippet to generate a data transfer between a Master and a Slave. Both Master and
Slave are configured to send the MSB first and to use SPI mode 3. The clock frequency of the Master is
fosc/16. The Master will send the data 0xAA, and the Slave the data 0x55.
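Master code:
SPI_Init:
    sbi DDRB,DDB5       ; MOSI is an output
    sbi DDRB,DDB7       ; SCK is an output
    sbi DDRB,DDB4       ; SS' is an output
    ldi r16,0b01011101  ; SPE, MSTR, mode 3 (CPOL = CPHA = 1), fosc/16
    out SPCR,r16
SPI_Send:
    ldi r16,0xAA
    out SPDR,r16        ; writing SPDR starts the transfer
Wait:
    sbis SPSR,SPIF      ; wait until the transfer is complete
    rjmp Wait
    in r16,SPDR         ; read the byte received from the Slave
Slave code:
SPI_Init:
    sbi DDRB,DDB6       ; MISO is an output
    ldi r16,0b01001100  ; SPE, Slave, mode 3 (CPOL = CPHA = 1)
    out SPCR,r16
    ldi r16,0x55
    out SPDR,r16        ; data to be shifted out during the next transfer
SPI_Receive:
    sbis SPSR,SPIF      ; wait until the transfer is complete
    rjmp SPI_Receive
    in r16,SPDR         ; read the byte received from the Master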
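With more than one Slave on the bus, the Master has to handle the SS' line of the Slave it wants to talk to around every transfer. A sketch of a small exchange routine is shown below. It assumes the Slave's SS' is wired to PB4 of the Master and that PB4 is configured as an output; the label names are made up for this example.

```asm
; Sketch only: exchange one byte with a Slave.
; In: r16 = byte to send. Out: r16 = byte received.
; Assumes the Slave's SS' is connected to PB4 of the Master (output pin).
SPI_transfer:
    cbi  PORTB, PB4         ; SS' low: select the Slave
    out  SPDR, r16          ; writing SPDR starts the transfer
SPI_transfer_wait:
    sbis SPSR, SPIF         ; wait until the transfer is complete
    rjmp SPI_transfer_wait
    in   r16, SPDR          ; reading SPDR after SPSR also clears SPIF
    sbi  PORTB, PB4         ; SS' high: release the Slave
    ret
```

For Slaves that expect several bytes per selection, SS' would be pulled low once before the first byte and released after the last one instead.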
The UART
[Registers] [Baud Rate Generator] [TX] [RX] [Important Hardware Note]
The AVR UART is a very powerful and useful peripheral and used in most projects. It can be used for
debugging code, user interaction, or just sending data for logging it on a PC. Here's an image of how it
basically is built up (based on the AT90S2313 UART):
The AVR UART can be set up to transmit 8 or 9 bits, no parity, one Stop bit. It filters the data received and
also detects overrun conditions and framing errors. It has three interrupts and allows highly efficient data
stream handling with software buffers.
From the diagram you see that the transmitter and receiver share the UDR (UART Data Register). Actually
they only share the UDR address: The "real" register is divided into the transmitter and receiver register so
that received data cannot overwrite data being written into the transmit register. Consequently you can't
read back data you wrote into the transmitter register.
As both parts of the UART, the transmitter and the receiver, share the Baud Rate Generator and the
control registers, I'll explain them first before showing you the basics of transferring data via the UART.
UART Registers
UCR
Bit 7   Bit 6   Bit 5   Bit 4   Bit 3   Bit 2   Bit 1   Bit 0
RXCIE   TXCIE   UDRIE   RXEN    TXEN    CHR9    RXB8    TXB8
RXCIE: Receive Complete Interrupt Enable; If this bit is set, the reception of a byte via the UART will
cause an Interrupt if global Ints are enabled.
TXCIE: Just the same as RXCIE, but will allow a transmit complete Interrupt.
UDRIE: UART Data Register Empty Interrupt Enable; If this bit is set, an interrupt occurs if UDR is empty.
That allows writing the next byte to UDR while the byte currently being sent is still in the shift register. It is
also useful if the transmit complete interrupt doesn't write the next byte to UDR, and it allows an interrupt
driven start of a transmission if nothing was sent before and a transmit complete interrupt therefore can't occur.
RXEN: Receiver Enable; If this bit is set, the UART receiver is enabled and the RXD pin is set up as an
input pin connected to the UART. All the previous port settings are now disabled, but not overwritten:
Disabling the receiver again will restore the old port settings.
TXEN: Transmitter Enable; If this bit is set, the UART transmitter is enabled and the TXD pin is set up
as an output pin connected to the transmitter.
CHR9: 9 bit characters; This bit enables the 9-bit character size. By default, it is set to 0 and 8 bits are
used. If 9 bit characters are enabled, the 9th bit is found in RXB8 and TXB8.
RXB8: If CHR9 is set, this is the 9th received bit.
TXB8: If CHR9 is set, this is the 9th bit that is to be transmitted.
If 9 bit transmissions are enabled, TXB8 has to be filled before transmission is started by writing the lower
8 bits to UDR. RXB8 is valid after the received data has been transferred from the rx shift register. It is
buffered as well, so it doesn't change until a new byte is completely received.
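As an illustration of the order of operations, here is a sketch that sends a 9-bit character with the 9th bit set. It assumes an AT90S2313 whose UART is already set up with CHR9 set in UCR; the label name is made up.

```asm
; Sketch only: send the character in r16 with the 9th bit = 1.
; Assumes the UART is already set up and CHR9 in UCR is set.
tx9_uart:
    sbis USR, UDRE          ; wait until UDR is empty
    rjmp tx9_uart
    sbi  UCR, TXB8          ; fill in the 9th bit first (use cbi for a 0)
    out  UDR, r16           ; writing the lower 8 bits starts the transmission
    ret
```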
USR
The UART status register holds status flags such as interrupt flags, overflow and framing error flags:
Bit 7   Bit 6   Bit 5   Bit 4   Bit 3   Bit 2   Bit 1   Bit 0
RXC     TXC     UDRE    FE      OR      ---     ---     ---
RXC:
Receive Complete; This is the interrupt flag that is set when the UART has completely received a
character. It is cleared by reading UDR. You can either use it to let the AVR execute the
interrupt service routine or poll it in a loop with interrupts disabled.
TXC:
Transmit Complete; This flag is set when a transmit is completed. It is cleared by hardware when the
transmit complete ISR is executed, or in software by writing a 1 to it. Like RXC, it can also be polled.
UDRE:
UART Data Register Empty; This flag is set while the UDR is empty. This condition occurs when a
character is transferred from the UDR to the transmit shift register. If the next character is written to UDR
now, it will not be transferred to the shift register until the character currently being transmitted is
completely shifted out.
This flag can be used to ensure maximum throughput by using a software buffer. Consequently, the UDRE
ISR has to write UDR: Otherwise the interrupt will occur again and again until data has been written to UDR
or the UDRIE flag has been cleared.
UDRE is set upon reset to indicate that the transmitter is ready.
FE:
Framing Error; This flag is set if the STOP bit is not received correctly. This is the case if it was interpreted
to be low by the data recovery logic. And that's wrong. So if the FE bit is read 1 by your software, you must
have serious noise problems or another hardware error.
OR:
OverRun; The OverRun Flag is very useful for detecting if your code is handling incoming data fast
enough: It is set when a character is transferred from the rx shift register to UDR before the previously
received character is read. It is cleared again when the next character is read.
The Baud Rate Generator
The UART Baud Rate Generator defines the clock used for transmitting and receiving data via the UART.
Unlike the timer clock, which can be prescaled in some rough steps, the UART clock can be divided very
precisely, resulting in clean and (to some extent) error-free data transfer.
You might have noticed that the baud rate is divided by 16 before it is fed into the Rx/Tx Shift registers.
The clock generated by the UART baud rate generator is 16 times higher than the baud rate we want to
use for transferring data.
This clock is used by the Data Recovery Logic: It samples the data and therefore filters it a bit, so that fewer
errors occur. In the middle of a bit that is to be received, it takes three samples: If two (or all three)
samples are high, the bit shifted into the Rx Shift register is high as well. If two samples are wrong, the
data in the shift register is also wrong, but that is only possible if the connection is really bad.
The Clock used for shifting in the data is then divided by 16 (see diagram) and therefore corresponds to
the baud rate.
As there's no need to sample data for the Tx shift register, it is directly clocked by the baud rate.
To calculate the Baud rate generated from a specific value in UBRR (UART Baud Rate
Register), the AVR datasheets present this formula:
BAUD = fck / (16 * (UBRR + 1))
Example: System Clock is 8 MHz and we need 9600 Baud. Unfortunately, the formula above does not give
us the UBRR value from fck and baud rate, but the baud rate from fck and UBRR. Solved for UBRR, the
formula is:
UBRR = fck / (16 * baud) - 1
Using the values above (8 MHz and 9600 baud) we get the value of 51.08333333 for UBRR. So it's 51. The
error we get is the actual baud rate divided by the desired baud rate: The actual baud rate is (first formula!)
9615 baud, dividing this by 9600 gives 1.0016 and therefore an error of 0.16%.
This will work, but it's not perfect. That's why you can get crystals with funny frequencies, such as 7.3728
MHz: Using that one for 9600 baud gives (2nd formula) us UBRR = 47 and no error. You can find tables
with various clock/baud combinations in the AVR datasheets. If you can't find the one you want to use, just
use the formulas above which will give you the same results.
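The second formula can also be handed to the assembler, so that the UBRR value follows automatically when the crystal or the baud rate changes. The symbol names F_CK, BAUD and UBRR_VAL in this sketch are just names chosen for the example:

```asm
; Sketch only: let the assembler calculate the UBRR value.
; F_CK, BAUD and UBRR_VAL are hypothetical names.
.equ F_CK     = 7372800                     ; system clock in Hz
.equ BAUD     = 9600                        ; desired baud rate
.equ UBRR_VAL = F_CK / (16 * BAUD) - 1      ; = 47 for the values above

    ldi  r16, UBRR_VAL
    out  UBRR, r16
```

The assembler only does integer arithmetic and cuts off the fraction, so for clock frequencies that don't divide evenly it is still a good idea to check the resulting error with the first formula.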
The UART Transmitter
The UART transmitter sends data from the AVR to some other device (data logger, PC, ...anything) at the
specified Baud Rate. The transmission is initiated by writing data to UDR. This data is then transferred to
the TX shift register when the previously written byte has been shifted out completely. The next byte can
now be written to UDR.
When a byte is transferred to the TX shift register, the UDRE flag is set. The UDRE ISR can write the next
byte to UDR without corrupting the transmission in progress.
When a byte is completely shifted out AND no data has been written to UDR by the UDRE ISR, the TXC
flag is set.
How the transmitter interrupt flags work together can be understood quite easily with the following flow
chart:
This flow chart depends on a software FIFO buffer, which is a somewhat non-trivial task, but it also explains
the flags pretty well I think: The transmission complete flag will only be set if the transmission is really
complete: By writing the buffer software properly YOU tell the UART when the transmission is complete.
Isn't that cool?
The UART Receiver
The UART receiver is basically built up like the transmitter, but with the appropriate extras for receiving
data: Data recovery logic for sampling the data and just one interrupt for the completion of data reception.
It uses the same baud rate setting as the transmitter. The data is sampled in the middle of the bit to be
received:
The small lines at the bottom of the image (three of which are samples) are the clock generated by the
UART Baud Rate Generator. This should also make clear why the baud rate is first generated 16 times
higher than needed and then divided by 16 in order to shift in the data. This higher baud rate is used for
sampling/filtering.
Important Hardware Note
If you want to connect your AVR to a PC you have to use RS-232 voltage levels. The voltage levels used
by an AVR are normal TTL levels (5V or 3.3V for high and 0V for low levels). RS-232 levels are much
different from that.
To convert the logic levels to RS-232 you need a normal level converter such as the MAX232. It's pretty
cheap and only needs a few external caps to work. It comes in a variety of packages and is available almost
everywhere.
WARNING! This is a diagram of the MAX202 from the MAX232 datasheet. Use 10µF caps for the
MAX232! The one connected from VCC to ground should be 0.1µF though.
For the UART to work you need one driver per direction only: One Transmitter (T1 or T2 in the diagram)
from AVR to PC and one receiver (R1 or R2 in the diagram) from PC to AVR.
The Cable
The Cable from your circuit to the PC will most probably have a 9-pin D-type connector. The signals we
need are Ground, Receive Data and Transmit Data. Below is a table of the necessary connections. The
signal name refers to the PC side.
Signal    PC side (male)    Circuit side    MAX232 pin to connector
Ground    5                 5               15
Tx        3                 3               13 or 8
Rx        2                 2               14 or 7
To find out which pin of the connector has which number, have a close look at it: Most have tiny numbers
next to the pins on the plastic insulator. For more information, see www.hardwarebook.net. If your PC has a
25-pin connector you'll find the pinouts for it on that site as well.
I will not go into detail about the RS232 protocol. The AVR datasheets have a small description of it (in the
2313 datasheet see the "sampling received data" figure), which should be enough for a start. If you want
more, have a look at www.beyondlogic.org.
UART Setup
Setting up the UART is not very hard. You need to know the following:
- Clock frequency of your AVR
- desired baud rate
- data format (how many bits per transmission)
The clock frequency and the desired baud rate are used for calculating the UBRR value. With the formula
from the datasheet or the AVR Architecture -> UART page this can be calculated in no time. Assuming a
clock speed of 3.6864 MHz and a desired baud rate of 38400, we get a value of 5. This must be written to UBRR.
The data format will usually be 8 bits per transfer. Sometimes 9 bits are used, which the 2313 supports as
well. The megas even have more options, but the 8-bit format is enough for now.
The next question we have to answer is: Interrupt driven or polling? Interrupt driven is of course more
efficient, but when sending strings or packets of data, polling is easier, as an interrupt driven UART needs
software buffers for efficient string transfers. These can be added, but then it's not a "simple example" any
more :-) Below the polling example, you will find an interrupt driven version of it.
The example code below shows how to use polling. As we don't use interrupts, these can be left disabled.
The transmitter and receiver have to be enabled though in order to make usage of the UART possible.
The Setup Code
setup_uart:
    ldi r16, 5              ; UBRR value for 38400 baud @ 3.6864 MHz
    out UBRR, r16
    ldi r16, 0b00011000     ; enable receiver (RXEN) and transmitter (TXEN)
    out UCR, r16
    ret
So what do we want the AVR to do with the UART? A very simple task is to echo back the data we received
from the PC. When typing in characters in a terminal, we should receive copies from it, so everything we
type in should show up twice (assuming a local echo).
For receiving data we wait until the RXC flag in USR (UART Status Register) is set and then read that data
from UDR (UART Data Register). Then we can transmit it again by writing it to UDR. If we write data to
UDR while a byte is being received that won't hurt, as the UDR is divided into two registers, one for each
direction. Good, huh? Before writing it to UDR we need to wait for the UDRE flag to be set, because it
indicates that the previous character has been transferred to the UART transmit shift register. Then a new
character can be written to UDR.
Example Code
So, enough theory, here's the code. Don't forget to include the 2313def.inc file and the setup routine
above!
.org 0x0000
rjmp reset
reset:
ldi r16, low(RAMEND)
out SPL, r16
rcall setup_uart
loop:
rcall rx_uart
rcall tx_uart
rjmp loop
rx_uart:
in r16, USR
sbrs r16, RXC
rjmp rx_uart
in r16, UDR
ret
tx_uart:
in r17, USR
sbrs r17, UDRE
rjmp tx_uart
out UDR, r16
ret
;include setup_uart here!
After thinking about the code a bit you might come to the conclusion that the status register check for
transmitting data is not necessary, as the data is coming in at very low speed (as fast as you can type) and
therefore will be echoed back before the next character comes and can be transmitted again. I included
this for showing how this check is done, because other applications might send data at higher speed. This
is the case when sending data packets or strings. In that case, the application would send a character, get
the next one from memory and send it as soon as possible.
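To show what that could look like, here is a sketch of a polling routine that sends a zero-terminated string stored in flash. It assumes an AT90S2313 with the UART already set up; the label names and the example string are made up.

```asm
; Sketch only: send a zero-terminated string from flash by polling UDRE.
; In: Z = byte address of the string. Label names and the string are hypothetical.
send_string:
    lpm                     ; r0 = character at address Z
    tst  r0
    breq send_done          ; zero byte: end of string
send_wait:
    sbis USR, UDRE          ; wait until UDR is empty
    rjmp send_wait
    out  UDR, r0            ; send the character
    adiw ZL, 1              ; point to the next character
    rjmp send_string
send_done:
    ret

hello:
    .db "Hello", 0          ; example string with terminating zero
```

It would be called like this (the label is a word address, so it is multiplied by 2 to get the byte address lpm needs):

```asm
    ldi  ZL, low(2*hello)
    ldi  ZH, high(2*hello)
    rcall send_string
```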
Interrupt Driven Examples
The interrupt driven example doesn't hang around in loops checking if data has come in. Instead, the Rx
Complete interrupt is used to determine when data is ready. It is then echoed back by the RXC ISR. To
make the interrupt driven echo possible, the RXC Interrupt has to be enabled (RXCIE in UCR is set) and, of
course, global interrupts have to be allowed as well. The correct interrupt vector has to be installed, too.
.org 0x0000
rjmp reset
.org 0x0007
rjmp UART_RXC
reset:
ldi r16, low(RAMEND)
out SPL, r16
ldi r16, 5 ; clock divider value for 38400 baud @ 3.6864 MHz
out UBRR, r16
ldi r16, 0b10011000 ; enable Rx Complete Int, enable receiver and transmitter
out UCR, r16
sei ; enable interrupts
loop:
rjmp loop ; loop here (do nothing)
UART_RXC: ; UART Rx complete interrupt handler:
in r16, UDR ; get data we received
out UDR, r16 ; write it to UDR
reti ; return from int
This example, apart from being interrupt driven, is different from the first one: The ISR doesn't check if it's
allowed to write to UDR, so collisions can occur if the previous character wasn't transferred yet. This check
could be done with an ISR for the UART Data Register Empty interrupt. The flow chart shows how the two ISRs
would communicate via the UDRE Interrupt Enable (UDRIE) bit:
.org 0x0000 ; same as above
rjmp reset
.org 0x0007 ; here's the Rx Complete vector
rjmp UART_RXC
rjmp UART_DRE ; here's the UDRE Int vector (.org 0x0008)
reset:
ldi r16, low(RAMEND) ; stack setup
out SPL, r16
ldi r16, 5 ; set baud rate
out UBRR, r16
ldi r16, 0b10011000 ; enable Rx and Tx, enable Rx Complete Interrupt
out UCR, r16 ; UDRIE is NOT(!) set!!! This is done by the RXC ISR
sei ; enable Interrupts
loop:
rjmp loop ; do nothing as long as power is present
UART_RXC: ; UART Rx Complete ISR:
in r17, UDR ; get data
in r16, UCR ; get UART Control Register
sbr r16, 0b00100000 ; and set UDRIE bit;
out UCR, r16 ; store UART Control Register again
reti ; and that's it.
UART_DRE: ; UART Data Register Empty ISR: Will be called as soon as UART_RXC returns!
in r16, UCR ; Get UCR
cbr r16, 0b00100000 ; clear UDRIE bit
out UCR, r16 ; and store UCR again
out UDR, r17 ; send data
reti ; return from ISR
These three examples should have given you an idea about UART usage and interrupt setup issues. The
last example (with RXC and UDRE interrupts) is almost ready for FIFO buffer usage.
.db "0123456789ABCDEF"
The base address of the table plus 4 is the address where "4" is stored at.
Here's the complete code (without includes!):
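.equ baud19200 = 12 ; UBRR value for 19200 baud
.dseg
low_nibble: .byte 1 ; holds the second hex digit until it can be sent
.cseg
.org 0x0000
rjmp reset
.org 0x0007
rjmp UART_RXC ; Rx Complete vector
rjmp UART_UDRE ; UDRE vector (0x0008)
reset:
ldi r16, low(RAMEND) ; stack setup
out SPL, r16
ldi r16, baud19200 ; set baud rate
out UBRR, r16
ldi r16, 0b10011000 ; enable Rx, Tx and the Rx Complete interrupt
out UCR, r16
sei ; enable interrupts
loop: rjmp loop ; do nothing
UART_RXC:
push r16 ; save r16
in r16, SREG ; and SREG
push r16
in r16, UDR ; get the received byte
mov r17, r16 ; copy it to r17
andi r16, 0x0F ; r16 = low nibble
swap r17 ; r17 = high nibble
andi r17, 0x0F
ldi ZL, low(2*hex_table) ; point Z at the hex table
ldi ZH, high(2*hex_table)
add ZL, r17 ; add the high nibble as table offset
lpm ; r0 = ASCII character for the high nibble
out UDR, r0 ; send it right away
ldi ZL, low(2*hex_table) ; same lookup for the low nibble
ldi ZH, high(2*hex_table)
add ZL, r16
lpm
sts low_nibble, r0 ; keep it for the UDRE ISR
ldi r16, 0b10111000 ; enable UDRE interrupt
out UCR, r16
pop r16 ; restore SREG
out SREG, r16
pop r16 ; and r16
reti
UART_UDRE:
push r16 ; save r16
in r16, SREG ; and SREG
push r16
lds r16, low_nibble ; load r16 with data from low_nibble
out UDR, r16 ; and send it
ldi r16, 0b10011000 ; disable UDRE interrupt
out UCR, r16
pop r16 ; restore SREG
out SREG, r16
pop r16 ; and r16
reti ; and return from interrupt
hex_table: ; This is the hex table label
.db "0123456789ABCDEF" ; and this is the hex -> ascii table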
Just add the include file for the 2313 and simulate the code now. You'll see that "A" typed in at the terminal
returns "41", which corresponds to the ASCII value of "A". A backspace returns "08", return gives "0D".
tx_n: .byte 1
.cseg
init_FIFOs: ; this is a routine we can call during init:
ldi r16, low(rx_fifo) ; load address of the rx FIFO space to r16:r17
ldi r17, high(rx_fifo)
sts rx_in, r16 ; and store it as the in and
sts rx_in + 1, r17
sts rx_out, r16 ; out pointer
sts rx_out + 1, r17 ; (high byte comes from r17, not r16)
clr r16 ; clear the counter
sts rx_n, r16 ; and store it as well.
ldi r16, low(tx_fifo) ; same for the transmitter
ldi r17, high(tx_fifo)
sts tx_in, r16
sts tx_in + 1, r17
sts tx_out, r16
sts tx_out + 1, r17
clr r16
sts tx_n, r16
ret ; return from the routine
Receiver FIFO:
As the UART receiver only has one interrupt source, we don't need to choose one (this will be needed for
the transmitter). The UART Rx interrupt occurs whenever a byte has been received. This byte is then
added to the Rx FIFO by the ISR. Another routine is needed to consume a byte from the buffer again
during normal operation, for example when we need to process some received data.
That makes 2 routines for the Rx side. First, the ISR:
UART_RXC:
push r16
; get counter
; increment
; store counter again
; restore registers we used
; return
; if X stored the data at the last fifo memory location,
; roll over to the first address again
; and proceed as usual
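A minimal sketch of how such an ISR could look is given here. It follows the comments above and uses the pointer and counter variables from the init routine (rx_in, rx_n, rx_fifo). The symbol rx_size and the label rx_isr_store are assumptions made for this illustration, and the sketch does not check whether the buffer is already full.

```asm
; Sketch only, not the original listing. rx_isr_store is a made-up label.
UART_RXC:
    push r16                  ; save everything the ISR uses
    in   r16, SREG
    push r16
    push r17
    push XL
    push XH
    in   r16, UDR             ; get the received byte
    lds  XL, rx_in            ; get the Rx FIFO write pointer
    lds  XH, rx_in + 1
    st   X+, r16              ; store the byte and advance the pointer
    ldi  r16, low(rx_fifo + rx_size)
    ldi  r17, high(rx_fifo + rx_size)
    cp   r16, XL              ; pointer now past the last FIFO location?
    cpc  r17, XH
    brne rx_isr_store         ; no: keep the pointer
    ldi  XL, low(rx_fifo)     ; yes: roll over to the first address again
    ldi  XH, high(rx_fifo)
rx_isr_store:
    sts  rx_in, XL            ; store the new write pointer
    sts  rx_in + 1, XH
    lds  r16, rx_n            ; get counter
    inc  r16                  ; increment
    sts  rx_n, r16            ; store counter again
    pop  XH                   ; restore registers we used
    pop  XL
    pop  r17
    pop  r16
    out  SREG, r16
    pop  r16
    reti                      ; return
```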
Reading from the buffer requires another routine which uses the rx_out pointer to get data from the buffer.
It also doesn't need to save stuff, as it's not an ISR and will be executed at a known time. The routine shall
return the data from the buffer in r18.
UART_read_fifo: ; call this from within the application to get UART Rx data to r18
lds r16, rx_n ; load number of received bytes
cpi r16, 1 ; if one byte or more available,
brsh rx_fifo_read ; branch to rx_fifo_read
ret ; else return
rx_fifo_read: ; data is available:
lds XL, rx_out ; Get the Rx FIFO consume pointer
lds XH, rx_out + 1
ld r18, X+ ; and load data to r18
ldi r16, low(rx_fifo + rx_size) ; check if end of mem space reached:
ldi r17, high(rx_fifo + rx_size) ; r16:r17 = first invalid address above Rx FIFO memory
cp r16, XL ; 16-bit compare: X = invalid address above Rx FIFO memory?
cpc r17, XH
breq rx_fifo_r_rollover ; yes, roll over to base address
rx_fifo_r_store: ; store the new pointer
sts rx_out, XL
sts rx_out + 1, XH
lds r16, rx_n ; load counter
dec r16 ; decrease it
sts rx_n, r16 ; and store it again
ret ; return to application
rx_fifo_r_rollover: ; roll over to base address:
ldi XL, low(rx_fifo) ; load base address to X
ldi XH, high(rx_fifo)
rjmp rx_fifo_r_store ; and store the pointer
Transmitter FIFO
The transmitter FIFO for the UART works just like the receiving one, with a small difference: The ISR
routine in this case reads from the FIFO and writes the data to UDR, while the write routine takes the data
from a specified location or register (let's take r18) and writes it to the FIFO.
So which interrupt do we choose? The UART offers the UART Data Register Empty (UDRE) interrupt and
the UART Transmit Complete (TXC) interrupt. The transmit complete interrupt only occurs when a
transmission is finished, so we can't use it for our purpose for two reasons:
- The transmission finishes and then the ISR is called. So what? Maximum speed can't be achieved when
using this interrupt. By using the UDRE int, the next byte to be transmitted is already in UDR when the
previous transmission finishes and can be transmitted by the hardware right away. If the interrupt occurs
when the previous transmission finishes, the next byte has to be taken from the buffer memory space first
and time is lost between two transmissions.
- If the UDRE interrupt is used and no data is available (last transmission was the last byte in the buffer)
we can just disable the UDRE int and re-enable it as soon as new data is written to the transmit FIFO. By
re-enabling it, the ISR will be called because UDR is empty and transmission will start again. The TXC int
will not provide this automatic transmission start.
The code for the transmit FIFO can be cut 'n pasted from the RX FIFO with the small changes described
above. This will be no problem if you understood the RX FIFO.
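To make the description above a bit more concrete, here is a hedged sketch of the two transmit routines. It assumes that tx_in, tx_out, tx_n and tx_fifo are set up as in init_FIFOs, that tx_size is defined like rx_size, and that interrupts are enabled when the write routine is called. The labels are made up for this illustration, and a full buffer is not checked for.

```asm
; Sketch only, not the original listing.
UART_write_fifo:              ; call with the byte to send in r18
    lds  XL, tx_in            ; get the Tx FIFO write pointer
    lds  XH, tx_in + 1
    st   X+, r18              ; store the byte and advance the pointer
    ldi  r16, low(tx_fifo + tx_size)
    ldi  r17, high(tx_fifo + tx_size)
    cp   r16, XL              ; pointer past the end of the FIFO?
    cpc  r17, XH
    brne tx_fifo_w_store
    ldi  XL, low(tx_fifo)     ; yes: roll over to the base address
    ldi  XH, high(tx_fifo)
tx_fifo_w_store:
    sts  tx_in, XL
    sts  tx_in + 1, XH
    cli                       ; the UDRE ISR changes tx_n as well
    lds  r16, tx_n
    inc  r16
    sts  tx_n, r16
    sbi  UCR, UDRIE           ; (re-)enable the UDRE interrupt
    sei                       ; assumes interrupts were enabled on entry
    ret

UART_UDRE:                    ; UART Data Register Empty ISR
    push r16
    in   r16, SREG
    push r16
    push r17
    push XL
    push XH
    lds  r16, tx_n            ; anything left to send?
    tst  r16
    brne tx_isr_send
    cbi  UCR, UDRIE           ; no: disable the UDRE interrupt
    rjmp tx_isr_exit
tx_isr_send:
    dec  r16                  ; one byte less in the FIFO
    sts  tx_n, r16
    lds  XL, tx_out           ; get the Tx FIFO read pointer
    lds  XH, tx_out + 1
    ld   r16, X+              ; fetch the next byte
    out  UDR, r16             ; and send it
    ldi  r16, low(tx_fifo + tx_size)
    ldi  r17, high(tx_fifo + tx_size)
    cp   r16, XL
    cpc  r17, XH
    brne tx_isr_store
    ldi  XL, low(tx_fifo)     ; roll over to the base address
    ldi  XH, high(tx_fifo)
tx_isr_store:
    sts  tx_out, XL
    sts  tx_out + 1, XH
tx_isr_exit:
    pop  XH
    pop  XL
    pop  r17
    pop  r16
    out  SREG, r16
    pop  r16
    reti
```

The ISR switches itself off when the FIFO runs empty, and the write routine switches it on again, which is exactly the automatic restart the TXC interrupt can't give us.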
The following program (written for a 2313) does this:
- Stack and UART setup (38400 baud @ 7.3728 MHz)
- FIFO setup
- Receive data via Rx FIFO and loop it back via Tx FIFO
If you have an STK 500 you only need to plug in a 2313 and a 7.3728 MHz crystal, connect PD0 to the
RS232 spare RxD pin and PD1 to the TxD pin. Don't forget power and the connection to your PC via a
COM port...
Also change the first line (include directive for 2313def.inc) to suit your system.
Here's the asm file
Since AIN0 and AIN1 are the alternate functions of two Port B pins, you must set the data direction bits
accordingly (which pins are connected to the Analog Comparator depends on the particular AVR you are
using, check the datasheet). Clear DDBx and DDBy to set the pins as an input, and clear PBx and PBy to
disable the internal pullup resistor.
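As a small sketch of that pin setup, assume an AT90S2313, where AIN0 is PB0 and AIN1 is PB1. For other AVRs the bit numbers have to be taken from the datasheet.

```asm
; Sketch for an AT90S2313 (AIN0 = PB0, AIN1 = PB1); adjust for your AVR.
    cbi DDRB, 0      ; PB0 (AIN0) is an input
    cbi DDRB, 1      ; PB1 (AIN1) is an input
    cbi PORTB, 0     ; internal pull-up on AIN0 off
    cbi PORTB, 1     ; internal pull-up on AIN1 off
```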
The Analog Comparator is quite simple and has only one register : the Analog Comparator Control and
Status Register (ACSR):
Bit 7  Bit 6  Bit 5  Bit 4  Bit 3  Bit 2  Bit 1  Bit 0
ACD    ---    ACO    ACI    ACIE   ACIC   ACIS1  ACIS0
ACD (Analog Comparator Disable) bit : If you want to disable the Analog Comparator (for instance, to
reduce power consumption), you must set this bit. A word of caution : you must disable the Analog
Comparator interrupt before disabling the Analog Comparator to avoid an unintentional interrupt.
ACO (Analog Comparator Output) bit : Is the output of the Analog Comparator. You can read this bit to
determine the current state of the Analog Inputs. What the output states mean is described above.
ACI (Analog Comparator Interrupt Flag) bit : This bit is set when a comparator output event triggers the
interrupt mode defined by the ACIS bits (see below for details). If global interrupts and the Analog
Comparator interrupt are both enabled, the Analog Comparator interrupt service routine is then executed.
ACI is cleared by hardware when executing the corresponding interrupt handling vector. Alternatively, ACI
is cleared by writing a logical 1 to the flag. YES, it's not a typo: you must write a 1 to clear the flag. This has
a nasty side effect : if you modify some other bit of ACSR using the SBI or the CBI instruction, ACI will be
cleared if it was set before the sbi/cbi operation.
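One way around this side effect is to avoid sbi/cbi on ACSR and to write the register back with the ACI bit forced to 0, since writing a 0 to the flag leaves it alone. The following is only a sketch, with the comparator interrupt enable as the example bit. The bit names are assumed to come from the device's include file.

```asm
; Sketch: set ACIE without clearing a pending ACI flag.
    in   r16, ACSR           ; read the current register contents
    cbr  r16, (1<<ACI)       ; write ACI back as 0: the flag stays untouched
    sbr  r16, (1<<ACIE)      ; change only the bit we are interested in
    out  ACSR, r16
```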
ACIE (Analog Comparator Interrupt Enable) bit : When the ACIE bit is set and global interrupts are
enabled, the Analog Comparator interrupt is activated. When cleared, the interrupt is disabled.
ACIC (Analog Comparator Capture Enable) bit : One interesting thing you can do, is to connect the Analog
Comparator output to the Timer1/Counter1 Input Capture function. In this way, you can measure the time
between two events in the Analog Comparator. If you want to use this feature, set this bit.
ACIS (Analog Comparator Interrupt Mode Select) bits : you can choose when the Analog Comparator
Interrupt Flag (ACI) will be triggered. There are three possibilities: when the Analog Comparator output
changes from 0 to 1 (rising output edge), when the Analog Comparator output changes from 1 to 0 (falling
output edge), or whenever the Analog Comparator output changes (output toggle). As with the ACD bit,
you must disable the Analog Comparator interrupt when you change these bits to avoid unwanted
interrupts.
ACIS1  ACIS0  Interrupt Mode
0      0      Interrupt on output toggle
0      1      Reserved (don't use)
1      0      Interrupt on falling output edge
1      1      Interrupt on rising output edge
Some AVRs have a more complex Analog Comparator (see the datasheets). It works like the Analog
Comparator explained here, but in addition, you can use an internal voltage reference or one of the inputs
of the Analog to Digital Converter as one of the Analog Inputs.
CAUTION :
You MUST respect the voltage range allowed for the AVR pins (see Absolute Maximum Ratings in the
Electrical Characteristics section of the datasheet). The voltage must be below VCC+0.5V and above -1V.
If you don't respect this, you will blow your AVR. Be sure that the analog signals you are using are in the
right range. If they come from the external world, it is a good idea to use some kind of protection at the input.
See the suggested circuit below (which consists of just one resistor...).
This circuit uses the internal clamping diodes present in all AVR I/O pins. If the analog voltage is higher
than Vcc plus the conduction voltage of the diode (around 0.5V), the upper diode will conduct and the
voltage at the input pin is clamped to Vcc+0.5V. On the other hand, if the analog voltage is lower than 0V
minus the conduction voltage of the diode, the lower diode will conduct, and the voltage at the input pin is
clamped to -0.5V. The resistor will limit the current through the conducting diode, which must not exceed
1mA, so you must design the resistor accordingly. For instance, if you expect that the analog voltage may
reach a maximum of 24V, the resistor value should be :
R = 24V / 1mA = 24K.
At the input of the ADC itself is an analog multiplexer, which is used to select between eight analog inputs.
That means that you can convert up to eight signals (not at the same time of course). At the end of the
conversion, the corresponding value is transferred to the registers ADCH and ADCL. As the AVR's registers
are 8 bits wide, the 10-bit value has to be held in two registers.
The analog voltage at the input of the ADC must be greater than 0V, and smaller than the ADC's reference
voltage AREF. The reference voltage is an external voltage you must supply at the Aref pin of the chip. The
value the voltage at the input is converted to can be calculated with the following formula:
ADC conversion value = round( (vin/vref)*1023)
Since it is a 10-bit ADC, you have 1024(1024=2^10) possible output values (from 0 to 1023). So, if vin is
equal to 0V, the result of the conversion will be 0, if vin is equal to vref, it will be 1023, and if vin is equal to
vref/2 it will be 512. As you can see, since you are converting a continuous variable (with infinite possible
values) to a variable with a finite number of possible values (elegantly called a "discrete variable"), the
ADC conversion produces an error, known as "quantization error".
Modes of Operation
The ADC has two fundamental operation modes: Single Conversion and Free Running. In Single
Conversion mode, you have to initiate each conversion. When it is done, the result is placed in the ADC
Data register pair and no new conversion is started. In Free Running mode, you start the conversion only
once, and then the ADC will automatically start the following conversion as soon as the previous one is
finished.
The analog to digital conversion is not instantaneous, it takes some time. This time depends on the clock
signal used by the ADC. The conversion time is inversely proportional to the frequency of the ADC clock
signal, which must be between 50kHz and 200kHz.
If you can live with less than 10-bit resolution, you can reduce the conversion time by increasing the ADC
clock frequency. The ADC module contains a prescaler, which divides the system clock to an acceptable
ADC clock frequency. You configure the division factor of the prescaler using the ADPS bits (see below for
the details).
To know the time that a conversion takes, you just need to divide the number of ADC clock cycles needed
for the conversion by the frequency of the ADC clock. Normally, a conversion takes 13 ADC clock cycles. The
first conversion after the ADC is switched on (by setting the ADEN bit) takes 25 ADC clock cycles. This first
conversion is called an "Extended Conversion". For instance, if you are using a 200kHz ADC clock signal,
a normal conversion will take 65 microseconds (13/200e3=65e-6), and an extended conversion will take
125 microseconds (25/200e3=125e-6).
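If you don't want to do this arithmetic by hand, the assembler can do it for you. The following is only a sketch: all symbol names are made up, and a 4MHz system clock with a prescaler division factor of 32 is assumed.

```asm
; Sketch: conversion time calculated at assembly time (integer arithmetic).
.equ F_CPU         = 4000000                ; system clock in Hz (assumed)
.equ ADC_DIV       = 32                     ; prescaler division factor (assumed)
.equ F_ADC         = F_CPU / ADC_DIV        ; 125000 Hz, inside 50kHz..200kHz
.equ T_NORMAL_US   = 13 * 1000000 / F_ADC   ; normal conversion: 104 microseconds
.equ T_EXTENDED_US = 25 * 1000000 / F_ADC   ; extended conversion: 200 microseconds
```

The symbols don't generate any code. They can be used, for example, to size a delay loop that waits for the first conversion.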
Registers
There are four registers related to the operation of the ADC : ADC Multiplexer Select Register (ADMUX),
ADC Control and Status Register (ADCSR), ADC Data Register Low (ADCL) and ADC Data Register High
(ADCH). Let's discuss them in detail.
ADMUX
Bit 7  Bit 6  Bit 5  Bit 4  Bit 3  Bit 2  Bit 1  Bit 0
---    ---    ---    ---    ---    MUX2   MUX1   MUX0
This register is used to select which of the 8 channels (ADC0 to ADC7) will be the input to the
ADC. Since there are 8 possible inputs, only the 3 least significant bits of this register are used. The
following table describes the setting of ADMUX.
MUX2  MUX1  MUX0  Selected Input
0     0     0     ADC0
0     0     1     ADC1
0     1     0     ADC2
0     1     1     ADC3
1     0     0     ADC4
1     0     1     ADC5
1     1     0     ADC6
1     1     1     ADC7
You can see that it's possible to load a register with the desired input number and write it to ADMUX
directly, as the register does not contain any other flags or setting bits.
If these bits are changed during a conversion, the change will have no effect until this conversion is
complete. This is a problem when multiple channels are scanned:
If you can make sure that the ISR always changes the ADMUX value to the next channel (or some other
value that can be reconstructed by the next ISR) the value in the ADC data register pair is always the
conversion result from the last ADMUX change. When the ISR changes ADMUX from 2 to 3, the value in
the data registers is from channel 2.
ADCSR
Bit 7  Bit 6  Bit 5  Bit 4  Bit 3  Bit 2  Bit 1  Bit 0
ADEN   ADSC   ADFR   ADIF   ADIE   ADPS2  ADPS1  ADPS0
ADEN (ADC Enable) bit : Setting this bit enables the ADC. By clearing this bit to zero, the ADC is turned
off. Turning the ADC off while a conversion is in progress will terminate this conversion.
ADSC (ADC Start Conversion) bit : In Free Running Mode, you must set this bit to start the first
conversion. The following conversions will be started automatically. In Single Conversion Mode, you must
set it to start each conversion. This bit will be cleared by hardware when a normal conversion is
completed. Remember that the first conversion after the ADC is enabled is an extended conversion. An
extended conversion will not clear this bit after completion.
ADFR (ADC Free Running Select) bit : If you want to use the Free Running Mode, you must set this bit.
ADIF (ADC Interrupt Flag) bit : This bit is set when an ADC conversion is completed. If the ADIE bit is set
and global interrupts are enabled, the ADC Conversion Complete interrupt is executed. ADIF is cleared by
hardware when executing the corresponding interrupt handling vector. Alternatively, ADIF is cleared by
writing a logical 1 (!) to the flag. This has a nasty side effect : if you modify some other bit of ADCSR using
the SBI or the CBI instruction, ADIF will be cleared if it has become set before the operation.
ADIE (ADC Interrupt Enable) bit : When the ADIE bit is set and global interrupts are enabled, the ADC
interrupt is activated and the ADC interrupt routine is called when a conversion is completed. When
cleared, the interrupt is disabled.
ADPS (ADC Prescaler Select) bits : These bits determine the division factor between the AVR clock
frequency and the ADC clock frequency. The following table describes the setting of these bits :
ADPS2  ADPS1  ADPS0  Division Factor
0      0      0      2
0      0      1      2
0      1      0      4
0      1      1      8
1      0      0      16
1      0      1      32
1      1      0      64
1      1      1      128
ADCL and ADCH
These registers hold the result of the last ADC conversion. ADCH holds the two most significant bits, and
ADCL holds the remaining eight bits.
When ADCL is read, the ADC Data Register is not updated until ADCH is read. Consequently, it is essential
that both registers are read and that ADCL is read before ADCH.
Here is a code snippet to make a conversion of ADC3. The result is placed in r16 and r17. The AVR is
running at 4MHz:
ADC_Init:
ldi r16, 3 ; Select ADC3
out ADMUX, r16
ldi r16, 0b10000101 ; Enable ADC, Single Mode conversion
out ADCSR, r16 ; ADC Interrupt disable, Prescaler division factor = 32
; this gives an ADC clock frequency of 4e6/32=125kHz.
sbi ADCSR, ADSC ; Start conversion
Wait:
sbis ADCSR, ADIF ; wait until the conversion complete flag is set
rjmp Wait
in r16, ADCL ; read the low byte first,
in r17, ADCH ; then the high byte
The ATmega series of AVRs have a more complex ADC. They are similar to the ADC explained here, but
have some additional features like (see the datasheet for the details) :
Hardware issues
Due to the analog nature of the ADC, there are some additional issues you must consider. First of all, the
ADC has two separate analog supply voltage pins, AVCC and AGND. If your application doesn't require
great accuracy, you can keep your life simple and just connect directly AVCC to VCC, and AGND to GND.
However, if you want to get the best performance of the ADC, you must pay special attention to the ADC
power supply and PCB routing. See the "ADC Noise Canceling Techniques" section of the datasheet to get
the details. Besides that, the CPU core of the AVR also induces some noise during the conversion. For this
reason, the ADC features a noise canceler that enables conversion during Idle Mode. Please see the
datasheet to get the details.
CAUTION :
You MUST respect the voltage range allowed for the AVR pins (see Absolute Maximum Ratings in the
Electrical Characteristics section of the datasheet). The voltage must be below VCC+0.5V and above -1V.
If you don't respect this, you will blow your AVR. Be sure that the analog signals you are using are in the
right range. If they come from the external world, it is a good idea to use some kind of protection at the input.
See the suggested circuit below (which consists of just one resistor...).
This circuit uses the internal clamping diodes present in all AVR I/O pins. If the analog voltage is higher
than Vcc plus the conduction voltage of the diode (around 0.5V), the upper diode will conduct and the
voltage at the input pin is clamped to Vcc+0.5V. On the other hand, if the analog voltage is lower than 0V
minus the conduction voltage of the diode, the lower diode will conduct, and the voltage at the input pin is
clamped to -0.5V. The resistor will limit the current through the conducting diode, which must not exceed
1mA, so you must design the resistor accordingly. For instance, if you expect that the analog voltage may
reach a maximum of 24V, the resistor value should be :
R = 24V / 1mA = 24K.
Common Pitfall
I am not sure if this is a "common" pitfall, but at least two guys (one of them me) have fallen into it. It is a
common temptation to use the output of the voltage regulator as the voltage reference for the ADC. The
problem is that a typical voltage regulator, like a 7805, has a voltage tolerance of about 5%. This means that
the ADC converted value can be off by about 5% as well. Let's take an example. Suppose that the regulator
output voltage is 5.1V, and the input to the ADC is 2.5V. You would expect a converted value of 512, but
instead you get 501. Seeing that, you could think that something is wrong with your ADC, but the problem is
with your reference voltage. Don't worry, there are components designed to produce reference voltages, like
the LM285. However, there is one exception to this rule: when you are making a ratiometric measurement. In
a ratiometric measurement, the voltage is a proportion of the regulator voltage, so any error in the value of
this voltage is canceled. The output of a potentiometer is a typical ratiometric output. The problem
described above is common to external sensors that have their own power supply.
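; reset vector
; jump to "reset"
; ADC Conversion Complete Interrupt vector:
; jump to "ADC_ISR"
reset:
ldi r16, low(RAMEND) ; stack setup
out SPL, r16
ldi r16, high(RAMEND)
out SPH, r16
ldi r16, 0xFF ; PortD: all pins are outputs
out DDRD, r16
ldi r16, 0 ; select ADC0
out ADMUX, r16
ldi r16, 0b11101101 ; enable ADC, start conversion, Free Running mode,
out ADCSR, r16 ; ADC interrupt enabled, prescaler division factor = 32
sei ; enable interrupts
loop:
rjmp loop ; do nothing, the ISR does the work
ADC_ISR:
push r16 ; save r16,
in r16, SREG ; SREG
push r16
push r17 ; and r17
in r16, ADCL ; read the result: low byte first,
in r17, ADCH ; then the high byte
lsr r17 ; shift the 10-bit result right twice,
ror r16 ; so that r16 holds the upper 8 bits
lsr r17
ror r16
com r16 ; invert the value
out PortD, r16 ; and write it to PortD
pop r17 ; restore r17,
pop r16 ; SREG
out SREG, r16
pop r16 ; and r16
reti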
Fuses
Certain features of AVRs are controlled by fuse bits. For example, settings like clock options or the
brown-out reset circuit are configured with these bits. These configurations differ from the other AVR peripherals
(like SPI, ADC, etc) because you set the fuse bits at program time (with a programmer) instead of writing
to some I/O memory space register at run time. So, for instance, to set the AVR to use an external
oscillator, you must set this at the moment you program it. There is no way to change the clock behavior
through the program code. If you change your mind, you must reprogram the AVR.
The details of how to program the fuse bits in your AVR depend on the particular programmer you are
using. Consult the manual about the details. For instance, if you are using an STK500 with AVR Studio, the
STK500 window has a tab labeled Fuses, where you set the different bits and where you can program,
verify or read the fuse bits.
A little oddity is that to program a feature, you must write a "0" to the particular bit. It's sort of a negative
logic. If you are programming the fuse bits with AVR studio, you don't have to worry about this issue
because the value of the fuse bit is managed by the programmer.
Fuse bits live in a different memory space than the program memory. This means that the fuse bits are not
affected by a program memory erasure. This has the advantage that once you program the correct fuse
bits in your AVR, you can forget about them and don't need to reprogram them each time you alter the
program memory.
Fuse bits differ greatly between different AVR variants. Some AVRs, like the AT90S8535, have only two
fuse bits, while others, like the ATmega128, have 18. I will explain the AT90S4433's fuse bits.
The AT90S4433 AVR has 6 fuse bits. One is related to serial programming (SPIEN), two are related to the
Brown-Out Reset Circuit operation (BODEN and BODLEVEL), and three are for configuring the clock
options (CKSEL2..0).
AVRs have two programming modes, parallel and serial. (See the Memory Programming section of the
datasheet for details). When the SPIEN fuse bit is programmed, serial programming and data downloading
is enabled. The default value for this fuse is programmed. You can change this fuse only if you are
programming the AVR in parallel mode.
The Brown-Out Detector circuit monitors the Vcc voltage. When Vcc drops below the trigger level, this
circuit resets the AVR. When Vcc is above the trigger level, the reset signal is released. If you want to
enable the Brown-Out Detector circuit, you must program the BODEN fuse. The BODLEVEL fuse sets the
trigger level. The following table summarizes this:
Fuse
Programmed
Unprogrammed
BODEN
Default
Unprogrammed
Unprogrammed
There are several options for the AVR clock which differ in the start-up time after a reset. You need to
adjust the start-up time according to the clock source you are using. See the datasheet and the clock
section for more details. The following table summarizes the setting of CKSEL fuses:
CKSEL[2..0]
Recommended Usage
000
001
010
Crystal Oscillator
011
100
101
Ceramic Resonator
110
111
Common Pitfall
A common pitfall with fuses is to forget about them, so you end up working with the default settings. If you
are lucky, these are the settings you need, but if you are not, strange things will happen. For instance,
many AVRs have an internal oscillator, which is enabled by default. If your UART baud rate setting is based
on the frequency of an external oscillator, your serial link won't work. Or maybe your ATmega128 is not
working as expected, because you forgot to unprogram the ATmega103 compatibility mode fuse, which is
programmed by default. So, here is my advice: Carefully check the fuse bits before using your AVR!
Lock Bits
Lock Bits are similar to fuse bits. You program them through a programmer, and you can't change them at
run time. Lock Bits are used to set different access levels to Flash and EEPROM memory from an external
programmer.
All AVRs have at least two Lock Bits, LB1 and LB2, which allow you to configure three different Lock Bits
modes, as shown in the following table :
Mode  LB1  LB2  Description
1     1    1    You can read from and write to Flash and EEPROM with a programmer
2     0    1    You can only read from Flash/EEPROM. Writing is disabled. The Fuse Bits can't be changed either.
3     0    0    Both reading and writing are disabled on Flash, EEPROM and Fuse Bits.
Maybe you are wondering why these different access levels are needed. Let's say that you finished a
prototype, and don't want to accidentally overwrite your AVR, but you want in the future to read the program
memory to know the software version used. Then you use Lock Bit Mode 2. Or maybe you are selling a
product and don't want somebody to copy your program. Then you use Lock Bit Mode 3.
If you are suspicious, you may wonder how secure AVRs really are. In my opinion, they are pretty good in
that respect. However, they are not bulletproof. There are companies, like www.chipworks.com, that
specialize in reverse engineering. Their services are VERY expensive, and so it's unlikely that someone
pays them to copy your program. If your work is top secret, or if you are just paranoid, then you need to
use a microcontroller specifically designed with high security in mind, like Atmel's AVR based Secure
Microcontrollers. For more information about this issue, look at this as well.
The only way to unprogram a Lock Bit (change it from 0 to 1) is the Chip Erase command. Also notice
that if you use Lock Bits Mode 2 or 3, you can't change the Fuse Bits anymore. So, if you mess things up
with Lock Bits, don't worry, just erase the chip and start again.
The AVRs with bootloading capabilities (ATmega series), have four additional Lock Bits, which configure in
which section of FLASH the LPM and SPM instructions can be used. For the details, look at the particular
datasheet.
The details of how to program the Lock Bits in your AVR depend on the particular programmer you are
using. Read the manual for details. For instance, if you are using a STK500 with AVR Studio, the STK500
window has a tab labeled Lock Bits, where you set the different modes, and where you can program, verify
or read the Lock Bits.
You can set the prescaler to divide the WDT oscillator frequency, so that the reset interval can be adjusted.
It is important to notice that the on-chip oscillator frequency depends on the Vcc value and the
temperature of the chip. The following table shows the setting of the prescaler for the AT90S4433. All AVRs
behave similarly in that respect, but there are some minor variations between different devices. Check the
datasheet for the details.
WDP2  WDP1  WDP0  Number of Oscillator cycles
0     0     0     16K cycles
0     0     1     32K cycles
0     1     0     64K cycles
0     1     1     128K cycles
1     0     0     256K cycles
1     0     1     512K cycles
1     1     0     1024K cycles
1     1     1     2048K cycles
Enabling the WDT is simple, just set the Watchdog Enable (WDE) bit. Disabling the WDT is not that
simple, you must follow a special procedure. The WDT is designed this way to avoid disabling the WDT
unintentionally by an instruction executed during a fault condition.
The Watchdog Turn-off Enable (WDTOE) bit is used to disable the WDT. To disable the WDT you must
follow the following procedure:
1. Write, with one instruction, a logical "1" to the WDTOE and WDE.
2. Within the next four clock cycles, write a logical "0" to WDE. This disables the Watchdog.
This is a code snippet to disable the WDT:
ldi r16, 0x18 ; set WDTOE and WDE
out WDTCR, r16
ldi r16, 0x10 ; write a 0 to WDE
out WDTCR, r16
A word of caution : before turning-on the WDT, and before changing the prescaler, execute a WDR
instruction. In this way you are sure that the WDT starts cleared and does not generate an accidental WDT
overflow.
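Turning the WDT on could then look like the sketch below. It assumes an AT90S4433 and the bit names WDE, WDP2 and WDP1 from the device's include file. The prescaler setting (1024K cycles) is just an example taken from the table above.

```asm
; Sketch: enable the watchdog with the 1024K cycles prescaler setting.
    wdr                                          ; start with a cleared watchdog timer
    ldi r16, (1<<WDE) | (1<<WDP2) | (1<<WDP1)    ; WDE = 1, WDP2..0 = 110
    out WDTCR, r16
```

From now on the program has to execute a wdr before the timeout expires, or the AVR will be reset.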
rcall delay_05 ; delay for 0.5 seconds
sbi PortB, 3 ; turn off the LED
loop:
rjmp loop ; loop forever (until the WDT resets the AVR)
delay_05:
ldi r16, 8
outer_loop:
ldi r24, low(3037)
ldi r25, high(3037)
delay_loop:
adiw r24, 1
brne delay_loop
dec r16
brne outer_loop
ret
What the code will do is: Turn on the LED, wait for 0.5 seconds and turn it off again. As the Watch Dog is
enabled to generate a reset after 1 second, the reset will occur 0.5 seconds after the LED has been turned
off. The result is an LED flashing at 1 Hz with a duty cycle of 50%.
Pin  Signal
1    Gnd
2    5V
3    Vee
4    RS
5    R/W
6    E
7    D0
8    D1
9    D2
10   D3
11   D4
12   D5
13   D6
14   D7
15   LED+
16   LED-
Gnd and 5V shouldn't need any explanation. Vee is the LCD's contrast voltage and should be connected to
a pot (voltage divider). The voltage should be between 0 and 1.5V (this may vary for different
manufacturers) and the pin *can* also be tied to ground.
RS is the register select pin. To write display data to the LCD (characters), this pin has to be high. For
commands (during the init sequence for example) this pin needs to be low.
R/W is the data direction pin. For WRITING data TO the LCD it has to be low, for READING data FROM
the LCD it has to be high. If you only want to write to the LCD you can tie it to ground. The disadvantage of
this is that you then can't read the LCD's busy flag. This in turn requires wait loops for letting the LCD finish
the current operation, which also means wasting CPU time.
E is the Enable pin. When writing data to the LCD, the LCD will read the data on the falling edge of E. One
possible sequence for writing is:
- Take RW low
- RS as needed for the operation
- Take E high
- put data on bus
- take E low
You can also prepare the data before taking E high, which might save 1 word of code space (more on that
later).
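The write sequence listed above can be tried out on a PC with a small software model. This is only a sketch: struct lcd_bus and both functions are hypothetical names invented for illustration, and the model does nothing more than latch the bus on a falling edge of E while R/W is low.

```c
#include <stdint.h>

/* Hypothetical software model of the LCD interface (not a real API). */
struct lcd_bus {
    uint8_t rs, rw, e;       /* control lines */
    uint8_t data;            /* D0..D7 */
    uint8_t latched_data;    /* what the LCD took over on the last falling edge of E */
    uint8_t latched_rs;      /* RS level at that moment */
    unsigned writes;         /* number of completed write strobes */
};

/* Change E. The model latches the bus on a falling edge while R/W is low. */
void lcd_set_e(struct lcd_bus *bus, uint8_t level)
{
    if (bus->e && !level && !bus->rw) {
        bus->latched_data = bus->data;
        bus->latched_rs = bus->rs;
        bus->writes++;
    }
    bus->e = level;
}

/* One write, following the sequence from the list above. */
void lcd_write_byte(struct lcd_bus *bus, uint8_t rs, uint8_t value)
{
    bus->rw = 0;             /* take RW low */
    bus->rs = rs;            /* RS as needed: 1 = display data, 0 = command */
    lcd_set_e(bus, 1);       /* take E high */
    bus->data = value;       /* put data on bus */
    lcd_set_e(bus, 0);       /* take E low: the LCD reads the data now */
}
```

Calling lcd_write_byte(&bus, 1, 'A') on a zero-initialised bus leaves 0x41 in latched_data with latched_rs set, which is what a character write should look like to the display.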
The purpose of the data lines should be obvious. When no read operation is in progress, they're tri-stated
which means that these lines can be shared with other devices. In 4-bit mode only the high nibble (D4..D7)
is used. Bit 7 is the busy flag (more on that later).
The Example Circuit
As we want to concentrate on the mega8, the code examples for the LCD are also written for the mega8.
PortD is the only "complete" port offering us 8 bits, so PortD will be used as the data port. PortB is used by
the ISP, so it will not be used by the LCD. PortC can be used for the LCD control lines (RS, R/W and E):
PortD -> LCD Data
PortC.0 -> RS
PortC.1 -> R/W
PortC.2 -> E
For the LCD to work, three more lines are necessary: Vcc, Ground and Vee. Vee can be tied to ground.
The circuit can be built up using our mega8 board or an STK500.
When powered up, you should see a black bar in the first LCD line. That bar will disappear when the LCD
is being initialised. Init is done by using a special command set. Commands are issued by writing data to
the LCD when RS is low (see above). If the LCD is not initialised correctly, writing display data to it won't
work at all. So I'll briefly describe the command set now, then go on with writing characters on the screen.
Afterwards I'll describe the commands in detail.
The LCD Command Set
Most of the LCD commands don't need more time for the LCD to execute them than writing a character.
The datasheet of the LCD used for writing this code stated 40 µs for a simple command.
Clear Display: 0x01
This command clears the display and returns the cursor to the home position (line 0, column 0). This
command takes 1.64 ms to complete!
Cursor Home: 0b0000001x
This command also sets the cursor position to zero, but the display data remains unchanged. It
takes 1.64 ms to execute as well, and it also shifts the display back to its original position (later!).
Entry Mode: 0b000001 I/D S
I/D: Increment/Decrement Cursor bit. If set, the cursor will be post-incremented after any read data or write
data operation. If cleared, the cursor will be post-decremented.
S: If set, the whole display will be shifted, depending on I/D: if I/D and S are both set, the display will be
shifted to the left; if I/D is cleared (S set), the display will be shifted to the right. Usually I/D = 1, S = 0
is used (increment cursor, don't shift display).
Display On/Off: 0b00001 D C B
D: Display On/Off. If set, the display is turned on. When the display is turned off, character data
remains unchanged!
C: Cursor On/Off. If set, the cursor is visible in the way set by B.
B: Cursor blink on/off. If this bit is set, the cursor will blink as a black block. Otherwise the cursor is shown
as an underscore _.
Shift Cursor/Display: 0b0001 S/C R/L x x
S/C: If set, the display is shifted. If cleared, the cursor is shifted. The direction depends on R/L.
R/L: If set, the display/cursor is shifted to the right. If cleared, it's shifted to the left.
Function Set: 0b001 DL N F x x
DL: Interface length. If set, 8-bit mode is selected (as in this example). If cleared, 4 bit mode is selected.
N: Number of display lines. If cleared, the LCD is a one line display. If set, the display is in 2/4 line mode.
F: Font size. If cleared, 5x7 Font is selected. If set, 5x10 font is selected.
The last two features might lead to the question "Why doesn't my display know what it is?" Well the
controller (HD44780) is always the same, but it works with many different display types (1 to 4 lines, 8 to
40 characters per line) and the displays also come with different character sizes (5x7 or 5x10).
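The command bytes described so far can be assembled from their bits. The following is a minimal sketch in plain C. The function names are made up for illustration, and the bit positions follow the usual HD44780 instruction layout.

```c
#include <stdint.h>

/* Entry Mode: 0b000001 I/D S */
uint8_t lcd_cmd_entry_mode(int increment, int shift)
{
    return (uint8_t)(0x04 | (increment ? 0x02 : 0) | (shift ? 0x01 : 0));
}

/* Display On/Off: 0b00001 D C B */
uint8_t lcd_cmd_display(int display_on, int cursor_on, int blink)
{
    return (uint8_t)(0x08 | (display_on ? 0x04 : 0)
                          | (cursor_on ? 0x02 : 0)
                          | (blink ? 0x01 : 0));
}

/* Function Set: 0b001 DL N F x x (the two don't-care bits are left at 0) */
uint8_t lcd_cmd_function_set(int eight_bit, int two_lines, int font_5x10)
{
    return (uint8_t)(0x20 | (eight_bit ? 0x10 : 0)
                          | (two_lines ? 0x08 : 0)
                          | (font_5x10 ? 0x04 : 0));
}
```

With these helpers, lcd_cmd_function_set(1, 1, 0) gives 0x38 (8-bit interface, 2 lines, 5x7 font), lcd_cmd_display(1, 1, 1) gives 0x0F and lcd_cmd_entry_mode(1, 0) gives 0x06.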
CG Ram Address Set: 0b01 ACG (ACG = 6 address bits)
ACG is the Address to be set for Character Generator Ram access. The CG can be used to configure and
show custom characters.
DD Ram Address Set: 0b1 ADD (ADD = 7 address bits)
ADD is the address to be set for Display Data Ram access. The display data ram holds the data displayed
(the characters). See below for DD Ram organisation.
Busy Flag/DD Ram Address READ: BF ADD (bit 7 = BF, bits 0..6 = ADD)
If the command register is read, the value actually returned by the LCD is the DD Ram Address (bits 0..6)
and the busy flag (bit 7). The busy flag should be read after every command to achieve max speed. If the
busy flag is set, the controller is still busy executing a command/writing data.
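Splitting the value read from the command register into its two parts is a matter of masking. A minimal sketch in plain C, with hypothetical function names:

```c
#include <stdint.h>

/* Bit 7 of the value read from the command register is the busy flag. */
int lcd_is_busy(uint8_t status)
{
    return (status & 0x80) != 0;
}

/* Bits 0..6 hold the current DD Ram address. */
uint8_t lcd_address(uint8_t status)
{
    return (uint8_t)(status & 0x7F);
}
```

For a read value of 0xC2 this reports "busy" with the address counter at 0x42.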
Display Data Addressing
The display controller has to offer a way of addressing all display characters which works for ALL kinds of
displays (remember: 1 to 4 rows, 8 to 40 characters). That's why the rows don't follow each other. The
rows start at fixed addresses:
Display size    1st row      2nd row      3rd row      4th row
N x 8           $00 - $07    $40 - $47    x            x
N x 16          $00 - $0F    $40 - $4F    $10 - $1F    $50 - $5F
N x 20          $00 - $13    $40 - $53    $14 - $27    $54 - $67
Of course this list is not complete, but it shows some mean details about using a 16 x 4 display: The first
address of the second row is bigger than the first address of the third row!
Example: For setting the cursor to the 3rd character of the second row in a 16 x 2 display, write 0x42 |
0b10000000 to the display command register.
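The row start addresses from the table can be turned into a small helper that builds the DD Ram Address Set command. This is a sketch in plain C with a hypothetical function name. It only covers the 16 and 20 column displays from the table, and it does not check whether the display really has that many rows.

```c
#include <stdint.h>

/* Build the "DD Ram Address Set" command for a cursor position.
   row and col are 0-based, columns is the number of characters per row
   (16 or 20 in this sketch). Returns 0 for an invalid position; a valid
   command always has bit 7 set, so 0 cannot be mistaken for one. */
uint8_t lcd_cmd_set_cursor(uint8_t columns, uint8_t row, uint8_t col)
{
    uint8_t base;

    if (row > 3 || col >= columns)
        return 0;

    base = (row & 1) ? 0x40 : 0x00;          /* 2nd and 4th row start at $40 */
    if (row >= 2)
        base = (uint8_t)(base + columns);    /* 3rd/4th row continue the 1st/2nd */

    return (uint8_t)(0x80 | (base + col));
}
```

The example from the text, the 3rd character of the second row on a 16 x 2 display, is lcd_cmd_set_cursor(16, 1, 2) and yields 0xC2, which is 0x42 | 0b10000000.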
The following page will explain display initialisation and simple read/write operations together with code
examples.
The nop delay between setting and clearing E is important for the line to settle. I've tried the code without
them and sometimes it didn't work due to the long wires. After a command is issued you have to either
insert a wait routine or check the busy flag (see "Reading Data").
So if we want to initialise the LCD for 8-bit mode, 2 lines and 5x7 font, we need to write 0b00111000 (0x38)
to the command register:
ldi r16, 0x38
rcall LCD_Command
Writing Data
Writing data to the LCD is just as easy as writing commands to it, but now we have to SET the RS line.
We'll use r16 for the routine argument again:
LCD_w_Data:
sbi PortC, LCD_RS   ; set RS (display data)
out PortD, r16      ; put the data on the bus
sbi PortC, LCD_E    ; take E high
nop
nop
nop
cbi PortC, LCD_E    ; clear E
cbi PortC, LCD_RS   ; and RS
ret                 ; return from subroutine
As AVR Studio can convert ASCII characters to their hex value, we can use 'A' for loading r16:
ldi r16, 'A'
rcall LCD_w_data
For reading data from the LCD, just rewrite the routine with LCD_RS = 1 (data register select). It will then
read the data at the location the Address counter is pointing to. I won't rewrite this routine now, as you just
have to change one single line.
The good thing is that we can now use the LCD_r_Addr routine to check the LCD's busy flag. Before, we
would have needed to include delays between the command and data writes. Now we can wait until the
LCD has finished (AND NOT ANY LONGER!) and then proceed with the next command. The LCD_wait
routine can have separate read data code (this will speed things up, but requires more code space) or it
can use LCD_r_Addr for reading the busy flag:
LCD_wait:
rcall LCD_r_Addr
sbrc r16, 7
rjmp LCD_wait
ret
This way of writing the routine has a good side effect: When it returns, the busy flag is cleared from r16
(because the LCD cleared it), but r16 still holds the address we just read from the display. It can be used
for other purposes then.
The Init Sequence
The LCD init sequence has to be executed after startup. It tells the LCD which font size it has, what kind of
interface to use, if and how the cursor should be shown and so on. Here's a working init sequence for a 16
x 2 LCD, 8 bit interface, 5 x 7 font; show cursor as underscore; auto-increment cursor:
Though no data has been written to the LCD before issuing the clear display/cursor home command, the
cursor can be at a position that's not visible, so this command is important if you want to see what you
write to the LCD!
Some really slow LCDs might require your app to write 0x30 (8 bit interface) to the LCD before any other
command. If your LCD refuses to work, try that.
_delay_loop_2(0xFFFF);
LCD_command(0x30);
_delay_loop_2(0xFFFF);
LCD_command(0x30);
_delay_loop_2(0xFFFF);
//now: 8 bit interface, 5*7 font, 2 lines.
LCD_command(0x38);
//wait until command finished
LCD_wait();
//display on, cursor on (blinking)
LCD_command(0x0F);
LCD_wait();
//now clear the display, cursor home
LCD_command(0x01);
LCD_wait();
//cursor auto-inc
LCD_command(0x06);
}
//Now it's time for a simple function for showing strings on the LCD. It uses the low-level functions above.
//usage example: LCD_write("Hello World!");
void LCD_write(char* dstring)
{
    //is the character pointed at by dstring a zero? If not, write the character to the LCD
    while(*dstring)
    {
        //if the LCD is busy, let it finish the current operation
        LCD_wait();
        //then write the character from dstring to the LCD and post-increment the pointer
        LCD_putchar(*dstring++);
    }
}
This code example is also available as a complete .h file with tabs for better reading: lcd.h
I'll now explain how to write commands and data to the LCD as well as reading the address counter and
data. For a better overview, I suggest opening the LCD 4-bit mode code (m8_LCD_4bit.asm) in a
separate window. The code contains init and some other routines which are not of interest now. They are
commented though and should be easy to understand. Scroll down to LCD_command8, LCD_putchar and
LCD_command.
For initialising the LCD, we need to write commands to it. So have a look at LCD_command8 and
LCD_command now. LCD_command was written after LCD_putchar, but it shares LCD_putchar's
comments. I didn't repeat them.
After power-up the LCD is in 8-bit mode by default. For switching to 4-bit mode, we need to write an 8-bit
command to it: 0b00100000. Hey! We've just got 4 data lines, so how should we write an 8-bit command?
The good news about this is that the data length bit is in the upper nibble, which is connected to the LCD
(see above). The lower 4 bits are not important now. We just want to set the data length to 4 bits now. As it
is an 8-bit command, we only have to strobe E once, not twice as in 4-bit mode. The extra routine is ONLY
needed for init. The only thing to watch out for is the data direction for each pin, as control and data lines
share the same port. It's all in the comments, really! Have a look (send me an email if it's not commented
enough!).
Now comes the interesting part. Writing characters to the LCD works EXACTLY like writing commands, but
when writing a character RS is taken high, while for commands it is taken low. Nothing special :-) First of all,
the data direction bits for the data lines are set for output, as in
DDRD |= 0xF0;
Then, for safety, all LCD lines are cleared. PortD.0 is not used by the LCD, so this bit is saved through this
process (see code, only the upper 7 bits are cleared). In 4-bit mode, the high nibble of our data byte is
written first, then the low nibble is written. For writing the high nibble, the low nibble of the argument is
cleared. Then the rest (the high nibble) is combined with the PortD data:
PortD |= (argument & 0xF0);
Then the control lines are set as needed: RS high (char) or low (command), RW low. Then E is strobed to
write the high nibble. Now we need the low data nibble, which we destroyed by clearing the low nibble
before for writing the high nibble. This means that the argument has to be saved at the beginning of the
routine (in this case it's pushed onto the stack). For getting the low nibble again, we now pop the argument
again and clear the high nibble. The data lines are on the high port lines though, so we also need to swap
the argument now. The high port data nibble is cleared, then the port data is ORed with the argument to
set the data lines as required. Again, E is strobed. Before returning from the routine, the data direction of
the LCD data port is set to input again, the control lines stay outputs. This procedure is the same for data
and commands, as already mentioned.
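The nibble handling described above boils down to a few mask and swap operations. The sketch below is plain C with hypothetical function names. It only computes the values that would be written to the port and does not touch any hardware.

```c
#include <stdint.h>

/* Bits for the first E strobe: high nibble of the argument on D4..D7. */
uint8_t lcd_high_nibble_bits(uint8_t argument)
{
    return (uint8_t)(argument & 0xF0);
}

/* Bits for the second E strobe: the low nibble moved up to D4..D7. */
uint8_t lcd_low_nibble_bits(uint8_t argument)
{
    /* same effect as the AVR swap instruction */
    uint8_t swapped = (uint8_t)((argument << 4) | (argument >> 4));
    return (uint8_t)(swapped & 0xF0);
}

/* Clear the high port nibble, then OR in the new data bits.
   The lower four port bits are left untouched. */
uint8_t port_with_nibble(uint8_t port, uint8_t nibble_bits)
{
    return (uint8_t)((port & 0x0F) | nibble_bits);
}
```

For the character 'A' (0x41) the first strobe carries 0x40 and the second one 0x10 on the upper four port lines.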
With these tools (LCD_command and LCD_command8) it's possible to init the LCD (a small delay
routine is also needed). First, the LCD is set to 4-bit mode. Then the usual settings are made (-> LCD_init
at the end of the code!).
We still can't check the busy flag or read data from the LCD. Checking the busy flag is especially useful
during LCD init (we can get rid of those looooong delays!). I'll now just describe reading in general, as all
read operations are again almost equal. When reading from the LCD, the high nibble is also transmitted
first. We have to read it while E is high during the E strobe. The read routines first make sure that the data
direction bits for the data lines are zero (input). Then E is taken high and the PIN data is read. Now the pin
data still contains some unknown bits (especially PinD.0, which might be used by other app code!) and
these bits are masked away. The remaining value is the high nibble of the data read, which will be stored
in the high nibble of our return value (mov is used for this so that the return value doesn't have to be
cleared before). Then the low nibble is read in the same manner (read PINs while E is high). For
combining the high nibble with the new low nibble we again have to clear the unused bits in the value from
the PIN register. The low nibble of the LCD data is now in the high nibble of the PIN value, so we need to
swap the pin value. Then OR is used to combine the return value (high nibble) with the new low nibble.
That's it!
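The way the two PIN readings are combined can be sketched in plain C as follows. The function name is hypothetical, and both parameters stand for the raw port values read while E was high.

```c
#include <stdint.h>

/* Combine two raw PIN readings into one data byte.
   pin_first:  port value read during the first E strobe (high nibble on D4..D7)
   pin_second: port value read during the second E strobe (low nibble on D4..D7) */
uint8_t lcd_combine_nibbles(uint8_t pin_first, uint8_t pin_second)
{
    uint8_t result = (uint8_t)(pin_first & 0xF0);   /* mask away the unknown bits */
    uint8_t low = (uint8_t)(pin_second & 0xF0);     /* same for the second reading */

    low = (uint8_t)((low >> 4) | (low << 4));       /* swap: move it to bits 0..3 */
    return (uint8_t)(result | low);                 /* OR both halves together */
}
```

Two readings of 0x4B and 0x17 combine to 0x41: the lower port bits (0xB and 0x7 here) belong to other signals and are ignored.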
A routine that waits for the busy flag is no problem then: Just read the address counter (which includes BF),
mask off the lower 7 bits and see if the result is zero. If it is, the busy flag is cleared. Have a look at
LCD_wait.
The main code first inits the LCD and then writes 'A' to the first LCD position. Then LCD_command is used
to set the address counter to zero again: If bit 7 of the command is set, the lower 7 bits are interpreted as
an address for the cursor. Then the character at position zero is read (after a read operation, the cursor is
also auto-incremented!) and written to the next position. The LCD now shows "AA".
First of all, a start condition is generated. Then the slave address byte is transferred. It consists of the bare
slave address (upper nibble: 0xA0 = 0b10100000 for better viewing), the upper three page select bits (bits
1..3) and the read/write bit (LSB) which is zero for writing. This address block is ACKed by the EEPROM if
the EEPROM is present and NOT BUSY (!). If the EEPROM is busy, it will not respond with an ACK. Then
the word address is transferred. Don't mind the "*" in the figure, it's for the 1K version of the EEPROM. In
our case, the word address has 8 bits. This address is also ACKed by the EEPROM.
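The slave address byte can be put together from its three parts. The following plain C sketch uses hypothetical function names and assumes the 24C16 layout described above: 0b1010 in the upper nibble, three page select bits in bits 1..3, and the read/write bit as LSB.

```c
#include <stdint.h>

/* Build the slave address byte: 1 0 1 0 P2 P1 P0 R/W */
uint8_t eeprom_sla(uint8_t page, int read)
{
    return (uint8_t)(0xA0 | ((page & 0x07) << 1) | (read ? 1 : 0));
}

/* Split a linear byte address (0..2047) into page select bits ... */
uint8_t eeprom_page_of(uint16_t address)
{
    return (uint8_t)((address >> 8) & 0x07);
}

/* ... and the 8-bit word address that follows the slave address byte. */
uint8_t eeprom_word_of(uint16_t address)
{
    return (uint8_t)(address & 0xFF);
}
```

Page 0 gives 0xA0 for writing and 0xA1 for reading, the two values used in the code examples of this section.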
The difference between byte-write and page-write is just the number of data bytes that are transferred now.
For a byte write just transfer one data byte (which is again ACKed by the EEPROM if everything's alright).
For a page write, transfer up to 16 bytes. If more than 16 bytes are transferred, the page address counter
will roll over to the first address of the current page.
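The roll-over during a page write can be modelled in one line. This is a sketch in plain C with a hypothetical function name, assuming the 16-byte pages mentioned above: only the lower four address bits count up, the upper bits stay fixed.

```c
#include <stdint.h>

/* Word address the EEPROM will use for the next byte of a page write. */
uint8_t eeprom_next_write_address(uint8_t word_address)
{
    return (uint8_t)((word_address & 0xF0) | ((word_address + 1) & 0x0F));
}
```

Starting at 0x2E, the addresses written are 0x2E, 0x2F and then 0x20 again, so a 17th byte would overwrite the first one of the page.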
When everything is done, the master generates a Stop condition. The EEPROM should disconnect itself
from the bus and enter some kind of power-save mode.
After every TWI operation, the TWI will set the TWINT flag and return a status code in TWSR. TWINT is
NOT set after the TWI generated a Stop condition (why should it?). Our code will tell the TWI what to do,
then wait for TWINT being set and then check the status code to see if everything is right. Depending on
the status of the operation that was completed, it will print success/error messages on the LCD.
Before we can do any printing, we'll have to run through some init code though:
LCD_init();
TWBR = 32;
LCD_init() will initialize the LCD (8 bit interface, 2 lines, 5*7 font, auto-inc cursor, cursor on and blinking).
TWBR is the TWI bit rate register. At 8 MHz, a value of 32 will result in a SCL frequency of 100kHz.
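The 100 kHz figure follows from the bit rate formula in the mega8 datasheet, SCL = F_CPU / (16 + 2 * TWBR * 4^TWPS). The plain C sketch below uses hypothetical function names and can be run on a PC to check other clock and bit rate combinations. TWPS stands for the prescaler bits in TWSR, which are 0 after reset.

```c
#include <stdint.h>

/* SCL frequency in Hz for a given CPU clock, TWBR value and prescaler bits. */
uint32_t twi_scl_hz(uint32_t f_cpu_hz, uint8_t twbr, uint8_t twps)
{
    uint32_t prescaler = 1UL << (2 * (twps & 0x03));   /* 4^TWPS: 1, 4, 16 or 64 */
    return f_cpu_hz / (16UL + 2UL * twbr * prescaler);
}

/* TWBR value for a desired SCL frequency with prescaler 1 (TWPS = 0).
   Returns 0 if the requested frequency is zero or too high for the CPU clock. */
uint8_t twi_twbr_for(uint32_t f_cpu_hz, uint32_t scl_hz)
{
    uint32_t ratio;

    if (scl_hz == 0)
        return 0;
    ratio = f_cpu_hz / scl_hz;
    if (ratio < 16UL)
        return 0;
    return (uint8_t)((ratio - 16UL) / 2UL);
}
```

With 8 MHz and TWBR = 32 the result is 8000000 / (16 + 64) = 100000 Hz.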
Write Code Example
The first thing we'll need is some function that initiates TWI operations, such as generating Start and
address transfer. As the TWI won't do anything while TWINT is set, our function will also make sure that
TWINT is cleared when writing TWCR. TWINT is cleared by writing a 1 to it. Then we'll wait for the TWI
hardware to set TWINT again and return the status code from TWSR:
char TWI_action(char command)
{
//write command to TWCR and make sure TWINT is set
TWCR = (command | (1<<TWINT));
//now wait for TWINT to be set again (when the operation is completed)
while (!(TWCR & (1<<TWINT)));
return TWSR;
}
The status codes are a good and rich source for errors. If the application checks for errors by looking at
the status codes, it can happen that the *wrong* status code is expected (especially when reading from the
EEPROM). This is not too dangerous now, but I thought it might be worth mentioning. The status codes
are divided into four groups: Master Transmitter Mode (MT), Master Receiver Mode (MR), Slave
Transmitter Mode (ST) and Slave Receiver Mode (SR). The slave modes are not interesting now. The
tables are in the mega8 datasheet (print them out). When switching between these modes, it can happen
that status codes get mixed up. For writing, only MT mode is used. Reading from the EEPROM also uses
MR mode.
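To make the mode split easier to see, here are the status codes used in this section, collected in one place. This is a sketch in plain C. The constant names are modelled on the macros in avr-libc's util/twi.h but are defined locally here, and the mask in twi_status() is an addition: the lower three bits of TWSR are not part of the status code (two of them are the prescaler bits), so masking them keeps the comparison valid if the prescaler is ever changed.

```c
#include <stdint.h>

enum {
    TW_START        = 0x08,   /* Start condition transmitted */
    TW_REP_START    = 0x10,   /* repeated Start transmitted */
    TW_MT_SLA_ACK   = 0x18,   /* MT: slave address + W sent, ACK received */
    TW_MT_DATA_ACK  = 0x28,   /* MT: data byte sent, ACK received */
    TW_MR_SLA_ACK   = 0x40,   /* MR: slave address + R sent, ACK received */
    TW_MR_DATA_NACK = 0x58    /* MR: data byte received, NACK returned */
};

/* Status code without the lower three TWSR bits. */
uint8_t twi_status(uint8_t twsr)
{
    return (uint8_t)(twsr & 0xF8);
}

/* 1 if the status code comes from the Master Receiver table. */
int twi_is_mr_status(uint8_t status)
{
    return status >= 0x40 && status <= 0x58;
}
```

A check such as twi_status(TWSR) == TW_MT_SLA_ACK then reads the same as the comparison with 0x18 used in the examples.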
Here's the code for sending a Start condition and the slave address. We'll write to page 0, byte 0. The
word address and data transfer are described separately (but they're similar).
//send start. the expected status code is 0x08
if(TWI_action((1<<TWINT)|(1<<TWEN)|(1<<TWSTA)) == 0x08)
//if that worked, print 'S' on the LCD
LCD_putchar('S');
else
//if something went wrong, print 'E'
LCD_putchar('E');
//wait for the LCD to finish the character write (just for safety...)
LCD_wait();
//now send slave address, expected status code 0x18 (ACK received)
TWDR = 0xA0;
if(TWI_action((1<<TWINT)|(1<<TWEN)) == 0x18)
LCD_putchar('A');
else
LCD_putchar('N');
That's all you need for addressing a slave. The TWI hardware will return different status codes for data
sent AFTER the slave address. On the bus side, these transfers are equal, but the status codes are
different. That's why I've divided the code into two parts. The word address and the data byte are true data
transfers:
LCD_wait();
//send word address 0x00, expected status code is 0x28 (ACK)
TWDR = 0;
if(TWI_action((1<<TWINT)|(1<<TWEN)) == 0x28)
//if word address ACKed, print 'W', else print 'N' on the LCD
LCD_putchar('W');
else
LCD_putchar('N');
LCD_wait();
//now send the data byte. We'll use 0x55. Again, the expected status code is 0x28.
TWDR = 0x55;
if(TWI_action((1<<TWINT)|(1<<TWEN)) == 0x28)
//if data ACKed, print 'D', else print 'N' on the LCD
LCD_putchar('D');
else
LCD_putchar('N');
The very first 24C16 memory location should now be ready to be verified as 0x55.
If we wanted to write the whole page (or parts of it), we would just write more data bytes now (up to 16
bytes in total plus the word address first):
Both byte write and page write are terminated with a stop condition:
LCD_wait();
TWCR = ((1<<TWINT)|(1<<TWEN)|(1<<TWSTO));
LCD_putchar('P');
The LCD should now show "SAWDP" for Start - Slave Address ACK - Word Address ACK - Data ACK - Stop.
The TWI hardware does not leave any specific status code after generating the Stop condition.
TWINT will also not be set. If the EEPROM is not connected to the bus, the LCD will show "SNNNP".
Read Operations
Read operations will put the TWI in two different states: MT mode and MR mode. There are three different
read operations: The current address read, the random read (from a specific address) and the sequential
read. The current address read just consists of a single read transfer without word address:
The master generates a Start condition, then sends the slave address (now with the R/W bit set for
reading). The slave responds with an ACK. Then the master reads the data from the slave (SCL driven by
the master, SDA driven by the EEPROM) and sends a NACK afterwards indicating that it does not want to
read any more data. Then a Stop is generated by the master.
When reading from a specific address, the word address is transferred first (as in a data write operation).
Then a repeated start is generated and the data is read:
Now it's important to understand which transfer modes the master is in: The first time the device address is
sent, the master is in MT mode for both the address and the word address transfer. Then, after the
repeated start condition, the master is in MR mode. This is important, because the status codes come from
different tables. Here's the complete random read code (read from address 0):
LCD_wait();
//send start
if(TWI_action((1<<TWINT)|(1<<TWSTA)|(1<<TWEN)) == 0x08)
LCD_putchar('S');
else
LCD_putchar('E');
LCD_wait();
//send slave address
TWDR = 0xA0;
if(TWI_action((1<<TWINT)|(1<<TWEN)) == 0x18)
LCD_putchar('A');
else
LCD_putchar('N');
LCD_wait();
//send word address
TWDR = 0x00;
if(TWI_action((1<<TWINT)|(1<<TWEN)) == 0x28)
LCD_putchar('W');
else
LCD_putchar('N');
LCD_wait();
//repeated start
if(TWI_action((1<<TWINT)|(1<<TWSTA)|(1<<TWEN)) == 0x10)
LCD_putchar('S');
else
LCD_putchar('E');
LCD_wait();
//send slave address, read bit = 1; MR mode!
TWDR = 0xA1;
if(TWI_action((1<<TWINT)|(1<<TWEN)) == 0x40)
LCD_putchar('A');
else
LCD_putchar('N');
LCD_wait();
//now, in MR mode, get data byte. We don't set TWEA, so no ACK is sent afterwards:
TWI_action((1<<TWINT)|(1<<TWEN));
if (TWDR == 0x55)
LCD_write("read OK");
else
LCD_write("read error");
//send stop
TWCR = ((1<<TWINT)|(1<<TWSTO)|(1<<TWEN));
For this to work as expected, the first memory location should hold 0x55. If everything is right, the LCD
should show "SAWSAread OK" after such a read operation.
As you can see in this example, it's very important to look at the right status code!
It is also possible to do a sequential read (of the whole EEPROM if required). To do that, just write the
desired start address after sending the slave address, and read as many bytes as you wish, each time
sending an ACK. After the last byte, a NACK has to be sent:
The address pointer will roll over to address 0 after the last byte in memory has been read (it does NOT
roll over page-wise like in a write page operation!). You can read out all 2k bytes in one go if you want.
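The behaviour of the address pointer during a sequential read can be modelled on a PC. The sketch below is plain C with a hypothetical function name. The EEPROM content is just an array, and the only thing shown is the wrap from the last byte (address 2047) back to address 0.

```c
#include <stdint.h>
#include <stddef.h>

#define EEPROM_SIZE 2048u   /* 24C16: 2k bytes */

/* Copy count bytes starting at start, wrapping to address 0 after the
   last byte in memory. Returns the address pointer after the read. */
uint16_t eeprom_sequential_read(const uint8_t *memory, uint16_t start,
                                uint8_t *dest, size_t count)
{
    uint16_t address = (uint16_t)(start % EEPROM_SIZE);
    size_t i;

    for (i = 0; i < count; i++) {
        dest[i] = memory[address];
        address = (uint16_t)((address + 1u) % EEPROM_SIZE);
    }
    return address;
}
```

Reading two bytes from address 2047 returns the last byte followed by the first one, and the pointer ends up at address 1.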
I've put these examples in a file, with read and write functions for single bytes and pages. Though not all
functions are used by the main code, they have all been tested.
24C16.c
lcd.h and the makefile (atmega8, avrdude programmer: stk500) are also required. The makefile has been
generated with mfile. mfile can be downloaded from the http://winavr.sourceforge.net/ news page.
The files contain connecting information.
AVR Calc
AVR Calc is a cool tool for calculating values for timers and the UART. It can also convert floating point
values to their hex representation. No need to tell you a lot more, just download and try it. It's available
on www.avrfreaks.net in the tools section (here's a direct link).
You will see the "Project Manager" (right image). It is used to add files to the project, create new ones and
keep track of the files associated with the project. Add a file to the project by right-clicking on the
"Assembler Files" folder and selecting "Create New File...". In the dialog, check that the directory is correct,
choose a file name and make sure you add a valid extension (.asm), otherwise it won't work. The new file
will show up underneath the "Other Files" folder. Drag it into the "Assembler Files" folder.
You can include files in your assembler project using the .include directive. The definition files for the AVR
types for example have to be included for names like "PortB" to work. The top file, from which all other files
are included, is the "Assembler Entry File", which the assembler starts with when it tries to translate your
code. You can set the assembler entry file (if your project contains more than one file) by right-clicking on it
and checking the "Assembler Entry File" option in the drop-down menu. For the first file this will be
checked by default (what else should the assembler start with?).
This is it. You can now open the new file and add code to it. You can also add an already existing file and
choose that one to be the entry file of course.
The "Project Settings" box is quite important as well.
If you want to simulate your code in AVR Studio, choose "Object format for AVR Studio" in the "Output file
format" box. The assembler will then create files that can be simulated by AVR Studio. If you want to
download your code to a target system such as the STK500 or some other board/project hardware, you'll
have to choose "Intel Hex" as output format. The assembler will then generate a hex file you can
download. Quite often I have wondered why my code didn't work, because an old hex file with the
right file name already existed while I had been simulating newer code. If the output file format is not
changed back, the old code is what gets stored in Flash. Of course there's more you can do with your
projects in the menus, but this will be enough for the project to work.
ISP Circuit
If you wish to use external slaves and maintain ISP capability, Atmel recommends adding series resistors
of 4.7K in the SPI lines (mega8 -> ISP connector -> resistor -> slave).
Port Headers
(Note: each connector has its own decoupling cap and connection to the power lines. This is only shown for
PortC here!)
You can see that not all pins are used. The mega8 doesn't have all those port pins. A special case is PortB:
PortB.6 and PortB.7 are the crystal oscillator pins. In order to ensure correct operation of the crystal
oscillator, these are located next to the crystal and NOT routed to the Port header (See Other Connectors).
RS232 Transceiver Circuit
J4 is the 2-pin connector we know from the STK500: It can be used to connect PortD.0 (Rxd) and PortD.1
(Txd) to the transceiver IC. On the board you can find it next to the PortD header.
The other pair of transceivers can be used in basically two ways. I have made simple drawings of how they
can be used. They also show how the pins are located on the board:
Flow Control Signals (RTS/CTS): If the spare pair of transceivers should be used for the flow control
signals RTS (from the PC) and CTS (to the PC), close J2/J3 as shown in the drawing and connect J5 to
those pins you want to use for flow control.
If you want to use the transceivers for other purposes, for example a second UART (in software or ext.
hardware), you can take the RS232 side of the data on J2. The J2 pin next to the MAX232 is for data
FROM the PC and goes to the RTS pin on J5. The CTS pin can be used for data TO the PC, which will
come out on the J2 pin next to J5.
LEDs and Buttons
The Buttons are just connected to ground and will thus generate a low level when pressed. They are NOT
equipped with a pull-up resistor, so the internal pull-ups of the AVR have to be used.
Other Connectors And Jumpers
Both headers provide Vcc and Gnd (just as the port headers) and have a decoupling cap (not shown
here).
http://www.avrbeginners.net/