Quantcast
Channel: Bit Bucket – Big Mess o' Wires
Viewing all articles
Browse latest Browse all 164

AVR-GCC Compiler Makes Questionable Code

$
0
0

Most people believe that modern compilers generate better-optimized assembly code than humans, but look at this example from AVR-GCC 5.4.0 with -O2 optimization level:

    7b96:	10 92 34 37 	sts	0x3734, r1	; 0x803734 <tachFlutter>
    7b9a:	e0 e0       	ldi	r30, 0x00	; 0
    7b9c:	f0 e0       	ldi	r31, 0x00	; 0
    7b9e:	a0 91 35 37 	lds	r26, 0x3735	; 0x803735 <driveTachHalfPeriod>
    7ba2:	b0 91 36 37 	lds	r27, 0x3736	; 0x803736 <driveTachHalfPeriod+0x1>
    7ba6:	ae 1b       	sub	r26, r30
    7ba8:	bf 0b       	sbc	r27, r31
    7baa:	b0 93 89 00 	sts	0x0089, r27	; 0x800089 <OCR1AH>
    7bae:	a0 93 88 00 	sts	0x0088, r26	; 0x800088 <OCR1AL>
    7bb2:	10 92 95 00 	sts	0x0095, r1	; 0x800095 <TCNT3H>
    7bb6:	10 92 94 00 	sts	0x0094, r1	; 0x800094 <TCNT3L>
    7bba:	32 2d       	mov	r19, r2
    7bbc:	e0 e0       	ldi	r30, 0x00	; 0
    7bbe:	f0 e0       	ldi	r31, 0x00	; 0
    7bc0:	f0 93 e3 33 	sts	0x33E3, r31	; 0x8033e3 <currentTrackBytePos+0x1>
    7bc4:	e0 93 e2 33 	sts	0x33E2, r30	; 0x8033e2 <currentTrackBytePos>

This is straight-line code with no branching. All registers and memory references are 8-bit. With AVR-GCC, the register r1 always holds the value 0, so the code is doing this: Set tachFlutter to 0, load driveTachHalfPeriod, set OCR1A to driveTachHalfPeriod minus 0, set TCNT3 to 0, set currentTrackBytePos to 0. There’s also a move of r2 to r19, which is used later, and I’m not sure why the compiler located the instruction here. There are at least three glaring inefficiences:

  • the compiler wastes time loading 0 into r30 and r31, when it could have just used r1
  • it does this TWICE, when we know r30 and r31 were already zero after the first time
  • it subtracts a constant 0 from driveTachHalfPeriod

I can maybe understand the subtraction of constant 0, if there’s another code path that jumps to 7ba6 where the value in r30:r31 isn’t 0. But why wouldn’t the compiler make a completely separate path for that, with faster execution speed when the subtracted value is known to be 0, even if the code size is greater? After all this is -O2, not -Os.

It also appears there’s no optimization for setting multi-byte variables like currentTrackBytePos to zero. Instead of just storing r1 twice for the low and high bytes, the compiler first creates an unnamed 16-bit temporary variable in r30:r31 and sets its value to 0, then stores the unnamed variable at currentTrackBytePos.

This whole block of code could easily be rewritten:

    sts	0x3734, r1	; 0x803734 <tachFlutter>
    lds	r26, 0x3736	; 0x803736 <driveTachHalfPeriod+0x1>
    sts	0x0089, r26	; 0x800089 <OCR1AH>
    lds	r26, 0x3735	; 0x803735 <driveTachHalfPeriod>
    sts	0x0088, r26	; 0x800088 <OCR1AL>
    sts	0x0095, r1	; 0x800095 <TCNT3H>
    sts	0x0094, r1	; 0x800094 <TCNT3L>
    mov	r19, r2
    sts	0x33E3, r1	; 0x8033e3 <currentTrackBytePos+0x1>
    sts	0x33E2, r1	; 0x8033e2 <currentTrackBytePos>

This is much shorter, and avoids using r27, r30, and r31, so there are more free registers available for other purposes.


Viewing all articles
Browse latest Browse all 164

Trending Articles