Minor speed up of MULADD and some comments about it #4

@ped7g

This is mostly a JFYI, and me being annoying because I can't resist when I see Z80 code... sorry. :)

MULADD:
    or a
    jr z, DONE       ; weight=0: skip entirely
    jp m, NEG        ; weight<0: subtract
    ; weight=+1: add activation
    ld hl, (ACC)
    add hl, de
    ld (ACC), hl
    ret
NEG:
    cp 0FFh
    jr z, NEG1       ; weight=-1
    ; weight=-2: subtract twice
    ld hl, (ACC)
    sbc hl, de
    sbc hl, de
    ld (ACC), hl
    ret
NEG1:
    ; weight=-1: subtract once
    ld hl, (ACC)
    sbc hl, de
    ld (ACC), hl
    ret

I believe you can speed it up a bit on machines with stock Zilog instruction timing, like the ZX Spectrum (I haven't tested the code, but it "should" work). On machines like the CPC, where machine cycles are forced to be multiples of 4T, this will be slightly slower in some paths, but overall it should still be faster:
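(For ZX: -3T for weight +1, -11T for weight 0, -10T for weight -1 and -9T for weight -2 ... if I'm counting correctly in my head. These figures count `ld hl,(ACC)` and `ld (ACC),hl` as 20T each and use the `inc l` variant; if those two loads assemble to their 16T forms, only the weight 0 path stays clearly faster.)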
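The per-path figures can be cross-checked with a small tally. This is only a sketch in Python, summing standard Zilog T-states for the original routine above and the rewritten one that follows. Two things are assumptions, not taken from the project: that `DONE` is a bare `ret`, and which encoding the generator emits for `ld hl,(nn)` / `ld (nn),hl` (16T for opcodes 2A/22, 20T for the ED-prefixed forms). The function names are made up for illustration.

```python
# T-state tally for each MULADD path (stock Zilog timing, no contention).
# Assumptions: DONE is a bare `ret`; `load16` is the cost of
# `ld hl,(nn)` / `ld (nn),hl` (16 or 20); `inc` is 6 for `inc hl`, 4 for `inc l`.

def original(load16):
    head = 4 + 7 + 10                       # or a, jr z (not taken), jp m
    return {
        0:  4 + 12 + 10,                    # or a, jr z taken, ret at DONE
        1:  head + load16 + 11 + load16 + 10,            # add hl,de
        -1: head + 7 + 12 + load16 + 15 + load16 + 10,   # cp, jr z taken, one sbc
        -2: head + 7 + 7 + load16 + 15 + 15 + load16 + 10,  # jr not taken, two sbc
    }

def rewritten(inc):
    head = 4 + 5 + 10 + 10                  # or a, ret z (not taken), ld hl,ACC, jp m
    body = 4 + 7 + 7 + inc + 4 + 7 + 7 + 10 # byte-wise add/sub through (hl), ret
    return {
        0:  4 + 11,                         # or a, ret z taken
        1:  head + body,
        -1: head + 4 + 12 + body,           # inc a, jr z taken
        -2: head + 4 + 7 + 8 + 8 + body,    # inc a, jr z not taken, rl e, rl d
    }

def deltas(load16, inc):
    """New minus old T-states per weight; negative means faster."""
    old, new = original(load16), rewritten(inc)
    return {w: new[w] - old[w] for w in (1, 0, -1, -2)}

print(deltas(load16=20, inc=4))   # ED-prefixed loads, `inc l`
print(deltas(load16=16, inc=6))   # 2A/22 loads, `inc hl`
```

With 20T loads and `inc l` the tally gives -3, -11, -10 and -9, matching the figures quoted above. With 16T loads and `inc hl` it gives +7, -11, 0 and +1, so which encoding the generator emits decides whether the nonzero weights gain anything.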

MULADD:
    or a
    ret z            ; weight=0: skip entirely
    ld hl,ACC
    jp m, NEG        ; weight<0: subtract
    ; weight=+1: add activation
    ld a,e
    add a,(hl)
    ld (hl),a
    inc hl           ; can be `inc l` if ACC is ALIGN 2
    ld a,d
    adc a,(hl)
    ld (hl),a
    ret
NEG:
    inc a
    jr z, NEG1       ; weight=-1: subtract once
    rl e             ; CF=0 from `or a`
    rl d             ; weight=-2: subtract twice (DE*=2)
NEG1:
    ld a,(hl)        ; ACC byte must be the minuend: ACC -= DE, not DE - ACC
    sub e
    ld (hl),a
    inc hl           ; can be `inc l` if ACC is ALIGN 2
    ld a,(hl)
    sbc a,d          ; high byte, with borrow from the low byte
    ld (hl),a
    ret
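Since the routine is untested, a quick model helps to check the arithmetic before touching the generator. The sketch below is a Python model of the intended byte-wise behaviour (ACC + weight*DE modulo 65536), not a Z80 emulator; `muladd` and its arguments are hypothetical names. In the subtract path the accumulator byte is the minuend, and the weight -2 case doubles DE first, as `rl e` / `rl d` do with CF=0.

```python
def muladd(acc, de, weight):
    """Model of byte-wise MULADD on a 16-bit ACC; weight is 0, +1, -1 or -2."""
    if weight == 0:
        return acc                          # ret z: ACC untouched
    lo, hi = acc & 0xFF, acc >> 8
    e, d = de & 0xFF, de >> 8
    if weight > 0:
        t = lo + e                          # add a,(hl) on the low byte
        lo, cf = t & 0xFF, t >> 8
        hi = (hi + d + cf) & 0xFF           # adc a,(hl) on the high byte
    else:
        if weight == -2:                    # rl e / rl d with CF=0: DE *= 2
            de2 = (de << 1) & 0xFFFF
            e, d = de2 & 0xFF, de2 >> 8
        t = lo - e                          # low byte: ACC_lo - E
        lo, cf = t & 0xFF, int(t < 0)
        hi = (hi - d - cf) & 0xFF           # high byte with borrow
    return (hi << 8) | lo

# Compare against plain 16-bit arithmetic on a few edge cases.
for acc in (0x0000, 0x0002, 0x00FF, 0x7FFF, 0xFFFF):
    for de in (0x0000, 0x0003, 0x0100, 0x8001):
        for w in (0, 1, -1, -2):
            assert muladd(acc, de, w) == (acc + w * de) & 0xFFFF
```

Doubling DE can overflow 16 bits, but because everything is modulo 65536 the result still equals two exact subtractions. Note that this path modifies DE, which only matters if the caller expects DE to be preserved.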

BTW, for weight -2 you are doing `sbc` twice, so the borrow (carry) out of the first subtraction is fed into the bottom of the accumulator by the second one. I don't think that's intended, but it probably doesn't matter for the NN calculation, as it skews the result negligibly, if at all. I guess the subtraction usually doesn't overflow, and generally shouldn't, otherwise ACC would need more bits to accumulate the result properly. You would only pick up a few more of those stray borrow bits if the running total oscillates around zero across several -2 weights.
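To see the size of the effect, here is a rough Python model of two chained `sbc hl,de` compared with an exact ACC - 2*DE. The helper names are invented for the sketch. It may also be worth checking the carry on entry: `cp 0FFh` with A=0FEh sets carry, so in the original -2 path the first `sbc` would appear to start with CF=1; the `cf_in` parameter lets you try both cases.

```python
def sbc16(hl, de, cf):
    """Model of `sbc hl,de`: returns (16-bit result, carry out)."""
    t = hl - de - cf
    return t & 0xFFFF, int(t < 0)

def sub_twice(acc, de, cf_in=0):
    """Two chained `sbc hl,de`, as in the original weight -2 path."""
    acc, cf = sbc16(acc, de, cf_in)
    acc, cf = sbc16(acc, de, cf)    # borrow from the first sbc leaks in here
    return acc

def exact(acc, de):
    return (acc - 2 * de) & 0xFFFF

# No borrow in the first step: both agree.
print(hex(sub_twice(0x0005, 0x0003)), hex(exact(0x0005, 0x0003)))
# First step borrows (ACC < DE): the chained version is off by one.
print(hex(sub_twice(0x0002, 0x0003)), hex(exact(0x0002, 0x0003)))
# Carry already set on entry: off by one even without a borrow.
print(hex(sub_twice(0x0005, 0x0003, cf_in=1)), hex(exact(0x0005, 0x0003)))
```

The error is at most one or two LSBs per -2 weight, which fits the guess that it is negligible for the network output; clearing carry before each `sbc` (or doubling DE and subtracting once) would remove it.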

Looking at the loop that calls MULADD, it could be sped up quite a bit further, but I find the code generator syntax very tiresome to read. I wish you would use an external assembler like sjasmplus and have the Python code emit snippets in regular Z80 syntax as strings, so they can be read and edited in common syntax... :D (but that's my personal bias/preference).
