Minor speed up of MULADD and some comments about it

This is more of a JFYI, and me being annoying because I can't resist when I see Z80 code... sorry. :)

```
MULADD:
    or a
    jr z, DONE       ; weight=0: skip entirely
    jp m, NEG        ; weight<0: subtract
    ; weight=+1: add activation
    ld hl, (ACC)
    add hl, de
    ld (ACC), hl
    ret
NEG:
    cp 0FFh
    jr z, NEG1       ; weight=-1
    ; weight=-2: subtract twice
    ld hl, (ACC)
    sbc hl, de
    sbc hl, de
    ld (ACC), hl
    ret
NEG1:
    ; weight=-1: subtract once
    ld hl, (ACC)
    sbc hl, de
    ld (ACC), hl
    ret
```

I believe (I haven't tested the code, but it "should" work) you can speed it up a bit on machines like the ZX Spectrum, which have the Zilog instruction timings. On machines like the CPC, where machine cycles are forced to be multiples of 4T, this will be slightly slower in some paths:
(For ZX, assuming DONE is just a `ret`: -11T for weight 0, but once the 10T `ld hl,ACC` is counted it is +7T for weight +1, ±0T for weight -1 and +1T for weight -2, so the overall gain depends on how many weights are zero.)

```
MULADD:
    or a
    ret z            ; weight=0: skip entirely
    ld hl,ACC
    jp m, NEG        ; weight<0: subtract
    ; weight=+1: add activation
    ld a,e
    add a,(hl)
    ld (hl),a
    inc hl           ; can be `inc l` if ACC is ALIGN 2
    ld a,d
    adc a,(hl)
    ld (hl),a
    ret
NEG:
    inc a
    jr z, NEG1       ; weight=-1: subtract once
    rl e             ; CF=0 from `or a`
    rl d             ; weight=-2: subtract twice (DE*=2, clobbers DE)
NEG1:
    ld a,(hl)        ; low byte: ACC-DE (not DE-ACC)
    sub e
    ld (hl),a
    inc hl           ; can be `inc l` if ACC is ALIGN 2
    ld a,(hl)        ; high byte, with borrow from the low byte
    sbc a,d
    ld (hl),a
    ret
```
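
Counting T-states from memory is easy to get wrong, so here is a minimal Python sketch that tallies each path mechanically from the standard Zilog timings. It assumes `DONE` is a bare `ret` (it isn't shown in the snippet), and the dictionary keys are just labels made up for this tally, not assembler syntax.

```python
# Zilog Z80 T-states for the instructions used in both versions.
T = {
    "or a": 4, "cp n": 7, "inc a": 4,
    "jr taken": 12, "jr not": 7, "jp cc": 10,
    "ret": 10, "ret cc taken": 11, "ret cc not": 5,
    "ld hl,(nn)": 16, "ld (nn),hl": 16, "ld hl,nn": 10,
    "add hl,de": 11, "sbc hl,de": 15,
    "ld r,r": 4, "alu a,(hl)": 7, "ld (hl),a": 7,
    "inc hl": 6, "rl r": 8,
}

def total(path):
    """Sum the T-states of one execution path."""
    return sum(T[instr] for instr in path)

# Byte-wise read-modify-write of the 16-bit ACC through HL, plus the final ret.
BYTEWISE = ["ld r,r", "alu a,(hl)", "ld (hl),a", "inc hl",
            "ld r,r", "alu a,(hl)", "ld (hl),a", "ret"]

OLD = {
    0:  ["or a", "jr taken", "ret"],  # assumption: DONE is a bare `ret`
    1:  ["or a", "jr not", "jp cc", "ld hl,(nn)", "add hl,de",
         "ld (nn),hl", "ret"],
    -1: ["or a", "jr not", "jp cc", "cp n", "jr taken", "ld hl,(nn)",
         "sbc hl,de", "ld (nn),hl", "ret"],
    -2: ["or a", "jr not", "jp cc", "cp n", "jr not", "ld hl,(nn)",
         "sbc hl,de", "sbc hl,de", "ld (nn),hl", "ret"],
}
NEW = {
    0:  ["or a", "ret cc taken"],
    1:  ["or a", "ret cc not", "ld hl,nn", "jp cc"] + BYTEWISE,
    -1: ["or a", "ret cc not", "ld hl,nn", "jp cc", "inc a",
         "jr taken"] + BYTEWISE,
    -2: ["or a", "ret cc not", "ld hl,nn", "jp cc", "inc a",
         "jr not", "rl r", "rl r"] + BYTEWISE,
}

def deltas():
    """New minus old T-states per weight (negative means faster)."""
    return {w: total(NEW[w]) - total(OLD[w]) for w in OLD}

print(deltas())  # {0: -11, 1: 7, -1: 0, -2: 1}
```

With `inc l` instead of `inc hl` (ACC aligned to 2) each non-zero path gets 2T cheaper, which would be modelled by changing the `"inc hl"` entry to 4.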

BTW, for weight -2 you are doing `sbc` twice, so the carry (borrow) from the first subtraction feeds into the second one and ends up subtracted from the accumulator as an extra 1. I don't think that's intended, but it probably doesn't matter in an NN calculation, as it will skew results negligibly, if at all. I guess the subtraction often doesn't overflow at all, and generally should not, otherwise ACC would need more bits to accumulate the result properly. You may collect a few more of those stray carry bits if the total is oscillating around zero with several -2 weights.
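
To get a feel for the size of that skew, here is a minimal Python model of the flags in the original weight=-2 path, as far as I understand the Z80 semantics. The function names (`sbc16`, `cp8_carry`, `old_minus2`, `exact_minus2`) are made up for this sketch. One extra thing it shows: `cp 0FFh` with A=0FEh itself leaves carry set, so the first `sbc hl,de` already starts with CF=1.

```python
MASK = 0xFFFF

def sbc16(hl, de, carry):
    """Model `sbc hl,de`: returns (HL - DE - carry mod 2^16, carry_out)."""
    raw = hl - de - carry
    return raw & MASK, 1 if raw < 0 else 0

def cp8_carry(a, n):
    """Carry flag after `cp n`: set when A < n as unsigned bytes."""
    return 1 if (a & 0xFF) < (n & 0xFF) else 0

def old_minus2(acc, de):
    """Original weight=-2 path: `cp 0FFh`, then two `sbc hl,de` in a row."""
    carry = cp8_carry(0xFE, 0xFF)       # A=0FEh here, so carry comes out set
    acc, carry = sbc16(acc, de, carry)  # subtracts DE plus the cp carry
    acc, carry = sbc16(acc, de, carry)  # subtracts DE plus any borrow
    return acc

def exact_minus2(acc, de):
    """Intended result: ACC - 2*DE modulo 2^16."""
    return (acc - 2 * de) & MASK

for acc, de in [(1000, 100), (50, 100)]:
    err = (exact_minus2(acc, de) - old_minus2(acc, de)) & MASK
    print(acc, de, err)  # error is 1 without a borrow, 2 with one
```

If exact results are wanted, the usual fix is an `or a` (4T) before each `sbc hl,de` to clear carry first.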

Looking at the loop that calls MULADD, it could be sped up quite a bit further, but I find the code generator syntax very tiresome to read. I wish you would use an external assembler like sjasmplus and generate regular Z80 syntax snippets as strings from Python, so they can be read and edited in the common syntax... :D (but that's my personal bias/preference).
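
Just to illustrate what I mean by snippets as strings, a minimal sketch follows. The function name `muladd_source` and the `acc_label` parameter are hypothetical, not anything from the current generator. The Python side only fills in labels and returns ordinary assembly text that an external assembler can take as a source file.

```python
def muladd_source(acc_label="ACC"):
    """Return the start of MULADD as plain Z80 assembly text."""
    lines = [
        "MULADD:",
        "    or a",
        "    ret z            ; weight=0: skip entirely",
        f"    ld hl,{acc_label}",
        "    jp m, NEG        ; weight<0: subtract",
    ]
    return "\n".join(lines) + "\n"

print(muladd_source())
```

The generated text stays readable and hand-editable, and the instruction encoding is left to the assembler.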

Minor speed up of MULADD and some comments about it #4