
Interface cast spin lock #1126

Merged
merged 7 commits into from
Feb 20, 2024

vyzo
Collaborator

@vyzo vyzo commented Feb 20, 2024

This changes the prototype table lock to a spin lock, and it is roughly 2.5x faster (2.12 s vs. 0.86 s real time; 5.53 vs. 2.24 billion CPU cycles).

Before:

$ /tmp/interface-bench cast 10 1000000
(time (let () (declare (not safe)) (std/interface-benchmark#cast-benchmark _iters180_ _threads179_ std/interface-benchmark#do-cast)))
    2.117610 secs real time
    2.117559 secs cpu time (2.117150 user, 0.000409 system)
    142 collections accounting for 0.122705 secs real time (0.122418 user, 0.000250 system)
    1120026112 bytes allocated
    672 minor faults
    no major faults
    5529541900 cpu cycles

After:

$ /tmp/interface-bench2 cast 10 1000000
(time (let () (declare (not safe)) (std/interface-benchmark#cast-benchmark _iters180_ _threads179_ std/interface-benchmark#do-cast)))
    0.857509 secs real time
    0.857492 secs cpu time (0.857288 user, 0.000204 system)
    142 collections accounting for 0.104038 secs real time (0.103984 user, 0.000000 system)
    1119990592 bytes allocated
    672 minor faults
    no major faults
    2239135638 cpu cycles
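
For readers new to the technique, here is a minimal C11 sketch of a test-and-test-and-set spin lock guarding a small shared table, with critical sections as short as the ones discussed in this thread. It is an illustration under assumptions, not the Gerbil/Gambit code behind `__lock-inline!`/`__unlock-inline!`. `SpinLock`, `spin_lock`, `spin_unlock`, and `run_spin_demo` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical spin lock (not Gerbil's): 0 = free, 1 = held.
   A static SpinLock is zero-initialized, i.e. it starts out free. */
typedef struct {
    atomic_int held;
} SpinLock;

static void spin_lock(SpinLock *l) {
    for (;;) {
        /* Try to take the lock by swapping in 1; success if it was 0. */
        if (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
            return;
        /* Held by someone else: busy-wait with plain loads so waiters
           do not keep writing to the shared cache line. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
            ;
    }
}

static void spin_unlock(SpinLock *l) {
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Toy stand-in for a shared table; each update is a tiny critical section. */
#define TABLE_SIZE 16
static SpinLock table_lock;
static long table[TABLE_SIZE];
static int iters_per_thread;

static void *worker(void *arg) {
    long id = (long)arg;
    for (int i = 0; i < iters_per_thread; i++) {
        spin_lock(&table_lock);
        table[(id + i) % TABLE_SIZE]++;
        spin_unlock(&table_lock);
    }
    return NULL;
}

/* Runs nthreads workers (at most 64) and returns the sum of all table
   slots, which equals nthreads * iters when mutual exclusion holds. */
static long run_spin_demo(int nthreads, int iters) {
    pthread_t threads[64];
    assert(nthreads > 0 && nthreads <= 64);
    iters_per_thread = iters;
    for (int k = 0; k < TABLE_SIZE; k++)
        table[k] = 0;
    for (long t = 0; t < nthreads; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (int t = 0; t < nthreads; t++)
        pthread_join(threads[t], NULL);
    long total = 0;
    for (int k = 0; k < TABLE_SIZE; k++)
        total += table[k];
    return total;
}
```

With this sketch, `run_spin_demo(4, 100000)` should return 400000. Waiting threads burn CPU while they spin, which is exactly the cost fare points out in his comment.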


netlify bot commented Feb 20, 2024

Deploy Preview for elastic-ritchie-8f47f9 ready!

🔨 Latest commit: 8808136
🔍 Latest deploy log: https://app.netlify.com/sites/elastic-ritchie-8f47f9/deploys/65d472029dc6de000875d4ef
😎 Deploy Preview: https://deploy-preview-1126--elastic-ritchie-8f47f9.netlify.app

@vyzo vyzo requested review from fare and a team February 20, 2024 08:41
@vyzo vyzo mentioned this pull request Feb 20, 2024
@fare
Collaborator

fare commented Feb 20, 2024

I think we should be using futexes, or something else on Linux that doesn't busy-wait and waste battery on SMP laptops. Importantly, this should all be abstracted over somehow in a macro.

If you want this in, fine for now, but then you should open an issue about getting locking right on SMP.

@vyzo
Collaborator Author

vyzo commented Feb 20, 2024

It is abstracted behind macros; see `__lock-inline!` and `__unlock-inline!`.

It is a workable solution for SMP: the critical sections are expected to be small (less than 10ns each), so the spin/busy wait is not that bad.

Ideally we'd have futexes; unfortunately, full Gambit mutexes are so much slower that it is not even funny.
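
As a point of comparison for the futex idea, here is a minimal Linux-only C sketch of a lock that spins briefly and then sleeps on a futex instead of busy-waiting indefinitely. The three-state lock word follows Ulrich Drepper's "Futexes Are Tricky"; the short spin prefix is a common hybrid, not something proposed in this thread. `FutexLock`, `SPIN_LIMIT`, and `run_futex_demo` are hypothetical, and the sketch assumes `atomic_int` has the same layout as `int`, as it does on mainstream Linux toolchains.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock word states (after Drepper):
   0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible. */
typedef struct {
    atomic_int state;
} FutexLock;

#define SPIN_LIMIT 100 /* hypothetical tuning knob */

/* Sleeps only while *addr == expected; spurious wakeups are harmless
   because the caller re-checks the lock word. Assumes atomic_int has
   the same representation as int. */
static void futex_wait(atomic_int *addr, int expected) {
    syscall(SYS_futex, (int *)addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static void futex_wake_one(atomic_int *addr) {
    syscall(SYS_futex, (int *)addr, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}

/* Compare-and-swap that returns the value seen before the attempt. */
static int cas_old(atomic_int *p, int expected, int desired) {
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

static void futex_lock(FutexLock *l) {
    int c = 1;
    /* Spin briefly: with very short critical sections the holder usually
       releases the lock before a sleep/wake round trip would pay off. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        c = cas_old(&l->state, 0, 1);
        if (c == 0)
            return; /* acquired without contention */
    }
    /* Slow path: mark the lock as contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        futex_wait(&l->state, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

static void futex_unlock(FutexLock *l) {
    /* 1 -> 0 means nobody was waiting; otherwise reset to 0 and wake one. */
    if (atomic_fetch_sub(&l->state, 1) != 1) {
        atomic_store(&l->state, 0);
        futex_wake_one(&l->state);
    }
}

static FutexLock counter_lock;
static long counter;
static int futex_iters;

static void *futex_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < futex_iters; i++) {
        futex_lock(&counter_lock);
        counter++;
        futex_unlock(&counter_lock);
    }
    return NULL;
}

/* Returns the final counter; equals nthreads * iters if the lock works. */
static long run_futex_demo(int nthreads, int iters) {
    pthread_t threads[64];
    assert(nthreads > 0 && nthreads <= 64);
    counter = 0;
    futex_iters = iters;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&threads[t], NULL, futex_worker, NULL);
    for (int t = 0; t < nthreads; t++)
        pthread_join(threads[t], NULL);
    return counter;
}
```

The contended path costs a system call, so whether this beats a pure spin lock for sub-10ns critical sections is something to measure with the cast benchmark rather than assume.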

@vyzo
Collaborator Author

vyzo commented Feb 20, 2024

Follow up issue in #1128.

@vyzo vyzo merged commit 000892a into master Feb 20, 2024
12 checks passed
@vyzo vyzo deleted the interface-spin-lock branch February 20, 2024 20:24
vyzo added a commit that referenced this pull request Feb 20, 2024
On top of #1126

And so it begins... the compiler generates specializers for all bound
methods that could benefit from specialization, and interface prototype
creation plugs into them, with wondrous performance results for certain programs.

Here is an example:
```
;; Base class A with slots x and y; B extends A with slot z.
(defclass A (x y))
(defclass (B A) (z))

;; linear: x*w + y
(defmethod {linear A}
  (lambda (self w)
    (fx+ (fx* (A-x self) w) (A-y self))))

;; bilinear-combination: feeds the result of linear into bilinear
(defmethod {bilinear-combination B}
  (lambda (self w z)
    {self.bilinear {self.linear w} z}))

;; bilinear: lc + (B-z self) * z
(defmethod {bilinear B}
  (lambda (self lc z)
    (fx+ lc (fx* (B-z self) z))))

;; The interface exposes only bilinear-combination.
(interface Combinator
  (bilinear-combination w z))

;; For x=1, y=2, z=3: linear(4) = 1*4 + 2 = 6, and bilinear(6, 5) = 6 + 3*5 = 21.
(def (run iters)
  (let (instance (Combinator (B x: 1 y: 2 z: 3)))
    (for (i (in-range iters))
      (let (result (&Combinator-bilinear-combination instance 4 5))
        (unless (= result 21)
          (error "bad result" result: result expected: 21))))))

(def (main iters)
  (let (iters (string->number iters))
    (time (run iters))))
```

With gxc master:
```
$ gxc -exe -o /tmp/ispec-bench -O src/gerbil/test/interface-specialization-bench.ss
/tmp/gxc.1708454081.817515/test__interface-specialization-bench.scm:
/tmp/ispec-bench__exe.scm:
/tmp/gxc.1708454081.817515/test__interface-specialization-bench.c:
/tmp/ispec-bench__exe.c:
/tmp/ispec-bench__exe_.c:
$ /tmp/ispec-bench 1000000
(time (let () (declare (not safe)) (test/interface-specialization-bench#run _iters79_)))
    0.215345 secs real time
    0.215332 secs cpu time (0.211338 user, 0.003994 system)
    20 collections accounting for 0.016191 secs real time (0.015828 user, 0.000356 system)
    159892208 bytes allocated
    671 minor faults
    no major faults
    562301090 cpu cycles
```

With the specializers:
```
$ ./build.sh env gxc -exe -o /tmp/ispec-bench -O gerbil/test/interface-specialization-bench.ss
/tmp/gxc.1708454105.0922964/test__interface-specialization-bench.scm:
/tmp/ispec-bench__exe.scm:
/tmp/gxc.1708454105.0922964/test__interface-specialization-bench.c:
/tmp/ispec-bench__exe.c:
/tmp/ispec-bench__exe_.c:
[*] Done
$ /tmp/ispec-bench 1000000
(time (let () (declare (not safe)) (test/interface-specialization-bench#run _iters79_)))
    0.010587 secs real time
    0.010587 secs cpu time (0.010586 user, 0.000001 system)
    no collections
    1408 bytes allocated
    no minor faults
    no major faults
    27638154 cpu cycles
```

**20x, not bad, huh?**

Basically, all the dynamic-dispatch cost of the MOP for self
references (slots or methods) has disappeared.
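
To make the idea concrete, here is a small C sketch of what specialization buys in the `A`/`B` benchmark above. The generic method resolves every self reference through a runtime slot lookup; a class-specific version uses fixed slot offsets and inlines the inner calls; and the interface prototype picks the specialized version once, at cast time. This is a conceptual analogy only: the object layout and the names `slot_ref`, `specialize_bilinear_combination`, and `make_combinator` are hypothetical and say nothing about how gxc actually generates specializers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical object model: a class lists its slot names, and an instance
   stores slot values in the same order. B inherits x and y from A. */
typedef struct {
    const char *name;
    const char *slots[4];
    int nslots;
} Class;

typedef struct {
    const Class *klass;
    long fields[4];
} Object;

static const Class B_class = { "B", { "x", "y", "z" }, 3 };

/* Generic slot access: searches the slot list on every call, standing in
   for a MOP-style dynamic lookup. */
static long slot_ref(const Object *self, const char *slot) {
    for (int i = 0; i < self->klass->nslots; i++)
        if (strcmp(self->klass->slots[i], slot) == 0)
            return self->fields[i];
    assert(!"no such slot");
    return 0;
}

/* Generic bilinear-combination: every self reference is a dynamic lookup. */
static long bilinear_combination_generic(const Object *self, long w, long z) {
    long lc = slot_ref(self, "x") * w + slot_ref(self, "y"); /* linear */
    return lc + slot_ref(self, "z") * z;                      /* bilinear */
}

/* What a specializer might emit for class B: slot offsets are known when
   the code is generated, and the inner method calls are inlined. */
static long bilinear_combination_B(const Object *self, long w, long z) {
    long lc = self->fields[0] * w + self->fields[1];
    return lc + self->fields[2] * z;
}

typedef long (*CombineFn)(const Object *, long, long);

/* Hypothetical interface prototype: the instance plus a resolved method. */
typedef struct {
    const Object *instance;
    CombineFn bilinear_combination;
} CombinatorProto;

/* Hypothetical specializer lookup: a class-specific version if one exists,
   otherwise the generic method. */
static CombineFn specialize_bilinear_combination(const Class *klass) {
    if (klass == &B_class)
        return bilinear_combination_B;
    return bilinear_combination_generic;
}

/* Interface "cast": resolve the method once, at prototype creation. */
static CombinatorProto make_combinator(const Object *obj) {
    CombinatorProto p = { obj, specialize_bilinear_combination(obj->klass) };
    return p;
}

static long combinator_call(const CombinatorProto *p, long w, long z) {
    return p->bilinear_combination(p->instance, w, z);
}
```

For an instance `{ &B_class, { 1, 2, 3 } }`, both the generic and the specialized paths compute 1*4 + 2 = 6 and then 6 + 3*5 = 21, but only the generic one performs string lookups in the hot loop.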
2 participants