BLAS desiderata
Matti Picus edited this page Aug 23, 2018 · 6 revisions
The numerical ecosystem could really use a modern, optionally-multithreaded BLAS under a BSD-like license with a priority on
- Correctness
- Out-of-the-box single-binary functionality (e.g., runtime kernel selection, runtime thread control)
- Speed
- Portability
...in roughly that order.
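To make "runtime kernel selection" concrete, here is a minimal sketch of the dispatch pattern in pure Python. Everything in it is hypothetical and for illustration only: the kernel names, the `SKETCH_BLAS_KERNEL` environment variable, and the `detect_cpu` placeholder are assumptions, not part of any real BLAS. The point is that a single binary can carry several kernels and pick one when called, with an explicit override taking precedence over detection.

```python
import os

# All names below are illustrative; none of this is a real BLAS API.

def dot_generic(x, y):
    """Portable fallback kernel: plain left-to-right accumulation."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total

def dot_unrolled4(x, y):
    """Stand-in for a CPU-specific kernel: four accumulators plus a tail loop."""
    n = len(x)
    main = n - n % 4
    acc = [0.0, 0.0, 0.0, 0.0]
    for i in range(0, main, 4):
        acc[0] += x[i] * y[i]
        acc[1] += x[i + 1] * y[i + 1]
        acc[2] += x[i + 2] * y[i + 2]
        acc[3] += x[i + 3] * y[i + 3]
    total = acc[0] + acc[1] + acc[2] + acc[3]
    for i in range(main, n):  # leftover elements when n is not a multiple of 4
        total += x[i] * y[i]
    return total

KERNELS = {"generic": dot_generic, "unrolled4": dot_unrolled4}

def detect_cpu():
    """Placeholder for real CPU detection; always reports 'unrolled4' here."""
    return "unrolled4"

def select_kernel(override=None):
    """Choose a kernel at call time: argument, then environment, then detection."""
    name = override or os.environ.get("SKETCH_BLAS_KERNEL") or detect_cpu()
    if name not in KERNELS:
        raise ValueError("unknown kernel: " + name)
    return KERNELS[name]
```

With an override available, a test run on one machine can loop over every entry in `KERNELS` and compare results, instead of only exercising whatever detection happens to pick on that machine.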
OpenBLAS is currently the library that comes closest to providing these things, but a number of problems remain. Fixing them would make good concrete targets for people to go after:
- The path leading to getting a generally-useful build is lined with tricky booby-traps (e.g., automagic capping of the maximum number of threads and the famous NO_AFFINITY).
- There are concerns about lack of tests. That link lists a number of specific bugs that made it past the existing test suite and still are not tested for; in general it would be very useful to build up a set of comprehensive BLAS/LAPACK tests that includes tests for realistic problem sizes.
- It's not possible (?) to override CPU detection at runtime, which makes it hard to run comprehensive tests.
- The use of AT&T-syntax inline asm (?) prevents the use of MSVC; using intrinsics instead might be more maintainable and certainly more portable. Update: MSVC now supported.
- ...any more?
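As a rough illustration of the testing point above, the sketch below checks a candidate matrix multiply against a naive reference over a list of shapes. It is pure Python and entirely hypothetical (`ref_gemm`, `check_gemm` and `buggy_gemm` are names made up here). A real suite would call the BLAS under test and use much larger sizes, but the structure would be similar: sweep shapes, including ones that are not multiples of the kernel's blocking factor, and compare against a trusted result.

```python
import random

def ref_gemm(a, b):
    """Naive triple-loop reference: C = A * B for non-empty lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            aip = a[i][p]
            for j in range(n):
                c[i][j] += aip * b[p][j]
    return c

def max_rel_error(c, c_ref):
    """Largest elementwise error, scaled by the largest reference magnitude."""
    scale = max(abs(v) for row in c_ref for v in row) or 1.0
    err = max(abs(x - y) for rc, rr in zip(c, c_ref) for x, y in zip(rc, rr))
    return err / scale

def check_gemm(candidate, shapes, tol=1e-12, seed=0):
    """Run candidate(a, b) on random inputs per (m, k, n); return failing shapes."""
    rng = random.Random(seed)
    failures = []
    for m, k, n in shapes:
        a = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(m)]
        b = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
        if max_rel_error(candidate(a, b), ref_gemm(a, b)) > tol:
            failures.append((m, k, n))
    return failures

def buggy_gemm(a, b):
    """Deliberately broken: handles k in blocks of 4 and drops the remainder."""
    m, k, n = len(a), len(b), len(b[0])
    k_main = k - k % 4
    return [[sum(a[i][p] * b[p][j] for p in range(k_main)) for j in range(n)]
            for i in range(m)]
```

Calling `check_gemm(buggy_gemm, [(2, 4, 2), (2, 8, 3), (3, 5, 2)])` flags only the shape whose inner dimension is not a multiple of 4. This is the kind of size-dependent bug that a suite testing only a few convenient sizes would miss.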