API design
A little friction in your design is good.
In Python there are two ways to enforce this. First, with keyword-only arguments:

    def function(a, *, new, this):
        pass

    function(0, new=1, this=2)
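To make the friction concrete, here is a minimal sketch. The `resize` function and its parameters are hypothetical, not taken from any library. The bare `*` forces callers to spell out every option by name, and a positional call fails loudly instead of silently swapping arguments.

```python
def resize(image, *, width, height):
    """Hypothetical function: everything after the bare * is keyword-only."""
    return (image, width, height)

# Call sites must name the options, which documents the call.
result = resize("cat.png", width=640, height=480)

# Passing the options positionally is rejected with a TypeError.
try:
    resize("cat.png", 640, 480)
except TypeError:
    rejected = True
else:
    rejected = False
```

The extra typing at the call site is the point: whoever reads the call later does not need to look up the signature.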
Then with namedtuples or dataclasses:
    from dataclasses import dataclass
    from typing import NamedTuple

    class Parameters(NamedTuple):
        a: int
        b: str

    @dataclass
    class Parameters:
        a: int
        b: str
Avoid **kwargs at all costs
They are hard to document. Use dataclasses or namedtuples instead (similar to the Go way of doing things). This also puts the emphasis on your data structure, which is your interface with the outside world, and gives you a way to control the data: you can, for instance, add validation when initializing a dataclass.
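A minimal sketch of what that validation can look like. `SamplerConfig`, its fields, and `run_sampler` are hypothetical names chosen for illustration. The checks live in `__post_init__`, which the dataclass machinery calls right after the generated `__init__`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerConfig:
    """Hypothetical parameter object that replaces a **kwargs bag."""
    num_samples: int
    step_size: float = 0.1

    def __post_init__(self):
        # Validation runs once, at construction, not deep inside the call.
        if self.num_samples <= 0:
            raise ValueError("num_samples must be positive")
        if self.step_size <= 0:
            raise ValueError("step_size must be positive")


def run_sampler(config: SamplerConfig) -> str:
    # The signature documents itself: one typed, validated argument.
    return f"{config.num_samples} samples with step size {config.step_size}"


summary = run_sampler(SamplerConfig(num_samples=100))

# Invalid data never makes it past the boundary.
try:
    SamplerConfig(num_samples=0)
except ValueError:
    invalid_rejected = True
else:
    invalid_rejected = False
```

Compared with `**kwargs`, a typo in a field name fails immediately, and the list of accepted options is visible in one place.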
Difference between a framework and a library
- A library is designed to do specific things
- A library composes with other libraries (composability)
- A library is modular
- A framework already contains the base flow and architecture of the application you're building (Flask vs. an HTTP library)
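One way to picture the difference, as a rough sketch. `parse_query` and `TinyFramework` are hypothetical stand-ins, not the API of Flask or of any HTTP library. With the library function you write the control flow yourself; with the framework you register a handler and the framework decides when to call it.

```python
# Library style: you own the control flow and call small functions.
def parse_query(query: str) -> dict:
    """Hypothetical library function: does one specific thing."""
    pairs = (item.split("=", 1) for item in query.split("&") if item)
    return {key: value for key, value in pairs}


params = parse_query("page=2&sort=name")


# Framework style: the framework owns the flow and calls your code.
class TinyFramework:
    """Hypothetical stand-in for a web framework's routing layer."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        def register(handler):
            self.routes[path] = handler
            return handler
        return register

    def handle(self, url: str) -> str:
        # The base flow (split, parse, dispatch) is fixed by the framework.
        path, _, query = url.partition("?")
        return self.routes[path](parse_query(query))


app = TinyFramework()


@app.route("/users")
def list_users(params):
    return f"users, page {params.get('page', '1')}"


response = app.handle("/users?page=2")
```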
Build the conceptually right thing first, the abstraction later
Naming is 90% of the work
Take it seriously.
Always be thinking "How much documentation am I going to have to write for this?"
This is linked to naming. Ideally the function name and the names of its arguments should suffice.
Writing the function is only 50% of the work
This is the most frequent mistake junior developers make. Once the code has been written, they consider their job done. But you need to keep rewriting the code until it makes sense to someone else.
Programming is writing
Humanities students make better programmers. Stop thinking of programming as a scientific discipline (it is, slightly, for machine learning). You are writing.
But mostly reading
Code is read more often than written.
OOP is usually the path to broken abstractions
When you create objects, you are populating a world. Bad APIs create a fantasy world, a collection of objects that have no bearing on reality.
Is machine learning really about creating objects? What's an algorithm? Can you eat it?
"There is an object X that does Y" vs. "When I take X and apply this transformation I get Z". The second is stateless: if I run it twice with the same input, it gives me the same result.
Sometimes combinations of transformations are useful and we give them a name: a new object or a new function.
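A small sketch of the contrast. All names here (`Normalizer`, `normalize`, `log_all`, `log_proportions`) are made up for illustration. The object version carries hidden state between calls; the function version returns the same output for the same input, and a useful combination of functions simply gets a name.

```python
import math


# "There is an object X that does Y": the result depends on hidden state.
class Normalizer:
    def __init__(self):
        self.total = 0.0

    def normalize(self, values):
        self.total += sum(values)  # state leaks from one call to the next
        return [v / self.total for v in values]


# "When I take X and apply this transformation I get Z": no hidden state.
def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def log_all(values):
    return [math.log(v) for v in values]


# A useful combination of transformations earns a name of its own.
def log_proportions(values):
    return log_all(normalize(values))


stateful = Normalizer()
first = stateful.normalize([1.0, 3.0])
second = stateful.normalize([1.0, 3.0])  # same input, different output
```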
What are you preventing your users from doing?
We often think in terms of what our API allows users to do, and less in terms of what it prevents them from doing. In some cases creating a sandbox is good (framework), but sometimes it is frustrating (library).
You need to have trap doors, ways to escape your leaky abstractions
The Julia language is great at that. You can always act on the abstraction layer below.
A model is a combination of arrays and operations on arrays.
Write that first.
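As an illustration of "arrays and operations on arrays", here is a hedged sketch of a logistic regression written with plain numpy. The function name `predict` and the toy numbers are assumptions made for this example. There is no class and no configuration, only arrays in and an array out; an abstraction can be layered on later once this works.

```python
import numpy as np


def predict(weights, bias, features):
    """Hypothetical model: a matrix product followed by a sigmoid."""
    logits = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))


weights = np.array([0.5, -0.25])
bias = 0.0
features = np.array([[2.0, 4.0], [0.0, 0.0]])

# Both rows produce a logit of 0, so both probabilities are 0.5.
probabilities = predict(weights, bias, features)
```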
API = abstraction. All abstractions suck.
Naming
There is a joke in CS that naming is the hardest thing in programming along with cache invalidation. Except it is not a joke. Naming is hard, and good names go a loooong way.
Why do politicians use vague terms? So that people can project whatever they want onto them. Don't code like a politician. Be pedantic.
Favorite APIs
- numpy: I rarely have any surprises with functions I discover for the first time
- schedule
- jax
Choose composition over configuration
Interchangeable elements should have the same signature
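A minimal sketch of the idea, with hypothetical building blocks (`scale`, `shift`, `clip`, `compose`). Every element has the same signature, float in and float out, so elements can be swapped freely and the result of composing them is itself one more element.

```python
from typing import Callable

Transform = Callable[[float], float]


# Interchangeable elements share one signature: float -> float.
def scale(factor: float) -> Transform:
    return lambda x: x * factor


def shift(offset: float) -> Transform:
    return lambda x: x + offset


def clip(low: float, high: float) -> Transform:
    return lambda x: max(low, min(high, x))


def compose(*transforms: Transform) -> Transform:
    """Chain transforms left to right; the result has the same signature."""
    def composed(x: float) -> float:
        for transform in transforms:
            x = transform(x)
        return x
    return composed


# Composition: users assemble the behaviour they need from small pieces,
# instead of one function with flags such as do_scale=True or clip_max=10.
pipeline = compose(scale(2.0), shift(1.0), clip(0.0, 10.0))

# A composed pipeline is itself a Transform, so it composes again.
halved = compose(pipeline, scale(0.5))
```

The configuration-style alternative would be a single function that grows a new keyword argument for every behaviour someone asks for.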
Please do not reinvent the wheel!
When in doubt, do as numpy does. Theano, for example, is hard to use partly because it uses different naming and type-casting conventions.
You are not going to do better than numpy; and if you do, no one will adopt your conventions.
Compare import jax.numpy as jnp and import theano.tensor as tt.
JAX feels like the natural extension of numpy to which it adds:
- JIT-compilation: jax.jit(fn)
- SIMD mapping: jax.vmap(fn) or jax.pmap(fn)
- forward and backward autodiff: jax.grad(fn)
    def function(x):
        y = jnp.exp(x)
        return y
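A short sketch of how these transformations apply to the function above, assuming JAX is installed. The variable names (`fast_function`, `dfunction`, `batched_dfunction`) are illustrative. Each transformation takes a function and returns a function, so they compose just like the numpy-style code they wrap.

```python
import jax
import jax.numpy as jnp


def function(x):
    # Same function as above: element-wise exponential.
    y = jnp.exp(x)
    return y


# jax.jit returns a compiled function with the same signature.
fast_function = jax.jit(function)

# jax.grad differentiates a scalar-valued function; d/dx exp(x) = exp(x).
dfunction = jax.grad(function)

# jax.vmap maps a function written for one example over a leading batch axis.
batched_dfunction = jax.vmap(dfunction)

xs = jnp.array([0.0, 1.0, 2.0])
values = fast_function(xs)
slopes = batched_dfunction(xs)
```

Since the exponential is its own derivative, `values` and `slopes` should agree up to floating-point tolerance.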