OCaml, Python, and ugly inventions in the midst of beauty

OCaml is a pretty language. It is simple and very clear. It lacks the extensibility that Haskell gets from typeclasses and the like, but it is easier to use because it doesn't insist on purity through monadic IO (we still don't really get IO) or on lazy evaluation by default (you can get laziness with the Lazy module, though).
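In OCaml, laziness is opt-in rather than the default. Here is a minimal sketch using the standard lazy keyword and Lazy.force. The evaluations counter is only there to make one thing visible: the suspended body runs once, on the first force, and the result is cached after that.

```ocaml
(* Counts how many times the suspended computation actually runs. *)
let evaluations = ref 0

(* lazy builds a suspension: the body is not evaluated here. *)
let expensive : int Lazy.t =
  lazy
    (incr evaluations;
     6 * 7)

let () =
  assert (!evaluations = 0);           (* nothing has been computed yet *)
  let a = Lazy.force expensive in      (* the first force runs the body *)
  let b = Lazy.force expensive in      (* later forces reuse the cached result *)
  Printf.printf "%d %d, body ran %d time(s)\n" a b !evaluations
```

Running this prints "42 42, body ran 1 time(s)", because Lazy.force memoizes the value.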

When we started using it, we were a sophomore at RPI (before we dropped out due to mental health issues and constantly escalating panic). Since then it has improved significantly, especially in the tooling available, but the core language has changed relatively little.

The language itself is based on the syntax of the old ML (Meta Language), extended with support for object orientation. It is largely functional, but not purely functional like Haskell. It is statically typed and features type inference. It has an interpreted mode (the toplevel, invoked with ocaml) and can also compile to relatively efficient native code with ocamlopt.
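Here is a small illustration of the last few points: type inference, static typing, and side effects without a monad. The names compose and shout are made up for the example.

```ocaml
(* No annotations anywhere: the compiler infers the fully polymorphic type
   compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
   and rejects, at compile time, any call whose types do not line up. *)
let compose f g x = f (g x)

(* Not purely functional: an ordinary function may mutate state directly. *)
let calls = ref 0

let shout s =
  incr calls;
  String.uppercase_ascii s ^ "!"

let () =
  print_endline (compose shout String.trim "  hello ");  (* prints HELLO! *)
  Printf.printf "shout was called %d time(s)\n" !calls
```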

For package installation and OCaml version management, the opam package manager is wonderfully simple to use. It also saves you from having to write everything yourself on top of the admittedly bare standard library that ships with OCaml.

If you are building a larger project, or several projects, you can use the dune build system to compile and manage them.

You can also compile OCaml to JavaScript with the js_of_ocaml tool from the Ocsigen project, or with BuckleScript.

It is not a widely used language, but it has a reasonably active ecosystem.

Here is a module signature from a K-means clustering project of ours:

module type Point = sig
  type t

  val zero : t
  (* float should probably be replaced with an additional type parameter,
     but we weren't thinking about showing this off when we wrote it. *)
  val distance : t -> t -> float

  val add : t -> t -> t

  val divide : t -> float -> t
end

This defines a module signature that you can then feed into a functor, for example:


module K_MeansFactory (T : Point) : sig
  type t = { points : T.t array; centroids : T.t array }
  val init : T.t array -> int -> t
  (* Update the approximations of the centers of the clusters,
     returning the index of the centroid that each point is assigned to
     and the distance each centroid moves by. *)
  val update : t -> int array * float array
end

This is abbreviated because the actual implementation is significantly longer. You may also have noticed that update doesn't return a new value of type t. Arrays in OCaml are mutable, so the centroids are updated in place, and given how large the arrays it works with can get, that might not be a bad thing.
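To give a rough idea of what the elided implementation involves, here is a minimal, self-contained sketch of such a functor. It repeats the Point signature so it compiles on its own. Each call to update performs one step of Lloyd's algorithm: assign every point to its nearest centroid, then move each centroid to the mean of its points. This is not the project's actual code. Seeding with the first k points and leaving empty clusters where they are are both assumptions made to keep it short.

```ocaml
module type Point = sig
  type t
  val zero : t
  val distance : t -> t -> float
  val add : t -> t -> t
  val divide : t -> float -> t
end

module K_MeansFactory (T : Point) = struct
  type t = { points : T.t array; centroids : T.t array }

  (* Assumed seeding strategy: the first k points become the initial
     centroids (requires k <= Array.length points). Array.sub copies,
     so later updates to centroids never touch points. *)
  let init points k = { points; centroids = Array.sub points 0 k }

  (* Index of the centroid closest to point p. *)
  let nearest centroids p =
    let best = ref 0 in
    Array.iteri
      (fun i c -> if T.distance p c < T.distance p centroids.(!best) then best := i)
      centroids;
    !best

  (* One Lloyd step: assign every point to its nearest centroid, then move
     each centroid, in place, to the mean of its assigned points. Returns
     the assignment and the distance each centroid moved. *)
  let update km =
    let k = Array.length km.centroids in
    let assignment = Array.map (nearest km.centroids) km.points in
    let sums = Array.make k T.zero in
    let counts = Array.make k 0 in
    Array.iteri
      (fun i p ->
        let c = assignment.(i) in
        sums.(c) <- T.add sums.(c) p;
        counts.(c) <- counts.(c) + 1)
      km.points;
    let moved =
      Array.init k (fun c ->
          (* An empty cluster keeps its old centroid and reports no movement. *)
          if counts.(c) = 0 then 0.0
          else begin
            let old = km.centroids.(c) in
            let fresh = T.divide sums.(c) (float_of_int counts.(c)) in
            km.centroids.(c) <- fresh;
            T.distance old fresh
          end)
    in
    (assignment, moved)
end
```

Because centroids is mutated in place, a caller can simply call update repeatedly until every entry of the returned float array falls below some tolerance.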

So, when you want to do K-means on two-dimensional points with float coordinates, you can do this:

module FloatingPoint2 = struct
  type t = float * float

  let zero = (0.0, 0.0)

  let distance (ax, ay) (bx, by) =
    let cx = bx -. ax in (*The -. operator is the floating point version of the - operator.*)
    let cy = by -. ay in (*Shadowing the operator locally, with let ( - ) = ( -. ) in, would allow - to be used for floats instead.*)
    Float.sqrt @@ ((cx *. cx) +. (cy *. cy)) (* @@ is the OCaml application operator, much like '$' is in Haskell *)

  let add (ax, ay) (bx, by) = (ax +. bx, ay +. by)

  let divide (x, y) c = (x /. c, y /. c)
end

(* Then apply the functor to get K-means over two-dimensional float points. *)
module K_Means2 = K_MeansFactory (FloatingPoint2)
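If the dotted operators get tiresome, one option is to shadow the ordinary operators with their float versions inside a single function. Here is a minimal sketch that rewrites distance that way. It is an alternative spelling for illustration, not what the project does.

```ocaml
(* Alternative spelling of FloatingPoint2.distance: inside this function
   only, the int operators are shadowed by their float versions. *)
let distance (ax, ay) (bx, by) =
  let ( + ) = ( +. ) and ( - ) = ( -. ) and ( * ) = ( *. ) in
  let cx = bx - ax in
  let cy = by - ay in
  Float.sqrt ((cx * cx) + (cy * cy))

let () = Printf.printf "%g\n" (distance (0.0, 0.0) (3.0, 4.0))
```

The shadowing only lasts for the body of distance, so the int operators elsewhere are unaffected. The trade-off is that ints can no longer be added with + inside that function.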

Honestly, the lack of polymorphic arithmetic operators isn't great in OCaml. Oddly, the comparison operators (<, >, =, >=, <=) are polymorphic, and there is some talk in the community about how they are dangerous and thus undesirable. Still, separate integer and float operators do prevent some of the issues you can run into when coming from a C background. There, you get used to things like automatic promotion of integers to floats and "seamless" interop between the two kinds of numbers.
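Here are a few of these points in runnable form. It is a sketch, and average_of_ints is a made-up helper. Ints must be converted explicitly before float arithmetic. Polymorphic comparison accepts any type, but it is not fully consistent: it treats nan differently under = and compare, and it raises at run time when asked to compare functions.

```ocaml
(* Arithmetic operators are monomorphic: ( + ) works only on ints and
   ( +. ) only on floats, so ints have to be converted explicitly. *)
let average_of_ints (xs : int list) : float =
  float_of_int (List.fold_left ( + ) 0 xs) /. float_of_int (List.length xs)

let () =
  Printf.printf "average = %g\n" (average_of_ints [ 1; 2; 4 ]);
  (* For floats, = follows IEEE rules (nan is not equal to itself), while
     compare treats nan as equal to itself so that it is a total order. *)
  Printf.printf "nan = nan       -> %b\n" (nan = nan);
  Printf.printf "compare nan nan -> %d\n" (compare nan nan);
  (* Comparing functions type-checks, then fails at run time. *)
  match (fun x -> x + 1) = (fun x -> x + 1) with
  | equal -> Printf.printf "functions equal: %b\n" equal
  | exception Invalid_argument msg -> print_endline ("raised Invalid_argument: " ^ msg)
```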

This isn't the project we wanted to talk about, however. That dubious honor goes to the questionably named Ocaml-Bot-Dfa project. We whipped it up using a few libraries from the ecosystem: Angstrom (parser combinators), Zarith (arbitrary-precision integers), and pyml (Python bindings). For a fediverse bot that does what it does, that's a lot of libraries, isn't it? Yes, it is. It also uses some Python in order to post to Mastodon.

For what it's worth, we consider them all useful. Each one fulfills part of the more or less minimal requirements we came up with while developing the bot.

For example, it takes botspeak in a format like this:

<color> = { "green"
          | "blue"
          | "yellow"
          | "red" };
<mountain-noun> = { "hills"
                | "foothills"
                | "mount"
                | { "devil's"
                  | "angel's"
                  | "broken"
                  }
                  "mountain"
                | { "dragon's"
                  | "ogre's"
                  | "warlock's"
                  | "quarterback's"
                  | ""
                  }
                  {
                    "teeth"
                  | "crag"
                  }};

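Rules such as <color> each describe the set of sequences they can generate. Groups like { "Hello" | "Goodbye" } are branch points where the bot picks one alternative. Items written one after another are concatenated. For example, the { "devil's" | "angel's" | "broken" } group followed by "mountain" yields one of those three words followed by "mountain", and a "" alternative lets a branch produce nothing at all. In addition, the bot can carry a little bit of state (of arbitrary size) forward and reconstitute it on demand later on.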
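To make the rule/branch-point model concrete, here is a sketch of how such a grammar could be represented and expanded. The node type, the generate function, and the extra title rule are all hypothetical, since the project's real types aren't shown here. Joining the chosen words with spaces is also an assumption about the output format.

```ocaml
(* Hypothetical representation of a botspeak grammar. *)
type node =
  | Lit of string            (* a quoted literal such as "green" *)
  | Ref of string            (* a reference to another rule, such as <color> *)
  | Seq of node list         (* items written one after another *)
  | Choice of node list      (* a { a | b | ... } branch point *)

(* Expand a node into a list of words. pick n chooses one of n
   alternatives; passing Random.int gives random output. An empty
   literal contributes no word. *)
let rec generate rules pick node =
  match node with
  | Lit s -> if s = "" then [] else [ s ]
  | Ref name -> generate rules pick (List.assoc name rules)
  | Seq items -> List.concat_map (generate rules pick) items
  | Choice alts -> generate rules pick (List.nth alts (pick (List.length alts)))

(* A cut-down version of the example grammar, plus a hypothetical title rule. *)
let rules =
  [ ("color", Choice [ Lit "green"; Lit "blue"; Lit "yellow"; Lit "red" ]);
    ( "mountain-noun",
      Choice
        [ Lit "hills";
          Seq [ Choice [ Lit "devil's"; Lit "broken" ]; Lit "mountain" ] ] );
    ("title", Seq [ Lit "the"; Ref "color"; Ref "mountain-noun" ]) ]

let () =
  Random.self_init ();
  print_endline (String.concat " " (generate rules Random.int (Ref "title")))
```

Passing the chooser in as a function keeps generation deterministic when needed. For example, (fun _ -> 0) always takes the first branch, which makes the expansion easy to test.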

To be honest, it's a mess. Splitting parsing, generation, and posting into three separate modules made most of the parts significantly easier to write, and the Python code is simple enough, but the mixture of languages doesn't sit well with us.
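Angstrom, the parser-combinator library mentioned above, is presumably what handles the parsing stage in the project. As a stdlib-only sketch of what that stage has to deal with, here is a small hand-written recursive-descent parser for the botspeak format shown earlier. It is not the project's parser. The token and node types are hypothetical, string literals are assumed never to contain escaped quotes, and the carried-state feature is ignored entirely.

```ocaml
type node = Lit of string | Ref of string | Seq of node list | Choice of node list

type token = Name of string | Str of string | Eq | LBrace | Bar | RBrace | Semi

(* Split the source text into tokens, skipping whitespace. *)
let tokenize (src : string) : token list =
  let n = String.length src in
  (* Index of the next occurrence of character c at or after index i. *)
  let until c i = String.index_from src i c in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match src.[i] with
      | ' ' | '\t' | '\n' | '\r' -> go (i + 1) acc
      | '=' -> go (i + 1) (Eq :: acc)
      | '{' -> go (i + 1) (LBrace :: acc)
      | '|' -> go (i + 1) (Bar :: acc)
      | '}' -> go (i + 1) (RBrace :: acc)
      | ';' -> go (i + 1) (Semi :: acc)
      | '<' ->
          let j = until '>' i in
          go (j + 1) (Name (String.sub src (i + 1) (j - i - 1)) :: acc)
      | '"' ->
          let j = until '"' (i + 1) in
          go (j + 1) (Str (String.sub src (i + 1) (j - i - 1)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []

(* A one-item sequence is just that item. *)
let seq_of = function [ x ] -> x | xs -> Seq xs

(* seq := item*, stopping at a bar, a closing brace, a semicolon or the end *)
let rec parse_seq toks acc =
  match toks with
  | Str s :: rest -> parse_seq rest (Lit s :: acc)
  | Name r :: rest -> parse_seq rest (Ref r :: acc)
  | LBrace :: rest ->
      let alts, rest = parse_alts rest [] in
      parse_seq rest (Choice alts :: acc)
  | _ -> (seq_of (List.rev acc), toks)

(* alts := seq (bar seq)* closing-brace; the opening brace is already consumed *)
and parse_alts toks acc =
  let s, rest = parse_seq toks [] in
  match rest with
  | Bar :: rest -> parse_alts rest (s :: acc)
  | RBrace :: rest -> (List.rev (s :: acc), rest)
  | _ -> failwith "expected | or }"

(* rules := (name = seq ;)* *)
let rec parse_rules toks =
  match toks with
  | [] -> []
  | Name n :: Eq :: rest -> (
      match parse_seq rest [] with
      | body, Semi :: rest -> (n, body) :: parse_rules rest
      | _ -> failwith ("expected ; after rule " ^ n))
  | _ -> failwith "expected <name> ="

let parse src = parse_rules (tokenize src)

let () =
  let rules = parse {|<color> = { "green" | "blue" }; <sky> = "the" <color> "sky";|} in
  List.iter (fun (name, _) -> print_endline name) rules
```

Keeping the tokenizer separate from the grammar functions mirrors the split the post describes: parsing produces a plain data structure, and generation never needs to look at the source text.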

Python isn't a dependency you can just call "light enough". It's a big language with a big runtime, and Mastodon.py has its own dependencies too. If you don't already have them installed, our bot won't feel lightweight by any means. The best option we could find for OCaml would involve writing our own authentication code, and that is just enough effort that we don't much care to do it for this bot alone.

In any case, this is the first project we did after being away from OCaml for several years, so we're reasonably happy with it, even though it's definitely messier than it ought to be.