Hello World - Evan Madill

Intro

After submitting many applications requiring a portfolio website, and realizing I didn't have a well-maintained collection of projects or writing I had worked on, I decided to put up a simple site. One requirement from the beginning was that it should be dead simple to update. I often only remember to update it when I'm away from a computer, so any Git-based content management would be a bit too slow for what I wanted.

For this I looked into headless CMSs such as Strapi and Sanity. Sanity seemed to do the trick and was easy enough to use on mobile.

MDX

While the content is managed in Sanity, I used MDX to render it. This lets us do cool things with custom components and syntax highlighting. I'll test this a bit below by running through a common interview question on K-Means.

K-Means

kmeans.py

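import numpy as np

def kmeans(X, k, max_iters=100):
    # Initialize centroids as k distinct points sampled from X
    centroids = X[np.random.choice(len(X), k, replace=False)]
    for _ in range(max_iters):
        # Point-to-centroid distances via broadcasting, shape (n, k)
        distances = np.linalg.norm(X[:, None] - centroids, axis=2)
        # Assignment step: index of the nearest centroid for each point
        labels = np.argmin(distances, axis=1)
        # Update step: mean of each cluster (an empty cluster would yield NaN)
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        # Converged once the centroids stop moving (up to float tolerance)
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, labels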

The algorithm makes use of an "iterative refinement" technique. Given a set of inputs $(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n)$, $k$-means aims to partition the $n$ items into $k$ sets $S = \{S_1, S_2, \dots, S_k\}$ so as to minimize the within-cluster sum of squares (WCSS): the total squared distance from each point to its cluster's centroid.

This objective can be written as:

$$
J = \sum_{i=1}^{k} \sum_{\mathbf{x} \in S_i} \| \mathbf{x} - \boldsymbol{\mu}_i \|^2
$$
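To make the objective concrete, here is a minimal sketch that computes $J$ for a given assignment. The helper name `wcss` and the toy data are my own, purely for illustration; they are not part of kmeans.py.

```python
import numpy as np

def wcss(X, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    # centroids[labels] picks, for every row of X, the centroid it belongs to
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

# Four points in two obvious clusters (toy data, chosen by hand)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

print(wcss(X, labels, centroids))  # 4.0 (each point is at distance 1 from its centroid)
```

A worse assignment, such as swapping the labels of the two middle points, gives a much larger value, which is exactly what the algorithm tries to avoid.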

Each point $\mathbf{x}$ is assigned to the cluster with the nearest mean (the centroid $\boldsymbol{\mu}_i$). In code this is np.argmin(distances, axis=1); formally:

$$
S_i^{(t)} = \left\{ \mathbf{x} : \left\| \mathbf{x} - \boldsymbol{\mu}_i^{(t)} \right\|^2 \le \left\| \mathbf{x} - \boldsymbol{\mu}_j^{(t)} \right\|^2 \; \forall j, \; 1 \le j \le k \right\}
$$
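The broadcasting in the assignment step is easy to get wrong, so here is a small sketch that spells out the array shapes. The data is a hand-picked toy example, not anything from a real dataset.

```python
import numpy as np

# n = 4 points, d = 2 dimensions
X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
# k = 2 centroids
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])

# X[:, None] has shape (4, 1, 2); subtracting (2, 2) broadcasts to (4, 2, 2)
diffs = X[:, None] - centroids
# Taking the norm over the last axis leaves one distance per (point, centroid) pair
distances = np.linalg.norm(diffs, axis=2)
# For each point, the index of its nearest centroid
labels = np.argmin(distances, axis=1)

print(distances.shape)  # (4, 2)
print(labels)           # [0 0 1 1]
```

One detail worth knowing: when a point is equidistant from two centroids, np.argmin returns the lowest index, so ties are broken deterministically.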

Finally, each centroid is recalculated as the mean of all points assigned to its cluster. In code this is np.array([X[labels == i].mean(axis=0) for i in range(k)]), or formally:

$$
\boldsymbol{\mu}_i^{(t+1)} = \frac{1}{|S_i^{(t)}|} \sum_{\mathbf{x}_j \in S_i^{(t)}} \mathbf{x}_j
$$
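One edge case in the update step: if a cluster ends up with no points, $|S_i^{(t)}| = 0$ and the mean is undefined (NumPy returns NaN for the mean of an empty slice). Below is a rough sketch of one possible workaround, keeping the previous centroid for an empty cluster. The helper name `update_centroids` is hypothetical, and other strategies (such as re-seeding the centroid from a random point) are just as reasonable.

```python
import numpy as np

def update_centroids(X, labels, centroids):
    """Recompute each centroid as its cluster mean, keeping the old one if the cluster is empty."""
    new_centroids = centroids.copy()
    for i in range(len(centroids)):
        members = X[labels == i]
        if len(members) > 0:  # skip empty clusters to avoid NaN
            new_centroids[i] = members.mean(axis=0)
    return new_centroids

# Toy case: every point is assigned to cluster 0, so cluster 1 is empty
X = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
labels = np.array([0, 0, 0])
centroids = np.array([[1.0, 0.0], [50.0, 50.0]])

# Cluster 0 moves to the mean [2, 0]; cluster 1 stays at [50, 50]
print(update_centroids(X, labels, centroids))
```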