# The integral of csc(x)

[NOTE: At the end of editing this, I found that the substitution used below is famous enough to have a name, and for Spivak to have called it the "world's sneakiest substitution".  Glad I'm not the only one who thought so.]

In the course of working through some (very good) material on neural networks (which I may try to work through here later), I noticed that it was beneficial for a so-called “activation function” to be able to be written as the solution of an “easy” differential equation.  Here by “easy” I mean something closer to “short to write” than “easy to solve”.

The [activation] function sigma.

In particular, two often used activation functions are
$\sigma (x) := \frac{1}{1+e^{-t}}$
and
$\tau (x) := \tanh{x} = \frac{e^{2x}-1}{e^{2x}+1}.$

One might observe that these satisfy the equations
$\sigma' = \sigma (1-\sigma),$
and
$\tau' = 1-\tau^2.$

By invoking some theorems of Picard, Lindelof, Cauchy and Lipschitz (I was only going to credit Picard until wikipedia set me right), we recall that we could start from these (separable) differential equations and fix a single point to guarantee we would end up at the functions above.  In seeking to solve the second, I found after substituting cos(u) =τ that
$-\int\frac{du}{\sin{u}} = x+C,$
and shortly after that, I realized I had no idea how to integrate csc(u).  Obviously the internet knows (substitute v = cot(u) + csc(u) to get the integral being -log(cot(u)+csc(u))), which is a really terrible answer, since I would never have gotten there myself.

Not the right approach.

Instinctually, I might have tried the approach to the right, which gets you back to where we started, or by changing the numerator to cos2x+sin2x, which leads to some amount of trouble, though intuitively, this feels like the right way to do it.  Indeed, eventually this might lead you to using half angles (and avoiding integrals of inverse trig functions).  We find
$I = \int \frac{du}{\sin{u}} = \int \frac{\cos^2{u/2} + \sin^2{u/2}}{2\cos{u/2}\sin{u/2}}.$
Avoiding the overwhelming temptation to split this integral into summands (which would leave us with a cot(u)), we instead divide the numerator and denominator by sin2(u) to find
$I=\int \frac{1+\tan^2{u/2}}{2\tan{u/2}} du.$
Now substituting v = tan(u/2)we find that dv = 1/2 (1+tan2(u/2))du = 1/2(1+v2)du, so making this substitution, and then undoing all our old substitutions:
$I = \int \frac{1+v^2}{v}*\frac{2}{1+v^2}dv = \int \frac{dv}{v} = \log{|v|} + C = \log{|\tan{\frac{u}{2}}|}+C = \log{|\tan{\frac{\cos^{-1}\tau}{2}}|}+C.$

The function tau we worry so much about. Looks pretty much like sigma.

Using the half angle formulae that everyone of course remembers and dropping the C (remember, there’s already a constant on the other side of this equation), this simplifies to (finally)
$I = \log{|\frac{\sqrt{1-\tau^2}}{1+\tau}|}.$  Subbing back in and solving for $\tau(x)$ gives, as desired,

$\tau(x) = \frac{e^{2x}-1}{1+e^{2x}}$.

Phew.

# Hamming codes in python

As a bit of an experiment I’ve started up a GitHub repository with some code from this blog here [LATE EDIT: you need numpy (and might as well get scipy and matplotlib) to run this].  In particular, this week I’ve implemented a script in Python which contains a script for generating Hamming matrices (one for encoding, one for parity check, one for decoding), which constitutes as much of a proof of existence as I’m willing to get into.

There is also a “Message” class inside where you can play around with how many corrupted bits your message might have versus how long your message is versus the size of Hamming matrix you use.  The defaults are set at 3 corrupt bits in a set of 4000 bits, with the error checking done with the Hamming(7,4) code.  You can run this by downloading hamming_codes.py and running “python hamming_codes.py” from the command line.

The specific directory with this project is located inside the “HammingCodes” folder.  Possible experiments with this code later, but now I need sleep!

# Hamming Codes III

We are now in a position to actually write down the Hamming (7,4) code.  As explained in the previous three entries, we want some way of both detecting that an error occurred and of correcting that error.  As with many problems, this is much easier once you know it is possible (thanks, Richard Hamming!)  In particular, last time we proved that it is necessary to send a 7-bit message in our scheme of correcting errors for a 4-bit message, but is it sufficient?  An easy way to deduce the solution in this case (and then to see the pattern that proves the general case) is to require that our parity check detects the location of any 1-bit error.

Specifically, someone will receive a (possibly corrupt) 7-bit string v, and we want a matrix that will output 0 if all is well, or communicate a number 1-7 if one of the bits is corrupt.  It takes 3 bits to communicate 8 numbers (8 = 23), so our parity matrix H (following wikipedia’s notation) must be 3 x 7.  To make it easy to remember, we’ll even have column j be the binary representation for j.  More directly:

I am so fed up with wordpress’ latex compiler…

Now we can work backwards (again, we’re assuming an answer exists),  and for reasons that may be clear later, we’ll set our three parity bits to be the three “singleton” columns of H, so that the “coded” message v = (p1,p2,d1,p3,d2,d3,d4).  Then if everything goes smoothly, we have that Hv0, so that

0 = p1+d1+d2+d4
0 = p2+d1+d3+d4
0 = p3+d2+d3+d4.

Notice also that if one bit gets corrupted, this is equivalent to sending the message v+ej, and
G(v+ej) = 0+gj,
where gj is the jth column of G (which is the binary representation of the number j). Hence multiplying a message with a 1-bit mistake gives us the index of the corrupt bit.
But this tells us how we must encode our message m = (d1,d2,d3,d4) as well.  We want a matrix G so that G= (p1,p2,d1,p3,d2,d3,d4).   But the above gives us a linear condition for what this matrix must look like (and an explanation for why the parity bits are all “singletons”).

Finally we want to “decode” our message, which is also straightforward at this point, since it will just be the matrix which returns the non-parity bits from the encoded message.

As a review, and to wrap everything up:

2. Transmit the message v = Gm = (0,0,1,1,0,0,1)
3. Check the parity by confirming that Hv = (0,0,0).
4. Decode the message Rv = (1,0,0,1), as desired.

Wikipedia’s also got an explanation involving Venn diagrams which I did not much like, though I may write a bit about Venn diagrams themselves in the future…

# Hamming Codes II

No reasonable pictures for today! Have a cactus!

So now we know the general idea behind Hamming error correcting codes, and how one might construct and visualize hypercubes.  Now suppose we want to encode, in an error correcting way, 4 bits.  Recall that this means finding a hypercube with enough vertices that we can designate 16 (=42) of them, *and* pick those 16 so that no two “symbol vertices” are closer than distance 2.  This means each “symbol vertex” has a disjoint neighborhood of distance 1.

A back of the envelope calculation gives a necessary condition to allow this: an n dimensional hypercube has 2n vertices and each vertex has n neighbors (so a “symbol neighborhood” takes up n+1 vertices).  Hence it is necessary that n satisfy

16*(n+1) ≤ 2n.

More generally, to encode m bits in n bits, we require 2m*(n+1) ≤ 2n.  Note without proof (for now, hopefully soon by construction) that this is also a sufficient condition.  Interesting from an efficiency point of view is seeing where equality exists.

Taking logs (base 2) of both sides, and realizing that log(n+1) is an integer only when (n+1) is a power of 2, so m = n-log(n+1), or, letting :=2k-1, m = 2k-1 - k.  In fact, one may (and we may) describe a whole class of Hamming (2k-1, 2k-1 - k) codes.

# A brief foray into hypercubes

The discussion on error-correcting codes is about to get a little hypercube heavy (never a good state to be in), and a brief foray into how to construct/visualize them may be in order.  I’ll take the liberty of defining an n-dimensional (unit) hypercube as a shape whose

1. vertices are located at coordinates made of entirely 0′s and 1′s, and
2. has an edge wherever two vertices are distance 1 apart.

This would take two more things to make a complete definition: I should let you move the cube about however you like (no reason to have it fixed is space), and I should tell you about the 2-D faces, 3-D hyperfaces, and so on up to the (n-1)-D hyperfaces.  You can use that first one if you want, but I’ll ignore the second.  I think I did a good job of defining what’s called the 1-skeleton of a very particular n-dimensional hypercube.

Two sample vertices representing (1,1,0,0) and (1,0,1,0). These will not be connected in the hypercube.

Anyways.  Wednesday had pictures of a 2-cube and 3-cube.  What about the 4-cube?  Or 5-cube?  It will help to consider this all from a less analytic, more graph theory (or, if that sounds technical, “pictures and problem solving”) point of view.  Condition 1 for a hypercube says that there are 2n vertices, all the binary sequences of length n. Then condition 2 says that two vertices are connected if you can change one vertex’s binary sequence to the other’s by changing a single bit.   We’ll go one step further, by just coloring particles on a line: white for 0, black for 1 (this is something of a homage to my undergraduate thesis advisor’s work with polyhedra).

The only two things left to do are to draw the vertices and arrange them in nice ways (that is, fine a “nice” projection).

A projection of a 4-cube, with vertices labelled.

Below is the image from the wikipedia 5-, 6-, and 7- cubes.  Note the some of the vertices are laying on top of eachother.  I’ll leave it as an exercise to the reader to label these vertices with the appropriate binary sequences.

5-cube

6-cube

7-cube

# Hamming Error-Correcting Codes

This is part 1 in hopefully a series on the Hamming error-correcting codes, to be continued on Friday.  The problem is this:  suppose you wish to send a message composed entirely of 0′s and 1′s, but suspect part of it might get “corrupted”.  More concretely, I am going to send you the message

01101,

but there’s a chance that you receive instead something like 00101 (so the second bit was flipped), or 01111 (so the fourth bit was flipped).  Is there any way to control for this?  An idea that is both natural and wrong is to send two copies of the message.  In this case, I’d send along

0110101101.

Now if a bit gets flipped (note that there are now more bits that could be flipped), you can see exactly where — perhaps you receive

0010101101,
where the non-matching bits are highlighted.  The problem here is that you cannot tell whether the offending bit should be interpreted as a 0 or a 1 (which might be ok if you are only interested in error detection).  But if we want to be able to correct the error, we’ll have to be a little more clever.  As a very broad outline of our strategy, we are going to take each of the symbols we would like to transmit and encode them into a “big enough” space so that each symbol has a buffer zone around it to account for any errors in transmission.

In some sense, this is a familiar strategy: on touchscreen keyboards you do not have to hit the exact spot for a key to work, only be close enough.  In this case, the “symbols” I’m referring to are letters, and the “buffer zone” is the area of the screen you can touch so the computer recognizes that you are trying to touch this particular letter.

The trick here (and what is sort of confusing about this) is that the symbols we wish to transmit are bits, and the space that we will be embedding our symbols into will also be bits (rather than a shiny retina display!) As a hint of things to come, I’ve displayed below a representation of the space of 2-bit strings (which will not be big enough to detect 1-bit errors), and a representation of the space of 3-bit strings, which is of perfect size to detect 1-bit errors in a 1 bit message.

All 2-bit strings, with a line connecting them if the distance between them is 1, so you can get to one from the other by switching a single bit.

All 3-bit strings, connected whenever the distance between them is 1. Notice that the points (0,0,0) and (1,1,1) have disjoint balls of radius 1 around them. This will be the basis of the error correcting code, by agreeing that anything received in the red “ball” will be interpreted as the symbol (0,0,0), and anything received in the blue “ball” with be interpreted as (1,1,1). Then this can be turned into a 1 bit error correcting code.

# Networks

Theme of the day is networks, and let’s even say self-organizing networks.  I expect to flesh this out in the future, but I have some pretty pictures now, so that’s a pretty good start.

A graph with 30 cliques of various sizes.

A general problem will be to take a graph (nodes and edges, etc), and identify sub-populations of that graph.  Two great questions, (with some reasonable answers):

- What kind of graph?
We could restrict the discussion to nodes connected by edges, with only one edge allowed between two nodes, and no self connections (this might be a “friend” graph in facebook), but perhaps it would be more useful to look at a directed graph (like a “follow” graph in Google+) where a connection is only one-way.  A weighted graph could let you have strong or weak connections – maybe you are interested in a “normalized” facebook friend graph, where the “strength” of a friendship (it strikes me that the quotation marks might just as well be around “friendship”) depends on how many friends each person has.  A bipartite graph might model facebook “likes”, since a person can like a product (but not other people).

The adjacency matrix for the above graph.

-What does it mean to “identify sub-populations”?
I hope to have some images below that help to give an intuitive understanding of this, and maybe flesh out that understanding in a later post.  This is actually a hard question to answer — must every node belong to some sub-population?  Must a node belong to only one sub-population?  In the first case, where does the crunchy peanut-butter eater belong, in a land of creamy peanut-butter eaters?  In the second case, where does a triathlete belong in a population of runners, swimmers and bikers?

The above matrix with the columns/rows acted on by the same permutation. This is what one might expect their network data to look like, and you’d hope to find the block structure above (or find an indication that such structure does not exist).

Anyways, in order to investigate some of this I’ve got some basic scripts set up that:
1) Generate a graph with 30 cliques of various sizes, sparsely connected to each other
2) Print an image of this graph
3) Print the adjacency matrix of the graph
4) Scramble the matrix, and print that
5) Run the affinity propagation algorithm from sklearn on the scrambled matrix, and print that

If affinity propagation was perfect, it would return a result very close to the original block matrix (possibly with the blocks permuted).

The results of affinity propagation. Notice that it identifies many more clusters than originally existed.  Also notice that this is also an adjacency matrix for the above graph.

# Volume of Balls

I like the following question:

“Which dimension has the ‘biggest’ unit ball?”

To define a few words, a “ball” of radius r about a point x in a metric space with metric d (in particular, in $R^n$ with standard metric) is defined by $B_X(x,r) = \{ y \in X: d(x,y) \leq r \}$.  So the unit ball in Euclidean space is the collection of points of distance less or equal to one from, say, the origin.  If you think I am being overly careful with this definition, you’re right!

1-, 2-, and 3-dimensional balls.

By “biggest”, I will mean the n-dimensional volume.  So if we define V(n) to be the volume of the unit ball in $R^n$, then V(1) = 2 (the 1-dimensional ball is a line), $V(2) = \pi r^2 \approx 3.14$, and $V(3) = \frac{4}{3} \pi r^3 \approx 4.19$.  So the volumes are getting bigger, and maybe we have a good question on our hands.  But now what’s the volume of a four dimensional sphere?

It turns out that one can derive the formula

$V(n) = \frac{\pi^{n/2}}{\Gamma(n/2+1)}$,

where $\Gamma$ is the gamma function, a generalization of the factorial (!) function!  In particular, $\Gamma(n+1) = n!$ for integers n.  Then, for even dimensions,

$V(2n) = \frac{\pi^{n}}{n!}$.

Now so long as we are only worried about integer dimensions, we can use the half-integer identity for the gamma function:
$\Gamma(n+1/2) = \frac{(2n)!}{4^nn!}\sqrt{\pi},$
to get a similar formula for odd dimensions:
$V(2n-1) = \frac{4^n\pi^{n-1}n!}{(2n)!}$.

Then a quick calculation gives:

V(1) = 2.00
V(2) = 3.14
V(3) = 4.19
V(4) = 4.93
V(5) = 5.26
V(6) = 5.17
V(7) = 4.72
V(8) = 4.06
V(9) = 3.30.

We note that the denominator for both odd and even dimensions grows much faster than the numerator to conclude (in academia we could only nod our head suggestively so far at this conclusion, but we’re in industry now!) that the 5-dimensional ball has the most volume.

As an aside, and possibly a subject for a later post, if we allow for fractional dimensions (everything in V(n) is defined for n a real number, not just an integer), then the maximum value of V is at about 5.2569, where the unit ball has volume around 5.2778.

The function V(n), plotted for real numbers n, with max around (5.26, 5.28).

# Some code.

Decided to publish the code from yesterday’s example, though wordpress only allows for certain somewhat odd file types to be shared, so I am just copy/pasting below.  There are 5 functions, and as usual, you save each to the same folder in MATLAB, open that folder, and then typing

>>billiardgame3(25,10,15,20)

will generate a 3-dimensional billard game with 25 balls of masses between 0 and 10, speeds between 0 and 15 bouncing for 20 seconds, then play this.  It also returns the position vectors [X,Y,Z] associated with this.  I’d be glad to also publish the 2-dimensional version of this, though it should not be too hard to wade through my code and cut it down to two dimensions.

function [v1n,v2n] = collide3(m1,m2,v1,v2)
%solves the conservation of velocity and momentum for two balls of mass m1
%and m2 colliding at velocities v1 and v2, and returns their new
%velocities.
C1 = (m1-m2)/(m1+m2);
C2 = (2*m2)/(m1+m2);
C3 = (2*m1)/(m1+m2);
v1n = C1*v1+C2*v2;
v2n = -C1*v2+C3*v1;
end

%takes vectors X, Y, and Z where each entry contains the x- or y-position of
%a billiard ball and returns an n x 2 matrix A containing the indices where
%and a matrix W containing the indices of which balls collided with walls
%in a w x l x h rectangle

C = X>=w; %oddly, I can’t find a way to make this code nicer. Checks collisions with walls
C = +C;
D = Y>=l;
D = +D;
E = X<=0;
E = +E;
F = Y<=0;
F = +F;
G = Z>=h;
G = +G;
H = Z<=0;
H = +H;
C = C+D+E+F+G+H;
for j = 1:length(C)
if C(j) > 0
W = [W;j];
end
end

X = ones(length(X),1)*X;
Y = ones(length(Y),1)*Y;
Z = ones(length(Z),1)*Z;

B = sqrt((X-X.’).^2+(Y-Y.’).^2+(Z-Z.’).^2); %a matrix whose (i,j)th entry is the distance between particle i and particle j
A = [];
for j = 1:size(B,2)
for k = 1:j-1
if B(k,j)==1
A = [A;k,j];
end
end
end
end

%simulates billiards with elastic collisions on a w x l billiards table.  M
%should be a vector recording the (positive) masses of the billiard balls
%(the function will create as many balls as the length of M).  vx, vy, X0,
%Y0 will similarly be vectors giving the initial x and y velocities of each
%billiard ball, and then initial positions.  The program runs for T seconds
%with time step tstep.  A reasonable setup is
%billiards(9,4.5,randi(10,1,9),3*rand(1,9),3*rand(1,9),.5+randi(8,1,9),.4+ran
%di(4,1,9),5,.2,.01)

X = zeros(floor(T/tstep),length(M)); %initialize the three position arrays, one column per particle
Y = zeros(floor(T/tstep),length(M));
Z = zeros(floor(T/tstep),length(M));

X(1,:) = X0; %set initial position
Y(1,:) = Y0;
Z(1,:) = Z0;

for k = 2:floor(T/tstep)
tryxpos = X(k-1,:)+tstep*vx; %here we check if any collisions will happen in the next step
tryypos = Y(k-1,:)+tstep*vy;
tryzpos = Z(k-1,:)+tstep*vz;

for j = 1:size(A,1)
[vx(A(j,1)),vx(A(j,2))] = collide3(M(A(j,1)),M(A(j,2)),vx(A(j,1)),vx(A(j,2))); %avoiding collisions with particles
[vy(A(j,1)),vy(A(j,2))] = collide3(M(A(j,1)),M(A(j,2)),vy(A(j,1)),vy(A(j,2)));
[vz(A(j,1)),vz(A(j,2))] = collide3(M(A(j,1)),M(A(j,2)),vz(A(j,1)),vz(A(j,2)));
end
for j = 1:length(W)
if tryxpos(W(j)) >= w || tryxpos(W(j))<=0 %avoiding collisions with walls
vx(W(j)) = -vx(W(j));
elseif tryypos(W(j))>=l || tryypos(W(j))<=0
vy(W(j)) = -vy(W(j));
elseif tryzpos(W(j)) >=h || tryzpos(W(j))<=0
vz(W(j)) = -vz(W(j));
end
end
X(k,:) = X(k-1,:)+tstep*vx;%updating the position with the “fixed” velocity vectors
Y(k,:) = Y(k-1,:)+tstep*vy;
Z(k,:) = Z(k-1,:)+tstep*vz;
end
end

function billiardplayer3(X,Y,Z,ptime,w,l,h)
%a program for visualizing billiard movement in 3 dimensions.

figure(1)
axis equal
%maxwindow
for j = 2:size(X,1)
%color the first particle red
plot3(X(j,1),Y(j,1),Z(j,1),’ro’,'MarkerFaceColor’,'r’)
hold on
%keep the rest of the particles blue circles
plot3(X(j,2:size(X,2)),Y(j,2:size(Y,2)),Z(j,2:size(Z,2)),’o')
hold off
axis([0 w 0 l 0 h])
grid on
pause(ptime)
end
hold on
%draw the path of the first particle
plot3(X(:,1),Y(:,1),Z(:,1),’k',’LineSmoothing’,'on’)
end

function [X,Y,Z] = billiardgame3(N,massmax,vmax,T)
%generates and plays a game of billiards with N randomly placed billiard
%balls on a table.  The balls have random mass between 0 and massmax,
%velocity between 0 and vmax, and plays for T seconds, with timestep .01.
w = 9;
l = 4.5;
h = 4.5;

M = massmax*rand(1,N);

vx = 2*vmax*rand(1,N)-vmax;
vy = 2*vmax*rand(1,N)-vmax;
vz = 2*vmax*rand(1,N)-vmax;

X0 = w*rand(1,N);
Y0 = l*rand(1,N);
Z0 = h*rand(1,N);

tstep = 0.01;

billiardplayer3(X,Y,Z,0.01,w,l,h)
end

# Bouncing balls

Inspired, as usual, by Leonid’s recent post, I decided to first write a script that would mimic his.  After that, since I had all the numbers worked out, I wrote two more MATLAB programs: one that mimicked elastic collisions in 2-dimensions, and one that mimics them in 3.

In theory, you can specify the number of particles and their radius, as well as the mass, position, and initial velocity for each (I didn’t vectorize radius for some reason, so I cannot model balls of different sizes bouncing around).  However, in practice I just generate random vectors for each of these numbers.  The final aspect is that the domain I put the balls in was a pool table of 9 x 4.5 units, or 9 x 4.5 x 4.5 for the 3D version.  This was just to make calculating the reflecting angle easier when a ball hit the wall.

As with Leonid’s code, mine works by checking whether the next step will cause any collisions, then adjusting the velocity vector so that the collision didn’t happen (using conservation of momentum and kinetic energy).  This algorithm is not “smart” in the sense that by avoiding one collision, it might get pushed into a second collision which it does not detect, and if a particle gets going fast enough, it can reflect off a wall from a large distance (my time step is just 0.01).  You can spot this in some of the figures below.

Anyways, here are some of the outputs.  I did not go through the trouble of turning these into .gifs, but they play fairly smoothly.  What happens is I simulate N particles of varying masses and velocities bouncing around in a 2- or 3- dimensional box for T seconds, then plot the path of one of the particles.  The end position of all the particles, plus this path, is in each picture below (with the “tracked” particle colored in red).

4 particles for 20 seconds. Not many collisions... I can spot 3, I think.

40 particles bouncing for 20 seconds. This particle is involved in a few more collisions than last time (and looks to have been moving faster, too.

400 particles bouncing for 20 seconds. Quite a few collisions here.

4 particles bouncing for 20 seconds in 3 dimensions. Again, not many collisions (the ball radius is 0.1, compared to the volume of the box, 9x4.5x4.5 = 182).

Now 40 particles for 20 seconds in 3d. A few more collisions, and it looks like it was going pretty fast in the middle there for a while.

400 particles for 20seconds. Starting to look more like the "random walk" of Robert Brown's pollen, though I would certainly have to mess with how heavy the particles were to more accurately model that.