./home/davidsiaw

Adjacency in a Grid

2020-05-04T04:07:56+00:00

Many times when I have to deal with grids and adjacency, I always come up with a solution that results in difficult to understand code that makes no sense to anyone who uses it. Here I will try and explain my system to defining adjacency in a square grid and why I use it.

Consider a grid where you stand on AA and you want the coordinates for the things around you.

00 00
AA 00
00 00

I will for this thing assume you have a simple expression to convert your xy coordinates to position in an array

pos = y * width + x;

Lets not consider the diagonals for now and concentrate on just the sides. I usually set numbers to define directions:

LEFT  = 0
RIGHT = 1
UP    = 2
DOWN  = 3

These numbers seem arbitrary but there is a reason to them, in binary these numbers are

LEFT  = 00
RIGHT = 01
UP    = 10
DOWN  = 11

When you do this its now possible to get adjacent cells like this:

int oper = dir << 1;
int d[2] = {0, 0};
d[oper >> 2] = (oper & 10) - 1;
diff = d[0] + d[1] * width;

Let’s unpack that. The numbers we chose for each direction lets us assign a meaning to each of the bits, so we have one bit for direction, and one bit for dimension:

00 <- direction bit
^
dimension bit

Using this arrangement, first we assume we are on the X axis, and if its a 1 we want to move forward and if its 0 we want to move backwards. This means we have to convert 0 to -1 and 1 to 1. We can do this by subtracting 1 from 2 times the direction bit.

int dx = (2 * dir & 1) - 1;

Next, we want to use the dimension bit to choose a dimension. This is straightforward because we simply create an array with two elements, and use the dimension bit to index the array.

int d[2] = {0, 0};
d[dir >> 1] = dx;

Since we are walking a one dimensional array, we can simply substitute each bit of the directions int our array:

diff = d[0] + d[1] * width;

Putting it all together, we have

int dx = (2 * dir & 1) - 1;
int d[2] = {0, 0};
d[dir >> 1] = dx;
diff = d[0] + d[1] * width;

We realize that anything multiplied by two is just a left shift, so we first of all shift the entire number up one bit. (I will show why we do the entire number instead of the direction bit later)

int oper = dir << 1;

And then dx simply becomes

int oper = dir << 1;
int dx = (oper & 10) - 1;

Lets get rid of dx since its so simple now, and we can use oper as well with a down shift of 2, but it doesn’t really matter. Both are equally fast.

int oper = dir << 1;
int d[2] = {0, 0};
d[oper >> 2] = (oper & 10) - 1;
diff = d[0] + d[1] * width;

By making this a function, we can use it to produce an amount we can move in the array to where we want to be:

int pos_of(Dir dir)
{
  int oper = dir << 1;
  int d[2] = {0, 0};
  d[oper >> 2] = (oper & 10) - 1;
  return d[0] + d[1] * width;
}

Using this, say we have the position somewhere in the array

us = arr[x];

Usage

We can simply add our movement into the position to give us the say, cell below ours:

below = arr[x + pos_of(LEFT)];

If we wanted to access all the side cells, we just have to loop through all of the 4 numbers.

for(int i = 0; i < 4; i++)
{
  arr[x + pos_of(4)] = 0; /* Set all adjacent cells to 0 */
}

Diagonals

Say we want to also do diagonals, then we modify our map to this

010 101
AAA 001
011 110

I usually set a bunch of values for this:

LEFT UP    -> 10 00 -> 100
RIGHT DOWN -> 11 01 -> 101
RIGHT UP   -> 10 01 -> 110
LEFT DOWN  -> 11 00 -> 111

This way we can add to our directions enum:

TOPLEFT     = 4
BOTTOMRIGHT = 5
TOPRIGHT    = 6
BOTTOMLEFT  = 7

Our function then becomes:

int oper = (dir & 3) << 1;
int d[2] = {0, 0};
int idx = (oper >> 2);
int diag = (dir >> 2);
int second = (oper & 10) ^ (dir & 10);
d[idx] = (oper & 10) - 1;
d[~idx & 1] = (second - 1) * diag;
diff = d[0] + d[1] * width;

Now, lets analyze this again. First of all, notice that the new directions all have an extra bit set, this bit is what I call the diagonal bit. We first of all need to remove this diagonal bit from our original oper variable, because it will cause a buffer overflow.

int oper = (dir & 3) << 1;
int d[2] = {0, 0};
d[oper >> 2] = (oper & 10) - 1;
diff = d[0] + d[1] * width;

By masking out our oper bit, now the function behaves as if it was

TOPLEFT     = LEFT
BOTTOMRIGHT = RIGHT
TOPRIGHT    = TOP
BOTTOMLEFT  = BOTTOM

However, since these are diagonals, we need to set the other dimension’s diff as well. The missing directions then are:

TOPLEFT     = LEFT    TOP
BOTTOMRIGHT = RIGHT   BOTTOM
TOPRIGHT    = TOP     RIGHT
BOTTOMLEFT  = BOTTOM  LEFT

Since our second directions are -1, 1, 1, -1 respectively, we need to use the XOR operator to produce them. Recall that dir values are

Using an XOR operator on the bottom two bits, we can create

-> 0
-> 1
-> 1
-> 0

Now we can go

int secondbit = ((dir & 1) ^ ((dir & 10) >> 1));

And then applying the (x * 2) - 1 trick, we can essentially make that

-> 0 -> -1
-> 1 ->  1
-> 1 ->  1
-> 0 -> -1

By just applying

int second = secondbit * 2 - 1;

However, since we know that oper is dir shifted up one, we can instead just let secondbit operate in the 2nd bit field, without having to mask and shift dir down by one, we can just use it where it is, along with oper.

int second = (oper & 10) ^ (dir & 10);

Now that we have our second bit, all we need to do is assign it to the dimension that was not assigned to.

int oper = (dir & 3) << 1;
int d[2] = {0, 0};
int second = (oper & 10) ^ (dir & 10);
d[oper >> 2] = (oper & 10) - 1;
d[~(oper >> 2) & 1] = second - 1;
diff = d[0] + d[1] * width;

But we aren’t done yet, we only need the second dimension to be filled if the diagonal bit is set, so we need to do this:

int diag = (dir >> 2);

And simply multiply the diagonal bit in, to mask it out.

int oper = (dir & 3) << 1;
int d[2] = {0, 0};
int diag = (dir >> 2)
int second = (oper & 10) ^ (dir & 10);
d[oper >> 2] = (oper & 10) - 1;
d[~(oper >> 2) & 1] = (second - 1) * diag;
diff = d[0] + d[1] * width;

We are also repeating ourselves by going oper >> 2 so we go

int oper = (dir & 3) << 1;
int d[2] = {0, 0};
int diag = (dir >> 2)
int second = (oper & 10) ^ (dir & 10);
int idx = oper >> 2;
d[idx] = (oper & 10) - 1;
d[~(idx) & 1] = (second - 1) * diag;
diff = d[0] + d[1] * width;

Now you can use this to access all 8 squares around by iterating from 0 to 7, or simply iterate from 0 to 4 to get adjacent squares only.

for(int i = 0; i < 8; i++)
{
  arr[x + pos_of(i)] = 0; /* set all 8 surrounding squares */
}

Obviously this is very much a toy example and you will still have to check bounds and manage datatypes yourself, but it gives a very nice set of shortcuts to anyone who wishes to work easily on a grid, and hopefully will go some way to explaining the bit twiddling I do when I work on grids.

Working with mruby

2019-08-17T13:55:01+00:00

Having worked with Lua before and made a LuaCppInterface, I decided about a year ago to start working on a C++ interface for mounting mruby as a scripting language.

To start, I have been developing in Ruby for several years now and I found it to be quite a pleasant language to work with. The particularly attractive thing about Ruby is its ability to create very clean DSLs. I have proceeded to use this to build some DSLs of my own.

Despite the power of Ruby, it turned out to be incredibly hard to use as scripting language. However, recently a Japanese-Government-funded project mruby has been taking off and has headed into version 2.0.1. A year ago when I tried using it it was in version 1.4.0, and it was already in a fairly stable state. Since its syntax is meant to be compatible with Ruby 1.9 and its easy to compile into a C++ project, I decided to try it out.

mruby-cpp

To that end, I created mruby-cpp which provides a C++ frontend into mruby. Its still in beta as I trawl through the mruby C API and slowly gain an understanding of it. I’ll continue to evolve it until it makes sense in both the C++ and mruby contexts.

Simple introduction

To start, mruby-cpp is a header-only C++ library. Just clone it into your sources as a submodule or copy it in and start using it. Your executable obviously needs to be linked with libmruby.a.

You can run scripts like this:

#include 

int main()
{
  vm.run("puts 'hello ruby'");
  return 0;
}

You can also set and get global variables, instance variables and class variables:

vm.run("$a = 100;");
int global_a = vm.get_global_variable<int>("$a");

vm.run("@a = 100;");
int instance_a = vm.get_instance_variable<int>("@a");

vm.run("@@a = 100;");
int class_a = vm.get_class_variable<int>("@@a");

You can bind your C++ classes and their methods to mruby, and specify constructor arguments too!

class Person {
  int age;
public:
  Person(int age) : age(age) { }
  int how_old() const { return age; }
};

auto cls = vm.create_class<Person>("Person");
cls.bind_instance_method("how_old", &Person::how_old);
vm.run("puts Person.new(5).how_old")

Hopefully you will have a good idea of what I plan to achieve with this library. There are more examples in the tests folder.

See the tests for more examples!

All the capabilities of mruby-cpp are tested by the tests which are also really good examples of how to use mruby-cpp.

mruby

While I was writing this library, I’ve learned a few things about mruby and actually about Ruby itself too.

The C API isn’t well documented

One of the first things I realize when I started looking for ways to do things in mruby is its C API is sparsely documented. There are a bunch of comments in the headers but they are not nearly detailed enough.

A lot of functions have very abbreviated names such as

mrb_value mrb_vm_special_get(mrb_state*, mrb_sym);
void mrb_vm_special_set(mrb_state*, mrb_sym, mrb_value);
mrb_value mrb_vm_cv_get(mrb_state*, mrb_sym);
void mrb_vm_cv_set(mrb_state*, mrb_sym, mrb_value);
mrb_value mrb_vm_const_get(mrb_state*, mrb_sym);
void mrb_vm_const_set(mrb_state*, mrb_sym, mrb_value);

I still don’t know what mrb_vm_special_get mean, but I know cv means class variable and const means constant. These are the vm variations, which means they only access global scope. Strangely enough, matz removed vm variations for the instance variables. Not sure why that is the case.

Sometimes typenames are confusing and its not entirely clear how some types are converted to other types, such as RProc* and mrb_value.

Difficulties I faced

While writing mruby-cpp, I encountered some difficulty with unifying mruby objects. All things in mruby are objects but they are not treated that way in the API. This is something I still have to solve.

Contrast with Lua where it exposes its GC reference api allowing you to increment and decrement references to its objects, and allowing you to create your own objects on the Lua GC and reference them the same way. This means that you can basically have a variant of the shared_pointer but managed by the Lua GC.

In mruby when you create a function you are forced to assign it to a class with a name. However, you can create a proc but it has a transparent self keyword, which means its impossible to associate it to an object.

As a result, the easiest way to work with mruby-cpp right now is to create classes and bind functions to them so you can use them in mruby. It makes little to no sense creating a function and assigning it to a variable. I am still working on a nice way to do this, and I spent hours trying to find a way to pass a function to mruby as a callable object.

The differences between procs, objects, classes and modules also mean that it was difficult to create an mruby::Object class. Meaning, pure ruby objects cannot really be passed to C++ with mruby-cpp right now.

Jepsen and Docker

2018-08-16T00:06:00+00:00

Recently I have been reading a series of very interesting posts by an Aphyr who tests distributed databases and then publishes their performance under network partition conditions, clock skews, dead nodes and other interesting failure conditions.

The tool he uses is called Jepsen which he wrote himself, and the repo itself includes the set of tests that he used on various databases. After reading all the posts about how various supposedly industry-standard (I guess this is the industry standard) databases which claim to be quite reliable fail under precisely the conditions they are not supposed to fail under, I decided to try the tool out myself and document the experience.

My first thought about this tool was that it would be a great application of Docker, and it turns out he had already written a Docker harness for Jepsen. This was great since you could do all sorts of things in the container environment of Docker and still simulate network partitions and all those nasty things without needing too much hardware.

I checked out Jepsen and ran the scripts the way it told me to.

$ sh up.sh
[INFO] Generating key pair
Generating public/private rsa key pair.
Your identification has been saved in ./secret/id_rsa.
.
.
.

And then it goes off and builds a bunch of containers from scratch.

Then at the very end (after maybe 15 mins of downloading and installing) it goes

.
.
.
jepsen-control | Please run `docker exec -it jepsen-control bash` in another terminal to proceed.

Excellent I can run it now!

So I open another terminal and type it in and instantly I’m transported to where I should be to start working with Jepsen. Hooray!

$ docker exec -it jepsen-control bash
Welcome to Jepsen on Docker
===========================

This container runs the Jepsen tests in sub-containers.

You are currently in the base dir of the git repo for Jepsen.
If you modify the core jepsen library make sure you "lein install" it so other tests can access.

To run a test:
   cd etcd && lein run test --concurrency 10
root@control:/jepsen#

Let’s try the next command. It looks to me like I should try running the test on etcd with 10 clients?

# cd etcd && lein run test --concurrency 10

It then goes off and spews a large amount of output

16:09:00.452 [main] INFO  jepsen.cli - Test options:
 {:concurrency 10,
 :test-count 1,
 :time-limit 60,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ssh
 {:username "root",
  :password "root",
  :strict-host-key-checking false,
  :private-key-path nil}}
.
.
.

It basically goes off and does a large number of reads and writes. At the very end, it comes back with this:

Analysis invalid! (ﾉಥ益ಥ）ﾉ ┻━┻

Wow that was quick. I did not expect to run into an etcd problem so fast. I did not even specify any failures.

Seems like I had some code to read. After fumbling around I discovered the source file.

Aha.

...
(merge tests/noop-test
         {:name "etcd"
          :os debian/os
          :db (db "v3.1.5")
          :client (client nil)
          :nemesis (nemesis/partition-random-halves)
          :model  (model/cas-register)
...

Looks like the Jepsen nemesis was baked into the test script. Well thats fine for now.

Knossos

The next step basically to me was to find out what was wrong. There was a large amount of data output so there must be a tool of some sort that can tell me what was wrong. (Or so I imagined)

After a bit of googling I came across another project called Knossos.

I didn’t know very much about Clojure so I just followed through the README anyway. So first I cloned knossos

# git clone https://github.com/jepsen-io/knossos
# cd knossos

I dug around to find the .edn file from my Jepsen run and discovered it in /jepsen/etcd/store/latest/history.edn

Then I ran the command in the README

# lein run --model cas-register /jepsen/etcd/store/latest/history.edn 
/jepsen/etcd/store/latest/history.edn false

Excellent. Yes. I knew that.

Scrolling down a little in the README I discovered it could generate the linearizability graph too. That was cool so I tried, by following the code in the README

root@control:/knossos# lein repl
nREPL server started on port 37400 on host 127.0.0.1 - nrepl://127.0.0.1:37400
REPL-y 0.3.7, nREPL 0.2.12
Clojure 1.8.0
OpenJDK 64-Bit Server VM 1.8.0_181-8u181-b13-0ubuntu0.16.04.1-b13
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

I am then dropped into the Clojure prompt, where I try clumsily to guess how things are done:

knossos.cli=> (def h (read-history "/jepsen/etcd/store/latest/history.edn"))
#'knossos.cli/h
knossos.cli=> (def a (competition/analysis (model/cas-register) h))
#'knossos.cli/a
knossos.cli=> (require '[knossos.linear.report :as report])
nil
knossos.cli=> (report/render-analysis! h a "linear.svg")
AssertionError Assert failed: No invocation for op {:process 2, :type :ok, :f :read, :value [0 4], :index 12, :time 421990474}
inv  knossos.linear.report/time-coords/fn--5143 (report.clj:186)

Okay, I had no idea what that meant. Looks like there are still kinks.

Thoughts

In short I have a few thoughts:

Using docker for this is way cool
The analysis tool sucks
This can be so much more general

Using docker is way cool

While Aphyr says that he uses his own LXC stuff to work with, I think its cool that he does provide a Docker interface. However he isn’t taking advantage of a few things docker can do:

Prebuilt docker containers
Networking in docker that can be used to simulate faults
Skewing clocks in the container
Adding and removing new containers, simulating scaling effects and changing configuration

It is probably because the tool is in its early stages of development, and really he knows how to use it the best, so he can do his stuff the best. But I can’t help but feel he himself would benefit from some improvement in this domain. Perhaps he does not use it right now, or maybe he does in another way. I’ve only sat with this code for the last 4 hours.

The analysis tool sucks

It needs to recover from weird errors nicely and maybe still present some amount of information. Unfortunately it doesn’t, so I will need to spend some time figuring out what I did wrong. Once I get it then I will be able to truly start analysing databases myself.

This can be so much more general

All in all it looks like a large amount of the code done here is still quite research quality and tests, while composable seemed like they had to be baked into the testing code still. On the surface it felt to me that skew clocks, network failures, network delays and failing environments were things that did not have to be part of the test and could be external.

None of this makes any of Jepsen less cool

In my opinion all one should really need to do is specify what to do in order to perform a read, write or CAS, and the testing tool should attempt a combination of the three to figure out how a database puts up with it.

Its also likely to be the case because different databases are different. Some are queues, some are data structures, some are RDBMS and some are key value stores. However they all should have the same idea that storing data stores, and automated analysis that shows if it is CP or AP and how would be incredible. This is something I might work on in a later post when I have time.

Conclusions

The world needs more of these. This run was an eye-opener to real simulations of faults, and also the idea that distributed databases are actually extremely hard to understand and build. It also has shown me how one person who makes an effort to do something advanced and difficult and documents it in a clear manner can so quickly change the landscape of a field.

Thanks to him I am sure I and a large number of people now understand the definitions and jargon behind distributed databases which have been hidden behind walls of text in white papers and difficult to read (read longwinded) articles, and this really opens the door to helping people build better and more resilient systems.

This probably will not be the last post on distributed databases on the blog as I, as are many others, interested in how things can scale safely. We have more or less solved it for webservers, but scaling databases, that is clearly a different beast.

An Utilitarian View on Intelligence

2016-04-09T22:56:00+00:00

The recent victories of AlphaGo has been but an interesting demonstration that machines are finally able to outplay masters at a highly complex game. Some may consider this to be the beginning of a real takeover of machines ala The Terminator, which to me is simply a fear of that which we do not understand.

As one of the many traits we can assign to people, intelligence is just one of them. However it seems to be a very valuable and desirable trait to many, and people often seek to measure this “intelligence” to order people by usefulness. Intelligence is measured in many different ways, from highly formal methods such as IQ tests to empirical rules judging by their personalities or the company they keep.

However let us first understand that intelligence is indeed valuable and while measuring it accurately may not be easy, measuring its value may be easier. How do we do this? We can try a very simple form of intelligence that we have developed ourselves called “artificial intelligence”.

Artificial intelligence comes in many different forms such as a computer player in games, sudoku solvers, self driving cars, the ability to read and parse written text and so on. However even with this many forms of artificial intelligence, their value is always a measure of the quality of their decisions in the domain. For example, an AI that is capable of beating most players at a game is considered better than an AI that does not. A self driving car that has less accidents on the road has a higher value than those that have more accidents. In essence the final value of a machine driven by some form of AI is always measured by its track record of decisions and results.

Thus the real value of an AI is the quality of the decisions it forms.

So what has AI really got to do with human intelligence? Well, I believe that AI and human intelligence actually are similar and almost equivalent in their playing field.

In its most degenerate form, being able to count is a quality humans learn and find useful, and which humans will practice in many fields. However, under fatigue, a human may count wrongly and this may result in bad decisions. Machines have mostly replaced humans in this regard due to their ability to more consistently count without errors (notwithstanding their ability to do it faster). This explains the great drop in secretaries and clerical jobs.

As time goes by the number of machines that can more consistently make better decisions about the actions they have to take will increase, and humans are forced to compete with this. After the industrial revolution, the value of labor has actually steadily dwindled as energy becomes more available and machines allow the conversion of energy sources into more kinds of value. Our value is more and more tied to the quality and weight of our decisions.

High level managers in companies and leaders are often held at a higher value than those who perform labor intensive tasks such as transporting due to the low value of labor and the high value of decisions. Certain labor intensive jobs do not have high value but are still performed by humans because humans are still capable of making decisions on the job that are better than machines, such as janitors and construction work, although cleaning is slowly being taken over by machines as exemplified by robots such as the Roomba.

In many ways we are already competing with machines to show that human decisions are still better than machine decisions in many fields. This means that if we were to assign a dollar value to machines that can do a particular work as well as a human being, that dollar value will be assigned to a human being in that field of work as well, essentially proving the equivalence of an AI and human intelligence in their ability to make decisions.

We are now in constant competition with machines to demonstrate our value through our ability to make better decisions on the actions we make, so the real value of human intelligence and thus the human who has that intelligence is also the quality of the decisions it makes him make, equivalent to the value of artificial intelligence itself.

IQ tests and apitude examinations have seemed to miss the point of intelligence and tested for a more abstract and less substantial quality of humans.

So in many ways, our old view of hard work as a good quality of a person falls in the face of scrutiny when we consider the modern realities we live in. Leaving decisions to another has always been tolerated, and blame and responsibility assigned to another, when the true value of our place in society actually comes from the decisions we make. The application of hard work is also a decision we often have to make but allow others to make for us. Perhaps we should begin to teach children less about working hard and more about learning to make good decisions.

Rotating Integrals

2015-12-22T01:11:00+00:00

This post assumes you know some trigonometry and can do some integral calculus on it.

My little brother in college complained about sines and cosines in an indefinite integral today. Specifically, an integration that he was annoyed with was:

\[\int{x^3\,sin(x)\,dx}\]

It sounded like a whine at first but upon closer inspection, his complaints are not unwarranted. This is indeed a troublesome kind of indefinite integral. There is a trick to doing it quickly, but let us try and perform this particular integration normally first:

Since this is integration of the product of two functions, we perform integration by parts.

\[\int{u\,dv} = uv\, - \int{v\,du}\]

I choose \(u = x^3\) and \(dv = sin(x)\,dx\) mostly to avoid having to integrate the \(x^3\) and have fractions mess up my working.

This results in:

\[ \begin{aligned} u & = x^3 \cr du & = 3x^2\,dx \cr dv & = sin(x)\,dx \cr v & = -cos(x) \end{aligned} \]

Hence:

\[ \begin{aligned} \int{u\,dv} & = uv\, - \int{v\,du} \cr & = -x^3\,cos(x)\, - \int{–3x^2 cos(x)\,dx} \cr & = -x^3\,cos(x) + \int{3x^2 cos(x)\,dx} \end{aligned} \]

We seem to have ended up with another integral to solve: \(\int{cos(x)(3x^2)\,dx}\), so we shall proceed to solve it.

\[ \begin{array}{cc} \begin{aligned} \int{u\,dv} & = uv\, - \int{v\,du} \cr \int{3x^2 cos(x)\,dx} & = 3x^2 sin(x)\, - \int{6x\,sin(x)\,dx} \end{aligned} & \begin{aligned} u & = 3x^2 \cr du & = 6x\,dx \cr dv & = cos(x)\,dx \cr v & = sin(x) \end{aligned} \end{array} \]

How vexing. We have yet another integral to solve. Let us proceed anyway.

\[ \begin{array}{cc} \begin{aligned} \int{u\,dv} & = uv\, - \int{v\,du} \cr \int{6x\,sin(x)\,dx} & = -6x\,cos(x)\, - \int{-6\,cos(x)\,dx} \cr \int{6x\,sin(x)\,dx} & = -6x\,cos(x) + \int{6\,cos(x)\,dx} \end{aligned} & \begin{aligned} u & = 6x \cr du & = 6\,dx \cr dv & = sin(x)\,dx \cr v & = -cos(x) \end{aligned} \end{array} \]

Once again, we have to solve another integral. This one seems to be the last though, because the x term has withered away into oblivion, it becomes a trivial integral.

\[ \begin{aligned} \int{6x\,sin(x)\,dx} & = -6x\,cos(x) + \int{6\,cos(x)\,dx} \cr & = -6x\,cos(x) + 6\int{cos(x)\,dx} \cr & = -6x\,cos(x) + 6\,sin(x) \end{aligned} \]

Bringing it all together, we have:

\[ \begin{aligned} \int{x^3\,sin(x)\,dx} & = -x^3\,cos(x) + \int{3x^2 cos(x)\,dx} \cr & = -x^3\,cos(x) + 3x^2 sin(x)\, - \int{6x\,sin(x)\,dx} \cr & = -x^3\,cos(x) + 3x^2 sin(x)\, - [-6x\,cos(x) + \int{6\,cos(x)\,dx}] \cr & = -x^3\,cos(x) + 3x^2 sin(x)\, - [-6x\,cos(x) + 6\,sin(x)] \cr & = -x^3\,cos(x) + 3x^2 sin(x)\, + 6x\,cos(x) - 6\,sin(x) \end{aligned} \]

Phew! Basically due to the fact that you have to integrate by parts as many times as the power of x, the amount of integration you have to do increases proportionally to x’s power, making it troublesome.

A Closer Look

However, if you have been following the working closely, you will notice that all we are really doing when we integrate by parts is performing a derivative of both parts over and over again. Notice how the \(sin(x)\) term keeps flipping between \(sin(x)\) and \(cos(x)\).

Better yet, if you differentiate \(sin(x)\) 4 times, the entire thing rotates back to \(sin(x)\) again. Since we integrated \(sin(x)\) 4 times in this example, that is exactly what happened. I call this kind of integral a Rotating Integral because the trigonometric term constantly rotates between the four combinations of negative and positive, \(sin(x)\) and \(cos(x)\).

Notice also that the \(u\) term we chose right at the beginning was differentiated at every step until it became a constant:

\[x^3\,\rightarrow\,3x^2\,\rightarrow\,6x\,\rightarrow\,6\]

If we factorize the result, we get the following:

\[ -x^3\,cos(x) + 3x^2 sin(x)\, + 6x\,cos(x) - 6\,sin(x) = (-x^3 + 6x)\,cos(x) + (3x^2 - 6)\,sin(x) \]

Notice how there is a negative term in both the \(sin(x)\) and \(cos(x)\) groups. Since the starting \(u\) we chose was positive, we can see the \(sin(x)\) go through

\[-cos(x)\,\rightarrow\,-sin(x)\,\rightarrow\,cos(x)\,\rightarrow\,sin(x)\]

We can see a pattern of repeating derivatives distributed among the \(sin(x)\) and \(cos(x)\) groups. If we recognize that

\[ \begin{array}{cc} x^3 &\rightarrow\,3x^2 &\rightarrow\,6x &\rightarrow\,6 \cr f &\rightarrow\,f^\prime &\rightarrow\,f^{\prime \prime} &\rightarrow\,f^{\prime \prime \prime} \cr \end{array} \]

And from our rearranged result,

\[ (-f + f^{\prime \prime})\,cos(x) + (f^\prime - f^{\prime \prime \prime})\,sin(x) \]

This peaked my interest, and I decided to check the results of other powers of x:

\[ \begin{array}{cc} 3 & (-x^3 + 6x)\,cos(x) & + (3x^2 - 6)\,sin(x) \cr 4 & (-x^4 + 12x^2 - 24)\,cos(x) & + (4x^3 - 24x)\,sin(x) \cr 5 & (-x^5 + 20x^3 - 120x)\,cos(x) & + (5x^4 - 60x^2 + 120)\,sin(x) \cr 6 & (-x^6 + 30x^4 - 360x^2 + 720)\,cos(x) & + (6x^5 - 120x^3 + 720x)\,sin(x) \cr 7 & (-x^7 + 42x^5 - 840x^3 + 5040x)\,cos(x) & + (7x^6 - 210x^4 + 2520x^2 - 5040)\,sin(x) \end{array} \]

The Solution!

After doing some more wolframming, I found that we have a general solution of:

\[ \int{f\,sin(x)\,dx} = (- f + f^{\prime \prime} - f^{\prime \prime \prime \prime} + \cdots - f^{\prime2n})\,cos(x) + (f^{\prime} - f^{\prime \prime \prime} + f^{\prime \prime \prime \prime \prime} - \cdots + f^{\prime2n+1})\,sin(x) \]

Where odd numbered differentials are on the \(sin(x)\) group and even numbered differentials are on the \(cos(x)\) group, and each differential alternates between negative and positive signs. The \(sin(x)\) group starts with a positive sign while the \(cos(x)\) starts with a negative sign.

Of course, the same rule slightly modified works for the \(cos(x)\) form:

\[ \int{f\,cos(x)\,dx} = (f - f^{\prime \prime} + f^{\prime \prime \prime \prime} - \cdots + f^{\prime2n})\,sin(x) + (f^{\prime} - f^{\prime \prime \prime} + f^{\prime \prime \prime \prime \prime} - \cdots + f^{\prime2n+1})\,cos(x) \]

Note that this only works if \(f\) eventually derives into 0, and if the trigonometric function is \(sin(x)\) or \(cos(x)\). Otherwise you will need another method of finding the solution.

Weird Application

With this we can try and verify that what I said was true by trying to use it on a very special function that does not change even when under differentiation: \(e^x\). Even though I said this only works if it eventually derives to zero, let us try anyway:

So for this let us try the following function:

\[\int{e^x\,sin(x)\,dx}\]

This should give us

\[ \begin{align} \int{e^x\,sin(x)\,dx} & = (- e^x + e^x - e^x + \cdots)\,cos(x) + (e^x - e^x + e^x - \cdots)\,sin(x) & \cr & = (- e^x + e^x - e^x + \cdots)\,cos(x) - (-e^x + e^x - e^x + \cdots)\,sin(x) & \text{If we flip the sign}\cr & = (- e^x + e^x - e^x + \cdots)(cos(x) - sin(x)) & \text{Hence} \cr & = (-1+1-1+\cdots)(e^x)(cos(x) - sin(x)) & \end{align} \]

We are left with a strange, almost nonsensical looking sum that looks like it should evaluate to zero, but does it?

\[ \begin{align} S & = -1+1-1+1-1+1\cdots \cr S & = -1-S \cr 2S & = -1 \cr S & = -\frac{1}{2} \cr \end{align} \]

Oh so that infinite sum actually evaluates to \(-\frac{1}{2}\). This may seem dubious but proving that is outside the scope of this post. (Lots of people have done it anyway.) This just means that our thing above should look like this:

\[ \begin{align} \int{e^x\,sin(x)\,dx}& = (-1+1-1+\cdots)(e^x)(cos(x) - sin(x)) \cr & = -\frac{1}{2}(e^x)(cos(x) - sin(x)) \cr & = \frac{1}{2}(e^x)(sin(x)-cos(x)) \cr \end{align} \]

But is this correct? We must try and check again that we did not just stumble on nonsense.

Verifying again

To verify, let us perform it again with good ol' integration by parts.

\[\int{e^x\,sin(x)\,dx}\]

\[ \begin{array}{cc} \begin{aligned} \int{u\,dv} & = uv\, - \int{v\,du} \cr \int{sin(x)(e^x)\,dx} & = sin(x)(e^x)\, - \int{e^x\,cos(x)\,dx} \end{aligned} & \begin{aligned} u & = sin(x) \cr du & = cos(x)\,dx \cr dv & = e^x\,dx \cr v & = e^x \end{aligned} \end{array} \]

We have to do this integration again for \(cos(x)\), so let us do so.

\[ \begin{array}{cc} \begin{aligned} \int{u\,dv} & = uv\, - \int{v\,du} \cr \int{e^x\,cos(x)\,dx} & = cos(x)(e^x)\, + \int{e^x\,sin(x)\,dx} \end{aligned} & \begin{aligned} u & = cos(x) \cr du & = -sin(x)\,dx \cr dv & = e^x\,dx \cr v & = e^x \end{aligned} \end{array} \]

Substituting back, we get:

\[ \begin{aligned} \int{sin(x)(e^x)\,dx} & = sin(x)(e^x)\, - \int{e^x\,cos(x)\,dx} \cr & = sin(x)(e^x)\, - cos(x)(e^x)\, - \int{e^x\,sin(x)\,dx} \cr 2\int{sin(x)(e^x)\,dx} & = sin(x)(e^x)\, - cos(x)(e^x) \cr \int{sin(x)(e^x)\,dx} & = \frac{1}{2}(e^x)(sin(x)-cos(x)) \end{aligned} \]

Which is the same result we got before, proving that we didn’t screw up.

With this, you can find the indefinite integral of expressions of this form by simply differentiating the non-trigonometric side. You can also use identities, factorizations and other ways to rearrange an expression into this form to use this technique to integrate this otherwise troublesome class of Rotating Integrals.

Finding Square Roots

2015-06-12T01:36:00+00:00

Most of us has at some point in our lives used the Math.sqrt() function. We would even know that \(\sqrt{2} = 1.414\ldots\). However, we never really give how it is implemented a second thought. Thus is the power of a tight abstraction. For those of us who lived in the age of calculators, finding square roots has always been a source of mystery. For me during my teenage years, calculating a square root has really just been about looking up a numeric table.

Obviously this very simple problem would have been solved close to the beginnings of civilization, and blazingly fast methods must already exist. But for sake of exploration, let us examine how we would implement a square root function should we need to.

\[\sqrt{-n} = \sqrt{-1} \sqrt{n}\]

Since square roots of negative numbers are really just square roots of positive numbers times the square root of a negative number, let us focus our efforts on finding just the square roots of positive real numbers.

One way to find a square root is by looking at this graph of \(y=x^2\). Simply drawing a horizontal line from the y-axis from whose square root we desire to the graph line, and then find the intersect on the x-axis. So say we want \(\sqrt{30}\).

We simply draw a line from the y-axis at 30 to the graph and find that it is located somewhere between 5.4 and 5.6 along the x-axis, but closer to 5.4. This makes sense. A quick glance at google) shows \(\sqrt{30} = 5.4772255\)

This in-betweenness tells us something about square roots. Besides the nice square roots, the ugly ones don’t seem to have any end to their decimals. Hence, they are always in between two numbers that we know. This means that we can keep making educated guesses about where the square root is until we get close enough.

\[0 < \sqrt{30} < 30\]

Let’s apply this knowledge first and do the calculation by hand for the number 30 since we know its answer so we can verify that our method is correct.

First of all, we must figure out where the root is. Obviously, it would be less than 30, since you need to multiply it by itself to get 30. Its also more than zero, because as mentioned earlier, you will always end up with a positive number.

\[\sqrt{30} \approx 15?\]

The number most in between of 0 and 30 is 15. But is this the number? The only way to check is by squaring 15.

\[15^2 = 225\]

Nope. Not even close. 225 is waaay too big. But this tells us something: the root cannot be bigger than 15 because if it was, we would get an even bigger square than 30, so we must look to the left of 15. This means our upper bound is now 15 instead of 30.

\[0 < \sqrt{30} < 15\]

The number in the middle this time is 7.5. Is this our number?

\[7.5^2 = 56.25\]

Still too big. This means our root must be smaller than 7.5, but still bigger than zero.

\[0 < \sqrt{30} < 7.5\]

The number in the middle of 7.5 is 3.75. Perhaps this is our number?

\[3.75^2 = 14.0625\]

It seems the square of 3.75 is smaller than 30! This means that that the root must be bigger than 3.75 since if the number is smaller, we get an even smaller square, we must be getting close.

\[3.75 < \sqrt{30} < 7.5\]

Now we must find out what number is in the middle. Well, from our geometry class we know that the average of two numbers is the middle number, so \(\frac{3.75 + 7.5}{2} = 5.625\). We simply need to square this number to check now.

\[5.625^2 = 31.640625\]

It seems we have gotten a lot closer now that we moved the lower bound up. Our guess of 5.625 seems really close to the answer now. But because its square is bigger than 30, we know the root must be smaller than 5.625, so:

\[3.75 < \sqrt{30} < 5.625\]

The middle number of this is then 4.6875.

\[4.6875^2 = 21.972656\]

That’s really far away. This can’t be the answer, but lets keep looking. Since its square is smaller than 30, we change the lower bound to this.

\[4.6875 < \sqrt{30} < 5.625\]

This time, the middle number is 5.15625.

\[5.15625^2 = 26.586914\]

We are getting closer again, but this is smaller than 30, so our lower bound should be changed.

\[5.15625 < \sqrt{30} < 5.625\]

I know, it starts to get frustrating at this point since we seem to be crawling, but lets stay on for another two rounds. The number in the middle is 5.390625

\[5.390625^2 = 29.058838\]

Our result now is very very close to 30. It is smaller, so the lower bound should be changed.

\[5.390625 < \sqrt{30} < 5.625\]

The middle number is now 5.507812. That means the square is…

\[5.507812^2 = 30.336\]

Okay! We are pretty close, but the important thing is we know it will eventually arrive at the answer because we kept getting closer to the answer as we went. Now, we know computers do things faster than we do so let us write some code!

So first of all, we want to have our initial guess. The number to be rooted is 30.

double number = 30;

double upperBound = number;
double lowerBound = 0;

Then, we write down what we did in every iteration.

// We try and figure out if the middle number is the correct root
double rootGuess = (upperBound + lowerBound) / 2;
double rootGuessSquared = rootGuess * rootGuess;

if (rootGuessSquared > number)
{
  // if the square of our guess is bigger than the number, that means the root is too big.
  // so the root cannot be bigger than our current guess
  upperBound = rootGuess;
}
else
{
  // otherwise, the root cannot be smaller than our current guess.
  lowerBound = rootGuess;
}

Next, we tell the computer to do it over and over again until the square of our guess is the number itself, so we wrap that in a loop.

double rootGuess = 0;
do
{
  // We try and figure out if the middle number is the correct root
  rootGuess = (upperBound + lowerBound) / 2;
  double rootGuessSquared = rootGuess * rootGuess;

  if (rootGuessSquared > number)
  {
      // if the square of our guess is bigger than the number, that means the root is too big.
      // so the root cannot be bigger than our current guess
      upperBound = rootGuess;
  }
  else
  {
      // otherwise, the root cannot be smaller than our current guess.
      lowerBound = rootGuess;
  }

} while(rootGuess * rootGuess != number);

Great, that seems simple enough. Let’s put it all together.

double number = 30;

double upperBound = number;
double lowerBound = 0;

double rootGuess = 0;
do
{
  // We try and figure out if the middle number is the correct root
  rootGuess = (upperBound + lowerBound) / 2;
  double rootGuessSquared = rootGuess * rootGuess;

  if (rootGuessSquared > number)
  {
      // if the square of our guess is bigger than the number, that means the root is too big.
      // so the root cannot be bigger than our current guess
      upperBound = rootGuess;
  }
  else
  {
      // otherwise, the root cannot be smaller than our current guess.
      lowerBound = rootGuess;
  }

  printf("%lf < sqrt(30) < %lf guess=%lf rootGuessSquared=%lf\n", upperBound, lowerBound, rootGuess, rootGuessSquared);

} while(rootGuess * rootGuess != number);

printf("result = %lf\n", rootGuess);

Notice I put in a printf to check the bounds as it runs. This makes it more interesting and actually shows us what’s going on. Lets run it now.

000000 < sqrt(30) < 0.000000 guess=15.000000 rootGuessSquared=225.000000
500000 < sqrt(30) < 0.000000 guess=7.500000 rootGuessSquared=56.250000
500000 < sqrt(30) < 3.750000 guess=3.750000 rootGuessSquared=14.062500
625000 < sqrt(30) < 3.750000 guess=5.625000 rootGuessSquared=31.640625
625000 < sqrt(30) < 4.687500 guess=4.687500 rootGuessSquared=21.972656
625000 < sqrt(30) < 5.156250 guess=5.156250 rootGuessSquared=26.586914
625000 < sqrt(30) < 5.390625 guess=5.390625 rootGuessSquared=29.058838
507812 < sqrt(30) < 5.390625 guess=5.507812 rootGuessSquared=30.335999
507812 < sqrt(30) < 5.449219 guess=5.449219 rootGuessSquared=29.693985
478516 < sqrt(30) < 5.449219 guess=5.478516 rootGuessSquared=30.014133
478516 < sqrt(30) < 5.463867 guess=5.463867 rootGuessSquared=29.853845
478516 < sqrt(30) < 5.471191 guess=5.471191 rootGuessSquared=29.933935
478516 < sqrt(30) < 5.474854 guess=5.474854 rootGuessSquared=29.974021
478516 < sqrt(30) < 5.476685 guess=5.476685 rootGuessSquared=29.994074
477600 < sqrt(30) < 5.476685 guess=5.477600 rootGuessSquared=30.004103
477600 < sqrt(30) < 5.477142 guess=5.477142 rootGuessSquared=29.999088
477371 < sqrt(30) < 5.477142 guess=5.477371 rootGuessSquared=30.001595
477257 < sqrt(30) < 5.477142 guess=5.477257 rootGuessSquared=30.000342
477257 < sqrt(30) < 5.477200 guess=5.477200 rootGuessSquared=29.999715
477228 < sqrt(30) < 5.477200 guess=5.477228 rootGuessSquared=30.000028
477228 < sqrt(30) < 5.477214 guess=5.477214 rootGuessSquared=29.999872
477228 < sqrt(30) < 5.477221 guess=5.477221 rootGuessSquared=29.999950
477228 < sqrt(30) < 5.477225 guess=5.477225 rootGuessSquared=29.999989
477226 < sqrt(30) < 5.477225 guess=5.477226 rootGuessSquared=30.000009
477226 < sqrt(30) < 5.477225 guess=5.477225 rootGuessSquared=29.999999
477226 < sqrt(30) < 5.477225 guess=5.477226 rootGuessSquared=30.000004
477226 < sqrt(30) < 5.477225 guess=5.477226 rootGuessSquared=30.000001
477226 < sqrt(30) < 5.477225 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
477226 < sqrt(30) < 5.477226 guess=5.477226 rootGuessSquared=30.000000
result = 5.477226

After all that spam, now we can see that the result is the square root of 30, since we know 5.477226 is the square root of 30.

This method of calculating square roots is called the Bisection method, also known as a binary search. Its not very efficient as you can see, and takes a long time to converge.

Now we know what it takes to find a square root! Now you can simply try it with a bunch of numbers and get the right answer. You can also compare the answer with the actual sqrt() function provided by the standard library.

It is also interesting to note that you can use this method to get cube roots, quartic roots and quintic roots too.

Script for setting up FTP to a folder on a Mac

2015-01-23T10:34:00+00:00

This script came in handy many times when I had to share things with my other laptops or windows users.

On a Mac you should have Ruby installed. Macs normally come with an ftpd whose frontend has been ripped out, so you can only do this on the command line. Basically, write this to a script file (lets call it setftp)

and then use it by typing:

./setftp /directoryIWantToShare

Tested on Mavericks

#!/usr/bin/env ruby

#puts ARGV.inspect

puts `sudo -s launchctl unload -w /System/Library/LaunchDaemons/ftp.plist`

config = <<-THEFILE
# Set the ftp root dir to this folder
umask all 022
chroot GUEST #{ARGV[0]}
modify guest off
umask  guest 0707
upload guest on
THEFILE

File.open("/etc/ftpd.conf", 'w') { |file| file.write(config) }

puts `sudo -s launchctl load -w /System/Library/LaunchDaemons/ftp.plist`

With this you will have an ftpd turn on whose root folder is the folder you named. in this case it is /directoryIWantToShare

Hexlife Part 2

2014-11-22T11:47:00+00:00

Yesterday while fooling around, I wanted to build a tri-star, but I made a mistake and made this instead:

But to my surprise I did not find a tri-star, instead this grew out of it:

Since I had made changes to tint different generations with different colors, I wanted to test it. Wondering if I had introduced bugs, I went through my git history to make sure everything was fine. After reverting, it turned out I hadn’t. It was just a variation I had never tried.

In my earlier post, I said that still lifes could not exist under the rules because cells could not live long enough, but these structures persisted. In other words, they were non-transient cells.

It turns out that most of the cells here had 3 neighbours, and that gave them a long lease on life. Also, all the way along the columns, there were no dead cells that had 2 cell neighbours, meaning that the cells would never die because of their spawn.

However, the ends are kept alive by an interesting “flower” oscillator that resembles the twinkling star oscillator, and this keeps the entire structure alive. Basically, you can make an infinitely long structure, but you must place the oscilators at the ends.

Armed with this knowledge, I set out to create a simpler column.

I call this kind of life form a Semi-still life, since a large proportion of it can be unchanging, but must be supported by oscillation.

Perhaps there are more of these under these rules…

Hexlife

2014-11-21T10:46:00+00:00

Last week I wrote a simple Game of Life variation that runs on a hexagonal grid after viewing an example on Wikipedia. The cells considered are the immediate neighbours. The rules are:

if a dead cell is surrounded by 2 live cells, the dead cell becomes alive
if a live cell is surrounded by 3 or 4 live cells, it stays alive
in all other cases, the cell dies

One of the interesting things about this set of rules is the absence of still life. The reason for this absence is that a cell can only be born under conditions where a live cell would otherwise die. This means that whenever a cell is born it is likely to kill its parents.

Hence the only known stable life forms in this set of rules at the moment are oscillators. No gliders have been found yet.

You can play with my implementation here: http://davidsiaw.github.io/hexlife/

In my search for a glider in this set of rules, I found a bunch of oscillators and decided to record them:

Picture	Description
	The 2cell is the simplest and most common oscillator. It is left behind by almost any unstable life.
	The Spinner is another common oscillator that has a period of 2. It simply looks like it is spinning.
	The Mouth looks like a spinner but instead of 2nd level adjacent, they are 3rd level adjacent, so it looks like it is always opening and closing
	The Needle has a period of 2 and flips back and forth.
	The Dancer has a period of 2 and looks like its swinging back and forth.
	The Star looks like a twinkling star. It is quite peculiar in the sense that it has a period of 3.
	The Rotator has a period of 4 and looks like it is spinning in a weird way.
	The Bat is perhaps the most common 4-period oscillator you get from random starts.
	The Snake is a period-4 oscillator that looks like a snake that wiggles around
	The Morpher is really simple but really interesting-looking oscillator. It has got a period of 12 and transforms into all its possible orientations. This means that even though it has no symmetry, it does not matter which way you orient it, it will achieve the same configurations. I call this temporal homogeneity.
	The Tristar is a period-12 oscillator that twinkles in a more elaborate way than the star.
	The Swimmer is an oscillator with a period of 48. You can actually find it on the Wikipedia page I linked. It seems like a lost fish swimming back and forth.

If you find more oscillators please leave a comment!

Using Travis to deploy my blog

2014-10-30T01:07:00+00:00

I’ve been using Travis-CI more and more as a platform from which I can deploy things, due to the fact that we can run any code on it. Today I made it so that this blog is deployed to gh-pages when pushed. I have also set up my personal blog to be pushed this way as well.

Why did I use Travis-CI instead of Shippable or other CI systems for this? Well, its mainly due to the fact that I was already using Travis, and the tools (specifically the Travis gem) are quite mature. Many of the things that are quite troublesome, like generating a key and placing decrypt commands into the .travis.yml, are now covered in simple command line instructions.

My blog uses Jekyll + Octopress, but I don’t like the limitations imposed by github on the templates I can use. So I decided it was better to simply upload the finished product. First of all, I push all my blog sources up to a public repository at https://github.com/davidsiaw/davidsiaw.github.io.source

While the setup is easy, its not obvious that you can do this. Hopefully this will go some way to helping others who want to circumvent the github limitations on their gh-pages content as well.

In this post I will show you how to set it up. First of all, I create a key that will give push access to my blog’s repository at https://github.com/davidsiaw/davidsiaw.github.io by calling up ssh-keygen

nagatsuki david$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/david/.ssh/id_rsa): deploy_key
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in deploy_key.
Your public key has been saved in deploy_key.pub.
The key fingerprint is:
89:8d:a9:60:5c:8b:77:05:c4:2b:05:a4:96:64:8e:fa david@nagatsuki
The key's randomart image is:
+--[ RSA 2048]----+
|     .=+o        |
|     *  o.       |
|    = o. o.      |
| o = ..=.o       |
|  = . =.S        |
|   o o .         |
|    E            |
|                 |
|                 |
+-----------------+

I then place the deploy_key.pub in my github repository.

Next, I make use of the Travis gem to encrypt my private key. I add the --add parameter to make it write to my .travis.yml (I am in the directory.)

nagatsuki david$ travis encrypt-file deploy_key --add
encrypting deploy_key for davidsiaw/davidsiaw.github.io
storing result as deploy_key.enc
storing secure env variables for decryption

Make sure to add deploy_key.enc to the git repository.
Make sure not to add deploy_key to the git repository.
Commit all changes to your .travis.yml.

This gives me a deploy_key.enc that is my encrypted private key.

In order to use this key, I need to add some more lines to .travis.yml to enable it to push to github. First of all, I need to install the key into the .ssh folder so git can use it. I also chmod it so ssh will not complain.

- chmod 600 deploy-key
- cp deploy-key ~/.ssh/id_rsa

With this, I can now tell Travis to push the generated files. All I do is tell it to generate the site (since this is just Jekyll), and then call my deploy script which simply pushes the right stuff up to github.

- bundle exec rake generate
- bash deploy

With this, my website gets updated everytime I push my changes to https://github.com/davidsiaw/davidsiaw.github.io.source, Travis will automatically update my blog.

New Blog

2013-10-12T16:03:00+00:00

As all of you may have noticed the theme has changed and the URL seems to have been redirected to Github. I decided to try out this github pages thing and it seems to work pretty well. The static content idea is really attractive to me and really gives me much more power over the pages with a small tradeoff for convenience. But convenience is a non-issue for most programmers who can write tools to make the inconvenient convenient.

What I want to talk about here is actually how I started doing this and why I ended up moving my blog across, along with what I learned along the way about Jekyll-Bootstrap, Octopress, Jekyll, the annoying problems and the process of migrating from a WordPress blog to a Octopress/Disqus duo.

1. Github Pages

After a friend of mine started up his blog, I decided to give it a shot too. From searching around most of the Github pages around were set up as blogs. It seemed like that was a natural thing to do so I went on and used the Github automatic page generator.

The themes provided were incredibly high quality, and were easy to read on the iPhone. The syntax highlighting seemed to just work too so it seemed like everything was going to be good.

I soon realized that all the automatic page generator did was set up an index.html on the repository. I had to begin editing the html page myself and if I wanted an automatically updated sidebar of pages and posts, there needed to be some Liquid code in the pages that required that.

It started to seem like a pain for a blogging environment.

2. Jekyll-bootstrap

Believing in the power of existing tools, I was sure someone out there would have solved this problem for me. After googling around I found Jekyll-Bootstrap. It seemed fairly simple to set up, since it was simply a starting point with Rakefiles and preset templates for blogging. It also used pygments for syntax highlighting, and had theming support and plugins.

$ rake new_post title="Rants of today"
$ rake new_page title="My new page"
$ rake preview

You had to have an empty repository to start with Jekyll-bootstrap because it needs to dictate how things are in your blog. The easiest way to do this it seems is to begin your repository contents with the contents of Jekyll-bootstrap.

$ git clone https://github.com/plusjade/jekyll-bootstrap.git USERNAME.github.com
$ cd USERNAME.github.com
$ git remote set-url origin git@github.com:USERNAME/USERNAME.github.com.git
$ git push origin master

The site, being a set of markdown and html files with Liquid markups would then simply be checked in to github where it would automatically be run through Jekyll and displayed.

The themes for Jekyll-bootstrap weren’t as good as the ones you could find for Github, but that did not pose much of a problem. There also weren’t many of them. It also turned out that besides the automatic page and post file generation and theme application, there wasn’t much in terms of defaults. The index.md only had a simple example of listing posts, and if you wanted a sidebar and show posts on the index.html, or apply disqus to it, you had to do all of that yourself.

Jekyll-bootstrap Themes

$ rake theme:install git="https://github.com/dhulihan/hooligan.git"

I was surprised at the seeming lack of extra examples that can be copy-pasted in to set the blog up. As it turns out jekyll-bootstrap hasn’t been maintained for a while, since its maintainer has moved on to create another static site generator.

I set this aside on a different folder and proceeded to try the other alternative which seemed more used and still had an active community and maintainers.

3. Octopress

The alternative was Octopress. Octopress is different from Jekyll-bootstrap in the sense that it wasn’t just a template for blogging sites. It is a small collection of tools that allow you to set themes and generate your site locally. Another difference is that the static site generation is actually done on your own PC, and you push the result to github. This means it would work for any other site, which was fairly attractive. Similar to Jekyll-bootstrap, you would fork Octopress and clone your modifications in to another repository, and not your site.

$ git clone git://github.com/davidsiaw/octopress.git Octopress   # from my forked repository
$ cd octopress
$ gem install bundler
$ rbenv rehash
$ bundle install        # prepares the set of tools required to maintain your site
$ rake install          # generates your site's bootstrap sources

$ rake generate
$ rake deploy           # automatically creates a _deploy folder and pushes that to your site

$ git add .             # add the created sources and sass folders that contain the real sources for your site
$ git commit -m "update sources"
$ git pushes            # push your site's sources

Octopress also has a much larger set of themes available to use to customize your site. Like Jekyll-bootstrap, the theme is on a github repository that you can clone and install with.

Octopress Theme Site

One big headache with Octopress was that its syntax highlighting was not part of the theme, and that the default theme was the Solarize theme which was fairly ugly and bluish. It also wasn’t straightforward to configure. It turned out that the syntax configuration is in sass/syntax.scss and the variables in it are stored in sass/solarized.scss.

In order to solve this problem, some googling turned up some people solving this issue by switching the markdown generator to CodeRay. But it did not change the theme for me and it seemed like a very messy change to have, so I decided to roll my sleeves up and edit the scss files myself, which came out pretty well.

4. Wordpress Migration

The final part of my blog setup was to see if I could transfer all the data from my old WordPress blog to this one. It turns out I could. Jekyll had a set of tools that allowed me to import the XML file produced by the WordPress export function. This automatically added the pages to source/pages and posts to source/posts in html format.

This is not desirable because in HTML format the newlines are ignored and the posts look fairly retarded, so I renamed all the extensions to .md.

$ gem install jekyll-import --pre
$ gem install hpricot
$ ruby -rubygems -e 'require "jekyll/jekyll-import/wordpressdotcom";
    JekyllImport::WordpressDotCom.process({ :source => "wordpress.xml" })'

However, the export was far from perfect. My WordPress install used a lot of plugins, meaning the content was littered with shortcodes that did not look like part of any content. I had to touch nearly every generated file that was imported to get everything nice and polished again.

5. Disqus

Disqus is a hosted commenting system that provides a very nice commenting theme that blends in to most Jekyll sites, that’s the sole reason I chose it. I thought facebook comments were nice and would provide me with more exposure but it just looked ugly.

Setting Disqus up was fairly easy. It involved registering on Disqus and adding the shortcode of Disqus to the _config.yml. There was one very undocumented frustration though, the comments were not showing up. It turns out that instead of having trailing ‘/’ on the URLs like other people who had this problem, the imported posts had no ‘comments: true’ line on them! This meant Disqus was disabled for all those posts. I had to then go through all of the posts and add comments to enable Disqus.

Importing the Wordpress comments into Disqus was fairly trivial as there was a Disqus plugin for Wordpress that allowed us to pull all of the Wordpress plugins in. In addition, I could then rename the domains on all the comments that were associated with the Wordpress site to being associated to my site, which was a really handy tool to have.

Conclusion

Overall, this actually took an entire Saturday morning, which is pretty heavy for a “Simple” framework for blogging on static sites. The only thing that looked simple was the amount of stuff that you end up with on the website, which gives the impression that not a lot of work went into building the site.

I guess it wouldn’t be too far fetched to say that even with top grade tools, if a little customization is required, the amount of work to set up a website increases exponentially in relation to the customization you do.

With blogging on the site, one of the things I miss is the ability to see images next to the text that I type, that Wordpress’s web interface or Bunnyblogger affords.

Online vs Batch learning

2013-03-18T00:00:00+00:00

While debugging neuron, my new neural network simulation application, I found some (visually) interesting differences between online and batch learning. While batch learning is usually touted as a better form of learning, I found that the two don’t seem to make much difference, except for a steppy curve from online learning, as I would expect as the gradient changes differently if you keep calculating the values in a cycle instead of calculating the combined gradient of all the training data.

Here are the results from training a 2-2-1 network with biases with XOR as training data:

with the weights initialized to:

double[] wx0 = { 0.1, 0.2 }; double[] wx1 = { 0.3, 0.4 }; double[] wx2 = { 0.5, 0.6 };

double wh0 = 0.7; double wh1 = 0.9; double wh2 = 1.1;

Where wx are the values between the input and hidden layer and wh are the values between the hidden and output layer.

Here is the network topography:

Here is the training error:

This seems to make Cappuccino faster...

2012-12-19T00:00:00+00:00

My lack of knowledge about Cappuccino’s implementation details may play a role, but

window.setTimeout = window.setNativeTimeout;

Seems to make my Cappuccino apps more responsive.

The Identity of an Array

2012-12-12T00:00:00+00:00

Given two arrays of integers: a and b, what does (a == b) mean?

Most of us will say, it depends on the programming language you are using, and that is true. In the case of C, it means a and b are the pointers to the start of the same array. In Java and C# it means the arrays are Reference Equal, basically meaning that the pointer to the arrays a and b are the same.

But how is that really useful? If you passed around these two arrays and checked if the references are the same, you are really just comparing the object with itself or not. The object is never going to be equal to anything else but itself. Even another array created in a different part of the program that has exactly the same integers in it in the same order will be a different array.

One would agree that [1, 2, 3] and [1, 2, 3] are the same but if these two arrays were created at different places the == operator will return false. Why is the extra information that the arrays are made in two different places important to the == operator? It makes no sense when you look at arrays as a list of numbers, and not a pointer into memory, which is an implementation detail.

So if we were to write a better == operator for arrays, what would it be?

[1,2,3,3] == [1,2,3,3] -> true
[1,2,3,3] == [3,2,1,3] -> ?
[1,2,3,3] == [1,2,3,1,2,3] -> ?
[1,2,3,3] == [2,3,1] -> ?
[1,2,3,3] == [1,2,3] -> ?
[1,2,3,3] == [1233] -> ?
[1,2,3,3] == "1233" -> ?

We suddenly see that the array’s identity changes depending on what you are using it for. Let us replace those question marks with context.

[1,2,3,3] == [1,2,3,3] -> true if the array is an ordered list
[1,2,3,3] == [3,2,1,3] -> true if the array is an unordered list
[1,2,3,3] == [1,2,3,1,2,3] -> true if the array is a bag of numbers
[1,2,3,3] == [2,3,1] -> true if the array is an unordered set
[1,2,3,3] == [1,2,3] -> true if the array is an ordered set
[1,2,3,3] == [1233] -> true if the array is a collection of digits
[1,2,3,3] == "1233" -> true if the array is a string

The array is too general. It is actually really important that we know what the array represents. A whole bunch of things are arrays but are passed around as plain arrays. This causes maintainer programmers to wonder what exactly they represent.

Some cite performance reasons. Fair enough, passing a struct that contains an array and a set of methods to manipulate it in C is slower than just passing the struct around. C is a very low level language where implementation detail really mixes with requirements.

But why is it that so many other modern languages have this problem? Why isn’t there a language that treats a set like a set, and has a compiler that really just compiles it down into an array?

Interestingly enough, this problem does not stop at arrays. Numbers have different meanings in different contexts too. Take for example the function Math.Cos in C#. This function takes a double in radians.

Why is it that the function signature is

double Cos(double angle)

and not

ratio Cos(radians angle)

where radians is just a type of double?

If the compiler was fed this information the first place, programs like this:

degrees a = 40;
ratio r = Cos(a);

will cause the compiler to either throw a compiler error for passing degrees into a radians parameter, or generate code to convert degrees to radians.

Exceptionless Programming

2012-11-28T00:00:00+00:00

I am a big fan of writing code that generates no runtime errors. This is an implementation of a function that takes the first element of an array.

template<typename T>
T first(std::vector<T> array)
{
  if (array.size() == 0) { throw EmptyArrayException(); }
  return array[0];
}

This is the standard way we are taught to write programs. However, there is another way to write this function that makes the exception unneccessary.

template<typename T>
void first(std::vector<T> array, std::function<void(T)> returnCallback)
{
  if (array.size() == 0)
  {
      return;
  }
  returnCallback(array[0]);
}

Suddenly the program flow is dictated by the emptiness of the array. Besides knowing well that this function will never throw an exception, the program that uses it will be structured in such a way that you can guarantee within the scope of the returnCallback function that was passed in, the function will always have the first element of the array. You don’t even have to check for nullness.

int calcCheckSum(std::vector<int> sorted)
{
  int result = 0;
  first<int>(sorted, [&amp;](int a)
  {
      int& res = result;
      last<int>(sorted, [&amp;](int b)
      {
          res = a + b;
      });
  });
  return result;
}

And as a perk, if this function compiles, you know that it will run without errors.

Nice Visualization of a Cross Product

2012-11-18T00:00:00+00:00

[This] is a very nice Java applet that allows you to see what a cross product is. The applet shows A x B = C (i.e. C is the cross product of A and B) in 3 dimensions. You can drag A and B around to see the effects on C.

This UI is an example of very good interactivity. It makes it very easy for the user to see what happens and since its interactive, the feedback is instant and a little playing around with it gives the user an intuition of how cross products work (not how they are done).

Instead of a program where you enter numbers, dragging arrows around give you a much better view of whats going on, and allows the user to relate a real-life situation to it better too.

Blurred Text Effect

2012-08-13T00:00:00+00:00

I came across a strange site that blurred the text of answers and asked the reader to sign up to see it. It was strange because the text was already there, just blurred via css. I thought it was an interesting little snippet so I decided to record it.

This is how it blurs.

.blurred_text {
  color: transparent;
  text-shadow: 0 0 7px #777;
}

Regex Escapees

2011-11-22T00:00:00+00:00

Sometimes when one writes regexes, Its hard to know what needs to be escaped and what doesn’t. But I’ve solved that problem while I was writing a program that generated my regexes for me. Now I have a snippet that both humans and programs can use to write regexes!

This is for the .NET flavor of Regex.

private static string SanitizeToken(string token)
{
  return token
                  .Replace(@"\", @"\\")
                  .Replace("*", @"\*")
                  .Replace("?", @"\?")
                  .Replace("+", @"\+")
                  .Replace("{", @"\{")
                  .Replace("}", @"\}")
                  .Replace("^", @"\^")
                  .Replace("$", @"\$")
                  .Replace(".", @"\.")
                  .Replace("[", @"\[")
                  .Replace("]", @"\]")
                  .Replace("(", @"\(")
                  .Replace(")", @"\)")
                  .Replace("|", @"\|");
}

Making Color From Alpha, Hue, Saturation and Brightness

2011-11-22T00:00:00+00:00

I find myself wanting to do this sometimes, so here’s a bit of code (I’m sure I stole this code from somewhere else);

public static Color ColorFromAhsb(int a, float h, float s, float b) {

  if (0 > a || 255 < a) {
      throw new ArgumentOutOfRangeException(&quot;a&quot;);
  }
  if (0f > h || 360f < h) {
      throw new ArgumentOutOfRangeException(&quot;h&quot;);
  }
  if (0f > s || 1f < s) {
      throw new ArgumentOutOfRangeException(&quot;s&quot;);
  }
  if (0f > b || 1f < b) {
      throw new ArgumentOutOfRangeException(&quot;b&quot;);
  }

  if (0 == s) {
      return Color.FromArgb(a, Convert.ToInt32(b * 255),
          Convert.ToInt32(b * 255), Convert.ToInt32(b * 255));
  }

  float fMax, fMid, fMin;
  int iSextant, iMax, iMid, iMin;

  if (0.5 < b) {
      fMax = b - (b * s) + s;
      fMin = b + (b * s) - s;
  } else {
      fMax = b + (b * s);
      fMin = b - (b * s);
  }

  iSextant = (int)Math.Floor(h / 60f);
  if (300f <= h) {
      h -= 360f;
  }
  h /= 60f;
  h -= 2f * (float)Math.Floor(((iSextant + 1f) % 6f) / 2f);
  if (0 == iSextant % 2) {
      fMid = h * (fMax - fMin) + fMin;
  } else {
      fMid = fMin - h * (fMax - fMin);
  }

  iMax = Convert.ToInt32(fMax * 255);
  iMid = Convert.ToInt32(fMid * 255);
  iMin = Convert.ToInt32(fMin * 255);

  switch (iSextant) {
      case 1:
          return Color.FromArgb(a, iMid, iMax, iMin);
      case 2:
          return Color.FromArgb(a, iMin, iMax, iMid);
      case 3:
          return Color.FromArgb(a, iMin, iMid, iMax);
      case 4:
          return Color.FromArgb(a, iMid, iMin, iMax);
      case 5:
          return Color.FromArgb(a, iMax, iMin, iMid);
      default:
          return Color.FromArgb(a, iMax, iMid, iMin);
  }
}

OpenGL weirdness

2011-10-06T00:00:00+00:00

Try this:

    dispList = glGenLists(1);

    glNewList(dispList, GL_COMPILE);

    for(int x=-768; x<768; x++)
    for(int y=-768; y<768; y++)
    {
        glBegin(GL_QUADS);
            glColor3f((double)(rand() % 100) / 500,
      (double)(rand() % 100) / 100, 0);
            glVertex3i((x)  * 1, (y)  * 1, 0);
            glVertex3i((x+1)* 1, (y)  * 1, 0);
            glVertex3i((x+1)* 1, (y+1)* 1, 0);
            glVertex3i((x)  * 1, (y+1)* 1, 0);
        glEnd();
    }

    glEndList();

and render your display list. Then, try this:

    dispList = glGenLists(1);

    glNewList(dispList, GL_COMPILE);
    glBegin(GL_QUADS);    // <- the subtle difference is here

    for(int x=-768; x<768; x++)
    for(int y=-768; y<768; y++)
    {
            glColor3f((double)(rand() % 100) / 500,
      (double)(rand() % 100) / 100, 0);
            glVertex3i((x)  * 1, (y)  * 1, 0);
            glVertex3i((x+1)* 1, (y)  * 1, 0);
            glVertex3i((x+1)* 1, (y+1)* 1, 0);
            glVertex3i((x)  * 1, (y+1)* 1, 0);
    }
    glEnd();        // <- and here

    glEndList();

I don’t know if its my graphics drivers, or graphics card or perhaps just me, but the top code maxed out at 60 fps while the bottom code ran at ~10 fps.

Oh, and don’t ask me why I’m using display lists.