4/21/2014

NLP - the second assignment

I finished weeks 3 and 4 of the NLP course.

I uploaded the source code for the second assignment here.

4/08/2014

Natural Language Processing

I am taking a Natural Language Processing course here. I have finished weeks 1 and 2 so far.

Although I cannot get credit because it is actually last year's course, I am doing the assignments. The best way to learn something is to do it firsthand!

I uploaded the source code for the first assignment here. It is written in Python.

2/06/2014

Playing with Pig

Last weekend, I installed Pig. I followed the instructions here. It was really straightforward. It is really handy because it can run in local mode without Hadoop running.

After playing with it a little in local mode, I ran Pig Script 1 from the tutorial on my Hadoop installation. It took about 10 minutes, but it just worked.
script1-hadoop-results is the result of Pig Script 1.
Then I got the Japanese zip code data from Japan Post (the Japanese counterpart of the USPS). The data consists of the local government code, zip code, prefecture name, city name, and town name. I ran the Pig Latin commands below to find out how many unique zip codes each prefecture has.

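-- Load the Japan Post CSV; fields are separated by commas.
RAW = load 'ken_all_rome.csv' using PigStorage(',');
-- Keep only the prefecture name (field $4) and the zip code (field $1).
P = foreach RAW generate $4 as pref, $1 as code;
-- Several rows can share a zip code, so drop duplicate (pref, code) pairs.
P2 = distinct P;
-- Group the pairs by prefecture and count the zip codes in each group.
PG = group P2 by pref;
PC = foreach PG generate group as pref, COUNT(P2) as num_codes;
store PC into 'poscnt-result' using PigStorage();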

Hokkaido has the most, with 8006 zip codes; Kagawa has the fewest, with 709. Tokyo has 3731, which looks too few considering its population. Its area is relatively small, so perhaps the number of zip codes tracks area more closely than population.

the number of zip codes each prefecture has
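For comparison, here is a minimal single-machine Python sketch of the same DISTINCT, GROUP, and COUNT steps. It assumes the same column layout the Pig script uses (field 4 as the prefecture, field 1 as the zip code); the function names are hypothetical, the file encoding is an assumption, and the sample rows are made up rather than taken from the Japan Post file.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Column positions copied from the Pig script ($4 and $1); an assumption about the file.
PREF_FIELD = 4
CODE_FIELD = 1


def count_zip_codes_per_prefecture(rows: Iterable[List[str]]) -> Dict[str, int]:
    """Count distinct zip codes per prefecture (DISTINCT, GROUP, COUNT in one pass)."""
    codes_by_pref: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if len(row) <= PREF_FIELD:
            continue  # skip blank or short lines
        # Adding to a set drops duplicate (pref, code) pairs, like DISTINCT.
        codes_by_pref[row[PREF_FIELD]].add(row[CODE_FIELD])
    # The size of each set is the per-prefecture COUNT.
    return {pref: len(codes) for pref, codes in codes_by_pref.items()}


def count_from_file(path: str, encoding: str = "utf-8") -> Dict[str, int]:
    # csv.reader handles quoted fields; the default encoding is an assumption,
    # so pass whatever your copy of the file actually uses.
    with open(path, newline="", encoding=encoding) as f:
        return count_zip_codes_per_prefecture(csv.reader(f))


if __name__ == "__main__":
    # Made-up rows; only fields 1 (zip code) and 4 (prefecture) matter here.
    sample = [
        ["x", "0000001", "x", "x", "HOKKAIDO"],
        ["x", "0000001", "x", "x", "HOKKAIDO"],  # duplicate pair, counted once
        ["x", "0000002", "x", "x", "HOKKAIDO"],
        ["x", "0000003", "x", "x", "KAGAWA KEN"],
    ]
    print(count_zip_codes_per_prefecture(sample))  # {'HOKKAIDO': 2, 'KAGAWA KEN': 1}
```

One difference from the Pig version: PigStorage(',') splits on every comma, while csv.reader respects quoted fields, so the two could disagree if a field itself contains a comma.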

1/24/2014

Installed Hadoop

I was curious about big data, so I installed Hadoop 1.2.1 on my Linux Mint 16 machine.

Hadoop processes

I ran the famous 'word count' example... and it succeeded.
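Word count is the "hello world" of MapReduce: mappers emit a (word, 1) pair for every word, the framework groups the pairs by word, and reducers add up the counts. The sketch below imitates those three phases in a single Python process. It only illustrates the idea; it is not Hadoop's API (the bundled example is written in Java), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, like the mapper.
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key; in Hadoop the framework does this between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    # Sum the partial counts for one word, like the reducer.
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "the end"]
    # Keys come out in first-seen order here; Hadoop would sort them.
    print(word_count(text))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

On a real cluster the map and reduce calls run in parallel on different machines, which is the whole point; the logic per word stays this simple.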

How do I use Hadoop to solve puzzles? I am not sure yet.

1/21/2014

Finished entering the dekabiro data, but...

I finally finished entering the dekabiro data, but the solver cannot solve it. While entering the data, I made some mistakes and fixed the ones I found. I think a few mistakes remain, and I need to find them all.

The dekabiro has 90 columns and 124 rows. Entering one row took a little less than 5 minutes, so I spent about 10 hours in total. Checking the data should take a few more hours...
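A quick check of that estimate, taking 5 minutes per row as an upper bound:

$$
124\ \text{rows} \times 5\ \text{min/row} = 620\ \text{min} \approx 10.3\ \text{h}
$$

Since each row actually took a little less than 5 minutes, the real total lands just under this, consistent with about 10 hours.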

12/16/2013

Dekabiro Kakuro - Super Giant Kakuro

Today I got dekabiro (super giant) Kakuro and Slitherlink puzzles, which I bought from Nikoli. The Kakuro is 90 x 124, which is really huge.

Dekabiro kakuro
I am going to test my Kakuro solver "Kakkuro" on the dekabiro. The biggest challenge is that I need to enter the whole problem by hand! I am almost sure I will make mistakes and have to go through the problem again and again to find them... what a daunting task!

12/08/2013

Next target - slitherlink solver

I am satisfied with my Kakuro solver. There are a few possible improvements, as I mentioned in the previous post, but their effect wouldn't be very visible. So I want to try writing a solver for a different puzzle.

Sudoku (Wiki) is, of course, a candidate. It is one of the most popular puzzles in the world. But I don't want to make a Sudoku solver because:
  1. There are already a lot of good solvers.
  2. It is less challenging than Kakuro. Sudoku can be solved with the same tricks as Kakuro. In a sense, Sudoku is a variation of Kakuro: a Kakuro whose 9x9 block of white cells has black clue cells only along the top and left edges, with the solution values of some cells given to guarantee a unique solution.
  3. Actually, I created a Sudoku solver before, in Microsoft BASIC (no, not Visual Basic; it was a BASIC with line numbers). It ran on a Z80 machine. Machines were very slow back then, but the solver still ran fairly fast.
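The Kakuro analogy rests on the run rule: every run must use distinct digits 1-9 that add up to its clue. A common first step in a Kakuro solver (not necessarily what Kakkuro does) is to enumerate the digit sets that can satisfy a clue; the hypothetical helper run_combinations below is a minimal sketch of that. For a full 9-cell run the only possibility is {1, ..., 9}, which is exactly Sudoku's row and column rule. Sudoku additionally requires each 3x3 box to contain 1-9, which plain Kakuro runs do not express.

```python
from itertools import combinations
from typing import List, Tuple


def run_combinations(clue: int, length: int) -> List[Tuple[int, ...]]:
    """Return every set of distinct digits 1-9 with the given length that sums to clue."""
    # combinations() yields each digit set once, in increasing order, with no repeats.
    return [c for c in combinations(range(1, 10), length) if sum(c) == clue]


if __name__ == "__main__":
    # Two cells summing to 3 can only hold {1, 2}.
    print(run_combinations(3, 2))   # [(1, 2)]
    # Two cells summing to 10 leave several options (5 + 5 is not allowed).
    print(run_combinations(10, 2))  # [(1, 9), (2, 8), (3, 7), (4, 6)]
    # A 9-cell run must sum to 45 and use every digit once, like a Sudoku row.
    print(run_combinations(45, 9))  # [(1, 2, 3, 4, 5, 6, 7, 8, 9)]
```

A solver would intersect these sets for the across and down runs through each cell to narrow its candidates.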
So what's next? I am considering Slitherlink (Wiki/nikoli.com). It seems more challenging than Kakuro.

I am also considering C# instead of C++ as the language. I have been using C++ for many years, but I want to try something new. It should be fun!