Wednesday, September 27, 2006

Implementing Trees in SQL

Implementing hierarchical trees in SQL can be slightly tricky. Fortunately some good ideas have been presented by Joe Celko. I came across to ways to implement trees from his books.

One is the Adjacency List model, which in simple words makes use of seperate tables to model the nodes and the edges of the trees. One table could hold the Node data, and one table describing the Edge relations. This approach usually helps in normal situations where complex tree operations are not required.

In case one has to perform some complex calculations on the tree, another model - the Nested Tree model is suggested. The Nested Tree Model is quite interesting and very convenient to implement. The idea here is "Containment represents subordination". Quoting his book " Visualise the nested sets model as a little worm with Bates automatic numbering stamp crawling along the "boxes-and-arrows" version of the tree. The worm starts at the top, the root, and makes a complete trip around the tree. When he comes to a node he puts a number in the cell on the side he is visiting and his numbering stamp increments itself. Each node will get two numbers, one for the right side and one for the left side. Computer science majors will recognize this as a modified preorder tree traversal algorithm."


for further reading check out these pointers ....

A good starter tutorial on Adjacency List Model

Joe Celko’s SQL for Smarties Advanced SQL Programming Second Edition

Joe Celko's Trees and Hierarchies in SQL for Smarties

Wednesday, September 20, 2006

Machine Learning & Kernel Methods

Standard learning systems operate on input data after they have been transformed into feature vectors. But in many cases the input data cannot be described easily by explicit feature vectors for example in bio-sequences, images etc. Moreover, the construction of Feature extraction module can be complex and even risk losing relevant information during the process.

A solution is in the form of Kernel methods/functions. A Kernel function maps the data into a high dimension space where classes are linearly separable. The High dimension data (e.g. text) can use a simple dot-product as kernel. The Optimisation algorithm only accesses the training data through the
Kernel Matrix.

A kernel k(x,y) is a similarity measure, defined by implicit mapping f from the original space to a vector space (feature space) : k(x,y)=f(x)•f(y)

This similarity measure and the mapping include:
  • Invariance or other a priori knowledge
  • Simpler structure (linear representation of the data)

check out for further information on Kernel methods, SVM amd Machine learning

Here are some pointers which include course materials on Machine Learning done this summer in some Universities.