I found another Chord implementation in Java, it is based on an implementation of JMS.
It is a good implementation cause it tightly follows J2EE best practice: there is a clean separation between transportation tier (built over JMS infrastructure) and BL tier. Anyway it does not follows all Chord 's specification: ie it does not use 160bit Identifiers cause its identifiers are Integers (which is 2^32 space).
The name of this implelentation is UberChord it's distributed with GPL License and can be found here.
UberChord is based on UberMQ which is a clean-room implementation of the JMS 1.1 API (topics and queues), based on Java NIO for speed and scalability. It supports TCP, SSL and LRMP multicast connectivity, and is highly pluggable so you can customize various aspects to your needs.
It took long since my last post... Many things happened.
First of all I went on holiday: I went in Vietnam... with my love!
But I also worked at my new blog... I choosed Wordpress! It's open source and is in PHP so I preferred to change my hosting server (OS is now Linux and DB is MySQL). I have to say that WP crew did a terribly good work: Wordpress is a first class Content Management System.
I'm still working at my open source project... Now I'm trying to implement in Java the Internet Indirection Infrastructure, and I'm trying to tune Chord implementation... I'm alone in my work if anyone wants to join I'll be glad to share my work.
Hi all! I'm back... Not many posts in last days... I'm sorry! I didn't thought having a blog could be so hard! Actually I was very busy with my daily job. I went to a Microsoft seminar about its Shared Web Hosting program. Well..., Microsoft copies everything! They try to "resell" throught their Service Licences also open source projects. Here is the facts:
- Some guys made a clone of PHPNuke (http://phpnuke.org/) in ASP .NET
- This project is named DOTNETNUKE. Google it: they bought AD-Sense keywords, it is a rich project!
- M$ included this project in its offer for Web Hosting program.
Actually I have to say that DOTNETNUKE guys did a very good job and that M$ speaker said that DOTNETNUKE is Open Source and with a BSD License thus there is no real sell of this software, anyway, to a not careful listener, it could seem that a "NEW" "INNOVATIVE" tool is going to come from M$. Great Marketing people, M$ people!
"Risin' up -- back on the street Did my time, took my chances... ...They stack the odds still we take to the street For the kill, with the skill to survive It's the Eye of the Tiger It's the Thrill of the fight..."
By Survivor - Eye Of The Tiger, Lyrics It's a joke! I'm going to dig in my Tiger Tree Hashing algorithm implementation. My work consists of five objects:
- HashInfo
- HashStore
- HashTree
- HashTreeRAF
- TTH
HashInfo holds HashTree metadata is not important in this post 'cause is only related to JavaDC3 internal data communication model. HashStore is a singleton (GoF style). It is used to access the file in which are stored hash data: it uses HashTreeRAF which represents the actual file, this last package local object that wraps RandomAccessFile and it is used to read and write HashTrees. Here the problem to solve was that DC++ protocol exstensions requires that a client can download also multiple parts of a file or blocks (aka "Segmented download"), so the need was to have a store which holds all Tree blocks hash datas. If you're familiar with C++ look at DC++ guys work here. You will see that thier and mine work are very similar. Actually, thier one is original... I learned by examples... Ok! I confess! I copied their work: I know this is not very noble but I had to make my client work the same way as DC++, and reading in C++ then writing in Java is not so simple! HashTree is the ojbect that holds data of Hashed File. It is JavaDC3 internal rapresentation of a MerkleTree. Reading data from the Hash Store will create and "fill" a HashTree objects. TTH computes the THEX TigerTree Algorithm, it relies on the implementation of the underlying Tiger algorithm from the Cryptix JCE Marco
Last months I was mainly involved in a project called JavaDC3 which is a java porting of DC++. In this project I worked on implementation of Tiger Tree Hashing alghorithm ( TTH). TTH and THEX is clearly explained here. All these technology are based on the definition of the Merkle Hash Tree. The Merkle Hash Tree, invented by Ralph Merkle, is a hash construct that exhibits desirable properties for verifying the integrity of files and file subranges in an incremental or out-of-order fashion. I made my google search and found that in sf.net a standard implementation. The problem was that standard implementation was not usable in a production environment infact performances were not suitable for a real project and with a big file (1.5 GB) on a standard machine system crashes. So I needed to re-implement the alghorithm. (Also guys from Limewire did this and maybe better than me...) Here is what I did: One of the main problems was that standard implementation uses too much memory because it uses a Vector to hold "hashed" blocks (aka leaves). And it "hashes" all the file in one cycle then it uses other Vectors to hold hashed result of two leaves and so on along the tree until it has only hash the final one. On contrary, my implementation uses only two Stacks. I fill the two stacks continuosly: the first one with leaves hashed from file the second with blocks hashed with two leaves and so on consuming all the bits of the file but not the memory. I'll go more in deep in next posts. Marco.
Anycasti3 provides support for anycast by allowing applications to specify a prefix for each trigger identifier. Packets are then matched to the identifiers according to the longest matching prefix rule. Suppose receivers Ri insert triggers whose identifiers share a common prefix p. A packet with the identifier p|a (where p is the prefix and a is the suffix) is matched according to the longest prefix rule and forwarded to the corresponding receiver. Receivers can choose suffixes to implement different anycast policies. For example, if the suffixes of all triggers and packets (requests) are randomly chosen, this scheme provides load balancing within a factor of log N, where N is the number of receivers. Similarly, if suffixes encode the host locations (e.g. zip code where the host which inserts a trigger or sends a packet is located) the packets will be delivered to a nearby receiver.  i3 Anycast
Multicast
i3 can be used to build a multicast tree using a hierarchy of triggers. Note that the number of triggers with the same identifier represents the replication factor of a packet at an infrastructure node.  i3 multicast: The maximum replication factor in the example is three as there are three triggers with identifier id.
Mobilityi3 provides natural support for mobility. When a host changes its address, the host needs only to update its trigger. When the host changes its address from R1 to R2, it updates its trigger from (id, R1) to (id, R2). As a result, all packets with identifiers id are correctly forwarded to the new address. Note that this change is completly transparent to the sender S.  Mobility for i3
More will come about Chord and its implementation. Now I'm going to work on Internet Indirection Intfrastructure ( i3) over Chord. Internet Indirection Infrastructure ( i3) offers a rendezvous-based communication abstraction. i3 ease the deployment of services like multicast, anycast, and mobility. i3 works like this: " instead of explicitly sending a packet to a destination, each packet is associated with an identifier; this identifier is then used by the receiver to obtain delivery of the packet".  Basic i3: A host R that inserts a trigger (id, R) in the i3 infrastructure to receive all packets packets with identifier id.
In my implementation I preferred to put all Chord's business logic in three objects:
- Node,
- ChordId,
- FingerTable
then I decided to delegate all communication and stabilization task to four other components (which are objects in Picocontainer's language)
- HubManager,
- ServerManager,
- TaskManager,
- StabilizationManager.
More in deep:
- ChorId contains all "arithmeticians" methods according to Chord's clockwise mathematics. Specially "belongs", "belongsLeftInclusive" and "belongsRightInclusive" which implement respective x € (a,b), x € [a,b) and x € (a,b].
- Node contains all Chord specific methods among which "join", "leave", "findSuccessor", etc. according to Chord protocol specification document and also it holds FingerTable object.
- HubManager holds the Node object and it is delegated by Node to start comunication tasks to find remote information. It creates messages to be send to remote nodes whose pointer is a NodeHandle object. HubManager starts up Node and creates, when it is delegated by Node object, a message and a listener to be pushed in a stack. Each message has a unique key, the same key is assigned to a listener which is used as entry poing to a callback function. Messages are "poped" from stack by TaskManager who does the "hard work" of communication. Listeners are used to handle response from remote node.
- TaskManager and ServerManager do the actual communication. Modifying these two components we can implement an in-process chord network maybe for simulation purposes.
- StabilizationManager manages stabilization jobs using Quartz Scheduler facilities. Each stabilization job delegates HubManager to create messages and listeners to stabilize the network.
More on Chord and Accord project which is were to find code is here
Today I listened on radio that Italy is last country in Europe for Internet and Broadband penetration: it was a journalistic interpretation of an Eurostat report. I work for a "Very Big" Media company in Internet & Multimedia division so I thought "dark clouds in my future!!" I talked with my colleagues and... we were all sad! So what I did? I went on the Internet to search more information... I surfed till Eurostat Report site. I opened the PDF and tried to understand the document... Well! Big surprise! Italy is not so bad!  Use of the Internet by Individuals and Enterprises– EU 25*, 2004 (%) Italy is perfectly in average with EU25! What I understand is that italians surf the Internet at work and Italians surf the Internet when they need to interact with government for example: of course not much for amusing themselves. We can say we use Internet for its original purpose: documentation!  Internet usage in 2004 by individuals and enterprises Maybe there are social reasons, as a colleague of mine says, maybe italians when are at home prefer to do sports or to enjoy themselves going out and meeting other people. I don't think he is so wrong!! Actually I think that another reason is that broadband is not so diffused and is still too much expensive in contrary of other EU country like France, Germany or UK. I don't think we have to compare ourselves with Sweden or Denmark but on the other hand with Spain or France which are more close to us! Maybe we must know deeper what the Internet can offer to us. Italians always think bad about themselves! We are not so bad! Marco. P.S.: Another think that is strange: why RIAA says that piracy on the Internet in Italy is expanding? Maybe italians use the Internet but they don't say they do it!
I made an imlplementation of Chord protocol it can be found here in CVS repository Here is a brief explanation of code: there are two subprojects one named site and one named hub. Site project is a site to be used as a registry where every Chord node says "I'm alive!". I used this approach because I needed a dinamic way to find "the known node" to be used for join method at node's bootstrap. Site is build onto Springframework and it keeps an in-memory registry of alive nodes. At moment it has no way to know is a node is down. Nodes communicate with site via XML on HTTP. Hub project is the actual implementation of Chord protocol. I wanted to concentrate on chord protocol so I used some open source framework and toolkit for common tasks. Hub lifecycle is managed throught Picocontainer , an IOC framework. Stabilization tasks are scheduled by Quartz scheduler; HTTP comunication is made by Jakarta HTTP commons client, logging is delegated to Log4j and administration is made using JMX and MX4J.
The Chord system is an efficient distributed lookup service based on the Chord protocol. The Chord protocol supports just one operation: given a key, it will determine the node responsible for storing the key’s value. The Chord protocol uses a variant of consistent hashing to assign keys to Chord server nodes. Conceptually the Chord system functions analogously to the DNS system. Both systems map names to values. Chord’s algorithms have no special servers, however, in contrast to DNS which relies on a set of special root servers. In addition, Chord doesn’t put restrictions on the format and meaning of names; Chord names are just the keys of key/value pairs. Chord doesn’t attempt to solve the administrative problems of DNS. The Chord protocol is a distributed data location protocol very close to the one used in OceanStore. The Chord protocol has essentially one operation: given a key, it will determine the node responsible for the key. Chord does not itself store keys and values, but provides primitives that allow higher-layer software to build a wide variety of storage system. Each machine acting as a Chord server has a unique 160-bit Chord node identifier, produced by a SHA hash of the node’s IP address. Chord views the IDs as occupying a circular identifier space. Chord defines the node responsible for a key to be that key’s " successor". The successor of a key or node identifier j is the node with the smallest ID that is greater than or equal to j. Chord’s primary task is to find these successors.
This is the first post of a group of "Focus On" about technologies I'm trying to use and understand. First of all let me say that this and future posts are not intended to be academic readings on mile stone papers, but, on the other hand, only the collection of my considerations and implementation of protocols and solutions proposed of different projects which I studied. So what are these post about?
- Distrubuted lookup systems [1] [2]
- Rendez-vous based communication systems[1]
- Distributed Hashtables [1]
- TTH [1]
- UPnP [1]
More in deep, I will write about: Chord [2], Internet Indirection Infrastructure, Bamboo, Pastry, Tapestry, OceanStore, Tiger Tree Hashing, Merkle trees, etc. I'll try to be as clear as I can... Despite of my English! Back soon, Marco.
I'm looking at techologies dealing with P2P. I found Chord. The Chord project aims to build scalable, robust distributed systems using peer-to-peer ideas. Chord is basically a distributed lookup system it is completely decentralized and symmetric, and can find data using only log(N) messages, where N is the number of nodes in the system. So I found it: Chord is the overlay building block I needed to make up a network of peers! To have more infos about Chord look here otherwise stay connected "furthers" will come... Marco.
|