things.data.processing
Class Crawler

java.lang.Object
  extended by things.data.processing.Crawler

public class Crawler
extends java.lang.Object

Universal crawler.

Version:
1.0

Version History

EPG - Initial - 21 OCT 06
 
Author:
Erich P. Gatejen

Constructor Summary
Crawler()
           
 
Method Summary
 void crawl(java.lang.String pathroot, Logger logger, boolean loop)
          Start a crawl on the filesystem.
 void dontMatch(java.lang.String text)
          Require that every filename does NOT match the regex passed.
 void dropSuffix(java.lang.String suffix)
          Ignore any path that has this as a suffix.
 void match(java.lang.String text)
          Require that every filename match the regex passed.
 java.io.File next()
          Get the next item in the crawl.
 void start()
          Start from the root.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Crawler

public Crawler()

Method Detail

crawl

public void crawl(java.lang.String pathroot,
                  Logger logger,
                  boolean loop)
           throws java.lang.Throwable
Start a crawl on the filesystem and log progress. Calling this method again at any time restarts the crawl and resets any matching configuration.

Parameters:
pathroot - the path from which to start the crawl.
logger - the logger to use.
loop - if true, go back to the root and start over once the crawl is depleted.
Throws:
java.lang.Throwable - for any problem.

start

public void start()
           throws java.lang.Throwable
Start from the root. This is the same as a reset.

Throws:
java.lang.Throwable

match

public void match(java.lang.String text)
           throws java.lang.Throwable
Require that every filename match the regex passed. You may call this more than once, in which case it will attempt to combine the expressions, but the result is not guaranteed. You must call crawl at least once before calling this method.

Parameters:
text - the regex. It cannot be null or empty.
Throws:
java.lang.Throwable

dontMatch

public void dontMatch(java.lang.String text)
               throws java.lang.Throwable
Require that every filename does NOT match the regex passed. You may call this more than once, in which case it will attempt to combine the expressions, but the result is not guaranteed. You must call crawl at least once before calling this method.

Parameters:
text - the regex. It cannot be null or empty.
Throws:
java.lang.Throwable
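
The interaction between match and dontMatch can be sketched as follows. This is an illustrative stand-in, not the Crawler implementation: the class name NameFilterSketch and the accepts method are hypothetical, repeated calls are assumed to be combined by regex alternation, and a full match against the filename is assumed. The documentation above does not confirm either assumption.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in that mimics the documented match/dontMatch contract.
public class NameFilterSketch {
    private Pattern require; // built up by match(...)
    private Pattern reject;  // built up by dontMatch(...)

    // The documentation says the regex cannot be null or empty.
    private static void check(String text) {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("regex cannot be null or empty");
        }
    }

    // ASSUMPTION: repeated calls are joined with alternation.
    private static Pattern stitch(Pattern existing, String text) {
        if (existing == null) {
            return Pattern.compile(text);
        }
        return Pattern.compile("(?:" + existing.pattern() + ")|(?:" + text + ")");
    }

    public void match(String text) {
        check(text);
        require = stitch(require, text);
    }

    public void dontMatch(String text) {
        check(text);
        reject = stitch(reject, text);
    }

    // A filename passes if it satisfies the required pattern (when set)
    // and does not satisfy the rejected pattern (when set).
    public boolean accepts(String filename) {
        if (require != null && !require.matcher(filename).matches()) {
            return false;
        }
        if (reject != null && reject.matcher(filename).matches()) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        NameFilterSketch filter = new NameFilterSketch();
        filter.match(".*\\.txt");
        filter.match(".*\\.csv");
        filter.dontMatch("tmp.*");
        System.out.println(filter.accepts("notes.txt"));
        System.out.println(filter.accepts("tmp1.txt"));
    }
}
```

Because the real class makes no guarantee about how multiple expressions are combined, a single call with one explicit alternation is likely the safer choice in practice.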

dropSuffix

public void dropSuffix(java.lang.String suffix)
                throws java.lang.Throwable
Ignore any path that ends with this suffix. Only one suffix can be set at a time.

Parameters:
suffix - the suffix to ignore. Passing null turns the filter off. Matching is not case sensitive.
Throws:
java.lang.Throwable
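
The suffix rule described above (case-insensitive, disabled by null) can be illustrated with a small check. The class DropSuffixSketch and the method isDropped are hypothetical names used for illustration only; how Crawler performs the comparison internally is not documented here.

```java
import java.util.Locale;

// Hypothetical helper showing the documented dropSuffix semantics.
public class DropSuffixSketch {

    // Returns true when the path should be ignored.
    // A null suffix means the filter is turned off.
    public static boolean isDropped(String path, String suffix) {
        if (suffix == null) {
            return false;
        }
        // Lower-case both sides so the comparison is not case sensitive.
        return path.toLowerCase(Locale.ROOT)
                   .endsWith(suffix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isDropped("/data/report.BAK", ".bak"));
        System.out.println(isDropped("/data/report.txt", ".bak"));
        System.out.println(isDropped("/data/report.bak", null));
    }
}
```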

next

public java.io.File next()
                  throws java.lang.Throwable
Get the next item in the crawl.

Returns:
the next File node, or null when the crawl is complete.
Throws:
java.lang.Throwable - if there was a serious problem.
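
The pull-style contract of next (return a File until the crawl is depleted, then null) suggests a simple drain loop. The sketch below uses a hypothetical stand-in class, PullCrawlSketch, built on java.io.File so that it runs on its own. It omits the logger and loop parameters, and it assumes only regular files are returned. None of this is confirmed to match the real Crawler internals.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in that mimics the crawl/next calling pattern.
public class PullCrawlSketch {
    private final Deque<File> pending = new ArrayDeque<>();

    // Stand-in for crawl(pathroot, logger, loop): restart from the root.
    public void crawl(String pathroot) {
        pending.clear();
        pending.push(new File(pathroot));
    }

    // Returns the next regular file, or null when nothing is left.
    // ASSUMPTION: directories are expanded but not returned themselves.
    public File next() {
        while (!pending.isEmpty()) {
            File node = pending.pop();
            if (node.isDirectory()) {
                File[] children = node.listFiles();
                if (children != null) {
                    for (File child : children) {
                        pending.push(child);
                    }
                }
            } else if (node.isFile()) {
                return node;
            }
        }
        return null;
    }

    // Typical caller loop: pull until null.
    public static int countFiles(String root) {
        PullCrawlSketch crawler = new PullCrawlSketch();
        crawler.crawl(root);
        int count = 0;
        for (File f = crawler.next(); f != null; f = crawler.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("crawl-sketch").toFile();
        File sub = new File(root, "sub");
        sub.mkdir();
        new File(root, "a.txt").createNewFile();
        new File(sub, "b.txt").createNewFile();
        System.out.println(countFiles(root.getPath()));
    }
}
```

With the real class, the same loop shape would apply after calling crawl and any match, dontMatch, or dropSuffix configuration, in that order, since crawl resets the matching configuration.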


Things.