things.data.processing.rfc822
Class AddressParser

java.lang.Object
  extended by things.data.processing.LexicalTool
      extended by things.data.processing.rfc822.AddressParser

public class AddressParser
extends LexicalTool

An RFC 822 address parser.

Submitted addresses may have whitespace at either end of the string; trim it if you wish. Note that CRs and LFs are converted to spaces.
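
The note above can be illustrated with a small standalone sketch. It is not part of AddressParser; the class name AddressInputSketch and the method normalize are hypothetical and only show one way a caller might pre-clean a string, assuming the caller wants the same CR/LF-to-space behavior plus trimming.

```java
final class AddressInputSketch {

    /**
     * Hypothetical helper: replaces each CR and LF with a space (mirroring the
     * conversion described above) and then trims whitespace from both ends.
     */
    static String normalize(String raw) {
        String spaced = raw.replace('\r', ' ').replace('\n', ' ');
        return spaced.trim();
    }

    public static void main(String[] args) {
        // A folded header value: the CRLF in the middle becomes two spaces.
        System.out.println("[" + normalize("  a@example.com,\r\n b@example.org \r\n") + "]");
    }
}
```

Trimming is left to the caller by the parser, so a helper like this would be optional.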

It isn't as much work as it appears. It took me about 30 minutes to map it in a spreadsheet. After another hour, I had the parse language (as seen in the comments below). And another hour after that it was coded and done. I've found only one bug since, which traced back to the original spreadsheet.

Version:
1.0

Version History

EPG - Initial - 12 FEB 05
 
Author:
Erich P. Gatejen

Field Summary
 
Fields inherited from class things.data.processing.LexicalTool
ALLOWED, ASCII_HIGH, BAD, BREAKING, CHAR, CHAR_DNSCHAR, CHAR_DNSCHAR_NUMERIC, CHAR_DNSCHAR_POUND, CLASS_ALPHA, CLASS_CONTROL, CLASS_NONE, CLASS_NUMERIC, CLASS_PUNCTUATION, COLONVALUE, CONTROL, CRBYTEVALUE, DASHVALUE, DNSCHAR, DOLLARBYTEVALUE, HEADER_READ_STATE_CHART, HEADER_READ_STATE_CHARTV2, HP____SPECIAL_DEAD, HP____SPECIAL_PAUSE, HP____SPECIAL_WALKING_DEAD, HP_BROKEN, HP_CLEAR_PAUSE, HP_CLEAR_PAUSE_CRLF, HP_CLOSURE, HP_CR, HP_HEAD_CR, HP_HEAD_CRLF, HP_HEAD_LF, HP_LF, HP_LFCR, HP_NOT_USED, HP_PAUSE, HP_PAUSE_CRLF, HP_PAUSE_CRLFCR, HP_READ, HP_START, LEXICAL_HEADER_TERMINATION, LEXICAL_MAP, LEXICAL_MAP_822_HEADERNAME, LEXICAL_MAP_822_TYPE, LEXICAL_MAP_CLASSIFICATION, LEXICAL_MAP_DNS_TYPE, LEXICAL_MAP_HEXVALUE, LEXICAL_MAP_NAME, LEXICAL_MAP_URI_TYPE, LEXICAL_MAP_URLF_TYPE, LFBYTEVALUE, NO_CHARACTER, NOT_ALLOWED, OPENBBYTEVALUE, OTHER, PIPEBYTEVALUE, SLASHBYTEVALUE, SPACEVALUE, SPECIAL, SPECIAL_AMP, SPECIAL_AT, SPECIAL_BACKSLASH, SPECIAL_CHAR_DNSCHAR_DOT, SPECIAL_CLOSEBRACK, SPECIAL_CLOSEPAREN, SPECIAL_COLON, SPECIAL_COMMA, SPECIAL_DOLLAR, SPECIAL_EQ, SPECIAL_GT, SPECIAL_LT, SPECIAL_OPENBRACK, SPECIAL_OPENPAREN, SPECIAL_PERCENT, SPECIAL_PLUS, SPECIAL_QUEST, SPECIAL_QUOTE, SPECIAL_SEMICOLON, SPECIAL_SLASH, SPECIAL_SPLAT, STRING_CRLF, TABVALUE, URLCHAR, URLFCHAR, VALUE_ASCII_BOTTOM, VALUE_ASCII_HIGH_BOTTOM, VALUE_ASCII_HIGH_TOP, VALUE_ASCII_LOW_BOTTOM, VALUE_ASCII_LOW_TOP, VALUE_ASCII_TOP, WS, WS_CR_CONTROL, WS_LF_CONTROL, WS_SPACE, WS_TAB_CONTROL
 
Constructor Summary
AddressParser()
           
 
Method Summary
static void parseAndSave(StreamSource source, AddressListener addressListener)
          Parse the source for addresses.
 void parser(java.io.InputStream ins, AddressListener addressListener)
          Call with an InputStream.
 void parser(StreamSource source, AddressListener addressListener)
          Parse engine grammar.
Lexical elements: ASCII (0->127), CHAR (32->127 minus WS, SPECIAL), QUOTE, AT, COLON, SEMICOLON, DOT, OPENBRACK, CLOSEBRACK, GT, LT, BACKSLASH, COMMA, OPENPAREN, CLOSEPAREN, WS (space or tab), CR, LF, |SPECIAL| (includes QUOTE, AT, COLON, SEMICOLON, DOT, OPENBRACK, CLOSEBRACK, GT, LT, BACKSLASH, COMMA, OPENPAREN, CLOSEPAREN), !OTHER! (meaning anything not listed).
 void parser(java.lang.String data, AddressListener addressListener)
          Call with a String.
 
Methods inherited from class things.data.processing.LexicalTool
get822HeadernameType, get822HeadernameTypeWithDollar, get822Type, getClassification, getDNSType, getHexValue, getLower, getName, getUpper, getURIType, getURLFType
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AddressParser

public AddressParser()
Method Detail

parseAndSave

public static void parseAndSave(StreamSource source,
                                AddressListener addressListener)
                         throws java.lang.Throwable
Parse the source for addresses. All addresses will be put in the map with their Internet address as the key. A duplicate Internet address overwrites the existing record, so only the latest friendly name is remembered.
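
The overwrite behavior described above can be sketched with a plain java.util.Map. This is an illustration only: the AddressListener interface is not documented on this page, so the class LatestNameSketch and its methods are hypothetical stand-ins, and whether the real key is case-normalized is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LatestNameSketch {

    // Key: Internet address. Value: the most recently seen friendly name.
    private final Map<String, String> friendlyByAddress = new LinkedHashMap<>();

    /** Hypothetical callback: a later record for the same address replaces the earlier one. */
    void record(String friendlyName, String internetAddress) {
        friendlyByAddress.put(internetAddress, friendlyName);
    }

    String friendlyNameFor(String internetAddress) {
        return friendlyByAddress.get(internetAddress);
    }

    int size() {
        return friendlyByAddress.size();
    }

    public static void main(String[] args) {
        LatestNameSketch book = new LatestNameSketch();
        book.record("Bob", "bob@example.com");
        book.record("Robert", "bob@example.com");
        System.out.println(book.size() + " record, name = " + book.friendlyNameFor("bob@example.com"));
    }
}
```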

Parameters:
source - the source data.
addressListener - an address listener for found addresses.
Throws:
java.lang.Throwable

parser

public void parser(StreamSource source,
                   AddressListener addressListener)
            throws java.lang.Throwable
Parse engine grammar.

Lexical elements: ASCII (0->127), CHAR (32->127 minus WS, SPECIAL), QUOTE, AT, COLON, SEMICOLON, DOT, OPENBRACK, CLOSEBRACK, GT, LT, BACKSLASH, COMMA, OPENPAREN, CLOSEPAREN, WS (space or tab), CR, LF, |SPECIAL| (includes QUOTE, AT, COLON, SEMICOLON, DOT, OPENBRACK, CLOSEBRACK, GT, LT, BACKSLASH, COMMA, OPENPAREN, CLOSEPAREN), !OTHER! (meaning anything not listed).

REG: $GROUP, $FRIENDLY, $ADDRESS, $FLAG_GROUP, $FLAG_GROUP_ENDED

[START]
  -> NULL->$FRIENDLY
  -> NULL->$ADDRESS
  -> NULL->$GROUP
  -> NULL->$BUSTED
  -> false->$FLAG_GROUP
  -> false->$FLAG_GROUP_ENDED
  -> [OPEN]
  -> if (EOF) ^RETURN^

[OPEN]
  - WS, COMMA - burn
  - CHAR - push, [ACCUMULATE], ^RETURN^
  - OPENPAREN - push, [GATHERCOMMENT]
  - QUOTE - push, [GATHERQUOTE], [ACCUMULATE], ^RETURN^
  - LT - [LTADDRESS], ^RETURN^
  - SEMICOLON - if (true = $FLAG_GROUP) then pop->$ADDRESS, [SUBMIT], false->$FLAG_GROUP_ENDED, ^RETURN^ else error(Character not allowed in DN.)
  - |SPECIAL| - error(meaningless and unquoted special)
  - !OTHER! - error(character not allowed in open)
  - EOF - ^EXIT^
  -> ($FLAG_GROUP_ENDED = true) - false->$FLAG_GROUP_ENDED, ^RETURN^

[ACCUMULATE]
  - CHAR - push
  - COMMA - pop->$ADDRESS, submit, ^RETURN^
  - OPENPAREN - push, [GATHERCOMMENT]
  - QUOTE - push, [GATHERQUOTE]
  - AT - push, [NAKED_ADDRESS_DN_ONLY], ^RETURN^
  - WS - push(SPACE), [ACCUMULATE_WITH_WS], ^RETURN^
  - LT - pop->$FRIENDLY, [LTADDRESS], ^RETURN^
  - COLON - [GROUP], ^RETURN^
  - |SPECIAL| - error(unquoted special)
  - !OTHER! - error(character not allowed in open)
  - EOF - push, pop->$ADDRESS, submit, ^RETURN^

[ACCUMULATE_WITH_WS]
  - CHAR - push, [FRIENDLY], ^RETURN^
  - COLON - [GROUP], ^RETURN^
  - COMMA - pop->$ADDRESS, [SUBMIT], ^RETURN^
  - OPENPAREN - push, [GATHERCOMMENT]
  - AT - error(bad address with unquoted whitespace)
  - WS - push
  - GT - pop->$FRIENDLY, [LTADDRESS], ^RETURN^
  - |SPECIAL| - error(unquoted special)
  - !OTHER! - error(character not allowed in open)
  - EOF - pop->$ADDRESS, [SUBMIT], ^RETURN^

[FRIENDLY]
  - CHAR - push
  - AT - push
  - QUOTE - push, [GATHERQUOTE]
  - COLON - [GROUP], ^RETURN^
  - COMMA - error(no address present)
  - OPENPAREN - push, [GATHERCOMMENT]
  - WS - push(SPACE)
  - LT - pop->$FRIENDLY, [LTADDRESS], ^RETURN^
  - |SPECIAL| - error(unquoted special)
  - !OTHER! - error(character not allowed in open)
  - EOF - error(no address)

[LTADDRESS]
  -> [LT_FRONT_ADDRESS_OPEN]
  -> NULL->$FRIENDLY
  -> ^RETURN^

[LT_FRONT_ADDRESS_OPEN]
  - WS - burn
  - OPENPAREN - push, [GATHERCOMMENT]
  - CHAR - push, [LT_FRONT_ADDRESS_NORMAL], ^RETURN^
  - QUOTE - push, [LT_FRONT_ADDRESS_QUOTED], ^RETURN^
  - !OTHER! - error(Not allowed character)
  - EOF - error(no address)

[LT_FRONT_ADDRESS_NORMAL]
  - OPENPAREN - push, [GATHERCOMMENT]
  - CHAR - push
  - AT - push, [LT_ADDRESS_DN_ONLY], ^RETURN^
  - WS - [LT_CLOSE_ONLY], ^RETURN^
  - GT - pop->$ADDRESS, [SUBMIT], [EXPECT_SEPERATOR_OR_EOF], ^RETURN^
  - !OTHER! - error(Character not allowed in name.)
  - EOF - error(no address)

[LT_CLOSE_ONLY]
  - GT - pop->$ADDRESS, [SUBMIT], [EXPECT_SEPERATOR_OR_EOF], ^RETURN^
  - CHAR - push, [BUSTEDBRACKETADDRESS], ^RETURN^
  - WS - burn
  - !OTHER! - error(Cannot put friendly name in address closure)
  - EOF - error(must close a non-DN address)

[LT_FRONT_ADDRESS_QUOTED]
  -> [GATHERQUOTE]
  - OPENPAREN - push, [GATHERCOMMENT]
  - AT - push, [LT_ADDRESS_DN_ONLY], ^RETURN^
  - !OTHER! - error(broken quoted against @ in address)
  - EOF - error(no address)

[LT_ADDRESS_DN_ONLY]
  -> [REQUIRE_DN]
  - OPENPAREN - push, [GATHERCOMMENT]
  - DNSCHAR - push
  - WS - [SEEK_GT], pop->$ADDRESS, [SUBMIT], [EXPECT_SEPERATOR_OR_EOF], ^RETURN^
  - GT - pop->$ADDRESS, [SUBMIT], [EXPECT_SEPERATOR_OR_EOF], ^RETURN^
  - !OTHER! - error(Character not allowed in DN.)
  - EOF - error(no address)

[NAKED_ADDRESS_DN_ONLY]
  -> [REQUIRE_DN]
  - OPENPAREN - push, [GATHERCOMMENT]
  - DNSCHAR - push
  - SEMICOLON - if (true = $FLAG_GROUP) then pop->$ADDRESS, [SUBMIT], true->FLAG_GROUP_ENDED, ^RETURN^ else error(Character not allowed in DN.)
  - COMMA - pop->$ADDRESS, [SUBMIT], ^RETURN^
  - WS - pop->$ADDRESS, [MAYBE_NOT_AN_ADDRESS], ^RETURN^
  - !OTHER! - error(Character not allowed in DN.)
  - EOF - pop->$ADDRESS, [SUBMIT], ^RETURN^

[MAYBE_NOT_AN_ADDRESS]
  - OPENPAREN - push, [GATHERCOMMENT], [FRIENDLY], ^RETURN^
  - WS - burn
  - SEMICOLON - if (true = $FLAG_GROUP) then pop->$ADDRESS, [SUBMIT], true->FLAG_GROUP_ENDED, ^RETURN^ else error(Group teminator when group not defined.)
  - LT - pop->$FRIENDLY, [LTADDRESS], [EXPECT_SEPERATOR_OR_EOF], ^RETURN^
  - COMMA - pop->$ADDRESS, [SUBMIT], ^RETURN^
  - EOF - pop->$ADDRESS, [SUBMIT], ^RETURN^
  - QUOTE - push, [GATHERQUOTE], [FRIENDLY], ^RETURN^
  - CHAR - push, [FRIENDLY], ^RETURN^
  - !OTHER! - error(addresses not delimited.)

[EXPECT_SEPERATOR_OR_EOF]
  - WS - burn
  - OPENPAREN - burn, [BURNCOMMENT]
  - SEMICOLON - if (true = $FLAG_GROUP) then true->FLAG_GROUP_ENDED, ^RETURN^ else error(Group teminator when group not defined.)
  - COMMA - ^RETURN^
  - EOF - ^RETURN^
  - !OTHER! - error(addresses not delimited.)

[SEEK_GT]
  - OPENPAREN - push, [GATHERCOMMENT]
  - WS - burn
  - GT - ^RETURN^
  - !OTHER! - error(Character not allowed after whitespace, before '>')
  - EOF - error(address not closed with a '>')

[BUSTEDBRACKETADDRESS]
  - OPENPAREN - push, [GATHERCOMMENT]
  - WS - push(SPACE)
  - GT - pop->$BUSTED, [SUBMIT], ^RETURN^
  - !OTHER! - push
  - EOF - error(address not closed with a '>')

[REQUIRE_DN]
  - OPENPAREN - push, [GATHERCOMMENT]
  - DNSCHAR - push, ^RETURN^
  - OPENPAREN - push, [GATHERCOMMENT]
  - !OTHER! - error(bad domain name)
  - EOF - error(no address)

[GATHERQUOTE]
  - BACKSLASH = burn, [ESCAPE]
  - QUOTE = push, ^RETURN^
  - !OTHER! = push
  - EOF - error(quote left open)

[ESCAPE]
  - ASCII = push, ^RETURN^
  - EOF - error(escape left open)

[GATHERCOMMENT]
  - CLOSEPAREN = push, ^RETURN^
  - BACKSLASH = burn, [ESCAPE]
  - !OTHER! = push
  - EOF - error(comment left dangling.)

[BURNCOMMENT]
  - CLOSEPAREN = ^RETURN^
  - !OTHER! = burn
  - EOF - error(comment left dangling.)

[GROUP]
  -> if($FLAG_COLON=true, error(cannot imbed groups)),
  -> pop->$GROUP,
  -> true->$FLAG_GROUP,
  -> [OPEN],
  -> NULL->$GROUP,
  -> false->$FLAG_GROUP,

[SUBMIT]
  -> submit($GROUP, $FRIENDLY, $ADDRESS)
  -> NULL->$FRIENDLY
  -> NULL->$ADDRESS
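
As an illustration of how two states of the grammar might map to code, here is a minimal sketch of [GATHERQUOTE] and [ESCAPE] only. It is not the AddressParser source: the class QuoteStateSketch, its method names, and the use of IllegalStateException for error(...) are all assumptions made for this example. In the sketch, "push" appends to an accumulator and "burn" discards the character. Rejecting a non-ASCII character after a backslash is also an assumption, since the grammar lists no rule for that case.

```java
final class QuoteStateSketch {

    private final String input;
    private int pos;
    private final StringBuilder pushed = new StringBuilder();

    QuoteStateSketch(String input) {
        this.input = input;
    }

    /** Caller's step: consume the opening QUOTE, push it, then enter [GATHERQUOTE]. */
    String readQuoted() {
        if (pos >= input.length() || input.charAt(pos) != '"') {
            throw new IllegalStateException("expected opening quote");
        }
        pushed.append(input.charAt(pos++));
        gatherQuote();
        return pushed.toString();
    }

    /** [GATHERQUOTE]: loop until the closing QUOTE or EOF. */
    private void gatherQuote() {
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c == '\\') {
                escape();               // BACKSLASH = burn, [ESCAPE]
            } else if (c == '"') {
                pushed.append(c);       // QUOTE = push, ^RETURN^
                return;
            } else {
                pushed.append(c);       // !OTHER! = push
            }
        }
        throw new IllegalStateException("quote left open");   // EOF
    }

    /** [ESCAPE]: push exactly one ASCII character, then return. */
    private void escape() {
        if (pos >= input.length()) {
            throw new IllegalStateException("escape left open"); // EOF
        }
        char c = input.charAt(pos++);
        if (c > 127) {
            // Assumption: the grammar only names ASCII here, so anything else is rejected.
            throw new IllegalStateException("non-ASCII after escape");
        }
        pushed.append(c);               // ASCII = push, ^RETURN^
    }

    /** Index of the first character after the closing quote. */
    int position() {
        return pos;
    }
}
```

Each state becomes one method, and ^RETURN^ becomes a plain return to the calling state, which is one straightforward way to realize a grammar written in this call-and-return style.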

Throws:
java.lang.Throwable

parser

public void parser(java.io.InputStream ins,
                   AddressListener addressListener)
            throws java.lang.Throwable
Call with an InputStream.

Parameters:
ins - the source stream.
addressListener - an address listener for found addresses.
Throws:
java.lang.Throwable

parser

public void parser(java.lang.String data,
                   AddressListener addressListener)
            throws java.lang.Throwable
Call with a String.

Parameters:
data - the String
addressListener - an address listener for found addresses.
Throws:
java.lang.Throwable


Things.