Introduction
The AddrTree toolchain is used by the AirCERT project to
pre-process allocation data from the Regional Internet Registries
(RIRs) for use as a supplemental data source for the analysis of
security event data. Specifically, allocation data can be used to
group event source and destination addresses by network, by
contact/organization, and by location (currently, the AddrTree
toolchain supports only country codes).
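The Python sketch below roughly illustrates the grouping use case. It finds the most specific CIDR block that contains an address, then groups addresses by that block's country code. The block table and its sample contents are hypothetical (private address space with made-up country codes), as are the function names. None of this is part of AddrTree.

```python
import ipaddress
from collections import defaultdict

# Hypothetical sample data (private address space, made-up country codes):
# CIDR block -> country code, as might be derived from the prefix length and
# country code fields of a split world_cidr.at file.
BLOCKS = {
    "10.0.0.0/8": "US",
    "10.1.0.0/16": "DE",
    "192.168.0.0/16": "JP",
}

def lookup_country(addr, blocks=BLOCKS):
    """Return the country code of the most specific block containing addr."""
    ip = ipaddress.ip_address(addr)
    best_len, best_cc = -1, None
    for cidr, cc in blocks.items():
        net = ipaddress.ip_network(cidr)
        # Longest prefix wins: a nested assignment overrides its parent.
        if ip in net and net.prefixlen > best_len:
            best_len, best_cc = net.prefixlen, cc
    return best_cc

def group_by_country(addrs):
    """Group event addresses by the country code of their allocation."""
    groups = defaultdict(list)
    for addr in addrs:
        groups[lookup_country(addr)].append(addr)
    return dict(groups)
```

A linear scan is enough for a sketch. A lookup over a full world_cidr.at table would more likely use a sorted array or a radix tree.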
The pre-processing includes normalization of modification dates,
resolution of conflicts between the RIRs, repair of range erosions
(probable single-CIDR block allocations that are not properly aligned
to CIDR block boundaries), and resolution of inversions (violations of
allocation tree structure due to corrupted or stale allocation range
data). After anomalies are corrected, each record is split into CIDR
blocks. AddrTree also logs each anomaly it finds for later analysis.
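The final splitting step can be sketched in Python as a greedy walk over the range. At each position, the sketch emits the largest CIDR block that is both aligned at that address and contained in the range. This is a hypothetical illustration of the technique, not the code in atCIDR.pl. Addresses are 32-bit integers, matching the hexadecimal range fields of the .at format.

```python
def range_to_cidrs(start, end):
    """Split the inclusive IPv4 range [start, end] (32-bit ints) into the
    minimal list of (network, prefix_length) CIDR blocks."""
    if not 0 <= start <= end <= 0xFFFFFFFF:
        raise ValueError("invalid IPv4 range")
    blocks = []
    while start <= end:
        # Largest block aligned at start is the lowest set bit of start;
        # a start of 0 is aligned to the whole 2**32 address space.
        size = start & -start if start else 1 << 32
        # Halve the block until it fits inside the remaining range.
        while start + size - 1 > end:
            size >>= 1
        # size == 2**k, so the prefix length is 32 - k == 33 - bit_length.
        blocks.append((start, 33 - size.bit_length()))
        start += size
    return blocks
```

Python's standard ipaddress.summarize_address_range produces the same blocks. An eroded range such as 10.0.0.1 to 10.0.0.255 splits into eight blocks (a /32 through a /25) instead of a single /24. That result shows why erosions are repaired before splitting.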
Usage
AddrTree is a collection of Perl scripts whose workflow is managed by
Make. It requires Perl, GNU Make, and NcFTP (for automatic download of
RIR databases). Because AddrTree sorts allocation records in memory to
build the tree, it also needs at least 1GB of available memory on ia32
machines (requirements may vary on other architectures).
Note: AddrTree currently processes only ARIN, RIPE, and APNIC
allocations. The software should also be suitable for processing
LACNIC allocation data. The Makefile provided with AddrTree simply has
no provision for LACNIC processing, primarily because we do not yet
use LACNIC source data in our own process.
To use AddrTree, place your ARIN database access credentials in
the arin_cred.txt file, then run make download cidr. After an
hour or so of processing, the IPv4 allocation tree in AddrTree format
will be written to data/world_cidr.at, and the anomaly
log will be written to anomaly.log.
The resulting world_cidr.at file can be translated to
tab-delimited format for import into a relational database using
the atDumpTDF.pl script.
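The conversion performed by atDumpTDF.pl can be approximated by a few lines of Python. This sketch assumes the output keeps the .at field order and renders the hexadecimal addresses as dotted quads. The real script's column layout may differ, and the function names are hypothetical.

```python
import ipaddress

def at_line_to_tdf(line):
    """Convert one colon-delimited .at record to a tab-delimited row.

    Hypothetical output layout: the same fields in the same order, with the
    hexadecimal start and end addresses rendered as dotted quads. Assumes no
    field value itself contains a colon.
    """
    fields = line.rstrip("\n").split(":")
    for i in (0, 1):  # range start and end are stored in hexadecimal
        fields[i] = str(ipaddress.IPv4Address(int(fields[i], 16)))
    return "\t".join(fields)

def dump_tdf(lines):
    """Yield a tab-delimited row for every non-blank .at line."""
    for line in lines:
        if line.strip():
            yield at_line_to_tdf(line)
```

To use it as a filter, iterate over standard input: for row in dump_tdf(sys.stdin): print(row).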
Data Format
AddrTree represents RIR allocation objects in the .at
files using a line-oriented text format of colon-delimited records,
sorted in tree traversal order: by range start address ascending, then
by range end address descending, then by allocation level ascending.
(The allocation level is the number of intermediate allocations and
assignments between a given object and its root allocation.) The
fields in each line are as follows:
| Position | Description | Notes |
| 0 | Range start address | in hexadecimal |
| 1 | Range end address | in hexadecimal |
| 2 | Prefix length | -1 before block splitting |
| 3 | Range hierarchy level | from ARIN NetHandle before stacking, tree depth after stacking |
| 4 | Source RIR | 0 = ARIN, 1 = RIPE, 2 = APNIC, 3 = LACNIC |
| 5 | Allocation status | 0 = reserved, 1 = allocated, 2 = reallocated, 3 = suballocated, 4 = assigned, 5 = reassigned, 6 = early registration, 7 = lir-partitioned, 8 = RIR, 9 = unspecified |
| 6 | Modification date | in ISO8601 format |
| 7 | Country code | |
| 8 | Network name | |
| 9 | Admin POC handle | |
| 10 | Tech POC handle | |
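The following minimal Python sketch reads one record and orders records in tree traversal order. The class and function names are hypothetical. The parser assumes that no field contains a colon and that the level field is an integer.

```python
from typing import NamedTuple

# Code tables transcribed from the field descriptions above.
RIRS = {0: "ARIN", 1: "RIPE", 2: "APNIC", 3: "LACNIC"}
STATUSES = {
    0: "reserved", 1: "allocated", 2: "reallocated", 3: "suballocated",
    4: "assigned", 5: "reassigned", 6: "early registration",
    7: "lir-partitioned", 8: "RIR", 9: "unspecified",
}

class AtRecord(NamedTuple):
    start: int        # field 0, stored in hexadecimal
    end: int          # field 1, stored in hexadecimal
    prefix_len: int   # -1 before block splitting
    level: int        # hierarchy level (assumed to be an integer)
    rir: int
    status: int
    modified: str     # ISO 8601 date
    country: str
    netname: str
    admin_poc: str
    tech_poc: str

def parse_at_line(line):
    """Parse one colon-delimited .at line (assumes no colons inside fields)."""
    f = line.rstrip("\n").split(":")
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return AtRecord(int(f[0], 16), int(f[1], 16), int(f[2]), int(f[3]),
                    int(f[4]), int(f[5]), *f[6:])

def tree_order_key(rec):
    """Start ascending, end descending, allocation level ascending."""
    return (rec.start, -rec.end, rec.level)
```

Negating the end address in the key produces the descending end-address order. As a result, an enclosing range sorts before any nested range that shares its start address.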
The anomaly log uses a similar format with twelve columns; it is the AddrTree format above with an additional anomaly code column prepended. The currently supported anomaly codes are as follows:
| Code | Description | Notes |
| 100 | Two digit year | |
| 101 | Day beyond end of month | e.g. April 31 |
| 102 | YYYYDDMM date order | only unambiguous anomalies of this type are detectable |
| 103 | Date missing | Missing dates are corrected to 1983-01-01 |
| 200 | Merge conflict | Resolved by national affiliation with RIR |
| 201 | Merge conflict | Resolved by RIR seniority |
| 202 | Merge conflict | Resolved by input file order (first file wins) |
| 400 | Inner erosion | at start of range |
| 401 | Inner erosion | at end of range |
| 402 | Inner erosion | at start and end of range |
| 403 | Outer erosion | at start of range |
| 404 | Outer erosion | at end of range |
| 405 | Outer erosion | at start and end of range |
| 406 | Shift erosion | Inner at start, outer at end |
| 500 | Inversion | Record was dropped |
| 501 | Inversion | Record was kept |
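A small Python checker can illustrate the date anomalies (codes 100 to 103). It assumes raw dates arrive as digit strings in YYYYMMDD or YYMMDD form, with any dashes stripped first. The two-digit-year pivot and the clamp-to-end-of-month repair are assumptions, not documented AddrTree behavior. Only the 1983-01-01 replacement for missing dates comes from the table above. The note on code 102 says only unambiguous YYYYDDMM dates are detectable, so the sketch swaps month and day only when the month field exceeds 12 and the day field does not.

```python
import calendar

MISSING_DATE = "1983-01-01"  # replacement value for code 103 (from the table)

def check_date(raw):
    """Return (ISO 8601 date or None, anomaly codes) for one raw date.

    Hypothetical input: a digit string in YYYYMMDD or YYMMDD form. The
    two-digit-year pivot and the clamp-to-month-end repair are assumptions.
    """
    codes = []
    raw = (raw or "").strip().replace("-", "")
    if not raw:
        return MISSING_DATE, [103]
    if not raw.isdigit() or len(raw) not in (6, 8):
        return None, codes          # unrecognized format
    if len(raw) == 6:               # code 100: two-digit year
        raw = ("20" if int(raw[:2]) < 50 else "19") + raw  # assumed pivot
        codes.append(100)
    year, month, day = int(raw[:4]), int(raw[4:6]), int(raw[6:8])
    if month > 12 and day <= 12:    # code 102: unambiguous YYYYDDMM
        month, day = day, month
        codes.append(102)
    if not 1 <= month <= 12 or day < 1:
        return None, codes
    last = calendar.monthrange(year, month)[1]
    if day > last:                  # code 101: e.g. April 31
        day = last                  # assumed repair
        codes.append(101)
    return f"{year:04d}-{month:02d}-{day:02d}", codes
```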
Processing Stages
The following is a list of the scripts making up AddrTree and their
functions. Unless otherwise noted, all scripts read from standard
input, write to standard output (so they can be chained in a
pipeline), and append any anomalies they detect to anomaly.log. The
proper workflow for these scripts is already encoded in the included
Makefile.
| File | Description | Notes |
| AddrTree.pm | Common AddrTree support module | Handles record representation, I/O, field parsing, and sorting. |
| atParseARIN.pl | Parse arin_db.txt file into .at file | Since this script requires two passes, standard input must be redirected from a file. |
| atParseRIPE.pl | Parse .db.inetnum file into .at file | |
| atStrip.pl | Strip non-network (redirect) records | |
| atDerode.pl | Fix erosions | |
| atSort.pl | Re-sort .at file | Required after steps (e.g. erosion fixing) that may break sort order |
| atMerge.pl | Merge two .at files into one | Takes names of files on command line, writes to stdout |
| atStack.pl | Verify tree structure and store hierarchy level | Also fixes inversions. Required before atCIDR.pl to avoid ambiguities in hierarchy |
| atCIDR.pl | Split ranges into CIDR blocks | atSort.pl not required after this stage, because atCIDR.pl handles re-sorting of records after splitting |
| atDumpTDF.pl | Convert .at file to tab-delimited format | Also processes anomaly logs if magic word "anomaly" given on command line. |
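The core of atStack.pl's work can be sketched as a single stack pass over records already in tree order. Ranges that end before the current one starts are popped off the stack. The remaining stack depth is then the current range's tree depth. A range that starts inside the top of the stack but ends beyond it violates the tree structure (an inversion). This Python sketch is hypothetical. In particular, it always drops the inverted record, whereas AddrTree logs both dropped (500) and kept (501) inversions, so its actual resolution policy differs.

```python
def stack_levels(ranges):
    """Assign tree depth to (start, end) ranges already in tree order.

    Returns (kept, dropped): kept is a list of (start, end, depth);
    dropped lists ranges that straddle an enclosing range (inversions).
    Always dropping them is a simplified, assumed policy.
    """
    stack, kept, dropped = [], [], []
    for start, end in ranges:
        # Pop former ancestors that end before this range begins.
        while stack and stack[-1][1] < start:
            stack.pop()
        # Starts inside the top range but ends past it: not a tree.
        if stack and end > stack[-1][1]:
            dropped.append((start, end))
            continue
        kept.append((start, end, len(stack)))
        stack.append((start, end))
    return kept, dropped
```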