AddrTree


Introduction

The AddrTree toolchain is used by the AirCERT project to pre-process allocation data from the Regional Internet Registries (RIRs) for use as a supplemental data source for the analysis of security event data. Specifically, allocation data can be used to group event source and destination addresses by network, by contact/organization, and by location (currently, the AddrTree toolchain supports location only at the country-code level).

The pre-processing includes normalization of modification dates, resolution of conflicts between the RIRs, repair of range erosions (probable single-CIDR-block allocations that are not properly aligned to CIDR block boundaries), and resolution of inversions (violations of the allocation tree structure caused by corrupted or stale allocation range data). After anomalies are corrected, each record is split into CIDR blocks. AddrTree also logs the anomalies it finds for later analysis.
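The CIDR arithmetic behind erosion repair and block splitting can be illustrated with a short sketch. This is not AddrTree's Perl code: it uses Python's standard ipaddress module as a stand-in, treats addresses as 32-bit integers, and covers only the inner-erosion case, where the smallest CIDR block enclosing an eroded range is taken as the probable intended allocation. The helper names cidr_blocks and enclosing_block are hypothetical.

```python
from __future__ import annotations

import ipaddress


def cidr_blocks(start: int, end: int) -> list[str]:
    """Split the inclusive IPv4 range [start, end] into the fewest aligned CIDR blocks."""
    first = ipaddress.IPv4Address(start)
    last = ipaddress.IPv4Address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def enclosing_block(start: int, end: int) -> str:
    """Return the smallest single CIDR block containing [start, end].

    For an inner erosion (a range slightly smaller than one aligned block),
    this is the probable intended allocation. Outer erosions are not handled.
    """
    # Every bit position where start and end differ must be a host bit.
    host_bits = (start ^ end).bit_length()
    network = (start >> host_bits) << host_bits  # clear the host bits
    return str(ipaddress.IPv4Network((network, 32 - host_bits)))


if __name__ == "__main__":
    # Hypothetical eroded range 192.0.2.1 - 192.0.2.255 (first address missing).
    start, end = 0xC0000201, 0xC00002FF
    print(len(cidr_blocks(start, end)), "blocks")  # 8 blocks
    print(enclosing_block(start, end))             # 192.0.2.0/24
```

An eroded range like this one would split into eight blocks instead of one, which is why erosions are worth repairing before the CIDR splitting stage. How AddrTree decides that a range is "probably" eroded, and how it repairs outer erosions, is not described here.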

Usage

AddrTree is a collection of Perl scripts whose workflow is managed by Make. It requires Perl (of course), GNU Make, and NcFTP (for automatic download of RIR databases). Additionally, because AddrTree does in-memory sorting of allocation records to build the tree, it also requires at least 1GB of available memory (on ia32 machines; requirements may vary on other architectures).

Note: AddrTree currently only processes ARIN, RIPE, and APNIC allocations. The software should be suitable for processing LACNIC allocation data as well; there is simply no provision for LACNIC processing in the Makefile provided with AddrTree, primarily because we do not yet use LACNIC source data in our own process.

To use AddrTree, place your ARIN database access credentials in the arin_cred.txt file, then run "make download cidr". After an hour or so of processing, the IPv4 allocation tree in AddrTree format will be written to data/world_cidr.at, and the anomaly log to anomaly.log.

The resulting world_cidr.at file can be translated to tab-delimited format for import into a relational database using the atDumpTDF.pl script.
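As an illustration of this kind of conversion, the following minimal sketch turns one .at line into a tab-delimited row. It assumes that no field value contains a colon and that the two hexadecimal range addresses should be rendered as dotted quads; whether atDumpTDF.pl formats addresses this way, and which column order it emits, are assumptions rather than documented behavior. The function name at_line_to_tdf and the sample record are hypothetical.

```python
import ipaddress


def at_line_to_tdf(line: str) -> str:
    """Convert one colon-delimited .at record into a tab-delimited row.

    Assumptions: no field value contains a colon, and the two range
    addresses (fields 0 and 1, hexadecimal) are rendered as dotted quads.
    """
    fields = line.rstrip("\n").split(":")
    for i in (0, 1):
        fields[i] = str(ipaddress.IPv4Address(int(fields[i], 16)))
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical sample record: an ARIN assignment of 192.0.2.0/24.
    sample = "C0000200:C00002FF:24:1:0:4:2003-05-01:US:EXAMPLE-NET:ADMIN1-ARIN:TECH1-ARIN"
    print(at_line_to_tdf(sample))
```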

Data Format

AddrTree represents RIR allocation objects in the .at files using a line-oriented text format of colon-delimited records sorted in tree traversal order, i.e. sorted by range start address ascending, then by range end address descending, then by allocation level ascending. (The allocation level is the number of intermediate allocations and assignments between a given object and its root allocation.) The fields in each line are as follows:

Position  Description            Notes
0         Range start address    in hexadecimal
1         Range end address      in hexadecimal
2         Prefix length          -1 before block splitting
3         Range hierarchy level  from ARIN NetHandle before stacking, tree depth after stacking
4         Source RIR             0 = ARIN
                                 1 = RIPE
                                 2 = APNIC
                                 3 = LACNIC
5         Allocation status      0 = reserved
                                 1 = allocated
                                 2 = reallocated
                                 3 = suballocated
                                 4 = assigned
                                 5 = reassigned
                                 6 = early registration
                                 7 = lir-partitioned
                                 8 = RIR
                                 9 = unspecified
6         Modification date      in ISO 8601 format
7         Country code
8         Network name
9         Admin POC handle
10        Tech POC handle
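
To make the format and sort order concrete, here is a hedged Python sketch that parses one .at line into a record and provides the tree traversal sort key (start ascending, end descending, level ascending). It is not a port of AddrTree.pm: the class AtRecord and its attribute names are hypothetical, and the sketch assumes that no field contains a colon (so modification dates carry no time-of-day part) and that the level field is an integer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AtRecord:
    """One allocation record in the 11-field .at format (field positions in comments)."""

    start: int        # 0: range start address (hexadecimal in the file)
    end: int          # 1: range end address (hexadecimal in the file)
    prefix_len: int   # 2: -1 before block splitting
    level: int        # 3: hierarchy level, tree depth after stacking
    rir: int          # 4: 0 = ARIN, 1 = RIPE, 2 = APNIC, 3 = LACNIC
    status: int       # 5: allocation status code
    mod_date: str     # 6: ISO 8601 date
    country: str      # 7: country code
    net_name: str     # 8: network name
    admin_poc: str    # 9: admin POC handle
    tech_poc: str     # 10: tech POC handle

    @classmethod
    def parse(cls, line: str) -> AtRecord:
        # Assumption: no field value contains a colon.
        f = line.rstrip("\n").split(":")
        if len(f) != 11:
            raise ValueError(f"expected 11 fields, got {len(f)}")
        return cls(int(f[0], 16), int(f[1], 16), int(f[2]), int(f[3]),
                   int(f[4]), int(f[5]), *f[6:])

    def sort_key(self) -> tuple[int, int, int]:
        # Tree traversal order: start ascending, end descending, level ascending.
        return (self.start, -self.end, self.level)


if __name__ == "__main__":
    lines = [
        "0A000100:0A0001FF:-1:1:0:4:2003-05-01:US:NET-C:A3:T3",
        "0A000000:0A0000FF:-1:1:0:4:2003-05-01:US:NET-B:A2:T2",
        "0A000000:0AFFFFFF:-1:0:0:1:1995-01-01:US:NET-A:A1:T1",
    ]
    records = sorted((AtRecord.parse(line) for line in lines), key=AtRecord.sort_key)
    print([r.net_name for r in records])  # ['NET-A', 'NET-B', 'NET-C']
```

Negating the end address gives "end descending" inside an ordinary ascending sort, so an enclosing range always precedes the ranges nested inside it that share its start address.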

The anomaly log uses a similar format with twelve columns; it is the AddrTree format above with an additional anomaly code column prepended. The currently supported anomaly codes are as follows:

Code  Description              Notes
100   Two-digit year
101   Day beyond end of month  e.g. April 31
102   YYYYDDMM date order      only unambiguous anomalies of this type are detectable
103   Date missing             missing dates are corrected to 1983-01-01
200   Merge conflict           resolved by national affiliation with RIR
201   Merge conflict           resolved by RIR seniority
202   Merge conflict           resolved by input file order (first file wins)
400   Inner erosion            at start of range
401   Inner erosion            at end of range
402   Inner erosion            at start and end of range
403   Outer erosion            at start of range
404   Outer erosion            at end of range
405   Outer erosion            at start and end of range
406   Shift erosion            inner at start, outer at end
500   Inversion                record was dropped
501   Inversion                record was kept
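
The date anomalies (codes 100 to 103) can be detected with simple checks. The sketch below is an illustration under stated assumptions, not AddrTree's own logic: it assumes raw dates arrive as digit strings in YYYYMMDD or YYMMDD form, it only reports codes (it does not attempt the corrections for codes 100 to 102), and it assumes the anomaly code column is prepended with a colon. The names date_anomalies and anomaly_log_line are hypothetical.

```python
from __future__ import annotations

import calendar


def date_anomalies(raw: str) -> list[int]:
    """Return the AddrTree date anomaly codes (100-103) that apply to a raw date.

    Assumption: raw dates are digit strings in YYYYMMDD or YYMMDD form.
    Detection only: corrections for codes 100-102 are not attempted here.
    """
    s = raw.strip()
    if not s:
        return [103]  # AddrTree corrects missing dates to 1983-01-01
    if not s.isdigit() or len(s) not in (6, 8):
        raise ValueError(f"unrecognized date: {raw!r}")
    if len(s) == 6:
        return [100]  # two-digit year: the century has to be inferred
    codes = []
    year, month, day = int(s[:4]), int(s[4:6]), int(s[6:8])
    if month > 12 and 1 <= day <= 12:
        # Only unambiguous YYYYDDMM dates are detectable: the "month" is
        # above 12 while the "day" is a valid month number.
        codes.append(102)
        month, day = day, month
    if 1 <= month <= 12 and day > calendar.monthrange(year, month)[1]:
        codes.append(101)  # e.g. April 31
    return codes


def anomaly_log_line(code: int, at_line: str) -> str:
    """Prepend the anomaly code column to an .at record (assumed colon-delimited)."""
    return f"{code}:{at_line}"


if __name__ == "__main__":
    for raw in ["", "030501", "20030431", "20032504", "20030501"]:
        print(repr(raw), date_anomalies(raw))
```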

Processing Stages

The following is a list of scripts making up AddrTree and their functions. All scripts read from standard input and write to standard output (and can be piped accordingly), and append anomalies to anomaly.log, unless otherwise noted. Note that the proper workflow for these scripts is already encoded in the included Makefile.

File            Description                                      Notes
AddrTree.pm     Common AddrTree support module                   Handles record representation, I/O, field parsing, and sorting
atParseARIN.pl  Parse arin_db.txt file into .at file             Since this script requires two passes, standard input must be redirected from a file
atParseRIPE.pl  Parse .db.inetnum file into .at file
atStrip.pl      Strip non-network (redirect) records
atDerode.pl     Fix erosions
atSort.pl       Re-sort .at file                                 Required after steps (e.g. erosion fixing) that may break sort order
atMerge.pl      Merge two .at files into one                     Takes file names on the command line; writes to standard output
atStack.pl      Verify tree structure and store hierarchy level  Also fixes inversions. Required before atCIDR.pl to avoid ambiguities in hierarchy
atCIDR.pl       Split ranges into CIDR blocks                    atSort.pl is not required after this stage, because atCIDR.pl re-sorts records after splitting
atDumpTDF.pl    Convert .at file to tab-delimited format         Also processes anomaly logs if the magic word "anomaly" is given on the command line
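
The stacking step can be pictured as a single pass over records in tree traversal order, keeping a stack of open ancestor ranges. The sketch below is a hypothetical reconstruction, not atStack.pl itself: it assigns a 0-based tree depth to each well-nested range and reports any range that starts inside its enclosing range but ends past it as an inversion. As a simplification it drops every inversion, whereas AddrTree sometimes keeps the record (anomaly code 501); the rule it uses to choose is not described here. The function name stack_levels is hypothetical.

```python
from __future__ import annotations


def stack_levels(ranges: list[tuple[int, int]]):
    """Assign tree depth to (start, end) ranges already in tree traversal order.

    Returns (kept, inversions): kept holds (start, end, depth) with depth 0 for
    root allocations; inversions holds ranges that partially overlap the range
    enclosing them. Simplification: every inversion is dropped.
    """
    stack: list[tuple[int, int]] = []  # open ancestors, innermost last
    kept: list[tuple[int, int, int]] = []
    inversions: list[tuple[int, int]] = []
    for start, end in ranges:
        # Close ancestors that end before this range begins.
        while stack and stack[-1][1] < start:
            stack.pop()
        if stack and end > stack[-1][1]:
            # Starts inside the enclosing range but ends past it: not a tree.
            inversions.append((start, end))
            continue
        kept.append((start, end, len(stack)))
        stack.append((start, end))
    return kept, inversions


if __name__ == "__main__":
    ranges = [
        (0x0A000000, 0x0AFFFFFF),  # root allocation
        (0x0A000000, 0x0A0000FF),  # nested child
        (0x0A000080, 0x0A00017F),  # straddles the child's end: inversion
        (0x0A000100, 0x0A0001FF),  # sibling of the first child
        (0x0B000000, 0x0B0000FF),  # new root
    ]
    kept, inversions = stack_levels(ranges)
    print([depth for _, _, depth in kept])  # [0, 1, 1, 0]
    print(len(inversions))                  # 1
```

Because the input is sorted by start ascending and end descending, each range only needs to be compared with the innermost open ancestor, which is why the structure check fits in one linear pass.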


Copyright © 2004, Carnegie Mellon University