This section is primarily concerned with address geocoding; however, the basic principles also apply to geographic coordinates and place names.
There are three basic steps for geocoding:
Geographic descriptions can be acquired from many sources, both private and public, and typically arrive as a text file or spreadsheet table, either exported from an existing database or created as a standalone file. Researchers often use surveys to collect addresses from businesses or people. Locations can also be text-mined from unstructured text, though this is an advanced technique and is beyond the scope of this guide. Addresses can usually be input as a single line/field or as multiple fields, such as address, city, state, and postal code. Either works, but multiple address fields generally yield better results and give greater control over your data. For example, it is easier to build a single-line address from multiple address fields than to split a single line back into its parts.
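As an illustration of why multiple fields are the more flexible starting point, the following minimal Python sketch builds a single-line address from separate fields. The field names (`address`, `city`, `state`, `postal_code`), the function name, and the sample record are assumptions made for this example, not names required by any particular geocoder.

```python
def single_line_address(record, fields=("address", "city", "state", "postal_code")):
    """Join the non-empty address parts of one record into a single line.

    `record` is a dict keyed by field name; the default field names are
    hypothetical and should be changed to match your own table.
    """
    parts = []
    for name in fields:
        # Treat missing or None values as empty and trim stray whitespace.
        value = str(record.get(name) or "").strip()
        if value:
            parts.append(value)
    return ", ".join(parts)


# Made-up sample record for illustration only.
row = {"address": "123 Main St", "city": "Springfield", "state": "IL", "postal_code": "62701"}
line = single_line_address(row)
print(line)
```

Going the other way, from a single line back to separate fields, generally requires parsing rules that account for the many ways addresses are written, which is why keeping the fields separate is usually the safer choice.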
No matter how the data is acquired or how many address fields there are, geocoding data typically needs to be prepared and transformed into a simple data table so that it can be easily input into an XY or Address Locator. For example, spreadsheets with addresses may contain extraneous data or metadata that needs to be removed or reorganized to create a simple data table that can be imported into a geospatial data format, such as a Shapefile or File Geodatabase. This means that the first row of the table must contain only the names of the fields (or columns). These field names also need to follow a set of rules, such as containing no spaces or special characters; otherwise an error may occur. ArcGIS has its own specific rules for Excel spreadsheets, which must be followed for the data to be read properly. Once the table fields are properly set up, all data values need to be organized and ordered under the correct fields. Geocoding data often includes other values, or attributes, that pertain to the geographic description or location but are not required for the geocoding process. A locator retains this data when the table is input, so it is safe to keep it.
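The field-name rules described above can be applied in bulk before importing a table. The sketch below is one possible approach in Python, assuming that a safe name contains only letters, digits, and underscores and starts with a letter; the exact rules depend on the target format, so treat the function name, the `f_` prefix, and the optional `max_length` parameter as illustrative assumptions. (Shapefile field names, for instance, are limited to 10 characters.)

```python
import re


def clean_field_name(name, max_length=None):
    """Return a field name with spaces and special characters replaced.

    Assumed rules for this sketch: only letters, digits, and underscores
    are allowed, and the name must start with a letter.
    """
    # Collapse each run of disallowed characters into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name.strip())
    cleaned = cleaned.strip("_")
    # Prefix names that are empty or do not start with a letter.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "f_" + cleaned
    # Optionally truncate, e.g. max_length=10 for a Shapefile.
    if max_length is not None:
        cleaned = cleaned[:max_length]
    return cleaned


# Made-up header row for illustration only.
headers = ["Street Address", "City", "Zip/Postal Code", "2020 Sales ($)"]
cleaned_headers = [clean_field_name(h) for h in headers]
print(cleaned_headers)
```

This sketch does not check for duplicate names, which can arise after cleaning or truncation, so the resulting header row should still be reviewed before import.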
After the geocoding data is prepared, it must be run through a locator. A locator is specialized software that uses reference data, such as roads or known location points, to tie location descriptions to geographic coordinates. Reference data is designed and organized specifically for the type of locator. A locator is therefore only as good as its underlying reference data, and its results depend on how current and accurate that data is.
In ArcGIS, many different types of locators can be used, and custom locators can also be created. Some locators are available from data providers, such as those included with ArcGIS Business Analyst, and using one of these is generally recommended over trying to create your own. Composite locators can also be created in ArcGIS; these combine different types of locators into a hierarchy for finding location descriptions. For instance, if the address cannot be found, the locator could fall back to finding the ZIP code.
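The fallback hierarchy behind a composite locator can be sketched in a few lines of plain Python. This is a conceptual illustration only and does not use the ArcGIS API: the lookup tables, coordinates, and function names below are all made up for the example, and a real locator performs far more sophisticated matching than an exact dictionary lookup.

```python
# Toy reference data with placeholder coordinates (not real locations).
ADDRESS_POINTS = {"123 MAIN ST, 62701": (10.0, 20.0)}
ZIP_CENTROIDS = {"62701": (10.5, 20.5), "62702": (11.0, 21.0)}


def locate_by_address(record):
    """Most precise level: exact match on street address plus ZIP code."""
    key = f"{record['address'].upper()}, {record['postal_code']}"
    return ADDRESS_POINTS.get(key)


def locate_by_zip(record):
    """Fallback level: centroid of the ZIP code area."""
    return ZIP_CENTROIDS.get(record["postal_code"])


def composite_locate(record, locators):
    """Try each (level, locator) pair in order and return the first hit."""
    for level, locator in locators:
        point = locator(record)
        if point is not None:
            return level, point
    return "unmatched", None


hierarchy = [("address", locate_by_address), ("zip", locate_by_zip)]

found = composite_locate({"address": "123 Main St", "postal_code": "62701"}, hierarchy)
fallback = composite_locate({"address": "9 Elm Rd", "postal_code": "62702"}, hierarchy)
missing = composite_locate({"address": "9 Elm Rd", "postal_code": "00000"}, hierarchy)
print(found, fallback, missing)
```

Returning the level alongside the point matters in practice, because a ZIP code centroid is much less precise than an address match and should be treated accordingly during review.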
For geocoding web services, such as those from Google and the Census Bureau, locators are not directly accessible as they are in ArcGIS and therefore cannot be customized.
After the data has completed the geocoding process with the selected locator, it is recommended to systematically review how well the data was matched. There are many considerations, but here is a list to get started:
ArcGIS provides tools that make rematching records much easier. Many other tools do not offer this functionality. However, even with specialized matching tools, manually reviewing tied and unmatched addresses in very large datasets may not be worth the time. Therefore, be prepared to eliminate some addresses from the data.
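A quick tally of match results can help decide whether manual review is worth the effort. The Python sketch below assumes the geocoded output has been exported to a list of records with a `Status` field coded `M` (matched), `T` (tied), or `U` (unmatched) and a numeric `Score` field, which is how ArcGIS geocoding results are commonly structured; the function name, the `min_score` threshold of 85, and the sample records are assumptions chosen for illustration.

```python
from collections import Counter


def summarize_matches(results, min_score=85):
    """Count records per match status and collect those needing review.

    A record is flagged for review if it is tied or unmatched, or if it
    matched with a score below `min_score` (an arbitrary example cutoff).
    """
    counts = Counter(r["Status"] for r in results)
    review = [r for r in results if r["Status"] != "M" or r["Score"] < min_score]
    return counts, review


# Made-up results for illustration only.
results = [
    {"id": 1, "Status": "M", "Score": 100},
    {"id": 2, "Status": "M", "Score": 72},
    {"id": 3, "Status": "T", "Score": 90},
    {"id": 4, "Status": "U", "Score": 0},
]
counts, review = summarize_matches(results)
review_ids = [r["id"] for r in review]
print(dict(counts), review_ids)
```

If the share of records flagged for review is small relative to the dataset, dropping them may be acceptable; if it is large, the input data or the choice of locator probably needs another look first.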