Geocoding is the process of transforming a description of a location—such as a pair of coordinates, an address, or a name of a place—to a location on the earth's surface. You can geocode by entering one location description at a time or by providing many of them at once in a table. The resulting locations are output as geographic features with attributes, which can be used for mapping or spatial analysis.
The idea of geocoding addresses and places has been around since the 1960s, nearly as long as GIS itself. Geocoding has become a valuable tool for GIS users and researchers, helping them locate on a map large quantities of geospatial data stored in tables as addresses, place names, or latitude and longitude values.
Single geocoding is used when you simply need to find one address quickly. Typically, GIS and mapping applications, such as ArcGIS and Google Earth, allow users to look up a single address or location with a simple search, at no cost and with no limit on the number of searches. This is useful if you have only a few addresses or locations, but what if you have thousands of addresses? Geocoding many addresses or locations at once is called batch geocoding. Whether you have a few dozen or over a million addresses, batch geocoding tools will search for each address relatively quickly, though more addresses take more time. Mapping and GIS applications typically have limits or costs associated with batch geocoding. For example, ArcGIS Online uses a credits system, so more geocodes cost more credits. Google Earth allows for 2,500 free geocodes per day, but its license only permits displaying the geocoded data on a Google map.
Sometimes, you may have the points or the latitude and longitude values of locations, but you need to find the address of that location. This is called reverse geocoding. While there may be cases when this is needed, it is not as common as conventional geocoding. Because of how address locators work, the results may not be very accurate and may contain errors.
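One simple way to picture reverse geocoding is as a nearest-neighbor search: given a point, return the closest known address. The sketch below assumes a small in-memory table of address points (the `ADDRESS_POINTS` values are made up for illustration) and uses the haversine great-circle distance. It is not how any particular address locator works, only an illustration of why the returned address is an approximation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Hypothetical reference table of known address points (made-up values).
ADDRESS_POINTS = [
    {"address": "100 Example St", "lat": 40.000, "lon": -105.000},
    {"address": "200 Example St", "lat": 40.010, "lon": -105.000},
    {"address": "300 Example St", "lat": 40.020, "lon": -105.000},
]

def reverse_geocode(lat, lon, points=ADDRESS_POINTS):
    """Return the nearest known address and its distance from the query point."""
    nearest = min(points, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))
    return nearest["address"], haversine_km(lat, lon, nearest["lat"], nearest["lon"])

address, distance_km = reverse_geocode(40.011, -105.001)
print(address, round(distance_km, 3))
```

Returning the distance alongside the address lets you flag matches that are too far from the query point to be trusted.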
Geocoded data is typically converted from a table of values into a point feature class that can then be used for analysis or simple cartographic display. Researchers, business analysts, and health professionals, to name a few, commonly utilize geocoding for mapping the locations of businesses and people's homes. Business applications might include monitoring customer sales or shipping patterns, while a health professional may want to see epidemiological patterns of a disease or assess patient access to healthcare facilities. These are just a few of the many research applications for geocoding addresses.
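As a rough illustration of turning a table into point features, the sketch below converts rows of a CSV table into a GeoJSON FeatureCollection using only the Python standard library. The column names (`lat`, `lon`, `name`) and the sample values are assumptions made for the example. GIS packages such as ArcGIS have their own tools and formats for this step, which are not shown here.

```python
import csv
import io
import json

def table_to_geojson(csv_text, lon_field="lon", lat_field="lat"):
    """Convert rows with coordinate columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # The remaining columns become the feature's attributes.
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}

# Made-up sample table for illustration.
sample = "name,lat,lon\nClinic A,40.0,-105.25\nClinic B,39.75,-104.99\n"
collection = table_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Each row becomes one point feature, and the non-coordinate columns are carried along as attributes that can be used for mapping or analysis.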
When most people think of geocoding, they think of addresses, but addresses aren't the only type of description that can be geocoded. There are three basic types of location descriptions: coordinates, place names, and addresses.
The simplest type of geocoding is converting geographic coordinates, in the form of latitude and longitude values or other types of mapping coordinates, into a GIS data format. In order to map coordinates, it is essential to know the coordinate system of the values being used. Geographic coordinates (latitude and longitude values) are often in World Geodetic System 1984 (WGS 1984). In the US they are also commonly found in North American Datum 1983 (NAD 1983), and older data may be in North American Datum 1927 (NAD 1927). For other areas of the world, area-specific geographic coordinate systems may also be used. Longitude values (i.e., the 'x' value) fall between -180 and 180, and latitude values (i.e., the 'y' value) fall between -90 and 90, so a column containing values outside the range -90 to 90 is likely longitude. When both columns fall within -90 to 90, knowing the general location where the data is supposed to be will help determine which value is which. If you are unsure which geographic coordinate system the data is in, the differences between the systems may not be major. It is recommended to use WGS 1984 if you are unsure, but be cautious.
Coordinates from projected coordinate systems are also common, especially UTM (worldwide) and State Plane (US only) coordinates. In general, if the coordinates are very large positive numbers, they are likely from a projected coordinate system; negative values are not generally found. Projected coordinate values are often referred to as 'easting' (i.e., the 'x' value) and 'northing' (i.e., the 'y' value). For projected coordinate system values, knowing the coordinate system is essential; otherwise the data will not align or be placed properly on the map.
Place names are another common geographic entity that can be geocoded. Place name geocoding is akin to using a gazetteer to look up locations and obtain their latitude and longitude values, which can then be easily mapped and converted into a GIS data format. Place names can be general, representing the center of a large feature such as a country, state, county, or city, or very specific, representing near-exact locations such as mountain peaks, landmarks, waterfalls, and bridges.
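The range check described above can be sketched as a small helper. The function name `guess_lat_lon` and the sample values are hypothetical. The heuristic only looks at value ranges, so it returns `None` when both columns fall within -90 to 90 and the general location of the data has to settle the question.

```python
def guess_lat_lon(col_a, col_b):
    """Guess which of two coordinate columns is latitude and which is longitude.

    Returns a (role_a, role_b) tuple, or None when the ranges cannot decide.
    """
    max_a = max(abs(v) for v in col_a)
    max_b = max(abs(v) for v in col_b)
    if max_a > 180 or max_b > 180:
        # Beyond the longitude range: probably not geographic degrees at all.
        return None
    a_too_big_for_lat = max_a > 90
    b_too_big_for_lat = max_b > 90
    if a_too_big_for_lat and not b_too_big_for_lat:
        return ("longitude", "latitude")
    if b_too_big_for_lat and not a_too_big_for_lat:
        return ("latitude", "longitude")
    # Both columns within -90..90 (ambiguous) or both beyond it (invalid).
    return None

# Made-up sample columns for illustration.
print(guess_lat_lon([40.0, 39.75], [-105.25, -104.99]))
```

A `None` result for values far larger than 180 is also a hint that the table may hold projected coordinates rather than latitude and longitude.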
Addresses are probably the most common location description, but also the most complex of geocoding entities. Addresses can also be found using many different methods. The simplest and most accurate form of address location is using points to accurately map the address. However, for smaller or rural communities, address point data may not exist. If address points are not available, the road network can be used to estimate along the road where the address is. For this to work, the road data must include address ranges for both sides of the road. The address locator software can then determine approximately where the address exists near the road. Though not as common, polygon data, such as for building footprints or land ownership parcels, can also be used in address locators.
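The road-network approach can be sketched as linear interpolation along a street segment. The example below is a simplified assumption, not the method of any specific address locator: the segment is a straight line, each side of the road has its own address range, and the side is chosen by matching the odd or even parity of the house number. The segment dictionary layout and its values are made up for illustration.

```python
def interpolate_address(number, segment):
    """Estimate where a house number falls along a straight street segment.

    Returns (x, y, side), or None if the number fits neither address range.
    """
    (x0, y0), (x1, y1) = segment["start"], segment["end"]
    for side in ("left", "right"):
        low, high = segment[side]
        in_range = min(low, high) <= number <= max(low, high)
        same_parity = number % 2 == low % 2  # odd/even side of the road
        if in_range and same_parity:
            # Fraction of the way along the segment, from start to end.
            t = 0.5 if high == low else (number - low) / (high - low)
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0), side)
    return None

# Hypothetical segment: odd numbers on the left side, even on the right.
segment = {
    "start": (0.0, 0.0),
    "end": (100.0, 0.0),
    "left": (101, 199),
    "right": (100, 200),
}
print(interpolate_address(150, segment))  # (50.0, 0.0, 'right')
```

The result lies on the road centerline. A real locator may also offset the point toward the matched side of the road, which this sketch does not attempt.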
When geocoding addresses for road network analyses, such as determining many routes or closest facilities, how the address locator places the address can be important. For instance, Google Earth often places addresses directly on or near the building, while ArcGIS tends to place points near the road. For road network analysis, having the point nearer the road could be advantageous so as not to 'confuse' the software about which road the address is associated with. This is especially important for addresses near road intersections.
Except where otherwise indicated, original content in this guide is licensed under a Creative Commons Attribution (CC BY) 4.0 license. You are free to share, adopt, or adapt the materials. We encourage broad adoption of these materials for teaching and other professional development purposes, and invite you to customize them for your own needs.