Documentation for County Business Patterns Establishment Data Page

 

Posted April 24, 2003 by Thomas J. Holmes, University of Minnesota, holmes@econ.umn.edu

 

          This resource contains establishment-level data derived from County Business Patterns (CBP) data used in Holmes and Stevens (2003) and other work.  The resource is posted on the web at www.econ.umn.edu/~holmes/data/cbp for the convenience of other researchers.  An additional page with web links to sources listed below can be found at http://www.econ.umn.edu/~holmes/data/cbp/links.

 

There are 7,070,048 establishments in the CBP universe for the year 2000.  The file est_cnty2000 provides cell counts for these establishments by industry (six digit NAICS), employment size class, and county.  The file est_zip2000 contains the analogous cell counts at the zip code level rather than the county level. 

 

          Also included on the resource page are auxiliary files.  There are separate geographic reference files for both counties (georef_cnty2000) and zip codes (georef_zip2000).  These reference files include geographic coordinates of the location units.  There is a reference file for the NAICS industries (NAICS2000).  There is also a file that contains mean employment by size class.  These can be used as point estimates of establishment employment.


 

Data Set 1: est_cnty2000

          The source of this data is the County Business Patterns CD-ROM 1999-2000, C1-E00-CBPX-01-US1, Issued July 2002.   

          One minor coding change was made.  The FIPS code for Miami-Dade in Florida was updated from COUNTY=’025’ to COUNTY=’086’ to be consistent with the geographic reference file from the 2000 Decennial Census.
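The posted file already carries this recode. For readers who want to apply the same change to raw CBP extracts, the following is a minimal sketch. The function name and the representation of a record as a pair of strings are assumptions made for illustration; only the codes '025' and '086' and Florida's FIPS state code '12' come from the FIPS system and the text above.

```python
from typing import Tuple

FLORIDA_FIPS = "12"  # FIPS state code for Florida


def recode_miami_dade(st: str, county: str) -> Tuple[str, str]:
    """Map the old Dade County code (12, 025) to the Miami-Dade code (12, 086).

    Hypothetical helper: codes are kept as zero-padded strings, as in the
    variable tables of this documentation.
    """
    if st == FLORIDA_FIPS and county == "025":
        return st, "086"
    return st, county


# Illustrative (made-up) records: only the Florida '025' row should change.
rows = [("12", "025"), ("12", "086"), ("27", "025")]
recoded = [recode_miami_dade(st, county) for st, county in rows]
print(recoded)
```

The state check matters because county codes are only unique within a state, so '025' in another state must be left alone.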

 

Variable Name   Definition
NAICS           6-digit NAICS
ST              FIPS state
COUNTY          FIPS county
                COUNTY=‘999’ indicates “Statewide”
EMPCAT          Employment size class
                  ‘01’=1-4
                  ‘02’=5-9
                  ‘03’=10-19
                  ‘04’=20-49
                  ‘05’=50-99
                  ‘06’=100-249
                  ‘07’=250-499
                  ‘08’=500-999
                  ‘09’=1,000-1,499
                  ‘10’=1,500-2,499
                  ‘11’=2,500-4,999
                  ‘12’=5,000+
                Note: in the county file, all establishments with 1,000 or
                more employees are assigned to one of the detailed classes
                ‘09’, ‘10’, ‘11’, or ‘12’.  In the zip code file, all such
                establishments are listed in the category ‘M+’.
NUMBER          The number of establishments in this cell.

 


 

Data Set 2: est_zip2000

          The source of this data is the Zip Code Business Patterns CD-ROM for 2000. 

 

Variable Name   Definition
NAICS           6-digit NAICS
ZIP             5-digit zip code
EMPCAT          Employment size class
                  ‘01’=1-4
                  ‘02’=5-9
                  ‘03’=10-19
                  ‘04’=20-49
                  ‘05’=50-99
                  ‘06’=100-249
                  ‘07’=250-499
                  ‘08’=500-999
                  ‘M+’=1,000+
                Note: in the zip code file, all establishments with 1,000 or
                more employees are assigned to the single category ‘M+’.
NUMBER          The number of establishments in this cell.
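Because the county file splits the 1,000+ range into four classes while the zip code file uses the single class 'M+', comparing the two files requires collapsing the county classes first. The following is a minimal sketch of that step. The function names and the representation of cells as (EMPCAT, NUMBER) pairs are assumptions for illustration, and the counts are made up.

```python
from collections import Counter

# County-file classes that correspond to 'M+' in the zip code file.
DETAILED_TOP_CLASSES = {"09", "10", "11", "12"}


def to_zip_empcat(empcat):
    """Translate a county-file EMPCAT code into the zip-file coding."""
    return "M+" if empcat in DETAILED_TOP_CLASSES else empcat


def collapse_counts(cells):
    """Sum NUMBER by zip-style EMPCAT over an iterable of (EMPCAT, NUMBER)."""
    totals = Counter()
    for empcat, number in cells:
        totals[to_zip_empcat(empcat)] += number
    return dict(totals)


# Made-up county cells for a single industry.
county_cells = [("01", 120), ("08", 3), ("09", 2), ("12", 1)]
print(collapse_counts(county_cells))
```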

 


 

 

Data Set 3: georef_cnty2000

          The source of this geographic data is the published geographic data used for the 2000 decennial census.  For all the data files released (Summary File 1, Summary File 3, etc.) there is a consistent set of geographic files.  This file is an extract of that geographic file for the set of 3,141 counties in the 2000 Census.  As an aside, the geographic data from the 2000 Census is an extremely helpful resource.  In one place (the raw geographic files) one can obtain geographic information for all the different geographic levels, including census tracts, zip codes, places, etc.  Moreover, this is a useful file for linking these various geographic objects.  Furthermore, boundary files are available for these geographic units at http://www.census.gov/geo/www/cob/index.html.

 

          Note that no geographic information is provided for cases where COUNTY=‘999’, which indicates a statewide establishment.

         

Variable Name   Definition
ST              FIPS state
COUNTY          FIPS county
sttext          state postal abbreviation
NAME            name of county
lat             latitude
long            longitude
pop2000         County population from the 2000 Census
land            land area in square miles
water           water area of county in square miles
MSACMSA         Metropolitan Statistical Area/Consolidated Metropolitan Statistical Area
CMSA            Consolidated Metropolitan Statistical Area
PMSA            Primary Metropolitan Statistical Area
NECMA           New England County Metropolitan Area
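One common use of the lat and long variables is to compute distances between counties. The following is a minimal sketch using the haversine formula. It assumes the coordinates are in decimal degrees with a consistent sign convention, which this documentation does not state, and the function name and Earth radius constant are choices made for this illustration. Miles are used to match the units of the land and water variables.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def great_circle_miles(lat1, long1, lat2, long2):
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(long2 - long1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2.0 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


# Sanity check on made-up points: a quarter of the way around the equator.
quarter = great_circle_miles(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```

The same function applies to the lat and long variables in georef_zip2000 under the same assumption about units.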

 


 

Data Set 4: georef_zip2000

         

          This data is assembled from several sources.  The Zip Code Business Patterns data provides a geographic reference file for all the zip codes with a positive number of establishments in the data.  The only information provided in that reference file is the city and state of the post office corresponding to the zip code.  The reference file georef_zip2000 expands on this by providing geographic coordinates as well as county information.

          It should be noted that zip codes are crude geographic units since they are tied to postal routes.  Although it is much rarer than it used to be, zip codes can cross state lines.  They certainly can cross county boundaries.

          The main source of the additional information (geographic coordinates and county) is the file ZIPNOV99, available on the Census web site.  According to the Census documentation, this file was

 

“prepared by the Bureau of Census from the U.S. Postal Service (USPS) City-State file (November, 1999). This file contains all 5-digit ZIP codes defined as of November 1, 1999, the state and county FIPS codes and the Post Office names associated with them. (Note – For ZIP codes that cross county boundaries, the Post Office file assigns that ZIP code to just one of the counties rather than to each county.)The Census Bureau then determined a geographic coordinate (latitude and longitude) for each ZIP code in the City-State file by processing it against the Bureau’s internal TIGER database for the state and county specified for the ZIP code.”

 

Out of the 39,853 zip codes in the establishment data set, all but 394 are in the ZIPNOV99 file.  The geographic information for the remaining 394 was cobbled together from three other sources.  The first additional source is the U.S. Gazetteer zip code file from the 1990 Census (also available at the Census web site).  This file has geographic coordinates but no county information.  There were 166 of the residual zip codes that were matched in this way.  The second additional source is the FIPS55 file produced by the US Geological Survey.  This file lists places along with the corresponding zip codes and counties.  The zip codes in the file were used to obtain matches of county and place names.  The place names and counties were then used to obtain matches of geographic coordinates in the USGS Gazetteer file of places.  There were 56 matches obtained this way.  The final set of 171 matches was obtained by direct matching of the postal names with the place names in the USGS Gazetteer file.

 

 

 

Variable Name   Definition
ZIP             5-digit zip code
name1           Name of the post office from the original geographic reference file supplied with the establishment data
estab           Number of establishments in this zip
st              FIPS state
sttext          state postal abbreviation
county          FIPS county
lat             latitude
long            longitude
name2           Name obtained from the source of the additional information
source          Origin of the additional information
                  =‘0’ for case of zip=‘99999’ (1 case)
                  =‘1’ if from zipnov99 file (34,459 cases)
                  =‘2’ if from 1990 Census file (166 cases)
                  =‘3’ if matched to FIPS55 file (56 cases)
                  =‘4’ if matched to USGS Gazetteer place file (171 cases)
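The matching described above works as a cascade: each zip code takes its geographic information from the first source in which it can be found. The following is a simplified sketch of that logic, not the procedure actually used to build the file. It assumes each source has already been reduced to a dictionary keyed by zip code, which hides the place-name matching involved in sources 3 and 4. The function name, the dictionary arguments, and the example records are all hypothetical.

```python
def locate_zip(zip_code, zipnov99, gazetteer1990, fips55, usgs_places):
    """Return (source_code, record) from the first source containing zip_code.

    Each source argument is a dict keyed by 5-digit zip string (an assumption
    of this sketch). Returns (None, None) if no source has the zip code.
    """
    if zip_code == "99999":
        return "0", None  # special case: no geographic information
    ordered_sources = (
        ("1", zipnov99),
        ("2", gazetteer1990),
        ("3", fips55),
        ("4", usgs_places),
    )
    for source_code, table in ordered_sources:
        record = table.get(zip_code)
        if record is not None:
            return source_code, record
    return None, None


# Counts reported in the text: the three fallback match counts plus the
# single zip='99999' case account for the 394 zip codes not in ZIPNOV99.
residual_counts = {"0": 1, "2": 166, "3": 56, "4": 171}
print(sum(residual_counts.values()))

# Made-up lookup tables with placeholder zip codes and coordinates.
example = locate_zip("00002", {"00001": (45.0, -93.0)}, {"00002": (44.0, -92.0)}, {}, {})
print(example)
```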

 

 

 

Data Set 6: NAICS2000

 

This is an auxiliary file from the CBP 2000 CD.

 

Variable Name

Definition

NAICS

NAICS code

NAICStxt

text

 

Data Set 7: avg_emp

 

          This file provides information about average employment for each cell.  For EMPCAT categories ‘01’ through ‘08’ and ‘M+’, these are the average employments in the cells across all establishments in the United States.  This uses the information provided in the U.S. file in the County Business Patterns data.  The U.S. file does not have information about the four disaggregated size categories for 1,000 and above.  The cell averages for these four cases were estimated by a GMM procedure that assumes a log normal distribution of establishments and uses the following moments:  (1) the total number of employees in the 1,000 plus category (empcat=‘M+’), (2) the total number of establishments in empcat categories ‘09’, ‘10’, ‘11’, ‘12’.  The reported mean is the mean employment in the cell in the estimated log normal distribution.
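To make the last step concrete, the following sketch computes the mean of a log normal distribution conditional on falling inside a size class. It uses the standard truncated-mean formula E[X | a <= X < b] = exp(mu + sigma^2/2) [Phi(z_b - sigma) - Phi(z_a - sigma)] / [Phi(z_b) - Phi(z_a)], where z_x = (ln x - mu)/sigma. It does not reproduce the GMM estimation itself. The parameter values mu and sigma below are placeholders, not the estimates behind avg_emp, and treating the integer size classes as continuous intervals such as [1000, 1500) is an assumption of this illustration.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, written with math.erf to avoid extra dependencies."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cell_mean(mu, sigma, lower, upper=math.inf):
    """E[X | lower <= X < upper] when ln X is normal with mean mu and sd sigma."""
    def standardized(x, shift):
        if x <= 0:
            return -math.inf
        if math.isinf(x):
            return math.inf
        return (math.log(x) - mu - shift) / sigma

    numerator = norm_cdf(standardized(upper, sigma ** 2)) - norm_cdf(standardized(lower, sigma ** 2))
    denominator = norm_cdf(standardized(upper, 0.0)) - norm_cdf(standardized(lower, 0.0))
    return math.exp(mu + 0.5 * sigma ** 2) * numerator / denominator


# Size-class bounds from the EMPCAT definitions, treated as continuous intervals.
SIZE_CLASSES = {
    "09": (1000, 1500),
    "10": (1500, 2500),
    "11": (2500, 5000),
    "12": (5000, math.inf),
}

mu, sigma = 7.0, 1.0  # placeholder parameters, NOT the estimated values
for empcat, (low, high) in SIZE_CLASSES.items():
    print(empcat, round(lognormal_cell_mean(mu, sigma, low, high), 1))
```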

 

Variable Name   Definition
year            year of CBP data for which mean or estimate is calculated
EMPCAT          employment size class
AVG_EMP         mean employment (for EMPCAT=‘01’-‘08’ and ‘M+’)
                estimated mean employment (for EMPCAT=‘09’-‘12’)
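Using these means as point estimates of establishment employment amounts to multiplying each cell count by the mean for its size class and summing. The following is a minimal sketch. The function name and the in-memory representation are assumptions, and the numbers are made up for illustration rather than taken from the avg_emp file.

```python
def estimate_employment(cells, avg_emp):
    """Point estimate of total employment.

    cells:   iterable of (EMPCAT, NUMBER) pairs from an establishment file
    avg_emp: dict mapping EMPCAT to mean employment (AVG_EMP)
    """
    return sum(number * avg_emp[empcat] for empcat, number in cells)


# Made-up values, for illustration only.
avg_emp = {"01": 2.0, "02": 7.0}
cells = [("01", 10), ("02", 3)]
total = estimate_employment(cells, avg_emp)
print(total)
```

For the zip code file the lookup needs the 'M+' mean, while the county file needs the estimated means for classes '09' through '12'.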