Posted April 24, 2003 by Thomas J. Holmes, University of Minnesota, holmes@econ.umn.edu
This resource contains establishment-level
data derived from County Business Patterns (CBP) data used in Holmes and
Stevens (2003) and other work. The
resource is posted on the web at www.econ.umn.edu/~holmes/data/cbp
for the convenience of other researchers.
An additional page with web links to sources listed below can be found
at http://www.econ.umn.edu/~holmes/data/cbp/links.
There are 7,070,048 establishments in the CBP
universe for the year 2000. The file est_cnty2000
provides cell counts for these establishments by industry (six digit NAICS),
employment size class, and county. The
file est_zip2000 contains the analogous cell counts at the zip code level
rather than the county level.
Also included on the resource page are
auxiliary files. There are separate
geographic reference files for both counties (georef_cnty2000) and zip
codes (georef_zip2000). These
reference files include geographic coordinates of the location units. There is a reference file for the NAICS
industries (NAICS2000). There is
also a file that contains mean employment by size class. These can be used as point estimates of
establishment employment.
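As a rough illustration of using the size-class means as point estimates, the sketch below multiplies each cell count by the mean employment of its size class and sums by industry and county. The rows and the mean values are made-up placeholders, not figures from the actual files, and the tuple layout (NAICS, ST, COUNTY, EMPCAT, NUMBER) is an assumption about how the file might be read into memory.

```python
from collections import defaultdict

# Hypothetical mean employment by size class (placeholders, not avg_emp values).
avg_emp = {"01": 1.7, "02": 6.6, "03": 13.5}

# Hypothetical rows laid out as (NAICS, ST, COUNTY, EMPCAT, NUMBER).
cells = [
    ("311811", "27", "053", "01", 10),
    ("311811", "27", "053", "02", 4),
    ("311811", "27", "123", "03", 2),
]

def estimate_employment(cells, avg_emp):
    """Sum NUMBER * mean employment over size classes for each industry-county."""
    totals = defaultdict(float)
    for naics, st, county, empcat, number in cells:
        totals[(naics, st, county)] += number * avg_emp[empcat]
    return dict(totals)

estimates = estimate_employment(cells, avg_emp)
```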
The source of this data is the County
Business Patterns CD-ROM 1999-2000, C1-E00-CBPX-01-US1, Issued July 2002.
One minor coding change was made. The FIPS code for Miami-Dade in Florida was
updated from COUNTY=025 to COUNTY=086 to be consistent with the geographic
reference file from the 2000 Decennial Census.
Variable Name | Definition
NAICS | 6-digit NAICS
ST | FIPS state
COUNTY | FIPS county. The case of COUNTY=999 indicates Statewide
EMPCAT | Employment size class: 01=1-4, 02=5-9, 03=10-19, 04=20-49, 05=50-99, 06=100-249, 07=250-499, 08=500-999, 09=1,000-1,499, 10=1,500-2,499, 11=2,500-4,999, 12=5,000+. Note: in the county file, all establishments with 1,000 or more employees are assigned to one of the detailed classifications 09, 10, 11, or 12. In the zip code file, all such establishments are listed in the category M+.
NUMBER | The frequency of establishments in this cell.
The source of this data is the Zip
Code Business Patterns CD-ROM for 2000.
Variable Name | Definition
NAICS | 6-digit NAICS
ZIP | 5-digit zip code
EMPCAT | Employment size class: 01=1-4, 02=5-9, 03=10-19, 04=20-49, 05=50-99, 06=100-249, 07=250-499, 08=500-999, M+=1,000+. Note: in the zip code file, establishments with 1,000 or more employees are assigned to one category.
NUMBER | The frequency of establishments in this cell.
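Because the county file splits establishments with 1,000 or more employees into classes 09 through 12 while the zip code file uses the single class M+, comparing the two requires a common coding. A minimal sketch, assuming county cells are held as (NAICS, ST, COUNTY, EMPCAT, NUMBER) tuples; the sample rows are invented for illustration.

```python
from collections import Counter

# County-file size classes that correspond to M+ in the zip code file.
LARGE_CLASSES = {"09", "10", "11", "12"}

def to_zip_empcat(empcat):
    """Map a county-file EMPCAT code to the zip-file coding."""
    return "M+" if empcat in LARGE_CLASSES else empcat

def collapse_large_classes(cells):
    """Re-aggregate county cells so that classes 09-12 are pooled into M+."""
    pooled = Counter()
    for naics, st, county, empcat, number in cells:
        pooled[(naics, st, county, to_zip_empcat(empcat))] += number
    return dict(pooled)

# Hypothetical county cells (counts are made up).
sample = [
    ("311811", "12", "086", "09", 2),
    ("311811", "12", "086", "11", 1),
    ("311811", "12", "086", "08", 5),
]
collapsed = collapse_large_classes(sample)
```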
The source of this geographic data is
the published geographic data used for the 2000 decennial census. For all the data files released (Summary
File 1, Summary File 3, etc.) there is a consistent set of geographic
files. This file is an extract of this
geographic file for the set of 3,141 counties in the 2000 Census. As an aside, the geographic data from the
2000 Census is an extremely helpful resource.
In one place (the raw geographic files) one can obtain geographic
information for all the different geographic levels, including census tracts,
zip codes, places, etc. Moreover, this
is a useful file for linking these various geographic objects. Furthermore, boundary files are available
for these geographic units at http://www.census.gov/geo/www/cob/index.html.
Note that no geographic information is provided for cases where COUNTY=999,
the code that indicates a statewide establishment.
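When attaching coordinates to the county establishment cells, the statewide records therefore need to be set aside first. The sketch below assumes the cells are (NAICS, ST, COUNTY, EMPCAT, NUMBER) tuples and that the reference file has been read into a dictionary keyed by (ST, COUNTY); both layouts and all values are hypothetical.

```python
STATEWIDE = "999"  # COUNTY code with no entry in the geographic reference file

def attach_coordinates(cells, georef):
    """Split cells into those matched to (lat, long) and statewide leftovers."""
    matched, statewide = [], []
    for cell in cells:
        naics, st, county, empcat, number = cell
        if county == STATEWIDE:
            statewide.append(cell)
            continue
        lat, lon = georef[(st, county)]
        matched.append(cell + (lat, lon))
    return matched, statewide

# Hypothetical inputs: coordinates and counts are made up.
georef = {("27", "053"): (45.0, -93.5)}
cells = [
    ("311811", "27", "053", "01", 10),
    ("311811", "27", "999", "01", 3),
]
matched, statewide = attach_coordinates(cells, georef)
```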
Variable Name | Definition
ST | FIPS state
COUNTY | FIPS county
sttext | state postal abbreviation
NAME | name of county
lat | latitude
long | longitude
pop2000 | County population from the 2000 Census
land | land area in square miles
water | water area of county in square miles
MSACMSA | Metropolitan Statistical Area/Consolidated Metropolitan Statistical Area
CMSA | Consolidated Metropolitan Statistical Area
PMSA | Primary Metropolitan Statistical Area
NECMA | New England County Metropolitan Area
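One common use of the lat and long fields is computing distances between location units. The sketch below uses the standard haversine great-circle formula. It assumes the coordinates are signed decimal degrees, which should be checked against the file, and it uses a mean Earth radius of 3,958.8 miles, which is an approximation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

The same function applies to the zip code reference file, since it carries the same lat and long fields.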
This data is assembled from several
sources. The Zip Code Business Patterns
provides a geographic reference file for all the zip codes with
a positive number of establishments in the data.
The only information provided in the reference file is the city and
state of the post office corresponding to the zip code. The reference file georef_zip2000
expands on this by providing geographic coordinates as well as county
information.
It should be noted that zip codes can
be crude geographic units since they are tied to postal routes.
Although it is much rarer than it used to be, zip codes can cross state
lines. They certainly can cross county
boundaries.
The main source of additional information
(geographic coordinates and county) is the file ZIPNOV99, available
on the Census web site. According to
the Census documentation, this file was
prepared by the Bureau of the
Census from the U.S. Postal Service (USPS) City-State file (November, 1999).
This file contains all 5-digit ZIP codes defined as of November 1, 1999, the
state and county FIPS codes and the Post Office names associated with them.
(Note: For ZIP codes that cross county boundaries, the Post Office file
assigns that ZIP code to just one of the counties rather than to each
county.) The Census Bureau then determined a geographic coordinate (latitude and
longitude) for each ZIP code in the City-State file by processing it against
the Bureau's internal TIGER database for the state and county specified for the
ZIP code.
Out of the 39,853 zip codes in the establishment data set, all but 394 are in the ZIPNOV99
file. The geographic information for
the remaining 394 was cobbled together from three other sources. The first additional source is the U.S.
Gazetteer zip code file from the 1990 Census (also available at the Census web
site). This file has geographic
coordinates but no county information.
There were 166 of the residual zip codes that were matched up in this
way. The second additional source was
the FIPS55 file produced by the US Geological Survey. This file lists places along with the corresponding zip codes and
counties. The zip codes in the file
were used to obtain matches of county and place names. The place names and counties were then used
to obtain matches of geographic coordinates in the USGS Gazetteer file of
places. There were 56 matches obtained
this way. The final set of 171 matches
was obtained by direct matching of the postal names with the place names in the
USGS Gazetteer file.
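The matching order described above amounts to a fallback cascade: try the primary file first and move to the next source only when no match is found. The sketch below illustrates that logic in a simplified form. It models the first three sources as dictionaries keyed by zip code and the USGS place file as a dictionary keyed by (post office name, state abbreviation). These structures, the function name, and the sample values are hypothetical, and the real FIPS55 step went through place names and counties rather than a direct zip lookup.

```python
def locate_zip(zip_code, post_office, state,
               zipnov99, gazetteer1990, fips55, usgs_places):
    """Return (source, record) from the first source that yields a match."""
    if zip_code == "99999":
        return 0, None  # special case with no geographic information
    # Sources 1-3 are tried in order by zip code.
    for source, table in ((1, zipnov99), (2, gazetteer1990), (3, fips55)):
        if zip_code in table:
            return source, table[zip_code]
    # Source 4: match the post office name against place names.
    key = (post_office.upper(), state)
    if key in usgs_places:
        return 4, usgs_places[key]
    return None, None  # unmatched

# Hypothetical lookup tables with made-up zip codes and coordinates.
zipnov99 = {"00001": (40.0, -75.0)}
gazetteer1990 = {"00002": (41.0, -76.0)}
fips55 = {"00003": (42.0, -77.0)}
usgs_places = {("SAMPLETOWN", "MN"): (43.0, -78.0)}
tables = (zipnov99, gazetteer1990, fips55, usgs_places)
```

The returned code lines up with the `source` variable in the reference file, so each record keeps track of where its coordinates came from.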
Variable Name | Definition
ZIP | 5-digit zip code
name1 | Name of the post office from the original geographic reference file supplied with the establishment data
estab | Number of establishments in this zip
st | FIPS state
sttext | state postal abbreviation
county | FIPS county
lat | latitude
long | longitude
name2 | Name obtained from the source of the additional information
source | Origin of the additional information: =0 for case of zip=99999 (1 case); =1 if from zipnov99 file (34,459 cases); =2 if from 1990 Census file (166 cases); =3 if matched to FIPS55 file (56 cases); =4 if matched to USGS Gazetteer place file (171 cases)
This
is an auxiliary file from the CBP 2000 CD.
Variable Name | Definition
NAICS | NAICS code
NAICStxt | text
Data Set 7: avg_emp
This file provides information about average employment for each cell. For EMPCAT categories 01 through 08 and M+, these are the average employment levels in the cell across all establishments in the United States, computed from the U.S. file in the County Business Patterns data. The U.S. file does not have information about the four disaggregated size categories for 1,000 and above. The cell averages for these four cases were estimated by a GMM procedure that assumes a lognormal distribution of establishment size and uses the following moments: (1) the total number of employees in the 1,000 plus category (empcat=M+), (2) the total number of establishments in empcat categories 09, 10, 11, 12. The reported mean is the mean employment in the cell in the estimated lognormal distribution.
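To make the last step concrete: once the lognormal parameters are in hand, the mean employment within a size class is the mean of the distribution truncated to that class's bounds. The sketch below computes that truncated mean using the standard closed-form expression. It does not reproduce the GMM estimation itself, the values of `mu` and `sigma` are arbitrary placeholders rather than the author's estimates, and treating the size classes as continuous intervals is a simplifying assumption.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF

def lognormal_cell_mean(mu, sigma, lower, upper=math.inf):
    """Mean of X given lower <= X < upper, where ln X ~ Normal(mu, sigma^2)."""
    def z(bound, shift):
        if math.isinf(bound):
            return math.inf
        return (math.log(bound) - mu - shift) / sigma
    # Probability mass of the cell, and the matching partial expectation term.
    prob = _PHI(z(upper, 0.0)) - _PHI(z(lower, 0.0))
    partial = _PHI(z(upper, sigma ** 2)) - _PHI(z(lower, sigma ** 2))
    return math.exp(mu + sigma ** 2 / 2) * partial / prob

# Size classes 09-12, written as continuous intervals (an assumption).
BOUNDS = {"09": (1000, 1500), "10": (1500, 2500),
          "11": (2500, 5000), "12": (5000, math.inf)}

mu, sigma = 6.5, 1.0  # placeholder parameters, not estimated values
cell_means = {k: lognormal_cell_mean(mu, sigma, lo, hi)
              for k, (lo, hi) in BOUNDS.items()}
```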
Variable Name | Definition
year | year of CBP data for which mean or estimate is calculated
EMPCAT | employment size class
AVG_EMP | mean employment (for EMPCAT=01-08 and M+); estimated mean employment (for EMPCAT=09-12)