Reads a dataset downloaded from the IPUMS extract system.
For IPUMS projects with microdata, it relies on a downloaded
DDI codebook and a fixed-width file. Loads the data with
value labels (using the labelled format) and variable labels.
See 'Details' for more information on how record types are
handled by the ipumsr package.
read_ipums_micro(
  ddi,
  vars = NULL,
  n_max = Inf,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)
read_ipums_micro_list(
  ddi,
  vars = NULL,
  n_max = Inf,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)
ddi: Either a filepath to a DDI xml file downloaded from the website, or an ipums_ddi object parsed by read_ipums_ddi.
vars: Names of variables to load. Accepts a character vector of names, or dplyr_select_style conventions. For hierarchical data, the rectype id variable will be added even if it is not specified.
n_max: The maximum number of records to load.
data_file: Specify a directory to look for the data file. If left empty, it will look in the same directory as the DDI file.
verbose: Logical, indicating whether to print progress information to console.
var_attrs: Variable attributes to add from the DDI, defaults to adding all (val_labels, var_label and var_desc). See set_ipums_var_attributes for more details.
lower_vars: Only if reading a DDI from a file, a logical indicating whether to convert variable names to lowercase (default is FALSE, in line with IPUMS conventions). Note that this argument will be ignored if argument ddi is an ipums_ddi object rather than a file path. See read_ipums_ddi for converting variable names to lowercase when reading in the DDI.
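As a minimal sketch of how these arguments combine, the call below reads only two variables and at most 100 records from the hierarchical CPS example extract. It assumes the ipumsr package is installed and that the example file cps_00010.xml is available through ipums_example(); the object names ddi_file and cps_small are illustrative only.

```r
library(ipumsr)

# Example DDI bundled with ipumsr (the hierarchical CPS extract)
ddi_file <- ipums_example("cps_00010.xml")

# Read two variables and at most 100 records, without progress information.
# Because this extract is hierarchical, the record type variable (RECTYPE)
# is expected to be included even though it is not requested.
cps_small <- read_ipums_micro(
  ddi_file,
  vars = c("YEAR", "INCTOT"),
  n_max = 100,
  verbose = FALSE
)

names(cps_small)
nrow(cps_small)
```

Limiting vars and n_max this way can be handy for a quick look at a large extract before loading all of it.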
read_ipums_micro returns a single tbl_df data frame, and
read_ipums_micro_list returns a list of data frames, named by
the Record Type. See 'Details' for more information.
Some IPUMS projects have data for multiple types of records (e.g., Household and Person). When downloading data from many of these projects, you have the option to have the IPUMS extract system "rectangularize" the data, meaning that the data is transformed so that each row of data represents only one type of record.
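To make the idea of rectangularizing concrete, here is a rough base R sketch using made-up household and person records. It only illustrates the shape of the result, assuming household variables are attached to person rows through a shared SERIAL identifier; it is not a description of how the IPUMS extract system performs the transformation.

```r
# Toy household records: one row per household (values are made up)
household <- data.frame(
  SERIAL   = c(80, 82),
  STATEFIP = c(55, 27)
)

# Toy person records: one row per person, linked to a household by SERIAL
person <- data.frame(
  SERIAL = c(80, 80, 80, 82),
  PERNUM = c(1, 2, 3, 1),
  INCTOT = c(4883, 5800, NA, 14015)
)

# Attach each household's variables to its person rows, so that every row
# of the result represents exactly one record type (here, a person)
rectangular <- merge(person, household, by = "SERIAL")
rectangular <- rectangular[order(rectangular$SERIAL, rectangular$PERNUM), ]

rectangular
```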
There is also the option to download "hierarchical" extracts, which are a single file with record types mixed in the rows. The ipumsr package offers two methods for importing this data.
read_ipums_micro loads this data into a "long" format
where the record types are mixed in the rows, but the variables
are NA for the record types that they do not apply to.
read_ipums_micro_list loads the data into a list of
data frame objects, where each data frame contains only
one record type. The names of the data frames in the list
are the text from the record type labels without 'Record'
(often 'HOUSEHOLD' for Household and 'PERSON' for Person).
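The relationship between the two formats can be sketched in base R with a small made-up data frame. The example starts from a "long" layout and derives a list with one data frame per record type. Dropping all-NA columns is a heuristic chosen for this toy data and is an assumption, not the method ipumsr uses internally; the names long and by_rectype are hypothetical.

```r
# Toy data in "long" format: household (H) and person (P) rows are mixed,
# and variables that do not apply to a record type are NA (values are made up)
long <- data.frame(
  RECTYPE  = c("H", "P", "P", "H", "P"),
  SERIAL   = c(80, 80, 80, 82, 82),
  STATEFIP = c(55, NA, NA, 27, NA),
  PERNUM   = c(NA, 1, 2, NA, 1),
  INCTOT   = c(NA, 4883, 5800, NA, 14015)
)

# Split rows by record type, then drop the columns that are entirely NA
# within each piece (a heuristic for this toy data, not ipumsr's method)
by_rectype <- lapply(split(long, long$RECTYPE), function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
})

# Rename the list elements to mirror the naming described above
names(by_rectype) <- unname(c(H = "HOUSEHOLD", P = "PERSON")[names(by_rectype)])

names(by_rectype$HOUSEHOLD)
names(by_rectype$PERSON)
```

In the long layout every row carries every column, while in the list layout each data frame keeps only the columns that apply to its record type.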
Other ipums_read: read_ipums_micro_chunked(), read_ipums_micro_yield(), read_ipums_sf(), read_nhgis(), read_terra_area(), read_terra_micro(), read_terra_raster()
# Rectangular example file
cps_rect_ddi_file <- ipums_example("cps_00006.xml")
cps <- read_ipums_micro(cps_rect_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
# Or load DDI separately to keep the metadata
ddi <- read_ipums_ddi(cps_rect_ddi_file)
cps <- read_ipums_micro(ddi)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
# Hierarchical example file
cps_hier_ddi_file <- ipums_example("cps_00010.xml")
# Read in "long" format and you get 1 data frame
cps_long <- read_ipums_micro(cps_hier_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
head(cps_long)
#> # A tibble: 6 × 9
#>   RECTYPE      YEAR SERIAL HWTSUPP STATEFIP MONTH    PERNUM WTSUPP INCTOT
#>   <chr+lbl>   <dbl>  <dbl>   <dbl> <int+lb> <int+lb>  <dbl>  <dbl> <dbl+lbl>
#> 1 H [Househo…  1962     80   1476. 55 [Wis… 3 [Mar…      NA    NA  NA
#> 2 P [Person …  1962     80     NA  NA       NA            1  1476. 4.88e3
#> 3 P [Person …  1962     80     NA  NA       NA            2  1471. 5.8 e3
#> 4 P [Person …  1962     80     NA  NA       NA            3  1579. 1.00e8 [Mis…
#> 5 H [Househo…  1962     82   1598. 27 [Min… 3 [Mar…      NA    NA  NA
#> 6 P [Person …  1962     82     NA  NA       NA            1  1598. 1.40e4
# Read in "list" format and you get a list of multiple data frames
cps_list <- read_ipums_micro_list(cps_hier_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
head(cps_list$PERSON)
#> # A tibble: 6 × 6
#>   RECTYPE            YEAR SERIAL PERNUM WTSUPP INCTOT
#>   <chr+lbl>         <dbl>  <dbl>  <dbl>  <dbl> <dbl+lbl>
#> 1 P [Person Record]  1962     80      1  1476.     4883
#> 2 P [Person Record]  1962     80      2  1471.     5800
#> 3 P [Person Record]  1962     80      3  1579. 99999998 [Missing.]
#> 4 P [Person Record]  1962     82      1  1598.    14015
#> 5 P [Person Record]  1962     83      1  1707.    16552
#> 6 P [Person Record]  1962     84      1  1790.     6375
head(cps_list$HOUSEHOLD)
#> # A tibble: 6 × 6
#>   RECTYPE               YEAR SERIAL HWTSUPP STATEFIP       MONTH
#>   <chr+lbl>            <dbl>  <dbl>   <dbl> <int+lbl>      <int+lbl>
#> 1 H [Household Record]  1962     80   1476. 55 [Wisconsin] 3 [March]
#> 2 H [Household Record]  1962     82   1598. 27 [Minnesota] 3 [March]
#> 3 H [Household Record]  1962     83   1707. 27 [Minnesota] 3 [March]
#> 4 H [Household Record]  1962     84   1790. 27 [Minnesota] 3 [March]
#> 5 H [Household Record]  1962    107   4355. 19 [Iowa]      3 [March]
#> 6 H [Household Record]  1962    108   1479. 19 [Iowa]      3 [March]
# Or you can use the %<-% operator from zeallot to unpack
c(household, person) %<-% read_ipums_micro_list(cps_hier_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
head(person)
#> # A tibble: 6 × 6
#>   RECTYPE            YEAR SERIAL PERNUM WTSUPP INCTOT
#>   <chr+lbl>         <dbl>  <dbl>  <dbl>  <dbl> <dbl+lbl>
#> 1 P [Person Record]  1962     80      1  1476.     4883
#> 2 P [Person Record]  1962     80      2  1471.     5800
#> 3 P [Person Record]  1962     80      3  1579. 99999998 [Missing.]
#> 4 P [Person Record]  1962     82      1  1598.    14015
#> 5 P [Person Record]  1962     83      1  1707.    16552
#> 6 P [Person Record]  1962     84      1  1790.     6375
head(household)
#> # A tibble: 6 × 6
#>   RECTYPE               YEAR SERIAL HWTSUPP STATEFIP       MONTH
#>   <chr+lbl>            <dbl>  <dbl>   <dbl> <int+lbl>      <int+lbl>
#> 1 H [Household Record]  1962     80   1476. 55 [Wisconsin] 3 [March]
#> 2 H [Household Record]  1962     82   1598. 27 [Minnesota] 3 [March]
#> 3 H [Household Record]  1962     83   1707. 27 [Minnesota] 3 [March]
#> 4 H [Household Record]  1962     84   1790. 27 [Minnesota] 3 [March]
#> 5 H [Household Record]  1962    107   4355. 19 [Iowa]      3 [March]
#> 6 H [Household Record]  1962    108   1479. 19 [Iowa]      3 [March]