Reads a dataset downloaded from the IPUMS extract system, but instead of
loading it all at once, returns an object that reads a group of lines at a time.
This is more flexible than the chunked functions such as
read_ipums_micro_chunked: you can, for example, read parts of multiple files
at the same time and reset to the beginning more easily.
Note that while other read_ipums_micro* functions
can read from .csv(.gz) or .dat(.gz) files, these functions can only read
from .dat(.gz) files.
read_ipums_micro_yield(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)
read_ipums_micro_list_yield(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)
ddi
Either a file path to a DDI xml file downloaded from
the website, or an ipums_ddi object parsed by read_ipums_ddi.
vars
Names of variables to load. Accepts a character vector of names, or
dplyr_select_style conventions. For hierarchical data, the
rectype id variable will be added even if it is not specified.
data_file
Specify a directory to look for the data file. If left empty, it will look in the same directory as the DDI file.
verbose
Logical, indicating whether to print progress information to the console.
var_attrs
Variable attributes to add from the DDI; defaults to
adding all (val_labels, var_label and var_desc). See
set_ipums_var_attributes for more details.
lower_vars
Only if reading a DDI from a file, a logical indicating
whether to convert variable names to lowercase (default is FALSE, in line
with IPUMS conventions). This argument is ignored if ddi is an ipums_ddi
object rather than a file path; see read_ipums_ddi for converting variable
names to lowercase when reading in the DDI.
A HipYield R6 object (see 'Details' for more information).
These functions return an R6 object (IpumsLongYield or IpumsListYield, both of which inherit from hipread::HipYield) with the following methods and properties:
yield(n = 10000)
A function to read the next 'yield' from the data. Returns a `tbl_df` (or a list of `tbl_df` for `read_ipums_micro_list_yield()`) with up to n rows. If fewer than n rows remain, it returns all of them; if no rows are left, it returns NULL.
reset()
A function to reset the data so that the next yield will read data from the start.
is_done()
A function that returns whether the file has been completely read yet or not.
cur_pos
A property that contains the next row number that will be read (1-indexed).
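As a rough illustration of how these methods combine, the sketch below makes two passes over the bundled example extract, calling reset() between them. It assumes the ipumsr package and its cps_00006 example files are installed. The helper name count_rows and the chunk size of 1000 are arbitrary choices for illustration and are not part of the package.

```r
library(ipumsr)

reader <- read_ipums_micro_yield(
  ipums_example("cps_00006.xml"),
  vars = c("YEAR", "INCTOT"),
  verbose = FALSE
)

# Hypothetical helper (not part of ipumsr): drain the reader and count rows
count_rows <- function(y, chunk_size = 1000) {
  n <- 0
  while (!y$is_done()) {
    chunk <- y$yield(chunk_size)
    if (is.null(chunk)) break  # yield() returns NULL once no rows are left
    n <- n + nrow(chunk)
  }
  n
}

first_pass <- count_rows(reader)

# Without reset(), the reader would stay at the end of the file
reader$reset()
second_pass <- count_rows(reader)
```

Because reset() rewinds the same object, a second pass does not require re-parsing the DDI or rebuilding the reader.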
Other ipums_read: read_ipums_micro_chunked(), read_ipums_micro(), read_ipums_sf(), read_nhgis(), read_terra_area(), read_terra_micro(), read_terra_raster()
hipread::HipYield -> hipread::HipLongYield -> IpumsLongYield
Inherited methods
new()
IpumsLongYield$new(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)
hipread::HipYield -> hipread::HipListYield -> IpumsListYield
Inherited methods
new()
IpumsListYield$new(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)
# An example using "long" data
long_yield <- read_ipums_micro_yield(ipums_example("cps_00006.xml"))
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
# Get first 10 rows
long_yield$yield(10)
#> # A tibble: 10 × 8
#> YEAR SERIAL HWTSUPP STATEFIP MONTH PERNUM WTSUPP INCTOT
#> <dbl> <dbl> <dbl> <int+lbl> <int+lbl> <dbl> <dbl> <dbl+lbl>
#> 1 1962 80 1476. 55 [Wisconsin] 3 [March] 1 1476. 4883
#> 2 1962 80 1476. 55 [Wisconsin] 3 [March] 2 1471. 5800
#> 3 1962 80 1476. 55 [Wisconsin] 3 [March] 3 1579. 99999998 [Missin…
#> 4 1962 82 1598. 27 [Minnesota] 3 [March] 1 1598. 14015
#> 5 1962 83 1707. 27 [Minnesota] 3 [March] 1 1707. 16552
#> 6 1962 84 1790. 27 [Minnesota] 3 [March] 1 1790. 6375
#> 7 1962 107 4355. 19 [Iowa] 3 [March] 1 4355. 99999999 [N.I.U.…
#> 8 1962 107 4355. 19 [Iowa] 3 [March] 2 1386. 0
#> 9 1962 107 4355. 19 [Iowa] 3 [March] 3 1629. 600
#> 10 1962 107 4355. 19 [Iowa] 3 [March] 4 1432. 99999999 [N.I.U.…
# Get 20 more rows now
long_yield$yield(20)
#> # A tibble: 20 × 8
#> YEAR SERIAL HWTSUPP STATEFIP MONTH PERNUM WTSUPP INCTOT
#> <dbl> <dbl> <dbl> <int+lbl> <int+lbl> <dbl> <dbl> <dbl+lbl>
#> 1 1962 108 1479. 19 [Iowa] 3 [March] 1 1479. 12300
#> 2 1962 108 1479. 19 [Iowa] 3 [March] 2 1482. 0
#> 3 1962 122 3603. 27 [Minnesota] 3 [March] 1 3603. 15550
#> 4 1962 122 3603. 27 [Minnesota] 3 [March] 2 3603. 0
#> 5 1962 122 3603. 27 [Minnesota] 3 [March] 3 4243. 3443
#> 6 1962 122 3603. 27 [Minnesota] 3 [March] 4 3920. 255
#> 7 1962 122 3603. 27 [Minnesota] 3 [March] 5 3689. 135
#> 8 1962 124 4104. 55 [Wisconsin] 3 [March] 1 4104. 15000
#> 9 1962 124 4104. 55 [Wisconsin] 3 [March] 2 1487. 3550
#> 10 1962 124 4104. 55 [Wisconsin] 3 [March] 3 1450. 692
#> 11 1962 124 4104. 55 [Wisconsin] 3 [March] 4 1441. 0
#> 12 1962 125 2182. 55 [Wisconsin] 3 [March] 1 2182. 4470
#> 13 1962 126 1826. 55 [Wisconsin] 3 [March] 1 1826. 99999999 [N.I.U.…
#> 14 1962 126 1826. 55 [Wisconsin] 3 [March] 2 1629. 0
#> 15 1962 761 1751. 19 [Iowa] 3 [March] 1 1751. 7300
#> 16 1962 761 1751. 19 [Iowa] 3 [March] 2 1751. 3700
#> 17 1962 762 1874. 19 [Iowa] 3 [March] 1 1874. 2534
#> 18 1962 762 1874. 19 [Iowa] 3 [March] 2 1874. 0
#> 19 1962 763 1874. 19 [Iowa] 3 [March] 1 1874. 1591
#> 20 1962 764 1724. 19 [Iowa] 3 [March] 1 1724. 8002
# See what row we're on now
long_yield$cur_pos
#> [1] 31
# Reset to beginning
long_yield$reset()
# Read the whole thing in chunks and count Minnesotans
total_mn <- 0
while (!long_yield$is_done()) {
  cur_data <- long_yield$yield(1000)
  total_mn <- total_mn + sum(as_factor(cur_data$STATEFIP) == "Minnesota")
}
total_mn
#> [1] 2362
# Can also read hierarchical data as list:
list_yield <- read_ipums_micro_list_yield(ipums_example("cps_00006.xml"))
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
#> Assuming data rectangularized to 'P' record type
list_yield$yield(10)
#> $P
#> # A tibble: 10 × 8
#> YEAR SERIAL HWTSUPP STATEFIP MONTH PERNUM WTSUPP INCTOT
#> <dbl> <dbl> <dbl> <int+lbl> <int+lbl> <dbl> <dbl> <dbl+lbl>
#> 1 1962 80 1476. 55 [Wisconsin] 3 [March] 1 1476. 4883
#> 2 1962 80 1476. 55 [Wisconsin] 3 [March] 2 1471. 5800
#> 3 1962 80 1476. 55 [Wisconsin] 3 [March] 3 1579. 99999998 [Missin…
#> 4 1962 82 1598. 27 [Minnesota] 3 [March] 1 1598. 14015
#> 5 1962 83 1707. 27 [Minnesota] 3 [March] 1 1707. 16552
#> 6 1962 84 1790. 27 [Minnesota] 3 [March] 1 1790. 6375
#> 7 1962 107 4355. 19 [Iowa] 3 [March] 1 4355. 99999999 [N.I.U.…
#> 8 1962 107 4355. 19 [Iowa] 3 [March] 2 1386. 0
#> 9 1962 107 4355. 19 [Iowa] 3 [March] 3 1629. 600
#> 10 1962 107 4355. 19 [Iowa] 3 [March] 4 1432. 99999999 [N.I.U.…
#>
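The description mentions reading parts of multiple files at the same time, which the examples above do not show. A minimal sketch of that pattern follows, assuming ipumsr and its bundled cps_00006 example extract are installed. For illustration only, both readers point at the same example file; in practice each would be given its own DDI. The chunk sizes are arbitrary.

```r
library(ipumsr)

# Two independent readers. Both use the bundled example extract here
# (an assumption for illustration; normally these would be different files).
ddi_path <- ipums_example("cps_00006.xml")
yield_a <- read_ipums_micro_yield(ddi_path, vars = c("YEAR", "STATEFIP"), verbose = FALSE)
yield_b <- read_ipums_micro_yield(ddi_path, vars = c("YEAR", "INCTOT"), verbose = FALSE)

rows_a <- 0
rows_b <- 0

# Alternate between the readers until both are exhausted
while (!yield_a$is_done() || !yield_b$is_done()) {
  chunk_a <- yield_a$yield(500)
  chunk_b <- yield_b$yield(1000)
  # A reader that finishes early returns NULL on later calls
  if (!is.null(chunk_a)) rows_a <- rows_a + nrow(chunk_a)
  if (!is.null(chunk_b)) rows_b <- rows_b + nrow(chunk_b)
}
```

Each reader keeps its own position, so the two can advance at different rates without interfering with one another.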