This function creates a data profiling report.

Usage

create_report(
  data,
  output_format = html_document(toc = TRUE, toc_depth = 6, theme = "yeti"),
  output_file = "report.html",
  output_dir = getwd(),
  y = NULL,
  config = configure_report(),
  report_title = "Data Profiling Report",
  ...
)

Arguments

data

input data

output_format

output format in render. Default is html_document(toc = TRUE, toc_depth = 6, theme = "yeti").

output_file

output file name in render. Default is "report.html".

output_dir

output directory for the report, passed to render. Default is the current working directory, getwd().

y

name of response variable if any. Response variables will be passed to appropriate plotting functions automatically.

config

report configuration generated by configure_report.

report_title

report title. Default is "Data Profiling Report".

...

other arguments to be passed to render.

Details

config is a named list to be evaluated by create_report. Each name should exactly match a function name; that function and its corresponding content are then added to the report. To exclude a function or its content, omit it from config.

configure_report generates the default template. You may customize the content using that function.

All function arguments will be passed to do.call as a list.
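
The dispatch described above can be pictured with a minimal base R sketch. The names summarise_rows, summarise_cols, and run_config are hypothetical stand-ins, not DataExplorer functions, and create_report may assemble its arguments differently; the sketch only shows how a named list maps function names to argument lists that are handed to do.call.

```r
# Hypothetical stand-ins for report functions (not part of DataExplorer)
summarise_rows <- function(data) nrow(data)
summarise_cols <- function(data, numeric_only = FALSE) {
  if (numeric_only) sum(vapply(data, is.numeric, logical(1))) else ncol(data)
}

# Each name matches a function name; each value is that function's argument list
config <- list(
  "summarise_rows" = list(),
  "summarise_cols" = list(numeric_only = TRUE)
)

# Hypothetical helper: call every configured function via do.call
run_config <- function(data, config) {
  out <- lapply(names(config), function(fn_name) {
    # Prepend the data, then pass the whole argument list to do.call
    do.call(fn_name, c(list(data = data), config[[fn_name]]))
  })
  setNames(out, names(config))
}

results <- run_config(iris, config)
```

Leaving an entry out of the list means the corresponding function is never called, which mirrors how omitting a name from config drops that content from the report.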

Note

If both y and plot_prcomp are present, y will be removed from plot_prcomp.

If there are multiple options for the same function, all of them will be plotted. For example, create_report(..., y = "a", config = list("plot_bar" = list("with" = "b"))) will create 3 bar charts:

  • regular frequency bar chart

  • bar chart aggregated by response variable "a"

  • bar chart aggregated by `with` variable "b"

See also

configure_report

Examples

if (FALSE) {
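# Load DataExplorer, which provides create_report and configure_report
library(DataExplorer)

# Create reports with default settings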
create_report(iris)
create_report(airquality, y = "Ozone")

# Load libraries
library(ggplot2)
library(data.table)
library(rmarkdown)

# Set some missing values
diamonds2 <- data.table(diamonds)
for (j in 5:ncol(diamonds2)) {
  set(diamonds2,
      i = sample.int(nrow(diamonds2), sample.int(nrow(diamonds2), 1)),
      j,
      value = NA_integer_)
}

# Create customized report for diamonds2 dataset
create_report(
  data = diamonds2,
  output_format = html_document(toc = TRUE, toc_depth = 6, theme = "flatly"),
  output_file = "report.html",
  output_dir = getwd(),
  y = "price",
  config = configure_report(
    add_plot_prcomp = TRUE,
    plot_qq_args = list("by" = "cut", sampled_rows = 1000L),
    plot_bar_args = list("with" = "carat"),
    plot_correlation_args = list("cor_args" = list("use" = "pairwise.complete.obs")),
    plot_boxplot_args = list("by" = "cut"),
    global_ggtheme = quote(theme_light())
  )
)

## Configure report without `configure_report`
config <- list(
  "introduce" = list(),
  "plot_intro" = list(),
  "plot_str" = list(
    "type" = "diagonal",
    "fontSize" = 35,
    "width" = 1000,
    "margin" = list("left" = 350, "right" = 250)
  ),
  "plot_missing" = list(),
  "plot_histogram" = list(),
  "plot_density" = list(),
  "plot_qq" = list(sampled_rows = 1000L),
  "plot_bar" = list(),
  "plot_correlation" = list("cor_args" = list("use" = "pairwise.complete.obs")),
  "plot_prcomp" = list(),
  "plot_boxplot" = list(),
  "plot_scatterplot" = list(sampled_rows = 1000L)
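)

# Pass the hand-built list in place of configure_report()
create_report(diamonds2, config = config)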
}