Code
library(tidyverse)
library(plotly)
library(lubridate)
library(scales)
library(zoo)
library(rmarkdown)
library(purrr)
Brian Cervantes Alvarez
April 3, 2023
April 8, 2025
I automated the report generation process using parameterized R Markdown, which allowed me to update the year parameter dynamically and consistently produce reports without manual intervention. I developed custom themes and employed ggplotly to create interactive time series visualizations of monthly sales data, ensuring each report accurately reflected key trends and patterns. Additionally, I implemented a loop to render separate reports for multiple years, demonstrating efficiency and reproducibility in automated data reporting workflows.
This project was aimed at streamlining the report generation process by employing parameterized R Markdown. The core idea was to automate the update of report parameters, thus eliminating the manual task of adjusting values for daily or periodic reports. This innovation is designed to significantly reduce time and effort for data scientists, analysts, and engineers, facilitating the production of consistent and timely reports. The automation feature introduced through parameterized R Markdown enhances both productivity and accuracy, offering considerable benefits to business operations.
The process involved the following key steps:
tidyverse
for data manipulation, plotly
and ggplotly
for interactive visualizations, and rmarkdown
for rendering parameterized reports.ggplotly
to display monthly sales data, highlighting key trends and insights.ggplotly
for creating dynamic visualizations.myTheme
) ensures a uniform look across all visualizations.renderReport
function and subsequent loop for rendering reports underscore the automation aspect, showcasing the project’s capacity to produce multiple reports efficiently.This project exemplifies the power of automation in report generation using parameterized R Markdown. It offers a scalable solution for producing detailed and aesthetically consistent reports, providing valuable time savings and accuracy for businesses and their data teams.
Rows: 10,042
Columns: 9
$ ID <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ DocumentID <dbl> 716, 716, 716, 716, 716, 716, 716, 460, 461, 462, 463, 464,…
$ Date <date> 2019-09-23, 2019-09-23, 2019-09-23, 2019-09-23, 2019-09-23…
$ SKU <dbl> 1039, 853, 862, 868, 2313, 2355, 2529, 2361, 2723, 655, 254…
$ Price <dbl> 381.78, 593.22, 423.73, 201.70, 345.76, 406.78, 542.38, 139…
$ Discount <dbl> 67.37254, 0.00034, -0.00119, 35.58814, 61.01966, 101.69458,…
$ Customer <dbl> 1, 1, 1, 1, 1, 1, 1, 460, 479, 26, 580, 311, 311, 311, 311,…
$ Quantity <dbl> 1, 1, 1, 1, 1, 1, 1, 6, 1, 1, 1, 2, 2, 1, 1, 1, 4, 1, 1, 4,…
$ Month <date> 2019-09-01, 2019-09-01, 2019-09-01, 2019-09-01, 2019-09-01…
I utilized ggplotly, a graphical representation tool, to create an interactive visualization of monthly sales time series data for “CRM and Invoicing system,” which is a wholesale company owned by Sadi Evren. The data for this analysis was obtained from the following Kaggle dataset: https://www.kaggle.com/datasets/shedai/retail-data-set?select=file_out.csv.
The resulting plot provided an insightful representation of the monthly sales data, showcasing trends and patterns in the data that could potentially provide useful information for decision making in the business.
In addition to the initial plot, I implemented a for loop to automatically generate multiple reports based on the time series data for each year. This approach eliminated the need for manual report generation, thereby saving time and reducing the risk of errors. The loop enabled the automated generation of separate reports for each year, which provided a comprehensive view of the sales trends over time.
Overall, the use of ggplotly for data visualization and automation of report generation using a for loop demonstrated an effective approach for efficiently analyzing and presenting data.
p <- ds %>%
group_by(Month) %>%
summarize(AvgSales = round(mean(Price * Quantity),2) ) %>%
ggplot(aes(x = Month,
y = AvgSales,
group = 1, #Necessary or else line plot disappears
text = paste0("Monthly Sales: $", (round(AvgSales/1000,2)),"K" ))) +
geom_line(size = 1) +
scale_y_continuous(labels = scales::dollar_format(scale = .001, suffix = "K")) +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
labs(title = paste0("CRM and Invoicing System Sales For FY: ", params$year),
caption = "Source: https://www.kaggle.com/datasets/shedai/retail-data-set?select=file_out.csv",
x = NULL,
y = NULL) +
myTheme()
ggplotly(p, tooltip = c("text")) %>%
layout(hovermode = "x unified")
---
title: "Parameterized Reports"
author: "Brian Cervantes Alvarez"
date: "4-03-2023"
date-modified: today
image: /assets/images/report.jpeg
description: "Achieving peak productivity in data analysis with automated R Markdown reports, minimizing effort and errors."
bibliography: "bibliography.bib"
nocite: |
@*
params:
printcode: true
year: 2019
format:
html:
code-tools: true
code-fold: true
toc: true
toc-location: right
html-math-method: katex
page-layout: article
execute:
warning: false
message: false
categories: [R, Plotly, Reports, Time Series, Data Visualization]
ai-summary:
banner-title: "Yapper Labs | AI Summary"
model-title: "Model: ChatGPT o3-mini-high"
model-img: "/assets/images/OpenAI-white-monoblossom.svg"
summary: "I automated the report generation process using parameterized R Markdown, which allowed me to update the `year` parameter dynamically and consistently produce reports without manual intervention. I developed custom themes and employed ggplotly to create interactive time series visualizations of monthly sales data, ensuring each report accurately reflected key trends and patterns. Additionally, I implemented a loop to render separate reports for multiple years, demonstrating efficiency and reproducibility in automated data reporting workflows."
---

## Objective
This project was aimed at streamlining the report generation process by employing parameterized R Markdown. The core idea was to automate the update of report parameters, thus eliminating the manual task of adjusting values for daily or periodic reports. This innovation is designed to significantly reduce time and effort for data scientists, analysts, and engineers, facilitating the production of consistent and timely reports. The automation feature introduced through parameterized R Markdown enhances both productivity and accuracy, offering considerable benefits to business operations.
## Approach
The process involved the following key steps:
1. **Library Utilization:**
- Essential R libraries were loaded, including `tidyverse` for data manipulation, `plotly` and `ggplotly` for interactive visualizations, and `rmarkdown` for rendering parameterized reports.
2. **Data Preparation:**
- The dataset was loaded and processed. This included renaming columns, converting dates, and filtering data based on the year specified through parameters.
3. **Custom Theme Development:**
- A custom theme function was created to standardize the appearance of plots across all reports, ensuring a consistent and professional aesthetic.
4. **Data Visualization:**
- Interactive visualizations were crafted using `ggplotly` to display monthly sales data, highlighting key trends and insights.
5. **Automated Report Generation:**
- A loop was implemented to automate the generation of reports for each year, utilizing a function that leverages parameterized R Markdown to render individual reports dynamically.
### Highlights
- The project underscores the efficiency and scalability of automated report generation.
- The approach allows for the seamless creation of visually appealing and informative reports.
- Automation minimizes errors and saves considerable time, enhancing decision-making processes with timely and accurate data insights.
### Technical Notes
- The script included the loading of required libraries, data wrangling steps, and the use of `ggplotly` for creating dynamic visualizations.
- The custom theme function (`myTheme`) ensures a uniform look across all visualizations.
- The `renderReport` function and subsequent loop for rendering reports underscore the automation aspect, showcasing the project's capacity to produce multiple reports efficiently.
### Conclusion
This project exemplifies the power of automation in report generation using parameterized R Markdown. It offers a scalable solution for producing detailed and aesthetically consistent reports, providing valuable time savings and accuracy for businesses and their data teams.
## Required Libraries
```{r}
library(tidyverse)
library(plotly)
library(lubridate)
library(scales)
library(zoo)
library(rmarkdown)
library(purrr)
```
## Load Dataset & Wrangle
```{r}
ds <- read_csv("../../../assets/datasets/retail.csv")
#head(ds)
ds <- ds %>%
rename(ID = ...1) %>%
mutate(Month = lubridate::floor_date(Date, 'month')) %>%
filter(year(Month) == params$year)
glimpse(ds)
```
```{r}
#| include: false
myTheme <- function(){
font <- "SF Mono" #assign font family up front
theme_minimal() %+replace% #replace elements we want to change
theme(
#grid elements
panel.grid.major.x = element_blank(), #strip major gridlines
panel.grid.minor = element_blank(), #strip minor gridlines
axis.ticks = element_blank(), #strip axis ticks
#since theme_minimal() already strips axis lines,
#we don't need to do that again
#text elements
plot.title = element_text( #title
family = font, #set font family
size = 16, #set font size
face = 'bold', #bold typeface
hjust = 0, #left align
vjust = 2), #raise slightly
plot.margin = margin( #margins
r = 0.5, #right margin
l = 0.5, #left margin
t = 1, #top margin
b = 0.25, #bottom margin
unit = "cm"), #units
plot.subtitle = element_text( #subtitle
family = font, #font family
size = 12, #font size
hjust = 0,
vjust = -1),
plot.caption = element_text( #caption
family = font, #font family
size = 9), #font size
axis.title = element_text( #axis titles
family = font, #font family
size = 10), #font size
axis.text = element_text( #axis text
family = font, #axis famuly
size = 9), #font size
axis.text.x = element_text( #margin for axis text
margin = margin(t = 5, b = 20),
angle = 45)
#since the legend often requires manual tweaking
#based on plot content, don't define it here
)
}
```
## Visualize The Report
I utilized ggplotly, a graphical representation tool, to create an interactive visualization of monthly sales time series data for "CRM and Invoicing system," which is a wholesale company owned by Sadi Evren. The data for this analysis was obtained from the following Kaggle dataset: https://www.kaggle.com/datasets/shedai/retail-data-set?select=file_out.csv.
The resulting plot provided an insightful representation of the monthly sales data, showcasing trends and patterns in the data that could potentially provide useful information for decision making in the business.
In addition to the initial plot, I implemented a for loop to automatically generate multiple reports based on the time series data for each year. This approach eliminated the need for manual report generation, thereby saving time and reducing the risk of errors. The loop enabled the automated generation of separate reports for each year, which provided a comprehensive view of the sales trends over time.
Overall, the use of ggplotly for data visualization and automation of report generation using a for loop demonstrated an effective approach for efficiently analyzing and presenting data.
```{r}
p <- ds %>%
group_by(Month) %>%
summarize(AvgSales = round(mean(Price * Quantity),2) ) %>%
ggplot(aes(x = Month,
y = AvgSales,
group = 1, #Necessary or else line plot disappears
text = paste0("Monthly Sales: $", (round(AvgSales/1000,2)),"K" ))) +
geom_line(size = 1) +
scale_y_continuous(labels = scales::dollar_format(scale = .001, suffix = "K")) +
scale_x_date(date_breaks = "1 month", date_labels = "%B") +
labs(title = paste0("CRM and Invoicing System Sales For FY: ", params$year),
caption = "Source: https://www.kaggle.com/datasets/shedai/retail-data-set?select=file_out.csv",
x = NULL,
y = NULL) +
myTheme()
ggplotly(p, tooltip = c("text")) %>%
layout(hovermode = "x unified")
```
## Function To Run Parameterized Reports
```{r}
#| eval: FALSE
renderReport <- function(year) {
quarto::quarto_render(
input = "index.qmd",
output_file = paste0(year, '.html'),
execute_params = list(year = year)
)
}
```
## Render All Reports
```{r}
#| eval: FALSE
# Renders all 4 Reports (dates range from 2019-2022)
for (year in 2019:2022) {
renderReport(year)
}
```
## Data References
```{r, echo=FALSE}
knitr::write_bib(names(sessionInfo()$otherPkgs), file = "bibliography.bib")
```