maestro 0.2.0

New tags for specifying hours, days, and months for pipelines

R
data pipelines
orchestration
maestro
packages
release
Author: Will Hipson

Published: August 27, 2024

Modified: August 27, 2024

maestro 0.2.0 adds new features for scheduling pipelines on specific hours, days, or months. A few new tags make scheduling much more versatile.

If you haven’t heard of maestro, it’s a package that helps you schedule your R scripts all in a single project using tags. You can learn more about it here.

Get it from CRAN:

install.packages("maestro")

New maestroFrequency Syntax

The maestroFrequency tag used to be restrictive in how you specified your pipeline frequency, only accepting it in the form [n] [units], like 1 day, 4 hours, 2 weeks, etc. A more human-readable adverb option is now available. You can specify it as one of hourly, daily, weekly, biweekly, monthly, quarterly, and yearly. Each of these corresponds to a fixed interval, in most cases 1 [unit]. So, hourly = 1 hour.

#' Example hourly pipeline
#' @maestroFrequency hourly
my_hourly_job <- function() {
  # job code ...
}
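
To make the adverb-to-interval idea concrete, here is a minimal base R sketch of such a mapping. It is not maestro’s internal code: the lookup table adverb_intervals and the helper adverb_to_interval are hypothetical names, and the values for biweekly (2 weeks) and quarterly (3 months) are assumptions about how those adverbs are interpreted.

```r
# Hypothetical lookup table for illustration; not maestro's internal code.
adverb_intervals <- c(
  hourly    = "1 hour",
  daily     = "1 day",
  weekly    = "1 week",
  biweekly  = "2 weeks",   # assumption: every two weeks
  monthly   = "1 month",
  quarterly = "3 months",  # assumption: every three months
  yearly    = "1 year"
)

# Translate an adverb into "[n] [units]"; anything else is returned unchanged.
adverb_to_interval <- function(frequency) {
  if (frequency %in% names(adverb_intervals)) {
    unname(adverb_intervals[frequency])
  } else {
    frequency
  }
}

adverb_to_interval("hourly")
adverb_to_interval("4 hours")
```

The first call returns "1 hour"; the second passes "4 hours" through untouched, mirroring the fact that the older [n] [units] form is still accepted.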

You can also use this syntax in run_schedule:

library(maestro)

run_schedule(
  example_schedule,
  orch_frequency = "hourly"
)

These frequencies matter not only for readability; they also combine with the hour, day, and month specifiers for more bespoke scheduling.

New Hours, Days, Months Specifiers

Until now, you could only run pipelines at regular intervals. With 0.2.0, you can specify the particular hours, days, and months you want pipelines to run. This is useful if you want a job to run only during business hours or just on weekends. Three new tags, maestroHours, maestroDays, and maestroMonths, are available.

Specific Hours

Let’s say I have a pipeline I want to run during regular 9am-5pm business hours. I can use a maestroFrequency of hourly and specify the hours 9am through to 5pm:

#' Example work hours pipeline
#' @maestroFrequency hourly
#' @maestroHours 9 10 11 12 13 14 15 16 17
my_work_hours_job <- function() {
  # job code ...
}

These hours are interpreted as UTC by default. If I want them to follow the timezone where I live, I can add the tag maestroTz America/Halifax.
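
As a sketch of what that looks like, the pipeline below adds the maestroTz tag to the work hours example (the function name my_halifax_hours_job is just for illustration). The two base R lines after it are not part of maestro; they only show which UTC time 9am in Halifax corresponds to on one particular summer date.

```r
#' Example work hours pipeline in Halifax time
#' @maestroFrequency hourly
#' @maestroHours 9 10 11 12 13 14 15 16 17
#' @maestroTz America/Halifax
my_halifax_hours_job <- function() {
  # job code ...
}

# Base R check: express 9am Halifax time on 2024-08-27 in UTC
nine_am_halifax <- as.POSIXct("2024-08-27 09:00:00", tz = "America/Halifax")
format(nine_am_halifax, "%H:%M", tz = "UTC")
```

On that date Halifax is on daylight time, so the result is "12:00". The offset shifts by an hour in winter, which is one reason to set the timezone rather than hard-code UTC hours.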

Specific Days

We can specify either days of the week or days of the month. Days of the week use abbreviated weekday names like Mon, Wed, Sat, whereas days of the month use integers like 1, 10, 15.

Taking the above example further, let’s have it run during business hours on weekdays:

#' Example work hours pipeline
#' @maestroFrequency hourly
#' @maestroHours 9 10 11 12 13 14 15 16 17
#' @maestroDays Mon Tue Wed Thu Fri
my_work_hours_job2 <- function() {
  # job code ...
}
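
The days-of-month form works the same way. Here is a small sketch, assuming a daily base frequency and using the integers mentioned above; the function name my_mid_month_job is hypothetical.

```r
#' Example pipeline for the 1st, 10th, and 15th of each month
#' @maestroFrequency daily
#' @maestroDays 1 10 15
my_mid_month_job <- function() {
  # job code ...
}
```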

Specific Months

To specify the months, use the integers 1-12. Let’s imagine that our pipeline only runs in March, June, October, and December:

#' Example work hours pipeline
#' @maestroFrequency hourly
#' @maestroHours 9 10 11 12 13 14 15 16 17
#' @maestroDays Mon Tue Wed Thu Fri
#' @maestroMonths 3 6 10 12
my_work_hours_job3 <- function() {
  # job code ...
}

These specifiers must be used with -ly frequencies like hourly and daily. The unit of a specifier must also be at least as coarse as the base frequency. For example, we can’t use maestroHours on a pipeline with a daily frequency.
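
One way to picture how the three specifiers combine is as a filter over hourly candidate times. The base R sketch below is not maestro’s implementation; matches_specifiers is a hypothetical helper, and it uses numeric weekdays (0 = Sunday) instead of Mon/Tue abbreviations to stay independent of locale.

```r
# Hypothetical helper, not maestro's implementation: TRUE wherever a
# timestamp satisfies the hour, weekday, and month specifiers at once.
matches_specifiers <- function(times, hours, wdays, months) {
  lt <- as.POSIXlt(times, tz = "UTC")
  (lt$hour %in% hours) &
    (lt$wday %in% wdays) &       # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
    ((lt$mon + 1) %in% months)   # POSIXlt months are 0-based
}

# Every hour across the first eight days of June 2024 (UTC)
candidates <- seq(
  as.POSIXct("2024-06-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 24 * 8
)

keep <- matches_specifiers(
  candidates,
  hours = 9:17,
  wdays = 1:5,
  months = c(3, 6, 10, 12)
)

sum(keep)
head(candidates[keep], 3)
```

Of the 192 hourly candidates, 45 survive: nine hours on each of the five weekdays from Monday June 3 to Friday June 7. The opening weekend and the following Saturday are dropped.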

Improvements to suggest_orch_frequency

The function suggest_orch_frequency takes a schedule generated by build_schedule and suggests a likely optimal frequency for the orchestrator. Until recently, this function was pretty basic and just suggested twice the amount of the highest-frequency pipeline in the project. This didn’t work well if you had pipelines staggered on different hours.

Now suggest_orch_frequency looks for the smallest interval of time between pipelines. It doesn’t take the hours, days, or months specifiers into account, though.

library(maestro)

example_schedule[c("pipe_name", "frequency", "start_time")]
          pipe_name frequency          start_time
1              pipe 30 minute 1970-01-01 00:00:00
2        get_mtcars     1 day 2024-03-01 09:00:00
3              wait   3 month 1970-01-01 00:00:00
4               add   3 month 1970-01-01 00:00:00
5         something   3 month 1970-01-01 00:00:00
6         something     daily 1970-01-01 00:00:00
7       get_mtcars2     1 day 2024-03-01 09:00:00
8     specific_days     daily 1970-01-01 00:00:00
9    specific_days2     daily 1970-01-01 00:00:00
10   specific_hours    hourly 1970-01-01 00:00:00
11 specific_months1  biweekly 1970-01-01 00:00:00
12 specific_months2    weekly 1970-01-01 00:00:00
13 specific_months3   monthly 1970-01-01 00:00:00
14   specific_multi    hourly 1970-01-01 00:00:00
suggest_orch_frequency(example_schedule)
[1] "30 mins"

Note that suggest_orch_frequency assumes you want your pipelines to run exactly when they are scheduled. It won’t try to round to the nearest 15 minutes or anything like that.
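
To see why the smallest interval between pipelines matters for staggered schedules, here is a rough base R sketch of the idea. It is not the code behind suggest_orch_frequency; smallest_gap_secs is a hypothetical helper that expands each pipeline’s run times over a fixed horizon and takes the smallest gap between any two consecutive runs.

```r
# Hypothetical helper, not maestro's implementation.
# starts:       POSIXct start time of each pipeline
# every_secs:   interval of each pipeline in seconds
# horizon_secs: how far past the earliest start to expand run times
smallest_gap_secs <- function(starts, every_secs, horizon_secs) {
  starts_num <- as.numeric(starts)
  end <- min(starts_num) + horizon_secs
  stopifnot(all(starts_num <= end))  # every pipeline must start in the horizon

  # Expand each pipeline into its run times (seconds since the epoch)
  run_times <- unlist(Map(
    function(start, every) seq(start, end, by = every),
    starts_num, every_secs
  ))

  # Smallest gap between consecutive distinct run times
  min(diff(sort(unique(run_times))))
}

# An hourly pipeline on the hour, and a daily pipeline at 09:30
starts <- as.POSIXct(
  c("2024-03-01 00:00:00", "2024-03-01 09:30:00"),
  tz = "UTC"
)
smallest_gap_secs(starts, every_secs = c(3600, 86400), horizon_secs = 2 * 86400)
```

This returns 1800 seconds, or 30 minutes: the daily 09:30 run lands halfway between two hourly runs, so an orchestrator running only every hour would not line up with it.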

Check out the release notes for more details on what’s new in version 0.2.0. If you find any bugs or want to suggest new features and improvements, please add them here or reach out to me on LinkedIn.

Happy orchestrating!