Title: "Register In- and Outputs for Workflow Visualization"
Version: 0.1.0
Author: Philipp Thoss [aut, cre]
Maintainer: Philipp Thoss <ph.thoss@gmx.de>
Description: Provides tools for extracting and processing structured annotations from 'R' and 'Python' source files to facilitate workflow visualization. The package scans source files for special 'PUT' annotations that define nodes, connections, and metadata within a data processing workflow. These annotations can then be used to generate visual representations of data flows and processing steps across polyglot software environments. Builds on concepts from literate programming Knuth (1984) <doi:10.1093/comjnl/27.2.97> and utilizes directed acyclic graph (DAG) theory for workflow representation Foraita, Spallek, and Zeeb (2014) <doi:10.1007/978-0-387-09834-0_65>. Diagram generation powered by 'Mermaid' Sveidqvist (2014) https://mermaid.js.org/.
Language: en-US
License: MIT + file LICENSE
URL: https://pjt222.github.io/putior/, https://github.com/pjt222/putior
BugReports: https://github.com/pjt222/putior/issues
Depends: R (≥ 3.5.0)
Imports: tools
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown, clipr, uuid, pkgdown
Encoding: UTF-8
RoxygenNote: 7.3.2
Config/testthat/edition: 3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-06-18 10:49:27 UTC; phtho
Repository: CRAN
Date/Publication: 2025-06-19 14:50:02 UTC

Convert results list to data frame

Description

Convert results list to data frame

Usage

convert_results_to_df(results, include_line_numbers)

Arguments

results

List of annotation results

include_line_numbers

Whether line numbers are included

Value

Data frame
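
The conversion itself is not documented here. As a minimal sketch, assuming each result is a named list of scalar properties and that annotations may carry different sets of properties, the results could be combined as follows. The helper name `results_to_df_sketch` is hypothetical and is not part of putior.

```r
# Hypothetical sketch (not putior's implementation): bind named lists with
# differing fields into one data frame, filling missing properties with NA.
results_to_df_sketch <- function(results) {
  if (length(results) == 0) {
    return(data.frame())
  }
  # Union of all property names, in order of first appearance
  all_cols <- unique(unlist(lapply(results, names)))
  rows <- lapply(results, function(res) {
    vals <- lapply(all_cols, function(col) {
      if (is.null(res[[col]])) NA_character_ else as.character(res[[col]])
    })
    names(vals) <- all_cols
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

results <- list(
  list(file_name = "load.R", id = "load", output = "data.csv"),
  list(file_name = "clean.py", id = "clean", input = "data.csv")
)
df <- results_to_df_sketch(results)
```

Here `df` has one row per annotation and one column per property seen in any annotation.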


Create artifact nodes for data files

Description

Create artifact nodes for data files

Usage

create_artifact_nodes(workflow)

Arguments

workflow

Workflow data frame

Value

Data frame with artifact node definitions


Create empty result data frame

Description

Create empty result data frame

Usage

empty_result_df(include_line_numbers = FALSE)

Arguments

include_line_numbers

Whether to include line_number column

Value

Empty data frame with correct structure
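
A zero-row data frame with fixed columns can be sketched as below. The columns `file_name` and `file_type` are taken from the documented return value of `put()`; the helper name and the position of `line_number` are assumptions.

```r
# Hypothetical sketch: an empty result with the columns put() always returns.
empty_df_sketch <- function(include_line_numbers = FALSE) {
  cols <- list(file_name = character(0), file_type = character(0))
  if (include_line_numbers) {
    cols$line_number <- integer(0)  # column position is an assumption
  }
  as.data.frame(cols, stringsAsFactors = FALSE)
}

empty_plain <- empty_df_sketch()
empty_numbered <- empty_df_sketch(include_line_numbers = TRUE)
```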


Generate connections between nodes

Description

Generate connections between nodes

Usage

generate_connections(workflow, show_files = FALSE, show_artifacts = FALSE)

Arguments

workflow

Workflow data frame with combined script and artifact nodes

show_files

Whether to show file-based connections

show_artifacts

Whether artifacts are included in the workflow

Value

Character vector of connection definitions
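
One plausible way to derive connections, shown here only as a sketch, is to link node A to node B whenever a file listed in A's output also appears in B's input. The helper `connections_sketch` and the exact edge format are assumptions; `-->` is standard Mermaid flowchart arrow syntax.

```r
# Hypothetical sketch: derive "source --> target" edges from shared file names.
connections_sketch <- function(workflow) {
  edges <- character(0)
  for (i in seq_len(nrow(workflow))) {
    if (is.na(workflow$output[i])) next
    outputs <- trimws(strsplit(workflow$output[i], ",", fixed = TRUE)[[1]])
    for (j in seq_len(nrow(workflow))) {
      if (i == j || is.na(workflow$input[j])) next
      inputs <- trimws(strsplit(workflow$input[j], ",", fixed = TRUE)[[1]])
      # Connect i to j when at least one output of i is an input of j
      if (length(intersect(outputs, inputs)) > 0) {
        edges <- c(edges, paste0("    ", workflow$id[i], " --> ", workflow$id[j]))
      }
    }
  }
  edges
}

workflow <- data.frame(
  id = c("load", "clean", "report"),
  input = c(NA, "data.csv", "clean.csv"),
  output = c("data.csv", "clean.csv", "report.html"),
  stringsAsFactors = FALSE
)
edges <- connections_sketch(workflow)
```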


Generate node definitions for mermaid diagram

Description

Generate node definitions for mermaid diagram

Usage

generate_node_definitions(
  workflow,
  node_labels = "label",
  show_workflow_boundaries = TRUE
)

Arguments

workflow

Workflow data frame

node_labels

What to show in node labels: "name" (node IDs), "label" (descriptions), or "both" (ID: label)

show_workflow_boundaries

Whether to apply special styling to start/end nodes

Value

Character vector of node definitions
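
The effect of node_labels can be illustrated with a small sketch. The three option values come from the put_diagram documentation; the helper name, the rectangular shape, and the quoting of labels are assumptions made for illustration.

```r
# Hypothetical sketch: build Mermaid node definitions of the form id["text"].
node_definitions_sketch <- function(workflow, node_labels = "label") {
  text <- switch(
    node_labels,
    name = workflow$id,
    label = workflow$label,
    both = paste0(workflow$id, ": ", workflow$label),
    stop("node_labels must be 'name', 'label', or 'both'")
  )
  # Quote the text so characters such as ':' are treated literally
  paste0("    ", workflow$id, "[\"", text, "\"]")
}

wf <- data.frame(
  id = c("load", "clean"),
  label = c("Load Dataset", "Clean Data"),
  stringsAsFactors = FALSE
)
defs_label <- node_definitions_sketch(wf)
defs_both <- node_definitions_sketch(wf, node_labels = "both")
```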


Generate node styling based on node types and theme

Description

Generate node styling based on node types and theme

Usage

generate_node_styling(
  workflow,
  theme = "light",
  show_workflow_boundaries = TRUE
)

Arguments

workflow

Workflow data frame

theme

Color theme ("light", "dark", "auto", "minimal", "github")

show_workflow_boundaries

Whether to apply special styling to start/end nodes

Value

Character vector of styling definitions
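
Styling definitions in Mermaid are typically expressed with `classDef` and `class` statements. The sketch below assumes a color list keyed by node type; the helper name, the class naming scheme, and the color values are hypothetical and do not reflect putior's actual themes.

```r
# Hypothetical sketch: emit one classDef per node type and assign nodes to it.
styling_sketch <- function(workflow, colors) {
  lines <- character(0)
  for (type in unique(workflow$node_type)) {
    if (is.na(type) || is.null(colors[[type]])) next
    col <- colors[[type]]
    ids <- workflow$id[!is.na(workflow$node_type) & workflow$node_type == type]
    lines <- c(
      lines,
      sprintf("    classDef %sStyle fill:%s,stroke:%s", type, col$fill, col$stroke),
      sprintf("    class %s %sStyle", paste(ids, collapse = ","), type)
    )
  }
  lines
}

# Illustrative colors only
colors <- list(
  input = list(fill = "#e1f5fe", stroke = "#01579b"),
  process = list(fill = "#f3e5f5", stroke = "#4a148c")
)
wf <- data.frame(
  id = c("load", "clean", "fit"),
  node_type = c("input", "process", "process"),
  stringsAsFactors = FALSE
)
styles <- styling_sketch(wf, colors)
```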


Get available themes for put_diagram

Description

Returns information about available color themes for workflow diagrams.

Usage

get_diagram_themes()

Value

Named list describing available themes

Examples

# See available themes
get_diagram_themes()

## Not run: 
# Use a specific theme (requires actual workflow data)
workflow <- put("./src")
put_diagram(workflow, theme = "github")

## End(Not run)

Get node shape characters based on node type

Description

Get node shape characters based on node type

Usage

get_node_shape(node_type, show_workflow_boundaries = TRUE)

Arguments

node_type

Node type string

show_workflow_boundaries

Whether to apply special workflow boundary styling

Value

Character vector with opening and closing shape characters
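
Mermaid draws different node shapes depending on the bracket pair around the label, for example `[text]` for a rectangle, `([text])` for a stadium, and `{text}` for a rhombus. The mapping from node types to shapes below is an assumption chosen for illustration and may differ from what putior uses.

```r
# Hypothetical sketch: map a node type to Mermaid opening/closing brackets.
shape_sketch <- function(node_type) {
  switch(
    node_type,
    input = c("([", "])"),    # stadium
    output = c("[[", "]]"),   # subroutine
    decision = c("{", "}"),   # rhombus
    c("[", "]")               # default: rectangle
  )
}

shape <- shape_sketch("input")
node <- paste0("load", shape[1], "Load Dataset", shape[2])
```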


Get color schemes for different themes

Description

Get color schemes for different themes

Usage

get_theme_colors(theme)

Arguments

theme

Theme name ("light", "dark", "auto", "minimal", or "github")

Value

Named list of color definitions for each node type


Handle diagram output to different destinations

Description

Handle diagram output to different destinations

Usage

handle_output(mermaid_code, output = "console", file = NULL, title = NULL)

Arguments

mermaid_code

Generated mermaid code

output

Output format: "console", "file", "clipboard", or "raw"

file

File path for file output

title

Diagram title
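
The dispatch over output destinations might look roughly like the following sketch. It covers only "console", "file", and "raw"; the clipboard branch is omitted because it depends on an optional package. The helper name and the exact fencing are assumptions.

```r
# Hypothetical sketch: send Mermaid code to the console or a file, or return
# it unfenced for "raw".
output_sketch <- function(mermaid_code, output = "console", file = NULL) {
  if (output == "raw") {
    return(mermaid_code)  # no markdown fences
  }
  fence <- strrep("`", 3)  # markdown code fence, built programmatically
  fenced <- paste0(fence, "mermaid\n", mermaid_code, "\n", fence)
  switch(
    output,
    console = cat(fenced, "\n", sep = ""),
    file = writeLines(fenced, file),
    stop("Unsupported output: ", output)
  )
  invisible(fenced)
}

code <- "flowchart TD\n    a --> b"
out_file <- tempfile(fileext = ".md")
output_sketch(code, output = "file", file = out_file)
out_lines <- readLines(out_file)
raw_code <- output_sketch(code, output = "raw")
```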


Validate PUT annotation syntax

Description

Test helper function to validate PUT annotation syntax

Usage

is_valid_put_annotation(line)

Arguments

line

Character string containing a PUT annotation

Value

Logical indicating if the annotation is valid

Examples

is_valid_put_annotation('#put name:"test", label:"Test"') # TRUE
is_valid_put_annotation("#put invalid syntax") # FALSE

Parse comma-separated pairs while respecting quotes

Description

Parse comma-separated pairs while respecting quotes

Usage

parse_comma_separated_pairs(text)

Arguments

text

Text to parse

Value

Character vector of pairs
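
Splitting on commas while respecting quotes matters because a value such as input:"a.csv,b.csv" contains commas of its own. A minimal sketch of the technique, assuming only double quotes delimit values and using a hypothetical helper name:

```r
# Hypothetical sketch: split on commas that are outside double quotes.
split_pairs_sketch <- function(text) {
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  pairs <- character(0)
  current <- ""
  in_quotes <- FALSE
  for (ch in chars) {
    if (ch == "\"") {
      in_quotes <- !in_quotes          # toggle state at every double quote
      current <- paste0(current, ch)
    } else if (ch == "," && !in_quotes) {
      pairs <- c(pairs, trimws(current))  # a top-level comma ends the pair
      current <- ""
    } else {
      current <- paste0(current, ch)
    }
  }
  if (nzchar(trimws(current))) {
    pairs <- c(pairs, trimws(current))
  }
  pairs
}

pairs <- split_pairs_sketch('id:"merge", input:"a.csv,b.csv", output:"out.csv"')
```

The comma inside the quoted input value is kept, so `pairs` holds three elements rather than four.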


Extract PUT Annotation Properties

Description

Parses a single line containing a PUT annotation and extracts key-value pairs. Supports flexible syntax with optional spaces and pipe separators.

Usage

parse_put_annotation(line)

Arguments

line

Character string containing a PUT annotation

Value

Named list containing all extracted properties, or NULL if invalid
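
The parsing step can be approximated with regular expressions. This sketch assumes the prefixes shown elsewhere in this manual (#put, # put, #put|, #put:) and key:"value" properties with double-quoted values; the helper name is hypothetical and the real parser may accept more variants.

```r
# Hypothetical sketch: extract key:"value" properties from one annotation line.
parse_annotation_sketch <- function(line) {
  prefix <- "^\\s*#\\s*put(\\||:|\\s)\\s*"
  if (!grepl(prefix, line)) {
    return(NULL)  # not a PUT annotation
  }
  body <- sub(prefix, "", line)
  # Each property looks like key:"value"; values may contain commas
  pattern <- "[A-Za-z_][A-Za-z0-9_]*\\s*:\\s*\"[^\"]*\""
  matches <- regmatches(body, gregexpr(pattern, body))[[1]]
  if (length(matches) == 0) {
    return(NULL)  # prefix present but no valid properties
  }
  values <- sub("^[^\"]*\"([^\"]*)\"$", "\\1", matches)
  names(values) <- trimws(sub(":.*$", "", matches))
  as.list(values)
}

props <- parse_annotation_sketch('#put| id:"merge", input:"a.csv,b.csv", output:"out.csv"')
invalid <- parse_annotation_sketch("#put invalid syntax")
```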


Process a single file for PUT annotations

Description

Process a single file for PUT annotations

Usage

process_single_file(file, include_line_numbers, validate)

Arguments

file

Path to file

include_line_numbers

Whether to include line numbers

validate

Whether to validate annotations

Value

List of annotation results or error message


Scan R and Python Files for PUT Annotations

Description

Scans source files in a directory for PUT annotations that define workflow nodes, inputs, outputs, and metadata. The default pattern matches R, Python, SQL, shell, and Julia files. Annotations can be written in single-line or multiline format.

Usage

put(
  path,
  pattern = "\\.(R|r|py|sql|sh|jl)$",
  recursive = FALSE,
  include_line_numbers = FALSE,
  validate = TRUE
)

Arguments

path

Character string specifying the path to the folder containing files, or path to a single file

pattern

Character string specifying the file pattern to match. Default: "\\.(R|r|py|sql|sh|jl)$" (R, Python, SQL, shell, Julia files)

recursive

Logical. Should subdirectories be searched recursively? Default: FALSE

include_line_numbers

Logical. Should line numbers be included in output? Default: FALSE

validate

Logical. Should annotations be validated for common issues? Default: TRUE

Value

A data frame containing file names and all properties found in annotations. Always includes columns: file_name, file_type, and any properties found in PUT annotations (typically: id, label, node_type, input, output). If include_line_numbers is TRUE, also includes line_number. Note: If output is not specified in an annotation, it defaults to the file name.

PUT Annotation Syntax

PUT annotations can be written in single-line or multiline format:

Single-line format: All parameters on one line

#put id:"node1", label:"Process Data", input:"data.csv", output:"result.csv"

Multiline format: Use backslash (\) for line continuation

#put id:"node1", label:"Process Data", \
#    input:"data.csv", \
#    output:"result.csv"
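
The backslash continuation shown above can be resolved before parsing by joining continued lines into one. The sketch below is one way to do that, assuming each continuation line starts with a # comment marker; the helper name is hypothetical.

```r
# Hypothetical sketch: merge lines ending in a backslash with the next line.
join_continuations_sketch <- function(lines) {
  joined <- character(0)
  buffer <- NULL
  for (line in lines) {
    if (!is.null(buffer)) {
      # Continuation line: drop its leading comment marker, then append
      line <- paste(buffer, sub("^\\s*#\\s*", "", line))
      buffer <- NULL
    }
    if (grepl("\\\\\\s*$", line)) {
      # Trailing backslash: strip it and hold the text for the next line
      buffer <- trimws(sub("\\\\\\s*$", "", line))
    } else {
      joined <- c(joined, line)
    }
  }
  if (!is.null(buffer)) {
    joined <- c(joined, buffer)  # dangling continuation at end of input
  }
  joined
}

lines <- c(
  '#put id:"node1", label:"Process Data", \\',
  '#    input:"data.csv", \\',
  '#    output:"result.csv"',
  'x <- 1'
)
joined <- join_continuations_sketch(lines)
```

After joining, the three annotation lines become the single-line form shown earlier, and the ordinary code line is left unchanged.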

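Benefits of multiline format:

  • Long annotations, such as nodes with many input files, stay within line-length limits and comply with code style guidelines

Syntax rules:

  • End a line with a backslash (\) to continue on the next line

  • Start each continuation line with the # comment marker

  • All PUT formats (#put, # put, #put|, #put:) support multiline syntax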
Examples

## Not run: 
# Scan a directory for workflow annotations
workflow <- put("./src/")

# Scan recursively including subdirectories
workflow <- put("./project/", recursive = TRUE)

# Scan a single file
workflow <- put("./script.R")

# Include line numbers for debugging
workflow <- put("./src/", include_line_numbers = TRUE)

# Single-line PUT annotations (basic syntax):
# #put id:"load_data", label:"Load Dataset", node_type:"input", output:"data.csv"
# #put id:"process", label:"Clean Data", node_type:"process", input:"data.csv", output:"clean.csv"
#
# Multiline PUT annotations (for better code style compliance):
# Use backslash (\) at end of line to continue on next line
# #put id:"complex_process", label:"Complex Data Processing", \
# #    input:"file1.csv,file2.csv,file3.csv,file4.csv", \
# #    output:"results.csv"
#
# Multiline example with many files:
# #put id:"data_merger", \
# #    label:"Merge Multiple Data Sources", \
# #    node_type:"process", \
# #    input:"sales.csv,customers.csv,products.csv,inventory.csv", \
# #    output:"merged_dataset.csv"
#
# All PUT formats support multiline syntax:
# # put id:"style1", label:"Standard" \     # Space after #
# #put| id:"style2", label:"Pipe" \        # Pipe separator
# #put: id:"style3", label:"Colon" \       # Colon separator

## End(Not run)

Create Mermaid Diagram from PUT Workflow

Description

Generates a Mermaid flowchart diagram from putior workflow data, showing the flow of data through your analysis pipeline.

Usage

put_diagram(
  workflow,
  output = "console",
  file = "workflow_diagram.md",
  title = NULL,
  direction = "TD",
  node_labels = "label",
  show_files = FALSE,
  show_artifacts = FALSE,
  show_workflow_boundaries = TRUE,
  style_nodes = TRUE,
  theme = "light"
)

Arguments

workflow

Data frame returned by put() containing workflow nodes

output

Character string specifying output format. Options:

  • "console" - Print to console (default)

  • "file" - Save to file specified by file parameter

  • "clipboard" - Copy to clipboard (if available)

  • "raw" - Return raw mermaid code without markdown fences (for knitr/pkgdown)

file

Character string specifying output file path (used when output = "file")

title

Character string for diagram title (optional)

direction

Character string specifying diagram direction. Options: "TD" (top-down), "LR" (left-right), "BT" (bottom-top), "RL" (right-left)

node_labels

Character string specifying what to show in nodes: "name" (node IDs), "label" (descriptions), "both" (ID: label)

show_files

Logical indicating whether to show file connections

show_artifacts

Logical indicating whether to show data files as nodes. When TRUE, creates nodes for all input/output files, not just script connections. This provides a complete view of the data flow including terminal outputs.

show_workflow_boundaries

Logical indicating whether to apply special styling to nodes with node_type "start" and "end". When TRUE, these nodes get distinctive workflow boundary styling (icons, colors). When FALSE, they render as regular nodes.

style_nodes

Logical indicating whether to apply styling based on node_type

theme

Character string specifying color theme. Options: "light" (default), "dark", "auto" (GitHub adaptive), "minimal", "github"

Value

Character string containing the mermaid diagram code

Examples

## Not run: 
# Basic usage - shows only script connections
workflow <- put("./src/")
put_diagram(workflow)

# Show all data artifacts as nodes (complete data flow)
put_diagram(workflow, show_artifacts = TRUE)

# Show artifacts with file labels on connections
put_diagram(workflow, show_artifacts = TRUE, show_files = TRUE)

# Show workflow boundaries with special start/end styling
put_diagram(workflow, show_workflow_boundaries = TRUE)

# Disable workflow boundaries (start/end nodes render as regular)
put_diagram(workflow, show_workflow_boundaries = FALSE)

# GitHub-optimized theme for README files
put_diagram(workflow, theme = "github")

# Save to file with artifacts enabled
put_diagram(workflow, show_artifacts = TRUE, output = "file", file = "workflow.md")

# For use in knitr/pkgdown - returns raw mermaid code
# Use within a code chunk with results='asis'
cat("```mermaid\n", put_diagram(workflow, output = "raw"), "\n```\n")

## End(Not run)
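
The examples above assume an existing ./src/ directory. A self-contained variant can be sketched by writing two annotated scripts to a temporary directory first. The file names and annotation contents are made up for illustration, and the putior calls run only if the package is installed.

```r
# Build a throwaway project with one R and one Python script
src_dir <- file.path(tempdir(), "putior_demo")
dir.create(src_dir, showWarnings = FALSE)
writeLines(
  c(
    '#put id:"load_data", label:"Load Dataset", node_type:"input", output:"data.csv"',
    "data <- data.frame(x = 1:3)"
  ),
  file.path(src_dir, "01_load.R")
)
writeLines(
  c(
    '#put id:"process", label:"Clean Data", node_type:"process", input:"data.csv", output:"clean.csv"',
    'print("cleaning")'
  ),
  file.path(src_dir, "02_clean.py")
)

# Scan the directory and print a left-to-right diagram (skipped without putior)
if (requireNamespace("putior", quietly = TRUE)) {
  workflow <- putior::put(src_dir)
  putior::put_diagram(workflow, direction = "LR")
}
```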

Sanitize node ID for mermaid compatibility

Description

Sanitize node ID for mermaid compatibility

Usage

sanitize_node_id(node_id)

Arguments

node_id

Raw node identifier

Value

Sanitized identifier safe for mermaid
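
The exact sanitization rules are not listed here. A common approach, sketched below with a hypothetical helper, replaces every character other than letters, digits, and underscores; the prefix for IDs starting with a digit is an extra assumed precaution.

```r
# Hypothetical sketch: make an identifier safe to use as a Mermaid node ID.
sanitize_id_sketch <- function(node_id) {
  id <- gsub("[^A-Za-z0-9_]", "_", node_id)
  # Assumed precaution: avoid IDs that begin with a digit
  if (grepl("^[0-9]", id)) {
    id <- paste0("node_", id)
  }
  id
}

clean_id <- sanitize_id_sketch("clean-data.v2")
numbered_id <- sanitize_id_sketch("01 load")
```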


Split comma-separated file list

Description

Split comma-separated file list

Usage

split_file_list(file_string)

Arguments

file_string

Comma-separated file names

Value

Character vector of individual file names
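
Splitting a file list is a small operation, but whitespace and empty entries need care. A minimal sketch with a hypothetical helper name, assuming file names themselves contain no commas:

```r
# Hypothetical sketch: split "a.csv, b.csv" into trimmed, non-empty names.
split_files_sketch <- function(file_string) {
  if (is.null(file_string) || is.na(file_string) || !nzchar(trimws(file_string))) {
    return(character(0))
  }
  parts <- trimws(strsplit(file_string, ",", fixed = TRUE)[[1]])
  parts[nzchar(parts)]  # drop empty entries from stray commas
}

files <- split_files_sketch("sales.csv, customers.csv,products.csv")
```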


Validate PUT annotation for common issues

Description

Validate PUT annotation for common issues

Usage

validate_annotation(properties, line_content)

Arguments

properties

List of annotation properties

line_content

Original line content

Value

Character vector of validation issues