Title: | "Register In- and Outputs for Workflow Visualization" |
Version: | 0.1.0 |
Author: | Philipp Thoss |
Maintainer: | Philipp Thoss <ph.thoss@gmx.de> |
Description: | Provides tools for extracting and processing structured annotations from 'R' and 'Python' source files to facilitate workflow visualization. The package scans source files for special 'PUT' annotations that define nodes, connections, and metadata within a data processing workflow. These annotations can then be used to generate visual representations of data flows and processing steps across polyglot software environments. Builds on concepts from literate programming Knuth (1984) <doi:10.1093/comjnl/27.2.97> and utilizes directed acyclic graph (DAG) theory for workflow representation Foraita, Spallek, and Zeeb (2014) <doi:10.1007/978-0-387-09834-0_65>. Diagram generation powered by 'Mermaid' Sveidqvist (2014) https://mermaid.js.org/. |
Language: | en-US |
License: | MIT + file LICENSE |
URL: | https://pjt222.github.io/putior/, https://github.com/pjt222/putior |
BugReports: | https://github.com/pjt222/putior/issues |
Depends: | R (≥ 3.5.0) |
Imports: | tools |
Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, clipr, uuid, pkgdown |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-06-18 10:49:27 UTC; phtho |
Repository: | CRAN |
Date/Publication: | 2025-06-19 14:50:02 UTC |
Convert results list to data frame
Description
Convert results list to data frame
Usage
convert_results_to_df(results, include_line_numbers)
Arguments
results |
List of annotation results |
include_line_numbers |
Whether line numbers are included |
Value
Data frame
Create artifact nodes for data files
Description
Create artifact nodes for data files
Usage
create_artifact_nodes(workflow)
Arguments
workflow |
Workflow data frame |
Value
Data frame with artifact node definitions
Create empty result data frame
Description
Create empty result data frame
Usage
empty_result_df(include_line_numbers = FALSE)
Arguments
include_line_numbers |
Whether to include line_number column |
Value
Empty data frame with correct structure
Generate connections between nodes
Description
Generate connections between nodes
Usage
generate_connections(workflow, show_files = FALSE, show_artifacts = FALSE)
Arguments
workflow |
Workflow data frame with combined script and artifact nodes |
show_files |
Whether to show file-based connections |
show_artifacts |
Whether artifacts are included in the workflow |
Value
Character vector of connection definitions
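The idea of linking nodes through shared files can be illustrated with a short base R sketch. It is not putior's implementation: the function name connect_nodes_sketch and the example data frame are hypothetical, and the only assumption taken from this manual is that the workflow has id, input, and output columns with comma-separated file names.

```r
# Hypothetical workflow data frame with the columns described for put().
workflow <- data.frame(
  id = c("load", "clean", "report"),
  input = c(NA, "raw.csv", "clean.csv"),
  output = c("raw.csv", "clean.csv", "report.html"),
  stringsAsFactors = FALSE
)

# Assumed rule: node B depends on node A when one of B's inputs
# appears among A's outputs.
connect_nodes_sketch <- function(workflow) {
  split_files <- function(x) {
    if (is.na(x) || !nzchar(x)) {
      character(0)
    } else {
      trimws(strsplit(x, ",", fixed = TRUE)[[1]])
    }
  }
  edges <- character(0)
  for (i in seq_len(nrow(workflow))) {
    for (needed in split_files(workflow$input[i])) {
      for (j in seq_len(nrow(workflow))) {
        if (i != j && needed %in% split_files(workflow$output[j])) {
          # Mermaid edge syntax: producer --> consumer
          edges <- c(edges, paste0(workflow$id[j], " --> ", workflow$id[i]))
        }
      }
    }
  }
  edges
}

edges <- connect_nodes_sketch(workflow)
print(edges)
```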
Generate node definitions for mermaid diagram
Description
Generate node definitions for mermaid diagram
Usage
generate_node_definitions(
workflow,
node_labels = "label",
show_workflow_boundaries = TRUE
)
Arguments
workflow |
Workflow data frame |
node_labels |
What to show in node labels |
show_workflow_boundaries |
Whether to apply special styling to start/end nodes |
Value
Character vector of node definitions
Generate node styling based on node types and theme
Description
Generate node styling based on node types and theme
Usage
generate_node_styling(
workflow,
theme = "light",
show_workflow_boundaries = TRUE
)
Arguments
workflow |
Workflow data frame |
theme |
Color theme ("light", "dark", "auto", "minimal", "github") |
show_workflow_boundaries |
Whether to apply special styling to start/end nodes |
Value
Character vector of styling definitions
Get available themes for put_diagram
Description
Returns information about available color themes for workflow diagrams.
Usage
get_diagram_themes()
Value
Named list describing available themes
Examples
# See available themes
get_diagram_themes()
## Not run:
# Use a specific theme (requires actual workflow data)
workflow <- put("./src")
put_diagram(workflow, theme = "github")
## End(Not run)
Get node shape characters based on node type
Description
Get node shape characters based on node type
Usage
get_node_shape(node_type, show_workflow_boundaries = TRUE)
Arguments
node_type |
Node type string |
show_workflow_boundaries |
Whether to apply special workflow boundary styling |
Value
Character vector with opening and closing shape characters
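A sketch of how a pair of shape characters might wrap a node label is given below. The mapping from node types to shapes is an assumption made for illustration and may differ from what get_node_shape() returns; the bracket pairs themselves are standard Mermaid flowchart syntax (rectangle, stadium, subroutine, rhombus).

```r
# Hypothetical mapping, not putior's: node type -> opening/closing characters.
node_shape_sketch <- function(node_type) {
  switch(
    node_type,
    input = c("([", "])"),    # stadium
    process = c("[", "]"),    # rectangle
    output = c("[[", "]]"),   # subroutine
    decision = c("{", "}"),   # rhombus
    c("[", "]")               # fallback for unknown types
  )
}

shape <- node_shape_sketch("input")
# Wrap a quoted label in the shape characters to form a node definition.
definition <- paste0("load_data", shape[1], "\"Load Dataset\"", shape[2])
print(definition)
```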
Get color schemes for different themes
Description
Get color schemes for different themes
Usage
get_theme_colors(theme)
Arguments
theme |
Theme name |
Value
Named list of color definitions for each node type
Handle diagram output to different destinations
Description
Handle diagram output to different destinations
Usage
handle_output(mermaid_code, output = "console", file = NULL, title = NULL)
Arguments
mermaid_code |
Generated mermaid code |
output |
Output format |
file |
File path for file output |
title |
Diagram title |
Validate PUT annotation syntax
Description
Test helper function to validate PUT annotation syntax
Usage
is_valid_put_annotation(line)
Arguments
line |
Character string containing a PUT annotation |
Value
Logical indicating if the annotation is valid
Examples
is_valid_put_annotation('#put name:"test", label:"Test"') # TRUE
is_valid_put_annotation("#put invalid syntax") # FALSE
Parse comma-separated pairs while respecting quotes
Description
Parse comma-separated pairs while respecting quotes
Usage
parse_comma_separated_pairs(text)
Arguments
text |
Text to parse |
Value
Character vector of pairs
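Quote-aware splitting matters because a value such as input:"a.csv,b.csv" contains commas that must not end the pair. The following base R sketch shows one way to do this; split_outside_quotes is a hypothetical name and the logic is an assumption, not the package's code.

```r
# Hypothetical helper: split on commas that sit outside double quotes.
split_outside_quotes <- function(text) {
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  pairs <- character(0)
  current <- ""
  in_quotes <- FALSE
  for (ch in chars) {
    if (ch == "\"") {
      in_quotes <- !in_quotes              # toggle on every double quote
      current <- paste0(current, ch)
    } else if (ch == "," && !in_quotes) {
      pairs <- c(pairs, trimws(current))   # a top-level comma ends the pair
      current <- ""
    } else {
      current <- paste0(current, ch)
    }
  }
  if (nzchar(trimws(current))) {
    pairs <- c(pairs, trimws(current))     # keep the final pair
  }
  pairs
}

pairs <- split_outside_quotes('id:"merge", input:"a.csv,b.csv", output:"out.csv"')
print(pairs)
```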
Extract PUT Annotation Properties
Description
Parses a single line containing a PUT annotation and extracts key-value pairs. Supports flexible syntax with optional spaces and pipe separators.
Usage
parse_put_annotation(line)
Arguments
line |
Character string containing a PUT annotation |
Value
Named list containing all extracted properties, or NULL if invalid
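To make the key-value extraction concrete, here is a minimal regex-based sketch in base R. It assumes the prefixes listed in this manual (#put, # put, #put|, #put:) and double-quoted values; extract_put_properties is a hypothetical name, and the real parser may accept or reject different inputs.

```r
# Hypothetical sketch, not putior's parser: extract key:"value" pairs.
extract_put_properties <- function(line) {
  # Optional spaces, then "put", then an optional pipe or colon separator.
  prefix <- "^\\s*#\\s*put\\s*[|:]?\\s*"
  if (!grepl(prefix, line)) {
    return(NULL)
  }
  body <- sub(prefix, "", line)
  matches <- regmatches(
    body,
    gregexpr("[A-Za-z_]+\\s*:\\s*\"[^\"]*\"", body)
  )[[1]]
  if (length(matches) == 0) {
    return(NULL)                            # no key:"value" pairs found
  }
  keys <- sub("\\s*:.*$", "", matches)      # text before the first colon
  values <- sub("^[^\"]*\"", "", matches)   # drop everything up to the opening quote
  values <- sub("\"$", "", values)          # drop the closing quote
  props <- as.list(values)
  names(props) <- keys
  props
}

props <- extract_put_properties('# put| id:"node1", label:"Process Data"')
str(props)
```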
Process a single file for PUT annotations
Description
Process a single file for PUT annotations
Usage
process_single_file(file, include_line_numbers, validate)
Arguments
file |
Path to file |
include_line_numbers |
Whether to include line numbers |
validate |
Whether to validate annotations |
Value
List of annotation results or error message
Scan R and Python Files for PUT Annotations
Description
Scans source files in a directory for PUT annotations that define workflow nodes, inputs, outputs, and metadata. The default pattern matches R, Python, SQL, shell, and Julia files, and annotations may be written in single-line or multiline format.
Usage
put(
path,
pattern = "\\.(R|r|py|sql|sh|jl)$",
recursive = FALSE,
include_line_numbers = FALSE,
validate = TRUE
)
Arguments
path |
Character string specifying the path to the folder containing files, or path to a single file |
pattern |
Character string specifying the file pattern to match. Default: "\\.(R|r|py|sql|sh|jl)$" (R, Python, SQL, shell, Julia files) |
recursive |
Logical. Should subdirectories be searched recursively? Default: FALSE |
include_line_numbers |
Logical. Should line numbers be included in output? Default: FALSE |
validate |
Logical. Should annotations be validated for common issues? Default: TRUE |
Value
A data frame containing file names and all properties found in annotations. Always includes columns: file_name, file_type, and any properties found in PUT annotations (typically: id, label, node_type, input, output). If include_line_numbers is TRUE, also includes line_number. Note: If output is not specified in an annotation, it defaults to the file name.
PUT Annotation Syntax
PUT annotations can be written in single-line or multiline format:
Single-line format: All parameters on one line
#put id:"node1", label:"Process Data", input:"data.csv", output:"result.csv"
Multiline format: Use backslash (\) for line continuation
#put id:"node1", label:"Process Data", \
# input:"data.csv", \
# output:"result.csv"
Benefits of multiline format:
Compliance with code style guidelines (styler, lintr)
Improved readability for complex workflows
Easier maintenance of long file lists
Better code organization and documentation
Syntax rules:
End lines with backslash (\) to continue
Each continuation line must start with # comment marker
Properties are automatically joined with proper comma separation
Works with all PUT formats: #put, # put, #put|, #put:
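The continuation rules above can be sketched in base R as follows. This is an illustrative assumption about how backslash-terminated comment lines might be merged, not the package's implementation; join_continuation_lines is a hypothetical name.

```r
# Hypothetical sketch: merge backslash-continued comment lines into one string.
join_continuation_lines <- function(lines) {
  joined <- character(0)
  buffer <- NULL
  for (line in lines) {
    continues <- grepl("\\\\\\s*$", line)       # line ends with a backslash
    piece <- sub("\\s*\\\\\\s*$", "", line)     # drop the trailing backslash
    if (!is.null(buffer)) {
      piece <- sub("^\\s*#\\s*", "", piece)     # drop the leading comment marker
      piece <- paste(buffer, piece)             # append to the pending annotation
    }
    if (continues) {
      buffer <- piece
    } else {
      joined <- c(joined, piece)
      buffer <- NULL
    }
  }
  if (!is.null(buffer)) {
    joined <- c(joined, buffer)                 # dangling continuation at end of input
  }
  joined
}

lines <- c(
  '#put id:"node1", label:"Process Data", \\',
  '# input:"data.csv", \\',
  '# output:"result.csv"'
)
result <- join_continuation_lines(lines)
print(result)
```

Under these assumptions the three-line annotation collapses to the same text as the single-line form shown earlier in this section.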
Examples
## Not run:
# Scan a directory for workflow annotations
workflow <- put("./src/")
# Scan recursively including subdirectories
workflow <- put("./project/", recursive = TRUE)
# Scan a single file
workflow <- put("./script.R")
# Include line numbers for debugging
workflow <- put("./src/", include_line_numbers = TRUE)
# Single-line PUT annotations (basic syntax):
# #put id:"load_data", label:"Load Dataset", node_type:"input", output:"data.csv"
# #put id:"process", label:"Clean Data", node_type:"process", input:"data.csv", output:"clean.csv"
#
# Multiline PUT annotations (for better code style compliance):
# Use backslash (\) at end of line to continue on next line
# #put id:"complex_process", label:"Complex Data Processing", \
# # input:"file1.csv,file2.csv,file3.csv,file4.csv", \
# # output:"results.csv"
#
# Multiline example with many files:
# #put id:"data_merger", \
# # label:"Merge Multiple Data Sources", \
# # node_type:"process", \
# # input:"sales.csv,customers.csv,products.csv,inventory.csv", \
# # output:"merged_dataset.csv"
#
# All PUT formats support multiline syntax:
# # put id:"style1", label:"Standard" \ # Space after #
# #put| id:"style2", label:"Pipe" \ # Pipe separator
# #put: id:"style3", label:"Colon" \ # Colon separator
## End(Not run)
Create Mermaid Diagram from PUT Workflow
Description
Generates a Mermaid flowchart diagram from putior workflow data, showing the flow of data through your analysis pipeline.
Usage
put_diagram(
workflow,
output = "console",
file = "workflow_diagram.md",
title = NULL,
direction = "TD",
node_labels = "label",
show_files = FALSE,
show_artifacts = FALSE,
show_workflow_boundaries = TRUE,
style_nodes = TRUE,
theme = "light"
)
Arguments
workflow |
Data frame returned by put() |
output |
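Character string specifying output format. Options used in this documentation: "console" (default), "file" (writes to the path given by file), "raw" (returns the Mermaid code for embedding, e.g. in knitr) |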
file |
Character string specifying output file path (used when output = "file") |
title |
Character string for diagram title (optional) |
direction |
Character string specifying diagram direction. Options: "TD" (top-down), "LR" (left-right), "BT" (bottom-top), "RL" (right-left) |
node_labels |
Character string specifying what to show in nodes: "name" (node IDs), "label" (descriptions), "both" (ID: label) |
show_files |
Logical indicating whether to show file connections |
show_artifacts |
Logical indicating whether to show data files as nodes. When TRUE, creates nodes for all input/output files, not just script connections. This provides a complete view of the data flow including terminal outputs. |
show_workflow_boundaries |
Logical indicating whether to apply special styling to nodes with node_type "start" and "end". When TRUE, these nodes get distinctive workflow boundary styling (icons, colors). When FALSE, they render as regular nodes. |
style_nodes |
Logical indicating whether to apply styling based on node_type |
theme |
Character string specifying color theme. Options: "light" (default), "dark", "auto" (GitHub adaptive), "minimal", "github" |
Value
Character string containing the mermaid diagram code
Examples
## Not run:
# Basic usage - shows only script connections
workflow <- put("./src/")
put_diagram(workflow)
# Show all data artifacts as nodes (complete data flow)
put_diagram(workflow, show_artifacts = TRUE)
# Show artifacts with file labels on connections
put_diagram(workflow, show_artifacts = TRUE, show_files = TRUE)
# Show workflow boundaries with special start/end styling
put_diagram(workflow, show_workflow_boundaries = TRUE)
# Disable workflow boundaries (start/end nodes render as regular)
put_diagram(workflow, show_workflow_boundaries = FALSE)
# GitHub-optimized theme for README files
put_diagram(workflow, theme = "github")
# Save to file with artifacts enabled
put_diagram(workflow, show_artifacts = TRUE, output = "file", file = "workflow.md")
# For use in knitr/pkgdown - returns raw mermaid code
# Use within a code chunk with results='asis'
cat("```mermaid\n", put_diagram(workflow, output = "raw"), "\n```\n")
## End(Not run)
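For readers unfamiliar with Mermaid, the sketch below shows roughly what kind of string a flowchart is built from: a header with the direction, followed by node definitions and connections. It is a simplified assumption for illustration only; build_flowchart_sketch and its inputs are hypothetical and the output of put_diagram() includes more (styling, themes, titles).

```r
# Hypothetical sketch: assemble Mermaid flowchart text from nodes and edges.
build_flowchart_sketch <- function(nodes, edges, direction = "TD") {
  # Directions documented for put_diagram(): TD, LR, BT, RL.
  stopifnot(direction %in% c("TD", "LR", "BT", "RL"))
  paste(
    c(
      paste("flowchart", direction),
      paste0("    ", nodes),
      paste0("    ", edges)
    ),
    collapse = "\n"
  )
}

diagram <- build_flowchart_sketch(
  nodes = c('load["Load Dataset"]', 'clean["Clean Data"]'),
  edges = "load --> clean",
  direction = "LR"
)
cat(diagram, "\n")
```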
Sanitize node ID for mermaid compatibility
Description
Sanitize node ID for mermaid compatibility
Usage
sanitize_node_id(node_id)
Arguments
node_id |
Raw node identifier |
Value
Sanitized identifier safe for mermaid
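One plausible sanitization rule is to replace every character outside letters, digits, and underscores. The sketch below assumes that rule purely for illustration; sanitize_id_sketch is a hypothetical name and the package's actual rules may differ.

```r
# Hypothetical sketch: keep letters, digits, and underscores; replace the rest.
sanitize_id_sketch <- function(node_id) {
  gsub("[^A-Za-z0-9_]", "_", node_id)
}

clean_id <- sanitize_id_sketch("load data-01.R")
print(clean_id)
```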
Split comma-separated file list
Description
Split comma-separated file list
Usage
split_file_list(file_string)
Arguments
file_string |
Comma-separated file names |
Value
Character vector of individual file names
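Splitting a file list can be sketched in a few lines of base R. The handling of missing or empty input and of surrounding whitespace is an assumption here; split_files_sketch is a hypothetical name, not the package function.

```r
# Hypothetical sketch: split "a.csv, b.csv" into trimmed file names.
split_files_sketch <- function(file_string) {
  if (is.null(file_string) || is.na(file_string) || !nzchar(file_string)) {
    return(character(0))                  # nothing to split
  }
  parts <- trimws(strsplit(file_string, ",", fixed = TRUE)[[1]])
  parts[nzchar(parts)]                    # drop empty entries such as "a.csv,,b.csv"
}

files <- split_files_sketch("sales.csv, customers.csv ,products.csv")
print(files)
```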
Validate PUT annotation for common issues
Description
Validate PUT annotation for common issues
Usage
validate_annotation(properties, line_content)
Arguments
properties |
List of annotation properties |
line_content |
Original line content |
Value
Character vector of validation issues