Distribute and Process Iterations Across HPC Jobs — job

This function distributes a set of iterations across the jobs of a high-performance computing (HPC) job array. For the job identified by job_id, it subsets the variables in subset_vars to that job's iterations, calls FUN with those subsets and any additional arguments, and saves the result as an RDS file in output_dir.
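This page does not say exactly how iterations are mapped to jobs. The sketch below assumes contiguous blocks of n_iter_per_job iterations, with the last job taking whatever remains. The helpers job_iterations() and n_jobs_needed() are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): the iterations one job handles,
# ASSUMING contiguous blocks of n_iter_per_job, with the last job taking the
# remainder.
job_iterations <- function(job_id, n_iter_per_job, iter_total) {
  start <- (job_id - 1) * n_iter_per_job + 1
  if (start > iter_total) return(integer(0))  # job_id beyond the last needed job
  end <- min(job_id * n_iter_per_job, iter_total)
  seq.int(start, end)
}

# Number of array jobs needed so that every iteration is covered.
n_jobs_needed <- function(n_iter_per_job, iter_total) {
  ceiling(iter_total / n_iter_per_job)
}

job_iterations(1, 10, 100)    # 1 to 10
job_iterations(4, 30, 100)    # 91 to 100 (last job gets the remainder)
n_jobs_needed(30, 100)        # 4
```

Under this assumption, the job array must have at least n_jobs_needed(n_iter_per_job, iter_total) tasks, or some iterations are never run.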

Usage

job_splitter(
  job_id,
  n_iter_per_job = 1,
  iter_total,
  prefix = "res_job_",
  output_dir = NULL,
  FUN,
  subset_vars = list(),
  subset_total_var = NULL,
  ...
)

Arguments

job_id: An integer specifying the job ID in the HPC job array (e.g., 1, 2, 3, ...). Determines which subset of iterations this job will process.
n_iter_per_job: An integer specifying the number of iterations each job should process. Default is 1.
iter_total: An integer specifying the total number of iterations to be processed across all jobs.
prefix: A character string specifying the base name for the saved RDS files. Default is "res_job_".
output_dir: A character string specifying the directory where the results should be saved. If the directory does not exist, it will be created.
FUN: The function called to process the job's iterations. It must accept the arguments named in subset_vars and subset_total_var, as well as any additional arguments supplied via the ... argument.
subset_vars: A named list of variables (typically matrices) that need to be subset according to the job's assigned iterations. The names should correspond to the argument names in FUN.
subset_total_var: A character string specifying the name of the argument in FUN that holds the number of iterations. If provided, this argument is set to the number of iterations assigned to the current job.
...: Additional arguments passed to FUN.
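The interaction of subset_vars, subset_total_var and ... can be sketched as follows. This is an illustration, not the package's code. It assumes that matrices are subset by column and vectors by element; the helper call_with_subsets() and the toy function toy_fun() are hypothetical.

```r
# Hypothetical sketch: subset each entry of subset_vars to the job's iterations
# (columns for matrices, elements otherwise: an ASSUMPTION), optionally set the
# per-job iteration count, then call FUN with the subsets plus extra arguments.
call_with_subsets <- function(FUN, iters, subset_vars = list(),
                              subset_total_var = NULL, ...) {
  subsets <- lapply(subset_vars, function(v) {
    if (is.matrix(v)) v[, iters, drop = FALSE] else v[iters]
  })
  args <- c(subsets, list(...))
  if (!is.null(subset_total_var)) {
    args[[subset_total_var]] <- length(iters)  # per-job count, not iter_total
  }
  do.call(FUN, args)
}

# Toy FUN: column sums of the permutation columns it receives.
toy_fun <- function(perm_id, n_perm, offset = 0) {
  stopifnot(ncol(perm_id) == n_perm)
  colSums(perm_id) + offset
}

perm_id <- matrix(1:20, nrow = 2)  # 2 rows x 10 columns (one per iteration)
call_with_subsets(toy_fun, iters = 3:4,
                  subset_vars = list(perm_id = perm_id),
                  subset_total_var = "n_perm", offset = 100)
# 111 115
```

Because the list names are reused as argument names in do.call(), each name in subset_vars (and subset_total_var) must match a formal argument of FUN.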

Value

This function is called for its side effect and does not return a value. The result of FUN is saved as an RDS file, named using prefix, in output_dir.
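Once every job has finished, the per-job RDS files usually need to be gathered. The sketch below assumes files are named <prefix><job_id>.rds. This page states only that prefix is the base name, so the actual naming may differ. Both save_job_result() and collect_results() are hypothetical helpers.

```r
# Hypothetical save step, ASSUMING file names of the form <prefix><job_id>.rds.
save_job_result <- function(res, job_id, output_dir, prefix = "res_job_") {
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(output_dir, paste0(prefix, job_id, ".rds"))
  saveRDS(res, path)
  invisible(path)
}

# Read all per-job files back, ordered numerically by job_id
# (so res_job_10 sorts after res_job_9).
collect_results <- function(output_dir, prefix = "res_job_") {
  pattern <- paste0("^", prefix, "([0-9]+)\\.rds$")
  files <- list.files(output_dir, pattern = pattern, full.names = TRUE)
  ids <- as.integer(sub(pattern, "\\1", basename(files)))
  lapply(files[order(ids)], readRDS)
}

out <- file.path(tempdir(), "job_splitter_demo")
for (id in 1:3) save_job_result(id * 10, id, out)
unlist(collect_results(out))  # 10 20 30
```

The prefix is pasted into a regular expression unescaped, which is fine for "res_job_" but would need escaping if a prefix contained characters such as "." or "+".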

Examples

if (FALSE) { # \dontrun{
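# Job 1 of 10 (100 iterations, 10 per job): subset 'perm_id' to this job's
# iterations, set 'n_perm' to the per-job count, and pass both to 'FUN'.
# Assumes brainscore, brain_data, gene_data, annoData and perm_id already exist.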
job_splitter(
  job_id = 1,
  n_iter_per_job = 10,
  iter_total = 100,
  output_dir = "/path/to/output",
  FUN = brainscore,
  subset_vars = list(perm_id = perm_id),
  subset_total_var = "n_perm",
  brain_data = brain_data,
  gene_data = gene_data,
  annoData = annoData,
  cor_method = "pearson",
  aggre_method = "mean",
  null_model = "spin_brain",
  minGSSize = 20,
  maxGSSize = 200
)
} # }
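In a real job array, job_id is normally taken from the scheduler rather than hard-coded. Slurm exposes the array index as SLURM_ARRAY_TASK_ID, and Grid Engine uses SGE_TASK_ID. The helper get_array_job_id() below is hypothetical, and so is the script name run_job.R in the comment.

```r
# Hypothetical helper (not part of the package): return the array index set by
# the scheduler, checking Slurm's variable first and then Grid Engine's.
get_array_job_id <- function(vars = c("SLURM_ARRAY_TASK_ID", "SGE_TASK_ID")) {
  for (v in vars) {
    val <- Sys.getenv(v, unset = "")
    # Skip unset variables and non-numeric values such as "undefined"
    if (grepl("^[0-9]+$", val)) return(as.integer(val))
  }
  stop("No array task ID found in: ", paste(vars, collapse = ", "))
}

# Inside a hypothetical run_job.R, submitted with `sbatch --array=1-10`
# to cover 100 iterations at 10 per job:
# job_splitter(job_id = get_array_job_id(), n_iter_per_job = 10,
#              iter_total = 100, ...)
```

Failing loudly when no index is found avoids silently running job 1 many times over.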