Distribute and Process Iterations Across HPC Jobs
This function distributes a set of iterations across multiple jobs in a high-performance computing (HPC) environment. It subsets the necessary variables, calls a specified function with the relevant subsets, and saves the results to a specified directory.
Usage
job_splitter(
  job_id,
  n_iter_per_job = 1,
  iter_total,
  prefix = "res_job_",
  output_dir = NULL,
  FUN,
  subset_vars = list(),
  subset_total_var = NULL,
  ...
)
Arguments
- job_id
An integer specifying the job ID in the HPC job array (e.g., 1, 2, 3, ...). Determines which subset of iterations this job will process.
- n_iter_per_job
An integer specifying the number of iterations each job should process. Default is 1.
- iter_total
An integer specifying the total number of iterations to be processed across all jobs.
- prefix
A character string specifying the base name for the saved RDS files. Default is "res_job_".
- output_dir
A character string specifying the directory where the results should be saved. If the directory does not exist, it will be created.
- FUN
A function that will be called to process the data for the specified iterations. The function should accept the arguments specified in the ... argument.
- subset_vars
A named list of variables (typically matrices) that need to be subset according to the job's assigned iterations. The names should correspond to the argument names in FUN.
- subset_total_var
A character string specifying the name of the argument in FUN that corresponds to the total number of iterations. If provided, the total number of iterations for the current job will be assigned to this argument.
- ...
Additional arguments passed to FUN.
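The mapping from job_id to iterations, and the way subset variables reach FUN, can be illustrated with a minimal sketch. It assumes that each job receives a contiguous block of iterations, that the last job may receive fewer, and that each column of a subset variable corresponds to one iteration; none of this is confirmed by the documentation above. The names iter_range_for_job and toy_fun are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): contiguous block of
# iteration indices for one job. The last job may get fewer iterations.
iter_range_for_job <- function(job_id, n_iter_per_job, iter_total) {
  start <- (job_id - 1) * n_iter_per_job + 1
  if (start > iter_total) {
    return(integer(0))  # this job has nothing left to process
  }
  seq.int(start, min(job_id * n_iter_per_job, iter_total))
}

# Job 3 of an array with 10 iterations per job and 100 iterations in total
idx <- iter_range_for_job(job_id = 3, n_iter_per_job = 10, iter_total = 100)

# Assumption: one column per iteration in each subset variable
perm_id <- matrix(seq_len(5 * 100), nrow = 5, ncol = 100)
subset_vars <- list(perm_id = perm_id[, idx, drop = FALSE])

# Toy stand-in for FUN; 'n_perm' plays the role of subset_total_var
toy_fun <- function(perm_id, n_perm, scale = 1) {
  stopifnot(ncol(perm_id) == n_perm)
  colSums(perm_id) * scale
}

# Combine the subset variables, the per-job iteration count, and the
# extra arguments (the '...' of job_splitter) into one call to FUN
args <- c(subset_vars, list(n_perm = length(idx)), list(scale = 2))
res <- do.call(toy_fun, args)
```

Under these assumptions, job 3 processes iterations 21 through 30, and the argument named by subset_total_var receives 10 rather than 100.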
Value
This function does not return a value but saves the results of FUN to an RDS file in the specified output_dir.
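Because results are written to disk rather than returned, they have to be collected after all jobs finish. The sketch below shows one way to do that with base R. The file naming scheme (prefix followed by the job ID and an .rds extension) is an assumption made for illustration; check the files actually written to output_dir before relying on the pattern.

```r
# Stand-in for the output directory of a finished job array
output_dir <- tempfile("job_splitter_demo_")
dir.create(output_dir, recursive = TRUE)

# Simulate three finished jobs. ASSUMED naming: <prefix><job_id>.rds
for (job_id in 1:3) {
  saveRDS(
    list(job_id = job_id),
    file.path(output_dir, paste0("res_job_", job_id, ".rds"))
  )
}

# Collect every RDS file whose name starts with the prefix
files <- list.files(
  output_dir,
  pattern = "^res_job_.*\\.rds$",
  full.names = TRUE
)
results <- lapply(files, readRDS)
```

Note that list.files() sorts names alphabetically, so with ten or more jobs a file ending in 10 sorts before one ending in 2. If the order of results matters, sort by the numeric job ID instead.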
Examples
if (FALSE) { # \dontrun{
  # Call job_splitter, which will subset 'perm_id' and pass it to 'FUN'
  job_splitter(
    job_id = 1,
    n_iter_per_job = 10,
    iter_total = 100,
    output_dir = "/path/to/output",
    FUN = brainscore,
    subset_vars = list(perm_id = perm_id),
    subset_total_var = "n_perm",
    brain_data = brain_data,
    gene_data = gene_data,
    annoData = annoData,
    cor_method = "pearson",
    aggre_method = "mean",
    null_model = "spin_brain",
    minGSSize = 20,
    maxGSSize = 200
  )
} # }
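In a real job array, job_id usually comes from the scheduler rather than being hard-coded. The following sketch assumes a Slurm cluster, where each array task sees its index in the SLURM_ARRAY_TASK_ID environment variable; other schedulers use different variable names. The helper get_job_id is hypothetical and not part of the package.

```r
# Hypothetical helper: read the array task index from the environment,
# falling back to a default when running outside the scheduler.
get_job_id <- function(env_var = "SLURM_ARRAY_TASK_ID", default = 1L) {
  raw <- Sys.getenv(env_var, unset = NA)
  if (is.na(raw) || !nzchar(raw)) {
    return(default)
  }
  as.integer(raw)
}

job_id <- get_job_id()

# Number of array tasks needed to cover every iteration,
# e.g. for a submission such as: sbatch --array=1-10 ...
iter_total <- 100
n_iter_per_job <- 10
n_jobs <- ceiling(iter_total / n_iter_per_job)
```

The resulting job_id can then be passed to job_splitter() in place of the literal 1 used in the example above.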