When multiple scripts are listed in datapackage.yml, there are two options for accessing objects created by scripts earlier in the list:

- datapackager_object_read, for accessing objects created by a script that ran in the same build, i.e. both Rmd files are toggled as enabled: yes
- project_data_path, which allows loading an .rda file created in a previous run of package_build()

This creates a relationship between the two scripts that requires manual updates when rebuilding the package. Take the case of two processing scripts, preprocess_A and preprocess_B, which generate A.rda and B.rda respectively, where preprocess_B uses the output of preprocess_A.
In the following build scenario, we would use datapackager_object_read:

    # Case 1
    files:
      preprocess_A.Rmd:
        enabled: yes
      preprocess_B.Rmd:
        enabled: yes

In a subsequent build where only preprocess_B is enabled (Case 2), we have to update preprocess_B to use project_data_path:

    # Case 2
    files:
      preprocess_A.Rmd:
        enabled: no
      preprocess_B.Rmd:
        enabled: yes

There is a certain logic to this update, because it is a change of state in preprocess_B, to no longer be coupled with preprocess_A.
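
To make the manual switch concrete, here is a rough sketch of the two ways preprocess_B.Rmd might obtain A. The wrapper function names are hypothetical, and it assumes that A.rda holds a single object named A and that both DataPackageR functions are called during package_build(). It is an illustration of the two cases, not the package's recommended pattern.

```r
# Case 1: preprocess_A.Rmd ran earlier in the same build,
# so A is read from the objects produced by the current build.
load_A_same_build <- function() {
  DataPackageR::datapackager_object_read("A")
}

# Case 2: preprocess_A.Rmd is disabled, so A has to come from the
# A.rda file that a previous package_build() wrote to the data folder.
# Assumption: the file contains one object called "A".
load_A_previous_build <- function() {
  env <- new.env()
  load(DataPackageR::project_data_path("A.rda"), envir = env)
  get("A", envir = env)
}
```

Switching between Case 1 and Case 2 means editing preprocess_B.Rmd to call one or the other.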
However, if preprocess_A needs to be rerun for some reason, we have to take the following actions:

- update datapackage.yml to enable both files
- switch preprocess_B.Rmd to use datapackager_object_read again (not especially intuitive)

Would it be possible for data objects to always be read from the /data/ location, but only after any earlier scripts in the build have written to that folder? This would ensure that the latest data is always used, while maximizing code portability.
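
Until something like that exists, a user-side workaround could look roughly like the helper below. This is only a sketch: read_upstream is a hypothetical name, not part of DataPackageR. It assumes each .rda file contains an object with the same name as the file, and that a failed same-build read either throws an error or returns NULL (I have not checked which). The two reader arguments default to the DataPackageR functions and are parameters only so the helper can be tried outside a build.

```r
# Hypothetical helper: prefer the object from the current build,
# otherwise fall back to the .rda file left by a previous build.
read_upstream <- function(name,
                          read_same_build = DataPackageR::datapackager_object_read,
                          rda_path = DataPackageR::project_data_path) {
  # Try the object produced by an earlier script in this build.
  # Assumption: a missing object gives an error or NULL.
  obj <- tryCatch(read_same_build(name), error = function(e) NULL)
  if (!is.null(obj)) {
    return(obj)
  }

  # Fall back to <name>.rda in the package data folder.
  path <- rda_path(paste0(name, ".rda"))
  if (!file.exists(path)) {
    stop("No object '", name, "' from this build and no file at ", path)
  }
  env <- new.env()
  load(path, envir = env)
  # Assumption: the file holds an object with the same name as the file.
  get(name, envir = env)
}
```

With this, preprocess_B.Rmd could call read_upstream("A") in both Case 1 and Case 2 without being edited between builds.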