Title: | Computations over Distributed Data without Aggregation |
Maintainer: | Balasubramanian Narasimhan <naras@stat.Stanford.EDU> |
Version: | 1.3-3 |
VignetteBuilder: | knitr |
URL: | http://dx.doi.org/10.18637/jss.v077.i13 |
Depends: | survival, stats, R (≥ 3.5.0) |
Imports: | utils, shiny, httr (≥ 1.0.0), digest, jsonlite, stringr, R6 (≥ 2.0), dplyr, rlang, magrittr, homomorpheR, gmp |
Suggests: | opencpu, knitr, covr, rmarkdown |
Description: | Implementing algorithms and fitting models when sites (possibly remote) share computation summaries rather than actual data over HTTP with a master R process (using 'opencpu', for example). A stratified Cox model and a singular value decomposition are provided. The former makes direct use of code from the R 'survival' package. (That is, the underlying Cox model code is derived from that in the R 'survival' package.) Sites may provide data via several means: CSV files, Redcap API, etc. An extensible design allows for new methods to be added in the future and includes facilities for local prototyping and testing. Web applications are provided (via 'shiny') for the implemented methods to help in designing and deploying the computations. |
Copyright: | inst/COPYRIGHTS |
Encoding: | UTF-8 |
License: | LGPL-2 | LGPL-2.1 | LGPL-3 [expanded from: LGPL (≥ 2)] |
RoxygenNote: | 7.2.1 |
NeedsCompilation: | yes |
Packaged: | 2022-09-01 18:28:23 UTC; naras |
Author: | Balasubramanian Narasimhan [aut, cre], Marina Bendersky [aut], Sam Gross [aut], Terry M. Therneau [ctb], Thomas Lumley [ctb] |
Repository: | CRAN |
Date/Publication: | 2022-09-01 21:00:02 UTC |
Make an appropriate opencpu URL for a specified function and url prefix for the distcomp package
Description
.makeOpencpuURL returns an appropriate URL to call a function in the distcomp package given the name of the function and a url prefix.
.defnOK returns TRUE or FALSE depending on whether the definition object meets minimal requirements.
.deSerialize converts the JSON content of an HTTP response as needed; otherwise the raw content is returned.
Usage
.makeOpencpuURL(fn, urlPrefix, package = "distcomp")
.defnOK(defn)
.deSerialize(q)
Arguments
fn | the name of the function in the distcomp package |
urlPrefix | the URL of the opencpu server with the distcomp package installed |
defn | the definition object to check |
q | the response object returned by an httr request |
Value
the formatted url as a string
TRUE or FALSE depending on the result
the converted result, if JSON, or the raw content
Examples
distcomp:::.makeOpencpuURL("foo", "http://localhost:9999/ocpu")
distcomp:::.defnOK(data.frame()) ## FALSE
distcomp:::.defnOK(data.frame(id = "ABC", stringsAsFactors=FALSE)) ## TRUE
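OpenCPU serves an R function of an installed package at the endpoint <urlPrefix>/library/<package>/R/<fn>, and an HTTP POST to that endpoint calls the function. The minimal sketch below mimics the two helpers documented above; make_ocpu_url() and defn_ok() are hypothetical stand-ins, and the rule inside defn_ok() (a data frame with at least one row and an id column) is only an assumption consistent with the two examples, not the package's actual check.

```r
# Hypothetical helper: build an OpenCPU endpoint URL for a package function.
# OpenCPU serves package functions at <prefix>/library/<package>/R/<function>.
make_ocpu_url <- function(fn, urlPrefix, package = "distcomp") {
  paste(urlPrefix, "library", package, "R", fn, sep = "/")
}

# Hypothetical minimal check on a computation definition (assumed rule:
# a data frame with at least one row and an "id" column).
defn_ok <- function(defn) {
  is.data.frame(defn) && nrow(defn) > 0 && "id" %in% names(defn)
}

url <- make_ocpu_url("foo", "http://localhost:9999/ocpu")
print(url)  # "http://localhost:9999/ocpu/library/distcomp/R/foo"
defn_ok(data.frame())                                      # FALSE
defn_ok(data.frame(id = "ABC", stringsAsFactors = FALSE))  # TRUE
```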
Create a master object to control CoxWorker worker objects
Description
CoxMaster objects instantiate and run a distributed Cox model fit
Methods
Public methods
Method new()
Create a CoxMaster object that instantiates and runs a distributed Cox model fit
Usage
CoxMaster$new(defn, debug = FALSE)
Arguments
defn
a computation definition
debug
a flag for debugging, default FALSE
Returns
an R6 CoxMaster object
Method kosher()
Check whether the inputs and state of the object are sane (reserved for future use)
Usage
CoxMaster$kosher()
Returns
TRUE or FALSE
Method logLik()
Return the partial log likelihood on all data for the given beta parameter.
Usage
CoxMaster$logLik(beta)
Arguments
beta
the parameter vector
Returns
a named list with three components: value contains the value of the log likelihood, gradient contains the score vector, and hessian contains the estimated Hessian matrix
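When the Cox model is stratified by site, the log partial likelihood on all data is the sum of the site-level log partial likelihoods, so the score vector and Hessian are sums as well. The sketch below shows that aggregation step on made-up site summaries; combine_site_loglik() is a hypothetical name, and the assumption that the master simply adds up worker results is ours rather than quoted from the package source.

```r
# Hypothetical master-side aggregation: when the Cox model is stratified by
# site, the log partial likelihood, score and Hessian are sums over sites.
combine_site_loglik <- function(parts) {
  list(
    value    = sum(vapply(parts, function(p) p$value, numeric(1))),
    gradient = Reduce(`+`, lapply(parts, function(p) p$gradient)),
    hessian  = Reduce(`+`, lapply(parts, function(p) p$hessian))
  )
}

# Made-up summaries returned by two sites for a 2-parameter model
site1 <- list(value = -10.5, gradient = c(1.0, -0.5),
              hessian = matrix(c(-4, 1, 1, -3), 2, 2))
site2 <- list(value = -7.25, gradient = c(-0.25, 0.75),
              hessian = matrix(c(-2, 0.5, 0.5, -5), 2, 2))
total <- combine_site_loglik(list(site1, site2))
total$value     # -17.75
total$gradient  # 0.75 0.25
```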
Method addSite()
Add a URL or worker object for a site participating in the distributed computation. A worker object can be used during prototyping to avoid the complications of debugging remote calls.
Usage
CoxMaster$addSite(name, url = NULL, worker = NULL)
Arguments
name
the name of the site
url
the web URL of the site; exactly one of url or worker should be specified
worker
the worker object for the site; exactly one of url or worker should be specified
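The "exactly one of url or worker" rule amounts to an exclusive-or test on which argument is NULL. A minimal sketch, using a hypothetical add_site() helper that stores sites in a plain list (not the package's R6 method) and a placeholder example.org URL:

```r
# Hypothetical registry mimicking the documented rule for addSite():
# exactly one of `url` or `worker` must be supplied for each site.
add_site <- function(sites, name, url = NULL, worker = NULL) {
  if (!xor(is.null(url), is.null(worker))) {
    stop("exactly one of url or worker should be specified")
  }
  sites[[name]] <- list(url = url, worker = worker)
  sites
}

sites <- list()
sites <- add_site(sites, "siteA", url = "http://siteA.example.org/ocpu")
sites <- add_site(sites, "siteB", worker = function(beta) sum(beta))  # local stand-in
names(sites)  # "siteA" "siteB"
```

A local worker object lets the same master code run in one R session during prototyping, while a URL routes the same calls to a remote opencpu server.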
Method run()
Run the distributed Cox model fit and return the estimates
Usage
CoxMaster$run(control = coxph.control())
Arguments
control
control parameters, the same as survival::coxph.control()
Returns
a named list of beta, var, gradient, iter, and returnCode
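survival::coxph() fits by Newton-Raphson iteration, and the control argument above takes the same settings (such as eps and iter.max). The sketch below shows such a loop driven by any function that returns value, gradient and hessian, as logLik() does; newton_fit() is hypothetical, omits the step-halving that coxph uses, and the returnCode convention (0 for converged) is an assumption. It is illustrated on a toy concave objective rather than Cox data.

```r
# Hypothetical Newton-Raphson driver for a function returning
# list(value, gradient, hessian), e.g. the aggregated site log likelihood.
# Convergence uses a coxph-style relative change in the log likelihood.
newton_fit <- function(logLik, init, eps = 1e-9, iter.max = 20) {
  beta <- init
  cur <- logLik(beta)
  for (iter in seq_len(iter.max)) {
    step <- solve(-cur$hessian, cur$gradient)  # Newton step: (-H)^{-1} g
    beta <- beta + step
    nxt <- logLik(beta)
    converged <- abs(1 - cur$value / nxt$value) <= eps
    cur <- nxt
    if (converged) {
      return(list(beta = beta, var = solve(-cur$hessian),
                  gradient = cur$gradient, iter = iter, returnCode = 0L))
    }
  }
  list(beta = beta, var = solve(-cur$hessian),
       gradient = cur$gradient, iter = iter.max, returnCode = 1L)
}

# Toy concave objective sum(y * b - exp(b)), maximized at b = log(y)
y <- c(2, 5)
toy <- function(b) list(value = sum(y * b - exp(b)),
                        gradient = y - exp(b),
                        hessian = -diag(exp(b), length(b)))
fit <- newton_fit(toy, init = c(0, 0))
fit$beta  # approximately log(c(2, 5))
```

Returning var as the inverse of the negative Hessian at convergence gives the usual model-based variance of the estimates.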
Method summary()
Return the summary of the fit as a data frame
Usage
CoxMaster$summary()
Returns
a summary data frame with columns for coef, exp(coef), standard error, z-score, and p-value for each parameter in the model, following the same format as the survival package
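The summary columns follow directly from the estimates and their variance: z is coef divided by its standard error and the p-value is two-sided under a normal reference. A minimal sketch, with a hypothetical cox_summary() helper and made-up estimates, using the column names that summary() of a survival::coxph fit prints:

```r
# Hypothetical summary table in the layout of summary(coxph)$coefficients:
# coef, exp(coef), se(coef), z, Pr(>|z|)
cox_summary <- function(beta, var) {
  se <- sqrt(diag(as.matrix(var)))
  z <- beta / se
  data.frame(coef = beta, `exp(coef)` = exp(beta), `se(coef)` = se,
             z = z, `Pr(>|z|)` = 2 * pnorm(-abs(z)), check.names = FALSE)
}

# Made-up estimates for two covariates
tab <- cox_summary(beta = c(age = 0.02, sex = -0.5),
                   var = diag(c(0.0001, 0.04)))
tab
```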
Method clone()
The objects of this class are cloneable with this method.
Usage
CoxMaster$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
CoxWorker, which generates objects matched to such a master object
R6 class for an object to use as a worker with CoxMaster master objects
Description
CoxWorker objects are worker objects at each data site of a distributed Cox model computation
Methods
Public methods
Method new()
Create a new CoxWorker object.
Usage
CoxWorker$new(defn, data, stateful = TRUE)
Arguments
defn
the computation definition
data
the local data
stateful
a boolean flag indicating if state needs to be preserved between REST calls
Returns
a new CoxWorker object
Method getP()
Return the dimension of the parameter vector.
Usage
CoxWorker$getP(...)
Arguments
...
other args ignored
Returns
the dimension of the parameter vector
Method getStateful()
Return the stateful status of the object.
Usage
CoxWorker$getStateful()
Returns
the stateful flag, TRUE or FALSE
Method logLik()
Return the partial log likelihood on local data for the given beta parameter.
Usage
CoxWorker$logLik(beta, ...)
Arguments
beta
the parameter vector
...
further arguments, currently unused
Returns
a named list with three components: value contains the value of the log likelihood, gradient contains the score vector, and hessian contains the estimated Hessian matrix
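At a single site, the three components come from the risk sets of the local data. The sketch below computes them for one site with the Breslow approximation for ties, chosen here for brevity (survival::coxph defaults to Efron); site_cox_loglik() and the simulated data are hypothetical, not the package's internal code.

```r
# Hypothetical site-level computation: Breslow-tied log partial likelihood,
# score vector and Hessian of a Cox model on one site's local data.
site_cox_loglik <- function(beta, time, status, X) {
  X <- as.matrix(X)
  eta <- drop(X %*% beta)
  w <- exp(eta)
  p <- ncol(X)
  value <- 0
  gradient <- numeric(p)
  hessian <- matrix(0, p, p)
  for (i in which(status == 1)) {
    risk <- time >= time[i]                   # risk set at the i-th event time
    Xr <- X[risk, , drop = FALSE]
    S0 <- sum(w[risk])
    xbar <- colSums(Xr * w[risk]) / S0        # weighted covariate mean
    S2 <- crossprod(Xr * sqrt(w[risk])) / S0  # weighted second moment
    value <- value + eta[i] - log(S0)
    gradient <- gradient + X[i, ] - xbar
    hessian <- hessian - (S2 - tcrossprod(xbar))
  }
  list(value = value, gradient = gradient, hessian = hessian)
}

# Made-up local data for one site
set.seed(42)
n <- 30
X <- cbind(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
time <- rexp(n)
status <- rbinom(n, 1, 0.7)
res <- site_cox_loglik(c(0.3, -0.2), time, status, X)
```

Only res, a scalar, a p-vector and a p-by-p matrix, would leave the site; the rows of X never do.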
Method var()
Return the variance of the estimate for the given beta parameter on local data.
Usage
CoxWorker$var(beta, ...)
Arguments
beta
the parameter vector
...
further arguments, currently unused
Returns
variance vector
Method kosher()
Check whether the inputs and state of the object are sane (reserved for future use)
Usage
CoxWorker$kosher()
Returns
TRUE or FALSE
Method clone()
The objects of this class are cloneable with this method.
Usage
CoxWorker$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
CoxMaster, which goes hand-in-hand with this object
Create a HEMaster process for use in a distributed homomorphic encrypted (HE) computation
Description
HEMaster objects run a distributed computation based upon a definition file that encapsulates all information necessary to perform a computation. A master makes use of two non-cooperating parties, which communicate with the sites that perform the actual computations on local data.
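distcomp imports homomorpheR, which, to our understanding, provides Paillier encryption. Paillier is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so encrypted site results can be combined without decrypting them. The toy sketch below shows that property in base R with tiny primes; it is insecure, every helper name is hypothetical, and it is not how homomorpheR or distcomp implement the scheme (real keys need big-integer arithmetic, which is presumably why the package also imports gmp).

```r
# Toy Paillier cryptosystem with tiny primes (insecure; illustration only).
# All intermediate products stay below 2^53, so double arithmetic is exact.
mod_pow <- function(base, e, m) {
  result <- 1
  base <- base %% m
  while (e > 0) {
    if (e %% 2 == 1) result <- (result * base) %% m
    base <- (base * base) %% m
    e <- e %/% 2
  }
  result
}
mod_inv <- function(a, m) {              # extended Euclidean algorithm
  r0 <- m; r1 <- a %% m; t0 <- 0; t1 <- 1
  while (r1 != 0) {
    q <- r0 %/% r1
    tmp <- r0 - q * r1; r0 <- r1; r1 <- tmp
    tmp <- t0 - q * t1; t0 <- t1; t1 <- tmp
  }
  if (r0 != 1) stop("not invertible")
  t0 %% m
}
p <- 11; q <- 13
n <- p * q; n2 <- n^2; g <- n + 1
lambda <- 60                             # lcm(p - 1, q - 1)
L <- function(u) (u - 1) %/% n
mu <- mod_inv(L(mod_pow(g, lambda, n2)), n)
encrypt <- function(m, r) (mod_pow(g, m, n2) * mod_pow(r, n, n2)) %% n2
decrypt <- function(c) (L(mod_pow(c, lambda, n2)) * mu) %% n

c1 <- encrypt(17, r = 7)                 # r must be coprime with n; random in practice
c2 <- encrypt(25, r = 23)
decrypt((c1 * c2) %% n2)                 # 42: multiplying ciphertexts adds plaintexts
```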
Public fields
den
denominator for rational arithmetic
den_bits
number of bits for denominator for rational arithmetic
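Homomorphic schemes such as Paillier work on integers, so real numbers need a rational encoding. One plausible reading of den and den_bits, together with the int and frac components returned by the query count methods, is a fixed-point split with den = 2^den_bits. The sketch below is an assumption on our part; encode_rational() and decode_rational() are hypothetical names.

```r
# Hypothetical fixed-point encoding: a real x is split into an integer part
# and a non-negative fractional numerator over den = 2^den_bits, so that
# both components are integers an integer-only scheme could encrypt.
encode_rational <- function(x, den_bits = 20) {
  den <- 2^den_bits
  int <- floor(x)
  frac <- round((x - int) * den)
  # rounding can push frac up to den; carry it into the integer part
  if (frac == den) { int <- int + 1; frac <- 0 }
  list(int = int, frac = frac)
}
decode_rational <- function(enc, den_bits = 20) enc$int + enc$frac / 2^den_bits

enc <- encode_rational(3.14159)
decode_rational(enc)   # within 2^-21 of 3.14159

# Encodings add componentwise, which is what additive encryption preserves
a <- encode_rational(1.7); b <- encode_rational(2.6)
s <- list(int = a$int + b$int, frac = a$frac + b$frac)
decode_rational(s)     # within 2^-20 of 4.3
```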
Methods
Public methods
Method new()
Create an HEMaster object to run a homomorphically encrypted computation
Usage
HEMaster$new(defn)
Arguments
defn
the homomorphic computation definition
Returns
an HEMaster object
Method getNC_party()
Return a list of noncooperating parties (NCPs)
Usage
HEMaster$getNC_party()
Returns
a named list of length 2 of noncooperating party information
Method getPubkey()
Return the public key from the public/private key pair
Usage
HEMaster$getPubkey()
Returns
an R6 Pubkey object
Method addNCP()
Add a noncooperating party to this master, using either a URL or, for prototyping, an object in the session
Usage
HEMaster$addNCP(ncp_defn, url = NULL, ncpWorker = NULL)
Arguments
ncp_defn
the definition of the NCP
url
the url for the NCP; only one of url and ncpWorker should be non-null
ncpWorker
an instantiated worker object; only one of url and ncpWorker should be non-null
Method run()
Run a distributed homomorphic encrypted computation and return the result
Usage
HEMaster$run(debug = FALSE)
Arguments
debug
a flag for debugging, default FALSE
Returns
the result of the distributed homomorphic computation
Method clone()
The objects of this class are cloneable with this method.
Usage
HEMaster$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Create a homomorphic computation query count master object to employ worker objects generated by HEQueryCountWorker()
Description
HEQueryCountMaster objects instantiate and run a distributed homomorphic query count computation; they are instantiated by non-cooperating parties (NCPs)
Super class
distcomp::QueryCountMaster -> HEQueryCountMaster
Public fields
pubkey
the master's public key visible to everyone
pubkey_bits
the number of bits in the public key (used for reconstructing public key remotely by serializing to character)
pubkey_n
the n for the public key, used for reconstructing the public key remotely
den
the denominator for rational arithmetic
den_bits
the number of bits in the denominator used for reconstructing denominator remotely
Methods
Public methods
Inherited methods
Method new()
Create a new HEQueryCountMaster object.
Usage
HEQueryCountMaster$new(defn, partyNumber, debug = FALSE)
Arguments
defn
the computation definition
partyNumber
the party number of the NCP that this object belongs to (1 or 2)
debug
a flag for debugging, default FALSE
Returns
a new HEQueryCountMaster object
Method setParams()
Set some parameters of the HEQueryCountMaster object for homomorphic computations
Usage
HEQueryCountMaster$setParams(pubkey_bits, pubkey_n, den_bits)
Arguments
pubkey_bits
the number of bits in public key
pubkey_n
the n for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
Method kosher()
Check whether the inputs and state of the object are sane (reserved for future use)
Usage
HEQueryCountMaster$kosher()
Returns
TRUE or FALSE
Method queryCount()
Run the distributed query count, associate it with a token, and return the result
Usage
HEQueryCountMaster$queryCount(token)
Arguments
token
a token to use as key
Returns
the partial result as a list of encrypted items with components int and frac
Method cleanup()
Clean up the instance objects
Usage
HEQueryCountMaster$cleanup()
Method run()
Run the homomorphic encrypted distributed query count computation
Usage
HEQueryCountMaster$run(token)
Arguments
token
a token to use as key
Returns
the partial result as a list of encrypted items with components int and frac
Method clone()
The objects of this class are cloneable with this method.
Usage
HEQueryCountMaster$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
HEQueryCountWorker(), which goes hand-in-hand with this object
Create a homomorphic computation query count worker object for use with master objects generated by HEQueryCountMaster()
Description
HEQueryCountWorker objects are worker objects at each site of a distributed query count model computation using homomorphic encryption
Super class
distcomp::QueryCountWorker -> HEQueryCountWorker
Public fields
pubkey
the master's public key visible to everyone
den
the denominator for rational arithmetic
Methods
Public methods
Inherited methods
Method new()
Create a new HEQueryCountWorker object.
Usage
HEQueryCountWorker$new( defn, data, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL )
Arguments
defn
the computation definition
data
the data which is usually the list of sites
pubkey_bits
the number of bits in public key
pubkey_n
the n for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
Returns
a new HEQueryCountWorker object
Method setParams()
Set some parameters for homomorphic computations
Usage
HEQueryCountWorker$setParams(pubkey_bits, pubkey_n, den_bits)
Arguments
pubkey_bits
the number of bits in public key
pubkey_n
the n for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
Method queryCount()
Run the query count on local data and return the appropriate encrypted result to the party
Usage
HEQueryCountWorker$queryCount(partyNumber, token)
Arguments
partyNumber
the NCP party number (1 or 2)
token
a token to use for identifying parts of the same computation for NCP1 and NCP2
Returns
the count as a list of encrypted items with components int and frac
Method clone()
The objects of this class are cloneable with this method.
Usage
HEQueryCountWorker$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
HEQueryCountMaster(), which goes hand-in-hand with this object
R6 object to use as non-cooperating party in a distributed homomorphic computation
Description
NCP objects are worker objects that prevent a master process from communicating directly with the worker processes. Typically, two such objects are needed for a distributed homomorphic computation. A master process can communicate with NCP objects, and the NCP objects can communicate with worker processes. However, the two NCP objects, designated by numbers 1 and 2, are non-cooperating in the sense that they do not communicate with each other and are isolated from each other.
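Why two isolated parties help can be seen with additive secret sharing: each site splits its value into two random shares, one per NCP, so neither NCP alone learns anything about a site's value, yet the master recovers the total by adding the two NCP sums. distcomp combines the two-party separation with homomorphic encryption; the sketch below shows only the splitting idea, and the modulus M, split_count() and the site counts are hypothetical.

```r
# Hypothetical two-party split: each site sends one random share of its count
# to NCP 1 and the complementary share to NCP 2, so neither NCP alone learns
# any site's count. Shares are taken modulo M.
M <- 2^31
split_count <- function(count) {
  share1 <- floor(runif(1) * M)
  list(ncp1 = share1, ncp2 = (count - share1) %% M)
}

set.seed(7)
site_counts <- c(siteA = 12, siteB = 30, siteC = 5)
shares <- lapply(site_counts, split_count)
ncp1_total <- sum(sapply(shares, `[[`, "ncp1")) %% M   # computed by NCP 1 only
ncp2_total <- sum(sapply(shares, `[[`, "ncp2")) %% M   # computed by NCP 2 only
(ncp1_total + ncp2_total) %% M                         # master recovers 47
```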
Public fields
pubkey
the master's public key visible to everyone
pubkey_bits
the number of bits in the public key (used for reconstructing public key remotely by serializing to character)
pubkey_n
the n for the public key, used for reconstructing the public key remotely
den
the denominator for rational arithmetic
den_bits
the number of bits in the denominator used for reconstructing denominator remotely
Methods
Public methods
Method new()
Create a new NCP object.
Usage
NCP$new( ncp_defn, comp_defn, sites = list(), pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL )
Arguments
ncp_defn
the NCP definition; see example
comp_defn
the computation definition
sites
list of sites
pubkey_bits
the number of bits in public key
pubkey_n
the n for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
Returns
a new NCP object
Method getStateful()
Retrieve the value of the stateful field
Usage
NCP$getStateful()
Method setParams()
Set some parameters of the NCP object for homomorphic computations
Usage
NCP$setParams(pubkey_bits, pubkey_n, den_bits)
Arguments
pubkey_bits
the number of bits in public key
pubkey_n
the n for the public key
den_bits
the number of bits in the denominator (power of 2) used in rational approximations
Method getSites()
Retrieve the value of the private sites field
Usage
NCP$getSites()
Method setSites()
Set the value of the private sites field
Usage
NCP$setSites(sites)
Arguments
sites
the list of sites
Method addSite()
Add a URL or worker object for a site participating in the distributed computation. A worker object can be used during prototyping to avoid the complications of debugging remote calls.
Usage
NCP$addSite(name, url = NULL, worker = NULL)
Arguments
name
the name of the site
url
the web URL of the site; exactly one of url or worker should be specified
worker
the worker object for the site; exactly one of url or worker should be specified
Method cleanupInstance()
Clean up by destroying instance objects created in workspace.
Usage
NCP$cleanupInstance(token)
Arguments
token
the token for the instance
Method run()
Run the distributed homomorphic computation
Usage
NCP$run(token)
Arguments
token
a unique token for the run, used to ensure that correct parts of cached results are returned appropriately
Returns
the result of the computation
Method clone()
The objects of this class are cloneable with this method.
Usage
NCP$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Create a master object to control worker objects generated by QueryCountWorker()
Description
QueryCountMaster objects instantiate and run a distributed query count computation
Methods
Public methods
Method new()
Create a new QueryCountMaster object.
Usage
QueryCountMaster$new(defn, debug = FALSE)
Arguments
defn
the computation definition
debug
a flag for debugging, default FALSE
Returns
a new QueryCountMaster object
Method kosher()
Check whether the inputs and state of the object are sane (reserved for future use)
Usage
QueryCountMaster$kosher()
Returns
TRUE or FALSE
Method queryCount()
Run the distributed query count and return the result
Usage
QueryCountMaster$queryCount()
Returns
the count
Method getSites()
Retrieve the value of the private sites field
Usage
QueryCountMaster$getSites()
Method addSite()
Add a URL or worker object for a site participating in the distributed computation. A worker object can be used during prototyping to avoid the complications of debugging remote calls.
Usage
QueryCountMaster$addSite(name, url = NULL, worker = NULL)
Arguments
name
the name of the site
url
the web URL of the site; exactly one of url or worker should be specified
worker
the worker object for the site; exactly one of url or worker should be specified
Method run()
Run the distributed query count
Usage
QueryCountMaster$run()
Returns
the count
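In the plain (unencrypted) query count, only per-site counts travel to the master, which adds them. The sketch below simulates that with closures standing in for worker objects; make_worker(), the age data and the is_adult() query are all hypothetical, since the real query is specified by the computation definition.

```r
# Hypothetical sketch of a plain (unencrypted) distributed query count: each
# worker counts matching rows in its local data; only the count leaves the site.
make_worker <- function(data, condition) {
  force(data); force(condition)
  list(queryCount = function() sum(condition(data)))
}

is_adult <- function(d) d$age >= 18   # hypothetical query
workers <- list(
  siteA = make_worker(data.frame(age = c(15, 22, 40)), is_adult),
  siteB = make_worker(data.frame(age = c(17, 18, 65, 70)), is_adult)
)
# Master: ask each worker for its count and add them up
total <- sum(vapply(workers, function(w) w$queryCount(), integer(1)))
total  # 5
```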
Method clone()
The objects of this class are cloneable with this method.
Usage
QueryCountMaster$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
QueryCountWorker(), which goes hand-in-hand with this object
R6 worker object for use as a worker with master objects generated by QueryCountMaster()
Description
QueryCountWorker objects are worker objects at each site of a distributed QueryCount model computation
Methods
Public methods
Method new()
Create a new QueryCountWorker object.
Usage
QueryCountWorker$new(defn, data, stateful = FALSE)
Arguments
defn
the computation definition
data
the local data
stateful
the statefulness flag, default FALSE
Returns
a new QueryCountWorker object
Method getStateful()
Retrieve the value of the stateful field
Usage
QueryCountWorker$getStateful()
Method kosher()
Check whether the inputs and state of the object are sane (reserved for future use)
Usage
QueryCountWorker$kosher()
Returns
TRUE or FALSE
Method queryCount()
Return the query count on the local data
Usage
QueryCountWorker$queryCount()
Method clone()
The objects of this class are cloneable with this method.
Usage
QueryCountWorker$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
QueryCountMaster(), which goes hand-in-hand with this object
R6 class for SVD master object to control worker objects generated by SVDWorker()
Description
SVDMaster objects instantiate and run a distributed SVD computation
Methods
Public methods
Method new()
Create an SVDMaster object that instantiates and runs a distributed SVD computation
Usage
SVDMaster$new(defn, debug = FALSE)
Arguments
defn
a computation definition
debug
a flag for debugging, default FALSE
Returns
an R6 SVDMaster object
Method kosher()
Check whether the inputs and state of the object are sane (reserved for future use)
Usage
SVDMaster$kosher()
Returns
TRUE or FALSE
Method updateV()
Return an updated value for the V vector, normalized by arg
Usage
SVDMaster$updateV(arg)
Arguments
arg
the normalizing value
...
other args ignored
Returns
updated V
Method updateU()
Update U and return the updated norm of U
Usage
SVDMaster$updateU(arg)
Arguments
arg
the normalizing value
...
other args ignored
Returns
updated norm of U
Method fixFit()
Construct the residual matrix using the V vector and d computed so far
Usage
SVDMaster$fixFit(v, d)
Arguments
v
the value for v
d
the value for d
Returns
result
Method reset()
Reset the computation state by initializing the work matrix and setting up starting values for iterating
Usage
SVDMaster$reset()
Method addSite()
Add a URL or worker object for a site participating in the distributed computation. A worker object can be used during prototyping to avoid the complications of debugging remote calls.
Usage
SVDMaster$addSite(name, url = NULL, worker = NULL)
Arguments
name
the name of the site
url
the web URL of the site; exactly one of url or worker should be specified
worker
the worker object for the site; exactly one of url or worker should be specified
Method run()
Run the distributed SVD computation and return the estimates
Usage
SVDMaster$run(thr = 1e-08, max.iter = 100)
Arguments
thr
the threshold for convergence, default 1e-8
max.iter
the maximum number of iterations, default 100
Returns
a named list of V and d
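The updateV(), updateU() and fixFit() methods point to power iteration with deflation on a row-partitioned matrix: sites return X_k v and X_k'u_k, the master normalizes and checks convergence, and each site then subtracts the fitted rank-one piece. The sketch below is our own illustration of that idea, not the package's code; distributed_svd() is hypothetical, and the use of thr as a bound on the squared change in v is an assumption.

```r
# Hypothetical sketch: rank-k SVD of a row-partitioned matrix X = rbind(X_1, ..., X_K)
# by power iteration with deflation. Sites only ever return X_k v (length n_k)
# and X_k' u_k (length p); the master normalizes and checks convergence.
distributed_svd <- function(blocks, k, thr = 1e-8, max.iter = 100) {
  p <- ncol(blocks[[1]])
  V <- matrix(0, p, k)
  d <- numeric(k)
  for (j in seq_len(k)) {
    v <- rep(1, p) / sqrt(p)                      # starting vector
    for (iter in seq_len(max.iter)) {
      # sites: u_k = X_k v, then X_k' u_k; master: sum the p-vectors
      w <- Reduce(`+`, lapply(blocks, function(Xk) crossprod(Xk, Xk %*% v)))
      v_new <- drop(w) / sqrt(sum(w^2))
      done <- sum((v_new - v)^2) < thr
      v <- v_new
      if (done) break
    }
    # d_j = ||X v||, assembled from per-site squared norms
    d[j] <- sqrt(sum(vapply(blocks, function(Xk) sum((Xk %*% v)^2), numeric(1))))
    V[, j] <- v
    # deflate locally at each site: X_k <- X_k - (X_k v) v'
    blocks <- lapply(blocks, function(Xk) Xk - tcrossprod(Xk %*% v, v))
  }
  list(V = V, d = d)
}

# Made-up 60 x 4 matrix with known singular values 10, 5, 2, split over 3 sites
set.seed(1)
U0 <- qr.Q(qr(matrix(rnorm(60 * 3), 60, 3)))
V0 <- qr.Q(qr(matrix(rnorm(4 * 3), 4, 3)))
X <- U0 %*% diag(c(10, 5, 2)) %*% t(V0)
blocks <- list(X[1:20, ], X[21:45, ], X[46:60, ])
res <- distributed_svd(blocks, k = 3)
res$d  # close to c(10, 5, 2)
```

Convergence speed depends on the gaps between singular values, which is why the example uses well-separated ones.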
Method summary()
Return the summary result
Usage
SVDMaster$summary()
Returns
a named list of V and d
Method clone()
The objects of this class are cloneable with this method.
Usage
SVDMaster$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
SVDWorker(), which goes hand-in-hand with this object
R6 class for an SVD worker object to use with master objects generated by SVDMaster()
Description
SVDWorker objects are worker objects at each site of a distributed SVD model computation
Methods
Public methods
Method new()
Create a new SVDWorker object.
Usage
SVDWorker$new(defn, data, stateful = TRUE)
Arguments
defn
the computation definition
data
the local x matrix
stateful
a boolean flag indicating whether state needs to be preserved between REST calls, TRUE by default
Returns
a new SVDWorker object
Method reset()
Reset the computation state by initializing the work matrix and setting up starting values for iterating
Usage
SVDWorker$reset()
Method dimX()
Return the dimensions of the matrix
Usage
SVDWorker$dimX(...)
Arguments
...
other args ignored
Returns
the dimension of the matrix
Method updateV()
Return an updated value for the V vector, normalized by arg
Usage
SVDWorker$updateV(arg, ...)
Arguments
arg
the normalizing value
...
other args ignored
Returns
updated V
Method updateU()
Update U and return the updated norm of U
Usage
SVDWorker$updateU(arg, ...)
Arguments
arg
the initial value
...
other args ignored
Returns
updated norm of U
Method normU()
Normalize the U vector
Usage
SVDWorker$normU(arg, ...)
Arguments
arg
the normalizing value
...
other args ignored
Returns
TRUE, invisibly
Method fixU()
Construct the residual matrix using arg
Usage
SVDWorker$fixU(arg, ...)
Arguments
arg
the value to use for residualizing
...
other args ignored
Method getN()
Get the number of rows of the x matrix
Usage
SVDWorker$getN()
Returns
the number of rows of the x matrix
Method getP()
Get the number of columns of the x matrix
Usage
SVDWorker$getP()
Returns
the number of columns of the x matrix
Method getStateful()
Return the stateful status of the object.
Usage
SVDWorker$getStateful()
Returns
the stateful flag, TRUE or FALSE
Method kosher()
Check whether the inputs and state of the object are sane (reserved for future use)
Usage
SVDWorker$kosher()
Returns
TRUE or FALSE
Method clone()
The objects of this class are cloneable with this method.
Usage
SVDWorker$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
SVDMaster(), which goes hand-in-hand with this object
Return the currently available (implemented) computations
Description
The function availableComputations returns a list of available computations with various components. The names of this list (with no spaces) are unique canonical tags that are used throughout the package to unambiguously refer to the type of computation; web applications particularly rely on this list to instantiate objects. As more computations are implemented, this list is augmented.
Usage
availableComputations()
Value
a list with the components corresponding to a computation
desc | a textual description (25 chars at most) |
definitionApp | the name of a function that will fire up a shiny webapp for defining the particular computation |
workerApp | the name of a function that will fire up a shiny webapp for setting up a worker site for the particular computation |
masterApp | the name of a function that will fire up a shiny webapp for setting up a master for the particular computation |
makeDefinition | the name of a function that will return a data frame with appropriate fields needed to define the particular computation, assuming that they are populated in a global variable. This function is used by web applications to construct a definition object based on inputs specified by the users. Since the full information is often gathered incrementally by several web applications, the inputs are set in a global variable and therefore retrieved here using the function |
makeMaster | a function that will construct a master object for the computation given the definition and a logical flag indicating if debugging is desired |
makeWorker | a function that will construct a worker object for that computation given the definition and data |
Examples
availableComputations()
Return currently implemented data sources
Description
The function availableDataSources returns the currently implemented data sources, such as CSV files, Redcap, etc.
Usage
availableDataSources()
Value
a list of named arguments, each of which is another list with required fields named desc, a textual description, and requiredPackages
Examples
availableDataSources()
Given the definition identifier of an object, instantiate and store object in workspace
Description
The function createHEWorkerInstance uses a definition identified by defnId to create the appropriate object instance for HE computations. The instantiated object is searched for in the instance path and loaded if already present; otherwise it is created, assigned the instanceId, and saved under the dataFileName if the latter is specified. This instantiated object may change state between iterations when a computation executes.
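The "load if present, otherwise create and save" behavior is a standard caching pattern over saveRDS() and readRDS(). In the sketch below, get_or_create_instance(), the builder callback and the <instancePath>/<instanceId>/<instanceFileName> layout are hypothetical; only the default file name instance.rds is taken from distcompSetup().

```r
# Hypothetical load-or-create pattern for instance objects: look for
# <instancePath>/<instanceId>/<instanceFileName>; reuse it if present,
# otherwise build it and save it with saveRDS().
get_or_create_instance <- function(instanceId, builder,
                                   instancePath = file.path(tempdir(), "instances"),
                                   instanceFileName = "instance.rds") {
  dir <- file.path(instancePath, instanceId)
  path <- file.path(dir, instanceFileName)
  if (file.exists(path)) {
    return(readRDS(path))
  }
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  obj <- builder()
  saveRDS(obj, path)
  obj
}

calls <- 0
builder <- function() { calls <<- calls + 1; list(id = "demo-1", value = 42) }
a <- get_or_create_instance("demo-1", builder)
b <- get_or_create_instance("demo-1", builder)  # loaded from disk; builder not called
calls  # 1
```

Persisting the instance on disk is what lets a stateful object survive between separate, stateless REST calls to the opencpu server.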
Usage
createHEWorkerInstance(
defnId,
instanceId,
pubkey_bits = NULL,
pubkey_n = NULL,
den_bits = NULL,
dataFileName = NULL
)
Arguments
defnId | the identifier of an already defined computation |
instanceId | an identifier to use for the created instance |
pubkey_bits | the number of bits for the public key |
pubkey_n | the n for the public key |
den_bits | the number of bits for the denominator |
dataFileName | a file name to use for saving the data. Typically |
Value
TRUE if everything goes well
Given the definition identifier of an object, instantiate and store object in workspace
Description
This function uses an identifier (ncpId) to locate a stored definition in the workspace and create the appropriate object instance. The instantiated object is assigned the instanceId and saved under the dataFileName if the latter is not NULL. This instantiated object may change state between iterations when a computation executes.
Usage
createNCPInstance(
name,
ncpId,
instanceId,
pubkey_bits,
pubkey_n,
den_bits,
dataFileName = NULL
)
Arguments
name | the name identifying the NC party |
ncpId | the id indicating the NCP definition |
instanceId | an identifier to use for the created instance |
pubkey_bits | the number of bits in the public key |
pubkey_n | the n for the public key |
den_bits | the number of bits in the denominator for rational approximations |
dataFileName | a file name to use for saving the data. Typically |
Value
TRUE if everything goes well
Given the definition identifier of an object, instantiate and store object in workspace
Description
The function createWorkerInstance uses a definition identified by defnId to create the appropriate object instance. The instantiated object is assigned the instanceId and saved under the dataFileName if the latter is specified. This instantiated object may change state between iterations when a computation executes.
Usage
createWorkerInstance(
defnId,
instanceId,
pubkey_bits = NULL,
pubkey_n = NULL,
den_bits = NULL,
dataFileName = NULL
)
Arguments
defnId | the identifier of an already defined computation |
instanceId | an identifier to use for the created instance |
pubkey_bits | the number of bits for the public key |
pubkey_n | the n for the public key |
den_bits | the number of bits for the denominator |
dataFileName | a file name to use for saving the data. Typically |
Value
TRUE if everything goes well
Functions copied and modified from survival package
Description
The distcomp package makes use of code from the survival package, with the permission of the original authors. This includes R code as well as C code. That is, the underlying Cox model code is derived from that in the R survival package. The original copyrights are retained for these files and the notices preserved. However, these are for internal use and future implementations may change how we use them. In order to avoid confusion and any name collision, the names of these functions have been modified to include a prefix "dc".
Usage
dccoxph(
formula,
data,
weights,
subset,
na.action,
init,
control,
ties = c("efron", "breslow", "exact"),
singular.ok = TRUE,
robust = FALSE,
model = FALSE,
x = FALSE,
y = TRUE,
tt,
method = ties,
...
)
dccoxph.fit(x, y, strata, offset, init, control, weights, method, rownames)
Define a new computation
Description
This function simply calls runDistcompApp() with the parameter "definition"
Usage
defineNewComputation()
Value
the results of running the web application
Destroy an instance object given its identifier
Description
The function destroyInstanceObject deletes an object associated with the instanceId. This is typically done after a computation completes and results have been obtained.
Usage
destroyInstanceObject(instanceId)
Arguments
instanceId | the id of the object to destroy |
Value
TRUE if everything goes well
Distributed Computing with R
Description
distcomp is a collection of methods to fit models to data that may be distributed at various sites. The package arose as a way of addressing the issues regarding data aggregation; by allowing sites to have control over local data and transmitting only summaries, some privacy controls can be maintained. Even when participants have no objections in principle to data aggregation, it may still be useful to keep data local and expose just the computations. For further details, please see the reference cited below.
Details
The initial implementation consists of a stratified Cox model fit with distributed survival data and a Singular Value Decomposition of a distributed matrix. General Linear Models will soon be added. Although some sanity checks and balances are present, many more are needed to make this truly robust. We also hope that other methods will be added by users.
We make the following assumptions in the implementation:
(a) the aggregate data is logically a stacking of data at each site, i.e., the full data is row-partitioned into sites where the rows are observations;
(b) each site has the package distcomp installed and a workspace set up for (writeable) use by the opencpu server (see distcompSetup()); and
(c) each site exposes distcomp via an opencpu server.
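Assumption (a) is what lets summaries stand in for data: when rows are split across sites, many sufficient statistics are sums of per-site pieces, for example X'X = sum over k of X_k'X_k. The sketch below checks this on simulated data and uses a least-squares fit purely as an illustration of the principle (it is not one of the package's methods); the site labels are hypothetical.

```r
# With rows split across sites, sufficient statistics are sums of per-site
# pieces: X'X = sum_k X_k'X_k and X'y = sum_k X_k'y_k.
set.seed(123)
X <- cbind(1, matrix(rnorm(50 * 2), 50, 2))
y <- rnorm(50)
site <- rep(c("A", "B", "C"), length.out = 50)   # hypothetical site labels
rows_by_site <- split(seq_len(50), site)

XtX <- Reduce(`+`, lapply(rows_by_site,
                          function(idx) crossprod(X[idx, , drop = FALSE])))
Xty <- Reduce(`+`, lapply(rows_by_site,
                          function(idx) crossprod(X[idx, , drop = FALSE], y[idx])))
beta <- solve(XtX, Xty)   # least-squares fit without pooling any rows
```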
The main computation happens via a master process, a script of R code, that makes calls to distcomp functions at worker sites via opencpu.
The use of opencpu allows developers to prototype their distributed implementations on a local machine using the opencpu package, which runs such a server locally on localhost ports.
Note that distcomp computations are not intended for speed or efficiency; indeed, they are orders of magnitude slower. However, the models that are fit are not meant to be recomputed often. These and other details are discussed in the paper cited below.
The current implementation, particularly the stratified Cox model, makes direct use of code from survival::coxph(). That is, the underlying Cox model code is derived from that in the R survival package.
For an understanding of how this package is meant to be used, please see the documented examples and the reference.
References
Software for Distributed Computation on Medical Databases: A Demonstration Project. Journal of Statistical Software, 77(13), 1-22. doi:10.18637/jss.v077.i13
Appendix E of Modeling Survival Data: Extending the Cox Model by Terry M. Therneau and Patricia Grambsch. Springer Verlag, 2000.
See Also
The examples in system.file("doc", "examples.html", package="distcomp").
The source for the examples: system.file("doc_src", "examples.Rmd", package="distcomp").
Setup a workspace and configuration for a distributed computation
Description
The function distcompSetup sets up a distributed computation and configures some global parameters such as definition file names, data file names, instance object file names, and SSL configuration parameters. The function creates some of the necessary subdirectories if they are not already present and throws an error if the workspace areas are not writeable.
Usage
distcompSetup(
workspacePath = "",
defnPath = paste(workspacePath, "defn", sep = .Platform$file.sep),
instancePath = paste(workspacePath, "instances", sep = .Platform$file.sep),
defnFileName = "defn.rds",
dataFileName = "data.rds",
instanceFileName = "instance.rds",
resultsCacheFileName = "results_cache.rds",
ssl_verifyhost = 1L,
ssl_verifypeer = 1L
)
Arguments
workspacePath |
a folder specifying the workspace path. This has to be writable by the opencpu process. On a cloud opencpu server on Ubuntu, for example, this requires a one-time modification of apparmor profiles to enable write permissions to this path |
defnPath |
the path where definition files will reside, organized by computation identifiers |
instancePath |
the path where instance objects will reside |
defnFileName |
the name for the compdef definition files |
dataFileName |
the name for the data files |
instanceFileName |
the name for the instance files |
resultsCacheFileName |
the name for the instance results cache files for HE computations |
ssl_verifyhost |
integer value, usually |
ssl_verifypeer |
integer value, usually |
Value
TRUE if all is well
Examples
## Not run:
distcompSetup(workspacePath="./workspace")
## End(Not run)
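The sketch below shows, in base R, the kind of work the description above attributes to distcompSetup: building the default defn and instances sub-paths (in the same way as the Usage defaults), creating them if needed, and failing if they are not writable. It is a hypothetical stand-in (setupWorkspaceSketch is not a distcomp function), and the real function also stores these values for later retrieval via getConfig().

```r
# Hypothetical sketch of a workspace setup; not distcomp's own code.
setupWorkspaceSketch <- function(workspacePath) {
  # Default sub-paths, built the same way as the Usage defaults
  defnPath <- paste(workspacePath, "defn", sep = .Platform$file.sep)
  instancePath <- paste(workspacePath, "instances", sep = .Platform$file.sep)
  for (p in c(workspacePath, defnPath, instancePath)) {
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
    # file.access() returns 0 when the requested access (mode 2 = write) is granted
    if (file.access(p, mode = 2) != 0) stop("Workspace path is not writable: ", p)
  }
  list(workspacePath = workspacePath, defnPath = defnPath,
       instancePath = instancePath, defnFileName = "defn.rds",
       dataFileName = "data.rds", instanceFileName = "instance.rds")
}

cfg <- setupWorkspaceSketch(file.path(tempdir(), "workspace"))
str(cfg)
```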
Given the id of a serialized object, invoke a method on the object with arguments using homomorphic encryption
Description
The function executeHEMethod is a homomorphic encryption wrapper around executeMethod(). It ensures that any returned result is encrypted using the homomorphic encryption function.
Usage
executeHEMethod(objectId, method, ...)
Arguments
objectId |
the (instance) identifier of the object on which to invoke a method |
method |
the name of the method to invoke |
... |
further arguments as appropriate for the method |
Value
a list containing an integer and a fractional result converted to characters
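Homomorphic encryption lets a party combine encrypted values without decrypting them. distcomp imports the homomorpheR package for this. The sketch below does not use homomorpheR. It assumes a Paillier-type, additively homomorphic scheme and implements a toy version in base R, with tiny primes and fixed randomness, so it is insecure and for illustration only. It shows the property that matters for aggregation: multiplying two ciphertexts modulo n^2 yields an encryption of the sum of the plaintexts. All function names are hypothetical.

```r
# Toy Paillier cryptosystem (INSECURE: tiny primes, fixed randomness), illustration only.
gcdInt <- function(a, b) if (b == 0) a else gcdInt(b, a %% b)

# Square-and-multiply modular exponentiation (values stay well below 2^53 here)
modPow <- function(base, exp, mod) {
  result <- 1
  base <- base %% mod
  while (exp > 0) {
    if (exp %% 2 == 1) result <- (result * base) %% mod
    base <- (base * base) %% mod
    exp <- exp %/% 2
  }
  result
}

# Modular inverse via the extended Euclidean algorithm
modInv <- function(a, m) {
  t <- 0; newT <- 1; r <- m; newR <- a %% m
  while (newR != 0) {
    quot <- r %/% newR
    tmp <- t - quot * newT; t <- newT; newT <- tmp
    tmp <- r - quot * newR; r <- newR; newR <- tmp
  }
  if (r != 1) stop("not invertible")
  t %% m
}

p <- 11; q <- 13
n <- p * q; n2 <- n * n; g <- n + 1
lambda <- (p - 1) * (q - 1) / gcdInt(p - 1, q - 1)   # lcm(p - 1, q - 1)
Lfun <- function(u) (u - 1) %/% n
mu <- modInv(Lfun(modPow(g, lambda, n2)), n)

encrypt <- function(m, r) (modPow(g, m, n2) * modPow(r, n, n2)) %% n2  # r must be coprime to n
decrypt <- function(cipher) (Lfun(modPow(cipher, lambda, n2)) * mu) %% n

c1 <- encrypt(20, r = 7)
c2 <- encrypt(22, r = 5)
decrypt((c1 * c2) %% n2)   # 42: the product of ciphertexts decrypts to the sum
```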
Given the id of a serialized object, invoke a method on the object with arguments
Description
The function executeMethod is really the heart of distcomp. It executes an arbitrary method, with any specified arguments, on an object that has been serialized to the distcomp workspace. The result, which depends on the computation that is executed, is returned. If the object needs to save state between iterations, it is automatically serialized back for the ensuing iterations.
Usage
executeMethod(objectId, method, ...)
Arguments
objectId |
the (instance) identifier of the object on which to invoke a method |
method |
the name of the method to invoke |
... |
further arguments as appropriate for the method |
Value
a result that depends on the computation being executed
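The load, invoke, and save-back cycle described above can be sketched in base R as follows. This is a hypothetical illustration, not distcomp's implementation. It assumes each instance is stored as an .rds file named by its identifier, and it uses an environment to stand in for a stateful object such as an R6 instance.

```r
# Hypothetical sketch of the load / invoke / save-back cycle; not distcomp's own code.
workspace <- file.path(tempdir(), "instances_sketch")
dir.create(workspace, showWarnings = FALSE, recursive = TRUE)

# A stateful object: an environment whose method updates a field in place
makeCounter <- function() {
  self <- new.env()
  self$count <- 0
  self$increment <- function(by = 1) {
    self$count <- self$count + by
    self$count
  }
  self
}

instanceFile <- function(objectId) file.path(workspace, paste0(objectId, ".rds"))
saveInstance <- function(obj, objectId) saveRDS(obj, instanceFile(objectId))

executeMethodSketch <- function(objectId, method, ...) {
  obj <- readRDS(instanceFile(objectId))   # deserialize the stored instance
  result <- obj[[method]](...)             # invoke the named method with any arguments
  saveRDS(obj, instanceFile(objectId))     # persist updated state for the next iteration
  result
}

saveInstance(makeCounter(), "site1")
executeMethodSketch("site1", "increment")          # 1
executeMethodSketch("site1", "increment", by = 2)  # 3
```

Because the state is written back after every call, a second invocation sees the result of the first, which is what iterative fits need.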
Generate an identifier for an object
Description
A hash is generated based on the contents of the object
Usage
generateId(object, algo = "xxhash64")
Arguments
object |
the object for which a hash is desired |
algo |
the algorithm to use, default is "xxhash64" from
|
Value
the hash as a string
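A content-based identifier means that identical definitions always map to the same id. The sketch below calls the digest package directly (distcomp lists digest in Imports, and digest() accepts algo = "xxhash64"). It assumes digest is installed. The data frame fields are hypothetical and are not distcomp's definition format.

```r
# Assumes the digest package (imported by distcomp) is installed.
library(digest)

defn <- data.frame(compType = "hypotheticalType", projectName = "demo")  # hypothetical fields
id1 <- digest(defn, algo = "xxhash64")
id2 <- digest(defn, algo = "xxhash64")
identical(id1, id2)   # TRUE: identical contents give the same identifier

defn2 <- defn
defn2$projectName <- "demo2"
id3 <- digest(defn2, algo = "xxhash64")
identical(id1, id3)   # FALSE unless a (very unlikely) hash collision occurs
```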
Get the value of a variable from the global store
Description
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function retrieves the value associated with a name.
Usage
getComputationInfo(name)
Arguments
name |
the name for the object |
Value
the value for the variable, or NULL if not set
Return the workspace and configuration setup values
Description
The function getConfig returns the values of the configuration parameters set up by distcompSetup().
Usage
getConfig(...)
Arguments
... |
any further arguments |
Value
a list consisting of
workspacePath |
a folder specifying the workspace path. This has to be writable by the opencpu process. On a cloud opencpu server on Ubuntu, for example, this requires a one-time modification of apparmor profiles to enable write permissions to this path |
defnPath |
the path where definition files will reside, organized by computation identifiers |
instancePath |
the path where instance objects will reside |
defnFileName |
the name for the compdef definition files |
dataFileName |
the name for the data files |
instanceFileName |
the name for the instance files |
ssl_verifyhost |
integer value, usually |
ssl_verifypeer |
integer value, usually |
Examples
## Not run:
getConfig()
## End(Not run)
Make a computation definition given the computation type
Description
The function makeDefinition returns a computational definition based on current inputs (from the global store) given a canonical computation type tag. This is a utility function for web applications to use as input is being gathered.
Usage
makeDefinition(compType)
Arguments
compType |
the canonical computation type tag |
Value
a data frame corresponding to the computation type
Examples
## Not run:
makeDefinition(names(availableComputations())[1])
## End(Not run)
Instantiate a master process for HE operations
Description
Instantiate a master process for HE operations
Usage
makeHEMaster(defn)
Arguments
defn |
the computation definition |
Value
a master object for HE operations
Make a master object given a definition
Description
The function makeMaster returns a master object corresponding to the definition. The types of master objects that can be created depend upon the available computations.
Usage
makeMaster(defn, partyNumber = NULL, debug = FALSE)
Arguments
defn |
the computation definition |
partyNumber |
the number of the noncooperating party, which can be optionally set if HE is desired |
debug |
a debug flag |
Value
a master object of the appropriate class based on the definition
Instantiate a noncooperating party
Description
Instantiate a noncooperating party
Usage
makeNCP(
ncp_defn,
comp_defn,
sites = list(),
pubkey_bits = NULL,
pubkey_n = NULL,
den_bits = NULL
)
Arguments
ncp_defn |
the NCP definition |
comp_defn |
the computation definition |
sites |
a list of sites, each entry a named list with elements name, url, and worker |
pubkey_bits |
number of bits for public key |
pubkey_n |
the n for the public key |
den_bits |
the log to base 2 of the denominator |
Value
an NCP object
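makeNCP() and makeWorker() take a den_bits argument, described as the log to base 2 of the denominator, and executeHEMethod() returns an integer and a fractional result as characters. Integer-only encryption schemes need real numbers in integer form. One common approach, assumed here rather than taken from distcomp's source, is fixed-point encoding: keep the integer part, scale the fractional part by 2^den_bits and round it, then send both as strings so that large values survive transport. The function names are hypothetical.

```r
# Hypothetical fixed-point encoding; an assumption about how den_bits might be used.
encodeFixed <- function(x, den_bits = 20) {
  intPart <- trunc(x)
  fracPart <- round((x - intPart) * 2^den_bits)  # fraction scaled by the denominator 2^den_bits
  list(int = format(intPart, scientific = FALSE, trim = TRUE),
       frac = format(fracPart, scientific = FALSE, trim = TRUE))
}

decodeFixed <- function(enc, den_bits = 20) {
  as.numeric(enc$int) + as.numeric(enc$frac) / 2^den_bits
}

enc <- encodeFixed(3.75)
enc$frac                        # "786432", i.e. 0.75 * 2^20
decodeFixed(enc)                # 3.75
decodeFixed(encodeFixed(-2.5))  # -2.5
```

Values that are not exact binary fractions are rounded to the nearest multiple of 2^-den_bits, so larger den_bits trades integer range for precision.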
Make a worker object given a definition and data
Description
The function makeWorker returns an object of the appropriate type based on a computation definition and sets the data for the object. The types of objects that can be created depend upon the available computations.
Usage
makeWorker(defn, data, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
Arguments
defn |
the computation definition |
data |
the data for the computation |
pubkey_bits |
the number of bits for the public key (used only
if |
pubkey_n |
the |
den_bits |
the number of bits for the denominator (used only
if |
Value
a worker object of the appropriate class based on the definition
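Both makeMaster() and makeWorker() pick the kind of object to build from the definition. A minimal base-R sketch of that type dispatch follows. The type tags sumQuery and countQuery, the compute method, and all function names are hypothetical and do not correspond to distcomp's actual computation classes.

```r
# Hypothetical sketch: choose a worker constructor from a definition's type tag.
makeSumWorker <- function(data) {
  list(compute = function(col) sum(data[[col]]))
}
makeCountWorker <- function(data) {
  list(compute = function(...) nrow(data))
}

makeWorkerSketch <- function(defn, data) {
  switch(defn$compType,
         sumQuery = makeSumWorker(data),
         countQuery = makeCountWorker(data),
         stop("Unknown computation type: ", defn$compType))
}

w <- makeWorkerSketch(list(compType = "sumQuery"), data.frame(age = c(30, 40, 50)))
w$compute("age")   # 120
```

Keeping the dispatch in one place is what lets new computation types be added without changing callers.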
Clear the contents of the global store
Description
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function clears the store, except for the working directory.
Usage
resetComputationInfo()
Value
an empty list
See Also
setComputationInfo(), getComputationInfo()
Run a specified distcomp web application
Description
Web applications can define a computation or set up worker sites or masters. This function invokes the appropriate web application depending on the task.
Usage
runDistcompApp(appType = c("definition", "setupWorker", "setupMaster"))
Arguments
appType |
one of three values: "definition", "setupWorker", "setupMaster" |
Value
the results of running the web application
See Also
defineNewComputation(), setupWorker(), setupMaster()
Save a computation instance, given the computation definition, associated data and possibly a data file name to use
Description
The function saveNewComputation uses the computation definition to save a new computation instance. This is typically done for every site that wants to participate in a computation with its own local data. The function examines the computation definition and uses the identifier therein to uniquely refer to the computation instance at the site. This function is invoked (possibly remotely) on the opencpu server by uploadNewComputation() when a worker site is being set up.
Usage
saveNewComputation(defn, data, dataFileName = NULL)
Arguments
defn |
an already defined computation |
data |
the (local) data to use |
dataFileName |
a file name to use for saving the data. Typically |
Value
TRUE if everything goes well
Save an NCP instance, given the sites as associated data and possibly a data file name to use
Description
The function saveNewNCP uses the NCP definition and the list of sites to save a new NCP instance. This is typically done for every pair of NCPs used in a computation. The function examines the computation definition and uses the identifier therein to uniquely refer to the computation instance at the site. This function is invoked (possibly remotely) on the opencpu server by uploadNewNCP() when an NCP is being set up.
Usage
saveNewNCP(defn, comp_defn, data, dataFileName = NULL)
Arguments
defn |
a definition of the ncp |
comp_defn |
the computation definition |
data |
the list of sites with name and url to use |
dataFileName |
a file name to use for saving the
data. Typically |
Value
TRUE if everything goes well
Set a name to a value in a global variable
Description
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function sets a name to a value
Usage
setComputationInfo(name, value)
Arguments
name |
the name for the object |
value |
the value for the object |
Value
invisibly returns all the name-value pairs
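The global store used by getComputationInfo(), setComputationInfo(), and resetComputationInfo() is described as essentially a hash table. In R, an environment is a natural hash table, so the sketch below uses one. All names are hypothetical, including the key "workingDir", which stands in for whatever entry the real reset function preserves.

```r
# Hypothetical sketch of a global store backed by an environment (R's hash table).
.store <- new.env(hash = TRUE)

setInfoSketch <- function(name, value) {
  assign(name, value, envir = .store)
  invisible(as.list(.store))            # all name-value pairs, returned invisibly
}

getInfoSketch <- function(name) {
  if (exists(name, envir = .store, inherits = FALSE)) get(name, envir = .store) else NULL
}

resetInfoSketch <- function() {
  # Remove everything except the working directory entry ("workingDir" is hypothetical)
  drop <- setdiff(ls(.store, all.names = TRUE), "workingDir")
  rm(list = drop, envir = .store)
  invisible(list())
}

setInfoSketch("workingDir", tempdir())
setInfoSketch("compType", "hypotheticalType")
getInfoSketch("compType")    # "hypotheticalType"
resetInfoSketch()
getInfoSketch("compType")    # NULL
getInfoSketch("workingDir")  # still set
```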
Setup a computation master
Description
This function just calls runDistcompApp() with the parameter "setupMaster".
Usage
setupMaster()
Value
the results of running the web application
Setup a worker site
Description
This function just calls runDistcompApp() with the parameter "setupWorker".
Usage
setupWorker()
Value
the results of running the web application
Upload a new computation and data to an opencpu server
Description
The function uploadNewComputation is really a remote version of saveNewComputation(), invoking that function on an opencpu server. This is typically done for every site that wants to participate in a computation with its own local data. Note that a site is always a list with at least a unique name element (distinguishing the site from others) and a url element.
Usage
uploadNewComputation(site, defn, data)
Arguments
site |
a list of two items, a unique |
defn |
the identifier of an already defined computation |
data |
the (local) data to use |
Value
TRUE if everything goes well
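A remote call such as uploadNewComputation() relies on opencpu's HTTP API, which exposes package functions at paths of the form /ocpu/library/<package>/R/<function>; appending /json asks for the result as JSON. The sketch below assumes that convention. Its helper names are hypothetical, distcomp's internal URL helper and argument serialization may differ, and the localhost port is only an example. The upload function is defined but not called, since it needs httr and a running server.

```r
# Hypothetical helpers; distcomp's internals may build URLs and encode arguments differently.
makeOpencpuUrlSketch <- function(urlPrefix, fname) {
  paste(sub("/+$", "", urlPrefix), "ocpu", "library", "distcomp", "R", fname, "json",
        sep = "/")
}

uploadSketch <- function(site, defn, data) {
  # Requires the httr package and a reachable opencpu server at site$url
  url <- makeOpencpuUrlSketch(site$url, "saveNewComputation")
  resp <- httr::POST(url, body = list(defn = defn, data = data), encode = "json")
  httr::stop_for_status(resp)
  httr::content(resp)
}

site <- list(name = "siteA", url = "http://localhost:5656")  # example local server
makeOpencpuUrlSketch(site$url, "saveNewComputation")
# "http://localhost:5656/ocpu/library/distcomp/R/saveNewComputation/json"
```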
Upload new Non-Cooperating Party (NCP) information and sites to an opencpu server
Description
The function uploadNewNCP is really a remote version of saveNewNCP(), invoking that function on an opencpu server. This is typically done for the two NCPs participating in a computation with the list of sites. Note that sites are always a list with at least a unique name element (distinguishing the site from others) and a url element.
Usage
uploadNewNCP(defn, comp_defn, url = NULL, worker = NULL, sites)
Arguments
defn |
a definition for the NCP |
comp_defn |
the computation definition |
url |
the url for the NCP. Only one of url and worker can be non-null |
worker |
the worker for the NCP if local. Only one of url and worker can be non-null |
sites |
a list of lists, each containing two items, a unique
|
Value
TRUE if everything goes well
Write the code necessary to run a master process
Description
Once a computation is defined and the worker sites are set up, this function writes the master process code. The current implementation does not allow mixing localhost URLs with non-localhost URLs.
Usage
writeCode(defn, sites, outputFilenamePrefix)
Arguments
defn |
the computation definition |
sites |
a named list of site URLs participating in the computation |
outputFilenamePrefix |
the name of the output file prefix using which code and data will be written |
Value
the value TRUE if all goes well
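Generating a master script amounts to saving the definition and writing R code that reloads it along with the site list. The sketch below is hypothetical: writeCodeSketch is not distcomp's function, and the script distcomp writes will differ. It deliberately leaves the master-side calls as a comment rather than guess at distcomp's master API. It also enforces the stated restriction against mixing localhost and non-localhost URLs.

```r
# Hypothetical sketch of master-code generation; not the script distcomp writes.
writeCodeSketch <- function(defn, sites, outputFilenamePrefix) {
  urls <- unlist(sites)
  isLocal <- grepl("^https?://(localhost|127\\.0\\.0\\.1)", urls)
  if (any(isLocal) && !all(isLocal)) stop("Cannot mix localhost and non-localhost URLs")
  defnFile <- paste0(outputFilenamePrefix, ".rds")
  codeFile <- paste0(outputFilenamePrefix, ".R")
  saveRDS(defn, defnFile)                       # the generated script reloads this
  code <- c(
    "library(distcomp)",
    sprintf("defn <- readRDS(%s)", deparse(defnFile)),
    sprintf("sites <- %s", paste(deparse(sites), collapse = " ")),
    "# ... create a master object from defn, add each site, and run the computation ..."
  )
  writeLines(code, codeFile)
  invisible(TRUE)
}

writeCodeSketch(defn = list(compType = "hypotheticalType"),
                sites = list(siteA = "http://localhost:5656", siteB = "http://localhost:5657"),
                outputFilenamePrefix = file.path(tempdir(), "master"))
```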