This is a hotfix release to fix a bug in import().

- Fixed a bug affecting AppUsage, Device, and Location that may have prevented files from being imported.

mpathsenser now supports the new data format as of m-Path Sense 4.2.6. This comes with a large number of changes. Most importantly, import() had to be updated to handle the new data format. Both the old and the new data format are now supported by this package. With the new data format come some changes to the database.
First, some fields have been removed:

- The x, y, and z fields from Accelerometer have been removed from import() and all subsequent functions. These fields were only used when m-Path Sense still collected continuous data, and for some time now only summary data has been collected. No continuous data has ever been collected outside of pilot testing, hence these fields have been removed.
- The x_mean, y_mean, z_mean, x_mean_sq, y_mean_sq, z_mean_sq, and n fields from Gyroscope have been removed as m-Path Sense will currently collect continuous data. These fields were implemented in anticipation of this change but instead, for now, gyroscopic information has been removed from the app altogether. Thus, these fields are removed for simplicity and clarity.
- The timezone field has been removed from all sensor tables. This field was once added in m-Path Sense but never made it to the final version. It has been removed from the database and all subsequent functions.

Second, some fields have been added:
- Accelerometer has gained many new data fields:
  - end_time is the time at which the sample of the data ended, where time denotes the start time.
  - n, the number of samples, was already present but has been moved in the ordering of the fields.
  - x_mean, y_mean, and z_mean are the mean values of the accelerometer data. These were already present in the data and remain unchanged.
  - x_median, y_median, and z_median are the median values of the accelerometer data.
  - x_std, y_std, and z_std are the standard deviations of the accelerometer data.
  - x_aad, y_aad, and z_aad are the average absolute deviations of the accelerometer data.
  - x_min, y_min, and z_min are the minimum values of the accelerometer data.
  - x_max, y_max, and z_max are the maximum values of the accelerometer data.
  - x_max_min_diff, y_max_min_diff, and z_max_min_diff are the differences between the maximum and minimum values of the accelerometer data.
  - x_mad, y_mad, and z_mad are the median absolute deviations of the accelerometer data.
  - x_iqr, y_iqr, and z_iqr are the interquartile ranges of the accelerometer data.
  - x_neg_n, y_neg_n, and z_neg_n are the numbers of negative values of the accelerometer data.
  - x_pos_n, y_pos_n, and z_pos_n are the numbers of positive values of the accelerometer data.
  - x_above_mean, y_above_mean, and z_above_mean are the numbers of values above the mean of the accelerometer data.
  - x_energy, y_energy, and z_energy are similar to x_mean_sq, y_mean_sq, and z_mean_sq, being the average sum of squares.
  - avg_res_acc is the average resultant acceleration, being the average of the square roots of the sum of the squared values of the three axes.
  - sma is the signal magnitude area, being the sum of the absolute values of the three axes averaged over a window.
- AppUsage
table has gained 3 new fields:
  - end_time is the time at which the sample of the data ended, where time denotes the start time. Note that this timestamp may vary slightly from the end field in the data.
  - package_name is the full application package name.
  - last_foreground is the time at which the application was last in the foreground. If the app had not yet been in the foreground, this is NA.
- The Bluetooth table has gained 2 new fields:
  - start_scan is the time at which the scan started.
  - end_scan is the time at which the scan ended.
- The Device table has gained 2 new fields:
  - operating_system_version is the version of the operating system.
  - sdk is the version of the Android SDK or the iOS kernel.
- Heartbeat has been added to the data. This table has the following fields:
  - measurement_id, participant_id, date, and time like every other sensor.
  - period denotes the time period over which a heartbeat should be registered, in minutes.
  - device_type denotes the type of device of this heartbeat.
  - device_role_name is the role name of the device in the protocol.
- The Light table has gained 1 new field:
  - end_time is the time at which the sample of the data ended, where time denotes the start time.
- Location has gained 3 new fields:
  - vertical_accuracy is the estimated vertical accuracy of this location, in meters.
  - heading_accuracy is the estimated bearing accuracy of this location, in degrees. Only available on Android.
  - is_mock is a boolean indicating whether this location was mocked or not. Always FALSE on iOS. Moreover, because SQLite does not support booleans, this is stored as an integer.
- The Noise table has gained 1 new field:
  - end_time is the time at which the sample of the data ended, where time denotes the start time.
- Timezone has been added as a separate sensor. This table has the following fields:
  - measurement_id, participant_id, date, and time like every other sensor.
  - timezone is the time zone of the device at the time of the measurement.

Data collected with previous versions of m-Path Sense (henceforth referred to as legacy data) can still be read by import() and subsequent functions, but all new fields will have missing values.
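As a rough illustration of how the Accelerometer summary fields listed above relate to raw samples, the base R sketch below computes a few of them for the x axis, plus the two combined measures, from a made-up window. The sample values and the helper name summarise_window() are hypothetical, and m-Path Sense may use slightly different estimators (for example for the standard deviation or the quantiles), so this is one plausible reading of the definitions rather than the app's actual computation.

```r
# Hypothetical helper: summarise one window of raw accelerometer samples.
# The formulas follow the field descriptions above; they are assumptions,
# not the code used by m-Path Sense.
summarise_window <- function(x, y, z) {
  list(
    n = length(x),
    x_mean = mean(x),
    x_median = median(x),
    x_std = sd(x),
    x_aad = mean(abs(x - mean(x))),      # average absolute deviation
    x_max_min_diff = max(x) - min(x),
    x_mad = mad(x, constant = 1),        # median absolute deviation, unscaled
    x_iqr = IQR(x),
    x_neg_n = sum(x < 0),
    x_pos_n = sum(x > 0),
    x_above_mean = sum(x > mean(x)),
    x_energy = mean(x^2),                # average of the squared values
    avg_res_acc = mean(sqrt(x^2 + y^2 + z^2)),
    sma = mean(abs(x) + abs(y) + abs(z))
  )
}

# Made-up samples for a single window.
x <- c(-1, 0, 1, 2)
y <- c(0, 0, 3, 0)
z <- c(0, 2, 0, 0)
res <- summarise_window(x, y, z)
str(res)
```

Note that mad() in base R scales its result by 1.4826 by default; constant = 1 is used here so that the value is the plain median absolute deviation.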
- mpathsenser::sensors now holds 27 sensors, having been updated with Heartbeat and Timezone.
- coverage(relative = FALSE) now shows correct colours. The colours are now based on the relative values within each sensor, such that the highest sample is fully red and zero is fully blue.
- vacuum_db() is a newly exported function within this package. Once called upon a database, it shrinks the database to its minimal size by cleaning up remnants from import().
- The magrittr package has been dropped as a dependency, favouring R’s native pipe |> over the magrittr pipe %>%.
- Added the format argument to geocode_rev() to allow for different output formats from Nominatim’s API.
- geocode_rev() and app_category() now return NA if the client or API is offline, as per CRAN guidelines.
- Fixed an issue in fix_jsons() where files with illegal ASCII characters could not be fixed because the file was still locked from reading.
- Fixed an issue in fix_jsons() where JSON files could incorrectly end with }}, followed by a closing bracket ] on a new line. This trailing comma is now removed by fix_jsons().
- With recursive = TRUE in unzip_data() and to = NULL, the output path of the JSON files will be the local directories through which the recursive path is traversed rather than the main directory.

This is a release with breaking changes due to removal of deprecated arguments. Please review carefully before updating.
This release also supports changes from the new release of m-Path Sense (01/02/2023). Most notably, the accelerometer and gyroscope are no longer samples of a continuous stream, but rather summaries of these streams. Old versions are still supported by all functions.
- x_mean: The average acceleration or gyroscopic value along the x axis within a sample;
- y_mean: The average acceleration or gyroscopic value along the y axis within a sample;
- z_mean: The average acceleration or gyroscopic value along the z axis within a sample;
- x_mean_sq: The mean of the squared x values within the sample;
- y_mean_sq: The mean of the squared y values within the sample;
- z_mean_sq: The mean of the squared z values within the sample.

From these values, one could calculate the L1 norm and L2 norm like before.

- Added timezone to all sensor data. Confusingly, this is not the timezone of the data itself (as this is always in UTC), but rather the timezone the participant was in at the time of the measurement.
- Removed the parallel argument in fix_jsons(), test_jsons(), unzip_data(), and import().
- Removed the overwrite_db and dbname arguments from import().
- Removed the path and db_name arguments from copy_db().
- link() no longer adds an extra row before (if add_before = TRUE) or after (if add_after = TRUE) if the first or last measurement equals the start or end time respectively.
- Changed link_db() lifecycle status to deprecated, as link_db() depends on link(). Eventually, link() might see changes in its functionality that will cause link_db() to break, so it is better to deprecate it already to motivate users to stop using this function.
- Fixed a bug where bin_data() incorrectly handled days occurring after a DST change.
- link()
gained 4 new arguments:
  - time: The name of the column containing the timestamps in x.
  - end_time: Optionally, the name of the column containing the end time in x.
  - y_time: The name of the column containing the timestamps in y.
  - name: The name of the nested y data, defaulting to "data".
- With end_time, it is now possible to specify custom time intervals instead of only fixed intervals through offset_before or offset_after. Note that these two functionalities cannot be specified at the same time.
- time and y_time in link() must now be explicitly named, though for the time being they default to ‘time’ with a warning.
- Added a continue argument to add_gaps() that controls whether the last measurement(s) should be continued after a gap.
- link_db() is now soft deprecated as it provides only marginal added functionality compared to link().
- decrypt_gps() now takes a vector of encrypted GPS coordinates instead of a whole data frame with fixed variable names (latitude and longitude). This allows more flexibility in its use. Also, parallelisation has been added similar to other functions in this package (i.e. by setting a future plan, e.g. future::plan("multisession")).

The following functions are now defunct and internal:

- activity_duration()
- app_usage()
- n_screen_on()
- n_screen_unlocks()
- screen_duration()
- step_count()

These functions delivered incorrect output and only allowed summaries by a fixed time frame, e.g. by hour or day. These functions will be reimplemented (some with a different name) in mpathsenser 2.0.0.
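For the accelerometer and gyroscope summary fields introduced in this release (x_mean, x_mean_sq, and so on), the base R sketch below shows a few quantities that can be derived from the summaries alone. The raw values are made up purely to check the identities; only the field names come from the notes above. The average of the per-measurement L2 norms cannot be recovered exactly from these summaries, so the sketch computes the root mean square of the L2 norm and the norms of the mean vector instead.

```r
# Made-up raw samples within one window (not real m-Path Sense data).
x <- c(1, 2, 3, 4)
y <- c(0, 1, 0, 1)
z <- c(2, 2, 2, 2)

# Summary fields as described above.
x_mean <- mean(x)
y_mean <- mean(y)
z_mean <- mean(z)
x_mean_sq <- mean(x^2)
y_mean_sq <- mean(y^2)
z_mean_sq <- mean(z^2)

# Population variance of x within the sample: E[x^2] - (E[x])^2.
x_var <- x_mean_sq - x_mean^2

# Root mean square of the L2 norm, recovered from the summaries alone.
rms_l2 <- sqrt(x_mean_sq + y_mean_sq + z_mean_sq)

# L2 and L1 norm of the mean vector.
l2_of_means <- sqrt(x_mean^2 + y_mean^2 + z_mean^2)
l1_of_means <- abs(x_mean) + abs(y_mean) + abs(z_mean)
```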
- If add_before or add_after is TRUE in link(), no extra row is added if there already is a row with a timestamp exactly equal to the start of the interval (for add_before = TRUE) or to the end of the interval (for add_after = TRUE).
- moving_average() now allows a lazy tibble to allow further computations in-database after having called moving_average().
- identify_gaps() is now slightly more efficient.
- get_data() is now case insensitive. In a future update, all sensor names throughout all functions will be made case insensitive.
- When add_before = TRUE, link() no longer adds an extra measurement if the first measurement in the interval equals the start time of the interval exactly.
- get_data() now allows multiple participant_ids to be used.
- external_time has been added as an argument to link_db(), to be able to specify the time column in external_data in accordance with the change in link() above.
- link() now correctly handles natural joins (when by = NULL) and cross joins (when by = character()).
- Fixed a bug where original_time was not added for any nested data row except the first one if add_before or add_after was TRUE.
- link() no longer suffers from future’s max object restriction (500MB by default).
- When x and y use different time zones in link() and add_before = TRUE, link() now correctly leaves all time zones equal to the input.
- Fixed a bug where link() incorrectly assigned the time zone of x to the nested data of y if add_before or add_after was TRUE. This is now changed to the time zone of y, to ensure consistency. Note that if the time zones of x and y are different, matching will be correct but the nested data may seem off as it will keep y’s input time zone.
- identify_gaps()
now allows multiple sensors to be used. This is particularly useful when there are no sensors with high frequency sampling (like accelerometer and gyroscope) or to ensure there can be no measurements within the gaps from any sensor.
- In copy_db(), from_db and to_db have been renamed to source_db and target_db respectively.
- Moved activity_duration(), screen_duration(), n_screen_on(), n_screen_unlocks(), and step_count() to internal until it is clear how these functions should behave and, more importantly, what their output should be.
- Fixed moving_average() to work correctly on multiple participants.
- Decoupled create_db() and the other functions, where the latter implicitly depended on the former. The following arguments are thereby rendered disabled:
  - dbname and overwrite_db arguments in import()
  - path and db_name in copy_db()
- Deprecated the parallel argument in several functions. If you wish to process in parallel, you must now specify this beforehand using a future plan, e.g. future::plan("multisession"). As a consequence, the package future is no longer a dependency (but furrr is).
- Deprecated the plot
argument in coverage(). To plot a coverage chart, you can now use the default plot() function with the output from coverage().
- Errors, warnings, and messages now use rlang::abort, rlang::warn, and rlang::inform.
- Rewrote import() to be more manageable in code. As a consequence, the dependency on rjson and dbx can be dropped in favour of jsonlite and native SQL.
- Added lifecycle as a dependency for deprecating arguments.
- Updated identify_gaps() and friends to inform the user of a possible inconsistency when identifying gaps.
- Changed identify_gaps() from using the lag of each measurement towards using the lead. This makes no difference in the output but is a little easier to read.
- Fixed a message shown when calling link() or link_gaps() in a session, stating that using external vectors in dplyr::select() is ambiguous.
- bin_data() now correctly includes measurements in bins that do not have a stop time. This was in particular a problem with the last measurement of a series.
- bin_data().
- Fixed an issue in add_gaps() where multiple gaps in succession (i.e. without other data in between) were incorrectly handled.
- Fixed app_category() not being able to find the exact app name in the search results, thereby defaulting to the nth result (default 1).
- link_gaps(): For linking gap data to other data, i.e. how many gaps occur within an interval.
- add_gaps(): To interleave gaps with other data.
- bin_data(): To subdivide data into bins, e.g. all measurements within an hour or day.
- link()
has been revised and expanded:
  - Replaced offset with offset_before and offset_after, allowing both to be specified at the same time (#3).
  - Added add_before and add_after arguments to allow the last row before the measurement and the first row after the measurement respectively to be added to the data.
  - Added a split argument, allowing computation to be split among many parts thereby lowering computational burden.
- app_category() is now case insensitive and gained the new argument exact to be able to match the package name exactly based on a partial match.
- Renamed get_activity() to activity_duration().
- Renamed link2() to link_db().
- Fixed a bug where link() runs out of memory when there are too many matches (#2). link() is now much more memory efficient and slightly faster.
- Fixed a bug in get_data() which allowed multiple sensors to be requested from one function call, sometimes leading to crashes (#4).
- Fixed a bug in link() where the column original_time is missing if no records before or after the interval are found (#6).
- Fixed a bug in import() where sensor data not present in the first file of the batch are dropped for the other files as well.
- Updated app_category() to work with the updated Google Play website.
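The offset_before and offset_after arguments of link() mentioned above define a window around each timestamp in x within which rows of y are matched. The base R sketch below illustrates that idea on toy data. It is not the package's implementation: the function name link_window(), the column names, and the inclusive window bounds are assumptions made for illustration, and the real link() additionally nests the matches and supports add_before, add_after, and split.

```r
# Hypothetical helper, for illustration only: for each timestamp in x, return
# the rows of y whose time falls within [time - before, time + after], with
# both offsets given in seconds.
link_window <- function(x, y, before = 0, after = 0) {
  lapply(seq_len(nrow(x)), function(i) {
    t <- x$time[i]
    y[y$time >= t - before & y$time <= t + after, , drop = FALSE]
  })
}

# Toy data: two timestamps in x and five measurements in y.
origin <- as.POSIXct("2021-11-14 12:00:00", tz = "UTC")
x <- data.frame(time = origin + c(0, 3600), item = c("a", "b"))
y <- data.frame(time = origin + c(-1200, -600, 900, 3000, 5400), steps = 1:5)

# Match y rows from 15 minutes before up to 20 minutes after each x timestamp.
matches <- link_window(x, y, before = 900, after = 1200)
```

Specifying both offsets at once gives an asymmetric window, which is what replacing the single offset argument with offset_before and offset_after makes possible.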