Type: Package
Title: Chinese Numerals Processing
Version: 0.1.5
Maintainer: Elgar Teo <elgarteo@connect.hku.hk>
URL: https://github.com/elgarteo/cnum/
BugReports: https://github.com/elgarteo/cnum/issues
Description: Chinese numerals processing in R, such as conversion between Chinese numerals and Arabic numerals as well as detection and extraction of Chinese numerals in character objects and strings. This package supports the casual scale naming system and the respective SI prefix systems used in mainland China and Taiwan: "The State Council's Order on the Unified Implementation of Legal Measurement Units in Our Country", The State Council of the People's Republic of China (1984); "Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples", Ministry of Economic Affairs (2019), https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965.
License: MIT + file LICENSE
Encoding: UTF-8
Depends: R (≥ 2.10)
Imports: stringr, Rcpp
Suggests: magrittr
LinkingTo: Rcpp, BH
RoxygenNote: 7.3.2
NeedsCompilation: yes
Packaged: 2025-01-11 17:28:13 UTC; Elgar
Author: Elgar Teo [aut, cre]
Repository: CRAN
Date/Publication: 2025-01-11 22:00:02 UTC

cnum: Working with Chinese Numerals

Description

This R package provides functions for working with Chinese numerals in R, such as conversion between Chinese numerals and Arabic numerals as well as detection and extraction of Chinese numerals in character objects and strings.

Warnings

This package supports conversion of numbers whose absolute value does not exceed 1e+18. Note that R stores numbers in double precision, which carries approximately 16 significant digits. Conversion accuracy is therefore not guaranteed for numbers with more significant digits than that.
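
The precision limit in the warning above can be checked in base R without loading cnum. The following is a minimal sketch; the variable names are illustrative only and nothing here calls the package.

```r
# Doubles carry a 53-bit significand, so whole numbers are exact only up to 2^53.
significand_bits <- .Machine$double.digits         # 53 on IEEE 754 platforms
exact_limit <- 2^53                                # roughly 9e+15

# Above that point, adding 1 can be lost to rounding.
lost_at_limit <- (exact_limit + 1) == exact_limit  # TRUE
lost_at_1e18 <- (1e18 + 1) == 1e18                 # TRUE

print(c(lost_at_limit, lost_at_1e18))
```

Both comparisons are TRUE, which is why a value close to 1e+18 may not convert digit for digit even though it is within the supported range.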

Note

Due to a technical limitation, R package documentation cannot contain any non-ASCII characters. Chinese characters are therefore represented in romanized Chinese (pinyin) in the documentation. Visit the GitHub page for examples in Chinese.

Author(s)

Elgar Teo (elgarteo@connect.hku.hk)

See Also

GitHub page: https://github.com/elgarteo/cnum


Chinese Numerals Conversion

Description

Functions to convert between Chinese and Arabic numerals.

Usage

c2num(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  literal = FALSE
)

num2c(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  literal = FALSE,
  single = FALSE
)

Arguments

x

the Arabic/Chinese numerals to be converted, or a vector of them. The absolute value must not be greater than 1e+18.

lang

the language of the Chinese numerals. "tc" for Traditional Chinese. "sc" for Simplified Chinese. The default is "tc", but this can be changed by setting options(cnum.lang = "sc").

mode

the scale naming system to be enforced. See the ‘Details’ section for the list of supported modes.

financial

logical: should the financial numerals be used (daxie shuzi)?

literal

logical: should the numerals be converted literally? (e.g. 721 to be converted to "qi er yi" instead of "qibai ershiyi" and vice versa)

single

logical: should the result be expressed with a single scale character only? (e.g. 1.5e+08 as "yi dian wuyi" instead of "yiyi wuqianwan")
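
To make the literal argument concrete, here is a minimal base R sketch of digit-by-digit spelling. It is not the package's implementation: the helper literal_pinyin and the lookup vector digit_pinyin are hypothetical names, and pinyin stands in for the Chinese characters as elsewhere in this documentation.

```r
# Pinyin stand-ins for the digit characters 0 to 9 (illustrative only).
digit_pinyin <- c("ling", "yi", "er", "san", "si", "wu", "liu", "qi", "ba", "jiu")

# Hypothetical helper, not part of cnum: spell a non-negative whole number
# one digit at a time, ignoring place value.
literal_pinyin <- function(n) {
  digits <- as.integer(strsplit(format(n, scientific = FALSE), "")[[1]])
  paste(digit_pinyin[digits + 1], collapse = " ")
}

literal_pinyin(721)  # "qi er yi"
```

A non-literal conversion would instead attach scale words to each digit, giving "qibai ershiyi" for the same input.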

Value

c2num returns a numeric vector.

num2c returns a character vector.

Functions

Details

The following scale naming systems are supported: "casual", "casualPRC", "SIprefix" and "SIprefixPRC".

Warnings

The modes "casual" and "casualPRC" implement a “myriad scale” with an interval of 1e+04 for large numbers, i.e. "yi" is 10,000 times "wan". This differs from some of the interval systems used in ancient Chinese writings.
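
The arithmetic behind the myriad scale can be sketched in base R. This is only an illustration under the assumption that "wan" is 1e+04 and "yi" is 1e+08, as stated above; the helper myriad_groups is hypothetical and is not how cnum is implemented.

```r
# Hypothetical helper, not part of cnum: split a non-negative whole number
# below 1e+12 into its "yi" (1e+08), "wan" (1e+04) and units groups.
myriad_groups <- function(n) {
  stopifnot(n >= 0, n < 1e12, n == floor(n))
  c(yi = n %/% 1e8, wan = (n %% 1e8) %/% 1e4, units = n %% 1e4)
}

groups <- myriad_groups(1.5e8)
print(groups)            # yi = 1, wan = 5000, units = 0

# With a single scale word, the same value is a decimal multiple of "yi".
single_yi <- 1.5e8 / 1e8
print(single_yi)         # 1.5
```

The grouped form corresponds to "yiyi wuqianwan" (one yi, five thousand wan) and the single-scale form to "yi dian wuyi" (1.5 yi), matching the example given for the single argument.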

This package supports conversion of numbers whose absolute value does not exceed 1e+18. Note that R stores numbers in double precision, which carries approximately 16 significant digits. Conversion accuracy is therefore not guaranteed for numbers with more significant digits than that.

References

The standard for mode "SIprefix", "Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples", is available from https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965 (in Traditional Chinese).

The standard for mode "SIprefixPRC", "The State Council's Order on the Unified Implementation of Legal Measurement Units in Our Country", is available from the PRC State Council's website (in Simplified Chinese).

See Also

Functions for detection and extraction

Examples

c2num("EXAMPLE CHECK")

num2c(721)
num2c(-6)
num2c(3.14)
num2c(721, literal = TRUE)
num2c(1.45e4, financial = TRUE)
num2c(6.85e4, lang = "sc", mode = "casualPRC")
num2c(1.5e4, mode = "SIprefix", single = TRUE)


Default Language for cnum

Description

Function to check the default language for cnum functions.

Usage

default_cnum_lang()

Details

This package supports Traditional Chinese and Simplified Chinese. The language can be specified with the lang parameter in every function, with "tc" for Traditional Chinese and "sc" for Simplified Chinese. The default is "tc", but this can be changed by setting options(cnum.lang = "sc").
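
The option-based default described above follows a common R pattern, sketched here in base R. The function lang_default_sketch is a hypothetical stand-in; the assumption that default_cnum_lang() reads the cnum.lang option with a fallback of "tc" is inferred from the text, not taken from the package source.

```r
# Hypothetical stand-in, not the package source: read the option and
# fall back to "tc" when it is unset.
lang_default_sketch <- function() getOption("cnum.lang", default = "tc")

old <- options(cnum.lang = NULL)  # clear any existing setting, remembering it
before <- lang_default_sketch()   # "tc"

options(cnum.lang = "sc")
after <- lang_default_sketch()    # "sc"

options(old)                      # restore the previous setting
print(c(before, after))
```

Saving the return value of options() and passing it back afterwards keeps the sketch from changing the session's settings.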

Value

The default language for cnum functions.

See Also

Examples


# Set the default language to Simplified Chinese
options(cnum.lang = "sc")

default_cnum_lang()


Chinese Numerals Detection and Extraction

Description

Functions to detect and extract Chinese numerals in character objects and strings.

Usage

is_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  literal = FALSE,
  strict = FALSE,
  ...
)

has_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  ...
)

extract_cnum(
  x,
  lang = default_cnum_lang(),
  mode = "casual",
  financial = FALSE,
  prefix = NULL,
  suffix = NULL,
  ...
)

Arguments

x

the character object or string to be tested or to extract from.

lang

the language of the Chinese numerals. "tc" for Traditional Chinese. "sc" for Simplified Chinese. The default is "tc", but this can be changed by setting options(cnum.lang = "sc").

mode

the scale naming system to be enforced. See the ‘Details’ section for the list of supported modes.

financial

logical: should the financial numerals be used (daxie shuzi)?

literal

logical: should the numerals be converted literally? (e.g. 721 to be converted to "qi er yi" instead of "qibai ershiyi" and vice versa)

strict

logical: should the Chinese numeral format be strictly enforced? A casual test only checks whether x contains Chinese numeral characters. A strict test checks whether x is a valid Chinese numeral. (e.g. "yi bai yi" will pass the casual test and fail the strict test)

...

optional arguments to be passed to grepl (for is_cnum and has_cnum) or str_extract_all (for extract_cnum). Disregarded when strict = TRUE.

prefix

the prefix of the Chinese numerals. Only numerals with the designated prefix are extracted. Supports regular expression(s).

suffix

the suffix of the Chinese numerals. Only numerals with the designated suffix are extracted. Supports regular expression(s).
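
One way to picture the prefix and suffix arguments is as regular expression lookarounds. The sketch below uses base R (regmatches and gregexpr) rather than stringr so that it runs without extra packages, and it matches pinyin syllables as stand-ins for Chinese numeral characters. The helper extract_sketch, the numeral pattern and the sample text are all hypothetical; whether extract_cnum builds its pattern this way, or excludes the prefix and suffix from its results, is an assumption.

```r
# Pinyin stand-ins for a few numeral characters (illustrative only).
numeral <- "(?:yi|er|san|si|wu|liu|qi|ba|jiu|shi|bai|qian|wan)+"

# Hypothetical helper, not part of cnum: keep only numerals that follow
# the given prefix and/or precede the given suffix.
extract_sketch <- function(x, prefix = NULL, suffix = NULL) {
  pattern <- numeral
  if (!is.null(prefix)) pattern <- paste0("(?<=", prefix, ")", pattern)
  if (!is.null(suffix)) pattern <- paste0(pattern, "(?=", suffix, ")")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

text <- "disanke wuyuan qiren"
by_suffix <- extract_sketch(text, suffix = "yuan")  # list("wu")
by_prefix <- extract_sketch(text, prefix = "di")    # list("san")
print(by_suffix)
print(by_prefix)
```

As with extract_cnum, the result is a list with one character vector per element of the input.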

Value

is_cnum returns a logical vector indicating, for each element of x, whether it is a Chinese numeral.

has_cnum returns a logical vector indicating, for each element of x, whether it contains Chinese numerals.

extract_cnum returns a list of character vectors containing the extracted Chinese numerals.

Functions

Details

The following scale naming systems are supported: "casual", "casualPRC", "SIprefix" and "SIprefixPRC".

References

The standard for mode "SIprefix", "Names, Definitions and Symbols of the Legal Units of Measurement and the Decimal Multiples and Submultiples", is available from https://gazette.nat.gov.tw/egFront/detail.do?metaid=108965 (in Traditional Chinese).

The standard for mode "SIprefixPRC", "The State Council's Order on the Unified Implementation of Legal Measurement Units in Our Country", is available from the PRC State Council's website (in Simplified Chinese).

See Also

Functions for conversion

Examples

is_cnum("yibai ershiyi")

has_cnum("yibai bashi yuan")

extract_cnum("shisiyi ren")