Type: | Package |
Title: | Read Text Out of Modern Office Files |
Version: | 0.2.2 |
Author: | Mark Ewing |
Maintainer: | Mark Ewing <b.mark@ewingsonline.com> |
Description: | Reads in text from 'unstructured' modern Microsoft Office files (XML based files) such as Word and PowerPoint. This does not read in structured data (from Excel or Access) as there are many other great packages to that do so already. |
License: | Unlimited |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | xml2 (≥ 1.0.0), rvest (≥ 0.3.2), purrr (≥ 0.2.2), magrittr (≥ 1.5) |
RoxygenNote: | 6.0.1 |
NeedsCompilation: | no |
Packaged: | 2017-03-02 16:30:51 UTC; u772700 |
Repository: | CRAN |
Date/Publication: | 2017-03-08 08:22:32 |
Read data from a Modern Word File
Description
Read data from a Modern Word File
Usage
read_docx(docx)
Arguments
docx |
The .docx file to read |
Details
Only accepts one file at a time and only .docx files. Modifying file extensions will not work.
Text is broken out into the XML defined paragraphs in the vector.
Value
Vector of document text
Examples
read_docx(docx = system.file('extdata','example.docx',package='readOffice'))
Read data from a Modern PowerPoint File
Description
Read data from a Modern PowerPoint File
Usage
read_pptx(pptx)
Arguments
pptx |
The .pptx file to read |
Details
Only accepts one file at a time and only .pptx files. Modifying file extensions will not work.
The returned list contains named lists of the elements on the slide, each element of which is either a data.frame or a matrix containing the text and minor details about the structure on the page.
Data frames will contain the text in addition to the following columns: "Bulleted" indicates if the text is part of a bulleted or numbered list on the slide. "Hierarchy" indicates the tabbed depth of the element in a bulleted or numbered list (NA if not bulleted).
Alternatively, returns a matrix for tables on the slide.
Value
List containing slide elements.
Examples
read_pptx(system.file('extdata','example.pptx',package='readOffice'))