|
1 | 1 | --- |
2 | | -title: 'Binary-parser: A blazing-fast declarative parser builder for binary data' |
| 2 | +title: 'Binary-parser: A declarative and efficient parser generator for binary data' |
3 | 3 | tags: |
4 | 4 | - JavaScript |
| 5 | + - TypeScript |
| 6 | + - binary |
| 7 | + - parser |
5 | 8 | authors: |
6 | 9 | - name: Keichi Takahashi |
7 | 10 | orcid: 0000-0002-1607-5694 |
8 | 11 | affiliation: 1 |
9 | 12 | affiliations: |
10 | 13 | - name: Nara Institute of Science and Technology |
11 | | -date: 21 September 2021 |
| 14 | + index: 1 |
| 15 | +date: 27 September 2021 |
12 | 16 | bibliography: paper.bib |
13 | 17 | --- |
14 | 18 |
|
15 | 19 | # Summary |
16 | 20 |
|
17 | | -The forces on stars, galaxies, and dark matter under external gravitational |
18 | | -fields lead to the dynamical evolution of structures in the universe. The orbits |
19 | | -of these bodies are therefore key to understanding the formation, history, and |
20 | | -future state of galaxies. The field of "galactic dynamics," which aims to model |
21 | | -the gravitating components of galaxies to study their structure and evolution, |
22 | | -is now well-established, commonly taught, and frequently used in astronomy. |
23 | | -Aside from toy problems and demonstrations, the majority of problems require |
24 | | -efficient numerical tools, many of which require the same base code (e.g., for |
25 | | -performing numerical orbit integration). |
| 21 | +This paper presents `binary-parser`, a JavaScript/TypeScript library that |
| 22 | +allows users to write high-performance binary parsers, and facilitates the |
| 23 | +rapid prototyping of research software that works with binary files and |
| 24 | +network protocols. `Binary-parser`'s declarative API is designed so that |
| 25 | +complex binary structures are straightforward to express. In addition to |
| 26 | +improving productivity, `binary-parser` uses meta-programming to |
| 27 | +dynamically generate parser code, achieving parsing performance equivalent |
| 28 | +to that of a hand-written parser. As of September 2021, `binary-parser` is |
| 29 | +used by over 700 GitHub repositories and 120 npm packages. |
26 | 30 |
|
27 | 31 | # Statement of need |
28 | 32 |
|
29 | | -`Gala` is an Astropy-affiliated Python package for galactic dynamics. Python |
30 | | -enables wrapping low-level languages (e.g., C) for speed without losing |
31 | | -flexibility or ease-of-use in the user-interface. The API for `Gala` was |
32 | | -designed to provide a class-based and user-friendly interface to fast (C or |
33 | | -Cython-optimized) implementations of common operations such as gravitational |
34 | | -potential and force evaluation, orbit integration, dynamical transformations, |
35 | | -and chaos indicators for nonlinear dynamics. `Gala` also relies heavily on and |
36 | | -interfaces well with the implementations of physical units and astronomical |
37 | | -coordinate systems in the `Astropy` package [@astropy] (`astropy.units` and |
38 | | -`astropy.coordinates`). |
39 | | - |
40 | | -`Gala` was designed to be used by both astronomical researchers and by |
41 | | -students in courses on gravitational dynamics or astronomy. It has already been |
42 | | -used in a number of scientific publications [@Pearson:2017] and has also been |
43 | | -used in graduate courses on Galactic dynamics to, e.g., provide interactive |
44 | | -visualizations of textbook material [@Binney:2008]. The combination of speed, |
45 | | -design, and support for Astropy functionality in `Gala` will enable exciting |
46 | | -scientific explorations of forthcoming data releases from the *Gaia* mission |
47 | | -[@gaia] by students and experts alike. |
48 | | - |
49 | | -# Mathematics |
50 | | - |
51 | | -Single dollars ($) are required for inline mathematics e.g. $f(x) = e^{\pi/x}$ |
52 | | - |
53 | | -Double dollars make self-standing equations: |
54 | | - |
55 | | -$$\Theta(x) = \left\{\begin{array}{l} |
56 | | -0\textrm{ if } x < 0\cr |
57 | | -1\textrm{ else} |
58 | | -\end{array}\right.$$ |
59 | | - |
60 | | -You can also use plain \LaTeX for equations |
61 | | -\begin{equation}\label{eq:fourier} |
62 | | -\hat f(\omega) = \int_{-\infty}^{\infty} f(x) e^{i\omega x} dx |
63 | | -\end{equation} |
64 | | -and refer to \autoref{eq:fourier} from text. |
65 | | - |
66 | | -# Citations |
67 | | - |
68 | | -Citations to entries in paper.bib should be in |
69 | | -[rMarkdown](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html) |
70 | | -format. |
71 | | - |
72 | | -If you want to cite a software repository URL (e.g. something on GitHub without a preferred |
73 | | -citation) then you can do it with the example BibTeX entry below for @fidgit. |
74 | | - |
75 | | -For a quick reference, the following citation commands can be used: |
76 | | -- `@author:2001` -> "Author et al. (2001)" |
77 | | -- `[@author:2001]` -> "(Author et al., 2001)" |
78 | | -- `[@author1:2001; @author2:2001]` -> "(Author1 et al., 2001; Author2 et al., 2002)" |
79 | | - |
80 | | -# Figures |
81 | | - |
82 | | -Figures can be included like this: |
83 | | - |
84 | | -and referenced from text using \autoref{fig:example}. |
85 | | - |
86 | | -Figure sizes can be customized by adding an optional second parameter: |
87 | | -{ width=20% } |
88 | | - |
89 | | -# Acknowledgements |
90 | | - |
91 | | -We acknowledge contributions from Brigitta Sipocz, Syrtis Major, and Semyeong |
92 | | -Oh, and support from Kathryn Johnston during the genesis of this project. |
| 33 | +Parsing binary data is a ubiquitous task in developing research software. Many |
| 34 | +scientific instruments and software tools use proprietary file formats and |
| 35 | +network protocols, while open-source libraries to work with them are often |
| 36 | +unavailable or limited. In such situations, the programmer has no choice but |
| 37 | +to write a binary parser. However, writing a binary parser by hand is |
| 38 | +error-prone and tedious because the programmer faces challenges such as |
| 39 | +understanding the specification of the binary format, correctly managing the |
| 40 | +byte/bit offsets during parsing, and constructing complex data structures as |
| 41 | +outputs. |
| 42 | + |
| 43 | +`Binary-parser` significantly reduces the programmer's effort by automatically |
| 44 | +generating efficient parser code from a declarative description of the binary |
| 45 | +format supplied by the user. The generated parser code is converted to a |
| 46 | +JavaScript function and executed for efficient parsing. To accommodate the |
| 47 | +diverse needs of its users, `binary-parser` exposes various options that |
| 48 | +provide flexibility and opportunities for customization. |
| 49 | + |
| 50 | +A large number of software packages have been developed using `binary-parser`, |
| 51 | +which demonstrates its usefulness and practicality. Examples include |
| 52 | +libraries and applications to work with rainfall radars [@nimrod], |
| 53 | +software-defined radio [@flexradio], GNSS receivers [@libsbp], smart meters |
| 54 | +[@linky], drones [@djiparsetxt], and thermostats [@maxcul]. |
| 55 | + |
| 56 | +# Design |
| 57 | + |
| 58 | +`Binary-parser`'s design is characterized by the following three key features: |
| 59 | + |
| 60 | +1. **Fast**: `Binary-parser` takes advantage of meta-programming to generate |
| 61 | +   JavaScript source code at runtime from the user's description of the |
| 62 | + target binary format. The generated source code is then passed to the |
| 63 | + `Function` constructor to dynamically create a function that performs |
| 64 | + parsing. This design enables `binary-parser` to achieve parsing |
| 65 | + performance comparable to a hand-written parser. |
| 66 | +2. **Declarative**: As opposed to parser combinator libraries [@monadic; @nom], |
| 67 | + `binary-parser` allows the user to express the target binary format in a |
| 68 | + declarative manner, similar to a human-readable network protocol or file |
| 69 | + format specification. The user can combine _primitive_ parsers (integers, |
| 70 | + floating point numbers, bit fields, strings and bytes) using _composite_ |
| 71 | + parsers (arrays, choices, nests and pointers) to express a wide variety of |
| 72 | + binary formats. |
| 73 | +3. **Flexible**: Unlike binary parser generators that use an external Domain |
| 74 | + Specific Language (DSL) [@kaitai; @nail], `binary-parser` uses an internal |
| 75 | + DSL implemented on top of JavaScript. This design allows the user to |
| 76 | + specify most parsing options as return values of user-defined JavaScript |
| 77 | + functions that are invoked at runtime. For example, the offset and length |
| 78 | + of a field can be computed from another field that has been parsed already. |
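
The code-generation approach described in the first item can be sketched in a few lines. The following is a simplified illustration only, not `binary-parser`'s actual implementation: the `READERS` table, the `compile` function, and the field description format are all hypothetical names introduced here. The sketch builds parser source code as a string from a list of field descriptions and turns it into a function with the `Function` constructor.

```javascript
// Hypothetical sketch of runtime parser generation (not binary-parser's real code).
// Maps an assumed field type name to a Node.js Buffer read method and its byte size.
const READERS = {
  uint8: { method: "readUInt8", size: 1 },
  int16le: { method: "readInt16LE", size: 2 },
  uint32le: { method: "readUInt32LE", size: 4 },
};

// Builds JavaScript source from a declarative field list, then compiles it.
function compile(fields) {
  let src = "let offset = 0;\nconst out = {};\n";
  for (const { name, type } of fields) {
    const { method, size } = READERS[type];
    // Each field becomes one straight-line read followed by an offset bump.
    src += `out[${JSON.stringify(name)}] = buf.${method}(offset);\n`;
    src += `offset += ${size};\n`;
  }
  src += "return out;";
  // The generated source becomes the body of a function taking `buf`.
  return new Function("buf", src);
}

// Declarative description of a made-up 7-byte header.
const parseHeader = compile([
  { name: "version", type: "uint8" },
  { name: "flags", type: "int16le" },
  { name: "length", type: "uint32le" },
]);

const sample = Buffer.from([0x01, 0xfe, 0xff, 0x10, 0x00, 0x00, 0x00]);
const header = parseHeader(sample);
console.log(header);
```

Because the generated function contains only direct `Buffer` reads with no interpretation of the field list at parse time, a JavaScript engine can optimize it much like code written by hand, which is the intuition behind the performance claim above.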
| 79 | + |
| 80 | +# Performance evaluation |
| 81 | + |
| 82 | +To evaluate the parsing performance of `binary-parser`, we implemented a small |
| 83 | +parser using `binary-parser` (v2.0.1) and three major JavaScript binary parser |
| 84 | +libraries: `binparse` (v1.2.1), `structron` (v0.4.3) and `destruct.js` (v0.2.9). |
| 85 | +We also implemented the same parser using Node.js's Buffer API as a baseline. |
| 86 | +The binary data to be parsed was an array of 1,000 coordinates (each expressed |
| 87 | +as three 16-bit integers) preceded by the number of coordinates (a 32-bit |
| 88 | +integer). The benchmarks were executed on a MacBook Air (Apple M1 CPU, 2020). |
| 89 | +The JavaScript runtime was Node.js (v16.9.1). |
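
For reference, a hand-written baseline for this data layout might look like the sketch below. It is not the benchmark code used in the paper: the function names, the little-endian byte order, and the signedness of the fields (unsigned count, signed coordinates) are assumptions made for illustration.

```javascript
// Hypothetical hand-written parser using only Node.js's Buffer API.
// Assumed layout: uint32 count (little-endian), then `count` records of
// three signed 16-bit little-endian integers (x, y, z).
function parseCoordinates(buf) {
  const count = buf.readUInt32LE(0);
  const coords = new Array(count);
  let offset = 4; // skip the 4-byte count
  for (let i = 0; i < count; i++) {
    coords[i] = {
      x: buf.readInt16LE(offset),
      y: buf.readInt16LE(offset + 2),
      z: buf.readInt16LE(offset + 4),
    };
    offset += 6; // three 2-byte fields per coordinate
  }
  return { count, coords };
}

// Builds a test buffer with `n` coordinates in the assumed layout.
function makeSample(n) {
  const buf = Buffer.alloc(4 + n * 6);
  buf.writeUInt32LE(n, 0);
  for (let i = 0; i < n; i++) {
    const base = 4 + i * 6;
    buf.writeInt16LE(i, base);
    buf.writeInt16LE(-i, base + 2);
    buf.writeInt16LE(2 * i, base + 4);
  }
  return buf;
}

const parsed = parseCoordinates(makeSample(1000));
console.log(parsed.count, parsed.coords[999]);
```

Even for this small format, the offsets and field sizes must be tracked manually, which illustrates the bookkeeping that a declarative description removes.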
| 90 | + |
| 91 | +{ width=80% } |
| 92 | + |
| 93 | +\autoref{fig:benchmark} shows the measurement results. `Binary-parser` |
| 94 | +outperforms the three alternative libraries by 7.5$\times$ to |
| 95 | +180$\times$. The plot also shows that `binary-parser` achieves |
| 96 | +performance equal to that of the hand-written baseline parser. |
| 97 | + |
| 98 | +# Acknowledgments |
| 99 | + |
| 100 | +This work was partly supported by JSPS KAKENHI Grant Number JP20K19808. |
93 | 101 |
|
94 | 102 | # References |