[JavaScript] Detecting Hyphens and Dashes with Regular Expressions Using unicode-regex + Dash_Punctuation

Tadashi Shigeoka ·  Tue, November 5, 2019

This post shows how to detect delimiter characters such as hyphens and dashes with a regular expression in JavaScript and Node.js, using unicode-regex with the Dash_Punctuation category.


Background: Detecting Hyphens and Dashes Comprehensively with Regular Expressions

There are many characters that look like “―”. To detect all of them in input values with a single regular expression, I decided to use the npm package unicode-regex.

Examples:

  • ― Dash
  • - Hyphen-Minus
  • ‐ Hyphen
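Because these characters are nearly indistinguishable on screen, it can help to print their code points. The following is a minimal sketch; the helper name toCodePointLabel is hypothetical, and the three code points (U+002D, U+2010, U+2015) are my assumption about which characters the list above refers to.

```javascript
// Look-alike dash characters, written as escapes so they cannot be
// confused with one another in an editor. The choice of code points is
// an assumption about the examples listed above.
const samples = [
  { label: 'Hyphen-Minus', char: '\u002d' },
  { label: 'Hyphen', char: '\u2010' },
  { label: 'Horizontal Bar', char: '\u2015' },
];

// Format the first code point of a string as "U+XXXX".
function toCodePointLabel(char) {
  const hex = char.codePointAt(0).toString(16).toUpperCase();
  return 'U+' + hex.padStart(4, '0');
}

for (const { label, char } of samples) {
  console.log(`${toCodePointLabel(char)} ${char} ${label}`);
}
```

Each of these is a distinct code point, so a pattern that only checks for the ASCII "-" misses the other two.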

Dashes and Hyphens in Google Japanese Input

Even the first page of conversion candidates in Google Japanese Input shows this many variants.

Unicode Dash_Punctuation | Google IME

unicode-regex + Dash_Punctuation

Sample Code: Regular Expression to Detect Hyphens and Dashes

const unicodeRegex = require('unicode-regex');

// Collect every code point whose General_Category is Dash_Punctuation (Pd).
const dashRegex = unicodeRegex({
  General_Category: ['Dash_Punctuation']
});

// Print the generated character class.
console.log(dashRegex.toString());
// [\u002d\u058a\u05be\u1400\u1806\u2010-\u2015\u2e17\u2e1a\u2e3a-\u2e3b\u2e40\u301c\u3030\u30a0\ufe31-\ufe32\ufe58\ufe63\uff0d]
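To show how the printed character class might be used for detection, here is a minimal sketch that hard-codes the class from the output above instead of requiring unicode-regex, so it runs without any dependency. The helper names containsDash and normalizeDashes are hypothetical, and replacing every dash with the ASCII "-" is just one possible normalization, not something the library does for you.

```javascript
// Character class copied from the output above (Dash_Punctuation code points).
// Backslashes are doubled because this is a string literal passed to RegExp.
const DASH_CLASS =
  '[\\u002d\\u058a\\u05be\\u1400\\u1806\\u2010-\\u2015\\u2e17\\u2e1a' +
  '\\u2e3a-\\u2e3b\\u2e40\\u301c\\u3030\\u30a0\\ufe31-\\ufe32\\ufe58\\ufe63\\uff0d]';

const dashPattern = new RegExp(DASH_CLASS);          // finds the first dash
const dashPatternAll = new RegExp(DASH_CLASS, 'g');  // finds every dash

// True if the input contains at least one Dash_Punctuation character.
function containsDash(input) {
  return dashPattern.test(input);
}

// Replace every Dash_Punctuation character with the ASCII hyphen-minus.
function normalizeDashes(input) {
  return input.replace(dashPatternAll, '-');
}

console.log(containsDash('03\u20151234'));                 // true
console.log(containsDash('0312345678'));                   // false
console.log(normalizeDashes('03\u20101234\uff0d5678'));    // 03-1234-5678
console.log(containsDash('\u2212'));                       // false
```

The last line is worth noting: U+2212 MINUS SIGN looks like a dash but belongs to the Math_Symbol category, so a Dash_Punctuation pattern does not match it. If such characters can appear in your input, they need to be handled separately.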

Reference: List of Unicode Characters of Category “Dash Punctuation”
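As a point of comparison, runtimes that support ES2018 Unicode property escapes can match the same General_Category natively. This is an alternative technique, not what unicode-regex does internally, and the exact set of matched characters depends on the Unicode version bundled with the runtime, so it may differ slightly from the list printed above.

```javascript
// Native Unicode property escape; requires the "u" flag.
// \p{Dash_Punctuation} can also be written as \p{Pd}.
const nativeDash = /\p{Dash_Punctuation}/u;

console.log(nativeDash.test('\u2015')); // true  (HORIZONTAL BAR)
console.log(nativeDash.test('\uff0d')); // true  (FULLWIDTH HYPHEN-MINUS)
console.log(nativeDash.test('\u2212')); // false (MINUS SIGN is Math_Symbol)
```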

That’s all from the Gemba.