Styles

An introduction to Vale, a syntax-aware linter for prose built with speed and extensibility in mind.

Overview

Vale has a powerful extension system that doesn’t require knowledge of any programming languages. Instead, it exposes its functionality through simple YAML files.

The core of Vale’s extension system is the style: a collection of writing guidelines. These guidelines are expressed through rules, which are YAML files enforcing a particular writing construct (e.g., a certain readability level, sentence length, or heading style).

Styles are organized in a hierarchical folder structure at a user-specified location (see Configuration for more details). For example,

styles/
├── base/
│   ├── ComplexWords.yml
│   ├── SentenceLength.yml
│   ...
├── blog/
│   ├── TechTerms.yml
│   ...
└── docs/
    ├── Branding.yml
    ...

where base, blog, and docs are your styles.
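As a rough illustration of this layout, the sketch below groups rule files by the style folder that contains them. It is a minimal Python sketch rather than anything Vale ships: the function name group_rules_by_style is hypothetical, and it assumes the rule paths are given relative to the styles folder.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_rules_by_style(rule_paths):
    """Map each style (top-level folder) to the rule names it contains.

    Hypothetical helper for illustration; paths are relative to `styles/`.
    """
    styles = defaultdict(list)
    for raw_path in rule_paths:
        path = PurePosixPath(raw_path)
        # Only `<style>/<Rule>.yml` files count as rules in this sketch.
        if len(path.parts) == 2 and path.suffix == ".yml":
            styles[path.parts[0]].append(path.stem)
    return {style: sorted(rules) for style, rules in styles.items()}


layout = group_rules_by_style([
    "base/ComplexWords.yml",
    "base/SentenceLength.yml",
    "blog/TechTerms.yml",
    "docs/Branding.yml",
])
```

Here layout maps base, blog, and docs to the rule names found in each folder.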

Extension Points

The building blocks behind Vale’s styles are its rules, which utilize extension points to perform specific tasks.

The basic structure of a rule consists of a small header (shown below) followed by extension-specific arguments.

# All rules should define the following header keys:
#
# `extends` indicates the extension point being used (see below for information
# on the possible values).
extends: existence
# `message` is shown to the user when the rule is broken.
#
# Many extension points accept format specifiers (%s), which are replaced by
# extracted values. See the extension-specific sections below for more details.
message: "Consider removing '%s'"
# `level` assigns the rule's severity.
#
# The accepted values are suggestion, warning, and error.
level: warning
# `scope` specifies where this rule should apply -- e.g., headings, sentences, etc.
#
# See the Markup section for more information on scoping.
scope: heading
# `code` determines whether or not the content of code spans -- e.g., `foo` for
# Markdown -- is ignored.
code: false
# `link` gives the source for this rule.
link: 'https://valelint.github.io/docs/styles/#creating-a-style'
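To make the header's constraints concrete, here is a minimal Python sketch that checks a rule header that has already been loaded into a dictionary. The helper header_problems is hypothetical and not part of Vale; it only checks extends against the extension points described in this section and level against the three accepted values.

```python
# Extension points described in this section.
EXTENSION_POINTS = {
    "existence", "substitution", "occurrence", "repetition", "consistency",
    "conditional", "capitalization", "readability", "spelling",
}
# Accepted severity levels.
LEVELS = {"suggestion", "warning", "error"}


def header_problems(rule):
    """Return a list of problems found in a rule's header (a plain dict)."""
    problems = []
    if rule.get("extends") not in EXTENSION_POINTS:
        problems.append("unknown or missing 'extends'")
    if not rule.get("message"):
        problems.append("missing 'message'")
    if rule.get("level") not in LEVELS:
        problems.append("'level' must be suggestion, warning, or error")
    return problems


rule = {
    "extends": "existence",
    "message": "Consider removing '%s'",
    "level": "warning",
    "scope": "heading",
    "code": False,
}
```

Calling header_problems(rule) on the header above returns an empty list, while a misspelled extends value or an unsupported level is reported.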

existence

Example Definition
extends: existence
message: Consider removing '%s'
level: warning
code: false
ignorecase: true
tokens:
    - appears to be
    - arguably

NAME        TYPE   DESCRIPTION
append      bool   Adds raw to the end of tokens, assuming both are defined.
ignorecase  bool   Makes all matches case-insensitive.
nonword     bool   Removes the default word boundaries (\b).
raw         array  A list of tokens to be concatenated into a pattern.
tokens      array  A list of tokens to be transformed into a non-capturing group.

The most general extension point is existence. As its name implies, it looks for the “existence” of particular tokens.

These tokens can be anything from simple phrases (as in the above example) to complex regular expressions that check, for example, the number of spaces between sentences or the position of punctuation after quotes.

You may define the tokens as elements of lists named either tokens (shown above) or raw. The former converts its elements into a word-bounded, non-capturing group. For instance,

tokens:
  - appears to be
  - arguably

becomes \b(?:appears to be|arguably)\b.

raw, on the other hand, simply concatenates its elements—so, something like

raw:
  - '(?:foo)\sbar'
  - '(baz)'

becomes (?:foo)\sbar(baz).
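The two transformations can be sketched in Python as follows. This is an illustration of the patterns described above, not Vale's implementation: build_existence_pattern is a hypothetical name, Python's re module stands in for Vale's regex engine, and the sketch ignores append by using raw only when tokens is empty.

```python
import re


def build_existence_pattern(tokens=(), raw=(), nonword=False, ignorecase=False):
    """Build a compiled pattern from `tokens` or `raw` (hypothetical helper)."""
    if tokens:
        # `tokens` become one non-capturing group of alternatives...
        group = "(?:" + "|".join(tokens) + ")"
        # ...wrapped in word boundaries unless `nonword` is set.
        pattern = group if nonword else r"\b" + group + r"\b"
    else:
        # `raw` elements are simply concatenated.
        pattern = "".join(raw)
    flags = re.IGNORECASE if ignorecase else 0
    return re.compile(pattern, flags)


from_tokens = build_existence_pattern(
    tokens=["appears to be", "arguably"], ignorecase=True
)
from_raw = build_existence_pattern(raw=[r"(?:foo)\sbar", "(baz)"])
```

from_tokens.pattern and from_raw.pattern reproduce the two patterns shown above.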

substitution

Example Definition
extends: substitution
message: Consider using '%s' instead of '%s'
ignorecase: true
level: warning
swap:
  abundance: plenty
  accelerate: speed up

NAME        TYPE    DESCRIPTION
ignorecase  bool    Makes all matches case-insensitive.
nonword     bool    Removes the default word boundaries (\b).
swap        map     A sequence of observed: expected pairs.
pos         string  A regular expression matching tokens to parts of speech.

substitution associates a string with a preferred form. If we want to suggest the use of “plenty” instead of “abundance,” for example, we’d write:

swap:
  abundance: plenty

The keys may be regular expressions, but they can’t include nested capture groups; use non-capturing groups, (?:...), instead:

swap:
  '(?:give|gave) rise to': lead to # this is okay
  '(give|gave) rise to': lead to # this is bad!

Like existence, substitution accepts the keys ignorecase and nonword.

substitution can have one or two %s format specifiers in its message. This allows us to do either of the following:

message: "Consider using '%s' instead of '%s'"
# or
message: "Consider using '%s'"
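A minimal Python sketch of this behaviour follows. It is an approximation for illustration only: check_substitution is a hypothetical function, Python's re module stands in for Vale's regex engine, and the sketch assumes that with two specifiers the preferred form is filled in first and the observed text second, as the example message suggests.

```python
import re


def check_substitution(text, swap, message, ignorecase=True):
    """Return one formatted message per observed match (hypothetical helper)."""
    flags = re.IGNORECASE if ignorecase else 0
    alerts = []
    for observed, expected in swap.items():
        # Wrap each key in a non-capturing group with word boundaries.
        pattern = r"\b(?:" + observed + r")\b"
        for match in re.finditer(pattern, text, flags):
            if message.count("%s") == 2:
                alerts.append(message % (expected, match.group(0)))
            else:
                alerts.append(message % expected)
    return alerts


swap = {"abundance": "plenty", "(?:give|gave) rise to": "lead to"}
text = "An abundance of bugs gave rise to delays."
two = check_substitution(text, swap, "Consider using '%s' instead of '%s'")
one = check_substitution(text, swap, "Consider using '%s'")
```

With two specifiers the alert names both the preferred and the observed form; with one, only the preferred form appears.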

occurrence

Example Definition
extends: occurrence
message: "Sentences should be less than 25 words"
scope: sentence
level: suggestion
max: 25
token: '\b(\w+)\b'

NAME   TYPE    DESCRIPTION
max    int     The maximum number of times token may appear in a given scope.
token  string  The token of interest.

occurrence limits the number of times a particular token can appear in a given scope. In the example above, we’re limiting the number of words per sentence.

This is the only extension point that doesn’t accept a format specifier in its message.
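The counting logic can be sketched in a few lines of Python. This is an illustration under assumptions, not Vale's code: exceeds_max is a hypothetical name, and the sketch assumes the limit is inclusive, so a scope with exactly max matches is still acceptable.

```python
import re


def exceeds_max(scope_text, token, max_count):
    """Return True when `token` matches more than `max_count` times."""
    return len(re.findall(token, scope_text)) > max_count


WORD = r"\b(\w+)\b"
at_limit = " ".join(["word"] * 25) + "."    # a 25-word "sentence"
over_limit = " ".join(["word"] * 26) + "."  # a 26-word "sentence"
```

Under that assumption, only the 26-word sentence would trigger the example rule.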

repetition

Example Definition
extends: repetition
message: "'%s' is repeated!"
level: error
scope: paragraph
ignorecase: true
tokens:
  - '\b(\w+)\b'

NAME        TYPE   DESCRIPTION
ignorecase  bool   Makes all matches case-insensitive.
alpha       bool   Limits all matches to alphanumeric tokens.
tokens      array  A list of tokens to be transformed into a non-capturing group.

repetition looks for repeated occurrences of its tokens. If ignorecase is set to true, it’ll convert all tokens to lower case for comparison purposes.
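One way to picture this check is the Python sketch below. It assumes that "repeated" means the same token appearing twice in a row (for example, "the the"), which the text above does not state explicitly; find_repetitions is a hypothetical name.

```python
import re


def find_repetitions(text, token=r"\b(\w+)\b", ignorecase=True):
    """Return tokens that immediately repeat the previous token."""
    previous = None
    repeated = []
    for match in re.finditer(token, text):
        # Lower-case tokens for comparison when `ignorecase` is set.
        current = match.group(0).lower() if ignorecase else match.group(0)
        if current == previous:
            repeated.append(match.group(0))
        previous = current
    return repeated


sample = "This is is a test. The the end."
```

With ignorecase enabled, both "is is" and "The the" are reported; without it, only "is is" is.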

consistency

Example Definition
extends: consistency
message: "Inconsistent spelling of '%s'"
level: warning
scope: text
ignorecase: true
either:
  advisor: adviser
  centre: center

NAME        TYPE  DESCRIPTION
nonword     bool  Removes the default word boundaries (\b).
ignorecase  bool  Makes all matches case-insensitive.
either      map   A map of option 1: option 2 pairs, of which only one may appear.

consistency will ensure that a key and its value (e.g., “advisor” and “adviser”) don’t both occur in its scope.
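The idea can be sketched in Python as follows. This is only an illustration: find_inconsistencies is a hypothetical function, and it treats each key and value as a literal word rather than a pattern.

```python
import re


def find_inconsistencies(text, either, ignorecase=True):
    """Return the (key, value) pairs for which both spellings occur in text."""
    flags = re.IGNORECASE if ignorecase else 0

    def present(word):
        # Match the word literally, bounded by word boundaries.
        return re.search(r"\b" + re.escape(word) + r"\b", text, flags) is not None

    return [(a, b) for a, b in either.items() if present(a) and present(b)]


either = {"advisor": "adviser", "centre": "center"}
text = "Our advisor met another adviser at the center."
```

For this text, only the advisor/adviser pair is inconsistent, because "centre" never appears.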

conditional

Example Definition
extends: conditional
message: "'%s' has no definition"
level: warning
scope: text
first: '\b([A-Z]{3,5})\b'
second: '(?:\b[A-Z][a-z]+ )+\(([A-Z]{3,5})\)'
exceptions:
  - ABC

NAME        TYPE    DESCRIPTION
ignorecase  bool    Makes all matches case-insensitive.
first       string  The antecedent of the statement.
second      string  The consequent of the statement.
exceptions  array   An array of strings to be ignored.

conditional ensures that the existence of first implies the existence of second. For example, consider the following text:

According to Wikipedia, the World Health Organization (WHO) is a specialized agency of the United Nations that is concerned with international public health. We can now use WHO because it has been defined, but we can’t use DAFB because people may not know what it represents. We can use DAFB when it’s presented as code, though.

Running vale on the above text with our example rule yields the following:

test.md:1:224:vale.UnexpandedAcronyms:'DAFB' has no definition

conditional also takes an optional exceptions list. Any token listed as an exception won’t be flagged.
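The logic of the example rule can be approximated in Python as shown below. This is a sketch under assumptions rather than Vale's implementation: undefined_terms is a hypothetical name, the order of definition and use is ignored, and each undefined term is reported once.

```python
import re


def undefined_terms(text, first, second, exceptions=()):
    """Return terms matched by `first` that `second` never defines."""
    # Everything captured by `second` counts as defined.
    defined = {match.group(1) for match in re.finditer(second, text)}
    flagged = []
    for match in re.finditer(first, text):
        term = match.group(1)
        if term not in defined and term not in exceptions and term not in flagged:
            flagged.append(term)
    return flagged


FIRST = r"\b([A-Z]{3,5})\b"
SECOND = r"(?:\b[A-Z][a-z]+ )+\(([A-Z]{3,5})\)"
text = (
    "The World Health Organization (WHO) is an agency. "
    "WHO is fine, DAFB is not, and ABC is exempt."
)
flagged = undefined_terms(text, FIRST, SECOND, exceptions=["ABC"])
```

Here WHO is defined by the parenthesized expansion, ABC is listed as an exception, and only DAFB is flagged.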

capitalization

Example Definition
extends: capitalization
message: "'%s' should be in title case"
level: warning
scope: heading
# $title, $sentence, $lower, $upper, or a pattern.
match: $title
style: AP # AP or Chicago; only applies when match is set to $title.

NAME        TYPE    DESCRIPTION
match       string  $title, $sentence, $lower, $upper, or a pattern.
style       string  AP or Chicago; only applies when match is set to $title.
exceptions  array   An array of strings to be ignored.

capitalization checks that the text in the specified scope matches the case of match. There are a few pre-defined variables that can be passed as matches:

  • $title: “The Quick Brown Fox Jumps Over the Lazy Dog.”
  • $sentence: “The quick brown fox jumps over the lazy dog.”
  • $lower: “the quick brown fox jumps over the lazy dog.”
  • $upper: “THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.”

Additionally, when using match: $title, you can specify a style of either AP or Chicago.

readability

Example Definition
extends: readability
message: "Grade level (%s) too high!"
level: warning
grade: 8
metrics:
  - Flesch-Kincaid
  - Gunning Fog

NAME     TYPE   DESCRIPTION
metrics  array  One or more of Gunning Fog, Coleman-Liau, Flesch-Kincaid, SMOG, and Automated Readability.
grade    float  The highest acceptable score.

readability calculates a readability score according to the specified metrics. The supported tests are Gunning Fog, Coleman-Liau, Flesch-Kincaid, SMOG, and Automated Readability.

If more than one is listed (as seen above), the scores will be averaged. This is also the only extension point that doesn’t accept a scope, as readability is always calculated using the entire document.

grade is the highest acceptable score. Using the example above, a warning will be issued if the calculated grade level exceeds 8.
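The averaging step can be illustrated with the standard published formulas for the two metrics in the example. The Python sketch below is not Vale's implementation: the function names are hypothetical, the word, sentence, and syllable counts are made-up inputs, and Vale's own counting may differ.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade-level formula."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Standard Gunning Fog index formula."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def check_grade(scores, grade):
    """Average the metric scores and compare the result with `grade`."""
    average = sum(scores) / len(scores)
    return average, average > grade


# Illustrative counts for a 100-word, 5-sentence document.
scores = [flesch_kincaid_grade(100, 5, 150), gunning_fog(100, 5, 10)]
average, too_high = check_grade(scores, grade=8)
```

With these made-up counts, the averaged score is above 8, so the example rule would issue its warning.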

spelling

Example Definition
extends: spelling
message: "Did you really mean '%s'?"
level: error
ignore: ci/vocab.txt

NAME     TYPE    DESCRIPTION
aff      string  The fully-qualified path to a Hunspell-compatible .aff file.
custom   bool    Turn off the default filters for acronyms, abbreviations, and numbers.
dic      string  The fully-qualified path to a Hunspell-compatible .dic file.
filters  array   An array of patterns to ignore during spell checking.
ignore   string  A relative path to a personal vocabulary file.

spelling implements spell checking based on Hunspell-compatible dictionaries. By default, Vale includes en_US-web—an up-to-date, actively maintained dictionary. However, you may also specify your own via the dic and aff keys (the fully-qualified paths are required; e.g., /usr/share/hunspell/en_US.dic).

spelling also accepts an ignore file, which consists of one word per line to be ignored during spell checking.

Additionally, you may further customize the spell-checking experience by defining filters:

extends: spelling
message: "Did you really mean '%s'?"
level: error
# This disables the built-in filters. If you omit this key or set it to false,
# custom filters (see below) are added on top of the built-in ones.
#
# By default, Vale includes filters for acronyms, abbreviations, and numbers.
custom: true
# A "filter" is a regular expression specifying words to ignore during spell
# checking.
filters:
  - '[pP]y.*\b'  # Ignore all words starting with 'py' -- e.g., 'PyYAML'.
ignore: ci/vocab.txt
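To show how filters and an ignore list might narrow down what gets spell checked, here is a small Python sketch. It is an assumption-laden illustration rather than Vale's behaviour: words_to_check is a hypothetical function, filters are applied from the start of each word, and ignored words are compared exactly.

```python
import re


def words_to_check(words, filters=(), ignored=()):
    """Return the words that would still be passed to the spell checker."""
    compiled = [re.compile(pattern) for pattern in filters]
    ignored_words = set(ignored)
    return [
        word for word in words
        if word not in ignored_words
        # `match` anchors each filter at the start of the word.
        and not any(pattern.match(word) for pattern in compiled)
    ]


remaining = words_to_check(
    ["PyYAML", "pytest", "teh", "happy", "Vale"],
    filters=[r"[pP]y.*\b"],
    ignored=["Vale"],
)
```

Words beginning with "py" or "Py" are filtered out and "Vale" is ignored, leaving "teh" and "happy" to be checked.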