Lexico is a data searching library which supports a wide variety of search algorithms along with support for Google-inspired search operators. It is intended to be used with moderately sized datasets in environments where more sophisticated and powerful search engines / databases may not be feasible to use.
- β Search operators
- β Boolean searching
- β Fuzzy searching
- β Prefix searching
- β Type-safe query builder
- π§ͺ Search indexes
Inspiration to build Lexico arose from the fact that data searching libraries in JavaScript had very basic support for search operators in a search query. Google search for example has a lot of nifty search operators to include / exclude terms, search specific websites and so on. While most databases that support full-text search did have these search operators, they are hard and infeasible to use in a browser environment with moderately sized dataset in-memory. This is what Lexico aims to solve β to bring support for most of the search operators in a JS library so that it can be used anywhere for powerful data searching! All packaged in an easy to learn and terse syntax.
npm i lexico
import Lexico from 'lexico';
// Assume that data is an array of objects
const data = [];
const lexico = new Lexico(); // By default, `BinaryCmp` comparator is used if not specified
lexico.search('search query', data);
lexico.search('another search query', data);
Alternatively, you can call compile
to compile a search query. Single compiled query can be searched across different datasets, this will be faster than previous method if search query does not change.
import Lexico from 'lexico';
const data = [];
const lexico = new Lexico();
const search = lexico.compile('search query');
search(data);
Lexico supports multiple searching algorithms in the form of comparators. They can be imported from the library and passed in the comparator
field while creating a Lexico
instance. The functions in the resulting Lexico
instance remains the same.
import Lexico, { BinaryCmp, FuzzyCmp, PrefixCmp } from 'lexico';
const data = [];
const binarySearch = new Lexico({
comparator: new BinaryCmp(),
});
const fuzzySearch = new Lexico({
comparator: new FuzzyCmp(),
});
const prefixSearch = new Lexico({
comparator: new PrefixCmp(),
});
Sometimes a query has to be built programmatically without allowing the user to use the search operators directly. Lexico exposes a programmatic, operator-safe, and type-safe query builder that can be used.
import Lexico, { BinaryCmp, FuzzyCmp, PrefixCmp } from 'lexico';
const data = [];
const lexico = new Lexico({
comparator: new FuzzyCmp(),
});
// Equivalent to `"john doe" AND (team:engineering OR location:India)`
const search = lexico
.build()
.search('john doe')
.AND()
.search(
lexico
.build()
.search('engineering', { scope: 'team' })
.OR()
.search('India', { scope: 'location' })
)
.compile();
// Search the dataset on the compiled query.
search(data);
This is an exhaustive list of Lexico's syntax going over each operator. It might seem complicated to understand at first, but they naturally make sense when you start using them.
-
βοΈ Search term:
Single search terms. They represent the simplest query to be searched in the dataset.-
Simple:
apple
Searches in all the shallow fields in each object for the termapple
.
They are treated as case-insensitive in currently supported search algorithms of Lexico. -
Scoped to a field:
name:apple
Searches in the string value ofname
field in each object for termapple
.- Deep fields:
name.scientific:malus
Lexico by default only searches on shallow fields of object. For deeper fields, they have to be explicitly scoped using dot-path syntax. The above query searches in each object'sname
field'sscientific
field's string value for termmalus
.
- Deep fields:
-
Exclude:
!apple
Searches in all the shallow fields in each object and excludes them from the result if some field has the termapple
. -
Exclude scoped to a field:
name:!apple
Searches each object and excludes it if the string value ofname
field has the termapple
. -
Comparisons:
Comparisons use standard comparison operators>
,<
,>=
,<=
.-
Numeral comparisons:
>5
,<10
,>=2e3
,<=40.4
Searches in all the shallow fields in each object if it's value is greater than 5, less than 10, greater than or equal to 2e3(2000), less than or equal to 40.4 respectively. -
String comparisons:
>apple
,<orange
,>=zzz
,<=3M
Similar to previous example but uses native string comparison.
Note: If the value in the field being searched has a different type than
string
ornumber
(eg. booleans, nested objects, etc.), it's converted to its string representation and then exact matching is used instead of comparison.Comparisons with scoped fields
age:>18
,duration:<=3500
,area:<3.14
, etc.name:>apple
,description:<orange
, etc.
-
-
-
𧫠Compound search:
These queries combine multiple search terms specified above.-
AND
operator:apple AND orange
Searches in each object which includes the termapple
andorange
in some of its shallow fields. -
OR
operator:apple OR orange
Searches in each object which includes the termapple
ororange
in its shallow fields.
-
Implicit
AND
:apple orange
AND
can be eliminated and the above query is treated the same asapple AND orange
.Few more examples:
name:apple OR description:orange
-name
field withapple
ordescription
field withorange
.name:apple AND orange
-name
field withapple
and some field withorange
.name:apple type:fruit
-name
field withapple
andtype
field withfruit
.name:apple age:<3
-name
field withapple
andage
field less than3
.name:!apple OR name:!orange
-name
field withoutapple
orname
field withoutorange
.
-
-
π§° Nesting:
(apple OR orange)
Nesting groups multiple terms to a single search term. They are not for representing priority order of search expressions. The above query is same asapple OR orange
. Nesting is useful when used with other operators in Lexico.-
Compound search within a field:
name:(apple OR orange)
Same asname:apple OR name:orange
but shorter. -
Nested scoped fields:
name:(apple scientific:malus)
Same asname:apple AND name.scientific:malus
. -
Exclude compound search terms:
!(apple AND orange)
Uses De-Morgan's law to expand the nesting. Searches for all objects without both apple and orange terms. Thus, it's treated as!apple OR !orange
. Can also be written as!(apple orange)
. -
Nested compound search:
apple AND (orange OR grape)
Searches for LHS ofAND
, then RHS. Remember, brackets are not for specifying priority order. -
!(name:!apple OR age:<5)
: You guess what this is π (Hint: try expanding the nesting)Answer
Same as
name:apple AND age:>=5
. Expand the nest using De-Morgan's law. Double exclusions cancel out, exclusion of<
yields>=
results.
-
-
π Escaping and single term:
Sometimes, our search terms might include characters which are treated as operators in Lexico. To prevent them from being treated as operators and cause un-intended side-effects, consider wrapping them with double quotes" "
.Examples
"!"
: Searches for!
within the object.apples "AND" oranges
: Searches for objects withapples
includingAND
and also includingoranges
within its fields. Note:AND
here is not treated as AND operator discussed above because it is escaped with double quotes."apples AND oranges"
: Searches forapples AND oranges
within the object.
Double quotes can also be used to opt-out of the implicit AND behavior when required. For example in
apple orange
,apple
andorange
is considered as two separate search terms and is hence treated asapple AND orange
(Read implicit AND behavior above). Ifapple orange
is supposed to be treated as a single search term, then it should be written as"apple orange"
. This will search forapple orange
within all objects.
Lexico separates the search algorithm from the syntax and its associated query parsing algorithm. Each search algorithm is called a "Comparator". The benefit of having this separation is that it enables various search algorithms and approaches on an already defined and understood syntax without building a completely new library.
Each comparator(the search algorithm) defines its fundamental operations for the search operators supported by Lexico. Then it can be simply used with the same syntax with almost no code change for the consumer. Lexico currently supports following 3 comparators which have their own use-cases along with advantages and drawbacks:
-
BinaryCmp
: Boolean search algorithm. It checks if a single search term is exactly present in the string being compared to. If it is present, the record is included for further checks, otherwise it is excluded. -
FuzzyCmp
: Fuzzy search algorithm. Instead of hard inclusion / exclusion of records in the result, it assigns a score to each record indicating how close it matches the search query. Higher the score, higher the match. -
PrefixCmp
: Prefix search algorithm. Similar toBinaryCmp
but only checks if single search term is the prefix of a word in the string being compared to. It internally uses optimized data structures when used with search indexes and is extremely fast compared toBinaryCmp
for equivalent queries.
Some operators may not be defined for some comparators. Eg. comparison operators(>
, <
, >=
, <=
) has no meaning when used in fuzzy search / prefix search. Hence, FuzzyCmp
and PrefixCmp
both fallback to BinaryCmp
's logic for performing comparisons which only includes records that satisfy the comparison.
This section is coming soon!
Lexico is MIT Licensed