This is a medium-weight library I created to simplify a singular task I kept running into at my work. If this library isn't useful to you, I'm not offended.
The goal of this library is to create indexes on collections of JavaScript objects so that they can be filtered more efficiently.
The reason I say "medium-weight" is because I've added several hepler functions that go beyond the basic scope of the library in order to simplify tasks I encounter at my job.
npm install --save jsindex
I'll create a GitHub wiki to document the APIs, parameters, etc.
This entire library operates under the assumption you're dealing with a collection -- that is, an array of JavaScript objects with the same schema. This is very common in web applications. For instance, if you query a database for all students you may end up with something like the following:
var students = [
{name: "Alice", gpa: 4.0, language: "English"},
{name: "Bob", gpa: 2.5, language: "Spanish"},
{name: "Charlie", gpa: 3.2, language: "English"},
...
];
Given a collection like this, the easiest way to utilize indexing is as like so:
students.index();
After calling this, the students
array will have a new idx
member:
console.log(students.idx);
// Outputs:
{
name: {
Alice: [
{name: "Alice", gpa: 4.0, language: "English"},
...
],
Bob: [
{name: "Bob", gpa: 2.5, language: "Spanish"},
...
],
Charlie: [
{name: "Charlie", gpa: 3.2, language: "English"},
...
],
...
},
gpa: {
2.5: [
{name: "Bob", gpa: 2.5, language: "Spanish"},
...
],
3.2: [
{name: "Charlie", gpa: 3.2, language: "English"},
...
],
4.0: [
{name: "Alice", gpa: 4.0, language: "English"},
...
],
...
},
language: {
English: [
{name: "Alice", gpa: 4.0, language: "English"},
{name: "Charlie", gpa: 3.2, language: "English"},
...
],
Spanish: [
{name: "Bob", gpa: 2.5, language: "Spanish"},
...
],
...
}
}
Each object in the index is actually a reference to the original object in the collection, so this index takes up very little memory. As well, because the index consists of references, you can do the following:
students.idx["langauge"]["English"][0].name = "Paul";
This will set Alice
s name to Paul
everywhere (in the index,
in the original collection, etc).
The advantage to this structure is that it lets us filter the array in constant time. Instead of:
students.filter(s => s.name === "Alice");
We can just write:
student.idx.name["Alice"];
This is the primary reason this library was created. At my job I often have multiple collections like the following:
var parties = [
{party: "R", partyName: "Republican"},
{party: "D", partyName: "Democrat"}
];
var candidates = [
{name: "Hillary", party: "D"},
{name: "Trump", party: "R"}
];
This means I'm often writing code like the following:
document.getElementById("trumps-party").innerHTML = (
parties.filter(p => p.party === (
candidates.filter(c => c.name === "Trump")[0].party
))[0].partyName
);
This is obviously inefficient. Each time we want to display a candidate's
political party we have to scan the entire list of parties. If we have a
table of politicians, we scan the entire list once per candidate. This kind of
N^2
problem can make dashboards run really slow.
To correct this I usually write code like the following:
var partyMap = {};
for (var i = 0; i < parties.length; i++)
partyMap[parties[i].party] = parties[i].partyName;
document.getElementById("trumps-party").innerHTML = (
partyMap[
candidates.filter(c => c.name === "Trump")[0].party
]
);
Note how we've removed the parties.filter
and replaced it with a simple
object property lookup (a constant-time operation).
The problem is that I find myself re-writing this mapping for loop on every single project I do. So I felt like I should abstract it into its own library. So long as I was doing that, I thought I could make the API a bit more powerful and useful. Consider the new code:
require("jsindex");
parties.index();
candidates.index();
candidates.merge(parties, "party");
document.getElementById("trumps-party").innerHTML = (
candidates.filter(c => c.name === "Trump")[0].partyName
);
Now we've even eliminated the map lookup! The two collections have been merged into one:
console.log(candidates.merge(parties, "party"));
// Outputs:
[
{name: "Hillary", party: "D", partyName: "Democrat"},
{name: "Trump", party: "R", partyName: "Republican"}
]
This is another feature I found myself constantly using at my job. Given a collection with various properties, I may want to filter on multiple properties based on user input:
var housesForSale = [
{address: "123 Sample Dr", sqft: 1600, beds: 3, baths: 2, basement: false},
{address: "987 Example St", sqft: 3200, beds: 5, baths: 3, basement: true},
...
]
If a user selects that they want 3 bedrooms, I could use the index to filter like so:
housesForSale.index();
console.log(housesForSale.idx.beds[3]);
However what if a user selects 3 bedrooms and a basement? Now I need to
select all records in one index that are also in the other index, and I'm back
to writing a .filter()
:
housesForSale.index();
beds = housesForSale.idx.beds[3];
basement = housesForSale.idx.basement[true];
console.log(beds.filter(house => basement.indexOf(house) >= 0));
Not only is this ugly, but notice that we have a new N^2
problem! We run the
indexOf
operator (which iterates the entire basement
array) once per
element of the beds
array. As with the partyMap
solution in the merge
feature, it's possible to utilize hashmaps to solve this more efficiently. But
rather than writing all of the code each time, I just created the following
API:
housesForSale.index();
console.log(housesForSale.search({
beds: 3,
basement: true
}));
Fun, fast, and efficient!
One more thing I do all of the time with my job. I often receive a CSV (or a
list of CSVs) that I need to load into a JavaScript object. There actually
wasn't much work to do here, because I just used the csv-parse
library. But
I override some of the default options with my preferred options.
var collection = Array.load(csvData);
One more thing I do all of the time with my job. My boss likes to have "CSV
export" buttons on all data tables. Why re-write this code 1000 times? I just
use the csv-stringify
library, override a couple default options, and then
I get this:
var csvData = Array.store(collection);
Several native array functions have been overriden in order to ensure the index stays valid as you modify the collection or any record in the collection.
- push
- pop
- shift
- unshift
- splice
concat(I've opted to not override the default concat method. See #2)
Like with all of my projects (see minimux) I don't do this as a full-time job. Well, this one I kind of do, but you get my point. There will be bugs, and I won't fix them all. Feel free to submit issues, make pull requests, or just email me.
Unlike minimux, this library is not minimalist. I will accept any new features so long as they don't come with a major performance impact to existing functionality.
If you want to contribute, the best way to get started is to check out the
issue list. Issues tagged with "good first issue" require minimal
knowledge of the library, data structures, etc - but just the ability to grep
the code for a pattern and re-write it with something functionally equivalent.
I try to go through and tag all issues with either the major
, minor
, or
patch
labels to indicate the effect on the semantic version. In general I
will attempt to knock out all patch
issues before solving minor
issues,
I will attempt to knock out all minor
issues befor solving major
issues,
and I will solve all major
issues last. That said, I may skip or ignore an
issue if it looks too hard to solve or if I feel like the proper solution will
change with a major version change (and therefore any work done will just
make the major version change harder)
I kept asking myself why I did things one way instead of another. So I decided to document my thought processes in the README for my future-self's benefit.
This could also benefit anyone who chooses to contribute to this library.
Why do we override the array functions instead of using an observer?
Array.observe has been deprecated and replaced by Proxy. Some common array operations, when using a Proxy, involve incredible numbers of function calls. For example, consider the following:
var arr = [];
for (var i = 0; i < 1000000; i++)
arr.push(i);
arr = new Proxy(arr, {
has: function(obj, prop) { console.log("has"); return prop in obj; },
set: function(obj, prop, val) { console.log("set"); obj[prop] = val; return true; },
deleteProperty: function(obj, prop) { console.log("delete"); delete obj[prop]; return true; }
});
arr.splice(1, 1);
This only deletes a single element, yet it reindexes every other element.
The V8 engine can do this very efficiently for a normal array. However when we
use a Proxy, you'll see has
and set
each get called 1 million times:
- has 1? (true)
- has 2? (true)
- has 3? (true)
- ...
- has 999999? (true)
- has 1000000? (false)
- set 1 = 2
- set 2 = 3
- set 3 = 4
- ...
- set 999998 = 999999
- delete 999999
A quick test in Chrome found that the native splice
function was 300x
faster than running splice
through a Proxy like this for an array of
1 million elements.
This incredible overhead for modifying the collection is why we instead replace
the native functions with our own. When you call splice
, we first call
Array.prototype.splice.apply(this, ...)
(getting the full speed of the V8
engine) and then update the index once, after everything has finished changing.
Why is the order not preserved in the index?
Certain array functions (notably sort
and splice
) can affect arbitrary
elements in the array. In order to preserve the order of elements we would need
to iterate over the entire collection once per key each time one of these
functions was called - effectively rebuilding the index from scratch.
By giving up this requirement, we can simply append new elements to the end of the indexes and only remove elements from the specific sub-indexes affected. This is potentially hundreds of times faster for very large collections.
Why does console.log(collection) show [Proxy, Proxy, Proxy, ...]?
If you update an indexed property of a record, the indexes need to be updated to reflect the change. For instance, consider the following:
var students = [
{name: "Alice", gpa: 4.0, language: "English"},
{name: "Bob", gpa: 2.5, language: "Spanish"},
{name: "Charlie", gpa: 3.2, language: "English"}
];
students.index();
students[0].language = "Spanish";
console.log(students.idx.language.Spanish);
Since Alice
s langauge was changed to Spanish
, this index needs to
be updated to return Alice
as well as Bob
. To do this we
use the Proxy API to trap changes to records and update the index.
MIT
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.