Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Good Index Characteristic- High Cardinality #171

Open
luantranminh opened this issue Jul 5, 2024 · 0 comments
Open

Good Index Characteristic- High Cardinality #171

luantranminh opened this issue Jul 5, 2024 · 0 comments
Labels

Comments

@luantranminh
Copy link
Owner

https://www.linkedin.com/posts/raul-junco_softwareengineering-softwarearchitecture-activity-7214608242134392834-YHE3?utm_source=combined_share_message&utm_medium=member_desktop

What’s wrong with this script?

CREATE TABLE students (
	ud int PRIMARY KEY,
	name VARCHAR(255),
	gender CHAR(1)
);

CREATE INDEX idx_name on students(name);
CREATE INDEX idx_gender on students(gender);

The syntax is OK, there is nothing wrong with it. The problem is in the data and related to a concept called Cardinality.

Cardinality refers to the number of unique values in a column relative to a table’s total number of rows.

  • High cardinality means the column has many unique values.
  • Low cardinality means the column has few unique values.

These statements above will create a table as below

image

Creating an index on a column with low Cardinality is most of the time ineffective because:

  1. Low Cardinality means each indexed value points to many rows, reducing the index’s ability to narrow down the search.
  2. Maintaining an index has cost of storage and update time. For low Cardinality column, this overhead might outweigh the benefits.
  3. Database query optimizers are smart; they know column statistics, including Cardinality. When they detect a low cardinality column, they often ignore it and perform a full table scan instead.

Comments section:
Q: If there is an 'isActive' column for soft deletes, most records are likely to be active (1) rather than deleted (0), how to handle it?
A: Filtered index or partition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant