add new catalog and cache APIs #1545

Open · wants to merge 3 commits into develop from catalog

Conversation

@usamoi (Contributor) commented Feb 9, 2024

Provide a safe wrapper for the PostgreSQL catalog and cache.
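For context, here is a rough sketch (not code from this PR; the function name is invented for illustration) of the kind of raw access such a wrapper is meant to hide, namely PostgreSQL's GETSTRUCT pattern for reading the fixed-size portion of a catalog heap tuple:

use pgrx::pg_sys;

// Illustration only: the C pattern `(Form_pg_class) GETSTRUCT(tuple)` written
// as raw pointer arithmetic. A safe wrapper keeps this behind a
// lifetime-checked API instead of handing out raw pointers.
unsafe fn form_pg_class(tuple: &pg_sys::HeapTupleData) -> *const pg_sys::FormData_pg_class {
    unsafe {
        // GETSTRUCT(tup): the fixed-size columns start t_hoff bytes past the
        // beginning of the tuple header.
        (tuple.t_data as *const u8)
            .add((*tuple.t_data).t_hoff as usize)
            .cast::<pg_sys::FormData_pg_class>()
    }
}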

@usamoi usamoi marked this pull request as draft February 9, 2024 02:53
@usamoi usamoi force-pushed the catalog branch 8 times, most recently from 8b6b4f2 to 760420f Compare February 10, 2024 06:32
@usamoi (Contributor, Author) commented Feb 10, 2024

The failing compile-fail tests have me really confused...

(screenshot attachment: Screenshot_20240210_143228)

@usamoi usamoi force-pushed the catalog branch 3 times, most recently from e1e20e3 to c0cb502 Compare February 10, 2024 12:20
@usamoi usamoi marked this pull request as ready for review February 10, 2024 12:20
@workingjubilee (Member) left a comment

This appears to completely remove the docs from the functions, so I can't accept a strict regression on our documentation like this.

Four review threads on pgrx/src/pg_catalog.rs (outdated, resolved)
@usamoi (Contributor, Author) commented Feb 12, 2024

Added documentation copied from the PostgreSQL website.

@workingjubilee (Member) commented

Ah, most of them were copied, huh? Hm. Will have to think a little about that.

@workingjubilee (Member) left a comment

I can follow the macro expansion patterns now, but I'm somewhat uneasy about the code and its future durability. I would prefer that this came with more ways of enforcing that the expansion is correct, and more explanation of why the unsafe code is correct, so that changes to it can be evaluated for soundness more easily, especially if, for example, Postgres changes the size of a value and invalidates a safety assumption we are tacitly relying on.

Three review threads on pgrx/src/pg_catalog.rs (outdated, resolved)
#[doc = concat!("Safe wrapper for pg_catalog.", stringify!($table), ".")]
$(#[$table_meta])*
pub struct [<$table:camel>]<'a> {
inner: &'a pgrx::pg_sys::HeapTupleData,
@workingjubilee (Member) commented

Normally I avoid holding references to variably-sized structs due to UCG#256 still being unresolved. I think it doesn't matter in this case.

Two review threads on pgrx/src/pg_catalog.rs (outdated, resolved)
Comment on lines 21 to 49
unsafe impl<T> GetStruct<T> for T {
unsafe fn get_struct(raw: *const T) -> Self {
unsafe { raw.read() }
}
}

unsafe impl GetStruct<pgrx::pg_sys::nameData> for &CStr {
unsafe fn get_struct(raw: *const pgrx::pg_sys::nameData) -> Self {
unsafe { CStr::from_ptr(raw.cast::<c_char>()) }
}
}

unsafe impl GetStruct<pgrx::pg_sys::int2vector> for &[i16] {
unsafe fn get_struct(raw: *const pgrx::pg_sys::int2vector) -> Self {
unsafe { (*raw).values.as_slice((*raw).dim1 as usize) }
}
}
@workingjubilee (Member) commented

I'm guessing the reason we need this is because sometimes we want to read the data, sometimes we want to get a CStr, and sometimes we need a slice, huh? But what about the other types that terminate in VLAs?

@usamoi (Contributor, Author) replied

All members of the FromData_* structs are fixed-size. There are only three exceptions: oidvector, int2vector and bytea. When oidvector, int2vector or bytea (the latter only in pg_largeobject) appears in a FromData_* struct, it must be the last struct member. Anyone who needs bytea, or new variable-sized types introduced in future PostgreSQL versions, can add support for them themselves (see the sketch after this comment).

nameData is in fact a fixed-size 64-byte array. It is guaranteed to be NUL-padded, so we can obtain a CStr from it, and we expose &CStr to users since it is friendlier.
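For illustration, a hypothetical impl for oidvector could mirror the int2vector impl quoted above, since oidvector has the same trailing flexible-array layout (this is not part of the PR):

// Hypothetical addition mirroring the int2vector impl above; not in this PR.
// Like int2vector, oidvector stores its length in `dim1` and its elements in a
// trailing flexible array member.
unsafe impl GetStruct<pgrx::pg_sys::oidvector> for &[pgrx::pg_sys::Oid] {
    unsafe fn get_struct(raw: *const pgrx::pg_sys::oidvector) -> Self {
        unsafe { (*raw).values.as_slice((*raw).dim1 as usize) }
    }
}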

@usamoi (Contributor, Author) commented Feb 24, 2024

The type of a pg_catalog column is unlikely to change. But if it does change:

  1. if the column is in FromData_* (the common case), type checking fails;
  2. if it is not in FromData_*, the wrapper becomes unsound.

To guard against the second case, we could fetch the attribute's type oid at run time (or in test code) and compare it against what we record in the macro-call statements, as sketched below. Would that check make you more confident?
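A rough sketch of what such a check could look like, assuming pg_sys exposes PostgreSQL's get_atttype from lsyscache.h; everything else here is hypothetical:

use pgrx::pg_sys;

// Sketch of a run-time (or test-time) soundness check: compare a catalog
// column's actual type oid against the oid recorded in the macro invocation.
// `expected` would come from whatever the define_catalog! call records.
unsafe fn column_type_matches(
    relid: pg_sys::Oid,
    attnum: pg_sys::AttrNumber,
    expected: pg_sys::Oid,
) -> bool {
    // get_atttype returns InvalidOid if the attribute does not exist.
    let actual = unsafe { pg_sys::get_atttype(relid, attnum) };
    actual == expected
}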

@workingjubilee (Member) commented

@usamoi Oh, I'm less concerned about runtime types, and more concerned about a later code change by someone else making something subtly wrong that nonetheless the macros allow, and it being hard to tell.

@usamoi (Contributor, Author) commented Apr 1, 2024

Most of the unsafe code has now been moved out of macro_rules.

There are still several unsafe blocks of the form unsafe { call_xyz() } in macro_rules.

@usamoi (Contributor, Author) commented May 19, 2024

@workingjubilee Could you give me some advice on this patch? If it cannot be merged, I feel it would be better to move the code into a downstream package.

@workingjubilee (Member) commented

Sorry about not getting back to you about this! I was consumed by trying to write down as much of #1661 and #1701 as possible. I am going to try to take another look this week.

@workingjubilee workingjubilee self-requested a review May 21, 2024 18:56
@eeeebbbbrrrr (Contributor) left a comment

I'm marking this "Request Changes", but really I just mean "Let's discuss how we can generate the catalog declarations in pg_catalog.rs..."

}
}

define_catalog! {
@eeeebbbbrrrr (Contributor) commented May 22, 2024

Just picking one of these at random...

You hand-wrote all of these define_catalog!() declarations? I don't see that as sustainable. Is there any way we can generate this, via build.rs, from the Postgres sources?

Even if that meant we'd need our own equivalent of genbki.pl (in Rust, obviously), that would be infinitely better than trying to keep these up-to-date manually. Postgres does change its catalogs from time to time.

@usamoi (Contributor, Author) replied

All declarations are hand-written. If a new column is added in a newer version of PostgreSQL, whoever uses it could add it themselves, so I thought it wasn't a problem.

It looks like this process could be done by parsing catalog/pg_*.h and catalogs.sgml. I will try that later.

@eeeebbbbrrrr (Contributor) replied

"whoever uses it could add it themselves"

The problem with that is it's not guaranteed anyone would notice, which might lead to miscompilations or runtime UB or who-knows-what.

In build.rs, we ultimately have the struct definitions as provided by bindgen. Maybe those can be traversed to generate the remaining code?

Ultimately, this would get us 100% coverage of all the catalogs.
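As a very rough illustration of that idea (file names and output format here are assumptions, not pgrx's actual build layout), a generation step could scan the bindgen output for FormData_pg_* structs and emit the list of catalogs a generator would then expand into define_catalog! invocations:

// Hypothetical build.rs fragment, illustration only: collect catalog names
// from the bindgen-generated bindings. A real generator would emit Rust
// source (the define_catalog! calls) rather than a plain list.
use std::{env, fs, path::PathBuf};

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR not set"));
    // Assume bindgen already wrote its output here earlier in the build.
    let bindings = fs::read_to_string(out_dir.join("bindings.rs")).expect("read bindings.rs");

    let mut catalogs = Vec::new();
    for line in bindings.lines() {
        // Every catalog header yields a `pub struct FormData_pg_<name>` definition.
        if let Some(rest) = line.trim_start().strip_prefix("pub struct FormData_pg_") {
            let name: String = rest
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            if !name.is_empty() {
                catalogs.push(format!("pg_{name}"));
            }
        }
    }

    fs::write(out_dir.join("catalog_names.txt"), catalogs.join("\n")).expect("write catalog list");
}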

@workingjubilee (Member) replied

That does seem like it would be best, although it might be a bit toilsome to implement the first time and occasionally to keep up to date.

usamoi added a commit to tensorchord/pgrx that referenced this pull request Aug 21, 2024
usamoi added a commit to tensorchord/pgrx that referenced this pull request Aug 22, 2024
usamoi added a commit to tensorchord/pgrx that referenced this pull request Sep 27, 2024
usamoi added a commit to tensorchord/pgrx that referenced this pull request Oct 21, 2024