Skip to content

Conversation

@lecping
Copy link
Collaborator

@lecping lecping commented Dec 16, 2025

This pull request consists 6 SQL which can be run in psql bash.

  1. SQL to initialize database for dataset metadata, file metadata of dataset, tool registry data and user usage and history data.
  2. SQL to generate username and system account for scripts.
  3. Configure roles permission for datasetDB.
  4. Configure roles permission for fileDB.
  5. Configure roles permission for toolDB.
  6. Configure roles permission for userDB.

@lecping lecping requested review from Michal-Kolomanski and tobiasschweizer and removed request for Michal-Kolomanski December 16, 2025 14:58
@lecping
Copy link
Collaborator Author

lecping commented Dec 16, 2025

@Michal-Kolomanski and @tobiasschweizer , please have a first look on the SQL, it is very likely that we will move most of the dataset metadata to the DB called datasetDB instead of admin.

@lecping
Copy link
Collaborator Author

lecping commented Dec 16, 2025

ownership definition:
createDB.sql -> Metadata Warehouse Owner
user.sql -> Metadata Warehouse Owner
datasetDBconfig -> WP4
fileDBconfig -> WP5-6
toolDBconfig -> WP5-6
userDBconfig -> WP4

Copy link
Collaborator

@tobiasschweizer tobiasschweizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should these single SQL statements be executed? I think it would make sense to have a script that executes them in a certain order. What happens if the DBB already exists?

Note that there is already a script that creates the admin DB: https://github.com/EOSC-Data-Commons/metadata-warehouse/blob/main/scripts/postgres_data/create_db.py

@@ -0,0 +1,8 @@
-- database to store metadata about datasets
CREATE DATABASE datasetDB;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this relate to existing DB structure? Is datasetDB supposed to replace the existing DB structure?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, then this structure would have to comply with what we currently have since the existing code depends on it (SQL queries, FastAPI)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if admin is going to be replaced by datasetDB, then this would need to be adapted when creating the psycopg client as now it is assumed that the db user and the db name are the same.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it will be the case, I can update accordingly. with a database and user of the same name "admin"is not easy for monitoring purpose.

---------------------------------------------------------
-- CREATE GOLOBAL USERS

CREATE ROLE reggie LOGIN PASSWORD 'reggie';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another way to manage user? This seems hard to maintain as new users will join the project.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can write a procedure to skip those roles which have been already created before. So far we still do not have procedure.

Copy link
Collaborator

@Michal-Kolomanski Michal-Kolomanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid code duplication, suggested template:

-- =========================================
-- Database RBAC template for {{DB_NAME}}
-- =========================================

\c {{DB_NAME}}

-- -----------------------------------------
-- 1. CREATE ROLE GROUPS (NOLOGIN)
-- -----------------------------------------
CREATE ROLE {{DB_NAME}}_read NOLOGIN;
CREATE ROLE {{DB_NAME}}_readwrite NOLOGIN;
CREATE ROLE {{DB_NAME}}_admin NOLOGIN;

-- -----------------------------------------
-- 2. DATABASE-LEVEL SECURITY
-- -----------------------------------------

-- Nobody connects by default
REVOKE CONNECT ON DATABASE {{DB_NAME}} FROM PUBLIC;

-- Only role groups may connect
GRANT CONNECT ON DATABASE {{DB_NAME}}
  TO {{DB_NAME}}_read,
     {{DB_NAME}}_readwrite,
     {{DB_NAME}}_admin;

-- -----------------------------------------
-- 3. SCHEMA LOCKDOWN
-- -----------------------------------------

-- Remove dangerous defaults
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- -----------------------------------------
-- 4. SCHEMA ACCESS BY ROLE
-- -----------------------------------------

-- Read-only: see objects, nothing else
GRANT USAGE ON SCHEMA public TO {{DB_NAME}}_read;

-- Read-write: create & modify objects
GRANT USAGE, CREATE ON SCHEMA public TO {{DB_NAME}}_readwrite;

-- Admin: full control
GRANT ALL ON SCHEMA public TO {{DB_NAME}}_admin;

-- -----------------------------------------
-- 5. EXISTING OBJECT PRIVILEGES
-- -----------------------------------------

-- Tables
GRANT SELECT ON ALL TABLES IN SCHEMA public TO {{DB_NAME}}_read;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO {{DB_NAME}}_readwrite;
GRANT ALL ON ALL TABLES IN SCHEMA public TO {{DB_NAME}}_admin;

-- Sequences 
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO {{DB_NAME}}_readwrite;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO {{DB_NAME}}_admin;

-- Functions
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public
  TO {{DB_NAME}}_read, {{DB_NAME}}_readwrite;
GRANT ALL ON ALL FUNCTIONS IN SCHEMA public TO {{DB_NAME}}_admin;

-- -----------------------------------------
-- 6. DEFAULT PRIVILEGES (FUTURE OBJECTS)
-- -----------------------------------------

-- Tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO {{DB_NAME}}_read;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE ON TABLES TO {{DB_NAME}}_readwrite;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT ALL ON TABLES TO {{DB_NAME}}_admin;

-- Sequences
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT USAGE, SELECT ON SEQUENCES TO {{DB_NAME}}_readwrite;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT ALL ON SEQUENCES TO {{DB_NAME}}_admin;

-- Functions
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT EXECUTE ON FUNCTIONS
  TO {{DB_NAME}}_read, {{DB_NAME}}_readwrite;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT ALL ON FUNCTIONS TO {{DB_NAME}}_admin;

Then we can have a dedicated sql file for granting access - the only place users get power - easy to maintain and monitor, debug etc.

e.g.

-- datasetDB
GRANT datasetDB_read      TO reggie;
GRANT datasetDB_readwrite TO ping;

-- fileDB
GRANT fileDB_read TO reggie;

@@ -0,0 +1,8 @@
-- database to store metadata about datasets
CREATE DATABASE datasetDB;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably should define here the owner, encoding and change the default template to template0 to achieve separation (no inheritance)

CREATE DATABASE datasetDB
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TEMPLATE = template0;

CREATE DATABASE fileDB
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TEMPLATE = template0;

CREATE DATABASE toolDB
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TEMPLATE = template0;

CREATE DATABASE userDB
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TEMPLATE = template0;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Michal-Kolomanski, that is a good idea but I suggest to let each owner to maintain the configuration of their database. I think that it will take more time to debug and delay if we force each DB to have the same template.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main goal is to let DB owner to start fast and populate the required data for now. If we expected to have more than 100 databases and 100 user logins... your idea will be exactly needed for the refactoring.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the access control shouldn't be exactly the same for each DB?
We need the same roles in each DB, and only certain people should be able to read/write.
In my scenario, we only maintain a single file that pairs people with roles:

-- datasetDB
GRANT datasetDB_read      TO reggie;
GRANT datasetDB_readwrite TO ping;

We basically supply only access control and roles, everything else is custom made by the owners

---------------------------------------------------------
-- CREATE GOLOBAL USERS

CREATE ROLE reggie LOGIN PASSWORD 'reggie';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, we should set no privileges like:

CREATE ROLE reggie LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE jusong LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE ritwik LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE tobias LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE josip LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE michal LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE ping LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE vincent LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;
CREATE ROLE eko LOGIN PASSWORD 'CHANGE_ME' NOSUPERUSER NOCREATEDB NOCREATEROLE;

CREATE ROLE script LOGIN PASSWORD 'CHANGE_ME'
  NOSUPERUSER NOCREATEDB NOCREATEROLE;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is good, should I change them now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@@ -0,0 +1,41 @@
-- ---------------------------
-- DATABASE 1 – Dataset / FAIR, connect to datasetDB first
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to every DB

I believe that firstly, we need to lock down the PUBLIC schema. Without this any user that can connect to the DB, should be able to do anything

REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result, anyone can still:

  • Open a connection to datasetDB
  • Start a session
  • Consume a connection slot
  • See database metadata

-> They just can’t do much after connecting

to avoid this add also:
REVOKE CONNECT ON DATABASE datasetDB FROM PUBLIC;

Comment on lines +7 to +9
CREATE ROLE datasetDB_read;
CREATE ROLE datasetDB_readwrite;
CREATE ROLE datasetDB_admin;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rule:

Humans & services → LOGIN
Permissions → NOLOGIN

CREATE ROLE datasetDB_read NOLOGIN;
CREATE ROLE datasetDB_readwrite NOLOGIN;
CREATE ROLE datasetDB_admin NOLOGIN;

GRANT SELECT ON ALL TABLES IN SCHEMA public TO datasetDB_read;

-- READWRITE
GRANT USAGE, CREATE ON SCHEMA public TO datasetDB_readwrite;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing sequences & functions privileges

-- Default privileges for sequences
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT USAGE, SELECT ON SEQUENCES TO datasetDB_readwrite;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT ALL ON SEQUENCES TO datasetDB_admin;

-- Default privileges for functions
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT EXECUTE ON FUNCTIONS TO datasetDB_read, datasetDB_readwrite;

ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT ALL ON FUNCTIONS TO datasetDB_admin;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants