diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..e69de29b diff --git a/404.html b/404.html new file mode 100644 index 00000000..2c058bde --- /dev/null +++ b/404.html @@ -0,0 +1,756 @@ + + + + + + + + + + + + + + + + + + HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ +

404 - Not found

+ +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/architecture/architecture/index.html b/architecture/architecture/index.html new file mode 100644 index 00000000..f03cac6f --- /dev/null +++ b/architecture/architecture/index.html @@ -0,0 +1,1096 @@ + + + + + + + + + + + + + + + + + + + + + + Components & Architecture - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Architecture

+

Components

+

This chapter explains the core architectural components, their context views, and how HelloDATA works under the hood.

+

Domain View

+

We distinguish between two main domains: the Business Domain and the Data Domain.

+

Business vs. Data Domain

+
    +
  • "Business" domain: This domain holds one customer or company with general services (portal, orchestration, docs, monitoring, logging). Every business domain represents a business tenant. HelloDATA is running on a Kubernetes cluster. A business domain is treated as a dedicated namespace within that cluster; thus, multi-tenancy is set up by various namespaces in the Kubernetes cluster.
  • +
  • Data Domain: This is where the actual data is stored (database schema). We combine it with a Superset instance (related dashboards) and the documentation about this data. Currently, a Business Domain relates 1:n to its Data Domains. Within an existing Business Domain, a Data Domain can be spawned using Kubernetes deployment features and scripts that set up the database objects.
  • +
+

Resources encapsulated inside a Data Domain can be:

+
    +
  • Schema of the Data Domain
  • +
  • Data mart tables of the Data Domain
  • +
  • The entire DWH environment of the Data Domain
  • +
  • Data lineage documents of the dbt projects of the Data Domain.
  • Dashboards, charts, and datasets within the Superset instance of the Data Domain.
  • +
  • Airflow DAGs of the Data Domain.
  • +
+

On top, you can add subsystems. These can be seen as extensions that make HelloDATA pluggable with additional tools. We currently support CloudBeaver for viewing your Postgres databases, RtD, and Gitea. You can imagine adding almost any number of additional tools with capabilities you'd like to have (data catalog, semantic layer, a specific BI tool, Jupyter Notebooks, etc.).

+

Read more about Business and Data Domain access rights in Roles / Authorization Concept.

+

+

Data Domain

+

Zooming into the several Data Domains that can exist within a Business Domain, we see an example with Data Domains A-C. Each Data Domain has persistent storage, in our case Postgres (see more details in the Infrastructure Storage chapter below).

+

Each Data Domain might import different source systems; some sources might even be used in several Data Domains, as illustrated. Each Data Domain is meant to have its own data model, ranging from a straightforward model to, in the best case, a layered data model as shown in the image, with the following layers (a small code sketch of moving data between these layers follows further below):

+
Landing/Staging Area
+

Data from various source systems is first loaded into the Landing/Staging Area.

+
    +
  • In this first area, the data is stored as it is delivered; therefore, the stage tables' structure corresponds to the interface to the source system.
  • +
  • No relationships exist between the individual tables.
  • +
  • Each table contains the data from the latest delivery, which is deleted before the next delivery.
  • +
  • For example, in a grocery store, the Staging Area corresponds to the loading dock where suppliers (source systems) deliver their goods (data). Only the latest deliveries are stored there before being transferred to the next area.
  • +
+
Data Storage (Cleansing Area)
+

The delivered data must be cleaned before it is loaded into the Data Processing (Core). Most of these cleaning steps are performed in this area.

+
    +
  • Faulty data must be filtered, corrected, or complemented with singleton (default) values.
  • +
  • Data from different source systems must be transformed and integrated into a unified form.
  • +
  • This layer also contains only the data from the latest delivery.
  • +
  • For example, in a grocery store, the Cleansing Area can be compared to the area where the goods are commissioned for sale. The goods are unpacked, vegetables and salad are washed, the meat is portioned, possibly combined with multiple products, and everything is labeled with price tags. The quality control of the delivered goods also belongs in this area.
  • +
+
Data Processing (Core)
+

Through the Landing and Data Storage areas, the data from the different source systems is brought together in a central area, the Data Processing (Core), and stored there for extended periods, often several years.

+
    +
  • A primary task of this layer is to integrate the data from different sources and store it in a thematically structured way rather than separated by origin.
  • +
  • Often, thematic sub-areas in the Core are called "Subject Areas."
  • +
  • The data is stored in the Core so that historical data can be determined at any later point in time. 
  • +
  • The Core should be the only data source for the Data Marts.
  • +
  • Direct access to the Core by users should be avoided as much as possible.
  • +
+
Data Mart
+

Subsets of the data from the Core are stored in a form suitable for user queries. 

+
    +
  • Each Data Mart should only contain the data relevant to each application or a unique view of the data. This means several Data Marts are typically defined for different user groups and BI applications.
  • +
  • This reduces the complexity of the queries, increasing the acceptance of the DWH system among users.
  • +
  • For example, the Data Marts are the grocery store's market stalls or sales points. Each market stand offers a specific selection of goods, such as vegetables, meat, or cheese. The goods are presented so that they are accepted, i.e., purchased, by the respective customer group.
  • +
+
+
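To make the layer flow more concrete, here is a minimal sketch of the truncate-and-load pattern for the Landing/Staging area and a simple cleansing step, assuming Postgres and the psycopg2 driver. The schema and table names (landing.customer_raw, cleansing.customer), the CSV file, and the connection settings are hypothetical, not HelloDATA's actual objects.

```python
# Minimal sketch (not HelloDATA's actual loading code) of moving data from the
# Landing/Staging area to the Cleansing area. All object names are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="data_domain_a", user="dwh", password="secret"
)

with conn, conn.cursor() as cur:
    # Landing/Staging: keep only the latest delivery, stored as delivered.
    cur.execute("TRUNCATE TABLE landing.customer_raw;")
    with open("customer_delivery.csv") as f:
        cur.copy_expert(
            "COPY landing.customer_raw FROM STDIN WITH (FORMAT csv, HEADER true)", f
        )

    # Cleansing: filter faulty rows and fill missing values with defaults
    # before the data moves on to the Core layer.
    cur.execute("TRUNCATE TABLE cleansing.customer;")
    cur.execute(
        """
        INSERT INTO cleansing.customer (customer_id, name, country)
        SELECT customer_id,
               TRIM(name),
               COALESCE(country, 'unknown')   -- singleton/default value
        FROM landing.customer_raw
        WHERE customer_id IS NOT NULL;        -- drop faulty rows
        """
    )

conn.close()
```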

Between the layers, we have lots of Metadata

+

Different types of metadata are needed for the smooth operation of the Data Warehouse. Business metadata contains business descriptions of all attributes, drill paths, and aggregation rules for the front-end applications and code designations. Technical metadata describes, for example, data structures, mapping rules, and parameters for ETL control. Operational metadata contains all log tables, error messages, logging of ETL processes, and much more. The metadata forms the infrastructure of a DWH system and is described as "data about data".

+
+

+
Example: Multiple Superset Dashboards within a Data Domain
+

Within a Data Domain, several users build up different dashboards. Think of a dashboard as a specific use case, e.g., Covid or Sales, that serves a particular purpose. Each of these dashboards consists of individual charts and data sources in Superset. Ultimately, what you see in the HelloDATA portal are the dashboards that combine all of these Superset sub-components.

+

+

Portal/UI View

+

As described in the intro, the portal is the heart of the HelloDATA application, providing access to all critical applications.

+

Entry page of helloDATA: When you enter the portal for the first time, you land on the dashboard, where you have:

+
    +
  1. Navigation to jump to the different capabilities of helloDATA
  2. +
  3. Extended status information about
      +
    1. data pipelines, containers, performance, and security
    2. +
    3. documentation and subscriptions
    4. +
    +
  4. +
  5. User and profile information of logged-in users. 
  6. +
  7. Choosing the data domain you want to work with within your business domain
  8. +
  9. Overview of your dashboards
  10. +
  11. dbt lineage docs
  12. +
  13. Data marts of your Postgres database
  14. +
  15. Answers to frequently asked questions
  16. +
+

More technical details are in the "Module deployment view" chapter below.

+

+

Module View and Communication

+

Modules

+

Going one level deeper, we see that we use different modules to make the portal and helloDATA work. 

+

We have the following modules:

+
    +
  • Keycloak: Open-source identity and access management. This handles everything related to user permissions and roles in a central place that we integrate into helloDATA.
  • +
  • Redis: An open-source, in-memory data store that we use to persist technical values the portal needs to work.
  • +
  • NATS: Open-source connective technology for the cloud. It handles communication with the different tools we use.
  • +
  • Data Stack: We use the open-source data stack with dbt, Airflow, and Superset. See more information in the intro chapters above. Subsystems can be added on demand as extensible plugins.
  • +
+

+
What is Keycloak and how does it work?
+

At the center are two components, NATS and Keycloak. Keycloak, together with the HelloDATA portal, handles the authentication, authorization, and permission management of HelloDATA components. Keycloak is a powerful open-source identity and access management system. Its primary benefits include:

+
    +
  1. Ease of Use: Keycloak is easy to set up and use and can be deployed on-premise or in the cloud.
  2. +
  3. Integration: It integrates seamlessly with existing applications and systems, providing a secure way of authenticating users and allowing them to access various resources and services with a single set of credentials.
  4. +
  5. Single Sign-On: Keycloak takes care of user authentication, freeing applications from having to handle login forms, user authentication, and user storage. Users can log in once and access all applications linked to Keycloak without needing to re-authenticate. This extends to logging out, with Keycloak offering single sign-out across all linked applications.
  6. +
  7. Identity Brokering and Social Login: Keycloak can authenticate users with existing OpenID Connect or SAML 2.0 Identity Providers and easily enable social network logins without requiring changes to your application's code.
  8. +
  9. User Federation: Keycloak has the capacity to connect to existing LDAP or Active Directory servers and can support custom providers for users stored in other databases.
  10. +
  11. Admin Console: Through the admin console, administrators can manage all aspects of the Keycloak server, including features, identity brokering, user federation, applications, services, authorization policies, user permissions, and sessions.
  12. +
  13. Account Management Console: Users can manage their own accounts, update profiles, change passwords, set up two-factor authentication, manage sessions, view account history, and link accounts with additional identity providers if social login or identity brokering has been enabled.
  14. +
  15. Standard Protocols: Keycloak is built on standard protocols, offering support for OpenID Connect, OAuth 2.0, and SAML.
  16. +
  17. Fine-Grained Authorization Services: Beyond role-based authorization, Keycloak provides fine-grained authorization services, enabling the management of permissions for all services from the Keycloak admin console. This allows for the creation of specific policies to meet unique needs. Within HelloDATA, the HelloDATA portal manages authorization, yet if required by upcoming subsystems, this Keycloak feature can be utilized in tandem.
  18. +
  19. Two-Factor Authentication (2FA): This optional feature of Keycloak enhances security by requiring users to provide two forms of authentication before gaining access, adding an extra layer of protection to the authentication process.
  20. +
+
What is NATS and how does it work?
+

NATS, on the other hand, is central for handling communication between the different modules. Its power comes from integrating modern distributed systems: it is the glue between microservices, whether for passing and processing messages or for stream processing.

+

NATS focuses on hyper-connected moving parts and additional data each module generates. It supports location independence and mobility, whether the backend process is streaming or otherwise, and securely handles all of it.

+

NATS lets mobile frontends or microservices connect flexibly. There is no need for static 1:1 communication with a hostname, IP, or port; instead, NATS gives you m:n connectivity based on subjects. You can still use 1:1 communication, but on top you get things like load balancers, logs, system and network security models, proxies, and, most essential for us, sidecars. We use sidecars heavily in connection with NATS.

+

NATS can be deployed nearly anywhere: on bare metal, in a VM, as a container, inside K8S, on a device, or in whichever environment you choose. And all fully secure.
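As a small illustration of subject-based m:n messaging, here is a sketch using the nats-py client. The subject name and JSON payload are made up for the example and are not HelloDATA's actual message contract.

```python
# Minimal sketch of subject-based publish/subscribe with NATS (nats-py client).
# The subject "hd.subsystem.events" and the payload are hypothetical.
import asyncio
import json
import nats


async def main():
    nc = await nats.connect("nats://localhost:4222")

    # Any number of workers can subscribe to the same subject (m:n connectivity);
    # no hostnames, IPs, or ports of the peers are required.
    async def handler(msg):
        event = json.loads(msg.data.decode())
        print(f"received on {msg.subject}: {event}")

    await nc.subscribe("hd.subsystem.events", cb=handler)

    # A publisher only needs to know the subject, not who is listening.
    await nc.publish(
        "hd.subsystem.events",
        json.dumps({"type": "ping", "source": "portal"}).encode(),
    )

    await nc.flush()
    await nc.drain()


asyncio.run(main())
```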

+

Subsystem communication

+

Here is an example of subsystem communication. NATS, obviously at the center, handles these communications between the HelloDATA platform and the subsystems with its workers, as seen in the image below.

+

The HelloDATA portal has workers. These workers are deployed as extra containers next to the main container, called "Sidecar Containers". Each module that needs to communicate requires such a sidecar with these workers deployed in order to talk to NATS. Therefore, the subsystem itself has its own workers that communicate with NATS as well.

+

+

Messaging component workers

+

Everything starts with a web browser session. The HelloDATA user accesses the HelloDATA Portal through HTTP. Before you see any of your modules or components, you must authenticate against Keycloak. Once logged in, you have a single sign-on token that gives you access to different business domains or data domains depending on your role.
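As a sketch of what such a login looks like technically, the snippet below requests an OpenID Connect token from Keycloak's standard token endpoint (on older Keycloak versions the path is prefixed with /auth). The realm, client id, and credentials are placeholders, not HelloDATA's actual configuration.

```python
# Minimal sketch of obtaining an OpenID Connect token from Keycloak.
# Realm, client id, and credentials are placeholders.
import requests

KEYCLOAK_BASE = "https://keycloak.example.com"
REALM = "hellodata"  # hypothetical realm
TOKEN_URL = f"{KEYCLOAK_BASE}/realms/{REALM}/protocol/openid-connect/token"

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "password",   # direct access grant, for illustration only
        "client_id": "hd-portal",   # hypothetical client
        "username": "alice",
        "password": "secret",
    },
    timeout=10,
)
resp.raise_for_status()
tokens = resp.json()

# The access token can now be sent as a Bearer token to the portal or subsystems.
headers = {"Authorization": f"Bearer {tokens['access_token']}"}
print(headers)
```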

+

The HelloDATA portal sends an event to the EventWorkers via JDBC to the Portal database. The portal database persists settings from the portal and necessary configurations.

+

The EventWorkers, on the other side, communicate with the different HelloDATA modules discussed above (Keycloak, NATS, and the data stack with dbt, Airflow, and Superset) where needed. Each module is part of the domain view and persists its data within its own datastore.

+

+

Flow Chart

+

In this flow chart, you see again what we discussed above in a different way. Here, we assign a new user role. Again, everything starts with the HelloDATA Portal and an existing session from Keycloak. With that, the portal worker will publish a JSON message via UserRoleEvent to NATS. As the communication hub for HelloDATA, NATS knows what to do with each message and sends it to the respective subsystem worker.

+

Subsystem workers execute that instruction and create and populate the roles on, e.g., Superset and Airflow. Once done, the spawned subsystem worker pushes the result back to NATS, which informs the portal worker, and at the end a message is shown in the HelloDATA portal.

+

+

Building Block View

+

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/architecture/data-stack/index.html b/architecture/data-stack/index.html new file mode 100644 index 00000000..0711b54e --- /dev/null +++ b/architecture/data-stack/index.html @@ -0,0 +1,956 @@ + + + + + + + + + + + + + + + + + + + + + + Data Stack - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Data Stack

+

We'll explain which data stack is behind HelloDATA BE.

+

Control Pane - Portal

+

The differentiator of HelloDATA lies in the Portal. It combines all the loosely coupled open-source tools into a single control pane.

+

The portal lets you see:

+
    +
  • Data models with a dbt lineage: You see the sources of a given table or even column.
  • +
  • You can check out the latest runs, which tell you when the dashboards were last updated.
  • +
  • Create and view all company-wide reports and dashboards.
  • +
  • View your data tables as Data Marts: Accessing physical tables, columns, and schemas.
  • +
  • Central Monitoring of all processes running in the portal.
  • +
  • Manage and control all your user access, role permissions, and authorization.
  • +
+

You can find more about the navigation and the features in the User Manual.

+

Data Modeling with SQL - dbt

+

dbt is a small toolset for working with databases that has gained immense popularity and is the de facto standard for working with SQL. Why, you might ask? SQL is the most used language besides Python for data engineers: it is declarative, its basics are easy to learn, and many business analysts or people working with Excel or similar tools already know a little of it.

+

The declarative approach is handy as you only define the what: you determine which columns you want in the SELECT and which table to query in the FROM statement. You can do more advanced things with WHERE, GROUP BY, etc., but you do not need to care about the how. You do not need to care about which database or partition the data is stored in, which segment, or what storage is used. You do not need to know if an index makes sense to use. All of it is handled by the query optimizer of Postgres (or any database supporting SQL).

+

But let's face it: SQL also has its downsides. If you have worked extensively with SQL, you know the spaghetti code that usually results. Repeatability is an issue: there is no variable we can set and reuse across SQL statements. If you are familiar with them, you can achieve a better structure with CTEs, which allow you to define specific queries as a block to reuse later. But this works only within one single query, and is mainly handy if the query is already long.

+

But what if you'd like to define your facts and dimensions as separate queries and reuse them in other queries? You'd need to persist each query's result to disk and use that table as the FROM source of the following query. But if we change something in a query, or even rename it, we won't notice the breakage in the dependent queries, and we'd have to find out manually which queries depend on each other. There is no lineage or dependency graph.

+

It takes a lot of work to stay organized with plain SQL. Databases themselves offer little support here, as SQL is declarative: you have to figure out yourself how to store the queries in Git and how to run them.

+

That's where dbt comes into play. dbt lets you create these dependencies within SQL. You can declaratively build on each query, and you'll get errors if one changes but not the dependent one. You get a lineage graph (see an example), unit tests, and more. It's like you have an assistant that helps you do your job. It's added software engineering practice that we stitch on top of SQL engineering.

+

The danger to be aware of, since it becomes so easy to build models, is ending up with thousands upon thousands of tables. Even though dbt's pre-compilation catches many errors for you, good data modeling techniques are essential to succeed.

+

Below, you see dbt docs, lineage, and templates:
1. Project Navigation
2. Detail Navigation
3. SQL Template
4. SQL Compiled (the actual SQL that gets executed)
5. Full data lineage with the source and transformations for the current object

+

+

Or zoom into the dbt lineage (when clicked):

+

Task Orchestration - Airflow

+

Airflow is the natural next step. If you have many SQL queries representing your business metrics, you want them to run on a daily or hourly schedule, or triggered by events. That's where Airflow comes into play. Airflow is, in its simplest terms, a task or workflow scheduler whose tasks are grouped into DAGs (as they are called) and written programmatically in Python. If you know cron jobs, they are the most basic task scheduler in Linux (think * * * * *), but offer little to no customization beyond simple time scheduling.

+

Airflow is different. Writing the DAGs in Python allows you to do whatever your business logic requires before or after a particular task is started. In the past, ETL tools like Microsoft SQL Server Integration Services (SSIS) and others were widely used. They were where your data transformation, cleaning, and normalization took place. In more modern architectures, these tools aren't enough anymore. Moreover, code and data transformation logic are much more valuable to other data-savvy people (data analysts, data scientists, business analysts) in the company when they are not locked away in a proprietary format.

+

Airflow, or an orchestrator in general, ensures the correct execution of dependent tasks. It is very flexible and extensible with operators from the community or the built-in capabilities of the framework itself.
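As an illustration, here is a minimal Airflow 2.x DAG sketch that loads the staging area and then runs and tests dbt models. The DAG id, schedule, ingest script, and dbt project path are hypothetical and only show how task dependencies are expressed.

```python
# Minimal sketch of an Airflow DAG with dependent tasks. All names and paths
# are hypothetical; HelloDATA's real DAGs will differ.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="data_domain_a_daily",
    start_date=datetime(2024, 1, 1),
    schedule="0 5 * * *",  # daily at 05:00 (Airflow 2.4+; older versions use schedule_interval)
    catchup=False,
) as dag:
    load_staging = BashOperator(
        task_id="load_staging",
        bash_command="python /opt/ingest/load_staging.py",  # hypothetical script
    )

    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/data_domain_a",
    )

    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/data_domain_a",
    )

    # The orchestrator guarantees the order: staging load before dbt models,
    # tests only after the models have been built.
    load_staging >> dbt_run >> dbt_test
```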

+

Default View

+

Airflow DAGs - Entry page, which shows you the status of all your DAGs:
- what the schedule of each job is
- whether they are active, how often they have failed, etc.

+

Next, you can click on each of the DAGs and get into a detailed view: +

+

Airflow operations overview for one DAG

+
    +
  1. General visualization options; pick whichever view you prefer (here: Grid view)
  2. +
  3. filter your DAG runs
  4. +
  5. see details on each run status in one view 
  6. +
  7. Check details in the table view
  8. +
  9. Gantt view, as another example, to see how long each sub-task of the DAG took
  10. +
+

+

+

Graph view of DAG

+

It shows you the dependencies of your business's various tasks, ensuring that the order is handled correctly.

+

+

Dashboards - Superset

+

Superset is the entry point to your data. It's a popular open-source business intelligence dashboard tool that visualizes your data according to your needs. It can handle all the latest chart types, and you can combine them into dashboards that can be filtered and drilled down as expected from a BI tool. Access to dashboards is restricted to authenticated users only. A user can be given view or edit rights to individual dashboards using roles and permissions. Public access to dashboards is not supported.

+

Example dashboard

+

+

Supported Charts

+

(see live in action)

+

+

Storage Layer - Postgres

+

Let's start with the storage layer. We use Postgres, currently one of the most used and loved databases. Postgres is versatile and simple to use. It's a relational database that can be customized and scaled extensively.
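As a small sketch of how the data mart tables exposed in the portal can be inspected directly in Postgres, the snippet below lists tables and columns of a schema via information_schema, using psycopg2. The schema name "data_mart" and the connection settings are placeholders.

```python
# Minimal sketch of inspecting data mart tables in Postgres via information_schema.
# Schema name and connection settings are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="localhost", dbname="data_domain_a", user="readonly", password="secret"
)

with conn.cursor() as cur:
    cur.execute(
        """
        SELECT table_name, column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = %s
        ORDER BY table_name, ordinal_position;
        """,
        ("data_mart",),
    )
    for table_name, column_name, data_type in cur.fetchall():
        print(f"{table_name}.{column_name}: {data_type}")

conn.close()
```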

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/architecture/infrastructure/index.html b/architecture/infrastructure/index.html new file mode 100644 index 00000000..d0118b03 --- /dev/null +++ b/architecture/infrastructure/index.html @@ -0,0 +1,934 @@ + + + + + + + + + + + + + + + + + + + + + + Infrastructure - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + Skip to content + + +
+
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Infrastructure

+

Infrastructure is the part where we go into depth about how to run HelloDATA and its components on Kubernetes. 

+

Kubernetes

+

Kubernetes as a platform allows you to run and orchestrate container workloads. Kubernetes has become popular and is the de facto standard for cloud-native apps: it lets you (auto-)scale out and deploy the various open-source tools fast, on any cloud and locally. This is called cloud-agnostic, as you are not locked into any cloud vendor (Amazon, Microsoft, Google, etc.).

+

Kubernetes is infrastructure as code, specifically YAML, allowing you to version and test your deployments quickly. All resources in Kubernetes, including Pods, configurations, Deployments, Volumes, etc., can be expressed in YAML files using Kubernetes tools like Helm. Developers can quickly write applications that run across multiple operating environments. Costs can be reduced by scaling down, and any programming language can be used as long as it runs from a simple Dockerfile. Kubernetes' modularity and abstraction make it manageable, and with containers, you can monitor all your applications in one place.

+

Kubernetes Namespaces provide a mechanism for isolating groups of resources within a single cluster. Names of resources need to be unique within a namespace, but not across namespaces. Namespace-based scoping applies only to namespaced objects (e.g., Deployments, Services, etc.) and not to cluster-wide objects (e.g., StorageClass, Nodes, PersistentVolumes, etc.).

+
    +
  • Namespaces provide a mechanism for isolating groups of resources within a single cluster (separation of concerns). Namespaces also let you easily ramp up several HelloDATA instances on demand (see the sketch after this list).
      +
    • Names of resources need to be unique within a namespace but not across namespaces.
    • +
    +
  • +
  • We get central monitoring and logging solutions with Grafana, Prometheus, and the ELK stack (Elasticsearch, Logstash, and Kibana), as well as Keycloak single sign-on.
  • +
  • Everything runs in a single Kubernetes Cluster but can also be deployed on-prem on any Kubernetes Cluster.
  • +
  • Persistent data lives within the "Data Domain" and must be stored on a Persistent Volume in Kubernetes or in a central Postgres service (e.g., on Azure or internal).
  • +
+
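As a sketch of how such an isolated namespace could be created programmatically, the snippet below uses the official Kubernetes Python client. The namespace name and labels are hypothetical; the actual HelloDATA deployment provisions its namespaces through its own Kubernetes deployment features and scripts.

```python
# Minimal sketch of ramping up an isolated namespace per HelloDATA instance /
# business domain with the official Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

namespace = client.V1Namespace(
    metadata=client.V1ObjectMeta(
        name="hellodata-business-domain-a",          # hypothetical name
        labels={"app": "hellodata", "tenant": "business-domain-a"},
    )
)
v1.create_namespace(body=namespace)

# Resource names only need to be unique within this namespace,
# so several HelloDATA instances can coexist in one cluster.
```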

+

Module deployment view

+

Here, we have a look at the module view with an inside view of accessing the HelloDATA Portal.

+

The Portal API is served with Spring Boot, Wildfly, and Angular.

+

+

+

Storage (Data Domain)

+

Following up on how storage is persisted for the Domain View introduced in the chapters above.

+

Data-Domain Storage View

+

Storage is an important topic, as this is where the business value and the data itself are stored.

+

From a Kubernetes and deployment view, everything is encapsulated inside a Namespace. As explained in the above "Domain View", we have different layers from one Business domain (here Business Domain) to n (multiple) Data Domains. 

+

Each domain holds its data on persistent storage, whether Postgres for relational databases, blob storage for files or file storage on persistent volumes within Kubernetes.

+

GitSync is a tool we added to allow GitOps-type deployment. As a user, you can push changes to your git repo, and GitSync will automatically deploy that into your cluster on Kubernetes.

+

+

Business-Domain Storage View

+

Here is another view showing that persistent storage within Kubernetes (K8s) can hold data across Data Domains. If these persistent volumes are used to store Data Domain information, this also requires implementing a backup and restore plan for this data.

+

Alternatively, blob storage on any cloud vendor or services such as Postgres service can be used, as these are typically managed and come with features such as backup and restore.

+

+

K8s Jobs

+

HelloDATA uses Kubernetes jobs to perform certain activities.

+

Cleanup Jobs

+

Contents:

+
    +
  • Cleaning up user activity logs
  • +
  • Cleaning up logfiles
  • +
+

+

Deployment Platforms

+

HelloDATA can be operated as different platforms, e.g., development, test, and/or production. The deployment is based on common CI/CD principles; it uses Git and Flux internally to deploy its resources onto the specific Kubernetes clusters.
In case of resource shortages, the underlying platform can be extended with additional resources upon request.
Horizontal scaling of the infrastructure can be done within the given resource boundaries (e.g., multiple pods for Superset).

+

Platform Authentication Authorization

+

See at Roles and authorization concept.

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/assets/all logos/Favicon_HelloData BE.svg b/assets/all logos/Favicon_HelloData BE.svg new file mode 100644 index 00000000..3a399fec --- /dev/null +++ b/assets/all logos/Favicon_HelloData BE.svg @@ -0,0 +1,59 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/assets/all logos/Favicon_HelloData BE_ gross.png b/assets/all logos/Favicon_HelloData BE_ gross.png new file mode 100644 index 00000000..74baa0ac Binary files /dev/null and b/assets/all logos/Favicon_HelloData BE_ gross.png differ diff --git a/assets/all logos/Favicon_HelloData BE_64px.ico b/assets/all logos/Favicon_HelloData BE_64px.ico new file mode 100644 index 00000000..b29def47 Binary files /dev/null and b/assets/all logos/Favicon_HelloData BE_64px.ico differ diff --git a/assets/all logos/Icon_HelloData BE.svg b/assets/all logos/Icon_HelloData BE.svg new file mode 100644 index 00000000..5cc7afb6 --- /dev/null +++ b/assets/all logos/Icon_HelloData BE.svg @@ -0,0 +1,63 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/assets/all logos/Icon_HelloData BE_128px.ico b/assets/all logos/Icon_HelloData BE_128px.ico new file mode 100644 index 00000000..02577b50 Binary files /dev/null and b/assets/all logos/Icon_HelloData BE_128px.ico differ diff --git a/assets/all logos/Icon_HelloData BE_gross.png b/assets/all logos/Icon_HelloData BE_gross.png new file mode 100644 index 00000000..8cd469cb Binary files /dev/null and b/assets/all logos/Icon_HelloData BE_gross.png differ diff --git a/assets/favicon-16x16.png b/assets/favicon-16x16.png new file mode 100644 index 00000000..74baa0ac Binary files /dev/null and b/assets/favicon-16x16.png differ diff --git a/assets/favicon-32x32.png b/assets/favicon-32x32.png new file mode 100644 index 00000000..74baa0ac Binary files /dev/null and b/assets/favicon-32x32.png differ diff --git a/assets/favicon.ico b/assets/favicon.ico new file mode 100644 index 00000000..b29def47 Binary files /dev/null and b/assets/favicon.ico differ diff --git a/assets/favicon.png b/assets/favicon.png new file mode 100644 index 00000000..74baa0ac Binary files /dev/null and b/assets/favicon.png differ diff --git a/assets/images/favicon.png b/assets/images/favicon.png new file mode 100644 index 00000000..1cf13b9f Binary files /dev/null and b/assets/images/favicon.png differ diff --git a/assets/javascripts/bundle.51d95adb.min.js b/assets/javascripts/bundle.51d95adb.min.js new file mode 100644 index 00000000..b20ec683 --- /dev/null +++ b/assets/javascripts/bundle.51d95adb.min.js @@ -0,0 +1,29 @@ +"use strict";(()=>{var Hi=Object.create;var xr=Object.defineProperty;var Pi=Object.getOwnPropertyDescriptor;var $i=Object.getOwnPropertyNames,kt=Object.getOwnPropertySymbols,Ii=Object.getPrototypeOf,Er=Object.prototype.hasOwnProperty,an=Object.prototype.propertyIsEnumerable;var on=(e,t,r)=>t in e?xr(e,t,{enumerable:!0,configurable:!0,writable:!0,value:r}):e[t]=r,P=(e,t)=>{for(var r in t||(t={}))Er.call(t,r)&&on(e,r,t[r]);if(kt)for(var r of kt(t))an.call(t,r)&&on(e,r,t[r]);return e};var sn=(e,t)=>{var r={};for(var n in e)Er.call(e,n)&&t.indexOf(n)<0&&(r[n]=e[n]);if(e!=null&&kt)for(var n of kt(e))t.indexOf(n)<0&&an.call(e,n)&&(r[n]=e[n]);return r};var Ht=(e,t)=>()=>(t||e((t={exports:{}}).exports,t),t.exports);var Fi=(e,t,r,n)=>{if(t&&typeof t=="object"||typeof t=="function")for(let o of 
$i(t))!Er.call(e,o)&&o!==r&&xr(e,o,{get:()=>t[o],enumerable:!(n=Pi(t,o))||n.enumerable});return e};var yt=(e,t,r)=>(r=e!=null?Hi(Ii(e)):{},Fi(t||!e||!e.__esModule?xr(r,"default",{value:e,enumerable:!0}):r,e));var fn=Ht((wr,cn)=>{(function(e,t){typeof wr=="object"&&typeof cn!="undefined"?t():typeof define=="function"&&define.amd?define(t):t()})(wr,function(){"use strict";function e(r){var n=!0,o=!1,i=null,a={text:!0,search:!0,url:!0,tel:!0,email:!0,password:!0,number:!0,date:!0,month:!0,week:!0,time:!0,datetime:!0,"datetime-local":!0};function s(T){return!!(T&&T!==document&&T.nodeName!=="HTML"&&T.nodeName!=="BODY"&&"classList"in T&&"contains"in T.classList)}function f(T){var Ke=T.type,We=T.tagName;return!!(We==="INPUT"&&a[Ke]&&!T.readOnly||We==="TEXTAREA"&&!T.readOnly||T.isContentEditable)}function c(T){T.classList.contains("focus-visible")||(T.classList.add("focus-visible"),T.setAttribute("data-focus-visible-added",""))}function u(T){T.hasAttribute("data-focus-visible-added")&&(T.classList.remove("focus-visible"),T.removeAttribute("data-focus-visible-added"))}function p(T){T.metaKey||T.altKey||T.ctrlKey||(s(r.activeElement)&&c(r.activeElement),n=!0)}function m(T){n=!1}function d(T){s(T.target)&&(n||f(T.target))&&c(T.target)}function h(T){s(T.target)&&(T.target.classList.contains("focus-visible")||T.target.hasAttribute("data-focus-visible-added"))&&(o=!0,window.clearTimeout(i),i=window.setTimeout(function(){o=!1},100),u(T.target))}function v(T){document.visibilityState==="hidden"&&(o&&(n=!0),B())}function B(){document.addEventListener("mousemove",z),document.addEventListener("mousedown",z),document.addEventListener("mouseup",z),document.addEventListener("pointermove",z),document.addEventListener("pointerdown",z),document.addEventListener("pointerup",z),document.addEventListener("touchmove",z),document.addEventListener("touchstart",z),document.addEventListener("touchend",z)}function re(){document.removeEventListener("mousemove",z),document.removeEventListener("mousedown",z),document.removeEventListener("mouseup",z),document.removeEventListener("pointermove",z),document.removeEventListener("pointerdown",z),document.removeEventListener("pointerup",z),document.removeEventListener("touchmove",z),document.removeEventListener("touchstart",z),document.removeEventListener("touchend",z)}function z(T){T.target.nodeName&&T.target.nodeName.toLowerCase()==="html"||(n=!1,re())}document.addEventListener("keydown",p,!0),document.addEventListener("mousedown",m,!0),document.addEventListener("pointerdown",m,!0),document.addEventListener("touchstart",m,!0),document.addEventListener("visibilitychange",v,!0),B(),r.addEventListener("focus",d,!0),r.addEventListener("blur",h,!0),r.nodeType===Node.DOCUMENT_FRAGMENT_NODE&&r.host?r.host.setAttribute("data-js-focus-visible",""):r.nodeType===Node.DOCUMENT_NODE&&(document.documentElement.classList.add("js-focus-visible"),document.documentElement.setAttribute("data-js-focus-visible",""))}if(typeof window!="undefined"&&typeof document!="undefined"){window.applyFocusVisiblePolyfill=e;var t;try{t=new CustomEvent("focus-visible-polyfill-ready")}catch(r){t=document.createEvent("CustomEvent"),t.initCustomEvent("focus-visible-polyfill-ready",!1,!1,{})}window.dispatchEvent(t)}typeof document!="undefined"&&e(document)})});var un=Ht(Sr=>{(function(e){var t=function(){try{return!!Symbol.iterator}catch(c){return!1}},r=t(),n=function(c){var u={next:function(){var p=c.shift();return{done:p===void 0,value:p}}};return r&&(u[Symbol.iterator]=function(){return u}),u},o=function(c){return 
encodeURIComponent(c).replace(/%20/g,"+")},i=function(c){return decodeURIComponent(String(c).replace(/\+/g," "))},a=function(){var c=function(p){Object.defineProperty(this,"_entries",{writable:!0,value:{}});var m=typeof p;if(m!=="undefined")if(m==="string")p!==""&&this._fromString(p);else if(p instanceof c){var d=this;p.forEach(function(re,z){d.append(z,re)})}else if(p!==null&&m==="object")if(Object.prototype.toString.call(p)==="[object Array]")for(var h=0;hd[0]?1:0}),c._entries&&(c._entries={});for(var p=0;p1?i(d[1]):"")}})})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Sr);(function(e){var t=function(){try{var o=new e.URL("b","http://a");return o.pathname="c d",o.href==="http://a/c%20d"&&o.searchParams}catch(i){return!1}},r=function(){var o=e.URL,i=function(f,c){typeof f!="string"&&(f=String(f)),c&&typeof c!="string"&&(c=String(c));var u=document,p;if(c&&(e.location===void 0||c!==e.location.href)){c=c.toLowerCase(),u=document.implementation.createHTMLDocument(""),p=u.createElement("base"),p.href=c,u.head.appendChild(p);try{if(p.href.indexOf(c)!==0)throw new Error(p.href)}catch(T){throw new Error("URL unable to set base "+c+" due to "+T)}}var m=u.createElement("a");m.href=f,p&&(u.body.appendChild(m),m.href=m.href);var d=u.createElement("input");if(d.type="url",d.value=f,m.protocol===":"||!/:/.test(m.href)||!d.checkValidity()&&!c)throw new TypeError("Invalid URL");Object.defineProperty(this,"_anchorElement",{value:m});var h=new e.URLSearchParams(this.search),v=!0,B=!0,re=this;["append","delete","set"].forEach(function(T){var Ke=h[T];h[T]=function(){Ke.apply(h,arguments),v&&(B=!1,re.search=h.toString(),B=!0)}}),Object.defineProperty(this,"searchParams",{value:h,enumerable:!0});var z=void 0;Object.defineProperty(this,"_updateSearchParams",{enumerable:!1,configurable:!1,writable:!1,value:function(){this.search!==z&&(z=this.search,B&&(v=!1,this.searchParams._fromString(this.search),v=!0))}})},a=i.prototype,s=function(f){Object.defineProperty(a,f,{get:function(){return this._anchorElement[f]},set:function(c){this._anchorElement[f]=c},enumerable:!0})};["hash","host","hostname","port","protocol"].forEach(function(f){s(f)}),Object.defineProperty(a,"search",{get:function(){return this._anchorElement.search},set:function(f){this._anchorElement.search=f,this._updateSearchParams()},enumerable:!0}),Object.defineProperties(a,{toString:{get:function(){var f=this;return function(){return f.href}}},href:{get:function(){return this._anchorElement.href.replace(/\?$/,"")},set:function(f){this._anchorElement.href=f,this._updateSearchParams()},enumerable:!0},pathname:{get:function(){return this._anchorElement.pathname.replace(/(^\/?)/,"/")},set:function(f){this._anchorElement.pathname=f},enumerable:!0},origin:{get:function(){var f={"http:":80,"https:":443,"ftp:":21}[this._anchorElement.protocol],c=this._anchorElement.port!=f&&this._anchorElement.port!=="";return this._anchorElement.protocol+"//"+this._anchorElement.hostname+(c?":"+this._anchorElement.port:"")},enumerable:!0},password:{get:function(){return""},set:function(f){},enumerable:!0},username:{get:function(){return""},set:function(f){},enumerable:!0}}),i.createObjectURL=function(f){return o.createObjectURL.apply(o,arguments)},i.revokeObjectURL=function(f){return o.revokeObjectURL.apply(o,arguments)},e.URL=i};if(t()||r(),e.location!==void 0&&!("origin"in e.location)){var n=function(){return 
e.location.protocol+"//"+e.location.hostname+(e.location.port?":"+e.location.port:"")};try{Object.defineProperty(e.location,"origin",{get:n,enumerable:!0})}catch(o){setInterval(function(){e.location.origin=n()},100)}}})(typeof global!="undefined"?global:typeof window!="undefined"?window:typeof self!="undefined"?self:Sr)});var Qr=Ht((Lt,Kr)=>{/*! + * clipboard.js v2.0.11 + * https://clipboardjs.com/ + * + * Licensed MIT © Zeno Rocha + */(function(t,r){typeof Lt=="object"&&typeof Kr=="object"?Kr.exports=r():typeof define=="function"&&define.amd?define([],r):typeof Lt=="object"?Lt.ClipboardJS=r():t.ClipboardJS=r()})(Lt,function(){return function(){var e={686:function(n,o,i){"use strict";i.d(o,{default:function(){return ki}});var a=i(279),s=i.n(a),f=i(370),c=i.n(f),u=i(817),p=i.n(u);function m(j){try{return document.execCommand(j)}catch(O){return!1}}var d=function(O){var w=p()(O);return m("cut"),w},h=d;function v(j){var O=document.documentElement.getAttribute("dir")==="rtl",w=document.createElement("textarea");w.style.fontSize="12pt",w.style.border="0",w.style.padding="0",w.style.margin="0",w.style.position="absolute",w.style[O?"right":"left"]="-9999px";var k=window.pageYOffset||document.documentElement.scrollTop;return w.style.top="".concat(k,"px"),w.setAttribute("readonly",""),w.value=j,w}var B=function(O,w){var k=v(O);w.container.appendChild(k);var F=p()(k);return m("copy"),k.remove(),F},re=function(O){var w=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body},k="";return typeof O=="string"?k=B(O,w):O instanceof HTMLInputElement&&!["text","search","url","tel","password"].includes(O==null?void 0:O.type)?k=B(O.value,w):(k=p()(O),m("copy")),k},z=re;function T(j){return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?T=function(w){return typeof w}:T=function(w){return w&&typeof Symbol=="function"&&w.constructor===Symbol&&w!==Symbol.prototype?"symbol":typeof w},T(j)}var Ke=function(){var O=arguments.length>0&&arguments[0]!==void 0?arguments[0]:{},w=O.action,k=w===void 0?"copy":w,F=O.container,q=O.target,Le=O.text;if(k!=="copy"&&k!=="cut")throw new Error('Invalid "action" value, use either "copy" or "cut"');if(q!==void 0)if(q&&T(q)==="object"&&q.nodeType===1){if(k==="copy"&&q.hasAttribute("disabled"))throw new Error('Invalid "target" attribute. Please use "readonly" instead of "disabled" attribute');if(k==="cut"&&(q.hasAttribute("readonly")||q.hasAttribute("disabled")))throw new Error(`Invalid "target" attribute. 
You can't cut text from elements with "readonly" or "disabled" attributes`)}else throw new Error('Invalid "target" value, use a valid Element');if(Le)return z(Le,{container:F});if(q)return k==="cut"?h(q):z(q,{container:F})},We=Ke;function Ie(j){return typeof Symbol=="function"&&typeof Symbol.iterator=="symbol"?Ie=function(w){return typeof w}:Ie=function(w){return w&&typeof Symbol=="function"&&w.constructor===Symbol&&w!==Symbol.prototype?"symbol":typeof w},Ie(j)}function Ti(j,O){if(!(j instanceof O))throw new TypeError("Cannot call a class as a function")}function nn(j,O){for(var w=0;w0&&arguments[0]!==void 0?arguments[0]:{};this.action=typeof F.action=="function"?F.action:this.defaultAction,this.target=typeof F.target=="function"?F.target:this.defaultTarget,this.text=typeof F.text=="function"?F.text:this.defaultText,this.container=Ie(F.container)==="object"?F.container:document.body}},{key:"listenClick",value:function(F){var q=this;this.listener=c()(F,"click",function(Le){return q.onClick(Le)})}},{key:"onClick",value:function(F){var q=F.delegateTarget||F.currentTarget,Le=this.action(q)||"copy",Rt=We({action:Le,container:this.container,target:this.target(q),text:this.text(q)});this.emit(Rt?"success":"error",{action:Le,text:Rt,trigger:q,clearSelection:function(){q&&q.focus(),window.getSelection().removeAllRanges()}})}},{key:"defaultAction",value:function(F){return yr("action",F)}},{key:"defaultTarget",value:function(F){var q=yr("target",F);if(q)return document.querySelector(q)}},{key:"defaultText",value:function(F){return yr("text",F)}},{key:"destroy",value:function(){this.listener.destroy()}}],[{key:"copy",value:function(F){var q=arguments.length>1&&arguments[1]!==void 0?arguments[1]:{container:document.body};return z(F,q)}},{key:"cut",value:function(F){return h(F)}},{key:"isSupported",value:function(){var F=arguments.length>0&&arguments[0]!==void 0?arguments[0]:["copy","cut"],q=typeof F=="string"?[F]:F,Le=!!document.queryCommandSupported;return q.forEach(function(Rt){Le=Le&&!!document.queryCommandSupported(Rt)}),Le}}]),w}(s()),ki=Ri},828:function(n){var o=9;if(typeof Element!="undefined"&&!Element.prototype.matches){var i=Element.prototype;i.matches=i.matchesSelector||i.mozMatchesSelector||i.msMatchesSelector||i.oMatchesSelector||i.webkitMatchesSelector}function a(s,f){for(;s&&s.nodeType!==o;){if(typeof s.matches=="function"&&s.matches(f))return s;s=s.parentNode}}n.exports=a},438:function(n,o,i){var a=i(828);function s(u,p,m,d,h){var v=c.apply(this,arguments);return u.addEventListener(m,v,h),{destroy:function(){u.removeEventListener(m,v,h)}}}function f(u,p,m,d,h){return typeof u.addEventListener=="function"?s.apply(null,arguments):typeof m=="function"?s.bind(null,document).apply(null,arguments):(typeof u=="string"&&(u=document.querySelectorAll(u)),Array.prototype.map.call(u,function(v){return s(v,p,m,d,h)}))}function c(u,p,m,d){return function(h){h.delegateTarget=a(h.target,p),h.delegateTarget&&d.call(u,h)}}n.exports=f},879:function(n,o){o.node=function(i){return i!==void 0&&i instanceof HTMLElement&&i.nodeType===1},o.nodeList=function(i){var a=Object.prototype.toString.call(i);return i!==void 0&&(a==="[object NodeList]"||a==="[object HTMLCollection]")&&"length"in i&&(i.length===0||o.node(i[0]))},o.string=function(i){return typeof i=="string"||i instanceof String},o.fn=function(i){var a=Object.prototype.toString.call(i);return a==="[object Function]"}},370:function(n,o,i){var a=i(879),s=i(438);function f(m,d,h){if(!m&&!d&&!h)throw new Error("Missing required 
arguments");if(!a.string(d))throw new TypeError("Second argument must be a String");if(!a.fn(h))throw new TypeError("Third argument must be a Function");if(a.node(m))return c(m,d,h);if(a.nodeList(m))return u(m,d,h);if(a.string(m))return p(m,d,h);throw new TypeError("First argument must be a String, HTMLElement, HTMLCollection, or NodeList")}function c(m,d,h){return m.addEventListener(d,h),{destroy:function(){m.removeEventListener(d,h)}}}function u(m,d,h){return Array.prototype.forEach.call(m,function(v){v.addEventListener(d,h)}),{destroy:function(){Array.prototype.forEach.call(m,function(v){v.removeEventListener(d,h)})}}}function p(m,d,h){return s(document.body,m,d,h)}n.exports=f},817:function(n){function o(i){var a;if(i.nodeName==="SELECT")i.focus(),a=i.value;else if(i.nodeName==="INPUT"||i.nodeName==="TEXTAREA"){var s=i.hasAttribute("readonly");s||i.setAttribute("readonly",""),i.select(),i.setSelectionRange(0,i.value.length),s||i.removeAttribute("readonly"),a=i.value}else{i.hasAttribute("contenteditable")&&i.focus();var f=window.getSelection(),c=document.createRange();c.selectNodeContents(i),f.removeAllRanges(),f.addRange(c),a=f.toString()}return a}n.exports=o},279:function(n){function o(){}o.prototype={on:function(i,a,s){var f=this.e||(this.e={});return(f[i]||(f[i]=[])).push({fn:a,ctx:s}),this},once:function(i,a,s){var f=this;function c(){f.off(i,c),a.apply(s,arguments)}return c._=a,this.on(i,c,s)},emit:function(i){var a=[].slice.call(arguments,1),s=((this.e||(this.e={}))[i]||[]).slice(),f=0,c=s.length;for(f;f{"use strict";/*! + * escape-html + * Copyright(c) 2012-2013 TJ Holowaychuk + * Copyright(c) 2015 Andreas Lubbe + * Copyright(c) 2015 Tiancheng "Timothy" Gu + * MIT Licensed + */var is=/["'&<>]/;Jo.exports=as;function as(e){var t=""+e,r=is.exec(t);if(!r)return t;var n,o="",i=0,a=0;for(i=r.index;i0&&i[i.length-1])&&(c[0]===6||c[0]===2)){r=0;continue}if(c[0]===3&&(!i||c[1]>i[0]&&c[1]=e.length&&(e=void 0),{value:e&&e[n++],done:!e}}};throw new TypeError(t?"Object is not iterable.":"Symbol.iterator is not defined.")}function W(e,t){var r=typeof Symbol=="function"&&e[Symbol.iterator];if(!r)return e;var n=r.call(e),o,i=[],a;try{for(;(t===void 0||t-- >0)&&!(o=n.next()).done;)i.push(o.value)}catch(s){a={error:s}}finally{try{o&&!o.done&&(r=n.return)&&r.call(n)}finally{if(a)throw a.error}}return i}function D(e,t,r){if(r||arguments.length===2)for(var n=0,o=t.length,i;n1||s(m,d)})})}function s(m,d){try{f(n[m](d))}catch(h){p(i[0][3],h)}}function f(m){m.value instanceof Xe?Promise.resolve(m.value.v).then(c,u):p(i[0][2],m)}function c(m){s("next",m)}function u(m){s("throw",m)}function p(m,d){m(d),i.shift(),i.length&&s(i[0][0],i[0][1])}}function mn(e){if(!Symbol.asyncIterator)throw new TypeError("Symbol.asyncIterator is not defined.");var t=e[Symbol.asyncIterator],r;return t?t.call(e):(e=typeof xe=="function"?xe(e):e[Symbol.iterator](),r={},n("next"),n("throw"),n("return"),r[Symbol.asyncIterator]=function(){return this},r);function n(i){r[i]=e[i]&&function(a){return new Promise(function(s,f){a=e[i](a),o(s,f,a.done,a.value)})}}function o(i,a,s,f){Promise.resolve(f).then(function(c){i({value:c,done:s})},a)}}function A(e){return typeof e=="function"}function at(e){var t=function(n){Error.call(n),n.stack=new Error().stack},r=e(t);return r.prototype=Object.create(Error.prototype),r.prototype.constructor=r,r}var $t=at(function(e){return function(r){e(this),this.message=r?r.length+` errors occurred during unsubscription: +`+r.map(function(n,o){return o+1+") "+n.toString()}).join(` + 
`):"",this.name="UnsubscriptionError",this.errors=r}});function De(e,t){if(e){var r=e.indexOf(t);0<=r&&e.splice(r,1)}}var Fe=function(){function e(t){this.initialTeardown=t,this.closed=!1,this._parentage=null,this._finalizers=null}return e.prototype.unsubscribe=function(){var t,r,n,o,i;if(!this.closed){this.closed=!0;var a=this._parentage;if(a)if(this._parentage=null,Array.isArray(a))try{for(var s=xe(a),f=s.next();!f.done;f=s.next()){var c=f.value;c.remove(this)}}catch(v){t={error:v}}finally{try{f&&!f.done&&(r=s.return)&&r.call(s)}finally{if(t)throw t.error}}else a.remove(this);var u=this.initialTeardown;if(A(u))try{u()}catch(v){i=v instanceof $t?v.errors:[v]}var p=this._finalizers;if(p){this._finalizers=null;try{for(var m=xe(p),d=m.next();!d.done;d=m.next()){var h=d.value;try{dn(h)}catch(v){i=i!=null?i:[],v instanceof $t?i=D(D([],W(i)),W(v.errors)):i.push(v)}}}catch(v){n={error:v}}finally{try{d&&!d.done&&(o=m.return)&&o.call(m)}finally{if(n)throw n.error}}}if(i)throw new $t(i)}},e.prototype.add=function(t){var r;if(t&&t!==this)if(this.closed)dn(t);else{if(t instanceof e){if(t.closed||t._hasParent(this))return;t._addParent(this)}(this._finalizers=(r=this._finalizers)!==null&&r!==void 0?r:[]).push(t)}},e.prototype._hasParent=function(t){var r=this._parentage;return r===t||Array.isArray(r)&&r.includes(t)},e.prototype._addParent=function(t){var r=this._parentage;this._parentage=Array.isArray(r)?(r.push(t),r):r?[r,t]:t},e.prototype._removeParent=function(t){var r=this._parentage;r===t?this._parentage=null:Array.isArray(r)&&De(r,t)},e.prototype.remove=function(t){var r=this._finalizers;r&&De(r,t),t instanceof e&&t._removeParent(this)},e.EMPTY=function(){var t=new e;return t.closed=!0,t}(),e}();var Or=Fe.EMPTY;function It(e){return e instanceof Fe||e&&"closed"in e&&A(e.remove)&&A(e.add)&&A(e.unsubscribe)}function dn(e){A(e)?e():e.unsubscribe()}var Ae={onUnhandledError:null,onStoppedNotification:null,Promise:void 0,useDeprecatedSynchronousErrorHandling:!1,useDeprecatedNextContext:!1};var st={setTimeout:function(e,t){for(var r=[],n=2;n0},enumerable:!1,configurable:!0}),t.prototype._trySubscribe=function(r){return this._throwIfClosed(),e.prototype._trySubscribe.call(this,r)},t.prototype._subscribe=function(r){return this._throwIfClosed(),this._checkFinalizedStatuses(r),this._innerSubscribe(r)},t.prototype._innerSubscribe=function(r){var n=this,o=this,i=o.hasError,a=o.isStopped,s=o.observers;return i||a?Or:(this.currentObservers=null,s.push(r),new Fe(function(){n.currentObservers=null,De(s,r)}))},t.prototype._checkFinalizedStatuses=function(r){var n=this,o=n.hasError,i=n.thrownError,a=n.isStopped;o?r.error(i):a&&r.complete()},t.prototype.asObservable=function(){var r=new U;return r.source=this,r},t.create=function(r,n){return new wn(r,n)},t}(U);var wn=function(e){ne(t,e);function t(r,n){var o=e.call(this)||this;return o.destination=r,o.source=n,o}return t.prototype.next=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.next)===null||o===void 0||o.call(n,r)},t.prototype.error=function(r){var n,o;(o=(n=this.destination)===null||n===void 0?void 0:n.error)===null||o===void 0||o.call(n,r)},t.prototype.complete=function(){var r,n;(n=(r=this.destination)===null||r===void 0?void 0:r.complete)===null||n===void 0||n.call(r)},t.prototype._subscribe=function(r){var n,o;return(o=(n=this.source)===null||n===void 0?void 0:n.subscribe(r))!==null&&o!==void 0?o:Or},t}(E);var Et={now:function(){return(Et.delegate||Date).now()},delegate:void 0};var wt=function(e){ne(t,e);function t(r,n,o){r===void 
0&&(r=1/0),n===void 0&&(n=1/0),o===void 0&&(o=Et);var i=e.call(this)||this;return i._bufferSize=r,i._windowTime=n,i._timestampProvider=o,i._buffer=[],i._infiniteTimeWindow=!0,i._infiniteTimeWindow=n===1/0,i._bufferSize=Math.max(1,r),i._windowTime=Math.max(1,n),i}return t.prototype.next=function(r){var n=this,o=n.isStopped,i=n._buffer,a=n._infiniteTimeWindow,s=n._timestampProvider,f=n._windowTime;o||(i.push(r),!a&&i.push(s.now()+f)),this._trimBuffer(),e.prototype.next.call(this,r)},t.prototype._subscribe=function(r){this._throwIfClosed(),this._trimBuffer();for(var n=this._innerSubscribe(r),o=this,i=o._infiniteTimeWindow,a=o._buffer,s=a.slice(),f=0;f0?e.prototype.requestAsyncId.call(this,r,n,o):(r.actions.push(this),r._scheduled||(r._scheduled=ut.requestAnimationFrame(function(){return r.flush(void 0)})))},t.prototype.recycleAsyncId=function(r,n,o){var i;if(o===void 0&&(o=0),o!=null?o>0:this.delay>0)return e.prototype.recycleAsyncId.call(this,r,n,o);var a=r.actions;n!=null&&((i=a[a.length-1])===null||i===void 0?void 0:i.id)!==n&&(ut.cancelAnimationFrame(n),r._scheduled=void 0)},t}(Ut);var On=function(e){ne(t,e);function t(){return e!==null&&e.apply(this,arguments)||this}return t.prototype.flush=function(r){this._active=!0;var n=this._scheduled;this._scheduled=void 0;var o=this.actions,i;r=r||o.shift();do if(i=r.execute(r.state,r.delay))break;while((r=o[0])&&r.id===n&&o.shift());if(this._active=!1,i){for(;(r=o[0])&&r.id===n&&o.shift();)r.unsubscribe();throw i}},t}(Wt);var we=new On(Tn);var R=new U(function(e){return e.complete()});function Dt(e){return e&&A(e.schedule)}function kr(e){return e[e.length-1]}function Qe(e){return A(kr(e))?e.pop():void 0}function Se(e){return Dt(kr(e))?e.pop():void 0}function Vt(e,t){return typeof kr(e)=="number"?e.pop():t}var pt=function(e){return e&&typeof e.length=="number"&&typeof e!="function"};function zt(e){return A(e==null?void 0:e.then)}function Nt(e){return A(e[ft])}function qt(e){return Symbol.asyncIterator&&A(e==null?void 0:e[Symbol.asyncIterator])}function Kt(e){return new TypeError("You provided "+(e!==null&&typeof e=="object"?"an invalid object":"'"+e+"'")+" where a stream was expected. 
You can provide an Observable, Promise, ReadableStream, Array, AsyncIterable, or Iterable.")}function Ki(){return typeof Symbol!="function"||!Symbol.iterator?"@@iterator":Symbol.iterator}var Qt=Ki();function Yt(e){return A(e==null?void 0:e[Qt])}function Gt(e){return ln(this,arguments,function(){var r,n,o,i;return Pt(this,function(a){switch(a.label){case 0:r=e.getReader(),a.label=1;case 1:a.trys.push([1,,9,10]),a.label=2;case 2:return[4,Xe(r.read())];case 3:return n=a.sent(),o=n.value,i=n.done,i?[4,Xe(void 0)]:[3,5];case 4:return[2,a.sent()];case 5:return[4,Xe(o)];case 6:return[4,a.sent()];case 7:return a.sent(),[3,2];case 8:return[3,10];case 9:return r.releaseLock(),[7];case 10:return[2]}})})}function Bt(e){return A(e==null?void 0:e.getReader)}function $(e){if(e instanceof U)return e;if(e!=null){if(Nt(e))return Qi(e);if(pt(e))return Yi(e);if(zt(e))return Gi(e);if(qt(e))return _n(e);if(Yt(e))return Bi(e);if(Bt(e))return Ji(e)}throw Kt(e)}function Qi(e){return new U(function(t){var r=e[ft]();if(A(r.subscribe))return r.subscribe(t);throw new TypeError("Provided object does not correctly implement Symbol.observable")})}function Yi(e){return new U(function(t){for(var r=0;r=2;return function(n){return n.pipe(e?_(function(o,i){return e(o,i,n)}):me,Oe(1),r?He(t):zn(function(){return new Xt}))}}function Nn(){for(var e=[],t=0;t=2,!0))}function fe(e){e===void 0&&(e={});var t=e.connector,r=t===void 0?function(){return new E}:t,n=e.resetOnError,o=n===void 0?!0:n,i=e.resetOnComplete,a=i===void 0?!0:i,s=e.resetOnRefCountZero,f=s===void 0?!0:s;return function(c){var u,p,m,d=0,h=!1,v=!1,B=function(){p==null||p.unsubscribe(),p=void 0},re=function(){B(),u=m=void 0,h=v=!1},z=function(){var T=u;re(),T==null||T.unsubscribe()};return g(function(T,Ke){d++,!v&&!h&&B();var We=m=m!=null?m:r();Ke.add(function(){d--,d===0&&!v&&!h&&(p=jr(z,f))}),We.subscribe(Ke),!u&&d>0&&(u=new et({next:function(Ie){return We.next(Ie)},error:function(Ie){v=!0,B(),p=jr(re,o,Ie),We.error(Ie)},complete:function(){h=!0,B(),p=jr(re,a),We.complete()}}),$(T).subscribe(u))})(c)}}function jr(e,t){for(var r=[],n=2;ne.next(document)),e}function K(e,t=document){return Array.from(t.querySelectorAll(e))}function V(e,t=document){let r=se(e,t);if(typeof r=="undefined")throw new ReferenceError(`Missing element: expected "${e}" to be present`);return r}function se(e,t=document){return t.querySelector(e)||void 0}function _e(){return document.activeElement instanceof HTMLElement&&document.activeElement||void 0}function tr(e){return L(b(document.body,"focusin"),b(document.body,"focusout")).pipe(ke(1),l(()=>{let t=_e();return typeof t!="undefined"?e.contains(t):!1}),N(e===_e()),Y())}function Be(e){return{x:e.offsetLeft,y:e.offsetTop}}function Yn(e){return L(b(window,"load"),b(window,"resize")).pipe(Ce(0,we),l(()=>Be(e)),N(Be(e)))}function rr(e){return{x:e.scrollLeft,y:e.scrollTop}}function dt(e){return L(b(e,"scroll"),b(window,"resize")).pipe(Ce(0,we),l(()=>rr(e)),N(rr(e)))}var Bn=function(){if(typeof Map!="undefined")return Map;function e(t,r){var n=-1;return t.some(function(o,i){return o[0]===r?(n=i,!0):!1}),n}return function(){function t(){this.__entries__=[]}return Object.defineProperty(t.prototype,"size",{get:function(){return this.__entries__.length},enumerable:!0,configurable:!0}),t.prototype.get=function(r){var n=e(this.__entries__,r),o=this.__entries__[n];return o&&o[1]},t.prototype.set=function(r,n){var o=e(this.__entries__,r);~o?this.__entries__[o][1]=n:this.__entries__.push([r,n])},t.prototype.delete=function(r){var 
n=this.__entries__,o=e(n,r);~o&&n.splice(o,1)},t.prototype.has=function(r){return!!~e(this.__entries__,r)},t.prototype.clear=function(){this.__entries__.splice(0)},t.prototype.forEach=function(r,n){n===void 0&&(n=null);for(var o=0,i=this.__entries__;o0},e.prototype.connect_=function(){!zr||this.connected_||(document.addEventListener("transitionend",this.onTransitionEnd_),window.addEventListener("resize",this.refresh),xa?(this.mutationsObserver_=new MutationObserver(this.refresh),this.mutationsObserver_.observe(document,{attributes:!0,childList:!0,characterData:!0,subtree:!0})):(document.addEventListener("DOMSubtreeModified",this.refresh),this.mutationEventsAdded_=!0),this.connected_=!0)},e.prototype.disconnect_=function(){!zr||!this.connected_||(document.removeEventListener("transitionend",this.onTransitionEnd_),window.removeEventListener("resize",this.refresh),this.mutationsObserver_&&this.mutationsObserver_.disconnect(),this.mutationEventsAdded_&&document.removeEventListener("DOMSubtreeModified",this.refresh),this.mutationsObserver_=null,this.mutationEventsAdded_=!1,this.connected_=!1)},e.prototype.onTransitionEnd_=function(t){var r=t.propertyName,n=r===void 0?"":r,o=ya.some(function(i){return!!~n.indexOf(i)});o&&this.refresh()},e.getInstance=function(){return this.instance_||(this.instance_=new e),this.instance_},e.instance_=null,e}(),Jn=function(e,t){for(var r=0,n=Object.keys(t);r0},e}(),Zn=typeof WeakMap!="undefined"?new WeakMap:new Bn,eo=function(){function e(t){if(!(this instanceof e))throw new TypeError("Cannot call a class as a function.");if(!arguments.length)throw new TypeError("1 argument required, but only 0 present.");var r=Ea.getInstance(),n=new Ra(t,r,this);Zn.set(this,n)}return e}();["observe","unobserve","disconnect"].forEach(function(e){eo.prototype[e]=function(){var t;return(t=Zn.get(this))[e].apply(t,arguments)}});var ka=function(){return typeof nr.ResizeObserver!="undefined"?nr.ResizeObserver:eo}(),to=ka;var ro=new E,Ha=I(()=>H(new to(e=>{for(let t of e)ro.next(t)}))).pipe(x(e=>L(Te,H(e)).pipe(C(()=>e.disconnect()))),J(1));function de(e){return{width:e.offsetWidth,height:e.offsetHeight}}function ge(e){return Ha.pipe(S(t=>t.observe(e)),x(t=>ro.pipe(_(({target:r})=>r===e),C(()=>t.unobserve(e)),l(()=>de(e)))),N(de(e)))}function bt(e){return{width:e.scrollWidth,height:e.scrollHeight}}function ar(e){let t=e.parentElement;for(;t&&(e.scrollWidth<=t.scrollWidth&&e.scrollHeight<=t.scrollHeight);)t=(e=t).parentElement;return t?e:void 0}var no=new E,Pa=I(()=>H(new IntersectionObserver(e=>{for(let t of e)no.next(t)},{threshold:0}))).pipe(x(e=>L(Te,H(e)).pipe(C(()=>e.disconnect()))),J(1));function sr(e){return Pa.pipe(S(t=>t.observe(e)),x(t=>no.pipe(_(({target:r})=>r===e),C(()=>t.unobserve(e)),l(({isIntersecting:r})=>r))))}function oo(e,t=16){return dt(e).pipe(l(({y:r})=>{let n=de(e),o=bt(e);return r>=o.height-n.height-t}),Y())}var cr={drawer:V("[data-md-toggle=drawer]"),search:V("[data-md-toggle=search]")};function io(e){return cr[e].checked}function qe(e,t){cr[e].checked!==t&&cr[e].click()}function je(e){let t=cr[e];return b(t,"change").pipe(l(()=>t.checked),N(t.checked))}function $a(e,t){switch(e.constructor){case HTMLInputElement:return e.type==="radio"?/^Arrow/.test(t):!0;case HTMLSelectElement:case HTMLTextAreaElement:return!0;default:return e.isContentEditable}}function Ia(){return L(b(window,"compositionstart").pipe(l(()=>!0)),b(window,"compositionend").pipe(l(()=>!1))).pipe(N(!1))}function ao(){let 
e=b(window,"keydown").pipe(_(t=>!(t.metaKey||t.ctrlKey)),l(t=>({mode:io("search")?"search":"global",type:t.key,claim(){t.preventDefault(),t.stopPropagation()}})),_(({mode:t,type:r})=>{if(t==="global"){let n=_e();if(typeof n!="undefined")return!$a(n,r)}return!0}),fe());return Ia().pipe(x(t=>t?R:e))}function Me(){return new URL(location.href)}function ot(e){location.href=e.href}function so(){return new E}function co(e,t){if(typeof t=="string"||typeof t=="number")e.innerHTML+=t.toString();else if(t instanceof Node)e.appendChild(t);else if(Array.isArray(t))for(let r of t)co(e,r)}function M(e,t,...r){let n=document.createElement(e);if(t)for(let o of Object.keys(t))typeof t[o]!="undefined"&&(typeof t[o]!="boolean"?n.setAttribute(o,t[o]):n.setAttribute(o,""));for(let o of r)co(n,o);return n}function fr(e){if(e>999){let t=+((e-950)%1e3>99);return`${((e+1e-6)/1e3).toFixed(t)}k`}else return e.toString()}function fo(){return location.hash.substring(1)}function uo(e){let t=M("a",{href:e});t.addEventListener("click",r=>r.stopPropagation()),t.click()}function Fa(){return b(window,"hashchange").pipe(l(fo),N(fo()),_(e=>e.length>0),J(1))}function po(){return Fa().pipe(l(e=>se(`[id="${e}"]`)),_(e=>typeof e!="undefined"))}function Nr(e){let t=matchMedia(e);return Zt(r=>t.addListener(()=>r(t.matches))).pipe(N(t.matches))}function lo(){let e=matchMedia("print");return L(b(window,"beforeprint").pipe(l(()=>!0)),b(window,"afterprint").pipe(l(()=>!1))).pipe(N(e.matches))}function qr(e,t){return e.pipe(x(r=>r?t():R))}function ur(e,t={credentials:"same-origin"}){return ve(fetch(`${e}`,t)).pipe(ce(()=>R),x(r=>r.status!==200?Tt(()=>new Error(r.statusText)):H(r)))}function Ue(e,t){return ur(e,t).pipe(x(r=>r.json()),J(1))}function mo(e,t){let r=new DOMParser;return ur(e,t).pipe(x(n=>n.text()),l(n=>r.parseFromString(n,"text/xml")),J(1))}function pr(e){let t=M("script",{src:e});return I(()=>(document.head.appendChild(t),L(b(t,"load"),b(t,"error").pipe(x(()=>Tt(()=>new ReferenceError(`Invalid script: ${e}`))))).pipe(l(()=>{}),C(()=>document.head.removeChild(t)),Oe(1))))}function ho(){return{x:Math.max(0,scrollX),y:Math.max(0,scrollY)}}function bo(){return L(b(window,"scroll",{passive:!0}),b(window,"resize",{passive:!0})).pipe(l(ho),N(ho()))}function vo(){return{width:innerWidth,height:innerHeight}}function go(){return b(window,"resize",{passive:!0}).pipe(l(vo),N(vo()))}function yo(){return Q([bo(),go()]).pipe(l(([e,t])=>({offset:e,size:t})),J(1))}function lr(e,{viewport$:t,header$:r}){let n=t.pipe(X("size")),o=Q([n,r]).pipe(l(()=>Be(e)));return Q([r,t,o]).pipe(l(([{height:i},{offset:a,size:s},{x:f,y:c}])=>({offset:{x:a.x-f,y:a.y-c+i},size:s})))}(()=>{function e(n,o){parent.postMessage(n,o||"*")}function t(...n){return n.reduce((o,i)=>o.then(()=>new Promise(a=>{let s=document.createElement("script");s.src=i,s.onload=a,document.body.appendChild(s)})),Promise.resolve())}var r=class{constructor(n){this.url=n,this.onerror=null,this.onmessage=null,this.onmessageerror=null,this.m=a=>{a.source===this.w&&(a.stopImmediatePropagation(),this.dispatchEvent(new MessageEvent("message",{data:a.data})),this.onmessage&&this.onmessage(a))},this.e=(a,s,f,c,u)=>{if(s===this.url.toString()){let p=new ErrorEvent("error",{message:a,filename:s,lineno:f,colno:c,error:u});this.dispatchEvent(p),this.onerror&&this.onerror(p)}};let o=new EventTarget;this.addEventListener=o.addEventListener.bind(o),this.removeEventListener=o.removeEventListener.bind(o),this.dispatchEvent=o.dispatchEvent.bind(o);let 
i=document.createElement("iframe");i.width=i.height=i.frameBorder="0",document.body.appendChild(this.iframe=i),this.w.document.open(),this.w.document.write(` + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Showcase: Animal Statistics (Switzerland)

+

What is the Showcase?

+

It is the demo case of HD-BE: it imports animal data from an external source, loads it with Airflow, models it with dbt, and visualizes it in Superset.

+

It will hopefully show you how the platform works, and it comes pre-installed with the docker-compose installation.

+

How can I get started and explore it?

+

Click on the data domain showcase and you can explore pre-defined dashboards built with the Airflow job and dbt models described below.

+

How does it look?

+

Below, the technical details of the showcase are described: how the Airflow pipeline collects the data from an open API and models it with dbt (a sketch of the DAG structure follows the task list below).

+

Airflow Pipeline

+

+
    +
  • data_download
    + The source files, which are in CSV format, are queried via the data_download task and stored in the file system.
  • +
  • create_tables
    + Based on the CSV files, tables are created in the LZN database schema of the project.
  • +
  • insert_data
    + After the tables have been created, the source data from the CSV files is copied into the corresponding tables in the LZN database schema.
  • +
  • dbt_run
    + After the preceding steps have been executed and the data foundation for the DBT framework has been established, the data processing steps in the database can be initiated using DBT scripts (described in the DBT section).
  • +
  • dbt_docs
    + Once the tables have been generated in the database, documentation of the tables and their dependencies is generated using DBT.
  • +
  • dbt_docs_serve
    + The generated documentation is then served in the form of a website.
  • +
+
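To make the task order concrete, here is a hedged sketch of how such a chain could be wired as an Airflow DAG. It is illustrative only: the real showcase uses its own operators and parameters, and only the task names are taken from the list above.

from pendulum import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def _noop(**_):
    # Placeholder callable; the real tasks download CSVs, create and fill LZN tables, and invoke dbt.
    pass

with DAG(
    dag_id="tierstatistik_showcase_sketch",  # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule="@daily",                       # illustrative schedule
    catchup=False,
) as dag:
    tasks = {
        name: PythonOperator(task_id=name, python_callable=_noop)
        for name in [
            "data_download",
            "create_tables",
            "insert_data",
            "dbt_run",
            "dbt_docs",
            "dbt_docs_serve",
        ]
    }

    # Linear dependency chain, matching the order described above.
    (
        tasks["data_download"]
        >> tasks["create_tables"]
        >> tasks["insert_data"]
        >> tasks["dbt_run"]
        >> tasks["dbt_docs"]
        >> tasks["dbt_docs_serve"]
    )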

DBT: Data modeling

+

fact_breeds_long

+

The fact table fact_breeds_long describes key figures, which are used to derive the stock of registered, living animals, broken down by breed over time.

+

The following tables from the [lzn] database schema are selected for the calculation of the key figure:

+
    +
  • cats_breeds
  • +
  • cattle_breeds
  • +
  • dogs_breeds
  • +
  • equids_breeds
  • +
  • goats_breeds
  • +
  • sheep_breeds
  • +
+

+

fact_cattle_beefiness_fattissue

+

The fact table fact_cattle_beefiness_fattissue describes key figures, which are used to derive the number of slaughtered cows by year and month.
+Classification is done according to CH-TAX (Trading Class Classification CHTAX System | VIEGUT AG)

+

The following tables from the [lzn] database schema are selected for the calculation of the key figure:

+
    +
  • cattle_evolbeefiness
  • +
  • cattle_evolfattissue
  • +
+

+

fact_cattle_popvariations

+

The fact table fact_cattle_popvariations describes key figures, which are used to derive the increase and decrease of the cattle population in the Animal Traffic Database (https://www.agate.ch/) over time (including reports from Liechtenstein).
+The key figures are grouped according to the following types of reports:

+
    +
  • Birth
  • +
  • Slaughter
  • +
  • Death
  • +
+

The following table from the [lzn] database schema is selected for the calculation of the key figure:

+
    +
  • cattle_popvariations
  • +
+

+

fact_cattle_pyr_wide & fact_cattle_pyr_long

+

The fact table fact_cattle_pyr_wide describes key figures, which are used to derive the distribution of registered, living cattle by age class and gender.

+

The following table from the [lzn] database schema is selected for the calculation of the key figure:

+
    +
  • cattle_pyr
  • +
+

The fact table fact_cattle_pyr_long pivots all key figures from fact_cattle_pyr_wide.
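To illustrate the wide-to-long idea conceptually (the actual transformation is a dbt SQL model; the column names below are made up for illustration):

import pandas as pd

# Hypothetical wide layout: one column per gender key figure.
wide = pd.DataFrame({
    "year": [2023, 2023],
    "age_class": ["0-1", "1-2"],
    "male": [100, 80],
    "female": [110, 90],
})

# Pivot to long layout: former columns become rows of (dimension, value).
long = wide.melt(
    id_vars=["year", "age_class"],
    var_name="gender",
    value_name="count",
)
print(long)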

+

+

Superset

+

Database Connection

+

The data foundation of the Superset visualizations (Datasets, Dashboards, and Charts) is provided through a Database Connection.

+

In this case, a database connection is established to the PostgreSQL database in which the above-described DBT scripts were executed.
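Superset database connections are configured with a SQLAlchemy URI; a hypothetical example for such a PostgreSQL connection (host, credentials, and database name are placeholders, not the actual HelloDATA values) might look like this:

# Hypothetical SQLAlchemy URI as entered in Superset's database connection form.
# All values are placeholders.
SQLALCHEMY_URI = "postgresql+psycopg2://superset_reader:secret@postgres-host:5432/data_domain_db"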

+

Datasets

+

Datasets are used to prepare the data foundation in a suitable form, which can then be visualized in charts in an appropriate way.

+

Essentially, modeled fact tables from the UDM database schema are selected and linked with dimension tables.

+

This allows facts to be calculated or evaluated at different levels of business granularity.

+

+

Interfaces

+

Tierstatistik

+ + + + + + + + + + + + + + + + + + + + + +
Source | Description
https://tierstatistik.identitas.ch/de/ | Website of the API provider
https://tierstatistik.identitas.ch/de/docs.html | Documentation of the platform and description of the data basis and API
tierstatistik.identitas.ch/tierstatistik.rdf | API and data provided by the website
+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/concepts/workspaces-troubleshoot/index.html b/concepts/workspaces-troubleshoot/index.html new file mode 100644 index 00000000..cc5211a8 --- /dev/null +++ b/concepts/workspaces-troubleshoot/index.html @@ -0,0 +1,859 @@ + + + + + + + + + + + + + + + + + + Troubleshooting - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Troubleshooting

+

Kubernetes

+

If you haven't turned on Kubernetes, you'll get an error similar to this: +urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='kubernetes.docker.internal', port=6443): Max retries exceeded with url: /api/v1/namespaces/default/pods?labelSelector=dag_id%3Drun_boiler_example%2Ckubernetes_pod_operator%3DTrue%2Cpod-label-test%3Dlabel-name-test%2Crun_id%3Dmanual__2024-01-29T095915.2491840000-f3be8d87f%2Ctask_id%3Drun_duckdb_query%2Calready_checked%21%3DTrue%2C%21airflow-worker (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff82c2ab10>: Failed to establish a new connection: [Errno 111] Connection refused'))

+

Full log: +

[2024-01-29, 09:48:49 UTC] {pod.py:1017} ERROR - 'NoneType' object has no attribute 'metadata'
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
+    conn = connection.create_connection(
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 95, in create_connection
+    raise err
+  File "/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
+    sock.connect(sa)
+ConnectionRefusedError: [Errno 111] Connection refused
+During handling of the above exception, another exception occurred:
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 714, in urlopen
+    httplib_response = self._make_request(
+                       ^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 403, in _make_request
+    self._validate_conn(conn)
+  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1053, in _validate_conn
+    conn.connect()
+  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 363, in connect
+    self.sock = conn = self._new_conn()
+                       ^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/site-packages/urllib3/connection.py", line 186, in _new_conn
+    raise NewConnectionError(
+urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0xffff82db3650>: Failed to establish a new connection: [Errno 111] Connection refused
+During handling of the above exception, another exception occurred:
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 583, in execute_sync
+    self.pod = self.get_or_create_pod(  # must set `self.pod` for `on_kill`
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 545, in get_or_create_pod
+    pod = self.find_pod(self.namespace or pod_request_obj.metadata.namespace, context=context)
+
+....
+
+
+airflow.exceptions.AirflowException: Pod airflow-running-dagster-workspace-jdkqug7h returned a failure.
+remote_pod: None
+[2024-01-29, 09:48:49 UTC] {taskinstance.py:1398} INFO - Marking task as UP_FOR_RETRY. dag_id=run_boiler_example, task_id=run_duckdb_query, execution_date=20210501T000000, start_date=20240129T094849, end_date=20240129T094849
+[2024-01-29, 09:48:49 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 3 for task run_duckdb_query (Pod airflow-running-dagster-workspace-jdkqug7h returned a failure.
+remote_pod: None; 225)
+[2024-01-29, 09:48:49 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 1
+[2024-01-29, 09:48:49 UTC] {taskinstance.py:2776} INFO - 0 downstream tasks scheduled from follow-on schedule check
+

+

Docker image not built locally or missing

+

If your image or its name is not available locally (check with docker image ls), you'll get an error in Airflow like this:

+
[2024-01-29, 10:10:14 UTC] {pod.py:961} INFO - Building pod airflow-running-dagster-workspace-64ngbudj with labels: {'dag_id': 'run_boiler_example', 'task_id': 'run_duckdb_query', 'run_id': 'manual__2024-01-29T101013.7029880000-328a76b5e', 'kubernetes_pod_operator': 'True', 'try_number': '1'}
+[2024-01-29, 10:10:14 UTC] {pod.py:538} INFO - Found matching pod airflow-running-dagster-workspace-64ngbudj with labels {'airflow_kpo_in_cluster': 'False', 'airflow_version': '2.7.1-astro.1', 'dag_id': 'run_boiler_example', 'kubernetes_pod_operator': 'True', 'pod-label-test': 'label-name-test', 'run_id': 'manual__2024-01-29T101013.7029880000-328a76b5e', 'task_id': 'run_duckdb_query', 'try_number': '1'}
+[2024-01-29, 10:10:14 UTC] {pod.py:539} INFO - `try_number` of task_instance: 1
+[2024-01-29, 10:10:14 UTC] {pod.py:540} INFO - `try_number` of pod: 1
+[2024-01-29, 10:10:14 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj
+[2024-01-29, 10:10:15 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj
+[2024-01-29, 10:10:16 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj
+[2024-01-29, 10:10:17 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj
+[2024-01-29, 10:10:18 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj
+[2024-01-29, 10:12:15 UTC] {pod.py:823} INFO - Deleting pod: airflow-running-dagster-workspace-64ngbudj
+[2024-01-29, 10:12:15 UTC] {taskinstance.py:1935} ERROR - Task failed with exception
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 594, in execute_sync
+    self.await_pod_start(pod=self.pod)
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 556, in await_pod_start
+    self.pod_manager.await_pod_start(pod=pod, startup_timeout=self.startup_timeout_seconds)
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 354, in await_pod_start
+    raise PodLaunchFailedException(msg)
+airflow.providers.cncf.kubernetes.utils.pod_manager.PodLaunchFailedException: Pod took longer than 120 seconds to start. Check the pod events in kubernetes to determine why.
+During handling of the above exception, another exception occurred:
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 578, in execute
+    return self.execute_sync(context)
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 617, in execute_sync
+    self.cleanup(
+  File "/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 746, in cleanup
+    raise AirflowException(
+airflow.exceptions.AirflowException: Pod airflow-running-dagster-workspace-64ngbudj returned a failure.
+
+
+...
+
+[2024-01-29, 10:12:15 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 1
+[2024-01-29, 10:12:15 UTC] {taskinstance.py:2776} INFO - 0 downstream tasks scheduled from follow-on schedule check
+
+

If you open a Kubernetes monitoring tool such as Lens or k9s, you'll also see the pod struggling to pull the image:

+

+

Another cause: if you haven't created the local PersistentVolumeClaim, you'll see an error like "my-pvc" does not exist. In that case, create the PVC first.

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/concepts/workspaces/index.html b/concepts/workspaces/index.html new file mode 100644 index 00000000..8a90f722 --- /dev/null +++ b/concepts/workspaces/index.html @@ -0,0 +1,1217 @@ + + + + + + + + + + + + + + + + + + + + + + Workspaces - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Data Engineering Workspaces

+

On this page, we explain what workspaces are in the context of HelloDATA-BE and how to use them, and you'll create your own based on a prepared starter repo.

+
+

Info

+

Also see the step-by-step video we created that might help you further.

+
+

What is a Workspace?

+

Within the context of HelloDATA-BE, data engineers or other technical people can develop their dbt and Airflow code, or even bring their own tool, all packed into a separate git repo and run as part of HelloDATA-BE, where they enjoy the benefits of persistent storage, visualization tools, user management, monitoring, etc.

+

graph TD
+    subgraph "Business Domain (Tenant)"
+        BD[Business Domain]
+        BD -->|Services| SR1[Portal]
+        BD -->|Services| SR2[Orchestration]
+        BD -->|Services| SR3[Lineage]
+        BD -->|Services| SR5[Database Manager]
+        BD -->|Services| SR4[Monitoring & Logging]
+    end
+    subgraph "Workspaces"
+        WS[Workspaces] -->|git-repo| DE[Data Engineering]
+        WS[Workspaces] -->|git-repo| ML[ML Team]
+        WS[Workspaces] -->|git-repo| DA[Product Analysts]
+        WS[Workspaces] -->|git-repo| NN[...]
+    end
+    subgraph "Data Domain (1-n)"
+        DD[Data Domain] -->|Persistent Storage| PG[Postgres]
+        DD[Data Domain] -->|Data Modeling| DBT[dbt]
+        DD[Data Domain] -->|Visualization| SU[Superset]
+    end
+
+    BD -->|Contains 1-n| DD
+    DD -->|n-instances| WS
+
+    %% Colors
+    class BD business
+    class DD data
+    class WS workspace
+    class SS,PGA subsystem
+    class SR1,SR2,SR3,SR4 services
+
+    classDef business fill:#96CD70,stroke:#333,stroke-width:2px;
+    classDef data fill:#A898D8,stroke:#333,stroke-width:2px;
+    classDef workspace fill:#70AFFD,stroke:#333,stroke-width:2px;
+    %% classDef subsystem fill:#F1C40F,stroke:#333,stroke-width:2px;
+    %% classDef services fill:#E74C3C,stroke:#333,stroke-width:1px;
+A schematic overview of how workspaces are embedded into HelloDATA-BE.

+ + +

A workspace can have n instances within a data domain. What does that mean? Each team can develop and build its project independently, according to its own requirements.

+

Think of an ML engineer who needs heavy tools such as TensorFlow, whereas an analyst might build simple dbt models, and yet another data engineer uses a specific tool from the Modern Data Stack.

+

When to use Workspaces

+

Workspaces are best used for development, implementing custom business logic, and modeling your data. But there is no limit to what you can build, as long as it can be run as an Airflow DAG.

+

Generally speaking, a workspace is used whenever someone needs to create a custom logic yet to be integrated within the HelloDATA BE Platform.

+

As a second step - imagine you implemented a critical business transformation everyone needs - that code and DAG could be moved and be a default DAG within a data domain. But the development always happens within the workspace, enabling self-serve.

+

Without workspaces, every request would need to go through the HelloDATA BE project team. Data engineers need a straightforward way, isolated from deployment, to add custom code for their specific data domain pipelines.

+

How does a Workspace work?

+

When you create your workspace, it will be deployed within HelloDATA-BE and run by an Airflow DAG. The Airflow DAG is the integration point into HD: there you define things like how often it runs, what it runs, the order of the tasks, etc.

+

Below, you see an example of two different Airflow DAGs deployed from two different Workspaces (marked red arrow): +

+

How do I create my own Workspace?

+

To implement your own workspace, we created the hellodata-be-workspace-starter. This repo contains a minimal set of artefacts needed to be deployed on HD.

+

Pre-requisites

+
    +
  • Install latest Docker Desktop
  • +
  • Activate the Kubernetes feature in Docker Desktop (needed to run the Airflow DAG as a Docker image): Settings -> Kubernetes -> Enable Kubernetes
  • +
+

Step-by-Step Guide

+
    +
  1. Clone hellodata-be-workspace-starter.
  2. +
  3. Add your own custom logic to the repo, update Dockerfile with relevant libraries and binaries you need.
  4. +
  5. Create one or multiple Airflow DAGs for running within HelloDATA-BE.
  6. +
  7. Build the image with docker build -t hellodata-ws-boilerplate:0.1.0-a.1 . (or a name of your choice)
  8. +
  9. Start up Airflow locally with Astro CLI (see more below) and run/test the pipeline
  10. +
  11. Define the needed ENV variables and deployment needs (to be set up once by the HD team initially)
  12. +
  13. Push the image to a Docker registry of your choice
  14. +
  15. Ask HD Team to deploy initially
  16. +
+

From now on, whenever you have a change, you just build a new image, and it will be deployed on HelloDATA-BE automatically, making you and your team independent.

+

Boiler-Plate Example

+

Below you find an example structure that helps you understand how to configure workspaces for your needs.

+

Boiler-Plate repo

+

The repo helps you build your workspace: simply clone the whole repo and add your changes.

+

We generally have these boilerplate files: +

├── Dockerfile
+├── Makefile
+├── README.md
+├── build-and-push.sh
+├── deployment
+│   └── deployment-needs.yaml
+└── src
+    ├── dags
+    │   └── airflow
+    │       ├── .astro
+    │       │   ├── config.yaml
+    │       ├── Dockerfile
+    │       ├── Makefile
+    │       ├── README.md
+    │       ├── airflow_settings.yaml
+    │       ├── dags
+    │       │   ├── .airflowignore
+    │       │   └── boiler-example.py
+    │       ├── include
+    │       │   └── .kube
+    │       │       └── config
+    │       ├── packages.txt
+    │       ├── plugins
+    │       ├── requirements.txt
+    └── duckdb
+        └── query_duckdb.py
+

+

Important files: Business logic (DAG)

+

Whereas query_duckdb.py and the boiler-example.py DAG are, in this case, the custom code that you'd replace with your own code.

+

The Airflow DAG itself can largely be re-used, as we use the KubernetesPodOperator, which works both within HD and locally (see more below). Essentially, you change the DAG name, the schedule, and the image name to your needs, and you're good to go.
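For orientation, a minimal query_duckdb.py might look like the following (a hypothetical sketch; the actual script in the starter repo may differ):

import duckdb

def main() -> None:
    # In-memory DuckDB connection; replace the query with your own logic.
    con = duckdb.connect()
    result = con.execute("SELECT 42 AS answer").fetchall()
    print(f"DuckDB query result: {result}")

if __name__ == "__main__":
    main()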

+

Example of an Airflow DAG: +

from pendulum import datetime
+from airflow import DAG
+from airflow.configuration import conf
+from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
+    KubernetesPodOperator,
+)
+from kubernetes.client import models as k8s
+import os
+
+default_args = {
+    "owner": "airflow",
+    "depend_on_past": False,
+    "start_date": datetime(2021, 5, 1),
+    "email_on_failure": False,
+    "email_on_retry": False,
+    "retries": 1,
+}
+
+workspace_name = os.getenv("HD_WS_BOILERPLATE_NAME", "ws-boilerplate")
+namespace = os.getenv("HD_NAMESPACE", "default")
+
+# This will use .kube/config for local Astro CLI Airflow and ENV variable for k8s deployment
+if namespace == "default":
+    config_file = "include/.kube/config"  # copy your local kube file to the include folder: `cp ~/.kube/config include/.kube/config`
+    in_cluster = False
+else:
+    in_cluster = True
+    config_file = None
+
+with DAG(
+    dag_id="run_boiler_example",
+    schedule="@once",
+    default_args=default_args,
+    description="Boiler Plate for running a hello data workspace in airflow",
+    tags=[workspace_name],
+) as dag:
+    KubernetesPodOperator(
+        namespace=namespace,
+        image="my-docker-registry.com/hellodata-ws-boilerplate:0.1.0",
+        image_pull_secrets=[k8s.V1LocalObjectReference("regcred")],
+        labels={"pod-label-test": "label-name-test"},
+        name="airflow-running-dagster-workspace",
+        task_id="run_duckdb_query",
+        in_cluster=in_cluster,  # if set to true, will look in the cluster, if false, looks for file
+        cluster_context="docker-desktop",  # is ignored when in_cluster is set to True
+        config_file=config_file,
+        is_delete_operator_pod=True,
+        get_logs=True,
+        # please add/overwrite your command here
+        cmds=["/bin/bash", "-cx"],
+        arguments=[
+            "python query_duckdb.py && echo 'Query executed successfully'",  # add your command here
+        ],
+    )
+

+

DAG: How to test or run a DAG locally before deploying

+

To run locally, the easiest way is to use the Astro CLI (see the link for installation). With it, we can simply run astro start or astro stop to start everything up or shut it down.

+

For local deployment we have these requirements:

+
    +
  • Local Docker installed (either native or Docker-Desktop)
  • +
  • make sure Kubernetes is enabled
  • +
  • copy your local kube config file to astro: cp ~/.kube/config src/dags/airflow/include/.kube/
  • +
  • attention: under Windows, you will most probably find that file under C:\Users\[YourIdHere]\.kube\config
  • +
  • make sure the docker image is available locally (for Airflow to use it) -> docker build must have run (check with docker image ls)
  • +
+

The config file is used by Astro to run on local Kubernetes. See more info in Run your Astro project in a local Airflow environment.

+

Install Requirements: Dockerfile

+

Below is an example of how to install requirements (here DuckDB) and copy the custom code src/duckdb/query_duckdb.py into the image.

+

Boiler-plate example: +

FROM python:3.10-slim
+
+RUN mkdir -p /opt/airflow/airflow_home/dags/
+
+# Copy your Airflow DAGs, which will be copied into the business domain Airflow (these DAGs will be executed by Airflow)
+COPY ../src/dags/airflow/dags/* /opt/airflow/airflow_home/dags/
+
+WORKDIR /usr/src/app
+
+RUN pip install --upgrade pip
+
+# Install DuckDB (example - please add your own dependencies here)
+RUN pip install duckdb
+
+# Copy the script into the container
+COPY src/duckdb/query_duckdb.py ./
+
+# long-running process to keep the container running 
+CMD tail -f /dev/null
+

+

Deployment: deployment-needs.yaml

+

Below you see an example of deployment needs in deployment-needs.yaml, which defines:

+
    +
  • Docker image
  • +
  • Volume mounts you need
  • +
  • a command to run
  • +
  • container behaviour
  • +
  • extra ENV variables and values that HD-Team needs to provide for you
  • +
+
+

This part is the one most likely to change.

+

All of this will eventually be more automated. Also, let us know or just add missing specs to the file, and we'll add the functionality on the deployment side.

+
+
spec:
+  initContainers:
+    copy-dags-to-bd:
+      image:
+        repository: my-docker-registry.com/hellodata-ws-boilerplate
+        pullPolicy: IfNotPresent
+        tag: "0.1.0"
+      resources: {}
+
+      volumeMounts:
+        - name: storage-hellodata
+          type: external
+          path: /storage
+      command: [ "/bin/sh","-c" ]
+      args: [ "mkdir -p /storage/${datadomain}/dags/${workspace}/ && rm -rf /storage/${datadomain}/dags/${workspace}/* && cp -a /opt/airflow/airflow_home/dags/*.py /storage/${datadomain}/dags/${workspace}/" ]
+
+  containers:
+    - name: ws-boilerplate
+      image: my-docker-registry.com/hellodata-ws-boilerplate:0.1.0
+      imagePullPolicy: Always
+
+
+#needed envs for Airflow
+airflow:
+
+  extraEnv: |
+    - name: "HD_NAMESPACE"
+      value: "${namespace}"
+    - name: "HD_WS_BOILERPLATE_NAME"
+      value: "dd01-ws-boilerplate"
+
+

Example with Airflow and dbt

+

We've added another demo DAG called showcase-boiler.py, which downloads data from the web (animal statistics, ~150 CSVs), creates Postgres tables, inserts the data, and runs dbt (including docs generation) at the end.

+

+

In this case we use multiple tasks in a DAG. These all use the same image, but you could use a different one for each step, meaning you could use Python for download, R for transformation, and Java for machine learning. But as long as the images are similar, I'd suggest using the same image.

+

Volumes / PVC

+

Another addition is the use of volumes. These provide persistent storage (called PVCs in Kubernetes), which allows intermediate data to be stored outside of the container. Downloaded CSVs are stored there for the next task to pick up.

+

Locally, you need to create such storage once; there is a script in case you want to apply it to your local Docker Desktop setup. Run this command: +

kubectl apply -f src/volume_mount/pvc.yaml
+

+

Be sure to use the same name (in this example we use my-pvc) in your DAGs as well. See in showcase-boiler.py how the volumes are mounted, like this (a sketch of how these are passed to the operator follows the snippet): +

volume_claim = k8s.V1PersistentVolumeClaimVolumeSource(claim_name="my-pvc")
+volume = k8s.V1Volume(name="my-volume", persistent_volume_claim=volume_claim)
+volume_mount = k8s.V1VolumeMount(name="my-volume", mount_path="/mnt/pvc")
+
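These objects are then passed to the KubernetesPodOperator via its volumes and volume_mounts parameters, roughly like this (a sketch; the task name and command are placeholders, and the remaining parameters follow the DAG example above):

KubernetesPodOperator(
    task_id="download_csvs",                # placeholder task name
    name="showcase-download",
    namespace=namespace,
    image="my-docker-registry.com/hellodata-ws-boilerplate:0.1.0",
    volumes=[volume],                       # the k8s.V1Volume defined above
    volume_mounts=[volume_mount],           # mounted at /mnt/pvc inside the pod
    cmds=["/bin/bash", "-cx"],
    arguments=["python download_csvs.py"],  # placeholder command
)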

+

Conclusion

+

I hope this has illustrated how to create your own workspace. Otherwise let us know in the discussions or create an issue/PR.

+

Troubleshooting

+

If you encounter errors, we collect them in Troubleshooting.

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/images/1014792293.png b/images/1014792293.png new file mode 100644 index 00000000..fcc70048 Binary files /dev/null and b/images/1014792293.png differ diff --git a/images/1040683241.png b/images/1040683241.png new file mode 100644 index 00000000..f5ed7ef8 Binary files /dev/null and b/images/1040683241.png differ diff --git a/images/1040683243 1.png b/images/1040683243 1.png new file mode 100644 index 00000000..cfac44ea Binary files /dev/null and b/images/1040683243 1.png differ diff --git a/images/1040683243.PNG b/images/1040683243.PNG new file mode 100644 index 00000000..658e7ed6 Binary files /dev/null and b/images/1040683243.PNG differ diff --git a/images/1040683245.png b/images/1040683245.png new file mode 100644 index 00000000..21568470 Binary files /dev/null and b/images/1040683245.png differ diff --git a/images/1046873175.png b/images/1046873175.png new file mode 100644 index 00000000..9f78068d Binary files /dev/null and b/images/1046873175.png differ diff --git a/images/1046873182.png b/images/1046873182.png new file mode 100644 index 00000000..02761fe1 Binary files /dev/null and b/images/1046873182.png differ diff --git a/images/1046873407.png b/images/1046873407.png new file mode 100644 index 00000000..0b3f1372 Binary files /dev/null and b/images/1046873407.png differ diff --git a/images/1046873409.png b/images/1046873409.png new file mode 100644 index 00000000..02761fe1 Binary files /dev/null and b/images/1046873409.png differ diff --git a/images/1046873411.png b/images/1046873411.png new file mode 100644 index 00000000..292fe1d7 Binary files /dev/null and b/images/1046873411.png differ diff --git a/images/1046873413.png b/images/1046873413.png new file mode 100644 index 00000000..0b8a229d Binary files /dev/null and b/images/1046873413.png differ diff --git a/images/1046873417.png b/images/1046873417.png new file mode 100644 index 00000000..df502b4a Binary files /dev/null and b/images/1046873417.png differ diff --git a/images/1046873425.png b/images/1046873425.png new file mode 100644 index 00000000..cc66a390 Binary files /dev/null and b/images/1046873425.png differ diff --git a/images/1046873438.png b/images/1046873438.png new file mode 100644 index 00000000..1f9b6858 Binary files /dev/null and b/images/1046873438.png differ diff --git a/images/1046873444.png b/images/1046873444.png new file mode 100644 index 00000000..1f9b6858 Binary files /dev/null and b/images/1046873444.png differ diff --git a/images/1046873449.png b/images/1046873449.png new file mode 100644 index 00000000..5e09e5d2 Binary files /dev/null and b/images/1046873449.png differ diff --git a/images/1046873456.png b/images/1046873456.png new file mode 100644 index 00000000..1f9b6858 Binary files /dev/null and b/images/1046873456.png differ diff --git a/images/1062338709.png b/images/1062338709.png new file mode 100644 index 00000000..63fe27cc Binary files /dev/null and b/images/1062338709.png differ diff --git a/images/1062338710.png b/images/1062338710.png new file mode 100644 index 00000000..5342ea08 Binary files /dev/null and b/images/1062338710.png differ diff --git a/images/1062338713.png b/images/1062338713.png new file mode 100644 index 00000000..8c6b28b4 Binary files /dev/null and b/images/1062338713.png differ diff --git a/images/1062338731.png b/images/1062338731.png new file mode 100644 index 00000000..81db1e7f Binary files /dev/null and b/images/1062338731.png differ diff --git a/images/1062338737.png b/images/1062338737.png new file mode 100644 
index 00000000..35904ed8 Binary files /dev/null and b/images/1062338737.png differ diff --git a/images/1062338827.png b/images/1062338827.png new file mode 100644 index 00000000..b506c83a Binary files /dev/null and b/images/1062338827.png differ diff --git a/images/1063555292.png b/images/1063555292.png new file mode 100644 index 00000000..87361325 Binary files /dev/null and b/images/1063555292.png differ diff --git a/images/1063555293.png b/images/1063555293.png new file mode 100644 index 00000000..fe389702 Binary files /dev/null and b/images/1063555293.png differ diff --git a/images/1063555295.png b/images/1063555295.png new file mode 100644 index 00000000..8b3a7592 Binary files /dev/null and b/images/1063555295.png differ diff --git a/images/1063555296.png b/images/1063555296.png new file mode 100644 index 00000000..94db944f Binary files /dev/null and b/images/1063555296.png differ diff --git a/images/1063555297.png b/images/1063555297.png new file mode 100644 index 00000000..e82d2402 Binary files /dev/null and b/images/1063555297.png differ diff --git a/images/1063555298.png b/images/1063555298.png new file mode 100644 index 00000000..f57ccac1 Binary files /dev/null and b/images/1063555298.png differ diff --git a/images/1063555299.png b/images/1063555299.png new file mode 100644 index 00000000..e4b92f7a Binary files /dev/null and b/images/1063555299.png differ diff --git a/images/1063555338.png b/images/1063555338.png new file mode 100644 index 00000000..11d21463 Binary files /dev/null and b/images/1063555338.png differ diff --git a/images/1068204566.png b/images/1068204566.png new file mode 100644 index 00000000..81db1e7f Binary files /dev/null and b/images/1068204566.png differ diff --git a/images/1068204575.png b/images/1068204575.png new file mode 100644 index 00000000..2f942da4 Binary files /dev/null and b/images/1068204575.png differ diff --git a/images/1068204576.png b/images/1068204576.png new file mode 100644 index 00000000..9af04a1d Binary files /dev/null and b/images/1068204576.png differ diff --git a/images/1068204578.png b/images/1068204578.png new file mode 100644 index 00000000..66af961c Binary files /dev/null and b/images/1068204578.png differ diff --git a/images/1068204586.png b/images/1068204586.png new file mode 100644 index 00000000..61aad1c8 Binary files /dev/null and b/images/1068204586.png differ diff --git a/images/1068204588.png b/images/1068204588.png new file mode 100644 index 00000000..c59bc6cb Binary files /dev/null and b/images/1068204588.png differ diff --git a/images/1068204591.png b/images/1068204591.png new file mode 100644 index 00000000..8dd8feee Binary files /dev/null and b/images/1068204591.png differ diff --git a/images/1068204596.png b/images/1068204596.png new file mode 100644 index 00000000..0f64d9af Binary files /dev/null and b/images/1068204596.png differ diff --git a/images/1068204597.png b/images/1068204597.png new file mode 100644 index 00000000..82fa7094 Binary files /dev/null and b/images/1068204597.png differ diff --git a/images/1068204599.png b/images/1068204599.png new file mode 100644 index 00000000..72fbf070 Binary files /dev/null and b/images/1068204599.png differ diff --git a/images/1068204607.png b/images/1068204607.png new file mode 100644 index 00000000..68445ede Binary files /dev/null and b/images/1068204607.png differ diff --git a/images/1068204613.png b/images/1068204613.png new file mode 100644 index 00000000..1fd80bf1 Binary files /dev/null and b/images/1068204613.png differ diff --git a/images/1068204614.png 
b/images/1068204614.png new file mode 100644 index 00000000..67d92a16 Binary files /dev/null and b/images/1068204614.png differ diff --git a/images/1068204616.png b/images/1068204616.png new file mode 100644 index 00000000..3858e44d Binary files /dev/null and b/images/1068204616.png differ diff --git a/images/1068204617.png b/images/1068204617.png new file mode 100644 index 00000000..9a4dbbf9 Binary files /dev/null and b/images/1068204617.png differ diff --git a/images/1068204620.png b/images/1068204620.png new file mode 100644 index 00000000..ab5302f0 Binary files /dev/null and b/images/1068204620.png differ diff --git a/images/1068204622.png b/images/1068204622.png new file mode 100644 index 00000000..aa5932dd Binary files /dev/null and b/images/1068204622.png differ diff --git a/images/1068204623.png b/images/1068204623.png new file mode 100644 index 00000000..48fca20e Binary files /dev/null and b/images/1068204623.png differ diff --git a/images/1068204627.png b/images/1068204627.png new file mode 100644 index 00000000..2dea3908 Binary files /dev/null and b/images/1068204627.png differ diff --git a/images/1068204628.png b/images/1068204628.png new file mode 100644 index 00000000..ca1e5122 Binary files /dev/null and b/images/1068204628.png differ diff --git a/images/1110083707.png b/images/1110083707.png new file mode 100644 index 00000000..63bb58dd Binary files /dev/null and b/images/1110083707.png differ diff --git a/images/1110083764.png b/images/1110083764.png new file mode 100644 index 00000000..b69a9494 Binary files /dev/null and b/images/1110083764.png differ diff --git a/images/1110083783.png b/images/1110083783.png new file mode 100644 index 00000000..df502b4a Binary files /dev/null and b/images/1110083783.png differ diff --git a/images/1110083799.png b/images/1110083799.png new file mode 100644 index 00000000..3deb0999 Binary files /dev/null and b/images/1110083799.png differ diff --git a/images/1110083801.png b/images/1110083801.png new file mode 100644 index 00000000..3deb0999 Binary files /dev/null and b/images/1110083801.png differ diff --git a/images/1110083803.png b/images/1110083803.png new file mode 100644 index 00000000..3deb0999 Binary files /dev/null and b/images/1110083803.png differ diff --git a/images/1110083805.png b/images/1110083805.png new file mode 100644 index 00000000..3deb0999 Binary files /dev/null and b/images/1110083805.png differ diff --git a/images/1110083807.png b/images/1110083807.png new file mode 100644 index 00000000..1f9fa503 Binary files /dev/null and b/images/1110083807.png differ diff --git a/images/1110083812.png b/images/1110083812.png new file mode 100644 index 00000000..1f9fa503 Binary files /dev/null and b/images/1110083812.png differ diff --git a/images/1111425060.png b/images/1111425060.png new file mode 100644 index 00000000..d7b17159 Binary files /dev/null and b/images/1111425060.png differ diff --git a/images/1117849043.png b/images/1117849043.png new file mode 100644 index 00000000..ab810abb Binary files /dev/null and b/images/1117849043.png differ diff --git a/images/1117849051.png b/images/1117849051.png new file mode 100644 index 00000000..ab810abb Binary files /dev/null and b/images/1117849051.png differ diff --git a/images/1117849053.png b/images/1117849053.png new file mode 100644 index 00000000..ab810abb Binary files /dev/null and b/images/1117849053.png differ diff --git a/images/1131088924.png b/images/1131088924.png new file mode 100644 index 00000000..a6c6bb08 Binary files /dev/null and b/images/1131088924.png differ 
diff --git a/images/1151074469.png b/images/1151074469.png new file mode 100644 index 00000000..ab810abb Binary files /dev/null and b/images/1151074469.png differ diff --git a/images/Kubernetes Namespaces.png b/images/Kubernetes Namespaces.png new file mode 100644 index 00000000..2a1a52c2 Binary files /dev/null and b/images/Kubernetes Namespaces.png differ diff --git a/images/Pasted image 20230929120735.png b/images/Pasted image 20230929120735.png new file mode 100644 index 00000000..fc0558f1 Binary files /dev/null and b/images/Pasted image 20230929120735.png differ diff --git a/images/Pasted image 20230929120817.png b/images/Pasted image 20230929120817.png new file mode 100644 index 00000000..fc0558f1 Binary files /dev/null and b/images/Pasted image 20230929120817.png differ diff --git a/images/Pasted image 20230929121034.png b/images/Pasted image 20230929121034.png new file mode 100644 index 00000000..fc0558f1 Binary files /dev/null and b/images/Pasted image 20230929121034.png differ diff --git a/images/Pasted image 20231106135341.png b/images/Pasted image 20231106135341.png new file mode 100644 index 00000000..4cacd84e Binary files /dev/null and b/images/Pasted image 20231106135341.png differ diff --git a/images/Pasted image 20231106135439.png b/images/Pasted image 20231106135439.png new file mode 100644 index 00000000..3f6c1264 Binary files /dev/null and b/images/Pasted image 20231106135439.png differ diff --git a/images/Pasted image 20231106135522.png b/images/Pasted image 20231106135522.png new file mode 100644 index 00000000..3f6c1264 Binary files /dev/null and b/images/Pasted image 20231106135522.png differ diff --git a/images/Pasted image 20231130144045.png b/images/Pasted image 20231130144045.png new file mode 100644 index 00000000..a77a3223 Binary files /dev/null and b/images/Pasted image 20231130144045.png differ diff --git a/images/Pasted image 20231130144109.png b/images/Pasted image 20231130144109.png new file mode 100644 index 00000000..dcba085a Binary files /dev/null and b/images/Pasted image 20231130144109.png differ diff --git a/images/Pasted image 20231130144118.png b/images/Pasted image 20231130144118.png new file mode 100644 index 00000000..f4a0a8fa Binary files /dev/null and b/images/Pasted image 20231130144118.png differ diff --git a/images/Pasted image 20231130144128.png b/images/Pasted image 20231130144128.png new file mode 100644 index 00000000..c3ba4e0b Binary files /dev/null and b/images/Pasted image 20231130144128.png differ diff --git a/images/Pasted image 20231130144139.png b/images/Pasted image 20231130144139.png new file mode 100644 index 00000000..1945fc2f Binary files /dev/null and b/images/Pasted image 20231130144139.png differ diff --git a/images/Pasted image 20231130144155.png b/images/Pasted image 20231130144155.png new file mode 100644 index 00000000..f721fc82 Binary files /dev/null and b/images/Pasted image 20231130144155.png differ diff --git a/images/Pasted image 20231130144943.png b/images/Pasted image 20231130144943.png new file mode 100644 index 00000000..8dda1dd9 Binary files /dev/null and b/images/Pasted image 20231130144943.png differ diff --git a/images/Pasted image 20231130145718.png b/images/Pasted image 20231130145718.png new file mode 100644 index 00000000..6d8dcde1 Binary files /dev/null and b/images/Pasted image 20231130145718.png differ diff --git a/images/Pasted image 20231130145958.png b/images/Pasted image 20231130145958.png new file mode 100644 index 00000000..6d8dcde1 Binary files /dev/null and b/images/Pasted image 
20231130145958.png differ diff --git a/images/Pasted image 20231130151446.png b/images/Pasted image 20231130151446.png new file mode 100644 index 00000000..bbe78bb9 Binary files /dev/null and b/images/Pasted image 20231130151446.png differ diff --git a/images/Pasted image 20231130151542.png b/images/Pasted image 20231130151542.png new file mode 100644 index 00000000..34955803 Binary files /dev/null and b/images/Pasted image 20231130151542.png differ diff --git a/images/Pasted image 20231130151610.png b/images/Pasted image 20231130151610.png new file mode 100644 index 00000000..ae3b0e99 Binary files /dev/null and b/images/Pasted image 20231130151610.png differ diff --git a/images/Pasted image 20231130151712.png b/images/Pasted image 20231130151712.png new file mode 100644 index 00000000..9c32235b Binary files /dev/null and b/images/Pasted image 20231130151712.png differ diff --git a/images/Pasted image 20231130151757.png b/images/Pasted image 20231130151757.png new file mode 100644 index 00000000..8aaae04e Binary files /dev/null and b/images/Pasted image 20231130151757.png differ diff --git a/images/Pasted image 20231130151816.png b/images/Pasted image 20231130151816.png new file mode 100644 index 00000000..65567b49 Binary files /dev/null and b/images/Pasted image 20231130151816.png differ diff --git a/images/Pasted image 20231130152628.png b/images/Pasted image 20231130152628.png new file mode 100644 index 00000000..5ded16d8 Binary files /dev/null and b/images/Pasted image 20231130152628.png differ diff --git a/images/Pasted image 20231130152819.png b/images/Pasted image 20231130152819.png new file mode 100644 index 00000000..99eb8360 Binary files /dev/null and b/images/Pasted image 20231130152819.png differ diff --git a/images/Pasted image 20231130153025.png b/images/Pasted image 20231130153025.png new file mode 100644 index 00000000..1c3f3af6 Binary files /dev/null and b/images/Pasted image 20231130153025.png differ diff --git a/images/Pasted image 20231130153037.png b/images/Pasted image 20231130153037.png new file mode 100644 index 00000000..e5e11127 Binary files /dev/null and b/images/Pasted image 20231130153037.png differ diff --git a/images/Pasted image 20231130153054.png b/images/Pasted image 20231130153054.png new file mode 100644 index 00000000..036a9ada Binary files /dev/null and b/images/Pasted image 20231130153054.png differ diff --git a/images/Pasted image 20231130153156.png b/images/Pasted image 20231130153156.png new file mode 100644 index 00000000..ab043ff8 Binary files /dev/null and b/images/Pasted image 20231130153156.png differ diff --git a/images/Pasted image 20231130153220.png b/images/Pasted image 20231130153220.png new file mode 100644 index 00000000..7687864c Binary files /dev/null and b/images/Pasted image 20231130153220.png differ diff --git a/images/Pasted image 20231130153427.png b/images/Pasted image 20231130153427.png new file mode 100644 index 00000000..620b2a32 Binary files /dev/null and b/images/Pasted image 20231130153427.png differ diff --git a/images/Pasted image 20231130153507.png b/images/Pasted image 20231130153507.png new file mode 100644 index 00000000..474fd18d Binary files /dev/null and b/images/Pasted image 20231130153507.png differ diff --git a/images/Pasted image 20231130153801.png b/images/Pasted image 20231130153801.png new file mode 100644 index 00000000..d595b4f9 Binary files /dev/null and b/images/Pasted image 20231130153801.png differ diff --git a/images/Pasted image 20231130154714.png b/images/Pasted image 20231130154714.png new file 
mode 100644 index 00000000..570717e4 Binary files /dev/null and b/images/Pasted image 20231130154714.png differ diff --git a/images/Pasted image 20231130154804.png b/images/Pasted image 20231130154804.png new file mode 100644 index 00000000..fbb67bdb Binary files /dev/null and b/images/Pasted image 20231130154804.png differ diff --git a/images/Pasted image 20231130154849.png b/images/Pasted image 20231130154849.png new file mode 100644 index 00000000..44fbecd5 Binary files /dev/null and b/images/Pasted image 20231130154849.png differ diff --git a/images/Pasted image 20231130154943.png b/images/Pasted image 20231130154943.png new file mode 100644 index 00000000..fda9bcf3 Binary files /dev/null and b/images/Pasted image 20231130154943.png differ diff --git a/images/Pasted image 20231130155419.png b/images/Pasted image 20231130155419.png new file mode 100644 index 00000000..4a77794f Binary files /dev/null and b/images/Pasted image 20231130155419.png differ diff --git a/images/Pasted image 20231130155512.png b/images/Pasted image 20231130155512.png new file mode 100644 index 00000000..28d03f46 Binary files /dev/null and b/images/Pasted image 20231130155512.png differ diff --git a/images/Pasted image 20231130155752.png b/images/Pasted image 20231130155752.png new file mode 100644 index 00000000..fede5249 Binary files /dev/null and b/images/Pasted image 20231130155752.png differ diff --git a/images/dbt-linage.png b/images/dbt-linage.png new file mode 100644 index 00000000..14e0725f Binary files /dev/null and b/images/dbt-linage.png differ diff --git a/images/deployment-view.jpg b/images/deployment-view.jpg new file mode 100644 index 00000000..c5d2978c Binary files /dev/null and b/images/deployment-view.jpg differ diff --git a/images/favicon.png b/images/favicon.png new file mode 100644 index 00000000..74baa0ac Binary files /dev/null and b/images/favicon.png differ diff --git a/images/hello-data-superset copy.jpg b/images/hello-data-superset copy.jpg new file mode 100644 index 00000000..47621726 Binary files /dev/null and b/images/hello-data-superset copy.jpg differ diff --git a/images/hello-data-superset.jpg b/images/hello-data-superset.jpg new file mode 100644 index 00000000..ce028d30 Binary files /dev/null and b/images/hello-data-superset.jpg differ diff --git a/images/hello-data-superset.png b/images/hello-data-superset.png new file mode 100644 index 00000000..32b5d473 Binary files /dev/null and b/images/hello-data-superset.png differ diff --git a/images/helloDATA-portal-entry.jpg b/images/helloDATA-portal-entry.jpg new file mode 100644 index 00000000..f853fb8b Binary files /dev/null and b/images/helloDATA-portal-entry.jpg differ diff --git a/images/portal-airflow.jpg b/images/portal-airflow.jpg new file mode 100644 index 00000000..c2e6d5d9 Binary files /dev/null and b/images/portal-airflow.jpg differ diff --git a/images/portal-superset.jpg b/images/portal-superset.jpg new file mode 100644 index 00000000..e0a0e6f7 Binary files /dev/null and b/images/portal-superset.jpg differ diff --git a/images/roadmap.png b/images/roadmap.png new file mode 100644 index 00000000..36834988 Binary files /dev/null and b/images/roadmap.png differ diff --git a/images/showcase-boiler.png b/images/showcase-boiler.png new file mode 100644 index 00000000..c80013aa Binary files /dev/null and b/images/showcase-boiler.png differ diff --git a/images/workspaces-business-overview.png b/images/workspaces-business-overview.png new file mode 100644 index 00000000..a283db7e Binary files /dev/null and 
b/images/workspaces-business-overview.png differ diff --git a/images/workspaces-error-pull-image.png b/images/workspaces-error-pull-image.png new file mode 100644 index 00000000..3241109c Binary files /dev/null and b/images/workspaces-error-pull-image.png differ diff --git a/index.html b/index.html new file mode 100644 index 00000000..425bc1be --- /dev/null +++ b/index.html @@ -0,0 +1,853 @@ + + + + + + + + + + + + + + + + + + + + HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Welcome to HelloDATA BE 👋🏻

+

This is the open documentation about HelloDATA BE. We hope you enjoy it.

+
+

Contribute

+

In case something is missing or you'd like to add something, below is how you can contribute:

+ +
+

What is HelloDATA BE?

+

HelloDATA BE is an enterprise data platform built on top of open-source tools from the modern data stack. We use state-of-the-art tools such as dbt for data modeling with SQL, Airflow to run and orchestrate tasks, and Superset to visualize BI dashboards. The underlying database is Postgres.

+

Each of these components is carefully chosen, and additional tools can be added at a later stage.

+

Why do you need an Open Enterprise Data Platform (HelloDATA BE)?

+

These days, the amount of data grows more each year than in the entire lifetime before. Each fridge, light bulb, or almost anything really starts to produce data, meaning there is a growing need to make sense of more data. Usually, not all of it is necessary and valid, but due to the nature of growing data, we must be able to collect and store it easily. There is a great need to be able to analyze this data; the results can be put to secondary use and thus create added value.

+

That is what this open data enterprise platform is all about. In the old days, you used to have one single solution provided; think of SAP or Oracle. These days that has completely changed. New SaaS products are created daily, specializing in a tiny niche. There are also many open-source tools that you can use for free and get going with in minutes.

+

So why would you need a HelloDATA BE? It's simple. You want the best of both worlds. You want open source so you are not locked in and can use the strongest, collaboratively created product in the open. People worldwide can fix a security bug in minutes, or you can even go into the source code (as it's available for everyone) and fix it yourself—compared to a large vendor, where you rely solely on their update cycle.

+

But let's be honest for a second: if we use the latest shiny thing from open source, there are a lot of bugs, missing features, and tools that don't integrate with each other. That's precisely where HelloDATA BE comes into play. We are building the missing platform that combines best-of-breed open-source technologies into a single portal, making it enterprise-ready by adding features you typically won't get in an open-source product, or by fixing bugs that we encountered during our extensive tests.

+

Sounds too good to be true? Give it a try. Do you want to know the best thing? It's open-source as well. Check out our GitHub HelloDATA BE.

+

Quick Start for Developers

+

Want to run HelloDATA BE and test it locally? Run the following command in the docker-compose directory to deploy all components:

+
cd hello-data-deployment/docker-compose
+docker-compose up -d
+
+

Note: Please refer to our docker-compose README for more information; there are some required presets you need to configure.

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/manuals/role-authorization-concept/index.html b/manuals/role-authorization-concept/index.html new file mode 100644 index 00000000..53947966 --- /dev/null +++ b/manuals/role-authorization-concept/index.html @@ -0,0 +1,1356 @@ + + + + + + + + + + + + + + + + + + + + + + Roles & Authorization - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Roles and authorization concept

+

Platform Authentication Authorization

+

Authentication and authorizations within the various logical contexts or domains of the HelloDATA system are handled as follows. 
+Authentication is handled via the OAuth 2 standard. In the case of the Canton of Bern, this is done via the central KeyCloak server. Authorizations to the various elements within a subject or Data Domain are handled via authorization within the HelloDATA portal.
+To keep administration simple, a role concept is applied. Instead of defining the authorizations for each user, roles receive the authorizations and the users are then assigned to the roles. The roles available in the portal have fixed defined permissions.

+

Business Domain

+

In order for a user to gain access to a Business Domain, the user must be authenticated for the Business Domain.
+Users without authentication who try to access a Business Domain will receive an error message.
+The following two logical roles are available within a Business Domain:

+
    +
  • HELLODATA_ADMIN
  • +
  • BUSINESS_DOMAIN_ADMIN
  • +
+

HELLODATA_ADMIN

+
    +
  • Can act fully in the system.
  • +
+

BUSINESS_DOMAIN_ADMIN

+
    +
  • Can manage users and assign roles (except HELLODATA_ADMIN).
  • +
  • Can manage dashboard metadata.
  • +
  • Can manage announcements.
  • +
  • Can manage the FAQ.
  • +
  • Can manage the external documentation links.
  • +
+

BUSINESS_DOMAIN_ADMIN is automatically DATA_DOMAIN_ADMIN in all Data Domains within the Business Domain (see Data Domain Context).

+

Data Domain

+

A Data Domain encapsulates all data elements and tools that are of interest for a specific issue.
+HelloDATA supports 1 - n Data Domains within a Business Domain.

+

The resources to be protected within a Data Domain are:

+
    +
  • Schema of the Data Domain.
  • +
  • Data mart tables of the Data Domain.
  • +
  • The entire DWH environment of the Data Domain.
  • +
  • Data lineage documents of the DBT projects of the Data Domain.
  • +
  • Dashboards, charts, datasets within the superset instance of a Data Domain.
  • +
  • Airflow DAGs of the Data Domain.
  • +
+

The following three logical roles are available within a Data Domain:

+
    +
  • DATA_DOMAIN_VIEWER    
  • +
  • DATA_DOMAIN_EDITOR
  • +
  • DATA_DOMAIN_ADMIN
  • +
+

Depending on the role assigned, users are given different permissions to act in the Data Domain.
+A user who has not been assigned a role in a Data Domain will generally not be granted access to any resources of that Data Domain.

+

DATA_DOMAIN_VIEWER

+
    +
  • The DATA_DOMAIN_VIEWER role is granted potential read access to dashboards of a Data Domain.
  • +
  • Which dashboards of the Data Domain a DATA_DOMAIN_VIEWER user is allowed to see is administered within the user management of the HelloDATA portal.
  • +
  • Only assigned dashboards are visible to a DATA_DOMAIN_VIEWER.
  • +
  • Only dashboards in "Published" status are visible to a DATA_DOMAIN_VIEWER. A DATA_DOMAIN_VIEWER can view all data lineage documents of the Data Domain.
  • +
  • A DATA_DOMAIN_VIEWER can access the links to external dashboards associated with its Data Domain. It is not checked whether the user has access in the systems outside the HelloDATA system boundary.
  • +
+
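The bullets above boil down to a simple rule: a DATA_DOMAIN_VIEWER sees a dashboard only if it is both assigned to them and published. A minimal Python sketch of that rule, with hypothetical data structures that are not part of the HelloDATA codebase, could look like this:

from dataclasses import dataclass, field

@dataclass
class Dashboard:
    name: str
    published: bool

@dataclass
class Viewer:
    assigned_dashboards: set = field(default_factory=set)

def visible_dashboards(viewer, dashboards):
    # A DATA_DOMAIN_VIEWER only sees dashboards that are assigned to them AND published.
    return [d for d in dashboards if d.name in viewer.assigned_dashboards and d.published]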

DATA_DOMAIN_EDITOR

+

Same as DATA_DOMAIN_VIEWER plus:

+
    +
  • The DATA_DOMAIN_EDITOR role is granted read and write access to the dashboards of a Data Domain. All dashboards are visible and editable for a DATA_DOMAIN_EDITOR. All charts used in the dashboards are visible and editable for a DATA_DOMAIN_EDITOR. All data sets used in the dashboards are visible and editable for a DATA_DOMAIN_EDITOR.
  • +
  • A DATA_DOMAIN_EDITOR can create new dashboards.
  • +
  • A DATA_DOMAIN_EDITOR can view the data marts of the Data Domain.
  • +
  • A DATA_DOMAIN_EDITOR has access to the SQL lab in the superset.
  • +
+

DATA_DOMAIN_ADMIN

+

Same as DATA_DOMAIN_EDITOR plus:

+

The DATA_DOMAIN_ADMIN role can view the airflow DAGs of the Data Domain.
+A DATA_DOMAIN_ADMIN can view all database objects in the DWH of the Data Domain.

+

Extra Data Domain

+

Besides the standard Data Domains, there are also Extra Data Domains.
+An Extra Data Domain provides additional permissions, functions, and database connections, such as:

+
    +
  • CSV uploads to the Data Domain.
  • +
  • Read permissions from one Data Domain to additional other Data Domain(s).
  • +
  • Database connections to Data Domains of other databases.
  • +
  • Database connections via AD group permissions.
  • +
  • etc.
  • +
+

These additional permissions, functions or database connections are a matter of negotiation per extra Data Domain.
+The additional permissions, if any, are then added to the standard roles mentioned above for the extra Data Domain.

+

Row Level Security settings at the Superset level can be used to further restrict the data displayed in a dashboard (e.g., only data of the user's own domain is shown).

+

System Role to Portal Role Mapping

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
System Role | Portal Role | Portal Permission | Menu / Submenu / Page in Portal | Info
HELLODATA_ADMIN | SUPERUSER | ROLE_MANAGEMENT | Administration / Portal Rollenverwaltung |
 | | MONITORING | Monitoring |
 | | DEVTOOLS | Dev Tools |
 | | USER_MANAGEMENT | Administration / Benutzerverwaltung |
 | | FAQ_MANAGEMENT | Administration / FAQ Verwaltung |
 | | EXTERNAL_DASHBOARDS_MANAGEMENT | Under External Dashboards | Can create and manage new entries on the External Dashboards page
 | | DOCUMENTATION_MANAGEMENT | Administration / Dokumentationsmanagement |
 | | ANNOUNCEMENT_MANAGEMENT | Administration / Ankündigungen |
 | | DASHBOARDS | Dashboards | Sees "List" in the menu, then one link for each Data Domain they have access to, with the dashboards they have access to, plus External Dashboards
 | | DATA_LINEAGE | Data Lineage | Sees one lineage link in the menu for each Data Domain they have access to
 | | DATA_MARTS | Data Marts | Sees one Data Mart link in the menu for each Data Domain they have access to
 | | DATA_DWH | Data Eng. / DWH Viewer | Sees the DWH Viewer submenu in the Data Eng. menu
 | | DATA_ENG | Data Eng. / Orchestration | Sees the Orchestration submenu in the Data Eng. menu
BUSINESS_DOMAIN_ADMIN | BUSINESS_DOMAIN_ADMIN | USER_MANAGEMENT | Administration / Benutzerverwaltung |
 | | FAQ_MANAGEMENT | Administration / FAQ Verwaltung |
 | | EXTERNAL_DASHBOARDS_MANAGEMENT | Under External Dashboards |
 | | DOCUMENTATION_MANAGEMENT | Administration / Dokumentationsmanagement |
 | | ANNOUNCEMENT_MANAGEMENT | Administration / Ankündigungen |
 | | DASHBOARDS | Dashboards | Sees "List" in the menu, then one link for each Data Domain they have access to, with the dashboards they have access to, plus External Dashboards
 | | DATA_LINEAGE | Data Lineage | Sees one lineage link in the menu for each Data Domain they have access to
 | | DATA_MARTS | Data Marts | Sees one Data Mart link in the menu for each Data Domain they have access to
 | | DATA_DWH | Data Eng. / DWH Viewer | Sees the DWH Viewer submenu in the Data Eng. menu
 | | DATA_ENG | Data Eng. / Orchestration | Sees the Orchestration submenu in the Data Eng. menu
DATA_DOMAIN_ADMIN | DATA_DOMAIN_ADMIN | DASHBOARDS | Dashboards | Sees "List" in the menu, then one link for each Data Domain they have access to, with the dashboards they have access to, plus External Dashboards
 | | DATA_LINEAGE | Data Lineage | Sees one lineage link in the menu for each Data Domain they have access to
 | | DATA_MARTS | Data Marts | Sees one Data Mart link in the menu for each Data Domain they have access to
 | | DATA_DWH | Data Eng. / DWH Viewer | Sees the DWH Viewer submenu in the Data Eng. menu
 | | DATA_ENG | Data Eng. / Orchestration | Sees the Orchestration submenu in the Data Eng. menu
DATA_DOMAIN_EDITOR | EDITOR | DASHBOARDS | Dashboards | Sees "List" in the menu, then one link for each Data Domain they have access to, with the dashboards they have access to, plus External Dashboards
 | | DATA_LINEAGE | Data Lineage | Sees one lineage link in the menu for each Data Domain they have access to
 | | DATA_MARTS | Data Marts | Sees one Data Mart link in the menu for each Data Domain they have access to
DATA_DOMAIN_VIEWER | VIEWER | DASHBOARDS | Dashboards | Sees "List" in the menu, then one link for each Data Domain they have access to, with the dashboards they have access to, plus External Dashboards
 | | DATA_LINEAGE | Data Lineage | Sees one lineage link in the menu for each Data Domain they have access to
+

System Role to Superset Role Mapping

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
System Role | Superset Role | Info
No Data Domain role | Public | User should not get access to Superset functions, so they get a role with no permissions.
DATA_DOMAIN_VIEWER | BI_VIEWER plus roles for the dashboards they were granted access to, i.e. the slugified dashboard names with prefix "D_" | Example: A user is "DATA_DOMAIN_VIEWER" in a Data Domain. We grant the user access to the "Hello World" dashboard. The user then gets the role "BI_VIEWER" plus the role "D_hello_world" in Superset.
DATA_DOMAIN_EDITOR | BI_EDITOR | Has access to all dashboards as owner of the dashboards, plus SQL Lab permissions.
DATA_DOMAIN_ADMIN | BI_EDITOR plus BI_ADMIN | Has access to all dashboards as owner of the dashboards, plus SQL Lab permissions.
+

System Role to Airflow Role Mapping

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
System Role | Airflow Role | Info
HELLODATA_ADMIN | Admin | User gets the DATA_DOMAIN_ADMIN role for all existing Data Domains and thus gets those roles' permissions. The user additionally gets the Admin role.
BUSINESS_DOMAIN_ADMIN | | User gets the DATA_DOMAIN_ADMIN role for all existing Data Domains and thus gets those roles' permissions.
No Data Domain role | Public | User should not get access to Airflow functions, so they get a role with no permissions.
DATA_DOMAIN_VIEWER | Public | User should not get access to Airflow functions, so they get a role with no permissions.
DATA_DOMAIN_EDITOR | Public | User should not get access to Airflow functions, so they get a role with no permissions.
DATA_DOMAIN_ADMIN | AF_OPERATOR plus a role corresponding to their Data Domain key with prefix "DD_" | Example: A user is "DATA_DOMAIN_ADMIN" in a Data Domain with the key "data_domain_one". The user then gets the role "AF_OPERATOR" plus the role "DD_data_domain_one" in Airflow.
+ + + + + + +
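As a minimal illustration of the naming conventions in the two tables above (Superset roles use the prefix "D_" on slugified dashboard names, Airflow roles use the prefix "DD_" on the Data Domain key), the following Python sketch derives such role names. The helper functions are hypothetical and only mirror the examples given in the tables:

import re

def slugify(name):
    # Lowercase and replace runs of non-alphanumeric characters with "_"
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

def superset_dashboard_role(dashboard_name):
    # "Hello World" -> "D_hello_world"
    return "D_" + slugify(dashboard_name)

def airflow_data_domain_role(data_domain_key):
    # "data_domain_one" -> "DD_data_domain_one"
    return "DD_" + data_domain_key

print(superset_dashboard_role("Hello World"))        # D_hello_world
print(airflow_data_domain_role("data_domain_one"))   # DD_data_domain_one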
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/manuals/user-manual/index.html b/manuals/user-manual/index.html new file mode 100644 index 00000000..76758cb3 --- /dev/null +++ b/manuals/user-manual/index.html @@ -0,0 +1,1240 @@ + + + + + + + + + + + + + + + + + + + + + + User Manual - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

User Manual

+

Goal

+

This user manual should enable you to use the HelloDATA platform; it illustrates the features of the product and how to use them.

+

→ You can find more about the platform and its architecture under Architecture & Concepts.

+ +

Portal

+

The entry page of HelloDATA is the Web Portal.

+
    +
  1. Navigation to jump to the different capabilities of HelloDATA
  2. Extended status information about
      1. data pipelines, containers, performance and security
      2. documentation and subscriptions
  3. User and profile information of the logged-in user
  4. Overview of your dashboards
+

+

Business & Data Domain

+

As explained in Domain View, a key feature is to create business domains with n data domains. If you have access to more than one data domain, you can switch between them via the drop-down at the top.

+

+

Dashboards

+

The most important navigation item is the dashboards link. If you hover over it, you'll see three options to choose from.

+

You can either click the dashboard list in the hover menu (2) to see the list of dashboards with thumbnails, or directly choose your dashboard (3).

+

+

Data-Lineage

+

To see the data lineage (dependencies of your data tables), you have the second menu option. Again, you can choose the list or click directly on "data lineage" (2).

+

Button 2 will bring you to the project site, where you choose your project and load the lineage. +

+

Once loaded, you see all sources (1) and dbt Projects (2). On the detail page, you can see all the beautiful and helpful documentation such as:

+
    +
  • table name (3)
  • +
  • columns and data types (4)
  • +
  • which table and model this selected object depends on (5)
  • +
  • the SQL code (6)
      +
    • as a template or compiled
    • +
    +
  • +
  • and dependency graph (7)
      +
    • which you can expand to full view (8) after clicking (7)
    • +
    • interactive data lineage view (9)
    • +
    +
  • +
+

+ +

+

Data Marts Viewer

+

This view lets you access the universal data mart (UDM) layer:

+

+

These are cleaned and modeled data mart tables. Data marts are the tables that have been joined and cleaned from the source tables. This is effectively the latest layer of HelloDATA BE, which the Dashboards are accessing. Dashboards should not access any layer before (landing zone, data storage, or data processing).

+

We use CloudBeaver for this, same as the DWH Viewer later. +
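If you prefer to query the UDM layer programmatically instead of through the CloudBeaver UI, a minimal sketch with psycopg2 could look like the following. The connection parameters are placeholders, and the udm.fact_breeds_long table is borrowed from the showcase; both will differ in your Data Domain:

import psycopg2

# Placeholder connection details; ask your Data Domain admin for the real ones.
conn = psycopg2.connect(host="localhost", dbname="data_domain", user="viewer", password="secret")
with conn, conn.cursor() as cur:
    # Query a data mart table in the UDM schema (example table from the showcase)
    cur.execute("SELECT * FROM udm.fact_breeds_long LIMIT 10")
    for row in cur.fetchall():
        print(row)
conn.close()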

+

Data Engineering

+

DWH Viewer

+

This is essentially a database access layer where you see all your tables, and you can write SQL queries based on your access roles with a provided tool (CloudBeaver).

+
Create new SQL Query
+


+
Choose Connection and stored queries
+

You can choose pre-defined connections and query your data warehouse. You can also store queries that other users can see and use as well. Run your queries with (1).

+

+
Settings and Powerful features
+

You can configure many settings, such as the user status, and much more.

+

+Please find all settings and features in the CloudBeaver Documentation.

+

Orchestration

+

The orchestrator is your task manager. You tell Airflow, our orchestrator, in which order the tasks will run. This is usually done ahead of time, and in the portal, you can see the latest runs and their status (successful, failed, etc.).

+
    +
  • You can navigate to DAGs (2) and see all the details (3) with the DAG name, owner, runs, schedules, next run, and recent tasks.
  • +
  • You can also dive deeper into Datasets, Security, Admin or similar (4)
  • +
  • Airflow offers lots of different visualization modes, e.g. the Graph view (6), that allows you to see each step of this task.
      +
    • As you can see, you can choose calendar, task duration, Gantt, etc.
    • +
    +
  • +
+

+

+

Administration

+

Here you manage the portal configuration, such as users, roles, announcements, FAQs, and documentation management.

+

+

Benutzerverwaltung / User Management

+
Adding user
+

First type the user's email address and hit enter. Then select the entry from the drop-down by clicking on it. +

+

Now type the Name and hit Berechtigungen setzen to add the user: +

+

You should see something like this:

+

+
Changing Permissions
+
    +
  1. Search for the user whose permissions you want to grant or change
  2. Scroll to the right
  3. Click the green edit icon
+

+

Now choose the role you want to give:

+

+

And/or give access to specific data domains:

+

+

See more in role-authorization-concept.

+

Portal Rollenverwaltung / Portal Role Management

+

In this portal role management, you can see all the roles that exist.

+
+

Warning

+

Creating new roles is not supported, even though the "Rolle erstellen" button exists. All roles are predefined and hard-coded.

+
+

+
Creating a new role
+

See how to create a new role below: +

+

Ankündigung / Announcement

+

You can simply create an announcement that goes to all users by Ankündigung erstellen: +

+

Then you fill in your message. Save it.

+

+You'll see a success message if everything went well: +

+

And this is how it looks to the users. It will appear until the user clicks the cross to close it. +

+

FAQ

+

The FAQ works the same as the announcements above. FAQs are shown on the starting dashboard, but you can scope them to a specific data domain:

+

+

And this is how it looks: +

+

Dokumentationsmanagement / Documentation Management

+

Lastly, you can document the system with documentation management. Here you have one document in which you can document everything in detail, and everyone can write to it. It will appear on the dashboard as well:

+

+

Monitoring

+

We provide two different ways of monitoring: 

+
    +
  • Status: 
  • +
  • Workspaces
  • +
+

+

Status

+

It shows you detailed information on the instances of HelloDATA: how the Portal is doing, whether monitoring is running, etc. +

+

Data Domains

+

When monitoring your data domains, you see each system and the link to the native application. You can easily and quickly observe permissions, roles, and users for the different subsystems (1). Click the one you want, choose different levels (2) for each, and see its permissions (3).

+

+

+

By clicking on the blue underlined DBT Docs, you will be navigated to the native dbt docs. The same is true if you click on an Airflow or Superset instance.

+

DevTools

+

DevTools are additional tools HelloDATA provides out of the box to e.g. send Mail (Mailbox) or browse files (FileBrowser).

+

+

Mailbox

+

You can check in Mailbox (we use MailHog) which emails have been sent or which accounts were updated.

+

+

FileBrowser

+

Here you can browse all the documentation or code from the git repos in a file browser. We use FileBrowser for this. Please use it with care, as some of the folders are system-relevant.

+
+

Log in

+

Make sure you have the login credentials to log in. Your administrator should be able to provide these to you.

+
+

+

More: Know-How

+ +

Find further important references, know-how, and best practices on HelloDATA Know-How.

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/more/changelog/index.html b/more/changelog/index.html new file mode 100644 index 00000000..03169f26 --- /dev/null +++ b/more/changelog/index.html @@ -0,0 +1,857 @@ + + + + + + + + + + + + + + + + + + + + Changelog - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Changelog

+

2023-11-22 Concepts

+
    +
  • Added workspaces on the concepts page.
  • +
  • Added showcase main category to explain the demo that comes with HD-BE
  • +
+

2023-11-20 Changed corporate design

+
    +
  • Changed primary color to KAIO style guide: color red (#EE0F0F), and font: Roboto (was already default font)
  • +
+

2023-11-06 Switched architecture over

+ +

2023-09-29 Initial version

+ + + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/more/faq/index.html b/more/faq/index.html new file mode 100644 index 00000000..f9b07643 --- /dev/null +++ b/more/faq/index.html @@ -0,0 +1,791 @@ + + + + + + + + + + + + + + + + + + + + + + FAQ - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

FAQ

+ + + + + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/more/glossary/index.html b/more/glossary/index.html new file mode 100644 index 00000000..b7142e60 --- /dev/null +++ b/more/glossary/index.html @@ -0,0 +1,792 @@ + + + + + + + + + + + + + + + + + + + + + + Glossary - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+ +
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/search/search_index.json b/search/search_index.json new file mode 100644 index 00000000..1cbf85e4 --- /dev/null +++ b/search/search_index.json @@ -0,0 +1 @@ +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to HelloDATA BE \ud83d\udc4b\ud83c\udffb","text":"

This is the open documentation about HelloDATA BE. We hope you enjoy it.

Contribute

In case something is missing or you'd like to add something, below is how you can contribute:

  • Star our\u00a0GitHub \u2b50
  • Want to discuss, contribute, or need help, create a GitHub Issue, create a Pull Request or open a Discussion.
"},{"location":"#what-is-hellodata-be","title":"What is HelloDATA BE?","text":"

HelloDATA BE is an\u00a0enterprise data platform\u00a0built on top of open-source tools based on the modern data stack. We use state-of-the-art tools such as dbt for data modeling with SQL and Airflow to run and orchestrate tasks and use Superset to visualize the BI dashboards. The underlying database is Postgres.

Each of these components is carefully chosen and additional tools can be added in a later stage.

"},{"location":"#why-do-you-need-an-open-enterprise-data-platform-hellodata-be","title":"Why do you need an Open Enterprise Data Platform (HelloDATA BE)?","text":"

These days, more data is generated each year than in the entire history before. Every fridge, light bulb, or other device starts to produce data, so there is a growing need to make sense of more and more data. Not all of it is necessary or valid, but because data volumes keep growing, we must be able to collect and store it easily. There is a great need to analyze this data; the results can be reused for secondary purposes and thus create added value.

That is what this open data enterprise platform is all about. In the old days, you used to have one single solution provided; think of SAP or Oracle. These days that has completely changed. New SaaS products are created daily, specializing in a tiny niche. There are also many open-source tools that you can use for free and get going with in minutes.

So why would you need a HelloDATA BE? It's simple. You want the best of both worlds. You want\u00a0open source\u00a0so you are not locked in and can use the strongest, collaboratively created product in the open. People worldwide can fix a security bug in minutes, or you can even go into the source code (as it's available for everyone) and fix it yourself\u2014compared to a large vendor, where you rely solely on their update cycle.

But let's be honest for a second: if we use the latest shiny thing from open source, there are a lot of bugs, missing features, and tools that don't integrate with each other. That's precisely where HelloDATA BE comes into play. We are building the\u00a0missing platform\u00a0that combines best-of-breed open-source technologies into a\u00a0single portal, making it enterprise-ready by adding features you typically won't get in an open-source product, or by fixing bugs that we encountered during our extensive tests.

Sounds too good to be true? Give it a try. Do you want to know the best thing? It's open-source as well. Check out our\u00a0GitHub HelloDATA BE.

"},{"location":"#quick-start-for-developers","title":"Quick Start for Developers","text":"

Want to run HelloDATA BE and test it locally? Run the following command in the docker-compose directory to deploy all components:

cd hello-data-deployment/docker-compose\ndocker-compose up -d\n

Note: Please refer to our docker-compose README for more information; there are some required presets you need to configure.

"},{"location":"architecture/architecture/","title":"Architecture","text":""},{"location":"architecture/architecture/#components","title":"Components","text":"

This chapter will explain the core architectural component, its context views, and how HelloDATA works under the hood.

"},{"location":"architecture/architecture/#domain-view","title":"Domain View","text":"

We separate between two main domains, \"Business and Data Domain.

"},{"location":"architecture/architecture/#business-vs-data-domain","title":"Business\u00a0vs. Data Domain","text":"
  • \"Business\" domain: This domain holds one customer or company with general services (portal, orchestration, docs, monitoring, logging). Every business domain represents a business tenant. HelloDATA is running on a Kubernetes cluster. A business domain is treated as a dedicated namespace within that cluster; thus, multi-tenancy is set up by various namespaces in the Kubernetes cluster.
  • Data Domain: This is where actual data is stored (db-schema). We combine it with a superset instance (related dashboards) and the documentation about these data. Currently, a business domain relates 1 - n to its Data Domains. Within an existing Business Domain a Data Domain can be spawned using Kubernetes deployment features and scripts to set up the database objects.

Resources encapsulated inside a\u00a0Data Domain\u00a0can be:

  • Schema of the Data Domain
  • Data mart tables of the Data Domain
  • The entire DWH environment of the Data Domain
  • Data lineage documents of the DBT projects of the Data Domain. Dashboards, charts, and datasets within the superset instance of a Data Domain.
  • Airflow DAGs of the Data Domain.

On top, you can add\u00a0subsystems. This can be seen as extensions that make HelloDATA pluggable with additional tools. We now support\u00a0CloudBeaver\u00a0for viewing your Postgres databases, RtD, and Gitea. You can imagine adding almost infinite tools with capabilities you'd like to have (data catalog, semantic layer, specific BI tool, Jupyter Notebooks, etc.).

Read more about Business and Data Domain access rights in\u00a0Roles / Authorization Concept.

"},{"location":"architecture/architecture/#data-domain","title":"Data Domain","text":"

Zooming into several Data Domains that can exist within a Business domain, we see an example of Data Domain A-C. Each Data Domain has a persistent storage, in our case, Postgres (see more details in the\u00a0Infrastructure Storage\u00a0chapter below).

Each data domain might import different source systems; some might even be used in several data domains, as illustrated. Each Data Domain is meant to have its data model with straightforward to, in the best case, layered data models as shown on the image with:

"},{"location":"architecture/architecture/#landingstaging-area","title":"Landing/Staging Area","text":"

Data from various source systems is first loaded into the Landing/Staging Area.

  • In this first area, the data is stored as it is delivered; therefore, the stage tables' structure corresponds to the interface to the source system.
  • No relationships exist between the individual tables.
  • Each table contains only the data from the latest delivery, which will be deleted before the next delivery.
  • For example, in a grocery store, the Staging Area corresponds to the loading dock where suppliers (source systems) deliver their goods (data). Only the latest deliveries are stored there before being transferred to the next area.
"},{"location":"architecture/architecture/#data-storage-cleansing-area","title":"Data Storage (Cleansing Area)","text":"

The delivered data must be cleaned before it is loaded into the Data Processing (Core) layer. Most of these cleaning steps are performed in this area.

  • Faulty data must be filtered, corrected, or complemented with singleton (default) values.
  • Data from different source systems must be transformed and integrated into a unified form.
  • This layer also contains only the data from the latest delivery.
  • For example, In a grocery store, the Cleansing Area can be compared to the area where the goods are commissioned for sale. The goods are unpacked, vegetables and salad are washed, the meat is portioned, possibly combined with multiple products, and everything is labeled with price tags. The quality control of the delivered goods also belongs in this area.
"},{"location":"architecture/architecture/#data-processing-core","title":"Data Processing\u00a0(Core)","text":"

The data from the different source systems are brought together in a central area, the Data Processing (Core), through the Landing and Data Storage and stored there for extended periods, often several years.\u00a0

  • A primary task of this layer is to integrate the data from different sources and store it in a thematically structured way rather than separated by origin.
  • Often, thematic sub-areas in the Core are called \"Subject Areas.\"
  • The data is stored in the Core so that historical data can be determined at any later point in time.\u00a0
  • The Core should be the only data source for the Data Marts.
  • Direct access to the Core by users should be avoided as much as possible.
"},{"location":"architecture/architecture/#data-mart","title":"Data Mart","text":"

Subsets of the data from the Core are stored in a form suitable for user queries.\u00a0

  • Each Data Mart should only contain the data relevant to each application or a unique view of the data. This means several Data Marts are typically defined for different user groups and BI applications.
  • This reduces the complexity of the queries, increasing the acceptance of the DWH system among users.
  • For example, The Data Marts are the grocery store's market stalls or sales points. Each market stand offers a specific selection of goods, such as vegetables, meat, or cheese. The goods are presented so that they are accepted, i.e., purchased, by the respective customer group.

Between the layers, we have lots of\u00a0Metadata

Different types of metadata are needed for the smooth operation of the Data Warehouse. Business metadata contains business descriptions of all attributes, drill paths, and aggregation rules for the front-end applications and code designations. Technical metadata describes, for example, data structures, mapping rules, and parameters for ETL control. Operational metadata contains all log tables, error messages, logging of ETL processes, and much more. The metadata forms the infrastructure of a DWH system and is described as \"data about data\".

"},{"location":"architecture/architecture/#example-multiple-superset-dashboards-within-a-data-domain","title":"Example: Multiple Superset Dashboards within a Data Domain","text":"

Within a Data Domain, several users build up different dashboards. Think of a dashboard as a specific use case e.g., Covid, Sales, etc., that solves a particular purpose. Each of these dashboards consists of individual charts and data sources in superset. Ultimately, what you see in the HelloDATA portal are the dashboards that combine all of the sub-components of what Superset provides.

"},{"location":"architecture/architecture/#portalui-view","title":"Portal/UI View","text":"

As described in the intro. The portal is the heart of the HelloDATA application, with access to all critical applications.

Entry page of helloDATA: When you enter the portal for the first time, you land on the dashboard where you have

  1. Navigation to jump to the different capabilities of helloDATA
  2. Extended status information about
    1. data pipelines, containers, performance, and security
    2. documentation and subscriptions
  3. User and profile information of logged-in users.\u00a0
  4. Choosing the data domain you want to work within your business domain
  5. Overview of your dashboards
  6. dbt lineage docs
  7. Data marts of your Postgres database
  8. Answers to frequently asked questions

More technical details are in the \"Module deployment view\" chapter below.

"},{"location":"architecture/architecture/#module-view-and-communication","title":"Module View and Communication","text":""},{"location":"architecture/architecture/#modules","title":"Modules","text":"

Going one level deeper, we see that we use different modules to make the portal and helloDATA work.\u00a0

We have the following modules:

  • Keycloak: Open-source identity and access management. This handles everything related to user permissions and roles in a central place that we integrate into helloDATA.
  • Redis: open-source, in-memory data store that we use for persisting technical values for the portal to work.\u00a0
  • NATS: Open-source connective technology for the cloud. It handles communication with the different tools we use.
  • Data Stack: We use the open-source data stack with dbt, Airflow, and Superset. See more information in the intro chapters above. Subsystems can be added on demand as extensible plugins.

"},{"location":"architecture/architecture/#what-is-keycloak-and-how-does-it-work","title":"What is Keycloak and how does it work?","text":"

At the center are two components, NATS and Keycloak.\u00a0Keycloak, together with the HelloDATA portal, handles the authentication, authorization, and permission management of HelloDATA components. Keycloak is a powerful open-source identity and access management system. Its primary benefits include:

  1. Ease of Use: Keycloak is easy to set up and use and can be deployed on-premise or in the cloud.
  2. Integration: It integrates seamlessly with existing applications and systems, providing a secure way of authenticating users and allowing them to access various resources and services with a single set of credentials.
  3. Single Sign-On: Keycloak takes care of user authentication, freeing applications from having to handle login forms, user authentication, and user storage. Users can log in once and access all applications linked to Keycloak without needing to re-authenticate. This extends to logging out, with Keycloak offering single sign-out across all linked applications.
  4. Identity Brokering and Social Login: Keycloak can authenticate users with existing OpenID Connect or SAML 2.0 Identity Providers and easily enable social network logins without requiring changes to your application's code.
  5. User Federation: Keycloak has the capacity to connect to existing LDAP or Active Directory servers and can support custom providers for users stored in other databases.
  6. Admin Console: Through the admin console, administrators can manage all aspects of the Keycloak server, including features, identity brokering, user federation, applications, services, authorization policies, user permissions, and sessions.
  7. Account Management Console: Users can manage their own accounts, update profiles, change passwords, setup two-factor authentication, manage sessions, view account history, and link accounts with additional identity providers if social login or identity brokering has been enabled.
  8. Standard Protocols: Keycloak is built on standard protocols, offering support for OpenID Connect, OAuth 2.0, and SAML.
  9. Fine-Grained Authorization Services: Beyond role-based authorization, Keycloak provides fine-grained authorization services, enabling the management of permissions for all services from the Keycloak admin console. This allows for the creation of specific policies to meet unique needs. Within HelloDATA, the HelloDATA portal manages authorization, yet if required by upcoming subsystems, this KeyCloak feature can be utilized in tandem.
  10. Two-Factor Authentication (2FA): This optional feature of KeyCloak enhances security by requiring users to provide two forms of authentication before gaining access, adding an extra layer of protection to the authentication process.
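As a small, hedged illustration of what the OAuth 2 / OpenID Connect interaction can look like, the sketch below requests a token from Keycloak's standard token endpoint using the password grant. The base URL, realm, client id, and user are made-up values and not the actual HelloDATA configuration:

import requests

KEYCLOAK_URL = "https://keycloak.example.com"   # placeholder
REALM = "hellodata"                             # assumed realm name

resp = requests.post(
    KEYCLOAK_URL + "/realms/" + REALM + "/protocol/openid-connect/token",
    data={
        "grant_type": "password",
        "client_id": "hellodata-portal",        # assumed client id
        "username": "jane.doe",
        "password": "secret",
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]      # bearer token for subsequent requests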
"},{"location":"architecture/architecture/#what-is-nats-and-how-does-it-work","title":"What is NATS and how does it work?","text":"

On the other hand, NATS is central for handling communication between the different modules. Its power comes from integrating modern distributed systems. It is the glue between microservices, making and processing statements, or stream processing.

NATS focuses on hyper-connected moving parts and additional data each module generates. It supports location independence and mobility, whether the backend process is streaming or otherwise, and securely handles all of it.

NATS lets mobile frontends or microservices connect flexibly. There is no need for static 1:1 communication with a hostname, IP, or port; instead, NATS gives you m:n connectivity based on subjects. You can still use 1:1, but on top, you have things like load balancers, logs, system and network security models, proxies, and, most essential for us,\u00a0sidecars. We use sidecars heavily in connection with NATS.

NATS can be\u00a0deployed\u00a0nearly anywhere: on bare metal, in a VM, as a container, inside K8S, on a device, or in whichever environment you choose. And all fully secure.

"},{"location":"architecture/architecture/#subsystem-communication","title":"Subsystem communication","text":"

Here is an example of subsystem communication. NATS, obviously at the center, handles these communications between the HelloDATA platform and the subsystems with its workers, as seen in the image below.

The HelloDATA portal has workers.\u00a0These workers are deployed as extra containers with sidecars, called \"Sidecar Containers\". Each module that needs to communicate requires a sidecar with these workers deployed to talk to NATS. Therefore, each subsystem has its own workers to communicate with NATS as well.

"},{"location":"architecture/architecture/#messaging-component-workers","title":"Messaging component workers","text":"

Everything starts with a\u00a0web browser\u00a0session. The HelloDATA user accesses the\u00a0HelloDATA Portal\u00a0through HTTP. Before you see any of your modules or components, you must authenticate yourself against Keycloak. Once logged in, you have a Single Sign-on token that gives access to different business domains or data domains depending on your role.

The HelloDATA portal sends an event to the EventWorkers via JDBC to the Portal database.\u00a0The\u00a0portal database\u00a0persists settings from the portal and necessary configurations.

The\u00a0EventWorkers, on the other side communicate with the different\u00a0HelloDATA Modules\u00a0discussed above (Keycloak, NATS, Data Stack with dbt, Airflow, and Superset) where needed. Each module is part of the domain view, which persists their data within their datastore.

"},{"location":"architecture/architecture/#flow-chart","title":"Flow Chart","text":"

In this flow chart, you see again what we discussed above in a different way. Here, we\u00a0assign a new user role. Again, everything starts with the HelloDATA Portal and an existing session from Keycloak. With that, the portal worker will publish a JSON message via UserRoleEvent to NATS. As\u00a0the\u00a0communication hub for HelloDATA, NATS knows what to do with each message and sends it to the respective subsystem worker.

Subsystem workers will execute that instruction and create and populate roles on, e.g., Superset and Airflow, and once done, inform the spawned subsystem worker that it's done. The worker will push it back to NATS, telling the portal worker, and at the end, will populate a message on the HelloDATA portal.
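To make this flow a bit more concrete, here is a hedged sketch of how a portal worker could publish such a UserRoleEvent as JSON over NATS with the nats-py client. The subject name and payload fields are assumptions for illustration, not the actual HelloDATA message contract:

import asyncio
import json
import nats

async def publish_user_role_event():
    nc = await nats.connect("nats://localhost:4222")
    event = {"user": "jane.doe", "role": "DATA_DOMAIN_VIEWER", "dataDomain": "data_domain_one"}
    # The subject is illustrative; subsystem workers would subscribe to it.
    await nc.publish("hellodata.user-role-event", json.dumps(event).encode())
    await nc.drain()

asyncio.run(publish_user_role_event())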

"},{"location":"architecture/architecture/#building-block-view","title":"Building Block View","text":""},{"location":"architecture/data-stack/","title":"Data Stack","text":"

We'll explain which data stack is behind HelloDATA BE.

"},{"location":"architecture/data-stack/#control-pane-portal","title":"Control Pane - Portal","text":"

The\u00a0differentiator of HelloDATA\u00a0lies in the Portal. It combines all the loosely coupled open-source tools into a single control pane.

The portal lets you see:

  • Data models with a dbt lineage: You see the sources of a given table or even column.
  • You can check out the latest runs; this tells you when the dashboards have been updated.
  • Create and view all company-wide reports and dashboards.
  • View your data tables as Data Marts: Accessing physical tables, columns, and schemas.
  • Central Monitoring of all processes running in the portal.
  • Manage and control all your user access and role permission and authorization.

You can find more about the navigation and the features in the\u00a0User Manual.

"},{"location":"architecture/data-stack/#data-modeling-with-sql-dbt","title":"Data Modeling with SQL - dbt","text":"

dbt\u00a0is a small database toolset that has gained immense popularity and is the de-facto standard for working with SQL. Why, you might ask? SQL is the most used language besides Python for data engineers, as it is\u00a0declarative and easy to learn the basics of, and many business analysts or people working with Excel or similar tools might know a little already.

The declarative approach is handy as you only define the\u00a0what, meaning you determine what columns you want in the SELECT and which table to query in the FROM statement. You can do more advanced things with WHERE, GROUP BY, etc., but you do not need to care about the\u00a0how. You do not need to watch which database, which partition it is stored, what segment, or what storage. You do not need to know if an index makes sense to use. All of it is handled by the\u00a0query optimizer\u00a0of Postgres (or any database supporting SQL).

But let's face it: SQL also has its downside. If you have worked extensively with SQL, you know the spaghetti code that usually happens when using it. The issue is repeatability\u2014there is no\u00a0variable\u00a0we can set and reuse in SQL. If you are familiar with them, you can achieve a better structure with\u00a0CTEs, which allow you to define specific queries as a block to reuse later. But this only works within one single query and is mainly handy if the query is already long.

But what if you'd like to define your facts and dimensions as a separate query and reuse that in another query? You'd need to decouple the query from storage: we would persist it to disk and use that table on disk in the FROM statement of our following query. But what if we change something in the query, or even change its name? We won't notice it in the dependent queries, and we will need to find out which queries depend on each other. There is no\u00a0lineage\u00a0or dependency graph.

It takes a lot of work to stay organized with SQL. There is also not a lot of support from the database itself, as SQL is declarative. You need to figure out how to store your queries in git and how to run them.

That's where dbt comes into play. dbt lets you\u00a0create these dependencies within SQL. You can declaratively build on each query, and you'll get errors if one changes but not the dependent one. You get a lineage graph (see an\u00a0example), unit tests, and more. It's like you have an assistant that helps you do your job. It's added software engineering practice that we stitch on top of SQL engineering.

The danger to be aware of, since it becomes so easy to build models, is ending up with thousands of tables. Even though dbt's pre-compilation catches many errors,\u00a0good data modeling techniques are essential to succeed.

Below, you see dbt docs, lineage, and templates: 1. Project Navigation 2. Detail Navigation 3. SQL Template 4. SQL Compiled (the actual SQL that gets executed) 5. Full data lineage with the sources and transformations for the current object

Or zoom dbt lineage (when clicked):

"},{"location":"architecture/data-stack/#task-orchestration-airflow","title":"Task Orchestration - Airflow","text":"

Airflow\u00a0is the natural next step. If you have many SQLs representing your business metrics, you want them to run on a daily or hourly schedule, or triggered by events. That's where Airflow comes into play. Airflow is, in its simplest terms, a task or workflow scheduler whose tasks, or\u00a0DAGs\u00a0(as they are called), can be written programmatically in Python. If you know\u00a0cron\u00a0jobs, these are the most basic task scheduler in Linux (think * * * * *), with little to no customization beyond simple time scheduling.

Airflow is different. Writing the DAGs in Python allows you to do whatever your business logic requires before or after a particular task is started. In the past, ETL tools like Microsoft SQL Server Integration Services (SSIS) and others were widely used. They were where your data transformation, cleaning and normalization took place. In more modern architectures, these tools aren\u2019t enough anymore. Moreover, code and data transformation logic are much more valuable to other data-savvy people (data analysts, data scientists, business analysts) in the company than when they are locked away in a proprietary format.

Airflow, or an orchestrator in general, ensures the correct execution of dependent tasks. It is very flexible and extensible with operators from the community or the built-in capabilities of the framework itself.
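For comparison with a plain cron entry, a minimal Airflow DAG written in Python might look like the sketch below; the DAG id, schedule, and command are placeholders and not part of HelloDATA itself:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Runs every day at 06:00, similar to a "0 6 * * *" cron entry,
# but with retries, logging, dependencies, and a UI on top.
with DAG(
    dag_id="example_daily_run",
    start_date=datetime(2023, 1, 1),
    schedule="0 6 * * *",
    catchup=False,
) as dag:
    run_models = BashOperator(task_id="dbt_run", bash_command="dbt run")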

"},{"location":"architecture/data-stack/#default-view","title":"Default View","text":"

Airflow DAGs - Entry page which shows you the status of all your DAGs - what's the schedule of each job - are they active, how often have they failed, etc.

Next, you can click on each of the DAGs and get into a detailed view:

"},{"location":"architecture/data-stack/#airflow-operations-overview-for-one-dag","title":"Airflow operations overview for one DAG","text":"
  1. General visualization possibilities which you prefer to see (here Grid view)
  2. filter your DAG runs
  3. see details on each run status in one view\u00a0
  4. Check details in the table view
  5. Gantt view, as another example, to see how long each sub-task of the DAG took
"},{"location":"architecture/data-stack/#graph-view-of-dag","title":"Graph view of DAG","text":"

It shows you the dependencies of your business's various tasks, ensuring that the order is handled correctly.

"},{"location":"architecture/data-stack/#dashboards-superset","title":"Dashboards - Superset","text":"

Superset\u00a0is the entry point to your data. It's a popular open-source business intelligence dashboard tool that visualizes your data according to your needs.\u00a0It's able to handle all the latest chart types. You can combine them into dashboards filtered and drilled down as expected from a BI tool. The access to dashboards is restricted to authenticated users only. A user can be given view or edit rights to individual dashboards using roles and permissions. Public access to dashboards is not supported.

"},{"location":"architecture/data-stack/#example-dashboard","title":"Example dashboard","text":""},{"location":"architecture/data-stack/#supported-charts","title":"Supported Charts","text":"

(see live in action)

"},{"location":"architecture/data-stack/#storage-layer-postgres","title":"Storage Layer - Postgres","text":"

Let's start with the storage layer. We use Postgres, the currently\u00a0most used and loved database. Postgres is versatile and simple to use. It's a\u00a0relational database\u00a0that can be customized and scaled extensively.

"},{"location":"architecture/infrastructure/","title":"Infrastructure","text":"

Infrastructure is the part where we go into depth about how to run HelloDATA and its components on\u00a0Kubernetes.

"},{"location":"architecture/infrastructure/#kubernetes","title":"Kubernetes","text":"

Kubernetes and its platform allow you to run and orchestrate container workloads. Kubernetes has become\u00a0popular\u00a0and is the\u00a0de-facto standard\u00a0for your cloud-native apps to (auto-)\u00a0scale-out\u00a0and deploy the various open-source tools fast, on any cloud, and locally. This is called cloud-agnostic, as you are not locked into any cloud vendor (Amazon, Microsoft, Google, etc.).

Kubernetes is\u00a0infrastructure as code, specifically as YAML, allowing you to version and test your deployment quickly. All the resources in Kubernetes, including Pods, Configurations, Deployments, Volumes, etc., can be expressed in a YAML file using Kubernetes tools like HELM. Developers quickly write applications that run across multiple operating environments. Costs can be reduced by scaling down and using any programming language running with a simple Dockerfile. Its management makes it accessible through its modularity and abstraction; also, with the use of Containers, you can monitor all your applications in one place.

Kubernetes\u00a0Namespaces\u00a0provide a mechanism for isolating groups of resources within a single cluster. Names of resources need to be unique within a namespace but not across namespaces. Namespace-based scoping is applicable only for namespaced\u00a0objects (e.g. Deployments, Services, etc.)\u00a0and not for cluster-wide objects\u00a0(e.g., StorageClass, Nodes, PersistentVolumes, etc.).

  • Namespaces provide a mechanism for isolating groups of resources within a single cluster (separation of concerns). Namespaces also let you easily ramp up several HelloDATA instances on demand.\u00a0
    • Names of resources need to be unique within a namespace but not across namespaces.
  • We get central monitoring and logging solutions with\u00a0Grafana,\u00a0Prometheus, and the\u00a0ELK stack (Elasticsearch, Logstash, and Kibana). As well as the Keycloak single sign-on.
  • Everything runs in a single Kubernetes Cluster but can also be deployed on-prem on any Kubernetes Cluster.
  • Persistent data will run within the \"Data Domain\" and must run on a\u00a0Persistent Volume\u00a0on Kubernetes or a central Postgres service (e.g., on Azure or internal).
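As a small, hedged illustration of the namespace-per-tenant idea in the list above, the official Kubernetes Python client can list the namespaces of a cluster; each HelloDATA Business Domain would live in its own namespace:

from kubernetes import client, config

config.load_kube_config()   # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

for ns in v1.list_namespace().items:
    print(ns.metadata.name)   # one namespace per Business Domain (plus system namespaces)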

"},{"location":"architecture/infrastructure/#module-deployment-view","title":"Module deployment view","text":"

Here, we have a look at the module view with an inside view of accessing the\u00a0HelloDATA Portal.

The Portal API is served with\u00a0SpringBoot,\u00a0Wildfly,\u00a0and\u00a0Angular.

"},{"location":"architecture/infrastructure/#storage-data-domain","title":"Storage (Data Domain)","text":"

Following up on how storage is persisted for the\u00a0Domain View\u00a0introduced in the chapters above.\u00a0

"},{"location":"architecture/infrastructure/#data-domain-storage-view","title":"Data-Domain Storage View","text":"

Storage is an important topic, as this is where the business value and the data itself are stored.

From a Kubernetes and deployment view, everything is encapsulated inside a Namespace. As explained in the above \"Domain View\", we have different layers from one Business domain (here Business Domain) to n (multiple) Data Domains.\u00a0

Each domain holds its data on\u00a0persistent storage, whether Postgres for relational databases, blob storage for files or file storage on persistent volumes within Kubernetes.

GitSync is a tool we added to allow\u00a0GitOps-type deployment. As a user, you can push changes to your git repo, and GitSync will automatically deploy that into your cluster on Kubernetes.

"},{"location":"architecture/infrastructure/#business-domain-storage-view","title":"Business-Domain Storage View","text":"

Here is another view showing that persistent storage within Kubernetes (K8s) can hold data across Data Domains. If these\u00a0persistent volumes\u00a0are used to store Data Domain information, a backup and restore plan for this data also needs to be implemented.

Alternatively, blob storage from any\u00a0cloud vendor or a service\u00a0such as a managed Postgres service can be used, as these are typically managed and come with features such as backup and restore.

"},{"location":"architecture/infrastructure/#k8s-jobs","title":"K8s Jobs","text":"

HelloDATA uses Kubernetes jobs to perform certain activities

"},{"location":"architecture/infrastructure/#cleanup-jobs","title":"Cleanup Jobs","text":"

Contents:

  • Cleaning up user activity logs
  • Cleaning up logfiles

"},{"location":"architecture/infrastructure/#deployment-platforms","title":"Deployment Platforms","text":"

HelloDATA can be operated as different platforms, e.g. development, test, and/or production platforms. The deployment is based on common CI/CD principles. It uses GIT and Flux internally to deploy its resources onto the specific Kubernetes clusters. In case of resource shortages, the underlying platform can be extended with additional resources upon request. Horizontal scaling of the infrastructure can be done within the given resource boundaries (e.g. multiple pods for Superset).

"},{"location":"architecture/infrastructure/#platform-authentication-authorization","title":"Platform Authentication Authorization","text":"

See at\u00a0Roles and authorization concept.

"},{"location":"concepts/showcase/","title":"Showcase: Animal Statistics (Switzerland)","text":""},{"location":"concepts/showcase/#what-is-the-showcase","title":"What is the Showcase?","text":"

It's the demo case of HD-BE: it imports animal data from an external source and loads it with Airflow, models it with dbt, and visualizes it in Superset.

It will hopefully show you how the platform works, and it comes pre-installed with the docker-compose installation.

"},{"location":"concepts/showcase/#how-can-i-get-started-and-explore-it","title":"How can I get started and explore it?","text":"

Click on the showcase data domain and you can explore pre-defined dashboards built on the Airflow job and dbt models described below.

"},{"location":"concepts/showcase/#how-does-it-look","title":"How does it look?","text":"

Below, the technical details of the showcase are described: how the Airflow pipeline collects the data from an open API and models it with dbt.

"},{"location":"concepts/showcase/#airflow-pipeline","title":"Airflow Pipeline","text":"
  • data_download The source files, which are in CSV format, are queried via the data_download task and stored in the file system.
  • create_tables Based on the CSV files, tables are created in the LZN database schema of the project.
  • insert_data After the tables have been created, in this step, the source data from the CSV file is copied into the corresponding tables in the LZN database schema.
  • dbt_run After the preceding steps have been executed and the data foundation for the DBT framework has been established, the data processing steps in the database can be initiated using DBT scripts. (described in the DBT section)
  • dbt_docs Upon completion of generating the tables in the database, a documentation of the tables and their dependencies is generated using DBT.
  • dbt_docs_serve For the visualization of the generated documentation, it is provided in the form of a website.
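As a hedged illustration, the following is a minimal sketch of how these tasks could be wired together as an Airflow DAG. The task names follow the list above, but the dag_id, schedule, operators, scripts, and callables are illustrative assumptions and not the actual showcase code.

from pendulum import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def download_csv_files():
    # Placeholder: fetch the tierstatistik CSV files and store them on the file system.
    ...


with DAG(
    dag_id="showcase_animal_statistics",  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    data_download = PythonOperator(task_id="data_download", python_callable=download_csv_files)
    create_tables = BashOperator(task_id="create_tables", bash_command="python create_tables.py")
    insert_data = BashOperator(task_id="insert_data", bash_command="python insert_data.py")
    dbt_run = BashOperator(task_id="dbt_run", bash_command="dbt run")
    dbt_docs = BashOperator(task_id="dbt_docs", bash_command="dbt docs generate")
    dbt_docs_serve = BashOperator(task_id="dbt_docs_serve", bash_command="dbt docs serve --port 8081")

    # Run the steps in the order described above.
    data_download >> create_tables >> insert_data >> dbt_run >> dbt_docs >> dbt_docs_serve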
"},{"location":"concepts/showcase/#dbt-data-modeling","title":"DBT: Data modeling","text":""},{"location":"concepts/showcase/#fact_breeds_long","title":"fact_breeds_long","text":"

The fact table fact_breeds_long describes key figures, which are used to derive the stock of registered, living animals, divided by breeds over time.

The following tables from the [lzn] database schema are selected for the calculation of the key figure:

  • cats_breeds
  • cattle_breeds
  • dogs_breeds
  • equids_breeds
  • goats_breeds
  • sheep_breeds

"},{"location":"concepts/showcase/#fact_cattle_beefiness_fattissue","title":"fact_cattle_beefiness_fattissue","text":"

The fact table fact_cattle_beefiness_fattissue describes key figures, which are used to derive the number of slaughtered cows by year and month. Classification is done according to CH-TAX (Trading Class Classification CHTAX System | VIEGUT AG).

The following tables from the [lzn] database schema are selected for the calculation of the key figure:

  • cattle_evolbeefiness
  • cattle_evolfattissue

"},{"location":"concepts/showcase/#fact_cattle_popvariations","title":"fact_cattle_popvariations","text":"

The fact table fact_cattle_popvariations describes key figures, which are used to derive the increase and decrease of the cattle population in the Animal Traffic Database (https://www.agate.ch/) over time (including reports from Liechtenstein). The key figures are grouped according to the following types of reports:

  • Birth
  • Slaughter
  • Death

The following table from the [lzn] database schema is selected for the calculation of the key figure:

  • cattle_popvariations

"},{"location":"concepts/showcase/#fact_cattle_pyr_wide-fact_cattle_pyr_long","title":"fact_cattle_pyr_wide\u00a0&\u00a0fact_cattle_pyr_long","text":"

The fact table fact_cattle_pyr_wide describes key figures, which are used to derive the distribution of registered living cattle by age class and gender.

The following table from the [lzn] database schema is selected for the calculation of the key figure:

  • cattle_pyr

The fact table fact_cattle_pyr_long pivots all key figures from fact_cattle_pyr_wide.

"},{"location":"concepts/showcase/#superset","title":"Superset","text":""},{"location":"concepts/showcase/#database-connection","title":"Database Connection","text":"

The data foundation of the Superset visualizations in the form of Datasets, Dashboards, and Charts is realized through a Database Connection.

In this case, a database connection is established to the PostgreSQL database in which the above-described DBT scripts were executed.

"},{"location":"concepts/showcase/#datasets","title":"Datasets","text":"

Datasets are used to prepare the data foundation in a suitable form, which can then be visualized in charts in an appropriate way.

Essentially, modeled fact tables from the UDM database schema are selected and linked with dimension tables.

This allows facts to be calculated or evaluated at different levels of business granularity.

"},{"location":"concepts/showcase/#interfaces","title":"Interfaces","text":""},{"location":"concepts/showcase/#tierstatistik","title":"Tierstatistik","text":"Source Description https://tierstatistik.identitas.ch/de/ Website of the API provider https://tierstatistik.identitas.ch/de/docs.html Documentation of the platform and description of the data basis and API tierstatistik.identitas.ch/tierstatistik.rdf API and data provided by the website"},{"location":"concepts/workspaces-troubleshoot/","title":"Troubleshooting","text":""},{"location":"concepts/workspaces-troubleshoot/#kubernetes","title":"Kubernetes","text":"

If you haven't turned on Kubernetes, you'll get an error similar to this: urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='kubernetes.docker.internal', port=6443): Max retries exceeded with url: /api/v1/namespaces/default/pods?labelSelector=dag_id%3Drun_boiler_example%2Ckubernetes_pod_operator%3DTrue%2Cpod-label-test%3Dlabel-name-test%2Crun_id%3Dmanual__2024-01-29T095915.2491840000-f3be8d87f%2Ctask_id%3Drun_duckdb_query%2Calready_checked%21%3DTrue%2C%21airflow-worker (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xffff82c2ab10>: Failed to establish a new connection: [Errno 111] Connection refused'))

Full log:

[2024-01-29, 09:48:49 UTC] {pod.py:1017} ERROR - 'NoneType' object has no attribute 'metadata'\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/urllib3/connection.py\", line 174, in _new_conn\n    conn = connection.create_connection(\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py\", line 95, in create_connection\n    raise err\n  File \"/usr/local/lib/python3.11/site-packages/urllib3/util/connection.py\", line 85, in create_connection\n    sock.connect(sa)\nConnectionRefusedError: [Errno 111] Connection refused\nDuring handling of the above exception, another exception occurred:\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 714, in urlopen\n    httplib_response = self._make_request(\n^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 403, in _make_request\n    self._validate_conn(conn)\nFile \"/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py\", line 1053, in _validate_conn\n    conn.connect()\nFile \"/usr/local/lib/python3.11/site-packages/urllib3/connection.py\", line 363, in connect\n    self.sock = conn = self._new_conn()\n^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/urllib3/connection.py\", line 186, in _new_conn\n    raise NewConnectionError(\nurllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0xffff82db3650>: Failed to establish a new connection: [Errno 111] Connection refused\nDuring handling of the above exception, another exception occurred:\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py\", line 583, in execute_sync\n    self.pod = self.get_or_create_pod(  # must set `self.pod` for `on_kill`\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py\", line 545, in get_or_create_pod\n    pod = self.find_pod(self.namespace or pod_request_obj.metadata.namespace, context=context)\n....\n\nairflow.exceptions.AirflowException: Pod airflow-running-dagster-workspace-jdkqug7h returned a failure.\nremote_pod: None\n[2024-01-29, 09:48:49 UTC] {taskinstance.py:1398} INFO - Marking task as UP_FOR_RETRY. dag_id=run_boiler_example, task_id=run_duckdb_query, execution_date=20210501T000000, start_date=20240129T094849, end_date=20240129T094849\n[2024-01-29, 09:48:49 UTC] {standard_task_runner.py:104} ERROR - Failed to execute job 3 for task run_duckdb_query (Pod airflow-running-dagster-workspace-jdkqug7h returned a failure.\nremote_pod: None; 225)\n[2024-01-29, 09:48:49 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 1\n[2024-01-29, 09:48:49 UTC] {taskinstance.py:2776} INFO - 0 downstream tasks scheduled from follow-on schedule check\n

"},{"location":"concepts/workspaces-troubleshoot/#docker-image-not-build-locally-or-missing","title":"Docker image not build locally or missing","text":"

If your image name or tag is not available locally (check with docker image ls), you'll get an error in Airflow like this:

[2024-01-29, 10:10:14 UTC] {pod.py:961} INFO - Building pod airflow-running-dagster-workspace-64ngbudj with labels: {'dag_id': 'run_boiler_example', 'task_id': 'run_duckdb_query', 'run_id': 'manual__2024-01-29T101013.7029880000-328a76b5e', 'kubernetes_pod_operator': 'True', 'try_number': '1'}\n[2024-01-29, 10:10:14 UTC] {pod.py:538} INFO - Found matching pod airflow-running-dagster-workspace-64ngbudj with labels {'airflow_kpo_in_cluster': 'False', 'airflow_version': '2.7.1-astro.1', 'dag_id': 'run_boiler_example', 'kubernetes_pod_operator': 'True', 'pod-label-test': 'label-name-test', 'run_id': 'manual__2024-01-29T101013.7029880000-328a76b5e', 'task_id': 'run_duckdb_query', 'try_number': '1'}\n[2024-01-29, 10:10:14 UTC] {pod.py:539} INFO - `try_number` of task_instance: 1\n[2024-01-29, 10:10:14 UTC] {pod.py:540} INFO - `try_number` of pod: 1\n[2024-01-29, 10:10:14 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj\n[2024-01-29, 10:10:15 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj\n[2024-01-29, 10:10:16 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj\n[2024-01-29, 10:10:17 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj\n[2024-01-29, 10:10:18 UTC] {pod_manager.py:348} WARNING - Pod not yet started: airflow-running-dagster-workspace-64ngbudj\n[2024-01-29, 10:12:15 UTC] {pod.py:823} INFO - Deleting pod: airflow-running-dagster-workspace-64ngbudj\n[2024-01-29, 10:12:15 UTC] {taskinstance.py:1935} ERROR - Task failed with exception\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py\", line 594, in execute_sync\n    self.await_pod_start(pod=self.pod)\nFile \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py\", line 556, in await_pod_start\n    self.pod_manager.await_pod_start(pod=pod, startup_timeout=self.startup_timeout_seconds)\nFile \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py\", line 354, in await_pod_start\n    raise PodLaunchFailedException(msg)\nairflow.providers.cncf.kubernetes.utils.pod_manager.PodLaunchFailedException: Pod took longer than 120 seconds to start. Check the pod events in kubernetes to determine why.\nDuring handling of the above exception, another exception occurred:\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py\", line 578, in execute\n    return self.execute_sync(context)\n^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py\", line 617, in execute_sync\n    self.cleanup(\nFile \"/usr/local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py\", line 746, in cleanup\n    raise AirflowException(\nairflow.exceptions.AirflowException: Pod airflow-running-dagster-workspace-64ngbudj returned a failure.\n\n...\n\n[2024-01-29, 10:12:15 UTC] {local_task_job_runner.py:228} INFO - Task exited with return code 1\n[2024-01-29, 10:12:15 UTC] {taskinstance.py:2776} INFO - 0 downstream tasks scheduled from follow-on schedule check\n

If you open a Kubernetes monitoring tool such as Lens or k9s, you'll also see the pod struggling to pull the image:

Another cause: if you haven't created the local PersistentVolumeClaim, you'd see something like \"my-pvc\" does not exist. In that case you need to create the PVC first.

"},{"location":"concepts/workspaces/","title":"Data Engineering Workspaces","text":"

On this page, we'll explain what workspaces are in the context of HelloDATA-BE and how to use them, and you'll create your own based on a prepared starter repo.

Info

Also see the step-by-step video we created that might help you further.

"},{"location":"concepts/workspaces/#what-is-a-workspace","title":"What is a Workspace?","text":"

Within the context of HelloDATA-BE, data engineers or other technical people can develop their dbt and Airflow projects, or even bring their own tool, all packed into a separate git repo and run as part of HelloDATA-BE, where they enjoy the benefits of persistent storage, visualization tools, user management, monitoring, etc.

graph TD\n    subgraph \"Business Domain (Tenant)\"\n        BD[Business Domain]\n        BD -->|Services| SR1[Portal]\n        BD -->|Services| SR2[Orchestration]\n        BD -->|Services| SR3[Lineage]\n        BD -->|Services| SR5[Database Manager]\n        BD -->|Services| SR4[Monitoring & Logging]\n    end\n    subgraph \"Workspaces\"\n        WS[Workspaces] -->|git-repo| DE[Data Engineering]\n        WS[Workspaces] -->|git-repo| ML[ML Team]\n        WS[Workspaces] -->|git-repo| DA[Product Analysts]\n        WS[Workspaces] -->|git-repo| NN[...]\n    end\n    subgraph \"Data Domain (1-n)\"\n        DD[Data Domain] -->|Persistent Storage| PG[Postgres]\n        DD[Data Domain] -->|Data Modeling| DBT[dbt]\n        DD[Data Domain] -->|Visualization| SU[Superset]\n    end\n\n    BD -->|Contains 1-n| DD\n    DD -->|n-instances| WS\n\n    %% Colors\n    class BD business\n    class DD data\n    class WS workspace\n    class SS,PGA subsystem\n    class SR1,SR2,SR3,SR4 services\n\n    classDef business fill:#96CD70,stroke:#333,stroke-width:2px;\n    classDef data fill:#A898D8,stroke:#333,stroke-width:2px;\n    classDef workspace fill:#70AFFD,stroke:#333,stroke-width:2px;\n    %% classDef subsystem fill:#F1C40F,stroke:#333,stroke-width:2px;\n    %% classDef services fill:#E74C3C,stroke:#333,stroke-width:1px;
A schematic overview of how workspaces are embedded into HelloDATA-BE.

A workspace can have n instances within a data domain. What does that mean? Each team can address its own requirements and develop and build its project independently.

Think of an ML engineer who needs heavy tools such as TensorFlow, while an analyst might build simple dbt models and another data engineer uses a specific tool from the Modern Data Stack.

"},{"location":"concepts/workspaces/#when-to-use-workspaces","title":"When to use Workspaces","text":"

Workspaces are best used for development, implementing custom business logic, and modeling your data. But there is no limit to what you can build, as long as it can be run as an Airflow DAG.

Generally speaking, a workspace is used whenever someone needs to create custom logic that is not yet integrated into the HelloDATA BE platform.

As a second step, imagine you implemented a critical business transformation everyone needs: that code and DAG could be promoted to a default DAG within a data domain. But the development always happens within the workspace, enabling self-service.

Without workspaces, every request would need to go through the HelloDATA BE project team. Data engineers need a straightforward way, isolated from deployment, to add custom code for their specific data domain pipelines.

"},{"location":"concepts/workspaces/#how-does-a-workspace-work","title":"How does a Workspace work?","text":"

When you create your workspace, it will be deployed within HelloDATA-BE and run by an Airflow DAG. The Airflow DAG is the integration into HD: you define things like how often it runs, what it runs, in which order, etc.

Below, you see an example of two different Airflow DAGs deployed from two different Workspaces (marked red arrow):

"},{"location":"concepts/workspaces/#how-do-i-create-my-own-workspace","title":"How do I create my own Workspace?","text":"

To implement your own workspace, we created the hellodata-be-workspace-starter. This repo contains the minimal set of artefacts needed to be deployed on HD.

"},{"location":"concepts/workspaces/#pre-requisites","title":"Pre-requisites","text":"
  • Install latest Docker Desktop
  • Activate the Kubernetes feature in Docker Desktop (needed to run the Airflow DAG as a Docker image): Settings -> Kubernetes -> Enable Kubernetes
"},{"location":"concepts/workspaces/#step-by-step-guide","title":"Step-by-Step Guide","text":"
  1. Clone hellodata-be-workspace-starter.
  2. Add your own custom logic to the repo, update Dockerfile with relevant libraries and binaries you need.
  3. Create one or multiple Airflow DAGs for running within HelloDATA-BE.
  4. Build the image with docker build -t hellodata-ws-boilerplate:0.1.0-a.1 . (or a name of your choice)
  5. Start up Airflow locally with the Astro CLI (see more below) and run/test the pipeline
  6. Define the needed ENV variables and deployment needs (to be set up by the HD team once initially)
  7. Push the image to a Docker registry of your choice
  8. Ask the HD team to do the initial deployment

From now on, whenever you have a change, you just build a new image, and it will be deployed on HelloDATA-BE automatically, making you and your team independent.

"},{"location":"concepts/workspaces/#boiler-plate-example","title":"Boiler-Plate Example","text":"

Below you find an example structure that helps you understand how to configure workspaces for your needs.

"},{"location":"concepts/workspaces/#boiler-plate-repo","title":"Boiler-Plate repo","text":"

The repo helps you build your workspace: simply clone the whole repo and add your changes.

We generally have these boilerplate files:

\u251c\u2500\u2500 Dockerfile\n\u251c\u2500\u2500 Makefile\n\u251c\u2500\u2500 README.md\n\u251c\u2500\u2500 build-and-push.sh\n\u251c\u2500\u2500 deployment\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 deployment-needs.yaml\n\u2514\u2500\u2500 src\n    \u251c\u2500\u2500 dags\n    \u2502\u00a0\u00a0 \u2514\u2500\u2500 airflow\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 .astro\n    \u2502\u00a0\u00a0     \u2502\u00a0\u00a0 \u251c\u2500\u2500 config.yaml\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 Dockerfile\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 Makefile\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 README.md\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 airflow_settings.yaml\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 dags\n    \u2502\u00a0\u00a0     \u2502\u00a0\u00a0 \u251c\u2500\u2500 .airflowignore\n    \u2502\u00a0\u00a0     \u2502\u00a0\u00a0 \u2514\u2500\u2500 boiler-example.py\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 include\n    \u2502\u00a0\u00a0     \u2502\u00a0\u00a0 \u2514\u2500\u2500 .kube\n    \u2502\u00a0\u00a0     \u2502\u00a0\u00a0     \u2514\u2500\u2500 config\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 packages.txt\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 plugins\n    \u2502\u00a0\u00a0     \u251c\u2500\u2500 requirements.txt\n    \u2514\u2500\u2500 duckdb\n        \u2514\u2500\u2500 query_duckdb.py\n

"},{"location":"concepts/workspaces/#important-files-business-logic-dag","title":"Important files: Business logic (DAG)","text":"

In this case, query_duckdb.py and the boiler-example.py DAG are the custom code that you'd replace with your own.

The Airflow DAG can largely be re-used, as we use the KubernetesPodOperator, which works both within HD and locally (see more below). Essentially, you change the DAG name, the schedule, and the image name to your needs, and you're good to go.

Example of an Airflow DAG:

from pendulum import datetime\nfrom airflow import DAG\nfrom airflow.configuration import conf\nfrom airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (\n    KubernetesPodOperator,\n)\nfrom kubernetes.client import models as k8s\nimport os\n\ndefault_args = {\n    \"owner\": \"airflow\",\n    \"depend_on_past\": False,\n    \"start_date\": datetime(2021, 5, 1),\n    \"email_on_failure\": False,\n    \"email_on_retry\": False,\n    \"retries\": 1,\n}\n\nworkspace_name = os.getenv(\"HD_WS_BOILERPLATE_NAME\", \"ws-boilerplate\")\nnamespace = os.getenv(\"HD_NAMESPACE\", \"default\")\n\n# This will use .kube/config for local Astro CLI Airflow and ENV variable for k8s deployment\nif namespace == \"default\":\n    config_file = \"include/.kube/config\"  # copy your local kube file to the include folder: `cp ~/.kube/config include/.kube/config`\n    in_cluster = False\nelse:\n    in_cluster = True\n    config_file = None\n\nwith DAG(\n    dag_id=\"run_boiler_example\",\n    schedule=\"@once\",\n    default_args=default_args,\n    description=\"Boiler Plate for running a hello data workspace in airflow\",\n    tags=[workspace_name],\n) as dag:\n    KubernetesPodOperator(\n        namespace=namespace,\n        image=\"my-docker-registry.com/hellodata-ws-boilerplate:0.1.0\",\n        image_pull_secrets=[k8s.V1LocalObjectReference(\"regcred\")],\n        labels={\"pod-label-test\": \"label-name-test\"},\n        name=\"airflow-running-dagster-workspace\",\n        task_id=\"run_duckdb_query\",\n        in_cluster=in_cluster,  # if set to true, will look in the cluster, if false, looks for file\n        cluster_context=\"docker-desktop\",  # is ignored when in_cluster is set to True\n        config_file=config_file,\n        is_delete_operator_pod=True,\n        get_logs=True,\n        # please add/overwrite your command here\n        cmds=[\"/bin/bash\", \"-cx\"],\n        arguments=[\n            \"python query_duckdb.py && echo 'Query executed successfully'\",  # add your command here\n        ],\n    )\n

"},{"location":"concepts/workspaces/#dag-how-to-test-or-run-a-dag-locally-before-deploying","title":"DAG: How to test or run a DAG locally before deploying","text":"

To run locally, the easiest way is to use the Astro CLI (see the link for installation). With it, you can simply run astro start or astro stop to start up or shut down.

For local deployment we have these requirements:

  • Local Docker installed (either native or Docker-Desktop)
  • make sure Kubernetes is enabled
  • copy your local kube config to Astro: cp ~/.kube/config src/dags/airflow/include/.kube/
  • note: on Windows you will most probably find that file under C:\Users\[YourIdHere]\.kube\config
  • make sure the Docker image is available locally (for Airflow to use it) -> docker build must have run (check with docker image ls)

The config file is used by Astro to run on local Kubernetes. See more info in Run your Astro project in a local Airflow environment.

"},{"location":"concepts/workspaces/#install-requirements-dockerfile","title":"Install Requirements: Dockerfile","text":"

Below is an example of how to install requirements (here duckdb) and copy the custom code src/duckdb/query_duckdb.py into the image.

Boiler-plate example:

FROM python:3.10-slim\nRUN mkdir -p /opt/airflow/airflow_home/dags/\n\n# Copy your airflow DAGs which will be copied into business domain Airflow (These DAGs will be executed by Airflow)\nCOPY ../src/dags/airflow/dags/* /opt/airflow/airflow_home/dags/\n\nWORKDIR /usr/src/app\nRUN pip install --upgrade pip\n\n# Install DuckDB (example - please add your own dependencies here)\nRUN pip install duckdb\n\n# Copy the script into the container\nCOPY src/duckdb/query_duckdb.py ./\n\n# long-running process to keep the container running\nCMD tail -f /dev/null\n

"},{"location":"concepts/workspaces/#deployment-deployment-needsyaml","title":"Deployment: deployment-needs.yaml","text":"

Below you see an example of the deployment needs in deployment-needs.yaml, which defines:

  • Docker image
  • Volume mounts you need
  • a command to run
  • container behaviour
  • extra ENV variables and values that HD-Team needs to provide for you

This part is the one that is most likely to change.

All of this will eventually be more automated. Also, let us know or just add missing specs to the file, and we'll add the functionality on the deployment side.

spec:\n  initContainers:\n    copy-dags-to-bd:\n      image:\n        repository: my-docker-registry.com/hellodata-ws-boilerplate\n        pullPolicy: IfNotPresent\n        tag: \"0.1.0\"\n      resources: {}\n      volumeMounts:\n        - name: storage-hellodata\n          type: external\n          path: /storage\n      command: [ \"/bin/sh\",\"-c\" ]\n      args: [ \"mkdir -p /storage/${datadomain}/dags/${workspace}/ && rm -rf /storage/${datadomain}/dags/${workspace}/* && cp -a /opt/airflow/airflow_home/dags/*.py /storage/${datadomain}/dags/${workspace}/\" ]\n  containers:\n    - name: ws-boilerplate\n      image: my-docker-registry.com/hellodata-ws-boilerplate:0.1.0\n      imagePullPolicy: Always\n# needed envs for Airflow\nairflow:\n  extraEnv: |\n    - name: \"HD_NAMESPACE\"\n      value: \"${namespace}\"\n    - name: \"HD_WS_BOILERPLATE_NAME\"\n      value: \"dd01-ws-boilerplate\"\n
"},{"location":"concepts/workspaces/#example-with-airflow-and-dbt","title":"Example with Airflow and dbt","text":"

We've added another demo DAG called showcase-boiler.py, which downloads data from the web (animal statistics, ~150 CSVs), creates the Postgres tables, inserts the data, and runs dbt and dbt docs at the end.

In this case we use multiple tasks in a DAG that all use the same image, but you could use a different one for each step, meaning you could use Python for the download, R for the transformation, and Java for machine learning. As long as the requirements are similar, though, I'd suggest using the same image.

"},{"location":"concepts/workspaces/#volumes-pvc","title":"Volumes / PVC","text":"

Another addition is the use of volumes. These provide persistent storage, known as PersistentVolumes/PersistentVolumeClaims (PVCs) in Kubernetes, which allow intermediate data to be stored outside the container. Downloaded CSVs are stored there for the next task to pick up.

Locally you need to create such storage once; there is a script in case you want to apply it to your local Docker Desktop setup. Run this command:

kubectl apply -f src/volume_mount/pvc.yaml\n

Be sure to use the same name (in this example my-pvc) in your DAGs as well. See in showcase-boiler.py how the volumes are mounted like this; a sketch of attaching these objects to a task follows the snippet:

volume_claim = k8s.V1PersistentVolumeClaimVolumeSource(claim_name=\"my-pvc\")\nvolume = k8s.V1Volume(name=\"my-volume\", persistent_volume_claim=volume_claim)\nvolume_mount = k8s.V1VolumeMount(name=\"my-volume\", mount_path=\"/mnt/pvc\")\n
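As a hedged illustration (not taken verbatim from showcase-boiler.py), these objects could then be attached to a task roughly as follows; the dag_id, image, and script name are placeholders:

from pendulum import datetime
from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
from kubernetes.client import models as k8s

volume_claim = k8s.V1PersistentVolumeClaimVolumeSource(claim_name="my-pvc")
volume = k8s.V1Volume(name="my-volume", persistent_volume_claim=volume_claim)
volume_mount = k8s.V1VolumeMount(name="my-volume", mount_path="/mnt/pvc")

with DAG(dag_id="pvc_mount_example", start_date=datetime(2023, 1, 1), schedule=None) as dag:
    KubernetesPodOperator(
        task_id="data_download",
        name="data-download",
        image="my-docker-registry.com/hellodata-ws-boilerplate:0.1.0",  # placeholder image
        cmds=["/bin/bash", "-cx"],
        arguments=["python download_csvs.py /mnt/pvc"],  # hypothetical script writing to the mounted PVC
        volumes=[volume],              # make the PVC available to the pod
        volume_mounts=[volume_mount],  # mount it at /mnt/pvc inside the container
        get_logs=True,
    )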

"},{"location":"concepts/workspaces/#conclusion","title":"Conclusion","text":"

I hope this has illustrated how to create your own workspace. Otherwise let us know in the discussions or create an issue/PR.

"},{"location":"concepts/workspaces/#troubleshooting","title":"Troubleshooting","text":"

If you encounter errors, we collect them in Troubleshooting.

"},{"location":"manuals/role-authorization-concept/","title":"Roles and authorization concept","text":""},{"location":"manuals/role-authorization-concept/#platform-authentication-authorization","title":"Platform Authentication Authorization","text":"

Authentication and authorization within the various logical contexts or domains of the HelloDATA system are handled as follows. Authentication is handled via the OAuth 2 standard; in the case of the Canton of Bern, this is done via the central Keycloak server. Authorizations to the various elements within a subject or Data Domain are handled within the HelloDATA portal. To keep administration simple, a role concept is applied: instead of defining the authorizations for each user, roles receive the authorizations, and users are then assigned to the roles. The roles available in the portal have fixed, predefined permissions.

"},{"location":"manuals/role-authorization-concept/#business-domain","title":"Business Domain","text":"

In order for a user to gain access to a Business Domain, the user must be authenticated for the Business Domain. Users without authentication who try to access a Business Domain will receive an error message. The following two logical roles are available within a Business Domain:

  • HELLODATA_ADMIN
  • BUSINESS_DOMAIN_ADMIN
"},{"location":"manuals/role-authorization-concept/#hellodata_admin","title":"HELLODATA_ADMIN","text":"
  • Can act fully in the system.
"},{"location":"manuals/role-authorization-concept/#business_domain_admin","title":"BUSINESS_DOMAIN_ADMIN","text":"
  • Can manage users and assign roles (except HELLODATA_ADMIN).
  • Can manage dashboard metadata.
  • Can manage announcements.
  • Can manage the FAQ.
  • Can manage the external documentation links.

BUSINESS_DOMAIN_ADMIN is automatically DATA_DOMAIN_ADMIN in all Data Domains within the Business Domain (see Data Domain Context).

"},{"location":"manuals/role-authorization-concept/#data-domain","title":"Data Domain","text":"

A Data Domain encapsulates all data elements and tools that are of interest for a specific issue. HelloDATA supports 1 - n Data Domains within a Business Domain.

The resources to be protected within a Data Domain are:

  • Schema of the Data Domain.
  • Data mart tables of the Data Domain.
  • The entire DWH environment of the Data Domain.
  • Data lineage documents of the DBT projects of the Data Domain.
  • Dashboards, charts, datasets within the superset instance of a Data Domain.
  • Airflow DAGs of the Data Domain.

The following three logical roles are available within a Data Domain:

  • DATA_DOMAIN_VIEWER \u00a0 \u00a0
  • DATA_DOMAIN_EDITOR
  • DATA_DOMAIN_ADMIN

Depending on the role assigned, users are given different permissions to act in the Data Domain. A user who has not been assigned a role in a Data Domain will generally not be granted access to any resources of that Data Domain.

"},{"location":"manuals/role-authorization-concept/#data_domain_viewer","title":"DATA_DOMAIN_VIEWER","text":"
  • The DATA_DOMAIN_VIEWER role is granted potential read access to dashboards of a Data Domain.
  • Which dashboards of the Data Domain a DATA_DOMAIN_VIEWER user is allowed to see is administered within the user management of the HelloDATA portal.
  • Only assigned dashboards are visible to a DATA_DOMAIN_VIEWER.
  • Only dashboards in \"Published\" status are visible to a DATA_DOMAIN_VIEWER. A DATA_DOMAIN_VIEWER can view all data lineage documents of the Data Domain.
  • A DATA_DOMAIN_VIEWER can access the links to external dashboards associated with its Data Domain. It is not checked whether the user has access in the systems outside the HelloDATA system boundary.
"},{"location":"manuals/role-authorization-concept/#data_domain_editor","title":"DATA_DOMAIN_EDITOR","text":"

Same as DATA_DOMAIN_VIEWER plus:

  • The DATA_DOMAIN_EDITOR role is granted read and write access to the dashboards of a Data Domain. All dashboards are visible and editable for a DATA_DOMAIN_EDITOR. All charts used in the dashboards are visible and editable for a DATA_DOMAIN_EDITOR. All data sets used in the dashboards are visible and editable for a DATA_DOMAIN_EDITOR.
  • A DATA_DOMAIN_EDITOR can create new dashboards.
  • A DATA_DOMAIN_EDITOR can view the data marts of the Data Domain.
  • A DATA_DOMAIN_EDITOR has access to the SQL lab in the superset.
"},{"location":"manuals/role-authorization-concept/#data_domain_admin","title":"DATA_DOMAIN_ADMIN","text":"

Same as DATA_DOMAIN_EDITOR plus:

The DATA_DOMAIN_ADMIN role can view the airflow DAGs of the Data Domain. A DATA_DOMAIN_ADMIN can view all database objects in the DWH of the Data Domain.

"},{"location":"manuals/role-authorization-concept/#extra-data-domain","title":"Extra Data Domain","text":"

Besides the standard Data Domains, there are also extra Data Domains. An Extra Data Domain provides additional permissions, functions, and database connections, such as:

  • CSV uploads to the Data Domain.
  • Read permissions from one Data Domain to additional other Data Domain(s).
  • Database connections to Data Domains of other databases.
  • Database connections via AD group permissions.
  • etc.

These additional permissions, functions or database connections are a matter of negotiation per extra Data Domain. The additional permissions, if any, are then added to the standard roles mentioned above for the extra Data Domain.

Row Level Security settings on Superset level can be used to additionally restrict the data that is displayed in a dashboard (e.g. only data of the own domain is displayed).

"},{"location":"manuals/role-authorization-concept/#system-role-to-portal-role-mapping","title":"System Role to Portal Role Mapping","text":"System Role Portal Role Portal Permission Menu / Submenu / Page in Portal Info HELLODATA_ADMIN SUPERUSER ROLE_MANAGEMENT Administration / Portal Rollenverwaltung MONITORING Monitoring DEVTOOLS Dev Tools USER_MANAGEMENT Administration / Benutzerverwaltung FAQ_MANAGEMENT Administration / FAQ Verwaltung EXTERNAL_DASHBOARDS_MANAGEMENT Unter External Dashboards Kann neue Eintr\u00e4ge erstellen und verwalten bei Seite External Dashboards DOCUMENTATION_MANAGEMENT Administration / Dokumentationsmanagement ANNOUNCEMENT_MANAGEMENT Administration/ Ank\u00fcndigungen DASHBOARDS Dashboards Sieht im Menu Liste, dann je einen Link auf alle Data Domains auf die er Zugriff hat mit deren Dashboards auf die er Zugriff hat plus Externe Dashboards DATA_LINEAGE Data Lineage Sieht im Menu je einen Lineage Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_MARTS Data Marts Sieht im Menu je einen Data Mart Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_DWH Data Eng, / DWH Viewer Sieht im Menu Data Eng. das Submenu DWH Viewer DATA_ENG Data Eng. / Orchestration Sieht im Menu Data Eng. das Submenu Orchestration BUSINESS_DOMAIN_ADMIN BUSINESS_DOMAIN_ADMIN USER_MANAGEMENT Administration / Portal Rollenverwaltung FAQ_MANAGEMENT Dev Tools EXTERNAL_DASHBOARDS_MANAGEMENT Administration / Benutzerverwaltung DOCUMENTATION_MANAGEMENT Administration / FAQ Verwaltung ANNOUNCEMENT_MANAGEMENT Unter External Dashboards DASHBOARDS Administration / Dokumentationsmanagement Sieht im Menu Liste, dann je einen Link auf alle Data Domains auf die er Zugriff hat mit deren Dashboards auf die er Zugriff hat plus Externe Dashboards DATA_LINEAGE Administration/ Ank\u00fcndigungen Sieht im Menu je einen Lineage Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_MARTS Data Marts Sieht im Menu je einen Data Mart Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_DWH Data Eng, / DWH Viewer Sieht im Menu Data Eng. das Submenu DWH Viewer DATA_ENG Data Eng. / Orchestration Sieht im Menu Data Eng. das Submenu Orchestration DATA_DOMAIN_ADMIN DATA_DOMAIN_ADMIN DASHBOARDS Dashboards Sieht im Menu Liste, dann je einen Link auf alle Data Domains auf die er Zugriff hat mit deren Dashboards auf die er Zugriff hat plus Externe Dashboards DATA_LINEAGE Data Lineage Sieht im Menu je einen Lineage Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_MARTS Data Marts Sieht im Menu je einen Data Mart Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_DWH Data Eng, / DWH Viewer Sieht im Menu Data Eng. das Submenu DWH Viewer DATA_ENG Data Eng. / Orchestration Sieht im Menu Data Eng. 
das Submenu Orchestration DATA_DOMAIN_EDITOR EDITOR DASHBOARDS Dashboards Sieht im Menu Liste, dann je einen Link auf alle Data Domains auf die er Zugriff hat mit deren Dashboards auf die er Zugriff hat plus Externe Dashboards DATA_LINEAGE Data Lineage Sieht im Menu je einen Lineage Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_MARTS Data Marts Sieht im Menu je einen Data Mart Link f\u00fcr alle Data Domains auf die er Zugriff hat DATA_DOMAIN_VIEWER VIEWER DASHBOARDS Dashboards Sieht im Menu Liste, dann je einen Link auf alle Data Domains auf die er Zugriff hat mit deren Dashboards auf die er Zugriff hat plus Externe Dashboards DATA_LINEAGE Data Lineage Sieht im Menu je einen Lineage Link f\u00fcr alle Data Domains auf die er Zugriff hat"},{"location":"manuals/role-authorization-concept/#system-role-to-superset-role-mapping","title":"System Role to Superset Role Mapping","text":"System Role Superset Role Info No Data Domain role Public User should not get access to Superset functions so he gets a role with no permissions. DATA_DOMAIN_VIEWER BI_VIEWER plus\u00a0roles forDashboards he was granted access to i. e. the slugified dashboard names with prefix \"D_\" Example: User is \"DATA_DOMAIN_VIEWER\" in a Data Domain. We grant the user acces to the \"Hello World\" dashboard. Then user gets the role \"BI_VIEWER\" plus the role \"D_hello_world\" in Superset. DATA_DOMAIN_EDITOR BI_EDITOR Has access to all Dashboards as he is owner of the dashboards\u00a0 plus he gets SQL Lab permissions. DATA_DOMAIN_ADMIN BI_EDITOR plus\u00a0BI_ADMIN Has access to all Dashboards as he is owner of the dashboards\u00a0 plus he gets SQL Lab permissions."},{"location":"manuals/role-authorization-concept/#system-role-to-airflow-role-mapping","title":"System Role to Airflow Role Mapping","text":"System Role Airflow Role Info HELLO_DATA_ADMIN Admin User gets DATA_DOMAIN_ADMIN role for all exisitng Data Domains and thus gets his permissions by that roles.User additionally gets the Admin role. BUSINESS_DOMAIN_ADMIN User gets DATA_DOMAIN_ADMIN role for all exisitng Data Domains and thus gets his permissions by that roles. No Data Domain role Public User should not get access to Airflow functions so he gets a role with no permissions. DATA_DOMAIN_VIEWER Public User should not get access to Airflow functions so he gets a role with no permissions. DATA_DOMAIN_EDITOR Public User should not get access to Airflow functions so he gets a role with no permissions. DATA_DOMAIN_ADMIN AF_OPERATOR plus\u00a0role corresponding to his Data Domain Key with prefix \"DD_\" Example: User is \"DATA_DOMAIN_ADMIN\" in a Data Domain with the key \"data_domain_one\". Then user gets the role \"AF_OPERATOR\" plus the role \"DD_data_domain_one\" in Airflow."},{"location":"manuals/user-manual/","title":"User Manual","text":""},{"location":"manuals/user-manual/#goal","title":"Goal","text":"

This user manual should enable you to use the HelloDATA platform and illustrate the features of the product and how to use them.

\u2192 You can find more about the platform and its architecture on Architecture & Concepts.

"},{"location":"manuals/user-manual/#navigation","title":"Navigation","text":""},{"location":"manuals/user-manual/#portal","title":"Portal","text":"

The entry page of HelloDATA is the Web Portal.

  1. Navigation to jump to the different capabilities of HelloDATA
  2. Extended status information about
    1. data pipelines, containers, performance and security
    2. documentation and subscriptions
  3. User and profile information of logged-in user.\u00a0
  4. Overview of your dashboards

"},{"location":"manuals/user-manual/#business-data-domain","title":"Business & Data Domain","text":"

As explained in Domain View, a key feature is to create business domains with n data domains. If you have access to more than one data domain, you can switch between them via the drop-down at the top.

"},{"location":"manuals/user-manual/#dashboards","title":"Dashboards","text":"

The most important navigation item is the dashboards link. If you hover over it, you'll see three options to choose from.

You can either click the dashboard list in the hover menu (2) to see the list of dashboards with thumbnails, or directly choose your dashboard (3).

"},{"location":"manuals/user-manual/#data-lineage","title":"Data-Lineage","text":"

To see the data lineage (the dependencies of your data tables), use the second menu option. Again, you either choose the list or click directly on \"data lineage\" (2).

Button 2 will bring you to the project site, where you choose your project and load the lineage.

Once loaded, you see all sources (1) and dbt Projects (2). On the detail page, you can see all the beautiful and helpful documentation such as:

  • table name (3)
  • columns and data types (4)
  • which table and model this selected object depends on (5)
  • the SQL code (6)
    • as a template or compiled
  • and dependency graph (7)
    • which you can expand to full view (8) after clicking (7)
    • interactive data lineage view (9)

"},{"location":"manuals/user-manual/#data-marts-viewer","title":"Data Marts Viewer","text":"

This view lets you access the universal data mart (UDM) layer:

These are cleaned and modeled data mart tables, i.e. tables that have been joined and cleaned from the source tables. This is effectively the final layer of HelloDATA BE, which the dashboards access. Dashboards should not access any earlier layer (landing zone, data storage, or data processing).

We use CloudBeaver for this, the same as for the DWH Viewer described later.

"},{"location":"manuals/user-manual/#data-engineering","title":"Data Engineering","text":""},{"location":"manuals/user-manual/#dwh-viewer","title":"DWH Viewer","text":"

This is essentially a database access layer where you see all your tables, and you can write SQL queries based on your access roles with a provided tool (CloudBeaver).

"},{"location":"manuals/user-manual/#create-new-sql-query","title":"Create new SQL Query","text":"


"},{"location":"manuals/user-manual/#choose-connection-and-stored-queries","title":"Choose Connection and stored queries","text":"

You can choose pre-defined connections and query your data warehouse. You can also store queries that other users can see and use as well. Run your queries with (1).

"},{"location":"manuals/user-manual/#settings-and-powerful-features","title":"Settings and Powerful features","text":"

You can configure many settings, such as the user status, and much more.

Please find all settings and features in the CloudBeaver documentation.

"},{"location":"manuals/user-manual/#orchestration","title":"Orchestration","text":"

The orchestrator is your task manager. You tell Airflow, our orchestrator, in which order the tasks will run. This is usually defined ahead of time, and in the portal you can see the latest runs and their status (successful, failed, etc.).

  • You can navigate to DAGs (2) and see all the details (3) such as the DAG name, owner, runs, schedule, and next and recent runs.
  • You can also dive deeper into Datasets, Security, Admin or similar (4)
  • Airflow offers lots of different visualization modes, e.g. the Graph view (6), which allows you to see each step of the task.
    • As you can see, you can choose calendar, task duration, Gantt, etc.

"},{"location":"manuals/user-manual/#administration","title":"Administration","text":"

Here you manage the portal configuration, such as users, roles, announcements, FAQs, and documentation.

"},{"location":"manuals/user-manual/#benutzerverwaltung-user-management","title":"Benutzerverwaltung / User Management","text":""},{"location":"manuals/user-manual/#adding-user","title":"Adding user","text":"

First, type the email address and hit enter. Then open the drop-down and click the entry.

Now type the Name and hit Berechtigungen setzen to add the user:

You should see something like this:

"},{"location":"manuals/user-manual/#changing-permissions","title":"Changing Permissions","text":"
  1. Search the user you want to give or change permission
  2. Scroll to the right
  3. Click the green edit icon

Now choose the role you want to give:

And or give access to specific data domains:

See more in role-authorization-concept.

"},{"location":"manuals/user-manual/#portal-rollenverwaltung-portal-role-management","title":"Portal Rollenverwaltung / Portal Role Management","text":"

In this portal role management, you can see all the roles that exist.

Warning

Creating new roles is not supported, even though the \"Rolle erstellen\" button exists. All roles are predefined and hard-coded.

"},{"location":"manuals/user-manual/#creating-a-new-role","title":"Creating a new role","text":"

See how to create a new role below:

"},{"location":"manuals/user-manual/#ankundigung-announcement","title":"Ank\u00fcndigung / Announcement","text":"

You can simply create an announcement that goes to all users via Ank\u00fcndigung erstellen:

Then you fill in your message. Save it.

You'll see a success message if everything went well:

And this is how it looks to the users: it will appear until the user clicks the cross to close it.

"},{"location":"manuals/user-manual/#faq","title":"FAQ","text":"

The FAQ works the same as the announcements above. FAQs are shown on the starting dashboard, but you can scope them to a specific data domain:

And this is how it looks:

"},{"location":"manuals/user-manual/#dokumentationsmanagement-documentation-management","title":"Dokumentationsmanagement / Documentation Management","text":"

Lastly, you can document the system with documentation management. Here you have one document in which you can describe everything in detail, and everyone can write to it. It will appear on the dashboard as well:

"},{"location":"manuals/user-manual/#monitoring","title":"Monitoring","text":"

We provide two different ways of monitoring:\u00a0

  • Status:\u00a0
  • Workspaces

"},{"location":"manuals/user-manual/#status","title":"Status","text":"

It shows you detailed information on the instances of HelloDATA: what the situation of the Portal is, whether the monitoring is running, etc.

"},{"location":"manuals/user-manual/#data-domains","title":"Data Domains","text":"

In the monitoring of your data domains, you see each system and the link to the native application. You can easily and quickly observe permissions, roles, and users across the different subsystems (1). Click the one you want, choose different levels (2) for each, and see its permissions (3).

By clicking on the blue underlined DBT Docs, you will be navigated to the native dbt docs. The same is true if you click on an Airflow or Superset instance.

"},{"location":"manuals/user-manual/#devtools","title":"DevTools","text":"

DevTools are additional tools HelloDATA provides out of the box, e.g. to send mail (Mailbox) or browse files (FileBrowser).

"},{"location":"manuals/user-manual/#mailbox","title":"Mailbox","text":"

In Mailbox (we use MailHog), you can check which emails have been sent or which accounts have been updated.

"},{"location":"manuals/user-manual/#filebrowser","title":"FileBrowser","text":"

Here you can browse all the documentation or code from the git repos in a file browser. We use FileBrowser here. Please use it with care, as some of the folders are system-relevant.

Log in

Make sure you have the login credentials to log in. Your administrator should be able to provide these to you.

"},{"location":"manuals/user-manual/#more-know-how","title":"More: Know-How","text":"
  • More help for Superset
    • Superset Documentation
  • More help for dbt:
    • dbt Documentation
    • dbt Developer Hub
  • More about Airflow
    • Airflow Documentation

Find further important references, know-how, and best practices on\u00a0HelloDATA Know-How.

"},{"location":"more/changelog/","title":"Changelog","text":""},{"location":"more/changelog/#2023-11-22-concepts","title":"2023-11-22 Concepts","text":"
  • Added workspaces on the concepts page.
  • Added showcase main category to explain the demo that comes with HD-BE
"},{"location":"more/changelog/#2023-11-20-changed-corporate-design","title":"2023-11-20 Changed corporate design","text":"
  • Changed primary color to KAIO style guide: color red (#EE0F0F), and font: Roboto (was already default font)
"},{"location":"more/changelog/#2023-11-06-switched-architecture-over","title":"2023-11-06 Switched architecture over","text":"
  • Switched the architecture over to mkdocs
  • Updated vision
  • Updated user manual
"},{"location":"more/changelog/#2023-09-29-initial-version","title":"2023-09-29 Initial version","text":"
  • Created the template for documentation with mkdocs and the popular theme mkdocs-material.
"},{"location":"more/faq/","title":"FAQ","text":""},{"location":"more/glossary/","title":"Glossary","text":"
  • HD: HelloDATA
  • KAIO: Amt f\u00fcr Informatik und Organisation des Kantons Bern (KAIO)
"},{"location":"test/examples/","title":"Examples","text":""},{"location":"test/examples/#code-annotation-examples","title":"Code Annotation Examples","text":""},{"location":"test/examples/#codeblocks","title":"Codeblocks","text":"

Some code goes here.

"},{"location":"test/examples/#plain-codeblock","title":"Plain codeblock","text":"

A plain codeblock:

Some code here\ndef myfunction()\n// some comment\n
"},{"location":"test/examples/#code-for-a-specific-language","title":"Code for a specific language","text":"

Some more code with the py at the start:

import tensorflow as tf\ndef whatever()\n
"},{"location":"test/examples/#with-a-title","title":"With a title","text":"bubble_sort.py
def bubble_sort(items):\nfor i in range(len(items)):\nfor j in range(len(items) - 1 - i):\nif items[j] > items[j + 1]:\nitems[j], items[j + 1] = items[j + 1], items[j]\n
"},{"location":"test/examples/#with-line-numbers","title":"With line numbers","text":"
def bubble_sort(items):\nfor i in range(len(items)):\nfor j in range(len(items) - 1 - i):\nif items[j] > items[j + 1]:\nitems[j], items[j + 1] = items[j + 1], items[j]\n
"},{"location":"test/examples/#highlighting-lines","title":"Highlighting lines","text":"
def bubble_sort(items):\nfor i in range(len(items)):\nfor j in range(len(items) - 1 - i):\nif items[j] > items[j + 1]:\nitems[j], items[j + 1] = items[j + 1], items[j]\n
"},{"location":"test/examples/#admonitions-call-outs","title":"Admonitions / Call-outs","text":"

Note

this is a note

Phasellus posuere in sem ut cursus

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla et euismod nulla. Curabitur feugiat, tortor non consequat finibus, justo purus auctor massa, nec semper lorem quam in massa.

Supported types:

  • note
  • abstract
  • info
  • tip
  • success
  • question
  • warning
  • failure
  • danger
  • bug
  • example
  • quote
"},{"location":"test/examples/#diagrams","title":"Diagrams","text":"
graph LR\n  A[Start] --> B{Error?};\n  B -->|Yes| C[Hmm...];\n  C --> D[Debug];\n  D --> B;\n  B ---->|No| E[Yay!];
"},{"location":"test/examples/#sequence-diagram","title":"Sequence diagram","text":"

Sequence diagrams describe a specific scenario as sequential interactions between multiple objects or actors, including the messages that are exchanged between those actors:

sequenceDiagram\n  autonumber\n  Alice->>John: Hello John, how are you?\n  loop Healthcheck\n      John->>John: Fight against hypochondria\n  end\n  Note right of John: Rational thoughts!\n  John-->>Alice: Great!\n  John->>Bob: How about you?\n  Bob-->>John: Jolly good!
"},{"location":"test/examples/#state-diagram","title":"State diagram","text":"

State diagrams are a great tool to describe the behavior of a system, decomposing it into a finite number of states, and transitions between those states:

stateDiagram-v2\n  state fork_state <<fork>>\n    [*] --> fork_state\n    fork_state --> State2\n    fork_state --> State3\n\n    state join_state <<join>>\n    State2 --> join_state\n    State3 --> join_state\n    join_state --> State4\n    State4 --> [*]

"},{"location":"test/examples/#class-diagram","title":"Class diagram","text":"

Class diagrams are central to object-oriented programming, describing the structure of a system by modelling entities as classes and relationships between them:

classDiagram\n  Person <|-- Student\n  Person <|-- Professor\n  Person : +String name\n  Person : +String phoneNumber\n  Person : +String emailAddress\n  Person: +purchaseParkingPass()\n  Address \"1\" <-- \"0..1\" Person:lives at\n  class Student{\n    +int studentNumber\n    +int averageMark\n    +isEligibleToEnrol()\n    +getSeminarsTaken()\n  }\n  class Professor{\n    +int salary\n  }\n  class Address{\n    +String street\n    +String city\n    +String state\n    +int postalCode\n    +String country\n    -validate()\n    +outputAsLabel()  \n  }
"},{"location":"test/examples/#entity-relationship-diagram","title":"Entity-relationship diagram","text":"

An entity-relationship diagram is composed of entity types and specifies relationships that exist between entities. It describes inter-related things in a specific domain of knowledge:

erDiagram\n  CUSTOMER ||--o{ ORDER : places\n  ORDER ||--|{ LINE-ITEM : contains\n  LINE-ITEM {\n    string name\n    int pricePerUnit\n  }\n  CUSTOMER }|..|{ DELIVERY-ADDRESS : uses
"},{"location":"test/examples/#icons-and-emojs","title":"Icons and Emojs","text":""},{"location":"vision/roadmap/","title":"Roadmap","text":"Feature Roadmap"},{"location":"vision/vision-and-goal/","title":"Our Vision and Goal","text":"

The Open-Source Enterprise Data Platform in a Single Portal

HelloDATA BE is an enterprise data platform built on top of open source. We use state-of-the-art tools such as dbt for data modeling with SQL, Airflow to run and orchestrate tasks, and Superset to visualize the BI dashboards. The underlying database is Postgres.

"},{"location":"vision/vision-and-goal/#vision","title":"Vision","text":"

In a fast-moving data engineering world, where every device and entity becomes a data generator, the need for agile, robust, and transparent data platforms is more crucial than ever. HelloDATA BE is not just any data platform; it's the bridge between open-source innovation and the demanding reliability of enterprise solutions.

HelloDATA BE handpicked the best tools like dbt, Airflow, Superset, and Postgres and integrated them into a seamless, enterprise-ready data solution, empowering businesses with the agility of open source and the dependability of a tested, unified platform.

"},{"location":"vision/vision-and-goal/#the-goal-of-hellodata-be","title":"The Goal of HelloDATA BE","text":"

Our goal at HelloDATA BE is clear: to democratize the power of data for enterprises.

As digital transformation and data expand, the challenges with various SaaS solutions, vendor lock-ins, and fragmented data sources become apparent.

HelloDATA BE tries to provide an answer to these challenges. We aim to merge the world's best open-source tools, refining them to enterprise standards and ensuring that every organization, irrespective of size or niche, has access to top-tier data solutions. By fostering a community-driven approach through our open-source commitment, we envision a data future that's inclusive, robust, and open to innovation.

"}]} \ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml new file mode 100644 index 00000000..bbc90c2c --- /dev/null +++ b/sitemap.xml @@ -0,0 +1,78 @@ + + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + + None + 2024-05-08 + daily + + \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz new file mode 100644 index 00000000..1e956089 Binary files /dev/null and b/sitemap.xml.gz differ diff --git a/stylesheets/extra.css b/stylesheets/extra.css new file mode 100644 index 00000000..0c35d193 --- /dev/null +++ b/stylesheets/extra.css @@ -0,0 +1,34 @@ +/** + * Copyright © 2024, Kanton Bern + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of the nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +:root { + --md-primary-fg-color: #EE0F0F; + --md-primary-fg-color--light: #EE0F0F; + --md-primary-fg-color--dark: #EE0F0F; + --md-text-font: Roboto !important; + +} + diff --git a/test/examples/index.html b/test/examples/index.html new file mode 100644 index 00000000..44a3fc4a --- /dev/null +++ b/test/examples/index.html @@ -0,0 +1,910 @@ + + + + + + + + + + + + + + + + + + Examples - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Examples

+

Code Annotation Examples

+

Codeblocks

+

Some code goes here.

+

Plain codeblock

+

A plain codeblock:

+
Some code here
+def myfunction()
+// some comment
+
+

Code for a specific language

+

Some more code, with py specified at the start of the block:

+
import tensorflow as tf
+def whatever():
+    pass  # placeholder body so the snippet is valid Python
+
+

With a title

+
bubble_sort.py
def bubble_sort(items):
+    for i in range(len(items)):
+        for j in range(len(items) - 1 - i):
+            if items[j] > items[j + 1]:
+                items[j], items[j + 1] = items[j + 1], items[j]
+
+

With line numbers

+
1
+2
+3
+4
+5
def bubble_sort(items):
+    for i in range(len(items)):
+        for j in range(len(items) - 1 - i):
+            if items[j] > items[j + 1]:
+                items[j], items[j + 1] = items[j + 1], items[j]
+
+

Highlighting lines

+
def bubble_sort(items):
+    for i in range(len(items)):
+        for j in range(len(items) - 1 - i):
+            if items[j] > items[j + 1]:
+                items[j], items[j + 1] = items[j + 1], items[j]
+
+
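A quick usage check of bubble_sort as defined above (the sample values are made up for illustration and assume the definition above is in scope):

items = [5, 2, 9, 1, 7]   # arbitrary unsorted input
bubble_sort(items)        # sorts the list in place
print(items)              # prints [1, 2, 5, 7, 9]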

Admonitions / Call-outs

+
+

Note

+

This is a note.

+
+
+

Phasellus posuere in sem ut cursus

+

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla et euismod +nulla. Curabitur feugiat, tortor non consequat finibus, justo purus auctor +massa, nec semper lorem quam in massa.

+
+

Supported types:

+
    +
  • note
  • abstract
  • info
  • tip
  • success
  • question
  • warning
  • failure
  • danger
  • bug
  • example
  • quote
+

Diagrams

+
graph LR
+  A[Start] --> B{Error?};
+  B -->|Yes| C[Hmm...];
+  C --> D[Debug];
+  D --> B;
+  B ---->|No| E[Yay!];
+

Sequence diagram

+

Sequence diagrams describe a specific scenario as sequential interactions between multiple objects or actors, including the messages that are exchanged between those actors:

+
sequenceDiagram
+  autonumber
+  Alice->>John: Hello John, how are you?
+  loop Healthcheck
+      John->>John: Fight against hypochondria
+  end
+  Note right of John: Rational thoughts!
+  John-->>Alice: Great!
+  John->>Bob: How about you?
+  Bob-->>John: Jolly good!
+
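As a loose, non-normative illustration of the same idea in code, the exchange above could be mimicked with plain Python objects passing messages as method calls; the Actor helper below is an assumption, not part of the diagram syntax:

class Actor:
    """Minimal stand-in for a sequence-diagram participant."""
    def __init__(self, name):
        self.name = name

    def send(self, other, message):
        # Render the interaction as "sender -> receiver: message"
        print(f"{self.name} -> {other.name}: {message}")

alice, john, bob = Actor("Alice"), Actor("John"), Actor("Bob")
alice.send(john, "Hello John, how are you?")
john.send(alice, "Great!")
john.send(bob, "How about you?")
bob.send(john, "Jolly good!")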

State diagram

+

State diagrams are a great tool for describing the behavior of a system, decomposing it into a finite number of states and the transitions between those states: +

stateDiagram-v2
+  state fork_state <<fork>>
+    [*] --> fork_state
+    fork_state --> State2
+    fork_state --> State3
+
+    state join_state <<join>>
+    State2 --> join_state
+    State3 --> join_state
+    join_state --> State4
+    State4 --> [*]

+

Class diagram

+

Class diagrams are central to object-oriented programming, describing the structure of a system by modelling entities as classes and the relationships between them:

+
classDiagram
+  Person <|-- Student
+  Person <|-- Professor
+  Person : +String name
+  Person : +String phoneNumber
+  Person : +String emailAddress
+  Person: +purchaseParkingPass()
+  Address "1" <-- "0..1" Person:lives at
+  class Student{
+    +int studentNumber
+    +int averageMark
+    +isEligibleToEnrol()
+    +getSeminarsTaken()
+  }
+  class Professor{
+    +int salary
+  }
+  class Address{
+    +String street
+    +String city
+    +String state
+    +int postalCode
+    +String country
+    -validate()
+    +outputAsLabel()  
+  }
+
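As a rough companion to the class diagram above, the same structure could be sketched in Python. This mapping is illustrative only; constructor signatures, method bodies, and the enrolment threshold are assumptions, and the Address class is omitted for brevity:

class Person:
    def __init__(self, name: str, phone_number: str, email_address: str):
        self.name = name
        self.phone_number = phone_number
        self.email_address = email_address

    def purchase_parking_pass(self):
        pass  # behaviour is not specified by the diagram


class Student(Person):
    def __init__(self, name, phone_number, email_address, student_number: int, average_mark: int):
        super().__init__(name, phone_number, email_address)
        self.student_number = student_number
        self.average_mark = average_mark

    def is_eligible_to_enrol(self) -> bool:
        return self.average_mark >= 4  # threshold chosen arbitrarily for the sketch

    def get_seminars_taken(self):
        return []  # placeholder


class Professor(Person):
    def __init__(self, name, phone_number, email_address, salary: int):
        super().__init__(name, phone_number, email_address)
        self.salary = salary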

Entity-relationship diagram

+

An entity-relationship diagram is composed of entity types and specifies relationships that exist between entities. It describes inter-related things in a specific domain of knowledge:

+
erDiagram
+  CUSTOMER ||--o{ ORDER : places
+  ORDER ||--|{ LINE-ITEM : contains
+  LINE-ITEM {
+    string name
+    int pricePerUnit
+  }
+  CUSTOMER }|..|{ DELIVERY-ADDRESS : uses
+
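To give these entities a concrete shape in code, they could be modelled as simple Python dataclasses. The field names follow the diagram; the list-based relationship fields and their types are assumptions for illustration:

from dataclasses import dataclass, field
from typing import List

@dataclass
class LineItem:
    name: str
    price_per_unit: int

@dataclass
class Order:
    # ORDER ||--|{ LINE-ITEM : contains
    line_items: List[LineItem] = field(default_factory=list)

@dataclass
class Customer:
    # CUSTOMER ||--o{ ORDER : places
    orders: List[Order] = field(default_factory=list)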

Icons and Emojis

+

😄

+

+

+

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/vision/roadmap/index.html b/vision/roadmap/index.html new file mode 100644 index 00000000..19c50db8 --- /dev/null +++ b/vision/roadmap/index.html @@ -0,0 +1,793 @@ + + + + + + + + + + + + + + + + + + + + + + Roadmap - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Roadmap

+
+

Feature Roadmap +

+
Feature Roadmap
+
+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file diff --git a/vision/vision-and-goal/index.html b/vision/vision-and-goal/index.html new file mode 100644 index 00000000..cea6dec0 --- /dev/null +++ b/vision/vision-and-goal/index.html @@ -0,0 +1,837 @@ + + + + + + + + + + + + + + + + + + + + + + Vision - HelloDATA BE Docs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + +
+ + +
+ +
+ + + + + + + + + +
+
+ + + +
+
+
+ + + + + + + + +
+
+
+ + + + +
+
+ + + + + +

Our Vision and Goal

+
+

The Open-Source Enterprise Data Platform in a Single Portal

+
+

HelloDATA BE is an enterprise data platform built on top of open-source tools. We use state-of-the-art components such as dbt for data modeling with SQL, Airflow to run and orchestrate tasks, and Superset to visualize BI dashboards. The underlying database is Postgres.

+

Vision

+

In a fast-moving data engineering world, where every device and entity becomes a data generator, the need for agile, robust, and transparent data platforms is more crucial than ever. HelloDATA BE is not just any data platform; it's the bridge between open-source innovation and the demanding reliability of enterprise solutions.

+

HelloDATA BE has handpicked the best tools, such as dbt, Airflow, Superset, and Postgres, and integrated them into a seamless, enterprise-ready data solution, empowering businesses with the agility of open source and the dependability of a tested, unified platform.

+

The Goal of HelloDATA BE

+

Our goal at HelloDATA BE is clear: to democratize the power of data for enterprises.

+

As digital transformation progresses and data volumes grow, the challenges of juggling various SaaS solutions, vendor lock-in, and fragmented data sources become apparent.

+

HelloDATA BE tries to provide an answer to these challenges. We aim to merge the world's best open-source tools, refining them to enterprise standards and ensuring that every organization, irrespective of size or niche, has access to top-tier data solutions. By fostering a community-driven approach through our open-source commitment, we envision a data future that's inclusive, robust, and open to innovation.

+ + + + + + +
+
+ + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + \ No newline at end of file