Enterprise Data Management, powered by Metadata!
Data is a key asset for enterprises, yet finding the right data on the right systems can be elusive. AdaptiveScale lets you leverage the metadata from your data sources to discover and organize data. It catalogs all the data sources and files a user wants to discover and centralizes that metadata in a single location. Catalog generation with tagging and labeling is supported down to the column level, and catalog triggers can be scheduled on a customizable basis. Scheduling is an important part of the cataloging process: as data is mutated in the respective data sources, something needs to track what changes are taking place and capture that lineage. All your metadata is searchable, powered by ElasticSearch. Users can also query and browse any JDBC-compliant database in the SQL Explorer. You can also use your catalog as a curated source for transferring data, with associated filtering and governance policies applied, to cloud storage or Hadoop HDFS. Cloud storage support includes Amazon S3, Azure Data Lake Storage, and Google Cloud Storage.
The dashboard provides general statistics: how many data sources the user has crawled, how many catalogs have been created, and how many tags or labels have been created. The AdaptiveScale JDBC driver, which you can download directly from the dashboard, allows you to connect to AdaptiveScale from your favorite JDBC-compliant development environment and browse and query cataloged data sources with SQL. To help users get more familiar with AdaptiveScale, there are links to User Guides, Tutorials, and Documentation. The user interface is organized into the following categories: Data Sources, Catalogs, Tags, Schedules, Schema Evolution, Search, SQL Explorer, and Data Transfer.
Data Sources is the category where the user defines a data source connection. There are six options to choose from: a JDBC driver, a file, file transfer protocol (FTP), Google Bucket, Amazon S3, and Microsoft Azure. The supported database servers are MySQL, MS SQL Server, Postgres, Oracle, BigQuery, and a Generic driver. The Generic driver expands the list to any JDBC-compliant database, so you can connect to any database that provides a JDBC driver. To create a new data source connection, fill in the connection name and description, the host and port of the server, the database name, and a username and password; then define a driver class name and upload a JDBC driver. Once saved, the new connection appears in the list of data source connections, where you have the option to edit or delete it. For a new file or FTP data source connection, fill in the connection name, choose the URI of the file, and choose CSV or AVRO from the dropdown, matching the type of the file. The same applies to the remaining connection types, with some small differences: for a Google Bucket data source connection you must add the Service Account, for Amazon S3 you must provide the Access Key ID and Secret Access Key, and for Microsoft Azure you must provide the Account Name, Container Name, and SAS Token.
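To make the connection fields above concrete, here is a minimal sketch of how they compose into a standard JDBC connection URL. The field values, URL template, and helper name are illustrative assumptions, not AdaptiveScale's actual API.

```python
# Hypothetical sketch: assembling a JDBC URL from the connection fields
# described above (host, port, database name, driver class).

def build_jdbc_url(dialect: str, host: str, port: int, database: str) -> str:
    """Compose a conventional JDBC connection URL for a SQL database."""
    return f"jdbc:{dialect}://{host}:{port}/{database}"

connection = {
    "name": "sales-db",                       # connection name (example)
    "url": build_jdbc_url("postgresql", "db.example.com", 5432, "sales"),
    "driver_class": "org.postgresql.Driver",  # driver class name field
    "username": "analyst",                    # password omitted here
}
print(connection["url"])  # jdbc:postgresql://db.example.com:5432/sales
```

The driver class name and uploaded JDBC driver are what let the Generic option reach any JDBC-compliant database.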
Catalogs is where you create a catalog based on the metadata of a data source. Tagging and labeling are supported during catalog generation. To create a new catalog, fill in the catalog name, which must be unique to prevent naming collisions, and choose the data source from which all tables and columns will be retrieved. In the Details section you have the option to fetch the total record counts within the data source. Record counts are fetched every time a scheduled catalog is triggered and are automatically included in the catalog metadata. There are two important features inside the Catalogs category: one for tagging and labeling, and one for scheduling, where you can select the time interval at which the catalog crawler should run. Once saved, the new catalog appears in the list of catalogs along with the statistics calculated in the previous step, including how many tables and columns were crawled, the last run, the next run, and options to edit, delete, and run the catalog. With the run option, the crawler retrieves all the information from the data source and applies the catalog rules you set.
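As a rough illustration of what a catalog entry carries, the sketch below models column-level metadata with tags and record counts, then derives the table/column statistics shown in the catalog list. The structure is an assumption for illustration, not AdaptiveScale's real schema.

```python
# Illustrative (assumed) shape of catalog metadata: tables, columns,
# per-column tags, and record counts fetched on each scheduled run.
catalog = {
    "name": "sales_catalog",
    "data_source": "sales-db",
    "tables": {
        "customers": {
            "record_count": 12040,
            "columns": {
                "id": {"type": "INTEGER", "tags": ["pk"]},
                "email": {"type": "VARCHAR", "tags": ["pii"]},
            },
        },
        "orders": {
            "record_count": 98213,
            "columns": {
                "id": {"type": "INTEGER", "tags": ["pk"]},
                "total": {"type": "DECIMAL", "tags": []},
            },
        },
    },
}

# Statistics like those in the catalog list: tables and columns crawled.
table_count = len(catalog["tables"])
column_count = sum(len(t["columns"]) for t in catalog["tables"].values())
print(table_count, column_count)  # 2 4
```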
Schedules lists all active and inactive crawlers defined for catalogs, with information such as the catalog name, last run, and next scheduled run, along with options to activate, deactivate, or delete a schedule.
Schema Evolution allows you to choose a catalog and see the changes that have been made to that catalog over time. The lineage is clearly visible; for example, when the user selects a date, any column or table that changed is highlighted.
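Conceptually, the highlighting above comes down to diffing two crawled snapshots of a schema. The sketch below shows one way to compute added, removed, and changed columns between two dates; the snapshots are made-up examples, and AdaptiveScale derives this from its own catalog metadata.

```python
# Two assumed snapshots of a table's columns, crawled on different dates.
old_columns = {"id": "INTEGER", "email": "VARCHAR"}
new_columns = {"id": "INTEGER", "email": "TEXT", "created_at": "TIMESTAMP"}

# Columns present only in the newer snapshot.
added = sorted(set(new_columns) - set(old_columns))
# Columns that disappeared since the older snapshot.
removed = sorted(set(old_columns) - set(new_columns))
# Columns present in both but with a different type.
changed = sorted(c for c in old_columns.keys() & new_columns.keys()
                 if old_columns[c] != new_columns[c])

print(added)    # ['created_at']
print(removed)  # []
print(changed)  # ['email']
```

Columns appearing in `added` or `changed` are the ones a schema-evolution view would highlight for that date.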
The Search screen allows you to search by tags, field names, table names, data source names, and more. Filters accept strings and wildcards:
tag: <tag name (string)>
key: <key name (string)>
table: <table name>
column: <field name>
data source: <data source name>
If, for example, you search for a tag name, the search retrieves that tag along with how many matches it finds. The user can also see the data lineage from the Search section by clicking on a column.
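To illustrate how the filter-and-wildcard syntax above might behave, here is a small stand-in matcher over metadata records using shell-style wildcards. The parsing, field names, and sample records are assumptions for illustration; the real search is backed by ElasticSearch.

```python
from fnmatch import fnmatch  # shell-style wildcard matching, e.g. "cust*"

# Assumed sample of searchable metadata records.
metadata = [
    {"column": "customer_id", "table": "orders", "tag": "pii"},
    {"column": "total", "table": "orders", "tag": ""},
]

def search(query: str, records):
    """Parse 'field: pattern' and return records whose field matches."""
    field, _, pattern = query.partition(":")
    field, pattern = field.strip(), pattern.strip()
    return [r for r in records if fnmatch(r.get(field, ""), pattern)]

print(len(search("column: cust*", metadata)))  # 1 match
print(len(search("table: orders", metadata)))  # 2 matches
```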
SQL Explorer allows you to choose a JDBC data source connection, view the schema, and query or browse any JDBC-compliant database with SQL.
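The flow the SQL Explorer follows (connect, browse the schema, run ad-hoc SQL) can be sketched as below. Python's built-in sqlite3 is used here as a stand-in for a JDBC data source, since the steps are the same; the table and data are made up.

```python
import sqlite3

# Connect to an in-memory database (standing in for a JDBC connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")

# Browse the schema, as the SQL Explorer's schema view would.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['customers']

# Run an ad-hoc query against the selected data source.
rows = conn.execute("SELECT id, email FROM customers").fetchall()
print(rows)  # [(1, 'a@example.com')]
conn.close()
```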
Data Transfer is used to export data from a selected catalog to cloud storage or HDFS; supported cloud targets include Google Cloud Storage, Amazon S3, and Azure Data Lake Storage.
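As a rough sketch of a transfer with a filtering policy applied, the example below drops rows failing an assumed governance filter before writing CSV output. Writing to an in-memory buffer stands in for the cloud and HDFS targets; the rows and the filter rule are made up for illustration.

```python
import csv
import io

# Assumed source rows pulled from a cataloged data source.
rows = [
    {"id": 1, "country": "US", "total": 40},
    {"id": 2, "country": "DE", "total": 75},
]

# Assumed filtering/governance policy: only export US rows.
filtered = [r for r in rows if r["country"] == "US"]

# In real use this stream would be destined for S3, GCS, ADLS, or HDFS.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "country", "total"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(filtered)
print(buffer.getvalue().strip().splitlines())
# ['id,country,total', '1,US,40']
```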