Which steps are recommended best practices for prioritizing cluster keys in Snowflake? (Choose two.)
A. Choose columns that are frequently used in join predicates.
B. Choose lower cardinality columns to support clustering keys and cost effectiveness.
C. Choose TIMESTAMP columns with nanoseconds for the highest number of unique rows.
D. Choose cluster columns that are most actively used in selective filters.
E. Choose cluster columns that are actively used in the GROUP BY clauses.
The Answer Is:
A, D
Explanation:
According to the Snowflake documentation, the best practices for choosing clustering keys are:
Choose columns that are frequently used in join predicates. This can improve the join performance by reducing the number of micro-partitions that need to be scanned and joined.
Choose columns that are most actively used in selective filters. This can improve the scan efficiency by skipping micro-partitions that do not match the filter predicates.
Avoid using low cardinality columns, such as gender or country, as clustering keys. This can result in poor clustering and high maintenance costs.
Avoid using TIMESTAMP columns with nanoseconds, as they tend to have very high cardinality and low correlation with other columns. This can also result in poor clustering and high maintenance costs.
Avoid using columns with duplicate values or NULLs, as they can cause skew in the clustering and reduce the benefits of pruning.
Cluster on multiple columns if the queries use multiple filters or join predicates. This can increase the chances of pruning more micro-partitions and improve the compression ratio.
Clustering is not always useful, especially for small or medium-sized tables, or tables that are not frequently queried or updated. Clustering can incur additional costs for initially clustering the data and maintaining the clustering over time.
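To make this concrete, here is a minimal sketch (the table and column names are hypothetical, not from the documentation) of defining a clustering key on the columns most often used in selective filters and join predicates:
-- Cluster a large fact table on a selective filter column (order_date)
-- and a frequently joined column (customer_id).
CREATE OR REPLACE TABLE sales_fact (
    order_id    NUMBER,
    customer_id NUMBER,
    order_date  DATE,
    amount      NUMBER(12,2)
)
CLUSTER BY (order_date, customer_id);

-- The clustering key of an existing table can be changed with ALTER TABLE.
ALTER TABLE sales_fact CLUSTER BY (order_date, customer_id);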
References:
Clustering Keys & Clustered Tables | Snowflake Documentation
Considerations for Choosing Clustering for a Table | Snowflake Documentation
Which data models can be used when modeling tables in a Snowflake environment? (Select THREE).
A. Graph model
B. Dimensional/Kimball
C. Data lake
D. Inmon/3NF
E. Bayesian hierarchical model
F. Data vault
The Answer Is:
B, D, F
Explanation:
Snowflake is a cloud data platform that supports various data models for modeling tables in a Snowflake environment. The data models can be classified into two categories: dimensional and normalized. Dimensional data models are designed to optimize query performance and ease of use for business intelligence and analytics. Normalized data models are designed to reduce data redundancy and ensure data integrity for transactional and operational systems. The following are some of the data models that can be used in Snowflake:
Dimensional/Kimball: This is a popular dimensional data model that uses a star or snowflake schema to organize data into fact and dimension tables. Fact tables store quantitative measures and foreign keys to dimension tables. Dimension tables store descriptive attributes and hierarchies. A star schema has a single denormalized dimension table for each dimension, while a snowflake schema has multiple normalized dimension tables for each dimension. Snowflake supports both star and snowflake schemas, and allows users to create views and joins to simplify queries.
Inmon/3NF: This is a common normalized data model that uses a third normal form (3NF) schema to organize data into entities and relationships. 3NF schema eliminates data duplication and ensures data consistency by applying three rules: 1) every column in a table must depend on the primary key, 2) every column in a table must depend on the whole primary key, not a part of it, and 3) every column in a table must depend only on the primary key, not on other columns. Snowflake supports 3NF schema and allows users to create referential integrity constraints and foreign key relationships to enforce data quality.
Data vault: This is a hybrid data model that combines the best practices of dimensional and normalized data models to create a scalable, flexible, and resilient data warehouse. Data vault schema consists of three types of tables: hubs, links, and satellites. Hubs store business keys and metadata for each entity. Links store associations and relationships between entities. Satellites store descriptive attributes and historical changes for each entity or relationship. Snowflake supports data vault schema and allows users to leverage its features such as time travel, zero-copy cloning, and secure data sharing to implement data vault methodology.
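As a minimal sketch of a dimensional (Kimball-style) model in Snowflake (all table and column names are hypothetical), a dimension table and a fact table referencing it could look like this:
-- Dimension table: descriptive attributes keyed by a surrogate key.
CREATE TABLE dim_customer (
    customer_key  NUMBER PRIMARY KEY,
    customer_name STRING,
    country       STRING
);

-- Fact table: quantitative measures plus a foreign key to the dimension.
-- Snowflake records PRIMARY KEY and FOREIGN KEY constraints but does not
-- enforce them (only NOT NULL is enforced).
CREATE TABLE fact_sales (
    sale_id      NUMBER,
    customer_key NUMBER REFERENCES dim_customer (customer_key),
    sale_date    DATE,
    amount       NUMBER(12,2)
);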
References: What is Data Modeling? | Snowflake, Snowflake Schema in Data Warehouse Model - GeeksforGeeks, Data Vault 2.0 Modeling with Snowflake
Which feature provides the capability to define an alternate cluster key for a table with an existing cluster key?
A. External table
B. Materialized view
C. Search optimization
D. Result cache
The Answer Is:
B
Explanation:
A materialized view provides the capability to define an alternate cluster key for a table with an existing cluster key. A materialized view is a pre-computed result set that is stored in Snowflake and can be queried like a regular table, and it can be defined with a different cluster key than its base table, which improves pruning and query performance for workloads that filter on that alternate key. A materialized view can also include aggregations and filters on the base table data. Snowflake maintains materialized views automatically and transparently, so the view stays consistent with the base table as the underlying data changes1.
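As an illustrative sketch (object names are hypothetical), the base table can be clustered on one key while a materialized view over it defines an alternate cluster key:
-- Base table clustered on order_date.
CREATE OR REPLACE TABLE orders (
    order_id    NUMBER,
    customer_id NUMBER,
    order_date  DATE,
    amount      NUMBER(12,2)
)
CLUSTER BY (order_date);

-- Materialized view over the same data, clustered on customer_id instead,
-- so queries filtering on customer_id can prune effectively.
CREATE OR REPLACE MATERIALIZED VIEW orders_by_customer
CLUSTER BY (customer_id)
AS SELECT order_id, customer_id, order_date, amount FROM orders;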
References:
Materialized Views | Snowflake Documentation
What is the MOST efficient way to design an environment where data retention is not considered critical, and customization needs are to be kept to a minimum?
A. Use a transient database.
B. Use a transient schema.
C. Use a transient table.
D. Use a temporary table.
The Answer Is:
A
Explanation:
Transient databases in Snowflake are designed for situations where data retention is not critical: every schema and table created in a transient database is transient by default, so there is no Fail-safe period and data is not recoverable after the Time Travel retention period ends. Using a transient database is the most efficient choice here because a single database-level setting minimizes storage costs and keeps customization to a minimum, while still providing most of the functionality of a standard database without the data protection overhead that is unnecessary when retention is not a concern.
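A minimal sketch (the database name and retention setting are hypothetical):
-- Every schema and table created in a transient database is transient by
-- default: no Fail-safe period, and Time Travel limited to 0 or 1 day.
CREATE TRANSIENT DATABASE dev_sandbox
  DATA_RETENTION_TIME_IN_DAYS = 0;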
Based on the Snowflake object hierarchy, what securable objects belong directly to a Snowflake account? (Select THREE).
A. Database
B. Schema
C. Table
D. Stage
E. Role
F. Warehouse
The Answer Is:
A, E, F
Explanation:
A securable object is an entity to which access can be granted in Snowflake. Securable objects include databases, schemas, tables, views, stages, pipes, functions, procedures, sequences, tasks, streams, roles, warehouses, and shares1.
The Snowflake object hierarchy is a logical structure that organizes the securable objects in a nested manner. The top-most container is the account, which contains all the databases, roles, and warehouses for the customer organization. Each database contains schemas, which in turn contain tables, views, stages, pipes, functions, procedures, sequences, tasks, and streams. Each role can be granted privileges on other roles or securable objects. Each warehouse can be used to execute queries on securable objects2.
Based on the Snowflake object hierarchy, the securable objects that belong directly to a Snowflake account are databases, roles, and warehouses. These objects are created and managed at the account level, and do not depend on any other securable object. The other options are not correct because:
Schemas belong to databases, not to accounts. A schema must be created within an existing database3.
Tables belong to schemas, not to accounts. A table must be created within an existing schema4.
Stages belong to schemas, not to accounts. A named stage must be created within an existing schema (each table also has an implicit table stage that is created automatically with the table).
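To make the hierarchy concrete, here is a minimal sketch (object names are hypothetical) of account-level objects versus objects nested inside a database:
-- Account-level securable objects.
CREATE DATABASE edw;
CREATE ROLE analyst_role;
CREATE WAREHOUSE analytics_wh WITH WAREHOUSE_SIZE = 'XSMALL';

-- Objects nested below the account: a schema inside the database, then a
-- table and a named stage inside the schema.
CREATE SCHEMA edw.reporting;
CREATE TABLE edw.reporting.daily_sales (sale_date DATE, amount NUMBER);
CREATE STAGE edw.reporting.load_stage;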
References:
1: Overview of Access Control | Snowflake Documentation
2: Securable Objects | Snowflake Documentation
3: CREATE SCHEMA | Snowflake Documentation
4: CREATE TABLE | Snowflake Documentation
5: CREATE STAGE | Snowflake Documentation
A data share exists between a data provider account and a data consumer account. Five tables from the provider account are being shared with the consumer account. The consumer role has been granted the IMPORTED PRIVILEGES privilege.
What will happen to the consumer account if a new table (table_6) is added to the provider schema?
A. The consumer role will automatically see the new table and no additional grants are needed.
B. The consumer role will see the table only after this grant is given on the consumer side:
grant imported privileges on database PSHARE_EDW_4TEST_DB to DEV_ROLE;
C. The consumer role will see the table only after this grant is given on the provider side:
use role accountadmin;
Grant select on table EDW.ACCOUNTING.Table_6 to share PSHARE_EDW_4TEST;
D. The consumer role will see the table only after this grant is given on the provider side:
use role accountadmin;
grant usage on database EDW to share PSHARE_EDW_4TEST;
grant usage on schema EDW.ACCOUNTING to share PSHARE_EDW_4TEST;
Grant select on table EDW.ACCOUNTING.Table_6 to database PSHARE_EDW_4TEST_DB;
The Answer Is:
D
Explanation:
When a new table (table_6) is added to a schema in the provider's account that is part of a data share, the consumer will not automatically see it. The consumer can access the new table only after the provider grants the appropriate privileges. The correct process, as outlined in option D, is to use the provider's ACCOUNTADMIN role to grant USAGE on the database and schema to the share and then grant SELECT on the new table, so that the consumer account can access the table under the established data sharing setup.
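A sketch of the provider-side pattern described above (the object and share names follow the question; the exact statements needed depend on which grants already exist in the environment):
use role accountadmin;
-- Ensure the share can see the containing database and schema, then expose
-- the new table to the share.
grant usage on database EDW to share PSHARE_EDW_4TEST;
grant usage on schema EDW.ACCOUNTING to share PSHARE_EDW_4TEST;
grant select on table EDW.ACCOUNTING.Table_6 to share PSHARE_EDW_4TEST;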
References:
Snowflake Documentation on Managing Access Control
Snowflake Documentation on Data Sharing
Which of the following objects can be cloned in Snowflake? (Select THREE).
A. Permanent table
B. Transient table
C. Temporary table
D. External tables
E. Internal stages
The Answer Is:
A, B, D
Explanation:
Snowflake supports cloning of various objects, such as databases, schemas, tables, streams, stages, file formats, sequences, and tasks. Cloning creates a copy of an existing object without physically duplicating the underlying data; only metadata is written for the new object. Cloning is therefore also known as zero-copy cloning1.
Among the objects listed in the question, the following ones can be cloned in Snowflake:
Permanent table: A permanent table is a type of table that has a Fail-safe period and a Time Travel retention period of up to 90 days. A permanent table can be cloned using the CREATE TABLE … CLONE command2. Therefore, option A is correct.
Transient table: A transient table is a type of table that does not have a Fail-safe period and can have a Time Travel retention period of either 0 or 1 day. A transient table can also be cloned using the CREATE TABLE … CLONE command2. Therefore, option B is correct.
External table: An external table is a type of table that references data files stored in an external location, such as Amazon S3, Google Cloud Storage, or Microsoft Azure Blob Storage. An external table can be cloned using the CREATE EXTERNAL TABLE … CLONE command3. Therefore, option D is correct.
The following objects listed in the question cannot be cloned in Snowflake:
Temporary table: A temporary table is a type of table that is automatically dropped when the session ends or the current user logs out. Temporary tables do not support cloning4. Therefore, option C is incorrect.
Internal stage: An internal stage is a type of stage that is managed by Snowflake and stores files in Snowflake’s internal cloud storage. Internal stages do not support cloning5. Therefore, option E is incorrect.
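A minimal sketch of zero-copy cloning for permanent and transient tables (table names are hypothetical):
-- Clone a permanent table (zero-copy: only metadata is written).
CREATE TABLE sales_clone CLONE sales;

-- Clone a transient table.
CREATE TRANSIENT TABLE staging_clone CLONE staging;

-- Time Travel can be combined with cloning, e.g. the table as it was 24 hours ago.
CREATE TABLE sales_yesterday CLONE sales AT (OFFSET => -86400);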
References:
Cloning Considerations
CREATE TABLE … CLONE
CREATE EXTERNAL TABLE … CLONE
Temporary Tables
Internal Stages
When loading data from a stage using COPY INTO, what options can you specify for the ON_ERROR clause? (Choose three.)
A. CONTINUE
B. SKIP_FILE
C. ABORT_STATEMENT
D. FAIL
The Answer Is:
A, B, C
Explanation:
The ON_ERROR clause is an optional parameter for the COPY INTO command that specifies the behavior of the command when it encounters errors in the files. The ON_ERROR clause can have one of the following values1:
CONTINUE: This value instructs the command to continue loading the file and return an error message for a maximum of one error encountered per data file. The difference between the ROWS_PARSED and ROWS_LOADED column values represents the number of rows that include detected errors. To view all errors in the data files, use the VALIDATION_MODE parameter or query the VALIDATE function1.
SKIP_FILE: This value instructs the command to skip the file when it encounters a data error on any of the records in the file. The command moves on to the next file in the stage and continues loading. The skipped file is not loaded and no error message is returned for the file1.
ABORT_STATEMENT: This value instructs the command to stop loading data when the first error is encountered. The command returns an error message for the file and aborts the load operation. This is the default value for the ON_ERROR clause1.
Therefore, options A, B, and C are correct.
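A minimal sketch (the table, stage, and file format settings are hypothetical):
-- Skip any file that contains an error instead of aborting the whole load.
COPY INTO sales
  FROM @my_stage/sales/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'SKIP_FILE';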
References:
COPY INTO | Snowflake Documentation