Exam Code : DCDEP
Exam Name : Databricks Certified Data Engineer Professional
Vendor Name : Databricks
https://killexams.com/pass4sure/exam-detail/DCDEP
A Delta Live Tables pipeline can be scheduled to run in two different modes. What are these two modes?
Triggered, Incremental
Once, Continuous
Triggered, Continuous
Once, Incremental
Continuous, Incremental
Explanation:
The answer is Triggered, Continuous
https://docs.microsoft.com/en-us/azure/databricks/data-engineering/delta-live-tables/delta-live-tables-concepts#--continuous-and-triggered-pipelines
• Triggered pipelines update each table with whatever data is currently available and then stop the cluster running the pipeline. Delta Live Tables automatically analyzes the dependencies between your tables and starts by computing those that read from external sources. Tables within the pipeline are updated after their dependent data sources have been updated.
• Continuous pipelines update tables continuously as input data changes. Once an update is started, it continues to run until manually stopped. Continuous pipelines require an always-running cluster but ensure that downstream consumers have the most up-to-date data.
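The difference between the two modes can be sketched as a pipeline settings payload, where a boolean `continuous` field selects the mode. This is a minimal sketch, not a full pipeline definition: the pipeline name and notebook path are hypothetical, and a real pipeline would normally carry more settings (target schema, cluster configuration, and so on).

```python
import json


def pipeline_settings(name: str, notebook_path: str, continuous: bool) -> dict:
    """Build a minimal Delta Live Tables pipeline settings payload.

    The name and notebook path are illustrative placeholders.
    """
    return {
        "name": name,
        "libraries": [{"notebook": {"path": notebook_path}}],
        # False -> triggered: process the data currently available, then stop.
        # True  -> continuous: keep running and update tables as data arrives.
        "continuous": continuous,
    }


triggered = pipeline_settings("sales_pipeline", "/Repos/demo/sales_dlt", continuous=False)
always_on = pipeline_settings("sales_pipeline", "/Repos/demo/sales_dlt", continuous=True)

print(json.dumps(triggered, indent=2))
```

Only the `continuous` flag differs between the two payloads; the table definitions themselves stay the same in either mode.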
Which of the following developer operations in a CI/CD flow can be implemented in Databricks Repos?
Merge when code is committed
Pull request and review process
Trigger Databricks Repos API to pull the latest version of code into production folder
Resolve merge conflicts
Delete a branch
Explanation:
See the diagram below to understand the roles that Databricks Repos and the Git provider play when building a CI/CD workflow.
All the steps highlighted in yellow can be done in Databricks Repos; all the steps highlighted in gray are done in a Git provider such as GitHub or Azure DevOps.
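The option "Trigger Databricks Repos API to pull the latest version of code into production folder" can be illustrated with a request against the Repos API update endpoint. This is a sketch under stated assumptions: the workspace host, token, and repo ID below are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_repo_update_request(
    host: str, token: str, repo_id: int, branch: str
) -> urllib.request.Request:
    """Build a PATCH request asking a Databricks repo to check out the
    latest commit of `branch`. Host, token, and repo_id are placeholders."""
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/repos/{repo_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_repo_update_request(
    "https://example.cloud.databricks.com",  # hypothetical workspace URL
    "dapi-EXAMPLE",                          # hypothetical access token
    123,                                     # hypothetical repo ID
    "main",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

In a CI/CD setup, a step like this would typically run from the Git provider's pipeline after a merge, so the production folder in the workspace follows the main branch.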
Identify which one of the statements below can query a Delta table using the PySpark DataFrame API.
spark.read.mode("delta").table("table_name")
spark.read.table.delta("table_name")
spark.read.table("table_name")
spark.read.format("delta").LoadTableAs("table_name")
spark.read.format("delta").TableAs("table_name")
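Of the options above, only the `read.table("table_name")` form matches the PySpark `DataFrameReader` API; `mode` belongs to the writer, and `LoadTableAs` and `TableAs` are not PySpark methods. A minimal sketch follows, assuming an existing `SparkSession` is passed in as `spark` (as it is predefined in a Databricks notebook); the helper name is hypothetical.

```python
def read_delta_table(spark, table_name: str):
    """Return a DataFrame for a table registered in the metastore.

    `spark` is assumed to be an existing SparkSession. No format needs to
    be specified: the table's format is taken from its metastore entry.
    """
    return spark.read.table(table_name)
```

In a notebook, `read_delta_table(spark, "table_name").show()` would display the rows. For a Delta table addressed by storage path rather than by name, the reader call is `spark.read.format("delta").load(path)` instead.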
How can the VACUUM and OPTIMIZE commands be used to manage a Delta Lake table?
The VACUUM command can be used to compact small Parquet files, and the OPTIMIZE command can be used to delete Parquet files that are marked for deletion or unused.
The VACUUM command can be used to delete empty or blank Parquet files in a Delta table. The OPTIMIZE command can be used to update stale statistics on a Delta table.
The VACUUM command can be used to compress the Parquet files to reduce the size of the table, and the OPTIMIZE command can be used to cache frequently used Delta tables for better performance.
The VACUUM command can be used to delete empty or blank Parquet files in a Delta table, and the OPTIMIZE command can be used to cache frequently used Delta tables for better performance.
The OPTIMIZE command can be used to compact small Parquet files, and the VACUUM command can be used to delete Parquet files that are marked for deletion or unused. (Correct)
VACUUM:
You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retention threshold for the files is 7 days. To change this behavior, see Configure data retention for time travel.
OPTIMIZE:
Using OPTIMIZE, you can compact data files on Delta Lake, which can improve the speed of read queries on the table. Too many small files can significantly degrade query performance.
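The two maintenance commands described above can be sketched as SQL statements generated from Python. The table name `sales` is a hypothetical placeholder, and the 168-hour retention simply spells out the 7-day default mentioned above.

```python
def optimize_sql(table: str) -> str:
    """SQL that compacts small data files of a Delta table."""
    return f"OPTIMIZE {table}"


def vacuum_sql(table: str, retain_hours: int = 168) -> str:
    """SQL that removes unreferenced files older than the retention
    threshold. 168 hours equals the 7-day default."""
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


statements = [optimize_sql("sales"), vacuum_sql("sales")]
for statement in statements:
    print(statement)
```

With a live session, each statement could be run through `spark.sql(statement)`. Running OPTIMIZE before VACUUM is a common ordering, since compaction leaves the old small files unreferenced and therefore eligible for later cleanup.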
Which of the following statements correctly describes how Delta Lake implements a lakehouse?
Delta Lake uses a proprietary format to write data, optimized for cloud storage
Using Apache Hadoop on cloud object storage
Delta Lake always stores metadata in memory rather than in storage
Delta Lake uses open source, an open format, optimized cloud storage, and scalable metadata
Delta Lake stores data and metadata in the compute's memory
Explanation:
Delta Lake is:
• Open source
• Built on a standard data format
• Optimized for cloud object storage
• Built for scalable metadata handling
Delta Lake is not:
• Proprietary technology
• Storage format
• Storage medium
• Database service or data warehouse
What are the different ways you can schedule a job in a Databricks workspace?
Continuous, Incremental
On-Demand runs, File notification from Cloud object storage
Cron, On-Demand runs
Cron, File notification from Cloud object storage
Once, Continuous
Explanation:
The answer is Cron, On-Demand runs.
A job can be run immediately (on demand) or scheduled using cron syntax.
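The two options can be sketched as a job settings payload with an optional schedule block. This is a minimal sketch under assumptions: the job name, task key, and notebook path are hypothetical, cluster configuration is omitted for brevity, and the schedule uses a Quartz cron expression.

```python
from typing import Optional


def job_settings(
    name: str,
    notebook_path: str,
    cron: Optional[str] = None,
    timezone: str = "UTC",
) -> dict:
    """Build a minimal job settings payload.

    Without `cron`, the job has no schedule and runs only on demand.
    Names and paths are illustrative; cluster settings are omitted.
    """
    settings = {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }
    if cron is not None:
        settings["schedule"] = {
            # Quartz fields: second minute hour day-of-month month day-of-week
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        }
    return settings


on_demand = job_settings("nightly_load", "/Repos/demo/load")
scheduled = job_settings("nightly_load", "/Repos/demo/load", cron="0 0 6 * * ?")
print(scheduled["schedule"])
```

Here `"0 0 6 * * ?"` means every day at 06:00 in the given time zone; the `on_demand` payload has no `schedule` key and would run only when started manually or through an API call.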
Which of the following types of tasks cannot be set up through a job?
Notebook
Delta Live Tables pipeline
Spark Submit
Python
Databricks SQL Dashboard refresh
Which of the following describes how Databricks Repos can help facilitate CI/CD workflows on the Databricks Lakehouse Platform?
Databricks Repos can facilitate the pull request, review, and approval process before merging branches
Databricks Repos can merge changes from a secondary Git branch into a main Git branch
Databricks Repos can be used to design, develop, and trigger Git automation pipelines
Databricks Repos can store the single-source-of-truth Git repository
Databricks Repos can commit or push code changes to trigger a CI/CD process
Explanation:
The answer is: Databricks Repos can commit or push code changes to trigger a CI/CD process. See the diagram below to understand the roles that Databricks Repos and the Git provider play when building a CI/CD workflow.
All the steps highlighted in yellow can be done in Databricks Repos; all the steps highlighted in gray are done in a Git provider such as GitHub or Azure DevOps.
[Diagram: CI/CD workflow steps in Databricks Repos (yellow) and in the Git provider (gray)]