Latest DCDEP Practice Tests with Actual Questions

Get Complete pool of questions with Premium PDF and Test Engine

Exam Code : DCDEP
Exam Name : Databricks Certified Data Engineer Professional
Vendor Name : Databricks









https://killexams.com/pass4sure/exam-detail/DCDEP


Question: 21


A Delta Live Tables pipeline can be scheduled to run in two different modes. What are these two modes?


  1. Triggered, Incremental

  2. Once, Continuous

  3. Triggered, Continuous

  4. Once, Incremental

  5. Continuous, Incremental




Answer: C



Explanation:


The answer is Triggered, Continuous


https://docs.microsoft.com/en-us/azure/databricks/data-engineering/delta-live-tables/delta-live-tables-concepts#--continuous-and-triggered-pipelines


• Triggered pipelines update each table with whatever data is currently available and then stop the cluster running the pipeline. Delta Live Tables automatically analyzes the dependencies between your tables and starts by computing those that read from external sources. Tables within the pipeline are updated after their dependent data sources have been updated.


• Continuous pipelines update tables continuously as input data changes. Once an update is started, it continues to run until manually stopped. Continuous pipelines require an always-running cluster but ensure that downstream consumers have the most up-to-date data.
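

As a rough illustration of the two modes, the sketch below builds a minimal pipeline settings payload in Python. It assumes the `continuous` boolean field of the Delta Live Tables pipeline settings selects the mode; the pipeline names and the notebook path are hypothetical.

```python
import json


def pipeline_settings(name: str, continuous: bool) -> dict:
    """Build a minimal Delta Live Tables pipeline settings payload.

    continuous=False requests a triggered pipeline; continuous=True
    requests a continuous one. The notebook path is a hypothetical example.
    """
    return {
        "name": name,
        "continuous": continuous,
        "libraries": [{"notebook": {"path": "/Repos/example/dlt_pipeline"}}],
    }


# Hypothetical pipelines, one per mode.
triggered = pipeline_settings("nightly_sales", continuous=False)
continuous = pipeline_settings("live_sales", continuous=True)

print(json.dumps(triggered, indent=2))
```

A triggered pipeline built this way would process the available data and stop, while the continuous one would keep its cluster running until it is stopped manually.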



Question: 22


Which of the following developer operations in a CI/CD flow can be implemented in Databricks Repos?


  1. Merge when code is committed

  2. Pull request and review process

  3. Trigger Databricks Repos API to pull the latest version of code into production folder

  4. Resolve merge conflicts

  5. Delete a branch




Answer: C



Explanation:


The answer is: trigger the Databricks Repos API to pull the latest version of code into the production folder.


See the diagram below to understand the roles that Databricks Repos and the Git provider play when building a CI/CD workflow.


All the steps highlighted in yellow can be done in Databricks Repos; all the steps highlighted in gray are done in a Git provider such as GitHub or Azure DevOps.
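

The "trigger the Repos API" step can be sketched as follows. This is a minimal example, assuming the Repos REST endpoint `PATCH /api/2.0/repos/{repo_id}` with a `branch` field in the body; the workspace host, token placeholder, and repo ID are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_repo_update_request(
    host: str, token: str, repo_id: int, branch: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates the given
    Databricks repo to the latest commit of `branch`."""
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/repos/{repo_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical values for illustration only.
request = build_repo_update_request(
    host="https://example.cloud.databricks.com",
    token="<personal-access-token>",
    repo_id=123,
    branch="main",
)
print(request.get_method(), request.full_url)
# A CI job would send it with urllib.request.urlopen(request).
```

In a typical setup, the Git provider's pipeline would run a call like this after a merge, so the production folder in the workspace tracks the merged branch.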


Question: 23


Identify the statement below that can query a Delta table using the PySpark DataFrame API.


  1. spark.read.mode("delta").table("table_name")

  2. spark.read.table.delta("table_name")

  3. spark.read.table("table_name")

  4. spark.read.format("delta").LoadTableAs("table_name")

  5. spark.read.format("delta").TableAs("table_name")




Answer: C
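

A minimal sketch of the correct option in use, assuming `spark` is an active SparkSession (as provided in a Databricks notebook). The helper name and the table name are hypothetical.

```python
def read_delta_table(spark, table_name: str):
    """Return a DataFrame for a table registered in the metastore.

    `spark` is assumed to be an active SparkSession, such as the one a
    Databricks notebook provides.
    """
    return spark.read.table(table_name)


# In a Databricks notebook (hypothetical table name):
#   df = read_delta_table(spark, "sales")
#   df.show()
```

No format needs to be specified here: the table is looked up by name, and its format comes from the table's own metadata.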



Question: 24


How can the VACUUM and OPTIMIZE commands be used to manage Delta Lake?


  1. The VACUUM command can be used to compact small Parquet files, and the OPTIMIZE command can be used to delete Parquet files that are marked for deletion/unused.

  2. The VACUUM command can be used to delete empty/blank Parquet files in a Delta table. The OPTIMIZE command can be used to update stale statistics on a Delta table.

  3. The VACUUM command can be used to compress the Parquet files to reduce the size of the table. The OPTIMIZE command can be used to cache frequently used Delta tables for better performance.

  4. The VACUUM command can be used to delete empty/blank Parquet files in a Delta table. The OPTIMIZE command can be used to cache frequently used Delta tables for better performance.

  5. The OPTIMIZE command can be used to compact small Parquet files, and the VACUUM command can be used to delete Parquet files that are marked for deletion/unused.




Answer: E



Explanation:


VACUUM:


You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retention threshold for the files is 7 days. To change this behavior, see Configure data retention for time travel.


OPTIMIZE:


Using OPTIMIZE, you can compact data files in Delta Lake, which can improve the speed of read queries on the table. Too many small files can significantly degrade query performance.
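

The two commands can be sketched as SQL statements built in Python. This is a minimal illustration, assuming the statements would be passed to `spark.sql(...)` in a notebook; the helper names and the table name `sales` are hypothetical, and the 168-hour value simply restates the 7-day default retention mentioned above.

```python
def optimize_statement(table_name: str) -> str:
    """SQL that compacts small data files of a Delta table."""
    return f"OPTIMIZE {table_name}"


def vacuum_statement(table_name: str, retain_hours: int = 7 * 24) -> str:
    """SQL that removes unreferenced files older than the retention window."""
    return f"VACUUM {table_name} RETAIN {retain_hours} HOURS"


# Hypothetical table name; compact first, then clean up unreferenced files.
statements = [optimize_statement("sales"), vacuum_statement("sales")]
for sql in statements:
    print(sql)  # in a notebook: spark.sql(sql)
```

Running OPTIMIZE before VACUUM is a common ordering, since the small files replaced by compaction become unreferenced and are eligible for removal once they pass the retention threshold.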



Question: 25


Which of the following statements is correct about how Delta Lake implements a lakehouse?


  1. Delta Lake uses a proprietary format to write data, optimized for cloud storage

  2. Using Apache Hadoop on cloud object storage

  3. Delta Lake always stores metadata in memory rather than in storage

  4. Delta Lake uses open source, an open format, optimized cloud storage, and scalable metadata

  5. Delta Lake stores data and metadata in compute memory



Answer: D



Explanation:


Delta Lake is:


• Open source

• Built upon a standard data format

• Optimized for cloud object storage

• Built for scalable metadata handling


Delta Lake is not:


• Proprietary technology

• Storage format

• Storage medium

• Database service or data warehouse



Question: 26


What are the different ways you can schedule a job in a Databricks workspace?


  1. Continuous, Incremental

  2. On-Demand runs, File notification from Cloud object storage

  3. Cron, On Demand runs

  4. Cron, File notification from Cloud object storage

  5. Once, Continuous




Answer: C



Explanation:


The answer is Cron, On-Demand runs.


A job can be run immediately (on demand) or scheduled using cron syntax.
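

Both options can be sketched as Jobs API payloads. This is a minimal example, assuming the job settings accept a `schedule` block with a Quartz cron expression and that an on-demand run is requested by job ID through the `run-now` endpoint; the helper names, the job ID, and the 06:00 schedule are hypothetical.

```python
import json


def cron_schedule(quartz_expression: str, timezone_id: str = "UTC") -> dict:
    """Job settings fragment that schedules a job with Quartz cron syntax."""
    return {
        "schedule": {
            "quartz_cron_expression": quartz_expression,
            "timezone_id": timezone_id,
            "pause_status": "UNPAUSED",
        }
    }


def run_now_payload(job_id: int) -> dict:
    """Request body for an on-demand run of an existing job."""
    return {"job_id": job_id}


# Quartz fields: seconds, minutes, hours, day-of-month, month, day-of-week.
daily_at_six = cron_schedule("0 0 6 * * ?")  # every day at 06:00
on_demand = run_now_payload(job_id=42)       # hypothetical job ID

print(json.dumps(daily_at_six, indent=2))
print(json.dumps(on_demand))
```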



Question: 27


Which of the following types of tasks cannot be set up through a job?


  1. Notebook

  2. Delta Live Tables pipeline

  3. Spark Submit

  4. Python

  5. Databricks SQL Dashboard refresh



Answer: E



Question: 28


Which of the following describes how Databricks Repos can help facilitate CI/CD workflows on the Databricks Lakehouse Platform?


  1. Databricks Repos can facilitate the pull request, review, and approval process before merging branches

  2. Databricks Repos can merge changes from a secondary Git branch into a main Git branch

  3. Databricks Repos can be used to design, develop, and trigger Git automation pipelines

  4. Databricks Repos can store the single-source-of-truth Git repository

  5. Databricks Repos can commit or push code changes to trigger a CI/CD process




Answer: E



Explanation:


The answer is: Databricks Repos can commit or push code changes to trigger a CI/CD process. See the diagram below to understand the roles that Databricks Repos and the Git provider play when building a CI/CD workflow.


All the steps highlighted in yellow can be done in Databricks Repos; all the steps highlighted in gray are done in a Git provider such as GitHub or Azure DevOps.

