Title of test: Generative AI Engineer
Description: Databricks Generative AI Engineer Exam



A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the author's web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user's query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations, but now wants to choose the best values more methodically. Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)

A. Change embedding models and compare performance.
B. Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.
C. Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best performance metric.
D. Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.
E. Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.
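For context on the metric-driven option: a minimal sketch of recall@k, assuming a labeled evaluation set that maps each question to the chunk IDs known to contain its answer (retrieve, truth, and questions are hypothetical names):

    def recall_at_k(retrieved_ids, relevant_ids, k=5):
        """Fraction of ground-truth chunks found in the top-k retrieved chunks."""
        top_k = set(retrieved_ids[:k])
        return len(top_k & set(relevant_ids)) / len(relevant_ids)

    # Hypothetical usage: score each chunking strategy on the same labeled questions.
    # for strategy in ["by_paragraph", "by_chapter"]:
    #     scores = [recall_at_k(retrieve(q, strategy), truth[q]) for q in questions]
    #     print(strategy, sum(scores) / len(scores))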
A Generative AI Engineer is designing a RAG application for answering user questions on technical regulations as they learn a new sport. What are the steps needed to build this RAG application and deploy it?

A. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> Evaluate model -> LLM generates a response -> Deploy it using Model Serving.
B. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving.
C. Ingest documents from a source -> Index the documents and save to Vector Search -> Evaluate model -> Deploy it using Model Serving.
D. User submits queries against an LLM -> Ingest documents from a source -> Index the documents and save to Vector Search -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving.

A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries. Which metric should they monitor for their customer service LLM application in production?

A. Number of customer inquiries processed per unit of time.
B. Energy usage per query.
C. Final perplexity scores for the training of the model.
D. HuggingFace Leaderboard values for the base LLM.

A Generative AI Engineer is building a Generative AI system that suggests the best-matched employee team member for newly scoped projects. The team member is selected from a very large team. The match should be based upon project date availability and how well their employee profile matches the project scope. Both the employee profile and project scope are unstructured text. How should the Generative AI Engineer architect their system?

A. Create a tool for finding available team members given project dates. Embed all project scopes into a vector store, then perform a retrieval using team member profiles to find the best team member.
B. Create a tool for finding team member availability given project dates, and another tool that uses an LLM to extract keywords from project scopes. Iterate through available team members' profiles and perform keyword matching to find the best available team member.
C. Create a tool to find available team members given project dates. Create a second tool that can calculate a similarity score for a combination of team member profile and project scope. Iterate through the team members and rank by best score to select a team member.
D. Create a tool for finding available team members given project dates. Embed team profiles into a vector store and use the project scope and filtering to perform retrieval to find the best-matched available team members.

A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The platform provides real-time updates and LLM-generated analyses for any users who would like live summaries, rather than reading a series of potentially outdated news articles. Which tool below will give the platform access to real-time data for generating game analyses based on the latest game scores?

A. DatabricksIQ.
B. Foundation Model APIs.
C. Feature Serving.
D. AutoML.

A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG application and would like to monitor the serving endpoint's incoming requests and outgoing responses. The current approach is to include a micro-service between the endpoint and the user interface that writes logs to a remote server. Which Databricks feature should they use instead to perform the same task?

A. Vector Search.
B. Lakeview.
C. DBSQL.
D. Inference Tables.

A Generative AI Engineer is tasked with improving the RAG quality by addressing its inflammatory outputs. Which action would be most effective in mitigating the problem of offensive text outputs?

A. Increase the frequency of upstream data updates.
B. Inform the user of the expected RAG behavior.
C. Restrict access to the data sources to a limited number of users.
D. Curate upstream data properly, including manual review, before it is fed into the RAG system.

A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from. Which will fulfill their need?

A. Context length 514; smallest model is 0.44GB and embedding dimension 768.
B. Context length 2048; smallest model is 11GB and embedding dimension 2560.
C. Context length 32768; smallest model is 14GB and embedding dimension 4096.
D. Context length 512; smallest model is 0.13GB and embedding dimension 384.

A small and cost-conscious startup in the cancer research field wants to build a RAG application using Foundation Model APIs. Which strategy would allow the startup to build a good-quality RAG application while being cost-conscious and able to cater to customer needs?

A. Limit the number of relevant documents available for the RAG application to retrieve from.
B. Pick a smaller LLM that is domain-specific.
C. Limit the number of queries a customer can send per day.
D. Use the largest LLM possible because that gives the best performance for any general queries.
A Generative AI Engineer is responsible for developing a chatbot to enable their company's internal HelpDesk Call Center team to more quickly find related tickets and provide resolution. While creating the GenAI application work breakdown tasks for this project, they realize they need to start planning which data sources (either Unity Catalog volume or Delta table) they could choose for this application. They have collected several candidate data sources for consideration:

- call_rep_history: a Delta table with primary keys representative_id, call_id. This table is maintained to calculate representatives' call resolution from the fields call_duration and call_start_time.
- transcript Volume: a Unity Catalog Volume of all recordings as *.wav files, along with text transcripts as *.txt files.
- call_cust_history: a Delta table with primary keys customer_id, call_id. This table is maintained to calculate how much internal customers use the HelpDesk, to make sure that the charge-back model is consistent with actual service use.
- call_detail: a Delta table that includes a snapshot of all call details updated hourly. It includes root_cause and resolution fields, but those fields may be empty for calls that are still active.
- maintenance_schedule: a Delta table that includes a listing of both HelpDesk application outages and planned upcoming maintenance downtimes.

They need sources that could add context to best identify ticket root cause and resolution. Which TWO sources do that? (Choose two.)

A. call_cust_history.
B. maintenance_schedule.
C. call_rep_history.
D. call_detail.
E. transcript Volume.

What is the most suitable library for building a multi-step LLM-based workflow?

A. Pandas.
B. TensorFlow.
C. PySpark.
D. LangChain.

When developing an LLM application, it's crucial to ensure that the data used for training the model complies with licensing requirements to avoid legal risks. Which action is NOT appropriate to avoid legal risks?

A. Reach out to the data curators directly before you have started using the trained model to let them know.
B. Use any available data you personally created which is completely original and for which you can decide what license to use.
C. Only use data explicitly labeled with an open license and ensure the license terms are followed.
D. Reach out to the data curators directly after you have started using the trained model to let them know.

A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but is getting an error:

    from langchain.chains import LLMChain
    from langchain_community.llms import OpenAI
    from langchain_core.prompts import PromptTemplate

    prompt_template = "Tell me a {adjective} joke"
    prompt = PromptTemplate(
        input_variables=["adjective"], template=prompt_template
    )
    llm = LLMChain(prompt=prompt)
    llm.generate([{"adjective": "funny"}])

Assuming the API key was properly defined, what change does the Generative AI Engineer need to make to fix their chain?

A.

    prompt_template = "Tell me a {adjective} joke"
    prompt = PromptTemplate(
        input_variables=["adjective"], template=prompt_template
    )
    llm = LLMChain(prompt=prompt)
    llm.generate("funny")

B.

    prompt_template = "Tell me a {adjective} joke"
    prompt = PromptTemplate(
        input_variables=["adjective"], template=prompt_template
    )
    llm = LLMChain(prompt=prompt.format("funny"))
    llm.generate()

C.

    prompt_template = "Tell me a {adjective} joke"
    prompt = PromptTemplate(
        input_variables=["adjective"], template=prompt_template, llm=OpenAI()
    )
    llm = LLMChain(prompt=prompt)
    llm.generate([{"adjective": "funny"}])

D.

    prompt_template = "Tell me a {adjective} joke"
    prompt = PromptTemplate(
        input_variables=["adjective"], template=prompt_template
    )
    llm = LLMChain(llm=OpenAI(), prompt=prompt)
    llm.generate([{"adjective": "funny"}])
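For reference: LLMChain requires both an llm and a prompt argument, which is what the original snippet is missing. A minimal runnable version with the llm supplied (assumes a valid OpenAI API key and the legacy LLMChain API):

    from langchain.chains import LLMChain
    from langchain_community.llms import OpenAI
    from langchain_core.prompts import PromptTemplate

    prompt = PromptTemplate(
        input_variables=["adjective"], template="Tell me a {adjective} joke"
    )
    # LLMChain needs an LLM to call in addition to the prompt template.
    chain = LLMChain(llm=OpenAI(), prompt=prompt)
    chain.generate([{"adjective": "funny"}])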
A Generative AI Engineer is creating an LLM system that will retrieve news articles from the year 1918 related to a user's query and summarize them. The engineer has noticed that the summaries are generated well but often also include an explanation of how the summary was generated, which is undesirable. Which change could the Generative AI Engineer make to mitigate this issue?

A. Split the LLM output by newline characters to truncate away the summarization explanation.
B. Tune the chunk size of news articles or experiment with different embedding models.
C. Revisit their document ingestion logic, ensuring that the news articles are being ingested properly.
D. Provide few-shot examples of the desired output format in the system and/or user prompt.

A Generative AI Engineer has developed an LLM application to answer questions about internal company policies. The Generative AI Engineer must ensure that the application doesn't hallucinate or leak confidential data. Which approach should NOT be used to mitigate hallucination or confidential data leakage?

A. Add guardrails to filter outputs from the LLM before they are shown to the user.
B. Fine-tune the model on your data, hoping it will learn what is appropriate and what is not.
C. Limit the data available based on the user's access level.
D. Use a strong system prompt to ensure the model aligns with your needs.

A Generative AI Engineer interfaces with an LLM with prompt/response behavior that has been trained on customer calls inquiring about product availability. The LLM is designed to output "In Stock" if the product is available, or only the term "Out of Stock" if not. Which prompt will allow the engineer to receive the call classification labels correctly?

A. Respond with "In Stock" if the customer asks for a product.
B. You will be given a customer call transcript where the customer asks about product availability. The outputs are either "In Stock" or "Out of Stock". Format the output in JSON, for example: {"call_id": "123", "label": "In Stock"}.
C. Respond with "Out of Stock" if the customer asks for a product.
D. You will be given a customer call transcript where the customer inquires about product availability. Respond with "In Stock" if the product is available or "Out of Stock" if not.

A Generative AI Engineer is tasked with developing a RAG application that will help a small internal group of experts at their company answer specific questions, augmented by an internal knowledge base. They want the best possible quality in the answers, and neither latency nor throughput is a huge concern given that the user group is small and willing to wait for the best answer. The topics are sensitive in nature, the data is highly confidential, and, due to regulatory requirements, none of the information is allowed to be transmitted to third parties. Which model meets all the Generative AI Engineer's needs in this situation?

A. Dolly 1.5B.
B. OpenAI GPT-4.
C. BGE-large.
D. Llama2-70B.
A Generative AI Engineer would like an LLM to generate formatted JSON from emails. This will require parsing and extracting the following information: order ID, date, and sender email. Here's a sample email:

    Date: April 23, 2024
    Time: 4:22 PM
    From: anjali.thayer@computex.org
    To: cust_service@realtek.com
    Subject: Shipment details

    Hey there, I have a shipment (order ID is CD34RFT), can you please send me an update?
    Thank you, Anjali

They will need to write a prompt that extracts the relevant information in JSON format with the highest level of output accuracy. Which prompt will do that?

A. You will receive customer emails and need to extract date, sender email, and order ID. You should return the date, sender email, and order ID information in JSON format.
B. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format. Here's an example: {"date": "April 16, 2024", "sender_email": "[email protected]", "order_id": "RE987D"}.
C. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in a human-readable format.
D. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.

A Generative AI Engineer has been asked to build an LLM-based question-answering application. The application should take into account new documents that are frequently published. The engineer wants to build this application with the least development effort and have it operate at the lowest cost possible. Which combination of chaining components and configuration meets these requirements?

A. For the application, a prompt, a retriever, and an LLM are required. The retriever output is inserted into the prompt, which is given to the LLM to generate answers.
B. The LLM needs to be frequently fine-tuned with the new documents in order to provide the most up-to-date answers.
C. For the question-answering application, prompt engineering and an LLM are required to generate answers.
D. For the application, a prompt, an agent and a fine-tuned LLM are required. The agent is used by the LLM to retrieve relevant content that is inserted into the prompt, which is given to the LLM to generate answers.

A Generative AI Engineer is creating an agent-based LLM system for their favorite monster truck team. The system can answer text-based questions about the monster truck team, look up event dates via an API call, or query tables on the team's latest standings. How could the Generative AI Engineer best design these capabilities into their system?

A. Ingest PDF documents about the monster truck team into a vector store and query it in a RAG architecture.
B. Write a system prompt for the agent listing available tools, and bundle it into an agent system that runs a number of calls to solve a query.
C. Instruct the LLM to respond with "RAG", "API", or "TABLE" depending on the query, then use text parsing and conditional statements to resolve the query.
D. Build a system prompt with all possible event dates and table information in the system prompt. Use a RAG architecture to look up generic text questions and otherwise leverage the information in the system prompt.
A Generative AI Engineer has been asked to design an LLM-based application that accomplishes the following business objective: answer employee HR questions using HR PDF documentation. Which set of high-level tasks should the Generative AI Engineer's system perform?

A. Calculate averaged embeddings for each HR document and compare embeddings to the user query to find the best document. Pass the best document with the user query into an LLM with a large context window to generate a response to the employee.
B. Use an LLM to summarize HR documentation. Provide summaries of documentation and the user query into an LLM with a large context window to generate a response to the user.
C. Create an interaction matrix of historical employee questions and HR documentation. Use ALS to factorize the matrix and create embeddings. Calculate the embeddings of new queries and use them to find the best HR documentation. Use an LLM to generate a response to the employee question based upon the documentation retrieved.
D. Split HR documentation into chunks and embed into a vector store. Use the employee question to retrieve the best-matched chunks of documentation, and use the LLM to generate a response to the employee based upon the documentation retrieved.

A Generative AI Engineer at an electronics company just deployed a RAG application for customers to ask questions about products that the company carries. However, they received feedback that the RAG response often returns information about an irrelevant product. What can the engineer do to improve the relevance of the RAG's response?

A. Assess the quality of the retrieved context.
B. Implement caching for frequently asked questions.
C. Use a different LLM to improve the generated response.
D. Use a different semantic similarity search algorithm.

A Generative AI Engineer is developing a chatbot designed to assist users with insurance-related queries. The chatbot is built on a large language model (LLM) and is conversational. However, to maintain the chatbot's focus and to comply with company policy, it must not provide responses to questions about politics. Instead, when presented with political inquiries, the chatbot should respond with a standard message: "Sorry, I cannot answer that. I am a chatbot that can only answer questions around insurance." Which framework type should be implemented to solve this?

A. Safety Guardrail.
B. Security Guardrail.
C. Contextual Guardrail.
D. Compliance Guardrail.

A Generative AI Engineer is using the code below to test setting up a vector store:

    from databricks.vector_search.client import VectorSearchClient

    vsc = VectorSearchClient()
    vsc.create_endpoint(
        name="vector_search_test",
        endpoint_type="STANDARD"
    )

Assuming they intend to use Databricks managed embeddings with the default embedding model, what should be the next logical function call?

A. vsc.get_index().
B. vsc.create_delta_sync_index().
C. vsc.create_direct_access_index().
D. vsc.similarity_search().
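For context: with Databricks managed embeddings, the index type to create is a Delta Sync index over a source Delta table. A hedged sketch of that call (the catalog/schema, table, and index names are hypothetical; parameter names follow recent versions of the databricks-vectorsearch client and may vary by version):

    index = vsc.create_delta_sync_index(
        endpoint_name="vector_search_test",
        index_name="main.default.docs_index",          # hypothetical index name
        source_table_name="main.default.docs_chunks",  # hypothetical source Delta table
        pipeline_type="TRIGGERED",
        primary_key="id",
        embedding_source_column="text",                # column to embed
        embedding_model_endpoint_name="databricks-bge-large-en",  # managed embedding model
    )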
A Generative AI Engineer is tasked with deploying an application that takes advantage of a custom MLflow PyFunc model to return some interim results. How should they configure the endpoint to pass the secrets and credentials?

A. Use spark.conf.set().
B. Pass variables using the Databricks Feature Store API.
C. Add credentials using environment variables.
D. Pass the secrets in plain text.

A Generative AI Engineer wants to build an LLM-based solution to help a restaurant improve its online customer experience with bookings by automatically handling common customer inquiries. The goal of the solution is to minimize escalations to human intervention and phone calls while maintaining a personalized interaction. To design the solution, the Generative AI Engineer needs to define the input data to the LLM and the task it should perform. Which input/output pair will support their goal?

A. Input: Online chat logs; Output: Group the chat logs by users, followed by summarizing each user's interactions.
B. Input: Online chat logs; Output: Buttons that represent choices for booking details.
C. Input: Customer reviews; Output: Classify review sentiment.
D. Input: Online chat logs; Output: Cancellation options.

What is an effective method to preprocess prompts using custom code before sending them to an LLM?

A. Directly modify the LLM's internal architecture to include preprocessing steps.
B. It is better not to introduce custom code to preprocess prompts, as the LLM has not been trained with examples of the preprocessed prompts.
C. Rather than preprocessing prompts, it's more effective to postprocess the LLM outputs to align the outputs to desired outcomes.
D. Write an MLflow PyFunc model that has a separate function to process the prompts.
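For context on the PyFunc option: a minimal sketch of a custom MLflow PyFunc model that keeps prompt preprocessing in its own function (call_llm is a hypothetical stand-in for whatever client call the application makes):

    import mlflow.pyfunc

    class PromptPreprocessingModel(mlflow.pyfunc.PythonModel):
        def _preprocess(self, prompt: str) -> str:
            # Custom preprocessing step, e.g. trimming and prepending instructions.
            return f"Answer concisely: {prompt.strip()}"

        def predict(self, context, model_input):
            prompts = [self._preprocess(p) for p in model_input["prompt"]]
            return [call_llm(p) for p in prompts]  # call_llm is hypothetical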
A Generative AI Engineer is developing an LLM application that users can use to generate personalized birthday poems based on their names. Which technique would be most effective in safeguarding the application, given the potential for malicious user inputs?

A. Implement a safety filter that detects any harmful inputs and ask the LLM to respond that it is unable to assist.
B. Reduce the time that the users can interact with the LLM.
C. Ask the LLM to remind the user that the input is malicious but continue the conversation with the user.
D. Increase the amount of compute that powers the LLM to process input faster.

Which indicator should be considered to evaluate the safety of the LLM outputs when qualitatively assessing LLM responses for a translation use case?

A. The ability to generate responses in code.
B. The similarity to the previous language.
C. The latency of the response and the length of text generated.
D. The accuracy and relevance of the responses.

A Generative AI Engineer is developing a patient-facing, healthcare-focused chatbot. If the patient's question is not a medical emergency, the chatbot should solicit more information from the patient to pass to the doctor's office and suggest a few relevant pre-approved medical articles for reading. If the patient's question is urgent, it should direct the patient to call their local emergency services. Given the following user input: "I have been experiencing severe headaches and dizziness for the past two days." Which response is most appropriate for the chatbot to generate?

A. Here are a few relevant articles for your browsing. Let me know if you have questions after reading them.
B. Please call your local emergency services.
C. Headaches can be tough. Hope you feel better soon!
D. Please provide your age, recent activities, and any other symptoms you have noticed along with your headaches and dizziness.

After changing the response-generating LLM in a RAG pipeline from GPT-4 to a model with a shorter context length that the company self-hosts, the Generative AI Engineer is getting the following error:

    {
      "error_code": "BAD_REQUEST",
      "message": "Bad request: rpc error: code = InvalidArgument desc = prompt token count (4595) cannot exceed 4096…"
    }

What TWO solutions should the Generative AI Engineer implement without changing the response-generating model? (Choose two.)

A. Use a smaller embedding model to generate embeddings.
B. Reduce the maximum output tokens of the new model.
C. Decrease the chunk size of embedded documents.
D. Reduce the number of records retrieved from the vector database.
E. Retrain the response-generating model using ALiBi.

A Generative AI Engineer is building a system which will answer questions on the latest stock news articles. Which will NOT help with ensuring the outputs are relevant to financial news?

A. Implement a comprehensive guardrail framework that includes policies for content filters tailored to the finance sector.
B. Increase the compute to improve processing speed of questions to allow greater relevancy analysis.
C. Implement a profanity filter to screen out offensive language.
D. Incorporate manual reviews to correct any problematic outputs prior to sending them to the users.

A Generative AI Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI. The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news, or content about other companies. Which approach is advisable when building a RAG application to achieve this goal of filtering irrelevant information?

A. Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.
B. Include in the system prompt that any information it sees will be about SnoPen AI, even if no data filtering is performed.
C. Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen AI.
D. Consolidate all SnoPen AI-related documents into a single chunk in the vector database.

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document sections. They would like to store the chunks in a Vector Search index. The current format of the dataframe has two columns: (i) the original document file name, and (ii) an array of text chunks for each document. What is the most performant way to store this dataframe?

A. Split the data into train and test sets, create a unique identifier for each document, then save to a Delta table.
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a Delta table.
C. First create a unique identifier for each document, then save to a Delta table.
D. Store each chunk as an independent JSON file in a Unity Catalog Volume. For each JSON file, the key is the document section name and the value is the array of text chunks for that section.
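For context on the flattening option: a hedged PySpark sketch that explodes the chunk array to one row per chunk, adds a unique identifier, and saves to a Delta table (the table name is hypothetical; the change data feed property at the end is the setting that Delta Sync vector indexes rely on):

    from pyspark.sql.functions import explode, monotonically_increasing_id

    flat = (
        df.select("filename", explode("chunks").alias("text"))
          .withColumn("id", monotonically_increasing_id())
    )
    flat.write.format("delta").saveAsTable("main.default.doc_chunks")  # hypothetical name

    spark.sql(
        "ALTER TABLE main.default.doc_chunks "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )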
A Generative AI Engineer has created a RAG application which can help employees retrieve answers from an internal knowledge base, such as Confluence pages or Google Drive. The prototype application is now working, with some positive feedback from internal company testers. Now the Generative AI Engineer wants to formally evaluate the system's performance and understand where to focus their efforts to further improve the system. How should the Generative AI Engineer evaluate the system?

A. Use cosine similarity score to comprehensively evaluate the quality of the final generated answers.
B. Curate a dataset that can test the retrieval and generation components of the system separately. Use MLflow's built-in evaluation metrics to perform the evaluation on the retrieval and generation components.
C. Benchmark multiple LLMs with the same data and pick the best LLM for the job.
D. Use an LLM-as-a-judge to evaluate the quality of the final answers generated.

A Generative AI Engineer has already trained an LLM on Databricks and it is now ready to be deployed. Which of the following steps correctly outlines the easiest process for deploying a model on Databricks?

A. Log the model as a pickle object, upload the object to a Unity Catalog Volume, register it to Unity Catalog using MLflow, and start a serving endpoint.
B. Log the model using MLflow during training, directly register the model to Unity Catalog using the MLflow API, and start a serving endpoint.
C. Save the model along with its dependencies in a local directory, build the Docker image, and run the Docker container.
D. Wrap the LLM's prediction function into a Flask application and serve it using Gunicorn.

A Generative AI Engineer developed an LLM application using the provisioned throughput Foundation Model API. Now that the application is ready to be deployed, they realize their volume of requests is not high enough to justify creating their own provisioned throughput endpoint. They want to choose a strategy that ensures the best cost-effectiveness for their application. What strategy should the Generative AI Engineer use?

A. Switch to using External Models instead.
B. Deploy the model using pay-per-token throughput, as it comes with cost guarantees.
C. Change to a model with fewer parameters in order to reduce hardware constraint issues.
D. Throttle the incoming batch of requests manually to avoid rate limiting issues.

A Generative AI Engineer is building an LLM to generate article summaries in the form of a type of poem, such as a haiku, given the article content. However, the initial output from the LLM does not match the desired tone or style. Which approach will NOT improve the LLM's response to achieve the desired response?

A. Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style.
B. Use a neutralizer to normalize the tone and style of the underlying documents.
C. Include few-shot examples in the prompt to the LLM.
D. Fine-tune the LLM on a dataset of desired tone and style.

A Generative AI Engineer is creating an LLM-powered application that will need access to up-to-date news articles and stock prices. The design requires the use of stock prices which are stored in Delta tables, and finding the latest relevant news articles by searching the internet. How should the Generative AI Engineer architect their LLM system?

A. Use an LLM to summarize the latest news articles and look up stock tickers from the summaries to find stock prices.
B. Query the Delta table for volatile stock prices and use an LLM to generate a search query to investigate potential causes of the stock volatility.
C. Download and store news articles and stock price information in a vector store. Use a RAG architecture to retrieve and generate at runtime.
D. Create an agent with tools for SQL querying of Delta tables and web searching; provide retrieved values to an LLM for generation of the response.

A Generative AI Engineer is designing a chatbot for a gaming company that aims to engage users on its platform while its users play online video games. Which metric would help them increase user engagement and retention for their platform?

A. Randomness.
B. Diversity of responses.
C. Lack of relevance.
D. Repetition of responses.

A company has a typical RAG-enabled, customer-facing chatbot on its website. Select the correct sequence of components a user's question will go through before the final output is returned. Use the diagram above for reference.

A. 1. embedding model, 2. vector search, 3. context-augmented prompt, 4. response-generating LLM.
B. 1. context-augmented prompt, 2. vector search, 3. embedding model, 4. response-generating LLM.
C. 1. response-generating LLM, 2. vector search, 3. context-augmented prompt, 4. embedding model.
D. 1. response-generating LLM, 2. context-augmented prompt, 3. vector search, 4. embedding model.

A team wants to serve a code generation model as an assistant for their software developers. It should support multiple programming languages. Quality is the primary objective. Which of the Databricks Foundation Model APIs, or models available in the Marketplace, would be the best fit?

A. Llama2-70b.
B. BGE-large.
C. MPT-7b.
D. CodeLlama-34B.

A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in PDF format. These PDFs can contain both text and images. They want to develop a solution using the least amount of lines of code. Which Python package should be used to extract the text from the source documents?

A. flask.
B. beautifulsoup.
C. unstructured.
D. numpy.
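For context: a minimal sketch of PDF text extraction with the unstructured package (the file name is hypothetical; image-heavy PDFs may require the package's optional OCR dependencies):

    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename="source_document.pdf")  # hypothetical file
    text = "\n".join(str(el) for el in elements)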
A Generative AI Engineer received the following business requirements for an external chatbot. The chatbot needs to know what types of questions the user asks and route them to appropriate models to answer the questions. For example, the user might ask about upcoming event details. Another user might ask about purchasing tickets for a particular event. What is an ideal workflow for such a chatbot?

A. The chatbot should only look at previous event information.
B. There should be two different chatbots handling different types of user queries.
C. The chatbot should be implemented as a multi-step LLM workflow. First, identify the type of question asked, then route the question to the appropriate model. If it's an upcoming event question, send the query to a text-to-SQL model. If it's about ticket purchasing, the customer should be redirected to a payment platform.
D. The chatbot should only process payments.

A Generative AI Engineer is tasked with developing an application that is based on an open source large language model (LLM). They need a foundation LLM with a large context window. Which model fits this need?

A. DistilBERT.
B. MPT-30B.
C. Llama2-70B.
D. DBRX.

A Generative AI Engineer is building an LLM-based application that has an important transcription (speech-to-text) task. Speed is essential for the success of the application. Which open Generative AI model should be used?

A. DBRX.
B. MPT-30B-Instruct.
C. Llama-2-70b-chat-hf.
D. whisper-large-v3 (1.6B).

A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG application and would like to monitor the serving endpoint's incoming requests and outgoing responses. Which Databricks feature should they use?

A. AutoML.
B. Vector Search.
C. Inference Tables.
D. Feature Serving.

A Generative AI Engineer is deciding between using LSH (Locality Sensitive Hashing) and HNSW (Hierarchical Navigable Small World) for indexing their vector database. Their top priority is semantic accuracy. Which approach should the Generative AI Engineer use to evaluate these two techniques?

A. Compare the cosine similarities of the embeddings of returned results against those of a representative sample of test inputs.
B. Compare the Bilingual Evaluation Understudy (BLEU) scores of returned results for a representative sample of test inputs.
C. Compare the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores of returned results for a representative sample of test inputs.
D. Compare the Levenshtein distances of returned results against a representative sample of test inputs.

When developing an LLM application, it's crucial to ensure that the data used for training the model complies with licensing requirements to avoid legal risks. Which action is most appropriate to avoid legal risks?

A. Only use data explicitly labeled with an open license and ensure the license terms are followed.
B. Any LLM outputs are reasonable to use because they do not reveal the original sources of data directly.
C. Reach out to the data curators directly to gain written consent for using their data.
D. Use any publicly available data, as public data does not have legal restrictions.
A Generative AI Engineer interfaces with an LLM with instruction-following capabilities trained on customer calls inquiring about product availability. The LLM should output "Success" if the product is available or "Fail" if not. Which prompt allows the engineer to receive call classification labels correctly?

A. You are a helpful assistant that reads customer call transcripts. Walk through the transcript and think step-by-step if the customer's inquiries are addressed successfully. Answer "Success" if yes; otherwise, answer "Fail".
B. You will be given a customer call transcript where the customer asks about product availability. Classify the call as "Success" if the product is available and "Fail" if the product is unavailable.
C. You will be given a customer call transcript where the customer asks about product availability. The outputs are either "Success" or "Fail". Format the output in JSON, for example: {"call_id": "123", "label": "Success"}.
D. You will be given a customer call transcript. Answer "Success" if the customer call has been resolved successfully. Answer "Fail" if the call is redirected or if the question is not resolved.

Which TWO chain components are required for building a basic LLM-enabled chat application that includes conversational capabilities, knowledge retrieval, and contextual memory? (Choose two.)

A. Vector Stores.
B. Conversation Buffer Memory.
C. External tools.
D. Chat loaders.
E. React Components.
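For context on pairing a vector store with conversation memory: a hedged sketch using LangChain's legacy ConversationalRetrievalChain (vector_store is a hypothetical, already-built store; API names follow older LangChain releases):

    from langchain.chains import ConversationalRetrievalChain
    from langchain.memory import ConversationBufferMemory
    from langchain_community.llms import OpenAI

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    chain = ConversationalRetrievalChain.from_llm(
        llm=OpenAI(),
        retriever=vector_store.as_retriever(),  # vector_store assumed built elsewhere
        memory=memory,                          # contextual memory across turns
    )
    chain.invoke({"question": "What does our travel policy cover?"})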
A Generative AI Engineer has written scalable PySpark code to ingest unstructured PDF documents and chunk them in preparation for storing in a Databricks Vector Search index. Currently, the two columns of their dataframe include the original filename as a string and an array of text chunks from that document. What set of steps should the Generative AI Engineer perform to store the chunks in a ready-to-ingest manner for Databricks Vector Search?

A. Use PySpark's Auto Loader to apply a UDF across all chunks, formatting them in a JSON structure for Vector Search ingestion.
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and enable change data feed on the output Delta table.
C. Utilize the original filename as the unique identifier and save the dataframe as is.
D. Create a unique identifier for each document, flatten the dataframe to one chunk per row, and save to an output Delta table.

A Generative AI Engineer is asked to build an LLM application that would excel at code generation. They need to select a model that has been specifically trained to generate code. Which model would likely produce the best results out of the box?

A. CodeLlama-34b-Instruct-hf.
B. Mixtral-8x7B-v0.1.
C. Llama-2-70b-hf.
D. mpt-7b-8k-instruct.

A Generative AI Engineer needs to design an LLM pipeline to conduct multi-stage reasoning that leverages external tools. To be effective at this, the LLM will need to plan and adapt actions while performing complex reasoning tasks. Which approach will do this?

A. Train the LLM to generate a single, comprehensive response without interacting with any external tools, relying solely on its pre-trained knowledge.
B. Use a Chain-of-Thought (CoT) prompting technique to guide the LLM through a series of reasoning steps, then manually input the results from external tools for the final answer.
C. Implement a framework like ReAct, which allows the LLM to generate reasoning traces and perform task-specific actions that leverage external tools if necessary.
D. Encourage the LLM to make multiple API calls in sequence without planning or structuring the calls, allowing the LLM to decide when and how to use external tools spontaneously.

A Generative AI Engineer at an automotive company would like to build a question-answering chatbot for customers to inquire about their vehicles. They have a database containing various documents of different vehicle makes, their hardware parts, and common maintenance information. Which of the following components will NOT be useful in building such a chatbot?

A. Invite users to submit long, rather than concise, questions.
B. Response-generating LLM.
C. Embedding model.
D. Vector database.

A Generative AI Engineer is building an LLM to generate article headlines given the article content. However, the initial output from the LLM does not match the desired tone or style. Which approach would be most effective for adjusting the LLM's response to achieve the desired response?

A. Exclude any article headlines that do not match the desired output.
B. Fine-tune the LLM on a dataset of desired tone and style.
C. Provide the LLM with a prompt that explicitly instructs it to generate text in the desired tone and style.
D. All of the above.

A Generative AI Engineer is creating a customer support bot that should respond differently to an end user based on the sentiment in their initial message. For example, if the end user's message was angry, the bot should try to de-escalate their negative sentiments as it solves the customer query. They want to make sure their approach follows best practices. Which approach will do this?

A. Use an encoder-only LLM model to both detect sentiment and generate replies based upon the detected sentiment.
B. Implement a RAG architecture for how to respond to users depending on detected sentiment.
C. Use a linear regression model to classify sentiment and feed the result to a system prompt for the LLM to respond.
D. Create a chain which first uses an LLM to classify sentiment, then changes the system prompt for the customer-interaction LLM based upon the initial customer query sentiment.

A Generative AI Engineer is ready to deploy an LLM application written using Foundation Model APIs. They want to follow security best practices for production scenarios. Which authentication method should they choose?

A. Use OAuth machine-to-machine authentication.
B. Use an access token belonging to service principals.
C. Use an access token belonging to any workspace user.
D. Use a frequently rotated access token belonging to either a workspace user or a service principal.

A Generative AI Engineer is developing a RAG system for their company to perform internal document Q&A for structured HR policies, but the answers returned are frequently incomplete and unstructured. It seems that the retriever is not returning all relevant context. The Generative AI Engineer has experimented with different embedding models and response-generating LLMs, but that did not improve results. Which TWO options could be used to improve the response quality? (Choose two.)

A. Add the section header as a prefix to chunks.
B. Split the document by sentence.
C. Use a larger embedding model.
D. Increase the document chunk size.
E. Fine-tune the response generation model.

A Generative AI Engineer is building a production-ready LLM system which replies directly to customers. The solution makes use of the Foundation Model API via provisioned throughput. They are concerned that the LLM could potentially respond in a toxic or otherwise unsafe way. They also wish to perform this with the least amount of effort. Which approach will do this?

A. Ask users to report unsafe responses.
B. Host Llama Guard on the Foundation Model API and use it to detect unsafe responses.
C. Add some LLM calls to their chain to detect unsafe content before returning text.
D. Add a regex expression on inputs and outputs to detect unsafe responses.
A Generative AI Engineer would like an LLM to parse and extract the following information: date, sender email, and order ID. The output should be formatted into JSON. Here's an email sample:

    Date: April 16, 2024
    Time: 3:45 PM
    From: sarah.lee925@gmail.com
    To: customer_service@smart_thermostat.com
    Subject: Order update

    Hi, Can I get a status update on my order? The order ID is RE987D.
    Thank you, Sarah

They need a prompt that will extract and output the required information in JSON with the highest level of output accuracy. Which prompt will do that?

A. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in a human-readable format.
B. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format.
C. You will receive customer emails and need to extract date, sender email, and order ID. Return the extracted information in JSON format. Here's an example: {"date": "April 16, 2024", "sender_email": "[email protected]", "order_id": "RE987D"}.
D. You will receive customer emails and need to extract date, sender email, and order ID. You should return the date, sender email, and order ID information in JSON format.

A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLMs on this task and pick the best one. They have an evaluation set with known high-quality translation examples, and want to evaluate each LLM using the evaluation set with a performant metric. Which metric should they choose for this evaluation?

A. BLEU metric.
B. NDCG metric.
C. ROUGE metric.
D. RECALL metric.
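For context on BLEU: a minimal sketch scoring candidate translations against known high-quality references with the sacrebleu package (the example strings are hypothetical):

    import sacrebleu

    hypotheses = ["the cat sits on the mat"]        # model translations
    references = ["the cat is sitting on the mat"]  # known good translations
    score = sacrebleu.corpus_bleu(hypotheses, [references])
    print(score.score)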
A Generative AI Engineer is using an LLM to classify species of edible mushrooms based on text descriptions of certain features. The model is returning accurate responses in testing and the Generative AI Engineer is confident they have the correct list of possible labels, but the output frequently contains additional reasoning in the answer when the Generative AI Engineer only wants the label with no additional text. Which action should they take to elicit the desired behavior from this LLM?

A. Use few-shot prompting to instruct the model on the expected output format.
B. Use zero-shot prompting to instruct the model on the expected output format.
C. Use zero-shot chain-of-thought prompting to prevent a verbose output format.
D. Use a system prompt to instruct the model to be succinct in its answer.

A Generative AI Engineer is working with a retail company that wants to enhance its customer experience by automatically handling common customer inquiries. They are working on an LLM-powered AI solution that should improve response times while maintaining a personalized interaction. They want to define the appropriate input and LLM task to do this. Which input/output pair will do this?

A. Input: Customer service chat logs; Output: Group the chat logs by users, followed by summarizing each user's interactions, then respond.
B. Input: Customer service chat logs; Output: Find the answers to similar questions and respond with a summary.
C. Input: Customer reviews; Output: Classify review sentiment.
D. Input: Customer reviews; Output: Group the reviews by users and aggregate per-user average rating, then respond.

A Generative AI Engineer is developing a RAG application and would like to experiment with different embedding models to improve the application performance. Which strategy for picking an embedding model should they choose?

A. Pick an embedding model with multilingual support to support potential multilingual user questions.
B. Pick the most recent and most performant open LLM released at the time.
C. Pick an embedding model trained on related domain knowledge.
D. Pick the embedding model ranked highest on the Massive Text Embedding Benchmark (MTEB) leaderboard hosted by HuggingFace.

A Generative AI Engineer wants the fine-tuned LLMs in their prod Databricks workspace to be available for testing in their dev workspace as well. All of their workspaces are Unity Catalog enabled and they are currently logging their models into the Model Registry in MLflow. What is the most cost-effective and secure option for the Generative AI Engineer to accomplish their goal?

A. Use an external model registry which can be accessed from all workspaces.
B. Use MLflow to log the model directly into Unity Catalog, and enable READ access in the dev workspace to the model.
C. Set up a duplicate training pipeline in dev, so that an identical model is available in dev.
D. Set up a script to export the model from prod and import it to dev.
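For context on the Unity Catalog option: a minimal sketch of registering a model into Unity Catalog with MLflow, after which access can be granted to principals in the dev workspace (the run ID and three-level model name are hypothetical):

    import mlflow

    mlflow.set_registry_uri("databricks-uc")  # target Unity Catalog, not the workspace registry
    mlflow.register_model(
        "runs:/<run_id>/model",                  # hypothetical run ID from the training job
        "prod_catalog.models.my_finetuned_llm",  # hypothetical UC model name
    )
    # Then grant the dev workspace principals access to the model in Unity Catalog.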
A Generative AI Engineer has just deployed an LLM application at a manufacturing company that assists with answering customer service inquiries. They need to identify the key enterprise metrics to monitor the application in production. Which is NOT a metric they will implement for their customer service LLM application in production?

A. Massive Multi-task Language Understanding (MMLU) score.
B. Number of customer inquiries processed per unit of time.
C. Factual accuracy of the response.
D. Time taken for the LLM to generate a response.

A Generative AI Engineer is helping a cinema extend its website's chatbot to be able to respond to questions about specific showtimes for movies currently playing at their local theater. They already have the location of the user provided by location services to their agent, and a Delta table which is continually updated with the latest showtime information by location. They want to implement this new capability in their RAG application. Which option will do this with the least effort and in the most performant way?

A. Create a Feature Serving endpoint from a FeatureSpec that references an online store synced from the Delta table. Query the Feature Serving endpoint as part of the agent logic / tool implementation.
B. Query the Delta table directly via a SQL query constructed from the user's input using a text-to-SQL LLM in the agent logic / tool implementation.
C. Set up a task in Databricks Workflows to write the information in the Delta table periodically to an external database such as MySQL, and query the information from there as part of the agent logic / tool implementation.
D. Write the Delta table contents to a text column, then embed those texts using an embedding model and store them in the vector index. Look up the information based on the embedding as part of the agent logic / tool implementation.

A Generative AI Engineer needs to build an LLM application that can understand medical documents, including recently published ones. They want to select an open model available on HuggingFace's model hub. Which step is most appropriate for selecting an LLM?

A. Pick any model in the Mistral family, as Mistral models are good with all types of use cases.
B. Select a model based on the highest number of downloads, as this indicates popularity, reliability, and general suitability.
C. Select a model that was most recently uploaded, as this indicates the model is the newest and highly likely to be the most performant.
D. Check the model and training data description to identify if the model is trained on any medical data.

A Generative AI Engineer is building a RAG application that answers questions about technology-related news articles. The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news. Which approach is NOT advisable for building a RAG application focused on answering technology-only questions?

A. Include in the system prompt that the application is not supposed to answer any questions unrelated to technology.
B. Filter out irrelevant news articles in the retrieval process.
C. Keep all news articles because the RAG application needs to understand non-technological content to avoid answering questions about them.
D. Filter out irrelevant news articles in the upstream document database.

A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in HTML format. They want to develop a solution using the least amount of lines of code. Which Python package should be used to extract the text from the source documents?

A. pytesseract.
B. numpy.
C. pypdf2.
D. beautifulsoup.

A Generative AI Engineer is building a RAG application for answering employee questions on company policies. What are the steps needed to build this RAG application and deploy it?

A. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> Evaluate model -> LLM generates a response -> Deploy it using Model Serving.
B. User submits queries against an LLM -> Ingest documents from a source -> Index the documents and save to Vector Search -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving.
C. Ingest documents from a source -> Index the documents and save to Vector Search -> Evaluate model -> Deploy it using Model Serving -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response.
D. Ingest documents from a source -> Index the documents and save to Vector Search -> User submits queries against an LLM -> LLM retrieves relevant documents -> LLM generates a response -> Evaluate model -> Deploy it using Model Serving.

A Generative AI Engineer who was prototyping an LLM system accidentally ran thousands of inference queries against a Foundation Model endpoint over the weekend. They want to take action to prevent this from unintentionally happening again in the future. What action should they take?

A. Use prompt engineering to instruct the LLM endpoints to refuse too many subsequent queries.
B. Require that all development code which interfaces with a Foundation Model endpoint must be reviewed by a Staff-level engineer before execution.
C. Build a pyfunc model which proxies to the Foundation Model endpoint and add throttling within the pyfunc model.
D. Configure rate limiting on the Foundation Model endpoints.

A Generative AI Engineer is setting up a Databricks Vector Search index that will look up news articles by topic within 10 days of the date specified. An example query might be "Tell me about monster truck news around January 5th 1992". They want to do this with the least amount of effort. How can they set up their Vector Search index to support this use case?

A. Create separate indexes by topic and add a classifier model to appropriately pick the best index.
B. Include metadata columns for article date and topic to support metadata filtering.
C. Pass the query directly to the vector search index and return the best articles.
D. Split articles by 10-day blocks and return the block closest to the query.
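For context on metadata filtering: a hedged sketch of a filtered Databricks Vector Search query (the column names and values are hypothetical; the comparison-operator filter syntax follows the client docs but may vary by version):

    results = index.similarity_search(
        query_text="monster truck news",
        columns=["title", "chunk_text", "article_date"],
        filters={
            "topic": "monster trucks",        # hypothetical metadata column
            "article_date >=": "1991-12-26",  # 10 days either side of Jan 5th 1992
            "article_date <=": "1992-01-15",
        },
        num_results=5,
    )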
A Generative AI Engineer developed an LLM application using the pay-per-token Foundation Model API. Now that the application is ready to be deployed, they would like to ensure the model endpoint can serve high incoming volumes of requests in production. What should the Generative AI Engineer consider?

A. Switch to using External Models instead.
B. Throttle the incoming batch of requests manually to avoid rate limiting issues.
C. Change to a model with fewer parameters in order to reduce hardware constraint issues.
D. Deploy the model using provisioned throughput, as it comes with performance guarantees.

A Generative AI Engineer at a home appliance company has been asked to design an LLM-based application that accomplishes the following business objective: answer customer questions on home appliances using the associated instruction manuals. Which set of high-level tasks should the Generative AI Engineer's system perform?

A. Split instruction manuals into chunks and embed them into a vector store. Use the question to retrieve the best-matched chunks of the manual, and use the LLM to generate a response to the user based upon the manual retrieved.
B. Create an interaction matrix of historical user questions and appliance instruction manuals. Use ALS to factorize the matrix and create embeddings. Calculate the embeddings of new queries and use them to find the best manual. Use an LLM to generate a response to the question based upon the manual retrieved.
C. Calculate averaged embeddings for each instruction manual and compare embeddings to the user query to find the best manual. Pass the best manual with the user query into an LLM with a large context window to generate a response to the customer.
D. Use an LLM to summarize all of the instruction manuals. Provide summaries of each manual and the user query into an LLM with a large context window to generate a response to the user.

A Generative AI Engineer is developing an LLM application to interact with users to provide personalized movie recommendations. Given the potential for malicious user inputs, which technique would be most effective in safeguarding the application?

A. Reduce the time that the users can interact with the LLM.
B. Increase the amount of compute that powers the LLM to process input faster.
C. Ask the LLM to remind the user that the input is malicious but continue the conversation with the user.
D. Implement a safety filter that detects any harmful inputs and ask the LLM to respond that it is unable to assist.
A Generative AI Engineer received the following business requirements for an internal chatbot. The internal chatbot needs to know what types of questions the user asks and route them to appropriate models to answer the questions. For example, the user might ask about historical failure rates of a specific electrical part. Another user might ask about how to troubleshoot a piece of electrical equipment. Available data sources include a database of electrical equipment PDF manuals and also a table with information on when an electrical part experiences failure. Which workflow supports such a chatbot?

A. Parse the electrical equipment PDF manuals into a table of question and response pairs. That way, the same chatbot can query tables easily to answer questions about both historical failure rates and equipment troubleshooting.
B. The chatbot should be implemented as a multi-step LLM workflow. First, identify the type of question asked, then route the question to the appropriate model. If it's a historical failure rate question, send the query to a text-to-SQL model. If it's a troubleshooting question, then send the query to another model that summarizes the equipment-specific document and generates the response.
C. There should be two different chatbots handling different types of user queries.
D. The table with electrical part failures should be converted into a text document first. That way, the same chatbot can use the same document retrieval process to generate answers regardless of question types.

A Generative AI Engineer is building a system that will answer questions on currently unfolding news topics. As such, it pulls information from a variety of sources including articles and social media posts. They are concerned about toxic posts on social media causing toxic outputs from their system. Which guardrail will limit toxic outputs?

A. Reduce the amount of context items the system will include in consideration for its response.
B. Use only approved social media and news accounts to prevent unexpected toxic data from getting to the LLM.
C. Log all LLM system responses and perform a batch toxicity analysis monthly.
D. Implement rate limiting.

A Generative AI Engineer has created a RAG application which can help employees interpret HR documentation. The prototype application is now working, with some positive feedback from internal company testers. Now the Generative AI Engineer wants to formally evaluate the system's performance and understand where to focus their efforts to further improve the system. How should the Generative AI Engineer evaluate the system?

A. Use ROUGE score to comprehensively evaluate the quality of the final generated answers.
B. Use an LLM-as-a-judge to evaluate the quality of the final answers generated.
C. Curate a dataset that can test the retrieval and generation components of the system separately. Use MLflow's built-in evaluation metrics to perform the evaluation on the retrieval and generation components.
D. Benchmark multiple LLMs with the same data and pick the best LLM for the job.

A Generative AI Engineer is using LangChain to assist a museum in classifying documents and is using this code:

    from langchain import PromptTemplate

    template_text = """
    You are an automated archeological assistant helping identify documents relevant to ancient Egypt.
    If the document is relevant return "relevant", else return "irrelevant" with no additional text.
    Document: "{document_input}"
    Response:
    """

    prompt_template = PromptTemplate(
        input_variables=["document_input"]
    )
    completed_prompt = prompt_template.format(
        document_input=document_input
    )

Their code results in an error. What do they need to change in order to fix this template?

A. Provide an LLM argument to PromptTemplate().
B. Provide template and LLM arguments to PromptTemplate().
C. Omit PromptTemplate(); it is only used for multi-part templates.
D. Provide a template argument to PromptTemplate().
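For reference: PromptTemplate is constructed above without the template string itself, which is what raises the error. A minimal corrected version (document_input is assumed to be defined elsewhere):

    from langchain import PromptTemplate

    template_text = """
    You are an automated archeological assistant helping identify documents relevant to ancient Egypt.
    If the document is relevant return "relevant", else return "irrelevant" with no additional text.
    Document: "{document_input}"
    Response:
    """

    prompt_template = PromptTemplate(
        input_variables=["document_input"],
        template=template_text,  # the missing argument
    )
    completed_prompt = prompt_template.format(document_input=document_input)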
A Generative AI Engineer has been reviewing issues with their company's LLM-based question-answering assistant and has determined that a technique called prompt chaining could help alleviate some performance concerns. However, to suggest this to their team, they have to clearly explain how it works and how it can benefit their question-answering assistant. Which explanation do they communicate to the team?
It allows you to break down complex tasks into multiple independent subtasks. This enables the assistant to generate more comprehensive and accurate responses.
It allows you to reduce the latency of your applications. By having multiple chains participating in the response as a chain, you increase the rate at which the response is generated.
It allows you to decrease the effort involved in crafting a prompt. Chains make it possible to reuse prompt text across multiple different use cases.
It reduces the average cost of a typical request. Chains make more efficient use of the tokens produced to generate higher-quality responses with fewer tokens.

An AI developer team wants to fine-tune an open-weight model to have exceptional performance on a code generation use case. They are trying to choose the best model to start with. They want to minimize model hosting costs, and are using Hugging Face model cards and Spaces to explore models. Which TWO model attributes and metrics should the team focus on to make their selection? (Choose two.)
Big Code Models Leaderboard.
Number of model parameters.
MTEB Leaderboard.
Chatbot Arena Leaderboard.
Number of model downloads last month.

A Generative AI Engineer at an automotive company would like to build a question-answering chatbot to help customers answer specific questions about their vehicles. They have:
• A catalog with hundreds of thousands of cars manufactured since the 1960s
• Historical searches, with user queries and successful matches
• Descriptions of their own cars in multiple languages
They have already selected an open-source LLM and created a test set of user queries. They need to discard techniques that will not help them build the chatbot. Which do they discard?
Setting chunk size to match the model's context window to maximize coverage.
Implementing metadata filtering based on car models and years.
Fine-tuning an embedding model on automotive terminology.
Adding few-shot examples for response generation.

A Generative AI Engineer at a legal firm is designing a RAG system to analyze historical legal case precedents. The system needs to process millions of court opinions and legal documents, already organized by time and topic, to track how interpretations of specific laws have evolved over time. All of these documents are in plain text. The engineer needs to choose a chunking method that would most effectively preserve continuity and the temporal nature of the cases. Which method do they choose?
Implement windowed summarization with overlapping chunks.
Implement a hierarchical tree structure, like RAPTOR, to group similar legal concepts.
Implement paragraph-level embeddings with each chunk.
Implement sentence-level embeddings with each chunk tagged with the time to enable metadata filtering.
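Overlapping (windowed) chunks are the option aimed at preserving continuity: text that spans a chunk boundary still appears intact in at least one chunk. A minimal sketch of the idea in plain Python; the window and overlap sizes and sample text are illustrative:

```python
# Sketch: fixed-size character windows that overlap, so context spanning a
# chunk boundary is never lost between adjacent chunks.
def overlapping_chunks(text: str, window: int = 1000, overlap: int = 200):
    stride = window - overlap
    for start in range(0, max(len(text) - overlap, 1), stride):
        yield text[start:start + window]

opinion_text = "In 1964 the court held that the statute applied narrowly. " * 200  # hypothetical stand-in
chunks = list(overlapping_chunks(opinion_text))
# Each window could then be summarized and embedded alongside its raw text,
# preserving the chronological flow of the opinions.
```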
A Generative AI Engineer is developing an agent system using a popular agent-authoring library. The agent comprises multiple parallel and sequential chains. The engineer encounters challenges as the agent fails at one of the steps, making it difficult to debug the root cause. They need to find an appropriate approach to research this issue and discover the cause of failure. Which approach do they choose?
Enable MLflow Tracing to gain visibility into each agent's behavior and execution step.
Run mlflow.evaluate to determine the root cause of the failed step.
Implement structured logging within the agent's code to capture detailed execution information.
Deconstruct the agent into independent steps to simplify debugging.

A Generative AI Engineer is experimenting with using parameters to configure an agent in Mosaic Agent Framework. However, they are struggling to get the agent to respond with relevant information with this configuration:

```python
config_dict = {
    "prompt_template": "You are a trivia bot. Generate a trivia question based on the user's input.",
    "prompt_template_input_vars": ["user_input"],
    "model_serving_endpoint": "databricks-dbrx-instruct",
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}
```

Which error is causing the problem?
The prompt does not parse the user's input vars.
The prompt does not set the retriever schema.
The prompt does not list available agents for the LLM to call.
The prompt is not wrapped in ChatModel.
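The template declares `user_input` as an input variable but never references it, so the user's text never reaches the model. A sketch of one way the corrected configuration could look:

```python
# Sketch of the fix: include the {user_input} placeholder in the template
# so the declared input variable is actually interpolated into the prompt.
config_dict = {
    "prompt_template": (
        "You are a trivia bot. Generate a trivia question "
        "based on the user's input: {user_input}"
    ),
    "prompt_template_input_vars": ["user_input"],
    "model_serving_endpoint": "databricks-dbrx-instruct",
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}
```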
A Generative AI Engineer is using LangGraph to define multiple tools in a single agentic application. They want to enable the main orchestrator LLM to decide on its own which tools are most appropriate to call for a given prompt. To do this, they must determine the general flow of the code. Which sequence will do this?
1. Define or import the tools. 2. Add tools and LLM to the agent. 3. Create the ReAct agent.
1. Define or import the tools. 2. Define the agent. 3. Initialize the agent with ReAct, the LLM, and the tools.
1. Define the tools. 2. Load each tool into a separate agent. 3. Instruct the LLM to use ReAct to call the appropriate agent.
1. Define the tools inside the agents. 2. Load the agents into the LLM. 3. Instruct the LLM to use CoT reasoning to determine the appropriate agent.
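For reference, a minimal sketch of the define-tools-then-build-a-ReAct-agent flow in LangGraph; the tool body and serving endpoint are hypothetical stand-ins, and the sketch assumes the databricks-langchain package is installed:

```python
# Sketch: tools are defined first, then handed to a prebuilt ReAct agent
# so the orchestrator LLM chooses on its own which tool to call.
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from databricks_langchain import ChatDatabricks  # assumes databricks-langchain is installed

@tool
def lookup_part_price(part_id: str) -> str:
    """Return the list price for a part (hypothetical stand-in tool)."""
    return f"Part {part_id}: $42.00"

llm = ChatDatabricks(endpoint="databricks-dbrx-instruct")
agent = create_react_agent(llm, tools=[lookup_part_price])

result = agent.invoke({"messages": [("user", "How much is part X-100?")]})
```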
All of the following are Python APIs used to query Databricks foundation models. When running in an interactive notebook, which of the following libraries does not automatically use the current session credentials?
OpenAI client.
REST API via the requests library.
MLflow Deployments SDK.
Databricks Python SDK.
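Of these, only the raw REST API requires you to supply the workspace URL and token yourself; the other clients pick up the notebook session's credentials automatically. A sketch of the manual-auth call, with hypothetical environment variable names and endpoint:

```python
# Sketch: querying a serving endpoint over raw REST with `requests`.
# Host and token must be supplied explicitly (hypothetical env vars).
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"{host}/serving-endpoints/databricks-dbrx-instruct/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"messages": [{"role": "user", "content": "Hello"}]},
)
print(resp.json())
```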
A Generative AI Engineer is deploying a customer-facing, fine-tuned LLM on their public website. Given the large investment the company put into fine-tuning this model, and the proprietary nature of the tuning data, they are concerned about model inversion attacks. Which of the following Databricks AI Security Framework (DASF) risk mitigation strategies are most relevant to this use case?
Implement AI guardrails to allow users to configure and enforce compliance.
Leverage Databricks access control lists (ACLs) to configure permissions for accessing models.
Use secure model features with Databricks Feature Store.
Apply attribute-based access controls (ABAC) to limit unauthorized access.

A team uses Mosaic AI Vector Search to retrieve documents for their Retrieval-Augmented Generation (RAG) pipeline. The search query returns five relevant documents, and the first three are added to the prompt as context. Performance evaluation with Agent Evaluation shows that some lower-ranked retrieved documents have higher context-relevancy scores than higher-ranked documents. Which option should the team consider to optimize this workflow?
Use a reranker to order the documents based on the relevance scores.
Modify the prompt to instruct the LLM to order the documents based on the relevance scores.
Use a different embedding model for computing document embeddings.
Increase the number of documents added to the prompt to improve context relevance.
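Reranking is the standard remedy when retrieval order and actual relevance disagree. A minimal sketch using a cross-encoder reranker; the model name, query, and documents are illustrative:

```python
# Sketch: rescore the five retrieved documents with a cross-encoder,
# then keep the top three for the prompt.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is the warranty period?"  # hypothetical query
docs = [  # hypothetical Vector Search results, in retrieval order
    "Warranty claims must be filed within 30 days.",
    "The standard warranty period is 24 months.",
    "Shipping typically takes 5-7 business days.",
    "Extended warranties add 12 months of coverage.",
    "Returns are accepted within 14 days.",
]

scores = reranker.predict([(query, d) for d in docs])
reranked = [d for _, d in sorted(zip(scores, docs), reverse=True)]
top_three = reranked[:3]
```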
A generative AI engineer is deploying an AI agent authored with MLflow's ChatAgent interface for a retail company's customer support system on Databricks. The agent must handle thousands of inquiries daily, and the engineer needs to track its performance and quality in real time to ensure it meets service-level agreements. Which metrics are automatically captured by default and made available for monitoring when the agent is deployed using the Mosaic AI Agent Framework?
Operational metrics like request volume, latency, and errors.
Quality metrics like correctness and guideline adherence.
Both operational and quality metrics.
No metrics are automatically captured.


