Author: rodrigtech
-
LlamaIndex Simplifying Data Retrieval
Introduction Using an LLM behind a front-end UI usually comes with memory constraints, primarily because the conversation is initiated through the ChatCompletionsClient. That client is stateless, meaning what is presented back to the end user is limited to the current session and the LLM’s own knowledge. Over time this…
-
Phi-3.5 Mixture of Experts
Introduction Microsoft recently open-sourced its Phi-3.5 Mixture of Experts model. It is available in the Azure AI Studio catalog as a Model-as-a-Service offering that you can run on Azure, or you can use Hugging Face to work with the model. The first question, depending on how closely you’re following the constant upstream releases of models, is the…
-
AI Agents with LangGraph
Introduction Agents are the next iteration of LLM usage, moving from traditional stateless interactions to stateful ones, typically by using the Assistants API or by extending a framework. Some popular tools for creating an agent workflow are Promptflow, CrewAI, LangGraph, LangChain, and others. For this blog post I’m going to demonstrate…
-
RouteLLM Unlocking Cost Effective LLM Routing
Introduction The cost of using closed-source large language models can add up for complex tasks because of how tokens are priced for API usage. RouteLLM is an open-source project that provides a method for deciding which LLM should handle the query a user sends, based on…
-
GitHub Actions with Azure ML Jobs
Introduction It’s no secret that GitHub is a premier development platform for source control management, and it has robust features that also allow for Continuous Integration/Continuous Deployment. These features are part of GitHub Actions. In this blog post I’m going to use the example code in a repository here this…
-
Batch Jobs in Azure OpenAI
Introduction In the current landscape of Generative AI, optimizing API submissions is crucial for both cost and performance. Whether you’re fine-tuning token usage or streamlining context-aware requests using Retrieval-Augmented Generation (RAG), finding the right tools can make a significant difference. One of the most promising solutions is the Azure OpenAI Batch API, designed specifically for…
-
Mutability of FIPS on AKS
Introduction You’re in compliance and tasked with identifying which microservice supports Federal Information Processing Standards (FIPS). Operations are dynamic and can shift away from supporting a business unit that has this requirement, so what are your options if you have to revert and keep the cluster? Currently in Azure Kubernetes Service this has been capable…
-
Artifact Registry VEX in GCP
Introduction Vulnerability Exploitability eXchange (VEX), sometimes called Vulnerability Exchange, is a communication format used to share detailed information about the exploitability of vulnerabilities in software products. VEX documents provide essential details about vulnerabilities, focusing on whether they are exploitable in the specific context of the software or environment in which they are found. Given…
-
Groq + Exa.ai Powerful Searching across LLMs
Introduction I’ve been exploring APIs that extend the search capabilities of existing LLMs to cover knowledge the underlying model doesn’t have, ideally surfacing relevant knowledge bases for some research I’m conducting. I’ve tried a handful of APIs, such as the Serper API, which is very powerful, and recently did a video using…
-
Google Cloud Privileged Access Management
Today’s vast array of identities, whether human or machine, have a large number of permissions tied to them. Because cloud identities can be tied to resources that are in turn mapped to other services, the resulting attack surface can make this a sticky situation. Most hyperscalers have best practices documented on Identity and…