Most AI agencies rent GPUs by the hour and call it innovation. They spin up a cloud instance, run a model someone else trained, pipe the output into a template someone else designed, and bill you for "custom AI development." We take a different approach. We build the hardware, print the parts, test the security, and run the infrastructure ourselves. This is what that actually looks like.
Welcome to the BDK Studios hardware lab. It is not a clean room with glass walls and ambient lighting. It is a working shop with 3D printers running overnight, custom water cooling loops, soldering stations, and enough blinking LEDs to land a small aircraft. Every piece of hardware in here serves a purpose, and every purpose ties back to the work we do for clients.
The Flipper Zero: Security Research Tool, Not a Toy
The Flipper Zero gets a bad reputation because social media is full of people using it to open garage doors and mess with fast food displays. In our lab, it serves a completely different purpose: security research and wireless protocol testing.
Our Flipper runs Unleashed firmware with the WiFi Dev Board and Video Game Module attached. The WiFi Dev Board turns it into a portable wireless network analyzer capable of scanning, identifying, and testing WiFi networks, Bluetooth devices, and sub-GHz communications. The Video Game Module adds a secondary processor that handles more compute-intensive analysis tasks.
Why does an AI studio need this? Because every system we build for clients connects to their network, interacts with their devices, and handles their sensitive data. Before we deploy any system, we test its attack surface. Can the webhook endpoint be spoofed? Is the Tailscale connection properly locked down? Are there any rogue devices on the client's network that could intercept traffic?
The Flipper Zero lets us test these scenarios in the field. We have used it to identify unsecured IoT devices on client networks that were broadcasting in the clear, Bluetooth devices that were pairable without authentication, and sub-GHz signals from legacy alarm systems that could be replayed. Each finding resulted in a security recommendation that protected the client's infrastructure before we ever connected our systems to it.
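The webhook spoofing check mentioned earlier happens in software rather than with the Flipper. A minimal sketch of the standard defense, HMAC signature verification, is below; the shared secret and header value are illustrative, not our production API:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature: str) -> bool:
    """Reject any webhook whose HMAC-SHA256 signature does not match the body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels
    return hmac.compare_digest(expected, signature)

# Example: the sender signs the payload with the same secret
secret = b"shared-secret"  # hypothetical value
body = b'{"event": "ping"}'
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(verify_webhook(secret, body, signature))   # a matching signature passes
print(verify_webhook(secret, body, "forged"))    # a spoofed one does not
```

An endpoint that skips this check accepts any POST from anyone, which is exactly the kind of finding our field testing is meant to surface.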
This is the kind of due diligence that separates "we take security seriously" from actually taking security seriously. You can check out our free security and network tools to see some of what we have built from this research.
The Bambu Lab P1S: Production 3D Printing
Our Bambu Lab P1S with the Automatic Material System (AMS) runs almost every day. It prints real parts that go into real systems.
Custom server mounting hardware. When you are building custom workstations with nonstandard cooling solutions, off-the-shelf mounting brackets do not always exist. We have designed and printed custom GPU support brackets, fan shrouds, cable management clips, and drive bay adapters. The P1S prints these in PETG or ASA for heat resistance and structural integrity. A bracket that would take weeks to source from a custom fabricator prints overnight.
We wrote about this process in detail in our 3D printing for AI server parts post, including the specific materials and design considerations for parts that live inside high-heat enclosures.
Prototype enclosures. Before committing to a final design for any client-facing hardware, we print prototype enclosures to test fit, ventilation, and cable routing. This iteration cycle, from CAD to physical part in 4 hours, means we catch design problems before they become expensive manufacturing mistakes.
Client presentation models. When explaining complex system architectures to non-technical clients, sometimes a physical model communicates better than a slide deck. We have printed scaled models of network topologies, server rack layouts, and IoT sensor placements that clients can hold in their hands and immediately understand.
The AMS handles automatic filament switching, which means multi-material prints run unattended. We queue a print before leaving the studio and pick up the finished part the next morning. The reliability of the P1S has been exceptional. Across more than 300 prints, we have had fewer than five failures, and most of those were user error in slicer settings, not machine problems.
The Dual RTX 5090 Workstation: 64GB of VRAM, Water Cooled
This is the centerpiece. A custom-built workstation with two NVIDIA RTX 5090 GPUs, each with 32GB of VRAM, for a combined 64GB of inference memory. The system is water-cooled with a custom loop because air cooling cannot sustain the thermal loads of continuous AI inference across two 575-watt GPUs.
The numbers tell the story. This machine processes AI workloads at 415 tokens per second on text generation tasks. It transcribes audio faster than real time. It processes document batches, generates embeddings, runs image analysis, and handles video generation, all simultaneously across the two GPUs.
We benchmarked every aspect of this build in our RTX 5090 inference benchmark, and the water cooling ROI analysis covers why liquid cooling was not optional for this use case.
This workstation powers the tools on our website. When you use our Review Response Writer, QR Code Generator, Background Remover, or any other tool, the inference is running on this machine. Not on a cloud API. Not on a rented GPU. On hardware we built, maintain, and optimize ourselves.
Here is what runs on it daily:
- Whisper for speech-to-text transcription (float16 precision, required for Blackwell architecture)
- Ollama serving qwen2.5:32b for text inference and qwen2.5vl:7b for vision tasks
- CogVideoX for video generation
- Kokoro TTS for text-to-speech
- RVC for voice processing
- Embedding models for semantic search and document retrieval
- Rembg for background removal
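As a rough illustration of how the text models in that list are reached, here is a sketch of a call to Ollama's HTTP generate endpoint. The endpoint path and model tag match the setup described above; the prompt and helper names are just examples:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "qwen2.5:32b") -> dict:
    # stream=False asks Ollama for one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the completion text."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

In production these calls go through the orchestration layer rather than directly, but the wire format is the same.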
All services run in Docker containers with a GPU monitor that automatically pauses lower-priority services when GPU utilization exceeds 80%. This ensures that client-facing workloads always get priority while batch processing tasks yield gracefully.
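The pause logic is simple in principle. A minimal sketch, assuming utilization is polled via `nvidia-smi` and the yielding services are Docker containers (the 80% threshold matches the figure above; the container names are made up for illustration):

```python
import subprocess

THRESHOLD = 80  # percent GPU utilization
LOW_PRIORITY = ["batch-embedder", "video-gen"]  # hypothetical container names

def query_gpu_utilization() -> list[int]:
    """Return per-GPU utilization percentages by asking nvidia-smi for CSV output."""
    out = subprocess.check_output([
        "nvidia-smi", "--query-gpu=utilization.gpu",
        "--format=csv,noheader,nounits",
    ]).decode()
    return [int(line) for line in out.splitlines() if line.strip()]

def containers_to_pause(utilizations: list[int],
                        threshold: int = THRESHOLD) -> list[str]:
    """If any GPU is over threshold, the low-priority set should be paused."""
    if any(u > threshold for u in utilizations):
        return LOW_PRIORITY
    return []
```

A loop around these two functions, issuing `docker pause` and `docker unpause` as utilization crosses the threshold, is all the scheduler needs to be.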
The Mac Studio Fleet: Orchestration Layer
The GPUs handle the heavy inference, but the Mac Studios handle everything else. An M3 Ultra serves as the primary orchestration node, managing job queues, API routing, health monitoring, and the scheduling logic that coordinates work across all hardware.
An M4 Mac Studio runs the Docker environment for lightweight services that do not need GPU acceleration: webhook handlers, API gateways, database operations, and monitoring dashboards. This separation of concerns means the GPU workstation is never bottlenecked by CPU-bound orchestration tasks, and the Mac Studios never compete for GPU resources they do not need.
The entire fleet communicates over Tailscale, an encrypted mesh VPN that creates a private network across all devices regardless of physical location. This means a Mac Studio at the office, the GPU workstation in the server room, and a laptop at a client site all see each other as if they are on the same local network, with end-to-end encryption and zero-trust authentication.
The Testing Arsenal
Samsung Galaxy Z Fold 7. Every client-facing system we build gets tested on this device because it represents one of the most challenging form factors in mobile: a folding screen with multiple display modes, dynamic aspect ratios, and continuity requirements between folded and unfolded states. If a web app works correctly on the Z Fold, it works everywhere.
12 Kasa Smart Plugs with Home Assistant. Every significant piece of hardware in the lab is plugged into a Kasa smart plug monitored by Home Assistant. This gives us real-time power consumption data for every device, historical usage patterns, and the ability to remotely power-cycle any piece of equipment. When the dual 5090 workstation pulls 850 watts under full load, we see it in real time. When a Docker container hangs and a service needs a hard restart, we can cycle the power remotely without being physically present.
The power monitoring also feeds into our cost calculations. We know exactly how much electricity each piece of hardware consumes under different workloads, which means our cost comparisons between local inference and cloud API pricing are based on measured data, not estimates.
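The cost side of that comparison is plain arithmetic once the wattage is measured. A small example, using the workstation's measured 850-watt full-load draw; the $0.15/kWh rate and the 8-hour duty cycle are assumptions for illustration, not our actual tariff or schedule:

```python
def monthly_cost(watts: float, hours_per_day: float,
                 rate_per_kwh: float = 0.15, days: int = 30) -> float:
    """Electricity cost: power in kW, times hours run, times the per-kWh rate."""
    return watts / 1000 * hours_per_day * days * rate_per_kwh

# Dual-5090 workstation at full load, 8 hours a day for a month:
# about $30.60 at the assumed rate
print(round(monthly_cost(850, 8), 2))
```

Run the same arithmetic against a cloud provider's per-hour GPU pricing and the comparison writes itself.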
Why This Matters for Clients
This is not a hardware flex post. Every piece of equipment in this lab exists because it makes the work we do for clients better, faster, more secure, or more reliable.
The Flipper Zero means we actually test security instead of assuming it. The 3D printer means we can prototype and iterate physical components in hours instead of weeks. The dual 5090 workstation means client data never leaves our infrastructure and inference costs are fixed regardless of volume. The Mac Studios mean orchestration is reliable and separated from compute-intensive tasks. The testing devices mean client-facing systems work across every form factor. The power monitoring means we have real data on operational costs.
Most agencies outsource all of this. They rent cloud GPUs, use SaaS tools for everything, and charge you for the markup. There is nothing inherently wrong with that model for simple projects. But when you need custom AI systems that handle sensitive data, run on dedicated infrastructure, and integrate with physical hardware, you need a team that builds and maintains the entire stack.
That is what we do. When you hire BDK Studios, you are not hiring consultants who will recommend tools and hand you an implementation guide. You are hiring the team that builds and maintains this entire stack, from the 3D printed server brackets to the Docker containers to the encrypted mesh network that ties it all together.
Want to see what this infrastructure can do for your business? Get in touch and we will show you exactly how it applies to your specific use case.
