AI Services and Tools (3)#

In the previous chapters, we looked at major AI services and APIs. In this chapter, we’ll cover tools that are closer to the “ecosystem”: an open-model platform (Hugging Face), an LLM app framework (LangChain), image / video / music generation tools, developer-focused AI coding tools (Copilot / Cursor), and the trend of integrating AI directly into browsers (AI browsers).


Hugging Face#

In the public model ecosystem, there are many options that go beyond what people typically call “open source,” including open-weight models whose weights are publicly available. Hugging Face is one of the most representative platform-and-community hubs for sharing AI models and datasets. You can discover and download models / data, re-share fine-tuned models, and even try models or host demos in the browser, which makes it easier to compare and choose. It also provides features to help users deploy and serve models on clouds (AWS, Google Cloud, Azure, etc.), often via integrations / partnerships.
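To make the “find a model on the Hub and try it” workflow concrete, here is a minimal sketch assuming the `transformers` library; the checkpoint name is just one commonly used public model, and any Hub model for the same task would work.

```python
# A minimal sketch, assuming the `transformers` package is installed.
# The model ID is an example public checkpoint from the Hub; swap in any model you prefer.
from transformers import pipeline

# Downloads the weights from the Hub on first use, then reuses the local cache.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("Hugging Face makes it easy to try a model before committing to it."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```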

LangChain#

LangChain is a framework / library commonly used to integrate LLMs into applications. It standardizes recurring work such as RAG (document retrieval + answering), prompt templates / chain composition, tool calling, and conversation state management—making it useful when you want to build a “convincing demo” quickly or stitch together a multi-component LLM app. But even with a framework, it’s important to remember that operational concerns like data quality / permissions / logging / evaluation (evals) / safety guardrails still need to be designed intentionally.
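As a rough idea of what “prompt templates / chain composition” looks like in code, here is a minimal sketch assuming the `langchain-core` and `langchain-openai` packages and an OpenAI API key in the environment; the model name is an assumption, not a recommendation.

```python
# A minimal LangChain sketch: prompt template -> model -> plain-text output.
# Assumes `langchain-core`, `langchain-openai`, and an OPENAI_API_KEY environment variable.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the provided context."),
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model name

# Compose the pieces into one callable chain.
chain = prompt | llm | StrOutputParser()

answer = chain.invoke({
    "context": "LangChain standardizes prompts, chains, tool calling, and conversation state.",
    "question": "What does LangChain standardize?",
})
print(answer)
```

The same pattern extends to RAG by swapping the static context for a retriever, which is where this kind of standardization starts to pay off.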

To integrate AI into an in-house application (personal or company), you can either connect APIs (or SDKs) from providers like OpenAI or Gemini directly, or use libraries like LangChain and CrewAI. For production products, it’s hard to say “a library is always the right answer,” but when you have more moving parts—RAG / tool calling / conversation state / orchestration—libraries often help by standardizing repetitive work. On the other hand, for a simple feature, wiring an SDK directly can be simpler, more transparent, and easier to operate. In my case, when building an AI-powered service at work, I considered LangChain vs. CrewAI, and ended up choosing LangChain (and LangGraph) because I found CrewAI less convenient for reflecting detailed requirements.
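For comparison, here is what the “wire the SDK directly” option can look like, as a minimal sketch assuming the official `openai` Python package and an API key in the environment; the model name is again an assumption.

```python
# A minimal direct-SDK sketch: for a single small feature, this can be all you need.
# Assumes the `openai` package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[
        {"role": "system", "content": "Summarize the user's text in one sentence."},
        {"role": "user", "content": "We only need a short summary feature on the admin page."},
    ],
)
print(response.choices[0].message.content)
```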

Image generation tools and services#

For images needed in apps or web services, ad banners, or artwork, you can use AI image generation tools and services. As mentioned earlier, you can generate images with ChatGPT or Gemini, but there are many other options as well. Two common starting points are Stable Diffusion and Midjourney.

Stable Diffusion is a family of image generation models that evolved largely around Stability AI and has had major influence in the public model ecosystem. Thanks to the benefits of open-source / public-model ecosystems, it’s often relatively flexible to use—but depending on the model / weights / distribution format, there may still be licensing and usage conditions (commercial use, redistribution, scope, etc.), so it’s safer to check before using it in a business. You can run it locally or in the cloud, and you can also use it via Stability AI’s online service (e.g., Stable Assistant) or many third-party tools. Midjourney, on the other hand, is a paid online service developed by Midjourney, Inc. Its interface was originally centered on Discord, but it later added access through its own website.
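To show what “run it locally” means in practice, here is a minimal sketch assuming the `diffusers` and `torch` packages and a CUDA GPU; the checkpoint is one example of a publicly hosted Stable Diffusion model, and its license terms should still be checked before business use.

```python
# A minimal local Stable Diffusion sketch, assuming `diffusers`, `torch`, and a CUDA GPU.
# The checkpoint is an example public model; verify its license before commercial use.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate one image from a text prompt and save it to disk.
image = pipe("a watercolor illustration of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```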

When comparing ChatGPT (OpenAI image models), Gemini, Stable Diffusion, and Midjourney, it’s often more practical to focus on user experience / level of control / operating model than to argue about absolute “best.” ChatGPT is convenient for quickly iterating through drafts in a conversational flow. Stable Diffusion–style workflows are easier to customize if you’re willing to set up an environment (local / server) and build a pipeline. Midjourney is often praised for artistic style and the “texture” of its outputs. Still, such evaluations can change with time and model updates. And regardless of tool, it’s still hard to get “exactly the result you want” in one try—so it’s realistic to approach it as an iterative workflow: generate / select / refine (retouch).

AI image generation is one of the most actively used areas in business, and new models and services keep appearing even now, so it matters to explore and pick tools that fit your goals. At least for now, though, it’s still hard to get “the exact intended output in one shot.” It can be extremely useful when you can accept a randomly generated result as-is or use outputs as inspiration, but it’s comparatively difficult to produce data-accurate infographics or to make highly precise “Photoshop-like” edits that converge on a near-perfect answer. I also remember that back in 2024, when AI image generation became a big topic, even simple prompts like “draw three concentric circles” could yield strange results. Things are improving over time, but compared to “drawing ability,” the “ability to produce accurate infographics” still tends to be weaker.

Video generation tools and services#

Video generation is another fast-growing area, much like image generation. As of January 2026, Google and OpenAI have each released video generation models / services such as Veo 3 and Sora. Veo 3 can be used in Gemini, and Sora can be used via the Sora site (https://sora.chatgpt.com). In general, there are still common constraints such as access restrictions by region / account / plan, limits on length, resolution, and speed (credits / quota), and abuse prevention (policy enforcement, filtering, watermarking, etc.)—so if you want to use it for work, it’s best to first check “what my account can do, and where.”

In terms of capability, you can generate short videos from text prompts, and also do “generation + editing” work—like changing style or modifying a specific segment—based on inputs such as images or existing videos. In other words, it’s well suited to an iterative workflow where you generate repeatedly and refine parts to polish the final output.

AI-generated video is already being used across many business areas: film / game scene creation, short ads for social, product demo videos, summary clips from longer videos, and videos where a virtual avatar reads a script with expressions. For that last kind of avatar video, you can find many services by searching keywords like “AI avatar video” or “AI Face Animator.” Research models like VASA-1 and EMO have also been introduced. In practice, a common approach is to use a tool like Synthesia, which provides avatar-based video creation as a service.

Music generation tools and services#

There are also many music generation services. If you search “AI Music Generator,” you’ll find options like Udio and Suno. If you listen to sample tracks, you can feel that quality is quickly moving beyond “demo level” into something usable for real content. You can also generate sound effects using features like ElevenLabs’ Text to Sound effects. That said, terms for music / voice / sound effects vary by service—especially around copyright, licensing, and commercial usage—so if you plan to use outputs in a business, you should check the terms and the allowed scope of generated assets.

Even if you’ve never produced videos or music before, you can now generate videos with AI and also generate matching music and sound effects. If you’re willing to lower the psychological barrier of “I’ve never done this before,” we’re in an era where almost anyone can experiment and create high-quality content.

Copilot#

Copilot literally means “co-pilot.” Rather than a tool that simply executes user commands, the term is commonly used for AI tools and services that work alongside the user and support their workflow. Two representative examples are GitHub Copilot and Microsoft Copilot.

GitHub Copilot has expanded beyond code autocomplete (inline suggestions) inside an IDE/editor into a broader assistant for coding work: explanations, refactoring, guidance on how to fix errors, and generating tests. In other words, it has evolved from merely filling in function names or parameters to providing block-level suggestions and conversational support based on the context of the current file / project.

Microsoft Copilot is not a single product. It has expanded into multiple product lines—for example, helping with writing / summarizing / analysis / presentation building inside Microsoft 365 apps (Word / Excel / PowerPoint), and also chat-based work assistance. So “Microsoft Copilot” is more accurately understood as a brand that spans features inside Office apps as well as tools to build and manage customized Copilots within an organization.

One key difference between Copilot-style tools and older “simple AI assistants” is the product integration that makes them feel like they “understand context.” Older assistants often triggered fixed actions based on short commands (“play music,” “call someone”). In contrast, Copilot-style tools are more deeply integrated into working surfaces like editors, documents, and spreadsheets—drawing on information close to “what I’m doing right now” (current file, selection, table / document structure, related data) without requiring a long explanation from the user. In that sense, “understanding context” doesn’t mean the model gained superpowers; it often means the product has become much better at providing the right context to the model.
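As a purely illustrative sketch (not any vendor’s actual implementation), the idea of “providing the right context to the model” can be as simple as packaging what the user is already looking at; every name below is hypothetical.

```python
# Hypothetical sketch: how an editor-integrated assistant might assemble context.
# None of these names come from a real product; they only illustrate the idea.
from dataclasses import dataclass

@dataclass
class EditorState:
    file_path: str     # the file currently open
    file_excerpt: str  # nearby code, trimmed to fit a token budget
    selection: str     # the text the user highlighted
    user_request: str  # the short instruction the user typed

def build_prompt(state: EditorState) -> str:
    # Most of the "context understanding" is just sending this along with the request,
    # so the user does not have to re-explain what they are working on.
    return (
        "You are assisting inside an editor.\n"
        f"Open file: {state.file_path}\n"
        f"Surrounding code:\n{state.file_excerpt}\n"
        f"Selected text:\n{state.selection}\n"
        f"User request: {state.user_request}\n"
    )

prompt = build_prompt(EditorState(
    file_path="app/report.py",
    file_excerpt="def monthly_report(rows): ...",
    selection="def monthly_report(rows): ...",
    user_request="Add a docstring and handle an empty rows list.",
))
# `prompt` would then be sent to the model together with the rest of the conversation.
```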

Cursor#

While GitHub Copilot was still a fresh shock to many developers, “AI-first IDEs” like Cursor emerged relatively quickly. Cursor can be understood as a tool that puts workflows for “making codebase-level changes” front and center, going beyond code completion / chat on top of a VS Code-like editor experience.

There are similar tools such as Windsurf and Google’s Antigravity. I initially worried about which tool to settle on, but I decided to stick with Cursor—because tools with similar concepts tend to compete and quickly catch up with each other’s feature sets. In other words, if another tool introduces (or popularizes) a new capability, I expect Cursor will likely add something similar soon, so I chose to focus less on tool-switching costs and more on changing my way of working.

When I first started using Cursor, it felt like “just a bit better than Copilot.” But once Agent features arrived, the experience changed dramatically. It moved beyond “a tool that writes the next line” toward an experience where you give requirements and the agent explores the codebase, edits multiple files, and explains changes as needed—closer to “pushing the task to completion.”

The experience of building with Cursor can often be compared to pair programming: Cursor acts as the driver (writing / editing code), and the developer acts as the navigator (setting direction, providing requirements, reviewing results). The key to the navigator role isn’t “letting it write code,” but giving clear goals / constraints / quality criteria and reviewing outcomes until they converge on the intent.

To confirm whether the code actually meets the goal, you can also have the agent write tests and run them to self-check whether “the job is done.” In other words, it encourages a habit of finishing with “verified, working output,” not just something that “looks plausible.”
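As a small, hypothetical illustration, the check can be as plain as ordinary unit tests: you ask the agent to write them, run them (for example with `pytest`), and treat the task as finished only when they pass.

```python
# Hypothetical example of an agent-written check; the function and tests are illustrative.
# Run with `pytest` so "done" means "verified", not "looks plausible".

def slugify(title: str) -> str:
    # The implementation the agent produced (or edited) for the requirement.
    return "-".join(title.lower().split())

def test_slugify_basic():
    assert slugify("Hello World") == "hello-world"

def test_slugify_collapses_whitespace():
    assert slugify("  Hello   World  ") == "hello-world"
```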

In practice, it can also be useful to run multiple agents / sessions in parallel for different tasks. For example, one agent investigates a bug while another strengthens tests or cleans up documentation—similar to a developer doing two things at once (while still requiring a human to do final integration and review).

Finally, I felt Cursor may not remain “only for developers.” Beyond writing code, it can read and modify local files, run commands to automate work, and connect external tools (e.g., via MCP-style integrations) to extend workflows. That suggests there may be room to apply it to non-development work as well—by breaking repetitive tasks into steps and automating them. We’ll cover concrete approaches later in AI Implementation Strategies.

AI browsers#

I realized I spend an enormous amount of time in a web browser for learning and work, and I expected that “how deeply AI is integrated into the browser environment” would become increasingly important. For a while, I was especially interested in attaching AI to the browser as an extension, and I was also building related apps personally.

Around that time, an AI browser called Fellou appeared. Seeing a product designed as an AI-first browser app—rather than as an extension—made me think “so we’re getting entirely new browsers,” and it caught me completely off guard.

If you search for “AI browser” today, you’ll find many browsers / projects. Especially since mid-2025, products with similar concepts have been flooding in, but many are still in beta / invite-only / waitlist states, so there aren’t as many that “anyone can try right away” as you might expect.

In that context, toward the end of 2025, OpenAI released a ChatGPT-integrated browser called Atlas, and I had a chance to try it. There were still some rough edges in polish, but the direction—“browsing itself becomes an AI workflow”—felt compelling, and I had high expectations for what comes next. As of January 2026, it’s still only available as a macOS app, so Windows users need to wait a bit longer.

© 2026 Ted Kim. All Rights Reserved.