September 5, 2024

Researching cost-efficient search methods

We partnered with a research company to explore ways of consolidating various forms of data and building efficient, low-cost search capabilities on top of them.

Goals

The primary goal of this project was to research various methods of consolidating data and enhancing search capabilities for different types of information. The research explored ways to integrate and search through data in a cost-efficient manner using advanced technologies, such as text representations, image hashes, and vector-based search methods. Additionally, the project aimed to develop a robust system for protecting data using a custom-built image signature and perception-based authentication solution.

Challenges

This project posed several challenges, mostly because the technologies were new and the work was experimental. Developing separate search methods for different data types, such as text, images, and vectors, required significant effort and experimentation. Integrating the databases with Spring Boot added further complexity, particularly for Milvus, whose vector search functionality had to work smoothly within the Spring ecosystem.

Software Consulting and Development

Using Spring Boot as the foundation, we developed a system that consolidates diverse forms of data, from images to text-based information, and supports efficient searches across these data types. Our team created multiple services, including FastAPI services for image processing and AI model handling, alongside a custom integration with Milvus for vector-based searching. ArangoDB served as the primary graph database, modeling complex relationships between different data types and offering flexible search options.

We implemented an image processing pipeline that supports image imports and searches based on segments of images. Classification models enable searches not only by whole image but also by similar segments. The custom-built image signature and perception-based authentication system protects the data entering the system.
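To illustrate the general idea behind a perception-based image signature, here is a minimal sketch of an average hash. This is a well-known stand-in technique, not the project's custom signature, whose details are not described here. The function names and the tiny 2x2 "images" are assumptions made for illustration; a real pipeline would first resize an image to a small grayscale grid.

```python
from typing import List


def average_hash(pixels: List[List[int]]) -> int:
    """Build a bit signature: 1 where a pixel is at or above mean brightness.

    `pixels` is a small grayscale grid (hypothetical input format).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


# Illustrative data only: an image, a uniformly brighter copy, and an unrelated one.
original = [[10, 200], [220, 30]]
brighter = [[30, 220], [240, 50]]
different = [[200, 10], [30, 220]]

same_signature = average_hash(original) == average_hash(brighter)
distance_to_different = hamming_distance(
    average_hash(original), average_hash(different)
)
```

Because the hash depends only on each pixel's brightness relative to the mean, a uniformly brightened copy keeps the same signature, while an unrelated image ends up far away in Hamming distance.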

For text data, the system supports various representations, such as transcripts, forum posts, and dictionaries. We tied text data to its dictionary meanings, enabling searches based on synonyms and related words. This capability enhances the search experience by allowing users to query data in a more flexible manner.
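A synonym-aware search of this kind can be sketched as query expansion: each query word is widened to include its dictionary synonyms before matching. The synonym table, documents, and function names below are hypothetical, and the simple whitespace matching stands in for whatever full-text engine the system actually uses.

```python
from typing import Dict, List, Set

# Hypothetical synonym dictionary; a real system would load this from stored data.
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "fast": {"quick", "rapid"},
}


def expand_query(query: str, synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the query words plus any synonyms listed for them."""
    terms: Set[str] = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= synonyms.get(word, set())
    return terms


def search(query: str, documents: List[str], synonyms: Dict[str, Set[str]]) -> List[str]:
    """Return documents that share at least one word with the expanded query."""
    terms = expand_query(query, synonyms)
    return [doc for doc in documents if terms & set(doc.lower().split())]


documents = [
    "A quick automobile review",
    "Notes on gardening",
    "The car was red",
]
matches = search("fast car", documents, SYNONYMS)
```

Here the query "fast car" matches the first document even though neither query word appears in it, because "quick" and "automobile" are listed as synonyms.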

Milvus and ArangoDB Integration

The project used two databases: Milvus for vector storage and search, and ArangoDB as the graph database. Milvus stored the image vectors and searched them efficiently, while ArangoDB handled the graph relationships between different data points. We developed a custom integration between Spring Boot and Milvus to expose these vector-based search capabilities to the rest of the system.
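As a rough sketch of what a vector search computes, the following plain-Python example ranks stored vectors by cosine similarity to a query vector. It does not use the Milvus API; a vector database typically relies on approximate indexes rather than scanning every vector. The identifiers, the two-dimensional vectors, and the function names are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], vectors: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical image embeddings keyed by an image identifier.
image_vectors = {
    "img-1": [1.0, 0.0],
    "img-2": [0.9, 0.1],
    "img-3": [0.0, 1.0],
}
nearest = top_k([1.0, 0.05], image_vectors, k=2)
nearest_ids = [key for key, _ in nearest]
```

In a setup like the one described, the identifiers returned by the vector search could then be used as keys to look up related records in the graph database.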

Results

By the end of the project, we had created a system capable of importing any image and conducting searches based on image segments, using classification models to enhance search accuracy. The system also supports the import and full-text search of various text-based data types, such as transcripts and dictionaries. Through extensive benchmarking, we identified a cost-effective combination: ArangoDB for graph-based data management and Milvus for vector-based storage, which balanced features, availability, and cost. We also developed a minimal web interface for testing image search functionalities.
