Hello.
I have been using Findx as a secondary search engine for a few months. For a while now I have noticed that certain searches need to be exact in order to return proper results. For example, if I type Mesa3D.org, I see a link to the Mesa3D.org homepage, but if I type just Mesa3D, the Mesa3D homepage does not show up. The same thing happens when I search for GitHub.
Are there any plans to improve search results like this?
|
Hi,
thank you for your interest in Findx!
Yes, we have discussed how to improve queries like this, and your request has actually prompted us to start developing this feature already.
We will give domain matches a ranking boost, but only for single-term queries. I know Google used to boost domains matching concatenated queries, and it opened up a can of worms due to "spamming": creative marketers registered long domains like bestflatscreentv.com (not an actual example) to match the query "best flat screen tv". We think the risk is limited when we only enable the boost for single-term queries.
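Roughly, the heuristic could look like the sketch below (illustrative Python only, not our actual ranking code; the boost factor and the domain handling are made-up placeholders):

```python
# Illustrative sketch of the single-term domain boost described above.
# The boost factor and domain handling are placeholders, not real Findx code.
from urllib.parse import urlparse

DOMAIN_MATCH_BOOST = 2.0  # hypothetical multiplier

def score_with_domain_boost(query: str, url: str, base_score: float) -> float:
    terms = query.strip().lower().split()
    if len(terms) != 1:
        # Multi-term queries are never boosted, so "best flat screen tv"
        # cannot be gamed by registering bestflatscreentv.com.
        return base_score
    host = (urlparse(url).hostname or "").lower()
    host = host.removeprefix("www.")
    # "mesa3d" should match both "mesa3d.org" and "www.mesa3d.org".
    if terms[0] == host or terms[0] == host.split(".")[0]:
        return base_score * DOMAIN_MATCH_BOOST
    return base_score

# score_with_domain_boost("mesa3d", "https://www.mesa3d.org/", 1.0)       -> 2.0
# score_with_domain_boost("mesa3d docs", "https://www.mesa3d.org/", 1.0)  -> 1.0
```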
I'll update this issue once the feature is in place and our index has been updated.
|
I do not remember whether I have written about it in some of the Privacore/FindX issue reports before, but my current (2018_07) belief is that just like humans understand text differently, search engines can also analyze and classify texts differently.

The Core of the Matter (The Frame Problem)
One way to understand "the context" is that a text T_1 is an element of some container C_1, which can in turn reside in some other container C_2, which can in turn reside in another container C_3, and so on; the containers C_1, C_2, C_3, ... are the context of the text T_1. For example, if the T_1 is the phrase "I love You", then the C_1 that changes the meaning of the T_1 might be that the T_1 is said not in real life, but by an actor to another actor in a movie. The C_2 that contains the C_1 might be that the characters that the actors play in that movie are Americans, not Estonians, which renders the "I love You" to an Estonian-specific meaning. So, C_1 changes the meaning of the T_1 and the C_2 changes the meaning again.
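A crude way to model that nesting in code is the following sketch (purely my own illustration; the context functions are made-up toys):

```python
# Illustrative sketch of the T_1-inside-C_1-inside-C_2-... nesting.
# Each context is modeled as a function that may change the interpretation
# produced by the contexts inside it; the example contexts are made up.

def interpret(text, contexts):
    """Apply contexts innermost-first: C_1, then C_2, then C_3, ..."""
    meaning = text
    for context in contexts:
        meaning = context(meaning)
    return meaning

c1 = lambda m: m + " (said by an actor in a movie, not in real life)"
c2 = lambda m: m + " (the characters are Americans, not Estonians)"

print(interpret("I love You", [c1, c2]))
```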
It is not known how many contexts there are, but the number of contexts that a human is able to understand is limited, and the more intelligent the human, the more contexts he or she can notice and understand. I do not know the meaning or story behind the Matryoshka doll, but maybe it has something to do with the fact that context changes the meaning.

Real Life Example

That also explains why Google Translate will never work as well as a human translator does, unless Google Translate sends a robot, an android, to live among humans to GATHER INFORMATION that describes the context, or obtains that information by some other means, maybe by derivation and analysis of huge data sets. Computational-power-wise a few Google datacenters match the human brain pretty well; what is lacking is the information and the speed of information exchange. The datacenters have the problem that, since the speed of light is limited, signal transfers have physical-distance-related delays: it takes electrical signals/light time to travel from one place to another, and the human brain has much smaller physical dimensions than the huge Google datacenters have.

-----sidenote---start----
The same problem occurs with physically large microchips: it takes time for a signal to travel from one corner of a die to another, although, as I explained in a post titled "Multi-core CPU Production Economics", due to production economics the future multi-core CPUs with thousands of CPU-cores will likely be multi-die chips.
-----sidenote---end------
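To put rough, illustrative numbers on those distance related delays (a back-of-the-envelope sketch; the distances are made-up examples and the propagation speed is only a rule of thumb):

```python
# Back-of-the-envelope sketch of the distance-related signal delays mentioned
# above and in the sidenote. Signals in copper/fibre propagate at very roughly
# 2/3 of the speed of light in vacuum, i.e. about 2e8 m/s.
PROPAGATION_SPEED = 2.0e8   # m/s, rough figure

def one_way_delay_ns(distance_m: float) -> float:
    return distance_m / PROPAGATION_SPEED * 1e9

print(one_way_delay_ns(0.02))   # ~0.1 ns across a 2 cm die
print(one_way_delay_ns(0.15))   # ~0.75 ns across a human-brain-sized distance
print(one_way_delay_ns(200))    # ~1000 ns across a 200 m datacenter hall
```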
A Few Wild, Possibly Flawed Ideas on How to Approach This

Definitions

For the sake of simplicity, let's define the Internet to be a FINITE set of plain text documents, a Universal Set, and a search engine as a function that selects a subset of those documents. In practice the idea that the Internet is a finite set of documents actually holds for the end user of a classical, non-P2P search engine, because those search engines, including Google and FindX and Bing, have a finite index. Even if there are multiple search engines at play, the idea that the number of documents is finite for the end user still holds, because the Universal Set of documents may consist of all of the documents that the search engines have crawled, all of the links that are described in the crawled documents, and the documents that the end user has obtained by other means. The assumption is that the data size of a single document is always finite.

A search engine can be seen as a query-parametrized FILTER that is applied to ALL DOCUMENTS in the Universal Set of documents. (Maybe a more intuitive example is that, in the case of corporate information systems, the automatically generated reports are essentially predefined filters combined with some calculations that are applied to the data that the filter has selected. The search engine parameters, "queries", can be so complex that they have to be saved for later re-use.)
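Written down as code, those definitions are almost trivially short (an illustrative sketch of the definition above, not of any real engine):

```python
# Sketch of the definitions above: the "Internet" is a finite set of plain
# text documents and a search engine is a query-parametrized filter over it.

def search_engine(universal_set: set[str], query_predicate) -> set[str]:
    """Return the subset of documents selected by the query."""
    return {doc for doc in universal_set if query_predicate(doc)}

universal_set = {"Mesa3D is a graphics library.",
                 "GitHub hosts git repositories.",
                 "A page about cooking."}

# A saved, reusable "query" in the sense described above:
mesa_query = lambda doc: "mesa3d" in doc.lower()

print(search_engine(universal_set, mesa_query))
```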
An Ideal Solution

As the meaning depends on the context and the context is search engine end user specific, an ideal search result, an ideally selected subset of the Universal Set of text documents, is user specific. A half-user-specific solution might be that when a doctor uses a medical term as a search query, he or she gets subfield-specific documents as the search result; if a pharmacist uses the same medical term, the search results might consist of drugs that are used in the treatment of the issues related to that medical term; and if a non-medical professional searches for that term, the search results might consist of general discussions about the medical term, links to home pages of fine doctors, links to warnings about shoddy doctors, etc. As a single person can have many roles, one query parameter might be the role of the person.
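The role-as-a-query-parameter idea might be sketched like this (illustrative only; the roles and the ranking hints are made up by me):

```python
# Sketch of the half-user-specific idea: the same term, re-ranked according
# to the role of the person issuing the query. Roles and hints are made up.

ROLE_HINTS = {
    "doctor":     ["study", "subfield", "clinical"],
    "pharmacist": ["drug", "dosage", "treatment"],
    "layperson":  ["explained", "discussion", "warning"],
}

def role_aware_rank(results: list[str], role: str) -> list[str]:
    hints = ROLE_HINTS.get(role, [])
    def score(doc: str) -> int:
        return sum(hint in doc.lower() for hint in hints)
    return sorted(results, key=score, reverse=True)

results = ["Clinical study of term X",
           "Drug dosage for term X",
           "Term X explained for patients"]
print(role_aware_rank(results, "pharmacist"))  # drug/dosage document first
```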
The Mad Idea: a Distributed P2P Search Engine That Indexes Only Local Documents

I won't repeat myself here, but I have described part of the idea at my Silktorrent Fossil repository (archival copy). The distributed P2P search engine idea is not new; for me the inspiration has been YaCy. However, because the search results should be search engine end user specific, there need to be end user specific indexes, which means that the Universal Set of text documents needs to be fully indexed for (almost) every end user at least once and then re-indexed in an end user specific manner after the end user has intellectually changed. For example, the teenage me probably needs different search results than the adult me. A student studying topic TOPIC_01 probably needs beginner tutorials and early, introductory scientific papers about the TOPIC_01, but a student who has completed the course on the TOPIC_01 probably needs a different set of scientific papers about the TOPIC_01, maybe later advancements and derivatives of the TOPIC_01.
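The per-end-user (re)indexing could be sketched roughly like this (my own illustration; the "user context" model below is a deliberately naive placeholder):

```python
# Sketch of per-end-user (re)indexing: the same Universal Set of documents,
# indexed with a user-specific weighting that changes as the user changes.
# The "user context" model (a bag of interest terms) is a naive placeholder.

def build_user_index(universal_set, user_interest_terms):
    """Map each document to a user-specific relevance weight."""
    index = {}
    for doc in universal_set:
        weight = sum(term in doc.lower() for term in user_interest_terms)
        index[doc] = weight
    return index

docs = {"Beginner tutorial on TOPIC_01",
        "Recent advanced papers deriving from TOPIC_01"}

student_before_course = build_user_index(docs, ["beginner", "tutorial", "introductory"])
student_after_course  = build_user_index(docs, ["advanced", "deriving", "recent"])
# The user has "intellectually changed", so the whole collection gets
# re-indexed with the new, user-specific weights.
```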
To my current knowledge, the only way to do that scalably is to use some sort of scalable P2P system. Maybe it does not need to be the kind of P2P system where the end users run the nodes. After all, the 2018 Internet is a P2P system where the Internet Service Providers, as commercial entities, have teamed up to form the physical P2P network that is "the internet" (IPv4/IPv6). Maybe, instead of a YaCy-like solution where every person individually runs a P2P node, each household or a bigger house will host a local server room that is paid for as part of the house utilities. Server rooms of different multi-flat houses may form a P2P network, and maybe there are only a few gateways of that P2P network per neighborhood.

---sidenote--start--
In Estonia every smaller village has just a few, maybe just one, optical cables anyway. Those cables carry the whole traffic of the village, which means that physically the P2P-network-forming hardware tends to get pooled into local regional central points of failure anyway. Another example of such a local central point of failure is the household's internet connection: if it goes down, all family members are without a proper, non-mobile, internet connection. In 2018 Estonia the Toompea Supermafia, so to speak the Government, the State, has an optical network that covers literally every village, and the ISPs, including the companies that are considered big in the context of Estonia's ~1.4 million inhabitants, either run their own optical network or rent those optical cables from the Supermafia, so that broadband internet is affordable all around 2018 Estonia, even in some deep forest, if that's where one wants to live. By affordable I mean about 30€/month with at most about a 2000€ initiation/connecting fee in 2018 EURs, regardless of location. The overall loot/tax rate in 2018 for common Estonians is about 60%, so the Toompea Supermafia, the Government, can afford to "compensate" the optical cabling cost that exceeds the few-thousand-EUR initiation fee, but for most rural Estonians the initiation fee is at most a few hundred EUR, probably even less, depending on the circumstances.
---sidenote--end----
So, part 1 of the mad idea was that the indexed data should be stored locally for re-indexing according to the context of every person, and that there is a scalable and affordable way to do that by using a P2P system that can be maintained by professionals, so that non-IT-people do not have to bother with the maintenance of the P2P search engine. Part 2 of that mad idea is that the candidate systems for storing and sharing the local document collections are the ZeroNet, the IPFS and the Dat/Beaker.
My 2018 favorite is the ZeroNet, but as of 2018_07 it is NOT READY for prime time. According to my very subjective 2018_07 opinion, the IPFS is a failure in terms of implementation technology choice, because the Go compiler is hard to get working, hard to compile. (One version of the Go compiler needs an earlier version of the Go compiler to compile, which in turn needs an even earlier version of Go to be available, etc. If it can't be bootstrapped easily, then it can't be ported easily, and that is enough to disqualify Go for me.) Interestingly, as of 2018_07 the IPFS developers have worked on reimplementing the IPFS in JavaScript, but NodeJS is certainly NOT anything lightweight to run as a background process. The Beaker/Dat people have avoided the Go blunder and the Beaker/Dat is a serious contender, but at some point I suspected that they might lack in modularity by depending heavily on web browser integration. I hope that I'm mistaken about the Dat/Beaker. As the IPFS people have the social issue that they have to show something for the money of their supporters, the Dat/Beaker project and the ZeroNet project are socially much better positioned to scrap failed attempts, break backwards compatibility and create a clean implementation from scratch. That's why I think that the Dat/Beaker and the ZeroNet have a much better chance of creating a quality-oriented solution that will really scale, once they get their work completed to a state where the projects can technically withstand mainstream adoption. My 2018_07 favorite is the ZeroNet, and as of 2018_07 I think that the ZeroNet is the ideal candidate for maintaining the local document collection that a local P2P search engine node should index.
As regards projects like the Privacore and the FindX: as of 2018_07 I believe that one business model for the FindX/Privacore might be ... or the hosting of small, dedicated, private, P2P search engine nodes. As of 2018_07 the Gigablast author seems to have moved from server sales to a donation based financing/"business" model. (Quotes, because I can't call the gathering of donations a business, even if it brings in a lot of money, unless it is a public service, in which case the money transfers should NOT be called donations, but voluntary-keep-running-payments. I guess the Gigablast.com 2018_07 money gathering qualifies as voluntary-keep-running-payments.)

Future Services to Search Engine Runners
Just like the car industry does not produce paint, rubber or metal itself, the paint production being outsourced to the chemical industry, the search engine industry may divide its task into subtasks and outsource some of them. For example, one sub-task might be the collecting of links. There might be a Linux Foundation like, Apache Foundation like or Eclipse Foundation like non-profit that is jointly financed by many search engine providers, and that non-profit does only one thing: it creates a gigantic collection of links, without indexing anything. The link collection does not include any duplicate entries. There is absolutely NO LABELING; it is only a raw collection of links. Different search engines might use the same collection of raw links and index the documents by using different contexts. One indexes the documents according to the context of doctors, another according to the context of logistics specialists, etc.
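The jointly financed raw link collection could be as dumb as this sketch suggests (illustrative; the URL normalization rules here are made-up simplifications):

```python
# Sketch of the shared, label-free link collection: nothing but a
# deduplicated set of normalized URLs that different search engines can
# then index under different contexts. Normalization here is simplistic.
from urllib.parse import urlparse, urlunparse

def normalize(url: str) -> str:
    p = urlparse(url)
    return urlunparse((p.scheme.lower(), p.netloc.lower(),
                       p.path.rstrip("/") or "/", "", p.query, ""))

class RawLinkCollection:
    def __init__(self):
        self._links = set()            # no labels, no ranking, just links

    def add(self, url: str) -> None:
        self._links.add(normalize(url))

    def dump(self):
        return sorted(self._links)

collection = RawLinkCollection()
collection.add("https://www.Mesa3D.org/")
collection.add("https://www.mesa3d.org")   # duplicate after normalization
print(collection.dump())                    # one entry, not two
```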
Instead of a single "Bing" or "Google", people would pick a specialized search engine, much like some already existing specialized search engines. The specialized search engines might finance themselves by collecting voluntary-payments-to-keep-running (hereafter: VP2KR) and they may limit their services according to the IP-address ranges that were indicated in the VP2KR money transfers. Each VP2KR money transfer may include a region indicator, like "Estonia" or "town Foo", in its comment field, and then all of the resources of the search engine nonprofit are distributed according to the distribution that forms from the VP2KR money transfers. Those money transfers that do not indicate a specific region go to the "global serving" pool of the search engine.
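The region tagged VP2KR bookkeeping reduces to simple proportional arithmetic, roughly as in this sketch (the payment records and the pool name are invented examples):

```python
# Sketch of distributing a search engine's serving capacity in proportion
# to region-tagged voluntary-payments-to-keep-running (VP2KR).
from collections import defaultdict

payments = [
    ("Estonia", 30.0),
    ("town Foo", 10.0),
    ("", 60.0),          # no region in the comment field -> global pool
]

def capacity_shares(payment_records):
    totals = defaultdict(float)
    for region, amount in payment_records:
        totals[region or "global serving pool"] += amount
    grand_total = sum(totals.values())
    return {region: amount / grand_total for region, amount in totals.items()}

print(capacity_shares(payments))
# {'Estonia': 0.3, 'town Foo': 0.1, 'global serving pool': 0.6}
```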
As long as humanity uses deception, there will always be censorship, in various ways (archival copy). There will always remain a niche market for indexing those documents that are somehow banned, be it due to private interests or supermafia ("State" in newspeak terms) demands. As long as some members of humanity are superficial, plain stupid or lazy at educating themselves with self-sought-out materials, deception will be successful, and the work of Public Relations (read: lying for money) specialists and mainstream journalists (for whom almost anything goes, as long as it pays well) will not run out. As long as the supermafia ("State") exists, nonprofits that run public search engines will always have supermafia-induced censorship limits, but even if they serve only those materials that pass the censorship, they lessen the load on private P2P search engine nodes. Basically, in the future people are expected to run their personal search engine aggregators that use the censored, public, search engines, their private engine instances and their neighborhood server room P2P search engine instances.
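Such a personal aggregator is conceptually just a merge over several backends, as in this sketch (the backend names and interfaces are invented for illustration):

```python
# Sketch of a personal search aggregator that merges results from censored
# public engines, a private engine instance and a neighborhood P2P node.
# All backend names and interfaces here are invented for illustration.

def aggregate(query: str, backends) -> list[str]:
    seen, merged = set(), []
    for backend in backends:
        for result in backend(query):
            if result not in seen:
                seen.add(result)
                merged.append(result)
    return merged

public_engine         = lambda q: [f"public result for {q}"]
private_instance      = lambda q: [f"private result for {q}"]
neighborhood_p2p_node = lambda q: [f"neighborhood result for {q}",
                                   f"public result for {q}"]   # duplicate

print(aggregate("TOPIC_01",
                [public_engine, private_instance, neighborhood_p2p_node]))
```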
future "general IT-support". Just like in 2018 there are freelancing plumbers and car repairs specialists, in the future there might be freelancing P2P-search-engine node suppliers, who install a dedicated device on the demands of the "general IT-support". The jobs of applied statisticians, so called "data scientists",
is essentially describing and running mathematically advanced queries. The 2018 "data analytics" companies are essentially specialized search engine companies that run their own, internal, search engines on "small", temporary, "sets of documents"(client data) "one-query-series-at-a-time". As of 2018_07 I guess that the world of applied statisticians and the world of classical search engines can be combined in far more elegant way than the various year 2018 web page analytics and e-shop Artificial Intelligence applications are. I guess that the WolframAlpha is certainly a step to that direction, but there are laso other similar efforts. etc.etc.etc.
Hardware Trends

The trend of reducing power consumption and parallelizing as much as possible fits well together with Andrew Zonenberg's AntiKernel idea (an unofficial collection of materials resides at my repository). As of 2018_07 I suspect that one future development might be that, instead of letting the general purpose operating system's file system drivers handle file-system-related low level details like user permissions, error correction codes, journaling, post-power-failure cleanup, etc., the future "hard disks" (HDDs) might offer a POSIX file system protocol instead of a byte and address based protocol. Just like the modern flash memory cards handle wear leveling (archival copy) transparently, the future "HDDs" may handle all of the file system level errors transparently.
A step forward might be that, instead of a file system, some standardized, SIMPLISTIC database engine is used. That is to say, the communication protocol of the future "HDD" might be the communication protocol of a simplistic, standardized database engine. That idea is partly described at my site (archival copy). By "simplistic" I mean something simpler than SQLite, something where queries always have timing guarantees. The timing guarantees can vary by device, but the worst case timing parameters might be read out of the device, just like the IDs of year 2018 HDDs and CPUs can be read out of the device. The implementation of the database engine might use FPGAs that are programmed by using "High Level Synthesis". If the code is done to safety critical system quality, then software flaws are rare enough to be irrelevant.
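As an interface, such an "HDD" might look something like the following sketch (purely hypothetical; the method names and timing figures are invented):

```python
# Hypothetical sketch of a "smart HDD" that speaks a simplistic database
# protocol instead of a byte/address protocol. Method names and the timing
# figures are invented; the point is the worst-case timing guarantee that
# can be read out of the device, and the ability to push simple filtering
# down to the device itself.

class SmartHDD:
    WORST_CASE_GET_MS = 2                     # guaranteed upper bound, device-specific
    WORST_CASE_SCAN_MS_PER_1000_RECORDS = 5

    def __init__(self):
        self._records = {}                    # device hides errors, wear, journaling

    def put(self, key: str, value: bytes) -> None:
        self._records[key] = value

    def get(self, key: str) -> bytes:
        return self._records[key]

    def select(self, predicate) -> list[str]:
        """Filtering pushed down to the device, e.g. for a search index."""
        return [k for k, v in self._records.items() if predicate(v)]

hdd = SmartHDD()
hdd.put("doc:1", b"Mesa3D is a graphics library.")
print(hdd.select(lambda v: b"Mesa3D" in v))
print(SmartHDD.WORST_CASE_GET_MS)             # timing parameters readable like an ID
```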
implementation that uses data that is stored on such "HDDs". That is to say, when planning for search engine software architecture, then that's probably one thing to watch out for, specially in terms of how to design search algorithms modular enough to allow the leveraging of the computational power of such HDDs. Redesigning search algorithms, indexing algorithms, might be a lot of work, TOO MUCH WORK, if the work pile has accumulated over 10 years or so. The ConclusionNot even Microsoft and Google can afford to scale
Not even Microsoft and Google can afford to scale their search engines to the point where all known documents are re-indexed for every end-user according to an end-user-specific context. Probably only P2P systems can scale to that level. The P2P systems do not have to be run by every end-user individually; there can be small scale clustering of computation resources in the form of neighborhood servers, company servers and household servers. That is to say, it is hopeless for Privacore/FindX to offer proper query results to all end-users. Even Microsoft and Google can not TECHNICALLY do it, even if they wanted to. But it MIGHT be possible to get some sub-task completed in some very future-proof manner, so that the current investment will not be useless for offering future services.

Thank You for reading this "blog post"
:-)
|