Improve graph: entity types, traversal, ingestion pipeline, REST API, tests, and scoring #1139
Open
ChristianKniep wants to merge 3 commits into MemMachine:main
Purpose of the change
Adds a knowledge graph layer to MemMachine, enabling multi-hop relationship traversal, entity-typed nodes, semantic feature relationships, graph analytics, and node deduplication on top of the existing Neo4j vector store. This allows the system to answer queries that require following connections across memories — for example, discovering that Bob is a TensorFlow expert and Project Atlas uses TensorFlow — rather than relying solely on vector similarity scoring.
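The Bob / TensorFlow / Project Atlas example can be illustrated with a toy breadth-first traversal. This is only a sketch of the multi-hop idea in plain Python, not the PR's Neo4j implementation; the relationship names `EXPERT_IN` and `USES` and the `multi_hop` helper are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relationship, neighbour).
# The relationship names are illustrative and not taken from the PR.
GRAPH = {
    "Bob": [("EXPERT_IN", "TensorFlow")],
    "Project Atlas": [("USES", "TensorFlow")],
    "TensorFlow": [],
}


def undirected_neighbours(graph):
    """Build an undirected adjacency map so edges can be followed both ways."""
    adj = {node: set() for node in graph}
    for src, edges in graph.items():
        for _rel, dst in edges:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
    return adj


def multi_hop(graph, anchor, max_hops):
    """Return {node: hop_count} for nodes reachable from anchor within max_hops."""
    adj = undirected_neighbours(graph)
    hops = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(adj[node]):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[anchor]  # the anchor itself is not a result
    return hops


reachable = multi_hop(GRAPH, "Bob", max_hops=2)
```

With two hops, Bob reaches Project Atlas through the shared TensorFlow node, a connection that a pure vector-similarity lookup on "Bob" would not necessarily surface.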
Description
This PR introduces end-to-end knowledge graph capabilities across the storage, application, and API layers:
Graph infrastructure (`neo4j_vector_graph_store.py`, `data_types.py`, `graph_traversal_store.py`):
- `RELATED_TO` edges are created between semantically similar features during ingestion, controlled by a configurable cosine-similarity threshold (`related_to_threshold`, default `0.70`)
- Entity types are stored as labels (`ENTITY_TYPE_Person`, `ENTITY_TYPE_Concept`, `ENTITY_TYPE_Event`, etc.) and exposed as a filter parameter on the search API
- Node deduplication with `SAME_AS` proposals and merge/dismiss resolution via the API
- `RELATED_TO` edge suppression at `similarity >= 0.99` to avoid noise

Semantic ingestion pipeline (`semantic_ingestion.py`, `semantic_relationship_storage.py`, `feature_relationship_types.py`):
- Feature relationship types: `RELATED_TO`, `CONTRADICTS`, `IMPLIES`, `SUPERSEDES`
- `SemanticRelationshipStorage` protocol exposes relationship CRUD and contradiction detection

Episode store deduplication (`episode_sqlalchemy_store.py`):
- `content_hash` column (SHA-256 of `session_key + producer_id + content`) with `ON CONFLICT DO NOTHING` upsert on both PostgreSQL and SQLite
- `Episode.is_new` flag allows callers to distinguish newly inserted episodes from deduplicated returns

REST API (`graph_router.py`, ~1,900 lines), new `/memories/graph` route group:
- `POST /memories/graph/search/multi-hop`: multi-hop traversal from an anchor node
- `POST /memories/graph/search/filtered`: graph-filtered vector similarity search
- `POST /memories/graph/relationships`: create typed feature relationships
- `POST /memories/graph/relationships/get`: query relationships with direction and confidence filters
- `POST /memories/graph/relationships/delete`: delete a specific relationship
- `POST /memories/graph/contradictions`: find all `CONTRADICTS` pairs within a feature set
- `POST /memories/graph/dedup/proposals`: list duplicate node proposals
- `POST /memories/graph/dedup/resolve`: merge or dismiss duplicate pairs
- `POST /memories/graph/analytics/pagerank`: compute PageRank (requires GDS)
- `POST /memories/graph/analytics/communities`: Louvain community detection (requires GDS)
- `POST /memories/graph/analytics/stats`: graph statistics (node/edge counts, degree, type distribution)
- `POST /memories/graph/analytics/shortest-path`: shortest path between two nodes
- `POST /memories/graph/analytics/degree-centrality`: degree centrality ranking
- `POST /memories/graph/analytics/betweenness`: betweenness centrality (requires GDS)
- `POST /memories/graph/analytics/subgraph`: ego-graph/subgraph extraction

Configuration (`database_conf.py`, `configuration/__init__.py`):
- `gds_enabled`, `gds_default_damping_factor`, `gds_default_max_iterations`, `pagerank_auto_enabled`, `pagerank_trigger_threshold`, `dedup_trigger_threshold`, `dedup_embedding_threshold`, `dedup_property_threshold`, `dedup_auto_merge`
- `related_to_threshold`

Migration utilities (`neo4j_migration.py`):
- `audit_duplicate_uids`, `resolve_duplicate_uids`, `apply_uniqueness_constraints`, `backfill_entity_type_labels`

Documentation (`docs/open_source/graph.mdx` + four experiment pages)

Bruno collection (`tools/bruno/`)

Dependencies: No new runtime Python dependencies. The Neo4j GDS plugin is optional; only the analytics endpoints marked "requires GDS" depend on it, and all other endpoints work with a vanilla Neo4j instance.
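The episode deduplication described above (a SHA-256 `content_hash` plus an `ON CONFLICT DO NOTHING` insert) can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not the code in `episode_sqlalchemy_store.py`: the table layout, the `add_episode` helper, and the plain string concatenation of the three fields are all hypothetical.

```python
import hashlib
import sqlite3


def content_hash(session_key, producer_id, content):
    """SHA-256 over session_key + producer_id + content.

    Plain concatenation without a separator is an assumption read off
    the PR description; the real store may encode the fields differently.
    """
    raw = f"{session_key}{producer_id}{content}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def add_episode(conn, session_key, producer_id, content):
    """Insert an episode and return (content_hash, is_new)."""
    digest = content_hash(session_key, producer_id, content)
    cur = conn.execute(
        "INSERT INTO episodes (session_key, producer_id, content, content_hash) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (content_hash) DO NOTHING",
        (session_key, producer_id, content, digest),
    )
    # rowcount is 1 when a row was inserted and 0 when the conflict
    # clause skipped the insert, which mirrors an is_new style flag.
    return digest, cur.rowcount == 1


# Hypothetical table layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes ("
    "id INTEGER PRIMARY KEY, session_key TEXT, producer_id TEXT, "
    "content TEXT, content_hash TEXT UNIQUE)"
)

first = add_episode(conn, "session-1", "alice", "Bob is a TensorFlow expert")
second = add_episode(conn, "session-1", "alice", "Bob is a TensorFlow expert")
row_count = conn.execute("SELECT COUNT(*) FROM episodes").fetchone()[0]
```

The second call hits the unique constraint, so no new row is written and the caller can tell from the flag that the episode already existed.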
Fixes/Closes
Fixes #(issue number)
Type of change
How Has This Been Tested?
- `test_neo4j_knowledge_graph.py`: `RELATED_TO` edge creation, traversal
- `test_neo4j_knowledge_graph_integration.py`
- `test_neo4j_pagerank_pipeline.py`
- `test_neo4j_shortest_path.py`
- `test_neo4j_subgraph_extraction.py`
- `test_neo4j_degree_centrality.py`
- `test_neo4j_betweenness_centrality.py`
- `test_neo4j_cross_collection_traversal.py`
- `test_neo4j_gds_refinements.py`
- `test_neo4j_graph_stats.py`
- `test_graph_data_types.py`
- `test_episode_dedup.py`
- `test_neo4j_feature_relationships_integration.py`
- `test_neo4j_graph_relationships_integration.py`
- `test_neo4j_utils.py`
- `test_semantic_memory_graph_enrichment.py`
- `test_semantic_prompt_template.py`
- `test_declarative_memory_entity_types.py`
- `test_declarative_memory_graph_search.py`
- `test_graph_router.py`
- `test_graph_integration.py`

Test Results: All unit tests pass locally. Integration tests require a running Neo4j instance. GDS analytics tests additionally require the Neo4j GDS plugin.
Checklist
Maintainer Checklist
Screenshots/Gifs
N/A
Further comments
- The GDS-backed analytics endpoints (`/analytics/pagerank`, `/analytics/communities`, `/analytics/betweenness`) require the Neo4j Graph Data Science plugin and `gds_enabled: true` in the Neo4j configuration. All other graph endpoints work with a standard Neo4j instance.
- `RELATED_TO` edges with `similarity >= 0.99` are suppressed to avoid near-duplicate noise between identical or near-identical semantic features.
- Multi-hop scores decay by `score_decay^d` (default `score_decay = 0.7`), and paths crossing low-similarity `RELATED_TO` edges are penalised via the `path_quality` field on `MultiHopResult`.
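The two numeric rules in these notes can be sketched in a few lines. The snippet assumes that `d` is the hop distance from the anchor node, that the lower threshold is inclusive, and that the decay multiplies a base similarity score; none of these is confirmed by the description. The helper names and the `suppress_at` parameter are hypothetical, and the `path_quality` penalty is not modelled because its formula is not given.

```python
def should_create_related_to(similarity, related_to_threshold=0.70, suppress_at=0.99):
    """Return True when a RELATED_TO edge would be created for this similarity.

    Edges below the threshold are too weak; edges at or above suppress_at
    are treated as near-duplicates and skipped.
    """
    return related_to_threshold <= similarity < suppress_at


def decayed_score(base_score, depth, score_decay=0.7):
    """Scale a base score by score_decay ** depth (depth assumed to be hop count)."""
    return base_score * score_decay ** depth


edge_decisions = {s: should_create_related_to(s) for s in (0.50, 0.85, 0.99, 0.995)}
two_hop_score = decayed_score(1.0, depth=2)
```

Under these assumptions a node two hops from the anchor keeps roughly half of its base score (0.7 squared is 0.49), so distant matches rank below direct ones unless their base similarity is much higher.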