Alfresco Document Lifecycle Management
This article explores the key stages of document management in the Alfresco content management system, focusing on document creation, deletion from Alfresco Share, and subsequent removal from the trashcan and content store. It also follows the complete lifecycle of a document created and later deleted in Share, examining the location of its binary file at each stage, the SQL queries used to find it in the database, Solr indexing, and other key technical aspects.
This diagram illustrates the lifecycle of a document in Alfresco, from creation to permanent deletion. Initially, the document resides in the Workspace, with its binary content in the Content Store, metadata in the Database, and an index in the Solr Alfresco Index. Upon deletion, it moves to the ArchiveSpace (trashcan), where its metadata and binary content remain, but it is re-indexed in the Solr Archive Index. If deleted from the trashcan, the metadata is removed, and the binary file becomes orphaned in the Content Store, which is eventually cleaned up by the Content Store Cleaner Job, completing the lifecycle.
Create a new document in Alfresco through Share
A document created in Share is stored in the primary store, workspace://SpacesStore, which is the main, active store used by Alfresco.
Get nodeRef, contentUrl and dbid of the document
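A nodeRef such as workspace://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7 combines three parts: the store protocol, the store identifier, and the node uuid. The dbid is a separate database key and is not part of it. The small Python sketch below splits a nodeRef into those parts; `NodeRef` and `parse_node_ref` are hypothetical helpers for illustration, not part of any Alfresco API.

```python
from typing import NamedTuple


class NodeRef(NamedTuple):
    """The three parts of an Alfresco nodeRef: protocol://identifier/uuid."""
    protocol: str    # "workspace" for live nodes, "archive" for trashcan nodes
    identifier: str  # store identifier, e.g. "SpacesStore"
    uuid: str        # the node's uuid


def parse_node_ref(node_ref: str) -> NodeRef:
    """Split a nodeRef string into protocol, store identifier and uuid."""
    protocol, sep, rest = node_ref.partition("://")
    identifier, slash, uuid = rest.partition("/")
    if not (sep and slash and protocol and identifier and uuid):
        raise ValueError(f"not a nodeRef: {node_ref!r}")
    return NodeRef(protocol, identifier, uuid)


ref = parse_node_ref("workspace://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7")
print(ref.protocol, ref.identifier, ref.uuid)
# workspace SpacesStore 3324c889-509a-44dc-bc4a-cf3566ac9bd7
```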
Location of the binary file in content store
Alfresco references binary files using a contentUrl, which is a logical path stored in the database. For instance: store://2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin. This path is not a direct file system location but can be viewed in the Node Browser for troubleshooting or management.
Base Directory (dir.root):
The primary directory for Alfresco’s data is configured in the alfresco-global.properties file using the dir.root property. For example, this could be set as:
/vm/alfresco-data-01/alf_data
Content Store Directory (dir.contentstore):
Within the base directory, the content store is configured in the repository.properties file with the dir.contentstore property. This typically references a subdirectory inside dir.root, such as:
${dir.root}/contentstore
Physical Path:
Alfresco maps the contentUrl to a file in the content store directory. Using the previous example, the binary file would be stored at:
/vm/alfresco-data-01/alf_data/contentstore/2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin
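The mapping from contentUrl to physical path amounts to stripping the store:// prefix and appending the remainder to the content store directory. A minimal Python sketch, assuming the default file-based content store (other content store implementations may use different URL protocols); `content_url_to_path` is a hypothetical helper:

```python
from pathlib import PurePosixPath

STORE_PREFIX = "store://"


def content_url_to_path(content_url: str, contentstore_dir: str) -> PurePosixPath:
    """Map a store:// contentUrl to its file under the content store directory."""
    if not content_url.startswith(STORE_PREFIX):
        raise ValueError(f"unsupported contentUrl: {content_url!r}")
    # The part after store:// is a path relative to dir.contentstore
    relative = content_url[len(STORE_PREFIX):]
    return PurePosixPath(contentstore_dir) / relative


print(content_url_to_path(
    "store://2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin",
    "/vm/alfresco-data-01/alf_data/contentstore",
))
# /vm/alfresco-data-01/alf_data/contentstore/2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin
```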
Search in Solr console
@cm\:name:"file-rose-001.jpg"
Search in the Node Browser (Share Admin Tools) using a Lucene query
@cm\:name:"file-rose-001.jpg"
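Both searches use the same Lucene property query. The colon inside the QName cm:name is escaped with a backslash so the query parser does not treat it as the field separator, and the value is quoted as a phrase. A small Python sketch that builds such a query; `lucene_property_query` is a hypothetical helper, and it only escapes backslashes and double quotes inside the phrase:

```python
def lucene_property_query(prefix: str, local_name: str, value: str) -> str:
    """Build a query like @cm\\:name:"file-rose-001.jpg" for a property value."""
    # Escape backslashes first, then double quotes, inside the quoted phrase
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # Escape the colon between the namespace prefix and the local name
    return f'@{prefix}\\:{local_name}:"{escaped}"'


print(lucene_property_query("cm", "name", "file-rose-001.jpg"))
# @cm\:name:"file-rose-001.jpg"
```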
SQL to find the document in the database
Query by dbid (PostgreSQL):
SELECT
    n.id AS dbid,
    CONCAT(s.protocol, '://', s.identifier, '/', n.uuid) AS node_ref,
    p.string_value AS document_name
FROM alf_node n
JOIN alf_store s ON n.store_id = s.id
JOIN alf_node_properties p ON n.id = p.node_id
WHERE n.id = 1235 -- Replace with your dbid
  AND p.qname_id = (
      SELECT id
      FROM alf_qname
      WHERE local_name = 'name'
        AND ns_id = (
            SELECT id
            FROM alf_namespace
            WHERE uri = 'http://www.alfresco.org/model/content/1.0'
        )
  );
Query by nodeRef (PostgreSQL):
SELECT
    n.id AS dbid,
    CONCAT(s.protocol, '://', s.identifier, '/', n.uuid) AS node_ref,
    p.string_value AS document_name
FROM alf_node n
JOIN alf_store s ON n.store_id = s.id
JOIN alf_node_properties p ON n.id = p.node_id
WHERE s.protocol = 'workspace'     -- Replace with your store protocol
  AND s.identifier = 'SpacesStore' -- Replace with your store identifier
  AND n.uuid = '3324c889-509a-44dc-bc4a-cf3566ac9bd7' -- Replace with your uuid
  AND p.qname_id = (
      SELECT id
      FROM alf_qname
      WHERE local_name = 'name'
        AND ns_id = (
            SELECT id
            FROM alf_namespace
            WHERE uri = 'http://www.alfresco.org/model/content/1.0'
        )
  );
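When running the nodeRef query from a script, it is safer to pass the nodeRef parts as bind parameters than to paste them into the SQL text. A Python sketch, assuming a DB-API driver that uses the %s placeholder style, such as psycopg2; `FIND_BY_NODE_REF_SQL` and `node_ref_params` are hypothetical names, and the joins on alf_qname and alf_namespace replace the nested subqueries with the same filter:

```python
# Same lookup as the nodeRef query above, with placeholders for the nodeRef parts.
FIND_BY_NODE_REF_SQL = """
SELECT n.id AS dbid,
       CONCAT(s.protocol, '://', s.identifier, '/', n.uuid) AS node_ref,
       p.string_value AS document_name
FROM alf_node n
JOIN alf_store s ON n.store_id = s.id
JOIN alf_node_properties p ON n.id = p.node_id
JOIN alf_qname q ON p.qname_id = q.id
JOIN alf_namespace ns ON q.ns_id = ns.id
WHERE s.protocol = %s
  AND s.identifier = %s
  AND n.uuid = %s
  AND q.local_name = 'name'
  AND ns.uri = 'http://www.alfresco.org/model/content/1.0'
"""


def node_ref_params(node_ref: str) -> tuple[str, str, str]:
    """Split a nodeRef into the (protocol, identifier, uuid) bind parameters."""
    protocol, sep, rest = node_ref.partition("://")
    identifier, slash, uuid = rest.partition("/")
    if not (sep and slash and protocol and identifier and uuid):
        raise ValueError(f"not a nodeRef: {node_ref!r}")
    return protocol, identifier, uuid
```

With psycopg2, for example, `cursor.execute(FIND_BY_NODE_REF_SQL, node_ref_params(ref))` would run it; connection setup is left out. The same function works for archived nodes, since only the protocol part of the nodeRef changes.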
Delete the file from Share
Once the file is deleted in Share, it moves to the trashcan, which is accessible from the Trashcan section of the My Profile page.
Get nodeRef, contentUrl and dbid of the document
@cm\:name:"file-rose-001.jpg"
Archive store: The node now lives in archive://SpacesStore, the store Alfresco uses for soft-deleted nodes waiting in the trashcan. It is a separate node store in the database, not a separate content store, so the binary file is not moved. Note that the protocol part of the nodeRef changed from workspace to archive.
Also note, in the screenshot below, that the dbid changed from 1235 to 1238: archiving creates a new node row in the database rather than keeping the original one.
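Because only the protocol part of the nodeRef changes while the dbid does not survive archiving, the trashcan copy of a document is easier to find by nodeRef than by its old dbid. A Python sketch of that conversion, assuming the default workspace://SpacesStore and archive://SpacesStore stores; `archived_node_ref` is a hypothetical helper:

```python
LIVE_PREFIX = "workspace://SpacesStore/"
ARCHIVE_PREFIX = "archive://SpacesStore/"


def archived_node_ref(node_ref: str) -> str:
    """Return the trashcan nodeRef for a live nodeRef (the uuid is kept)."""
    if not node_ref.startswith(LIVE_PREFIX):
        raise ValueError(f"expected a {LIVE_PREFIX} nodeRef, got {node_ref!r}")
    return ARCHIVE_PREFIX + node_ref[len(LIVE_PREFIX):]


print(archived_node_ref("workspace://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7"))
# archive://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7
```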
Location of the binary file in content store
The location of the binary file in the content store is still the same.
Search in Solr console
@cm\:name:"file-rose-001.jpg"
SQL to find the document in the database
Query by dbid (PostgreSQL):
Deleting Files from the Trashcan in Alfresco
In Alfresco, files in the trashcan can be deleted manually or automatically:
Manual Deletion: Use the Trashcan UI to select and permanently delete files, or use the REST API to delete them programmatically by sending a DELETE request.
Automatic Deletion: Configure the Trashcan Cleaner Job to remove files automatically after a specified retention period. The job runs periodically on a configurable schedule, described below.
Files deleted from the trashcan no longer have references in the database or the Solr index:
Database Cleanup: The node's metadata, associations, and other database references are removed.
Solr Index Cleanup: Solr tracks repository transactions asynchronously, so shortly after a file is permanently deleted, its index entries are removed on the next tracking cycle. However, the binary content of the file remains in the content store until it is handled by the Content Store Cleaner Job, which removes orphaned files (files no longer referenced in the database) after a defined retention period.
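The manual REST deletion mentioned above can be sketched with the Alfresco public REST API v1, where DELETE on /deleted-nodes/{nodeId} permanently removes a node from the trashcan and a 204 response is expected on success. The host, port, and admin/admin credentials below are assumptions for a local test install, and `purge_request` is a hypothetical helper that only builds the request:

```python
import base64
import urllib.request

# Assumed base URL of the public REST API v1; adjust host and port for your install.
API_BASE = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"


def purge_request(node_id: str, user: str, password: str) -> urllib.request.Request:
    """Build a DELETE request that permanently removes a node from the trashcan."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"{API_BASE}/deleted-nodes/{node_id}",
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )


# The nodeId is the uuid part of the nodeRef.
req = purge_request("3324c889-509a-44dc-bc4a-cf3566ac9bd7", "admin", "admin")
# urllib.request.urlopen(req) would send it against a running repository.
```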
Configure Trashcan Cleaner
The Trashcan Cleaner permanently removes nodes that were soft-deleted and are held in the trashcan (also known as the Recycle Bin). The Trashcan Cleaner job removes all database entries related to a deleted document, including metadata, associations, and references. However, the binary content of the document remains in the ${dir.root}/contentstore directory, because removing binary files from the content store is handled separately by the Content Store Cleaner job, which identifies and handles orphaned files (files no longer referenced in the database). The Trashcan Cleaner job is documented at: https://docs.alfresco.com/content-services/7.4/admin/content-stores/#configure-trashcan-cleaner
Trashcan Cleaner job related properties:
#Specifies the cron schedule for the Trashcan Cleaner job. See Scheduled Jobs.
trashcan-cleaner.cron=0 30 * * * ?
#Specifies the period for which trashcan items are kept (in the java.time.Duration format).
trashcan-cleaner.keepPeriod=P1D
#Specifies the number of trashcan items to delete per job run.
trashcan-cleaner.deleteBatchCount=1000
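With these settings the job runs at minute 30 of every hour, and an item becomes a candidate once it has been in the trashcan for the keepPeriod. The Python sketch below illustrates that arithmetic. It is an assumption-laden illustration, not the job's code: `parse_keep_period` handles only the day/hour/minute/second subset of the java.time.Duration format, and the exact boundary the job uses may differ.

```python
import re
from datetime import datetime, timedelta

# Subset of java.time.Duration syntax: "P1D", "PT12H", "P28DT6H30M" (no fractions).
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_keep_period(value: str) -> timedelta:
    """Convert a keepPeriod value such as "P1D" into a timedelta."""
    match = _DURATION.fullmatch(value)
    if match is None or value.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {unit: int(n) for unit, n in match.groupdict().items() if n}
    return timedelta(**parts)


def past_keep_period(archived_at: datetime, keep_period: str, now: datetime) -> bool:
    """True once a trashcan item has been archived for at least keep_period."""
    return now - archived_at >= parse_keep_period(keep_period)
```

For example, with trashcan-cleaner.keepPeriod=P1D, an item archived at 02:20 is still kept after 23 hours but is eligible after 25 hours, and is then purged by the next hourly run.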
Removing orphaned files from Content Store
The Content Store Cleaner Job in Alfresco is responsible for cleaning up orphaned or unused content files in the content store. It helps remove files that are no longer associated with any nodes in the repository. The Content Store Cleaner moves the file from dir.contentstore to dir.contentstore.deleted.
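To see which binaries the cleaner would consider orphaned, one option is a read-only audit that compares the .bin files on disk with a set of contentUrls still referenced in the database. The Python sketch below is such an audit, not how the Content Store Cleaner itself works. How you build the referenced set (for example, from an export of the repository database) is left as an assumption, `unreferenced_files` is a hypothetical helper, and running it against a live repository can race with new uploads.

```python
from pathlib import Path


def unreferenced_files(contentstore_dir: str, referenced_urls: set[str]) -> list[Path]:
    """List .bin files under contentstore_dir whose store:// URL is not referenced."""
    root = Path(contentstore_dir)
    result = []
    for path in sorted(root.rglob("*.bin")):
        # Rebuild the contentUrl from the path relative to the store root
        url = "store://" + path.relative_to(root).as_posix()
        if url not in referenced_urls:
            result.append(path)
    return result
```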
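Location of the binary file in contentstore.deleted
After the Content Store Cleaner runs, the binary file is moved from the content store to the dir.contentstore.deleted directory (by default ${dir.root}/contentstore.deleted), keeping the same relative path, for example:
/vm/alfresco-data-01/alf_data/contentstore.deleted/2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin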
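Since the relative path is kept, the same contentUrl can be used to check where a binary currently is at any stage of the lifecycle. A Python sketch, assuming the default directory names contentstore and contentstore.deleted under dir.root; `locate_binary` is a hypothetical helper:

```python
from pathlib import Path
from typing import Optional


def locate_binary(dir_root: str, content_url: str) -> Optional[Path]:
    """Return the current file for a store:// contentUrl, checking the live
    content store first and then contentstore.deleted; None if in neither."""
    prefix = "store://"
    if not content_url.startswith(prefix):
        raise ValueError(f"unsupported contentUrl: {content_url!r}")
    relative = content_url[len(prefix):]
    for store in ("contentstore", "contentstore.deleted"):
        candidate = Path(dir_root) / store / relative
        if candidate.is_file():
            return candidate
    return None
```

A result of None means the file is in neither directory, for example because contentstore.deleted has already been purged.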
Content Store Cleaner job related properties:
# Content Store Cleanup Cron Expression (run daily at 4 AM)
system.content.orphanCleanup.cronExpression=0 0 4 * * ?
# Content protection period for orphaned files (14 days)
system.content.orphanProtectDays=14
# Do not clean up orphaned content immediately
system.content.eagerOrphanCleanup=false
# Batch size for orphan cleanup process
system.content.orphanCleanup.batchSize=1000
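The protection window is plain date arithmetic: orphaned content is only eligible once it has been orphaned for longer than system.content.orphanProtectDays, and it is then handled on the next scheduled run (04:00 with the cron expression above). A Python sketch of that check; `past_orphan_protection` is a hypothetical helper, and the exact boundary the job applies may differ:

```python
from datetime import datetime, timedelta


def past_orphan_protection(orphaned_at: datetime, protect_days: int, now: datetime) -> bool:
    """True when content has stayed orphaned longer than orphanProtectDays."""
    return now - orphaned_at > timedelta(days=protect_days)


# Example with system.content.orphanProtectDays=14
orphaned_at = datetime(2024, 12, 12, 9, 0)
print(past_orphan_protection(orphaned_at, 14, datetime(2024, 12, 20)))  # False
print(past_orphan_protection(orphaned_at, 14, datetime(2024, 12, 27)))  # True
```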
Note: Alfresco does not empty the dir.contentstore.deleted directory itself. Files there can be removed manually from the file system or by scheduled jobs configured at the operating system level, such as cron jobs on Linux or Task Scheduler on Windows, so that orphaned files no longer referenced in the repository do not accumulate indefinitely.
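Such a scheduled job could call a script like the Python sketch below, which removes files under contentstore.deleted older than a chosen number of days. It defaults to a dry run that only lists matches. It uses the file modification time as a proxy for when the file arrived in contentstore.deleted; if your setup preserves the original timestamps when moving files, it measures content age instead. `purge_old_files` and the retention value are assumptions for illustration.

```python
import time
from pathlib import Path


def purge_old_files(deleted_dir: str, older_than_days: int, dry_run: bool = True) -> list[Path]:
    """Delete (or, in dry-run mode, only list) files under deleted_dir whose
    modification time is older than older_than_days."""
    cutoff = time.time() - older_than_days * 86400
    matched = []
    # Materialize the listing first so deleting files does not disturb the walk
    for path in sorted(Path(deleted_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            matched.append(path)
    return matched
```

Running it once with dry_run=True and reviewing the list before enabling deletion is a cautious way to introduce it.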