Alfresco Document Lifecycle Management

This article explores the key stages of document management in the Alfresco content management system: creating a document, deleting it from Alfresco Share, and then removing it from the trashcan and the content store. It follows the complete lifecycle of a document created and later deleted in Share. At each stage it examines where the binary file lives, the SQL queries that locate the document in the database, its Solr indexing, and other key technical aspects.

Alfresco document life cycle workflow diagram

This diagram illustrates the lifecycle of a document in Alfresco, from creation to permanent deletion.

Initially, the document resides in the Workspace (workspace://SpacesStore). Its binary content is in the Content Store, its metadata is in the Database, and its index entry is in the Solr Alfresco Index.

Upon deletion, it moves to the ArchiveSpace (archive://SpacesStore, the trashcan). Its metadata and binary content remain, but it is re-indexed in the Solr Archive Index.

If it is then deleted from the trashcan, its metadata is removed and the binary file becomes orphaned in the Content Store. The file stays there until the Content Store Cleaner Job cleans it up, which completes the lifecycle.
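The stages in the diagram can also be written down as data. The sketch below is only a reading aid. It assumes the default Solr core names of Alfresco Search Services (alfresco and archive) and the default content store directory names under dir.root. The Stage class and LIFECYCLE list are hypothetical names, not Alfresco APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """Where the parts of one document live at a given lifecycle stage."""
    name: str
    node_store: Optional[str]  # store holding the node metadata, None once purged
    solr_core: Optional[str]   # Solr core indexing the node, None once purged
    binary_dir: str            # directory under dir.root holding the .bin file


# Assumed defaults: Solr cores "alfresco"/"archive", directories
# "contentstore"/"contentstore.deleted" under dir.root.
LIFECYCLE = [
    Stage("created in Share", "workspace://SpacesStore", "alfresco", "contentstore"),
    Stage("deleted in Share (trashcan)", "archive://SpacesStore", "archive", "contentstore"),
    Stage("deleted from trashcan (orphaned)", None, None, "contentstore"),
    Stage("after Content Store Cleaner", None, None, "contentstore.deleted"),
]

for stage in LIFECYCLE:
    print(f"{stage.name:35} store={stage.node_store} core={stage.solr_core} binary={stage.binary_dir}")
```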

Create a new document in Alfresco through Share

A document created in Share is stored in the primary store, workspace://SpacesStore. This is the main, active store of the Alfresco repository.
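A nodeRef combines three parts: the store protocol, the store identifier and the node uuid. The SQL queries later in this article rebuild it the same way with CONCAT. Here is a minimal Python sketch that splits one into its parts. NodeRef and parse_node_ref are illustrative names, not part of any Alfresco client library.

```python
from typing import NamedTuple


class NodeRef(NamedTuple):
    protocol: str    # "workspace" for live content, "archive" for the trashcan
    identifier: str  # store identifier, e.g. "SpacesStore"
    uuid: str        # node id within the store


def parse_node_ref(node_ref: str) -> NodeRef:
    """Split a nodeRef of the form protocol://identifier/uuid."""
    protocol, sep, rest = node_ref.partition("://")
    identifier, slash, uuid = rest.partition("/")
    if not (sep and slash and protocol and identifier and uuid):
        raise ValueError(f"not a nodeRef: {node_ref!r}")
    return NodeRef(protocol, identifier, uuid)


ref = parse_node_ref("workspace://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7")
print(ref.protocol, ref.identifier, ref.uuid)
```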

Alfresco Document Life Cycle New Document

Get nodeRef, contentUrl and dbid of the document
Alfresco Document Life cycle New Document

Alfresco Document Life Cycle Node Browser

Location of the binary file in content store

Alfresco references binary files through a contentUrl, a logical path stored in the database. For instance: store://2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin. This URL is not a direct file system location. You can view it in the Node Browser for troubleshooting or management.

Base Directory (dir.root):

The primary directory for Alfresco’s data is configured in the alfresco-global.properties file using the dir.root property. For example, this could be set as:

dir.root=/vm/alfresco-data-01/alf_data

Content Store Directory (dir.contentstore):

Within the base directory, the content store is configured in the repository.properties file with the dir.contentstore property. This typically references a subdirectory inside dir.root, like:

dir.contentstore=${dir.root}/contentstore
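Alfresco expands property values such as ${dir.root}/contentstore from other properties. The sketch below mimics that expansion with a regular expression, purely to show how the final directory is derived. The resolve function is hypothetical and does no cycle detection; it is not Alfresco's own implementation. The dir.contentstore.deleted entry mirrors that property's default value.

```python
import re
from typing import Dict

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(props: Dict[str, str], key: str) -> str:
    """Expand ${other.key} placeholders in props[key], recursively."""
    return _PLACEHOLDER.sub(lambda m: resolve(props, m.group(1)), props[key])


props = {
    "dir.root": "/vm/alfresco-data-01/alf_data",
    "dir.contentstore": "${dir.root}/contentstore",
    "dir.contentstore.deleted": "${dir.root}/contentstore.deleted",
}
print(resolve(props, "dir.contentstore"))
```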

Physical Path:

Alfresco maps the contentUrl to a file in the content store directory. Using the previous example, the binary file would be stored at:

/vm/alfresco-data-01/alf_data/contentstore/2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin
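The mapping above amounts to stripping the store:// prefix and appending the rest to dir.contentstore. Here is a minimal sketch for the default file content store. The content_url_to_path name is hypothetical, and other content store types (for example cloud connectors) use different URL schemes that this does not handle.

```python
from pathlib import PurePosixPath


def content_url_to_path(content_url: str, contentstore_dir: str) -> PurePosixPath:
    """Map a file content store URL (store://...) onto dir.contentstore."""
    prefix = "store://"
    if not content_url.startswith(prefix):
        raise ValueError(f"unsupported content URL: {content_url!r}")
    # Everything after store:// is the path relative to dir.contentstore
    return PurePosixPath(contentstore_dir) / content_url[len(prefix):]


path = content_url_to_path(
    "store://2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin",
    "/vm/alfresco-data-01/alf_data/contentstore",
)
print(path)
```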

Alfresco Document Life Cycle Binary File

Search in Solr console

Run the following query in the Solr Admin console:

@cm\:name:"file-rose-001.jpg"
Alfresco Document Life Cycle Binary File
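The same query can also be sent over HTTP. The sketch below only builds the request URL. Several values in it are assumptions: the base URL http://localhost:8983, the core name alfresco, and the extra rows/wt parameters. Depending on the installation, the Solr endpoint may also require authentication, for example a shared secret or mutual TLS.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for the given core and query string."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


url = solr_select_url("http://localhost:8983", "alfresco", r'@cm\:name:"file-rose-001.jpg"')
print(url)
```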

Search in the Node Browser (Share Admin Tools) using a Lucene query

@cm\:name:"file-rose-001.jpg"
Alfresco Document Life Cycle Binary File
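In this query the colon inside cm:name is escaped with a backslash. Without the escape, Lucene would read the colon as the field/value separator. Here is a small hypothetical helper that builds such a query from a prefixed property name and a value. It also escapes backslashes and double quotes inside the quoted phrase, following standard Lucene escaping rules.

```python
def lucene_property_query(qname: str, value: str) -> str:
    """Build an Alfresco Lucene query on a property, e.g. cm:name."""
    # Escape the prefix separator so it is not taken as field:value
    field = qname.replace(":", "\\:")
    # Escape backslashes first, then double quotes, inside the phrase
    phrase = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'@{field}:"{phrase}"'


print(lucene_property_query("cm:name", "file-rose-001.jpg"))
# prints @cm\:name:"file-rose-001.jpg"
```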

SQL to find the document in the database by dbid

SELECT 
    n.id AS dbid,
    CONCAT(
        s.protocol, '://', s.identifier, '/', n.uuid
    ) AS node_ref,
    p.string_value AS document_name
FROM 
    alf_node n
JOIN 
    alf_store s ON n.store_id = s.id
JOIN 
    alf_node_properties p ON n.id = p.node_id
WHERE 
    n.id = 1235  -- Replace with your dbid
    AND p.qname_id = (
        SELECT id 
        FROM alf_qname 
        WHERE local_name = 'name'
          AND ns_id = (
              SELECT id 
              FROM alf_namespace 
              WHERE uri = 'http://www.alfresco.org/model/content/1.0'
          )
    );

Alfresco Document Life Cycle Binary File Sql

SQL to find the document in the database by nodeRef (PostgreSQL)

SELECT 
    n.id AS dbid,
    CONCAT(
        s.protocol, '://', s.identifier, '/', n.uuid
    ) AS node_ref,
    p.string_value AS document_name
FROM 
    alf_node n
JOIN 
    alf_store s ON n.store_id = s.id
JOIN 
    alf_node_properties p ON n.id = p.node_id
WHERE 
    -- nodeRef workspace://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7 split into its parts,
    -- so the filter compares plain columns instead of computing CONCAT for every row
    s.protocol = 'workspace'                            -- Replace with your nodeRef's protocol
    AND s.identifier = 'SpacesStore'                    -- Replace with your nodeRef's store identifier
    AND n.uuid = '3324c889-509a-44dc-bc4a-cf3566ac9bd7' -- Replace with your nodeRef's uuid
    AND p.qname_id = (
        SELECT id 
        FROM alf_qname 
        WHERE local_name = 'name'
          AND ns_id = (
              SELECT id 
              FROM alf_namespace 
              WHERE uri = 'http://www.alfresco.org/model/content/1.0'
          )
    );
Alfresco Document Life Cycle Binary File Sql Data by Noderef
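To see how the joins resolve without a live repository, the sketch below runs the dbid query against an in-memory SQLite database. The database holds a hypothetical, heavily reduced copy of the five tables, containing only the columns the query touches. The id values 1, 10 and 6 are made up. The dbid, uuid and file name are sample values from this article, paired here for illustration. SQLite concatenates with || instead of CONCAT and uses ? placeholders; a PostgreSQL driver such as psycopg2 uses %s instead.

```python
import sqlite3

# Hypothetical reduced schema: only the columns the article's query uses.
SCHEMA = """
CREATE TABLE alf_namespace (id INTEGER PRIMARY KEY, uri TEXT);
CREATE TABLE alf_qname (id INTEGER PRIMARY KEY, ns_id INTEGER, local_name TEXT);
CREATE TABLE alf_store (id INTEGER PRIMARY KEY, protocol TEXT, identifier TEXT);
CREATE TABLE alf_node (id INTEGER PRIMARY KEY, store_id INTEGER, uuid TEXT);
CREATE TABLE alf_node_properties (node_id INTEGER, qname_id INTEGER, string_value TEXT);
"""

# Sample rows; dbid, uuid and name come from the article, other ids are made up.
DATA = """
INSERT INTO alf_namespace VALUES (1, 'http://www.alfresco.org/model/content/1.0');
INSERT INTO alf_qname VALUES (10, 1, 'name');
INSERT INTO alf_store VALUES (6, 'workspace', 'SpacesStore');
INSERT INTO alf_node VALUES (1235, 6, '3324c889-509a-44dc-bc4a-cf3566ac9bd7');
INSERT INTO alf_node_properties VALUES (1235, 10, 'file-rose-001.jpg');
"""

# Same joins and subqueries as the dbid query above, in SQLite syntax.
QUERY = """
SELECT n.id AS dbid,
       s.protocol || '://' || s.identifier || '/' || n.uuid AS node_ref,
       p.string_value AS document_name
FROM alf_node n
JOIN alf_store s ON n.store_id = s.id
JOIN alf_node_properties p ON n.id = p.node_id
WHERE n.id = ?
  AND p.qname_id = (
      SELECT id FROM alf_qname
      WHERE local_name = 'name'
        AND ns_id = (SELECT id FROM alf_namespace
                     WHERE uri = 'http://www.alfresco.org/model/content/1.0'))
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + DATA)
row = conn.execute(QUERY, (1235,)).fetchone()
print(row)
```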

Delete the file from Share

Once the file is deleted in Share, it is moved to the trashcan. You can reach the trashcan from the Trashcan page under My Profile.

Alfresco Document Life Cycle Trashcan

Get nodeRef, contentUrl and dbid of the document
@cm\:name:"file-rose-001.jpg"
Alfresco Document Life Cycle Trashcan Archive

Archive store: when a document is deleted, its node moves from workspace://SpacesStore to archive://SpacesStore, the store that backs the trashcan. This is a separate node store in the database, not a separate content store, and the binary file is not moved. Note that the protocol part of the nodeRef has changed to archive.
Also note in the screenshot below that the dbid has changed from 1235 to 1238.

Alfresco Document Life Cycle Trashcan Archive Spacestore
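When a document moves to the trashcan, only the protocol of its nodeRef changes. The archived nodeRef can therefore be predicted from the original one. The dbid cannot be predicted: it changed from 1235 to 1238 above. Here is a small sketch, assuming the node keeps its uuid in archive://SpacesStore. The helper names are hypothetical.

```python
WORKSPACE_PREFIX = "workspace://SpacesStore/"
ARCHIVE_PREFIX = "archive://SpacesStore/"


def archived_node_ref(workspace_ref: str) -> str:
    """Expected trashcan nodeRef for a document deleted from SpacesStore."""
    if not workspace_ref.startswith(WORKSPACE_PREFIX):
        raise ValueError(f"not a SpacesStore nodeRef: {workspace_ref!r}")
    # Only the store part changes; the uuid is assumed to be kept.
    return ARCHIVE_PREFIX + workspace_ref[len(WORKSPACE_PREFIX):]


def is_in_trashcan(node_ref: str) -> bool:
    """True for nodeRefs that point into the archive (trashcan) store."""
    return node_ref.startswith("archive://")


print(archived_node_ref("workspace://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7"))
```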

Location of the binary file in content store

The location of the binary file in the content store is still the same.

Alfresco Document Life Cycle Binary File

Search in Solr console

@cm\:name:"file-rose-001.jpg"
Alfresco Document Life Cycle Trashcan Archive Spacestore Solr
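The trashcan copy is indexed in the archive core rather than the main one, so a search in the Solr Admin console has to target the right core. Assuming the default core names of Alfresco Search Services, a hypothetical helper that picks the core from the nodeRef protocol could look like this:

```python
# Default core names in Alfresco Search Services (assumed not renamed locally)
CORE_BY_PROTOCOL = {"workspace": "alfresco", "archive": "archive"}


def solr_core_for(node_ref: str) -> str:
    """Pick the Solr core that should hold a node, from its nodeRef protocol."""
    protocol = node_ref.split("://", 1)[0]
    if protocol not in CORE_BY_PROTOCOL:
        raise ValueError(f"no Solr core configured for protocol {protocol!r}")
    return CORE_BY_PROTOCOL[protocol]


print(solr_core_for("workspace://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7"))  # alfresco
print(solr_core_for("archive://SpacesStore/3324c889-509a-44dc-bc4a-cf3566ac9bd7"))    # archive
```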

SQL to find the document in the database

Run the same PostgreSQL query by dbid as before, replacing 1235 with the new dbid, 1238.
Alfresco Document Life Cycle Trashcan Archive Spacestore SQL

Deleting Files from the Trashcan in Alfresco

In Alfresco, files in the trashcan can be deleted manually or automatically:

Manual Deletion:

Use the Trashcan page in Share to select and permanently delete files. Alternatively, use the REST API to delete them programmatically by sending a DELETE request.
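For the REST route, the v1 public REST API exposes trashcan items as deleted-nodes. A DELETE on /deleted-nodes/{nodeId} purges one item permanently. The sketch builds such a request with the standard library but does not send it. The host http://localhost:8080, the admin/admin credentials and the node id are placeholder values, and basic authentication is assumed.

```python
import base64
import urllib.request


def purge_request(base_url: str, node_id: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request that permanently deletes a trashcan item."""
    url = (f"{base_url.rstrip('/')}/alfresco/api/-default-/public/alfresco/versions/1"
           f"/deleted-nodes/{node_id}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, method="DELETE",
                                  headers={"Authorization": f"Basic {token}"})


req = purge_request("http://localhost:8080", "3324c889-509a-44dc-bc4a-cf3566ac9bd7", "admin", "admin")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```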

Automatic Deletion:

Configure the Trashcan Cleaner Job to remove files automatically once they have been in the trashcan longer than a specified retention period. The job runs periodically on a configurable cron schedule; details are given below. Files deleted from the trashcan no longer have any references in the database or in the Solr index.

Solr Index Cleanup:

Solr keeps its indexes in step with the repository. Its trackers pick up repository transactions asynchronously, shortly after they are committed. When a file is permanently deleted, the corresponding Solr entries are removed as well.

However, the binary content of the file remains in the content store until the Content Store Cleaner Job handles it. This job removes orphaned files (files no longer referenced in the database) once a configurable protection period has passed.

Configure Trashcan Cleaner

The Trashcan Cleaner permanently removes from the repository the soft-deleted nodes sitting in the Trashcan (also known as the Recycle Bin). It deletes all database entries related to a deleted document, including metadata, associations, and references.

However, the binary content of the document remains in the ${dir.root}/contentstore directory. Removing binary files from the content store is handled separately by the Content Store Cleaner job, which identifies and handles orphaned files (files no longer referenced in the database).

The Trashcan Cleaner job is documented at: https://docs.alfresco.com/content-services/7.4/admin/content-stores/#configure-trashcan-cleaner

Trashcan Cleaner job related properties:

# Cron schedule for the Trashcan Cleaner job (Quartz format; here, every hour at minute 30)
trashcan-cleaner.cron=0 30 * * * ?

# How long items are kept in the trashcan before removal (java.time.Duration format; P1D = one day)
trashcan-cleaner.keepPeriod=P1D

# Maximum number of trashcan items deleted per job run
trashcan-cleaner.deleteBatchCount=1000
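Here is a simplified model of the selection step, to make keepPeriod and deleteBatchCount concrete. Items archived longer ago than keepPeriod are eligible, and at most deleteBatchCount are taken per run. It is an illustration only. parse_duration handles just the PnDTnHnMnS subset of java.time.Duration text. The oldest-first ordering and the sample nodeRefs are assumptions, not documented Trashcan Cleaner behaviour.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, List

_DURATION = re.compile(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?")


def parse_duration(text: str) -> timedelta:
    """Parse a java.time.Duration-style string such as P1D or PT12H (subset only)."""
    match = _DURATION.fullmatch(text)
    if match is None or text.endswith(("P", "T")):
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def items_to_purge(archived: Dict[str, datetime], now: datetime,
                   keep_period: str, batch_count: int) -> List[str]:
    """Return up to batch_count nodeRefs archived before now - keep_period, oldest first."""
    cutoff = now - parse_duration(keep_period)
    expired = sorted((when, ref) for ref, when in archived.items() if when < cutoff)
    return [ref for _, ref in expired[:batch_count]]


now = datetime(2024, 12, 13, 12, 0, tzinfo=timezone.utc)
archived = {
    "archive://SpacesStore/aaa": datetime(2024, 12, 11, 9, 0, tzinfo=timezone.utc),
    "archive://SpacesStore/bbb": datetime(2024, 12, 13, 10, 0, tzinfo=timezone.utc),
    "archive://SpacesStore/ccc": datetime(2024, 12, 10, 8, 0, tzinfo=timezone.utc),
}
print(items_to_purge(archived, now, "P1D", 1000))
print(items_to_purge(archived, now, "P1D", 1))
```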

Removing orphaned files from Content Store

The Content Store Cleaner Job in Alfresco is responsible for cleaning up orphaned or unused content files in the content store. It helps remove files that are no longer associated with any nodes in the repository. The Content Store Cleaner moves the file from dir.contentstore to dir.contentstore.deleted.

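Location of the binary file in contentstore.deleted

The binary file is no longer in the content store. It now sits under dir.contentstore.deleted (by default ${dir.root}/contentstore.deleted), at the same relative path. For example:

/vm/alfresco-data-01/alf_data/contentstore.deleted/2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin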

Alfresco Document Life Cycle content Store Deleted

Content Store Cleaner job related properties:

# Content Store Cleanup Cron Expression (run daily at 4 AM)
system.content.orphanCleanup.cronExpression=0 0 4 * * ?

# Content protection period for orphaned files (14 days)
system.content.orphanProtectDays=14

# Do not clean up orphaned content immediately
system.content.eagerOrphanCleanup=false

# Batch size for orphan cleanup process
system.content.orphanCleanup.batchSize=1000
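The two effects described in this section can be modelled as below: the protection period and the move into dir.contentstore.deleted. This is a simplified sketch, not the cleaner's code. It assumes the deleted copy keeps the contentUrl's relative path and treats eligibility as a simple "orphaned at least orphanProtectDays ago" check. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath


def deleted_store_path(content_url: str, contentstore_deleted_dir: str) -> PurePosixPath:
    """Where a store:// binary ends up once moved to dir.contentstore.deleted."""
    if not content_url.startswith("store://"):
        raise ValueError(f"unsupported content URL: {content_url!r}")
    # Same relative year/month/day/hour/minute path as in the live content store
    return PurePosixPath(contentstore_deleted_dir) / content_url[len("store://"):]


def past_protection(orphaned_at: datetime, now: datetime, protect_days: int = 14) -> bool:
    """True once an orphan has been unreferenced for at least protect_days."""
    return now - orphaned_at >= timedelta(days=protect_days)


url = "store://2024/12/11/2/20/4e309e94-6c8a-45f6-8272-43f5d44d864e.bin"
print(deleted_store_path(url, "/vm/alfresco-data-01/alf_data/contentstore.deleted"))
print(past_protection(datetime(2024, 12, 11), datetime(2024, 12, 24)))  # 13 days
print(past_protection(datetime(2024, 12, 11), datetime(2024, 12, 25)))  # 14 days
```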

Note: Alfresco does not empty the dir.contentstore.deleted directory itself. Files there can be removed manually or by scheduled jobs at the operating system level, such as cron jobs on Linux or Task Scheduler on Windows. This keeps orphaned files that are no longer referenced in the repository from accumulating on disk.
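Here is a hypothetical Python sketch to start from when writing such an OS-level job. It deletes .bin files under dir.contentstore.deleted whose modification time is older than a chosen number of days. It uses file mtime as a proxy for when the cleaner moved the file there, and it leaves empty directories in place. dry_run defaults to True, so nothing is removed until you opt in. The script would then be scheduled with cron or Task Scheduler.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List


def prune_deleted_store(root: Path, older_than_days: int, dry_run: bool = True) -> List[Path]:
    """Delete (unless dry_run) .bin files under root older than older_than_days."""
    cutoff = time.time() - older_than_days * 86400
    victims = [p for p in root.rglob("*.bin") if p.is_file() and p.stat().st_mtime < cutoff]
    if not dry_run:
        for p in victims:
            p.unlink()
    return sorted(victims)


# Demo on a throwaway directory standing in for dir.contentstore.deleted
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old = root / "2024/12/11/2/20/old.bin"
    new = root / "2024/12/13/9/5/new.bin"
    for f in (old, new):
        f.parent.mkdir(parents=True)
        f.write_bytes(b"x")
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old, (forty_days_ago, forty_days_ago))
    doomed = prune_deleted_store(root, older_than_days=30, dry_run=False)
    result = ([p.relative_to(root).as_posix() for p in doomed], old.exists(), new.exists())
print(result)
```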
