Functional String Processing: Implement in F#, Call from C#

I recently had to turn a string extracted from an XML file into a very specific format, under a set of constraints. Some of the requirements were:

System must accept a string containing tagged content in format /TAGx/content
System must accept a priority map defining the processing order of tags [this was predefined in configuration]
Tags must be processed in order of their assigned priority, lower number = higher priority
Maximum content length must be 390 characters
Content must be divided into segments of exactly 65 characters each
Maximum of 6 segments allowed (65 × 6 = 390)
Content can flow across segments without regard to tag boundaries
Final segment may be shorter than 65 characters
Each tag and its associated content must be processed in priority order
If including a tag's content would cause total content length to exceed 390 characters:
- The entire tag and its content must be excluded
- All subsequent tags must also be excluded
Partial tag content is not allowed if it would cause total length breach

There were more, but these were the hard requirements necessary for further processing.
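
The size limits above boil down to simple arithmetic. Here is a minimal sketch, assuming the limits live in plain constants: maxSetLength and maxTotalLength are the names the F# code in this article relies on, while maxSegments and segmentsNeeded are hypothetical names added only for illustration.

```fsharp
// Assumed definitions; only the names maxSetLength and maxTotalLength
// come from the article's code, the rest is illustrative.
let maxSetLength = 65                             // characters per segment
let maxSegments = 6                               // hypothetical name
let maxTotalLength = maxSetLength * maxSegments   // 65 * 6 = 390

// Hypothetical helper: how many segments a given content length occupies
// (ceiling division, so a trailing partial segment still counts).
let segmentsNeeded (length: int) =
    (length + maxSetLength - 1) / maxSetLength

printfn "%d" maxTotalLength          // 390
printfn "%d" (segmentsNeeded 130)    // 2
printfn "%d" (segmentsNeeded 131)    // 3
```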

This seemed like a good opportunity to use a functional paradigm and to start a more serious functional learning journey: after some hobbyist projects, it was my first foray into F# in a production setting. The C# part of the application hands over the content as a list of StringBuilder objects, and each element of that list looks like this:

var sb = new StringBuilder
    ("/TAG1/AAAAAAAAAAAAAAAAAAAA/TAG2/BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB/TAG3/CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC/TAG4/DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD/TAG5/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/TAG6/small-complete-tag/TAG7/another-small-tag/TAG8/this-should-not-appear");

Configuration data when materialized into the app would look like:

List<(string, int)> priorityMap =
[
    ("/TAG1/", 1),
    ("/TAG2/", 2),
    ("/TAG3/", 3),
    ("/TAG4/", 4),
    ("/TAG5/", 5),
    ("/TAG6/", 6),
    ("/TAG7/", 7),
    ("/TAG8/", 8)
];

Then we enter the string processing logic in F#. First, we define our core types to represent the segments and the result of the processing:

[<Struct>]
type TagSegment = {
    StartPos: int
    Length: int
    Priority: int
    Content: string
}

[<Struct>]
type ArrangeResult = {
    BuildString: StringBuilder
    Segments: TagSegment list
}

The TagSegment record captures each piece of tagged content along with its position, length, and priority. The ArrangeResult wraps both the final StringBuilder and the list of processed segments for convenient consumption from C#.

Now we build our processing pipeline. First, a helper to find where the next tag begins:

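let private findNextTagPosition (content: string) (currentPos: int) (tag: string) (tags: string list) =
    tags
    |> List.choose (fun t ->
        // Search past the current tag; Ordinal matches the comparison used in findTagSegments
        let pos = content.IndexOf(t, currentPos + tag.Length, StringComparison.Ordinal)
        if pos > currentPos then Some pos else None)
    |> function
        | [] -> content.Length // no further tag: the content runs to the end of the string
        | positions -> List.min positions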

This function scans ahead from the current position to find the nearest occurrence of any tag. If no tag is found, it returns the end of the string. This lets us know where the current tag's content ends.
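
To make the behaviour concrete, here is a small sketch that runs a copy of the helper (without the private modifier, and with an ordinal comparison) against a made-up two-tag string. The sample string and tag list are assumptions for illustration only.

```fsharp
open System

// Copy of the helper so the snippet runs on its own.
let findNextTagPosition (content: string) (currentPos: int) (tag: string) (tags: string list) =
    tags
    |> List.choose (fun t ->
        let pos = content.IndexOf(t, currentPos + tag.Length, StringComparison.Ordinal)
        if pos > currentPos then Some pos else None)
    |> function
        | [] -> content.Length
        | positions -> List.min positions

// Made-up sample: /TAG1/ starts at index 0, /TAG2/ at index 9, length is 17.
let sample = "/TAG1/aaa/TAG2/bb"
let tags = [ "/TAG1/"; "/TAG2/" ]

// From /TAG1/ the nearest following tag is /TAG2/.
let afterTag1 = findNextTagPosition sample 0 "/TAG1/" tags   // 9
// From /TAG2/ nothing follows, so the end of the string is returned.
let afterTag2 = findNextTagPosition sample 9 "/TAG2/" tags   // 17

printfn "%d %d" afterTag1 afterTag2
```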

Next, we extract the tag segments on a positional basis:

let private findTagSegments (content: string) (tag: string) (priority: int) (allTags: string list) =
    let rec loop pos acc =
        match content.IndexOf(tag, pos, StringComparison.Ordinal) with
        | -1 -> List.rev acc
        | currentPos ->
            let nextPos = findNextTagPosition content currentPos tag allTags
            let segment = {
                StartPos = currentPos
                Length = nextPos - currentPos
                Priority = priority
                Content = content.[currentPos..nextPos-1].TrimEnd()
            }
            loop nextPos (segment :: acc)
    loop 0 []

Essentially:

search for a tag starting at position pos
if no tag is found [-1 situation] return the accumulated segments in reverse order to preserve the original order
if a tag is found:
- find the next tag position
- create a new segment with the content between the current and next tag position
- recursively search from the next position for the next tag, adding the new segment to the accumulator
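
The accumulate-then-reverse recursion described in these steps can be shown in isolation. The sketch below is not the article's function: findAll is a hypothetical, stripped-down helper that only collects the start positions of one tag, but it uses the same loop shape.

```fsharp
open System

// Hypothetical helper: all start positions of a tag, in original order.
let findAll (content: string) (tag: string) =
    let rec loop pos acc =
        match content.IndexOf(tag, pos, StringComparison.Ordinal) with
        | -1 -> List.rev acc                        // nothing left: restore original order
        | found -> loop (found + tag.Length) (found :: acc)   // prepend, then keep searching
    loop 0 []

// Made-up sample in which /TAG1/ occurs twice (at index 0 and index 14).
let positions = findAll "/TAG1/a/TAG2/b/TAG1/c" "/TAG1/"

printfn "%A" positions   // [0; 14]
```

Prepending to the accumulator and reversing once at the end keeps the loop tail-recursive and avoids repeatedly appending to the end of a list.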

With segment extraction in place, we need to collect segments from all tags and sort them by priority:

let private processSegments (content: string) (priorityMap: (string * int) list) =
    let tags = priorityMap |> List.map fst
    priorityMap
    |> List.collect (fun (tag, priority) ->
        findTagSegments content tag priority tags)
    |> List.sortBy (fun s -> s.Priority)

This function iterates through the priority map, extracts segments for each tag, flattens them into a single list, and sorts by priority. Lower priority numbers come first as indicated in the priority map, ensuring the most important content gets included.
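
A small sketch of the collect-then-sort step, using plain tuples instead of TagSegment records. The fakeSegments function is a hypothetical stand-in for findTagSegments that returns canned data. Because List.sortBy is a stable sort, segments with the same priority keep their original relative order.

```fsharp
// Priority map deliberately listed out of order.
let priorityMap = [ ("/B/", 2); ("/A/", 1) ]

// Hypothetical stand-in for findTagSegments: canned (priority, content) pairs.
let fakeSegments (tag: string) (priority: int) =
    match tag with
    | "/A/" -> [ (priority, "/A/first"); (priority, "/A/second") ]
    | "/B/" -> [ (priority, "/B/only") ]
    | _ -> []

let ordered =
    priorityMap
    |> List.collect (fun (tag, priority) -> fakeSegments tag priority)  // flatten per-tag lists
    |> List.sortBy fst                                                  // lowest number first

// ordered = [(1, "/A/first"); (1, "/A/second"); (2, "/B/only")]
printfn "%A" ordered
```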

Next comes the validation step. This is where we enforce the 390-character limit and the "all or nothing" rule for tags:

let private validateSegments (maxLength: int) (segments: TagSegment list) =
    let rec loop acc length = function
        | [] -> List.rev acc
        | segment :: rest ->
            let newLength = length + segment.Content.Length
            if newLength <= maxLength
            then loop (segment :: acc) newLength rest
            else List.rev acc
    loop [] 0 segments

The validateSegments function walks through the priority-sorted segments, accumulating content length as it goes. The moment adding a segment would exceed maxLength, it stops and returns what it has collected so far: no partial inclusions, and nothing after the offending segment is considered, even if a later segment would still fit. This satisfies the requirement that if a tag would cause a breach, it and all subsequent tags are excluded.
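
Here is a sketch of the cut-off behaviour with a tiny limit. It repeats the type and the function so it runs on its own; the seg helper and the sample contents are hypothetical.

```fsharp
type TagSegment = { StartPos: int; Length: int; Priority: int; Content: string }

let validateSegments (maxLength: int) (segments: TagSegment list) =
    let rec loop acc length = function
        | [] -> List.rev acc
        | segment :: rest ->
            let newLength = length + segment.Content.Length
            if newLength <= maxLength
            then loop (segment :: acc) newLength rest
            else List.rev acc
    loop [] 0 segments

// Hypothetical helper to build sample segments.
let seg priority (content: string) =
    { StartPos = 0; Length = content.Length; Priority = priority; Content = content }

// Limit of 10: "cccc" would bring the total to 12, so processing stops there.
// "d" would fit on its own, but everything after the breach is dropped too.
let kept =
    [ seg 1 "aaaa"; seg 2 "bbbb"; seg 3 "cccc"; seg 4 "d" ]
    |> validateSegments 10
    |> List.map (fun s -> s.Content)

printfn "%A" kept   // ["aaaa"; "bbbb"]
```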

Finally, we build the result of the processing inside the buildFinalResult function:

let private buildFinalResult (segments: TagSegment list) =
        let result = StringBuilder(maxTotalLength)
        let combinedContent =
            let sb = StringBuilder()
            segments |> List.iter (fun s -> sb.Append(s.Content) |> ignore)
            sb.ToString()

        let rec splitIntoChunks pos content acc =
            match content with
            | "" -> List.rev acc
            | remaining ->
                let chunkSize = Math.Min(maxSetLength, remaining.Length)
                if pos + chunkSize > maxTotalLength then
                    List.rev acc
                else
                    let chunk = remaining.[0..chunkSize-1]
                    let newSegment = {
                        StartPos = pos
                        Length = chunkSize
                        Priority = 0
                        Content = chunk
                    }
                    result.Append(chunk) |> ignore
                    splitIntoChunks
                        (pos + chunkSize)
                        (remaining.[chunkSize..])
                        (newSegment :: acc)

        let finalSegments = splitIntoChunks 0 combinedContent []
        { BuildString = result; Segments = finalSegments }

The buildFinalResult function takes the validated segments and prepares them for final output while respecting the size constraints (maxSetLength and maxTotalLength are the constants holding the 65-character segment size and the 390-character limit).

First, it combines all the segment contents into a single string
Then it splits the combined string into chunks of at most 65 characters each, never going past 390 characters in total
As it creates each chunk, it appends the text to the result StringBuilder and records a TagSegment with the chunk's start position and length. Think of it like taking a long piece of text and carefully dividing it into evenly-sized pages, while keeping track of where each page starts and what content it holds. The function does this recursively rather than with a traditional loop [which we would use if this were C#], accumulating chunks one at a time until it either runs out of content or hits the maximum length
Finally, it packages the StringBuilder and the list of chunk segments into a single ArrangeResult that can be easily consumed by other parts of the application, particularly from C# call sites
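
For comparison, the chunking step on its own can be sketched with Seq.chunkBySize from the F# core library. This is a swapped-in alternative, not the recursive splitter used above: chunkText is a hypothetical helper, and it truncates at the limit instead of refusing a chunk that would cross it. For content that has already been validated to fit within the limit, both approaches should produce the same chunks.

```fsharp
open System

// Hypothetical helper: cut text into pieces of at most `size` characters,
// looking at no more than `limit` characters in total.
let chunkText (size: int) (limit: int) (text: string) =
    text
    |> Seq.truncate limit                        // ignore anything past the limit
    |> Seq.chunkBySize size                      // arrays of up to `size` chars
    |> Seq.map (fun chars -> String(chars))      // char[] -> string
    |> Seq.toList

// 150 characters with a segment size of 65 and a limit of 390.
let chunks = chunkText 65 390 (String('x', 150))
let lengths = chunks |> List.map String.length

printfn "%A" lengths   // [65; 65; 20]
```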

The last function in the F# core pipelines the internal functions to process the StringBuilder, and a small wrapper then exposes it to C# as a regular extension method:

let arrangeSegments (sb: StringBuilder) (priorityMap: seq<string * int>) =
        if not (isValidInput sb priorityMap) then
            { BuildString = sb; Segments = [] }
        else
            let content = sb.ToString()
            let result =
                priorityMap
                |> Seq.toList
                |> processSegments content
                |> validateSegments maxTotalLength
                |> buildFinalResult

            sb.Clear() |> ignore
            sb.Append(result.BuildString) |> ignore
            result

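// Requires: open System.Runtime.CompilerServices
[<Extension>]
type SbExtensions =
    // [<Extension>] makes this visible to C# as an extension method; C# (string, int) arrives as a struct tuple
    [<Extension>]
    static member ArrangeSegments (sb: StringBuilder, priorityMap: seq<struct (string * int)>) =
        let map = priorityMap |> Seq.map (fun struct (tag, priority) -> (tag, priority))
        StringBuilderExtension.arrangeSegments sb map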

allowing us to call it from C# like this:

var result = builder.ArrangeSegments(priorityMap);

To see this in action, let's trace through our example input. The original string totals just over 390 characters:

Tag	Content Length	Running Total	Status
TAG1	26 (including tag)	26	✅ Included
TAG2	50	76	✅ Included
TAG3	111	187	✅ Included
TAG4	62	249	✅ Included
TAG5	77	326	✅ Included
TAG6	24	350	✅ Included
TAG7	23	373	✅ Included
TAG8	28	401	❌ Excluded (would exceed 390)

TAG8's content "/TAG8/this-should-not-appear" is excluded entirely because including it would push us over the 390-character limit. The validated content is then split into six chunks, five full 65-character chunks and a shorter final one, and the result is returned with both the StringBuilder and the segment metadata.

This approach allowed me to build a robust and testable string processing pipeline in F# while keeping the core logic isolated and reusable. Exposing it as an extension method made it easy to integrate the F# functionality into the existing C# codebase without significant changes to the architecture. The functional paradigm let me express the logic in a more declarative way, making it easier to reason about and test. By the way, you might notice some F#-Rust parallels here, and you would be correct.