Removing Markdown Links in PHP While Preserving Visible Text and Image Alt Text

Introduction

Sometimes you want plain text from Markdown, but you do not want to throw away the human-readable parts.

If your input contains [Laravel docs](https://laravel.com) or ![Architecture diagram](/diagram.png), a naive regex that removes everything from the opening bracket to the closing parenthesis destroys the useful text along with the URL. For previews, search indexing, scraping pipelines, and AI preprocessing, that is the wrong tradeoff.

What you usually want is this:

keep the visible text from regular links
keep the alt text from images
drop empty links and empty images
handle nested image-in-link cases without duplicating text

In this article, we will walk through a small PHP utility that does exactly that.

The Target Output

These are the transformations we want:

```text
[Laravel docs](https://laravel.com)
=> Laravel docs

![Architecture diagram](/diagram.png)
=> Architecture diagram

[![Architecture diagram](/diagram.png)](/post/system-design)
=> Architecture diagram

[](/internal-link)
=> ""

![](/logo.png)
=> ""
```

That behavior is especially useful when Markdown is just an intermediate format and your real goal is clean plain text.

The Utility

Here is the core implementation:

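```php
class MarkdownUtil
{
    /**
     * Replace Markdown links with their visible text and images with their alt text.
     */
    public static function removeLinks($markdown)
    {
        // ![alt](target): group 1 is the alt text, group 2 is the target
        $patternWithImageAltText = '/!\[(.*?)\]\((.*?)\)/s';
        // [text](target): group 1 is the visible text; the lookahead rejects a
        // match when the first "[" after "(" is part of an image opener "!["
        $patternWithLinkText = '/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s';

        // 1. Remove empty images: ![](target)
        $markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);

        // 2. Convert images to their alt text
        $markdown = preg_replace_callback($patternWithImageAltText, function ($matches) {
            return $matches[1];
        }, $markdown);

        // 3. Convert regular links to their visible text
        $markdown = preg_replace_callback($patternWithLinkText, function ($matches) {
            return $matches[1];
        }, $markdown);

        // 4. Remove empty links: [](target)
        $markdown = preg_replace('/\[\]\((.*?)\)/', '', $markdown);

        return $markdown;
    }
}
```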

The method is short, but the ordering matters more than the regexes themselves.

Why the Replacement Order Matters

This utility works as a four-stage pipeline:

remove empty images
convert images to their alt text
convert regular links to their visible text
remove empty links

That sequence prevents nested Markdown from being interpreted the wrong way.

Consider this input:

[![Architecture diagram](/diagram.png)](/post/system-design)

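If you process links first, the link pattern matches only [![Architecture diagram](/diagram.png), treats ![Architecture diagram as the link text, and leaves ![Architecture diagram](/post/system-design) behind: an image that now points at the link's destination. By processing images before regular links, the input becomes: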

[Architecture diagram](/post/system-design)

After that, the regular link pass can safely reduce it to:

```text
Architecture diagram
```

This is the main design idea behind the utility: simplify the Markdown in stages instead of trying to solve everything with one giant regex.
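The staged pipeline can be traced directly. The sketch below, a minimal illustration rather than part of the utility, applies the same four patterns one at a time to the nested input and records the string after each stage. The stage labels and the $trace array are hypothetical additions, and '$1' passed to preg_replace() plays the role of the callbacks that return $matches[1].

```php
<?php

// Illustrative trace: apply the four removeLinks() patterns in order and keep
// every intermediate result. The labels and $trace are hypothetical helpers.
$stages = [
    'remove empty images'   => ['/!\[\]\((.*?)\)/', ''],
    'images to alt text'    => ['/!\[(.*?)\]\((.*?)\)/s', '$1'],
    'links to visible text' => ['/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s', '$1'],
    'remove empty links'    => ['/\[\]\((.*?)\)/', ''],
];

$markdown = '[![Architecture diagram](/diagram.png)](/post/system-design)';
$trace = [];

foreach ($stages as $label => [$pattern, $replacement]) {
    // '$1' keeps capture group 1, like a callback returning $matches[1]
    $markdown = preg_replace($pattern, $replacement, $markdown);
    $trace[$label] = $markdown;
}

foreach ($trace as $label => $result) {
    echo str_pad($label, 24), '=> ', $result, PHP_EOL;
}
```

The second entry is the intermediate [Architecture diagram](/post/system-design), and the third is the final visible text.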

The Image Pattern

The image regex is:

```php
$patternWithImageAltText = '/!\[(.*?)\]\((.*?)\)/s';
```

It captures:

the alt text inside ![ ... ]
the image target inside ( ... )

The callback returns only the first capture group:

```php
return $matches[1];
```

So this:

![Architecture diagram](/diagram.png)

becomes this:

```text
Architecture diagram
```

The /s modifier makes the pattern more tolerant of multiline Markdown, because . can also match newline characters. That helps when link text, alt text, or the target wraps across lines.
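To check the capture groups and the effect of /s, the image pattern can be run through preg_match() on its own. The wrapped alt text in this sketch is a made-up example.

```php
<?php

// The image pattern with and without the /s modifier.
$withS    = '/!\[(.*?)\]\((.*?)\)/s';
$withoutS = '/!\[(.*?)\]\((.*?)\)/';

$single = '![Architecture diagram](/diagram.png)';
preg_match($withS, $single, $m);
// $m[1] is the alt text, $m[2] is the image target
echo $m[1], ' | ', $m[2], PHP_EOL;

// Alt text wrapped across two lines
$wrapped = "![Architecture\ndiagram](/diagram.png)";
var_dump(preg_match($withoutS, $wrapped)); // int(0): "." cannot cross the newline
var_dump(preg_match($withS, $wrapped));    // int(1)
```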

The Link Pattern

The regular link regex is:

```php
$patternWithLinkText = '/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s';
```

The important part is the negative lookahead:

```php
(?![^\[]*?\!\[)
```

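The lookahead runs right after the opening ( of a candidate link. It rejects the match when, scanning forward, the first [ it reaches is immediately preceded by !, that is, when the next bracket opens an image. The scan is not limited to the link target and can continue past the link's closing ). The intent is to keep the link pass from consuming image-style content as if it were a plain link body. Because well-formed images are already converted by the time the link pass runs, the guard mostly comes into play for leftover ![ sequences that the image pattern did not match.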
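One way to see what the lookahead checks is to anchor it by itself and feed it the kind of text that follows a link's opening parenthesis. In this sketch, $guard is the lookahead from $patternWithLinkText with a ^ anchor added, and the three samples are hypothetical.

```php
<?php

// The negative lookahead from the link pattern, anchored at the start so it can
// be tested alone. It succeeds unless the first "[" reached is part of "![".
$guard = '/^(?![^\[]*?\!\[)/';

// Each sample stands for the text right after a link's "(".
$samples = [
    '/post) and more text',
    '/post) then ![logo](/logo.png)',
    '/post) [note] ![logo](/logo.png)',
];

$verdicts = [];
foreach ($samples as $rest) {
    $verdicts[$rest] = preg_match($guard, $rest) === 1 ? 'link allowed' : 'link rejected';
    echo str_pad($rest, 34), $verdicts[$rest], PHP_EOL;
}
```

The third sample shows that a plain [ reached first ends the scan, so an image further along does not block the link.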

Just like the image callback, the link callback returns only the human-readable text:

```php
return $matches[1];
```

So this:

[Laravel docs](https://laravel.com)

becomes this:

```text
Laravel docs
```

Removing Empty Markdown Nodes

The utility explicitly deletes two forms that do not contribute any readable text:

```php
$markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);
$markdown = preg_replace('/\[\]\((.*?)\)/', '', $markdown);
```

That covers:

empty images like ![](/logo.png)
empty links like [](/internal-link)

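This is a small detail, but it makes the policy explicit. On simple inputs the general passes would also reduce these forms to an empty string, because (.*?) is allowed to match nothing. The dedicated patterns state the intent directly, so empty nodes are dropped on purpose rather than as a side effect and leave no placeholders or stray punctuation behind.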
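The following sketch compares the dedicated empty-node patterns with the general image and link patterns on one hypothetical sentence that contains both empty forms.

```php
<?php

// Compare the dedicated empty-node patterns with the general patterns.
$input = 'Logo: ![](/logo.png) and [](/internal-link) done.';

// Dedicated patterns from removeLinks(): steps 1 and 4
$dedicated = preg_replace('/!\[\]\((.*?)\)/', '', $input);
$dedicated = preg_replace('/\[\]\((.*?)\)/', '', $dedicated);

// General patterns: an empty group 1 becomes an empty replacement
$general = preg_replace('/!\[(.*?)\]\((.*?)\)/s', '$1', $input);
$general = preg_replace('/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s', '$1', $general);

var_dump($dedicated); // string(17) "Logo:  and  done."
var_dump($general);
```

Both routes leave the same string, including the doubled spaces where the nodes used to be, which is one reason a whitespace-compression step usually follows.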

Real Use Cases

This kind of utility is useful when Markdown is not your final output format:

converting scraped HTML to Markdown and then compressing it into plain text
building search or indexing pipelines that should ignore raw URLs
generating content previews where visible text matters more than destination links
cleaning AI or summarization input before sending it into another processing step

One especially practical flow is:

convert HTML to Markdown
replace links and images with visible text
compress whitespace
store the result as plain text for downstream processing

That keeps the readable content while dropping noise from links, tracking URLs, and embedded images.
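The link-replacement and whitespace-compression steps of that flow can be sketched as one function. This assumes the HTML-to-Markdown conversion has already happened elsewhere (it needs a separate converter and is out of scope here). The name markdownToPlainText() is hypothetical, the four patterns are the ones from MarkdownUtil, and collapsing every whitespace run to a single space is just one simple compression policy.

```php
<?php

/**
 * Hypothetical helper: Markdown in, single-line plain text out.
 */
function markdownToPlainText(string $markdown): string
{
    // Same four passes, in the same order, as MarkdownUtil::removeLinks()
    $passes = [
        ['/!\[\]\((.*?)\)/', ''],
        ['/!\[(.*?)\]\((.*?)\)/s', '$1'],
        ['/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s', '$1'],
        ['/\[\]\((.*?)\)/', ''],
    ];

    foreach ($passes as [$pattern, $replacement]) {
        // preg_replace() returns null on a PCRE error; keep the previous text then
        $markdown = preg_replace($pattern, $replacement, $markdown) ?? $markdown;
    }

    // Compress whitespace: every run of spaces, tabs, or newlines becomes one space
    return trim(preg_replace('/\s+/', ' ', $markdown) ?? $markdown);
}

$input = "Read [Laravel docs](https://laravel.com)\n\n"
    . "and the ![Architecture diagram](/diagram.png) ![](/logo.png) overview.";

echo markdownToPlainText($input), PHP_EOL;
```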

Test Cases Worth Keeping

The value of this utility is not just the regex. It is the set of edge cases covered by tests.

Useful cases include:

regular links with text
links with query strings and special characters
images with alt text
images with empty alt text
nested images inside links
multiple links and images in the same string
empty input
multiline Markdown with base64 images

A representative example is this base64 image case:

```php
$text = <<<'EOT'
Hello,
the email was verified. ![image](data:image/png;base64,iVBORw0KGgoAAA...)
EOT;

$expected = <<<'EOT'
Hello,
the email was verified. image
EOT;
```

That case shows the image pass is not limited to short URL-style paths: a data: URI target is reduced to its alt text in the same way.
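The edge cases listed above translate naturally into a table-driven check. The sketch below repeats the class so it runs on its own and uses plain comparisons rather than a specific test framework; the inputs are illustrative, and the base64 payload keeps the same truncated sample as above.

```php
<?php

// Same logic as the MarkdownUtil class shown earlier.
class MarkdownUtil
{
    public static function removeLinks($markdown)
    {
        $patternWithImageAltText = '/!\[(.*?)\]\((.*?)\)/s';
        $patternWithLinkText = '/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s';

        $markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);
        $markdown = preg_replace_callback($patternWithImageAltText, function ($matches) {
            return $matches[1];
        }, $markdown);
        $markdown = preg_replace_callback($patternWithLinkText, function ($matches) {
            return $matches[1];
        }, $markdown);
        $markdown = preg_replace('/\[\]\((.*?)\)/', '', $markdown);

        return $markdown;
    }
}

// Each case: description => [input, expected output]
$cases = [
    'regular link'         => ['[Laravel docs](https://laravel.com)', 'Laravel docs'],
    'query string'         => ['[Search](https://example.com/search?q=php&page=2#top)', 'Search'],
    'image alt text'       => ['![Architecture diagram](/diagram.png)', 'Architecture diagram'],
    'empty alt text'       => ['![](/logo.png)', ''],
    'nested image in link' => ['[![Architecture diagram](/diagram.png)](/post/system-design)', 'Architecture diagram'],
    'multiple nodes'       => ['See [docs](/docs), ![chart](/chart.png) and [blog](/blog).', 'See docs, chart and blog.'],
    'empty input'          => ['', ''],
    'multiline base64'     => [
        "Hello,\nthe email was verified. ![image](data:image/png;base64,iVBORw0KGgoAAA...)",
        "Hello,\nthe email was verified. image",
    ],
];

$failures = [];
foreach ($cases as $name => [$input, $expected]) {
    $actual = MarkdownUtil::removeLinks($input);
    if ($actual !== $expected) {
        $failures[] = $name;
    }
    echo ($actual === $expected ? 'PASS ' : 'FAIL '), $name, PHP_EOL;
}
```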

Where Regex Is Good Enough and Where It Is Not

This is a pragmatic regex solution, not a full Markdown parser.

It does not aim to cover every Markdown form. For example, reference links such as [docs][1], autolinks such as <https://example.com>, escaped bracket cases, and some URLs with parentheses may remain unchanged or produce imperfect output.

This narrow scope is a strength when:

you control the Markdown shape
you need a lightweight dependency-free helper
you care about a narrow transformation instead of full CommonMark compliance

It becomes a weaker fit when:

the input can contain highly irregular Markdown
you need to preserve complex nested formatting
you must support every edge case of the Markdown spec

For controlled application pipelines, though, this tradeoff is often exactly right. A focused regex utility is easier to maintain, easier to test, and much easier to explain.
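To make the limitations concrete, the sketch below runs the link pattern on a few inputs it was not designed for. The inputs are hypothetical. Besides a reference link, an autolink, and a URL containing parentheses, it includes plain bracketed text that appears before a real link on the same line, which the lazy (.*?) group also mishandles. Only the link pattern is used because none of these inputs contain images.

```php
<?php

// The link pattern from MarkdownUtil, applied on its own.
$patternWithLinkText = '/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s';

$inputs = [
    'reference link'         => '[docs][1]',
    'autolink'               => '<https://example.com>',
    'parens in URL'          => '[PHP](https://en.wikipedia.org/wiki/PHP_(language))',
    'brackets before a link' => '[note] see [docs](/docs)',
];

$results = [];
foreach ($inputs as $label => $input) {
    $results[$label] = preg_replace($patternWithLinkText, '$1', $input);
    echo str_pad($label, 24), $input, '  =>  ', $results[$label], PHP_EOL;
}
```

The first two inputs come back unchanged, while the last two produce PHP) and note] see [docs, the kind of imperfect output described above.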

Final Thoughts

The useful idea here is not "use regex on Markdown" in the abstract. It is "use a staged transformation with explicit intent."

This utility works because it defines a clear policy:

visible link text stays
image alt text stays
empty nodes disappear
nested cases are simplified in the right order

If you need clean plain text from Markdown in PHP, that policy gives you a compact and production-friendly solution without pulling in a full parser.
