Removing Markdown Links in PHP While Preserving Visible Text and Image Alt Text
Introduction
Sometimes you want plain text from Markdown, but you do not want to throw away the human-readable parts.
If your input contains [Laravel docs](https://laravel.com) or ![Architecture diagram](...), a naive regex that deletes the whole link construct will destroy the useful text along with the URL. For previews, search indexing, scraping pipelines, and AI preprocessing, that is the wrong tradeoff.
What you usually want is this:
- keep the visible text from regular links
- keep the alt text from images
- drop empty links and empty images
- handle nested image-in-link cases without duplicating text
In this article, we will walk through a small PHP utility that does exactly that.
The Target Output
These are the transformations we want:
[Laravel docs](https://laravel.com)
=> Laravel docs
![Architecture diagram](...)
=> Architecture diagram
[![Architecture diagram](...)](/post/system-design)
=> Architecture diagram
[](/internal-link)
=> ""
![](...)
=> ""
That behavior is especially useful when Markdown is just an intermediate format and your real goal is clean plain text.
The Utility
Here is the core implementation:
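class MarkdownUtil
{
    /**
     * Replace Markdown links and images with their visible text or alt text.
     */
    public static function removeLinks(string $markdown): string
    {
        // ![alt](target) -> alt
        $patternWithImageAltText = '/!\[(.*?)\]\((.*?)\)/s';
        // [text](target) -> text; the lookahead rejects targets whose first "[" opens an image ("![")
        $patternWithLinkText = '/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s';

        // 1. Remove empty images: ![](target)
        $markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);

        // 2. Replace images with their alt text (before links, so nested images unwrap first)
        $markdown = preg_replace_callback($patternWithImageAltText, function (array $matches): string {
            return $matches[1];
        }, $markdown);

        // 3. Replace regular links with their visible text
        $markdown = preg_replace_callback($patternWithLinkText, function (array $matches): string {
            return $matches[1];
        }, $markdown);

        // 4. Remove any remaining empty links: [](target)
        $markdown = preg_replace('/\[\]\((.*?)\)/', '', $markdown);

        return $markdown;
    }
}
The method is short, but the ordering matters more than the regexes themselves.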
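To check the class against the target output listed earlier, a small CLI script like the one below can help. It is a sketch that assumes PHP 7.4 or later (for arrow functions) and pastes in a condensed copy of MarkdownUtil so it runs on its own. The image path /images/architecture.png is a made-up placeholder.

```php
<?php
// Condensed copy of MarkdownUtil from above, so this script runs on its own (PHP 7.4+).
class MarkdownUtil
{
    public static function removeLinks(string $markdown): string
    {
        $markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);
        $markdown = preg_replace_callback('/!\[(.*?)\]\((.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        $markdown = preg_replace_callback('/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        return preg_replace('/\[\]\((.*?)\)/', '', $markdown);
    }
}

// Input => expected output, mirroring the target list (the image path is hypothetical).
$cases = [
    '[Laravel docs](https://laravel.com)' => 'Laravel docs',
    '![Architecture diagram](/images/architecture.png)' => 'Architecture diagram',
    '[![Architecture diagram](/images/architecture.png)](/post/system-design)' => 'Architecture diagram',
    '[](/internal-link)' => '',
    '![](/images/architecture.png)' => '',
];

foreach ($cases as $input => $expected) {
    $actual = MarkdownUtil::removeLinks($input);
    echo ($actual === $expected ? 'ok   ' : 'FAIL ') . $input . PHP_EOL;
}
```

Every line of output should start with "ok". Any line that starts with "FAIL" points to the input that no longer matches the policy.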
Why the Replacement Order Matters
This utility works as a four-stage pipeline:
- remove empty images
- convert images to their alt text
- convert regular links to their visible text
- remove empty links
That sequence prevents nested Markdown from being interpreted the wrong way.
Consider this input:
[![Architecture diagram](...)](/post/system-design)
If you process links first, the link regex matches from the outer opening bracket to the inner image's closing parenthesis, so a fragment of the image syntax ("![Architecture diagram") is captured as link text. By processing images before regular links, the input becomes:
[Architecture diagram](/post/system-design)
After that, the regular link pass can safely reduce it to:
Architecture diagram
This is the main design idea behind the utility: simplify the Markdown in stages instead of trying to solve everything with one giant regex.
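To make the ordering argument concrete, the sketch below runs each of the article's two regexes alone on the nested input and prints the intermediate result. The image path is a hypothetical placeholder, and the closure that keeps capture group 1 plays the same role as the article's callbacks.

```php
<?php
// The two regexes from the article; only the order in which they run differs below.
$imagePattern = '/!\[(.*?)\]\((.*?)\)/s';
$linkPattern = '/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s';
$keepText = fn (array $m): string => $m[1]; // keep capture group 1 only

// Nested image-in-link input (hypothetical image path).
$input = '[![Architecture diagram](/images/architecture.png)](/post/system-design)';

// Images first (the article's order): the image collapses, leaving a plain link.
$imagesFirst = preg_replace_callback($imagePattern, $keepText, $input);
echo $imagesFirst, PHP_EOL; // [Architecture diagram](/post/system-design)

// Links first: the outer "[" is paired with the inner image's "](",
// so part of the image syntax becomes the "link text".
$linksFirst = preg_replace_callback($linkPattern, $keepText, $input);
echo $linksFirst, PHP_EOL; // ![Architecture diagram](/post/system-design)
```

For this particular input, a later image pass would still recover the alt text. The links-first intermediate, however, is an image that now points at the link's target, which is the kind of bracket mis-pairing the images-first order avoids.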
The Image Pattern
The image regex is:
$patternWithImageAltText = '/!\[(.*?)\]\((.*?)\)/s';
It captures:
- the alt text inside ![ ... ]
- the image target inside ( ... )
The callback returns only the first capture group:
return $matches[1];
So this:
![Architecture diagram](...)
becomes this:
Architecture diagram
The /s modifier makes the pattern more tolerant of multiline Markdown, because . can also match newline characters. That helps when link text, alt text, or the target wraps across lines.
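The effect of /s is easy to check by running the image pattern with and without the modifier on alt text that wraps onto a second line. This is a sketch with a hypothetical image path. It uses preg_replace with a '$1' replacement, which behaves the same as a callback that returns $matches[1].

```php
<?php
// Image pattern with and without the /s modifier.
$withS = '/!\[(.*?)\]\((.*?)\)/s';
$withoutS = '/!\[(.*?)\]\((.*?)\)/';

// Alt text that wraps onto a second line (hypothetical path).
$input = "![Architecture\ndiagram](/images/architecture.png)";

$a = preg_replace($withS, '$1', $input);    // alt text kept, newline included
$b = preg_replace($withoutS, '$1', $input); // no match: "." cannot cross the newline

var_dump($a === "Architecture\ndiagram"); // bool(true)
var_dump($b === $input);                  // bool(true): input left unchanged
```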
The Link Pattern
The regular link regex is:
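$patternWithLinkText = '/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s';
The important part is the negative lookahead:
(?![^\[]*?\!\[)
Right after the opening parenthesis of the link target, it rejects the match if the text reaches an image opener (![) before any other opening bracket. That keeps the pattern from consuming image-style content as if it were a plain link body. Because images are already converted in the previous stage, it acts mostly as a safety net around nested Markdown.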
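To see what the lookahead accepts and rejects, the sketch below applies it on its own to the text that follows "](" in a link. The helper lookaheadAllows is hypothetical and exists only for illustration. It is not part of the utility.

```php
<?php
// Hypothetical helper: applies only the article's negative lookahead to the text
// after "](" and reports whether the link pattern would be allowed to continue.
function lookaheadAllows(string $afterParen): bool
{
    // Fails when the first "[" in $afterParen is the "[" of an image opener "![".
    return preg_match('/^(?![^\[]*?\!\[)/s', $afterParen) === 1;
}

var_dump(lookaheadAllows('https://laravel.com)'));               // bool(true)
var_dump(lookaheadAllows('see ![logo](/logo.png))'));           // bool(false)
var_dump(lookaheadAllows('a [note] then ![logo](/logo.png))')); // bool(true): a plain "[" comes first
```

The third case shows that the check is narrow. Because [^\[] cannot step past an ordinary bracket, an image opener that appears later in the text does not block the match.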
Just like the image callback, the link callback returns only the human-readable text:
return $matches[1];
So this:
[Laravel docs](https://laravel.com)
becomes this:
Laravel docs
Removing Empty Markdown Nodes
The utility explicitly deletes two forms that do not contribute any readable text:
$markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);
$markdown = preg_replace('/\[\]\((.*?)\)/', '', $markdown);
That covers:
- empty images like ![](...)
- empty links like [](/internal-link)
This is a small but important detail. If you keep them, your output may end up with meaningless placeholders or stray punctuation.
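The sketch below applies only the two deletion passes, in the article's order, to a made-up string. The empty nodes disappear, but the spaces around them stay, which is one reason a whitespace-compression step tends to follow. It also shows why the empty-image pass comes first: the empty-link pattern alone would match the "[](...)" part of an empty image and leave a stray "!" behind.

```php
<?php
// Hypothetical input with one empty image and one empty link.
$input = 'Intro ![](/a.png) text [](/b) end';

// Article's order: empty images first, then empty links.
$out = preg_replace('/!\[\]\((.*?)\)/', '', $input);
$out = preg_replace('/\[\]\((.*?)\)/', '', $out);
var_dump($out); // string(16) "Intro  text  end"

// Empty-link pattern alone: it also eats the "[](...)" part of the empty image.
$linkOnly = preg_replace('/\[\]\((.*?)\)/', '', $input);
var_dump($linkOnly); // string(17) "Intro ! text  end"
```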
Real Use Cases
This kind of utility is useful when Markdown is not your final output format:
- converting scraped HTML to Markdown and then compressing it into plain text
- building search or indexing pipelines that should ignore raw URLs
- generating content previews where visible text matters more than destination links
- cleaning AI or summarization input before sending it into another processing step
One especially practical flow is:
- convert HTML to Markdown
- replace links and images with visible text
- compress whitespace
- store the result as plain text for downstream processing
That keeps the readable content while dropping noise from links, tracking URLs, and embedded images.
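Steps two to four of that flow fit in a few lines. This is a sketch under a few assumptions. The HTML-to-Markdown conversion is assumed to have happened upstream and is not shown. markdownToPlainText is a hypothetical helper name. The class is a condensed copy of MarkdownUtil that requires PHP 7.4 or later.

```php
<?php
// Condensed copy of MarkdownUtil from the article (PHP 7.4+).
class MarkdownUtil
{
    public static function removeLinks(string $markdown): string
    {
        $markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);
        $markdown = preg_replace_callback('/!\[(.*?)\]\((.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        $markdown = preg_replace_callback('/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        return preg_replace('/\[\]\((.*?)\)/', '', $markdown);
    }
}

// Hypothetical helper: Markdown in, single-line plain text out.
function markdownToPlainText(string $markdown): string
{
    $text = MarkdownUtil::removeLinks($markdown); // links and images -> visible text
    $text = preg_replace('/\s+/', ' ', $text);    // collapse runs of ASCII whitespace
    return trim($text);
}

$markdown = "Read [Laravel docs](https://laravel.com?utm_source=newsletter)\n\n"
    . "and see ![Architecture diagram](/images/architecture.png) today.";
echo markdownToPlainText($markdown), PHP_EOL;
// Read Laravel docs and see Architecture diagram today.
```

Collapsing all whitespace to single spaces also flattens paragraph breaks. If downstream processing needs paragraphs, a gentler whitespace rule would be the thing to change.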
Test Cases Worth Keeping
The value of this utility is not just the regex. It is the set of edge cases covered by tests.
Useful cases include:
- regular links with text
- links with query strings and special characters
- images with alt text
- images with empty alt text
- nested images inside links
- multiple links and images in the same string
- empty input
- multiline Markdown with base64 images
A representative example is this base64 image case:
$text = <<<'EOT'
Hello,
the email was verified. ![image](data:image/...;base64,...)
EOT;
$expected = <<<'EOT'
Hello,
the email was verified. image
EOT;
That test shows that the utility is not limited to short URL-style image paths.
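PHPUnit would be the usual home for these cases. The plain loop below keeps the sketch dependency-free and covers a few of the cases listed above: a query string, empty alt text, mixed links and images, empty input, and a multiline base64 image. All inputs are made up. The base64 payload is a short stand-in, not a real image.

```php
<?php
// Condensed copy of MarkdownUtil from the article (PHP 7.4+).
class MarkdownUtil
{
    public static function removeLinks(string $markdown): string
    {
        $markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);
        $markdown = preg_replace_callback('/!\[(.*?)\]\((.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        $markdown = preg_replace_callback('/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        return preg_replace('/\[\]\((.*?)\)/', '', $markdown);
    }
}

// name => [input, expected]; all data is hypothetical.
$cases = [
    'query string'   => ['[Search](https://example.com/?q=php&page=2#top)', 'Search'],
    'empty alt text' => ['![](/images/empty.png)', ''],
    'mixed'          => ['See [A](https://a.test) and ![B](/b.png), then [C](https://c.test).', 'See A and B, then C.'],
    'empty input'    => ['', ''],
    'base64 image'   => ["Hello,\nthe email was verified. ![image](data:image/png;base64,iVBORw0KGgo=)",
                         "Hello,\nthe email was verified. image"],
];

$failures = 0;
foreach ($cases as $name => [$input, $expected]) {
    if (MarkdownUtil::removeLinks($input) !== $expected) {
        $failures++;
        echo "FAIL: {$name}", PHP_EOL;
    }
}
echo $failures === 0 ? 'all cases passed' : "{$failures} case(s) failed", PHP_EOL;
```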
Where Regex Is Good Enough and Where It Is Not
This is a pragmatic regex solution, not a full Markdown parser.
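It does not aim to cover every Markdown form. For example, reference links such as [docs][1], autolinks such as <https://example.com>, escaped bracket cases, and some URLs with parentheses may remain unchanged or produce imperfect output. Plain bracketed text that appears before a link is another weak spot: the lazy (.*?) keeps expanding until it reaches a "](", so input like [note] and [docs](https://example.com) collapses to note] and [docs.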
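Running a few of these inputs, along with plain bracketed text placed before a link, through the utility makes the limits concrete. The sketch below records what the regex pipeline actually produces rather than what a full Markdown parser would produce. The URLs are placeholders.

```php
<?php
// Condensed copy of MarkdownUtil from the article (PHP 7.4+).
class MarkdownUtil
{
    public static function removeLinks(string $markdown): string
    {
        $markdown = preg_replace('/!\[\]\((.*?)\)/', '', $markdown);
        $markdown = preg_replace_callback('/!\[(.*?)\]\((.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        $markdown = preg_replace_callback('/\[(.*?)\]\((?![^\[]*?\!\[)(.*?)\)/s', fn (array $m): string => $m[1], $markdown);
        return preg_replace('/\[\]\((.*?)\)/', '', $markdown);
    }
}

// input => what the regex pipeline actually produces
$examples = [
    '[docs][1]'                                 => '[docs][1]',             // reference link: unchanged
    '<https://example.com>'                     => '<https://example.com>', // autolink: unchanged
    '[Foo](https://example.com/wiki/Foo_(bar))' => 'Foo)',                  // target stops at the first ")"
    '[note] and [docs](https://example.com)'    => 'note] and [docs',       // lazy match spans both brackets
];

foreach ($examples as $input => $produced) {
    var_dump(MarkdownUtil::removeLinks($input) === $produced); // bool(true) for each
}
```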
That narrow scope is a strength when:
- you control the Markdown shape
- you need a lightweight dependency-free helper
- you care about a narrow transformation instead of full CommonMark compliance
It becomes a weaker fit when:
- the input can contain highly irregular Markdown
- you need to preserve complex nested formatting
- you must support every edge case of the Markdown spec
For controlled application pipelines, though, this tradeoff is often exactly right. A focused regex utility is easier to maintain, easier to test, and much easier to explain.
Final Thoughts
The useful idea here is not "use regex on Markdown" in the abstract. It is "use a staged transformation with explicit intent."
This utility works because it defines a clear policy:
- visible link text stays
- image alt text stays
- empty nodes disappear
- nested cases are simplified in the right order
If you need clean plain text from Markdown in PHP, that policy gives you a compact and production-friendly solution without pulling in a full parser.
