<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en_US"><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://azureossd.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://azureossd.github.io/" rel="alternate" type="text/html" hreflang="en_US" /><updated>2026-05-18T15:16:46+00:00</updated><id>https://azureossd.github.io/feed.xml</id><title type="html"> </title><subtitle>Support for Open Source Technologies on Microsoft Azure App Service.</subtitle><entry><title type="html">Container crash with exit code 132 (SIGILL) on Web App for Containers</title><link href="https://azureossd.github.io/2026/05/11/Container-crash-with-exit-code-132-SIGILL-on-Web-App-for-Containers/index.html" rel="alternate" type="text/html" title="Container crash with exit code 132 (SIGILL) on Web App for Containers" /><published>2026-05-11T12:00:00+00:00</published><updated>2026-05-11T12:00:00+00:00</updated><id>https://azureossd.github.io/2026/05/11/Container-crash-with-exit-code-132-SIGILL-on-Web-App-for-Containers/Container-crash-with-exit-code-132-SIGILL-on-Web-App-for-Containers</id><content type="html" xml:base="https://azureossd.github.io/2026/05/11/Container-crash-with-exit-code-132-SIGILL-on-Web-App-for-Containers/index.html"><![CDATA[<p>This post will cover containers crashing with exit code 132 (SIGILL - Illegal Instruction) on Web App for Containers, typically caused by CPU architecture mismatches between Intel and AMD workers.</p>

<h1 id="overview">Overview</h1>
<p>On Azure App Service, the underlying infrastructure fleet includes workers with different CPU vendors - specifically <strong>Intel</strong> and <strong>AMD</strong> processors. App Service does not guarantee a specific CPU vendor or instruction set for any given worker. Over time, stamps may transition between hardware generations, and instance movements (due to scaling, platform maintenance, or rebalancing) can place your application on a worker with a different CPU architecture than the one it was previously running on.</p>

<p>If your container image was compiled with CPU-specific instructions (for example, using <code class="language-plaintext highlighter-rouge">-march=native</code> on an Intel build machine, or linking against libraries that use Intel-specific instruction sets like <code class="language-plaintext highlighter-rouge">AVX-512</code>), the container may crash immediately when placed on an AMD worker - or vice versa.</p>

<p>This crash presents itself as <strong>exit code 132</strong>, which corresponds to <strong>signal 4 (SIGILL - Illegal Instruction)</strong>. The container typically exits so quickly that no application logs (stdout/stderr) are produced.</p>

<h1 id="what-does-exit-code-132-mean">What does exit code 132 mean?</h1>
<p>Exit code 132 is the result of the Linux kernel sending <strong>SIGILL (signal 4)</strong> to a process. This signal is raised when the CPU encounters an instruction it does not recognize or support.</p>

<p>The formula is: <code class="language-plaintext highlighter-rouge">128 + signal number = exit code</code>. So <code class="language-plaintext highlighter-rouge">128 + 4 = 132</code>.</p>
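
<p>As a quick illustration of the formula, the shell sketch below decodes an exit code back into a signal name. The function name <code class="language-plaintext highlighter-rouge">signal_from_exit_code</code> is hypothetical and not part of any App Service tooling; it relies only on the shell built-in <code class="language-plaintext highlighter-rouge">kill -l</code>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Decode a container exit code into the signal that terminated the process.
# signal_from_exit_code is a hypothetical helper name used for illustration.
signal_from_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    # Exit codes above 128 mean "terminated by signal (code - 128)"
    kill -l "$((code - 128))"
  else
    echo "not a signal exit"
  fi
}

signal_from_exit_code 132   # prints ILL  (SIGILL, signal 4)
signal_from_exit_code 137   # prints KILL (SIGKILL, signal 9)
signal_from_exit_code 139   # prints SEGV (SIGSEGV, signal 11)
</code></pre></div></div>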

<p>Common reasons for SIGILL:</p>
<ul>
  <li>The binary was compiled with <code class="language-plaintext highlighter-rouge">-march=native</code> on one CPU (for example, an Intel build machine), which may enable <strong><code class="language-plaintext highlighter-rouge">AVX-512</code></strong> or other instruction set extensions that the worker CPU does not implement. Older extensions such as <strong>SSE4.2</strong> are available on current Intel and AMD server processors, so newer extensions are the more likely cause</li>
  <li>Native extensions or shared libraries (<code class="language-plaintext highlighter-rouge">.so</code> files) were built targeting a specific CPU microarchitecture</li>
  <li>Compiled languages such as C, C++, Rust, or Go (with assembly) may embed architecture-specific instructions at build time</li>
  <li>Python packages with native C extensions (like <code class="language-plaintext highlighter-rouge">numpy</code>, <code class="language-plaintext highlighter-rouge">scipy</code>, <code class="language-plaintext highlighter-rouge">cryptography</code>, etc.) may have been compiled from source on a specific CPU architecture</li>
</ul>

<h1 id="how-this-manifests-on-app-service">How this manifests on App Service</h1>
<p>A typical scenario looks like this:</p>

<ol>
  <li>Your application runs without any issues on a worker with an Intel CPU</li>
  <li>An instance movement occurs - this could be due to platform maintenance, scaling events, or worker rebalancing</li>
  <li>Your container is placed on a worker with an AMD CPU</li>
  <li>The container starts, but crashes immediately with exit code 132</li>
  <li>The container enters a crash loop - every restart attempt results in the same exit code 132</li>
  <li>Since the crash happens so fast, <strong>no application logs are produced</strong> - you may see messages like <code class="language-plaintext highlighter-rouge">Failed to get container logs</code> in diagnostic logging</li>
  <li>HTTP traffic returns <strong>503 errors</strong> since the container never becomes healthy</li>
</ol>

<p>If you look at <strong>Diagnose and Solve Problems</strong> or App Service Logs, you’ll see entries similar to:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Container [containerName] for site [siteName] has exited, exit code: 132
</code></pre></div></div>

<p>This will repeat for every restart attempt. The container never successfully starts.</p>

<blockquote>
  <p><strong>NOTE</strong>: This issue is classified as an <strong>application-level</strong> issue, even though it is triggered by an infrastructure change (instance movement). App Service does not guarantee a specific CPU vendor, and applications should be built to run on any supported <code class="language-plaintext highlighter-rouge">x86-64</code> processor.</p>
</blockquote>

<h1 id="identifying-the-issue">Identifying the issue</h1>
<p>To confirm this is a CPU architecture mismatch:</p>

<ol>
  <li><strong>Check the exit code</strong> - Exit code 132 specifically indicates <code class="language-plaintext highlighter-rouge">SIGILL</code>. This is different from other common exit codes like 137 (OOM kill) or 139 (segfault)</li>
  <li><strong>Check if an instance movement occurred</strong> - Look at App Service diagnostic logs or <strong>Diagnose and Solve Problems</strong> to see if the worker changed around the time the crashes started</li>
  <li><strong>Check if the application was previously healthy</strong> - If the same image was running without issues and then suddenly started crashing with exit code 132 after a worker change, this strongly suggests a CPU architecture mismatch</li>
  <li><strong>No application logs</strong> - The process is killed by the kernel before it can write any output. If you see empty logs or “Failed to get container logs”, combined with exit code 132, this is characteristic of <code class="language-plaintext highlighter-rouge">SIGILL</code></li>
</ol>
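
<p>If you have downloaded the platform logs, a small shell sketch like the following can help confirm a crash loop by counting the exit code 132 entries. It assumes the log lines follow the format shown earlier; the function name <code class="language-plaintext highlighter-rouge">count_sigill_exits</code> and the sample container and site names are hypothetical.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Count log lines that report a container exit with code 132.
# Reads log text from standard input; count_sigill_exits is a hypothetical name.
count_sigill_exits() {
  # grep -c prints the number of matching lines; "|| true" keeps the
  # function from failing when there are no matches (count is 0).
  grep -cw 'has exited, exit code: 132' || true
}

# Sample input (made-up container and site names) for illustration:
printf '%s\n' \
  'Container app_0 for site mysite has exited, exit code: 132' \
  'Container app_1 for site mysite has exited, exit code: 132' \
  'Container app_2 for site mysite has exited, exit code: 137' \
  | count_sigill_exits
</code></pre></div></div>

<p>A count that keeps growing with every restart attempt, together with empty application logs, matches the pattern described above.</p>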

<h1 id="resolution">Resolution</h1>
<p>The fix is to rebuild your container image so that it does <strong>not</strong> rely on CPU-specific instructions. The following steps should be taken:</p>

<p><strong>1. Use architecture-agnostic compiler flags</strong></p>

<p>If you’re compiling code (C, C++, Rust, Go with assembly, etc.), use generic <code class="language-plaintext highlighter-rouge">x86-64</code> baseline target flags instead of architecture-specific ones:</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Instead of this (targets the build machine's exact CPU):</span>
<span class="k">RUN </span>gcc <span class="nt">-march</span><span class="o">=</span>native <span class="nt">-O2</span> <span class="nt">-o</span> myapp myapp.c

<span class="c"># Use this (targets the generic x86-64 baseline):</span>
<span class="k">RUN </span>gcc <span class="nt">-march</span><span class="o">=</span>x86-64 <span class="nt">-O2</span> <span class="nt">-o</span> myapp myapp.c

<span class="c"># Or use x86-64-v2 for a slightly newer baseline (SSE4.2, SSSE3, POPCNT):</span>
<span class="k">RUN </span>gcc <span class="nt">-march</span><span class="o">=</span>x86-64-v2 <span class="nt">-O2</span> <span class="nt">-o</span> myapp myapp.c
</code></pre></div></div>

<p><strong>2. Check for SIMD intrinsics</strong></p>

<p>If your application or its dependencies use SIMD (Single Instruction, Multiple Data) intrinsics, ensure <strong>runtime CPU feature detection</strong> is in place rather than compile-time assumptions. Many modern libraries support this - for example, checking for AVX support at runtime before using AVX instructions.</p>
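
<p>Libraries normally perform this detection in-process. As a rough shell-level illustration of the same idea, a container entrypoint on x86-64 Linux could inspect <code class="language-plaintext highlighter-rouge">/proc/cpuinfo</code> before choosing a code path. The function name <code class="language-plaintext highlighter-rouge">cpu_has_flag</code> is hypothetical; <code class="language-plaintext highlighter-rouge">avx512f</code> is the flag name the Linux kernel reports for the AVX-512 Foundation extension.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Return success if the CPU running this container advertises the given flag.
# cpu_has_flag is a hypothetical helper; it assumes x86-64 Linux, where
# /proc/cpuinfo contains a "flags" line listing the supported extensions.
cpu_has_flag() {
  grep -m1 '^flags' /proc/cpuinfo | grep -qw -- "$1"
}

if cpu_has_flag avx512f; then
  echo "AVX-512 available: an optimized code path can be used"
else
  echo "AVX-512 not available: falling back to the generic code path"
fi
</code></pre></div></div>

<p>The point of the sketch is the decision itself: the optimized path is chosen at startup on the worker that actually runs the container, not at build time.</p>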

<p><strong>3. Python with native extensions</strong></p>

<p>If you’re using Python with packages that have native C extensions (such as <code class="language-plaintext highlighter-rouge">numpy</code>, <code class="language-plaintext highlighter-rouge">scipy</code>, <code class="language-plaintext highlighter-rouge">cryptography</code>, <code class="language-plaintext highlighter-rouge">pillow</code>, etc.):</p>
<ul>
  <li>Use <strong>pre-built wheels</strong> from PyPI instead of building from source. Pre-built wheels target the generic <code class="language-plaintext highlighter-rouge">x86-64</code> baseline</li>
  <li>If you must build from source, ensure the build does not use <code class="language-plaintext highlighter-rouge">-march=native</code></li>
  <li>Consider using packages from <code class="language-plaintext highlighter-rouge">conda-forge</code>, which are also built for generic <code class="language-plaintext highlighter-rouge">x86-64</code></li>
</ul>

<p><strong>4. Rust applications</strong></p>

<p>For Rust, ensure your <code class="language-plaintext highlighter-rouge">.cargo/config.toml</code> or build command does not specify a CPU-specific target:</p>

<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Avoid this:</span>
<span class="nn">[build]</span>
<span class="py">rustflags</span> <span class="p">=</span> <span class="p">[</span><span class="s">"-C"</span><span class="p">,</span> <span class="py">"target-cpu</span><span class="p">=</span><span class="err">native</span><span class="s">"]</span><span class="err">
</span>
<span class="c"># Use this instead (or simply omit the target-cpu flag):</span>
<span class="nn">[build]</span>
<span class="py">rustflags</span> <span class="p">=</span> <span class="p">[</span><span class="s">"-C"</span><span class="p">,</span> <span class="py">"target-cpu</span><span class="p">=</span><span class="err">x</span><span class="mi">86-64</span><span class="s">"]</span><span class="err">
</span></code></pre></div></div>

<p><strong>5. Go applications</strong></p>

<p>Go applications are generally safe, as Go compiles to a generic <code class="language-plaintext highlighter-rouge">x86-64</code> target by default. However, if you’re using <strong>assembly files</strong> (<code class="language-plaintext highlighter-rouge">.s</code> files) or <strong>cgo</strong> with C code that targets a specific CPU, verify those components are architecture-agnostic.</p>

<p><strong>6. Push a new image and restart</strong></p>

<p>After rebuilding your image:</p>
<ol>
  <li>Push the new image to your container registry with a new tag</li>
  <li>Update the Web App for Containers configuration to use the new image tag</li>
  <li>Restart the Web App if it does not restart on its own, so that the new image is pulled and started</li>
</ol>

<h1 id="additional-considerations">Additional considerations</h1>
<ul>
  <li><strong>Multi-stage builds</strong>: If you use multi-stage Docker builds, ensure the <strong>build stage</strong> uses generic compiler flags, not just the final stage</li>
  <li><strong>Base images</strong>: Some base images may include pre-compiled binaries that target specific architectures. If you’re using a niche or heavily optimized base image, verify it’s compatible with both Intel and AMD x86-64 processors</li>
  <li><strong>Third-party binaries</strong>: If your Dockerfile downloads pre-compiled binaries (rather than building from source), ensure those binaries target the generic x86-64 baseline</li>
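  <li><strong>Testing</strong>: To inspect a binary locally, you can disassemble it and search for signs of a specific instruction set, for example <code class="language-plaintext highlighter-rouge">objdump -d &lt;binary&gt; | grep -i zmm</code> to look for the 512-bit <code class="language-plaintext highlighter-rouge">zmm</code> registers used by AVX-512. Treat this as a rough check: a match does not prove the instruction always runs, since libraries with runtime dispatch also contain such code</li>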
</ul>]]></content><author><name></name></author><category term="Web App for Containers" /><category term="Troubleshooting" /><category term="Linux" /><category term="Web App for Containers" /><category term="Docker" /><category term="Troubleshooting" /><category term="Linux" /><summary type="html"><![CDATA[This post will cover containers crashing with exit code 132 (SIGILL - Illegal Instruction) on Web App for Containers, typically caused by CPU architecture mismatches between Intel and AMD workers.]]></summary></entry><entry><title type="html">How to Reset WordPress Admin Password on Azure App Service Using SSH and WP-CLI</title><link href="https://azureossd.github.io/2026/05/07/How-to-Reset-WordPress-Admin-Password-on-Azure-App-Service-Using-SSH-and-WP-CLI/index.html" rel="alternate" type="text/html" title="How to Reset WordPress Admin Password on Azure App Service Using SSH and WP-CLI" /><published>2026-05-07T00:00:00+00:00</published><updated>2026-05-07T00:00:00+00:00</updated><id>https://azureossd.github.io/2026/05/07/How-to-Reset-WordPress-Admin-Password-on-Azure-App-Service-Using-SSH-and-WP-CLI/How-to-Reset-WordPress-Admin-Password-on-Azure-App-Service-Using-SSH-and-WP-CLI</id><content type="html" xml:base="https://azureossd.github.io/2026/05/07/How-to-Reset-WordPress-Admin-Password-on-Azure-App-Service-Using-SSH-and-WP-CLI/index.html"><![CDATA[<h2 id="overview">Overview</h2>
<p>This article provides the steps to reset a WordPress admin password for a site hosted on <strong>Azure App Service</strong> using <strong>SSH and WP-CLI</strong>.</p>

<p>This method is particularly useful when access to the WordPress admin portal (<code class="language-plaintext highlighter-rouge">/wp-admin</code>) is not available, or you do not have access to the MySQL database.</p>

<h2 id="scope">Scope</h2>
<ul>
  <li>Azure App Service (Linux) hosting WordPress</li>
  <li>WordPress deployments with WP-CLI available</li>
  <li>Scenarios where password reset via email or UI is not possible</li>
</ul>

<h2 id="prerequisites">Prerequisites</h2>
<ul>
  <li>Access to the <strong>Azure Portal</strong></li>
  <li>Sufficient permissions to the target Web App</li>
  <li>SSH access enabled on the App Service</li>
  <li>Basic familiarity with command-line usage</li>
</ul>

<h2 id="procedure">Procedure</h2>

<h3 id="1-access-azure-portal">1. Access Azure Portal</h3>
<ul>
  <li>Sign in to the Azure Portal: https://portal.azure.com</li>
  <li>Navigate to your <strong>WordPress Web App</strong></li>
</ul>

<h3 id="2-open-ssh-console">2. Open SSH Console</h3>
<ul>
  <li>In the left-hand menu, select:<br />
<strong>Development Tools → SSH</strong></li>
  <li>Click <strong>Go</strong> to open the SSH session</li>
</ul>

<h3 id="3-navigate-to-wordpress-root-directory">3. Navigate to WordPress Root Directory</h3>
<p>Run the following command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /home/site/wwwroot
</code></pre></div></div>

<h3 id="4-list-wordpress-users">4. List WordPress Users</h3>
<p>To identify the user account:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wp user list <span class="nt">--allow-root</span>
</code></pre></div></div>
<p>This will display:</p>
<ul>
  <li>User ID</li>
  <li>Username (user_login)</li>
  <li>Display name</li>
  <li>Email (user_email)</li>
  <li>Date of registration (user_registered)</li>
  <li>Roles</li>
</ul>

<p><img src="/media/2026/05/List_User_Result.png" alt="List User Result" /></p>

<h3 id="5-reset-the-password">5. Reset the Password</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wp user update &lt;USER_ID&gt; <span class="nt">--user_pass</span><span class="o">=</span><span class="s1">'&lt;NEW_PASSWORD&gt;'</span> <span class="nt">--allow-root</span>
</code></pre></div></div>

<p>Example:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wp user update 1 <span class="nt">--user_pass</span><span class="o">=</span><span class="s1">'P@ssw0rd123!'</span> <span class="nt">--allow-root</span>
</code></pre></div></div>
<p><img src="/media/2026/05/Reset_Password.png" alt="Reset Password Result" /></p>

<h2 id="expected-result">Expected Result</h2>

<ul>
  <li>The password is updated immediately</li>
  <li>No restart of the App Service is required</li>
</ul>

<h2 id="verification-steps">Verification Steps</h2>

<ol>
  <li>Navigate to:
    <pre><code class="language-code">https://&lt;your-site&gt;/wp-admin
</code></pre>
  </li>
  <li>Log in using:
    <ul>
      <li>Username</li>
      <li>New password</li>
    </ul>
  </li>
</ol>

<h2 id="important-notes">Important Notes</h2>

<ul>
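  <li>The <code class="language-plaintext highlighter-rouge">--allow-root</code> flag is required in Azure App Service environments, because the SSH session runs as the root user</li>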
  <li>Always use a strong and secure password:
    <ul>
      <li>At least 12 characters recommended</li>
      <li>Combination of uppercase, lowercase, numbers, and symbols</li>
    </ul>
  </li>
  <li>Avoid sharing credentials in plain text</li>
</ul>
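
<p>One way to follow the password guidance above is to generate the password in the SSH session instead of typing one by hand. The sketch below is an assumption-laden example, not part of WP-CLI: the function name <code class="language-plaintext highlighter-rouge">generate_password</code> and the chosen character set are hypothetical, and the result is not guaranteed to contain every character class.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Generate a random password of the requested length (default: 20 characters).
# generate_password is a hypothetical helper name used for illustration.
generate_password() {
  length="${1:-20}"
  # Read random bytes, keep only the allowed characters, then trim to length.
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9@%_+=' | head -c "$length"
}

NEW_PASSWORD="$(generate_password 20)"
echo "Generated a password with ${#NEW_PASSWORD} characters"

# The value can then be passed to WP-CLI as in step 5, for example:
# wp user update 1 --user_pass="$NEW_PASSWORD" --allow-root
</code></pre></div></div>

<p>Store the generated value in a password manager or secret store rather than in chat messages or tickets.</p>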

<h2 id="troubleshooting">Troubleshooting</h2>
<h3 id="issue-wp-command-not-found">Issue: wp: command not found</h3>
<ul>
  <li>WP-CLI may not be installed or available in the environment</li>
  <li>Verify your WordPress image or installation</li>
</ul>

<h3 id="issue-permission-errors">Issue: Permission errors</h3>
<ul>
  <li>Ensure you are in the correct directory:
    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /home/site/wwwroot
</code></pre></div>    </div>
  </li>
</ul>

<h3 id="issue-unable-to-identify-correct-user">Issue: Unable to identify correct user</h3>
<ul>
  <li>Re-run:
    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wp user list <span class="nt">--allow-root</span>
</code></pre></div>    </div>
  </li>
  <li>Verify the correct user ID before updating</li>
</ul>]]></content><author><name></name></author><category term="Azure App Service on Linux" /><category term="PHP" /><category term="WordPress" /><category term="MySQL" /><category term="How-To" /><category term="Web App(Linux)" /><category term="PHP" /><category term="WordPress" /><summary type="html"><![CDATA[Overview This article provides the steps to reset a WordPress admin password for a site hosted on Azure App Service using SSH and WP-CLI.]]></summary></entry><entry><title type="html">Overview of Azure Load Testing Service</title><link href="https://azureossd.github.io/2026/04/02/Azure-Load-Testing-overview/index.html" rel="alternate" type="text/html" title="Overview of Azure Load Testing Service" /><published>2026-04-02T12:00:00+00:00</published><updated>2026-04-02T12:00:00+00:00</updated><id>https://azureossd.github.io/2026/04/02/Azure-Load-Testing-overview/Azure-Load-Testing-overview</id><content type="html" xml:base="https://azureossd.github.io/2026/04/02/Azure-Load-Testing-overview/index.html"><![CDATA[<h2 id="this-blog-provides-a-quick-overview-of-the-azure-load-testing-service">This blog provides a quick overview of the Azure Load testing service</h2>

<h2 id="overview">Overview</h2>
<p>This post provides a practical overview of load testing applications deployed on Azure PaaS services such as Azure App Service and Azure Container Apps.</p>

<p>Azure Load Testing is a fully managed service for testing and evaluating application performance. It is especially valuable before production deployments because it helps predict application behavior at scale and under varying traffic patterns.</p>

<p>The following are the core components of the service:</p>
<ol>
  <li>Test - The overall configuration of the test including endpoints, rules, and metrics.</li>
  <li>Test run - One execution of a test.</li>
  <li>Test engine - Managed compute that generates traffic and executes test runs. You can scale tests by increasing engine instances and users.</li>
  <li>App components - The resources to monitor during load (for example CPU, memory, latency, and HTTP failures).</li>
  <li>Engine instances - Up to a maximum of 400 per test, with a maximum of 1000 concurrent instances across tests (subject to change).</li>
  <li>Users - In the Azure portal, users per engine instance are limited (up to 250, subject to change). This can be customized further in JMX-based tests.</li>
</ol>

<h2 id="jmeter">JMeter</h2>
<p>Azure load tests are executed by Apache JMeter under the hood. More information on JMeter is available <a href="https://jmeter.apache.org/">here</a>. You can upload an existing JMeter script (<code class="language-plaintext highlighter-rouge">.jmx</code>) to the test engine. Azure Load Testing users are analogous to JMeter threads, and Azure engine instances are similar to JMeter nodes. Unlike standalone JMeter, Azure Load Testing includes built-in Azure Monitor integration, which makes it easier to benchmark Azure resource metrics without additional plugins, integrations, or credential setup. JMeter still offers advanced customizations that are not exposed directly through the Azure Load Testing UI or API.</p>

<h2 id="test-types-url-vs-jmx">Test Types: URL vs JMX</h2>
<p>Before proceeding, it helps to understand the two common test types.</p>

<p><code class="language-plaintext highlighter-rouge">URL</code> tests are lightweight and can be configured directly in the portal or by using a JSON request file (for example <code class="language-plaintext highlighter-rouge">requests.json</code>) to define one or more HTTP requests. They are useful for quick API checks.</p>

<p><code class="language-plaintext highlighter-rouge">JMX</code> tests use a full Apache JMeter test plan (<code class="language-plaintext highlighter-rouge">.jmx</code>). Choose this when you need advanced behavior such as authentication flows, parameterization, and CSV data-driven testing.</p>

<p>In practice, start with <code class="language-plaintext highlighter-rouge">URL</code> for fast validation and move to <code class="language-plaintext highlighter-rouge">JMX</code> as your scenario complexity grows.</p>

<h2 id="creating-tests-in-portal-vs-azure-cli">Creating Tests in Portal vs Azure CLI</h2>
<p>You can create and run tests from either the Azure portal or Azure CLI. The CLI option is better for repeatable workflows, source control, and CI/CD automation.</p>

<p>Below are corresponding Test Definition samples.</p>

<h2 id="sample-yaml-for-jmx-deployment">Sample YAML for JMX deployment</h2>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s">v0.1</span>
<span class="na">testId</span><span class="pi">:</span> <span class="s">YourUniqueTestID</span>
<span class="na">displayName</span><span class="pi">:</span> <span class="s">Your Readable Test Name</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Load test website home page</span>
<span class="na">testPlan</span><span class="pi">:</span> <span class="s">YourTestPlan.jmx</span>
<span class="na">testType</span><span class="pi">:</span> <span class="s">JMX</span> <span class="c1">#URL is the other type</span>
<span class="na">engineInstances</span><span class="pi">:</span> <span class="m">1</span>
<span class="na">subnetId</span><span class="pi">:</span> <span class="s">/subscriptions/&lt;subid&gt;/resourceGroups/rgname/providers/Microsoft.Network/virtualNetworks/vnetname/subnets/subnetid</span> <span class="c1">#optional</span>
<span class="na">configurationFiles</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s1">'</span><span class="s">sampledata.csv'</span>
<span class="na">zipArtifacts</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">largedata.zip</span> 
<span class="c1">#Large configuration/data files under 50MB, up to 5 zip files - only for JMX</span>
<span class="na">splitAllCSVs</span><span class="pi">:</span> <span class="no">true</span> 
<span class="c1">#Splits CSV files per engine. Useful to split test data across engines or to avoid collisions such as with logins</span>
<span class="na">failureCriteria</span><span class="pi">:</span> 
<span class="c1">#examples</span>
  <span class="pi">-</span> <span class="s">avg(response_time_ms) &gt; </span><span class="m">300</span>
  <span class="pi">-</span> <span class="s">percentage(error) &gt; </span><span class="m">50</span>
  <span class="pi">-</span> <span class="na">YourJMeterSampler</span><span class="pi">:</span> <span class="s">avg(latency) &gt; </span><span class="m">200</span>
<span class="na">autoStop</span><span class="pi">:</span>
  <span class="na">errorPercentage</span><span class="pi">:</span> <span class="m">80</span>
  <span class="na">timeWindow</span><span class="pi">:</span> <span class="m">60</span> <span class="c1">#seconds</span>
<span class="na">env</span><span class="pi">:</span> 
<span class="c1">#Env variable referenced by the script</span>
    <span class="na">BASE_URL</span><span class="pi">:</span> <span class="s">https://app.yourdomain.com</span>  
    <span class="na">ENVIRONMENT</span><span class="pi">:</span> <span class="s">prod</span>

</code></pre></div></div>

<h2 id="sample-yaml-for-url-type-deployment">Sample YAML for URL type deployment</h2>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s">v0.1</span>
<span class="na">testId</span><span class="pi">:</span> <span class="s">YourUniqueTestID</span>
<span class="na">displayName</span><span class="pi">:</span> <span class="s">Your Readable Test Name</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Simple URL load tests</span>
<span class="na">testType</span><span class="pi">:</span> <span class="s">URL</span>
<span class="na">testPlan</span><span class="pi">:</span> <span class="s">requests.json</span>
<span class="na">engineInstances</span><span class="pi">:</span> <span class="m">2</span>
<span class="na">failureCriteria</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">avg(response_time_ms) &gt; </span><span class="m">500</span>
  <span class="pi">-</span> <span class="s">percentage(error) &gt; </span><span class="m">5</span>
<span class="na">autoStop</span><span class="pi">:</span>
  <span class="na">errorPercentage</span><span class="pi">:</span> <span class="m">20</span>
  <span class="na">timeWindow</span><span class="pi">:</span> <span class="m">60</span> <span class="c1">#seconds</span>

</code></pre></div></div>

<h2 id="sample-json-required-configuration-for-url-type-test">Sample JSON (required) configuration for URL type test</h2>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"requests"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"requestName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HomePage"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"GET"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://yourdomain.com/"</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="p">{</span><span class="w">
      </span><span class="nl">"requestName"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HealthCheck"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"method"</span><span class="p">:</span><span class="w"> </span><span class="s2">"GET"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://yourdomain.com/health"</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<h2 id="configuring-and-executing-tests-from-the-azure-portal">Configuring and executing tests from the Azure Portal</h2>
<p>The first step is to create the resources, which can be done quickly from the portal. Create an Azure Load Testing resource, select <code class="language-plaintext highlighter-rouge">Tests</code>, then choose <code class="language-plaintext highlighter-rouge">Create</code> and follow the guided options. The following areas are the most important to configure.</p>
<ol>
  <li>
    <p>Load<br />
Select the number of engine instances and the type of load pattern. Consider creating a separate test for each load pattern (Linear, Spike, and Step) so that the tests reflect your production scenarios and use cases.</p>
    <ul>
      <li>Linear: traffic increases over time. Allow an initial ramp-up time.</li>
      <li>Spike: traffic rises rapidly to simulate seasonal or occasional spikes.</li>
      <li>Step: traffic increases in defined plateaus.</li>
    </ul>
    <p>Then select the number of concurrent users per engine (on the portal, this is commonly limited to 250 users per engine instance), the test duration, and the ramp-up time. More information on limits is available <a href="https://learn.microsoft.com/en-us/azure/app-testing/load-testing/resource-limits-quotas-capacity">here</a>.</p>
  </li>
  <li>
    <p>Monitoring<br />
Attach the key Azure dependencies being tested, such as App Service and Azure databases. Otherwise, testing is limited to client-side metrics. It is recommended to add relevant components such as App Service, App Service Plan, upstream dependencies, and Application Insights.</p>
  </li>
  <li>
    <p>Test criteria<br />
Specify test criteria for client-side metrics as well as Azure resource metrics. For client-side analysis, useful metrics include response times, latency, and errors. For server-side analysis, useful metrics include CPU, memory, scaling events, requests, and responses.</p>
  </li>
</ol>

<p>Additionally, if your workload is private, configure subnet/VNet integration and validate DNS/routing before runs.</p>

<p>As an example, to simulate 5,000 users, you could configure 250 virtual users per engine instance with 20 engine instances (250 × 20 = 5,000), set the ramp-up time to one minute, and run separate tests for linear, spike, and step scenarios.</p>
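
<p>The sizing arithmetic can be sketched in shell as follows. The function name <code class="language-plaintext highlighter-rouge">engines_needed</code> is hypothetical, and the 250 users per engine value is the portal limit mentioned above, which is subject to change.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Number of engine instances needed for a target number of virtual users,
# rounded up so the target is always covered.
# engines_needed is a hypothetical helper name used for illustration.
engines_needed() {
  target_users="$1"
  users_per_engine="$2"
  echo $(( (target_users + users_per_engine - 1) / users_per_engine ))
}

engines_needed 5000 250   # prints 20
engines_needed 5001 250   # prints 21
</code></pre></div></div>

<p>The result maps to the <code class="language-plaintext highlighter-rouge">engineInstances</code> value in the test configuration YAML.</p>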

<p>After configuring the test, review the resources and settings, start the run, monitor live metrics, and then save and compare outcomes across runs.</p>

<h2 id="configuring-and-executing-tests-with-azure-cli-and-github-actions">Configuring and executing tests with Azure CLI and GitHub Actions</h2>
<p>Azure CLI is the recommended path for repeatability, source control, and CI/CD integration. In a typical operating model, you keep YAML/JMX/JSON/CSV files in the repository, create or update tests from YAML, and trigger runs in release stages. This provides traceable configuration history and consistent execution across environments.</p>

<h3 id="example-invocation-with-azure-cli">Example invocation with Azure CLI</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Install the Azure Load Testing CLI extension if needed</span>
az extension add <span class="nt">--name</span> load <span class="nt">--upgrade</span>
<span class="c"># Create or update a test from YAML</span>
az load <span class="nb">test </span>create <span class="se">\</span>
  <span class="nt">--load-test-resource</span> &lt;your-load-test-resource-name&gt; <span class="se">\</span>
  <span class="nt">--resource-group</span> &lt;your-load-test-rg&gt; <span class="se">\</span>
  <span class="nt">--test-id</span> &lt;test-id&gt; <span class="se">\</span>
  <span class="nt">--load-test-config-file</span> loadtest-config.yaml
<span class="c"># Start a test run</span>
az load test-run create <span class="se">\</span>
  <span class="nt">--load-test-resource</span> &lt;your-load-test-resource-name&gt; <span class="se">\</span>
  <span class="nt">--resource-group</span> &lt;your-load-test-rg&gt; <span class="se">\</span>
  <span class="nt">--test-id</span> &lt;test-id&gt; <span class="se">\</span>
  <span class="nt">--test-run-id</span> &lt;test-run-id&gt;
</code></pre></div></div>

<p>This can be automated with GitHub Actions as part of pull request or release workflows. The pipeline authenticates to Azure, runs the test with the repository configuration, and publishes result files as build artifacts for auditing and comparison.</p>

<h3 id="example-workflow-with-github-actions">Example workflow with GitHub Actions</h3>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">your-load-test-name</span>
<span class="na">on</span><span class="pi">:</span>
  <span class="na">workflow_dispatch</span><span class="pi">:</span>
  <span class="na">push</span><span class="pi">:</span>
    <span class="na">branches</span><span class="pi">:</span> <span class="pi">[</span> <span class="s2">"</span><span class="s">main"</span> <span class="pi">]</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">run-load-test</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Checkout</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v4</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Azure Login</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">azure/login@v2</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">creds</span><span class="pi">:</span> <span class="s">$</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Run Azure Load Testing</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">azure/load-testing@v1</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">loadTestConfigFile</span><span class="pi">:</span> <span class="s">loadtest-config.yaml</span>
          <span class="na">loadTestResource</span><span class="pi">:</span> <span class="s">&lt;your-load-test-resource-name&gt;</span>
          <span class="na">resourceGroup</span><span class="pi">:</span> <span class="s">&lt;your-load-test-rg&gt;</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Publish results artifact</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/upload-artifact@v4</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">name</span><span class="pi">:</span> <span class="s">load-test-results</span>
          <span class="na">path</span><span class="pi">:</span> <span class="pi">|</span>
            <span class="s">loadTest/*.json</span>
            <span class="s">loadTest/*.csv</span>
</code></pre></div></div>

<p>An example approach would be to run short tests on every pull request and more comprehensive tests before releases. Save the test results so you can compare runs and review trends over time against application releases.</p>]]></content><author><name></name></author><category term="Azure App Service Linux" /><category term="Azure Container App" /><category term="Web App for Containers" /><category term="Other" /><category term="Performance, How-to" /><category term="Load testing" /><category term="Performance tuning" /><category term="Scaling" /><summary type="html"><![CDATA[This blog provides a quick overview of the Azure Load testing service]]></summary></entry><entry><title type="html">Basic Network Troubleshooting in Linux</title><link href="https://azureossd.github.io/2026/03/12/Basic-Network-Troubleshooting-in-Linux/index.html" rel="alternate" type="text/html" title="Basic Network Troubleshooting in Linux" /><published>2026-03-12T12:00:00+00:00</published><updated>2026-03-12T12:00:00+00:00</updated><id>https://azureossd.github.io/2026/03/12/Basic-Network-Troubleshooting-in-Linux/Basic-Network-Troubleshooting-in-Linux</id><content type="html" xml:base="https://azureossd.github.io/2026/03/12/Basic-Network-Troubleshooting-in-Linux/index.html"><![CDATA[<p>This blog covers basic Linux network troubleshooting with core open source tools and a handy <a href="https://github.com/azureossd/networking-troubleshooting-utility" title="Go to script">automated troubleshooting script</a>.</p>

<p><strong>Readme</strong>: <a href="https://github.com/azureossd/networking-troubleshooting-utility/blob/main/README.md">https://github.com/azureossd/networking-troubleshooting-utility/blob/main/README.md</a></p>

<h2 id="overview">Overview</h2>
<p>Outbound connectivity issues in cloud environments can stem from many causes and manifest in different ways. Applications may experience intermittent or persistent failures when reaching a specific host or API. Sometimes only one external endpoint is affected while others remain accessible, and in other cases all outbound traffic fails.</p>

<p>Common causes include DNS resolution errors, routing or firewall misconfigurations, upstream service failures, and platform constraints such as SNAT port limits in Azure App Services. These problems typically appear as latency, timeouts, or refused connections, often surfacing as HTTP 5xx errors downstream or at the client.</p>

<p>Effective troubleshooting requires isolating the failure domain, whether DNS, network path, platform limits, or application behavior. The following tools and approaches can help systematically diagnose and resolve these issues.</p>

<h2 id="environment-setup-and-prerequisites">Environment Setup and Prerequisites</h2>
<p>Before troubleshooting, confirm which tools are available. Managed runtimes (Azure App Service, Container Apps, etc.) and IaaS services (e.g., Azure VMs) may not include all utilities by default.</p>

<p><strong>In Azure App Service (Kudu SSH console):</strong> <code class="language-plaintext highlighter-rouge">curl</code>, <code class="language-plaintext highlighter-rouge">dig</code>, <code class="language-plaintext highlighter-rouge">nc</code>, and <code class="language-plaintext highlighter-rouge">tcpdump</code> are pre-installed in the sandbox. <code class="language-plaintext highlighter-rouge">nmap</code>, <code class="language-plaintext highlighter-rouge">tshark</code>, and <code class="language-plaintext highlighter-rouge">zeek</code> are absent and must be side-loaded as static binaries into <code class="language-plaintext highlighter-rouge">/home</code> (persistent storage) if needed. Note that as a non-root user, you cannot install these tools in the Kudu container. Additionally, the runtime container must be up and running before tools can be installed in it, and with custom Docker containers on App Service, SSH must be enabled. Refer <a href="https://azureossd.github.io/2022/04/27/2022-Enabling-SSH-on-Linux-Web-App-for-Containers/">here</a> for steps to enable SSH.</p>

<p>The loop below is a quick test to see which common tools are already installed.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Inventory what is available</span>
<span class="k">for </span>tool <span class="k">in </span>curl wget dig nslookup nc nmap tcpdump tshark zeek ss netstat ip traceroute mtr iftop tcpping<span class="p">;</span> <span class="k">do
    </span><span class="nb">command</span> <span class="nt">-v</span> <span class="s2">"</span><span class="nv">$tool</span><span class="s2">"</span> &amp;&gt;/dev/null <span class="o">&amp;&amp;</span> <span class="nb">echo</span> <span class="s2">"✓  </span><span class="nv">$tool</span><span class="s2">"</span> <span class="o">||</span> <span class="nb">echo</span> <span class="s2">"✗  </span><span class="nv">$tool</span><span class="s2"> (not found)"</span>
<span class="k">done</span>
</code></pre></div></div>

<h2 id="dns">DNS</h2>
<p>One of the first steps in troubleshooting upstream connectivity failures is validating name resolution. In Linux based environments, it is important to review /etc/hosts and /etc/resolv.conf, as these files may contain custom entries. Always validate DNS before moving to reachability tests.</p>

<p>When an application, a command-line tool such as curl, a browser, or a runtime resolves a hostname, it invokes the operating system resolver. The lookup order is controlled by <code class="language-plaintext highlighter-rouge">/etc/nsswitch.conf</code>: typically <code class="language-plaintext highlighter-rouge">files</code> first (that is, /etc/hosts), then <code class="language-plaintext highlighter-rouge">dns</code> (the DNS servers defined in /etc/resolv.conf).</p>
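
<p>To confirm the order on a given system, the <code class="language-plaintext highlighter-rouge">hosts:</code> line can be read directly. A minimal sketch, assuming a glibc-based image that ships <code class="language-plaintext highlighter-rouge">/etc/nsswitch.conf</code>; <code class="language-plaintext highlighter-rouge">resolver_order</code> is a hypothetical helper name:</p>

```shell
# Hypothetical helper: print the lookup sources configured for hostnames.
# $1 = path to an nsswitch file (defaults to /etc/nsswitch.conf).
resolver_order() {
    awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print }' "${1:-/etc/nsswitch.conf}"
}

# Usage on a live system:
# resolver_order                  # for example: files dns
# getent hosts microsoft.com      # resolves through NSS, so /etc/hosts entries apply
```

<p>Unlike <code class="language-plaintext highlighter-rouge">getent</code>, tools such as <code class="language-plaintext highlighter-rouge">dig</code> query DNS servers directly and do not consult /etc/hosts, so comparing the two can reveal a stale hosts file entry.</p>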

<p>Example usage of two common tools, dig and nslookup, is shown below.</p>

<h3 id="dig"><code class="language-plaintext highlighter-rouge">dig</code></h3>
<p><strong>Basic resolution:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dig microsoft.com
</code></pre></div></div>
<p><strong>Query a specific DNS server (bypass the system resolver):</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dig @8.8.8.8        microsoft.com   <span class="c"># Google public DNS</span>
dig @168.63.129.16  microsoft.com   <span class="c"># Azure platform resolver</span>
dig @10.1.0.4       microsoft.com   <span class="c"># An example custom DNS server in your VNet</span>
</code></pre></div></div>
<p><strong>Follow the full delegation chain:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dig +trace microsoft.com
</code></pre></div></div>
<p><strong>Reverse DNS lookup:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dig <span class="nt">-x</span> 20.112.52.29
</code></pre></div></div>
<p><strong>Query specific record types:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dig microsoft.com A       <span class="c"># IPv4</span>
dig microsoft.com CNAME   <span class="c"># Canonical name alias</span>
dig microsoft.com TXT     <span class="c"># SPF, DMARC, ACME challenges</span>
dig microsoft.com NS      <span class="c"># Authoritative nameservers</span>
dig microsoft.com SOA     <span class="c"># Start of Authority</span>
</code></pre></div></div>
<p><strong>Measure DNS query time and check TTL:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dig microsoft.com | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"Query time|ANSWER SECTION|IN</span><span class="se">\s</span><span class="s2">+A"</span>
<span class="c"># ;; Query time: 4 msec</span>
<span class="c"># microsoft.com.  30  IN  A  20.112.52.29</span>
</code></pre></div></div>
<p>A low TTL (30 seconds in this case) means results expire quickly. A cached NXDOMAIN with a TTL of 300 will persist for 5 minutes even after a DNS fix is deployed. Account for this when validating a fix.</p>
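
<p>When validating a fix, it can help to watch only the TTL and address columns. A minimal sketch, where <code class="language-plaintext highlighter-rouge">answer_ttls</code> is a hypothetical helper that reads the output of <code class="language-plaintext highlighter-rouge">dig +noall +answer</code> from standard input:</p>

```shell
# Hypothetical helper: print "TTL address" for each A record in dig's answer output.
answer_ttls() {
    awk '$3 == "IN" && $4 == "A" { print $2, $5 }'
}

# Usage: run twice a few seconds apart. A TTL that counts down between runs
# suggests the answer is being served from a resolver cache.
# dig +noall +answer microsoft.com | answer_ttls
```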

<h3 id="nslookup"><code class="language-plaintext highlighter-rouge">nslookup</code></h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nslookup microsoft.com
nslookup microsoft.com 168.63.129.16   <span class="c"># query a specific server</span>
</code></pre></div></div>

<h2 id="tcp-reachability-and-connectivity-testing">TCP Reachability and Connectivity Testing</h2>
<p>If DNS resolution fails, review /etc/resolv.conf, test with an alternate DNS server, and confirm there are no local blockers such as an incorrect hosts file entry or a corporate proxy interfering with name resolution. Also check custom DNS servers, forwarding, and firewall rules.</p>

<p>Once name resolution succeeds, the next step is to validate reachability. These tests determine whether traffic can physically traverse the network path to the destination over TCP.</p>

<p>While several tools are available for this purpose, utilities such as nc and nmap are widely available for Linux environments and offer many useful capabilities, including port scanning, custom packet testing, and basic data transfer.</p>

<h3 id="nc"><code class="language-plaintext highlighter-rouge">nc</code></h3>
<p><strong>Basic TCP connectivity test:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nc <span class="nt">-zv</span> <span class="nt">-w</span> 5 microsoft.com 443
</code></pre></div></div>
<p><strong>Probe multiple ports in a single command:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nc <span class="nt">-zv</span> microsoft.com 80 443 8080
</code></pre></div></div>
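
<p>Not every netcat variant accepts several port arguments (Ncat, for example, takes a single port), so a loop is a more portable way to probe a list. A minimal sketch, where <code class="language-plaintext highlighter-rouge">probe_ports</code> is a hypothetical helper name and the 5-second timeout is an arbitrary choice:</p>

```shell
# Hypothetical helper: probe each listed TCP port on a host, one nc call per port.
# $1 = host, remaining arguments = ports.
probe_ports() {
    host=$1
    shift
    for port in "$@"; do
        if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
            echo "open    $host:$port"
        else
            echo "failed  $host:$port"
        fi
    done
}

# Usage:
# probe_ports microsoft.com 80 443 8080
```

<p>A "failed" result here does not distinguish a refused connection from a timeout; <code class="language-plaintext highlighter-rouge">nmap</code> is better suited for that distinction.</p>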
<p><strong>Send a raw HTTP request and inspect the response:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">printf</span> <span class="s2">"GET / HTTP/1.0</span><span class="se">\r\n</span><span class="s2">Host: microsoft.com</span><span class="se">\r\n\r\n</span><span class="s2">"</span> | nc microsoft.com 80

</code></pre></div></div>
<h3 id="nmap-port-and-service-discovery"><code class="language-plaintext highlighter-rouge">nmap</code> Port and Service Discovery</h3>
<p><code class="language-plaintext highlighter-rouge">nmap</code> provides richer results than <code class="language-plaintext highlighter-rouge">nc</code>, including service identification and the distinction between a filtered (firewall dropped) port and a closed (RST received or unavailable) port.
<strong>Single port test:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nmap <span class="nt">-p</span> 443 microsoft.com
</code></pre></div></div>
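<p><strong>TCP SYN scan (requires root privileges; use <code class="language-plaintext highlighter-rouge">-sT</code> for an unprivileged connect scan):</strong></p>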
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nmap <span class="nt">-sS</span> <span class="nt">-p</span> 80,443 microsoft.com
</code></pre></div></div>
<p><strong>Service and version detection:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nmap <span class="nt">-sV</span> <span class="nt">-p</span> 443 microsoft.com
<span class="c"># 443/tcp open  ssl/http Microsoft IIS httpd</span>
</code></pre></div></div>
<p><strong>Show why each port has a given state (filtered vs. closed):</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nmap <span class="nt">-p</span> 443 <span class="nt">--reason</span> microsoft.com
<span class="c"># 443/tcp open      syn-ack    — port open, accepting connections</span>
<span class="c"># 443/tcp filtered  no-response — firewall/NSG silently dropping packets</span>
<span class="c"># 443/tcp closed    reset       — host reachable, nothing listening on that port</span>
</code></pre></div></div>

<h3 id="curl"><code class="language-plaintext highlighter-rouge">curl</code></h3>
<p><code class="language-plaintext highlighter-rouge">curl</code> is a useful tool for end-to-end network validation as well as for downloading packages or artifacts.
<strong>Simple request with granular timing breakdown:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-o</span> /dev/null <span class="nt">-s</span> <span class="nt">-w</span> <span class="se">\</span>
<span class="s2">"DNS lookup:    %{time_namelookup}s</span><span class="se">\n\</span><span class="s2">
TCP connect:   %{time_connect}s</span><span class="se">\n\</span><span class="s2">
TLS handshake: %{time_appconnect}s</span><span class="se">\n\</span><span class="s2">
TTFB:          %{time_starttransfer}s</span><span class="se">\n\</span><span class="s2">
Total:         %{time_total}s</span><span class="se">\n\</span><span class="s2">
HTTP status:   %{http_code}</span><span class="se">\n</span><span class="s2">"</span> <span class="se">\</span>
https://microsoft.com
</code></pre></div></div>
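
<p>Note that curl reports each of these timers as cumulative time since the start of the request, so the per-phase cost is the difference between consecutive values. A minimal sketch of that subtraction, where <code class="language-plaintext highlighter-rouge">phase_durations</code> is a hypothetical helper name and the request is assumed to be HTTPS (for plain HTTP, <code class="language-plaintext highlighter-rouge">time_appconnect</code> is 0 and the TLS and server figures are not meaningful):</p>

```shell
# Hypothetical helper: convert curl's cumulative timers into per-phase durations.
# Arguments, in order: time_namelookup time_connect time_appconnect
#                      time_starttransfer time_total
phase_durations() {
    awk -v dns="$1" -v tcp="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
        printf "dns=%.3f tcp=%.3f tls=%.3f server=%.3f transfer=%.3f\n",
            dns, tcp - dns, tls - tcp, ttfb - tls, total - ttfb
    }'
}

# Usage:
# phase_durations $(curl -o /dev/null -s -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}' https://microsoft.com)
```

<p>A large <code class="language-plaintext highlighter-rouge">dns</code> value points at name resolution, a large <code class="language-plaintext highlighter-rouge">tcp</code> or <code class="language-plaintext highlighter-rouge">tls</code> value at the network path or handshake, and a large <code class="language-plaintext highlighter-rouge">server</code> value at the upstream application.</p>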
<p><strong>Verbose output, including TLS handshake and server certificate details:</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-vvv</span> https://microsoft.com 2&gt;&amp;1 | <span class="nb">head</span> <span class="nt">-100</span>
</code></pre></div></div>
<p><strong>Test with internal or custom CA certificate</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">--cacert</span> /etc/ssl/certs/custom-ca.crt <span class="se">\</span>
  https://custom-api.mycompany.com
</code></pre></div></div>

<h3 id="ss-or-netstat-note-that-ss-is-a-newer-replacement"><code class="language-plaintext highlighter-rouge">ss</code> or <code class="language-plaintext highlighter-rouge">netstat</code>. Note that <code class="language-plaintext highlighter-rouge">ss</code> is a newer replacement.</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Show all connections</span>
netstat <span class="nt">-tunp</span>
<span class="c"># Connections to a specific remote host</span>
ss <span class="nt">-tnp</span> dst microsoft.com
<span class="c"># Show details including MTU and MSS for active connections</span>
ss <span class="nt">-tin</span>

</code></pre></div></div>
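
<p>When investigating platform limits such as SNAT port exhaustion, a per-state connection count is often more useful than the raw list. A minimal sketch, where <code class="language-plaintext highlighter-rouge">count_by_state</code> is a hypothetical helper that reads <code class="language-plaintext highlighter-rouge">ss -tan</code> output from standard input and skips its header line:</p>

```shell
# Hypothetical helper: count TCP connections per state from `ss -tan` output.
count_by_state() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' | sort -k2
}

# Usage:
# ss -tan | count_by_state
```

<p>A high and growing number of connections in states such as TIME-WAIT or SYN-SENT toward the same destination is a common hint that connections are not being reused.</p>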

<h3 id="check-interface-mtu">Check Interface MTU</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Show all interfaces and their MTU</span>
ip <span class="nb">link </span>show
<span class="c"># Single interface</span>
ip <span class="nb">link </span>show eth0 | <span class="nb">grep </span>mtu
</code></pre></div></div>
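
<p>To compare MTU values across several interfaces at a glance, the one-line-per-interface output of <code class="language-plaintext highlighter-rouge">ip -o link show</code> can be reduced to name and MTU pairs. A minimal sketch, where <code class="language-plaintext highlighter-rouge">iface_mtus</code> is a hypothetical helper name:</p>

```shell
# Hypothetical helper: print "interface mtu" pairs from `ip -o link show` output.
iface_mtus() {
    awk 'match($0, /mtu [0-9]+/) {
        name = $2
        sub(/:$/, "", name)
        print name, substr($0, RSTART + 4, RLENGTH - 4)
    }'
}

# Usage:
# ip -o link show | iface_mtus
```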

<h2 id="other-utilities">Other utilities</h2>
<p>Additional tools are available to quickly troubleshoot connectivity, review bandwidth usage, latency, and overall network performance. Common examples include iftop, iptraf-ng, and nethogs, all of which provide real time visibility into network activity on Linux systems.</p>

<p>After installation, these utilities can be launched directly from the console to display active connections in real time.</p>

<h3 id="iftop"><code class="language-plaintext highlighter-rouge">iftop</code></h3>
<p><code class="language-plaintext highlighter-rouge">iftop</code> launches an interactive text UI that shows live traffic, which is useful for viewing active outbound connections and their bandwidth usage.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>iftop <span class="nt">-i</span> eth0             <span class="c"># live bandwidth by connection pair</span>
iftop <span class="nt">-i</span> eth0 <span class="nt">-f</span> <span class="s2">"port 443"</span>   <span class="c"># filter to HTTPS only</span>
</code></pre></div></div>

<h3 id="nethogs"><code class="language-plaintext highlighter-rouge">nethogs</code></h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nethogs eth0              <span class="c"># bandwidth by PID</span>
nethogs <span class="nt">-d</span> 2 eth0         <span class="c"># refresh every 2 seconds</span>
</code></pre></div></div>

<h3 id="iptraf-ng-ui-based-tool-similar-to-iftop-nad-nethogs-with-additional-utility"><code class="language-plaintext highlighter-rouge">iptraf-ng</code>: UI-based tool similar to <code class="language-plaintext highlighter-rouge">iftop</code> and <code class="language-plaintext highlighter-rouge">nethogs</code> with additional utility</h3>
<p><strong>Example view below showing outbound connections</strong>
<img src="/media/2026/03/iptraf-ng2.png" alt="iptraf-ng view" /></p>

<h2 id="capturing-network-traces">Capturing Network traces</h2>
<p>If reachability or connectivity tests fail, or if the issues are intermittent, capturing and analyzing a network trace is the next step. tcpdump is a powerful tool for this purpose, allowing you to record traffic that can be analyzed directly in the CLI using tcpdump or tshark, or externally with tools like Wireshark.</p>

<h3 id="full-capture-with-tcdump">Full Capture with tcpdump</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tcpdump <span class="nt">-i</span> any <span class="nt">-s</span> 0 <span class="nt">-tttt</span> <span class="nt">-U</span> <span class="nt">-nn</span> <span class="nt">-w</span> trace.pcap
</code></pre></div></div>
<p>This command captures full packets (-s 0) on all interfaces (-i any), avoids resolving hostnames and ports (-nn), and flushes each packet to the specified file (-w) as soon as it is captured (-U). The timestamp option (-tttt) only affects printed output; the capture file stores packet timestamps regardless. Verbose mode (-vv) is not needed when writing to a file.</p>

<p>Filters can be applied during capture, often to reduce noise. However, it may be beneficial to capture all traffic on primary (or all) interfaces and apply filters later during analysis.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Rolling capture with 100 MB file rotation, keeping 5 files</span>
tcpdump <span class="nt">-i</span> any <span class="nt">-s</span> 0 <span class="nt">-nn</span> <span class="nt">-w</span> trace.pcap <span class="nt">-C</span> 100 <span class="nt">-W</span> 5

<span class="c"># Traffic to or from a specific host</span>
tcpdump <span class="nt">-i</span> any <span class="nt">-s</span> 0 <span class="nt">-nn</span> <span class="nt">-w</span> trace.pcap host microsoft.com

<span class="c"># Traffic on a specific port</span>
tcpdump <span class="nt">-i</span> any <span class="nt">-s</span> 0 <span class="nt">-nn</span> <span class="nt">-w</span> trace.pcap port 443

<span class="c"># Traffic between two specific IP addresses</span>
tcpdump <span class="nt">-i</span> any <span class="nt">-s</span> 0 <span class="nt">-nn</span> <span class="nt">-w</span> trace.pcap <span class="se">\</span>
  <span class="s1">'src host 10.1.0.10 and dst host 20.112.52.29'</span>

<span class="c"># TCP RST and FIN packets only (connection termination events)</span>
tcpdump <span class="nt">-i</span> any <span class="nt">-nn</span> <span class="nt">-s</span> 0 <span class="nt">-w</span> resets.pcap <span class="se">\</span>
  <span class="s1">'tcp[tcpflags] &amp; (tcp-rst|tcp-fin) != 0'</span>

<span class="c"># DNS traffic only</span>
tcpdump <span class="nt">-i</span> any <span class="nt">-s</span> 0 <span class="nt">-nn</span> <span class="nt">-w</span> dns.pcap port 53
</code></pre></div></div>
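
<p>When the same capture is repeated for different targets, the filter expression can be assembled from a host and a port. A minimal sketch, where <code class="language-plaintext highlighter-rouge">capture_filter</code> is a hypothetical helper name and the 60-second <code class="language-plaintext highlighter-rouge">timeout</code> in the usage comment is an arbitrary choice to bound the capture:</p>

```shell
# Hypothetical helper: build a tcpdump filter from an optional host and port.
# $1 = host or IP (may be empty), $2 = port (may be empty).
capture_filter() {
    filter=""
    if [ -n "$1" ]; then
        filter="host $1"
    fi
    if [ -n "$2" ]; then
        if [ -n "$filter" ]; then
            filter="$filter and "
        fi
        filter="${filter}port $2"
    fi
    echo "$filter"
}

# Usage: time-bounded capture limited to one destination and port.
# timeout 60 tcpdump -i any -s 0 -nn -w trace.pcap "$(capture_filter 20.112.52.29 443)"
```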

<h2 id="analyzing-network-trace">Analyzing network trace</h2>
<p>The packet capture file can be analyzed in several ways right away with tcpdump or tshark.</p>

<h3 id="with-tcpdump">With <code class="language-plaintext highlighter-rouge">tcpdump</code></h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Verbose output</span>
tcpdump <span class="nt">-v</span> <span class="nt">-r</span> trace.pcap

<span class="c"># Show ASCII payload (useful for HTTP headers and plain-text protocols)</span>
tcpdump <span class="nt">-v</span> <span class="nt">-A</span> <span class="nt">-r</span> trace.pcap

<span class="c"># Filter to traffic involving microsoft.com's IP</span>
tcpdump <span class="nt">-nn</span> <span class="nt">-r</span> trace.pcap <span class="s1">'host 20.112.52.29 and port 443'</span>

<span class="c"># Show DNS queries and responses</span>
tcpdump <span class="nt">-A</span> <span class="nt">-nn</span> <span class="nt">-r</span> trace.pcap <span class="s1">'port 53'</span>

<span class="c"># All unique destination ports contacted (from SYN packets)</span>
tcpdump <span class="nt">-nn</span> <span class="nt">-r</span> <span class="s2">"trace.pcap"</span> <span class="se">\</span>
  <span class="s1">'tcp[tcpflags] &amp; tcp-syn != 0 and tcp[tcpflags] &amp; tcp-ack == 0'</span> | <span class="se">\</span>
  <span class="nb">grep</span> <span class="nt">-oP</span> <span class="s1">'&gt; \d+\.\d+\.\d+\.\d+\.\d+'</span> | <span class="se">\</span>
  <span class="nb">grep</span> <span class="nt">-oP</span> <span class="s1">'\d+$'</span> | <span class="nb">sort</span> <span class="nt">-n</span> | <span class="nb">uniq</span> <span class="nt">-c</span> | <span class="nb">sort</span> <span class="nt">-rn</span>

</code></pre></div></div>

<h2 id="deeper-analysis-with-tshark">Deeper Analysis with <code class="language-plaintext highlighter-rouge">tshark</code></h2>
<p>tshark provides a more powerful command-line option for analyzing capture files. A practical approach is to take a quick trace while troubleshooting an issue live and analyze it immediately with tcpdump or tshark. If the results are inconclusive, the capture file can be examined externally using tools such as Wireshark. For more complex scenarios, a comprehensive, long running (with file rotation) network trace can be captured to provide deeper insights.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># With absolute UTC timestamps</span>
tshark <span class="nt">-r</span> trace.pcap <span class="nt">-t</span> ud
<span class="c"># All DNS records</span>
tshark <span class="nt">-r</span> trace.pcap <span class="nt">-Y</span> <span class="s2">"dns"</span>
<span class="c"># TCP RST packets</span>
tshark <span class="nt">-r</span> trace.pcap <span class="nt">-Y</span> <span class="s2">"tcp.flags.reset == 1"</span>
<span class="c"># TCP retransmissions (tshark expert analysis field — reliable)</span>
tshark <span class="nt">-r</span> trace.pcap <span class="nt">-Y</span> <span class="s2">"tcp.analysis.retransmission"</span>
<span class="c"># IP endpoint statistics (top talkers by packet and byte count)</span>
tshark <span class="nt">-r</span> trace.pcap <span class="nt">-q</span> <span class="nt">-z</span> endpoints,ip
</code></pre></div></div>
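
<p>tshark can also print selected fields, which makes it straightforward to tally connection attempts per destination. A minimal sketch, where <code class="language-plaintext highlighter-rouge">top_destinations</code> is a hypothetical helper that reads tab-separated <code class="language-plaintext highlighter-rouge">ip.dst</code> and <code class="language-plaintext highlighter-rouge">tcp.dstport</code> values from standard input:</p>

```shell
# Hypothetical helper: count "address<TAB>port" lines and list the busiest first.
top_destinations() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 ":" $3 }'
}

# Usage: tally initial SYN packets per destination address and port.
# tshark -r trace.pcap -Y "tcp.flags.syn == 1 and tcp.flags.ack == 0" \
#     -T fields -e ip.dst -e tcp.dstport | top_destinations
```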

<p>In real-world environments, it is common to iterate through multiple analysis methods to isolate the issue. To simplify this process, below is a handy OSS script that wraps these native Linux tools into a single interface. It can be downloaded with a simple curl command and run either interactively or by specifying the destination IP and/or port.</p>

<h2 id="usage">Usage</h2>
<p><strong>Download and install the script with the below curl command</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-fsSL</span> https://raw.githubusercontent.com/azureossd/networking-troubleshooting-utility/refs/heads/main/nwutils_install.sh | bash
</code></pre></div></div>
<p><strong>Install all tools (only) and run the commands manually</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nwutils <span class="nb">install</span>
</code></pre></div></div>
<p><strong>Run interactively for dynamic ports</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nwutils run
</code></pre></div></div>
<p><strong>Or pass the target directly</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nwutils myapi.com 4000
</code></pre></div></div>

<p>When run, the script installs the necessary tools, runs through the diagnostics, collects a network trace (60 seconds by default), analyzes the trace, and generates a log file and an HTML report with a summary.</p>]]></content><author><name></name></author><category term="Azure App Service Linux" /><category term="Azure Container App" /><category term="Web App for Containers" /><category term="Other" /><category term="Troubleshooting" /><category term="linux, network trace, tcpdump, tcpping, wireshark, networking, snat, vnet, dns" /><summary type="html"><![CDATA[This blog covers basic Linux network troubleshooting with core open source tools and a handy automated troubleshooting script.]]></summary></entry><entry><title type="html">Issues with deleting Container App Environments</title><link href="https://azureossd.github.io/2026/02/16/Issues-with-deleting-Container-App-Environments/index.html" rel="alternate" type="text/html" title="Issues with deleting Container App Environments" /><published>2026-02-16T12:00:00+00:00</published><updated>2026-02-16T12:00:00+00:00</updated><id>https://azureossd.github.io/2026/02/16/Issues-with-deleting-Container-App-Environments/Issues-with-deleting-Container-App-Environments</id><content type="html" xml:base="https://azureossd.github.io/2026/02/16/Issues-with-deleting-Container-App-Environments/index.html"><![CDATA[<p>This blog post covers two typical reasons why deleting a Container App Environment may fail.</p>

<h1 id="overview">Overview</h1>
<p>Occasionally, although rarely, a Container App Environment may fail to delete. Potential symptoms after invoking a <code class="language-plaintext highlighter-rouge">delete</code> command (via the portal, CLI, or other means) include the following:</p>
<ul>
  <li>Fail fast - the operation may fail after a few seconds with an error about a failure to delete</li>
  <li>Timeout - the command/action may run for tens-of-minutes or more, also ending with a message about failure to delete</li>
</ul>

<p>At the time of this post (02/16/2026), the <em>direct</em> root cause isn’t surfaced back. But we can infer two general reasons why this may happen:</p>
<ul>
  <li><strong>Resource locks</strong> - This is especially common with environments that use a VNET and a lock is placed on the <a href="https://learn.microsoft.com/en-us/azure/container-apps/custom-virtual-networks?tabs=workload-profiles-env#managed-resources">VNET infrastructure / managed-resource group</a>. But this can happen on environments that don’t have a VNET.</li>
  <li><strong>Unhealthy environment/cluster</strong>: A cluster that is in a failed state may prevent delete (or, in general, create/update/delete) operations from succeeding. This has its own potential causes too.</li>
</ul>

<h1 id="resource-locks">Resource locks</h1>
<p>Resource locks on any of the following (but not limited to) would cause this failure to delete:</p>
<ul>
  <li>On the Container App Environment
    <ul>
      <li>Potentially on any resources inside of it, such as a Container App</li>
    </ul>
  </li>
  <li>On the managed resource group that’s created as part of the environment when a custom VNET is used at creation time
    <ul>
      <li>This resource group gets deleted when the environment gets deleted - so if there is a lock on this resource group or anything in this resource group (eg. Azure Load Balancer, public IP, etc.), then the deletion operation will fail since it cannot fully clean up all resources</li>
    </ul>
  </li>
</ul>

<p>These resource groups will look like the following (unless it was customized at creation time):</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">MC_some-env-name-rg_region</code> (the <code class="language-plaintext highlighter-rouge">MC_</code> prefix is used for Consumption-only environments)</li>
  <li><code class="language-plaintext highlighter-rouge">ME_some-env-name-rg_region</code> (the <code class="language-plaintext highlighter-rouge">ME_</code> prefix is used for Workload profile environments)</li>
</ul>
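
<p>As a small sketch of the naming convention above, the helper below labels a resource group name by its default prefix. The function name <code class="language-plaintext highlighter-rouge">managed_rg_kind</code> is hypothetical and only for illustration. It assumes the name was not customized at creation time, and note that other services can also create <code class="language-plaintext highlighter-rouge">MC_</code> prefixed groups, so treat a match as a hint rather than proof.</p>

```shell
# Hypothetical helper: label a resource group name by the default
# managed resource group prefix described above.
managed_rg_kind() {
  case "$1" in
    MC_*) echo "consumption-only" ;;
    ME_*) echo "workload-profiles" ;;
    *)    echo "not-default-managed-name" ;;
  esac
}

# To list candidate groups in the current subscription with the Azure CLI:
# az group list --query "[?starts_with(name, 'MC_') || starts_with(name, 'ME_')].name" --output tsv
```

<p>For example, <code class="language-plaintext highlighter-rouge">managed_rg_kind ME_some-env-name-rg_region</code> prints <code class="language-plaintext highlighter-rouge">workload-profiles</code>.</p>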

<p>A quick way to see these resource groups is to go to the <strong>Subscription</strong> -&gt; <strong>Resource groups</strong> blade in the Azure Portal.</p>

<p><img src="/media/2026/02/aca-env-delete-1.png" alt="Managed Resource Group" /></p>

<p>Under <strong>Settings</strong> -&gt; <strong>Locks</strong>, you can check if a lock is in place. By default, the Container Apps platform does <strong>not</strong> add locks.</p>

<blockquote>
  <p><strong>NOTE</strong>: In real-world scenarios, locks are likely added by deployment automation in companies/organizations/teams, etc. This automation may also apply to a wide range of resources, and may or may not be tied to an Azure Policy.</p>
</blockquote>

<p>If a user-created lock exists for any resource, you’ll see it in that blade. Below, we can see a lock named <code class="language-plaintext highlighter-rouge">prevent-delete</code> was added to this managed resource group.</p>

<p>In our case, a lock was created at the subscription level, which cascades to resources within it.</p>

<p><img src="/media/2026/02/aca-env-delete-2.png" alt="Managed Resource Group locks" /></p>

<p>Now when you try to delete it, you’ll see a popup after a few minutes in the top left of the Azure Portal (or returned through your deployment client) with something like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[resource_name]: The scope '/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/some-rg/providers/Microsoft.App/managedEnvironments/some-env' cannot perform delete operation because following scope(s) are locked: '/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'. Please remove the lock and try again. (Code: ScopeLocked)
</code></pre></div></div>

<p>The important part of this error is <code class="language-plaintext highlighter-rouge">because following scope(s) are locked: '/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'</code> - since this tells you what scope the lock is for. In our case, it’s a subscription level lock. If it were a resource group lock, it would look something like <code class="language-plaintext highlighter-rouge">/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/some-rg</code>. If it were an individual resource lock, it would target the resource directly. Use this to go to the <strong>Locks</strong> blade on the relevant scope, remove the lock, and then try the delete operation again.</p>

<p>For more information on locks, including ways of finding locks that are not described above, review <a href="https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/lock-resources?tabs=json">Azure Resource Locks</a>.</p>
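
<p>The scope-reading step described above can be sketched in shell. The helpers below pull the locked scope out of a <code class="language-plaintext highlighter-rouge">ScopeLocked</code> message and report whether it points at a subscription, a resource group, or an individual resource. The function names <code class="language-plaintext highlighter-rouge">locked_scope</code> and <code class="language-plaintext highlighter-rouge">scope_level</code> are hypothetical, and the parsing assumes the message looks like the sample error in this post.</p>

```shell
# Hypothetical helpers; the error format is assumed to match the sample above.

# Print the locked scope quoted after "scope(s) are locked:".
locked_scope() {
  printf '%s\n' "$1" | sed -n "s/.*scope(s) are locked: '\([^']*\)'.*/\1/p"
}

# Classify a scope ID by how deep its path goes.
scope_level() {
  case "$1" in
    /subscriptions/*/resource[gG]roups/*/providers/*) echo "resource" ;;
    /subscriptions/*/resource[gG]roups/*)             echo "resource-group" ;;
    /subscriptions/*)                                 echo "subscription" ;;
    *)                                                echo "unknown" ;;
  esac
}

# Once the scope is known, locks can be listed and removed with the Azure CLI,
# for example for a resource group named some-rg and a lock named prevent-delete:
# az lock list --resource-group some-rg --output table
# az lock delete --name prevent-delete --resource-group some-rg
```

<p>Passing the sample error from this post to <code class="language-plaintext highlighter-rouge">locked_scope</code> and then the result to <code class="language-plaintext highlighter-rouge">scope_level</code> reports a subscription level scope, which matches the walkthrough above.</p>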

<h1 id="unhealthy-environmentcluster">Unhealthy environment/cluster</h1>
<p>This can be trickier to troubleshoot. You can get an indication that something may be wrong with the environment/cluster state from the type of error returned during CRUD operations, as it may report back a message stating something like <code class="language-plaintext highlighter-rouge">managedEnvironment provisioning state is failed</code>. Use the below guidance to understand if this is related to deletion failures.</p>

<p>You can see this more clearly by going to <strong>Diagnose and Solve Problems</strong> -&gt; <strong>Container App Down</strong> and looking for the following:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">powerState</code> - Cluster power state</li>
  <li><code class="language-plaintext highlighter-rouge">environmentProvisioningState</code> - Container App Environment provisioning state</li>
  <li><code class="language-plaintext highlighter-rouge">managedEnvironmentProvisioningState</code> - Managed environment provisioning state</li>
</ul>

<p><img src="/media/2026/02/aca-env-delete-3.png" alt="Environment and cluster provisioning states" /></p>

<p>The below is a non-exhaustive list of reasons why environment/cluster states may be unhealthy:</p>

<hr />

<p>If <code class="language-plaintext highlighter-rouge">powerState</code> is failed, then create/update/delete operations will likely not succeed.</p>

<p>If <code class="language-plaintext highlighter-rouge">environmentProvisioningState</code> is <em>not</em> “Succeeded”, this doesn’t immediately mean a deletion or other operations will fail. Take into account the below:</p>
<ul>
  <li>If <code class="language-plaintext highlighter-rouge">environmentProvisioningState</code> is shown as “Updating” for multiple hours, then it has not reached a “terminal” state, so create/update/delete operations will potentially fail. The <code class="language-plaintext highlighter-rouge">environmentProvisioningState</code> should be at a terminal state, which is essentially anything that is <em>not</em> “Updating” - if it reaches most other states, you should still be able to delete it
    <ul>
      <li><code class="language-plaintext highlighter-rouge">environmentProvisioningState</code> being unhealthy could be due to a previous delete failure due to resource locks</li>
      <li>This could also mean that a typical Platform Upgrade of the environment was attempted at a prior point in time, but you have an <strong>Azure Policy</strong> preventing updates. See this documentation on <a href="https://learn.microsoft.com/en-us/azure/container-apps/environment#policies">policies</a> with Azure Container Apps for more information</li>
    </ul>
  </li>
</ul>

<p><code class="language-plaintext highlighter-rouge">managedEnvironmentProvisioningState</code> may be unhealthy for a few reasons:</p>
<ul>
  <li>Initial failure upon deployment. This again could be Azure Policy related. Eg. creating a resource without meeting certain policy demands could cause the deployment to fail even when only a subset of resources failed. Policy errors, like above, may not be returned to the user. You’d have to investigate what policies are set for the resources that are being created within this subscription</li>
  <li>This could also fail due to Platform Upgrades being blocked due to the above</li>
  <li>This could fail due to unhealthy infrastructure</li>
</ul>

<blockquote>
  <p><strong>TIP</strong>: If the suspected issue was due to an Azure Policy in place blocking upgrades or preventing resource changes - and you have not seen <code class="language-plaintext highlighter-rouge">environmentProvisioningState</code> or <code class="language-plaintext highlighter-rouge">managedEnvironmentProvisioningState</code> move to a succeeded state - then add OR update a tag on the <strong>Container App Environment</strong> to reconcile the cluster</p>
</blockquote>
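
<p>The state check and the tag-based reconcile from the tip above could be scripted roughly as follows. This is a minimal sketch: the function names and the <code class="language-plaintext highlighter-rouge">reconciledAt</code> tag key are hypothetical, and treating every state other than “Updating” as terminal is a simplification of the guidance above. The provisioning state returned by the CLI may also not map one-to-one to the fields shown in the detector.</p>

```shell
# Hypothetical helpers; names and the tag key are illustrative only.

# Succeeds when the state is non-empty and is not "Updating".
is_terminal_state() {
  [ -n "$1" ] && [ "$1" != "Updating" ]
}

# Add or update a tag on the environment (by resource ID) to nudge a reconcile.
# Set DRY_RUN=1 to print the command instead of running it.
touch_env_tag() {
  env_id="$1"
  tag="reconciledAt=$(date -u +%Y%m%dT%H%M%SZ)"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "az tag update --resource-id $env_id --operation Merge --tags $tag"
  else
    az tag update --resource-id "$env_id" --operation Merge --tags "$tag"
  fi
}

# Example flow, assuming an environment some-env in resource group some-rg:
# state=$(az containerapp env show --name some-env --resource-group some-rg --query properties.provisioningState --output tsv)
# env_id=$(az containerapp env show --name some-env --resource-group some-rg --query id --output tsv)
# is_terminal_state "$state" || echo "still in a non-terminal state: $state"
# DRY_RUN=1 touch_env_tag "$env_id"
```

<p>Using the <code class="language-plaintext highlighter-rouge">Merge</code> operation keeps any existing tags on the environment and only adds or updates the one key.</p>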

<p><strong>NOTE</strong>: Consumption-only environments are much more sensitive to customer-brought networking, especially UDRs and NAT gateways. Blocking any part of the underlying AKS traffic (eg. <code class="language-plaintext highlighter-rouge">kubeapi-server</code>) will cause the cluster to enter a failed state, which in turn would cause the above provisioning states to be failed and your create/update/delete operations to fail.</p>
<ul>
  <li>This is called out in <a href="https://learn.microsoft.com/en-us/azure/container-apps/firewall-integration?tabs=consumption-only#outbound">Securing a virtual network in Azure Container Apps</a>, through <a href="https://learn.microsoft.com/en-us/azure/aks/outbound-rules-control-egress#required-outbound-network-rules-and-fqdns-for-aks-clusters">Required outbound network rules and FQDNs for AKS clusters</a>. The fact that UDRs are only supported on Workload Profiles is also mentioned <a href="https://learn.microsoft.com/en-us/azure/container-apps/user-defined-routes">here</a>. This also applies to NAT gateways. The short of it is: avoid using these on Consumption-only environments, or send all traffic to the internet through a route, to avoid breaking the environment.
    <ul>
      <li>Ideally, it is heavily recommended to migrate to and use workload profile environments which support this as a feature.</li>
    </ul>
  </li>
</ul>]]></content><author><name></name></author><category term="Azure Container Apps" /><category term="Troubleshooting" /><category term="Container Apps" /><category term="Availability" /><category term="Configuration" /><category term="Troubleshooting" /><summary type="html"><![CDATA[This blog post covers two typical reasons why deleting a Container App Environment may fail [resource_name]: The scope ‘/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourcegroups/some-rg/providers/Microsoft.App/managedEnvironments/some-env’ cannot perform delete operation because following scope(s) are locked: ‘/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx’. Please remove the lock and try again. (Code: ScopeLocked)]]></summary></entry><entry><title type="html">Behavior of _del_node_modules</title><link href="https://azureossd.github.io/2026/02/02/Behavior-of-_del_node_modules/index.html" rel="alternate" type="text/html" title="Behavior of _del_node_modules" /><published>2026-02-02T12:00:00+00:00</published><updated>2026-02-02T12:00:00+00:00</updated><id>https://azureossd.github.io/2026/02/02/Behavior-of-_del_node_modules/Behavior-of-_del_node_modules</id><content type="html" xml:base="https://azureossd.github.io/2026/02/02/Behavior-of-_del_node_modules/index.html"><![CDATA[<p>This post will touch on the directory “_del_node_modules”</p>

<h1 id="overview">Overview</h1>
<p>In some App Service Logs, with an App Service Linux - Node.js “Blessed” image (not a custom image, eg. Web App for Containers), you may see a reference to a directory named <code class="language-plaintext highlighter-rouge">_del_node_modules</code>.</p>

<p>This directory is created on startup through logic defined in the Node.js Blessed Image’s container entrypoint and startup behavior. You can see the logic invoked to create it if App Service Logs are enabled, and if you look in <code class="language-plaintext highlighter-rouge">default_docker.log</code>.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2026-01-30T21:06:43.8273864Z <span class="nb">export </span><span class="nv">NODE_PATH</span><span class="o">=</span><span class="s2">"/node_modules"</span>:<span class="nv">$NODE_PATH</span>
2026-01-30T21:06:43.8273897Z <span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span>/node_modules/.bin:<span class="nv">$PATH</span>
2026-01-30T21:06:43.8273929Z <span class="k">if</span> <span class="o">[</span> <span class="nt">-d</span> node_modules <span class="o">]</span><span class="p">;</span> <span class="k">then
</span>2026-01-30T21:06:43.8273962Z     <span class="nb">mv</span> <span class="nt">-f</span> node_modules _del_node_modules <span class="o">||</span> <span class="nb">true
</span>2026-01-30T21:06:43.8273993Z <span class="k">fi
</span>2026-01-30T21:06:43.8274023Z 
2026-01-30T21:06:43.8274065Z <span class="k">if</span> <span class="o">[</span> <span class="nt">-d</span> /node_modules <span class="o">]</span><span class="p">;</span> <span class="k">then
</span>2026-01-30T21:06:43.8274098Z     <span class="nb">ln</span> <span class="nt">-sfn</span> /node_modules ./node_modules 
2026-01-30T21:06:43.8274130Z <span class="k">fi
</span>2026-01-30T21:06:43.8274162Z 
2026-01-30T21:06:43.8274192Z <span class="nb">echo</span> <span class="s2">"Done."</span>
2026-01-30T21:06:43.8274224Z node server.js
</code></pre></div></div>

<h1 id="call-outs-about-this-directory">Call outs about this directory</h1>
<ul>
  <li>It is essentially transparent. During deployments, or for almost anything related to <code class="language-plaintext highlighter-rouge">node_modules</code>, this directory should not need to be touched or worried about. It doesn’t directly affect the application.</li>
  <li>This directory has the same package versions as the ones referenced in your <code class="language-plaintext highlighter-rouge">package.json</code> and also the <code class="language-plaintext highlighter-rouge">node_modules</code> being used by your application. In some cases, if you have a cloud security scanner, it may flag <code class="language-plaintext highlighter-rouge">_del_node_modules</code> as having outdated versions.
    <ul>
      <li>You can SSH into the application container and look at the <code class="language-plaintext highlighter-rouge">package.json</code> for your packages in either <code class="language-plaintext highlighter-rouge">/node_modules</code>, <code class="language-plaintext highlighter-rouge">/home/site/wwwroot/node_modules</code>, or <code class="language-plaintext highlighter-rouge">_del_node_modules</code> and compare them</li>
      <li>If <code class="language-plaintext highlighter-rouge">_del_node_modules</code> is being flagged as having older package versions, delete the directory, restart, and then redeploy. <strong>However, first make sure the application’s <code class="language-plaintext highlighter-rouge">package.json</code> (or <code class="language-plaintext highlighter-rouge">package-lock.json</code>, <code class="language-plaintext highlighter-rouge">yarn.lock</code>) is NOT what contains the package version causing this to be flagged in the first place</strong>.
        <ul>
          <li>Testing with build automation enabled (Oryx) and without it (nowadays, most sites use the same <code class="language-plaintext highlighter-rouge">node_modules</code> compression logic that Oryx uses, as called out in <a href="https://azure.github.io/AppService/2025/07/09/node-optimization.html">Improved Node.js Deployment Performance on Azure App Service</a>) shows that <code class="language-plaintext highlighter-rouge">_del_node_modules</code> and <code class="language-plaintext highlighter-rouge">node_modules</code> will have matching package versions</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
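
<p>The comparison described above can be sketched with a couple of shell helpers run from SSH in the application container. The function names are hypothetical, <code class="language-plaintext highlighter-rouge">express</code> is only a placeholder package name, and the location of <code class="language-plaintext highlighter-rouge">_del_node_modules</code> under <code class="language-plaintext highlighter-rouge">/home/site/wwwroot</code> is an assumption based on the startup script shown earlier. The parsing also assumes each package’s <code class="language-plaintext highlighter-rouge">package.json</code> is pretty-printed with one field per line.</p>

```shell
# Hypothetical helpers for comparing installed package versions.

# Print the "version" field from <modules_dir>/<package>/package.json.
pkg_version() {
  sed -n 's/^[[:space:]]*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$1/$2/package.json" 2>/dev/null | head -n 1
}

# Report whether a package version matches between two module directories.
compare_pkg() {
  a="$(pkg_version "$1" "$3")"
  b="$(pkg_version "$2" "$3")"
  if [ "$a" = "$b" ]; then
    echo "$3: same ($a)"
  else
    echo "$3: differs ($a vs $b)"
  fi
}

# Example (paths and package name are assumptions):
# compare_pkg /node_modules /home/site/wwwroot/_del_node_modules express
```

<p>If a package is missing from one of the directories, its version prints as empty, which is worth checking before concluding that the versions differ.</p>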

<p>The only directories, in most cases, that you should care about are the ones described here: <a href="https://azure.github.io/AppService/2025/07/09/node-optimization.html">Improved Node.js Deployment Performance on Azure App Service</a>.</p>]]></content><author><name></name></author><category term="App Service Linux" /><category term="Configuration" /><category term="Linux" /><category term="App Service Linux" /><category term="Troubleshooting" /><summary type="html"><![CDATA[This post will touch on the directory “_del_node_modules”]]></summary></entry><entry><title type="html">GLIBC version not found</title><link href="https://azureossd.github.io/2026/01/13/glibc-version-not-found/index.html" rel="alternate" type="text/html" title="GLIBC version not found" /><published>2026-01-13T12:00:00+00:00</published><updated>2026-01-13T12:00:00+00:00</updated><id>https://azureossd.github.io/2026/01/13/glibc-version-not-found/glibc-version-not-found</id><content type="html" xml:base="https://azureossd.github.io/2026/01/13/glibc-version-not-found/index.html"><![CDATA[<p>This post will quickly cover this error and some brief actions to take for troubleshooting</p>

<h1 id="overview">Overview</h1>
<p>This post will cover the error <code class="language-plaintext highlighter-rouge">version 'GLIBC_x.xx' not found</code>. This error signature may vary depending on the language of the application, but it generally has the same syntax. This can happen for any language - the context of this post applies to “Blessed Images” but this general information can apply to anywhere this happens on Linux machines.</p>

<p>A full error in a real-world setting is below, where the language is Python:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">ImportError: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.33' not found (required by /home/site/wwwroot/antenv/lib/python3.12/site-packages/cryptography/hazmat/bindings/_rust.abi3.so)</code></li>
</ul>

<h2 id="deconstructing-the-error"><strong>Deconstructing the error</strong></h2>
<p>The main part of the error is <code class="language-plaintext highlighter-rouge">version 'GLIBC_2.33' not found</code>, which is saying that a particular version of glibc is not found at runtime. Since App Service Linux is talked about here, the runtime is the container. The container is run from an image that includes a certain version of glibc, normally determined by the distribution and its version.</p>

<p>This version of glibc is being requested by a specific piece of code. Normally this is going to be something like C bindings or shared object libraries. We see this occasionally with packages that are primarily built upon C, such as <code class="language-plaintext highlighter-rouge">cryptography</code>, <code class="language-plaintext highlighter-rouge">bcrypt</code> and other packages like this.</p>

<h2 id="how-can-you-check-the-glibc-version"><strong>How can you check the glibc version?</strong></h2>
<p>Go into <strong>SSH</strong> and run <code class="language-plaintext highlighter-rouge">ldd --version</code>. Do this in the <strong>application container</strong>, not the Kudu container.</p>

<p><img src="/media/2026/01/glibc_version.png" alt="glibc version" /></p>

<blockquote>
  <p><strong>NOTE</strong>: The application must be running to go into SSH. If it’s crashing you cannot SSH in. You cannot SSH into a container that is crashing, in general.</p>
</blockquote>

<blockquote>
  <p><strong>NOTE</strong>: For custom images/custom containers (Web App for Containers), you need to enable SSH in your image. See <a href="https://azureossd.github.io/2022/04/27/2022-Enabling-SSH-on-Linux-Web-App-for-Containers/index.html">Enabling SSH on Linux Web App for Containers</a></p>
</blockquote>

<p><strong>What is “glibc?”</strong>: This is the C standard library (GNU) which is a foundational aspect in Linux. There is also <code class="language-plaintext highlighter-rouge">musl</code> C, which Alpine uses.</p>
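
<p>To tie the error message to the <code class="language-plaintext highlighter-rouge">ldd --version</code> check, here is a minimal sketch that extracts the required version from the error text and compares it against an installed version. The function names are hypothetical. It assumes a glibc-based image (the output of <code class="language-plaintext highlighter-rouge">ldd --version</code> differs on musl) and GNU <code class="language-plaintext highlighter-rouge">sort</code> with <code class="language-plaintext highlighter-rouge">-V</code> support.</p>

```shell
# Hypothetical helpers for comparing glibc versions.

# Extract "2.33" from a message containing: version 'GLIBC_2.33' not found
required_glibc() {
  printf '%s\n' "$1" | sed -n 's/.*GLIBC_\([0-9][0-9.]*\).*/\1/p'
}

# Installed glibc version, taken from the last field of the first line of `ldd --version`.
installed_glibc() {
  ldd --version 2>/dev/null | head -n 1 | awk '{print $NF}'
}

# Succeeds when the installed version ($1) is at least the required version ($2).
glibc_satisfies() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example, run inside the application container:
# glibc_satisfies "$(installed_glibc)" 2.33 || echo "image glibc is older than required"
```

<p>The comparison relies on version sort rather than string or numeric comparison, since versions like 2.9 and 2.36 do not order correctly as plain numbers.</p>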

<h2 id="important-points-and-troubleshooting"><strong>Important points and troubleshooting</strong></h2>
<p><strong>Do not try to manually change the libc version</strong>. This means attempting things like trying to install/update <code class="language-plaintext highlighter-rouge">glibc</code> or things along this path because:</p>
<ul>
  <li>You may break other aspects of the container. There is a large amount of tooling built into container images that normally relies on C. Changing this may cause adverse/unexpected behavior at runtime</li>
  <li>You may waste a large amount of time. Even though you can alter some behavior via startup commands/scripts, the complexity of trying to change glibc while troubleshooting the initial error means you may end up unsuccessful and frustrated</li>
</ul>

<p><strong>Typical path to resolution</strong>:</p>
<ul>
  <li>Identify the package that wants a specific version of glibc. You may be able to downgrade or upgrade these. The package repository (or general online searching) should point to information that shows what version of glibc a package may use</li>
  <li>Or, if you can’t down/upgrade - then for “Blessed Images”, see if you can upgrade to a higher version of the image you’re on. For example, at the time of writing this blog post and in relation to the above error, which occurred within a Python “Blessed” image, the following glibc versions are utilized:
    <ul>
      <li><strong>Python 3.11 image</strong>: glibc 2.31</li>
      <li><strong>Python 3.12 image</strong>: glibc 2.31</li>
      <li><strong>Python 3.13 image</strong>: glibc 2.36</li>
      <li><strong>Python 3.14 image</strong>: glibc 2.39</li>
      <li>The error implies that the application is using an image that’s 3.12 or below. Therefore, a potential resolution is to use the Python 3.13 image or higher.</li>
      <li>This same concept should apply to other Blessed Images. Of course, there may be differences in which glibc versions are used across language versions of the other images. Use <code class="language-plaintext highlighter-rouge">ldd --version</code> to check the glibc version while in SSH</li>
    </ul>
  </li>
  <li>Or, use a custom image through Web App for Containers. There may be packages that require a specific version that may not be compatible with the current distribution version or other aspects in Blessed Images. <strong>A custom image is an occasionally recommended path for things like this, since you have complete control over how the image is built.</strong></li>
</ul>]]></content><author><name></name></author><category term="App Service Linux" /><category term="Configuration" /><category term="Linux" /><category term="App Service Linux" /><category term="Troubleshooting" /><summary type="html"><![CDATA[This post will quickly cover this error and some brief actions to take for troubleshooting]]></summary></entry><entry><title type="html">Using Webhooks for image pulls with Web App for Containers</title><link href="https://azureossd.github.io/2025/12/16/Using-Webhooks-for-image-pulls-with-Web-App-for-Containers/index.html" rel="alternate" type="text/html" title="Using Webhooks for image pulls with Web App for Containers" /><published>2025-12-16T12:00:00+00:00</published><updated>2025-12-16T12:00:00+00:00</updated><id>https://azureossd.github.io/2025/12/16/Using-Webhooks-for-image-pulls-with-Web-App-for-Containers/Using-Webhooks-for-image-pulls-with-Web-App-for-Containers</id><content type="html" xml:base="https://azureossd.github.io/2025/12/16/Using-Webhooks-for-image-pulls-with-Web-App-for-Containers/index.html"><![CDATA[<p>This post will cover using “webhooks” to initiate image pulls for Web App for Containers</p>

<h1 id="overview">Overview</h1>
<p>A “webhook” in this context is an endpoint exposed on the Kudu/.scm. site for a Web App for Containers resource that offers another way to restart the site, which will also initiate an image pull as part of the site restart operations.</p>

<p>This endpoint is exposed at: <code class="language-plaintext highlighter-rouge">https://[username]:[password]@mysite-bxx8bgaxxxxx.scm.eastus-01.azurewebsites.net/api/registry/webhook</code></p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">[username]</code> is the username for “Basic Auth” credentials that are enabled/disabled in the <strong>Configuration</strong> blade within the portal</li>
  <li><code class="language-plaintext highlighter-rouge">[password]</code> is the password for the “Basic Auth” credentials</li>
</ul>

<p>These credentials are found in <strong>Deployment Center</strong> under the <em>Application-scope</em> section by default.</p>
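
<p>As a rough illustration of the endpoint format above, the sketch below builds the webhook URL and posts to it with <code class="language-plaintext highlighter-rouge">curl</code>. The function names are hypothetical, the host is a placeholder, and whether the endpoint accepts an empty request body is an assumption here, since Azure Container Registry normally sends a JSON payload.</p>

```shell
# Hypothetical helpers; arguments are: username, password, SCM host.

# Build the webhook URL from the Basic Auth credentials and the SCM host.
# Note: characters that are special in URLs are not encoded here.
webhook_url() {
  printf 'https://%s:%s@%s/api/registry/webhook\n' "$1" "$2" "$3"
}

# Post an empty notification to the endpoint (assumed to be accepted).
trigger_webhook() {
  curl --silent --show-error --fail --user "$1:$2" --data '' \
    "https://$3/api/registry/webhook"
}

# The Azure CLI can also print the full URL for a site, for example:
# az webapp deployment container show-cd-url --name mysite --resource-group some-rg
```

<p>Passing the credentials with <code class="language-plaintext highlighter-rouge">--user</code> instead of embedding them in the URL avoids problems with special characters in the username or password.</p>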

<h2 id="mechanics-of-the-webhook">Mechanics of the webhook</h2>
<p>This is touched on above, but to be more specific, this endpoint offers another way to trigger an image pull.</p>

<p>In most cases, users will update the image tag set for an image used on an application (in this case, Web App for Containers) to something unique - typically incremental over time as updates are made to the application’s image, eg:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">myacr.azurecr.io/myimage:2025-12-16-v1.0.0.1</code></li>
  <li><code class="language-plaintext highlighter-rouge">myacr.azurecr.io/myimage:2025-12-16-v1.0.0.2</code></li>
  <li><code class="language-plaintext highlighter-rouge">myacr.azurecr.io/myimage:2025-12-16-v1.0.0.3</code></li>
</ul>

<p>Since these are site configuration level updates, they implicitly cause a restart, and the restart causes the image to be <strong>repulled</strong>, so your new image is downloaded with all relevant layer changes. This normally fits most use cases. Another scenario that may happen is users pushing changes to the same tag, which is <strong>not recommended</strong> - in this case, you’d have to manually restart the application for changes to any layers on that same tag to be repulled.</p>
<ul>
  <li><strong>Note</strong>: App Service, like most other PaaS services, does not “watch” or have any awareness that you pushed a change to the <em>same</em> tag. A restart is the only way these changed layers would be repulled by the container runtime.</li>
</ul>

<p>A webhook, on the other hand, does have “awareness” that an event happened for a specific tag, as defined by its <code class="language-plaintext highlighter-rouge">action</code> and <code class="language-plaintext highlighter-rouge">scope</code>. This webhook is created on <strong>Azure Container Registry</strong>, not on App Service; App Service only exposes the endpoint that accepts the notification event and restarts the site. The only relevance this has to App Service is the <code class="language-plaintext highlighter-rouge">Service URI</code>, to which a “notification” is posted (eg. to restart).</p>

<p>More information on Azure Container Registry Webhooks can be found at <a href="https://learn.microsoft.com/en-us/azure/container-registry/container-registry-webhook">Using Azure Container Registry webhooks</a></p>

<h2 id="use-cases---pushing-changes-to-the-same-tag">Use cases - pushing changes to the same tag</h2>
<p>Although not recommended, for reasons explained in the overall container community, you can push changes to the same tag being used by your Web App for Containers and not have to manually restart the site (or write some automation/operation to restart it).</p>

<p>To be more specific and clear on “pushing changes to the same tag”, this means that you’re <strong>only</strong> pushing changes to your Azure Container Registry. You also do <strong>not</strong> need to have some type of deployment event (like a CI/CD pipeline, or other explicit CLI commands or IaC setup) to handle the deployment. This can be seen as another benefit in certain situations.</p>

<p>By enabling a webhook (below), and keeping the scope to that same <code class="language-plaintext highlighter-rouge">image:tag</code> in use, you’ll see when reviewing <code class="language-plaintext highlighter-rouge">docker.log</code> under <code class="language-plaintext highlighter-rouge">/home/LogFiles</code> in App Service Logs (or other areas like Log stream) that the application is restarted, which repulls the image layers, and thus your changes.</p>
<ul>
  <li>From the time the webhook notification is posted to the App Service webhook endpoint, to the time it takes to start the image pull, may not be immediate</li>
  <li>Meaning, there may be a couple of seconds or so gap, which is expected</li>
</ul>

<h2 id="pushing-to-a-webhook">Pushing to a webhook</h2>
<p>After setting up your webhook, you can use the UI on the Azure Container Registry side to look at webhook events. In this example, we pushed an image with the tag of <code class="language-plaintext highlighter-rouge">latest</code> to Azure Container Registry - which fired this webhook.</p>

<p>Click into the webhook to see if any events fired, for example:</p>

<p><img src="/media/2025/12/wafc-webhooks-3.png" alt="ACR webhook push event" /></p>

<p>If we look at <code class="language-plaintext highlighter-rouge">docker.log</code> (or generally container lifecycle events on this Web App for Container), you’ll see a restart was triggered at the same time:</p>
<blockquote>
  <p><strong>Note</strong>: 11:52AM EST = 16:52 UTC</p>
</blockquote>

<p><img src="/media/2025/12/wafc-webhooks-4.png" alt="WaFC pull event" /></p>

<h2 id="creating-a-webhook">Creating a webhook</h2>
<p>If you want to create a webhook for a registry that is <strong>not</strong> Azure Container Registry, you need to follow the below option to manually create it on your Azure Container Registry.</p>

<h3 id="on-azure-container-registry">On Azure Container Registry</h3>
<p>On your Azure Container Registry, go to <strong>Services</strong> &gt; <strong>Webhooks</strong>. Click “Add”. Then fill out the required information.</p>

<p><img src="/media/2025/12/wafc-webhooks-1.png" alt="ACR webhook creation" /></p>

<h3 id="on-app-service">On App Service</h3>
<p>Go to <strong>Deployment Center</strong> &gt; Check the <em>Continuous deployment</em> box, and click <strong>Save</strong>. This will create a ready-to-go webhook in your Azure Container Registry under <em>webhooks</em>. This will be created with a <code class="language-plaintext highlighter-rouge">scope</code> of your current <code class="language-plaintext highlighter-rouge">image:tag</code> set on the Web App for Container.</p>

<p><img src="/media/2025/12/wafc-webhooks-2.png" alt="ACR webhook enablement on App Service" /></p>

<h3 id="other-information">Other information</h3>
<ul>
  <li><code class="language-plaintext highlighter-rouge">push</code> is the default action, which should be kept</li>
  <li><code class="language-plaintext highlighter-rouge">scope</code> is also important, in-line with the above information in this post, keep this to a specific <code class="language-plaintext highlighter-rouge">image:tag</code> you’ll be pushing changes to. <code class="language-plaintext highlighter-rouge">scope</code> is what the webhook is “watching” changes for. This can be a specific tag, or all tags in a repository.
    <ul>
      <li><strong>Note</strong>: Doing something like <code class="language-plaintext highlighter-rouge">image:*</code> to watch any tags under an image repository won’t do anything in App Service’s case since, for Web App for Containers, this isn’t going to change the image tag on App Service to one that was updated (or anything similar to that). For that effect, you need to change the image tag on App Service explicitly.</li>
    </ul>
  </li>
</ul>
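
<p>The points above about <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">scope</code> could be applied when creating the webhook from the Azure CLI, roughly as below. This is a sketch under assumptions: the function names are hypothetical, and the scope check simply rejects wildcards and requires a <code class="language-plaintext highlighter-rouge">repository:tag</code> shape, which is stricter than what Azure Container Registry itself allows.</p>

```shell
# Hypothetical helpers; arguments to create_acr_webhook are:
# registry name, webhook name, service URI, scope.

# Succeeds only for a scope of the form repository:tag with no wildcard.
is_specific_scope() {
  case "$1" in
    *'*'*) return 1 ;;
    ?*:?*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Create a push webhook on Azure Container Registry for one specific image:tag.
create_acr_webhook() {
  is_specific_scope "$4" || { echo "scope should be a specific image:tag" >&2; return 1; }
  az acr webhook create --registry "$1" --name "$2" --uri "$3" --actions push --scope "$4"
}

# Example (all values are placeholders):
# create_acr_webhook myacr mysitewebhook "https://[username]:[password]@mysite.scm.azurewebsites.net/api/registry/webhook" myimage:latest
```

<p>Rejecting wildcard scopes up front mirrors the note above: a broader scope would still fire notifications, but App Service would keep pulling whatever tag is configured on the site.</p>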

<h1 id="troubleshooting">Troubleshooting</h1>
<p>If you notice no restart is happening, go to Azure Container Registry and look at your webhook events. Typically, an HTTP 4xx or 5xx status would be indicative of an issue. Below are common status codes and their likely causes.</p>

<p><strong>HTTP 401</strong>:</p>
<ul>
  <li>Your username and/or password is incorrect. If manually setting this up on the ACR side, double check your credentials</li>
  <li>As a test, set this up via the App Service side since it’ll automatically fill this information in for you</li>
  <li>You also may have Basic Auth publishing <strong>disabled</strong>. <strong>This must be enabled for webhooks to work</strong>.</li>
</ul>

<p><strong>HTTP 403</strong></p>
<ul>
  <li>Your site likely has a Private Endpoint or Access Restrictions
    <ul>
      <li>If this is due to Access Restrictions, you will need to whitelist the registry IP</li>
      <li>For Private Endpoints, ensure the registry has access to the subnet App Service is in</li>
    </ul>
  </li>
</ul>

<p><strong>HTTP 500</strong></p>
<ul>
  <li>The kudu container is experiencing an issue. Try scaling (up/down) to land on a new instance and have the kudu container restart</li>
</ul>

<p><strong>HTTP 503</strong></p>
<ul>
  <li>The kudu container is likely crashing. If you’re using Bring Your Own Storage and the volume is failing to mount, this will also bring down the kudu container. Review if this is the case. See <a href="https://azureossd.github.io/2023/04/20/How-to-troubleshoot-Bring-Your-Own-Storage-(BYOS)-Issues-on-App-Service-Linux/index.html">How to troubleshoot Bring Your Own Storage (BYOS) issues on App Service Linux</a></li>
  <li>Otherwise, try scaling (up/down) to land on a new instance and have the kudu container restart</li>
</ul>
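
<p>As a rough triage aid, the status codes above can be kept in a small lookup. This is a minimal Python sketch, not part of ACR or App Service tooling: the <code class="language-plaintext highlighter-rouge">triage</code> function and its messages are hypothetical and only restate the causes listed in this section.</p>

```python
# Likely causes per webhook response status, restated from the lists above.
LIKELY_CAUSES = {
    401: "Wrong credentials, or Basic Auth publishing is disabled on the app",
    403: "A Private Endpoint or Access Restrictions are blocking the registry",
    500: "The kudu container has an issue; scale up/down to land on a new instance",
    503: "The kudu container is likely crashing; check BYOS volume mounts first",
}


def triage(status_code):
    """Return a troubleshooting hint for a webhook event's HTTP status code."""
    if status_code in LIKELY_CAUSES:
        return LIKELY_CAUSES[status_code]
    if status_code // 100 == 2:
        return "Delivered; the webhook call itself succeeded"
    return "Not covered here; review the webhook event details in ACR"


print(triage(401))
print(triage(503))
```

<p>Statuses outside this table fall through to a generic hint rather than a guess, since the list above only covers the cases seen most often.</p>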

<p><strong>No HTTP response is seen/no notification is posted</strong></p>
<ul>
  <li>At times, inbound network rules may block a webhook event, in which case no HTTP response is returned from Kudu (since the request is blocked before reaching it)</li>
  <li>Review inbound NSG’s and any networking flows towards App Service to ensure this traffic is allowed</li>
</ul>]]></content><author><name></name></author><category term="Web App for Containers" /><category term="Configuration" /><category term="Linux" /><category term="Web App for Containers" /><category term="Images" /><summary type="html"><![CDATA[This post will cover using “webhooks” to initiate image pulls for Web App for Containers]]></summary></entry><entry><title type="html">Capacity Planning with Azure Container Apps Workload Profiles : Per-Node, Per-Replica and Practical Sizing</title><link href="https://azureossd.github.io/2025/12/10/Capacity-Planning-ACA/index.html" rel="alternate" type="text/html" title="Capacity Planning with Azure Container Apps Workload Profiles : Per-Node, Per-Replica and Practical Sizing" /><published>2025-12-10T12:00:00+00:00</published><updated>2025-12-10T12:00:00+00:00</updated><id>https://azureossd.github.io/2025/12/10/Capacity-Planning-ACA/Capacity-Planning-ACA</id><content type="html" xml:base="https://azureossd.github.io/2025/12/10/Capacity-Planning-ACA/index.html"><![CDATA[<h2 id="overview">Overview</h2>
<p>Azure Container Apps (ACA) simplifies container orchestration, but capacity planning often confuses developers. Common questions include “How do replicas consume node resources?”, “When does ACA add nodes?”, and “How should I model limits and requests?” This guide pairs <strong>official ACA guidance</strong> with <strong>practical examples</strong> to demystify workload profiles, autoscaling, and resource modelling.</p>

<h2 id="understanding-workload-profiles-in-azure-container-apps">Understanding Workload Profiles in Azure Container Apps</h2>

<p>ACA offers three profile types:</p>
<h3 id="consumption">Consumption</h3>
<ul>
  <li>Scales to zero</li>
  <li>Platform decides node size</li>
  <li>Billing per replica execution time</li>
</ul>

<h3 id="dedicated">Dedicated</h3>
<ul>
  <li>Choose VM SKU (e.g., D4 → 4 vCPU, 16 GiB RAM)</li>
  <li>Billing per node</li>
</ul>

<h3 id="flex-preview">Flex (Preview)</h3>
<ul>
  <li>Combines dedicated isolation with consumption-like billing</li>
</ul>

<p>Each profile defines <strong>node-level resources</strong>. For example, D4 → 4 vCPU, 16 GiB RAM per node.</p>

<h2 id="2-how-replicas-consume-node-resources">2. How Replicas Consume Node Resources</h2>
<p>ACA runs on managed Kubernetes.</p>
<ul>
  <li><strong>Node = VM with fixed resources</strong></li>
  <li><strong>Replica = Pod scheduled on a node</strong></li>
  <li>Replicas share node resources; ACA packs replicas until node capacity is full.</li>
</ul>

<h3 id="example">Example</h3>
<p>Node: D4 (4 vCPU, 16 GiB RAM)<br />
Replica requests: 1 vCPU, 2 GiB<br />
5 replicas → Needs 5 vCPU, 10 GiB</p>

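<p>Each node fits at most 4 replicas, since CPU is the limiting resource (4 vCPU ÷ 1 vCPU per replica), so ACA places <strong>4 replicas on Node 1</strong> and adds <strong>Node 2</strong> for <strong>replica 5</strong>.</p>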

<p><img src="/media/2025/12/capacity_planning_aca_pic2.png" alt="" /></p>
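
<p>The packing arithmetic from this example can be sketched in a few lines of Python. This is a simplified model that assumes the whole node is available to replicas, as the example does; any capacity the platform reserves for its own components is ignored. The <code class="language-plaintext highlighter-rouge">Resources</code> class and function names are hypothetical.</p>

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpu: float
    memory_gib: float


def replicas_per_node(node, request):
    """How many replicas fit on one node; the scarcer resource decides."""
    by_cpu = math.floor(node.vcpu / request.vcpu)
    by_memory = math.floor(node.memory_gib / request.memory_gib)
    return min(by_cpu, by_memory)


def nodes_needed(replica_count, node, request):
    """Number of nodes required to place every replica."""
    per_node = replicas_per_node(node, request)
    if per_node == 0:
        raise ValueError("a single replica does not fit on this node size")
    return math.ceil(replica_count / per_node)


d4 = Resources(vcpu=4, memory_gib=16)
request = Resources(vcpu=1, memory_gib=2)

print(replicas_per_node(d4, request))  # 4 (CPU allows 4, memory would allow 8)
print(nodes_needed(5, d4, request))    # 2
```

<p>Taking the minimum across CPU and memory is what makes the fifth replica spill onto a second node here, even though memory alone would allow eight replicas per node.</p>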

<h2 id="3-when-aca-adds-nodes">3. When ACA Adds Nodes</h2>
<p>ACA adds nodes when:</p>
<ul>
  <li>Pending replicas cannot fit on existing nodes</li>
  <li>Resource requests exceed available capacity</li>
</ul>

<p>ACA uses Kubernetes scheduling principles. Nodes scale out when pods are not schedulable due to CPU/memory constraints.</p>

<h2 id="4-practical-sizing-strategy">4. Practical Sizing Strategy</h2>
<ol>
  <li>Identify peak load → translate to CPU/memory per replica</li>
  <li>Choose workload profile SKU (e.g., D4)</li>
  <li>Calculate packing: <strong>node capacity ÷ replica request = max replicas per node</strong></li>
  <li>Add a buffer (e.g., 20%) for headroom</li>
  <li>Configure autoscaling:
    <ul>
      <li>Min replicas for HA.</li>
      <li>Max replicas for burst.</li>
      <li>Min/Max nodes for cost control.</li>
    </ul>
  </li>
</ol>
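
<p>Steps 3 to 5 above can be turned into a small calculation. The sketch below is only illustrative: the function name, the default of 2 minimum replicas for HA, and the peak figure of 10 replicas are assumptions, not ACA defaults.</p>

```python
import math


def plan_capacity(peak_replicas, replicas_per_node, headroom_percent=20, min_replicas=2):
    """Turn a peak replica estimate into replica and node bounds."""
    # Step 4: pad the peak estimate with a buffer for headroom.
    max_replicas = math.ceil(peak_replicas * (100 + headroom_percent) / 100)
    # Step 5: derive node bounds from the replica bounds and the packing ratio.
    min_nodes = math.ceil(min_replicas / replicas_per_node)
    max_nodes = math.ceil(max_replicas / replicas_per_node)
    return {
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
        "min_nodes": min_nodes,
        "max_nodes": max_nodes,
    }


# Assumed inputs: 10 replicas at peak, 4 replicas fit on each node (step 3).
plan = plan_capacity(peak_replicas=10, replicas_per_node=4)
print(plan)
```

<p>With these inputs the plan comes out at 12 maximum replicas spread over at most 3 nodes. Rerunning it with a different SKU's packing ratio is a quick way to compare node sizes before committing to one.</p>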

<h2 id="5-common-misconceptions">5. Common Misconceptions</h2>
<p><strong>Myth:</strong> “Replicas have dedicated CPU/RAM per container automatically.”
<strong>Reality:</strong> Not exactly. They <strong>consume from the node pool</strong> based on your configured CPU &amp; memory. Multiple replicas compete for the <strong>same node</strong> until capacity is exhausted.</p>

<p><strong>Myth:</strong> “ACA node scaling is CPU-time based.”
<strong>Reality:</strong> ACA node scaling is driven by <strong>unschedulable replicas</strong> (cannot place due to configured resources). Triggers for <strong>replica scaling</strong> are KEDA rules (HTTP, queue, CPU/memory %, etc.), but <strong>node scale</strong> follows from replica placement pressure.</p>

<h2 id="6-key-takeaways">6. Key Takeaways</h2>
<ul>
  <li>Model <strong>per-node packing</strong> before setting replica counts</li>
  <li>Plan for <strong>zero-downtime upgrades</strong> (double replicas temporarily)</li>
  <li>Monitor autoscaling behavior; defaults may not fit every workload</li>
</ul>]]></content><author><name></name></author><category term="Azure Container Apps" /><category term="Troubleshooting" /><category term="Container Apps" /><category term="Availability" /><category term="Configuration" /><category term="Troubleshooting" /><summary type="html"><![CDATA[Overview Azure Container Apps (ACA) simplifies container orchestration, but capacity planning often confuses developers. Questions like “How do replicas consume node resources?”, “When does ACA add nodes?”, and “How should I model limits and requests?”. This guide pairs official ACA guidance with practical examples to demystify workload profiles, autoscaling, and resource modelling.]]></summary></entry><entry><title type="html">Setting up a NFS volume with Azure Container Apps</title><link href="https://azureossd.github.io/2025/10/17/Setting-up-a-NFS-volume-with-Azure-Container-Apps/index.html" rel="alternate" type="text/html" title="Setting up a NFS volume with Azure Container Apps" /><published>2025-10-17T12:00:00+00:00</published><updated>2025-10-17T12:00:00+00:00</updated><id>https://azureossd.github.io/2025/10/17/Setting-up-a-NFS-volume-with-Azure-Container-Apps/Setting-up-a-NFS-volume-with-Azure-Container-Apps</id><content type="html" xml:base="https://azureossd.github.io/2025/10/17/Setting-up-a-NFS-volume-with-Azure-Container-Apps/index.html"><![CDATA[<p>This post will cover how to set up an NFS volume with Azure Container Apps through the Azure Portal</p>

<h1 id="overview">Overview</h1>
<p>As of writing this, documentation for mounting volumes with Azure Container Apps is mostly CLI-based. This post will cover a basic NFS setup from scratch. Current documentation for NFS volumes and Azure Container Apps can be found at <a href="https://learn.microsoft.com/en-us/azure/container-apps/storage-mounts?tabs=nfs&amp;pivots=azure-cli">Use storage mounts in Azure Container Apps</a>.</p>

<h1 id="creating-resources">Creating resources</h1>
<h2 id="storage">Storage</h2>
<p>To start, you’ll need a <strong>Premium tier</strong> Storage Account. An NFS share cannot be created on a <em>Standard</em> tier account.</p>

<ol>
  <li>
    <p>In the Azure Portal, click on ‘Create a Resource’ and search for <em>Storage Account</em></p>

    <p><img src="/media/2025/10/aca-nfs-creation-1.png" alt="Storage Account resource" /></p>
  </li>
  <li>Set the following options on the Storage Account basics
    <ul>
      <li><strong>Subscription</strong>: Choose your subscription to create the Storage Account in. Use the same subscription as the Container App Environment</li>
      <li><strong>Resource Group</strong>: Choose which Resource Group to create this in</li>
      <li><strong>Storage Account Name</strong>: Choose a name for the account</li>
      <li><strong>Region</strong>: Choose a region for the account</li>
      <li><strong>Preferred storage type</strong>: Set this to <em>Azure Files</em></li>
      <li><strong>Performance</strong>: Set this to <strong>Premium</strong></li>
      <li><strong>File share billing</strong> and <strong>Redundancy</strong>: Set this to your preferred options</li>
    </ul>

    <p><img src="/media/2025/10/aca-nfs-creation-2.png" alt="Storage Account Basics creation tab" /></p>
  </li>
  <li>Select <strong>Review and create</strong> to create the Storage Account. The rest of the tabs during the creation process can be left the same.</li>
</ol>

<h3 id="create-the-nfs-share">Create the NFS share</h3>
<ol>
  <li>On the Storage Account, go to the <strong>Data Storage</strong> blade and then select <em>File Shares</em></li>
  <li>Click on “+ File Share” to create a new share</li>
  <li>In the File Share creation window, set the following:
    <ul>
      <li><strong>Name</strong>: Name of your NFS share</li>
      <li><strong>Protocol</strong>: NFS</li>
      <li><strong>Root Squash</strong>: Keep this defaulted to <em>No Root Squash</em></li>
      <li>The rest of the options under <em>Provisioned storage</em> and <em>Performance</em> can be left at their defaults unless you want to change them.</li>
    </ul>

    <p><img src="/media/2025/10/aca-nfs-creation-3.png" alt="NFS share creation" /></p>
  </li>
  <li>Click <strong>Review + create</strong> to create the share.</li>
</ol>

<h3 id="remove-secure-transfer-required">Remove “Secure Transfer Required”</h3>
<p>On the Storage Account, go to <strong>Settings</strong> &gt; <strong>Configuration</strong> and disable <em>Secure transfer required</em>. If this is not done, you’ll see a volume mount failure later on with a message of <code class="language-plaintext highlighter-rouge">mount.nfs: access denied by server while mounting</code> in Container App system logs.</p>

<p><img src="/media/2025/10/aca-nfs-creation-4.png" alt="Disable secure transfer required" /></p>

<h2 id="container-app-environment">Container App Environment</h2>
<p>At this point, if you created the Storage Account first, you’ll see a message stating that any access to the NFS share must go through a Virtual Network. For the Storage Account, VNET integration can be done post-creation.</p>

<p>With a Container App Environment, the Virtual Network <strong>needs to be specified at creation time</strong>. You cannot enable a VNET post-creation - so if you’re following along and your environment does <em>not</em> have a VNET, you’ll need to create a net-new one.</p>

<h3 id="create-a-container-app-environment-and-container-app-with-a-vnet">Create a Container App Environment and Container App (with a VNET)</h3>
<ol>
  <li>
    <p>In the Azure Portal, click on ‘Create a Resource’ and search for <em>Container App</em></p>

    <p><img src="/media/2025/10/aca-nfs-creation-5.png" alt="Container App creation blade" /></p>
  </li>
  <li>On the <strong>Basics</strong> blade:
    <ul>
      <li><strong>Subscription</strong>: Choose the subscription that the Storage Account from earlier exists in</li>
      <li><strong>Resource group</strong>: Choose a Resource Group</li>
      <li><strong>Container app name</strong>: Choose a name for your Container App</li>
      <li>For <em>Optimize for Azure Functions</em>, leave this as-is (unless you’re deploying an Azure Function)</li>
      <li><strong>Deployment Source</strong>: Container image (unless you’re deploying from source - eg. source-to-cloud)</li>
      <li><strong>Container Apps environment</strong>:
        <ul>
          <li><strong>Region</strong>: Select a region for the environment</li>
          <li><strong>Container apps environment</strong>: Click on “Create new environment”
            <ul>
              <li><strong>Environment name</strong>: Give a name for the environment</li>
              <li><strong>Networking blade</strong>: In the networking blade, select <em>Use your own virtual network</em> as “Yes” under the <strong>Virtual network</strong> section
                <ul>
                  <li>Select <em>Create new</em> if you do not already have an existing VNET and an <strong>empty</strong> subnet</li>
                  <li>Leave <em>Virtual IP</em> as external (for the sake of this blog post)</li>
                  <li>Leave <em>Infrastructure resource group</em> empty so it takes the default name</li>
                  <li>
                    <p>All the other blades for <em>Workload profiles</em> and <em>Monitoring</em> can be left as-is unless you want to enable Workload Profiles or change your logging destination during the creation process</p>

                    <p><img src="/media/2025/10/aca-nfs-creation-6.png" alt="Container App creation blade" /></p>
                  </li>
                  <li>Then, select <strong>Create</strong></li>
                </ul>
              </li>
            </ul>
          </li>
        </ul>
      </li>
    </ul>
  </li>
  <li>On the <strong>Container</strong> blade
    <ul>
      <li>You can either use the quickstart image, by checking the <em>Use quickstart image</em> checkbox</li>
      <li>Or, bring your own image - <strong>NOTE</strong>: Failures can happen post-creation, which, at this point, would have nothing to do with the storage volume. Always review error logs - follow <a href="https://azureossd.github.io/2025/05/05/Applications-(and-revisions)-stuck-in-activating-state-on-Azure-Container-Apps/index.html">Applications (and revisions) stuck in activating state on Azure Container Apps</a> for common reasons why revisions may end up in a degraded or failed state.</li>
    </ul>

    <p><img src="/media/2025/10/aca-nfs-creation-7.png" alt="Container App creation blade" /></p>
  </li>
  <li><strong>Ingress</strong> blade
    <ul>
      <li>If using your own container, and if this application expects external traffic, ensure the <strong>Target port</strong> matches the application listening port</li>
    </ul>
  </li>
  <li>Click <strong>Review and create</strong> to create the environment with the selected VNET, as well as the Container App.</li>
</ol>

<h3 id="option-1-service-endpoint---allow-the-storage-account-access-from-the-aca-vnet">(Option 1) Service Endpoint - Allow the Storage Account access from the ACA VNET</h3>
<p>Since the VNET is now created - go back to your Storage Account from earlier.</p>

<ol>
  <li>
    <p>Go to the File Share. The <strong>Overview</strong> blade will show the following:</p>

    <p><img src="/media/2025/10/aca-nfs-creation-8.png" alt="NFS share network setup" /></p>
  </li>
  <li>For ease of setup in terms of this blog post, select <strong>Service Endpoint</strong></li>
  <li>Use the following options:
    <ul>
      <li><strong>Public network access</strong>: Enabled from selected virtual networks and IP addresses</li>
      <li><strong>Virtual networks</strong>: Select <em>Add existing virtual network</em> and then select the VNET and subnet from earlier. This will prompt to enable a Service Endpoint, which you want to do. Select <strong>Enable</strong></li>
    </ul>

    <p><img src="/media/2025/10/aca-nfs-creation-9.png" alt="Service Endpoint setup" /></p>

    <ul>
      <li>After this process completes while in the same blade, click <strong>Add</strong></li>
      <li>Then, click <strong>Save</strong></li>
    </ul>
  </li>
  <li>
    <p>At this point, <strong>Endpoint Status</strong> should be <em>Enabled</em> and we can now move on to adding the storage resource and volume on the environment and application</p>

    <p><img src="/media/2025/10/aca-nfs-creation-10.png" alt="Service Endpoint setup complete" /></p>
  </li>
</ol>

<h3 id="option-2-private-endpoint---allow-the-storage-account-access-from-the-aca-vnet">(Option 2) Private Endpoint - Allow the Storage Account access from the ACA VNET</h3>
<p>Another option is using a Private Endpoint. You’ll need an <em>empty</em> subnet (different than the one given to the Container App Environment) to proceed. You can create an empty subnet in your Virtual Network with the defaults provided and then move back to the below blade on the Storage Account - or - just follow <a href="https://learn.microsoft.com/en-us/azure/storage/files/storage-files-networking-endpoints?tabs=azure-portal#create-a-private-endpoint">Create a private endpoint for Azure Files</a></p>

<ol>
  <li>
    <p>Go to the File Share. The <strong>Overview</strong> blade will show the following:</p>

    <p><img src="/media/2025/10/aca-nfs-creation-8.png" alt="NFS share network setup" /></p>
  </li>
  <li>Select <strong>Private Endpoint</strong>. This will take you through the creation blade for a <strong>Private Endpoint</strong>. Follow the step form to create this:
    <ul>
      <li><strong>Basics</strong>:
        <ul>
          <li><strong>Subscription</strong>: Create this in the same subscription as the Container App Environment and Storage account
            <ul>
              <li><strong>Resource Group</strong>: Choose a resource group</li>
            </ul>
          </li>
          <li><strong>Name</strong>: Give the Private Endpoint a name</li>
          <li><strong>Network Interface Name</strong>: This will default to <code class="language-plaintext highlighter-rouge">[name]-nic</code>, leave this as the default</li>
        </ul>
      </li>
      <li><strong>Resource</strong>: Leave the defaults</li>
      <li><strong>Virtual Network</strong>:
        <ul>
          <li><strong>Virtual Network</strong>: Select the VNET from earlier</li>
          <li><strong>Subnet</strong>: Select the empty subnet created (mentioned above)</li>
          <li>Leave the rest of the defaults for all following tabs</li>
        </ul>
      </li>
      <li>Click through and then click <strong>Create</strong></li>
    </ul>
  </li>
  <li>After Private Endpoint creation, you should see <strong>Private link resource</strong> set to your Storage Account. If you <code class="language-plaintext highlighter-rouge">nslookup</code> your file share FQDN, you should see this now also has the private link alias associated with it:</li>
</ol>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ nslookup myfsstorageaccount.file.core.windows.net
Server:  some.thing.com
Address:  xxx.xx.xx.xx

Non-authoritative answer:
Name:    file.xxxxxxxxxx.store.core.windows.net
Address:  xx.xx.xxx.x
Aliases:  myfsstorageaccount.file.core.windows.net
          myfsstorageaccount.privatelink.file.core.windows.net
</code></pre></div></div>
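
<p>The same lookup can be scripted. The sketch below uses only the Python standard library; the function names are hypothetical. Note that a private address is only expected when the lookup runs from inside the VNET (or a network linked to the private DNS zone), and a lookup from elsewhere can still return a public address.</p>

```python
import ipaddress
import socket


def resolve_ipv4(fqdn):
    """Resolve an FQDN to its IPv4 addresses using the local resolver."""
    # 2049 is the standard NFS port; it only shapes the lookup result here.
    infos = socket.getaddrinfo(fqdn, 2049, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True when there is at least one address and every one is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


# Offline illustration with made-up addresses.
print(all_private(["10.0.1.5"]))    # True
print(all_private(["20.60.1.10"]))  # False

# Live check (requires DNS access; replace the hypothetical account name):
# print(all_private(resolve_ipv4("myfsstorageaccount.file.core.windows.net")))
```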

<h3 id="create-a-storage-resource-on-the-environment">Create a Storage Resource on the environment</h3>
<blockquote>
  <p><strong>NOTE</strong>: Regardless of whether you used a Service Endpoint or Private Endpoint, the normal FQDN for the file share (eg. somefileshare.file.core.windows.net) will be used below. Don’t use the Private Link alias if a Private Endpoint was used.</p>
</blockquote>

<ol>
  <li>Go back to the <strong>Container App Environment</strong> and select <strong>Azure Files</strong> under <em>Settings</em></li>
  <li>Select <strong>Add</strong> and then <em>Network File System (NFS)</em></li>
  <li>In the pop-out blade, set the following:
    <ul>
      <li><strong>Name</strong>: Set a name for the resource</li>
      <li><strong>Server</strong>: Set this to <code class="language-plaintext highlighter-rouge">yourstorageaccount.file.core.windows.net</code></li>
      <li><strong>File share name</strong>: Set this to <code class="language-plaintext highlighter-rouge">/yourstorageaccount/yournfssharename</code></li>
      <li><strong>Access Mode</strong>: Set your access mode to read/write or read-only</li>
    </ul>

    <p><strong>IMPORTANT</strong>: Make sure to include the forward slashes, this syntax is expected.</p>

    <p><img src="/media/2025/10/aca-nfs-creation-12.png" alt="ACA NFS setup blade" /></p>
  </li>
</ol>

<p><strong>NOTE</strong>: If you go back to your NFS file share, you’ll see this information there, which you can copy from</p>

<p><img src="/media/2025/10/aca-nfs-creation-11.png" alt="NFS mount information" /></p>

<ol start="4">
  <li>Select <strong>Add</strong> and then <strong>Save</strong></li>
</ol>
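
<p>Since the forward-slash syntax for the file share name is easy to get wrong, here is a minimal sketch that builds both values from the storage account and share names. The helper name is hypothetical and the account and share names are placeholders; it only reproduces the two formats shown in the steps above.</p>

```python
def nfs_storage_settings(storage_account, share_name):
    """Build the Server and File share name values for the NFS storage resource."""
    for label, value in (("storage account", storage_account), ("share", share_name)):
        if not value or "/" in value:
            raise ValueError(f"{label} name must be non-empty and contain no slashes")
    return {
        "server": f"{storage_account}.file.core.windows.net",
        "file_share_name": f"/{storage_account}/{share_name}",
    }


settings = nfs_storage_settings("yourstorageaccount", "yournfssharename")
print(settings["server"])           # yourstorageaccount.file.core.windows.net
print(settings["file_share_name"])  # /yourstorageaccount/yournfssharename
```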

<h3 id="add-a-volume-to-the-container-app">Add a volume to the Container App</h3>
<ol>
  <li>Go to your <strong>Azure Container App</strong> that you created and go to <strong>Application</strong> &gt; <strong>Volumes</strong></li>
  <li>Select <strong>Add</strong> to add a volume
    <ul>
      <li><strong>Volume type</strong>: Azure file volume</li>
      <li><strong>Name</strong>: Choose a name for the volume</li>
      <li><strong>File share name</strong>: Choose the name of the storage resource created earlier above on the <em>Container App Environment</em></li>
      <li><strong>Mount options</strong>: Omit this unless you know which options to pass in. Allowed mount options are also the ones listed on the NFS “man page” - <a href="https://linux.die.net/man/5/nfs">nfs(5) - Linux man page</a></li>
    </ul>

    <p><img src="/media/2025/10/aca-nfs-creation-13.png" alt="Adding a volume on the Container App" /></p>
  </li>
  <li>Lastly, click <strong>Add</strong>. Then click <strong>Save as new revision</strong></li>
</ol>

<h3 id="mount-the-volume-to-the-container-app">Mount the volume to the Container App</h3>
<blockquote>
  <p><strong>NOTE</strong>: You can either explicitly create a new revision through the <em>Revisions and replicas</em> blade or do this through the <em>Containers</em> blade</p>
</blockquote>

<ol>
  <li>Go to your <strong>Azure Container App</strong> that you created and go to <strong>Application</strong> &gt; <strong>Revisions and replicas</strong></li>
  <li>Click <strong>Create new revision</strong></li>
  <li>Select the container to mount a volume to, then click <strong>Edit</strong></li>
  <li>On the popout window, go to the <em>Volume mounts</em> tab
    <ul>
      <li><strong>Volume name</strong>: Select the <em>Volume</em> created just before on the Container App</li>
      <li><strong>Mount path</strong>: Specify a directory to mount this volume to</li>
    </ul>
  </li>
  <li>
    <p>Click <strong>Save</strong>, then click <strong>Create</strong></p>

    <p><img src="/media/2025/10/aca-nfs-creation-14.png" alt="Mounting a volume on the Container App" /></p>
  </li>
</ol>

<h3 id="confirm-the-volume-is-mounted">Confirm the volume is mounted</h3>
<p>If the Revision is not in a degraded or failed state after enabling the volume mount, this already implies the mount succeeded.</p>

<p>If the Revision is now <strong>Failed</strong> or <strong>Degraded</strong>, ensure your <em>Logs destination</em> is set to either Log Analytics or Azure Monitor and query the system logs table for <code class="language-plaintext highlighter-rouge">ContainerAppSystemLogs</code> (Azure Monitor) or <code class="language-plaintext highlighter-rouge">ContainerAppSystemLogs_CL</code> (Log Analytics) and review the errors seen there.</p>

<p>Otherwise, go to the <strong>Console</strong> blade and run <code class="language-plaintext highlighter-rouge">df -a</code> or <code class="language-plaintext highlighter-rouge">df -h</code>. If the <code class="language-plaintext highlighter-rouge">mount</code> command is installed in your container image and available in the running container, you can also run <code class="language-plaintext highlighter-rouge">mount | grep "nfs"</code> to see additional arguments and version information</p>

<p><img src="/media/2025/10/aca-nfs-creation-15.png" alt="Console output on the Container App" /></p>

<p>If using a Private Endpoint, you’ll still see the normal FQDN for the file share in the above output. But if you <code class="language-plaintext highlighter-rouge">nslookup</code> this from within the container, you should see this resolves to the IP of the NIC for the Private Endpoint</p>

<p><img src="/media/2025/10/aca-nfs-creation-18.png" alt="Private Endpoint output on the Container App" /></p>

<p>We know this works because, if the mount were failing, console access would be unavailable. The volume mount happens early on in the container lifecycle (within a pod), so a container is not yet created and running at that point in time.</p>
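
<p>If Python happens to be available in your image, the console check from this section can also be scripted by parsing the mount table. This is a minimal sketch: the function name is hypothetical, and the sample text is made up for illustration (including the mount path and options), so real output will differ.</p>

```python
def nfs_mounts(mounts_text):
    """Return (source, mount_point, fs_type, options) for each NFS mount."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: source mount_point fs_type options dump pass
        if len(fields) == 6 and fields[2].startswith("nfs"):
            found.append((fields[0], fields[1], fields[2], fields[3].split(",")))
    return found


# Illustrative sample only; not captured from a real container.
sample = (
    "overlay / overlay rw,relatime 0 0\n"
    "yourstorageaccount.file.core.windows.net:/yourstorageaccount/yournfssharename "
    "/mnt/nfs nfs4 rw,relatime,vers=4.1 0 0\n"
)

for source, mount_point, fs_type, options in nfs_mounts(sample):
    print(mount_point, fs_type, source)

# Inside the container you could read the live table instead:
# with open("/proc/mounts") as handle:
#     print(nfs_mounts(handle.read()))
```

<p>Reading <code class="language-plaintext highlighter-rouge">/proc/mounts</code> avoids depending on the <code class="language-plaintext highlighter-rouge">mount</code> binary being present in the image.</p>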

<h1 id="faqs-and-other-information">FAQs and other information</h1>
<h2 id="faqs">FAQs</h2>
<ul>
  <li>You cannot pass options like <code class="language-plaintext highlighter-rouge">-t</code> or <code class="language-plaintext highlighter-rouge">--types</code> through the “mount options” field. <code class="language-plaintext highlighter-rouge">-t</code> is only set by your choice of either SMB (<code class="language-plaintext highlighter-rouge">mount.cifs</code>) or NFS (<code class="language-plaintext highlighter-rouge">mount.nfs</code>). Using the above <em>Console</em> method, you can see that the NFS filesystem type used is <code class="language-plaintext highlighter-rouge">nfs4</code>. This cannot be changed (eg. to <code class="language-plaintext highlighter-rouge">aznfs</code>)</li>
  <li>SMB and NFS have different methods of setting file/directory permissions which can be seen here: <a href="https://azureossd.github.io/2024/12/30/Container-Apps-Setting-storage-directory-permissions/index.html">Container Apps - Setting storage directory permissions</a></li>
  <li>SMB requires Access Keys to set up a storage resource and mount a volume, NFS does not</li>
  <li>NFS requires a VNET on the Container App Environment, and that VNET must be integrated with the Storage Account, to make all of this possible</li>
  <li>This blog post only covered a “typical” Private Endpoint set up - if your VNET has custom DNS, ensure the file share FQDN can be resolved against your DNS servers. You may need to add Azure DNS (168.63.129.16) to “resolve unresolved queries” on your DNS servers</li>
  <li>If a volume mount is failing, no “application” / “console” logs will be available, only system logs. This is because a volume mount happens very early on in a pod lifecycle, prior to a running container. A container never gets to the point of being successfully created and running, which means no application process ever starts (and thus, no application logs)</li>
</ul>

<h2 id="troubleshooting">Troubleshooting</h2>
<ul>
  <li>For general storage mount troubleshooting, review <a href="https://azureossd.github.io/2023/07/24/Troubleshooting-volume-mount-issues-on-Azure-Container-Apps/index.html">Troubleshooting volume mount issues on Azure Container Apps</a></li>
  <li>You also should review <a href="https://azureossd.github.io/2025/02/10/Azure-Files-security-compatability-on-Container-Apps/index.html">Azure Files security compatability on Container Apps</a>. This by default is set to “maximum compatibility” on the file share - if this is changed to options other than this, you risk having the mount fail with <code class="language-plaintext highlighter-rouge">permission denied (13)</code></li>
  <li>Forgetting to disable “Secure Transfer Required” on the Storage Account will also cause <code class="language-plaintext highlighter-rouge">permission denied</code></li>
</ul>

<p>Aside from system logs, going to <strong>Diagnose and Solve Problems</strong> &gt; <strong>Storage Mount Failures</strong> will also display storage troubleshooting information for mount failure on the Container App</p>

<p><img src="/media/2025/10/aca-nfs-creation-16.png" alt="Diagnose and Solve Problems" /></p>]]></content><author><name></name></author><category term="Azure Container Apps" /><category term="Troubleshooting" /><category term="Container Apps" /><category term="Availability" /><category term="Configuration" /><category term="Troubleshooting" /><summary type="html"><![CDATA[This post will cover how to set up an NFS volume with Azure Container Apps through the Azure Portal]]></summary></entry></feed>