Container Apps lets you use multiple containers, if desired - this includes “sidecar” and “init” containers. These all run together in an application pod or replica.
Since Container Apps is powered by Kubernetes, see Kubernetes.io - Pods with multiple containers.
An excerpt from the above link helps explain multicontainer usage in a pod or replica:
Pods in a Kubernetes cluster are used in two main ways
Technically, most pods or replicas on Container Apps already run in a multicontainer setup, since enabling services like Dapr, built-in authentication, or Managed Identity adds respective sidecar containers - and other “system” or “service” containers may exist outside of this.
However, the premise of “multi container” usage in this post, and what’s documented publicly, is about being able to use more than one application container.
There are two container types (aside from the “main” container):
A sidecar will continuously run in the pod or replica (think background service or another TCP/HTTP/gRPC based application)
An init container will run, finish its logic, and exit prior to the application container(s) starting. For information on init container usage, see the below blog posts:
Sidecar containers are still defined in the template.containers array in the Container App ARM spec. Essentially, if there is more than one (1) container defined in this array, the application pod or replica is using a multicontainer setup with sidecars.
Init containers are defined in a separate property, the template.initContainers array, in the Container App ARM spec.
Documentation on container types can be found here - Azure Container Apps - Multiple Containers
Containers in pods can communicate with each other since they share a network namespace, which is part of the pod. This lets containers in a pod communicate with each other over localhost instead of needing the pod IP - the network namespace is shared between containers in a pod by default in Kubernetes.
Since containers share the same namespace, care needs to be taken to avoid binding to the same port, which can cause port clashes and one or more containers failing to start.
In the below example, the frontend container could call the redis-sidecar container by localhost:6379.
Within a pod with multiple containers on Container Apps, you can use netstat -ano to look at the listening addresses.
# netstat -ano
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State Timer
tcp 0 0 127.0.0.11:53 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.8.90.151:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.8.90.151:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.8.90.151:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.8.90.151:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 0.0.0.0:23045 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 0.0.0.0:23045 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 0.0.0.0:23045 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 0.0.0.0:23045 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.78.144.42:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.78.144.42:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.78.144.42:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.78.144.42:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.39.62.152:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.39.62.152:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.39.62.152:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 127.39.62.152:23044 0.0.0.0:* LISTEN off (0.00/0/0)
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN off (0.00/0/0)
Going off the above example, note that there is something listening on 0.0.0.0:6379 and 0.0.0.0:8000, which would correspond to the sidecar and application containers. Depending on which services are enabled (explained earlier above), there may be additional servers binding to addresses.
Another example is calling a sidecar API through its localhost address, such as http://localhost:3000:
To demonstrate other ways to configure routing, such as a reverse proxy setup, consider that you have two containers - one is NGINX, the other is Go:
go is run as a sidecar container. nginx is the main container.
nginx is the ingress target in this case, since it’s listening on port 80 and we have our ingress set to 80.
go listens on 0.0.0.0:3000 - since containers in pods can communicate via localhost, you can now proxy_pass requests from nginx to go:
(Go)
func main() {
app := fiber.New()
app.Get("/", controllers.IndexController)
app.Get("/get/headers", controllers.UpstreamController)
log.Fatal(app.Listen(":3000"))
}
(NGINX)
location / {
proxy_pass http://localhost:3000;
}
location /headers {
proxy_pass http://localhost:3000/get/headers;
}
You can confirm by Response Headers that the response is returned by NGINX, confirming this is being proxied:
Using the same concept as above, instead of NGINX, we’ll use our own custom Envoy container:
The Go application listens on 0.0.0.0:3000 and was created as the sidecar container. Envoy is the main container, listening on port 10000 - ingress is also set to 10000.
The same Go code from above is used. For Envoy, we configure this upstream endpoint as the following:
....other fields
route_config:
  name: local_route
  virtual_hosts:
  - name: local_service
    domains: "*"
    routes:
    - match:
        prefix: "/headers"
      route:
        prefix_rewrite: "/get/headers"
        cluster: go-application
    - match:
        prefix: "/"
      route:
        cluster: go-application
....other fields
clusters:
- name: go-application
  connect_timeout: 30s
  type: LOGICAL_DNS
  # Comment out the following line to test on v6 networks
  dns_lookup_family: V4_ONLY
  lb_policy: ROUND_ROBIN
  load_assignment:
    cluster_name: go-application
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 127.0.0.1
              port_value: 3000
Sidecar containers:
For pods using multiple containers with sidecars (e.g., app and sidecar containers), the lifecycle events still follow what is described in Container Apps - Demystifying restarts.
There is no guarantee a certain container starts first. If one application requires another to be available, the “client” application should use a retry mechanism (such as general or exponential backoff on connection or request attempts) until the other sidecar application or service has started.
Init containers:
Init containers will always run prior to the “main” and “sidecar” containers being created - as well as any system containers. Below is an example of what you’d see in ContainerAppSystemLogs / ContainerAppSystemLogs_CL regarding the pod lifecycle:
3/6/2024, 3:58:05.564 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Replica 'envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq' has been scheduled to run on a node.
3/6/2024, 3:58:12.953 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Successfully pulled image 'someacr.azurecr.io/containerappsmulticontainers-envoy:headers2' in 5.8793404s
3/6/2024, 3:58:14.096 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Successfully pulled image 'someacr.azurecr.io/azurecontainerappsgoinitcontainer:latest' in 7.4125569s
3/6/2024, 3:58:14.615 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Successfully pulled image 'someacr.azurecr.io/containerappsmulticontainers-go:latest' in 8.7526821s
3/6/2024, 3:58:14.615 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Created container 'go-init'
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Started container 'go-init'
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Probe started from stage StartUp
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Container 'go-init' was terminated with exit code '0' and reason 'ProcessExited'
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Probe started from stage StartUp
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Probe started from stage StartUp
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Created container 'envoyproxy'
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Started container 'envoyproxy'
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Probe started from stage StartUp
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Created container 'go'
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Started container 'go'
3/6/2024, 3:58:15.721 PM envoy-go-multicontainer--94j5cq2-66b45d94c-wmhlq Probe started from stage StartUp
From the above logging, we can infer that the init container go-init runs to completion before any system or other application main/sidecar containers start their lifecycle events.
NOTE: The above is a pod in a Dedicated environment - logging will be slightly different on Consumption-Only.
IMPORTANT: For the rest of the system and app containers to move forward in their typical lifecycle events (image pull, container create, start, etc.) - the init container needs to succeed. A non-zero exit code, failing startups, failing image pulls, failing container creation, etc. - will all cause the pod or replica to be marked as “failed”.
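To make the exit-code requirement concrete, below is a minimal sketch of an init container entrypoint in Go - the setup function is a hypothetical placeholder for whatever one-time work your init container performs:

```go
package main

import (
	"fmt"
	"os"
)

// setup stands in for whatever one-time work the init container does
// (seeding a volume, running migrations, fetching config, etc.).
// It is a hypothetical placeholder for illustration.
func setup() error {
	return nil
}

func main() {
	if err := setup(); err != nil {
		// A non-zero exit code causes the pod/replica to be marked "failed",
		// and the main/sidecar containers will not start.
		fmt.Fprintln(os.Stderr, "init failed:", err)
		os.Exit(1)
	}
	// Falling out of main exits with code 0, letting the main and
	// sidecar containers proceed with their lifecycle events.
}
```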
Below is a diagram of the lifecycle which includes init containers as described above:
Typical container configuration that exists for main containers can also be done for init containers, with the following exceptions:
The Container App ARM definition can be found here for further configuration.
Health Probes can be configured for each individual container. Review the Container Apps: Troubleshooting and configuration with Health Probes blog for further information on probes and troubleshooting.
Note that if one of the containers’ Health Probes starts failing its defined probe definitions, the Revision will be marked as failed or potentially degraded.
Essentially, almost all troubleshooting for a multicontainer app should follow typical app troubleshooting.
The main difference is that if the init container exits with a status code greater than zero (0) and/or consistently fails startup - amongst other typical fatal issues like image pull failures, volume mount failures, container create failures, etc. - then the rest of the system and app container creation and lifecycle events will not move forward. Logging and events will appear in typical Container App logging tables.
Below is an example of what you’d see in ContainerAppSystemLogs_CL / ContainerAppSystemLogs if an init container was failing to start or exited with an unsuccessful status code:
ReplicaName msg
someapp--se423cs-568986fd64-rxzvg Pulling image "someacr.azurecr.io/azurecontainerappsgoinitcontainer:exit1"
someapp--se423cs-568986fd64-rxzvg Successfully pulled image "someacr.azurecr.io/azurecontainerappsgoinitcontainer:exit1" in 189.996488ms (190.001986ms including waiting)
someapp--se423cs-568986fd64-rxzvg Created container go-init
someapp--se423cs-568986fd64-rxzvg Started container go-init
someapp--se423cs-568986fd64-rxzvg Persistent Failiure to start container
someapp--se423cs-568986fd64-rxzvg Back-off restarting failed container go-init in pod someapp2--se423cs-568986fd64-rxzvg_k8se-apps(adb8d524-571f-4928-ab41-7a7b927e2f8b)
At that point, you should look in ContainerAppConsoleLogs_CL / ContainerAppConsoleLogs or Logstream to review application stdout / stderr.
Since a large number of issues will be the same as typical application and/or configuration issues, below are some quick reference links:
For blog posts filed under the “Container Apps” category - see Category - Container Apps. Or, review the general Azure OSS Developer blog.
Given the way networking with pods works - containers in a pod share the same network namespace, so ports need to be unique. If not, you’ll run into something like the below in ContainerAppConsoleLogs / ContainerAppConsoleLogs_CL - and in turn, the revision will show as “failed”:
failed to listen: listen tcp4 :3000: bind: address already in use
The above is just an example with Go - but the same general message will show regardless of language (with just some variation in the message). This can occur if you have two or more containers in a pod that happen to be binding to the same port.
To resolve this - ensure each container is using a unique port.
IaC refers to infrastructure-as-code - like Bicep, ARM, Terraform, Pulumi, etc.
Ensure that recent API versions described in the Container Apps ARM specification are used. As of writing this, there are older or deprecated API versions available from before init containers existed. A deployment may be successful when using these versions with IaC - but the initContainers array on the Container App resource will be null due to the usage of old versions.
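As a hedged illustration of where the property sits (property names follow the Container Apps ARM spec; the image names are reused from the examples above and are placeholders), a trimmed-down template fragment would look like:

```json
{
  "properties": {
    "template": {
      "initContainers": [
        {
          "name": "go-init",
          "image": "someacr.azurecr.io/azurecontainerappsgoinitcontainer:latest"
        }
      ],
      "containers": [
        {
          "name": "go",
          "image": "someacr.azurecr.io/containerappsmulticontainers-go:latest"
        }
      ]
    }
  }
}
```

If the API version in use predates init containers, this initContainers array is silently ignored, which matches the null behavior described above.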
CPU and Memory metrics exposed are for the pod resource usage as a whole, which is made up by the containers within. If troubleshooting high CPU or high memory scenarios - this may need to be taken into account.
If tools like top or ps exist in the container, you can go into a container to validate which process may be consuming high CPU or memory. If these tools don’t exist, they can (normally) be installed - assuming access to the internet is not completely locked down for package installation.
Note: when you have a shell opened in a specific container, this will not show process IDs from other containers in the pod. This is because the PID namespace is not shared between containers in a pod by default in Kubernetes.
Blog posts for performance on Container Apps can be found in Container Apps - Performance
Container Apps offers built-in support for Dapr. This post will cover some more generalized and common scenarios, as well as some component troubleshooting.
Although this may be at a high-level, this can potentially be used to point yourself in the right direction for troubleshooting.
Limitations on Dapr with Container Apps are called out here - Dapr integration with Azure Container Apps - Limitations.
Additionally, note the following:
Under the Overview blade for the Container App Environment, there is a property that shows the current Dapr version. This may be helpful if needing to know the current version in the environment.
Log Analytics (ContainerAppConsoleLogs_CL) or Azure Monitor (ContainerAppConsoleLogs) will contain stdout/stderr for the daprd sidecar that’s a part of your pod/replica. The Dapr API verbosity option (seen below) also influences the kind of logs seen here. Knowing this is incredibly important for overall troubleshooting with Dapr. The output here is the same kind of output you’d see from daprd when running locally.
time="2024-02-12T17:14:21.77245614Z" level=info msg="starting Dapr Runtime -- version 1.11.6 -- commit 349d21adeac3425919e0fea8cb58c4a2ec799a3f" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.772518438Z" level=info msg="log level set to: debug" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.772632589Z" level=info msg="metrics server started on :9090/" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.metrics type=log ver=1.11.6
time="2024-02-12T17:14:21.772743081Z" level=info msg="Initializing the operator client" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.781894439Z" level=debug msg="Loading Kubernetes config resource: defaultconfig" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.785143677Z" level=debug msg="No resiliency policies found" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.785169708Z" level=debug msg="Found 0 resiliency configurations from Kubernetes" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.785195051Z" level=info msg="Resiliency configuration loaded" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.78520075Z" level=debug msg="No Access control policy specified" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.acl type=log ver=1.11.6
time="2024-02-12T17:14:21.786006664Z" level=info msg="kubernetes mode configured" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.786033443Z" level=info msg="app id: go-pubsub" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.786054759Z" level=info msg="mTLS enabled. creating sidecar authenticator" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.786169121Z" level=info msg="trust anchors and cert chain extracted successfully" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime.security type=log ver=1.11.6
time="2024-02-12T17:14:21.786185935Z" level=info msg="authenticator created" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.796812226Z" level=info msg="Dapr trace sampler initialized: DaprTraceSampler(P=1.000000)" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.797348731Z" level=info msg="Initialized name resolution to k8se" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.797448745Z" level=info msg="Loading components…" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799537865Z" level=debug msg="Found component: pubsub (pubsub.azure.eventhubs/v1)" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799665593Z" level=info msg="Waiting for all outstanding components to be processed" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799678331Z" level=debug msg="Loading component: pubsub (pubsub.azure.eventhubs/v1)" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799979923Z" level=info msg="The provided connection string is specific to the Event Hub (\"entity path\") 'orders'; publishing or subscribing to a topic that does not match this Event Hub will fail when attempted" app_id=go-pubsub component="pubsub (pubsub.azure.eventhubs/v1)" instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.contrib type=log ver=1.11.6
time="2024-02-12T17:14:21.800073504Z" level=info msg="Component loaded: pubsub (pubsub.azure.eventhubs/v1)" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.800092843Z" level=info msg="All outstanding components processed" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.800106043Z" level=info msg="Loading endpoints" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.80163449Z" level=debug msg="No http endpoints found" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime.httpendpoints type=log ver=1.11.6
time="2024-02-12T17:14:21.801936125Z" level=info msg="Waiting for all outstanding http endpoints to be processed" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.802070319Z" level=info msg="All outstanding http endpoints processed" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.802508458Z" level=info msg="gRPC proxy enabled" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.802801319Z" level=info msg="gRPC server listening on TCP address: [::1]:50001" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime.grpc.api type=log ver=1.11.6
time="2024-02-12T17:14:21.803162606Z" level=info msg="gRPC server listening on TCP address: 127.0.0.1:50001" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime.grpc.api type=log ver=1.11.6
time="2024-02-12T17:14:21.962326119Z" level=info msg="application discovered on port 3000" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
Looking through the various output above, we can see this shows a lot of events - such as the version of Dapr being used, various configuration loaded (by default), Dapr components that are loaded (if any), application discovery, and others.
If a component fails to load, there should be at least one log with a level of fatal, and potentially a log with a level of warning, with a reason for the issue. If the replicas of the Container App scale down to zero, the Dapr sidecar will automatically shut down, and messages that resemble the following should appear - this is generally normal. However, if an Actor is used on the Container App, set minReplicas to at least one to avoid this behavior so that the Actor can function. This is called out in the limitations hyperlink above.
The below logging will also show for any new pods or replicas being created. Especially in “restart” scenarios - if you carefully look at the pod/replica name (exposed through the ContainerGroupName_s column in ContainerAppConsoleLogs) - you’ll see this follows the logic in Container Apps - Demystifying restarts. The daprd sidecar in the “old” pod/replica will shut down while the new one starts up:
time="2024-02-12T17:14:29.340921459Z" level=info msg="Dapr shutting down" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:29.340993239Z" level=info msg="Stopping PubSub subscribers and input bindings" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:29.341010362Z" level=info msg="Shutting down workflow engine" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:29.341018735Z" level=info msg="Initiating actor shutdown" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:29.341026503Z" level=info msg="Shutting down actor" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:29.341039199Z" level=info msg="Holding shutdown for 5s to allow graceful stop of outstanding operations" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:34.341514242Z" level=info msg="Stopping Dapr APIs" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:34.341869812Z" level=info msg="Shutting down all remaining components" app_id=go-pubsub instance=someapp--qqeuvcc-845886bbff-bld5n scope=dapr.runtime type=log ver=1.11.6
Ensure that if the application is using HTTP or gRPC, the protocol is appropriately set for daprd in the Dapr blade on the application:
For example, in this case, the Dapr SDK for Go was expecting to communicate over gRPC - but the protocol was set to HTTP. Therefore, the below connection refused occurred, since the HTTP port of 3500 was expected to be accessed. More than likely, these kinds of events are only going to surface in an application container’s ContainerAppConsoleLogs / ContainerAppConsoleLogs_CL and not within daprd - the below would be found in ContainerAppConsoleLogs:
panic: error publishing event unto orders topic: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:50001: connect: connection refused"
By default, the protocol is HTTP after enabling Dapr on an application.
This section calls out certain messages that may be present in ContainerAppConsoleLogs / ContainerAppConsoleLogs_CL when daprd is enabled for a Container App.
The below message comes from daprd and is just a “call out”:
level=info msg="actors: state store is not configured - this is okay for clients but services with hosted actors will fail to initialize!"
The below message also comes from daprd and should have no availability effects on the application:
level=debug msg="error connecting to placement service (will retry to connect in background): rpc error: code = Unavailable desc = last resolver error: produced zero addresses"
A full list of Dapr API error codes can be found here. What is returned will depend on the component used (if any) and the scenario.
It is good to cross-reference these messages with the table in the above link. Even if an application is not set to return the direct error from Dapr, logging through ContainerAppConsoleLogs and filtering on the daprd container can possibly surface the underlying error.
State Management refers to the “State Management” Dapr API that can be used with an HTTP client (varies based on language) as well as the Dapr SDK, which also varies based on language. Documentation on the State Management API can be found here.
The base URL for the State Store through the HTTP API is http://localhost:3500/v1.0/state/[your_statestore_name]. Under the hood, the SDK will use a base host and port defined in the language SDK code base.
Paths for different CRUD operations regarding state will vary depending on what is being tried - here. The biggest difference is that the HTTP verb associated with these operations needs to change too, e.g.:
GET
DELETE
POST
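As a small sketch of the URL shapes these verbs are issued against (the store and key names here are hypothetical), a helper like the below makes the pattern explicit - GET and DELETE target a key under the store, while POST targets the store itself with a JSON body of key/value pairs:

```go
package main

import "fmt"

const daprHTTPPort = 3500 // default Dapr HTTP port

// stateURL builds a State Management API endpoint for a given store and
// optional key. With a key, the URL is the GET/DELETE target; without one,
// it is the POST (save state) target.
func stateURL(store, key string) string {
	base := fmt.Sprintf("http://localhost:%d/v1.0/state/%s", daprHTTPPort, store)
	if key == "" {
		return base
	}
	return base + "/" + key
}

func main() {
	fmt.Println(stateURL("statestore", "order1")) // GET / DELETE target
	fmt.Println(stateURL("statestore", ""))       // POST target
}
```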
When the State Management HTTP API is invoked, you’ll see daprd sidecar logging about this - the method will depend on what kind of HTTP verb the client application making the API call is using:
level=info msg="HTTP API Called" app_id=python-app instance=someapp--gy2u6mk-6b48968d5f-mdwsj method="POST /v1.0/state/{storeName}" scope=dapr.runtime.http-info type=log useragent=python-requests/2.28.1 ver=1.11.6
Or it may look like the below if an SDK is used (the log itself will vary depending on the language used) - which also happens to use gRPC for the pubsub operations:
level=info msg="gRPC API Called" app_id=python-app instance=some-app--fk72mzz-6887f49b67-8zxkq method=/dapr.proto.runtime.v1.Dapr/SaveState scope=dapr.runtime.grpc.api-info type=log useragent="dapr-sdk-python/1.12.1 grpc-python/1.60.1 grpc-c/37.0.0 (linux; chttp2)" ver=1.11.6
For general component metaData issues - where metaData has incorrect information, e.g., an incorrect Storage Account - you may get something like the below, in which the component fails to load:
level=fatal msg="process component statestore error: [INIT_COMPONENT_FAILURE]: initialization error occurred for statestore (state.azure.blobstorage/v1): init timeout for component statestore exceeded after 5s" app_id=go-app instance=someapp--e4n7wj3-658dd7d969-6twrz scope=dapr.runtime type=log ver=1.11.6
ERR_STATE_STORES_NOT_CONFIGURED:
The equivalent of this may look like the below in daprd:
time="2024-02-08T22:08:47.649371144Z" level=debug msg="api error: code = FailedPrecondition desc = state store is not configured" app_id=python-app instance=some-appp--8gyw3on-7bb477d6d7-dbzpt scope=dapr.runtime.http type=log ver=1.11.6
If Dapr is enabled and the State Store API is being called through an HTTP client or the Dapr SDK, but no State Store was configured (either never added or for some reason never loaded through Dapr), this error will occur.
If a State Store was added at some later point in time, ensure the Container App is restarted so that new pods/replicas will have daprd load the component in.
You can ensure that a State Store component was created by looking at the Dapr components blade on the Container App Environment. In ContainerAppConsoleLogs / ContainerAppConsoleLogs_CL, you can confirm whether the state store component was loaded by carefully reviewing messaging - below are the typical messages you’d want to see for any Dapr component.
Ideally, we want to see the below, which would confirm the component is loaded. If we don’t see these logs at all upon daprd startup, then we can infer the State Store was not loaded, which would put us into this problem. Note that during normal runtime you should not see these messages unless the daprd sidecar was restarted or a new pod or replica was created:
time="2024-02-08T22:21:17.508900433Z" level=info msg="Loading components…" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
time="2024-02-08T22:21:17.510808895Z" level=debug msg="Found component: statestore (state.azure.blobstorage/v1)" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
time="2024-02-08T22:21:17.510842467Z" level=info msg="Waiting for all outstanding components to be processed" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
time="2024-02-08T22:21:17.510860302Z" level=debug msg="Loading component: statestore (state.azure.blobstorage/v1)" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
time="2024-02-08T22:21:17.575072929Z" level=info msg="Component loaded: statestore (state.azure.blobstorage/v1)" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
time="2024-02-08T22:21:17.575121312Z" level=info msg="All outstanding components processed" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
The state store type will completely depend on the component and which backing store is used.
Depending on how the application is written, the call to state may very much return an HTTP 200, but ultimately state will not be altered or retrieved if the component is never loaded.
ERR_STATE_STORE_NOT_FOUND:
This means that the State Store component is loaded - but the state store specified by the application does not exist, may be misspelled, or is scoped to the wrong component.
A quick validation is to go into the Dapr blade for the application and review the name of the State Store to see if it matches what the application is referencing - this assumes the component is scoped to the application (and the correct one):
Otherwise, if it is not scoped (in which case all applications will have it loaded), go to the Dapr Components blade on the Container App Environment:
Using an example below, we’d be able to confirm the state store was loaded:
level=info msg="Loading components…" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
level=debug msg="Found component: statestore (state.azure.blobstorage/v1)" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
level=info msg="Waiting for all outstanding components to be processed" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
level=debug msg="Loading component: statestore (state.azure.blobstorage/v1)" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
level=info msg="Component loaded: statestore (state.azure.blobstorage/v1)" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
level=info msg="All outstanding components processed" app_id=python-app instance=some-app--8gyw3on-56c84b68d7-vg4nb scope=dapr.runtime type=log ver=1.11.6
But when looking in `ContainerAppConsoleLogs_CL`, we’d see the below message:
level=debug msg="api error: code = InvalidArgument desc = state store statestores is not found" app_id=python-app instance=someapp--gy2u6mk-6778d7478d-wgvc9 scope=dapr.runtime.http type=log ver=1.11.6
Where “statestores” is the name of the State Store being referenced within the calling application code. In this example, we can see that the client application is calling the wrong store name - however, this can happen for any of the reasons mentioned above.
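To illustrate why the name must match exactly, below is a minimal sketch of how an HTTP-based client typically constructs the state API URL - assuming the default Dapr HTTP port of 3500, with a hypothetical store name. If `storeName` does not match the component name, Dapr returns the error above:

```javascript
// Sketch: constructing a Dapr state API URL. The store name below is
// hypothetical - it must match the State Store component name exactly.
const DAPR_HTTP_PORT = process.env.DAPR_HTTP_PORT || 3500;

function stateUrl(storeName, key) {
  // GET/POST against this URL reads/writes state through the daprd sidecar
  return `http://localhost:${DAPR_HTTP_PORT}/v1.0/state/${storeName}/${key}`;
}

// "statestore" here must match what's shown in the Dapr blade
console.log(stateUrl('statestore', 'order-1'));
```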
Service Invocation refers to the “Service Invocation” Dapr API that can be used with an HTTP client (various based on language), Dapr SDK or gRPC - which also varies based on language. Documentation on the Service Invocation API can be found here.
An example of “Service Invocation” is a client using the Dapr API to call other services, like a backend API. This is done within the application through the following:

`http://localhost:3500/v1.0/invoke/some_service/method/some_backend_route`

- `http://localhost:3500/v1.0/invoke` - This is the “base URL”. This always remains the same.
- `some_service` - This needs to match the Dapr App ID of the target resource being called upstream. “some_service” is just an example.
- `method` - This needs to remain in the URL scheme above, just like the “base URL”.
- `some_backend_route` - This is the route/path being called on the target service. For instance, `http://localhost:3500/v1.0/invoke/backend/method/controllerRoute` - or, if it was a nested path, `http://localhost:3500/v1.0/invoke/backend/method/api/controllerRoute`
When using gRPC for Service Invocation, the call instead targets the Dapr gRPC port - `localhost:50001` - while specifying the App Id in the request. Note, using the SDK with the Service Invocation method will vary based on the language - but you still need to target the correct App Id of the upstream resource.
You do not need any components for Service Invocation. Only Dapr needs to be enabled at the application level to use this API.
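As a quick sketch of the URL scheme above - assuming the default Dapr HTTP port of 3500; the app id and route here are hypothetical:

```javascript
// Sketch: building a Dapr Service Invocation URL. Only the app id and the
// trailing route change - the base URL and "method" segment stay fixed.
const DAPR_HTTP_PORT = process.env.DAPR_HTTP_PORT || 3500;

function invokeUrl(appId, route) {
  return `http://localhost:${DAPR_HTTP_PORT}/v1.0/invoke/${appId}/method/${route}`;
}

// "backend" must be an existing Dapr App Id of an upstream resource
console.log(invokeUrl('backend', 'api/controllerRoute'));
```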
ERR_DIRECT_INVOKE:
A full error message may look like:
{"errorCode":"ERR_DIRECT_INVOKE","message":"fail to invoke, id: backen, err: rpc error: code = Unimplemented desc = "}
This is likely due to the App Id in the request not matching the Dapr App Id of any upstream resource. For example:

- `http://localhost:3500/v1.0/invoke/backend/method/controllerRoute` - If “backend” is not an existing Dapr App Id of an upstream resource, this will cause `ERR_DIRECT_INVOKE`
- The same applies when using the `invoke` method exposed from an SDK - for example, setting the target App Id through gRPC metadata with `ctx = metadata.AppendToOutgoingContext(ctx, "dapr-app-id", "backend")` - “backend” still needs to exist as a Dapr App Id
From `ContainerAppConsoleLogs_CL`, you can query output from `daprd` to see when the `invoke` API is called for Service Invocation:
ContainerAppConsoleLogs_CL
| where ContainerName_s == "daprd"
| project TimeGenerated, Log_s, ContainerAppName_s, ContainerGroupName_s
(HTTP API)
level=info msg="HTTP API Called" app_id=frontend instance=some-app--yo0h472-79b654fc48-57ncr method="GET /v1.0/invoke/{id}/method/{method:*}" scope=dapr.runtime.http-info type=log useragent=python-requests/2.28.1 ver=1.11.6
(gRPC API)
time="2024-02-12T20:30:53.106095734Z" level=info msg="gRPC API Called" app_id=client instance=someapp--7hwmpzf-569599f95c-sngdk method=/helloworld.Greeter/SayHello scope=dapr.runtime.grpc.api-info type=log useragent=grpc-go/1.61.0 ver=1.11.6
NOTE: User-Agent completely depends on the application and what kind of HTTP client is being used. Even with Dapr SDK usage, the User-Agent may vary since different HTTP clients are used under the hood
For additional testing, you can connect to the application container through Console and use `curl` to test the upstream resource:
curl -v http://localhost:3500/v1.0/invoke/myupstreamservice/method/myendpoint
Pubsub refers to the “publish and subscribe” methods and Dapr APIs - the full API reference for the Dapr pubsub API can be found here.
The pubsub building block documentation along with other links to quickstarts and tutorials can be found here.
As a reference, when a pubsub API is invoked, it’ll show the following through `daprd`:
The below shows what it may look like when a pubsub gRPC API is called (the User-Agent will vary based on the SDK used):
level=info msg="gRPC API Called" app_id=go-app instance=someapp--yrdy-59b746866f-fbzkb method=/dapr.proto.runtime.v1.Dapr/PublishEvent scope=dapr.runtime.grpc.api-info type=log useragent="dapr-sdk-go/v1.9.1 grpc-go/1.57.0" ver=1.11.6
The below is through the HTTP API (the User-Agent will vary based on the HTTP client used):
time="2023-10-18T21:24:19.343194954Z" level=info msg="HTTP API Called" app_id=node-app instance=dapr-pub-sub-examples-node-http--d4np7jq-684c774bb8-ngwl6 method="POST /v1.0/publish/{pubsubname}/{topic:*}" scope=dapr.runtime.http-info type=log useragent=axios/1.5.1 ver=1.11.2-msft-3
The Dapr pubsub API has a set of status codes that may be returned in certain scenarios. This may manifest in application logging:
| Code | Description |
| ---- | ----------- |
| 204 | Message delivered |
| 403 | Message forbidden by access controls |
| 404 | No pubsub name or topic given |
| 500 | Delivery failed |

You can view the API documentation here.
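As a sketch of how these codes surface to a publishing client - the pubsub and topic names here are hypothetical, and the Dapr HTTP port is assumed to be the default 3500:

```javascript
// Sketch: the publish endpoint shape and a minimal status-code check.
const DAPR_HTTP_PORT = process.env.DAPR_HTTP_PORT || 3500;

function publishUrl(pubsubName, topic) {
  // POSTing the event payload to this URL publishes through the daprd sidecar
  return `http://localhost:${DAPR_HTTP_PORT}/v1.0/publish/${pubsubName}/${topic}`;
}

function describePublishStatus(code) {
  // Mirrors the status code table above
  const codes = {
    204: 'Message delivered',
    403: 'Message forbidden by access controls',
    404: 'No pubsub name or topic given',
    500: 'Delivery failed',
  };
  return codes[code] || `Unexpected status: ${code}`;
}

console.log(publishUrl('some-pubsub', 'orders'));
console.log(describePublishStatus(204));
```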
Pubsub “message” codes refer to the text message returned in the response body. This is returned from `daprd` as seen in the codebase here - GitHub - dapr - api.go

NOTE: These error messages will appear in the `ContainerAppConsoleLogs_CL` / `ContainerAppConsoleLogs` tables - this may be returned from the application response - or - a related message seen from `daprd`
ERR_PUBSUB_PUBLISH_MESSAGE:
This may be a generic message - review the `message` in the returned body for more details, as it will likely point to the underlying reason.
An example of this is:
message: "error when publish to topic sometopic in pubsub somepubsub: error trying to establish a connection: the requested topic 'sometopic' does not match the Event Hub name in the connection string"
Or the below - which indicates the FQDN in the connection string is incorrect, or DNS cannot resolve the FQDN of the pubsub (Event Hub, in this case):
message: 'error when publish to topic asome-topic in pubsub some-pubsub: error creating event batch: (connlost): dial tcp: lookup some-eventhub.servicebus.windows.net on 10.96.0.10:53: no such host'
ERR_PUBSUB_NOT_FOUND:
The pubsub name may be incorrect or does not exist.
Review the connection string (if used) in the component metadata. Since only programmatic subscriptions can be used (as called out in the limits and caveats section above, declarative component subscriptions are not supported), review the calling client within your code to make sure it’s not calling an incorrect or non-existent pubsub name - which will depend on how the endpoint is constructed (eg., through an HTTP client or SDK usage).
An example response body of this is:
data: {
errorCode: 'ERR_PUBSUB_NOT_FOUND',
message: 'pubsub eventhubs-pubsubt not found'
}
ERR_PUBSUB_NOT_CONFIGURED:
The application may not be appropriately scoped to the pubsub component - or there is no existing pubsub component. If a component was added but the application was never restarted (new deployment, revision, etc.) then this change would not have been picked up.
You’d want to review `ContainerAppConsoleLogs_CL` / `ContainerAppConsoleLogs` to ensure there is a message about the pubsub component being loaded like below - if this is not shown, then this is likely a contributing factor or root cause of the problem:
time="2024-02-12T17:14:21.797448745Z" level=info msg="Loading components…" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799537865Z" level=debug msg="Found component: pubsub (pubsub.azure.eventhubs/v1)" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799665593Z" level=info msg="Waiting for all outstanding components to be processed" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799678331Z" level=debug msg="Loading component: pubsub (pubsub.azure.eventhubs/v1)" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.799979923Z" level=info msg="The provided connection string is specific to the Event Hub (\"entity path\") 'orders'; publishing or subscribing to a topic that does not match this Event Hub will fail when attempted" app_id=go-pubsub component="pubsub (pubsub.azure.eventhubs/v1)" instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.contrib type=log ver=1.11.6
time="2024-02-12T17:14:21.800073504Z" level=info msg="Component loaded: pubsub (pubsub.azure.eventhubs/v1)" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
time="2024-02-12T17:14:21.800092843Z" level=info msg="All outstanding components processed" app_id=go-pubsub instance=someapp--qqeuvcc-7fbff77449-trzbx scope=dapr.runtime type=log ver=1.11.6
NOTE: There can be various pubsub providers - this is just an example using Event Hub, but the messages will look almost the same regardless of provider
This may also happen if the `appId` is renamed while the component is still scoped to the old `appId` - the component will show as “not found”. Ensure the `appId` and scope are updated and then the revision is restarted.
Below is an example message:
{
errorCode: 'ERR_PUBSUB_NOT_CONFIGURED',
message: 'no pubsub is configured'
}
In `daprd` logging within console logs, this may show the following message - note, this may vary slightly based on whether an SDK or HTTP API is being used, as well as gRPC:
level=debug msg="rpc error: code = FailedPrecondition desc = no pubsub is configured" app_id=go-pubsub instance=some-pubsub--cgfkfw9-9fc65ddf5-czfnx scope=dapr.runtime.grpc.api type=log ver=1.11.6
Unable to publish would mean that a client application is unable to send messages to the target resource - eg., Event Hub, Service Bus, or other message-based service. This is in addition to the specific errors mentioned above.
Some initial things to review:

- Scoping: is the application scoped to the pubsub component?
- Is the error surfacing in the application stdout/err, or in `daprd` (sidecar) stdout/err? Or both?
- For existing implementations, review any recent changes to the component’s `metaData` fields. For new implementations: review the `metaData` fields and cross-reference these, if needed

Potential common issues:

- `metaData` is targeting an incorrect or non-existent message server - either by FQDN or by topic name. Additionally, connection strings, if used, may be incorrect.
- Missing or incorrect required fields in `metaData` - cross reference requirements for a component here - Dapr - Components - Pubsub
- Only programmatic subscriptions can be used - see the documentation here - Dapr - Programmatic Subscriptions.
- In `stdout` - a successful call for Dapr subscriptions would be logged out in an application’s HTTP access log (or elsewhere if `stdout` is redirected to a specific location). This would look something like `2023-10-23T21:07:37.445738298Z 127.0.0.1 - - [23/Oct/2023:21:07:37 +0000] "GET /dapr/subscribe HTTP/1.1" 200 60 "-" "Go-http-client/1.1"`.
- The `GET` request to `/dapr/subscribe` is made from Dapr to the application’s endpoint that is exposing the path `/dapr/subscribe` - if the application wants to subscribe - a successful call would return an HTTP 200.
- Component `metaData` should be reviewed. As an example, with Event Hub, if `consumerID` is not set to the Consumer Group created on the Event Hub side, then subscription endpoints may not receive data, even though there may be no error. This may vary based on which message broker is being used (not all components use the same `metaData` properties).
In certain cases, an application may not receive messages if the developer is not aware of “consumer groups and competing consumer patterns”.
An example of this could be running an application locally and then running the same application on Container Apps - both at the same time - which means there are now two subscribers. Potentially only the local application will receive events, as it was running first.
ERR_HEALTH_NOT_READY dapr is not ready:
This will look like the following in console logs:
level=debug msg="{ERR_HEALTH_NOT_READY dapr is not ready}" app_id=go-app instance=someapp-http--yrdy-7c97f56697-bllgh scope=dapr.runtime.http type=log ver=1.11.6
This can occur if a component has incorrect `metaData` - eg., a wrong Access Key, or any other property with wrong values where it needs to connect to an external resource. An issue can be inferred when this is repeatedly seen in the `daprd` sidecar logs in either table.
delayed connect error: 111:
The full message returned to a client would be `upstream connect error or disconnect/reset before headers. retried and the latest reset reason: remote connection failure, transport failure reason: delayed connect error: 111`. This may show immediately after invoking a request.
This is not Dapr specific - rather, this is returned from Envoy and is an HTTP 503, which is due to the application container exiting because of a fatal error or unhandled exception. The reason this is mentioned in this Dapr post is that, depending entirely on the application, a component or service invocation may fail - and when that happens, crash the application and surface this scenario.
On top of this, there is a likely chance that in some situations an associated error is not written to stderr by `daprd`, thus not appearing in `ContainerAppConsoleLogs_CL` / `ContainerAppConsoleLogs` when filtering by the `daprd` container name.
Instead, it’s very important to look in `ContainerAppConsoleLogs_CL` / `ContainerAppConsoleLogs` (for the application container) to try and find the fatal error or exception. Below is an example of a fatal error with Go (due to `log.Fatalf`, which implicitly calls `exit`) caused by calling an invalid upstream App Id:
2024/02/12 20:59:46 could not greet: rpc error: code = Unimplemented desc =
Or this `panic`, due to no configured pubsub component:
panic: error publishing event unto orders topic: rpc error: code = FailedPrecondition desc = no pubsub is configured
Ultimately, the error, and how it’s handled, will dictate the logging you see.
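As a small Node sketch of that point - whether the process crashes (producing the Envoy 503 behavior above) or logs and continues depends entirely on how the error from the Dapr call is handled. The failing call here is simulated:

```javascript
// Simulated failing Dapr call - stands in for e.g. a publish with no
// pubsub configured, which rejects with FailedPrecondition.
function failingDaprCall() {
  return Promise.reject(new Error('rpc error: code = FailedPrecondition desc = no pubsub is configured'));
}

async function handled() {
  try {
    await failingDaprCall();
  } catch (err) {
    // Logged and survivable - this appears in the app's console logs,
    // and the process keeps serving traffic
    console.error('publish failed:', err.message);
    return 'recovered';
  }
}

handled().then((r) => console.log(r));

// By contrast, awaiting failingDaprCall() with no catch (the equivalent of a
// panic or log.Fatal) exits the container, and Envoy returns the 503 to callers.
```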
You can set log verbosity for Dapr via the portal - under the “Dapr” blade on the Container App. The output here mimics what would be shown running Dapr anywhere else, such as locally.
Setting the log level to Debug can be helpful in troubleshooting issues, as it will show more detailed logging around various Dapr lifecycle events. For instance, using the pubsub component and Event Hub as an example - when setting logging to Debug, we can see more specific events logged out. This would be useful in cases where an app wasn’t receiving messages after subscribing to a message service:
The below shows the pubsub HTTP API being called along with polling of any new messages:
time="2023-10-18T21:23:02.836969406Z" level=debug msg="Received batch with 0 events on topic orders, partition 0" app_id=node-app component="pubsub (pubsub.azure.eventhubs/v1)" instance=dapr-pub-sub-examples-node-http--d4np7jq-684c774bb8-ngwl6 scope=dapr.contrib type=log ver=1.11.2-msft-3
time="2023-10-18T21:24:02.841720985Z" level=debug msg="Received batch with 0 events on topic orders, partition 0" app_id=node-app component="pubsub (pubsub.azure.eventhubs/v1)" instance=dapr-pub-sub-examples-node-http--d4np7jq-684c774bb8-ngwl6 scope=dapr.contrib type=log ver=1.11.2-msft-3
time="2023-10-18T21:24:19.343194954Z" level=info msg="HTTP API Called" app_id=node-app instance=dapr-pub-sub-examples-node-http--d4np7jq-684c774bb8-ngwl6 method="POST /v1.0/publish/{pubsubname}/{topic:*}" scope=dapr.runtime.http-info type=log useragent=axios/1.5.1 ver=1.11.2-msft-3
time="2023-10-18T21:24:19.438464479Z" level=debug msg="Received batch with 1 events on topic orders, partition 0" app_id=node-app component="pubsub (pubsub.azure.eventhubs/v1)" instance=dapr-pub-sub-examples-node-http--d4np7jq-684c774bb8-ngwl6 scope=dapr.contrib type=log ver=1.11.2-msft-3
time="2023-10-18T21:24:19.43871775Z" level=debug msg="Processing EventHubs event orders/7e9a5ff4-0c99-4726-7400-a944e4f3c47d (attempt: 1)" app_id=node-app component="pubsub (pubsub.azure.eventhubs/v1)" instance=dapr-pub-sub-examples-node-http--d4np7jq-684c774bb8-ngwl6 scope=dapr.contrib type=log ver=1.11.2-msft-3
You can create a Next.js project with `npx`. The installation process will prompt for a project name - this same name will dictate the directory it creates for your Next.js application:

npx create-next-app@latest
√ What is your project named? ... azure-webapp-windows-node-nextjs-basic
√ Would you like to use TypeScript? ... No / Yes
√ Would you like to use ESLint? ... No / Yes
√ Would you like to use Tailwind CSS? ... No / Yes
√ Would you like to use `src/` directory? ... No / Yes
√ Would you like to use App Router? (recommended) ... No / Yes
√ Would you like to customize the default import alias (@/*)? ... No / Yes
Creating a new Next.js app in C:\Users\user\azure-webapp-windows-node-nextjs-basic.
After creation, use `npm run dev` to run the development server. Or, build the application first to generate a production build (this will create a `.next` folder), and then run the application locally:

npm run build
npm run start
NOTE: If using `yarn`, use `yarn run build` and `yarn start`

NOTE: Running without building first may show `Error: ENOENT: no such file or directory, open 'C:\path\to\project\.next\BUILD_ID'`
You can then browse the application at `http://localhost:3000`.

Compared to App Service Linux (eg., NextJS Deployment on App Service Linux), where you can run `package.json` commands directly, or invoke the full path to `next` to use the CLI - on Windows App Service, we can’t do this, as Node applications on Windows run with `iisnode` in IIS, whereas with Linux App Service, applications run as containers.
To make this work on Windows, we need to define a `.js` entrypoint. This can be `server.js`, `index.js`, etc. In our case, this will be called `server.js`. The content in here will be what is generally defined in Next.js’s custom server documentation.
Create a `server.js` and add the following:
const { createServer } = require('http')
const { parse } = require('url')
const next = require('next')
const dev = process.env.NODE_ENV !== 'production'
const hostname = '0.0.0.0'
const port = process.env.PORT || 3000
// when using middleware `hostname` and `port` must be provided below
const app = next({ dev, hostname, port })
const handle = app.getRequestHandler()
app.prepare().then(() => {
createServer(async (req, res) => {
try {
// Be sure to pass `true` as the second argument to `url.parse`.
// This tells it to parse the query portion of the URL.
const parsedUrl = parse(req.url, true)
await handle(req, res, parsedUrl)
} catch (err) {
console.error('Error occurred handling', req.url, err)
res.statusCode = 500
res.end('internal server error')
}
})
.once('error', (err) => {
console.error(err)
process.exit(1)
})
.listen(port, () => {
console.log(`> Ready on http://${hostname}:${port}`)
})
})
The logic regarding `pathname` in Next.js’s custom server documentation example should be changed as fits your application. These changes can and should be tested locally first. At a minimum, you need to include `await handle(req, res, parsedUrl)` (above) - or else the application will never return an HTTP response, which will also cause the application to fail when starting up on App Service.
A few important points:

- `hostname` is changed to `0.0.0.0` since we need to listen on all addresses. You can set the value to `localhost` if you want to test this locally, but do not set this to `localhost` on App Service - the application will fail to receive requests since it’s only listening for local connections and not external ones.
- Do not hardcode the `port` variable. `PORT` on App Service Windows (for Node applications) is actually a named pipe - if you hardcode this, the application will fail to start. You can set `port` to something like `process.env.PORT || 3000` if wanting to test locally.

NOTE: If `NODE_ENV` is set to `development`, this will enable hot reloading. This will cause adverse issues when deployed. Ensure that `NODE_ENV` is set to `production`.
As in the example above, it’s imperative that the `port` variable is set to either `process.env.PORT` or `process.env.PORT || 3000` (or a port of your choosing for local development). Hardcoding the port will cause the application to not return an HTTP response to warmup pings and ultimately fail to start.
Since IIS (and iisnode) is used on App Service Windows with Node applications, we’ll need to bring a `web.config`. Sometimes, in an application deployment to App Service Windows, a `web.config` will be auto-generated. However, we need to ensure this targets our `server.js` entrypoint. To avoid the site failing immediately after deploying and having to change this file later, let’s add one now.
Add the following `web.config` to your project root:
<?xml version="1.0" encoding="utf-8"?>
<!--
This configuration file is required if iisnode is used to run node processes behind
IIS or IIS Express. For more information, visit:
https://github.com/tjanczuk/iisnode/blob/master/src/samples/configuration/web.config
-->
<configuration>
<system.webServer>
<!-- Visit http://blogs.msdn.com/b/windowsazure/archive/2013/11/14/introduction-to-websockets-on-windows-azure-web-sites.aspx for more information on WebSocket support -->
<webSocket enabled="false" />
<handlers>
<!-- Indicates that the server.js file is a node.js site to be handled by the iisnode module -->
<add name="iisnode" path="server.js" verb="*" modules="iisnode" />
</handlers>
<rewrite>
<rules>
<!-- Do not interfere with requests for node-inspector debugging -->
<rule name="NodeInspector" patternSyntax="ECMAScript" stopProcessing="true">
<match url="^server.js\/debug[\/]?" />
</rule>
<!-- First we consider whether the incoming URL matches a physical file in the /public folder -->
<rule name="StaticContent">
<action type="Rewrite" url="public{REQUEST_URI}" />
</rule>
<!-- All other URLs are mapped to the node.js site entry point -->
<rule name="DynamicContent">
<conditions>
<add input="{REQUEST_FILENAME}" matchType="IsFile" negate="True" />
</conditions>
<action type="Rewrite" url="server.js" />
</rule>
</rules>
</rewrite>
<!-- 'bin' directory has no special meaning in node.js and apps can be placed in it -->
<security>
<requestFiltering>
<hiddenSegments>
<remove segment="bin" />
</hiddenSegments>
</requestFiltering>
</security>
<!-- Make sure error responses are left untouched -->
<httpErrors existingResponse="PassThrough" />
<!--
You can control how Node is hosted within IIS using the following options:
* watchedFiles: semi-colon separated list of files that will be watched for changes to restart the server
* node_env: will be propagated to node as NODE_ENV environment variable
* debuggingEnabled - controls whether the built-in debugger is enabled
See https://github.com/tjanczuk/iisnode/blob/master/src/samples/configuration/web.config for a full list of options
-->
<!--<iisnode watchedFiles="web.config;*.js"/>-->
</system.webServer>
</configuration>
This should be placed relative to your `package.json`. Example:
|-- .next
|   |-- <other production build files>
|-- public
|   |-- <public assets>
|-- src
|   |-- <application .js files>
|-- .eslintrc.json
|-- jsconfig.json
|-- next.config.mjs
|-- package-lock.json
|-- package.json
|-- postcss.config.js
|-- README.md
|-- server.js
|-- tailwind.config.js
|-- web.config
Below are a few ways that we can deploy this application to Azure.
Note, if using the VSCode extension to deploy to App Service, the concept of including the build folder and using the custom deployment script also applies.
If deploying with Local Git, VSCode, or other methods that build against the Kudu site directly, we’ll want to ensure our `.next` folder gets deployed as well. This is in `.gitignore` by default, which is fine, since ideally we want to run `npm run build` during our deployment phase - recreating this folder on each deployment is the ideal method, in case there are application changes.
However, a caveat on App Service Windows with Node.js applications is that `npm run build` is not run - only `npm install` is. Therefore, we need to use a custom deployment script.
You can generate a custom deployment script with kuduscript. Run the following command in the root of your project:
$ kuduscript -y --node
Generating deployment script for node.js Web Site
Generated deployment script files
This will create a `.deployment` and a `deploy.cmd` file. Don’t edit the `.deployment` file - any changes we make will be in the `deploy.cmd` file. When deploying with Local Git with the `.deployment` file present, it will automatically be detected that we’re using a custom deployment script, and what we have here will be executed instead.
Use the following script in this repo - with some changes from the default script. Copy this into the `deploy.cmd` that was generated in your project and redeploy to the site. The only major difference between the generated script and the one in the example repository is the addition of `npm run build` in our custom deployment script. This is the key to getting production builds generated with Local Git and VSCode deployments.
To set up Local Git as a deployment option, follow these steps:
git add .
git commit -m "initial Commit"
git remote add azure https://<sitename>.scm.azurewebsites.net:443/<sitename>.git
remote: Updating branch 'master'.
remote: Updating submodules.
remote: Preparing deployment for commit id '5f50a51ad1'.
remote: Running custom deployment command...
remote: Running deployment command...
remote: Handling node.js deployment.
remote: Creating app_offline.htm
remote: KuduSync.NET from: 'C:\home\site\repository' to: 'C:\home\site\wwwroot'
remote: Copying file: 'package-lock.json'
remote: Copying file: 'package.json'
remote: Copying file: 'server.js'
remote: Deleting app_offline.htm
remote: Looking for app.js/server.js under site root.
remote: Using start-up script server.js
remote: The package.json file does not specify node.js engine version constraints.
remote: The node.js application will run with the default node.js version 20.9.0.
remote: Selected npm version 10.1.0
remote: Running npm install..
remote: ...........
remote:
remote: added 1 package, and audited 356 packages in 16s
remote:
remote: 129 packages are looking for funding
remote: run `npm fund` for details
remote:
remote: found 0 vulnerabilities
remote: Creating a production build, running npm run build..
remote:
remote: > azure-webapp-windows-node-nextjs-basic@0.1.0 build
remote: > next build
remote:
remote: .............
remote: ? Disabling SWC Minifer will not be an option in the next major version. Please report any issues you may be experiencing to https://github.com/vercel/next.js/issues
remote: ? Next.js 14.1.0
remote:
remote: Creating an optimized production build ...
remote: .................................................................................................................................................
remote: ? Compiled successfully
remote: Linting and checking validity of types ...
remote: ....................................
remote: Collecting page data ...
remote: .......
remote: Generating static pages (0/6) ...
remote:
remote: Generating static pages (1/6)
remote:
remote: Generating static pages (2/6)
remote:
remote: Generating static pages (4/6)
remote:
remote: ? Generating static pages (6/6)
remote: ...
remote: Finalizing page optimization ...
remote: Collecting build traces ...
remote: .........................................
remote:
remote: Route (app) Size First Load JS
remote: + ? / 5.19 kB 90 kB
remote: + ? /_not-found 901 B 85.7 kB
remote: + First Load JS shared by all 84.8 kB
remote: + chunks/69-1b6d135f94ac0e36.js 29.2 kB
remote: + chunks/fd9d1056-cc48c28d170fddc2.js 53.7 kB
remote: + other shared chunks (total) 1.89 kB
remote:
remote: Route (pages) Size First Load JS
remote: - ? /about (1205 ms) 271 B 79.4 kB
remote: + First Load JS shared by all 79.1 kB
remote: + chunks/framework-56343d6ce4928a14.js 45.2 kB
remote: + chunks/main-3574ac84065612ad.js 32 kB
remote: + other shared chunks (total) 1.87 kB
remote:
remote: ? (Static) prerendered as static content
remote:
remote: Finished successfully.
remote: Running post deployment command(s)...
remote: Triggering recycle (preview mode disabled).
remote: Deployment successful.
If you see a message stating `remote: Invalid start-up command "next start" in package.json. Please use the format "node <script relative path>".` - in Next.js’s case, this shouldn’t be fatal and is more of a warning. To correct this, set the `start` script in your `package.json` to `node server.js`.
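For example, the `scripts` section of `package.json` would look something like this - the other scripts shown are the typical defaults from `create-next-app`:

```json
{
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "node server.js",
    "lint": "next lint"
  }
}
```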
GitHub Actions workflow files are placed under `.github/workflows`. You can find more details about these steps documented here:
For Next deployments it is recommended to modify the default template with the following changes - this is due to the time it otherwise takes to copy files between deployment stages (eg., with `actions/upload-artifact@v2`). Without these changes, deployments may take anywhere from 15 minutes to well over an hour, since `node_modules` contains thousands of files, or more, depending on the project.
Below is an example of zipping the source code contents and moving it between stages. Under the hood, ZipDeploy is used to deploy the zip artifact to Kudu.
This uses the “Publish Profile” method:
name: Build and deploy Node.js app to Azure Web App - myapp
on:
push:
branches:
- main
workflow_dispatch:
jobs:
build:
runs-on: windows-latest
steps:
- uses: actions/checkout@v2
- name: Set up Node.js version
uses: actions/setup-node@v1
with:
node-version: '20.x'
- name: npm install, build, and test
run: |
npm install
npm run build --if-present
- name: Zip all files for upload between jobs
run: Compress-Archive -Path .\* -DestinationPath next.zip
- name: Upload artifact for deployment job
uses: actions/upload-artifact@v2
with:
name: node-app
path: next.zip
deploy:
runs-on: ubuntu-latest
needs: build
environment:
name: 'Production'
url: $
steps:
- name: Download artifact from build job
uses: actions/download-artifact@v2
with:
name: node-app
- name: 'Deploy to Azure Web App'
uses: azure/webapps-deploy@v2
id: deploy-to-webapp
with:
app-name: 'myapp'
slot-name: 'Production'
publish-profile: $
package: next.zip
- name: Delete zip file
run: rm next.zip
This uses a Service Principal for authentication - when creating a new workload, the portal now offers an option that will automatically create one for your deployment. Alternatively, your own Service Principal (with the correct scope and RBAC) can be used as a drop-in replacement:
name: Build and deploy Node.js app to Azure Web App - somesite
on:
push:
branches:
- main
workflow_dispatch:
jobs:
build:
runs-on: windows-latest
steps:
- uses: actions/checkout@v4
- name: Set up Node.js version
uses: actions/setup-node@v3
with:
node-version: '~20'
- name: npm install and build
run: |
npm install
npm run build --if-present
- name: Zip all files for upload between jobs
run: Compress-Archive -Path .\* -DestinationPath next.zip
- name: Upload artifact for deployment job
uses: actions/upload-artifact@v2
with:
name: node-app
path: next.zip
deploy:
runs-on: windows-latest
needs: build
environment:
name: 'Production'
url: $
permissions:
id-token: write # This is required for requesting the JWT
steps:
- name: Download artifact from build job
uses: actions/download-artifact@v3
with:
name: node-app
- name: Login to Azure
uses: azure/login@v1
with:
client-id: $
tenant-id: $
subscription-id: $
- name: 'Deploy to Azure Web App'
uses: azure/webapps-deploy@v2
id: deploy-to-webapp
with:
app-name: 'somesite'
slot-name: 'Production'
package: next.zip
NOTE: If you want to use `yarn`, you can drop in the `yarn` command where need be. This will be a part of the `actions/setup-node@v3` action.
You can use Azure Pipelines to build your Next application. For Next apps, you can use `npm` or `yarn` to install application dependencies and create a production build through the `.next` folder. You can review more details here: Implement JavaScript frameworks.
The below will be creating a pipeline through `.yaml`-based creation.
trigger:
- main
variables:
# Agent VM image name
vmImageName: 'windows-latest'
environmentName: 'appname'
stages:
- stage: Build
displayName: Build stage
jobs:
- job: Build
displayName: Build
pool:
vmImage: $(vmImageName)
steps:
- task: NodeTool@0
inputs:
versionSpec: '20.x'
displayName: 'Install Node.js'
- script: |
npm install
displayName: 'npm install'
- script: |
npm run build
displayName: 'npm run build'
- task: ArchiveFiles@2
inputs:
rootFolderOrFile: '$(System.DefaultWorkingDirectory)'
includeRootFolder: false
archiveType: 'zip'
archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
replaceExistingArchive: true
- task: PublishBuildArtifacts@1
inputs:
PathtoPublish: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
ArtifactName: 'drop'
- stage: Deploy
displayName: Deploy stage
dependsOn: Build
condition: succeeded()
jobs:
- deployment: Deploy
displayName: Deploy
environment: $(environmentName)
pool:
vmImage: $(vmImageName)
strategy:
runOnce:
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: 'subscriptionName(00000000-0000-0000-0000-000000000000)'
appType: 'webApp'
appName: 'appname'
package: '$(Pipeline.Workspace)/drop/$(Build.BuildId).zip'
This approach, like with GitHub Actions, will build for production in the pipeline and deploy the .next
folder required. Under the hood, ZipDeploy is used to deploy the zip artifact to Kudu.
Review this post for common iisnode-based issues - Troubleshooting Common iisnode Issues
A table of iisnode substatus codes can be found here - this can be used to track down what may be occurring. This usually indicates that node.exe
is crashing. A logging-errors.txt
file will be created (assuming that App Service Logs are enabled). If an uncaught exception is occurring, it will be logged into this file.
Common scenarios for this directly after a deployment may be:
- A package missing from package.json but referencing the missing package as an import in code
- An unexpected Node version (if WEBSITE_NODE_DEFAULT_VERSION is not set, this falls back to a v0.x version)
- You may have forgotten to deploy your web.config, or this is misconfigured. Review the web.config mentioned earlier in this article for comparison. Ensure this is also pointing to the correct .js entrypoint file in the production build.
Additionally, a web.config
in an incorrect location may cause this as well. Eg., placing this in a subfolder outside of the project root.
Currently, Next 14.x requires Node.js >= 18.17.0. You may see something like this if you’re running a lesser Node version:
remote: npm WARN EBADENGINE Unsupported engine {
remote: npm WARN EBADENGINE package: 'next@14.1.0',
remote: npm WARN EBADENGINE required: { node: '>=18.17.0' },
remote: npm WARN EBADENGINE current: { node: 'v18.12.1', npm: '8.19.2' }
remote: npm WARN EBADENGINE }
You are using Node.js 18.12.1. For Next.js, Node.js version >= v18.17.0 is required.
Some potential mitigations:
Set WEBSITE_NODE_DEFAULT_VERSION to ~20, which targets the latest running version of major 20.x - this is the latest running Node major on App Service Windows as of this blog post.

In Log Stream or home\LogFiles\Application\logging-errors.txt
(if App Service Logs are enabled) an error like the below may show:
Mon Feb 05 2024 18:25:25 GMT+0000 (Coordinated Universal Time): Application has thrown an uncaught exception and is terminated:
Error: ENOENT: no such file or directory, open 'C:\home\site\wwwroot\.next\BUILD_ID'
Just like earlier in this post - if npm run build or yarn build was never run, then .next was never generated, assuming this is still in your .gitignore. Ensure that the build command for your package manager is being run. For instance, if using Local Git or Visual Studio Code deployments - you need to use a custom deployment script as mentioned above.
If using CI/CD like GitHub Actions or Azure Pipelines, then the build command needs to be run on the pipeline and the .next folder contained in the zip being deployed to Kudu.
Next.js uses a Rust-based compiler for faster compilation times - however, this requires certain dependencies that may not be available on the machine. [SWC Failed to load | Next.js](https://nextjs.org/docs/messages/failed-loading-swc) gives a few workarounds, one regarding a C++ installation - this in particular cannot be altered with Windows App Service.
The full build error message will appear at build time and look like the below:
remote: ? Attempted to load @next/swc-win32-ia32-msvc, but an error occurred: A dynamic link library (DLL) initialization routine failed.
remote: \\?\C:\home\site\wwwroot\node_modules\@next\swc-win32-ia32-msvc\next-swc.win32-ia32-msvc.node
remote: ? Failed to load SWC binary for win32/ia32, see more info here: https://nextjs.org/docs/messages/failed-loading-swc
remote: Compiler server unexpectedly exited with code: 3221225477 and signal: null
remote: Failed exitCode=-1073741819, command="C:\Program Files (x86)\nodejs\20.9.0\node.exe" "C:\Program Files (x86)\npm\10.1.0\node_modules\npm\bin\npm-cli.js" run build
remote: An error has occurred during web site deployment
Additionally, Node on App Service Windows runs in 32-bit by default. Attempting the most relevant workarounds in the above link (using .babelrc and changing next.config.js) does not work as-is. For these workarounds to take effect, you need to switch to a 64-bit Node version - after which SWC can be opted out of. You can then either do:
Create a .babelrc in your project root with the following:
{
  "presets": ["next/babel"]
}
Or set swcMinify to false in next.config.js (or next.config.mjs):
const nextConfig = {
  swcMinify: false,
};

// in next.config.mjs, use: export default nextConfig;
module.exports = nextConfig;
NOTE: This property has the following mentioned: @deprecated — will be enabled by default and removed in Next.js 15
This currently happens when building against the Kudu site (local git, VSCode, etc.) - but may or may not happen with CI/CD agents - this completely depends on the machine and set up used. The same workarounds in that link can be attempted.
For more information, see the Next.js/Vercel GitHub thread on this: https://github.com/vercel/next.js/discussions/30468
If you’re deploying with NODE_ENV
set to production
and devDependencies
containing tailwind
(or other dependencies required during next build
) - then you may see this message. A workaround is to move the devDependencies
into your dependencies
section or to set NODE_ENV
during the build or deployment phase to development
.
Note, if you’re using a CI/CD deployment process but also have a .deployment
file and a custom deployment script, this will cause the deployment (eg., npm install
/npm run build) to be re-run on Kudu, which can have adverse side effects since it's a different environment than a CI/CD agent - this will also extend deployment time since it's essentially two builds occurring (one on the agent, one on Kudu). Ensure these files are removed or renamed prior to deployment.
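For example, the files could be renamed rather than deleted, so Kudu builds can be re-enabled later. The file names below are hypothetical - a custom deployment script may be named deploy.sh, deploy.cmd, or similar, depending on your setup:

```shell
# Hypothetical file names - renaming disables the Kudu-side build so the
# CI/CD-built zip is deployed as-is
mv .deployment .deployment.disabled
mv deploy.sh deploy.sh.disabled
```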
As opposed to Linux - where hardcoding the listening port for a Node application on App Service Linux results in an HTTP 502 and a container timeout on startup - doing this on Windows App Service will likely introduce an HTTP 500.1001 (or closely related substatus code). Follow the steps in the prerequisites portion of this post to avoid encountering this situation.
If the default hosting page still shows after a deployment, then likely one of the following is happening:
In either case, you can connect to your applications site contents with FTP or through the Kudu site to validate what exists under home\site\wwwroot
.
Wordpress on Linux refers to the offering that is found here - Create a WordPress site.
This uses an Alpine-based image with NGINX and PHP-FPM (currently PHP 8.x). NGINX is the web server used.
It is a common misconception that .htaccess
files can be used here. .htaccess
files cannot be used - as these are only relevant for Apache HTTPD servers.
Sometimes, it may be needed to change some of NGINX’s configuration through its .conf
files under certain circumstances. This post will explain some of the more common changes that can be done.
It is recommended that How to run Bash scripts in WordPress on Azure App Service (techcommunity) / How to run Bash scripts in WordPress on Linux App Services (GitHub) is read beforehand as well. This explains generally using startup scripts with this Wordpress image, which will differ slightly from other "Blessed" images. The startup script location is only under /home/dev/startup.sh
.
To start editing, you’ll want to access https://yoursite.scm.azurewebsites.net/webssh/host
to be able to copy the files described below. Alternatively, you can go to https://yoursite.scm.azurewebsites.net
and then select SSH on the top nav bar
NOTE: There is the option in the Azure Portal on the Web App for “SSH” as well, which will take you to the same spot
For common scenarios, you’re either going to change nginx.conf
or default.conf
. But there may be other files as well:
- /etc/nginx/nginx.conf
- /etc/nginx/conf.d/default.conf
- /etc/nginx/conf.d/spec-settings.conf - this is loaded in by nginx.conf for various NGINX and fastcgi settings

Depending on what's being done, copy these to /home/dev
with cp /etc/nginx/nginx.conf /home/dev
or cp /etc/nginx/conf.d/default.conf /home/dev
for easier editing.
Copy /etc/nginx/conf.d/default.conf to /home/dev. "Our" copy of the file will exist as /home/dev/default.conf.
Edit /home/dev/startup.sh to include the following:
#!/bin/bash
echo "Copying custom default.conf over to /etc/nginx/conf.d/default.conf"
cp /home/dev/default.conf /etc/nginx/conf.d/default.conf
nginx -s reload
Update default.conf with your intended change below.
Add a new server block as seen below. Ensure the old server block and all other directives also remain in your default.conf
. Essentially, you will have two server
blocks.
server {
server_name mysite.azurewebsites.net;
return 301 $scheme://www.google.com$request_uri;
}
# old/original server block / other directives - KEEP this
server {
listen 80;
....
Redirect non-www to www:
Add a new server block as seen below. Ensure the old server block and all other directives also remain in your default.conf
. Essentially, you will have two server
blocks.
server {
server_name mysite.azurewebsites.net;
return 301 $scheme://www.mysite.azurewebsites.net$request_uri;
}
# old/original server block / other directives - KEEP this
server {
listen 80;
....
Redirect www to non-www:
Add a new server block as seen below. Ensure the old server block and all other directives also remain in your default.conf
. Essentially, you will have two server
blocks.
server {
server_name www.mysite.azurewebsites.net;
return 301 $scheme://mysite.azurewebsites.net$request_uri;
}
# old/original server block / other directives - KEEP this
server {
listen 80;
....
Before diving into this, it's important to understand/remember that, by default, App Service front-ends do TLS termination. Therefore, all HTTPS requests go back as HTTP to the application container.
If you try to set NGINX to do a redirect back to HTTPS, this will get into a redirect loop with ERR_TOO_MANY_REDIRECTS because:
- the HTTPS request is TLS-terminated by the front-end, so NGINX sees plain HTTP and redirects back to https://yoursite.com - which then goes through the TLS termination process again, and repeats the whole process, which begins the loop

Since TLS termination is done, sites get full TLS/SSL benefit.
However, if this is wanted for some reason, the below would achieve it. Add a new server block as seen below. Ensure the old server block and all other directives also remain in your default.conf:
server {
server_name mysite.azurewebsites.net;
return 301 https://mysite.azurewebsites.net$request_uri;
}
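Note that the App Service front-ends set the X-Forwarded-Proto header with the original client protocol. So, as a sketch, a loop-free protocol check would key off that header rather than $scheme (keep the rest of your original server block directives as-is):

```nginx
server {
    listen 80;
    server_name mysite.azurewebsites.net;

    # $scheme is always "http" behind the TLS-terminating front-ends;
    # the original client protocol arrives in X-Forwarded-Proto
    if ($http_x_forwarded_proto = "http") {
        return 301 https://$host$request_uri;
    }

    # ...rest of the original server block directives
}
```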
You can add headers with the add_header
directive in the following format to your default.conf
server
block:
add_header X-custom-header "my custom header";
Your startup script would look like this:
#!/bin/bash
echo "Copying custom default.conf over to /etc/nginx/conf.d/default.conf"
cp /home/dev/default.conf /etc/nginx/conf.d/default.conf
nginx -s reload
Removing the Server header comes down to two general methods with NGINX:
- Loading an additional module such as nginx-mod-http-headers-more (used below)
- Editing the .c files where this header is set and then recompiling NGINX. This cannot be done on the Wordpress App Service image

Using the same approach above with a startup script, copy over an nginx.conf
from /etc/nginx/nginx.conf
to /home/dev
for us to alter.
Add in the following to the file:
...other directives
# Add this line
load_module /usr/lib/nginx/modules/ngx_http_headers_more_filter_module.so;
events {
worker_connections 10000;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
# Add this line
more_clear_headers 'Server';
...other directives
Make sure to keep the rest of the nginx.conf the same aside from the two lines with the comment "Add this line" above them - one loading the module by referencing the .so, and one adding the more_clear_headers directive itself.
Your /home/dev/startup.sh
file would now look like this:
#!/bin/bash
echo "Installing 'nginx-mod-http-headers-more'.."
apk add nginx-mod-http-headers-more
echo "Copying custom nginx.conf over to /etc/nginx/nginx.conf"
cp /home/dev/nginx.conf /etc/nginx/nginx.conf
nginx -s reload
Lastly, restart the site.
(Before)
(After - note the lack of the Server
header)
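To verify from a client, request just the response headers - the site name below is a placeholder:

```shell
# -s silent, -I fetch response headers only; the fallback message prints
# when no Server header is found in the output
curl -sI https://mysite.azurewebsites.net | grep -i "^server:" || echo "Server header not present"
```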
Most other relevant configuration directives can be found in /etc/nginx/conf.d/spec-settings.conf
. These are all loaded into nginx.conf
through the include /etc/nginx/conf.d/*.conf;
directive.
Some common directives that may be changed within this file are:
- client_max_body_size: set to 512MB in the current image
- client_header_buffer_size: set to 256K
- server_tokens: set to off
- add_header directives for common security headers

A lot of these typically do not need to be changed. But if needed for some reason, use the same startup script approach:
Copy spec-settings.conf from /etc/nginx/conf.d/spec-settings.conf to /home/dev, make your edits, and use a startup script like the below:
#!/bin/bash
echo "Copying custom spec-settings.conf over to /etc/nginx/conf.d/spec-settings.conf"
cp /home/dev/spec-settings.conf /etc/nginx/conf.d/spec-settings.conf
nginx -s reload
Blocking IPs:
You can use NGINX to block certain IPs by following How to enable IP access restrictions on wp-admin for the WordPress on App Service offering
You may notice that when using custom startup scripts on Wordpress and either looking for stdout from an operation or from explicit stdout generation like echo
- you will not see this appear in default_docker.log
.
This is because of supervisord usage and its configuration. To get a general idea of how this works - see Logging with supervisord on Web Apps for Containers
With Wordpress on App Service, the same concept applies with supervisord and log locations. Since startup scripts are run as a separate process (named post-startup-script
) - they can be found in their own log files under the /tmp
directory, which is the default for supervisord. This will either show as:
post-startup-script-stderr---supervisor-00000000.log
post-startup-script-stdout---supervisor-00000000.log
Below is an example of reviewing stdout from a custom startup script in one of these log files:
7fd6d23d988d:/tmp# cat post-startup-script-stdout---supervisor-yb_gtkno.log
This is being executed from /home/dev/startup.sh..
Furthermore, by reviewing default_docker.log
(the equivalent is also shown in /var/log/supervisor/supervisord.log
), you can confirm if a startup script is successfully executed by finding the below in logging:
2023-11-13 14:32:40,214 INFO spawned: 'post-startup-script' with pid 237
2023-11-13 14:32:40,306 WARN exited: post-startup-script (exit status 0; not expected)
Although supervisord is showing "not expected" - an exit status of 0 is successful, and what we want to see. An exit code greater than 0 is deemed unsuccessful.
Ultimately, this is all relevant to know in case the startup script is failing, due to something like invalid NGINX syntax in a .conf
file being overridden. If you notice that changes in the startup script are not applying, review the /tmp/post-startup-script-stderr
and /tmp/post-startup-script-stdout
files by opening an SSH session in the application container.
Note: This can happen for any editing of startup scripts on App Service Linux. This is not limited to just Wordpress.
If saving a startup.sh
(or similar file used for startup scripts) with the /newui
File Manager editor, and then trying to run that same bash script, the below error may appear - which can also be found in /tmp/post-startup-script-stderr---supervisor-xxxxxxxx.log
(see the Stdout/err through startup scripts above):
/home/dev/startup.sh: line 2: $'\r': command not found
This may not be visible in the File Manager, but if using vi, you can see extra ^M characters appended to the line endings. This is because the script was saved in the UI with Windows-style line endings (\r, carriage return):
#!/bin/bash^M
^M
echo "Copying custom default.conf over to /etc/nginx/conf.d/default.conf"^M
To resolve this, using vi
or another text editor, delete the bad endings and save the file.
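Alternatively, the carriage returns can be stripped in one command from an SSH session - dos2unix does the same, if installed:

```shell
# Remove the trailing \r from every line, in place
sed -i 's/\r$//' /home/dev/startup.sh
```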
Typically, to the user, this would firstly manifest as their startup script/changes done through the script not applying, since likely the script is failing to properly execute.
Sometimes, you may encounter an issue where an application is having issues connecting to a MySQL database - or certain queries, when done through application code, do not generate the result you expected, amongst other potential issues.
Through the SSH option on the App Service Linux Kudu site, you can connect to your container and use the below approach. For “Blessed Images”, SSH is already enabled.
For custom images, you may need to enable SSH - if so, review Enabling SSH on Linux Web App for Containers. If SSH cannot be enabled in a custom image, then what's covered in this blog post may not be possible for creating a connection from that particular application.
NOTE: Do not try to install these packages through the “Bash” option - Bash opens a shell in the Kudu container where you’re running as
kudu_ssh_user
(non-root) - therefore package installation will fail. This must be done in the application container (“SSH” option).
SQLCMD
is sometimes used on Windows applications for troubleshooting.
It is an executable available only in Windows App Service applications. Because of this, the closest implementation is installing the relevant Linux MySQL client for your distribution. This concept is essentially the same outside of Azure in most *NIX or containerized environments.
First, it may be good to test general connectivity and name resolution.
Name resolution:
You can use the nslookup
command. Run nslookup [yourmysqlhost]
. If this is an Azure Database for MySQL server, run nslookup somemysqlserver.mysql.database.azure.com
, for example.
If this is not installed in the container, run the following:
apt-get install dnsutils
apk add bind-tools
tdnf install bind-utils
yum install bind-utils
If name resolution fails, then the subsequent commands and steps below will not work - this will need to be addressed first.
Connections:
We can test whether a connection to the MySQL server can be established from us, the client, by using the nc
(netcat) command.
If this is not installed in the container, run the following:
apt-get install netcat
(If you see E: Package 'netcat' has no installation candidate, use apt-get install netcat-traditional instead)
apk add netcat-openbsd
tdnf install nc
yum install nc
This example is run within a Debian-based container. We're running the nc
command nc -vzn [mysql_ip] [mysql_port]
which confirms that a connection to port 3306 for our MySQL server can be established. If this fails, review if traffic is allowed to the destination (eg., firewall on the server, UDR/RTs, Virtual Appliances, etc.). You should see something like the below.
To install the MySQL client, run the following depending on your package manager and distribution:
apt-get install default-mysql-client
apk add mysql mysql-client
tdnf install mysql
yum install mysql
If you’re running a Ubuntu/Debian installation and get E: Package 'mysql-client' has no installation candidate
, make sure that default-mysql-client
is being installed as the package.
With the mysql
client now installed, run the command mysql -u [mysql_user] -h [mysql_host_fqdn] -p
. If this can successfully connect, you’d see something like the below:
You can now review the databases, tables, execute queries and other information since you’re now connected to the target MySQL server:
If you’re unable to connect - the error message should be written to the console, which can be troubleshot further depending on the message itself.
ContainerBackOff events and container exit events
This post talks about ContainerBackOff
events and container exit events - which may show up as Error
or ContainerTerminated
in the Reason
column of the ContainerAppSystemLogs_CL
/ ContainerAppSystemLogs
tables for Log Analytics or Azure Monitor, respectively.
Back off events:
ContainerBackOff
or messages of Persistent Failiure to start container
is essentially the same as backoff-restart events in Kubernetes, where the container is continuously failing to be started in a pod or replica.
Container exits:
Container exits are what you may imagine - the container was able to start in a pod or replica, but at some point after, it exited - typically with a status code that's greater than 0 (indicating failure).
This will also appear in the ContainerAppSystemLogs_CL
/ ContainerAppSystemLogs
tables - typically with a Reason
of ContainerTerminated
or Error
. The message itself may look like Container 'my-container' was terminated with exit code '137'
Note that you generally may see both Persistent Failiure to start container
and Container 'my-container' was terminated with exit code 'some_exit_code'
together.
Although, these two may not be mutually exclusive - if, for instance, the container exited, a new pod was created, and the next attempt at container startup in the new pod was successful, then you'd only see Container 'my-container' was terminated with exit code 'some_exit_code'
.
If in the new pod/replica, the container continued to be unsuccessful with starting - then you’d see both messages.
Most of the time, these messages may be application related. The reason for failure completely depends on the application logic and configuration required for it.
Since that is the case, and reasons can be almost limitless, it is important to always review Log Stream and ContainerAppConsoleLogs_CL
/ ContainerAppConsoleLogs
tables (depending if Azure Monitor or Log Analytics is used). Assuming that the application is writing to stderr
, normally, some indication of failure would be in here.
Below is an example, where ContainerAppSystemLogs_CL
mostly shows Persistent Failiure to start container
. When we look in ContainerAppConsoleLogs_CL
, we see the reason why:
(ContainerAppSystemLogs_CL)
(ContainerAppConsoleLogs_CL)
In this example, this was an exception thrown by raise Exception() in this application's app.py, acting as the entrypoint. Given the location it's called from, this causes the application to exit at every startup attempt.
Other examples may generally include:
Container exits:
An application exiting with a specific exit code may look like something below. You want to take the same approach of investigating application logs. If for some reason this is not being written to stdout/stderr - consider enabling more verbose logging while attempting to reproduce the issue. Otherwise, tracking down the issue may be tougher.
Below is a table that can generally be referenced for exit codes:
Code | Description |
---|---|
0 | An exit code 0 can signify successful completion of the task executed within code. This may not necessarily be bad. If an application is exiting with an exit code of 0, review if this is expected for the application as this is likely being set within code or a referenced library. A call to exit() may cause the container to exit regardless of being successful or not. |
1 | An exit code 1 can signify an application error caused the container to exit. This can be any fatal runtime error, or an invalid reference, such as a file being referenced that does not exist within the container. This is a generic catch-all application exit error code. |
2 | An exit code 2 can indicate a missing keyword, command, or invalid syntax in an invoked shell or script. This can indicate a permissions issue as well. |
100 | An exit code 100 may be common with MongoDB (mongod). This may be due to an unhandled exception. Review if this container is a MongoDB container. |
125 | An exit code 125 can point to an issue with the run command. Such as an undefined flag, an issue between the container runtime engine and the OS, or the user in the defined Docker Image does not have sufficient permissions on the machine. |
126 | An exit code 126 may mean a command used in the container cannot be invoked. Possibly due to syntax issues or an invalid/missing dependency of the command |
127 | An exit code 127 means the command invoked refers to a non-existent or non-accessible file or directory. |
128 | An exit code 128 is an invalid argument to exit. The allowed range is only whole integers between 0-255. |
134 | An exit code 134 means the container abnormally terminated itself, closed the process and flushed open streams. A library or specific process may have called SIGABRT . |
137 | An exit code 137 means the container received a SIGKILL signal from the OS. This is forceful termination. This can happen in k8s OOM (Out of memory) scenarios, resource contention issues on the node, or other various k8s-specific scenarios. This can also occur if a SIGTERM was sent but the container did not shutdown after 30 or more seconds. OOM is not mutually exclusive to a 137 code |
139 | An exit code 139 indicates a SIGSEGV, or Segmentation Fault. This can happen due to code issues, issues between executables in the application and shared libraries/object files (.so files), or incompatibility with libraries and the OS. |
143 | An exit code 143 indicates a SIGTERM , or graceful shutdown. This can happen due to Kubernetes terminating the pod, such as due to inactivity with minReplicas set to 0, or if there is node movement occurring and the pods need to be shut down. |
255 | An exit code 255 indicates an Exit Status Out Of Range. The container (application) entrypoint stopped and returned that status. Investigation through logs will be needed to see why the container entrypoint exited with this status code. |
It is possible to have other exit codes that are not on this list. The closest to “standardized” exit codes is what’s defined here - tldp.org - exit codes. Otherwise, a developer could use their own meaning for exit codes. Always review application logs in this case.
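As a quick illustration, a shell reports a child process's exit status the same way the platform logs a container's:

```shell
# A process exiting with a non-zero status; $? holds the reported code
sh -c 'exit 137'
echo "exit status: $?"   # prints: exit status: 137
```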
The most common exit code that may have been caused by the platform is exit code 137. As mentioned above, this can happen if SIGTERM was sent to the container but it did not shut down within the period defined by "Termination Grace Period" - in which case, a 137 / SIGKILL is sent.
Aside from this, vCPU or memory constraints being hit on the node the pod or replica is running on will kill the container with a 137
code and also may show the below:
Maximum Allowed Cores exceeded for the Managed Environment. Please check https://learn.microsoft.com/en-us/azure/container-apps/quotas for resource limits
This is resource contention related, which won’t be covered in this article.
In other circumstances, a 143 exit code may also show in scenarios where a pod or replicas are scaled down (most likely back to 0).
Note: This is regarding the Wordpress on Linux “marketplace” image and not about using a PHP 8.x “Blessed image” with a Wordpress installation. These are two different products.
Wordpress on App Service Linux Docker Images utilize php-fpm
, which can have various settings be changed if needed. The current settings are set to generous defaults, notably the following which are some of the more popular ones:
pm.max_children = 50
pm.start_servers = 20
pm.min_spare_servers = 5
pm.max_spare_servers = 35
A typical reason for wanting to change some of these is if [pool www] server reached pm.max_children setting (50), consider raising it is seen. There are usually other root causes for this message beyond simply hitting the maximum number of php-fpm child processes - but those are outside the scope of this post.
The settings talked about here are found in zz-docker.conf
under /usr/local/etc/php-fpm.d/zz-docker.conf
. The default file contains the following currently:
[global]
daemonize = no
[www]
;listen = 9000
listen = /var/run/php/php-fpm.sock
listen.owner = nginx
listen.group = nginx
listen.mode = 0660
pm = dynamic
pm.max_children = 50
pm.start_servers = 20
pm.min_spare_servers = 5
pm.max_spare_servers = 35
Although www.conf
has the same settings, changing this (even with reloading php-fpm
and nginx
) will not have php-fpm
pick up these changes. This can be confirmed with the php-fpm -tt
CLI command or, depending on what was changed (like pm.start_servers
), you can use top
or ps
.
Rationale on why this file exists and may be used over www.conf
can be found in this PHP-FPM GitHub thread - Documentation regarding configuration. - Issue #241
Copy the zz-docker.conf file from /usr/local/etc/php-fpm.d/zz-docker.conf to /home/dev/.
Edit zz-docker.conf. As an example, we'll change pm.max_children to 40 and pm.start_servers to 10:
.... other settings
pm = dynamic
pm.max_children = 40
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 35
... other settings
Follow How to run Bash scripts in WordPress on Azure App Service (techcommunity) / How to run Bash scripts in WordPress on Linux App Services (GitHub) regarding the startup script location under /home/dev/startup.sh
Add the following to your /home/dev/startup.sh
file:
#!/bin/bash
cp /home/dev/zz-docker.conf /usr/local/etc/php-fpm.d/
echo "Copied /home/dev/zz-docker to /usr/local/etc/php-fpm.d/"
You can confirm this was executed by running cat /tmp/post-startup-script-stdout---supervisor-xxxxxxxx.log
NOTE: If there are any errors with startup script execution, they'll be logged in
/tmp/post-startup-script-stderr---supervisor-xxxxxxxx.log
Run php-fpm -tt - below is a truncated output example where we can see our pm.max_children and pm.start_servers changes:
[18-Jan-2024 18:46:49] NOTICE: pm = dynamic
[18-Jan-2024 18:46:49] NOTICE: pm.max_children = 40
[18-Jan-2024 18:46:49] NOTICE: pm.start_servers = 10
[18-Jan-2024 18:46:49] NOTICE: pm.min_spare_servers = 5
[18-Jan-2024 18:46:49] NOTICE: pm.max_spare_servers = 35
[18-Jan-2024 18:46:49] NOTICE: pm.max_spawn_rate = 32
[18-Jan-2024 18:46:49] NOTICE: pm.process_idle_timeout = 10
[18-Jan-2024 18:46:49] NOTICE: pm.max_requests = 500
[18-Jan-2024 18:46:49] NOTICE: pm.status_path = undefined
[18-Jan-2024 18:46:49] NOTICE: pm.status_listen = undefined
Use top to validate this. Since pm.start_servers is a change you can easily see visually by the number of php-fpm child processes at “rest”, we can count the ones we see:
263 184 nginx SN 594m 19% 0 0% php-fpm: pool www
266 184 nginx SN 594m 19% 0 0% php-fpm: pool www
265 184 nginx SN 594m 19% 0 0% php-fpm: pool www
184 182 root SN 591m 18% 0 0% php-fpm: master process (/usr/local/etc/php-fpm.conf)
271 184 nginx SN 591m 18% 0 0% php-fpm: pool www
264 184 nginx SN 591m 18% 0 0% php-fpm: pool www
272 184 nginx SN 591m 18% 0 0% php-fpm: pool www
270 184 nginx SN 591m 18% 0 0% php-fpm: pool www
267 184 nginx SN 591m 18% 0 0% php-fpm: pool www
268 184 nginx SN 591m 18% 0 0% php-fpm: pool www
269 184 nginx SN 591m 18% 0 0% php-fpm: pool www
We can confirm there are now 10 php-fpm child processes.
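Counting them directly can be quicker than eyeballing top - the bracketed first letter keeps the grep process itself out of the match:

```shell
# Count php-fpm pool worker processes
ps aux | grep -c "[p]hp-fpm: pool www"
```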
Knowing the above, other settings can be changed as needed in the zz-docker.conf
file, although in most cases, this does not need to be done.
The steps below are derived from the Oracle documentation found here: Install the free Oracle Instant Client ‘Basic’ ZIP file
cd /home/site
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basic-linuxx64.zip
unzip instantclient-basic-linuxx64.zip
ls -la
For Linux Apps, we need to utilize a startup command. The Oracle instant client has a dependency, libaio1, which will need to be installed on startup.
```bash
#!/bin/bash
# Required dependency for the Instant Client.
# -y keeps the install non-interactive so startup does not hang.
apt-get update && apt-get install -y libaio1
# Start your application. This will depend on your application stack.
npm start
```
For the application to recognize the client, we need to add the app setting `LD_LIBRARY_PATH`, where the value is the directory of the instant client. In my case this is `/home/site/instantclient_21_11` - the exact value will depend on the directory name mentioned in the previous step.
Add a new app setting from Configuration > Application settings > + New application setting
Your Linux App Service is now configured and you should be able to successfully connect to your Oracle Database.
The Oracle client comes as either x86 or x64. Windows App Services can be 32-bit or 64-bit depending on the application stack and configuration.
Windows App Services are 32-bit by default. However, Java apps can only be 64-bit.
In this example, we’re utilizing x64 because node-oracledb does not come with a prebuilt x86 version. I will include configuration instructions for x86, but not for building the x86 node module from source.
You can check which version your Windows App Service is using from the debug console.
```
echo %PROCESSOR_ARCHITECTURE%
```
You can switch between x64 and x86 here: Configuration > General Settings > Platform Settings > Platform.
The process here is similar to Linux, but we use curl instead of wget. You can also manually download a specific version then upload to App Service using FTP or the drag and drop feature on the Kudu site.
x86

```
cd /home/site
curl https://download.oracle.com/otn_software/nt/instantclient/instantclient-basic-nt.zip --output client.zip
unzip client.zip
```
x64

```
cd /home/site
curl https://download.oracle.com/otn_software/nt/instantclient/instantclient-basic-windows.zip --output client.zip
unzip client.zip
```
For your application to find the Oracle client, we need to add the instant client to the PATH variable.
```xml
<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.webServer>
    <runtime xdt:Transform="InsertIfMissing">
      <environmentVariables xdt:Transform="InsertIfMissing">
        <!-- Note the ";" separating the existing PATH from the new entry -->
        <add name="PATH" value="%PATH%;/home/site/instantclient_21_9/"
             xdt:Locator="Match(name)" xdt:Transform="InsertIfMissing" />
      </environmentVariables>
    </runtime>
  </system.webServer>
</configuration>
```
“Command override” refers to what is seen in the portal (below), or, for example, what the `--command` parameter in the `az containerapp update` command sets.
This is the same concept as overriding a container's `ENTRYPOINT` or `CMD` with a custom command - which can be done in most environments that can run a container. For Kubernetes-specific documentation, refer to Kubernetes - Containers - Define command arguments for a container
An example of this locally (non-Kubernetes) would be `docker run -d -p 8080:8080 somecontainer node server.js` - where `node server.js` is the command passed in to override the container start up.
When using a command override for Container Apps (or in general for a container), a few things need to be validated:
- The file or executable being invoked actually exists in the image
- The executable is available on `$PATH`, or is referenced by its full path
- Multiple arguments are comma separated (each as its own element), rather than passed as one quoted string
A common issue when trying to use this option is encountering `OCI runtime create failed` errors due to the above reasons (and others). Below are 2 examples:

```
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "java -jar probes-0.0.1-SNAPSHOT.jar": stat java -jar probes-0.0.1-SNAPSHOT.jar: no such file or directory: unknown
```

```
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "dbt build --select stage+": executable file not found in $PATH: unknown
```
The command following `exec` (above) is the command that was used in the attempt to start the container.
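To pull that attempted command back out of a logged error line, a small `sed` sketch (the error string here is the `dbt` example from above):

```shell
# Extract the quoted command that the runtime tried to exec.
err='Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "dbt build --select stage+": executable file not found in $PATH: unknown'

# Capture whatever sits between the quotes after 'exec: '
echo "$err" | sed -n 's/.*exec: "\([^"]*\)".*/\1/p'
# prints: dbt build --select stage+
```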
These can be found in the `ContainerAppSystemLogs` / `ContainerAppSystemLogs_CL` table (depending on if you're using Azure Monitor or Log Analytics). However, note the logging differences between Consumption-only and Dedicated environments:

```
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: ... [rest of message]
```

```
Container 'oci-runtime-test' was terminated with exit code '' and reason 'ContainerCreateFailure'
```
As seen above, dedicated environments will show a "reason" of `ContainerCreateFailure` in the `Log_s` column. Consumption-only environments will output the full message that you would typically see in `stderr`.
If `OCI runtime create failed` or `ContainerCreateFailure` is being seen when using "command override", validate the following:
- If the file referenced does not exist, you will see `no such file or directory: unknown` in the OCI error
- If the executable is not on `$PATH`, you may see `executable file not found in $PATH: unknown`. You will either need to reference the executable by its full path or `export` it first to make it available - e.g. `python, /some/path/python.py`
- You may also see `no such file or directory: unknown` for commands not comma separated

For further information, see the blog - Container Apps - Troubleshooting ‘ContainerCreateFailure’ and ‘OCI runtime create failed’ issues
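Why the separation matters can be sketched locally with `command -v`, which performs a lookup loosely analogous to the runtime's `exec` (a rough analogy for illustration, not the actual runc code path):

```shell
# Passed as ONE argument, the runtime looks for an executable whose
# literal name is "java -jar app.jar" - which will never exist on $PATH.
if ! command -v "java -jar app.jar" > /dev/null 2>&1; then
  echo "not found: 'java -jar app.jar' treated as a single executable name"
fi

# Split into separate elements (comma separated in the portal:
# java, -jar, app.jar), only "java" itself is looked up on $PATH.
```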
If `backoff-restart` or container exits are seen after using Command Override - this likely indicates that the command being invoked was successful to a degree, but an application or runtime issue is causing it to fail to successfully start, or to terminate post-start. In these instances, reviewing Log Stream or the `ContainerAppConsoleLogs` / `ContainerAppConsoleLogs_CL` table is important, since the reason for failure depends on the application.
This post will mostly cover what is in Troubleshooting OCI runtime create errors; however, this will be more Container App specific.
Note: This post can also potentially apply to any environment a (Linux-based) container can run in, not just Container Apps.
Preface: Some of these errors can and will happen if attempting to run locally. It is always good practice to make sure you can actually run your image as a container on your local computer or development environment first. Some of the issues covered below will require the image to be rebuilt and/or code to be properly changed.
In short, seeing this error means that container creation failed. There is typically more information in this message, and we'll cover some scenarios below. Some of the messages may be cryptic while others are more straightforward. This error can happen for a myriad of reasons.
What is `runc`? - runc is a CLI tool for spawning and running containers on Linux according to the OCI specification. The `runc create failed` message is returned here - create.go - when trying to create the container.
On Container Apps (or in any other environment that can run a Linux-based container), this means we have successfully created a pod or replica and successfully pulled an image, but have failed on the container `create` part of a pod/replica lifecycle.
Essentially, the error is happening in the "create" process of the container. Hitting this means we never successfully get to attempt to start a container within a pod or replica.
The full error message may look like this (on consumption-only environments):

```
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/local/bin/appp": stat /usr/local/bin/appp: no such file or directory: unknown
```

Note: The message above may vary slightly depending on the reason for failure.
Some of these errors can also be reproduced in local environments - some other errors may be due to additional configuration or other environment factors.
When running applications on consumption-only environments vs. dedicated environments, you may notice that the error signature varies between environments when reviewing logging in the `ContainerAppSystemLogs` / `ContainerAppSystemLogs_CL` tables:

(consumption-only):
```
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: ... [rest of message]
```

(dedicated):
```
Container 'oci-runtime-test' was terminated with exit code '' and reason 'ContainerCreateFailure'
```
As seen above, dedicated environments will show a "reason" of `ContainerCreateFailure` in the `Log_s` column. Consumption-only environments will output the full message that you would typically see in `stderr`.
In short, it's good to note that `ContainerCreateFailure` on dedicated environments is referring to `OCI runtime create failed` issues.
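A sketch of a query against these tables - the Log Analytics custom-table flavor is shown; the table and column names (`ContainerAppSystemLogs_CL`, `Log_s`, `ContainerAppName_s`) are assumed from the default schema described above, and `TimeGenerated` is the standard Log Analytics timestamp column:

```kusto
ContainerAppSystemLogs_CL
| where Log_s has "ContainerCreateFailure" or Log_s has "OCI runtime create failed"
| project TimeGenerated, ContainerAppName_s, Log_s
| order by TimeGenerated desc
```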
```
OCI runtime create failed: runc create failed: unable to start container process: exec: "/path/to/somefile.sh": stat /path/to/somefile.sh: no such file or directory: unknown
```

This typically means the file referenced in the `Dockerfile`'s `ENTRYPOINT` instruction does not exist at the path specified, has the wrong name (or a typo), or does not exist in the image. For example, take the below `Dockerfile`, where the working directory is `/app` and we're copying over our `init_container.sh` file as well:

```
WORKDIR /app
.. other logic
COPY . ./
RUN chmod +x /app/init_container.sh
.. other logic
EXPOSE 8080
ENTRYPOINT [ "/app/1/init_container.sh" ]
```
The path the `ENTRYPOINT` is referencing for `init_container.sh` does not exist - the file was actually copied to `/app/init_container.sh`, so pointing the `ENTRYPOINT` at that path resolves this.

```
OCI runtime create failed: runc create failed: unable to start container process: exec: "[some executable]": executable file not found in $PATH: unknown.
```

This typically means the executable referenced in the `ENTRYPOINT` or `CMD` is not available to be invoked. The executable (if called by name) may also not be on `$PATH`. Review if the executable exists in the image or at the proper filesystem path specified.
```
OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/some/entrypoint.sh" permission denied: unknown
```

This can depend on your `Dockerfile` and its instructions - but it can most likely be attributed to your container entrypoint not having executable permissions. Add a `RUN chmod +x /some/entrypoint.sh` instruction, and rebuild the image.
instruction, and rebuild the image.OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting \"/some/path" to rootfs at \"/somepath\": mkdir /mnt/to/somepath: no space left on device: unknown"}
no space left on device
could potentially occur if a volume to a file share is mounted in which its quota has been hitOCI runtime create failed: container_linux.go:380: starting container process caused: setup user: cannot set uid to unmapped user in user namespace: unknown
This can be caused by a `uid` being set for a file or user that is outside the allowed range of the host. More on this can be read here - Docker User Namespace remapping issues.
```
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/35ed546a-ff19-40ce-be79-76c8fe61d0a9/volumes/kubernetes.io~csi/azurefiless/mount" to rootfs at "/some/path/to/file.js": mount /var/lib/kubelet/pods/35ed546a-ff19-40ce-be79-76c8fe61d0a9/volumes/kubernetes.io~csi/azurefiless/mount:/some/path/to/file.js (via /proc/self/fd/6), flags: 0x5000: not a directory: unknown
```
Other errors may also be related to specific directories within the container, like `/proc`, for instance - mounting volumes over these should be avoided, as it can affect the integrity of the container:

```
OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting ...." cannot be mounted because it is not of type proc: unknown"
```
“Command override” refers to what is seen in the portal (below), or, for example, what the `--command` parameter in the `az containerapp update` command sets.
Using "command override" is the same concept as overriding a container's `ENTRYPOINT` or `CMD` with a custom command - which can be done in most environments that can run a container. For Kubernetes-specific documentation, refer to Kubernetes - Containers - Define command arguments for a container
When using this option, it is possible to cause the above issues regarding failed container creation because of invalid input. Ensure that the file or executable being invoked exists in the image and is available on `$PATH` (or is referenced by its full path), and that multiple arguments are comma separated.