Troubleshooting / Containers Not Running Properly

1. Containers Continuously Restarting in Docker Stack Environment

This issue is usually caused by configuration errors, firewall rules, or whitelist settings that block access.

Specific manifestations include:

The page cannot be opened in a browser
Running sudo docker ps -a shows that the container keeps restarting
Running curl http://localhost:8088 on the deployment server returns the error curl: (7) Failed to connect to localhost port 8088: Connection refused
Error stack traces are continuously written to the log files
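
The symptoms above can be checked from a shell. The following is a minimal sketch, assuming a POSIX shell with awk available; the helper name list_restarting is hypothetical and not part of DataFlux Func.

```shell
#!/bin/sh
# Hypothetical helper: print the names of containers stuck in a restart loop.
# Reads "NAME<TAB>STATUS" lines on stdin, as produced by:
#   sudo docker ps -a --format '{{.Names}}\t{{.Status}}'
list_restarting() {
  awk -F '\t' '$2 ~ /^Restarting/ { print $1 }'
}

# Typical usage on the deployment server (not executed here):
#   sudo docker ps -a --format '{{.Names}}\t{{.Status}}' | list_restarting
#   curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:8088
#   sudo docker logs --tail 100 {Server Container ID}
```

Any name printed by list_restarting is a container whose logs are worth reading first.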

Possible causes and solutions:

Possible Causes	Solutions
Manual configuration changes with errors	Check the modified configuration files, verify YAML syntax, database connection information, etc.
External server specified in configuration but network is unreachable	Check firewall, Alibaba Cloud security group settings, database whitelist configurations, etc.
Compatibility issues with the operating system	See "Compatibility Issues with the Operating System" below
Redis does not support the current system's page size	See "Redis Does Not Support the Current System's Page Size" below
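
To test whether an external server named in the configuration is reachable from the deployment server, a quick TCP probe can help. This is a sketch only: check_tcp is a hypothetical helper, it assumes bash (for its /dev/tcp feature) and the coreutils timeout command are installed, and the host and port in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if a TCP connection to host ($1) and port ($2)
# succeeds within 3 seconds, non-zero otherwise.
# Assumes bash (for /dev/tcp) and the coreutils `timeout` command.
check_tcp() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (not executed here; host and port are placeholders):
#   check_tcp db.example.internal 3306 && echo reachable || echo unreachable
```

A failed probe points toward the firewall, security group, or whitelist rather than toward the DataFlux Func configuration itself.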

Compatibility Issues with the Operating System

If docker logs {Server Container ID} shows the following or a similar error:

node[9]: ../src/node_platform.cc:61:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
0xb57f90 node::Abort() [node]
0xb5800e  [node]
0xbc915e  [node]
0xbc9230 node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
0xb1b3d1 node::InitializeOncePerProcess(int, char**, node::InitializationSettingsFlags, node::ProcessFlags::Flags) [node]
0xb1bc89 node::Start(int, char**) [node]
0x7f2ca389fd90  [/lib/x86_64-linux-gnu/libc.so.6]
0x7f2ca389fe40 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
0xa93f0e _start [node]
Aborted (core dumped)

This may be caused by an incompatibility between Docker and the current operating system or its components. For example, DataFlux Func 2.x ships with Docker 20.10.8, which may have problems on the latest OS versions.
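
To see whether the installed Docker is at or below the version named above, the versions can be compared in a shell. The sketch below assumes GNU sort (for the -V option); version_le and check_docker_version are hypothetical helpers, and a newer Docker version does not by itself guarantee that the problem is resolved.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is less than or equal to version $2.
# Assumes GNU `sort -V` (version sort).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: report whether the given Docker version is newer than
# 20.10.8, the version this page names as shipped with DataFlux Func 2.x.
check_docker_version() {
  if version_le "$1" "20.10.8"; then
    echo "Docker $1: consider upgrading"
  else
    echo "Docker $1: newer than 20.10.8"
  fi
}

# Usage (not executed here):
#   check_docker_version "$(sudo docker version --format '{{.Server.Version}}')"
```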

Possible solutions:

Upgrade DataFlux Func, upgrade Docker, or update the OS components, as described below.

Upgrade DataFlux Func to the latest version. During the upgrade, allow the installation script to upgrade Docker as well. For details, see Deployment and Maintenance / Upgrade and Restart / Upgrade System.

If the latest version of DataFlux Func still shows the error above, download the official Docker binary package and upgrade Docker manually.

Download from the official Docker site: https://download.docker.com/linux/static/stable/

Or from Alibaba Cloud mirror site: https://mirrors.aliyun.com/docker-ce/linux/static/stable/
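
A manual upgrade from the static package might look like the sketch below. Everything in it is an assumption to verify before use: docker_static_url is a hypothetical helper, the "<arch>/docker-<version>.tgz" layout should be confirmed by browsing the mirror, VERSION is a placeholder you must choose, and the commented steps assume Docker runs as a systemd service with its binaries in /usr/bin.

```shell
#!/bin/sh
# Hypothetical helper: build the download URL of a static Docker package.
# $1 = mirror base URL, $2 = CPU architecture, $3 = Docker version.
# The "<arch>/docker-<version>.tgz" layout is an assumption; confirm it on the mirror.
docker_static_url() {
  printf '%s/%s/docker-%s.tgz\n' "${1%/}" "$2" "$3"
}

# Usage (not executed here; VERSION is a placeholder):
#   url=$(docker_static_url https://download.docker.com/linux/static/stable/ "$(uname -m)" "$VERSION")
#   curl -fLO "$url"
#   tar xzf "docker-$VERSION.tgz"     # unpacks into ./docker/
#   sudo systemctl stop docker        # assumes a systemd-managed Docker service
#   sudo cp docker/* /usr/bin/        # assumes the existing binaries live in /usr/bin
#   sudo systemctl start docker
```

Back up the existing binaries before overwriting them so the change can be rolled back.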

For example, on Ubuntu, use the following commands to upgrade OS components:

sudo apt update
sudo apt upgrade
sudo apt dist-upgrade

Redis Does Not Support the Current System's Page Size

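The official Redis image may fail to start on some ARM-based operating systems with the error <jemalloc>: Unsupported system page size. This error generally means that the kernel's memory page size is larger than the page size that the jemalloc allocator bundled with Redis was built for.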
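
The system page size can be read with getconf. The sketch below is illustrative only: page_size_status is a hypothetical helper, and the 4096-byte threshold is an assumption about how the jemalloc inside the image was built, not a documented value.

```shell
#!/bin/sh
# Hypothetical helper: classify a page size given in bytes.
# Assumption: the jemalloc build inside the Redis image expects 4096-byte pages.
page_size_status() {
  if [ "$1" -gt 4096 ]; then
    echo "page size $1 bytes: larger than 4096, Redis (jemalloc) may fail to start"
  else
    echo "page size $1 bytes: ok"
  fi
}

# Usage (not executed here):
#   page_size_status "$(getconf PAGESIZE)"
```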

2. Containers Missing in Docker Stack Environment

This issue is usually caused by an incorrectly set up runtime environment.

Specific manifestations include:

Executing sudo docker stack ls shows dataflux-func
Executing sudo docker ps -a does not show the corresponding container
Executing sudo docker stack ps dataflux-func --no-trunc reveals abnormal container status

Possible causes and solutions:

Possible Causes	Solutions
Snap version of Docker installed on the system	Uninstall the Snap version of Docker, then reinstall Docker from the official sources or use the Docker bundled with the installation script
Other causes	Investigate based on the `ERROR` column in the output of `sudo docker stack ps dataflux-func --no-trunc`

A typical example is no space left on device, indicating insufficient disk space.
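
Both causes in the table can be checked quickly from a shell. In this sketch, is_snap_path is a hypothetical helper that relies on the assumption that Snap exposes its commands under /snap or /var/lib/snapd/snap; the df path assumes Docker's default data directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given executable path belongs to a Snap
# installation. Assumes Snap commands live under /snap or /var/lib/snapd/snap.
is_snap_path() {
  case "$1" in
    /snap/*|/var/lib/snapd/snap/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not executed here):
#   is_snap_path "$(command -v docker)" && echo "Snap Docker detected"
#   sudo docker stack ps dataflux-func --no-trunc --format '{{.Name}}\t{{.CurrentState}}\t{{.Error}}'
#   df -h /var/lib/docker    # assumes Docker's default data directory
```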

3. Containers Failing to Start in k8s Environment

This issue is generally caused by host / k8s cluster problems.

Possible errors in k8s include:

Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  Warning  Failed   36m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Warning  Failed   36m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Warning  Failed   36m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Normal   Created  35m (x5 over 118d)    kubelet  Created container func-server
  Warning  Failed   35m                   kubelet  Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
  Normal   Pulled   33m (x6 over 118d)    kubelet  Container image "dataflux-func.com/dataflux-func:2.7.0" already present on machine
  Warning  BackOff  2m8s (x157 over 36m)  kubelet  Back-off restarting failed container

The DataFlux Func service itself may report the following error:

Traceback (most recent call last):
  File "_config.py", line 11, in <module>
    CONFIG = yaml_resource.load_config(os.path.join(BASE_PATH, './config.yaml'))
  File "/usr/src/app/worker/utils/yaml_resources.py", line 83, in load_config
    user_config_content = _f.read()
OSError: [Errno 5] Input/output error

These errors do not originate in DataFlux Func. Check the host and the k8s cluster. If NAS storage is involved, check the NAS for problems as well.
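
When investigating on the cluster side, the pod's events are a reasonable starting point. The sketch below assumes kubectl is configured for the cluster; count_warnings is a hypothetical helper, and {Pod Name} and {Namespace} are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: count the Warning events in `kubectl describe pod`
# output read from stdin (lines whose first column is "Warning").
count_warnings() {
  awk '$1 == "Warning" { n++ } END { print n + 0 }'
}

# Usage (not executed here; pod name and namespace are placeholders):
#   kubectl describe pod {Pod Name} -n {Namespace} | count_warnings
#   kubectl get events -n {Namespace} --field-selector type=Warning
#   dmesg | grep -i 'i/o error'    # run on the node hosting the pod
```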