Troubleshooting / Containers Fail to Run Properly
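1. Containers keep restarting in a Docker Stack environment
This problem is usually caused by incorrect configuration, firewall rules, or allowlist settings.
Typical symptoms:
- The page cannot be opened in a browser
- Running `sudo docker ps -a` to list the containers shows that they keep restarting
- Running `curl http://localhost:8088` on the deployment server itself returns the error curl: (7) Failed to connect to localhost port 8088: Connection refused
- The log files keep printing error stack traces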
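Possible causes and solutions:
| Possible cause | Solution |
| --- | --- |
| The configuration was edited manually and contains errors | Check the modified configuration files, e.g. YAML syntax and database connection details |
| The configuration points to an external server that is not actually reachable over the network | Check the firewall, Alibaba Cloud security group settings, database connection allowlists, and similar settings |
| Compatibility issues with the operating system | See "Compatibility issues with the operating system" below |
| Redis does not support the current system's page size | See "Redis does not support the current system's page size" below |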
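The network checks in the table above can be partly automated. The following is a minimal sketch, not part of DataFlux Func: `probe_tcp` is a hypothetical helper that attempts a TCP connection and reports how it failed, which helps tell "nothing is listening" apart from "packets are being dropped".

```python
import errno
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify the outcome.

    Returns 'open', 'refused', 'timeout', 'dns_error' or 'error:<ERRNO NAME>'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The host name could not be resolved.
        return "dns_error"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. EHOSTUNREACH or ENETUNREACH.
        return "error:" + errno.errorcode.get(exc.errno, "unknown")
```

For example, `probe_tcp("localhost", 8088)` mirrors the `curl` check above, and the same call can be pointed at the database or Redis address from your own configuration. As a rule of thumb, "refused" suggests the service itself is down or bound to another address, while "timeout" is more typical of a firewall, security group, or allowlist silently dropping traffic; treat this as a hint, not a diagnosis.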
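Compatibility issues with the operating system
Run `docker logs {Server container ID}` to view the Server container's logs.
If you see the following error, or a similar one: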
node[9]: ../src/node_platform.cc:61:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
1: 0xb57f90 node::Abort() [node]
2: 0xb5800e [node]
3: 0xbc915e [node]
4: 0xbc9230 node::NodePlatform::NodePlatform(int, v8::TracingController*, v8::PageAllocator*) [node]
5: 0xb1b3d1 node::InitializeOncePerProcess(int, char**, node::InitializationSettingsFlags, node::ProcessFlags::Flags) [node]
6: 0xb1bc89 node::Start(int, char**) [node]
7: 0x7f2ca389fd90 [/lib/x86_64-linux-gnu/libc.so.6]
8: 0x7f2ca389fe40 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
9: 0xa93f0e _start [node]
Aborted (core dumped)
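This may be caused by an incompatibility between the current operating system or its components and Docker (for example, DataFlux Func 2.x ships with Docker 20.10.8, which may have problems on the latest operating system versions).
You can try the following to resolve it: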
Redis does not support the current system's page size
On some ARM operating systems, the official Redis image fails at startup with the error <jemalloc>: Unsupported system page size. For details, see:
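To judge whether a host is likely to hit this error, its memory page size can be inspected directly. The sketch below rests on an assumption that is not stated in this document: that the jemalloc bundled with the Redis image was built for 4 KiB pages and refuses to start on larger ones. `ASSUMED_BUILD_PAGE_SIZE` and both function names are illustrative.

```python
import mmap

# Assumption: the page size the bundled jemalloc was compiled for.
ASSUMED_BUILD_PAGE_SIZE = 4096


def page_size_exceeds(build_page_size: int, system_page_size: int) -> bool:
    """True if the running system uses larger pages than the binary expects."""
    return system_page_size > build_page_size


def describe_page_size(system_page_size: int = mmap.PAGESIZE) -> str:
    """Summarize the system page size against the assumed build page size."""
    if page_size_exceeds(ASSUMED_BUILD_PAGE_SIZE, system_page_size):
        return (
            f"page size {system_page_size} is larger than "
            f"{ASSUMED_BUILD_PAGE_SIZE}: this host may be affected"
        )
    return f"page size {system_page_size}: unlikely to be affected"


if __name__ == "__main__":
    print(describe_page_size())
```

Run it on the affected host. Containers share the host kernel, so the value is the same inside and outside a container. Some ARM distributions are built with 16 KiB or 64 KiB pages, which is the situation this check is meant to surface.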
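2. Containers do not exist in a Docker Stack environment
This problem is usually caused by an incorrect runtime environment.
Typical symptoms:
- `sudo docker stack ls` lists dataflux-func
- `sudo docker ps -a` does not show the corresponding containers
- `sudo docker stack ps dataflux-func --no-trunc` shows the containers in an abnormal state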
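Possible causes and solutions:
| Possible cause | Solution |
| --- | --- |
| The Snap version of Docker is installed on the system | Uninstall the Snap version of Docker, then reinstall Docker from an official source or use the Docker bundled with the script |
| Other | Troubleshoot based on the ERROR column of `sudo docker stack ps dataflux-func --no-trunc`. A typical example is no space left on device, which means the disk is out of space |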
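Both causes in the table above can be screened for before digging into the ERROR column. This is a rough sketch under stated assumptions: the Snap check is only a heuristic based on Snap exposing its commands under a `/snap/` directory, and `/var/lib/docker` is Docker's default data directory, which your installation may have changed. The function names are illustrative.

```python
import shutil
from typing import Optional


def is_snap_docker(docker_path: Optional[str]) -> bool:
    """Heuristic: a docker binary found under a /snap/ directory is the Snap build."""
    return bool(docker_path) and "/snap/" in docker_path


def free_space_ratio(path: str = "/") -> float:
    """Fraction of free space (0.0 to 1.0) on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


if __name__ == "__main__":
    # shutil.which returns the first `docker` on PATH without resolving symlinks.
    docker_path = shutil.which("docker")
    print("docker binary:", docker_path, "| snap:", is_snap_docker(docker_path))
    # Assumption: "/" is used here; point this at /var/lib/docker if it is a separate mount.
    print(f"free space on /: {free_space_ratio('/'):.1%}")
```

The path is deliberately not resolved with `os.path.realpath`, because the Snap launcher is a symlink and resolving it would hide the `/snap/` prefix. If both a Snap and an official Docker are installed, only the first one on `PATH` is reported.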
3. Containers fail to start in a k8s environment
This problem is usually caused by the host or the k8s cluster.
Errors like the following may appear in k8s:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Failed 36m kubelet Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
Warning Failed 36m kubelet Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
Warning Failed 36m kubelet Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
Normal Created 35m (x5 over 118d) kubelet Created container func-server
Warning Failed 35m kubelet Error: failed to start container "func-server": Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:60: mounting "/home/cce/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/volume-subpaths/user-config/func-server/1" to rootfs at "/home/cce/docker/overlay2/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/data/user-config-template.yaml" caused: no such file or directory: unknown
Normal Pulled 33m (x6 over 118d) kubelet Container image "dataflux-func.com/dataflux-func:2.7.0" already present on machine
Warning BackOff 2m8s (x157 over 36m) kubelet Back-off restarting failed container
One of the Func services may log an error like the following:
Traceback (most recent call last):
File "_config.py", line 11, in <module>
CONFIG = yaml_resource.load_config(os.path.join(BASE_PATH, './config.yaml'))
File "/usr/src/app/worker/utils/yaml_resources.py", line 83, in load_config
user_config_content = _f.read()
OSError: [Errno 5] Input/output error
This is not a DataFlux Func problem. Check the host and the k8s cluster; if a NAS is involved, also check whether the NAS itself has problems.
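One way to confirm that the storage layer, and not the application, is at fault is to repeat the failing read outside DataFlux Func. The sketch below uses a hypothetical helper, `try_read`, that reads a file in full and returns the symbolic error name on failure; the path in the usage note is taken from the mount target in the event log above and may differ in your deployment.

```python
import errno
from typing import Optional


def try_read(path: str) -> Optional[str]:
    """Read the whole file; return None on success, else the errno name.

    Typical results: 'ENOENT' (file missing), 'EACCES' (permissions),
    'EIO' (the underlying storage returned an I/O error).
    """
    try:
        with open(path, "rb") as f:
            f.read()
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, "UNKNOWN")
```

For example, running `try_read("/data/user-config-template.yaml")` in a shell inside the pod, or against the same file on the NAS mount from the host, and getting `'EIO'` reproduces the `[Errno 5]` from the traceback without any DataFlux Func code involved.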