azure
January 13, 2025, 2:25am
1
Source:
https://github.com/OpenMathLib/OpenBLAS/releases/download/v0.3.29/OpenBLAS-0.3.29.tar.gz
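Build: unpack the tarball, enter the directory, and build with "CC=gcc FC=gfortran make".
CPUs I would like tested: 3A5000/3A6000/3C6000.
Compiler: gcc/gfortran 14
System: any new-world distribution, such as AOSC or Debian sid.
Findings so far: on my 3A6000, the cblas_ssymv/cblas_dsymv tests in ctest fail during the build with "FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE" (to check, enter the ctest subdirectory and run "./xscblat2 < sin2" or "./xdcblat2 < din2"), whereas the 3C5000L-LL may be unaffected.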
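For anyone who wants to repeat the test, the steps above can be collected into a small script. This is only a sketch: it assumes curl and tar are installed and that the tarball unpacks into a directory named OpenBLAS-0.3.29. The function names and the RUN_REPRO switch are made up for illustration.

```shell
#!/bin/sh
# Sketch of the reproduction steps from this post (helper names are illustrative).
set -eu

OPENBLAS_VERSION="0.3.29"
OPENBLAS_URL="https://github.com/OpenMathLib/OpenBLAS/releases/download/v${OPENBLAS_VERSION}/OpenBLAS-${OPENBLAS_VERSION}.tar.gz"
# Assumption: the release tarball unpacks into this directory name.
SRC_DIR="OpenBLAS-${OPENBLAS_VERSION}"

# Download and unpack the release tarball.
fetch_source() {
    curl -L -o "${SRC_DIR}.tar.gz" "$OPENBLAS_URL"
    tar -xzf "${SRC_DIR}.tar.gz"
}

# Build with the same command as in the post; the bundled tests run as part of it.
build_with_make() {
    (cd "$SRC_DIR" && CC=gcc FC=gfortran make)
}

# Re-run the two level-2 CBLAS test drivers named in the post.
# Both are always run; the function returns non-zero if either one fails.
run_symv_tests() {
    status=0
    (cd "$SRC_DIR/ctest" && ./xscblat2 < sin2) || status=1
    (cd "$SRC_DIR/ctest" && ./xdcblat2 < din2) || status=1
    return "$status"
}

# Only run the steps when asked to, so the file can also be sourced.
if [ "${RUN_REPRO:-0}" = "1" ]; then
    fetch_source
    build_with_make || echo "make exited non-zero; re-running the symv tests directly" >&2
    run_symv_tests
fi
```

Saved as, say, repro.sh, it would be run with "RUN_REPRO=1 sh repro.sh". Please include your CPU model with the output when reporting results.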
azure
January 13, 2025, 8:42am
3
OpenBLAS 0.3.29 already shows up in oma topics. Which CPU was used for the build? Did the build hit the error described in the first post?
白铭骢
January 13, 2025, 9:01am
4
We use LOONGSON3A. I have asked xen0n and xry111 to help investigate; so far the problem can only be reproduced on 6000-series (LA664) processors.
azure
January 13, 2025, 9:10am
5
Thanks for the reply! Also, building OpenBLAS 0.3.29 with cmake on the 6000 (LA664) series works without problems. The OpenBLAS maintainer said: "cmake uses O3 by design in Release mode, and no optimization option if no mode was specified (IIRC). Maybe this is enough to cause the difference in the loongson backend, there were a few cases in the past where I had to disable optimization by a pragma in one of the ctest sources to get rid of spurious test failures."
OpenMathLib:develop
← XiWeiGu:la64_update_symv
opened 08:40AM - 10 Jan 25 UTC
Improve the performance of the {s/d}symv interface with LASX optimization when INCX=1 and INCY=1.

白铭骢
January 13, 2025, 2:25pm
6
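I have my doubts about that (have you verified it? It would be best to add your records). That said, Xi has just submitted a fix; could you verify it when convenient?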
OpenMathLib:develop
← xry111:xry111/lasx-la664
opened 02:18PM - 13 Jan 25 UTC
"fmov.d $f2, $f4" leaves all the bits higher than the 63-th bit unpredictable but it's obvious that the following code uses the value of those high bits. We actually want to replicate the lower 64 bits here, so we should use xvreplve0.d instead.
LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to some uarch internal details so the issue was not detected, but for LA664 (Loongson 3[A-Z]-6000) and future uarch we need to do things correctly or we end up getting a lot of test failures.
Closes: https://bbs.aosc.io/t/topic/302
azure
January 13, 2025, 4:17pm
7
I did verify the cmake build: it completed without problems. Xi's fix does indeed solve the problem. Many thanks!
xry111
January 13, 2025, 4:46pm
8
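I tried cmake, and it looks like the problematic code was not compiled at all; the generic code was built instead. Perhaps OpenBLAS's cmake files need the target architecture specified manually, or they do not yet support the LoongArch-specific code. I'll take a closer look tomorrow…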
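One way to check this is to configure with an explicit build type and target and then look at what the configure step recorded. The sketch below is untested guesswork rather than a verified recipe. -DCMAKE_BUILD_TYPE is a standard CMake option, but two things are assumptions about the OpenBLAS cmake files: that -DTARGET=LOONGSON3R5 is accepted, and that the result ends up in config.h under the build directory. cmake_args, RUN_CMAKE and the directory names are made up for illustration.

```shell
#!/bin/sh
# Sketch: configure OpenBLAS with cmake using an explicit build type and target.
set -eu

SRC_DIR="OpenBLAS-0.3.29"
BUILD_DIR="build-cmake"

# Print the cmake configure arguments.
#   $1: build type (e.g. Release); empty leaves CMAKE_BUILD_TYPE unset
#   $2: target name; empty leaves the choice to OpenBLAS
cmake_args() {
    args="-S $SRC_DIR -B $BUILD_DIR"
    if [ -n "${1:-}" ]; then
        args="$args -DCMAKE_BUILD_TYPE=$1"
    fi
    if [ -n "${2:-}" ]; then
        # Assumption: the cmake build accepts TARGET the same way the Makefile build does.
        args="$args -DTARGET=$2"
    fi
    printf '%s\n' "$args"
}

# Only configure and build when asked to, so the file can also be sourced.
if [ "${RUN_CMAKE:-0}" = "1" ]; then
    # Word splitting of the argument string is intentional here.
    cmake $(cmake_args Release LOONGSON3R5)
    cmake --build "$BUILD_DIR"
    # Assumption: the configure step writes config.h into the build directory.
    grep -i 'LOONGSON' "$BUILD_DIR/config.h" || echo "no LOONGSON defines found in config.h"
    (cd "$BUILD_DIR" && ctest)
fi
```

If the grep shows only generic defines, the symv kernel in question was probably never built, which would explain why the cmake build passes on the 3A6000.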
azure
January 14, 2025, 2:07am
9
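There is actually a similar case in OpenBLAS: test_potrs.c in the utest subdirectory, which the OpenBLAS maintainer quoted in the fourth post mentioned.
This fix: Use fld.d/fst.d in PROLOGUE/EPILOGUE in LOONGSON3R5 GEMM by martin-frbg · Pull Request #4881 · OpenMathLib/OpenBLAS · GitHub
seems to work for the 5000 series: once it is applied, the pragma at the top of test_potrs.c can be removed. It has no effect on the 6000 series, though (I tested on a 3A6000), so utest/test_potrs.c still carries the pragma at the top.
The cmake build has no problem here either, although OpenBLAS's cmake build may not be complete yet.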
xry111
January 18, 2025, 2:17pm
10
But when I simply commented out that pragma on a 3A6000, nothing went wrong.
I just realized that my machine has no gfortran.