(Pinned, completed) Reflections on nju-pa
2023/4/20 ~ 2024/5/4
6705 commits (2 commits per compilation and execution)
A year of persistence has finally come to an end.
Background (personal rant, feel free to skip)
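Since the major courses at my university don't teach me much, I began to wonder whether there is still any difference in technical skill between me and students outside computer science (or have they already overtaken us by taking paid training courses, e.g. in Java?), and where my own advantage lies.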
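I recall that a year ago Dean Qin told me that, from employers' point of view, the programming skills of our school's students are weaker than those of the electronic science and technology school next door. I was half-skeptical back then, but judging from how the courses are designed now, it is indeed an undeniable fact (or rather, an inevitable result). The school has made some improvements, such as an online judge for the C course (though it has been widely criticized for being rather aggressive and for its formatting issues), but this is still far from enough.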
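In terms of improving my own abilities, I don't have much time left. I must drop everything that conflicts with this goal (in order of priority: the comprehensive evaluation, undergraduate innovation projects, all kinds of competitions (including low-quality CTFs), and finally GPA (your strength can sometimes also be your weakness)), so as to leave as much time as possible for learning things that are truly relevant to my direction, or that are fundamental.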
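As for the OS course, the school's lectures are not bad, but they are mainly exam-oriented. The lab course, by contrast, is in a rather awkward position, for the following reasons: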
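- No prerequisite course: there is no teaching of Linux basics and no git tutorial, yet these greatly improved my efficiency while doing the nachOS labs. Why do I say so? Because some students are surely still writing the current lab by commenting out the code of previous labs (or copying the original source again). You know what I mean.
- The difficulty is lowered too much: the OS labs are currently taught with PPT handouts plus demo videos. The demo videos inevitably show some source code, and students inevitably photograph or record it with their phones, so the labs degrade into filling in bits of code in different source files, and we lose the ability to RTFSC. The consequence: at best, students don't understand the overall architecture of nachOS; at worst, they can't answer the questions our teacher Liang Gang raised in lab8, which a little RTFSC would answer.
- Plagiarism is rampant: people are lazy by nature. nachOS is an old project, and answers to many of its labs can be found with a quick search. I have copied some myself.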
why do I want to be a masochist (by doing PA)
Simple: because I enjoy it.
Introduced to me by Tiger1218, nju_pa is absolutely a great course. Compared with nand2tetris, which I finished earlier, it is more hard-core but has a smoother learning curve. I don't have much time left, so I need to acquire a lot of knowledge in a rather short period. High information density means high difficulty; therefore, staying in touch with something challenging is unavoidable.
In academia, a deeper understanding of ISAs and operating systems benefits further research. In engineering, practicing coding skills makes me more competitive among both CS majors and non-majors.
pa0
I've been using Linux and have had a workflow set up for some time, so I just installed neovim and cloned the source.
I learned some useful git commands such as `git branch` and `git checkout`.
The Missing Semester of Your CS Education is a good course; bookmarked.
pa1
1.1
At first I was dumbfounded. Copilot gave some code suggestions, which quickly helped me understand what I needed to do. Actually, it is quite easy.
1.2
Several months ago I had learned regex, and then I forgot it. It took me 30 min to learn it again. Actually, the tokenize step is much easier than the compiler section of nand2tetris.
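A minimal sketch of a regex-based tokenizer using POSIX `regcomp`/`regexec`, in the spirit of NEMU's rule table. The token names and rules below are illustrative assumptions, not NEMU's exact ones. A rule only counts when it matches right at the current position (`rm_so == 0`), and the hex alternative is listed before the decimal one so that `0x1f` becomes a single number token.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical token types; single-character tokens use their ASCII code. */
enum { TK_NOTYPE = 256, TK_EQ, TK_NUM };

static const struct {
  const char *regex;
  int token_type;
} rules[] = {
  {" +", TK_NOTYPE},                 /* spaces */
  {"0x[0-9a-fA-F]+|[0-9]+", TK_NUM}, /* hex alternative listed first */
  {"==", TK_EQ},
  {"\\+", '+'},
  {"-", '-'},
  {"\\*", '*'},
  {"/", '/'},
  {"\\(", '('},
  {"\\)", ')'},
};
#define NR_RULES (sizeof(rules) / sizeof(rules[0]))
static regex_t re[NR_RULES];

/* Compile every rule once; returns 0 on success, -1 on a bad pattern. */
static int init_rules(void) {
  for (size_t i = 0; i < NR_RULES; i++)
    if (regcomp(&re[i], rules[i].regex, REG_EXTENDED) != 0) return -1;
  return 0;
}

/* Store the type of each token of e in types[]; spaces are dropped.
 * Returns the number of tokens, or -1 if no rule matches or types[] is full. */
static int tokenize(const char *e, int *types, int max) {
  int n = 0;
  size_t pos = 0;
  while (e[pos] != '\0') {
    regmatch_t m = {0, 0};
    size_t i;
    for (i = 0; i < NR_RULES; i++) {
      /* A rule only counts if it matches right at the current position. */
      if (regexec(&re[i], e + pos, 1, &m, 0) == 0 && m.rm_so == 0 && m.rm_eo > 0)
        break;
    }
    if (i == NR_RULES) return -1;
    if (rules[i].token_type != TK_NOTYPE) {
      if (n == max) return -1;
      types[n++] = rules[i].token_type;
    }
    pos += (size_t)m.rm_eo;
  }
  return n;
}
```

Call `init_rules()` once; after that, `tokenize("0x1f + (2 == 3)", types, 16)` yields 7 tokens, and any character no rule covers (such as `$`) makes it return -1.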
Copilot helped me quickly finish the structure of the `eval` function, but it made a mistake when finding the dominant operator, and I spent several hours debugging this.
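I don't reproduce Copilot's exact mistake here. This is a minimal sketch of the rule itself, assuming tokens are already stored as an array of type codes (a hypothetical layout): ignore everything inside parentheses, pick the operator with the lowest precedence, and on a tie take the rightmost one so that left-associative operators split correctly.

```c
#include <assert.h>

enum { TK_NUM = 256 };  /* hypothetical: other token types are their ASCII char */

/* Lower number = binds more loosely; -1 means "not a binary operator". */
static int precedence(int type) {
  switch (type) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default: return -1;
  }
}

/* Index of the dominant operator in types[p..q], or -1 if there is none.
 * It is the loosest-binding operator outside any parentheses; on ties the
 * rightmost one wins, so "1 - 2 - 3" splits as (1 - 2) - 3. */
static int dominant_op(const int *types, int p, int q) {
  int depth = 0, best = -1, best_prec = 0;
  for (int i = p; i <= q; i++) {
    if (types[i] == '(') { depth++; continue; }
    if (types[i] == ')') { depth--; continue; }
    int prec = precedence(types[i]);
    if (depth != 0 || prec < 0) continue;
    if (best == -1 || prec <= best_prec) {  /* "<=" makes the rightmost tie win */
      best = i;
      best_prec = prec;
    }
  }
  return best;
}
```

Using `<` instead of `<=` in the comparison would pick the leftmost operator and evaluate `1 - 2 - 3` as `1 - (2 - 3)`, which is exactly the kind of subtle error that survives simple tests.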
When it came to modifying `sdb.c` to test a batch of expressions, I mistyped the path to my input file (by the way, Copilot suggested the path of yzh's project, which is a privacy issue). At first I didn't know I could enable debug info in `menuconfig`, and `static` functions made it harder to analyze the assembly instructions in `gdb`. As a result, it took me nearly an hour to debug this.
I also had a hard time dealing with the floating point exception (division by 0) in the expression generator. My idea was to compile and run each expression while redirecting exceptions to stderr. If `grep exception stderr_file` doesn't return 0 (i.e. finds no match), we consider the expression valid. However, some exceptions were still printed in my `stdout_file`, so in the end I had to use another command to filter the output:
```shell
# drop the "Floating point exception" lines from the generated input
perl -pe 's/Floating\ point\ exception\n//g' stdout_file > final_input
```
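An alternative to filtering text, sketched under assumptions: let the generator run the compiled expression program itself and ask `waitpid` how it ended. On x86 Linux an integer division by zero kills the process with `SIGFPE`, so checking `WTERMSIG` does not depend on where the message gets printed. The helper name is hypothetical, and the path you pass as `argv[0]` is wherever your generator writes the compiled program.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with argv in a child process. Returns 1 if the child was
 * killed by SIGFPE, 0 if it ended any other way, -1 on fork/wait failure. */
static int killed_by_sigfpe(char *const argv[]) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    execv(argv[0], argv);
    _exit(127);  /* only reached if execv failed */
  }
  int status;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGFPE;
}
```

The generator would keep an expression only when this returns 0 (and, ideally, when the program also exited normally).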
1.3
Extending the `eval` function is not very hard; one important point is changing a condition to handle unary operators (such as `*` and `-`).
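A sketch of one possible condition, assuming hypothetical token types `TK_NEG` and `TK_DEREF`: a `-` or `*` is unary when it is the first token, or when the token before it cannot end an operand (an operator or `(`). The retagged tokens then need their own, tighter precedence when searching for the dominant operator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token types; single-char operators use their ASCII code. */
enum { TK_NUM = 256, TK_REG, TK_NEG, TK_DEREF };

/* True if a token of this type can end an operand, so an operator that
 * follows it must be binary. */
static bool ends_operand(int type) {
  return type == TK_NUM || type == TK_REG || type == ')';
}

/* Retag '-' and '*' as unary when they start the expression or follow
 * something that cannot end an operand (an operator or '('). */
static void mark_unary(int *types, int n) {
  for (int i = 0; i < n; i++) {
    if (types[i] != '-' && types[i] != '*') continue;
    if (i == 0 || !ends_operand(types[i - 1]))
      types[i] = (types[i] == '-') ? TK_NEG : TK_DEREF;
  }
}
```

Because already-retagged tokens do not end an operand, chains such as `- - 1` are handled as two negations.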
Implementing the watchpoint pool is just some basic linked-list operations; Copilot did a good job.
However, Copilot made a big mistake implementing the watchpoint itself: it messed up the return value of `check_wp`. Again, I spent several hours debugging this.
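A sketch of the pool plus a `check_wp` whose return value means "at least one watchpoint changed". The `head`/`free_` names follow the PA skeleton as I remember it; the `watched` pointer is a simplification standing in for re-evaluating the watchpoint's expression with `expr()`. Note that the loop never returns early, so every watchpoint's old value is refreshed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_WP 32

typedef struct watchpoint {
  int NO;
  struct watchpoint *next;
  const uint32_t *watched;  /* simplification: watch one word, not an expression */
  uint32_t old_val;         /* value seen at the last check */
} WP;

static WP wp_pool[NR_WP];
static WP *head = NULL;   /* watchpoints in use */
static WP *free_ = NULL;  /* unused watchpoints */

static void init_wp_pool(void) {
  for (int i = 0; i < NR_WP; i++) {
    wp_pool[i].NO = i;
    wp_pool[i].watched = NULL;
    wp_pool[i].next = (i == NR_WP - 1) ? NULL : &wp_pool[i + 1];
  }
  head = NULL;
  free_ = wp_pool;
}

/* Move one node from the free list to the front of the in-use list. */
static WP *new_wp(void) {
  if (free_ == NULL) return NULL;  /* pool exhausted */
  WP *wp = free_;
  free_ = wp->next;
  wp->next = head;
  head = wp;
  return wp;
}

/* Unlink wp from the in-use list and push it back onto the free list. */
static void free_wp(WP *wp) {
  for (WP **pp = &head; *pp != NULL; pp = &(*pp)->next) {
    if (*pp == wp) {
      *pp = wp->next;
      wp->next = free_;
      free_ = wp;
      return;
    }
  }
}

/* Return true if ANY watched value changed since the last call. */
static bool check_wp(void) {
  bool changed = false;
  for (WP *wp = head; wp != NULL; wp = wp->next) {
    if (wp->watched == NULL) continue;
    uint32_t v = *wp->watched;
    if (v != wp->old_val) {
      wp->old_val = v;
      changed = true;  /* keep going: the others need updating too */
    }
  }
  return changed;
}
```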
pa2
2.1
Understanding the design of RISC-V was tough at first, and Copilot even reduced my efficiency by 20%, but once I found a book named RISC-V-Reader-Chinese-v2p1.pdf, things got better. It is actually just some repetitive work.
However, some parts still require patience, and you need to be careful, especially with opcodes that involve type conversion. It took me about an hour to debug again.
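A sketch of the conversions that usually bite here, assuming riscv32 with 32-bit `uint32_t` registers. The helper names are hypothetical (NEMU has its own `SEXT` macro). The points are: sign-extend immediates, cast to a signed type before an arithmetic shift, and widen to 64 bits before multiplying for `mulh`/`mulhu`.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `len` bits of x to 32 bits (1 <= len <= 32).
 * Shift the sign bit into bit 31, then arithmetic-shift back. Relies on
 * >> of a negative int32_t being arithmetic, as it is with GCC/Clang. */
static uint32_t sext(uint32_t x, int len) {
  int shift = 32 - len;
  return (uint32_t)((int32_t)(x << shift) >> shift);
}

/* I-type immediate: inst[31:20], sign-extended. */
static uint32_t imm_i(uint32_t inst) { return sext(inst >> 20, 12); }

/* srai: arithmetic shift, so the operand must be treated as signed. */
static uint32_t srai(uint32_t rs1, uint32_t shamt) {
  return (uint32_t)((int32_t)rs1 >> (shamt & 0x1f));
}

/* mulh: upper 32 bits of the signed 64-bit product; widen BEFORE multiplying. */
static uint32_t mulh(uint32_t a, uint32_t b) {
  int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
  return (uint32_t)((uint64_t)p >> 32);
}

/* mulhu: same idea, but both operands are unsigned. */
static uint32_t mulhu(uint32_t a, uint32_t b) {
  return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

For example, `addi x1, x0, -1` encodes as `0xfff00093`, and `imm_i` turns its 12-bit field `0xfff` into `0xffffffff`.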
2.2
It's all about fundamental utilities again. `iringbuf` and `mtrace` are quite easy, but `ftrace` took a very long time, including these steps:
- spent a little time parsing args, but failed to find a way to add this new feature to the `Makefile`
- spent some time on RTFM (`man 5 elf`)
- spent a lot of time distinguishing `call` and `ret` steps within the `jal` and `jalr` opcodes, especially ascertaining whether I did it correctly, because difftesting this is not an easy task (in the end I think it is not very important; maybe a waste of time?)
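A sketch of one way to tell calls from returns, assuming RV32 encodings and treating only `ra` (x1) as the link register. The RISC-V spec also allows x5 (`t0`) as an alternate link register in its return-address-stack hints, which this simplification ignores; a jump with `rd == x0` through some other register (e.g. a tail call) is classified as a plain jump.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { JMP_OTHER, JMP_CALL, JMP_RET } JmpKind;

#define OPCODE_JAL  0x6f
#define OPCODE_JALR 0x67

/* Classify a jal/jalr by its register fields:
 *   jal/jalr with rd == x1   -> call (e.g. `jal ra, f`, `jalr ra, 0(a5)`)
 *   jalr x0, 0(x1)           -> ret
 * Everything else is treated as an ordinary jump or non-jump. */
static JmpKind classify_jump(uint32_t inst) {
  uint32_t opcode = inst & 0x7f;
  uint32_t rd = (inst >> 7) & 0x1f;
  uint32_t rs1 = (inst >> 15) & 0x1f;
  int32_t imm_i = (int32_t)inst >> 20;  /* only meaningful for jalr */
  if (opcode != OPCODE_JAL && opcode != OPCODE_JALR) return JMP_OTHER;
  if (rd == 1) return JMP_CALL;
  if (opcode == OPCODE_JALR && rd == 0 && rs1 == 1 && imm_i == 0) return JMP_RET;
  return JMP_OTHER;
}
```

For instance, `ret` is `0x00008067` and `jalr ra, 0(a5)` is `0x000780e7`.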
I successfully found some bugs in `strcpy` and `sprintf` with test cases from Copilot X.
Writing differential testing is easy and pays off a lot; I don't quite understand why it isn't compulsory.
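A sketch of the register comparison at the heart of difftest, assuming a hypothetical `CPUState` with 32 GPRs and a `pc`; in NEMU the real check is per-ISA and its struct differs. Reporting every mismatch instead of stopping at the first one makes the failing instruction easier to understand.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register file layout. */
typedef struct {
  uint32_t gpr[32];
  uint32_t pc;
} CPUState;

/* Compare the DUT (our NEMU) against the REF after one step.
 * Returns true if all registers agree; prints every mismatch otherwise. */
static bool checkregs(const CPUState *ref, const CPUState *dut) {
  bool ok = true;
  if (ref->pc != dut->pc) {
    fprintf(stderr, "pc: ref = 0x%08" PRIx32 ", dut = 0x%08" PRIx32 "\n",
            ref->pc, dut->pc);
    ok = false;
  }
  for (int i = 0; i < 32; i++) {
    if (ref->gpr[i] != dut->gpr[i]) {
      fprintf(stderr, "x%d: ref = 0x%08" PRIx32 ", dut = 0x%08" PRIx32 "\n",
              i, ref->gpr[i], dut->gpr[i]);
      ok = false;
    }
  }
  return ok;
}
```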
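This connects to the batch-mode exercise from an earlier chapter of the handout:
Running NEMU in batch mode
We know that most of you are probably thinking: I won't read the Makefile anyway, and the teachers and TAs won't know, so skipping it seems fine.
So here we add a required exercise: previously, every time we started NEMU, we had to type c manually to run the guest program. But if we are not going to use NEMU's sdb, we can actually skip typing c. NEMU implements a batch mode that runs the guest program right after NEMU starts. Read NEMU's code and modify the Makefile appropriately, so that AM's Makefile starts NEMU in batch mode by default.
You can still skip this required exercise for now, but you will soon find things less convenient.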
Indeed, I found it inconvenient at this point, so I went back to that chapter and added it :(
2.3
The most painful step was debugging the system clock. After finishing `AM_TIMER_UPTIME`, I first ran the performance test on my laptop (i7-6700HQ @ 2.60GHz). However, it ran extremely slowly (`microbench` took over an hour to finish and scored only 12 points). So I spent 2 or 3 days trying to find out why it was so inefficient, without success.
By chance, I copied my code to another desktop (i7-6700 @ 3.40GHz) and ran the performance test again. This time there was a floating point exception. I checked the formula for the performance score and found the problem: the `AM_TIMER_UPTIME` register wasn't updated on each iteration. Knowing this, I quickly fixed the bug that had puzzled me for several days.
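This sketch is only an illustration of how an uptime value can end up never being refreshed; it is not a reconstruction of the exact bug. As I understand NEMU's RTC device, it latches a new 64-bit microsecond count into its two 32-bit registers only when the high word (offset 4) is read, so the AM side should read the high word first. The device below is simulated with hypothetical names (`fake_now_us`, `rtc_read`); check your own NEMU source for the actual trigger offset.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t fake_now_us;  /* stands in for the host clock */
static uint32_t rtc_reg[2];   /* [0] = low word, [1] = high word */

/* Simulated MMIO read: reading the high word (offset 4) refreshes both. */
static uint32_t rtc_read(int offset) {
  if (offset == 4) {
    rtc_reg[0] = (uint32_t)fake_now_us;
    rtc_reg[1] = (uint32_t)(fake_now_us >> 32);
  }
  return rtc_reg[offset / 4];
}

/* Buggy variant: never touches offset 4, so the latch is never refreshed
 * and the "uptime" stays frozen at whatever was last latched. */
static uint64_t uptime_us_buggy(void) {
  return rtc_read(0);
}

/* Read high first: that read latches a fresh value, so the low word read
 * afterwards belongs to the same snapshot. */
static uint64_t uptime_us(void) {
  uint32_t hi = rtc_read(4);
  uint32_t lo = rtc_read(0);
  return ((uint64_t)hi << 32) | lo;
}
```

With a frozen uptime, the elapsed time a benchmark measures can come out as 0, which is one way a score formula that divides by it could fail.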
The next problem was the `AM_GPU_FBDRAW` module. I finished it first, and it seemed fine in the video test. However, in `fceux-am` the graphics could not display properly, like this:
[screenshot: garbled graphics in fceux-am]
To solve this problem, I enabled differential testing (difftest) and `ftrace`. The debug information showed that the differences started at the `memcpy` in my `AM_GPU_FBDRAW` function. However, the diff position varied each time I ran it, which bothered me a lot. By chance, I changed my original `memcpy` call
```c
// original version
memcpy(&fb[(y + i) * W + x], ctl->pixels + i * w, w * 4);
```
to this:
```c
// experiment: copy the first row of pixels for every row
memcpy(&fb[(y + i) * W + x], ctl->pixels, w * 4);
```
The graphics turned completely blue. This assured me that the bug depended on the second argument of this call. With the help of tiger1218 (I feel sorry about that; I should have solved this problem all by myself), I realized that `ctl->pixels` is a `void` pointer, so arithmetic on it advances in bytes, not in DWORDs. Fixing that solved it.
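A sketch of the corrected copy loop, assuming 32-bit pixels and a row-major framebuffer of width `W`; the function name and parameters are hypothetical (in AM the values come from the `AM_GPU_FBDRAW` request). Casting `pixels` to `uint32_t *` once makes the row offset count pixels instead of bytes. Arithmetic on a `void *` itself is a GNU extension that steps one byte at a time, which is why the original line compiled without complaint.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h block of 32-bit pixels into a framebuffer of width W at (x, y). */
static void fbdraw(uint32_t *fb, int W, int x, int y,
                   const void *pixels, int w, int h) {
  const uint32_t *src = (const uint32_t *)pixels;  /* step in pixels, not bytes */
  for (int i = 0; i < h; i++)
    memcpy(&fb[(y + i) * W + x], src + i * w, w * sizeof(uint32_t));
}
```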
The story didn't end there. After fixing this bug, the difftest problem still existed. I tried running other tests to find information that might help with debugging. During this period, I also fixed some other minor bugs, such as the black screen in `slider` (because the boundary wasn't set properly in `AM_GPU_FBDRAW`) and a crash when showing the help message in `am-tests` (because `%c` was not implemented in my own library).
The real solution came when I gave up and moved on to PA3. I ran `nanos-lite` and the program crashed again. This time I manually placed `panic` breakpoints in different parts of `main.c` and found that it crashed while printing the logo. I quickly realized the problem was that the buffer was not big enough (1024 failed; 16384 is probably enough), and fixing it also solved the difftest problem.
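The post doesn't say which function owned that buffer, so this is a hedged sketch assuming output is first formatted into a fixed-size buffer (the 1024-byte size mirrors the value that failed; `format_checked` is a hypothetical helper). Using `vsnprintf` instead of `vsprintf` bounds the write, and its return value, the length the full output would have had, reveals truncation instead of silently overrunning the buffer. The block is compiled against the host libc; in klib you would supply your own `vsnprintf` with the same contract.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define PRINTF_BUF_SIZE 1024  /* hypothetical; buf must be at least this big */

/* Format into buf without ever writing past PRINTF_BUF_SIZE bytes.
 * Returns the full length of the output; *truncated is set to 1 if the
 * output did not fit (only PRINTF_BUF_SIZE - 1 characters were kept). */
static int format_checked(char *buf, int *truncated, const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(buf, PRINTF_BUF_SIZE, fmt, ap);
  va_end(ap);
  *truncated = (n >= PRINTF_BUF_SIZE);
  return n;
}
```

A large banner such as the Nanos-lite logo would show up here as a truncation you can `panic` on, rather than as memory corruption that surfaces somewhere else.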
I also wanted to work on the sound driver. However, this requires cross-compiling the SDL2 library. I spent half an afternoon on this and failed. Tiger1218 tried to help me but quickly lost interest; he thinks this part is not very essential to PA as a whole. Maybe I'll finish the sound driver when I have more time.
However, difftest stopped working again after I adjusted the print buffer several days later, and I haven't fixed it since.
pa3
3.1
After the final exams, I continued working on PA3.1.
I was stuck at `yield()` for some days, because I had to read the RISC-V manual, figure out the exception trace, and work out where to implement `isa_raise_intr()`. However, once you've done that, the remaining tasks are much easier.
Although I initialized `mstatus` to `0x1800`, difftest still didn't work. I feel like giving up on it.
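A simplified sketch of the trap entry that `yield()` relies on, assuming M-mode only and a hypothetical `CPUState` holding just the CSRs involved. `0x1800` sets both MPP bits (12:11) of `mstatus` to 0b11, i.e. machine mode. Updates to MIE/MPIE are omitted, and by RISC-V convention the handler software, not the hardware, adds 4 to `mepc` so that `mret` skips the `ecall`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the riscv32 CPU state. */
typedef struct {
  uint32_t pc;
  uint32_t mepc, mcause, mstatus, mtvec;
} CPUState;

static CPUState cpu;

/* Save the resume address and cause, then return the handler address. */
static uint32_t isa_raise_intr(uint32_t NO, uint32_t epc) {
  cpu.mepc = epc;
  cpu.mcause = NO;
  return cpu.mtvec;
}

/* ecall jumps to the trap handler; mepc still points at the ecall. */
static void exec_ecall(void) {
  cpu.pc = isa_raise_intr(11, cpu.pc);  /* 11 = environment call from M-mode */
}

static void init_csrs(void) {
  cpu.mstatus = 0x1800;  /* MPP (bits 12:11) = 0b11: previous mode was M-mode */
}
```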
3.2
Because I had spent several days RTFSC-ing in 3.1, finishing 3.2 was a piece of cake. I was only stuck on `printf` output for several hours (it printed only `H` on each line). Finally I found that I had forgotten to `make` the whole `navy-apps` directory.
3.3
PA3.3 contains a lot of work: about 30% of all the code you need to write from PA1 to PA3. Moreover, as the system gets more and more complex, the time spent debugging also increases. It actually took me 17 days to finish this chapter.
The work falls into three parts: the VFS, the NDL and SDL libraries, and the corresponding applications. Here are some bugs I struggled with for a long time:
- Segmentation fault after `fclose` in `file-test`: at first I thought something was wrong in `_free_r`, but I'm not familiar with the system library's code, so debugging it was daunting. Instead, I modified `file-test.c` and observed that the segmentation fault had something to do with `fscanf`. I suspected a buffer overflow but had no proof. In the end, I found the problem was in `_sbrk`, which I had written myself.
- `menu` did not display correctly, just like `mario` before: this time I didn't make the `memcpy` mistake. However, I hadn't figured out the relationship between the `width` and `height` of the `canvas` and the `screen`. I also handled corner cases of the SDL APIs incorrectly, which resulted in jumbled output.
- Segmentation fault when entering a battle in `PAL`: I wanted to save time, because using `ftrace` to find the backtrace is slow, so I used the traditional "print" method. Actually, the call stack was a little deeper than I expected (about 5 or 6 layers), and it took me even more time. Finally the call trace pointed to `SDL_FillRect`, which I had also written myself. I found that I hadn't handled the 8-bit color case (at first I had added the fallback, but at some point I thought it was unnecessary and deleted it), and the bounds of the pixel-copy loop were incorrect, which cost me a whole day of debugging.
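A sketch of a fill routine that covers both issues from the last bullet: the 8-bit (palette-index) case and correct, clipped bounds. It uses a hypothetical trimmed-down `Surface`/`Rect` instead of the real `SDL_Surface`, and assumes rows are tightly packed (no separate pitch).

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int x, y, w, h; } Rect;
typedef struct {
  int w, h;
  int bpp;       /* 8 (one palette index per pixel) or 32 */
  void *pixels;
} Surface;

/* Fill r (or the whole surface if r is NULL) with color, clipped to the
 * surface. For 8-bit surfaces color is a palette index, so only its low
 * byte is stored, one byte per pixel. */
static void fill_rect(Surface *s, const Rect *r, uint32_t color) {
  int x0 = r ? r->x : 0, y0 = r ? r->y : 0;
  int x1 = r ? r->x + r->w : s->w, y1 = r ? r->y + r->h : s->h;
  if (x0 < 0) x0 = 0;
  if (y0 < 0) y0 = 0;
  if (x1 > s->w) x1 = s->w;  /* exclusive bounds: x in [x0, x1) */
  if (y1 > s->h) y1 = s->h;
  for (int y = y0; y < y1; y++) {
    for (int x = x0; x < x1; x++) {
      if (s->bpp == 8)
        ((uint8_t *)s->pixels)[y * s->w + x] = (uint8_t)color;
      else
        ((uint32_t *)s->pixels)[y * s->w + x] = color;
    }
  }
}
```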
Here is a screenshot of PAL in battle mode (I didn't use riscv32-nemu to take it because it is extremely slow):
[screenshot: PAL in battle mode]
pa4
4.1
Following ysyx, I needed to finish `rt-thread` first. It took 9 days (from 11/28/2023 to 12/6/2023). After that I was preparing experiments for a paper, and I resumed work on 4/2/2024. I finally finished PA4.1 on 4/6/2024.
Here I just list some bugs I encountered:
- `rt-thread` did not work: the problem was the migration of `abstract-machine`; I restored its compile environment and it worked.
- `rt-thread` on NPC:
  - I had to `fflush(stdout)` to make the output visible.
  - I forgot to modify `riscv.h` in `abstract-machine` to make context switching work.
- `execve` with args: the return type of a declaration in `syscall.c` was mistyped as `void`; the correct one should be `Context *`.
- `execve` with args not working on `pal`: I forgot to copy the `argv` and `envp` strings onto the user stack (yes, copying only the pointers is not enough), so the contents of `argv` and `envp` were overwritten by the contents of `pal`.
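A sketch of the fix for the last bullet, assuming a System V-style layout: `argc` at the new stack pointer, then the `argv` pointers and a NULL, then the `envp` pointers and a NULL, with the strings stored higher up. `setup_stack` and its parameters are hypothetical; in Nanos-lite the stack top would come from a freshly allocated page. The key step is copying the strings themselves before writing any pointer to them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the initial stack below stack_top (which must be pointer-aligned)
 * and return the new stack pointer, i.e. the address holding argc. */
static uintptr_t *setup_stack(uint8_t *stack_top, char *const argv[], char *const envp[]) {
  int argc = 0, envc = 0;
  while (argv && argv[argc]) argc++;
  while (envp && envp[envc]) envc++;

  char *str_ptrs[argc + envc + 1];  /* where each copied string landed */
  uint8_t *sp = stack_top;
  for (int i = 0; i < argc + envc; i++) {
    const char *s = (i < argc) ? argv[i] : envp[i - argc];
    size_t len = strlen(s) + 1;     /* include the terminating NUL */
    sp -= len;
    memcpy(sp, s, len);             /* copy the string, not just the pointer */
    str_ptrs[i] = (char *)sp;
  }

  /* Align down, then reserve argc + argv[] + NULL + envp[] + NULL. */
  sp = (uint8_t *)((uintptr_t)sp & ~(uintptr_t)(sizeof(uintptr_t) - 1));
  uintptr_t *base = (uintptr_t *)sp - (1 + argc + 1 + envc + 1);
  uintptr_t *p = base;
  *p++ = (uintptr_t)argc;
  for (int i = 0; i < argc; i++) *p++ = (uintptr_t)str_ptrs[i];
  *p++ = 0;
  for (int i = 0; i < envc; i++) *p++ = (uintptr_t)str_ptrs[argc + i];
  *p++ = 0;
  return base;
}
```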
4.2
Between 4/9/2024 and 4/16/2024, I mainly spent time finishing my paper. After that, I continued working on the rest of PA4. PA4.2 is mainly about the paging mechanism; here are some points worth mentioning:
- You will need the RISC-V manual (privileged volume) to understand the Sv32 paging mechanism and how the `satp` register (`cpu.satp`) is used. Answers from ChatGPT are not always reliable.
- I forgot to dereference the `as.area.end` pointer, which caused the contents of some pages to overlap and resulted in hard-to-resolve bugs.
- Insufficient testing made bugs in PA4.3 harder to resolve.
I finished PA4.2 on 4/26/2024.
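A sketch of the Sv32 two-level walk from the privileged spec, assuming 4 KiB pages only and a simulated word-addressed physical memory (`pmem` and `paddr_read` are hypothetical stand-ins). It skips superpages, permission checks, and the A/D bits, so it is only a starting point for cross-checking an MMU implementation.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V 0x1u
#define PTE_R 0x2u
#define PTE_W 0x4u
#define PTE_X 0x8u
#define PAGE_SHIFT 12

static uint32_t pmem[4096];  /* 16 KiB of simulated physical memory */
static uint32_t paddr_read(uint32_t pa) { return pmem[pa / 4]; }

/* satp: root page-table PPN in bits 21:0 (MODE in bit 31 is not checked).
 * va:   VPN[1] = bits 31:22, VPN[0] = bits 21:12, offset = bits 11:0.
 * PTE:  PPN in bits 31:10, flags in bits 9:0.
 * Returns 1 and stores *pa on success, 0 if the mapping is missing. */
static int sv32_translate(uint32_t satp, uint32_t va, uint32_t *pa) {
  uint32_t vpn1 = va >> 22, vpn0 = (va >> 12) & 0x3ff, off = va & 0xfff;
  uint32_t root = (satp & 0x3fffff) << PAGE_SHIFT;
  uint32_t pte1 = paddr_read(root + vpn1 * 4);
  if (!(pte1 & PTE_V)) return 0;
  if (pte1 & (PTE_R | PTE_W | PTE_X)) return 0;  /* superpage: not handled here */
  uint32_t table = (pte1 >> 10) << PAGE_SHIFT;
  uint32_t pte0 = paddr_read(table + vpn0 * 4);
  if (!(pte0 & PTE_V)) return 0;
  *pa = ((pte0 >> 10) << PAGE_SHIFT) | off;
  return 1;
}
```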
4.3
During this holiday I promised myself to finish the whole of PA4 before returning to school, and I did. I first tried to implement preemptive process scheduling. However, as mentioned before, some bugs from PA4.2 were still unresolved. Preemptive scheduling makes the system, which is a finite state machine in essence, unpredictable and much harder to debug. Therefore, I temporarily gave up on this part and started working on stack switching instead. Of course, I ran into the same bugs there as well. To rule out possible causes, I created a new `git` branch and ran controlled experiments on each one. I finally found these bugs:
- `mm_brk` did not check whether the memory had already been mapped into virtual memory.
- `mm_brk` was not fully aligned to page boundaries.
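A sketch of a page-aligned `mm_brk`, assuming a per-process `max_brk` that records the page-aligned end of the already-mapped heap; `map_page` is a hypothetical stand-in for allocating a physical page and mapping it. Rounding the new break up and starting from `max_brk` avoids both problems listed above: a partially covered last page and pages mapped twice.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE 4096u
#define ROUNDUP(a, sz) (((a) + (sz) - 1) & ~((sz) - 1))

static uintptr_t max_brk;  /* page-aligned end of the mapped heap */
static int pages_mapped;   /* counts map_page() calls, for the demo */

/* Stand-in for allocating a physical page and mapping it at va. */
static void map_page(uintptr_t va) {
  assert(va % PGSIZE == 0);  /* mapping must start on a page boundary */
  pages_mapped++;
}

/* Grow the heap to brk: map only the pages in [max_brk, ROUNDUP(brk)). */
static int mm_brk(uintptr_t brk) {
  uintptr_t new_max = ROUNDUP(brk, (uintptr_t)PGSIZE);
  for (uintptr_t va = max_brk; va < new_max; va += PGSIZE) map_page(va);
  if (new_max > max_brk) max_brk = new_max;
  return 0;
}
```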
A very useful debugging tip: when you `memset` a range of raw memory, choose a distinctive value. If the program later crashes there, the bug is much easier to locate than when you are staring at a random address.
I finally understand why yzh says PA4.2 is the most difficult part: there really are a bunch of details to take care of.
Another bug came from my carelessness when translating the pseudo-C code into RISC-V assembly:
- I forgot to zero `mscratch` in `__am_asm_trap`.
After getting stack switching to work, switching the foreground program was easy to implement. I went back to finish the preemption part. At first, the program never reached the `IRQ_TIMER` branch, because:
- I didn't set `cpu.pc` to `cpu.mtvec`.
However, the program still crashed after running for some time. Then I revisited the exception handling procedure from PA3, and finally figured out:
- I didn't set `cpu.mepc` to `cpu.pc` either.
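A sketch of an interrupt check run after each instruction, covering both fixes above, assuming a hypothetical `CPUState` and a device-set `intr` flag. `0x80000007` is the RV32 `mcause` value for a machine timer interrupt (interrupt bit 31 plus exception code 7). In this sketch `cpu.pc` already holds the next instruction when the check runs, so saving it as `mepc` lets `mret` resume there without the `+4` an `ecall` needs. MPP handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRQ_TIMER    0x80000007u  /* interrupt bit | machine timer interrupt (7) */
#define MSTATUS_MIE  (1u << 3)
#define MSTATUS_MPIE (1u << 7)

typedef struct {
  uint32_t pc, mepc, mcause, mstatus, mtvec;
  bool intr;  /* set by the timer device, cleared when taken */
} CPUState;

static CPUState cpu;

static uint32_t isa_raise_intr(uint32_t NO, uint32_t epc) {
  cpu.mepc = epc;  /* resume point for mret */
  cpu.mcause = NO;
  /* Save MIE into MPIE, then disable interrupts while handling this one. */
  if (cpu.mstatus & MSTATUS_MIE) cpu.mstatus |= MSTATUS_MPIE;
  else cpu.mstatus &= ~MSTATUS_MPIE;
  cpu.mstatus &= ~MSTATUS_MIE;
  return cpu.mtvec;
}

/* If a timer interrupt is pending and enabled, jump to the handler. */
static void check_interrupt(void) {
  if (cpu.intr && (cpu.mstatus & MSTATUS_MIE)) {
    cpu.intr = false;
    cpu.pc = isa_raise_intr(IRQ_TIMER, cpu.pc);
  }
}
```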
After fixing that, the problem was finally solved, and the whole PA story came to an end on 5/4/2024.
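Reflections on the “One Student One Chip” (ysyx) Stage C project (updated on 2025/1/13)
As I wrote in this blog, my first PA git log entry dates from April 20, 2023, the second semester of my sophomore year. Back then I already knew that our school's major courses wouldn't teach me much, so I began to think about how I differ from students outside CS. Meanwhile, the students around me gradually joined labs and started research training. But I have always held one principle: the undergraduate years are for exploring your interests, improving your abilities, and broadening your horizons; there is no real need to jump straight to the frontier of a discipline and start trial and error.
Of course, I also understand the motives of “those who joined labs early”. Some of them may genuinely be interested in research, but for most it is behavior oriented toward postgraduate recommendation or studying abroad. This is an inevitable consequence of the overall environment gradually going sour. So although my grades ranked roughly in the top 5% at school, I was sometimes swayed by remarks from school leaders like “publish one C-class conference paper and they'll overtake you”, which kept me from fully settling down to write every line of code well. Even so, I kept my git log going despite the pressure of further studies.
The worst thing happened in the second semester of my junior year. My classmate L roped me and another classmate, W, into the OS kernel competition, and we eventually won a national third prize. That should have made me happy, but I was humbled by W's outstanding architectural skill (starting from scratch, he built an OS that met the competition requirements without any reference material). Although I never neglected the coding skills I had been training since high school, after four or five years of practice I still couldn't come anywhere close to him, which was quite a blow.
But I haven't made zero progress either. As an OI contestant who retired in my second year of high school, I had hardly written any programs in Python when I started university. Yet when I was forced to enter the Information Security Works Competition in the second semester of my junior year, I managed to write several thousand lines of Python in a month, and even wrote a corresponding paper within 7 days just to “get it over with”.
That paper was eventually rejected because the writing was poor, but at least it proved I had the abilities of an “average” student, didn't it? And there will be a chance to resubmit. Later, our cryptography teacher walked through the execution flow of the GMW protocol in class. Out of interest, I spent two or three days implementing it in code. She appreciated my code, and I also thought very highly of her personality and character, so she became my graduate advisor.
So I gradually worked out an answer to the question at the beginning of this post, “how do I differ from students outside CS”: you are a person, not a tool. It is time for me to move gradually from the stage of building abilities to the stage of using the abilities I have to pursue my interests. Also, I now roughly know that I am better suited to research than to engineering.
Picking up my own code again a year later, I can't help thinking: how did I write it so badly? But I'm a single-threaded person, and to pursue my interests I may have to let go of some obsessions. Doing things only for a specific target just makes me more anxious, and I would end up piling more crap onto the crap mountain.
So perhaps, if fate allows, we'll meet again.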