Published: 2013/11/12 12:35:10
Programmers' most memorable bug-debugging experiences

Every programmer has surely had a bug-debugging experience they would rather forget. Programmers react in all sorts of funny ways when they hear that their program has a bug; see "30 common reactions of programmers who encounter a bug".


Recently, a lively discussion appeared on Quora, the well-known community Q&A site: "What's the hardest bug you've debugged?" Many programmers commented to share their most painful or most memorable debugging experiences. From that discussion, I have compiled the answers of two programmers. I wonder whether you have had similar experiences.


Dave Baggett: code or hardware, whose fault?


Dave Baggett, co-founder of ITA Software and founder of inky.com, shared his painful debugging experience: after struggling through repeated tests of the code, he discovered that the hardware was to blame.


I wrote the memory card (load/save) code for Crash Bandicoot, and it took me six weeks before I could stop debugging it. I worked on other things during that time, but I kept coming back to this bug, for a few hours or a few days at a time.


Every time you saved your progress, the program accessed the memory card, and normally it worked. But a short write would often corrupt the memory card: the player would try to save, and not only would the game fail to save, it would corrupt the card as well.


Obviously, we could not just ignore this bug, yet after six weeks I still had no clue what caused it. Later, through Connie, we put the question to the PS1 developer forum: had anyone run into a similar problem? No one had.


When programmers run out of ideas, the only thing left is to divide and conquer the code: keep ruling out and eliminating possible sources of error until only a very small piece remains, then study that piece closely. Unfortunately, as often happens, even after many candidates had been ruled out, the bug was still there.
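The divide-and-conquer idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the module names and the `still_fails` oracle are hypothetical, and for an intermittent bug like this one the oracle would in practice need many trial runs per candidate.

```python
from typing import Callable, List, Sequence


def minimize(components: Sequence[str],
             still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop components for as long as the failure still reproduces."""
    kept = list(components)
    changed = True
    while changed:
        changed = False
        for name in list(kept):
            trial = [c for c in kept if c != name]
            if still_fails(trial):
                # The bug survives without this component, so leave it out.
                kept = trial
                changed = True
    return kept


# Hypothetical module names, loosely following the story.
MODULES = ["graphics", "audio", "gameplay", "load_save", "startup"]


def still_fails(active: List[str]) -> bool:
    # Stand-in oracle: assume the failure needs only startup and load/save.
    return "startup" in active and "load_save" in active


if __name__ == "__main__":
    print(minimize(MODULES, still_fails))
```

What remains is the smallest set of pieces that still shows the failure, which is where the closer study starts.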


The challenge in this process is: once you delete code, how do you keep the game running? We had to replace whole modules with stand-ins that simulated the real thing, which in practice did not help much. You have to write new scaffolding code to keep the whole game running, and that is a slow and painful process.


I kept removing more and more code until only the startup code was left, which just boots the system and initializes the rendering hardware and so on. Of course, there was no longer a load/save menu, because I had removed all of the graphics code. But I could pretend that the user had gone through the (invisible) load/save screen and asked to save, and then write to the card.


In the end only very little code was still running, but the bug was still there: most of the time it worked fine, but every once in a while it failed. Almost all of the Crash code had been removed and the error still occurred, which was baffling, because the remaining code really did nothing.


At that point, probably around 3 o'clock in the morning, an idea flashed through my mind. Read and write (I/O) operations require precise timing. Whether you are dealing with a hard drive, a flash memory card, or a Bluetooth transmitter, the low-level code that reads and writes must do so at precisely timed moments.


A clock lets a hardware device that is not directly connected to the CPU stay in sync with the code the CPU is running. The clock determines the baud rate, the rate at which data is sent from one end to the other. If the timing gets messed up, the hardware or the software may misbehave, which is bad enough, and it usually leads to data corruption as well.
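To make the timing argument concrete, here is a small arithmetic sketch in Python. It assumes a generic serial link in which the receiver samples each bit at its midpoint; it does not describe the PS1's actual memory card bus, and the function names are made up for illustration.

```python
def bit_period_us(baud_rate: int) -> float:
    """Duration of one bit in microseconds at the given baud rate."""
    return 1_000_000 / baud_rate


def bits_before_missample(clock_error_per_mille: int) -> int:
    """Bits that can be sampled before drift exceeds half a bit period.

    The error is the relative clock mismatch in parts per thousand.
    After n bits the sampling point has drifted n * error bit periods,
    so sampling stays inside the right bit while n * error <= 0.5.
    """
    if clock_error_per_mille <= 0:
        raise ValueError("clock error must be positive")
    return 500 // clock_error_per_mille


if __name__ == "__main__":
    print(bit_period_us(250_000))      # microseconds per bit
    print(bits_before_missample(20))   # bits tolerated at 2% mismatch
```

Even a small relative mismatch accumulates bit by bit, which is why a disturbed clock tends to show up as corrupted data rather than as a clean failure.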


What if the setup code we had written was messing up the timing? I looked again at the timing-related code in the test program and noticed that we set the PS1's programmable timer to 1 kHz (1000 ticks/second). That is quite fast; most games generally leave it at around 100 Hz.


I kept testing, then went back to the Crash code and modified the load/save code so that, before accessing the memory card, it reset the programmable timer to its default setting (100 Hz) and afterwards set it back to 1 kHz. We never saw the read/write problem again.
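The shape of that workaround can be sketched as follows. This is a Python illustration only, not the original game code: `ProgrammableTimer`, `timer_at`, and `save_game` are hypothetical names, and a plain list stands in for the memory card.

```python
from contextlib import contextmanager

DEFAULT_HZ = 100   # default timer rate mentioned in the story
GAME_HZ = 1000     # rate the game normally ran the timer at


class ProgrammableTimer:
    """Hypothetical stand-in for the hardware timer."""

    def __init__(self, rate_hz: int) -> None:
        self.rate_hz = rate_hz
        self.history = [rate_hz]  # every rate the timer has been set to

    def set_rate(self, rate_hz: int) -> None:
        self.rate_hz = rate_hz
        self.history.append(rate_hz)


@contextmanager
def timer_at(timer: ProgrammableTimer, rate_hz: int):
    """Run a block with the timer at rate_hz, then restore the old rate."""
    previous = timer.rate_hz
    timer.set_rate(rate_hz)
    try:
        yield timer
    finally:
        # Restore even if the card access raises.
        timer.set_rate(previous)


def save_game(timer: ProgrammableTimer, card: list, data: bytes) -> None:
    with timer_at(timer, DEFAULT_HZ):
        card.append(data)  # stand-in for the real memory card write


if __name__ == "__main__":
    timer = ProgrammableTimer(GAME_HZ)
    card = []
    save_game(timer, card, b"progress")
    print(timer.history)
```

Wrapping the card access this way keeps the rate change and its restoration in one place, so no code path can leave the timer at the wrong rate.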


But why? I kept thinking and testing, and kept adjusting the timer. One day I noticed a sequence that reproduced the problem quite easily: start writing to the memory card, move the controller, and the memory card is corrupted. It looked like a hardware bug.


Later I went to Connie, a hardware engineer who had worked on the design of the PS1, and told her what I had found. She replied, "Impossible," and we argued. I wanted to demonstrate it to her on the spot, but she thought it was a waste of time, and she was swamped with a new project. The next day she apologized and told me that it was indeed a hardware bug.



Image source: Programmer magazine


Amir Memon: a bug that was hard to reproduce


Amir Memon is an iOS software engineer. He shared his experience of debugging a bug that was hard to reproduce.


A few years ago, Microsoft and Mozilla reported crashes in Flash Player, but we could not reproduce them. We tried to work out from the logs where the crash occurred, but they told us nothing useful. Later we learned that several crash records caused by the same error pointed to different lines of code.


Finally, an excellent quality engineer on our team tracked down the crash and found steps that reproduced it fairly reliably. It turned out to occur only on slow hard drives.


The crash happened only during Flash Player's destruction sequence, after a video had been destroyed (for example, in some cases, when you navigated to another page). The video file stream was not cleaned up in time, which exposed a thread synchronization problem.
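The general pattern behind this kind of bug can be sketched in Python. This is not Flash Player's code: the `VideoStream` class and its simulated slow read are assumptions made purely for illustration. The point is that teardown must signal the worker thread and wait for it to finish before releasing anything the worker still uses.

```python
import threading
import time


class VideoStream:
    """Hypothetical stream whose worker thread fills a buffer from a slow disk."""

    def __init__(self, read_delay_s: float = 0.01) -> None:
        self._stop = threading.Event()
        self._delay = read_delay_s
        self._buffer = []
        self._worker = threading.Thread(target=self._read_loop, daemon=True)
        self._worker.start()

    def _read_loop(self) -> None:
        while not self._stop.is_set():
            time.sleep(self._delay)         # simulate a slow disk read
            self._buffer.append(b"chunk")   # safe: buffer lives until join()

    def close(self) -> None:
        self._stop.set()      # ask the worker to finish
        self._worker.join()   # wait until it has actually stopped
        self._buffer = None   # only now release what the worker was using


if __name__ == "__main__":
    stream = VideoStream(read_delay_s=0.001)
    time.sleep(0.01)
    stream.close()
    print("closed cleanly")
```

If `close` released the buffer without the `join`, a slow read could complete after teardown and touch freed state, which fits a failure that appears only on slow drives.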


I mention this bug mainly because it was so hard to reproduce internally; in fact, there were quite a few multithreading problems around the place where the program crashed. In the end I fixed the problem, and the fix was very well received. It prevented millions of crashes.


Fellow programmers, what is the most memorable or interesting debugging experience you have had? Or what was the hardest bug you ever had to debug? Feel free to share.



