Difference between revisions of "User:Bluddy"

From ScummVM :: Wiki
Line 22: Line 22:
** Timidity
** Timidity
*** A lot of work. May improve speed over midi, but it'll be the last optimization.
*** A lot of work. May improve speed over midi, but it'll be the last optimization.
** MP3 using Media Engine
** MP3 using Media Engine - partly done.
*** I made a hack that'll do it, but unfortunately nobody knows how to use it for 22kHz mp3s, which is all of our MP3s. This means I need to reverse engineer the PSP FW in order to figure out how to do this properly (the PSP FW knows how to use the ME for 22 kHz files). So this is going to be hard too.
*** I made a hack that'll do it, but unfortunately nobody knows how to use it for 22kHz mp3s, which is all of our MP3s. This means I need to reverse engineer the PSP FW in order to figure out how to do this properly (the PSP FW knows how to use the ME for 22 kHz files). So this is going to be hard too.
*** Check if cooleyes' sirens2 program can play 22kHz files properly. If it can, we're doing something wrong.
*** Check if cooleyes' sirens2 program can play 22kHz files properly. If it can, we're doing something wrong.
**** Can't compile the darn thing!
**** Can't compile the darn thing!
*** Another program is supposed to be able to handle it. Check it out. (ps2forum post)
*** Another program is supposed to be able to handle it. Check it out. (ps2forum post)
**** Update: yes! It works.
** Video Speedup
** Video Speedup
*** Smush can be sped up for aligned platforms, and VFPU cache can be used in PSP
*** Smush can be sped up for aligned platforms, and VFPU cache can be used just for PSP
*** Check other codecs for possible speedups using VFPU/alignment
*** Check other codecs for possible speedups using VFPU/alignment
** Tests
** Tests
Line 35: Line 36:
*** how long does retrieving thread priority take? 5-7us.
*** how long does retrieving thread priority take? 5-7us.
*** how long do different length MS reads take? done. 1B = 1KB = 2.5ms. 2KB = 3.5ms. 10KB = 10ms. The more, the more efficient.
*** how long do different length MS reads take? done. 1B = 1KB = 2.5ms. 2KB = 3.5ms. 10KB = 10ms. The more, the more efficient.
**** Wrong. Fread already caches reads. I was testing it wrong (reopening files). 1B = 5us. 1KB = 1ms. 2KB = 2ms. There's a weird gap between 6-9KB where the time stays the same. Not sure. In general, no advantage to reading more.
**** Wrong. fread already caches reads. I was testing it wrong (reopening files). 1B = 5us. 1KB = 1ms. 2KB = 2ms. There's a weird gap between 6-9KB where the time stays the same. Not sure. In general, no advantage to reading more.
*** how much reading is done by MP3 rendering/movie playback? How much will we need to cache? done. Reads chunks of 15-25KB preceded by small 4KB reads. Whole MP3 loads are a problem (200KB+).
*** how much reading is done by MP3 rendering/movie playback? How much will we need to cache? done. Reads chunks of 15-25KB preceded by small 4KB reads. Whole MP3 loads are a problem (200KB+).
*** Check fseek's time. About 1ms normally. To the end of the file is much more - 3 to 7ms.
*** Check fseek's time. About 1ms normally. To the end of the file is much more - 3 to 7ms.
Line 43: Line 44:
** Caching of stream
** Caching of stream
*** Implement cache that reads while other threads don't (mutex) and fills up memory with file data. First stage will be like PS2: just read ahead. Done.  
*** Implement cache that reads while other threads don't (mutex) and fills up memory with file data. First stage will be like PS2: just read ahead. Done.  
**** Actually with the new tests this is useless. It will be useful though to have other-thread read ahead cache.
**** Actually with the new tests this is useless. It will be useful though for other-thread read ahead cache.  
*** Play with cache sizes to find the best default one. Nope. Get rid of basic cache.
** No More SDL!
** No More SDL!
*** Improve SDL audio output. done.
*** Improve SDL audio output. done.

Revision as of 12:00, 6 June 2010

Bluddy
Name Yotam Barnoy
Team Member since 2009-09-22
Working on PSP platform
PSP Optimization
Personal webpage/BLOG -
Email -

Worked On

  • PSP
    • Suspend/resume support
    • Plugin support (ELF loader)
    • Console-oriented virtual keyboard
    • D-pad directional support
    • Eliminating the evil undead flickering bug (it was a tough one)
    • Refactoring, redesign and cleanup

Working On

  • PSP Optimization
    • Rendering speedup - done
      • Made the wait for vsync not slow down the main thread by using callbacks. It turns out the callback mechanism that's tied to the GU is unable to call waitForVblank, but the regular callback mechanism can. This reduced the wait from 15ms on average to 3ms, part of which is, I think, due to the context switch to another thread when we notify the callback. Not a huge deal.
      • Also, because the callback runs at the highest priority, it's super accurate. We get perfect vsync now, which we didn't before (see the white fadeout in the FOTAQ intro).
    • Timidity
      • A lot of work. May improve speed over midi, but it'll be the last optimization.
    • MP3 using Media Engine - partly done.
      • I made a hack that'll do it, but unfortunately nobody knows how to use it for 22kHz mp3s, which is all of our MP3s. This means I need to reverse engineer the PSP FW in order to figure out how to do this properly (the PSP FW knows how to use the ME for 22 kHz files). So this is going to be hard too.
      • Check if cooleyes' sirens2 program can play 22kHz files properly. If it can, we're doing something wrong.
        • Can't compile the darn thing!
      • Another program is supposed to be able to handle it. Check it out. (ps2forum post)
        • Update: yes! It works.
    • Video Speedup
      • Smush can be sped up for aligned platforms, and VFPU cache can be used just for PSP
      • Check other codecs for possible speedups using VFPU/alignment
    • Tests
      • cached vs uncached access in memory
      • how long does changing thread priority take? 9-10us. Negligible.
      • how long does retrieving thread priority take? 5-7us.
      • how long do different length MS reads take? done. 1B = 1KB = 2.5ms. 2KB = 3.5ms. 10KB = 10ms. The more we read at once, the more efficient.
        • Wrong. fread already caches reads. I was testing it wrong (reopening files). 1B = 5us. 1KB = 1ms. 2KB = 2ms. There's a weird gap between 6-9KB where the time stays the same. Not sure. In general, no advantage to reading more.
      • how much reading is done by MP3 rendering/movie playback? How much will we need to cache? done. Reads chunks of 15-25KB preceded by small 4KB reads. Whole MP3 loads are a problem (200KB+).
      • Check fseek's time. About 1ms normally. To the end of the file is much more - 3 to 7ms.
      • how many ticks per second? Can we do it more efficiently than SDL? done. 1,000,000. Yes and we did. Getting time struct is wasteful. Went down from 9-14us to 1-2us.
    • Improve memcpy: alignment, rotation
      • Possibly use VFPU's cache for even better performance.
    • Caching of stream
      • Implement cache that reads while other threads don't (mutex) and fills up memory with file data. First stage will be like PS2: just read ahead. Done.
        • Actually with the new tests this is useless. It will be useful though for other-thread read ahead cache.
    • No More SDL!
      • Improve SDL audio output. done.
        • SDL blocks when outputting audio. This is the thread we do most work in, so don't block.
          • Unfortunately, it wasn't the kind of blocking that the name of the function would imply. It just means that it blocks if we try to play and the audio's already playing. This is still the only way to render sound properly, possibly due to tiny delays when context switching.
          • Our thread is still more efficient than SDL. Also, we can control the priority. We can also make this thread the consumer and another low-priority thread the PCM producer for possibly higher efficiency.
          • Check if the timer (which is under heavy use in CoMI) also causes the clicking issue in that game because of long sound rendering. Maybe we can play with priorities, or maybe we need to put context switches in the timer/music/main code.
            • Yes. The audio thread must have higher priority than the timer thread or we get clicking. We can probably go back to the previous way of doing audio (no blocking) if we want to fill more buffers before playing.
        • Also, SDL creates threads that ALL have VFPU bit set. That's not efficient when switching contexts.
        • Additionally, change priorities so that MP3/MIDI rendering runs at the same priority (which is fairer) while the callback itself runs at a higher priority. This could improve issues at 222MHz, such as freezing when playing several sounds in IHNM. If not, we may need to add a taskSwitch in the main ScummVM mixer code (too much work before a context switch).
        • Maybe add a task switch in the Mixer to allow main thread to do stuff if there's a lot of mixing (and it's all from memory).
      • Replace SDL Timer. Done
        • All we need is a simple sleeping thread
      • Remove SDL condition for PowerManager
        • If getting the threadId is fast, we can just get it on every criticalSection entry and register with our threadId. Every thread will mark itself as in a critical section after checking that the PowerManager hasn't notified of a suspend. The PowerManager will notify of the suspend, then loop once over the threadIds, then delay 100ms, then loop again over the threadIds. No mutex needed.
      • Replace SDL mutex. Easy.
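
The mutex-free PowerManager idea in the notes above could look roughly like the sketch below. This is only an illustration under assumptions, not the actual ScummVM code: `SuspendGate`, its method names, the fixed `kMaxThreads` slot count and the use of a slot index in place of a real PSP threadId are all hypothetical. It also swaps in a slightly different technique from the one in the notes: each thread registers first and then re-checks the suspend flag (using sequentially consistent `std::atomic` operations), which stands in for the "loop, delay 100ms, loop again" step.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: a suspend gate that needs no mutex.
// Names and the slot-per-thread layout are assumptions for illustration.
class SuspendGate {
public:
    static constexpr std::size_t kMaxThreads = 8; // assumed fixed number of thread slots

    SuspendGate() : _suspending(false) {
        for (std::size_t i = 0; i < kMaxThreads; ++i)
            _inCritical[i].store(false);
    }

    // Called by a worker thread before touching resources that must not be
    // used during suspend. Returns false if a suspend is pending.
    bool tryEnter(std::size_t slot) {
        _inCritical[slot].store(true);      // register first...
        if (_suspending.load()) {           // ...then re-check the suspend flag
            _inCritical[slot].store(false); // back out; the caller should wait and retry
            return false;
        }
        return true;
    }

    // Called by the worker thread when it leaves the critical section.
    void leave(std::size_t slot) { _inCritical[slot].store(false); }

    // Called by the power manager when a suspend is requested.
    void requestSuspend() { _suspending.store(true); }

    // The power manager polls this until no thread is inside a critical section.
    bool allClear() const {
        for (std::size_t i = 0; i < kMaxThreads; ++i)
            if (_inCritical[i].load())
                return false;
        return true;
    }

    // Called by the power manager after resume.
    void resume() { _suspending.store(false); }

private:
    std::atomic<bool> _suspending;
    std::atomic<bool> _inCritical[kMaxThreads];
};
```

In this sketch the power manager would call `requestSuspend()`, poll `allClear()` (sleeping between polls) and only then let the suspend proceed. A worker that gets `false` from `tryEnter()` would sleep and retry until `resume()` has been called. Whether this is cheaper than a real mutex on the PSP would still need to be measured.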

To Do

  • PSP
    • MP3 playback with Media Engine
    • Optimize speed in general
    • Optimize video playback speed
    • Use libTimidity for music
  • Generic virtual keyboard: take my keyboard and make it available to all. Involves switching from bitmaps to vectors.
  • Generic ELF loader