Thursday, September 9, 2010

Performance issues, one script, and call for testers


Since kde4.5 came out, a number of users with various NVidia graphics cards have been suffering from performance issues when using the oxygen style:
  • lag when scrolling large views, e.g. in Dolphin
  • system becoming unresponsive over time
  • etc.
It is hard for me to anticipate such issues, as well as to fix them, since I have an integrated Intel graphics card on which none of them show up (believe me, my oxygen is snappy; I would not commit the changes otherwise). The culprit appears to be the number of pixmaps that oxygen allocates and stores in caches to perform its animations: the fuller the caches become, the more unresponsive the system gets. I'm not 100% sure about it, but so it seems.

So I've been trying to optimize the code and reduce the number of allocated pixmaps. It's a bit like shooting in the dark, since I can't see much difference here from one change to the next.

Last night I made a decisive step in this direction by applying some discretization to the various animations, effectively reducing the number of cached pixmaps by a factor of 10 to 20 (it's configurable), normally without any noticeable difference to the eye.
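
To illustrate the idea (this is not the actual oxygen code, which is C++/Qt), here is a minimal Python sketch of how snapping an animation's progress to a fixed number of steps bounds the number of cached pixmaps. The names `quantize`, `PixmapCache` and `cached_entries` are hypothetical, and a string stands in for a rendered pixmap.

```python
def quantize(progress, steps):
    """Snap an animation progress in [0, 1] to one of steps + 1 levels."""
    clamped = min(max(progress, 0.0), 1.0)
    return round(clamped * steps)


class PixmapCache:
    """Toy cache keyed on (widget state, quantized progress)."""

    def __init__(self, steps):
        self.steps = steps
        self.entries = {}

    def get(self, state, progress):
        level = quantize(progress, self.steps)
        key = (state, level)
        if key not in self.entries:
            # Stand-in for an expensive render into a pixmap.
            self.entries[key] = f"{state}@{level}/{self.steps}"
        return self.entries[key]


def cached_entries(steps, frames=256):
    """Run one animation of `frames` frames and count the cached pixmaps."""
    cache = PixmapCache(steps)
    for i in range(frames):
        cache.get("hover", i / (frames - 1))
    return len(cache.entries)
```

In this toy model, a 256-frame animation fills 256 cache entries with `steps=256` but only 11 with `steps=10`, since every frame that falls on the same level reuses the same entry.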

Now, well, I need testers (with an NVidia graphics card, and with a driver for which some of the problems above are present).

For users willing to help, I wrote a script available here that allows one to check out, configure, compile, and install the oxygen sources from kde svn trunk, without the need to compile anything else from kde.

It should work against any kde4.4 and kde4.5 version, as well as trunk, naturally.

To use the script one must:
  • create a clean directory
  • run the script and follow instructions
One needs to have the necessary development packages installed for the sources to compile. They are: gcc-c++, subversion, cmake, kdebase-workspace-devel, libxrender-devel, libx11-devel (note that the names may differ from one Linux distribution to another).

Once the code is successfully compiled and installed, any newly launched application should use the latest oxygen as opposed to the one provided by your distribution (which gets erased in the process).

I'll be available for debugging, in case of trouble.

If the patches I committed to trunk recently actually fix the issues above, I'll backport them to the kde4.5 branch (and to oxygen-transparent), so that hopefully kde4.5.2 can benefit from them.

Notes:
  • kde anonymous svn is sometimes not very responsive, and the checkout or update may fail. Just take a deep breath, wait 5 minutes, and retry.
  • in case one wants to revert to the 4.5 version of oxygen, one can run the same script (in a separate directory), with the additional argument --branch 4.5
Edit:
  • Since the feedback on the recent commits is largely positive, last night I backported the changes to the kde4.5 branch, so that hopefully this oxygen+nvidia issue will be fixed in kde4.5.2. You can run oxygen-setup.pl --branch 4.5 to get this code instead of the one from trunk (which has many other unrelated changes).
  • I also backported the change to oxygen-transparent

66 comments:

  1. i have an nvidia 8800 GT with nvidia official driver from kubuntu and i did have the problem described

    after i installed the script, the situation is much, much better. after 100 clicks (opening different movies, music and large folders), it did lag only twice (before it did lag much more, i kinda got used with that)

    so yes, for me it worked.. if something changes, i will come back here and tell u. thanks

  2. I'm also testing on an older but still supported 7600GT.
    What if I disable the pixmap cache? Will this result in slower oxygen animations? If so, that is ok since I don't use them much.

  3. On Kubuntu maverick with all updates applied, I get the following when running the script:


    perl oxygen-setup.pl
    --- oxygen easy setup script

    --- checking out source code
    cd /home/lindsay/projects/oxygen
    --- common
    svn co svn://anonsvn.kde.org/home/kde/trunk/KDE/kdebase/workspace/libs/oxygen
    svn: No such revision 1173737
    system svn co svn://anonsvn.kde.org/home/kde/trunk/KDE/kdebase/workspace/libs/oxygen>&1 failed: 256

    I've seen this happen erratically with kdesvn-build. No one's ever been able to satisfactorily resolve it.

  4. Maybe temporary server issue?
    I've successfully used the script on 2 opensuse machines so far.

    Retry a bit later

  5. Yes, I just retried now and it was successful. Seems more responsive, I'll see how it goes.

    GeForce 9400GT

  6. Thanks for the script! That is indeed useful.

  7. Does anyone have problems with keyboard input after that?
    Sometimes keystrokes are ignored, this happened in mplayer and kopete so far.
    I've only installed this script.

  8. And here I thought I was the only one experiencing this. I have a laptop with a 7600 Go, running the proprietary 256.53 driver from Ubuntu maverick's nvidia-current package.

    Sometimes Yakuake takes ~3 seconds to open up after I hit the global shortcut.

    Will test the script now.

  9. @kriko,
    when in doubt, log out and back in when you have a chance.
    Really, the change should be unrelated, but ...
    And check your cpu usage (look for X taking 100% of cpu)
    And, well, keep me posted :)

  10. Hi,
    I tried your latest build, and well, the laggy scrolling problem is still not resolved. But other things seem to be a little bit smoother than before.

  11. @Nece228
    yeah I'm not surprised. The scrolling is more of a painting performance issue (painting these shadows seems slow on some nvidia). My latest changes do not address that.
    I have more hopes concerning performance deterioration over time (and overall memory footprint)

  12. Tested it on GeForce 8400 GS. Performance doesn't drop over time but there is still a problem: X uses 15% - 25% cpu when I move windows.

  13. @RG:
    "X uses 15% - 25%"
    This I see too, but was assuming it is normal behavior.
    Is it different when you use other styles ?
    Do you have kwin effects enabled ?

  14. Hi,
    thanks for these improvements, Hugo! I've tested it on kubuntu maverick with up-to-date packages and looked at the pixmap count in xrestop. The pixmap count is now half of what it was before for plasma-desktop.

  15. Hello Hugo!

    I also had the problems you described. I heavily use KDevelop and Kile at the same time for my master's thesis. After a while the system lagged and was not usable anymore!

    I have tried your patch the whole day now and it seems that the problems are nearly gone. It's not as snappy as I want it to be, but it is definitely a huge improvement! Thanks man! This should go into KDE SC 4.5.2.

    Here is some information about my system:
    - nvidia geforce 8400m gs
    - ArchLinux KDE SC 4.5.1 binary packages
    - nvidia 256.53
    - xorg-server 1.8.1.902

  16. On Gnome, with compiz enabled and all effects on, < 15%.

  17. in fact, Gnome + compiz, average cpu load : 7%.
    Kde was unusable before executing your script, now it lags sometimes but it's usable. Thanks a lot, I'm back on my favorite desktop.:-)

  18. I will try your changes. Note that what I have found to be effective against this lagging is flushing the pixmap cache of the nvidia card using:

    nvidia-settings -a PixmapCache=0
    nvidia-settings -a PixmapCache=1

    If things start to get slow, I do this, and the lagging is gone.
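
For those who want to trigger this flush from a script, here is a minimal Python sketch that simply runs the two commands above in order. The function names `flush_commands` and `flush_pixmap_cache` are hypothetical; the only real interface used is the `nvidia-settings -a PixmapCache=...` call quoted in this comment.

```python
import subprocess


def flush_commands():
    """Return the two nvidia-settings calls: cache off, then back on."""
    return [["nvidia-settings", "-a", f"PixmapCache={value}"] for value in (0, 1)]


def flush_pixmap_cache(run=subprocess.run):
    """Toggle the driver's pixmap cache off and on again to empty it.

    `run` can be swapped out for testing; by default it executes the
    commands and raises if nvidia-settings reports a failure.
    """
    for command in flush_commands():
        run(command, check=True)


# Usage (requires nvidia-settings and a running X session):
# flush_pixmap_cache()
```

Passing a different `run` callable makes it possible to check which commands would be executed without touching the driver.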

  19. Would it be possible for you to post a diff from either KDE 4.5.1 or the KDE 4.5 branch in kdesvn? I'd find that easier to deal with.

  20. @skitterman

    I'd rather not do that. I usually commit the change directly to SVN myself, and generating diffs every time is kind of a pain. I'd rather do that only once, when I backport the change (which, judging by the number of positive comments already, should occur rather quickly).

    Sorry

  21. @Joris

    Yes, I'm aware of the fact that flushing the nvidia pixmap cache does the job. That's what made me believe that caching too many pixmaps is part of the issue.
    My latest commit strongly reduces the number of pixmaps needed for the animations, so it should make your solution unnecessary. (I wish the nvidia drivers would do this cleanup themselves, if this is a limitation of theirs.)

  22. For those also experiencing this issue you may want to CC yourself to the original bug here:
    https://bugs.kde.org/show_bug.cgi?id=242653

  23. When installing with your script, how easy is it to clean up after it? I've always been against manual compilation and instead prefer leaving things up to the package manager (in my case, Portage, for which it's quite easy to hack together an ebuild).

  24. @Moult
    Well, not easy. Uninstalling is one thing, but you also need to get back the old oxygen, which got erased in the process. This requires either
    - re-installing the original package, or
    - re-running the script (in a different directory) with the additional option "--branch 4.5"

  25. Things seem a lot snappier for me as well. I am using a Quadro NVS 140M laptop card with the latest stable drivers 256.53 from openSUSE's Nvidia packages.

  26. So I ran the script and it got a lot better, but it seems to me that animations are now less fluid? I'm not 100% sure about that one though, since it's a very subjective experience.

    Looking at the output of xrestop, is the value I should be monitoring kwin's total memory use? It's at ~ 9mb after an uptime of 39 hours. CPU usage is within reason.

    plasma-desktop seems to progressively grow though. It was at 16mb at boot and now it's at 34.5mb, but perhaps that's unrelated.

  27. Animations are fluid here, but something is still eating memory away (not cached or buffered) and after days of usage swapping occurs.

  28. @Zorael
    To be honest, I have the same feeling too, but as soon as I try looking at it more closely, it sort of disappears.
    In any case you can try to play with the "AnimationSteps" hidden parameter that I added to tune the thing.
    It must be set in ~/.kde4/share/config/oxygenrc (or ~/.kde) under the [Style] section.

    The default value is
    AnimationSteps=10

    The maximum (which is laggy), would be:
    AnimationSteps=256

    You can set anything in between, and find the value you prefer in terms of smoothness, and that doesn't make your system freeze after a while. (I'd try 20, or 50. That should be enough).

    concerning the growing of plasma usage: plasma has its own caches too (on top of the one from oxygen, which should be largely unused). So oxygen is likely off the hook.

    To make sure of that, you can restart plasma, using a different widget style:

    kquitapp plasma-desktop
    plasma-desktop --style plastique &

    And see if the memory keeps growing.

    of course you lose the nice oxygen menus when doing so, but this is just a test.
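
Editing oxygenrc by hand works fine. For those who would rather script the AnimationSteps change, here is a minimal Python sketch that sets a key under a given section of an INI-style text. The helper `set_config_value` is hypothetical and only handles plain `key=value` lines, so back up the real file before trying it on `~/.kde4/share/config/oxygenrc`.

```python
def set_config_value(text, section, key, value):
    """Return `text` with `key=value` set under `[section]`.

    The key is replaced if it already exists in the section, appended to
    the section otherwise; the section itself is added if it is missing.
    """
    header = f"[{section}]"
    out = []
    in_section = False
    done = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without having seen the key: add it.
            if in_section and not done:
                out.append(f"{key}={value}")
                done = True
            in_section = stripped == header
        elif in_section and not done and stripped.split("=", 1)[0].strip() == key:
            line = f"{key}={value}"
            done = True
        out.append(line)
    if not done:
        if not in_section:
            out.append(header)
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

A typical use would be to read the file, call `set_config_value(content, "Style", "AnimationSteps", 20)`, and write the result back; newly launched applications should then pick up the new value.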

  29. @kriko
    The problem that I tried to fix concerns GPU memory, not RAM. So the growth of RAM usage until swapping might well be something else, and even unrelated to oxygen.

    The output of 'top' when your system starts swapping would help.

    Also: did you already have this behavior before testing my changes ? Is it still there if you (temporarily) use another theme ?

    Last time I checked oxygen against valgrind (some months ago), to look for memory leaks, I did not find any.

  30. All good so far. Indeed, seems snappier. Will these changes be in oxygen-transparent too?

  31. great! i'm looking forward to use nvidia drivers again and not nouveau.

    Ah, and i have the swapping problem too (but also with nouveau). but top never gave any interesting output. It tells me that no app uses much ram, but the total ram usage is through the roof

  32. @beat wolf,
    Is the memory increase and swap still there when you use another theme ?

  33. I've just seen the keyboard not responding symptom on another computer - opensuse 11.3 with 4.5, and it happens right after applying the oxygen patch.
    Now I've updated packages and reapplied the patch, will see how it goes.

    One way to reproduce:
    - open mplayer, pause it
    - open dolphin
    - click on dolphin, mplayer, desktop, repeat cycle once more
    - click on mplayer and press e.g. space - playing will not resume (same goes for other apps)

  34. @kriko
    The problem you mention should be quite unrelated to my changes to the animations and pixmap cache.
    Now it might well be another issue that has been added to oxygen in trunk (though I have not been able to reproduce it so far).

    If the issue was not there with kde4.5 (unpatched), you can try running the script with '--branch 4.5' as argument. That should give you 4.5 + my animation patch (and the issue should not be there).

    I'll keep investigating in the meanwhile

  35. I've had problems with both the mouse and keyboard not responding periodically. I'll see if it is still there after switching to the KDE 4.5 branch.

  36. @toddrme2178, kriko
    Damn it. Another problem that I cannot reproduce here.
    What happens ? Do they (keyboard and mouse) recover after some time ? Or do you have to reboot ? How often is periodically ?

  37. It would happen every few minutes, and last maybe 3-5 seconds. The mouse and keyboard became inactive simultaneously. I am not entirely sure it is a KDE issue, I will know when I reboot next time.

  38. Hi, I've built kde 4.5 from trunk, and I must say things seem a lot better. While my machine is actually an intel based laptop (4500MHD/i915), and performance hasn't changed /much/, memory use has /greatly/ improved. I've been told that the intel gem driver will store as many pixmaps as you allocate in system memory; much of it will show up as "cached" memory that can't be freed while the app using it is running.

    For a long time I've used the same basic kde session setup. KDE will load a crap load of applications at login, and up till recently would easily consume over 1.5G to 2G of ram. With this change, and a switch to Qt 4.7 it seems that memory use is back down under 1G.

    I have yet to test this change on my nvidia card, but I will soon and report back. Thank you so much for working on this.

  39. After days of testing seems cool.
    However I'm still trying to figure out why such high memory usage and swapping. any tools for that?

  40. @Tomasu

    I heard the same story about intel (which is what I run here), so your observations make sense, and I have the same here. In the past I did not worry that much about RAM, since computers have so much of it nowadays.

    Waiting for your report on Nvidia.

    @kriko
    Well, if 'top' doesn't show any application in particular I'm a bit out of ideas. Again, do you have the same high memory and swapping issue if you use another style ? Another window decoration ? Another window manager (e.g. compiz instead of kwin) ?
    Do you use like 'one application' that you would keep running for ages, or do you keep opening, closing them randomly, before the swap starts ?

  41. but good to know that i'm not the only one with that swapping issue where no app comes out in particular with top.

    I for example have like 2.8GB ram usage, but if i calculate the ram usage, i have no more than 1.3 GB. Some might be disk caches, but it is growing over time.

    here is a screenshot to show the problem (that's after less than a day of usage). http://www.fryx.ch/Asraniel/ramUsage.png

    i use nouveau. just giving some input in case this helps to solve a puzzle.
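
One way to separate disk caches from the rest is to read the kernel's own totals instead of summing per-process numbers. Below is a minimal Python sketch that parses text in the `/proc/meminfo` format and computes the classic "used, excluding buffers and cache" figure. The function names are hypothetical, and the result is only a rough indicator: memory held by a graphics driver may still be counted as cached.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # Most entries are reported in kB; the unit suffix is ignored.
            values[name.strip()] = int(fields[0])
    return values


def used_excluding_cache(info):
    """Memory in use once free memory, buffers and page cache are removed."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


# Usage on a Linux system:
# with open("/proc/meminfo") as handle:
#     print(used_excluding_cache(parse_meminfo(handle.read())), "kB")
```

If this figure keeps growing over days while no process shows a matching increase in `top`, that points away from ordinary disk caching.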

  42. @Beat Wolf
    I've always been confused by the output of top and ksysguard, and notably whether virtual memory is a bad thing or not. But if it is, doesn't your ksysguard screenshot say that some Java process and Amarok eat up all your memory ?
    What's this java thing ? Actually ?
    (definitely oxygen unrelated)

  43. PS: from what I recall, I do think that virtual mem is bad.

  44. and again: Does the same behavior occur if you use a different style. (really, getting an answer on this would tell me whether I am off the hook or not).

  45. I see kopete uses ~700MiB and amarok around 1GiB. This is crazy.
    I don't think it's oxygen related, though I haven't tested with other styles, since there is no contemporary and elegant one besides oxygen to replace it.

  46. Yeah! Really improved performance. But I noticed that when using the Blur effect, cpu hits 30%. Just mentioning it, though I'm sure it has nothing to do with Oxygen :-)

  47. Now I'm back on default oxygen in 4.5 and yes - it's laggy, I get "reserve_memtype failed" in dmesg after time.
    Switching e.g. mplayer to fullscreen creates a noticeable delay in seconds, probably because vid mem is exhausted.

    Unfortunately with updated oxygen I get the mouse / keyboard not responding problem even with branch 4.5.

  48. @kriko

    Argh. The keyboard/mouse lock really s..ks. And I am totally clueless about what might cause that ... I guess I'll look in more detail at which commits have been made to the 4.5 branch. I really don't think that the changes to the cache handling can create that.

    For the record:
    - you confirm that it does not happen with 'native' oxygen@4.5 (the laggy version)
    - with the patch, do you still have locks if you disable animations ?

  49. - I can confirm it doesn't happen with native oxygen
    Now I'm testing with the branch 4.5 patch, and will keep using it for a few days without updating the machine to see if it resurfaces
    If it does, then I'll disable animations.

  50. i was experiencing bad performance with my 8800GT with a e2140 at 3ghz. I don't know if this can help, however adding these lines to the xorg.conf gave me a more usable environment:

    Section "Screen"
        Identifier "Screen0"
        Device "Device0"
        Monitor "Monitor0"
        DefaultDepth 24
        Option "AddARGBGLXVisuals" "True"
        SubSection "Display"
            Depth 24
        EndSubSection
    EndSection

    Section "Extensions"
        Option "Composite" "Enable"
    EndSection

    The important lines are:
    - Option "AddARGBGLXVisuals" "True"
    - the extensions section

  51. This doesn't fix the problem for me I'm afraid.

    I'm running Kubuntu Maverick on NVIDIA 8400M with KDE 4.5.1.

    When I start KDE it's fine but then it gets slower the more I do, with xorg using more and more cpu.

    Things can be reset using the following but it only works for a while.

    nvidia-settings -a PixmapCache=0
    nvidia-settings -a PixmapCache=1

    It feels like with the patch it takes longer to slow down, about 1 hour compared to 30 mins before, but it all depends on what you are doing.

  52. @Nick
    Sad to hear ...
    Question: how does your system behave when you use another style ? (e.g. plastique, compared to patched oxygen) I'm asking because I heard plasma (which also caches quite a few pixmaps, independently from oxygen) also suffers from a similar issue ...

  53. I finally have a working desktop. Thank you for all your hard work.

    I'm running KDE 4.5.2a in Kubuntu Maverick 64bit with nvidia 260.19.06.

    KDE 4.5.2a includes your patch and I think that, combined with changing to the Plastik Window Decoration, has made the most difference (I'm still using the Oxygen style).

    I have removed all the settings from my xorg.conf except for Option "PixmapCache" "1" which is definitely needed. Without it xorg still causes a slow down.

    Konsole has been completely unusable for me for a while now. I think it's the new nvidia driver that fixes this problem, so konsole is now usable again.

  54. KDE or Oxygen(i dont know) team should testing kde on nvidia cards before release. Your script dont work on all geforce gxxxm on my card g103m same situation on my girlfriend g105m and my brothers 7100 gs too... from kde 4.3 i have problem with nvidia and kde, i love kde but i cant work on them. Problem is NVIDIA. You should thinking about optimalization kde for nvidia cards.

    PS sorry for bad english but i hope you understand me.

  55. Please, I wait for kde 4.6 but guys making good oxygen style i dont have problem with bespin or qtcurve, only with oxygen.

  56. @Piotr.

    I'd gladly love to test on nvidia card if I have one. You buy me one and I'll test. Besides, many people here (and elsewhere) did report improvement (with the script and with kde4.5.2) on their nvidia cards. Does that mean that I should test with all cards ? And all driver versions ? You understand that this is not possible. You, on the other hand with your card and your driver, are welcome to join and test oxygen before release. That is how open source code works. I am not blaming you for not doing it, but don't blame me for not being able to test all possible configurations. We're not a company that sells products. Fair enough ?

    On your comment about bespin and QtCurve, well, this is different code, they both have other issues (and their respective authors will admit it).

    on the kde4.6 release improvement, I must admit, I am out of ideas on how to fix this better than I have already done. So this is unlikely to happen. Yes I am working hard on delivering a good oxygen style (in fact since kde4.4, and some others did before me). If this were not the primary motivation of each of us, I guess we would have no point in doing that anyway. Does that make sense ? So the comment is somewhat irrelevant.

    I have read reports that nvidia drivers (some of them) do leak in the GC pixmap cache. This is not oxygen's fault. Maybe your card/driver is in this category (obviously not all of them are). Oxygen does use pixmap cache, because it is available, there is no 'hack' around it. We won't revert this decision because not every driver honors this availability properly. Other styles made different choices, good for them. (again, they have different issues).

    Sorry if I can't help more here. Limited manpower, limited hardware, limited amount of (free) time dedicated to the job. Feel free to join and help making kde+nvidia a nicer experience.

    Oh and finally, doesn't your comment
    "i dont have problem with bespin or qtcurve" contradict "from kde 4.3 i have problem with nvidia and kde" ? Not willing to be picky, but the clearer the statements, the easier for us to try to help.

  57. PS: (Piotr again).
    Did you try the suggested solution above, namely:

    nvidia-settings -a PixmapCache=0
    nvidia-settings -a PixmapCache=1

    If this does help (for some time), then I guess this proves oxygen is not to blame for the issues you have.

    (My fix does not try to fix 'that'. It tries to limit its occurrence by being less resource hungry.)

  58. "'d gladly love to test on nvidia card if I have one. You buy me one and I'll test. Besides, many people here (and elsewhere) did report improvement (with the script and with kde4.5.2) on their nvidia cards. Does that mean that I should test with all cards ? And all driver versions ? You understand that this is not possible. You, on the other hand with your card and your driver, are welcome to join and test oxygen before release. That is how open source code works. I am not blaming you for not doing it, but don't blame me for not being able to test all possible configurations. We're not a company that sells products. Fair enough ?
    "

    Yep, sorry :) I didnt want to give you offense ;) I understand you and another sorry

    Do you have link for report about nv card which dont support GC pixmap cache?

  59. Aren't you posting this (updating) for the 10th time now? ;)

    thorGT

  60. Thanks for pointing out where this problem stems from. I've been looking for alternatives to KDE as it has been basically unusable for quite awhile now, with every user interaction going down the lag hell drain after a short time being logged in.

    I never realized that this was an issue with Oxygen. I totally blamed KDE as a whole. To be able to simply switch the style and get back being able to use the computer is great! I just downloaded bespin and switched to that, and so far all responsiveness has been restored!

    I think it would be a good idea to not have Oxygen as the default style seeing as how most users would have no clue about this bug. They'd be much more likely to just discard KDE as being extremely slow and laggy, while that's not the case.

    So happy to have gotten back my system :) Thanks for the tip!

  61. @Daniel:

    Let's get it clear: the blame is not on KDE, nor on oxygen, it's on crappy graphics card drivers. That bespin was able to work around it is a good thing, but does not make Oxygen a lesser style.

    Oxygen will remain kde's default style notably because your situation is (fortunately) not the majority. I would notably never commit code that renders *my* system(s) all sluggish.

    Finally this situation has supposedly improved quite a bit for more recent versions of KDE (and oxygen), as well as GC drivers.

    For my curiosity, which KDE version are you using ?

  62. Sorry if I sounded like dissing Oxygen. I think it's the nicest style ever on KDE, much nicer than Bespin. It's only that if it cripples the computer when you have an nVidia card (AFAIK the favorite discrete brand among Linuxers, as their support is much better than for ATI cards), then I just think that would be a pretty good reason for not using Oxygen by default, at least when nVidia is detected. AFAICT, all other styles work fine on nVidia, it's just the Oxygen one.

    I'm currently on KDE 4.6. I wanted to ditch KDE altogether seeing as how slow it was on 4.5 and previous 4.x versions, but then I discovered that Linux Mint's KDE version just came out with the 4.6 version, so I tried it out to see if it was any less laggy (noticing improvements to KWin among the feature/changes list).

    As it turned out, it was just as laggy and unusable, so I gave it one more Google search and ended up here. Switching to a different style so far works perfectly. All UI operations and Compiz stuff is suddenly very very smooth :)

    So all in all, I'm just one happy camper to have found this thread. Of course, should Oxygen be fixed, I'd switch right back to it as it's really very nice on my eyes, but until then bespin will do just fine :)

  63. @Daniel

    Well, I do have an nvidia at home, and none of the problems that you report (definitely not with kde4.6) ...

    So then we could go discuss the drivers you have, the type of graphics cards, etc., but this is not the topic of this post.

    It is just to illustrate that not all NVidia have issues with oxygen, and that there is no reason for not having oxygen the default.

    There are other tricks elsewhere to fix things with nvidia: you can turn off animations in oxygen, you can periodically flush your GC pixmap cache (I think that's the culprit), or yes, you can use another style :)

