Saturday, March 5, 2011

Don't Crash When Reading Metadata

Each time a tooltip is shown in Dolphin or the Information Panel is enabled the metadata of a file must be retrieved:

History

When Dolphin has been written for KDE SC 3 this was straight forward: The class KFileMetaInfo has been used and the metadata was read in the context of the application within a short period of time. The fast retrieving was possible by using the hint KFileMetaInfo::Fastest that assured only up to 64 KByte of the file are read. This limit was sufficient for getting the most important metadata.

With the transition to KDE SC 4.0 KFileMetaInfo internally has been rewritten to use Strigi for getting the metadata. Like Dolphin and other components in KDE SC 4.0 also KFileMetaInfo and some metadata analyzers of Strigi had rough edges:
  • The KFileMetaInfo::Fastest hint simply has been ignored. Hovering a big video file resulted in reading the whole file. As this reading has been done in the context of the application Dolphin was completely blocked during that period of time.
  • When implementing the KFileMetaInfo::Fastest hint later it turned out that a few analyzers of Strigi simply decided to ignore this limit.
  • To bypass a blocking Dolphin I've moved the reading of the metadata into a thread. This worked well until it turned out that some analyzers are not reentrant and yet-another-workaround was required to prevent two threads using a similar analyzer.
  • Sebastian Trüg (although not responsible for this part of the code) investigated some cool work into KFileMetaInfo and found a way to let all Strigi analyzers respect the 64 KByte limit without changing the analyzers. I was very happy about this until it turned out that some analyzers crash when they get streams ending after 64 KByte...

Current State

The following things work now in KDE SC 4.6:
  • The quality of a lot of Strigi analyzers has been improved during the last years and most of them work very reliable.
  • The analyzers work fast by getting only 64 KByte to read.
  • In comparison to KDE SC 3 the metadata is read in a custom thread which makes the user interface even more responsive.
Still open:
  • Some analyzers are not reentrant. This is not a big showstopper as reading metadata in parallel is anyhow quite rare so the workaround to serialize the access won't result in a noticable slowdown from a users point of view.
  • Some analyzers crash (and hence also Dolphin crashes).
From a users point of view it is not important whether Dolphin crashes because of an analyzer or because of an issue in the Dolphin code. Although most of the analyzers are very stable in the meantime it is still a pain for users that e.g. have some Powerpoint-files where the corresponding analyzer simply cannot scope with the 64 KByte limit.

KDE SC 4.7

I understand that writing metadata analyzers can be very tricky, but I was always convinced that not crashing when reading data is not difficult. But no matter whether I'm right or wrong with this assumption: Fact is that more than 3 years after the release of KDE SC 4.0 a few analyzers still crash and seem to be without maintainers that care about this.

Well, with KDE SC 4.7 those crashes will not result in crashing Dolphin anymore. I've moved the reading of the metadata into a custom process. If an analyzer crashes then the worst thing that happens is that no metadata is retrieved. The nice thing is that this also solves crashes for other components that read metadata: E.g. the fileproperties-dialog or overwrite-dialog of kdelibs. In retrospective I should have done this already a lot earlier, but the fact that the old approach worked so well in KDE SC 3 just made me resist to use a dedicated process for this. In the meantime I'm convinced that this approach is the right way in the longterm: It is simply risky if the stability of an application or some kdelibs-dialogs depend from external plugins where the quality of the code varies a lot.

So let's hope this is my last metadata-related post and that this topic is solved finally :-)

Update: Carsten Pfeiffer pointed out that in KDE 3 the KFileMetaInfo::Fastest hint did not restrict the parsing to 64 KByte. Instead it was up to the analyzers itself how they provide metadata that can be calculated cheaply. Too sad that most of the current Strigi analyzers don't support this.

15 comments:

Anonymous said...

Thanks a lot for this.

The bug is _really_ annoying, I suffered from it a lot and already submitted it via dr konqi on bugs.kde.org. I am happy to see that this finally gets solved. Pity though that I'll either have to wait until summer or use git/svn versions.

Keep up the good work and thanks for dolphin.

uetsah said...

> KDE SC 3

Is it official terminology to retrospectively add "SC" to the names of old KDE releases which were released when the software compilation was actually still called "KDE" or "K Desktop Environment", and not yet "KDE SC"?

It looks a little weird.

Anonymous said...

Thank you for this! Much appreciated.

maninalift said...

Presumably this is not an issue if the files have already been indexed. Right?

mwiesweg said...

As much as I appreciate pragmatic solutions: It is possible to fix crashes in code, as long as the code is maintained (maintainer + sample file + reproducible crash = fix). Crashing, unmaintained code should be removed from KDE SC (at least blacklisted), better not having PPT files indexed than a crashing dolphin?

Peter Penz said...

@uetsah:
> Is it official terminology to
> retrospectively add "SC" to the
> names of old KDE releases which
> were released when the software
> compilation was actually still
> called "KDE" or "K Desktop
> Environment", and not yet "KDE SC"?

I don't know... It seems that "SC" is not used at all anymore during the latest releases and I agree it looks somehow strange. I'll probably just write "KDE 4.x" in future.

@maninalift:
> Presumably this is not an issue
> if the files have already been
> indexed. Right?

Right :-)

@mwiesweg:
> As much as I appreciate pragmatic
> solutions: It is possible to fix
> crashes in code, as long as the
> code is maintained (maintainer +
> sample file + reproducible crash
> = fix). Crashing, unmaintained
> code should be removed from KDE
> SC (at least blacklisted), better
> not having PPT files indexed than
> a crashing dolphin?

I generally agree that the analyzers should be fixed in the first place, but not all analyzers are part of the KDE SC. Any kind of application may install a custom analyzer (e.g. like done by Amarok). Also blacklisting analyzers means having no metadata for specific filetypes at all, but some analyzers works well but crash only on some rare cases like corrupt files (e.g. the ppt-analyzer works and only fails "sometimes").

Andreas said...

Creating a process is relatively cheap on Linux, IIRC somewhere around 1-10 million CPU cycles. KIO regularly creates many processes, it usually doesn't hurt performance and if it does it's mostly due to further setup work after process creation. We can do something about that.
I am convinced that a process to generate previews is a good idea.

jospoortvliet said...

Maybe it would be good to at least ensure the process will stay around and gets re-used (if you don't do that already)...

About the KDE SC etc story:

We don't change the name of older releases. Dolphin once ran on KDE 3, keep that name. From KDE SC 4.0 onwards we say SC if we mean both workspaces, apps & platform. Most of the time you don't need to say SC. Dolphin ships as part of Plasma Workspace 4.6 and runs on KDE Platform 4.6. So in Plasma 4.7 Dolphin is fixed. Or Dolphin on Platform 4.7 will be fixed, both fine.

Carsten Pfeiffer said...

Thanks for caring about this, Peter

AFAIR, KFileMetaInfo and its plugins weren't actually restricted to read "the first 64K". It was rather a hint to the analyzers that they shall provide only that information that can be calculated cheaply.

Depending on the file format, that information may be stored in the beginning or even somewhere at the end.

Cheers,
Carsten

Carsten Pfeiffer said...
This comment has been removed by a blog administrator.
Anonymous said...

Cool feature.

Also sounds like a perfect app to protect with AppArmor & Co. to prevent things like this: http://www.thesecuritysamurai.com/2011/03/02/usb-as-a-vector-of-network-attack-by-diego-ramirez-soc-analyst/

Peter Penz said...

@jospoortvliet: Good to know, will take care to use the terms correctly in future :-)

@Carsten Pfeiffer:
> AFAIR, KFileMetaInfo and its
> plugins weren't actually restricted
> to read "the first 64K".

Thanks for clarification, this is quite interesting. Too bad that this approach does not seem to be respected by most of the Strigi analyzers...

moltonel said...

Nice :)

While you're at it, as anonymous said, it would be great if the analyzer was sandboxed so that we're less vulnerable to the filepreview exploits that are currently fashionable.

I wouldn't go with apparmor or selinux though, since they're less widely available than seccomp. Seccomp is ideal for this job : just open a pipe between dolphin and the analyzer before you fork()&&seccomp(), and then you can feed files to the anlyzer in a crash- and exploit-safe way.

Anonymous said...

Peter, I just noticed that Dolphin could crash when showing the metadata of some .zip files. Should I file a bug about this specific indexer ?

(using Archlinux)

Cheers ; thanks again for your amazing work !

PS : sorry for being annoying, but I think this is a problem if people who use a distro based on KDE 4.6.x always get this issue. This is pretty serious (filemanager crash !). What do you think ? I kind of tend to think stability is the most important feature :-)

PPS : would love to support you financially (like what is possible with Aurélien Gâteau or Raphaël Hertzog for instance)

Risto H. Kurppa said...

Running Kubuntu 11.04 with KDE 4.6.something, bumped into this, installed dbg packages, the bug reporting tool suggested me some similar bugreps, I found out that it's already fixed -> upgrading to KDE 4.7.x -> FIXED. Gotta love Open Source :) Thank you for your work!!!!!