Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: "Torsten Kaiser" <just.for.lkml@xxxxxxxxxxxxxx>
- Date: Sun, 30 Sep 2007 18:19:19 +0200
On 9/30/07, Tejun Heo <htejun@xxxxxxxxx> wrote:
Hello, Torsten.
Torsten Kaiser wrote:
On 9/28/07, Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx> wrote:
So in case of -rc3-mm1 I'm pretty sure that it works.
That's still the case.
Ah... that's weird. It would be much better if -rc3-mm1 is broken too. :-P
It would be much better if it would break all the time. Then bisecting
would not be guesswork. :-P
Not completely sure is if 2.6.23-rc7-sglist kernel works. I booted
that 9 times, but from a quick look in /var/log/messages, I might not
have hit the "correct" situation to trigger the error.
That kernel is vanilla 2.6.23-rc7 plus the patch from
http://www.kernel.org/pub/linux/kernel/people/tomo/misc/v2.6.23-rc7-sglist-arch.diff.bz2
( http://marc.info/?l=linux-ide&m=119055574826083&w=2 )
That is no longer the case. Yesterday this kernel did also show the failure.
The error looked a little bit different, but happend at the same
location during the bootup.
Sadly dmesg overflowed and I was not able to capture the first error.
[ 53.462632] Freeing unused kernel memory: 332k freed
ite cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 77.170903] ata1.00: exception Emask 0x40 SAct 0x1 SErr 0x0 action 0x2
[ 77.170905] ata1.00: irq_stat 0x00020002, SGT no on qword boundary
[ 77.170908] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0
cdb 0x0 data 4096 out
[ 77.170909] res 50/00:00:af:ea:42/00:00:25:00:00/e0 Emask
0x40 (internal error)
Hmm... SGT not on qword boundary? Please add the following to the end
of sil24_port_start() and report successful and failed kernel boot log.
printk("XXX sil24 cb=%p cb_dma=%llx\n",
cb, (unsigned long long)cb_dma);
Added to my current "2.6.23-rc4-mm0.5"
I will report it's output.
Also, does 'dmesg -s 10000000' give full dmesg?
That boot ended in a minimal initrd environment that normally only
starts the RAID5 and then opens contained encrypted real root.
I was just able to push the output from dmesg through the serial link,
but had no man pages to tell me about -s ...
And that kind of error was until now a one-of-a-kind one. All other
errors where not "internal error" but "timeout".
But one time I had another SGT related error:
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: exception Emask 0x20
SAct 0x1 SErr 0x0 action 0x2
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: irq_stat 0x00020002,
PCI master abort while fetching SGT
Sep 11 19:19:24 treogen [ 33.340000] ata1.00: cmd
61/08:00:09:d6:42/00:00:25:00:00/40 tag 0 cdb 0x0 data 4096 out
Sep 11 19:19:24 treogen [ 33.340000] res
50/00:00:af:ea:42/00:00:25:00:00/e0 Emask 0x20 (host bus error)
What I find kind of interessing is, that while I got three different
error codes the cmd part of the output was always the same.
Is it possible that the bug is much older, and only some change in
rc3-mm->rc4-mm makes it visible that the initialization of the SiI3132
is incomplete?
I can't tell but there is a pretty large userbase of sil24/32 and you
seem to be the only one to report this problem yet. I think it might be
coming somewhere else than libata or sata_sil24 itself. Hmmm... It
would be really great if you can break -rc3-mm1 too. Correct behavior
for something like that for just one -mm version sounds very weird.
It's not just 2.6.23-rc4-mm1. All -mm's after rc4 are broken for me.
Confirmed breakage on -rc4-mm1, -rc6-mm1 and -rc8-mm1. I'm just
narrowing on rc4-mm1 because that was the first version to break.
I'm currently trying to bisect 2.6.23-rc4-mm1. Here is the current status:
[the 2.6.23-rc4-mm1 series-file has 2013 lines]
Up to (incl.) x86_64-convert-to-clockevents.patch (line 747): 2 good boots
Up to (incl.) x86_64-cleanup-struct-irqaction-initializers-patch
(line779): 2 good boots
Up to (incl.) slub-optimize-cacheline-use-for-zeroing.patch (line
1045): 1 failed
Up to (incl.) fix-discrepancy-between-vdso-based... (line1461): 1 good, 1 failed
Next try: up to patch fs-remove-some-aop_truncated_page.patch
That means from the patches added to the rc4 variant of the mm-kernel
the following are remaining:
git-xfs-build-fix.patch
enforce-noreplace-smp-in-alternative_instructions.patch
paravirt-fix-preemptible-lazy-mode-bug.patch
i386-apic-fix-4-bit-apicid-assumption-of-mach-default.patch
fix-the-max-path-calculation-in-radix-treec-update.patch
mm-no-need-to-cast-vmalloc-return-value-in-zone_wait_table_init.patch
introduce-write_begin-write_end-aops-fix2.patch
implement-simple-fs-aops-fix.patch
ext2-convert-to-new-aops-fix2.patch
ext3-convert-to-new-aops-fix-fix.patch
ext4-convert-to-new-aops-fix-fix.patch
gfs2-convert-to-new-aops-fix.patch
reiserfs-convert-to-new-aops-fix2.patch
hostfs-convert-to-new-aops-fix-fix.patch
ufs-convert-to-new-aops-fix2.patch
sysv-convert-to-new-aops-fix2.patch
minix-convert-to-new-aops-fix2.patch
affs-convert-to-new-aops-fix-fix.patch
memoryless-nodes-add-n_cpu-node-state-move-setup-of-n_cpu-node-state-mask.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix.patch
memoryless-nodes-fixup-uses-of-node_online_map-in-generic-code-fix-2.patch
update-n_high_memory-node-state-for-memory-hotadd.patch
slub-avoid-page-struct-cacheline-bouncing-due-to-remote-frees-to-cpu-slab.patch
slub-do-not-use-page-mapping.patch
slub-move-page-offset-to-kmem_cache_cpu-offset.patch
slub-avoid-touching-page-struct-when-freeing-to-per-cpu-slab.patch
slub-place-kmem_cache_cpu-structures-in-a-numa-aware-way.patch
slub-optimize-cacheline-use-for-zeroing.patch
But due to the unreliable nature of the bug, I can't be to sure about that.
Next version is compiled, now again switching the PC off for an hour...
Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Tejun Heo
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- References:
- sata_sil24 broken since 2.6.23-rc4-mm1
- From: Torsten Kaiser
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Tejun Heo
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Tejun Heo
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Torsten Kaiser
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Jeff Garzik
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Torsten Kaiser
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Tejun Heo
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Torsten Kaiser
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Torsten Kaiser
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
- From: Tejun Heo
- sata_sil24 broken since 2.6.23-rc4-mm1
- Prev by Date: Re: [PATCH] robust futex thread exit race
- Next by Date: Re: [PATCH RFC 3/9] RCU: Preemptible RCU
- Previous by thread: Re: sata_sil24 broken since 2.6.23-rc4-mm1
- Next by thread: Re: sata_sil24 broken since 2.6.23-rc4-mm1
- Index(es):
Relevant Pages
|